Spark is a fast, in-memory cluster computing framework with a language-integrated interface in Scala. It shines at iterative MapReduce (e.g. machine learning) and interactive data mining, where keeping data in memory provides substantial speedups.
Configuration variable "mapred.child.ulimit" can be used to control
the maximum virtual memory of the child (map/reduce) processes.
** value of mapred.child.ulimit > value of mapred.child.java.opts
MRQL (the Map-Reduce Query Language) is an SQL-like query language for map-reduce computations. It is implemented on top of Apache's Hadoop. MRQL is powerful enough to express most common data analysis tasks over many different kinds of raw data, including hierarchical data and nested collections, such as XML data. It is more powerful than other current languages, such as Hive and Pig Latin, since it can operate on more complex data and supports more powerful query constructs, thus eliminating the need for using explicit map-reduce code.
P. Sethia, and K. Karlapalem. Engineering Applications of Artificial Intelligence, 24 (7):
1120--1127(2011)Infrastructures and Tools for Multiagent Systems.
G. Limaye, J. Chaudhary, and P. Punjabi. International Journal on Recent and Innovation Trends in Computing and Communication, 3 (3):
1699--1703(March 2015)