MRQL (the Map-Reduce Query Language) is an SQL-like query language for map-reduce computations, implemented on top of Apache Hadoop. MRQL is powerful enough to express most common data analysis tasks over many different kinds of raw data, including hierarchical data and nested collections such as XML data. It is more powerful than other current languages, such as Hive and Pig Latin, because it can operate on more complex data and supports more powerful query constructs, which eliminates the need for explicit map-reduce code.
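The configuration variable "mapred.child.ulimit" controls the maximum virtual memory of the child (map/reduce) processes.
Note: the value of mapred.child.ulimit (specified in kilobytes) must be greater than the maximum heap size (-Xmx) set in mapred.child.java.opts; otherwise the child JVM may fail to start.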
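
The constraint between the two settings can be checked before a job is submitted. The following is only a minimal sketch, not part of Hadoop: the helper names xmx_in_kb and ulimit_allows_heap are hypothetical, and it assumes that mapred.child.ulimit is given in kilobytes and that the heap size appears as a -Xmx flag inside the mapred.child.java.opts string.

```python
import re

# Multipliers that convert a -Xmx suffix to kilobytes.
# A bare number (no suffix) is interpreted by the JVM as bytes.
_UNIT_TO_KB = {"": 1 / 1024, "k": 1, "m": 1024, "g": 1024 * 1024}


def xmx_in_kb(java_opts):
    """Return the -Xmx heap size found in a JVM option string, in KB.

    Returns None when no -Xmx flag is present. If the flag appears more
    than once, the last occurrence is used (assumption: the JVM honors
    the last one given).
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not matches:
        return None
    size, unit = matches[-1]
    return int(size) * _UNIT_TO_KB[unit.lower()]


def ulimit_allows_heap(ulimit_kb, java_opts):
    """Check that the ulimit value (KB) is greater than the -Xmx heap size.

    Hypothetical helper: ulimit_kb stands for mapred.child.ulimit and
    java_opts for mapred.child.java.opts.
    """
    heap_kb = xmx_in_kb(java_opts)
    # With no explicit -Xmx there is nothing to compare against.
    return heap_kb is None or ulimit_kb > heap_kb
```

For example, a ulimit of 1048576 KB (1 GB) passes the check against "-Xmx512m", while a ulimit of 262144 KB (256 MB) does not.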
Evolution of Google File System
August 08, 2009 07:26:40 EDT

There is an interesting interview about the evolution of the Google File System in ACM Queue. I think it is readable by anybody, not just ACM members. One of the morals of this story is that, even if you are building what you think will be the world's biggest, you still will make design decisions that you know are not scalable because you know how to implement them. It is better to get something running right away and start using it. Of course, they also ran into scalability problems that they did not expect. So, some of the evolution of GFS was planned, and some was unplanned.