Disco is an open-source implementation of the MapReduce framework for distributed computing. Like the original framework, Disco supports parallel computation over large data sets on unreliable clusters of computers.
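The MapReduce model the entry refers to can be illustrated with a minimal, single-process sketch in Python; this is not Disco's own API (which distributes the phases across a cluster), just the map, shuffle, and reduce steps in miniature, with illustrative function names:

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce phase: sum the counts collected for one word.
    return word, sum(counts)

def mapreduce(lines):
    # Shuffle phase: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["to be or not to be"]))
# → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real Disco job, the map and reduce functions keep this shape, but the framework partitions the input, runs the phases on many workers, and reruns tasks on nodes that fail.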
Sqoop is a tool designed to import data from relational databases into Hadoop. Sqoop uses JDBC to connect to a database. It examines each table's schema and automatically generates the necessary classes to import data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job to read tables from the database via DBInputFormat, the JDBC-based InputFormat. Tables are read into a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
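The workflow described above is driven from Sqoop's command line. A hedged sketch of an import invocation, where the host, database, and table names are placeholders:

```shell
# Hypothetical example: import the `employees` table from a MySQL
# database into HDFS. Connection details below are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --username analyst -P \
  --target-dir /data/employees
# --as-sequencefile would write SequenceFiles instead of text files;
# --direct enables the MySQL-specific fast import path.
```

Behind this command, Sqoop generates the record classes from the table's schema and submits the DBInputFormat-based MapReduce job that writes the rows into the target directory.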
M. Bayir, I. Toroslu, A. Cosar, and G. Fidan. WWW '09: Proceedings of the 18th International Conference on World Wide Web, pp. 161--170. New York, NY, USA, ACM, (2009)
M. Becker, H. Mewes, A. Hotho, D. Dimitrov, F. Lemmerich, and M. Strohmaier. International Conference Companion on World Wide Web, pp. 17--18. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2016)
C. Bellettini, M. Camilli, L. Capra, and M. Monga. Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2012 14th International Symposium on, pp. 295--302. IEEE Computer Society, (September 2012)
Q. Chen, A. Therber, M. Hsu, H. Zeller, B. Zhang, and R. Wu. Proceedings of the 2009 International Database Engineering & Applications Symposium, pp. 43--53. New York, NY, USA, ACM, (2009)
F. Chierichetti, R. Kumar, and A. Tomkins. WWW '10: Proceedings of the 19th International Conference on World Wide Web, pp. 231--240. New York, NY, USA, ACM, (2010)