Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on an unreliable cluster of computers.
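The Map-Reduce model itself can be illustrated with a minimal, single-process sketch in plain Python. It does not use Disco's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. A real framework would run the map and reduce steps in parallel across cluster nodes and rerun tasks that fail.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # Each line is mapped independently, which is what makes the map step parallelizable.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["disco runs map reduce", "map then reduce"])
print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the work could in principle be distributed across machines without changing the result.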
Sqoop is a tool designed to import data from relational databases into Hadoop. Sqoop uses JDBC to connect to a database. It examines each table’s schema and automatically generates the classes needed to import the data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job that reads the tables from the database via DBInputFormat, the JDBC-based InputFormat. Tables are read into a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
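The schema-inspection and table-to-files idea can be sketched as a toy example. This is not Sqoop's implementation: it assumes Python's `sqlite3` module in place of JDBC, in-memory strings in place of HDFS files, and a simple round-robin split in place of a MapReduce job. The table `employees` and the helpers `read_schema` and `export_table` are hypothetical.

```python
import csv
import io
import sqlite3

def read_schema(conn, table):
    """Return the column names of `table` by inspecting the database schema."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name

def export_table(conn, table, num_splits=2):
    """Read the table and write its rows into `num_splits` text 'files'."""
    columns = read_schema(conn, table)
    outputs = [io.StringIO() for _ in range(num_splits)]
    writers = [csv.writer(out, lineterminator="\n") for out in outputs]
    query = f"SELECT {', '.join(columns)} FROM {table} ORDER BY rowid"
    for i, row in enumerate(conn.execute(query)):
        # Round-robin assignment stands in for the per-task input splits.
        writers[i % num_splits].writerow(row)
    return columns, [out.getvalue() for out in outputs]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Ada"), (2, "Lin"), (3, "Sam")],
)
columns, parts = export_table(conn, "employees")
print(columns)
print(parts)
```

Each element of `parts` plays the role of one text-based output file; the column list derived from the schema plays the role of the generated record class.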
M. Bayir, I. Toroslu, A. Cosar, and G. Fidan. WWW '09: Proceedings of the 18th international conference on World wide web, pp. 161--170. New York, NY, USA, ACM, (2009)
M. Becker, H. Mewes, A. Hotho, D. Dimitrov, F. Lemmerich, and M. Strohmaier. International Conference Companion on World Wide Web, pp. 17--18. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2016)
C. Bellettini, M. Camilli, L. Capra, and M. Monga. Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2012 14th International Symposium on, pp. 295--302. IEEE Computer Society, (September 2012)
Q. Chen, A. Therber, M. Hsu, H. Zeller, B. Zhang, and R. Wu. Proceedings of the 2009 International Database Engineering & Applications Symposium, pp. 43--53. New York, NY, USA, ACM, (2009)
F. Chierichetti, R. Kumar, and A. Tomkins. WWW '10: Proceedings of the 19th international conference on World wide web, pp. 231--240. New York, NY, USA, ACM, (2010)
H.-c. Yang, A. Dasdan, R. Hsiao, and D. Parker. SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 1029--1040. New York, NY, USA, ACM, (2007)
C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, pp. 281--288. MIT Press, (2006)
R. Cordeiro, C. Traina Jr., A. Traina, J. López, U. Kang, and C. Faloutsos. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, pp. 690--698. ACM, (2011)