Sqoop is a tool designed to import data from relational databases into Hadoop. It uses JDBC to connect to a database, examines each table’s schema, and automatically generates the classes needed to import the data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job that reads the tables from the database via DBInputFormat, the JDBC-based InputFormat, and writes them to a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
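As a rough illustration of the import described above, the following Python sketch assembles a `sqoop import` command line. It only builds the argument list and does not run Sqoop. The helper name `build_sqoop_import_args`, the JDBC URL, the table name, and the target directory are hypothetical; the flags shown (`--connect`, `--table`, `--target-dir`, `--as-textfile`, `--as-sequencefile`, `--direct`) are those of Sqoop 1.x as commonly documented and may differ in other versions.

```python
from typing import List


def build_sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                            file_format: str = "text",
                            direct: bool = False) -> List[str]:
    """Build (but do not run) a `sqoop import` argument list.

    Hypothetical helper for illustration only; it is not part of Sqoop.
    """
    # The two target formats mentioned in the text: text files and SequenceFiles.
    format_flags = {"text": "--as-textfile", "sequence": "--as-sequencefile"}
    if file_format not in format_flags:
        raise ValueError(f"unsupported file format: {file_format!r}")

    args = [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC connection string
        "--table", table,            # table whose schema Sqoop inspects
        "--target-dir", target_dir,  # HDFS directory for the output files
        format_flags[file_format],
    ]
    if direct:
        # Assumption: --direct selects the database-specific fast path
        # (for MySQL, a mysqldump-based import) instead of plain JDBC reads.
        args.append("--direct")
    return args


if __name__ == "__main__":
    # Hypothetical database, table, and HDFS path.
    cmd = build_sqoop_import_args(
        "jdbc:mysql://db.example.com/corp", "employees",
        "/user/hadoop/employees", file_format="sequence", direct=True)
    print(" ".join(cmd))
```

On a machine with Sqoop installed, the resulting list could be passed to `subprocess.run`; database credentials are left out of the sketch on purpose.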