Sqoop is a tool designed to import data from relational databases into Hadoop. It connects to a database over JDBC, examines each table’s schema, and automatically generates the classes needed to import the data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job that reads the tables from the database via DBInputFormat, the JDBC-based InputFormat, and writes them to a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
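The schema-driven flow described above can be illustrated with a minimal sketch. This is not Sqoop's implementation: it assumes Python's built-in `sqlite3` module as a stand-in for a JDBC connection, and the `SQL_TO_JAVA` mapping, the function names, and the `employees` table are all hypothetical, chosen only for illustration. It inspects a table's schema, generates the source of a record class with one field per column, and writes the rows out as delimited text lines, roughly what a text-based import target would contain.

```python
import sqlite3

# Hypothetical mapping from declared SQL column types to Java field types.
# A real tool would cover many more types; unknown types fall back to String.
SQL_TO_JAVA = {
    "INTEGER": "Integer",
    "TEXT": "String",
    "REAL": "Double",
}


def read_schema(conn, table):
    """Return [(column_name, declared_type), ...] for the given table."""
    # PRAGMA table_info is SQLite-specific; it stands in for the metadata
    # calls a JDBC-based tool would make. Each row is
    # (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2].upper()) for row in rows]


def generate_record_class(table, schema):
    """Emit Java-like source text for a class with one field per column."""
    lines = [f"public class {table.capitalize()} {{"]
    for name, sql_type in schema:
        java_type = SQL_TO_JAVA.get(sql_type, "String")
        lines.append(f"  private {java_type} {name};")
    lines.append("}")
    return "\n".join(lines)


def export_as_text(conn, table, schema, delimiter=","):
    """Read every row and render it as one delimited text line."""
    columns = ", ".join(name for name, _ in schema)
    key = schema[0][0]  # assumption: the first column is a sortable key
    cursor = conn.execute(f"SELECT {columns} FROM {table} ORDER BY {key}")
    return [delimiter.join(str(value) for value in row) for row in cursor]


# Build a small in-memory table to import from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", 100.5), (2, "Lin", 90.0)],
)

schema = read_schema(conn, "employees")
source = generate_record_class("employees", schema)
lines = export_as_text(conn, "employees", schema)

print(source)
print("\n".join(lines))
```

In this sketch the whole table is read by a single query in one process; the abstract describes Sqoop doing the read inside a MapReduce job and writing the result to files in HDFS rather than to an in-memory list.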
C. Schmitz, G. Peled, and O. Koren. Proceedings of the International Conference on Information Integration and Web-Based Applications & Services (IIWAS 2021), (2021)
W. Tantisiriroj, S. Son, S. Patil, S. Lang, G. Gibson, and R. Ross. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, page 67:1--67:12. New York, NY, USA, ACM, (2011)