The main use cases for Spark are iterative machine learning algorithms and interactive analytics. From the ML side: Most ML algorithms ru...
9781449327279 - Hadoop Operations - If you’ve been tasked with the job of maintaining large and complex Hadoop clusters, or are about to be, this book is a must. You’ll learn the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Event-Detection - The DBSCAN algorithm expressed in MapReduce, implemented with Hadoop and MongoDB, to analyze tweets and photos and create geolocated events.
C. Schmitz, G. Peled, and O. Koren. Proceedings of the International Conference on Information Integration and Web-Based Applications & Services (IIWAS 2021), (2021 hadoop hdfs fragmentation)