Kaggle is a platform for data prediction competitions. Companies, organizations and researchers post their data and have it scrutinized by the world's best statisticians.
A collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.
The collection is a collaboration between LabROSA and The Echo Nest. More details, background, and usage instructions are available at LabROSA’s site. The data is shared on Infochimps to provide a large dataset for research and to encourage the development of algorithms that scale to data of this size.
There is one dataset for each letter of the alphabet (A-Z) containing data for all songs that start with that letter, one dataset of additional files, and a small sample dataset.
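As a rough illustration of this layout, the sketch below enumerates the 28 datasets and picks the per-letter dataset for a given key. The labels `additional_files` and `sample` are hypothetical names chosen for this example, and whether "starts with" refers to a song title or some other identifier is an assumption, since the description does not say.

```python
import string

# 26 per-letter datasets, A through Z.
LETTER_DATASETS = list(string.ascii_uppercase)
# Hypothetical labels for the two remaining datasets (not official names).
EXTRA_DATASETS = ["additional_files", "sample"]


def dataset_for(key):
    """Return the letter dataset a key would fall into, or None.

    Assumes the split is made on the first character of the key,
    compared case-insensitively.
    """
    first = key.strip()[:1].upper()
    if len(first) == 1 and first in string.ascii_uppercase:
        return first
    return None  # keys starting with digits or symbols have no letter dataset


total = len(LETTER_DATASETS) + len(EXTRA_DATASETS)
```

Here `total` accounts for the 28 datasets mentioned above: 26 letters, plus the additional files, plus the sample.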
Each per-letter dataset consists of song files in HDF5 format.
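Because the internal layout of a song file is not described here, a reasonable first step is to list whatever the file contains. The following is a minimal sketch, assuming the third-party `h5py` package is installed; the function names are illustrative, and no particular group or field names are assumed.

```python
def format_entry(name, shape, dtype):
    """Render one dataset as 'name  shape=...  dtype=...'."""
    return f"{name}  shape={shape}  dtype={dtype}"


def list_datasets(path):
    """Return one formatted line per dataset found anywhere in the file."""
    import h5py  # third-party; imported lazily so format_entry works without it

    lines = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # groups are skipped, datasets are recorded.
        if isinstance(obj, h5py.Dataset):
            lines.append(format_entry(name, obj.shape, obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines
```

Calling `list_datasets("some_song.h5")` (a hypothetical file name) would return one line per stored array, which is usually enough to decide which fields to read in full.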
Most of the data is licensed under the same terms as The Echo Nest’s API. The code is released under the GNU General Public License.
S. Chen, J. Moore, D. Turnbull, and T. Joachims. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 714-722. New York, NY, USA, ACM, (2012)
J. Moore, S. Chen, D. Turnbull, and T. Joachims. International Society for Music Information Retrieval Conference (ISMIR), pages 401-406. (2013)
S. Chen, J. Xu, and T. Joachims. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 865-873. New York, NY, USA, ACM, (2013)