A collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.
The collection is a collaboration between LabROSA and The Echo Nest. More details, background, and usage instructions can be found at LabROSA’s site. The data is shared on Infochimps to provide a large dataset for research and to encourage work on algorithms that operate on it at large scale.
There is one dataset for each letter of the alphabet (A-Z) containing data for all songs that start with that letter, one dataset of additional files, and a small sample dataset.
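The count of 28 follows from that breakdown: 26 letter datasets plus the two extras. A minimal Python sketch of the enumeration is below; the labels `additional_files` and `sample` are hypothetical placeholders, since the actual dataset names are not given here.

```python
import string

# Hypothetical labels for the two non-letter datasets; the real names
# used on the hosting site are not stated in the description.
EXTRA_DATASETS = ("additional_files", "sample")


def list_datasets():
    """Return one label per dataset: the letters A-Z plus the two extras."""
    return list(string.ascii_uppercase) + list(EXTRA_DATASETS)


datasets = list_datasets()
print(len(datasets))
```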
Each per-letter dataset consists of song files in the HDF5 format.
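As a quick sanity check after downloading, one can test whether a file carries the standard 8-byte HDF5 format signature. The sketch below uses only the Python standard library; the function name `looks_like_hdf5` is an illustrative choice, and it checks offset 0 only, so a file with a user block in front of the superblock would not be recognized. Reading the actual song fields would require an HDF5 library such as h5py, which is not shown here.

```python
# The 8-byte signature that opens the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Only offset 0 is checked; files that place a user block before the
    superblock are reported as False by this simplified check.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling `looks_like_hdf5` on each downloaded song file before processing it is a cheap way to catch truncated or mislabeled downloads.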
Most of the data is licensed under the same terms as the Echo Nest API. The code is under the GNU General Public License (GPL).
Kaggle is a platform for data prediction competitions. Companies, organizations and researchers post their data and have it scrutinized by the world's best statisticians.
Dear Mr. Michalk, some time ago we met at a conference where I had the opportunity to speak about the behavior of the culture industry in general and the music industry in particular.
S. Hachmeier, R. Jäschke, and H. Saadatdoorabi. Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen", volume 3341, pages 213--226. Aachen, (2022)
A. Correya, R. Hennequin, and M. Arcos. arXiv:1808.10351, (2018). Comment: Music Information Retrieval, Cover Song Identification, Million Song Dataset, Natural Language Processing.
A. Vaglio, R. Hennequin, M. Moussallam, and G. Richard. Proceedings of the 22nd International Society for Music Information Retrieval Conference, Society for Music Information Retrieval, (November 2021)
X. Du, Z. Yu, B. Zhu, X. Chen, and Z. Ma. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pages 551--555. IEEE, (June 2021)