Classify music genre from a 10-second audio stream using a convolutional neural network trained on mel-frequency spectrograms.
MusicGenreClassification is a deep learning project that automatically identifies the genre of a 10-second music sample. It converts audio into mel-frequency spectrograms and trains a convolutional neural network to distinguish between ten genres. The project serves as a research implementation exploring effective techniques for audio-based machine learning tasks.
Researchers, students, and developers interested in audio signal processing, music information retrieval, and practical applications of convolutional neural networks for classification tasks.
It provides a complete, documented pipeline from dataset collection to model training, using a larger dataset and modern CNN architecture to achieve better results than some earlier academic papers on multi-genre classification.
Classify music genre from a 10 second sound stream using a Neural Network.
Uses a subset of the Million Song Dataset via the 7Digital API, which is larger than GTZAN; the README notes that this improved results over prior work and supports better generalization.
Employs mel-frequency spectrograms rather than MFCCs; the README states that they provided 'extremely better results' for genre separation.
Offers full scripts from data downloading (previewDownloader.py) to training (train.py), making it a self-contained example for audio deep learning workflows.
Implements a convolutional neural network with three hidden layers and max pooling, tailored for audio pattern recognition, as demonstrated in the results section.
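To make the mel-frequency representation concrete, here is a minimal standard-library sketch of a triangular mel filterbank applied to one frame's power spectrum. It is not the project's code: the function names, the HTK-style mel formula, and the parameter values are assumptions chosen for illustration. A mel spectrogram keeps these band energies per frame, whereas MFCCs compress them further with a logarithm and a discrete cosine transform.

```python
import math

def hz_to_mel(hz):
    """HTK-style mel scale (an assumed variant): 2595 * log10(1 + hz / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points, evenly spaced on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # slope up from lo to mid
            falling = (hi - f) / (hi - mid)   # slope down from mid to hi
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def mel_spectrum(power_spectrum, bank):
    """Project one frame's power spectrum onto the mel bands."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

Stacking the output of `mel_spectrum` for successive frames gives a two-dimensional, image-like array of the kind a convolutional network takes as input.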
Relies on the 7Digital API for dataset acquisition, which requires developer approval and may change or incur costs, limiting reproducibility and scalability.
Mel-frequency processing increases training time compared to MFCCs, as the README acknowledges, making it resource-intensive for large-scale experiments.
Hardcoded for only ten specific genres (e.g., blues, rock) with no built-in mechanism for adding new genres, reducing adaptability for broader applications.
Scripts like preproccess.py and formatInput.py require manual setup and an understanding of audio preprocessing, and the project lacks production-ready documentation or deployment guides.
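The fixed-genre limitation can be illustrated with a small sketch of label encoding. This is a hypothetical example, not the project's code: `GENRES`, `one_hot`, and `decode` are names invented here, and only the two genres named above are listed, since the full set of ten is not given. The length of the label vector matches the size of the classifier's output layer, so adding a genre changes the model's shape and calls for retraining.

```python
# Hypothetical genre list: only the two genres named in the text are included.
GENRES = ["blues", "rock"]

def one_hot(genre, genres=GENRES):
    """Encode a genre name as a one-hot training label."""
    vec = [0.0] * len(genres)
    vec[genres.index(genre)] = 1.0
    return vec

def decode(scores, genres=GENRES):
    """Map a vector of per-genre scores back to the best-scoring genre name."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return genres[best]
```

Passing a longer list, for example `one_hot("jazz", ["blues", "rock", "jazz"])`, yields a three-element label, which shows why the genre list and the network's output size have to change together.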