Dataset explainer
LAION-DISCO-12M: What It Is, and Whether Your Music Is in It
One of the largest public music datasets used in AI research. Here is what it actually contains, in plain terms, and how to check if your catalog is listed.
What it is, in one line
LAION-DISCO-12M is a public dataset of about 12.6 million music tracks and their metadata, released by the nonprofit LAION in November 2024 to help researchers train AI models on music. LAION calls it the largest publicly available music dataset. It is one of the collections that has circulated among AI music developers.
What is actually in it
The dataset is a list of links and metadata, not audio. Each entry points to a track on YouTube and records fields like song title, artist names, album, view count, and duration. LAION publishes the references and metadata, not the music files.
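To make the links-versus-audio distinction concrete, here is a minimal Python sketch of what one entry might look like, assuming a simple record with the fields described above. The class name `TrackEntry`, its field names, the unit for duration, and the placeholder values are hypothetical illustrations, not LAION's published schema. What it shows is that every field is a link, a string, or a number, so an entry has nowhere to hold the recording itself.

```python
from dataclasses import dataclass, fields
from typing import List

# Hypothetical shape of one entry; field names are illustrative,
# not LAION's published column names.
@dataclass(frozen=True)
class TrackEntry:
    url: str                 # link to the track on YouTube (a reference, not audio)
    title: str
    artist_names: List[str]
    album: str
    view_count: int
    duration_seconds: int    # the article says "duration"; seconds is an assumed unit

def carries_audio(entry_type: type) -> bool:
    """True if any declared field is typed to hold raw bytes (e.g. audio)."""
    return any(f.type in (bytes, bytearray) for f in fields(entry_type))

# Placeholder values, not a real dataset row.
example = TrackEntry(
    url="https://www.youtube.com/watch?v=EXAMPLE",
    title="Example Song",
    artist_names=["Example Artist"],
    album="Example Album",
    view_count=0,
    duration_seconds=0,
)

print(carries_audio(TrackEntry))  # False
```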
That distinction matters for artists. If your song is in LAION-DISCO-12M, your track was catalogued in a dataset built for AI research and shared among developers. That is a clear signal that your work is listed in training data developers can readily reach. It is not, by itself, proof that any specific company downloaded the audio or trained a model on it.
How it was built
LAION started from a seed list of 45,218 artists and recursively followed the "Fans might also like" recommendations on YouTube Music, walking the related-artist graph until no new artists turned up. That snowball produced about 250,000 artists and roughly 12.6 million songs. Because discovery worked through recommendation graphs, independent and mid-tier artists are well represented, not just superstars.
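The snowball process can be sketched as a graph traversal: start from the seed artists, read each artist's related-artist list, and queue only artists not seen before, stopping when the queue runs empty. This is a minimal sketch under stated assumptions. The `related` callback is a hypothetical stand-in for reading a "Fans might also like" list, the toy graph and artist names are invented, and breadth-first order is a choice made here, since the description above does not say what order LAION used.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

def snowball_artists(
    seeds: Iterable[str],
    related: Callable[[str], List[str]],
) -> Set[str]:
    """Breadth-first walk of a related-artist graph until no new artists appear.

    `related` is a hypothetical stand-in for fetching an artist's
    "Fans might also like" list.
    """
    seen: Set[str] = set(seeds)
    queue = deque(seen)
    while queue:
        artist = queue.popleft()
        for neighbor in related(artist):
            if neighbor not in seen:  # only queue artists not found before
                seen.add(neighbor)
                queue.append(neighbor)
    return seen  # queue is empty: no new artists turned up

# Toy graph with invented names, standing in for YouTube Music recommendations.
toy_graph: Dict[str, List[str]] = {
    "Seed Star": ["Mid Band", "Indie Duo"],
    "Mid Band": ["Indie Duo", "Bedroom Producer"],
    "Indie Duo": ["Seed Star"],
    "Bedroom Producer": [],
}

found = snowball_artists(["Seed Star"], lambda a: toy_graph.get(a, []))
print(sorted(found))  # ['Bedroom Producer', 'Indie Duo', 'Mid Band', 'Seed Star']
```

Starting from a single seed, the walk reaches "Bedroom Producer" two hops away. That is the mechanism behind the point above: lesser-known artists enter the set simply by being recommended next to someone already in it.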
Why it matters for you
In 2026, reporting, including by The Atlantic, described several large music datasets being passed around among AI developers. LAION-DISCO-12M is one of the public ones, which is exactly why it is checkable. You do not have to wonder whether your music is in this particular collection. You can look.
If your song is there and unregistered, that is the gap worth closing, because in the U.S. the legal paths that could pay a creator generally require a registered copyright.
How to check if your music is in it
Run the free check: search your name, paste your titles, or import from Spotify. We cross-reference your catalog against LAION-DISCO-12M and the Free Music Archive (about 13.8 million tracks combined), and against 6.3 million U.S. Copyright Office records at the same time, so you see exposure and registration in one pass.
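Cross-referencing a catalog comes down to turning each (artist, title) pair into a comparable key and looking it up in two indexes: one built from dataset entries and one from registration records. The sketch below assumes exact matching after a simple normalization (stripping accents and punctuation, lowercasing). The function names, that rule, and the example data are hypothetical, and a real checker would likely need fuzzier matching for remixes, featured artists, and alternate titles.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]  # normalized (artist, title)

def normalize(text: str) -> str:
    """Assumed rule: strip accents, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())

def make_key(artist: str, title: str) -> Key:
    return (normalize(artist), normalize(title))

def check_catalog(
    catalog: Iterable[Tuple[str, str]],
    dataset_keys: Set[Key],
    registered_keys: Set[Key],
) -> List[Dict[str, object]]:
    """For each (artist, title), report dataset exposure and registration together."""
    report: List[Dict[str, object]] = []
    for artist, title in catalog:
        key = make_key(artist, title)
        report.append({
            "artist": artist,
            "title": title,
            "in_dataset": key in dataset_keys,     # exposure
            "registered": key in registered_keys,  # registration
        })
    return report

# Invented example data; real indexes would be built from the dataset
# metadata and from Copyright Office records.
dataset = {make_key("Example Artist", "Song One"), make_key("Example Artist", "Song Two")}
registry = {make_key("Example Artist", "Song One")}

for row in check_catalog([("Example Artist", "Song One"), ("Example Artist", "Song Two!")], dataset, registry):
    print(row["title"], row["in_dataset"], row["registered"])
# Song One True True
# Song Two! True False
```

A row with `in_dataset` true and `registered` false is the case described above as the gap worth closing.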
Frequently asked
What is LAION-DISCO-12M?
LAION-DISCO-12M is a public dataset released by LAION in November 2024 containing about 12.6 million links to music tracks on YouTube along with their metadata. LAION describes it as the largest publicly available music dataset, built to advance machine-learning research on audio and music foundation models.
Does LAION-DISCO-12M contain the actual audio of my songs?
No. The dataset holds links and metadata (song title, artist names, album, view count, duration), not the audio files themselves. A match means your track is catalogued in a dataset compiled for AI research and shared among developers, not that your recording was copied into it.
How was the dataset built?
LAION started from a seed list of 45,218 artists and recursively followed the "Fans might also like" recommendations on YouTube Music, expanding outward until no new artists were found. That produced roughly 250,000 artists and about 12.6 million songs.
How do I check if my music is in LAION-DISCO-12M?
Run the free check on Copyright Check. Search your artist or songwriter name, paste your titles, or import from Spotify, and we cross-reference your catalog against LAION-DISCO-12M and the Free Music Archive (roughly 13.8 million tracks total).
Check your own music free
See which of your songs appear in the public AI training datasets, and which are registered with the U.S. Copyright Office. Free, no signup.