Massive 300TB Archive of Spotify’s Top Songs Leaked

Summary
– Anna’s Archive, a shadow library, announced it scraped and released a massive 300-terabyte torrent of Spotify’s music files and metadata.
– The archive claims this data represents over 99% of Spotify listens, creating the largest public music metadata database and an open music preservation archive.
– Spotify confirmed it is investigating the incident, stating a third party used illicit tactics to scrape public metadata and circumvent DRM for audio files.
– Anna’s Archive stated its motivation was preservation and building an authoritative, comprehensive archive of all music, similar to book datasets used to train AI.
– The legality and full scale of the scrape are unclear, with Spotify disabling the accounts involved and considering further action.
A massive trove of data reportedly scraped from Spotify has been released online, creating a significant controversy in the music and tech industries. A shadow library known as Anna’s Archive announced it has distributed approximately 300 terabytes of metadata and music files through bulk torrents. The organization claims this dataset captures over 99 percent of listens on the platform, positioning it as the largest publicly available music metadata collection with 256 million tracks. It also describes the release as the world’s first fully open “preservation archive” for music, containing 86 million individual audio files.
The audio files are said to represent about 37 percent of the songs available on Spotify as of mid-2025. The scrape prioritized popular content and filtered out tracks with few or no streams, including many low-quality or AI-generated songs. This selective approach was meant to capture the most-listened-to portion of the streaming service’s vast catalog.
Spotify has confirmed it is investigating the claims. The company stated that an internal review identified unauthorized scraping of public metadata and the circumvention of digital rights management (DRM) protections to access audio files. Spotify representatives said they have disabled the user accounts believed to be responsible for the scraping, though the full scale of the incident remains under examination. It is not yet clear what legal action, if any, the streaming giant might pursue to have the torrents removed.
For the operators of Anna’s Archive, the project was motivated by a desire to create a permanent, open resource. They stated that after discovering a method to scrape Spotify at a large scale, they saw an opportunity to build a music archive focused on long-term preservation. The Spotify data was considered a foundational step toward a more ambitious goal: constructing an authoritative, comprehensive list of torrents representing all music ever produced.
The archive’s organizers pointed out that no such universal catalog currently exists for music, drawing a parallel to the shadow library LibGen. That repository has been used by various technology companies, including Meta and AI startups like Anthropic, to access pirated book datasets for training artificial intelligence models. The release underscores the ongoing tension between open-access advocates and content platforms over data ownership, copyright, and the ethical sourcing of information for AI development.
(Source: Ars Technica)




