The world's largest shadow library—which is increasingly funded by AI developers—shocked the Internet this weekend by announcing it had "backed up Spotify" and started distributing 300 terabytes of metadata and music files in bulk torrents.
According to Anna's Archive, the data grab represents more than 99 percent of listens on Spotify, making it "the largest publicly available music metadata database with 256 million tracks." It's also "the world’s first 'preservation archive' for music which is fully open," with 86 million music files, the archive boasted.
The music files supposedly represent about 37 percent of songs available on Spotify as of July 2025. The scraped files were prioritized by popularity, with Anna's Archive weeding out many songs that are never streamed or are of poor quality, such as AI-generated songs.
Spotify told Android Authority on Monday that it was investigating whether Anna's Archive had actually scraped its platform "at scale," as the archive's blog claimed.
"An investigation into unauthorized access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform’s audio files," Spotify said. "We are actively investigating the incident."
It's unclear how much Spotify data was actually scraped, Android Authority noted, or if the company will possibly pursue legal action to take down the torrents. Asked for comment, a Spotify spokesperson told Ars that "Spotify has identified and disabled the nefarious user accounts that engaged in unlawful scraping."
For Anna's Archive, the temptation to scrape the data may have been too much after stumbling upon "a way to scrape Spotify at scale," supposedly "a while ago."
"We saw a role for us here to build a music archive primarily aimed at preservation," the archive said. Scraping Spotify data was a "great start," they said, toward building an "authoritative list of torrents aiming to represent all music ever produced."
A list like that "does not exist for music," the archive said, and would be akin to LibGen, which tech giants like Meta and startups like Anthropic notoriously used to pirate book datasets for AI training.
Releasing the metadata torrents this December was the first step toward achieving this "preservation" mission, Anna's Archive said. Next, the Archive will release torrents of music files, starting with the most popular streams first, then eventually releasing torrents of less popular songs and album art. In the future, "if there is enough interest, we could add downloading of individual files to Anna’s Archive," the blog said.
Spotify told Ars that it's taking steps to avoid any future scraping.
"We’ve implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior," Spotify's spokesperson said. "Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights."
"This is insane": Users fear data grab will doom archive
Anna's Archive claimed that the Spotify data was scraped to help preserve "humanity’s musical heritage," protecting it "forever" from "destruction by natural disasters, wars, budget cuts, and other catastrophes."
However, some Anna's Archive fans—who largely use the search engine to find books, academic papers, and magazine articles—were freaked out by the news that Spotify data was scraped. On Hacker News, some users questioned whether the data would be useful to anyone but AI researchers, since searching bulk torrents for individual songs seemed impractical for music fans.
One user pointed out that "there are already tools to automatically locate and stream pirated TV and movie content automatic and on demand"—suggesting that music fans could find a way to stream the data. But others worried Anna's Archive may have been baited into scraping Spotify, perhaps taking on legal risks that AI companies prone to obscuring their training data sources likely wish to avoid.
"This is insane," a top commenter wrote. "Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?"
But Anna's Archive is clearly working to support AI developers, another noted, pointing out that Anna's Archive promotes selling "high-speed access" to "enterprise-level" LLM data, including "unreleased collections." Anyone can donate "tens of thousands" to get such access, the archive suggests on its webpage, and any interested AI researchers can reach out to discuss "how we can work together."
"AI may not be their original/primary motivation, but they are evidently on board with facilitating AI labs piracy-maxxing," a third commenter suggested.
Meanwhile, on Reddit, some fretted that Anna's Archive may have doomed itself by scraping the data. To them, it seemed like the archive was "only making themselves a target" after watching the Internet Archive struggle to survive a legal attack from record labels that ended in a confidential settlement last year.
"I'm furious with AA for sticking this target on their own backs," a redditor wrote on a post declaring that "this Spotify hacking will just ruin the actual important literary archive."
As Anna's Archive fans spiraled, some even floated a conspiracy theory that the archive was only "doing it for the AI bros, who are the ones paying the bills behind the scenes" to keep the archive afloat.
Ars could not immediately reach Anna's Archive to comment on users' fears or Spotify's investigation.
On Reddit, one user took comfort in the fact that the archive is "designed to be resistant to being taken out," perhaps preventing legal action from ever really dooming the archive.
"The domain and such can be gone, sure, but the core software and its data can be resurfaced again and again," the user explained.
But not everyone was convinced that Anna's Archive could survive brazenly torrenting so much Spotify data.
"This is like saying the Titanic is unsinkable," another user warned, suggesting that Anna's Archive might lose donations if Spotify-fueled takedowns continually frustrate downloads over time. "Sure, in theory data can certainly resurface again and again, but doing so each time, it will take money and resources, which are finite. How many times are folks willing to do this before they just give up?"
This story was updated to include Spotify's statement.