The emergence of generative AI has led to a surge in demand for vast amounts of data, such as images, videos, and text snippets, to train AI models.
This has resulted in a new market that has caught the attention of tech giants, who are scrambling to gather more data to feed their AI models.
We have released an in-depth report that provides the first detailed insight into this growing market.
________________________________________________________________________
- Tech giants are paying billions to license user data for AI training.
- Aging internet platforms like Photobucket are cashing in on archived content.
- Privacy concerns are mounting over people’s data being used without consent.
________________________________________________________________________
Inside Big Tech’s Hushed Race to Buy AI Training Data
Aging Internet Platforms Get New Life
Many companies with vast collections of user-generated content from the peak of Web 2.0 are now capitalizing on their archives.
Photobucket, a popular image-hosting platform for early social networks like MySpace, is currently negotiating to license its 13 billion photos and videos to multiple tech firms at rates of up to $1 per image and more than $1 per video.
CEO Ted Leonard said one prospective buyer wants more than 1 billion videos alone, and he expressed surprise at how insatiable AI's appetite for data has become.
Scraping the Open Web No Longer Enough
Initially, Big Tech firms like Google, Meta, and OpenAI trained their generative AI models using freely scraped data from across the internet.
But that approach has drawn a wave of copyright lawsuits.
So they’ve pivoted to striking licensing deals for paywalled and otherwise restricted content.
“There is a rush right now to go for copyright holders with private collections,” said Edward Klaris, a lawyer who is advising on deals worth tens of millions of dollars.
The Data Feeding Frenzy
Major stock media providers like Shutterstock have inked nine-figure deals to provide access to their libraries.
An entire AI data supply chain industry is booming, with brokers securing rights to podcasts, videos, chatbot logs, and more.
Rates can hit $300 per hour of video content.
The sellers claim this licensed data is “ethically sourced.”
However, resurfacing years-old personal posts raises privacy concerns over people’s data being exploited without their consent.
A $30 Billion Market Ahead?
According to market researchers, the opaque AI data market is expected to grow from $2.5 billion to nearly $30 billion over the next decade as generative AI becomes more dominant.
Meanwhile, Big Tech companies are competing fiercely to maintain their dominance, fueling a largely hidden race for data that has only just begun.