Google’s Latest Privacy Policy Update Reveals the Hidden Role of Public Data in AI Training!

TL;DR

Google’s updated privacy policy confirms the use of publicly available data for training its AI models and services.
The policy now includes “AI models” and other systems built on Google’s cloud platform, expanding beyond language models and Google Translate.
Controversies arise regarding data scraping practices and concerns over copyright infringement and license violation.

Google has updated its privacy policy. The company has confirmed that it uses publicly available data from the internet to train its AI models and services.

The updated policy mentions the use of public data for training Google’s AI models, including its chatbot Bard and search engine.

Previously, the policy only referred to “language models” and Google Translate, but now it includes “AI models” and other systems built on its cloud platform.

The company claims that this update does not fundamentally change the way it trains its AI models and that privacy principles and safeguards are incorporated into the development process.

Controversies Surrounding Data Scraping for AI Training

AI developers have been scraping various sources such as the internet, photo albums, books, social networks, and more to collect training data for AI systems.

This practice has raised concerns regarding copyright infringement and license violation, as the training data often includes protected material.

Lawsuits have been filed against Stability AI, which is accused of misusing millions of images from Getty Images, and against OpenAI and Microsoft, which are accused of scraping personal information and source code without consent.

The debate continues on whether such data scraping falls under fair use and whether the output of AI models constitutes a new form of work or a direct copy of the original data.

Impact on Access to Data and Licensing

As people have become more aware of how AI models are trained, some internet businesses have started charging developers for access to their data through APIs.

Platforms such as Stack Overflow, Reddit, and Twitter have introduced charges or new rules for accessing their content.

Similarly, stock image platforms such as Shutterstock and Getty have chosen to license their images to AI model builders and have formed partnerships with companies like Meta and Nvidia.

Source: The Register.