In an interesting turn of events, Apple has acknowledged using a controversial dataset for AI training while simultaneously distancing itself from the ethical concerns surrounding it. The tech giant confirmed it had utilized “The Pile,” a dataset compiled by AI research lab EleutherAI, which includes subtitles from YouTube videos obtained without explicit creator permission.

The dataset, which also incorporates content from Wikipedia, European Parliament proceedings, and even emails from Enron staff, was originally created to democratize AI development. However, its use by major tech companies like Apple, Nvidia, and Salesforce has raised eyebrows in the AI ethics community.

Apple, known for its stance on privacy and ethical data use, was quick to clarify its position. In statements to multiple tech publications, the company confirmed that while it had indeed used The Pile, it was not for its flagship Apple Intelligence project. Instead, Apple used the dataset to train its open-source OpenELM models, released in April.

“OpenELM was created purely for research purposes,” an Apple spokesperson told 9to5Mac. “It doesn’t power any of our AI or machine learning features, including Apple Intelligence.”

This revelation comes at a time when the AI industry is under increasing scrutiny for its data collection and training practices. Apple’s response seems calculated to maintain its image as a privacy-focused company while acknowledging its participation in broader AI research.

The company reiterated its commitment to ethical AI development, pointing out that its Apple Intelligence models are trained on licensed data and publicly available information collected by its web crawler. This stands in contrast to the YouTube subtitles and other potentially problematic data sources found in The Pile.

Furthermore, 9to5Mac reports that Apple "has no plans to build any new versions of the OpenELM model." So let's just hope Apple and co. take steps to ensure the data they're using isn't scraped from the web unethically.

Dwayne Cubbins

For nearly a decade, I've been deciphering the complexities of the tech world, with a particular passion for helping users navigate the ever-changing tech landscape. From crafting in-depth guides that unlock your phone's hidden potential to uncovering and explaining the latest bugs and glitches, I make sure you get the most out of your devices. And yes, you might occasionally find me ranting about some truly frustrating tech mishaps.
