OpenAI to train LLMs on Financial Times content

The Financial Times (full disclosure: the owner of The Next Web) has inked a deal with OpenAI. The American firm will use the British publisher’s content to train its generative AI models.

The deal is the latest in a string of new partnerships between OpenAI and global news publishers like Axel Springer, Associated Press, and Le Monde. The company did not disclose the financial terms of any of the contracts.

In 2023 alone, hundreds of pages of litigation and countless articles accused tech firms of stealing artists’ and publishers’ work to train their AI models.

OpenAI has come under fire for training its GPT models on content scraped from the web without consent. Last year, The New York Times even sued OpenAI and Microsoft for copyright infringement.

OpenAI’s recent tie-ups with publishers will allow it to continue to train its algorithms on web content. But, this time, it will have permission.

Strategic partnership

The FT called the deal with OpenAI a “strategic partnership.”

The 100 million-plus users of ChatGPT will have direct access to summaries, quotes, and links to the publisher’s articles. This content is usually hidden behind a paywall. OpenAI will attribute all information from the FT to the publication.

In exchange, OpenAI will help the news organisation develop new AI tools. The FT already uses OpenAI products, including ChatGPT Enterprise, we can confirm.

FT Group CEO John Ridding said the publisher was still committed to “human journalism.”

“This is an important agreement in a number of respects,” said Ridding. “It recognises the value of our award-winning journalism and will give us early insights into how content is surfaced through AI.”

“Apart from the benefits to the FT, there are broader implications for the industry. It’s right, of course, that AI platforms pay publishers for the use of their material,” Ridding continued. “OpenAI understands the importance of transparency, attribution, and compensation – all essential for us. At the same time, it’s clearly in the interests of users that these products contain reliable sources.”

Fair use or unfair?

However, just because OpenAI is cozying up to publishers doesn’t mean it’s not still scraping information from the web without permission.

Earlier this month, the New York Times reported that OpenAI was using transcripts of YouTube videos to train its models. According to the publication, this contravenes copyright laws, since YouTube creators who upload videos to the platform still retain the copyright to the content they create.

OpenAI, however, insists its use of online material constitutes “fair use.” The firm, and many other tech companies, claim their large language models (LLMs) transform information gathered online into something entirely new.

Yet, as we’ve previously reported in depth, studies have shown that LLMs consistently regurgitate large chunks of their original training text verbatim.

Agreements with publishers could mark a step forward in resolving AI copyright disputes. However, they are likely to remain more the exception than the rule.

Story by Siôn Geschwindt

Siôn is a climate and energy reporter at TNW. From nuclear fusion to escooters, he covers the length and breadth of Europe's clean tech ecosystem. He's happiest sourcing a scoop, investigating the impact of emerging technologies, and even putting them to the test. Siôn has five years journalism experience and holds a dual degree in media and environmental science from the University of Cape Town, South Africa.