
This article was published on September 20, 2021

DeepMind tells Google it has no idea how to make AI less toxic

To be fair, neither does any other lab




Opening the black box. Reducing the massive power consumption it takes to train deep learning models. Unlocking the secret to sentience. These are among the loftiest outstanding problems in artificial intelligence. Whoever has the talent and budget to solve them will be handsomely rewarded with gobs and gobs of money.

But there’s an even greater challenge stymieing the machine learning community, and it’s starting to make the world’s smartest developers look a bit silly. We can’t get the machines to stop being racist, xenophobic, bigoted, and misogynistic.

Nearly every big tech outfit and several billion-dollar non-profits are heavily invested in solving AI’s toxicity problem. And, according to the latest study on the subject, we’re not really getting anywhere.

The prob: Text generators, such as OpenAI’s GPT-3, are toxic. OpenAI currently has to limit GPT-3’s usage because, without myriad filters in place, it’s almost certain to generate offensive text.

In essence, numerous researchers have learned that text generators trained on unmitigated datasets (such as those containing conversations from Reddit) tend towards bigotry.

It’s pretty easy to see why: a massive share of human discourse on the internet is laced with bigotry towards minority groups.

Background: It didn’t seem like toxicity was going to be an insurmountable problem back when deep learning exploded in 2014.

We all remember that time Google’s AI mistook a turtle for a gun, right? That’s very unlikely to happen now. Computer vision has gotten much better in the interim.

But progress has been less forthcoming in the field of natural language processing (NLP).

Simply put, the only way to stop a system such as GPT-3 from spewing out toxic language is to block it from doing so. But this solution has its own problems.
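To make that trade-off concrete, here’s a minimal sketch of the kind of crude blocklist-style intervention described above. It is not DeepMind’s or OpenAI’s actual filter; the term list and function name are invented for illustration, and real systems use trained toxicity classifiers rather than literal word lists.

```python
# Illustrative sketch of a naive blocklist intervention (hypothetical terms and names).
# Real detoxification pipelines use trained classifiers, not literal word matching.

BLOCKED_TERMS = {"gay", "muslim", "immigrant"}  # identity terms a naive filter might flag

def naive_filter(generated_text: str) -> str:
    """Suppress any model output that mentions a blocked term."""
    lowered = generated_text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[output suppressed]"
    return generated_text

# The problem: the filter can't tell hateful text from benign or affirming text.
print(naive_filter("Gay people shouldn't exist."))  # suppressed (desired)
print(naive_filter("Gay people exist."))            # also suppressed (coverage loss)
```

The second example is exactly the "reduced coverage" failure the DeepMind paper flags: blocking the toxic output also silences ordinary text about the very groups the intervention is meant to protect.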

What’s new: DeepMind, the creator of AlphaGo and a Google sister company under the Alphabet umbrella, recently conducted a study of state-of-the-art toxicity interventions for NLP agents.

The results were discouraging.

Per a preprint paper from the DeepMind research team:

We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the REALTOXICITYPROMPTS dataset, this comes at the cost of reduced LM (language model) coverage for both texts about, and dialects of, marginalized groups.

Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions — highlighting further the nuances involved in careful evaluation of LM toxicity.

The researchers ran the intervention paradigms through their paces and compared their efficacy with that of human evaluators.

A group of paid study participants evaluated text generated by state-of-the-art text generators and rated its output for toxicity. When the researchers compared the humans’ assessments to the machine’s, they found a large discrepancy.

AI may have a superhuman ability to generate toxic language but, like most bigots, it has no clue what the heck it’s talking about. Intervention techniques failed to identify toxic output with the same accuracy as humans did.
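For a rough sense of how such a comparison might be run, here’s a small sketch that measures how often an automatic toxicity score disagrees with human raters. The scores, labels, and 0.5 threshold are all invented for illustration; DeepMind’s evaluation on RealToxicityPrompts is far more involved.

```python
# Illustrative sketch: compare automatic toxicity scores against human judgments.
# All data points and the threshold are hypothetical.

samples = [
    # (automatic_toxicity_score, human_majority_says_toxic)
    (0.92, True),
    (0.81, False),   # classifier flags it, human raters disagree
    (0.15, False),
    (0.74, False),   # another false alarm after heavy detoxification
    (0.33, True),    # humans see toxicity the automatic metric misses
]

THRESHOLD = 0.5  # score above which the automatic metric calls text "toxic"

disagreements = sum(
    1 for score, human_toxic in samples if (score > THRESHOLD) != human_toxic
)
print(f"Automatic metric disagrees with human raters on "
      f"{disagreements}/{len(samples)} samples "
      f"({100 * disagreements / len(samples):.0f}%).")
```

In the paper, it’s disagreement of this kind, especially after aggressive detoxification, that leads the authors to question how much the automatic metrics can be trusted.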

Quick take: This is a big deal. Text generators are poised to become ubiquitous in the business world. But if we can’t make them non-offensive, they can’t be deployed.

Right now, a text generator that can’t tell the difference between a phrase such as “gay people exist” and “gay people shouldn’t exist” isn’t very useful. Especially when the current solution to keeping it from generating text like the latter is to block it from using any language related to the LGBTQ+ community.

Blocking references to minorities as a method to solve toxic language is the NLP equivalent of a sign that says “for use by straight whites only.”

The scary part is that DeepMind, one of the world’s most talented AI labs, conducted this study and then forwarded the results to Jigsaw. That’s Google’s crack problem-solving team, and it’s been trying to solve this problem since 2016 without success.

The near-future doesn’t look bright for NLP.

You can read the whole paper here.
