This article was published on July 7, 2022.

Why ‘facial expression recognition’ AI is a total scam

Congratulations, researchers: you just found the zillionth way to use basic prediction models to harm humans


A team of researchers at Jilin Engineering Normal University in China recently published a paper indicating they’d built an AI model capable of recognizing human facial expressions.

I’m going to save you some time here: they most certainly did not. Such a thing isn’t currently possible.

The ability to accurately recognize human emotions is what we here at Neural would refer to as a “deity-level” feat. The only people who truly know how you’re feeling at any given moment are you and any potential omnipotent beings out there.

But you don’t have to take my word for it. You can arrive at the same conclusion using your own critical thinking abilities.

Up front: The research is fundamentally flawed because it conflates facial expression with human emotion. You can falsify this premise by performing a simple experiment: assess your current emotional state, then force yourself to make a facial expression that presents in diametric opposition to it.

If you’re feeling happy and you’re able to “act” sad, you’ve personally debunked the whole premise of the research. But, just for fun, let’s keep going.

Background: Don’t let the hype fool you. The researchers don’t train the AI to recognize expressions. They train the AI to beat a benchmark. There’s absolutely no conceptual difference between this system and one that tries to determine if an object is a hotdog or not.

What this means is the researchers built a machine that tries to guess labels. They’re basically showing their AI model 50,000 pictures, one at a time, and forcing it to choose from a set of labels.

The AI might, for example, have six different emotions to choose from — happy, sad, angry, scared, surprised, etc. — and no option to say “I don’t know.”
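
To make that concrete, here’s a minimal Python sketch of what “choosing from a set of labels” means in practice (the six labels here are hypothetical, not taken from the paper): whatever the scores look like, exactly one label comes out.

```python
import numpy as np

# Hypothetical six-way label set; note there is no "I don't know" option.
LABELS = ["happy", "sad", "angry", "scared", "surprised", "disgusted"]

def classify(logits: np.ndarray) -> str:
    """Return whichever of the six labels scored highest."""
    probs = np.exp(logits - logits.max())  # softmax over the fixed label set
    probs /= probs.sum()
    return LABELS[int(np.argmax(probs))]

# Even near-uniform, low-confidence scores still produce a definite answer.
print(classify(np.array([0.17, 0.16, 0.17, 0.16, 0.17, 0.17])))
```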

That’s why AI devs might run hundreds of thousands or even millions of “training iterations” when building a model. The machines don’t figure things out using logic; they just try every possible combination of labels and adjust to feedback.

It’s a bit more complex than that, but the big important idea here is that the AI doesn’t care about or understand the data it’s parsing or the labels it’s applying.
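
If you’re curious what that guess-and-adjust loop looks like, here’s a toy sketch (ordinary logistic regression on made-up data, nothing like the researchers’ actual model): guess a label for every example, get told how wrong the guesses were, nudge the parameters, and repeat until the benchmark number looks good.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for a real dataset: 200 "images" reduced to 4 numbers each,
# plus arbitrary 0/1 labels. Nothing below inspects what those labels mean.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(4)
for step in range(10_000):                # the "training iterations"
    guesses = 1 / (1 + np.exp(-(X @ w)))  # current guess for every image
    feedback = guesses - y                # how wrong each guess was
    w -= 0.1 * (X.T @ feedback) / len(y)  # nudge the parameters, try again

accuracy = ((guesses > 0.5) == y).mean()
print(f"benchmark accuracy: {accuracy:.0%}")
```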

You could show it pictures of cats and force it to “predict” whether each image was “Spiderman in disguise” or “the color yellow expressed in visual poetry” and it would apply one label or the other to each image.

The AI devs would tweak the parameters and run the model again until it was able to determine which cats were which with enough accuracy to pass a benchmark.

And then you could change the data back to pictures of human faces, keep the stupid “Spiderman” and “color yellow” labels, and retrain it to predict which labels fit the faces.
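
In code, that relabeling trick is trivial, because the model only ever deals in output indices; the label strings are captions bolted on afterwards. A hypothetical two-class sketch:

```python
# The trained model outputs an index; the label strings are decoration.
# Rename the classes and nothing about the training, the predictions,
# or the benchmark score changes.
original_labels = ["cat breed A", "cat breed B"]
silly_labels = ["Spiderman in disguise",
                "the color yellow expressed in visual poetry"]

predicted_index = 0  # whatever index the model spits out for some image

print(original_labels[predicted_index])  # "cat breed A"
print(silly_labels[predicted_index])     # "Spiderman in disguise"
```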

The point is that AI doesn’t understand these concepts. These prediction models are essentially just machines that stand in front of buttons pushing them randomly until someone tells them they got it right.

What’s special about them is that they can push tens of thousands of buttons in a matter of seconds and they never forget which order they pushed them in.

The problem: All of this seems useful because, when it comes to outcomes that don’t affect humans, prediction models are awesome.

When AI models try to predict something objective, such as whether a particular animal is a cat or a dog, they’re aiding human cognition.

You and I don’t have the time to go through every single image on the internet when we’re trying to find pictures of a cat. But Google’s search algorithms do.

That’s why you can search for “cute kitty cats” on Google and get back thousands of relevant pics.

But AI can’t determine whether a label is actually appropriate. If you label a circle with the word “square,” and train an AI on that label, it will just assume anything that looks like a circle is a square. A five-year-old human would tell you that you’ve mislabeled the circle.
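
Here’s the mislabeled-circle scenario as a deliberately tiny sketch, using a one-feature nearest-neighbor “model” on invented data, just to show how a bad label flows straight through to the predictions:

```python
# Toy sketch: the training data calls a circle a "square", so the model
# dutifully learns to call anything circle-like a square.
training_data = [
    {"corners": 0, "label": "square"},  # a circle, mislabeled by a human
    {"corners": 4, "label": "square"},
    {"corners": 3, "label": "triangle"},
]

def predict(corners: int) -> str:
    # 1-nearest-neighbor on a single made-up "corners" feature
    nearest = min(training_data, key=lambda ex: abs(ex["corners"] - corners))
    return nearest["label"]

print(predict(0))  # "square": the mislabel is reproduced without complaint
```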

Neural take: This is a total scam. The researchers present their work as useful for “fields like human–computer interactions, safe driving … and medicine,” but there’s absolutely no evidence to support their assertion.

The truth is that “computer interactions” have nothing to do with human emotion, safe driving algorithms are more efficacious when they focus on attention instead of emotionality, and there’s no place in medicine for weak, prediction-based assessments concerning individual conditions.  

The bottom line is simple: You can’t teach an AI to identify human sexuality, politics, religion, emotion, or any other inner quality from a picture of a person’s face. What you can do is perform prestidigitation with a prediction algorithm in hopes of exploiting human ignorance.
