This year, Air Canada lost a court case brought by a customer who was misled by its AI chatbot into purchasing full-price plane tickets on the assurance that they would later be refunded under the company’s bereavement policy. The airline tried to argue that the bot was “responsible for its own actions.” The court rejected this line of argument, and the company not only had to pay compensation, it also drew public criticism for attempting to distance itself from the situation. The lesson is clear: companies are liable for their AI models, even when those models make mistakes beyond the company’s control.
Businesses look at the rapidly advancing world of AI, and particularly generative AI, with a mixture of awe and apprehension. A double-edged sword, AI is seen as a catalyst that can speed up productivity, letting you do far more with less, but one with kinks that can lead to anything from customer dissatisfaction to lawsuits.
These kinks are what have become popularly known as ‘AI hallucinations’: instances when an AI model provides answers that are incorrect, irrelevant, or nonsensical.
“Luckily, it’s not a very widespread problem. It only happens between 2% to maybe 10% of the time at the high end. But still, it can be very dangerous in a business environment. Imagine asking an AI system to diagnose a patient or land an aeroplane,” says Amr Awadallah, an AI expert who’s set to give a talk at VDS2024 on How Gen-AI is Transforming Business & Avoiding the Pitfalls.
But many AI experts dislike this term. The terminology, and the misunderstanding it reflects about how these errors actually arise, can lead to pitfalls with ripple effects into the future.
A former VP of Product Intelligence Engineering at Yahoo! and VP of Developer Relations for Google Cloud, Awadallah has watched the technology evolve throughout his career. He went on to found Vectara, a company that applies AI and neural network technologies to natural language processing, helping companies take advantage of the benefits search relevance can bring.
We spoke with him to get some clarity on why this term is so controversial, what businesses need to understand about ‘AI hallucinations,’ and whether or not they can be solved.
Why AI models don’t ‘hallucinate’
Using the term hallucination implies that, when an AI model provides the wrong information, it’s seeing or feeling something that isn’t there. But that’s not what’s happening behind the lines of code that put these models into operation.
It’s very common that we as humans fall into this type of trap. Anthropomorphism, or the innate tendency to attribute human traits, emotions, or intentions to non-human entities, is a mechanism we use to grapple with the unknown, by viewing it through a human lens. The ancient Greeks used it to attribute human characteristics to deities; today, we’re most likely to use it to interpret our pets’ actions.
There is a particular danger of falling into this trap with AI: the technology has become pervasive throughout our society in a very short time, yet very few people actually understand what it is and how it works. For our minds to comprehend such a complex topic, we use shortcuts.
“I think the media played a big role in that because it’s an attractive term that creates a buzz. So they latched onto it and it’s become the standard way we refer to it now,” Awadallah says.
But just as assuming a wagging tail always means a friendly animal can lead us astray, misinterpreting the outputs an AI gives can lead us down the wrong path.
“It’s really attributing more to the AI than it is. It’s not thinking in the same way we’re thinking. All it’s doing is trying to predict what the next word should be given all the previous words that have been said,” Awadallah explains.
If he had to give this occurrence a name, he would call it a ‘confabulation.’ Confabulations are essentially the addition of words or sentences that fill in the blanks in a way that makes the information look credible, even if it’s incorrect.
“[An AI model is] highly incentivised to answer any question. It doesn’t want to tell you, ‘I don’t know’,” says Awadallah.
The danger here is that while some confabulations are easy to detect because they border on the absurd, most of the time an AI will present information that is very believable. And the more we rely on AI to help us speed up productivity, the more we may take its seemingly believable responses at face value. This means companies need to be vigilant about including human oversight for every task an AI completes, dedicating more, not less, time and resources.
The answers an AI model provides are only as good as the data it has access to and the scope of your prompt. Because AI relies on patterns within its training data rather than reasoning, its responses can go wrong when that data is incorrect or sparse for a particular query, and they can also depend on the nature and context of your query or task. Cultural context, for example, can produce different perspectives and responses to the same question.
In the case of narrow domain knowledge systems, internal AI models built to retrieve information from a specific set of data, such as a business’s internal system, an AI only has space for a certain amount of memory. Although this is far more than a human can retain, it’s not unlimited. When you ask it questions beyond the scope of its memory, it will still be incentivised to answer by predicting what the next words could be.
Can AI misinformation be solved?
There’s been a lot of talk about whether or not ‘confabulations’ can be solved.
Awadallah and his team at Vectara are developing a method to combat confabulations in narrow domain knowledge systems. Their approach grounds the model’s answers in documents retrieved from the business’s own data, a technique known as Retrieval Augmented Generation (RAG), and pairs it with an AI model whose specific task is to fact-check the output of other AI models against those sources.
Of course, Awadallah admits, just as with human fact checkers, there is always the possibility that something will slip past an AI fact checker. This is known as a false negative.
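For readers curious how such a pipeline fits together, here is a minimal, hypothetical sketch in Python of the retrieve, generate, then check loop described above. The tiny document store, keyword-overlap retriever, and word-overlap checker are stand-ins invented for illustration, not Vectara’s implementation; a production system would call a large language model for generation and a trained evaluation model for the grounding check.

```python
# Minimal sketch of the retrieve-generate-check pattern (illustrative only).
# The retriever, generator, and checker are toy stand-ins for real models.

from typing import List

DOCUMENTS = [
    "Refund requests under the bereavement policy must be filed before travel.",
    "Full-price tickets are non-refundable once the flight has departed.",
]

def retrieve(question: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:top_k]

def generate(question: str, context: List[str]) -> str:
    """Stand-in for an LLM that answers using only the retrieved context."""
    return context[0]  # a real model would synthesise an answer from the sources

def is_supported(answer: str, context: List[str]) -> bool:
    """Stand-in for a fact-checking model: is the answer grounded in the sources?"""
    answer_words = set(answer.lower().split())
    source_words = {w for doc in context for w in doc.lower().split()}
    return answer_words <= source_words  # crude overlap test; a real checker uses entailment

question = "Can I claim a bereavement refund after my flight?"
context = retrieve(question, DOCUMENTS)
answer = generate(question, context)
print(answer if is_supported(answer, context) else "I don't know.")
```

The point of the design is that the checker sees both the answer and the retrieved sources, so anything the generator invents that isn’t backed by those sources can be flagged before it reaches the user.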
For open domain AI models, like ChatGPT, which are built to retrieve information about any topic across the wide expanse of the world wide web, dealing with confabulations is a bit trickier. Some researchers recently published a promising paper on the use of “semantic entropy” to detect AI misinformation. The method involves asking an AI the same question multiple times and scoring how widely the answers vary in meaning; the greater the variation, the more likely the model is confabulating.
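As a rough, hypothetical illustration of that idea, the Python sketch below samples several answers to the same question, groups those that agree, and computes an entropy score over the groups. The exact-match same_meaning check is a placeholder; the published method uses a language model to judge whether two answers entail each other.

```python
# Rough sketch of the semantic-entropy idea: sample several answers to one question,
# cluster those that mean the same thing, and measure how spread out the answers are
# across clusters. High entropy suggests the model is confabulating.

import math
from typing import List

def same_meaning(a: str, b: str) -> bool:
    """Placeholder equivalence check; the published method uses bidirectional entailment."""
    return a.strip().lower() == b.strip().lower()

def semantic_entropy(answers: List[str]) -> float:
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Five sampled answers: three agree, two contradict them, so the score is relatively high.
samples = ["Paris", "paris", "Paris", "Lyon", "Marseille"]
print(f"semantic entropy = {semantic_entropy(samples):.2f}")
```

In a real setting, the answers would be sampled from the model itself at a nonzero temperature, and a threshold on the score would decide whether to trust the response or flag it for review.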
As we edge closer and closer to eliminating AI confabulations, an interesting question to consider is, do we actually want AI to be factual and correct 100% of the time? Could limiting their responses also limit our ability to use them for creative tasks?
Join Amr Awadallah at the seventh edition of VDS to find out more about how businesses can harness the power of generative AI, while avoiding the risks, at VDS2024 taking place October 23-24 in Valencia.
Content provided by VDS and TNW