How to avoid the uncanny valley in voice design

By Dawn Harpster

While conversation design is all about building natural, human-like conversations, a bot that sounds a little too human can make us uneasy.
But why does it make us feel that way? And why do conversation designers need to be aware of it?
What is voice design?
Voice design is both an art and a science. It’s more than just writing a script to take the user from point A to point B and collect some information along the way. The designer needs to bridge the gap between technical functionality (the needs of the machine) and the sometimes unpredictable behavior of human users (the needs of the humans). Every interaction is the sum of its parts and must be considered as a whole when creating a successful voice-centered design. The mapped user journey needs to be functional, feel natural, and accomplish its designed task.
Besides being a good writer, a voice designer must always be an advocate for the end user and be aware of the limitations of human-to-machine voice interactions. For example, when two people are speaking on the phone, verbal cues indicate when it is time for the other person to speak: a long pause, a subtle “um-hum” to show that someone is still listening, or even something overt like “What do you think?” The absence of such cues in human-to-machine conversations can cause turn-taking problems, which can quickly derail an interaction and cause it to fail. This is just one example of the many considerations a designer needs to take into account when creating a voice-centered interaction.
Can an experience feel “too human”?
I once had the opportunity to hold a chimpanzee. I’d always been intrigued by chimps. I thought they were cute when I’d see them on TV. I loved when they would mimic human behavior. I suppose somewhere in my head I was expecting a chimp to be more like a dog.
When the chimp, the size of a toddler, was placed into my eager arms I was delighted and happy. We studied each other for a few minutes and I stared into the chimp’s human-like brown eyes. Then the chimp reached up to gently touch my hair.
And that is when I got really, really creeped out. Some invisible line had been crossed in that moment and I couldn’t give the chimp back to her handler fast enough. It was a feeling I didn’t expect and, at the time, couldn’t really explain.
This was my first foray into the uncanny valley. While the phenomenon is generally applied to artificial intelligence and computer-generated entities, it’s an apt metaphor to describe what I was feeling about the chimpanzee at that moment.

What is the uncanny valley?
The short answer is that humans don’t respond well to artificial intelligence (AI) systems (or chimps) that appear and behave in a way that is perceived as too human. It makes us uneasy. It creeps us out. We want our AI to be helpful and speed up tasks. We don’t want the AI to try to mimic human appearance, behavior, or emotions too closely.
One theory is that an AI that is too human causes cognitive dissonance. As a survival mechanism, humans are wired to classify everything. An AI that is too human resists easy classification: we know it is an AI, but its human-like behavior or appearance conflicts with that knowledge, and the mismatch makes us uncomfortable.
Makers of TV shows and films that use computer-generated imagery (CGI) for character design have discovered that characters that appear too human are off-putting to viewers, and have had to adjust the characters to appear “less human” to make viewers more comfortable. Even when CGI depicts a well-known human character, like Carrie Fisher’s Princess Leia, it can be off-putting.
Some brands have slipped and fallen into the uncanny valley with brand-specific, human-like avatars. In 2018, a British bank tested a “digital human” and even gave it a name. It could greet customers, listen to queries, and answer customer questions. The results were reportedly very positive, but these tests have not translated into widespread implementation across UK branches, and social media clips of this “digital human” have since been deleted.
In 2019, a high-end fashion brand ran an unusual campaign featuring a top model alongside a virtual influencer: a human avatar with an online personality and over 3 million followers on Instagram. The campaign was designed to promote social change, but instead resulted in social backlash. Unfortunately for these brands, our appreciation for the creative and technical skill behind these human-like avatars tends to be followed by that creepy, too-close-for-comfort feeling.
How to avoid the uncanny valley in voice design
The backend systems that virtual agents access provide a wealth of data about callers. Depending on the use case, a backend system might have access to personal details about the caller, including name, address, family information (like children’s names and birthdates), marital status, health conditions, bank balance, or recent orders. How you present this information to users can fill them with confidence or give them an unsettling, “Big Brother is watching” feeling. When creating virtual agents with access to customer data, present and use only what is necessary to make the interaction as frictionless and convenient as possible.
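One way to apply the "only what is necessary" rule is to filter the caller's profile by intent before the virtual agent ever sees it. The following is a minimal sketch, not a description of any particular product: the intent names, the field names, and the `fields_for_intent` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from caller intent to the profile fields that intent needs.
# Anything not listed here is never passed to the dialogue layer.
REQUIRED_FIELDS = {
    "order_status": {"first_name", "recent_orders"},
    "billing_dispute": {"first_name", "last_charge"},
}


def fields_for_intent(profile: dict, intent: str) -> dict:
    """Return only the profile fields that the given intent needs."""
    allowed = REQUIRED_FIELDS.get(intent, set())  # unknown intent: expose nothing
    return {key: value for key, value in profile.items() if key in allowed}


# Illustrative caller record (made-up data).
profile = {
    "first_name": "Alex",
    "address": "12 Example Street",
    "marital_status": "married",
    "children": ["Sam"],
    "recent_orders": ["#1001"],
    "last_charge": 42.50,
}

visible = fields_for_intent(profile, "order_status")
print(visible)  # {'first_name': 'Alex', 'recent_orders': ['#1001']}
```

Because the default for an unrecognized intent is an empty set, the agent exposes nothing unless a designer has explicitly decided that a field is needed for that task.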
Empathy is a complex emotion and, given the standard transactional consumer applications of most virtual agents, is tricky to use in the best of times, and impossible in the worst of times. Given the strong reaction that humans have to other humans who inappropriately express or fail to express empathy, it is hardly surprising that humans react just as strongly to virtual systems that attempt to express empathy and miss the mark. While there are scores of research studies about how humans can feel empathy for a humanized robot, very little research exists about how humans feel about voice-only virtual agents expressing empathy to them in an interaction.
Avoid having the virtual agent express empathy. While our AI systems can detect sentiment, users don’t respond well to a virtual agent saying things like: “I understand you’re upset” or “I can tell that you’re angry.” This ventures into the uncanny valley, where something non-human seems a little too human and can make users feel uneasy.
Instead of having the virtual agent express empathy, offer sympathetic phrases that are simply statements of fact about the situation, not the emotion of the customer. For example: “I can help you with that” or “Let me help you fix that.” Move to, and focus on, solutions as quickly as possible to help defuse the situation.
Imagine that you’re attempting to resolve a billing mistake. You’re already angry about money that has been removed from your bank account, money that you may have needed to pay rent or put gas in the car. You’re afraid of additional bank fees for rejected transactions. You may be speaking louder than you usually would, and the voice agent says, “I can tell that you’re upset.” On the surface, this seems like a harmless statement. But nothing will escalate negative emotion in a human user faster than having a robot attempt to quantify or validate their negative feelings, especially when that user is already reactive because they’re calling about a data breach, lost credit card, missing shipment, or a broken product. Unless you are 100% certain that the empathetic response your virtual agent is giving is completely accurate and appropriate to every possible user situation, it’s best not to try.
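The guidance above can be sketched in code: detected sentiment changes how quickly the agent moves to a solution, but never what the agent says about the caller's feelings. This is only an illustration under stated assumptions. The sentiment scale (-1.0 to 1.0), the `NEGATIVE_THRESHOLD` cut-off, the intent names, and the `plan_turn` helper are hypothetical; the two phrases are the ones suggested in this article.

```python
from dataclasses import dataclass

# Assumed cut-off on a hypothetical -1.0 (very negative) to 1.0 (very positive) scale.
NEGATIVE_THRESHOLD = -0.5

# Solution-focused statements of fact; none of them names the caller's emotion.
SOLUTION_PHRASES = {
    "billing_dispute": "Let me help you fix that.",
}
DEFAULT_PHRASE = "I can help you with that."


@dataclass
class Turn:
    say: str                   # what the virtual agent speaks next
    skip_optional_steps: bool  # go straight to resolution when the caller is upset


def plan_turn(intent: str, sentiment: float) -> Turn:
    """Use sentiment to shorten the path to a solution, never to change the wording."""
    phrase = SOLUTION_PHRASES.get(intent, DEFAULT_PHRASE)
    return Turn(say=phrase, skip_optional_steps=sentiment <= NEGATIVE_THRESHOLD)


turn = plan_turn("billing_dispute", sentiment=-0.8)
print(turn.say)                  # Let me help you fix that.
print(turn.skip_optional_steps)  # True
```

The design choice here is that the sentiment score never reaches the wording at all, so the agent cannot produce a line like “I can tell that you’re upset” no matter how the score is computed.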
Concluding thoughts about voice design
If your virtual agent sounds too human, looks too human, or acts too human, you’ve got a problem. Any of these can steer your customers straight into the uncanny valley. With the increased use and accuracy of sentiment detection, the thorny problem of empathy response in virtual agents is closer than ever to being solved. But until that day comes, use extreme caution: your attempt to soothe a customer’s feelings may backfire, right into your CSAT scores! So, as brutal as it may sound, you shouldn’t try to empathize with your customers through a virtual agent. It is okay if your bot feels and looks like a bot, as long as customers feel they are getting their answers as quickly and efficiently as possible.
We know that the way you design customer conversations matters more than the artificial intelligence behind them. If you want to learn more about how to put tried and tested conversation design principles into practice when implementing a virtual agent, see the following resources:
- Handbook: Designing customer conversations: Best practices.
- Product demo: Talkdesk Virtual Agent™.
- Webinar: Contact center masterclass: How to maximise ROI with conversational AI.
- Blog: Why conversational AI architects love human-in-the-loop.
FAQs
What is conversational AI?
Conversational AI refers to technologies that can understand both speech and text inputs and correctly respond in a natural, human-like manner. This technology uses machine learning (ML) and natural language processing (NLP), but the key to conversational AI is its use of natural language understanding (NLU) as a core feature. This enables it to imitate human interactions, and create natural conversational flows on both voice and digital channels.
What are virtual agents?
A virtual agent is an AI-powered assistant that can autonomously resolve customer issues through a conversational experience. Virtual agents can answer a wide range of customer questions with great accuracy and are not limited by rule-based, pre-defined flows. These tools can increase the customer self-service rate, improve the customer experience, and increase agent satisfaction.








