Google Duplex: An AI system to achieve real-world tasks over the phone

A new technology for conducting natural conversations.

Technology plays an important part in shaping how we live. Human-computer interaction, in particular, has expanded rapidly and steadily for three decades, attracting professionals from many other disciplines and incorporating diverse concepts and approaches.

In recent years, we have witnessed a revolution in the ability of computers to understand and generate natural speech. Even so, it is often frustrating to talk to stilted computerized voices that don’t understand natural language.

Google has now designed a machine that can deceive humans. The company has revealed a “terrifying” new technology, Google Duplex, that you could talk to without ever knowing it. Google Duplex is a new technology for conducting natural conversations to carry out “real world” tasks over the phone.

It is an automated voice assistant that can book restaurant reservations, check opening hours and accomplish other tasks over the phone. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.

There are several challenges in conducting natural conversations. For example, in natural spontaneous speech, people talk faster and less clearly than they do when talking to computers, and they use more complex sentences. The problem is aggravated in phone calls, which often have loud background noise and sound quality issues.

Google Duplex tackles all these issues and makes conversations sound natural with advances in understanding, interacting, timing, and speaking. It involves a recurrent neural network (RNN) that is built with TensorFlow Extended (TFX). The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more.

Incoming sound is processed through an ASR system. This produces text that is analyzed together with context data and other inputs to produce a response text, which is read aloud through a text-to-speech (TTS) system.
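
The flow described above can be sketched in a few lines of Python. This is only an illustration of the ASR, response and TTS stages under stated assumptions: `CallContext`, `run_turn` and the three `fake_*` functions are hypothetical names invented here, and the toy stand-ins simply pass text through. Nothing in it reflects Google's actual code or APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CallContext:
    """Extra inputs the article mentions besides the ASR text (field names are hypothetical)."""
    desired_service: str
    current_time: str
    history: List[str] = field(default_factory=list)


def run_turn(
    audio: bytes,
    context: CallContext,
    recognize: Callable[[bytes], str],
    respond: Callable[[str, CallContext], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: speech in, text, response text, speech out."""
    heard = recognize(audio)                    # ASR stage
    context.history.append(f"them: {heard}")
    reply = respond(heard, context)             # response model sees text plus context
    context.history.append(f"us: {reply}")
    return synthesize(reply)                    # TTS stage


# Toy stand-ins so the sketch runs without any speech libraries (assumptions).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_model(heard: str, context: CallContext) -> str:
    if "how can i help" in heard.lower():
        return f"Hi, I'd like to book a {context.desired_service}."
    return "Sorry, could you repeat that?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `run_turn(b"Hello, how can I help you?", CallContext("haircut", "10:00"), fake_asr, fake_model, fake_tts)` returns the reply as bytes and records both sides of the exchange in `history`, which is the "history of the conversation" input for the next turn.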

Whatever the task, the system produces a clear response. It is capable of carrying out sophisticated conversations, and it completes the majority of its tasks fully autonomously, without human involvement. It has a self-monitoring capability, which allows it to recognize the tasks it cannot complete autonomously (e.g., scheduling an unusually complex appointment). In these cases, it signals to a human operator, who can complete the task.
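
How the self-monitoring works has not been described in detail. One plausible way to think about it is a simple escalation rule, sketched below. The confidence score, the thresholds and the names `TurnResult` and `next_action` are all assumptions made for illustration, not Duplex internals.

```python
from dataclasses import dataclass


@dataclass
class TurnResult:
    reply: str
    confidence: float  # hypothetical self-assessed score between 0 and 1


def next_action(
    result: TurnResult,
    failed_turns: int,
    min_confidence: float = 0.6,   # assumed threshold
    max_failed_turns: int = 2,     # assumed limit on misunderstood turns
) -> str:
    """Decide whether to keep going or signal a human operator."""
    if result.confidence < min_confidence or failed_turns > max_failed_turns:
        return "handoff_to_operator"
    return "continue_autonomously"
```

With these assumed defaults, a confident reply early in the call continues autonomously, while a low-confidence reply or a call that has gone wrong more than twice is passed to the operator.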

Erik Brynjolfsson, a Massachusetts Institute of Technology professor and director of its Initiative on the Digital Economy, said, “This technology is amazing, and [a] big step forward, but I don’t think the main goal of AI should be to mimic humans. Instead, AI researchers should make it as easy as possible for humans to tell whether they are interacting with another human or with a machine.”

Developers reported, “We trained Duplex on in-call practices that are typically simple for humans but challenging for machines, including “elaborations” (“for when?”), “syncs” (“can you hear me?”), “interruptions” (“can you start over?”) and “pauses” (“can you hold?”).

“To prevent it from sounding too stilted or robotic, the system was also taught a number of so-called “speech disfluencies”: the “hmms,” “uhs” and other noises people make in casual conversation. Like humans, the AI makes those sounds to convey that it’s still gathering its thoughts.”
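
As a rough illustration of the disfluency idea, the sketch below prepends a filler sound to a reply when the system is still "thinking". The filler list, the `thinking` flag and the function name `add_disfluency` are assumptions for the example. The real system is described as learning when to produce these sounds, not following a fixed rule like this one.

```python
import random
from typing import Optional, Sequence

FILLERS = ("hmm", "uh", "um")  # illustrative filler sounds


def add_disfluency(
    reply: str,
    thinking: bool,
    rng: Optional[random.Random] = None,
    fillers: Sequence[str] = FILLERS,
) -> str:
    """Prefix the reply with a filler when the system needs a moment."""
    if not thinking or not reply:
        return reply
    rng = rng or random.Random()
    filler = rng.choice(fillers)
    return f"{filler.capitalize()}... {reply}"
```

Passing a seeded `random.Random` makes the choice of filler repeatable, which is handy when testing.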

In cases where the task is too complex or the call goes awry, Google says, the AI will pass the call to a human operator.

A Google official said the service was different from robocalls because it’s not for solicitation or telemarketing. The official added that the automated assistant will only call companies on phone numbers offered to the public for booking appointments or doing business.

Madeline Lamo, a University of Washington graduate student researching robotic harms and free speech, said the Google AI could also effectively flip the robocall dynamic on its head. “Instead of vendors and scammers using AI to contact potential consumers/scam victims en masse, the consumers are now empowered to make robocalls themselves.”

For users, Google Duplex is making supported tasks easier. Instead of making a phone call, the user simply interacts with the Google Assistant, and the call happens completely in the background without any user involvement.

Another benefit for users is that Duplex enables delegated communication with service providers in an asynchronous way, e.g., requesting reservations during off-hours, or with limited connectivity. It can also help address accessibility and language barriers, e.g., allowing hearing-impaired users, or users who don’t speak the local language, to carry out tasks over the phone.
