AI-equipped glasses can read silent speech

Acoustic sensing enables silent speech recognition on eyewear.


EchoSpeech uses speakers and microphones mounted on an eyeglass frame to emit inaudible sound waves toward the skin. By analyzing the echoes that return along multiple paths, it captures the subtle skin deformations caused by silent utterances and uses them to recognize silent speech.

Silent speech interfaces (SSIs) have gained popularity recently. Because it does not require users to vocalize sounds, silent speech can be used in more situations than spoken speech.

Ruidong Zhang, a doctoral student in information science, is wearing EchoSpeech. This silent-speech recognition interface uses acoustic sensing and artificial intelligence to continuously recognize up to 31 unvocalized commands based on lip and mouth movements.

According to the researchers, the low-power, wearable interface, developed by Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, needs only a few minutes of user training data before it can recognize commands, and it can run on a smartphone.

Image showing EchoSpeech
Credit: Cornell Chronicle

Equipped with microphones and speakers the size of pencil erasers, the EchoSpeech glasses become a wearable AI-powered sonar system, sending and receiving sound waves across the face and tracking mouth movements. A deep learning system built by SciFi Lab researchers then analyzes the resulting echo profiles in real time, with around 95% accuracy.
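The article does not spell out the signal processing, so the following is only a rough sketch of what an "echo profile" could look like. It assumes the speaker repeats a short near-ultrasonic chirp and that the profile is the cross-correlation of each received frame with that chirp. The sample rate, frequency band, frame length, and all function names are assumptions made for illustration, not details of EchoSpeech.

```python
import numpy as np

FS = 50_000        # sample rate in Hz (assumed for illustration)
CHIRP_LEN = 600    # samples per transmitted frame (assumed)


def make_chirp(f0=16_000.0, f1=20_000.0, n=CHIRP_LEN, fs=FS):
    """Linear frequency sweep from f0 to f1 Hz, n samples long."""
    t = np.arange(n) / fs
    sweep_rate = (f1 - f0) / (n / fs)  # Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t**2))


def simulate_received(chirp, delays, gains, frames):
    """Toy microphone signal: delayed, attenuated copies of a repeated chirp.

    Each (delay, gain) pair stands in for one echo path; delays are in samples.
    """
    tx = np.tile(chirp, frames)
    rx = np.zeros_like(tx)
    for delay, gain in zip(delays, gains):
        rx += gain * np.roll(tx, delay)
    return rx


def echo_profile(received, chirp):
    """Cross-correlate each received frame with the transmitted chirp.

    Returns an array of shape (frames, lags). Row i holds the correlation
    strength at each echo delay during frame i.
    """
    n = len(chirp)
    frames = len(received) // n
    chirp_spectrum = np.conj(np.fft.rfft(chirp))
    profile = np.empty((frames, n))
    for i in range(frames):
        frame = received[i * n:(i + 1) * n]
        # Circular cross-correlation computed in the frequency domain.
        profile[i] = np.fft.irfft(np.fft.rfft(frame) * chirp_spectrum, n)
    return profile
```

In this toy setup, a single echo path with a delay of 40 samples produces a peak at lag 40 in every row of the profile. If the reflecting surface moved between frames, the peak position and strength would shift from row to row, and that changing pattern is the kind of input a recognition model could be trained on.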

Zhang is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented this month in Hamburg, Germany, at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI). 

Zhang from Cornell University said, “For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back.”

With further development, EchoSpeech could be used to communicate with others via smartphone in places where speech is inconvenient or inappropriate, such as a noisy restaurant or a quiet library. The silent speech interface can also be paired with a stylus and used with design software such as CAD, eliminating the need for a keyboard and mouse.

He also said, “We’re moving sonar onto the body.” 

The SciFi Lab has created various wearable devices that use machine learning and miniature video cameras to track body, hand, and facial movements. The lab recently switched from cameras to acoustic sensors to track face and body movements, citing longer battery life, tighter security and privacy, and smaller, more compact hardware as reasons. EchoSpeech builds on the lab’s related acoustic-sensing technology, EarIO, a wearable earbud that tracks facial movements.

According to Cheng Zhang, most silent-speech recognition technology is limited to a small set of predetermined commands and requires the user to face or wear a camera, which is neither practical nor feasible. He noted that wearable cameras also pose significant privacy risks for both the user and the people the user interacts with.

As an acoustic-sensing technology, EchoSpeech eliminates the need for wearable video cameras. According to François Guimbretière, professor of information science at Cornell Bowers CIS and a co-author, audio data is significantly smaller than image or video data, so it requires less bandwidth to process and can be relayed to a smartphone in real time over Bluetooth.

He said, “And because the data is processed locally on your smartphone instead of uploaded to the cloud, privacy-sensitive information never leaves your control.”

According to Cheng Zhang, battery life also improves dramatically: ten hours with acoustic sensing versus 30 minutes with a camera, roughly a twentyfold increase.

In future work, SciFi Lab researchers are investigating smart-glass applications that track facial, eye, and upper body movements.

Cheng Zhang said, “We think glass will be an important personal computing platform to understand human activities in everyday settings.”

The National Science Foundation contributed to the funding of this study.

Journal Reference:

  1. Ruidong Zhang, Zhengnan Lai, et al. EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing. DOI: 10.1145/3544548.3580801