January 22, 2017 – Judging by the Consumer Electronics Show (CES) in Las Vegas this month, speech and voice recognition have passed a critical point, with word error rates of 6.3%. That is an improvement from 23% over the last four years and approximates human error rates. As Shawn DuBravac, chief economist for the Consumer Technology Association (CTA), put it: "We've seen more progress in this technology in the last 30 months than we saw in the last 30 years…. Ultimately vocal computing is replacing the traditional graphical user interface."
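For readers unfamiliar with the metric: word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a recognizer's transcript and a reference transcript, divided by the number of reference words. Here is a minimal sketch in Python, my own illustration rather than CTA's methodology:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / len(ref)

# A 6.3% WER means roughly 6 errors per 100 reference words. Toy example:
print(word_error_rate("turn off the kitchen lights please",
                      "turn of the kitchen light please"))  # 2/6 ≈ 0.33
```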
CTA estimates current usage at about 5 million voice-activated digital assistants such as OK Google, Siri, Cortana and Echo. DuBravac expects that number to double in 2017. This is consistent with a move away from the traditional model of what a computer is. Today voice and speech can be built into not just smartphones but all kinds of technology, from wearables to vacuum cleaners. We can talk to machines, and they can talk to us and say more than, "I don't understand. Please repeat the question."
I grew up with voice recognition technology. My first experiences were with DragonDictate, a speech-to-text interface that required me to train the software to recognize how I spoke. This was the early 1990s, and if you wanted to talk to your computer you needed to install a dedicated digital signal processor. Even then, speech recognition was discrete rather than continuous: you had to pause between each word, or you could teach the system to recognize a whole phrase and type it using word processing software like WordPerfect and Microsoft Word. Recognition accuracy after about six months of use was better than 90%. Macros within WordPerfect let users speak navigation and formatting commands such as "next page" or "bold sentence." It was a great leap forward but hardly a product for mass use.
Phone interfaces using voice recognition with carefully constructed menus and scripts were introduced in the first decade of this century. Some were very sophisticated, but just as many were annoying as hell, making you want to press "0" to talk to a real human. They have since been much improved. Today I can call a number and talk to an automaton that understands when I want to halt home delivery of my newspaper. It seems pressing "0" isn't needed as much anymore.
And now that we are in the second decade of the century, central processors are powerful and tiny. Voice and speech in smartphones are almost a no-brainer, and sensor-laden smart devices can discern continuous speech with high accuracy and talk back to you as well. Very few visual interfaces are found in smart technology that relies principally on voice: wearable computing (e.g., clothing), home assistants and robots, smart appliances, and other smart devices. An entire home can be voice enabled with the latest built-in Internet-of-Things (IoT) sensor and device technology.
On full display at CES this year, Amazon's Alexa stole the show, with Google Home a close second. Showgoers could interact with Amazon's home helper to perform mundane daily activities: dictating a shopping list, turning off an appliance by voice, dimming the room's lights, raising or lowering window blinds, checking on restaurant reservations and doctor's appointments, finding out what movies are playing locally, or booking the lowest airfare for a holiday flight.
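The plumbing behind such requests typically follows an intent-handling pattern: the assistant's cloud service turns speech into a named intent plus "slots" (parameters), and developer code maps that to an action. A simplified sketch of the pattern, with hypothetical device functions rather than Amazon's actual SDK:

```python
# Hypothetical smart-home intent handler. By the time this code runs, the
# assistant's speech service has already converted audio into an intent
# name plus slots; set_power() and set_brightness() stand in for real
# smart-home API calls.

def handle_intent(intent: dict) -> str:
    name = intent["name"]
    slots = intent.get("slots", {})
    if name == "TurnOffAppliance":
        device = slots.get("device", "appliance")
        set_power(device, on=False)
        return f"Okay, turning off the {device}."
    if name == "DimLights":
        level = int(slots.get("level", 50))
        set_brightness(slots.get("room", "living room"), level)
        return f"Dimming the lights to {level} percent."
    return "Sorry, I can't do that yet."

def set_power(device, on): print(f"[power] {device} -> {'on' if on else 'off'}")
def set_brightness(room, level): print(f"[lights] {room} -> {level}%")

# "Alexa, turn off the coffee maker" might arrive as:
print(handle_intent({"name": "TurnOffAppliance",
                     "slots": {"device": "coffee maker"}}))
```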
Interacting by voice with equipment on the factory floor is a hot new area of development. AT&T offers starter kits and application programming interfaces (APIs) designed to voice-enable industrial equipment. As Mobeen Khan, associate VP for IoT Solutions at AT&T, stated: "If I'm an engineer maintaining a machine with voice abilities, I could talk to it and run diagnostics just by speaking to it."
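AT&T hasn't published the internals here, but the general shape of such an integration is straightforward: a speech front end produces a transcript, the transcript is matched to a diagnostic command, and the result is spoken back. A hypothetical sketch; the command names and run_diagnostics() are illustrative, not part of any AT&T kit:

```python
# Hypothetical voice-driven machine diagnostics. Speech-to-text is assumed
# to happen upstream (e.g., via a cloud API); this shows only the mapping
# from a transcript to a diagnostic routine.

DIAGNOSTIC_COMMANDS = {
    "run diagnostics": "full_selftest",
    "check temperature": "read_temp_sensors",
    "check vibration": "read_vibration_sensors",
}

def run_diagnostics(machine_id: str, test: str) -> str:
    # Placeholder for a real call into the machine's control interface.
    return f"Machine {machine_id}: {test} completed, all readings nominal."

def handle_utterance(machine_id: str, transcript: str) -> str:
    text = transcript.lower().strip()
    for phrase, test in DIAGNOSTIC_COMMANDS.items():
        if phrase in text:
            return run_diagnostics(machine_id, test)
    return "Command not recognized. Try 'run diagnostics'."

print(handle_utterance("press-42", "Hey machine, run diagnostics please"))
```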
Automobile manufacturers have bought into voice interfaces as an effective way to interact with the technologies available in today's cars and trucks. Beyond smartphone voice apps like OK Google and Siri, speech and natural-language recognition are expected to be common in new vehicles by 2019, with the majority featuring the interface. Nvidia, the graphics card manufacturer, demonstrated DriveWorks and AI Co-Pilot, sophisticated artificial intelligence and voice recognition technology for automobiles, at CES. The technology can discern the difference between an ongoing phone conversation in the car and a command request. It can also alert drivers to hazards ahead using signals and speech, and even respond to a threat when the driver seems distracted. It can even read lips.
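Nvidia hasn't detailed how that conversation-versus-command distinction is made, but one common approach in voice interfaces is to gate commands behind a wake word, so ordinary speech is ignored. A toy sketch of that gating pattern; real in-cabin systems are far more sophisticated:

```python
# Toy wake-word gating: only transcripts that begin with the wake word are
# treated as commands; everything else (e.g., an ongoing phone conversation)
# is ignored. The wake word itself is a made-up example.

WAKE_WORD = "hey car"

def classify(transcript: str) -> str:
    text = transcript.lower().strip()
    if text.startswith(WAKE_WORD):
        command = text[len(WAKE_WORD):].strip(" ,")
        return f"COMMAND: {command}"
    return "CONVERSATION: ignored"

print(classify("Hey car, set cabin temperature to 70"))  # COMMAND
print(classify("So I told him we'd meet at seven"))      # CONVERSATION: ignored
```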
I was also excited when the Dragon software arrived and bought a professional version, but somehow I was not able to achieve the level of productivity I had hoped for. Perhaps there are too many parameters that voice and speech recognition tools need to contend with, like background noise as well as the mostly inaudible noise that most computing devices generate from the ground noise of their electrical circuits. There are also privacy and nuisance issues when using it in public. A new approach may be necessary to make the best use of this exciting technology.