Voice User Interface
A voice user interface (VUI) makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process.
A VUI is the interface to any speech application. Controlling a machine by simply talking to it was science fiction only a short time ago. Until recently, this area was considered to be part of artificial intelligence. However, with advances in technology, VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.
VUIs are not without their challenges, however. People have very little patience for a "machine that doesn't understand". Therefore, there is little room for error: VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents in computer science, linguistics and human factors psychology, all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed and the target audience that will use the final system. The more closely the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.
The characteristics of the target audience are very important. For example, a VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers) should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate "mixed-initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
Not all business processes lend themselves equally well to speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the other hand, speech is well suited to handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.
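A routine transaction such as the funds transfer mentioned above is a natural candidate for a mixed-initiative dialog. The following is a minimal sketch, not a description of any particular VUI product: it assumes that a speech recognizer has already turned the caller's speech into text, and the slot names (amount, source, target) and the regular expressions are hypothetical choices made only for illustration. It shows how a caller can supply several pieces of information in one utterance, in any order, while the system prompts only for what is still missing.

```python
import re

# Hypothetical slots for a funds-transfer task; the patterns are illustrative only.
SLOT_PATTERNS = {
    "amount": re.compile(r"(\d+(?:\.\d+)?)\s*dollars"),
    "source": re.compile(r"\bfrom\s+(?:my\s+)?(\w+)"),
    "target": re.compile(r"\bto\s+(?:my\s+)?(\w+)"),
}


def fill_slots(utterance, slots=None):
    """Extract every slot mentioned in the utterance, in whatever order it appears."""
    slots = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in slots:
            slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Ask only for what is still missing; return None when every slot is filled."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return f"Please tell me the {name}."
    return None


if __name__ == "__main__":
    state = fill_slots("Move 20 dollars to checking")
    print(next_prompt(state))            # asks for the missing source account
    state = fill_slots("From my savings", state)
    print(state, next_prompt(state))
```

A power user can complete the task in a single utterance, whereas a first-time caller is guided one slot at a time by the same code.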
Future Uses
Pocket-size devices, such as PDAs or mobile phones, currently rely on small buttons for user input. These are either built into the device or are part of a touch-screen interface, such as that of the Apple iPod Touch and iPhone. Extensive button-pressing on devices with such small buttons can be tedious and inaccurate, so an easy-to-use, accurate, and reliable VUI would potentially be a major breakthrough in the ease of their use. Such a VUI would also benefit users of laptop- and desktop-sized computers, as it would solve numerous problems currently associated with keyboard and mouse use, including repetitive-strain injuries such as carpal tunnel syndrome and slow typing speed on the part of inexperienced keyboard users. Moreover, keyboard use typically entails either sitting or standing stationary in front of the connected display; by contrast, a VUI would free the user to be far more mobile, as speech input eliminates the need to look at a keyboard.
Such developments could change the face of current machines and have far-reaching implications for how users interact with them. Hand-held devices would be designed with larger, easier-to-view screens, as no keyboard would be required. Touch-screen devices would no longer need to split the display between content and an on-screen keyboard, thus providing full-screen viewing of the content. Laptop computers could essentially be cut in half in terms of size, as the keyboard half would be eliminated and all internal components would be integrated behind the display, effectively resulting in a simple tablet computer. Desktop computers would consist of a CPU and screen, saving desktop space otherwise occupied by the keyboard and eliminating sliding keyboard rests built under the desk's surface. Television remote controls and keypads on dozens of other devices, from microwave ovens to photocopiers, could also be eliminated.
Numerous challenges would have to be overcome, however, for such developments to occur. First, the VUI would have to be sophisticated enough to distinguish between input, such as commands, and background conversation; otherwise, false input would be registered and the connected device would behave erratically. A standard prompt, such as the famous "Computer!" call by characters in science fiction TV shows and films such as Star Trek, could activate the VUI and prepare it to receive further input by the same speaker. Conceivably, the VUI could also include a human-like representation: a voice or even an on-screen character, for instance, that responds (e.g., "Yes, Vamshi?") and continues to communicate back and forth with the user in order to clarify the input received and ensure accuracy.
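The activation prompt described above can be illustrated with a small sketch. It rests on several assumptions: a recognizer has already converted each utterance to text, the wake word is the hypothetical default "computer", and the class name WakeWordGate is invented for this example. It does not attempt to verify that later input comes from the same speaker, which would require speaker identification.

```python
class WakeWordGate:
    """Pass on only the utterances that are addressed to the device."""

    def __init__(self, wake_word="computer"):
        self.wake_word = wake_word.lower()
        self.listening = False  # True after the wake word was spoken on its own

    def hear(self, transcript):
        """Return a command string, or None if the speech should be ignored."""
        words = transcript.lower().strip(" .!?").split()
        if not words:
            return None
        first = words[0].strip(",.!?")
        if first == self.wake_word:
            command = " ".join(words[1:])
            if command:
                self.listening = False
                return command
            self.listening = True  # wake word alone: wait for the next utterance
            return None
        if self.listening:
            self.listening = False
            return " ".join(words)
        return None  # background conversation


if __name__ == "__main__":
    gate = WakeWordGate()
    for heard in ["What's for dinner?", "Computer!", "Lights on.",
                  "Computer, find the weather"]:
        print(repr(heard), "->", gate.hear(heard))
```

Anything said without the wake word is discarded, which is one simple way to keep background conversation from being registered as input.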
Second, the VUI would have to work in concert with highly sophisticated software in order to accurately process and retrieve information or carry out an action according to the particular user's preferences. For instance, if Samantha prefers information from a particular newspaper, and if she prefers that the information be summarized in point-form, she might say, "Computer, find me some information about the flooding in southern China last night"; in response, a VUI that is familiar with her preferences would "find" facts about "flooding" in "southern China" from that source, convert them into point-form, and deliver them to her on screen and/or in voice form, complete with a citation. Therefore, accurate speech-recognition software, along with some degree of artificial intelligence on the part of the machine associated with the VUI, would be required.
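The preference-driven behaviour in this example can be sketched in a few lines. This is only an illustration under stated assumptions: the Preferences fields, the answer function, and the in-memory list of articles are all hypothetical, and simple keyword matching stands in for the far more capable language understanding a real system would need.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    source: str        # preferred publication (hypothetical field)
    point_form: bool   # whether to summarize as bullet points


def answer(topic_words, prefs, articles):
    """Select sentences on the topic from the preferred source and format them."""
    facts = [
        sentence
        for article in articles
        if article["source"] == prefs.source
        for sentence in article["sentences"]
        if all(word in sentence.lower() for word in topic_words)
    ]
    if not facts:
        return f"No matching information found in {prefs.source}."
    if prefs.point_form:
        body = "\n".join(f"- {fact}" for fact in facts)
    else:
        body = " ".join(facts)
    # Every answer carries a citation of the source it was drawn from.
    return f"{body}\n(Source: {prefs.source})"


if __name__ == "__main__":
    # Placeholder data invented for the example; not real news content.
    articles = [
        {"source": "Daily Example",
         "sentences": ["Flooding hit southern China overnight.",
                       "Markets were calm."]},
        {"source": "Other Paper",
         "sentences": ["Flooding in southern China continues."]},
    ]
    prefs = Preferences(source="Daily Example", point_form=True)
    print(answer(["flooding", "southern china"], prefs, articles))
```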
See also
- User interface
- User interface engineering
- Speech recognition
- List of speech recognition software
- Voice browser
External links
- Voice Interfaces: Assessing the Potential by Jakob Nielsen