Speech Recognition Grammar Specification
Speech Recognition Grammar Specification (SRGS) is a W3C standard for how speech recognition grammars are specified. A speech recognition grammar is a set of word patterns that tells a speech recognition system what to expect a human to say. For instance, if you call an auto attendant application, it will prompt you for the name of a person (with the expectation that your call will be transferred to that person's phone). It will then start up a speech recognizer, giving it a speech recognition grammar. This grammar contains the names of the people in the auto attendant's directory and a collection of sentence patterns that are the typical responses from callers to the prompt.

SRGS specifies two alternative but equivalent syntaxes, one based on XML and one using Augmented BNF format. In practice, the XML syntax is used more frequently.

If the speech recognizer returned just a string containing the actual words spoken by the user, the voice application would have to do the tedious job of extracting the semantic meaning from those words. For this reason, SRGS grammars can be decorated with tag elements which, when executed, build up the semantic result. SRGS does not specify the contents of the tag elements; this is done in a companion W3C standard, Semantic Interpretation for Speech Recognition (SISR). SISR is based on ECMAScript, and ECMAScript statements inside the SRGS tags build up an ECMAScript semantic result object that is easy for the voice application to process.
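
The difference between a raw word string and a semantic result can be illustrated with a small Python sketch. This is not SISR syntax (SISR tags contain ECMAScript); the dictionary keys, the "transfer" action, and the person identifiers are hypothetical values chosen only for illustration.

```python
PREFIX = "may I speak to "

# Hypothetical annotations: spoken name -> value a semantic tag might assign.
TAGGED_NAMES = {
    "Michel Tremblay": {"person": "mtremblay"},
    "André Roy": {"person": "aroy"},
    "Jose": {"person": "jose"},
}


def interpret(utterance):
    """Return a structured result, or None if the words match no pattern."""
    if not utterance.startswith(PREFIX):
        return None
    name = utterance[len(PREFIX):]
    tagged = TAGGED_NAMES.get(name)
    if tagged is None:
        return None
    # The application receives fields it can act on, not just the words.
    return {"action": "transfer", **tagged}


result = interpret("may I speak to Jose")
```

With a result of this kind, the voice application reads the fields directly instead of parsing the recognized words itself.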

Both SRGS and SISR are W3C Recommendations, the final stage of the W3C standards track. The W3C VoiceXML standard, which defines how voice dialogs are specified, depends heavily on SRGS and SISR.

Examples

Here is an example of the Augmented BNF form of SRGS, as it could be used in an auto attendant application:

#ABNF 1.0 ISO-8859-1;

// Default grammar language is US English
language en-US;

// Single language attachment to tokens
// Note that "fr-CA" (Canadian French) is applied to only
// the word "oui" because of precedence rules
$yes = yes | oui!fr-CA;

// Single language attachment to an expansion
$people1 = (Michel Tremblay | André Roy)!fr-CA;

// Handling language-specific pronunciations of the same word
// A capable speech recognizer will listen for Mexican Spanish and
// US English pronunciations.
$people2 = Jose!en-US | Jose!es-MX;

/**
* Multi-lingual input possible
* @example may I speak to André Roy
* @example may I speak to Jose
*/
public $request = may I speak to ($people1 | $people2);
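
To see which utterances the $request rule accepts, the rules can be written out by hand as Python data and expanded. This is a minimal sketch, not an ABNF parser: the GRAMMAR dictionary and the expand function are illustrative names, and language attachments are ignored because they affect pronunciation, not the word sequence.

```python
from itertools import product

# Each rule is a sequence of parts; each part is a list of alternatives.
# A string starting with "$" refers to another rule, as in the ABNF form.
GRAMMAR = {
    "people1": [["Michel Tremblay", "André Roy"]],
    "people2": [["Jose"]],  # "Jose" in en-US and es-MX is the same text
    "request": [["may I speak to"], ["$people1", "$people2"]],
}


def expand(rule):
    """Return every phrase the named rule can match, in grammar order."""
    phrases = []
    for choice in product(*GRAMMAR[rule]):
        # Replace each rule reference with the phrases it expands to.
        parts = [expand(p[1:]) if p.startswith("$") else [p] for p in choice]
        for combo in product(*parts):
            phrases.append(" ".join(combo))
    return phrases


for phrase in expand("request"):
    print(phrase)
```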

Here is the same SRGS example, using the XML form:

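<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0">

  <!-- Single language attachment to tokens -->
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item xml:lang="fr-CA">oui</item>
    </one-of>
  </rule>

  <!-- Single language attachment to an expansion -->
  <rule id="people1">
    <one-of xml:lang="fr-CA">
      <item>Michel Tremblay</item>
      <item>André Roy</item>
    </one-of>
  </rule>

  <!-- Handling language-specific pronunciations of the same word -->
  <rule id="people2">
    <one-of>
      <item xml:lang="en-US">Jose</item>
      <item xml:lang="es-MX">Jose</item>
    </one-of>
  </rule>

  <!-- Multi-lingual input possible -->
  <rule id="request" scope="public">
    <example>may I speak to André Roy</example>
    <example>may I speak to Jose</example>

    may I speak to
    <one-of>
      <item><ruleref uri="#people1"/></item>
      <item><ruleref uri="#people2"/></item>
    </one-of>
  </rule>
</grammar>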
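Because the XML form is ordinary XML, a grammar of this shape can be inspected with a general-purpose XML library. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The function name rule_alternatives and the shortened embedded grammar are illustrative, and only literal item text is handled (rule references are not followed).

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

GRAMMAR_XML = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0">
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item xml:lang="fr-CA">oui</item>
    </one-of>
  </rule>
  <rule id="people1">
    <one-of xml:lang="fr-CA">
      <item>Michel Tremblay</item>
      <item>André Roy</item>
    </one-of>
  </rule>
</grammar>
"""


def rule_alternatives(grammar_xml):
    """Map each rule id to a list of (text, language) alternatives."""
    root = ET.fromstring(grammar_xml)
    default_lang = root.get(XML_LANG)
    rules = {}
    for rule in root.findall(f"{{{SRGS_NS}}}rule"):
        alternatives = []
        for one_of in rule.findall(f"{{{SRGS_NS}}}one-of"):
            # xml:lang is inherited: item, then one-of, then grammar.
            group_lang = one_of.get(XML_LANG, default_lang)
            for item in one_of.findall(f"{{{SRGS_NS}}}item"):
                text = (item.text or "").strip()
                if text:
                    alternatives.append((text, item.get(XML_LANG, group_lang)))
        rules[rule.get("id")] = alternatives
    return rules


for rule_id, alternatives in rule_alternatives(GRAMMAR_XML).items():
    print(rule_id, alternatives)
```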

See also

  • SISR
  • VoiceXML
  • Pronunciation Lexicon Specification (PLS)
  • Natural Language Semantics Markup Language
  • JSGF

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.