Developing and Deploying VoiceXML Applications

Table of Contents

Developing VoiceXML Applications
Developing VoiceXML Applications
Using Document Type Definitions
Using Speech Recognition
Using DTMF Input
Extending VoiceXML with the Voice Browser Step
Placing Outbound Calls
Developing International Applications

Developing VoiceXML Applications


Voice eXtensible Markup Language (VoiceXML) is a web-based markup language for representing human-computer dialogs.

You can use the steps of the Cisco Customer Response Applications (CRA) Editor to design scripts that take advantage of the capabilities of VoiceXML.

This chapter describes how to use the capabilities of Cisco CRA 3.0 to develop VoiceXML applications. VoiceXML assumes a voice browser whose audio output includes computer-synthesized Text-To-Speech (TTS) and recorded speech, and whose input includes voice audio and DTMF (Dual Tone Multi-Frequency) digits.

The Cisco CRA Voice Browser currently supports VoiceXML 1.0 and a subset of the new VoiceXML 2.0 elements.


Note   Developing applications that include Automatic Speech Recognition (ASR) is generally more difficult than developing non-ASR applications or IVR (Interactive Voice Response) scripts. To ensure a good user experience, Cisco recommends engaging the services of a professional services company or consultant to develop or assist in developing most ASR applications.

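This chapter includes the following sections:

  • Developing VoiceXML Applications
  • Using Document Type Definitions
  • Using Speech Recognition
  • Using DTMF Input
  • Extending VoiceXML with the Voice Browser Step
  • Placing Outbound Calls
  • Developing International Applications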

Developing VoiceXML Applications

A VoiceXML script is a text-based XML (eXtensible Markup Language) document. You can use numerous tools to write VoiceXML, including simple text editors, server scripting languages, and third-party VoiceXML editors.

Example 12-1 shows a sample VoiceXML script.


Example 12-1   Sample VoiceXML Script
<?xml version="1.0"?>
<!DOCTYPE vxml SYSTEM "file:voicexml.dtd">
<vxml version="1.0">
<form>
<field name="name">
<prompt>Please say your name.</prompt>
<grammar>[Peter Paul Mary]</grammar>
</field>
<filled>
Your name is <value expr = "name" />
</filled>
</form>
</vxml>

Using Document Type Definitions

A document type definition (DTD) defines the rules that determine whether an XML document is valid. See "VoiceXML Implementation for Cisco Voice Browser" for the DTD defined for VoiceXML 1.0. You can instruct the parser to validate a document by referencing the DTD in the document type declaration.

The Cisco CRA Voice Browser includes a custom version of voicexml.dtd with some minor enhancements. The simplest way to use voicexml.dtd is to reference it with the URI (Uniform Resource Identifier) "file:voicexml.dtd". For example, the document in Example 12-1 references the DTD with the following line:

<!DOCTYPE vxml SYSTEM "file:voicexml.dtd">

You can also deploy the DTD with your documents on the document server and provide the URI of the file in the XML document type declaration.

Using a DTD is optional. In the development phase, it can help you catch syntax errors in VoiceXML documents. After you test the code and find no syntax errors, you may choose not to use a DTD in the production phase. Omitting it improves performance by eliminating both the parsing of the DTD file and the validation process.
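
As a minimal sketch of the document-server approach, the declaration below points at a copy of the DTD hosted alongside the VoiceXML documents. The host name docserver.example.com and the path /vxml/voicexml.dtd are hypothetical placeholders, not values defined by Cisco CRA.

```xml
<?xml version="1.0"?>
<!-- Hypothetical URI: replace it with the location of voicexml.dtd on your document server -->
<!DOCTYPE vxml SYSTEM "http://docserver.example.com/vxml/voicexml.dtd">
<vxml version="1.0">
  <form>
    <block>Welcome.</block>
  </form>
</vxml>
```

Removing the DOCTYPE line from a tested document is what skips DTD parsing and validation in production.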

Using Speech Recognition

VoiceXML accepts speech input, based on grammars configured to characterize the scope of recognition.

You define grammars using the <grammar> element.

Example 12-2 shows how to create an application that asks for and recognizes two words, "coffee" and "tea".


Example 12-2   Sample VoiceXML Script using Speech Recognition
<form>
<field name="choice">
Would you like some coffee or tea?
<!-- understand either "coffee" or "tea" -->
<grammar>
[coffee tea]
</grammar>
<filled>
Just a second.
Your <value expr="choice"/> is ready.
</filled>
</field>
</form>

CRA Voice Browser supports Nuance GSL (Grammar Specification Language) grammar, which is a powerful language for specifying speech input.

Example 12-3 extends Example 12-2 to understand additional words and phrases: "hot coffee", "coffee", "espresso", "hot tea", and "tea". It also uses GSL slot assignments, such as {<choice "coffee">}, to set the value returned for the choice field.

For example, when a caller says "espresso", the grammar returns the value "coffee".


Example 12-3   VoiceXML Script with Expanded Recognition
<form>
<field name="choice">
Would you like some coffee or tea?
<!-- understand "hot coffee", "coffee", "espresso", "hot tea", or "tea" -->
<grammar>
<![CDATA[
[
(?hot coffee) {<choice "coffee">}
espresso {<choice "coffee">}
(?hot tea) {<choice "tea">}
]
]]>
</grammar>
<filled>
Just a second.
Your <value expr="choice"/> is ready.
</filled>
</field>
</form>

In this example, the exchange might sound like this:

System: "Would you like some coffee or tea?"

Caller: "Hot tea."

System: "Just a second. Your tea is ready."


Note   As the <![CDATA[ . . . ]]> construct in Example 12-3 shows, GSL uses some characters, such as angle brackets, that have special meaning in XML. In such cases, enclose the GSL grammar between <![CDATA[ and ]]> to signify that it should be embedded as is, without XML processing.

Using DTMF Input

DTMF is a common form of caller input in IVR applications, and in many cases it is a good idea to design applications that give callers the choice to use either DTMF or speech input.

DTMF input is most commonly used for such purposes as menu navigation, getting a digit string (such as an account number) from the caller, and recognizing a digit pattern.

You can use one of the following methods to allow the script to determine when the DTMF input from the caller is complete:

  • The caller enters a specific termination key; for example, the "#" key.
  • A specified number of seconds passes without the caller entering a tone.
  • The caller enters a predefined number of tones.

You can use the following DTMF properties to specify when the sequence is complete:

  • termchar: The terminating DTMF character for DTMF input recognition. The default value is "#". An empty string means that no terminating DTMF character is defined.
  • termtimeout: The terminating timeout to use when recognizing DTMF input. The default value is "4s".
  • com.cisco.dtmf.termlength: The maximum number of DTMF tones that can be entered and interpreted within a single recognition. After the caller enters this number of tones, the result is returned to the Voice Browser. The default value is 32.
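
The following sketch shows one way the three properties might be combined in a single form. The property names come from the list above; the values, the field name accountNumber, and the prompt wording are illustrative assumptions only.

```xml
<form>
  <!-- Illustrative values: input ends on "*", after 3 seconds without a tone, or after 6 tones -->
  <property name="termchar" value="*"/>
  <property name="termtimeout" value="3s"/>
  <property name="com.cisco.dtmf.termlength" value="6"/>
  <field name="accountNumber" type="digits">
    Please enter your account number, then press star.
    <filled>
      Thank you.
    </filled>
  </field>
</form>
```

Whichever condition is met first completes the input, so a caller who enters all six digits does not need to press the terminating key.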

This section contains the following topics:

  • Using DTMF for Menu Navigation
  • Receiving Digit String Input
  • Using DTMF Grammar

Using DTMF for Menu Navigation

One of the most common uses of DTMF is to allow users to navigate a menu of choices. You can use the <menu> element to accomplish this navigation, as follows:

1. Use the <menu> element for the main prompt.

2. For each choice, insert a <choice> element.

3. Specify the DTMF key associated with the item and the next item to jump to when the item is chosen.

You can also insert a grammar in the <choice> element so that the user can select the item either by pressing the DTMF key or by speaking a phrase that matches the grammar.

For example, a banking application script gives the caller the following three choices and then instructs the caller to press "*" when finished:

  • For checking account balance, press 1.
  • For savings account balance, press 2.
  • For credit card balance, press 3.

In this example, the caller presses 3. The system informs the caller "Sorry, you don't have a credit card account with us," and again offers the caller the same three choices. The caller presses "*", the system says "Thank you. Good-bye!" and ends the call.

Example 12-4 shows the scripting for this example.


Example 12-4   Using DTMF
<menu id="main">
<prompt>
For checking account balance, press 1.
For savings account balance, press 2.
For credit card balance, press 3.
Press * when you are finished.
</prompt>
<!-- Just 1 digit is enough -->
<property name="com.cisco.dtmf.termlength" value="1"/>
<choice dtmf="1" next="CheckBalance.jsp">
checking ?account
</choice>
<choice dtmf="2" next="SavingBalance.jsp">
savings ?account
</choice>
<choice dtmf="3" next="#credit">
credit ?card
</choice>
<choice dtmf="*" next="#exit">
[finish goodbye bye]
</choice>
</menu>
<form id="credit">
<block>
Sorry, you don't have a credit card account with us.
<goto next="#main"/>
</block>
</form>
<form id="exit">
<block>
Thank you. Good-bye!
</block>
</form>

This example accepts both DTMF input and speech input. For example, instead of pressing 2, a caller can say "savings" to check the savings account balance.

In the example above, you use the "com.cisco.dtmf.termlength" property to limit the maximum number of digits to 1, so that the result is returned immediately after the caller inputs a single digit, without having to wait for a timeout or for the user to press a terminating key.

In addition to the <menu> and <choice> elements, VoiceXML also provides the <option> element, which you can use in a form for a similar purpose.
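
As a rough sketch of the <option> alternative, the field below offers two choices inside a form. The form id, field name, and option values are hypothetical, and the attributes shown follow the VoiceXML 1.0 specification; check "VoiceXML Implementation for Cisco Voice Browser" to confirm which of them the Voice Browser supports.

```xml
<form id="accountType">
  <field name="account">
    <prompt>
      For checking, press 1. For savings, press 2.
    </prompt>
    <!-- Each option pairs a DTMF key and a spoken phrase with the value stored in the field -->
    <option dtmf="1" value="checking">checking</option>
    <option dtmf="2" value="savings">savings</option>
    <filled>
      You chose <value expr="account"/>.
    </filled>
  </field>
</form>
```

Unlike <choice>, an <option> fills a field variable instead of jumping to another dialog, so the selection can be combined with other fields in the same form.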

Receiving Digit String Input

You can use the built-in "digits" grammar to accept digit strings such as credit card account numbers. Use the type attribute of the <field> element to select the built-in grammar.

Example 12-5 shows how to receive DTMF input for a credit card account number.


Example 12-5   Receiving Digit String Input
<form>
<field name="creditNumber" type="digits">
Please enter your credit card number.
Press the pound key when finished.
<filled>
Your credit card number is
<value expr="creditNumber" class="digits" mode="recorded"/>
<exit namelist="creditNumber"/>
</filled>
</field>
</form>

Using DTMF Grammar

The most flexible way to accept DTMF input is to use DTMF grammars that define how tones will be interpreted.

To include DTMF tones in your grammar, use the form "dtmf-n" for each of the tones 0 through 9 (for example, "dtmf-0"). You can also use "dtmf-star" for the star ("*") key, "dtmf-pound" for the pound ("#") key, and "dtmf-?" to indicate an unknown key.

Example 12-6 shows a sample grammar that lets the caller enter digits from the touch-tone pad. The script asks the caller to enter a PIN (Personal Identification Number) as DTMF digits and recognizes the key sequence 4-3-2-1. In practice, the server can generate the grammar from a user information database.


Example 12-6   Using DTMF Grammar
<form>
<field name="getPin">
Please enter your PIN.
<!-- The pin is 4321 -->
<grammar>
(dtmf-4 dtmf-3 dtmf-2 dtmf-1)
</grammar>
<!-- accept input after user entered 4 digits -->
<property name="com.cisco.dtmf.termlength" value="4"/>
<nomatch>
Your input is incorrect. Please enter again.
</nomatch>
<nomatch count="3">
Sorry, access is denied.
<exit/>
</nomatch>
</field>
<block>
Thank you. Please wait while we access your account.
</block>
</form>

In the example above, you use the "com.cisco.dtmf.termlength" property to limit the maximum number of digits to 4, so that the result is returned immediately after 4 digits are input, without having to wait for a timeout or for the caller to press a terminating key.
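
DTMF tokens can also appear as alternatives inside square brackets, as the words do in Example 12-2. The sketch below goes one step further and attaches a return value to each key with the slot syntax from Example 12-3; that combination is an assumption and has not been verified against the CRA Voice Browser. The field name answer and its values are hypothetical.

```xml
<form>
  <field name="answer">
    To confirm, press 1. To cancel, press star.
    <grammar>
      <![CDATA[
        [
          dtmf-1    {<answer "yes">}
          dtmf-star {<answer "cancel">}
        ]
      ]]>
    </grammar>
    <!-- A single key press is enough -->
    <property name="com.cisco.dtmf.termlength" value="1"/>
    <filled>
      <if cond="answer=='yes'">
        Your request is confirmed.
      <else/>
        Your request is cancelled.
      </if>
    </filled>
  </field>
</form>
```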

Extending VoiceXML with the Voice Browser Step

CRA Voice Browser is fully integrated with the CRA Engine. You can use scripts designed in the CRA Editor to extend VoiceXML applications by providing ICD (Integrated Contact Distribution) call control and resource management.

For example, you can use VoiceXML to build a speech dialog as a front end to collect information from the caller. You can then pass this information to the CRA script, and when the agent receives the call, the information collected by VoiceXML will be available.

You use the Voice Browser step in the Media palette of the CRA Editor to invoke a VoiceXML application.

You can use the bundled voicebrowser.aef script as an example for creating scripts that invoke VoiceXML. (You can create custom scripts to execute other steps in addition to VoiceXML.)

Figure 12-1 shows the voicebrowser.aef script as it appears in the Design pane of the CRA Editor.


Figure 12-1   Voicebrowser.aef Script


This script performs the following tasks:

1. Accepts the call

2. Starts Voice Browser

3. Terminates the call

4. Ends the script and releases system resources

The Voice Browser Step

You configure the Voice Browser step to access the URL by using the information stored in the string variable uri.

Figure 12-2 shows the configured General tab of the Voice Browser customizer window.


Figure 12-2   Voice Browser Customizer Window—Configured General Tab


The URL points to the VoiceXML document that the caller accesses through the Voice Browser step.

To pass information to the VoiceXML document server, use the Request Parameters tab.

Figure 12-3 shows the Request Parameters tab of the Voice Browser customizer window.


Figure 12-3   Voice Browser Customizer Window—Request Parameters Tab


Use the Return Parameters tab to receive return values from the application.

Figure 12-4 shows the configured Return Parameters tab of the Voice Browser customizer window.


Figure 12-4   Voice Browser Customizer Window—Configured Return Parameters Tab


For example, the script in Example 12-5 returns the credit card number collected.

To pass this value to the script, map the VoiceXML variable "creditNumber" to the script variable "creditNumber" in the Return Parameters tab.
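
The same mechanism can return more than one value. In the sketch below, which is modeled on Example 12-5, the second field name expiry is hypothetical, and each returned name is assumed to need its own mapping in the Return Parameters tab.

```xml
<form>
  <field name="creditNumber" type="digits">
    Please enter your credit card number, then press the pound key.
  </field>
  <field name="expiry" type="digits">
    Please enter the four-digit expiration date, then press the pound key.
  </field>
  <block>
    <!-- Runs after both fields are filled and returns both values to the CRA script -->
    <exit namelist="creditNumber expiry"/>
  </block>
</form>
```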

Placing Outbound Calls

You can also use VoiceXML to make outbound calls. The sample script outboundvoicebrowser.aef is provided with Cisco CRA for your convenience in building outbound VoiceXML applications.

Unlike inbound IVR applications, an outbound application first places a call and then executes the VoiceXML dialog. You can invoke the call by making an HTTP request to the CRA Engine, defining the destination phone number and VoiceXML URI as parameters.

Developing International Applications

Cisco CRA Voice Browser can generate TTS prompts and recognize speech in selected languages. In addition, Voice Browser localizes built-in grammars such as date and time. The script automatically activates the grammar for specific languages based on the language context of the call.

For a list of supported built-in grammars, see "Built-in Type Implementation" in "VoiceXML Implementation for Cisco Voice Browser."

You can select a language for an application in one of several ways:

  • Configure the language of the application in the CRA Administration web interface. This method is the most convenient for applications that use a single language. (For more information on language configuration, refer to the Cisco Customer Response Applications Administrator Guide.)
  • Use the Set Contact Info step in the CRA script before invoking the Voice Browser step to make language information available to the script.
  • Use the xml:lang attribute on the <vxml>, <grammar>, or <prompt> element. With this method, scripts can use multiple languages.
    • To specify the language for a VoiceXML document, use the xml:lang attribute in the <vxml> element.
    • To specify the language for an individual prompt or grammar, set the xml:lang attribute in the <prompt> or <grammar> element.

The examples below illustrate the use of the xml:lang attribute.

Example 12-7 is a main menu that asks users to select a language, prompting them in both English and Spanish. The xml:lang attributes in the <prompt> elements specify the language to use for each prompt.


Example 12-7   mainmenu.vxml
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="1.0">
<form>
<field name="language">
<!-- read in English -->
<prompt xml:lang="en">
For English, press 1.
</prompt>
<!-- read in Spanish -->
<prompt xml:lang="es-MX">
Para Español, oprima 2.
</prompt>
<grammar>[dtmf-1 dtmf-2]</grammar>
<filled>
<if cond="language=='1'">
<goto next="info_en.vxml"/>
<elseif cond="language=='2'"/>
<goto next="info_es.vxml"/>
</if>
</filled>
</field>
</form>
</vxml>

If the user selects Spanish, the script executes the document info_es.vxml, as shown in Example 12-8. The xml:lang attribute of the <vxml> element specifies the Spanish language for the entire document.


Example 12-8   info_es.vxml
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="1.0" xml:lang="es-MX">
<form>
<field name="q">
<prompt>
¿Desea escuchar las noticias o el tiempo?
</prompt>
<grammar>
[(?las noticias) (?el tiempo)]
</grammar>
<filled>
<submit next="getInfo.jsp"/>
</filled>
</field>
</form>
</vxml>

Note   When you create non-English XML files, you must accurately set the character encoding. XML uses Unicode (UTF-8) by default, but you can use other encoding methods. For example, many Western European language text editors use ISO-8859-1 (latin-1) encoding by default. In this case, you must set the encoding attribute of the XML declaration correctly, as shown in the example above.
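
For comparison, here is a minimal sketch of the same kind of document declared as UTF-8. It assumes that your text editor actually saves the file in UTF-8; the declaration must always match the encoding in which the file is stored.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="1.0" xml:lang="es-MX">
  <form>
    <block>
      <!-- The accented character below is stored as UTF-8 bytes, matching the declaration -->
      Para Español, oprima 2.
    </block>
  </form>
</vxml>
```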

Although you may specify xml:lang in <grammar>, note that CRA does not support recognition of multiple languages at the same time. If the <grammar> elements specify conflicting languages, the last one specified will take precedence.
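
A field that sets xml:lang on both its prompt and its grammar might look like the following sketch, which reuses the prompt and grammar from Example 12-8. Keeping every grammar in the field on the same language avoids the conflict described above.

```xml
<field name="q">
  <prompt xml:lang="es-MX">
    ¿Desea escuchar las noticias o el tiempo?
  </prompt>
  <!-- The grammar language matches the prompt language -->
  <grammar xml:lang="es-MX">
    [(?las noticias) (?el tiempo)]
  </grammar>
</field>
```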