Communication between the Cisco Unified CVP VXML Server and the
Voice Browser is based on request-response cycles using VoiceXML over HTTP. VoiceXML
documents are linked together by using the Uniform Resource Identifiers (URI),
a standardized technology to reference resources within a network. User input
is collected through forms, much like web forms in HTML: a form contains input
fields that the user fills in, and the values are sent back to the server.
Resources for the Voice Browser are located on the Unified CVP
VXML Server. These resources are VoiceXML files, digital audio, instructions for
speech recognition (grammars), and scripts. Every communication process between
the VoiceXML browser and the voice application must be initiated by the VoiceXML
browser as a request to the Unified CVP VXML Server. For this purpose, VoiceXML
files contain grammars that specify the expected words and phrases. A link
contains the URL that refers to the voice application. The browser connects to
that URL as soon as it detects a match between spoken input and one of the grammars.
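The relationship between links, grammars, and forms can be illustrated with a minimal sketch. The VoiceXML document below is a hypothetical example, not one generated by Unified CVP VXML Server; the host name, paths, and field name are assumptions. The Python function lists the URLs that the browser would request: the link target (followed when a spoken phrase matches the link grammar) and the submit target (where the form input is sent).

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical VoiceXML document; URLs and names are illustrative only.
SAMPLE_DOCUMENT = """
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <link next="http://vxml-server.example.com/app/main-menu">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             mode="voice" version="1.0" root="menu">
      <rule id="menu">
        <one-of>
          <item>main menu</item>
          <item>start over</item>
        </one-of>
      </rule>
    </grammar>
  </link>
  <form id="account">
    <field name="account_number" type="digits">
      <prompt>Please say your account number.</prompt>
    </field>
    <block>
      <submit next="http://vxml-server.example.com/app/lookup"
              namelist="account_number"/>
    </block>
  </form>
</vxml>
""".strip()


def outgoing_requests(document):
    """Return (element, url) pairs for every <link> and <submit> in document order."""
    root = ET.fromstring(document)
    wanted = {f"{{{VXML_NS}}}link": "link", f"{{{VXML_NS}}}submit": "submit"}
    return [
        (wanted[element.tag], element.get("next"))
        for element in root.iter()
        if element.tag in wanted
    ]


if __name__ == "__main__":
    for kind, url in outgoing_requests(SAMPLE_DOCUMENT):
        print(kind, url)
```

In both cases the request originates from the browser; the server only responds with the next VoiceXML document.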
Note
Cisco Unified CVP VXML Server is coresident with the Call Server and Media Server.
When gauging Unified CVP VXML Server performance, consider the
following key aspects:
Use of prerecorded audio versus Text-to-Speech (TTS)
Voice user-interface applications tend to use prerecorded audio
files wherever possible because recorded audio sounds more natural than TTS.
Choose a recording format that does not add unnecessarily to download time or
to the work of browser interpretation. Make recordings in 8-bit mu-law, 8 kHz
format.
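As a rough illustration of why the recording format matters for download time, the following sketch estimates the payload size of an 8-bit mu-law, 8 kHz prompt and the time to transfer it. The function names and the 512 kbit/s link speed are assumptions chosen for the example; file header bytes and protocol overhead are ignored.

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling
BITS_PER_SAMPLE = 8     # 8-bit mu-law, one byte per sample


def prompt_size_bytes(duration_seconds):
    """Approximate audio payload size, ignoring any file header."""
    return round(duration_seconds * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 8)


def download_seconds(size_bytes, link_kbps):
    """Idealized transfer time over a link of link_kbps kilobits per second."""
    return size_bytes * 8 / (link_kbps * 1000)


if __name__ == "__main__":
    size = prompt_size_bytes(10)          # a 10-second prompt
    print(size)                           # 80000 bytes
    print(download_seconds(size, 512))    # 1.25 seconds on a 512 kbit/s link
```

At this format each second of audio costs 8000 bytes, so a higher-fidelity recording (for example, 16-bit samples) would double the transfer for every prompt.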
Audio file caching
Make sure the voice gateway is set to cache audio content to prevent
delays from having to download files from the media source. For more details
about prompt management on supported gateways, see
Cisco IOS caching and streaming configuration.
Use of grammars
A voice application, like any user-centric application, is prone to
certain problems that might be discovered only through formal usability testing
or observation of the application in use. Poor speech recognition accuracy is
one type of problem common to voice applications, and a problem most often
caused by poor grammar implementation. When users mispronounce words or say
things that the grammar designer does not expect, the recognizer cannot match
their input against the grammar. Poorly designed grammars containing many
difficult-to-distinguish entries also result in many misrecognized inputs,
leading to decreased performance on the Unified CVP VXML Server. Grammar tuning
is the process of improving recognition accuracy by modifying a grammar based
on an analysis of its performance.
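One way to begin such an analysis is sketched below, assuming you have a log of test utterances in which each entry pairs the phrase the caller intended with the phrase the recognizer returned (or None for a no-match). The log format, the sample data, and the function name are hypothetical and are not produced by any Unified CVP tool.

```python
from collections import Counter

# Hypothetical test log: (intended phrase, recognized phrase or None).
SAMPLE_LOG = [
    ("billing", "billing"),
    ("billing", "building"),
    ("billing", "billing"),
    ("billing", "building"),
    ("sales", "sales"),
    ("sales", "sales"),
    ("sales", None),
    ("support", "support"),
]


def confusion_report(utterances):
    """Return per-entry accuracy and the confused pairs, most frequent first."""
    totals = Counter()
    correct = Counter()
    confusions = Counter()
    for expected, recognized in utterances:
        totals[expected] += 1
        if recognized == expected:
            correct[expected] += 1
        else:
            confusions[(expected, recognized)] += 1
    accuracy = {entry: correct[entry] / totals[entry] for entry in totals}
    return accuracy, confusions.most_common()


if __name__ == "__main__":
    accuracy, confusions = confusion_report(SAMPLE_LOG)
    for entry, rate in accuracy.items():
        print(f"{entry}: {rate:.0%}")
    for (expected, recognized), count in confusions:
        print(f"{expected!r} heard as {recognized!r}: {count}")
```

Entries with low accuracy, and pairs that are repeatedly confused with each other, are candidates for rewording or removal from the grammar.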
Multi-language support
Neither the Cisco IOS Voice Browser nor the Media Resource Control Protocol (MRCP) specification imposes restrictions on support for multiple languages. However, the automatic speech recognition (ASR) or TTS server might have restrictions. Check with your preferred ASR or TTS vendor about support for your languages before preparing a multilingual application.
You can dynamically change the ASR server value by setting the Cisco property com.cisco.asr-server in the VoiceXML script. This property overrides any value previously set by the VoiceXML script.
Cisco Unified Call Studio installation
Cisco Unified Call Studio is an Integrated Development Environment (IDE). As with any IDE, install Unified Call Studio in an environment that is conducive to development, such as workstations that are used for other software development or business analysis purposes. Because Unified Call Studio is Eclipse-based, many other development activities (such as writing Java programs or building object models) can be migrated to this tool so that developers and analysts have one common utility for most of their development needs.
Because the Unified Call Studio has not been tested with Microsoft Windows 2008 R2 server, Cisco does not support co-locating the Cisco Unified Call Studio with the Unified CVP VXML Server.