Audio File Support

This appendix describes the supported audio file formats, playout methods, and codecs for audio files that are played out by TCL or VoiceXML applications, or recorded by the gateway using the VoiceXML Voice Store and Forward feature.

For information on specifying the recording or playout location in the VoiceXML document, refer to the Cisco VoiceXML Programmer's Guide.

Audio File Formats

Cisco VoiceXML gateways support two standard audio formats for recording and playback: .au (audio/basic) and .wav (audio/wav). The format used to record an audio file is specified by the VoiceXML document at the time of the recording. If it is not defined in the VoiceXML document, the default format type is audio/basic. The gateway uses standard codec numbers for the codec format in the audio header whenever possible, including G.711 u-law, G.711 a-law, and G.726 (32k ADPCM); other codecs use a proprietary format mapping and may therefore not be recognized by third-party audio players. For a listing of the codec numbers used by the Cisco gateway, see the "Codec Mappings in Audio Recordings" section.

Recordings made by the gateway are open-ended and streamed in real time, so the .au and .wav file headers generated by the Cisco gateway are encoded with a data length of 0. All audio files recorded by the Cisco gateway can be played back by the gateway. However, because the standard .wav format does not allow for a 0 length in the header, some third-party audio players may not be able to play back .wav files encoded by the gateway. The .au file format allows a 0 length, but some audio players may not support these files either.

For example, the Cool Edit audio player can play 0-length .au and .wav files that are recorded by the gateway using the G.711 u-law codec. Other audio players and other codecs require the audio file header to be modified before playback. For example, the Windows Media Player audio player can play G.711 u-law and G.711 a-law audio files if the 0-length field is modified. The Cool Edit audio player can play G.711 a-law files if the header is corrected.

To enable an audio player to play back recordings made by the Cisco gateway, you must use a correction utility to modify specific fields in the audio file headers. For information about how to modify the file headers, see the "Data-Length Correction Utility For Audio File Headers" section.

Audio File Playout

TCL and VoiceXML applications can be configured for incoming POTS or VoIP call legs to play announcements to the user and to request user input (digits). The gateway can also play back audio recordings made by VoiceXML applications. Audio files can be played toward both the PSTN side and the IP side of the call leg.

Playout Methods

TCL and VoiceXML applications can play out audio files by using the following playout methods from different locations:

Memory

The entire audio file is loaded into the gateway's memory and then played out to the appropriate call leg as needed. Memory-based prompts can be loaded from an HTTP server, or from Flash memory, a TFTP server, or an FTP server. Audio files can also be recorded into memory using VoiceXML recording capabilities, then played back from memory or submitted to an external HTTP server to become permanent audio files.

The amount of memory available to store audio prompts on the gateway can be configured by using the ivr prompt memory command.


Note   Flash memory allows a limited number of entries, typically 32 on most platforms. For the specific Flash memory limits for your platform, refer to the platform-specific reference documentation listed in the "Additional References" section on page 1-12.

HTTP, Flash, TFTP, or FTP Streamed

The Cisco gateway can stream audio files from an external server to the appropriate call leg as needed. Loading an entire audio prompt into local memory before beginning playout can limit the length of audio prompts and impact memory resources. With streaming, pieces of a prompt are loaded into memory and then, if necessary, deleted after they are played to free up memory. The audio file is played out while it is being loaded into memory, with playback beginning as soon as a piece of the prompt is loaded.
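The difference between memory-based and streamed playout can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the gateway's implementation: the names `stream_pieces` and `play_streamed`, the 4096-byte piece size, and the `play` callback are all assumptions made for the example.

```python
import io
from typing import BinaryIO, Callable, Iterator


def stream_pieces(source: BinaryIO, piece_size: int = 4096) -> Iterator[bytes]:
    """Yield the prompt one piece at a time instead of loading it whole."""
    while True:
        piece = source.read(piece_size)
        if not piece:
            break
        yield piece


def play_streamed(source: BinaryIO, play: Callable[[bytes], None],
                  piece_size: int = 4096) -> int:
    """Play each piece as soon as it is read; return the largest piece held."""
    peak = 0
    for piece in stream_pieces(source, piece_size):
        peak = max(peak, len(piece))
        play(piece)  # playback starts before the rest of the file is read
        # After this iteration the piece is no longer referenced by the
        # player loop, so its memory can be reclaimed.
    return peak


# Demonstration with an in-memory stand-in for a 10,000-byte prompt.
# The list below keeps the pieces only so the result can be inspected.
played = []
prompt = io.BytesIO(bytes(10_000))
peak_bytes = play_streamed(prompt, played.append, piece_size=4096)
total_bytes = sum(len(p) for p in played)
```

In this sketch the player never holds more than one piece at a time, whereas memory-based playout would hold all 10,000 bytes before the first byte is played.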

Prompts can be streamed from an HTTP server or from Flash memory, a TFTP server, or an FTP server. With HTTP, each time a prompt is played, the HTTP caching system is checked, and the audio file is reloaded if necessary. The HTTP cached flag in the VoiceXML document specifies whether an audio file that is loaded into memory is safe to use again, and does not have to be deleted.

To enable the gateway to stream audio files during playout, use the ivr prompt streamed command. HTTP prompts are streamed by default, but HTTP streaming can be disabled by using the no ivr prompt streamed http command.

The amount of memory available to store audio prompts on the gateway can be configured by using the ivr prompt memory command. Performance is best when there is enough memory to store the entire audio file. If the ivr prompt memory command is set to a value smaller than the size of a streamed file, performance degrades.

RTSP Streamed

An external Real Time Streaming Protocol (RTSP) server can stream audio to the appropriate call leg as needed. RTSP is an application-level protocol that controls the on-demand delivery of real-time data, such as the delivery of audio streams from an audio server. By implementing an RTSP client on the Cisco VoIP gateway, a voice application running on the gateway can connect calls with audio streams from an external RTSP server. Prompts from RTSP servers are always streamed during playback. RTSP saves memory on the gateway because it is packet-based. Unlike HTTP or TFTP streaming, for example, RTSP streaming does not read any part of the audio file into RAM.


Note   When playing a series of short audio prompts, such as with dynamic prompts, non-streaming might be more efficient; streaming playout can cause noticeable delays and impact voice quality.

TTS Streamed

An external speech synthesizer using MRCP can generate prompts. Requests to synthesize speech from text strings or audio segments are sent to the media server, which responds with a real-time audio stream.

Dynamic Prompts

Dynamic prompts are formed by the underlying system assembling small audio files and playing them out in sequence. This provides simple TTS operations, like playing numbers, dollar amounts, dates, and time. For example, dynamic prompts can inform the caller of how much time is left in their debit account, as in:

"You have 15 minutes and 32 seconds of call time left in your account."

The above prompt is created using eight individual audio files. They are: youhave.au, 15.au, minutes.au, and.au, 30.au, 2.au, seconds.au, and leftinyouraccount.au. These audio files are assembled dynamically by the underlying system and played out as a single prompt.
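The assembly of this prompt can be sketched as follows. The Python below is a hypothetical illustration, not the logic used by Cisco IOS software: the function names are invented, and the rule that numbers up to 20 and whole tens each have their own audio file is an assumption inferred from the file names in the example (15.au, 30.au, 2.au).

```python
from typing import List


def number_files(n: int) -> List[str]:
    """Map a number from 0 to 99 to the audio files that speak it."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    # Assumption: 0-20 and whole tens are recorded as single files.
    if n <= 20 or n % 10 == 0:
        return [f"{n}.au"]
    # Otherwise play the tens file followed by the units file.
    return [f"{n - n % 10}.au", f"{n % 10}.au"]


def time_left_prompt(minutes: int, seconds: int) -> List[str]:
    """Build the ordered list of files for the remaining-time prompt."""
    return (["youhave.au"]
            + number_files(minutes) + ["minutes.au", "and.au"]
            + number_files(seconds) + ["seconds.au", "leftinyouraccount.au"])


files = time_left_prompt(15, 32)
```

For 15 minutes and 32 seconds, `files` contains the same eight file names, in the same order, as the list given above.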

The language and location of the audio files used for dynamic prompts can be specified in the TCL script or VoiceXML document, or these parameters can be configured on the Cisco gateway by using the call application voice language command and the call application voice set-location command.


Note   When playing a series of short audio prompts, such as with dynamic prompts, non-streaming might be more efficient; streaming playout can cause noticeable delays and impact voice quality.

TCL Language Modules for Dynamic Prompts

Each language uses a TCL language module. The TCL language module defines the list of TTS notations that the language supports. Cisco IOS software includes built-in language modules for English, Chinese, and Spanish. You can add support for new languages and new TTS notations by configuring a new TCL language module on the gateway.

The Cisco IOS infrastructure interfaces with the TCL language module to translate TTS notations supplied by the voice application into the specified language. Cisco IOS software translates TTS notations into the sequence of audio files according to the language structure. For example, English and French use different sequences for saying the date: the English language structure says the month first and then the day; the French language structure says the day first and then the month.
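The effect of language structure on the file sequence can be illustrated with a short sketch. It is written in Python for readability only; real language modules are TCL scripts, and the `DATE_ORDER` table, the `date_files` function, and the audio file names here are hypothetical.

```python
from typing import List

# Order in which each language speaks the parts of a date.
DATE_ORDER = {
    "en": ("month", "day"),   # English: month first, then day
    "fr": ("day", "month"),   # French: day first, then month
}


def date_files(language: str, month: str, day: int) -> List[str]:
    """Return the audio files for a date in the order the language uses."""
    parts = {"month": f"{month}.au", "day": f"{day}.au"}
    return [parts[field] for field in DATE_ORDER[language]]


english = date_files("en", "march", 4)
french = date_files("fr", "march", 4)
```

The same notation yields the month file first for English and the day file first for French, which is the kind of ordering decision a language module encapsulates.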


Note   Language modules are not used by external TTS servers; they are used by Cisco IOS software to assemble a list of dynamic prompts.

New TTS notations for the Cisco IOS built-in languages, such as playing dates and times of day, can also be configured. For example, if you configure a new English TCL language module, it overrides the built-in English TCL language module during the translation. When completed, any voice application can use the new notations, and the Cisco IOS infrastructure recognizes and plays the audio accordingly.


Note   TCL language modules are not TCL IVR scripts. They are pure TCL scripts and any system on the Cisco gateway (TCL IVR 1.0, 2.0, VoiceXML, MGCP) can use the configured language with little or no change to the Cisco IOS configuration.

For information on writing a new TCL language module, refer to the Cisco Pre-Paid Debitcard Multi-Language Programmer's Reference.

For information on configuring a new language module on the gateway, see the "Specifying a New Language Module for Dynamic Prompts" section.

Recording and Playback

The VoiceXML Voice Store and Forward feature provides streaming-based voice recording and playback for several media, including local memory, HTTP, ESMTP, and RTSP. It supports 14 different Cisco codecs and two standard audio file formats, .au and .wav.

VoiceXML Recording Locations

The VoiceXML Voice Store and Forward feature supports audio recording and playback using local memory on the Cisco gateway or a choice of external media server locations:

  • Local memory—Recording and playback are supported for storing and retrieving audio files. Audio recordings can also be submitted to HTTP servers for permanent storage.

  • ESMTP—Recording is supported by directly streaming audio to an ESMTP server as an e-mail attachment. Playout directly from the ESMTP server is not supported.

  • HTTP—Recording is supported by directly streaming audio to an HTTP server using the chunked transfer-encoding method; playback is supported using streaming and non-streaming methods.

  • RTSP—Recording and playout are supported by directly streaming audio to and from an RTSP server.

  • TFTP—Playout is supported by retrieving the audio file from a TFTP server, using streaming or non-streaming methods. Recording audio to a TFTP server is not supported.

The URL of the recording destination is specified in the VoiceXML document by using the Cisco property cisco-dest. For more information, see the Cisco VoiceXML Programmer's Guide.
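The chunked transfer-encoding method mentioned for HTTP recording lets a sender transmit a body whose total length is not known in advance, which suits an open-ended recording. The following Python sketch shows the standard HTTP/1.1 chunk framing only; the function name `chunked_body` is an assumption, and the sketch does not represent the gateway's own code.

```python
from typing import Iterable, Iterator


def chunked_body(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame audio pieces as HTTP/1.1 chunks.

    Each chunk is the piece length in hexadecimal, CRLF, the piece itself,
    and CRLF. A zero-length chunk marks the end of the body.
    """
    for piece in pieces:
        if piece:  # an empty piece would be mistaken for the terminator
            yield b"%X\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"


# Two pieces of "audio" arriving one after the other.
body = b"".join(chunked_body([b"abc", b"defgh"]))
```

Because each chunk carries its own length, the sender never has to state the size of the whole recording before it ends.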

Codec Support for Audio Recording

Table A-1 shows the codecs that are supported for audio recording and playback, by platform and minimum required Cisco IOS release.


Note   All codecs listed are supported for H.323 and SIP unless otherwise noted.


Table A-1: Codec Support for Audio Recording by Platform and Minimum Cisco IOS Release

Codec                       Cisco IOS Value              Cisco 3600 Series  Cisco AS5300   Cisco AS5350   Cisco AS5400
G.711 a-law                 g711alaw                     12.2(11)T          12.2(11)T      12.2(11)T      12.2(11)T
G.711 u-law                 g711ulaw                     12.2(11)T          12.2(2)XB      12.2(2)XB      12.2(2)XB
G.723.1 Annex-A (5.3 kbps)  g723ar53                     12.2(11)T          12.2(11)T      12.2(11)T      12.2(11)T
G.723.1 Annex-A (6.3 kbps)  g723ar63                     12.2(11)T          12.2(11)T      12.2(11)T      12.2(11)T
G.723.1 (5.3 kbps)          g723r53                      12.2(11)T          12.2(2)XB [1]  12.2(2)XB [1]  12.2(2)XB [1]
G.723.1 (6.3 kbps)          g723r63                      12.2(11)T          12.2(2)XB      12.2(2)XB [1]  12.2(2)XB [1]
G.726 (16 kbps) [2]         g726r16                      12.2(11)T          12.2(11)T [3]  12.2(11)T [3]  12.2(11)T [3]
G.726 (24 kbps) [2]         g726r24                      12.2(11)T          12.2(11)T      12.2(11)T [3]  12.2(11)T [3]
G.726 (32 kbps) [2]         g726r32                      12.2(11)T          12.2(11)T      12.2(11)T [3]  12.2(11)T [3]
G.728 (16 kbps)             g728                         12.2(11)T          12.2(11)T      Not supported  Not supported
G.729 (8 kbps) [2]          g729r8 (high complexity)     12.2(11)T          12.2(11)T      12.2(11)T      12.2(11)T
G.729 Annex-A (8 kbps)      g729r8 (medium complexity)   12.2(11)T          Not supported  Not supported  Not supported
G.729 Annex-B (8 kbps) [2]  g729br8 (high complexity)    12.2(11)T          12.2(11)T      12.2(11)T      12.2(11)T
G.729A Annex-B (8 kbps)     g729br8 (medium complexity)  12.2(11)T          Not supported  Not supported  Not supported
GSM EFR                     gsmefr                       12.2(11)T          12.2(11)T      Not supported  Not supported
GSM FR                      gsmfr                        12.2(11)T          12.2(2)XB      12.2(11)T [3]  12.2(11)T [3]

[1] This codec is supported only for H.323 on this platform in Cisco IOS Release 12.2(2)XB; it is supported for SIP in Cisco IOS Release 12.2(11)T.
[2] This codec is supported only for audio playback in Cisco IOS Release 12.2(2)XB.
[3] This codec is not supported for SIP on this platform.



Note
  • If the codec for an audio recording is not specified in the VoiceXML document, the default codec used for the recording is G.711 u-law.
  • For recording and playback over an IP call leg, the codec negotiated between the originating and terminating ends must match the codec specified in the VoiceXML document. If the codec specified for the audio file is different than the codec negotiated for the call, the recording or playback fails and an error is generated.

  • To determine specific codec support for your platform and Cisco IOS release, use the codec command in dial-peer configuration mode.





Codec Mappings in Audio Recordings

Table A-2 lists the codec number that is mapped to each codec in the audio file header, for .au format and .wav format files.


Table A-2: Codec Numbers Used in Audio File Headers

.AU Number  .AU Codec          .WAV Number  .WAV Codec
1           PCMULAW            1            MS_PCM
23          32K_ADPCM          2            MS_ADPCM
27          PCMALAW            6            G711_ALAW
39          CISCO_G729         7            G711_MULAW
42          CISCO_G729_b       100          32K_ADPCM
44          CISCO_GSMFR        5339         CISCO_G729
45          CISCO_GSMEFR       5342         CISCO_G729_b
47          CISCO_G723_1r53    5344         CISCO_GSMFR
48          CISCO_G723_1r63    5345         CISCO_GSMEFR
49          CISCO_G723_1ar53   5347         CISCO_G723_1r53
50          CISCO_G723_1ar63   5348         CISCO_G723_1r63
51          CISCO_G726_r16     5349         CISCO_G723_1ar53
52          CISCO_G726_r24     5350         CISCO_G723_1ar63
53          CISCO_G726_r32     5351         CISCO_G726_r16
55          CISCO_G728         5352         CISCO_G726_r24
-           -                  5353         CISCO_G726_r32
-           -                  5355         CISCO_G728
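A tool that inspects gateway recordings could hold the Table A-2 mappings in a pair of lookup tables, as in the Python sketch below. The names `AU_CODECS`, `WAV_CODECS`, `codec_name`, and `au_codec_number` are hypothetical. Reading the codec number from offset 12 of an .au header assumes the conventional .au header layout (magic number, header size, data size, encoding), which this appendix does not spell out.

```python
import struct

# Codec numbers from Table A-2, keyed by the number stored in the header.
AU_CODECS = {
    1: "PCMULAW", 23: "32K_ADPCM", 27: "PCMALAW",
    39: "CISCO_G729", 42: "CISCO_G729_b",
    44: "CISCO_GSMFR", 45: "CISCO_GSMEFR",
    47: "CISCO_G723_1r53", 48: "CISCO_G723_1r63",
    49: "CISCO_G723_1ar53", 50: "CISCO_G723_1ar63",
    51: "CISCO_G726_r16", 52: "CISCO_G726_r24", 53: "CISCO_G726_r32",
    55: "CISCO_G728",
}
WAV_CODECS = {
    1: "MS_PCM", 2: "MS_ADPCM", 6: "G711_ALAW", 7: "G711_MULAW",
    100: "32K_ADPCM",
    5339: "CISCO_G729", 5342: "CISCO_G729_b",
    5344: "CISCO_GSMFR", 5345: "CISCO_GSMEFR",
    5347: "CISCO_G723_1r53", 5348: "CISCO_G723_1r63",
    5349: "CISCO_G723_1ar53", 5350: "CISCO_G723_1ar63",
    5351: "CISCO_G726_r16", 5352: "CISCO_G726_r24", 5353: "CISCO_G726_r32",
    5355: "CISCO_G728",
}


def codec_name(file_format: str, number: int) -> str:
    """Look up a codec name for 'au' or 'wav'; 'unknown' if not listed."""
    table = {"au": AU_CODECS, "wav": WAV_CODECS}[file_format]
    return table.get(number, "unknown")


def au_codec_number(header: bytes) -> int:
    """Read the encoding field (assumed big-endian 32-bit at offset 12)."""
    return struct.unpack_from(">I", header, 12)[0]


# A made-up 24-byte .au header declaring codec number 27.
sample_header = struct.pack(">4sIIIII", b".snd", 24, 0, 27, 8000, 1)
sample_codec = codec_name("au", au_codec_number(sample_header))
```

A result of "unknown" for a number outside the table is a choice made for this sketch; a third-party player that meets one of the Cisco-specific numbers may simply refuse the file.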



Data-Length Correction Utility For Audio File Headers

Because the recording duration is unknown at the beginning of an HTTP or ESMTP streaming session, the .au and .wav file headers generated by the Cisco gateway are encoded with a data length of 0 when recording to an HTTP or ESMTP server; the gateway correctly decodes 0-length files during playback.

The standard .wav format, however, does not allow for a 0-length in the header, so third-party audio players may not be able to play back .wav files encoded by the gateway. Although the .au format allows files with a 0-length, some audio players may not support these files either.

To enable an audio player to play back recordings made by the Cisco gateway to an HTTP or ESMTP server, you must use a correction utility to modify fields in the audio file headers. The corrections that must be made to the .au and .wav header fields are described below.


Note
  • Recordings made to an RTSP server, or to local memory and submitted to HTTP, use the correct data length in the headers and do not require a correction utility.
  • The Cool Edit audio player recognizes 0-length .au and .wav files recorded by the gateway using G.711 u-law, so no changes to those files are required. Use the correction utility to enable the Cool Edit audio player to play back .au and .wav files that use G.711 a-law.

  • Windows Media Player and Vovida RTSP player require the correction utility to play back .au and .wav files using G.711 u-law, or .wav files using G.711 a-law.

  • Audio files recorded by third-party audio players might not completely follow the .au or .wav file format standards. Cisco reserves the right not to support non-standard .au or .wav file formats.

  • Concatenating two audio files without first correcting the data length fields and the data offsets can result in files with invalid data lengths. Cisco reserves the right not to support invalid data-length audio files.


.AU File Format Correction

One field in the .au header requires correcting: the data_size field, located at offset 8 from the beginning of the .au file. The correction utility must find the total size of the source .au file, subtract the .au header size stored at offset 4 from this file size, and insert the result into the data_size field at the offset 8 position.

The Cisco gateway puts a value of 24 in offset 4 and a value of 0 in offset 8, but this is not guaranteed. The correction utility should follow the correction logic outlined above.


Note   The .au header fields are in big-endian format; byte-swapping is required for a correction utility running on little-endian platforms.
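The .au correction logic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a Cisco-supplied utility: the function name `correct_au_header` is hypothetical, and both header fields are assumed to be 32-bit unsigned integers. The explicit `>` in the format string selects big-endian byte order, so the same code gives the same result on little-endian and big-endian hosts.

```python
import struct


def correct_au_header(data: bytes) -> bytes:
    """Return a copy of an .au file with the data_size field filled in."""
    if len(data) < 12:
        raise ValueError("file is too short to contain an .au header")
    # Header size is read from offset 4 rather than assumed to be 24.
    header_size = struct.unpack_from(">I", data, 4)[0]
    data_size = len(data) - header_size
    if data_size < 0:
        raise ValueError("header size exceeds the file size")
    fixed = bytearray(data)
    struct.pack_into(">I", fixed, 8, data_size)  # data_size at offset 8
    return bytes(fixed)


# A made-up recording: 24-byte header with data_size 0, then 160 audio bytes.
recorded = struct.pack(">4sIIIII", b".snd", 24, 0, 1, 8000, 1) + bytes(160)
corrected = correct_au_header(recorded)
corrected_size = struct.unpack_from(">I", corrected, 8)[0]
```

Only the four bytes at offset 8 change; the audio data and the rest of the header are copied through untouched.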

.WAV File Format Corrections

Three fields in the .wav header require correcting:

  • total_size_minus_8 field—Located at offset 4 from the beginning of the .wav file. The correction utility must find the total size of the source .wav file, subtract 8 from this file size, and insert the result into this field at the offset 4 position.

  • chunk_data_length field—The correction utility must find the total size of the source .wav file, and subtract one of these values from the file size:

    • 90 bytes for 32k ADPCM (G.726) codec

    • 56 bytes for all other codecs

    then insert the result into the field at one of these offset positions:

    • offset 86 for 32k ADPCM (G.726) codec

    • offset 52 for all other codecs

  • total_sample_blocks field—The correction utility must insert the total number of sample blocks at one of these offset positions:

    • offset 74 for 32k ADPCM (G.726) codec

    • offset 40 for all other codecs

The Cisco gateway puts a value of 0 in offset 4 and a value of 0 in offset 40, but this is not guaranteed. The correction utility should follow the correction logic outlined above.


Note   The .wav header fields are in little-endian format; byte-swapping is required for a correction utility running on big-endian platforms.
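The first two .wav corrections can be sketched in Python as shown below. The function name `correct_wav_header` and the `is_32k_adpcm` flag are hypothetical, and both fields are assumed to be 32-bit unsigned integers. The sketch leaves the total_sample_blocks field untouched because this appendix does not state how that count is derived. The explicit `<` in the format string selects little-endian byte order regardless of the host platform.

```python
import struct

# (bytes subtracted from the file size, offset of chunk_data_length)
ADPCM_LAYOUT = (90, 86)     # 32k ADPCM (G.726)
DEFAULT_LAYOUT = (56, 52)   # all other codecs


def correct_wav_header(data: bytes, is_32k_adpcm: bool = False) -> bytes:
    """Return a copy of a .wav file with its two length fields filled in."""
    header_bytes, length_offset = (
        ADPCM_LAYOUT if is_32k_adpcm else DEFAULT_LAYOUT)
    if len(data) < header_bytes:
        raise ValueError("file is shorter than the expected header")
    fixed = bytearray(data)
    # total_size_minus_8 at offset 4
    struct.pack_into("<I", fixed, 4, len(data) - 8)
    # chunk_data_length at offset 52 (or 86 for 32k ADPCM)
    struct.pack_into("<I", fixed, length_offset, len(data) - header_bytes)
    return bytes(fixed)


# A made-up recording: 56 header bytes (all zero) and 100 audio bytes.
recorded = bytes(56) + bytes(100)
corrected = correct_wav_header(recorded)
total_minus_8 = struct.unpack_from("<I", corrected, 4)[0]
chunk_length = struct.unpack_from("<I", corrected, 52)[0]
```

A complete utility would also need to choose the layout from the codec number in the header and write the total_sample_blocks field.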