This white paper defines the Customer Virtual Assistant feature and outlines how it functions in Release 12.5 of Cisco Unified Customer Voice Portal and Cisco Virtualized Voice Browser, both part of the Cisco Contact Centre portfolio. It covers high-level aspects of designing an end-to-end Interactive Voice Response (IVR) solution and considers various technical aspects as well as vendor offerings. This paper is intended for anyone interested in learning more about the Cisco® Customer Virtual Assistant feature and its deployment.
The Cisco products, services, or features identified in this document may not yet be available or may not be available in all areas and may be subject to change without notice. This is a Cisco document and must not be altered when distributed. For current updates, raise a case with Cisco or reach out to the Cisco product team directly.
Customer Virtual Assistant (CVA) is a feature that provides a business’s customers with a human-like interactive interface, which is simpler and more intuitive than traditional IVR experiences. For enterprises, it is an opportunity to significantly improve the effectiveness of traditional IVR-based systems, with lower IVR handling time and a significant reduction in agent transfers.
Along with the CVA feature, built-in tools have been developed to make the Cisco Unified Customer Voice Portal (CVP) solution simpler, more effective, and more powerful for defining business outcomes based on vision, requirements, and customer experience. Developers can explore a wide range of tools in Call Studio for customization and flexibility when developing a conversational experience for their end customers.
The Customer Virtual Assistant (CVA) feature enables an IVR platform to integrate with cloud-based speech services and transforms the way customers interact with self-service systems today. End-customer interactions are more human-like and enable customers to express their issues more effectively, with detailed context. An Artificial Intelligence (AI) engine interprets the intent and processes it directly in the intended node, rather than traversing multiple nodes as in a traditional IVR. Due to the elaborate context available via speech media and intent identification, the IVR application can handle a vast number of scenarios and respond appropriately, thereby reducing the calls directed toward actual agents.
The effectiveness and efficacy for CVA-based IVR can easily be evaluated using a stock report under the reporting server. Preliminary results shared by different customers for their CVA-based IVR reflect that the average traversal time and agent transfers have reduced significantly compared to their traditional IVR. Improved efficiencies for Average Traversal Time (ATT) and reduced agent transfers increase the effectiveness of end agents, so they can focus on highly complex issues rather than generic or medium-level complexity issues.
Customers migrating to CVA-based IVR can plan their migration in a phased manner, without fully impacting their traditional IVR business logic. In the business logic of their traditional IVR, the most widely hit node can be identified and converted to a CVA node, whereas the remaining traditional business logic can continue to operate as is. Alternatively, a customer can plan to automate their complete business logic on CVA-based architecture. More information on speech interfaces supported by CVA-based applications is covered in the next section.
CVA-based IVR leverages cloud-based speech services from Release 12.5 of Cisco Unified Customer Voice Portal and Virtualized Voice Browser. In extending to cloud-based services, considerable effort has gone into optimizing and tuning the CVP solution to interoperate with them. Even though cloud vendors can add services at any point, it is strongly recommended to choose one or more of the following services, which have been certified for production deployments. For a complete CVA experience, an IVR application leverages all three of these services to process each incoming customer call.
Speech services leveraged by CVA:
● Speech to text - Integrate cloud-based ASR services with your application for speech recognition operations. CVA currently supports the Google Speech to Text service.
● Speech to intent - CVA provides the capability of identifying the intent of customer utterances by processing the text received from speech-to-text operations. This cloud-based service is also known generically as Natural Language Understanding (NLU) and is provided by the Google Dialogflow service.
● Text To Speech (TTS) - Integrate cloud-based TTS services in your application for speech synthesis operations. This cloud-based service is provided by the Google Text to Speech service.
For an overall superior conversational experience, it is highly recommended that all these services be leveraged. Customers that plan to migrate in a phased manner should prioritize these services based on their unique business requirements.
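The way the three services chain together for a single caller turn can be sketched as follows. This is a minimal illustration under stated assumptions, not CVA code: `speech_to_text`, `text_to_intent`, and `text_to_speech` are hypothetical stand-ins for the cloud services, the intent names are invented, and plain text is used in place of audio.

```python
from dataclasses import dataclass


@dataclass
class TurnResult:
    transcript: str
    intent: str
    prompt_audio: bytes


def speech_to_text(audio: bytes) -> str:
    # Stand-in for a cloud ASR call; here the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def text_to_intent(transcript: str) -> str:
    # Stand-in for a cloud NLU call; a keyword lookup replaces the AI engine.
    lowered = transcript.lower()
    if "balance" in lowered:
        return "check_balance"
    if "transfer" in lowered:
        return "transfer_money"
    return "fallback"


def text_to_speech(text: str) -> bytes:
    # Stand-in for a cloud TTS call; returns the text as bytes instead of audio.
    return text.encode("utf-8")


def handle_turn(audio: bytes) -> TurnResult:
    # One caller turn: recognize speech, identify intent, synthesize a reply.
    transcript = speech_to_text(audio)
    intent = text_to_intent(transcript)
    prompt = text_to_speech(f"Handling intent: {intent}")
    return TurnResult(transcript, intent, prompt)
```

A phased migration could replace any one of the three stand-ins with an on-premises equivalent without changing `handle_turn`.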
While planning for Google Dialogflow hosted services, key points to consider are:
4.1. Network bandwidth per call
Dialogflow supports both 8-bit and 16-bit G711ulaw codec for ASR and TTS services, whereas Cisco’s end-to-end collaboration voice infrastructure operates under 8-bit codec support for G711ulaw or G711alaw.
For a normal call with the G711ulaw (8-bit) codec configured, bandwidth is consumed both when media is sent toward the hosted services (uplink) and when media is received from Google Dialogflow (downlink). For more information on bandwidth consumption, visit: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/pcce/pcce_12_5_1/design/guide/pcce_b_soldg-for-packaged-cce-12_5.pdf (Section - CVA).
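As a rough planning aid, the raw audio payload per call can be estimated from the sample rate and sample size. The sketch below assumes the standard G.711 sampling rate of 8 kHz, treats the 16-bit case as 16-bit samples at that same rate, and ignores gRPC, HTTP, and TLS overhead, so it gives a lower bound only. The function names are illustrative; the design guide remains the source for official figures.

```python
def audio_payload_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw audio payload rate for one direction of one call, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000


G711_SAMPLE_RATE_HZ = 8000  # G.711 samples narrowband audio at 8 kHz

per_direction_8bit = audio_payload_kbps(G711_SAMPLE_RATE_HZ, 8)    # 64 kbit/s
per_direction_16bit = audio_payload_kbps(G711_SAMPLE_RATE_HZ, 16)  # 128 kbit/s


def payload_for_calls(calls: int, kbps_per_direction: float) -> float:
    """Uplink plus downlink payload for concurrent calls, in kbit/s.

    Assumes both directions stream continuously, which overstates the
    downlink whenever no prompt is being played.
    """
    return calls * kbps_per_direction * 2
```

For example, `payload_for_calls(100, per_direction_8bit)` gives 12,800 kbit/s of raw audio for 100 concurrent calls before any protocol overhead is added.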
4.2. Secure connectivity (TLS handshake)
Traffic relayed from the VVB gRPC interface to Google Dialogflow is secured in both directions using TLS authentication. No certificates need to be exchanged, because the JSON key acts as a token between nodes, providing mutual identity and secure connectivity.
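Because the JSON key carries the identity used for this connection, it can be worth sanity-checking a downloaded key before attaching it to an application. The sketch below is illustrative only: it checks for a handful of fields that a Google Cloud service account key file normally contains, and the helper names are hypothetical. It does not contact Google or verify that the key is still active.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(key_text: str) -> list[str]:
    """Return the required fields that are absent or empty in a JSON key."""
    data = json.loads(key_text)
    return [name for name in REQUIRED_FIELDS if not data.get(name)]


def looks_like_service_account_key(key_text: str) -> bool:
    """True when the key has the expected type and no required field is missing."""
    data = json.loads(key_text)
    return data.get("type") == "service_account" and not missing_key_fields(key_text)
```

The key contains a private key, so a check like this should report only field names and never log the key contents.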
4.3. Network ports and firewall
All requests to Google travel via the Google SDK (packaged on VVB). The protocol between VVB and Google is gRPC, which works over HTTP. There is no new requirement to open any network port in order to make the CVA feature work.
Therefore, if a customer is using a firewall in a network, no specific port (like Session Initiation Protocol [SIP]) needs to be opened for CVA traffic. However, ensure HTTP traffic is allowed between VVB and Google Dialogflow.
4.4. HTTP proxy
For CVA, each VVB would require reachability to the Google Dialogflow server farm. Providing Internet access for each device with a global IP address is generally not feasible for every browser device in the contact center. Therefore, it is highly recommended to use HTTP Proxy instead of routing HTTP traffic originating from each VVB to the Google Dialogflow server farm.
HTTP Proxy helps in hiding internal IP addresses and provides more security for traffic going to the Internet. Moreover, it removes the dependency to configure each VVB with a global IP address and have it operate with other devices on internal IP addresses.
Where direct Internet access has to be provided to each VVB configured for CVA, it is recommended to place these VVBs in a DMZ.
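The difference between direct and proxied access can be expressed as the set of outbound flows that a firewall has to permit. The following sketch is a planning aid under stated assumptions: the endpoint `dialogflow.googleapis.com` on port 443 is assumed for the cloud side, the proxy host name and port are hypothetical, and other Google speech endpoints may also be needed depending on the services in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    port: int


# Assumption: the public Dialogflow API endpoint, reached over TLS on port 443.
DIALOGFLOW_HOST = "dialogflow.googleapis.com"
DIALOGFLOW_PORT = 443


def required_flows(vvb_hosts: list[str],
                   proxy: Optional[tuple[str, int]] = None) -> list[Flow]:
    """List the outbound flows a firewall has to allow for CVA traffic."""
    if proxy is None:
        # Direct Internet access: every VVB talks to the cloud itself.
        return [Flow(vvb, DIALOGFLOW_HOST, DIALOGFLOW_PORT) for vvb in vvb_hosts]
    proxy_host, proxy_port = proxy
    # Proxied access: VVBs reach only the proxy; the proxy reaches the cloud.
    flows = [Flow(vvb, proxy_host, proxy_port) for vvb in vvb_hosts]
    flows.append(Flow(proxy_host, DIALOGFLOW_HOST, DIALOGFLOW_PORT))
    return flows
```

With a proxy, only one host needs a route to the Internet regardless of how many VVBs are deployed, which is the reason the proxy approach is recommended above.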
Architecture - While configured for Customer Virtual Assistant functionality, Unified Customer Voice Portal (CVP) as a platform performs one or more operations—such as bootstrapping, licensing, looping, handling exit logic, context management, session management, business logic, or fulfilment—based on how a business logic application is configured.
More details on use cases and sample scripts are covered under Sections 6 and 8. A detailed view on CVA architecture for Cisco Packaged Contact Centre Enterprise (PCCE) can be found at: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/pcce/pcce_12_5_1/design/guide/pcce_b_soldg-for-packaged-cce-12_5.pdf (Section - CVA Call Flows and Architecture).
Figure 1 illustrates self-service from a Google Cloud BoT and agent services from a premises-based Contact Centre.
Figure 1. CVA architecture with Google CCAI
5.1 Customer Virtual Assistant (CVA) use cases
The CVA feature can be configured easily to meet business vision and requirements and to meet end objectives. It is imperative that core use cases or deployment models be understood before designing an overall solution.
5.1.1. Google-based IVR logic
Hosted IVR deployment is most suited for customers planning to migrate their IVR infrastructure into a cloud. Under hosted IVR deployment, only IVR business logic resides in the cloud, whereas agents will be registered to on-premises infrastructure. To get started with hosted deployment, a customer needs to develop IVR business logic and deploy it in a cloud such as the Google Dialogflow engine. Because the entire business logic will be driven from a cloud, it is strongly recommended to deploy and maintain a CRM database in the cloud, with the least delay possible between the hosted IVR and the CRM.
Note: For a CRM or custom CRM system to be migrated to cloud, check for its support with Google or relevant documentation on Dialogflow support for CRM under a cloud infrastructure.
Once hosted IVR is deployed, core signaling and media processing will take place in the cloud, and the CVP and VVB solutions will be in bridged mode for streaming media to the cloud. Once IVR handling is complete and agent engagement is required, call control will be transferred back to CVP for further processing of the call and queue treatment (see Figure 2).
Figure 2. Flow diagram for hosted IVR
For a typical incoming call landing on Cisco Unified Border Element (CUBE) or Voice Gateway, signaling will be sent to CVP once the call is connected. The business logic then invokes the CVA feature, and RTP media from VVB is converted and streamed to the cloud in real time. Once the stream is received at Dialogflow, recognition will take place and the NLU service is engaged to identify intent. Intent identification and processing will happen based on the BoT created in the cloud.
Based on the intents matched, Dialogflow engages all three speech services according to customer input. Flow control will remain with Dialogflow unless the customer requests an agent transfer or the call is disconnected.
Once agent transfer or call disconnection is requested by the customer, processing control will return to the CVP solution and premises logic will take over. Until this state is reached, the CVP/VVB solution works in bridged mode to relay media between the end customer and cloud services.
5.1.2. Premises-based intent and IVR processing
Premises-based intent handling and IVR processing is more suitable for customers who need to handle Personally Identifiable Information (PII) or other sensitive data on their on-premises systems. Typically, in such deployments, PII is never sent to the cloud for processing; rather, it is collected in a manner where the information is always retained and processed on premises.
For example, in a banking business use case, an initial customer greeting and prompt can be played using VVB, and once the customer states the purpose of the call, the speech can be passed to cloud ASR and NLU services to identify the intent. If the identified intent requires processing of sensitive information, such as a credit card number or PIN to be keyed in, VVB can play the required prompt and collect Dual Tone Multi Frequency (DTMF) input from the end customer.
This sensitive information would be collected by a local business application and sent to a locally located CRM for authentication and further processing. Once a customer is authenticated using PIN information, speech control can be passed back to the ASR service in the cloud. An important point to note about this deployment mode is that it is easy and seamless to pass control back and forth between cloud-based services and on-premises processing. The end customer wouldn’t be able to make out whether processing is taking place in the cloud or on the premises.
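The hand-off between cloud and premises described above amounts to a per-intent routing decision. A minimal sketch, assuming hypothetical intent names and step labels (a real application would use the intents defined in its own BoT and the elements in its own Call Studio script):

```python
# Hypothetical intents that require sensitive data from the caller.
SENSITIVE_INTENTS = {"authenticate_pin", "enter_card_number"}


def next_step(intent: str) -> str:
    """Decide where the next caller input is collected and processed."""
    if intent in SENSITIVE_INTENTS:
        # Sensitive data stays on premises: VVB plays the prompt and collects DTMF.
        return "collect_dtmf_on_premises"
    # Everything else goes back to cloud ASR and NLU.
    return "recognize_speech_in_cloud"


def call_trace(intents: list[str]) -> list[str]:
    """Show where each step of a call would be handled."""
    return [next_step(intent) for intent in intents]
```

For a call that moves from a balance inquiry to PIN entry and then to a transfer, `call_trace` alternates between cloud and premises, and the caller hears one continuous dialogue.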
In this banking example, apart from PIN information, the local business application could also drive parameter filling by asking for any required information missing from the customer intent. For example, if the customer wants to transfer a certain amount of money to another account, once the “transfer” intent is identified, business logic at the CVP could request the pending variables, such as how much money, from what account to what account, and when to transfer the amount. All of these parameters might require performing a CRM dip to see the account balance and hence could be handled locally rather than in a cloud.
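Parameter filling of this kind reduces to comparing what the caller has already supplied against what the intent requires. The sketch below uses hypothetical parameter names and prompt wording for the “transfer” intent; it illustrates the idea and is not how any Call Studio element is implemented.

```python
from typing import Optional

# Hypothetical required parameters and prompts for the "transfer" intent.
TRANSFER_PROMPTS = {
    "amount": "How much would you like to transfer?",
    "source_account": "Which account should the money come from?",
    "target_account": "Which account should receive the money?",
    "transfer_date": "When should the transfer take place?",
}


def missing_parameters(collected: dict[str, str]) -> list[str]:
    """Return required parameters not yet supplied, in prompt order."""
    return [name for name in TRANSFER_PROMPTS if not collected.get(name)]


def next_prompt(collected: dict[str, str]) -> Optional[str]:
    """Return the next question to ask, or None when the transfer can proceed."""
    pending = missing_parameters(collected)
    return TRANSFER_PROMPTS[pending[0]] if pending else None
```

If the caller says only “transfer 500 dollars”, the application would start with `{"amount": "500"}` and keep calling `next_prompt` until it returns `None`.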
Essentially, this deployment model provides more flexibility in terms of defining actions to be taken at each stage based on customer input, and it is driven entirely from on-premises applications. Cloud services are engaged primarily for recognition of speech and “intent” identification. Once intent is identified, control is passed back to the CVP business application to process and decide what should happen next. Figure 3 illustrates a flow diagram for premises-based media handling and redirecting of media towards cloud.
Figure 3. Flow diagram for media handling and gRPC traffic toward the cloud
Another advantage of this deployment is that existing features of traditional DTMF-based IVR can be combined with speech-enabled services. For example, a prompt can be played asking a customer to either speak or key in their PIN using DTMF. Once input is received on either channel, it is processed and responded to. This addresses situations where customers are traveling or in public areas with other people and are not comfortable speaking PII aloud.
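Accepting a PIN on either channel means both inputs must end up in the same form before validation. A minimal sketch, assuming the speech channel delivers a transcript of spoken digits and the DTMF channel delivers a digit string; the function name and channel labels are illustrative:

```python
from typing import Optional

SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_pin(raw: str, channel: str) -> Optional[str]:
    """Return the PIN as digits, or None if the input cannot be interpreted."""
    if channel == "dtmf":
        # DTMF input is already digits; reject anything else (for example "#").
        return raw if raw.isdigit() else None
    if channel == "speech":
        digits = []
        for word in raw.lower().split():
            if word.isdigit():
                digits.append(word)
            elif word in SPOKEN_DIGITS:
                digits.append(SPOKEN_DIGITS[word])
            else:
                return None  # an unexpected word means the PIN is unreliable
        return "".join(digits) or None
    return None
```

Downstream validation then works on one representation regardless of how the caller chose to respond.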
This deployment model delivers an excellent experience for end customers when the power of traditional IVR and cloud-based services is combined thoughtfully.
5.2 Call Studio – Introduction to elements (for CVA services)
Call Studio for Release 12.5 of Cisco Unified CVP has been enhanced with the following elements, added to ease the configuration of the CVA feature. These elements can be dragged and dropped just like other elements for creating new business applications or for enhancing an existing traditional application with the CVA feature. All these elements are grouped under the label of Customer Virtual Assistant (Figure 4).
For a consistent experience, ensure that the entire CVP solution (Call Studio, CVP, and VVB) is on the same release version, 12.5.
Figure 4. Elements of Call Studio
Following is a brief introduction of each element:
The Dialogflow element has been created for engaging and managing ASR, NLU, and TTS services from a cloud. From a deployment standpoint, it helps simulate a hosted IVR deployment, wherein all speech services are engaged by Google Dialogflow and the entire business logic is controlled and driven from the cloud. In a typical hosted IVR scenario, CVP will relay media traffic toward Google Dialogflow and receive media traffic back for TTS prompts.
For more information on the parameter setting under the Dialogflow element, refer to the “12.5 Release Element Specification” guide. (The link is located in the reference section at the end of this white paper).
DialogflowIntent has been created for cloud services to be engaged for recognition (ASR service) and intent identification (NLU service). Once intent has been identified and passed to a VXML server on the CVP, intent handling and further action can be performed as per business script logic at the CVP. Here, flexibility has been provided for application developers to engage TTS services from the cloud or from the premises. This element ideally should be engaged when intent handling and call processing need to be driven from on-premises business logic.
In a typical scenario, CVP will relay media traffic toward Google Dialogflow and will receive text (intent) back for further processing.
For more information on the parameter settings under the DialogflowIntent element, refer to the “12.5 Release Element Specification” guide. (The link is located in the reference section at the end of this white paper).
DialogflowParam works in conjunction with the DialogflowIntent element. In a typical premises-based IVR processing deployment, when customer intent is identified and passed to a VXML server, on quite a few occasions, parameter filling is required and should be driven by the CVP application rather than sending Google Dialogflow requests back and forth for each parameter. For example, a typical banking application could analyse missed inputs from customer speech and request for remaining mandatory inputs before processing the entire transaction. Under this scenario, the DialogflowParam element works in conjunction with the DialogflowIntent element to process the intent that has been identified.
For more information on parameter settings under the DialogflowParam element, refer to the “12.5 Release Element Specification” guide. (The link is located in the reference section at the end of this white paper).
The Transcribe element has been created to process customer speech and return text as output; that is, it performs the recognition function only. The Transcribe element should ideally be used when ASR functionality alone is required.
For more information on parameter settings under the Transcribe element, refer to the “12.5 Release Element Specification” guide. (The link is located in the reference section at the end of this white paper).
5.3 Cloud-based intent processing - sample script
Following is a depiction of a sample script meant for hosted IVR deployment under Call Studio:
As a call hits an application, the Dialogflow element starts processing voice input. A dialogue with the customer continues and BoT is able to identify intents, process them, and relay them back as media via TTS services. For every request from the customer, a flow continues in a looped manner around the Dialogflow element and every intent matched would be run against a “decision box” to ensure whether call processing should continue or whether the customer needs to transfer the call to an agent.
If an agent transfer decision is triggered, the call will be routed to CVP and control is passed for queue processing and transferring to an agent. Figure 5 outlines this process.
Figure 5. Sample script of cloud-based intent processing
This sample script can be found at: https://github.com/CiscoDevNet/cvp-sample-code/tree/master/CustomerVirtualAssistant/DFAudio.
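The loop around the Dialogflow element and its decision box can be summarized in a few lines. This sketch is not taken from the sample script: `detect_intent` is a hypothetical callable standing in for the Dialogflow element, and the intent and outcome names are assumptions made for illustration.

```python
from typing import Callable, Iterable

# Hypothetical intent names that end the self-service loop.
AGENT_TRANSFER = "agent_transfer"
END_CALL = "end_call"


def run_hosted_ivr(utterances: Iterable[str],
                   detect_intent: Callable[[str], str]) -> str:
    """Loop over caller turns until the decision step hands control back to CVP."""
    for utterance in utterances:
        intent = detect_intent(utterance)  # stands in for the Dialogflow element
        # Decision box: keep looping unless the caller wants an agent or is done.
        if intent == AGENT_TRANSFER:
            return "transfer_to_queue"
        if intent == END_CALL:
            return "disconnect"
    # The caller hung up without asking for anything else.
    return "disconnect"
```

Every other intent is handled inside the cloud BoT, so the premises-side logic only needs to recognize the outcomes that return control to CVP.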
5.4 Premises-based intent processing - sample script
Following is a depiction of a sample script meant for premises-based IVR processing deployment under Call Studio:
The call flow relates to a banking application used to check an account balance and transfer a certain amount of cash from a savings account to a different account. The initial Transcribe elements collect the customer ID from the customer via speech and validate it against the ANI. Once the end customer is validated, call control is handed over to the DialogflowIntent element to identify the request from the customer. Based on customer input, such as the amount to be transferred, the CVP business application will request any remaining parameters from the end customer to further process the intent. The logic is defined so that the customer can check the account balance at any point.
Once the money transfer transaction is over, the customer can end the call or request a transfer to an agent. Figure 6 outlines this process.
Figure 6. Sample script of premises-based intent processing
This sample script can be found at: https://github.com/CiscoDevNet/cvp-sample-code/tree/master/CustomerVirtualAssistant/DFRemote.
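The first step of this flow, validating a spoken customer ID against the ANI, can be illustrated as below. The customer table, the ID format, and the function names are all hypothetical; the sample script itself may perform this check differently.

```python
# Hypothetical customer records keyed by ANI (the caller's phone number).
CUSTOMERS_BY_ANI = {
    "4085550100": "C1001",
    "4085550101": "C1002",
}


def digits_only(value: str) -> str:
    """Strip punctuation and spaces from a phone number."""
    return "".join(ch for ch in value if ch.isdigit())


def validate_caller(ani: str, transcribed_id: str) -> bool:
    """True when the spoken customer ID matches the ID on file for this ANI."""
    expected = CUSTOMERS_BY_ANI.get(digits_only(ani))
    if expected is None:
        return False
    # ASR output may vary in case and spacing, so compare a normalized form.
    spoken = transcribed_id.replace(" ", "").upper()
    return spoken == expected
```

Only after a check like this succeeds would control pass to intent identification.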
5.5 Speech services offering
For latest information on speech suites from Cisco, send an email message to email@example.com.
5.6 Creating a Dialogflow BoT
For an end-to-end operational application, development is required at both ends—IVR application hosted at the CVP and via a Google Dialogflow BoT. Sections 7 and 8 depict a sample application customers can create for the CVP IVR solution. To create a new agent or BoT for an IVR application in the cloud, use the procedure outlined in this section, illustrated by 12 figures. The procedure would remain same in the case of BoT creation for an on-premises solution or a hosted IVR solution. Log into your Google account and launch the Dialogflow application interface.
5.6.1. Click on “Create a new agent” on left-side pane for BoT creation.
Creation of Dialogflow BoT
5.6.2. Once an agent has been created, it would appear as shown in the following figure.
Verification of “Hosted-IVR” BoT
5.6.3. Languages available and configuration
English is the default language for any new application. Four additional languages can be selected for a given agent and used simultaneously, which means that a single BoT agent can handle four languages in addition to the default. A BoT can auto-detect an end customer’s language and switch to it internally. If multiple languages are defined, ensure that the respective intents are also defined for each language (see image below).
Multiple Language selection for “Hosted-IVR” BoT
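When several languages are enabled, it is easy to define an intent in the default language and overlook it in another. A simple offline check, assuming the intent names per language have been listed by hand in a hypothetical dictionary keyed by language code:

```python
def intents_missing_per_language(intents_by_language: dict[str, set[str]],
                                 default_language: str = "en") -> dict[str, set[str]]:
    """For each extra language, list default-language intents with no counterpart."""
    reference = intents_by_language[default_language]
    return {
        language: reference - intents
        for language, intents in intents_by_language.items()
        if language != default_language and reference - intents
    }
```

An empty result means every intent defined in the default language has a counterpart in each additional language.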
5.7 Intent and entities creation
Once an agent is created, the next step is to define core application logic under the newly created BoT and define the flow of the application based on customer input. Broadly, every sentence an end customer speaks needs to be evaluated and an action identified. Put simply, an action can be translated to “identification of intent” from every sentence a customer speaks. Once an intent has been identified, relevant values or parameters need to be extracted from the spoken sentences; these are known as entities under the BoT. For example, imagine that a customer is trying to transfer $500 from a savings account to a third-party account. Here, the intent would be “transfer an amount” and the relevant entity would be “$500”.
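The distinction between an intent and its entities can be made concrete with a toy parser. Dialogflow uses trained models rather than keyword and pattern matching, so the sketch below only illustrates the shape of the result; the intent names, keywords, and pattern are hypothetical.

```python
import re

# Hypothetical keywords standing in for trained intents.
INTENT_KEYWORDS = {
    "transfer_amount": ("transfer", "send", "move"),
    "check_balance": ("balance",),
}
# Matches a dollar amount such as $500 or $12.50.
AMOUNT_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def parse_utterance(text: str) -> dict:
    """Return the matched intent and any amount entity found in the sentence."""
    lowered = text.lower()
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items()
         if any(word in lowered for word in words)),
        "fallback",
    )
    match = AMOUNT_PATTERN.search(text)
    entities = {"amount": match.group(1)} if match else {}
    return {"intent": intent, "entities": entities}
```

For the sentence in the example above, the result pairs the transfer intent with an `amount` entity of `500`.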
Use the following procedure to define intents and respective entities under the BoT:
5.7.1. To define a new intent, click on the “Intents” tab on the left pane, or click on the plus sign to create a new intent.
Intent creation for “Hosted-IVR” BoT
5.7.2. Entities (or parameters) under a given intent are defined by clicking on the “Entities” tab on the left pane. Alternatively, click on the plus sign to create a new entity mapped under an intent.
Entity creation for “Hosted-IVR” BoT
5.7.3. Once an entity is created, the reference values for identification or their edits can be performed.
Entity creation and mapping under an Intent
5.8 Intent training
Once all intents and entities are defined and saved under the BoT, their matching can be validated easily without making any voice calls. Next to the “Intent Definition” tab, in the top right section of the screen, is a section with the heading “Try it now”. Here, any sentence can be typed to see whether the required intents and entities are being matched. If any changes or additions to intents or entities are required, they can be made based on the validation results. This process is called “Intent training” and can be performed at any time.
5.8.1. Adding training phrase for intent matching
5.8.2. Validate matching a sample phrase with an existing intent
Intent match validation and training
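Once the number of intents grows, the same validation can be kept as a repeatable list of phrases and expected intents. The sketch below assumes a hypothetical `detect_intent` callable that returns an intent name for a phrase; how that callable reaches the BoT is left open and is not part of this example.

```python
from typing import Callable


def failed_matches(cases: dict[str, str],
                   detect_intent: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (phrase, expected, actual) for every phrase matched to the wrong intent."""
    failures = []
    for phrase, expected in cases.items():
        actual = detect_intent(phrase)
        if actual != expected:
            failures.append((phrase, expected, actual))
    return failures
```

Each failure points to an intent that may need additional training phrases.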
5.9 Downloading a JSON key
Once an agent is configured and ready to be deployed, a JSON key created for that particular project ID needs to be downloaded and attached to an IVR application under the PCCE solution. The procedure to download a JSON key from the Dialogflow site is shown in the following figure.
5.9.1. Click on the Settings icon next to the “Agent/BoT” name.
Key id information
5.9.2. Click on the service account credentials to access the token or key ID that has been generated for client access (illustrated in the following figure).
Downloading JSON Key
5.9.3. On left side of the screen, click on “Service account” to enable access for multiple users.
At this point, though it is not mandatory, roles and permissions can be defined for various users to manage this project. Steps to create a service account and define a respective role are outlined below. Step 1 is shown in the figure below.
Creating Service Account
5.9.4. Defining a role for respective users is shown in the following figure:
Assigning Roles and Permissions for a Service Account
5.9.5. More information on downloading a JSON file can be fetched from:
The customer can choose to select the entire suite, which is the default setting, or choose a particular service from Google, such as TTS or ASR. The remaining services can be provided by the customer’s existing on-premises infrastructure. To configure the ASR or TTS service alone, use the following procedure:
6.1 Configure ASR
To configure a single service using Google Cloud, use the following steps (accompanied with associated images of what each step looks like):
6.1.1 On the Dialogflow homepage, scroll down and click on “Project ID”.
Configuring ASR or TTS service under a given Project ID
6.1.2 Screen for API services
On the new screen that appears, select “APIs & Services” from left side vertical pane. This will redirect you to a new screen. On the top search bar in that screen type the name of service to be enabled. (For example, type Cloud Text-to-Speech API or Cloud Speech-to-Text API and select the relevant API from the search results.) The procedure for selecting and enabling an API is shown in the image below.
Accessing APIs and Services menu
6.1.3 Select the required speech service to be enabled.
Finding Speech to Text APIs
6.1.4 Click on the “Enable” button for your chosen API to be enabled for a given Project ID.
Enabling Speech to Text APIs
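For teams that script their Google Cloud setup, the same APIs can typically be enabled from the `gcloud` command line instead of the console. The sketch below only builds the command and does not run it; it assumes the `gcloud` CLI is installed and authenticated, and the project ID shown in the usage note is a placeholder.

```python
# Service names of the Google Cloud speech APIs discussed in this section.
API_SERVICE_NAMES = {
    "asr": "speech.googleapis.com",
    "tts": "texttospeech.googleapis.com",
}


def enable_command(service: str, project_id: str) -> list[str]:
    """Build (but do not run) the gcloud command that enables one speech API."""
    return [
        "gcloud", "services", "enable",
        API_SERVICE_NAMES[service],
        f"--project={project_id}",
    ]
```

For example, `enable_command("tts", "my-project-id")` yields the arguments for `gcloud services enable texttospeech.googleapis.com --project=my-project-id`.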
For the CVA feature to be supported, VVB OVA under the 12.5 Release has been updated with additional memory and CPU cycles. For additional information on OVA specification, refer to the VVB virtualization Wiki. (The link is located in the reference section at the end of this white paper).
Apart from VVB OVA specification changes, there are no additional Cisco or third-party gateways required for enabling the CVA feature in the network.
With VVB hosting and catering to cloud services, there is a marginal impact on VVB performance compared to traditional ASR and TTS setups. Because the Google Dialogflow SDK is now packaged under VVB and RTP is converted to HTTP data chunks and vice versa, refer to “VVB Scale” for VVB scale information. (Check the References section.)
Conversational IVR services have been bundled along with CVP port licenses at no extra cost. So, if a customer has active Cisco Software Support Service (SWSS) with existing Perpetual licenses deployed, the customer can simply upgrade to the 12.5 Release to use the CVA feature.
Note: The 12.5 Release is Smart-License-enabled. The customer should read through all instructions related to Smart Licensing before upgrading to Release 12.5.
For queries related to bringing your own Google account, feel free to reach us via email at firstname.lastname@example.org.
The CVA feature is supported from Cisco UCCE version 11.6 onwards. If a customer has a router and logger on version 11.6, they can upgrade their CVP and VVB solution to Release 12.5 in order to leverage the CVA feature. For more effective performance results, Cisco recommends that customers upgrade their router and logger to at least the 12.0 release, as many fixes have been made between the two releases.
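The UCCE version guidance above can be captured as a small pre-upgrade check. This is an informal reading of the paragraph, not an official compatibility matrix: the function names and result strings are illustrative, and PCCE deployments follow the separate rules described next.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "12.5" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def cva_readiness(router_logger: str, cvp: str, vvb: str) -> str:
    """Classify a UCCE deployment against the version guidance in this section."""
    ivr_ready = parse_version(cvp) >= (12, 5) and parse_version(vvb) >= (12, 5)
    if not ivr_ready or parse_version(router_logger) < (11, 6):
        return "not supported"
    if parse_version(router_logger) < (12, 0):
        return "supported, router and logger upgrade recommended"
    return "supported"
```

A deployment with the router and logger on 11.6 and CVP and VVB on 12.5 falls into the middle category.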
For customers on PCCE, all the components must be on the 12.5 Release for an end-to-end configuration and monitoring experience. However, if a customer still wants to maintain an existing router and logger on Release 12.0 and upgrade only the IVR solution to Release 12.5, the customer can do so. As a prerequisite, a PCCE solution on Release 12.0 requires an Engineering Special (ES) to be installed before it can use the CVA feature. Cisco strongly recommends that all CVPs and VVBs run Release 12.5 before using the CVA feature.
11.1 Configuring CVA – PCCE
For information on configuring the CVA feature on PCCE, refer to the feature guide at: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/pcce/pcce_12_5_1/design/guide/pcce_b_soldg-for-packaged-cce-12_5.html.
For UCCE, refer to feature guide at: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/icm_enterprise/icm_enterprise_12_5_1/design/guide/ucce_b_soldg-for-unified-cce-12_5.html.
For Perpetual licensing or for customers with traditional CVP and VVB ports, there are no ordering requirements. However, the CVA feature is not supported on Cisco IOS® Software-based VXML gateways. A customer using a VXML gateway should therefore plan a migration from the VXML gateway to VVB in order to leverage the CVA feature. For more information on VVB migration, refer to: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/cisco_vvb/VVB_12_5/installation/guide/ccvp_b_1251-migration-guide-for-cisco-virtualized-voice-browser-release-1251.html.
The CCBU ordering guide can be found at: https://www.cisco.com/c/dam/en/us/products/collateral/customer-collaboration/CCBUorderingguide.pdf.
No additional configuration is required on VVB. However, if a new application is created and added for the CVA feature, ensure that a SIP trigger and application name are added appropriately.
For more information on the CVA feature’s monitoring and serviceability, refer to sections A, B, and C under the PCCE feature guides: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/customer_voice_portal/cvp_12_5/administration/guide/ccvp_b_1251-administration-guide-for-cisco-unified-customer-voice-portal/ccvp_b_1251-administration-guide-for-cisco-unified-customer-voice-portal_chapter_01.html#task_B10C9C5D99FE90745387E74F3376121A.
The effectiveness of the CVA feature can be validated using a new stock report under Cisco Unified Intelligence Center (CUIC). Existing reports can be enhanced, or a custom report can be created, using the following parameters:
● Average time spent on IVR
● Percentage of calls transferred to agents
● Calls abandoned
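To show how these three parameters relate to per-call data, the sketch below computes them from a list of call records. The `CallRecord` fields are hypothetical and do not correspond to actual CUIC report columns; the example only illustrates the arithmetic behind such a report.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    ivr_seconds: float
    transferred_to_agent: bool
    abandoned: bool


def ivr_metrics(calls: list[CallRecord]) -> dict[str, float]:
    """Compute the three parameters listed above from per-call records."""
    if not calls:
        return {"avg_ivr_seconds": 0.0, "pct_transferred": 0.0, "abandoned": 0}
    total = len(calls)
    return {
        "avg_ivr_seconds": sum(c.ivr_seconds for c in calls) / total,
        "pct_transferred": 100.0 * sum(c.transferred_to_agent for c in calls) / total,
        "abandoned": sum(c.abandoned for c in calls),
    }
```

Comparing these values before and after a node is converted to CVA gives a simple measure of the change in traversal time and agent transfers.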
Security considerations for the cloud and the overall solution are of paramount importance to us. To ensure that strong security practices are adopted, implemented, and adhered to during every stage of data processing, whether at rest or in transit, Cisco and Google follow recommended standards, as well as additional strict frameworks.
More information on security compliance for Google Cloud security and Cisco Contact Centre security can be found at the links below:
Google Cloud Solution: https://cloud.google.com/security/.
CVP Installation Guide:
CVP Admin Guide:
CVP Config Guide:
VVB Installation Guide:
VVB Configuration Guide:
VVB Scale and Performance:
https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cust_contact/contact_center/pcce/pcce_12_5_1/design/guide/pcce_b_soldg-for-packaged-cce-12_5.pdf (For CVA scale, refer to Section - Cisco Virtualised Voice Browser Sizing).
For any questions related to the CVA feature set, support, billing, or for the migration of an existing application or BoT running under Google Dialogflow to Cisco infrastructure, please contact us via email at email@example.com.