Figure: Determine Current Solution ("How close to the ceiling am I today?"; "What if I add …?"; "… growth plans, when do I upgrade hardware?")
Capacity planning is not a one-time task; it should be part of routine contact
center operations. A reliable capacity management plan helps prevent outages
because the data it produces supports proactive modifications to the deployment
before a particular outage can occur. How might this happen? When the system was
initially designed and deployed, it was sized for a specific number of agents
with a certain number of skill groups configured per agent. At that time, there
was sufficient room to accommodate modest growth. As time went on, small changes
occurred with no hint of a capacity issue: agents were added, and skill groups
were added. There was no capacity management plan in place, and utilization
increased without anyone being aware of it. Eventually, utilization was near
maximum thresholds, and in the midst of a busy period an unexpected outage
occurred. If a capacity management plan had been in place, the increase in
utilization would have been seen with each change to the system. As utilization
approached maximum capacity, either additional changes would have been curtailed
or the hardware would have been upgraded to accommodate them, thus preventing an
outage.
Server (hardware) resource utilization data is at the foundation of capacity
analysis. The health monitoring performance counters discussed in the prior
section are used to determine the capacity utilization of the server. This
section describes the process of, and the reasons for, routine capacity
analysis, which requires the following steps:
Collect: Collect samples after a defined monitoring period.
Categorize: Distribute the collected data into three buckets that equate to
three levels:
  resources on a single server
  resources associated with a single application or a single application
  component (for example, Unified ICM/Unified CCE Router) on a multi-application
  or multi-component server
  collective utilization across the entire solution
Analyze: Use the methods and calculations provided in section 9.4 - Calculating
Capacity Utilization to determine utilization levels for each category.
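Distributing the collected data into the three buckets described above amounts to grouping samples by key. The Python sketch below is a minimal illustration under stated assumptions and is not part of any Cisco tool. The `Sample` record, its field names, and the convention that server-wide counters carry no component name are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One sampled counter value (hypothetical record layout)."""
    server: str                      # host name, e.g. "pg1" (hypothetical)
    counter: str                     # e.g. "CPU", "Memory", "Disk", "Network"
    value: float                     # sampled value, e.g. percent busy
    component: Optional[str] = None  # None for server-wide counters; else e.g. "Router"


def categorize(samples):
    """Distribute samples into server, component, and solution buckets."""
    servers = defaultdict(list)     # (server, counter) -> values
    components = defaultdict(list)  # (server, component, counter) -> values
    solution = defaultdict(list)    # counter -> server-wide values from every server
    for s in samples:
        if s.component is None:
            servers[(s.server, s.counter)].append(s.value)
            solution[s.counter].append(s.value)
        else:
            components[(s.server, s.component, s.counter)].append(s.value)
    return servers, components, solution
```

During analysis, each bucket can then be reduced to a single figure per key, for example the 95th percentile described later in this section.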
After the data is collected, categorized, and analyzed, it can then be related
to:
Current utilization: a baseline. Where am I today?
Change impact: what effect did the recent change have compared to the baseline?
What-if scenarios: if I add 200 agents, what will likely be the effect?
You should make changes to an existing Unified ICM/Unified CCE deployment in small steps. Then analyze the impact of each step with a well-established, repeatable process. This process includes the following phases (steps):
Sample Phase: Initiate data sampling at the same time for the same interval for each change made.
Collect and Categorize Phase: Collect the samples and distribute to appropriate buckets.
Analysis Phase: Check application resource boundaries – has any component exceeded utilization limits? Determine best fit for new deployment requirements. Estimate solution level capacity utilization for new requirements.
Change Phase: Implement changes to solution based on analysis and estimate of impact.
Do it all over again: re-execute the process exactly as it was done before, so that an equal comparison is made.
Capacity Planning – Getting Started
The first thing you must do to get started with a capacity management plan is to establish a baseline, that is, to answer the question "What is my capacity utilization today?" To answer this question, you must first determine the busiest recurring period within a reasonable timeframe. For most business call centers, one 1-hour period of each day is typically the busiest. Moreover, there can be busier days of the week (for example, Monday vs. Wednesday), busier days of the month (the last business day of the month), or busier weeks of the year (for example, the first week in January for insurance companies or, for the IRS, the first two weeks of April). These traditionally busy hours, days, or weeks represent the most taxing periods on the deployment. They are the best periods for a capacity utilization calculation, because you always want to ensure that your deployment can handle the worst case.
The steps to getting started are:
Set up basic sampling (daily): Sample the performance counter values for CPU, Memory, Disk, Network, and Call and Agent Traffic.
Determine the busy period: Identify the recurring busy period (the worst-case scenario).
Establish a baseline of utilization for the target period: Determine hardware capacity utilization, and identify components with high capacity utilization.
Craft a recurring collection plan: Devise a repeatable plan, preferably automated, that can be carried out weekly, with samples obtained during the busiest hour of the week.
After you establish a baseline and identify a busy hour, daily sampling is no longer necessary; you must sample only during the busy hour on a weekly basis. However, if regular reporting shows that the busy hour may have changed, then you must complete daily sampling again so that you can
identify the new busy hour. After you identify the new busy hour, weekly sampling during the busy hour can resume.
To find the busy hour, you must initiate continuous data sampling to
cover a full week, 24 hours a day. The data sampled are the performance
counters for CPU, Memory, Disk, and Network as listed in ….
You can set up performance counter values to be written to a disk file in comma
separated values (.CSV) format, which is easily imported into a Microsoft Excel
workbook. Collect the data sample files, import them into Excel and graph them
to see the busy hour. You can import the data set into a graph in a matter of
minutes and easily determine the busy hour.
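As an alternative to graphing in Excel, the same CSV files can be scanned with a short script. The Python sketch below is a minimal illustration. It assumes each file has a header row, a timestamp in column 0 in the format given by the hypothetical `TIMESTAMP_FORMAT` constant, and one counter's value in column 1. Adjust these to match your actual log files. It ranks each hour of the week by its mean value.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Assumed layout: a header row, then column 0 = timestamp, column 1 = one
# counter's value (for example, total CPU %). Adjust the format to your files.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def find_busy_hour(csv_lines):
    """Return ((weekday, hour), mean) for the hour of the week with the highest mean value.

    csv_lines is any iterable of CSV text lines, such as an open file.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    reader = csv.reader(csv_lines)
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue  # skip blank or incomplete samples
        stamp = datetime.strptime(row[0].strip(), TIMESTAMP_FORMAT)
        key = (stamp.strftime("%A"), stamp.hour)
        totals[key] += float(row[1])
        counts[key] += 1
    means = {key: totals[key] / counts[key] for key in totals}
    busiest = max(means, key=means.get)
    return busiest, means[busiest]
```

With a full week of 24-hour sampling, the function compares all 168 hours of the week. Ranking by the hourly mean rather than the peak is a design assumption, made so that a single spike does not mark an hour as busy.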
Figure 2. Graph of Samples to Find Busy Hour
Collected Data Categorization
Collected data should be categorized by critical resource for each change event or need. The list below shows the triggers for sampling, collecting, categorizing, and analyzing data to determine capacity utilization.
You must establish and maintain a deployment baseline; this baseline is used to do before/after comparisons. You must establish a new baseline after you make a change in the deployment design.
Establish an initial baseline – today – with the current deployment design
Re-establish a baseline after deployment changes occur, such as:
Add/delete a Peripheral Gateway
Add/delete an Administration & Data Server
Clustering over WAN – any change to WAN characteristics
You can use week-to-week comparisons to identify changes that occurred that you were not aware of. For example, someone adds additional skill groups without prior approval or notification and suddenly utilization jumps, inexplicably, by 5%. Such a change is noteworthy enough to ask the
following questions: What changed? When? Why?
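Week-to-week comparisons lend themselves to a simple automated check. The Python sketch below assumes one busy-hour utilization figure per week, in percent. It flags any week whose figure moved by at least a chosen number of percentage points from the week before. The 5-point default mirrors the example above and is an assumption, not a recommended limit.

```python
def flag_unexplained_changes(weekly_utilization, threshold=5.0):
    """Return (week_index, delta) for weeks whose busy-hour utilization
    changed by at least `threshold` percentage points from the prior week."""
    flagged = []
    for week in range(1, len(weekly_utilization)):
        delta = weekly_utilization[week] - weekly_utilization[week - 1]
        if abs(delta) >= threshold:
            flagged.append((week, delta))
    return flagged
```

For example, `flag_unexplained_changes([40.0, 41.0, 46.5, 46.0])` flags week 2 with a 5.5-point jump. That is the point at which to ask: What changed? When? Why?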
When analyzing the current solution, you must maintain deployment information and track changes:
Topology diagrams (network)
Cisco Unified Communications Manager Clusters
Unified IP-IVR or Unified CVP peripherals (and port quantity)
Third-party add-ons
Changes to Unified ICM/Unified CCE configuration can impact computing resources and thus impact the utilization for a hardware platform, an application component and in some cases, the entire solution.
Configuration change examples:
Adding skill groups
Changing number of skill groups per agent
Adding ECC data
Increasing calls offered (per peripheral) per half hour
Using the baseline that you established, you can characterize the impact of the configuration change by comparing utilization before the change to utilization after the change.
By making changes methodically in small steps, you can characterize each small change (for example, adding one skill group at a time) and note the impact. In the future, if a change request comes in to add 10 skill groups, you can make an educated guess at the overall utilization impact by
extrapolating: adding one skill group caused a 0.5% increase in PG CPU utilization at the half hour, so adding 10 skill groups can result in a 5% increase in PG CPU utilization at the half hour. Can a 5% increase in PG CPU utilization be accommodated?
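The extrapolation above is a linear projection. The Python sketch below shows that arithmetic under two assumptions. First, each added unit (here, a skill group) adds the same increment. Second, the acceptable ceiling is a site-specific value you choose. The current PG CPU figure of 62% in the example is hypothetical.

```python
def extrapolate_utilization(current_pct, per_unit_pct, units, ceiling_pct=100.0):
    """Linearly project utilization after adding `units` identical changes.

    Returns (projected_pct, fits), where fits is True if the projection
    stays below ceiling_pct (a site-specific assumption).
    """
    projected = current_pct + per_unit_pct * units
    return projected, projected < ceiling_pct
```

With a hypothetical 62% PG CPU utilization at the half hour, `extrapolate_utilization(62.0, 0.5, 10)` projects 67.0% and reports that it fits. Starting from 97%, the same change would project 102% and not fit.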
Configuration changes often have an impact on performance. Ensure that you track ongoing changes and analyze the impact. The following configuration changes are likely to impact utilization:
Overall Database Size
Number of Skill Groups per Agent
Number of Skill Groups per Peripheral
Number of Call Types
Number of Dialed Numbers
Number of Agents per Peripheral
Total Agent Count
Amount of Attached Call Data
Other configuration factors that can affect utilization:
Agent level reporting
Persistent ECC, per call type, per peripheral
Percentage of call types per peripheral
Average skill groups per agent and total skills per system
Number of Administration & Data Servers (real time feeds)
Number of concurrent reporting users
Examples of impacting traffic load changes:
Inbound call rate
For example, your marketing department is about to introduce a new discount program for an existing service: "Sign up before July 31 for the new discounted rate!" You have been monitoring inbound call rate (Unified ICM/Unified CCE Router: Calls/sec counter) and see a relatively
consistent 4 calls/sec inbound rate during the Monday morning busy hour as compared to an average of 3 calls/sec during the rest of the day. You predict that the new marketing program will increase the inbound call rate to 6 calls per second during the busy hour. You calculated
that utilization is at 50% during the busy hour while averaging at 40% during the rest of the day. You determine that the increase in call rate will push utilization as high as 75%, which the system can tolerate.
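The 75% figure in this example comes from scaling busy-hour utilization in direct proportion to the call rate. The Python sketch below reproduces that arithmetic. It assumes utilization is linear in call rate with no fixed baseline load, which is a simplification; real components may not scale linearly.

```python
def scale_utilization(current_pct, current_rate, projected_rate):
    """Project utilization assuming it scales in direct proportion to call rate."""
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    return current_pct * projected_rate / current_rate
```

`scale_utilization(50.0, 4.0, 6.0)` returns 75.0. A straight line fitted through both observed points (40% at 3 calls/sec and 50% at 4 calls/sec) would instead predict 70% at 6 calls/sec. The proportional estimate is therefore the more conservative of the two.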
The Unified ICM/Unified CCE system is a collection of distributed, dependent software components that communicate by network messaging. Components communicate via a public network connection – some components also communicate via a private, dedicated network connection. On the
public network, the Unified ICM/Unified CCE may be competing for network bandwidth. Any increase in public network utilization may slow the ability of a Unified ICM/Unified CCE component to transmit data on the network, causing output queues to grow more than normal. This can
impact memory utilization on the server and timing of real-time operations.
Any change in traffic or load has a corresponding impact on utilization and capacity. Additional examples of impacting traffic include:
Overall Call Load—BHCA and Calls per Second
Persistent ECC, per call type, per peripheral
Percentage of call types per peripheral
Number of concurrent agents logged in (including monitored IVR ports)
Number of concurrent reporting users
When analyzing future growth, you must consider all possible migrations:
Business requirements for migration: Adding a new line of business, additional skill groups
Expected growth: Recent history has shown a steady 10% increase in agent population
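A steady growth rate can be turned into a rough estimate of when a published limit will be crossed. The Python sketch below compounds a count by a fixed rate per period. The period length (for example, a year) is an assumption, because the text gives only a steady 10% increase. The 700-agent starting point is hypothetical. The 1,000-agent limit is borrowed from the PG example later in this section.

```python
def periods_until_exceeded(current, limit, growth_rate):
    """Count growth periods until `current`, compounding at growth_rate
    per period, first exceeds `limit`."""
    if current <= 0 or growth_rate <= 0:
        raise ValueError("current and growth_rate must be positive")
    periods = 0
    while current <= limit:
        current *= 1 + growth_rate
        periods += 1
    return periods
```

`periods_until_exceeded(700, 1000, 0.10)` returns 4 (roughly 770, 847, 932, then 1,025 agents). This suggests that a hardware upgrade should be planned within about three periods.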
Any hardware or software changes in the platform itself can have a corresponding impact on utilization.
A "technology refresh" upgrade (upgrading both hardware and software) of the Unified ICM/Unified CCE has a significant effect on capacity utilization. Advances in hardware capabilities and a continued focus on streamlining bottlenecks in the software have yielded significant increases in
server and component capacities.
In some cases, hardware upgrades (without a software upgrade) may be necessary to accommodate growth in the Unified ICM/Unified CCE deployment.
A "common ground" upgrade (upgrading software while retaining existing hardware) of Unified ICM/Unified CCE may have a differing effect on capacity utilization depending on the changes made to the software from one release to the next. In some components, utilization may increase slightly because new functionality was added to the component, which slightly decreases its execution performance. In other components where performance improvements were introduced, utilization may decrease from one release to the next.
You must plan to re-establish a capacity utilization baseline after any upgrade.
Resource utilization data is at the foundation of capacity analysis. This data
consists of sampled values of performance counters such as CPU, Memory, Disk,
and Network. The data set is from the busy hour as determined by the steps
described above. To account for short-duration spikes that are statistical
outliers, use a sample rate of one sample every 15 seconds for each of the
listed counters, and base the calculation on the 95th percentile sample of the
set. The 95th percentile is the smallest number that is greater than 95% of the
numbers in a given set.
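The Python sketch below selects the 95th percentile sample from one busy hour of 15-second samples. It uses a nearest-rank style rule: the smallest sorted position with at least 95% of the samples before it. This is one reasonable reading of the definition above, not necessarily the exact computation Cisco tools perform. Integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
SAMPLE_INTERVAL_SECONDS = 15
SAMPLES_PER_HOUR = 3600 // SAMPLE_INTERVAL_SECONDS  # 240 samples in the busy hour


def percentile_95(samples):
    """Return the smallest sample with at least 95% of the samples sorted
    before it (for fewer than 20 samples, no such position exists and the
    maximum is returned)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    index = min(-(-95 * n // 100), n - 1)  # ceil(0.95 * n) in integer math, kept in range
    return ordered[index]
```

In a 240-sample busy hour, the result is the value at sorted index 228. As many as 11 outlier samples (2 minutes 45 seconds of spikes at 15-second sampling) can therefore sit above it without changing the result.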
Counters are divided into two categories:
Measurement: A measurement value is valid only if the indicator values are
within acceptable levels. If they are, the measurement value is used in the
forthcoming calculation to determine utilization.
Indicator: An indicator value is a Boolean indication of whether a maximum
threshold has been exceeded. If an indicator shows that its threshold was
exceeded, assume that capacity utilization was exceeded. If so, you must take
steps to return the system to less than 100% utilization, which may require a
hardware upgrade.
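The gating rule for measurement and indicator counters can be sketched in Python as follows. The dictionaries and the indicator names in the usage example are hypothetical. Each indicator is assumed to have already been reduced to True when its maximum threshold was exceeded.

```python
def gate_measurements(measurements, indicators):
    """Decide whether measurement values may be used in the utilization calculation.

    measurements: counter name -> measured value (e.g. the 95th percentile)
    indicators:   indicator name -> True if its maximum threshold was exceeded
    Returns (measurements, []) when every indicator is within limits, or
    (None, exceeded_names) when capacity utilization must be assumed exceeded.
    """
    exceeded = sorted(name for name, tripped in indicators.items() if tripped)
    if exceeded:
        return None, exceeded
    return dict(measurements), []
```

For example, `gate_measurements({"CPU": 42.0}, {"queue_overflow": True})` returns `(None, ["queue_overflow"])`. The CPU figure is discarded, and the component is treated as over capacity.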
Capacity utilization is considered to be 100% or more if published sizing
limits are exceeded for any given component. See the Contact Center Enterprise
Design Guide at
http://www.cisco.com/en/US/products/sw/custcosw/ps1844/products_implementation_design_guides_list.html
for a quick reference on configuration limits and scalability constraints. For
more information on system constraints, see the Unified Communications in a
Virtualized Environment and Compatibility Matrix for Unified CCE pages on the
DocWiki, and the Sizing Tool. For example, if the
server on which a Unified CC PG is installed has a published capacity of 1,000
agents but there are 1,075 active agents at a particular time, the server is
considered to be greater than 100% utilization regardless of what might be
calculated using the methods described herein. The reason for this is that
although the server/application seems to be performing at acceptable levels,
any legitimate change in usage patterns could drive utilization beyond 100% and
cause a system outage because the published capacity was exceeded. Published
capacities seek to take into account differences between deployments and/or
changes in usage patterns without driving the server into the red zones of
performance thresholds. As such, all deployments must remain within these
published capacities to enjoy continued Cisco support.
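The published-capacity rule overrides any calculated figure. The Python sketch below applies it to the PG example above (1,075 active agents against a published capacity of 1,000). Expressing "greater than 100%" as the ratio of active count to published capacity is an assumption made for illustration. The rule itself only states that utilization is over 100%.

```python
def effective_utilization(calculated_pct, active, published_capacity):
    """Apply the published-capacity rule: if the active count exceeds the
    published capacity, the component is over 100% utilized regardless of
    the calculated figure."""
    if active > published_capacity:
        # Report at least the share of published capacity in use (over 100%).
        return max(calculated_pct, 100.0 * active / published_capacity)
    return calculated_pct
```

`effective_utilization(60.0, 1075, 1000)` returns 107.5, even though the calculated figure was only 60%. At or below the published capacity, the calculated figure is returned unchanged.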