With the previous chapters having covered the foundational topics related to SBCs, this chapter can proceed to a meaningful discussion of the approaches to sizing and scaling SBCs for a desired implementation. This chapter discusses the following major topics as they pertain to scalability and sizing considerations for SBCs:
Platform Sizing—This section outlines the factors and constraints that dictate decisions around how to size the SBC platform for the desired environment. Specific sizing guidelines for CUBE are provided for the various supported platforms.
Licensing—Because many vendors have implemented licensing mechanisms that are coupled to the quantity of SBC nodes and/or call capacity of the platform, licensing considerations and procedures are discussed in this section.
Oversubscription Prevention Techniques—The final section of this chapter covers the general recommendations and techniques for preventing oversubscription of SBC resources, including the RFCs for SIP overload control and Call Admission Control (CAC) techniques to control the use of finite system resources.
Though the topics in this chapter cover vendor-neutral concepts and approaches, note that a substantial amount of this subject matter is vendor and platform specific. When considering sizing and licensing for an SBC deployment, be sure to consult design and deployment references specific to that vendor and platform. Also, given that sizing is tightly bound to the hardware available at the time, the platform-specific data in this chapter is subject to change as the publication ages. Please consult the Cisco Unified Border Element Data Sheet for the most recent information specific to these topics for CUBE.
This chapter covers the factors you need to understand to appropriately size an SBC deployment, license the deployment for the intended use, and control the finite resources of the platform effectively to prevent oversubscription or degradation of SBC service quality.
Platform Sizing
This section outlines the basic concepts involved in appropriately sizing an SBC platform for an environment. First, the general concepts around how vendors size SBCs are outlined, along with how to monitor and validate a platform’s capability to operate within the existing capacity. Then, as sizing recommendations are inherently vendor and platform specific, this section also covers the specific sizing guidelines for CUBE.
General Scalability Concepts
Sizing of SBCs is usually governed by two main constraints: calls per second and concurrent calls. These statistics determine the call density capability of the SBC platform but alone are not enough to determine the SBC footprint needed for a deployment. In addition to understanding the capability of the platform, the call behavior of the user base and the way the infrastructure devices communicate with each other for the intended call flows must also be understood to determine the overall SBC deployment needs.
The following sections cover how to determine the maximum SBC call size capability, as well as how to design for the capacity needs of the user base supported by the SBC service.
Calls per Second and Messages per Second
Calls per second (CPS) is a measurement of the number of attempted calls that are coming into the platform for setup at any given time. It is typically determined based on the number of inbound call setup requests for new discrete calls within a defined measurement period, usually a single second. The processing of call setup requests is demanding on the platform’s resources. Current SBC platforms support a wide range of capacities, from only a few CPS to more than 1000 CPS, depending on the platform hardware and processing capability.
Different call signaling protocols and different call flow types affect how demanding incoming call processing is on the platform. With a typical standard SIP call, there are a total of 14 messages (7 on each call leg) at a minimum. When reliable provisional responses are used (as is common with early media), this increases to 9 messages per leg due to the addition of the PRACK transaction. Invoking supplementary call services such as call transfers, hold and resume, conferencing, or mid-call session refresh also increases the count of messages per call that need to be processed. Because a call has a variable number of messages, messages per second is a better parameter for anticipating the capability of a platform; however, most vendors currently size SBCs based on CPS values instead of documenting messages-per-second capabilities.
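The relationship between CPS and messages per second can be sketched in a few lines of Python. This is a rough illustration only: the function name and the two-leg default are assumptions, and the per-leg counts are the minimums of 7 and 9 quoted above. In steady state, calls complete at about the rate they are set up, so the average message rate is approximately the call rate multiplied by the messages per call.

```python
# Minimum SIP messages per call leg, as quoted in the text.
STANDARD_MESSAGES_PER_LEG = 7
PRACK_MESSAGES_PER_LEG = 9


def messages_per_second(cps: float, messages_per_leg: int, legs: int = 2) -> float:
    """Approximate the steady-state SIP message rate for a given call rate.

    Illustrative helper (not a vendor formula): every call is assumed to
    exchange the same number of messages on each of its legs.
    """
    return cps * messages_per_leg * legs


for label, per_leg in (("standard call", STANDARD_MESSAGES_PER_LEG),
                       ("call with PRACK", PRACK_MESSAGES_PER_LEG)):
    rate = messages_per_second(10, per_leg)
    print(f"10 CPS, {label}: {rate:.0f} messages per second")
```

The same call rate therefore produces a noticeably different message load depending on the call flow, which is why a vendor CPS figure should be read together with the call flow it was benchmarked against.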
Most vendors run benchmark tests to define these guidelines and have a standard call flow type that is tested across the platform. Call density is then increased until the platform reaches a specific threshold of CPU and memory utilization that is deemed at the threshold of being acceptable. Although the CPU utilization can theoretically get to 99.99…% before it starts blocking messages and resulting in performance degradation, general practice is to keep CPU around a maximum average of 75% and peaks below 90%. Some SBC vendors also deploy techniques to block or deprioritize lower-priority tasks when resources get scarce, which may entail rejecting calls at a specific CPU threshold. Cisco IOS starts rejecting calls when the 5-second CPU value exceeds 98%.
The configuration of additional call features has the potential to impact and reduce the CPS from the benchmark tests. Consider that there may be some impact to the CPS capability when enabling features such as the following:
Security features (for example, SRTP, TLS, IPsec, access lists, firewall, intrusion detection)
Authentication, authorization, and accounting (AAA) lookups
Gatekeeper requests
Voice XML (VXML) or TCL scripts attached to calls
CAC or Resource Reservation Protocol (RSVP)
SIP normalization
Media forking (such as call recording)
Simple Network Management Protocol (SNMP) polling/logging or call detail record (CDR) reporting
Multi-VRF (virtual routing and forwarding) support
Therefore, although vendors can provide guidelines on CPS for the platform, CPU should be monitored in the actual environment to understand actual behavior and identify conditions where CPU utilization could be causing impacts to SBC performance. The root concept is to understand if there are tasks that are waiting on availability of the CPU to process; although CPU utilization has high correlation to this condition, it is often not the best measurement to observe when application processes are waiting for system resources.
An alternative way to measure CPU performance is to observe the system load, which is the count of the active processes that are running or waiting. A system load value that is greater than the total number of CPU cores on the platform indicates that processes are waiting for a CPU to be available and creating latency.
Figure 8-1 provides a conceptual example of a system load of three on a four-core system. In this example, the system has an overall processing capacity of four simultaneous processes (one per core), yet there is only a system need for processing three processes. So, for this example, the instantaneous system load is three (all currently processing), against the overall capable capacity of four parallel processes. If four concurrent processes are exceeded, processes will have to wait in queue for processing. To put this another way, the system load is currently performing at 75% of the overall capacity in this example and, consequently, not introducing any additional delay due to queueing for processing time.
Figure 8-1 Visual Demonstration of Underutilized System Load
Now consider another example, where the same four-core system has more items to process within the same moment of time. In Figure 8-2, the system is processing four active processes, so there are not any more available cores to use. As a result, three additional processes are ready to be processed yet have to wait in the queue. System load is the total of currently processing tasks plus those in the pending queue, so the system load here is measured as seven. Given that the system has only four cores, the system is currently being overloaded by 75% (7/4 = 175% of capacity). This overload percentage is derived by dividing the system load by the number of cores, subtracting 1, and expressing the result as a percentage.
Figure 8-2 Visual Demonstration of System Load Being Exceeded
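The arithmetic behind the two figures can be written out as a small Python sketch. The function names are illustrative and not part of any platform tooling.

```python
def load_ratio(system_load: float, cores: int) -> float:
    """Return system load as a fraction of the available cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return system_load / cores


def overload_percent(system_load: float, cores: int) -> float:
    """Return how far the load exceeds capacity, as a percentage (0 if it does not)."""
    return max(0.0, load_ratio(system_load, cores) - 1.0) * 100.0


# Figure 8-1: three processes on four cores.
print(load_ratio(3, 4))        # fraction of capacity in use
# Figure 8-2: four running plus three queued on four cores.
print(overload_percent(7, 4))  # percentage over capacity
```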
The instantaneous values are not the most meaningful measurements (and calculating them is not practical). What is more important is how quickly processes are entering and leaving the queue and CPU. Therefore, system load is often measured as an average of the values over a period of time and often represented as a decimal.
On Linux platforms (a common deployment platform for various SBC vendors), system load is measured with commands such as the uptime command, as shown in Example 8-1.
Example 8-1 Output of the uptime Command on Linux to Measure System Load
linux$ uptime
23:25 up 14:52, 2 users, load averages: 2.32 2.34 2.42
The three load average values in Example 8-1 represent the 1-minute, 5-minute, and 15-minute averages of system load, respectively. This output was taken on an eight-core system and demonstrates that the system is operating well within its capacity, at about 30% (2.42 / 8).
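On a Linux host, the same comparison of load averages against core count can be scripted. The following is a minimal sketch that uses Python's standard os.getloadavg() and os.cpu_count() calls; the helper names are illustrative, and os.getloadavg() is available only on Unix-like systems.

```python
import os


def load_per_core(load_averages, cores):
    """Divide each load average by the core count."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return tuple(load / cores for load in load_averages)


def local_load_per_core():
    """Return the 1-, 5-, and 15-minute load per core for this host (Unix only)."""
    cores = os.cpu_count() or 1
    return load_per_core(os.getloadavg(), cores)


# Values from Example 8-1, taken on an eight-core system.
one_min, five_min, fifteen_min = load_per_core((2.32, 2.34, 2.42), 8)
print(f"15-minute load per core: {fifteen_min:.2f}")
```

A result that stays below 1.0 per core indicates that, on average, processes were not queueing for CPU during that interval.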
In virtual environments, it is also important to ensure that the underlying compute infrastructure is not overloaded. With VMware ESXi environments, CPU load can be observed with the parameter CPU Ready, which is a measurement of the time when a virtual machine was ready to use CPU but was unable to be scheduled because the underlying host CPU resources were busy. CPU Ready is represented as either a percentage or a time duration, depending on where the value is being observed. VMware suggests that the CPU Ready value should be below 1000 ms (also sometimes represented as a value of 5%) to ensure that processes aren’t being starved of CPU resources.
CPU Ready can be observed on ESXi through the options Advanced > Chart Options > CPU > Real Time > Ready under the virtual machine in vCenter. Figure 8-3 shows where this parameter is enabled in the chart options, and Figure 8-4 shows the resulting CPU Ready values.
Figure 8-3 Enabling CPU Ready Parameter in vCenter Chart Options
Figure 8-4 Observing CPU Ready Values for System Load on ESXi Environments
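The 1000 ms and 5% figures describe the same threshold once the chart's sampling interval is taken into account. The sketch below assumes the 20-second sample interval used by vCenter real-time charts; that interval and the function name are assumptions for illustration, and other chart views use longer intervals. A value summed across several vCPUs may also need to be divided by the vCPU count before comparison.

```python
def cpu_ready_percent(ready_ms: float, interval_seconds: float = 20.0) -> float:
    """Convert a CPU Ready summation (milliseconds) into a percentage of the interval.

    interval_seconds defaults to 20, the assumed real-time chart sample interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return ready_ms * 100.0 / (interval_seconds * 1000.0)


print(cpu_ready_percent(1000))  # the suggested ceiling, expressed as a percentage
```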
Monitoring of the CPU Ready value is especially important in virtual environments where a decision has been made to oversubscribe the virtual CPU (vCPU) allocation over the physical CPU cores (pCPUs) available. It is important to note that not all vendors support conditions where vCPUs are oversubscribed above a direct 1:1 relationship of vCPU to pCPU.
Concurrent Calls
Concurrent calls refers to the number of calls that are active on a platform at any given time. Depending on the platform, CPU may still be leveraged for forwarding of packets once a call is set up, but the most significant constraint on the number of concurrent calls that a platform supports is typically the amount of available memory on the platform. Once the calls are already set up (as governed by the CPU constraints that impact CPS), the constraint shifts to the memory component. Memory is used to support both maintenance of the call state information and packet forwarding of the media stream. Note that many specialized SBC hardware platforms forward the majority of media plane packets in hardware to lessen the strain of packet forwarding on the CPU. The codec complexity and number of RTP and RTCP streams per call are main factors that dictate how many concurrent calls can be supported across the platform.
On Linux-based platforms, extra memory is used to cache information that is commonly read from disk. As a result, it is common to observe slow growth in used memory, but this should not be considered a memory leak, as the memory can be freed up again when needed. Rather than observing the used memory percentage, which includes the memory used by cache, observe the memory that is available once the cache is excluded. On Linux-based platforms, available memory exclusive of the cache can be observed with the command free -m, as shown in Example 8-2.
Example 8-2 Output of free -m to Observe Memory Utilization
linux$ free -m
             total       used       free     shared    buffers     cached
Mem:      12286456   11715372     571084          0      81912    6545228
-/+ buffers/cache:    5088232    7198224
Swap:     24571408      54528   24516880
Some recent Linux distributions (anything with procps-3.3.10 and higher) will have a modified version of the command output, where the used parameter no longer accounts for the cache used by the system. With this change, a new parameter named available has been added, which is similar to the free count for -/+ buffers/cache but instead approximates how much memory could still be used by applications while sparing a minimum amount for cache. Therefore, the available field provides a more real-world estimate of how much memory is available for applications than does free -/+ buffers/cache. The output of free -m with the available field is demonstrated in Example 8-3.
Example 8-3 Changed Output of free -m to Observe Available Memory
linux$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3553        1192         857          16        1504        2277
Swap:          3689           0        3689
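A monitoring script can handle both output formats by preferring the available column when it exists and otherwise adding free, buffers, and cached together. The following is a minimal sketch; the function name is illustrative, and it assumes the text passed in is the output of free with English column headers.

```python
def available_memory(free_output: str) -> int:
    """Return memory usable by applications, in the units that free printed.

    Uses the 'available' column when present (newer procps); otherwise falls
    back to free + buffers + cached, matching the -/+ buffers/cache row.
    """
    rows = [line.split() for line in free_output.strip().splitlines()]
    header = next(fields for fields in rows if fields and fields[0] == "total")
    mem = next(fields for fields in rows if fields and fields[0] == "Mem:")
    values = dict(zip(header, (int(value) for value in mem[1:])))
    if "available" in values:
        return values["available"]
    return values["free"] + values["buffers"] + values["cached"]


older_format = """total used free shared buffers cached
Mem: 12286456 11715372 571084 0 81912 6545228
-/+ buffers/cache: 5088232 7198224
Swap: 24571408 54528 24516880"""

print(available_memory(older_format))
```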
In addition to memory constraints that govern the concurrent call capacity, the concurrent call number is also potentially governed by any hard limits that may be placed on the platform (such as maximum calls supported). This limit may be platform or hardware specific, but it may also be restricted by the licensing of the platform permitting a specific maximum of concurrent calls.
Finally, if DSP resources are needed for transcoding, transrating, or interworking scenarios, then the total capacity of how many of these sessions can be supported must also be taken into account when determining the overall concurrent call capacity of the platform. Consult vendor-specific documentation to determine the total capacity of the available DSPs for the various scenarios where they may need to be invoked.
Call Traffic Engineering
Understanding only the capability of the platform is not enough to appropriately size an SBC environment. It is also important to understand the call volume of the overlaying user base that is consuming the SBC service.
The expected values of CPS and concurrent calls that will be seen in a call environment are largely based on two factors:
Busy hour call attempts (BHCAs)—The quantity of call attempts made within the busiest hour of the day
Average handle time (AHT)—The average time that a user is on a call, determined by the difference in time between call setup and final disconnect of the call
These values vary depending on the underlying nature of the user base that is placing and receiving calls. There is also an interaction effect between the two values, as the length of a call also dictates how many calls a user can potentially place in an hour. For instance, a manufacturing warehouse is likely to place fewer calls than a corporate office, and both will generally handle fewer calls than a customer care center. As a result, these different environments will result in different BHCAs based on the underlying worker types. Most non-contact center enterprises average between 1 and 2 BHCAs per user. When BHCA behavior is unknown, 4 BHCAs is a good value to use for non-contact center environments to account for unknowns. Contact center environments may be anywhere from 5 to 30 BHCAs per user. Contact center environments that see above 30 BHCAs per user are atypical, and as a result would require custom attention to sizing.
AHT values also vary based on the underlying work being performed. As a result, the AHT for a service desk contact center handling password resets may be significantly lower than the AHT for a corporate office where employees are joining several hour-long conference calls in a day. The generally accepted practice in the industry for call traffic engineering is to use 3 minutes for AHT. Lower AHT values will be more demanding on the call infrastructure, so if it is known that a user base fields very short calls, this may be reduced to 90 seconds or substituted with the actual known AHT values that have been observed. Industry surveys show that the general average handling time is closer to somewhere between 7 and 15 minutes; therefore, using a 3-minute AHT as a standard for most deployments will help account for fluctuations and unknowns in call behavior, as it is more aggressive than the reality generally seen in deployments.
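As a rough illustration of how BHCA and AHT translate into load, the sketch below applies the standard traffic-intensity relationship (average calls in progress equals arrival rate multiplied by holding time). This relationship is general teletraffic practice rather than a formula from this chapter, and the function and parameter names are illustrative. It yields busy-hour averages only; instantaneous peaks will be higher, so the results are a floor for sizing rather than a target.

```python
def busy_hour_estimate(users: int, bhca_per_user: float, aht_seconds: float = 180.0):
    """Estimate average busy-hour CPS and average concurrent calls.

    aht_seconds defaults to the 3-minute AHT mentioned in the text.
    """
    attempts = users * bhca_per_user
    average_cps = attempts / 3600.0
    average_concurrent = average_cps * aht_seconds
    return average_cps, average_concurrent


# Hypothetical user base: 1000 users at 4 BHCA each with a 3-minute AHT.
cps, concurrent = busy_hour_estimate(1000, 4)
print(f"average CPS: {cps:.2f}, average concurrent calls: {concurrent:.0f}")
```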
When sizing for an SBC deployment, it is important to understand what the BHCA and AHT values are for the targeted environment. Usually, call detail records are available from the current call system, which can be leveraged to calculate the actual values for the environment. Actual values for BHCA and AHT should be used whenever possible to avoid invalid assumptions.
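Deriving these values from call detail records can be sketched as follows. The record layout, a pair of setup and disconnect timestamps per call, is hypothetical, since real CDR formats differ by call system. The busy hour is found here with fixed clock-hour buckets, which is a simplification of a true sliding 60-minute window.

```python
from collections import Counter
from datetime import datetime


def bhca_and_aht(call_records, users: int):
    """Return (busy_hour, bhca_per_user, aht_seconds) from call records.

    call_records: iterable of (setup_time, disconnect_time) datetime pairs.
    """
    records = list(call_records)
    if not records or users <= 0:
        raise ValueError("need at least one record and a positive user count")
    # Count call attempts per clock hour and pick the busiest one.
    per_hour = Counter(
        setup.replace(minute=0, second=0, microsecond=0) for setup, _ in records
    )
    busy_hour, attempts = max(per_hour.items(), key=lambda item: item[1])
    # AHT is the mean of disconnect time minus setup time across all calls.
    total_seconds = sum((end - start).total_seconds() for start, end in records)
    return busy_hour, attempts / users, total_seconds / len(records)


sample = [
    (datetime(2024, 1, 8, 10, 5), datetime(2024, 1, 8, 10, 8)),
    (datetime(2024, 1, 8, 10, 40), datetime(2024, 1, 8, 10, 43)),
    (datetime(2024, 1, 8, 11, 15), datetime(2024, 1, 8, 11, 18)),
]
print(bhca_and_aht(sample, users=2))
```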
If previous values for call volume are not available, consider using the values listed in Table 8-1 as a starting point.
Table 8-1 BHCA, AHT, and Concurrent Call Suggested Values
Scenario | BHCA per User | Concurrent Calls | AHT
Low demand | <1 | 10% | 3 minutes
Average enterprise | 1.5 | 20% | 3 minutes
High demand* | 4 | 33% | 3 minutes
Contact center | 5–30 | 60–90% | 3 minutes
In addition to understanding the characteristics discussed to this point, it is also important to understand how the call behavior corresponds to the type of calls the SBC will be handling. Not all calls considered for the concurrent call ratios may be going through the SBC deployed, based on the nature of how the calls route and for what call paths the SBC is intended to be deployed. Consider whether internal calls will still be routed across the SBC deployment and how SBCs distributed across different sites or regions may affect the call volume relevant for the SBC deployment.
Case Study: Sizing a Generic SBC
This section provides an example of an SBC deployment with the following requirements:
Contact center integration
5 SIP-to-SIP calls per second
200 concurrent calls
10% of calls requiring transcoding between G.711 and G.729
No expectation of additional capacity growth
SBC high availability with stateful failover
Geo-resiliency of SBC service across data centers
The following steps can be taken to appropriately size this SBC based on the requirements listed:
Step 1. Identify the vendor specification for calls per second and concurrent calls supported for the vendor’s platform(s).
Step 2. Include any additional call volume that may either be anticipated for future growth or for a risk buffer as a precaution against unanticipated factors.
Step 3. Assess the CPS requirements for the targeted deployment. Also consider that if there is a significant presence of complex call flows, the increased SIP messaging per call may reduce the vendor’s baseline guidelines on CPS.
Step 4. Factor in any additional features or supplementary services that may further reduce sizing based on vendor’s guidance and adjust accordingly. Refer to the list in the section “Calls per Second and Messages per Second,” earlier in this chapter, for some potential examples.
Step 5. Select a platform that meets the minimum requirements for both calls per second and concurrent calls, as determined in steps 1–4.
Step 6. Assess whether any other features are needed on the SBC that may exist only on some subset of the platform offerings to ensure that the target platform meets the functional needs of the deployment.
Step 7. If providing a design with local resiliency, account for the additional hardware needs to support local high availability of the SBC service.
Step 8. If providing a design with geo-resiliency, replicate the design in a second data center. Then account for this resiliency with the call routing design of these other components to provide geo-resiliency in the event that the SBC in one of the data centers becomes unavailable.
Based on the requirements, the SBC platform(s) to be selected for the deployment of this case study must meet a total of 5 SIP-to-SIP calls per second and 200 concurrent calls. There is a desire to be able to handle all the call-processing needs on a single active device, avoiding the need for external devices to load balance across multiple active SBCs.
The vendor chosen for this case study guides CPS only for standard SIP calls. Because this deployment is for a contact center, the vendor’s CPS requirements must be adjusted to better reflect the message density in the contact center environment. As a result, the base requirement of 5 CPS will be adjusted by a factor of three to account for the three-fold increase in per call SIP messaging in this contact center environment. Therefore, this environment will be sized against vendor guidelines for a platform that supports 15 CPS even though only an anticipated maximum of 5 CPS will be observed on this contact center platform.
Due to 10% of the calls requiring a transcoder, DSPs will also be needed. As a result, based on DSP calculations to support 20 high-complexity transcoding sessions for this platform, a hardware DSP module with 64 channels is selected for purchase with each chassis.
No additional features, such as SRTP or Multi-VRF routing, are needed for this deployment, so further adjustments to sizing are not needed at this point. Given the requirements for high availability, each data center would get two of the vendor’s medium-class chassis, so one device in a pair can perform as a warm standby. This design would then be replicated in an additional data center to provide geo-resiliency, resulting in a total of four devices being purchased to support these deployment requirements.
Because the concurrent call volume required is much less than the platform can handle, no additional memory expansions would be needed on these devices for this scenario.
The sizing of this platform doesn’t allow for any additional growth with the incoming call rate of 15 CPS, but it allows for 60% growth of transcoding resource needs and 93% growth of concurrent calls.
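The arithmetic used in this case study can be collected into a short Python sketch. The function and parameter names are illustrative; the default values are the ones stated in the requirements and in the discussion above.

```python
import math


def case_study_sizing(expected_cps: int = 5, message_factor: int = 3,
                      concurrent_calls: int = 200, transcoding_percent: int = 10,
                      devices_per_site: int = 2, sites: int = 2) -> dict:
    """Reproduce the case study's sizing figures from its stated inputs."""
    return {
        # CPS to size against, after adjusting for heavier per-call messaging.
        "sizing_cps": expected_cps * message_factor,
        # Concurrent transcoding sessions that the DSPs must support.
        "transcoding_sessions": math.ceil(concurrent_calls * transcoding_percent / 100),
        # An active/standby pair in each data center.
        "devices": devices_per_site * sites,
    }


print(case_study_sizing())
```

Changing any input, such as the share of calls that need transcoding, shows immediately which resource is affected and by how much.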
After these SBCs are deployed, utilization of CPU, memory, and media resources will be monitored and logged. The goal of this monitoring is to validate that the capacity assumptions from the original requirements are met and that each SBC is performing adequately based on the actual load across the platform for this specific production environment.
Monitoring thresholds will then be defined for the resource capacities, such that if demand on an SBC grows unexpectedly over time and consistently crosses above 75% for system load or memory utilization, administrators will be notified to purchase additional resources for handling the growth in demand.
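One way to express "consistently crosses above 75%" in a monitoring script is to require several consecutive samples above the threshold before alerting. The sketch below is one possible interpretation; the function name and the choice of three consecutive samples are assumptions, not values from this case study.

```python
def sustained_breach(samples, threshold: float = 0.75, consecutive: int = 3) -> bool:
    """Return True if `consecutive` samples in a row exceed the threshold.

    samples: utilization values as fractions (for example, 0.80 for 80%).
    """
    run = 0
    for value in samples:
        # Extend the current run of breaches, or reset it on a normal sample.
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


print(sustained_breach([0.50, 0.80, 0.90, 0.76]))
print(sustained_breach([0.80, 0.50, 0.80, 0.50, 0.90]))
```

Requiring a run of samples avoids alerting on a single short spike while still catching sustained growth in demand.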
CUBE Sizing
Many of the general sizing concepts outlined in the previous section also dictate how CUBE deployments are sized for call environments. This section describes the various items that may constrain CUBE’s call volume capability.
General CUBE Platform Sizing
With CUBE, the number of concurrent sessions and calls-per-second capability depend on the underlying platform. IOS-XE platforms where the signaling and media planes are discrete (such as the ASR series) allow for significantly higher CPS capability than platforms with monolithic processing. The improved performance is due to the platform’s use of separate hardware for handling call signaling and for forwarding the media streams of established calls.
Sizing of the virtual CUBE environments (that is, vCUBE on the CSR 1000v) is dependent upon the CPU and DRAM provisioning.
For the most up-to-date information on CUBE sizing, consult the current Cisco Unified Border Element Data Sheet.
The following are some items to keep in mind when sizing CUBE for a deployment:
CPS capability decreases for more complex call flows where more messages per second need to be processed.
Media forking for call recording reduces the maximum sessions and CPS because each recorded call consumes additional sessions (for example, 5 calls being recorded to a single server would be considered a total of 10 sessions).
Multi-VRF support increases processing overhead in proportion to the number of VRFs configured.
Enabling the call monitoring feature slightly reduces the platform’s maximum concurrent call volume.
Media Resource Sizing for CUBE
In situations where media resources need to be invoked, the DSP resources may become a constraint before the platform’s call limitations are reached for concurrent calls.
When a transcoder (for converting between two mismatched codecs or transrating between two packetization/payload rates) is needed on CUBE, physical DSPs are required. As a result, when sizing a CUBE platform, it is important to properly size for the number of DSPs on the platform. Each platform has a different capable DSP density based on both the number of DSP slots available and the type of DSPs for the platform. Different DSP types may also support different densities for each codec. For either of these scenarios, the MTPs should be registered to UCM as a transcoder; this is a minor difference from when solving for transrating to UCM where a hardware MTP is configured. CUBE only supports registration of a transcoder resource and not a hardware MTP resource for the use of transrating.
Note that in situations where transcoding sessions may be reached before CUBE’s concurrent session limit is reached, the platform will still support the remainder of calls to reach the concurrent session limit as non-transcoder-invoked calls, as long as the latter is permissible by the needs of the call flow.
Before outlining the DSP density capabilities on platforms, a key concept of codec complexity must be discussed. Different codecs take differing amounts of processing power to encode and decode, so codecs have been categorized into three general complexity buckets for ease of calculation. Each DSP platform may vary in terms of the complexity with which it treats each specific codec that it supports. Codec complexity is categorized as low, medium, and high. Enabling secure RTP codecs may also influence the complexity associated with the codec of interest.
In addition to considering codec complexity, it is important to understand the DSP type and quantities that each CUBE platform supports to appropriately determine the DSP capacity of a platform. Each CUBE platform supports different models of DSPs, and the various models differ in their capacity. Likewise, the various CUBE platforms also have varying numbers of DSP slots, and the number of slots influences the total number of DSPs that can be supported on the platform. Consult the router platform and DSP data sheets for more information on the current options.
Figure 8-5 shows a hypothetical router chassis with three DSP slots to house three DSP cards. The diagram demonstrates how a DSP card can be loaded in many permutations, based on the total capacity supported by a DSP. This example shows that these DSP card types could support 20 G.729 calls, 43 G.711 calls, or a combination thereof, such as 18 G.729 calls and 4 G.711 calls on a DSP. Note that there can be some unused DSP capacity, depending on how the different permutations of the codecs and quantities fit against the DSP’s available capacity.
Figure 8-5 DSP Capacity with Mixed Codecs
When codec complexity or the call density that is supported for a DSP platform is not known, it can be verified with the show voice dsp capabilities command shown in Example 8-4.
Example 8-4 Output of show voice dsp capabilities to Validate DSP Codec Density
CUBE# show voice dsp capabilities slot 0 dsp 2
DSP Type: SP2600 -43
Card 0 DSP id 2 Capabilities:
Credits 645, G711Credits 15, HC Credits 32, MC Credits 20,
FC Channel 43, HC Channel 20, MC Channel 32,
Conference 8-party credits:
G711 58, G729 107, G722 129, ILBC 215
Secure Credits:
Sec LC Xcode 24, Sec HC Xcode 64,
Sec MC Xcode 35, Sec G729 conf 161,
Sec G722 conf 215, Sec ILBC conf 322,
Sec G711 conf 92,
Max Conference Parties per DSP:
G711 88, G729 48, G722 40, ILBC 24,
Sec G711 56, Sec G729 32,
Sec G722 24 Sec ILBC 16,
Voice Channels:
g711perdsp = 43, g726perdsp = 32, g729perdsp = 20, g729aperdsp = 32,
g723perdsp = 20, g728perdsp = 20, g723perdsp = 20, gsmperdsp = 32,
gsmefrperdsp = 20, gsmamrnbperdsp = 20,
ilbcperdsp = 20, modemrelayperdsp = 20
g72264Perdsp = 32, h324perdsp = 20,
m_f_thruperdsp = 43, faxrelayperdsp = 32,
maxchperdsp = 43, minchperdsp = 20,
srtp_maxchperdsp = 27, srtp_minchperdsp = 14,
faxrelay_srtp_perdsp = 14,
g711_srtp_perdsp = 27, g729_srtp_perdsp = 14,
g729a_srtp_perdsp = 24,
The output from Example 8-4 has the following important components:
Credits 645 — These are the total MIPS (million instructions per second) processing cycles for the DSP, which correspond to the total density of processing that the DSP can support.
G711Credits 15 — These are the number of credits used for a single G.711 call, also referred to as flex complexity (FC), which in this case is 15 credits per call.
HC Credits 32 — High-complexity codecs use 32 credits on this DSP platform.
MC Credits 20 — Medium-complexity codecs use 20 credits on this DSP platform.
FC Channel 43 — This is the total number of flex-complexity calls that can fit on the entire DSP. It is calculated by dividing the total number of credits for the DSP (645) by the codec complexity (15 for FC on this DSP).
HC Channel 20 — This is the same as FC Channel but for high-complexity codecs (that is, 645 / 32 = 20 high-complexity calls on the DSP).
Sec HC Xcode 64 — Transcoding to a high-complexity SRTP codec takes 64 credits. (The encryption requires more DSP processing than that of a normal high-complexity call.)
g711perdsp = 43 — This begins the list of each codec supported on the platform and the total calls supported per DSP for that codec. These are the same values as for Channel, listed earlier in the output, but now listed for each specific codec to help when the complexity is not known for a codec with a specific DSP type.
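The credit arithmetic described in this list can be checked with a short Python sketch. The function names are illustrative; the credit values are the ones shown in Example 8-4 for this DSP type.

```python
def channels_per_dsp(total_credits: int, credits_per_call: int) -> int:
    """Return how many calls of one complexity fit on a DSP (whole calls only)."""
    return total_credits // credits_per_call


def remaining_credits(total_credits: int, calls) -> int:
    """Return the credits left after loading a mix of calls onto one DSP.

    calls: iterable of (call_count, credits_per_call) pairs.
    """
    used = sum(count * cost for count, cost in calls)
    if used > total_credits:
        raise ValueError("call mix exceeds the DSP's credits")
    return total_credits - used


TOTAL, FC, MC, HC = 645, 15, 20, 32  # credit values from Example 8-4

print(channels_per_dsp(TOTAL, FC), channels_per_dsp(TOTAL, MC), channels_per_dsp(TOTAL, HC))
# A mix of 18 high-complexity calls and 4 flex-complexity (G.711) calls.
print(remaining_credits(TOTAL, [(18, HC), (4, FC)]))
```

Any credits left over after a given mix are too few to host another call of either type, which is the unused DSP capacity that mixed-codec loading can produce.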
Troubleshooting Scalability Issues
To validate that there aren’t capacity issues on CUBE, there are three key components to observe:
CPU utilization and system load
Memory utilization
Media resource rejection
The troubleshooting methodologies for these items are described in the subsequent subsections.
CPU Utilization and System Load
It is important to monitor the CPU and system load on a platform to ensure that the calls per second and message processing are not putting the system in a situation where message processing is degraded. The goal is to keep average CPU utilization below 75%. Monitoring of CPU through SNMP is discussed at length in Chapter 14, “Monitoring and Management.”
As mentioned earlier in this chapter, though, CPU utilization does not provide the most accurate measurement of load on a platform for processing. System load is a measurement of the processes that are running or waiting for CPU, including any processes that are getting starved, and it more accurately reflects system processing performance than CPU utilization. With IOS-XE platforms (discussed in Chapter 2, “SBC Deployment Models”), system load can be observed with the command shown in Example 8-5.
Example 8-5 Validation of System Load on IOS-XE
Router# show platform software status control-processor brief | section Load
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   0.25   0.30   0.44
 ESP0 Healthy   0.01   0.05   0.02
 SIP0 Healthy   0.15   0.07   0.01
In Example 8-5, note the presence of a discrete entry for each processing module (for example, RP0, ESP0, SIP0).
The target values for system load depend upon how many processing cores are available for each of the different control processor types. System load values should not peak above the number of cores available to the specific processing module. When system load values are greater than the number of processing cores, demand exceeds what the platform can process, and processes wait in a queue for processor time, which results in some amount of performance degradation. The number of cores on a module can be found by referencing the corresponding data sheet for that module.
Example 8-6 demonstrates excessive load in several of the averages. This example is from a platform with two cores for RP0, two cores for ESP0, and two cores for SIP0. Therefore, to prevent processing latency, each load average should stay below the number of cores (2). In the output, any value greater than 2 represents excessive load.
Example 8-6 Excessive System Load on IOS-XE
Router# show platform software status control-processor brief | section Load
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   1.25   1.30   3.44
 ESP0 Healthy   3.01   2.05   1.02
 SIP0 Healthy   1.15   2.07   1.01
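The comparison of each load average against the core count can be automated. The following Python sketch is a minimal illustration, not a Cisco-provided tool: the function name find_overloaded and the cores_per_slot mapping are hypothetical, the core counts must be taken from the module data sheets, and the sample text is the output of Example 8-6.

```python
import re

# One data row of the "Load Average" section: slot, status, then three averages.
LOAD_ROW = re.compile(r"(\S+)\s+(\S+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)")
INTERVALS = ("1-Min", "5-Min", "15-Min")


def find_overloaded(output, cores_per_slot):
    """Return (slot, interval, load) for every average above the slot's core count."""
    findings = []
    for line in output.splitlines():
        match = LOAD_ROW.fullmatch(line.strip())
        if match is None:
            continue  # prompt, header, or blank line
        slot = match.group(1)
        cores = cores_per_slot.get(slot)
        if cores is None:
            continue  # no core count was supplied for this slot
        for interval, text in zip(INTERVALS, match.groups()[2:]):
            load = float(text)
            if load > cores:
                findings.append((slot, interval, load))
    return findings


# Output of Example 8-6; core counts (2 per module) as stated in the text.
SAMPLE_LOAD = """
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   1.25   1.30   3.44
 ESP0 Healthy   3.01   2.05   1.02
 SIP0 Healthy   1.15   2.07   1.01
"""
SAMPLE_CORES = {"RP0": 2, "ESP0": 2, "SIP0": 2}

if __name__ == "__main__":
    for slot, interval, load in find_overloaded(SAMPLE_LOAD, SAMPLE_CORES):
        print(f"{slot} {interval} load {load} exceeds the core count")
```

The same check applies to values polled through SNMP; only the parsing step would differ.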
It is advisable to log the system load values to a central monitoring appliance so that the system load can be trended and correlated to busy call periods in the network. System load trends should be periodically reviewed for the peak system load values, with the CPS or messages per second being observed during that period of time. It is also important to set a threshold to alert administrators if the system load is peaking above the number of cores on the platform to raise awareness when there may be degradation in platform performance. These monitoring concepts are discussed further in Chapter 14. Table 8-2 lists examples of OIDs for system load on IOS-XE.
Table 8-2 OIDs Corresponding to System Load for IOS-XE
Time Interval | Module | OID
1-minute | RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.24.2
1-minute | ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.24.3
1-minute | SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.24.4
5-minute | RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.25.2
5-minute | ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.25.3
5-minute | SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.25.4
15-minute | RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.26.2
15-minute | ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.26.3
15-minute | SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.26.4
Because system load is highly correlated to the CPS and messages per second on the platform, it is often beneficial to understand where the CPS value is peaking on the platform.
CUBE has a useful command, show call history stats cps, that can be used to identify details on the CPS occurring up to the last 72 hours on the platform (see Example 8-7).
Example 8-7 Sample Output of show call history stats cps to Observe Max CPS
CUBE# show call history stats cps
2 1 1 1
10
9
8
7
6
5
4
3
2 *
1 * * * *
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Call switching rate / CPS (last 60 seconds)
# = calls handled by the module per second
12 4 1124 4
111 11 11 11 111118285411111111111118759111111 111 811111
100
90
80
70
60
50 * *
40 * * *
30 * ** *
20 ** * **** *
10 *#**# **** *
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Call switching rate / CPS (last 60 minutes)
* = maximum calls/s # = average calls/s
54544454 4 2414 3444455444444 44 2344444 3 42 2 54455
28467504111111361811258460026679811 11111841111153532591219 193116234750
100
90
80
70
60 *
50 ******* * * ** *** ***** * * ** * * ***
40 ******** * * * ************ ** ***** * * *****
30 ******** * * * ************* ** ******* * * * *****
20 ******** * ** * ************* ** ******* * ** * *****
10 ******** * **** ************* ** ******* * ** * *****
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
0 5 0 5 0 5 0 5 0 5 0 5 0
Call switching rate / CPS (last 72 hours)
* = maximum calls/s # = average calls/s
The output in Example 8-7 is useful for understanding the peak values for CPS, but its history is limited to 72 hours. Alternatively, the command show call history watermark cps table provides absolute timestamps for the top five peak values in each of four intervals. The titles of the sections are somewhat misleading: the Second, Minute, Hour, and All-Time tables represent the previous 60 seconds, 60 minutes, 72 hours, and all time (since the last router reload), respectively.
The output shown in Example 8-8 demonstrates an all-time peak of 63 calls per second at Thu, 26 Oct 2017 17:16:21 GMT.
Example 8-8 Example Output for show call history watermark cps table to Observe Times of Peak CPS
CUBE# show call history watermark cps table
===============================================
          Calls Per Second / CPS
===============================================
------- The WaterMark Table for Second --------
_______________________________________________
Value : 2, ts : [Wed, 01 Nov 2017 13:59:28 GMT]
Value : 1, ts : [Wed, 01 Nov 2017 13:59:19 GMT]
Value : 1, ts : [Wed, 01 Nov 2017 13:59:12 GMT]
Value : 1, ts : [Wed, 01 Nov 2017 13:59:04 GMT]
Value : 0, ts : [Wed, 01 Nov 2017 13:59:29 GMT]
===============================================
------- The WaterMark Table for Minute-------
_______________________________________________
Value : 43, ts : [Wed, 01 Nov 2017 13:17:24 GMT]
Value : 55, ts : [Wed, 01 Nov 2017 13:15:20 GMT]
Value : 26, ts : [Wed, 01 Nov 2017 13:34:39 GMT]
Value : 42, ts : [Wed, 01 Nov 2017 13:18:24 GMT]
Value : 23, ts : [Wed, 01 Nov 2017 13:37:28 GMT]
===============================================
------- The WaterMark Table for Hour --------
_______________________________________________
Value : 55, ts : [Sun, 29 Oct 2017 14:15:24 GMT]
Value : 54, ts : [Wed, 01 Nov 2017 11:00:00 GMT]
Value : 53, ts : [Sun, 29 Oct 2017 18:00:30 GMT]
Value : 52, ts : [Wed, 01 Nov 2017 13:10:40 GMT]
Value : 50, ts : [Wed, 01 Nov 2017 07:20:00 GMT]
===============================================
------- The WaterMark Table for All-Time-------
_______________________________________________
Value : 63, ts : [Thu, 26 Oct 2017 17:16:21 GMT]
Value : 61, ts : [Fri, 01 Sep 2017 18:02:03 GMT]
Value : 58, ts : [Fri, 20 Oct 2017 13:01:36 GMT]
Value : 58, ts : [Fri, 15 Sep 2017 12:13:41 GMT]
Value : 56, ts : [Mon, 18 Sep 2017 06:10:20 GMT]
===============================================
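When this output is collected by a script, the peak value and its timestamp can be extracted per section so that they can be correlated with system load trends. The following Python sketch is an illustration only: the function name peak_per_section is hypothetical, and the sample text is an excerpt of Example 8-8. Because the entries within a section are not necessarily sorted (see the Minute table), the sketch compares every entry rather than taking the first one.

```python
import re
from email.utils import parsedate_to_datetime

# Matches either a section title or one "Value : N, ts : [...]" entry.
TOKEN = re.compile(
    r"WaterMark Table for\s+(?P<section>Second|Minute|Hour|All-Time)"
    r"|Value\s*:\s*(?P<value>\d+),\s*ts\s*:\s*\[(?P<ts>[^\]]+)\]"
)


def peak_per_section(output):
    """Return {section: (peak_value, timestamp)} from the watermark table output."""
    peaks = {}
    section = None
    for match in TOKEN.finditer(output):
        if match.group("section"):
            section = match.group("section")
            continue
        if section is None:
            continue  # entry seen before any section title
        value = int(match.group("value"))
        timestamp = parsedate_to_datetime(match.group("ts"))
        if section not in peaks or value > peaks[section][0]:
            peaks[section] = (value, timestamp)
    return peaks


# Excerpt of Example 8-8 (Minute and All-Time sections only).
SAMPLE_WATERMARK = """
------- The WaterMark Table for Minute-------
Value : 43, ts : [Wed, 01 Nov 2017 13:17:24 GMT]
Value : 55, ts : [Wed, 01 Nov 2017 13:15:20 GMT]
Value : 26, ts : [Wed, 01 Nov 2017 13:34:39 GMT]
------- The WaterMark Table for All-Time-------
Value : 63, ts : [Thu, 26 Oct 2017 17:16:21 GMT]
Value : 61, ts : [Fri, 01 Sep 2017 18:02:03 GMT]
"""

if __name__ == "__main__":
    for name, (value, timestamp) in peak_per_section(SAMPLE_WATERMARK).items():
        print(f"{name}: peak {value} CPS at {timestamp.isoformat()}")
```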
Finally, although CPS is the typical means by which SBC vendors size platforms, processing load is determined more by the overall number of messages processed. CUBE also has a way to uncover the average and peak SIP messages per second, through the respective commands show sip-ua history stats message-rate and show sip-ua history watermark message-rate table. The structure of the output of these commands is identical to that found in Examples 8-7 and 8-8, respectively, except that it represents the message rate history instead of the CPS history.
The values for the commands used in Examples 8-7 and 8-8 are also exposed with SNMP through cvCallVolumeStatsHistory, and the parent OIDs of interest are shown in Table 8-3.
Table 8-3 SNMP OIDs for CUBE Call Volume Statistics
SNMP Object | OID
cvCallRateStatsTable | 1.3.6.1.4.1.9.9.63.1.4.3.1
cvSipMsgRateStatsTable | 1.3.6.1.4.1.9.9.63.1.4.3.5
cvSipMsgRateWMTable | 1.3.6.1.4.1.9.9.63.1.4.3.9
cvCallRateWMTable | 1.3.6.1.4.1.9.9.63.1.4.3.6
Each of the OIDs from Table 8-3 has child attributes that offer maximum and average values for the defined interval.
Memory Utilization
In addition to observing processing load, it is important to continually monitor memory utilization of a CUBE platform. Monitoring memory is important for two main reasons: to identify whether the concurrent call volume is encroaching on the limit of the platform, and to detect the presence of memory leaks.
Much as when monitoring or troubleshooting CPU and system load issues, it is useful to monitor and trend memory utilization with network management software. Trends in memory utilization are useful for identifying times when memory utilization is exceptionally high.
For classic IOS platforms, show memory statistics history can be used to observe a historical trend of up to 72 hours. The two main items of interest are the average memory utilization and whether available memory is close to being exhausted. Example 8-9 provides an example of output from this command.
Example 8-9 Output of show memory statistics history
CUBE# show memory statistics history
------------------ History of Processor Mempool ------------------
777777777777777777777777777777777777777777777777777777777777
222222222222222222222222222222222222222222222222222222222222
100
90
80
70 **********************************************************
60 **********************************************************
50 **********************************************************
40 **********************************************************
30 **********************************************************
20 **********************************************************
10 **********************************************************
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Free memory per second (last 60 seconds)
777777777777777777777777777777777777777777777777777777777777
222222222222222222222222222222222222222222222222222222222222
100
90
80
70 ##########################################################
60 ##########################################################
50 ##########################################################
40 ##########################################################
30 ##########################################################
20 ##########################################################
10 ##########################################################
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Free memory per minute (last 60 minutes)
* = maximum # = average
777777777777777777777777777777777777777777777777777777777777777777777777
222222222233333333333333444444444444444555555555555555666666666666667777
100
90
80 **#############################
70 ######################################################################
60 ######################################################################
50 ######################################################################
40 ######################################################################
30 ######################################################################
20 ######################################################################
10 ######################################################################
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
0 5 0 5 0 5 0 5 0 5 0 5 0
Free memory per hour (last 72 hours)
* = maximum # = average
Remember that Linux-based platforms handle memory differently. Memory that appears consumed may be in use as cache, and this cache is discarded or reduced later, when other applications need to utilize that memory space. As a result, with IOS-XE, it is not useful to observe memory utilization in the way described for classic IOS, as memory reserved for cache is still potentially available for use when needed.
For IOS-XE, instead of monitoring utilized memory, you can use the command show platform software status control-processor brief to observe the instantaneous committed memory utilization. Example 8-10 shows abbreviated output of this command, which is obtained by appending | section Memory to the command.
Example 8-10 Viewing Committed Memory for IOS-XE
CUBE# show platform software status control-processor brief | section Memory
Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
  RP0 Healthy  3874504  2188404 (56%)  1686100 (44%)   2155996 (56%)
 ESP0 Healthy   969088   590880 (61%)   378208 (39%)    363840 (38%)
 SIP0 Healthy   471832   295292 (63%)   176540 (37%)    288540 (61%)
The output in Example 8-10 shows the committed memory (that is, the amount of utilized memory minus the memory used for temporary cache), which cannot be relinquished and repurposed for other needs. In other words, the committed memory is the memory that the system needs for current operations. Example 8-10 shows that 23% of the ESP0 memory (calculated as the difference between 61% and 38%) is being used for temporary cache.
When trending this memory construct across time, the SNMP OIDs shown in Tables 8-4 through 8-6 are useful.
Table 8-4 OIDs for Committed Memory (cpmCPUMemoryHCCommitted)
Module | OID
RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.29.2
ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.29.3
SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.29.4
The memory that is functionally available is the difference between the total memory on the platform and cpmCPUMemoryHCCommitted. To convert the raw committed memory values from Table 8-4 into a percentage utilized, divide the committed memory by the sum of the used and free memory (which is effectively the total memory on the platform). The corresponding OIDs for used and free memory are outlined in Tables 8-5 and 8-6.
Table 8-5 OIDs for Memory Used, Inclusive of Temporary Cache (cpmCPUMemoryHCUsed)
Module | OID
RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.17.2
ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.17.3
SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.17.4
Table 8-6 OIDs for Memory Available but Not Being Used (cpmCPUMemoryHCFree)
Module | OID
RP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.19.2
ESP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.19.3
SIP0 | 1.3.6.1.4.1.9.9.109.1.1.1.1.19.4
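The percentage calculation described before Tables 8-5 and 8-6 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the inputs are the ESP0 values from Example 8-10. The three inputs only need to share the same unit, because the result is a ratio.

```python
def committed_percent(committed, used, free):
    """Committed memory as a percentage of total memory (used + free)."""
    total = used + free
    return 100.0 * committed / total


def cache_percent(committed, used, free):
    """Portion of total memory held as temporary cache (used minus committed)."""
    total = used + free
    return 100.0 * (used - committed) / total


if __name__ == "__main__":
    # ESP0 row of Example 8-10, in kB.
    esp0 = {"committed": 363840, "used": 590880, "free": 378208}
    print(f"ESP0 committed: {committed_percent(**esp0):.0f}%")
    print(f"ESP0 cache:     {cache_percent(**esp0):.0f}%")
```

For the ESP0 values, the results round to 38% committed and 23% cache, which agrees with the percentages shown in Example 8-10 and the difference calculated from them.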
When troubleshooting potential scalability issues on a platform, it is important to observe the utilized (IOS) or committed (IOS-XE) memory to ensure that there is enough available memory for the platform to perform adequately. From the perspective of CUBE scalability, if used memory (IOS) or committed memory (IOS-XE) approaches the physical limit of the platform, the concurrent call volume being handled across the platform should be decreased.
Media Resource Capacity
It is often not possible to anticipate the number of calls that will need media resources in an actual call environment, as the need to transcode or transrate calls often depends upon external factors that may be intermittent from the remote side of the integration. For example, it is not uncommon for calls across CUBE to an ITSP with exactly the same calling and called numbers to intermittently offer different codecs, determined by the call path taken in the ITSP cloud for each call instance. As a result, it is usually necessary to monitor and tune the available media resources once production call load on the system can be observed. From this, the percentage of calls that need media resources can be determined and then used as a benchmark for scaling media resources as demand changes the overall call volume across the SBC.
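As a worked illustration of using the observed percentage as a benchmark, the following Python sketch projects the number of media resource sessions needed at a different call volume. The function name and all of the figures are hypothetical; they are not sizing guidance for any platform, and the sketch assumes that the observed ratio stays constant as call volume grows.

```python
def projected_media_sessions(observed_media_calls, observed_total_calls,
                             projected_total_calls):
    """Scale the observed share of calls needing media resources to a new volume.

    Uses integer ceiling division so that a fractional session rounds up.
    """
    if observed_total_calls <= 0:
        raise ValueError("observed_total_calls must be positive")
    return -(-observed_media_calls * projected_total_calls // observed_total_calls)


if __name__ == "__main__":
    # Hypothetical figures: 40 of 500 concurrent calls needed a transcoder (8%).
    # At a projected 800 concurrent calls, the same ratio implies 64 sessions.
    print(projected_media_sessions(40, 500, 800))
```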
Utilization of media resources can be observed in real time with the command show dspfarm all. Example 8-11 shows output of this command, with active sessions for a transcoder (profile 3).
Example 8-11 Validating Active Media Resource Utilization with show dspfarm all
Cube1# show dspfarm all
Dspfarm Profile Configuration
 Profile ID = 3, Service = TRANSCODING, Resource ID = 3
 Profile Service Mode : Non Secure
 Profile Admin State : UP
 Profile Operation State : ACTIVE
 Application : CUBE   Status : ASSOCIATED
 Resource Provider : FLEX_DSPRM   Status : UP
 Total Number of Resources Configured : 2
 Total Number of Resources Available : 1
 Total Number of Resources Out of Service : 0
 Total Number of Resources Active : 1
 Codec Configuration: num_of_codecs:5
 Codec : g729r8, Maximum Packetization Period : 60
 Codec : g711ulaw, Maximum Packetization Period : 30
 Codec : g711alaw, Maximum Packetization Period : 30
 Codec : g729ar8, Maximum Packetization Period : 60
 Codec : g729abr8, Maximum Packetization Period : 60
SLOT DSP VERSION STATUS CHNL USE  TYPE  RSC_ID BRIDGE_ID PKTS_TXED PKTS_RXED
0/1  1   46.2.0  UP     1    USED xcode 1      13        0         5
0/1  1   46.2.0  UP     N/A  FREE xcode 4      -         -         -
Total number of DSPFARM DSP channel(s) 2
Some of the statistics shown in Example 8-11 are also available through SNMP. Tables 8-7 and 8-8 outline the applicable OIDs for these statistics for overall and profile-specific utilization of media resources.
Table 8-7 SNMP OIDs for Overall Media Resource Utilization
Object | OID | Description
cdspTotAvailTranscodeSess | 1.3.6.1.4.1.9.9.86.1.7.1 | Total configured transcoder sessions across all profiles on the platform
cdspTotUnusedTranscodeSess | 1.3.6.1.4.1.9.9.86.1.7.2 | Total unused transcoder sessions across all profiles on the platform
Table 8-8 SNMP OIDs for Profile-Specific Media Resource Utilization
Object | OID | Description
cdspTranscodeProfileMaxConfSess | 1.3.6.1.4.1.9.9.86.1.6.3.1.2 | Maximum configured transcoding sessions per profile
cdspTranscodeProfileMaxAvailSess | 1.3.6.1.4.1.9.9.86.1.6.3.1.3 | Current available transcoding sessions per profile
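The objects in Tables 8-7 and 8-8 report configured and unused (or available) sessions, so utilization has to be derived from the pair. The following Python sketch shows the arithmetic under that assumption; the function name is hypothetical, and the polled values would come from whatever SNMP tooling is in use.

```python
def transcoder_utilization(configured_sessions, unused_sessions):
    """Percentage of configured transcoder sessions currently in use."""
    if configured_sessions <= 0:
        return 0.0  # nothing configured, so nothing can be in use
    in_use = configured_sessions - unused_sessions
    return 100.0 * in_use / configured_sessions


if __name__ == "__main__":
    # Example 8-11: two resources configured, one still available.
    print(f"{transcoder_utilization(2, 1):.0f}% of transcoder sessions in use")
```

The same function applies per profile when it is fed cdspTranscodeProfileMaxConfSess and cdspTranscodeProfileMaxAvailSess instead of the platform-wide totals.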
When media resources are depleted, CUBE fails the call with a Q.850 cause value of 47 (Resource unavailable, unspecified). It is therefore useful to monitor when calls fail with this cause code on CUBE. The most robust way to do this is to observe cause values through centralized CDR collection from CUBE.
The command show h323 gateway cause-code can be used to quickly observe whether there is an increase in the number of call failure codes caused by a lack of media resources. Example 8-12 shows output for this command, which is available for use with H.323 calls through CUBE.
Example 8-12 Output of show h323 gateway cause-code for Observing Media Resource Failure
CUBE# show h323 gateway cause-code
CAUSE CODE STATISTICS AT 01:40:25
DISC CAUSE CODE            FROM OTHER PEER  FROM H323 PEER
16 normal call clearing    66               4976
31 normal, unspecified     1                0
34 no circuit              31               0
41 temporary failure       3                0
44 no requested circuit    13               0
47 no resource             3                17
In Example 8-12, the local CUBE has run out of media resources 17 times, and one of its remote peers has run out of media resources 3 times. The former would be remediated by adding more resources for this CUBE, whereas the latter would need to be addressed by the administrator of the remote peer.
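Because these counters are cumulative, an increase is easiest to see by comparing two captures of the command output. The following Python sketch is an illustration only: the function names are hypothetical, the later capture is the output of Example 8-12, and the earlier capture is invented for the comparison.

```python
import re

# A counter row: cause code, description, then the two peer counters.
CAUSE_ROW = re.compile(r"(\d+)\s+(.+?)\s+(\d+)\s+(\d+)")


def parse_cause_codes(output):
    """Return {cause_code: (from_other_peer, from_h323_peer)}."""
    counters = {}
    for line in output.splitlines():
        match = CAUSE_ROW.fullmatch(line.strip())
        if match:
            counters[int(match.group(1))] = (int(match.group(3)), int(match.group(4)))
    return counters


def cause_code_increase(earlier, later, code=47):
    """Return the per-column increase for one cause code between two captures."""
    old = parse_cause_codes(earlier).get(code, (0, 0))
    new = parse_cause_codes(later).get(code, (0, 0))
    return (new[0] - old[0], new[1] - old[1])


# Hypothetical earlier capture, invented for this illustration.
EARLIER_CAPTURE = """
DISC CAUSE CODE            FROM OTHER PEER  FROM H323 PEER
16 normal call clearing    60               4800
47 no resource             3                12
"""

# Rows taken from Example 8-12.
LATER_CAPTURE = """
DISC CAUSE CODE            FROM OTHER PEER  FROM H323 PEER
16 normal call clearing    66               4976
31 normal, unspecified     1                0
47 no resource             3                17
"""

if __name__ == "__main__":
    other, h323 = cause_code_increase(EARLIER_CAPTURE, LATER_CAPTURE)
    print(f"Cause 47 increase: other peer +{other}, H.323 peer +{h323}")
```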
For SIP gateways, the presence of calls that have failed with SIP response code 488 can be observed in the NotAcceptableMedia counter in the output of show sip-ua statistics, as shown in Example 8-13.
Example 8-13 Output of show sip-ua statistics to Validate Calls Failed with Cause Value 488
CUBE# show sip-ua statistics
SIP Response Statistics (Inbound/Outbound)
Informational:
Trying 0/0, Ringing 0/0,
Forwarded 0/0, Queued 0/0,
SessionProgress 0/0
Success:
OkInvite 0/0, OkBye 0/0,
OkCancel 0/0, OkOptions 0/0,
OkPrack 0/0, OkPreconditionMet 0/0,
OkSubscribe 0/0, OkNOTIFY 0/0,
OkInfo 0/0, 202Accepted 0/0
OkRegister 12/49
Redirection (Inbound only except for MovedTemp(Inbound/Outbound)) :
MultipleChoice 0, MovedPermanently 0,
MovedTemporarily 0/0, UseProxy 0,
AlternateService 0
Client Error:
BadRequest 0/0, Unauthorized 0/0,
PaymentRequired 0/0, Forbidden 0/0,
NotFound 0/0, MethodNotAllowed 0/0,
NotAcceptable 0/0, ProxyAuthReqd 0/0,
ReqTimeout 0/0, Conflict 0/0, Gone 0/0,
ReqEntityTooLarge 0/0, ReqURITooLarge 0/0,
UnsupportedMediaType 0/0, BadExtension 0/0,
TempNotAvailable 0/0, CallLegNonExistent 0/0,
LoopDetected 0/0, TooManyHops 0/0,
AddrIncomplete 0/0, Ambiguous 0/0,
BusyHere 0/0, RequestCancel 0/0,
NotAcceptableMedia 0/13, BadEvent 0/0,
SETooSmall 0/0
Server Error:
InternalError 0/0, NotImplemented 0/0,
BadGateway 0/0, ServiceUnavail 0/0,
GatewayTimeout 0/0, BadSipVer 0/0,
PreCondFailure 0/0
Global Failure:
BusyEverywhere 0/0, Decline 0/0,
NotExistAnywhere 0/0, NotAcceptable 0/0
Miscellaneous counters:
RedirectRspMappedToClientErr 0
SIP Total Traffic Statistics (Inbound/Outbound)
Invite 0/0, Ack 0/0, Bye 0/0,
Cancel 0/0, Options 0/0,
Prack 0/0, Comet 0/0,
Subscribe 0/0, NOTIFY 0/0,
Refer 0/0, Info 0/0
Register 49/16
Retry Statistics
Invite 0, Bye 0, Cancel 0, Response 0,
Prack 0, Comet 0, Reliable1xx 0, Notify 0
Register 4, Subscribe 0
SDP application statistics:
Parses: 0, Builds 0
Invalid token order: 0, Invalid param: 0
Not SDP desc: 0, No resource: 0
Last time SIP Statistics were cleared: <never>
The counters from this command can be reset with clear sip-ua statistics.
Just as the count of no resource conditions can be observed with the command in Example 8-13, it can also be monitored through SNMP for SIP calls with OID 1.3.6.1.4.1.9.9.152.1.2.4.50 (cSipStatsClientNoAcceptHereOuts).
Cause codes and CDR failure monitoring are discussed further in Chapter 14, “Monitoring and Management.”