Software Architecture

The operating system software is based on a Linux software kernel and runs specific applications in the system such as monitoring tasks, various protocol stacks, and other items. The following figure shows an example block diagram of the operating system's software architecture.


Figure 1. Software Architecture Block Diagram

The software architecture is designed for high availability, flexibility, and performance. The system achieves these goals by implementing the following key software features:

  • Scalable control and data operations: System resources can be allocated separately for control and data paths. For example, certain processing cards could be dedicated to performing routing or security control functions while other cards are dedicated to processing user session traffic. As network requirements grow and call models change, hardware resources can be added to accommodate processes, such as encryption and packet filtering, that require more processing power. Additionally, certain software tasks are dynamically sized based on hardware and installed licenses, thus conserving system memory.
  • Fault containment: The system isolates faults at the lowest possible level through its High Availability Task (HAT) function, which monitors all system entities for faults and performs automatic recovery and failover procedures using its Recovery Control Task (RCT). Processing tasks are distributed into multiple instances running in parallel so that if an unrecoverable software fault occurs, the entire processing capability for that task is not lost. User session processes can be sub-grouped into collections of sessions so that if a problem is encountered in one sub-group, users in another sub-group are not affected by it. The architecture also allows check-pointing of processes, a mechanism that protects the system against the failure of critical software processes. The self-healing attributes of the software architecture protect the system by anticipating failures and instantly spawning mirror processes locally or across card boundaries to continue operation with little or no disruption of service. This unique architecture allows the system to perform at the highest level of resiliency and protects the user's data sessions while ensuring complete accounting data integrity.
  • Promotes internal location transparency: Processes can be distributed across the system to fit the needs of the network model and specific process requirements. For example, most tasks can be configured to execute on an SMC or a processing card, while some processor-intensive tasks can also be performed across multiple processing cards to utilize multiple CPU resources. Distribution of these tasks is invisible to the user.
  • Leverages third-party software components: The use of the Linux operating system kernel enables reuse of many well-tested, stable, core software elements such as protocol stacks, management services, and application programs.
  • Supports dynamic hardware removal/additions: By migrating tasks from one card to another via software controls, application cards can be “hot swapped” to dynamically add capacity and perform maintenance operations without service interruption.
  • Multiple context support: The system can be fully virtualized to support multiple logical instances of each service. This eliminates the possibility of any one domain disrupting operations for all users in the event of a failure. Further, multiple context support allows operators to assign duplicate/overlapping IP address ranges in different contexts.
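
The multiple context feature listed above can be illustrated with a small sketch. This is not the system's implementation: the `Context` class, the context names, and the 10.0.0.0/24 pool are hypothetical and chosen only to show why overlapping address ranges do not collide when every pool is scoped to its own context.

```python
import ipaddress


class Context:
    """Hypothetical stand-in for one logical context with its own IP pool."""

    def __init__(self, name, pool_cidr):
        self.name = name
        self.network = ipaddress.ip_network(pool_cidr)
        self._free = iter(self.network.hosts())  # usable host addresses, in order
        self.assigned = {}                       # subscriber id -> address

    def allocate(self, subscriber):
        """Assign the next free address in this context's pool to a subscriber."""
        address = next(self._free)
        self.assigned[subscriber] = address
        return address


# Two contexts configured with the same (overlapping) address range.
ctx_a = Context("context-a", "10.0.0.0/24")
ctx_b = Context("context-b", "10.0.0.0/24")

addr_a = ctx_a.allocate("subscriber-1")
addr_b = ctx_b.allocate("subscriber-2")

# The same address is handed out twice, but each assignment lives in its own context.
print(ctx_a.name, addr_a)
print(ctx_b.name, addr_b)
```

Because each context keeps its own pool and assignment table, an address is only meaningful together with the context it belongs to.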

Understanding the Distributed Software Architecture

To better understand the advantages of the system’s distributed software architecture, this section presents an overview of the various components used in processing a subscriber session. Numerous benefits are derived from the system’s ability to distribute and manage sessions across the entire system. The following information is intended to familiarize you with some of the components and terminology used in this architecture.

Software Tasks

To provide unprecedented levels of software redundancy, scalability, and robust call processing, the system's software is divided into a series of tasks that perform specific functions. These tasks communicate with each other as needed to share control and data information throughout the system.

A task is a software process that performs a specific function related to system control or session processing. There are three types of tasks that operate within the system:

  • Critical tasks: These tasks control essential functions to ensure the system’s ability to process calls. Examples include system initialization and automatic error detection and recovery tasks.
  • Controller tasks: These tasks, often referred to as “Controllers”, serve several purposes, including:
      • Monitoring the state of their subordinate managers and allowing for intra-manager communication within the same subsystem.
      • Enabling inter-subsystem communication by communicating with controllers belonging to other subsystems.
    Controller tasks mask the distributed nature of the software from the user, allowing ease of management.
  • Manager tasks: Often referred to as “Managers”, these tasks control system resources and maintain logical mappings between system resources. Some managers are also directly responsible for call processing.

System-level processes can be distributed across multiple processors, reducing the overall workload on any given processor and thereby improving system performance. Additionally, this distributed design provides fault containment that greatly minimizes the number of processes or PPP sessions affected by a failure.

The SMC has a single Control Processor (CP) that is responsible for running tasks related to system management and control. Each PSC contains two CPs (CPU 0 and CPU 1). The CPs on the processing cards are responsible for PPP and call processing, and for running the various tasks and processes required to handle the mobile data call. In addition to the CPs, the processing cards also have a high-speed Network Processor Unit (NPU) used for enhanced IP forwarding.
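
The controller/manager relationship and the fault containment it provides can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's code: the `Controller` and `SessionManager` classes, the least-loaded placement rule, and the instance count are all hypothetical.

```python
class SessionManager:
    """Hypothetical manager task: owns one sub-group of sessions."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.sessions = set()
        self.alive = True


class Controller:
    """Hypothetical controller task: hides the manager instances from callers."""

    def __init__(self, manager_count):
        self.managers = [SessionManager(i) for i in range(manager_count)]

    def add_session(self, session_id):
        # Assumed placement rule: use the least-loaded live manager instance.
        live = [m for m in self.managers if m.alive]
        target = min(live, key=lambda m: len(m.sessions))
        target.sessions.add(session_id)
        return target.instance_id

    def fail_manager(self, instance_id):
        # Simulate an unrecoverable fault in one instance; return the sessions it held.
        manager = self.managers[instance_id]
        manager.alive = False
        affected, manager.sessions = manager.sessions, set()
        return affected

    def active_sessions(self):
        return set().union(*(m.sessions for m in self.managers))


controller = Controller(manager_count=4)
for n in range(8):
    controller.add_session(f"session-{n}")

affected = controller.fail_manager(0)
print(len(affected), "sessions affected;", len(controller.active_sessions()), "unaffected")
```

Callers only ever talk to the controller, so the number and placement of manager instances stay invisible to them, and a fault in one instance touches only the sessions in that instance's sub-group.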

Subsystems

Individual tasks that run on CPs can be divided into subsystems. A subsystem is a software element that either performs a specific task or is a combination of multiple other tasks. A single subsystem can consist of critical tasks, controller tasks, and manager tasks.

Following is a list of the primary software subsystems:

  • System Initiation Task (SIT) Subsystem: This subsystem is responsible for starting a set of initial tasks at system startup and individual tasks as needed.
  • High Availability Task (HAT) Subsystem: Working in conjunction with the Recovery Control Task (RCT) subsystem, HAT is responsible for maintaining the operational state of the system. HAT does this by monitoring the various software and hardware aspects of the system. On finding any unusual activity, such as the unexpected termination of a task, HAT takes a suitable action, such as triggering an event that prompts the RCT to take corrective action or report the status. The benefit of having this subsystem running on every processor is that, should an error occur, there is minimal or no impact to the service.
  • Recovery Control Task (RCT) Subsystem: Responsible for executing a defined recovery action for any failure that occurs in the system. The RCT subsystem receives recovery actions from the HAT subsystem. The RCT subsystem runs only on the active SMC and synchronizes the information it contains with the mirrored RCT subsystem on the standby management card.
  • Shared Configuration Task (SCT) Subsystem: Provides the system with a facility to set, retrieve, and be notified of system configuration parameter changes. This subsystem is primarily responsible for storing configuration data for the applications running within the system. The SCT subsystem runs only on the active SMC and synchronizes the information it contains with the mirrored SCT subsystem on the standby management card.
  • Resource Management (RM) Subsystem: The RM subsystem is responsible for assigning resources to every system task upon its start-up. Resources are items such as CPU loading and memory. RM also monitors these items to verify that the allocations are being followed. This subsystem is also responsible for monitoring all sessions and communicating with the Session Controller, a subordinate task of the Session subsystem, to enforce capacity licensing limits.
  • Virtual Private Network (VPN) Subsystem: Manages the administrative and operational aspects of all VPN-related entities in the system. The functions performed by the VPN subsystem include:
      • Creating separate VPN contexts
      • Starting the IP services within a VPN context
      • Managing IP pools and subscriber IP addresses
      • Distributing the IP flow information within a VPN context
    All IP operations within the system are done within specific VPN contexts. In general, packets are not forwarded across different VPN contexts. The only exception to this rule is the Session subsystem.
  • Network Processing Unit (NPU) Subsystem: The NPU subsystem is responsible for the following:
      • “Fast-path” processing of frames using hardware classifiers to determine each packet’s processing requirements
      • Receiving and transmitting user data frames to/from various physical interfaces
      • IP forwarding decisions (both unicast and multicast)
      • Per-interface packet filtering, flow insertion, deletion, and modification
      • Traffic management and traffic engineering
      • Passing user data frames to/from processing card CPUs
      • Modifying/adding/stripping datalink/network layer headers
      • Recalculating checksums
      • Maintaining statistics
      • Managing both external line card ports and the internal connections to the data and control fabrics
  • Card/Slot/Port (CSP) Subsystem: Coordinates the events that occur when any card is inserted, locked, unlocked, removed, shut down, or migrated. The CSP subsystem is responsible for all card activity in each of the 48 slots in the chassis. It is also responsible for performing auto-discovery and configuration of ports on a newly inserted line card, and for determining how line cards map to processing cards (including through an RCC in failover situations). The CSP subsystem runs only on the active SMC and synchronizes the information it contains with the mirrored SCT subsystem on the standby management card. It is started by the SIT subsystem and monitored by the HAT subsystem for failures.
  • Session Subsystem: The Session subsystem is responsible for performing and monitoring the processing of a mobile subscriber's data flows. Session processing tasks for mobile data calls include: A10/A11 termination for CDMA2000 networks, GPRS Tunneling Protocol (GTP) termination for GPRS and/or UMTS networks, asynchronous PPP processing, packet filtering, packet scheduling, Diffserv codepoint marking, statistics gathering, IP forwarding, and AAA services. Responsibility for each of these items is distributed across subordinate tasks (called Managers) to provide for more efficient processing and greater redundancy. A separate Session Controller task serves as an integrated control node to regulate and monitor each of the Managers and to communicate with the other active subsystems. This subsystem also manages all specialized user data processing, such as payload transformation, filtering, statistics collection, policing, and scheduling.
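
The interaction between the HAT and RCT subsystems described in the list above (HAT detects a fault and raises an event, RCT executes a recovery action) can be sketched as follows. This is a simplified illustration only: the class names, the task names, the state strings, and the choice of "restart the task" as the recovery action are assumptions made for the example, not details of the actual subsystems.

```python
from collections import deque


class RecoveryControl:
    """Hypothetical RCT stand-in: executes a defined recovery action per failure."""

    def __init__(self, tasks):
        self.tasks = tasks        # shared task table: task name -> state
        self.events = deque()     # failure events received from the monitor
        self.log = []

    def notify(self, task_name):
        self.events.append(task_name)

    def run(self):
        while self.events:
            name = self.events.popleft()
            self.tasks[name] = "running"   # assumed recovery action: restart the task
            self.log.append(f"restarted {name}")


class HighAvailabilityMonitor:
    """Hypothetical HAT stand-in: detects faults and hands them to recovery control."""

    def __init__(self, tasks, recovery):
        self.tasks = tasks
        self.recovery = recovery

    def poll(self):
        for name, state in self.tasks.items():
            if state == "terminated":      # unexpected termination detected
                self.recovery.notify(name)


tasks = {"task-a": "running", "task-b": "terminated", "task-c": "running"}
rct = RecoveryControl(tasks)
hat = HighAvailabilityMonitor(tasks, rct)

hat.poll()   # monitor detects the terminated task and raises an event
rct.run()    # recovery control executes the recovery action
print(rct.log)
print(tasks)
```

Separating detection from recovery in this way mirrors the division of responsibility in the text: the monitor only reports what it observes, and a single recovery component decides and carries out the corrective action.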