To simultaneously increase performance, features, scale, port density, and power efficiency in current and future routers, Cisco has architected and implemented a new family of network processing chips. Innovative use of custom silicon allows Cisco to create products that would not be possible using only general-purpose processors or hard-wired commodity solutions, which can be highly limiting.
Cisco® nPower network processors can meet the rapidly evolving demands on the network created by the Internet of Everything (IoE), where programmability is required with high performance and scale. The IoE requires not only increased network bandwidth but also large numbers of diverse flows and varying features for different flows, which will introduce new interactions between the routing infrastructure and cloud computing. The Cisco nPower processor supports the demands of the IoE and is designed to adapt to the dynamic evolution of internetworking without sacrificing performance, power efficiency, or port density.
The Need for a New Network-Processor Design
Until recently, networks have been built to support specific services and devices, such as high-speed broadband access, secure VPNs, smartphones, and mobile broadband, using a fixed network hierarchy and multiple overlay networks. But the days of predictable traffic flows and guaranteed service-level agreements (SLAs) managed manually by network administrators are over. With the IoE, spiraling demands for on-demand video, cloud services, LTE mobility, and machine-to-machine (M2M) applications, as well as rising customer expectations for ubiquitous high-quality service connectivity, are putting tremendous pressure on network infrastructure and operations.
To avoid rapidly increasing costs and deteriorating SLAs, operators and enterprises must transform their infrastructure. They must simplify operations to achieve greater agility, accelerate service creation, and explore new business models while reducing operational costs through automation.
Within this dynamic environment, even the network processing silicon is not immune from scrutiny. Often, the general-purpose processors in routers cannot achieve the necessary IoE performance, in which systems and subsystems must intercommunicate billions of events, connections, and operations per second. Traditionally, routers have been responsible for routing packets from one port to another and implementing designated quality of service (QoS) and security features. As the IoE evolves, these features remain important, but new requirements, such as programmability and application intelligence, become equally important.
It will no longer be feasible to apply a single feature set to all traffic passing through a router. With all the new types of traffic that will emerge in the IoE, whether generated by smart sensors, mobile devices, M2M transactions, or the greatly increased number of subscribers, many types of flows must be handled and treated differently.
The Cisco nPower Network Processor: Optimized for Networking
Cisco has created a unique architecture, based on modern processor design techniques, that allows the router to be programmed with the ease and flexibility of general-purpose processors. This simple adaptability contrasts with traditional network-processing architectures that are designed primarily for high speed, with only a limited degree of configurability and programmability. Such architectures can work well for a limited feature set or limited scale, but significant performance reductions usually occur when new features are added or high scale is required.
The IoE will consist of many different flows passing through the network, and those flows will have diverse feature requirements. The Cisco nPower X1, with 336 multi-threaded processor cores (672 threads), is designed to process many flows in parallel, so that different flows can be efficiently processed by different applications without interference between them. For example, one flow might require simple packet forwarding and would not be burdened by the processing of other flows that require other operations, such as connection setup. Applications may also directly interact with other entities in the network, supporting such features as coordinated control of flows. With the rich, high-level programming environment supported by the Cisco nPower processors, new features can be added quickly as they become important to the customer.
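The idea of processing many flows in parallel, each with its own feature set, can be illustrated with a minimal sketch. It is not Cisco's implementation: the `Flow` tuple, the CRC-based `thread_for` mapping, and the two handler functions are hypothetical names introduced here for illustration. The only figure taken from the text is the 672-thread count of the nPower X1.

```python
import zlib
from collections import namedtuple

# Hypothetical 5-tuple used to identify a flow (an assumption for this sketch).
Flow = namedtuple("Flow", "src dst sport dport proto")

NUM_THREADS = 672  # thread count stated in the text for the nPower X1


def thread_for(flow, num_threads=NUM_THREADS):
    """Map a flow to a thread index so that all packets of one flow
    are handled by the same thread (assumed dispatch policy)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % num_threads


def forward(pkt):
    """Simple forwarding: no per-flow state is touched."""
    return ("forward", pkt["len"])


def setup_connection(pkt):
    """Heavier operation, such as connection setup."""
    return ("setup", pkt["len"])


# Each kind of flow is bound to its own application; one kind does not
# pay for the processing required by another.
HANDLERS = {"data": forward, "control": setup_connection}


def process(pkt):
    """Return the thread chosen for the packet and the handler result."""
    return thread_for(pkt["flow"]), HANDLERS[pkt["kind"]](pkt)
```

In this sketch a packet tagged `"data"` runs only the forwarding handler, while a `"control"` packet runs only the setup handler, which mirrors the point that a simple forwarding flow is not burdened by the work of other flows.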
Although general-purpose server processors offer a certain level of flexibility and are readily available on the market, they lack characteristics needed to attain high performance in the network, such as a suitably fast memory system or a sufficient number of processor cores and threads. Compared to these processors, the Cisco nPower processor can achieve higher total performance and much greater power efficiency for networking applications, because the processor architecture and implementation are optimized to handle complex networking operations.
At the other end of the processing spectrum, traditional network processing solutions are often implemented as a "pipeline," with configurable hardware responsible for each feature in the pipeline. These architectures often fail, or their performance degrades, when the packet flows have diverse requirements or when unanticipated new features, such as new forms of tunneling, are required. The Cisco nPower processor architecture avoids these types of problems and, with its programmable capabilities, is purpose-built to address the requirements of software-defined networking (SDN) and OpenFlow standards with no modification required to the silicon.
Cisco nPower Memory System
In any high-performance processing system, memory performance is crucial. In general, memory access is required for the operations used to implement each feature of the router, such as table lookups, gathering of network statistics, and enforcement of QoS policies. Therefore, adding new features without substantial performance degradation requires both additional processing power and higher memory system performance.
At low scale, the memory used by new features can be located on the processor chip itself or completely contained in high-speed caches. This approach can be used, for example, for per-subscriber tables or QoS profiles in some configurations. However, as the number of features increases or the scale of one or more features grows, the memory requirements will eventually exceed the capacity of the on-chip memory. At that point the tables must move to external memory and, because external DRAM performance is substantially lower than on-chip memory performance, an architecture using traditional external DRAM can expect significant degradation of capabilities.
Cisco addresses this situation by building a specialized memory system that is optimized for a very high access rate with high power efficiency. The Cisco nPower X1 network processor has an external memory system capable of many billions of lookups per second, which is an order of magnitude higher than today's high-end general-purpose server processors. With this very-high-performance external memory system, more features and higher scale can be achieved without the sudden performance degradation that is often experienced with off-the-shelf architectures.
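A short calculation shows why lookup rates in the billions per second matter. The sketch below is an estimate under stated assumptions, not a Cisco specification: it assumes minimum-size 64-byte Ethernet frames with 20 bytes of per-frame overhead (preamble and inter-frame gap), and a hypothetical count of 10 memory accesses per packet. The function names are introduced here for illustration.

```python
def packets_per_second(link_bps, frame_bytes, overhead_bytes=20):
    """Packet rate of a fully loaded Ethernet link.

    overhead_bytes covers the preamble and inter-frame gap
    that accompany every frame on the wire.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def lookups_per_second(link_bps, frame_bytes, lookups_per_packet):
    """Memory access rate needed to sustain line rate.

    lookups_per_packet is an assumed, illustrative figure.
    """
    return packets_per_second(link_bps, frame_bytes) * lookups_per_packet


# One 100 Gigabit Ethernet port carrying 64-byte frames:
pps = packets_per_second(100e9, 64)        # roughly 148.8 million packets/s
rate = lookups_per_second(100e9, 64, 10)   # roughly 1.49 billion lookups/s
```

Under these assumptions a single 100 Gigabit Ethernet port already requires well over a billion memory accesses per second, and the requirement scales with both the number of ports and the number of features applied to each packet.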
In addition, the Cisco nPower external memory system architecture provides a uniform memory pool that can be used by all features. In contrast, other architectures typically use dedicated memories for each type of feature - such as one for route lookup, one for counters, and one for QoS - which constrains each feature to a particular scale and performance level. With these architectures, it becomes impossible to trade the scale of one feature for another or to efficiently support multiple feature sets for diverse packet flows simultaneously.
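The difference between dedicated per-feature memories and a uniform pool can be sketched in a few lines. The capacities and feature names below are illustrative assumptions in arbitrary units, not figures from any product, and the two helper functions are hypothetical.

```python
def fits_dedicated(demand, partitions):
    """True only if every feature fits inside its own fixed memory."""
    return all(
        feature in partitions and size <= partitions[feature]
        for feature, size in demand.items()
    )


def fits_shared(demand, pool_size):
    """True if the total demand fits in one uniform pool."""
    return sum(demand.values()) <= pool_size


# Same total capacity in both designs (12 units, illustrative).
partitions = {"routes": 4, "counters": 4, "qos": 4}
pool_size = 12

# A route-heavy configuration: large routing table, few counters and QoS entries.
demand = {"routes": 8, "counters": 2, "qos": 2}

dedicated_ok = fits_dedicated(demand, partitions)  # False: routes exceed their partition
shared_ok = fits_shared(demand, pool_size)         # True: total demand is 12 units
```

With the same total capacity, the route-heavy configuration is rejected by the partitioned design but fits in the uniform pool, which is the trade of one feature's scale for another that the text describes.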
By incorporating a high-access-rate memory system that can be used uniformly across various features, the Cisco nPower network processor provides exceptional power efficiency and allows increased port density, because less physical space is consumed by memory components.
Over the years, general-purpose processors have included hardware accelerators for vector processing, graphics, and high-performance computation. Cisco has included specialized hardware accelerators for common networking functions, such as hash-table and other complex data-lookup functions, statistics management, and access control list (ACL) processing. These accelerators have been carefully chosen to satisfy current and future routing needs, avoiding additional silicon and thus further reducing power consumption.
To support the demands of the evolving IoE, the Cisco nPower network processor uses hardware that can perform the computationally intensive operations required to implement highly accurate QoS at high scale and data rates. The computations performed by these integrated traffic-management functions would consume the entire processing capacity of many high-end general-purpose server processors. By integrating the traffic manager into the same chip as the packet processing functions, substantial savings in power and improvements in slot density are achieved.
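To give a sense of the per-packet arithmetic that QoS enforcement involves, the following sketch shows a token bucket, a widely used rate-limiting technique. The text does not state which algorithms the nPower traffic manager implements, so this is only an illustration of the kind of computation involved; the class and parameter names are assumptions.

```python
class TokenBucket:
    """Rate limiter: tokens accumulate at a fixed rate up to a burst limit,
    and each packet spends tokens equal to its size in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # maximum number of stored tokens
        self.tokens = burst_bytes    # the bucket starts full
        self.last = 0.0              # time of the previous update, in seconds

    def allow(self, now, size):
        """Return True if a packet of `size` bytes conforms at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# An 8 kbit/s policer with a 1500-byte burst allowance.
tb = TokenBucket(rate_bps=8000, burst_bytes=1500)
first = tb.allow(0.0, 1500)    # conforms: the bucket starts full
second = tb.allow(0.0, 1)      # exceeds: the bucket is now empty
third = tb.allow(1.0, 1000)    # conforms: 1000 bytes refilled after one second
```

Every packet triggers a timestamp comparison, a multiplication, and a state update. Repeating this for a very large number of queues at line rate is the kind of load that motivates dedicated traffic-management hardware.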
Along with the processor, hardware accelerators, and traffic manager, the Cisco nPower network processor integrates features such as line-side 10 Gigabit Ethernet, 40 Gigabit Ethernet, and 100 Gigabit Ethernet media access control (MAC), optical transport network (OTN) framers, and a 10 megabit ternary content-addressable memory (TCAM). This level of integration further increases slot density and power efficiency, avoiding the need for external devices in several configurations.
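A ternary content-addressable memory matches a key against entries whose bits may be 0, 1, or "don't care." The sketch below emulates that behavior sequentially in software; a hardware TCAM compares all entries in parallel. The entry layout, the `tcam_lookup` function, and the sample table are assumptions made for illustration.

```python
def tcam_lookup(entries, key):
    """Return the result of the first entry matching `key`, or None.

    Each entry is (value, mask, result). A mask bit of 1 means the bit
    must match; a mask bit of 0 means "don't care". Entries are searched
    in priority order, so more specific entries are listed first.
    """
    for value, mask, result in entries:
        if (key & mask) == (value & mask):
            return result
    return None


# Illustrative IPv4 table, most specific prefix first.
TABLE = [
    (0x0A010000, 0xFFFF0000, "port-1"),    # 10.1.0.0/16
    (0x0A000000, 0xFF000000, "port-2"),    # 10.0.0.0/8
    (0x00000000, 0x00000000, "default"),   # matches any key
]

a = tcam_lookup(TABLE, 0x0A010203)  # 10.1.2.3    -> "port-1"
b = tcam_lookup(TABLE, 0x0A090909)  # 10.9.9.9    -> "port-2"
c = tcam_lookup(TABLE, 0xC0A80001)  # 192.168.0.1 -> "default"
```

Because the masks allow wildcard bits, one structure can serve longest-prefix routing and ACL-style classification alike.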
Capabilities Required for the Evolving Internet of Everything
While other architectures and off-the-shelf silicon may be able to selectively meet some of the performance, scale, port-density, or power-efficiency levels that are necessary to satisfy the demands of the IoE, the Cisco nPower network processor architecture can achieve all these goals simultaneously. Together, a modern processor architecture, a high-performance memory system, and an integrated traffic manager support an array of advanced capabilities, including handling the high transaction rates demanded by the IoE and collecting statistics to manage SLAs. These are just a few of the ways that these custom-built network processors improve router value and efficiency, by reducing the control plane bandwidth and eliminating external computation power that would otherwise be required. The architecture is easily adapted to various network configurations for core, lean-core, peering, and edge applications.