Cisco on Cisco
Routing and Switching Case Study: How Cisco IT Designed a Separate Network to Test Cisco Alpha Equipment
Special production-level network tests and validates product behavior in a safe environment.
Every day, more and more packet traffic flows over networks. More often than not, it moves through Cisco® devices or uses Cisco software. Some networks, like enterprise intranets, use Cisco equipment end-to-end; many others, including the public Internet, contain Cisco components. In the development process, an important question for Cisco Systems® has always been: "How reliable is this product?"
Although formal testing methods can and do isolate many problems, Cisco has always understood that no amount of testing can compare to seeing device and software behavior in a large-scale working network. Cisco product engineering methodologies require field trials with live traffic for preproduction products. Cisco IT was not willing to put these alpha products on its production network, so it needed a separate network: one with live traffic, yet safely separated from the production network.
Kevin Smith, manager of network operations for the San Jose, California region, describes his introduction to development products in the Cisco network: "Originally, I had been a Cisco customer. When I joined Cisco, I believed that the Cisco network had to be the network of networks. When I got here, I discovered two distinct yet interconnected networks: The corporate network and the engineering network. Never in my wildest dreams did I imagine that the engineering network was used for testing. It was run by Engineering Computing Services, and it provided connectivity for the Cisco engineering staff. Those were the days when we didn't always understand what was happening with the network. We didn't know how to track availability or performance, and we also didn't know how to evaluate the impact of the bugs we found. It was hard to know where to put resources."
As Cisco grew, the need for understanding product behavior in a production-level network also grew. Cisco engineering needed more information about development issues and required a safer network that would remove the risk from standard Cisco business and engineering operations.
Cisco IT knew that this was a challenge throughout the industry. Most customers that consume infrastructure products need to try out products with live traffic before deploying them in their production networks, for a variety of reasons. At Cisco, IT runs multiple trials to learn (a) how compatible a product is with other embedded equipment, (b) how to automate as much of the deployment and monitoring and management as possible, and (c) how to prepare the deployment and support staff for the new product before a full-scale deployment begins.
In addition, Cisco needed to ensure excellence and availability for all its products. The best way to accomplish these goals was to operate those products in production-like environments before they were sold to customers. Members of the IT organization consulted their peers in other leading technology companies to learn how other organizations were using production environments to validate new products. What they discovered was that many technology companies run their businesses, in a protected and controlled way, on their own alpha products.
In 1999, Engineering Computing Services and IT merged, becoming a single organization. Within the IT organization that would run all the Cisco production networks as well as the engineering test networks, a new group emerged called IT Alpha. Cisco IT would focus on all production network connectivity. Any business-related activity would occur on the Cisco production network, including engineering. IT Alpha would manage, administer, and control a second network that would safely test new products in a production-level network. The young IT Alpha group would take over the validation function of the engineering network but with greater control and vastly reduced risk for Cisco. The idea was to do as much work as possible on a physically separate network. When a failure occurred, the group wanted it to be easy for employees on the alpha network to plug back into the production network without losing time and productivity. The IT Alpha network began.
Figure 1. Alpha Network Routing Advertisements and Static Routes
The IT Alpha network was designed to test new products in new configurations that changed, and still change, weekly. Years ago, when testing was done on the production network, this constant change could bring down parts of the network, preventing Cisco employees from working for hours at a time. The production network needed to be protected from this type of failure, but to do realistic, enterprise-style testing, alpha products still had to run in production environments. So the IT organization created a barrier, distinguishing and protecting the Cisco production network from the IT Alpha network. "The alpha environment can fail in unexpected ways, but ever since the barrier was inserted in the network, these failures have never affected the production network," says Patrick Gilbreath, INSET engineer.
The group protects the production network from these potential problems by establishing uni-directional dynamic routing with the production network and doing route filtering (see Figure 1). "Using static routes pointing to null0 for alpha subnets on cluster gateways, then redistributing those static routes into production, we simply remove the potential routing abnormalities that could occur," says INSET engineer Ben Irving. "Furthermore, adding another layer of route filtering on the production gateways reduces the potential for route instability to an even greater extent." (See routing statements below.) Other problems, like virus and worm infections, are handled within the alpha network the same way they are in the production network: Infected devices are quarantined until they are cleaned and patched.
Over time, IT Alpha evolved into today's Intelligent Network Systems Emerging Technologies (INSET) group. With six full-time Cisco certified engineers and one project manager experienced with many different Cisco technologies and products, INSET is qualified to validate any solution in a production-level network. INSET maintains relationships with 14 different Cisco business units, along with many network and support teams within IT. Today, INSET manages about 50 separate product alpha network environments (see Figure 2 for examples), successfully helping Cisco business units find feature gaps and bugs, and has built a reputation as the go-to organization for augmenting new Cisco "product readiness" initiatives.
The routing statement that accomplishes uni-directional dynamic routing appears below (IP addresses are fictitious; it would be unwise to copy and paste this into your own router):
router eigrp 109
 redistribute static metric 100000 10 255 1 1500
 network 192.168.0.0 0.0.255.255
!
ip route 10.32.224.0 255.255.224.0 Null0
ip route 10.34.224.0 255.255.224.0 Null0
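The effect of the Null0 static routes can be illustrated with a small longest-prefix-match sketch. This is only a model, not Cisco's implementation: the `/24` route and the next-hop name `alpha-router-1` are hypothetical, and the sketch assumes the cluster gateway also holds more-specific routes for live alpha subnets, which the case study does not state. Under that assumption, the summary route that production sees stays in place regardless of what happens inside the alpha network.

```python
import ipaddress

# Hypothetical routing table on a cluster gateway (illustrative values only).
ROUTES = [
    # Static summary from the configuration above: 10.32.224.0 255.255.224.0 -> Null0
    (ipaddress.ip_network("10.32.224.0/19"), "Null0"),
    # Assumed more-specific route learned from inside the alpha network
    (ipaddress.ip_network("10.32.225.0/24"), "alpha-router-1"),
]


def lookup(destination, routes=ROUTES):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # The route with the longest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def withdraw(prefix, routes):
    """Return a copy of the table without `prefix` (models an alpha-side failure)."""
    gone = ipaddress.ip_network(prefix)
    return [(net, hop) for net, hop in routes if net != gone]


if __name__ == "__main__":
    print(lookup("10.32.225.10"))  # follows the more-specific alpha route
    after_failure = withdraw("10.32.225.0/24", ROUTES)
    print(lookup("10.32.225.10", after_failure))  # falls back to the Null0 summary
```

In this model, losing the alpha-side route changes only where the gateway sends the traffic (it is discarded at Null0); the static summary that is redistributed into production never flaps.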
The routing statement that accomplishes route filtering appears below (IP addresses are fictitious; it would be unwise to copy and paste this into your own router):
router eigrp 109
 network 192.168.0.0 0.0.255.255
 distribute-list defend-core out GigabitEthernet1/1
 distribute-list allowed_in_alpha in GigabitEthernet1/1
 distribute-list defend-core out GigabitEthernet1/2
 distribute-list allowed_in_alpha in GigabitEthernet1/2
!
ip access-list standard allowed_in_alpha
 permit 220.127.116.11 0.0.0.255
 permit 18.104.22.168 0.0.0.255
!
ip access-list standard defend-core
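The filtering logic behind the distribute-list statements above can be sketched in a few lines. This is a simplified model, assuming a standard access list compares a route's network address against each entry using a wildcard mask, with the first match deciding and an implicit deny at the end. The prefixes in `EXAMPLE_ACL` are hypothetical placeholders from documentation address ranges, not the entries used by Cisco IT.

```python
import ipaddress


def wildcard_match(route, acl_addr, wildcard):
    """True if `route` equals `acl_addr` on every bit the wildcard mask does not ignore."""
    r = int(ipaddress.ip_address(route))
    a = int(ipaddress.ip_address(acl_addr))
    # Wildcard bits set to 1 are "don't care"; invert to get the bits that must match.
    care = ~int(ipaddress.ip_address(wildcard)) & 0xFFFFFFFF
    return (r & care) == (a & care)


def permitted(route, acl):
    """Evaluate entries in order; the first match decides, no match means deny."""
    for action, addr, wildcard in acl:
        if wildcard_match(route, addr, wildcard):
            return action == "permit"
    return False  # implicit deny at the end of the list


# Hypothetical entries for illustration only.
EXAMPLE_ACL = [
    ("permit", "192.0.2.0", "0.0.0.255"),
    ("permit", "198.51.100.0", "0.0.0.255"),
]

if __name__ == "__main__":
    for advertised in ("192.0.2.0", "198.51.100.0", "203.0.113.0"):
        verdict = "accepted" if permitted(advertised, EXAMPLE_ACL) else "filtered"
        print(advertised, verdict)
```

The implicit deny is the key design point: any route the alpha side advertises that is not explicitly listed is dropped at the boundary, so an unexpected advertisement cannot leak into production.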
Amy Rogers, a network engineer supporting the alpha network for INSET, describes how she works every day validating new products: "Because we support new, emerging technologies, I think we're really becoming an integral part of the development processes in the business units. New products are rolled out of an engineering lab into our network. The alpha network is really a network within a network. We run production-level traffic over it, but the actual Cisco network is protected. To get the kind of traffic we need, we use a combination of shadowing and sharing the production space. Working with the real traffic, in real situations, we get to understand the behavior of all kinds of new and emerging technologies and products. How is Cisco protected? There's a boundary between the production network and the alpha network. We use that boundary to manipulate routing and traffic. That's where we make our demarcation between production and alpha."
She continues: "Right now, I'm sitting at my desk, plugged into the alpha network. All traffic goes into the alpha net. We're using pre-FCS [first customer ship] or engineering code. I'm still using e-mail, MeetingMaker, my browser, and my messaging tool. My mail server is still on the production network. If I encounter problems, or other users in this group encounter them, we move back to the standard production network. For example, we're running a mobile IP validation right now. If it crashes, and as an end-user, I need my e-mail, I just unplug my laptop, and replug it into the production network. I'll get a different DHCP [IP] address, and instantly, I'm back up and running. We're really helping in the development efforts of the product itself. Regular quality assurance works, but you can learn a lot when you put these products into a midsize enterprise network. You can get the real-world view that you just can't get anyplace else. We can also point out missing features, usability, and we find a lot of bugs this way."
Figure 2. Products with versions in Alpha Testing in March 2004
Today, the INSET program is funded by Cisco internal customers in the business units. They decide on test parameters and create a test plan that incorporates specific features, the size of the user base, and, for enterprise-class products (the larger routers and switches, for instance), the amount of bandwidth needed. Smith describes test planning for INSET: "Anything that you'd normally find in a test plan: scale of the test, feature sets of the test, even demos to end customers. Recently, one of the business units needed a pilot program with a prominent tier 1 service provider. The service provider wanted to buy Cisco equipment for a new Metro Ethernet offering. The validation process let them find out whether the Cisco product did what they needed it to do. INSET, working with the business unit and external customer, built the first optical Ethernet metro-area network, supporting the first service offering of its kind anywhere in the United States."
Cisco products are always in the INSET pipeline. For example, the Cisco Wireless IP Phone 7920 went through INSET testing and is now shipping to customers. INSET also validates next-generation metropolitan-area network infrastructure products, including auto-provisioning, Cisco CallManager and next-generation unified messaging software, and high-speed routers and switches. The list is a long one.
When successful, INSET's results are invisible to Cisco customers: Better validation means better performance in enterprise and service provider networks, fewer customer problems, and far fewer technical assistance calls. It also often means fuller and richer features.
At Cisco, reducing customer-found defects (CFDs) is a top priority and is an initiative mandated by John Chambers. One of the INSET team's most important functions while validating products is to find bugs before the product ships. Bugs are expensive. Cisco spends money and resources on CFDs, and so do Cisco customers. Today, INSET finds an average of 111 bugs every month, 333 a quarter. Bringing these bugs to the attention of product engineers, and getting them fixed before customers encounter them, is what the INSET team is all about.
The INSET team is continuing to expand its base of clients within Cisco, making sure that more Cisco product teams have the opportunity to "test drive" their prerelease products in a real-world environment so they can encounter and fix bugs before the product is shipped. This is partly a marketing effort (getting the word out to product managers throughout the many engineering product groups within Cisco that this capability exists) and partly a matter of increasing the scope and value of the INSET team.
For example, although the INSET team has provided testing for enterprise products, for a long time it was not able to support service provider product groups. The reason is that the Cisco environment is an enterprise environment, and testing service provider products would require a radically different network environment from anything normally available within Cisco. "Last year (2003) the INSET team built an alpha service provider network across five buildings on the Cisco San Jose campus," says Neil Evans, INSET engineer. "Designed around high-end routers such as the Cisco 10000 and 12000 series core routers and Metro Ethernet routers, this environment was created to be usable for a variety of service provider products, including high-speed Layer 2 metro products, as well as Multiprotocol Label Switching (MPLS) VPN services across Layer 3 products."
Finally, the INSET group is determined to add more client value to the test environment, and the new Network Analysis Module blade, which captures network traffic and application flow statistics, is just one more addition. The group has begun by capturing traffic data using the Multi-Router Traffic Grapher (MRTG) software tool and making this data available to engineering teams that want to know more about the alpha test environment on a second-by-second basis as product issues appear. The group also plans to make application flow data available, which engineers might find useful in determining the root cause of unusual product anomalies found in the rigorous process of testing and improving products before they are shipped to customers.