For today's storage area networks (SANs), analyzers are indispensable for finding and correcting network problems. Analyzers supply SAN managers with a view into the traffic traversing their networks and allow them to quickly troubleshoot everything from bad cables to system failures. Connecting an analyzer to a SAN traditionally requires inline installation, that is, placing the analyzer on the links between SAN components. However, breaking links means interrupting normal SAN traffic and introducing downtime. To alleviate this operational problem, the Cisco® MDS 9000 Family of director and fabric switches provides an analyzer connection option, called the Switched Port Analyzer (SPAN) port, that does not require breaking links, disrupting traffic, or incurring downtime. This paper describes the abilities and limitations of the SPAN port for SAN analysis.
For managing and troubleshooting a production SAN, the Cisco SPAN port is a convenience feature. With the SPAN port, users who connect analysis equipment into their SAN fabric do not need to bring down the SAN or interrupt production traffic. Offered on the Cisco MDS 9000 Family of Fibre Channel switches and directors, SPAN technology allows users to configure any switch port to mirror data passing through any other switch port, or aggregate of switch ports, within the fabric. The SPAN port can mirror ports whether local (on the same switch) or remote (on another switch connected to the fabric). An analyzer uses the SPAN port to collect traffic for analysis.
The SPAN port is not a universal replacement for inline analysis. While convenient, it also has limitations. It is primarily recommended for use on upper-layer protocol analysis tasks like investigating interoperability and configuration issues. In other cases, inline connection is still the best location.
Among its advantages, the SPAN port:
• Provides analysis probing points without breaking any link or bringing down the SAN
• May be activated just when needed, from any port of the SAN
• Passes Upper-Layer Protocol (ULP) information in full, including ULP errors
• Aggregates all data from multiple probing points, up to the capacity of the output port being used as a SPAN port
• Allows multiple sessions and SPAN ports to be used at the same time if needed
Following are recommendations for the best use of the SPAN port:
• Use SPAN ports to debug configuration or interoperability problems related to Fibre Channel link services (such as Fabric Login [FLOGI] and Port Login [PLOGI]), fabric services (such as domain distribution, zoning, name services, routing protocols, or fabric merging), and upper-layer protocols (such as Small Computer System Interface [SCSI]).
• Use analyzers that provide full line-rate monitoring, because the SPAN port provides no flow control mechanism.
• Do not use the SPAN port when the aggregated throughput to the SPAN port exceeds its physical capacity (1 or 2 Gbps), because the switch drops mirrored frames if the input rate exceeds the output rate. Do not use the SPAN port if spare capacity is unavailable in the SAN path from the probe point to the SPAN port.
• Do not use the SPAN port to debug FC-0, FC-1, and FC-2 problems, because it does not mirror 8b/10b coding, MAC-layer primitives or errors (FC-0, FC-1, FC-2), or corrupted frames with cyclic redundancy check (CRC) errors. It also introduces some latency to mirrored data.
As a SAN analyzer vendor, Finisar tested the SPAN port to better understand its benefits and limitations and to understand how to use Finisar's test tools in combination with this feature. This paper first presents a tutorial of the SPAN port, a testing overview, and some additional technical details provided by Cisco. Then it presents the individual tests performed, the test results, the testing methodology, and the Finisar tools that were used.
SPAN PORT OVERVIEW
SPAN is a proprietary technology of Cisco Systems® that allows users of Cisco MDS 9000 Family switches and directors to configure the switch's ports to mirror data passing through any other Cisco MDS 9000 switch port, or aggregate of switch ports, within the fabric. The SPAN port can mirror ports whether local (on the same switch) or remote (on another switch connected to the fabric), as shown in Figure 1. The SPAN port does not mirror data from physical ports on other vendors' equipment.
The SPAN port's capabilities allow users to connect analysis equipment into their SAN fabric without bringing down the SAN or breaking links to install the analyzer hardware. Also, the SPAN port's remote monitoring capability (the ability to mirror ports on another switch) supplies a connection point for ports that might otherwise be inaccessible.
Figure 1. Local and Remote SPAN Port
Exercising the SPAN Port
Finisar tested the SPAN port by exposing a Cisco MDS 9506 Switch to various traffic stimuli and recording the SPAN port's outputs, without making assumptions regarding its inner workings. This approach mirrors what any SAN administrator would do when using this feature.
After Finisar completed this testing, Cisco provided a simplified architectural diagram of the SPAN port (Figure 2). Read alongside the test results, the diagram shows that the observed behavior is consistent with the SPAN port's architecture.
Figure 2. SPAN Port Simplified Architecture
Connecting to the SPAN Port: Initial Test
The initial testing started with the SPAN port's simplest feature, the ability to provide an access point to monitor traffic without bringing down the SAN and without interrupting production connections or traffic. Testers configured an arbitrary unused switch port as a SPAN destination port while passing data through two other ports on a Cisco MDS 9000 Series switch. The SPAN port began to mirror traffic. Configuring the SPAN port did not affect traffic flow on the network ports.
Checking SPAN Traffic: Link Layer Information
Once the SPAN port passed traffic, the results were recorded on an analyzer. Initial views showed that the SPAN port passed all Fibre Channel frames including start-of-frame (SOF), header, payload, and end-of-frame (EOF). However, link-layer (FC-2) primitive information, also called ordered sets, was absent. The missing data included link setup, error, and credit management primitives. The SPAN port did pass one link primitive, the IDLE, and two classes of frame primitives, SOF and EOF.
Why doesn't the SPAN port mirror FC-2 primitives? According to Cisco, the SPAN port mirrors data after the switch's MAC layer, which keeps traffic flowing smoothly on the local link by sending FC-2 primitives that are interpreted by other switch ports' MAC interfaces. However, the MAC also strips off FC-2 primitive information on traffic it receives and then sends the remaining frames downstream within the switch for delivery. For instance, the MAC both sends and removes credit information (R-RDY) and Loop Initialization Procedure (LIP) primitives. (Any Fibre Channel text contains a complete list of FC-2 primitives.) While generally FC-2 primitive information is important only for debugging a specific set of link-layer problems, the potential loss of this information means that the SPAN port does not provide viable support for such tasks.
SPAN Port Performance Under Load
A switch's most important task is timely delivery of data from one port to another, even when a link becomes fully loaded. Finisar tested whether the SPAN port kept up with its source port by creating maximum line rate traffic from a single source port to a single SPAN (monitoring) port. Finisar tested both high-I/O (small frames) and high-throughput (large frames) traffic and compared the SPAN port's output with its source. After using a cross-port analysis utility to compare them, both frame streams were found to be identical.
While this test proved that the SPAN port provides the same data under load in same-rate one-to-one port mirroring, the result should not be extended to complex configurations, for instance aggregated links. It may be a common and useful practice to monitor several links with a single SPAN port, but if the total traffic sent to it exceeds its capacity (2 Gbps), the excess traffic is randomly discarded. As a consequence, configurations based on the aggregation of multiple ports' traffic to a single SPAN port may be better used for troubleshooting low-rate control traffic rather than monitoring heavy data transfer.
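The aggregation limit described above can be checked with simple arithmetic before a SPAN session is set up. The following is a minimal sketch, not a Cisco tool: the function name `span_oversubscription` and the example rates are hypothetical, and the rates are assumed to be the user's own estimates of sustained throughput on each mirrored link.

```python
def span_oversubscription(source_rates_gbps, span_capacity_gbps=2.0):
    """Return (total, excess) for traffic mirrored to one SPAN port.

    source_rates_gbps: estimated sustained rate of each mirrored source, in Gbps.
    span_capacity_gbps: physical capacity of the SPAN port (1 or 2 Gbps here).
    excess is the portion of mirrored traffic the switch would have to discard.
    """
    total = sum(source_rates_gbps)
    excess = max(0.0, total - span_capacity_gbps)
    return total, excess


# Three hypothetical links aggregated onto a single 2-Gbps SPAN port.
total, excess = span_oversubscription([0.5, 0.75, 1.25])
print(f"offered {total:.2f} Gbps, excess {excess:.2f} Gbps")
```

Any nonzero excess means that some mirrored frames would be discarded, so the resulting trace should not be treated as complete.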
Cisco warns that the switch treats SPAN data with a lower priority than regular port-to-port data. In other words, if any resource under load must choose between passing normal traffic and SPAN data, the SPAN loses and the mirrored frames are arbitrarily discarded. This rule applies to preserving network traffic in any situation. For instance, when transporting remote SPAN traffic through an Inter Switch Link (ISL), which shares the ISL bandwidth with regular network traffic, the network traffic takes priority. If there is not enough capacity for the remote SPAN traffic, the switch drops it.
Knowing that the SPAN port arbitrarily drops traffic under specific load conditions, what strategy should users adopt so as not to miss frames? According to Cisco, the best strategy is to make decisions based on the traffic levels of the configuration and, when in doubt, to use the SPAN port only for relatively low-throughput situations. Examples include monitoring domain ID distribution and fabric merging procedures, name server queries, zone management commands, Fabric Shortest Path First (FSPF) routing information, host and storage login procedures, and SCSI target and logical unit number (LUN) control information. All of these procedures, while associated with typical troubleshooting situations, generally involve only a few frames and are not associated with heavy traffic situations.
Using Analyzers with the SPAN Port
Frame drops inside the switch are not the only way to lose data when using the SPAN port. Users should also be aware that the destination (analysis) device cannot flow-control the port, because flow-controlling the mirrored output would push back on the actual network traffic. This design follows from the decision by Cisco not to let mirroring affect the original network traffic. Therefore, mirrored data issued from the SPAN port must be captured as quickly as it is produced, or the mirrored data may be lost. This characteristic becomes important if the analyzer connected to the SPAN port requires flow control. Flow-control-related loss is unpredictable and leads to poor analysis results. A simple way to avoid this kind of loss is to use analyzers that do not require flow control and provide full line-rate monitoring.
Using SPAN for Troubleshooting
The second phase of SPAN testing was performed with the aid of Finisar's Jammer error injector. This tool allows introducing arbitrary and controlled errors into the traffic flow. Because SPAN will not pass low-level errors, the Jammer was used to create many different types and classes of ULP errors such as SCSI check conditions, abort sequences (ABTS), and so on. The SPAN port passed, and the analyzer recorded, every ULP error introduced.
Cisco confirms that the primary aim of SPAN is to provide a method to debug ULP errors. Stated simply, if a Fibre Channel frame carries information that is related to a ULP problem (like Fibre Channel services or SCSI), the SPAN port is an excellent location for analysis.
When tested with link-layer errors and corrupt frames, the SPAN port proved less capable than with ULP errors. Given that the SPAN port mirrors data after the MAC layer, the SPAN port did not pass any link-layer errors such as corrupt link primitives or losses of synchronization.
Tests with corrupted frames showed that SPAN drops frames containing errors or error terminators. The SPAN port did not pass incoming frames containing a CRC error, an End-of-Frame Invalid delimiter (EOFni), or an End-of-Frame Abort delimiter (EOFa). This filtering occurred regardless of whether the SPAN port was configured to monitor ingress or egress ports.
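The filtering behavior reported in this paragraph can be summarized as a small predicate. This is only a model of the observed test results, not a description of the switch's implementation; the `Frame` class, its field names, and the function `span_mirrors` are hypothetical, and the abort delimiter is written here as EOFa.

```python
from dataclasses import dataclass

# EOF delimiters that the tests found to be filtered from the SPAN output.
ERROR_EOFS = {"EOFni", "EOFa"}


@dataclass
class Frame:
    """Hypothetical minimal description of a frame arriving at a monitored port."""
    eof: str = "EOFn"     # end-of-frame delimiter carried by the frame
    crc_ok: bool = True   # False if the frame has a CRC error


def span_mirrors(frame: Frame) -> bool:
    """Return True if, per the test results, the frame appears on the SPAN port."""
    if not frame.crc_ok:
        return False
    return frame.eof not in ERROR_EOFS


print(span_mirrors(Frame()))              # well-formed frame
print(span_mirrors(Frame(eof="EOFni")))   # frame with an error terminator
```

A trace taken from the SPAN port therefore contains only frames for which this predicate is true, which is why the absence of bad frames in the trace says nothing about their absence on the link.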
Frame error filtering on the SPAN port does not necessarily make the port unsuited to detecting problems caused by frame errors, but it does require users to look for secondary effects, that is, the results of errors rather than the errors themselves. For instance, if a data frame containing an error is dropped, the destination device eventually detects that it is missing a frame as part of an exchange. This causes the device to issue an Abort Sequence (ABTS) frame, which is reported by the analysis system attached to the SPAN port.
One effect of SPAN dropping corrupt frames is that unwary users may mistakenly believe that there are no errors coming into or out of the switch. When using the SPAN port, users must look carefully at second-order effects, such as increased ABTS occurrences, to determine when errors have occurred. Also, users should check other sources for error information. Many storage area network component vendors, including Cisco, provide frame error counters on their monitoring interfaces to record the number of corrupt frames detected. Users with an analyzer connected to the SPAN port should check these counters if they suspect that frame errors exist on a line.
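One way to watch for the second-order effects described above is to count ABTS frames per time window in a trace exported from the analyzer. The sketch below assumes a hypothetical export format, a sequence of `(timestamp_seconds, frame_type)` tuples; the function names, the frame-type labels, and the threshold are illustrative choices, not part of any Finisar or Cisco product.

```python
from collections import Counter


def abts_per_window(records, window_s=1.0):
    """Count ABTS frames in each time window of length window_s seconds.

    records: iterable of (timestamp_seconds, frame_type) tuples.
    Returns a dict mapping window index to the number of ABTS frames in it.
    """
    counts = Counter()
    for timestamp, frame_type in records:
        if frame_type == "ABTS":
            counts[int(timestamp // window_s)] += 1
    return dict(counts)


def suspicious_windows(records, threshold, window_s=1.0):
    """Return the sorted window indexes whose ABTS count reaches threshold."""
    per_window = abts_per_window(records, window_s)
    return sorted(w for w, n in per_window.items() if n >= threshold)


# Hypothetical trace: one ABTS in the first second, three in the second.
trace = [
    (0.1, "SCSI_CMD"), (0.4, "ABTS"),
    (1.2, "ABTS"), (1.5, "ABTS"), (1.9, "ABTS"),
    (2.3, "SCSI_CMD"),
]
print(abts_per_window(trace))
print(suspicious_windows(trace, threshold=3))
```

Windows flagged this way are candidates for cross-checking against the switch's frame error counters.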
Another limitation of the SPAN port is latency. Unlike straight optical connections, the SPAN port is a switched port and thus introduces switching time. In the tests performed on the switch, the mirrored frame reached the SPAN port 8.3 to 20.5 microseconds after the instant the data frame was received on the ingress port, while in the opposite direction the mirrored frame reached the SPAN port 0.35 to 0.4 microseconds before the original data frame reached the actual destination port. (Note: Times are relative to readings on the host side of the switch.) For most users this small latency is unimportant; however, when measuring time-sensitive transactions, the SPAN trace might not be accurate enough.
Finisar tested the SPAN port using the Cisco MDS 9506 Switch in a black-box fashion; that is, test stimuli were introduced to the switch and the resulting behavior was observed. Testers had no engineering information from Cisco at the time of testing other than that found in the users' manuals for the Cisco MDS 9000 Series.
A range of Finisar testing solutions were used to test the SPAN port. To create traffic at arbitrary line rates and verify data integrity, Finisar's Medusa Labs Test Tools Suite was used. Finisar's Xgig FibreChannel Jammer was used to introduce controlled errors into the data path. To monitor the SPAN port, Finisar's NetWisdom was used. To analyze test results and perform cross-port analysis, Finisar's Xgig FibreChannel Analyzer with Expert software was used. These tools are described further in the final section of this paper.
Figure 3 shows the configuration used to test the SPAN port on the Cisco MDS 9506 Switch. The yellow line between the Test Tools Server, Jammer, switch, and JBOD (Just a Bunch Of Disks) represents the bidirectional data connection between host, switch, and storage. Green lines connected to the Xgig FC Analyzer represent bidirectional analyzer connections to the host bus adapter (HBA), switch ingress, switch egress, and switch SPAN port. The red line between the switch and the NetWisdom probe represents two unidirectional SPAN ports. Depending upon the test configuration, the SPAN port connection mirrored the bidirectional data flow of either switch ingress or switch egress.
Figure 3. SPAN Port Testing Configuration
The following subsections cover each test performed on the Cisco MDS 9506 Switch.
Test 1: Monitor Normal Traffic
Test Description: Generate traffic flows. Monitor switch ingress and egress. Observe results.
Result: Other than IDLEs and frame delimiter ordered sets (SOF/EOF), the switch does not pass link-layer (Fibre Channel Layer 2) traffic to the SPAN port. All normal frames pass to the SPAN port.
Interpretation: The switch removes link-layer information, other than IDLEs, before passing traffic from the ingress or egress ports.
Test 2: Introduce Code Violation into Ordered Set
Test Description: Use Jammer to modify an IDLE to introduce a code violation at switch ingress. Monitor switch ingress to observe result.
Result: Switch does not pass IDLE containing the code violation to SPAN port (or egress). Switch replaces code violation with IDLE at SPAN port (and egress).
Interpretation: The switch "cleaned up" the data path and removed the code violation. Based upon this result, it is reasonable to assume that the SPAN will remove similar link-layer (Fibre Channel Layer 2) errors.
Test 3: Introduce Code Violation into Frame
Test Description: Use Jammer to modify a frame to introduce a code violation at switch ingress. Monitor switch ingress to observe result.
Result: Switch does not pass frame with code violation to SPAN port (or egress). Switch replaces frame with IDLEs at SPAN port (and egress).
Interpretation: The switch "dropped" the frame and replaced it with IDLEs. Based upon this result it is reasonable to assume that the SPAN will remove similar error frames with code violations or CRC errors.
Test 4: Simulate Credit Starvation
Test Description: Use Jammer to change R-RDYs (buffer-to-buffer credits) from the switch to the host to IDLEs. Monitor switch ingress to observe results.
Result: As the HBA times out on credit starvation, it sends an Abort Sequence (ABTS) frame (Fibre Channel Layer 4) to the switch. The switch responds by initiating a link reset by sending a Link Reset primitive (Fibre Channel Layer 2). The SPAN port mirrors the ABTS frame on the switch ingress input, but sends an IDLE instead of Link Reset on the ingress output.
Interpretation: Consistent with Test 1, the switch removes link-layer information, other than IDLEs, before passing traffic from the ingress or egress ports. Transport-layer information is unaffected.
Test 5: Break Link between Switch and Target
Test Description: Pull cable between switch and JBOD. Monitor results on SPAN ports at ingress and egress inputs.
Result: Cable pull results in IDLEs on the egress input SPAN port (storage-to-switch) instead of loss of signal (LOS). Reconnecting the cable results in Loop Initialization Select Master (LISM) at the egress port. The egress port monitor shows only IDLEs. After LISM, the ingress port (host-to-switch) monitor shows the switch sending a Registered State Change Notification (RSCN) frame to the host.
Interpretation: The SPAN port did not show any link-layer information about the loss of signal when the cable was pulled, and did not show the LISM sequence after the cable was reconnected. The SPAN port did show a transport-layer effect arising from the LISM, namely an RSCN from the switch to the host. This test shows the liability of not having link-layer information on the SPAN port. Without this information the user has only a secondary effect, the RSCN, to determine that something went wrong on the link.
Test 6: Send Frame with Incorrect Destination Address
Test Description: Use Jammer to modify frame destination address in a data frame of a write exchange to a nonexistent destination. Monitor SPAN port ingress input to observe the result.
Result: SPAN port does not mirror the faulty frame.
Interpretation: This test shows that the SPAN port does not act as a data mirror, showing everything on the input, but rather as a switched port. Because the SPAN port is indeed a switched port and does not have the modified frame destination address in its switching table, it does not output the frame.
Test 7: Send EOFni from Host to Switch
Test Description: Use Jammer to change the End-of-Frame Normal (EOFn) to End-of-Frame Invalid (EOFni) between the host and the switch. Monitor SPAN port ingress input to observe the result.
Result: The SPAN port does not mirror the frame containing the EOFni.
Interpretation: This test shows that the switch removes errors in frames coming from the host and does not pass them on to either the SPAN port or the storage.
Test 8: Send EOFa from Host to Switch
Test Description: Use Jammer to change the End-of-Frame Normal (EOFn) to End-of-Frame Abort (EOFa) between the host and the switch. Monitor SPAN port ingress input to observe the result.
Result: The SPAN port does not mirror the frame containing the EOFa.
Interpretation: As with Test 7, this test shows that the switch removes errors in frames coming from the host and does not pass them on to either the SPAN port or the storage.
Test 9: Send EOFa from Target to Switch
Test Description: Use Jammer to change the End-of-Frame Normal (EOFn) to End-of-Frame Abort (EOFa) between the storage and the switch. Monitor SPAN port egress input to observe the result.
Result: The SPAN port does not mirror the frame containing the EOFa.
Interpretation: This test shows that the switch removes errors in frames coming from the storage and does not pass them on to either the SPAN port or the host.
Test 10: Calculate SPAN Port Latency
Test Description: Generate traffic flows. Monitor switch ingress and egress. Compare timestamps between ingress and egress inputs and SPAN port.
Result: Delta time from ingress input to SPAN port for a command frame (64 bytes) was 8.3 microseconds. From ingress input to SPAN port, the delta time for a data frame (2112 bytes) was 20.5 microseconds. On the return trip (storage-to-host) a command frame showed up 0.37 microseconds sooner on the SPAN port than on the ingress output (host side). A data frame showed up 0.4 microseconds sooner on the SPAN port than on the ingress output.
Interpretation: The SPAN port introduces latency between 8.3 and 20.5 microseconds compared to nonswitched traffic. Data returning from the egress (storage-side) to ingress (host-side) ports shows up slightly earlier on the SPAN port than on the ingress port.
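The timestamp comparison used in this test can be sketched as follows. The sketch assumes that each trace has been reduced to a mapping from a frame key to a capture timestamp in nanoseconds; the key format (an exchange identifier plus a sequence count), the function name `span_deltas_ns`, and the sample timestamps are hypothetical, with the timestamps chosen only so that the deltas match the figures reported above.

```python
def span_deltas_ns(reference, span):
    """Return SPAN timestamp minus reference timestamp for each shared frame.

    reference, span: dicts mapping a frame key to a timestamp in nanoseconds.
    Frames missing from either trace (for example, dropped mirror copies)
    are skipped. A negative delta means the SPAN copy arrived first.
    """
    shared = reference.keys() & span.keys()
    return {key: span[key] - reference[key] for key in shared}


# Hypothetical traces keyed by (exchange ID, sequence count).
ingress_trace = {
    ("0x1234", 0): 1_000_000,   # 64-byte command frame
    ("0x1234", 1): 2_000_000,   # 2112-byte data frame
    ("0x9999", 0): 3_000_000,   # frame never seen on the SPAN port
}
span_trace = {
    ("0x1234", 0): 1_008_300,
    ("0x1234", 1): 2_020_500,
}

deltas = span_deltas_ns(ingress_trace, span_trace)
print(min(deltas.values()), max(deltas.values()))
```

Integer nanoseconds are used instead of floating-point microseconds so that the subtraction is exact.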
When analyzing network traffic, network administrators have multiple options for analysis tools. The following two analyzers are available from Cisco and Finisar, respectively. For further information on either product, use the contact information at the end of the product's description.
Cisco Port Analyzer Adapter
The Cisco MDS 9000 Family includes the Port Analyzer Adapter (PAA), a portable, lightweight, self-contained hardware device that can be connected to any MDS 9000 Family switch port, configured as SPAN.
The PAA has an input Fibre Channel port and an output Gigabit Ethernet (GE) port. It encapsulates the Fibre Channel frames from the SPAN destination port (SD port) into Ethernet frames that may be collected and analyzed using a locally attached PC.
The PAA may be used in conjunction with two PC-based analysis solutions, the Cisco Traffic Analyzer for Fibre Channel and the Cisco Protocol Analyzer for Fibre Channel.
The Cisco Protocol Analyzer enables Fibre Channel traffic, delivered through the SPAN port and encapsulated by PAA, to be decoded into details using a Java GUI. The traffic may be in real time or from a stored capture. This solution uses Ethereal public-domain software enhanced by Cisco for Fibre Channel and SCSI protocol decoding.
The Cisco Traffic Analyzer analyzes Fibre Channel traffic, delivered through the SPAN port and encapsulated by PAA, to provide numerous Fibre Channel- and SCSI-level performance metrics. The results are displayed through a Web browser user interface. The traffic may be in real time or from a stored capture. This solution uses Ntop public-domain software enhanced by Cisco for Fibre Channel and SCSI performance analysis. To learn more, visit:
The Xgig Analyzer is a full line-rate protocol analyzer designed for Fibre Channel and Gigabit Ethernet SAN analysis. It streamlines resolution of events that cause performance problems and enables users to design, implement, test, and evaluate SANs and their components by automatically analyzing captured traces for errant behaviors and providing extensive performance-analysis capabilities. Its high-level views take you quickly to the root of performance issues. Finisar analyzers search and process every record in a trace file to help pinpoint specific events fast, leaving engineers and designers free to concentrate on problems, not data. To learn more, visit:
Testers used the following Finisar equipment in addition to the Xgig Analyzer for their tests.
Medusa Labs Test Tools Suite
Finisar's Medusa Labs Test Tool Suite is a series of benchmark speed, data integrity, and stress test tools that allow test and validation engineers to better develop quality products in a shorter time frame. The suite was designed for engineers who work with DVT, validation, bring-up, design validation, and quality assurance.
Medusa Labs test tools are user-mode command-line applications that run on a host system. At the simplest level, the test tools operate in an initiator-target fashion. The host system acts as an initiator and the target can be any storage device internal or external to the host system. With Finisar's Medusa Labs Test Tools, the host system becomes a precision traffic generator using real-world application data. Because the tools are command-line based, they are ideal for setting up scripted test runs.
Finisar's Xgig Jammer is a real-time traffic manipulation tool designed to ensure that Fibre Channel and Gigabit Ethernet networks recover from all error conditions without data loss or corruption. With Xgig Jammer, network managers can manipulate traffic on a network to simulate errors, in real time, and verify that the recovery process operates as expected. The Xgig Jammer is used in conjunction with one or more Xgig Analyzers to capture modified traffic and the system response to that traffic. The Jammer can trigger the Xgig Analyzer and the Analyzer can arm the Jammer.
Finisar's NetWisdom® is a continuous line rate analysis solution that enables SAN managers to easily view comprehensive performance data about their networks. By gathering detailed Fibre Channel and SCSI statistics with dedicated hardware, NetWisdom offers the most complete performance analysis available, whereas information collected from software-only analysis solutions is typically aggregated and sampled. NetWisdom collects all relevant data in-band, but operates out-of-band so it does not compete with the storage network's critical processes of moving data to its destination. The NetWisdom architecture also ensures high levels of data integrity.