Top considerations for storage networks of the future
Application performance insight
Operational simplicity through automation
Machine learning and adaptive networks
Hybrid: cloud and on-premises IT
Figure 1. Storage network analytics pyramid
Storage network switches will provide meaningful data on application performance
At the heart of every data center is the application, and its performance is the central concern. Soon storage network admins will have granular insight into every application’s health from a single source of truth that is “always on” and “sees everything.” Today, network devices provide telemetry at the application I/O level, identified essentially by an initiator-target-LUN (I-T-L) tuple. Networks will soon become application-aware, in essence adding a fourth dimension to that tuple: for example, the same initiator host may run a virtualized compute environment hosting many application instances, such as Outlook, MySQL, and VDI. A unique application identifier that lives as long as the application instance, combined with the I-T-L tuple, will enable admins to maintain per-application performance baselines and monitor real-time performance deviations against those baseline parameters.
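The application-aware flow key and baseline monitoring described above can be sketched as follows. This is a minimal illustration, not any vendor's actual schema: the field names, the 30-sample warm-up, and the three-sigma threshold are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class FlowKey:
    """Hypothetical flow key: the I-T-L tuple plus the 'fourth dimension'."""
    initiator: str   # initiator port identifier
    target: str      # target port identifier
    lun: int         # logical unit number
    app_id: str      # unique application-instance identifier

class BaselineMonitor:
    """Track per-flow I/O latency and flag deviations from baseline."""

    def __init__(self, threshold_sigmas: float = 3.0):
        self.samples: dict[FlowKey, list[float]] = {}
        self.threshold = threshold_sigmas

    def record(self, key: FlowKey, latency_ms: float) -> bool:
        """Record a sample; return True if it deviates from the baseline."""
        history = self.samples.setdefault(key, [])
        deviant = False
        if len(history) >= 30:  # need enough samples to form a baseline
            mu, sigma = mean(history), stdev(history)
            deviant = abs(latency_ms - mu) > self.threshold * max(sigma, 1e-9)
        history.append(latency_ms)
        return deviant
```

Because the key includes the application identifier, two application instances behind the same initiator host are baselined independently.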
By 2020,* 100 percent of network devices are expected to have built-in probes for telemetry
Three key factors will make network devices pivotal to telemetry-data collection: built-in probes, scale, and extensibility. Probes built into the ASIC today ensure that no external hardware appliance is needed. Storage admins will not have to be concerned about the scale at which network devices can monitor I/Os, because every network ASIC will monitor flows, distributing the load across the network ASICs nearly evenly. As the I/Os scale, the network devices will automatically scale the telemetry data collected from the different ASICs and stream the large quantities of distributed data over dedicated ports. Scalable built-in probes across the network will ensure that no special network design or tuning is required to capture specific flows, because the network will guarantee that every flow is monitored somewhere along its path through the network.
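One way to picture the distributed collection described above: each ASIC reports counters for the flows it monitors, and a collector merges them into a fabric-wide view. This is a sketch under the assumption (stated in the text) that each flow is monitored by exactly one ASIC, so simple summation does not double-count.

```python
from collections import Counter

def merge_asic_counters(per_asic: list[dict[str, int]]) -> dict[str, int]:
    """Merge per-flow I/O counters reported independently by each ASIC.

    Assumes the fabric assigns each flow to exactly one monitoring
    ASIC, so summing the per-ASIC reports yields fabric-wide totals
    without any external probe appliance.
    """
    total: Counter = Counter()
    for counters in per_asic:
        total.update(counters)
    return dict(total)
```

A collector built this way scales with the number of ASICs rather than requiring a central appliance to see every frame.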
A proliferation of analytics tools will require that data be provided in an industry-standard format
There is no “one size fits all” when it comes to network analysis tools. Many options are available today, and more will arrive tomorrow with the advent of data science built on big-data platforms such as Hadoop. IT admins will have their own favorite tools, so networks will be expected to produce telemetry data in open, standard formats and either stream that data “over the wire” or provide intelligent open APIs to query the data pertaining to a particular instance. This openness will guarantee vendor neutrality as well as flexibility, ensuring intelligent, fast, and efficient consumption of the telemetry data by the tools.
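As a concrete (and purely hypothetical) example of consuming streamed telemetry in an open format, the record below is one JSON object per line; the field names are illustrative, not a standardized schema.

```python
import json

def parse_telemetry(line: str) -> tuple[str, float]:
    """Extract (flow id, exchange completion time in ms) from one
    JSON-encoded telemetry record streamed by a network device."""
    record = json.loads(line)
    flow = f"{record['initiator']}:{record['target']}:{record['lun']}"
    return flow, float(record["exchange_completion_ms"])
```

Any analytics tool that speaks JSON can ingest such a stream, which is the vendor neutrality the paragraph argues for.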
Figure 2. Four cardinal rules of investment in high-performance storage networks
High-performance computing will become mainstream soon
High-performance computing is today prevalent mostly in scientific research, financial, and healthcare applications. The advent of high-performance multicore GPUs is allowing servers and storage to drive much higher I/O throughput than before. It is estimated that by 2020, exponential growth in the number of users and the amount of data they produce will speed up application processing to at least 10X current levels. Storage technologies such as NVMe over Fabrics, with deeper queues and parallel I/O processing, will become a requirement, and today’s networks should be able to provide that transition reliably, seamlessly, and nondisruptively.
Reliability, consistency, and predictability will remain top priorities
As performance and speeds increase, the underlying network will be expected to be even more reliable in delivering those I/Os. Network reliability, transport reliability, and lossless, error-free delivery of I/Os with minimal retransmission will ensure that the network can keep up. A proven, time-tested transport protocol will ensure that all frames originating from a host adapter or storage port are error-free, either by correcting errors using forward error correction or by guaranteeing, through a store-and-forward architecture, that errored frames are always dropped rather than forwarded. I/Os will likely take multiple paths to increase efficiency, and, irrespective of path, key parameters such as exchange completion time, I/O throughput, and outstanding I/Os must remain consistent. Also, irrespective of which network port or path the I/Os traverse, a SAN admin should always be able to calculate network latency predictably and deterministically. For higher availability, smaller groups of ports per physical ASIC are preferable to one large failure domain in which a single ASIC serves all ports.
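The consistency requirement across paths can be expressed as a simple check: every path's latency should stay within some tolerance of the best path. This is a toy sketch; the 10 percent tolerance is an assumption, not a standard.

```python
def paths_consistent(latencies_by_path: dict[str, float],
                     tolerance: float = 0.10) -> bool:
    """Return True if every path's measured latency is within
    `tolerance` (fractional) of the fastest path's latency."""
    best = min(latencies_by_path.values())
    return all(v <= best * (1 + tolerance)
               for v in latencies_by_path.values())
```

A SAN admin could run such a check against per-path exchange-completion-time telemetry to verify that multipathing is not introducing unpredictable latency.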
Higher speeds and multiple feeds
An increase in I/O throughput will automatically drive demand for more network bandwidth. The transport protocol of choice should have the potential to at least double port speeds in the near future. At the same time, the ports should support as many as four lower speeds and at least three generations of transceivers. This provides investment protection, the potential to double bandwidth when needed, and flexibility to interoperate with devices that have lower-speed connectivity.
Higher usable-port density
A way to achieve higher bandwidth on a single network device is a higher density of front-panel ports. As network devices grow denser in port count, it is important that they provide full line-rate speeds and 100 percent usable front-panel bandwidth using industry-standard transceivers and cables.
Figure 3. Autonomous storage network roadmap
Autonomous storage networks will cut down management overheads
The network device of choice should allow the entire lifecycle of repetitive operations to be automated. That starts with automatic day-0 provisioning “out of the box,” using custom configuration profiles that can be downloaded to the switch; continues with automatic detection of connected host and storage devices and zoning them without any manual configuration; and, in the future, will include removing them when the devices disconnect, to avoid dead zones. Another feature is automatically detecting slow devices, isolating them into lower-priority virtual links, and returning them to normal priority when the congestion recovers. Coupled with virtual machine awareness, these features will allow custom policies to be pushed down to a network device and applied to every flow originating from those virtual machines, with Quality of Service (QoS), acceleration, and encryption as some of the use cases.
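The auto-zoning lifecycle described above (zone on login, remove on logout to avoid dead zones) can be sketched as follows. The data model is hypothetical, not any switch operating system's actual zoning API.

```python
class AutoZoner:
    """Toy sketch of zero-touch zoning: devices are zoned as they log
    into the fabric and removed when they disconnect."""

    def __init__(self):
        self.zones: dict[str, set[str]] = {}  # zone name -> member WWPNs

    def on_login(self, wwpn: str, zone: str) -> None:
        """Automatically add a newly detected device to its zone."""
        self.zones.setdefault(zone, set()).add(wwpn)

    def on_logout(self, wwpn: str, zone: str) -> None:
        """Remove a disconnected device so no dead zone entries linger."""
        members = self.zones.get(zone, set())
        members.discard(wwpn)
        if not members:          # drop the zone once it is empty
            self.zones.pop(zone, None)
```

The point of the sketch is that no human edits the zone database at any step; membership tracks fabric logins.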
Network devices already support APIs for management. These APIs provide a more efficient, secure, and programmable method of management than traditional interfaces such as the command line and SNMP, and they return easily parsable responses in industry-standard formats such as JSON and XML. Repetitive management tasks can be converted into intelligent scripts using these APIs. Further, today’s switches offer a sandbox facility that lets developers rapidly build Python modules by simply converting existing command-line commands and responses. In the future, these switches will provide complete RESTful stateless APIs exposing name-value pairs for every managed object on the network, compatible with standards such as RESTCONF and YANG data models.
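A script against such an API might look like the sketch below. The URL path, authentication scheme, and JSON shape are all assumptions for illustration; real switches expose vendor-specific (often RESTCONF-style) paths.

```python
import json
import urllib.request

def get_port_state(switch: str, port: str, token: str) -> dict:
    """Query a hypothetical RESTful switch API for one managed object."""
    req = urllib.request.Request(
        f"https://{switch}/restconf/data/interfaces/{port}",
        headers={"Accept": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def parse_port_state(payload: str) -> tuple[str, int]:
    """Return (admin state, speed in Gbps) from a JSON response body."""
    obj = json.loads(payload)
    return obj["state"], int(obj["speed_gbps"])
```

Because the response is structured JSON rather than screen-scraped CLI output, the parsing step is trivial and robust to cosmetic output changes.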
OpenStack and modern configuration-management tools
Most storage networks now support OpenStack integration of critical network functions such as zoning, which allows them to be managed through OpenStack-based orchestration tools. Modern storage network operating systems can also host containers, which will enable virtualized network functions without modifying or upgrading the core operating system kernel, minimizing the risk to critical network functions running within it. A number of popular storage arrays and servers already integrate with modern management tools, enabling a very high degree of management automation built on RESTful programmable APIs. Storage network devices must in the future integrate with these tools as well, providing a single pane of glass for network, storage, and server automation.
Figure 4. Machine learning and adaptive storage networks
Storage networks implement traffic engineering for different workloads
Storage network devices today implement enhanced QoS by classifying workloads and priority-tagging frames at the ingress of the fabric. Different workloads can be given different priorities based on QoS policies saved on the switch. Enhanced functionality such as Fibre Channel Extended Receiver Ready (ER_RDY) and congestion isolation using virtual links can place storage traffic in different virtual queues with at least three priorities: high, normal, and low. The switch detects a congestion condition and automatically distributes flows among the three buckets, servicing each with dedicated resources and avoiding frame drops. Fibre Channel ER_RDY functionality is expected to become available end-to-end, from the host adapters to the storage ports, in the future.
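The congestion-isolation decision described above reduces to a small rule: a flow identified as a slow-drain culprit is moved to the low-priority virtual link; otherwise it keeps its configured QoS priority. This sketch uses made-up names and captures only the decision logic, not the switch mechanics.

```python
def assign_virtual_link(flow_priority: str, is_slow_drain: bool) -> str:
    """Pick the virtual link for a flow.

    Slow-drain flows are isolated to the low-priority virtual link so
    they cannot starve well-behaved traffic; all other flows keep the
    priority assigned by the switch's QoS policy.
    """
    if flow_priority not in {"high", "normal", "low"}:
        raise ValueError(f"unknown priority: {flow_priority}")
    return "low" if is_slow_drain else flow_priority
```

When the congestion recovers, the same logic run with `is_slow_drain=False` returns the flow to its normal virtual link.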
Machine-learning capabilities will enable complete automation of traffic engineering
Building on traffic prioritization and virtual links, machine learning will be the next step for storage networks: they will automatically detect and classify a workload, or a flow between an initiator and a target, from the moment it logs into the fabric. Flows will be classified against workload profiles downloaded onto the switch. Some workloads, such as I/Os from production applications that are most active during normal operating hours, will be assigned to high-priority virtual links and given more resources, such as bandwidth and buffer-to-buffer (B2B) credits. The network will learn over time the periods of the day when these workloads are most active and automatically apply the prioritization. It will identify those workloads, and whenever they become active in the fabric, even during non-regular hours, the network will adapt to prioritize them. A classic use case: during normal operating hours an application’s I/O is prioritized, and after hours backup traffic is prioritized over the application I/O.
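A toy version of "learning when a workload is active" is just counting activity per hour of day and prioritizing the workload during its learned busy hours. A real implementation would use proper models; this sketch, with hypothetical names and a made-up observation threshold, only illustrates the idea.

```python
from collections import defaultdict

class WorkloadScheduler:
    """Learn each workload's busy hours from observed activity and
    prioritize it whenever it is active in those hours."""

    def __init__(self, min_observations: int = 10):
        self.activity: dict[str, defaultdict] = {}
        self.min_obs = min_observations

    def observe(self, workload: str, hour: int) -> None:
        """Record that `workload` was seen active at `hour` (0-23)."""
        self.activity.setdefault(workload, defaultdict(int))[hour] += 1

    def busy_hours(self, workload: str) -> set:
        counts = self.activity.get(workload, {})
        return {h for h, n in counts.items() if n >= self.min_obs}

    def priority(self, workload: str, hour: int) -> str:
        """High priority during learned busy hours, normal otherwise."""
        return "high" if hour in self.busy_hours(workload) else "normal"
```

With enough observations, a backup workload that runs nightly at 2 a.m. would automatically be prioritized at that hour over daytime application traffic, matching the use case above.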
Figure 5. Storage networks geared for a hybrid IT landscape
Today’s storage network must support the hybrid IT strategy of tomorrow’s data centers
Tomorrow’s data centers are going to adopt a hybrid model. Some applications will be hosted on-premises, either because they are mostly legacy or because statutory or regulatory compliance requires them to stay in a physical data center; others will be hosted in the cloud, which provides the benefits of affordable scale, ubiquitous provisioning, and resource sharing. Storage networks will remain central to this hybrid IT strategy because critical Tier-1 to Tier-3 storage workloads will still need to be supported. The storage network should support virtualized environments with the flexibility to be managed by the same tools used in on-premises and cloud-native infrastructures. This is why programmability and integration with open, automated management tools are imperative for today’s storage networks.
Converged I/O will be a significant requirement for hybrid IT environments
For modern storage networks to support both on-premises and cloud-scale designs, convergence will play a significant role. It is important that applications can seamlessly coexist and that resources can be shared across the two deployments, reducing the cost and operational expense of maintaining two separate networks. The network switch should support block, file, and object storage as well as multiple transport protocols, such as IP and Fibre Channel. A single converged switch at the top of the host rack, carrying application traffic as well as I/O workloads, provides a common layer with logical segregation between on-premises and cloud-scale applications.
Legacy on-premises data centers are not going to stop growing
While a considerable shift toward cloud-based infrastructures, private or public, will occur because of their scale, cost savings, and true virtualization for efficiency, investment in maintaining dedicated on-premises data centers is still going to increase. Some applications are simply not going to move to the cloud, and as users and data grow, capacity will also need to grow. The storage network switch should therefore provide the flexibility to “pay as you grow” as well as to scale up architectures using modular or semi-modular designs. These switches should scale in ports, speeds, and end devices, and they must offer a non-forklift upgrade path to higher speeds and next-generation modules, protecting the existing investment for a longer period.