To help protect against the loss of assets due to drive or node failures, COS Release 3.16.1 supports data resiliency.
In addition, COS supports bonding of two management interface ports to provide redundant connectivity in the event of a network interface card (NIC) failure.
This section describes these features and provides instructions for configuring them through the COS setup file or (for data resiliency only) through the V2PC GUI.
COS provides resiliency at the node and cluster level using one of two methods: mirroring or erasure coding. Resiliency at the node level is achieved using either local mirroring (LM) or local erasure coding (LEC), which is the default. Similarly, resiliency at the cluster level is achieved using either remote mirroring (RM), which is the default, or distributed erasure coding (DEC). Content resiliency policies are set per COS cluster, and all nodes within a cluster share the same resiliency patterns.
Note We recommend using either remote mirroring or DEC, but not both. COS 3.16.1 does not support migration from one scheme to another while preserving stored content. COS treats remote mirroring and DEC as mutually exclusive options. DEC is preferred, so if both are enabled, COS uses DEC.
Mirroring configures the disks in a node, or the nodes in a cluster, to hold a specified number of exact copies of the object data. The number of copies can be set from 1 to 4, with 1 representing the original object data only, and 4 representing the object data plus three copies. Thus, for example, a value of 2 specifies one copy plus the original object data.
To configure mirroring in the GUI, you specify the number of desired copies differently for local and remote mirroring:
Note For clusters of two or more nodes, always use DEC instead of remote mirroring.
Erasure coding, or software RAID, is a method of data protection in which cluster data is redundantly encoded, divided into blocks, and distributed or striped across different locations or storage media. The goal is to enable any data lost due to a drive or node failure in the cluster to be reconstructed using data stored in a part of the cluster that was not affected by the failure.
COS 3.16.1 supports both local erasure coding (LEC), in which data is striped across the disks in a node, and distributed erasure coding (DEC), in which data is striped across nodes in a cluster.
For both LEC and DEC, data recovery is performed as a low-priority background task to avoid possible impact on performance. Even so, erasure coding provides faster data reconstruction than hardware RAID. Speed is important in maintaining resiliency, because a failed node cannot help to recover other failed nodes until it has fully recovered.
Two key factors define the degree of resiliency of a given cluster configuration:
For LEC, COS assigns a default RR of 12:2, but you can choose another RR up to 18:4 maximum. For DEC, the cos-aic-client calculates the RR for a chosen cluster size and RF, but you can choose another RR up to 18:18 maximum. As you consider whether to use the defaults or assign new values, you must weigh a number of factors specific to your deployment, such as:
The following examples illustrate how these factors can be used to determine the best resiliency scheme for a particular COS node or cluster.
Example 1: LEC at Node Level

To see how the LEC resiliency ratio affects individual node behavior, consider a cluster in which each node has 4 TB disks, runs at 80% full, and can maintain a data read rate of 20 Gbps (2.5 GB/s).
If each node in the cluster has an assigned LEC RR of 10:1:
By comparison, if the same cluster were assigned the default LEC RR of 12:2 instead of 10:1:
Note • Actual data rebuild times are highly dependent on the capacity of the hardware components in the server. Servers that can deliver significantly more throughput reduce the rebuild time, while servers with less capacity increase the rebuild time. LEC uses local disk reads when recovering data, so the speed of the disk channel directly impacts recovery time.
Example 2: DEC at Cluster Level
To see how the DEC resiliency ratio affects cluster configuration and behavior, consider a cluster of 11 nodes, with each node running 80% full, maintaining a data read rate of 16 Gbps (2 GB/s), and having a total storage capacity of 100 TB.
If the cluster has an assigned DEC RR of 8:2:
By comparison, if the same cluster were assigned a DEC RR of 8:1 instead of 8:2:
Note • Actual data rebuild times are directly affected by the number of nodes in the cluster. If there are fewer nodes in the cluster, the work is divided among fewer servers, resulting in longer rebuild times. DEC uses remote data reads across the network when recovering data, so the speed of the network channel directly impacts the recovery time.
Example 3: Using LEC and DEC Together
To see how LEC and DEC work together, consider the cluster configuration described in the previous examples, but with a LEC RR of 10:1 for each node and a DEC RR of 8:2 for the cluster as a whole.
For any cluster configuration using both DEC and LEC, the total parity overhead is found by first applying the DEC overhead and then applying the LEC overhead, as follows:
Total Parity Overhead = (M1/N1) + ((1 + M1/N1) X (M2/N2))
where N1:M1 represents DEC parity and N2:M2 represents LEC parity.
Using the values from this example to illustrate:
Total Parity Overhead = (2/8) + ((1 + 2/8) X (1/10)) = 0.25 + 0.125 = 0.375, or 37.5%
Note The parity calculations just described do not account for the additional free disk space needed for storage of recovered data. Be sure to include this requirement in your total overhead calculations. For example, a DEC RR of 8:2 requires free disk space for up to two failed nodes.
To configure resiliency through the V2PC GUI, open the GUI as described in Accessing the V2PC GUI and navigate to Cisco Cloud Object Store (COS) > COS Clusters.
Figure D-1 V2PC GUI, COS Clusters Page
Locate the cluster to be updated and click its Edit icon to enable it for editing. Choose the desired resiliency policy from the Asset Redundancy Policy drop-down list for the endpoint.
Note • If the desired policy does not appear, choose Service Domain Objects > Asset Redundancy Policies to confirm that the policy exists. If it does not, click Add Row and create a new policy: enter a profile name and expected cluster size, select a model (the appropriate interfaces and names auto-populate), assign the profile to a cluster, configure a DNS server, and assign IP pools to each interface.
To configure local mirroring on a node manually:
Step 1 Open (or if not present, create) the COS file /arroyo/test/aftersetupfile for editing.
Step 2 Include the line vault local copy count in the file and set the value to 2, 3, or 4 as appropriate.
Note Setting the value to 1 simply maintains the original data and creates no additional copies.
Step 3 Disable local erasure coding by setting allow vault raid to 0 (or simply omit or remove this line).
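For example, an aftersetupfile configured for local mirroring with the original object data plus two copies might include lines such as the following (illustrative only; an actual file typically contains other deployment-specific settings):

vault local copy count 3
allow vault raid 0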
To enable local erasure coding manually:
Step 1 Open (or if not present, create) the COS file /arroyo/test/aftersetupfile for editing.
Step 2 Set allow vault raid to 1 to enable LEC.
Step 3 Disable local mirroring by setting vault local copy count to 1 (or simply omit or remove this line).
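A minimal sketch of the corresponding aftersetupfile entries for LEC (illustrative only):

allow vault raid 1
vault local copy count 1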
To migrate a service endpoint from local mirroring to local erasure coding:
Step 1 Temporarily leave local mirroring enabled for the service endpoint.
Step 2 Enable LEC for the service endpoint and let it establish the parity needed for each data object.
Step 3 When parity is established, disable local mirroring.
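As a sketch of this sequence (assuming the endpoint's nodes are configured through the aftersetupfile as described above), both features are enabled during the transition, and mirroring is reduced once parity is in place:

During migration:
allow vault raid 1
vault local copy count 2

After parity is established:
allow vault raid 1
vault local copy count 1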
Note For clusters of two or more nodes, always use DEC instead of remote mirroring.
To enable and configure remote mirroring manually:
Step 1 Open (or if not present, create) the COS file /arroyo/test/aftersetupfile for editing.
Step 2 Set vault mirror copies to 2, 3, or 4 as appropriate to enable remote mirroring. The value counts the original object data plus the desired number of exact copies; for example, 2 specifies the original plus one copy.
Note Setting the value to 1 simply maintains the original data and creates no additional copies.
Step 3 Disable distributed erasure coding by setting allow server raid to 0 (or simply omit or remove this line).
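For example, to keep the original object data plus one remote copy, the aftersetupfile might include lines such as the following (illustrative only):

vault mirror copies 2
allow server raid 0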
To enable and configure distributed erasure coding manually:
Step 1 Open (or if not present, create) the COS file /arroyo/test/aftersetupfile for editing.
Step 2 Set allow server raid to 1 and add the following lines immediately below:
This controls the number of data blocks used. The default <value> is 8, and the valid range is 1-18.
This controls the number of parity blocks used. The default <value> is 1, and the valid range is 1-18.
Note See Finding N:M Values to determine appropriate data block and parity block values.
Step 3 Disable remote mirroring by setting vault mirror copies to 0 (or simply omit or remove this line).
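A minimal sketch of the resulting aftersetupfile entries for a DEC RR of 8:2 is shown below. The placeholders <data blocks line> and <parity blocks line> stand for the two lines described in Step 2; their exact directive names are not reproduced here.

allow server raid 1
<data blocks line> 8
<parity blocks line> 2
vault mirror copies 0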
To configure DEC, you must specify the number of data blocks (N) and parity blocks (M) used for data encoding. Table D-1 shows the corresponding data-to-parity block (N:M) values for a given number of nodes in a cluster and for a given degree of resiliency desired for the cluster. For details, see Defining Resiliency.
Note COS does not currently support configuration of new N:M (data:parity) block values through the V2PC GUI. If you need to configure new N:M values, you must do so in the aftersetup file.
The ratios appearing in the cells of the table are N:M values, where N is the number of data blocks and M is the number of parity blocks needed to achieve the desired resiliency factor for a given node count.
To use the table to find the N:M values for a cluster:
Step 1 In the Nodes column, locate the row corresponding to the number of nodes in the cluster.
Note For COS 3.16.1, you must select the N:M configuration based upon the initial nodes in the cluster. COS does not currently support adding nodes to a cluster after DEC is configured for the cluster.
Step 2 Locate the column in the table whose header represents the desired RF value for the cluster.
Step 3 Find the corresponding N:M value at the intersection of the row and column just located.
Step 4 Configure DEC using N as the number of data blocks and M as the number of parity blocks.
Table D-1 lists the possible N:M values for DEC for 1-20 nodes and a resiliency factor (RF) of 0-4.
Table D-2 shows the total parity overhead (as described in Defining Resiliency) for a given number of nodes and resiliency factor for each of two LEC values, 12:1 and 12:2.
While an object is being created or modified using Swift write operations, copies of the object data can be stored in real time on the local COS node and its peer nodes in the COS cluster. This functionality works only if the RAID feature on the node is disabled.
To disable the RAID feature on the node, open /arroyo/test/setupfile and set allow vault raid to 0.
To replicate object data on the node, open /arroyo/test/setupfile and set vault local copy count to a value greater than 1. This value specifies how many copies of the object data are stored on the node.
To replicate object data on the peer nodes, open /arroyo/test/setupfile and set vault mirror copies to a value greater than 1. This value specifies how many remote copies of the object data are maintained.
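Taken together, a setupfile that disables the RAID feature and sets both copy counts to 2 might include lines such as the following (illustrative values):

allow vault raid 0
vault local copy count 2
vault mirror copies 2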
COS Release 3.16.1 supports dividing the COS nodes in a cluster into one or more resiliency groups. A resiliency group is a group of servers that work together to support DEC by striping and distributing data internally within the group. Using resiliency groups to manage DEC reduces communication overhead for both the cluster and the site in general.
When using resiliency groups, every node in a COS cluster is defined as a member of a resiliency group. This logical subdivision of the cluster does not, however, interfere with the ability of the cluster to store a file object wherever it makes the most sense based on the resiliency configuration of the cluster and the capacity of the nodes in the cluster.
Once defined, resiliency groups are managed by software and are transparent to the user.
To briefly review the components of a COS installation:
Defining a new resiliency group simply requires assigning an available resiliency group ID to each of the COS nodes to be included in the group. COS supports defining resiliency groups either via the V2PC GUI or via the setup file (use setupfile with MOS or VMP, and aftersetupfile with V2PC).
Note While it is possible to fully configure COS from the CLI, deployments typically use V2PC COS node profiles, in which case V2PC populates the data for the setup file through the COS node profile.
To define resiliency groups using the V2PC GUI:
Step 1 Log in to the V2PC GUI and, from the navigation menu, choose Cisco Cloud Object Storage (COS) > Node Profiles.
Step 2 Locate a COS node to be included in the resiliency group and open its node profile for editing.
Step 3 Assign an available Resiliency Group ID to the node (valid options are 1-200) and save the changes.
Step 4 Repeat Steps 2-3 for all nodes to be assigned to the same resiliency group ID.
To define resiliency groups using the setup file:
Step 1 Open (or if not present, create) one of the following files:
Step 2 Add the following line to the file:
where <value> is a number between 1 and 200.
Note The line allow server raid 0 must be set or cserver will not load.
Step 3 Save the changes to the file.
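A sketch of the resulting file entries for a node assigned to resiliency group 3 is shown below, where <resiliency group line> is a placeholder for the line named in Step 2:

<resiliency group line> 3
allow server raid 0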
At boot-up, the local COS node queries each server listed in the RemoteServers file to learn that server's resiliency group ID. The same request also tells the remote server the resiliency group ID of the local server.
When a server receives a read request, it looks up the object in Cassandra and finds the corresponding resiliency group ID(s). If the local resiliency group is in the list of IDs, the read continues just as it does in a cluster without resiliency groups.
If the first resiliency group in the list does not have the desired data available, the server checks the other resiliency groups one at a time until the data is located.
If a different resiliency group contains the desired object and the client supports redirection, the server redirects the client to the resiliency group that hosts the object.
If read redirection is not supported, the server:
1. Proxies the read on behalf of the client.
2. Sends a locate request to the servers in the resiliency group that hosts the object.
3. Receives the locations of the data stripes and the IP addresses of their host servers.
4. Issues HTTP read requests to those servers to read the corresponding stripe data.
5. Forwards the stripe data to the client as it is received.
When a server receives a write request, it checks whether the used capacity of its resiliency group is above or below the cluster average. If the group is more than 10% above average, the server chooses another resiliency group better suited to handle the write and directs the write request to a server in that group.
If the client supports redirection, the write request is redirected to the other resiliency group.
If redirection is not supported, the node acts as a proxy and forwards the write request to the other resiliency group.
Once a resiliency group accepts a write request, the write proceeds the same as in COS 3.12. When the goid value is written to Cassandra, the resiliency group ID is written along with it so that the goid can be located later.
A new proc file /proc/calypso/status/resiliencegroupjsoninfo is available on every server in the cluster. It shows the server's view of the resiliency groups defined for the cluster. These groups are displayed in JSON format for ease of parsing.
The JSON info file on each server should be similar, containing the same resiliency groups and associated servers, although the order of the entries may differ from server to server.