Troubleshooting Cisco MDS DMM
This chapter describes procedures used to troubleshoot the data migration feature in the Cisco MDS 9000 Family multilayer directors and fabric switches. This chapter contains the following sections:
•DMM Overview
•Best Practices
•License Requirements
•Initial Troubleshooting Checklist
•Troubleshooting Connectivity Issues
•Troubleshooting Job Creation Issues
•Troubleshooting Job Execution Issues
•Troubleshooting General Issues
•DMM Error Reason Codes
DMM Overview
Cisco MDS DMM is an intelligent software application that runs on the Storage Services Module (SSM) of an MDS switch. With Cisco MDS DMM, no rewiring or reconfiguration is required for the server, the existing storage, or the SAN fabric. The SSM can be located anywhere in the fabric, as Cisco MDS DMM operates across the SAN. Data migrations are enabled and disabled by software control from the Cisco Fabric Manager.
Cisco MDS DMM provides a graphical user interface (GUI) (integrated into Fabric Manager) for configuring and executing data migrations. Cisco MDS DMM also provides CLI commands for configuring data migrations and displaying information about data migration jobs.
Best Practices
You can avoid problems when using DMM if you observe the following best practices:
•Use the SLD tool.
The DMM feature includes the Array-Specific Library (ASL), which is a database of information about specific storage array products. DMM uses ASL to automatically correlate LUN maps between multipath port pairs.
Use the SLD CLI or GUI output to ensure that your storage devices are ASL classified.
For migration jobs involving active-passive arrays, use the SLD output to verify the mapping of active and passive LUNs to ports. Only ports with active LUNs should be included in migration jobs.
For more information about the SLD tool, see Checking Storage ASL Status, page 3-2.
•Create a migration plan.
Cisco MDS DMM is designed to minimize both the dependency on multiple organizations and service disruption. However, even with Cisco MDS DMM, data migration is a fairly complex activity. We recommend that you create a plan to ensure a smooth data migration.
•Configure enclosures.
Before creating a migration job with the DMM GUI, you need to ensure that server and storage ports are included in enclosures. You need to create enclosures for server ports. If the server has multiple single-port HBAs, all of these ports need to be included in one enclosure. Enclosures for existing and new storage ports are typically created automatically.
•Follow the topology guidelines.
Restrictions and recommendations for DMM topology are described in the "DMM Topology Guidelines" section on page 6-3.
•Ensure all required ports are included in the migration job.
When creating a data migration job, you must include all possible server HBA ports that access the LUNs being migrated. All writes to a migrated LUN must be mirrored to the new storage until the cutover occurs, so that no data writes are lost.
For additional information about selecting ports for server-based jobs, refer to the "Ports to Include in a Server-Based Job" section on page 6-4.
License Requirements
Each SSM with Cisco MDS DMM enabled requires a DMM license. DMM operates without a license for a grace period of 180 days.
DMM licenses are described in the "Using DMM Software Licenses" section on page 2-1.
Initial Troubleshooting Checklist
Begin troubleshooting DMM issues by checking the troubleshooting checklist in Table 5-1.
Table 5-1 Initial Troubleshooting Checklist
Checklist | Checkoff
Verify that an SSM is installed in each fabric, and DMM is enabled on the SSMs. |
Verify that your DMM licenses are valid. |
Verify that DMM is the only intelligent application running on the SSM. |
Verify that the existing and new storage devices are connected to a switch that supports FC-Redirect. |
Verify that SAN-OS 3.2(1) or later is running on the switches hosting the SSM and the storage. |
Verify IP connectivity between peer SSMs by using the ping command. |
Common Troubleshooting Tools
The following navigation paths may be useful in troubleshooting DMM issues using Fabric Manager:
•Select End Devices > SSM Features to access the SSM configuration.
•Select End Devices > Data Mobility Manager to access the DMM status and configuration.
The following CLI commands on the SSM module may be useful in troubleshooting DMM issues:
•show dmm job
•show dmm job job-id job-id detail
•show dmm job job-id job-id session
Note You need to connect to the SSM module using the attach module command prior to using the show dmm commands.
Troubleshooting Connectivity Issues
This section covers the following topics:
•Cannot Connect to the SSM
•No Peer-to-Peer Communication
•Connection Timeouts
Cannot Connect to the SSM
Problems connecting to the SSM can be caused by SSH, zoning, or routing configuration issues. Table 5-2 shows possible solutions.
Table 5-2 Cannot Connect to the SSM
Symptom | Possible Cause | Solution
Cannot connect to the SSM. | SSH not enabled on the supervisor module. | Enable SSH on the switch that hosts the SSM. See Configuring SSH on the Switch, page 2-2.
Cannot connect to the SSM. | Zoning configuration error. | If VSAN 1 default zoning is denied, ensure that the VSAN 1 interface (supervisor module) and the CPP IPFC interface have the same zoning. See Configuring IP Connectivity, page 2-3.
Cannot connect to the SSM. | IP routing not enabled. | Ensure that IPv4 routing is enabled. Use the ip routing command in configuration mode.
Cannot connect to the SSM. | IP default gateway not configured correctly. | Configure the default gateway for the CPP IPFC interface to be the VSAN 1 IP address. See Configuring IP Connectivity, page 2-3.
No Peer-to-Peer Communication
Table 5-3 shows possible solutions to problems connecting to the peer SSM.
Table 5-3 No Peer-to-Peer Communication
Symptom | Possible Cause | Solution
Cannot ping the peer SSM. | No route to the peer SSM. | Configure a static route to the peer SSM. See Configuring IP Connectivity, page 2-3.
Connection Timeouts
If the DMM SSH connection is generating too many timeout errors, you can change the SSL and SSH timeout values. These properties are stored in the Fabric Manager Server properties file (Cisco Systems/MDS 9000/conf/server.properties). You can edit this file with a text editor, or you can set the properties through the Fabric Manager Web Services GUI, under the Admin tab.
The following server properties are related to DMM:
•dmm.read.timeout—Read timeout for job creation. The default value is 60 seconds. The value is displayed in milliseconds.
•dmm.read.ini.timeout—Read timeout for a job or session query. The default value is 5 seconds. The value is displayed in milliseconds.
•dmm.connect.timeout—SSH connection attempt timeout. The default value is 6 seconds. The value is displayed in milliseconds.
•dmm.connection.retry—If set to true, DMM will retry if the first connection attempt fails. By default, set to true.
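The following Python sketch shows one way to check these properties before editing them. It assumes that the properties file uses plain key=value lines and that the timeouts are stored in milliseconds, as the list above suggests. The sample text is hypothetical and simply restates the documented defaults; it is not copied from a real server.properties file.

```python
# Sketch: read DMM timeout settings from server.properties-style text.
# SAMPLE_PROPERTIES is hypothetical. The property names and defaults come
# from the list above, with the defaults converted to milliseconds.

SAMPLE_PROPERTIES = """\
# DMM-related entries (illustrative values equal to the documented defaults)
dmm.read.timeout=60000
dmm.read.ini.timeout=5000
dmm.connect.timeout=6000
dmm.connection.retry=true
"""

TIMEOUT_KEYS = ("dmm.read.timeout", "dmm.read.ini.timeout", "dmm.connect.timeout")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def dmm_settings(props):
    """Return the DMM timeouts in seconds plus the retry flag as a bool."""
    settings = {key: int(props[key]) / 1000 for key in TIMEOUT_KEYS if key in props}
    retry = props.get("dmm.connection.retry", "true")  # documented default is true
    settings["dmm.connection.retry"] = retry.lower() == "true"
    return settings


if __name__ == "__main__":
    for name, value in dmm_settings(parse_properties(SAMPLE_PROPERTIES)).items():
        print(f"{name}: {value}")
```

Converting to seconds makes it easier to compare the configured values with the defaults listed above (60, 5, and 6 seconds).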
Troubleshooting Job Creation Issues
The DMM GUI displays error messages to help you troubleshoot basic configuration mistakes when using the job creation wizards. See Creating a Server-Based Migration Job, page 4-4. A list of potential configuration errors is included after the last step in the task.
The following sections describe other issues that may occur during job creation:
•Failures During Job Creation
•DMM License Expires
•Scheduled Job is Reset
•Failures during Sessions Creation
Failures During Job Creation
If you make a configuration mistake while creating a job, the job creation wizard displays an error message to help you troubleshoot the problem. You need to correct your input before the wizard will let you proceed.
Table 5-4 shows other types of failures that may occur during job creation.
Table 5-4 Failures During Job Creation
Symptom | Possible Cause | Solution
Create Job failures. | No SSM available. | Ensure that the fabric has an SSM with DMM enabled and a valid DMM license.
Create Job failures. | Job infrastructure setup error. Possible causes are incorrect selection of server/storage port pairs, the server and existing storage ports are not zoned, or IP connectivity between SSMs is not configured correctly. | The exact error is displayed in the job activity log. See the "Opening the Job Error Log" section.
Create Job failures. | LUN discovery failures. | Use the SLD command in the CLI to check that the LUNs are being discovered properly.
Opening the Job Error Log
To open the job activity log, follow these steps:
Step 1 (Optional) Drag the wizard window to expose the Data Migration Status command bar.
Step 2 Click the refresh button.
Step 3 Select the job that you are troubleshooting from the list of jobs.
Step 4 Click the Log command to retrieve the job error log.
Step 5 The job information and error strings (if any) for each SSM are displayed.
Step 6 Click Cancel in the Wizard to delete the job.
Note You must retrieve the job activity log before deleting the job.
DMM License Expires
If a time-bound license expires (or the default grace period expires), note the following behavior:
•All jobs currently in progress will continue to execute until they are finished.
•Jobs that are configured but have not yet run will start when their schedule takes effect.
•Jobs which are stopped or in a failure state can also be started and executed.
•If the switch or SSM module performs a restart, the existing jobs cannot be restarted until the switch has a valid DMM license.
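The rules above can be summarized as a small decision function. This is a minimal Python sketch of the behavior described in the list, not DMM code; the function and parameter names are hypothetical.

```python
def job_may_run(license_valid, restarted_since_expiry):
    """Return True if a DMM job may start or keep running under the rules above.

    license_valid: the switch currently has a valid DMM license.
    restarted_since_expiry: the switch or SSM has restarted after the
        license (or grace period) expired.
    """
    if license_valid:
        return True
    # Without a valid license, in-progress, scheduled, stopped, and failed
    # jobs can still run, but only until the switch or SSM restarts.
    return not restarted_since_expiry


if __name__ == "__main__":
    print(job_may_run(license_valid=False, restarted_since_expiry=False))
    print(job_may_run(license_valid=False, restarted_since_expiry=True))
```

In practice, this means an expired license only becomes a blocking problem after the next switch or SSM restart.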
Scheduled Job is Reset
If the SSM or the switch performs a restart, all scheduled DMM jobs are placed in Reset state. Use the Modify command to restore jobs to the Scheduled state.
For each job, perform the following task:
Step 1 Select the job to be verified from the job list in the Data Migration Status pane.
Step 2 Click the Modify button in the Data Migration Status tool bar.
You see the Reschedule Job pop-up window, as shown in Figure 5-1.
Figure 5-1 Modify Job Schedule
Step 3 The originally configured values for migration rate and schedule are displayed. Modify the values if required.
Step 4 Click OK.
The job is automatically validated. If validation is successful, the job transitions to Scheduled state. If you selected the Now radio button, the job starts immediately.
Failures during Sessions Creation
Figure 5-2 Failures during sessions creation
This section helps you troubleshoot an error that occurs when the new storage is smaller than the existing storage. Figure 5-2 shows the DMM configuration wizard step in which you configure sessions for the data migration job. The wizard displays a default session configuration. A session marked in red (as in the figure) indicates that the session LUN in the new storage is smaller than the session LUN in the existing storage.
The LUN sizes displayed in the wizard may look identical because the displayed value in gigabytes (GB) is rounded to the third decimal place.
The actual size of the LUNs can be verified using the show commands on the SSM CLI by completing the following steps.
•Note down the host pWWN, existing storage pWWN and the new storage pWWN as displayed on the wizard screen. In the above figure (example) the values are:
–Host: 21:00:00:e0:8b:92:fc:dc
–Existing storage: 44:51:00:06:2b:02:00:00
–New storage: 44:f1:00:06:2b:04:00:00
•Note down the SSM information displayed on the wizard screen. In the above example the SSM chosen for the session is "SSM:SANTest, Module 2", where SANTest is the switch and the SSM is Module 2 on that switch.
•From the switch console "attach" to the SSM console using the command attach module.
–Example: SANTest# attach module 2
•On the SSM CLI, display the job information by using the show dmm job command.
=============================================================================================================
Data Mobility Manager Job Information
=============================================================================================================
Num Job Identifier Name Type Mode Method DMM GUI IP Peer SSM DPP Session Status
=============================================================================================================
1 1205521523 admin_2008/03/14-12:05 SRVR ONL METHOD-1 10.1.1.5 NOT_APPL 5 CREATED
•Using the Job Identifier from the CLI output, display the job details.
module-2# show dmm job job-id 1205521523 detail
Look for server information in the output and note down the VI pWWN corresponding to the host port selected:
-------------------------------------------------------------------------
Server Port List (Num Ports :1)
-------------------------------------------------------------------------
Num VSAN Server pWWN Virtual Initiator pWWN
-------------------------------------------------------------------------
1 4 21:00:00:e0:8b:92:fc:dc 26:72:00:0d:ec:4a:63:82
•Using the storage pWWN and the VI pWWN, run the following command to get the LUN information for the existing and new storage:
Output for existing storage:
module-2# show dmm job job-id 1205521523 storage tgt-pwwn 44:51:00:06:2b:02:00:00 vi-pwwn 26:72:00:0d:ec:4a:63:82
show dmm job job-id 1205521523 storage tgt-pwwn 0x445100062b020000 vi-pwwn 0x2672000dec4a6382
Data Mobility Manager LUN Information
StoragePort: 00:00:02:2b:06:00:51:44 VI : 82:63:4a:ec:0d:00:72:26
-------------------------------------------------------------------------------
ProductID : VLUN FC RAMDisk
SerialNum : 2fff00062b0e445100000000
ID : 600062b0000e44510000000000000000
Output for New Storage:
module-2# show dmm job job-id 1205521523 storage tgt-pwwn 44:f1:00:06:2b:04:00:00 vi-pwwn 26:72:00:0d:ec:4a:63:82
show dmm job job-id 1205521523 storage tgt-pwwn 0x44f100062b040000 vi-pwwn 0x2672000dec4a6382
Data Mobility Manager LUN Information
StoragePort: 00:00:04:2b:06:00:f1:44 VI : 82:63:4a:ec:0d:00:72:26
-------------------------------------------------------------------------------
ProductID : VLUN FC RAMDisk
SerialNum : 2fff00062b0e44f100000000
ID : 600062b0000e44f10000000000000000
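In this example, the Max LBA values reported in the full command output show that the new storage LUN is smaller than the existing storage LUN:
Existing Storage : Max LBA : 20973567
New Storage : Max LBA : 20971519
•Correct the LUN size on the new storage and reconfigure the job.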
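The checks in this procedure can be scripted once you have the values at hand. The Python sketch below uses hypothetical helper names. It converts a colon-separated pWWN to the 0x form echoed by the CLI, reproduces the reversed octet order that appears in the StoragePort and VI fields of the sample output, and compares the two Max LBA values. The 512-byte block size is an assumption; confirm the actual block size of your LUNs.

```python
def pwwn_to_hex(pwwn):
    """Convert a colon-separated pWWN to the 0x-prefixed form echoed by the CLI."""
    octets = pwwn.lower().split(":")
    if len(octets) != 8 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not an 8-octet pWWN: {pwwn!r}")
    int("".join(octets), 16)  # raises ValueError on non-hexadecimal characters
    return "0x" + "".join(octets)


def reverse_octets(pwwn):
    """Return the pWWN with its octets reversed, as in the sample LUN output."""
    return ":".join(reversed(pwwn.lower().split(":")))


def block_shortfall(existing_max_lba, new_max_lba):
    """Number of blocks by which the new LUN is too small (0 if large enough)."""
    return max(0, existing_max_lba - new_max_lba)


if __name__ == "__main__":
    existing = "44:51:00:06:2b:02:00:00"
    print(pwwn_to_hex(existing))
    print(reverse_octets(existing))

    missing = block_shortfall(20973567, 20971519)
    block_size = 512  # assumption: 512-byte blocks
    print(f"New LUN is short by {missing} blocks ({missing * block_size} bytes)")
```

With the values from the example, the new LUN is 2048 blocks short, which is 1 MiB under the 512-byte assumption. A difference this small is easy to miss when sizes are shown as rounded GB values.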
Troubleshooting Job Execution Issues
If a failure occurs during the execution of a data migration job, DMM halts the migration job and the job is placed in Failed or Reset state.
The data migration job must be validated before it is restarted. If the DMM job is in Reset state, FC-Redirect entries are removed. In the DMM GUI, validation is done automatically when you restart the job. In the CLI, the job must be in Reset state before you can validate it; a job in Failed state cannot be validated.
Note If a new port becomes active in the same zone as a migration job in progress, DMM generates a warning message in the system logs.
Troubleshooting job execution failures is described in the following sections:
•DMM Jobs in Fail State
•DMM Jobs in Reset State
DMM Jobs in Fail State
If DMM encounters an SSM I/O error to the storage, the job is placed in Failed state. Table 5-5 shows possible solutions for jobs in Failed state.
Table 5-5 DMM Jobs in Failed State
Symptom | Possible Cause | Solution
DMM job status is Failed. | SSM failure. | If the SSM has performed a reload, you must restart or reschedule all failed jobs when the SSM returns to operational state.
DMM job status is Failed. | Server HBA port offline. | Check the server status and server port status. When the server port is available, restart the migration.
DMM job status is Failed. | New storage port offline. | Use Fabric Manager to determine why the storage port is no longer online. When the storage port is available, restart the migration.
DMM job status is Failed. | Server I/O failure. | Check the DMM job log for server I/O failures.
DMM job status is Failed. | Migration I/O failure. | Check the DMM job log for migration I/O failures.
DMM job status is Failed. | Internal processing failure. | Check the DMM job log for internal processing errors.
DMM Jobs in Reset State
Table 5-6 shows possible causes and solutions for jobs in Reset state.
Table 5-6 DMM Jobs in Reset State
Symptom | Possible Cause | Solution
DMM job fails to complete and is placed in Reset state. | Server HBA port offline. | Check the server status and server port status. When the server port is available, restart the migration.
DMM job fails to complete and is placed in Reset state. | Existing or new storage port offline. | Use Fabric Manager to determine why the storage port is no longer online. When the storage port is available, restart the migration.
DMM job fails to complete and is placed in Reset state. | Server or storage port is moved out of the zone. | Correct the zone configuration and restart the data migration job.
DMM job fails to complete and is placed in Reset state. | Existing storage port is moved out of the zone. | Correct the zone configuration and restart the data migration job.
DMM job fails to complete and is placed in Reset state. | New storage port is moved out of the zone. | Correct the zone configuration and restart the data migration job.
DMM job fails to complete and is placed in Reset state. | Loss of IP connectivity to the peer SSM. | Restart the data migration job when IP connectivity has been restored.
DMM job fails to complete and is placed in Reset state. | SSM failure. | If the SSM has performed a reload, you must restart or reschedule all failed jobs when the SSM returns to operational state.
Troubleshooting General Issues
If you need assistance with troubleshooting an issue, save the output from the relevant show commands.
You must connect to the SSM to execute DMM show commands. Use the attach module slot command to connect to the SSM.
The show dmm job command provides useful information for troubleshooting DMM issues. For detailed information about using this command, see the DMM CLI Command Reference appendix.
When reporting a DMM problem to the technical support organization, save the output of the show dmm tech-support command to a file.
Also run the show tech-support fc-redirect command on all switches with FC-Redirect entries. Save the output into a file.
DMM Error Reason Codes
If DMM encounters an error while running the job creation wizard, a pop-up window displays the error reason code. Error reason codes are also captured in the job activity log. Table 5-7 describes the error codes.
Table 5-7 DMM Error Codes
Error Code | Description
DMM_JOB_NOT_PRESENT | A job with the specified job ID was not found on the SSM.
DMM_JOB_ID_DUPLICATE | Job creation was attempted using a job ID that already exists on the SSM.
DMM_JOB_ID_ZERO | Job ID 0 is an invalid job ID.
DMM_JOB_VSAN_MISMATCH | The server port VSAN number and the corresponding storage port VSAN number are different.
DMM_JOB_TYPE_MISMATCH | SSM received a storage job query for a server-based job.
DMM_JOB_CREATION_ERROR | SSM failed while creating the data structures for the job, which could be a memory allocation failure.
DMM_JOB_INTERNAL_ERROR | SSM failed while creating the data structures for the job, which could be a memory allocation failure.
DMM_JOB_SESSION_EXEC | Attempting to delete a job while one or more sessions are in progress. Stop the job before trying to delete it.
DMM_JOB_DPP_ALLOC_FAILURE | No DPP available to create a job. The maximum number of allowed jobs on a DPP has been exceeded.
DMM_JOB_INFRA_SETUP_ERROR | Failed to set up infrastructure for a job. Possible causes are incorrect selection of server/storage port pairs, the server and existing storage ports are not zoned, or IP connectivity between SSMs is not configured correctly.
DMM_JOB_INFRA_REMOTE_LMAP_ERR_TCP_DN | Failure to establish connection with the peer SSM during job creation.
DMM_JOB_INFRA_FC_REDIRECT_SETUP_ERR | Failed to install FC-Redirect entries for one or more server-storage pairs in the job.
DMM_JOB_INFRA_DPP_DIED | The DPP assigned to the job failed during job creation.
DMM_JOB_INFRA_NOT_ALLOWED | The SSM was unable to create the job. Retry the job creation.
DMM_JOB_SRC_LUN_INFO_NOT_PRESENT | A source LUN specified in the session was not discovered by the SSM. This error can occur when trying to restart or reschedule a job in Reset state. A possible cause is a change in LUN inventory or LUN mapping on the storage device.
DMM_JOB_DST_LUN_INFO_NOT_PRESENT | A destination LUN specified in the session was not discovered by the SSM. This error can occur when trying to restart or reschedule a job in Reset state. A possible cause is a change in LUN inventory or LUN mapping on the storage device.
DMM_VT_VSAN_DOWN | The storage VSAN is not operational or was suspended during job creation.
DMM_VT_ISAPI_CREATION_FAILED | Failed to create a virtual target corresponding to the storage port.
DMM_FC_RDRT_NO_DNS_ENTRY | FC-Redirect configuration failure. The storage or server port is not visible in the FC Name Server on the SSM switch.
DMM_FC_RDRT_NO_ZS_ENTRY | FC-Redirect configuration failure. The server and existing storage port are not zoned together.
DMM_FC_RDRT_INSTALL_ERROR | FC-Redirect configuration could not be installed in the fabric. A possible cause is that CFS is not enabled to distribute FC-Redirect configuration.
DMM_FC_RDRT_LUXOR_ACL_ERROR | FC-Redirect failed to program a rewrite entry in the local SSM.
DMM_SRVR_VT_LOGIN_SRVR_LOGIN_ERROR | SSM failed to log in or discover LUNs from the storage on behalf of the server. This can occur if the new storage access list is not programmed with the server pWWN, or there is no LUN mapping on the storage for the selected server.
DMM_SRVR_VT_LOGIN_VI_LOGIN_ERROR | SSM failed to log in or discover LUNs from the storage on behalf of the storage-based job VI. This can occur if the storage access list is not programmed with the VI pWWN, or there is no LUN mapping on the storage for the VI.
DMM_SRVR_VT_NO_PRLI_SRVR | No PRLI was received from the server after a PLOGI from the server to the storage was accepted.
DMM_PREVIOUS_REQ_INPROGRESS | The SSM cannot process a request because a previous operation on the job has not yet completed.
DMM_ITL_NOT_FOUND | This error may be generated when the user is performing manual correlation of the paths to a LUN from the DMM GUI. It is generated if a specified path (ITL) in the manual correlation has not been discovered by the SSM.
DMM_ITL_NOT_FOUND_IN_NON_ASL_LIST | Attempt to resolve a LUN path that has not been classified as NON ASL.
DMM_ILLEGAL_REQ | The selected command cannot be performed in the current job state.
DMM_INIT_NOT_FOUND | Failed to create a session because the server port is invalid.
DMM_SRC_TGT_NOT_FOUND | Failed to create a session because the existing storage port is invalid.
DMM_DST_TGT_NOT_FOUND | Failed to create a session because the new storage port is invalid.
DMM_ITL_NOT_FOUND_IN_GUI_ASL_LIST | Attempt to update a LUN path that has not been classified as GUI ASL.
DMM_ITL_FOUND_IN_AUTO_ASL_LIST | Attempt to resolve a LUN path that has already been classified as AUTO ASL.
DMM_SRC_LUN_GREATER_THAN_DST | Session creation failed because the source LUN has a greater size than the destination LUN.
DMM_TGT_NOT_REACHABLE | The storage port is offline.
DMM_SRC_TGT_NOT_ASL_CLASSIFIED | Failure returned when trying to create a session with a source LUN that has not been classified as AUTO ASL or GUI ASL. Manual correlation is required to resolve multipathing for the LUN.
DMM_DST_TGT_NOT_ASL_CLASSIFIED | Failure returned when trying to create a session with a destination LUN that has not been classified as AUTO ASL or GUI ASL. Manual correlation is required to resolve multipathing for the LUN.
DMM_SRC_LUN_ALREADY_EXISTS | Failure returned when trying to create a session with a source LUN that has already been used in another session in the job.
DMM_DST_LUN_ALREADY_EXISTS | Failure returned when trying to create a session with a destination LUN that has already been used in another session in the job.
DMM_VT_FC_REDIRECT_GET_CFG_ERR | The SSM failed to retrieve the existing configuration from FC-Redirect. The FC-Redirect process may no longer be running on the supervisor module.
DMM_NO_LICENSE | No active DMM license is available on the SSM where the job is being created.
DMM_VI_NOT_SEEING_ANY_LUNS | The storage job VI cannot see any LUNs from the existing and new storage ports. Possible causes: no access for the VI pWWN on the storage ports, or no LUN mapping for the VI on the storage ports.
DMM_VI_NOT_SEEING_ES_LUNS | The storage job VI cannot see any LUNs from the existing storage ports. Possible causes: no access for the VI pWWN on the existing storage ports, or no LUN mapping for the VI on the existing storage ports.
DMM_VI_NOT_SEEING_NS_LUNS | The storage job VI cannot see any LUNs from the new storage ports. Possible causes: no access for the VI pWWN on the new storage ports, or no LUN mapping for the VI on the new storage ports.
DMM_NO_RESOURCES_TRY_LATER | Failure returned for the verify operation if shared SSM resources for verify are already being used by another job.
DMM_IT_PAIR_PRESENT_IN_ANOTHER_JOB | Failure returned for job creation if the same server-storage port pair is being used by an existing job.
DMM_JOB_NO_OFFLINE_FOR_ASYNC | Method 2 data migration does not support offline mode.
DMM_PEER_IP_CONNECT_FAILURE | Failure to establish an IP connection with the peer SSM. Check the IP configuration on both SSMs.
DMM_VPORT_IN_EXISTING_ZONE:Remove old Storage Job Zones | A zone created for an old storage-based DMM job still exists. After a storage job is deleted, the corresponding zone must be removed from the zone set. Delete zones for DMM jobs that no longer exist.
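When a saved job activity log contains many entries, it can help to pull out the error reason codes and count them. The Python sketch below is one way to do that. The log lines and the second job ID in the sample are hypothetical, because the exact log format is not described here, and the lookup dictionary holds only a few entries from Table 5-7 for illustration.

```python
import re
from collections import Counter

# A few descriptions taken from Table 5-7; extend as needed.
ERROR_HINTS = {
    "DMM_NO_LICENSE": "No active DMM license is available on the SSM where the job is being created.",
    "DMM_TGT_NOT_REACHABLE": "The storage port is offline.",
    "DMM_SRC_LUN_GREATER_THAN_DST": "Session creation failed because the source LUN has a greater size than the destination LUN.",
    "DMM_PEER_IP_CONNECT_FAILURE": "Failure to establish an IP connection with the peer SSM. Check the IP configuration on both SSMs.",
}

# Error reason codes start with DMM_ followed by capitals, digits, or underscores.
CODE_PATTERN = re.compile(r"\bDMM_[A-Z0-9_]+\b")

# Hypothetical log text; a real job activity log may be formatted differently.
SAMPLE_LOG = """\
job 1205521523: session create failed, reason DMM_SRC_LUN_GREATER_THAN_DST
job 1205521523: session create failed, reason DMM_SRC_LUN_GREATER_THAN_DST
job 1205521524: create failed, reason DMM_NO_LICENSE
"""


def summarize_codes(log_text):
    """Return (code, count, description) tuples, most frequent code first."""
    counts = Counter(CODE_PATTERN.findall(log_text))
    return [
        (code, count, ERROR_HINTS.get(code, "See Table 5-7."))
        for code, count in counts.most_common()
    ]


if __name__ == "__main__":
    for code, count, hint in summarize_codes(SAMPLE_LOG):
        print(f"{count} x {code}: {hint}")
```

Codes that are not in the dictionary fall back to a pointer to Table 5-7, so the summary stays usable even with a partial lookup.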