Software Upgrade on GR Pairs

This checklist uses config commit as the reference scenario. The same checklist also applies to other upgrade scenarios.

Checklist

Note

Do not perform cluster sync on both racks (Rack-1 and Rack-2) at the same time. Trigger a manual switchover on Rack-1 before proceeding with the Rack-1 upgrade.

  • Do not perform config commits on both racks at the same time. Perform config commit on each rack separately.

  • Before the config commit procedure on Rack-1, initiate the CLI-based switchover on Rack-1 and make sure that Rack-2 has Primary ownership of both instances (instance-id 1 and instance-id 2).

  • Perform config commit on Rack-1. Wait until the config commit succeeds and the PODs have restarted (to fetch the latest helm charts, if applicable) and are back in running state.

  • Revert the role of Rack-1 to be Primary (Switch/Reset roles on both racks).

  • Verify that the available roles of Rack-1 (Primary) and Rack-2 (Standby) are in the expected state.

  • Repeat the preceding checklist for Rack-2.
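
The precondition in the checklist (the peer rack must own PRIMARY for both instances before the config commit starts) can be expressed as a small check. This is only an illustrative sketch: the function name ready_for_commit and the roles dictionary are hypothetical, and the role values are assumed to have been collected by the operator from show role on each rack.

```python
def ready_for_commit(target_rack, roles):
    """Return True when it is safe to start config commit on target_rack.

    roles is a hypothetical structure: {rack_name: {instance_id: role}}.
    The check passes only when the peer rack is PRIMARY for every
    instance and the target rack is PRIMARY for none of them.
    """
    peers = [rack for rack in roles if rack != target_rack]
    if target_rack not in roles or len(peers) != 1:
        raise ValueError("expected exactly two racks, including the target")
    peer = peers[0]
    peer_owns_all = all(role == "PRIMARY" for role in roles[peer].values())
    target_owns_none = all(role != "PRIMARY" for role in roles[target_rack].values())
    return peer_owns_all and target_owns_none


# Roles before the CLI-based switchover on Rack-1: not ready.
before = {
    "Rack-1": {1: "PRIMARY", 2: "STANDBY"},
    "Rack-2": {1: "STANDBY", 2: "PRIMARY"},
}
# Roles after the switchover: Rack-2 owns both instances.
after = {
    "Rack-1": {1: "STANDBY_ERROR", 2: "STANDBY_ERROR"},
    "Rack-2": {1: "PRIMARY", 2: "PRIMARY"},
}
print(ready_for_commit("Rack-1", before))
print(ready_for_commit("Rack-1", after))
```

The same check applies to Rack-2 by swapping the target rack, which matches the last item of the checklist.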

Software Upgrade

Upgrading Rack-1 when GR is enabled:

  1. Verify that the available roles of both instances on Rack-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  2. Initiate switch role for both instances on Rack-1 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from PRIMARY/STANDBY to STANDBY_ERROR/STANDBY_ERROR.

    geo switch-role instance-id 1 role standby [failback-interval 0]
    geo switch-role instance-id 2 role standby [failback-interval 0]
    Note
    • Heartbeat between both the racks must be successful.

    • The failback-interval keyword is optional and is provided for backward compatibility of upgrades between releases. Its value is 0. It is deprecated in the current release and will be discontinued in subsequent releases.

  3. Verify that the available roles of both instances have moved to STANDBY_ERROR on Rack-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that the available roles of both instances have moved to PRIMARY on Rack-2.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running) on Rack-1, as required. To allow replication to finish, wait 5 minutes between the GR switchover and the SMF shutdown.

  6. After the upgrade procedure completes, perform a health check on Rack-1 and ensure that the PODs have come up and Rack-1 is healthy. Then continue with the following steps.

  7. Verify that the available roles of both instances remain in STANDBY_ERROR mode on Rack-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-1 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the roles of both instances have moved to STANDBY on Rack-1.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 1 on Rack-2 to STANDBY. This step transitions the available roles of Rack-2 from PRIMARY/PRIMARY to STANDBY_ERROR/PRIMARY and Rack-1 from STANDBY/STANDBY to PRIMARY/STANDBY.

    geo switch-role instance-id 1 role standby [failback-interval 0]
  11. Verify that the available roles of the instances on Rack-2 are in STANDBY_ERROR/PRIMARY.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "PRIMARY"
  12. Verify that the available roles of both instances on Rack-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  13. Initiate reset role for instance-id 1 on Rack-2 to STANDBY. This step transitions the roles of Rack-2 from STANDBY_ERROR/PRIMARY to STANDBY/PRIMARY.

    geo reset-role instance-id 1 role standby
  14. Verify that the available roles of both instances on Rack-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
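
The role transitions in the Rack-1 procedure can be traced with a toy model. This sketch is not the product's implementation: the functions switch_role and reset_role are hypothetical stand-ins for the geo switch-role and geo reset-role commands, and they encode only the transitions that the steps above describe (a switch moves the local instance to STANDBY_ERROR and the peer instance to PRIMARY; a reset moves STANDBY_ERROR to STANDBY).

```python
PEER = {"Rack-1": "Rack-2", "Rack-2": "Rack-1"}


def switch_role(state, rack, instance_id):
    """Model of 'geo switch-role instance-id N role standby' run on rack."""
    state[rack][instance_id] = "STANDBY_ERROR"
    state[PEER[rack]][instance_id] = "PRIMARY"


def reset_role(state, rack, instance_id):
    """Model of 'geo reset-role instance-id N role standby' run on rack."""
    if state[rack][instance_id] != "STANDBY_ERROR":
        raise ValueError("reset is modeled only from STANDBY_ERROR")
    state[rack][instance_id] = "STANDBY"


def roles(state, rack):
    """Format the roles of instance 1 and 2 as 'ROLE1/ROLE2'."""
    return "/".join(state[rack][i] for i in (1, 2))


def simulate_rack1_upgrade():
    # Step 1: starting roles.
    state = {
        "Rack-1": {1: "PRIMARY", 2: "STANDBY"},
        "Rack-2": {1: "STANDBY", 2: "PRIMARY"},
    }
    trace = []

    def record(step):
        trace.append((step, roles(state, "Rack-1"), roles(state, "Rack-2")))

    # Step 2: switch both instances on Rack-1 to standby.
    for instance_id in (1, 2):
        switch_role(state, "Rack-1", instance_id)
    record(2)
    # Steps 5 to 7: the upgrade itself does not change roles in this model.
    # Step 8: reset both instances on Rack-1.
    for instance_id in (1, 2):
        reset_role(state, "Rack-1", instance_id)
    record(8)
    # Step 10: switch instance 1 on Rack-2 so that Rack-1 regains it.
    switch_role(state, "Rack-2", 1)
    record(10)
    # Step 13: reset instance 1 on Rack-2.
    reset_role(state, "Rack-2", 1)
    record(13)
    return trace


for step, rack1, rack2 in simulate_rack1_upgrade():
    print(f"after step {step}: Rack-1={rack1} Rack-2={rack2}")
```

The final state equals the starting state, which is why the Rack-2 procedure can begin from STANDBY/PRIMARY on Rack-2.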

Upgrading Rack-2 when GR is enabled:

  1. Verify that the available roles of both instances on Rack-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  2. Initiate switch role for both instances on Rack-2 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from STANDBY/PRIMARY to STANDBY_ERROR/STANDBY_ERROR.

    geo switch-role instance-id 1 role standby [failback-interval 0]
    geo switch-role instance-id 2 role standby [failback-interval 0]
  3. Verify that the available roles of both instances move to STANDBY_ERROR on Rack-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that the available roles of both instances move to PRIMARY on Rack-1.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running) on Rack-2, as required.

  6. After the upgrade procedure completes, perform a health check on Rack-2 and ensure that the PODs have come up and Rack-2 is healthy. Then continue with the subsequent steps.

  7. Verify that the available roles of both the instances remain in STANDBY_ERROR on Rack-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-2 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the available roles of both instances move to STANDBY on Rack-2.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 2 on Rack-1 to STANDBY. This step transitions the available roles of Rack-1 from PRIMARY/PRIMARY to PRIMARY/STANDBY_ERROR and Rack-2 from STANDBY/STANDBY to STANDBY/PRIMARY.

    geo switch-role instance-id 2 role standby [failback-interval 0]
  11. Verify that the available roles of both instances on Rack-1 are in PRIMARY/STANDBY_ERROR.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY_ERROR"
  12. Verify that the available roles of both instances on Rack-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  13. Initiate reset role for instance-id 2 on Rack-1 to STANDBY. This step transitions the roles of Rack-1 from PRIMARY/STANDBY_ERROR to PRIMARY/STANDBY.

    geo reset-role instance-id 2 role standby
  14. Verify that the available roles of both the instances on Rack-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
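
Each verification step compares the output of show role against an expected pair of roles. A minimal sketch of that comparison follows. It assumes the output contains a line of the form result "ROLE" exactly as shown in the steps above; the helper names parse_role and roles_match are hypothetical, and how the output text is captured from the CLI is left to the operator.

```python
import re

# Matches a line such as: result "STANDBY_ERROR"
_RESULT_RE = re.compile(r'^\s*result\s+"([A-Z_]+)"\s*$', re.MULTILINE)


def parse_role(show_role_output):
    """Extract the role name from captured 'show role' output text."""
    match = _RESULT_RE.search(show_role_output)
    if match is None:
        raise ValueError("no result line found in output")
    return match.group(1)


def roles_match(outputs, expected):
    """Compare captured outputs with an expected pair.

    outputs: captured text for instance-id 1 and instance-id 2, in order.
    expected: roles written as in the procedure, for example 'PRIMARY/STANDBY'.
    """
    actual = "/".join(parse_role(text) for text in outputs)
    return actual == expected


# Sample text copied from the final verification step on Rack-1.
captured = [
    'show role instance-id 1\nresult "PRIMARY"',
    'show role instance-id 2\nresult "STANDBY"',
]
print(roles_match(captured, "PRIMARY/STANDBY"))
```

Keeping the expected value in the same ROLE1/ROLE2 notation as the procedure makes it easy to check each step against the text.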