Software Upgrade on GR Pairs

This section uses config commit as the reference scenario. The same checklist also applies to other upgrade scenarios.

Checklist

Note

Don’t perform cluster sync on both sites (Rack-1/Site-1 and Rack-2/Site-2) at the same time. Trigger a manual switchover on Rack-1 before proceeding with the Rack-1/Site-1 upgrade.

  • Don’t perform config commits on both sites at the same time. Perform the config commit on each site separately.

  • Prior to config commit on Rack-1/Site-1, initiate a CLI-based switchover on Rack-1/Site-1 and confirm that Rack-2/Site-2 holds Primary ownership of both instances (instance-id 1 and instance-id 2).

  • Perform config commit on Rack-1/Site-1. Wait until the config commit succeeds and the PODs, which restart to fetch the latest helm charts (if applicable), are back in running state.

  • Revert the role of Rack-1/Site-1 to be Primary (Switch/Reset roles on both sites).

  • Verify that the roles of Rack-1/Site-1 (Primary) and Rack-2/Site-2 (Standby) are as expected.

  • Repeat the above checklist for Rack-2/Site-2.
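The ordering in the checklist above can be sketched as a small plan generator. This is only an illustration of the sequencing (one site at a time, switchover before commit, roles restored afterwards). The function name commit_plan and the wording of each action are assumptions made for the example; they are not product commands.

```python
# Illustrative only: site names come from the checklist; commit_plan and
# the action wording are hypothetical, not product CLI.
SITES = ("Rack-1/Site-1", "Rack-2/Site-2")


def commit_plan(sites=SITES):
    """Return (site, action) pairs in the order the checklist prescribes."""
    plan = []
    for site in sites:
        # The peer is the site that must own Primary during the commit.
        peer = sites[1] if site == sites[0] else sites[0]
        plan.extend([
            (site, f"switch over so that {peer} is Primary for instance-id 1 and 2"),
            (site, "config commit, then wait for PODs to return to running state"),
            (site, "switch/reset roles to restore the original Primary/Standby layout"),
            (site, "verify roles on both sites"),
        ])
    return plan


if __name__ == "__main__":
    for number, (site, action) in enumerate(commit_plan(), start=1):
        print(f"{number}. [{site}] {action}")
```

Because the plan is built site by site, the two sites never appear interleaved, which mirrors the rule that config commits must not run on both sites at the same time.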

Software Upgrade

Rack-1/Site-1 Upgrade when GR is Enabled

  1. Verify that roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  2. Initiate switch role for both instances on Rack-1/Site-1 to STANDBY with failback-interval of 30 seconds. This step transitions the roles from PRIMARY/STANDBY to STANDBY_ERROR/STANDBY_ERROR.

    Note

    Heartbeat between both the sites should be successful.

    geo switch-role instance-id 1 role standby failback-interval 30
    geo switch-role instance-id 2 role standby failback-interval 30
  3. Verify that roles of both instances have moved to STANDBY_ERROR on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that roles of both instances have moved to PRIMARY on Rack-2/Site-2.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-1/Site-1.

  6. Perform the following steps after the upgrade procedure completes. First, perform a health check on Rack-1/Site-1 and ensure that the PODs have come up and Rack-1/Site-1 is healthy.

  7. Verify that roles of both instances remain in STANDBY_ERROR mode on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-1/Site-1 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the roles of both instances have moved to STANDBY on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions roles of Rack-2/Site-2 from PRIMARY/PRIMARY to STANDBY_ERROR/PRIMARY and Rack-1/Site-1 from STANDBY/STANDBY to PRIMARY/STANDBY.

    geo switch-role instance-id 1 role standby failback-interval 30
  11. Verify that roles of the instances on Rack-2/Site-2 are in STANDBY_ERROR/PRIMARY.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "PRIMARY"
  12. Verify that the roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  13. Initiate reset role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the roles of Rack-2/Site-2 from STANDBY_ERROR/PRIMARY to STANDBY/PRIMARY.

    geo reset-role instance-id 1 role standby
  14. Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
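
The role transitions in the Rack-1/Site-1 procedure above can be traced with a toy state model. This sketch encodes only what the steps state: a switch-role to standby leaves the local instance in STANDBY_ERROR and the peer instance in PRIMARY, and a reset-role moves STANDBY_ERROR to STANDBY. The function names and the dictionary layout are assumptions for illustration. The model does not talk to a device and does not model heartbeat or the failback-interval timer.

```python
R1, R2 = "Rack-1/Site-1", "Rack-2/Site-2"


def switch_role(roles, site, peer, instance):
    """Model 'geo switch-role instance-id N role standby' as the steps describe it."""
    roles[site][instance] = "STANDBY_ERROR"
    roles[peer][instance] = "PRIMARY"


def reset_role(roles, site, instance):
    """Model 'geo reset-role instance-id N role standby': STANDBY_ERROR -> STANDBY."""
    if roles[site][instance] != "STANDBY_ERROR":
        raise ValueError(f"{site} instance {instance} is not in STANDBY_ERROR")
    roles[site][instance] = "STANDBY"


def simulate_rack1_upgrade():
    """Replay steps 1 to 14 and return the final roles on both sites."""
    roles = {
        R1: {1: "PRIMARY", 2: "STANDBY"},   # step 1
        R2: {1: "STANDBY", 2: "PRIMARY"},
    }
    for instance in (1, 2):                  # step 2, run on Rack-1/Site-1
        switch_role(roles, R1, R2, instance)
    assert roles[R1] == {1: "STANDBY_ERROR", 2: "STANDBY_ERROR"}  # step 3
    assert roles[R2] == {1: "PRIMARY", 2: "PRIMARY"}              # step 4
    # Steps 5 to 7: the upgrade itself leaves the roles unchanged.
    for instance in (1, 2):                  # step 8
        reset_role(roles, R1, instance)
    switch_role(roles, R2, R1, 1)            # step 10, run on Rack-2/Site-2
    reset_role(roles, R2, 1)                 # step 13
    return roles


if __name__ == "__main__":
    print(simulate_rack1_upgrade())
```

The final state matches the starting state (PRIMARY/STANDBY on Rack-1/Site-1 and STANDBY/PRIMARY on Rack-2/Site-2), which is why the procedure can hand over directly to the next site.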

Rack-2/Site-2 Upgrade when GR is Enabled

  1. Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  2. Initiate switch role for both instances on Rack-2/Site-2 to STANDBY with failback-interval of 30 seconds. This step transitions the roles from STANDBY/PRIMARY to STANDBY_ERROR/STANDBY_ERROR.

    geo switch-role instance-id 1 role standby failback-interval 30
    geo switch-role instance-id 2 role standby failback-interval 30
  3. Verify that roles of both instances move to STANDBY_ERROR on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that roles of both instances move to PRIMARY on Rack-1/Site-1.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-2/Site-2.

  6. Perform the following steps after the upgrade procedure completes. First, perform a health check on Rack-2/Site-2 and ensure that the PODs have come up and Rack-2/Site-2 is healthy.

  7. Verify that roles of both the instances remain in STANDBY_ERROR on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-2/Site-2 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the roles of both instances move to STANDBY on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions roles of Rack-1/Site-1 from PRIMARY/PRIMARY to PRIMARY/STANDBY_ERROR and Rack-2/Site-2 from STANDBY/STANDBY to STANDBY/PRIMARY.

    geo switch-role instance-id 2 role standby failback-interval 30
  11. Verify that roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY_ERROR.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY_ERROR"
  12. Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  13. Initiate reset role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the roles of Rack-1/Site-1 from PRIMARY/STANDBY_ERROR to PRIMARY/STANDBY.

    geo reset-role instance-id 2 role standby
  14. Verify that roles of both the instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
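
Every verification step in both procedures compares the output of show role against an expected value. A minimal sketch of that comparison follows. It assumes the output contains a line of exactly the form shown in the steps (result "ROLE"). How the text is collected from the device is out of scope, and the helper names parse_role and roles_match are hypothetical.

```python
import re

# Matches the line format shown in the steps, for example: result "STANDBY_ERROR"
ROLE_PATTERN = re.compile(r'result\s+"([A-Z_]+)"')


def parse_role(output):
    """Extract the role name from the raw text of one 'show role' command."""
    match = ROLE_PATTERN.search(output)
    if match is None:
        raise ValueError(f"no role found in: {output!r}")
    return match.group(1)


def roles_match(outputs, expected):
    """Compare parsed roles with expected roles.

    outputs:  {instance_id: raw 'show role instance-id N' text}
    expected: {instance_id: role name, such as "PRIMARY"}
    """
    actual = {instance: parse_role(outputs[instance]) for instance in expected}
    return actual == expected


if __name__ == "__main__":
    # Step 14 of the Rack-2/Site-2 procedure, checked on Rack-1/Site-1.
    captured = {1: 'result "PRIMARY"', 2: 'result "STANDBY"'}
    print(roles_match(captured, {1: "PRIMARY", 2: "STANDBY"}))
```

Stopping the procedure when roles_match returns False keeps a failed transition from being carried into the next step.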