Use Parallel Upgrades with Acceptable Failures
Fleet Upgrade provides the Parallel upgrades and Acceptable failures fields to help you both speed up and control long upgrade runs with many devices.
As explained in Run a Fleet Upgrade job, whenever you create a Fleet Upgrade job, you can select up to 50 devices to be upgraded during that job. Depending on the number of devices and the size of each upgrade, a job can last for hours. It would take even longer if you had to upgrade the devices one at a time, in series, and it would be worse still if you couldn't cancel the run once upgrades started failing.
To help with these issues, you can use the Parallel upgrades field to specify how many upgrades you want performed at the same time, in parallel. If you enter a Parallel upgrades value equal to the total number of devices to be upgraded (up to the maximum of 50), all the upgrades will take place at the same time. If you leave Parallel upgrades set to the default value of 1, Fleet Upgrade performs the upgrades one at a time, in series.
Most users specify a lower Parallel upgrades value, such as 5 or 10. Doing so helps conserve processing resources and ensures that only a few of the network devices in a 50-device job set will be offline at one time.
With a lower Parallel upgrades value, Fleet Upgrade performs the upgrades in batches. For a 50-device upgrade group with a Parallel upgrades value of 5, this means 10 batches of five upgrades each. In this case, Fleet Upgrade performs all five of the upgrades in batch #1 at the same time, in parallel, and doesn't initiate any of the upgrades in batch #2 until all of the upgrades in batch #1 are finished.
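The batch arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the grouping, not Fleet Upgrade's implementation: the function name plan_batches and the device names are hypothetical, and the order in which Fleet Upgrade assigns devices to batches is an assumption.

```python
def plan_batches(devices, parallel_upgrades=1):
    """Split a device list into consecutive batches.

    Each batch holds at most `parallel_upgrades` devices; the last
    batch may be smaller if the device count is not an exact multiple.
    """
    if parallel_upgrades < 1:
        raise ValueError("parallel_upgrades must be at least 1")
    return [
        devices[i:i + parallel_upgrades]
        for i in range(0, len(devices), parallel_upgrades)
    ]


# Hypothetical 50-device job with a Parallel upgrades value of 5.
devices = [f"device-{n}" for n in range(1, 51)]
batches = plan_batches(devices, parallel_upgrades=5)
print(len(batches))     # 10
print(len(batches[0]))  # 5
```

With the default value of 1, the same function returns 50 single-device batches, which corresponds to upgrading the devices in series.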
How can you cancel a job that's failing too often? Fleet Upgrade will automatically cancel the remaining upgrades in a job depending on the number of Acceptable failures you set. The value you specify in this field acts as a failure "budget" that, when exceeded, triggers automatic cancellation of all of the remaining upgrades in the run. If you want to avoid automatic cancellation entirely, specify an Acceptable failures value equal to the total number of devices to be upgraded (up to the maximum of 50). Set it to the default value of 1 if you want the system to cancel remaining upgrades after the very first failure.
Bear in mind that a batch will run to completion once started. Once the failure budget defined in Acceptable failures has been exceeded, Fleet Upgrade blocks additional batches from starting, but it does not interrupt the batch already in progress. As a result, the total number of actual failures can exceed the failure budget, and cancellation can take longer to kick in than you might expect.
For example: Let's assume that our job set is 50 devices. Our Parallel upgrades setting is 5 and our Acceptable failures setting is 5. That means we have 10 batches of 5 devices for Fleet Upgrade to perform. Let's further suppose that, during execution of batch #1, we encounter 4 failures. The 5-failure budget is not yet exceeded, so Fleet Upgrade will begin to execute all the upgrades in batch #2 in parallel. We then encounter 4 more failures in batch #2. The 5-failure budget is now exceeded, so Fleet Upgrade will automatically cancel execution of batch #3 and the 7 batches that follow it. However, we've actually encountered 8 failures, not 5.
Similarly, we might encounter only 1 failure each in batches #1, #2, #3, and #4, then encounter 1 more failure in batch #5. We now have a total of 5 failures, but this only equals the failure budget and does not exceed it, so Fleet Upgrade goes on to the next batch. Then, in batch #6, every upgrade fails, exceeding the failure budget and triggering cancellation of the run. In this case, we've actually encountered 10 failures, twice the number we specified. Also, cancellation wasn't triggered until batch #7 was due to start, after 30 of the 50 devices (60 percent of the run) had already been attempted.
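The two scenarios above can be reproduced with a small simulation. This is a minimal sketch, not Fleet Upgrade's actual logic: the function name simulate_job and the per-batch failure counts are hypothetical, and the sketch assumes, following the worked example, that the budget is checked only before a batch starts and that cancellation requires the failure count to exceed the budget rather than merely equal it.

```python
def simulate_job(failures_per_batch, acceptable_failures=1):
    """Return (batches_run, total_failures) for one job.

    failures_per_batch: hypothetical number of failed upgrades in
    each batch, in execution order.
    Assumption: a started batch always runs to completion, and the
    budget is compared (strictly) only before the next batch starts.
    """
    total_failures = 0
    batches_run = 0
    for failed in failures_per_batch:
        if total_failures > acceptable_failures:
            break  # budget exceeded: remaining batches are cancelled
        total_failures += failed
        batches_run += 1
    return batches_run, total_failures


# Scenario 1: 4 failures in each of the first two batches.
print(simulate_job([4, 4, 0, 0, 0, 0, 0, 0, 0, 0], acceptable_failures=5))
# (2, 8)

# Scenario 2: 1 failure in each of batches 1-5, then 5 failures in batch 6.
print(simulate_job([1, 1, 1, 1, 1, 5, 0, 0, 0, 0], acceptable_failures=5))
# (6, 10)
```

In both cases the total number of failures is higher than the Acceptable failures value, because the batch that pushes the count over the budget still runs to completion.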













