Steps to apply Windows Updates
Is there a set of instructions we should follow when applying Windows Updates on our SCSM, Orchestrator and Cireson servers?
Our company applies Windows Updates to all servers on a monthly basis, and recently the update process has crashed our SCSM servers a few times. To prevent this from happening again, we would like to apply the updates manually ourselves instead of letting them install automatically. Is there a sequence we should follow? Please advise.
Answers
It depends on how you're defining "Windows Updates" and "crashed" here. Are you referring to ALL Windows Updates, or just Update Rollups for SC products? And by crashing, do you mean stalled workflows? Portal unavailable but SCSM still running? A server that is hard down?
Hi, @Adam_Dzyacky, it is all Windows Server updates, not updates to System Center products. The servers hung, and the workflows stalled as a result. The portal and SCSM were still available.
Got it. So it sounds like the workflow server patches, reboots, and then fails to resume workflows from where it was pre-reboot. The environment otherwise runs because infrastructure is running but workflows just don't process new items.
Update Rollups are really the only thing to be laser focused on in terms of the order of deployment. Otherwise, when it comes to general Windows Updates, the order should not matter. So unfortunately, without a deep understanding of your environment, the updates, and the timing of events, I can only speculate as to what happened on the occasions you refer to.
But what I can tell you is the spectrum I moved across over the years with respect to the workflow server:
Thanks @Adam_Dzyacky. Good to know that there is no particular sequence we need to follow when applying the updates. Our team does not have much control over SCOM, so #3.a.i sounds like a good option. May I know how to build it? Please advise.
I highly recommend at least trying to pursue the SCOM route because it's a 100% out-of-box solution from Microsoft, which makes build time next to nothing and your configuration time super low. Here's the link for the SCOM MPs to monitor SCSM.
Nevertheless, here's the approach for a SCSM/SCO solution that you can build from the ground up. Please make sure you first test this in a development environment.
At this point, we'll have a workflow that runs every 5 minutes and writes an Event Log entry that says "Workflows are running". If workflows are not running, we won't see the message. We're halfway done and now it's time to move into Orchestrator.
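As a rough illustration of the heartbeat step described above, the workflow's script could look something like the sketch below. The log name and the event source name are assumptions made for the example (the thread does not state them), and event ID 5555 is the ID the Orchestrator script looks for in the next post.

```powershell
# Sketch only: the log and source names are assumptions, not taken from the thread.
$LogName = 'Application'
$Source  = 'SCSM Workflow Heartbeat'   # hypothetical event source name

# An event source must be registered once (requires an elevated session)
# before entries can be written under it.
if (-not [System.Diagnostics.EventLog]::SourceExists($Source)) {
    New-EventLog -LogName $LogName -Source $Source
}

# Heartbeat entry: it only appears if the workflow engine actually ran this step.
Write-EventLog -LogName $LogName -Source $Source -EventId 5555 `
    -EntryType Information -Message 'Workflows are running'
```

Note that `Write-EventLog` and `New-EventLog` are Windows PowerShell cmdlets, so this assumes the workflow runs under Windows PowerShell rather than PowerShell 7.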
The script here attempts to retrieve the 5555 event from the workflow server. If no results are returned, the Catch engages and writes a new event 5556 of type Warning (seen as a yellow triangle in the Event Log) stating that workflows are not running, for the sake of auditing. Then we stop the workflow service (HealthService), delete the Health Service folder, and start the service back up.
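The original script is not reproduced in this thread, so what follows is only a minimal sketch of the logic just described, not the author's actual runbook. The server name, log and source names, the 15-minute lookback window, and the Health Service State folder path are all assumptions; the folder location in particular varies by SCSM version and should be verified on your own server.

```powershell
# Sketch under stated assumptions; test in a development environment first.
$WorkflowServer  = 'SCSM-WF01'                 # hypothetical workflow server name
$LogName         = 'Application'               # assumed log used by the heartbeat
$Source          = 'SCSM Workflow Heartbeat'   # hypothetical, must already be registered
$LookbackMinutes = 15                          # assumed: three missed 5-minute heartbeats
# Assumed install path; verify it for your SCSM version before deleting anything.
$HealthStatePath = 'C:\Program Files\Microsoft System Center\Service Manager\Health Service State'

Invoke-Command -ComputerName $WorkflowServer -ArgumentList $LogName, $Source, $LookbackMinutes, $HealthStatePath -ScriptBlock {
    param($LogName, $Source, $LookbackMinutes, $HealthStatePath)

    try {
        # Get-WinEvent raises an error when nothing matches the filter;
        # -ErrorAction Stop turns that into something the catch block can handle.
        $filter = @{
            LogName   = $LogName
            Id        = 5555
            StartTime = (Get-Date).AddMinutes(-$LookbackMinutes)
        }
        Get-WinEvent -FilterHashtable $filter -MaxEvents 1 -ErrorAction Stop | Out-Null
    }
    catch {
        # Caveat: any error lands here (for example access denied), not only "no events found".
        Write-EventLog -LogName $LogName -Source $Source -EventId 5556 `
            -EntryType Warning -Message 'Workflows are not running. Restarting HealthService.'

        Stop-Service -Name HealthService -Force
        Remove-Item -Path $HealthStatePath -Recurse -Force
        Start-Service -Name HealthService
    }
}
```

Scheduling this in a runbook at an interval longer than the heartbeat (for example every 15 minutes) gives the workflow a few chances to write its entry before a restart is triggered.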
You could also get fancy with the runbook to do things like email you when workflows get restarted so you're aware it's happening.
Thanks @Adam_Dzyacky. If we do install the SCOM management pack, will it help us restart the required workflows automatically? Regarding the term "workflows", I am referring to the workflow that moves a newly created Service Request (SR) from "New" to "In Progress". I am also referring to the activity workflow within an SR. The same goes for Change Requests (CRs). After the system fell apart last week, we had to run a PowerShell script against each of the SRs and CRs to update their status and kick-start the stalled MAs within the PAs. These MAs had no status at all after their preceding activity became Completed. We had to go back to the requests each day to see where they were and update the activity status manually with a PowerShell script as needed.
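One possible way to add that notification, sketched here with placeholder addresses and a placeholder SMTP relay (none of which come from the thread), is a single call placed right after the service restart:

```powershell
# Hypothetical sender, recipient, and SMTP relay; replace with your own values.
Send-MailMessage -From 'scsm-monitor@example.com' `
    -To 'scsm-admins@example.com' `
    -Subject 'SCSM workflows were restarted' `
    -Body "No 5555 heartbeat was found, so HealthService was restarted at $(Get-Date -Format 'u')." `
    -SmtpServer 'smtp.example.com'
```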
Please advise.
The SCOM MP will only Alert, so you'd need to wire up a recovery action when that Alert occurs within SCOM.
And yes - we are talking about the same kind of "workflows" here 😁
Hi @Adam_Dzyacky, then the MP will not be that helpful. We need the app to be able to recover on its own. We want to avoid the tragedy that happened last week, so we are thinking of applying the Windows Updates manually, to make sure that the servers and everything else are working at the end of the patching process before we open the app to users. Maybe it is something that I need to bring to Microsoft's attention.
In both of my examples above, the recovery has to be built by you. It's just that going the SCOM route spares you from building the initial trigger/monitor condition.
Since you said your team doesn't have much control over SCOM, building the SCSM/SCO solution sounds like it will be the fastest route.