Steps to apply Windows Updates

Carol_Lee · November 2020

Is there a set of instructions we should follow when applying Windows Updates on our SCSM, Orchestrator and Cireson servers?

Our company has been applying Windows Updates to all the servers on a monthly basis, and currently the update process has crashed our SCSM servers a few times. To prevent this from happening again, we would like to conduct the updates ourselves manually, instead of letting them be done automatically. We wonder if there is a sequence we should follow. Please advise.

Adam_Dzyacky · November 2020

It depends on how you're defining Windows Updates and crashed here. In that, are you referring to ALL Windows Updates? Or just Update Rollups for SC products? For crashing - stalled workflows? Portal unavailable but SCSM still running? A server that is hard down?

Carol_Lee · November 2020

Hi, @Adam_Dzyacky, it is all Windows Server Updates, not updates to System Center products. The servers hung, which stalled workflows. Portal and SCSM were still available.

Adam_Dzyacky · November 2020

Got it. So it sounds like the workflow server patches, reboots, and then fails to resume workflows from where it was pre-reboot. The environment otherwise runs because infrastructure is running but workflows just don't process new items.

Update Rollups are really the only thing to be laser focused on in terms of the order of deployment. Otherwise, when it comes to general Windows Updates, the order should not matter. So unfortunately, without a deep understanding of your environment, the updates, and the timing of events, I can only speculate as to what happened on the occasions you refer to.

But what I can tell you is the spectrum I moved across over the years with respect to the workflow server:

1. Manual patching. Manual checkups. Manual reset of workflow services.
  a. It probably goes without saying, but this is the most involved option. I do have some dashboarding solutions that you can use to visualize workflow delays in the portal, so you can check in at whim.
2. Automated patching. Manual checkups.
  a. Sounds like this is your current state.
3. Automated patching. Automated checkups.
  a. There are a couple of ways to go about this:
    i. A custom SCSM workflow that writes to the Windows Event Log just to say "Everything is fine". If that message doesn't appear every 5 minutes, use an SCO Runbook to automatically restart workflow services.
    ii. Using SCOM with native SCSM monitoring, it can Alert/Notify on delayed workflows. With a custom action you could restart workflow services.

Carol_Lee · November 2020

Thanks @Adam_Dzyacky. Good to know that there is no particular sequence we need to follow when applying the updates. Our team does not have much control over SCOM, so #3.a.i sounds like a good option. May I know how to build it? Please advise.

Adam_Dzyacky · November 2020

I highly recommend at least trying to pursue the SCOM route because it's a 100% out-of-box solution from Microsoft, which makes build time next to nothing and your configuration time super low. Here's the link for the SCOM MPs to monitor SCSM.

Nevertheless, here's the approach for an SCSM/SCO solution that you can build from the ground up. Please make sure you first test this in a development environment.

1. With the SCSM Authoring Tool, create a new management pack. Let's say "SCSM.Workflow.Monitoring.xml".
2. Create a new Workflow called "MonitorWorkflows".
  a. Run on a schedule.
  b. Run it every 5 minutes.
3. Drop a "Windows PowerShell Script" activity into the workflow and then edit the Script Body. This is a one-liner:

Write-EventLog -LogName "Operations Manager" -Source "Health Service Modules" -EventId 5555 -EntryType Information -Message "Workflows are running"

Save the MP, seal the MP, copy the MonitorWorkflows.dll to the SCSM installation directory on the Workflow server, then import the MP.

At this point, we'll have a workflow that runs every 5 minutes and writes an Event Log entry that says "Workflows are running". If workflows are not running, we won't see the message. We're halfway done and now it's time to move into Orchestrator.
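Before moving on to Orchestrator, it may be worth confirming that the 5555 heartbeat is actually landing in the log. The following is a minimal sketch, not part of the original solution: `Test-HeartbeatFresh` is a hypothetical helper name, and the 10 minute default is an assumed tolerance (two missed 5 minute runs).

```powershell
# Hypothetical helper: returns $true when at least one event is newer than the cutoff.
function Test-HeartbeatFresh {
    param(
        [object[]]$Events,              # objects exposing a TimeGenerated property
        [int]$MaxAgeMinutes = 10,       # assumed tolerance, adjust to taste
        [datetime]$Now = (Get-Date)     # injectable so the check can be tested
    )
    $cutoff = $Now.AddMinutes(-$MaxAgeMinutes)
    # Keep only events written after the cutoff and report whether any remain
    $recent = @($Events | Where-Object { $_.TimeGenerated -gt $cutoff })
    return ($recent.Count -gt 0)
}
```

On the workflow server, something like `Test-HeartbeatFresh -Events (Get-EventLog -LogName "Operations Manager" -Source "Health Service Modules" -InstanceId 5555 -Newest 5 -ErrorAction SilentlyContinue)` should return True once the workflow has run at least once.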

1. With SCO, we'll create a new runbook called something like "Monitor SCSM Workflows".
2. We'll use a "Monitor Date/Time" activity and have it run every 8 minutes.
3. Link it over to a "Run .NET Script" activity of a PowerShell type. This will be a couple of lines.

$workflowServerName = "mgmtServerName"
#only accept a heartbeat written since the previous check (the runbook runs every 8 minutes)
$since = (Get-Date).AddMinutes(-8)
try
{
    #attempt to get a recent heartbeat event from the workflow server
    #-ErrorAction Stop turns a "no matches" or connection error into something the catch can see
    $heartbeat = Get-EventLog -LogName "Operations Manager" -Source "Health Service Modules" -InstanceId 5555 -After $since -ComputerName $workflowServerName -ErrorAction Stop
    if (-not $heartbeat) { throw "No 5555 event found since $since" }
}
catch
{
    #no recent event exists, write a new warning event, restart and clear workflow services on the workflow server
    Invoke-Command -ScriptBlock {
        Write-EventLog -LogName "Operations Manager" -Source "Health Service Modules" -EventId 5556 -EntryType Warning -Message "Workflows are not running. Cleaning up."
        Get-Service "HealthService" | Stop-Service -Force
        Remove-Item -Path "C:\Program Files\Microsoft System Center\Service Manager\Health Service State" -Force -Recurse
        Start-Service "HealthService"
    } -ComputerName $workflowServerName
}

The script here attempts to retrieve the 5555 event from the workflow server. If no results are returned, the catch block engages and writes a new event 5556 of a Warning nature (seen as a yellow triangle in the Event Log) stating that workflows are not running, for the sake of auditing. Then we stop the workflow service (HealthService), delete the Health Service State folder, and start the service back up.

You could also get fancy with the runbook to do things like email you when workflows get restarted so you're aware it's happening.
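One way the email idea might look, as a rough sketch only: the helper below builds the parameters for `Send-MailMessage`. The function name `New-WorkflowRestartMail`, the addresses, and the SMTP host are all placeholders that do not come from this thread, and Orchestrator's own "Send Email" activity linked after the script would likely work just as well.

```powershell
# Hypothetical helper: builds a parameter set for Send-MailMessage.
# All addresses and the SMTP server below are placeholders, replace with your own.
function New-WorkflowRestartMail {
    param(
        [string]$ServerName,
        [datetime]$When = (Get-Date)
    )
    return @{
        To         = 'scsm-admins@example.com'
        From       = 'orchestrator@example.com'
        SmtpServer = 'smtp.example.com'
        Subject    = "SCSM workflows restarted on $ServerName"
        Body       = "Event 5555 was not found on $ServerName at $($When.ToString('u')). HealthService was restarted and its state folder was cleared."
    }
}
```

Inside the catch block of the runbook script, this could be used as `$mail = New-WorkflowRestartMail -ServerName $workflowServerName` followed by `Send-MailMessage @mail`.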

Carol_Lee · November 2020

Thanks @Adam_Dzyacky. If we do install the SCOM management pack, will it help us restart the required workflows automatically? Re the term "workflows", I am referring to the workflow that moves a newly created Service Request (SR) from "New" to "In Progress". I am also referring to the activity workflow within an SR. The same goes for Change Requests (CRs). After the system fell apart last week, we had to run a PowerShell script against each of the SRs and CRs to update their status and kick-start the stalled MAs within the PAs. These MAs had no status at all after their preceding activity became Completed. We had to go back to the requests each day to see where they were and update the activity status manually with a PowerShell script as needed.

Please advise.

Adam_Dzyacky · November 2020

The SCOM MP will only Alert, so you'd need to wire up a recovery action when that Alert occurs within SCOM.

And yes - we are talking about the same kind of "workflows" here 😁

Carol_Lee · November 2020

Hi @Adam_Dzyacky, then the MP will not be that helpful. We need the app to be able to recover on its own. We want to avoid the tragedy that happened last week, so we are thinking of applying the Windows Updates manually, to make sure that the servers and everything else are working at the end of the patching process before we open the app to users. Maybe it is something that I need to bring to Microsoft's attention.

Adam_Dzyacky · November 2020

In both of my examples above, the recovery has to be built by you. It's just that going the SCOM route spares you from building the initial trigger/monitor condition.

Since you said your team doesn't have much control over SCOM, building the SCSM/SCO solution sounds like it will be the fastest route.
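If you also patch manually and want a quick check before opening the app back up to users, a sketch along these lines could help. `Get-NotRunningService` is a hypothetical helper, and the default service list is an assumption: HealthService is the workflow service discussed above, while OMSDK and OMCFG are my understanding of the other core SCSM management server service names and should be verified in your environment.

```powershell
# Hypothetical helper: lists required services that are missing or not running.
function Get-NotRunningService {
    param(
        [object[]]$Services,    # objects exposing Name and Status, e.g. from Get-Service
        # Assumed core SCSM services, verify these names on your own servers
        [string[]]$RequiredNames = @('HealthService', 'OMSDK', 'OMCFG')
    )
    foreach ($name in $RequiredNames) {
        $match = $Services | Where-Object { $_.Name -eq $name } | Select-Object -First 1
        # Emit the name when the service is absent or in any state other than Running
        if (-not $match -or $match.Status -ne 'Running') { $name }
    }
}
```

Run on each server after it reboots, `Get-NotRunningService -Services (Get-Service)` returning nothing would suggest the core services came back up. It does not prove workflows are processing, which is what the 5555 heartbeat is for.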
