Workflows aren't getting created/run AND SCSM Console - Administration - Workflows - Status is SLOW!
Recently we had an issue where email notifications were not triggering on every ticket, and Change Requests were not completing even 15 minutes after the last activity had been finished and marked as such.
When I went into [SCSM Console - Administration - Workflows - Status], the activities were very slow to load. In fact, when I clicked on a workflow and then clicked "All Instances", it took more than 5 minutes to refresh!
My theory was that this issue was simply caused by too many failed workflows being present.
So I used the SCSM C# SDK (copied into project and referenced Microsoft.EnterpriseManagement.Core.dll from an SCSM Server: C:\Program Files\Microsoft System Center 2012 R2\Service Manager\SDK Binaries ) to retry failed notification subscriptions:
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Subscriptions;
    ~~~~~~~~~~~~
    EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");

    // MONSTER WORKFLOW LIST: every workflow subscription in the management group
    List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();

    foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
    {
        // Only notification subscriptions are retried here
        if (wfSub is NotificationSubscription)
        {
            IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
            foreach (SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
            {
                // Retry
                emg.Subscription.RetryFailedSubscription(wfSub, wfSubFailedInstance);
            }
        }
    }
The above took about 5 seconds to complete. I waited 15 minutes before doing anything else, but I still hadn't received the email for an incident I had been assigned. Since nothing arrived, it became clear that, with performance this slow, the notification job wasn't present when it should have been.
So it seemed that, because of too many failed workflows, new workflows weren't even being created, let alone running when they should have been.
So the next step was to Ignore ALL failed workflow instances!
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Subscriptions;
    ~~~~~~~~~~~~
    EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");

    // MONSTER WORKFLOW LIST: every workflow subscription in the management group
    List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();

    foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
    {
        IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
        foreach (SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
        {
            // Ignore
            emg.Subscription.IgnoreFailedSubscription(wfSubFailedInstance);
        }
    }
Running this took around 6 minutes the first time (running it after that took about 10 seconds). I then confirmed SCSM Console Workflow Instances were loading far quicker which was a major relief. I also saw the reported CRs were finally marked as completed after doing this.
Long story short, if notifications from SCSM seem to be failing intermittently, you may have too many failed workflow instances present, and you will have to mark them as ignored so that new workflows can be created. This is SCSM working as designed, not as intended.
Comments
For the sake of making this easier on those less Visual Studio inclined, here's Connor's code re-written in PowerShell. To run it, you'll have to supply the name of your SCSM management server on line 1, and you'll have to import the native Microsoft SCSM cmdlets rather than SMLets.
Having two options based on where/how I want to integrate it is absolutely outstanding, too. Thanks, gents!
Good work lads.
Through the PowerShell (either version) as well as the SCSM Console - it looks like the absolute craziest highest failure rate is on Cireson's SCSM Action Log Notify. The failures occur when the Action Log is updated with a Comment when IsPrivate = null.
This event will occur when:
And to be very clear here, this is happening because of a bug in the Exchange Connector that Microsoft has never addressed. As such, Cireson's MP doesn't have a qualifier to go off of (i.e. it doesn't appear there is an error catch for handling IsPrivate = null). You can also see this as an Event on the workflow server with ID 33880.
Anyways, I ran the "Ignore ALL Failed Workflow Instances" code with an extra line just after initializing wfSubFailedInstances:
Here's my log of failed instances over 0.
I do not use SCSM Action Log Notify and after seeing this log, I have since removed the Cireson Survey Build Survey Reports as we never got around to utilizing them. Overall the workflows appear to be working quite smoothly now.
I have an issue where all workflows stopped being created 2 days ago. So there are no email notifications, and CRs are stuck at pending. Apparently no changes have been made on the SQL side or on the VM with the console.
Can someone adjust the script above so I can run it to check for failed workflows? Or provide any tips?
thanks,
Troubleshooting workflows not being created is not fun at all.
I hope you did not remove the Primary Workflow Server Computer Asset from the Configuration Item section, because that stops all workflows for everything, including connectors. NEVER delete your SCSM servers from the Windows Server view in Service Manager. (See attachment for more info)
However I think what has happened is the Microsoft Monitoring Agent service messed up on the Primary Workflow Server and it needs to create new [dbo].[MT_HealthService] encryption keys again to sync and process data.
On Primary Workflow Server:
1) Stop service Microsoft Monitoring Agent
2) Delete folder "C:\Program Files\Microsoft System Center 2012 R2\Service Manager\Health Service State"
3) Start service Microsoft Monitoring Agent
4) I don't believe you need to stop the other 2 SCSM services. You could also restart the server if you wish.
I also followed some preventative maintenance advice for SCSM Management Servers and it seems to have helped:
The "Microsoft Monitoring Agent" in Control Panel should not have any SCSM Servers listed on the Service Manager primary management server, or other Service Manager management servers. If you have a server listed in the "Microsoft Monitoring Agent Properties" it should be removed.
Also the option "Automatically update management group assignments from AD DS" should be unchecked.
I have tried clearing the Health Service State folder, but I still have about 200 scheduled instances that are not retrying.
I have tried the below.
I'm getting all the scheduled instances, but I can't retry them.
Getting
Cannot find an overload for "RetryFailedSubscription" and the argument count: "1".
You are trying to call RetryFailedSubscription with only 1 parameter. Let's look at the difference between Ignore and Retry.
Ignore - Requires wfSubInstance
Retry - Requires wfSub and wfSubInstance
As you can see, you need to call the RetryFailedSubscription with those 2 parameters.
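For comparison, here is a minimal C# sketch of the two calls side by side, using only the SDK calls from the original post. The class and method names (RetryOrIgnore, Process) are made up for illustration, "SCSMSERVER" is a placeholder, and the project needs a reference to Microsoft.EnterpriseManagement.Core.dll. Like the original code, it only enumerates failed instances, so it may not cover instances that are still scheduled.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Subscriptions;

static class RetryOrIgnore // hypothetical helper, not part of the SDK
{
    // retry == true  -> RetryFailedSubscription(wfSub, instance)  (2 arguments)
    // retry == false -> IgnoreFailedSubscription(instance)        (1 argument)
    public static void Process(EnterpriseManagementGroup emg, bool retry)
    {
        // Same "match everything" criteria as in the original post
        List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription
            .GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'"))
            .ToList();

        foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
        {
            IList<SubscriptionJobStatus> failed =
                emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);

            foreach (SubscriptionJobStatus instance in failed)
            {
                if (retry)
                {
                    // Retry needs the subscription AND the failed instance
                    emg.Subscription.RetryFailedSubscription(wfSub, instance);
                }
                else
                {
                    // Ignore only needs the failed instance
                    emg.Subscription.IgnoreFailedSubscription(instance);
                }
            }
        }
    }
}
```

It could be called as RetryOrIgnore.Process(new EnterpriseManagementGroup("SCSMSERVER"), true) to retry, or with false to ignore.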
Let us know how it goes!
Exception calling "RetryFailedSubscription" with "2" argument(s): "Object reference not set to an instance of an object."
I tried running this script (the original site has already been deleted), which gave me a lot of failed workflows.
I can see a few WFs which "Need attention" although having Status = Succeeded in the SCSM console \Administration\Workflows\Status. Clicking on Retry or Ignore has no effect.
I tried to run Adam's PS version of Retry/Ignore Failed/Scheduled WFs script, no change.
@Adam_Dzyacky, @Conner_Wood, do you have any words of wisdom on this? Thank you.
It's been a very long time since I've had to look at this stuff. I think the alerts are separate from the workflow status. However there's nothing you can really do about this unless you wanted to manually use SQL on the SCSM ServiceManager database.
SCSM is a clunky unfinished product and gives "errors" and "alerts" and "warnings" on pretty much everything.
If workflows themselves are failing to run, you may be missing required DLLs on the Primary Workflow Server that is responsible for running all workflows and is considered the thinking brain of SCSM.
Now that is interesting. I don't think I've ever seen Workflows that Need Attention despite having a status of Success on them. I also suspect that Ignore/Retry actions in the Console only work for Workflows with a Status of Failed...that is if intellisense in PowerShell ISE/VSCode is any indication:
When you hit "View Log" on one of these Successful/Needs Attention combos is there anything of value cited in there? If not, you can pipe...
and it should show you all of the individual workflows job details ever so slightly beyond what the console offers.