Workflows aren't getting created/run AND SCSM Console - Administration - Workflows - Status is SLOW!

Conner_Wood · May 2017

Howdy all,

Recently we were having an issue with email notifications not triggering on every ticket as well as Change Requests not completing even after 15 minutes when the last activity has been finished and marked as such.

When I went into [SCSM Console - Administration - Workflows - Status] it was very slow to load the activities. In fact, when I clicked on a workflow and clicked "All Instances" it was taking more than 5 minutes to refresh!

My theory was that this issue is simply caused by too many failed workflows being present.

So I used the SCSM C# SDK (copied into project and referenced Microsoft.EnterpriseManagement.Core.dll from an SCSM Server: C:\Program Files\Microsoft System Center 2012 R2\Service Manager\SDK Binaries ) to retry failed notification subscriptions:

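using System.Collections.Generic;
using System.Linq;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Subscriptions;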

~~~~~~~~~~~~

EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");
//MONSTER WORKFLOW LIST
List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();
foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
{
     if (wfSub is NotificationSubscription)
     {
          IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
          foreach(SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
          {
               //Retry
               emg.Subscription.RetryFailedSubscription(wfSub, wfSubFailedInstance);
          }
     }
}

It took about 5 seconds for the above to complete. I waited 15 minutes before doing anything else, but I still hadn't received the email for being assigned an incident. It became clear that performance was so slow the job wasn't present when it should have been.

So it seemed that, with so many failed workflows piled up, new workflows weren't even being created, let alone running when they should have been.

So the next step was to Ignore ALL failed workflow instances!

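using System.Collections.Generic;
using System.Linq;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Subscriptions;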

~~~~~~~~~~~~

EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");
//MONSTER WORKFLOW LIST
List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();
foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
{
     IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
     foreach (SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
     {
          //Ignore
          emg.Subscription.IgnoreFailedSubscription(wfSubFailedInstance);
     }
}

Running this took around 6 minutes the first time (running it after that took about 10 seconds). I then confirmed SCSM Console Workflow Instances were loading far quicker which was a major relief. I also saw the reported CRs were finally marked as completed after doing this.

Long story short, if notifications from SCSM seem to be intermittently failing, you may have too many failed workflow instances present. You will have to mark them as ignored so new workflows can be created again. This is SCSM working as designed, not as intended.

Adam_Dzyacky · May 2017

Nice one @Conner_Wood! I was recently looking for something to mass ignore all workflows that didn't involve changing the SQL Grooming Retention policy for Job History. Your C# provided is just enough to...

For the sake of making this easier on those less Visual Studio inclined, here's Conner's code rewritten in PowerShell. To run it, you'll have to supply the name of your SCSM management server on line 1, and you'll have to import the native Microsoft SCSM cmdlets, not SMLets.

$scsmMgmtServer = "scsmMGMTServerHere"
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

#MONSTER WORKFLOW LIST
$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

foreach ($wfSub in $wfSubs)
{
    $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
    foreach ($subFailedInstance in $wfSubFailedInstances)
    {
        #Ignore
        $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
    }
}

Tom_Hendricks · May 2017

I was just about to write something like this, and LOVE the fact that I no longer need to!

Having two options based on where/how I want to integrate it is absolutely outstanding, too. Thanks, gents!

Leigh_Kilday · May 2017

@David_Wells, this may interest you.

Good work lads.

Britton_Plath · May 2017

Hey, I updated the PowerShell version with some testing options to see what the counts are before updating the failed requests. I also included a snippet for pulling a specific MP.

$scsmMgmtServer = "managementservername"

$regkey = 'HKLM:\SOFTWARE\Microsoft\System Center\2010\Service Manager\Setup'
$ModuleFile = 'System.Center.Service.Manager.psd1'
$SMInstallDir = (Get-ItemProperty -Path $regkey).InstallDirectory
$FullModulePath = $SMInstallDir +'PowerShell\' +$ModuleFile

import-module $fullmodulepath -force

$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

#MONSTER WORKFLOW LIST
$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

<# How to find a specific MP and ID
$mp = Get-SCSMManagementPack -displayname "SCSM Action Log Notify"
$mp.ID
#>

#Option to only pull a specific Management Pack
#$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("ManagementPackId = '5588305e-2023-0f20-09fc-508654cecc61'")

foreach ($wfSub in $wfSubs)
{
    #$wfSub.Name
    $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
    
    #Option to only show the failed MP Name and the failed Count of Workflows. 
    if($wfSubFailedInstances.Count -gt 0){
        $wfSub.Name +': '+ $wfSubFailedInstances.Count
    }
    
    <#Option to clear the failed instances
    foreach ($subFailedInstance in $wfSubFailedInstances)
    {
        #Ignore
        $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
    }
    #>
}

Adam_Dzyacky · May 2017

Nice update @Britton_Plath

Through the PowerShell (either version) as well as the SCSM Console, it looks like by far the highest failure rate is on Cireson's SCSM Action Log Notify. The failures occur when the Action Log is updated with a Comment when IsPrivate = null.

This event will occur when:

Someone who isn't the Affected User/Assigned To updates a work item
Your workflow account is sending notifications (like your Work Item has been created, updated) and it appends to the Action Log...as your wf account isn't Affected User/Assigned To

And to be very clear here, this is happening because of a bug in the Exchange Connector that Microsoft has never addressed. As such, Cireson's MP doesn't have a qualifier to go off of (i.e. it doesn't appear there is an error catch for handling IsPrivate = null). You can also see this as an event on the workflow server with ID 33880.
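
To check how often that event is being logged, something like the following could be run on the workflow server. This is a minimal sketch: it assumes Service Manager writes to the "Operations Manager" event log, and the 7-day window and 20-event limit are arbitrary choices for illustration.

```powershell
# Assumption: SCSM events land in the "Operations Manager" log on the
# workflow server. Adjust LogName if your environment differs.
$filter = @{
    LogName   = 'Operations Manager'
    Id        = 33880
    StartTime = (Get-Date).AddDays(-7)   # arbitrary look-back window
}

# SilentlyContinue suppresses the "No events were found" error when the
# log has no matching entries in the window.
Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    Select-Object -First 20 -Property TimeCreated, Id, ProviderName, Message |
    Format-List
```

If nothing comes back, either the event has not fired in that window or the log name assumption is wrong for your server.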

Conner_Wood · June 2017

Today I encountered an issue with workflows. My assessment is that it was caused by using AlwaysOn for the ServiceManager database: it automatically failed over to another server, which stopped workflows from even being created. The trigger would go to the database, but not to the primary node. Personally I think [MT_Microsoft$SystemCenter$ResourceAccessLayer$SqlResourceStore] will need to be updated so the SQL Server portion points to the AlwaysOn listener, which always goes to the primary node....
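
As a quick first check for this kind of failover problem, it may help to see which SQL Server name the management server is configured with. The sketch below is read-only and rests on an assumption: that the database connection settings live under the `Common\Database` registry key shown here, with `DatabaseServerName` and `DatabaseName` values. The listener name is a placeholder.

```powershell
# Assumption: Service Manager keeps its database connection settings here.
# Verify the key exists on your management server before relying on it.
$dbKey = 'HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Database'

# Placeholder: replace with your AlwaysOn listener name
$listenerName = 'yourAlwaysOnListenerHere'

if (Test-Path -Path $dbKey) {
    $db = Get-ItemProperty -Path $dbKey
    Write-Host "Configured SQL Server: $($db.DatabaseServerName)"
    Write-Host "Configured database:   $($db.DatabaseName)"

    if ($db.DatabaseServerName -notlike "$listenerName*") {
        Write-Host "The configured server does not look like the listener '$listenerName'."
    }
}
else {
    Write-Host "Registry key not found: $dbKey"
}
```

This only reports the registry value. It does not look at the SqlResourceStore table mentioned above and changes nothing.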

Anyways, I ran the "Ignore ALL Failed Workflow Instances" code with an extra line just after initializing wfSubFailedInstances:

System.Diagnostics.Debug.WriteLine("Workflow=" + wfSub.DisplayName + " , Workflow Failed Instance Count: " + wfSubFailedInstances.Count.ToString());

Here's my log of failed instances over 0.

Workflow=Cireson Survey Build Survey Reports , Workflow Failed Instance Count: 2813

Workflow=Adjust Incident Priority and Resolutin Time Rule Add , Workflow Failed Instance Count: 12

Workflow=SHR Response SLO for All Priority 4 Incidents-DeleteRelationship , Workflow Failed Instance Count: 15

Workflow=Incident Resolution OLA For P5 Incidents-DeleteRelationship , Workflow Failed Instance Count: 374

Workflow=Resolve Child Incidents (Parent Incident created) , Workflow Failed Instance Count: 15

Workflow=Incident Resolution OLA For P4 Incidents-DeleteRelationship , Workflow Failed Instance Count: 17

Workflow=Cireson Auto Close Service Requests Workflow , Workflow Failed Instance Count: 1

Workflow=Incident Resolution OLA For P5 Incidents-AddRelationship , Workflow Failed Instance Count: 29

Workflow=SetSupportGroupAssignDateOnNew , Workflow Failed Instance Count: 1

Workflow=IT Servers SLO notification-breached , Workflow Failed Instance Count: 1

Workflow= , Workflow Failed Instance Count: 1

Workflow=SHR Response SLO for All Priority 5 Incidents-DeleteRelationship , Workflow Failed Instance Count: 359

Workflow=Cireson Console Licensing Workflow , Workflow Failed Instance Count: 7

Workflow=SHR Response SLO for All Priority 3 Incidents-DeleteRelationship , Workflow Failed Instance Count: 9

Workflow=Cireson Asset Management Hardware Asset Catalog Item Workflow , Workflow Failed Instance Count: 2

Workflow=SystemCenter MonitoringHostKeepAlive Rule , Workflow Failed Instance Count: 3

I do not use SCSM Action Log Notify and after seeing this log, I have since removed the Cireson Survey Build Survey Reports as we never got around to utilizing them. Overall the workflows appear to be working quite smoothly now.
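
For anyone who wants the same per-workflow counts from PowerShell without ignoring anything, here is a minimal report-only sketch. It reuses the SDK calls already shown in this thread. The function name `Get-FailedWorkflowSummary` is made up for illustration, and it assumes `$emg` is an already connected EnterpriseManagementGroup.

```powershell
# Hypothetical helper: failed-instance counts per workflow, worst first.
# Read-only: it never calls IgnoreFailedSubscription or RetryFailedSubscription.
function Get-FailedWorkflowSummary {
    param(
        # A connected Microsoft.EnterpriseManagement.EnterpriseManagementGroup
        [Parameter(Mandatory = $true)] $ManagementGroup
    )

    $subs = $ManagementGroup.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

    $rows = foreach ($sub in $subs) {
        # @(...) forces an array so .Count is reliable for zero or one result
        $failed = @($ManagementGroup.Subscription.GetFailedSubscriptionStatusById($sub.Id) |
            Where-Object { $_.Status -eq 'Failed' })

        if ($failed.Count -gt 0) {
            [pscustomobject]@{
                Workflow    = $sub.DisplayName
                FailedCount = $failed.Count
            }
        }
    }

    # Highest counts first, so the noisiest workflows are at the top
    $rows | Sort-Object -Property FailedCount -Descending
}

# Usage (server name is a placeholder):
# $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup 'scsmMGMTServerHere'
# Get-FailedWorkflowSummary -ManagementGroup $emg | Format-Table -AutoSize
```

Returning objects instead of strings means the result can be sorted, filtered, or exported with Export-Csv before deciding what to ignore.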

Conner_Wood · July 2017

@Adam_Dzyacky It never hurts to show which native SCSM modules need to be imported. Here's updated PowerShell that also lists how many instances failed for each workflow.

Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets

cls

$scsmMgmtServer = "scsmMGMTServerHere"
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

#MONSTER WORKFLOW LIST
$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

foreach ($wfSub in $wfSubs)
{
    $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
    Write-Host "#Failed=$($wfSubFailedInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
    foreach ($subFailedInstance in $wfSubFailedInstances)
    {
        #Ignore
        $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
    }
}

Pierre_Smit · February 2018

Hi All,

I have an issue where all workflows stopped being created 2 days ago, so there are no email notifications and CRs are stuck at pending. Apparently no changes have been made on the SQL side or on the VM with the console.

Can someone adjust the script above so I can run it to check for failed workflows? Or provide any tips?

thanks,

Conner_Wood · February 2018

@Pierre_Smit you are in for a world of hurt.
Troubleshooting workflows not being created is not fun at all.

I hope you did not remove the Primary Workflow Server Computer Asset from the Configuration Item section, because that stops all workflows for everything, including connectors. NEVER delete your SCSM servers from the Windows Server view in Service Manager. (See attachment for more info)

However I think what has happened is the Microsoft Monitoring Agent service messed up on the Primary Workflow Server and it needs to create new [dbo].[MT_HealthService] encryption keys again to sync and process data.

On Primary Workflow Server:
1) Stop service Microsoft Monitoring Agent
2) Delete folder "C:\Program Files\Microsoft System Center 2012 R2\Service Manager\Health Service State"
3) Start service Microsoft Monitoring Agent
4) I don't believe you need to stop the other 2 SCSM services. You could also restart the server if you wish.
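
The steps above can be sketched in PowerShell roughly as follows. It assumes an elevated session on the Primary Workflow Server and uses the service display name and folder path exactly as quoted in the steps. Both may differ on other SCSM versions, so verify them first.

```powershell
# Run in an elevated PowerShell session on the Primary Workflow Server.
# Display name and path are taken from the steps above; verify both first.
$serviceDisplayName = 'Microsoft Monitoring Agent'
$statePath = 'C:\Program Files\Microsoft System Center 2012 R2\Service Manager\Health Service State'

# 1) Stop the agent service
Stop-Service -DisplayName $serviceDisplayName -Force

# 2) Delete the Health Service State folder if it exists
if (Test-Path -LiteralPath $statePath) {
    Remove-Item -LiteralPath $statePath -Recurse -Force
}

# 3) Start the agent service again
Start-Service -DisplayName $serviceDisplayName

# Confirm the service came back up
Get-Service -DisplayName $serviceDisplayName | Select-Object DisplayName, Status
```

Adding `-WhatIf` to the `Remove-Item` line on a first run shows what would be deleted without deleting it.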

I also followed some preventative maintenance advice for SCSM Management Servers and it seems to have helped:

The "Microsoft Monitoring Agent" in Control Panel should not have any SCSM Servers listed on the Service Manager primary management server, or other Service Manager management servers. If you have a server listed in the "Microsoft Monitoring Agent Properties" it should be removed.

Also the option "Automatically update management group assignments from AD DS" should be unchecked.

Magnus_Lundgren1 · February 2018

Could this be used to trigger a retry on scheduled workflows?
I have tried clearing the Health Service State folder, but I still have around 200 scheduled instances that are not retrying.

I have tried the below.
I'm getting all the scheduled instances, but I can't retry them.

Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets

cls

$scsmMgmtServer = "srv-sc-sm01"
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

#MONSTER WORKFLOW LIST
$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

foreach ($wfSub in $wfSubs)
{
    $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Scheduled"}
    Write-Host "#Scheduled=$($wfSubFailedInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
    foreach ($subFailedInstance in $wfSubFailedInstances)
    {
        #Retry
        $emg.Subscription.RetryFailedSubscription($subFailedInstance)
    }
}

Getting

Cannot find an overload for "RetryFailedSubscription" and the argument count: "1".

Conner_Wood · February 2018

I'm not sure if you can restart a scheduled workflow because it hasn't failed yet, however I know you can ignore it.

You are trying to call RetryFailedSubscription with only 1 parameter. Let's look at the difference between Ignore and Retry.

Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets

$scsmMgmtServer = "scsmMGMTServerHere"
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

$emg.Subscription | Get-Member | Select-Object -ExpandProperty Definition

####
# YIELDS THE FOLLOWING METHOD INFORMATION
####

void IgnoreFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus), 
void ISubscriptionManagement.IgnoreFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus)

void RetryFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.IWorkflowSubscriptionBase subscriptionWorkflow, Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus), 
void ISubscriptionManagement.RetryFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.IWorkflowSubscriptionBase subscriptionWorkflow, Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus)

Ignore - Requires wfSubInstance
Retry - Requires wfSub and wfSubInstance

As you can see, you need to call the RetryFailedSubscription with those 2 parameters.

Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets

cls

$scsmMgmtServer = "srv-sc-sm01"
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

#MONSTER WORKFLOW LIST
$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

foreach ($wfSub in $wfSubs)
{
    $wfSubInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Scheduled"}
    Write-Host "#Scheduled=$($wfSubInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
    foreach ($subInstance in $wfSubInstances)
    {
        #Retry (REQUIRES $wfSub AND $subInstance)
        $emg.Subscription.RetryFailedSubscription($wfSub, $subInstance)
    }
}

Let us know how it goes!

Magnus_Lundgren1 · February 2018

Did not work

Exception calling "RetryFailedSubscription" with "2" argument(s): "Object reference not set to an instance of an object."
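
One way to narrow down an error like this is to retry only instances whose status is actually "Failed" and record the outcome per instance, so it is clear which workflow throws. The sketch below is an assumption-laden illustration, not a confirmed fix: the function name `Invoke-FailedWorkflowRetry` is made up, and nothing in this thread shows that filtering on "Failed" avoids the null reference.

```powershell
# Hypothetical helper: retry instances with Status "Failed" and report
# the outcome of each attempt instead of stopping at the first exception.
function Invoke-FailedWorkflowRetry {
    param(
        # A connected Microsoft.EnterpriseManagement.EnterpriseManagementGroup
        [Parameter(Mandatory = $true)] $ManagementGroup
    )

    $subs = $ManagementGroup.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")

    foreach ($sub in $subs) {
        $failed = @($ManagementGroup.Subscription.GetFailedSubscriptionStatusById($sub.Id) |
            Where-Object { $_.Status -eq 'Failed' })

        foreach ($instance in $failed) {
            $outcome = 'Retried'
            $detail  = ''
            try {
                # Retry needs both the subscription and the instance
                $null = $ManagementGroup.Subscription.RetryFailedSubscription($sub, $instance)
            }
            catch {
                $outcome = 'Error'
                $detail  = $_.Exception.Message
            }

            [pscustomobject]@{
                Workflow = $sub.DisplayName
                Outcome  = $outcome
                Detail   = $detail
            }
        }
    }
}

# Usage:
# Invoke-FailedWorkflowRetry -ManagementGroup $emg | Where-Object Outcome -eq 'Error'
```

Whether "Scheduled" instances can be retried at all is still an open question in this thread, so the sketch deliberately leaves them alone.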

Peter_Miklian · July 2020

I tried running this script (the original site has already been deleted), which gave me a lot of failed workflows.

I can see a few WFs which "Need attention" although having Status = Succeeded in the SCSM console \Administration\Workflows\Status. Clicking on Retry or Ignore has no effect.

I tried to run Adam's PS version of Retry/Ignore Failed/Scheduled WFs script, no change.

@Adam_Dzyacky, @Conner_Wood, do you have any words of wisdom on this? Thank you.

Conner_Wood · July 2020

It's been a very long time since I've had to look at this stuff. I think the alerts are separate from the workflow status. However there's nothing you can really do about this unless you wanted to manually use SQL on the SCSM ServiceManager database.

SCSM is a clunky unfinished product and gives "errors" and "alerts" and "warnings" on pretty much everything.

If workflows themselves are failing to run, you may be missing required DLLs on the Primary Workflow Server that is responsible for running all workflows and is considered the thinking brain of SCSM.

Adam_Dzyacky · July 2020

Now that is interesting. I don't think I've ever seen Workflows that Need Attention despite having a status of Success on them. I also suspect that Ignore/Retry actions in the Console only work for Workflows with a Status of Failed...that is if intellisense in PowerShell ISE/VSCode is any indication:

$emg.Subscription.IgnoreFailedSubscription(...
$emg.Subscription.RetryFailedSubscription(...

When you hit "View Log" on one of these Successful/Needs Attention combos is there anything of value cited in there? If not, you can pipe...

$subFailedInstance | select *

and it should show you all of the individual workflow job's details, ever so slightly beyond what the console offers.
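
To run that against a single workflow without going through the whole ignore loop, a small sketch along these lines might help. The function name `Get-WorkflowInstanceDetail` is made up for illustration, `$emg` is assumed to be a connected EnterpriseManagementGroup, and the status filter is left out on purpose so every instance the SDK returns is visible.

```powershell
# Hypothetical helper: dump every property of the job instances the SDK
# returns for one workflow, matched by its display name.
function Get-WorkflowInstanceDetail {
    param(
        # A connected Microsoft.EnterpriseManagement.EnterpriseManagementGroup
        [Parameter(Mandatory = $true)] $ManagementGroup,
        # Workflow display name as shown in the console
        [Parameter(Mandatory = $true)] [string] $DisplayName
    )

    $subs = $ManagementGroup.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'") |
        Where-Object { $_.DisplayName -eq $DisplayName }

    foreach ($sub in $subs) {
        # No status filter here: the point is to see whatever comes back
        $ManagementGroup.Subscription.GetFailedSubscriptionStatusById($sub.Id) |
            Select-Object -Property *
    }
}

# Usage (the display name is a placeholder):
# Get-WorkflowInstanceDetail -ManagementGroup $emg -DisplayName 'Workflow name from the console' | Format-List
```

Comparing the Status values in this output with what the console shows may help explain a Succeeded/Needs Attention combination.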

CIRESON COMMUNITY WEB SITE