Workflows aren't getting created/run AND SCSM Console - Administration - Workflows - Status is SLOW!

Conner_Wood Customer Advanced IT Monkey ✭✭✭
Howdy all,

Recently we had an issue where email notifications were not triggering on every ticket, and Change Requests were not completing even 15 minutes after the last activity had been finished and marked as such.

When I went into [SCSM Console - Administration - Workflows - Status] it was very slow to load the activities.  In fact, when I clicked on a workflow and clicked "All Instances" it was taking more than 5 minutes to refresh!

My theory was that this issue was simply caused by too many failed workflow instances being present.

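So I used the SCSM C# SDK (I copied Microsoft.EnterpriseManagement.Core.dll into the project and referenced it; it comes from an SCSM Server at C:\Program Files\Microsoft System Center 2012 R2\Service Manager\SDK Binaries) to retry failed notification subscriptions:
using System.Collections.Generic;                    // List<T>, IList<T>
using System.Linq;                                   // ToList()
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;  // ManagementPackRuleCriteria
using Microsoft.EnterpriseManagement.Subscriptions;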

~~~~~~~~~~~~

EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");
//MONSTER WORKFLOW LIST
List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();
foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
{
     if (wfSub is NotificationSubscription)
     {
          IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
          foreach(SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
          {
               //Retry
               emg.Subscription.RetryFailedSubscription(wfSub, wfSubFailedInstance);
          }
     }
}

The above took about 5 seconds to complete. I then waited 15 minutes before doing anything else, but I still hadn't received the email for an incident that had been assigned to me. It became clear that, with performance this slow, the workflow job wasn't even present when it should have been.

So it seemed that, because of too many failed workflow instances, new workflows weren't even being created, let alone run, when they should have been.

So the next step was to Ignore ALL failed workflow instances!
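using System.Collections.Generic;                    // List<T>, IList<T>
using System.Linq;                                   // ToList()
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;  // ManagementPackRuleCriteria
using Microsoft.EnterpriseManagement.Subscriptions;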

~~~~~~~~~~~~

EnterpriseManagementGroup emg = new EnterpriseManagementGroup("SCSMSERVER");
//MONSTER WORKFLOW LIST
List<IWorkflowSubscriptionBase> wfSubs = emg.Subscription.GetSubscriptionsByCriteria(new ManagementPackRuleCriteria("Name LIKE '%'")).ToList();
foreach (IWorkflowSubscriptionBase wfSub in wfSubs)
{
     IList<SubscriptionJobStatus> wfSubFailedInstances = emg.Subscription.GetFailedSubscriptionStatusById(wfSub.Id.Value);
     foreach (SubscriptionJobStatus wfSubFailedInstance in wfSubFailedInstances)
     {
          //Ignore
          emg.Subscription.IgnoreFailedSubscription(wfSubFailedInstance);
     }
}

Running this took around 6 minutes the first time (running it after that took about 10 seconds).  I then confirmed that SCSM Console Workflow Instances were loading far quicker, which was a major relief.  I also saw that the reported CRs were finally marked as completed after doing this.

Long story short, if notifications from SCSM seem to be failing intermittently, you may have too many failed workflow instances present, and you will have to mark them as ignored so that new workflows can be created. That's SCSM working as designed, not as intended.

Comments

  • Adam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
    edited May 2017
    Nice one @Conner_Wood! I was recently looking for something to mass ignore all workflows that didn't involve changing the SQL Grooming Retention policy for Job History. Your C# provided is just enough to...

    For the sake of making this easier on those less Visual Studio inclined, here's Conner's code rewritten in PowerShell. To run it, you'll have to supply the name of your SCSM management server on line 1, and you'll have to import the native Microsoft SCSM cmdlets, not SMLets.

    $scsmMgmtServer = "scsmMGMTServerHere"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    #MONSTER WORKFLOW LIST
    $wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")
    
    foreach ($wfSub in $wfSubs)
    {
        $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
        foreach ($subFailedInstance in $wfSubFailedInstances)
        {
            #Ignore
            $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
        }
    }

  • Tom_Hendricks Customer Super IT Monkey ✭✭✭✭✭
    I was just about to write something like this, and LOVE the fact that I no longer need to!

    Having two options based on where/how I want to integrate it is absolutely outstanding, too.  Thanks, gents!
  • Leigh_Kilday Member Ninja IT Monkey ✭✭✭✭
    @David_Wells, this may interest you.

    Good work lads.
  • Britton_Plath Customer IT Monkey ✭
    Hey, I updated the PowerShell version with some testing options to see what the counts are before ignoring the failed instances. I also included a commented-out option for pulling a specific MP.
    $scsmMgmtServer = "managementservername"
    
    $regkey = 'HKLM:\SOFTWARE\Microsoft\System Center\2010\Service Manager\Setup'
    $ModuleFile = 'System.Center.Service.Manager.psd1'
    $SMInstallDir = (Get-ItemProperty -Path $regkey).InstallDirectory
    $FullModulePath = $SMInstallDir +'PowerShell\' +$ModuleFile
    
    import-module $fullmodulepath -force
    
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    #MONSTER WORKFLOW LIST
    $wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")
    
    <# How to find a specific MP and ID
    $mp = Get-SCSMManagementPack -displayname "SCSM Action Log Notify"
    $mp.ID
    #>
    
    #Option to only pull a specific Management Pack
    #$wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("ManagementPackId = '5588305e-2023-0f20-09fc-508654cecc61'")
    
    foreach ($wfSub in $wfSubs)
    {
        #$wfSub.Name
        $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
        
        #Option to only show the failed MP Name and the failed Count of Workflows. 
        if($wfSubFailedInstances.Count -gt 0){
            $wfSub.Name +': '+ $wfSubFailedInstances.Count
        }
        
        <#Option to clear the failed instances
        foreach ($subFailedInstance in $wfSubFailedInstances)
        {
            #Ignore
            $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
        }
        #>
    }
  • Adam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
    edited May 2017
    Nice update @Britton_Plath

    Through the PowerShell (either version) as well as the SCSM Console, it looks like by far the highest failure rate is on Cireson's SCSM Action Log Notify. The failures occur when the Action Log is updated with a Comment where IsPrivate = null.

    This event will occur when:
    • Someone who isn't the Affected User/Assigned To updates a work item
    • Your workflow account is sending notifications (like your Work Item has been created, updated) and it appends to the Action Log...as your wf account isn't Affected User/Assigned To

    And to be very clear here, this is happening because of a bug in the Exchange Connector that Microsoft has never addressed. As such, Cireson's MP doesn't have a qualifier to go off of (i.e. it doesn't appear there is an error catch for handling IsPrivate = null). You can also see this as an Event on the workflow server with ID 33880.
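
    To look for those events from PowerShell rather than Event Viewer, something like the following might work. It is only a sketch: it assumes the events are written to the "Operations Manager" event log on the workflow server, and the cap of 20 events is arbitrary.

    ```powershell
    # Sketch: list the most recent events with ID 33880 on the workflow server.
    # ASSUMPTION: the events are written to the "Operations Manager" log.
    # -ErrorAction SilentlyContinue suppresses the error Get-WinEvent raises when nothing matches.
    Get-WinEvent -FilterHashtable @{ LogName = 'Operations Manager'; Id = 33880 } -MaxEvents 20 -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message |
        Format-List
    ```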
  • Conner_Wood Customer Advanced IT Monkey ✭✭✭
    Today I encountered an issue with workflows. I've assessed that it was due to using AlwaysOn for the ServiceManager database: it automatically failed over to another server, which stopped workflows from even being created.  The trigger would go to the database, but not to the primary node.  Personally I think [MT_Microsoft$SystemCenter$ResourceAccessLayer$SqlResourceStore] will need to be updated so that the SQL Server portion points to the AlwaysOn Listener Node, which always goes to the Primary Node....

    Anyways, I ran the "Ignore ALL Failed Workflow Instances" code with an extra line just after initializing wfSubFailedInstances:
    System.Diagnostics.Debug.WriteLine("Workflow=" + wfSub.DisplayName + " , Workflow Failed Instance Count: " + wfSubFailedInstances.Count.ToString());

    Here's my log of the workflows with more than 0 failed instances.
    Workflow=Cireson Survey Build Survey Reports , Workflow Failed Instance Count: 2813
    Workflow=Adjust Incident Priority and Resolutin Time Rule Add , Workflow Failed Instance Count: 12
    Workflow=SHR Response SLO for All Priority 4 Incidents-DeleteRelationship , Workflow Failed Instance Count: 15
    Workflow=Incident Resolution OLA For P5 Incidents-DeleteRelationship , Workflow Failed Instance Count: 374
    Workflow=Resolve Child Incidents (Parent Incident created) , Workflow Failed Instance Count: 15
    Workflow=Incident Resolution OLA For P4 Incidents-DeleteRelationship , Workflow Failed Instance Count: 17
    Workflow=Cireson Auto Close Service Requests Workflow , Workflow Failed Instance Count: 1
    Workflow=Incident Resolution OLA For P5 Incidents-AddRelationship , Workflow Failed Instance Count: 29
    Workflow=SetSupportGroupAssignDateOnNew , Workflow Failed Instance Count: 1
    Workflow=IT Servers SLO notification-breached , Workflow Failed Instance Count: 1
    Workflow= , Workflow Failed Instance Count: 1
    Workflow=SHR Response SLO for All Priority 5 Incidents-DeleteRelationship , Workflow Failed Instance Count: 359
    Workflow=Cireson Console Licensing Workflow , Workflow Failed Instance Count: 7
    Workflow=SHR Response SLO for All Priority 3 Incidents-DeleteRelationship , Workflow Failed Instance Count: 9
    Workflow=Cireson Asset Management Hardware Asset Catalog Item Workflow , Workflow Failed Instance Count: 2
    Workflow=SystemCenter MonitoringHostKeepAlive Rule , Workflow Failed Instance Count: 3

    I do not use SCSM Action Log Notify and after seeing this log, I have since removed the Cireson Survey Build Survey Reports as we never got around to utilizing them.  Overall the workflows appear to be working quite smoothly now.
  • Conner_Wood Customer Advanced IT Monkey ✭✭✭
    @Adam_Dzyacky It never hurts to show which native SCSM modules need to be imported, so here's updated PowerShell which also lists how many instances failed for each workflow.
    Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
    Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets
    
    cls
    
    $scsmMgmtServer = "scsmMGMTServerHere"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    #MONSTER WORKFLOW LIST
    $wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")
    
    foreach ($wfSub in $wfSubs)
    {
        $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Failed"}
        Write-Host "#Failed=$($wfSubFailedInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
        foreach ($subFailedInstance in $wfSubFailedInstances)
        {
            #Ignore
            $emg.Subscription.IgnoreFailedSubscription($subFailedInstance)
        }
    }
  • Pierre_Smit Customer IT Monkey ✭
    Hi All,

    I have an issue where all workflows stopped being created 2 days ago. So no email notifications are going out and CRs are stuck at pending. Apparently no changes have been made on the SQL side or on the VM with the console on it.

    Can someone adjust the script above so I can run it to check for failed workflows? Or provide any tips?

    thanks,
  • Conner_Wood Customer Advanced IT Monkey ✭✭✭
    @Pierre_Smit you are in for a world of hurt.
    Troubleshooting workflows not being created is not fun at all.

    I hope you did not remove the Primary Workflow Server Computer Asset from the Configuration Item section, because that stops all workflows for everything, including connectors. NEVER delete your SCSM servers from the Windows Server view in Service Manager.  (See attachment for more info)

    However I think what has happened is the Microsoft Monitoring Agent service messed up on the Primary Workflow Server and it needs to create new [dbo].[MT_HealthService] encryption keys again to sync and process data.

    On Primary Workflow Server:
         1)  Stop service Microsoft Monitoring Agent
         2)  Delete folder "C:\Program Files\Microsoft System Center 2012 R2\Service Manager\Health Service State"
         3)  Start service Microsoft Monitoring Agent
         4)  I don't believe you need to stop the other 2 SCSM services. You could also restart the server if you wish
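
    Steps 1 to 3 could be scripted roughly like this. It's a sketch, not a tested procedure: it assumes an elevated PowerShell session on the Primary Workflow Server, the default install path from step 2, and that the service's display name is exactly "Microsoft Monitoring Agent".

    ```powershell
    # Sketch of steps 1-3 above. Run elevated on the Primary Workflow Server.
    # ASSUMPTION: default install path; adjust $stateFolder if yours differs.
    $serviceDisplayName = 'Microsoft Monitoring Agent'
    $stateFolder = 'C:\Program Files\Microsoft System Center 2012 R2\Service Manager\Health Service State'

    # 1) Stop the agent service (looked up by its display name)
    Stop-Service -DisplayName $serviceDisplayName -Force

    # 2) Delete the Health Service State folder, if it exists
    if (Test-Path -LiteralPath $stateFolder) {
        Remove-Item -LiteralPath $stateFolder -Recurse -Force
    }

    # 3) Start the agent service again
    Start-Service -DisplayName $serviceDisplayName
    ```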

    I also followed some preventative maintenance advice for SCSM Management Servers and it seems to have helped:

    The "Microsoft Monitoring Agent" in Control Panel should not have any SCSM Servers listed on the Service Manager primary management server, or other Service Manager management servers. If you have a server listed in the "Microsoft Monitoring Agent Properties" it should be removed.

    Also the option "Automatically update management group assignments from AD DS" should be unchecked.

  • Magnus_Lundgren1 Customer Adept IT Monkey ✭✭
    Could this be used to trigger a retry on scheduled workflows?
    I have tried clearing the Health Service State folder, but I still have around 200 scheduled instances that are not retrying.

    I have tried the below.
    I'm getting all the scheduled instances, but I can't retry them.

    Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
    Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets
    
    cls
    
    $scsmMgmtServer = "srv-sc-sm01"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    #MONSTER WORKFLOW LIST
    $wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")
    
    foreach ($wfSub in $wfSubs)
    {
        $wfSubFailedInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Scheduled"}
        Write-Host "#Scheduled=$($wfSubFailedInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
            foreach ($subFailedInstance in $wfSubFailedInstances)
        {
            #Ignore
            $emg.Subscription.RetryFailedSubscription($subFailedInstance)
        }
    }
    I'm getting:

    Cannot find an overload for "RetryFailedSubscription" and the argument count: "1".

  • Conner_Wood Customer Advanced IT Monkey ✭✭✭
    edited February 27
    I'm not sure if you can restart a scheduled workflow because it hasn't failed yet, however I know you can ignore it.

    You are trying to call RetryFailedSubscription with only 1 parameter.  Let's look at the difference between Ignore and Retry.

    Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
    Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets
    
    $scsmMgmtServer = "scsmMGMTServerHere"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    $emg.Subscription | Get-Member | Select-Object -ExpandProperty Definition
    
    ####
    # YIELDS THE FOLLOWING METHOD INFORMATION
    ####
    
    void IgnoreFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus), 
    void ISubscriptionManagement.IgnoreFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus)
    
    void RetryFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.IWorkflowSubscriptionBase subscriptionWorkflow, Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus), 
    void ISubscriptionManagement.RetryFailedSubscription(Microsoft.EnterpriseManagement.Subscriptions.IWorkflowSubscriptionBase subscriptionWorkflow, Microsoft.EnterpriseManagement.Subscriptions.SubscriptionJobStatus subscriptionWorkflowStatus)
    
    Ignore - Requires wfSubInstance
    Retry - Requires wfSub and wfSubInstance

    As you can see, you need to call the RetryFailedSubscription with those 2 parameters.

    Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
    Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets
    
    cls
    
    $scsmMgmtServer = "srv-sc-sm01"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer
    
    #MONSTER WORKFLOW LIST
    $wfSubs = $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'")
    
    foreach ($wfSub in $wfSubs)
    {
        $wfSubInstances = $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.id) | ?{$_.status -eq "Scheduled"}
        Write-Host "#Scheduled=$($wfSubInstances.Count) | WorkflowName=$($wfSub.DisplayName)"
        foreach ($subInstance in $wfSubInstances)
        {
            #Retry (REQUIRES $wfSub AND $subInstance)
            $emg.Subscription.RetryFailedSubscription($wfSub, $subInstance)
        }
    }
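
    Before retrying anything, it may also help to see which status values that call actually returns for each workflow. The sketch below only reads and prints counts; it changes nothing. It reuses the calls already shown in this thread, and the server name is a placeholder.

    ```powershell
    Import-Module -Force -Name Microsoft.EnterpriseManagement.Core.Cmdlets
    Import-Module -Force -Name Microsoft.EnterpriseManagement.ServiceManager.Cmdlets

    # ASSUMPTION: placeholder name, replace with your SCSM management server
    $scsmMgmtServer = "scsmMGMTServerHere"
    $emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup $scsmMgmtServer

    foreach ($wfSub in $emg.Subscription.GetSubscriptionsByCriteria("Name LIKE '%'"))
    {
        # Group the returned instances by their Status value and print one line per status
        $emg.Subscription.GetFailedSubscriptionStatusById($wfSub.Id) |
            Group-Object -Property Status |
            ForEach-Object { "{0} | {1}={2}" -f $wfSub.DisplayName, $_.Name, $_.Count }
    }
    ```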


    Let us know how it goes!

  • Magnus_Lundgren1 Customer Adept IT Monkey ✭✭
    Did not work :(

    Exception calling "RetryFailedSubscription" with "2" argument(s): "Object reference not set to an instance of an object."
