
SCSM Workflows just stopped. NO events being created :(

Pierre_Smit Customer IT Monkey ✭
Hi All,

I have an issue where all workflows stopped being created 2 days ago, so no email notifications are going out and CRs are stuck at pending. Apparently no changes have been made on the SQL side or on the VM with the console on it.

Does anyone have a script to check the number of failed workflows?


Best Answers


  • Nicholas_Velich Cireson Consultant Ninja IT Monkey ✭✭✭✭
    Hi Pierre,

    You can check out this post here on troubleshooting workflows: https://blogs.technet.microsoft.com/servicemanager/2013/01/14/troubleshooting-workflow-performance-and-delays/

  • Pierre_Smit Customer IT Monkey ✭
    Thanks Nicholas. I assume after renaming the Health Service State folder a new one will be created?
  • Justin_Workman Cireson Support Super IT Monkey ✭✭✭✭✭
    "Thanks Nicholas. I assume after renaming the Health Service State folder a new one will be created?"
    Yep. Once you restart the SCSM services, it will create a new Health Service State folder.
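For anyone who wants to script the rename step, here is a minimal Python sketch. It is only an illustration, not an official procedure: the function name `rename_state_folder` and the `.old-<timestamp>` naming are my own assumptions, and the install directory is something you pass in yourself. It does not stop or start any services, so stop the SCSM services before calling it and start them again afterwards so the folder gets recreated.

```python
import time
from pathlib import Path


def rename_state_folder(install_dir, suffix=None):
    """Rename '<install_dir>/Health Service State' to a backup name.

    Hypothetical helper: returns the path of the renamed folder.
    It does not touch any services; stop them first, restart them after.
    """
    state = Path(install_dir) / "Health Service State"
    if not state.is_dir():
        raise FileNotFoundError(f"No Health Service State folder under {install_dir}")
    # A timestamped suffix keeps earlier backups instead of overwriting them.
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = state.with_name(f"{state.name}.old-{suffix}")
    if backup.exists():
        raise FileExistsError(f"Backup folder already exists: {backup}")
    state.rename(backup)
    return backup
```

Call it with the Service Manager install directory on the management server, for example `rename_state_folder(r"<your install dir>")`, and delete the backup folder once workflows are running again.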
  • Adam_Dzyacky Product Owner Contributor Monkey ✭✭✭✭✭
    edited February 2018

    The link @Nicholas_Velich provides is the absolute, must-read, go-to starting point for addressing this issue within SCSM. However, if this is your first foray into workflow troubleshooting, it can be a little overwhelming and confusing, and you may end up down a lot of different rabbit holes trying to find a root cause. On top of this, there is no silver bullet here given the variables that exist across SCSM environments (other System Center products being used/synced, the start/end times of the various connector schedules, how many connectors there are, Cireson's Asset Management workflows, custom management packs/workflows, SQL backend, disk subsystem, etc.).

    Here are some things I would check as an initial diagnosis:
    • How many connectors do you have whose schedules overlap?
    • How long do those connectors run? Minutes? Hours?
    • Do any of the stock connectors overlap with core SCSM data warehouse processing? (12am, 2am)
    • Do any long running Cireson workflows overlap with core SCSM data warehouse processing? (12am, 2am)
    • How many other System Center products are being used (and in turn, synced) with SCSM? SCOM and SCCM generate a lot of really useful and valuable data, but the SCCM connector does have an issue when it comes to machines in Azure/Hyper-V that in turn leads into some issues with Cireson Asset Management workflows
    • I feel like two SQL queries don't get enough attention in SCSM performance/workflow troubleshooting. The first shows the noisiest objects in the ServiceManager DB, that is, the objects getting the most changes applied to them. In essence: is there a lot of unnecessary background noise in the SCSM database competing with workflows for SQL resources? The observation to make isn't necessarily "large numbers are bad" but rather whether there is an insanely large gap between the first couple of items listed and the rest. The second query lists the largest tables in the ServiceManager DB. It isn't anything new; it's taken directly from Kevin Holman's "useful scom sql queries" for troubleshooting SCOM performance. The observation to make here is that the "EntityChangeLog" table should not be your largest table in ServiceManager. If it is, that again points to a lot of background noise in SCSM that could be competing with workflows for SQL resources.

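    --Loudest Objects in SCSM
    --ORDER BY is needed so TOP 50 returns the most-changed objects rather than an arbitrary 50
     SELECT TOP 50 BME.FullName, COUNT(1) AS ChangeCount
     FROM EntityChangeLog AS ECL WITH(NOLOCK)
     JOIN BaseManagedEntity AS BME WITH(NOLOCK)
        ON ECL.EntityId = BME.BaseManagedEntityId
     WHERE RelatedEntityId IS NULL
     GROUP BY BME.FullName
     ORDER BY COUNT(1) DESC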
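To make the "large gap" observation a bit less of an eyeball exercise, here is a rough Python sketch that takes the (FullName, count) rows from the query above, in whatever order they come back, and flags the objects that sit far above the typical count. The function name, the sample object names, and the 10x-the-median threshold are all assumptions of mine for illustration, not an SCSM rule of thumb.

```python
from statistics import median


def find_outliers(rows, factor=10):
    """Return (name, count) pairs, loudest first, whose count is at
    least `factor` times the median count of all rows."""
    if not rows:
        return []
    typical = median(count for _, count in rows)
    # Guard against a median of 0 so the threshold never collapses to 0.
    threshold = factor * max(typical, 1)
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(name, count) for name, count in ranked if count >= threshold]


# Hypothetical sample rows; real names would come from BME.FullName.
sample = [("ObjA", 50000), ("ObjB", 42000), ("ObjC", 300),
          ("ObjD", 250), ("ObjE", 200), ("ObjF", 180)]
print(find_outliers(sample))  # [('ObjA', 50000), ('ObjB', 42000)]
```

Whatever comes back from a check like this is where I would start asking which connector or workflow keeps touching those objects.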

    --Largest Tables in SCSM
    SELECT TOP 1000
    a2.name AS [tablename], (a1.reserved + ISNULL(a4.reserved,0))* 8 AS 'reserved (KB)', 
    a1.rows as row_count, a1.data * 8 AS 'data (KB)', 
    (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data THEN (a1.used + ISNULL(a4.used,0)) - a1.data ELSE 0 END) * 8 AS 'index size (KB)', 
    (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used ELSE 0 END) * 8 AS 'unused (KB)', 
    (row_number() over(order by (a1.reserved + ISNULL(a4.reserved,0)) desc))%2 as l1, 
    a3.name AS [schemaname] 
    FROM (SELECT ps.object_id, SUM (CASE WHEN (ps.index_id < 2) THEN row_count ELSE 0 END) AS [rows], 
    SUM (ps.reserved_page_count) AS reserved, 
    SUM (CASE WHEN (ps.index_id < 2) THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count) 
    ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count) END ) AS data, 
    SUM (ps.used_page_count) AS used 
    FROM sys.dm_db_partition_stats ps 
    GROUP BY ps.object_id) AS a1 
    LEFT OUTER JOIN (SELECT it.parent_id, 
    SUM(ps.reserved_page_count) AS reserved, 
    SUM(ps.used_page_count) AS used 
    FROM sys.dm_db_partition_stats ps 
    INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id) 
    WHERE it.internal_type IN (202,204) 
    GROUP BY it.parent_id) AS a4 ON (a4.parent_id = a1.object_id) 
    INNER JOIN sys.all_objects a2  ON ( a1.object_id = a2.object_id ) 
    INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id) 
    WHERE a2.type <> N'S' and a2.type <> N'IT'
    order by row_count desc
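If you export the result of the table-size query, the "EntityChangeLog should not be your largest table" check can be sketched in a few lines of Python. The dictionary keys (`tablename`, `reserved_kb`, `row_count`) and the sample table names other than EntityChangeLog are placeholders I made up; map them to however you export the columns. Note that the query sorts by row count, so the sketch lets you pick which measure counts as "largest".

```python
def largest_table(rows, key="reserved_kb"):
    """Return the name of the table with the highest value for `key`."""
    return max(rows, key=lambda r: r[key])["tablename"]


def change_log_dominates(rows, key="reserved_kb"):
    """True when EntityChangeLog is the largest table by `key`."""
    return largest_table(rows, key) == "EntityChangeLog"


# Hypothetical sample rows for illustration only.
sample = [
    {"tablename": "EntityChangeLog", "reserved_kb": 9000000, "row_count": 40000000},
    {"tablename": "TableA", "reserved_kb": 1200000, "row_count": 3000000},
    {"tablename": "TableB", "reserved_kb": 800000, "row_count": 45000000},
]
print(change_log_dominates(sample))               # True
print(change_log_dominates(sample, "row_count"))  # False
```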

    This particular discussion is one I feel like I could write a book on, so to spare your mouse wheel from so much scrolling, I'll wrap this up for now. Best of luck troubleshooting!
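One addendum on the schedule questions from the checklist: overlap checking is easy to get wrong around midnight, so here is a small Python sketch. The connector names, start times, and all durations (including the 30 minutes given to the 12am and 2am data warehouse jobs) are made-up sample values; plug in your own. It assumes every job recurs daily and runs for less than 24 hours.

```python
from datetime import datetime, timedelta
from itertools import combinations

DAY = timedelta(days=1)


def window(start_hhmm, minutes):
    """Build a (start, end) pair on an arbitrary reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + timedelta(minutes=minutes)


def overlaps(a, b):
    """True when two daily-recurring windows overlap, also across midnight.

    Window b is compared on the previous, same, and next day.
    """
    return any(a[0] < b[1] + shift and b[0] + shift < a[1]
               for shift in (-DAY, timedelta(0), DAY))


def find_overlaps(schedules):
    """Return sorted name pairs whose windows overlap.

    `schedules` maps a job name to (start "HH:MM", duration in minutes).
    """
    windows = {name: window(*spec) for name, spec in schedules.items()}
    return [(a, b) for a, b in combinations(sorted(windows), 2)
            if overlaps(windows[a], windows[b])]


# Hypothetical schedules for illustration only.
schedules = {
    "AD connector": ("23:30", 60),
    "SCCM connector": ("01:00", 90),
    "DW job 12am": ("00:00", 30),
    "DW job 2am": ("02:00", 30),
}
print(find_overlaps(schedules))
# [('AD connector', 'DW job 12am'), ('DW job 2am', 'SCCM connector')]
```

Any pair that shows up is a candidate for moving one of the two schedules, starting with whichever job runs longest.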
  • Pierre_Smit Customer IT Monkey ✭
    Thanks all for your quick responses. The suggestion of renaming the "Health Service State" folder and then restarting the server has resolved the issue!

    Thanks again.