
SCSM 2019 Performance Issues

James_Johnson Customer Adept IT Monkey ✭✭

I'm working on migrating from SCSM 2016 to 2019. I have our 2019 instances running side by side with our 2016 instances, and I have synced the data with the LMA tool. However, I'm getting terrible performance in both the console and the web app. On the primary management server I'm seeing "The database subscription query is longer than expected." messages for what seems like most of the workflows, as well as some "Transaction (Process ID 105) was deadlocked on lock resources with another process and has been chosen as the deadlock victim" warnings. I don't have any custom workflows, and no one is using this system right now.

I've used the minutes-behind query used for troubleshooting workflow performance, and everything is caught up. I've tried to verify that the indices are okay, and as far as I can tell they are. I'm not much of a DBA, so I'm not sure what else I can do to figure out what's going on with SQL. I see the SQL Server CPU spike to 100% every so often, but usually it's around 10-15% usage.
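
To get a rough picture of how often the deadlock warnings occur and which process IDs are named, the event log text could be scanned with something like the sketch below. This is only an illustration: the function name and the sample text are assumptions, and the pattern is built from the deadlock message quoted above. SQL Server process IDs are session IDs that get reused across connections, so the counts are a hint rather than an identification of the offending workflow.

```python
import re
from collections import Counter

# Pattern based on the deadlock-victim message quoted in this post.
DEADLOCK_RE = re.compile(
    r"Transaction \(Process ID (\d+)\) was deadlocked on (.+?) resources"
)


def count_deadlock_victims(log_text):
    """Return a Counter keyed by (process_id, resource_type)."""
    counts = Counter()
    for match in DEADLOCK_RE.finditer(log_text):
        process_id = int(match.group(1))
        resource = match.group(2)
        counts[(process_id, resource)] += 1
    return counts


# Hypothetical sample text; real input would be an export of the event log.
sample = (
    "Transaction (Process ID 105) was deadlocked on lock resources with "
    "another process and has been chosen as the deadlock victim.\n"
    "Transaction (Process ID 105) was deadlocked on lock resources with "
    "another process and has been chosen as the deadlock victim.\n"
    "Transaction (Process ID 98) was deadlocked on lock resources with "
    "another process and has been chosen as the deadlock victim.\n"
)

for (pid, resource), n in count_deadlock_victims(sample).most_common():
    print(f"process {pid}: {n} deadlock(s) on {resource} resources")
```

If one process ID dominates during a short window, that window is a reasonable place to line up against a trace or the workflow schedule.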


Details about our configuration:

Primary mgmt server: Server 2019, 4 CPU cores, 32 GB RAM

Database server: Server 2019, SQL 2017, 8 CPU cores, 32 GB RAM

Secondary mgmt/portal server: Server 2019, 4 CPU cores, 24 GB RAM

All servers are VMs on the Citrix XenServer hypervisor. CPUs are Xeon Gold 6230, and all VMs are in the same virtual pool and on the same subnet.


Any thoughts or ideas on how to troubleshoot would be greatly appreciated!


Thanks,

James

Answers

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Hi @James_Johnson

    When you installed 2019, did you install UR2?

    If so, did you apply the fix they mention:

    Known issues with this update:

    Microsoft is getting reports that Workflows are getting delayed due to intermittently crashing of MonitoringHost.exe process. An error (Event ID: 1026) is also logged under Event Viewer with Exception Info 'UnauthorizedAccessException'.


    Solution

    Run the SQL script from here after applying the UR patch.


    Thanks,

    Shane

  • James_Johnson Customer Adept IT Monkey ✭✭

    Hi @Shane_White

    I did install UR2, but I had already run that SQL script and it didn't seem to have much effect for me. We're supposed to leave that block commented out, right?


    Thanks,

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Hi @James_Johnson

    Yep, just run the script as it is when downloaded! So that rules that out.

    The other thing you could do is run the below script to see how far behind your workflows are, and maybe which one is causing the backlog!

    Thanks,

    Shane

  • James_Johnson Customer Adept IT Monkey ✭✭

    I've checked that before and none of the workflows are actually behind at all. Some of them do have a value in NonZeroEventCount, if that matters.


    Thanks,

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    What portal version are you on?

    Do you have any customisation in your environment?

  • James_Johnson Customer Adept IT Monkey ✭✭

    10.2.4, and yes, we have some customizations, but they are the same ones currently running on our production servers. I can try removing them for now.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Just try to eliminate as much as possible to kind of narrow it down.

    The other thing you could do is run a SQL Server Profiler trace and see what's running that is heavy!

  • James_Johnson Customer Adept IT Monkey ✭✭

    @Shane_White

    I've never run a trace before, but there doesn't seem to be anything too heavy hitting the system. Here's a screenshot of what I'm seeing constantly hit the system:

    And, from a query on the table I'm saving the trace data to, these are the most expensive queries that have run:

    Not sure if this data looks normal or not. I still have the trace running, hoping to see what's going on when I get the message about the subscription queries running slowly. However, even as I'm running the trace right now, the SQL Server is sitting around 20% CPU usage and it takes multiple minutes for anything I click in the console to load. Thoughts?
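
For anyone repeating this, one rough way to rank saved trace rows is to group them by statement text and sum the cost columns, since a cheap statement that runs thousands of times can outweigh one slow query. The sketch below assumes the rows have been exported as dictionaries keyed by the Profiler column names TextData, Duration, CPU and Reads; the function name and the sample rows are made up for illustration and are not taken from this environment.

```python
from collections import defaultdict


def rank_by_total_duration(rows, top_n=5):
    """Group trace rows by statement text and rank by summed Duration."""
    totals = defaultdict(lambda: {"calls": 0, "duration": 0, "cpu": 0, "reads": 0})
    for row in rows:
        # Collapse whitespace so the same statement groups together.
        text = " ".join((row.get("TextData") or "").split())
        if not text:
            continue  # skip events that carry no statement text
        entry = totals[text]
        entry["calls"] += 1
        entry["duration"] += row.get("Duration") or 0
        entry["cpu"] += row.get("CPU") or 0
        entry["reads"] += row.get("Reads") or 0
    ranked = sorted(totals.items(), key=lambda item: item[1]["duration"], reverse=True)
    return ranked[:top_n]


# Hypothetical rows; the object names are placeholders, not real SCSM objects.
sample_rows = [
    {"TextData": "exec dbo.p_HypotheticalProc @Id = 1", "Duration": 1200, "CPU": 15, "Reads": 300},
    {"TextData": "exec  dbo.p_HypotheticalProc   @Id = 1", "Duration": 800, "CPU": 10, "Reads": 200},
    {"TextData": "SELECT * FROM dbo.HypotheticalTable", "Duration": 5000, "CPU": 90, "Reads": 4000},
    {"TextData": None, "Duration": 9999, "CPU": 0, "Reads": 0},
]

for text, stats in rank_by_total_duration(sample_rows):
    print(stats["calls"], stats["duration"], stats["cpu"], stats["reads"], text)
```

Sorting by reads or CPU instead of duration is a one-word change in the `key` function, and can be more telling when the console is slow but the server looks idle.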


    Thanks!

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    @James_Johnson

    The SQL seems fine to me. What about the CPU on the SCSM primary workflow server? Nothing seems to stand out at the moment.

    Thanks,

    Shane

  • James_Johnson Customer Adept IT Monkey ✭✭

    It's sitting around 75% utilization with System Center Management Service Host Process and Microsoft.Mom.Sdk.ServiceHost being the two heavy users.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Yeah, something is definitely hammering the SDK.

    Is there anything else in the Operations Manager log?

    Is the SCSM version the only difference between the two environments?

    I see you have Cireson+, so you could raise a ticket for this issue.

  • James_Johnson Customer Adept IT Monkey ✭✭

    The Operations Manager log is pretty quiet: only 3 events in the last 3 hours, and they are just client connecting events from me trying to open the console.

    I wasn't sure exactly what Cireson+ covered but I'll go ahead and do that.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Did you ever raise a ticket for this, @James_Johnson, or find anything else that could be of use? Could you have so many queues or user roles that workflows are simply slow to process objects?

  • James_Johnson Customer Adept IT Monkey ✭✭
    edited May 5

    Hi @Shane_White

    I did open a request. Justin and John Doyle were helping me try to figure out what was happening. Nothing seemed to be wrong, and workflows were not falling behind. Nothing was coming up in the logs on any of the servers. I do have quite a few queues, but it's running fine in our current SCSM 2016 environment.

    We weren't able to narrow down what the processes were actually doing and figured a ticket to Microsoft was probably needed.

    I was wondering why queues are so bad for performance; I've read that in a few other places too. What would be some alternative ways to group tickets if I have a multitude of different groups using SCSM?

    Oh, I did notice that our ETL maintenance was causing a lock on the system for a few minutes when it ran. I turned off ECL logging, since we import nearly 500k users into the database. Hopefully that will help with some of the performance issues.

    Thanks,

    James
