
SCSM 2019 Performance Issues

James_Johnson Customer Advanced IT Monkey ✭✭✭

I'm working on migrating from SCSM 2016 to 2019. I have our 2019 instance running side by side with our 2016 one, and I have synced the data with the LMA tool. However, I'm getting terrible performance in both the console and the web app: I'm seeing "The database subscription query is longer than expected" messages on the primary management server for what seems like most of the workflows, as well as some "Transaction (Process ID 105) was deadlocked on lock resources with another process and has been chosen as the deadlock victim" warnings. I don't have any custom workflows, and no one is using this system right now.

I've used the minutes-behind query from the workflow performance troubleshooting docs and everything is caught up, and I've tried to verify the indexes are okay; as far as I can tell they are. I'm not much of a DBA, so I'm not sure what else I can do to figure out what's going on with SQL. I will see the SQL server CPU spike to 100% every so often, but usually it's around 10-15% usage.
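In case it helps, the deadlock graphs for those victim warnings should be retrievable from the built-in system_health Extended Events session. This is the standard query I've found for pulling them (standard SQL Server DMVs, nothing SCSM-specific; I'm assuming the default ring buffer target is still in place):

```sql
-- Pull recent deadlock graphs captured by the always-on system_health
-- Extended Events session (ring buffer target). Each row is one deadlock
-- graph XML; save it as .xdl to view in SSMS.
SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT CAST(st.target_data AS XML) AS TargetData
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions AS s
      ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
      AND st.target_name = N'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]'
) AS XEventData(XEvent);
```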


Details about our configuration:

Primary mgmt server: Server 2019, 4 CPU cores, 32 GB RAM

Database server: Server 2019, SQL 2017, 8 CPU cores, 32 GB RAM

Secondary mgmt/portal server: Server 2019, 4 CPU cores, 24 GB RAM

All servers are VMs on a Citrix XenServer hypervisor. CPUs are Xeon Gold 6230, same virtual pool and same subnet for networking.


Any thoughts or ideas on how to troubleshoot would be greatly appreciated!


Thanks,

James

Answers

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Hi @James_Johnson

    When you installed 2019, did you install UR2?

    If so, did you apply the fix they mention:

    Known issues with this update:

    Microsoft has received reports that workflows are getting delayed due to the MonitoringHost.exe process intermittently crashing. An error (Event ID: 1026) is also logged in Event Viewer with Exception Info 'UnauthorizedAccessException'.


    Solution

    Run the SQL script from here after applying the UR patch.


    Thanks,

    Shane

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    Hi @Shane_White

    I did install UR2, but I had already run that SQL script; it didn't seem to have much effect for me. We're supposed to leave that block commented out, right?


    Thanks,

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Hi @James_Johnson

    Yep, just run the script exactly as it is when downloaded! So that rules that out!

    The other thing you could do is run the script below to see how far behind your workflows are, and maybe which one is causing the backup!

    Thanks,

    Shane

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    I've checked that before and none of the workflows are actually behind at all; some of them do have a value in NonZeroEventCount, if that matters.


    Thanks,

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    What portal version are you on?

    Do you have any customisation in your environment?

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    10.2.4, and yes we have some customizations but they are the same ones currently running on our production servers. I can try removing them for now.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Just try to eliminate as much as possible to kind of narrow it down.

    The other thing you could do is run a SQL Server Profiler trace and see what's running that is heavy!
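    If Profiler feels heavyweight, a rough alternative is to pull the top statements straight out of the plan cache with the standard DMVs. A sketch (ordering by total CPU; adjust TOP and the ORDER BY to taste):

```sql
-- Top 10 cached statements by total CPU time since they were cached.
-- sys.dm_exec_query_stats offsets are in bytes, hence the /2 for nchar.
SELECT TOP (10)
    qs.total_worker_time / 1000  AS total_cpu_ms,
    qs.execution_count,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```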

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    @Shane_White

    So I've never run a trace before, but it doesn't seem to be anything too heavy hitting the system. Here's a screenshot of what I'm seeing constantly hit the system:

    And running a query on the table I'm saving the trace data to, these are the most expensive queries that have run:

    Not sure if this data looks normal or not. I still have the trace running, hoping to see what's going on when I get the slow-subscription message. However, even as I'm running the trace right now, the SQL server is sitting around 20% CPU usage and it takes multiple minutes for anything in the console to load after clicking. Thoughts?


    Thanks!

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    @James_Johnson

    The SQL seems fine to me... what about the CPU on the SCSM primary workflow server? Nothing seems to stand out at the moment.

    Thanks,

    Shane

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    It's sitting around 75% utilization, with the System Center Management Service Host process and Microsoft.Mom.Sdk.ServiceHost being the two heaviest users.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Yeah, something is definitely hammering the SDK...

    Is there anything else in the Operations manager log?

    Is the SCSM version the only difference between the two environments?

    I see you have Cireson+, so you could raise a ticket for this issue.

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    The Operations Manager log is pretty quiet; only 3 events in the last 3 hours, and they are just client connection events from me trying to open the console.

    I wasn't sure exactly what Cireson+ covered but I'll go ahead and do that.

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Did you ever raise a ticket for this, @James_Johnson, or find anything else that could be of use? Could you have too many queues or user roles, meaning workflows are slow to process objects?

  • James_Johnson Customer Advanced IT Monkey ✭✭✭
    edited May 2021

    Hi @Shane_White

    I did open a request. Justin and John Doyle were helping me try to figure out what was happening. Nothing seemed to be wrong; workflows were not falling behind, and nothing was coming up in the logs on any of the servers. I do have quite a few queues, but it's running fine in our current SCSM 2016 environment.

    We weren't able to narrow down what the processes were actually doing and figured a ticket to Microsoft was probably needed.

    I was wondering why queues are so bad for performance; I've read that in a few other places too. What would be some alternative ways to group tickets if I have a multitude of different groups using SCSM?

    Oh, I did notice that our ETL maintenance was causing a lock on the system for a few minutes when it ran. I turned off ECL logging, since we import nearly 500k users into the database. Hopefully that will help with some of the performance issues.
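    To keep an eye on whether that actually helps, I'm planning to track the size of the change log over time with something like this against the ServiceManager database (using partition stats so it doesn't scan the table; I'm assuming dbo.EntityChangeLog is where the ECL rows live):

```sql
-- Approximate row count of the entity change log without scanning it.
-- Rerun periodically; the number should trend down as grooming catches up.
SELECT SUM(ps.row_count) AS ecl_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.EntityChangeLog')
  AND ps.index_id IN (0, 1);  -- heap or clustered index only
```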

    Thanks,

    James

  • Shane_White Cireson Support Super IT Monkey ✭✭✭✭✭

    Hi @James_Johnson

    Interesting... and the shortest answer, without getting too technical, is that the more queues you have, the more scoping SCSM has to handle each time a ticket is raised, and the more that needs to be processed before rights are granted on a ticket.

    What are the differences between the current SCSM 2016 environment and the new 2019 one?

    How did the ETL changes help? Any improvement?

    Thanks,

    Shane

  • John_Long Customer Advanced IT Monkey ✭✭✭

    We found that setting the SQL compatibility level back to 2012 fixed a lot of our problems with database performance. Prior to that, we frequently had service requests and associated activities running 90 minutes behind.

    We wondered if it was something similar to this:

    Service Manager console becomes slow after SQL Server upgrade - Service Manager | Microsoft Docs
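    For reference, checking and changing the level is quick (run against your Service Manager database; 110 is the SQL 2012 level, and your database name may differ):

```sql
-- Check the current compatibility level of the Service Manager database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'ServiceManager';

-- Set it back to the SQL Server 2012 level (110).
ALTER DATABASE ServiceManager SET COMPATIBILITY_LEVEL = 110;
```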

  • James_Johnson Customer Advanced IT Monkey ✭✭✭

    There was no difference between the environments that I'm aware of; I even tried doing in-place upgrades, but the same thing happened. Stopping the ECL logging seems to have helped a little with the locks, and hopefully it will keep getting better as that old data gets groomed out.

    @John_Long Thanks for the suggestion. I think I found a similar article at one point in my googling and already had the compatibility level set to 2012.

    I did also have two incidents with duplicate IDs that were causing some issues, so maybe getting that resolved will help with the upgrade too. I've kind of put this on hold for now, but I'm going to try spinning up some new VMs soon to try upgrading once again.
