SMA Runbooks stuck

Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

Hi, I'm baffled: for the past hour or so, all of my new runbooks have been stuck in a running or queued state. None of them seem to complete. I have already rebooted the SMA server, but the jobs remain stuck.

Any hints on how to get it up and running again?

Stephane

Answers

  • John_Long Customer Advanced IT Monkey ✭✭✭
    edited December 2019

    Hi @Stephane_Bouillon, we had this from time to time before we moved to Azure Automation. Check for a stuck Orchestrator.Sandbox.exe process on one of your runbook servers and kill it. A new one should spawn and rapidly start consuming CPU. Jobs should then restart, sadly all at once unless you had some concurrency controls programmed in! :D

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

    Thanks, I killed the Orchestrator.Sandbox process, but that didn't help. The sandbox process doesn't consume CPU either. The runbook service, on the other hand, is consuming over 50% CPU. Restarting this service didn't help either.

  • John_Long Customer Advanced IT Monkey ✭✭✭

    Aww ok. :(

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

    I'm afraid I'll have to open a support case. I'll keep you posted.

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

    I have 136 runbooks queued and 76 running, but nothing moves.

  • John_Long Customer Advanced IT Monkey ✭✭✭

    Any database errors?

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

    I tried to restart the SQL Server Agent on the SMA Server, and I got the following error:

    TITLE: Microsoft SQL Server Management Studio

    ------------------------------

    Unable to restart service SQLSERVERAGENT on server HQSCSMA01. (mscorlib)

    ------------------------------

    ADDITIONAL INFORMATION:

    Unable to stop service SQLSERVERAGENT on server HQSCSMA01. (ObjectExplorer)

    ------------------------------

    The RPC server is unavailable. (Exception from HRESULT: 0x800706BA) (mscorlib)

  • John_Long Customer Advanced IT Monkey ✭✭✭

    Either you've got a SQL Server issue there, or you attempted to restart the SQL Server remotely and the WMI/RPC calls were blocked by a firewall?

    Either way, it's something to look into!

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭

    Support gave the following suggestion, but unfortunately it made no difference: the number of queued and running runbooks is unchanged, and none are completing.

    The fastest way to get it unstuck would probably be to 
    stop the Microsoft Monitoring Agent service, 
    the System Center Data Access Service, 
    and the System Center Configuration service on the Primary SCSM Server. 
    
    Delete the contents of the Service Manager\Health Service State folder 
    and then restart the services.
    

    I've also restarted all the components in the system: the database server, the SCSM management server, the SCSM workflow server, and the SMA server.

    I'm at a loss. If anyone has other suggestions, please share them; I have to get this up and running by Monday :(

  • Stephane_Bouillon Customer Advanced IT Monkey ✭✭✭
    Answer ✓

    The problem was finally solved by following the recommendations in this article: https://social.technet.microsoft.com/wiki/contents/articles/36041.service-management-automation-sma-troubleshooting-queued-runbook-jobs.aspx

    The fix consists of having each runbook instance run in its own process, by updating the configuration file parameters MaxRunningJobs (default 30, set to 1) and TotalAllowedJobs (default 1000, set to 1).
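
For anyone applying the same change by script rather than by hand, here is a minimal Python sketch of the edit described in the accepted answer. It is an illustration only: the thread does not show the configuration file, so the `<add key="..." value="..."/>` layout, the `SAMPLE` text, and the helper name `apply_job_limits` are all assumptions. Check your own file's real structure, and back it up, before adapting this.

```python
import xml.etree.ElementTree as ET

# Values reported in the thread: defaults 30 and 1000, both set to 1.
THREAD_VALUES = {"MaxRunningJobs": "1", "TotalAllowedJobs": "1"}

# ASSUMPTION: a .NET-style settings layout. The real file may differ.
SAMPLE = """<configuration>
  <appSettings>
    <add key="MaxRunningJobs" value="30" />
    <add key="TotalAllowedJobs" value="1000" />
  </appSettings>
</configuration>"""


def apply_job_limits(config_xml: str, limits: dict) -> str:
    """Return config_xml with each named setting's value replaced.

    Hypothetical helper: it looks for <add key="NAME" value="..."/>
    elements anywhere in the document and fails loudly if a requested
    setting is missing, rather than silently adding it.
    """
    root = ET.fromstring(config_xml)
    seen = set()
    for node in root.iter("add"):
        key = node.get("key")
        if key in limits:
            node.set("value", str(limits[key]))
            seen.add(key)
    missing = set(limits) - seen
    if missing:
        raise KeyError(f"settings not found: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(apply_job_limits(SAMPLE, THREAD_VALUES))
```

Note that `xml.etree.ElementTree` drops XML comments when parsing with its default settings, so a hand edit may be preferable if the file carries comments you want to keep. The sketch stops at the file edit and does not restart any services, which the change presumably requires before the worker picks it up.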
