SMA Runbooks stuck
Hi, I'm baffled: for the past hour or so, all of my new runbooks have been stuck in a Running or Queued state, and none of them completes. I've already rebooted the SMA server, but they remain stuck.
Any hints on how to get it up and running again?
Stephane
Best Answer
Stephane_Bouillon
The problem was finally solved by following the recommendations in this article https://social.technet.microsoft.com/wiki/contents/articles/36041.service-management-automation-sma-troubleshooting-queued-runbook-jobs.aspx
The fix is to have each runbook instance run in its own process, by changing two parameters in the configuration file: MaxRunningJobs (default 30, set to 1) and TotalAllowedJobs (default 1000, set to 1).
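Below is a minimal Python sketch of that configuration change. It assumes the settings file is a .NET-style XML config whose `<appSettings>` section holds `<add key="..." value="..."/>` entries. Check the article for the exact file name and location, because the path is not reproduced here. The helper `set_app_settings` and the constant `SINGLE_JOB_PER_SANDBOX` are hypothetical names. As I understand the SMA settings, MaxRunningJobs caps how many jobs run at once in one sandbox process, and TotalAllowedJobs is how many jobs a sandbox handles before it is recycled. Setting both to 1 therefore gives each job a fresh process.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the worker settings live in an XML config with an <appSettings>
# section of <add key="..." value="..."/> entries. Verify against the real file.
SINGLE_JOB_PER_SANDBOX = {
    "MaxRunningJobs": "1",    # default 30: jobs running at once in one sandbox
    "TotalAllowedJobs": "1",  # default 1000: jobs a sandbox runs before recycling
}


def set_app_settings(config_xml: str, updates: dict) -> str:
    """Return config_xml with the given appSettings keys set to new values.

    Existing keys are updated in place; keys not yet present are appended.
    """
    root = ET.fromstring(config_xml)
    # The file may be rooted at <configuration> or directly at <appSettings>.
    app_settings = root if root.tag == "appSettings" else root.find(".//appSettings")
    if app_settings is None:
        raise ValueError("no <appSettings> section found")
    remaining = dict(updates)
    for entry in app_settings.findall("add"):
        key = entry.get("key")
        if key in remaining:
            entry.set("value", remaining.pop(key))
    for key, value in remaining.items():
        ET.SubElement(app_settings, "add", key=key, value=value)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<configuration><appSettings>'
    '<add key="MaxRunningJobs" value="30"/>'
    '<add key="TotalAllowedJobs" value="1000"/>'
    '</appSettings></configuration>'
)
print(set_app_settings(sample, SINGLE_JOB_PER_SANDBOX))
```

Back up the original file before you write the result over it. This setting trades throughput for isolation, because each job presumably pays the cost of starting a new sandbox process.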
Answers
Hi @Stephane_Bouillon, we had this from time to time before we moved to Azure Automation. Check for a stuck Orchestrator.Sandbox.exe process on one of your runbook servers and kill it. A new one should spawn and rapidly start consuming CPU. Jobs should then restart, sadly all at once, unless you had some concurrency controls programmed in! :D
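The "concurrency controls" mentioned above can be as simple as a bounded semaphore in front of job starts. This lets a backlog drain at most N jobs at a time instead of all at once. The sketch below is generic Python, not an SMA or Azure Automation API. `run_throttled` and the dummy sleeping jobs are hypothetical stand-ins for whatever actually launches your runbooks.

```python
import threading
import time


def run_throttled(jobs, max_concurrent):
    """Run each callable in its own thread, with at most max_concurrent active.

    Returns the highest number of jobs observed running at the same time.
    """
    gate = threading.BoundedSemaphore(max_concurrent)
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker(job):
        nonlocal running, peak
        with gate:  # blocks until one of the max_concurrent slots is free
            with lock:
                running += 1
                peak = max(peak, running)
            try:
                job()
            finally:
                with lock:
                    running -= 1

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


# Hypothetical backlog: 136 queued jobs, released at most 5 at a time.
backlog = [lambda: time.sleep(0.01) for _ in range(136)]
print(run_throttled(backlog, max_concurrent=5))
```

The semaphore only limits how many jobs run at once. It does not change their order or retry failures, so those concerns would need separate handling.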
Thanks. I killed the Orchestrator.Sandbox process, but that didn't help, and it isn't consuming any CPU either. The runbook service, on the other hand, is consuming over 50% CPU, and restarting that service didn't help either.
Aww ok. :(
I'm afraid I'll have to open a support case. I'll keep you posted.
I have 136 runbooks queued and 76 running, but nothing is moving.
Any database errors?
I tried to restart the SQL Server Agent on the SMA server and got the following error:
TITLE: Microsoft SQL Server Management Studio
------------------------------
Unable to restart service SQLSERVERAGENT on server HQSCSMA01. (mscorlib)
------------------------------
ADDITIONAL INFORMATION:
Unable to stop service SQLSERVERAGENT on server HQSCSMA01. (ObjectExplorer)
------------------------------
The RPC server is unavailable. (Exception from HRESULT: 0x800706BA) (mscorlib)
Either you've got a SQL Server issue there, or you attempted to restart SQL Server remotely and the WMI/RPC calls were blocked by a firewall.
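One quick way to test the firewall theory is to check whether the RPC endpoint mapper on TCP port 135 of the target server is reachable. Remote service control and WMI calls contact that port first. This is a minimal Python sketch. The function name `tcp_port_open` is hypothetical. A successful connection on 135 is not conclusive, because RPC then moves to a dynamically assigned high port that a firewall may still block. A failure on 135, however, points strongly at a firewall or network problem.

```python
import socket


def tcp_port_open(host, port=135, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 135 is the RPC endpoint mapper; False suggests a firewall block,
    a stopped RPC service, or an unreachable host.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


# Run from the machine where SSMS reported the error:
print(tcp_port_open("HQSCSMA01"))
```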
Either way, it's something to look into!
Support gave the following suggestion, but unfortunately there has been no change in the number of queued and running runbooks; none of them are completing:
I've also restarted all the components in the system, meaning the database server, SCSM management server, SCSM Workflow server and SMA server.
I'm at a loss. If anyone has other suggestions, please share them; I have to get this up and running by Monday. :(