
Workflow Errors - Runbook Automation Activities

Simon_Zeinhofer Customer Advanced IT Monkey ✭✭✭

We receive the following warning message every 10 minutes in our Operations Manager log on our SCSM Workflow server:

OleDb Module encountered a failure 0x80004005 during execution and will post it as output data item. Unspecified error: Login timeout expired


Workflow name: Microsoft.SystemCenter.ServiceManager.LfxWorkflows.Monitor 

Instance name: WIN0011228.engel.int 

Instance ID: {1E15BCEC-B840-E54B-283B-B3F8769C060F} 

Management group: SCOM


Has anyone seen this message before? I have searched the internet, but all I could find was related to other errors on SCOM servers.

The workflows mostly run fine, but we have one issue: sometimes a runbook in SCO runs successfully, yet the Runbook Automation Activity goes to "Failed" around 30-40 minutes after the runbook has already finished.

Could the two be related?

Very rarely, something else happens: Runbook Automation Activities that genuinely failed and were then set to rerun are set to failed again, although the runbook ran successfully without warnings or errors. We receive this error in the OpsManager log:

Data Access Layer rejected retry on SqlError:

 Request: p_ManagedEntityInsert -- (BaseManagedEntityId=d00f4a3c-e7ff-bbbc-033d-0da538909283), (TypedManagedEntityId=d00f4a3c-e7ff-bbbc-033d-0da538909283), (ManagedTypeId=5fe5d511-efb9-54a1-4be9-811f60e186c4), (FullName=Microsoft.SystemCenter.Orchestrator.RunbookAutomationActivity:RB42256), (Path=), (Name=RB42256), (TopLevelHostEntityId=d00f4a3c-e7ff-bbbc-033d-0da538909283), (DiscoverySourceId=7431e155-3d9e-4724-895e-c03ba951a352), (HealthServiceEntityId=ca8c4c5c-01df-32f4-baa9-63751e7c7183), (PerformHealthServiceCheck=False), (TimeGenerated=6/23/2022 6:53:00 AM), (IsOptimistic=True), (LifetimeRelationshipId=), (IsConnectorLoggingDisabled=False), (ConcurrentConnections=True), (LastModified=6/23/2022 6:52:38 AM), (TypedInstanceInserted=False), (ChangeId=), (RETURN_VALUE=1)

 Class: 16

 Number: 777980010

 Message: Instance Id = {D00F4A3C-E7FF-BBBC-033D-0DA538909283} last modification is more recent than submitted.



Can anyone help here?

Best Answer

  • Simon_Zeinhofer Customer Advanced IT Monkey ✭✭✭
    edited October 2022 Answer ✓

    @Adam_Dzyacky I just came across this old post and I wanted to add one thing:

    Point 2: The runbook IS indeed running successfully on SCO side, but the Runbook automation activity will be set to failed

    We found out why this occurs.

    We've set our SCO Connector to check the runbooks from our Automation Activities every 3 minutes (instead of the standard 5) by exporting the management pack and reimporting it.

    When a runbook is running, its job ID is mapped into the SCOJobID field of the automation activity. When the job has the status failed, the automation activity is set to failed as well, BUT the JobID remains the same and is not replaced by the new one. So when we rerun the activity, the JobID is still the one from the failed runbook job, and the activity is set to failed although the runbook ran successfully this time.

    We also had the same issue in our old system, where the check ran every 5 minutes.

    I dug into that and saw that the SCOJobID is replaced quite late and not frequently. So when we are "too fast" and set the activity to rerun, the connector checks the ID before it is replaced with the new one. In my opinion this is a bug that Microsoft should fix.

    As opening a case with Microsoft is really painful most of the time, I wrote a workflow in the Authoring Tool that clears the JobID as soon as an Automation Activity is set to failed. Since then it has worked 100 % of the time and the error hasn't occurred again.
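
For readers who want to see the timing problem spelled out, here is a toy Python model of the behaviour described above. It is only a sketch under stated assumptions: `AutomationActivity`, `connector_poll` and `simulate` are hypothetical names made up for illustration, not part of the SCSM or SCO API, and the real connector logic may well differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AutomationActivity:
    """Hypothetical stand-in for a Runbook Automation Activity (not the real SCSM class)."""
    status: str = "In Progress"
    sco_job_id: Optional[str] = None


def connector_poll(activity: AutomationActivity, jobs: Dict[str, str]) -> None:
    """Assumed connector behaviour: copy the status of the mapped job onto the activity."""
    job_status = jobs.get(activity.sco_job_id)
    if job_status in ("Failed", "Completed"):
        activity.status = job_status


def simulate(clear_on_failure: bool) -> List[str]:
    """Return the activity status after each connector poll."""
    jobs = {"job-1": "Failed"}              # the first runbook job really failed
    activity = AutomationActivity(sco_job_id="job-1")
    history = []

    connector_poll(activity, jobs)
    history.append(activity.status)
    if clear_on_failure and activity.status == "Failed":
        activity.sco_job_id = None          # the workaround: clear the stale job ID

    # Rerun: a new job succeeds in SCO, but its ID is not mapped yet.
    activity.status = "In Progress"
    jobs["job-2"] = "Completed"

    connector_poll(activity, jobs)          # poll arrives before the ID is replaced
    history.append(activity.status)

    if activity.status == "In Progress":
        activity.sco_job_id = "job-2"       # late replacement of the job ID
        connector_poll(activity, jobs)
        history.append(activity.status)
    return history


print(simulate(clear_on_failure=False))  # ['Failed', 'Failed']
print(simulate(clear_on_failure=True))   # ['Failed', 'In Progress', 'Completed']
```

In this model, the rerun is marked failed only because the early poll still sees the old job ID. With the ID cleared, the early poll finds nothing to copy and the activity waits for the new job.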

Answers

  • Adam_Dzyacky Product Owner Contributor Monkey ✭✭✭✭✭
    edited June 2022

    First question - you said this happens on your SCSM workflow server, but the management group listed is your SCOM instance? Just want to make sure there isn't a typo up there.

    Second question: Assuming that isn't a typo, would it be fair to assume you are monitoring SCSM via SCOM Agent instead of Agentless?

    Third: Just wanna make sure I'm getting this right on the SCO side of things: you have some SCSM process that kicks off a runbook, but SOMETIMES...sometimes at no consistent point during the day (which is to say it happens sporadically in a day if it even happens)...that runbook doesn't even kick off. SCSM marks it as failed. You re-run it, and then it's fine?

  • Simon_Zeinhofer Customer Advanced IT Monkey ✭✭✭
    edited June 2022

    Hello @Adam_Dzyacky Thanks for the fast answer :)

    Answer to first and second question: It is not a typo. It is the SCOM instance. And yes, as far as I know we monitor it via SCOM Agent (I still have to ask our Infrastructure Team about this). Is that an issue?

    Answer to third question: There are 3 different scenarios:

    1. The runbook IS indeed running successfully on SCO side, but the Runbook automation activity will be set to failed. This happened with some runbooks that ran for around 5 minutes. The odd thing is that the activity end time in SCSM was shown as 50 minutes after the start time, which was just not true. After a rerun everything is fine.
    2. The runbook IS NOT even running on SCO side, but the Runbook automation activity will be set to failed. After a rerun everything is fine.
    3. The runbook has been running and failed. After a rerun (with the error in the runbook fixed) the SCO runbook succeeds, but the runbook automation activity shows failed. That is the scenario where we get the "Data Access Layer rejected retry on SqlError:" error in the logs.
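
The log entry from scenario 3 (`IsOptimistic=True`, "last modification is more recent than submitted") reads like an optimistic concurrency check rejecting a stale write. That reading is an assumption, not something confirmed in this thread. A minimal Python sketch of such a check, with hypothetical names (`submit_update`, `StaleWriteError`) and a made-up stored timestamp, could look like this:

```python
from datetime import datetime


class StaleWriteError(Exception):
    """Raised when the stored record is newer than the version the caller submitted."""


def submit_update(stored_last_modified: datetime, submitted_last_modified: datetime) -> datetime:
    """Accept the write only if the stored record is not newer than the submitted one."""
    if stored_last_modified > submitted_last_modified:
        raise StaleWriteError("last modification is more recent than submitted")
    return submitted_last_modified


def try_submit(stored: datetime, submitted: datetime) -> str:
    """Wrap submit_update and report the outcome as text."""
    try:
        submit_update(stored, submitted)
        return "accepted"
    except StaleWriteError as err:
        return f"rejected: {err}"


# LastModified from the log above; the stored value is invented for illustration.
submitted = datetime(2022, 6, 23, 6, 52, 38)
stored = datetime(2022, 6, 23, 6, 52, 50)
print(try_submit(stored, submitted))
print(try_submit(submitted, submitted))
```

If that is what happens here, something else modified the activity between the time the workflow read it and the time it tried to write it back.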

    Yesterday I searched the net and found a script that runs a "service task" (clears some tables, removes orphaned runbooks etc.) on SCO. After running it, we tried the same requests again (the ones where the RB Activities had been set to failed), and there were no errors. I am not sure if we really solved the problem or if it was just luck.

    I hope my answers are sufficient ;)

    For anyone interested in the SCO service task, this is the GitHub link: https://github.com/souravmahato7/Codes/blob/SCORCH/DatabaseMaitenanceScript.PS1

    For us it worked without any errors.

  • Simon_Zeinhofer Customer Advanced IT Monkey ✭✭✭

    @Adam_Dzyacky I found an old community post from you, where you mention the Authorization Cache maintenance task occurring at the same time as the runbook should be invoked. This might have been the same issue here ;)

    I changed the setting so that these 3 tasks only run once a day at 00:00. This resolved the issue for us.

