I have been having problems with SCSM TempDB growing too large

Chris_SpragueChris_Sprague Customer IT Monkey ✭

I have been working with Microsoft on the issue for a number of weeks and they have not found the problem yet. The TempDB will fill up over and over and some things seem to not function properly.


We were able to find the query below in SQL as the problem (this occurs every HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\GroupCalcPollingIntervalMilliseconds and times out when the TembDB fills up):

Trying to find help from technet and other sites, it looks like the common denominator for everyone with these large queries that cause tempdb to grow out of control is Cireson integrations.

Is the query below realated to any Cireson Integrations?

Is there any patch or solution to stop this from occuring?

https://social.technet.microsoft.com/Forums/ie/en-US/1d38dfd5-6f4c-47f0-bdaa-d8f3fb240011/large-temp-db-growth?forum=systemcenterservicemanager

FROM (SELECT [S].[BaseManagedEntityId] AS [S],0 AS CS,[R0].[TargetEntityId] AS [T0],1 AS CRT0,[R1].[TargetEntityId] AS [T1],2 AS CRT1,[R2].[TargetEntityId] AS [T2],3 AS CRT2,[R3].[TargetEntityId] AS [T3],4 AS CRT3,[R4].[TargetEntityId] AS [T4],5 AS CRT4,[R5].[TargetEntityId] AS [T5],6 AS CRT5,[R6].[TargetEntityId] AS [T6],7 AS CRT6,[R7].[TargetEntityId] AS [T7],8 AS CRT7,[R8].[SourceEntityId] AS [T8],9 AS CRT8,[R9].[TargetEntityId] AS [T9],10 AS CRT9,[R10].[TargetEntityId] AS [T10],11 AS CRT10,[R11].[TargetEntityId] AS [T11],12 AS CRT11,[R12].[TargetEntityId] AS [T12],13 AS CRT12,[R13].[TargetEntityId] AS [T13],14 AS CRT13,[R14].[TargetEntityId] AS [T14],15 AS CRT14,[R15].[TargetEntityId] AS [T15],16 AS CRT15,[R16].[TargetEntityId] AS [T16],17 AS CRT16,[R17].[TargetEntityId] AS [T17],18 AS CRT17

FROM dbo.TypedManagedEntity AS S 

 LEFT OUTER JOIN dbo.Relationship AS R0 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser

   ON (R0.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R0.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'

   AND R0.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R1 /*SOURCE_HINT_PATTERN*/-- System.WorkItemClosedByUser

   ON (R1.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R1.[RelationshipTypeId] = 'ba8180d3-5bf9-1bbd-ae87-145dd8fc520f'

   AND R1.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R2 /*SOURCE_HINT_PATTERN*/-- System.WorkItemCreatedByUser

   ON (R2.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R2.[RelationshipTypeId] = 'df738111-c7a2-b450-5872-c5f3b927481a'

   AND R2.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R3 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAffectedUser

   ON (R3.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R3.[RelationshipTypeId] = 'dff9be66-38b0-b6d6-6144-a412a3ebd4ce'

   AND R3.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R4 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToConfigItem

   ON (R4.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R4.[RelationshipTypeId] = 'd96c8b59-8554-6e77-0aa7-f51448868b43'

   AND R4.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R5 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAboutConfigItem

   ON (R5.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R5.[RelationshipTypeId] = 'b73a6094-c64c-b0ff-9706-1822df5c2e82'

   AND R5.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R6 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAboutConfigItem

   ON (R6.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R6.[RelationshipTypeId] = 'b73a6094-c64c-b0ff-9706-1822df5c2e82'

   AND R6.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R7 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToWorkItem

   ON (R7.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R7.[RelationshipTypeId] = 'cb6ce813-ea8d-094d-ee5a-b755701f4547'

   AND R7.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R8 /*TARGET_HINT_PATTERN*/-- System.WorkItemRelatesToWorkItem

   ON (R8.[TargetEntityId] = S.[BaseManagedEntityId]

   AND R8.[RelationshipTypeId] = 'cb6ce813-ea8d-094d-ee5a-b755701f4547'

   AND R8.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R9 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasFileAttachment

   ON (R9.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R9.[RelationshipTypeId] = 'aa8c26dc-3a12-5f88-d9c7-753e5a8a55b4'

   AND R9.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R10 /*SOURCE_HINT_PATTERN*/-- System.EntityLinksToKnowledgeDocument

   ON (R10.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R10.[RelationshipTypeId] = 'ec8a256b-cc89-6f18-af6a-e9cdc38cc573'

   AND R10.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R11 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToRequestOffering

   ON (R11.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R11.[RelationshipTypeId] = '2730587f-3d88-a4e4-42d8-08cf94535a6e'

   AND R11.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R12 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasActionLog

   ON (R12.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R12.[RelationshipTypeId] = '0edb327d-7ef8-0501-0494-df0a51eaba38'

   AND R12.[IsDeleted] = 0

   AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.ActionLog

    SELECT 1 FROM dbo.[TypedManagedEntity]

    WHERE R12.[TargetEntityId] = [BaseManagedEntityId]

    AND [ManagedTypeId] = 'dbb6a632-0a7e-cef8-1fc9-405d5cd4d911' AND [IsDeleted] = 0))

 LEFT OUTER JOIN dbo.Relationship AS R13 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasCommentLog

   ON (R13.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R13.[RelationshipTypeId] = '79d27435-5917-b0a1-7911-fb2b678f32a6'

   AND R13.[IsDeleted] = 0

   AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.AnalystCommentLog

    SELECT 1 FROM dbo.[TypedManagedEntity]

    WHERE R13.[TargetEntityId] = [BaseManagedEntityId]

    AND [ManagedTypeId] = 'f14b70f4-878c-c0e1-b5c1-06ca22d05d40' AND [IsDeleted] = 0))

 LEFT OUTER JOIN dbo.Relationship AS R14 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasCommentLog

   ON (R14.[SourceEntityId] = S.[BaseManagedEntityId]

   AND R14.[RelationshipTypeId] = '79d27435-5917-b0a1-7911-fb2b678f32a6'

   AND R14.[IsDeleted] = 0

   AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.UserCommentLog

    SELECT 1 FROM dbo.[TypedManagedEntity]

    WHERE R14.[TargetEntityId] = [BaseManagedEntityId]

    AND [ManagedTypeId] = 'a3d4e16f-5e8a-18ba-9198-d9815194c986' AND [IsDeleted] = 0))

 LEFT OUTER JOIN dbo.Relationship AS R15 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser

   ON (R15.[SourceEntityId] = R7.[TargetEntityId]

   AND R15.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'

   AND R15.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R16 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser

   ON (R16.[SourceEntityId] = R8.[SourceEntityId]

   AND R16.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'

   AND R16.[IsDeleted] = 0)

 LEFT OUTER JOIN dbo.Relationship AS R17 /*SOURCE_HINT_PATTERN*/-- System.FileAttachmentAddedByUser

   ON (R17.[SourceEntityId] = R9.[TargetEntityId]

   AND R17.[RelationshipTypeId] = 'ffd71f9e-7346-d12b-85d6-7c39f507b7bb'

   AND R17.[IsDeleted] = 0)

WHERE (S.[ManagedTypeId] = '04b69835-6343-4de2-4b19-6be08c612989'

   AND S.[IsDeleted] = 0)) AS PC

LEFT OUTER JOIN

(

-- MTV_Select_eca3c52a-f273-5cdc-f165-3eb95a2b26cf

SELECT [MTV_System$Domain$User].[UserName_6AF77E23_669B_123F_B392_323C17097BBD], [MTV_System$Domain$User].[BaseManagedEntityId]

FROM dbo.MTV_System$Domain$User AS MTV_System$Domain$User 

) AS SubqueryJoin0_0 ON SubqueryJoin0_0.[BaseManagedEntityId] = [PC].[T0]

 

WHERE SubqueryJoin0_0.[UserName_6AF77E23_669B_123F_B392_323C17097BBD] IS NULL OPTION (KEEP PLAN)

Answers

  • Adam_DzyackyAdam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
    edited December 2019

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\GroupCalcPollingIntervalMilliseconds

    If I recall correctly...

    That registry key is generally brought up when it comes to overall SCSM performance and is in of itself a bit of a balancing act. The key is responsible for SCSM knowing how often permissions should be recalculated e.g. "Does this user have permissions to view/edit a Work Item" or "Does this User have the ability to view/edit this Config Item." Depending on how many Work/Config Items you have in your operational DB, coupled with how you've setup RBAC, the permission re-calculation can be very expensive to perform frequently (I think the default time is 30 seconds?). The easy solution here is to take the number up, but that comes at the cost that brand new Work Items your Analysts may not have access to until the next calculation.

    When the Cireson portal is deployed, the Cache Builder performs a similar function. Only it's job is to run against the SCSM operational DB to sync tables out and into ServiceManagement. Again, if you have a complex or misconfigured RBAC within SCSM. CacheBuilder suffers its own performance issues resulting in overall system health issues. A way to observe this after the fact is to check out your CacheBuilder log and see how long it takes Cachebuilder to complete a full sync. Ideally it should be only a couple of minutes (single digits).


    All of the above assumes you're using Queues and scoping within SCSM.

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    We have some very basic RBAC (limits which views a user can see and what service offerings are available in the Self Service portal) but nothing major.  Microsoft has not found the cause of why the query takes so long, so I was just posting here.

    Roles:

    basic user (Self-Service) - Can submit incidents - limited by catalog group.

    Advanced user (Self-Service) - Can submit incidents and service requests - limited by catalog group.

    Technician - Console user - Advanced operator limited by views. (also has a offerings limited by catalog group)

    I had the group calculation set at every 10 minutes and changed to once an hour.

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    Note, the SQL Program_name that it shows it

    SC DAL--GroupCalc

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    The tempdb still fills up. Hoping Microsoft can come back with what is causing the group calculation to grow so large.

  • Adam_DzyackyAdam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭

    Cycling through some notes here -

    • Do you have a large work item retention history/have you checked to see if grooming jobs are failing?
    • What version of the Cireson portal are you running? Do you have any customizations deployed such as Work Item badges?
    • How often does TempDB max out? Is there some pattern such as every 15 minutes? every hour? How long has this whole thing been going on?
    • There is a thread I and others were on awhile ago around performance (but not specifically TempDB related) over here that has several useful SQL queries you can run against your ServiceManager DB such as checking grooming job history. These queries may or may not help you out.
    • On your workflow server's Operations Manager event log. Over the last 24 hours of events is the majority informational? critical? or warning?
  • Chris_SpragueChris_Sprague Customer IT Monkey ✭
    1. Grooming is working. We only have 365 days right now.
    2. Not using the Cireson Portal, just the Notify Analyst and Autoclose. (we have the Provance Suite (Provance Asset Management w/ GridPro Analyst and /w itNetX self Service) and I am hoping to convince management to go with Cireson after we work out all the bugs with the product as it is now.
    3. About every 30 minutes. Not sure how long it has been going on. In the development side, the query finishes, but there is not as many work items.
    4. We have done similar things, but I will look at this also.
    5. Most informational. Some like the ones below:

    -----Error-------------

    Data Access Layer rejected retry on SqlError:

     Request: p_ManagedEntityInsert -- (BaseManagedEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (TypedManagedEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (ManagedTypeId=a604b942-4c7b-2fb2-28dc-61dc6f465c68), (FullName=System.WorkItem.Incident:IR2698004), (Path=), (Name=IR2698004), (TopLevelHostEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (DiscoverySourceId=7431e155-3d9e-4724-895e-c03ba951a352), (HealthServiceEntityId=aead6610-e8da-5379-0953-0d12de4f8ff1), (PerformHealthServiceCheck=False), (TimeGenerated=12/17/2019 6:10:18 PM), (IsOptimistic=True), (LifetimeRelationshipId=), (IsConnectorLoggingDisabled=False), (ConcurrentConnections=True), (LastModified=12/17/2019 6:07:36 PM), (TypedInstanceInserted=False), (ChangeId=), (RETURN_VALUE=1)

     Class: 16

     Number: 777980010

     Message: Instance Id = {35B8999D-6C69-068A-1A78-82D12509DA08} last modification is more recent than submitted.

    -----Warningr-------------

    The database subscription query is longer than expected. Check the database or simplify the database subscription criteria.

     The following errors were encountered:

     Exception message: Subscription query is taking long.

    Subscription id: b9d0e2c7-3846-feb9-1ebd-3ccbdc366234

    Time spent in new state: 00:00:09.9995120

    Time spent in querying state: 00:00:10.0105115

     One or more subscriptions were affected by this.  

    Subscription name: ServiceManager.ActivityManagement.DependentActivityStatusRule 

    Instance name: Activity Workflow Target 

    Instance ID: {2ED5ABBB-C0B8-7622-DAB6-7D792C212F54} 

    Management group: CORSMPMG

  • Adam_DzyackyAdam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭

    Is that 365 for all Work Item types and their History? or just Work Items? And in terms of Work Item volume in your operational/ServiceManager DB are we talking 10k work items? 50k? 100k?

    And I'm assuming your using SQL Profiler to gather this information - do you know by chance which of your management servers is triggering this SQL work? My guess is your workflow server, but doesn't hurt to ask.

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    work item are 365. History is 30 days. about 15k work items.

    It is the workflow server that is triggering the query.

  • Benjamin_StobbartBenjamin_Stobbart Customer IT Monkey ✭
    edited December 2019

    @Chris_Sprague

    We have been trouble shooting a similar TempDB issue with one of our 4 SCSM Instances as well for a few months now.

    What we have observed a loop of a query that appears to be doing user permission calculations. It times out and begins perform a SQL roll back. While that is still rolling back, another one of the same query kicks off and it then also times out also. It does this over and over again and eventually we end up with so many still rolling back and a new one kicked off that it has eaten up the entire tempDB we have available.

    Interestingly we have found that restarting one of our Secondary Management servers resolves the issue temporarily only to return a month or so later.

    We have been unable to identify the root cause at this stage, but would be interested if Microsoft identify anything further for you.

    We do not run Cireson products on this instance though, so IF it is the same thing you are experiencing I am not convinced it will be Cireson related.

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    I will post the solution if Microsoft identifies one. Hopefully it will be soon. I have tried restarting all management servers without improving the problem.

  • Chris_SpragueChris_Sprague Customer IT Monkey ✭

    The solution was to change the functional level of the database to 2012 instead of 2014. MS is still searching for why the system is not functioning using the 2014 optimization plans, as it is supposed to be compatible.

Sign In or Register to comment.