I have been having problems with SCSM TempDB growing too large
I have been working with Microsoft on the issue for a number of weeks and they have not found the problem yet. The TempDB will fill up over and over and some things seem to not function properly.
We were able to find the query below in SQL as the problem (this occurs every HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\GroupCalcPollingIntervalMilliseconds and times out when the TembDB fills up):
Trying to find help from technet and other sites, it looks like the common denominator for everyone with these large queries that cause tempdb to grow out of control is Cireson integrations.
Is the query below realated to any Cireson Integrations?
Is there any patch or solution to stop this from occuring?
FROM (SELECT [S].[BaseManagedEntityId] AS [S],0 AS CS,[R0].[TargetEntityId] AS [T0],1 AS CRT0,[R1].[TargetEntityId] AS [T1],2 AS CRT1,[R2].[TargetEntityId] AS [T2],3 AS CRT2,[R3].[TargetEntityId] AS [T3],4 AS CRT3,[R4].[TargetEntityId] AS [T4],5 AS CRT4,[R5].[TargetEntityId] AS [T5],6 AS CRT5,[R6].[TargetEntityId] AS [T6],7 AS CRT6,[R7].[TargetEntityId] AS [T7],8 AS CRT7,[R8].[SourceEntityId] AS [T8],9 AS CRT8,[R9].[TargetEntityId] AS [T9],10 AS CRT9,[R10].[TargetEntityId] AS [T10],11 AS CRT10,[R11].[TargetEntityId] AS [T11],12 AS CRT11,[R12].[TargetEntityId] AS [T12],13 AS CRT12,[R13].[TargetEntityId] AS [T13],14 AS CRT13,[R14].[TargetEntityId] AS [T14],15 AS CRT14,[R15].[TargetEntityId] AS [T15],16 AS CRT15,[R16].[TargetEntityId] AS [T16],17 AS CRT16,[R17].[TargetEntityId] AS [T17],18 AS CRT17
FROM dbo.TypedManagedEntity AS S
LEFT OUTER JOIN dbo.Relationship AS R0 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser
ON (R0.[SourceEntityId] = S.[BaseManagedEntityId]
AND R0.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'
AND R0.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R1 /*SOURCE_HINT_PATTERN*/-- System.WorkItemClosedByUser
ON (R1.[SourceEntityId] = S.[BaseManagedEntityId]
AND R1.[RelationshipTypeId] = 'ba8180d3-5bf9-1bbd-ae87-145dd8fc520f'
AND R1.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R2 /*SOURCE_HINT_PATTERN*/-- System.WorkItemCreatedByUser
ON (R2.[SourceEntityId] = S.[BaseManagedEntityId]
AND R2.[RelationshipTypeId] = 'df738111-c7a2-b450-5872-c5f3b927481a'
AND R2.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R3 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAffectedUser
ON (R3.[SourceEntityId] = S.[BaseManagedEntityId]
AND R3.[RelationshipTypeId] = 'dff9be66-38b0-b6d6-6144-a412a3ebd4ce'
AND R3.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R4 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToConfigItem
ON (R4.[SourceEntityId] = S.[BaseManagedEntityId]
AND R4.[RelationshipTypeId] = 'd96c8b59-8554-6e77-0aa7-f51448868b43'
AND R4.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R5 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAboutConfigItem
ON (R5.[SourceEntityId] = S.[BaseManagedEntityId]
AND R5.[RelationshipTypeId] = 'b73a6094-c64c-b0ff-9706-1822df5c2e82'
AND R5.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R6 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAboutConfigItem
ON (R6.[SourceEntityId] = S.[BaseManagedEntityId]
AND R6.[RelationshipTypeId] = 'b73a6094-c64c-b0ff-9706-1822df5c2e82'
AND R6.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R7 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToWorkItem
ON (R7.[SourceEntityId] = S.[BaseManagedEntityId]
AND R7.[RelationshipTypeId] = 'cb6ce813-ea8d-094d-ee5a-b755701f4547'
AND R7.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R8 /*TARGET_HINT_PATTERN*/-- System.WorkItemRelatesToWorkItem
ON (R8.[TargetEntityId] = S.[BaseManagedEntityId]
AND R8.[RelationshipTypeId] = 'cb6ce813-ea8d-094d-ee5a-b755701f4547'
AND R8.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R9 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasFileAttachment
ON (R9.[SourceEntityId] = S.[BaseManagedEntityId]
AND R9.[RelationshipTypeId] = 'aa8c26dc-3a12-5f88-d9c7-753e5a8a55b4'
AND R9.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R10 /*SOURCE_HINT_PATTERN*/-- System.EntityLinksToKnowledgeDocument
ON (R10.[SourceEntityId] = S.[BaseManagedEntityId]
AND R10.[RelationshipTypeId] = 'ec8a256b-cc89-6f18-af6a-e9cdc38cc573'
AND R10.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R11 /*SOURCE_HINT_PATTERN*/-- System.WorkItemRelatesToRequestOffering
ON (R11.[SourceEntityId] = S.[BaseManagedEntityId]
AND R11.[RelationshipTypeId] = '2730587f-3d88-a4e4-42d8-08cf94535a6e'
AND R11.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R12 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasActionLog
ON (R12.[SourceEntityId] = S.[BaseManagedEntityId]
AND R12.[RelationshipTypeId] = '0edb327d-7ef8-0501-0494-df0a51eaba38'
AND R12.[IsDeleted] = 0
AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.ActionLog
SELECT 1 FROM dbo.[TypedManagedEntity]
WHERE R12.[TargetEntityId] = [BaseManagedEntityId]
AND [ManagedTypeId] = 'dbb6a632-0a7e-cef8-1fc9-405d5cd4d911' AND [IsDeleted] = 0))
LEFT OUTER JOIN dbo.Relationship AS R13 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasCommentLog
ON (R13.[SourceEntityId] = S.[BaseManagedEntityId]
AND R13.[RelationshipTypeId] = '79d27435-5917-b0a1-7911-fb2b678f32a6'
AND R13.[IsDeleted] = 0
AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.AnalystCommentLog
SELECT 1 FROM dbo.[TypedManagedEntity]
WHERE R13.[TargetEntityId] = [BaseManagedEntityId]
AND [ManagedTypeId] = 'f14b70f4-878c-c0e1-b5c1-06ca22d05d40' AND [IsDeleted] = 0))
LEFT OUTER JOIN dbo.Relationship AS R14 /*SOURCE_HINT_PATTERN*/-- System.WorkItemHasCommentLog
ON (R14.[SourceEntityId] = S.[BaseManagedEntityId]
AND R14.[RelationshipTypeId] = '79d27435-5917-b0a1-7911-fb2b678f32a6'
AND R14.[IsDeleted] = 0
AND EXISTS ( -- Restricted to System.WorkItem.TroubleTicket.UserCommentLog
SELECT 1 FROM dbo.[TypedManagedEntity]
WHERE R14.[TargetEntityId] = [BaseManagedEntityId]
AND [ManagedTypeId] = 'a3d4e16f-5e8a-18ba-9198-d9815194c986' AND [IsDeleted] = 0))
LEFT OUTER JOIN dbo.Relationship AS R15 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser
ON (R15.[SourceEntityId] = R7.[TargetEntityId]
AND R15.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'
AND R15.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R16 /*SOURCE_HINT_PATTERN*/-- System.WorkItemAssignedToUser
ON (R16.[SourceEntityId] = R8.[SourceEntityId]
AND R16.[RelationshipTypeId] = '15e577a3-6bf9-6713-4eac-ba5a5b7c4722'
AND R16.[IsDeleted] = 0)
LEFT OUTER JOIN dbo.Relationship AS R17 /*SOURCE_HINT_PATTERN*/-- System.FileAttachmentAddedByUser
ON (R17.[SourceEntityId] = R9.[TargetEntityId]
AND R17.[RelationshipTypeId] = 'ffd71f9e-7346-d12b-85d6-7c39f507b7bb'
AND R17.[IsDeleted] = 0)
WHERE (S.[ManagedTypeId] = '04b69835-6343-4de2-4b19-6be08c612989'
AND S.[IsDeleted] = 0)) AS PC
LEFT OUTER JOIN
(
-- MTV_Select_eca3c52a-f273-5cdc-f165-3eb95a2b26cf
SELECT [MTV_System$Domain$User].[UserName_6AF77E23_669B_123F_B392_323C17097BBD], [MTV_System$Domain$User].[BaseManagedEntityId]
FROM dbo.MTV_System$Domain$User AS MTV_System$Domain$User
) AS SubqueryJoin0_0 ON SubqueryJoin0_0.[BaseManagedEntityId] = [PC].[T0]
WHERE SubqueryJoin0_0.[UserName_6AF77E23_669B_123F_B392_323C17097BBD] IS NULL OPTION (KEEP PLAN)
Answers
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\GroupCalcPollingIntervalMilliseconds
If I recall correctly...
That registry key is generally brought up when it comes to overall SCSM performance and is in of itself a bit of a balancing act. The key is responsible for SCSM knowing how often permissions should be recalculated e.g. "Does this user have permissions to view/edit a Work Item" or "Does this User have the ability to view/edit this Config Item." Depending on how many Work/Config Items you have in your operational DB, coupled with how you've setup RBAC, the permission re-calculation can be very expensive to perform frequently (I think the default time is 30 seconds?). The easy solution here is to take the number up, but that comes at the cost that brand new Work Items your Analysts may not have access to until the next calculation.
When the Cireson portal is deployed, the Cache Builder performs a similar function. Only it's job is to run against the SCSM operational DB to sync tables out and into ServiceManagement. Again, if you have a complex or misconfigured RBAC within SCSM. CacheBuilder suffers its own performance issues resulting in overall system health issues. A way to observe this after the fact is to check out your CacheBuilder log and see how long it takes Cachebuilder to complete a full sync. Ideally it should be only a couple of minutes (single digits).
All of the above assumes you're using Queues and scoping within SCSM.
We have some very basic RBAC (limits which views a user can see and what service offerings are available in the Self Service portal) but nothing major. Microsoft has not found the cause of why the query takes so long, so I was just posting here.
Roles:
basic user (Self-Service) - Can submit incidents - limited by catalog group.
Advanced user (Self-Service) - Can submit incidents and service requests - limited by catalog group.
Technician - Console user - Advanced operator limited by views. (also has a offerings limited by catalog group)
I had the group calculation set at every 10 minutes and changed to once an hour.
Note, the SQL Program_name that it shows it
SC DAL--GroupCalc
The tempdb still fills up. Hoping Microsoft can come back with what is causing the group calculation to grow so large.
Cycling through some notes here -
-----Error-------------
Data Access Layer rejected retry on SqlError:
Request: p_ManagedEntityInsert -- (BaseManagedEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (TypedManagedEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (ManagedTypeId=a604b942-4c7b-2fb2-28dc-61dc6f465c68), (FullName=System.WorkItem.Incident:IR2698004), (Path=), (Name=IR2698004), (TopLevelHostEntityId=35b8999d-6c69-068a-1a78-82d12509da08), (DiscoverySourceId=7431e155-3d9e-4724-895e-c03ba951a352), (HealthServiceEntityId=aead6610-e8da-5379-0953-0d12de4f8ff1), (PerformHealthServiceCheck=False), (TimeGenerated=12/17/2019 6:10:18 PM), (IsOptimistic=True), (LifetimeRelationshipId=), (IsConnectorLoggingDisabled=False), (ConcurrentConnections=True), (LastModified=12/17/2019 6:07:36 PM), (TypedInstanceInserted=False), (ChangeId=), (RETURN_VALUE=1)
Class: 16
Number: 777980010
Message: Instance Id = {35B8999D-6C69-068A-1A78-82D12509DA08} last modification is more recent than submitted.
-----Warningr-------------
The database subscription query is longer than expected. Check the database or simplify the database subscription criteria.
The following errors were encountered:
Exception message: Subscription query is taking long.
Subscription id: b9d0e2c7-3846-feb9-1ebd-3ccbdc366234
Time spent in new state: 00:00:09.9995120
Time spent in querying state: 00:00:10.0105115
One or more subscriptions were affected by this.
Subscription name: ServiceManager.ActivityManagement.DependentActivityStatusRule
Instance name: Activity Workflow Target
Instance ID: {2ED5ABBB-C0B8-7622-DAB6-7D792C212F54}
Management group: CORSMPMG
Is that 365 for all Work Item types and their History? or just Work Items? And in terms of Work Item volume in your operational/ServiceManager DB are we talking 10k work items? 50k? 100k?
And I'm assuming your using SQL Profiler to gather this information - do you know by chance which of your management servers is triggering this SQL work? My guess is your workflow server, but doesn't hurt to ask.
work item are 365. History is 30 days. about 15k work items.
It is the workflow server that is triggering the query.
@Chris_Sprague
We have been trouble shooting a similar TempDB issue with one of our 4 SCSM Instances as well for a few months now.
What we have observed a loop of a query that appears to be doing user permission calculations. It times out and begins perform a SQL roll back. While that is still rolling back, another one of the same query kicks off and it then also times out also. It does this over and over again and eventually we end up with so many still rolling back and a new one kicked off that it has eaten up the entire tempDB we have available.
Interestingly we have found that restarting one of our Secondary Management servers resolves the issue temporarily only to return a month or so later.
We have been unable to identify the root cause at this stage, but would be interested if Microsoft identify anything further for you.
We do not run Cireson products on this instance though, so IF it is the same thing you are experiencing I am not convinced it will be Cireson related.
I will post the solution if Microsoft identifies one. Hopefully it will be soon. I have tried restarting all management servers without improving the problem.
The solution was to change the functional level of the database to 2012 instead of 2014. MS is still searching for why the system is not functioning using the 2014 optimization plans, as it is supposed to be compatible.