How long do Queues take to apply to work items in your environment?
We currently have 13 queues (5 IR, 5 SR, 1 PR, 1 RR, 1 CR) and the GroupCalcPollingIntervalMilliseconds is not set.
We usually have good workflow performance, but the queues are taking a bit more time than we expected. We currently average about 2.5 minutes, with some cases as high as 25 minutes.
Our database compatibility level was set to SQL 2016 (130); changing it helped with console performance, but the queues didn't get much better. I still have a Premier ticket open, but I wanted to see what kind of queue application times others are seeing.
Are you seeing similar queue application times?
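Since GroupCalcPollingIntervalMilliseconds came up above: group calculation polling is controlled by a registry value on the management server, and when the value is absent SCSM falls back to its built-in default interval. The registry path below follows common SCSM tuning guidance and should be verified against your own environment; this is a sketch for checking whether the value is set, not an official procedure.

```python
# Windows-only sketch: check whether the SCSM group-calc polling
# interval has been overridden in the registry.
# ASSUMPTION: the key path below matches your SCSM installation;
# verify it on your management server before relying on this.
import winreg

KEY_PATH = r"SOFTWARE\Microsoft\System Center\2010\Common"
VALUE_NAME = "GroupCalcPollingIntervalMilliseconds"

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        interval, _ = winreg.QueryValueEx(key, VALUE_NAME)
        print(f"{VALUE_NAME} = {interval} ms")
except FileNotFoundError:
    # Key or value missing: SCSM uses its built-in default interval.
    print(f"{VALUE_NAME} not set; SCSM uses its default.")
```

Raising the interval trades group-membership freshness for lower SQL load, so change it only after measuring.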
Gerhard_Goossens: We have 7 queues, and it takes about two minutes for the work items to appear for the analysts. We have about 30 support groups in IR and SR, supporting 65k users and 4,500 analysts. I have noticed that the workflows that kick off every minute and touch the queues make the process a couple of minutes longer.
DECLARE @Delta INT
DECLARE @Language NVARCHAR(3)
DECLARE @MaxState BIGINT
DECLARE @MaxStateDate DATETIME
SET @Delta = 0
SET @Language = 'ENU'
SET @MaxState = (
    SELECT MAX(EntityTransactionLogId)
    FROM EntityChangeLog WITH(NOLOCK))
SET @MaxStateDate = (
    SELECT TimeAdded
    FROM EntityTransactionLog WITH(NOLOCK)
    WHERE EntityTransactionLogId = @MaxState)
SELECT
    LT.LTValue AS 'Display Name',
    S.State AS 'Current Workflow Watermark',
    @MaxState AS 'Current Transaction Log Watermark',
    DATEDIFF(MINUTE, (
        SELECT TimeAdded
        FROM EntityTransactionLog WITH(NOLOCK)
        WHERE EntityTransactionLogId = S.State), @MaxStateDate) AS 'Minutes Behind',
    R.RuleName AS 'MP Rule Name',
    MT.TypeName AS 'Source Class Name',
    S.LastModified AS 'Rule Last Modified',
    S.IsPeriodicQueryEvent AS 'Is Periodic Query Subscription', --Note: 1 means it is a periodic query subscription
    R.RuleEnabled AS 'Rule Enabled' -- Note: 4 means the rule is enabled
FROM CmdbInstanceSubscriptionState AS S WITH(NOLOCK)
LEFT OUTER JOIN Rules AS R
    ON S.RuleId = R.RuleId
LEFT OUTER JOIN ManagedType AS MT
    ON S.TypeId = MT.ManagedTypeId
LEFT OUTER JOIN LocalizedText AS LT
    ON R.RuleId = LT.MPElementId
WHERE S.State <= @MaxState - @Delta
    AND R.RuleEnabled <> 0
    AND LT.LTStringType = 1
    AND LT.LanguageCode = @Language
    AND S.IsPeriodicQueryEvent = 0
    /*Note: Uncomment this line and use this optional criteria if you want to
      look at a specific workflow that you know the display name of*/
    --AND LT.LTValue LIKE '%Test%'
ORDER BY S.State ASC
The query below checks SQL Server file I/O latency per database file, which is a good place to look when queue workflows lag:
SELECT DB_NAME ([vfs].[database_id]) AS [DB],
[BytesTransferred] = CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0) THEN 0
ELSE ([num_of_bytes_read] + [num_of_bytes_written]) END,
[BytesRead] = CASE WHEN [num_of_reads] = 0 THEN 0
ELSE [num_of_bytes_read] END,
[BytesWritten] = CASE WHEN [num_of_writes] = 0 THEN 0
ELSE [num_of_bytes_written] END,
[ReadLatency] = CASE WHEN [num_of_reads] = 0 THEN 0
ELSE ([io_stall_read_ms] / [num_of_reads]) END,
[WriteLatency] = CASE WHEN [num_of_writes] = 0 THEN 0
ELSE ([io_stall_write_ms] / [num_of_writes]) END,
[Latency] = CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0) THEN 0
ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,
[AvgBPerRead] = CASE WHEN [num_of_reads] = 0 THEN 0
ELSE ([num_of_bytes_read] / [num_of_reads]) END,
[AvgBPerWrite] = CASE WHEN [num_of_writes] = 0 THEN 0
ELSE ([num_of_bytes_written] / [num_of_writes]) END,
[AvgBPerTransfer] = CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0) THEN 0
ELSE (([num_of_bytes_read] + [num_of_bytes_written]) /([num_of_reads] + [num_of_writes])) END,
LEFT ([mf].[physical_name], 2) AS [Drive]
FROM sys.dm_io_virtual_file_stats (NULL,NULL) AS [vfs]
JOIN sys.master_files AS [mf]
ON [vfs].[database_id] = [mf].[database_id] AND [vfs].[file_id] = [mf].[file_id]
-- WHERE [vfs].[file_id] = 2 -- log files
ORDER BY [Latency] DESC
-- ORDER BY [ReadLatency] DESC
--ORDER BY [BytesRead] DESC;
When a change is made to a work item, the work item is updated, so it ends up being evaluated again the next time the queue workflow runs. The baseline for nearly everything in SCSM is a one-minute cycle processing batches of 100 changes. If 200 IR changes occur in one minute, it takes two minutes just for those changes to flow through the queues correctly; if another 200 changes occur in the next minute, you keep falling further and further behind. Part of the key to the system is having a good estimate of your rate of change and tuning the system for it: you can increase the cycle frequency and/or the batch size, depending on the performance of your SQL farm.
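The arithmetic above can be sketched directly. Assuming a fixed one-minute cycle that drains at most one batch per run (the batch size of 100 is the figure quoted above; the function is illustrative, not an SCSM API), a sustained change rate above the batch size makes the backlog grow without bound:

```python
def backlog_after(minutes, changes_per_minute, batch_size=100):
    """Simulate how far the queue-update workflow falls behind when
    each one-minute cycle can process at most `batch_size` changes."""
    backlog = 0
    for _ in range(minutes):
        backlog += changes_per_minute        # new work-item changes this minute
        backlog -= min(backlog, batch_size)  # one cycle drains at most one batch
    return backlog

# 200 changes/min against a 100-change batch: 100 unprocessed changes
# pile up every minute, so after 10 minutes the queues are 1000 behind.
print(backlog_after(10, 200))                  # → 1000
# Raising the batch size above the change rate keeps the queues current.
print(backlog_after(10, 200, batch_size=250))  # → 0
```

This is why measuring your sustained rate of change matters before touching the batch size or cycle: the backlog only stays bounded when drain capacity exceeds the arrival rate.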
Now, on system design, I ask myself things like: do I need a queue for CRs at all? Is the requirement that analysts simply don't see CRs, or that they must NOT have access to them? If they must NOT have access, fine, use a queue. But if CRs just shouldn't be readily accessible, don't grant rights in SCSM and don't train analysts how to find them. Most of my environment is security through omission: the analysts never go anywhere but My Work and, in truth, are too lazy to look anywhere else.
Yes, I agree. It also depends on the number of work items in the live database: if a large number of work items is being created, you should adjust the batch size and update frequency, and change the retention/grooming frequency. If you have a DW, all the stats are in there for reporting, so ask yourself, "Do we really need ALL work items available for daily operations?" If not, change the retention to something like three months.
I will take it that an average of about two minutes is normal, depending on when the workflow was triggered and how many items were modified since the previous batch.
I will continue working on other performance improvements that can be made to ensure the best workflow performance.