Server running Cache Builder sometimes doesn't fully re-initialize after reboot

Adam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
I've noticed this once or twice: when the box running the Cache Builder service reboots (because SCCM patching required it), the service comes back up and shows as running, but it isn't writing/syncing data between SCSM and the Service Management database. Or if it is, it's rather hit and miss: some data, such as new work items, is there, but not all of the items from "today".

A simple restart of the service resolves this and doesn't cause anyone any heartache, but I'm curious whether something else is going on here, or whether anyone else has experienced this. Otherwise, it seems like something that could easily be runbooked.

Comments

  • Brian_Wiest Customer Ninja IT Monkey ✭✭✭✭
    We used to run into this and believed it was caused by the system finalizing patching after the reboot, which delayed the System Center Data Access Service and made it take longer than normal to start. In the Cache Builder log we would find a line saying the Data Access service was not found:
    "ERROR [   5]:  Error connecting to management server: The Data Access service is either not running or not yet initialized. Check the event log for more information."

    The way we adjusted for this: we have a PowerShell command that walks through each of our management servers (6), stops the required services, and deletes the config cache. We then perform our patching on all servers. When the farm is done, we run a script to restart the services in a specific server/service order.
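
    A minimal sketch of the "restart in a specific order" step, not the poster's actual script. It assumes Windows PowerShell 5.1 (where `Get-Service` accepts `-ComputerName`), uses the service names mentioned elsewhere in this thread (omsdk, CacheBuilder), and the server names `mgmt01` and `portal01` are hypothetical placeholders.

    ```powershell
    # Stop the whole sequence if any step fails, instead of carrying on out of order
    $ErrorActionPreference = 'Stop'

    # Hypothetical start order: server names are placeholders, adjust to your farm
    $startOrder = @(
        @{ Computer = 'mgmt01';   Service = 'omsdk' },        # System Center Data Access Service
        @{ Computer = 'portal01'; Service = 'CacheBuilder' }  # Cireson Cache Builder
    )

    foreach ($step in $startOrder) {
        $svc = Get-Service -Name $step.Service -ComputerName $step.Computer
        if ($svc.Status -ne 'Running') {
            Start-Service -InputObject $svc
        }
        # Block until the service reports Running; throws if that takes over 5 minutes
        $svc.WaitForStatus('Running', [TimeSpan]::FromMinutes(5))
        Write-Output "$($step.Computer): $($step.Service) is $($svc.Status)"
    }
    ```

    Waiting for each service to reach Running before moving on is what keeps the Cache Builder from starting ahead of the Data Access Service, which is the situation the log error above points to.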
  • Adam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
    Hrmmm, I can get behind that order of operations logic.
  • Adam_Dzyacky Customer Contributor Monkey ✭✭✭✭✭
    I really like automating and do my best to create it unnecessarily. That said, can @Brian_Wiest or anyone at Cireson think of a reason why I shouldn't create a Windows Service dependency on the System Center Data Access Service (omsdk) for the Cireson CacheBuilder service (CacheBuilder)?

    But it should be noted that out of my own curiosity, I've just deployed this to a dev SCSM environment :)

    sc config CacheBuilder depend= omsdk
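
    For anyone trying the same thing, a small sketch of setting and then verifying the dependency, using the service names given above. Two things worth knowing: `depend=` replaces the service's whole dependency list rather than appending to it, and in Windows PowerShell `sc` is an alias for Set-Content, so the sketch calls `sc.exe` explicitly.

    ```powershell
    # Run from an elevated prompt on the Cache Builder server.
    # Show the current configuration (including DEPENDENCIES) before changing anything
    sc.exe qc CacheBuilder

    # Make CacheBuilder depend on the Data Access Service (overwrites existing dependencies)
    sc.exe config CacheBuilder depend= omsdk

    # Confirm what the service now depends on
    (Get-Service -Name CacheBuilder).ServicesDependedOn |
        Select-Object Name, DisplayName, Status
    ```

    If the first command already lists dependencies, include them in the `depend=` value separated by `/` so they are not lost.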

  • chelsea_alonso Customer IT Monkey ✭
    edited December 2016
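    I had that issue yesterday and threw this in a .NET PowerShell activity in a once-an-hour monitor:
    # UNC path to the Cache Builder log; single quotes keep the "$" in "g$" literal
    $cacheLogFile = '\\serverName\g$\InetPub\PortalPath\bin\Logs\CacheBuilder.log'
    $cacheLog = Get-Content $cacheLogFile
    $cacheService = Get-Service -Name "Cireson Cache Builder" -ComputerName "serverName"
    $date = Get-Date -Format "yyyyMMdd"
    $restarted = $false
    # -like against an array returns the matching lines, so any match is truthy
    if ($cacheLog -like "*Unable to sync WorkItemCommand*") { # this was the error I needed to trigger the restart
        # -InputObject passes the service object itself, so the remote service fetched above is the one acted on
        Stop-Service -InputObject $cacheService
        Start-Sleep -Seconds 10
        # Rename the log so the old error line isn't matched again on the next run
        Rename-Item -Path $cacheLogFile -NewName "CacheBuilder$date.log"
        Start-Service -InputObject $cacheService
        $restarted = $true
    }
    # Status is cached on the object, so refresh it before checking
    $cacheService.Refresh()
    if ($cacheService.Status -ne "Running") {
        Start-Service -InputObject $cacheService
        $restarted = $true
    }
    $cacheService.Refresh()
    $status = $cacheService.Status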

    I also put a conditional link that when $restarted equals True it emails me saying that it restarted and what the current $status is.
  • Robert_Brentley Customer IT Monkey ✭

    Hi Adam,

    I always set the Cache Builder service on my Cireson servers to Automatic (Delayed Start) in services.msc; otherwise I tend to notice, in a virtual environment, that the Cache Builder service tries to start too early.

    It might be worth trying that.

    Also note that an upgrade usually flips it back to plain Automatic, without the delayed start.
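
    Since the setting tends to be reset by upgrades, it may be handy to re-apply it from a script rather than through services.msc. A minimal sketch, assuming the service name is CacheBuilder as mentioned earlier in the thread:

    ```powershell
    # Run from an elevated prompt on the Cache Builder server.
    # Set the service to Automatic (Delayed Start); classic sc.exe syntax puts a space after "start="
    sc.exe config CacheBuilder start= delayed-auto

    # Verify: check the START_TYPE line in the output
    sc.exe qc CacheBuilder
    ```

    Running this as a post-upgrade step would restore the delayed start without anyone having to remember to check it.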

  • Mike_Storms Customer IT Monkey ✭

    We have a very well documented procedure for quiescing our SCSM/Cireson environment whenever maintenance is being applied, whether it be patching or other work. This seems to really help reduce the number of hiccups in the system. I have been working on an orchestration to automate and validate the process.

    However, even with doing this we still have occasions where WIs, especially IRs, are in SCSM but not visible in the portal, or where grid views don't match. For the grid view mismatches I have had good luck just adding a note to the WI or activity, which seems to clear the issue; I assume it updates the last modified date and then Cireson refreshes it.

    However, missing items in the portal seem to be resolved only by restarting the Cache Builder... There has to be a better way... to force missing WIs into the portal... I have tried modifying them but that doesn't work... Restarting the Cache Builder during the day can lead to other issues on a busy system...

    Any thoughts? We are running 7.4..2012.11 but the issue has been around since I can remember...

    Also, we still have issues with workflow activities getting stuck in an incorrect status... We usually just run a PowerShell script to fix the status and keep things rolling... but these little nuances take time, give the product a bad user experience, and call its integrity into question...

    Anything folks have done to limit, fix, or even auto-detect and fix these would be awesome to share...

