CacheBuilder keeps stopping and can't be restarted

Ludvig_Liljequist · October 2016

Hi,

For the past week or two we have been seeing some very strange behavior. The service stops and gives the error message

"Error connecting to management server: The underlying provider failed on Open."
when trying to start it. The SCSM Console works fine so the management server is not down or malfunctioning. We are also getting
"System.InvalidOperationException: Instance failure.

at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)"
from WebPortal. This would of course point to some database connectivity issue but the database seems fine as well.

It all gets resolved by shutting down all servers and performing the startup in the correct order (db-server, primary mgmt, other mgmt, datawarehouse, Cireson application server).

Any ideas? This is getting really frustrating as it occurs every day now.

Thank you all community members!

/ Ludvig

Brett_Moffett · October 2016

Hmmm. The only time I have seen an error like this, it was caused by permissions on the account that is running the cachebuilder service.
But if that were the case I would expect it to never work, and you seem to be able to get it working with a certain startup order.
The only other thing that might be happening is that resources on the SQL server are timing out. It might be worth checking the health of the servers when this is occurring: things like RAM utilization, disk IO, ping rate, etc. If there is a memory leak or a networking issue that creeps in over time, it would show the symptoms that you are reporting.

As always, if you have a support agreement with us, please log a support call and one of our friendly support team will be in touch.

I hope this answers your question.

Ludvig_Liljequist · October 2016

Thank you Brett! I have opened a support case as well but I have learned that there is so much knowledge in this community so I wanted to reach out here as well!

We have monitored the db as well and also performed a Cireson Health Check two weeks ago, so the technical setup should be more than fine... That is why this is so interesting and irritating.

/ L

Brett_Moffett · October 2016

@Ludvig_Liljequist I can understand why this is getting frustrating.
Do you have full logging turned on for the portal?
Open the Portal install folder and edit the Web.config file in a text editor like Notepad++.
Search for the following:

<root>
  <appender-ref ref="TraceAppender" />
  <appender-ref ref="RollingLogFileAppender" />
  <appender-ref ref="EventLogAppender" />
</root>

Edit the level value in that <root> section to read "ALL" and save the file.
Restart the website and the app pool.
This will give you a verbose log that might give you more detail.
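
For anyone who would rather script the change than edit by hand, here is a minimal sketch of the same edit. It rests on assumptions that are not confirmed in this thread: that the <root> section is a log4net-style block whose level is stored as <level value="..." />, and that the file has no XML namespaces. The sample XML and the function name are hypothetical. ElementTree drops comments and original formatting when it rewrites a document, so back up Web.config first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the logging part of Web.config.
SAMPLE = (
    '<configuration><log4net><root>'
    '<level value="ERROR" />'
    '<appender-ref ref="RollingLogFileAppender" />'
    '</root></log4net></configuration>'
)


def set_root_log_level(config_xml, level="ALL"):
    """Return config_xml with the <level> of every <root> section set to level."""
    tree = ET.fromstring(config_xml)
    changed = 0
    for root_el in tree.iter("root"):
        level_el = root_el.find("level")
        if level_el is None:
            # Assumption: a missing <level> element can simply be added.
            level_el = ET.SubElement(root_el, "level")
        level_el.set("value", level)
        changed += 1
    if changed == 0:
        raise ValueError("no <root> logging section found")
    return ET.tostring(tree, encoding="unicode")


if __name__ == "__main__":
    print(set_root_log_level(SAMPLE))
```

The website and app pool still need a restart afterwards, as described above.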

Ludvig_Liljequist · October 2016

Hi!

Yes, I have all the logging enabled and will now wait for the problem to re-appear in order to send more information to the support team.

/ L

Leigh_Kilday · October 2016

This may seem simple, but have you checked that the service account running the cache isn't locked out?

Ludvig_Liljequist · October 2016

Hi! Yes, I have, thank you

I actually think that I may have solved the issue. It seems like the application server tried to contact the database server randomly on multiple IPs and the listener was only active on one of them. I created a separate DNS A-record for only one of the addresses and it seems to work fine for now.
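
A rough way to confirm this kind of mismatch is to resolve the database host name to every address DNS returns and try a TCP connection to the SQL Server port on each one. The sketch below is only an illustration under assumptions: the host name sqlserver.example.local is hypothetical, and 1433 is just the SQL Server default port, which may not match your instance.

```python
import socket


def resolve_all(host, port):
    """Return the unique IP addresses that name resolution gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple starts with the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def can_connect(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_listeners(host, port, resolver=resolve_all, probe=can_connect):
    """Map every address returned for host to whether the port answers."""
    return {ip: probe(ip, port) for ip in resolver(host, port)}


if __name__ == "__main__":
    # Hypothetical host name; 1433 is the default SQL Server port.
    for ip, ok in check_listeners("sqlserver.example.local", 1433).items():
        print(f"{ip}: {'listening' if ok else 'NOT reachable'}")
```

If one address answers and another does not, a client that picks an address at random would fail intermittently, which matches the behavior described here.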

Ludvig_Liljequist · October 2016

And to continue: it is still fascinating that a reboot fixed the issue even though nothing else did - not even flushing the DNS cache - and that the environment had been working for a year without this ever happening. I suppose some MS or driver update changed some behavior.

CIRESON COMMUNITY WEB SITE