Portal will not load in browser over https
Hi all! So I recently got my production environment up and running, and all was smooth sailing until recently. I spun up the portal, and all seemed good, but now I can't hit the console or website through the NLB name (and I was able to in the beginning).
Setup: one workflow server, three management servers behind an NLB name of servicedesk
I have IIS set on each management server so there's a binding for both servicedesk and servicedesk.company.com on 443, using an ADCS internal cert with both names in the SAN listing.
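For anyone wanting to double-check the SAN part of a setup like this, here is a minimal Python sketch. It assumes the certificate has already been fetched as a dict shaped like the output of `ssl.SSLSocket.getpeercert()`, which is only populated when the certificate validated (so the internal ADCS root would need to be trusted by the client). The function name `missing_san_names` is made up for illustration.

```python
def missing_san_names(cert, required):
    """Return the required DNS names that are absent from the cert's SAN list.

    `cert` is assumed to look like ssl.SSLSocket.getpeercert() output, e.g.
    {"subjectAltName": (("DNS", "servicedesk"), ...)}.
    """
    present = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    # Compare case-insensitively, since DNS names are not case sensitive.
    return [name for name in required if name.lower() not in present]
```

An empty list back means both the short name and the FQDN are covered by the certificate.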
What I can do:
- I can connect to the console by direct server name (all four servers)
- I can connect to the portal through http only through the direct server name
What I can't do:
- Can't connect to the portal using servicedesk.company.com or servicedesk
- Can't connect to the console with servicedesk.company.com
I get an error when I try to connect to the console, saying the OMSDK service isn't running, but it definitely is. I've restarted all the OMSDK and OMCFG services across all the boxes, as well as the cachebuilder service on the workflow server. Still no dice. Here's the error output I get:
Date: 8/20/2020 8:52:14 AM
Application: Service Manager 2019 Console
Application Version: 10.19.1035.0
Severity: Error
Message: Failed to connect to server 'servicedesk.company.com'
Microsoft.EnterpriseManagement.Common.ServiceNotRunningException: The Data Access service is either not running or not yet initialized. Check the event log for more information. ---> System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://servicedesk.company.com:5724/DispatcherService. The connection attempt lasted for a time span of 00:00:21.0010215. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.100.24.149:5724. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.100.24.149:5724
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
--- End of inner exception stack trace ---
Server stack trace:
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.BufferedConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.LayeredChannel`1.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.CallOpenOnce.System.ServiceModel.Channels.ServiceChannel.ICallOnce.Call(ServiceChannel channel, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.CallOnceManager.CallOnce(TimeSpan timeout, CallOnceManager cascade)
at System.ServiceModel.Channels.ServiceChannel.EnsureOpened(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Microsoft.EnterpriseManagement.Common.Internal.IDispatcherService.Connect(SdkClientConnectionOptions connectionOptions)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.Initialize(EnterpriseManagementConnectionSettings connectionSettings, SdkChannelObject`1 channelObjectDispatcherService)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.CreateEndpoint[T](EnterpriseManagementConnectionSettings connectionSettings, SdkChannelObject`1 channelObjectDispatcherService)
--- End of inner exception stack trace ---
at Microsoft.EnterpriseManagement.Common.Internal.ExceptionHandlers.HandleChannelExceptions(Exception ex)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.CreateEndpoint[T](EnterpriseManagementConnectionSettings connectionSettings, SdkChannelObject`1 channelObjectDispatcherService)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.ConstructEnterpriseManagementGroupInternal[T,P](EnterpriseManagementConnectionSettings connectionSettings, ClientDataAccessCore clientCallback)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.RetrieveEnterpriseManagementGroupInternal[T,P](EnterpriseManagementConnectionSettings connectionSettings, ClientDataAccessCore callbackDispatcherService)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.Connect[T,P](EnterpriseManagementConnectionSettings connectionSettings, ClientDataAccessCore callbackDispatcherService)
at Microsoft.EnterpriseManagement.ServiceManagementGroup.InternalInitialize(EnterpriseManagementConnectionSettings connectionSettings, EnterpriseManagementGroupInternal internals)
at Microsoft.EnterpriseManagement.UI.SdkDataAccess.ManagementGroupSessionManager.Connect(String server)
at Microsoft.EnterpriseManagement.ServiceManager.UI.Console.Credentials.ManagementGroupConnection.TryConnectToManagementGroupJob(Object sender, ConsoleJobEventArgs args)
System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://servicedesk.company.com:5724/DispatcherService. The connection attempt lasted for a time span of 00:00:21.0010215. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.100.24.149:5724. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.100.24.149:5724
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
--- End of inner exception stack trace ---
Server stack trace:
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.BufferedConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.LayeredChannel`1.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.CallOpenOnce.System.ServiceModel.Channels.ServiceChannel.ICallOnce.Call(ServiceChannel channel, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.CallOnceManager.CallOnce(TimeSpan timeout, CallOnceManager cascade)
at System.ServiceModel.Channels.ServiceChannel.EnsureOpened(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Microsoft.EnterpriseManagement.Common.Internal.IDispatcherService.Connect(SdkClientConnectionOptions connectionOptions)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.Initialize(EnterpriseManagementConnectionSettings connectionSettings, SdkChannelObject`1 channelObjectDispatcherService)
at Microsoft.EnterpriseManagement.Common.Internal.SdkDataLayerProxyCore.CreateEndpoint[T](EnterpriseManagementConnectionSettings connectionSettings, SdkChannelObject`1 channelObjectDispatcherService)
System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.100.24.149:5724
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
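The trace above boils down to a TCP timeout (error 10060) on port 5724, so the failure happens before the SDK service is ever reached. A rough way to reproduce just that part outside the console is a plain TCP connect test. This is only a sketch; the helper names `can_connect` and `probe` are made up for illustration, and 5724 is the port shown in the error.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


def probe(hosts, port=5724, timeout=5.0):
    """Map each host (name or IP) to whether the port answered."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example call (host names here are placeholders, not real servers):
# probe(["servicedesk.company.com", "mgtserver1", "10.100.24.149"])
```

If the direct server names answer and the NLB name or cluster IP does not, the problem sits in front of the service (NLB, switch, or addressing) rather than in the service itself.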
Best Answer
T_R_Ash_McCan_1 Member IT Monkey ✭
Ok, classic case of pulling every lever except for the one you need to pull. I finally pulled the right lever today. I changed the cluster mode from unicast to multicast (even though I was in an acceptable unicast format), and poof - everything works. Everything pings, everything balances, web page loads and console loads by hostname in various states of NLB server availability. Brian, thank you for giving this a shot with me!
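For context on what the unicast to multicast switch changes at layer 2: as far as I know, Microsoft documents the default NLB cluster MAC as derived from the cluster IP, with a different prefix per mode. In unicast mode that MAC replaces the adapter's own MAC on every host, while in multicast mode each adapter keeps its own MAC and the cluster MAC is added alongside it. The sketch below only illustrates the derivation. It assumes 10.100.24.149 (the address in the error output) is the cluster IP, and it does not claim to explain why the change fixed this particular environment.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip, mode):
    """Derive the default NLB cluster MAC for an IPv4 cluster address.

    Prefixes follow the documented defaults: 02-BF (unicast),
    03-BF (multicast), 01-00-5E-7F plus the last two octets (IGMP multicast).
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # four raw bytes
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


print(nlb_cluster_mac("10.100.24.149", "unicast"))    # 02-BF-0A-64-18-95
print(nlb_cluster_mac("10.100.24.149", "multicast"))  # 03-BF-0A-64-18-95
```

Comparing these values against what the switch or `arp -a` shows for the cluster IP is one way to confirm which mode is actually in effect.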
Answers
How is your NLB configured?
Do you have a VIP configured for the console port?
Updating the VIP port should address the console issue.
For the portal connection what error do you get on 443? Are you talking about the Microsoft portal or the Cireson Portal?
Hi Brian, thanks for your assistance. NLB is configured with Windows NLB, all three management servers sit behind the servicedesk DNS name and its IP address, and each server is configured with a management NIC and an NLB NIC. The NLB NIC address of each server is what is used in the NLB cluster.
Since you are running dual NICs on the MGT hosts, is the secondary NIC for the NLB registered in DNS to the MGT server name? It needs to be so the SPNs match up to the server name.
This doesn't sound familiar, so I am going to assume this has not been done. This is outside my comfort zone so I'm learning a lot of new things with IIS. I will need to research this and see how this all works. If you know of a good blog/guide, I'd gladly take a look at it as well.
This has nothing to do with IIS. This is AD and DNS, so you are going to want to review the setspn commands. But I am willing to bet they were configured properly on the server AD object, as you can connect directly to the host console connection.
In CMD prompt.
setspn -l dbgmscmmgt1
Expect to see
MSOMSdkSvc/dbgmscmmgt1.company.com
MSOMSdkSvc/dbgmscmmgt1
So a Ping -a (ip of dbgmscmmgt1) should return the dbgmscmmgt1 name.
That will show the "trust" route. IP->Name->SPN
For the NLB NIC, run Ping -a (ip) as well. If that IP does not resolve in reverse DNS to the name of the server, your trust route is not there, so the SPN will fail.
Disclaimer: while I am a Domain Admin, I am not fully versed in AD/DNS, so I know someone will knock my verbiage, but this is how I work through my authentication issues in application farms.
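The Ping -a check described above can also be scripted when there are several NICs and servers to go through. This is a minimal Python sketch, assuming the machine running it uses the same DNS servers as the clients. The helper names are made up for illustration, and the comparison ignores case and domain suffix so that dbgmscmmgt1 and dbgmscmmgt1.company.com count as the same host.

```python
import socket


def reverse_name(ip):
    """Return the name reverse DNS gives for ip, or None if nothing resolves."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def same_host(name_a, name_b):
    """Compare two host names, ignoring case and any domain suffix."""
    return name_a.split(".")[0].lower() == name_b.split(".")[0].lower()


def reverse_matches(ip, expected_host, lookup=reverse_name):
    """True if the reverse lookup of ip lands on expected_host."""
    found = lookup(ip)
    return found is not None and same_host(found, expected_host)


# Example call (the IP is a placeholder for a server's NLB NIC address):
# reverse_matches("10.0.0.10", "dbgmscmmgt1")
```

A False result for the NLB NIC address lines up with the missing IP->Name->SPN route described above.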
Ok, I think this is starting to come together a little bit. When I first created these servers with the dual NICs, I ran into an issue where both NICs were registering in DNS, which was wreaking havoc with my NLB and management capabilities for obvious reasons. So, I configured the NLB NIC not to register in DNS and configured its DNS to be blank, and that solved my NLB and management issues. I'll bet money this is all related. If I need to modify what I've done on that end, that's cool, and I just wanted to get all that out in the open in case it's going to mess with the SPN configuration.
You might want to consider separating out the NLB into a remote box so the servers are single NIC.
I haven't seen the Windows NLB Clustering features, but the docs showed there was a remote option.
We currently use a Kemp NLB and it works like a charm for the Console connections on 6 Management servers and 5 Portal servers. Plus offloading the SSL traffic.
Yeah I need to poke around a bit, I feel I'm so close to getting this up and running. The funny thing is when I first configured all these servers, it actually worked, for all of a few hours, and then stopped. Like I could actually connect to the console and website as the NLB name. As far as I know, I haven't changed anything since then.
Oh one other thing - while it only has the one server, my dev environment is set up the exact same way, and I am able to make it work, so I'm assuming it's just the way it's trying to pass traffic around the horn.
Do you use the free Kemp NLB solution?
We have a physical hardware pair in cluster mode, as it supports much more than just the SCSM solution.
Gotcha, that's pretty cool. So I've been troubleshooting this more, and here's what I've found:
I've stopped two of the servers, so right now there's only one server responding in the cluster. I'm taking DNS out of the equation for now, and just focusing on getting things to respond through IP addresses.
What I've found is that the Cireson web portal works through the MGMT NIC IP address, but not through the NLB NIC. I'm convinced I've messed up the NLB config, but I'm not sure how, or how to fix it yet.
So, each server has two NICs, a MGMT NIC with the MGMT IP, and the NLB NIC, with the NLB IP and the Cluster IP.
So I can hit the portal using the IP address of the MGMT IP of all three servers:
https://[MGMT NIC IP on MGTServer1]/View/94ecd540-714b-49dc-82d1-0b34bf11888f
https://[MGMT NIC IP on MGTServer2]/View/94ecd540-714b-49dc-82d1-0b34bf11888f
https://[MGMT NIC IP on MGTServer3]/View/94ecd540-714b-49dc-82d1-0b34bf11888f
And of course the NLB name (servicedesk) isn't going to work, because it's the cluster name for the NLB IPs on the NLB NICs.
As I stated earlier, I'm no NLB expert by any means, so I'm trying to figure out what the config for this should look like.
In my dev environment, which is set up the exact same way except with only one server in the NLB cluster, I can hit the portal by hostname but not by any of the 3 IPs! Weird stuff.
I figured I'd let you know my findings so far. 👍️