Error creating an on-premise multi-machine Service Fabric Cluster - azure-service-fabric

In order to test and evaluate SF for production use, I created one (single-machine) test cluster on a production machine with three nodes, which worked fine. However, I failed to create a multi-machine cluster with three nodes.
I followed these instructions: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-for-windows-server/
All machines:
are on the same (secure) network with the following IPs: 10.0.10.12, 10.0.11.12, 10.0.12.12.
are virtual and were created freshly from the same image.
are not part of a domain. The setup is done with the administrator account with the same password on all machines.
run Windows Server 2012 R2 with PowerShell 4.0.
have disabled firewalls (public and private).
This is the clusterConfig.json:
{
  "name": "SampleCluster",
  "clusterManifestVersion": "1.0.0",
  "apiVersion": "2015-01-01-alpha",
  "nodes": [
    {
      "nodeName": "vm1",
      "iPAddress": "10.0.10.12",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/fd1",
      "upgradeDomain": "UD0"
    },
    {
      "nodeName": "vm2",
      "iPAddress": "10.0.11.12",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/fd2",
      "upgradeDomain": "UD1"
    },
    {
      "nodeName": "vm3",
      "iPAddress": "10.0.12.12",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/fd3",
      "upgradeDomain": "UD2"
    }
  ],
  "diagnosticsFileShare": {
    "etlReadIntervalInMinutes": "5",
    "uploadIntervalInMinutes": "10",
    "dataDeletionAgeInDays": "7",
    "etwStoreConnectionString": "file:c:\\ProgramData\\SF\\FileshareETW",
    "crashDumpConnectionString": "file:c:\\ProgramData\\SF\\FileshareCrashDump",
    "perfCtrConnectionString": "file:c:\\ProgramData\\SF\\FilesharePerfCtr"
  },
  "properties": {
    "reliabilityLevel": "Bronze",
    "nodeTypes": [
      {
        "name": "NodeType0",
        "clientConnectionEndpointPort": "19000",
        "clusterConnectionEndpoint": "19001",
        "httpGatewayEndpointPort": "19080",
        "applicationPorts": {
          "startPort": "20001",
          "endPort": "20031"
        },
        "ephemeralPorts": {
          "startPort": "20032",
          "endPort": "20062"
        },
        "isPrimary": true
      }
    ],
    "fabricSettings": [
      {
        "name": "Setup",
        "parameters": [
          {
            "name": "FabricDataRoot",
            "value": "C:\\ProgramData\\SF"
          },
          {
            "name": "FabricLogRoot",
            "value": "C:\\ProgramData\\SF\\Log"
          }
        ]
      }
    ]
  }
}
When I start the cluster setup from one of the machines (it was 10.0.10.12), this is written to the PowerShell console:
Cab extracted.
Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder doesn't exist. Creating trace folder: C:\copy\DeploymentTraces
Verifying remote procedure call access against cluster machines.
Processing and validating cluster config.
Creating FabricSettingsMetadata from C:\copy\ServiceFabricPackage\bin\Fabric\Fabric.Code\Configurations.csv
Configuring nodes.
Copying installer & package to all machines.
Configuring machine 10.0.10.12
Configuring machine 10.0.11.12
Here the setup remains for a few minutes. Then a timeout occurs:
Timed out waiting for Installer Service to start for machine 10.0.11.12.
CreateCluster Error: System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'.
---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
Errors occurred during cluster creation.
CreateCluster Exception 0: System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
--- End of inner exception stack trace ---
---> (Inner Exception #0) System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()<---
When I check the services on that machine (10.0.11.12), I find the Service Fabric Installer Service in the list, but it is not running. I also find an error in the Windows Event Log, which is in line with the error message above:
The Service Fabric Installer Service service failed to start due to the following error:
The system cannot find the file specified.
On the particular machine, I located the following log file: C:\ProgramData\SF\Log\traces\FabricInstallerService_5.1.150.9590_131111077992093094.trace. It contains this:
2016-06-22 22:23:19.224,Info ,708,General.FabricInstallerServiceImpl,FabricInstallerService starting ...
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bcf0,Attempting to attach child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bdf0,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bdf0,Attempting to attach child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480c9b0,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480c9b0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b44811270,Attempting to attach child AsyncOperation 3b44811630.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b44811630,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bdf0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.FabricInstallerServiceImpl,FabricUpgradeManager open returned S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bcf0,Detaching child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b4480bdf0,Detaching child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade started with FabricDataRoot:C:\ProgramData\SF, FabricLogRoot:C:\ProgramData\SF\Log, FabricCodePath:C:\Program Files\Microsoft Service Fabric\bin\fabric\fabric.code, FabricRoot:C:\Program Files\Microsoft Service Fabric, TargetInformationFilePath:C:\ProgramData\SF\TargetInformation.xml, TargetInformationDescription:TargetInformationFileDescription { CurrentInstallation = WindowsFabricDeploymentDescription { IsValid = true, InstanceId = 0, MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = FabricSetup.exe, UndoUpgradeEntryPointExeParameters = /operation:Uninstall , }TargetInstallation = WindowsFabricDeploymentDescription { IsValid = false, InstanceId = , MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = , UndoUpgradeEntryPointExeParameters = , }}
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Stopping fabric host
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Error 0x80070424 while waiting for fabric host service to stop.
2016-06-22 22:23:19.224,Error ,1652,FabricInstallerService.FabricUpgradeManager,Unable to stop fabric host service; error
2016-06-22 22:23:19.224,Error ,1652,FabricInstallerService.FabricUpgradeManager,Error E_FAIL while trying to stop fabric host service
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation#3b44811630,FinishComplete called with E_FAIL
2016-06-22 22:23:19.224,Warning ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade finished with error E_FAIL
2016-06-22 22:23:19.224,Info ,1636,General.FabricInstallerServiceImpl,service stopping (shutdown = false) ...
2016-06-22 22:23:19.224,Info ,1636,General.FabricInstallerServiceImpl,Stop FabricUpgradeManager called
2016-06-22 22:23:19.240,Info ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager, with timeout 5:00.000
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480be00,Attempting to attach child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c4d0,Calling OnStart
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c4d0,Attempting to attach child AsyncOperation 3b4480c5d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c5d0,Calling OnStart
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c5d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c4d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager returned S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480be00,Detaching child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation#3b4480c4d0,Detaching child AsyncOperation 3b4480c5d0.
This is the point where I am stuck. My thoughts are:
Communication and accessibility between the machines seem to be OK, since the setup files were copied, and the setup process started.
The Service Fabric Installer Service seems to play an important role here.
It seems that the Service Fabric Installer Service works properly on the machine where I started the setup process, but fails on the remote machines.
Any ideas? Thanks.

I'm not exactly sure about your scenario, as I've only had experience installing clusters using a Windows machine group or a gMSA account, and it has always been a rather painful experience. But if you persist, you'll get there in the end!
You mention that the machines are on a secured network but not part of a domain? Typically in Active Directory environments, SF runs under the NETWORK SERVICE account, so you could try adding all the machines to the local Administrators group of each machine.
I know that with a gMSA account, I had to add it to the local Administrators group on each machine and grant it the "Log on as a service" right.
Other than that, I also suggest you check the event log, in particular the Administrators and Security Audit logs.

I hit a similar issue. It turned out that having the Service Fabric SDK/Service Fabric runtime installed via the web installer was breaking my install. I uninstalled those and it worked.
Also, I had issues running the script from a machine that wasn't going to be a node in the cluster. (I'd drop a comment but I don't have enough points)

Related

Service Fabric Application fails to find the managed identity endpoint

The Service Fabric cluster exists, and the applications exist and are running. The user-assigned managed identity exists in the same resource group as the cluster. NOTE: I do not know how to verify whether it is assigned to the cluster or not.
Code is trying to create a Storage queues client using the identity and I get the error below, which I think means that the fabric:/System/ManagedIdentityTokenService is not running. NOTE: I do not know how to verify whether the service is running or not.
NOTE: Very similar code worked in other clusters.
NOTE: the underlying VMSS does have the managed identity associated to it.
NOTE: I am using Storage SDK 12. The C# code does the following:
// Requires: using Azure.Identity; using Azure.Storage.Queues;
ManagedIdentityCredential cred = new ManagedIdentityCredential(clientId: "XYZ...");
string queueEndpoint = string.Format("https://{0}.queue.core.windows.net/{1}", accountName, queueName);
QueueClient qc = new QueueClient(new Uri(queueEndpoint), cred);
var response = await qc.CreateIfNotExistsAsync(); // This one throws the error below.
Any guidance to fix this issue would be appreciated.
Error:
Trying to create a queue (using MSI) failed with exception Azure.Identity.CredentialUnavailableException: No managed identity endpoint found.
at Azure.Identity.ExtendedAccessToken.GetTokenOrThrow()
at Azure.Identity.ManagedIdentityCredential.d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.d__11.MoveNext()

@LocatorApplication starts and then immediately stops

Everything seems to be created fine but once it finishes initializing everything it just stops.
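import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.gemfire.config.annotation.LocatorApplication;

@SpringBootApplication
@LocatorApplication
public class ServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(ServerApplication.class, args);
    }
}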
Log:
2020-08-03 10:59:18.250 INFO 7712 --- [ main] o.a.g.d.i.InternalLocator : Locator started on 10.25.209.139[8081]
2020-08-03 10:59:18.250 INFO 7712 --- [ main] o.a.g.d.i.InternalLocator : Starting server location for Distribution Locator on LB183054.dmn1.fmr.com[8081]
2020-08-03 10:59:18.383 INFO 7712 --- [ main] c.f.g.l.LocatorSpringApplication : Started LocatorSpringApplication in 8.496 seconds (JVM running for 9.318)
2020-08-03 10:59:18.385 INFO 7712 --- [m shutdown hook] o.a.g.d.i.InternalDistributedSystem : VM is exiting - shutting down distributed system
2020-08-03 10:59:18.395 INFO 7712 --- [m shutdown hook] o.a.g.i.c.GemFireCacheImpl : GemFireCache[id = 1329087972; isClosing = true; isShutDownAll = false; created = Mon Aug 03 10:59:15 EDT 2020; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
2020-08-03 10:59:18.416 INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager : Shutting down DistributionManager 10.25.209.139(locator1:7712:locator)<ec><v0>:41000.
2020-08-03 10:59:18.517 INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager : Now closing distribution for 10.25.209.139(locator1:7712:locator)<ec><v0>:41000
2020-08-03 10:59:18.518 INFO 7712 --- [m shutdown hook] o.a.g.d.i.m.g.Services : Stopping membership services
2020-08-03 10:59:18.518 INFO 7712 --- [ip View Creator] o.a.g.d.i.m.g.Services : View Creator thread is exiting
2020-08-03 10:59:18.520 INFO 7712 --- [Server thread 1] o.a.g.d.i.m.g.Services : GMSHealthMonitor server thread exiting
2020-08-03 10:59:18.536 INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager : DistributionManager stopped in 120ms.
2020-08-03 10:59:18.537 INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager : Marking DistributionManager 10.25.209.139(locator1:7712:locator)<ec><v0>:41000 as closed.
Yes, this is the expected behavior, OOTB.
Most Apache Geode processes (clients (i.e. ClientCache), Locators, Managers and "peer" Cache nodes/members of a cluster/distributed system) only create daemon Threads (i.e. non-blocking Threads). Therefore, the Apache Geode JVM process will start up, initialize itself and then shut down immediately.
Only an Apache Geode CacheServer process (a "peer" Cache that has a CacheServer component to listen for client connections), starts and continues to run. That is because the ServerSocket used to listen for client Socket connections is created on a non-daemon Thread (i.e. blocking Thread), which prevents the JVM process from shutting down. Otherwise, a CacheServer would fall straight through as well.
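To see the daemon-thread behavior in isolation, here is a minimal sketch that uses only the JDK and no Geode classes. The names DaemonThreadDemo and startWorker are made up for illustration. The worker parks forever, yet the JVM still exits once main returns, because a daemon thread does not keep the process alive.

```java
import java.util.concurrent.CountDownLatch;

public class DaemonThreadDemo {

    // Starts a worker thread that parks forever and returns it.
    // The 'daemon' flag decides whether the thread keeps the JVM alive.
    public static Thread startWorker(boolean daemon) {
        Thread worker = new Thread(() -> {
            try {
                // This latch is never counted down, so the thread never finishes on its own.
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, daemon ? "daemon-worker" : "user-worker");
        worker.setDaemon(daemon); // must be set before start()
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        Thread worker = startWorker(true);
        System.out.println(worker.getName() + " daemon=" + worker.isDaemon());
        // main returns here. Only a daemon thread is left, so the JVM exits.
        // With startWorker(false) the process would keep running instead.
    }
}
```

Switching the argument to false is the plain-JDK equivalent of what the CacheServer's non-daemon listener thread does for a server process.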
You might be thinking, well, how does Gfsh prevent Locators (i.e. using the start locator command) and "servers" (i.e. using the start server command) from shutting down?
NOTE: By default, Gfsh creates a CacheServer instance when starting a GemFire/Geode server using the start server command. The CacheServer component of the "server" can be disabled by specifying the --disable-default-server option to the start server command. In this case, this "server" will not be able to serve clients. Still the peer node/member will continue to run, but not without extra help. See here for more details on the start server Gfsh command.
So, how does Gfsh prevent the processes from falling through?
Under-the-hood, Gfsh uses the LocatorLauncher and ServerLauncher classes to configure and fork the JVM processes to launch Locators and servers, respectively.
By way of example, here is Gfsh's start locator command using the LocatorLauncher class. Technically, it uses the configuration from the LocatorLauncher class instance to construct (and specifically, here) the java command-line used to fork and launch (and specifically, here) a separate JVM process.
However, the key here is the specific "command" passed to the LocatorLauncher class when starting the Locator, which is the START command (here).
In the LocatorLauncher class, we see that the START command does the following: from the main method, to the run method, it starts the Locator, then calls waitOnLocator (with implementation).
Without the wait, the Locator would fall straight through as you are experiencing.
You can simulate the same effect (i.e. "falling straight through") using the following code, which uses the Apache Geode API to configure and launch a Locator (in-process).
import org.apache.geode.distributed.LocatorLauncher;

public class ApacheGeodeLocatorApplication {

    public static void main(String[] args) {
        LocatorLauncher locatorLauncher = new LocatorLauncher.Builder()
            .set("jmx-manager", "true")
            .set("jmx-manager-port", "0")
            .set("jmx-manager-start", "true")
            .setMemberName("ApacheGeodeBasedLocator")
            .setPort(0)
            .build();
        locatorLauncher.start();
        //locatorLauncher.waitOnLocator();
    }
}
This simple little program will fall straight through. However, if you uncomment locatorLauncher.waitOnLocator(), then the JVM process will block.
This is not unlike what SDG's LocatorFactoryBean class (see source) is doing actually. It, too, uses the LocatorLauncher class to configure and bootstrap a Locator in-process. The LocatorFactoryBean is the class used to configure and bootstrap a Locator when declaring the SDG @LocatorApplication annotation on your @SpringBootApplication class.
However, I do think there is room for improvement, here. Therefore, I have filed DATAGEODE-361.
In the meantime, and as a workaround, you can achieve the same effect of a blocking Locator by having a look at the Smoke Test for the same in Spring Boot for Apache Geode (SBDG) project. See here.
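For illustration only, one generic way to keep a JVM from falling through is to block the main (non-daemon) thread on a latch. This is a plain-JDK sketch and not necessarily what the SBDG smoke test does; the KeepAlive class and its methods are hypothetical names, and the commented-out SpringApplication.run(..) line marks where the application from the question would be started.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class KeepAlive {

    private final CountDownLatch stopSignal = new CountDownLatch(1);

    // Lets a blocked caller continue, e.g. when an orderly stop is requested.
    public void release() {
        stopSignal.countDown();
    }

    // Blocks the calling thread until release() is called.
    public void await() throws InterruptedException {
        stopSignal.await();
    }

    // Bounded variant: returns true if release() was called within the timeout.
    public boolean awaitFor(long millis) {
        try {
            return stopSignal.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        KeepAlive keepAlive = new KeepAlive();
        // SpringApplication.run(ServerApplication.class, args); // assumed: start the Locator here
        keepAlive.await(); // main is a non-daemon thread, so the JVM stays up while it waits
    }
}
```

The process then runs until it is killed or something calls release().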
However, after DATAGEODE-361 is complete, the extra logic preventing the Locator JVM process from shutting down will no longer be necessary.

How to configure IdentityServer4 Client for production behind a Reverse Proxy to address "Correlation failed. Unknown location" Error

IdentityServer4 client through an IIS reverse proxy server getting “Exception: Correlation failed. Unknown location”.
We have a .NET Core MVC application that authenticates with our Identity server 4 application. This is working well.
However we need to deploy this into an environment where the application server and the Identity server have to be accessed through a reverse proxy server.
In our client we change the OIDC Host of the Authority option to the reverse proxy server of our Identity server.
In Identity server we change the Host of the redirect Uri for this client to point to the reverse proxy for our client.
We have configured the client to pass the redirect Uri with the hostname of the clients reverse proxy server.
With this configuration, when an unauthorized user accesses the client, the user is redirected to log in on the Identity server.
The login is successful but the user gets the following error when returning to the client application: “Exception: Correlation failed. Unknown location”.
However, the user has been successfully authenticated and can access the client application.
.AddOpenIdConnect("oidc", options =>
{
    options.SignInScheme = "Cookies";
    options.Authority = Configuration.GetSection("Uris").GetSection("IdentityServer").Value;
    options.RequireHttpsMetadata = false;
    options.ClientId = "mvc";
    options.ClientSecret = "secret";
    options.ResponseType = "code id_token";
    options.SaveTokens = true;
    options.Scope.Add("api1");
    options.Scope.Add("openid");
    options.Scope.Add("profile");
    options.Scope.Add("offline_access");
    options.ClaimActions.MapJsonKey("website", "website");
    // Override the redirect URI so it carries the reverse proxy's host name.
    options.Events.OnRedirectToIdentityProvider = async n =>
    {
        n.ProtocolMessage.RedirectUri = Configuration.GetSection("Uris").GetSection("RedirectUri").Value;
        await Task.FromResult(0);
    };
});
Error returned in browser:
An unhandled exception occurred while processing the request.
Exception: Correlation failed.
Unknown location
Exception: An error was encountered while handling the remote login.
Microsoft.AspNetCore.Authentication.RemoteAuthenticationHandler+d__12.MoveNext()
Error Logging Information:
Application started. Press Ctrl+C to shut down.
warn: Microsoft.AspNetCore.Authentication.OpenIdConnect.OpenIdConnectHandler[15]
'.AspNetCore.Correlation.oidc.pz2cS4-GHvVSgHgHOJQQTWa8dL_CDKjEBAGqA4Sg-RY' cookie not found.
fail: Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware[0]
An unhandled exception has occurred while executing the request
System.Exception: An error was encountered while handling the remote login. ---> System.Exception: Correlation failed.
--- End of inner exception stack trace ---
at Microsoft.AspNetCore.Authentication.RemoteAuthenticationHandler`1.d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at IdentityServer4.Hosting.FederatedSignOut.AuthenticationRequestHandlerWrapper.d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware.d__7.MoveNext()
When '.AddOpenIdConnect' is called, an OpenIdConnectHandler gets registered through ASP.NET Core's dependency injection system and will get invoked with the options that you configure in the lambda.
This OpenIdConnectHandler inherits from the RemoteAuthenticationHandler which is where the method to generate and validate correlation ids lives. A correlation id is a random string that's set as the name of a cookie, and is set and verified on your browser to ensure you are using the same user-agent that initiated the login.
When the correlation cookie is set but not received, the OpenIdConnectHandler will fail the request. This failed request ultimately gets handled by the RemoteAuthenticationHandler calling the 'Events.OnRemoteFailure' delegate on the OpenIdConnectOptions (the RemoteAuthenticationHandler knows about the events because the OpenIdConnectOptions is actually a derived class of RemoteAuthenticationOptions).
To handle this authentication failure more gracefully, you can detect it in the 'OnRemoteFailure' event on the OpenIdConnectOptions and, for example, show a session-expired page or automatically re-authenticate.
TLDR: ASP.NET Core sets a correlation cookie with a fairly short expiration time, and authentication fails if too much time passes between the redirect to login and the code exchange. To handle it gracefully, set the 'OnRemoteFailure' handler on the options' Events property.
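As a rough illustration of the mechanism described above, here is a small simulation of the correlation check. It is not ASP.NET Core's implementation: the CorrelationSketch class, its in-memory "cookie jar" and the 15-minute lifetime are assumptions made for this sketch. Only the cookie name prefix is taken from the log in the question.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class CorrelationSketch {

    // Prefix as seen in the question's log output.
    static final String COOKIE_PREFIX = ".AspNetCore.Correlation.oidc.";
    // Assumed lifetime for this sketch.
    static final Duration LIFETIME = Duration.ofMinutes(15);

    // Hypothetical stand-in for the browser's cookie jar: cookie name -> expiry.
    private final Map<String, Instant> cookieJar = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Challenge: create a random id and set a cookie named after it.
    // The returned id travels to the identity provider and back.
    public String beginChallenge(Instant now) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        cookieJar.put(COOKIE_PREFIX + id, now.plus(LIFETIME));
        return id;
    }

    // Callback: the id must match a cookie that is present and not yet expired.
    // The cookie is removed either way, so an id can only be used once.
    public boolean validate(String id, Instant now) {
        Instant expiry = cookieJar.remove(COOKIE_PREFIX + id);
        return expiry != null && now.isBefore(expiry);
    }

    public static void main(String[] args) {
        CorrelationSketch browser = new CorrelationSketch();
        Instant start = Instant.now();
        String id = browser.beginChallenge(start);
        System.out.println("quick login valid: " + browser.validate(id, start.plusSeconds(30)));
        System.out.println("unknown id valid: " + browser.validate("no-such-id", start));
    }
}
```

A missing cookie (for example one that the browser never sends back to the host handling the callback) and an expired cookie both end up in the same "Correlation failed" branch.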

Patch Orchestration Application issue - NodeAgentSFUtility.exe crashing

I'm working on getting POA going. The issue I'm running into is that as soon as the Node Agent NT Service (POSNodeSvc) starts, it runs NodeAgentSFUtility.exe, which then fails with the exception below and an HRESULT of 80071c43, which seems to mean "connection denied". No logs are present. Both run as SYSTEM. I'm running this on an on-prem cluster using Windows security. BTW, all the SF services for POA are showing green in the SF Explorer, so there seems to be room for better health reporting around this exe not running correctly.
Application: NodeAgentSFUtility.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Runtime.InteropServices.COMException
at System.Fabric.Interop.NativeClient+IFabricQueryClient9.EndGetApplicationList2(IFabricAsyncOperationContext)
at System.Fabric.FabricClient+QueryClient.GetApplicationListAsyncEndWrapper(IFabricAsyncOperationContext)
at System.Fabric.Interop.AsyncCallOutAdapter2`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Finish(IFabricAsyncOperationContext, Boolean)
Exception Info: System.Fabric.FabricConnectionDeniedException
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at Microsoft.ServiceFabric.PatchOrchestration.NodeAgentSFUtility.Helpers.CoordinatorServiceHelper+<GetApplicationDeployedStatusAsync>d__1.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at Microsoft.ServiceFabric.PatchOrchestration.NodeAgentSFUtility.CommandProcessor+<GetApplicationDeployedStatusAsync>d__10.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at Microsoft.ServiceFabric.PatchOrchestration.NodeAgentSFUtility.CommandProcessor+<ProcessArguments>d__5.MoveNext()
Exception Info: System.AggregateException
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean)
at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
at Microsoft.ServiceFabric.PatchOrchestration.NodeAgentSFUtility.Program.Main(System.String[])
I was able to make this work by adding the following to the cluster manifest:
"ClientIdentities": [
{
"Identity": "NT AUTHORITY\\SYSTEM",
"IsAdmin": true
}
]
Not quite sure if this really is needed? Can someone please confirm. There is no mention of this in the POA docs - https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-patch-orchestration-application
Thanks,
Hans
There appears to be a POA fix coming to address this. See link in above comment.

Failed to Create Development Service Fabric Cluster on Windows Server 2016 Standard

I am attempting to create a local development (unsecured) Service Fabric Cluster on Windows Server 2016 Standard. I have followed the instructions found in this article. However, I'm getting a rather interesting error and cannot find anything to help me resolve this.
FabricHostSvc was not installed by FabricInstallerSvc on machine localhost. FabricSetup may have failed.
CreateCluster Error: System.AggregateException: One or more errors occurred. ---> System.Fabric.FabricServiceNotFoundException: FabricHostSvc was not installed by FabricInstallerSvc on machine localhost. FabricSetup may have failed.
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.RunFabricServices(List`1 machines, FabricPackageType fabricPackageType)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
---> (Inner Exception #0) System.Fabric.FabricServiceNotFoundException: FabricHostSvc was not installed by FabricInstallerSvc on machine localhost. FabricSetup may have failed.
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
Cleaning up faulted installation.
FabricRoot not found in registry of target machine localhost.
Create Cluster failed. For more information please look at traces in FabricLogRoot.
Create Cluster failed with exception: System.AggregateException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred.
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManager.d__0.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Powershell.ClusterCmdletBase.NewCluster(String clusterConfigurationFilePath, String fabricPackageSourcePath, Boolean cleanupOnFailure)
---> (Inner Exception #0) System.AggregateException: One or more errors occurred.
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManager.d__0.MoveNext()<---
Has anyone encountered this error before and fixed it? How is this error resolved?
Side Note: After receiving this error I ran the CleanFabric PowerShell script and removed all the Service Fabric files from the server and tried running the installation again with the same error message.
In addition, there are no Service Fabric SDKs installed on the machine (the ones you'd use on a local development machine). The reason for this is due to the official prerequisites stated by Microsoft shown below.
Prerequisites for each machine that you want to add to the cluster:
1. A minimum of 16 GB of RAM is recommended.
2. A minimum of 40 GB of available disk space is recommended.
3. A 4 core or greater CPU is recommended.
4. Connectivity to a secure network or networks for all machines.
5. Windows Server 2012 R2 or Windows Server 2012 (you need to have KB2858668 installed).
6. .NET Framework 4.5.1 or higher, full install.
7. Windows PowerShell 3.0.
8. The RemoteRegistry service should be running on all the machines.
The cluster administrator deploying and configuring the cluster must have administrator privileges on each of the machines. You cannot install Service Fabric on a domain controller.
I cannot help but feel there is something obvious missing but I've followed the docs very closely so this is rather perplexing.
Service Fabric drivers have a signing issue which is preventing them from being installed on Windows Server 2016 and Windows 10 Anniversary edition. Please wait for the next version or try with version 5.2.