Azure Service Fabric node type instance count doubled on creating cluster using ARM

I'm experimenting with creating a new Service Fabric cluster using an ARM template, modifying the template to add certificates, etc. The cluster and all resources are created successfully, but I noticed that the number of node instances is initially 2x plus 1 of what I set. For example, if I set "vmInstanceCount" to 3, I see 7 instances being created.
If I just wait and let the deployment finish, 4 of the instances are deleted and the three I asked for remain. One problem is that the instances to keep are chosen at random, so the surviving names can be node_1, node_4, node_6, which is messy.
Here's my snippet of nodeType:
"nodeTypes": [
{
"name": "[variables('vmNodeType0Name')]",
"applicationPorts": {
"endPort": 30000,
"startPort": 20000
},
"clientConnectionEndpointPort": "[variables('fabricTcpGatewayPort')]",
"ephemeralPorts": {
"endPort": 65534,
"startPort": 49152
},
"httpGatewayEndpointPort": "[variables('fabricHttpGatewayPort')]",
"isPrimary": true,
"vmInstanceCount": "[variables('vmInstanceCount')]",
"reverseProxyEndpointPort": "[variables('reverseProxyEndpointPort')]",
"durabilityLevel": "Bronze"
}
]
...
"sku": {
"name": "[variables('vmssSkuName')]",
"capacity": "[variables('vmssSkuCapacity')]",
"tier": "Standard"
}

I was talking to Microsoft support earlier, and this behavior is actually a feature (scale set overprovisioning), as described here: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-design-overview#overprovisioning
I will close this question since I found the answer. However, I still have a concern about the naming, which I will raise with Microsoft.
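If the temporary extra instances (and the resulting gaps in the node names) are undesirable, overprovisioning can be disabled on the scale set itself. A minimal sketch against the Microsoft.Compute/virtualMachineScaleSets resource in the same template (only the relevant parts are shown; the variables are the ones already defined above):
"type": "Microsoft.Compute/virtualMachineScaleSets",
"name": "[variables('vmNodeType0Name')]",
...
"properties": {
  "overprovision": false,
  ...
},
"sku": {
  "name": "[variables('vmssSkuName')]",
  "capacity": "[variables('vmssSkuCapacity')]",
  "tier": "Standard"
}
With overprovisioning off, the scale set creates exactly "capacity" VMs, so the node names should stay sequential.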

Related

Pod identity on AKS cluster creation

Right now, it seems impossible to have user-assigned identities assigned via ARM templates (or Terraform) at cluster creation. I have already tried a lot of things, and updates work great after inserting the identity manually with:
az aks pod-identity add --cluster-name my-aks-cn --resource-group myrg --namespace myns --name example-pod-identity --identity-resource-id /subscriptions/......
But I want this done in one step, with the deployment, so I need the pod identities inserted into the cluster automatically. I also tried running the command using deploymentScripts, but deployment scripts are not ready to use the preview AKS extension.
My config looks like this:
{
  "type": "Microsoft.ContainerService/managedClusters",
  "apiVersion": "2021-02-01",
  "name": "[variables('cluster_name')]",
  "location": "[variables('location')]",
  "dependsOn": [
    "[resourceId('Microsoft.Network/virtualNetworks', variables('vnet_name'))]"
  ],
  "properties": {
    ....
    "podIdentityProfile": {
      "allowNetworkPluginKubenet": null,
      "enabled": true,
      "userAssignedIdentities": [
        {
          "identity": {
            "clientId": "[reference(resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity'), '2018-11-30').clientId]",
            "objectId": "[reference(resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity'), '2018-11-30').principalId]",
            "resourceId": "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity')]"
          },
          "name": "managed-indentity",
          "namespace": "myns"
        }
      ],
      "userAssignedIdentityExceptions": null
    },
    ....
  },
  "identity": {
    "type": "SystemAssigned"
  }
},
I always get the same error:
"statusMessage": "{\"error\":{\"code\":\"InvalidTemplateDeployment\",\"message\":\"The template deployment 'deployment_test' is not valid according to the validation procedure. The tracking id is '.....'. See inner errors for details.\",\"details\":[{\"code\":\"PodIdentityAddonUserAssignedIdentitiesNotAllowedInCreation\",\"message\":\"Provisioning of resource(s) for container service cluster-12344 in resource group myrc failed. Message: {\\n \\\"code\\\": \\\"PodIdentityAddonUserAssignedIdentitiesNotAllowedInCreation\\\",\\n \\\"message\\\": \\\"PodIdentity addon does not support assigning pod identities on creation.\\\"\\n }. Details: \"}]}}",
The Product team has shared the answer here: https://github.com/Azure/aad-pod-identity/issues/1123
which says:
This is a known limitation in the existing configuration. We will fix
this in the V2 implementation.
For others who are facing the same issue, please refer to the GitHub issue above.
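Until that V2 fix is available, a workaround consistent with the error message is to create the cluster with the pod identity addon enabled but with no identities in the template, then bind the identity in a second step using the CLI command from above. A rough sketch (the names are the placeholders already used in this question):
"podIdentityProfile": {
  "enabled": true
}
and then, after the deployment completes:
az aks pod-identity add --cluster-name my-aks-cn --resource-group myrg --namespace myns --name example-pod-identity --identity-resource-id /subscriptions/......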

A question about mcrouter: the WarmUpRoute handle cannot set multiple cold servers

I want to use WarmUpRoute to warm up two cold memcached data nodes. I use this config:
{
  "pools": {
    "cold": {
      "servers": ["xxxxx:11212", "xxxx:11213"]
    },
    "warm": {
      "servers": ["xxxxx:11211"]
    }
  },
  "route": {
    "type": "WarmUpRoute",
    "cold": "PoolRoute|cold",
    "warm": "PoolRoute|warm"
  }
}
But when I test this by connecting to mcrouter and setting some data, I find that only one cold server and the warm server store the data successfully; the other cold node never receives it. I'm confused: is there a problem with my config, or is this a bug?
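For what it's worth, this looks like expected PoolRoute behaviour rather than a bug: PoolRoute hashes each key to a single server in the pool, so any given set lands on only one of the two cold hosts. If the intent is to write every key to both cold servers, one option (a sketch, untested) is to split them into separate pools and fan writes out with AllSyncRoute:
{
  "pools": {
    "cold1": { "servers": ["xxxxx:11212"] },
    "cold2": { "servers": ["xxxx:11213"] },
    "warm": { "servers": ["xxxxx:11211"] }
  },
  "route": {
    "type": "WarmUpRoute",
    "cold": {
      "type": "AllSyncRoute",
      "children": ["PoolRoute|cold1", "PoolRoute|cold2"]
    },
    "warm": "PoolRoute|warm"
  }
}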

Standalone Service Fabric - AWS - FileStoreService - Copy-ServiceFabricApplicationPackage Fails

I have a 3-node standalone Windows Service Fabric setup in AWS. The TestConfiguration and CreateCluster scripts run successfully; however, on attempting to deploy any application into the cluster I get the following error from PowerShell.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\<packagename> -ImageStoreConnectionString fabric:ImageStore
Copy-ServiceFabricApplicationPackage : An error occurred during this operation. Please check the trace logs for more
details.
At line:1 char:1
+ Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Copy-ServiceFabricApplicationPackage], FabricException
+ FullyQualifiedErrorId : CopyApplicationPackageErrorId,Microsoft.ServiceFabric.Powershell.CopyApplicationPackage
I'm not sure which trace logs would be useful for diagnosing the error; however, checking the Windows event log on one of the nodes, I see the following errors, all from the FileStoreService.
ImpersonateAndCopyFile for SourcePath:\\<ipaddress>\StoreShare_Node3\131601795137630192\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store\131601795317314061\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml failed: 0x8007052e. Have tried all access tokens.
CopyFile: SourcePath:\\<ip address>\StoreShare_Node3\131601795137630192\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store\131601795317314061\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, Error:0x8007052e, ElapsedTime:80
CopyFile: no new token is found. current token count: 2
Any ideas what this could be? I have recreated the cluster with no security, and the firewall has all ports open both in AWS and on the node machines (trying to remove anything that could block the copying). Within AWS I am using Simple AD, so all nodes run under the same AD administrator and can communicate to create the cluster.
Below is the cluster config I'm using; I kept it as simple as I could to limit the possible causes.
Any help with diagnosing the copy-file issue, or even pointing me at the relevant trace logs, would be great.
Additionally, I notice the ImageStoreService is showing warnings within Service Fabric Explorer:
Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.
Partition reconfiguration is taking longer than expected.
ImageStoreService 3 3 00000000-0000-0000-0000-000000003000
P/P Ready Node3 131601795137630192
S/S InBuild Node1 131601795317314061
S/S InBuild Node2 131601795317314062
(Showing 3 out of 3 replicas. Total available replicas: 1)
EDIT
Additional Information
On investigating the problem further, I ran Copy-ServiceFabricApplicationPackage with the -Debug flag, and it now gives the error below, suggesting that the user name or password being used, either to upload the package from my computer into the cluster or for the cluster to distribute it node to node, is incorrect. I presume node-to-node copying uses the local accounts the cluster creates (ending in fffff), so I don't know why those credentials would be invalid. If it is between the computer uploading the package and the cluster, I'm currently running with no security turned on, so I don't know why this would be an issue either. Any help much appreciated.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath ..\pkg\Release -ImageStoreConnectionString fabric:imagestore -Debug
VERBOSE: System.Fabric.FabricException: An error occurred during this operation. Please check the trace logs for more details. ---> System.Runtime.InteropServices.COMException: The user name or password is incorrect. (Exception from HRESULT: 0x8007052E)
Thanks
{
  "name": "SampleCluster",
  "clusterConfigurationVersion": "1.0.0",
  "apiVersion": "08-2017",
  "nodes": [
    {
      "nodeName": "Node1",
      "iPAddress": "<node 1 internal ip address>",
      "nodeTypeRef": "StandardNodeType",
      "faultDomain": "fd:/0",
      "upgradeDomain": "UD0"
    },
    {
      "nodeName": "Node2",
      "iPAddress": "<node 2 internal ip address>",
      "nodeTypeRef": "StandardNodeType",
      "faultDomain": "fd:/1",
      "upgradeDomain": "UD1"
    },
    {
      "nodeName": "Node3",
      "iPAddress": "<node 3 internal ip address>",
      "nodeTypeRef": "StandardNodeType",
      "faultDomain": "fd:/2",
      "upgradeDomain": "UD2"
    }
  ],
  "properties": {
    "diagnosticsStore": {
      "metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines.",
      "dataDeletionAgeInDays": "7",
      "storeType": "FileShare",
      "IsEncrypted": "false",
      "connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore"
    },
    "nodeTypes": [
      {
        "name": "StandardNodeType",
        "clientConnectionEndpointPort": "19000",
        "clusterConnectionEndpointPort": "19001",
        "leaseDriverEndpointPort": "19002",
        "serviceConnectionEndpointPort": "19003",
        "httpGatewayEndpointPort": "19080",
        "reverseProxyEndpointPort": "19081",
        "applicationPorts": {
          "startPort": "20000",
          "endPort": "30000"
        },
        "ephemeralPorts": {
          "startPort": "49152",
          "endPort": "65534"
        },
        "isPrimary": true
      }
    ],
    "fabricSettings": [
      {
        "name": "Setup",
        "parameters": [
          {
            "name": "FabricDataRoot",
            "value": "C:\\ProgramData\\SF"
          },
          {
            "name": "FabricLogRoot",
            "value": "C:\\ProgramData\\SF\\Log"
          }
        ]
      }
    ],
    "addOnFeatures": [
      "DnsService",
      "RepairManager"
    ]
  }
}
After more investigation, I discovered it was due to File Sharing not being correctly enabled on the Windows boxes. Although it was shown as enabled in the properties of the network adaptor, I failed to realise the settings also needed to be enabled under the advanced sharing options (Control Panel\Network and Internet\Network and Sharing Center\Advanced sharing settings).
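For anyone else hitting the 0x8007052E copy errors, the same sharing prerequisites can be checked and enabled from an elevated PowerShell prompt on each node instead of the Control Panel UI. A rough sketch using built-in cmdlets (adjust the names to your environment):
# Allow File and Printer Sharing traffic through the Windows firewall
Enable-NetFirewallRule -DisplayGroup "File and Printer Sharing"
# Confirm the FileStoreService staging shares exist and are reachable from another node
Get-SmbShare | Where-Object Name -like "StoreShare*"
Test-Path "\\<node ip>\StoreShare_Node3"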

Drools stateful session per request

We are trying to use Drools as our rule engine service. What we have done so far is listed below:
Deployed workbench 7.2.Final
Deployed KIE server 7.2.0.Final
Configured some data objects, rules, deployed the changes to KIE server and we are able to execute the rule using rest API
Most of our requirements are satisfied by a stateless session (give it a set of data, execute the rules, and return the data; that's it). But with a stateless session we have to give up many of the important features provided by a Drools stateful session.
So we are trying to use a stateful session per request, which means the session should be disposed of as soon as the request ends. Also, parallel requests should not interfere with each other, even if the session name is the same.
We found the container runtime strategy configuration (Workbench > Deploy > {any container} > Process Configuration > Runtime strategy).
But even after configuring the container strategy to Per Request, it still behaves the same as Singleton (the session is not disposed after each request).
In a few places we read that the runtime strategy is only implemented for jBPM.
The way we make requests to the KIE server is shown below.
Request: POST {HOST}/kie-server/services/rest/server/containers/instances/TestRequest_1.0.4
{
  "lookup": "ab-session", //stateful session
  "commands": [
    {
      "insert": {
        "out-identifier": "125",
        "object": {
          "com.myteam.testrequest.Product": {
            "id": "123",
            "name": "Hoo Hoo",
            "count": 0
          }
        },
        "return-object": "true"
      }
    },
    {
      "insert": {
        "out-identifier": "126",
        "object": {
          "com.myteam.testrequest.Product": {
            "id": "123",
            "name": "Hoo Hoo",
            "count": 0
          }
        },
        "return-object": "true"
      }
    },
    {"fire-all-rules": "hf2"}
  ]
}
We need help achieving this requirement. Also, please help us understand if we have done something wrong.
In kmodule.xml you may try adding the "prototype" scope, because the default is "singleton":
<ksession name="SessionName" type="stateful" default="false" clockType="realtime" scope="prototype"/>

How to add an ETW provider to an existing Service Fabric cluster using PowerShell?

I have already created a Service Fabric cluster with Azure Diagnostics, and it is currently functional with my services deployed into it. I have an ETW EventSource in my service that I would like to start collecting events from, because my service code already uses this event source to write my service-related events. Since the cluster is already enabled for Azure Diagnostics and my services are already deployed into it, I think it is simply a matter of adding my event source to the ETW providers configured for this cluster. Here is the exported template (only the part relevant to Azure Diagnostics is shown):
{
  "properties": {
    "publisher": "Microsoft.Azure.Diagnostics",
    "type": "IaaSDiagnostics",
    "typeHandlerVersion": "1.5",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "WadCfg": {
        "DiagnosticMonitorConfiguration": {
          "overallQuotaInMB": "50000",
          "EtwProviders": {
            "EtwEventSourceProviderConfiguration": [
              {
                "provider": "Microsoft-ServiceFabric-Actors",
                "scheduledTransferKeywordFilter": "1",
                "scheduledTransferPeriod": "PT5M",
                "DefaultEvents": {
                  "eventDestination": "ServiceFabricReliableActorEventTable"
                }
              },
              {
                "provider": "Microsoft-ServiceFabric-Services",
                "scheduledTransferPeriod": "PT5M",
                "DefaultEvents": {
                  "eventDestination": "ServiceFabricReliableServiceEventTable"
                }
              },
              {
                "provider": "Bb.ServiceFabric.Infrastructure.Container",
                "scheduledTransferPeriod": "PT1M",
                "DefaultEvents": {
                  "eventDestination": "ServiceFabricReliableServiceEventTable"
                }
              }
            ],
            "EtwManifestProviderConfiguration": [
              {
                "provider": "cbd93bc2-71e5-4566-b3a7-595d8eeca6e8",
                "scheduledTransferLogLevelFilter": "Information",
                "scheduledTransferKeywordFilter": "4611686018427387904",
                "scheduledTransferPeriod": "PT5M",
                "DefaultEvents": {
                  "eventDestination": "ServiceFabricSystemEventTable"
                }
              }
            ]
          }
        }
      },
      "StorageAccount": "sfdgsmsraghuplaygrou6827"
    }
  },
  "name": "VMDiagnosticsVmExt_vmNodeType0Name"
}
I would like to update the EtwProviders/EtwEventSourceProviderConfiguration section above to contain the following entry (MyCompany.MyServices.MyStatelessService is the name of my service's EventSource):
{
  "provider": "MyCompany.MyServices.MyStatelessService",
  "scheduledTransferPeriod": "PT5M",
  "DefaultEvents": {
    "eventDestination": "ServiceFabricReliableServiceEventTable"
  }
}
Here are my questions:
Is this the correct way of inserting an ETW provider/EventSource (from my service) into an existing cluster (that is already enabled with azure diagnostics)?
Can I add this event source (as an ETW event source provider) using PowerShell command(s)?
If so, what is the exact powershell command (using all the information from the above code fragment)?
Note: I am using .net framework 4.5.2.
All seems good with the added configuration above. Just be aware that for EtwProviders the eventDestination cannot contain hyphens (-); yours don't, so you are OK.
To update the Windows Azure Diagnostics (WAD) agent configuration, you can use either PowerShell or Cloud Explorer in Visual Studio.
For the former, simply update the ARM template and use the New-AzureRmResourceGroupDeployment cmdlet. See here for further information: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-diagnostics-how-to-setup-wad/#update-diagnostics-to-collect-and-upload-logs-from-new-eventsource-channels
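As an illustration, such a redeployment might look like the following (the resource group and file names are placeholders for this example):
# Redeploy the updated ARM template so the WAD extension picks up the new EtwEventSourceProviderConfiguration entry
New-AzureRmResourceGroupDeployment -ResourceGroupName "my-cluster-rg" `
    -TemplateFile ".\cluster-template.json" `
    -TemplateParameterFile ".\cluster-template.parameters.json" `
    -Mode Incremental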
For the latter, using Cloud Explorer in Visual Studio, browse to your virtual machine scale set (as this is the Azure resource that holds the WAD configuration), right-click, and choose Update Diagnostics. In the dialog shown, you have the option to upload a private and a public configuration file. Simply take a .json document containing the {"WadCfg": {}} element and upload that as the public configuration.
If you need to update it, the private configuration specifies the storage account name and access key:
{
  "storageAccountName": "",
  "storageAccountKey": "",
  "storageAccountEndPoint": "https://core.windows.net"
}
Hope this helps.
Mikkel