Standalone Service Fabric - AWS - FileStoreService - Copy-ServiceFabricApplicationPackage Fails - azure-service-fabric

I have a 3 node standalone windows service fabric setup in AWS. The TestConfiguration and CreateCluster scripts run successfully, however on attempting to deploy any applications into the cluster I get the following error from powershell.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\<packagename> -ImageStoreConnectionString fabric:ImageStore
Copy-ServiceFabricApplicationPackage : An error occurred during this operation. Please check the trace logs for more
details.
At line:1 char:1
+ Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Copy-ServiceFabricApplicationPackage], FabricException
+ FullyQualifiedErrorId : CopyApplicationPackageErrorId,Microsoft.ServiceFabric.Powershell.CopyApplicationPackage
Not sure which trace logs would be useful in diagnosing the error, however checking the windows event log on one of the nodes I see the following errors, all for the FileStoreService.
ImpersonateAndCopyFile for SourcePath:\\<ipaddress>\StoreShare_Node3\131601795137630192\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store\131601795317314061\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml failed: 0x8007052e. Have tried all access tokens.
CopyFile: SourcePath:\\<ip address>\StoreShare_Node3\131601795137630192\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store\131601795317314061\6.0.232.9494_0\131601794828730764_8589934592_1.ClusterManifest.xml, Error:0x8007052e, ElapsedTime:80
CopyFile: no new token is found. current token count: 2
Any ideas what this could be? I have recreated a new cluster with no security, firewall has all ports opened both in AWS and on the node machines (trying to remove all things that could be blocking the copying). Within AWS am using SimpleAD so all nodes are running with the same AD administrator, and can communicate to create the cluster.
Below is the cluster config I'm using, kept it as simple as I could to try to limit the causes of the problems.
Any help with diagnosing the copy file issues, or even pointing me at the relevant trace logs would be great.
Additionally I notice the ImageStoreService is showing warnings within Service Fabric Explorer
Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.
Partition reconfiguration is taking longer than expected.
ImageStoreService 3 3 00000000-0000-0000-0000-000000003000
P/P Ready Node3 131601795137630192
S/S InBuild Node1 131601795317314061
S/S InBuild Node2 131601795317314062
(Showing 3 out of 3 replicas. Total available replicas: 1)
EDIT
Additional Information
On investigating the problem more I ran the Copy-ServiceFabricApplicationPackage with -Debug flag and it now gives the below error, suggesting the user name or password being used to either upload the package from my computer into the cluster, or for the cluster to distribute node to node is incorrect. I presume for node to node it is using the local accounts it creates ending in fffff for which I don't know why it would be creating invalid user credentials. If its between the computer uploading the package and the cluster, then currently I'm running with no security turned on, so don't know why this would be an issue?? Any help much appreciated.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath ..\pkg\Release -ImageStoreConnectionString fabric:imagestore -Debug
VERBOSE: System.Fabric.FabricException: An error occurred during this operation. Please check the trace logs for more details. ---> System.Runtime.InteropServices.COMException: The user name or password is incorrect. (Exception from HRESULT: 0x8007052E)
Thanks
{
"name": "SampleCluster",
"clusterConfigurationVersion": "1.0.0",
"apiVersion": "08-2017",
"nodes": [
{
"nodeName": "Node1",
"iPAddress": "<node 1 internal ip address>",
"nodeTypeRef": "StandardNodeType",
"faultDomain": "fd:/0",
"upgradeDomain": "UD0"
},
{
"nodeName": "Node2",
"iPAddress": "<node 2 internal ip address>",
"nodeTypeRef": "StandardNodeType",
"faultDomain": "fd:/1",
"upgradeDomain": "UD1"
},
{
"nodeName": "Node3",
"iPAddress": "<node 3 internal ip address>",
"nodeTypeRef": "StandardNodeType",
"faultDomain": "fd:/2",
"upgradeDomain": "UD2"
}
],
"properties": {
"diagnosticsStore": {
"metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines.",
"dataDeletionAgeInDays": "7",
"storeType": "FileShare",
"IsEncrypted": "false",
"connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore"
},
"nodeTypes": [
{
"name": "StandardNodeType",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpointPort": "19001",
"leaseDriverEndpointPort": "19002",
"serviceConnectionEndpointPort": "19003",
"httpGatewayEndpointPort": "19080",
"reverseProxyEndpointPort": "19081",
"applicationPorts": {
"startPort": "20000",
"endPort": "30000"
},
"ephemeralPorts": {
"startPort": "49152",
"endPort": "65534"
},
"isPrimary": true
}
],
"fabricSettings": [
{
"name": "Setup",
"parameters": [
{
"name": "FabricDataRoot",
"value": "C:\\ProgramData\\SF"
},
{
"name": "FabricLogRoot",
"value": "C:\\ProgramData\\SF\\Log"
}
]
}
],
"addOnFeatures": [
"DnsService",
"RepairManager"
]
}
}

After more investigating, I discovered it was due to not correctly enabling File Sharing on the windows boxes. Although shown as enabled within the Properties of the Network Adaptor. I failed to realise the settings needed to be enabled under the Advanced Sharing Centre Options (Control Panel\Network and Internet\Network and Sharing Center\Advanced sharing settings).

Related

pod identity on aks cluster crreation

Right now, it's impossible to have assigned user assigned identities on arm templates (and terraform) on cluster creation. I already tried a lot of things, and updates works great, after inserting manually with:
az aks pod-identity add --cluster-name my-aks-cn --resource-group myrg --namespace myns --name example-pod-identity --identity-resource-id /subscriptions/......
But, I want to have this done at once, with the deployment, so I need to insert the pod user identities to the cluster automatically. I also tried to run the command using the DeploymentScripts but the deployment scripts are not ready to use preview aks extersion.
My config looks like this:
{
"type": "Microsoft.ContainerService/managedClusters",
"apiVersion": "2021-02-01",
"name": "[variables('cluster_name')]",
"location": "[variables('location')]",
"dependsOn": [
"[resourceId('Microsoft.Network/virtualNetworks', variables('vnet_name'))]"
],
"properties": {
....
"podIdentityProfile": {
"allowNetworkPluginKubenet": null,
"enabled": true,
"userAssignedIdentities": [
{
"identity": {
"clientId": "[reference(resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity'), '2018-11-30').clientId]",
"objectId": "[reference(resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity'), '2018-11-30').principalId]",
"resourceId": "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', 'managed-indentity')]"
},
"name": "managed-indentity",
"namespace": "myns"
}
],
"userAssignedIdentityExceptions": null
},
....
},
"identity": {
"type": "SystemAssigned"
}
},
I'm always getting the same issue:
"statusMessage": "{\"error\":{\"code\":\"InvalidTemplateDeployment\",\"message\":\"The template deployment 'deployment_test' is not valid according to the validation procedure. The tracking id is '.....'. See inner errors for details.\",\"details\":[{\"code\":\"PodIdentityAddonUserAssignedIdentitiesNotAllowedInCreation\",\"message\":\"Provisioning of resource(s) for container service cluster-12344 in resource group myrc failed. Message: {\\n \\\"code\\\": \\\"PodIdentityAddonUserAssignedIdentitiesNotAllowedInCreation\\\",\\n \\\"message\\\": \\\"PodIdentity addon does not support assigning pod identities on creation.\\\"\\n }. Details: \"}]}}",
The Product team has shared the answer here: https://github.com/Azure/aad-pod-identity/issues/1123
which says:
This is a known limitation in the existing configuration. We will fix
this in the V2 implementation.
For others who are facing the same issue, please refer to the GitHub issue above.

LDAP passport strategy for Hyperledegr composer

I have been trying to use LDAP passport strategy for authentication in hyperledger composer rest server. I am using below configuration for ldap passport:
export COMPOSER_PROVIDERS='{
"ldap": {
"provider":"ldap",
"authScheme":"ldap",
"module":"passport-ldapauth",
"authPath":"/auth/ldap",
"successRedirect":"/",
"failureRedirect":"/",
"server":"{
"url":"ldap://localhost:389",
"bindOn":"cn=admin,dc=example, dc=com",
"bindCredentials":"*****",
"searchBase":"ou=admin,dc=example,dc=com",
}"
}
}'
While starting composer-rest-server with authentication its showing error
SyntaxError: Unexpected token
in JSON at position 210
at JSON.parse (<anonymous>)
at Promise.then (/home/mfgteg/.nvm/versions/node/v8.9.3/lib/node_modules/composer-rest-server/server/server.js:127:34)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
I got the correct format. Thanks to a link I found on IBM site
I used the following configuration :
export COMPOSER_PROVIDERS='{
"ldap": {
"provider": "ldap",
"authScheme": "ldap",
"module": "passport-ldapauth",
"authPath": "/auth/ldap",
"successRedirect": "/",
"failureRedirect": "/",
"server": {
"url": "ldap://localhost:389",
"bindDn": "cn=admin,dc=example, dc=com",
"bindCredentials": "*****",
"searchBase": "ou=admin,dc=example,dc=com"
}
}
}'
However I am yet to figure out what to mention in "callbackURL".
Use the variable provided below. Just change the successRedirect and credentials as per your configuration. Also in case you are running Client application on some other machine, you may need to change the localhost in url to your machine IP address.
Note : I have tested this with open LDAP configured.
COMPOSER_PROVIDERS='{
"ldap": {
"provider": "ldap",
"authScheme": "ldap",
"module": "passport-ldapauth",
"authPath": "/auth/ldap",
"successRedirect": "Where you want to redirect",
"failureRedirect": "/ldap",
"session": true,
"json": true,
"LdapAttributeForLogin": "cn",
"LdapAttributeForUsername": "cn",
"server": {
"url": "ldap://localhost:389",
"bindDN": "cn=admin,dc=hsc,dc=com",
"bindCredentials": "xxxxx",
"searchBase": "ou=users,dc=hsc,dc=com",
"searchFilter": "(cn={{username}})"
}
}
}'

Azure service fabric node type instance count doubled on creating cluster using ARM

I'm experimenting on creating a new service fabric cluster using ARM template and modify the template to add certificates, etc. The cluster and all resources are successfully created, but I noticed that initially the number of node instances are 2x, plus 1 than what I set to. For example, if I set "vmInstanceCount" to 3, I see 7 instances are currently creating.
But if I just wait and let them finish, then 4 instances were deleted and it will keep the three instances. One problem here is that it randomly select what to keep, thus, the names to keep can be node_1, node_4, node_6 which is messy.
Here's my snippet of nodeType:
"nodeTypes": [
{
"name": "[variables('vmNodeType0Name')]",
"applicationPorts": {
"endPort": 30000,
"startPort": 20000
},
"clientConnectionEndpointPort": "[variables('fabricTcpGatewayPort')]",
"ephemeralPorts": {
"endPort": 65534,
"startPort": 49152
},
"httpGatewayEndpointPort": "[variables('fabricHttpGatewayPort')]",
"isPrimary": true,
"vmInstanceCount": "[variables('vmInstanceCount')]",
"reverseProxyEndpointPort": "[variables('reverseProxyEndpointPort')]",
"durabilityLevel": "Bronze"
}
]
...
"sku": {
"name": "[variables('vmssSkuName')]",
"capacity": "[variables('vmssSkuCapacity')]",
"tier": "Standard"
}
I was talking to a Microsoft support earlier and this issue is actually a new feature as we can see here https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-design-overview#overprovisioning
I will close this issue as I found the answer. However, I still have some concern on the naming part but I will throw that question to MS.

How to add a ETW provider to an existing service fabric cluster using powershell?

I have already created a service fabric cluster with azure diagnostics and it is functional currently with my services deployed into that cluster. I have an ETW EventSource in my service that I would like to start collecting events from because my service code already uses this event source to write my service related events. Since the cluster is already enabled for azure diagnostics and my services are already deployed into that cluster, I think it is a simple matter of updating the ETW provider with my event source in this service fabric cluster. Here is the exported template (only a partial is shown that is relevant for azure diagnostics):
{
"properties": {
"publisher": "Microsoft.Azure.Diagnostics",
"type": "IaaSDiagnostics",
"typeHandlerVersion": "1.5",
"autoUpgradeMinorVersion": true,
"settings": {
"WadCfg": {
"DiagnosticMonitorConfiguration": {
"overallQuotaInMB": "50000",
"EtwProviders": {
"EtwEventSourceProviderConfiguration": [
{
"provider": "Microsoft-ServiceFabric-Actors",
"scheduledTransferKeywordFilter": "1",
"scheduledTransferPeriod": "PT5M",
"DefaultEvents": {
"eventDestination": "ServiceFabricReliableActorEventTable"
}
},
{
"provider": "Microsoft-ServiceFabric-Services",
"scheduledTransferPeriod": "PT5M",
"DefaultEvents": {
"eventDestination": "ServiceFabricReliableServiceEventTable"
}
},
{
"provider": "Bb.ServiceFabric.Infrastructure.Container",
"scheduledTransferPeriod": "PT1M",
"DefaultEvents": {
"eventDestination": "ServiceFabricReliableServiceEventTable"
}
}
],
"EtwManifestProviderConfiguration": [
{
"provider": "cbd93bc2-71e5-4566-b3a7-595d8eeca6e8",
"scheduledTransferLogLevelFilter": "Information",
"scheduledTransferKeywordFilter": "4611686018427387904",
"scheduledTransferPeriod": "PT5M",
"DefaultEvents": {
"eventDestination": "ServiceFabricSystemEventTable"
}
}
]
}
}
},
"StorageAccount": "sfdgsmsraghuplaygrou6827"
}
},
"name": "VMDiagnosticsVmExt_vmNodeType0Name"
}
I would like to update following EtwProviders/EtwEventSourceProviderConfiguration to contain following section (as MyCompany.MyServices.MyStatelessService is the name of my service's EventSource):
{
"provider": "MyCompany.MyServices.MyStatelessService",
"scheduledTransferPeriod": "PT5M",
"DefaultEvents": {
"eventDestination": "ServiceFabricReliableServiceEventTable"
}
}
Here are my questions:
Is this the correct way of inserting an ETW provider/EventSource (from my service) into an existing cluster (that is already enabled with azure diagnostics)?
Can I add this event source (as a ETW event source provider) using a powershell command(s)?
If so, what is the exact powershell command (using all the information from the above code fragment)?
Note: I am using .net framework 4.5.2.
All seems good with the added configuration above. Just be aware that for ETWProviders the EventDestination cannot contain hyphens (-), yours don't so you are ok.
To update the Windows Azure Diagnostics (WAD) agent configuration, you can use either PowerShell or Cloud Explorer in Visual Studio.
For the former, simply update the ARM template and use the New-AzureRmResourceGroupDeployment cmdlet. See here for further information: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-diagnostics-how-to-setup-wad/#update-diagnostics-to-collect-and-upload-logs-from-new-eventsource-channels
For using Cloud Explorer in Visual Studio. Browse to your Virtual Machine Scale Set (as this is the Azure resource that holds the WAD configuration). Right-click and choose Update Diagnostics. In the dialog shown, you have the option to upload a private and public configuration file. Simple take a .json document containing the {"WadCfg": {}} element, and upload that as a public configuration.
If you need to update the private configuration specifies the storage account name and AccessKey:
{
"storageAccountName": "",
"storageAccountKey": "",
"storageAccountEndPoint": "https://core.windows.net",
}
Hope this helps.
Mikkel

Service Fabric .Net Framework 4.5.1 and 4.6

After changing the target framework from 4.5.1 to 4.6 the service in Auzure Fail, the local deployment is working.
Do I need to add .Net 4.6 support ? - I'm unable to find where I can see the frameworks available in my cluster in azure.
Thank you
ApplicationName :
fabric:/Lending20.Service.IdentityManagement AggregatedHealthState
: Error UnhealthyEvaluations :
Unhealthy services: 100% (1/1), ServiceType='IdentityManagementServiceType',
MaxPercentUnhealthyServices=0%.
Unhealthy service:
ServiceName='fabric:/Lending20.Service.IdentityManagement/Identity
ManagementService', AggregatedHealthState='Error'.
Unhealthy partitions: 100% (1/1),
MaxPercentUnhealthyPartitionsPerService=0%.
Unhealthy partition:
PartitionId='7c68b397-fda3-491d-9e17-921cd24217ca',
AggregatedHealthState='Error'.
Error event: SourceId='System.FM', Property='State'.
ServiceHealthStates :
ServiceName :
fabric:/Lending20.Service.IdentityManagement/IdentityManagementService
AggregatedHealthState : Error
DeployedApplicationHealthStates :
ApplicationName : fabric:/Lending20.Service.IdentityManagement
NodeName : _lending1
AggregatedHealthState : Ok
HealthEvents :
SourceId : System.CM
Property : State
HealthState : Ok
SequenceNumber : 3464
SentAt : 11/21/2015 12:38:08 PM
ReceivedAt : 11/21/2015 12:38:08 PM
TTL : Infinite
Description : Application has been created.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 11/21/2015 12:38:08 PM, LastError = 1/1/0001
12:00:00 AM
You can use the following ARM template to install .NET 4.6.1. Note that it's dependent on this script (used by Service Profiler). You also can replace it with any other PowerShell script.
The parameter is the base name of the node. So if you have VM0,.. VM5 in your cluster, you should set vmName = 'VM'. The vmExtensionLoop is set to 5 nodes; you can also change that of course.
If you use an ARM template to deploy your cluster, you can include this as part of it. Note it can slow down the deployment of the scale set, since it requires a restart.
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"vmName": {
"type": "string",
"metadata": {
"description": "Virtual machine name."
},
}
},
"resources": [
{
"apiVersion": "2015-05-01-preview",
"type": "Microsoft.Compute/virtualMachines/extensions",
"name": "[concat(parameters('vmName'),copyIndex(0), '/CustomScriptExtensionInstallNet461')]",
"location": "[variables('location')]",
"tags": {
"displayName": "CustomScriptExtensionInstallNet461"
},
"properties": {
"publisher": "Microsoft.Compute",
"type": "CustomScriptExtension",
"typeHandlerVersion": "1.4",
"autoUpgradeMinorVersion": true,
"settings": {
"fileUris": [ "https://gist.githubusercontent.com/aelij/7ea90dda4a187a482584/raw/a3e0f946d4a22b0af803edb503d0a30a263fba2c/InstallNetFx461.ps1" ],
"commandToExecute": "powershell.exe -ExecutionPolicy Unrestricted -File InstallNetFx461.ps1"
}
},
"copy": {
"name": "vmExtensionLoop",
"count": 5
}
}
]
}
.NET 4.6 is not yet available in the default Windows Server 2012 image used in Azure. At this point, your only option is to log into each VM and install it.
Use the windows Server 2016 image to get .net 4.6.1. pre installed. vmImageSku:"2016-Datacenter" when provisoning the cluster.
another option is use azure resource group template that includes a DSC extension to provision your VMs to have .net 46 installed.
Here is the snippet in my dsc powershell to deal with the installation of .net 461
code or gist for more complete script
Until 4.6 is supported by Azure natively, I'd use a custom VM image with .NET 4.6 preinstalled. See this article for details on how to create and use one.
Now .NET 4.6 and above is available in the Release of SDK 2.5.216 and Runtime 5.5.216
For more details please see: https://azure.microsoft.com/en-us/blog/announcing-azure-service-fabric-5-5-and-sdk-2-5/