Azure Service Fabric publish upgrade from Visual Studio - PowerShell Script Error

Azure Service Fabric publish upgrade from Visual Studio - PowerShell Script Error - azure-service-fabric

I am trying to publish an upgrade of a Service Fabric application from Visual Studio 2017 to our Azure Service Fabric Cluster. In mid-September, I successfully published an upgrade of this same app with same PowerShell script to SFC with no issues. I am now trying to upgrade it at the next version number and suddenly getting this error.
I get the following error during Publish, related to Powershell.
2>Started executing script 'Deploy-FabricApplication.ps1'.
2>powershell -NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -Command ". 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\Scripts\Deploy-FabricApplication.ps1' -ApplicationPackagePath 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\pkg\Debug' -PublishProfileFile 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\PublishProfiles\Cloud.xml' -DeployOnly:$false -ApplicationParameter:#{} -UnregisterUnusedApplicationVersionsAfterUpgrade $false -OverrideUpgradeBehavior 'None' -OverwriteBehavior 'SameAppTypeAndVersion' -SkipPackageValidation:$false -ErrorAction Stop"
2>Copying application package to image store...
2>Upload to Image Store succeeded
2>Registering application type...
2>Register application type started. Use Get-ServiceFabricApplicationType to query for status.
2>Running Image Builder process ...
2>Application package is registered.
2>Start upgrading application...
2>aka.ms/upgrade-defaultservices
2>Start-ServiceFabricApplicationUpgrade : aka.ms/upgrade-defaultservices
2>At C:\Program Files\Microsoft SDKs\Service
2>Fabric\Tools\PSModule\ServiceFabricSDK\Publish-UpgradedServiceFabricApplication.ps1:317 char:13
2>+ Start-ServiceFabricApplicationUpgrade #UpgradeParameters
2>+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2> + CategoryInfo : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa
2> bricApplicationUpgrade], FabricException
2> + FullyQualifiedErrorId : UpgradeApplicationErrorId,Microsoft.ServiceFabric.Powershell.StartApplicationUpgrade
2>
2>Finished executing script 'Deploy-FabricApplication.ps1'.
2>Time elapsed: 00:07:39.0407526
2>The PowerShell script failed to execute.
========== Build: 1 succeeded, 0 failed, 10 up-to-date, 0 skipped ==========
========== Publish: 0 succeeded, 1 failed, 0 skipped ==========
Any idea what's going on here? Again, when I last published this in September, with the same script, no issues at all, and I haven't made any changes to the solution other than upgrading the Manifest versions to push it out as a new upgraded version.
I noted this S/O thread: Getting error as part of trying to upgrade Service Fabric Application using Start-ServiceFabricApplicationUpgrade and saw the user's error was similar, but the answer does not apply to my issue because all three steps in the answer provided are definitely included in my powershell deploy script.
I can add the deployment script if helpful, but will wait until that is requested as it's long, and I only want to post it here if someone feels it's needed to diagnose.

You are getting this error because you are changing some parameters in a DefaultService that are not allowed by default.
The link aka.ms/upgrade-defaultservices shown in the error logs explain this.
Some default service parameters defined in the application manifest
can also be upgraded as part of an application upgrade.
Only the service parameters that support being changed through
Update-ServiceFabricService can be changed as part of an upgrade. The
behavior of changing default services during application upgrade is as
follows:
Default services in the new application manifest that do not already exist in the cluster are created.
Default services that exist in both the previous and new application manifests are updated. The parameters of the default
service in the new application manifest overwrite the parameters of
the existing service. The application upgrade will rollback
automatically if updating a default service fails.
Default services that do not exist in the new application manifest are deleted if they exist in the cluster. Note that deleting a default
service will result in deleting all that service's state and cannot be
undone.
Also, there is this other SO question about the same thing: Default service descriptions can not be modified as part of upgrade set EnableDefaultServicesUpgrade to true
The item 1 above is a common approach, where new services are added to the solution and later created during the upgrade without errors, the item 2 and 3 are the restricted approach that requires the EnableDefaultServicesUpgrade.
The item 2, is like described in the answer you've added, you changed MinReplicaSize and TargetReplicaSize to 1 during a manual update, when SF validated the state of your service for upgrade, it identified the difference and prevented the upgrade to continue, if you had set cluster setting EnableDefaultServicesUpgrade to true it would continue and override the default values.
The item 3, would occur you when you removed the service and added again, you had changed or misspelled the name, SF default settings would prevent the deletion of this service.
Regarding the solution you've found(delete and recreate), is not ideal,
In scenarios where you have stateful services running in production, would be risky to apply, because you would have to backup the state, re-deploy the services, and restore the backup, in some cases, depending on what these changes are, you wouldn't be able to restore the backup, because they have to match with the original services definitions (partitions type, number, and son on). You would also lose the benefits of Rolling Updates, and your service would go down maybe for a while if these backups are big.

The issue had to do with us trying to push out the application with mismatched node instances. We have a stateful service running under this application that is supposed to have MinReplicaSize and TargetReplicaSize set to 3. Yesterday, due to an issue, we deleted and re-created this service inside the SF Explorer. Upon doing so, it reset the replica size parameters back to 1. So we used a Powershell script to change them back to 3, but that script did not include all the necessary commands to get the service back to the exact state it was in before we deleted it. So today when we went to upgrade the app, the app in SFC wouldn't accept an upgrade from VS deployment, because of mismatches between what was in the parameters of the solution vs. what was in our SFC. To resolve, we re-deleted those services first, then deployed from VS, and no more error.

Related

Failed to start service VisualStudioRemoteDeployer

We are using on site Dev-Ops and have a similar problem to that described in the link Example from SO.
But ours is intermittent.
Our environment uses two build and deploy machines, which each deploy machine having two worker agents.
For one of our projects, when it is deployed, we constantly get the error:
The VisualStudioRemoteDeployerc4d3852f-411b-48ba-97d8-5e09c8d07ce4 service failed to start due to the following error:
%%2
But here is the rub, not every time. Sometimes the deployment completes without error.
Other projects that use the same deployment machine and the same target server work each and every time without fail.
The deployment log reports "The WSMan provider host process did not return a proper response." as an error.
Checking the allocated memory, described in PowerShell Out of Memory, to find our set at 2.1 Billion.

This is an interesting issue that I have uncovered. The source of this problem stems from the interaction of McAfee Endpoint security.
Said antivirus was reporting that when the remote powershell script, using WSMan, was called. McAfee, saw this as a viral payload and canceled the deployment by stopping the service from running and deleting the payload. This has been reported to McAfee as an issue. In the mean time, internal network security settings for McAfee has had to be modified to ignore the processes used by powershell in remote deployment.

Service Fabric: Failed to contact Naming Service, but only from build server

I am using the Deploy-FabricApplication.ps1 script to deploy my cluster. From visual studio everything works as expected. From my appveyor build server running the exact same powershell script with the same parameters throws the Failed to contact naming service error. This apparently started out of nowhere. In fact it only fails for my dev enviornment, my qa environment still works on the build server as well. Neither cluster has been touched since it was setup, and no configuration has been changed. Just all of the sudden the error starts happening.
Not including too many details here because this is more of a question about what could cause the exact same deployment script with same parameters to fail in a different environment. Since it works from visual studio I assume it is safe to say the cluster and profiles are setup correctly.
Parameters I am passing into the deployment script are belowser
-ApplicationPackagePath
-PublishProfileFile
-DeployOnly:$false
-UnregisterUnusedApplicationVersionsAfterUpgrade $false
-OverrideUpgradeBehavior 'None'
-OverwriteBehavior 'SameAppTypeAndVersion'
-SkipPackageValidation:$false
-ErrorAction Stop

Error starting Service Fabric Cluster (v 1.5.175)

I've been trying to install and start the new preview SDK, and even after several installs/uninstalls/reboots I always get this error when running DevClusterSetup:
Start-Service : Failed to start service 'Microsoft Service Fabric Host Service (FabricHostSvc)'.
At C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\ClusterSetupUtilities.psm1:433 char:5
(full log below)
What I've tried, from other posts on stackoverflow:
reparing the performance counters with lodctr /R
used the system file checker with SFC /SCANNOW
checked that the windows firewall is running (and tried disabling it for the domain networks)
made sure I have enough disk space
The windows service "Microsoft Service Fabric Host Service" is always "Starting", but never starts.
I have two hints as to what the source of the problem might be, but can't solve it:
a) in the event viewer (Microsoft-Service Fabric > Admin) there are 4 errors that occur everytime the service attempts to start:
Unable to stop data collector for performance counters. The command
"logman stop FabricCounters" failed with error code -2147287038.
System.Fabric.FabricDeployer.InvalidDeploymentException: Failed to
start performance counter collection when creating or updating
deployment
FabricDeployer::Install failed with error 0xffffffff
FabricDeployer::Install failed with error 0xffffffff, Rolling back
b) In the C:\SfDevCluster\Log\Traces folder there are files named something like FabricSetup-131034051696570691.trace . All of them have the same content, and in the middle there are warnings like these:
FabricSetup.FabricSetup.EventTraceInstaller,Method QueryDataCollectorSet failed with HRESULT: -2144337918
FabricSetup.FabricSetup.EventTraceInstaller,Method StopPlaTraceSession failed with HRESULT: -2144337918
and then further down the error:
FabricSetup.FabricSetup.FabricDeployer,Configuration Deployment failed with error 0xffffffff
If I go and check the Fabric deployer files (eg, fabricdeployer-635945286697202537.trace), I have a single error at the end, after a series of Performance counter deletes:
FabricDeployer.FabricDeployer,Executing command: logman stop FabricCounters
FabricDeployer.FabricDeployer,Unable to stop data collector for performance counters. The command "logman stop FabricCounters" failed with error code -2147287038.
but this error seems to come after some other error, as part of the rollback.
Any ideas? This is very frustrating and there is very little info on the net.
I've also tried cleaning the installation with ClearCluster.ps1 and installing the dev cluster to a different folder, always with the same result.
I am running Win10 with VS2015 Update 1, Azure SDK 2.8.2.1 . My user is a liveid which is local admin.

I'll start with a short answer to unblock you. From an elevated powershell session run:
Unregister-ScheduledTask FabricCounters

I had the exact same issue but in my case the FabricCounters task wasn't there. So I did a search for other "Fabric*" tasks via Get-ScheduledTask Fabric* and found both FabricAppInfoTraces and FabricQueryTraces to be present still after uninstall.
I removed both Tasks using Unregister-ScheduledTask <name>, reinstalled the SDK and was able to start my local cluster again!

How do I deploy service fabric application from VSTS release pipeline?

I have configured a CI build for a Service Fabric application, in Visual Studio Team Services, according to this documentation: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-set-up-continuous-integration
But instead of having my CI build do the publishing, I only perform the Build and Package tasks, and include all Service Fabric related output, such as pkg folder, scripts, publish profiles and application parameters, in the drop. This way I can pass it along to the new Release pipeline (agent-based releases) to do the actual deployment of my service fabric application.
In my release definition I have a single Azure Powershell task, that uses an ARM endpoint (with proper service principals configured).
When I deploy my app to an existing service fabric cluster, I use the default Deploy-FabricApplication cmdlet passing along the pkg folder and a publish profile that is configured with a connection to the existing cluster.
The release fails with an error message "Cluster connection instance is null". And I cannot understand why?
Doing some debugging I have found that:
The Deploy-FabricApplication cmdlet executes the Connect-ServiceFabricCluster cmdlet just fine, but as soon as the Publish-NewServiceFabricApplication cmdlet takes over execution, then the cluster connection is lost.
I would expect that this scenario is possible using the service fabric cmdlets, but I cannot figure out how to keep the cluster connection open during depoyment.
UPDATE: The link to the documentation no longer refers to the Service Fabric powershell scripts, so the pre-condition for this question is no longer documented. The article now refers to the VSTS build and release tasks, which can be prefered over the powershell cmdlets I tried to use.

When the Connect-ServiceFabricCluster function is called (from Deploy-FabricApplication.ps1) a local $clusterConnection variable is set after the call to Connect-ServiceFabricCluster. You can see that using Get-Variable.
Unfortunately there is logic in some of the SDK scripts that expect that variable to be set but because they run in a different scope, that local variable isn't available.
It works in Visual Studio because the Deploy-FabricApplication.ps1 script is called using dot source notation, which puts the $clusterConnection variable in the current scope.
I'm not sure if there is a way to use dot sourcing when running a script though the release pipeline but you could, as a workaround, make the $clusterConnection variable global right after it's been set via the Connect-ServiceFabricCluster call. Edit your Deploy-FabricApplication.ps1 script and add the following line after the connection logic (~line 169):
$global:clusterConnection = $clusterConnection
By the way, you might want to consider setting up custom build/release tasks that deploy a Service Fabric application, rather than using the various Deploy-FabricApplication.ps1 scripts.

There now exists a built-in VSTS task for deploying a Service Fabric app so you no longer need to bother with executing the PowerShell script on your own. Task documentation page is at https://www.visualstudio.com/docs/build/steps/deploy/service-fabric-deploy. The original CI article has also been updated which provides details on how to set everything up: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-set-up-continuous-integration/.

Try to use "PowerShell" task instead of "Azure PowerShell" task.

I hit the same bug today and opened a GitHub issue here
On a side note, VS generated script Deploy-FabricApplication.ps1 uses module
"$((Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Service Fabric SDK" -Name "FabricSDKPSModulePath").FabricSDKPSModulePath)\ServiceFabricSDK.psm1"
That's where Publish-NewServiceFabricApplication comes from. You can check the deployment logic and rewrite it in more sane way using lower-level ServiceFabric SDK cmdlets (potentially getting connection using Get-ServiceFabricClusterConnection instead of global-ling it)

Azure deployment with PowerShell, "New-AzureDeployment : There was no endpoint listening at https://management.core.windows.net/..."

Following the guide and powershell script from this article,
https://www.windowsazure.com/en-us/develop/net/common-tasks/continuous-delivery/
I've run into an extremely odd error:
9/4/2012 9:02 PM - Creating New Deployment: In progress
New-AzureDeployment : There was no endpoint listening at https://management.core.windows.net/5921d8af-88a1-4f63-9673-5e1ae1df7e8a/services/storageservices/Build_2012-09-04_02-27.1/dist/LNEC_Admin.Azure.cspkg/keys that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
It's odd because we're on build "Build_2012-09-04_08-16.1", not the one mentioned in the URL above (which no longer even exists on the filesystem). This is under Jenkins CI which runs under the NETWORK SERVICE account. If I run it by hand with my own account the same error results, but with a lnecint in place of the build directory: https://management.core.windows.net/5921d8af-88a1-4f63-9673-5e1ae1df7e8a/services/storageservices/lnecint/keys
That keyword "lnecint" isn't mentioned anywhere in any config (I've searched every file on the entire machine and TFS server). It was the name of a storage account, but it's long ago been deleted.
VS 2012, Azure SDK 1.7.1

There's definitely an issue with your endpoint. Can you check what parameters you're passing to the "New-AzureDeployment" Cmdlet?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse