Microsoft Service Fabric - fabric:/System/ImageStoreService not running - azure-service-fabric

I am trying to copy an app to the service fabric image store.
I am not able to copy the application via Visual Studio or PowerShell (probably because fabric:/System/ImageStoreService is in an Error state). From Visual Studio the operation times out; from PowerShell it just hangs indefinitely.
I don't know how to approach services that are not running on the Service Fabric Cluster. I have other services failing on the cluster as well - this is a new test cluster created using the Azure portal yesterday (see attached screenshot).
Error event: SourceId='System.FM', Property='State'. Partition is below target replica or instance count.
ImageStoreService 3 3 00000000-0000-0000-0000-000000003000
N/P InBuild _nodr_0 131636712421204228
(Showing 1 out of 1 replicas. Total available replicas: 0)

Related

How to sync user directory on bitbucket server to jira with both running on aks?

When trying to sync the Jira user directory to other Atlassian products (Confluence and Bitbucket Server, running on AKS), a 403 error is returned.
Upon looking into this error the following steps have been attempted:
https://confluence.atlassian.com/stashkb/unable-to-connect-to-jira-for-authentication-forbidden-403-323391874.html
The IP addresses have been added to the Jira whitelist. The next step suggested online is to restart the Jira service.
This causes problems, however: after running the stop-jira.sh/start-jira.sh scripts inside the pod, the service comes back with none of the previous settings, and all configuration, including backups, is gone, taking us back to square one.
Current set-up:
Cluster size: 3 x Standard D8 v3 (8 vCPUs, 32 GiB memory) on AKS
Used the following images installed through UI:
atlassian/jira-software
cptactionhank/docker-atlassian-jira
Exec into the pod and go to /opt/atlassian/jira/bin
Run ./stop-jira.sh, then ./start-jira.sh
What actually happens is that, when going back to the URL, the Jira instance is reset and all configuration files in the pod for the service are lost.
The pod logs commonly show error 137 when restarting.
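As a side note on the 137 in the logs: assuming it is a process exit code, it follows the usual shell convention of 128 plus a signal number, so 137 would mean the process was terminated by signal 9 (SIGKILL). That often, though not always, means the container was killed from outside (for example by the kubelet or the out-of-memory killer) rather than Jira shutting down cleanly. A minimal Python sketch of the decoding follows; the function name and the small signal table are illustrative only and use Linux signal numbers.

```python
# Illustrative helper (not part of any Jira or Kubernetes tooling).
# Linux signal numbers for a few common signals.
KNOWN_SIGNALS = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        name = KNOWN_SIGNALS.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

If the kill turns out to be memory related, the pod's events and resource limits are the place to look; that part is a guess and not something the logs quoted here confirm.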
Update: the following Helm chart has also been used, with the same result:
https://github.com/int128/devops-kompose/tree/master/atlassian-jira-software

Azure Service Fabric Cluster Update

I have a cluster in Azure that failed to update automatically, so I'm trying a manual update. I tried via the portal and it failed, so I kicked off an update using PowerShell, which also failed. The update starts, then just sits at "UpdatingUserConfiguration" and fails with a timeout after an hour or so. I have removed all application types and checked my certs for "NETWORK SERVICE". The cluster is 5 VMs, single node type, Windows.
Error
Set-AzureRmServiceFabricUpgradeType : Code: ClusterUpgradeFailed,
Message: Cluster upgrade failed. Reason Code: 'UpgradeDomainTimeout',
Upgrade Progress:
'{"upgradeDescription":{"targetCodeVersion":"6.0.219.9494","
targetConfigVersion":"1","upgradePolicyDescription":{"upgradeMode":"UnmonitoredAuto","forceRestart":false,"u
pgradeReplicaSetCheckTimeout":"37201.09:59:01","kind":"Rolling"}},"targetCodeVersion":"6.0.219.9494","target
ConfigVersion":"1","upgradeState":"RollingBackCompleted","upgradeDomains":[{"name":"1","state":"Completed"},
{"name":"2","state":"Completed"},{"name":"3","state":"Completed"},{"name":"4","state":"Completed"}],"rolling
UpgradeMode":"UnmonitoredAuto","upgradeDuration":"02:02:07","currentUpgradeDomainDuration":"00:00:00","unhea
lthyEvaluations":[],"currentUpgradeDomainProgress":{"upgradeDomainName":"","nodeProgressList":[]},"startTime
stampUtc":"2018-05-17T03:13:16.4152077Z","failureTimestampUtc":"2018-05-17T05:13:23.574452Z","failureReason"
:"UpgradeDomainTimeout","upgradeDomainProgressAtFailure":{"upgradeDomainName":"1","nodeProgressList":[{"node
Name":"_mstarsf10_1","upgradePhase":"PreUpgradeSafetyCheck","pendingSafetyChecks":[{"kind":"EnsureSeedNodeQu
orum"}]}]}}'.
Any ideas on what I can do about an "EnsureSeedNodeQuorum" error?
The root cause was that there were only 3 seed nodes in the cluster, a result of the cluster being built with a VM scale set that had "overprovision" set to true. Lesson learned: remember to set "overprovision" to false.
I ended up deleting the cluster and scale set and recreated using my stored ARM template.
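For anyone trying to read a failure like the one above, the interesting part of the upgrade progress is the upgradeDomainProgressAtFailure section, which names the node, the phase and the pending safety check. Here is a rough Python sketch that pulls those out, assuming the JSON has first been rejoined into one string (the console wraps it mid-token). The function name is made up for illustration, and the sample is a trimmed copy of the payload from the question.

```python
import json


def pending_safety_checks(progress_json: str) -> list:
    """Return (node name, upgrade phase, [safety check kinds]) per node at failure."""
    progress = json.loads(progress_json)
    at_failure = progress.get("upgradeDomainProgressAtFailure", {})
    results = []
    for node in at_failure.get("nodeProgressList", []):
        kinds = [check.get("kind", "") for check in node.get("pendingSafetyChecks", [])]
        results.append((node.get("nodeName", ""), node.get("upgradePhase", ""), kinds))
    return results


# Trimmed copy of the upgrade progress from the error message above.
sample = (
    '{"failureReason": "UpgradeDomainTimeout", '
    '"upgradeDomainProgressAtFailure": {"upgradeDomainName": "1", '
    '"nodeProgressList": [{"nodeName": "_mstarsf10_1", '
    '"upgradePhase": "PreUpgradeSafetyCheck", '
    '"pendingSafetyChecks": [{"kind": "EnsureSeedNodeQuorum"}]}]}}'
)

for node_name, phase, checks in pending_safety_checks(sample):
    print(node_name, phase, checks)
```

In this case it shows node _mstarsf10_1 stuck in PreUpgradeSafetyCheck waiting on EnsureSeedNodeQuorum, which matches the seed node explanation.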

Upload speed publish Service Fabric application from Azure VM to Azure cluster

I am trying to publish my Service Fabric application from an Azure VM, as was suggested here: Operation timed out publishing Service Fabric application to Azure
The Azure VM is created in the same datacenter as my Service Fabric cluster, but for some reason I am only getting upload speeds of around 200 Kbps.
With the hard-coded 10 minutes timeout in the publish script in Visual Studio, this is not enough to get my application published.
Are there any suggestions on how I might increase my upload speed?
Since version 2.5.216 of the Service Fabric SDK, you have the ability to compress the package prior to sending.
Add the following line to the PublishProfiles\Cloud.xml file to enable compression (and change the timeout from 10 minutes to 60 minutes if you want to):
<CopyPackageParameters CopyPackageTimeoutSec="3600" CompressPackage="true" />
See this lengthy discussion
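To put the numbers from the question in perspective, here is a back-of-the-envelope Python calculation of how much data fits through the link before the timeout. It assumes "200 Kbps" means kilobits per second (if kilobytes were meant, multiply the results by 8) and it ignores protocol overhead. The function name is just for illustration.

```python
def max_upload_megabytes(rate_kbps: float, timeout_sec: float) -> float:
    """Largest payload in MB (10^6 bytes) that fits in the timeout at rate_kbps kilobits/s."""
    total_bits = rate_kbps * 1000 * timeout_sec
    return total_bits / 8 / 1_000_000


print(max_upload_megabytes(200, 600))   # default 10 minute timeout
print(max_upload_megabytes(200, 3600))  # CopyPackageTimeoutSec="3600"
```

Under those assumptions, roughly 15 MB fits in the default 10 minutes and roughly 90 MB in 60 minutes, so both the longer timeout and compression help for anything but a small package.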

Debug Service Fabric DNX/asp.net 5 Stateless Service in Azure Cluster

I have published my dnx/Web Service Fabric stateless service to the local cluster and it works. When I publish to the cloud (carefully setting up the correct ports), it does not start correctly. The error is the usual "partition is below target replica or instance count".
My suspicion is that dnx is not installed by default on the cluster VMs. Any way to get around that? I don't appear to get a login to those VMs so I can install asp.net 5 manually.
Found the issue - it was not DNX.
I set up a new cluster and was able to log in. There are 22304 error messages saying that my second non-dnx stateless service which is in the same application package is causing this event:
.NET Runtime version : 4.0.30319.34014 - This application could not be started.This application requires one of the following versions of the .NET Framework:
.NETFramework,Version=v4.5.2
Do you want to install this .NET Framework version now?
I'll figure out how to target correctly.

Service Fabric stateful service no longer replicates

FURTHER UPDATE: this error has not occurred since the November update.
EDIT: you may want to read this if your stateful service stops working for no apparent reason. The typical sign, using a WordCount-like app for example, is that the service deployment reports that one partition is remaining and gives up after 5 tries. The stateless service starts OK. The diagnostics report multiple "Constructed instance of type WordCountService" messages. If you see this, then you may have the same problem I have. No amount of uninstalling VS/SF/Azure SDKs helps. I now use a VM template with VS/Azure/SF installed and just delete and recreate it each time this error occurs (it is rare but has happened several times). I assume MSFT is aware and fixing it for the beta.
ORIGINAL:
Summary question: Is there a way to reset Service Fabric completely?
Background: I have a stateful/stateless app service based on Wordcount example. All of a sudden, after deployment the app no longer replicates the stateful service (1 instance, 2 replicas). The stateless service is deployed ok (one instance, no replicas).
The partition status of the primary partition is reporting "Partition is below target replica or instance count". The replica status is "InBuild" for replicas, Primary is OK.
On the primary node, there is a warning: "Replica had multiple failures during open. Error = -2147024894".
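The number in that warning looks like a signed 32-bit HRESULT. Treating it that way (an assumption, since the warning does not say so), it decodes to 0x80070002: facility 7 (Win32) with code 2, which is ERROR_FILE_NOT_FOUND. That would suggest the replica could not find some file it needed while opening. A small Python sketch of the decoding, with an illustrative function name:

```python
def decode_hresult(value: int) -> dict:
    """Split a signed 32-bit HRESULT into its hex form, failure bit, facility and code."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # top bit set means failure
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility field
        "code": unsigned & 0xFFFF,             # low 16 bits: the error code
    }


print(decode_hresult(-2147024894))
```

Which file is missing is not something the code can tell; the event log around the warning would be the place to look.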
I have tried cleaning the cluster, uninstalling/reinstalling service fabric, deleting the SfDevCluster directory entirely etc.
If I copy the exact code to another computer with service fabric installed, it works (and I mean copy/paste the whole solution directory).
I had a similar problem last week, but that time it caused the host service not to start. I tried uninstalling/reinstalling, cleaning, removing the SDKs, removing Visual Studio, etc. The only thing that fixed it was a reinstall of Windows.