Service Fabric Application - changing instance count on application update fails - azure-service-fabric

I am building a CI/CD pipeline to release SF Stateless Application packages into clusters using parameters for everything. This is to ensure environments (DEV/UAT/PROD) can be scoped with different settings.
For example in a DEV cluster an application package may have an instance count of 3 (in a 10 node cluster)
I have noticed that if an application is in the cluster and running with an instance count (for example) of 3, and I change the deployment parameter to anything else (e.g. 5), the application package will upload and register the type, but will fail on attempting to do a rolling upgrade of the running application.
This also works the other way e.g. if the running app is -1 and you want to reduce the count on next rolling deployment.
Have I missed a setting or config somewhere, is this how it is supposed to be? At present its not lending itself to being something that is easily scaled without downtime.
At its simplest form we just want to be able to change instance counts on application updates, as we have an infrastructure-as-code approach to changes, builds and deployments for full tracking ability.
Thanks in advance

This is a common error when using Default services.
This has been already answered multiple times in these places:
Default service descriptions can not be modified as part of upgrade set EnableDefaultServicesUpgrade to true
https://blogs.msdn.microsoft.com/maheshk/2017/05/24/azure-service-fabric-error-to-allow-it-set-enabledefaultservicesupgrade-to-true/
https://github.com/Microsoft/service-fabric/issues/253#issuecomment-442074878

Related

Terminate all ECS tasks before new deployment

If an ECS task runs in multiple instances, the rolling deployment methods usually terminate tasks one by one and replace it with a newer version (newer task definition), in order to maximize uptime.
But we have some containers that may make schema upgrades on first-run (as an example), and here we never want old versions of a container to run at the same time as newer versions.
Another example is coupled data formats, that a new front-end talks to a new API. We'd rather have a moment of downtime than have old front-ends talk to new APIs, or new front-ends talk to old APIs.
ECS doesn't seem to provide a deployment strategy that terminates all instances of a task before starting to roll out new ones.
Before I roll some bash script that reduces task counts to 0 prior to an update and returns it to its original value, does anyone know if AWS provides a way to cleanly drain a service of tasks before rolling out a new task definition?
In a similar scenario I would do it as follows.
I would add my pipeline (if using a stack code) a CodeDeploy step and set the traffic to "All-at-once", this doc can start to direct you something.
In case I was unable to work with the stack code, I would create a lambda, which would be triggered by the pipeline, a step before the deploy, this lambda could interact with the ECS, resetting the tasks to receive the new version of the application, but in this situation, a downtime will be felt, of at least 2 ~ 5 minutes, depending on how your application is optimized. You can see a little more about the lib that will allow you to do this here.

Whole Application level rolling update

My kubernetes application is made of several flavors of nodes, a couple of “schedulers” which send tasks to quite a few more “worker” nodes. In order for this app to work correctly all the nodes must be of exactly the same code version.
The deployment is performed using a standard ReplicaSet and when my CICD kicks in it just does a simple rolling update. This causes a problem though since during the rolling update, nodes of different code versions co-exist for a few seconds, so a few tasks during this time get wrong results.
Ideally what I would want is that deploying a new version would create a completely new application that only communicates with itself and has time to warm its cache, then on a flick of a switch this new app would become active and start to get new client requests. The old app would remain active for a few more seconds and then shut down.
I’m using Istio sidecar for mesh communication.
Is there a standard way to do this? How is such a requirement usually handled?
I also had such a situation. Kubernetes alone cannot satisfy your requirement, I was also not able to find any tool that allows to coordinate multiple deployments together (although Flagger looks promising).
So the only way I found was by using CI/CD: Jenkins in my case. I don't have the code, but the idea is the following:
Deploy all application deployments using single Helm chart. Every Helm release name and corresponding Kubernetes labels must be based off of some sequential number, e.g. Jenkins $BUILD_NUMBER. Helm release can be named like example-app-${BUILD_NUMBER} and all deployments must have label version: $BUILD_NUMBER . Important part here is that your Services should not be a part of your Helm chart because they will be handled by Jenkins.
Start your build with detecting the current version of the app (using bash script or you can store it in ConfigMap).
Start helm install example-app-{$BUILD_NUMBER} with --atomic flag set. Atomic flag will make sure that the release is properly removed on failure. And don't delete previous version of the app yet.
Wait for Helm to complete and in case of success run kubectl set selector service/example-app version=$BUILD_NUMBER. That will instantly switch Kubernetes Service from one version to another. If you have multiple services you can issue multiple set selector commands (each command executes immediately).
Delete previous Helm release and optionally update ConfigMap with new app version.
Depending on your app you may want to run tests on non user facing Services as a part of step 4 (after Helm release succeeds).
Another good idea is to have preStop hooks on your worker pods so that they can finish their jobs before being deleted.
You should consider Blue/Green Deployment strategy

Can a ReplicaSet be configured to allow in progress updates to complete?

I currently have a kubernetes setup where we are running a decoupled drupal/gatsby app. The drupal acts as a content repository that gatsby pulls from when building. Drupal is also configured through a custom module to connect to the k8s api and patch the deployment gatsby runs under. Gatsby doesn't run persistently, instead this deployment uses gatsby as an init container to build the site so that it can then be served by a nginx container. By patching the deployment(modifying a label) a new replicaset is created which forces a new gatsby build, ultimately replacing the old build.
This seems to work well and I'm reasonably happy with it except for one aspect. There is currently an issue with the default scaling behaviour of replica sets when it comes to multiple subsequent content edits. When you make a subsequent content edit within drupal it will still contact the k8s api and patch the deployment. This results in a new replicaset being created, the original replicaset being left as is, the previous replicaset being scaled down and any pods that are currently being created(gatsby building) are killed. I can see why this is probably desirable in most situations but for me this increases the amount of time that it takes for you to be able to see these changes on the site. If multiple people are using drupal at the same time making edits this will be compounded and could become problematic.
Ideally I would like the containers that are currently building to be able to complete and for those replicasets to finish scaling up, queuing another replicaset to be created once this is completed. This would allow any updates in the first build to be deployed asap, whilst queueing up another build immediately after to include any subsequent content, and this could continue for as long as the load is there to require it and no longer. Is there any way to accomplish this?
It is the regular behavior of Kubernetes. When you update a Deployment it creates new ReplicaSet and respectively a Pod according to new settings. Kubernetes keeps some old ReplicatSets in case of possible roll-backs.
If I understand your question correctly. You cannot change this behavior, so you need to do something with architecture of your application.

meaning of parameter "mode " of set-AzureDeployment

What is the meaning of "mode" of set-AzureDeployment?
-Mode
Specifies the mode of upgrade. Supported values are: "Auto", "Manual", and "Simultaneous".
What does "Auto","Manual", and "Simultaneous" mean?
I am particularly interested in "Simultaneous". Does it mean my package will be deployed to multiple instances simultaneously?
Thanks
Mode specifies the type of update to initiate. Role instances are allocated to update domains when the service is deployed. Updates can be initiated manually in each update domain or initiated automatically in all update domains.
If not specified, the default value is Auto. If set to Manual, WalkUpgradeDomain must be called to apply the update. If set to Auto, the update is automatically applied to each update domain in sequence.
To perform an automatic update of a deployment, call Upgrade Deployment or Change Deployment Configuration with the Mode element set to automatic. The update proceeds from that point without a need for further input. You can call Get Operation Status to determine when the update is complete.
To perform a manual update, first call Upgrade Deployment with the Mode element set to manual. Next, call Walk Upgrade Domain to update each domain within the deployment. You should make sure that the operation is complete by calling Get Operation Status before updating the next domain. More information please refer to this link.
One of the new deployment options we now support is the ability to do a “Simultaneous Update” of a Cloud Service (we sometimes also refer to this as the “Blast Option”). When you use this option we bypass the normal upgrade domain walk that is done by default with Cloud Services (where we upgrade parts of the Cloud Service sequentially to avoid ever bringing the entire service down) and we instead upgrade all roles and instances simultaneously. With today’s release this simultaneous update logic now happens within Windows Azure (on the cloud side). This has the benefit of enabling the Cloud Service update to happen much faster. More information please refer to this link.
I am particularly interested in "Simultaneous". Does it mean my
package will be deployed to multiple instances simultaneously?
The answer is yes.

How can we route a request to every pod under a kubernetes service on Openshift?

We are building a Jboss BRMS application with two microservices in spring-boot, one for rule generation (SRV1) and one for rule execution (SRV2).
The idea is to generate the rules using the generation microservice (SRV1) and persist them in the database with versioning. The next part of the process is having the execution microservice load these persisted rules into each pods memory by querying the information from the shared database.
There are two following scenarios when this should happen :
When the rule execution service pod/pods starts up, it queries the db for the lastest version and every pod running the execution application loads those rules from the shared db.
The second senario is we manually want to trigger the loading of a specific version of rules on every pod running the execution application preferably via a rest call.
Which is where the problem lies!
Whenever we try and issue a rest request to the api, since it is load balanced under a kubernetes service, the request hits only one of the pods and the rest of them do not load the specific rules.
Is there a programatic or design change that may help us achieve that or is there any other way we construct our application to achieve a capability to load a certain version of rules on all pods serving the execution microservice.
The second senario is we manually want to trigger the loading of a specific version of rules on every pod running the execution application preferably via a rest call.
What about using Rolling Updates? When you want to change the version of rules to be fetched within all execution pods, tell OpenShift to do rolling update which kills/starts all your pods one by one until all pods are on the new version, thus, they fetch the specific version of rules at the startup. The trigger of Rolling Updates and the way you define the version resolution is up to you. For instance: Have an ENV var within a pod that defines the version of rules that are going to be fetched from db, then change the ENV var to a new value and perform Rollling Updates. At the end, you should end up with new set of pods, all of them fetching the version rules based on the new value of the ENV var you set.