Staging slot and vip-swap - azure-service-fabric

Coming from the classic Cloud Service model, after having used it for 5 years now, we are very used to the concept of a staging slot and the VIP swap capability. Yes, this upgrade model has many warts, but also many benefits.
Clearly, Service Fabric (SF) doesn't expose this model. So I wonder: was it just not a popular model in Cloud Services, or does it really not make sense 6 years later?
Is this one of those paradigm changes where I just have to rethink how we deploy and forge ahead with the newly prescribed model (rolling upgrades)? Or are there known techniques for setting up something like staging slots with SF?
Looking for advice...

VIP swaps don't make sense for stateful compute, and Service Fabric is largely a stateful compute platform (even if you only use stateless services, the system services themselves are stateful). If your services have your data in them, you have to do a rolling upgrade if you want to keep your data and keep it consistent.
So yeah, it's a paradigm change, but a good one. It encourages continuous delivery and frequent upgrades because upgrades are integrated right into the platform and don't cost you anything extra. You don't need to pay for staging VMs, which can get expensive for large deployments; that expense might even discourage continuous delivery.
Now, you can do something similar to a staging deployment for stateless services. In Service Fabric, your "deployments" are applications, not VMs. So you can create an instance of a new application version side-by-side with an instance of the previous application version and route your traffic however you want, whether that means gradually moving users to the instance of the new version or just flipping a switch and sending all your traffic to the new version at once. This of course doesn't work for stateful services, because all of your data is still in the previous version's application instance.
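A minimal sketch of the routing decision for two side-by-side stateless application instances, assuming your own front end (gateway or reverse proxy) chooses the target per request. The application names and the `TrafficSplit` class are hypothetical; this is not a Service Fabric API. Setting a fractional weight models a gradual move, and `flip()` models the all-at-once switch.

```python
import random
from dataclasses import dataclass


@dataclass
class TrafficSplit:
    """Chooses between an old and a new application instance per request.

    Hypothetical helper for a custom gateway; app names are illustrative only.
    """
    old_app: str
    new_app: str
    new_weight: float = 0.0  # fraction of traffic sent to the new version (0.0 to 1.0)

    def set_weight(self, weight: float) -> None:
        # Gradual move: e.g. 0.05, then 0.25, then 1.0 as confidence grows.
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        self.new_weight = weight

    def flip(self) -> None:
        # "Flip a switch": send all traffic to the new version at once.
        self.new_weight = 1.0

    def pick(self, rng: random.Random) -> str:
        # rng.random() is in [0, 1), so weight 0.0 never picks new and 1.0 always does.
        return self.new_app if rng.random() < self.new_weight else self.old_app
```

Rolling back is just setting the weight to 0.0 again, since the old instance keeps running until you delete it. As noted above, this only makes sense for stateless services.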

Related

How do you track your current deployments?

Imagine an application consisting of a bunch of microservices. All of these microservices can be developed and deployed completely independently of each other. Each microservice can be "described" by several attributes, e.g. current API version, release version, commit hash, etc. Along with that, there are several environments used in the development process, e.g. a Testing environment (often called Sandbox), a Staging environment, a Pre-Release environment and, obviously, a Production environment.
Is there a convenient tool/way/approach to track, basically, which attributes (versions, commit hashes) are currently deployed to which environment? For instance, quick access to information like "what is the current version of the RESTful API in the Pre-Release environment?" Or a more complex one: "what was this version two months ago?" And of course, seeing the "global picture" as well?
To my knowledge, there's no ready-to-use solution on the market yet.
Some teams are using GitOps (https://www.twistlock.com/2018/08/06/gitops-101-gitops-use/) to get ahead of the chaos that a large number of independent microservices usually brings.
Another technology, in a somewhat different yet related direction, is the service mesh; Istio (https://istio.io/) is one example.
There are also testing approaches, such as contract testing or heavy integration testing, that are more expensive but also provide more confidence.
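Whatever tool you use, the questions in the post reduce to an append-only log of deployments that can be queried "as of" a point in time. The sketch below assumes you record one event per deployment; the class and field names are hypothetical. With GitOps, the git history of the environment repository plays this role, and these queries become lookups at a given commit or date.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Deployment:
    service: str
    environment: str
    version: str
    commit: str
    deployed_at: datetime


class DeploymentLog:
    """Append-only record of deployments, queryable by point in time (hypothetical)."""

    def __init__(self) -> None:
        # (service, environment) -> deployments sorted by deployed_at
        self._history = defaultdict(list)

    def record(self, d: Deployment) -> None:
        entries = self._history[(d.service, d.environment)]
        entries.append(d)
        entries.sort(key=lambda e: e.deployed_at)

    def current(self, service: str, environment: str):
        entries = self._history.get((service, environment), [])
        return entries[-1] if entries else None

    def as_of(self, service: str, environment: str, when: datetime):
        # "What was deployed two months ago?": last deployment at or before `when`.
        entries = self._history.get((service, environment), [])
        i = bisect_right([e.deployed_at for e in entries], when)
        return entries[i - 1] if i else None

    def snapshot(self) -> dict:
        # The "global picture": latest version of every service in every environment.
        return {key: entries[-1].version for key, entries in self._history.items() if entries}
```

For example, after recording `orders-api` 1.4.0 in January and 1.5.0 in March for `pre-release`, `as_of(..., datetime(2024, 2, 1))` returns the 1.4.0 entry. The service names and dates are illustrative.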

Version Management in Large SOA/Microservice Architectures

We are about to embark on a large programme of work to migrate a small number of hugely monolithic 3-tier frameworks into a SOA/microservice architecture. However, there is one thing I haven't really managed to nail down: version management (note the use of the word management, not control).
One of the core principles of this programme is that each component is absolutely independent, and is therefore designed, developed, built, versioned, deployed, operated, monitored and deprecated independently of all other consumers and services. This is the right principle, and it means that the future holds 15+ clients and 50+ services. In operation, we need to know all the dependencies quickly and very reliably. In a world where a service may have 3 or 4 versions of its API in production and a consumer may use 20+ services, the dependency tree very quickly becomes large and complex.
So my question is how do you guys manage this? How do you maintain your "enterprise version matrix" (if that is even the correct terminology)?
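One way to think about the "enterprise version matrix" is as a graph from each consumer to the (service, API version) pairs it calls. The minimal sketch below assumes each consumer declares its dependencies somewhere you can collect them (for example in its build or deployment metadata); the class and all names are hypothetical. It answers the operational questions: who still uses this API version, can it be retired, and what is transitively affected if a service changes.

```python
from collections import defaultdict


class DependencyMatrix:
    """Consumer -> {(service, api_version)} edges; services can be consumers too."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def declare(self, consumer: str, service: str, api_version: str) -> None:
        self._edges[consumer].add((service, api_version))

    def dependents_of(self, service: str, api_version: str) -> set:
        # Consumers still calling this exact API version.
        return {c for c, deps in self._edges.items() if (service, api_version) in deps}

    def versions_in_use(self, service: str) -> set:
        return {v for deps in self._edges.values() for (s, v) in deps if s == service}

    def can_retire(self, service: str, api_version: str) -> bool:
        # Safe to deprecate only when no consumer depends on it any more.
        return not self.dependents_of(service, api_version)

    def impacted_by(self, service: str) -> set:
        # Everything that directly or transitively depends on any version of `service`.
        impacted, frontier = set(), [service]
        while frontier:
            current = frontier.pop()
            for consumer, deps in self._edges.items():
                if consumer not in impacted and any(s == current for s, _ in deps):
                    impacted.add(consumer)
                    frontier.append(consumer)
        return impacted
```

Keeping the edges at the granularity of API version (not release version) matches the post's scenario of several API versions of one service running in production at once.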

Should actors/services be split into multiple projects?

I'm testing out Azure Service Fabric and have started adding a lot of actors and services to the same project. Is this okay, or will I lose any Service Fabric features such as failover, scalability, etc.?
My preference here is clearly 1 actor/1 service = 1 project. The big win with a platform like this is that it allows you to write proper microservice-oriented applications at close to no cost, at least compared to the implementation overhead you have when doing similar implementations on other, somewhat similar platforms.
I think it defeats the point of an architecture like this to build services or actors that span multiple concerns. It makes sense (to me at least) to use these imaginary constraints to force you to keep the area of responsibility of these services as small as possible, and rather depend on/call other services in order to provide functionality outside of the responsibility of the project you are currently implementing.
In regards to scaling, it seems you'll still be able to scale your services/actors independently even though they are part of the same project; at least that's implied by the application manifest format. What you will not be able to do, though, is update services/actors within your project independently. For example, if your project has two different actors and you change one of them, you will still need to deploy an update to both of them, since they are part of the same code package and share a version number.
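The upgrade consequence described above can be modelled in a few lines. This toy sketch assumes a simple mapping from project (code package) to the services it hosts; it does not parse a real Service Fabric manifest, and all names are hypothetical. Because everything in one package shares a version, a change to any member drags the whole package into the upgrade.

```python
def services_to_redeploy(packages: dict, changed: set) -> set:
    """Return every service that must be upgraded when `changed` services change.

    `packages` maps a (hypothetical) project/code package name to the set of
    services or actors it hosts; all members share one version number.
    """
    result = set()
    for services in packages.values():
        if services & changed:   # the package contains at least one changed service
            result |= services   # so every service in it gets the new version
    return result
```

With `{"ActorsProject": {"CartActor", "OrderActor"}}`, changing only `CartActor` still redeploys both actors; with one project per actor, only `CartActor` is redeployed.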

Canary release strategy vs. Blue/Green

My understanding of a canary release is that it's a partial release to a subset of production nodes with sticky sessions turned on. That way you can control and minimize the number of users/customers that get impacted if you end up releasing a bad bug.
My understanding of a blue/green release is that you have 2 mirrored production environments ("blue" and "green"), and you push changes out to all the nodes of either blue or green at once, and then use networking magic to control which environment users are routed to via DNS.
So, before I begin, if anything I have said so far is incorrect, please begin by correcting me!
Assuming I'm more or less on track, then a couple of questions about the two strategies:
Are there scenarios where canary is preferred over blue/green, and vice versa?
Are there scenarios where a deployment model can implement both strategies at the same time?
I have written a detailed essay on this topic here: http://blog.itaysk.com/2017/11/20/deployment-strategies-defined
In my opinion, the difference is whether or not the new 'green' version is exposed to real users. If it is, then I'd call it Canary. A common way to implement Canary is regular Blue/Green with the addition of smart routing of specific users to the new version. Read the post for a detailed comparison.
Blue-green releasing is simpler and faster.
You can do a blue-green release if you've tested the new version in a testing environment and are very certain that the new version will function correctly in production. Always using feature toggles is a good way to increase your confidence in a new version, since the new version functions exactly like the old until someone flips a feature toggle. Breaking your application into small, independently releasable services is another, since there is less to test and less that can break.
You need to do a canary release if you're not completely certain that the new version will function correctly in production. Even if you are a thorough tester, the Internet is a large and complex place and is always coming up with unexpected challenges. Even if you use feature toggles, one might be implemented incorrectly.
Deployment automation takes effort, so most organizations will plan to use one strategy or the other every time.
So do blue-green deployment if you're committed to practices that allow you to be confident in doing so. Otherwise, send out the canary.
The essence of blue-green is deploying all at once and the essence of canary deployment is deploying incrementally, so given a single pool of users I can't think of a process that I would describe as doing both at the same time. If you had multiple independent pools of users, e.g. using different regional data centers, you could do blue-green within each data center and canary across data centers. Although if you didn't need canary deployment within a data center, you probably wouldn't need it across data centers.
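The multi-data-center combination in the previous paragraph can be sketched as a loop: each region is switched blue to green in one step, and regions are done one at a time, stopping at the first unhealthy one. The three callables are hypothetical hooks into your own infrastructure; the region names in the usage note are illustrative.

```python
from typing import Callable


def rollout(regions: list,
            switch_region: Callable[[str], None],
            region_healthy: Callable[[str], bool],
            rollback_region: Callable[[str], None]) -> list:
    """Canary across regions, blue-green within each region.

    Returns the regions that ended up on green. Earlier healthy regions are
    left on green; a stricter process might roll those back as well.
    """
    done = []
    for region in regions:
        switch_region(region)           # blue-green: all of this region's traffic at once
        if not region_healthy(region):  # canary check before moving to the next region
            rollback_region(region)     # flip this region back to blue
            break
        done.append(region)
    return done
```

For example, with regions `["eu-west", "us-east", "asia"]` where `us-east` fails its health check, only `eu-west` stays on green and `asia` is never touched.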
Although these two terms look quite close to each other, they have subtle differences. One gives you confidence in the functionality you release; the other gives you confidence in the way you release it.
Canary
The canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure.
It is about getting an idea of how the new version will perform (integration with other apps, CPU, memory, disk usage, etc.).
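Getting "an idea of how the new version will perform" usually means comparing the canary's metrics against the baseline (old version) and deciding whether to promote or roll back. The sketch below is one simple way to do that comparison; the metric set and the threshold values are assumptions chosen for illustration, not recommended defaults.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def judge_canary(baseline: Metrics, canary: Metrics,
                 max_error_ratio: float = 1.5,
                 max_latency_ratio: float = 1.2,
                 min_requests: int = 500) -> str:
    """Return "wait", "rollback" or "promote"; thresholds are illustrative."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic on the canary to judge yet
    # Small floors avoid dividing by zero when the baseline has no errors or latency data.
    baseline_err = max(baseline.error_rate, 0.001)
    baseline_p95 = max(baseline.p95_latency_ms, 1.0)
    if canary.error_rate / baseline_err > max_error_ratio:
        return "rollback"
    if canary.p95_latency_ms / baseline_p95 > max_latency_ratio:
        return "rollback"
    return "promote"
```

Resource metrics such as CPU, memory or disk usage could be added as further ratio checks in the same way.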
Blue/Green:
It is more about predictable releases with zero-downtime deployment.
Easy rollbacks in case of failure.
Completely automated deployment process
Here are some short definitions:
Blue-Green Deployment - When deploying a new version of an application, a second environment is created. Once the new environment is tested, it takes over from the old version. The old environment can then be turned off.
A/B Testing - Two versions of an application are running at the same time. A portion of requests go to each. Developers can then compare the versions.
Canary Release - A new version of a microservice is started along with the old versions. That new version can then take a portion of the requests and the team can test how this new version interacts with the overall system.
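For A/B testing (and for canaries that should keep each user on one version), a common approach is to assign users to variants deterministically by hashing their ID, so a given user always lands in the same bucket. This is a minimal sketch of that technique; the experiment name, variant names and split are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, split: dict) -> str:
    """Deterministically assign a user to a variant.

    `split` maps variant name -> fraction of users; fractions should sum to 1.
    Hashing (experiment, user_id) keeps each user on the same variant, while
    different experiments bucket users independently.
    """
    if not split:
        raise ValueError("split must contain at least one variant")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    point = int.from_bytes(digest[:6], "big") / 2**48  # 48 bits -> exact value in [0, 1)
    cumulative = 0.0
    for variant, fraction in split.items():
        cumulative += fraction
        if point < cumulative:
            return variant
    return variant  # guards against fractions summing to slightly less than 1
```

For example, `assign_variant("user-42", "checkout-redesign", {"A": 0.5, "B": 0.5})` always returns the same variant for that user, so the two versions can be compared on stable groups.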
A good start on definitions.
I think it also helps in deciding on your strategy if you split your "release" definition into "deploy" and "release (functionality)".
Deploy (binaries)
The action of binary deployment of your product to a (production) system.
Release (functionality)
The action of managing availability of functionality to (groups of) users.
Why? You typically have (at least) two concerns when "releasing":
1) Bugs, backwards compatibility, etc.
2) Verifying the validity/usability of new features
Then ask yourselves, before choosing a Canary, Blue/Green, or some gray/mixed-mode strategy: what concern(s) do we have when releasing/deploying the new version? Only once you know your concerns should you choose your strategy.
Additionally, it is possible to do more complex Deploy/Release strategies.
E.g., in some clouds/infrastructures it is possible to have multiple production servers, route load in different proportions to different servers and versions of your product, and monitor soundness before scaling a release/deploy up to all users.
Feature flagging
The action of "configuring" (cold, or even hot) which functionality is (not) available to which (groups of) users
If you also do something like "feature flagging" you can deploy first, measure soundness of your release in backwards compatibility/bug perspective, and release new functionality gradually to different users, or vice versa (scale down or even rollback functionality and/or binaries).
Feature flagging allows for splitting availability of functionality from deployment of binaries, and gives much more fine-grained decision making than just "deploy/rollback".
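The deploy/release split can be illustrated with a tiny in-memory flag store: the new code path ships with the binaries but stays dark until a flag enables it for certain groups, and turning the flag off "rolls back" the functionality without redeploying. This is a hypothetical sketch; a real system would load flags from configuration ("cold") or a flag service ("hot").

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Flag:
    enabled: bool = False                          # global kill switch
    groups: set = field(default_factory=set)       # user groups that get the feature


class FeatureFlags:
    """In-memory flag store; names and structure are hypothetical."""

    def __init__(self) -> None:
        self._flags = {}

    def configure(self, name: str, enabled: bool, groups: Iterable[str] = ()) -> None:
        self._flags[name] = Flag(enabled, set(groups))

    def is_on(self, name: str, user_groups: set) -> bool:
        flag = self._flags.get(name)
        # Unknown flags default to off, so freshly deployed code stays dark until released.
        return bool(flag and flag.enabled and flag.groups & user_groups)
```

A typical sequence: deploy with `new-checkout` unconfigured (off for everyone), then `configure("new-checkout", True, {"beta"})` to release to a test group, widen the groups, or `configure("new-checkout", False)` to withdraw the feature while keeping the new binaries.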
May, 2022 Update:
The difference between Blue-Green Deployment (Blue-Green Release) and Canary Deployment (Canary Release) is:
Blue-Green Deployment is quick
Canary Deployment is gradual
Blue-Green Deployment:
There are two environments: a Blue environment, which is "old" and contains one or more applications (instances or containers), and a Green environment, which is "new" and contains one or more applications (instances or containers).
Then 100% of traffic is quickly switched from the Blue environment to the Green environment all at once, so you can say Blue-Green Deployment is the quick way of doing Canary Deployment.
[Figure: Blue-Green Deployment diagram, originally created by Encora, from https://www.encora.com/insights/zero-downtime-deployment-techniques-blue-green-deployments]
Canary Deployment:
There are two environments: a Blue environment, which is "old" and contains one or more applications (instances or containers), and a Green environment, which is "new" and contains one or more applications (instances or containers).
Then 100% of traffic is gradually switched from the Blue environment to the Green environment over a longer time (30 minutes, hours, or days) than in Blue-Green Deployment, so you can say Canary Deployment is the gradual way of doing Blue-Green Deployment.
[Figure: Canary Deployment diagram, originally created by Encora, from https://www.encora.com/insights/zero-downtime-deployment-techniques-canary-deployments]
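Under this definition, the two strategies differ only in how many steps the Blue-to-Green traffic shift takes. The sketch below generates a schedule of (minute, percent of traffic on Green) pairs: one step is a Blue-Green switch, several steps are a canary. Evenly spaced steps are an assumption made for simplicity; real rollouts often use uneven steps such as 1%, 5%, 25%, 100%.

```python
def traffic_schedule(steps: int, interval_minutes: int) -> list:
    """Return (minute, percent_to_green) pairs for shifting traffic Blue -> Green.

    steps == 1 models Blue-Green (a single switch to 100%);
    steps > 1 models a gradual canary with evenly spaced increments.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [(i * interval_minutes, round(100 * (i + 1) / steps)) for i in range(steps)]
```

For example, `traffic_schedule(1, 0)` gives `[(0, 100)]`, while `traffic_schedule(4, 30)` gives `[(0, 25), (30, 50), (60, 75), (90, 100)]`, i.e. a 90-minute canary.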

Best practices for deployments on a 24x7 ASP.NET system

We have built an enterprise web application on the ASP.NET platform, which is well load-balanced across several servers. We are struggling a bit with regular deployments, as the application has an SLA of zero downtime.
Any guidance or tips on implementing best practices to support uninterrupted deployment would be highly appreciated.
My two favorite books that cover some of these topics are Continuous Delivery by Humble/Farley and Web Operations by Allspaw/Robbins.
I think the "easy" part here is to do a rolling deployment where you pull a node out of the load balancer, upgrade it, run smoke tests, and place it back in the load balancer. Different users will encounter different versions of the app, but you get zero downtime.
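The rolling deployment loop described above can be sketched as follows. The load balancer interface and the `upgrade`/`smoke_test` callables are hypothetical hooks you would wire to your own tooling; the point is the order of operations and stopping at the first failed smoke test.

```python
from typing import Callable, Iterable, Optional, Protocol


class LoadBalancer(Protocol):
    """Hypothetical interface to whatever load balancer fronts the web servers."""
    def remove(self, node: str) -> None: ...
    def add(self, node: str) -> None: ...


def rolling_deploy(
    nodes: Iterable[str],
    lb: LoadBalancer,
    upgrade: Callable[[str], None],
    smoke_test: Callable[[str], bool],
) -> tuple:
    """Upgrade nodes one at a time; returns (upgraded_nodes, failed_node_or_None)."""
    upgraded = []
    for node in nodes:
        lb.remove(node)      # take the node out of rotation before touching it
        upgrade(node)        # e.g. copy the new build and recycle the app pool
        if not smoke_test(node):
            # Stop and keep the bad node out of rotation; the other nodes are
            # still serving, so users see no downtime.
            return upgraded, node
        lb.add(node)         # healthy: put it back into rotation
        upgraded.append(node)
    return upgraded, None
```

If the loop stops partway, some nodes run the new version and some the old, which is exactly why the database has to support both at once.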
The hard part is the backend system / database that these web-apps are likely hitting. You basically need to have both old and new schemas available concurrently which is challenging. Look at techniques like the expand / contract database pattern as an approach to pulling this off.
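A miniature of the expand/contract pattern, using SQLite from Python's standard library to rename a column without breaking either application version. The `customers` table and its columns are hypothetical. In a real deployment each step ships separately: expand, then deploy code that writes both columns, then backfill, and only after every old node is gone, contract.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column next to the old one; old code keeps working.
    conn.execute("ALTER TABLE customers ADD COLUMN email_address TEXT")


def save_customer(conn: sqlite3.Connection, customer_id: int, email: str) -> None:
    # Transitional application code: write BOTH columns so old and new
    # application versions see consistent data while both are deployed.
    conn.execute(
        "INSERT OR REPLACE INTO customers (id, email, email_address) VALUES (?, ?, ?)",
        (customer_id, email, email),
    )


def backfill(conn: sqlite3.Connection) -> None:
    # Migrate rows written before the expand step.
    conn.execute("UPDATE customers SET email_address = email WHERE email_address IS NULL")


def contract(conn: sqlite3.Connection) -> None:
    # Contract: only once no deployed version reads `email` any more.
    # ALTER TABLE ... DROP COLUMN requires SQLite 3.35.0 or later.
    conn.execute("ALTER TABLE customers DROP COLUMN email")
```

Each step is backwards compatible with the version of the application that is live while it runs, which is what lets the rolling deployment above proceed without downtime.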