We are in the process of migrating our Service Fabric services to Kubernetes. Most of them were "stateless" services and were easy to migrate. However, we have one "stateful" service that uses SF's Reliable Collections pretty heavily.
K8s has Statefulsets, but that's not really comparable to SF's reliable collections.
Is there a .NET library or other solution to implement something similar to SF's Reliable Collections in K8s?
AFAIK this cannot be done by using a .Net library.
K8 is all about orchestration. SF on the other hand is both an orchestrator + rich programming /application model + state management.
If you want to do something like reliable collection in K8 then you have to either
A) built your own replication solution with leader election and all.
B) use private etcd/cockroachdb etc. store
This article is pretty good in terms of differences.
https://learn.microsoft.com/en-us/archive/blogs/azuredev/service-fabric-and-kubernetes-comparison-part-1-distributed-systems-architecture#split-brain-and-other-stateful-disasters
"Existing systems provide varying levels of support for microservices, the most prominent being Nirmata, Akka, Bluemix, Kubernetes, Mesos, and AWS Lambda [there’s a mixed bag!!]. SF is more powerful: it is the only data-ware orchestration system today for stateful microservices"
However, they don't solve the coordination problem (saving data on the primary instance will automatically replicate to the others for recovery when the primary instance dies). That's what SF reliable collections does out of the box.
StatefulSets are valuable for applications that require one or more of the following.
Stable (persistent), unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
If your application doesn't require any of these, you should deploy your application using a Deployment.
There is a good Kubernetes guide on how to Run a Replicated Stateful Application here.
This page shows how to run a replicated stateful application using a StatefulSet controller. This application is a replicated MySQL database. The example topology has a single primary server and multiple replicas, using asynchronous row-based replication.
The StatefulSet controller starts Pods one at a time, in order by their ordinal index. It waits until each Pod reports being Ready before starting the next one. And it is used to perform orderly startup of MySQL replication by running Init Containers.
Operators can help
While operators are not necessary, they can help run stateful apps on Kubernetes with features like application-level HA management, backups, and restore.
You can use existing Operators or develop your own. The operator package includes all the configurations needed to deploy and manage the application from a Kubernetes point of view – from a StatefulSet to be used to any required storage, rollout strategies, persistence and affinity configuration, and more. Kubernetes will then rely on the operator to validate instances of the application against the specification to ensure it runs in the same way across instances in all clusters it is deployed in.
Some DB operators:
You can deploy MySQL database using Kubernetes operator developed by Oracle (currently is in a preview state):
https://github.com/mysql/mysql-operator
There’s also a PostgreSQL operator by Crunchydata to deploy PostgreSQL to Kubernetes:
https://github.com/CrunchyData/postgres-operator
MongoDB owns an operator to deploy MongoDB Enterprise to a Kubernetes cluster:
https://github.com/mongodb/mongodb-enterprise-kubernetes
You can find ready-made operators on OperatorHub.io to suit your use case.
Related
In the Kubernetes world, a typical/classic pattern is using Deployment for Stateless Applications and using StatefulSet for a stateful application.
I am using a vendor product (Ping Access) which is meant to be a stateless application (it plays the role of a Proxy in front of other Ping products such as Ping Federate).
The github repo for Ping Cloud (where they run these components as containers) shows them running Ping Access (a stateless application) as a Stateful Set.
I am reaching out to their support team to understand why anyone would run a Stateless application as a StatefulSet.
Are there other examples of such usage (as this appears strange/bizarre IMHO)?
I also observed a scenario where a customer is using a StatefulApp (Ping Federate) as a regular deployment instead of hosting them as a StatefulSet.
The Ping Cloud repository does build and deploy Ping Federate as a StatefulSet.
Honestly, both these usages, running a stateless app as a StatefulSet (Ping Access) and running a stateful app as a deployment (Ping Federate) sound like classic anti-patterns.
Apart from the ability to attach dedicated Volumes to StatefulSets you get the following features of which some might be useful for stateless applications:
Ordered startup and shutdown of Pods with K8s doing them one by one in an ordered fashion.
Possibility to guarantee that not more than a single Pod is running at a time even during unscheduled Pod restarts.
Stable DNS names for Pods.
I can only speculate, why Ping Federate uses a StatefulSet. Possibly, it has to do with access limitations of the downstream services it connects to.
The consumption of PingAccess is stateless, but the operation is very much stateful. Namely, the PingAccess admin console maintains a database for configuration, and part of that configuration includes clustered engine mapping and session keys.
Thus, if you were to take away the persistent volume, restarting the admin console would decouple all the engines in the cluster. Then the engines no longer receive configuration.. and web session keys would be mismatched.
The ping-cloud-base repo uses StatefulSet for engines not for persistent volumes, but for sts naming scheme. I personally disagree with this and recommend using Deployment for engines. The only downside is you then have to remove orphaned engines from the admin configuration. Orphaned engines meaning engine config that stays in the admin console db after the engine deployment is rolled/updated. These can be removed from the admin UI, or API. Pretty easy to script in a hook.
It would be ideal for an application that is not a datastore to run without persistent volume, but for the reasons mentioned above, the PingAccess admin console does require and act like a persistent datastore so I think StatefulSet is okay.
Finally, the Ping DevOps team focuses support on their helm chart (where engines are also deployments by default). I'd suspect the community and enterprise support is much larger there for folks deploying on their own. ping-cloud-base is a good place to pick up some hooks though.
I deployed this week a Redis instance using Bitnami's Helm Chart into a GKE (Google Kubernetes Engine) cluster. Although I've been successful on this part, the challenge now is to make a failover disaster recovery strategy that replicates the data to another Redis instance in another GKE cluster (but same GCP project). How can I do this? I tried Persistence Volume Claims but they are only visible inside the cluster.
Redis Enterprise does have a WAN (multi-geo) replication mode, however I've never used it and it looks like it dramatically limits which features of Redis you can use to things that are compatible with a CRDT model. Basically first you need to answer how you would do this by hand, and then investigate automating that. WAN failover is a very complex thing, and generally you wouldn't even want to do it since you wouldn't fail over at the Redis level. Instead you would fail over your entire DC (or whatever you want to call your failure zones). Distributed database modelling and management is very tricky, here be dragons.
I don't believe that it is a good idea to host a cluster of message queues that persist messages inside Kubernetes.
I don't think hosting any cluster inside kubernetes nor any stateful system inside Kubernetes is a good idea.
Kubernetes is designed for stateless restful services, to scale in and out, etc.
First of all, am I right?
If I am right, then
I just need some good valid points to convince some of my colleagues.
Thanks
Kubernetes was primarily designed for stateless applications, and managing state in Kubernetes introduces additional complexities that you have to manage (persistence, backups, recovery, replication, high availability, etc.).
Let's cite some sources (emphases added):
While it’s perfectly possible to run stateful workloads like databases in Kubernetes with enterprise-grade reliability, it requires a large investment of time and engineering that it may not make sense for your company to make [...] It’s usually more cost-effective to use managed services instead.
Cloud Native DevOps with Kubernetes, p. 13, O'Reilly, 2019.
The decision to use Statefulsets should be taken judiciously because usually stateful applications require much deeper management that the orchestrator cannot really manage well yet.
Kubernetes Best Practices, Chapter 16, O'Reilly, 2019.
But...
Support for running stateful applications in Kubernetes is steadily increasing, the main tools being StatefulSets and Operators:
Stateful applications require much more due diligence, but the reality of running them in clusters has been accelerated by the introduction of StatefulSets and Operators.
Kubernetes Best Practices, Chapter 16, O'Reilly, 2019.
Managing stateful applications such as database systems in Kubernetes is still a complex distributed system and needs to be carefully orchestrated using the native Kubernetes primitives of pods, ReplicaSets, Deployments, and StatefulSets, but using Operators that have specific application knowledge built into them as Kubernetes-native APIs may help to elevate these systems into production-based clusters.
Kubernetes Best Practices, Chapter 16, O'Reilly, 2019.
Conclusion
As a best practice at the time of this writing, I would say:
Avoid managing state inside Kubernetes if you can. Use external services (e.g. cloud services, like DynamoDB or CloudAMQP) for managing state.
If you have to manage state inside the Kubernetes cluster, check if an Operator exists exists for the type of application that you want to run, and if yes, use it.
I would like to know if it is possible for multiple pods in the same Kubernetes cluster to access a database which is configured using persistent volumes on a Google cloud persistent disk.
Currently I am building a microservices achitecture web app which has 3 node apis in different pods all accessing the same database. So how do I achieve this with kubernetes.
Kindly let me know if my architecture is right as well
You can certainly connect multiple node-based app pods to the same database. It is sometimes said that microservices shouldn't share a database but this depends on what your apps are doing, the project history and the extent to which you want the parts to be worked on separately.
There are questions you have to answer about running databases at scale, such as your future load and whether you want to use relational databases if you're going to try to span availability zones. And there are
some specific to kubernetes, especially around how you associate DB Pods to data. See https://stackoverflow.com/a/53980021/9705485. Another popular option is to use a managed DB service from a cloud provider. If you do run the DB in k8s then I'd suggest looking for a helm chart or looking at an operator, such as the kubeDB operator, to avoid crafting the kubernetes descriptors yourself and to get more guidance on running the DB and setting it up.
If it's a new project and you've not used k8s before then you'll also have to decide where to host your code, your docker images and your deployment descriptors and how to setup your CI pipelines. If you've not got answers to these questions already then I'd suggest looking at Jenkins-X as it will provide you with out of the box defaults for a whole cluster and CI setup and a template ('build pack') for building node apps and deploying them to staging and prod environments through a pipeline.
When using Rubbernecks do I need to still use Containers for Oracle?
Here is my case, I am designing microservice architecture for my application using kubernetes.
I have deployed all my front-end, middleware and back-end services in different containers.
But for the data layer, I am not sure is it the right way to deploy Oracle in a container, as the database cannot be scaled horizontally. And if I want to scale the data container vertically in Kubernetes, How can I do that?
Generally how this is handled in the Kubernetes world?
You really need to come to terms with what you want/expect.
You can run a database on kube, specialy since StatefulSets. You might want to look into PersistentVolumes and PersistentVolumeClaims as well.
You can connect from kube services to an off cluster database managed in traditional way.
Which way you choose depends largely on what are you projects constraints and team experience or expected solution portability between envs and level of automation you want/have around it.