I am trying to deploy Kong in GKE as per the documentation https://github.com/Kong/kong-dist-kubernetes
I noticed Cassandra is provided as a StatefulSet but Postgres as a ReplicationController. Can someone explain the difference? Also, can anyone suggest how to choose between the two?
ReplicationControllers predate StatefulSets. They were the original way to manage your pod replicas. The newer approach is the ReplicaSet, which is what Deployments use under the hood.
StatefulSets are meant for applications that require their pods to start in an ordered way, together with some sort of data stored on disk. That makes them very suitable for master/slave datastores or ring-topology datastores like Cassandra. I would strongly recommend using StatefulSets for these types of workloads.
A StatefulSet is better for managing stateful applications (which Postgres and Cassandra definitely are) because it lets you create a PersistentVolumeClaim per pod (backed by a GKE PD in your case), so your state lives on a dedicated disk. By contrast, the Postgres deployment using a ReplicationController that you linked uses an emptyDir volume, which means that if the Postgres pod is deleted by accident or failure, all data is lost, and you will probably have to re-initialize your Kong deployment (run Kong migrations, configure routes, etc.).
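As a rough sketch of the difference (names, image tag, and storage size are placeholders, not the manifests from the Kong repo), a StatefulSet with a volumeClaimTemplate gives each Postgres pod its own persistent disk instead of an emptyDir:

```yaml
# Minimal sketch of a Postgres StatefulSet with per-pod persistent storage.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres          # a matching headless Service must exist separately
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:9.6
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # one PVC is created per pod, backed by a GKE PD
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

If the pod is rescheduled, the StatefulSet reattaches the same claim, so the data survives.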
Related
We are in the process of migrating our Service Fabric services to Kubernetes. Most of them were "stateless" services and were easy to migrate. However, we have one "stateful" service that uses SF's Reliable Collections pretty heavily.
K8s has StatefulSets, but those aren't really comparable to SF's Reliable Collections.
Is there a .NET library or other solution to implement something similar to SF's Reliable Collections in K8s?
AFAIK this cannot be done by using a .NET library.
K8s is all about orchestration. SF, on the other hand, is an orchestrator plus a rich programming/application model plus state management.
If you want something like Reliable Collections in K8s, you have to either:
A) build your own replication solution, with leader election and all, or
B) use a private etcd/CockroachDB/etc. store (see the sketch below).
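For option B, here is a minimal sketch of what a private single-member etcd could look like on K8s. The image tag, names, and storage size are placeholders; a production setup would need authentication, TLS, and a multi-member cluster:

```yaml
# Sketch only: a single-member etcd to hold application state externally.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  serviceName: etcd              # assumes a matching headless Service named "etcd"
  replicas: 1
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.9
          command:
            - etcd
            - --data-dir=/var/lib/etcd
            - --listen-client-urls=http://0.0.0.0:2379
            - --advertise-client-urls=http://etcd:2379
          ports:
            - containerPort: 2379
          volumeMounts:
            - name: data
              mountPath: /var/lib/etcd
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```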
This article is pretty good in terms of differences.
https://learn.microsoft.com/en-us/archive/blogs/azuredev/service-fabric-and-kubernetes-comparison-part-1-distributed-systems-architecture#split-brain-and-other-stateful-disasters
"Existing systems provide varying levels of support for microservices, the most prominent being Nirmata, Akka, Bluemix, Kubernetes, Mesos, and AWS Lambda [there’s a mixed bag!!]. SF is more powerful: it is the only data-ware orchestration system today for stateful microservices"
However, they don't solve the coordination problem (saving data on the primary instance will automatically replicate to the others for recovery when the primary instance dies). That's what SF reliable collections does out of the box.
StatefulSets are valuable for applications that require one or more of the following.
Stable (persistent), unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
If your application doesn't require any of these, you should deploy your application using a Deployment.
There is a good Kubernetes guide on how to Run a Replicated Stateful Application here.
This page shows how to run a replicated stateful application using a StatefulSet controller. This application is a replicated MySQL database. The example topology has a single primary server and multiple replicas, using asynchronous row-based replication.
The StatefulSet controller starts Pods one at a time, in order by their ordinal index. It waits until each Pod reports being Ready before starting the next one. It also uses Init Containers to perform an orderly startup of MySQL replication.
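As a rough illustration of how that tutorial ties configuration to the ordinal index, here is a simplified fragment of the pod template in the spirit of the linked guide (the real tutorial wires this through a ConfigMap and additional init logic):

```yaml
# Simplified fragment of a StatefulSet pod template (not the full tutorial manifest):
# derive a unique MySQL server-id from the pod's ordinal index.
initContainers:
  - name: init-mysql
    image: mysql:5.7
    command:
      - bash
      - "-c"
      - |
        # Hostnames look like mysql-0, mysql-1, ...; extract the ordinal.
        [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
        ordinal=${BASH_REMATCH[1]}
        # server-id must be unique and non-zero, so offset it by 100.
        echo -e "[mysqld]\nserver-id=$((100 + ordinal))" > /mnt/conf.d/server-id.cnf
    volumeMounts:
      - name: conf               # volume shared with the mysql container
        mountPath: /mnt/conf.d
```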
Operators can help
While operators are not necessary, they can help run stateful apps on Kubernetes with features like application-level HA management, backups, and restore.
You can use existing Operators or develop your own. The operator package includes all the configuration needed to deploy and manage the application from a Kubernetes point of view – from the StatefulSet to be used, to any required storage, rollout strategies, persistence and affinity configuration, and more. Kubernetes then relies on the operator to validate instances of the application against the specification, to ensure it runs in the same way across instances in all the clusters it is deployed in.
Some DB operators:
You can deploy a MySQL database using the Kubernetes operator developed by Oracle (currently in a preview state):
https://github.com/mysql/mysql-operator
There’s also a PostgreSQL operator by Crunchydata to deploy PostgreSQL to Kubernetes:
https://github.com/CrunchyData/postgres-operator
MongoDB maintains an operator to deploy MongoDB Enterprise to a Kubernetes cluster:
https://github.com/mongodb/mongodb-enterprise-kubernetes
You can find ready-made operators on OperatorHub.io to suit your use case.
I need to deploy Grafana in a Kubernetes cluster in a way so that I can have multiple persistent volumes stay in sync - similar to what they did here.
Does anybody know how I can use the master/slave architecture so that only 1 pod writes while the others read? How would I keep them in sync? Do I need additional scripts to do that? Can I use Grafana's built-in sqlite3 database or do I have to set up a different one (MySQL, Postgres)?
There's really not a ton of documentation out there about how to deploy StatefulSet applications other than MySQL or MongoDB.
Any guidance, experience, or even so much as a simple suggestion would be a huge help. Thanks!
StatefulSets are not what you think and have nothing to do with replication. They just handle the very basics of provisioning storage for each replica.
The way you do this is, as you said, by pointing Grafana at a "real" database rather than the local SQLite one.
Once you do that, you use a Deployment because Grafana itself is then stateless like any other webapp.
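A minimal sketch of that setup (image tag, database service name, and secret are placeholders; the GF_DATABASE_* variables are Grafana's standard environment-based configuration):

```yaml
# Sketch: run Grafana as a plain Deployment once its state lives in an external database.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 3                        # safe to scale: Grafana itself is now stateless
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:10.4.2
          env:
            - name: GF_DATABASE_TYPE
              value: postgres
            - name: GF_DATABASE_HOST
              value: grafana-db:5432   # placeholder Postgres Service
            - name: GF_DATABASE_NAME
              value: grafana
            - name: GF_DATABASE_USER
              value: grafana
            - name: GF_DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-db-credentials
                  key: password
          ports:
            - containerPort: 3000
```

All replicas then read and write the same dashboards and users through the shared database, so nothing needs to be kept "in sync" between pods.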
I heard that a StatefulSet is suitable for databases.
But a StatefulSet will create a different PVC for each pod.
If I set replicas=3, then I get 3 pods and 3 different PVCs with different data.
Database users only want one database, not 3 databases.
So it seems clear we should not use a StatefulSet in this situation.
But when should we use a StatefulSet?
A StatefulSet does three big things differently from a Deployment:
It creates a new PersistentVolumeClaim for each replica;
It gives the pods sequential names, starting with statefulsetname-0; and
It starts the pods in a specific order (ascending numerically).
This is useful when the database itself knows how to replicate data between different copies of itself. In Elasticsearch, for example, indexes are broken up into shards. There are by default two copies of each shard. If you have five Pods running Elasticsearch, each one will have a different fraction of the data, but internally the database system knows how to route a request to the specific server that has the datum in question.
I'd recommend using a StatefulSet in preference to manually creating a PersistentVolumeClaim. For database workloads that can't be replicated, you can't set replicas: greater than 1 in either case, but the PVC management is valuable. You usually can't have multiple databases pointing at the same physical storage, containers or otherwise, and most types of Volumes can't be shared across Pods.
We can deploy a database to Kubernetes as a stateful application. Usually, when we deploy pods they have their own storage, but that storage is ephemeral: if the container is killed, its storage is gone with it.
So Kubernetes has an object to tackle that scenario: when we want our data to persist, we attach the pod to a PersistentVolumeClaim. By doing so, if our container is killed, the data stays in the cluster, and the new pod will access it again.
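A minimal sketch of that pattern (the claim name, size, and image are placeholders, and the password should come from a Secret in practice):

```yaml
# A standalone PersistentVolumeClaim and a pod that mounts it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: example              # demo only; use a Secret in real deployments
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # data survives pod deletion via the PVC
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data
```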
Some limitations of using StatefulSet are:
1. Storage for the pods must be provisioned by a PersistentVolume provisioner based on the requested storage class, or pre-provisioned by an admin.
2. Deleting or scaling down the replicas will not delete the volumes attached to the StatefulSet. This ensures the safety of the data.
3. StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods (see the sketch after this list).
4. A StatefulSet doesn't guarantee that all pods are deleted when the StatefulSet is deleted, unlike a Deployment, which deletes all pods associated with it when it is deleted. You have to scale the replicas down to 0 before deleting the StatefulSet.
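For the headless Service requirement in point 3, a minimal sketch (the names are placeholders and must match the StatefulSet's serviceName and pod labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None        # headless: gives each pod a stable DNS name like mysql-0.mysql
  selector:
    app: mysql
  ports:
    - port: 3306
```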
A StatefulSet is useful for running applications that store state.
A StatefulSet database runs multiple replicas of the pod, each with its own PVC; the database itself then replicates data across the pods and their PVCs.
So ideally the best option is to use a StatefulSet with multiple replicas to get an HA database.
It then depends on the use case which database you want to use, and whether it supports replication, clustering, etc.
Here is a MySQL example with replication details: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
I have a stateful application: I keep data in users' sessions (basically data in the HttpSession object), but I do not have any requirement to write anything to persistent disk.
From what I have read so far, StatefulSet workloads are meant for stateful applications, but my understanding is that even though my application is stateful, a Deployment workload can also satisfy my requirement because I do not want to write anything to persistent disks.
However, one point I am not sure about: suppose I use a Deployment workload and a lot of user data is present in my HttpSession object; now, for some reason, Kubernetes restarts my pod, and of course all that user session data is lost. So, my questions are the following:
Does a StatefulSet handle this situation any better than a Deployment workload?
Is the only difference between a Deployment workload and a StatefulSet workload the absence/presence of a persistent disk, or does a StatefulSet also do something about application session management?
Does a StatefulSet handle this situation any better than a Deployment workload?
No. Neither a Deployment nor a StatefulSet will preserve memory contents. To preserve session information, you'll need to store it somewhere; one common approach is to use Redis.
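For example, a minimal sketch of an external Redis to hold session data (names and image tag are placeholders; your application still needs to be configured, e.g. via Spring Session, to read and write sessions there):

```yaml
# Sketch: a single Redis instance plus a Service for storing session data
# outside the application pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: session-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: session-redis
  template:
    metadata:
      labels:
        app: session-redis
    spec:
      containers:
        - name: redis
          image: redis:7
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: session-redis          # the app connects to session-redis:6379
spec:
  selector:
    app: session-redis
  ports:
    - port: 6379
```

With sessions externalized, the application pods can be restarted or scaled freely without losing user session data.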
Is the only difference between a Deployment workload and a StatefulSet workload the absence/presence of a persistent disk, or does a StatefulSet also do something about application session management?
No, there are other differences:
StatefulSets create (and re-create) deterministic, consistent pod names (identifiers).
StatefulSets are deployed, scaled, and updated one by one in a deterministic, consistent order. The next pod will be created only after the previous one has reached the Running state.
Additionally, it's worth mentioning that persistent disks can be attached to pods that aren't part of a StatefulSet. It's just that it's convenient to have disks always be attached to a pod with a consistent id. For instance if you have pods running a replicated database, you can use StatefulSets to ensure that the master replica's disk is always attached to pod #1.
Edit:
Link to official documentation about StatefulSets
From the documentation:
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
...
StatefulSets are valuable for applications that require one or more of the following.
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. Deployment or ReplicaSet may be better suited to your stateless needs.
I have a MySQL database pod with 3 replicas. Now I'm making some changes in one pod (pod data, not pod configuration), say I'm adding a table. How will the change be reflected on the other replicas of the pod?
I'm using Kubernetes v1.13 with 3 worker nodes.
Pods do not sync. Think of them as independent processes.
If you want a clustered MySQL installation, the Kubernetes docs describe how to do this by using a StatefulSet: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql
In essence you have to configure master/slave instances of MySQL yourself.
Pods are independent from each other; if you modify one pod, the others will not be affected.
As per your configuration, changes applied in one pod won't be reflected on the others. These are isolated resources.
A good practice is to deploy such things using PersistentVolumeClaims and StatefulSets.
You can always find explanation with examples and best practices in Run a Replicated Stateful Application documentation.
If you have three MySQL server pods, then you have 3 independent databases, even though you created them from the same Deployment. So, depending on what you do, you might end up with a bunch of databases in the cluster.
I would create 1 MySQL pod with persistence, so if one pod dies, the next one picks up where the previous one left off and no data is lost.
If what you want is high availability, or failover replica, you would need to manage it on your own.
Generally speaking, K8s should not be used for storage purposes.
You are good to have common storage among those 3 pods (a PVC), and you should also consider a StatefulSet when running databases on k8s.