MariaDB Server vs MariaDB Galera Cluster HA Replication - kubernetes

I am planning to deploy an HA database cluster on my Kubernetes cluster. I am new to databases and I am confused by the various database terms. I have decided on MariaDB and I have found two charts, MariaDB and MariaDB Galera Cluster.
I understand that both can achieve the same goal, but what are the main differences between the two? In which scenarios should I use one or the other?
Thanks in advance!

I'm not an expert, so take my explanation with a grain of salt (and double-check it).
The main difference between the MariaDB chart and the MariaDB Galera Cluster chart is that the first one deploys a standard master-slave (or primary-secondary) database, while the second one deploys a resilient master-master (or primary-primary) database cluster.
In more detail, this means the following:
The MariaDB chart will deploy a master StatefulSet and a slave StatefulSet, which (with default values) spawn one master Pod and two slave Pods. Once your database is up and running, you can connect to the master to write or read data; the data is then replicated to the slaves, so that you have safe copies of it available.
The copies can be used to read data, but only the master Pod can write new data to the database. Should that Pod crash, or the Kubernetes cluster node where it is running malfunction, you will not be able to write new data until the master Pod is up and running again (which may require manual intervention), or until you perform a failover and promote one of the other Pods to be the new temporary master (which also requires manual intervention, or some setup with proxies, virtual IPs and so on).
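As a rough illustration, with the Bitnami MariaDB chart this primary/secondary layout is usually selected through the chart values. A minimal sketch, assuming a recent chart version (the key names may differ between versions, so check the chart's values.yaml):

    # helm repo add bitnami https://charts.bitnami.com/bitnami
    # helm install my-db bitnami/mariadb -f values.yaml
    #
    # values.yaml (key names assumed from a recent Bitnami chart)
    architecture: replication   # one primary StatefulSet plus a secondary StatefulSet
    secondary:
      replicaCount: 2           # two read-only replica Pods
    auth:
      rootPassword: change-me   # placeholder, use a Secret in practice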
The Galera Cluster chart, instead, deploys something more resilient. With default values, it creates a single StatefulSet with 3 Pods, and each one of these Pods can both read and write data, effectively acting as a master.
This means that if one of the Pods stops working for whatever reason, the other two will keep serving the database as if nothing happened, making the whole setup far more resilient. When the Pod that stopped working comes back up, it obtains the new and changed data from the other Pods and gets back in sync.
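For comparison, a minimal sketch of installing the Galera variant under the same assumptions (parameter names are taken from a recent Bitnami mariadb-galera chart and may differ in other versions):

    # helm install my-galera bitnami/mariadb-galera -f values.yaml
    #
    # values.yaml (key names assumed)
    replicaCount: 3             # 3 Pods in a single StatefulSet, all of them writable
    rootUser:
      password: change-me       # placeholder
    galera:
      mariabackup:
        password: change-me     # placeholder, used for state transfers between nodes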
In exchange for the resilience of the whole infrastructure (it would be too easy if the Galera Cluster solution offered extreme resilience with no drawbacks), a multi-master setup has some cons. The most common ones are some added latency on operations, required to keep everything in sync and consistent, and added complexity, which often brings its own headaches.
There are several other limitations with Galera Cluster, such as explicit table LOCKs not being supported and every table needing to declare a primary key. You can find the full list here: https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/
Deciding between the two solutions mostly depends on the following question:
Do you need the database to keep working (and to remain usable by your apps) as if nothing happened when one of your Kubernetes cluster nodes fails, even if one of the database Pods was running on that particular node?

Related

How can I make a Kubernetes multi-cluster Redis solution?

This week I deployed a Redis instance into a GKE (Google Kubernetes Engine) cluster using Bitnami's Helm chart. Although I've been successful with this part, the challenge now is to build a failover disaster-recovery strategy that replicates the data to another Redis instance in another GKE cluster (but in the same GCP project). How can I do this? I tried PersistentVolumeClaims, but they are only visible inside the cluster.
Redis Enterprise does have a WAN (multi-geo) replication mode, however I've never used it and it looks like it dramatically limits which features of Redis you can use to things that are compatible with a CRDT model. Basically first you need to answer how you would do this by hand, and then investigate automating that. WAN failover is a very complex thing, and generally you wouldn't even want to do it since you wouldn't fail over at the Redis level. Instead you would fail over your entire DC (or whatever you want to call your failure zones). Distributed database modelling and management is very tricky, here be dragons.

How pod replicas sync with each other - Kubernetes?

I have a MySQL database Pod with 3 replicas. Now I'm making some changes in one Pod (Pod data, not Pod configuration), say adding a table. How will the change be reflected on the other replicas of the Pod?
I'm using kubernetes v1.13 with 3 worker nodes.
Pods do not sync. Think of them as independent processes.
If you want a clustered MySQL installation, the Kubernetes docs describe how to do this by using a StatefulSet: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql
In essence you have to configure master/slave instances of MySQL yourself.
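A heavily trimmed sketch of that pattern, just to show the moving parts (a headless Service plus a StatefulSet with per-Pod storage; the linked tutorial adds the actual replication configuration via ConfigMaps, init containers and a sidecar, which is omitted here):

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
    spec:
      clusterIP: None            # headless: gives each Pod a stable DNS name
      selector:
        app: mysql
      ports:
      - port: 3306
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql
      replicas: 3
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:5.7
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: change-me   # placeholder, use a Secret in practice
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumeClaimTemplates:      # one PersistentVolumeClaim per Pod
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

Without the replication configuration these are still three independent MySQL servers; the StatefulSet only gives them stable names and stable storage to build the master/slave setup on.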
Pods are independent from each other; if you modify one Pod, the others will not be affected.
With your configuration, changes applied in one Pod won't be reflected in the others. They are isolated resources.
It is good practice to deploy such workloads using PersistentVolumeClaims and StatefulSets.
You can find an explanation with examples and best practices in the Run a Replicated Stateful Application documentation.
If you have three MySQL server Pods, then you have three independent databases, even though you created them from the same Deployment. So, depending on what you do, you might end up with a bunch of different databases in the cluster.
I would create a single MySQL Pod with persistence, so that if the Pod dies, the replacement picks up where the old one left off and no data is lost.
If what you want is high availability, or a failover replica, you would need to manage that on your own.
Generally speaking, K8s should not be used for storage purposes.
You are fine having common storage among those 3 Pods (a PVC), and you should also consider a StatefulSet (STS) when running databases on k8s.

Kubernetes with hybrid containers on one VM?

I have played around a little bit with Docker and Kubernetes. I need some advice here: is it a good idea to have one Pod on a VM with all of the following deployed in multiple (hybrid) containers?
This is our POC plan:
Customers access a public API endpoint (through an nginx reverse proxy), e.g. abc.xyz.com or def.xyz.com
List of containers that we need:
Identity server, connected to SQL Server
Our API server with Hangfire, connected to SQL Server
The API server that connects to the Redis server
Redis in turn has 3 Hangfire agents, load-balanced (scalable in the future)
Should we set up 1 or 2 VMs?
Is a combination of Windows and Linux containers advisable?
How many Pods per VM? How many containers per Pod?
Should we attach volumes for DB?
Thank you for your help
Cluster size can differ depending on the Kubernetes platform you want to use. With managed solutions like GKE/EKS/AKS you don't need to create a master node, but you have less control over your cluster and you may not be able to use the latest Kubernetes version.
It is safer to have at least 2 worker nodes (more is better). In case of a node failure, Pods will be rescheduled onto another healthy node.
I'd say Linux containers are more lightweight and have less overhead, but it's up to you to decide what to use.
The number of Pods per VM is determined during the scheduling process by the kube-scheduler and depends on the Pods' requested resources and the amount of resources available on the cluster nodes.
All data inside the running containers of a Pod is lost after a Pod restart or deletion. You can import/restore DB content during Pod startup using init containers (or DB replication), or configure volumes to persist data between Pod restarts.
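For the database question specifically, a minimal sketch of what "configure volumes" looks like (the image, mount path and sizes are placeholders for a SQL Server Linux container; adjust them for your actual database):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: sql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: sqlserver
    spec:
      containers:
      - name: sqlserver
        image: mcr.microsoft.com/mssql/server:2019-latest   # placeholder image
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          value: change-me                # placeholder, use a Secret in practice
        volumeMounts:
        - name: data
          mountPath: /var/opt/mssql       # data survives container restarts
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: sql-data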
You can decide which containers to put in the same Pod by looking at your set of applications from the perspective of scaling, updating and availability.
If you can benefit from scaling and updating application parts independently, and from having several replicas of the crucial parts of your application, it's better to put them in separate Deployments. If the application parts must always run on the same node and it's fine to restart them all at once, you can put them in one Pod.
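As a hypothetical illustration of that rule of thumb applied to the setup above: the API server would get its own Deployment so it can be scaled on its own, while a helper that must share the API server's lifecycle would ride along in the same Pod (the image names are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3                          # scaled independently of nginx, Redis, etc.
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
          - name: api
            image: registry.example.com/api:1.0       # placeholder
          - name: log-shipper                         # hypothetical sidecar; only in the same Pod
            image: registry.example.com/shipper:1.0   # because it must live and die with the API container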

Kubernetes Citus setup with individual hostname/ip

I am in the process of learning Kubernetes with a view to setting up a simple cluster with Citus DB, and I'm having a little trouble getting things going, so I would be grateful for any help.
I have a Docker image containing my base Debian image configured with Citus for the project. At this point I want to set it up with one master that mounts a GCP disk holding a Postgres DB, which I'll then distribute among the other containers, each mounted with its own separate disk holding empty tables (configured with the Citus extension) to hold what gets distributed to it. I'd like to automate this further at some point, but for now I'm aiming for just a master container and eight nodes. My plan is to create a deployment that opens ports 5432 and 80 on each node, and I thought I could create two Pods, one to hold the master and one to hold the eight nodes. Ideally I'd want to mount all the disks and then run a post-mount script on the master that finds all the node containers (by IP or hostname??), adds them as Citus nodes, and then runs create_distributed_table to distribute the data.
My confusion at present is about how to label the individual nodes so that they keep their internal address or hostname, and so that if one goes down it gets replaced and resumes with the data on its persistent disk. I've read about ConfigMaps and setting hostname aliases, but I'm still unclear about how to proceed. Is this possible, or is this the wrong way to approach this kind of setup?
You are looking for a StatefulSet. That lets you have a known number of Pod replicas, with attached storage (PersistentVolumes) and consistent DNS names. In the Pod spec I would launch only a single copy of the server and use the StatefulSet's replica count to control the number of "nodes" (confusingly, also a Kubernetes term); if the replica is #0 then it's the master.
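A rough sketch of what that could look like for the one-coordinator-plus-eight-workers layout (the headless Service gives each Pod a stable name such as citus-0.citus; the Citus-specific setup, such as calling master_add_node for each worker, still has to be scripted on top, and the image and sizes below are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: citus
    spec:
      clusterIP: None          # headless: Pods resolve as citus-0.citus, citus-1.citus, ...
      selector:
        app: citus
      ports:
      - port: 5432
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: citus
    spec:
      serviceName: citus
      replicas: 9              # citus-0 as the master/coordinator, citus-1..8 as workers
      selector:
        matchLabels:
          app: citus
      template:
        metadata:
          labels:
            app: citus
        spec:
          containers:
          - name: citus
            image: citusdata/citus:latest              # placeholder image
            ports:
            - containerPort: 5432
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:    # each Pod keeps its own disk across rescheduling
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi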

GitLab HA with Kubernetes and Gluster

I currently have GitLab omnibus setup on Docker. I plan to have HA for the same by adding it to Kubernetes and have persistence using Gluster. I have played around configuring Kubernetes with Gluster. Now it's time to bring GitLab into Kubernetes. GitLab uses PostgreSQL as the default db.
My question is: to implement HA, should I
a) split GitLab into a GitLab application container and a PostgreSQL container, and then run both (application and DB) in their own cluster of Pods, i.e. separate Deployments of replicas of the GitLab app and PostgreSQL?
OR
b) keep using the omnibus installer and just have replicas of this single, standalone container?
Does it really make any difference whether
1) writes happen to a db cluster exposed via service to the GitLab app
OR
2) writes happen directly to the omnibus GitLab container (which has the DB within itself)
I just want to make sure that I don't unnecessarily end up making the setup complex. Having GitLab in Kubernetes along with Gluster already makes things a little complex. So does splitting the app and DB make sense, or will the omnibus setup suffice? I'm concerned about concurrent writes to the DB.
According to http://docs.gitlab.com/ce/install/kubernetes/gitlab_omnibus.html#introduction you should use dedicated Redis and PostgreSQL HA clusters, i.e. option b) combined with 1).
For less downtime it is better to use a PostgreSQL master-slave cluster (https://www.postgresql.org/docs/10/static/different-replication-solutions.html) and a Redis Cluster in master-slave mode (https://redis.io/topics/cluster-tutorial). "Note that the minimal (Redis) cluster that works as expected requires to contain at least three master nodes."
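If you keep the omnibus image but point it at those dedicated clusters (option b combined with 1), the usual way is the GITLAB_OMNIBUS_CONFIG environment variable on the GitLab container. A hedged sketch, with the Service names as placeholders (double-check the setting names against the omnibus documentation):

    # fragment of the GitLab container spec in its Deployment/StatefulSet
    env:
    - name: GITLAB_OMNIBUS_CONFIG
      value: |
        postgresql['enable'] = false                      # don't run the bundled PostgreSQL
        gitlab_rails['db_host'] = 'postgres.default.svc'  # Service in front of the HA PostgreSQL cluster
        gitlab_rails['db_password'] = 'change-me'         # placeholder, use a Secret in practice
        redis['enable'] = false                           # don't run the bundled Redis
        gitlab_rails['redis_host'] = 'redis.default.svc'  # Service in front of the Redis cluster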
If you use only GlusterFS to provide failover for PostgreSQL, you can get errors that require manual repair when one DB instance crashes and another comes up, like this one: How do I fix Postgres so it will start after an abrupt shutdown?