Add on-premise CockroachDB node to a cluster hosted in Kubernetes

I'm planning to deploy a small Kubernetes cluster (3x 32GB nodes). I'm not experienced with K8s, I need to come up with some kind of resilient SQL database setup, and CockroachDB seems like a great choice.
I wonder if it's possible to deploy, relatively easily, a configuration where some CockroachDB instances (nodes?) live inside the K8s cluster, while other instances live outside the K8s cluster (2 on-premise VMs). All of those CockroachDB instances would need to form a single CockroachDB cluster. It might also be worth noting that Kubernetes would be hosted in the cloud (e.g. Linode).
By relatively easy I mean:
simplish to deploy
requiring little maintenance

Yes, it's straightforward to do a multi-cloud deployment of CRDB; this is one of the great advantages of CockroachDB. Simply run the cockroach start command on each of the VMs/pods running CockroachDB, with each node's --join list pointing at the others, and they will form a single cluster.
See this blog post/tutorial for more info: https://www.cockroachlabs.com/blog/multi-cloud-deployment/
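As a minimal sketch of what that looks like (the addresses below are placeholders, --insecure is for a demo only, and the Kubernetes pods are assumed to have addresses reachable from the VMs, e.g. via a LoadBalancer or node ports):

```sh
# Run on each on-premise VM and in each Kubernetes pod. Every node advertises
# an address that is reachable from the other environment and points --join
# at a few existing nodes, regardless of where those nodes run.
cockroach start \
  --insecure \
  --advertise-addr=vm1.example.com \
  --join=vm1.example.com,vm2.example.com,crdb-0.crdb.default.svc.cluster.local:26257 \
  --listen-addr=0.0.0.0:26257 \
  --http-addr=0.0.0.0:8080

# Run once, against any single node, to bootstrap the cluster.
cockroach init --insecure --host=vm1.example.com:26257
```

In production you would replace --insecure with --certs-dir and proper node certificates, but the topology is the same.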

Related

Hybrid nodes on a single Kubernetes cluster

I am currently running two Kubernetes clusters.
The first cluster runs on bare metal, and the second runs on EKS.
Since maintaining EKS costs a lot, I am looking for ways to turn this into a single cluster that autoscales onto AWS.
I have considered several solutions such as RHACM, Rancher, and Anthos, but those solutions are for controlling multiple clusters.
I just want to turn this into an "on-premise based cluster that autoscales (onto AWS) when it runs out of resources".
I found the "EKS Anywhere" offering, but since its price is too high, I want to build a similar architecture myself.
I need advice on any use cases for an ingress controller, a (physical) load balancer, or another architecture that could satisfy those conditions.
Cluster API is probably what you need. It is built around the concept of creating Clusters from Machine objects. These Machine objects are then provisioned by a provider: the Bare Metal Operator provider for your bare-metal nodes and the Cluster API Provider AWS for your AWS nodes, all managed from a single cluster (see the docs below for many other provider types).
You will run a local Kubernetes cluster which has Cluster API running in it. This includes the components that allow you to create the different Machine objects and tell Kubernetes how to provision those machines.
Here is some more reading:
Cluster API Book: Excellent reading on the topic.
Documentation for CAPI Provider - AWS.
Documentation for the Bare Metal Operator. I worked on this project for a couple of years and the community is pretty amazing; this GitHub repository hosts the CAPI provider for bare-metal nodes.
This should definitely get you going. You can start by running the different providers individually to get a taste of how they work, and then work with Cluster API and see it in action.
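As a rough sketch of those first steps (command names follow the Cluster API quick start; the cluster name, Kubernetes version, and machine counts are illustrative, and provider credentials such as AWS_B64ENCODED_CREDENTIALS must be prepared beforehand):

```sh
# Turn an existing local cluster into a Cluster API management cluster by
# installing the core components plus the AWS infrastructure provider.
# The Metal3/Bare Metal Operator provider is installed the same way with
# --infrastructure metal3.
clusterctl init --infrastructure aws

# Generate a manifest for an AWS-backed workload cluster and apply it;
# Cluster API then provisions the Machine objects on AWS.
clusterctl generate cluster aws-workers \
  --infrastructure aws \
  --kubernetes-version v1.27.3 \
  --control-plane-machine-count 3 \
  --worker-machine-count 3 > aws-workers.yaml
kubectl apply -f aws-workers.yaml

# Watch the Cluster and Machine objects reconcile.
kubectl get clusters,machines -A
```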

How to simulate node joins and failures with a local Kubernetes cluster?

I'm developing a Kubernetes scheduler and I want to test its performance when nodes join and leave a cluster, as well as how it handles node failures.
What is the best way to test this locally on Windows 10?
Thanks in advance!
Unfortunately, you can't add nodes to Docker Desktop with Kubernetes enabled. Docker Desktop is single-node only.
I can think of two possible solutions, off the top of my head:
You could use any of the cloud providers. The major ones (AWS, GCP, Azure) have some kind of free tier (capped by usage or by time). Adding nodes in those environments is trivial.
Create a local VM for each node. This is a less-than-perfect solution - very resource intensive. To make adding nodes easier, you could use kubeadm to provision your cluster.
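For the local-VM option, a rough sketch with kubeadm (VM names, the address, token, and hash are placeholders; kubeadm and a container runtime are assumed to be installed on each VM):

```sh
# On the control-plane VM: create the cluster and print the join command.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubeadm token create --print-join-command

# On each worker VM: simulate a node joining by running the printed command.
sudo kubeadm join 192.168.56.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

# Simulate a graceful node leave from the control plane.
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1

# Simulate an abrupt failure by powering off or pausing a worker VM,
# then watch the node go NotReady.
kubectl get nodes -w
```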

Good solutions to automate infrastructure deployment locally?

I have recently been reading more about infrastructure as a service (IaaS) and platform as a service (PaaS) and had some questions. I see that when we opt for a PaaS solution, it is generally very easy to create the infrastructure, as the cloud providers handle that for us, and we can even automate the deployment using an infrastructure-as-code solution like Terraform.
But if we use an IaaS solution, or even a local on-premise cluster, it seems we lose a lot of the automation that PaaS allows. So I was curious: are there any good tools out there for automating infrastructure deployment on a local cluster that is not in the cloud?
The best thing I could think of was to run a local Kubernetes cluster and then Dockerize each of the infrastructure components, but this seems difficult as each node in the cluster will need its own specific configuration files.
From my basic Googling, it seems like there is not a good solution to this.
Edit:
I was not clear enough with my original intentions. I have two problems I am trying to solve.
How do I automate infrastructure deployment locally? For example, suppose I wanted to create a Hadoop HDFS cluster. I would need to configure one node to be the namenode with an accessible IP, and the other nodes to be datanodes that are aware of the namenode's IP. At the moment, I have to do this manually by logging into each node, checking its IP, and then configuring each one. How would I automate this? If I were to use a Kubernetes approach, how do I specify that one of the running pods needs to be the namenode and the others are datanodes? How do I find the pods' IPs and have them be aware of the namenode IP?
The next problem I have is very similar to the first, but with a slight modification. How would I deploy specific configuration files to each node? For instance, in Kafka, the configuration file for one node requires the IPs of the ZooKeeper nodes, as well as the IP it should listen on. This may be different for every node in the cluster. Is there a good way to make these config files pod-specific, so that I do not have to do bash text processing to insert the correct contents into each pod's config files?
You can use Terraform for all of your on-premise infrastructure automation, and Ansible for configuration management.
Let's say you have three HPE servers. Install K8s or VMware on them using Ansible; then you can treat them as three availability zones in one region, the same as on AWS. From there you can start deploying Dockerized apps or Helm charts using Terraform.
Summary:
Ansible for installing and configuring K8s.
Terraform for provisioning on top of K8s.
Helm for installing apps on K8s.
After this you will have a basic automated on-premise infrastructure.
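A rough sketch of that workflow (the inventory, playbook, directory, and chart names are hypothetical; Terraform can also drive Helm itself via the helm provider's helm_release resource, but the CLI is shown here for brevity):

```sh
# 1. Ansible: install and configure Kubernetes on the three servers.
ansible-playbook -i inventory.ini install-k8s.yml

# 2. Terraform: provision resources on top of the cluster, using a workspace
#    that points its kubernetes/helm providers at the kubeconfig Ansible
#    produced.
terraform -chdir=k8s-infra init
terraform -chdir=k8s-infra apply

# 3. Helm: install applications directly, or let Terraform's helm_release
#    resource do it as part of step 2.
helm install my-app ./charts/my-app --namespace apps --create-namespace
```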

Highly available Kubernetes cluster? bootkube or kubeadm self-hosting

I am already running a single-master Kubernetes cluster and I am doing research on setting up highly available Kubernetes clusters. I was thinking of a multi-master cluster setup, then realized a self-hosted cluster might be a better option to be future-ready.
An additional challenge is that I am doing it on bare metal (meaning I am going to use cloud VMs from these cloud providers: Hetzner, Linode, DigitalOcean; they have CSI drivers, cloud controller managers, etc.).
In this case, I see 2 options.
Setup with bootkube (https://github.com/kubernetes-sigs/bootkube)
Setup with kubeadm self-hosting. (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/self-hosting/)
I assume this is still an early-stage topic, hence I am not able to find guidance on choosing the right approach, nor the corresponding documentation. I need this for a scalable production environment where I will start small, with at least 8 nodes, and can grow fast.
Is bootkube worth considering for future readiness?
Or, since kubeadm self-hosting is still in the alpha stage, am I taking a risk running a production environment on it?
Any good documentation, blog, or article to go in this direction?
I use Keepalived + HAProxy and Ansible to deploy an HA Kubernetes cluster. kubeadm now supports a join command for control-plane nodes, so it is easy to integrate with Ansible.
You can also refer: https://github.com/kubernetes-sigs/kubespray.
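A minimal sketch of the kubeadm side of that setup (the load-balancer address is whatever VIP Keepalived + HAProxy expose; the token, hash, and certificate key are placeholders taken from the init output):

```sh
# On the first control-plane node: point the API server endpoint at the
# Keepalived/HAProxy VIP and upload the certs so other masters can join.
sudo kubeadm init \
  --control-plane-endpoint "LOAD_BALANCER_VIP:6443" \
  --upload-certs

# On each additional control-plane node: join as a control-plane member.
sudo kubeadm join LOAD_BALANCER_VIP:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>

# Worker nodes join with the same command minus the control-plane flags.
```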

GitLab HA with Kubernetes and Gluster

I currently have GitLab Omnibus set up on Docker. I plan to add HA by moving it to Kubernetes, with persistence provided by Gluster. I have played around with configuring Kubernetes with Gluster. Now it's time to bring GitLab into Kubernetes. GitLab uses PostgreSQL as its default DB.
My query is: to implement HA, should I
a) split GitLab into a GitLab application container and a PostgreSQL container, and then run both (application and DB) in their own sets of pods, i.e., separate deployments of replicas of the GitLab app and PostgreSQL?
OR
b) keep using the omnibus installer and just have replicas of this single, standalone container?
Does it really make any difference whether
1) writes happen to a DB cluster exposed via a Service to the GitLab app
OR
2) writes happen directly to the omnibus GitLab container (which has the DB inside itself)?
I just want to make sure that I don't unnecessarily end up making the setup complex. Having GitLab in Kubernetes along with Gluster already makes things a little complex. So does splitting the app and DB make sense, or will the omnibus setup suffice? I am concerned about concurrent writes to the DB.
According to http://docs.gitlab.com/ce/install/kubernetes/gitlab_omnibus.html#introduction you should use dedicated Redis and PostgreSQL HA clusters. So: option b) and 1).
For less downtime, it is better to use a PostgreSQL master-slave cluster (https://www.postgresql.org/docs/10/static/different-replication-solutions.html) and a Redis master-slave cluster (https://redis.io/topics/cluster-tutorial). "Note that the minimal (Redis) cluster that works as expected requires to contain at least three master nodes."
If you use only GlusterFS to provide failover for PostgreSQL, you can get errors that require manual repair when one DB instance crashes and another comes up, like this: How do I fix Postgres so it will start after an abrupt shutdown?
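As a hedged sketch of that recommendation (this uses the newer gitlab/gitlab Helm chart rather than the gitlab_omnibus chart linked above; the value keys follow its documentation for external databases, and the hostnames and secret names are made up):

```sh
# Disable the bundled single-instance PostgreSQL and Redis and point GitLab
# at dedicated HA clusters exposed as Kubernetes Services.
helm repo add gitlab https://charts.gitlab.io
helm upgrade --install gitlab gitlab/gitlab \
  --namespace gitlab --create-namespace \
  --set postgresql.install=false \
  --set global.psql.host=postgres-ha.db.svc.cluster.local \
  --set global.psql.password.secret=gitlab-postgres \
  --set global.psql.password.key=password \
  --set redis.install=false \
  --set global.redis.host=redis-ha.db.svc.cluster.local
```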