Where/How to configure cassandra.yaml when deployed on Google Kubernetes Engine

I can't find the answer to a pretty easy question: where can I configure Cassandra (normally via cassandra.yaml) when it's deployed on a cluster with Kubernetes using Google Kubernetes Engine?
I'm completely new to distributed databases, Kubernetes etc., and I'm setting up a Cassandra cluster (4 VMs, 1 pod each) on GKE for a university course right now.
I used the official example on deploying Cassandra on Kubernetes from the Kubernetes homepage (https://kubernetes.io/docs/tutorials/stateful-application/cassandra/), with a StatefulSet, persistent volume claims, a central load balancer etc. Everything seems to work fine, and I can connect to the DB via my Java application (using the DataStax Java/Cassandra driver) and via Google Cloud Shell + CQLSH directly on one of the pods. I created a keyspace and some tables and started filling them with data (~100 million entries planned), but as soon as the DB reaches a certain size, expensive queries result in a timeout exception (via DataStax and via CQL), just as expected. Speed isn't necessary for these queries right now; it's just for testing.
Normally I would start by trying to increase the timeouts in cassandra.yaml, but I'm unable to locate it on the VMs and have no clue where to configure Cassandra at all. Can someone tell me whether these configuration files even exist on the VMs when deploying with GKE, and where to find them? Or do I have to configure those Cassandra details via kubectl/CQL/the StatefulSet or somewhere else?

I think the fastest way to configure Cassandra on Kubernetes Engine is to use the Cassandra deployment from the Marketplace; there you can configure your cluster, and you can follow the guide linked there to configure it correctly.
======
The timeout settings are part of Cassandra's own configuration, so they have to be changed inside the container.
You can use the command kubectl exec -it POD_NAME -- bash to open a shell in a Cassandra container; that lets you get at the container's configuration, look up the setting you need and change it to what you require.
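For example, a quick way to find the file, assuming the tutorial's StatefulSet (pods named cassandra-0, cassandra-1, ...) and an image that keeps its configuration under /etc/cassandra/; both the pod name and the path are assumptions you should check against your own deployment:

# Open a shell in the first Cassandra pod
kubectl exec -it cassandra-0 -- bash

# Inside the container: locate cassandra.yaml and inspect the server-side
# timeouts (read_request_timeout_in_ms, range_request_timeout_in_ms, ...)
ls /etc/cassandra/
grep -n "_timeout_in_ms" /etc/cassandra/cassandra.yaml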
Once you have the configuration you need, you will want to automate it, to avoid manual intervention every time one of your pods gets recreated (the configuration will not survive a container recreation). The next options are only suggestions:
Create your own Cassandra image from your own Dockerfile, changing the configuration values there. The image you are using right now is public, and the container will always start with the configuration baked into the pulled image.
Edit the YAML of the StatefulSet where Cassandra is running and add an initContainer, which lets a script change the configuration of your running (Cassandra) container automatically every time one of your pods starts; see the sketch after this list.
Choose the option that fits you best.
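As a sketch only (not the tutorial's official method), the initContainer approach could look roughly like this. The container name cassandra, the image tag and the /etc/cassandra path are assumptions; adjust them to match your StatefulSet:

# Write a strategic-merge patch that copies the image's config into a shared
# emptyDir, raises the timeouts, and overlays it onto the main container.
cat <<'EOF' > cassandra-timeout-patch.yaml
spec:
  template:
    spec:
      volumes:
        - name: tuned-config
          emptyDir: {}
      initContainers:
        - name: tune-timeouts
          image: cassandra:3.11        # use the same image as your main container
          command: ["sh", "-c"]
          args:
            - |
              cp -r /etc/cassandra/. /tuned/
              sed -i 's/^read_request_timeout_in_ms:.*/read_request_timeout_in_ms: 20000/' /tuned/cassandra.yaml
              sed -i 's/^range_request_timeout_in_ms:.*/range_request_timeout_in_ms: 30000/' /tuned/cassandra.yaml
          volumeMounts:
            - name: tuned-config
              mountPath: /tuned
      containers:
        - name: cassandra              # must match the container name in your StatefulSet
          volumeMounts:
            - name: tuned-config
              mountPath: /etc/cassandra  # overlay the tuned config over the image's default
EOF
kubectl patch statefulset cassandra --patch "$(cat cassandra-timeout-patch.yaml)"

After the patch, every pod the StatefulSet (re)creates starts with the adjusted timeouts, which is exactly the "survives recreation" property the manual edit lacks.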

Related

Export logs of Kubernetes cronjob to a path after each run

I currently have a CronJob that schedules a job at a set period and runs it in a pattern. I want to export the logs of each pod run to a file at the path temp/logs/FILENAME,
with FILENAME being the timestamp at which the run was created. How am I going to do that? I hope someone can provide a solution; if a script is needed, please use Python or shell commands. Thank you.
According to Kubernetes Logging Architecture:
In a cluster, logs should have a separate storage and lifecycle
independent of nodes, pods, or containers. This concept is called
cluster-level logging.
Cluster-level logging architectures require a separate backend to
store, analyze, and query logs. Kubernetes does not provide a native
storage solution for log data. Instead, there are many logging
solutions that integrate with Kubernetes.
Which brings us to Cluster-level logging architectures:
While Kubernetes does not provide a native solution for cluster-level
logging, there are several common approaches you can consider. Here
are some options:
Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application.
Kubernetes does not provide log aggregation of its own. Therefore, you need a local agent to gather the data and send it to a central log management system. See some options below:
Fluentd
ELK Stack
You can find all the logs that pods generate under /var/log/containers/*.log
on each Kubernetes node. You can work with them manually using simple scripts if you prefer (see the sketch below), but keep in mind that pods can run on any node (if not restricted), and nodes may come and go.
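As a rough sketch (not a finished solution), a script along these lines could collect the logs the way you describe. It assumes kubectl access to the cluster, relies on the job-name label that Kubernetes adds to Job-created pods, and uses the pod's creation timestamp as an approximation of "the timestamp of the run"; the namespace and output directory are placeholders:

#!/bin/sh
NAMESPACE=default
OUTDIR=temp/logs
mkdir -p "$OUTDIR"

# One log file per Job-created pod, named after the pod's creation timestamp.
for pod in $(kubectl get pods -n "$NAMESPACE" -l job-name -o name); do
  name=${pod#pod/}
  ts=$(kubectl get -n "$NAMESPACE" "$pod" -o jsonpath='{.metadata.creationTimestamp}')
  kubectl logs -n "$NAMESPACE" "$pod" > "$OUTDIR/${ts}-${name}.log"
done

You could run this from another CronJob scheduled shortly after each run, but note that finished pods are eventually garbage-collected, which is another argument for the agent-based approaches above.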
Consider sending your logs to an external system like Elasticsearch or Grafana Loki and managing them there.

How to share configuration files between different clusters belonging to the same project in Google Cloud Platform?

I have a cluster with several workloads and different configurations on GCP's Kubernetes Engine.
I want to create a clone of this existing cluster along with cloning all the workloads in it. It turns out, you can clone a cluster but not the workloads.
So, at this point, I'm copying the deployment YAMLs of the workloads from the cluster that is working fine, and using them for the newly created workloads in the newly created cluster.
When I'm deploying the pods of this newly created workload, the pods are stuck in the pending state.
In the logs of the container, I can see that the error has something to do with Redis.
The error it shows is, Error: Redis connection to 127.0.0.1:6379 failed - connect ECONNREFUSED 127.0.0.1:6379 at TCPConnectWrap.afterConnect [as oncomplete].
Also, when I'm connected with the first cluster and run the command,
kubectl get secrets -n=development, it shows me a bunch of secrets which are supposed to be used by my workload.
However, when I'm connected to the second cluster and run the above kubectl command, I just see one service-related secret.
My question is: how do I make the workloads in the newly created cluster use the configuration of the already existing cluster?
I think there are a few things that can be done here:
Try using the kubectl config command and switch between the contexts of your two clusters; there is a sketch of this below.
You can find more info here and here
You may also try to use Kubernetes Cluster Federation. But bear in mind that it is still in alpha.
Remember that keeping your config in a version control system is generally a very good idea. Ideally you store the manifests as you wrote them, before the cluster fills in defaults when you export resources.
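A sketch of the context-switching approach, assuming two kubectl contexts named cluster-1 and cluster-2 (check yours with kubectl config get-contexts) and a development namespace in both:

# Export the secrets from the original cluster ...
kubectl --context=cluster-1 -n development get secrets -o yaml > secrets.yaml

# ... review the file (strip status, resourceVersion, uid and other
# cluster-specific fields), ideally keep it in version control, then load it
# into the new cluster.
kubectl --context=cluster-2 -n development apply -f secrets.yaml

The same pattern works for ConfigMaps and any other configuration resources the cloned cluster is missing.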
Please let me know if that helped.

decentralised, updatable configuration with kubernetes

I need to keep some configuration, maybe as files or otherwise, in all instances of a Kubernetes Docker image deployment.
I need the ability to remotely update the configuration in all of the running pods of the deployment. This is to be followed by invocation of some Java code in all of the running pods of the Docker image deployment.
Whenever a new pod of the same deployment comes up, it should have the updated configuration.
As far as possible, I don't want the configuration stored anywhere central; I want it in each pod of the deployment.
What are my choices?
As a last resort I could do it as a rolling deployment update.
A rolling deployment, or something similar such as an update to a mounted ConfigMap, is the Kubernetes option; it always results in an application restart.
Having an application support live configuration updates, running some code after receiving those updates, without a restart: that's an application feature.
A handwavy way of doing this:
Have the correct configuration live in a ConfigMap.
Have the application listen on a separate port for either a signal to retrieve updated configuration (if the application is k8s aware) or to actually receive the configuration bits themselves. Have the application be able to handle this live configuration update process, the difficulty of which depends on the framework in use.
Have another application be responsible for delivering these updates: watch for changes to the ConfigMap, get the list of Pods in the deployment, and deliver either a signal or the updated configuration to each of the Pods.
Have the first application not get to what k8s recognizes as Ready state without having received updated configuration from the second.
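A very rough sketch of the first and last points, assuming a hypothetical application image and a hypothetical /healthz/config endpoint on port 8081 that only answers once the latest configuration has been loaded:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    feature.x.enabled=true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: example/app:latest      # hypothetical application image
          volumeMounts:
            - name: config
              mountPath: /etc/app        # files update in place when the ConfigMap changes
          readinessProbe:                # pod only reports Ready once the app says its config is loaded
            httpGet:
              path: /healthz/config
              port: 8081
      volumes:
        - name: config
          configMap:
            name: app-config
EOF

The part Kubernetes cannot give you is the in-process handler that rereads the files and invokes your Java code when they change; that, and the delivery application in the third point, are code you have to write.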

GitLab HA with Kubernetes and Gluster

I currently have GitLab omnibus setup on Docker. I plan to have HA for the same by adding it to Kubernetes and have persistence using Gluster. I have played around configuring Kubernetes with Gluster. Now it's time to bring GitLab into Kubernetes. GitLab uses PostgreSQL as the default db.
My query is: to implement HA, should I
a) split GitLab into GitLab application and PostgreSQL container, and then run both (Application and DB) in their own cluster of pods i.e., separate deployments of replicas of GitLab app and PostgreSQL?
OR
b) keep using the omnibus installer and just have replicas of this single, standalone container?
Does it really make any difference whether
1) writes happen to a db cluster exposed via service to the GitLab app
OR
2) writes happen directly to the omnibus GitLab container (which has the db within itself)?
I just want to make sure that I don't unnecessarily end up making the setup complex. Having GitLab in Kubernetes along with Gluster already makes things a little complex. So does splitting app and db make sense, or will the omnibus setup suffice? I'm concerned about concurrent writes to the db.
According to http://docs.gitlab.com/ce/install/kubernetes/gitlab_omnibus.html#introduction you should use dedicated Redis and PostgreSQL HA clusters: option b) combined with 1).
For less downtime it is better to use a PostgreSQL master-slave cluster (https://www.postgresql.org/docs/10/static/different-replication-solutions.html) and a master-slave Redis Cluster (https://redis.io/topics/cluster-tutorial). "Note that the minimal (Redis) cluster that works as expected requires to contain at least three master nodes."
If you use only GlusterFS to provide failover for PostgreSQL, you can get errors that require manual repair when one DB instance crashes and another comes up, like this: How do I fix Postgres so it will start after an abrupt shutdown?
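For option 1), "a db cluster exposed via service" can be as simple as a selector-less Service plus manual Endpoints, sketched below; the IP 10.0.0.5 is a placeholder for your PostgreSQL primary, and the omnibus container would then point gitlab_rails['db_host'] at the gitlab-postgresql name:

kubectl apply -f - <<'EOF'
# A stable in-cluster DNS name for an external PostgreSQL primary.
apiVersion: v1
kind: Service
metadata:
  name: gitlab-postgresql
spec:
  ports:
    - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: gitlab-postgresql      # must match the Service name
subsets:
  - addresses:
      - ip: 10.0.0.5           # placeholder: the address of your PostgreSQL primary
    ports:
      - port: 5432
EOF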

Couchbase on Google Container Engine resets itself

I have deployed a 4 node Couchbase cluster using Docker images on the Google Container Engine with Kubernetes. I was able to access the Couchbase Console, look at the buckets, query, etc. Now, after a couple of days, I go to the Console URL and the Couchbase initial setup screen comes up, as though this were a fresh install. I can see that the nodes and pods are all still up and running.
I had a similar problem on my Windows box with a Docker cluster (no Kubernetes) and ended up redeploying it.
Has anyone else experienced this?
When you destroy and recreate container instances, all the underlying state is lost.
If you want to preserve the state of your Couchbase installation, you'll need to use a Docker data volume. Just create one and mount it at your Couchbase data directory.
On GCP, you'll additionally want to map the data volume to a persistent disk.
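In Kubernetes terms, a rough sketch of that is a StatefulSet with a volumeClaimTemplate, so each Couchbase pod gets its own persistent disk via GKE's default StorageClass. The image tag, the /opt/couchbase/var data path and the matching headless Service named couchbase are assumptions to verify against your deployment:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: couchbase
spec:
  serviceName: couchbase          # a headless Service with this name is assumed to exist
  replicas: 4
  selector:
    matchLabels:
      app: couchbase
  template:
    metadata:
      labels:
        app: couchbase
    spec:
      containers:
        - name: couchbase
          image: couchbase:community
          volumeMounts:
            - name: data
              mountPath: /opt/couchbase/var   # node config and bucket data live here
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi                     # each pod gets its own GCE persistent disk
EOF

With the data directory on a persistent volume, a recreated pod rejoins with its previous state instead of presenting the initial setup screen again.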