Running kubectl commands in parallel with different credentials - kubernetes

I'm currently running two Kubernetes clusters one on Google cloud and one on IBM cloud. To manage them I use kubectl. I've made a script that executes some commands on one of the clusters then switches to the other and does some other work there.
This works fine as long as the script only runs in one process, however when run in parallel the credentials are sometimes overwritten by one process when in use by another and this obviously causes issues.
I therefore want to know if I can supply kubectl with a credentials file for every call, instead of storing it in a environmental variable with kubectl config set-credentials.
Any help/solution is much appreciated.

If I need to work with multiple clusters using kubectl I am splitting my terminal and setting KUBECONFIG for each split:
For my first split:
export KUBECONFIG=~/.kube/cluster1
For the second split
export KUBECONFIG=~/.kube/cluster2
It is working pretty well, but this approach has one issue:
If you are using some kind of prompt with the current Kubernetes context it will give you different output and it might be missing leading.
For scripts, I am just changing value of KUBECONFIG in for loop, to loop over each cluster.

You need to use Kubefed in order to manage multiple clusters.
It will take one cluster as the main one, and execute all the same requests to the second cluster.

Related

GCloud authentication race conditions

I'm trying to avoid race conditions with gcloud / gsutil authentication on the same system but different CI/CD jobs on my Gitlab-Runner on a Mac Mini.
I have tried setting the auth manually with
RUN gcloud auth activate-service-account --key-file="gitlab-runner.json"
RUN gcloud config set project $GCP_PROJECT_ID
for the Dockerfile (in which I'm performing a download operation from a Google Cloud Storage bucket).
I'm using a configuration in the bash script to run the docker command and in the same script for authenticating I'm using
gcloud config configurations activate $TARGET
Where I've previously done the above two commands to save them to the configuration.
The configurations are working fine if I start the CI/CD jobs one after the other has finished. But I want to trigger them for all clients at the same time, which causes race conditions with gcloud authentication and one of the jobs trying to download from the wrong project bucket.
How to avoid a race condition? I'm already authenticating before each gsutil command but still its causing the race condition. Do I need something like CloudBuild to separate the runtime environments?
You can use Cloud Build to get separate execution environments but this might be an overkill for your use case, as a Cloud Build worker uses an entire VM which might be just too heavy, linux containers / Docker can provide necessary isolation as well.
You should make sure that each container you run has a unique config file placed in the path expected by gcloud. The issue may come from improper volume mounting (all the containers share the same location from the host/OS), or maybe you should mount a directory containing their configuration file (unique for each bucket) on running an image, or perhaps you should run gcloud config configurations activate in a Dockerfile step (thus creating image variants for different buckets if it’s feasible).
Alternatively, and I think this solution might be easier, you can switch from Cloud SDK distribution to standalone gsutil distribution. That way you can provide a path to a boto configuration file through an environment variable.
Such variables can be specified on running a Docker image.

How can I get log rotation working inside a kubernetes container/pod?

Our setup:
We are using kubernetes in GCP.
We have pods that write logs to a shared volume, with a sidecar container that sucks up our logs for our logging system.
We cannot just use stdout instead for this process.
Some of these pods are long lived and are filling up disk space because of no log rotation.
Question:
What is the easiest way to prevent the disk space from filling up here (without scheduling pod restarts)?
I have been attempting to install logrotate using: RUN apt-get install -y logrotate in our Dockerfile and placing a logrotate config file in /etc/logrotate.d/dynamicproxy but it doesnt seem to get run. /var/lib/logrotate/status never gets generated.
I feel like I am barking up the wrong tree or missing something integral to getting this working. Any help would be appreciated.
We ended up writing our own daemonset to properly collect the logs from the nodes instead of the container level. We then stopped writing to shared volumes from the containers and logged to stdout only.
We used fluentd to the logs around.
https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-kubernetes-logging
In general, you should write logs to stdout and configure log collection tool like ELK stack. This is the best practice.
However, if you want to run logrotate as a separate process in your container - you may use Supervisor, which serves as a very simple init system and allows you to run as many parallel process in container as you want.
Simple example for using Supervisor for rotating Nginx logs can be found here: https://github.com/misho-kr/docker-appliances/tree/master/nginx-nodejs
If you write to the filesystem the application creating the logs should be responsible for rotation. If you are running a java application with logback or log4j it is simple configuration change. For other languages/frameworks it is usually similar.
If that is not an option you could use a specialized tool to handle the rotation and piping the output to it. One example would be http://cr.yp.to/daemontools/multilog.html
As method of last resort you could investigate to log into a named pipe (FIFO) instead of a real file and have some other process handling the retrieval and writing of the data - including the rotation.

Statefulset - Possible to Skip creation of pod 0 when it fails and proceed with the next one?

I currently do have a problem with the statefulset under the following condition:
I have a percona SQL cluster running with persistent storage and 2 nodes
now i do force both pods to fail.
first i will force pod-0 to fail
Afterwards i will force pod-1 to fail
Now the cluster is not able to recover without manual interference and possible dataloss
Why:
The statefulset is trying to bring pod-0 up first, however this one will not be brought online because of the following message:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1
What i could do alternatively, but what i dont really like:
I could change ".spec.podManagementPolicy" to "Parallel" but this could lead to race conditions when forming the cluster. Thus i would like to avoid that, i basically like the idea of starting the nodes one after another
What i would like to have:
the possibility to have ".spec.podManagementPolicy":"OrderedReady" activated but with the possibility to adjust the order somehow
to be able to put specific pods into "inactive" mode so they are being ignored until i enable them again
Is something like that available? Does someone have any other ideas?
Unfortunately, nothing like that is available in standard functions of Kubernetes.
I see only 2 options here:
Use InitContainers to somehow check the current state on relaunch.
That will allow you to run any code before the primary container is started so you can try to use a custom script in order to resolve the problem etc.
Modify the database startup script to allow it to wait for some Environment Variable or any flag file and use PostStart hook to check the state before running a database.
But in both options, you have to write your own logic of startup order.

Call one container from another in kubernetes

Say I have a container image that contains a large command-line program that is executed from the shell. I have another container that contains a scheduler whose job it is to invoke the first container when it receives a certain signal. For various reasons I don't want to put them in the same container (mainly because the scheduler can invoke many different tools, and different versions of those tools, and I don't want to have to put all the tools and their versions in the same container image.)
I know how to put two containers in the same pod. However, the default behavior is to run both containers at startup. What I want to be able to do is to have the scheduler be able to decide when to invoke the other container, and to be able to specify the command-line arguments (and ideally, environment variables) for it. Also, I need to know the exit status. Extra credit for getting stdout/stderr, but I can hack around with volumes if I need to.
I also know how to do this if the second container was a server, but in this case it's a shell program.
A quick way to do this is:
Add a kubectl proxy in your container startup
Then call a kubernetes job from the first pod.
This would create a lightweight solution in which the desired job can be queried for success state, seemingly fulfilling your requirements

Using Ansible to automatically configure AWS autoscaling group instances

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?
Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If for example your playbooks get updated regularly you can create a cron entry in your AMI that runs every so often and that actually runs your playbook to make sure your instance configuration is always up to date. Thus avoiding having "push" from a remote workstation.
This is just one approach there could be many others and it depends on what kind of service you are running, what kind data you are using, etc.
I don't think you should use Ansible to configure new auto-scaled instances. Instead use Ansible to configure a new image, of which you will create an AMI (Amazon Machine Image), and order AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.
Alternatives
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like packer.io and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require ansible.com or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.