GKE - How to expose pod for cross-cluster communication with MCS - kubernetes

In Google Cloud Platform, my goal is to have one cluster with a message queue and a pod to consume these in another cluster with MCS (Multi Cluster Service). When trying this out with only one cluster it went fairly smooth. I used the container name with the port number as the endpoint to connect to the redpanda message queue like this:
Now I want to do this between two clusters, but I'm having trouble configuring stuff right. This is my setup:
I followed this guide to set the clusters up which seemed to work (hard to tell, but no errors), and the redpanda application inside the pod is configured to be on localhost:9092. Unfortunately, I'm getting a Connection Error when running the consumer on my-service-export.my-ns.svc.clusterset.local:9092.
Is it correct to expose the pod with the message queue on its localhost?
Are there ways I can debug or test the connection between pods easier?

Ok, got it working. I obviously misread the setup at one point and had to re-do some stuff to get it working.
Also, the my-service-export should probably have the same name as the service you want to export, in my case redpanda.
A helpful tool to check the connection without running up a consumer is the simple dnsutils image. Use this deployment file and change the namespace to my-ns:
apiVersion: v1
kind: Pod
metadata:
name: dnsutils
namespace: my-ns # <----
spec:
containers:
- name: dnsutils
image: k8s.gcr.io/e2e-test-images/jessie-dnsutils:1.3
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
restartPolicy: Always
Then spin it up with apply, exec into it, then run host my-service-export.my-ns.svc.clusterset.local. If you get an IP back you are probably good.

Related

upgrading to bigger node-pool in GKE

I have a node-pool (default-pool) in a GKE cluster with 3 nodes, machine type n1-standard-1. They host 6 pods with a redis cluster in it (3 masters and 3 slaves) and 3 pods with an nodejs example app in it.
I want to upgrade to a bigger machine type (n1-standard-2) with also 3 nodes.
In the documentation, google gives an example to upgrade to a different machine type (in a new node pool).
I have tested it while in development, and my node pool was unreachable for a while while executing the following command:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
kubectl cordon "$node";
done
In my terminal, I got a message that my connection with the server was lost (I could not execute kubectl commands). After a few minutes, I could reconnect and I got the desired output as shown in the documentation.
The second time, I tried leaving out the cordon command and I skipped to the following command:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node";
done
This because if I interprete the kubernetes documentation correctly, the nodes are automatically cordonned when using the drain command. But I got the same result as with the cordon command: I lost connection to the cluster for a few minutes, and I could not reach the nodejs example app that was hosted on the same nodes. After a few minutes, it restored itself.
I found a workaround to upgrade to a new node pool with bigger machine types: I edited the deployment/statefulset yaml files and changed the nodeSelector. Node pools in GKE are tagged with:
cloud.google.com/gke-nodepool=NODE_POOL_NAME
so I added the correct nodeSelector to the deployment.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-deployment
labels:
app: example
spec:
replicas: 3
selector:
matchLabels:
app: example
template:
metadata:
labels:
app: example
spec:
nodeSelector:
cloud.google.com/gke-nodepool: new-default-pool
containers:
- name: example
image: IMAGE
ports:
- containerPort: 3000
This works without downtime, but I'm not sure this is the right way to do in a production environment.
What is wrong with the cordon/drain command, or am I not using them correctly?
Cordoning a node will cause it to be removed from the load balancers backend list, so will a drain. The correct way to do it is to set up anti-affinity rules on the deployment so the pods are not deployed on the same node, or the same region for that matter. That will cause an even distribution of pods throught your node pool.
Then you have to disable autoscaling on the old node pool if you have it enabled, slowly drain 1-2 nodes a time and wait for them to appear on the new node pool, making sure at all times to keep one pod of the deployment alive so it can handle traffic.

Running kubectl proxy from same pod vs different pod on same node - what's the difference?

I'm experimenting with this, and I'm noticing a difference in behavior that I'm having trouble understanding, namely between running kubectl proxy from within a pod vs running it in a different pod.
The sample configuration run kubectl proxy and the container that needs it* in the same pod on a daemonset, i.e.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
# ...
spec:
template:
metadata:
# ...
spec:
containers:
# this container needs kubectl proxy to be running:
- name: l5d
# ...
# so, let's run it:
- name: kube-proxy
image: buoyantio/kubectl:v1.8.5
args:
- "proxy"
- "-p"
- "8001"
When doing this on my cluster, I get the expected behavior. However, I will run other services that also need kubectl proxy, so I figured I'd rationalize that into its own daemon set to ensure it's running on all nodes. I thus removed the kube-proxy container and deployed the following daemon set:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: kube-proxy
labels:
app: kube-proxy
spec:
template:
metadata:
labels:
app: kube-proxy
spec:
containers:
- name: kube-proxy
image: buoyantio/kubectl:v1.8.5
args:
- "proxy"
- "-p"
- "8001"
In other words, the same container configuration as previously, but now running in independent pods on each node instead of within the same pod. With this configuration "stuff doesn't work anymore"**.
I realize the solution (at least for now) is to just run the kube-proxy container in any pod that needs it, but I'd like to know why I need to. Why isn't just running it in a daemonset enough?
I've tried to find more information about running kubectl proxy like this, but my search results drown in results about running it to access a remote cluster from a local environment, i.e. not at all what I'm after.
I include these details not because I think they're relevant, but because they might be even though I'm convinced they're not:
*) a Linkerd ingress controller, but I think that's irrelevant
**) in this case, the "working" state is that the ingress controller complains that the destination is unknown because there's no matching ingress rule, while the "not working" state is a network timeout.
namely between running kubectl proxy from within a pod vs running it in a different pod.
Assuming your cluster has an software defined network, such as flannel or calico, a Pod has its own IP and all containers within a Pod share the same networking space. Thus:
containers:
- name: c0
command: ["curl", "127.0.0.1:8001"]
- name: c1
command: ["kubectl", "proxy", "-p", "8001"]
will work, whereas in a DaemonSet, they are by definition not in the same Pod and thus the hypothetical c0 above would need to use the DaemonSet's Pod's IP to contact 8001. That story is made more complicated by the fact that kubectl proxy by default only listens on 127.0.0.1, so you would need to alter the DaemonSet's Pod's kubectl proxy to include --address='0.0.0.0' --accept-hosts='.*' to even permit such cross-Pod communication. I believe you also need to declare the ports: array in the DaemonSet configuration, since you are now exposing that port into the cluster, but I'd have to double-check whether ports: is merely polite, or is actually required.

OpenShift - how can pods resolve each other names

I m trying to have cloudera manager and cloudera agents on openshift, in order to run the installation I need to get all the pods communicating with each other.
Manually, I modified the /etc/hosts on the manager and add all the agents and on the agents I added the manager and all the other agents.
Now I wanted to automate this, let suppose I add a new agent, I want it to resolve the manager and the host (I can get a part of it done, by passing the manager name as an env variable and with a shell script add it to the /etc/hosts, not the ideal way but still solution). But the second part would be more difficult, to get the manager to resolve every new agent, and also to resolve every other agent on the same service.
I was wondering if there is a way so every pod on the cluster can resolve the others names ?
I have to services cloudera-manager with one pod, and an other service cloudera-agent with -let's say- 3 agents.
do you have any idea ?
thank you.
Not sure, but it looks like you could benefit from StatefulSets.
There are other ways to get the other pods ips (like using a headless service or requesting to the serverAPI directly ) but StatefulSets provide :
Stable, unique network identifiers
Stable, persistent storage.
Lots of other functionality that facilitates the deployment of a special kind of clusters like distributed databases. Not sure my term 'distributed' here is correct, but it helps me remind what they are for :).
If you want to get all Pods running under a certain Service, make sure to use a headless Service (i.e. set clusterIP: None). Then, you can query your local DNS-Server for the Service and will receive A-Records for all Pods assigned to it:
---
apiVersion: v1
kind: Service
metadata:
name: my-sv
namespace: my-ns
labels:
app: my-app
spec:
clusterIP: None
selector:
app: my-app
Then start your Pods (make sure to give app: labels for assignment) and query your DNS-Server from any of them:
kubectl exec -ti my-pod --namespace=my-ns -- /bin/bash
$ nslookup my-sv.my-ns.svc.cluster.local
Server: 10.255.3.10
Address: 10.255.3.10#53
Name: my-sv.my-ns.svc.cluster.local
Address: 10.254.24.11
Name: my-sv.my-ns.svc.cluster.local
Address: 10.254.5.73
Name: my-sv.my-ns.svc.cluster.local
Address: 10.254.87.6

How to create a post-init container in Kubernetes?

I'm trying to create a redis cluster on K8s. I need a sidecar container to create the cluster after the required number of redis containers are online.
I've got 2 containers, redis and a sidecar. I'm running them in a statefulset with 6 replicas. I need the sidecar container to run just once for each replica then terminate. It's doing that, but K8s keeps rerunning the sidecar.
I've tried setting a restartPolicy at the container level, but it's invalid. It seems K8s only supports this at the pod level. I can't use this though because I want the redis container to be restarted, just not the sidecar.
Is there anything like a post-init container? My sidecar needs to run after the main redis container to make it join the cluster. So an init container is no use.
What's the best way to solve this with K8s 1.6?
I advise you to use Kubernetes Jobs:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
This kind of Job will keep running until it is completed once. In this job you could try detecting if all the required nodes are available in order to form the cluster.
A better answer is to just make the sidecar enter an infinite sleep loop. If it never exits it'll never keep on being restarted. Resource limits can be used to ensure there's minimal impact on the cluster.
Adding postStart as an optional answer to this problem:
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers
apiVersion: v1
kind: Pod
metadata:
name: lifecycle-demo
spec:
containers:
- name: lifecycle-demo-container
image: nginx
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

How are minions chosen for given pod creation request

How does kubernetes choose the minion among many available for a given pod creation command? Is it something that can be controlled/tweaked ?
If replicated pods are submitted for deployment, is kubernetes intelligent enough to place them in different minions if they expose the same container/host port pair? Or does it always place different replicas in different minions ?
What about corner cases like what if two different pods (not necessarily replicas) that expose same host/container port pair are submitted? Will they carefully be placed on different minions ?
If a pod requires specific compute/memory requirements, can it be placed in a minion/host that has sufficient resources left to meet those requirement?
In summary, is there detailed documentation on kubernetes pod placement strategy?
Pods are scheduled to ports using the algorithm in generic_scheduler.go
There are rules that prevent host-port conflicts, and also to make sure that there are sufficient memory and cpu requirements. predicates.go
One way to choose minion for pod creation is using nodeSelector. Inside the yaml file of pod specify the label of minion for which you want to choose the minion.
apiVersion: v1
kind: Pod
metadata:
name: nginx1
labels:
env: test
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
nodeSelector:
key: value