Unable to import local minikube cluster to rancher on Mac (cattle-cluster-agent fails on :8080/health) - minikube

I have installed rancher on Mac, and used custom port numbers; am able to connect to localhost:443 and work through rancher GUI.
docker run --privileged -d --restart=unless-stopped -p 980:80 -p 981:443 --name rancher rancher/rancher
I then created a minikube local cluster, & tried to import that into rancher via GUI.
minikube -p dxmcs3 start
As suggested by Rancher as part of GUI, I ran the following to import; since the minikube POD needs to be able to access rancher endpoint on my host machine (mac), I updated the CATTLE_SERVER host to point to https://host.minikube.internal:981 prior to running.
curl --insecure -sfL https://localhost:981/v3/import/zsh5mtnkkrtz7tj7scbpb59q5tsjzmhg7r5476z7gdnh4xgjczt7cd_c-6sfj7.yaml | sed 's/https:\/\/localhost:981/https:\/\/host.minikube.internal:981/g' | kubectl apply -f -
The deployments go well, but the cluster's import shows up as "pending" forever on Rancher UI.
on digging, I noticed that the rancher/rancher-agent:v2.5.9 POD fails to start. More specifically, "cluster-register" container fails on health check up start. It tries to to http://(pod-ip):8080/health & fails.
containers:
- env:
- name: CATTLE_FEATURES
- name: CATTLE_IS_RKE
value: "false"
- name: CATTLE_SERVER
value: https://host.minikube.internal:981
- name: CATTLE_CA_CHECKSUM
value: d8e8de5d121fac709e414123ff792458931530569555a3d233d555deb62a9490
- name: CATTLE_CLUSTER
value: "true"
- name: CATTLE_K8S_MANAGED
value: "true"
- name: CATTLE_CLUSTER_REGISTRY
image: rancher/rancher-agent:v2.5.9
imagePullPolicy: IfNotPresent
name: cluster-register
readinessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 2
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
Appreciate the community's help with resolving this. Or, Any pointer to a page I can refer. I really want to be able to have Rancher manage multiple minikube based local k8s environments for my learning & experimentation.

Related

Skaffold deploys an extra pod when deploying with helm

I have skaffold build that creates 2 docker images - myimg1 and myimg2.
When I try to deploy via helm, skaffold deploys them fine, then tries to deploy and additional pod for myimg1. I can see that in minikube dashboard. At first see myimg1:<tag> but in a few seconds that changes to myimg1-0
deploy:
kubectl:
manifests:
- k8s/pgadmin.yaml
kubeContext: minikube
helm:
releases:
- name: myrelease
chartPath: charts/mychart
artifactOverrides:
myimg1.container.image: myimg1
- name: myrelease
chartPath: charts/mychart
artifactOverrides:
myimg2.container.image: myimg2
Values.yam looks like this:
myimg1:
name: myimg1
container:
image: jfrog.host.com/docker_repo/myimg1:latest
pullPolicy: IfNotPresent
port: 5432
service:
type: ClusterIP
port: 5432
myimg2:
name: myimag2
container:
image: jfrog.host.com/docker_repo/myimg1:latest
pullPolicy: IfNotPresent
port: 8080
service:
type: ClusterIP
port: 8080
portName: http
Now, if I run helm manually and override the image tags via command line arguments, it deploys fine:
helm install --atomic --debug myrelease ./charts/mychart/ -f ./charts/mychart/values.yaml --set-string myimg1.container.image=myimg1:latest --set-string myimg2.container.image: myimg2:latest
When I run skaffold dev however, both containers are deployed, then the second image is deployed again and it tries to pulls from the remote registry.
As far as I can tell everything is set up fine:
Is there anyway to debug this?
I see the following log message:
time="2022-10-07T10:40:50-04:00" level=info msg="Streaming logs from pod: myimg1-0 container: myimg1" subtask=-1 task=DevLoop
time="2022-10-07T10:40:50-04:00" level=debug msg="Running command: [kubectl --context minikube logs --since=7s -f myimg1-0 -c myrelease --namespace default]" subtask=-1 task=DevLoop

Kubernetes postStart hook leads to race condition

I use a MySQL on Kubernetes with a postStart hook which should run a query after the start of the database.
This is the relevant part of my template.yaml:
spec:
containers:
- name: ${{APP}}
image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
imagePullPolicy: Always
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- hostname && sleep 12 && echo $QUERY | /opt/rh/rh-mysql80/root/usr/bin/mysql
-h localhost -u root -D grafana
-P 3306
ports:
- name: tcp3306
containerPort: 3306
readinessProbe:
tcpSocket:
port: 3306
initialDelaySeconds: 15
timeoutSeconds: 1
livenessProbe:
tcpSocket:
port: 3306
initialDelaySeconds: 120
timeoutSeconds: 1
When the pod start, the PVC for the database gets corruped and the pod crashes. When I restart the pod, it works. I guess the query runs, when the database is not up yet. I guess this might get fixed with the readinessprobe, but I am not an expert at these topics.
Did anyone else run into a similar issue and knows how to fix it?
Note that postStart will be call at least once but may also be called more than once. This make postStart a bad place to run query.
You can set pod restartPolicy: OnFailure and run the query in separate MYSQL container. Start your second container with wait and run your query. Note that your query should produce idempotent result or your data integrity may breaks; consider when the pod is re-create with the existing data volume.

How to test if container is running postgres

I just deployed a docker with Postgres on it on AWS EKS.
Below is the description details.
How do i access or test if postgres is working. I tried accessing both IP with post within VPC from worker node.
psql -h #IP -U #defaultuser -p 55432
Below is the deployment.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: postgres
spec:
replicas: 1
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:10.4
imagePullPolicy: "IfNotPresent"
ports:
- containerPort: 55432
# envFrom:
# - configMapRef:
# name: postgres-config
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: postgredb
volumes:
- name: postgredb
persistentVolumeClaim:
claimName: efs
Surprisingly I am able to connect to psql but on 5432. :( Not sure what I am doing wrong. I passed containerPort as 55432
In short, you need to run the following command to expose your database on 55432 port.
kubectl expose deployment postgres --port=55432 --target-port=5432 --name internal-postgresql-svc
From now on, you can connect to it via port 55432 from inside your cluster by using the service name as a hostname, or via its ClusterIP address:
kubectl get internal-postgresql-svc
What you did in your deployment manifest file, you just attached additional information about the network connections a container uses, between misleadingly, because your container exposes 5432 port only (you can verify it by your self here). You should use a Kubernetes Service - abstraction which enables access to your PODs, and does the necessary port mapping behind the scene.
Please check also different port Types, if you want to expose your postgresql database outside of the Kubernetes cluster.
To test if progress is running fine inside POD`s container:
kubectl run postgresql-postgresql-client --rm --tty -i --restart='Never' --namespace default --image bitnami/postgresql --env="PGPASSWORD=<HERE_YOUR_PASSWORD>" --command -- psql --host <HERE_HOSTNAME=SVC_OR_IP> -U <HERE_USERNAME>

What is the equivalent for depends_on in kubernetes

I have a docker compose file with the following entries
version: '2.1'
services:
mysql:
container_name: mysql
image: mysql:latest
volumes:
- ./mysqldata:/var/lib/mysql
environment:
MYSQL_ROOT_PASSWORD: 'password'
ports:
- '3306:3306'
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3306"]
interval: 30s
timeout: 10s
retries: 5
test1:
container_name: test1
image: test1:latest
ports:
- '4884:4884'
- '8443'
depends_on:
mysql:
condition: service_healthy
links:
- mysql
The Test-1 container is dependent on mysql and it needs to be up and running.
In docker this can be controlled using health check and depends_on attributes.
The health check equivalent in kubernetes is readinessprobe which i have already created but how do we control the container startup in the pod's?????
Any directions on this is greatly appreciated.
My Kubernetes file:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: deployment
spec:
replicas: 1
template:
metadata:
labels:
app: deployment
spec:
containers:
- name: mysqldb
image: "dockerregistry:mysqldatabase"
imagePullPolicy: Always
ports:
- containerPort: 3306
readinessProbe:
tcpSocket:
port: 3306
initialDelaySeconds: 15
periodSeconds: 10
- name: test1
image: "dockerregistry::test1"
imagePullPolicy: Always
ports:
- containerPort: 3000
That's the beauty of Docker Compose and Docker Swarm... Their simplicity.
We came across this same Kubernetes shortcoming when deploying the ELK stack.
We solved it by using a side-car (initContainer), which is just another container in the same pod thats run first, and when it's complete, kubernetes automatically starts the [main] container. We made it a simple shell script that is in loop until Elasticsearch is up and running, then it exits and Kibana's container starts.
Below is an example of a side-car that waits until Grafana is ready.
Add this 'initContainer' block just above your other containers in the Pod:
spec:
initContainers:
- name: wait-for-grafana
image: darthcabs/tiny-tools:1
args:
- /bin/bash
- -c
- >
set -x;
while [[ "$(curl -s -o /dev/null -w ''%{http_code}'' http://grafana:3000/login)" != "200" ]]; do
echo '.'
sleep 15;
done
containers:
.
.
(your other containers)
.
.
This was purposefully left out. The reason being is that applications should be responsible for their connect/re-connect logic for connecting to service(s) such as a database. This is outside the scope of Kubernetes.
While I don't know the direct answer to your question except this link (k8s-AppController), I don't think it's wise to use same deployment for DB and app. Because you are tightly coupling your db with app and loosing awesome k8s option to scale any one of them as needed. Further more if your db pod dies you loose your data as well.
Personally what I would do is to have a separate StatefulSet with Persistent Volume for database and Deployment for app and use Service to make sure their communication.
Yes I have to run few different commands and may need at least two separate deployment files but this way I am decoupling them and can scale them as needed. And my data is being persistent as well!
As mentioned, you should run the database and the application containers in separate pods and connect them with a service.
Unfortunately, both Kubernetes and Helm don't provide a functionality similar to what you've described. We had many issues with that and tried a few approaches until we have decided to develop a smallish utility that solved this problem for us.
Here's the link to the tool we've developed: https://github.com/Opsfleet/depends-on
You can make pods wait until other pods become ready according to their readinessProbe configuration. It's very close to Docker's depends_on functionality.
In Kubernetes terminology one your docker-compose set is a Pod.
So, there is no depends_on equivalent there. Kubernetes will check all containers in a pod and they all have to be alive for a mark that pod as Healthy and will always run them together.
In your case, you need to prepare configuration of Deployment like that:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 1
template:
metadata:
labels:
app: app-and-db
spec:
containers:
- name: app
image: nginx
ports:
- containerPort: 80
- name: db
image: mysql
ports:
- containerPort: 3306
After pod will be started, your database will be available on localhost interface for your application, because of network conception:
Containers within a pod share an IP address and port space, and can find each other via localhost. They can also communicate with each other using standard inter-process communications like SystemV semaphores or POSIX shared memory.
But, as #leninhasda mentioned, it is not a good idea to run database and application in your pod and without Persistent Volume. Here is a good tutorial on how to run a stateful application in the Kubernetes.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
what about liveness and readiness ??? supports commands, http requests and more
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: k8s.gcr.io/busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5

Why Kubernetes Pod gets into Terminated state giving Completed reason and exit code 0?

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:
Kubernetes version 1.4 over AWS
8 pods running a NodeJS API (Express) deployed as a Kubernetes Deployment
One of the pods gets restarted for no apparent reason late at night (no traffic, no CPU spikes, no memory pressure, no alerts...). Number of restarts is increased as a result of this.
Logs don't show anything abnormal (ran kubectl -p to see previous logs, no errors at all in there)
Resource consumption is normal, cannot see any events about Kubernetes rescheduling the pod into another node or similar
Describing the pod gives back TERMINATED state, giving back COMPLETED reason and exit code 0. I don't have the exact output from kubectl as this pod has been replaced multiple times now.
The pods are NodeJS server instances, they cannot complete, they are always running waiting for requests.
Would this be internal Kubernetes rearranging of pods? Is there any way to know when this happens? Shouldn't be an event somewhere saying why it happened?
Update
This just happened in our prod environment. The result of describing the offending pod is:
api:
Container ID: docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
Image: 1.0.5-RC1
Image ID: docker://sha256:XXXX
Ports: 9080/TCP, 9443/TCP
State: Running
Started: Mon, 27 Mar 2017 12:30:05 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 24 Mar 2017 13:32:14 +0000
Finished: Mon, 27 Mar 2017 12:29:58 +0100
Ready: True
Restart Count: 1
Update 2
Here it is the deployment.yaml file used:
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
namespace: "${ENV}"
name: "${APP}${CANARY}"
labels:
component: "${APP}${CANARY}"
spec:
replicas: ${PODS}
minReadySeconds: 30
revisionHistoryLimit: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
template:
metadata:
labels:
component: "${APP}${CANARY}"
spec:
serviceAccount: "${APP}"
${IMAGE_PULL_SECRETS}
containers:
- name: "${APP}${CANARY}"
securityContext:
capabilities:
add:
- IPC_LOCK
image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
env:
- name: "KUBERNETES_CA_CERTIFICATE_FILE"
value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
- name: "NAMESPACE"
valueFrom:
fieldRef:
fieldPath: "metadata.namespace"
- name: "ENV"
value: "${ENV}"
- name: "PORT"
value: "${INTERNAL_PORT}"
- name: "CACHE_POLICY"
value: "all"
- name: "SERVICE_ORIGIN"
value: "${SERVICE_ORIGIN}"
- name: "DEBUG"
value: "http,controllers:recommend"
- name: "APPDYNAMICS"
value: "true"
- name: "VERSION"
value: "${TAG}"
ports:
- name: "http"
containerPort: ${HTTP_INTERNAL_PORT}
protocol: "TCP"
- name: "https"
containerPort: ${HTTPS_INTERNAL_PORT}
protocol: "TCP"
The Dockerfile of the image referenced in the above Deployment manifest:
FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app
RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app
# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
&& /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'
ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH
# Create app directory
WORKDIR /home/app
COPY . /home/app
# Install app dependencies
RUN npm install
EXPOSE 9080 9443
CMD [ "npm", "start" ]
npm start is an alias for a regular node app.js command that starts a NodeJS server on port 9080.
Check the version of docker you run, and whether the docker daemon was restarted during that time.
If the docker daemon was restarted, all the container would be terminated (unless you use the new "live restore" feature in 1.12). In some docker versions, docker may incorrectly reports "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
If this is still relevant, we just had a similar problem in our cluster.
We have managed to find more information by inspecting the logs from docker itself. ssh onto your k8s node and run the following:
sudo journalctl -fu docker.service
I hade similar problem when we upgraded to version 2.x Pos get restarted even after the Dags ran successfully.
I later resolved it after a long time of debugging by overriding the pod template and specifying it in the airflow.cfg file.
[kubernetes]
....
pod_template_file = {{ .Values.airflow.home }}/pod_template.yaml
---
# pod_template.yaml
apiVersion: v1
kind: Pod
metadata:
name: dummy-name
spec:
serviceAccountName: default
restartPolicy: Never
containers:
- name: base
image: dummy_image
imagePullPolicy: IfNotPresent
ports: []
command: []