RabbitMQ cluster configuration in Google Kubernetes Engine

I have installed krew and used it to install the rabbitmq plugin. With the kubectl rabbitmq -n create instance --image=custom-image:v1 command I created a RabbitMQ StatefulSet in my Google Kubernetes Engine cluster.
The deployment was successful, but now when I try to update the StatefulSet with the new image custom-image:v2, the change is not getting rolled out.
Can someone help me here?
Thanks & Regards,
Robin

Normally, if you check the StatefulSet events, you will get a hint about what is going wrong. Usually, if the previous version was running, it means v2 is not reachable or cannot be deployed.
kubectl describe statefulset <statefulSet-name> -n <namespace>
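If the events alone don't explain it, it can also help to check the rollout status and the image the StatefulSet actually references; a sketch, assuming hypothetical names rabbitmq for the StatefulSet and rabbitmq-system for the namespace (substitute your own):

```shell
# Hypothetical names -- replace with your StatefulSet and namespace
kubectl rollout status statefulset/rabbitmq -n rabbitmq-system
# Recent events in the namespace, newest last
kubectl get events -n rabbitmq-system --sort-by=.lastTimestamp
# Confirm the StatefulSet really references custom-image:v2
kubectl get statefulset rabbitmq -n rabbitmq-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```

Note also that the StatefulSet created by kubectl rabbitmq is owned by the RabbitMQ cluster operator, so image changes made directly on the StatefulSet may be reverted by it; changing the image on the RabbitmqCluster resource is usually the supported path.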


Issues setting up Prometheus on EKS - Pods in Pending State (Seems to be dependent on PVCs waiting on Volume being created)

I have an EKS cluster for my university project and I want to setup Prometheus on the cluster. To do this I am using helm with the following commands (see this tutorial https://archive.eksworkshop.com/intermediate/240_monitoring/deploy-prometheus/):
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2"
When I check the status of the prometheus pods, the alert-manager and server seem to be in an infinite Pending state:
When I describe the prometheus-alertmanager-0 pod I see the following VolumeBinding error:
When I describe the prometheus-server-5d858bd4bd-6xmws pod I see the following VolumeBinding error:
I can also see there are two PVCs in a Pending state:
When I describe the prometheus-server PVC, I can see it's waiting for a volume to be created:
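For reference, the checks described above correspond to commands along these lines (namespace prometheus from the helm install; resource names as shown above):

```shell
# Pod status (alertmanager and server stuck in Pending)
kubectl get pods -n prometheus
kubectl describe pod prometheus-alertmanager-0 -n prometheus
# PVC status and the VolumeBinding / provisioning events
kubectl get pvc -n prometheus
kubectl describe pvc prometheus-server -n prometheus
```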
I'm familiar with Kubernetes basics, but PVCs are not something I have used before. Is the solution here to create a "volume", and if so, how do I do that? Would that solve the issue, or am I way off the mark?
Should I try to install Prometheus in a different way?
Any help on this is greatly appreciated.
Note: Although similar, this is not a duplicate of Prometheus server in pending state after installation using Helm. For one, the errors highlighted there are different errors; also, other manual steps such as creating volumes were performed there (which I have not done). Finally, I am following the specific tutorial referenced, and I am also asking whether I should set up Prometheus a different way if there is a simpler one.

Airflow is receiving incorrect POD status from Kubernetes

We are using Airflow to schedule Spark jobs on Kubernetes. Recently, I encountered a scenario where:
Airflow received a 404 error with the message "pods pod-name not found"
I manually checked that the pod was actually working fine at that time. In fact, I was able to collect logs using kubectl logs -f -n namespace podname
What happened as a result is that Airflow created another pod to run the same job, which resulted in a race condition.
Airflow uses the Kubernetes Python client's read_namespaced_pod() API:
def read_pod(self, pod):
    """Read POD information"""
    try:
        return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
    except BaseHTTPError as e:
        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
I believe read_namespaced_pod() calls the Kubernetes API. To investigate this further, I would like to check the logs of the Kubernetes API server.
Can you please share the steps to check what is happening on the Kubernetes side?
Note: the Kubernetes version is 1.18 and the Airflow version is 1.10.10.
Answering the question from the perspective of logs/troubleshooting:
I believe read_namespaced_pod() calls the Kubernetes API. To investigate this further, I would like to check the logs of the Kubernetes API server.
Yes, you are correct: this function calls the Kubernetes API. You can check the logs of the Kubernetes API server by running:
$ kubectl logs -n kube-system KUBERNETES_API_SERVER_POD_NAME
I would also consider checking the kube-controller-manager:
$ kubectl logs -n kube-system KUBERNETES_CONTROLLER_MANAGER_POD_NAME
Example output:
I0413 12:33:12.840270 1 event.go:291] "Event occurred" object="default/nginx-6799fc88d8" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: nginx-6799fc88d8-kchp7"
A side note!
The above commands will work assuming that the kubernetes-apiserver and kubernetes-controller-manager Pods are visible to you.
Can you please share the steps to check what is happening on the Kubernetes side?
This question targets the basics of troubleshooting/log checking.
For that you can use the following commands (and the ones mentioned earlier):
$ kubectl get RESOURCE RESOURCE_NAME:
example: $ kubectl get pod airflow-pod-name
also you can add -o yaml for more information
$ kubectl describe RESOURCE RESOURCE_NAME:
example: $ kubectl describe pod airflow-pod-name
$ kubectl logs POD_NAME:
example: $ kubectl logs airflow-pod-name
Additional resources:
Kubernetes.io: Docs: Concepts: Cluster administration: Logging Architecture
Kubernetes.io: Docs: Tasks: Debug application cluster: Debug cluster

Argo CD pod can't pull the image

I'm running a hybrid AKS cluster where I have one Linux and one Windows node. The Windows node will run a legacy server app. I want to use Argo CD to simplify the deployments.
After following the installation instructions (installation manifest) and installing Argo in the cluster I noticed I can't connect to its dashboard.
Troubleshooting the issue, I found that the Argo pod can't pull the image. Below is the output of kubectl describe pod argocd-server-75b6967787-xfccz -n argocd
Another thing visible here is that the Argo pod got assigned to a Windows node. From what I found here, Argo can't run on Windows nodes. I think that's the root cause of the problem.
Does anyone know how I can force the Argo pods to run on the Linux node?
I found that something like nodeSelector could be useful:
nodeSelector:
  kubernetes.io/os: linux
But how can I apply the nodeSelector to the already deployed Argo?
Does anyone know how I can force the Argo pods to run on the Linux node?
Check the labels on your other node and change the node selector accordingly:
nodeSelector:
  kubernetes.io/os: linux
You can directly edit the current Argo CD deployment:
kubectl edit deployment argocd-server -n <namespace name>
and update the selector directly from the CLI.
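Alternatively to an interactive edit, a one-shot patch can set the same selector; a sketch, assuming Argo CD lives in the argocd namespace:

```shell
# Merge-patch the pod template's nodeSelector on the server Deployment
kubectl patch deployment argocd-server -n argocd --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/os":"linux"}}}}}'
```

The same patch applies to argocd-repo-server; the application controller is a StatefulSet, so patch that resource type instead of a Deployment.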
I added nodeSelector to the pod templates of the argocd-server, argocd-repo-server and argocd-application-controller workloads (the former two are Deployments, the latter is a StatefulSet).
After that I redeployed the manifests. The images got pulled and I was able to access the Argo dashboard.

Kubernetes to OpenShift equivalent command

In Kubernetes I have the following command:
kubectl create deployment nginx --image=ewoutp/docker-nginx-curl -n web
What should I run to create this inside an OpenShift cluster?
I tried this:
oc create deployment nginx --image=ewoutp/docker-nginx-curl -n web
I am getting this error:
error: no matches for extensions/, Kind=Deployment
Can someone help me?
It might indicate that your OpenShift cluster is not running. Check oc status to view the status of your current project. If there is none, you should create a new project.
If your cluster is running, you can run oc create deployment nginx --image=ewoutp/docker-nginx-curl -n web -o yaml to verify whether the apiVersion is correct. The currently used version is apps/v1. If it is incorrect, you can save the output to a file and edit it to match the current version.
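If the saved manifest turns out to use the old API group, only the header needs changing; a minimal sketch of the corrected part (the rest of the generated manifest stays as-is):

```yaml
# Deployments under extensions/v1beta1 were removed from the API,
# so the apiVersion must be apps/v1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: web
```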

How to see the application logs in a Pod

We are moving towards microservices and using K8s for cluster orchestration. We are building out our infrastructure with Dynatrace and a Prometheus server for metrics collection, but they are not yet in good shape.
Our Java application on one of the pods is not working. I want to see the application logs.
How do I access these logs?
Assuming the application logs to stdout/stderr, run kubectl logs -n namespacename podname.
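A few common variants of that command, in case the pod has multiple containers or has restarted (containername is a placeholder):

```shell
kubectl logs -n namespacename podname                   # single-container pod
kubectl logs -n namespacename podname -c containername  # pick one container
kubectl logs -n namespacename podname --previous        # previous (crashed) instance
kubectl logs -n namespacename podname -f                # follow the stream
```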