GCloud kubernetes cluster with 1 Insufficient cpu error - kubernetes

I created a Kubernetes cluster on Google Cloud using:
gcloud container clusters create my-app-cluster --num-nodes=1
Then I deployed my 3 apps (backend, frontend and a scraper) and created a load balancer. I used the following configuration file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-server
          image: gcr.io/my-app/server
          ports:
            - containerPort: 8009
          envFrom:
            - secretRef:
                name: my-app-production-secrets
        - name: my-app-scraper
          image: gcr.io/my-app/scraper
          ports:
            - containerPort: 8109
          envFrom:
            - secretRef:
                name: my-app-production-secrets
        - name: my-app-frontend
          image: gcr.io/my-app/frontend
          ports:
            - containerPort: 80
          envFrom:
            - secretRef:
                name: my-app-production-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - name: my-app-server-port
      protocol: TCP
      port: 8009
      targetPort: 8009
    - name: my-app-scraper-port
      protocol: TCP
      port: 8109
      targetPort: 8109
    - name: my-app-frontend-port
      protocol: TCP
      port: 80
      targetPort: 80
When typing kubectl get pods I get:
NAME                                 READY   STATUS    RESTARTS   AGE
my-app-deployment-6b49c9b5c4-5zxw2   0/3     Pending   0          12h
When investigating in Google Cloud I see an "Unschedulable" state with an "insufficient cpu" error on the pod:
When going to the Nodes section under my cluster in the Clusters page, I see 681 mCPU requested and 940 mCPU allocatable:
What is wrong? Why doesn't my pod start?

Every container has a default CPU request (in GKE I've noticed it's 0.1 CPU, or 100m). Assuming these defaults, the three containers in that pod request another 0.3 CPU.
The node has 0.68 CPU (680m) requested by other workloads and a total allocatable limit on that node of 0.94 CPU (940m).
If you want to see what workloads are reserving that 0.68 CPU, you need to inspect the pods on the node. On the GKE page where you see the resource allocations and limits per node, clicking the node takes you to a page that provides this information.
In my case I can see 2 kube-dns pods taking 0.26 CPU each, amongst others. These are system pods needed to operate the cluster correctly. What you see will also depend on which add-on services you have selected, for example HTTP Load Balancing (Ingress), Kubernetes Dashboard and so on.
Your pod would take the total requested CPU on the node to 0.98, which is more than the 0.94 allocatable limit; that is why your pod cannot start.
Note that scheduling is based on the amount of CPU requested for each workload, not how much it actually uses, nor the limit.
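To make the arithmetic concrete, here is a small Python sketch of the fit check the scheduler performs; the numbers come from this question and the function name is illustrative:

```python
# Sketch of the scheduler's CPU fit check. All values are in millicores.
def fits(allocatable_m, already_requested_m, new_pod_requests_m):
    """A pod fits only if the sum of its containers' CPU requests, plus
    what is already requested on the node, stays within the node's
    allocatable CPU."""
    return already_requested_m + sum(new_pod_requests_m) <= allocatable_m

# Three containers at the 100m default: 680 + 300 = 980 > 940 -> Pending
print(fits(940, 680, [100, 100, 100]))  # False

# With explicit 50m requests: 680 + 150 = 830 <= 940 -> schedulable
print(fits(940, 680, [50, 50, 50]))     # True
```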
Your options:
Turn off any add-on service which is taking CPU resource that you don't need.
Add more CPU resource to your cluster. To do that you will either need to change your node pool to use VMs with more CPU, or increase the number of nodes in your existing pool. You can do this in GKE console or via the gcloud command line.
Make explicit requests in your containers for less CPU; these will override the defaults.
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
        - name: my-app-server
          image: gcr.io/my-app/server
          ...
          resources:
            requests:
              cpu: "50m"
        - name: my-app-scraper
          image: gcr.io/my-app/scraper
          ...
          resources:
            requests:
              cpu: "50m"
        - name: my-app-frontend
          image: gcr.io/my-app/frontend
          ...
          resources:
            requests:
              cpu: "50m"

Related

Why does my NodePort service change its port number?

I am trying to install Velero for k8s. During the installation, when trying to install MinIO, I changed its service type from ClusterIP to NodePort. My pods run successfully, and I can also see the NodePort service is up and running.
master-k8s#masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get pods -n velero -owide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE                       NOMINATED NODE   READINESS GATES
minio-8649b94fb5-vk7gv   1/1     Running   0          16m   10.244.1.102   node1k8s-virtual-machine   <none>           <none>
master-k8s#masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get svc -n velero
NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
minio   NodePort   10.111.72.207   <none>        9000:31481/TCP   53m
When I try to access my service, the port number changes from 31481 to 45717 by itself. Every time I correct the port number and hit enter, it changes back to a new port and I am not able to access my application.
This is my MinIO service file:
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  type: NodePort
  ports:
    - port: 9000
      targetPort: 9000
      protocol: TCP
  selector:
    component: minio
What have I done so far?
I looked at the logs and everything shows successful, no errors. I also tried it with a LoadBalancer service. With the LoadBalancer the port does not change, but I am still not able to access the application.
I found nothing on Google about this issue.
I also checked the pods and services in all namespaces to see whether these port numbers are being used. No services use these ports.
What do I want?
Can you please help me find out what causes my application to change its port? Where is the issue and how do I fix it? How can I access the application dashboard?
Update
This is the full manifest file. It may help to find my mistake.
apiVersion: v1
kind: Namespace
metadata:
  name: velero
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      component: minio
  template:
    metadata:
      labels:
        component: minio
    spec:
      volumes:
        - name: storage
          emptyDir: {}
        - name: config
          emptyDir: {}
      containers:
        - name: minio
          image: minio/minio:latest
          imagePullPolicy: IfNotPresent
          args:
            - server
            - /storage
            - --config-dir=/config
          env:
            - name: MINIO_ACCESS_KEY
              value: "minio"
            - name: MINIO_SECRET_KEY
              value: "minio123"
          ports:
            - containerPort: 9002
          volumeMounts:
            - name: storage
              mountPath: "/storage"
            - name: config
              mountPath: "/config"
---
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  # ClusterIP is recommended for production environments.
  # Change to NodePort if needed per documentation,
  # but only if you run Minio in a test/trial environment, for example with Minikube.
  type: NodePort
  ports:
    - port: 9002
      nodePort: 31482
      targetPort: 9002
      protocol: TCP
  selector:
    component: minio
---
apiVersion: batch/v1
kind: Job
metadata:
  namespace: velero
  name: minio-setup
  labels:
    component: minio
spec:
  template:
    metadata:
      name: minio-setup
    spec:
      restartPolicy: OnFailure
      volumes:
        - name: config
          emptyDir: {}
      containers:
        - name: mc
          image: minio/mc:latest
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
            - "mc --config-dir=/config config host add velero http://minio:9000 minio minio123 && mc --config-dir=/config mb -p velero/velero"
          volumeMounts:
            - name: config
              mountPath: "/config"
Edit 2: Logs of pod
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.1.108:9000 http://127.0.0.1:9000
Console: http://10.244.1.108:33045 http://127.0.0.1:33045
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
Edit 3: Logs of pod
master-k8s#masterk8s-virtual-machine:~/velero-1.9.5$ kubectl logs minio-8649b94fb5-qvzfh -n velero
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.2.131:9000 http://127.0.0.1:9000
Console: http://10.244.2.131:36649 http://127.0.0.1:36649
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
You can set the nodePort number inside the port config so that it won't be automatically set.
Try this Service:
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  type: NodePort
  ports:
    - port: 9000
      nodePort: 31481
      targetPort: 9000
      protocol: TCP
  selector:
    component: minio
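One thing worth checking with an explicit nodePort: it has to fall within the cluster's service node port range, 30000-32767 by default. A small Python sketch of that validation, mirroring the port spec above (the helper name is illustrative, not a Kubernetes API):

```python
# Sketch of the validation the API server applies to an explicit nodePort:
# it must fall within the cluster's service node port range, which is
# 30000-32767 by default (--service-node-port-range).
NODE_PORT_RANGE = range(30000, 32768)

def node_port_valid(port_spec):
    # A missing nodePort is fine: the API server then allocates one itself.
    node_port = port_spec.get("nodePort")
    return node_port is None or node_port in NODE_PORT_RANGE

print(node_port_valid({"port": 9000, "nodePort": 31481}))  # True
print(node_port_valid({"port": 9000, "nodePort": 8080}))   # False
print(node_port_valid({"port": 9000}))                     # True (auto-assigned)
```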

Kubernetes : RabbitMQ pod is spammed with connections from kube-system

I'm currently learning Kubernetes and all its quirks.
I'm currently using a RabbitMQ Deployment, Service and Pod in my cluster to exchange messages between apps in the cluster. However, I saw an abnormal number of restarts of the RabbitMQ pod.
After installing Prometheus and Grafana to investigate, I saw that the RabbitMQ pod would consume more and more memory and CPU until it gets killed by the OOM killer every two hours or so. The graph looks like this:
Graph of CPU consumption in my cluster (rabbitmq in red)
After that I looked into the RabbitMQ pod UI and saw that an app in my cluster (IP 10.224.0.5) was constantly creating new connections; this IP corresponds to my kube-system and my Prometheus instance, as shown by the following logs:
k get all -A -o wide | grep 10.224.0.5
E1223 12:13:48.231908 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
E1223 12:13:48.311831 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
kube-system pod/azure-ip-masq-agent-xh9jk 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/cloud-node-manager-h5ff5 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azuredisk-node-sf8sn 3/3 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azurefile-node-97nbt 3/3 Running 0 19d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/kube-proxy-2s5tn 1/1 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
monitoring pod/prometheus-prometheus-node-exporter-dztwx 1/1 Running 0 20h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
Also, I noticed that these connections seem to be blocked by RabbitMQ, as the field connection.blocked in the client properties is set to true, as shown in the following image:
Print screen of a connection details from rabbitMQ pod's UI
I saw in the documentation that RabbitMQ starts to block connections when it runs low on resources, but I set the CPU and memory limits to 1 CPU and 1 GiB of RAM, and the connections are blocked from the start anyway.
On the cluster I'm also using Keda, which uses the RabbitMQ pod and polls it every second to see if there are any messages in a queue (I set pollingInterval to 1 in the yaml). But as I said earlier, it's not Keda that's creating all these connections, it's kube-system. Unless Keda uses one of the components listed earlier in the log to poll RabbitMQ, and Keda's polling interval does not correspond to seconds (which is highly unlikely, as the docs say the polling interval is given in seconds), I have no idea what's going on with all these connections.
The following section contains the yamls of all the components that might be involved with this problem (keda and rabbitmq) :
rabbitMQ Replica Count.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    component: rabbitmq
  name: rabbitmq-controller
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: taskQueue
        component: rabbitmq
    spec:
      containers:
        - image: rabbitmq:3.11.5-management
          name: rabbitmq
          ports:
            - containerPort: 5672
              name: amqp
            - containerPort: 15672
              name: http
          resources:
            limits:
              cpu: 1
              memory: 1Gi
rabbitMQ Service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: rabbitmq
  name: rabbitmq-service
spec:
  type: LoadBalancer
  ports:
    - port: 5672
      targetPort: 5672
      name: amqp
    - port: 15672
      targetPort: 15672
      name: http
  selector:
    app: taskQueue
    component: rabbitmq
keda ScaledJob, Secret and TriggerAuthentication (sample data is just a replacement for fields that I do not want to reveal :) ):
apiVersion: v1
kind: Secret
metadata:
  name: keda-rabbitmq-secret
data:
  host: sample-host # base64 encoded value of format amqp://guest:password@localhost:5672/vhost
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-rabbitmq-conn
  namespace: default
spec:
  secretTargetRef:
    - parameter: host
      name: keda-rabbitmq-secret
      key: host
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: builder-job-scaler
  namespace: default
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 600
    backoffLimit: 5
    template:
      spec:
        volumes:
          - name: shared-storage
            emptyDir: {}
        initContainers:
          - name: sourcesfetcher
            image: sample image
            volumeMounts:
              - name: shared-storage
                mountPath: /mnt/shared
            env:
              - name: SHARED_STORAGE_MOUNT_POINT
                value: /mnt/shared
              - name: RABBITMQ_ENDPOINT
                value: sample host
              - name: RABBITMQ_QUEUE_NAME
                value: buildOrders
        containers:
          - name: builder
            image: sample image
            volumeMounts:
              - name: shared-storage
                mountPath: /mnt/shared
            env:
              - name: SHARED_STORAGE_MOUNT_POINT
                value: /mnt/shared
              - name: MINIO_ENDPOINT
                value: sample endpoint
              - name: MINIO_PORT
                value: sample port
              - name: MINIO_USESSL
                value: "false"
              - name: MINIO_ROOT_USER
                value: sample user
              - name: MINIO_ROOT_PASSWORD
                value: sample password
              - name: BUCKET_NAME
                value: "hex"
              - name: SERVER_NAME
                value: sample url
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
              limits:
                cpu: 500m
                memory: 512Mi
        restartPolicy: OnFailure
  pollingInterval: 1
  maxReplicaCount: 2
  minReplicaCount: 0
  rollout:
    strategy: gradual
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: buildOrders
        mode: QueueLength
        value: "1"
      authenticationRef:
        name: keda-trigger-auth-rabbitmq-conn
Any help would be very much appreciated!

How to get kubernetes service external ip dynamically inside manifests file?

We are creating a deployment in which the command needs the IP of the pre-existing service pointing to a statefulset. Below is the manifest file for the deployment. Currently, we are manually entering the service external IP inside this deployment manifest. Now we would like it to auto-populate during runtime. Is there a way to achieve this dynamically using environment variables or another way?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-api
  namespace: app-api
spec:
  selector:
    matchLabels:
      app: app-api
  replicas: 1
  template:
    metadata:
      labels:
        app: app-api
    spec:
      containers:
        - name: app-api
          image: asia-south2-docker.pkg.dev/rnd20/app-api/api:09
          command: ["java","-jar","-Dallow.only.apigateway.request=false","-Dserver.port=8084","-Ddedupe.searcher.url=http://10.10.0.6:80","-Dspring.cloud.zookeeper.connect-string=10.10.0.6:2181","-Dlogging$.file.path=/usr/src/app/logs/springboot","/usr/src/app/app_api/dedupe-engine-components.jar",">","/usr/src/app/out.log"]
          livenessProbe:
            httpGet:
              path: /health
              port: 8084
              httpHeaders:
                - name: Custom-Header
                  value: ""
            initialDelaySeconds: 60
            periodSeconds: 60
          ports:
            - containerPort: 4016
          resources:
            limits:
              cpu: 1
              memory: "2Gi"
            requests:
              cpu: 1
              memory: "2Gi"
NOTE: The IP in question here is the internal load balancer IP, i.e. the external IP for the service, and the service is in a different namespace. Below is the manifest for the same.
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: app
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: app
spec:
  selector:
    app: app
  type: LoadBalancer
  ports:
    - name: container
      port: 80
      targetPort: 8080
      protocol: TCP
You could use the following command instead:
command:
  - /bin/bash
  - -c
  - |-
    set -exuo pipefail
    ip=$(dig +search +short servicename.namespacename)
    exec java -jar -Dallow.only.apigateway.request=false -Dserver.port=8084 -Ddedupe.searcher.url=http://$ip:80 -Dspring.cloud.zookeeper.connect-string=$ip:2181 -Dlogging$.file.path=/usr/src/app/logs/springboot /usr/src/app/app_api/dedupe-engine-components.jar > /usr/src/app/out.log
It first resolves the IP address using dig (if you don't have dig in your image, you need to substitute it with something else you have), then execs your original java command.
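If your image has Python rather than dig, the same lookup can be done with the standard library; a minimal sketch (inside the cluster you would pass the service DNS name placeholder from above; localhost is used here only so the example runs anywhere):

```python
# Resolve a hostname to its first A record, the way `dig +short` does,
# using only the Python standard library.
import socket

def resolve(hostname):
    # Returns the first IPv4 address for the given name.
    return socket.gethostbyname(hostname)

print(resolve("localhost"))  # 127.0.0.1
```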
As of today I'm not aware of any "native" kubernetes way to provide IP meta information directly to the pod.
If you are sure they exist beforehand, and you deploy into the same namespace, you can read them from environment variables. It's documented here: https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables.
When a Pod is run on a Node, the kubelet adds a set of environment variables for each active Service. It adds {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT variables, where the Service name is upper-cased and dashes are converted to underscores. It also supports variables (see makeLinkVariables) that are compatible with Docker Engine's "legacy container links" feature.
For example, the Service redis-master which exposes TCP port 6379 and has been allocated cluster IP address 10.0.0.11, produces the following environment variables:
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
Note that those won't update after the container is started.
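The naming rule described above (service name upper-cased, dashes converted to underscores) can be sketched in Python; the helper below is illustrative, not part of any Kubernetes library:

```python
# Sketch of the naming rule the kubelet applies, per the documentation
# quoted above: service name upper-cased, dashes turned into underscores.
def service_env_vars(svc_name, cluster_ip, port):
    prefix = svc_name.upper().replace("-", "_")
    return {
        prefix + "_SERVICE_HOST": cluster_ip,
        prefix + "_SERVICE_PORT": str(port),
    }

env = service_env_vars("redis-master", "10.0.0.11", 6379)
print(env["REDIS_MASTER_SERVICE_HOST"])  # 10.0.0.11
print(env["REDIS_MASTER_SERVICE_PORT"])  # 6379
```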

Rabbit mq - Error while waiting for Mnesia tables

I have installed rabbitmq using helm chart on a kubernetes cluster. The rabbitmq pod keeps restarting. On inspecting the pod logs I get the below error
2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
When I try to do kubectl describe pod I get this error
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config
    Optional:  false
  healthchecks:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-healthchecks
    Optional:  false
  rabbitmq-token-w74kb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-token-w74kb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                                               Message
  ----     ------     ----                     ----                                               -------
  Warning  Unhealthy  3m27s (x878 over 7h21m)  kubelet, gke-analytics-default-pool-918f5943-w0t0  Readiness probe failed: Timeout: 70 seconds ...
Checking health of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Status of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
I have provisioned the above on Google Cloud on a kubernetes cluster. I am not sure during what specific situation it started failing. I had to restart the pod and since then it has been failing.
What is the issue here ?
TLDR
helm upgrade rabbitmq bitnami/rabbitmq --set clustering.forceBoot=true
Problem
The problem happens for the following reason:
All RMQ pods are terminated at the same time for some reason (maybe because you explicitly set the StatefulSet replicas to 0, or something else)
One of them is the last one to stop (maybe just a tiny bit after the others). It stores this condition ("I'm standalone now") in its filesystem, which in k8s is the PersistentVolume(Claim). Let's say this pod is rabbitmq-1.
When you spin the StatefulSet back up, the pod rabbitmq-0 is always the first to start (see here).
During startup, pod rabbitmq-0 first checks whether it's supposed to run standalone. But as far as it can see on its own filesystem, it's part of a cluster. So it checks for its peers and doesn't find any. This results in a startup failure by default.
rabbitmq-0 thus never becomes ready.
rabbitmq-1 never starts because that's how StatefulSets are deployed - one after another. If it were to start, it would start successfully, because it sees that it can run standalone as well.
So in the end, it's a bit of a mismatch between how RabbitMQ and StatefulSets work. RMQ says: "if everything goes down, just start everything at the same time; one will be able to start, and as soon as this one is up, the others can rejoin the cluster." k8s StatefulSets say: "starting everything all at once is not possible; we'll start with the 0".
Solution
To fix this, there is a force_boot command for rabbitmqctl which basically tells an instance to start standalone if it doesn't find any peers. How you can use this from Kubernetes depends on the Helm chart and container you're using. In the Bitnami Chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot = true, which translates to an env variable RABBITMQ_FORCE_BOOT = yes in the container, which will then issue the above command for you.
But looking at the problem, you can also see why deleting PVCs will work (other answer). The pods will just all "forget" that they were part of a RMQ cluster the last time around, and happily start. I would prefer the above solution though, as no data is being lost.
I just deleted the existing PersistentVolumeClaim and reinstalled RabbitMQ, and it started working.
So every time after installing RabbitMQ on a Kubernetes cluster, if I scale the pods down to 0 and then scale them up later, I get the same error. I also tried deleting the PersistentVolumeClaim without uninstalling the RabbitMQ Helm chart, but I still get the same error.
So it seems each time I scale the cluster down to 0, I need to uninstall the RabbitMQ Helm chart, delete the corresponding PersistentVolumeClaims and reinstall the Helm chart to make it work.
If you are in the same scenario as me and you don't know who deployed the Helm chart and how it was deployed, you can edit the StatefulSet directly to avoid messing things up further.
I was able to make it work without deleting the Helm chart.
kubectl -n rabbitmq edit statefulsets.apps rabbitmq
Under the spec section I added the env variable RABBITMQ_FORCE_BOOT = yes as follows:
spec:
  containers:
    - env:
        - name: RABBITMQ_FORCE_BOOT  # New Line 1 Added
          value: "yes"               # New Line 2 Added
And that should fix the issue as well. Please first try to do it the proper way, as explained above by Ulli.
In my case the solution was simple:
Step 1: Downscale the StatefulSet; it will not delete the PVC.
kubectl scale statefulsets rabbitmq-1-rabbitmq --namespace teps-rabbitmq --replicas=1
Step 2: Access the RabbitMQ pod.
kubectl exec -it rabbitmq-1-rabbitmq-0 -n teps-rabbitmq -- bash
Step 3: Reset the cluster.
rabbitmqctl stop_app
rabbitmqctl force_boot
Step 4: Rescale the StatefulSet.
kubectl scale statefulsets rabbitmq-1-rabbitmq --namespace teps-rabbitmq --replicas=4
I also got a similar kind of error, as given below.
2020-06-05 03:45:37.153 [info] <0.234.0> Waiting for Mnesia tables for
30000 ms, 9 retries left 2020-06-05 03:46:07.154 [warning] <0.234.0>
Error while waiting for Mnesia tables:
{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-05 03:46:07.154 [info] <0.234.0> Waiting for Mnesia tables for
30000 ms, 8 retries left
In my case, the slave node (server) of the RabbitMQ cluster was down. Once I started the slave node, the master node started without an error.
Try this deployment:
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  name: rabbitmq
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  type: NodePort
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
      nodePort: 31672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
      nodePort: 30672
    - name: stomp
      protocol: TCP
      port: 61613
      targetPort: 61613
  selector:
    app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  name: rabbitmq-lb
  labels:
    app: rabbitmq
spec:
  # Headless service to give the StatefulSet a DNS which is known in the cluster (hostname-#.app.namespace.svc.cluster.local, )
  # in our case - rabbitmq-#.rabbitmq.rabbitmq-namespace.svc.cluster.local
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
    - name: stomp
      port: 61613
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq-namespace
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_stomp].
  rabbitmq.conf: |
    ## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
    ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
    ## Set to "hostname" to use pod hostnames.
    ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
    ## environment variable.
    cluster_formation.k8s.address_type = hostname
    ## Important - this is the suffix of the hostname, as each node gets "rabbitmq-#", we need to tell what's the suffix
    ## it will give each new node that enters the way to contact the other peer node and join the cluster (if using hostname)
    cluster_formation.k8s.hostname_suffix = .rabbitmq.rabbitmq-namespace.svc.cluster.local
    ## How often should node cleanup checks run?
    cluster_formation.node_cleanup.interval = 30
    ## Set to false if automatic removal of unknown/absent nodes
    ## is desired. This can be dangerous, see
    ## * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
    ## * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    ## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
    queue_master_locator=min-masters
    ## See http://www.rabbitmq.com/access-control.html#loopback-users
    loopback_users.guest = false
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      name: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
        name: rabbitmq
        state: rabbitmq
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
        - name: rabbitmq-k8s
          image: rabbitmq:3.8.3
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
            - name: data
              mountPath: /var/lib/rabbitmq/mnesia
          ports:
            - name: http
              protocol: TCP
              containerPort: 15672
            - name: amqp
              protocol: TCP
              containerPort: 5672
          livenessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 10
          resources:
            requests:
              memory: "0"
              cpu: "0"
            limits:
              memory: "2048Mi"
              cpu: "1000m"
          readinessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 20
            periodSeconds: 60
            timeoutSeconds: 10
          imagePullPolicy: Always
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            # See a note on cluster_formation.k8s.address_type in the config file section
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local"
            - name: K8S_SERVICE_NAME
              value: "rabbitmq"
            - name: RABBITMQ_ERLANG_COOKIE
              value: "mycookie"
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: enabled_plugins
                path: enabled_plugins
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "default"
        resources:
          requests:
            storage: 3Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
subjects:
  - kind: ServiceAccount
    name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader

Resource allocation to container in Kubernetes pods

Consider the below .yaml file :
application/guestbook/redis-slave-deployment.yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: redis-slave
  labels:
    app: redis
spec:
  selector:
    matchLabels:
      app: redis
      role: slave
      tier: backend
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        role: slave
        tier: backend
    spec:
      containers:
        - name: slave
          image: gcr.io/google_samples/gb-redisslave:v1
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          env:
            - name: GET_HOSTS_FROM
              value: dns
          ports:
            - containerPort: 6379
The resources section isn't clear to me. If I have 16GB RAM and a 4-core CPU, each core 2GHz, how much is being requested by the configuration above?
So you have a total of 4 CPU cores and 16GB RAM. This Deployment will start two Pods (replicas), and each will start with 0.1 cores and 100MiB reserved on the Node on which it starts. So in total 0.2 cores and 200MiB will be reserved, leaving up to 3.8 cores and roughly 15.8GB. However, actual usage may exceed the reservation, as this is only the requested amount. To specify an upper limit you use a limits section.
It can be counter-intuitive that CPU allocation is based on cores rather than GHz; there's a fuller explanation in the GCP docs and more on the arithmetic in the official Kubernetes docs.
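The reservation arithmetic above can be written out explicitly; a quick Python sketch using the numbers from this question:

```python
# The reservation arithmetic from the answer above, written out.
# 2 replicas, each requesting 100m CPU and 100Mi memory, on a node
# with 4 cores and 16GiB of RAM.
replicas = 2
cpu_request_m = 100       # 100m = 0.1 core per Pod
mem_request_mi = 100      # 100Mi per Pod

total_cores = 4
total_mem_mi = 16 * 1024  # 16GiB in MiB

reserved_cores = replicas * cpu_request_m / 1000
reserved_mem_mi = replicas * mem_request_mi

print(reserved_cores)                  # 0.2 cores reserved
print(total_cores - reserved_cores)    # 3.8 cores left
print(total_mem_mi - reserved_mem_mi)  # 16184 MiB left
```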