I am trying to install Velero for Kubernetes. During the installation, when installing MinIO, I changed its Service type from ClusterIP to NodePort. My pods run successfully, and I can also see that the NodePort Service is up and running.
master-k8s@masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get pods -n velero -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
minio-8649b94fb5-vk7gv 1/1 Running 0 16m 10.244.1.102 node1k8s-virtual-machine <none> <none>
master-k8s@masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get svc -n velero
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
minio NodePort 10.111.72.207 <none> 9000:31481/TCP 53m
When I try to access my service, the port number changes from 31481 to 45717 by itself. Every time I correct the port number and hit Enter, it changes to a new port again, and I am not able to access my application.
This is my MinIO Service manifest:
apiVersion: v1
kind: Service
metadata:
namespace: velero
name: minio
labels:
component: minio
spec:
type: NodePort
ports:
- port: 9000
targetPort: 9000
protocol: TCP
selector:
component: minio
What have I done so far?
I looked at the logs and everything shows success, with no errors. I also tried it with a LoadBalancer Service. With the LoadBalancer the port does not change, but I am still not able to access the application.
I found nothing on Google about this issue.
I also checked the pods and services in all namespaces to see whether these port numbers are already in use. No services use these ports.
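For reference, this is roughly how I checked whether the NodePorts were already taken (31481 is the port assigned to my Service above; adjust as needed):
kubectl get svc --all-namespaces | grep 31481
kubectl get svc --all-namespaces | grep 45717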
What do I want?
Can you please help me find out what causes my application to change its port? Where is the issue and how do I fix it? How can I access the application dashboard?
Update
This is the full manifest file. It may help to find my mistake.
apiVersion: v1
kind: Namespace
metadata:
name: velero
---
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: velero
name: minio
labels:
component: minio
spec:
strategy:
type: Recreate
selector:
matchLabels:
component: minio
template:
metadata:
labels:
component: minio
spec:
volumes:
- name: storage
emptyDir: {}
- name: config
emptyDir: {}
containers:
- name: minio
image: minio/minio:latest
imagePullPolicy: IfNotPresent
args:
- server
- /storage
- --config-dir=/config
env:
- name: MINIO_ACCESS_KEY
value: "minio"
- name: MINIO_SECRET_KEY
value: "minio123"
ports:
- containerPort: 9002
volumeMounts:
- name: storage
mountPath: "/storage"
- name: config
mountPath: "/config"
---
apiVersion: v1
kind: Service
metadata:
namespace: velero
name: minio
labels:
component: minio
spec:
# ClusterIP is recommended for production environments.
# Change to NodePort if needed per documentation,
# but only if you run Minio in a test/trial environment, for example with Minikube.
type: NodePort
ports:
- port: 9002
nodePort: 31482
targetPort: 9002
protocol: TCP
selector:
component: minio
---
apiVersion: batch/v1
kind: Job
metadata:
namespace: velero
name: minio-setup
labels:
component: minio
spec:
template:
metadata:
name: minio-setup
spec:
restartPolicy: OnFailure
volumes:
- name: config
emptyDir: {}
containers:
- name: mc
image: minio/mc:latest
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- "mc --config-dir=/config config host add velero http://minio:9000 minio minio123 && mc --config-dir=/config mb -p velero/velero"
volumeMounts:
- name: config
mountPath: "/config"
Edit 2: Logs of the Pod
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.1.108:9000 http://127.0.0.1:9000
Console: http://10.244.1.108:33045 http://127.0.0.1:33045
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
Edit 3: Logs of the Pod
master-k8s@masterk8s-virtual-machine:~/velero-1.9.5$ kubectl logs minio-8649b94fb5-qvzfh -n velero
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.2.131:9000 http://127.0.0.1:9000
Console: http://10.244.2.131:36649 http://127.0.0.1:36649
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
You can set the nodePort number inside the port config so that it won't be automatically set.
Try this Service:
apiVersion: v1
kind: Service
metadata:
namespace: velero
name: minio
labels:
component: minio
spec:
type: NodePort
ports:
- port: 9000
nodePort: 31481
targetPort: 9000
protocol: TCP
selector:
component: minio
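Once the nodePort is pinned, a quick sanity check could look like the following; <node-ip> is a placeholder for one of your worker node IPs, and the health endpoint path assumes a reasonably recent MinIO release:
kubectl get svc minio -n velero   # PORT(S) should now consistently show 9000:31481/TCP
curl http://<node-ip>:31481/minio/health/live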
I'm currently learning Kubernetes and all its quirks.
I'm currently using a RabbitMQ Deployment, Service and Pod in my cluster to exchange messages between apps in the cluster. However, I noticed an abnormal number of restarts of the RabbitMQ pod.
After installing Prometheus and Grafana to investigate, I saw that the RabbitMQ pod would consume more and more memory and CPU until it got killed by the OOM killer every two hours or so. The graph looks like this:
Graph of CPU consumption in my cluster (rabbitmq in red)
After that I looked into the RabbitMQ pod UI and saw that an app in my cluster (IP 10.224.0.5) was constantly creating new connections, this IP corresponding to my kube-system and my Prometheus instance, as shown by the following logs:
k get all -A -o wide | grep 10.224.0.5
E1223 12:13:48.231908 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
E1223 12:13:48.311831 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
kube-system pod/azure-ip-masq-agent-xh9jk 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/cloud-node-manager-h5ff5 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azuredisk-node-sf8sn 3/3 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azurefile-node-97nbt 3/3 Running 0 19d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/kube-proxy-2s5tn 1/1 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
monitoring pod/prometheus-prometheus-node-exporter-dztwx 1/1 Running 0 20h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
Also, I noticed that these connections seem to be blocked by RabbitMQ, as the field connection.blocked in the client properties is set to true, as shown in the following image:
Screenshot of connection details from the RabbitMQ pod's UI
I saw in the documentation that RabbitMQ starts to block connections when it runs low on resources, but I set the CPU and memory limits to 1 CPU and 1 GiB of RAM, and the connections are blocked from the start anyway.
On the cluster I'm also using KEDA, which uses the RabbitMQ pod and polls it every second to check whether there are any messages in a queue (I set pollingInterval to 1 in the YAML). But as I said earlier, it's not KEDA that's creating all the connections, it's kube-system. Unless KEDA uses a component described earlier in the output to poll RabbitMQ, and KEDA's polling interval does not correspond to seconds (which is highly unlikely, as the docs state this polling interval is given in seconds), I don't know at all what's going on with all these connections.
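To see exactly which clients hold those connections, rather than guessing from pod IPs, something like this can be run against the RabbitMQ pod (the pod name is a placeholder):
kubectl exec -it <rabbitmq-pod> -- rabbitmqctl list_connections user peer_host peer_port state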
The following section contains the YAMLs of all the components that might be involved in this problem (KEDA and RabbitMQ):
rabbitMQ Replica Count.yaml
apiVersion: v1
kind: ReplicationController
metadata:
labels:
component: rabbitmq
name: rabbitmq-controller
spec:
replicas: 1
template:
metadata:
labels:
app: taskQueue
component: rabbitmq
spec:
containers:
- image: rabbitmq:3.11.5-management
name: rabbitmq
ports:
- containerPort: 5672
name: amqp
- containerPort: 15672
name: http
resources:
limits:
cpu: 1
memory: 1Gi
rabbitMQ Service.yaml
apiVersion: v1
kind: Service
metadata:
labels:
component: rabbitmq
name: rabbitmq-service
spec:
type: LoadBalancer
ports:
- port: 5672
targetPort: 5672
name: amqp
- port: 15672
targetPort: 15672
name: http
selector:
app: taskQueue
component: rabbitmq
KEDA ScaledJob, Secret and TriggerAuthentication (sample data is just a replacement for fields that I do not want to reveal :) ):
apiVersion: v1
kind: Secret
metadata:
name: keda-rabbitmq-secret
data:
  host: sample-host # base64 encoded value of the form amqp://guest:password@localhost:5672/vhost
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-trigger-auth-rabbitmq-conn
namespace: default
spec:
secretTargetRef:
- parameter: host
name: keda-rabbitmq-secret
key: host
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: builder-job-scaler
namespace: default
spec:
jobTargetRef:
parallelism: 1
completions: 1
activeDeadlineSeconds: 600
backoffLimit: 5
template:
spec:
volumes:
- name: shared-storage
emptyDir: {}
initContainers:
- name: sourcesfetcher
image: sample image
volumeMounts:
- name: shared-storage
mountPath: /mnt/shared
env:
- name: SHARED_STORAGE_MOUNT_POINT
value: /mnt/shared
- name: RABBITMQ_ENDPOINT
value: sample host
- name: RABBITMQ_QUEUE_NAME
value: buildOrders
containers:
- name: builder
image: sample image
volumeMounts:
- name: shared-storage
mountPath: /mnt/shared
env:
- name: SHARED_STORAGE_MOUNT_POINT
value: /mnt/shared
- name: MINIO_ENDPOINT
value: sample endpoint
- name: MINIO_PORT
value: sample port
- name: MINIO_USESSL
value: "false"
- name: MINIO_ROOT_USER
value: sample user
- name: MINIO_ROOT_PASSWORD
value: sampel password
- name: BUCKET_NAME
value: "hex"
- name: SERVER_NAME
value: sample url
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 500m
memory: 512Mi
restartPolicy: OnFailure
pollingInterval: 1
maxReplicaCount: 2
minReplicaCount: 0
rollout:
strategy: gradual
triggers:
- type: rabbitmq
metadata:
protocol: amqp
queueName: buildOrders
mode: QueueLength
value: "1"
authenticationRef:
name: keda-trigger-auth-rabbitmq-conn
Any help would be very much appreciated!
I'm using an HPA based on a custom metric on GKE.
The HPA is not working and it's showing me this error log:
unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
When I run kubectl get apiservices | grep custom I get
v1beta1.custom.metrics.k8s.io services/prometheus-adapter False (FailedDiscoveryCheck) 135d
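For reference, the detailed reason behind the FailedDiscoveryCheck can be inspected like this (the prometheus-adapter label is an assumption; adjust it to your install):
kubectl describe apiservice v1beta1.custom.metrics.k8s.io
kubectl get pods --all-namespaces -l app.kubernetes.io/name=prometheus-adapter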
This is the HPA spec config:
spec:
scaleTargetRef:
kind: Deployment
name: api-name
apiVersion: apps/v1
minReplicas: 3
maxReplicas: 50
metrics:
- type: Object
object:
target:
kind: Service
name: api-name
apiVersion: v1
metricName: messages_ready_per_consumer
targetValue: '1'
and this is the Service's spec config:
spec:
ports:
- name: worker-metrics
protocol: TCP
port: 8080
targetPort: worker-metrics
selector:
app.kubernetes.io/instance: api
app.kubernetes.io/name: api-name
clusterIP: 10.8.7.9
clusterIPs:
- 10.8.7.9
type: ClusterIP
sessionAffinity: None
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
What should I do to make it work?
First of all, confirm that the Metrics Server POD is running in your kube-system namespace. Also, you can use the following manifest:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: metrics-server
namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: metrics-server
namespace: kube-system
labels:
k8s-app: metrics-server
spec:
selector:
matchLabels:
k8s-app: metrics-server
template:
metadata:
name: metrics-server
labels:
k8s-app: metrics-server
spec:
serviceAccountName: metrics-server
volumes:
# mount in tmp so we can safely use from-scratch images and/or read-only containers
- name: tmp-dir
emptyDir: {}
containers:
- name: metrics-server
image: k8s.gcr.io/metrics-server-amd64:v0.3.1
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
imagePullPolicy: Always
volumeMounts:
- name: tmp-dir
mountPath: /tmp
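A quick way to verify the metrics pipeline after applying the manifest above (the label matches the manifest; kubectl top only returns data once the server is healthy):
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top nodes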
If so, take a look at the logs and look for any stackdriver adapter lines. This issue is commonly caused by a problem with the custom-metrics-stackdriver-adapter. It usually crashes in the metrics-server namespace. To solve that, use the resource from this URL, and for the deployment, use this image:
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1
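For example, the adapter's logs could be checked with something like the following; the namespace and resource name here are assumptions, so adjust them to wherever the adapter is deployed in your cluster:
kubectl get pods --all-namespaces | grep custom-metrics-stackdriver-adapter
kubectl logs -n custom-metrics deployment/custom-metrics-stackdriver-adapter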
Another common root cause of this is an OOM issue. In this case, adding more memory solves the problem. To assign more memory, you can specify the new memory amount in the configuration file, as the following example shows:
apiVersion: v1
kind: Pod
metadata:
name: memory-demo
namespace: mem-example
spec:
containers:
- name: memory-demo-ctr
image: polinux/stress
resources:
limits:
memory: "200Mi"
requests:
memory: "100Mi"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
In the above example, the container has a memory request of 100 MiB and a memory limit of 200 MiB. In the manifest, the "--vm-bytes", "150M" argument tells the container to attempt to allocate 150 MiB of memory. You can visit the official Kubernetes documentation for more information about memory settings.
You can use the following threads for more reference: GKE - HPA using custom metrics - unable to fetch metrics, Stackdriver-metadata-agent-cluster-level gets OOMKilled, and Custom-metrics-stackdriver-adapter pod keeps crashing.
What do you get for kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"?
There should be a pod to which the service refers.
If there is a pod, check its logs with kubectl logs <pod-name>. You can add -f to the kubectl logs command to follow the logs.
Adding this block in my EKS nodes security group rules solved the issue for me:
node_security_group_additional_rules = {
...
ingress_cluster_metricserver = {
description = "Cluster to node 4443 (Metrics Server)"
protocol = "tcp"
from_port = 4443
to_port = 4443
type = "ingress"
source_cluster_security_group = true
}
...
}
I have implemented a gRPC service, built it into a container, and deployed it using k8s, in particular AWS EKS, as a DaemonSet.
The Pod starts and reaches Running status very soon, but it takes very long, typically 300s, for the actual service to become accessible.
In fact, when I run kubectl logs to print the log of the Pod, it is empty for a long time.
I have logged something at the very start of the service. In fact, my code looks like this:
package main

import "log"

func init() {
    log.Println("init")
}

func main() {
    // ...
}
So I am pretty sure that when there are no logs, the service has not started yet.
I understand that there may be a time gap between the Pod being Running and the actual process inside it running. However, 300s looks too long to me.
Furthermore, this happens randomly; sometimes the service is ready almost immediately. By the way, my runtime image is based on chromedp headless-shell, not sure if that is relevant.
Could anyone provide some advice for how to debug and locate the problem? Many thanks!
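For anyone trying to narrow this down, the gap can be located by comparing the pod events with the container start timestamps, for example (the pod name is a placeholder):
kubectl describe pod <pod-name>   # events: scheduling, image pull, container started
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].state.running.startedAt}'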
Update
I did not set any readiness probes.
Running kubectl get -o yaml of my DaemonSet gives
apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "1"
creationTimestamp: "2021-10-13T06:30:16Z"
generation: 1
labels:
app: worker
uuid: worker
name: worker
namespace: collection-14f45957-e268-4719-88c3-50b533b0ae66
resourceVersion: "47265945"
uid: 88e4671f-9e33-43ef-9c49-b491dcb578e4
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: worker
uuid: worker
template:
metadata:
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "2112"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: worker
uuid: worker
spec:
containers:
- env:
- name: GRPC_PORT
value: "22345"
- name: DEBUG
value: "false"
- name: TARGET
value: localhost:12345
- name: TRACKER
value: 10.100.255.31:12345
- name: MONITOR
value: 10.100.125.35:12345
- name: COLLECTABLE_METHODS
value: shopping.ShoppingService.GetShop
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: DISTRIBUTABLE_METHODS
value: collection.CollectionService.EnumerateShops
- name: PERFORM_TASK_INTERVAL
value: 0.000000s
image: xxx
imagePullPolicy: Always
name: worker
ports:
- containerPort: 22345
protocol: TCP
resources:
requests:
cpu: 1800m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
- env:
- name: CAPTCHA_PARALLEL
value: "32"
- name: HTTP_PROXY
value: http://10.100.215.25:8080
- name: HTTPS_PROXY
value: http://10.100.215.25:8080
- name: API
value: 10.100.111.11:12345
- name: NO_PROXY
value: 10.100.111.11:12345
- name: POD_IP
image: xxx
imagePullPolicy: Always
name: source
ports:
- containerPort: 12345
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/ssl/certs/api.crt
name: ca
readOnly: true
subPath: tls.crt
dnsPolicy: ClusterFirst
nodeSelector:
api/nodegroup-app: worker
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- name: ca
secret:
defaultMode: 420
secretName: ca
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 2
desiredNumberScheduled: 2
numberAvailable: 2
numberMisscheduled: 0
numberReady: 2
observedGeneration: 1
updatedNumberScheduled: 2
Furthermore, there are two containers in the Pod. Only one of them is exceptionally slow to start, and the other one is always fine.
When you use HTTP_PROXY in your solution, watch out for how it may route differently from your underlying cluster network, which often results in unexpected timeouts.
I have posted a community wiki answer to summarize the topic:
As gohm'c has mentioned in the comment:
Do connections made by container "source" always have to go thru HTTP_PROXY, even if it is connecting services in the cluster - do you think possible long time been taken because of proxy? Can try kubectl exec -it <pod> -c <source> -- sh and curl/wget external services.
This is a good observation. Note that some connections can be made directly and that adding extra traffic through the proxy may result in delays. For example, a bottleneck may arise. You can read more information about using an HTTP proxy to access the Kubernetes API in the documentation.
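If the proxy turns out to be the bottleneck for in-cluster traffic, a common mitigation is to exclude cluster-internal destinations from it. A minimal sketch for the source container's environment, assuming the 10.100.0.0/16 range seen in the manifest and the default cluster DNS suffixes (CIDR support in NO_PROXY depends on the client):
- name: NO_PROXY
  value: "localhost,127.0.0.1,10.100.0.0/16,.svc,.cluster.local"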
Additionally, you can create readiness probes to know when a container is ready to start accepting traffic.
A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
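A minimal sketch of what such probes could look like for the worker container above; port 22345 is taken from the DaemonSet, and using a plain TCP check instead of a gRPC health check is an assumption:
startupProbe:
  tcpSocket:
    port: 22345
  periodSeconds: 10
  failureThreshold: 60   # allows up to ~600s for slow starts before the kubelet restarts the container
readinessProbe:
  tcpSocket:
    port: 22345
  periodSeconds: 5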
I'm trying to bring up a RabbitMQ cluster on Kubernetes using the rabbitmq-peer-discovery-k8s plugin, and I always have only one pod running and ready while the next one always fails.
I tried multiple changes to my configuration, and this is what got at least one pod running:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: rabbitmq
namespace: namespace-dev
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: endpoint-reader
namespace: namespace-dev
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: endpoint-reader
namespace: namespace-dev
subjects:
- kind: ServiceAccount
name: rabbitmq
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: endpoint-reader
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: "rabbitmq-data"
labels:
name: "rabbitmq-data"
release: "rabbitmq-data"
namespace: "namespace-dev"
spec:
capacity:
storage: 5Gi
accessModes:
- "ReadWriteMany"
nfs:
path: "/path/to/nfs"
server: "xx.xx.xx.xx"
persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: "rabbitmq-data-claim"
namespace: "namespace-dev"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5Gi
selector:
matchLabels:
release: rabbitmq-data
---
# headless service Used to access pods using hostname
kind: Service
apiVersion: v1
metadata:
name: rabbitmq-headless
namespace: namespace-dev
spec:
clusterIP: None
# publishNotReadyAddresses, when set to true, indicates that DNS implementations must publish the notReadyAddresses of subsets for the Endpoints associated with the Service. The default value is false. The primary use case for setting this field is to use a StatefulSet's Headless Service to propagate SRV records for its Pods without respect to their readiness for purpose of peer discovery. This field will replace the service.alpha.kubernetes.io/tolerate-unready-endpoints when that annotation is deprecated and all clients have been converted to use this field.
# Since access to the Pod using DNS requires Pod and Headless service to be started before launch, publishNotReadyAddresses is set to true to prevent readinessProbe from finding DNS when the service is not started.
publishNotReadyAddresses: true
ports:
- name: amqp
port: 5672
- name: http
port: 15672
selector:
app: rabbitmq
---
# Used to expose the dashboard to the external network
kind: Service
apiVersion: v1
metadata:
namespace: namespace-dev
name: rabbitmq-service
spec:
type: NodePort
ports:
- name: http
protocol: TCP
port: 15672
targetPort: 15672
nodePort: 31672
- name: amqp
protocol: TCP
port: 5672
targetPort: 5672
nodePort: 30672
selector:
app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
name: rabbitmq-config
namespace: namespace-dev
data:
enabled_plugins: |
[rabbitmq_management,rabbitmq_peer_discovery_k8s].
rabbitmq.conf: |
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.node_cleanup.interval = 10
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
queue_master_locator=min-masters
loopback_users.guest = false
cluster_formation.randomized_startup_delay_range.min = 0
cluster_formation.randomized_startup_delay_range.max = 2
cluster_formation.k8s.service_name = rabbitmq-headless
cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.namespace-dev.svc.cluster.local
vm_memory_high_watermark.absolute = 1.6GB
disk_free_limit.absolute = 2GB
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: rabbitmq
namespace: rabbitmq
spec:
serviceName: rabbitmq-headless # Must be the same as the name of the headless service, used for hostname propagation access pod
selector:
matchLabels:
app: rabbitmq # In apps/v1, it needs to be the same as .spec.template.metadata.label for hostname propagation access pods, but not in apps/v1beta
replicas: 3
template:
metadata:
labels:
app: rabbitmq # In apps/v1, the same as .spec.selector.matchLabels
# setting podAntiAffinity
annotations:
scheduler.alpha.kubernetes.io/affinity: >
{
"podAntiAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": [{
"labelSelector": {
"matchExpressions": [{
"key": "app",
"operator": "In",
"values": ["rabbitmq"]
}]
},
"topologyKey": "kubernetes.io/hostname"
}]
}
}
spec:
serviceAccountName: rabbitmq
terminationGracePeriodSeconds: 10
containers:
- name: rabbitmq
image: rabbitmq:3.7.10
resources:
limits:
cpu: "0.5"
memory: 2Gi
requests:
cpu: "0.3"
memory: 2Gi
volumeMounts:
- name: config-volume
mountPath: /etc/rabbitmq
- name: rabbitmq-data
mountPath: /var/lib/rabbitmq/mnesia
ports:
- name: http
protocol: TCP
containerPort: 15672
- name: amqp
protocol: TCP
containerPort: 5672
livenessProbe:
exec:
command: ["rabbitmqctl", "status"]
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 5
readinessProbe:
exec:
command: ["rabbitmqctl", "status"]
initialDelaySeconds: 20
periodSeconds: 60
timeoutSeconds: 5
imagePullPolicy: IfNotPresent
env:
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: RABBITMQ_USE_LONGNAME
value: "true"
- name: RABBITMQ_NODENAME
value: "rabbit#$(HOSTNAME).rabbitmq-headless.namespace-dev.svc.cluster.local"
# If service_name is set in ConfigMap, there is no need to set it again here.
# - name: K8S_SERVICE_NAME
# value: "rabbitmq-headless"
- name: RABBITMQ_ERLANG_COOKIE
value: "mycookie"
volumes:
- name: config-volume
configMap:
name: rabbitmq-config
items:
- key: rabbitmq.conf
path: rabbitmq.conf
- key: enabled_plugins
path: enabled_plugins
- name: rabbitmq-data
persistentVolumeClaim:
claimName: rabbitmq-data-claim
I only get one pod running and ready instead of the 3 replicas:
[admin@devsvr3 yaml]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rabbitmq-0 1/1 Running 0 2m2s
rabbitmq-1 0/1 Running 1 43s
Inspecting the failing pod, I got this:
[admin@devsvr3 yaml]$ kubectl logs rabbitmq-1
## ##
## ## RabbitMQ 3.7.10. Copyright (C) 2007-2018 Pivotal Software, Inc.
########## Licensed under the MPL. See http://www.rabbitmq.com/
###### ##
########## Logs: <stdout>
Starting broker...
2019-02-06 21:09:03.303 [info] <0.211.0>
Starting RabbitMQ 3.7.10 on Erlang 21.2.3
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
2019-02-06 21:09:03.315 [info] <0.211.0>
node : rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : XhdCf8zpVJeJ0EHyaxszPg==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local
2019-02-06 21:09:10.617 [error] <0.219.0> Unable to parse vm_memory_high_watermark value "1.6GB"
2019-02-06 21:09:10.617 [info] <0.219.0> Memory high watermark set to 103098 MiB (108106919116 bytes) of 257746 MiB (270267297792 bytes) total
2019-02-06 21:09:10.690 [info] <0.221.0> Enabling free disk space monitoring
2019-02-06 21:09:10.690 [info] <0.221.0> Disk free limit set to 2000MB
2019-02-06 21:09:10.698 [info] <0.224.0> Limiting to approx 1048476 file handles (943626 sockets)
2019-02-06 21:09:10.698 [info] <0.225.0> FHC read buffering: OFF
2019-02-06 21:09:10.699 [info] <0.225.0> FHC write buffering: ON
2019-02-06 21:09:10.702 [info] <0.211.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-02-06 21:09:10.702 [info] <0.211.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-02-06 21:09:10.702 [info] <0.211.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-02-06 21:09:10.702 [info] <0.211.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-02-06 21:09:10.702 [info] <0.211.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-02-06 21:09:10.710 [info] <0.211.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
{inet,[inet],nxdomain}]}
2019-02-06 21:09:10.711 [error] <0.210.0> CRASH REPORT Process <0.210.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2019-02-06 21:09:10.711 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,815}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
[admin@devsvr3 yaml]$
What did I do wrong here?
Finally, I fixed it by adding this to the /etc/resolv.conf of my pods:
[my-rabbit-svc].[my-rabbitmq-namespace].svc.[cluster-name]
To add this to my pod, I used this setting in my StatefulSet:
dnsConfig:
searches:
- [my-rabbit-svc].[my-rabbitmq-namespace].svc.[cluster-name]
Full documentation here.
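With the names used in the question, that search entry would presumably look like this (assuming the default cluster domain cluster.local):
dnsConfig:
  searches:
    - rabbitmq-headless.namespace-dev.svc.cluster.local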
Try to set:
cluster_formation.k8s.host = [your kubernetes endpoint ip address]
cluster_formation.k8s.port = [your kubernetes endpoint port]
because it seems that your pod cannot resolve this name:
kubernetes.default.svc.cluster.local
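The API server endpoint address and port can be read from the endpoints of the default kubernetes Service, for example:
kubectl get endpoints kubernetes -n default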
Try using this stable helm chart https://github.com/helm/charts/tree/master/stable/rabbitmq
One possible solution here, instead of attaching a DNS config to the pod, is to use a k8s API proxy sidecar. So, instead of resolving
kubernetes.default.svc.cluster.local
you could set up a sidecar container like this:
- name: "k8s-api-sidecar"
image: "tommyvn/kubectl-proxy:latest"
in the StatefulSet/Deployment,
and change the ConfigMap to use it:
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = localhost
cluster_formation.k8s.port = 8001
cluster_formation.k8s.scheme = http
If you take a look at this repository https://github.com/tommyvn/kubectl-proxy, you will find that it is just a call to kubectl proxy.