I am using the official Helm chart for Airflow. Every pod works properly except the worker pod.
Even in that worker pod, 2 of the containers (git-sync and worker-log-groomer) work fine.
The error happens in the 3rd container (worker), which goes into CrashLoopBackOff with exit code 137 (OOMKilled).
In my OpenShift console, memory usage shows around 70%.
Although this error usually comes from a memory leak, that doesn't seem to be the case here. Please help, I have been stuck on this for a week now.
kubectl describe pod airflow-worker-0 ->
worker:
  Container ID:   <>
  Image:          <>
  Image ID:       <>
  Port:           <>
  Host Port:      <>
  Args:
    bash
    -c
    exec \
    airflow celery worker
  State:          Running
    Started:      <>
  Last State:     Terminated
    Reason:       OOMKilled
    Exit Code:    137
    Started:      <>
    Finished:     <>
  Ready:          True
  Restart Count:  3
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    DUMB_INIT_SETSID:           0
    AIRFLOW__CORE__FERNET_KEY:  <>  Optional: false
  Mounts:
    <>
git-sync:
  Container ID:   <>
  Image:          <>
  Image ID:       <>
  Port:           <none>
  Host Port:      <none>
  State:          Running
    Started:      <>
  Ready:          True
  Restart Count:  0
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    GIT_SYNC_REV:  HEAD
  Mounts:
    <>
worker-log-groomer:
  Container ID:   <>
  Image:          <>
  Image ID:       <>
  Port:           <none>
  Host Port:      <none>
  Args:
    bash
    /clean-logs
  State:          Running
    Started:      <>
  Ready:          True
  Restart Count:  0
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    AIRFLOW__LOG_RETENTION_DAYS:  5
  Mounts:
    <>
I am pretty sure you know the answer; I have read all your articles on Airflow. Thank you :)
https://stackoverflow.com/users/1376561/marc-lamberti
The issue occurs because no explicit limit is placed in the "resources" section of the Helm chart's values.yaml for the pods.
By default it is:
resources: {}
but this causes a problem, as the pods can then consume unbounded memory.
By changing it to:
resources:
  limits:
    cpu: 200m
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 512Mi
it becomes explicit how much memory and CPU the pod can request and use.
This solved my issue.
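For the worker pod specifically, the block above goes under the workers key in values.yaml. A sketch, assuming the key layout of the official apache-airflow chart (verify against your chart version):

```yaml
# values.yaml (sketch; key names assume the official apache-airflow Helm chart)
workers:
  resources:
    limits:
      cpu: 200m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 512Mi
```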
In Kubernetes I would like to deploy an Apache NiFi cluster as a StatefulSet with 3 nodes.
The problem is that I would like to modify the node addresses for each replica in an init container in my YAML file.
I have to modify these properties for each node in Kubernetes:
'nifi.remote.input.host'
'nifi.cluster.node.address'
I need these FQDNs added per node in the NiFi properties:
nifi-0.nifi.NAMESPACENAME.svc.cluster.local
nifi-1.nifi.NAMESPACENAME.svc.cluster.local
nifi-2.nifi.NAMESPACENAME.svc.cluster.local
I have to modify the properties before deploying, so I tried the following init container, but it doesn't work:
initContainers:
  - name: modify-nifi-properties
    image: busybox:v01
    command:
      - sh
      - -c
      - |
        # Modify nifi.properties to use the correct hostname for each node
        for i in {1..3}; do
          sed -i "s/nifi-$((i-1))/nifi-$((i-1)).nifinamespace.nifi.svc.cluster.local/g" /opt/nifi/conf/nifi.properties
        done
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 100m
        memory: 100Mi
How can I do it ?
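A sketch of one possible fix, under stated assumptions (the image tag, properties path, and shared conf volume name are mine, not from the question): busybox's sh is not bash, so {1..3} never brace-expands; and since each replica only needs its own address, the FQDN can be derived from the pod's hostname instead of looping:

```yaml
initContainers:
  - name: modify-nifi-properties
    image: busybox:1.36   # assumed tag
    command:
      - sh
      - -c
      - |
        # POSIX sh: a StatefulSet pod's hostname is its ordinal name (nifi-0, nifi-1, ...)
        FQDN="$(hostname).nifi.NAMESPACENAME.svc.cluster.local"
        sed -i "s|^nifi.remote.input.host=.*|nifi.remote.input.host=${FQDN}|" /opt/nifi/conf/nifi.properties
        sed -i "s|^nifi.cluster.node.address=.*|nifi.cluster.node.address=${FQDN}|" /opt/nifi/conf/nifi.properties
    volumeMounts:
      - name: nifi-conf   # assumed volume shared with the main container, so the edit persists
        mountPath: /opt/nifi/conf
```

Note that an init container must edit files on a volume the main container also mounts; changes to the init container's own image filesystem are discarded when it exits.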
I am trying to put the autopurge config for ZooKeeper in my release.yaml file, but it doesn't seem to work.
Even after adding purgeInterval = 1 and snapRetainCount = 5, it always stays at autopurge.snapRetainCount=3 and autopurge.purgeInterval=0.
Below is the .yaml I am using for ZooKeeper in Kubernetes:
zookeeper:
  ## If true, install the Zookeeper chart alongside Pinot
  ## ref: https://github.com/kubernetes/charts/tree/master/incubator/zookeeper
  enabled: true
  urlOverride: "my-zookeeper:2181/pinot"
  port: 2181
  replicaCount: 3
  autopurge:
    purgeInterval: 1
    snapRetainCount: 5
  env:
    ## The JVM heap size to allocate to Zookeeper
    ZK_HEAP_SIZE: "256M"
    #ZOO_MY_ID: 1
  persistence:
    enabled: true
  image:
    PullPolicy: "IfNotPresent"
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
Can anyone please help?
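A hedged guess, given that this chart already passes settings through env (ZK_HEAP_SIZE): the incubator zookeeper chart may only read autopurge settings from environment variables rather than an autopurge: block. The variable names below are assumptions to verify against the chart's templates:

```yaml
# sketch; ZK_PURGE_INTERVAL / ZK_SNAP_RETAIN_COUNT are assumed variable names
zookeeper:
  env:
    ZK_HEAP_SIZE: "256M"
    ZK_PURGE_INTERVAL: "1"
    ZK_SNAP_RETAIN_COUNT: "5"
```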
I am trying to run the DPDK L2FWD application in a container managed by Kubernetes.
To achieve this I have done the below steps:
I have created a single-node K8s setup where both master and client run on the host machine. As the network plug-in, I have used Calico.
To create a customized DPDK Docker image, I have used the below Dockerfile:
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get install -y net-tools
RUN apt-get install -y python
RUN apt-get install -y kmod
RUN apt-get install -y iproute2
RUN apt-get install -y net-tools
ADD ./dpdk/ /home/sdn/dpdk/
WORKDIR /home/sdn/dpdk/
To run the DPDK application inside the pod, the below host directories are mounted into the pod with privileged access:
/mnt/huge
/usr
/lib
/etc
Below is the k8s deployment YAML used to create the pod:
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-pod126
spec:
  containers:
    - name: dpdk126
      image: dpdk-test126
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10; done"]
      resources:
        requests:
          memory: "2Gi"
          cpu: "100m"
      volumeMounts:
        - name: hostvol1
          mountPath: /mnt/huge
        - name: hostvol2
          mountPath: /usr
        - name: hostvol3
          mountPath: /lib
        - name: hostvol4
          mountPath: /etc
      securityContext:
        privileged: true
  volumes:
    - name: hostvol1
      hostPath:
        path: /mnt/huge
    - name: hostvol2
      hostPath:
        path: /usr
    - name: hostvol3
      hostPath:
        path: /home/sdn/kubernetes-test/libtest
    - name: hostvol4
      hostPath:
        path: /etc
The below configurations are already done on the host:
Huge page mounting.
Interface binding in user space.
After successful creation of the pod, when trying to run the DPDK L2FWD application inside the pod, I get the below error:
root@dpdk-pod126:/home/sdn/dpdk# ./examples/l2fwd/build/l2fwd -c 0x0f -- -p 0x03 -q 1
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: 1007 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
EAL: Error - exiting with code: 1
Cause: Invalid EAL arguments
According to this, you might be missing
medium: HugePages
from your hugepage volume.
Also, hugepages can be a bit finicky. Can you provide the output of:
cat /proc/meminfo | grep -i huge
and check whether there are any files in /mnt/huge?
Also, maybe this can be helpful: can you somehow check whether the hugepages are being mounted, e.g. via mount -t hugetlbfs nodev /mnt/huge?
First of all, you have to verify that you have enough hugepages in your system. Check with this kubectl command:
kubectl describe nodes
where you could see something like this:
Capacity:
  cpu:                12
  ephemeral-storage:  129719908Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      8Gi
  memory:             65863024Ki
  pods:               110
If your hugepages-2Mi is empty, then your k8s doesn't see the mounted hugepages.
After mounting hugepages on your host, you can prepare your pod to work with them. You don't need to mount the hugepages folder as you showed. You can simply add an emptyDir volume like this:
volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
HugePages-2Mi is a specific medium value that corresponds to hugepages of 2Mi size. If you want to use 1Gi hugepages, there is another value for it, HugePages-1Gi (with the matching resource name hugepages-1Gi).
After defining the volume, you can use it in volumeMounts like this:
volumeMounts:
  - mountPath: /hugepages-2Mi
    name: hugepage-2mi
And there is one additional step: you have to define resource limits for hugepage usage:
resources:
  limits:
    hugepages-2Mi: 128Mi
    memory: 128Mi
  requests:
    memory: 128Mi
After all these steps, you can run your container with hugepages inside it.
As @AdamTL mentioned, you can find additional info here.
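Putting the fragments from this answer together, a minimal pod sketch (the name and image are placeholders taken from the question):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-pod126
spec:
  containers:
    - name: dpdk126
      image: dpdk-test126   # placeholder image from the question
      volumeMounts:
        - mountPath: /hugepages-2Mi
          name: hugepage-2mi
      resources:
        limits:
          hugepages-2Mi: 128Mi   # hugepage limit must come with a memory limit
          memory: 128Mi
        requests:
          memory: 128Mi
  volumes:
    - name: hugepage-2mi
      emptyDir:
        medium: HugePages-2Mi
```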
How to do kubectl cp from a running pod to local? It says no such file or directory
I have contents in an Ubuntu container as below:
vagrant@ubuntu-xenial:~/k8s/pods$ kubectl exec command-demo-67m2b -c ubuntu -- sh -c "ls /tmp"
docker-sock
Now I simply want to copy the above /tmp contents using the below kubectl cp command:
kubectl cp command-demo-67m2b/ubuntu:/tmp /home
I get the command output below:
vagrant@ubuntu-xenial:~/k8s/pods$ kubectl cp command-demo-67m2b/ubuntu:/tmp /home
error: tmp no such file or directory
All I want to do is copy the above /tmp folder to the local host; unfortunately kubectl says no such file or directory.
I am confused: the /tmp folder exists in the Ubuntu container, so why does kubectl cp say the folder is not found?
My pod is command-demo-67m2b and the container name is ubuntu.
But the pod is up and running, as shown below:
vagrant@ubuntu-xenial:~/k8s/pods$ kubectl describe pods command-demo-67m2b
Name: command-demo-67m2b
Namespace: default
Node: ip-172-31-8-145/172.31.8.145
Start Time: Wed, 16 Jan 2019 00:57:05 +0000
Labels: controller-uid=a4ac12c1-1929-11e9-b787-02d8b37d95a0
job-name=command-demo
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: memory
request for container ubuntu; memory limit for container ubuntu
Status: Running
IP: 10.1.40.75
Controlled By: Job/command-demo
Containers:
command-demo-container:
Container ID:  docker://c680fb336242f456d90433a9aa89cf3e1cb1d45d73447769fcf86ce329176437
Image:         tarunkumard/fromscratch6.0
Image ID:      docker-pullable://tarunkumard/fromscratch6.0@sha256:709b588aa4edcc9bc2b39bee60f248bb02347a605da09fb389c448e41e2f543a
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 16 Jan 2019 00:57:07 +0000
Finished: Wed, 16 Jan 2019 00:58:36 +0000
Ready: False
Restart Count: 0
Limits:
memory: 1Gi
Requests:
memory: 900Mi
Environment: <none>
Mounts:
/opt/gatling-fundamentals/build/reports/gatling/ from docker-sock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-w6jt6 (ro)
ubuntu:
Container ID:  docker://7da9d43816253048fb4137fadc6c2994aac93fd272391b73f2fab3b02487941a
Image:         ubuntu:16.04
Image ID:      docker-
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
--
Args:
while true; do sleep 10; done;
State: Running
Started: Wed, 16 Jan 2019 00:57:07 +0000
Ready: True
Restart Count: 0
Limits:
memory: 1Gi
Requests:
memory: 1Gi
Environment:
JVM_OPTS: -Xms900M -Xmx1G
Mounts:
/docker-sock from docker-sock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-w6jt6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
docker-sock:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-w6jt6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-w6jt6
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Here is my YAML file in case you need it for reference:
apiVersion: batch/v1
kind: Job
metadata:
  name: command-demo
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      volumes:
        - name: docker-sock # Name of the AWS EBS volume
          emptyDir: {}
      restartPolicy: Never
      containers:
        - name: command-demo-container
          image: tarunkumard/fromscratch6.0
          volumeMounts:
            - mountPath: /opt/gatling-fundamentals/build/reports/gatling/ # Mount path within the container
              name: docker-sock # Name must match the AWS EBS volume name defined in spec.volumes
          imagePullPolicy: Never
          resources:
            requests:
              memory: "900Mi"
            limits:
              memory: "1Gi"
        - name: ubuntu
          image: ubuntu:16.04
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 10; done;" ]
          volumeMounts:
            - mountPath: /docker-sock # Mount path within the container
              name: docker-sock # Name must match the AWS EBS volume name defined in spec.volumes
          imagePullPolicy: Never
          env:
            - name: JVM_OPTS
              value: "-Xms900M -Xmx1G"
I expect the kubectl cp command to copy contents from the pod's container to local.
In your original command to exec into a container, you pass the -c ubuntu command, meaning you're selecting the Ubuntu container from the pod:
kubectl exec command-demo-67m2b -c ubuntu -- sh -c "ls /tmp"
However, in your kubectl cp command, you're not specifying the same container:
kubectl cp command-demo-67m2b/ubuntu:/tmp /home
Try this:
kubectl cp command-demo-67m2b:/tmp /home -c ubuntu
I have been able to get Kubernetes to recognise the GPUs on my nodes:
$ kubectl get node MY_NODE -o yaml
...
allocatable:
  cpu: "48"
  ephemeral-storage: "15098429006"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263756344Ki
  nvidia.com/gpu: "8"
  pods: "110"
capacity:
  cpu: "48"
  ephemeral-storage: 16382844Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263858744Ki
  nvidia.com/gpu: "8"
  pods: "110"
...
and I spin up a pod with:
Limits:
  cpu:             2
  memory:          2147483648
  nvidia.com/gpu:  1
Requests:
  cpu:             500m
  memory:          536870912
  nvidia.com/gpu:  1
However, the pod stays in Pending with:
Insufficient nvidia.com/gpu.
Am I specifying the resources correctly?
Have you installed the NVIDIA device plugin in K8s?
kubectl create -f nvidia.io/device-plugin.yml
Some devices are too old and cannot be health-checked, so this option must be disabled:
containers:
  - image: nvidia/k8s-device-plugin:1.9
    name: nvidia-device-plugin-ctr
    env:
      - name: DP_DISABLE_HEALTHCHECKS
        value: "xids"
Take a look at:
Device plugin: https://kubernetes.io/docs/concepts/cluster-administration/device-plugins/
NVIDIA github: https://github.com/NVIDIA/k8s-device-plugin
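For completeness, a minimal pod sketch requesting one GPU once the device plugin is running (the pod name and image are placeholders). For extended resources like nvidia.com/gpu, requests and limits must be equal, so specifying only the limit is enough:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:10.0-base   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # implies an equal request
```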