Autopurge config for ZooKeeper in Kubernetes not working

I am trying to put the autopurge config for ZooKeeper in the release.yaml file, but it doesn't seem to work.
Even after setting purgeInterval: 1 and snapRetainCount: 5, the server still reports autopurge.snapRetainCount=3 and autopurge.purgeInterval=0.
Below is the .yaml I am using for ZooKeeper in Kubernetes:
zookeeper:
  ## If true, install the Zookeeper chart alongside Pinot
  ## ref: https://github.com/kubernetes/charts/tree/master/incubator/zookeeper
  enabled: true
  urlOverride: "my-zookeeper:2181/pinot"
  port: 2181
  replicaCount: 3
  autopurge:
    purgeInterval: 1
    snapRetainCount: 5
  env:
    ## The JVM heap size to allocate to Zookeeper
    ZK_HEAP_SIZE: "256M"
    #ZOO_MY_ID: 1
  persistence:
    enabled: true
  image:
    PullPolicy: "IfNotPresent"
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
Can anyone please help?
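For what it's worth, the incubator ZooKeeper chart that Pinot wraps appears to take these settings from environment variables rather than from an autopurge: block (an assumption based on that chart's config script; verify against the chart version you use). A minimal sketch of the same values expressed that way:

zookeeper:
  enabled: true
  urlOverride: "my-zookeeper:2181/pinot"
  port: 2181
  replicaCount: 3
  env:
    ZK_HEAP_SIZE: "256M"
    # Assumed variable names (ZK_PURGE_INTERVAL / ZK_SNAP_RETAIN_COUNT) read by the
    # incubator chart's config script; check the chart's templates before relying on them.
    ZK_PURGE_INTERVAL: "1"
    ZK_SNAP_RETAIN_COUNT: "5"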

Related

How to modify Apache NiFi node addresses while deploying in Kubernetes?

In Kubernetes I would like to deploy an Apache NiFi cluster as a StatefulSet with 3 nodes.
The problem is that I would like to modify the node addresses recursively in an init container in my YAML file.
I have to modify these parameters for each node in Kubernetes:
'nifi.remote.input.host'
'nifi.cluster.node.address'
I need to have these FQDNs added recursively to the NiFi properties:
nifi-0.nifi.NAMESPACENAME.svc.cluster.local
nifi-1.nifi.NAMESPACENAME.svc.cluster.local
nifi-2.nifi.NAMESPACENAME.svc.cluster.local
I have to modify the properties before deploying, so I tried the following init container, but it doesn't work:
initContainers:
  - name: modify-nifi-properties
    image: busybox:v01
    command:
      - sh
      - -c
      - |
        # Modify nifi.properties to use the correct hostname for each node
        for i in {1..3}; do
          sed -i "s/nifi-$((i-1))/nifi-$((i-1)).nifinamespace.nifi.svc.cluster.local/g" /opt/nifi/conf/nifi.properties
        done
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 100m
        memory: 100Mi
How can I do it?
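One detail worth noting: busybox's default shell is ash, and the bash-only {1..3} brace expansion is not expanded there, so the loop runs once with the literal string {1..3} and the sed never fires. Below is a hedged sketch of an alternative that derives each pod's own FQDN from its StatefulSet hostname instead of looping; the service and namespace names are taken from the question above, while the shared volume name is an assumption (the init container has to edit a copy of nifi.properties that the NiFi container also mounts):

initContainers:
  - name: modify-nifi-properties
    image: busybox:1.36            # any busybox tag with sed works; the tag here is a placeholder
    command:
      - sh
      - -c
      - |
        # In a StatefulSet, $HOSTNAME is the pod name (nifi-0, nifi-1, ...),
        # so each pod can compute its own FQDN without a loop.
        FQDN="$HOSTNAME.nifi.NAMESPACENAME.svc.cluster.local"
        sed -i "s|^nifi.remote.input.host=.*|nifi.remote.input.host=$FQDN|" /opt/nifi/conf/nifi.properties
        sed -i "s|^nifi.cluster.node.address=.*|nifi.cluster.node.address=$FQDN|" /opt/nifi/conf/nifi.properties
    volumeMounts:
      - name: nifi-conf            # assumed shared volume, also mounted by the nifi container
        mountPath: /opt/nifi/conf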

Airflow Helm Chart Worker Node Error - CrashLoopBackOff

I am using the official Helm chart for Airflow. Every pod works properly except the worker.
Even in that worker pod, two of the containers (git-sync and worker-log-groomer) work fine.
The error happens in the third container (worker), which goes into CrashLoopBackOff with exit code 137 (OOMKilled).
In my OpenShift cluster, memory usage shows around 70%.
Although this error usually points to a memory leak, that doesn't appear to be the case here. Please help, I have been stuck on this for a week now.
kubectl describe pod airflow-worker-0 ->
worker:
  Container ID:  <>
  Image:         <>
  Image ID:      <>
  Port:          <>
  Host Port:     <>
  Args:
    bash
    -c
    exec \
    airflow celery worker
  State:          Running
    Started:      <>
  Last State:     Terminated
    Reason:       OOMKilled
    Exit Code:    137
    Started:      <>
    Finished:     <>
  Ready:          True
  Restart Count:  3
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    DUMB_INIT_SETSID:           0
    AIRFLOW__CORE__FERNET_KEY:  <>  Optional: false
  Mounts:
    <>
git-sync:
  Container ID:  <>
  Image:         <>
  Image ID:      <>
  Port:          <none>
  Host Port:     <none>
  State:          Running
    Started:      <>
  Ready:          True
  Restart Count:  0
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    GIT_SYNC_REV:  HEAD
  Mounts:
    <>
worker-log-groomer:
  Container ID:  <>
  Image:         <>
  Image ID:      <>
  Port:          <none>
  Host Port:     <none>
  Args:
    bash
    /clean-logs
  State:          Running
    Started:      <>
  Ready:          True
  Restart Count:  0
  Limits:
    ephemeral-storage:  30G
    memory:             1Gi
  Requests:
    cpu:                50m
    ephemeral-storage:  100M
    memory:             409Mi
  Environment:
    AIRFLOW__LOG_RETENTION_DAYS:  5
  Mounts:
    <>
I am pretty sure you know the answer; I have read all your articles on Airflow. Thank you :)
https://stackoverflow.com/users/1376561/marc-lamberti
The issue comes down to the limits placed under "resources" in the Helm chart's values.yaml for the affected pod.
By default it is:
resources: {}
but that in itself causes a problem, because the pod can then consume as much memory as it wants.
Changing it to:
resources:
  limits:
    cpu: 200m
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 512Mi
makes it explicit how much the pod can request and use.
This solved my issue.
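For the official apache-airflow/airflow chart, this block would typically sit under the workers key of values.yaml (the exact key path is an assumption; check the chart's default values before applying). A sketch:

# values.yaml override (sketch; key path assumed from the official chart's layout)
workers:
  resources:
    limits:
      cpu: 200m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 512Mi
# applied with something like: helm upgrade airflow apache-airflow/airflow -f values.yaml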

Failed to log RabbitMQ in file

I'm trying to send RabbitMQ logs to both the console and a file. I'm using the RabbitMQ Operator to run the cluster, with this definition.yaml:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmqcluster
spec:
  image: rabbitmq:3.8.9-management
  replicas: 3
  imagePullSecrets:
    - name: rabbitmq-cluster-registry-access
  service:
    type: ClusterIP
  persistence:
    storageClassName: rbd
    storage: 20Gi
  resources:
    requests:
      cpu: 2000m
      memory: 6Gi
    limits:
      cpu: 2000m
      memory: 6Gi
  rabbitmq:
    additionalConfig: |
      log.console = true
      log.console.level = debug
      log.file = rabbit.log
      log.dir = /var/lib/rabbitmq/
      log.file.level = debug
      log.file = true
    additionalPlugins:
      - rabbitmq_top
      - rabbitmq_shovel
      - rabbitmq_management
      - rabbitmq_peer_discovery_k8s
      - rabbitmq_stomp
      - rabbitmq_prometheus
      - rabbitmq_peer_discovery_consul
After deploying this to the cluster, these are the console logs:
## ## RabbitMQ 3.8.9
## ##
########## Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
###### ##
########## Licensed under the MPL 2.0. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: <stdout>
Config file(s): /etc/rabbitmq/rabbitmq.conf
/etc/rabbitmq/conf.d/default_user.conf
Starting broker...2021-07-18 07:22:26.203 [info] <0.272.0>
node : rabbit#rabbitmqcluster-server-0.rabbitmqcluster-nodes.rabbitmq-system
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
: /etc/rabbitmq/conf.d/default_user.conf
cookie hash : shP20jU/vTqNF4lW9g0tqg==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit#rabbitmqcluster-server-0.rabbitmqcluster-nodes.rabbitmq-system
After the cluster starts, there is no log file at the path I specified; only crash.log is in that directory:
$ ls
crash.log
What should I do?
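One thing that stands out in the additionalConfig above (an observation, not a confirmed fix): log.file is assigned twice, first to rabbit.log and then to true, and the later assignment can override the filename. A sketch of the same block with the duplicate key dropped:

rabbitmq:
  additionalConfig: |
    log.console = true
    log.console.level = debug
    log.dir = /var/lib/rabbitmq/
    log.file = rabbit.log
    log.file.level = debug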

Dask Helm Chart - How to create Dask-CUDA-Worker nodes

I installed the Dask Helm chart with a revised values.yaml to have 10 workers, however instead of Dask workers I want to create Dask CUDA workers to take advantage of the NVIDIA GPUs on my multi-node cluster.
I tried to modify the values.yaml as follows to get Dask CUDA workers instead of Dask workers, but the worker pods aren't able to start. I have already set up the NVIDIA GPUs on all my Kubernetes nodes per the official instructions, so I'm not sure what Dask needs to see in order to create the dask-cuda-worker processes.
worker:
  name: worker
  image:
    repository: "daskdev/dask"
    tag: 2.19.0
    pullPolicy: IfNotPresent
    dask_worker: "dask-cuda-worker"
    #dask_worker: "dask-worker"
    pullSecrets:
    #  - name: regcred
  replicas: 15
  default_resources:  # overwritten by resource limits if they exist
    cpu: 1
    memory: "4GiB"
  env:
  #  - name: EXTRA_APT_PACKAGES
  #    value: build-essential openssl
  #  - name: EXTRA_CONDA_PACKAGES
  #    value: numba xarray -c conda-forge
  #  - name: EXTRA_PIP_PACKAGES
  #    value: s3fs dask-ml --upgrade
  resources: {}
  #  limits:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1
  #  requests:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1
As dask-cuda-worker is not yet officially included in the dask image, you will need to pull a different image: rapidsai/rapidsai:latest
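A sketch of the corresponding worker overrides (the tag and the one-GPU-per-worker limits are assumptions; the worker.image.dask_worker key is the same one already used in the values.yaml above):

worker:
  image:
    repository: "rapidsai/rapidsai"
    tag: latest                      # pin to a specific RAPIDS release in practice
    pullPolicy: IfNotPresent
    dask_worker: "dask-cuda-worker"
  replicas: 15
  resources:
    limits:
      nvidia.com/gpu: 1              # one GPU per worker pod; adjust to your nodes
    requests:
      nvidia.com/gpu: 1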

cannot schedule kubernetes pods with request for nvidia.com/gpu

I have been able to get Kubernetes to recognise the GPUs on my nodes:
$ kubectl get node MY_NODE -o yaml
...
allocatable:
  cpu: "48"
  ephemeral-storage: "15098429006"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263756344Ki
  nvidia.com/gpu: "8"
  pods: "110"
capacity:
  cpu: "48"
  ephemeral-storage: 16382844Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263858744Ki
  nvidia.com/gpu: "8"
  pods: "110"
...
and I spin up a pod with:
Limits:
  cpu:             2
  memory:          2147483648
  nvidia.com/gpu:  1
Requests:
  cpu:             500m
  memory:          536870912
  nvidia.com/gpu:  1
However, the pod stays in Pending with:
Insufficient nvidia.com/gpu.
Am I specifying the resources correctly?
Have you installed the NVIDIA device plugin in Kubernetes?
kubectl create -f nvidia.io/device-plugin.yml
Some devices are too old and cannot be health-checked, so this option must be disabled:
containers:
  - image: nvidia/k8s-device-plugin:1.9
    name: nvidia-device-plugin-ctr
    env:
      - name: DP_DISABLE_HEALTHCHECKS
        value: "xids"
Take a look at:
Device plugin: https://kubernetes.io/docs/concepts/cluster-administration/device-plugins/
NVIDIA GitHub: https://github.com/NVIDIA/k8s-device-plugin
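Once the device plugin pods are running, a minimal sketch of a pod that requests a single GPU can confirm scheduling works (the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:11.0-base    # placeholder image with nvidia-smi available
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1           # for extended resources, the request defaults to the limit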