We have Celery Beat set up using the following deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-beat
  labels:
    deployment: celery-beat
spec:
  replicas: 1
  minReadySeconds: 120
  selector:
    matchLabels:
      app: celery-beat
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0 # Would rather have downtime than an additional instance in service?
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: celery-beat
    spec:
      containers:
        - name: celery-beat
          image: image_url
          command: ["/var/app/scripts/kube_worker_beat_start.sh"]
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
              name: http-server
          livenessProbe: # From https://github.com/celery/celery/issues/4079#issuecomment-437415370
            exec:
              command:
                - /bin/sh
                - -c
                - celery -A app_name status | grep ".*OK"
            initialDelaySeconds: 3600
            periodSeconds: 3600
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - celery -A app_name status | grep ".*OK"
            initialDelaySeconds: 60
            periodSeconds: 30
          resources:
            limits:
              cpu: "0.5" # 500 millicores - only really required on install
            requests:
              cpu: "30m"
I have found the RollingUpdate settings tricky because with Celery Beat you really don't want two instances running; otherwise tasks may be scheduled twice. This is especially important for us to avoid since we're using it to send out push notifications.
With the current settings, a rollout causes 3-5 minutes of downtime, because the existing instance is terminated immediately and we have to wait for the new one to start up.
Is there a better way of configuring this to reduce the downtime whilst ensuring that no more than one instance is ever in service?
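For comparison, the Deployment strategy also has a Recreate type. The fragment below is only a minimal sketch, assuming the rest of the manifest above stays unchanged. With Recreate, existing pods are terminated before new ones are created, so two Beat instances should not overlap during a rollout. It does not by itself shorten the startup wait described above.

```yaml
# Sketch only: replaces the RollingUpdate strategy block in the manifest above.
spec:
  replicas: 1
  strategy:
    type: Recreate # old pod is terminated before the replacement pod is created
```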
In our project, we use the Solr exporter to fetch Solr metrics such as heap usage and send them to Prometheus. We have configured Alertmanager to fire an alert when Solr heap usage exceeds 80%. The exporter is currently deployed for every Solr instance.
Is there a way to configure a single, common Solr exporter that fetches metrics from all Solr instances, which are in different namespaces in the same cluster? (This is to reduce the resource consumption in Kubernetes caused by multiple Solr exporter instances.)
The YAML configs for the Solr exporter are given below:
apiVersion: v1
kind: Service
metadata:
  name: solr-exporter
  labels:
    app: solr-exporter
spec:
  ports:
    - port: 9983
      name: client
  selector:
    app: solr
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: solr
  name: solr-exporter
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: solr
      pod: solr-exporter
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: solr
        pod: solr-exporter
    spec:
      containers:
        - command:
            - /bin/bash
            - -c
            - /opt/solr/contrib/prometheus-exporter/bin/solr-exporter -p 9983 -z zk-cs:2181
              -n 7 -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
          image: solr:8.1.1
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: 9983
              scheme: HTTP
            initialDelaySeconds: 480
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 5
          name: solr-exporter
          ports:
            - containerPort: 9983
              protocol: TCP
          readinessProbe:
            failureThreshold: 2
            httpGet:
              path: /metrics
              port: 9983
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 50m
              memory: 64Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
              name: solr-exporter-config
              readOnly: true
              subPath: solr-exporter-config.xml
      dnsPolicy: ClusterFirst
      initContainers:
        - command:
            - sh
            - -c
            - |-
              apt-get update
              apt-get install curl -y
              PROTOCOL="http://"
              COUNTER=0;
              # Poll Solr up to 30 times, 2 seconds apart
              while [ $COUNTER -lt 30 ]; do
                curl -v -k -s --connect-timeout 10 "${PROTOCOL}solrcluster:8983/solr/admin/info/system" && exit 0
                sleep 2
                COUNTER=$((COUNTER + 1)) # without this increment the loop never ends
              done;
              echo "Did NOT see a Running Solr instance after 60 secs!";
              exit 1;
          image: solr:8.1.1
          imagePullPolicy: Always
          name: mpi-demo-solr-exporter-init
          resources: {}
          securityContext:
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            name: solr-exporter-config
          name: solr-exporter-config
I have a pod that includes one init container and one app container,
with a volume between them that holds a shared folder.
My issue is that once a day, or even more often, the init container runs again and deletes the node modules from the volume; the main app then crashes because of the missing modules.
The app container does not restart, only the init container.
Is anyone familiar with this issue in k8s? Why do these restarts happen only in the init container?
Thanks :)
edit:
the deployment yaml file -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "25"
  creationTimestamp: "2020-05-19T06:48:18Z"
  generation: 25
  labels:
    apps: ******
    commit: ******
  name: *******
  namespace: fleet
  resourceVersion: "24059934"
  selfLink: *******
  uid: *******
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: *******
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: *******
        commit: *******
        revision: *******
    spec:
      containers:
        - image: XXXXXXXXXXXX
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /ping
              port: http
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 30
          name: *******
          ports:
            - containerPort: 1880
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 20
            httpGet:
              path: /ping
              port: http
              scheme: HTTP
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 30
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /opt/breeze
              name: workdir
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: flowregistrykey
      initContainers:
        - image: XXXXXXXXXXXX
          imagePullPolicy: IfNotPresent
          name: get-flow-json
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /work-dir
              name: workdir
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: workdir
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: "2020-06-01T12:30:10Z"
      lastUpdateTime: "2020-06-01T12:30:10Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2020-05-19T06:48:18Z"
      lastUpdateTime: "2020-06-01T12:45:05Z"
      message: ReplicaSet "collection-associator.sandbox.services.collection-8784dcb9d"
        has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
  observedGeneration: 25
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
and this image explains the problem -
the pod was created 7 days ago, but the files inside were created today, and there is no node_modules folder, because only the init container ran again and not the app container, so there was no npm install
"the init container run itself and therefore it delete the node modules from the volume"
An init container runs only when the pod restarts. Don't treat it as a service or an application. It should be a script, a one-off job that does setup before your application starts.
"then the main app crash because of missing modules."
node_modules is not loaded dynamically. It is loaded when you run npm start.
You might want to try a livenessProbe.
"Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations."
initContainers:
  - name: set-delete-time
    command: ["/bin/sh","-c"]
    args:
      - |
        # Let's set 3600
        echo 3600 > deleteTime
...
containers:
  - name: node-app
    livenessProbe:
      exec:
        command:
          # If deleteTime > 0 exit 0
          # Else /tmp/deleteTime - $periodSeconds > /tmp/deleteTime
          - cat
          - /tmp/deleteTime
          - ...
      initialDelaySeconds: 5
      periodSeconds: 5
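A simpler variant of the same idea, as a hedged sketch rather than a tested fix: let the liveness probe fail when the modules directory is gone, so the kubelet restarts the app container. It assumes the modules live at /opt/breeze/node_modules (the app container's mount path from the question plus a guessed subfolder), that the image contains /bin/sh, and that the app container reinstalls its modules on startup. The container name node-app is the hypothetical one used in the sketch above.

```yaml
containers:
  - name: node-app # hypothetical name
    volumeMounts:
      - mountPath: /opt/breeze
        name: workdir
    livenessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          # Exit code is non-zero when the directory is missing (assumed path)
          - test -d /opt/breeze/node_modules
      initialDelaySeconds: 120
      periodSeconds: 30
      failureThreshold: 3
```

With these values the container would be restarted roughly 90 seconds after the directory disappears (three failed checks, 30 seconds apart).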
I am trying to implement rolling updates of deployments in Kubernetes. I have followed a lot of articles that say there would be zero downtime, but when I run curl continuously during a rollout, a couple of my requests fail before getting a response back. Below is the deployment file.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp-container
          image: my-image
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
The next thing I did was add
minReadySeconds: 120
This takes care of the issue, but it is not an optimal solution, since we want to switch to the new pod as soon as it starts serving requests and then kill the old pod. I have two questions -
Can there be a situation where both pods - the new and the old - are
running and both serve traffic? That would not be
ideal either, since we want only one pod to serve requests at a
time.
Is there any other out-of-the-box solution that Kubernetes provides
to do a rolling deployment?
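For reference, minReadySeconds is a field of the Deployment spec and sits next to strategy, not inside it. A minimal sketch of where the setting described above would go, using the values from the question and leaving out the rest of the manifest:

```yaml
# Fragment only: the remaining fields of myapp-deployment stay as in the question.
spec:
  minReadySeconds: 120 # a new pod must stay ready this long before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```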
Try this; it should work for you. Try updating your image.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp-container
          image: nginx
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
For your better understanding check this link
I am trying to set up MongoDB on Kubernetes with Istio. My StatefulSet is as follows:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: treeservice
  namespace: staging
spec:
  serviceName: tree-service-service
  replicas: 1
  selector:
    matchLabels:
      app: treeservice
  template:
    metadata:
      labels:
        app: treeservice
    spec:
      containers:
        - name: mongodb-cache
          image: mongo:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 30010
          volumeMounts:
            - name: mongodb-cache-data
              mountPath: /data/db
          resources:
            requests:
              memory: "4Gi" # 4 GB
              cpu: "1000m" # 1 CPU
            limits:
              memory: "4Gi" # 4 GB
              cpu: "1000m" # 1 CPU
          readinessProbe:
            exec:
              command: # each argument must be its own list item
                - mongo
                - --port
                - "30010"
                - --eval
                - "db.stats()"
            initialDelaySeconds: 60 # wait this long after starting for the first time
            periodSeconds: 30 # polling interval: every 30 seconds
            timeoutSeconds: 60
          livenessProbe:
            exec:
              command: # each argument must be its own list item
                - mongo
                - --port
                - "30010"
                - --eval
                - "db.stats()"
            initialDelaySeconds: 60 # wait this long after starting for the first time
            periodSeconds: 30 # polling interval: every 30 seconds
            timeoutSeconds: 60
          command: ["/bin/bash"]
          args: ["-c","mongod --port 30010 --replSet test"] # bind to localhost
  volumeClaimTemplates:
    - metadata:
        name: mongodb-cache-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: fast
        resources:
          requests:
            storage: 300Gi
however, the pod is not created and I see the following error:
kubectl describe statefulset treeservice -n staging
Warning FailedCreate 1m (x159 over 1h) statefulset-controller create Pod treeservice-0 in StatefulSet treeservice failed error: Pod "treeservice-0" is invalid: spec.containers[1].env[7].name: Invalid value: "ISTIO_META_statefulset.kubernetes.io/pod-name": a valid environment variable name must consist of alphabetic characters, digits, '_', '-', or '.', and must not start with a digit (e.g. 'my.env-name', or 'MY_ENV.NAME', or 'MyEnvName1', regex used for validation is '[-._a-zA-Z][-._a-zA-Z0-9]*')
I assume treeservice is a valid pod name. Am I missing something?
I guess it's due to this issue https://github.com/istio/istio/issues/9571 which is still open
I made it work temporarily using the following:
annotations:
  sidecar.istio.io/inject: "false"
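To show where that annotation would sit, here is a trimmed sketch of the StatefulSet from the question. It assumes the annotation is placed on the pod template metadata, which is where Istio's sidecar injector looks for it. The probes, resources, and volume settings are left out for brevity and would stay as they are.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: treeservice
  namespace: staging
spec:
  serviceName: tree-service-service
  replicas: 1
  selector:
    matchLabels:
      app: treeservice
  template:
    metadata:
      labels:
        app: treeservice
      annotations:
        # Opt this pod out of automatic sidecar injection
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - name: mongodb-cache
          image: mongo:latest
```

Note that this disables the sidecar for the pod entirely, so the pod is no longer part of the mesh; it is a workaround, not a fix for the linked issue.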
I would like to run a specific command after the deployment has been initialized successfully.
This is my yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: auth
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: auth
    spec:
      containers:
        - name: auth
          image: {{my-service-image}}
          env:
            - name: NODE_ENV
              value: "docker-dev"
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 3000
However, I would like to run a command for DB migration after (not before) the deployment is successfully initialized and the pods are running.
I can do it manually for every pod (with kubectl exec), but this is not very scalable.
I resolved it using lifecycles:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: auth
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: auth
    spec:
      containers:
        - name: auth
          image: {{my-service-image}}
          env:
            - name: NODE_ENV
              value: "docker-dev"
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 3000
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", {{cmd}}]
You can use Helm to deploy a set of Kubernetes resources. And then, use a Helm hook, e.g. post-install or post-upgrade, to run a Job in a separate docker container. Set your Job to invoke db migration. A Job will run >=1 Pods to completion, so it fits here quite well.
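A minimal sketch of what such a hook Job could look like as a template in the chart. The Job name, the image value, and the migration command are hypothetical placeholders, not taken from the question; the helm.sh/hook annotations are what mark the resource as a hook.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: auth-db-migrate # hypothetical name
  annotations:
    # Run after the release's resources are installed or upgraded
    "helm.sh/hook": post-install,post-upgrade
    # Remove the previous hook Job before a new run, and clean up on success
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: my-service-image # placeholder, same image as the auth container
          command: ["/bin/sh", "-c", "npm run migrate"] # hypothetical migration command
          env:
            - name: NODE_ENV
              value: "docker-dev"
```

Because the Job runs once per install or upgrade rather than once per pod, the migration is not repeated when the Deployment scales out.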
I chose to use a readinessProbe
My application requires configuration after the process has completely started.
The postStart command was running before the app was ready.
readinessProbe:
  exec:
    command: [healthcheck]
  initialDelaySeconds: 30
  periodSeconds: 2
  timeoutSeconds: 1
  successThreshold: 3
  failureThreshold: 10