Argo DAG workflow stuck in ContainerCreating state after submitting - argo-workflows

I have a DAG-based workflow that I submit to Argo.
The YAML structure is as follows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: spark-argo-dag
  namespace: argo
  annotations:
    workflows.argoproj.io/description: "Spark Argo Dag pipelines"
spec:
  entrypoint: spark-argo-dag
  arguments:
    parameters:
    - name: date
      value: 20220404
    - name: region
      value: us-east
    - name: asset_class
      value: Equity
  templates:
  - name: spark-argo-dag
    dag:
      tasks:
      - name: ingest-vendor-data
        template: ingestStage
        arguments:
          parameters:
          - name: region
            value: '{{workflow.parameters.region}}'
          - name: asset_class
            value: '{{workflow.parameters.asset_class}}'
      - name: transform-data
        dependencies: [ingest-vendor-data]
        template: transformData
        arguments:
          parameters:
          - name: file_date
            value: '{{workflow.parameters.date}}'
      - name: save-data
        dependencies: [transform-data]
        template: saveData
  - name: ingestStage
    inputs:
      parameters:
      - name: region
      - name: asset_class
    container:
      command:
      args: [
        "/bin/sh",
        "-c",
        "/opt/spark/bin/spark-submit \
          --master k8s://https://kubernetes.default.svc \
          --deploy-mode cluster \
          --conf spark.kubernetes.container.image=test-spark-stage1:latest \
          --conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp' \
          --conf spark.app.name=test-spark-job \
          --conf spark.jars.ivy=/tmp/.ivy \
          --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
          --conf spark.kubernetes.namespace=argo \
          --conf spark.executor.instances=1 \
          --packages org.postgresql:postgresql:42.1.4 \
          --conf spark.kubernetes.driver.pod.name=custom-app \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
          --class IngestStage \
          local:///opt/app/test-spark-stage1.jar local {{input.parameters.region}} {{input.parameters.asset_class}}"
      ]
      image: srivastu/test-spark-stage1:latest
      imagePullPolicy: IfNotPresent
      volumeMounts:
      - mountPath: /data
        name: exepv
    volumes:
    - name: exepv
      hostPath:
        path: /home/docker/data
        type: Directory
  - name: transformData
    inputs:
      parameters:
      - name: file_date
    container:
      command:
      args: [
        "/bin/sh",
        "-c",
        "/opt/spark/bin/spark-submit \
          --master k8s://https://kubernetes.default.svc \
          --deploy-mode cluster \
          --conf spark.kubernetes.container.image=test-spark-stage2:latest \
          --conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp' \
          --conf spark.jars.ivy=/tmp/.ivy \
          --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
          --conf spark.kubernetes.namespace=argo \
          --conf spark.executor.instances=1 \
          --conf spark.kubernetes.executor.volumes.hostPath.exepv.mount.path='/data' \
          --conf spark.kubernetes.executor.volumes.hostPath.exepv.options.path='/home/docker/data' \
          --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
          --packages io.delta:delta-core_2.12:1.1.0 \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
          --class TransformDataStage \
          local:///opt/app/test-spark-stage2.jar local {{inputs.parameters.file_date}}"
      ]
      image: srivastu/test-spark-stage2:latest
      imagePullPolicy: IfNotPresent
  - name: saveData
    container:
      command:
      args: [
        "/bin/sh",
        "-c",
        "/opt/spark/bin/spark-submit \
          --master k8s://https://kubernetes.default.svc \
          --deploy-mode cluster \
          --conf spark.kubernetes.container.image=test-spark-stage1:latest \
          --conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp' \
          --conf spark.jars.ivy=/tmp/.ivy \
          --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
          --conf spark.kubernetes.namespace=argo \
          --conf spark.executor.instances=1 \
          --conf spark.kubernetes.executor.volumes.hostPath.exepv.mount.path='/data' \
          --conf spark.kubernetes.executor.volumes.hostPath.exepv.options.path='/home/docker/data' \
          --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
          --packages io.delta:delta-core_2.12:1.1.0 \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
          --class SaveDataStage \
          local:///opt/app/test-spark-stage1.jar local "
      ]
      image: srivastu/test-spark-stage2:latest
      imagePullPolicy: IfNotPresent
      volumeMounts:
      - mountPath: /data
        name: exepv
    volumes:
    - name: exepv
      hostPath:
        path: /home/docker/data
        type: Directory
Now, when I submit the workflow with:
argo submit -n argo --watch test_spark_staged_argo.yaml
the container gets stuck in the ContainerCreating state:
Status:              Running
Conditions:
 PodRunning          False
Created:             Mon Apr 11 14:28:06 +0530 (22 minutes ago)
Started:             Mon Apr 11 14:28:06 +0530 (22 minutes ago)
Duration:            22 minutes 35 seconds
Progress:            0/1
Parameters:
  date:              20220404
  region:            us-east
  asset_class:       Equity

STEP                     TEMPLATE        PODNAME                        DURATION  MESSAGE
 ● spark-argo-dagsp2gk   spark-argo-dag
 └─◷ ingest-vendor-data  ingestStage     spark-argo-dagsp2gk-609318480  22m       ContainerCreating
Since it is stuck in the ContainerCreating state, the container logs cannot be viewed.
Listing the pods with kubectl gives the following output:
[#a-uexsrqhp5gj1 spark_argo_test]$ kubectl get pods -n argo
NAME                                   READY   STATUS              RESTARTS   AGE
argo-server-78b4844f66-9ch7z           1/1     Running             0          3d20h
hello-world-zvjqz                      0/2     Completed           0          3h29m
spark-argo-dagffpvf-1073011739         0/2     ContainerCreating   0          4h19m
spark-argo-dagsp2gk-609318480          0/2     ContainerCreating   0          111m
workflow-controller-756c8c87ff-w9l4z   1/1     Running             0          3d19h
[#a-uexsrqhp5gj1 spark_argo_test]$
I am not able to figure out the exact reason why it is stuck in the ContainerCreating state.
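A first diagnostic step (a suggestion, not a definite fix): for pods stuck in ContainerCreating, the pod events usually name the blocker, for example an image pull or volume mount failure; note also that a hostPath volume with type: Directory requires /home/docker/data to already exist on the node the pod is scheduled to. Commands along these lines should surface the reason:

kubectl describe pod spark-argo-dagsp2gk-609318480 -n argo
kubectl get events -n argo --sort-by=.metadata.creationTimestamp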

Related

ChaosToolkit --var-file option fails with "Error: no such option: --var-file /tmp/token.env"

We are trying to run a chaos experiment and are running into this error:
ubuntu#ip-x-x-x-x:~$ kubectl logs -f pod/chaos-testing-hn8c5 -n <ns>
[2022-12-08 16:05:22 DEBUG] [cli:70] ###############################################################################
[2022-12-08 16:05:22 DEBUG] [cli:71] Running command 'run'
[2022-12-08 16:05:22 DEBUG] [cli:75] Using settings file '/root/.chaostoolkit/settings.yaml'
Usage: chaos run [OPTIONS] SOURCE
Try 'chaos run --help' for help.
Error: no such option: --var-file /tmp/token.env
Here is the spec file:
spec:
  serviceAccountName: {{ .Values.serviceAccount.name }}
  restartPolicy: Never
  initContainers:
  - name: {{ .Values.initContainer.name }}
    image: "{{ .Values.initContainer.image.name }}:{{ .Values.initContainer.image.tag }}"
    imagePullPolicy: {{ .Values.initContainer.image.pullPolicy }}
    command: ["sh", "-c", "curl -X POST https://<url> -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=<client_id>&client_secret=<client_secret>' | jq -r --arg prefix 'ACCESS_TOKEN=' '$prefix + (.access_token)' > /tmp/token.env;"]
    volumeMounts:
    - name: token-path
      mountPath: /tmp
    - name: config
      mountPath: /experiment
      readOnly: true
  containers:
  - name: {{ .Values.image.name }}
    securityContext:
      privileged: true
      capabilities:
        add: ["SYS_ADMIN"]
      allowPrivilegeEscalation: true
    image: {{ .Values.image.repository }}
    args:
    - --verbose
    - run
    - --var-file /tmp/token.env
    - /experiment/terminate-all-pods.yaml
    env:
    - name: CHAOSTOOLKIT_IN_POD
      value: "true"
    volumeMounts:
    - name: token-path
      mountPath: /tmp
    - name: config
      mountPath: /experiment
      readOnly: true
    resources:
      limits:
        cpu: 20m
        memory: 64Mi
      requests:
        cpu: 20m
        memory: 64Mi
  volumes:
  - name: token-path
    emptyDir: {}
  - name: config
    configMap:
      name: {{ .Values.experiments.name }}
We have also tried using --var "KEY=VALUE", which failed with the same error.
Any help with this is appreciated; we have hit a wall at this point.
The Docker image being used is: https://hub.docker.com/r/chaostoolkit/chaostoolkit/tags
The Kubernetes manifest is slightly incorrect.
The environment variable injection worked when passing it like this:
args:
- --verbose
- run
- --var-file
- /tmp/token.env
- /experiment/terminate-all-pods.yaml
The option flag and its value need to be passed as two separate list entries.
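To illustrate (a sketch of the equivalent shell invocations, not output from the actual pod): each entry under args: becomes exactly one argv element, so the two manifests behave like:

# working form: flag and value are separate argv elements
chaos --verbose run --var-file /tmp/token.env /experiment/terminate-all-pods.yaml

# original form: one argv element containing a space, which the CLI rejects as an unknown option
chaos --verbose run "--var-file /tmp/token.env" /experiment/terminate-all-pods.yaml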

Kaniko Image Cache in Jenkins Kubernetes Agents

Here's the Jenkinsfile I'm spinning up:
pipeline {
  agent {
    kubernetes {
      yaml '''
        apiVersion: v1
        kind: Pod
        metadata:
          name: kaniko
          namespace: jenkins
        spec:
          containers:
          - name: kaniko
            image: gcr.io/kaniko-project/executor:v1.8.1-debug
            imagePullPolicy: IfNotPresent
            command:
            - /busybox/cat
            tty: true
            volumeMounts:
            - name: jenkins-docker-cfg
              mountPath: /kaniko/.docker
            - name: image-cache
              mountPath: /cache
          imagePullSecrets:
          - name: regcred
          volumes:
          - name: image-cache
            persistentVolumeClaim:
              claimName: kaniko-cache-pvc
          - name: jenkins-docker-cfg
            projected:
              sources:
              - secret:
                  name: regcred
                  items:
                  - key: .dockerconfigjson
                    path: config.json
      '''
    }
  }
  stages {
    stage('Build & Cache Image') {
      steps {
        container(name: 'kaniko', shell: '/busybox/sh') {
          withEnv(['PATH+EXTRA=/busybox']) {
            sh '''#!/busybox/sh -xe
            /kaniko/executor \
              --cache \
              --cache-dir=/cache \
              --dockerfile Dockerfile \
              --context `pwd`/Dockerfile \
              --insecure \
              --skip-tls-verify \
              --destination testrepo/kaniko-test:0.0.1'''
          }
        }
      }
    }
  }
}
The problem is that the executor doesn't dump the cache anywhere I can find. If I rerun the pod and stage, the executor logs say there's no cache. As you can see, I want to retain the cache using a PVC. Any thoughts? Am I missing something?
Thanks in advance.
You should use a separate kaniko-warmer pod, which will download the specific images for you:
- name: kaniko-warmer
  image: gcr.io/kaniko-project/warmer:latest
  args: ["--cache-dir=/cache",
         "--image=nginx:1.17.1-alpine",
         "--image=node:17"]
  volumeMounts:
  - name: kaniko-cache
    mountPath: /cache
volumes:
- name: kaniko-cache
  hostPath:
    path: /opt/volumes/database/qazexam-front-cache
    type: DirectoryOrCreate
Then the kaniko-cache volume can be mounted into the kaniko executor pod, as sketched below.
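For the executor side, a minimal sketch (reusing the same hostPath volume as the warmer; the kaniko-cache-pvc claim from the question should work the same way) might look like:

- name: kaniko
  image: gcr.io/kaniko-project/executor:v1.8.1-debug
  command:
  - /busybox/cat
  tty: true
  volumeMounts:
  - name: kaniko-cache
    mountPath: /cache
volumes:
- name: kaniko-cache
  hostPath:
    path: /opt/volumes/database/qazexam-front-cache
    type: DirectoryOrCreate

The executor is then run with --cache --cache-dir=/cache as in the question, so that the warmer and every build read and write the same /cache directory.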

Kubernetes Cron-Job Slack Notification

I have a CronJob which creates a postgres backup job. I would like to send a notification to Slack channels via webhook with the CronJob status (fail or success). How can I add a condition on the Job status, or otherwise detect it, and send the result to Slack? I suppose the curl request below will work, but please point out any faults you see.
kind: CronJob
metadata:
  name: standup
spec:
  schedule: "* 17 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: standup
            image: busybox
            resources:
              requests:
                cpu: 1m
                memory: 100Mi
            env:
            - args: /bin/sh
            - -c
            - curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' https://hooks.slack.com/services/TQPCENFHP/
          restartPolicy: OnFailure
~ semural$ kubectl logs $pods -n database
The following backups are available in specified backup path:
Added `s3` successfully.
[2020-04-13 14:24:46 UTC] 0B postgresql-cluster/
NAME                                SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
postgresql-postgresql-helm-backup   0 0 * * *   False     0        8h              18h

NAME                                           COMPLETIONS   DURATION   AGE
postgresql-postgresql-helm-backup-1586822400   1/1           37s        8h
postgresql-postgresql-helm-backup-list         1/1           2s         18h
postgresql-postgresql-helm-pgmon               1/1           49s        18h
I think we can create a simple script to get the cronjob status:
import json
import os

from kubernetes import client, config, utils
from kubernetes.client.rest import ApiException

from api.exceptions import BatchApiNamespaceNotExistedException


class Constants:
    BACKOFF_LIMIT = 1
    STATUS_RUNNING = "RUNNING"
    STATUS_SUCCEED = "SUCCEED"
    STATUS_FAILED = "FAILED"
    STATUS_NOT_FOUND = "NOT FOUND"


class KubernetesApi:
    def __init__(self):
        try:
            config.load_incluster_config()
        except:
            config.load_kube_config()
        self.configuration = client.Configuration()
        self.api_instance = client.BatchV1Api(client.ApiClient(self.configuration))
        self.api_instance_v1_beta = client.BatchV1beta1Api(client.ApiClient(self.configuration))

    def get_job_status(self, job):
        if job is not None:
            total_failed_pod = job.status.failed or 0
            total_succeeded_pod = job.status.succeeded or 0
            if total_failed_pod + total_succeeded_pod < Constants.BACKOFF_LIMIT:
                return Constants.STATUS_RUNNING
            elif total_succeeded_pod > 0:
                return Constants.STATUS_SUCCEED
            return Constants.STATUS_FAILED
        return Constants.STATUS_NOT_FOUND

    def get_cron_job_status(self, namespace):
        try:
            cron_job_list = self.api_instance_v1_beta.list_namespaced_cron_job(namespace=namespace,
                                                                               watch=False)
        except ApiException as e:
            raise BatchApiNamespaceNotExistedException(
                "Exception when calling BatchV1Api->list_namespaced_cron_job: %s\n" % e)
        for cron_job in cron_job_list.items:
            if cron_job.status.active is not None:
                for active_cron_job in cron_job.status.active:
                    job = self.api_instance.read_namespaced_job(namespace=namespace,
                                                                name=active_cron_job.name)
                    job_status = self.get_job_status(job)
                    if job_status == Constants.STATUS_FAILED:
                        # Do whatever you want in there
                        print(job_status)
So if the status is FAILED, we can send the log to Slack.
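For the notification itself, a hedged sketch of a hypothetical notify_slack helper (standard library only; the webhook URL is the placeholder from the question) could be:

import json
import urllib.request

def notify_slack(message, webhook_url="https://hooks.slack.com/services/TQPCENFHP/"):
    # post a simple text payload to the Slack incoming webhook
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        return response.status

get_cron_job_status could then call notify_slack(job_status) in the branch where the status is FAILED.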
I think what you have is already a good start. Assuming you have the curl command wrapped in a script that takes the message to be posted as its first argument, you can do something like the following:
kind: CronJob
metadata:
  name: standup
spec:
  schedule: "* 17 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: standup
            image: busybox
            resources:
              requests:
                cpu: 1m
                memory: 100Mi
            env:
            - args: /bin/sh
            - -c
            - run-job.py || notify-cron-job "FAIL" && notify-cron-job "SUCCESS"
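For completeness, a sketch of the assumed notify-cron-job helper (a hypothetical script name from the snippet above; it posts its first argument to the webhook from the question and assumes curl is available in the image):

#!/bin/sh
# notify-cron-job: post the first argument as the Slack message text
STATUS="$1"
curl -X POST -H 'Content-type: application/json' \
  --data "{\"text\":\"standup job finished with status: ${STATUS}\"}" \
  https://hooks.slack.com/services/TQPCENFHP/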

Pass json string to environment variable in a k8s deployment for Envoy

I have a K8s deployment with one pod running, among other containers, a container with Envoy software. I have defined the image so that, if an environment variable EXTRA_OPTS is defined, it is appended to the command line used to start Envoy.
I want to use that variable to override the default configuration as explained in
https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-config-yaml
The environment variable works fine for other command options such as "-l debug", for example.
Also, I have tested the expected final command line and it works.
The Dockerfile sets Envoy to run this way:
CMD ["/bin/bash", "-c", "envoy -c envoy.yaml $EXTRA_OPTS"]
What I want is to set this:
...
- image: envoy-proxy:1.10.0
  imagePullPolicy: IfNotPresent
  name: envoy-proxy
  env:
  - name: EXTRA_OPTS
    value: ' --config-yaml "admin: { address: { socket_address: { address: 0.0.0.0, port_value: 9902 } } }"'
...
I have successfully tested running Envoy with the final command line:
envoy -c /etc/envoy/envoy.yaml --config-yaml "admin: { address: { socket_address: { address: 0.0.0.0, port_value: 9902 } } }"
And I have also tested a "simpler" option in EXTRA_OPTS and it works:
...
- image: envoy-proxy:1.10.0
  imagePullPolicy: IfNotPresent
  name: envoy-proxy
  env:
  - name: EXTRA_OPTS
    value: ' -l debug'
...
I would expect Envoy to run with this new admin port; instead I'm getting parameter errors:
PARSE ERROR: Argument: {
Couldn't find match for argument
It looks like the quotes are not being passed into the environment variable inside the container...
Any clue?
Thanks to all.
You should set ["/bin/bash", "-c", "envoy -c envoy.yaml"] as an ENTRYPOINT in your Dockerfile, or use command in Kubernetes and then use args to add additional arguments.
You can find more information in the Docker documentation.
Let me explain by example:
$ docker build -t fl3sh/test:bash .
$ cat Dockerfile
FROM ubuntu
RUN echo '#!/bin/bash' > args.sh && \
    echo 'echo "$@"' >> args.sh && \
    chmod +x args.sh
CMD ["args","from","docker","cmd"]
ENTRYPOINT ["/bin/bash", "args.sh", "$ENV_ARG"]
cat args.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: args
  name: args
spec:
  containers:
  - args:
    - args
    - from
    - k8s
    image: fl3sh/test:bash
    name: args
    imagePullPolicy: Always
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
Output:
pod/args $ENV_ARG args from k8s
cat command-args.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: command-args
  name: command-args
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    args:
    - 'echo args'
    image: fl3sh/test:bash
    imagePullPolicy: Always
    name: args
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
Output:
pod/command-args args
cat command-env-args.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: command-env-args
  name: command-env-args
spec:
  containers:
  - env:
    - name: ENV_ARG
      value: "arg from env"
    command:
    - /bin/bash
    - -c
    - exec echo "$ENV_ARG"
    image: fl3sh/test:bash
    imagePullPolicy: Always
    name: args
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
Output:
pod/command-env-args arg from env
cat command-no-args.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: command-no-args
  name: command-no-args
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - 'echo "no args";echo "$@"'
    image: fl3sh/test:bash
    name: args
    imagePullPolicy: Always
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
Output:
pod/command-no-args no args

#notice ^ empty line above
cat no-args.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: no-args
  name: no-args
spec:
  containers:
  - image: fl3sh/test:bash
    name: no-args
    imagePullPolicy: Always
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
Output:
pod/no-args $ENV_ARG args from docker cmd
If you need to recreate my example, you can use this loop to get output like the above:
for p in `kubectl get po -oname`; do echo cat ${p#*/}.yaml; echo ""; \
cat ${p#*/}.yaml; echo -e "\nOutput:"; printf "$p "; \
kubectl logs $p;echo "";done
In conclusion, if you need to pass an env variable as an argument, use:
command:
- /bin/bash
- -c
- exec echo "$ENV_ARG"
I hope it is clear now.
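Applied to the Envoy container from the question, one possible sketch (note this uses a hypothetical ADMIN_OVERRIDE variable holding only the YAML snippet, so the whole value can be passed as a single quoted argument to --config-yaml instead of relying on word splitting of EXTRA_OPTS):

- image: envoy-proxy:1.10.0
  imagePullPolicy: IfNotPresent
  name: envoy-proxy
  env:
  - name: ADMIN_OVERRIDE
    value: 'admin: { address: { socket_address: { address: 0.0.0.0, port_value: 9902 } } }'
  command:
  - /bin/bash
  - -c
  - exec envoy -c /etc/envoy/envoy.yaml --config-yaml "$ADMIN_OVERRIDE"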

How to use /var/run/docker.sock inside a running docker-compose container?

I have a docker-compose.yml like this:
version: '3'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent
    ports:
      - "10050:10050"
      - "10051:10051"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/localtime:/etc/localtime:ro
      - ./zbx_env/etc/zabbix/zabbix_agentd.d:/etc/zabbix/zabbix_agentd.d:ro
      - ./zbx_env/var/lib/zabbix/modules:/var/lib/zabbix/modules:ro
      - ./zbx_env/var/lib/zabbix/enc:/var/lib/zabbix/enc:ro
      - ./zbx_env/var/lib/zabbix/ssh_keys:/var/lib/zabbix/ssh_keys:ro
    links:
      - db
    env_file:
      - .env_agent
    user: root
    privileged: true
    pid: "host"
    stop_grace_period: 5s
    labels:
      com.zabbix.description: "Zabbix agent"
      com.zabbix.company: "Zabbix SIA"
      com.zabbix.component: "zabbix-agentd"
      com.zabbix.os: "ubuntu"
  postgres-server:
    image: postgres:latest
    volumes:
      - ./zbx_env/var/lib/postgresql/data:/var/lib/postgresql/data:rw
    env_file:
      - .env_db_pgsql
    user: root
    stop_grace_period: 1m
In zabbix-agent I use a UserParameter like this:
...
UserParameter=pgsql.ping[*],/bin/echo -e "\\\timing \n select 1" | psql -qAtX $1 | tail -n 1 |cut -d' ' -f2|sed 's/,/./'
...
When I call this UserParameter from zabbix-server, I get an error that psql does not exist. And that is correct: psql does not exist in the 'zabbix-agent' container.
How can I run the psql contained in 'postgres-server' from 'zabbix-agent' and get the result?
Just run:
curl -H 'Content-Type: application/json' --unix-socket /var/run/docker.sock localhost:4243/containers/zabbix-agent/exec -d '{"Cmd":["date"]}'
For examples of how to make such requests, see:
https://docs.docker.com/develop/sdk/examples/
For the API reference, see:
https://docs.docker.com/engine/api/v1.27/#operation/ContainerExec
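Note that the call above only creates the exec instance; per the ContainerExec API linked above, it has to be started with a second request. A hedged sketch of the full round trip, adapted to run psql in the postgres-server container (assuming the container is actually reachable under that name, since docker-compose may prefix it with the project name, and assuming jq is available for extracting the returned Id):

# 1. create an exec instance inside the postgres-server container and capture its Id
EXEC_ID=$(curl -s -H 'Content-Type: application/json' \
  --unix-socket /var/run/docker.sock \
  localhost:4243/containers/postgres-server/exec \
  -d '{"AttachStdout":true,"AttachStderr":true,"Cmd":["psql","--version"]}' | jq -r .Id)

# 2. start it; the response body is the raw attached output stream
curl -s -H 'Content-Type: application/json' \
  --unix-socket /var/run/docker.sock \
  localhost:4243/exec/$EXEC_ID/start \
  -d '{"Detach":false,"Tty":false}'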