How can I specify multiple Argo Workflow output artifacts in a single directory? - argo-workflows

I'm using Argo Workflows and want to produce 2 separate artifacts. With the output artifacts defined as below, it tells me path '/tmp' already mounted in inputs.artifacts.txt. How can I produce 2 separate artifacts from a single directory (in this case, /tmp)?
outputs:
artifacts:
- name: txt
path: /tmp
s3:
endpoint: s3.amazonaws.com
bucket: <My Bucket>
key: test.txt.tgz
accessKeySecret:
name: vault-data
key: s3_access_key-0
secretKeySecret:
name: vault-data
key: s3_secret_key-0
- name: total-file-count
path: /tmp
s3:
endpoint: s3.amazonaws.com
bucket: <My Bucket>
key: total-file-count.tgz
accessKeySecret:
name: vault-data
key: s3_access_key-0
secretKeySecret:
name: vault-data
key: s3_secret_key-0

path refers to the full path of the artifact to be written to S3 (not just to the directory in which the file is found).
To write both artifacts to S3, use the full path of the source files. Assuming the filenames match the key names, this should work:
outputs:
artifacts:
- name: txt
path: /tmp/test.txt.tgz
s3:
endpoint: s3.amazonaws.com
bucket: <My Bucket>
key: test.txt.tgz
accessKeySecret:
name: vault-data
key: s3_access_key-0
secretKeySecret:
name: vault-data
key: s3_secret_key-0
- name: total-file-count
path: /tmp/total-file-count.tgz
s3:
endpoint: s3.amazonaws.com
bucket: <My Bucket>
key: total-file-count.tgz
accessKeySecret:
name: vault-data
key: s3_access_key-0
secretKeySecret:
name: vault-data
key: s3_secret_key-0
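For completeness, here is a minimal sketch of a container section that produces both files under /tmp so the outputs above can pick them up. The image and the commands that generate the files are illustrative assumptions, not part of the original question; since the output paths point at the .tgz files themselves, the step creates the archives explicitly.
container:
  image: alpine:3.18
  command: [sh, -c]
  args:
    - |
      # hypothetical content for the first artifact
      echo "hello" > /tmp/test.txt
      tar -czf /tmp/test.txt.tgz -C /tmp test.txt
      # hypothetical content for the second artifact
      ls /tmp | wc -l > /tmp/total-file-count
      tar -czf /tmp/total-file-count.tgz -C /tmp total-file-count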

Related

Argo Workflow can't reference template input parameters within steps

kind: Workflow
metadata:
generateName: small-
spec:
entrypoint: fan-out-in-params-workflow
arguments:
parameters:
- name: jira-ticket
value: INFRA-000
templates:
- name: fan-out-in-params-workflow
steps:
- - name: generate
template: gen-host-list
- - name: pre-conditions
template: pre-conditions
arguments:
parameters:
- name: host
value: "{{item}}"
withParam: "{{steps.generate.outputs.result}}"
- name: gen-host-list
inputs:
artifacts:
- name: host
path: /tmp/host.txt
s3:
key: host.txt
script:
image: python:alpine3.6
command: [python]
source: |
import json
import sys
filename="{{ inputs.artifacts.host.path }}"
with open(filename, 'r', encoding='UTF-8') as f:
json.dump([line.rstrip() for line in f], sys.stdout)
- name: pre-conditions
inputs:
parameters:
- name: host
steps:
- - name: online-check
template: online-check
arguments:
parameters:
- name: host
value: {{inputs.parameters.host}}
- name: online-check
inputs:
parameters:
- name: host
script:
image: python:alpine3.6
command:
- python
source: |
print({{inputs.parameters.host}})
Hi there, I'm quite new to Argo Workflows. I'm trying to pass the host input parameter to the pre-conditions template as posted above. The host parameter seems to be passed to the pre-conditions template successfully, but I can't get it in the online-check step. Can anyone give me some advice? Anything will be appreciated!

argo workflow submit error - duplicated node name

I am trying to use Argo Events + Argo Workflows. However, I keep getting this "duplicated nodeName" error and I'm not sure why. I have a sensor which reacts to events and triggers a DAG workflow.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
name: argocd-dotnet-kafka-subscriber
spec:
template:
serviceAccountName: argo-events-sa
dependencies:
- name: github
eventSourceName: github
eventName: github-app # argocd-dotnet-kafka-event
triggers:
- template:
name: trigger
argoWorkflow:
group: argoproj.io
version: v1alpha1
resource: workflows
operation: submit
source:
resource:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: argocd-dotnet-kafka-
namespace: workflows
spec:
entrypoint: build
serviceAccountName: workflow
volumes:
- name: regcred
secret:
secretName: regcred
items:
- key: .dockerconfigjson
path: config.json
- name: github-access
secret:
secretName: github-access
items:
- key: token
path: token
- key: user
path: user
- key: email
path: email
templates:
- name: build
dag:
tasks:
- name: build
templateRef:
name: container-image
template: build-kaniko-git
clusterScope: true
arguments:
parameters:
- name: repo_url
value: "https://github.com/Workquark/argocd-dotnet-kafka-subscriber-deploy"
- name: repo_ref
value: ""
- name: repo_commit_id
value: ""
- name: container_image
value: joydeep1985/argocd-dotnet-kafka-subscriber-deploy
- name: container_tag
value: "latest"
- name: test
script:
image: alpine
command: [sh]
source: |
echo This is a testing simulation...
sleep 5
volumeMounts:
- name: github-access
mountPath: /.github/
parameters:
- src:
dependencyName: github
dataKey: body.repository.git_url
dest: spec.templates.0.dag.tasks.0.arguments.parameters.0.value
- src:
dependencyName: github
dataKey: body.ref
dest: spec.templates.0.dag.tasks.0.arguments.parameters.1.value
- src:
dependencyName: github
dataKey: body.after
dest: spec.templates.0.dag.tasks.0.arguments.parameters.2.value
- src:
dependencyName: github
dataKey: body.repository.name
dest: spec.templates.0.dag.tasks.0.arguments.parameters.3.value
operation: append
- src:
dependencyName: github
dataKey: body.after
dest: spec.templates.0.dag.tasks.0.arguments.parameters.4.value
- src:
dependencyName: github
dataKey: body.repository.name
dest: spec.templates.0.dag.tasks.1.arguments.parameters.4.value
- src:
dependencyName: github
dataKey: body.after
dest: spec.templates.0.dag.tasks.1.arguments.parameters.5.value
- src:
dependencyName: github
dataKey: body.repository.name
dest: spec.templates.0.dag.tasks.2.arguments.parameters.4.value
- src:
dependencyName: github
dataKey: body.after
dest: spec.templates.0.dag.tasks.2.arguments.parameters.5.value
Above is the Sensor manifest.
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
name: container-image
spec:
serviceAccountName: workflow
templates:
- name: build-kaniko-git
inputs:
parameters:
- name: repo_url
- name: repo_ref
value: refs/heads/master
- name: repo_commit_id
value: HEAD
- name: container_image
- name: container_tag
container:
image: gcr.io/kaniko-project/executor:debug
command: [/kaniko/executor]
args:
- --context={{inputs.parameters.repo_url}}#{{inputs.parameters.repo_ref}}#{{inputs.parameters.repo_commit_id}}
- --destination={{inputs.parameters.container_image}}:{{inputs.parameters.container_tag}}
volumeMounts:
- name: regcred
mountPath: /kaniko/.docker/
Above is the templateRef (a ClusterWorkflowTemplate) the workflow uses for Kaniko. The error I keep getting is:
time="2022-04-20T01:25:40.089Z" level=fatal msg="Failed to submit workflow:
templates.build sorting failed: duplicated nodeName "
{"level":"error","ts":1650417940.0938516,"logger":"argo-events.sensor","caller":"sensors/listener.go:355","msg":"failed to execute a trigger","sensorName":"argocd-dotnet-kafka-subscriber","error":"failed to execute trigger: timed out waiting for the condition: failed to execute submit command for workflow : exit status 1","errorVerbose":"timed out waiting for the condition: failed to execute submit command for workflow : exit status 1\nfailed to execute trigger\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).triggerOne\n\t/home/runner/work/argo-events/argo-events/sensors/listener.go:408\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).triggerWithRateLimit\n\t/home/runner/work/argo-events/argo-events/sensors/listener.go:353\nruntime.goexit\n\t/opt/hostedtoolcache/go/1.17.1/x64/src/runtime/asm_amd64.s:1581","triggerName":"trigger","triggeredBy":["github"],"triggeredByEvents":["32623564393765662d343331612d346333342d613166352d346230613238613735353163"],"stacktrace":"github.com/argoproj/argo-events/sensors.(*SensorContext).triggerWithRateLimit\n\t/home/runner/work/argo-events/argo-events/sensors/listener.go:355"}

airflow.exceptions.AirflowException: Dag could not be found; either it does not exist or it failed to parse

I recently upgraded Airflow from 1.10.11 to 2.2.3, following the steps given in https://airflow.apache.org/docs/apache-airflow/stable/upgrading-from-1-10/index.html. I first upgraded to 1.10.15 as suggested, which worked fine. But after upgrading to 2.2.3, I'm unable to execute DAGs from the UI, as the task goes into the queued state. When I check the task pod logs, I see this error:
[2022-02-22 06:46:23,886] {cli_action_loggers.py:105} WARNING - Failed to log action with (sqlite3.OperationalError) no such table: log
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ('2022-02-22 06:46:23.880923', 'dag id', 'task id', 'cli_task_run', None, 'airflow', '{"host_name": "pod name", "full_command": "[\'/home/airflow/.local/bin/airflow\', \'tasks\', \ task id\', \'manual__2022-02-22T06:45:47.840912+00:00\', \'--local\', \'--subdir\', \'DAGS_FOLDER/dag_file.py\']"}')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-02-22 06:46:23,888] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/repo/xxxxx.py
Traceback (most recent call last):
File "/home/airflow/.local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 282, in task_run
dag = get_dag(args.subdir, args.dag_id)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 193, in get_dag
f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
airflow.exceptions.AirflowException: Dag 'xxxxx' could not be found; either it does not exist or it failed to parse
I did try to exec into the webserver and scheduler using "kubectl exec -it airflow-dev-webserver-6c5755d5dd-262wd -n dev --container webserver -- /bin/sh". I could see all the DAGs under /opt/airflow/dags/repo/. Even the error says "Filling up the DagBag from /opt/airflow/dags/repo/", but I couldn't understand what was making the task execution go into the queued state.
I figured out the issue using the steps below:
I triggered a DAG, after which I could see a task pod going into an error state. So I ran "kubectl logs {pod_name} git-sync" to check whether the DAGs were being copied in the first place, and found an error in the git-sync logs.
Then I realized it was a permissions problem when writing the DAGs to the DAGs folder. To fix it, I set "readOnly: false" on the DAGs volume mount under the "volumeMounts" section of the worker pod template.
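For reference, this is the relevant part of the change in the worker pod template, assuming the DAGs volume is named airflow-dags as in the full template below:
volumeMounts:
  - mountPath: /opt/airflow/dags
    name: airflow-dags
    readOnly: false   # previously read-only; this is the fix described above
    subPath: /repo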
That's it! It worked. The following finally worked:
Pod Template File:
apiVersion: v1
kind: Pod
metadata:
labels:
component: worker
release: airflow-dev
tier: airflow
spec:
containers:
- args: []
command: []
env:
- name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY
value: ECR repo link
- name: AIRFLOW__SMTP__SMTP_PORT
value: '587'
- name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG
value: docker image tag
- name: AIRFLOW__KUBERNETES__GIT_SYNC_RUN_AS_USER
value: '65533'
- name: AIRFLOW__CORE__ENABLE_XCOM_PICKLING
value: 'True'
- name: AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM
value: dw-airflow-dev-logs
- name: AIRFLOW__KUBERNETES__RUN_AS_USER
value: '50000'
- name: AIRFLOW__KUBERNETES__DAGS_IN_IMAGE
value: 'False'
- name: AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION
value: 'False'
- name: AIRFLOW__SMTP__SMTP_MAIL_FROM
value: email id
- name: AIRFLOW__CORE__LOAD_EXAMPLES
value: 'False'
- name: AIRFLOW__SMTP__SMTP_PASSWORD
value: xxxxxxxxx
- name: AIRFLOW__SMTP__SMTP_HOST
value: smtp-relay.gmail.com
- name: AIRFLOW__KUBERNETES__NAMESPACE
value: dev
- name: AIRFLOW__SMTP__SMTP_USER
value: xxxxxxxxxx
- name: AIRFLOW__CORE__EXECUTOR
value: LocalExecutor
- name: AIRFLOW_HOME
value: /opt/airflow
- name: AIRFLOW__CORE__DAGS_FOLDER
value: /opt/airflow/dags
- name: AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT
value: /opt/airflow/dags
- name: AIRFLOW__KUBERNETES__FS_GROUP
value: "50000"
- name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
valueFrom:
secretKeyRef:
key: connection
name: airflow-dev-airflow-metadata
- name: AIRFLOW_CONN_AIRFLOW_DB
valueFrom:
secretKeyRef:
key: connection
name: airflow-dev-airflow-metadata
- name: AIRFLOW__CORE__FERNET_KEY
valueFrom:
secretKeyRef:
key: fernet-key
name: airflow-dev-fernet-key
envFrom: []
image: docker image
imagePullPolicy: IfNotPresent
name: base
ports: []
volumeMounts:
- mountPath: /opt/airflow/dags
name: airflow-dags
readOnly: false
subPath: /repo
- mountPath: /opt/airflow/logs
name: airflow-logs
- mountPath: /etc/git-secret/ssh
name: git-sync-ssh-key
subPath: ssh
- mountPath: /opt/airflow/airflow.cfg
name: airflow-config
readOnly: true
subPath: airflow.cfg
- mountPath: /opt/airflow/config/airflow_local_settings.py
name: airflow-config
readOnly: true
subPath: airflow_local_settings.py
hostNetwork: false
imagePullSecrets:
- name: airflow-dev-registry
initContainers:
- env:
- name: GIT_SYNC_REPO
value: xxxxxxxxxxxxx
- name: GIT_SYNC_BRANCH
value: master
- name: GIT_SYNC_ROOT
value: /git
- name: GIT_SYNC_DEST
value: repo
- name: GIT_SYNC_DEPTH
value: '1'
- name: GIT_SYNC_ONE_TIME
value: 'true'
- name: GIT_SYNC_REV
value: HEAD
- name: GIT_SSH_KEY_FILE
value: /etc/git-secret/ssh
- name: GIT_SYNC_ADD_USER
value: 'true'
- name: GIT_SYNC_SSH
value: 'true'
- name: GIT_KNOWN_HOSTS
value: 'false'
image: k8s.gcr.io/git-sync:v3.1.6
name: git-sync
securityContext:
runAsUser: 65533
volumeMounts:
- mountPath: /git
name: airflow-dags
readOnly: false
- mountPath: /etc/git-secret/ssh
name: git-sync-ssh-key
subPath: ssh
nodeSelector: {}
restartPolicy: Never
securityContext:
fsGroup: 50000
runAsUser: 50000
serviceAccountName: airflow-dev-worker-serviceaccount
volumes:
- emptyDir: {}
name: airflow-dags
- name: airflow-logs
persistentVolumeClaim:
claimName: dw-airflow-dev-logs
- name: git-sync-ssh-key
secret:
items:
- key: gitSshKey
mode: 444
path: ssh
secretName: airflow-private-dags-dev
- configMap:
name: airflow-dev-airflow-config
name: airflow-config

How to manage and reference multiple artifact repositories in argo workflow

I have multiple artifact repositories and have them configured in the configMap, like:
apiVersion: v1
kind: ConfigMap
metadata:
name: artifact-repositories
data:
bucket1: |
s3:
endpoint: ...
bucket: bucket1
accessKeySecret:
...
secretKeySecret:
...
bucket2: |
s3:
endpoint: ...
bucket: bucket2
accessKeySecret:
...
secretKeySecret:
...
Then I want to reference them in a key-only way in the same workflow:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: artifact-
spec:
entrypoint: main
artifactRepositoryRef:
configMap: artifact-repositories
key: bucket1
templates:
- name: main
steps:
- - name: step1
template: step1
- - name: step2
template: step2
- name: step1
container:
...
outputs:
artifacts:
- name: art-output
path: /tmp/s1.txt
s3: # use bucket1 through artifactRepositoryRef
key: argo/s1.txt
- name: step2
container:
...
outputs:
artifacts:
- name: art-output
path: /tmp/s2.txt
s3:
# how to use bucket2 in a key-only way
key: argo/s2.txt
artifactRepositoryRef can only reference one artifact repository. How can I reference another artifact repository in a concise way?

Kubernetes multiple deployments using one code base but different configuration (environment variables)

I have a project where we are consuming data from Kafka and publishing to Mongo. The code base does only one task at a time: it may be a Mongo-to-Kafka migration, a Kafka-to-Mongo migration, or something else.
We have to consume from different Kafka topics and publish to different Mongo collections; these are parallel streams of work.
The current design is one codebase which can consume from any topic and publish to any Mongo collection, configurable via environment variables. So we created one Kubernetes pod with multiple containers inside it, each container having different environment variables.
My questions:
Is it wise to use multiple containers in one pod? They are easy to distinguish, but since they are tightly coupled, I'm guessing there is a high chance of failure and it isn't really a proper microservice design.
Should I create multiple deployments, one for each of these pipelines? That would be very difficult to maintain, as each will have a different deployment config.
Is there a better way to address this?
Sample of step 1:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: test-raw-mongodb-sink-apps
namespace: test-apps
spec:
selector:
matchLabels:
app: test-raw-mongodb-sink-apps
template:
metadata:
labels:
app: test-raw-mongodb-sink-apps
spec:
containers:
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-alchemy
- name: INPUT_TOPIC
value: test.raw.ptv.alchemy
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8081"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/dpl/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-alchemy
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-bloomberg
- name: INPUT_TOPIC
value: test.raw.pretrade.bloomberg
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8082"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-bloomberg
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-calypso
- name: INPUT_TOPIC
value: test.raw.ptv.calypso
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8083"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-calypso
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-dtres
- name: INPUT_TOPIC
value: test.raw.ptv.dtres
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8084"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-dtres
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-feds
- name: INPUT_TOPIC
value: test.raw.ptv.feds
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8085"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-feds
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-hoops
- name: INPUT_TOPIC
value: test.raw.ptv.hoops
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8086"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-hoops
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxcore
- name: INPUT_TOPIC
value: test.raw.ptv.murex_core
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8087"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxcore
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxeqd
- name: INPUT_TOPIC
value: test.raw.ptv.murex_eqd_sa
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8088"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxeqd
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxgts
- name: INPUT_TOPIC
value: test.raw.ptv.murex_gts_sa
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8089"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxgts
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxmr
- name: INPUT_TOPIC
value: test.raw.ptv.murex_mr
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8090"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxmr
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxgtscf
- name: INPUT_TOPIC
value: test.raw.cashflow.murex_gts_sa
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8091"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxgtscf
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxcoll
- name: INPUT_TOPIC
value: test.raw.collateral.mxcoll
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8092"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxcoll
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-mxcoll-link
- name: INPUT_TOPIC
value: test.raw.collateral.mxcoll_link
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8093"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-mxcoll-link
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-ost
- name: INPUT_TOPIC
value: test.raw.ptv.ost
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8094"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-ost
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
- env:
- name: EVENTS_TOPIC
value: test.ops.proc-events
- name: GROUP_ID
value: test-mongodb-sink-posmon
- name: INPUT_TOPIC
value: test.raw.ptp.posmon
- name: MONGODB_AUTH_DB
value: admin
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: MONGODB_PASSWORD
value: test123
- name: MONGODB_PORT
value: "27017"
- name: MONGODB_USERNAME
value: root
- name: SERVER_PORT
value: "8095"
- name: KAFKA_BROKERS
value: kafka-cluster-kafka-bootstrap.kafka:9093
- name: TRUSTSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: ca.password
name: kafka-ca-cert
- name: KEYSTORE_PASSWORD
valueFrom:
secretKeyRef:
key: user.password
name: kafka
image: tools.testCompany.co.za:8093/local/tt--mongodb-map:0.0.7.0-SNAPSHOT
name: test-mongodb-sink-posmon
securityContext:
allowPrivilegeEscalation: true
privileged: true
volumeMounts:
- mountPath: /app/resources
name: properties
- mountPath: /stores
name: stores
readOnly: true
Thanks
A templating tool like Helm will let you fill in the environment-variable values from deploy-time settings. In Helm this would look like:
env:
- name: EVENTS_TOPIC
value: {{ .Values.eventsTopic }}
- name: GROUP_ID
value: {{ .Values.groupId }}
- name: INPUT_TOPIC
value: {{ .Values.inputTopic }}
You could then deploy this multiple times with different sets of topics:
helm install alchemy . \
--set eventsTopic=test.ops.proc-events \
--set groupId=test-mongodb-sink-alchemy \
--set inputTopic=test.raw.ptv.alchemy
helm install bloomberg . \
--set eventsTopic=test.ops.proc-events \
--set groupId=test-mongodb-sink-bloomberg \
--set inputTopic=test.raw.pretrade.bloomberg
You could write the Helm chart to be configured with a list of topic sets, too, and only deploy the set once:
{{- $top := . -}}{{- /* because "range" overwrites "." */ -}}
{{- range $topic := .Values.topics }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $topic.name }}
spec:
  ...
        env:
          - name: EVENT_TOPIC
            value: {{ $top.Values.eventTopic }}{{/* common to all deployments */}}
          - name: GROUP_ID
            value: test-mongodb-sink-{{ $topic.name }}
          - name: INPUT_TOPIC
            value: {{ $topic.inputTopic }}
{{- end }}
Write configuration like:
eventTopic: test.ops.proc-events
topics:
- name: alchemy
inputTopic: test.raw.ptv.alchemy
- name: bloomberg
inputTopic: test.raw.pretrade.bloomberg
And deploy like:
helm install connector . -f topic-listing.yaml
In any case, you will want only one container per pod, for a couple of reasons. If the list of topics ever changes, separate deployments let you create or delete one without interfering with the other topics; if everything were in a single pod, you'd have to stop and restart everything together, and it can take Kafka a minute or two to figure out what happened. In a Kafka context, you can also run as many consumers as there are partitions on a topic, but not really more; if you have a very busy topic you can simply raise that deployment's replicas: to get one consumer per partition, but if everything is in one pod, your only choice is to scale everything together.
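As a sketch of that last point, the per-topic values file could also carry a replica count (a hypothetical extra field, wired into the chart above as replicas: {{ $topic.replicas | default 1 }}), so a busy topic can be scaled independently of the others:
eventTopic: test.ops.proc-events
topics:
  - name: alchemy
    inputTopic: test.raw.ptv.alchemy
    replicas: 4   # busy topic: one consumer per partition, up to the partition count
  - name: bloomberg
    inputTopic: test.raw.pretrade.bloomberg
    replicas: 1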
Is it wise to use multiple containers in one pod? They are easy to distinguish, but since they are tightly coupled, I'm guessing there is a high chance of failure and it isn't really a proper microservice design.
You most likely want to deploy them as separate services, so that you can update or re-configure them independently of each other.
Should I create multiple deployments, one for each of these pipelines? That would be very difficult to maintain, as each will have a different deployment config.
Kustomize is a tool built into kubectl that is a good choice when you want to deploy the same manifest to multiple environments with different configurations. This solution requires no additional tools other than kubectl.
Deploying to multiple environments with Kustomize
Directory structure:
base/
- deployment.yaml # fully deployable manifest - no templating
- kustomization.yaml # default values e.g. for dev environment
app1/
- kustomization.yaml # specific values for app1
app2/
- kustomization.yaml # specific values for app2
Example Deployment manifest with Kustomization
Here, the environment variables are loaded from a ConfigMap so that we can use configMapGenerator. This file is base/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: mongodb-sink
namespace: test-apps
spec:
template: # some fields, e.g. labels, are omitted in this example
spec:
containers:
- name: mongodb-sink
image: mongodb-map:0.0.7.0-SNAPSHOT
env:
- name: MONGODB_HOST0
value: test-mongodb-0.test-mongodb-headless.test-infra
- name: MONGODB_HOST1
value: test-mongodb-1.test-mongodb-headless.test-infra
- name: GROUP_ID
valueFrom:
configMapKeyRef:
name: my-values
key: GROUP_ID
- name: INPUT_TOPIC
valueFrom:
configMapKeyRef:
name: my-values
key: INPUT_TOPIC
...
Also add a base/kustomization.yaml file to describe the configMapGenerator and related files.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
configMapGenerator:
- name: my-values
literals:
- GROUP_ID=test-mongodb-sink-calypso
- INPUT_TOPIC=test.raw.ptv.calypso
... # also add your other values
Preview Manifests
kubectl kustomize base/
Apply Manifests
kubectl apply -k base/
Add config for app1 and app2
With app1 we now want to use the manifest we have in base/ and just overlay what is different for app1. This file is app1/kustomization.yaml, and app2/kustomization.yaml looks similar (a sketch for app2 is shown after the app1 commands below).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../base
namePrefix: bloomberg-sink- # this gives your Deployment a prefixed name
configMapGenerator:
- name: my-values
behavior: replace
literals:
- GROUP_ID=test-mongodb-sink-bloomberg
- INPUT_TOPIC=test.raw.pretrade.bloomberg
... # also add your other values
Preview Manifests
kubectl kustomize app1/
Apply Manifests
kubectl apply -k app1/
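A sketch of the app2 overlay, using the alchemy topic from the question; only the prefix and the generated values differ from app1:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../base
namePrefix: alchemy-sink-
configMapGenerator:
  - name: my-values
    behavior: replace
    literals:
      - GROUP_ID=test-mongodb-sink-alchemy
      - INPUT_TOPIC=test.raw.ptv.alchemy
Preview it with kubectl kustomize app2/ and apply it with kubectl apply -k app2/.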
Documentation
Kubernetes: Declarative Management of Kubernetes Objects Using Kustomize
SIG CLI: Kustomization file