Can someone give me some guidance on best practices for bringing multiple Talend jobs dynamically into Kubernetes?
I am using Talend Open Studio for Big Data.
I have a listener server for Job2Docker.
How do I change the scripts to automate a push to Docker Hub?
Is it possible to have a dynamic K8s CronJob type that can run jobs based on a configuration file?
I ended up not using Job2Docker in favour of a simple docker process.
Build the Talend Standalone job.
Unzip the build into a folder called jobs.
Use the below Dockerfile example to build and push your docker image.
Dockerfile
FROM java
WORKDIR /talend
COPY ./jobs /talend
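For the build-and-push step, a minimal sketch of the commands could look like the following, assuming the dockerhubrepo/your-etl repository referenced in the CronJob below and an illustrative 0.1 tag (run docker login first, or wire it into your CI):
docker build -t dockerhubrepo/your-etl:0.1 .   # run from the folder containing the Dockerfile and jobs/
docker push dockerhubrepo/your-etl:0.1         # push the tagged image to Docker Hub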
Create a CronJob type for K8s
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: etl-edw-cronjob
spec:
schedule: "0 * * * *"
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: etl-edw-job
image: dockerhubrepo/your-etl
command: ["sh", "./process_data_warehouse_0.1/process_data_warehouse/process_data_warehouse_run.sh"]
env:
- name: PGHOST
value: postgres-cluster-ip-service
- name: PGPORT
value: "5432"
- name: PGDATABASE
value: infohub
- name: PGUSER
value: postgres
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: pgpassword
key: PGPASSWORD
- name: MONGOSERVER
value: mongo-service
- name: MONGOPORT
value: "27017"
- name: MONGODB
value: hearth
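To deploy the CronJob and trigger a one-off run for testing, something like the following should work (the manifest filename and the test job name are placeholders):
kubectl apply -f etl-edw-cronjob.yaml                                  # create/update the CronJob
kubectl create job --from=cronjob/etl-edw-cronjob etl-edw-manual-test  # run it once immediately, instead of waiting for the schedule
kubectl logs -f job/etl-edw-manual-test                                # follow the job's pod logs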
Related
I am trying to back up a Postgres database from RDS using a K8s CronJob.
I have created the CronJob in my EKS cluster, and the credentials are stored in Secrets.
When it tries to copy the backup file into the AWS S3 bucket, the pod fails with the error:
aws: error: argument command: Invalid choice, valid choices are:
I tried different options, but it is not working.
Can anybody please help resolve this issue?
Here is some brief info:
The K8s cluster is on AWS EKS.
The DB is on RDS.
I am using the following config for my CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: postgres-backup
spec:
schedule: "*/3 * * * *"
jobTemplate:
spec:
backoffLimit: 0
template:
spec:
initContainers:
- name: dump
image: postgres:12.1-alpine
volumeMounts:
- name: data
mountPath: /backup
args:
- pg_dump
- "-Fc"
- "-f"
- "/backup/redash-postgres.pgdump"
- "-Z"
- "9"
- "-v"
- "-h"
- "postgress.123456789.us-east-2.rds.amazonaws.com"
- "-U"
- "postgress"
- "-d"
- "postgress"
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
# Retrieve postgres password from a secret
name: postgres
key: POSTGRES_PASSWORD
containers:
- name: save
image: amazon/aws-cli
volumeMounts:
- name: data
mountPath: /backup
args:
- aws
- "--version"
envFrom:
- secretRef:
# Must contain AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
name: s3-backup-credentials
restartPolicy: Never
volumes:
- name: data
emptyDir: {}
Try this:
...
containers:
- name: save
image: amazon/aws-cli
...
args:
- "--version" # <-- the image entrypoint already call "aws", you only need to specify the arguments here.
...
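Putting that together for the original goal of copying the dump to S3, the save container would pass only the subcommand and its arguments, roughly like the sketch below (the bucket name is a placeholder; the Secret name is the one from the question):
containers:
  - name: save
    image: amazon/aws-cli
    volumeMounts:
      - name: data
        mountPath: /backup
    args:                                                  # the entrypoint is already "aws"
      - s3
      - cp
      - "/backup/redash-postgres.pgdump"
      - "s3://my-backup-bucket/redash-postgres.pgdump"     # placeholder bucket/key
    envFrom:
      - secretRef:
          name: s3-backup-credentials                      # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION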
I have a Dockerfile in which I have used the postgres:12 image and modified it with some DDL scripts. I have built the image and can run the container through the docker run command, but how can I use a Kubernetes Job to run the built image? I don't have much experience with K8s.
This is my Dockerfile:
docker build . -t dockerdb
FROM postgres:12
ENV POSTGRES_PASSWORD xyz#123123!233
ENV POSTGRES_DB test
ENV POSTGRES_USER test
COPY ./Scripts /docker-entrypoint-initdb.d/
How can I customize the code below to meet this requirement?
apiVersion: batch/v1
kind: Job
metadata:
name: job-1
spec:
template:
metadata:
name: job-1
spec:
containers:
- name: postgres
image: gcr.io/project/pg_12:dev
command:
- /bin/sh
- -c
- "not sure what command should i give in last line"
Not sure how you are running the Docker image.
If you are running your Docker image without passing any command, you can run the image directly in a Job.
docker run <imagename>
Once your Docker image is ready and built, you can run it directly.
Your Job will get executed without passing any command:
apiVersion: batch/v1
kind: Job
metadata:
name: job-1
spec:
template:
metadata:
name: job-1
spec:
containers:
- name: postgres
image: gcr.io/project/pg_12:dev
If you want to pass any argument or command, you can pass that as well:
apiVersion: batch/v1
kind: CronJob
metadata:
name: hello
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: <CHANGE IMAGE URL>
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
Just to note: the above template is for a CronJob; a CronJob runs on a specific schedule.
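Either way, a quick way to submit and inspect the plain Job, assuming the manifest is saved as job-1.yaml:
kubectl apply -f job-1.yaml     # submit the Job
kubectl get jobs                # watch for completion
kubectl logs job/job-1          # view the output of the Job's pod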
As the documentation shows, you should be setting the env vars when doing a docker run like the following:
docker run --name some-postgres -e POSTGRES_PASSWORD='foo' -e POSTGRES_USER='bar' postgres
This sets the superuser and password to access the database instead of the defaults of POSTGRES_PASSWORD='' and POSTGRES_USER='postgres'.
However, I'm using Skaffold to spin up a k8s cluster and I'm trying to figure out how to do something similar. How does one go about doing this for Kubernetes and Skaffold?
@P Ekambaram is correct, but I would like to go further into this topic and explain the "whys and hows".
When passing passwords on Kubernetes, it's highly recommended to use encryption and you can do this by using secrets.
Creating your own Secrets (Doc)
To be able to use the secrets as described by @P Ekambaram, you need to have a Secret in your Kubernetes cluster.
You can create a Secret from generators and then apply it to create the object on the API server. The generators should be specified in a kustomization.yaml inside a directory.
For example, to generate a Secret from literals username=admin and password=secret, you can specify the secret generator in kustomization.yaml as
# Create a kustomization.yaml file with SecretGenerator
$ cat <<EOF >./kustomization.yaml
secretGenerator:
- name: db-user-pass
literals:
- username=admin
- password=secret
EOF
Apply the kustomization directory to create the Secret object.
$ kubectl apply -k .
secret/db-user-pass-dddghtt9b5 created
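Note that the generated Secret name carries a content hash suffix (db-user-pass-dddghtt9b5 above). You can verify what was stored with, for example:
kubectl get secret db-user-pass-dddghtt9b5 -o jsonpath='{.data.password}' | base64 --decode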
Using Secrets as Environment Variables (Doc)
This is an example of a pod that uses secrets from environment variables:
apiVersion: v1
kind: Pod
metadata:
name: secret-env-pod
spec:
containers:
- name: mycontainer
image: redis
env:
- name: SECRET_USERNAME
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: SECRET_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
restartPolicy: Never
Source: here and here.
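Once the pod is running, a quick way to confirm the values were injected into the secret-env-pod example above is:
kubectl exec secret-env-pod -- printenv SECRET_USERNAME SECRET_PASSWORD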
Use the below YAML
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
replicas: 1
template:
metadata:
labels:
name: postgres
spec:
containers:
- name: postgres
image: postgres:11.2
ports:
- containerPort: 5432
env:
- name: POSTGRES_DB
value: "sampledb"
- name: POSTGRES_USER
value: "postgres"
- name: POSTGRES_PASSWORD
value: "secret"
volumeMounts:
- name: data
mountPath: /var/lib/postgresql
volumes:
- name: data
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: postgres
spec:
type: ClusterIP
ports:
- port: 5432
selector:
name: postgres
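Since the question mentions Skaffold: Skaffold does not set these variables itself, it simply deploys whatever manifests you point it at, so the env section above (or a secretKeyRef, as in the previous answer) is what the container sees. A minimal skaffold.yaml sketch, assuming the manifests above are saved under k8s/ (the apiVersion string depends on your Skaffold release):
apiVersion: skaffold/v2beta12   # adjust to the schema version your Skaffold release supports
kind: Config
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml              # the StatefulSet and Service manifests shown above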
I am using the following job template:
apiVersion: batch/v1
kind: Job
metadata:
name: rotatedevcreds2
spec:
template:
metadata:
name: rotatedevcreds2
spec:
containers:
- name: shell
image: akanksha/dsserver:v7
env:
- name: DEMO
value: "Hello from the environment"
- name: personal_AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_aws_secret_access_key
- name: personal_AWS_SECRET_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_aws_secret_access_key_id
- name: personal_GIT_TOKEN
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_git_token
command:
- "bin/bash"
- "-c"
- "whoami; pwd; /root/rotateCreds.sh"
restartPolicy: Never
imagePullSecrets:
- name: regcred
The shell script runs some Ansible tasks, which result in:
TASK [Get the existing access keys for the functional backup ID] ***************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "aws iam list-access-keys --user-name ''", "failed_when_result": true, "msg": "[Errno 2] No such file or directory", "rc": 2}
However, if I spin up a pod using the same image with the following:
apiVersion: batch/v1
kind: Job
metadata:
name: rotatedevcreds3
spec:
template:
metadata:
name: rotatedevcreds3
spec:
containers:
- name: shell
image: akanksha/dsserver:v7
env:
- name: DEMO
value: "Hello from the environment"
- name: personal_AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_aws_secret_access_key
- name: personal_AWS_SECRET_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_aws_secret_access_key_id
- name: personal_GIT_TOKEN
valueFrom:
secretKeyRef:
name: rotatecreds-env
key: personal_git_token
command:
- "bin/bash"
- "-c"
- "whoami; pwd; /root/rotateCreds.sh"
restartPolicy: Never
imagePullSecrets:
- name: regcred
This creates a pod, and I am able to log in to the pod and run /root/rotateCreds.sh.
While running the Job, it seems it is not able to recognise the AWS CLI. I tried debugging: whoami and pwd return root and /, respectively, which is fine. Any pointers on what is missing? I am new to Jobs.
For further debugging, I added a sleep of 10000 seconds to the Job template so that I could log in to the container and see what was happening. After logging in I was able to run the script manually, and the aws command was recognised properly.
It is likely your PATH is not set correctly.
A quick fix is to use the absolute path of the AWS CLI, e.g. /usr/local/bin/aws, in the /root/rotateCreds.sh script.
OK, so I added an export command to update the PATH, and that fixed the issue. The problem was that I was using the command module, so the task was not running in a bash environment. So either we can use the shell module with the bash executable, as described here:
https://docs.ansible.com/ansible/latest/modules/shell_module.html
or export the new PATH.
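For reference, a minimal sketch of that task using the shell module with an explicit bash executable and PATH; the backup_user variable is just a placeholder for however the user name is supplied:
- name: Get the existing access keys for the functional backup ID
  shell: aws iam list-access-keys --user-name "{{ backup_user }}"
  args:
    executable: /bin/bash
  environment:
    PATH: "/usr/local/bin:{{ ansible_env.PATH }}"     # prepend the directory that contains the aws binary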
With the help of Kubernetes I am running daily jobs on GKE. Each day, based on the cron schedule configured in Kubernetes, a new container spins up and tries to insert some data into BigQuery.
The setup we have is two different projects in GCP: in one project we maintain the data in BigQuery, and in the other project we have all of GKE running. When GKE has to interact with the other project's resources, my guess is that I have to set an environment variable named GOOGLE_APPLICATION_CREDENTIALS which points to a service account JSON file, but since Kubernetes spins up a new container every day, I am not sure how and where I should set this variable.
Thanks in Advance!
NOTE: this file is parsed as a golang template by the drone-gke plugin.
---
apiVersion: v1
kind: Secret
metadata:
name: my-data-service-account-credentials
type: Opaque
data:
sa_json: "bas64JsonServiceAccount"
---
apiVersion: v1
kind: Pod
metadata:
name: adtech-ads-apidata-el-adunit-pod
spec:
containers:
- name: adtech-ads-apidata-el-adunit-container
volumeMounts:
- name: service-account-credentials-volume
mountPath: "/etc/gcp"
readOnly: true
volumes:
- name: service-account-credentials-volume
secret:
secretName: my-data-service-account-credentials
items:
- key: sa_json
path: sa_credentials.json
This is our cron job for loading the AdUnit data:
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
name: adtech-ads-apidata-el-adunit
spec:
schedule: "*/5 * * * *"
suspend: false
concurrencyPolicy: Replace
successfulJobsHistoryLimit: 10
failedJobsHistoryLimit: 10
jobTemplate:
spec:
template:
spec:
containers:
- name: adtech-ads-apidata-el-adunit-container
image: {{.image}}
args:
- -cp
- opt/nyt/DFPDataIngestion-1.0-jar-with-dependencies.jar
- com.nyt.cron.AdUnitJob
env:
- name: ENV_APP_NAME
value: "{{.env_app_name}}"
- name: ENV_APP_CONTEXT_NAME
value: "{{.env_app_context_name}}"
- name: ENV_GOOGLE_PROJECTID
value: "{{.env_google_projectId}}"
- name: ENV_GOOGLE_DATASETID
value: "{{.env_google_datasetId}}"
- name: ENV_REPORTING_DATASETID
value: "{{.env_reporting_datasetId}}"
- name: ENV_ADBRIDGE_DATASETID
value: "{{.env_adbridge_datasetId}}"
- name: ENV_SALESFORCE_DATASETID
value: "{{.env_salesforce_datasetId}}"
- name: ENV_CLOUD_PLATFORM_URL
value: "{{.env_cloud_platform_url}}"
- name: ENV_SMTP_HOST
value: "{{.env_smtp_host}}"
- name: ENV_TO_EMAIL
value: "{{.env_to_email}}"
- name: ENV_FROM_EMAIL
value: "{{.env_from_email}}"
- name: ENV_AWS_USERNAME
value: "{{.env_aws_username}}"
- name: ENV_CLIENT_ID
value: "{{.env_client_id}}"
- name: ENV_REFRESH_TOKEN
value: "{{.env_refresh_token}}"
- name: ENV_NETWORK_CODE
value: "{{.env_network_code}}"
- name: ENV_APPLICATION_NAME
value: "{{.env_application_name}}"
- name: ENV_SALESFORCE_USERNAME
value: "{{.env_salesforce_username}}"
- name: ENV_SALESFORCE_URL
value: "{{.env_salesforce_url}}"
- name: GOOGLE_APPLICATION_CREDENTIALS
value: "/etc/gcp/sa_credentials.json"
- name: ENV_CLOUD_SQL_URL
valueFrom:
secretKeyRef:
name: secrets
key: cloud_sql_url
- name: ENV_AWS_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: aws_password
- name: ENV_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: secrets
key: dfp_client_secret
- name: ENV_SALESFORCE_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: salesforce_password
restartPolicy: OnFailure
So, if your GKE project is project my-gke, and the project containing the services/things your GKE containers need access to is project my-data, one approach is to:
Create a service account in the my-data project. Give it whatever GCP roles/permissions are needed (e.g. roles/bigquery.dataViewer if you have some BigQuery tables that your my-gke GKE containers need to read).
Create a service account key for that service account. When you do this in the console following https://cloud.google.com/iam/docs/creating-managing-service-account-keys, you should automatically download a .json file containing the SA credentials.
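If you prefer the CLI over the console, the same key can be created with gcloud (the service account e-mail below is a placeholder; the output filename matches the one used in the next step):
gcloud iam service-accounts keys create the-downloaded-SA-credentials.json \
  --iam-account=my-data-sa@my-data.iam.gserviceaccount.com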
Create a Kubernetes secret resource for those service account credentials. It might look something like this:
apiVersion: v1
kind: Secret
metadata:
name: my-data-service-account-credentials
type: Opaque
data:
sa_json: <contents of running 'base64 the-downloaded-SA-credentials.json'>
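Alternatively, rather than base64-encoding the file by hand, you can let kubectl build the Secret for you:
kubectl create secret generic my-data-service-account-credentials \
  --from-file=sa_json=the-downloaded-SA-credentials.json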
Mount the credentials in the container that needs access:
[...]
spec:
containers:
- name: my-container
volumeMounts:
- name: service-account-credentials-volume
mountPath: /etc/gcp
readOnly: true
[...]
volumes:
- name: service-account-credentials-volume
secret:
secretName: my-data-service-account-credentials
items:
- key: sa_json
path: sa_credentials.json
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable in the container to point to the path of the mounted credentials:
[...]
spec:
containers:
- name: my-container
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /etc/gcp/sa_credentials.json
With that, any official GCP clients (e.g. the GCP Python client, GCP Java client, the gcloud CLI, etc.) should respect the GOOGLE_APPLICATION_CREDENTIALS env var and, when making API requests, automatically use the credentials of the my-data service account that you created and mounted the credentials .json file for.
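As a quick sanity check, using the pod name from the question and assuming the container image ships basic shell utilities, you can confirm the variable and the mounted key file from inside the running container:
kubectl exec adtech-ads-apidata-el-adunit-pod -- printenv GOOGLE_APPLICATION_CREDENTIALS
kubectl exec adtech-ads-apidata-el-adunit-pod -- ls -l /etc/gcp/sa_credentials.json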