Scaling Airflow with a Celery cluster using Docker swarm - docker-compose

As the title says, I want to set up Airflow so that it runs on a cluster (1 master, 2 nodes) using Docker Swarm.
Current setup:
Right now I have an Airflow setup that uses the CeleryExecutor and runs on a single EC2 instance.
I have a Dockerfile that pulls Airflow's image and runs pip install -r requirements.txt.
From this Dockerfile I'm creating a local image, and this image is used in the docker-compose.yml that spins up the different services Airflow needs (webserver, scheduler, Redis, Flower and some workers; the metadata DB is Postgres on a separate RDS instance).
The docker-compose file is used in Docker Swarm mode, i.e. docker stack deploy -c docker-compose.yml airflow_stack.
Required Setup:
I want to scale the current setup to 3 EC2 instances (1 master, 2 nodes), where the master would run the webserver, scheduler, Redis and Flower, and the workers would run on the nodes.
After searching the web and the docs, there are a few things that are still not clear to me that I would love to know:
From what I understand, in order for the nodes to run the workers, the local image that I'm building from the Dockerfile needs to be pushed to some repository (if it's really needed, I would use AWS ECR) so that the Airflow workers are able to create containers from that image. Is that correct?
Syncing volumes and env files: right now I'm mounting the volumes and inserting the envs in the docker-compose file. Would these mounts and envs be synced to the nodes (and the Airflow worker containers)? If not, how can I make sure that everything is in sync, given that Airflow requires all components (apart from Redis) to have all the dependencies, etc.?
One of the envs that needs to be set when using the CeleryExecutor is the broker_url; how can I make sure that the nodes recognize the Redis broker that is on the master?
I'm sure there are a few more things I've forgotten, but what I wrote is a good start.
Any help or recommendation would be greatly appreciated.
Thanks!
Dockerfile:
FROM apache/airflow:2.1.3-python3.9
USER root
RUN apt update && apt -y install build-essential
USER airflow
COPY requirements.txt requirements.txt
COPY requirements.airflow.txt requirements.airflow.txt
RUN pip install --upgrade pip
RUN pip install --upgrade wheel
RUN pip install -r requirements.airflow.txt
RUN pip install -r requirements.txt
EXPOSE 8793 8786 8787
docker-compose.yml:
version: '3.8'

x-airflow-celery: &airflow-celery
  image: local_image:latest
  volumes:
    - some_volume
  env_file:
    - some_env_file

services:
  webserver:
    <<: *airflow-celery
    command: airflow webserver
    restart: always
    ports:
      - 80:8080
    healthcheck:
      test: [ "CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]" ]
      interval: 10s
      timeout: 30s
      retries: 3

  scheduler:
    <<: *airflow-celery
    command: airflow scheduler
    restart: always
    deploy:
      replicas: 2

  redis:
    image: redis:6.0
    command: redis-server --include /redis.conf
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 30s
      timeout: 10s
      retries: 5
    ports:
      - 6379:6379
    environment:
      - REDIS_PORT=6379

  worker:
    <<: *airflow-celery
    command: airflow celery worker
    deploy:
      replicas: 16

  flower:
    <<: *airflow-celery
    command: airflow celery flower
    ports:
      - 5555:5555

Sounds like you are heading in the right direction (with one general comment at the end though).
Yes, you need to push the image to a container registry and refer to it via a public (or private, if you authenticate) tag. The tag in this case is usually registry/name:tag. For example, you can see one of the CI images of Airflow here: https://github.com/apache/airflow/pkgs/container/airflow%2Fmain%2Fci%2Fpython3.9 - the purpose is a bit different (we use it for our CI builds) but the mechanism is the same: you build it locally, tag it with "registry/image:tag" (docker build . --tag registry/image:tag) and run docker push registry/image:tag.
Then whenever you refer to it from your docker-compose file via registry/image:tag, docker compose/swarm will pull the right image. Just make sure you use a unique tag when you build your images, so you know which image you pushed (and can account for future images).
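For example, with AWS ECR (the registry mentioned in the question), the build/tag/push cycle looks roughly like this - the account id, region, repository name and tag below are placeholders, not values from your setup:

aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
docker build . --tag 123456789012.dkr.ecr.eu-west-1.amazonaws.com/airflow-custom:2.1.3-build1
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/airflow-custom:2.1.3-build1

The image: line in your compose file then refers to the same registry/image:tag string.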
Env files should be fine and will be distributed across the instances, but locally mounted volumes will not. You either need some shared filesystem (like NFS, or maybe EFS if you use AWS) where the DAGs are stored, or some other synchronization method to distribute the DAGs. That can be, for example, git-sync - which has very nice properties, especially if you use Git to store the DAG files - or baking the DAGs into the image (which requires re-pushing the image whenever they change). You can see the different options explained in our Helm Chart docs: https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html
You cannot use localhost; you need to set the broker URL to a specific host and make sure it is reachable from all instances. This can be done either by assigning a specific IP address/DNS name to your 'broker' instance and opening up the right ports in the firewalls (make sure you control where those ports can be reached from), and maybe even by employing some load balancing.
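For illustration, once the broker is reachable under a stable hostname (a Swarm service name on the shared overlay network, or a DNS name you assign to the master instance), the env file points Celery at that host instead of localhost - a sketch with placeholder hostnames and credentials:

AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:yourpassword@your-rds-endpoint:5432/airflow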
I do not know Docker Swarm well enough to say how difficult or easy it is to set all of this up, but honestly, that seems like a lot of work to do manually.
I would strongly, really strongly encourage you to use Kubernetes and the Helm Chart which the Airflow community develops: https://airflow.apache.org/docs/helm-chart/stable/index.html . A lot of the issues and necessary configuration are either solved by K8s itself (scaling, shared filesystems - PVs, networking and connectivity, resource management, etc.) or by our Helm Chart (git-sync side containers, broker configuration, etc.).
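For reference, getting the community chart running boils down to a couple of commands (the release and namespace names here are just examples):

helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace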

I run Airflow CeleryExecutor on Docker Swarm.
So assuming that you have Docker Swarm set up on your nodes, here are a few things you can do:
Map shared volumes to NFS folders like this (same for plugins and logs, or anything else you need to share)
volumes:
  dags:
    driver_opts:
      type: "none"
      o: "bind"
      device: "/nfs/airflow/dags"
I personally use Docker Secrets to handle my webserver password, database password, etc. (similarly, I use Docker configs to pass in my celery and webserver config)
secrets:
  postgresql_password:
    external: true
  fernet_key:
    external: true
  webserver_password:
    external: true
To have Airflow read the secrets, I added a simple bash script that is sourced from the entrypoint.sh script. So in my stack file I do not need to hardcode any passwords; whenever a variable's value contains the DOCKER-SECRET string, the script looks the real value up under /run/secrets/ (I think I used this as an example when setting it up: https://gist.github.com/bvis/b78c1e0841cfd2437f03e20c1ee059fe).
In my entrypoint script I add the script that expands Docker Secrets:
source /env_secrets_expand.sh

x-airflow-variables: &airflow-variables
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  ...
  AIRFLOW__WEBSERVER__SECRET_KEY: DOCKER-SECRET->webserver_secret_key
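The gist linked above works roughly like this - a simplified sketch of the idea, not the exact script I use: scan the environment for values marked DOCKER-SECRET-> and replace them with the content of the matching file under /run/secrets/.

#!/bin/bash
# Replace values of the form DOCKER-SECRET->name with the content of /run/secrets/name.
# Meant to be sourced from the entrypoint so the exports affect the Airflow process.
for var in $(compgen -e); do
  value="${!var}"
  if [[ "$value" == "DOCKER-SECRET->"* ]]; then
    secret_name="${value#DOCKER-SECRET->}"
    if [[ -f "/run/secrets/$secret_name" ]]; then
      export "$var"="$(cat "/run/secrets/$secret_name")"
    fi
  fi
done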
This is how the Postgres image is set up as well, without the password in an environment variable:
services:
  postgres:
    image: postgres:11.5
    secrets:
      - source: postgresql_password
        target: /run/secrets/postgresql_password
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_DB=airflow
      - POSTGRES_PASSWORD_FILE=/run/secrets/postgresql_password
You can obviously use Swarm labels or hostnames to determine which nodes a certain service should run on:
scheduler:
  <<: *airflow-common
  environment: *airflow-variables
  command: scheduler
  deploy:
    replicas: 2
    mode: replicated
    placement:
      constraints:
        - node.labels.type == worker
    restart_policy:
      condition: on-failure
      delay: 5s
      max_attempts: 3
      window: 120s
  logging:
    driver: fluentd
    options:
      tag: docker.airflow.scheduler
      fluentd-async-connect: "true"
And for Celery workers, I have my default queue and then a special queue which is pinned to a single node for historical reasons (clients have whitelisted this specific IP address, so I need to ensure that tasks only run on that node). So my entrypoint runs exec airflow celery "$@" -q "$QUEUE_NAME", and my stack file looks like this:
worker_default:
  <<: *airflow-common
  environment:
    <<: *airflow-variables
    QUEUE_NAME: default
  command: worker
  deploy:
    replicas: 3
    mode: replicated
    placement:
      constraints:
        - node.labels.type == worker

worker_nodename:
  <<: *airflow-common
  environment:
    <<: *airflow-variables
    QUEUE_NAME: nodename
  command: worker
  deploy:
    replicas: 1
    mode: replicated
    placement:
      constraints:
        - node.hostname == nodename
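For the node.labels.type == worker placement constraints to match anything, the label has to exist on the Swarm nodes first, e.g. (the node name is a placeholder):

docker node update --label-add type=worker my-worker-node-1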
I use GitLab CI/CD to deploy my DAGs/plugins whenever I merge to main, and to build the images and deploy the services if I update the Dockerfile or certain other files. I have been running Airflow this way for a few years now (since 2017 or 2018), but I do plan on switching to Kubernetes eventually since that seems like the more standard approach.

Related

Hyperledger fabric explorer - Docker Compose to Kubernetes

I have a working Docker-based setup - peer(s), orderers and Explorer (DB & app) - which I am aiming to deploy on GCP Kubernetes.
For the peer(s) and orderer I have used the Docker images and created Kubernetes YAML files (StatefulSet, Service, NodePort and Ingress) to deploy on Kubernetes.
For Explorer I have the below docker-compose file, which depends on my local connection-profile and crypto files.
I am struggling to deploy Explorer on Kubernetes and am looking for advice on the approach.
I have tried to convert the docker-compose file using Kompose, but I face issues while translating the network and health-check tags.
I have tried to create a single Docker image (a Dockerfile with multiple FROM statements) from hyperledger/explorer-db:latest and hyperledger/explorer:latest, but again specifying the network becomes an issue.
Any suggestions or examples on how Explorer can be deployed in the cluster?
Thanks
Explorer Docker Compose
version: '2.1'

volumes:
  pgdata:
  walletstore:

networks:
  mynetwork.com:
    external:
      name: my-netywork

services:
  explorerdb.mynetwork.com:
    image: hyperledger/explorer-db:latest
    container_name: explorerdb.mynetwork.com
    hostname: explorerdb.mynetwork.com
    environment:
      - DATABASE_DATABASE=fabricexplorer
      - DATABASE_USERNAME=hppoc
      - DATABASE_PASSWORD=password
    healthcheck:
      test: "pg_isready -h localhost -p 5432 -q -U postgres"
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - mynetwork.com

  explorer.mynetwork.com:
    image: hyperledger/explorer:latest
    container_name: explorer.mynetwork.com
    hostname: explorer.mynetwork.com
    environment:
      - DATABASE_HOST=explorerdb.mynetwork.com
      - DATABASE_DATABASE=fabricexplorer
      - DATABASE_USERNAME=hppoc
      - DATABASE_PASSWD=password
      - LOG_LEVEL_APP=info
      - LOG_LEVEL_DB=info
      - LOG_LEVEL_CONSOLE=debug
      - LOG_CONSOLE_STDOUT=true
      - DISCOVERY_AS_LOCALHOST=false
    volumes:
      - ./config.json:/opt/explorer/app/platform/fabric/config.json
      - ./connection-profile:/opt/explorer/app/platform/fabric/connection-profile
      - ../config/crypto-config:/tmp/crypto
      - walletstore:/opt/explorer/wallet
    ports:
      - 8080:8080
    depends_on:
      explorerdb.mynetwork.com:
        condition: service_healthy
    networks:
      - mynetwork.com
Explorer Dockerfile - multiple FROMs
# Updated to Fabric 2.x
#1. Docker file for setting up the Orderer
# FROM hyperledger/fabric-orderer:1.4.2
FROM hyperledger/explorer-db:latest
ENV DATABASE_DATABASE=fabricexplorer
ENV DATABASE_USERNAME=hppoc
ENV DATABASE_PASSWORD=password
FROM hyperledger/explorer:latest
COPY ./config/explorer/. /opt/explorer/
COPY ./config/crypto-config/. /tmp/crypto
ENV DATABASE_HOST=explorerdb.xxx.com
ENV DATABASE_DATABASE=fabricexplorer
ENV DATABASE_USERNAME=hppoc
ENV DATABASE_PASSWD=password
ENV LOG_LEVEL_APP=info
ENV LOG_LEVEL_DB=info
ENV LOG_LEVEL_CONSOLE=debug
ENV LOG_CONSOLE_STDOUT=true
ENV DISCOVERY_AS_LOCALHOST=false
# ENV EXPLORER_APP_ROOT=${EXPLORER_APP_ROOT:-dist}
# ENV ${EXPLORER_APP_ROOT}/main.js name - hyperledger-explorer
ENTRYPOINT ["tail", "-f", "/dev/null"]
There are two groups of required steps for this setup. The one I tested is:
1. Create a K8s cluster
2. Connect your cluster with the cloud shell
3. Clone this repository
git clone https://github.com/acloudfan/HLF-K8s-Cloud.git
4. Set up the storage class
cd HLF-K8s-Cloud/gcp
kubectl apply -f .
This will set up the storage class.
5. Launch the Acme Orderer
cd ..
kubectl apply -f ./k8s-acme-orderer.yaml
Check the logs for 'acme-orderer-0' to ensure there is no error.
6. Launch the Acme Peer
kubectl apply -f ./k8s-acme-peer.yaml
Check the logs for 'acme-peer-0' to ensure there is no error.
7. Set up the channel & join the acme peer to it
kubectl exec -it acme-peer-0 /bin/bash
./submit-channel-create.sh
./join-channel.sh
Ensure that the peer has joined the channel:
peer channel list
exit
8. Launch the Budget Peer and join it to the channel
kubectl apply -f ./k8s-budget-peer.yaml
Wait for the container to launch & check the logs for errors.
kubectl exec -it budget-peer-0 /bin/bash
./fetch-channel-block.sh
./join-channel.sh
Ensure that the peer has joined the channel:
peer channel list
exit
** At this point your K8s Fabric Network is up **
Validate the network
1. Install & instantiate the test chaincode
kubectl exec -it acme-peer-0 /bin/bash
./cc-test.sh install
./cc-test.sh instantiate
2. Invoke | query the chaincode to see the changes in the values of a/b
./cc-test.sh query
./cc-test.sh invoke
3. Check the values inside the Budget peer
kubectl exec -it budget-peer-0 /bin/bash
./cc-test.sh install
./cc-test.sh query
The query should return the same values as you see in acme-peer. Execute invoke/query in both peers to validate.
Plus, you can visit the following threads to see option 2 and more references on the proper steps to set up your environment: Production Network with GKE, HLF-K8s-Cloud, Hyperledger Fabric blockchain deployment on Google Kubernetes Engine, and hyperledger/fabric-peer.
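As a side note, if you prefer to hand-write the Explorer piece instead of converting it with Kompose, a minimal Deployment/Service sketch could look like the following. This assumes the database runs as its own service reachable as explorerdb and that config.json comes from a ConfigMap named explorer-config (these names are illustrative, not taken from the repositories above); the connection profile and crypto material would similarly be mounted from ConfigMaps/Secrets rather than host paths:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: explorer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: explorer
  template:
    metadata:
      labels:
        app: explorer
    spec:
      containers:
        - name: explorer
          image: hyperledger/explorer:latest
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_HOST
              value: explorerdb           # Service name of the explorer-db instance
            - name: DATABASE_DATABASE
              value: fabricexplorer
            - name: DATABASE_USERNAME
              value: hppoc
            - name: DATABASE_PASSWD
              value: password
            - name: DISCOVERY_AS_LOCALHOST
              value: "false"
          volumeMounts:
            - name: explorer-config
              mountPath: /opt/explorer/app/platform/fabric/config.json
              subPath: config.json
      volumes:
        - name: explorer-config
          configMap:
            name: explorer-config
---
apiVersion: v1
kind: Service
metadata:
  name: explorer
spec:
  selector:
    app: explorer
  ports:
    - port: 8080
      targetPort: 8080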

How to access dockerized app under test in gitlab CI

I have a TestNG project with Selenium for integration testing of a frontend app in Vue.js and a Spring Boot backend. So in order to run the tests I first need to bring up all dependent projects:
Spring Boot backend and MongoDB
Vue.js frontend app
Each project is in its own repo.
So I have created Docker images of the Spring Boot and frontend apps and will put them in the GitLab container registry.
Then in the TestNG project I plan to use docker-compose in .gitlab-ci.yml. Here is the docker-compose.yml for the TestNG project:
version: '3.7'

services:
  frontendapp:
    image: demo.app-frontend-selenium
    container_name: frontend-app-selenium
    depends_on:
      - demoapi
    ports:
      - 8080:80

  demoapi:
    image: demo.app-backend-selenium
    container_name: demo-api-selenium
    depends_on:
      - mongodb
    environment:
      - SPRING_PROFILES_ACTIVE=prod
      - SCOUNT_API_ENDPOINTS_WEB_CORS_OPTIONS_ALLOWEDORIGINS=*
      - SPRING_DATA_MONGODB_HOST=mongodb
      - SPRING_DATA_MONGODB_DATABASE=demo-api-selenium
      - KEYCLOAK_AUTH-SERVER-URL=https://my-keycloak-url/auth
    ports:
      - 8082:80

  mongodb:
    image: mongo:4-bionic
    container_name: mongodb-selenium
    environment:
      MONGO_INITDB_DATABASE: demo-api-selenium
    ports:
      - 27017:27017
    volumes:
      - ./mongo-init.js:/docker-entrypoint-initdb.d/mongo-init.js:ro
After running docker-compose in gitlab-ci.yml, what will be the URL of the frontend app in order to execute the tests?
When I do it locally I am using the following URLs for testing:
frontend app: http://localhost:8080
api: http://localhost:8082
But when running on GitLab CI, what will be the URLs to access the frontend and the API?
TL;DR: instead of using localhost you need to use the hostname of your docker daemon (docker:dind) service. If you set up docker-in-docker for your GitLab job per the usual setup, this is most likely docker.
So the URLs you need to use, according to your compose file, are:
frontend app: http://docker:8080
api: http://docker:8082
my_job:
  services:
    - name: docker:dind
      alias: docker # this is the hostname of the daemon
  variables:
    DOCKER_TLS_CERTDIR: ""
    DOCKER_HOST: "tcp://docker:2375"
  image: docker:stable
  script:
    - docker run -d -p 8000:80 strm/helloworld-http
    - apk update && apk add curl # install curl and let server start
    - curl http://docker:8000 # use the daemon to reach your containers
For a full explanation of this, read on.
Docker port mapping in GitLab CI vs locally
How it works locally
Normally, when you use docker-compose locally on your system, you are typically running the docker daemon on your localhost (e.g. using Docker Desktop).
When you provide a port mapping like 8080:80, it means: publish port 8080 on the daemon host, bound to port 80 in the container. When running locally, that means you can reach the container via localhost.
In GitLab
However, when you're running docker-in-docker on GitLab CI, the important difference is that the docker daemon is remote. So, when you expose ports through the docker API, the ports are exposed on the docker daemon host, not locally in your job container.
Hence, you must use the hostname of the docker daemon, not localhost, to reach your started containers.
Alternative solutions
An alternative to this would be to conduct your testing inside the same docker network that you create with your compose stack. That way, your testing is agnostic of where the docker environment lives and can, for example, leverage the service aliases in your compose file (like frontendapp, demoapi, etc) instead of relying on published ports.
For example, you may choose to add a test container to your compose stack; a sketch of that is shown below. Some testing libraries like Testcontainers can help set this up, too.
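A hypothetical test service added to the same compose file could then reach the other services by their service names on container ports, with no published ports needed (the image name and variables here are placeholders):

  tests:
    image: my-testng-runner:latest
    depends_on:
      - frontendapp
      - demoapi
    environment:
      - FRONTEND_URL=http://frontendapp:80
      - API_URL=http://demoapi:80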

How to run schema scripts after running couchbase via docker compose?

I have a schema script /data/cb-create.sh that I have made available on a container volume. When I run docker-compose up, my server is not initialized at the time the command is executed, so those commands fail because the server isn't launched just yet. I do not see the Starting Couchbase Server -- Web UI available at http://<ip>:8091 log line while the .sh script runs to initialize the schema. This is my docker-compose file. How can I sequence it properly?
version: '3'

services:
  couchbase:
    image: couchbase:community-6.0.0
    deploy:
      replicas: 1
    ports:
      - 8091:8091
      - 8092:8092
      - 8093:8093
      - 8094:8094
      - 11210:11210
    volumes:
      - ./:/data
    command: /bin/bash -c "/data/cb-create.sh"
    container_name: couchbase

volumes:
  kafka-data:
First: you should choose either an entrypoint or a command statement.
One option is to write a small bash script where you put these commands in order (start the server, wait for it to become ready, then run the schema script).
Then in command you specify running that bash script - a sketch is shown below.
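A sketch of such a wrapper, assuming the stock couchbase image starts the server via /entrypoint.sh couchbase-server (worth verifying against the exact image you use):

#!/bin/bash
# Start Couchbase Server in the background using the image's own entrypoint
/entrypoint.sh couchbase-server &

# Wait until port 8091 accepts connections (pure-bash check, no curl required)
until bash -c 'exec 3<>/dev/tcp/localhost/8091' 2>/dev/null; do
  echo "Waiting for Couchbase Server to start..."
  sleep 5
done

# Run the schema/initialization script, then keep the server process in the foreground
/data/cb-create.sh
wait

The command: line in the compose file would then point at this wrapper instead of calling /data/cb-create.sh directly.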

Ambassador API Gateway doesn't pickup services

I'm a new Ambassador user here. I have walked through the tutorial in an effort to understand how to use the Ambassador gateway. I am attempting to run this locally via Docker Compose until it's ready for deployment to K8s in production.
My use case is that all HTTP traffic comes in on port 80 and is then directed to the appropriate service. Is it considered best practice to have a docker-compose.yaml file in the working directory that refers to services in the /config directory? I ask because this doesn't appear to actually pick up my files (the Postgres startup doesn't show in the console). And when I run docker ps I only see:
CONTAINER ID IMAGE PORTS NAMES
8bc8393ac04c 05a916199684 k8s_statsd_ambassador-8564bfb874-q97l9_default_e775d686-a93c-11e8-9caa-025000000001_0
1c00f2341caf d7cf7cf837f9 k8s_ambassador_ambassador-8564bfb874-q97l9_default_e775d686-a93c-11e8-9caa-025000000001_0
fe20c4819514 05a916199684 k8s_statsd_ambassador-8564bfb874-xzvkl_default_e775ffe6-a93c-11e8-9caa-025000000001_0
ba6415b028ba d7cf7cf837f9 k8s_ambassador_ambassador-8564bfb874-xzvkl_default_e775ffe6-a93c-11e8-9caa-025000000001_0
9df07dc5083d 05a916199684 k8s_statsd_ambassador-8564bfb874-w5vsq_default_e773ed53-a93c-11e8-9caa-025000000001_0
682e1f9902a0 d7cf7cf837f9 k8s_ambassador_ambassador-8564bfb874-w5vsq_default_e773ed53-a93c-11e8-9caa-025000000001_0
bb6d2f749491 quay.io/datawire/ambassador:0.40.2 0.0.0.0:80->80/tcp apigateway_ambassador_1
I have a docker-compose.yaml:
version: '3.1'

# Define the services/containers to be run
services:
  ambassador:
    image: quay.io/datawire/ambassador:0.40.2
    ports:
      - 80:80
    volumes:
      # mount a volume where we can inject configuration files
      - ./config:/ambassador/config

  postgres:
    image: my-postgresql
    ports:
      - '5432:5432'
and in /config/mapping-postgres.yaml:
---
apiVersion: ambassador/v0
kind: Mapping
name: postgres_mapping
rewrite: ""
service: postgres:5432
volumes:
  - ../my-postgres:/docker-entrypoint-initdb.d
environment:
  - POSTGRES_MULTIPLE_DATABASES=db1, db2, db3
  - POSTGRES_USER=<>
  - POSTGRES_PASSWORD=<>
volumes and environment are not valid configs for Ambassador Mappings. Ambassador lets you proxy to Postgres, but the authentication has to be handled by your application.
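If the intent of those keys was to configure the Postgres container itself, they belong in the postgres service of the docker-compose.yaml instead - roughly like this, reusing the image name and placeholder credentials from the question:

  postgres:
    image: my-postgresql
    ports:
      - '5432:5432'
    volumes:
      - ../my-postgres:/docker-entrypoint-initdb.d
    environment:
      - POSTGRES_MULTIPLE_DATABASES=db1, db2, db3
      - POSTGRES_USER=<>
      - POSTGRES_PASSWORD=<>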
Having said that, it looks like your Postgres container is not starting. (Perhaps because it needs an initial config). You can check for errors with:
$ docker ps -a | grep postgres
$ docker logs <container-id-from-previous-step>
You can also check a postgres docker compose example here.
Is it considered best practice to have a docker-compose.yaml file in the working directory that refers to services in the /config directory?
It's pretty standard, but you can use any directory you like for this.

docker-compose: postgres data not persisting

I have a main service in my docker-compose file that uses the postgres image and, though I seem to be successfully connecting to the database, the data that I'm writing to it is not being kept beyond the lifetime of the container (what I did is based on this tutorial).
Here's my docker-compose file:
main:
  build: .
  volumes:
    - .:/code
  links:
    - postgresdb
  command: python manage.py insert_into_database
  environment:
    - DEBUG=true

postgresdb:
  build: utils/sql/
  volumes_from:
    - postgresdbdata
  ports:
    - "5432"
  environment:
    - DEBUG=true

postgresdbdata:
  build: utils/sql/
  volumes:
    - /var/lib/postgresql
  command: true
  environment:
    - DEBUG=true
and here's the Dockerfile I'm using for the postgresdb and postgresdbdata services (which essentially creates the database and adds a user):
FROM postgres
ADD make-db.sh /docker-entrypoint-initdb.d/
How can I get the data to stay after the main service has finished running, in order to be able to use it in the future (such as when I call something like python manage.py retrieve_from_database)? Is /var/lib/postgresql even the right directory, and would boot2docker have access to it given that it's apparently limited to /Users/?
Thank you!
The problem is that Compose creates a new version of the postgresdbdata container each time it restarts, so the old container and its data get lost.
A secondary issue is that your data container shouldn't actually be running; data containers are really just a namespace for a volume that can be imported with --volumes-from, which still works with stopped containers.
For the time being the best solution is to take the postgresdbdata container out of the Compose config. Do something like:
$ docker run --name postgresdbdata postgresdb echo "Postgres data container"
Postgres data container
The echo command will run and the container will exit, but as long as you don't docker rm it, you will still be able to refer to it with --volumes-from and your Compose application should work fine.
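A quick way to confirm the data is still there is to mount the same volumes into a throwaway container and list the Postgres data directory (a sketch using the stock postgres image):

$ docker run --rm --volumes-from postgresdbdata postgres ls -la /var/lib/postgresql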