I am new to Spark(3.0.0_preview) and Scala(SBT). I have written a spark streaming job that I can run successfully on my local from my IDE
Now, I a looking for a way to dockerize the code so that I can run it with my docker-compose that builds the spark cluster
My docker-compose:
version: "3.3"
services:
spark-master:
image: rd/spark:latest
container_name: spark-master
hostname: spark-master
ports:
- "8080:8080"
- "7077:7077"
networks:
- spark-network
environment:
- "SPARK_LOCAL_IP=spark-master"
- "SPARK_MASTER_PORT=7077"
- "SPARK_MASTER_WEBUI_PORT=8080"
command: "/start-master.sh"
spark-worker:
image: rd/spark:latest
depends_on:
- spark-master
ports:
- 8080
networks:
- spark-network
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "SPARK_WORKER_WEBUI_PORT=8080"
command: "/start-worker.sh"
networks:
spark-network:
driver: bridge
ipam:
driver: default
Docker Files:
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
RUN wget http://apache.mirror.anlx.net/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && \
mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && \
rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
This seems a simple request but I am having a hard time finding good documentation on it.
Got it working with the following:
project-structure:
project
src
build.sbt
Dockerfile
Dockerfile-app
docker-compose.yml
docker-compose.yml
version: "3.3"
services:
spark-scala-env:
image: app/spark-scala-env:latest
build:
context: .
dockerfile: Dockerfile
app-spark-scala:
image: app/app-spark-scala:latest
build:
context: .
dockerfile: Dockerfile-app
spark-master:
image: app/app-spark-scala:latest
container_name: spark-master
hostname: localhost
ports:
- "8080:8080"
- "7077:7077"
networks:
- spark-network
environment:
- "SPARK_LOCAL_IP=spark-master"
- "SPARK_MASTER_PORT=7077"
- "SPARK_MASTER_WEBUI_PORT=8080"
command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip $${SPARK_LOCAL_IP} --port $${SPARK_MASTER_PORT} --webui-port $${SPARK_MASTER_WEBUI_PORT}"]
spark-worker:
image: app/app-spark-scala:latest
hostname: localhost
depends_on:
- spark-master
ports:
- 8080
networks:
- spark-network
#network_mode: host
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "SPARK_WORKER_WEBUI_PORT=8080"
- "SPARK_WORKER_CORES=2"
command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port $${SPARK_WORKER_WEBUI_PORT} $${SPARK_MASTER}"]
app-submit-job:
image: app/app-spark-scala:latest
ports:
- "4040:4040"
environment:
- "SPARK_APPLICATION_MAIN_CLASS=com.app.spark.TestAssembly"
- "SPARK_MASTER=spark://spark-master:7077"
- "APP_PACKAGES=org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0-preview2"
- "APP_JAR_LOC=/app/target/scala-2.12/app_spark_scala-assembly-0.2.jar"
hostname: localhost
networks:
- spark-network
volumes:
- ./appdata:/appdata
command: ["sh", "-c", "/spark/bin/spark-submit --packages $${APP_PACKAGES} --class $${SPARK_APPLICATION_MAIN_CLASS} --master $${SPARK_MASTER} $${APP_JAR_LOC}"]
networks:
spark-network:
driver: bridge
ipam:
driver: default
Dockerfile
FROM openjdk:8-alpine
ARG SPARK_VERSION
ARG HADOOP_VERSION
ARG SCALA_VERSION
ARG SBT_VERSION
ENV SPARK_VERSION=${SPARK_VERSION:-3.0.0-preview2}
ENV HADOOP_VERSION=${HADOOP_VERSION:-2.7}
ENV SCALA_VERSION ${SCALA_VERSION:-2.12.8}
ENV SBT_VERSION ${SBT_VERSION:-1.3.4}
RUN apk --update add wget tar bash
RUN \
echo "$SPARK_VERSION $HADOOP_VERSION" && \
echo http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
wget http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} /spark && \
rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN \
echo "$SCALA_VERSION $SBT_VERSION" && \
mkdir -p /usr/lib/jvm/java-1.8-openjdk/jre && \
touch /usr/lib/jvm/java-1.8-openjdk/jre/release && \
apk add -U --no-cache bash curl && \
curl -fsL http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz | tar xfz - -C /usr/local && \
ln -s /usr/local/scala-$SCALA_VERSION/bin/* /usr/local/bin/ && \
scala -version && \
scalac -version
RUN \
curl -fsL https://github.com/sbt/sbt/releases/download/v$SBT_VERSION/sbt-$SBT_VERSION.tgz | tar xfz - -C /usr/local && \
$(mv /usr/local/sbt-launcher-packaging-$SBT_VERSION /usr/local/sbt || true) \
ln -s /usr/local/sbt/bin/* /usr/local/bin/ && \
sbt sbt-version || sbt sbtVersion || true
Dockerfile-app
FROM app/spark-scala-env:latest
WORKDIR /app
# Pre-install base libraries
ADD build.sbt /app/
ADD project/plugins.sbt /app/project/
ADD src/. /app/src/
RUN sbt update
RUN sbt clean assembly
Commands:
Build the docker image with spark and scala env. i.e docker-compose build spark-scala-env
Build the image with the app jar. i.e docker-compose build app-spark-scala
Bring the spark env container i.e docker-compose up -d --scale spark-worker=2 spark-worker
Submit the job via docker-compose up -d app-submit-job
I'm using the certbot/certbot container as in:
docker-compose run -d --rm --entrypoint 'certbot certonly --webroot -w /var/www/certbot --staging --email example#domain.se -d example.com --rsa-key-size 4096 --agree-tos --force-renewal ; sleep 3600' certbot
on the following compose file:
version: '3.5'
services:
nginx:
image: nginx:1.15-alpine
restart: unless-stopped
volumes:
- "~/dev/docker/projects/common/volumes/letsencrypt/nginx:/etc/nginx/conf.d"
- "~/dev/docker/projects/common/volumes/letsencrypt/certbot/conf:/etc/letsencrypt"
- "~/dev/docker/projects/common/volumes/letsencrypt/certbot/www:/var/www/certbot"
- "~/dev/docker/projects/common/volumes/letsencrypt/nginx:/var/www/nginx"
- "~/dev/docker/projects/common/volumes/logs:/var/log/nginx"
ports:
- "80:80"
- "443:443"
command: "/bin/sh -c 'while :; do sleep 6h & wait $${!}; nginx -s reload; done & nginx -g \"daemon off;\"'"
certbot:
image: certbot/certbot
restart: unless-stopped
volumes:
- "~/dev/docker/projects/common/volumes/letsencrypt/certbot/conf:/etc/letsencrypt"
- "~/dev/docker/projects/common/volumes/letsencrypt/certbot/www:/var/www/certbot"
- "~/dev/docker/projects/common/volumes/logs:/var/log/letsencrypt"
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
But it ignores the sleep command and the container goes away.
Whereas running the following:
docker-compose run -d --rm --entrypoint 'sleep 3600' certbot
keeps the container up and running.
I would like to keep the container up and running after the certbot failed.
You could move "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'" into dedicated script for example start.sh.
Mount it with docker-compose volumes
volumes:
- "./start.sh:/start.sh
entrypoint: /start.sh
I'm trying to run a bare bones version of Hyperledger Sawtooth using Docker CE on a Mac. The docker-compose.yaml has containers running the base images from Sawtooth.
I'm unable to access the Sawtooth REST API from the host machine even though there are ports published for it when I run docker ps. The docker-compose file has worked on other Macs running Docker CE so I'm suspecting it may be a configuration or setup issue.
The contents of the docker-compose.yaml are below:
version: '2.1'
services:
settings-tp:
image: 'hyperledger/sawtooth-settings-tp:1.1.3'
container_name: sawtooth-settings-tp
depends_on:
- validator
entrypoint: settings-tp --connect tcp://validator:4004
identity-tp:
image: 'hyperledger/sawtooth-identity-tp:1.1.3'
container_name: sawtooth-identity-tp
depends_on:
- validator
entrypoint: identity-tp -vv --connect tcp://validator:4004
rest-api:
image: 'hyperledger/sawtooth-rest-api:1.1.3'
container_name: sawtooth-rest-api
ports:
- '8008:8008'
depends_on:
- validator
entrypoint: sawtooth-rest-api --connect tcp://validator:4004 --bind rest-api:8008
validator:
image: 'hyperledger/sawtooth-validator:1.1.3'
container_name: sawtooth-validator
ports:
- '4004:4004'
command: |
bash -c "
if [ ! -f /etc/sawtooth/keys/validator.priv ]; then
sawadm keygen
sawtooth keygen my_key
sawset genesis -k /root/.sawtooth/keys/my_key.priv
sawset proposal create \
-k /root/.sawtooth/keys/my_key.priv \
sawtooth.consensus.algorithm.name=Devmode \
sawtooth.consensus.algorithm.version=0.1 \
-o config.batch && \
sawadm genesis config-genesis.batch config.batch
fi;
sawtooth-validator -vvv \
--endpoint tcp://validator:8800 \
--bind component:tcp://eth0:4004 \
--bind network:tcp://eth0:8800 \
--bind consensus:tcp://eth0:5050 \
"
devmode-engine:
image: 'hyperledger/sawtooth-devmode-engine-rust:1.1.3'
container_name: sawtooth-devmode-engine-rust-default
depends_on:
- validator
entrypoint: devmode-engine-rust -C tcp://validator:5050
If you cannot access the port from the host, the container must not be running correctly. Look for error messages for that container when starting docker-compose
What does docker ps -a show?
Can you connect to the port? Try something like telnet localhost 8008
I've in Dockerfile service which depends on another service, but I'd like to negate the condition when not service_healthy. So opposite of the following:
service1:
depends_on:
service2:
condition: service_healthy
So basically I'd like to start service1 when service2 is not healthy.
Secondly, based on the documentation for depends_on, the condition option has been removed and it is no longer supported in version 3 of Compose file format.
So how the above logic can be achieved?
Here is the workaround where main container waits for other hosts to exit, by pinging the other hosts and waiting when both are off-line:
version: '3'
services:
main:
image: bash
depends_on:
- test01
- test02
command: bash -c "sleep 2 && until ! ping -qc1 test01 && ! ping -qc1 test02; do sleep 1; done &>/dev/null"
networks:
intra:
ipv4_address: 172.10.0.254
test01:
image: bash
hostname: test01
command: bash -c "ip route && sleep 10"
networks:
intra:
ipv4_address: 172.10.0.11
test02:
image: bash
hostname: test02
command: bash -c "ip route && sleep 20"
networks:
intra:
ipv4_address: 172.10.0.12
networks:
intra:
driver: bridge
ipam:
config:
- subnet: 172.10.0.0/24
See also: Docker compose - Start service only when other service had completed
I am using postgresql with django in my project. I've got them in different containers and the problem is that i need to wait for postgres before running django. At this time i am doing it with sleep 5 in command.sh file for django container. I also found that netcat can do the trick but I would prefer way without additional packages. curl and wget can't do this because they do not support postgres protocol.
Is there a way to do it?
I've spent some hours investigating this problem and I got a solution.
Docker depends_on just consider service startup to run another service. Than it happens because as soon as db is started, service-app tries to connect to ur db, but it's not ready to receive connections. So you can check db health status in app service to wait for connection. Here is my solution, it solved my problem. :)
Important: I'm using docker-compose version 2.1.
version: '2.1'
services:
my-app:
build: .
command: su -c "python manage.py runserver 0.0.0.0:8000"
ports:
- "8000:8000"
depends_on:
db:
condition: service_healthy
links:
- db
volumes:
- .:/app_directory
db:
image: postgres:10.5
ports:
- "5432:5432"
volumes:
- database:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
volumes:
database:
In this case it's not necessary to create a .sh file.
This will successfully wait for Postgres to start. (Specifically line 6). Just replace npm start with whatever command you'd like to happen after Postgres has started.
services:
practice_docker:
image: dockerhubusername/practice_docker
ports:
- 80:3000
command: bash -c 'while !</dev/tcp/db/5432; do sleep 1; done; npm start'
depends_on:
- db
environment:
- DATABASE_URL=postgres://postgres:password#db:5432/practicedocker
- PORT=3000
db:
image: postgres
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=password
- POSTGRES_DB=practicedocker
If you have psql you could simply add the following code to your .sh file:
RETRIES=5
until psql -h $PG_HOST -U $PG_USER -d $PG_DATABASE -c "select 1" > /dev/null 2>&1 || [ $RETRIES -eq 0 ]; do
echo "Waiting for postgres server, $((RETRIES--)) remaining attempts..."
sleep 1
done
The simplest solution is a short bash script:
while ! nc -z HOST PORT; do sleep 1; done;
./run-smth-else;
Problem with your solution tiziano is that curl is not installed by default and i wanted to avoid installing additional stuff. Anyway i did what bereal said. Here is the script if anyone would need it.
import socket
import time
import os
port = int(os.environ["DB_PORT"]) # 5432
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
while True:
try:
s.connect(('myproject-db', port))
s.close()
break
except socket.error as ex:
time.sleep(0.1)
In your Dockerfile add wait and change your start command to use it:
ADD https://github.com/ufoscout/docker-compose-wait/releases/download/2.7.3/wait /wait
RUN chmod +x /wait
CMD /wait && npm start
Then, in your docker-compose.yml add a WAIT_HOSTS environment variable for your api service:
services:
api:
depends_on:
- postgres
environment:
- WAIT_HOSTS: postgres:5432
postgres:
image: postgres
ports:
- "5432:5432"
This has the advantage that it supports waiting for multiple services:
environment:
- WAIT_HOSTS: postgres:5432, mysql:3306, mongo:27017
For more details, please read their documentation.
wait-for-it small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
can be cloned in Dockerfile by below command
RUN git clone https://github.com/vishnubob/wait-for-it.git
docker-compose.yml
version: "2"
services:
web:
build: .
ports:
- "80:8000"
depends_on:
- "db"
command: ["./wait-for-it/wait-for-it.sh", "db:5432", "--", "npm", "start"]
db:
image: postgres
Why not curl?
Something like this:
while ! curl http://$POSTGRES_PORT_5432_TCP_ADDR:$POSTGRES_PORT_5432_TCP_PORT/ 2>&1 | grep '52'
do
sleep 1
done
It works for me.
I have managed to solve my issue by adding health check to docker-compose definition.
db:
image: postgres:latest
ports:
- 5432:5432
healthcheck:
test: "pg_isready --username=postgres && psql --username=postgres --list"
timeout: 10s
retries: 20
then in the dependent service you can check the health status:
my-service:
image: myApp:latest
depends_on:
kafka:
condition: service_started
db:
condition: service_healthy
source: https://docs.docker.com/compose/compose-file/compose-file-v2/#healthcheck
If the backend application itself has a PostgreSQL client, you can use the pg_isready command in an until loop. For example, suppose we have the following project directory structure,
.
├── backend
│ └── Dockerfile
└── docker-compose.yml
with a docker-compose.yml
version: "3"
services:
postgres:
image: postgres
backend:
build: ./backend
and a backend/Dockerfile
FROM alpine
RUN apk update && apk add postgresql-client
CMD until pg_isready --username=postgres --host=postgres; do sleep 1; done \
&& psql --username=postgres --host=postgres --list
where the 'actual' command is just a psql --list for illustration. Then running docker-compose build and docker-compose up will give you the following output:
Note how the result of the psql --list command only appears after pg_isready logs postgres:5432 - accepting connections as desired.
By contrast, I have found that the nc -z approach does not work consistently. For example, if I replace the backend/Dockerfile with
FROM alpine
RUN apk update && apk add postgresql-client
CMD until nc -z postgres 5432; do echo "Waiting for Postgres..." && sleep 1; done \
&& psql --username=postgres --host=postgres --list
then docker-compose build followed by docker-compose up gives me the following result:
That is, the psql command throws a FATAL error that the database system is starting up.
In short, using an until pg_isready loop (as also recommended here) is the preferable approach IMO.
There are couple of solutions as other answers mentioned.
But don't make it complicated, just let it fail-fast combined with restart: on-failure. Your service will open connection to the db and may fail at the first time. Just let it fail. Docker will restart your service until it green. Keep your service simple and business-focused.
version: '3.7'
services:
postgresdb:
hostname: postgresdb
image: postgres:12.2
ports:
- "5432:5432"
environment:
- POSTGRES_USER=user
- POSTGRES_PASSWORD=secret
- POSTGRES_DB=Ceo
migrate:
image: hanh/migration
links:
- postgresdb
environment:
- DATA_SOURCE=postgres://user:secret#postgresdb:5432/Ceo
command: migrate sql --yes
restart: on-failure # will restart until it's success
Check out restart policies.
None of other solution worked, except for the following:
version : '3.8'
services :
postgres :
image : postgres:latest
environment :
- POSTGRES_DB=mydbname
- POSTGRES_USER=myusername
- POSTGRES_PASSWORD=mypassword
healthcheck :
test: [ "CMD", "pg_isready", "-q", "-d", "mydbname", "-U", "myusername" ]
interval : 5s
timeout : 5s
retries : 5
otherservice:
image: otherserviceimage
depends_on :
postgres:
condition: service_healthy
Thanks to this thread: https://github.com/peter-evans/docker-compose-healthcheck/issues/16
Sleeping until pg_isready returns true unfortunately is not always reliable. If your postgres container has at least one initdb script specified, postgres restarts after it is started during it's bootstrap procedure, and so it might not be ready yet even though pg_isready already returned true.
What you can do instead, is to wait until docker logs for that instance return a PostgreSQL init process complete; ready for start up. string, and only then proceed with the pg_isready check.
Example:
start_postgres() {
docker-compose up -d --no-recreate postgres
}
wait_for_postgres() {
until docker-compose logs | grep -q "PostgreSQL init process complete; ready for start up." \
&& docker-compose exec -T postgres sh -c "PGPASSWORD=\$POSTGRES_PASSWORD PGUSER=\$POSTGRES_USER pg_isready --dbname=\$POSTGRES_DB" > /dev/null 2>&1; do
printf "\rWaiting for postgres container to be available ... "
sleep 1
done
printf "\rWaiting for postgres container to be available ... done\n"
}
start_postgres
wait_for_postgres
You can use the manage.py command "check" to check if the database is available (and wait 2 seconds if not, and check again).
For instance, if you do this in your command.sh file before running the migration, Django has a valid DB connection while running the migration command:
...
echo "Waiting for db.."
python manage.py check --database default > /dev/null 2> /dev/null
until [ $? -eq 0 ];
do
sleep 2
python manage.py check --database default > /dev/null 2> /dev/null
done
echo "Connected."
# Migrate the last database changes
python manage.py migrate
...
PS: I'm not a shell expert, please suggest improvements.
#!/bin/sh
POSTGRES_VERSION=9.6.11
CONTAINER_NAME=my-postgres-container
# start the postgres container
docker run --rm \
--name $CONTAINER_NAME \
-e POSTGRES_PASSWORD=docker \
-d \
-p 5432:5432 \
postgres:$POSTGRES_VERSION
# wait until postgres is ready to accept connections
until docker run \
--rm \
--link $CONTAINER_NAME:pg \
postgres:$POSTGRES_VERSION pg_isready \
-U postgres \
-h pg; do sleep 1; done
An example for Nodejs and Postgres api.
#!/bin/bash
#entrypoint.dev.sh
echo "Waiting for postgres to get up and running..."
while ! nc -z postgres_container 5432; do
# where the postgres_container is the hos, in my case, it is a Docker container.
# You can use localhost for example in case your database is running locally.
echo "waiting for postgress listening..."
sleep 0.1
done
echo "PostgreSQL started"
yarn db:migrate
yarn dev
# Dockerfile
FROM node:12.16.2-alpine
ENV NODE_ENV="development"
RUN mkdir -p /app
WORKDIR /app
COPY ./package.json ./yarn.lock ./
RUN yarn install
COPY . .
CMD ["/bin/sh", "./entrypoint.dev.sh"]
If you want to run it with a single line command. You can just connect to the container and check if postgres is running
docker exec -it $DB_NAME bash -c "\
until psql -h $HOST -U $USER -d $DB_NAME-c 'select 1'>/dev/null 2>&1;\
do\
echo 'Waiting for postgres server....';\
sleep 1;\
done;\
exit;\
"
echo "DB Connected !!"
Inspired by #tiziano answer and the lack of nc or pg_isready, it seems that in a recent docker python image (python:3.9 here) that curl is installed by default and I have the following check running in my entrypoint.sh:
postgres_ready() {
$(which curl) http://$DBHOST:$DBPORT/ 2>&1 | grep '52'
}
until postgres_ready; do
>&2 echo 'Waiting for PostgreSQL to become available...'
sleep 1
done
>&2 echo 'PostgreSQL is available.'
Trying with a lot of methods, Dockerfile, docker compose yaml, bash script. Only last of method help me: with makefile.
docker-compose up --build -d postgres
sleep 2
docker-compose up --build -d app