I'm running Spark 3.1.1 on Kubernetes in cluster mode with the following command:
command:
- /opt/spark/bin/spark-submit
- --deploy-mode
- cluster
- --name
- "my_job_name"
- --master
- k8s://https://kubernetes.default.svc.cluster.local
- --class
- $(CLASS)
- --conf
- spark.kubernetes.namespace=my-namespace
- --conf
- spark.kubernetes.authenticate.driver.serviceAccountName=my-driver-sa
- --conf
- spark.kubernetes.driver.limit.cores=$(DRIVER_LIMIT_CORES)
- --conf
- spark.executor.instances=$(EXECUTOR_INSTANCES)
- --conf
- spark.executor.memory=$(EXECUTOR_MEMORY)
- --conf
- spark.executor.cores=$(EXECUTOR_CORES)
- --conf
- spark.kubernetes.executor.limit.cores=$(EXECUTOR_CORES)
- --conf
- spark.kubernetes.container.image=$(CONTAINER_REGISTRY)/$(IMAGE_REPOSITORY):$(TAG)
- --conf
- spark.jars.ivy=/tmp/.ivy
- --conf
- spark.eventLog.enabled=true
- --conf
- spark.eventLog.dir=$(EVENT_LOG_DIR)
- --conf
- spark.eventLog.rolling.enabled=true
- --conf
- spark.hadoop.fs.s3a.aws.credentials.provider=$(CREDS_PROVIDER)
- --conf
- spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
- local://$(APPLICATION_JAR)
- -s
- $(APP_ARG_1)
- -d
- $(APP_ARG_2)
Everything seems to work: the driver pod starts, the executors start, do some work, and terminate, and the last log line I see in the driver is:
21/05/24 11:29:46 INFO SparkContext: Successfully stopped SparkContext
My problem is that the driver pod stays in the Running state and does not terminate after the job is done.
What am I missing?
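For reference, a minimal way to check whether the driver container has actually exited or is still running (assuming kubectl access to the namespace used above; spark-role=driver is the label Spark on Kubernetes assigns to driver pods):
kubectl -n my-namespace get pods -l spark-role=driver
kubectl -n my-namespace describe pod -l spark-role=driver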
I'm trying to start Locust using docker-compose on a MacBook M1.
the issue:
it starts, but there are no workers
expected behaviour:
it should have one worker
logs:
no error logs
code to reproduce it:
version: '3.3'
services:
  master_locust:
    image: locustio/locust:master
    ports:
      - "8089:8089"
    volumes:
      - ./backend:/mnt/locust
    command: -f /mnt/locust/locustfile.py --master
    depends_on:
      - worker_locust
  worker_locust:
    image: locustio/locust:master
    volumes:
      - ./backend:/mnt/locust
    command: -f /mnt/locust/locustfile.py --worker --master-host=master_locust
commands:
docker-compose up master_locust
docker-compose up --scale worker_locust=4 master_locust
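To see whether the worker container is actually created and what it logs, something like this can be run against the same compose file (a diagnostic sketch, not part of the original setup):
docker-compose ps
docker-compose logs worker_locust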
I am able to run Apache Pulsar using this docker command:
docker run -it \
-p 6650:6650 \
-p 8080:8080 \
--mount source=pulsardata,target=/pulsar/data \
--mount source=pulsarconf,target=/pulsar/conf \
apachepulsar/pulsar:2.6.0 \
bin/pulsar standalone
I am trying to convert this to docker-compose and I use the docker-compose.yml file below. When I run the command:
docker-compose up
I get the error:
Attaching to pulsar
pulsar | Error: Could not find or load main class "
pulsar exited with code 1
What am I doing wrong here?
Thanks in advance.
version: '3.1'
services:
  standalone:
    image: apachepulsar/pulsar:2.6.0
    container_name: pulsar
    ports:
      - 8080:8080
      - 6650:6650
    environment:
      - PULSAR_MEM=" -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    volumes:
      - pulsardata:/pulsar/data
      - pulsarconf:/pulsar/conf
    command: /bin/bash -c "bin/pulsar standalone"
volumes:
  pulsardata:
  pulsarconf:
The issue is with the environment variable. With the list syntax used in the question (- PULSAR_MEM=" ..."), the surrounding double quotes become part of the variable's value and break Pulsar's startup command. It should work if you specify it as a mapping instead:
version: '3.1'
services:
  standalone:
    image: apachepulsar/pulsar:2.6.0
    ports:
      - 8080:8080
      - 6650:6650
    environment:
      PULSAR_MEM: " -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    command: bin/pulsar standalone
    # ... other parameters
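If in doubt, the value that actually reaches the container can be inspected once the service is up (a quick check, assuming the service name standalone from above):
docker-compose exec standalone env | grep PULSAR_MEM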
When trying to translate the following 2 docker commands into a docker-compose.yml using Compose version 3
docker run \
--name timescaledb \
--network timescaledb-net \
-e POSTGRES_PASSWORD=insecure \
-e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
timescale/timescaledb:latest-pg11 postgres \
-cwal_level=archive \
-carchive_mode=on \
-carchive_command="/usr/bin/wget wale/wal-push/%f -O -" \
-carchive_timeout=600 \
-ccheckpoint_timeout=700 \
-cmax_wal_senders=1
and
docker run \
--name wale \
--network timescaledb-net \
--volumes-from timescaledb \
-v ./backups:/backups \
-e WALE_LOG_DESTINATION=stderr \
-e PGWAL=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
-e PGHOST=timescaledb \
-e PGPASSWORD=insecure \
-e PGUSER=postgres \
-e WALE_FILE_PREFIX=file://localhost/backups \
timescale/timescaledb-wale:latest
we get the following error when running docker-compose up:
ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services.wale: 'volumes_from'
How can we translate the 2 Docker commands correctly to use Compose version 3? We will need to be able to specify the location of the volumes on the host (i.e. ./timescaledb).
Using Mac OS X 10.15.3, Docker 19.03.8, Docker Compose 1.25.4
docker-compose.yml
version: '3.3'
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg11
    container_name: timescaledb
    ports:
      - 5432:5432
    environment:
      - POSTGRES_PASSWORD=insecure
      - POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal
      - PGDATA=/var/lib/postgresql/data/pg_data
    command: -cwal_level=archive -carchive_mode=on -carchive_command="/usr/bin/wget wale/wal-push/%f -O -" -carchive_timeout=600 -ccheckpoint_timeout=700 -cmax_wal_senders=1
    volumes:
      - ./timescaledb:/var/lib/postgresql/data
    networks:
      - timescaledb-net
  wale:
    image: timescale/timescaledb-wale:latest
    container_name: wale
    environment:
      - WALE_LOG_DESTINATION=stderr
      - PGWAL=/var/lib/postgresql/data/pg_wal
      - PGDATA=/var/lib/postgresql/data/pg_data
      - PGHOST=timescaledb
      - PGPASSWORD=insecure
      - PGUSER=postgres
      - WALE_FILE_PREFIX=file://localhost/backups
    volumes_from:
      - tsdb
    volumes:
      - ./backups:/backups
    networks:
      - timescaledb-net
    depends_on:
      - timescaledb
networks:
  timescaledb-net:
In the timescaledb container you are actually mounting /var/lib/postgresql/data to ./timescaledb on the host, so if you want to share the same data with the wale container, you can edit its volumes like this:
...
volumes:
- ./backups:/backups
- ./timescaledb:/var/lib/postgresql/data
...
This way, both containers can read and write the same directory mounted from your local machine.
Also, remember to remove this part, since volumes_from is not supported in Compose file format version 3:
volumes_from:
- tsdb
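Putting both changes together, the wale service from the question would look roughly like this (a sketch reusing the same image, environment, and paths as above, not a tested file):
wale:
  image: timescale/timescaledb-wale:latest
  container_name: wale
  environment:
    - WALE_LOG_DESTINATION=stderr
    - PGWAL=/var/lib/postgresql/data/pg_wal
    - PGDATA=/var/lib/postgresql/data/pg_data
    - PGHOST=timescaledb
    - PGPASSWORD=insecure
    - PGUSER=postgres
    - WALE_FILE_PREFIX=file://localhost/backups
  volumes:
    - ./backups:/backups
    - ./timescaledb:/var/lib/postgresql/data
  networks:
    - timescaledb-net
  depends_on:
    - timescaledb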
I am new to Spark (3.0.0-preview) and Scala (SBT). I have written a Spark Streaming job that I can run successfully locally from my IDE.
Now I am looking for a way to dockerize the code so that I can run it with my docker-compose setup that builds the Spark cluster.
My docker-compose:
version: "3.3"
services:
  spark-master:
    image: rd/spark:latest
    container_name: spark-master
    hostname: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    networks:
      - spark-network
    environment:
      - "SPARK_LOCAL_IP=spark-master"
      - "SPARK_MASTER_PORT=7077"
      - "SPARK_MASTER_WEBUI_PORT=8080"
    command: "/start-master.sh"
  spark-worker:
    image: rd/spark:latest
    depends_on:
      - spark-master
    ports:
      - 8080
    networks:
      - spark-network
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
      - "SPARK_WORKER_WEBUI_PORT=8080"
    command: "/start-worker.sh"
networks:
  spark-network:
    driver: bridge
    ipam:
      driver: default
Dockerfile:
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
RUN wget http://apache.mirror.anlx.net/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && \
mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && \
rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
This seems like a simple request, but I am having a hard time finding good documentation on it.
Got it working with the following:
project-structure:
project
src
build.sbt
Dockerfile
Dockerfile-app
docker-compose.yml
docker-compose.yml
version: "3.3"
services:
  spark-scala-env:
    image: app/spark-scala-env:latest
    build:
      context: .
      dockerfile: Dockerfile
  app-spark-scala:
    image: app/app-spark-scala:latest
    build:
      context: .
      dockerfile: Dockerfile-app
  spark-master:
    image: app/app-spark-scala:latest
    container_name: spark-master
    hostname: localhost
    ports:
      - "8080:8080"
      - "7077:7077"
    networks:
      - spark-network
    environment:
      - "SPARK_LOCAL_IP=spark-master"
      - "SPARK_MASTER_PORT=7077"
      - "SPARK_MASTER_WEBUI_PORT=8080"
    command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip $${SPARK_LOCAL_IP} --port $${SPARK_MASTER_PORT} --webui-port $${SPARK_MASTER_WEBUI_PORT}"]
  spark-worker:
    image: app/app-spark-scala:latest
    hostname: localhost
    depends_on:
      - spark-master
    ports:
      - 8080
    networks:
      - spark-network
    # network_mode: host
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
      - "SPARK_WORKER_WEBUI_PORT=8080"
      - "SPARK_WORKER_CORES=2"
    command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port $${SPARK_WORKER_WEBUI_PORT} $${SPARK_MASTER}"]
  app-submit-job:
    image: app/app-spark-scala:latest
    ports:
      - "4040:4040"
    environment:
      - "SPARK_APPLICATION_MAIN_CLASS=com.app.spark.TestAssembly"
      - "SPARK_MASTER=spark://spark-master:7077"
      - "APP_PACKAGES=org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0-preview2"
      - "APP_JAR_LOC=/app/target/scala-2.12/app_spark_scala-assembly-0.2.jar"
    hostname: localhost
    networks:
      - spark-network
    volumes:
      - ./appdata:/appdata
    command: ["sh", "-c", "/spark/bin/spark-submit --packages $${APP_PACKAGES} --class $${SPARK_APPLICATION_MAIN_CLASS} --master $${SPARK_MASTER} $${APP_JAR_LOC}"]
networks:
  spark-network:
    driver: bridge
    ipam:
      driver: default
Dockerfile
FROM openjdk:8-alpine
ARG SPARK_VERSION
ARG HADOOP_VERSION
ARG SCALA_VERSION
ARG SBT_VERSION
ENV SPARK_VERSION=${SPARK_VERSION:-3.0.0-preview2}
ENV HADOOP_VERSION=${HADOOP_VERSION:-2.7}
ENV SCALA_VERSION ${SCALA_VERSION:-2.12.8}
ENV SBT_VERSION ${SBT_VERSION:-1.3.4}
RUN apk --update add wget tar bash
RUN \
echo "$SPARK_VERSION $HADOOP_VERSION" && \
echo http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
wget http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} /spark && \
rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN \
echo "$SCALA_VERSION $SBT_VERSION" && \
mkdir -p /usr/lib/jvm/java-1.8-openjdk/jre && \
touch /usr/lib/jvm/java-1.8-openjdk/jre/release && \
apk add -U --no-cache bash curl && \
curl -fsL http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz | tar xfz - -C /usr/local && \
ln -s /usr/local/scala-$SCALA_VERSION/bin/* /usr/local/bin/ && \
scala -version && \
scalac -version
RUN \
curl -fsL https://github.com/sbt/sbt/releases/download/v$SBT_VERSION/sbt-$SBT_VERSION.tgz | tar xfz - -C /usr/local && \
$(mv /usr/local/sbt-launcher-packaging-$SBT_VERSION /usr/local/sbt || true) \
ln -s /usr/local/sbt/bin/* /usr/local/bin/ && \
sbt sbt-version || sbt sbtVersion || true
Dockerfile-app
FROM app/spark-scala-env:latest
WORKDIR /app
# Pre-install base libraries
ADD build.sbt /app/
ADD project/plugins.sbt /app/project/
ADD src/. /app/src/
RUN sbt update
RUN sbt clean assembly
Commands:
Build the Docker image with the Spark and Scala environment, i.e. docker-compose build spark-scala-env
Build the image with the app jar, i.e. docker-compose build app-spark-scala
Bring up the Spark cluster containers, i.e. docker-compose up -d --scale spark-worker=2 spark-worker
Submit the job via docker-compose up -d app-submit-job
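If the submitted job needs checking, its driver output can be followed from the submit container, and the master's web UI is reachable on localhost:8080 as mapped above (a usage note, not part of the original answer):
docker-compose logs -f app-submit-job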
I am running the same job first using spark-shell and then spark-submit, but spark-submit takes much longer. I am running this on a 16-node cluster (>180 vcores) in client mode.
spark-submit conf:
spark-submit --class tool \
--master yarn \
--deploy-mode client \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.kryo.classesToRegister=com.fastdtw.timeseries.TimeSeriesBase" \
--executor-memory 14g \
--driver-memory 16g \
--conf "spark.driver.maxResultSize=16g" \
--conf "spark.kryoserializer.buffer.max=512" \
--num-executors 30 \
--conf "spark.executor.cores=6" \
/home/target/scala-2.10/tool_2.10-0.1-SNAPSHOT.jar
spark-shell conf:
spark-shell \
--master yarn \
--deploy-mode client \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.kryo.classesToRegister=com.fastdtw.timeseries.TimeSeriesBase" \
--executor-memory 12g \
--driver-memory 16g \
--conf "spark.driver.maxResultSize=16g" \
--conf "spark.kryoserializer.buffer.max=512" \
--conf "spark.executor.cores=6" \
--conf "spark.executor.instances=30"
Why is there a difference at runtime?
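One way to narrow this down is to compare the effective settings of the two runs; note, for instance, that the commands above do not request identical resources (14g vs 12g executor memory). spark-submit can print its resolved properties with --verbose, and the Environment tab of each application's web UI shows the final values. A sketch reusing the command from above:
# Same command as above, with --verbose added so the resolved Spark
# properties are printed before the job starts.
spark-submit --verbose --class tool \
  --master yarn \
  --deploy-mode client \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.kryo.classesToRegister=com.fastdtw.timeseries.TimeSeriesBase" \
  --executor-memory 14g \
  --driver-memory 16g \
  --conf "spark.driver.maxResultSize=16g" \
  --conf "spark.kryoserializer.buffer.max=512" \
  --num-executors 30 \
  --conf "spark.executor.cores=6" \
  /home/target/scala-2.10/tool_2.10-0.1-SNAPSHOT.jar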