Looking for a docker image to containerize my spark scala streaming jobs - scala

I am new to Spark (3.0.0-preview) and Scala (sbt). I have written a Spark streaming job that I can run successfully on my local machine from my IDE.
Now I am looking for a way to dockerize the code so that I can run it with the docker-compose setup that builds the Spark cluster.
My docker-compose:
version: "3.3"
services:
spark-master:
image: rd/spark:latest
container_name: spark-master
hostname: spark-master
ports:
- "8080:8080"
- "7077:7077"
networks:
- spark-network
environment:
- "SPARK_LOCAL_IP=spark-master"
- "SPARK_MASTER_PORT=7077"
- "SPARK_MASTER_WEBUI_PORT=8080"
command: "/start-master.sh"
spark-worker:
image: rd/spark:latest
depends_on:
- spark-master
ports:
- 8080
networks:
- spark-network
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "SPARK_WORKER_WEBUI_PORT=8080"
command: "/start-worker.sh"
networks:
spark-network:
driver: bridge
ipam:
driver: default
Dockerfile:
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
RUN wget http://apache.mirror.anlx.net/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && \
    mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && \
    rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
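The start scripts themselves are not shown in the post; minimal versions might look like the following (hypothetical sketches, built on the same spark-class entry points that the working setup below uses):
start-master.sh:
#!/bin/sh
# Launch a standalone master via spark-class (hypothetical sketch)
exec /spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --ip "${SPARK_LOCAL_IP}" --port "${SPARK_MASTER_PORT}" --webui-port "${SPARK_MASTER_WEBUI_PORT}"
start-worker.sh:
#!/bin/sh
# Launch a worker and register it with the master (hypothetical sketch)
exec /spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --webui-port "${SPARK_WORKER_WEBUI_PORT}" "${SPARK_MASTER}"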
This seems like a simple request, but I am having a hard time finding good documentation on it.

Got it working with the following:
project-structure:
  project/
  src/
  build.sbt
  Dockerfile
  Dockerfile-app
  docker-compose.yml
docker-compose.yml
version: "3.3"
services:
spark-scala-env:
image: app/spark-scala-env:latest
build:
context: .
dockerfile: Dockerfile
app-spark-scala:
image: app/app-spark-scala:latest
build:
context: .
dockerfile: Dockerfile-app
spark-master:
image: app/app-spark-scala:latest
container_name: spark-master
hostname: localhost
ports:
- "8080:8080"
- "7077:7077"
networks:
- spark-network
environment:
- "SPARK_LOCAL_IP=spark-master"
- "SPARK_MASTER_PORT=7077"
- "SPARK_MASTER_WEBUI_PORT=8080"
command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip $${SPARK_LOCAL_IP} --port $${SPARK_MASTER_PORT} --webui-port $${SPARK_MASTER_WEBUI_PORT}"]
spark-worker:
image: app/app-spark-scala:latest
hostname: localhost
depends_on:
- spark-master
ports:
- 8080
networks:
- spark-network
#network_mode: host
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "SPARK_WORKER_WEBUI_PORT=8080"
- "SPARK_WORKER_CORES=2"
command: ["sh", "-c", "/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port $${SPARK_WORKER_WEBUI_PORT} $${SPARK_MASTER}"]
app-submit-job:
image: app/app-spark-scala:latest
ports:
- "4040:4040"
environment:
- "SPARK_APPLICATION_MAIN_CLASS=com.app.spark.TestAssembly"
- "SPARK_MASTER=spark://spark-master:7077"
- "APP_PACKAGES=org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0-preview2"
- "APP_JAR_LOC=/app/target/scala-2.12/app_spark_scala-assembly-0.2.jar"
hostname: localhost
networks:
- spark-network
volumes:
- ./appdata:/appdata
command: ["sh", "-c", "/spark/bin/spark-submit --packages $${APP_PACKAGES} --class $${SPARK_APPLICATION_MAIN_CLASS} --master $${SPARK_MASTER} $${APP_JAR_LOC}"]
networks:
spark-network:
driver: bridge
ipam:
driver: default
Dockerfile
FROM openjdk:8-alpine
ARG SPARK_VERSION
ARG HADOOP_VERSION
ARG SCALA_VERSION
ARG SBT_VERSION
ENV SPARK_VERSION=${SPARK_VERSION:-3.0.0-preview2}
ENV HADOOP_VERSION=${HADOOP_VERSION:-2.7}
ENV SCALA_VERSION=${SCALA_VERSION:-2.12.8}
ENV SBT_VERSION=${SBT_VERSION:-1.3.4}
RUN apk --update add wget tar bash
RUN \
    echo "$SPARK_VERSION $HADOOP_VERSION" && \
    echo http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
    wget http://apache.mirror.anlx.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
    mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} /spark && \
    rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN \
    echo "$SCALA_VERSION $SBT_VERSION" && \
    mkdir -p /usr/lib/jvm/java-1.8-openjdk/jre && \
    touch /usr/lib/jvm/java-1.8-openjdk/jre/release && \
    apk add -U --no-cache bash curl && \
    curl -fsL http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz | tar xfz - -C /usr/local && \
    ln -s /usr/local/scala-$SCALA_VERSION/bin/* /usr/local/bin/ && \
    scala -version && \
    scalac -version
RUN \
    curl -fsL https://github.com/sbt/sbt/releases/download/v$SBT_VERSION/sbt-$SBT_VERSION.tgz | tar xfz - -C /usr/local && \
    (mv /usr/local/sbt-launcher-packaging-$SBT_VERSION /usr/local/sbt || true) && \
    ln -s /usr/local/sbt/bin/* /usr/local/bin/ && \
    sbt sbt-version || sbt sbtVersion || true
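Since this Dockerfile parameterizes every version through ARG, the defaults can be overridden at build time with the standard Compose flag, for example:
docker-compose build --build-arg SPARK_VERSION=3.0.0-preview2 --build-arg SCALA_VERSION=2.12.8 spark-scala-env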
Dockerfile-app
FROM app/spark-scala-env:latest
WORKDIR /app
# Pre-install base libraries
ADD build.sbt /app/
ADD project/plugins.sbt /app/project/
ADD src/. /app/src/
RUN sbt update
RUN sbt clean assembly
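Note that the sbt clean assembly step assumes the sbt-assembly plugin is declared in project/plugins.sbt, along these lines (the plugin version here is an assumption, not from the original post):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")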
Commands:
1. Build the docker image with the Spark and Scala env: docker-compose build spark-scala-env
2. Build the image with the app jar: docker-compose build app-spark-scala
3. Bring up the Spark env containers: docker-compose up -d --scale spark-worker=2 spark-worker
4. Submit the job: docker-compose up -d app-submit-job
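To verify the submission, you can tail the driver output and confirm that both workers registered with the master (standard Compose commands):
docker-compose logs -f app-submit-job
docker-compose logs spark-master | grep -i "registering worker"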

Related

pg_dump from Celery container differs from pg_dump in other containers

I can't understand where the pg_dump version is coming from. I forced postgresql-client-13 to be installed everywhere.
/usr/bin/pg_dump --version
Celery Beat and Celery containers:
pg_dump (PostgreSQL) 11.12 (Debian 11.12-0+deb10u1)
Other containers (web & postgres) and my local machine:
pg_dump (PostgreSQL) 13.4 (Debian 13.4-1.pgdg100+1)
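A quick way to confirm which binary each running service actually resolves (standard Compose commands, using the service names from the compose file below):
docker-compose exec celery sh -c 'which pg_dump && pg_dump --version'
docker-compose exec web sh -c 'which pg_dump && pg_dump --version'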
Here is my Dockerfile
FROM python:3
# The testdriven tutorial uses a non-root user, but that seemed to fail here.
# create directory for the app user
RUN mkdir -p /home/app
ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN mkdir $APP_HOME
RUN mkdir $APP_HOME/staticfiles
WORKDIR $APP_HOME
ENV PYTHONUNBUFFERED 1
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list
RUN apt-get update -qq && apt-get install -y \
    postgresql-client-13 \
    binutils \
    libproj-dev \
    gdal-bin
RUN apt-get update \
    && apt-get install -yyq netcat
# install psycopg2 dependencies
RUN pip3 install --no-cache-dir --upgrade pip && pip install --no-cache-dir -U pip wheel setuptools
COPY ./requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# copy entrypoint.prod.sh
COPY ./entrypoint.prod.sh $APP_HOME
# copy project
COPY . $APP_HOME
# run entrypoint.prod.sh
ENTRYPOINT ["/home/app/web/entrypoint.prod.sh"]
and here is my docker-compose
version: '3.7'
services:
  web:
    build:
      context: ./app
      dockerfile: Dockerfile.prod
    command: gunicorn core.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - static_volume:/home/app/web/staticfiles
    expose:
      - 8000
    env_file:
      - ./app/.env.prod
    depends_on:
      - db
  db:
    image: postgis/postgis:13-master
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    env_file:
      - ./app/.env.prod.db
  redis:
    image: redis:6
  celery:
    build: ./app
    command: celery -A core worker -l info
    volumes:
      - ./app/:/usr/src/app/
    env_file:
      - ./app/.env.prod
    depends_on:
      - redis
  celery-beat:
    build: ./app
    command: celery -A core beat -l info
    volumes:
      - ./app/:/usr/src/app/
    env_file:
      - ./app/.env.prod
    depends_on:
      - redis
  nginx-proxy:
    image: nginxproxy/nginx-proxy:latest
    container_name: nginx-proxy
    build: nginx
    restart: always
    ports:
      - 443:443
      - 80:80
    volumes:
      - static_volume:/home/app/web/staticfiles
      - certs:/etc/nginx/certs
      - html:/usr/share/nginx/html
      - vhost:/etc/nginx/vhost.d
      - /var/run/docker.sock:/tmp/docker.sock:ro
    depends_on:
      - web
  nginx-proxy-letsencrypt:
    image: jrcs/letsencrypt-nginx-proxy-companion
    env_file:
      - ./app/.env.prod.proxy-companion
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - certs:/etc/nginx/certs
      - html:/usr/share/nginx/html
      - vhost:/etc/nginx/vhost.d
    depends_on:
      - nginx-proxy
volumes:
  postgres_data:
  static_volume:
  certs:
  html:
  vhost:
I really need Celery to use the same pg_dump version. Can anyone provide some input?

curl request to docker-compose port hangs in travis-ci

Our travis builds have started failing and I can't figure out why. Our app runs in docker-compose, and then we run Cypress against it. This used to work perfectly. Now the host port for the web server is just unresponsive. I've removed Cypress and am just trying to run curl http://localhost:3001, and it just hangs. Here's the travis.yml. Any suggestions would be highly appreciated. I have tried fiddling for several hours with the docker versions, distros, localhost vs 127.0.0.1, etc., to no avail. All of this works fine locally on my workstation.
language: node_js
node_js:
  - "12.19.0"
env:
  - DOCKER_COMPOSE_VERSION=1.25.4
services:
  - docker
sudo: required
# Supposedly this is needed for Cypress to work in Ubuntu 16
# https://github.com/cypress-io/cypress-example-kitchensink/blob/master/basic/.travis.yml
addons:
  apt:
    packages:
      - libgconf-2-4
before_install:
  # upgrade docker compose https://docs.travis-ci.com/user/docker/#using-docker-compose
  - sudo rm /usr/local/bin/docker-compose
  - curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` > docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin
  # upgrade docker itself https://docs.travis-ci.com/user/docker/#installing-a-newer-docker-version
  - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  - sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  - sudo apt-get update
  - sudo apt-get -y -o Dpkg::Options::="--force-confnew" install docker-ce
  # Put the .env file in place
  - cp .env.template .env
install:
  # Install node modules (for jest and wait-on) and start up the docker containers
  - cd next
  - npm ci
  - cd ..
  - cd e2e
  - npm ci
  - cd ..
script:
  - docker --version
  - docker-compose --version
  - docker-compose up --build -d
  # Run unit tests
  # - cd next
  # - npm run test
  # Run e2e tests
  # - cd ../e2e
  # - npx cypress verify
  # - CYPRESS_FAIL_FAST=true npx wait-on http://localhost:3001 --timeout 100000 && npx cypress run --config video=false,pageLoadTimeout=100000,screenshotOnRunFailure=false
  - sleep 30
  - curl http://127.0.0.1:3001 --max-time 30
  - docker-compose logs db
  - docker-compose logs express
  - docker-compose logs next
post_script:
  - docker-compose down
The logs look like this:
The command "docker-compose up --build -d" exited with 0.
30.01s$ sleep 30
The command "sleep 30" exited with 0.
93.02s$ curl http://127.0.0.1:3001 --max-time 30
curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
The command "curl http://127.0.0.1:3001 --max-time 30" exited with 28.
The docker compose logs show nothing suspicious. It's as if the network wasn't set up correctly and docker is not aware of any requests.
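Some checks that could narrow this down (hedged suggestions, not from the original post): verify that the port is actually bound on the Travis host and that the app answers from inside the container:
docker-compose ps                  # is 3001 published as 0.0.0.0:3001->3001/tcp?
sudo netstat -tlnp | grep 3001     # is anything listening on the host port?
docker-compose exec next wget -qO- http://localhost:3001   # assumes wget exists in the image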
Here is the docker-compose.yml in case it's useful:
version: '3.7'
services:
  db:
    image: mg-postgres
    build: ./postgres
    ports:
      - '5433:5432'
    environment:
      POSTGRES_HOST_AUTH_METHOD: 'trust'
  adminer:
    image: adminer
    depends_on:
      - db
    ports:
      - '8080:8080'
  express:
    image: mg-server
    build: ./express
    restart: always
    depends_on:
      - db
    env_file:
      - .env
    environment:
      DEBUG: express:*
    volumes:
      - type: bind
        source: ./express
        target: /app
      - /app/node_modules
    ports:
      - '3000:3000'
  next:
    image: mg-next
    build: ./next
    depends_on:
      - db
      - express
    env_file:
      - .env
    volumes:
      - type: bind
        source: ./next
        target: /app
      - /app/node_modules
    ports:
      - '3001:3001'
    command: ['npm', 'run', 'dev']

Run apache pulsar using docker-compose

I am able to run Apache Pulsar using this docker command:
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.6.0 \
  bin/pulsar standalone
I am trying to convert this to docker-compose and I use the docker-compose.yml file below. When I run the command:
docker-compose up
I get the error:
Attaching to pulsar
pulsar | Error: Could not find or load main class "
pulsar exited with code 1
What am I doing wrong here?
Thanks in advance.
version: '3.1'
services:
  standalone:
    image: apachepulsar/pulsar:2.6.0
    container_name: pulsar
    ports:
      - 8080:8080
      - 6650:6650
    environment:
      - PULSAR_MEM=" -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    volumes:
      - pulsardata:/pulsar/data
      - pulsarconf:/pulsar/conf
    command: /bin/bash -c "bin/pulsar standalone"
volumes:
  pulsardata:
  pulsarconf:
The issue is with the env variable: in the list form above, the surrounding quotes become part of the value, so PULSAR_MEM ends up containing literal quote characters that corrupt the startup command (hence the "Could not find or load main class" error). It should work if you specify it in the following way:
version: '3.1'
services:
  standalone:
    image: apachepulsar/pulsar:2.6.0
    ports:
      - 8080:8080
      - 6650:6650
    environment:
      PULSAR_MEM: " -Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
    command: bin/pulsar standalone
    # ... other parameters
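Alternatively, the list form also works if you drop the inner quotes, since Compose splits on the first = and keeps the rest of the scalar, quotes included, as the value (a hedged equivalent):
environment:
  - PULSAR_MEM=-Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g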

Why can't I run pdo_pgsql in my container?

I am trying to use AWS RDS PostgreSQL from my PHP app. I have modified the Dockerfile to contain
RUN apt-get update && apt-get install -y libpq-dev && docker-php-ext-install pdo pdo_pgsql
But it doesn't seem to run. I think this is because, when I (finally) ran phpinfo() from the container, I didn't see it loaded. I am assuming that when I run docker-compose up, the Dockerfile (that builds php-fpm) is run "automatically". Is that true? Here is the Dockerfile that doesn't seem to run:
FROM bitnami/php-fpm:7.2 as builder
RUN install_packages git autoconf build-essential
WORKDIR /app
RUN wget https://github.com/xdebug/xdebug/archive/2.6.0.tar.gz && \
    tar xzf 2.6.0.tar.gz && \
    cd xdebug-2.6.0 && \
    phpize && \
    ./configure --enable-xdebug && \
    make && make install
RUN apt-get update && apt-get install -y libpq-dev && docker-php-ext-install pdo pdo_pgsql
FROM bitnami/php-fpm:7.2
COPY --from=builder /opt/bitnami/php/lib/php/extensions/xdebug.so /opt/bitnami/php/lib/php/extensions/
RUN echo 'zend_extension="/opt/bitnami/php/lib/php/extensions/xdebug.so"' >> /opt/bitnami/php/etc/php.ini
RUN echo "xdebug.remote_port=9000" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.remote_enable=1" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.remote_connect_back=0" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.remote_host=192.168.122.1" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.idekey=docker" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.remote_autostart=1" >> /opt/bitnami/php/etc/php.ini \
    && echo "xdebug.remote_log=/tmp/xdebug.log" >> /opt/bitnami/php/etc/php.ini \
    && echo "extension=pgp_pdo_pgsql.so" >> /opt/bitnami/php/etc/php.ini \
    && echo "extension=pgp_pgsql.so" >> /opt/bitnami/php/etc/php.ini
And here is the docker-compose
version: '3'
services:
  apache:
    image: bitnami/apache:latest
    restart: unless-stopped
    ports:
      - 80:8080
    volumes:
      - ./apache/app.conf:/vhosts/app.conf:ro
      - ./app:/app
    networks:
      - net
  mysql:
    container_name: "mysql"
    restart: unless-stopped
    image: mysql:5.6
    environment:
      - MYSQL_DATABASE
      - MYSQL_PASSWORD
      - MYSQL_ROOT_PASSWORD
      - MYSQL_USER
    volumes:
      - data:/var/lib/mysql
      - ./mysql/mysql.sql:/docker-entrypoint-initdb.d/mysql.sql
    ports:
      - "3306:3306"
    networks:
      - net
  phpmyadmin:
    container_name: phpmyadmin
    restart: unless-stopped
    image: phpmyadmin/phpmyadmin
    environment:
      - PMA_HOST=mysql
      - PMA_PORT=3306
    ports:
      - "8081:80"
    networks:
      - net
  php-fpm:
    build: ./php
    restart: unless-stopped
    image: bitnami/php-fpm
    volumes:
      - ./app:/app
    environment:
      - XDEBUG_CONFIG="remote_host=192.168.122.1"
    networks:
      - net
  jasperreports:
    image: 'bitnami/jasperreports:7'
    restart: unless-stopped
    environment:
      - MARIADB_HOST=mysql
      - MARIADB_PORT_NUMBER=3306
      - JASPERREPORTS_USERNAME=admin
      - JASPERREPORTS_PASSWORD=bitnami
      - JASPERREPORTS_DATABASE_USER=admin
      - JASPERREPORTS_DATABASE_PASSWORD=xxx
      - JASPERREPORTS_DATABASE_NAME=jasper
      - ALLOW_EMPTY_PASSWORD=yes
    ports:
      - '8080:8080'
    volumes:
      - jasperreports_data:/bitnami
    depends_on:
      - mysql
    networks:
      - net
volumes:
  data:
    driver: local
  jasperreports_data:
    driver: local
If this Dockerfile is not being run automatically, how do I get it to create a new php-fpm container?
If you change the Dockerfile and you want docker-compose up to update the image, you need to pass the --build flag to docker-compose up.
--build Build images before starting containers.
docker-compose up --build
Ref: https://docs.docker.com/compose/reference/up/
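Alternatively, you can rebuild just the one service and then recreate it (standard Compose commands):
docker-compose build php-fpm
docker-compose up -d php-fpm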

Using Flyway in CI/CD to populate/seed a Postgres DB in Docker then TAG and publish the new docker image to be used for testing

I want to start using Flyway to version our database changes. I am trying to create a Postgres Docker container with seeded data, then TAG and publish that Docker image to be used in automated testing.
I tried using docker-compose; however, I haven't figured out a way to TAG and publish after Flyway runs.
Repository with test project
https://github.com/bigboy1122/flyway_postgres
Here is a docker-compose I created
version: '3.7'
services:
  flyway:
    image: boxfuse/flyway
    restart: always
    command: -url=jdbc:postgresql://db:5432/foo -user='postgres' -password='test123' -schemas='bar' migrate
    volumes:
      - .:/flyway/sql
    depends_on:
      - db
  db:
    image: tgalati1122/flyway_seeded_postgres
    build:
      context: .
      dockerfile: ./Dockerfile
    restart: always
    environment:
      POSTGRES_PASSWORD: 'test123'
      POSTGRES_USER: 'postgres'
      POSTGRES_DB: 'foo'
    ports:
      - 5432:5432
  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
    depends_on:
      - db
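For reference, tagging and publishing the image after a Compose build can be done with plain Docker commands (a hedged sketch; the :seeded tag name is an assumption, and note that data Flyway writes into a running container's volume is not captured in the image itself):
docker-compose build db
docker tag tgalati1122/flyway_seeded_postgres tgalati1122/flyway_seeded_postgres:seeded
docker push tgalati1122/flyway_seeded_postgres:seeded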
Here I am trying to use Docker's multi-stage build feature.
In the example below, the database spins up (I think), but I can't get Flyway to access it.
FROM postgres:10.5-alpine as donor
ENV PGDATA=/pgdata
ENV POSTGRES_PASSWORD='test123'
ENV POSTGRES_USER='postgres'
ENV POSTGRES_DB='foo'
EXPOSE 5432:5432
RUN /docker-entrypoint.sh --help
FROM debian:stretch-slim as build_tools
ENV FLYWAY_VERSION='5.2.0'
RUN set -ex; \
    if ! command -v gpg > /dev/null; then \
        apt-get update; \
        apt-get install -y --no-install-recommends \
            gnupg \
            dirmngr \
            wget \
        ; \
        rm -rf /var/lib/apt/lists/*; \
    fi
VOLUME flyway/sql
RUN wget --no-check-certificate https://repo1.maven.org/maven2/org/flywaydb/flyway-commandline/5.2.0/flyway-commandline-5.2.0-linux-x64.tar.gz -O - | tar -xz
RUN pwd; \
    ls -l; \
    cd flyway-5.2.0; \
    pwd; \
    ls -l; \
    sh ./flyway -url=jdbc:postgresql://localhost:5432/optins -user='postgres' -password='test123' -schemas='bar' migrate
FROM postgres:10.5-alpine
ENV PGDATA=/pgdata
ENV POSTGRES_PASSWORD='test123'
ENV POSTGRES_USER='postgres'
ENV POSTGRES_DB='foo'
EXPOSE 5432:5432
COPY --chown=postgres:postgres --from=donor /pgdata /pgdata
The idea I am going for is that, as database changes occur, I want to automatically build a new lightweight test database, as well as update the persisted databases throughout the enterprise.