I have a Docker cluster running on my laptop with a master and three workers. I can launch the typical wordcount example by pointing spark-submit at the master, using a command like this:
bash-4.3# spark/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.scala.WordCount --master spark://spark-master:7077 /opt/spark-apps/learning-spark-mini-example_2.11-0.0.1.jar /opt/spark-data/README.md /opt/spark-data/output-5
I can see that the files have been generated inside output-5.
But when I try to launch the process from outside the cluster, using the command:
docker run --network docker-spark-cluster_spark-network -v /tmp/spark-apps:/opt/spark-apps --env SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION --env SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS -e APP_ARGS="/opt/spark-data/README.md /opt/spark-data/output-5" spark-submit:2.4.0
Where
echo $SPARK_APPLICATION_JAR_LOCATION
/opt/spark-apps/learning-spark-mini-example_2.11-0.0.1.jar
echo $SPARK_APPLICATION_MAIN_CLASS
com.oreilly.learningsparkexamples.mini.scala.WordCount
When I open the page of the worker where the task is attempted, I can see the error below, raised at line 11 of WordCount.scala, the very first line, where the path of the first argument is read:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:11)
Clearly, position zero of the arguments array does not contain the path of the first parameter, the input file I want to run the word count on.
The question is: why is docker not using the arguments passed through -e APP_ARGS="/opt/spark-data/README.md /opt/spark-data/output-5"?
I already tried running the job in the traditional way, logging into the driver (spark-master) and running the spark-submit command, but when I try to run the task with docker, it doesn't work.
It must be something trivial, but I still have no clue. Can anybody help me?
SOLVED
I have to use a command like this:
docker run --network docker-spark-cluster_spark-network -v /tmp/spark-apps:/opt/spark-apps --env SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION --env SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS --env SPARK_APPLICATION_ARGS="/opt/spark-data/README.md /opt/spark-data/output-6" spark-submit:2.4.0
In summary, I had to use --env SPARK_APPLICATION_ARGS="args1 args2 argsN" instead of -e APP_ARGS="args1 args2 argsN". Note that -e and --env are interchangeable in docker run, so the fix is the variable name itself: the image only reads SPARK_APPLICATION_ARGS, and arguments passed under any other name never reach the application.
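The reason the variable name matters: images like spark-submit:2.4.0 typically wrap spark-submit in an entrypoint script that expands a fixed set of SPARK_APPLICATION_* environment variables. A hypothetical sketch of such a wrapper (the real entrypoint in the image may differ):

#!/bin/bash
# Hypothetical submit wrapper: only the SPARK_APPLICATION_* variables
# are read here, so anything passed under another name (e.g. APP_ARGS)
# is silently ignored and the application gets an empty args array.
/spark/bin/spark-submit \
    --class "${SPARK_APPLICATION_MAIN_CLASS}" \
    --master spark://spark-master:7077 \
    "${SPARK_APPLICATION_JAR_LOCATION}" \
    ${SPARK_APPLICATION_ARGS}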
Related
I am working with a codebase that has a docker run command as follows (real name and password removed):
docker run -it --rm --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:11.6 -d2
I know that the -d flag is used to --detach the container, but what is -d2? I can't figure out the purpose of this flag at the end of the command. I'm also confused about why it's at the end of the command and not before the image name like the other flags.
The docker command line is order-sensitive: once docker sees an argument it cannot parse as an option, it treats it as the image name, and everything after the image name is the command to run instead of the default command. In other words:
docker run ${options_to_run} ${image_name} ${command_override}
In the postgres image, the entrypoint is docker-entrypoint.sh and the default command is postgres. That means docker will run this container by default as docker-entrypoint.sh postgres (it concatenates the entrypoint and command together into one command with args to run). With the -d2 command override, that becomes docker-entrypoint.sh -d2 and the entrypoint script may interpret that as an option to change how it will run. The entrypoint has special handling for flags:
if [ "${1:0:1}" = '-' ]; then
set -- postgres "$#"
fi
....
exec "$#"
This means the entrypoint arguments are modified from -d2 to postgres -d2, and then the shell at pid 1 is replaced by that command line: postgres running with the -d2 argument.
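You can check this split for any image yourself; docker inspect prints the configured entrypoint and default command (the comment shows the expected output for postgres:11.6):

$ docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' postgres:11.6
# [docker-entrypoint.sh] [postgres]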
I found the answer. -d2 is a postgres CLI option for specifying the debugging level. We are executing the postgres container with that postgres CLI option.
From postgres --help:
-d 1-5 debugging level
I encountered a problem running "go test" from a makefile. The idea behind all this is to start a docker container, run all tests against it, and then stop and remove the container.
The container gets started and the tests run, but the last two commands (docker stop and docker rm) are never executed.
Make returns this message:
make: *** [test] Error 1
Is it "go test" which terminates the makefile execution?
.PHONY: up down test
up:
	docker-compose up
down:
	docker-compose down
test:
	docker run -d \
		--name dev \
		--env-file $${HOME}/go/src/test-api/testdata/dbConfigTest.env \
		-p 5432:5432 \
		-v $${HOME}/go/src/test-api/testdata/postgres:/var/lib/postgresql/data postgres
	# runs all tests including integration tests.
	go test ./... --tags=integration -failfast -v
	# stop and remove container
	docker stop `docker ps -aqf "name=dev"`
	docker rm `docker ps -aqf "name=dev"`
Yes: make aborts a recipe at the first command that exits with a non-zero status, so when go test fails, the docker stop and docker rm lines never run. Assuming that you want 'make test' to still return the test status, consider the following change to the makefile:
test:
	docker run -d \
		--name dev \
		--env-file $${HOME}/go/src/test-api/testdata/dbConfigTest.env \
		-p 5432:5432 \
		-v $${HOME}/go/src/test-api/testdata/postgres:/var/lib/postgresql/data postgres
	# runs all tests including integration tests.
	go test ./... --tags=integration -failfast -v ; echo "$$?" > test.result
	# stop and remove container
	docker stop `docker ps -aqf "name=dev"`
	docker rm `docker ps -aqf "name=dev"`
	exit $$(cat test.result)
It uses the test.result file to capture the exit code of go test, runs the cleanup commands unconditionally, and then exits with the saved status.
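An alternative sketch that avoids the temporary file: since make runs every recipe line in its own shell, chain the test and the cleanup into one continued line and keep the exit code in a shell variable. This assumes the container keeps the name dev from the docker run above, so the last four lines of the recipe become:

	# run the tests, remember their status, clean up unconditionally,
	# then exit with the saved status so make reports the test result
	go test ./... --tags=integration -failfast -v; status=$$?; \
	docker stop dev; docker rm dev; \
	exit $$status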
I have a script, restore.sh, that restores the database:
mongorestore --port 27017 --db myapp `pwd`/db-dump/myapp
I want to run this in a short-lived docker container using the image mvertes/alpine-mongo.
To run a short-lived container, the --rm flag is used:
docker run --rm --name mongo -p 27017:27017 \
-v /data/db:/data/db \
mvertes/alpine-mongo
But how do I execute my script in the same command?
Check out the docker run reference:
$ docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]
You can pass in the command you wish to execute. In your case, this could be the restore script. You must consider two things, though.
1. The script is not part of the container, so you need to mount it into the container.
2. Specifying a command overrides the CMD directive in the Dockerfile.
If you look at the Dockerfile, you see this as its last line:
CMD [ "mongod" ]
This means the default command that the container executes is mongod. When you specify a command for docker run, you "replace" this with the command you pass in. In your case: Passing in the restore script will overwrite mongod, which means Mongo never starts and the script will fail.
You have two options:
1. Start one container with the database and another one with the restore script.
2. Try to chain the commands.
Since you want to run this in a short-lived container, option 2 might be better suited for you. Just remember to start mongod with the --fork flag to run it in daemon mode (which also requires a --logpath), and wrap the chained commands in sh -c so the && is actually interpreted by a shell:
$ docker run --rm --name mongo -p 27017:27017 \
    -v /data/db:/data/db \
    -v "$(pwd)":/mnt/pwd \
    mvertes/alpine-mongo \
    sh -c "mongod --fork --logpath /tmp/mongod.log && /mnt/pwd/restore.sh"
Hopefully, this is all it takes to solve your problem.
I want to close the mongo shell after executing the following in a docker command:
#!/bin/bash
docker run -it --link sonams-mongo:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
if [ $? -eq 0 ]; then
    echo "connected to mongo successfully"
else
    echo "mongo connection NOT successful"
fi
When it connects, it drops into the mongo shell prompt. Is there a way to pass a command that exits the shell, right in or after the docker command?
thanks
Usually (though it depends on the base image you're using) you don't need to invoke sh -c. Also, the -it combination is usually what makes the shell open and wait for input. Try changing your command a little, like below, without -it and sh -c:
docker run --link sonams-mongo:mongo --rm mongo mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"
If that doesn't help, try this (note the -i flag, which keeps stdin open so the piped input actually reaches the mongo shell):
echo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test" | docker run -i --link sonams-mongo:mongo --rm mongo mongo
All the tutorials point to running postgres in this format:
docker run -d -p 5432 \
-t <your username>/postgresql \
/bin/su postgres -c '/usr/lib/postgresql/9.2/bin/postgres \
-D /var/lib/postgresql/9.2/main \
-c config_file=/etc/postgresql/9.2/main/postgresql.conf'
Why can't we have this in our Dockerfile:
ENTRYPOINT ["/etc/init.d/postgresql-9.2", "start"]
And simply start the container by
docker run -d psql
Is that not the purpose of ENTRYPOINT, or am I missing something?
The difference is that the init script provided in /etc/init.d is not an entrypoint. Its purpose is quite different: to get the actual server process started in the background and then report success or failure to the caller. That script causes a postgres process, usually indirectly via pg_ctl, to be started and detached from the controlling terminal.
For docker to work best, it needs to run the application directly, attached to the docker process. That way it can usefully and generically terminate the application when the user asks for it, and quickly discover and respond to the process crashing.
To exemplify what IfLoop said, using CMD in a Dockerfile:
FROM postgres
CMD ["/usr/lib/postgresql/9.2/bin/postgres", "-D", "/var/lib/postgresql/9.2/main", "-c", "config_file=/etc/postgresql/9.2/main/postgresql.conf"]
To run:
$ docker run -d -p 5432:5432 psql
Watching PostgreSQL logs:
$ docker logs -f POSTGRES_CONTAINER_ID