Apache Flink - org.apache.flink.client.program.ProgramInvocationException - scala

I have created an application with Apache Flink 1.0.3 using Scala 2.11.7 and I want to test it locally (in a single JVM). So I did the following, as stated on the website:
./bin/start-local.sh
tail log/flink-*-jobmanager-*.log
And it starts just fine; I can see the web interface at localhost:8081.
Then I tried to submit my application, but I get either an exception or a weird message. For example, when I type any of the following commands:
./bin/flink run ./myApp.jar
./bin/flink run ./myApp.jar -c MyMain
./bin/flink run ./myApp.jar -c myMain.class
./bin/flink run ./myApp.jar -c myMain.scala
./bin/flink run ./myApp.jar -c my.package.myMain
./bin/flink run ./myApp.jar -c my.package.myMain.class
./bin/flink run ./myApp.jar -c my.package.myMain.scala
I get the following exception:
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Neither a 'Main-Class', nor a 'program-class' entry was found in the jar file.
at org.apache.flink.client.program.PackagedProgram.getEntryPointClassNameFromJar(PackagedProgram.java:571)
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:188)
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:126)
at org.apache.flink.client.CliFrontend.buildProgram(CliFrontend.java:922)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:301)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1192)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1243)
And when I type any of the following commands:
./bin/flink run ./ -c myMain myApp.jar
./bin/flink run ./ -c myMain.class myApp.jar
./bin/flink run ./ -c myMain.scala myApp.jar
./bin/flink run ./ -c my.package.myMain myApp.jar
./bin/flink run ./ -c my.package.myMain.class myApp.jar
./bin/flink run ./ -c my.package.myMain.scala myApp.jar
I get the following error:
JAR file is not a file: .
Use the help option (-h or --help) to get help on the command.
The above commands work with neither -c nor --class. I use IntelliJ, and I compiled the application using the Build Module from Dependencies option. What am I doing wrong?

The correct way to submit your JAR is:
bin/flink run -c my.package.myMain myApp.jar
You have to specify the arguments (like -c) before the JAR file. In your first set of commands, -c came after the JAR, so it was passed to your program instead of to Flink, and Flink fell back to looking for a Main-Class entry in the manifest. In the second set, ./ was interpreted as the JAR and the rest of the line was ignored, hence "JAR file is not a file: .".
The -p argument is optional. Your last example works because the argument order is correct, not because of the parallelism flag.
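As a general pattern (a sketch; --input and --output below are placeholder program arguments, not Flink options):
# Flink options such as -c/--class and -p/--parallelism go before the JAR;
# everything after the JAR path is passed to your program's main() as arguments.
./bin/flink run -c my.package.myMain -p 2 myApp.jar --input /tmp/in --output /tmp/out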

I figured out what was wrong: I needed to pass the degree of parallelism as an argument, otherwise I got a ProgramInvocationException. The command below worked for me:
./bin/flink run -p2 --class myMain myApp.jar

You have to specify the entry point class in your pom file. See the following part of the pom file (a maven-shade-plugin ManifestResourceTransformer):
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <mainClass>com.xyz.myMain</mainClass>
  </transformer>
</transformers>
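As a quick sanity check (not from the original answer), you can inspect the packaged JAR's manifest to confirm the entry actually made it into the build:
# The manifest should contain a line like "Main-Class: com.xyz.myMain"
# if the ManifestResourceTransformer was applied during the package step.
unzip -p myApp.jar META-INF/MANIFEST.MF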

Related

sh: flake8: not found

I've been stuck with this problem for a couple of days now. I am trying to dockerize a Django REST API + React (create-react-app) application.
The problem: when I run the following in Git Bash
$ docker-compose run --rm app sh -c "flake8"
it fails with:
sh: flake8: not found
I tried running docker-compose build, but the same problem persists. How can I solve this?
Use $ which flake8 to understand how your Python virtual environment is supplying that dependency.
Adjust the PATH environment variable, perhaps via $ conda activate myproject or $ poetry run flake8, to let the dockerized container access that dependency.
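A minimal way to check this from outside the container (a sketch; the service name app comes from the question, and the pip commands assume flake8 would be installed via pip):
# See whether flake8 is on PATH inside the container, and whether pip knows about it.
docker-compose run --rm app sh -c "which flake8; pip list | grep -i flake8"
# If it is missing from the image, add it to the image's Python environment
# (e.g. via requirements.txt or a pip install step in the Dockerfile) and rebuild:
docker-compose build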

Makefile: Terminates after running commands "go test ./..."

I encountered a problem running "go test" from a Makefile. The idea behind all this is to start a Docker container, run all tests against it, and then stop and remove the container.
The container gets started and the tests run, but the last two commands (docker stop and docker rm) aren't executed.
Make returns this message:
make: *** [test] Error 1
Is it "go test" that terminates the Makefile execution?
.PHONY: up down test
up:
	docker-compose up
down:
	docker-compose down
test:
	docker run -d \
		--name dev \
		--env-file $${HOME}/go/src/test-api/testdata/dbConfigTest.env \
		-p 5432:5432 \
		-v $${HOME}/go/src/test-api/testdata/postgres:/var/lib/postgresql/data postgres
	# runs all tests including integration tests.
	go test ./... --tags=integration -failfast -v
	# stop and remove container
	docker stop `docker ps -aqf "name=dev"`
	docker rm `docker ps -aqf "name=dev"`
Assuming that you want 'make test' to return the test status, consider the following change to the Makefile:
test:
	docker run -d \
		--name dev \
		--env-file $${HOME}/go/src/test-api/testdata/dbConfigTest.env \
		-p 5432:5432 \
		-v $${HOME}/go/src/test-api/testdata/postgres:/var/lib/postgresql/data postgres
	# runs all tests including integration tests.
	go test ./... --tags=integration -failfast -v ; echo "$$?" > test.result
	# stop and remove container
	docker stop `docker ps -aqf "name=dev"`
	docker rm `docker ps -aqf "name=dev"`
	exit $$(cat test.result)
It uses the test.result file to capture the exit code from the tests: because the echo makes the go test line succeed from make's point of view, the cleanup commands still run, and the final exit propagates the real test status.
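A tiny standalone illustration of the same pattern in a plain shell (hypothetical commands, just to show why the cleanup still runs):
# "false" stands in for a failing test run; capturing its status with echo
# makes this compound line itself exit 0, so execution continues.
false ; echo "$?" > test.result
echo "cleanup would run here"
# Finally, propagate the remembered status (1 in this case) as the overall result.
exit $(cat test.result)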

How to properly run SpotBugs on a Scala project

I'm trying to get SpotBugs to run on a Scala project using the SpotBugs CLI.
I installed the CLI like this:
$ curl -L -o /tmp/spotbugs-4.0.3.tgz https://github.com/spotbugs/spotbugs/releases/download/4.0.3/spotbugs-4.0.3.tgz
$ gunzip -c /tmp/spotbugs-4.0.3.tgz | tar xvf - -C /tmp
Then I run it like this:
$ time java -jar /tmp/spotbugs-4.0.3/lib/spotbugs.jar -textui -xml:withMessages -html -output target/scala-2.11/spotbugs-report.html vad/target/scala-2.11/projectx-SNAPSHOT-assembly.jar
^Cjava -jar /tmp/spotbugs-4.0.3/lib/spotbugs.jar -textui -xml:withMessages -htm 2462.79s user 135.67s system 130% cpu 33:16.98 total
You can see that it ran for more than 30 minutes without even finishing; I had to halt it.
SpotBugs is obviously not running properly here, so what am I doing wrong?

Cannot specify current directory with docker run -v

I am trying to execute the following Docker command in PowerShell, but I cannot get it to recognize $(PWD) as the current directory. Help please.
docker run -it -v $(PWD):/app --workdir /app samgentile\aspnetcore
I get:
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: invalid reference format.
See 'C:\Program Files\Docker\Docker\Resources\bin\docker.exe run --help'.
You should use "/" instead of "\" in the image name:
docker run -it -v $PWD:/app --workdir /app samgentile/aspnetcore
Mihai is correct to point out the parentheses. They signify that you want to run the command pwd and use its output, whereas without the parentheses PWD is treated as a variable. A correct invocation would take the form:
docker run -it -v $(pwd):/app --workdir /app samgentile/aspnetcore
Or:
docker run -it -v $PWD:/app --workdir /app samgentile/aspnetcore
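A small illustration of the difference in a POSIX shell (the original question is in PowerShell, where the parsing rules differ, so treat this as a sketch):
# $PWD expands an environment variable the shell keeps up to date;
# $(pwd) runs the pwd command and substitutes its output. Both print the current directory.
echo "$PWD"
echo "$(pwd)"
# Either form works for the volume mount, as long as the image name uses "/":
docker run -it -v "$(pwd)":/app --workdir /app samgentile/aspnetcore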

How to pass arguments to spark-submit using docker

I have a Docker setup running on my laptop with a master and three workers. I can launch the typical word count example by entering the IP of the master, using a command like this:
bash-4.3# spark/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.scala.WordCount --master spark://spark-master:7077 /opt/spark-apps/learning-spark-mini-example_2.11-0.0.1.jar /opt/spark-data/README.md /opt/spark-data/output-5
I can see that the files have been generated inside output-5, but when I try to launch the process from outside, using the command:
docker run --network docker-spark-cluster_spark-network -v /tmp/spark-apps:/opt/spark-apps --env SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION --env SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS -e APP_ARGS="/opt/spark-data/README.md /opt/spark-data/output-5" spark-submit:2.4.0
Where
echo $SPARK_APPLICATION_JAR_LOCATION
/opt/spark-apps/learning-spark-mini-example_2.11-0.0.1.jar
echo $SPARK_APPLICATION_MAIN_CLASS
com.oreilly.learningsparkexamples.mini.scala.WordCount
And when I open the page of the worker where the task was attempted, I can see that line 11, right at the start, where the path of the first argument is read, fails with an error like this:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:11)
Clearly, position zero of the arguments array does not contain the path of the first parameter, the input file I want to run the word count on.
The question is: why is docker not using the arguments passed through -e APP_ARGS="/opt/spark-data/README.md /opt/spark-data-output-5"?
I already tried running the job the traditional way, logging in to the spark-master driver and running the spark-submit command, but when I try to run the task with docker, it doesn't work.
It must be something trivial, but I still have no clue. Can anybody help me?
SOLVED
I have to use a command like this:
docker run --network docker-spark-cluster_spark-network -v /tmp/spark-apps:/opt/spark-apps --env SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION --env SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS --env SPARK_APPLICATION_ARGS="/opt/spark-data/README.md /opt/spark-data/output-6" spark-submit:2.4.0
In summary, I had to change -e APP_ARGS to --env SPARK_APPLICATION_ARGS.
Note that -e and --env are equivalent in Docker, so what mattered was the variable name SPARK_APPLICATION_ARGS rather than the flag itself.
This is the command that solves my problem:
docker run --network docker-spark-cluster_spark-network -v /tmp/spark-apps:/opt/spark-apps --env SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION --env SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS --env SPARK_APPLICATION_ARGS="/opt/spark-data/README.md /opt/spark-data/output-6" spark-submit:2.4.0
I have to use --env SPARK_APPLICATION_ARGS="args1 args2 argsN" instead of -e APP_ARGS="args1 args2 argsN".
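For completeness, a sketch of the equivalent command using the short flag (-e is simply shorthand for --env in Docker; the variable names are the ones used above):
docker run --network docker-spark-cluster_spark-network \
  -v /tmp/spark-apps:/opt/spark-apps \
  -e SPARK_APPLICATION_JAR_LOCATION=$SPARK_APPLICATION_JAR_LOCATION \
  -e SPARK_APPLICATION_MAIN_CLASS=$SPARK_APPLICATION_MAIN_CLASS \
  -e SPARK_APPLICATION_ARGS="/opt/spark-data/README.md /opt/spark-data/output-6" \
  spark-submit:2.4.0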