PySpark container: spark-submitting a PySpark script throws a file-not-found error

Solution: add the following environment variables to the container:
export PYSPARK_PYTHON=/usr/bin/python3.9
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.9
I am trying to create a Spark container and spark-submit a PySpark script. I am able to create the container, but running the PySpark script throws the following error:
Exception in thread "main" java.io.IOException: Cannot run program "python": error=2, No such file or directory
Questions:
Any idea why this error is occurring?
Do I need to install Python separately, or does it come bundled with the Spark download?
Do I need to install PySpark separately, or does it come bundled with the Spark download?
What is preferable for the Python installation: download it and put it under /opt/python, or use apt-get?
pyspark script:
from pyspark import SparkContext

sc = SparkContext("local", "count app")
words = sc.parallelize(
    ["scala",
     "java",
     "hadoop",
     "spark",
     "akka",
     "spark vs hadoop",
     "pyspark",
     "pyspark and spark"]
)
counts = words.count()
print("Number of elements in RDD -> %i" % counts)
output of spark-submit:
newuser@c1f28230da16:~$ spark-submit count.py
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/02/01 19:58:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:319)
    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:250)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
    ... 15 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
output of printenv:
newuser@c1f28230da16:~$ printenv
HOME=/home/newuser
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
PYTHONPATH=:/opt/spark/python:/opt/spark/python/lib/py4j-0.10.4-src.zip
TERM=xterm
SHLVL=1
SPARK_HOME=/opt/spark
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/java/bin:/opt/spark/bin
_=/usr/bin/printenv
myspark dockerfile:
ARG JAVA_HOME=/opt/java
ARG JDK_PACKAGE=openjdk-14.0.2_linux-x64_bin.tar.gz
ARG SPARK_HOME=/opt/spark
ARG SPARK_PACKAGE=spark-3.0.1-bin-hadoop3.2.tgz

#MAINTAINER demo@gmail.com
#LABEL maintainer="demo@foo.com"

############################################
### Install openjava
############################################
# Base image stage 1
FROM ubuntu as jdk
ARG JAVA_HOME
ARG JDK_PACKAGE
WORKDIR /opt/

## download open java
# ADD https://download.java.net/java/GA/jdk14.0.2/205943a0976c4ed48cb16f1043c5c647/12/GPL/$JDK_PACKAGE /
# ADD $JDK_PACKAGE /
COPY $JDK_PACKAGE .
RUN mkdir -p $JAVA_HOME/ && \
    tar -zxf $JDK_PACKAGE --strip-components 1 -C $JAVA_HOME && \
    rm -f $JDK_PACKAGE

############################################
### Install spark
############################################
# Base image stage 2
FROM ubuntu as spark
ARG SPARK_HOME
ARG SPARK_PACKAGE
WORKDIR /opt/

## download spark
COPY $SPARK_PACKAGE .
RUN mkdir -p $SPARK_HOME/ && \
    tar -zxf $SPARK_PACKAGE --strip-components 1 -C $SPARK_HOME && \
    rm -f $SPARK_PACKAGE

# Mount elasticsearch.yml config
### ADD config/elasticsearch.yml /opt/elasticsearch/config/elasticsearch.yml

############################################
### final
############################################
FROM ubuntu as finalbuild
ARG JAVA_HOME
ARG SPARK_HOME
ARG SPARK_PACKAGE
WORKDIR /opt/

# get artifacts from previous stages
COPY --from=jdk $JAVA_HOME $JAVA_HOME
COPY --from=spark $SPARK_HOME $SPARK_HOME

# Setup JAVA_HOME and SPARK_HOME, useful for the docker command line
ENV JAVA_HOME $JAVA_HOME
ENV SPARK_HOME $SPARK_HOME

# setup paths
ENV PATH $PATH:$JAVA_HOME/bin
ENV PATH $PATH:$SPARK_HOME/bin
ENV PYTHONPATH $PYTHONPATH:$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip

# Expose ports
# EXPOSE 9200
# EXPOSE 9300

# Define mountable directories.
#VOLUME ["/data"]

## give permission to entire setup directory
RUN useradd newuser --create-home --shell /bin/bash && \
    echo 'newuser:newpassword' | chpasswd && \
    chown -R newuser $SPARK_HOME $JAVA_HOME && \
    chown -R newuser:newuser /home/newuser && \
    chmod 755 /home/newuser
#chown -R newuser:newuser /home/newuser
#chown -R newuser /home/newuser && \

# Install Python
RUN apt-get update && \
    apt-get install -yq curl && \
    apt-get install -yq vim && \
    apt-get install -yq python3.9

## Install PySpark and Numpy
#RUN \
#    pip install --upgrade pip && \
#    pip install numpy && \
#    pip install pyspark

USER newuser
WORKDIR /home/newuser
# RUN chown -R newuser /home/newuser

Adding the following environment variables to the container fixed it. When PYSPARK_PYTHON is not set, Spark falls back to launching an executable named "python", and this image only installs python3.9, so there is no "python" on the PATH:
export PYSPARK_PYTHON=/usr/bin/python3.9
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.9
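To bake the fix into the image instead of exporting it in every shell, the same variables can be set in the final build stage of the Dockerfile above. A minimal sketch (assuming apt's python3.9 lands at /usr/bin/python3.9, as it does here):

# Point Spark at the installed interpreter so spark-submit can find it
ENV PYSPARK_PYTHON=/usr/bin/python3.9
ENV PYSPARK_DRIVER_PYTHON=/usr/bin/python3.9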

Related

How to make Python version executables global across multiple pyenv-virtualenv virtual environments

A pyenv Python version (e.g. 3.10.4) has the "normal" expected Python executables associated with it (e.g., pip, 2to3, pydoc)
$ ls "${PYENV_ROOT}/versions/3.10.4/bin"
2to3 idle idle3.10 pip3 pydoc pydoc3.10 python-config python3-config python3.10-config
2to3-3.10 idle3 pip pip3.10 pydoc3 python python3 python3.10 python3.10-gdb.py
and a pyenv-virtualenv virtual environment has only the executables that one would get inside the virtual environment directory structure
$ pyenv virtualenv 3.10.4 venv
$ ls "${PYENV_ROOT}/versions/venv"
bin include lib lib64 pyvenv.cfg
$ ls "${PYENV_ROOT}/versions/venv/bin/"
Activate.ps1 activate activate.csh activate.fish pip pip3 pip3.10 pydoc python python3 python3.10
by default after creation, the venv virtual environment doesn't know about the executables of the Python version it is associated with, like 2to3
$ pyenv activate venv
(venv) $ 2to3 --help
pyenv: 2to3: command not found
The `2to3' command exists in these Python versions:
3.10.4
Note: See 'pyenv help global' for tips on allowing both
python2 and python3 to be found.
so to allow a virtual environment like venv to have access to these executables, you add both it and the Python version that created it to pyenv global, so that pyenv "falls back" to the Python version when an executable isn't found in the virtual environment
(venv) $ pyenv deactivate
$ pyenv global venv 3.10.4
(venv) $ pyenv global
venv
3.10.4
(venv) $ 2to3 --help | head -n 3
Usage: 2to3 [options] file|dir ...
Options:
This pattern works for one virtual environment, but how do you maintain access to executables like 2to3 (or pipx, as seen below) when you have multiple virtual environments in play?
(venv) $ pyenv virtualenv 3.10.4 example && pyenv activate example
(example) $ 2to3
pyenv: 2to3: command not found
The `2to3' command exists in these Python versions:
3.10.4
Note: See 'pyenv help global' for tips on allowing both
python2 and python3 to be found.
Reproducible example
Using the following Dockerfile
FROM debian:bullseye
SHELL ["/bin/bash", "-c"]
USER root
RUN apt-get update -y && \
apt-get install --no-install-recommends -y \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
xz-utils \
tk-dev \
libxml2-dev \
libxmlsec1-dev \
libffi-dev \
liblzma-dev \
g++ && \
apt-get install -y \
git && \
apt-get -y clean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/*
# Install pyenv and pyenv-virtualenv
ENV PYENV_RELEASE_VERSION=2.3.0
RUN git clone --depth 1 https://github.com/pyenv/pyenv.git \
--branch "v${PYENV_RELEASE_VERSION}" \
--single-branch \
~/.pyenv && \
pushd ~/.pyenv && \
src/configure && \
make -C src && \
echo 'export PYENV_ROOT="${HOME}/.pyenv"' >> ~/.bashrc && \
echo 'export PATH="${PYENV_ROOT}/bin:${PATH}"' >> ~/.bashrc && \
echo 'eval "$(pyenv init -)"' >> ~/.bashrc && \
. ~/.bashrc && \
git clone --depth 1 https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv && \
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
# Install CPython
ENV PYTHON_VERSION=3.10.4
RUN . ~/.bashrc && \
echo "Install Python ${PYTHON_VERSION}" && \
PYTHON_MAKE_OPTS="-j8" pyenv install "${PYTHON_VERSION}"
# Make 'base' virtual environment, add it and its Python version to global for
# executables like 2to3 or pipx to be findable
# c.f. https://github.com/pyenv/pyenv-virtualenv/issues/16#issuecomment-37640961
# and then install pipx into the 'base' virtual environment and use pipx to install
# pepotron
RUN . ~/.bashrc && \
pyenv virtualenv "${PYTHON_VERSION}" base && \
echo "" && echo "Python ${PYTHON_VERSION} has additional executables..." && \
ls -lh "${PYENV_ROOT}/versions/${PYTHON_VERSION}/bin" && \
echo "" && echo "...compared to 'base' virtualenv made with Python ${PYTHON_VERSION}" && \
ls -lh "${PYENV_ROOT}/versions/base/bin" && \
echo "" && echo "...because 'base' is actually a symlink" && \
ls -lh "${PYENV_ROOT}/versions/" && \
pyenv global base "${PYTHON_VERSION}" && \
python -m pip --quiet install --upgrade pip setuptools wheel && \
python -m pip --quiet install pipx && \
python -m pipx ensurepath && \
eval "$(register-python-argcomplete pipx)" && \
pipx install pepotron
WORKDIR /home/data
built with
docker build . --file Dockerfile --tag pyenv/multiple-virtualenvs:debug
it can be run with the following to demonstrate the problem
$ docker run --rm -ti pyenv/multiple-virtualenvs:debug
(base) root@26dfa530cd82:/home/data# pyenv global
base
3.10.4
(base) root@26dfa530cd82:/home/data# 2to3 --help | head -n 3
Usage: 2to3 [options] file|dir ...
Options:
(base) root@26dfa530cd82:/home/data# pipx list
venvs are in /root/.local/pipx/venvs
apps are exposed on your $PATH at /root/.local/bin
package pepotron 0.6.0, installed using Python 3.10.4
- bpo
- pep
(base) root@26dfa530cd82:/home/data# pep 3.11
https://peps.python.org/pep-0664/
(base) root@26dfa530cd82:/home/data# pyenv virtualenv 3.10.4 example
(base) root@26dfa530cd82:/home/data# pyenv activate example
pyenv-virtualenv: prompt changing will be removed from future release. configure `export PYENV_VIRTUALENV_DISABLE_PROMPT=1' to simulate the behavior.
(example) root@26dfa530cd82:/home/data# 2to3
pyenv: 2to3: command not found
The `2to3' command exists in these Python versions:
3.10.4
Note: See 'pyenv help global' for tips on allowing both
python2 and python3 to be found.
(example) root@26dfa530cd82:/home/data# pipx
pyenv: pipx: command not found
The `pipx' command exists in these Python versions:
3.10.4/envs/base
base
Note: See 'pyenv help global' for tips on allowing both
python2 and python3 to be found.
So how can one make something like pipx, which is designed to be installed globally, work globally when it is installed in a pyenv-virtualenv virtual environment, so that nothing has to be installed with pip in the system Python?
It would seem that instead of ever using pyenv activate to activate a virtual environment, you would need to deactivate any virtual environments and then use only pyenv global <virtual environment name> <virtual environment Python version> to effectively switch environments. I assume that can't be the only way to use Python version executables inside a virtual environment, as that would defeat the point of pyenv-virtualenv having a separate pyenv activate CLI.
You can directly execute the pipx binary in the pyenv prefix; it should work correctly.
Pyenv's shims mechanism isn't really designed for global binaries like this. I naively expected that the global environment would act as fallback when a local environment doesn't have an installed binary, but I think pyenv only looks at the system Python before falling back to $PATH.
So, if you don't install pipx into system (which, if you're not installing a system pip, I doubt you're doing), then the naive fallback doesn't work.
An alternative is to run pyenv with a temporary environment, i.e.
PYENV_VERSION=my-pipx-env pyenv exec pipx.
I'd want to make this an executable, so I'd suggest adding something like this to a special PATH directory that takes precedence over the pyenv paths:
#!/usr/bin/env bash
set -eu
export PYENV_VERSION="pipx"
exec "${PYENV_ROOT}/libexec/pyenv" exec pipx "$#"
Although, I'd be tempted to forgo the whole activation logic and just directly exec the pipx binary from the environment /bin to avoid running into any shell configuration errors.
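For that last approach, a minimal wrapper sketch (assuming pipx was installed into a virtualenv named base, as in the Dockerfile above):

#!/usr/bin/env bash
# bypass pyenv's shims and activation entirely: run the pipx binary
# straight out of the virtualenv's bin directory
set -eu
exec "${PYENV_ROOT}/versions/base/bin/pipx" "$@"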

How to connect to Confluent Cloud from a Docker container

I have set up a Kafka topic on Confluent Cloud (https://confluent.cloud/) and can connect and send messages to the topic using the configuration below:
kafka-config.properties:
# Kafka
ssl.endpoint.identification.algorithm=
bootstrap.servers=pkc-4yyd6.us-east1.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="uname" password="pwd";
sasl.mechanism=PLAIN
Connecting from a Docker container, I receive:
Failed to produce: org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Searching for the above error suggests that setting ssl.endpoint.identification.algorithm to empty should fix it, which the config above already does.
Here is my Dockerfile:
FROM ysihaoy/scala-play:2.12.2-2.6.0-sbt-0.13.15
COPY ["build.sbt", "/tmp/build/"]
COPY ["project/plugins.sbt", "project/build.properties", "/tmp/build/project/"]
COPY . /root/app/
WORKDIR /root/app
CMD ["sbt" , "run"]
I build and run the container using:
docker build -t kafkatest .
docker run -it kafkatest
Is there extra config required to allow connecting to Confluent Kafka?
I do not get this issue when building and running locally (without Docker).
Update:
Here is the Scala src I use to build the properties:
def buildProperties(): Properties = {
  val kafkaPropertiesFile = Source.fromResource("kafka-config.properties")
  val properties: Properties = new Properties
  properties.load(kafkaPropertiesFile.bufferedReader())
  properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
  properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.connect.json.JsonSerializer")
  properties
}
Update 2:
def buildProperties(): Properties = {
  val kafkaPropertiesFile = Source.fromResource("kafka-config.properties")
  val properties: Properties = new Properties
  properties.load(kafkaPropertiesFile.bufferedReader())
  println("bootstrap.servers:" + properties.get("bootstrap.servers"))
  properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
  properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.connect.json.JsonSerializer")
  properties
}
The property bootstrap.servers is found, so the properties file is definitely included in the container.
Update 3:
sasl.jaas.config:org.apache.kafka.common.security.plain.PlainLoginModule required username="Q763KBPRI" password="bFehkfL/J6m8L2aukX+A/L59LAYb/bWr"
Update 4:
docker run -it kafkatest --network host
returns error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"--network\": executable file not found in $PATH": unknown.
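This last error is unrelated to Kafka: docker run treats everything after the image name as the command to run inside the container, so the option has to come before the image name:

docker run -it --network host kafkatest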
Using a different base image resolved the issue, though I'm not sure which difference between the images was responsible. Here is my updated Dockerfile:
ARG OPENJDK_TAG=8u232
FROM openjdk:${OPENJDK_TAG}
ARG SBT_VERSION=1.4.1
# Install sbt
RUN \
mkdir /working/ && \
cd /working/ && \
curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
dpkg -i sbt-$SBT_VERSION.deb && \
rm sbt-$SBT_VERSION.deb && \
apt-get update && \
apt-get install sbt && \
cd && \
rm -r /working/ && \
sbt sbtVersion
COPY ["build.sbt", "/tmp/build/"]
COPY ["project/plugins.sbt", "project/build.properties", "/tmp/build/project/"]
COPY . /root/app/
WORKDIR /root/app
CMD ["sbt" , "run"]

Scaleway scw init Inside Docker Container

I am trying to install the Scaleway CLI as part of a Docker image I'm building to run Azure Pipelines.
My Dockerfile looks like this:
FROM ubuntu:18.04
# To make it easier for build and release pipelines to run apt-get,
# configure apt to not require confirmation (assume the -y argument by default)
ENV DEBIAN_FRONTEND=noninteractive
RUN echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl \
jq \
git \
iputils-ping \
libcurl4 \
libicu60 \
libunwind8 \
netcat \
docker.io \
s3cmd
# Install Scaleway CLI
RUN curl -o /usr/local/bin/scw -L "https://github.com/scaleway/scaleway-cli/releases/download/v2.1.0/scw-2-1-0-linux-x86_64"
RUN chmod +x /usr/local/bin/scw
# Add config for Scaleway CLI
RUN mkdir -p ./config
RUN mkdir -p ./config/scw
COPY ./config/config.yaml $HOME/.config/scw/config.yaml
RUN scw init
# Add private key for SSH connections
COPY ./config/id_rsa $HOME/.ssh/id_rsa
# Config s3cmd
COPY ./config/.s3cfg $HOME/.s3cfg
WORKDIR /azp
COPY ./start.sh .
RUN chmod +x start.sh
CMD ["./start.sh"]
The key section being:
# Install Scaleway CLI
RUN curl -o /usr/local/bin/scw -L "https://github.com/scaleway/scaleway-cli/releases/download/v2.1.0/scw-2-1-0-linux-x86_64"
RUN chmod +x /usr/local/bin/scw
# Add config for Scaleway CLI
RUN mkdir -p ./config
RUN mkdir -p ./config/scw
COPY ./config/config.yaml $HOME/.config/scw/config.yaml
RUN scw init
The config.yaml file referenced above looks like the following (minus the real values of course):
access_key: <key>
secret_key: <secret>
default_organization_id: <orgId>
default_project_id: <projectId>
default_region: nl-ams
default_zone: nl-ams-1
However, when it executes RUN scw init, the output is Invalid email or secret-key: ''
I have tried without running scw init at all, but then calls to scw fail, saying
Access key is required
Details: Access_key can be initialised using the command "scw init".
After initialisation, there are three ways to provide access_key:
with the Scaleway config file, in the access_key key: /root/.config/scw/config.yaml;
with the SCW_ACCESS_KEY environement variable;
Note that the last method has the highest priority.
More info:
https://github.com/scaleway/scaleway-sdk-go/tree/master/scw#scaleway-config
Hint: You can get your credentials here:
https://console.scaleway.com/account/credentials
Which, admittedly, is one of the better error messages I've seen, but it nonetheless hasn't helped me. I am going to try the environment-variable approach, which I suspect may do the trick, but I'd still like to know what I'm doing wrong with this config.yaml file.
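For reference, a minimal sketch of that environment-variable approach (SCW_ACCESS_KEY is named in the error message above, and SCW_SECRET_KEY is its counterpart in the linked scaleway-sdk-go config docs; note that passing secrets as build args bakes them into the image layers, so treat this as a sketch, not a production setup):

# supply at build time:
#   docker build --build-arg SCW_ACCESS_KEY=... --build-arg SCW_SECRET_KEY=... .
ARG SCW_ACCESS_KEY
ARG SCW_SECRET_KEY
ENV SCW_ACCESS_KEY=${SCW_ACCESS_KEY}
ENV SCW_SECRET_KEY=${SCW_SECRET_KEY}
# with these set, scw commands can authenticate without running scw init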
Lastly... someone with more rep than me needs to create the tag "scaleway". Hard to reference the actual technology in question when the tag doesn't exist.

Prepare coursier artifact for offline use inside container

I have an sbt project producing my artifact xyz.
I would like to put it, along with all its dependencies, into a Docker container so that it can be launched with
coursier launch --mode offline xyz
The important part is that the preparation should make use of the local coursier cache from the host.
I tried
executing sbt publishLocal,
then resolving my artifact's dependencies (coursier resolve xyz),
then preparing two directories - local & cache - by copying the resolved artifacts into them,
then copying those directories into the Docker container (as the coursier cache and ivy local, respectively).
This didn't work because coursier doesn't list .pom and .xml files in its output. I tried copying whole directories (abc/1.0.0 instead of abc/1.0.0/some.jar), but AFAIK there is no reliable way to know how many directories up one has to go, because Maven and Ivy have different directory structures.
while my use case is not quite identical to yours, I figured I'd write up my findings; maybe my solution works for you as well!
here's my sample dockerfile; I used this to install scalafmt in an offline-compatible way
FROM ubuntu:jammy
RUN : \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* # */ stackoverflow highlighting bug
ARG CS=v2.1.0-RC4
ARG CS_SHA256=176e92e08ab292531aa0c4993dbc9f2c99dec79578752f3b9285f54f306db572
ARG JDK_SHA256=aef49cc7aa606de2044302e757fa94c8e144818e93487081c4fd319ca858134b
ENV PATH=/opt/coursier/bin:$PATH
RUN : \
&& curl --location --silent --output /tmp/cs.gz "https://github.com/coursier/coursier/releases/download/${CS}/cs-x86_64-pc-linux.gz" \
&& echo "${CS_SHA256} /tmp/cs.gz" | sha256sum --check \
&& curl --location --silent --output /tmp/jdk.tgz "https://download.java.net/openjdk/jdk17/ri/openjdk-17+35_linux-x64_bin.tar.gz" \
&& echo "${JDK_SHA256} /tmp/jdk.tgz" | sha256sum --check \
&& mkdir -p /opt/coursier \
&& tar --strip-components=1 -C /opt/coursier -xf /tmp/jdk.tgz \
&& gunzip /tmp/cs.gz \
&& mv /tmp/cs /opt/coursier/bin \
&& chmod +x /opt/coursier/bin/cs \
&& rm /tmp/jdk.tgz
ENV COURSIER_CACHE=/opt/.cs-cache
RUN : \
&& cs fetch scalafmt:3.6.1 \
&& cs install scalafmt:3.6.1 --dir /opt/wd/bin
the key to offline execution for me was to use cs fetch and set COURSIER_CACHE
here's the offline execution succeeding:
$ docker run --net=none --rm -ti cs /opt/wd/bin/scalafmt --version
scalafmt 3.6.1
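the same pattern should carry over to a locally published sbt artifact; a hedged sketch, where com.example::xyz:1.0.0 stands in for your real coordinates (coursier resolves from the ivy2 local repository by default, so the host's ~/.ivy2/local produced by sbt publishLocal is copied in first):

# bring in the locally published artifact, then warm the coursier cache
COPY .ivy2/local /root/.ivy2/local
ENV COURSIER_CACHE=/opt/.cs-cache
RUN cs fetch com.example::xyz:1.0.0
# afterwards this should work with no network:
#   cs launch --mode offline com.example::xyz:1.0.0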

Jar file is not executed with Dockerfile

I am building a Docker image of my application, and I would like to run the jar file when I run the Docker image. However, I get this error:
Could not find or load main class
The main class is set in the manifest file of the jar. If I run the jar file from a terminal or a bash script, it works fine, so this error is only observed when running through Docker:
docker run -v my-volume:/workdir container-name
Are there some configurations missing in my Dockerfile, or should the jar file be copied/ADDed into the image?
Here is my Dockerfile:
FROM java:8
ENV SCALA_VERSION 2.11.8
ENV SBT_VERSION 1.1.1
ENV SPARK_VERSION 2.2.0
ENV SPARK_DIST spark-$SPARK_VERSION-bin-hadoop2.6
ENV SPARK_ARCH $SPARK_DIST.tgz
WORKDIR /opt
# Install Scala
RUN \
cd /root && \
curl -o scala-$SCALA_VERSION.tgz http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz && \
tar -xf scala-$SCALA_VERSION.tgz && \
rm scala-$SCALA_VERSION.tgz && \
echo >> /root/.bashrc && \
echo 'export PATH=~/scala-$SCALA_VERSION/bin:$PATH' >> /root/.bashrc
# Install SBT
RUN \
curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
dpkg -i sbt-$SBT_VERSION.deb && \
rm sbt-$SBT_VERSION.deb
# Install Spark
RUN \
cd /opt && \
curl -o $SPARK_ARCH http://d3kbcqa49mib13.cloudfront.net/$SPARK_ARCH && \
tar xvfz $SPARK_ARCH && \
rm $SPARK_ARCH && \
echo 'export PATH=$SPARK_DIST/bin:$PATH' >> /root/.bashrc
EXPOSE 9851 9852 4040 9092 9200 9300 5601 7474 7687 7473
VOLUME /workdir
CMD java -cp "target/scala-2.11/demo_consumer.jar" consumer.SparkConsumer
I believe this is because the command executed in the Docker container is not run from the right folder: the CMD uses the relative classpath target/scala-2.11/demo_consumer.jar, but the image's working directory is /opt. You could try executing the command from the workdir where the jar actually lives:
docker run -v my-volume:/workdir -w /workdir container_name
If that does not work, you could inspect what's inside the container, either with ls:
docker run -v my-volume:/workdir -w /workdir container_name bash -c 'ls -lah'
Or by accessing its bash session:
docker run -v my-volume:/workdir -w /workdir container_name bash
p.s: if bash does not work, try with sh.
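An alternative sketch is to avoid the relative classpath altogether and point the CMD at the jar's absolute location (assuming the jar ends up in the mounted volume at /workdir, per the docker run -v flag above):

# absolute classpath: works regardless of the working directory
CMD java -cp "/workdir/target/scala-2.11/demo_consumer.jar" consumer.SparkConsumer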