If I run the following commands in an ubuntu:20.04 Docker image (docker run -it --rm ubuntu:20.04 bash):
apt update
apt upgrade -y
apt install -y wget
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
source /root/.bashrc
export LD_LIBRARY_PATH=/root/miniconda3/lib:$LD_LIBRARY_PATH
then libp11-kit breaks. For instance, when running apt install vim:
/usr/lib/apt/methods/http: symbol lookup error: /lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0
E: Method http has died unexpectedly!
E: Sub-process http returned an error code (127)
E: Method /usr/lib/apt/methods/http did not start correctly
I tried adding other directories to LD_LIBRARY_PATH (/usr/lib/, /lib/x86_64-linux-gnu/) without success.
Possibly related to "conda-build fails to recognise libraries"?
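A common workaround, assuming the clash is between conda's bundled libffi and the system libffi that libp11-kit was linked against, is to stop exporting conda's lib directory globally and instead scope it to the individual commands that need it. A minimal sketch (my_script.py is a hypothetical placeholder):
unset LD_LIBRARY_PATH                 # apt's helpers resolve the system libffi again
apt install -y vim                    # works once conda's lib dir is out of the search path
# prepend conda's lib dir only for the one program that actually needs it
LD_LIBRARY_PATH=/root/miniconda3/lib python my_script.py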
Installing the following using pip in a Colab notebook:
pip install torch==1.10.1+cu102 \
torchvision==0.11.2+cu102 \
torchaudio==0.10.1 \
-f https://download.pytorch.org/whl/torch_stable.html \
detectron2==0.6 \
-f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html \
git+https://github.com/facebookresearch/detectron2@main#subdirectory=projects/DensePose
After restarting the runtime, I run:
!python ./apply_net.py dump densepose_rcnn_R_101_FPN_DL_s1x.yaml \
  https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_101_FPN_DL_s1x/165712116/model_final_844d15.pkl \
  ./Tejrab/200404.png --output ./dump/200404.pkl -v
Expected behavior:
a .pkl file of the .png image
Error:
Traceback (most recent call last):
File "./apply_net.py", line 19, in
from densepose import add_densepose_config
ModuleNotFoundError: No module named 'densepose'
After installing densepose, a second error occurred:
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-g43tc9ql/detectron2_0a570229969d47198f45a267ccc78493/setup.py'"'"'; __file__='"'"'/tmp/pip-install-g43tc9ql/detectron2_0a570229969d47198f45a267ccc78493/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-px39tp0q/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7/detectron2 Check the logs for full command output.
PS: a new commit to apply_net.py has been added by the authors (c54429b); could it be the reason?
!pip install pyyaml==5.1
!pip install ninja
!pip install av
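A sketch worth trying rather than a confirmed fix: detectron2 and DensePose are separate packages, and the second error above shows a setup.py build failing during the install, so from densepose import add_densepose_config can only work once DensePose installs cleanly on its own. Installing it in a separate step (same URLs as above) makes any failure easier to spot:
pip install detectron2==0.6 \
  -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
pip install "git+https://github.com/facebookresearch/detectron2@main#subdirectory=projects/DensePose"
python -c "import densepose"   # sanity check before running apply_net.py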
Collecting psycopg2
  Using cached psycopg2-2.8.5.tar.gz (380 kB)
ERROR: Command errored out with exit status 1:
  command: /home/ubuntu/egrdb/env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-gn70jweq/psycopg2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-gn70jweq/psycopg2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-4tr67ll8
  cwd: /tmp/pip-install-gn70jweq/psycopg2/
  Complete output (7 lines):
  running egg_info
  creating /tmp/pip-pip-egg-info-4tr67ll8/psycopg2.egg-info
  writing /tmp/pip-pip-egg-info-4tr67ll8/psycopg2.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-4tr67ll8/psycopg2.egg-info/dependency_links.txt
  writing top-level names to /tmp/pip-pip-egg-info-4tr67ll8/psycopg2.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-4tr67ll8/psycopg2.egg-info/SOURCES.txt'
  Error: b'You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.\n'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
If you want to install the source version psycopg2-2.8.5.tar.gz, then you will need to do as @Chris says. The simpler way, though, is to do:
pip install psycopg2-binary
Then you get a pre-compiled version and you don't need the -dev packages.
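If you do need the source build, here is a sketch of the route the error message itself points to (package names assume a Debian/Ubuntu host; python3-dev is an assumption for building the C extension):
sudo apt install -y libpq-dev python3-dev   # headers named in the error message
pip install psycopg2==2.8.5                 # now builds the C extension from source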
I launch an EMR cluster with boto3 from a separate EC2 instance and use a bootstrap script that looks like this:
#!/bin/bash
############################################################################
#For all nodes including master #########
############################################################################
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p /mnt1/anaconda3
export PATH=/mnt1/anaconda3/bin:$PATH
echo "export PATH="/mnt1/anaconda3/bin:$PATH"" >> ~/.bash_profile
sudo sed -i -e '$a\export PYSPARK_PYTHON=/mnt1/anaconda3/bin/python' /etc/spark/conf/spark-env.sh
echo "export PYSPARK_PYTHON="/mnt1/anaconda3/bin/python3"" >> ~/.bash_profile
conda install -c conda-forge -y shap
conda install -c conda-forge -y lightgbm
conda install -c anaconda -y numpy
conda install -c anaconda -y pandas
conda install -c conda-forge -y pyarrow
conda install -c anaconda -y boto3
############################################################################
#For master #########
############################################################################
if [ `grep 'isMaster' /mnt/var/lib/info/instance.json | awk -F ':' '{print $2}' | awk -F ',' '{print $1}'` = 'true' ]; then
sudo sed -i -e '$a\export PYSPARK_PYTHON=/mnt1/anaconda3/bin/python' /etc/spark/conf/spark-env.sh
echo "export PYSPARK_PYTHON="/mnt1/anaconda3/bin/python3"" >> ~/.bash_profile
sudo yum -y install git-core
conda install -c conda-forge -y jupyterlab
conda install -y jupyter
conda install -c conda-forge -y s3fs
conda install -c conda-forge -y nodejs
pip install spark-df-profiling
jupyter labextension install jupyterlab_filetree
jupyter labextension install @jupyterlab/toc
fi
Then I add a step programmatically to the running cluster using add_job_flow_steps:
action = conn.add_job_flow_steps(JobFlowId=curr_cluster_id, Steps=layer_function_steps)
The step is a spark-submit call that is perfectly formed.
In one of the imported Python files I import boto3. The error I get is:
ImportError: No module named boto3
Clearly I am installing this library. If I SSH into the master node and run
python
import boto3
it works fine. Is there some kind of issue with spark-submit using the installed libraries, since I am doing a conda install?
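A diagnostic sketch (not from the original post), reusing the paths from the bootstrap script above, to check whether spark-submit resolves the conda Python that boto3 was installed into rather than the system one:
# run on the master node
which python                                      # the shell's python after .bash_profile
grep PYSPARK_PYTHON /etc/spark/conf/spark-env.sh  # the python Spark hands to executors
/mnt1/anaconda3/bin/python -c "import boto3; print(boto3.__file__)"
/usr/bin/python -c "import boto3" || echo "system python has no boto3"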
AWS has a project (AWS Data Wrangler) that helps with launching EMR clusters.
This snippet should work to launch your cluster with Python 3 enabled:
import awswrangler as wr
cluster_id = wr.emr.create_cluster(
cluster_name="wrangler_cluster",
logging_s3_path=f"s3://BUCKET_NAME/emr-logs/",
emr_release="emr-5.28.0",
subnet_id="SUBNET_ID",
emr_ec2_role="EMR_EC2_DefaultRole",
emr_role="EMR_DefaultRole",
instance_type_master="m5.xlarge",
instance_type_core="m5.xlarge",
instance_type_task="m5.xlarge",
instance_ebs_size_master=50,
instance_ebs_size_core=50,
instance_ebs_size_task=50,
instance_num_on_demand_master=1,
instance_num_on_demand_core=1,
instance_num_on_demand_task=1,
instance_num_spot_master=0,
instance_num_spot_core=1,
instance_num_spot_task=1,
spot_bid_percentage_of_on_demand_master=100,
spot_bid_percentage_of_on_demand_core=100,
spot_bid_percentage_of_on_demand_task=100,
spot_provisioning_timeout_master=5,
spot_provisioning_timeout_core=5,
spot_provisioning_timeout_task=5,
spot_timeout_to_on_demand_master=True,
spot_timeout_to_on_demand_core=True,
spot_timeout_to_on_demand_task=True,
python3=True, # Relevant argument
spark_glue_catalog=True,
hive_glue_catalog=True,
presto_glue_catalog=True,
bootstraps_paths=["s3://BUCKET_NAME/bootstrap.sh"], # Relevant argument
debugging=True,
applications=["Hadoop", "Spark", "Ganglia", "Hive"],
visible_to_all_users=True,
key_pair_name=None,
spark_jars_path=[f"s3://...jar"],
maximize_resource_allocation=True,
keep_cluster_alive_when_no_steps=True,
termination_protected=False,
spark_pyarrow=True, # Relevant argument
tags={
"foo": "boo"
}
)
bootstrap.sh content:
#!/usr/bin/env bash
set -e
echo "Installing Python libraries..."
sudo pip-3.6 install -U awswrangler
sudo pip-3.6 install -U LIBRARY1
sudo pip-3.6 install -U LIBRARY2
...
I'm trying to install a new library in Swift on Google Colab:
%install '.package(url: "https://github.com/IBM-Swift/BlueCryptor.git", from: "1.0.28")' Cryptor
Then there is an error:
...
error: toolchain is invalid: could not find the `swiftc` at expected path /swift/toolchain/usr/bin/swiftc
Install Error: swift-build returned nonzero exit code 1.
But I checked, and swiftc does exist in /swift/toolchain/usr/bin.
Here's a Colab notebook that demonstrates the error.
Please help.
I updated it to the latest Swift version by running this notebook:
https://colab.research.google.com/github/tensorflow/swift/blob/master/notebooks/install_latest_swift.ipynb
There's no error anymore. So you just need to update it.
You can also do it all from within the Swift notebook, using this code:
import Python
Python.import("subprocess").getoutput("""
rm -rf /swift
mkdir -p /swift/toolchain
wget -nv -O- https://storage.googleapis.com/s4tf-kokoro-artifact-testing/latest/swift-tensorflow-DEVELOPMENT-cuda10.0-cudnn7-ubuntu18.04.tar.gz | tar xzf - -C /swift/toolchain
wget -nv -O- https://storage.googleapis.com/s4tf-kokoro-artifact-testing/latest/swift-jupyter.tar.gz | tar xzf - -C /swift
python3 /swift/swift-jupyter/register.py --swift-toolchain /swift/toolchain
apt-get install libblocksruntime-dev
""")