ModuleNotFoundError thrown by using pytest in virtualenv in python 3.10 - pytest

I am new to pytest and tried to write a first test case. Since I am new to programming overall, I developed without a virtual environment (naughty, naughty) using the globally installed python version 3.9.13.
The structure of my program is like this:
mypkg/
    sub_mypkg/
        file_a.py
    pytest.ini
    testing/
        __init__.py
        test_a.py
in which file_a.py imports pandas among other modules. The tests in test_a.py try, among other things, to run file_a.py. pytest.ini adds the root_dir to the PYTHONPATH.
In this setup pytest ran smoothly without any errors.
I then installed a virtualenv using python 3.11 in this project and installed all necessary modules (including pytest) in it and uninstalled pytest globally. After activating the virtualenv and running pytest from the terminal a ModuleNotFoundError was thrown for pandas.
Here is a list of the modules I use in venv:
Package Version
--------------- -------
attrs 22.1.0
colorama 0.4.6
contourpy 1.0.6
cycler 0.11.0
et-xmlfile 1.1.0
exceptiongroup 1.0.4
fonttools 4.38.0
iniconfig 1.1.1
kiwisolver 1.4.4
matplotlib 3.6.2
numpy 1.23.4
openpyxl 3.0.10
packaging 21.3
pandas 1.5.1
Pillow 9.3.0
pip 22.3.1
pluggy 1.0.0
pyparsing 3.0.9
pytest 7.2.0
python-dateutil 2.8.2
pytz 2022.6
PyYAML 6.0
setuptools 56.0.0
six 1.16.0
tomli 2.0.1
I checked that pandas was installed and I could import pandas in the REPL.
Also I could run the test line by line in the REPL.
Furthermore, I checked that the problem was not pandas itself. If I swapped the position of import pandas with, for example, import numpy in file_a.py, the ModuleNotFoundError was thrown for numpy instead.
I tried different versions of python in my venv (python 3.11.0, 3.10.7, 3.9.13, 3.9.6). Interestingly, pytest only ran inside the venv for python 3.9.13 (the version I developed with).
I tried adding and deleting __init__.py files in all directories and also tried different versions of pytest (6.2.5, 7.0.0).
I checked that sys.path included the right root_dir.
Thanks in advance for your answers!
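A check that may help narrow this down is to confirm which interpreter pytest actually runs under and whether that interpreter can locate the module. The sketch below is only a diagnostic under stated assumptions, not a confirmed fix: the helper name describe_interpreter is hypothetical, and it reports the running executable, whether it sits inside a virtual environment, and whether a given module (pandas here) can be found from it.

```python
import importlib.util
import sys


def describe_interpreter(module_name="pandas"):
    """Report which Python is running and whether it can locate module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        # In a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Printing describe_interpreter() from inside a test (run with pytest -s) and again from the REPL shows whether both use the same executable. Running python -m pytest instead of pytest makes the test run use whichever interpreter python resolves to in the activated environment.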

Related

Conda not using latest package version

I am using Anaconda (latest version as of September). I have executed 'conda update --all' to get the latest packages, in particular Scipy. But when I import scipy from a python shell, it only imports version 1.7.3, not the latest one (1.9.1).
I am not fluent with the conda framework. I tried running conda install scipy, which did not change anything, and conda install scipy=1.9.1 (which seems to hang).
I am using the base environment. This is a fresh install of the latest Anaconda (with package updates via conda only, no use of pip that could interfere).
Listing the packages via conda yields:
>>> conda list scipy
# packages in environment at /home/***/anaconda3:
#
# Name Version Build Channel
scipy 1.7.3 py39hc147768_0
However, when I look at the content of the anaconda3/pkgs folder:
>>> ls anaconda3/pkgs/ | grep scipy
anaconda3/pkgs/scipy-1.7.3-py39hc147768_0.conda
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0.conda
anaconda3/pkgs/scipy-1.7.3-py39hc147768_0 (contains Scipy 1.7.3)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0 (contains Scipy 1.9.1)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_09jfxaf1g (empty folder)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0dydy5wnw (empty folder)
So I am assuming that conda has both Scipy 1.7.3 and 1.9.1. But why can't I import the latest one?
How can I correct this situation?
EDIT: Creating a new environment and reinstalling the packages as needed solves my problem. However, why is the base environment stuck with the earlier version?
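To see which copy of a package the interpreter actually imports, it can help to print the installed version together with the file it was loaded from. This is a minimal sketch; where_is is a hypothetical helper name, and it assumes it is run with the same Python that reports the old version.

```python
import importlib
import importlib.metadata


def where_is(module_name, dist_name=None):
    """Return (installed version or None, file path or None) for a module."""
    module = importlib.import_module(module_name)
    try:
        version = importlib.metadata.version(dist_name or module_name)
    except importlib.metadata.PackageNotFoundError:
        # No installed distribution metadata (e.g. a standard-library module).
        version = None
    return version, getattr(module, "__file__", None)


if __name__ == "__main__":
    try:
        print(where_is("scipy"))
    except ImportError:
        print("scipy is not importable from this interpreter")
```

The anaconda3/pkgs folder is conda's package cache, so a version appearing there does not by itself mean it is linked into the environment. The path returned above points at the copy that is actually in use.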

Package list in EMR master node versus package list in EMR Notebook

I have one EMR cluster up and running. In it, I have one Jupyter Notebook with pyspark kernel.
I am able to SSH into the master node and install Python packages there easily, such as:
pip install pandas
which I can then verify with pip freeze.
However, when I go to the pyspark notebook and call sc.list_packages(), I see a different list of packages. Some packages have a different version than on the master node. Some packages (such as pandas) do not appear at all.
Here is the list of pip freeze in master node SSH.
aws-cfn-bootstrap==2.0
beautifulsoup4==4.9.1
boto==2.49.0
click==7.1.2
Cython==0.29.30
docutils==0.14
jmespath==0.10.0
joblib==0.15.1
lockfile==0.11.0
lxml==4.5.1
mysqlclient==1.4.2
nltk==3.5
nose==1.3.4
numpy==1.21.6
pandas==1.3.5
py-dateutil==2.2
py4j==0.10.9.5
pybind11==2.9.2
pyspark==3.3.0
pystache==0.5.4
python-daemon==2.2.3
python-dateutil==2.8.2
python37-sagemaker-pyspark==1.3.0
pytz==2020.1
PyYAML==5.3.1
regex==2020.6.8
scipy==1.7.3
simplejson==3.2.0
six==1.13.0
soupsieve==1.9.5
tqdm==4.46.1
windmill==1.6
And here is the package list in the PySpark notebook using sc.list_packages():
aws-cfn-bootstrap (2.0)
beautifulsoup4 (4.9.1)
boto (2.49.0)
click (7.1.2)
docutils (0.14)
jmespath (0.10.0)
joblib (0.15.1)
lockfile (0.11.0)
lxml (4.5.1)
mysqlclient (1.4.2)
nltk (3.5)
nose (1.3.4)
numpy (1.16.5)
pip (9.0.1)
py-dateutil (2.2)
pystache (0.5.4)
python-daemon (2.2.3)
python37-sagemaker-pyspark (1.3.0)
pytz (2020.1)
PyYAML (5.3.1)
regex (2020.6.8)
setuptools (28.8.0)
simplejson (3.2.0)
six (1.13.0)
soupsieve (1.9.5)
tqdm (4.46.1)
UNKNOWN (1.3.5)
wheel (0.29.0)
windmill (1.6)
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
You are using pip version 9.0.1, however version 22.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Note that pandas, scipy and pip are different. Why are they different? How do I upgrade or update the list in the PySpark notebook?
Log into the master node and run sudo docker ps -a. You should see a container named something like emr/jupyter-notebook:6.0.3 and that's where your Jupyter Notebook is running; it is not running in the master node.
If you install packages in the master node, the Jupyter Notebook will not see them. This is why your packages do not match. To install packages in the Jupyter Notebook I use a requirements file, which lists the packages I want to install, and invoke a bootstrap action script that installs those packages. An important detail: if you specify a package version, it must be supported by the Python version running in the container. To find out which Python version that is, run the following in a Jupyter Notebook cell:
import sys
print(sys.version)
To find the latest package versions that work with a specific version of Python, I highly recommend using Anaconda. For example:
conda create --name requests python=3.7.9 matplotlib
will tell me the latest version of matplotlib that works with Python 3.7.9
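To compare the two listings from the question systematically rather than by eye, a small parser can help. The sketch below is an illustration only: parse_packages and diff_packages are hypothetical helper names, and the parser assumes just the two line shapes seen above (name==version from pip freeze, and name (version) from the legacy pip list format). Other lines, such as the pip warnings, are ignored.

```python
import re


def parse_packages(listing):
    """Parse 'name==1.0' (pip freeze) or 'name (1.0)' (legacy pip list) lines."""
    pattern = re.compile(r"^(\S+?)\s*(?:==|\()\s*([^\s)]+)\)?$")
    packages = {}
    for line in listing.splitlines():
        match = pattern.match(line.strip())
        if match:
            packages[match.group(1).lower()] = match.group(2)
    return packages


def diff_packages(left, right):
    """Return {name: (left_version, right_version)} for entries that differ."""
    names = sorted(set(left) | set(right))
    return {
        name: (left.get(name), right.get(name))
        for name in names
        if left.get(name) != right.get(name)
    }


if __name__ == "__main__":
    master = parse_packages("numpy==1.21.6\npandas==1.3.5\nsix==1.13.0")
    notebook = parse_packages("numpy (1.16.5)\npip (9.0.1)\nsix (1.13.0)")
    for name, versions in diff_packages(master, notebook).items():
        print(name, versions)
```

A version of None on one side means the package is missing from that listing.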

python module not accessible from EMR notebook

I am using an EMR notebook attached to my cluster for some experimentation purposes. I needed to install some python modules for testing, specifically spacy and its data module en_core_web_sm.
I ssh'ed into the master and core nodes and installed the modules on each node individually. However, I am not able to import them from my EMR notebook. I get the following error:
An error was encountered:
No module named 'spacy'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'spacy'
I know there is a way to install them just for the scope of the EMR notebook, but this wouldn't suffice in a production scenario, so please avoid answers that suggest notebook-scoped installation as described in this guide: https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/
Please let me know if I am missing some setup steps. Appreciate your response.
You can use bootstrap actions to install additional modules while creating your EMR cluster:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html
I was able to solve this by changing the bootstrap script to use sudo instead of --user. (You could also run the commands below manually.)
Before I was running
pip3 install spacy --user
python3 -m spacy download en --user
I changed that script to
sudo pip3 install spacy
sudo python3 -m spacy download en
To verify this solution quickly, run the following command from your EMR notebook (before and after the change, to compare):
sc.list_packages()
You should see an output similar to this
SparkSession available as 'spark'.
Package Version
-------------------------- ----------
beautifulsoup4 4.9.0
blis 0.4.1
boto 2.49.0
catalogue 1.0.0
certifi 2020.4.5.2
chardet 3.0.4
cymem 2.0.3
en-core-web-sm 2.3.0
idna 2.9
importlib-metadata 1.6.1
jmespath 0.9.5
lxml 4.5.0
murmurhash 1.0.2
mysqlclient 1.4.2
nltk 3.4.5
nose 1.3.4
numpy 1.16.5
pip 9.0.1
plac 1.1.3
preshed 3.0.2
py-dateutil 2.2
python37-sagemaker-pyspark 1.3.0
pytz 2019.3
PyYAML 5.3.1
requests 2.24.0
setuptools 28.8.0
six 1.13.0
soupsieve 1.9.5
spacy 2.3.0
srsly 1.0.2
thinc 7.4.1
tqdm 4.46.1
urllib3 1.25.9
wasabi 0.6.0
wheel 0.29.0
windmill 1.6
zipp 3.1.0
This is not the best possible solution IMO, since the first warning that gets displayed after using sudo is
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
If anyone has a better solution, please feel free to post it.
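One possible explanation for why --user did not work, offered as an assumption rather than something confirmed above: pip install --user writes to the per-user site-packages directory of the account that runs the command, and a notebook kernel running under a different account would not have that directory on its sys.path. The sketch below (user_site_report is a hypothetical helper name) collects the relevant values so the output from the notebook and from an SSH session can be compared.

```python
import site
import sys


def user_site_report():
    """Collect the per-user install location as seen by this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # ENABLE_USER_SITE may be True, False or None; normalise to a bool.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```

If the user_site path differs between the two places, packages installed with --user in one will not be importable from the other.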

How to execute the right Python to import the installed tensorflow.transform package?

The version of my Python is 2.7.13.
I run the following in Jupyter Notebook.
Firstly, I installed the packages
%%bash
pip uninstall -y google-cloud-dataflow
pip install --upgrade --force tensorflow_transform==0.15.0 apache-beam[gcp]
Then,
%%bash
pip freeze | grep -e 'flow\|beam'
I can see that the package tensorflow-transform is installed.
apache-beam==2.19.0
tensorflow==2.1.0
tensorflow-datasets==1.2.0
tensorflow-estimator==2.1.0
tensorflow-hub==0.6.0
tensorflow-io==0.8.1
tensorflow-metadata==0.15.2
tensorflow-probability==0.8.0
tensorflow-serving-api==2.1.0
tensorflow-transform==0.15.0
However, when I tried to import it, I got a warning and an error.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/api/_v1/estimator/__init__.py:12: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
ImportErrorTraceback (most recent call last)
<ipython-input-3-26a4792d0a76> in <module>()
1 import tensorflow as tf
----> 2 import tensorflow_transform as tft
3 import shutil
4 print(tf.__version__)
ImportError: No module named tensorflow_transform
After some investigation, I think I have some ideas of the problem.
I run this:
%%bash
pip show tensorflow_transform| grep Location
This is the output
Location: /home/jupyter/.local/lib/python3.5/site-packages
I tried to modify the $PATH by adding /home/jupyter/.local/lib/python3.5/site-packages to the beginning of $PATH. However, I still failed to import tensorflow_transform.
Based on the above and the following information, I think, when I ran the import command, it executes Python 2.7, not Python 3.5
import sys
print('\n'.join(sys.path))
/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/usr/lib/python2.7/lib-old
/usr/lib/python2.7/lib-dynload
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
/usr/local/lib/python2.7/dist-packages/IPython/extensions
/home/jupyter/.ipython
Also,
import sys
sys.executable
'/usr/bin/python2'
I think the problem is tensorflow_transform package was installed in /home/jupyter/.local/lib/python3.5/site-packages. But when I run "Import", it goes to /usr/local/lib/python2.7/dist-packages to search for the package, rather than /home/jupyter/.local/lib/python3.5/site-packages, so even updating $PATH does not help. Am I right?
I tried to upgrade my python, but
%%bash
pip install upgrade python
Defaulting to user installation because normal site-packages is not writeable
Then, I added --user. It seems that the python is not really upgraded.
%%bash
pip install --user upgrade python
%%bash
python -V
Python 2.7.13
Any solution?
It seems to me that your Jupyter notebook is not using the right Python environment. Perhaps you installed the package under version 3.5 but the notebook uses the other one, so it cannot find the library.
You can pick the other interpreter by clicking on: Python(your version) - bottom left.
(Screenshot: VS Code - Select Python Environment)
You can also do this via:
Ctrl+Shift+P > Select Python Interpreter to start Jupyter Server
If that does not work, make sure that the package you are trying to import is installed under the correct Python environment.
If not, open a terminal, activate the environment and install it using:
pip install packagename
For example, I did the same thing here (note: I'm using Anaconda):
(Screenshot: installing tensorflow_transform)
After installation, you can import it in your code directly like this:
(Screenshot: importing tensorflow_transform)
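Since import searches sys.path of the running interpreter and not the shell's $PATH, one way to make sure a package lands where the notebook can see it is to invoke pip through that same interpreter. This is a minimal sketch, with pip_command_for_this_interpreter as a hypothetical helper name. It only builds the command; actually running it is left commented out.

```python
import subprocess
import sys


def pip_command_for_this_interpreter(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    command = pip_command_for_this_interpreter("tensorflow_transform==0.15.0")
    print(" ".join(command))
    # Uncomment to perform the installation:
    # subprocess.run(command, check=True)
```

Run from a notebook cell, this targets the kernel's own Python (the one shown by sys.executable) rather than whichever pip happens to come first on $PATH. Whether a given package version installs still depends on that interpreter's Python version.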

Conda always opens root environment

If I 'source activate' a non-root environment (in my case "data") and then launch Jupyter notebook, the env seems to switch to root. I can tell because if I try to open a new python notebook, the dropdown under New says Python [Root]. I am also unable to import packages that are installed in my env but not in root.
(data) Edwards-MacBook-Pro:~ mango$ conda list
# packages in environment at /Users/mango/anaconda/envs/data:
#
boto 2.42.0 py35_0
bz2file 0.98 py35_0
cycler 0.10.0 py35_0
freetype 2.5.5 1
gensim 0.12.4 np111py35_0
libpng 1.6.22 0
matplotlib 1.5.1 np111py35_0
mkl 11.3.3 0
numpy 1.11.1 py35_0
openssl 1.0.2i 0
pandas 0.18.1 np111py35_0
pip 8.1.2 py35_0
pyparsing 2.1.4 py35_0
pyqt 4.11.4 py35_4
python 3.5.2 0
python-dateutil 2.5.3 py35_0
pytz 2016.6.1 py35_0
qt 4.8.7 4
readline 6.2 2
requests 2.11.1 py35_0
scikit-learn 0.17.1 np111py35_2
scipy 0.18.1 np111py35_0
seaborn 0.7.1 py35_0
setuptools 27.2.0 py35_0
sip 4.18 py35_0
six 1.10.0 py35_0
smart_open 1.3.4 py35_0
sqlite 3.13.0 0
tk 8.5.18 0
wheel 0.29.0 py35_0
xz 5.2.2 0
zlib 1.2.8 3
(data) Edwards-MacBook-Pro:~ mango$ ipython
Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12)
Type "copyright", "credits" or "license" for more information.
IPython 4.2.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import seaborn
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-085c0287ecb5> in <module>()
----> 1 import seaborn
ImportError: No module named 'seaborn'
In [2]:
The same behavior occurs with gensim, so it is not just seaborn.
I managed to solve the issue. Conda installs the [root] environment with ipython and jupyter. If you create an env, those are not available by default. So when creating an env, either be sure to list those packages explicitly, or clone root. Cloning root makes for a bulkier env and may be less desirable for production, but it is better for a sandbox context.
I discovered this issue by running the same test as above with plain python instead of ipython, where the import worked and confirmed that I was in the data env. I then decided to try the Anaconda Navigator program, installed with conda install anaconda-navigator. While I like a CLI, this GUI-based program seems like a better way to manage packages.
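A quick way to check for this situation is to see whether jupyter and ipython resolve to executables inside the active environment. This is a sketch with a hypothetical helper name, tools_in_active_env. It assumes the environment's location is sys.prefix of the Python being run, and it uses a simple path-prefix comparison as a heuristic.

```python
import os
import shutil
import sys


def tools_in_active_env(tools=("jupyter", "ipython")):
    """Map each tool to (resolved path or None, whether it is under sys.prefix)."""
    prefix = os.path.abspath(sys.prefix)
    report = {}
    for tool in tools:
        path = shutil.which(tool)
        inside = (
            path is not None
            and os.path.abspath(path).startswith(prefix + os.sep)
        )
        report[tool] = (path, inside)
    return report


if __name__ == "__main__":
    print("active prefix:", sys.prefix)
    print("CONDA_DEFAULT_ENV:", os.environ.get("CONDA_DEFAULT_ENV"))
    for tool, (path, inside) in tools_in_active_env().items():
        print(f"{tool}: {path} (inside active env: {inside})")
```

If a tool resolves to a path outside the active prefix, launching it will start the other environment's Python, which matches the behaviour described above.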