Conda not using latest package version - scipy

I am using Anaconda (the latest release, from September). I ran 'conda update --all' to get the latest packages, in particular SciPy. But when I import scipy from a Python shell, it still imports version 1.7.3, not the latest one (1.9.1).
I am not fluent with conda. I tried running conda install scipy, which did not change anything, and conda install scipy=1.9.1, which seems to hang.
I am using the base environment. This is a fresh install of the latest Anaconda (with packages updated via conda only, without any use of pip that could interfere).
Listing the packages via conda yields:
$ conda list scipy
# packages in environment at /home/***/anaconda3:
#
# Name Version Build Channel
scipy 1.7.3 py39hc147768_0
However, when I look at the content of the anaconda3/pkgs folder:
$ ls anaconda3/pkgs/ | grep scipy
anaconda3/pkgs/scipy-1.7.3-py39hc147768_0.conda
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0.conda
anaconda3/pkgs/scipy-1.7.3-py39hc147768_0 (contains Scipy 1.7.3)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0 (contains Scipy 1.9.1)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_09jfxaf1g (empty folder)
anaconda3/pkgs/scipy-1.9.1-py39h14f4228_0dydy5wnw (empty folder)
So I assume that conda has both SciPy 1.7.3 and 1.9.1. But why can't I import the latest one?
How can I correct this situation?
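For context: anaconda3/pkgs is conda's package cache, so finding scipy-1.9.1 there only shows that it was downloaded, not that it is linked into the environment. What an environment actually has installed is recorded as one JSON file per package under its conda-meta directory. The sketch below reads those records. The helper name installed_versions is illustrative, and it assumes each record carries "name" and "version" keys.

```python
import json
from pathlib import Path


def installed_versions(prefix):
    """Map package name -> version from the conda-meta records under prefix."""
    versions = {}
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        with record.open(encoding="utf-8") as fh:
            meta = json.load(fh)
        # Skip anything that does not look like a package record.
        if isinstance(meta, dict) and "name" in meta and "version" in meta:
            versions[meta["name"]] = meta["version"]
    return versions
```

Calling installed_versions on the environment prefix (the path printed at the top of conda list) and looking up "scipy" should agree with the conda list output, regardless of what sits in pkgs.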
EDIT: creating a new environment and reinstalling the packages as needed solves my problem. However, why is the base environment stuck with the earlier version?

Related

ModuleNotFoundError thrown by using pytest in virtualenv in python 3.10

I am new to pytest and tried to write a first test case. Since I am new to programming overall, I developed without a virtual environment (naughty, naughty) using the globally installed python version 3.9.13.
The structure of my program is like this:
mypkg/
sub_mypkg/
file_a.py
pytest.ini
testing/
__init__.py
test_a.py
in which file_a.py imports pandas among other modules. The tests in test_a.py try, among other things, to run file_a.py. pytest.ini adds the root_dir to the PYTHONPATH.
In this setup pytest ran smoothly without any errors.
I then installed a virtualenv using python 3.11 in this project and installed all necessary modules (including pytest) in it and uninstalled pytest globally. After activating the virtualenv and running pytest from the terminal a ModuleNotFoundError was thrown for pandas.
Here is a list of the modules I use in venv:
Package Version
--------------- -------
attrs 22.1.0
colorama 0.4.6
contourpy 1.0.6
cycler 0.11.0
et-xmlfile 1.1.0
exceptiongroup 1.0.4
fonttools 4.38.0
iniconfig 1.1.1
kiwisolver 1.4.4
matplotlib 3.6.2
numpy 1.23.4
openpyxl 3.0.10
packaging 21.3
pandas 1.5.1
Pillow 9.3.0
pip 22.3.1
pluggy 1.0.0
pyparsing 3.0.9
pytest 7.2.0
python-dateutil 2.8.2
pytz 2022.6
PyYAML 6.0
setuptools 56.0.0
six 1.16.0
tomli 2.0.1
I checked that pandas was installed and I could import pandas in the REPL.
Also I could run the test line by line in the REPL.
Furthermore, I checked that the problem was not pandas itself. If I changed the placement of the import pandas with for example import numpy in file_a.py, it threw the ModuleNotFoundError for numpy.
I tried to use different versions of python in my venv (python 3.11.0, 3.10.7, 3.9.13, 3.9.6). Interestingly, pytest ran only inside the venv for python 3.9.13 (The one I developed it in).
I tried to include and delete __init__.py files in all directories and also tried different versions of pytest (6.2.5, 7.0.0).
I checked sys.path to confirm that the right root_dir was included.
Thanks in advance for your answers!
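One check that may help narrow this down is to confirm which interpreter pytest itself runs under, since a pytest executable found elsewhere on PATH would use a different interpreter and a different set of installed modules. The sketch below is only a diagnostic. interpreter_report is an illustrative helper, not part of pytest.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv (and in recent virtualenv releases), sys.prefix points at the
    # environment while sys.base_prefix points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def interpreter_report():
    """Collect the facts that decide which packages can be imported."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "in_virtualenv": in_virtualenv(),
        "sys_path": list(sys.path),
    }
```

Printing interpreter_report() from inside a test (run with pytest -s) and from the REPL, then comparing the two, would show whether both use the venv's interpreter. Running python -m pytest makes pytest use whichever interpreter python resolves to.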

Package list in EMR master node versus package list in EMR Notebook

I have one EMR cluster up and running. In it, I have one Jupyter Notebook with pyspark kernel.
For the master node, I am able to SSH into it. I am able to install Python packages in the master node easily, such as :
pip install pandas
which I can then verify with pip freeze.
However, when I go to the PySpark notebook and run sc.list_packages(), I see a different list of packages. Some packages have a different version than on the master node, and some (such as pandas) do not appear at all.
Here is the list of pip freeze in master node SSH.
aws-cfn-bootstrap==2.0
beautifulsoup4==4.9.1
boto==2.49.0
click==7.1.2
Cython==0.29.30
docutils==0.14
jmespath==0.10.0
joblib==0.15.1
lockfile==0.11.0
lxml==4.5.1
mysqlclient==1.4.2
nltk==3.5
nose==1.3.4
numpy==1.21.6
pandas==1.3.5
py-dateutil==2.2
py4j==0.10.9.5
pybind11==2.9.2
pyspark==3.3.0
pystache==0.5.4
python-daemon==2.2.3
python-dateutil==2.8.2
python37-sagemaker-pyspark==1.3.0
pytz==2020.1
PyYAML==5.3.1
regex==2020.6.8
scipy==1.7.3
simplejson==3.2.0
six==1.13.0
soupsieve==1.9.5
tqdm==4.46.1
windmill==1.6
And here is the package list in the PySpark notebook using sc.list_packages():
aws-cfn-bootstrap (2.0)
beautifulsoup4 (4.9.1)
boto (2.49.0)
click (7.1.2)
docutils (0.14)
jmespath (0.10.0)
joblib (0.15.1)
lockfile (0.11.0)
lxml (4.5.1)
mysqlclient (1.4.2)
nltk (3.5)
nose (1.3.4)
numpy (1.16.5)
pip (9.0.1)
py-dateutil (2.2)
pystache (0.5.4)
python-daemon (2.2.3)
python37-sagemaker-pyspark (1.3.0)
pytz (2020.1)
PyYAML (5.3.1)
regex (2020.6.8)
setuptools (28.8.0)
simplejson (3.2.0)
six (1.13.0)
soupsieve (1.9.5)
tqdm (4.46.1)
UNKNOWN (1.3.5)
wheel (0.29.0)
windmill (1.6)
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
You are using pip version 9.0.1, however version 22.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Note that pandas, scipy and pip are different. Why are they different? How do I upgrade or update the list in the PySpark notebook?
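The differences can be listed mechanically rather than by eye. The sketch below parses both formats shown above (name==version from pip freeze and name (version) from the legacy pip list output) and reports what differs. The function names are illustrative, and the sample input is a small subset of the two lists above.

```python
import re

FREEZE_LINE = re.compile(r"^([A-Za-z0-9_.\-]+)==(\S+)$")
LEGACY_LINE = re.compile(r"^([A-Za-z0-9_.\-]+) \(([^)]+)\)$")


def parse_packages(text):
    """Return {lowercased name: version} for lines in either format."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        match = FREEZE_LINE.match(line) or LEGACY_LINE.match(line)
        if match:  # warnings and other non-package lines are ignored
            packages[match.group(1).lower()] = match.group(2)
    return packages


def compare(left, right):
    """Return (only_in_left, only_in_right, version_mismatches)."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    mismatched = {
        name: (left[name], right[name])
        for name in sorted(set(left) & set(right))
        if left[name] != right[name]
    }
    return only_left, only_right, mismatched


master = parse_packages("numpy==1.21.6\npandas==1.3.5\nscipy==1.7.3\nsix==1.13.0")
notebook = parse_packages("numpy (1.16.5)\npip (9.0.1)\nsix (1.13.0)\nUNKNOWN (1.3.5)")
print(compare(master, notebook))
```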
Log into the master node and run sudo docker ps -a. You should see a container named something like emr/jupyter-notebook:6.0.3 and that's where your Jupyter Notebook is running; it is not running in the master node.
If you decide to install any packages in the master node, the Jupyter Notebook will not see them. This is the reason why your packages do not match. To install packages in the Jupyter Notebook I use a requirements file, which contains the packages I want to install, and invoke a bootstrap action script that installs those packages. An important detail is to make sure that if you do specify a package version then it must be supported by the Python version running in the container. To find out just run a step in the Jupyter Notebook:
import sys
print(sys.version)
To find the latest packages that go with a specific version of Python, I highly recommend using Anaconda. For example
conda create --name requests python=3.7.9 matplotlib
will tell me the latest version of matplotlib that works with Python 3.7.9

python module not accessible from EMR notebook

I am using an EMR notebook attached to my cluster for some experimentation purposes. I needed to install some Python modules for testing, specifically spacy and its data module en_core_web_sm.
I SSH'ed into the master and core nodes and downloaded the modules on each of them individually. However, I am not able to import them from my EMR notebook. I get the following error:
An error was encountered:
No module named 'spacy'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'spacy'
I know there is a way to install them just for the scope of EMR notebook, but this wouldn't suffice in a production scenario, so please avoid answers which suggest notebook installing as mentioned in this guide : https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/
Please let me know if I am missing some setup steps. Appreciate your response.
You can use bootstrap actions to install additional modules while creating your EMR cluster:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html
I was able to solve this by changing the bootstrap script to use sudo instead of --user. (You could also run the commands below manually.)
Before I was running
pip3 install spacy --user
python3 -m spacy download en --user
I changed that script to
sudo pip3 install spacy
sudo python3 -m spacy download en
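A possible reason the --user install was invisible, which this answer does not confirm, is that --user places packages in a per-user directory, so a notebook process running under a different account would not search it. The sketch below prints both kinds of location for the current interpreter using the standard site module. Comparing its output over SSH and inside the notebook is one way to test that assumption.

```python
import site
import sys


def install_locations():
    """Return where --user and system-wide installs land for this interpreter."""
    return {
        "executable": sys.executable,
        # Target of `pip install --user`; specific to the current account.
        "user_site": site.getusersitepackages(),
        # Whether the per-user directory is searched at all (it is disabled
        # in some setups, for example inside virtual environments).
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # Shared directories, typically writable only with elevated rights.
        "system_sites": site.getsitepackages(),
    }


for key, value in install_locations().items():
    print(f"{key}: {value}")
```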
To verify this solution quickly issue the following commands from your EMR notebook (to compare before and after)
sc.list_packages()
You should see an output similar to this
SparkSession available as 'spark'.
Package Version
-------------------------- ----------
beautifulsoup4 4.9.0
blis 0.4.1
boto 2.49.0
catalogue 1.0.0
certifi 2020.4.5.2
chardet 3.0.4
cymem 2.0.3
en-core-web-sm 2.3.0
idna 2.9
importlib-metadata 1.6.1
jmespath 0.9.5
lxml 4.5.0
murmurhash 1.0.2
mysqlclient 1.4.2
nltk 3.4.5
nose 1.3.4
numpy 1.16.5
pip 9.0.1
plac 1.1.3
preshed 3.0.2
py-dateutil 2.2
python37-sagemaker-pyspark 1.3.0
pytz 2019.3
PyYAML 5.3.1
requests 2.24.0
setuptools 28.8.0
six 1.13.0
soupsieve 1.9.5
spacy 2.3.0
srsly 1.0.2
thinc 7.4.1
tqdm 4.46.1
urllib3 1.25.9
wasabi 0.6.0
wheel 0.29.0
windmill 1.6
zipp 3.1.0
This is not the best possible solution IMO, since the first warning that gets displayed after using sudo is
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
If anyone has a better solution, please feel free to post it.

How to install TensorFlow on Python 3.7

Trying:
D:\Users\Downloads>pip install tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
Windows 10 OS
The same error occurs in a venv, too:
(venv) C:\Users\KvaksManYT>pip install --upgrade tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
I would recommend using a virtual environment (pip install virtualenv, or the built-in venv module used below). Then, depending on your OS, create and activate an environment:
python3 -m venv /path/to/new/virtual/environment
Then, activate this environment using,
source ./venv/bin/activate
Now, you can install any Python packages you want.
pip install tensorflow==2.0.0
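Since the question is about Windows 10, note that the source ./venv/bin/activate line above is the POSIX form; environments created on Windows keep their scripts in a Scripts folder instead. The helper below is a small illustrative sketch (the name activate_script is made up here) that returns the expected script location for either layout.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def activate_script(env_dir, os_name=os.name):
    """Return the activation script path for a venv/virtualenv directory."""
    if os_name == "nt":
        # Windows layout: <env>\Scripts\activate.bat (run directly in cmd.exe).
        return PureWindowsPath(env_dir) / "Scripts" / "activate.bat"
    # POSIX layout: <env>/bin/activate (loaded with `source`).
    return PurePosixPath(env_dir) / "bin" / "activate"


print(activate_script("venv", "nt"))     # venv\Scripts\activate.bat
print(activate_script("venv", "posix"))  # venv/bin/activate
```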
You can install TensorFlow by following these steps (Ubuntu/Linux, macOS, or Windows).
Inside a virtualenv you do not need to specify the pip version.
For a system install, you need to specify the pip version (pip3, as below).
First, upgrade pip:
pip install --upgrade pip
#virtualenv install
pip install --upgrade tensorflow
#system install
pip3 install --user --upgrade tensorflow
reference https://www.tensorflow.org/install/pip
I had the same problem with Windows 10 x64, and it was caused because I was using the wrong Python version, both globally and in the venv. I found questions on the issue multiple times on the internet, including yours.
Be sure to use Python versions 3.5-3.8, as per requirements, but also x64, not x32.
Namely, I ran into this error using both
a venv with 3.9.1 x64 (python --version),
and my globally installed 3.8.2 x32 (python3 --version).
So, I downloaded the x64-version of Python 3.8.6 from here.
Note that the command venv does not allow specifying the python version used in the virtual environment,
as per an answer on this question. So I used virtualenv, which I obviously had to install in my global Python version first.
To specify the Python version used in the venv, I used the command virtualenv, as in:
virtualenv --python="C:\Users\me\AppData\Local\Programs\Python\Python38\python.exe" myvenv
where you have to give the path to the newly downloaded Python distribution you want to use, if there are several on your PC (for example, I had Python38-32 and Python39 folders in that directory).
Check Python versions in virtual environment
After I activate my myvenv, created as above, I verify the Python versions as follows:
python3 --version
> Python 3.8.2
python --version
> Python 3.8.6
Then, using the command
import struct
print(struct.calcsize("P") * 8)
Running this within either python3 or python shows whether that interpreter is 32-bit or 64-bit, as per this answer. Here python returns 64, so that is the one you want to use (not python3).
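Both conditions from this answer (a supported Python version and a 64-bit build) can be checked in one go. The sketch below is an illustration only: meets_stated_requirements is a made-up helper, and its default range is the 3.5-3.8 figure quoted above, not an authoritative statement about any particular TensorFlow release.

```python
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def meets_stated_requirements(version=None, bits=None,
                              supported=((3, 5), (3, 8))):
    """Check a Python version and bitness against the range quoted above.

    `supported` holds the (lowest, highest) major.minor pair from this
    answer; newer TensorFlow releases support other ranges.
    """
    version = tuple(version or sys.version_info)[:2]
    bits = bits or interpreter_bits()
    lowest, highest = supported
    return lowest <= version <= highest and bits == 64


print(meets_stated_requirements((3, 8, 6), 64))  # True
print(meets_stated_requirements((3, 9, 1), 64))  # False
print(meets_stated_requirements((3, 8, 2), 32))  # False
```

The three calls correspond to the interpreters mentioned in this answer. Calling the function without arguments checks the interpreter that is running it.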
Finally, within the virtual environment, you can run
pip install --upgrade tensorflow
and it will download and install. (Meanwhile, pip3 install --upgrade tensorflow would still return your error inside and outside the virtual environment.)

How to execute the right Python to import the installed tensorflow.transform package?

The version of my Python is 2.7.13.
I run the following in Jupyter Notebook.
Firstly, I installed the packages
%%bash
pip uninstall -y google-cloud-dataflow
pip install --upgrade --force tensorflow_transform==0.15.0 apache-beam[gcp]
Then,
%%bash
pip freeze | grep -e 'flow\|beam'
I can see that the package tensorflow-transform is installed.
apache-beam==2.19.0
tensorflow==2.1.0
tensorflow-datasets==1.2.0
tensorflow-estimator==2.1.0
tensorflow-hub==0.6.0
tensorflow-io==0.8.1
tensorflow-metadata==0.15.2
tensorflow-probability==0.8.0
tensorflow-serving-api==2.1.0
tensorflow-transform==0.15.0
However, when I tried to import it, I got a warning and an error.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/api/_v1/estimator/__init__.py:12: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
ImportError Traceback (most recent call last)
<ipython-input-3-26a4792d0a76> in <module>()
1 import tensorflow as tf
----> 2 import tensorflow_transform as tft
3 import shutil
4 print(tf.__version__)
ImportError: No module named tensorflow_transform
After some investigation, I think I have some ideas of the problem.
I run this:
%%bash
pip show tensorflow_transform| grep Location
This is the output
Location: /home/jupyter/.local/lib/python3.5/site-packages
I tried to modify the $PATH by adding /home/jupyter/.local/lib/python3.5/site-packages to the beginning of $PATH. However, I still failed to import tensorflow_transform.
Based on the above and the following information, I think, when I ran the import command, it executes Python 2.7, not Python 3.5
import sys
print('\n'.join(sys.path))
/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/usr/lib/python2.7/lib-old
/usr/lib/python2.7/lib-dynload
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
/usr/local/lib/python2.7/dist-packages/IPython/extensions
/home/jupyter/.ipython
Also,
import sys
sys.executable
'/usr/bin/python2'
I think the problem is that the tensorflow_transform package was installed in /home/jupyter/.local/lib/python3.5/site-packages. But when I run import, Python searches /usr/local/lib/python2.7/dist-packages for the package rather than /home/jupyter/.local/lib/python3.5/site-packages, so even updating $PATH does not help. Am I right?
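That reading looks consistent with the output shown: $PATH only controls where the shell looks for executables, while import searches sys.path. A package installed for one interpreter version is also generally not usable from another. The sketch below compares the Python version embedded in a pip show Location with a given interpreter version. location_python_version is an illustrative helper, and it assumes the conventional pythonX.Y directory naming.

```python
import re
import sys


def location_python_version(location):
    """Extract (major, minor) from a path like .../lib/python3.5/site-packages."""
    match = re.search(r"python(\d+)\.(\d+)", location)
    return (int(match.group(1)), int(match.group(2))) if match else None


def matches_interpreter(location, version=None):
    """True if the install location belongs to the given interpreter version."""
    version = tuple(version or sys.version_info)[:2]
    return location_python_version(location) == version


location = "/home/jupyter/.local/lib/python3.5/site-packages"
print(location_python_version(location))      # (3, 5)
print(matches_interpreter(location, (2, 7)))  # False
```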
I tried to upgrade my python, but
%%bash
pip install upgrade python
Defaulting to user installation because normal site-packages is not writeable
Then, I added --user. It seems that the python is not really upgraded.
%%bash
pip install --user upgrade python
%%bash
python -V
Python 2.7.13
Any solution?
It seems to me that your Jupyter notebook is not using the right Python environment.
Perhaps you installed the package under version 3.5 but the notebook uses the other one, so it cannot find the library.
You can pick the other interpreter by clicking on Python (your version) at the bottom left.
[Screenshot: VS Code, Select Python Environment]
However you can do this also via:
Ctrl+Shift+P > Select Python Interpreter to start Jupyter Server
If that does not work, make sure that the package you are trying to import is installed under the correct Python environment.
If it is not, open up a terminal, activate the environment, and install it using:
pip install packagename
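One way to make sure a package lands in the environment the notebook is actually running is to invoke pip through the running interpreter rather than through whichever pip is first on PATH. The following is a sketch of that idea. pip_install_command is a hypothetical helper, and packagename is the same placeholder used above.

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command bound to a specific interpreter."""
    # `python -m pip` runs the pip that belongs to that interpreter, so the
    # package is installed where that interpreter will look for it.
    return [python, "-m", "pip", "install", package]


def install(package):
    """Run the install for the current interpreter; raises on failure."""
    subprocess.check_call(pip_install_command(package))


print(pip_install_command("packagename"))
```

Calling install("packagename") from a notebook cell would then target the kernel's own interpreter, assuming pip is available for it.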
For example, I did the same thing here (note: I'm using Anaconda):
[Screenshot: installing tensorflow_transform]
After installation, you can import it in your code directly like this:
[Screenshot: importing tensorflow_transform]