ValueError: can't read cfg files (sense2vec, reddit vectors)

I am relatively new to NLP and mostly use Jupyter. Please let me know what I'm doing wrong.
I followed all the instructions provided here:
https://github.com/explosion/sense2vec
but when I try to use the reddit_vectors as described here:
s2v = Sense2VecComponent(nlp.vocab).from_disk("/path/to/s2v_reddit_2015_md")
I get a ValueError as shown below:
ValueError Traceback (most recent call last)
<ipython-input-36-0d396d0145de> in <module>
----> 1 s2v=Sense2Vec().from_disk('reddit_vectors-1.1.0/vectors.bin/')
~/.conda/envs/NewEnv6/lib/python3.7/site-packages/sense2vec/sense2vec.py in from_disk(self, path, exclude)
343 cache_path = path / "cache"
344 self.vectors = Vectors().from_disk(path)
--> 345 self.cfg.update(srsly.read_json(path / "cfg"))
346 if freqs_path.exists():
347 self.freqs = dict(srsly.read_json(freqs_path))
~/.conda/envs/NewEnv6/lib/python3.7/site-packages/srsly/_json_api.py in read_json(location)
48 data = sys.stdin.read()
49 return ujson.loads(data)
---> 50 file_path = force_path(location)
51 with file_path.open("r", encoding="utf8") as f:
52 return ujson.load(f)
~/.conda/envs/NewEnv6/lib/python3.7/site-packages/srsly/util.py in force_path(location, require_exists)
19 location = Path(location)
20 if require_exists and not location.exists():
---> 21 raise ValueError("Can't read file: {}".format(location))
22 return location
23
ValueError: Can't read file: reddit_vectors-1.1.0/vectors.bin/cfg
Note: I installed the appropriate versions of all the libraries/packages listed in requirements.txt.

This is what worked for me:
If you are not using a virtual environment, check that all the libraries/packages listed in requirements.txt actually installed properly; in my case, one of them had not been installed correctly.
The path should point to the folder that contains the cfg file (in the traceback above, sense2vec looked for reddit_vectors-1.1.0/vectors.bin/cfg and did not find it). Since the last update, I recommend using the full absolute path rather than a path relative to the project.
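A minimal sketch of a pre-check along those lines: it turns the path into an absolute one and fails early if the folder has no cfg file. The helper name s2v_folder is hypothetical and not part of sense2vec.

```python
from pathlib import Path


def s2v_folder(path):
    """Return the absolute path of a sense2vec vectors folder.

    Raises FileNotFoundError early, with a readable message, if the folder
    has no cfg file (the file from_disk() tries to read its settings from).
    """
    folder = Path(path).expanduser().resolve()  # expand "~" and make absolute
    if not (folder / "cfg").is_file():
        raise FileNotFoundError(
            f"{folder} has no cfg file; pass the folder that contains cfg"
        )
    return folder
```

Usage, assuming the folder name from the traceback (adjust it to where yours lives): from sense2vec import Sense2Vec, then s2v = Sense2Vec().from_disk(s2v_folder("reddit_vectors-1.1.0")).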

Check the path to your reddit_vectors-x.x.x folder.
Put your reddit_vectors-x.x.x folder in the same folder as your .py file.
Use pathlib to make sure your path is correct:
from pathlib import Path
from sense2vec import Sense2Vec
# __file__ only exists when running a .py file; in Jupyter, use Path.cwd() instead
path = Path(__file__).parent.joinpath('reddit_vectors-1.1.0')
s2v = Sense2Vec().from_disk(path)
If you still get the error, delete your reddit_vectors-x.x.x folder and extract the original reddit_vectors-x.x.x.tar.gz or reddit_vectors-x.x.x.zip again.
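The delete-and-re-extract step can also be scripted with the standard library's shutil. A minimal sketch, assuming the archive unpacks into a single top-level folder named like the archive minus its extension (check that against your download); reextract is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def reextract(archive, dest_parent="."):
    """Delete a previously extracted folder and unpack a fresh copy.

    Assumes the archive holds one top-level folder named like the archive
    without its extension, e.g. reddit_vectors-1.1.0.tar.gz -> reddit_vectors-1.1.0.
    """
    archive = Path(archive)
    name = archive.name
    for ext in (".tar.gz", ".tgz", ".zip"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    else:
        raise ValueError(f"unsupported archive type: {archive.name}")
    target = Path(dest_parent) / name
    if target.exists():
        shutil.rmtree(target)  # remove the possibly damaged copy
    # the archive format is inferred from the file extension
    shutil.unpack_archive(str(archive), str(dest_parent))
    return target
```

Usage: reextract("reddit_vectors-1.1.0.tar.gz") run from the folder that contains the archive.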

Related

Issue getting Github code to work on my own Jupyter Notebook

I am new to Python and would like your help please.
I am copying the NorthernBobwhiteCNN code from GitHub to try to use the program on my computer: https://github.com/GAMELab-UGA/NorthernBobwhiteCNN. I cloned the GitHub repository and opened it in Jupyter Notebook, which I launched from the Command Prompt.
However, when I run the cells in model_prediction_example.ipynb after the import cell, every cell gives an error and the code won't run, even though everything is exactly the same as on GitHub.
Here is the error I get in the Load Trained Model cell:
RuntimeError Traceback (most recent call last)
Input In [2], in <cell line: 8>()
4 model = net.Net(params).cuda() if torch.cuda.is_available() else net.Net(params)
7 restore_path = os.path.join(model_dir, 'pretrained.pth.tar')
----> 8 _ = utils.load_checkpoint(restore_path, model, optimizer=None)
File ~\NorthernBobwhiteCNN\PythonCode\utils.py:136, in load_checkpoint(checkpoint, model, optimizer)
134 if not os.path.exists(checkpoint):
135 raise("File doesn't exist {}".format(checkpoint))
--> 136 checkpoint = torch.load(checkpoint)
137 model.load_state_dict(checkpoint['state_dict'])
139 if optimizer:
File ~\anaconda3\lib\site-packages\torch\serialization.py:789, in load(f, map_location, pickle_module, weights_only, **pickle_load_args)
787 except RuntimeError as e:
788 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
--> 789 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
790 if weights_only:
791 try:
File ~\anaconda3\lib\site-packages\torch\serialization.py:1131, in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
1129 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
1130 unpickler.persistent_load = persistent_load
-> 1131 result = unpickler.load()
1133 torch._utils._validate_loaded_sparse_tensors()
1135 return result
File ~\anaconda3\lib\site-packages\torch\serialization.py:1101, in _load.<locals>.persistent_load(saved_id)
1099 if key not in loaded_storages:
1100 nbytes = numel * torch._utils._element_size(dtype)
-> 1101 load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
1103 return loaded_storages[key]
File ~\anaconda3\lib\site-packages\torch\serialization.py:1083, in _load.<locals>.load_tensor(dtype, numel, key, location)
1079 storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
1080 # TODO: Once we decide to break serialization FC, we can
1081 # stop wrapping with TypedStorage
1082 loaded_storages[key] = torch.storage.TypedStorage(
-> 1083 wrap_storage=restore_location(storage, location),
1084 dtype=dtype)
File ~\anaconda3\lib\site-packages\torch\serialization.py:215, in default_restore_location(storage, location)
213 def default_restore_location(storage, location):
214 for _, _, fn in _package_registry:
--> 215 result = fn(storage, location)
216 if result is not None:
217 return result
File ~\anaconda3\lib\site-packages\torch\serialization.py:182, in _cuda_deserialize(obj, location)
180 def _cuda_deserialize(obj, location):
181 if location.startswith('cuda'):
--> 182 device = validate_cuda_device(location)
183 if getattr(obj, "_torch_load_uninitialized", False):
184 with torch.cuda.device(device):
File ~\anaconda3\lib\site-packages\torch\serialization.py:166, in validate_cuda_device(location)
163 device = torch.cuda._utils._get_device_index(location, True)
165 if not torch.cuda.is_available():
--> 166 raise RuntimeError('Attempting to deserialize object on a CUDA '
167 'device but torch.cuda.is_available() is False. '
168 'If you are running on a CPU-only machine, '
169 'please use torch.load with map_location=torch.device(\'cpu\') '
170 'to map your storages to the CPU.')
171 device_count = torch.cuda.device_count()
172 if device >= device_count:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
I think the errors are due to incorrect or missing library installations in my virtual environment.
First, I created the virtual environment "bobwhite" using conda create -n bobwhite in my command prompt.
Then, I did multiple conda installations based on the import statements in the model_prediction_example.ipynb.
from matplotlib import pyplot as plt
import librosa
import numpy as np
import os
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
import torch
import utils
import model.net as net
So, I did the following installs in my command prompt:
conda install matplotlib
conda install -c conda-forge librosa
conda install numpy
conda install scipy
conda install scikit-image
conda install pytorch torchvision torchaudio cpuonly -c pytorch
conda install pip
pip install utils
However, I am not sure that I have installed the correct libraries to get the notebook to run. How would I find out which libraries are needed based on the import statements? Would I also need to install the libraries that net.py and utils.py import? Additionally, I do not understand the import model.net as net statement. Is it referencing the net.py script that is also in the GitHub repository? If so, would I need to install it with conda, and how would I do that?
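In PythonCode/utils.py, change line 136 (checkpoint = torch.load(checkpoint)) to:
# fall back to the CPU when no CUDA device is available
map_location = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
checkpoint = torch.load(checkpoint, map_location=map_location)
Without map_location, torch.load restores each tensor to the device it was saved on (here a CUDA GPU), which fails when torch.cuda.is_available() is False. This change loads the checkpoint onto the CPU whenever CUDA is not available.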
This part of your error's stack trace points to the issue:
File ~\NorthernBobwhiteCNN\PythonCode\utils.py:136, in load_checkpoint(checkpoint, model, optimizer)
134 if not os.path.exists(checkpoint):
135 raise("File doesn't exist {}".format(checkpoint))
--> 136 checkpoint = torch.load(checkpoint)
137 model.load_state_dict(checkpoint['state_dict'])
139 if optimizer:
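For the question of which imports the notebook needs, one way to check is to ask Python which module names it cannot locate in the active environment. A minimal sketch using importlib.util.find_spec; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment.

    find_spec() locates a module without fully importing it; for a dotted
    name such as "model.net", the parent package "model" is imported first.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None  # the parent package of a dotted name is absent
        if spec is None:
            missing.append(name)
    return missing
```

Usage: missing_modules(["matplotlib", "librosa", "numpy", "scipy", "skimage", "torch", "utils", "model.net"]). Import names can differ from package names (scikit-image is imported as skimage, pytorch as torch). utils and model.net are the repository's own files (the traceback shows PythonCode\utils.py), so they are found only when the notebook runs from the PythonCode folder; they are not conda packages, and pip install utils fetches whatever PyPI package uses that name rather than the repository's utils.py.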

How do I resolve this pathlib __file__ name error in Jupyter?

---> 84 filepath = Path(__file__) # this not part of course, got tip from friend
85 data = {
86 "W": Path(f"{filepath.parent}/washington.csv"),
87 "C": Path(f"{filepath.parent}/chicago.csv"),
88 "N": Path(f"{filepath.parent}/new_york_city.csv"),
89 }
91 df = pd.read_csv(data[city].as_posix())
NameError: name '__file__' is not defined
I get this name error in Jupyter but it works fine in VSCode and PyCharm. The files are in the same directory as the script file.
How do I resolve it?
I googled the pathlib error in Jupyter and came across a Stack Overflow post saying there is an alternative to pathlib, e.g. os.path. Does anyone know how to use it to solve my issue?
I found the answer to my own question in a post by user "peng" in this thread:
path problem : NameError: name '__file__' is not defined
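Adding quotes around __file__ made the error go away:
Path(__file__) --> Path("__file__")
This works because Path("__file__").parent is simply ".", so the CSV paths become relative to the current working directory, which in Jupyter is normally the notebook's folder.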
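A sketch of a version that behaves the same as a .py script and in Jupyter, assuming the CSV files sit next to the script or notebook: it uses the script's folder when __file__ exists and falls back to Path.cwd() otherwise. The helper name script_dir is hypothetical.

```python
from pathlib import Path


def script_dir():
    """Folder of the running .py file, or the current working directory
    when __file__ is not defined (as in a Jupyter notebook)."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        return Path.cwd()


base = script_dir()
data = {
    "W": base / "washington.csv",
    "C": base / "chicago.csv",
    "N": base / "new_york_city.csv",
}
```

pd.read_csv accepts Path objects directly, so df = pd.read_csv(data[city]) works without .as_posix().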

__import__() works with python3.6.5 but does not work with python3.7.3

The following script works fine with Python 3.6.5 but fails with Python 3.7.3 on Windows 10 (Home edition):
mod = __import__(pth, globals(), locals(), ['*'])
With both Python 3.6.5 and Python 3.7.3, pth, globals() and locals() are almost the same, as follows (mPayment.py is located in scripts/demo):
pth ==> "scripts.demo.mPayment"
locals() ==> {'pth': 'scripts.demo.mPayment', 'self': <bop.bopModule.MasterModule object at 0x000002586EC3EC88>}
globals() ==> {...} #globals() has nothing to do with "scripts.demo.mPayment"
But with Python 3.7.3, I get this exception:
Exception occurs in importing module demo.mPayment.
Traceback (most recent call last):
File "c:\xxx\b\Module.py", line 50, in _init
mod = __import__(pth, globals(), locals(), ['*'])
ModuleNotFoundError: No module named 'scripts.demo'
Does anyone know what is going on with __import__() in Python 3.7.3, given that it works with Python 3.6.5? Thanks a lot for your help.
ouyang
I moved the project root path to the front of sys.path; after that, __import__() could find those modules.
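A minimal sketch of that fix, assuming the project root is the folder that contains scripts/; the helper name put_first_on_path is hypothetical.

```python
import importlib
import sys
from pathlib import Path


def put_first_on_path(root):
    """Move `root` to the front of sys.path, adding it if absent, so the
    project's own packages are searched before any other sys.path entry."""
    root = str(Path(root).resolve())
    sys.path[:] = [p for p in sys.path if p != root]  # drop existing copies
    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make sure finders notice new files/dirs
```

After put_first_on_path(project_root), the original call mod = __import__("scripts.demo.mPayment", globals(), locals(), ['*']) returns the mPayment module itself, because a non-empty fromlist makes __import__ return the submodule rather than the top-level package.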

Changing directory with os in python

I'm trying to change the directory to the folder that contains the folder I'm in.
That is, I'm in /Users/ethanfuerst/Documents/Coding/mpgdata and I'm trying to go back to /Users/ethanfuerst/Documents/Coding and access a text file in the Coding folder.
When I run this code:
import os
print(os.getcwd())
os.chdir('⁨Users/ethanfuerst/Documents/Coding')
I get this output and error
/Users/ethanfuerst/Documents/Coding/mpgdata
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-19-6c6ee01785f6> in <module>
1 import os
2 print(os.getcwd())
----> 3 os.chdir('⁨Users/ethanfuerst/Documents/Coding')
FileNotFoundError: [Errno 2] No such file or directory: '\u2068Users/ethanfuerst/Documents/Coding'
Does anyone know why this could be the case?
Edit: forgot to mention I am on a Mac
Try using .. in the path to go up one directory, e.g. os.chdir('..'). (Putting C:\ in front of Users only applies on Windows; on a Mac, an absolute path starts with /.)
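Retype the path by hand: os.chdir("/Users/ethanfuerst/Documents/Coding"). As the error message shows ('\u2068Users/...'), the original string starts with an invisible Unicode character, U+2068, and is also missing the leading /. Switching from single to double quotes alone does not change the string.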
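If a path is pasted from elsewhere, invisible characters like the U+2068 in the error message can be detected and stripped before use. A minimal sketch; the helper names invisible_chars and clean_path are hypothetical, and removing every Unicode "format" character (category Cf) is an assumption that suits pasted paths but may be too aggressive for unusual file names.

```python
import unicodedata


def invisible_chars(text):
    """List the code points of any format characters in `text`."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"]


def clean_path(text):
    """Remove invisible Unicode format characters (category Cf), such as
    U+2068 FIRST STRONG ISOLATE, from a pasted path."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

The leading / still has to be added by hand. To go up one level without typing the path at all, os.chdir(Path.cwd().parent) (with from pathlib import Path) does the same as os.chdir('..').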

IPython notebook not loading my .txt file

I converted some Excel files to .txt files (Text (tab delimited)) and carefully saved them in the same folder as my notebook. I made sure there were no copies anywhere else on my computer and checked that all the .txt files could be opened in Notebook. After running %pylab inline to load the functions I usually use, I triple-checked with the ls command that my .txt files were in the same folder as the notebook. I then used loadtxt('filename.txt') to load my data, and it gave me Errno 2: No such file or directory: 'filename.txt'. I triple-checked my spelling and can't understand why it's not working. I need this data analysed today for a lab report due tomorrow. Help! Here's my code:
%pylab inline
ls
loadtxt('Physical_lab_experiment_2_25_degrees_with_times.txt')
and here is the error:
FileNotFoundError Traceback (most recent call last)
<ipython-input-9-c8d929572f25> in <module>()
----> 1 loadtxt('Physical_lab_experiment_2_25_degrees_with_times.txt')
C:\Users\Daisy\Anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
738 fh = iter(open(fname, 'U'))
739 else:
--> 740 fh = iter(open(fname))
741 else:
742 fh = iter(fname)
FileNotFoundError: [Errno 2] No such file or directory: 'Physical_lab_experiment_2_25_degrees_with_times.txt'
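loadtxt is a NumPy function. Import NumPy and call it through the np namespace:
import numpy as np
# np.loadtxt splits columns on whitespace by default, so tab-delimited files work
data = np.loadtxt('Physical_lab_experiment_2_25_degrees_with_times.txt')
print(data)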
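When loadtxt raises FileNotFoundError, it can help to print the folder Python is actually searching and any similarly named files in it. A minimal standard-library sketch; the helper name locate_file is hypothetical. One possible culprit, not confirmed by the question, is Windows hiding a doubled extension such as .txt.txt.

```python
import difflib
import os


def locate_file(filename, folder="."):
    """Report whether `filename` exists in `folder`.

    Returns (found, absolute_folder, similar_names), where similar_names
    lists up to three close matches when the file is not found.
    """
    folder = os.path.abspath(folder)
    found = os.path.isfile(os.path.join(folder, filename))
    similar = [] if found else difflib.get_close_matches(
        filename, os.listdir(folder), n=3, cutoff=0.6
    )
    return found, folder, similar
```

Usage: print(locate_file('Physical_lab_experiment_2_25_degrees_with_times.txt')) shows the working directory the notebook is really using and any near-miss file names there.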