When I am trying to read many h5 files in one shoot. There is an OS error states like this:
OSError: Unable to open file (unable to open file: name = '/scratch-lustre/hpc-0227/deepcpgData/c{1,2,3,4,5,7,9,11,13}_*.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
I am sure all of these files exist and files' name corresponded with c{1,2,3,4,5,7,9,11,13}_*.h5 rightly. I am also using absolute path. My bash script looks like this:
data_dir="/scratch-lustre/hpc-0227/deepcpgData"
train_files="$data_dir/c{1,2,3,4,5,7,9,11,13}_*.h5"
It works when I use the full name of a single file, for example, c16_1572864-1605632.h5. However, I need to read a lot of files. So I have to read them in one round. I have searched many other answers but none of them deal with multiple files and the os error at the same time.
Related
Currently I am working a project that has have been using the pathlib library so I can work on my Windows desktop when I need too and on my MacBook Pro. Essentially be able to work between both operating systems. I have not have any issues at all until right now. Here is the set up:
I have a pipeline set up to automatically save a .joblib and a whole lot of .png files that will go to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib,I would use the command
joblib.dump(model ,output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro, I have no issues when this is ran, but on Windows it gives me the following error
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried to use the .resolve() method to get the absolute path but still gives me the same error. I have tried to experiment to try to see what is goin on such as using os.path.exists(). When using the os.path.exists() method I get True for the follwoing command:
os.path.exists(output_dir)
So it does indeed recognize that the directory exists. The next thing I tried was to rename the file to something like dddddd.joblib and that worked. But I find that only a few names for the file would allow me to save the files. During debug I found that the most recent Traceback occurs here:
with open(filename, 'wb') as f:```
I was wondering if anyone here had any idea what was going on here and how I can fix this issue? Please and Thank you.
The solution was to enable long paths on Windows.
I imported my excel file into R Environment and saved the path by creating a new file in R scrip. However, when I tried to check my directory and load the dataset, I received the following message " Error: path does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’
What am I doing wrong here?
Thanks
Have you missed the format of your dataset, eg. csv, xlsx.
I suggest you first set your file as working directory, then the following code might help you with it.
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")
Hello i have one more problem with deploying my app by Streamlit. It works localy but when I want to upload it on git hub it doesnt work..Have no idea whats wrong. It seems that there is problem with path to the file:
"File "/app/streamlit/bobrza.py", line 14, in <module>
bobrza_locations = pd.read_csv(location)"
Here is link to my github repo. Will be very very grateful for help. Thank in advance.
https://github.com/Bordonous/streamlit
The problem is you are hard coding the path of the bobrza1.csv and route.csv to the path on your computer so when running the code on a different environment the path in not legal.
The solution is to make location independent from running environment, for this we will use the following:
__file__ variable - the path to the current python module (the .py file).
os.path.dirname() - a function to get directory name from path.
os.path.abspath() - a function to get a normalized absolutized version of path.
os.path.join() - a function to join one or more path components.
Now you need to change your location and location2 variables in the code to the following:
# get the absolute path to the directory contain the .csv file
dir_name = os.path.abspath(os.path.dirname(__file__))
# join the bobrza1.csv to directory to get file path
location = os.path.join(dir_name, 'bobrza1.csv')
# join the route.csv to directory to get file path
location2 = os.path.join(dir_name, 'route.csv')
Resulting in an independent path of the bobrza1.csv and route.csv.
I have generated a box file from a png image then I followed this tutorial:
https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/ to generate custom traineddata file.
I encountered an error when I tried to use the generated traineddata alongside with Pytesseract.
and i got this kind of error:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-4, "read_params_file:
Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt Error: LSTM requested, but not present!! Loading tesseract. mgr->GetComponent(TESSDATA_NORMPROTO, &fp)
:Error:Assert failed:in file adaptmatch.cpp, line 552")
I'm using Tesseract version 5.0
This is my config options
traineddata = f'+eng+lav+lav2'
config = f'-l {traineddata} --oem 1 --psm 3 {tessdata_dir}'
I followed the same tutorial and encountered the exact same error. At my first tries the ***.traineddata didn't generated well, and I findout that one file was missing (normproto). So I just cleaned all the generated files (except the corrected .box files) and rerun the train process, and everything worked fine on the second attempt.
I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox
/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses
/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site
/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace-character between --output= and the filename (--output= ~/Dropbox
/mariners_site/textFiles/expenses.csv) and because of that unoconv gets an empty output value (which means the current directory) and is given 2 files. And that explains why you get this specific error IMO