Read multiple h5 files but there is an os error couldn't find these files - h5py

When I am trying to read many h5 files in one shoot. There is an OS error states like this:
OSError: Unable to open file (unable to open file: name = '/scratch-lustre/hpc-0227/deepcpgData/c{1,2,3,4,5,7,9,11,13}_*.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
I am sure all of these files exist and files' name corresponded with c{1,2,3,4,5,7,9,11,13}_*.h5 rightly. I am also using absolute path. My bash script looks like this:
data_dir="/scratch-lustre/hpc-0227/deepcpgData"
train_files="$data_dir/c{1,2,3,4,5,7,9,11,13}_*.h5"
It works when I use the full name of a single file, for example, c16_1572864-1605632.h5. However, I need to read a lot of files. So I have to read them in one round. I have searched many other answers but none of them deal with multiple files and the os error at the same time.

Related

Relative Path (pathlib) name working on MAC OS but on Windows gives me a error

Currently I am working a project that has have been using the pathlib library so I can work on my Windows desktop when I need too and on my MacBook Pro. Essentially be able to work between both operating systems. I have not have any issues at all until right now. Here is the set up:
I have a pipeline set up to automatically save a .joblib and a whole lot of .png files that will go to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib,I would use the command
joblib.dump(model ,output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro, I have no issues when this is ran, but on Windows it gives me the following error
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried to use the .resolve() method to get the absolute path but still gives me the same error. I have tried to experiment to try to see what is goin on such as using os.path.exists(). When using the os.path.exists() method I get True for the follwoing command:
os.path.exists(output_dir)
So it does indeed recognize that the directory exists. The next thing I tried was to rename the file to something like dddddd.joblib and that worked. But I find that only a few names for the file would allow me to save the files. During debug I found that the most recent Traceback occurs here:
with open(filename, 'wb') as f:```
I was wondering if anyone here had any idea what was going on here and how I can fix this issue? Please and Thank you.
The solution was to enable long paths on Windows.

Error: `path` does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’

I imported my excel file into R Environment and saved the path by creating a new file in R scrip. However, when I tried to check my directory and load the dataset, I received the following message " Error: path does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’
What am I doing wrong here?
Thanks
Have you missed the format of your dataset, eg. csv, xlsx.
I suggest you first set your file as working directory, then the following code might help you with it.
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")

Deploying app, troubles to reffer to datasets. Streamlit

Hello i have one more problem with deploying my app by Streamlit. It works localy but when I want to upload it on git hub it doesnt work..Have no idea whats wrong. It seems that there is problem with path to the file:
"File "/app/streamlit/bobrza.py", line 14, in <module>
bobrza_locations = pd.read_csv(location)"
Here is link to my github repo. Will be very very grateful for help. Thank in advance.
https://github.com/Bordonous/streamlit
The problem is you are hard coding the path of the bobrza1.csv and route.csv to the path on your computer so when running the code on a different environment the path in not legal.
The solution is to make location independent from running environment, for this we will use the following:
__file__ variable - the path to the current python module (the .py file).
os.path.dirname() - a function to get directory name from path.
os.path.abspath() - a function to get a normalized absolutized version of path.
os.path.join() - a function to join one or more path components.
Now you need to change your location and location2 variables in the code to the following:
# get the absolute path to the directory contain the .csv file
dir_name = os.path.abspath(os.path.dirname(__file__))
# join the bobrza1.csv to directory to get file path
location = os.path.join(dir_name, 'bobrza1.csv')
# join the route.csv to directory to get file path
location2 = os.path.join(dir_name, 'route.csv')
Resulting in an independent path of the bobrza1.csv and route.csv.

Error when trying to use custom tessdata file

I have generated a box file from a png image then I followed this tutorial:
https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/ to generate custom traineddata file.
I encountered an error when I tried to use the generated traineddata alongside with Pytesseract.
and i got this kind of error:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-4, "read_params_file:
Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt Error: LSTM requested, but not present!! Loading tesseract. mgr->GetComponent(TESSDATA_NORMPROTO, &fp)
:Error:Assert failed:in file adaptmatch.cpp, line 552")
I'm using Tesseract version 5.0
This is my config options
traineddata = f'+eng+lav+lav2'
config = f'-l {traineddata} --oem 1 --psm 3 {tessdata_dir}'
I followed the same tutorial and encountered the exact same error. At my first tries the ***.traineddata didn't generated well, and I findout that one file was missing (normproto). So I just cleaned all the generated files (except the corrected .box files) and rerun the train process, and everything worked fine on the second attempt.

unoconv fails to save in my specified directory

I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox
/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses
/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site
/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace-character between --output= and the filename (--output= ~/Dropbox
/mariners_site/textFiles/expenses.csv) and because of that unoconv gets an empty output value (which means the current directory) and is given 2 files. And that explains why you get this specific error IMO