I have generated a box file from a png image then I followed this tutorial:
https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/ to generate custom traineddata file.
I encountered an error when I tried to use the generated traineddata alongside with Pytesseract.
and i got this kind of error:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-4, "read_params_file:
Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt Error: LSTM requested, but not present!! Loading tesseract. mgr->GetComponent(TESSDATA_NORMPROTO, &fp)
:Error:Assert failed:in file adaptmatch.cpp, line 552")
I'm using Tesseract version 5.0
This is my config options
traineddata = f'+eng+lav+lav2'
config = f'-l {traineddata} --oem 1 --psm 3 {tessdata_dir}'
I followed the same tutorial and encountered the exact same error. At my first tries the ***.traineddata didn't generated well, and I findout that one file was missing (normproto). So I just cleaned all the generated files (except the corrected .box files) and rerun the train process, and everything worked fine on the second attempt.
Related
Currently I am working a project that has have been using the pathlib library so I can work on my Windows desktop when I need too and on my MacBook Pro. Essentially be able to work between both operating systems. I have not have any issues at all until right now. Here is the set up:
I have a pipeline set up to automatically save a .joblib and a whole lot of .png files that will go to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib,I would use the command
joblib.dump(model ,output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro, I have no issues when this is ran, but on Windows it gives me the following error
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried to use the .resolve() method to get the absolute path but still gives me the same error. I have tried to experiment to try to see what is goin on such as using os.path.exists(). When using the os.path.exists() method I get True for the follwoing command:
os.path.exists(output_dir)
So it does indeed recognize that the directory exists. The next thing I tried was to rename the file to something like dddddd.joblib and that worked. But I find that only a few names for the file would allow me to save the files. During debug I found that the most recent Traceback occurs here:
with open(filename, 'wb') as f:```
I was wondering if anyone here had any idea what was going on here and how I can fix this issue? Please and Thank you.
The solution was to enable long paths on Windows.
When I am trying to read many h5 files in one shoot. There is an OS error states like this:
OSError: Unable to open file (unable to open file: name = '/scratch-lustre/hpc-0227/deepcpgData/c{1,2,3,4,5,7,9,11,13}_*.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
I am sure all of these files exist and files' name corresponded with c{1,2,3,4,5,7,9,11,13}_*.h5 rightly. I am also using absolute path. My bash script looks like this:
data_dir="/scratch-lustre/hpc-0227/deepcpgData"
train_files="$data_dir/c{1,2,3,4,5,7,9,11,13}_*.h5"
It works when I use the full name of a single file, for example, c16_1572864-1605632.h5. However, I need to read a lot of files. So I have to read them in one round. I have searched many other answers but none of them deal with multiple files and the os error at the same time.
I tried running this code but an error occurred that the parameters couldn't be read:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
The error was:
"read_params_file: Can't open .stderr"
Try:-
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train.stderr
Instead:-
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
Remove the space and try. It will work.
You instructed tesseract to read parameters/configuration from file .stderr. And tesseract is not able to open it/read from it. I guess it does not exist.
Hi would like to have multiple kickstart files which use a central kickstart file for the bulk of the install and a second file for the small differences. I'm building DVDs for distribution.
The first ks contains small config and has a %include line which points to a common ks file which should do most of the work.
I'm having trouble with %include line.
Fist of all have I understood what %include is for?
Second I think I have the syntax wrong because when I boot I get the following error message:
unable to open input kickstart file: Could not open/read file:///mnt/sysimage/media/dvd/ks/common.cfg
I am installing from a DVD what is the correct path or syntax to the files stored in a sub directory called /ks/ of the DVD's root?
I have tried the following:
%include /mnt/sysimage/media/dvd/ks/common.cfg
%include cdrom:/ks/common.cfg
Does anyone have any working examples?
Thanks in advance for your support
I eventually found part of the answer
%include /mnt/stage2/ks/common.cfg
The dvd is mounted as stage2
However I now get an error message saying it cant read the file
%%include
I can see the file and less it if I hit ctrl + alt + F1
Does anyone have a working simple example of how this should be written?
Open your isolinux/isolinux.cfg from the OS and give the ks file path as below . You can enter your kick start option in boot: prompt of dvd
label 1
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option1.cfg 1
label 2
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option2 2
label 3
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option3.cfg 3
Then edit /isolinux/boot.msg and add the enter the below details
Select installation:
1) option 1
2) option 2
3) option 3
I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox
/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses
/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site
/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace-character between --output= and the filename (--output= ~/Dropbox
/mariners_site/textFiles/expenses.csv) and because of that unoconv gets an empty output value (which means the current directory) and is given 2 files. And that explains why you get this specific error IMO