Error when trying to use custom tessdata file - tesseract

I generated a box file from a PNG image, then followed this tutorial to generate a custom traineddata file:
https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/
When I tried to use the generated traineddata with Pytesseract, I got this error:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-4, "read_params_file:
Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt Error: LSTM requested, but not present!! Loading tesseract. mgr->GetComponent(TESSDATA_NORMPROTO, &fp)
:Error:Assert failed:in file adaptmatch.cpp, line 552")
I'm using Tesseract version 5.0.
These are my config options:
traineddata = f'+eng+lav+lav2'
config = f'-l {traineddata} --oem 1 --psm 3 {tessdata_dir}'

I followed the same tutorial and encountered the exact same error. On my first attempts the *.traineddata file was not generated correctly, and I found out that one file was missing (normproto). So I deleted all the generated files (except the corrected .box files) and reran the training process, and everything worked fine on the second attempt.
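
For anyone hitting the same assertion, a quick check before combining the files can show whether something like normproto is missing. This is only a sketch: missing_training_files is a hypothetical helper, and the component list assumes the legacy training flow from the tutorial leaves <lang>.unicharset, <lang>.inttemp, <lang>.pffmtable, <lang>.normproto and <lang>.shapetable in the working directory.

```python
from pathlib import Path

# Assumption: files the legacy training flow should produce before they are combined.
LEGACY_COMPONENTS = ("unicharset", "inttemp", "pffmtable", "normproto", "shapetable")


def missing_training_files(workdir, lang, components=LEGACY_COMPONENTS):
    """Return the expected '<lang>.<component>' file names that are absent from workdir."""
    workdir = Path(workdir)
    return [
        f"{lang}.{component}"
        for component in components
        if not (workdir / f"{lang}.{component}").is_file()
    ]


if __name__ == "__main__":
    # 'lav2' is the custom language name used in the question's config.
    print(missing_training_files(".", "lav2"))
```

An empty list means all the assumed components are present; any name it returns is a file to regenerate before building the traineddata again.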

Related

Relative Path (pathlib) name working on MAC OS but on Windows gives me a error

I am currently working on a project that uses the pathlib library so that I can work on my Windows desktop when I need to and on my MacBook Pro, essentially moving between both operating systems. I have not had any issues at all until now. Here is the setup:
I have a pipeline set up to automatically save a .joblib and a whole lot of .png files that will go to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib, I would use the command
joblib.dump(model ,output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro I have no issues when this is run, but on Windows it gives me the following error
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried using the .resolve() method to get the absolute path, but I still get the same error. I also experimented to see what is going on, for example with os.path.exists(). I get True for the following command:
os.path.exists(output_dir)
So it does recognize that the directory exists. Next I tried renaming the file to something like dddddd.joblib, and that worked. However, only a few file names allow me to save the files. While debugging, I found that the most recent call in the traceback is here:
with open(filename, 'wb') as f:
I was wondering if anyone here had any idea what was going on here and how I can fix this issue? Please and Thank you.
The solution was to enable long paths on Windows.
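
A likely reason the setting matters: without long path support, Windows APIs traditionally reject paths longer than MAX_PATH (260 characters, counting the terminating NUL), and what counts is the absolute path, which is longer than the relative one shown in the error. That would also fit the observation that a short name like dddddd.joblib works. The sketch below is a rough pre-check, not a definitive test; exceeds_max_path is a hypothetical helper and 260 is the classic default limit.

```python
import os

WINDOWS_MAX_PATH = 260  # classic limit, including the terminating NUL


def exceeds_max_path(path, limit=WINDOWS_MAX_PATH):
    """Return True if the absolute form of `path` does not fit within `limit` characters."""
    absolute = os.path.abspath(os.fspath(path))
    # +1 accounts for the terminating NUL that the limit includes.
    return len(absolute) + 1 > limit


if __name__ == "__main__":
    # Hypothetical file name, just to show the call.
    print(exceeds_max_path("RandomForest_TumorThrombus_104.joblib"))
```

Calling it on output_dir / 'RandomForest_TumorThrombus_104.joblib' before joblib.dump would show whether the full path is over the limit on the Windows machine.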

Read multiple h5 files but there is an os error couldn't find these files

When I try to read many h5 files in one go, I get an OS error like this:
OSError: Unable to open file (unable to open file: name = '/scratch-lustre/hpc-0227/deepcpgData/c{1,2,3,4,5,7,9,11,13}_*.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
I am sure all of these files exist and that their names match c{1,2,3,4,5,7,9,11,13}_*.h5. I am also using an absolute path. My bash script looks like this:
data_dir="/scratch-lustre/hpc-0227/deepcpgData"
train_files="$data_dir/c{1,2,3,4,5,7,9,11,13}_*.h5"
It works when I use the full name of a single file, for example c16_1572864-1605632.h5. However, I need to read a lot of files, so I have to read them in one go. I have searched many other answers, but none of them deal with multiple files and the OS error at the same time.
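
A possible explanation, judging from the error text: the name handed to the HDF5 library is the literal pattern, braces and asterisk included. Bash does not perform brace expansion or globbing inside double quotes, so train_files holds the pattern itself rather than a list of files. How the variable is consumed is not shown in the question, so the following is only a sketch of doing the expansion on the Python side; expand_h5_files is a hypothetical helper, not part of any library.

```python
import glob
import os


def expand_h5_files(data_dir, chromosomes):
    """Return the sorted files matching c<chrom>_*.h5 for each listed chromosome."""
    files = []
    for chrom in chromosomes:
        # The underscore after the number keeps c1_* from also matching c11_* or c13_*.
        pattern = os.path.join(data_dir, f"c{chrom}_*.h5")
        files.extend(sorted(glob.glob(pattern)))
    return files


if __name__ == "__main__":
    train_files = expand_h5_files(
        "/scratch-lustre/hpc-0227/deepcpgData", [1, 2, 3, 4, 5, 7, 9, 11, 13]
    )
    print(len(train_files), "files found")
```

Each returned entry is a concrete file name that can be opened individually, the same way the single-file case already works.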

I tried running .box extension file in tesseract but got "read_params_file: Can't open .stderr"

I tried running this command, but got an error saying the parameters couldn't be read:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
The error was:
"read_params_file: Can't open .stderr"
Try:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train.stderr
Instead of:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
Remove the space and try again. It should work.
You instructed tesseract to read parameters/configuration from a file named .stderr, and tesseract was not able to open or read it. I guess it does not exist.
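
To see why the space matters, it can help to look at how a shell splits the two command lines into arguments. The sketch below uses Python's shlex.split, which follows POSIX-style splitting rules; it only illustrates the tokenization and does not run tesseract.

```python
import shlex

with_space = "tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr"
without_space = "tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train.stderr"

args_with_space = shlex.split(with_space)
args_without_space = shlex.split(without_space)

# With the space, '.stderr' becomes its own argument, which tesseract
# then treats as one more config file name to open.
print(args_with_space[-2:])     # ['box.train', '.stderr']
print(args_without_space[-1:])  # ['box.train.stderr']
```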

%include centos kickstart unable to open input kickstart file

Hi, I would like to have multiple kickstart files: a central kickstart file for the bulk of the install and a second file for the small differences. I'm building DVDs for distribution.
The first ks file contains a small amount of config and has a %include line that points to a common ks file, which should do most of the work.
I'm having trouble with the %include line.
First of all, have I understood what %include is for?
Second, I think I have the syntax wrong, because when I boot I get the following error message:
unable to open input kickstart file: Could not open/read file:///mnt/sysimage/media/dvd/ks/common.cfg
I am installing from a DVD. What is the correct path or syntax for files stored in a subdirectory called /ks/ in the DVD's root?
I have tried the following:
%include /mnt/sysimage/media/dvd/ks/common.cfg
%include cdrom:/ks/common.cfg
Does anyone have any working examples?
Thanks in advance for your support
I eventually found part of the answer:
%include /mnt/stage2/ks/common.cfg
The DVD is mounted as stage2.
However, I now get an error message saying it can't read the file:
%%include
I can see the file and view it with less if I hit Ctrl + Alt + F1.
Does anyone have a simple working example of how this should be written?
Open your isolinux/isolinux.cfg from the OS and set the ks file path for each entry as below. You can then enter your kickstart option at the boot: prompt of the DVD.
label 1
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option1.cfg 1
label 2
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option2 2
label 3
kernel vmlinuz
append initrd=initrd.img nofb skipddc lang= devfs=nomount ramdisk_size=8192 ks=cdrom:/option3.cfg 3
Then edit /isolinux/boot.msg and add the details below:
Select installation:
1) option 1
2) option 2
3) option 3

unoconv fails to save in my specified directory

I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace character between --output= and the filename (--output= ~/Dropbox/mariners_site/textFiles/expenses.csv). Because of that, unoconv gets an empty output value (which means the current directory) and is given two input files. That would explain why you get this specific error, in my opinion.
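
The effect of that space can be reproduced with a toy parser. This is a minimal sketch, not unoconv's actual option handling: parse_output_option is a hypothetical helper that splits the command the way a shell would, takes whatever follows --output= as the output value, and treats every argument that does not start with a dash as an input file.

```python
import shlex


def parse_output_option(command):
    """Return (output_value, input_files) for a unoconv-style command line."""
    tokens = shlex.split(command)[1:]  # drop the program name
    output = None
    input_files = []
    for token in tokens:
        if token.startswith("--output="):
            output = token.partition("=")[2]  # empty string if nothing follows '='
        elif not token.startswith("-"):
            input_files.append(token)
    return output, input_files


broken = ("unoconv -vvv --doctype=spreadsheet --format=csv --output= "
          "~/Dropbox/mariners_site/textFiles/expenses.csv "
          "~/Dropbox/Aldeburgh/expenses/expenses.ods")
fixed = broken.replace("--output= ", "--output=")

print(parse_output_option(broken))
print(parse_output_option(fixed))
```

With the space, the output value comes back empty and the intended output path is counted as a second input file; without it, the output value is the .csv path and only the .ods file is left as input.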