I tried running tesseract with a .box file but got "read_params_file: Can't open .stderr"

I tried running this command, but it failed with an error saying the parameters couldn't be read:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
The error was:
"read_params_file: Can't open .stderr"

Try:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train.stderr
instead of:
tesseract eng.font-name.exp0.tif eng.font-name.box nobatch box.train .stderr
Remove the space before .stderr and it will work: box.train.stderr is the name of a single config file.

With the space, you instructed tesseract to read parameters/configuration from a file named .stderr, which tesseract cannot open, presumably because no such file exists.
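A minimal Python sketch, assuming the tesseract binary is on PATH and reusing the file names from the question: building the command as an argument list makes each config-file name a separate element, so box.train.stderr cannot be split at a stray space. The helper names box_train_command and run_box_train are hypothetical.

```python
import subprocess
from typing import List


def box_train_command(image: str, outputbase: str) -> List[str]:
    """Build the tesseract argv for a box.train.stderr run (hypothetical helper).

    Every list element reaches tesseract as exactly one argument, so
    "box.train.stderr" stays a single config-file name.
    """
    return ["tesseract", image, outputbase, "nobatch", "box.train.stderr"]


def run_box_train(image: str, outputbase: str) -> None:
    """Run the command; assumes the tesseract binary is on PATH."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(box_train_command(image, outputbase), check=True)
```

Usage would be run_box_train("eng.font-name.exp0.tif", "eng.font-name.box").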

Related

Relative path (pathlib) works on macOS but gives me an error on Windows

I am currently working on a project that uses the pathlib library so that I can work on both my Windows desktop and my MacBook Pro, switching between the two operating systems as needed. I had no issues at all until now. Here is the setup:
I have a pipeline that automatically saves a .joblib file and many .png files to the directory
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, to save a .joblib file named RandomForest_TumorThrombus_104.joblib, I use the command
joblib.dump(model, output_dir / 'RandomForest_TumorThrombus_104.joblib')
This runs without issues on my MacBook Pro, but on Windows it raises the following error:
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried using the .resolve() method to get the absolute path, but it still gives me the same error. I also experimented to see what is going on, for example with os.path.exists(), which returns True for the following command:
os.path.exists(output_dir)
So it does recognize that the directory exists. Next, I tried renaming the file to something like dddddd.joblib, and that worked, but only certain file names let me save the files. While debugging, I found that the most recent call in the traceback is:
with open(filename, 'wb') as f:
Does anyone know what is going on here and how I can fix it? Thank you.
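The solution was to enable long path support on Windows (the LongPathsEnabled registry setting). The full path was longer than Windows' default path-length limit, which is why only shorter file names worked.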
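A minimal sketch for detecting up front whether a target path would hit the limit before calling joblib.dump, assuming the legacy Windows MAX_PATH of 260 characters (which includes the terminating null). The helper names exceeds_max_path and checked_target are hypothetical.

```python
import os
from pathlib import Path

# Assumption: legacy Windows MAX_PATH is 260 characters including the
# terminating null, so a full path of 260+ characters can fail to open
# unless long-path support is enabled.
MAX_PATH = 260


def exceeds_max_path(path: Path, limit: int = MAX_PATH) -> bool:
    """Return True if the absolute form of `path` is too long for legacy APIs."""
    # os.path.abspath joins with the current directory and collapses '..'
    # purely lexically, so the file does not need to exist yet.
    return len(os.path.abspath(path)) >= limit


def checked_target(directory: Path, filename: str) -> Path:
    """Join directory and filename, failing early with a clear message."""
    target = directory / filename
    if exceeds_max_path(target):
        raise ValueError(
            f"{len(os.path.abspath(target))}-character path may exceed MAX_PATH; "
            "enable long paths or shorten the directory/file names"
        )
    return target
```

With this, the save becomes joblib.dump(model, checked_target(output_dir, 'RandomForest_TumorThrombus_104.joblib')), which turns the opaque FileNotFoundError into an explicit message about path length.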

LibreOffice Java API fails to export PDF

I have Java code that exports an ODT file to PDF. It works fine on Windows and macOS but fails on Linux Mint 19.3 with LibreOffice 6.4.4.2. I can reproduce the same error using the DocumentConverter sample class, so I don't think I am doing anything wrong in my code. The error occurs when the storeAsURL() method is called. Here is the stack trace from DocumentConverter.java:
com.sun.star.task.ErrorCodeIOException: SfxBaseModel::impl_store <file:////home/leopold/Example.pdf> failed: 0x81a(Error Area:Io Class:Parameter Code:26)
at com.sun.star.lib.uno.environments.remote.Job.remoteUnoRequestRaisedException(Job.java:158)
at com.sun.star.lib.uno.environments.remote.Job.execute(Job.java:122)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:312)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:281)
at com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:81)
at com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:619)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:145)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:129)
at com.sun.proxy.$Proxy5.storeAsURL(Unknown Source)
at com.example.oo.DocumentConverter.traverse(DocumentConverter.java:139)
at com.example.oo.DocumentConverter.main(DocumentConverter.java:218)
I am, however, able to successfully convert to PDF directly using soffice:
/opt/libreoffice6.4/program/soffice --nologo --invisible --headless \
--convert-to pdf Example.odt
Is there any way I can generate more information about why this error is happening?

Error when trying to use custom tessdata file

I generated a box file from a PNG image and then followed this tutorial:
https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/ to generate a custom traineddata file.
When I tried to use the generated traineddata with pytesseract, I got this error:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-4, "read_params_file:
Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt read_params_file: Can't open txt Error: LSTM requested, but not present!! Loading tesseract. mgr->GetComponent(TESSDATA_NORMPROTO, &fp)
:Error:Assert failed:in file adaptmatch.cpp, line 552")
I'm using Tesseract version 5.0
These are my config options:
traineddata = f'+eng+lav+lav2'
config = f'-l {traineddata} --oem 1 --psm 3 {tessdata_dir}'
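The pytesseract README recommends wrapping the --tessdata-dir path in double quotes inside the config string, so that a path containing spaces reaches tesseract as one argument. A minimal sketch of building the config above that way, assuming tessdata_dir holds only the directory path; the helper name build_config and the example path are hypothetical, and the --oem and --psm defaults are copied from the question. This is not a confirmed fix for the error above.

```python
from typing import Sequence


def build_config(langs: Sequence[str], tessdata_dir: str, oem: int = 1, psm: int = 3) -> str:
    """Assemble a pytesseract config string (hypothetical helper).

    Languages are joined with '+' as tesseract's -l option expects, and the
    tessdata directory is wrapped in double quotes so spaces do not split it.
    """
    lang = "+".join(langs)
    return f'-l {lang} --oem {oem} --psm {psm} --tessdata-dir "{tessdata_dir}"'
```

For example, build_config(["eng", "lav", "lav2"], r"C:\my tessdata") can then be passed as config= to pytesseract.image_to_string.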
I followed the same tutorial and encountered exactly the same error. In my first attempts the ***.traineddata was not generated correctly, and I found that one file was missing (normproto). So I deleted all the generated files (except the corrected .box files), reran the training process, and everything worked on the second attempt.
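Since the assertion above fails while loading TESSDATA_NORMPROTO, it can help to check that every intermediate file is present before combining them. A minimal sketch, assuming the legacy (Tesseract 3-style) box-file workflow in which the training outputs are renamed with a language prefix and then merged with combine_tessdata <lang>. (with the trailing dot). The file list is an assumption, so adjust it to what your tutorial produces; the helper names are hypothetical.

```python
import os
from typing import Iterable, List

# Assumption: component files produced by a legacy box-file training run,
# renamed to "<lang>.<component>" before running combine_tessdata.
REQUIRED_SUFFIXES = ("unicharset", "inttemp", "normproto", "pffmtable", "shapetable")


def missing_components(lang: str, available: Iterable[str]) -> List[str]:
    """Return the expected '<lang>.<component>' names absent from `available`."""
    present = set(available)
    expected = (f"{lang}.{suffix}" for suffix in REQUIRED_SUFFIXES)
    return [name for name in expected if name not in present]


def missing_in_dir(lang: str, directory: str) -> List[str]:
    """Run the same check against the files actually in `directory`."""
    return missing_components(lang, os.listdir(directory))
```

If missing_in_dir("lav2", ".") reports lav2.normproto, the cntraining step did not produce its output, and rerunning the training from clean files (as described above) is the way to regenerate it.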

Problems converting a Jupyter notebook with Unicode characters to PDF

I was writing math class notes containing some Unicode characters (Simplified Chinese, in my case). When I tried to convert the notebook to PDF, the conversion failed with a 500 error. The error message reads:
...
*************************************************
("E:\Program Files (x86)\MiKTeX 2.9\tex\latex\fontspec\fontspec.sty"
("E:\Program Files (x86)\MiKTeX 2.9\tex\latex\fontspec\fontspec-xetex.sty"
("E:\Program Files (x86)\MiKTeX 2.9\tex\latex\base\fontenc.sty"
("E:\Program Files (x86)\MiKTeX 2.9\tex\latex\base\tuenc.def"))
("E:\Program Files (x86)\MiKTeX 2.9\tex\latex\fontspec\fontspec.cfg")
! Undefined control sequence.
<argument> \LaTeX3 error:
Erroneous variable \c__fontspec_shape_n_n_tl used!
l.3806 \emfontdeclare{ \emshape, \eminnershape }
?
! Emergency stop.
<argument> \LaTeX3 error:
Erroneous variable \c__fontspec_shape_n_n_tl used!
l.3806 \emfontdeclare{ \emshape, \eminnershape }
No pages of output.
Transcript written on notebook.log.
I guess the fontspec part went wrong, but I don't know how to solve it.
For reference, here is what I did before I got the 500 error:
1. I installed pandoc (I already had MiKTeX);
2. I changed the file
...\nbconvert\templates\latex\article.tplx
to use the ctexart class instead of article;
3. I changed the file
...\nbconvert\templates\latex\exporters\pdf.py
and rewrote the LaTeX command as
latex_command = List([u"xelatex", u"{filename}"], config=True,
help="Shell command used to compile latex."
)
4. I also tried the approach from https://github.com/ipython/ipython/issues/7150, which converts the ipynb file to a LaTeX file first and then to PDF. This didn't work for me either, mainly because the jupyter nbconvert command can't find the config file.
My OS is Windows 7 Ultimate x64, and I use the Anaconda3 Jupyter Notebook in Chrome.
Thanks in advance to anyone who takes the time to read my post. Any help would be appreciated.

ImageMagick convert command not generating images

I am trying to extract images on a CentOS machine using ImageMagick's convert command:
convert -coalesce http://cdn.abcdf.com/p/f7/81/d3/40/f781d34031e68828eaasdwc937cf3f8.gif /mnt/temp/123.png
I am getting the following error:
convert: unable to open image `//cdn.adnxs.com/p/f7/81/d3/40/f781d34031e6882840eaa6dc937cf3f8.gif': No such file or directory # error/blob.c/OpenBlob/2701.
convert: no decode delegate for this image format `HTTP' # error/constitute.c/ReadImage/504.
convert: no images defined `/mnt/ephemeral2/creative_report/temp/123.png' # error/convert.c/ConvertImageCommand/3258
I tried reinstalling ImageMagick from source but it was of no use.
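I resolved the issue by installing the missing libxml2-devel library and reinstalling ImageMagick (IM). ImageMagick uses libxml2 to fetch http:// URLs, so without it the URL input fails with the "no decode delegate for this image format `HTTP'" error above. These are the steps I followed: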
1) cd (ImageMagick folder)
2) make uninstall
3) yum install tcl-devel libpng-devel libjpeg-devel ghostscript-devel bzip2-devel freetype-devel libtiff-devel libxml2-devel
4) wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick-6.9.9-0.tar.gz
5) tar xvfz ImageMagick-6.9.9-0.tar.gz
6) cd (the folder created)
7) ./configure --prefix=/usr/local --with-bzlib=yes --with-fontconfig=yes --with-freetype=yes --with-gslib=yes --with-gvc=yes --with-jpeg=yes --with-jp2=yes --with-png=yes --with-tiff=yes
8) make clean
9) make
10) make install
On Windows, I would put double quotes around the path, since it looks like the image path is being split up.
Your link is bad: my browser says it cannot find the server. Also, in ImageMagick you should read the input image first, so once you fix your URL, use:
convert http://cdn.abcdf.com/p/f7/81/d3/40/f781d34031e68828eaasdwc937cf3f8.gif -coalesce /mnt/temp/123.png
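Building the command as an argument list rather than a single shell string also sidesteps the quoting problem mentioned above, since each path is passed as exactly one argument. A minimal sketch, assuming ImageMagick 6's convert binary is on PATH; the helper names coalesce_command and coalesce are hypothetical.

```python
import subprocess
from typing import List


def coalesce_command(source: str, destination: str) -> List[str]:
    """Build a convert call that reads the input image before -coalesce."""
    # Input first, then the operator, then the output, as ImageMagick expects.
    return ["convert", source, "-coalesce", destination]


def coalesce(source: str, destination: str) -> None:
    """Run the command; assumes ImageMagick's convert is on PATH."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(coalesce_command(source, destination), check=True)
```

The source can be a local file or, if ImageMagick was built with URL support as in the answer above, an http:// URL. Because PNG holds only one image, convert writes one file per frame (for example 123-0.png, 123-1.png, ...).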