Tesseract identifying the whole image as a single box?

I am trying to train Tesseract 3.02 on Ubuntu 14.04, following the guidelines on Cedric's blog.
First I tried to generate a box file using the following command:
tesseract eng.mr.exp0.jpg eng.mr.exp0 batch.nochop makebox
But the above command generates a box file containing a single line that treats the whole image as one box (it should have generated a box file with 6 lines). So I used jTessBoxEditor to edit the box file and create 6 boxes with the appropriate coordinates and characters. Now when I try to train Tesseract with that box file using the command
tesseract eng.mr.exp0.jpg eng.mr.exp0.box nobatch box.train
I get the error:
Tesseract Open Source OCR Engine v3.03 with Leptonica
FAIL!
APPLY_BOXES: boxfile line 1/0 ((20,24),(95,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2/7 ((96,24),(171,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3/0 ((172,24),(248,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 4/3 ((248,24),(324,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 5/3 ((324,24),(400,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 6/0 ((400,24),(476,192)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 6
Boxes failed resegmentation: 6
APPLY_BOXES: Unlabelled word at :Bounding box=(0,19)->(480,192)
Found 0 good blobs.
1 remaining unlabelled words deleted.
Generated training data for 0 words
What is the mistake I am making?
Image used is here

That image is so dirty! You will need to clean it up before any OCR software can recognize it or train with it. Once the image is preprocessed, you can use the mirc.traineddata found in this Tesseract forum post.

Related

Error while loading .ods sheet into Octave

I am trying to read a column of integers in a spreadsheet into Octave using the code
A = odsread('Data.ods', 'Sheet1', 'A1:A946');
But it fails with the following warnings and errors:
> unzip: cannot find or open Data.ods, Data.ods.zip or Data.ods.ZIP.
file Data.ods couldn't be unpacked. Is it the proper file format?
warning: UnZip failed with error 9
Output:
error: warning: STATE structure must have fields 'identifier' and 'state'
error: called from
__OCT_spsh_open__ at line 72 column 7
odsopen at line 267 column 30
odsread at line 179 column 7
So the error says "STATE structure must have fields 'identifier' and 'state'". What does this mean?
Apparently the full path to the file must be given to make it work. odsread cannot find the file by name alone, even when the file is in the same directory.

Dual regression error (multiple files in a text file)

I'm running into some trouble using dual_regression. I run the following command and get the following errors:
> macminngh:session_one_and_three sondosayyash$ dual_regression /Users/sondosayyash/Downloads/FIX_sNorm/40_subjects.gica/groupmelodic.ica/melodic_IC.nii.gz 1 -1 5000 dualreg_40subj_output.dr 'cat /Users/sondosayyash/Desktop/Users.txt'
/Users/sondosayyash/abin/fsl/bin/dual_regression: line 126: [: too many arguments
mkdir: dualreg_40subj_output.dr: File exists
mkdir: dualreg_40subj_output.dr/scripts+logs: File exists
creating common mask
/bin/sh: line 1: syntax error near unexpected token `dualreg_40subj_output.dr/scripts+logs/drA'
/bin/sh: line 1: `file (dualreg_40subj_output.dr/scripts+logs/drA) does not exist -T 5 -N drB -l dualreg_40subj_output.dr/scripts+logs dualreg_40subj_output.dr/scripts+logs/drB'
doing the dual regressions
sorting maps and running randomise
/bin/sh: line 1: you: command not found
I don't know where I'm going wrong.
The text file, 'Users.txt', lists many different file paths to filtered_func data.
I have a feeling there is a problem with the text file but I'm not entirely sure.
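One possible cause, offered as an assumption rather than a confirmed diagnosis: the last argument in the command above is wrapped in straight single quotes. The shell passes single-quoted text through literally, so the script would receive the words cat /Users/... instead of the file paths. Backticks (or the equivalent $(...)) run cat and substitute its output. The sketch below shows the difference. The list file and its two paths are hypothetical stand-ins for Users.txt, and count_args is a helper introduced only for illustration.

```shell
# Hypothetical stand-in for Users.txt: one input path per line.
list_file=$(mktemp)
printf '%s\n' /data/sub01/filtered_func_data /data/sub02/filtered_func_data > "$list_file"

# Illustrative helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Single quotes: the text is passed literally as ONE argument; cat never runs.
quoted=$(count_args 'cat /path/to/Users.txt')

# Command substitution: cat runs and each path becomes its own argument.
# $(...) behaves like backticks here; it is left unquoted on purpose so the
# shell splits the output on whitespace.
expanded=$(count_args $(cat "$list_file"))

echo "single-quoted: $quoted argument(s)"
echo "substituted:   $expanded argument(s)"

rm -f "$list_file"
```

With the two-line list file, the single-quoted form yields 1 argument and the substituted form yields 2. Because the substitution relies on whitespace splitting, paths that contain spaces would be broken apart.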

How to use bibliography in different directory when knitting rmarkdown document to beamer presentation?

I'm knitting some beamer slides in an RMarkdown script in RStudio on a Windows 7 PC. The slides are in the directory
C:/me/slides/myslides.Rmd
I have a master bibliography that lives in
C:/me/bib/masterbib.bib
I cannot figure out how to link to the bibliography file from the RMarkdown document. Here's the YAML from my attempt:
---
title: "Slides"
author: "me"
date: "2016-12-20"
bibliography: C:/me/bib/masterbib.bib
biblio-style: "apalike"
output:
  beamer_presentation:
    citation_package: natbib
---
Here's the error:
! Undefined control sequence.
<write> \string \bibdata {C:\me\bib\masterbib}
l.174 \end{frame}
Error: Failed to compile Slides.tex. See Slides.log for more info.
In addition: Warning message:
running command '"pdflatex" -halt-on-error -interaction=batchmode "Slides.tex"' had status 1
Execution halted
I've tried a couple other ways to specify the directory for masterbib.bib, but none have worked. I would prefer to keep the masterbib.bib file where it is, and not make an extra copy in the C:/me/slides/ directory. Thanks for your help!
Edit
When attempting to pass the following into YAML (quoted with forward slashes):
bibliography: "C:/LaTeXstuff/BibTexLibrary/BrianBib.bib"
I get a fatal error with log output:
! Undefined control sequence.
<write> \string \bibdata {C:\me
\bib\masterbib}
l.174 \end{frame}
Here is how much of TeX's memory you used:
18047 strings out of 494045
334241 string characters out of 3145937
424206 words of memory out of 3000000
20891 multiletter control sequences out of 15000+200000
31808 words of font info for 44 fonts, out of 3000000 for 9000
715 hyphenation exceptions out of 8191
56i,11n,55p,434b,376s stack positions out of 5000i,500n,10000p,200000b,50000s
! ==> Fatal error occurred, no output PDF file produced!
When passing the following into YAML (quoted with backslashes)
bibliography: "C:\me\bib\masterbib.bib"
I get the following error in the RStudio console
Error in yaml::yaml.load(enc2utf8(string), ...) :
Scanner error: while parsing a quoted scalar at line 4, column 15found unknown escape character at line 4, column 29
Calls: <Anonymous> ... yaml_load_utf8 -> mark_utf8 -> <Anonymous> -> .Call
Execution halted
When passing the following into YAML (unquoted with backslashes)
bibliography: C:\me\bib\masterbib.bib
I get the following error in the RStudio console
! Undefined control sequence.
<write> \string \bibdata {C:\me
\bib\masterbib}
l.174 \end{frame}
Error: Failed to compile BibTest.tex. See BibTest.log for more info.
In addition: Warning message:
running command '"pdflatex" -halt-on-error -interaction=batchmode "BibTest.tex"' had status 1
Execution halted
Try unquoted with two backslashes:
...
bibliography: C:\\me\\bib\\masterbib.bib
...

Tesseract index >= 0 && index < size_used_:Error:Assert failed Error

I successfully built the traineddata file for a new Tesseract language, but now that I am finished, I keep getting the following error:
index >= 0 && index < size_used_:Error:Assert failed:in file ../ccutil/genericvector.h, line 657
However, this even happens when I run tesseract on an image I trained with! I am confused as to what is going on, as I would expect that the error should not occur if I run tesseract on the training set.
This error is caused by the lack of a lang.shapetable file in your lang.traineddata file.
Make sure that you generate the shapetable:
shapeclustering -F font_properties -U unicharset lang.font.exp0.box.tr
This will create a file named shapetable. You will need to rename this to lang.shapetable before you can combine everything:
combine_tessdata lang.
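The rename-and-combine steps above can be guarded with a small pre-flight check. This is a minimal sketch, not part of the Tesseract tooling: missing_tessdata_parts is a hypothetical helper. The component list it checks (unicharset, inttemp, pffmtable, normproto, shapetable) is an assumption based on the Tesseract 3.x training workflow, so adjust it to whatever your training run actually produces.

```shell
# Hypothetical helper: print every component file that is still missing
# for language prefix $1 in directory $2 (default: current directory).
# The component list is an assumption based on the Tesseract 3.x workflow.
missing_tessdata_parts() {
    lang=$1
    dir=${2:-.}
    for part in unicharset inttemp pffmtable normproto shapetable; do
        # Report the file name only when it does not exist.
        [ -f "$dir/$lang.$part" ] || printf '%s\n' "$lang.$part"
    done
}
```

For example, after renaming shapetable to lang.shapetable, calling missing_tessdata_parts lang . should print nothing. If it prints lang.shapetable, the rename step was skipped and the combined traineddata would lack the shape table.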
That error indicates your training failed, which means you overlooked some error message during training.

Converting MATLAB files to Octave

I have a series of experiments that were written for MATLAB, but recently we are trying to run them through Octave instead. I realize they are mostly compatible, but I have been running into a few problems, and none of the online FAQs or directions I have found have addressed these at all. It's complicated a bit because there are multiple .m files that interact; however, for now I am going to focus on the main program. Anyway, so when I try to run the file (MLP.m) through octave, I get the following errors in the Terminal window:
error: dir: expecting directory or filename to be a char array
error: called from:
error: /Applications/Octave.app/Contents/Resources/share/octave/3.2.3/m/miscellaneous/dir.m at line 128, column 5
error: /Applications/MATLAB_R2008a/toolbox/psychoacoustics/MLParameters.m at line 86, column 7
error: /Applications/MATLAB_R2008a/toolbox/psychoacoustics/MLP.m at line 9, column 3
The lines it is referencing are as follows:
1)
d = dir([cd myslash 'Experiments_MLP' myslash '*.m']);
2)
s = MLParameters;
What about these lines is incompatible with Octave? I can't find anything online that indicates that these won't work.
After that, the Terminal window gives me this batch of nonsense:
dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
Referenced from: /usr/X11R6/lib/libfontconfig.1.dylib
Reason: Incompatible library version: libfontconfig.1.dylib requires version 13.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
Referenced from: /usr/X11R6/lib/libfontconfig.1.dylib
Reason: Incompatible library version: libfontconfig.1.dylib requires version 13.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
/Applications/Gnuplot.app/Contents/Resources/bin/gnuplot: line 71: 1077 Trace/BPT trap GNUTERM="${GNUTERM}" GNUPLOT_HOME="${GNUPLOT_HOME}" PATH="${PATH}" DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" HOME="${HOME}" GNUHELP="${GNUHELP}" DYLD_FRAMEWORK_PATH="${DYLD_FRAMEWORK_PATH}" GNUPLOT_PS_DIR="${GNUPLOT_PS_DIR}" DISPLAY="${DISPLAY}" GNUPLOT_DRIVER_DIR="${GNUPLOT_DRIVER_DIR}" "${ROOT}/bin/gnuplot-4.2.6" "$#"
/Applications/Gnuplot.app/Contents/Resources/bin/gnuplot: line 71: 1083 Trace/BPT trap GNUTERM="${GNUTERM}" GNUPLOT_HOME="${GNUPLOT_HOME}" PATH="${PATH}" DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" HOME="${HOME}" GNUHELP="${GNUHELP}" DYLD_FRAMEWORK_PATH="${DYLD_FRAMEWORK_PATH}" GNUPLOT_PS_DIR="${GNUPLOT_PS_DIR}" DISPLAY="${DISPLAY}" GNUPLOT_DRIVER_DIR="${GNUPLOT_DRIVER_DIR}" "${ROOT}/bin/gnuplot-4.2.6" "$#"
error: you must have gnuplot installed to display graphics; if you have gnuplot installed in a non-standard location, see the 'gnuplot_binary' function
I have GNUPlot installed, and I checked the gnuplot_binary function, which didn't give me any answers. GNUPlot is installed in my /Applications directory, along with Octave itself. Why shouldn't this work? The README file that came with GNUPlot didn't indicate a special directory for it to be installed in. What about the dyld "Library not loaded" errors? Are they related to the GNUPlot problem, or are they something else?
Anyway, thanks for your help
I know you already solved your problem, but if you have problems again here are some links with basic information about the differences between Matlab and Octave:
Porting programs from Matlab to Octave
Differences between Octave and MATLAB
Addressing your first error, it's easier to explain with an example:
dirName = '/some/path'; %# base directory
filesPath = fullfile(dirName, 'MLP', '*.m'); %# full path string
d = dir(filesPath); %# expand/enumerate files
for i=1:numel(d)
    disp( d(i).name )
end
You also could have built the path using string concatenation yourself:
%# '/some/path/MLP/*.m'
filesPath = [dirName filesep 'MLP' filesep '*.m'];
The above should work for both MATLAB and Octave.