I'm trying to run tesseract-ocr on this image, unsuccessfully:
> wget http://i.imgur.com/dOtlrvx.png
...
> convert dOtlrvx.png dOtlrvx.tif
> tesseract dOtlrvx.tif out -psm 10 && cat out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
.
The recognized character is a dot (".").
-psm 10 means "treat the image as a single character", so I think it's the correct option to use. I also tried the other possible psm values; none of them works either.
Does anyone have an idea why this is not working? Any suggestions are welcome!
Thanks
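Create a new config file for tesseract containing this line:
tessedit_char_whitelist 0123456789
and then process your image with it:
tesseract dOtlrvx.tif out -psm 10 your_config_file
The whitelist restricts recognition to digits, so tesseract can no longer answer with a dot.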
This worked for me.
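As a minimal sketch of the same fix, assuming a POSIX shell, the helper below writes the one-line config file and leaves the tesseract call to you. The function name write_whitelist_config and the file name my_digits are hypothetical; tessedit_char_whitelist is the real Tesseract variable used in the answer above, and the config can be given to tesseract as a path.

```shell
# write_whitelist_config FILE CHARS  (hypothetical helper)
# Writes a Tesseract config file with a single line that restricts
# recognition to the characters in CHARS.
write_whitelist_config() {
  printf 'tessedit_char_whitelist %s\n' "$2" > "$1"
}

# Usage (requires tesseract on the PATH):
#   write_whitelist_config my_digits 0123456789
#   tesseract dOtlrvx.tif out -psm 10 ./my_digits    # Tesseract 3.x syntax
#   tesseract dOtlrvx.tif out --psm 10 ./my_digits   # newer releases spell it --psm
```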
I want to use Tesseract to recognize code. According to their website, I can disable the dictionaries by setting both configuration variables, load_system_dawg and load_freq_dawg, to false.
However, I haven't been able to get it to work:
$ tesseract img.jpg output.txt --oem 0 -c load_system_dawg=0 load_freq_dawg=0
read_params_file: Can't open load_freq_dawg=0
Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Any ideas on how best to handle this?
First of all, get an eng.traineddata that includes the legacy engine, or choose a different OCR engine mode (OEM) with --oem.
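Next, read the output of tesseract --help-extra carefully:
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
In other words, every variable needs its own -c. Without it, load_freq_dawg=0 is treated as the name of a config file, which is what the "read_params_file: Can't open load_freq_dawg=0" message means. Note also that tesseract appends .txt to the output base itself, so pass output rather than output.txt:
tesseract img.jpg output --oem 0 -c load_system_dawg=0 -c load_freq_dawg=0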
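A small POSIX-shell sketch of the same rule: the hypothetical helper tesseract_with_vars turns every VAR=VALUE argument into its own -c VAR=VALUE pair before calling tesseract, so no variable can be mistaken for a config file name. It passes no --oem flag, so tesseract uses its default engine.

```shell
# tesseract_with_vars IMAGE OUTBASE [VAR=VALUE ...]  (hypothetical helper)
# Calls tesseract with a separate "-c VAR=VALUE" for every variable given.
tesseract_with_vars() {
  local image="$1" outbase="$2" kv
  shift 2
  # Rebuild the positional parameters: for each original VAR=VALUE,
  # append "-c VAR=VALUE" and drop the original from the front.
  for kv in "$@"; do
    set -- "$@" -c "$kv"
    shift
  done
  tesseract "$image" "$outbase" "$@"
}

# Usage:
#   tesseract_with_vars img.jpg output load_system_dawg=0 load_freq_dawg=0
# runs: tesseract img.jpg output -c load_system_dawg=0 -c load_freq_dawg=0
```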
This question already has answers here: Tesseract quiet mode (2 answers). Closed 1 year ago.
I'm using tesseract to OCR pictures, and I would like to suppress the somewhat verbose output about the pages tesseract is scanning:
:~$ tesseract stdin stdout -l eng txt
Page 1
<ocr output>
Is it possible to remove the "Page 1" from the output?
:~$ tesseract --version
tesseract 4.0.0-146-gc39a
Try adding the quiet option at the end of the command:
tesseract stdin stdout -l eng txt quiet
If you only want to see the OCR'd text, just redirect stderr to /dev/null:
foo | tesseract - - 2>/dev/null
Or, of course, redirect it to a log file if you prefer.
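The stderr redirect can be wrapped in a small function. This is a POSIX-shell sketch; ocr_text_only is a hypothetical name, and it relies on the behaviour the answer above describes, namely that tesseract's progress messages such as "Page 1" go to stderr while the recognized text goes to stdout.

```shell
# ocr_text_only IMAGE [LOGFILE]  (hypothetical helper)
# Prints only the recognized text. tesseract's stderr (progress messages
# such as "Page 1") is appended to LOGFILE, or discarded if none is given.
ocr_text_only() {
  tesseract "$1" stdout 2>>"${2:-/dev/null}"
}

# Usage:
#   ocr_text_only scan.png                  # text only
#   foo | ocr_text_only -                   # read the image from stdin
#   ocr_text_only scan.png tesseract.log    # keep the messages in a log
```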
I'm using Windows 10, if it matters, and I'm trying to feed a file to ArgyllCMS's "oeminst" tool, which converts it from .EDR to .CCSS. According to the tool's website, its usage summary is:
oeminst [-options] [inputfiles]
-v Verbose
-n Don't install, show where files would be installed
-c Don't install, save files to current directory
-S d Specify the install scope u = user (def.), l = local system]
infile Manufacturers setup.exe install file(s) or .dll(s) containing install files
infile.[edr|ccss|ccmx] EDR file(s) to translate and install or CCSS or CCMX files to install
If no file is provided, oeminst will look for the install CD.
More info can be found here: https://www.argyllcms.com/doc/oeminst.html
So far I have tried this command:
C:\Users\PC>oeminst infile. [C:\Users\PC\testfile.edr]
oeminst: Error - Unable to load file 'infile [C:\Users\PC\testfile]'
I'd appreciate it if someone could at least tell me whether I'm doing it right.
Try this:
oeminst infile.edr C:\Users\PC\testfile.edr
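Never mind, I got it. The infile and infile.[edr|ccss|ccmx] entries in the usage summary are placeholders for your own file paths, so the .edr file is passed directly: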
C:\Users\PC>oeminst C:\Users\PC\testfile.edr
I have managed to use
tesseract image.jpg output.txt
to read the text in an image file and save it as a text file, but now that I'm passing more specific options to tesseract, it tries to open the output file instead of saving into it.
I am trying to use
tesseract image.jpg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ%/-15 TextOutput
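I have literally just started using tesseract, so I may well be making a stupid mistake.
I figured out that it works if you insert a > after the options. Without it, TextOutput is taken as the name of a config file, which is why tesseract tried to open it; with it, the shell saves tesseract's standard output into the file, like this: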
tesseract image.jpg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ%/-1250 > TextOutput.txt
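A sketch of the working form as a POSIX-shell function (ocr_whitelisted is a hypothetical name). The output base stdout makes tesseract print the text, and the shell's > writes it to the file. Alternatively, passing a plain output base such as TextOutput in place of stdout should make tesseract write TextOutput.txt itself, with no redirect needed.

```shell
# ocr_whitelisted IMAGE CHARS OUTFILE  (hypothetical helper)
# Recognizes only the characters in CHARS and saves the text to OUTFILE.
# "stdout" as the output base sends the text to standard output and the
# redirect stores it; a bare word after the options would instead be read
# as a config file name.
ocr_whitelisted() {
  tesseract "$1" stdout -c "tessedit_char_whitelist=$2" > "$3"
}

# Usage:
#   ocr_whitelisted image.jpg 'ABCDEFGHIJKLMNOPQRSTUVWXYZ%/-1250' TextOutput.txt
```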
Running
gsutil -m cp -R 'gs://[BUCKET]/' 'C:/Users/[USER]/[FOLDER]'
on Windows displays the following error:
[Errno 22] invalid mode ('ab') or filename: u'C:\\Users\\[USER]\\[FOLDER]\\\\[BUCKET]\\[FILE].gstmp'
I've tried changing the '/'s to '//', '\' and '\\', with no results whatsoever.
So, after hours of trying to find out why this was happening, it turned out that the filenames contained a character that can't be used in filenames on Windows. Hope this helps if anybody else runs into this error.
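To find such objects before copying, a small POSIX-shell sketch can flag names containing characters that Windows forbids in file names (< > : " \ | ? *) and map them to underscores. The function names are hypothetical, and the check is not exhaustive: Windows also rejects, for example, reserved names such as CON and names ending in a dot or space. The / is left alone because gsutil turns it into folders on the local side.

```shell
# has_windows_illegal_chars NAME  (hypothetical helper)
# Succeeds (exit status 0) if NAME contains any of < > : " \ | ? *
has_windows_illegal_chars() {
  printf '%s\n' "$1" | grep -q '[<>:"\|?*]'
}

# to_windows_safe_name NAME  (hypothetical helper)
# Prints NAME with each of those characters replaced by an underscore.
to_windows_safe_name() {
  printf '%s' "$1" | tr '<>:"\\|?*' '________'
}

# Usage: list every object in the bucket and report problematic names.
#   gsutil ls 'gs://[BUCKET]/**' | while IFS= read -r url; do
#     name=${url#gs://*/}          # strip the "gs://[BUCKET]/" prefix
#     if has_windows_illegal_chars "$name"; then
#       echo "$name -> $(to_windows_safe_name "$name")"
#     fi
#   done
```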