Data needed to train Tesseract OCR for custom Language


I am trying to build a custom language that detects only the following characters:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1',
'2', '3', '4', '5', '6', '7', '8', '9', '<', '<<<', '/']
I have almost 50 images for which I have generated box files and corrected the errors.
My question: when training Tesseract on the custom character set above, do I also need to supply images that were generated by the Tesseract tooling itself as input when building cust.traineddata?
I have written code that takes 5 characters from the array above, renders them into an image using the Tesseract tool, and then generates the .box file. These box files are correct and need no tuning, for every possible combination. Since Tesseract itself created these images, do they still need to be provided when building cust.traineddata?
Thanks in advance.

You don't need to create a new language if you only want Tesseract to use the default "eng" language to recognize the following characters:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '<', '<<<', '/']
You just need to pass the following configuration to Tesseract: tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789</"
For example:
tesseract input_image output_text -l eng -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789</"
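If the character list lives in code, the whitelist and the command line can be derived from it instead of typed by hand. The following is only a sketch: the helper names build_whitelist and tesseract_command are made up for illustration, and it only builds the argument list without running Tesseract. Note that '<<<' contributes nothing beyond '<', since the whitelist is a set of single characters.

```python
def build_whitelist(symbols):
    """Collapse symbols such as '<<<' into unique characters, keeping first-seen order."""
    seen = []
    for symbol in symbols:
        for ch in symbol:
            if ch not in seen:
                seen.append(ch)
    return "".join(seen)


def tesseract_command(image, output_base, symbols):
    """Build the argument list for the tesseract CLI call shown above (hypothetical helper)."""
    return [
        "tesseract", image, output_base,
        "-l", "eng",
        "-c", "tessedit_char_whitelist=" + build_whitelist(symbols),
    ]


symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") + ["<", "<<<", "/"]
command = tesseract_command("input_image", "output_text", symbols)
```

The resulting list could be handed to subprocess.run(command, check=True) on a machine where the tesseract binary is installed.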

Related

Powershell .TrimEnd not returning correct result

Running the following
$x = "CF21_flddep-op-config"
$x.TrimEnd("-op-config")
Results in:
CF21_fldde
When it should be displaying:
CF21_flddep
Any ideas why?

.TrimEnd() does not remove a trailing string, it removes a set of trailing characters. p is in that set, so the last p is also removed. (You would get the same result with .TrimEnd("-cfginop"), or more explicitly .TrimEnd('-', 'c', 'f', 'g', 'i', 'n', 'o', 'p').) You want something like $x -replace "-op-config", "" or, if the string must only be removed when it occurs at the end, -replace "-op-config$", "".
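The same trap exists outside PowerShell. As an illustration only (this is Python, not PowerShell), str.rstrip also takes a set of characters rather than a suffix, and reproduces the surprising result from the question. The helper strip_suffix is a hypothetical name for the "remove the literal suffix" behaviour that was actually wanted.

```python
def strip_suffix(text, suffix):
    """Remove `suffix` from the end of `text` only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


x = "CF21_flddep-op-config"

# rstrip treats its argument as a set of characters, so the trailing 'p' goes too
char_set_result = x.rstrip("-op-config")

# removing the literal suffix keeps the 'p' that belongs to "flddep"
suffix_result = strip_suffix(x, "-op-config")
```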

Reading from barcode scanner to a text file in raspberry pi

I want to read the input from a barcode scanner and save it to a text file.
Hardware: Raspberry Pi B+ with the latest version of Raspbian wheezy.
Barcode scanner: Datalogic QW2100.

I solved the problem in a few simple steps:
1) Make sure your package repositories are up to date.
2) Make sure your barcode scanner is configured as a USB keyboard, as shown in the image "1.PNG":
Datalogic QW2100 keyboard selection: http://i.stack.imgur.com/bxkG2.png
3) On Ubuntu, the data captured by the scanner shows up in any window that accepts keyboard input, such as a terminal or a text file.
4) On the Raspberry Pi, the value read by the scanner is instead delivered through a device file, for example /dev/hidraw0. That file is created automatically when the scanner is connected to the Pi.
Here is some simple Python code that captures data from that file. It only works while the scanner is connected to the Pi:
# Written for Python 3, where iterating over bytes yields integers.

# USB HID keycode -> character when Shift is not pressed
hid = { 4: 'a', 5: 'b', 6: 'c', 7: 'd', 8: 'e', 9: 'f', 10: 'g', 11: 'h', 12: 'i', 13: 'j', 14: 'k', 15: 'l', 16: 'm', 17: 'n', 18: 'o', 19: 'p', 20: 'q', 21: 'r', 22: 's', 23: 't', 24: 'u', 25: 'v', 26: 'w', 27: 'x', 28: 'y', 29: 'z', 30: '1', 31: '2', 32: '3', 33: '4', 34: '5', 35: '6', 36: '7', 37: '8', 38: '9', 39: '0', 44: ' ', 45: '-', 46: '=', 47: '[', 48: ']', 49: '\\', 51: ';', 52: '\'', 53: '`', 54: ',', 55: '.', 56: '/' }
# USB HID keycode -> character when Shift is pressed
hid2 = { 4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'E', 9: 'F', 10: 'G', 11: 'H', 12: 'I', 13: 'J', 14: 'K', 15: 'L', 16: 'M', 17: 'N', 18: 'O', 19: 'P', 20: 'Q', 21: 'R', 22: 'S', 23: 'T', 24: 'U', 25: 'V', 26: 'W', 27: 'X', 28: 'Y', 29: 'Z', 30: '!', 31: '@', 32: '#', 33: '$', 34: '%', 35: '^', 36: '&', 37: '*', 38: '(', 39: ')', 44: ' ', 45: '_', 46: '+', 47: '{', 48: '}', 49: '|', 51: ':', 52: '"', 53: '~', 54: '<', 55: '>', 56: '?' }

with open('/dev/hidraw0', 'rb') as fp:
    while True:  # keep reading scans until the script is killed
        ss = ""
        shift = False
        done = False
        while not done:
            ## Get one 8-byte report from the HID device
            buffer = fp.read(8)
            for c in buffer:
                if c == 0:
                    continue
                ## 40 is carriage return, which signifies
                ## we are done looking for characters
                if c == 40:
                    done = True
                    break
                ## 2 is the shift key: the next character
                ## comes from hid2 instead of hid
                if c == 2:
                    shift = True
                else:
                    ## Look up the mapping; unknown keycodes are skipped
                    ss += (hid2 if shift else hid).get(c, '')
                    shift = False
        print(ss)
## DONE
I added the first while loop so that the script keeps running until you kill it with Ctrl+C.
One more thing: the image is for the Datalogic QW2100 Lite barcode scanner, so check your own scanner's manual carefully too.
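Since the question asks for the scans to end up in a text file, here is a rough sketch of that last step. It assumes the scanner sends standard 8-byte USB keyboard reports (modifier bits in byte 0, first keycode in byte 2) and covers only letters and digits. The names decode_reports and append_scan are hypothetical and not part of any library.

```python
# Partial keycode tables (USB HID usage IDs, US layout): letters and digits only.
UNSHIFTED = {code: chr(ord("a") + code - 4) for code in range(4, 30)}
UNSHIFTED.update({30 + i: digit for i, digit in enumerate("1234567890")})
SHIFTED = {code: ch.upper() for code, ch in UNSHIFTED.items() if ch.isalpha()}

SHIFT_MASK = 0x22  # left Shift (0x02) or right Shift (0x20) in the modifier byte
ENTER = 40         # keycode the scanner sends at the end of a scan


def decode_reports(reports):
    """Turn a sequence of 8-byte keyboard reports into the scanned text."""
    text = ""
    for report in reports:
        modifiers, keycode = report[0], report[2]
        if keycode == 0:      # key-release report, nothing pressed
            continue
        if keycode == ENTER:  # end of this scan
            break
        table = SHIFTED if modifiers & SHIFT_MASK else UNSHIFTED
        text += table.get(keycode, "?")  # '?' marks keycodes missing from the tables
    return text


def append_scan(path, text):
    """Append one scan per line to a text file."""
    with open(path, "a") as f:
        f.write(text + "\n")
```

On the Pi, the reports would come from repeated fp.read(8) calls on /dev/hidraw0, and each decoded scan could then be passed to append_scan with a path of your choice.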

If the scanner's LED comes on, I suggest booting your Raspberry Pi with the barcode scanner plugged in, then opening a terminal and typing:
sudo apt-get update
sudo apt-get upgrade
After that, reboot, open a text file and try to scan a barcode. Check that the scanner actually recognizes the barcode.
I bought a barcode scanner and had the same problem. This is how I solved it.

Replace once part of the query in Emacs

I have the following search-and-replace requirement in Emacs:
I have a bunch of
'A', 'High'
'B', 'High'
'C', 'High'
'D', 'High'
And the list goes on.
I want to replace them to be:
A = 'High'
B = 'High'
C = 'High'
D = 'High'
Can I query for a pattern, say '#', 'High', and replace it with # = 'High'?

Move point to beginning of buffer.
M-x query-replace-regexp.
Enter '\([^']+\)', '\([^']+\)' as regexp and \1 = '\2' as replacement.
Press ! to replace all at once, or keep pressing y/n for each match.
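To try the pattern outside Emacs first, the same capture-group idea can be tested in Python. This is only an illustration: Python writes groups as ( ) where Emacs uses \( \), and the function name rewrite is made up.

```python
import re

# Same idea as the Emacs regexp: two quoted fields separated by ", "
PATTERN = re.compile(r"'([^']+)', '([^']+)'")


def rewrite(text):
    """Turn every  'X', 'Y'  into  X = 'Y'."""
    return PATTERN.sub(r"\1 = '\2'", text)


sample = "'A', 'High'\n'B', 'High'"
result = rewrite(sample)
```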

Calling subroutine N number of times within a foreach loop

I have two hashes of arrays (HoA) that correspond to the following file:
A 10 15 20 25
B 21 33 21 23
C 43 14 23 23
D 37 45 43 49
Here are my HoAs.
my %first_HoA = (
    'A' => [ '10', '15', '20', '25'],
    'B' => [ '21', '33', '21', '23'],
);
my %second_HoA = (
    'A' => [ '10', '15', '20', '25'],
    'B' => [ '21', '33', '21', '23'],
    'C' => [ '43', '14', '23', '23'],
    'D' => [ '37', '45', '43', '49'],
);
For every $key in the second HoA (A-D), I want to call a subroutine that does calculations on its corresponding array and on the array of every $key in the first HoA (A-B). Based on those calculations, the subroutine should return the key from the first HoA that yields the highest value. In other words, the subroutine should be called only once per $key in the second HoA, and each call should return the best-scoring $key from the first HoA.
Here's how I have it right now. Say I have an arbitrary subroutine called calculate:
my $iterations = 1;
foreach my $key ( keys %second_HoA ) {
    for my $arrayref ( values %first_HoA ) {
        calculate($second_HoA{$key}, $arrayref);
        print "Iteration: $iterations\n";
        $iterations++;
    }
}
As you can see, this calls calculate 8 times. I only want to call calculate once for every $key in %second_HoA, which is 4 times, but I also need to pass in each $arrayref so the subroutine can do the calculations.
Does anyone know how I can do this?
Another way I was thinking of doing this was to pass in a reference to %first_HoA, like so:
foreach my $key ( keys %second_HoA ) {
    calculate($second_HoA{$key}, \%first_HoA);
    print "Iteration: $iterations\n";
    $iterations++;
}
This calls calculate 4 times, which is what I want, but it complicates things in the subroutine.
Any suggestions? Thanks.

You say calculate($second_HoA{$key}, \%first_HoA) "complicates things", but I don't see how that's possible. It seems to me it's the minimum of information you need, and it's in a convenient format.
Anything less would complicate things, in the sense that you wouldn't have the information you need to do your calculations.
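To show that the "pass the whole first hash" version stays simple inside the subroutine, here is a sketch in Python rather than Perl. The score function is a made-up stand-in (negative sum of absolute differences), since the question never says what the real calculation is.

```python
def score(a, b):
    """Hypothetical stand-in for the real calculation: higher means more similar."""
    return -sum(abs(x - y) for x, y in zip(a, b))


def calculate(values, candidates):
    """Return the key in `candidates` whose array scores highest against `values`."""
    return max(candidates, key=lambda k: score(values, candidates[k]))


first_hoa = {"A": [10, 15, 20, 25], "B": [21, 33, 21, 23]}
second_hoa = {
    "A": [10, 15, 20, 25],
    "B": [21, 33, 21, 23],
    "C": [43, 14, 23, 23],
    "D": [37, 45, 43, 49],
}

# calculate runs once per key of the second hash: 4 calls, not 8
best = {key: calculate(values, first_hoa) for key, values in second_hoa.items()}
```

The loop over the first hash simply moves inside calculate, so the caller only iterates over the second hash.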

How to get all options from Getopt Long without specific knowledge of the params?

I need a simple script that echos ALL options and values given (and I have no idea what the potential options are going to be). I've experimented with things like this:
use Getopt::Long qw(GetOptionsFromArray);
my %options;
my @opt_spec = qw(a:s b:s c:s d:s e:s f:s g:s h:s i:s j:s k:s l:s m:s n:s o:s p:s q:s r:s s:s t:s u:s v:s w:s x:s y:s z:s);
Getopt::Long::GetOptions(\%options, @opt_spec);
but I'm still having to specify all possible options - is there a way to get all the key/value pairs without knowing ahead of time what I'll be receiving as options?

Getopt::Long supports much more than just key-value pairs: negatable options, options with multiple or hash values, incrementing options, single-character and bundled options. Without an exact spec, Getopt::Long can't guess which of these features you want to use, so it doesn't seem to be the right tool for this task.
You might want Getopt::Whatever instead.

You do need a spec. If you didn't have a spec, there would be no way to know that
-a=-b -c -d -e -f g --h -- -i -j
should give
my %options = (
    'a' => '-b',
    'c' => '',
    'd' => '',
    'e' => '',
    'f' => 'g',
    'h' => '',
);
@ARGV = (
    '-i',
    '-j',
);
instead of
my %options = (
    'a' => '-b',
    'c' => '-d',
    'e' => '-f',
    'h' => '--',
    'i' => '-j'
);
@ARGV = (
    'g',
);
(The latter used a=s, b=s, etc.)
You could write a version of GetOptions that gives the :s spec to all names, but as long as you only have single-letter args, it would be simpler to just use code to generate the spec.
my @opt_spec = map "$_:s", 'a'..'z';
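To see what "giving the :s spec to all names" means in practice, here is a small sketch in Python rather than Perl. It hard-codes one convention (every option takes an optional value, and a following token counts as the value only if it does not start with a dash) and is not how Getopt::Long is implemented. The name parse_all_options is invented for this example.

```python
def parse_all_options(argv):
    """Collect every -x / --name option under one fixed, ':s'-like convention."""
    options, rest = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":  # everything after '--' is positional
            rest.extend(argv[i + 1:])
            break
        if arg.startswith("-") and len(arg) > 1:
            name, sep, value = arg.lstrip("-").partition("=")
            has_next = i + 1 < len(argv)
            if not sep and has_next and not argv[i + 1].startswith("-"):
                i += 1
                value = argv[i]  # the next token is this option's value
            options[name] = value
        else:
            rest.append(arg)
        i += 1
    return options, rest


options, rest = parse_all_options("-a=-b -c -d -e -f g --h -- -i -j".split())
```

With this convention the sample command line yields the first of the two interpretations above. A parser that treated every value as mandatory would yield the second, which is why some spec or convention has to be chosen up front.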
