tesseract OCR output words bounds - command-line

How can I output word bounds using the tesseract command line with a config file?
So far I have been able to output character bounds using
tesseract image.png myBox makebox
This created a myBox.box file that looks like this:
N 51 1844 75 1874 0
o 80 1843 100 1867 0
S 113 1843 136 1875 0
I 140 1844 145 1874 0
M 151 1844 181 1874 0
c 197 1843 216 1867 0
a 219 1843 238 1867 0
r 243 1844 254 1867 0
d 256 1843 275 1876 0
However, those are only characters and I need words, so I combined it with the standard text output:
tesseract image.png myBox
This creates a file like this:
no simcard
By combining those two outputs I can get the word bounds. However, I would prefer a method that does not require examining the same image twice. Please help.
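The combining step described above could look roughly like the following Python sketch. It assumes the box file lists characters in the same order as the text output, and that each whitespace-separated word consumes as many boxes as it has characters. The function name `word_bounds` and the tuple layout are illustrative and not part of tesseract. It still needs both outputs, so it does not remove the second pass over the image.

```python
def word_bounds(box_text, plain_text):
    """Merge per-character boxes into per-word boxes.

    box_text:   contents of the .box file (char left bottom right top page)
    plain_text: contents of the plain text output
    Assumes both list the characters in the same order.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(v) for v in parts[1:5])
        boxes.append((left, bottom, right, top))

    words = []
    pos = 0
    for word in plain_text.split():
        chunk = boxes[pos:pos + len(word)]
        pos += len(word)
        if len(chunk) < len(word):
            break  # ran out of boxes: the two outputs disagree
        words.append((
            word,
            min(b[0] for b in chunk),  # left
            min(b[1] for b in chunk),  # bottom
            max(b[2] for b in chunk),  # right
            max(b[3] for b in chunk),  # top
        ))
    return words


# The character boxes and text quoted in the question.
BOX = """N 51 1844 75 1874 0
o 80 1843 100 1867 0
S 113 1843 136 1875 0
I 140 1844 145 1874 0
M 151 1844 181 1874 0
c 197 1843 216 1867 0
a 219 1843 238 1867 0
r 243 1844 254 1867 0
d 256 1843 275 1876 0"""

for entry in word_bounds(BOX, "no simcard"):
    print(entry)
```

Words are matched by length only, so the sketch does not care that the box file says "NoSIMcard" while the text output says "no simcard".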

Related

Data type changes when saving a binary image

I converted a grayscale image to a binary image as shown in the script below:
D = '/folder-path/';
S = dir(fullfile(D,'*.jpg'));
for k = 1:numel(S)
F = fullfile(D,S(k).name);
I = imread(F);
I2 = im2bw(I);
imwrite(I2,F);
end
The issue is that when I read back any of the images that were converted to binary and saved to the hard drive, the returned type is uint8!
I thought the image would at least contain only two values, for instance 0 and 255, but when running unique(I) on one image I got the following:
75×1 uint8 column vector
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
35
36
37
38
217
218
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
Why do you think this is happening? How can I read the saved images as binary and not uint8?
Thanks.

Do not write your binary image to a JPEG file: JPEG compression is lossy, so you will certainly lose the exact values in the process.
In addition, overwriting the source file looks like bad practice.
A solution is to save your binary image in a PNG file with the same base name. For instance:
imwrite(I2, [D S(k).name(1:end-3) 'png']);
In this case the PNG contains only zeros and ones. To be able to see your binary image in a viewer, it is better to have 0s and 255s:
imwrite(uint8(I2)*255, [D S(k).name(1:end-3) 'png']);
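Two details of this answer can be illustrated with a small Python sketch. The slice `name(1:end-3)` assumes a three-character extension, while splitting on the extension also copes with names ending in `.jpeg`. The values reported by unique(I) in the question cluster near 0 and 255, so thresholding should recover a two-valued image from files that were already saved as JPEG. The helper names, the file name `image01.jpg`, and the threshold of 128 are assumptions made for illustration.

```python
import os


def png_name(path):
    """Return the same path with its extension replaced by .png."""
    root, _ext = os.path.splitext(path)
    return root + ".png"


def rebinarize(values, threshold=128):
    """Map pixel values back to 0/1 by thresholding.

    threshold=128 is an assumed midpoint for 8-bit data.
    """
    return [1 if v >= threshold else 0 for v in values]


print(png_name("/folder-path/image01.jpg"))    # hypothetical file name
print(rebinarize([0, 3, 38, 217, 254, 255]))   # a few values from the question
```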

error reading a text file in octave

I have a text file named xMat.txt which has 200 space separated elements in one line and some 767 lines.
This is how xMat.txt looks.
386.0 386.0 388.0 394.0 402.0 413.0 ... .0 800.0 799.0 796
801.0 799.0 799.0 802.0 802.0 80 ... 399.0 397.0 394.0 391
.
.
.
When I try to read the file in octave using X = dlmread('xMat.txt',' ') I get a matrix of size 767 X 610. I am expecting a matrix of size 767 X 200 since there are 200 elements in one row. How can I solve this problem?
Edit - This is my file

Your uploaded file https://bpaste.net/raw/96cf21aa21b8 has an inconsistent number of columns per row.
$ awk "{print NF}" tmp | sort | uniq -c
2 200
754 201
1 206
1 217
1 223
1 234
1 237
1 238
1 269
1 273
1 390
1 420
1 610
So most rows have 201 columns, but one has 420 columns and one even has 610 columns. This is why you get a 767x610 matrix from dlmread.
Let's look at which lines have more than 201 columns:
$ awk "{if (NF>201) print NR, NF}" tmp
68 217
580 206
613 390
615 234
657 273
676 610
679 237
720 269
722 238
743 223
762 420
The first column shows the line number, the second the number of columns.
So your line with 610 columns is line number 676. I also printed line 676, and it really contains data rather than runs of multiple spaces that get filled with zeros.
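The same check can be sketched in Python for readers without awk. This is a minimal illustration, assuming fields are separated by whitespace, which is how awk's default NF counts them. The function names and the three-line sample are made up for the example.

```python
from collections import Counter


def field_counts(lines):
    """Count how many lines have each number of whitespace-separated fields."""
    return Counter(len(line.split()) for line in lines)


def long_lines(lines, expected):
    """Return (line_number, field_count) for lines with more than `expected` fields."""
    return [
        (number, len(line.split()))
        for number, line in enumerate(lines, start=1)
        if len(line.split()) > expected
    ]


# Tiny made-up sample; for a real file, pass open("xMat.txt") instead.
sample = [
    "386.0 386.0 388.0",
    "801.0 799.0 799.0",
    "1.0 2.0 3.0 4.0 5.0",
]
print(field_counts(sample))
print(long_lines(sample, expected=3))
```

Once the offending lines are known, they can be fixed or dropped before the file is passed to dlmread.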

read/load parts of the irregular file by Matlab

I would like to load part of a PTX file in MATLAB (please see the following example).
I need to read the first two rows (2 numbers) into 2 variables, say a and b, and read the data from the 5th row to the end into a matrix.
Thanks for your help.
114
221
1 0 0
1 0 0 0
-5.566405 -7.161944 -1.144557 0.197208 24 29 35
-5.560656 -7.154540 -1.137673 0.222400 29 32 39
-5.559846 -7.153491 -1.131895 0.254002 37 40 49
-5.560894 -7.154833 -1.126452 0.305013 51 54 63
-5.560084 -7.153783 -1.120633 0.290013 72 76 88
-5.561128 -7.155119 -1.115189 0.243214 105 113 134
-5.563203 -7.157782 -1.109926 0.227604 130 143 177
-5.569191 -7.165479 -1.105504 0.201602 121 140 173
-7.833616 -10.078705 -1.546952 0.130007 94 112 134

Look at the tdfread function in order to get the data into Matlab. It should be something like datafile = tdfread(filename, '\t'). Once you have that, index into the variable returned from that function like
a = datafile(1, 1);
b = datafile(2, 1);
data = datafile(5:end, :);
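The layout asked for in the question (two single-number lines, two lines to skip, then the data rows) can also be handled by splitting the text by line. The Python sketch below shows the idea under the assumption that the file always has exactly four leading lines before the data. `read_ptx` is an illustrative name, and the sample is a shortened copy of the data in the question.

```python
def read_ptx(text):
    """Split PTX-style text into the two leading integers and the data rows.

    Blank lines are ignored. Of the remaining lines, the first two hold
    one integer each, the next two are skipped, and every line from the
    5th on becomes a row of floats.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    a = int(lines[0])
    b = int(lines[1])
    data = [[float(v) for v in line.split()] for line in lines[4:]]
    return a, b, data


SAMPLE = """114
221
1 0 0
1 0 0 0
-5.566405 -7.161944 -1.144557 0.197208 24 29 35
-5.560656 -7.154540 -1.137673 0.222400 29 32 39
"""

a, b, data = read_ptx(SAMPLE)
print(a, b, len(data), data[0][:3])
```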

Matlab .txt file read using space delimiter

I want to read a text file into a matrix using a space delimiter. My text file contains information like this:
AJ_Lamas/AJ_Lamas_0001.jpg 58 68 134 134 -2 10 31 43 53 45
Aaron_Eckhart/Aaron_Eckhart_0001.jpg 63 72 126 126 0 10 34 35 53
Aaron_Guiel/Aaron_Guiel_0001.jpg 54 67 144 144 -1 10 34 44 58
Aaron_Patterson/Aaron_Patterson_0001.jpg 47 62 148 148 1 10 44 65 63
Aaron_Peirsol/Aaron_Peirsol_0001.jpg 64 72 127 127 0 10 33 43
I tried:
m=dlmread('D:\MatlabCode\lfw_ffd_ann.txt', ' ')
but it shows some errors:
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==> image_name
face_bbox_x face_bbox_y face_bbox_width face_bbox_height headpose
num_facial_features left_eye_left_x left_eye_left_y left_eye_right_x
left_eye_right_y mouth_left_x mouth_left_y mouth_right

You can't really read it into a matrix, because the first field of every row is text, but you can read it into a cell array with textscan(). Supposing you want to read the actual strings (which I assume because of the file names), it would go something like this:
fid=fopen('D:\MatlabCode\lfw_ffd_ann.txt');
C=textscan(fid,'%s','delimiter',' ');
fclose(fid);
hope that helps
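The structure of the file (a name followed by a varying number of integers, plus the header row visible in the error message) can be made explicit with a short Python sketch. It is only an illustration of the parsing idea, not a replacement for textscan. `parse_annotations` is a hypothetical name, and the header in the sample is abbreviated.

```python
def parse_annotations(lines):
    """Parse 'name n1 n2 ...' lines into (name, [numbers]) tuples.

    Lines where any field after the first is not an integer, such as a
    header row, are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or name without numbers
        try:
            numbers = [int(v) for v in fields[1:]]
        except ValueError:
            continue  # header or malformed line
        records.append((fields[0], numbers))
    return records


sample = [
    "image_name face_bbox_x face_bbox_y",  # abbreviated header
    "AJ_Lamas/AJ_Lamas_0001.jpg 58 68 134 134 -2 10 31 43 53 45",
    "Aaron_Peirsol/Aaron_Peirsol_0001.jpg 64 72 127 127 0 10 33 43",
]
for name, numbers in parse_annotations(sample):
    print(name, numbers)
```

Keeping the name and the numbers separate mirrors why a cell array fits this file better than a numeric matrix.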

MATLAB accessing conditional values and performing operation in single column

I just started MATLAB 2 days ago and I can't figure out a non-loop method (since I read that loops are slow/inefficient and MATLAB has better alternatives) to perform a simple task.
I have a matrix of 5 columns and 270 rows. What I want to do is:
if the value of an element in column 5 of matrix goodM is below 90, I want to take that element and subtract it from 90.
So far I tried:
test = goodM(:,5) <= 90;
goodM(test) = 999;
It changes the matching goodM values in column 1, not column 5, to 999. In addition, this method doesn't let me perform operations on the elements below 90 in column 5. Is there an elegant way to do this?
edit:: goodM(:,5)(test) = 999; doesn't seem to work either, so I have no idea how to specify the target column.

I am assuming you are looking to operate on elements that have values below 90 as your text in the question reads, rather than 'below or equal to' as represented by '<=' as used in your code. So try this -
ind = find(goodM(:,5) < 90) %// Find indices in column 5 that have values less than 90
goodM(ind,5) = 90 - goodM(ind,5) %// Operate on those elements using indices obtained from previous step

Try this code:
b=90-a(a(:,5)<90,5);
For example:
a =
265 104 479 13 176
26 110 447 208 144
379 163 179 366 464
301 48 274 391 26
429 374 174 184 297
495 375 312 373 82
465 272 399 447 420
205 170 373 122 84
1 417 63 65 252
271 277 412 113 500
then,
b=90-a(a(:,5)<90,5);
b =
64
8
6
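To check the result by hand, the same operation can be written as a plain Python sketch using the example matrix above. Plain Python has no logical indexing, so it uses an explicit loop; the MATLAB forms in the answers remain the vectorized way to do it. `subtract_from_90` is an illustrative name, and `col=4` is the zero-based index of column 5.

```python
def subtract_from_90(matrix, col=4, limit=90):
    """Return a copy of matrix where values in `col` below `limit` become limit - value.

    col=4 is the zero-based index of column 5.
    """
    result = [row[:] for row in matrix]  # copy so the input stays unchanged
    for row in result:
        if row[col] < limit:
            row[col] = limit - row[col]
    return result


a = [
    [265, 104, 479, 13, 176],
    [26, 110, 447, 208, 144],
    [379, 163, 179, 366, 464],
    [301, 48, 274, 391, 26],
    [429, 374, 174, 184, 297],
    [495, 375, 312, 373, 82],
    [465, 272, 399, 447, 420],
    [205, 170, 373, 122, 84],
    [1, 417, 63, 65, 252],
    [271, 277, 412, 113, 500],
]
updated = subtract_from_90(a)
print([row[4] for row in updated])
```

The three changed entries are 64, 8 and 6, which matches b above.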
