I have a set of images, all the same size, that each contain a single character, roughly in the same position each time. These images are all 80x75 pixels and have a pure white background and a pure black character.
Here are some examples of my images:
https://imgur.com/a/yiFajwZ
So far, I am getting very poor accuracy using this configuration (single-character mode and a character whitelist):
pytesseract.image_to_string(x, config="-c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz --psm 10 -l osd")
Any help would be great, thank you.
Edit: I've tried resizing the images to something larger, like 800 x 750, but I still have the same issue. Here's what the characters are recognised as (actual -> predicted):
c -> p
f -> l
j -> l
k -> g
t -> l
v -> y
x -> m
y -> y
I want to implement the super-resolution algorithm described in https://arxiv.org/abs/1609.05158 in Caffe. There are TensorFlow implementations, but no Caffe implementation yet: https://github.com/tetrachrome/subpixel
To summarize the algorithm: I want to super-resolve an image by a factor of 3, and I want to do the upsampling at the end of the network rather than at the beginning. To do that, I will have 9 images (Batch x 9 x height x width) at the end of the network.
What I then wish to do is pick one pixel from each image at the same coordinates and place the nine pixels in a 3x3 square, building up an image of size 3*height x 3*width (the periodic-shuffling operation illustrated in the paper).
1) Can I use a deconvolution layer to upscale an image by 3, filling zeros in between, and if so, how?
2) I am thinking of using a slice layer to extract the 9 images.
3) Is there a way to circularly shift some of the images to align them as in the paper's figure, and if so, how?
4) Do I really need a slice layer before the circular shifting and eltwise summing, or can I do it another way without a slice layer: can I circularly shift channels separately, and can I merge the channels of images by summation?
5) Can this be done in a much easier way that I am unable to imagine?
I have asked quite a lot of questions; I hope I am not overdoing it.
Thank you in advance.
EDIT:
I want to implement this TensorFlow code in Caffe:
def _phase_shift(I, r):
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # Handle Dimension(None) for an undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, r, r
    X = tf.split(1, a, X)  # a, [bsize, b, r, r]
    X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X])  # bsize, b, a*r, r
    X = tf.split(1, b, X)  # b, [bsize, a*r, r]
    X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X])  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))
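Whatever the framework, the periodic shuffle itself is nothing but a reshape, a transpose and another reshape, which is what the Caffe layer stack ultimately has to reproduce. Here is a minimal MATLAB sketch of the r = 3 case (the sizes are made up, and the channel ordering is one arbitrary convention; TensorFlow's _phase_shift may order the r^2 channels differently):
r = 3;                          % upscaling factor
H = 4; W = 5;                   % made-up feature-map size
X = rand(r*r, H, W);            % r^2 channels, as at the end of the network
T = reshape(X, [r, r, H, W]);   % split the channel dim into an r-by-r block
T = permute(T, [1 3 2 4]);      % interleave: r, H, r, W
Y = reshape(T, [r*H, r*W]);     % one channel, upscaled by r in each direction
In Caffe terms this suggests a Reshape layer plus something that can permute axes; note that vanilla Caffe has a Reshape layer but, to my knowledge, a Permute layer only exists in some forks (e.g. the SSD branch), so that part may need a custom layer.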
G'day
Firstly, apologies for the poor wording; I'm at a bit of a loss as to how to describe this problem. I'm trying to calculate a conservative interpolation between two different vertical coordinate systems.
I have a vector of ocean transport values Ts that describes the amount of transport between different depth values S. These depths are unevenly spaced, and length(S) equals length(Ts)+1, as the values in S are the depths at the top and bottom over which each transport value applies. I want to interpolate (project?) this onto a vector of regularly spaced depths Z, where each new transport value Tz is formed from the values of Ts, weighted by the amount of overlap.
I've drawn a picture of what I mean (sorry for the bad-quality webcam picture): I want to go from Ts1,Ts2,Ts3,...,TsN (bottom lines) to Tz1,Tz2,...,TzN (top lines). The depth locations for these are s0,s1,s2,...,sN and z0,z1,z2,...,zN. An example of the 'weighted overlap' would be:
Tz1 = a/(s1-s0)*Ts1 + b/(s2-s1)*Ts2 + c/(s3-s2)*Ts3
where a, b and c are the lengths of overlap shown in the picture.
Some more details:
Examples of z and s follow:
z = 0:5:720;
s = [222.69; 223.74; 225.67; 228.53; 232.39; 237.35; 243.56; 251.17; ...
     260.41; 271.5; 284.73; 300.42; 318.9; 340.54; 365.69; 394.69; ...
     427.78; 465.11; 506.62; 551.98; 600.54; 651.2];
Note that I'm free to define z, but not s. Typically, the range of z will be wider than that of s (i.e. the smallest value in z will be smaller than the smallest value in s, while the largest value in z will be larger than the largest value in s).
Help or tips greatly appreciated. Cheers,
Dave
I don't think there is an easy solution, as stated in the comments, but I'll give it a go.
One hypothesis first: we assume s0 < z0 < s1 (the first z edge falls inside the first s cell, as in your sketch), so that the problem below is well defined.
The idea (for your example) would be to build the array below (columns: index, overlap length, s-cell width, transport):
1   (s1-z0)   (s1-s0)   Ts1
1   (s2-s1)   (s2-s1)   Ts2
1   (z1-s2)   (s3-s2)   Ts3
2   (s3-z1)   (s3-s2)   Ts3
2   (z2-s3)   (s4-s3)   Ts4
3   (z3-z2)   (s4-s3)   Ts4
...
Then, for each row, we would compute overlap*transport/width (columns 2, 4 and 3) and use accumarray to sum the results with respect to the indexes in the first column.
Now the hardest part is to build this array.
Suppose you have:
an Nx1 vector Ts;
two (N+1)x1 vectors s and z, with z(1) > s(1).
Vectsz=sort([s(2:end);z]); % Sorted vector of s and z values
In your case, this vector should look like:
z0
s1
s2
z1
s3
z2
z3
...
The first column will serve as a subscript for accumarray, so we'll want it to increase each time there is a z value in our vector Vectsz:
First=interp1(z,1:length(z),Vectsz,'previous','extrap');
Second=[diff(Vectsz);0]; % Padded with a 0 to keep the right size
Temp=diff(s);
Third=interp1(s(1:end-1),Temp,Vectsz,'previous','extrap');
This just repeats each s-cell width for every entry of Vectsz that falls inside that cell (the 'extrap' flag keeps interp1 from returning NaN for the few query points at or beyond the last knot).
The last column is built exactly like the third one:
Fourth=interp1(s(1:end-1),Ts,Vectsz,'previous','extrap');
Now that the array is built, a call to accumarray is enough to get the final result:
Res=accumarray(First,Second.*Fourth./Third);
Tz=Res(1:end-1); % the last bin only collects whatever lies at or beyond z(end), so drop it
EDIT: There is actually no need for interp1 with the 'previous' option:
Vectsz=sort([s(2:end);z]);
First=cumsum(ismember(Vectsz,z));
Second=[diff(Vectsz);0];
idx=cumsum(ismember(Vectsz,s(2:end)))+1;
Diffs=[diff(s);1]; % padded with 1 rather than 0, so the final row is not a 0/0
TsPad=[Ts;0];      % padded with 0, so the row at s(end) indexes safely and adds nothing
Third=Diffs(idx);
Fourth=TsPad(idx);
Res=accumarray(First,Second.*Fourth./Third);
Tz=Res(1:end-1); % again, drop the bin beyond z(end)
(The two paddings fix an out-of-range index: for the last entry of Vectsz, idx reaches N+1 while Ts only has N elements.)
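To sanity-check the vectorised version, here is a small self-contained example with made-up numbers (it assumes, as above, that s(1) < z(1) < s(2) and that z stays within the range of s):
s  = [10; 13; 20; 26];    % uneven source edges (3 cells)
Ts = [1; 2; 3];           % made-up transport on each s cell
z  = [11; 15; 19; 25];    % target edges, with z(1) > s(1)
Vectsz = sort([s(2:end);z]);
First  = cumsum(ismember(Vectsz,z));
Second = [diff(Vectsz);0];
idx    = cumsum(ismember(Vectsz,s(2:end)))+1;
Diffs  = [diff(s);1];
TsPad  = [Ts;0];
Res    = accumarray(First,Second.*TsPad(idx)./Diffs(idx));
Tz     = Res(1:end-1)     % gives [1.2381; 1.1429; 2.7857]
You can check the first value by hand: Tz1 = 2/3*1 + 2/7*2 = 1.2381, the two terms being the overlaps of the z cell [11,15] with the s cells [10,13] and [13,20].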
I wrote code for RGB to YIQ conversion. I get results, but I don't know if they are correct.
%extract the red green blue elements
ImageGridRed = double(ImageRGB(:,:,1))';
ImageGridGreen = double(ImageRGB(:,:,2))';
ImageGridBlue = double(ImageRGB(:,:,3))';
%make the 300x300 matrices into 1x90000 matrices
flag = 1;
for i = 1:1:300
    for j = 1:1:300
        imageGR(flag) = ImageGridRed(j,i);
        imageGG(flag) = ImageGridGreen(j,i);
        imageGB(flag) = ImageGridBlue(j,i);
        flag = flag+1;
    end
end
%put the 3 matrices into 1 matrix 90000x3
for j = 1:1:300*300
    colorRGB(j,1) = imageGR(j);
    colorRGB(j,2) = imageGG(j);
    colorRGB(j,3) = imageGB(j);
end
YIQ = rgb2ntsc([colorRGB(:,1) colorRGB(:,2) colorRGB(:,3)]);
I wrote this because the rgb2ntsc function needs an Mx3 matrix as input. I use the number 300 because the picture is 300x300 pixels. I am going to split the picture into blocks in my project, so don't pay attention to the 300; I put it there just as an example and will change it.
Thank you.
What you're doing is completely unnecessary. If you consult the documentation on rgb2ntsc, it also accepts an RGB image. Therefore, when you put in an RGB image, the output will be a three-channel image, where the first channel is the luminance, or Y component, and the second and third channels are the hue and saturation information (I and Q respectively). You don't need to decompose the image into an M x 3 matrix.
Therefore, simply do:
YIQ = rgb2ntsc(ImageRGB);
Make sure that ImageRGB is an RGB image where the first channel is red, the second is green and the third is blue.
Edit
With your comments, you want to take all of the pixels and place them into an M x 3 matrix, where M is the total number of pixels, and use this as input into rgb2ntsc. The function accepts an M x 3 matrix of RGB values where each row is an RGB tuple, and the output in this case will be another M x 3 matrix where each row is its YIQ counterpart. Your code does do what you want it to do, but I would recommend that you do away with the for loops and replace them with:
colorRGB = reshape(permute(ImageRGB, [3 1 2]), 3, []).';
After that, do YIQ = rgb2ntsc(colorRGB);. colorRGB will already be an M x 3 matrix, so the column indexing you're doing is superfluous.
With the above use of reshape and permute, the loops are entirely unnecessary; in fact, I would argue that the for-loop code is slower. Stick with the above code to get this done quickly. Once you have your matrix in this form, the code does what you want it to do... however, I would personally just do the conversion on the image itself, then split it up into blocks or whatever you want to do after the fact.
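For completeness, here is a small sketch of that route end to end; the random image is just a stand-in for your data, and the last line shows how to fold the M x 3 result back into an image if you need one:
ImageRGB = rand(300, 300, 3);   % stand-in 300 x 300 RGB image
[rows, cols, ~] = size(ImageRGB);
colorRGB = reshape(permute(ImageRGB, [3 1 2]), 3, []).';   % M x 3, M = rows*cols
YIQ_list = rgb2ntsc(colorRGB);                             % one YIQ tuple per row
YIQ_img  = permute(reshape(YIQ_list.', 3, rows, cols), [2 3 1]); % rows x cols x 3
The result matches rgb2ntsc(ImageRGB) pixel for pixel, since the conversion is a per-pixel linear transform either way.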
I am calculating the Local Ternary Pattern of an image. My code is given below. Am I going in the right direction or not?
function [ I3 ] = LTP(I2)
m = size(I2,1);
n = size(I2,2);
for i = 2:m-1
    for j = 2:n-1
        J0 = I2(i,j);
        I3(i-1,j-1) = I2(i-1,j-1) > J0;
    end
end
I2 is the image LTP is applied to.
This isn't quite correct. Here's an example of LTP given a 3 x 3 image patch and a threshold t:
[Figure: a worked LTP example on a 3 x 3 patch with centre intensity 34 and threshold t = 5 (source: hindawi.com)]
A pixel in the window is assigned 0 when its intensity lies between c - t and c + t, where c is the intensity of the centre pixel. Because the centre intensity in this window is 34 and t = 5, the range is [29,39]: any value above 39 gets assigned 1, and any value below 29 gets assigned -1. Once you determine the ternary codes, you split them into upper and lower patterns. For the upper pattern, any value that was assigned -1 becomes 0; for the lower pattern, any value that was assigned -1 becomes 1, and any value that was 1 in the original window becomes 0. The final pattern is read starting from the east location with respect to the centre (row 2, column 3), then going around counter-clockwise. Therefore, you should modify your function so that it outputs both the lower and the upper pattern for your image.
Let's write a corrected version of your code. Bear in mind that I will not give an optimized version; let's get a basic algorithm working first, and it will be up to you to optimize it. Change your code to something like the following, bearing in mind all of the points above. Assuming your neighbourhood size is 3 x 3 and your image is grayscale, try something like this:
function [ ltp_upper, ltp_lower ] = LTP(im, t)
    %// Get the dimensions
    rows = size(im,1);
    cols = size(im,2);

    %// Reordering vector - essentially for getting binary strings
    reorder_vector = [8 7 4 1 2 3 6 9];

    %// For the upper and lower LTP patterns
    ltp_upper = zeros(size(im));
    ltp_lower = zeros(size(im));

    %// For each pixel in our image, ignoring the borders...
    for row = 2 : rows - 1
        for col = 2 : cols - 1
            %// Get centre - cast to double so cen - t cannot saturate for uint8
            cen = double(im(row,col));

            %// Get neighbourhood - cast to double for better precision
            pixels = double(im(row-1:row+1,col-1:col+1));

            %// Get ranges and determine LTP
            out_LTP = zeros(3, 3);
            low = cen - t;
            high = cen + t;
            out_LTP(pixels < low) = -1;
            out_LTP(pixels > high) = 1;
            out_LTP(pixels >= low & pixels <= high) = 0;

            %// Get upper and lower patterns
            upper = out_LTP;
            upper(upper == -1) = 0;
            upper = upper(reorder_vector);

            lower = out_LTP;
            lower(lower == 1) = 0;
            lower(lower == -1) = 1;
            lower = lower(reorder_vector);

            %// Convert to a binary character string, then use bin2dec
            %// to get the decimal representation
            upper_bitstring = char(48 + upper);
            ltp_upper(row,col) = bin2dec(upper_bitstring);

            lower_bitstring = char(48 + lower);
            ltp_lower(row,col) = bin2dec(lower_bitstring);
        end
    end
Let's go through this code slowly. First, I get the dimensions of the image so I can iterate over each pixel; bear in mind that I'm assuming the image is grayscale. I then allocate space to store the upper and lower LTP patterns per pixel, as we need to output these to the user. I have decided to ignore the border pixels: wherever the pixel neighbourhood would go out of bounds, the location is skipped.
Now, for each valid pixel within the borders of the image, we extract its pixel neighbourhood. I convert it to double precision to allow for negative differences, as well as for better precision. I then calculate the low and high ranges and create an LTP pattern following the guidelines we talked about above.
Once I calculate the LTP pattern, I create two versions of it, upper and lower, where any value of -1 gets mapped to 0 for the upper pattern and to 1 for the lower pattern. Also, for the lower pattern, any value that was 1 in the original window gets mapped to 0. After this, I extract the bits in the order that I laid out: starting from the east, going counter-clockwise. That's the purpose of reorder_vector, as it lets us extract exactly those locations; they now become a 1D vector.
This 1D vector is important, as we need to convert it into a character string so that we can use bin2dec to convert the value into a decimal number. These numbers for the upper and lower LTPs are the final output, and we place them in the corresponding positions of both output variables.
This code is untested, so it'll be up to you to debug this if it doesn't work to your specifications.
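As a quick usage sketch (the image and the threshold of 5 are arbitrary choices on my part, not something from your question):
im = imread('cameraman.tif');   % any grayscale image; this one ships with the toolbox
[ltp_u, ltp_l] = LTP(im, 5);    % t = 5, as in the worked example above
figure; imshow(uint8(ltp_u));   % codes lie in [0, 255], so uint8 display works
figure; imshow(uint8(ltp_l));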
Good luck!
I'm looking for an algorithm - or should I rather say an encoding? - to compress integer numbers into short string IDs like those URL shorteners use: http://goo.gl/0puu
URL-safe Base64 comes close, but maybe there is something better.
Requirements:
as short as possible
url safe
"yi_H" called base64 "perfect" and after a bit more research I came to the same conclusion, since only the following characters could be used in URLs without worry:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
That's 66 characters, whereas Base64 uses only 64. The two extra characters wouldn't be practical, because 66 is not a power of 2.
Conclusion: URL safe base64 (offered as part of Apache Commons for example) is perfect for short IDs.
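Language aside, the idea is just positional base-64 notation. Here is a minimal MATLAB sketch (variable names made up; note this is plain base-64 digit encoding of a non-negative integer, not MIME Base64 of bytes, so there is no padding):
alphabet = ['A':'Z', 'a':'z', '0':'9', '-', '_'];  % 64 URL-safe characters
n = 125;                                 % example ID to shorten
id = '';
while true
    id = [alphabet(mod(n, 64) + 1), id]; % take the least-significant digit
    n = floor(n / 64);
    if n == 0, break; end
end
id                                       % 125 = 1*64 + 61 -> 'B9'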