Tesseract OCR fails to recognize full-height numbers

I have tested with sample text, both alphanumeric and digits only, and I am using digits mode.
How do I recognize digits like those in the following image?
I think the failure is because the digits span the full height of the image.
I have also tried converting the image to .jpg using some online tools (not code).
I am using pytesseract 0.1.6, but I think this is a Tesseract problem.
Here is my code:
import urllib
from StringIO import StringIO
from PIL import Image
from pytesseract import image_to_string

mapping = {}

def classify(url):
    # Download the image and wrap the raw bytes in a file-like object
    socket = urllib.urlopen(url)
    image = StringIO(socket.read())
    socket.close()
    image = Image.open(image)
    number = image_to_string(image, config='digits')
    mapping[url] = number
    return number

classify('any url')

I think you've got two problems here.
The first is that the text is rather small. You can scale the image up by making it 2x as tall and 2x as wide (preferably using antialiased or cubic interpolation to keep the letters clear).
Next, there isn't enough white around the edges of the numbers for Tesseract to know that it's actually an edge. So you need to add some blank white space around what you've already got.
You can do that manually using Photoshop, GIMP, ImageMagick or whatever, to validate that it'll actually help. But if you need to process a bunch of images then you'll probably want to use PIL and ImageOps.
How do I resize an image using PIL and maintain its aspect ratio?
If you make the new sizes bigger rather than smaller, PIL will grow the image rather than shrink it. Grow it to 2x or 3x the original width and height rather than by 20%, as a fractional scale like that will cause artifacts.
Here's one way to add extra white border:
http://effbot.org/imagingbook/imageops.htm#tag-ImageOps.expand
This question might help you with adding the extra whitespace also:
In Python, Python Image Library 1.1.6, how can I expand the canvas without resizing?
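Putting both steps together with PIL, a minimal sketch (the file name is hypothetical, and the 2x factor and 10px border are just reasonable starting points):

from PIL import Image, ImageOps

img = Image.open('digits.png')  # hypothetical input image
w, h = img.size
# Grow 2x in both dimensions with cubic interpolation
img = img.resize((w * 2, h * 2), Image.BICUBIC)
# Add a white margin so Tesseract can find the character edges
img = ImageOps.expand(img, border=10, fill='white')
img.save('digits_big.png')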

The input image is too small for recognition. Here is my solution:
Upsample the image
Add constant borders
Apply adaptive-threshold
Set configuration to digits
Upsampling the image is required for accurate recognition. Adding constant borders will center the digits, and applying adaptive-threshold will make the features (the digit strokes) stand out. The result will be:
When you read:
049
Code:
import cv2
import pytesseract

# Load the image and convert to grayscale
img = cv2.imread("0cLW9.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Upsample: double both dimensions
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w * 2, h * 2))

# Add a constant white border so the digits don't touch the edges
gry = cv2.copyMakeBorder(gry, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=255)

# Adaptive threshold to emphasize the digit strokes
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 81, 12)

# OCR with Tesseract restricted to digits
txt = pytesseract.image_to_string(thr, config="digits")
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)
You can achieve the same result using other pre-processing methods.
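For instance (my example, not from the original answer), continuing from the code above, a global Otsu threshold could replace the adaptive one:

# Alternative preprocessing: global Otsu threshold instead of adaptiveThreshold
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]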

Related

PIL simple image paste - image changing color

I'm trying to paste an image onto another, using:

from PIL import Image as Img

original = Img.open('original.gif')
tile_img = Img.open('tile_image.jpg')
area = (0, 0, 300, 300)
original.paste(tile_img, area)
original.show()
This works except the pasted image changes color to grey.
Image before:
Image after:
Is there a simple way to retain the same pasted image color? I've tried reading the other questions and the documentation, but I can't find any explanation of how to do this.
Many thanks
I believe all GIF images are palettised - that is, rather than containing an RGB triplet at each location, they contain an index into a palette of RGB triplets. This saves space and improves download speed - at the expense of only allowing 256 unique colours per image.
If you want to treat a GIF (or palettised PNG file) as RGB, you need to ensure you convert it to RGB on opening, otherwise you will be working with palette indices rather than RGB triplets.
Try changing the first line to:
original = Img.open('original.gif').convert('RGB')
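With that change, the question's snippet becomes:

from PIL import Image as Img

# Convert the palettised GIF to RGB before pasting RGB content onto it
original = Img.open('original.gif').convert('RGB')
tile_img = Img.open('tile_image.jpg')
original.paste(tile_img, (0, 0, 300, 300))
original.show()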

Converting PIL image to VIPS image

I'm working on some large histological images using the vips image library. Together with the image I have an array of coordinates. I want to make a binary mask which masks out the part of the image within the polygon created by the coordinates. I first tried to do this using the vips draw functions, but this is very inefficient and takes forever (in my real code the images are about 100000 x 100000 px and the array of polygons is very large).
I then tried creating the binary mask using PIL, and this works great. My problem is converting the PIL image into a vips image. They both have to be vips images to be able to use the multiply command. I also want to write and read from memory, as I believe this is faster than writing to disk.
In the im_PIL.save(memory_area,'TIFF') command I have to specify an image format, but since I'm creating a new image, I'm not sure what to put here.
The Vips.Image.new_from_memory(...) command returns: TypeError: constructor returned NULL
from gi.overrides import Vips
from PIL import Image, ImageDraw
import io
# Load the image into a Vips-image
im_vips = Vips.Image.new_from_file('images/image.tif')
# Coordinates for my mask
polygon_array = [(368, 116), (247, 174), (329, 222), (475, 129), (368, 116)]
# Making a new PIL image of only 1's
im_PIL = Image.new('L', (im_vips.width, im_vips.height), 1)
# Draw polygon to the PIL image filling the polygon area with 0's
ImageDraw.Draw(im_PIL).polygon(polygon_array, outline=1, fill=0)
# Write the PIL image to memory ??
memory_area = io.BytesIO()
im_PIL.save(memory_area,'TIFF')
memory_area.seek(0)
# Read the PIL image from memory into a Vips-image
im_mask_from_memory = Vips.Image.new_from_memory(memory_area.getvalue(), im_vips.width, im_vips.height, im_vips.bands, im_vips.format)
# Close the memory buffer ?
memory_area.close()
# Apply the mask with the image
im_finished = im_vips.multiply(im_mask_from_memory)
# Save image
im_finished.tiffsave('mask.tif')
You are saving from PIL in TIFF format, but then using the vips new_from_memory constructor, which is expecting a simple C array of pixel values.
The easiest fix is to use new_from_buffer instead, which will load an image in some format, sniffing the format from the string. Change the middle part of your program like this:
# Write the PIL image to memory in TIFF format
memory_area = io.BytesIO()
im_PIL.save(memory_area,'TIFF')
image_str = memory_area.getvalue()
# Read the PIL image from memory into a Vips-image
im_mask_from_memory = Vips.Image.new_from_buffer(image_str, "")
And it should work.
The vips multiply operation on two 8-bit uchar images will produce a 16-bit (ushort) image, which will look very dark, since the pixel values will still be in the range 0 - 255. You could either cast it back to uchar again (append .cast("uchar") to the multiply line) before saving, or use 255 instead of 1 for your PIL mask.
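Applied to the question's code, the cast route looks like this:

# Cast the 16-bit multiply result back to 8-bit before saving
im_finished = im_vips.multiply(im_mask_from_memory).cast("uchar")
im_finished.tiffsave('mask.tif')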
You can also move the image from PIL to VIPS as a simple array of bytes. It might be slightly faster.
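A sketch of that raw-bytes route, assuming the 8-bit 'L' mode mask from the question, so one uchar band per pixel (Vips.BandFormat.UCHAR as the matching band format is my assumption):

# Hand vips the PIL mask's raw pixels directly: width x height bytes, one band
mem = im_PIL.tobytes()
im_mask_from_memory = Vips.Image.new_from_memory(
    mem, im_vips.width, im_vips.height, 1, Vips.BandFormat.UCHAR)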
You're right, the draw operations in vips don't work well with very large images in Python. It's not hard to write something in vips to make a mask image of any size from a set of points (just combine lots of && and < with the usual winding rule), but using PIL is certainly simpler.
You could also consider having your poly mask as an SVG image. libvips can load very large SVG images efficiently (it renders sections on demand), so you just magnify it up to whatever size you need for your raster images.

Line thickening image filter for preprocessing of scanned digits

For a school project I've built a scanner and connected it to MATLAB. The scanner scans images (16-by-16 pixels) of handwritten digits from 0 to 9. I'm using principal component analysis to classify the scans. Due to the low accuracy of the scanner, I need to preprocess the scans before I can actually send them through the recognition machine.
One of these preprocessing steps is to thicken the lines. So far I've used a pretty simple averaging filter for this: H = ones(3, 3) ./ 9. The problem with this is that the circular gap of the digits 8 and 9 is likely to be "closed". I enclose a picture of all my preprocessing steps, where the problem is visible: the image with the caption "thresholded" still shows the gap, but it disappears after the thickening step.
My question is: do you know a better filter for this thickening step, which would not erase the gap? Or do you have an idea for a filter which could be applied after the thickening to produce the desired result? Any other suggestions or hints are also greatly appreciated.
I = imread('numberreco.png');
subplot(1,2,1), imshow(I)
I = rgb2gray(I);
% Binarise and invert so the digit strokes become foreground
BW = ~im2bw(I, graythresh(I));
% Thin the strokes so the circular gaps stay open
BW2 = bwmorph(BW, 'thin');
% Mask the greyscale image with the thinned strokes
I1 = double(I) .* BW2;
subplot(1,2,2), imshow(uint8(I1))
The gap is kept, and you can start from here...
Not a very general answer, but if you have the Image Processing Toolbox, and your system doesn't depend on having multiple grey levels, then converting to binary images and using the 'thicken' operation from bwmorph() should do exactly what you want.
Thinking a bit harder, you could also use a suitably thickened binary image as a mask to restore holes - either just elementwise multiply it with the blurred greyscale image or, for more flexibility:
invert it to form a background/holes mask
remove the background with imclearborder() to leave just the holes
optionally dilate the mask
use it as a logical index to clear the 'hole' areas of the blurred/brightened greyscale image.
Even without the morphological steps you can use a mask to artificially reintroduce the original holes later, e.g.:
bgmask = (thresholdedimage == 0); % assuming 0 == background
holes = imclearborder(bgmask);
... % other processing steps
brightenedimage(holes) = 0; % punch holes in updated image

Python scipy.ndimage.morphology.dilation

I have a huge problem concerning PNG images.
My PNG is a black/white letter (the letter is white, the background is black).
No colors in between.
My problem is that I want/must use binary_dilation/erosion in some way...
But when I try to do this, I get an image which is white inside while the background is blue??
from scipy.ndimage.morphology import binary_dilation
from scipy.misc import imread, imsave

template = imread("temp.png") / 255.0
imsave("Result.png", binary_dilation(template))
I have absolutely no clue why...
Beware of the color channels: if "temp.png" has them, then template.shape == (nx, ny, 3), or with alpha, template.shape == (nx, ny, 4). Binary dilation treats the last dimension as a third spatial dimension rather than as a color channel, which is not what you usually want. You can do binary_dilation(template[:,:,0]) to enforce a 2-D image operation.
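Folded into the question's snippet (keeping its now-deprecated scipy.misc helpers), the fix might look like:

from scipy.ndimage.morphology import binary_dilation
from scipy.misc import imread, imsave

template = imread("temp.png") / 255.0
if template.ndim == 3:
    # Color or RGBA file: keep a single channel so the dilation stays 2-D
    template = template[:, :, 0]
imsave("Result.png", binary_dilation(template))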

text_extents behaves unexpectedly when I try to render a PNG - is this a bug?

I noticed lately that in some cases the PNG will look different from the PDF. I rendered the preview images in different sizes and realized that the output could be totally different for the same input when I change the output size of the surface.
The problem is that text_extents reports different normalized sizes for the same text when the surface pixel size is different. In this example the width varies from 113.861 to 120.175. Since I have to write each line separately, those errors are sometimes much bigger in total.
Does anybody have an idea how to avoid this miscalculation?
Here is a small demonstration of the problem:
import cairo
from StringIO import StringIO

def render_png(width, stream):
    width_px = height_px = width
    surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, width_px, height_px)
    cr = cairo.Context(surface)
    # Normalise to a 100 x 100 user-space coordinate system
    cr.scale(float(width_px) / float(100),
             float(height_px) / float(100))
    cr.set_antialias(cairo.ANTIALIAS_GRAY)
    cr.set_source_rgb(1, 1, 1)
    cr.rectangle(0, 0, 100, 100)
    cr.fill()
    cr.select_font_face('Zapfino Extra LT')  # a fancy font
    cr.set_font_size(20)
    example_string = 'Ein belieber Test Text'
    xbearing, ybearing, width, height, xadvance, yadvance = (
        cr.text_extents(example_string))
    xpos = (100. - width) / 2.  # centering text
    print width
    cr.move_to(xpos, 50)
    cr.set_source_rgba(0, 0, 0)
    cr.show_text(example_string)
    surface.write_to_png(stream)
    return width

if __name__ == '__main__':
    l = []
    for i in range(100, 150, 1):
        outs = StringIO()
        xpos = render_png(i, outs)
        l.append((i, xpos))
        #out = open('/home/hwmrocker/Desktop/FooBar/png_test%03d.png' % i, 'w')
        #outs.seek(0)
        #out.write(outs.read())
        #out.close()
    from operator import itemgetter
    l = sorted(l, key=itemgetter(1))
    print
    print l[0]
    print l[-1]
This behavior is likely due to the nature of text rendering itself: glyphs in a font are drawn in different ways depending on the pixel resolution, more so when the pixel resolution is small compared to the glyph size (I'd say less than 30px of height per glyph). This behavior is to be expected to some extent, always in order to prioritize readability of the text. If it is too far off, or the PNG text is "uglier" than on the PDF (rather than just a different size), then it is a bug in Cairo. Nevertheless, you should probably place this exact question on Cairo's issue tracker, so that the developers can tell whether it is a bug or not (and if it is, it may be the only way for them to become aware of it).
(Apparently they have no public bug tracker - just e-mail it to cairo-bugs@cairographics.org)
As for your specific problem, the workaround I suggest is to render your text to a larger surface, maybe 5 times larger, then resize that surface and paste the contents onto your original surface (if needed at all). This way you might avoid glyph-size variations due to constraints in the number of pixels available for each glyph (at the cost of a poorer text rendering on your final output).
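A minimal sketch of that workaround with pycairo (the 5x factor, surface sizes, positions, and output file name are illustrative; in a real program you would reuse the question's render code on the larger surface):

import cairo

SCALE = 5  # render 5x larger, then shrink back down

# Oversampled surface: glyphs get 5x as many pixels to rasterize into
big = cairo.ImageSurface(cairo.FORMAT_ARGB32, 500, 500)
cr = cairo.Context(big)
cr.scale(SCALE, SCALE)
cr.select_font_face('Zapfino Extra LT')
cr.set_font_size(20)
cr.move_to(10, 50)
cr.set_source_rgb(0, 0, 0)
cr.show_text('Ein belieber Test Text')

# Final surface: paint the oversampled surface scaled back down
final = cairo.ImageSurface(cairo.FORMAT_ARGB32, 100, 100)
cr2 = cairo.Context(final)
cr2.scale(1.0 / SCALE, 1.0 / SCALE)
cr2.set_source_surface(big, 0, 0)
cr2.paint()
final.write_to_png('downsampled.png')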