Tesseract fails to recognize digits, even with rescaling, char white_listing and filtering - tesseract

For an open source pokerbot I'm trying to recognize images as implemented here. I have tried the following with an example image that I'd like tesseract to recognize:
pytesseract.image_to_string(img_orig)
Out[32]: 'cies TE'
pytesseract.image_to_string(img_mod, 'eng', config='--psm 6 --oem 1 -c tessedit_char_whitelist=0123456789.$£B')
Out[33]: ''
So then let's use some more sophisticated methods by scaling::
basewidth = 200
wpercent = (basewidth / float(img_orig.size[0]))
hsize = int((float(img_orig.size[1]) * float(wpercent)))
img_resized = img_orig.convert('L').resize((basewidth, hsize), Image.ANTIALIAS)
if binarize:
img_resized = binarize_array(img_resized, 200)
Now we end up with an image looking like this:
Let's see what comes out:
pytesseract.image_to_string(img_resized)
Out[34]: 'Stee'
pytesseract.image_to_string(img_resized, 'eng', config='--psm 6 --oem 1 -c tessedit_char_whitelist=0123456789.$£B')
Out[35]: ''
Ok, that didn't work. Let's try applying some filers:
img_min = img_resized.filter(ImageFilter.MinFilter)
img_mod = img_resized.filter(ImageFilter.ModeFilter)
img_med = img_resized.filter(ImageFilter.MedianFilter)
img_sharp = img_resized.filter(ImageFilter.SHARPEN)
pytesseract.image_to_string(img_min)
Out[36]: ''
pytesseract.image_to_string(img_mod)
Out[37]: 'oe Se'
pytesseract.image_to_string(img_med)
Out[38]: 'rete'
pytesseract.image_to_string(img_sharp)
Out[39]: 'ry'
Or maybe binarize will help?
numpy_array = np.array(image)
for i in range(len(numpy_array)):
for j in range(len(numpy_array[0])):
if numpy_array[i][j] > threshold:
numpy_array[i][j] = 255
else:
numpy_array[i][j] = 0
img_binarized = Image.fromarray(numpy_array)
pytesseract.image_to_string(img_binarized)
Out[42]: 'Sion'
pytesseract.image_to_string(img_binarized, 'eng', config='--psm 6 --oem 1 -c tessedit_char_whitelist=0123456789.$£B')
Out[44]: '0'
Again, all totally wrong.
What else can I do?
Any suggestions would be greatly appreciated.
Add on example:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold_img = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
im_pil = cv2_to_pil(threshold_img)
pytesseract.image_to_string(im_pil)
Out[5]: 'TUM'
or trying another suggested algo for:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold_img = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
im_pil = cv2_to_pil(threshold_img)
pytesseract.image_to_string(im_pil, 'eng', config='--psm 7')
Out[5]: '$1.99'

I think you are making it too complicated here. I did simple Otsu thresholding on the image that you've provided and was able to get the output.
image_path = r'path/to/image'
img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
cv2.imwrite('thresh.png', thresh)
detected_text = pytesseract.image_to_string(Image.open(image_path))
print(detected_text)
The image that I got after thresholding was
Tesseract was easily able to detect it and the output I got: $0.51

You can do it with or without Otsu. The trick for me was to invert the image so that it's black text on white background (which Tesseract seems to prefer).
EDIT One more trick for Tesseract is to add a border around the image. Tesseract does not like for the text to be too close to the edge.
import cv2
import pytesseract
import numpy as np
img = cv2.imread('one_twenty_nine.png', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)[1]
cv2.imwrite('thresh.png', thresh)
detected_text = pytesseract.image_to_string(thresh, config = '--psm 7')
print(detected_text)
which gives
$1.29

Related

Alluvial plot - reorder lodes

I have created an alluvial plot but, for visibility purposes I would like to move one lode in one of the axes: more specifically I would like the "NA" of the "Type of surgery" to be at the top so the last 4 axes are aligned.
This is the code I used on R:
aes(y = ID, axis1 = Reason, axis2 = Response, axis3=Type_of_surgery, axis4=Margins, axis5=RT_post_op, axis6=Chemo_post_op)) +
geom_alluvium(aes(fill = Type_of_surgery), width = 1/12,aes.bind = TRUE) +
geom_flow(aes.bind = TRUE) +
geom_stratum(width = 1/3, fill = "grey", color = "white") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("Reason", "Response","Type of surgery", "Margins","RT post op", "Chemo post-op"), expand = c(0.1,0.1)) +
scale_fill_brewer(type = "qual", palette = "Pastel1") +
ggtitle("TBC") ```
This is the plot I obtained:
[Alluvial plot][1]
[1]: https://i.stack.imgur.com/nDCIZ.png
I am beginning on the world of coding so any help would be most welcome,
Thank you all for your help,
JB

Why tesseract wont find this easy text in image?

I have been trying for hours to resize and change color of this image but nothing is consistently getting correct letters. Please see image below. This is a test image I'm using. The goal is to use this for automation purposes. Thanks!
Larger sample
import numpy as np
import pytesseract
from PIL import ImageGrab
import win32gui
import time
toplist, winlist = [], []
#time.sleep(3)
def enum_cb(hwnd, results):
if 'FPS:' in win32gui.GetWindowText(hwnd):
print(hex(hwnd), win32gui.GetWindowText(hwnd))
winlist.append(hwnd)
win32gui.EnumWindows(enum_cb, None)
win32gui.SetForegroundWindow(winlist[0])
bbox = win32gui.GetWindowRect(winlist[0])
print(bbox)
img = np.array(ImageGrab.grab(bbox=(130, 810, 800, 1080)))
#percent by which the image is resized
scale_percent = 400
#calculate the 50 percent of original dimensions
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
# dsize
dsize = (width, height)
# resize image
output = cv2.resize(img, dsize)
#img.show()
pytesseract.pytesseract.tesseract_cmd = r"L:\Program Files\Tesseract-OCR\tesseract.exe"
# img = cv2.imread(r"L:\MGO2PC\MGO2 UNOFFICIAL PC\RPCS3 EMU\screenshots\screenshot-2021_04_06_17_13_03.png", 0)
#crop_img = img[800:900, 260:800]
#cv2.imshow("cropped", crop_img)
#cv2.waitKey(0)
i = cv2.imwrite("test.png",output)
text = pytesseract.image_to_string(output, lang='eng')
print(text)
Ultimately found that I can isolate the text color and tesseract had no problem reading after that.
def cv2_from_screen(self):
boundaries = [
([0, 179, 105], [38, 255, 167]) # BGR
]
pytesseract.pytesseract.tesseract_cmd = r"L:\Program Files\Tesseract-
OCR\tesseract.exe"
def enum_cb(hwnd, results):
if 'FPS:' in win32gui.GetWindowText(hwnd):
print(hex(hwnd), win32gui.GetWindowText(hwnd))
self.winlist.append(hwnd)
win32gui.EnumWindows(enum_cb, None)
win32gui.EnumWindows(enum_cb, None)
win32gui.SetForegroundWindow(self.winlist[0])
image = pyautogui.screenshot()
image = cv2.cvtColor(np.array(image.crop(box=[0, 800, 1000, 1080])), cv2.COLOR_RGB2BGR) #COLOR_RGB2BGR and COLOR_BGR2GRAY

Extract numbers from specific image

I am involved in a project that I think you can help me. I have multiple images that you can see here Images to recognize. The goal here is to extract the numbers between the dashed lines. What is the best approach to do that? The idea that I have from the beginning is to find the coordinates of the dash lines and do the crop function, then is just run OCR software. But is not easy to find those coordinates, can you help me? Or if you have a better approach tell me.
Best regards,
Pedro Pimenta
You may start by looking at more obvious (bigger) objects in your images. The dashed lines are way too small in some images. Searching for the "euros milhoes" logo and the barcode will be easier and it will help you have an idea of the scale and rotation involved.
To find these objects without using match template you can binarize your image (watch out for the background texture) and use the Hu moments on the contours/blobs.
Don't expect a good OCR accuracy on images where the numbers are smaller than 8-10 pixels.
You can use python-tesseract https://code.google.com/p/python-tesseract/ ,it works with your image.What you need to do is to split the result string.I use your https://www.dropbox.com/sh/kcybs1i04w3ao97/u33YGH_Kv6#f:euro9.jpg to test.And source code is below.UPDATE
# -*- coding: utf-8 -*-
from PIL import Image
from PIL import ImageEnhance
import tesseract
im = Image.open('test.jpg')
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(4)
im = im.convert('1')
w, h = im.size
im = im.resize((w * (416 / h), 416))
pix = im.load()
LINE_CR = 0.01
WHITE_HEIGHT_CR = int(h * (20 / 416.0))
status = 0
white_line = []
for i in xrange(h):
line = []
for j in xrange(w):
line.append(pix[(j, i)])
p = line.count(0) / float(w)
if not p > LINE_CR:
white_line.append(i)
wp = None
for i in range(10, len(white_line) - WHITE_HEIGHT_CR):
k = white_line[i]
if white_line[i + WHITE_HEIGHT_CR] == k + WHITE_HEIGHT_CR:
wp = k
break
result = []
flag = 0
while 1:
if wp < 0:
result.append(wp)
break
line = []
for i in xrange(w):
line.append(pix[(i, wp)])
p = line.count(0) / float(w)
if flag == 0 and p > LINE_CR:
l = []
for xx in xrange(20):
l.append(pix[(xx, wp)])
if l.count(0) > 5:
break
l = []
for xx in xrange(416-1, 416-100-1, -1):
l.append(pix[(xx, wp)])
if l.count(0) > 17:
break
result.append(wp)
wp -= 1
flag = 1
continue
if flag == 1 and p < LINE_CR:
result.append(wp)
wp -= 1
flag = 0
continue
wp -= 1
result.reverse()
for i in range(1, len(result)):
if result[i] - result[i - 1] < 15:
result[i - 1] = -1
result = filter(lambda x: x >= 0, result)
im = im.crop((0, result[0], w, result[-1]))
im.save('test_converted.jpg')
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "test_converted.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result
Depends python 2.7 python-tesseract-win32 python-opencv numpy PIL,and be sure to follow python-tesseract's remember to .

Preserving colors during CMYK to RGB transformation in PIL

I'm using PIL to process uploaded images. Unfortunately, I'm having trouble with color conversion from CMYK to RGB, as the resulting images tone and contrast changes.
I'd suspect that it's only doing direct number transformations. Does PIL, or anything built on top of it, have an Adobian dummy-proof consume embedded profile, convert to destination, preserve numbers tool I can use for conversion?
In all my healthy ignorance and inexperience, this sort of jumped at me and it's got me in a pinch. I'd really like to get this done without engaging any intricacies of color spaces, transformations and the necessary math for both at this point.
Though I've never previously used it, I'm also disposed at using ImageMagick for this processing step if anyone has experience that it can perform it in a gracious manner.
So it didn't take me long to run into other people mentioning Little CMS, being the most popular open source solution for color management. I ended snooping around for Python bindings, found the old pyCMS and some ostensible notions about PIL supporting Little CMS.
Indeed, there is support for Little CMS, it's mentioned in a whole whopping one-liner:
CMS support: littleCMS (1.1.5 or later is recommended).
The documentation contains no references, no topical guides, Google didn't crawl out anything, their mailing list is closed... but digging through the source there's a PIL.ImageCms module that's well documented and get's the job done. Hope this saves someone from a messy internet excavation.
Goes off getting himself a cookie...
it's 2019 and things have changed. Your problem is significantly more complex than it may appear at first sight. The problem is, CMYK to RGB and RGB to CMYK is not a simple there and back. If e.g. you open an image in Photoshop and convert it there, this conversion has 2 additional parameters: source color profile and destination color profile. These change things greatly! For a typical use case, you would assume Adobe RGB 1998 on the RGB side and say Coated FOGRA 39 on the CMYK side. These two additional pieces of information clarify to the converter how to deal with the colors on input and output. What you need next is a transformation mechanism, Little CMS is in deed a great tool for this. It is MIT licensed and (after looking for solutions myself for a considerable time), I would recommend the following setup if you indeed do need a python way to transform colors:
Python 3.X (necessary because of littlecms)
pip install littlecms
pip install Pillow
In littlecms' /tests folder you will find a great set of examples. I would allow myself a particular adaptation of one test. Before you get the code, please let me tell you something about those color profiles. On Windows, as is my case, you will find a set of files with an .icc extension in the folder C:\Windows\System32\spool\drivers\color where Windows stores it's color profiles. You can download other profiles from sites like https://www.adobe.com/support/downloads/iccprofiles/iccprofiles_win.html and install them on Windows simply by double-clicking the corresponding .icc file. The example I provide depends on such profile files, which Little CMS uses to do those magic color transforms. I work as a semi-professional graphics designer and needed to be able to convert colors from CMYK to RGB and vice versa for certain scripts that manipulate objects in InDesign. My setup is RGB: Adobe RGB 1998 and CMYK: Coated FOGRA 39 (these settings were recommended by most book printers I get my books printed at). The aforementioned color profiles generated very similar results for me to the same transforms made by Photoshop and InDesign. Still, be warned, the colors are slightly (by around 1%) off in comparison to what PS and Id will give you for the same inputs. I am trying to figure out why...
The little program:
import littlecms as lc
from PIL import Image
def rgb2cmykColor(rgb, psrc='C:\\Windows\\System32\\spool\\drivers\\color\\AdobeRGB1998.icc', pdst='C:\\Windows\\System32\\spool\\drivers\\color\\CoatedFOGRA39.icc') :
ctxt = lc.cmsCreateContext(None, None)
white = lc.cmsD50_xyY() # Set white point for D50
dst_profile = lc.cmsOpenProfileFromFile(pdst, 'r')
src_profile = lc.cmsOpenProfileFromFile(psrc, 'r') # cmsCreate_sRGBProfile()
transform = lc.cmsCreateTransform(src_profile, lc.TYPE_RGB_8, dst_profile, lc.TYPE_CMYK_8,
lc.INTENT_RELATIVE_COLORIMETRIC, lc.cmsFLAGS_NOCACHE)
n_pixels = 1
in_comps = 3
out_comps = 4
rgb_in = lc.uint8Array(in_comps * n_pixels)
cmyk_out = lc.uint8Array(out_comps * n_pixels)
for i in range(in_comps):
rgb_in[i] = rgb[i]
lc.cmsDoTransform(transform, rgb_in, cmyk_out, n_pixels)
cmyk = tuple(cmyk_out[i] for i in range(out_comps * n_pixels))
return cmyk
def cmyk2rgbColor(cmyk, psrc='C:\\Windows\\System32\\spool\\drivers\\color\\CoatedFOGRA39.icc', pdst='C:\\Windows\\System32\\spool\\drivers\\color\\AdobeRGB1998.icc') :
ctxt = lc.cmsCreateContext(None, None)
white = lc.cmsD50_xyY() # Set white point for D50
dst_profile = lc.cmsOpenProfileFromFile(pdst, 'r')
src_profile = lc.cmsOpenProfileFromFile(psrc, 'r') # cmsCreate_sRGBProfile()
transform = lc.cmsCreateTransform(src_profile, lc.TYPE_CMYK_8, dst_profile, lc.TYPE_RGB_8,
lc.INTENT_RELATIVE_COLORIMETRIC, lc.cmsFLAGS_NOCACHE)
n_pixels = 1
in_comps = 4
out_comps = 3
cmyk_in = lc.uint8Array(in_comps * n_pixels)
rgb_out = lc.uint8Array(out_comps * n_pixels)
for i in range(in_comps):
cmyk_in[i] = cmyk[i]
lc.cmsDoTransform(transform, cmyk_in, rgb_out, n_pixels)
rgb = tuple(rgb_out[i] for i in range(out_comps * n_pixels))
return rgb
def rgb2cmykImage(PILImage, psrc='C:\\Windows\\System32\\spool\\drivers\\color\\AdobeRGB1998.icc', pdst='C:\\Windows\\System32\\spool\\drivers\\color\\CoatedFOGRA39.icc') :
ctxt = lc.cmsCreateContext(None, None)
white = lc.cmsD50_xyY() # Set white point for D50
dst_profile = lc.cmsOpenProfileFromFile(pdst, 'r')
src_profile = lc.cmsOpenProfileFromFile(psrc, 'r')
transform = lc.cmsCreateTransform(src_profile, lc.TYPE_RGB_8, dst_profile, lc.TYPE_CMYK_8,
lc.INTENT_RELATIVE_COLORIMETRIC, lc.cmsFLAGS_NOCACHE)
n_pixels = PILImage.size[0]
in_comps = 3
out_comps = 4
n_rows = 16
rgb_in = lc.uint8Array(in_comps * n_pixels * n_rows)
cmyk_out = lc.uint8Array(out_comps * n_pixels * n_rows)
outImage = Image.new('CMYK', PILImage.size, 'white')
in_row = Image.new('RGB', (PILImage.size[0], n_rows), 'white')
out_row = Image.new('CMYK', (PILImage.size[0], n_rows), 'white')
out_b = bytearray(n_pixels * n_rows * out_comps)
row = 0
while row < PILImage.size[1] :
in_row.paste(PILImage, (0, -row))
data_in = in_row.tobytes('raw')
j = in_comps * n_pixels * n_rows
for i in range(j):
rgb_in[i] = data_in[i]
lc.cmsDoTransform(transform, rgb_in, cmyk_out, n_pixels * n_rows)
for j in cmyk_out :
out_b[j] = cmyk_out[j]
out_row = Image.frombytes('CMYK', in_row.size, bytes(out_b))
outImage.paste(out_row, (0, row))
row += n_rows
return outImage
def cmyk2rgbImage(PILImage, psrc='C:\\Windows\\System32\\spool\\drivers\\color\\CoatedFOGRA39.icc', pdst='C:\\Windows\\System32\\spool\\drivers\\color\\AdobeRGB1998.icc') :
ctxt = lc.cmsCreateContext(None, None)
white = lc.cmsD50_xyY() # Set white point for D50
dst_profile = lc.cmsOpenProfileFromFile(pdst, 'r')
src_profile = lc.cmsOpenProfileFromFile(psrc, 'r')
transform = lc.cmsCreateTransform(src_profile, lc.TYPE_CMYK_8, dst_profile, lc.TYPE_RGB_8,
lc.INTENT_RELATIVE_COLORIMETRIC, lc.cmsFLAGS_NOCACHE)
n_pixels = PILImage.size[0]
in_comps = 4
out_comps = 3
n_rows = 16
cmyk_in = lc.uint8Array(in_comps * n_pixels * n_rows)
rgb_out = lc.uint8Array(out_comps * n_pixels * n_rows)
outImage = Image.new('RGB', PILImage.size, 'white')
in_row = Image.new('CMYK', (PILImage.size[0], n_rows), 'white')
out_row = Image.new('RGB', (PILImage.size[0], n_rows), 'white')
out_b = bytearray(n_pixels * n_rows * out_comps)
row = 0
while row < PILImage.size[1] :
in_row.paste(PILImage, (0, -row))
data_in = in_row.tobytes('raw')
j = in_comps * n_pixels * n_rows
for i in range(j):
cmyk_in[i] = data_in[i]
lc.cmsDoTransform(transform, cmyk_in, rgb_out, n_pixels * n_rows)
for j in rgb_out :
out_b[j] = rgb_out[j]
out_row = Image.frombytes('RGB', in_row.size, bytes(out_b))
outImage.paste(out_row, (0, row))
row += n_rows
return outImage
Something to note for anyone implementing this: you probably want to take the uint8 CMYK values (0-255) and round them into the range 0-100 to better match most color pickers and uses of these values. See my code here: https://gist.github.com/mattdesl/ecf305c2f2b20672d682153a7ed0f133

how to determine the transparent color index of ICO image with PIL?

Specifically, this is from an .ico file, so there is no "transparent" "info" attribute like you would get in a gif. The below example illustrates converting Yahoo!'s favicon to a png using the correct transparency index of "0", which I guessed. how to detect that the ico is in fact transparent and that the transparency index is 0 ?
import urllib2
import Image
import StringIO
resp = urllib2.urlopen("http://www.yahoo.com/favicon.ico")
image = Image.open(StringIO.StringIO(resp.read()))
f = file("test.png", "w")
# I guessed that the transparent index is 0. how to
# determine it correctly ?
image.save(f, "PNG", quality=95, transparency=0)
looks like someone recognized that PIL doesn't really read ICO correctly (I can see the same thing after reconciling its source code with some research on the ICO format - there is an AND bitmap which determines transparency)
and came up with this extension:
http://www.djangosnippets.org/snippets/1287/
since this is useful for non-django applications, I've reposted here with a few tweaks to its exception throws:
import operator
import struct
from PIL import BmpImagePlugin, PngImagePlugin, Image
def load_icon(file, index=None):
'''
Load Windows ICO image.
See http://en.wikipedia.org/w/index.php?oldid=264332061 for file format
description.
'''
if isinstance(file, basestring):
file = open(file, 'rb')
try:
header = struct.unpack('<3H', file.read(6))
except:
raise IOError('Not an ICO file')
# Check magic
if header[:2] != (0, 1):
raise IOError('Not an ICO file')
# Collect icon directories
directories = []
for i in xrange(header[2]):
directory = list(struct.unpack('<4B2H2I', file.read(16)))
for j in xrange(3):
if not directory[j]:
directory[j] = 256
directories.append(directory)
if index is None:
# Select best icon
directory = max(directories, key=operator.itemgetter(slice(0, 3)))
else:
directory = directories[index]
# Seek to the bitmap data
file.seek(directory[7])
prefix = file.read(16)
file.seek(-16, 1)
if PngImagePlugin._accept(prefix):
# Windows Vista icon with PNG inside
image = PngImagePlugin.PngImageFile(file)
else:
# Load XOR bitmap
image = BmpImagePlugin.DibImageFile(file)
if image.mode == 'RGBA':
# Windows XP 32-bit color depth icon without AND bitmap
pass
else:
# Patch up the bitmap height
image.size = image.size[0], image.size[1] >> 1
d, e, o, a = image.tile[0]
image.tile[0] = d, (0, 0) + image.size, o, a
# Calculate AND bitmap dimensions. See
# http://en.wikipedia.org/w/index.php?oldid=264236948#Pixel_storage
# for description
offset = o + a[1] * image.size[1]
stride = ((image.size[0] + 31) >> 5) << 2
size = stride * image.size[1]
# Load AND bitmap
file.seek(offset)
string = file.read(size)
mask = Image.fromstring('1', image.size, string, 'raw',
('1;I', stride, -1))
image = image.convert('RGBA')
image.putalpha(mask)
return image