Which features can I use for handwritten OCR other than a downsampled binary grid of the image? - neural-network

Hi, I have been searching through research papers on what features would be good for me to use in my handwritten-OCR classifying neural network. I am a beginner, so I have just been taking the image of the handwritten character, drawing a bounding box around it, and then resizing it into a 15x20 binary image. This means I have an input layer of 300 features. In the papers I have found on Google (most of which are quite old) the methods really vary. My accuracy is not bad with just a binary grid of the image, but I was wondering if anyone had other features I could use to boost my accuracy, or could even just point me in the right direction. I would really appreciate it!
Thanks,
Zach
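(For reference, the preprocessing described in the question might look roughly like this in Python with OpenCV; the Otsu thresholding and the blank-input guard are assumptions, not necessarily what the asker used.)

```python
import cv2
import numpy as np

def to_feature_vector(gray_img):
    """Binarize, crop to the character's bounding box, resize to 15x20,
    and flatten into the 300 binary inputs described in the question."""
    _, binary = cv2.threshold(gray_img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(binary)            # coordinates of foreground pixels
    if xs.size == 0:
        return np.zeros(300, dtype=np.float32)   # blank input, nothing to crop
    crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    resized = cv2.resize(crop, (15, 20), interpolation=cv2.INTER_AREA)
    return (resized > 127).astype(np.float32).ravel()  # length-300 vector
```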

I haven't read any actual papers on this topic, but my advice would be to get creative. Use anything you can think of that might help the classifier identify numbers.
My first thought would be to try to identify "lines" in the image, maybe via a modified "sliding window" algorithm (a sliding/rotating line?), or to try to fit a "line of best fit" to the image (to help the classifier respond to changes in slant or writing style). Really, though, if you're using a neural network it should be picking up on these sorts of things without your manual help (that's the whole point of them!)
I would focus first on the structure and topology of your net to try to improve performance, and worry about additional features only if you cannot get satisfactory performance some other way. You could also try improving the features you already have: make sure the character is centered in the image, and maybe try an algorithm to deskew italicised characters so they stand vertical (see the sketch below).
In my experience these sorts of things don't often help, but you could get lucky and run into one that improves your net :)
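For the deskewing idea above, here is a minimal sketch using image moments, along the lines of the well-known OpenCV digits sample (the shear geometry is a standard trick, but treat the constants as assumptions to tune for your 15x20 cells):

```python
import cv2
import numpy as np

def deskew(img):
    """Shear a character image so slanted strokes become vertical,
    using image moments (mu11/mu02 estimates the slant)."""
    h, w = img.shape
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()      # almost no vertical variance; nothing to fix
    skew = m['mu11'] / m['mu02']
    # Horizontal shear that cancels the measured slant.
    M = np.float32([[1, skew, -0.5 * h * skew],
                    [0, 1, 0]])
    return cv2.warpAffine(img, M, (w, h),
                          flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
```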

Related

Find image within image (template matching?)

I need to find the location of an image that the user provides within an image that I provide.
It is safe to assume, at the time of the analysis, that the user-provided image is contained within the image it is compared against.
I've looked through, and even have some experience with, Core ML and Vision image classification; however, I am struggling to convince myself that it is the correct way to approach this problem. I feel like the way Vision handles "feature values" is almost the reverse of what I'm looking for.
My question: Is there a feature of Core ML or Vision that tackles this particular problem head on?
Other information that may be needed:
It is not safe to assume that the provided images are pixel-for-pixel identical, due to possible resolution differences.
They may also be provided in any shape, although it is possible to crop them to a standardised shape before analysis.
Rotation will also need to be accounted for.
There would not be cases where the user's image appears in the target image twice.
Take a look at some of the feature detection and matching algorithms.
For example, you could use SIFT (the scale-invariant feature transform) together with RANSAC (random sample consensus) to do exactly what you described.
If you are using OpenCV, there are plenty of such algorithms that you can easily use (FAST, Shi-Tomasi, etc.).
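A minimal sketch of that SIFT + RANSAC pipeline in Python with OpenCV (file names are placeholders; assumes OpenCV 4.4+, where SIFT is in the main package). It handles scale, rotation, and resolution differences in one go:

```python
import cv2
import numpy as np

template = cv2.imread('user_image.png', cv2.IMREAD_GRAYSCALE)  # image to find
scene = cv2.imread('my_image.png', cv2.IMREAD_GRAYSCALE)       # image to search

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Lowe's ratio test keeps only distinctive matches.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects outlier matches while estimating the homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = template.shape
    corners = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
    print(cv2.perspectiveTransform(corners, H).reshape(-1, 2))  # location in scene
```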
I think you need something like this example in OpenCV.

Check for millions of collisions?

I'm building a solution to fit a number of objects most efficiently into a box. I hope to implement more efficient algorithms soon, but to start out with I'm going to use the brute-force method, checking every possible position. This is fine for now, since the box is small and there are very few items. Later, the complexity will grow.
I'm using Unity to allow the user to see how the items ultimately fit in the box. My initial thought was to also use Unity's physics and collision detection to implement the best-fit algorithm; but, with a huge potential number of locations and positions to check, is this a bad approach? Am I much better off running my algorithm on a data structure instead? A 10x10x10 box with even three 1x1x1 objects has almost a billion possible position combinations...
I'm new to Unity so any advice is welcome; thanks!
Update: right, so this problem is definitely in the bin-packing family of problems, which I know is NP-hard. I'm assuming a rectangular box, filled with rectangular box-shaped items of random dimensions.
My question is: given my particular algorithm, when we ask "is there currently something in this x,y,z space?", would it be more efficient to figure that out in code, or by using Unity objects with collision detection?
Based on the answers so far, I can see that using Unity would be profoundly inefficient.
If you LITERALLY want to know:
"is there currently something in this x,y,z space?"
the best possible way to do that is simply to use Unity's engine: trivially check the AABB to see if a point is inside it (or perhaps just check for intersection). You can use one of Unity's many physics query calls (e.g. Physics.CheckBox or Physics.OverlapBox).
I understand that the question "is there currently something in this x,y,z space?" is, or could be, one important part of whatever solution you are planning. And indeed the best way to do that is to let Unity's engine do it: its collision queries run in heavily optimized native engine code, so it is very unlikely you or I could write anything as efficient.
That is the actual answer to what you have now stated is your specific question.
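That said, if you do go the pure data-structure route, the occupancy test is also very cheap to write yourself. Here is a minimal sketch (in Python purely for illustration; a Unity implementation would be C#), assuming an integer grid of unit cells and axis-aligned box items:

```python
import numpy as np

# A hypothetical 10x10x10 box, tracked as a boolean occupancy grid.
grid = np.zeros((10, 10, 10), dtype=bool)

def can_place(x, y, z, w, h, d):
    """True if a w x h x d item fits at integer position (x, y, z)."""
    if x + w > grid.shape[0] or y + h > grid.shape[1] or z + d > grid.shape[2]:
        return False                        # sticks out of the box
    # Any True cell in the slice means something is already there.
    return not grid[x:x+w, y:y+h, z:z+d].any()

def place(x, y, z, w, h, d):
    grid[x:x+w, y:y+h, z:z+d] = True        # mark the cells as occupied

# Example: enumerate every legal position for a 2x1x1 item.
spots = [(x, y, z)
         for x in range(10) for y in range(10) for z in range(10)
         if can_place(x, y, z, 2, 1, 1)]
```

Each query is a single array slice, so even brute-force enumeration over a 10x10x10 grid is only a thousand checks per item.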
Now regarding the more general issue, which I first fully explained when that was the question you were asking :)
Here are some of my thoughts on trivial "box packing" algorithms in 2D, at the level useful in video games.
https://stackoverflow.com/a/35228592/294884
Regarding 3D "box packing" it's absolutely impossible to offer any guidance unless you include a screen shot of what you are trying to do and fully explain the shapes and constraints involved.
If you are a mathematician looking for the latest algorithmic thinking on the matter, just google something like "3d box packing algorithm"
example, example
Again, readers here have utterly no clue what shapes/etc you are dealing with, so please click Edit and explain!
Note too that sphere packing is a really fascinating scientific problem, if that's what you are talking about:
https://en.wikipedia.org/wiki/Close-packing_of_equal_spheres

OpenCV vs Matlab - Line/Hallway Detection

For my computer vision class, I'm going to be doing a project where I extract information about a hallway based on an image of that hallway. In particular, the lines of the hallway which extend toward a vanishing point will be of interest. My question is whether I should use Matlab, OpenCV, or something else to implement this.
I don't have a ton of time for this project. This fact makes Matlab seem like a good option, since it seems you can usually get things up and running there quickly. On the other hand, I hope to take what I do for this class project and extend it further for research once the class is complete. This makes OpenCV seem better as (from what I've read) it's much more efficient. Another choice might be to implement it in Matlab for the project, then port that code to an OpenCV form later. It should be noted that I have plenty of experience with C/C++, but only a little in both Matlab and OpenCV.
At the moment, I'm leaning toward just using OpenCV from the start. However, I would like the opinion of someone who's had a bit more experience here than myself. If you'd recommend something over both OpenCV and Matlab, please say so. Also, if you have any tips on what packages or toolkits might be useful for such a project, they would be greatly appreciated.
Any suggestions? Thanks for your time!
Which one makes it easier for you to write a piece of code that reads an image file and displays it?
If you know C++ very well, then it should be easy for you to debug the code. Since you say you have little experience with Matlab, even a small mistake in the code there could take a long time to debug.
So I suggest breaking the problem down into:
read the image and display it; this is very easy in both.
detect edges using a simple/classic method; this is super easy in both. Display the result and visually check that it's done correctly.
use a robust line-fitting method; RANSAC and the Hough transform are probably what you're going to use. OpenCV makes using them easier than you might guess (see the sketch below), and Matlab also has built-in functions to detect lines using the Hough transform, which give you the start/end points of each segment. But if you're finding a vanishing point, you shouldn't need those.
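A minimal OpenCV sketch of the edge-detection and Hough steps above (the file name and every threshold here are placeholder assumptions you would tune):

```python
import cv2
import numpy as np

img = cv2.imread('hallway.jpg')             # hypothetical input photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Classic Canny edges, then a probabilistic Hough transform for segments.
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=60, maxLineGap=10)

# Draw the detected segments for a quick visual check.
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

cv2.imshow('hallway lines', img)
cv2.waitKey(0)
```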
The decision is yours; this is not a very difficult problem, and you can find loads of help on the web. Good luck with the project, and please let us know how it goes.

Is there an imaging library that can make you look thinner?

Very odd question, I know, but this is a problem a potential client handed me today.
We assume we have a full-length photo of a person. We want to generate a thinner image of that person. Obviously, one way would just be to compress the width of the image, but that would result in various distortions that wouldn't be realistic.
I'd like to keep this an open-source implementation so if anybody knows of a library that can identify certain parts of the body and slim each in a way that is most realistic, I'd like to know.
This is obviously something that could be done by hand but we need a solution that works without user interaction.
You should look into seam-carving algorithms. The algorithm is very simple to implement, and there are many such implementations online. It seems ImageMagick has it too, called "Liquid Rescale".
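To give a flavour of how simple it is, here is a sketch of one seam-removal step in Python (gradient-magnitude energy plus dynamic programming; a real slimming tool would repeat this many times and would probably protect regions such as the face):

```python
import cv2
import numpy as np

def remove_vertical_seam(img):
    """Remove the single lowest-energy vertical seam (one seam-carving step)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    # Energy = gradient magnitude; low energy marks "uninteresting" pixels.
    energy = (np.abs(cv2.Sobel(gray, cv2.CV_64F, 1, 0)) +
              np.abs(cv2.Sobel(gray, cv2.CV_64F, 0, 1)))
    h, w = energy.shape
    cost = energy.copy()
    # Dynamic programming: cumulative minimal cost from top to bottom.
    for i in range(1, h):
        left = np.roll(cost[i - 1], 1);   left[0] = np.inf
        right = np.roll(cost[i - 1], -1); right[-1] = np.inf
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack the cheapest seam, then drop one pixel per row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = np.argmin(cost[-1])
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + np.argmin(cost[i, lo:hi])
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, 3)
```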
I assume that even detecting body parts in photos is already too hard a challenge for algorithms, unless the photos are all very similar (e.g. same background, same pose, etc.)
I once played around with developing algorithms for skin smoothing. I was able to detect skin areas pretty well by converting colors to the LAB space and selecting pixels similar to skin sample colors learned by a support vector machine from various sample images. Once you have that, you could run something like a liquify-contract algorithm for slimming (see the rough sketch below).
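A rough sketch of that LAB-space skin detection (the training arrays here are hypothetical, and per-pixel SVM prediction is slow; this only illustrates the idea):

```python
import cv2
import numpy as np
from sklearn import svm

# Hypothetical training data: LAB pixel triples labelled skin (1) / not skin (0),
# prepared offline from hand-marked sample images.
X_train = np.load('lab_skin_samples.npy')   # shape (n, 3)
y_train = np.load('lab_skin_labels.npy')    # shape (n,)

clf = svm.SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)

def skin_mask(bgr_img):
    """Classify every pixel in LAB space as skin / not skin."""
    lab = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2LAB)
    pixels = lab.reshape(-1, 3).astype(np.float64)
    pred = clf.predict(pixels)
    return (pred.reshape(bgr_img.shape[:2]) * 255).astype(np.uint8)
```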
I wouldn't expect satisfying results though unless you spend huge amounts of time on this.

iPhone UIImage number recognition

I have a small UIImage (jpg) with a single typed number. I want to be able to read the number with some kind of pattern recognition. I'm really not sure where to start, so any help would be appreciated.
My initial idea was to compare this image with other images: for instance, compare the image against images of a 1, 2, 3, etc. until a match is found. That just seems slow and cumbersome, and I wondered if there was a better way to do it.
Thanks
Update: I'm trying to convert sudoku puzzles from newspaper print into interactive puzzles.
No, you are right, it will be slow and cumbersome. But on the plus side, you don't have to write it yourself:
http://sourceforge.net/projects/opencvlibrary/
Still not exactly easy though, and I'm not sure about licensing, so… you don't mention why you need to do this (it sounds a little odd).
Maybe you can avoid it? If you know the images are the numerical digits 0-9, is there another way to track which one a particular image is, apart from the way its pixels are arranged?
Sorry if that sounds like I'm missing the point… Maybe you could fill in a few more details?
I read this really good write-up about this exact problem here: http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html
It doesn't have any code samples, but explains the concepts, and might be able to point you in the right direction.
The following tutorial may be right down your alley:
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
It is a simple tutorial on doing number recognition and comes with source code as well.
Additionally, you may want to search for an OCR SDK (optical character recognition software development kit). You will surely find a stack of them. Commercial ones are pricey, though.
I would go for a "roll your own" approach along the lines of the OpenCV tutorial, especially since you are only interested in numbers; a rough sketch of the template-comparison idea follows below.
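For example, a tiny "roll your own" classifier by template comparison (the reference images digit_0.png through digit_9.png are hypothetical, and this assumes the digits always use the same printed font, as in a newspaper sudoku):

```python
import cv2

# Hypothetical reference images, one per digit, in the puzzle's font.
templates = {d: cv2.imread(f'digit_{d}.png', cv2.IMREAD_GRAYSCALE)
             for d in range(10)}

def classify_digit(cell):
    """Return the digit whose template best matches a grayscale cell image."""
    best, best_score = None, -1.0
    for d, tmpl in templates.items():
        resized = cv2.resize(cell, (tmpl.shape[1], tmpl.shape[0]))
        # Normalized cross-correlation: 1.0 would be a perfect match.
        score = cv2.matchTemplate(resized, tmpl, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best, best_score = d, score
    return best
```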
All the best :-)