I have a small UIImage (jpg) with a single typed number. I want to be able to read the number with some kind of pattern recognition. I'm really not sure where to start, so any help would be appreciated.
My initial idea was to compare this image with other images: for instance, compare it with images of a 1, 2, 3, etc. until a match is found. That seems slow and cumbersome, and I wondered if there is a better way to do it.
Thanks
Update - I'm trying to convert sudoku puzzles from newspaper print to interactive puzzles
No, you are right, it will be slow and cumbersome. But on the plus side you don't have to write it yourself
http://sourceforge.net/projects/opencvlibrary/
Still not exactly easy though, and I'm not sure about licensing, so… you don't mention why you need to do this (sounds a little odd).
Maybe you can avoid it? If you know the images are numerical digits 0-9, is there another way to track which one a particular image is, apart from the way its pixels are arranged?
Sorry if that sounds like I'm missing the point… Maybe you could fill in a few more details?
I read this really good write-up about this exact problem here: http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html
It doesn't have any code samples, but explains the concepts, and might be able to point you in the right direction.
The following tutorial may be right down your alley:
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
It is a simple tutorial on doing number recognition and comes with the source code also.
Additionally, you may want to search for an OCR SDK (Optical Character Recognition Software Development Kit). You will surely find a stack of them. Commercial ones are pricey though.
I would go for a "roll your own" approach along the lines of the OpenCV tutorial, especially since you are only interested in numbers.
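To make the "roll your own" idea concrete, here is a minimal sketch of the template comparison described in the question: classify a binarized digit by finding the stored template that differs in the fewest pixels. The 3x5 templates and the function names are made up for illustration; a real sudoku reader would use larger, properly cropped and scaled images, and the linked tutorial uses its own approach rather than this one.

```python
def hamming_distance(a, b):
    """Count the pixels that differ between two equally sized binary images."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify_digit(image, templates):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates,
               key=lambda label: hamming_distance(image, templates[label]))


# Hypothetical 3x5 templates (1 = ink, 0 = paper), for illustration only.
TEMPLATES = {
    0: [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]],
    1: [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]],
    7: [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

# A "7" with one pixel lost to noise in the bottom row.
noisy_seven = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 0, 0]]
print(classify_digit(noisy_seven, TEMPLATES))
```

With only ten printed digits to tell apart, this kind of comparison is cheap; the harder part is usually locating and cropping each cell so that the digit lines up with the templates.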
All of the best ':-)
I'm building a solution to fit a number of objects most efficiently into a box. I hope to implement more efficient algorithms soon, but to start out with I'm going to use the brute force method, checking every possible position. This is fine for now since the box is small and holds very few items. Later, the complexity will grow.
I'm using Unity to allow the user to see how the items ultimately fit in the box. My initial thought was to also use Unity's physics and collision detection to implement the best fit algorithm; but, with a huge potential number of locations and positions to check, is this a bad approach? Am I much better off running my algorithm in a data structure instead? A 10x10x10 box with even three 1x1x1 objects has almost a billion possible arrangements (1000 cells for each object, so roughly 1000^3)...
I'm new to Unity so any advice is welcome; thanks!
Update: right, so this problem is definitely in the bin-packing set of problems, which I know is NP-hard. I'm assuming a rectangular box, filled with rectangular box-shaped items of random dimensions.
My question is: given my particular algorithm, when we ask, "is there currently something in this x,y,z space?" would it be more efficient to figure that out via code, or to use Unity objects with collision detection?
Based on the answers I've seen, I can see using Unity would be profoundly inefficient.
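For comparison, here is a rough sketch of what "figuring it out via code" could look like: a plain occupancy structure that records which unit cells of the box are taken. The class and method names are hypothetical, and it assumes integer, axis-aligned item positions and dimensions, which the question does not guarantee.

```python
class OccupancyGrid:
    """Tracks which unit cells of a width x height x depth box are occupied."""

    def __init__(self, width, height, depth):
        self.size = (width, height, depth)
        self.cells = set()  # occupied (x, y, z) unit cells

    @staticmethod
    def _cells_of(origin, dims):
        """List every unit cell covered by an item at `origin` with size `dims`."""
        ox, oy, oz = origin
        dx, dy, dz = dims
        return [(x, y, z)
                for x in range(ox, ox + dx)
                for y in range(oy, oy + dy)
                for z in range(oz, oz + dz)]

    def fits(self, origin, dims):
        """True if the item lies inside the box and overlaps nothing placed so far."""
        inside = all(0 <= o and o + d <= s
                     for o, d, s in zip(origin, dims, self.size))
        return inside and not any(c in self.cells
                                  for c in self._cells_of(origin, dims))

    def place(self, origin, dims):
        """Mark the item's cells as occupied, refusing overlapping placements."""
        if not self.fits(origin, dims):
            raise ValueError("item does not fit at this position")
        self.cells.update(self._cells_of(origin, dims))


grid = OccupancyGrid(10, 10, 10)
grid.place((0, 0, 0), (2, 3, 1))
print(grid.fits((1, 2, 0), (1, 1, 1)))  # overlaps the placed item
print(grid.fits((2, 0, 0), (1, 1, 1)))  # free cell next to it
```

A brute-force search would call `fits` for every candidate position, so the cost per query matters far more than how the final result is drawn; the chosen layout can still be handed to Unity for display afterwards.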
If you LITERALLY want to know:
"is there currently something in this x,y,z space?"
the best possible way to do that, is to simply use Unity's engine. So, you trivially check the AABB to see if a point is inside it (or perhaps just check for intersection). You can use one of many
I understand that the question "is there currently something in this x,y,z space?" is or could be one important part of whatever solution you are planning. And indeed the best way to do that is to let Unity's engine do that. It's absolutely impossible you or I could write anything as efficient -- to begin with it comes right off the quaternion cloud in the GPU.
That is the actual answer to what you have now stated is your specific question.
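For readers unfamiliar with the term, the AABB (axis-aligned bounding box) checks mentioned above boil down to a few comparisons per axis. The sketch below is plain Python for illustration, not Unity code; in Unity itself the `Bounds` struct offers `Contains` and `Intersects` for the same purpose, as far as I know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    """Axis-aligned box described by its minimum and maximum corners."""
    min_corner: tuple
    max_corner: tuple

    def contains(self, point):
        """True if the point lies inside the box or on its surface."""
        return all(lo <= p <= hi
                   for lo, p, hi in zip(self.min_corner, point, self.max_corner))

    def intersects(self, other):
        """True if the boxes overlap with non-zero volume (touching faces do not count)."""
        return all(a_lo < b_hi and b_lo < a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.min_corner,
                                                     self.max_corner,
                                                     other.min_corner,
                                                     other.max_corner))


a = AABB((0, 0, 0), (2, 2, 2))
b = AABB((1, 1, 1), (3, 3, 3))
c = AABB((2, 0, 0), (4, 2, 2))
print(a.contains((1, 1, 1)), a.intersects(b), a.intersects(c))
```

Treating touching faces as non-overlapping is a choice made here because packed items are expected to sit flush against each other.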
Now regarding the more general issue, which I first fully explained when that was the question you were asking :)
Here are some of my thoughts on trivial "box packing" algorithms in 2D, at the level useful in video games.
https://stackoverflow.com/a/35228592/294884
Regarding 3D "box packing" it's absolutely impossible to offer any guidance unless you include a screen shot of what you are trying to do and fully explain the shapes and constraints involved.
If you are a mathematician looking for the latest in algorithmic thinking on the matter, just google something like "3d box packing algorithm"
example , example
Again, readers here have utterly no clue what shapes/etc you are dealing with, so please click Edit and explain!
Note too that sphere packing is a really fascinating scientific problem, if that's what you are talking about:
https://en.wikipedia.org/wiki/Close-packing_of_equal_spheres
Hi, I have been searching through research papers for features that would be good to use in my handwritten OCR classifying neural network. I am a beginner, so I have just been taking the image of the handwritten character, making a bounding box around it, and then resizing it into a 15x20 binary image. This means I have an input layer of 300 features. In the papers I have found on Google (most of which are quite old) the methods really vary. My accuracy is not bad with just a binary grid of the image, but I was wondering if anyone had other features I could use to boost my accuracy, or could even just point me in the right direction. I would really appreciate it!
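For reference, the preprocessing described above (bounding box, then resize to a 15x20 binary grid, giving 300 inputs) could be sketched roughly as follows. The function names are hypothetical and the resize uses simple nearest-neighbour sampling, which may differ from what the poster actually does.

```python
def bounding_box(image):
    """Return (left, top, right, bottom) of the inked pixels, inclusive."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not rows:
        raise ValueError("image contains no ink")
    return min(cols), min(rows), max(cols), max(rows)


def to_features(image, out_w=15, out_h=20):
    """Crop to the bounding box and resample to out_w x out_h binary features."""
    left, top, right, bottom = bounding_box(image)
    src_w, src_h = right - left + 1, bottom - top + 1
    features = []
    for y in range(out_h):
        src_y = top + y * src_h // out_h      # nearest-neighbour row
        for x in range(out_w):
            src_x = left + x * src_w // out_w  # nearest-neighbour column
            features.append(1 if image[src_y][src_x] else 0)
    return features


# A small "L" shape drawn on an 8x8 canvas (1 = ink).
canvas = [[0] * 8 for _ in range(8)]
for y in range(1, 7):
    canvas[y][2] = 1
for x in range(2, 6):
    canvas[6][x] = 1

features = to_features(canvas)
print(len(features))
```

Note that stretching every character to fill the whole 15x20 grid discards its aspect ratio, so a narrow "1" and a wide "0" end up the same width; keeping the aspect ratio as an extra input is one cheap feature to try.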
Thanks,
Zach
I haven't read any actual papers on this topic, but my advice would be to get creative. Use anything you could think of that might help the classifier identify numbers.
My first thought would be to try and identify "lines" in the image, maybe via a modified "sliding window" algorithm (sliding/rotating line?), or to try and identify a "line of best fit" to the image (to help the classifier respond to changes in italicism or writing style). Really though, if you're using a neural network, it should be picking up on these sorts of things without your manual help (that's the whole point of them!)
I would focus first on the structure and topology of your net to try and improve performance, and worry about additional features only if you cannot get satisfactory performance some other way. You could also try improving the features you already have: make sure the character is centered in the image, or try an algorithm that skews italicised characters to make them vertical.
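One common way to straighten slanted characters is to estimate the slant from image moments and shift each row sideways to cancel it. The sketch below assumes a binary image stored as a list of rows and is only one possible approach; the name `deskew` is my own, and the simple forward mapping can leave small gaps in thick strokes.

```python
def deskew(image):
    """Shear a binary image horizontally so the character's main axis is vertical."""
    height, width = len(image), len(image[0])
    ink = [(x, y) for y in range(height) for x in range(width) if image[y][x]]
    if not ink:
        return [row[:] for row in image]

    # Centre of mass of the inked pixels.
    cx = sum(x for x, _ in ink) / len(ink)
    cy = sum(y for _, y in ink) / len(ink)

    # Second-order central moments: mu11 / mu02 estimates how far x drifts per row.
    mu11 = sum((x - cx) * (y - cy) for x, y in ink)
    mu02 = sum((y - cy) ** 2 for _, y in ink)
    skew = mu11 / mu02 if mu02 else 0.0

    result = [[0] * width for _ in range(height)]
    for x, y in ink:
        new_x = round(x - skew * (y - cy))  # undo the drift for this row
        if 0 <= new_x < width:
            result[y][new_x] = 1
    return result


# A stroke slanted at 45 degrees becomes a vertical bar.
slanted = [[1 if x == y else 0 for x in range(5)] for y in range(5)]
for row in deskew(slanted):
    print(row)
```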
In my experience these sorts of things don't often help, but you could get lucky and run into one that improves your net :)
For my computer vision class, I'm going to be doing a project where I extract information about a hallway based on an image of that hallway. In particular, the lines of the hallway which extend toward a vanishing point will be of interest. My question is whether I should use Matlab, OpenCV, or something else to implement this.
I don't have a ton of time for this project. This makes Matlab seem like a good option, since it seems you can usually get things up and running quickly there. On the other hand, I hope to take what I do for this class project and extend it further for research once the class is complete. This makes OpenCV seem better as (from what I've read) it's much more efficient. Another possibility would be to implement it in Matlab for the project and then port that code to OpenCV later. It should be noted that I have plenty of experience with C/C++, but only a little with both Matlab and OpenCV.
At the moment, I'm leaning toward just using OpenCV from the start. However, I would like the opinion of someone who's had a bit more experience here than myself. If you'd recommend something over both OpenCV and Matlab, please say so. Also, if you have any tips on what packages or toolkits might be useful for such a project, they would be greatly appreciated.
Any suggestions? Thanks for your time!
With which one is it easier for you to write a piece of code that reads an image file and displays it?
If you know C++ very well, then it should be easy to debug the code. Since you say you have little experience with Matlab, debugging even a small mistake there can take a long time.
So I suggest breaking the problem down into:
read image and display it, this is very easy in both
detect edges using a simple/classic method, this is super easy in both, display the result and visually check it's correctly done
use a robust line fitting method; RANSAC and the Hough transform are probably what you're going to use. OpenCV makes using them easier than you might guess. Matlab also has built-in functions to detect lines using the Hough transform, and gives you the start/end points of each segment. But if you're finding a vanishing point, you shouldn't need those.
The decision is yours. This is not a very difficult problem, and you can find loads of help on the web. Good luck with the project, and please let us know how it goes.
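Once lines have been detected, estimating a vanishing point only needs the infinite lines, not the segment endpoints as such. As a rough illustration (the helper names and the sample coordinates are invented), two lines in homogeneous form can be intersected with a cross product:

```python
def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersection(l1, l2):
    """Intersection point of two homogeneous lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1  # third component of the cross product l1 x l2
    if w == 0:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


# Two hypothetical floor edges of a hallway, in image coordinates.
left_edge = line_through((0, 100), (40, 60))
right_edge = line_through((200, 100), (160, 60))
print(intersection(left_edge, right_edge))
```

With real, noisy detections there will be many lines that do not meet in a single point, so in practice the estimate would come from a robust fit (for example RANSAC over pairwise intersections) rather than from one pair.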
Very odd question, I know, but this is a problem a potential client handed me today.
We assume we have a full length photo of a person. We want to generate a thinner image of that user. Obviously, one way would just be to compress the width of the image but that would result in various distortions that wouldn't be realistic.
I'd like to keep this an open-source implementation so if anybody knows of a library that can identify certain parts of the body and slim each in a way that is most realistic, I'd like to know.
This is obviously something that could be done by hand but we need a solution that works without user interaction.
You should look into seam-carving algorithms. The algorithm is very simple to implement and has many implementations online. ImageMagick seems to have it too, under the name "Liquid Rescale".
I assume that even detecting body parts in photos is too hard a challenge for algorithms, unless the photos are all very similar (e.g. same background, same pose, etc.)
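To give a feel for how seam carving works, here is a minimal sketch on a grayscale image stored as a list of rows: compute a gradient-based energy, find the connected top-to-bottom path of lowest total energy with dynamic programming, and delete it. The function names are my own, the energy function is a simple stand-in, and this is not how ImageMagick implements it.

```python
def energy(gray):
    """Simple gradient-magnitude energy, with borders clamped to the nearest pixel."""
    h, w = len(gray), len(gray[0])

    def at(y, x):
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[abs(at(y, x + 1) - at(y, x - 1)) + abs(at(y + 1, x) - at(y - 1, x))
             for x in range(w)] for y in range(h)]


def find_vertical_seam(gray):
    """Return one column index per row describing the lowest-energy vertical seam."""
    e = energy(gray)
    h, w = len(e), len(e[0])
    cost = [e[0][:]]
    for y in range(1, h):
        prev = cost[-1]
        # Each pixel extends the cheapest of its (up to) three upper neighbours.
        cost.append([e[y][x] + min(prev[max(x - 1, 0):min(x + 2, w)])
                     for x in range(w)])

    # Backtrack from the cheapest pixel in the bottom row.
    x = min(range(w), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = min(range(max(x - 1, 0), min(x + 2, w)), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(gray):
    """Return a copy of the image that is one pixel narrower."""
    seam = find_vertical_seam(gray)
    return [row[:x] + row[x + 1:] for row, x in zip(gray, seam)]


# A flat region next to a sharp edge: the seam is taken from the flat part.
picture = [[0, 0, 0, 9, 9] for _ in range(4)]
print(remove_vertical_seam(picture))
```

Because seams follow low-energy regions, plain seam carving tends to remove background before it touches the person, so slimming a body would likely need the energy map to be weighted towards the body area.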
I have once played around developing algorithms for skin smoothing. I was able to detect skin areas pretty well by converting colors to the LAB space and selecting pixels similar to skin sample colors learnt with a support vector machine from various sample images. Once you have that, you could run something like a liquify-contract algorithm for slimming.
I wouldn't expect satisfying results though unless you spend huge amounts of time on this.
I was looking at some study I have to do in the future on procedural generation techniques, and I was wondering what type of content you have:
Developed
Helped Develop
Seen implemented
Tried to develop
and what methods/techniques/procedures you used to develop it.
If you feel generous, maybe you can even go into specifics, such as the data structures and algorithms you used to develop it.
If this needs to be made community wiki because I am not asking for a problem to be solved, just let me know.
This is not a homework thread because it is a research unit that I'm not taking yet ;)
Introversion Software, the makers of the games Defcon, Uplink and Darwinia (among others), started working about a year ago on a game which extensively uses PCG for city generation. Here is a video of their work, and you can read more about it in the development diary of the game (start from the first part at the bottom of the page!).
This immediately got me extremely interested, and seeing the potential for games I immediately started researching the technology. I have amassed a folder of 18 PDFs about the subject (research papers, SIGGRAPH presentations, etc). Here, I uploaded it for you.
The main approach is to use L-Systems; however, I never got around to understanding enough of that to make something out of it. I tried other, less successful approaches, like using Voronoi diagrams, recursively splitting a rectangular area into smaller areas and shifting the boundaries a little to obtain a bit of randomness, and polygon division.
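For anyone else stuck on L-Systems: the core idea is just repeated string rewriting, where every symbol is replaced in parallel according to a rule table. The sketch below uses Lindenmayer's classic two-symbol "algae" rules as a stand-in; the systems used for road networks are far more elaborate, and the function name is my own.

```python
def expand(axiom, rules, iterations):
    """Rewrite every symbol in parallel, `iterations` times; unknown symbols are kept."""
    state = axiom
    for _ in range(iterations):
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


# Lindenmayer's algae system: A -> AB, B -> A.
ALGAE_RULES = {"A": "AB", "B": "A"}
for n in range(4):
    print(n, expand("A", ALGAE_RULES, n))
```

To get geometry out of this, the resulting string is usually interpreted as drawing commands (move forward, turn, branch), which is where most of the real work lies.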
The last method I had gotten from Mike's Code Blog's posts (here and here). The screenshots shown on his blog make me drool, it is my biggest programmer's dream to ever get something that looks like that. I emailed him to ask how he did it, and here is the relevant part of his reply, I'm sure he wouldn't mind me posting this here:
L-Systems is definitely one way to go, but that isn't what I'm doing. The basis of my method is polygon subdivision. I start with a simple polygon that represents the entire area of the city. Then, I split it (roughly) in half, and then split those two polygons, etc. until I get down to city-block size. At that point, the edges of all my polygons represent roads. I then use the same subdivision method to break the blocks down into building-size lots.
The devil is in the details, of course, but that is the basic method.
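As a very rough illustration of the subdivision idea in the quote, the sketch below splits axis-aligned rectangles instead of general polygons, which is a deliberate simplification and certainly not Mike's actual code. The function name, the 40-60% split range, and the sizes are all assumptions.

```python
import random


def subdivide(rect, max_size, rng):
    """Recursively split (x, y, width, height) until both sides are <= max_size."""
    x, y, w, h = rect
    if w <= max_size and h <= max_size:
        return [rect]

    # Cut the longer side roughly (not exactly) in half for some irregularity.
    if w >= h:
        cut = w * rng.uniform(0.4, 0.6)
        parts = [(x, y, cut, h), (x + cut, y, w - cut, h)]
    else:
        cut = h * rng.uniform(0.4, 0.6)
        parts = [(x, y, w, cut), (x, y + cut, w, h - cut)]

    return [block for part in parts for block in subdivide(part, max_size, rng)]


# Split a 100 x 60 "city" into blocks no larger than 20 units on a side.
blocks = subdivide((0, 0, 100, 60), 20, random.Random(42))
print(len(blocks), "blocks")
```

The shared edges between blocks play the role of roads; running the same function again on each block with a smaller `max_size` would give building-size lots, as described in the quote.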
I for one still haven't managed to fully implement a solution I'm satisfied with, but achieving something like this remains one of my biggest programming dreams, if not the single biggest.
Here are a few of the leaders in procedurally generated terrain (and to a lesser extent foliage). If you don't get a detailed answer here regarding methods and techniques, you might want to look in / ask in their forums. I have seen some discussions of techniques there.
TerraGen 2
World Builder
World Machine
Natural Graphics
No one mentioned the demoscene, which uses ONLY procedural stuff?
So, go search for Werkkzeug, Kkrieger, MilkyTracker to start. Also you can visit the site pouet and see the wonder of well done procedural videos (yes, procedural videoclips! With music and graphics, all procedural!)
Allegorithmic's products are used in actual shipping titles. These guys focus on texture generation (both offline and at runtime).
They have some very pretty screenshots and demos.