I have been trying to develop an OCR engine by myself. After researching the topic a bit, I have come to the conclusion that there are 4 major steps involved:
Pre-processing the image [de-skewing, contrast adjustment, binarization, etc.]
Segmenting the image into characters [to make it easier to process each character individually]
Identifying each character through feature extraction/comparison and classification
Post-processing [to increase the chances of getting an optimal result]
I am hopelessly lost after the 1st step! Can somebody please help me out by telling me how to perform character segmentation and feature extraction? I'll be extremely grateful even if you can just provide a link that points me in the right direction.
Thanks in advance! :)
There is a paper called Self-Tuning Spectral Clustering by Zelnik-Manor and Perona. Here is a link to their page with the paper and MATLAB code:
Self-Tuning Spectral Clustering
This method can perform image segmentation. Another thing you may want to look into is topic-modeling on images for feature extraction. Anything by Blei will also be useful.
The Computer Vision System Toolbox now has the ocr function that can save you the trouble.
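If you do have that toolbox installed, a minimal sketch might look like the following; the file name is just a placeholder, and imbinarize requires a reasonably recent release (older ones use im2bw):

    % Minimal OCR sketch using the Computer Vision System Toolbox.
    % 'scan.png' is a placeholder file name.
    I = imread('scan.png');
    if ndims(I) == 3
        I = rgb2gray(I);       % work on a grayscale image
    end
    bw = imbinarize(I);        % simple global binarization (pre-processing)
    results = ocr(bw);         % segmentation + recognition in one call
    disp(results.Text);        % the recognized text
    % Word-level results are available in results.Words and results.WordBoundingBoxes.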
I recently took a course on neural networks and decided to do some research work. What I have in mind is designing a network that recognizes the movement of the lips, commonly known as lip reading.
I know the theory behind neural networks and have chosen to design a convolutional neural network, but I am having trouble working out how to extract the features from the video, or sequence of images, that will serve as input to the network I plan to design.
Before committing to the full investigation, I would like some help with concepts or ideas on how to do this, mainly the feature extraction part.
What I have thought in general is the following:
A vowel or syllable lasts approximately 1 to 2 seconds in video. From that video I have to extract a sequence of images showing how the lips move. Assuming I select about 10 or 15 images, I suppose all of those images, after being processed, should be my "input" from which to get the features.
So far I have only analyzed a single image, as in the classic "recognize a letter" example, but, as I said, here I will have a sequence of images to analyze, and that confuses me a bit.
I would like to know if I'm on the right track with this idea and, if not, I would like you to guide me. I hope I have been clear; thank you very much.
This paper should help you decide how to handle the sequence of frames as input to a neural network. It looks like you can concatenate (combine) all of the frames for a particular sound into one image and feed that into your net for training and evaluation.
http://cs231n.stanford.edu/reports/2016/pdfs/217_Report.pdf
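A rough MATLAB sketch of that concatenation idea could look like this; the file name, the 10-frame count and the 64x64 size are assumed values, and the clip is assumed to be already cropped to the mouth region:

    % Sketch: turn a short clip into one concatenated image for the CNN.
    v = VideoReader('syllable.mp4');        % placeholder file name
    frames = {};
    while hasFrame(v)
        frames{end+1} = readFrame(v);       % collect all frames of the clip
    end
    nKeep = 10;                             % number of frames to keep per example
    idx = round(linspace(1, numel(frames), nKeep));
    tiles = cell(1, nKeep);
    for k = 1:nKeep
        g = rgb2gray(frames{idx(k)});       % grayscale frame
        tiles{k} = imresize(g, [64 64]);    % normalize the size
    end
    inputImage = cat(2, tiles{:});          % 64 x (64*nKeep) image = one training example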
I'm implementing a character recognition system with a Hidden Markov Model (HMM). I have used skeletonization to extract features from the images, and I plan to use an HMM to train on them.
My question is: how can I give those features to the HMM? I have read that I should save the features to a file and then feed that file to the HMM.
Can someone please help me? I have been stuck here for two months and still haven't found a solution.
Appreciate your help a lot.
I was passing by and just saw this question. Maybe you have already found an answer elsewhere, since your question is almost a month old.
You give the features to the HMM by clustering your data; you can use k-means, or you can use fixed-length windows. If you use k-means you obtain the cluster centers, and you can quantize each feature vector to its nearest center to obtain the observation symbols. After that, use cross-validation to check that the model really learns the features you labeled. K-means also gives you the states and the initial transition probabilities.
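A rough MATLAB sketch of that pipeline, assuming the skeleton features of one character form an N-by-D matrix called features (one row per position along the character) and allFeatures stacks the rows from every training image; kmeans, pdist2, hmmtrain and hmmdecode are in the Statistics Toolbox, and the codebook and state counts are arbitrary choices:

    % Sketch: quantize continuous feature vectors into discrete symbols for an HMM.
    K = 16;                                    % assumed codebook size (number of symbols)
    [~, centers] = kmeans(allFeatures, K);     % codebook built from all training features

    % Turn one image's feature sequence into a symbol sequence (nearest center).
    D = pdist2(features, centers);             % N-by-K distances
    [~, symbols] = min(D, [], 2);              % N-by-1 symbols in 1..K

    % Train a discrete HMM on such symbol sequences (one model per character class).
    numStates = 5;                             % assumed number of hidden states
    transGuess = rand(numStates);    transGuess = transGuess ./ sum(transGuess, 2);
    emisGuess  = rand(numStates, K); emisGuess  = emisGuess  ./ sum(emisGuess, 2);
    [transEst, emisEst] = hmmtrain(symbols', transGuess, emisGuess);

    % At recognition time, score a new symbol sequence against each character's model.
    [~, logLik] = hmmdecode(symbols', transEst, emisEst);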
Hope this helps you
I have to use MATLAB to find a certain letter in a TIFF text image, in the spatial domain. I have no idea how to do this, and I can't find any documentation other than complex code that uses loops, and loops are forbidden. Yes, this is an assignment; I don't want the answer, just some direction on how to even start.
I want to use imfilter with a letter as the template (filter) and correlation, but from there I have no idea where to go, and I don't even know what questions to ask to find more information on MATLAB's site.
The write-up makes it seem simple, but I am a beginner and know nothing about this subject, so to me it is hard.
Thanks
If you have the Image Processing Toolbox, I would suggest using the function normxcorr2. It calculates the normalized cross-correlation between a template image and a larger image, which I think is what you want.
You don't need any for loops to use it, though the method itself probably uses for loops hidden somewhere in its implementation. I don't know if that counts.
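If you can use it, the whole thing takes only a few lines. This sketch follows the pattern used in the normxcorr2 documentation; the file names are placeholders and the images are assumed to be grayscale:

    % Sketch: locate a letter template in a text image, no explicit loops.
    page     = im2double(imread('page.tif'));      % placeholder file names
    template = im2double(imread('letterA.tif'));

    c = normxcorr2(template, page);                % normalized cross-correlation surface
    [ypeak, xpeak] = find(c == max(c(:)));         % position of the best match

    % Convert the peak position back to the top-left corner of the match in 'page'.
    yTop = ypeak - size(template, 1) + 1;
    xTop = xpeak - size(template, 2) + 1;

    imshow(page); hold on;
    rectangle('Position', [xTop, yTop, size(template, 2), size(template, 1)], ...
              'EdgeColor', 'r');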
How do you train a neural network for pattern recognition? For example, for face recognition in a picture, how would you define the output neurons (e.g. how do you detect where exactly the face is, rather than just saying that there is a face in the frame)? Also, what about detecting multiple faces and faces of different sizes?
If anyone could give me a pointer it would be really great.
Cheers!
Generally speaking, I would split the problem into multiple stages, e.g.:
1 - Is there a face in the picture?
2 - Where is the face in the picture?
3 - Is the face in the picture one that the NN (Neural network) recognises?
In each instance I would suggest you build a separate NN and train it to answer the questions posed.
As for the structure of the NN, that's a bit trickier to answer, as it depends on your input data and desired output. For example, if you had a 100x100 px image then I suppose it's feasible to have 10,000 inputs. You might want to consider doing some preprocessing beforehand to, say, detect ovals; that way you could check whether there are a number of ovals in a predictable arrangement (one for the face, two for the eyes, and possibly one for the mouth). If you are preprocessing the data, then you might have inputs for each oval.
Now for the output: for question one you could just have one output that says how sure the NN is that there is a face in the input data, i.e. a value from 0.0 (definitely no face) to 1.0 (definitely a face). This way you can move on to stages 2 and 3.
I should say at this point that this is a non-trivial problem and you might be better off having a look at some of the frameworks available, e.g. OpenCV.
Now for the training part: you need a stockpile of images available to train the NN. There are a number of ways in which you could train it; one potential solution is to use a technique called backpropagation. In general terms, you run the NN on an image and compare its output to a predetermined target; if it is wrong, you tweak the NN towards the desired output and repeat.
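To make stage 1 concrete, a minimal sketch with the Neural Network Toolbox could look like this; patternnet takes care of the backpropagation-style training. The variable names and sizes are assumptions: each column of X is a flattened (say 100x100) grayscale image, and T is 1 for "face" and 0 for "no face":

    % Sketch: stage 1 ("is there a face?") as a binary classifier.
    % X: 10000-by-numSamples, each column a flattened 100x100 image (assumed).
    % T: 1-by-numSamples, 1 = face, 0 = no face (assumed).
    net = patternnet(20);        % one hidden layer of 20 neurons (arbitrary choice)
    net = train(net, X, T);      % gradient-based (backpropagation) training
    y = net(X);                  % outputs in [0, 1]: how sure the NN is
    isFace = y > 0.5;            % simple decision threshold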
If you want a good book on AI, then I would highly recommend Artificial Intelligence: A Modern Approach by Russell and Norvig. I'm sure there are more appropriate computer vision textbooks, but the Russell & Norvig book is an excellent starting point.
Dear GantengX, you should prepare yourself for the fact that the answer is large, complex, and hard to understand. There are many approaches to pattern and face recognition, and implementing a real-life face recognition system is a huge amount of work that one person can never handle alone. Prepare yourself for at least 10 years of life behind books on mathematics and artificial intelligence, not to mention hiring 5 highly paid developers at the end who will understand what you want them to do. Maybe then you will end up having your own face recognition system. There are also dozens of other issues that will jump out during the process, so be ready for a life full of stress and problems.
I'm sorry for stating obvious things, but your question was not specific; a complete answer would touch many different scientific fields and would amount to a book of over 1,000 pages.
Regarding your question (the short answer).
There are several principal parts that each face recognition app consists of:
Artificial intelligence algorithm
Optimization algorithm (for AI optimization)
Different filtration algorithms
Effective data set development
Items 1 and 2 are the central part of each system; they do the actual work. Any other preprocessing just makes the input data less complex, making it easier for your AI to make a decision. Don't start on 3 and 4 until you have your first results.
P.S.
Using existing solutions is more cost-effective, but if you are studying these things then don't lose time like I did, and start your dissertation right away.
I'm quite new to this topic, so any help would be great. What I need is to optimize a neural network in MATLAB using a GA. My network has a [2x98] input and a [1x98] target. I've tried consulting the MATLAB help, but I'm still kind of clueless about what to do :( so any help would be appreciated. Thanks in advance.
Edit: I guess I didn't say what there is to optimize, as Dan pointed out in the first answer. I guess the most important thing is the number of hidden neurons, and maybe the number of hidden layers and training parameters such as the number of epochs. Sorry for not providing enough info; I'm still learning about this.
If this is a homework assignment, do whatever you were taught in class.
Otherwise, ditch the MLP entirely. Support vector regression ( http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ) is much more reliably trainable across a broad swath of problems, and pretty much never runs into the stuck-in-a-local-minimum problem often hit with a backpropagation-trained MLP, which forces you to solve a network topology optimization problem just to find a network that will actually train.
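The libsvm package linked above ships its own MATLAB interface; if you prefer to stay with built-in functions, fitrsvm from the Statistics and Machine Learning Toolbox also does support vector regression. A minimal sketch for the question's 2x98 input x and 1x98 target t (an illustration of the suggestion, not the answerer's code) might be:

    % Sketch: support vector regression instead of an MLP.
    % fitrsvm expects observations in rows, so transpose the 2x98 / 1x98 data.
    X = x';                                     % 98-by-2 predictor matrix
    Y = t';                                     % 98-by-1 response vector
    mdl = fitrsvm(X, Y, 'KernelFunction', 'gaussian', 'Standardize', true);
    yFit = predict(mdl, X);                     % predictions on the training data
    trainMSE = mean((yFit - Y).^2);             % training error (use CV in practice)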
Well, you need to be more specific about what you are trying to optimize. Is it the size of the hidden layer? Do you have a hidden layer? Is it parameter optimization (learning rate, kernel parameters)?
I assume you have a set of parameters (number of hidden layers, number of neurons per layer, ...) that needs to be tuned. Instead of brute-force searching all combinations to pick a good one, a GA can help you "jump" from one combination to another, so you can "explore" the search space for potential candidates.
A GA can also help in selecting "helpful" features. Some features might be redundant and you want to prune them; however, the data may have too many features to search for the best subset with approaches such as forward selection. Again, a GA can "jump" from one candidate subset to another.
You will need to find a way to encode the data (input parameters, features, ...) fed to the GA. For finding a set of input parameters or a good set of features, I think a binary encoding should work. In addition, choosing the operators the GA uses to produce offspring is important. The GA itself needs to be tuned too (e.g. early stopping, which can also be applied to the ANN).
Here are just some ideas; you might want to search for more info about GAs, feature selection, ANN pruning, and so on.
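To illustrate the binary encoding idea, a feature-selection run with MATLAB's ga solver could be sketched roughly as below; Xtrain/Ytrain and the least-squares fitness are placeholders standing in for your real data and model:

    % Sketch: GA feature selection with a binary (bitstring) encoding.
    % Each gene is 0/1: keep or drop the corresponding column of Xtrain (assumed data).
    nFeatures = size(Xtrain, 2);
    fitness = @(mask) featureSubsetError(mask, Xtrain, Ytrain);

    opts = optimoptions('ga', 'PopulationType', 'bitstring', 'MaxGenerations', 50);
    bestMask = ga(fitness, nFeatures, [], [], [], [], [], [], [], opts);

    function err = featureSubsetError(mask, X, Y)
        % Hypothetical fitness: error of a cheap least-squares fit on the
        % selected columns, standing in for whatever model you actually use.
        cols = find(mask);
        if isempty(cols), err = Inf; return; end
        B = X(:, cols) \ Y;                     % least-squares coefficients
        err = mean((X(:, cols) * B - Y).^2);    % mean squared residual
    end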
Since you're using MATLAB already I suggest you look into the Genetic Algorithms solver (known as GATool, part of the Global Optimization Toolbox) and the Neural Network Toolbox. Between those two you should be able to save quite a bit of figuring out.
You'll basically have to do 2 main tasks:
Come up with a representation (or encoding) for your candidate solutions
Code your fitness function (which basically tests candidate solutions) and pass it as a parameter to the GA solver.
If you need help coming up with a fitness function or an encoding of candidate solutions, then you'll have to be more specific.
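For the specific case in the question (tuning the number of hidden neurons of a fitting network with a [2x98] input and a [1x98] target), a hedged sketch of those two tasks might look like this; the bounds, generation count, and the simplistic fitness (no cross-validation) are all just illustrative choices:

    % Sketch: encode the candidate solution as one integer (the hidden layer size)
    % and let ga search over it. 'inputs' (2x98) and 'targets' (1x98) are the
    % question's data.
    fitness = @(n) hiddenSizeError(n, inputs, targets);

    lb = 1; ub = 50;                             % search hidden sizes from 1 to 50
    opts = optimoptions('ga', 'MaxGenerations', 30, 'Display', 'iter');
    bestN = ga(fitness, 1, [], [], [], [], lb, ub, [], 1, opts);  % "1" marks variable 1 as integer

    function mse = hiddenSizeError(n, x, t)
        % Fitness: train a small fitting network and return its mean squared error.
        net = fitnet(round(n));
        net.trainParam.showWindow = false;
        net = train(net, x, t);
        mse = perform(net, t, net(x));           % lower is better, so ga minimizes it
    end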
Hope it helps.
MATLAB has a simple but good explanation of this problem here; it explains both the ANN and the GA part.
For more info on using an ANN from the command line, see this.
There is also plenty of literature on the subject if you google it. It is, however, not specific to MATLAB, but covers the method and results in general.
Look up Matthew Settles on Google Scholar. He did some work in this area at the University of Idaho in the last 5-6 years. He should have citations relevant to your work.