Optical character recognition program for photographs - matlab

I need to develop an optical character recognition program in Matlab (or any other language that can do this) to extract the reading shown in this photograph.
The program must be able to handle as many picture files as possible, since I have around 40000 pictures that I need to work through.
The general aim of this task is to record intraday gas readings from the specific gas meter shown in the photograph. A webcam is currently set up to photograph the meter every minute, so the OCR program would help build a history of intraday gas readings.
Which software is best for this, and are there any online resources available?

I'd break down the basic recognition steps as follows:
1. Locate meter display within the image
2. Isolate and clean up the digits
3. Calculate features
4. Classify each digit using a model you've trained using historic examples
Assuming that the camera for a particular location does not move, step 1 will only need to be performed once. Step 2 will include things like enhancing contrast and filtering noise. Step 3 can include any useful calculations you can think of, such as mean and skew of "ink" (white) pixels. Step 4 would utilize a model you build to classify a single digit as '0', '1', ... '9', and could be accomplished using k-nearest neighbors, logistic regression, SVM, neural network, etc.
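Steps 3 and 4 could be sketched roughly as follows. This is Python rather than MATLAB, purely for illustration. The three features and the helper names `digit_features` and `knn_classify` are assumptions made for this sketch, not something from the question, and each digit is assumed to be already isolated as a small binary image (1 = "ink").

```python
import math
from collections import Counter


def digit_features(img):
    """Step 3: img is a list of rows of 0/1 values (1 = ink).

    Returns (ink fraction, normalized mean row, normalized mean column).
    These features are illustrative; real digits will need richer ones.
    """
    h, w = len(img), len(img[0])
    ink = [(r, c) for r in range(h) for c in range(w) if img[r][c]]
    n = len(ink)
    if n == 0:
        return (0.0, 0.5, 0.5)
    mean_r = sum(r for r, _ in ink) / n / max(h - 1, 1)
    mean_c = sum(c for _, c in ink) / n / max(w - 1, 1)
    return (n / (h * w), mean_r, mean_c)


def knn_classify(features, training, k=3):
    """Step 4: training is a list of (feature_tuple, label) pairs.

    Returns the majority label among the k nearest training examples.
    """
    nearest = sorted(training, key=lambda t: math.dist(features, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the training list would hold features from a few hundred hand-labelled digit crops, and k-NN could be swapped for any of the other classifiers mentioned above.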

A couple of things would make step 1 in Predictor's answer easy: placing the cam directly above the meter, adding sufficient light, and maybe placing bright pink strips around the meter to help segment out the display :).
Once you do this, and the cam remains fixed, you can use a manual process once and then have it applied to all subsequent images to segment out the digits. If the lighting is good and consistent, you might just be able to use simple template matching to identify each of the segmented digits.
Actually, once you get a sample of all the digits, you might even be able to classify them using something simpler (such as the pixel sum of each thresholded picture).
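A minimal sketch of the template-matching idea, in Python for illustration only. It assumes every segmented digit has been cropped to the same size as the templates, and that lighting is consistent enough for a fixed threshold; the names `threshold`, `match_score` and `classify_by_template` and the default level of 128 are assumptions for this example.

```python
def threshold(gray, level=128):
    """Turn a grayscale crop (rows of 0-255 values) into a binary image."""
    return [[1 if px >= level else 0 for px in row] for row in gray]


def match_score(a, b):
    """Fraction of pixels on which two same-sized binary images agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def classify_by_template(digit, templates):
    """templates maps a label such as '7' to a binary template image.

    Returns the label whose template agrees with the digit on most pixels.
    """
    return max(templates, key=lambda label: match_score(digit, templates[label]))
```

The templates themselves would come from the one-off manual step: crop one clean example of each digit and threshold it.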

Recently, many object detection methods have become available that can be used to deal with this problem.

Related

Classification of X-ray Image using machine learning

How can I classify X-ray image features with a machine learning algorithm, so that the next time I test an input by sending an individual's X-ray image features, it tells me whether or not this X-ray is present in the database? I have extracted the features of around 20 images using MATLAB.
If the X-rays you're matching are identical, you don't really need to use machine learning. Just do a pixel-wise match and check if the images are say 99% identical (to make up for illumination differences in scanning). In MATLAB, you can do this by simply taking the absolute pixel-wise difference of the two images, and then counting the number of pixels that are different by more than a pre-defined threshold.
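The pixel-wise comparison could look something like this (a Python sketch rather than MATLAB). It assumes both images are grayscale, the same size and already aligned; the per-pixel tolerance of 10 gray levels and the 1% limit are illustrative values, and the function names are made up for this example.

```python
def fraction_different(img_a, img_b, pixel_tol=10):
    """Fraction of pixels whose absolute difference exceeds pixel_tol."""
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same size")
    total = sum(len(row) for row in img_a)
    differing = sum(
        abs(pa - pb) > pixel_tol
        for ra, rb in zip(img_a, img_b)
        for pa, pb in zip(ra, rb)
    )
    return differing / total


def same_xray(img_a, img_b, pixel_tol=10, max_fraction=0.01):
    """True if at most max_fraction of the pixels differ noticeably."""
    return fraction_different(img_a, img_b, pixel_tol) <= max_fraction
```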
If the X-rays are not identical, and you know what features occur repeatedly when the same portion of the body of the same person is X-rayed multiple times, then machine learning would be useful.
It is a bit like face recognition, where you input an image of a human face and the machine learning model outputs whether this face is in your dataset. For your problem, the simplest way I can think of is to define a "distance metric" that measures the similarity of two sets of image features, and set a threshold to judge whether they are the same.
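As a rough Python sketch of that idea, using Euclidean distance as the metric (other metrics may suit your features better). The name `find_match`, the dictionary layout of the database and the threshold value are all assumptions for illustration.

```python
import math


def find_match(query, database, max_distance):
    """database maps a name to a feature vector of the same length as query.

    Returns the name of the nearest entry if it lies within max_distance,
    otherwise None (meaning: not in the database).
    """
    best_name, best_dist = None, float("inf")
    for name, feats in database.items():
        d = math.dist(query, feats)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None
```

The threshold has to be tuned on known pairs: pick it so that repeat X-rays of the same person fall below it and different people fall above it.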

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to an ANN. I mean, every image has more than one vehicle blob, which means there are multiple centroids in a single frame/image. So, how does an NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Please help me figure it out.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be treated either as time series data or not. You might be able to transform the time series version of the problem so that it can be solved using an ANN, but that is something of a Maslow's hammer :). Also, could you rephrase the problem?
As you said, you could give it features from two or three frames and then use the classifier to decide whether there is an accident or not, but it might be difficult to train such a classifier. The problem is really difficult, so you might need tons of training samples to get it right, especially really good negative samples (for example, cars travelling close to each other).
There are multiple ways you can try to solve this problem of accident detection. For example: build a classifier (ANN, SVM, etc.) to detect accidents without time series data. In that case your input would be accident images and non-accident images, or some other positive and negative samples, for training, and later images for testing. In this specific case you are not looking at time series data, but you might need lots of features to detect an accident (this is in some sense a single-frame version of the problem).
The second method would be to use time series data, in which case you will have to detect the features, track the features (say using Lucas-Kanade or Horn-Schunck) and then use the information about velocity and centroid to detect the accident. You might even be able to formulate it for HMMs.
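One possible way to turn tracked centroids into fixed-length classifier inputs is to build one feature vector per pair of vehicles from two consecutive frames. This is only a Python sketch of one encoding, not something the answers above prescribe; the function name `pair_features`, the choice of four features and the dictionary layout are all assumptions.

```python
import math
from itertools import combinations


def pair_features(prev, curr, dt=1.0):
    """prev and curr map a vehicle id to its (x, y) centroid in two
    consecutive frames; dt is the time between the frames.

    Returns a dict mapping (id_a, id_b) to
    [current distance, speed of a, speed of b, closing speed].
    Vehicles missing from either frame are skipped.
    """
    ids = sorted(set(prev) & set(curr))
    vel = {
        i: ((curr[i][0] - prev[i][0]) / dt, (curr[i][1] - prev[i][1]) / dt)
        for i in ids
    }
    feats = {}
    for a, b in combinations(ids, 2):
        d_prev = math.dist(prev[a], prev[b])
        d_curr = math.dist(curr[a], curr[b])
        feats[(a, b)] = [
            d_curr,
            math.hypot(*vel[a]),
            math.hypot(*vel[b]),
            (d_prev - d_curr) / dt,  # positive when the pair is closing in
        ]
    return feats
```

Each vector has the same length regardless of how many vehicles are in the frame, so the classifier can be run once per pair.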

Detecting stamp (seal) imprints in a digital image with SIFT

I am working on an application that should determine whether an input image contains a stamp imprint and return its location. For RGB images I am using color segmentation followed by verification (with various shape factors). For grayscale images I thought that SIFT + verification would do the job, but SIFT only finds those stamps (in the input image) that I already have in my database.
In the ideal case it works really well, as shown in the image below.
Fig. 1.
http://i.stack.imgur.com/JHkUl.png
The problem occurs when the input image contains a stamp that does not exist in the database. The first thing I did was check whether there would be any matching key points if I compared a similar stamp to the one in the input image. In most cases there is not a single matching key point, and if there are some, they refer to other parts of the input image rather than to a stamp, as shown in Fig. 2:
Fig. 2.
http://i.stack.imgur.com/coA4l.png
I also tried to find a match between input and circle images as the stamps are circular, but circle image has very few key points, if any.
So I wonder if there is a different approach that would make SIFT a bit more useful in this exact case? I thought about creating a matrix with all descriptors and key points from my database and then looking for the nearest Euclidean distance between the input image and the matrix, but it probably won't work, as there are a lot of (unwanted) matching key points across the database (see Fig. 2).
I'm working with Matlab and tried both VLFeat and D. Lowe SIFT implementations.
Edit:
So I found a way to force SIFT to compute descriptors for user-defined points on an image. My test image contained a circle; the descriptors were computed and matched against input images, including the ones in Fig. 1 and 2. This process was repeated for scales from 0 to 10. Unfortunately it didn't help either.
This is only a first hint and not a full answer to the SIFT questions.
My impression is that detecting a circle by matching it against an image of a circle via SIFT is not the best approach, especially if the circle you want to detect has some unknown texture inside.
The textbook algorithm for circle detection would be the Hough transform, which is mostly used for line detection but works for any kind of shape that can be described by a small number of parameters (colleagues tell me things get nasty above 3, but a circle just has X, Y and r). There are several implementations on the File Exchange; the link is just one example. Hough circle detection requires you to put an upper bound on the radii you want to detect, but this seems OK for your application.
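To show the idea, here is a bare-bones Hough circle accumulator in Python. It is a sketch, not a replacement for a proper implementation: it assumes you already have a list of edge pixel coordinates (for example from an edge detector), uses integer radii, and is far too slow for full-size images. The function name and the `angle_steps` parameter are assumptions for this example.

```python
import math
from collections import Counter


def hough_circle(edge_points, r_min, r_max, angle_steps=180):
    """edge_points is a list of (x, y) edge pixel coordinates.

    Every edge point votes for all centers (cx, cy) that would put it on a
    circle of radius r, for each r in [r_min, r_max]. Returns the
    (cx, cy, r) accumulator cell with the most votes.
    """
    votes = Counter()
    for x, y in edge_points:
        for r in range(r_min, r_max + 1):
            cells = set()
            for k in range(angle_steps):
                t = 2 * math.pi * k / angle_steps
                cx = round(x - r * math.cos(t))
                cy = round(y - r * math.sin(t))
                cells.add((cx, cy, r))
            votes.update(cells)  # each edge point votes once per cell
    return votes.most_common(1)[0][0]
```

The `r_max` argument is the upper bound on the radius mentioned above; the accumulator grows with every extra radius you allow.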
From the examples you provided it looks like you should get quite far if you can detect circles reliably.
Actually I do not think SIFT will be solving this problem. I've been playing around with SIFT for quite some time and my conclusion is that it's really great for identifying identical patterns but not for similar patterns.
Just have a look at the construction of the SIFT feature vector: The descriptor is composed of several histograms of gradients(!). If you have patterns in the database that have very similar blob like structures in the stamps, then you might have a chance. But if this does not hold, then I guess you will not be very lucky.
From my point of view you have more or less solved the problem of finding identical objects (stamps) and now want to extend it to finding similar objects. These sound like the same problem, but in my past research I found them to be related, not identical.
Do you have any runtime constraints in your application? There might be other approaches but in this case, more input about possible constraints might be useful.
Update regarding constraints:
So your next task might be to detect the unknown stamps, right?
This sounds like a classification task.
In your case I would first try to find a descriptor/representation (or SVM) that classifies images into stamp/no-stamp. In order to evaluate this, set up a data base with ground truth and a reasonable amount of "unknown" stamps and other images like random snapshots from the letters, NOT containing stamps. This will be your test set.
Then try some descriptors/representations to calculate the distance/similarity between your images and classify your test set into the classes STAMP / NO-STAMP. When you have found a descriptor/distance measure (or SVM) that performs well in classifying, you could perform a sliding window approach on a letter to find a stamp. The sliding window approach is certainly not a very fast method, but it is a very easy one.
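The sliding window part could be sketched like this in Python. The `score_fn` argument stands in for whatever stamp/no-stamp classifier you end up with; it is assumed to take a window (a list of pixel rows) and return a score, and the window size, step and `min_score` are placeholders you would have to tune.

```python
def sliding_window_detect(image, win_h, win_w, score_fn, step=1, min_score=0.5):
    """Slide a win_h x win_w window over the image (a list of pixel rows).

    Returns a list of (row, col, score) for every window position whose
    score is at least min_score; (row, col) is the window's top-left corner.
    """
    hits = []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            window = [row[c:c + win_w] for row in image[r:r + win_h]]
            score = score_fn(window)
            if score >= min_score:
                hits.append((r, c, score))
    return hits
```

A larger step makes this faster at the cost of coarser localization, and stamps of different sizes would need the scan repeated with several window sizes.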
Once you have reached this point, you can tune the detection, for example based on interest point detectors. But one step after the other...

classification technique

My BE final year project is about sign language recognition. I'm terribly confused about choosing the right classification technique for the patterns seen in video of signs made by a mute user. I learned that neural nets (NN) are better than hidden Markov models in several respects, but fine-tuning the parameters of an NN requires a lot of time. Further, some reports say that Support Vector Machines perform better than NNs. Which of these alternatives should I choose, or are there better alternatives, so that it would be feasible to complete my project within 4-5 months and I could continue in that field in my masters?
Actually the system will be fed with real-time video, and we intend to recognize hand postures and spatiotemporal gestures. So it's entire sentences I'm trying to find.
On the basis of my studies so far, I'm inclined to use
1. Hu moments & eigenspace size functions to represent hand shapes
2. SVM for posture classification &
3. Threshold HMM for spatiotemporal gesture recognition.
What would you say about these decisions?

Training for pattern recognition (neural network)

How do you train a neural network for pattern recognition? For example, for face recognition in a picture, how would you define the output neurons? (E.g. how to detect where exactly the face is, rather than just saying that there is a face in the camera image.) Also, how about detecting multiple faces and faces of different sizes?
If anyone could give me a pointer, that would be really great.
Cheers!
Generally speaking I would split the problem into multiple stages e.g.
1 - Is there a face in the picture?
2 - Where is the face in the picture?
3 - Is the face in the picture one that the NN (Neural network) recognises?
In each instance I would suggest you build a separate NN and train it to answer the questions posed.
As for the structure of the NN, that's a bit trickier to answer, as it depends on your input data and desired output. For example, if you had a 100x100 px image then I suppose it's feasible to have 10,000 inputs. You might want to consider doing some preprocessing beforehand to, say, detect ovals. That way you could check whether there are a number of ovals in a predictable outline (one for the face, two for the eyes, and possibly one for the mouth). If you are preprocessing the data, then you might have inputs for each oval.
Now for the output... for question one you could just have one output to say how sure the NN is that there is a face in the input data, i.e. a value from 0.0 (definitely no face) to 1.0 (definitely a face). This way you can move on to stages 2 and 3.
I might say at this point that this is a non-trivial problem and you might be better to have a look at some of the frameworks available e.g. OpenCV
Now for the training part: you need to have a stockpile of images available to train the NN. There are a number of ways in which you could train the NN. One potential solution is to use a technique called backpropagation. In general terms, you run the NN on an image and compare its output to a predetermined output. If it's wrong, tweak the NN to produce the desired output, and repeat.
If you want a good book on AI, then I would highly recommend Artificial Intelligence: A Modern Approach by Russell and Norvig. I'm sure that there are more appropriate Computer Vision textbooks, but the Russell & Norvig book is an excellent starter.
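The "compare, tweak, repeat" loop can be illustrated with the simplest possible case: a single sigmoid output unit trained by gradient descent. This Python sketch is not full backpropagation, which pushes the same error signal back through hidden layers; it only shows the update for one unit. The function names, learning rate and epoch count are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of the unit: a value between 0.0 and 1.0."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(samples, epochs=2000, lr=0.5):
    """samples is a list of (inputs, target) pairs with target 0 or 1.

    Returns the trained (weights, bias).
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = predict(w, b, x)
            err = out - target  # how wrong the unit was on this sample
            # Tweak each weight against the error, scaled by its input.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```

For the "is there a face" stage, the inputs would be pixel values or preprocessed features and the target would be 1 for face images and 0 for everything else.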
Dear GantengX, you should prepare yourself for the fact that the answer is large, complex and hard to understand. There are many approaches to pattern and face recognition, and implementing a real-life face recognition system is a huge amount of work that one person can hardly handle. Prepare yourself for at least 10 years behind books on mathematics and artificial intelligence, and that is before hiring 5 highly paid developers at the end who will understand what you want them to do. Then maybe you will end up having your own face recognition system. There are also dozens of other issues that will pop up during the process. So be ready for a life full of stress and problems.
I'm sorry for stating the obvious, but your question was not specific; a complete answer would touch many different scientific fields and would fill a book of over 1k pages.
Regarding your question (the short answer).
There are several principal parts that each face recognition app consists of:
1. Artificial intelligence algorithm
2. Optimization algorithm (for AI optimization)
3. Different filtration algorithms
4. Effective data set development
Items 1 and 2 are the central part of each system; they do the actual work. Any other preprocessing just makes the input data less complex, making it easier for your AI to reach a decision. Don't start on 3 and 4 until you have your first results.
P.S.
Using existing solutions is more cost-effective, but if you are studying these things, then don't lose time like I did, and start your dissertation right away.