Mapping Vision Outputs To Neural Network Inputs - matlab

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting centroid values to plot trajectories, the velocity difference between frames, and the distance between two vehicles. I can successfully track the centroids, and I aim to derive the remaining features.
What I don't know is how to map these to an ANN. Every image has more than one vehicle blob, which means there are multiple centroids in a single frame/image. So how does an NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing a link; please help me figure it out.
Also, am I looking at time series data?

I am not exactly sure about your question; the problem can be framed either as time series data or not. You might be able to transform the time-series version of the problem so that it can be solved using an ANN, but that is something of a Maslow's hammer :). Could you also rephrase the problem?
As you said, you could give the classifier features from two or three frames and have it decide accident or not, but such a classifier might be difficult to train. The problem is genuinely hard, so you might need tons of training samples to get it right, especially good negative samples (for example, cars travelling close to each other).
There are multiple ways you can try to solve this accident-detection problem. One is to build a classifier (ANN/SVM, etc.) without time series data, in which case your input would be accident images and non-accident images: positive and negative samples for training, with later images held out for testing. In this case you are not looking at time series data, but you might need a lot of features to detect accidents; this is, in some sense, a single-frame version of the problem. A rough sketch follows.
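A minimal sketch of that single-frame route, assuming a hypothetical feature matrix X (one column of hand-crafted features per labelled frame) and label vector T; patternnet is from MATLAB's Neural Network Toolbox:

    % X: hypothetical nFeatures-by-nSamples matrix, one column per frame.
    % T: 1-by-nSamples vector of labels (1 = accident, 0 = normal).
    net = patternnet(10);        % one hidden layer with 10 neurons
    net = train(net, X, T);      % supervised training on labelled frames

    scores = net(Xtest);         % Xtest: features for unseen frames
    isAccident = scores > 0.5;   % threshold for a hard decision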
The second method is to use the time series data: detect features, track them (say with Lucas-Kanade or Horn-Schunck optical flow), and then use the velocity and centroid information to detect the accident. You might even be able to formulate it with HMMs.
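If you go the time-series route, here is a rough sketch of per-frame motion estimation with Lucas-Kanade optical flow (opticalFlowLK from the Computer Vision Toolbox; 'traffic.mp4' is a placeholder file name):

    % Dense Lucas-Kanade flow per frame; a sudden jump or drop in the
    % average flow magnitude between frames is the kind of velocity
    % feature you would feed to a classifier or an HMM.
    vr = VideoReader('traffic.mp4');
    flowModel = opticalFlowLK('NoiseThreshold', 0.009);
    avgSpeed = [];
    while hasFrame(vr)
        frame = rgb2gray(readFrame(vr));
        flow  = estimateFlow(flowModel, frame);    % Vx, Vy, Magnitude
        avgSpeed(end+1) = mean(flow.Magnitude(:)); %#ok<AGROW>
    end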

Related

Convolutional Neural Network for time-dependent features

I need to do dimensionality reduction on a series of images. More specifically, each image is a snapshot of a moving ball, and the optimal features would be its position and velocity. As far as I know, CNNs are state of the art for reducing features for image classification, but in that case only a single frame is provided. Is it possible to also extract time-dependent features given many images at different time steps? Otherwise, what are the state-of-the-art techniques for doing so?
This is the first time I have used a CNN, and I would appreciate any references or other suggestions.
If you want to be able to have the network somehow recognize a progression which is time dependent, you should probably look into recurrent neural nets (RNN). Since you would be operating on video, you should look into recurrent convolutional neural nets (RCNN) such as in: http://jmlr.org/proceedings/papers/v32/pinheiro14.pdf
Recurrence adds some memory of a previous state of the input data. See this good explanation by Karpathy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
In your case you need to have the recurrence across multiple images instead of just within one image. It would seem like the first problem you need to solve is the image segmentation problem (being able to pick the ball out of the rest of the image) and the first paper linked above deals with segmentation. (then again, maybe you're trying to take advantage of the movement in order to identify the moving object?)
Here's another thought: perhaps you could look only at the differences between sequential frames and use those as the input to your convnet. The input "image" would then show where the moving object was in the previous frame and where it is in the current one, with larger differences indicating larger amounts of movement. That would probably have a similar effect to using a recurrent network.
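As a sketch of that differencing idea (imabsdiff is from the Image Processing Toolbox; 'ball.mp4' is a placeholder file name):

    % Each difference image is one convnet input: bright regions mark
    % where the ball moved between consecutive frames.
    vr = VideoReader('ball.mp4');
    prev = rgb2gray(readFrame(vr));
    while hasFrame(vr)
        curr = rgb2gray(readFrame(vr));
        diffImage = imabsdiff(curr, prev);  % larger values = more motion
        prev = curr;
    end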

Classifying hand gestures using HMMs in MATLAB

I'm currently working on a project in which I need to classify hand gestures. Many papers propose HMMs for this, but the tutorials mostly use either a weather example or a dice-and-coin example, and I can't see how to map those onto my problem or what my various matrices should be. I currently have a feature vector (the detected hand features as an n*2 matrix, where n is the total number of features detected across all frames, so 10 features per frame over a 10-frame video gives n = 100, and the 2 columns are the x and y coordinates) and a motion vector (the motion of the hand itself across the video, of size m*2, where m is the number of frames). I would also welcome suggestions for any other data worth extracting from the video.
I know the papers you are talking about; the weather examples are simplistic and cannot be mapped to most of the problems now tackled with HMMs. In your case, you have features corresponding to hand gestures that you know. An HMM can work because your data is dynamic, i.e. ordered in time.
My advice is that you should first have a look at the widely used HMM toolbox by Kevin Murphy. It provides all the tools you need to start working with HMMs.
The main idea is to model each gesture type with one dedicated HMM. For a given gesture type, the corresponding HMM will be trained with the available features that you have.
Once trained, you get a state transition probability matrix, an emission probability matrix and a prior for selecting the initial state.
When you have an unknown gesture, you then compute the likelihood that this gesture (its features, actually) could have been generated by each of the trained HMMs. The query sequence is usually assigned to the category of the model giving the highest score.
That is the big picture. In your case, you will have to find a way to represent your features as a time series, with "time" being the different frames. In a complex application such as hand gestures it might be difficult to see what each state of the model represents; some kinds of HMM make this analogy easier through their topology (left-to-right models, for instance).
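For a concrete feel of the train-then-score loop, here is a minimal sketch using MATLAB's built-in hmmtrain/hmmdecode (Statistics Toolbox) rather than Murphy's toolbox. It assumes the continuous features have already been quantized into symbols 1..numSymbols (e.g. with kmeans); seqsByGesture is a hypothetical cell array where seqsByGesture{g} holds the training sequences for gesture g, and querySeq is an unknown symbol sequence to classify:

    numStates  = 4;                         % e.g. a short left-to-right chain
    numSymbols = 16;                        % size of the k-means codebook
    models = struct('A', {}, 'B', {});
    for g = 1:numel(seqsByGesture)
        % Random row-stochastic initial guesses; hmmtrain refines them
        % with Baum-Welch.
        A0 = rand(numStates, numStates);  A0 = A0 ./ sum(A0, 2);
        B0 = rand(numStates, numSymbols); B0 = B0 ./ sum(B0, 2);
        [models(g).A, models(g).B] = hmmtrain(seqsByGesture{g}, A0, B0);
    end

    % Assign an unknown sequence to the model with the highest likelihood.
    logL = zeros(1, numel(models));
    for g = 1:numel(models)
        [~, logL(g)] = hmmdecode(querySeq, models(g).A, models(g).B);
    end
    [~, predictedGesture] = max(logL);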

Classification of X-ray images using machine learning

How can I classify X-ray image features with a machine learning algorithm, so that when I later test an input by submitting an individual's X-ray image features, it tells me whether or not this X-ray is present in the database? I have already extracted features from around 20 images using MATLAB.
If the X-rays you're matching are identical, you don't really need to use machine learning. Just do a pixel-wise match and check if the images are say 99% identical (to make up for illumination differences in scanning). In MATLAB, you can do this by simply taking the absolute pixel-wise difference of the two images, and then counting the number of pixels that are different by more than a pre-defined threshold.
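A sketch of that comparison, assuming A and B are same-size grayscale X-ray images already loaded with imread (imabsdiff is from the Image Processing Toolbox):

    threshold = 25;                           % per-pixel intensity tolerance
    diffMap   = imabsdiff(A, B);              % absolute pixel-wise difference
    numDiffering = nnz(diffMap > threshold);  % pixels that differ noticeably
    isMatch = numDiffering / numel(diffMap) <= 0.01;  % >= 99% of pixels agree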
If the X-rays are not identical, and you know what features occur repeatedly when the same portion of the body of the same person is X-rayed multiple times, then machine learning would be useful.
It's a bit like face recognition, where you input a human face image and the model outputs whether that face is in your dataset. For your problem, the simplest approach I can think of is to define a "distance metric" that measures the similarity of two images' features, and set a threshold to judge whether they are the same.
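A sketch of that distance-metric idea, with hypothetical names: featuresDB is an nImages-by-d matrix of stored feature vectors and queryFeat is the 1-by-d query:

    dists = sqrt(sum((featuresDB - queryFeat).^2, 2));  % Euclidean distances
    [minDist, idx] = min(dists);                        % nearest stored image
    tau = 0.5;                    % similarity threshold, tuned on your data
    inDatabase = minDist < tau;   % true if image idx counts as "the same"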

Classification technique

My BE final-year project is about sign language recognition. I'm terribly confused about choosing the right classification technique for the patterns in a video of signs produced by a mute user. I've read that neural nets (NNs) are better than hidden Markov models in several respects, but fine-tuning the parameters of an NN takes a lot of time. Furthermore, some reports say that Support Vector Machines outperform NNs. Which of these should I choose, and are there better alternatives, given that the project must be feasible within 4-5 months and I'd like to continue in this field during my master's?
The system will actually be fed real-time video, and we intend to recognize hand postures as well as spatiotemporal gestures. So it's entire sentences I'm trying to recognize.
On the basis of my studies so far, I'm inclined to use:
1. Hu moments and eigenspace size functions to represent hand shapes,
2. SVMs for posture classification, and
3. a threshold HMM for spatiotemporal gesture recognition.
What would you say about these choices?

Optical character recognition program for photographs

I need to develop an optical character recognition program in Matlab (or any other language that can do this) to be able to extract the reading on this photograph.
The program must be able to load as many picture files as possible, since I have around 40,000 pictures to work through.
The general aim is to record intraday gas readings from the specific gas meter shown in the photograph. There is a webcam currently set up that is programmed to photograph the readings every minute, so the OCR program would make it possible to build up historic intraday gas reading data.
Which software is best for this, and are there any online resources available?
I'd break down the basic recognition steps as follows:
Locate meter display within the image
Isolate and clean up the digits
Calculate features
Classify each digit using a model you've trained using historic examples
Assuming that the camera for a particular location does not move, step 1 will only need to be performed once. Step 2 will include things like enhancing contrast and filtering noise. Step 3 can include any useful calculations you can think of, such as mean and skew of "ink" (white) pixels. Step 4 would utilize a model you build to classify a single digit as '0', '1', ... '9', and could be accomplished using k-nearest neighbors, logistic regression, SVM, neural network, etc.
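As a sketch of step 4 with k-nearest neighbors (fitcknn from the Statistics and Machine Learning Toolbox), assuming each segmented digit has already been reduced to a fixed-length feature vector; trainFeatures, trainLabels, and digitFeatures are hypothetical names:

    % trainFeatures: n-by-d matrix, one row per historic digit example.
    % trainLabels:   n-by-1 labels '0'..'9' (cell array or categorical).
    mdl = fitcknn(trainFeatures, trainLabels, 'NumNeighbors', 5);

    % digitFeatures: one row per digit cut out of a new meter photo.
    reading = predict(mdl, digitFeatures);   % one predicted label per digit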
A couple of things would make step 1 in Predictor's answer easy: placing the cam directly above the meter, adding sufficient light, and maybe placing bright pink strips around the meter to help segment out the display :).
Once you do this, and the cam remains fixed, you can work out the segmentation manually once and then apply it to all subsequent images to cut out the digits. If the lighting is good and consistent, you might be able to use simple template matching to identify each of the segmented digits.
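A sketch of that template matching for one segmented digit, assuming templates is a hypothetical 1-by-10 cell array holding a clean crop of each digit '0'..'9' from a well-lit reference photo (normxcorr2 is from the Image Processing Toolbox):

    scores = zeros(1, 10);
    for d = 1:10
        c = normxcorr2(templates{d}, digitImage);  % normalized cross-correlation
        scores(d) = max(c(:));                     % best alignment score
    end
    [~, best] = max(scores);
    recognizedDigit = best - 1;                    % map index 1..10 to 0..9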
Actually, once you get a sample of all the digits, you might even be able to classify them on something simpler (like sum of thresholded pictures).
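For instance, that simpler feature could just be the count of "ink" pixels after thresholding (imbinarize is from the Image Processing Toolbox; with fixed lighting a hard-coded threshold works too):

    bw = imbinarize(digitImage);   % or: digitImage > fixedThreshold
    inkCount = nnz(bw);            % single scalar feature per digit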
More recently, many object detection methods have appeared that could also be used to tackle this problem.