Hi I am new to AI and MATLAB. I want to find another way of processing image files. The purpose of this is to differentiate the digit '4' from other digits. The below code is one way (a basic way) of processing images. It takes the images, converts it to a matrix and ignores the black pixels around the edges so it focuses only on the pixels with variation (the white pixels).
% 3. convert the images into a 2D matrix
train_params = reshape(train_images, size(train_images, 1) * size(train_images, 2), size(train_images, 3));
% 4. measure the variance of the different pixels and discard those which
% are zero
train_stds = std(train_params');
tokeep = find(train_stds>0);
train_params = train_params(tokeep,:);
Here is the images which are being processed:
I want to find another (more focused) way of processing these images to recognize the number '4'.
Thank you
You are new to Matlab and it is understandable, but I think you should go deep to image processing concepts first. We have so many recognition methods in image processing to solve your problem. Please take a look at here. Even you can recognize a digit by simple morphology operations in Matlab or by complicated machine learning approach. You can find a method here which solves this problem with Neural network. And also Matlab tutorial #Sardar Usama already introduced is one of the best which uses HOG features and a multiclass SVM classifier to classify digits. You can find more here.
Related
I am working with a classification algorithm that requires the size of the feature vector of all samples in training and testing to be the same.
I am also to use the SIFT feature extractor. This is causing problems as the feature vector of every image is coming up as a different sized matrix. I know that SIFT detects variable keypoints in each image, but is there a way to ensure that the size of the SIFT features is consistent so that I do not get a dimension mismatch error.
I have tried rootSIFT as a workaround:
[~, features] = vl_sift(single(images{i}));
double_features = double(features);
root_it = sqrt( double_features/sum(double_features) ); %root-sift
feats{i} = root_it;
This gives me a consistent 128 x 1 vector for every image, but it is not working for me as the size of each vector is now very small and I am getting a lot of NaN in my classification result.
Is there any way to solve this?
Using SIFT there are 2 steps you need to perform in general.
Extract SIFT features. These points (first output argument of
size NPx2 (x,y) of your function) are scale invariant, and should in
theory be present in each different image of the same object. This
is not completely true. Often points are unique to each frame
(image). These points are described by 128 descriptors each (second
argument of your function).
Match points. Each time you compute features of a different image the amount of points computed is different! Lots of them should be the same point as in the previous image, but lots of them WON'T. You will have new points and old points may not be present any more. This is why you should perform a feature matching step, to link those points in different images. usually this is made by knn matching or RANSAC. You can Google how to perform this task and you'll have tons of examples.
After the second step, you should have a fixed amount of points for the whole set of images (considering they are images of the same object). The amount of points will be significantly smaller than in each single image (sometimes 30~ times less amount of points). Then do whatever you want with them!
Hint for matching: http://www.vlfeat.org/matlab/vl_ubcmatch.html
UPDATE:
You seem to be trying to train some kind of OCR. You would need to probably match SIFT features independently for each character.
How to use vl_ubcmatch:
[~, features1] = vl_sift(I1);
[~, features2] = vl_sift(I2);
matches=vl_ubcmatch(features1,features2)
You can apply a dense SIFT to the image. This way you have more control over from where you get the feature descriptors. I haven't used vlfeat, but looking at the documentation I see there's a function to extract dense SIFT features called vl_dsift. With vl_sift, I see there's a way to bypass the detector and extract the descriptors from points of your choice using the 'frames' option. Either way it seems you can get a fixed number of descriptors.
If you are using images of the same size, dense SIFT or the frames option is okay. There's a another approach you can take and it's called the bag-of-features model (similar to bag-of-words model) in which you cluster the features that you extracted from images to generate codewords and feed them into a classifier.
I was reading this particular paper http://www.robots.ox.ac.uk/~vgg/publications/2011/Chatfield11/chatfield11.pdf and I find the Fisher Vector with GMM vocabulary approach very interesting and I would like to test it myself.
However, it is totally unclear (to me) how do they apply PCA dimensionality reduction on the data. I mean, do they calculate Feature Space and once it is calculated they perform PCA on it? Or do they just perform PCA on every image after SIFT is calculated and then they create feature space?
Is this supposed to be done for both training test sets? To me it's an 'obviously yes' answer, however it is not clear.
I was thinking of creating the feature space from training set and then run PCA on it. Then, I could use that PCA coefficient from training set to reduce each image's sift descriptor that is going to be encoded into Fisher Vector for later classification, whether it is a test or a train image.
EDIT 1;
Simplistic example:
[coef , reduced_feat_space]= pca(Feat_Space','NumComponents', 80);
and then (for both test and train images)
reduced_test_img = test_img * coef; (And then choose the first 80 dimensions of the reduced_test_img)
What do you think? Cheers
It looks to me like they do SIFT first and then do PCA. the article states in section 2.1 "The local descriptors are fixed in all experiments to be SIFT descriptors..."
also in the introduction section "the following three steps:(i) extraction
of local image features (e.g., SIFT descriptors), (ii) encoding of the local features in an image descriptor (e.g., a histogram of the quantized local features), and (iii) classification ... Recently several authors have focused on improving the second component" so it looks to me that the dimensionality reduction occurs after SIFT and the paper is simply talking about a few different methods of doing this, and the performance of each
I would also guess (as you did) that you would have to run it on both sets of images. Otherwise your would be using two different metrics to classify the images it really is like comparing apples to oranges. Comparing a reduced dimensional representation to the full one (even for the same exact image) will show some variation. In fact that is the whole premise of PCA, you are giving up some smaller features (usually) for computational efficiency. The real question with PCA or any dimensionality reduction algorithm is how much information can I give up and still reliably classify/segment different data sets
And as a last point, you would have to treat both images the same way, because your end goal is to use the Fisher Feature Vector for classification as either test or training. Now imagine you decided training images dont get PCA and test images do. Now I give you some image X, what would you do with it? How could you treat one set of images differently from another BEFORE you've classified them? Using the same technique on both sets means you'd process my image X then decide where to put it.
Anyway, I hope that helped and wasn't to rant-like. Good Luck :-)
first of all this is my first question here, so I hope I can explain it in a clear way.
My goal is to detect different classes of traffic signs in images. For that purpose I have trained binary SVMs following these steps:
First I got a database of cropped traffic signs like the one in the link. I considered different classes (prohibition, danger, etc), and negative images. All of them were scaled to 40x40 pixels.
http://i.imgur.com/Hm9YyZT.jpg
I trained linear-SVM models for each class (1-vs-all), using HOG as feature. Each image is described with a 1728-dimensional feature. (I append the three feature vectors for all three image planes). I did crossvalidation to set parameter C, and tested on previously unseen 40x40 images, and I got very accurate results (F1 score over 0.9 for all classes). I used libsvm for training and testing.
Now I'd want to detect signs in full road images, sliding a window in different image scales. The problem I'm facing is that I couldn't find any function that can do it for me (as DetectMultiScale in OpenCV), and my solution is very slow and rudimentary (I'm just doing a triple for loop, and for each scale I crop consecutive and overlapping 40x40 images, obtain HOG features and apply svmpredict for each one).
Can someone give me a clue to find a faster way to do it? I thought too about getting the HOG feature vector of the whole input image, and then reorder that vector to a matrix where each row will have the features corresponding to each 40x40 window, but I couldn't find a straightforward way of doing it.
Thanks,
I would suggest using SURF feature detection, however I don't know if this would also be too slow your needs.
See : http://morf.lv/modules.php?name=tutorials&lasit=2 for more information on how to implement and weather it is a viable solution for you.
I am quite newbie on vlfeat library for computer vision and I have problems dealing with it. What i am trying to do is using the Histograms of Oriented Gradients (HOG) as feature vectors to classify in LIBSVM images with different dimensions.
The first issue i am dealing is the fact that vl_hog returns me a HOG matrix, not a vector. This is not a real issue because I can vectorize this matrix as it follows:
hog = vl_hog(image,cellSize);
features=hog(:);
The second problem is what's freaking me out. Because the images have different dimensions, the feature vectors also have different dimensions, so it's impossible to feed libsvm with them, or i'm wrong? can I solve this in an easier way? did i miss something?
You need to create a global representation of your local features so that you can feed your data to SVMs. One of the most popular approaches for this task is bag-of-words(features). vlfeat has an excellent demo/example for this. You can check this code from vlfeat website.
For your particular case, you need arrange your training/testing data in Caltech-101 like data directories:
Letter 1
Image 1
Image 2
Image 3
Image 4
...
Letter 2
...
Letter 3
...
Then you need to adjust following configuration settings for your case:
conf.numTrain = 15 ;
conf.numTest = 15 ;
conf.numClasses = 102 ;
This demo uses SIFT as local features, but you can change it to HOG afterwards.
I am working on an application that should determine if input image contain a stamp imprint and return its location. For RGB images I am using color segmentation and doing verification (with various shape factors), for grayscale image I thought that SIFT + verification would do the job, but using SIFT would only find those stamps(on input image) that I got in my database.
In ideal case it works really well, as shown on image bellow.
Fig. 1.
http://i.stack.imgur.com/JHkUl.png
The problem occurs when input image contains a stamp that does not exist in database. First thing I did was checking if there would be any matching key points if I compare a similar stamp to the one on input image. In most cases there is no single matching key point and if there is some they rather refer to other parts of input image than a stamp, as shown in Fig. 2.:
Fig. 2.
http://i.stack.imgur.com/coA4l.png
I also tried to find a match between input and circle images as the stamps are circular, but circle image has very few key points, if any.
So I wonder if there is any different approach that will make SIFT a bit more useful in this exact case? I though about creating a matrix with all descriptors and key-points from my database and then looking for nearest euclidean distance between input image and matrix, but it probably wont work as there is a lot of matching key-points(unwanted) across the database (see Fig. 2.).
I'm working with Matlab and tried both VLFeat and D. Lowe SIFT implementations.
Edit:
So I found a way to force SIFT to compute descriptors for user defined points on an image. My test image contained a circle, then the descriptors were computed and matched against input images, including the one under Fig 1 and 2. This process was repeated for scales from 0 to 10. Unfortunately it didn't help too.
This is only a first hint and not a full answer to the SIFT questions.
My impression is that detecting a circle by matching it against an image of a circle via SIFT is not the best approach, especially if the circle you want to detect has some unknown texture inside.
The textbook algorithm for circle detection would be Hough transform, which is mostly used for line detection but does work for any kind of shape which can be described by a low number of parameters (colleagues tell me things get nasty above 3, but a circle just has X,Y and r). There are several implementations in file exchange, the link is just to one example. Hough circle detection requires you to put an upper bound on the radii you want to detect, but this seems ok for your application.
From the examples you provided it looks like you should get quite far if you can detect circles reliably.
Actually I do not think SIFT will be solving this problem. I've been playing around with SIFT for quite some time and my conclusion is that it's really great for identifying identical patterns but not for similar patterns.
Just have a look at the construction of the SIFT feature vector: The descriptor is composed of several histograms of gradients(!). If you have patterns in the database that have very similar blob like structures in the stamps, then you might have a chance. But if this does not hold, then I guess you will not be very lucky.
From my point of view you have kind of solved the problem of finding indentical objects (stamps) and now extend to finding similar objects. This sounds like the same but in my past research I found these problems just related but not too identical.
Do you have any runtime constraints in your application? There might be other approaches but in this case, more input about possible constraints might be useful.
Update regarding constraints:
So your next task might be to detect the unknown stamps, right?
This sounds like a classification task.
In your case I would first try to find a descriptor/representation (or SVM) that classifies images into stamp/no-stamp. In order to evaluate this, set up a data base with ground truth and a reasonable amount of "unknown" stamps and other images like random snapshots from the letters, NOT containing stamps. This will be your test set.
Then try some descriptors/representations to caluclate the distance/similarity between your images to classify your test set into the classes STAMP / NO-STAMP. When you have found a descriptor/distance measure (or SVM) that performs well in classifying, then you could perform a sliding window approach on a letter to find a stamp. The sliding window approach is certainly not a very fast method, but a very easy one.
At least when you have reached this point, you can tune the detection - for example based on interesting point detectors.. but one step after the other...