Detect character by comparing images of different dimensions - matlab

I seem to have stumbled upon a small problem while developing an Optical Character Recognition engine. I have trained the K nearest neighbour classifier on MNIST images and even tested it. It seems to work fine. However, when I input images of different dimensions, it seems unable to classify the input image correctly.
Any suggestions on how to work around this problem ?
I] KNN Classifier -
The code for KNN classification is:
% herein, I resize the binary image 'b' to contain the
% same dimensions as the training set 'trainingImages' as the input and training Images
% should have the same no. of columns / dimensions
b = imresize(b, size(trainingImages));
% now i try to classify the input image 'b' against the set of training images and
% training labels.
cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');
cls is now the classification vector. However, this almost always shows the incorrect classification of 1 regardless of the input image.
On the other hand, when I classify the set of MNIST test images, I get a VERY high level of accuracy! The code for the same is as follows -
class = knnclassify(testImg, trainingImages, trainingLabels, 3, 'euclidean');
Right now the main problem is that no matter what input image I give it to predict, it mostly returns a wrong result (varying across different images), even for images that are very different from each other. It seems like it is not working correctly. Could someone help me figure out where the problem is? I couldn't find any explanation in existing sources on the internet. Thanks in advance.

I believe I solved the problem which I listed above.
The problems were :
Like Dhanushka said, I was resizing the input image to the dimensions of the entire training set (which in the case of MNIST is 60000 x 784, i.e. 60000 digits with 784 features each [28 x 28]).
Thus, I simply changed the dimensions of the input image to 28 x 28.
Pre-processing the input image.
I was simply converting the image to a binary image and trying to classify that against the MNIST training image data set. This was an INCOMPLETE procedure.
When I further detected the edges of the input binary image (Canny, Prewitt or Zerocross - whichever suits you better) and used this for classification, I got an extremely accurate prediction!
NOTE: In KNN classification, you will have to arrive at the number of nearest neighbours to consider (the k, i.e. the fourth argument to knnclassify) by trial and error. I managed to arrive at the following conclusions -
3 neighbours are generally enough for synthetic images
1 neighbour is mostly suitable for handwritten images
The code for the same is as follows :
% herein, I resize the binary image 'b' as the input and training Images
% should have the same no. of columns / dimensions
b = imresize(b, [28 28]); % this resizes the binary image b to 28*28 dimension
b = edge(b, 'canny'); % this uses Canny edge detection on the resized binary
% image
b = b(:)'; % This converts 'b' to a vector using b(:) and then
% transposes the result using the " ' " operator
% Thus, now 'b' has same no of dimensions/columns as
% MNIST training image set
% now i try to classify the input image 'b' against the set of training images
% and training labels.
cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');
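As a side note (my own addition, not part of the original answer): knnclassify comes from the Bioinformatics Toolbox and has been deprecated/removed in newer MATLAB releases. If it is not available to you, a rough equivalent using fitcknn and predict from the Statistics and Machine Learning Toolbox might look like the sketch below.
% sketch of an equivalent to the knnclassify call above (not verified here)
mdl = fitcknn(trainingImages, trainingLabels, 'NumNeighbors', 3, 'Distance', 'euclidean');
cls = predict(mdl, b);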

Why apply rgb2gray and normalize before normalized cross-correlation?

function out = findWaldo(im, filter)
% convert image (and filter) to grayscale
im_input = im;
im = rgb2gray(im);
im = double(im);
filter = rgb2gray(filter);
filter = double(filter);
filter = filter/sqrt(sum(sum(filter.^2)));
out = normxcorr2(filter, im);
Question 1: Why do we first apply rgb2gray to im and filter?
Question 2: What does the second-to-last line actually do? Namely,
filter = filter/sqrt(sum(sum(filter.^2)));
Question 1: Why apply rgb2gray first?
normxcorr2, standing for "Normalized 2-D cross-correlation", works on a 2D signal (see the documentation). An RGB image is a 3D signal: width x height x colour (e.g. 1024 x 1024 x 3, since there are three colour channels). That is why you first reduce it to a single colour channel. The alternative would be to apply the filter to each colour channel separately, but then you would also have to process three correlations (average them or something...).
Question 2: What does filter = filter/sqrt(sum(sum(filter.^2))); do?
It squares the filter image, sums over rows and columns (i.e. over all squared grey values of the filter) to get a single number, takes the square root of that number, and then divides every value in the filter image by it.
I'd say it is some sort of normalization to handle specific input signals, perhaps an attempt to bring the values into the range 0 - 1. But since normalized cross-correlation (normxcorr2) performs its own normalization, this step is not needed for it. Unless you do something other than cross-correlation with the filter variable, I'd consider this an artifact that could be deleted.
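As a side note (my own addition): that line is simply an L2 normalization of the template, which in MATLAB can be written more compactly as:
% equivalent to filter = filter/sqrt(sum(sum(filter.^2)));
filter = filter / norm(filter(:));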
General explanation of the function
This function receives two inputs: an image file and a template.
For example, the image file may be a large scene from a Where's Waldo game and the template can be a picture of Waldo himself.
The output is a matrix called 'out' (slightly larger than the image: the size of the full cross-correlation), such that each entry holds a "matching result". The higher the value, the higher the chance that the patch centred around that pixel matches the template.
The maximal value should be at the pixel where Waldo is.
Question 1
The rgb2gray function receives an RGB image with 3 channels and transforms it into a grayscale image.
It is done on im and on filter because normxcorr2 only works with grayscale (2D) images.
Question 2
The second-to-last line normalizes the template: it divides the template by its norm, making the norm equal to 1. In fact, this line is not required and can be deleted, since the normalization stage is already performed inside the normxcorr2 function.
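For completeness, here is a minimal sketch (my own addition, with hypothetical file names) of how the output of normxcorr2 is typically used to locate the best match:
im   = rgb2gray(imread('scene.png'));      % hypothetical scene image
tmpl = rgb2gray(imread('template.png'));   % hypothetical template image
out = normxcorr2(tmpl, im);                % out is larger than im
[~, peakIdx] = max(out(:));
[peakRow, peakCol] = ind2sub(size(out), peakIdx);
% top-left corner of the best-matching patch in the original image
topLeftRow = peakRow - size(tmpl, 1) + 1;
topLeftCol = peakCol - size(tmpl, 2) + 1;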

grayscale image processing using k-mean

I am trying to convert an RGB image to grayscale and then cluster it using the kmeans function of MATLAB.
Here is my code:
he = imread('tumor2.jpg');
%convert into a grayscale image
ab=rgb2gray(he);
nrows = size(ab,1);
ncols = size(ab,2);
%convert the image into a column vector
ab = reshape(ab,nrows*ncols,1);
%nColors=no of clusters
nColors = 3;
%cluster_idx is a n x 1 vector where cluster_idx(i) is the index of cluster assigned to ith pixel
[cluster_idx, cluster_center ,cluster_sum] = kmeans(ab,nColors,'distance','sqEuclidean','Replicates',1,'EmptyAction','drop' );
figure;
%converting vector into a matrix of dimensions equal to that of original
%image dimensions (nrows x ncols)
pixel_labels = reshape(cluster_idx,nrows,ncols);
pixel_labels
imshow(pixel_labels,[]), title('image labeled by cluster index');
Problems
1) The output image is always a plain white image.
I tried the solution given in the link below, but in that case the output is a plain gray image.
find the solution tried here
2) When I execute my code a second time, execution does not proceed beyond the kmeans function (it is like an infinite loop there). Hence there is no output in this case.
Actually, when you are doing colour segmentation, kmeans is known to fall into local minima. This means that it often won't find the number of clusters you want, because the minimization is not optimal (that's why many people use other types of segmentation, such as level sets or simple region growing).
An option is to increase the number of Replicates (the number of times kmeans will try to find the answer). At the moment you are setting it to 1, but you could try 3 or 4, and it may reach the solution that way.
In this question the accepted answer recommends using a kmeans version of the algorithm specifically created for image segmentation. I haven't tried it myself but I think it's worth a shot.
Link to FEX
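A minimal sketch of the Replicates suggestion above (my own, untested here); it also casts the intensities to double, on the assumption that kmeans wants floating-point data:
he = imread('tumor2.jpg');                  % the input file from the question
ab = double(rgb2gray(he));                  % cast to double for kmeans
ab = ab(:);                                 % one feature (intensity) per pixel
nColors = 3;
[cluster_idx, cluster_center] = kmeans(ab, nColors, 'distance', 'sqEuclidean', ...
    'Replicates', 4, 'EmptyAction', 'drop');
pixel_labels = reshape(cluster_idx, size(he,1), size(he,2));
figure; imshow(pixel_labels, []), title('image labeled by cluster index');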

why are wavelet coefficients zeros after decomposition

I am learning wavelet theory for image processing. To understand the theory, I wrote a MATLAB program to decompose a black-and-white image. The program is as follows:
Image = zeros(256, 256, 'uint8');
Image(101:200, 101:200) = 255;
figure; imshow(Image);
[cA1, cH1, cV1, cD1] = dwt2(Image, 'db1');
Image1 = [cA1, cH1; cV1, cD1];
figure; imshow(Image1, []);
[cA1, cH1, cV1, cD1] = dwt2(Image, 'db2');
Image1 = [cA1, cH1; cV1, cD1];
figure; imshow(Image1, []);
The first decomposition, using the argument db1, produces zeros for all wavelet coefficients. The black-and-white image has transitions from 0 to 255 along the horizontal and vertical directions and should therefore have high-frequency components. Why are zero wavelet coefficients generated? If I change the argument from db1 to db2, the result shows horizontal and vertical lines in the subbands.
If you recall, db1 is the Haar Wavelet. The Haar Wavelet takes either an average of pixels within local windows for the approximation coefficients (or the LL band), or a difference of pixels within local windows for the detail coefficients (or the LH, HL and HH bands).
Be advised that the input image that you specified only consists of two intensities: 0 and 255. Also, you set a square grid within this image to be 255 and it is uniformly shaped.
For self-containment, this is what your test image looks like:
This uniformly shaped object within a square image is important as part of the reasoning why you are not getting any output for the detail images (HL, LH, and HH).
The best way to describe why you're seeing an output for db2 and not db1 can be shown visually.
This slide is from the University of Toronto's CS 320: Introduction to Visual Computing course, specifically, the Discrete Wavelet Transform lecture:
You are well aware that when you take the 2D DWT, you produce 4 sub-images that are half the resolution of the original image. The first output of dwt2 is the approximation coefficients, where each output pixel is an average of a 2 x 2 window. The other outputs (second, third and fourth) are detail coefficients that take two pixels within the window and subtract them from the other two pixels in the window.
As such, the reason why you are not getting an output with db1 is that all of your calculations for the detail images cancel out. Specifically, every 2 x 2 window is either completely 0 or completely 255, so when you calculate the detail images you subtract equal values (two 0s from two 0s, or two 255s from two 255s), and the output is 0 regardless.
The db2 wavelet is a more complicated transform which is a weighted sum of non-uniform coefficients and so you will certainly get outputs for the detail images rather than just a simple differencing of the 2 x 2 windows.
I would like to stress that if you have a more complicated shape that is non-uniform, db1 will certainly not give you a zero output. Try it on any test image that comes with MATLAB, like with cameraman.tif.
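To see this for yourself, here is a small experiment (my own addition, not from the original answer): shift the square by one pixel so that its edges no longer line up with the 2 x 2 Haar windows, and the db1 detail coefficients become non-zero.
Image = zeros(256, 256, 'uint8');
Image(100:199, 100:199) = 255;              % edges now straddle the 2 x 2 windows
[cA1, cH1, cV1, cD1] = dwt2(Image, 'db1');
max(abs(cH1(:)))                            % no longer zero
figure; imshow([cA1, cH1; cV1, cD1], []);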
Hope this helps!

Color correcting images in MATLAB

I have 2 images im1 and im2 shown below. The im2 picture is the same as im1, but the only difference between them is the colors. im1 has RGB ranges of (0-255, 0-255, 0-255) for each color channel while im2 has RGB ranges of (201-255, 126-255, 140-255). My exercise is to reverse the added effects so I can restore im2 to im1 as closely as I can. I have 2 thoughts in mind. The first is to match their histograms so they both have the same colors. I tried it using histeq but it restores only a portion of the image. Is there any way to change im2's histogram to be exactly the same as im1's? The second approach was just to copy each pixel value from im1 to im2, but this is wrong since it doesn't restore the original image state. Are there any suggestions to restore the image?
@sepdek below pretty much suggested the method that @NKN alluded to, but I will provide another approach. One more alternative I can suggest is to perform a colour correction based on a least mean squared solution. What this alludes to is that we can assume that transforming a pixel from im2 to im1 requires a linear combination of weights. In other words, given an RGB pixel where its red, green and blue components are shaped into a 3 x 1 vector from the corrupted image (im2), there exists some linear transformation to get its equivalent pixel in the clean image (im1). In other words, we have this relationship:
[R_im1]       [R_im2]
[G_im1] = A * [G_im2]
[B_im1]       [B_im2]

Y = A * X
A in this case would be a 3 x 3 matrix. This is essentially performing a matrix multiplication to get your output corrected pixel. The input RGB pixel from im2 would be X and the output RGB pixel from im1 would be Y. We can extend this to as many pixels as we want, where pairs of pixels from im1 and im2 establish columns along Y and X. In general, this further extends X and Y to 3 x N matrices. To find the matrix A, you would compute the least mean squared error solution. I won't go into the derivation, but finding the optimal A requires the pseudo-inverse. In our case here (as in the code below), A works out to:

A = Y * X^T * (X * X^T)^(-1)
Once you find this matrix A, you would need to take each pixel in your image, shape it so that it becomes a 3 x 1 vector, then multiply A with this vector like the approach above. One thing you're probably asking yourself is what kinds of pixels do I need to grab from both images to make the above approach work? One guideline you must adhere to is that you need to make sure that you're sampling from the same spatial location between the two images. As such, if we were to grab a pixel at... say... row 4, column 9, you need to make sure that both pixels from im1 and im2 come from this same row and same column, and they are placed in the same corresponding columns in X and Y.
Another small caveat with this approach is that you need to be sure that you sample a lot of pixels in the image to get a good solution, and you also need to make sure the spread of your sampling is over the entire image. If we localize the sampling to be within a small area, then you're not getting a good enough distribution of the colours and so the output will not look very nice. It's up to you on how many pixels you choose for the problem, but from experience, you get to a point where the output starts to plateau and you don't see any difference. For demonstration purposes, I chose 2000 pixels in random positions throughout the image.
As such, this is what the code would look like. I use randperm to generate a random permutation from 1 to M where M is the total number of pixels in the image. These generate linear indices so that we can sample from the images and construct our matrices. We then apply the above equation to find A, then take each pixel and apply a matrix multiplication with A to get the output. Without further ado:
close all;
clear all;
im1 = imread('http://i.stack.imgur.com/GtgHU.jpg');
im2 = imread('http://i.stack.imgur.com/wHW50.jpg');
rng(123); %// Set seed for reproducibility
num_colours = 2000;
ind = randperm(numel(im1) / size(im1,3), num_colours);
%// Grab colours from original image
red_out = im1(:,:,1);
green_out = im1(:,:,2);
blue_out = im1(:,:,3);
%// Grab colours from corrupted image
red_in = im2(:,:,1);
green_in = im2(:,:,2);
blue_in = im2(:,:,3);
%// Create 3 x N matrices
X = double([red_in(ind); green_in(ind); blue_in(ind)]);
Y = double([red_out(ind); green_out(ind); blue_out(ind)]);
%// Find A
A = Y*(X.')/(X*X.');
%// Cast im2 to double for precision
im2_double = double(im2);
%// Apply matrix multiplication
out = cast(reshape((A*reshape(permute(im2_double, [3 1 2]), 3, [])).', ...
[size(im2_double,1) size(im2_double,2), 3]), class(im2));
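If that last reshape/permute line is hard to digest, here is a slower but equivalent per-pixel loop (my own illustration, not part of the original answer):
out_loop = zeros(size(im2_double));
for r = 1 : size(im2_double, 1)
    for c = 1 : size(im2_double, 2)
        px = reshape(im2_double(r, c, :), 3, 1);      % 3 x 1 RGB vector from im2
        out_loop(r, c, :) = reshape(A * px, 1, 1, 3); % corrected pixel back in place
    end
end
out_loop = cast(out_loop, class(im2));                % back to the original type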
Let's go through this code slowly. I am reading your images directly from StackOverflow. After that, I use rng to set the seed so that you can reproduce the same results on your end. Setting the seed is useful because it allows you to reproduce the random pixel selection that I did. We generate those linear indices, then create our 3 x N matrices for both im1 and im2.

Finding A is exactly how I described, but you may not be used to the matrix right division operator / (mrdivide). It effectively multiplies whatever is on the left side by the inverse of whatever is on the right side, but in a more efficient and numerically stable way than calculating that inverse separately and then multiplying. In fact, MATLAB will give you a warning stating that you should avoid calculating the inverse separately and use the divide operators instead.

Next, I cast im2 to double to ensure precision, as A will most likely be floating-point valued, then go through the multiplication of each pixel with A to compute the result. That last line of code looks pretty intimidating, but if you want to figure out how I derived it, I used the same idea to create vintage-style photos, which also requires a matrix multiplication much like this approach; you can read about it here: How do I create vintage images in MATLAB? . out stores our final image. After running this code and showing what out looks like, this is what we get:
Now, the output looks completely scrambled, but the colour distribution more or less mimics what the input original image looks like. I have a few explanations on why this is the case:
There is quantization noise. If you take a look at the final image, there are various white spots all over. This is probably due to the quantization error introduced when compressing your image. Pixels that should map to the same colours between the images have slight variations due to quantization, which gives us that spotting.
There is more than one colour from im2 that maps to im1. If a single colour in im2 needs to map to more than one colour in im1, a linear multiplication with the matrix A cannot generate more than one output colour for the same input pixel. Instead, the least mean-squared solution will try to generate a colour that minimizes the error, giving you the best colour possible instead. This is probably why the face and other fine details of the image are obscured.
The image is noisy. Your im2 is not completely clean. I can also see various spots of salt-and-pepper noise across all of the channels. One drawback of this method is that if your image is subject to noise, it will not faithfully reconstruct the original image; it can only undo a wrong mapping of colours. Should any other type of image noise be introduced, this method will definitely not work, as you are trying to reconstruct the original image from a noisy one. There are pixels in the noisy image that were never present in the original image, so you'll have no luck getting it back to the way it was before!
If you want to take a look at the histograms of each channel between the original image and the output image, this is what we get:
The code I used to generate the above figure was:
names = {'Red', 'Green', 'Blue'};
figure;
for idx = 1 : 3
subplot(3,2,2*idx - 1);
imhist(im1(:,:,idx));
title([names{idx} ': Image 1']);
end
for idx = 1 : 3
subplot(3,2,2*idx);
imhist(out(:,:,idx));
title([names{idx} ': Output']);
end
The left side shows the red, green and blue histograms for the original image while the right side shows the same histograms for the reconstructed image. You can see that the general shape more or less mimics the original image, but there are some spikes throughout - most likely attributed to quantization noise and the non-unique mapping between colours of both images.
All in all, this is the best that I could do, but I think that was the whole point of the exercise.... to show that it isn't possible.
For more information on how to perform colour correction, check out Richard Alan Peters' II Digital Image Processing slides on colour correction. This was what I started with, and the derivation of how to calculate A can be found in his slides. Perhaps you can use some of what he talks about in your future work.
Good luck!
It seems that you need a scaling function to map the values of im2 to the values of im1.
This is fairly simple and you could write a scaling function to have it available for any such case.
A basic scaling mapping would work as follows:
out_value = min_output + (in_value - min_input) * (outrange / inrange)
given that there is an input value in_value within a range of values inrange = max_input - min_input, and the mapping produces an output value out_value within a range outrange = max_output - min_output. We also need to take into account the minimum input and output range bounds (min_input and min_output) to have a correct mapping.
See for example the following code for a scaling function:
%
% scale the values of a matrix using a set of limits
% possible ways to use:
% y = scale( x, in_range, out_range) --> ex. y = scale( x, [8 230], [0 255])
% y = scale( x, out_range) --> ex. y = scale( x, [0 1])
%
function y = scale( x, varargin );
if nargin<2,
error([upper(mfilename),':: Syntax: y=',mfilename,'(x[,in_range],out_range)']);
end;
if nargin==2,
inrange=[min(x(:)) max(x(:))]; % compute the limits of the input variable
outrange=varargin{1}; % get the output limits from the arguments
else
inrange=varargin{1}; % get the input limits from the arguments
outrange=varargin{2}; % get the output limits from the arguments
end;
if diff(inrange)==0, % input has zero range (all values equal)
% just do a clipping...
if x>=outrange(2),
y=outrange(2);
elseif x<=outrange(1),
y=outrange(1);
else
y=x;
end;
else
% actually scale the data
% using: out = min_output + (x-min_input) * (outrange / inrange)
y = outrange(1) + (x-inrange(1))*abs(diff(outrange))/abs(diff(inrange));
end;
This function gets a matrix of values and scales them to a desired range.
In your case it could be used as follows (variable img is the scaled im2):
for i=1:size(im1,3), % for each of the input/output image channels
output_range = [min(min(im1(:,:,i))) max(max(im1(:,:,i)))];
img(:,:,i) = scale( im2(:,:,i), output_range);
end;
This way im2 is scaled to the range of values of im1 one channel at a time. Output variable img should be the desired one.
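As a side note (my own addition, not part of the answer above): in newer MATLAB releases (R2017b and later) the same per-channel mapping can be written with the built-in rescale function:
for i = 1:size(im1,3)
    lo = double(min(min(im1(:,:,i))));          % target range taken from im1
    hi = double(max(max(im1(:,:,i))));
    img(:,:,i) = rescale(double(im2(:,:,i)), lo, hi);
end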

Implementing a neural network for smile detection in images

Let me explain the background to the problem before I explain the problem itself. My task is to take in an image that is labelled as containing a smile or not. The files are labelled, for example, 100a.jpg and 100b.jpg, where 'a' is used to represent an image without a smile and 'b' is used to represent an image with a smile. As such, I'm looking to make a 3-layered network, i.e. Layer 1 = input nodes, Layer 2 = hidden layer and Layer 3 = output node.
The general algorithm is to:
Take in an image and resize it to 24x20.
Apply a forward propagation from the input nodes to the hidden layer.
Apply a forward propagation from the hidden layer to the output node.
Then apply a backward propagation from the output node to the hidden layer. (Formula1)
Then apply a backward propagation from the hidden layer to the input nodes. (Formula2)
Formula 1:
Formula 2:
Now the problem, quite simply, is that my code never converges, and as such I don't have weight vectors that can be used to test the network. The problem is I HAVE NO CLUE WHY THIS IS HAPPENING... Here is the error I display; it is clearly not converging:
Training done full cycle
0.5015
Training done full cycle
0.5015
Training done full cycle
0.5015
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Here is my matlab code:
function [thetaLayer12,thetaLayer23]=trainSystem()
%This is just the directory where I read the images from
files = dir('train1/*jpg');
filelength = length(files);
%Here I create my weights between input layer and hidden layer and then
%from the hidden layer to the output node. The reason the value 481 is used
%is because there will be 480 input nodes + 1 bias node. The reason 200 is
%used is for the number of hidden layer nodes
thetaLayer12 = unifrnd (-1, 1 ,[481,200]);
thetaLayer23 = unifrnd (-1, 1 ,[201,1]);
%Learning Rate value
alpha = 0.00125;
%Initialize Convergence Error
globalError = 100;
while(globalError > 0.001)
globalError = 0;
%Run through all the files in my training set. 400 Files to be exact.
for i = 1 : filelength
%Here we find out if the image has a smile in it or not. If there
%Images are labled 1a.jpg, 1b.jpg where images with an 'a' in them
%have no smile and images with a 'b' in them have a smile.
y = isempty(strfind(files(i).name,'a'));
%We read in the image
imageBig = imread(strcat('train1/',files(i).name));
%We resize the image to 24x20
image = imresize(imageBig,[24 20]);
%I then take the 2D image and map it to a 1D vector
inputNodes = reshape(image,480,1);
%A bias value of 1 is added to the top of the vector
inputNodes = [1;inputNodes];
%Forward propagation is applied between the input layer and the
%hidden layer
outputLayer2 = logsig(double(inputNodes')* thetaLayer12);
%Here we then add a bias value to hidden layer nodes
inputNodes2 = [1;outputLayer2'];
%Here we then do a forward propagation from the hidden layer to the
%output node to obtain a single value.
finalResult = logsig(double(inputNodes2')* thetaLayer23);
%Backward propagation is then applied to the weights between the
%output node and the hidden layer.
thetaLayer23 = thetaLayer23 - alpha*(finalResult - y)*inputNodes2;
%Backward propagation is then applied to the weights between the
%hidden layer and the input nodes.
thetaLayer12 = thetaLayer12 - (((alpha*(finalResult-y)*thetaLayer23(2:end))'*inputNodes2(2:end))*(1-inputNodes2(2:end))*double(inputNodes'))';
%I sum the error across each iteration over all the images in the
%folder
globalError = globalError + abs(finalResult-y);
if(i == 400)
disp('Training done full cycle');
end
end
%I take the average error
globalError = globalError / filelength;
disp(globalError);
end
end
Any help would seriously be appreciated!!!!
The success of training any machine learning algorithm is heavily dependent on the number of training examples you use to train it. You never said exactly how many training examples you have, but in the case of face detection a huge number of examples would probably be needed (if it would work at all).
Think of it this way: a computer scientist shows you two arrays of pixel intensity values. He tells you which one has a smile in it and which does not. Then he shows you two more and asks you to tell him which one has a smile in it.
Fortunately, we can work around this to some extent. You can use an autoencoder or a dictionary learner such as sparse coding to find higher-level structure in the data. Instead of the computer scientist showing you pixel intensities, he could show you edges or even body parts. You could then use these as input to your neural network, but a significant number of training examples would probably still be needed (though fewer than before).
That analogy was inspired by a talk given by Professor Ng of Stanford on unsupervised feature learning.
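For illustration only, a rough sketch of that idea in MATLAB (this assumes the Deep Learning Toolbox and a matrix X whose columns are the 480-element vectorised training images; the hidden size is an arbitrary choice):
hiddenSize = 100;                           % arbitrary choice for illustration
autoenc = trainAutoencoder(X, hiddenSize);  % learn higher-level features, unsupervised
features = encode(autoenc, X);              % hiddenSize x N matrix of learned features
% 'features' could then replace the raw pixel intensities fed into the network above.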