I'm working on an optical character recognition project where I am trying to create a program that will recognize alphabetic letters from an image. I'm following the Digit Classification tutorial on MathWorks. In their example, the training images are already separated. Unfortunately, I was provided with training images which contain hundreds of letters in a single file.
Here is a sample:
I need an efficient way to segment each individual letter into its own image, so I would have a 26 x n array where 26 corresponds to the letters of the alphabet and n is the number of individual letter images per letter. It would be extremely tedious to manually segment letters from each training image, or to attempt to segment letters at a fixed spacing, since the separation between letters isn't always equal.
Does anyone know of a MATLAB function or a simple way to identify the height and length of every continuous white-colored object and store all the individual white objects with their black background in the 26 x n array described above (or at least in some type of array that I can later process into the 26 x n array)?
If you want to extract every individual character in your image, you can very easily do that with regionprops. Simply use the BoundingBox attribute to extract the bounding box surrounding each character. After you do this, we can place each character in a cell array for further processing. If you want to store this into a 26 x N array, you would need to recognize what each letter is first so that you can choose the slot in the first dimension where the letter is supposed to go. Because you want to segment out the characters first, we will focus on that. As such, let's load the image into MATLAB. Note that the original image was in GIF and when I loaded it on my computer, it looked pretty messed up. I've resaved the image as PNG and it's shown below:
Let's read this into MATLAB:
im = imread('http://i.stack.imgur.com/q7cnA.png');
Now, you may notice that there are some discontinuities between some letters. What we can do is perform a morphological closing to close these gaps. However, we aren't going to use this image to extract what the actual characters are; we are only using it to get the bounding boxes for the letters:
se = strel('square', 7);
im_close = imclose(im, se);
Now, you'd call regionprops like this to find all of the bounding boxes in the image (after applying morphology):
s = regionprops(im_close, 'BoundingBox');
What is returned in s is a structure where each element in this structure contains a bounding box that encapsulates an object detected in the image. In our case, this is a single character. The BoundingBox property for each object is a 4 element array that is formatted like so:
[x y w h]
(x,y) are the column and row co-ordinates of the upper left corner of the bounding box and w and h are the width and height of the bounding box. What we will do next is create a 4 column matrix that encapsulates all of these bounding box properties together, where each row denotes a single bounding box:
bb = round(reshape([s.BoundingBox], 4, []).');
It's necessary to round the values because if you want to extract the letters from the image, you have to do this with integer co-ordinates, as that is how the image is naturally defined. If you want a good illustration of these bounding boxes, the code below will draw a red box around each character we have detected:
imshow(im);
for idx = 1 : numel(s)
rectangle('Position', bb(idx,:), 'edgecolor', 'red');
end
This is what we get:
The final job is to extract all of the characters and place them into a cell array. I'm using a cell array because the character sizes are uneven, so putting this into a cell array will accommodate for the different sizes. As such, simply loop over every bounding box we have, then extract the bounding box of pixels to get each character and place it into a cell array. Therefore:
chars = cell(1, numel(s));
for idx = 1 : numel(s)
chars{idx} = im(bb(idx,2):bb(idx,2)+bb(idx,4)-1, bb(idx,1):bb(idx,1)+bb(idx,3)-1);
end
If you want a character, simply do ch = chars{idx}; where idx is any number from 1 to as many characters as we have. You can also see what this character looks like by doing imshow(ch);
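If you want a quick visual check of everything you segmented, you can tile the characters into one figure. This is just a throwaway visualization loop over the chars cell array from above; the grid width of 10 is an arbitrary choice:
figure;
nChars = numel(chars);
nCols = 10; % arbitrary grid width
for idx = 1 : nChars
subplot(ceil(nChars / nCols), nCols, idx);
imshow(chars{idx});
end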
This should hopefully give you enough to get started. Good luck!
You are looking for bwlabel to label each letter in your image.
Another tool you might find useful is regionprops, especially the 'Image' property.
In your comment you state that you struggle with the order in which Matlab labels your regions: Matlab's matrices and images are stored in memory in column-major order (that is, column after column), thus it "discovers" new components in the binary image from top to bottom and from left to right. In order to explore the image row by row from top to bottom instead, you might want to consider transposing the image:
[ind, map] = imread('http://i.gyazo.com/0ca8d4416a52b8bc3401da0b71a527fd.gif'); % read indexed image
BW = max( ind2rgb(ind, map), [], 3 ) > .15; % convert RGB image to binary mask
seg = regionprops( BW.', 'Image' ); % transpose input mask
seg = arrayfun( @(x) x.Image.', seg, 'Uni', 0 ); % flip each letter back
Now your separated letters are in the cells of the cell array seg.
Note that by providing regionprops with a binary input, you do not need to explicitly call bwlabel.
I'm an undergrad student working in a cell biology lab with a basic background in MATLAB. I'm working on a project of tracking cell trajectories (time lapse) on a petri dish. Below are two example images where I used the watershed feature to separate the cells from the background. The original pictures had neon green cells; now this is all in black and white.
Let's say I have 20 pictures like this; how might I superimpose one on top of another so they all have equal transparency?
Then, how can I add a colormap that represents time? (The bottom-most picture is one end of the colormap and the most recent picture is the opposite end.) This is extremely challenging, as MATLAB often treats the background as black rather than NaN.
The Basic Idea
Probably the easiest way to do this is to take the binary image for each layer and multiply it by the time at which it was acquired (or its index in time). Then you can concatenate all images along the third dimension (using cat). You can compute the maximum value along the third dimension using max. This will make the newer time points appear to be "on top" of the older time points. You can then display the resulting flattened matrix using imagesc and it will automatically map to the colormap for the current figure. Typically we would refer to this as a maximum intensity projection.
Creating Some Data
First, since you've only provided two images, I'm going to create some shifted versions of the first one for the demonstration.
% Create some pseudo-data in a cell array that represents the image over time
im = imread('http://i.imgur.com/xTurvfO.jpg');
im = im(:,:,1);
ims = cell(1, 5);
% Create some shifted versions of im
shifts = round(linspace(0,1000,5));
for k = 1:numel(shifts)
ims{k} = circshift(im > 100, shifts([k k]));
end
Implementing the Method
Now for the application of the method I discussed
% For each image, multiply the binary mask by the time
for k = 1:numel(ims)
ims{k} = ims{k} * k;
end
% Concatenate all images along the third dimension
IMS = cat(3, ims{:});
% Flatten by taking the maximum value along the third dimension
MIP = max(IMS, [], 3);
% Display the resulting flattened image using imagesc
imagesc(MIP);
% Create a custom colormap with black at the end to create our black background
colormap(cat(1, [0 0 0], parula))
The Result
I have used imfuse to create composite images, which is similar to combining multiple channels on a fluorescent microscope. The Mathworks documentation is http://www.mathworks.com/help/images/ref/imfuse.html.
The tricky part is choosing the vector for the color channels. The vector is ordered [R G B], and each element selects which image fills that channel. For example, [2,1,2] assigns G(reen) to image 1 and R(ed) and B(lue) to image 2, giving a green-magenta composite. [2,1,2] is the scheme recommended for colorblind viewers and gives the figure on the left of this image. Using [1,0,2] for red/blue gives the figure on the right.
fig1 = imread([basepath filesep 'fig.jpg']); %white --> black
fig2 = imread([basepath filesep 'fig2.jpg']);
fig_overlay = imfuse(fig1, fig2,'falsecolor','Scaling','joint', 'ColorChannels', [1,0,2]);
imshow(fig_overlay)
I have a segmented image. I wish to extract the middle pixel(s) of each segmentation. The goal is to extract the mean color from the middle pixel.
The following diagram illustrates what I mean by 'middle pixel':
The alternative middle pixels are also acceptable.
What algorithms/functions are available in Matlab to achieve something similar to this? Thanks.
If I'm understanding what you want correctly, you're looking for the centroid. MATLAB has the regionprops function, which measures the properties of separate binary objects.
You can use the Centroid property. Assuming your image is stored in im and is binary, something like this will do:
out = regionprops(im, 'Centroid');
The output will be a structure array of N elements where N corresponds to the total number of objects found in the image. To access the ith object's centroid, simply do:
cen = out(i).Centroid;
If you wish to collect all centroids and place them perhaps in a N x 2 numeric array, something like this would work:
out = reshape([out.Centroid], 2, []).';
Each row would be the centroid of an object found in the image. Take note that an object is considered to be a blob of white pixels that are connected to each other.
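If the end goal is the mean colour at each of those middle pixels, a minimal sketch is to round the centroids to integer co-ordinates and index into the original colour image. Note that rgb here is an assumed variable holding the colour image your segmentation came from:
cen = round(out); % integer (x,y) co-ordinates, one row per object
colours = zeros(size(cen,1), 3); % one RGB triplet per object
for i = 1 : size(cen,1)
colours(i,:) = squeeze(rgb(cen(i,2), cen(i,1), :)).'; % Centroid is (x,y), so the column index comes first
end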
I have an image that was read in using the imread function. My goal is to collect pairs of pixels in an image in MATLAB. Specifically, I have read a paper, and I am trying to recreate the following scenario:
First, the original image is grouped into pairs of pixel values. A pair consists of two neighboring pixel values or two with a small difference value. The pairing could be done horizontally by pairing the pixels on the same row and consecutive columns, or vertically, or by a key-based specific pattern. The pairing could be through all pixels of the image or just a portion of it.
I am looking to recreate the horizontal pairing scenario. I'm not quite sure how I would do this in MATLAB.
Assuming your image is grayscale, we can easily generate a 2D grid of co-ordinates using ndgrid. We can use these to create one grid, then shift the horizontal co-ordinates to the right to make another grid and then use sub2ind to convert the 2D grid into linear indices. We can finally use these linear indices to create our pixel pairings that you have described in your comments (you should really add that to your post BTW). What's important is that you need to skip over every other column in a row to ensure unique pixel pairings.
I'm also going to assume that your image is grayscale. If we go to colour, this will be slightly more complicated, and I'll leave that to you as a learning exercise. Therefore, assuming your image was read in through imread and is stored in im, do something like this:
[rows,cols] = size(im);
[X,Y] = ndgrid(1:rows,1:2:cols);
ind = sub2ind(size(im), X, Y);
ind_shift = sub2ind(size(im), X, Y+1);
pixels1 = im(ind);
pixels2 = im(ind_shift);
pixels = [pixels1(:) pixels2(:)];
pixels will be a 2D array, where each row gives you the pixel intensities of a particular pairing in the image. Bear in mind that I processed each row independently. As such, as soon as we are done with one row, we simply move on to the next row and continue the procedure. This also assumes that your image has an even number of columns. Should it not, you have a decision to make. You need to either pad the image with one column at the end, and this column can be anything you want, or you can remove this column from the image before processing. If you want to fill in this column, you can either make it all zeroes, or perhaps replicate the last column and place this beside the last column in the original image. Therefore, an appropriate pre-processing step may look something like this:
if mod(size(im,2),2) ~= 0
im = im(:,1:end-1);
end
The above code simply removes the last column in the image if the number of columns is odd. Once you run through this code, you can run the first bit of code that I had above.
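If you'd rather pad than remove, the column-replication option I mentioned is just as short; as before, im is the image read in through imread:
if mod(size(im,2),2) ~= 0
im = [im im(:,end)]; % replicate the last column to make the width even
end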
Good luck!
I have a (probably naïve) question; I just wanted to clarify this part. When I run dsift on one image, I generally get a 128xn matrix. Thing is, that n value is not always the same across different images. Say image 1 gets a 128x10 matrix, while image 2 gets a 128x18 matrix. I am not quite sure why this is happening.
I think that each 128-dimensional column represents a single image feature, or a single patch detected in the image. So in the case of 128x18, we have extracted 18 patches and described each of them with 128 values. If this is correct, why can't we have a fixed number of patches per image, say 20, so that our matrices would always be 128x20?
Cheers!
This is because the number of reliable features detected per image changes. Just because you detect 10 features in one image does not mean that you will be able to detect the same number of features in another image. What matters is how well a feature from one image matches a feature from the other.
What you can do (if you like) is extract the, say, 10 most reliable features that are matched the best between the two images, if you want to have something constant. Choose a number that is less than or equal to the minimum of the number of patches detected between the two. For example, supposing you detect 50 features in one image, and 35 features in another image. After, when you try and match the features together, this results in... say... 20 best matched points. You can choose the best 10, or 15, or even all of the points (20) and proceed from there.
I'm going to show you some example code to illustrate my point above, but bear in mind that I will be using vl_sift and not vl_dsift. The reason why is because I want to show you visual results with minimal pre- and post-processing. Should you choose to use vl_dsift, you'll need to do a bit of work before and after you compute the features by dsift if you want to visualize the same results. If you want to see the code to do that, you can check out the vl_dsift help page here: http://www.vlfeat.org/matlab/vl_dsift.html. Either way, the idea about choosing the most reliable features applies to both sift and dsift.
For example, supposing that Ia and Ib are uint8 grayscale images of the same object or scene. You can first detect features via SIFT, then match the keypoints.
[fa, da] = vl_sift(im2single(Ia));
[fb, db] = vl_sift(im2single(Ib));
[matches, scores] = vl_ubcmatch(da, db);
matches is a 2 x N matrix where, for each column, the first row gives the index of a feature in the first image and the second row gives the index of its best match in the second image.
Once you do this, sort the scores in ascending order. Lower scores mean better matches as the default matching method between two features is the Euclidean / L2 norm. As such:
numBestPoints = 10;
[~,indices] = sort(scores);
%// Get the numBestPoints best matched features
bestMatches = matches(:,indices(1:numBestPoints));
This should then return the 10 best matches between the two images. FWIW, your understanding about how the features are represented in vl_feat is spot on. These are stored in da and db. Each column represents a descriptor of a particular patch in the image, and it is a histogram of 128 entries, so there are 128 rows per feature.
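As a quick sanity check of that layout, you can inspect the descriptor matrix directly:
size(da, 1) % 128 - the length of each SIFT descriptor
size(da, 2) % the number of detected keypoints, which varies from image to image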
Now, as an added bonus, if you want to display how each feature from one image matches to another image, you can do the following:
%// Spawn a new figure and show the two images side by side
figure;
imagesc(cat(2, Ia, Ib));
%// Extract the (x,y) co-ordinates of each best matched feature
xa = fa(1,bestMatches(1,:));
%// CAUTION - Note that we offset the x co-ordinates of the
%// second image by the width of the first image, as the second
%// image is now beside the first image.
xb = fb(1,bestMatches(2,:)) + size(Ia,2);
ya = fa(2,bestMatches(1,:));
yb = fb(2,bestMatches(2,:));
%// Draw lines between each feature
hold on;
h = line([xa; xb], [ya; yb]);
set(h,'linewidth', 1, 'color', 'b');
%// Use VL_FEAT method to show the actual features
%// themselves on top of the lines
vl_plotframe(fa(:,bestMatches(1,:)));
fb2 = fb; %// Make a copy so we don't mutate the original
fb2(1,:) = fb2(1,:) + size(Ia,2); %// Remember to offset like we did before
vl_plotframe(fb2(:,bestMatches(2,:)));
axis image off; %// Take out the axes for better display
I have a MATLAB code that draws bounding boxes around each letter.
I would like to draw these boxes around each word, instead of each character.
I had thought of
reading the size of each space between words and based on that, separating each word.
grouping adjacent rectangles into larger rectangles, which would essentially do the same thing for me.
How would this be done?
Here is the image so far:
http://imgur.com/iDF5VD4
Here is my code so far:
%CLEAR EVERYTHING
clear all;
close all;
%SET FOLDER AND FILE LOCATION
folder = 'H:\Miscellaneous\Work\Project';
baseFileName = 'lorem-ipsum.jpg';
fullFile = fullfile(folder, baseFileName);
%CONVERT TO GRAYSCALE
normal = imread(fullFile);
gray = rgb2gray(normal);
%CONVERT TO BINARY IMAGE
binary = im2bw(gray);
%INVERT IMAGE
binary = ~binary;
%FILL HOLES
ifill=imfill(binary,'holes');
figure,imshow(ifill)
%COUNT LETTER IN TEXT
[Ilabel num]=bwlabel(ifill);
disp(num)
%CALCULATE REGION PROPERTIES
Iprops=regionprops(Ilabel);
%SET BOX PROPERTIES INTO VARIABLE
Ibox=[Iprops.BoundingBox];
%RESHAPE 1-D ARRAY
Ibox=reshape(Ibox,[4 num]);
%DRAW BOUNDING BOXES FOR EACH LETTER
for cnt=1:num
rectangle('position',Ibox(:,cnt),'edgecolor','r');
end
hold off
I think you've got a good idea. You group letter boxes into words, then compute the bounding box of each group.
In your particular example, you can do this very quickly with morphological closing. I won't explain here how to compute the word spacing; you only need the character spacing, which is a parameter of the font used. I call this parameter Sp. In your image, Sp seems to be approximately 4 pixels.
So, first take your binary image; note that filling the holes is very useful here. With morphological closing, you can work directly on the letters; there is no need to work with their bounding boxes.
binclosed = imclose(binary, strel('rectangle',[2 ceil(Sp/2)]));
Here I close with a rectangle of height 2 in order to, for example, catch the dots of the 'i's.
Then you can label the connected components and draw their bounding boxes as you have done for characters.
[Ilabel,num] = bwlabel(binclosed);
Iprops = regionprops(Ilabel, 'BoundingBox');
Ibox = reshape([Iprops.BoundingBox],[4 num]);
for cnt=1:num
rectangle('position',Ibox(:,cnt),'edgecolor','r');
end
That's actually pretty simple to do. Draw the rectangles onto a binary image, then fill in all of the rectangles. After that, do a binary morphological closing with a structuring element that is large enough to bridge the gap between two characters. When you do that, you will have a single mask for each word. You can then use bwlabel to extract an ID for each complete word. Once you have the IDs, you can iterate through them and create individual masks for each of the words.
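For instance, here is a minimal sketch of that pipeline, reusing ifill, Ibox and num from the question's code. The closing width of 7 pixels is a guess that you would tune to your character spacing:
mask = false(size(ifill));
for cnt = 1 : num
b = round(Ibox(:,cnt)); % [x y w h] for one letter
mask(b(2):b(2)+b(4)-1, b(1):b(1)+b(3)-1) = true; % fill in the letter's box
end
mask = imclose(mask, strel('rectangle', [1 7])); % bridge the gaps within each word
[wordLabel, numWords] = bwlabel(mask); % one label per complete word
From wordLabel you can call regionprops with 'BoundingBox' again and draw one rectangle per word, exactly as the question's code does per letter.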