Count identical elements near each other in matrix - matlab

Consider a matrix like
A = 0 1 0 1
1 1 0 0
0 0 0 0
1 1 1 1
I would like to calculate the average size of each cluster of 1's. I define a cluster as occurring when two or more 1's are near each other, i.e. next to or above/below. E.g., in this matrix there is a cluster of size 3 in the top left-hand corner and a cluster of size 4 in the bottom row.
I need a way to extract this information in a non-visual way because I need to do this many times for different A.

You may want to use bwlabel which isolates the connected components (clusters of 1) in your binary matrix.
A = [0 1 0 1
1 1 0 0
0 0 0 0
1 1 1 1 ];
[L,n] = bwlabel(A,8) % for an 8-pixel stencil
% (i.e. horizontal/vertical/diagonal first neighbors)
or
[L,n] = bwlabel(A,4) % for a 4-pixel stencil
% (just horizontal & vertical neighbors)
L = 0 1 0 3
1 1 0 0
0 0 0 0
2 2 2 2
Doing so, you obtain a matrix L which labels the n different connected components.
Then you may want to extract some statistics; for instance you may want to histogram the size of the clusters.
cluster_size = hist(L(:),0:n);
cluster_size = cluster_size(2:end); % histogram of component vs. size
% (without zeros)
hist(cluster_size) % histogram of sizes
which tells you that you have one cluster of 1 element, one cluster of 3 and one cluster of 4.
Finally, if you are looking for the average size of the clusters, you can do
mean(cluster_size)
2.6667
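Note that the question defines a cluster as two or more 1's, whereas the mean above also counts the isolated 1. A minimal sketch of one way to exclude singletons, assuming 4-connectivity and the Image Processing Toolbox; the names sizes, clusters and avgSize are introduced here for illustration only:

```matlab
A = [0 1 0 1
     1 1 0 0
     0 0 0 0
     1 1 1 1];
[L, n] = bwlabel(A, 4);            % label connected components (toolbox function)
sizes = accumarray(L(L > 0), 1)';  % number of pixels carrying each label: [3 4 1]
clusters = sizes(sizes >= 2);      % keep only groups of two or more 1's
avgSize = mean(clusters);          % 3.5 for this matrix
```

Whether singletons should count is a modelling choice; dropping the filter on sizes reproduces the 2.6667 above.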

Related

Count length and frequency of island of consecutive numbers

I have a sequence of ones and zeros and I would like to count how often islands of consecutive ones appear.
Given:
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1]
By counting the islands of consecutive ones I mean this:
R = [4 3 1]
…because there are four single ones, three double ones and a single triplet of ones.
Multiplying these counts by the lengths of the islands [1 2 3] gives
[4 3 1] * [1 2 3]' = 13
which corresponds to sum(S), because there are thirteen ones.
I hope to vectorize the solution rather than loop something.
I came up with something like:
R = histcounts(diff( [0 (find( ~ (S > 0) ) ) numel(S)+1] ))
But the result does not make much sense. It counts too many triplets.
All pieces of code I find on the internet revolve around diff([0 something numel(S)]) but the questions are always slightly different and don’t really help me
Thankful for any advice!
The following should do it. Hopefully the comments are clear.
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% use diff to find the rising and falling edges, padding the start and end with 0
edges = diff([0,S,0]);
% get a list of the rising edges
rising = find(edges==1);
% and falling edges
falling = find(edges==-1);
% and thereby get the lengths of all the runs
SRuns = falling - rising;
% The longest run
maxRun = max(SRuns);
% Finally make a histogram, putting the bin centres at 1:maxRun
R = hist(SRuns,1:maxRun);
You could also obtain the same result with:
x = find(S==1)-(1:sum(S)) % give each group of 1s its own value
h = histc(x,x) % compute the length of each group, you can also use histc(x,unique(x))
r = histc(h,1:max(h)) % count the occurrences of each length
Result:
r =
4,3,1
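Since hist and histc are discouraged in newer MATLAB releases, the same edge-detection idea can be sketched with accumarray instead. This is only an illustration of the approach above, not a replacement for either answer; runLengths is a name introduced here:

```matlab
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
d = diff([0, S, 0]);                           % +1 at each rising edge, -1 at each falling edge
runLengths = find(d == -1) - find(d == 1);     % length of every island of ones
R = accumarray(runLengths(:), 1)';             % R(k) = number of islands of length k
check = R * (1:numel(R))';                     % should equal sum(S)
```

For this S the result is R = [4 3 1], and check equals sum(S) = 13.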

Creating gray-level co-occurrence matrix from 16-bit image

I have a data set of images that are 16-bit and I want to create GLCM matrix from them to extract GLCM features.
However, the resulting matrix shows one value (as shown in the picture below), I wonder why.
I tried using the same image but converted to 8-bit, the resulted GLCM show several values.
Note: I used the following Matlab function:
glcm_matrix = graycomatrix(imread('image.tif'));
Here is a cropped sample from the 16-bit image:
Note: The image used in the computations can be downloaded from here. The original image is very low contrast and looks totally dark. The image shown above has its contrast stretched and is intended only for visualization purposes.
EDIT:
I used
glcm_matrix = graycomatrix(imread('image.tif'), 'GrayLimits', []);
and it gives me the following results:
It was a binning/scaling problem.
Let's take a peek inside:
edit graycomatrix
In this case we're interested in the two options, 'NumLevels' and 'GrayLimits'
% 'NumLevels' An integer specifying the number of gray levels to use
% when scaling the grayscale values in I. For example,
% if 'NumLevels' is 8, GRAYCOMATRIX scales the values in
% I so they are integers between 1 and 8. The number of
% gray levels determines the size of the gray-level
% co-occurrence matrix (GLCM).
%
% 'NumLevels' must be an integer. 'NumLevels' must be 2
% if I is logical.
%
% Default: 8 for numeric
% 2 for logical
%
% 'GrayLimits' A two-element vector, [LOW HIGH], that specifies how
% the values in I are scaled into gray levels. If N is
% the number of gray levels (see parameter 'NumLevels')
% to use for scaling, the range [LOW HIGH] is divided
% into N equal width bins and values in a bin get mapped
% to a single gray level. Grayscale values less than or
% equal to LOW are scaled to 1. Grayscale values greater
% than or equal to HIGH are scaled to NumLevels. If
% 'GrayLimits' is set to [], GRAYCOMATRIX uses the
% minimum and maximum grayscale values in I as limits,
% [min(I(:)) max(I(:))].
So in other words the function was binning your data into 8 gray levels (hence the 8x8 matrix) and assuming that the scaling range was the full uint16 range (0-65535). However, the sample image I that you gave has a minimum of 305 and a maximum of 769, so every pixel falls into the first bin (0-8192 or so). When I call A = graycomatrix(I) it gives me the following matrix:
A =
6600 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
However, when A = graycomatrix(I,'GrayLimits', []) is called, the scaling range is taken as [min(I(:)) max(I(:))], and the function works as expected:
A =
4 2 1 0 0 0 0 0
1 1 2 2 0 0 0 0
2 2 4 7 1 0 0 0
0 1 7 142 72 1 0 0
0 0 0 65 1711 252 0 0
0 0 0 0 230 3055 178 0
0 0 0 0 0 178 654 8
0 0 0 0 0 0 8 9
In your original example the single value is in the middle of the 8x8 matrix, most likely because your original images are int16 and not uint16: the default scaling range then covers the whole int16 range, negative values included, so small positive intensities land in a middle bin.
You can also of course scale the original images to fit their datatypes. For example percentile scaling might be a good idea if you expect outliers etc.
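To make the binning argument concrete, here is a rough sketch of the scaling described in the help text above. toLevel is a hypothetical helper written for this illustration; it mimics the documented behaviour and is not the toolbox source.

```matlab
% Hypothetical helper: map value v to a gray level 1..N given limits [LOW HIGH]
toLevel = @(v, lims, N) min(max(floor((double(v) - lims(1)) ./ (lims(2) - lims(1)) * N) + 1, 1), N);

v = [305 769];                               % min and max of the sample image
defaultLevels = toLevel(v, [0 65535], 8);    % full uint16 range: both map to level 1
scaledLevels  = toLevel(v, [305 769], 8);    % image's own range: levels 1 and 8
```

With the full uint16 range both extremes collapse into level 1, which is why the GLCM had a single non-zero entry.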
I'd just like to build on @Tapio's excellent answer.
The GLCM yielded by graycomatrix when you use the name/value pair 'GrayLimits', [] in the function call looks good. However, this approach might not be valid for your application. If you compute the GLCMs for a set of images in this way, the same elements of two different GLCMs corresponding to two different images are likely to have a different meaning. Indeed, as the intensity is being rescaled differently for each image, the components of the GLCM are actually encoding different co-occurrences from one image to another.
To avoid this you could first calculate the minimum and maximum intensities over the whole image dataset (for example minImgs and maxImgs) and then use those values to rescale the intensity of all the images that make up the dataset in the exact same way:
glcm_matrix = graycomatrix(image_tif, 'GrayLimits', [minImgs maxImgs]);
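One way minImgs and maxImgs might be obtained is sketched below. The cell array imgs is a hypothetical in-memory stand-in for the dataset (in practice each entry would come from imread), and graycomatrix requires the Image Processing Toolbox.

```matlab
% Hypothetical two-image dataset, used only for illustration
imgs = {uint16([305 400; 500 769]), uint16([310 650; 420 700])};

% Global intensity limits over the whole dataset
minImgs = min(cellfun(@(im) double(min(im(:))), imgs));
maxImgs = max(cellfun(@(im) double(max(im(:))), imgs));

% Every image is now scaled with the same limits, so GLCM entries are comparable
glcms = cellfun(@(im) graycomatrix(im, 'GrayLimits', [minImgs maxImgs]), ...
                imgs, 'UniformOutput', false);
```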

How to label data index with count using 3D histogram in Matlab

I have a set of data points (around 20000) with their x,y values and I want to remove the points that are not very close to other points. I am trying to approach this by 'digitizing', and I think the closest way to implement it in Matlab is a 3D histogram, so that I can remove the points in the low-count bins. I used hist3() but the problem is that I couldn't get the index of the points labeled with counts (like the output 'ind' from histc()). The only way I can think of is a nested for loop, which is the last thing I want to try. Is there any way I can label the point indices, or any other approach to do this?
Thanks
I feel like I need some clarification
I have the histogram graph from the data generated by @rayryeng
Some bins have N=0 or N=1, so I want to remove the data in these bins.
For histc() there is a form of output [bincounts,ind] = histc( ) where ind returns the bin numbers the data falls into. So I can find the index of the bins whose count is less than or equal to 1 (or larger than 1), then find the data in those particular bins. Is there any similar thing I can do for 2D inputs?
Thanks Again
hist3 should be able to accomplish this for you. I'm not quite sure where the problem is. You can call hist3 like so:
[N,C] = hist3(X);
This will automatically partition your dataset into a 10 x 10 grid of equally spaced containers. You can override this behaviour by doing:
[N,C] = hist3(X, NBINS);
NBINS is a 2 element array where the first element tells you how many bins you want vertically and the second element is how many bins you would like horizontally.
N will tell you how many elements fall within each location of the grid and C will give you a 1 x 2 cell array where the first element of the cell array gives you the X co-ordinates of each centre of the bin while the second element of the cell array gives you the Y co-ordinates of each centre of the bin.
To be explicit, if we have a 10 x 10 grid, C will contain a two element cell array where each element is 10 elements long. For each X co-ordinate of the centre found in C{1}, we will have 10 corresponding Y co-ordinates that relate to a bin's centre in C{2}. This means that the first 10 bin centres are located at (C{1}(1), C{2}(1)), (C{1}(1), C{2}(2)), (C{1}(1), C{2}(3)), ..., (C{1}(1), C{2}(10)), then the next 10 bin centres are located at (C{1}(2), C{2}(1)), (C{1}(2), C{2}(2)), (C{1}(2), C{2}(3)), ..., (C{1}(2), C{2}(10)).
As a quick example, let's do this on a grid between [0,1] on the x-axis and [0,1] on the y-axis. I'm going to generate 100 2D points. Let's also decompose the image into 10 bins horizontally and 10 bins vertically (as per the default of hist3).
rng(100); %// Set seed for reproducibility
A = rand(100,2);
[N,C] = hist3(A);
disp(N);
celldisp(C);
We thus get:
N =
1 2 0 1 2 0 1 0 1 1
0 1 1 1 1 1 0 0 2 5
0 4 1 1 1 1 1 4 0 1
2 0 3 2 2 1 1 0 2 1
0 0 0 0 1 1 1 0 0 1
1 1 1 2 1 1 0 2 0 1
1 0 2 1 2 0 3 1 1 1
0 1 0 0 0 1 1 0 0 1
1 0 1 2 3 3 0 0 0 2
0 2 1 1 0 1 0 3 0 1
C{1} =
Columns 1 through 7
0.0541 0.1528 0.2516 0.3503 0.4491 0.5478 0.6466
Columns 8 through 10
0.7453 0.8440 0.9428
C{2} =
Columns 1 through 7
0.0513 0.1510 0.2508 0.3505 0.4503 0.5500 0.6498
Columns 8 through 10
0.7495 0.8493 0.9491
This tells us that the first grid cell, located at the top left corner of our point distribution, only has 1 value logged into it. The next grid cell after that has 2 values logged in it, and so on and so forth. We also have the bin centres for each of the bins shown in C. Remember, we have 10 x 10 possible bin centres. If we want to display our data with the bin locations, this is what we can do:
[X,Y] = meshgrid(C{1},C{2});
plot(A(:,1), A(:,2), 'b*', X(:), Y(:), 'r*');
grid;
We thus get:
The red stars denote the bin centres while the blue stars denote our data points within the grid. Because our origin is on the bottom left corner of our plot, but the origin of the N matrix is at the top left corner (i.e. the first bin that is decomposed is at the top left while in our data it's at the bottom left corner), we need to rotate N by 90 degrees counter-clockwise so that the origins of each of the matrices agree with each other, and also agree with the plot. As such:
Nrot = rot90(N);
disp(Nrot);
Nrot =
1 5 1 1 1 1 1 1 2 1
1 2 0 2 0 0 1 0 0 0
0 0 4 0 0 2 1 0 0 3
1 0 1 1 1 0 3 1 0 0
0 1 1 1 1 1 0 1 3 1
2 1 1 2 1 1 2 0 3 0
1 1 1 2 0 2 1 0 2 1
0 1 1 3 0 1 2 0 1 1
2 1 4 0 0 1 0 1 0 2
1 0 0 2 0 1 1 0 1 0
As you can see from the picture, this agrees with what we see within the (rotated) N matrix as well as the bin centres C. Using N (or Nrot if you get the convention correct), you can now figure out which points to eliminate from your array of points. For any bin with low membership in N, you would find the points that are closest to the bin centre associated with that grid location in N and remove them.
As an example, supposing that the bin in the first row, second column (of the rotated result) is the one you want to filter out. This corresponds to (C{1}(2), C{2}(10)). We also know that we need to filter out 5 points as they belong to this bin centre. Therefore:
numPointsToRemove = N(2,10); %//or Nrot(1,2);
%// Computes Euclidean distance between this bin centre with every point
dists = sqrt(sum(bsxfun(@minus, A, [C{1}(2) C{2}(10)]).^2, 2));
%// Find the numPointsToRemove closest points to the bin centre and remove
[~,ind] = sort(dists);
A(ind(1:numPointsToRemove),:) = []; %// delete whole rows (points), not single elements
We sort our distances in ascending order, then determine the numPointsToRemove closest points to this bin centre. We thus remove them from our data matrix.
If you want to remove those bins that have either a 0 or a 1 for the count, we can find those locations, then run a for loop and filter accordingly. However, any bins that have 0 means that we don't even need to run through and filter anything, because no points were mapped to there! You really need to filter out those values that have just 1 in the bins. In other words:
[rows, cols] = find(N == 1);
for index = 1 : numel(rows)
row = rows(index);
col = cols(index);
%// Computes Euclidean distance between this bin centre with every point
dists = sqrt(sum(bsxfun(@minus, A, [C{1}(row) C{2}(col)]).^2, 2));
%// Finds the closest point to the bin centre and remove
[~,ind] = min(dists);
A(ind,:) = [];
end
As you can see, this is much the same procedure as above. As we wish to filter out those bins that only have a count of 1, we just need to find the minimum distance. Remember, we don't need to process any bins that have a count of 0 so we can skip those.
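As an alternative to searching for the points nearest each bin centre, newer MATLAB releases provide histcounts2, which returns the bin indices of every point directly (the 2D analogue of the ind output of histc that the question asks about). The following is a minimal sketch on hypothetical data; the points, the edges and the names countPerPoint and A_kept are assumptions made for this example, and every point is assumed to lie inside the edges.

```matlab
% Five hypothetical points in the unit square
A = [0.10 0.10; 0.12 0.15; 0.90 0.90; 0.51 0.52; 0.55 0.60];
edges = 0:0.25:1;                                   % 4 x 4 grid of bins

% binX/binY give, for each point, the bin it falls into along each axis
[N, ~, ~, binX, binY] = histcounts2(A(:,1), A(:,2), edges, edges);

countPerPoint = N(sub2ind(size(N), binX, binY));    % count of the bin each point sits in
A_kept = A(countPerPoint > 1, :);                   % drop points in bins holding a single point
```

Here the point at (0.90, 0.90) sits alone in its bin and is removed, while the other four points are kept. Points outside the edges get a bin index of 0 and would need to be handled before calling sub2ind.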

How to replace non-zero elements randomly with zero?

I have a matrix including 1 and 0 elements like below which is used as a network adjacency matrix.
A =
0 1 1 1
1 1 0 1
1 1 0 1
1 1 1 0
I want to simulate an attack on the network, so I must replace some specific percent of 1 elements randomly with 0. How can I do this in MATLAB?
I know how to replace a percentage of elements randomly with zeros, but I must be sure that the element that is replaced randomly, is one of the 1 elements of matrix not zeros.
If you want to change each 1 with a certain probability:
p = 0.1; % desired probability of change
A_ones = find(A); % linear index of ones in A
A_ones_change = A_ones(rand(size(A_ones))<=p); % entries to be changed
A(A_ones_change) = 0; % apply changes in those entries
If you want to randomly change a fixed fraction of the 1 entries:
f = 0.1; % desired fraction
A_ones = find(A);
n = round(f*length(A_ones));
A_ones_change = randsample(A_ones,n);
A(A_ones_change) = 0;
Note that in this case the resulting fraction may be different to that intended, because of the need to round to an integer number of entries.
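randsample belongs to the Statistics Toolbox; if that is not available, the same fixed-fraction selection can be sketched with randperm from base MATLAB. The fraction f = 0.25 and the names onesIdx, pick and B are assumptions chosen for this example.

```matlab
A = [0 1 1 1
     1 1 0 1
     1 1 0 1
     1 1 1 0];
f = 0.25;                                        % fraction of the 1 entries to remove
onesIdx = find(A);                               % linear indices of the 1 entries
n = round(f * numel(onesIdx));                   % number of entries to zero out
pick = onesIdx(randperm(numel(onesIdx), n));     % n distinct indices chosen at random
B = A;
B(pick) = 0;                                     % only former 1 entries are touched
```

Because pick is drawn from onesIdx, exactly n ones are removed and no existing zero is selected.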
@horchler's point is a good one. However, if we keep it simple, then you can just multiply your input matrix by a mask matrix.
>> a1=randint(5,5,[0 1]) % before replacing 1->0
a1 =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 1
>> a2=random('unif',0,1,5,5) % assuming the frequency distribution is uniform ('unif')
a2 =
0.7889 0.3200 0.2679 0.8392 0.6299
0.4387 0.9601 0.4399 0.6288 0.3705
0.4983 0.7266 0.9334 0.1338 0.5751
0.2140 0.4120 0.6833 0.2071 0.4514
0.6435 0.7446 0.2126 0.6072 0.0439
>> a1.*(a2>0.1) % and the replacement prob. is 0.1
ans =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 0
Other tricks can be added to the mask matrix (a2), such as a different frequency distribution, or a structure (e.g. once a cell is replaced, the adjacent cells become less likely to be replaced, and so on).
Cheers.
The function find is your friend:
indices = find(A);
This will return an array of the indices of 1 elements in your matrix A and you can use your method of replacing a percent of elements with zero on a subset of this array. Then,
A(subsetIndices) = 0;
will set the entries of A at the selected indices to zero.

How can I find local maxima in an image in MATLAB?

I have an image in MATLAB:
y = rgb2gray(imread('some_image_file.jpg'));
and I want to do some processing on it:
pic = some_processing(y);
and find the local maxima of the output. That is, all the points in y that are greater than all of their neighbors.
I can't seem to find a MATLAB function to do that nicely. The best I can come up with is:
[dim_y,dim_x]=size(pic);
enlarged_pic=[zeros(1,dim_x+2);
zeros(dim_y,1),pic,zeros(dim_y,1);
zeros(1,dim_x+2)];
% now build a 3D array
% each plane will be the enlarged picture
% moved up,down,left or right,
% to all the diagonals, or not at all
[en_dim_y,en_dim_x]=size(enlarged_pic);
three_d(:,:,1)=enlarged_pic;
three_d(:,:,2)=[enlarged_pic(2:end,:);zeros(1,en_dim_x)];
three_d(:,:,3)=[zeros(1,en_dim_x);enlarged_pic(1:end-1,:)];
three_d(:,:,4)=[zeros(en_dim_y,1),enlarged_pic(:,1:end-1)];
three_d(:,:,5)=[enlarged_pic(:,2:end),zeros(en_dim_y,1)];
three_d(:,:,6)=[pic,zeros(dim_y,2);zeros(2,en_dim_x)];
three_d(:,:,7)=[zeros(2,en_dim_x);pic,zeros(dim_y,2)];
three_d(:,:,8)=[zeros(dim_y,2),pic;zeros(2,en_dim_x)];
three_d(:,:,9)=[zeros(2,en_dim_x);zeros(dim_y,2),pic];
And then see if the maximum along the 3rd dimension appears in the 1st layer (that is: three_d(:,:,1)):
[max_val, max_i] = max(three_d, [], 3);
result = find(max_i == 1);
Is there any more elegant way to do this? This seems like a bit of a kludge.
bw = pic > imdilate(pic, [1 1 1; 1 0 1; 1 1 1]);
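The one-liner above works because dilating with a neighbourhood that excludes the centre returns, at each pixel, the maximum of its 8 neighbours. A small sketch on a hypothetical 4 x 4 matrix (the test data is made up for this example, and imdilate requires the Image Processing Toolbox):

```matlab
pic = [1 1 1 1
       1 5 1 1
       1 1 1 4
       1 1 1 1];
nhood = [1 1 1; 1 0 1; 1 1 1];     % 8 neighbours, centre excluded
bw = pic > imdilate(pic, nhood);   % true where a pixel is strictly greater than all neighbours
[r, c] = find(bw);                 % row/column positions of the strict local maxima
```

Here the peaks are found at (2,2) and (3,4). Because the comparison is strict, plateaus of equal values are not reported as maxima.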
If you have the Image Processing Toolbox, you could use the IMREGIONALMAX function:
BW = imregionalmax(y);
The variable BW will be a logical matrix the same size as y with ones indicating the local maxima and zeroes otherwise.
NOTE: As you point out, IMREGIONALMAX will find maxima that are greater than or equal to their neighbors. If you want to exclude neighboring maxima with the same value (i.e. find maxima that are single pixels), you could use the BWCONNCOMP function. The following should remove points in BW that have any neighbors, leaving only single pixels:
CC = bwconncomp(BW);
for i = 1:CC.NumObjects,
index = CC.PixelIdxList{i};
if (numel(index) > 1),
BW(index) = false;
end
end
Alternatively, you can use nlfilter and supply your own function to be applied to each neighborhood.
This "find strict max" function would simply check if the center of the neighborhood is strictly greater than all the other elements in that neighborhood, which is always 3x3 for this purpose. Therefore:
I = imread('tire.tif');
BW = nlfilter(I, [3 3], @(x) all(x(5) > x([1:4 6:9])) );
imshow(BW)
In addition to imdilate, which is in the Image Processing Toolbox, you can also use ordfilt2.
ordfilt2 sorts values in local neighborhoods and picks the n-th value. (The MathWorks example demonstrates how to implement a max filter.) You can also implement a 3x3 peak finder with ordfilt2 with the following logic:
Define a 3x3 domain that does not include the center pixel (8 pixels).
>> mask = ones(3); mask(5) = 0 % 3x3 max
mask =
1 1 1
1 0 1
1 1 1
Select the largest (8th) value with ordfilt2.
>> B = ordfilt2(A,8,mask)
B =
3 3 3 3 3 4 4 4
3 5 5 5 4 4 4 4
3 5 3 5 4 4 4 4
3 5 5 5 4 6 6 6
3 3 3 3 4 6 4 6
1 1 1 1 4 6 6 6
Compare this output to the center value of each neighborhood (just A):
>> peaks = A > B
peaks =
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0
or, just use the excellent: extrema2.m