I am importing an RGB image U of the stars and doing the following:
im=rgb2gray(U);
img=(im>200);
BW=im2bw(img,0);
L=bwlabeln(BW,18);
b=regionprops(L,'PixelList');
The goal of this program is to find the largest and most prominent stars in this picture of hundreds of stars. b is a 2566x1 struct array that contains all the points with a value greater than 200. If a certain connected region within the image contains multiple values over 200, b will store a coordinate matrix of these points. Otherwise, it will only store a single coordinate pair.
I need a way to find all the rows within b that contain matrices. If possible, I would also like a way to find all the rows within b that contain matrices with 30 or more points.
You can use the arrayfun function to apply a function to each element in an array. Note that this is just a shorter way of writing a loop.
In this case you'd need to apply the function size(b(i).PixelList, 1) >= 30 to each element i of the struct array b:
m = arrayfun(@(x) size(x.PixelList, 1) >= 30, b)
This is identical to:
m = false(size(b));
for i=1:numel(b)
m(i) = size(b(i).PixelList, 1) >= 30;
end
The result m is a logical array, so you can use it to index directly as b(m). You can also get the numeric indices using find(m).
If you also include 'Area' in the properties calculated by regionprops, you'll already have the number of pixels in each component:
b=regionprops(L,'PixelList','Area');
idx = [b.Area] >= 30;
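As a rough check that both approaches select the same components, here is a minimal sketch on a made-up binary image. It assumes the Image Processing Toolbox is available; the image contents and the variable name bigStars are invented for illustration.

```matlab
% Synthetic binary image (made up for illustration): one 36-pixel blob
% and one isolated pixel.
BW = false(20, 20);
BW(2:7, 2:7) = true;
BW(15, 15)   = true;

L = bwlabel(BW, 8);                        % 8-connected labelling of a 2-D image
b = regionprops(L, 'PixelList', 'Area');   % struct array, one element per component

idx = [b.Area] >= 30;                                 % test using the Area field
m   = arrayfun(@(x) size(x.PixelList, 1) >= 30, b);   % same test via PixelList

bigStars = b(idx);                         % keep only the large components
```

Here idx and m mark the same elements, and bigStars contains only the 36-pixel blob.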
I'm new to Matlab, and I want to do the following.
I have 2500 data points that can be clustered into 10 groups. My aim is to find the 5 data points in each cluster that are closest to its centroid. To do that, I did the following.
1) Find the distance between each point to each centroid, and allocate the closest cluster to each data point.
2) Store the data point's index (1,...,2500) and the corresponding distance in a cluster{index} array (not sure what data type this should be), where index = 1,2,...,10.
3) Go through each cluster to find the 5 closest data points.
My problem is I don't know how many data points will be stored in each cluster, so I don't know which data type I should use for my clusters and how to add to them in Step 2. I think a cell array may be what I need, but then I'll need one for the data point index and one for the distance. Or can I create a cell array of structure (each structure consisting of 2 members - index and distance). Again, how could I dynamically add to each cluster then?
I would suggest you keep the data in a normal array, as this usually works the quickest in Matlab.
You could do as follows: (assuming p is an n=2500 by dim matrix of data points, and c is an m=10 by dim matrix of centroids):
dists = zeros(n,m);
for i = 1:m
dists(:,i) = sqrt(sum(bsxfun(@minus,p,c(i,:)).^2,2));
end
[mindists,groups] = min(dists,[],2);
orderOfClosenessInGroup = zeros(size(groups));
for i = 1:m
[~,permutation] = sort(mindists(groups==i));
[~,orderOfClosenessInGroup(groups==i)] = sort(permutation);
end
Then groups will be an n by 1 matrix of values 1 to m telling you which centroid the corresponding data point is closest to, and orderOfClosenessInGroup is an n by 1 matrix telling you the order of closeness inside each group (orderOfClosenessInGroup <= 5 will give you a logical vector of which data points are among the 5 closest to their centroid in their group). To illustrate it, try the following example:
n = 2500;
m = 10;
dim = 2;
c = rand(m,dim);
p = rand(n,dim);
Then run the above code, and finally plot the data as follows:
scatter(p(:,1),p(:,2),100./orderOfClosenessInGroup,[0,0,1],'x');hold on;scatter(c(:,1),c(:,2),50,[1,0,0],'o');
figure;scatter(p(orderOfClosenessInGroup<=5,1),p(orderOfClosenessInGroup<=5,2),50,[0,0,1],'x');hold on;scatter(c(:,1),c(:,2),50,[1,0,0],'o');
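This gives two plots. In the first, every data point is drawn with a marker size that shrinks as its rank within its group grows, with the centroids shown in red. The second shows only the five points closest to each centroid.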
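The double sort in the loop above turns distances into ranks within each group. A small sketch on made-up numbers shows this, and also shows one possible way to collect the indices of the k closest points per cluster in a cell array, as the question asks. The names rankInGroup, closest and k are hypothetical.

```matlab
% Toy data: 6 points in 2 clusters (values are made up for illustration).
mindists = [0.5; 0.1; 0.9; 0.3; 0.2; 0.7];   % distance of each point to its centroid
groups   = [1;   1;   1;   2;   2;   2];     % cluster assignment of each point
m = 2;                                       % number of clusters
k = 2;                                       % how many closest points to keep

rankInGroup = zeros(size(groups));
closest = cell(m, 1);                        % closest{i} holds point indices for cluster i
for i = 1:m
    members = find(groups == i);             % indices of the points in cluster i
    [~, order] = sort(mindists(members));    % order(1) is the closest member
    [~, rankInGroup(members)] = sort(order); % inverse permutation gives the rank
    closest{i} = members(order(1:min(k, numel(order))));
end
```

Because each cell can hold a vector of a different length, there is no need to know the cluster sizes in advance.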
B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
Ok, so, I want to find the locations where Z = 1 (or, more generally, where the entries of Z are equal to each other), then average the corresponding slices of B at each of the 25 points. In the example you would end up with a 1x25x4 array.
Is there an easy way to do this?
I'm not the most versed in Matlab.
First things first: break down the problem.
Define the groups (i.e. the set of unique Z values)
Find elements which belong to these groups
Take the average.
Once you have done that, you can begin to see it's a pretty standard for loop and "Select columns which meet criteria".
Something along the lines of:
B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
groups = unique(Z); %//find the set of groups
C = nan(1,25,length(groups)); %//predefine the output space for efficiency
for gi = 1:length(groups) %//for each group
idx = Z == groups(gi); %//find its members
C(:,:,gi) = mean(B(:,:,idx), 3); %//select and mean across the third dimension
end
If B = randn(10,25); then it's very easy, because Matlab functions usually work down the rows.
Using logical indexing:
ind = Z == 1;
mean(B(ind,:));
If you're dealing with multiple dimensions use permute (and reshape if you actually have 3 dimensions or more) to get yourself to a point where you're averaging down the rows as above:
B = randn(1,25,10);
BB = permute(B, [3,2,1]);
% then continue as above, using BB in place of B
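Putting the permute idea together with the grouping, a minimal sketch might look like the following. The intermediate names M and C are assumptions; the explicit dimension argument to mean is added so that a group with a single member is still averaged down the rows.

```matlab
B = randn(1, 25, 10);
Z = [1;1;1;2;2;3;4;4;4;3];

BB = permute(B, [3, 2, 1]);            % 10-by-25: one row per slice of B
groups = unique(Z);                    % the distinct group labels

M = zeros(numel(groups), 25);          % one row of averages per group
for gi = 1:numel(groups)
    M(gi, :) = mean(BB(Z == groups(gi), :), 1);   % average down the rows
end

C = permute(M, [3, 2, 1]);             % back to 1-by-25-by-4
```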
I'm trying to index a large matrix in MATLAB that contains numbers monotonically increasing across rows, and across columns, i.e. if the matrix is called A, for every (i,j), A(i+1,j) > A(i,j) and A(i,j+1) > A(i,j).
I need to create a random number n and compare it with the values of the matrix A, to see where that random number should be placed in the matrix A. In other words, the value of n may not equal any of the contents of the matrix, but it may lie in between any two rows and any two columns, and that determines a "bin" that identifies its position in A. Once I find this position, I increment the corresponding index in a new matrix of the same size as A.
The problem is that I want to do this 1,000,000 times. I need to create a random number a million times and do the index-checking for each of these numbers. It's a Monte Carlo Simulation of a million photons coming from a point landing on a screen; the matrix A consists of angles in spherical coordinates, and the random number is the solid angle of each incident photon.
My code so far goes something like this (I haven't copy-pasted it here because the details aren't important):
for k = 1:1000000
n = rand(1,1)*pi;
for i = 2:size(A,1)-1
for j = 2:size(A,2)-1
if (n > A(i-1,j)) && (n < A(i+1,j)) && (n > A(i,j-1)) && (n < A(i,j+1))
new_img(i,j) = new_img(i,j) + 1; % new_img defined previously as zeros
end
end
end
end
The "if" statement is just checking to find the indices of A that form the bounds of n.
This works perfectly fine, but it takes ridiculously long, especially since my matrix A is an image of dimensions 11856 x 11000. Is there a quicker / cleverer / easier way of doing this?
Thanks in advance.
You can get rid of the inner loops by performing the calculation on all elements of A at once. Also, you can create the random numbers all at once, instead of one at a time. Note that the outermost pixels of new_img can never be different from zero.
randomNumbers = rand(1,1000000)*pi;
new_img = zeros(size(A));
tmp_img = zeros(size(A)-2);
for r = randomNumbers
tmp_img = tmp_img + (A(2:end-1,1:end-2)<r & A(2:end-1,3:end)>r & A(1:end-2,2:end-1)<r & A(3:end,2:end-1)>r);
end
new_img(2:end-1,2:end-1) = tmp_img;
Aside: if the arrays were smaller, I'd have used bsxfun for the comparison, but with the array sizes in the question, that approach would run out of memory.
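To see what the whole-array comparison does, here is a small sketch on a made-up 4-by-4 matrix that increases along rows and columns. The matrix, the test value r and the name hit are invented for illustration; each shifted slice lines up one neighbour (above, below, left, right) with the interior elements.

```matlab
% Made-up monotonic matrix: A(i,j) = i + j
[cols, rows] = meshgrid(1:4, 1:4);
A = rows + cols;

r = 4.5;                                  % value to locate

% For every interior element, compare r against its four neighbours at once.
hit = A(1:end-2, 2:end-1) < r & ...       % neighbour above is smaller
      A(3:end,   2:end-1) > r & ...       % neighbour below is larger
      A(2:end-1, 1:end-2) < r & ...       % neighbour to the left is smaller
      A(2:end-1, 3:end)   > r;            % neighbour to the right is larger
```

hit is a 2-by-2 logical array covering the interior of A, which is why the result is written back into new_img(2:end-1,2:end-1).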
Are the values in A bin edges, i.e. does A specify a grid? If this is the case, then you can quickly populate A using hist3.
Here is an example:
numRand = 1e6; % assumed: the original value was cut off; 1e6 matches the question
n = randi(100, numRand, 1);
nMatrix = [floor(n./10), mod(n,10)];
edges = {0:1:9, 0:10:99};
A = hist3(nMatrix, 'Edges', edges);
If your A doesn't specify a grid, then you should create all of your random values once and sort them. Then iterate through those values.
Because you know that n(i) >= n(i-1) you don't have to check bins that were too small for n(i-1). This is a very easy way to optimize away most redundant checks.
Here is a snippet that should help a lot in the inner loop. It finds the location of the largest element that is smaller than your value.
idx1 = A<value
idx2 = A(idx1) == max(A(idx1))
If you want to find the exact location, you can wrap it with find.
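For the one-dimensional case, binning all values in one call can be sketched with histcounts (available in base MATLAB since R2014b). The edges and values below are made up, and this is only an illustration of the "bin everything at once" idea, not a drop-in replacement for the two-dimensional problem in the question.

```matlab
edges  = [0, 1, 2, 3];                 % hypothetical bin edges
values = [0.5, 1.5, 1.7, 2.9, 0.1];    % hypothetical random draws

% counts(k) is the number of values in [edges(k), edges(k+1));
% bin(i) is the index of the bin that values(i) falls into.
[counts, ~, bin] = histcounts(values, edges);
```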
I have a vector CD1 (120-by-1) and I separate CD1 into 6 parts. For example, the first part is extracted from row 1 to row 20 in CD1, and second part is extracted from row 21 to row 40 in CD1, etc. For each part, I need to compute the means of the absolute values of second differences of the data.
for PartNo = 1:6
% extract data
Y(PartNo) = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z(PartNo) = Y(PartNo)(3:end) - Y(PartNo)(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
However, the commands above produce the error:
()-indexing must appear last in an index expression for Line:2
Any ideas to change the code to have it do what I want?
This error is often encountered when Y is a cell-array. For cell arrays,
Y{1}(1:3)
is legal. Curly braces ({}) mean data extraction, so this means you are extracting the array stored in location 1 in the cell array, and then referencing the elements 1 through 3 of that array.
The notation
Y(1)(1:3)
is different in that it does not extract data, but it references the cell's location 1. This means the first part (Y(1)) returns a cell-array which, in your case, contains a single array. So you won't have direct access to the regular array as before.
It is an infamous limitation in Matlab that you cannot do indirect or double-referencing, which is in effect what you are doing here.
Hence the error.
Now, to resolve: I suspect replacing a few parentheses with curly braces will do the trick:
Y{PartNo} = CD1(1+20*(PartNo-1):20*PartNo,:); % extract data
Z{PartNo} = Y{PartNo}(3:end)-Y{PartNo}(1:end-2); % find the second difference
MEAN_ABS_2ND_DIFF_RESULT{PartNo} = mean(abs(Z{PartNo})); % mean of absolute value
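The difference between the two kinds of indexing can be seen in a short sketch. The cell contents and the variable names below are made up for illustration.

```matlab
Y = cell(1, 2);
Y{1} = (1:5)';                 % store a numeric column in cell 1
Y{2} = (6:10)';

firstThree = Y{1}(1:3);        % {} extracts the array, () then indexes into it

wrapped = Y(1);                % () returns a 1x1 cell, not the array itself
tmp = wrapped{1};              % extract the contents in a separate step ...
alsoFirstThree = tmp(1:3);     % ... and index the temporary variable
```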
I might suggest a different approach
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:) - Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
This is not a valid statement in matlab:
Y(PartNo)(3:end)
You should either make Y two-dimensional and use this indexing
Y(PartNo, 3:end)
or extract vector parts and use them directly, if you use a loop like you have shown
for PartNo = 1:6
% extract data
Y = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z = Y(3:end) - Y(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
Also, since CD1 is a vector, you do not need to index the second dimension. Drop the :
Y = CD1(1 + 20*(PartNo-1):20*(PartNo));
Finally, you do not need a loop. You can reshape the CD1 vector to a two-dimensional array Y of size 20x6, in which the columns are your parts, and work directly on the resulting matrix:
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:)-Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
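A quick way to convince yourself that the reshape version agrees with the loop is to run both on the same data. The sketch below uses made-up values for CD1 and takes the difference between elements two rows apart, as in the question; vecResult and loopResult are hypothetical names.

```matlab
CD1 = ((1:120)').^2;                       % made-up 120-by-1 data

% Vectorised: one column per part, then differences two rows apart
Y = reshape(CD1, 20, 6);
Z = Y(3:end, :) - Y(1:end-2, :);           % 18-by-6
vecResult = mean(abs(Z));                  % 1-by-6, one mean per part

% Loop: the same computation, one part at a time
loopResult = zeros(1, 6);
for PartNo = 1:6
    part = CD1(1 + 20*(PartNo-1) : 20*PartNo);
    loopResult(PartNo) = mean(abs(part(3:end) - part(1:end-2)));
end
```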
Let's assume we have two arrays of the same size - A and B.
Now, we need a filter that, for a given mask size, selects elements from A, but removes the central element of the mask, and inserts there corresponding element from B.
So the 3x3 "pseudo mask" will look similar to this:
A A A
A B A
A A A
Doing something like this for averaging filter is quite simple. We can compute the mean value for elements from A without the central element, and then combine it with a proper proportion with elements from B:
h = ones(3,3);
h(2,2) =0;
h = h/sum(h(:));
A_ave = filter2(h, A);
C = (8/9) * A_ave + (1/9) * B;
But how can I do something similar for a median filter (medfilt2, or even better, ordfilt2)?
The way to solve this is to find a way to combine the information from A and B so that the filtering itself becomes easy.
The first thing I thought of was to catenate A and B along the third dimension and to pass with a filter mask that would take 8 elements from the 'A-slice' and the center element from the 'B-slice'. This is, unfortunately, not supported by Matlab.
While nlfilter only works on 2D images, it does allow you to specify any function for filtering. Thus, you could create a function that somehow is able to look up the right values of A and B. Thus I came to my first solution.
You create a new array, C, that contains the element index at each element, i.e. the first element is 1, the second element is 2, etc. Then, you run nlfilter, which takes a 3x3 sliding window and passes the values of C inside the window to the filtering function, ffun. ffun is an anonymous function that calls crazyFilter, and that has been initialized so that A and B get passed at each call. crazyFilter takes the values from the sliding window of C, which are nothing but indices into A and B, and uses them to collect the values from A and B.
The second solution is exactly the same, except that instead of moving a sliding window, you create a new array that, in every column, has the contents of the sliding window at every possible location. With an overlapping window, the column array gets larger than the original array. Again, you then just need to use the values of the column array, C, which are indices into A and B, to look up the values of A and B at the relevant locations.
EDIT
If you have enough memory, im2col and col2im can speed up the process a lot
%# define A,B
A = randn(100);
B = rand(100);
%# pad A, B - you may want to think about how you want to pad
Ap = padarray(A,[1,1]);
Bp = padarray(B,[1,1]);
%# EITHER -- the more flexible way
%# create a pseudo image that has indices instead of values
C = zeros(size(Ap));
C(:) = 1:numel(Ap);
%# convert to 'column image', where each column represents a block
C = im2col(C,[3,3]);
%# read values from A
data = Ap(C);
%# replace centers with values from B
data(5,:) = Bp(C(5,:));
%# OR -- the more efficient way
%# reshape A directly into windows and fill in B
data = im2col(Ap,[3,3]);
data(5,:) = B(:);
%# median and reshape
out = reshape(median(data,1),size(A));
Old version (uses less memory, may need padding)
%# define A,B
A = randn(100);
B = rand(100);
%# define the filter function
ffun = @(x)crazyFilter(x,A,B);
%# create a pseudo image that has indices instead of values
C = zeros(size(A));
C(:) = 1:numel(A);
%# filter
filteredImage = nlfilter(C,[3,3],ffun);
%# filter function
function out = crazyFilter(input,A,B)
%#CRAZYFILTER takes the median of a 3x3 mask defined by input, taking 8 elements from A and 1 from B
%# read data from A
data = A(input(:));
%# replace center element with value from B
data(5) = B(input(5));
%# return the median
out = median(data);
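To check what the filter is supposed to compute, here is a plain-loop sketch on a tiny example that needs no toolbox functions. It only handles the interior (no padding), and the matrices are made up; it is meant as a reference to compare against, not as a fast implementation.

```matlab
A = [16  2  3 13;
      5 11 10  8;
      9  7  6 12;
      4 14 15  1];             % made-up input A
B = 100 * ones(4);             % made-up input B, easy to spot in the result

out = zeros(2, 2);             % interior pixels only, no padding
for r = 2:3
    for c = 2:3
        win = A(r-1:r+1, c-1:c+1);   % 3x3 neighbourhood from A
        win(2, 2) = B(r, c);         % replace the centre with the value from B
        out(r-1, c-1) = median(win(:));
    end
end
```

For example, the window around A(2,2) contains 16, 2, 3, 5, 100, 10, 9, 7, 6 after the replacement, so its median is 7.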
Here's a solution that will work if your data is an unsigned integer type (like a typical grayscale image of type uint8). You can combine your two matrices A and B into a single matrix of a larger integer type, with the data from one matrix stored in the lower bits and the data from the other matrix stored in the higher bits. You can then use NLFILTER to apply a filtering function that extracts the appropriate bits of data in order to collect the necessary matrix values.
The following example applies a median filter of the form you describe above (a 3-by-3 array of elements from A with the center element from B) to two unsigned 8-bit matrices of random values:
%# Initialize some variables:
A = randi([0 255],[3 3],'uint8'); %# One random matrix of uint8 values
B = randi([0 255],[3 3],'uint8'); %# Another random matrix of uint8 values
C = uint16(A)+bitshift(uint16(B),8); %# Convert to uint16 and place the values
%# of A in the lowest 8 bits and the
%# values of B in the highest 8 bits
C = padarray(C,[1 1],'symmetric'); %# Pad the array edges
%# Make the median filtering function for each 3-by-3 block:
medFcn = @(x) median([bitand(x(1:4),255) ... %# Get the first four A values
bitshift(x(5),-8) ... %# Get the fifth B value
bitand(x(6:9),255)]); %# Get the last four A values
%# Perform the filtering:
D = nlfilter(C,[3 3],medFcn);
D = uint8(D(2:end-1,2:end-1)); %# Remove the padding and convert to uint8
Here are additional links for some of the key functions used above: PADARRAY, BITAND, BITSHIFT.
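The packing step can be checked on its own with a small round trip: combine two uint8 matrices into one uint16 matrix, then recover both. The matrix values and the names A2 and B2 are made up for illustration.

```matlab
A = uint8([10 200; 0 255]);              % made-up values, stored in the low 8 bits
B = uint8([ 7   8; 9 255]);              % made-up values, stored in the high 8 bits

C = uint16(A) + bitshift(uint16(B), 8);  % pack: B shifted up by 8 bits, plus A

A2 = uint8(bitand(C, 255));              % unpack A: keep only the low 8 bits
B2 = uint8(bitshift(C, -8));             % unpack B: shift the high 8 bits back down
```

Since the largest packed value is 255 + 255*256 = 65535, the sum never saturates a uint16.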