Performing an averaging operation over every n elements in a vector - matlab

I have a logical vector that I would like to process in consecutive windows of n elements. If at least 50% of the elements in a window are 1's, I set every element in that window to 1; otherwise I leave the window as is and move on to the next one. For example:
n = 4;
input = [0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1];
output = func(input,4);
output = [0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1];
This function is trivial to implement with a loop, but is it possible to write a vectorized implementation using logical indexing? I am trying to build up intuition for applying this technique.

Here's a one-liner (that works for your input):
func = @(input,n) input | kron(sum(reshape(input,n,[]))>=n/2,ones(1,n));
Of course, there are cases this doesn't handle, such as an input whose length is not a multiple of n.
I'm not sure if that's what you meant by vectorization, and I didn't benchmark it against a for loop.
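For an input whose length is not a multiple of n, one possible approach (an assumption, not part of the one-liner above) is to pad with zeros, process the windows, and trim the padding off again. The variable names below are illustrative, and note the design choice: a trailing partial window is still judged against the full window size n.

```matlab
x = logical([0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 1]); % 22 elements
n = 4;
len = numel(x);
padded = [x, false(1, mod(-len, n))];   % zero-pad up to the next multiple of n
blocks = reshape(padded, n, []);        % one window per column
hit = sum(blocks, 1) >= n/2;            % windows with at least 50% ones
blocks(:, hit) = true;                  % fill those windows with ones
out = reshape(blocks, 1, []);
out = out(1:len);                       % drop the padding again
```

Here the last two elements form a partial window that is padded to [1 1 0 0] before the test is applied.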

Here is one way of doing it. Once you understand it you can compact it into fewer lines, but I'll detail the intermediate steps for the sake of clarity.
%% The inputs
n = 4;
input = [0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1];
1) Split your input into blocks of size n (note that your final function will have to check that the number of elements in input is an integer multiple of n):
c = reshape(input,n,[]) ;
Gives you a matrix with your blocks organized in columns:
c =
0 0 0 0 0
0 1 0 1 0
0 1 0 0 0
1 0 1 1 1
2) Perform your test condition on each block. For this we'll take advantage of the fact that MATLAB's sum function works column-wise:
>> cr = sum(c) >= (n/2)
cr =
0 1 0 1 0
Now you have a logical vector cr containing as many elements as there are blocks. Each value is the result of the test condition for that block. The 0 blocks will be left unchanged; the 1 blocks will be forced to 1.
3) Force the flagged columns (blocks) to 1:
>> c(:,cr) = 1
c =
0 1 0 1 0
0 1 0 1 0
0 1 0 1 0
1 1 1 1 1
4) Now all that is left is to unfold your matrix. You can do it in several ways:
res = c(:) ; %% will give you a column vector
OR
>> res = reshape(c,1,[]) %% will give you a row vector
res =
0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1

Related

How to create an error function comparing two matrices?

I have two matrices in MATLAB. Each one is filled with 1 and 0 at different positions. I want to compare each element:
If there is a 1 match, I want it to record as True Positive.
If there is a 0 match, I want it to record as True Negative.
If one says 1 and the other says 0, I want to record as False Positive.
If one says 0 and the other says 1, I want to record as False Negative.
I tried just comparing the two matrices:
idx = A == B
But, that gives me a simple match, not telling me when there is a True Positive or Negative, etc.
Is there any specific function I could use, or any alternative?
You could just add the matrices in a prescribed way....
a = [1 0 1 0
1 1 0 0
0 0 1 1];
b = [1 0 0 0
0 0 0 1
0 0 1 0];
C = a + 2*b;
% For pairs [a,b] we expect
% [0,0]: C = 0, true negative
% [1,0]: C = 1, false positive
% [0,1]: C = 2, false negative
% [1,1]: C = 3, true positive
% C =
% [ 3 0 1 0
% 1 1 0 2
% 0 0 3 1 ]
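If only the number of cells in each category is needed, the coded matrix can be tallied directly. This is a minimal sketch using accumarray from core MATLAB; the name counts is illustrative and not part of the answer above.

```matlab
a = [1 0 1 0; 1 1 0 0; 0 0 1 1];
b = [1 0 0 0; 0 0 0 1; 0 0 1 0];
C = a + 2*b;                               % codes 0..3, as described above
% counts(k) is the number of entries of C equal to k-1
counts = accumarray(C(:) + 1, 1, [4 1]);
```

For this example counts is [5; 4; 1; 2], i.e. five [0,0] pairs, four [1,0] pairs, one [0,1] pair and two [1,1] pairs.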
If you have the Statistics and Machine Learning toolbox and you only want a summary, you might just need the function confusionmat.
From the docs:
C = confusionmat(group,grouphat) returns the confusion matrix C determined by the known and predicted groups in group and grouphat. [...]. C is a square matrix with size equal to the total number of distinct elements in group and grouphat. C(i,j) is a count of observations known to be in group i but predicted to be in group j.
For example:
a = [1 0 1 0
1 1 0 0
0 0 1 1];
b = [1 0 0 0
0 0 0 1
0 0 1 0];
C = confusionmat( a(:), b(:) );
% C =
% [ 5 1
% 4 2]
% So for each pair [a,b], we have 5*[0,0], 2*[1,1], 4*[1,0], 1*[0,1]
A similar function for those with the Neural Network Toolbox instead would be confusion.
You could just use bitwise operators to produce the four different values:
bitor(bitshift(uint8(b),1),uint8(a))
Produces an array with
0 : True Negative
1 : False Negative (a is true but b is false)
2 : False Positive (a is false but b is true)
3 : True Positive
One naive approach would be four comparisons, case by case:
% Set up some artificial data
ground_truth = randi(2, 5) - 1
compare = randi(2, 5) - 1
% Determine true positives, false positives, etc.
tp = ground_truth & compare
fp = ~ground_truth & compare
tn = ~ground_truth & ~compare
fn = ground_truth & ~compare
Output:
ground_truth =
1 0 1 0 0
0 1 1 0 1
1 1 0 1 0
0 1 0 1 1
0 0 0 1 0
compare =
0 1 1 0 1
0 1 1 1 0
1 1 0 0 1
1 1 1 0 0
1 1 1 1 1
tp =
0 0 1 0 0
0 1 1 0 0
1 1 0 0 0
0 1 0 0 0
0 0 0 1 0
fp =
0 1 0 0 1
0 0 0 1 0
0 0 0 0 1
1 0 1 0 0
1 1 1 0 1
tn =
0 0 0 1 0
1 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
fn =
1 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 0 1 1
0 0 0 0 0
That works because MATLAB's logical operators treat 0 as false and any nonzero value as true.
To keep your main code clean, set up a separate function, say my_stats.m
function [tp, fp, tn, fn] = my_stats(ground_truth, compare)
% Determine true positives, false positives, etc.
tp = ground_truth & compare;
fp = ~ground_truth & compare;
tn = ~ground_truth & ~compare;
fn = ground_truth & ~compare;
end
and call it in your main code:
% Set up some artificial data
ground_truth = randi(2, 5) - 1
compare = randi(2, 5) - 1
[tp, fp, tn, fn] = my_stats(ground_truth, compare)
Hope that helps!
I found that I can use the find method and set two conditions, then just find the numbers of the element in each variable
TruePositive = length(find(A==B & A==1))
TrueNegative = length(find(A==B & A==0))
FalsePositive = length(find(A~=B & A==1))
FalseNegative = length(find(A~=B & A==0))
The confusionmat() function suggested by @Wolfie is also really neat, especially if you use confusionchart(), which provides a nice visualisation.
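As a side note, the length(find(...)) pattern can usually be shortened with nnz, which counts nonzero elements directly. The sketch below uses the example matrices from the earlier answers and deliberately neutral variable names, since which count is a "false positive" depends on which matrix is treated as the ground truth.

```matlab
A = logical([1 0 1 0; 1 1 0 0; 0 0 1 1]);
B = logical([1 0 0 0; 0 0 0 1; 0 0 1 0]);
bothOne  = nnz( A &  B);   % 1 in both matrices
bothZero = nnz(~A & ~B);   % 0 in both matrices
onlyA    = nnz( A & ~B);   % 1 in A, 0 in B
onlyB    = nnz(~A &  B);   % 0 in A, 1 in B
```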

Iterating through a matrix using a smaller matrix

I've been struggling with this for a bit now. I have a small matrix s for example and a bigger matrix B as shown below.
B =
0 0 0 0 0 0 1 1
1 1 0 0 1 0 1 1
1 1 0 1 0 0 1 1
1 1 1 0 0 0 1 0
0 0 1 1 1 0 0 1
0 0 0 1 1 1 1 1
1 1 1 0 0 0 1 0
0 1 1 0 1 1 0 0
s =
1 1
1 1
What I want to do is iterate through B with s and compare the values. If all the values in s equal the values in B (the small section of B), then the answer is 1, if not then 0.
The 1's and 0's would be placed in a matrix as well.
This is what I've done so far but unfortunately, it doesn't iterate step by step and doesn't create a matrix either.
s = ones(2,2)
B = randi([0 1],8,8)
f = zeros(size(B))
[M,N]=size(B); % the larger array
[m,n]=size(s); % and the smaller...
for i=1:M/m-(m-1)
for j=1:N/n-(n-1)
if all(s==B(i:i+m-1,j:j+n-1))
disp("1")
else
disp("0")
end
end
end
Any help would be appreciated!
The following code works on the example you supplied; I haven't tested it on anything else. It will not work if the dimensions of the smaller matrix are not factors of the dimensions of the larger matrix, but your description didn't indicate that it needed to handle that.
B =[0 0 0 0 0 0 1 1
1 1 0 0 1 0 1 1
1 1 0 1 0 0 1 1
1 1 1 0 0 0 1 0
0 0 1 1 1 0 0 1
0 0 0 1 1 1 1 1
1 1 1 0 0 0 1 0
0 1 1 0 1 1 0 0];
S =[1 1
1 1];
%get the dimensions of both arrays
numRowB = size(B,1);
numRowS = size(S,1);
numColB = size(B,2);
numColS = size(S,2);
%get loop multiples
incRows = numRowB/numRowS;
incCols = numColB/numColS;
%create output array
result = zeros(incRows, incCols);
%create row and column indices
rowsPull = 1:numRowS:numRowB;
colsPull = 1:numColS:numColB;
%iterate
for i= 1:incRows
for j= 1:incCols
result(i,j) = isequal(B(rowsPull(i):rowsPull(i)+numRowS-1, colsPull(j):colsPull(j)+numColS-1),S);
end
end
%print the resulting array
disp(result)
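If the window should instead slide one cell at a time, as the question's "step by step" suggests, a loop-free sketch is possible with conv2 from core MATLAB. It assumes the small matrix is all ones, as in the example; a pattern containing zeros would need a different comparison. The names windowSum, slidingResult and blockResult are illustrative.

```matlab
B = [0 0 0 0 0 0 1 1
     1 1 0 0 1 0 1 1
     1 1 0 1 0 0 1 1
     1 1 1 0 0 0 1 0
     0 0 1 1 1 0 0 1
     0 0 0 1 1 1 1 1
     1 1 1 0 0 0 1 0
     0 1 1 0 1 1 0 0];
S = ones(2);
% Sum of every 2x2 window; a window matches S when all four entries are 1
windowSum = conv2(B, S, 'valid');
slidingResult = windowSum == numel(S);          % 7x7, one entry per window position
% Keeping every second row and column gives the non-overlapping blocks
blockResult = slidingResult(1:2:end, 1:2:end);  % 4x4
```

For this B, blockResult is the same 4x4 matrix the loop above produces, and slidingResult additionally flags the matching windows that start on even rows or columns.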

How to sort the columns of a matrix in order of some other vector in MATLAB?

Say I have a vector A of item IDs:
A=[50936
332680
107430
167940
185820
99732
198490
201250
27626
69375];
And I have a matrix B whose rows contains values of 8 parameters for each of the items in vector A:
B=[0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 1 1
1 0 1 0 0 1 0 1 1 1
0 0 1 0 0 0 0 1 0 1
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1];
So, column 1 in matrix B represents data of item in row 1 of vector A, column 2 in matrix B represents data of item in row 2 of vector A, and so on. However, I want matrix B to contain the information in a different order of items stored in vector A2:
A2=[185820
198490
69375
167940
99732
332680
27626
107430
50936
201250];
How do I sort them, so that column 1 of matrix B contains data for item in row 1 of vector A2, column 2 of matrix B contains data for item in row 2 of vector A2, and so on?
My extremely crude solution to do this is the following:
A=A'; A2=A2';
for i=1:size(A,2)
A(2:size(B,1)+1,i)=B(:,i);
end
A2(2:size(B,1)+1,:)=zeros(size(B,1),size(B,2));
for i=1:size(A2,2)
for j=1:size(A,2)
if A2(1,i)==A(1,j)
A2(2:end,i)=A(2:end,j);
end
end
end
B2 = A2(2:end,:);
But I would like to know a cleaner, more elegant and less time consuming method to do this.
A possible solution
You can use second output of ismember function.
[~ ,idx] = ismember(A2,A);
B2 = B(:,idx);
Update: I tested both my solution and the one proposed by hbaderts:
disp('-----ISMEMBER:-------')
tic
[~,idx]=ismember(A2,A);
toc
disp('-----SORT:-----------')
tic
[~,idx1] = sort(A);
[~,idx2] = sort(A2);
map = zeros(size(idx2));
map(idx2) = idx1;
toc
Here is the result in Octave:
-----ISMEMBER:-------
Elapsed time is 0.00157714 seconds.
-----SORT:-----------
Elapsed time is 4.41074e-05 seconds.
Conclusion: in this run the sort method was faster, although a single tic/toc measurement on ten elements is only a rough indication.
As both A and A2 contain the exact same elements, just sorted differently, we can create a mapping from the A-sorting to the A2-sorting. For that, we run the sort function on both and save indexes (which are the second output).
[~,idx1] = sort(A);
[~,idx2] = sort(A2);
Now, the first element in idx1 corresponds to the first element in idx2, so A(idx1(1)) is the same as A2(idx2(1)) (which is 27626). To create a mapping idx1 -> idx2, we use matrix indexing as follows
map = zeros(size(idx2));
map(idx2) = idx1;
To sort B accordingly, all we need to do is
B2 = B(:, map);
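As a sanity check, the sort-based mapping can be compared with the ismember-based one on the vectors from the question. This is only an illustrative sketch; the names found and sameMapping are not from either answer.

```matlab
A  = [50936; 332680; 107430; 167940; 185820; 99732; 198490; 201250; 27626; 69375];
A2 = [185820; 198490; 69375; 167940; 99732; 332680; 27626; 107430; 50936; 201250];
[found, idx] = ismember(A2, A);   % idx(k) is the position of A2(k) in A
[~, idx1] = sort(A);
[~, idx2] = sort(A2);
map = zeros(size(idx2));
map(idx2) = idx1;                 % map(k) is also the position of A2(k) in A
sameMapping = all(found) && isequal(idx, map) && isequal(A(map), A2);
```

Both approaches yield the index vector [5 7 10 4 6 2 9 3 1 8] here, so B(:, idx) and B(:, map) select the same columns.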
[A2, sort_order] = sort(A);
B2 = B(:, sort_order)
MATLAB's sort function returns the order in which the items in A are sorted. You can use this to order the columns in B.
Transpose B so you can concatenate it with A:
C = [A B']
Now you have
C = [ 50936 0 0 1 1 0 0 0 0;
332680 0 0 0 0 0 0 0 0;
107430 0 0 1 1 1 0 0 0;
167940 0 0 0 0 0 0 0 0;
185820 0 0 0 0 0 0 0 0;
99732 0 0 1 1 0 0 0 0;
198490 0 0 0 0 0 0 0 0;
201250 0 0 1 1 1 1 0 0;
27626 0 0 1 1 0 0 0 0;
69375 0 0 1 1 1 0 0 1];
You can now sort the rows of the matrix however you want. For example, to sort by ID in ascending order, use sortrows:
C = sortrows(C)
To just swap rows around, use a permutation of 1:length(A):
C = C(perm, :)
where perm could be something like [4 5 6 3 2 1 8 7 9 10].
This way, your information is all contained in one structure and the data is always correctly matched to the proper ID.

Measure how spread out the data in an array is

I have an array of zeros and ones and I need to know if the data is spread out across the columns or concentrated in clumps.
For example:
If I have array x and it has these values:
Column 1 values: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Column 2 values: 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1
If we count the ones, both columns have the same number, but the ones are more evenly spread out in column 2 than in column 1.
I am trying to come up with a score that gives a high value if the spreading is good and a low value if the spreading is bad. Any ideas?
Sample of Data:
1 0 0 0 5 0 -2 -3 0 0 1
1 0 0 0 0 0 0 0 0 0 1
2 0 0 0 0 0 0 3 -3 1 0
1 2 3 0 5 0 2 13 4 5 1
1 0 0 0 0 0 -4 34 0 0 1
I think what you're trying to measure is the variability of the gaps between consecutive 1s, for example their standard deviation:
f = @(x)std(diff(find(x)))
So for your data:
a = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
b = [1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1]
f(a)
= 8.0498
f(b)
= 2.0736
But I still think you're essentially trying to measure the disorder of the system, which is what I imagine entropy measures, but I don't know how.
Note that this gives a low value if the "spreading" is good and a high value if it is bad (i.e. the opposite of your request).
Also if you want it per column then it becomes a little more complicated:
f = @(x)arrayfun(@(y)std(diff(find(x(:,y)))), 1:size(x,2))
data = [a', b'];
f(data)
WARNING: This method pretty much ignores leading and trailing 0s. I don't know if that's a problem or not, but basically f([0; 0; 0; 1; 1; 1; 0; 0; 0]) returns 0, whereas f([1; 0; 0; 1; 0; 1; 0; 0; 0]) returns a positive value, indicating (incorrectly) that the first case is more evenly distributed. One possible fix might be to prepend and append a row of ones to the matrix.
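The padding idea can be sketched for a single column as follows. The helper name paddedSpread is hypothetical, and whether the edges should count as implicit 1s is an assumption that may not suit every data set.

```matlab
% Pad with a leading and a trailing 1 so that gaps at the edges are counted too
paddedSpread = @(x) std(diff(find([1; x(:); 1])));
clumped   = [0; 0; 0; 1; 1; 1; 0; 0; 0];
spreadOut = [1; 0; 0; 1; 0; 1; 0; 0; 0];
scoreClumped   = paddedSpread(clumped);     % gaps are [4 1 1 4]
scoreSpreadOut = paddedSpread(spreadOut);   % gaps are [1 3 2 4]
```

With the padding, the clumped vector scores about 1.73 and the spread-out vector about 1.29, so the lower score now goes to the more evenly distributed case.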
I think you would need an interval to find the "spreadness" locally, otherwise the sample 1 (which is named as Column 1 in the question) would appear as spread too between the 2nd and 3rd ones.
So, following that theory and assuming input_array to be the input array, you can try this approach -
intv = 10; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc(diff_loc<=intv)) %// desired output/score
For sample 1, spread_factor gives 4 and for sample 2 it is 23.
Another theory that you can employ would be if you assume an interval such that distance between consecutive ones must be greater than or equal to that interval. This theory would lead us to a code like this -
intv = 3; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc>=intv)
With this new approach - For sample 1, spread_factor is 1 and for sample 2 it is 5.

Get the indexes of the boundary cells of a subset of a matrix. Matlab

Given a matrix where 1 is the current subset
test =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Is there a function or quick method to change the subset to the boundary of the current subset?
Eg. Get this subset from 'test' above
test =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 0
0 1 1 1 1 0
0 0 0 0 0 0
In the end I just want to get the minimum of the cells surrounding a subset of a matrix. Sure, I could loop through and get the minimum of the boundary (cell by cell), but there must be a way to do it with the method I've shown above.
Note the subset WILL be connected, but may not be rectangular. This may be the big catch.
This is a possible subset.... (Would pad this with a NaN border)
test =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 1 1 1 1
0 0 1 1 1 1
Ideas?
The basic steps I'd use are:
Perform a dilation on the shape to get a new area which is the shape plus its boundary
Subtract the original shape from the dilated shape to leave just the boundary
Use the boundary to index your data matrix, then take the minimum.
Dilation
What I want to do here is pass a 3x3 window over each cell and take the maximum value in that window:
[m, n] = size(A); % assuming A is your original shape matrix
APadded = zeros(m + 2, n + 2);
APadded(2:end-1, 2:end-1) = A; % pad A with zeroes on each side
ADilated = zeros(m + 2, n + 2); % this will hold the dilated shape.
for i = 1:m
for j = 1:n
mask = false(size(APadded)); % logical mask, so that it can be used for indexing
mask(i:i+2, j:j+2) = true; % this places a 3x3 square of true values around (i, j)
ADilated(i + 1, j + 1) = max(APadded(mask));
end
end
Shape subtraction
This is basically a logical AND and a logical NOT to remove the intersection:
ABoundary = ADilated & (~APadded);
At this stage you may want to remove the border we added to do the dilation, since we don't need it any more.
ABoundary = ABoundary(2:end-1, 2:end-1);
Find the minimum data point along the boundary
We can use our logical boundary to index the original data into a vector, then just take the minimum of that vector.
dataMinimum = min(data(ABoundary));
You should look at this as a morphology problem, not set theory. This can be solved pretty easily with imdilate() (requires the image package). You basically only need to subtract the image from its dilation with a 3x3 matrix of ones.
octave> test = logical ([0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 1 1 1 1
0 0 1 1 1 1]);
octave> imdilate (test, true (3)) - test
ans =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
It does not, however, pad with NaN. If you really want that, you could pad your original matrix with false, do the operation, and then check if there are any true values in the border.
Note that you don't have to use logical(), in which case you'll have to use ones() instead of true(). But that takes more memory and has worse performance.
EDIT: since you are trying to do it without using any MATLAB toolbox, take a look at the source of imdilate() in Octave. For the case of logical matrices (which is your case) it's a simple use of filter2(), which belongs to MATLAB core. That said, the following one-liner should work fine and be much faster:
octave> (filter2 (true (3), test) > 0) - test
ans =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
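To connect this back to the stated goal of taking the minimum over the boundary cells, here is a minimal end-to-end sketch that uses conv2 from core MATLAB in place of filter2. The data matrix is a made-up placeholder with the same size as test, and the variable names are illustrative.

```matlab
test = logical([0 0 0 0 0 0
                0 0 0 0 0 0
                0 0 1 1 0 0
                0 0 1 1 0 0
                0 0 1 1 1 1
                0 0 1 1 1 1]);
data = reshape(1:36, 6, 6);            % hypothetical data, filled column by column
% A cell belongs to the dilated shape if any cell in its 3x3 neighbourhood is set
dilated  = conv2(double(test), ones(3), 'same') > 0;
boundary = dilated & ~test;            % dilated shape minus the original shape
boundaryMin = min(data(boundary));     % minimum of the data over the boundary cells
```

With this placeholder data the smallest boundary value is data(2,2), which is 8.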
One possible solution is to take the subset and add it to the original matrix several times, each time offsetting its position by one row and one column (up-left, down-left, up-right, down-right). The result is then expanded by one row and column all around the original subset. You then use the original matrix to mask the original subset to zero.
Like this:
test_new = test + ...
[[test(2:end,2:end);zeros(1,size(test,1)-1)],zeros(size(test,1),1)] + ... %move subset up-left
[[zeros(1,size(test,1)-1);test(1:end-1,2:end)],zeros(size(test,1),1)] + ... %move down-left
[zeros(size(test,1),1),[test(2:end,1:end-1);zeros(1,size(test,1)-1)]] + ... %move subset up-right
[zeros(size(test,1),1),[zeros(1,size(test,1)-1);test(1:end-1,1:end-1)]]; %move subset down-right
test_masked = test_new.*~test; %mask with original matrix
result = test_masked;
result(result>1)=1; % ensure that there is only 1's, not 2, 3, etc.
The result for this on your test matrix is:
result =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
Edited - it now grabs the corners as well, by moving the subset up and to the left, up and to the right, down then left and down then right.
I expect this would be a very quick way to achieve this - it doesn't have any loops, nor functions - just matrix operations.