Iteratively take mean of column in Matlab

Hi, I have a column of values in Matlab (PDS(:,39)). This column is filtered for various things, and there are two separate flagging columns (PDS(:,[41 81])) that are either 0 for a valid row or -1 for a non-valid row. I am taking the mean of the valid data, and if the mean is above the threshold, I'd like to make the largest value non-valid and take the mean again, repeating until the mean is below a certain value (0.2 in this instance). Here is my code:
% identify the VALID values
U1 = (PDS(:,81)==0);
F1 = (PDS(:,41)==0);
% only calculate using the valid elements
shearave = mean(PDS(U1&F1,39));
while shearave > 0.2
clear im
% determine the largest shear value overall for filtered and
% non-flagged
[c im] = max(PDS(U1&F1,39));
% make this value a NaN
PDS(im,39)=NaN;
% filter using a specific column and the overall column
PDS(im,41)=-1;
F1 = (PDS(:,41)==0);
% calculate shear ave again using new flagging column - remove the ";" so I can see the average change
shearave = mean(PDS(U1&F1,39))
end
The output that Matlab gives me is:
shearave =
0.3032
shearave =
0.3032
shearave =
0.3032
etc
The loop is not re-evaluating with the new valid data. How do I solve this problem? Do I have to use a break or continue? Or perhaps a different type of loop? Thanks for any help.

You don't need a loop. I'd do the following.
Sort your data:
m=PDS(U1&F1,39);
[x isort]=sort(m);
Then calculate the cumulative mean of the sorted vector:
y = cumsum(x)./[1:numel(x)]';
Because x is sorted in ascending order, y never decreases, so truncating at 0.2 keeps exactly the values the loop would keep. Retrieve them using the indices found:
ind=find(y<=0.2);
values_needed=m(isort(ind));
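A minimal sketch of the cumulative-mean approach that goes one step further and recovers the PDS row numbers the loop would have flagged, so the flag column can be updated in one go. The PDS contents are made-up example data (only columns 39, 41 and 81 are used), and the `if isempty(k)` guard is an addition for the case where even the smallest valid value exceeds 0.2.

```matlab
% Made-up example: values in column 39, flags in columns 41 and 81
PDS = zeros(8, 81);
PDS(:, 39) = [0.1; 0.5; 0.05; 1.2; 0.15; 0.9; 0.2; 0.3];
PDS(3, 41) = -1;                          % row 3 is already invalid
valid = find(PDS(:, 81) == 0 & PDS(:, 41) == 0);  % row numbers of valid data
[x, isort] = sort(PDS(valid, 39));        % ascending order
y = cumsum(x) ./ (1:numel(x))';           % y(k) = mean of the k smallest values
k = find(y <= 0.2, 1, 'last');            % longest prefix with mean <= 0.2
if isempty(k)
    k = 0;                                % even the smallest value exceeds 0.2
end
to_flag = valid(isort(k+1:end));          % PDS rows the loop would invalidate
PDS(to_flag, 41) = -1;
shearave = mean(PDS(PDS(:, 81) == 0 & PDS(:, 41) == 0, 39));
disp(shearave)
```

With this data, rows 4, 6 and 2 (values 1.2, 0.9 and 0.5) are flagged, and the mean of the remaining valid values is 0.1875.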

You iteratively replace values in column 39 with NaN. However, mean does not ignore NaN; it returns NaN as the new average. You can see this with a little experiment:
>> mean([3, 4, 2, NaN, 4, 1])
ans = NaN
Therefore, shearave < 0.2 will never be true.
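One more thing worth checking in the question's loop: `[c im] = max(PDS(U1&F1,39))` returns `im` as a position inside the filtered subvector, not as a row number of PDS. `PDS(im,39)` and `PDS(im,41)` can therefore hit a row that is already invalid, in which case nothing changes and the same mean is printed again. That would fit the repeated 0.3032 in the output (a NaN mean would print as NaN, and `NaN > 0.2` is false, so the loop would stop). A minimal sketch of the loop with the index mapped back to PDS rows, using made-up example data and assuming that flagging a row in column 41 is enough to exclude it (so the NaN assignment is dropped):

```matlab
% Made-up example data: value in column 39, flags in columns 41 and 81
% (0 = valid, -1 = invalid)
PDS = zeros(8, 81);
PDS(:, 39) = [0.1; 0.5; 0.05; 1.2; 0.15; 0.9; 0.2; 0.3];
PDS(3, 41) = -1;                          % row 3 starts out invalid
valid = PDS(:, 81) == 0 & PDS(:, 41) == 0;
shearave = mean(PDS(valid, 39));
while shearave > 0.2
    valid_rows = find(valid);             % PDS row numbers of the subset
    [~, im] = max(PDS(valid, 39));        % im is a position in the subset...
    PDS(valid_rows(im), 41) = -1;         % ...so map it back to a PDS row
    valid = PDS(:, 81) == 0 & PDS(:, 41) == 0;
    shearave = mean(PDS(valid, 39));      % recomputed on the new valid set
end
disp(shearave)
```

With the original indexing, this data would flag row 3 (already invalid) on every pass and print the same mean forever; with the mapping, rows 4, 6 and 2 are flagged in turn and the loop stops at 0.1875.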

Related

One hot encode column vectors in matrix without iterating

I am implementing a neural network and am trying to one hot encode a matrix of column vectors based on the max value in each column. Previously, I had been iterating through the matrix vector by vector, but I've been told that this is unnecessary and that I can actually one hot encode every column vector in the matrix at the same time. Unfortunately, after perusing SO, GitHub, and MathWorks, nothing seems to be getting the job done. I've listed my previous code below. Please help! Thanks :)
UPDATE:
This is close to what I am trying to accomplish, except that it only sets the largest value in the entire matrix to 1. I want to set the max value in each COLUMN to 1.
one_hots = bsxfun(@eq, mini_batch_activations, max(mini_batch_activations(:)))
UPDATE 2:
This is what I am looking for, but it only works for rows. I need columns.
V = max(mini_batch_activations,[],2);
idx = mini_batch_activations == V;
Iterative code:
% This is the matrix I want to one hot encode
mini_batch_activations = activations{length(layers)};
%For each vector in the mini_batch:
for m = 1:size(mini_batch_activations, 2)
% Isolate column vector for mini_batch
vector = mini_batch_activations(:,m);
% One hot encode vector to compare to target vector
one_hot = zeros(size(mini_batch_activations, 1),1);
[max_val,ind] = max(vector);
one_hot(ind) = 1;
% Isolate corresponding column vector in targets
mini_batch = mini_batch_y{k};
target_vector = mini_batch(:,m);
% Compare one_hot to target vector , and increment result if they match
if isequal(one_hot, target_vector)
num_correct = num_correct + 1;
endif
...
endfor
You’ve got the maxima for each column:
V = max(mini_batch_activations,[],1); % note 1, not 2!
Now all you need is an equality comparison; the output is a logical array that readily converts to 0s and 1s. Note that MATLAB (R2016b and later) and Octave do implicit singleton expansion (on older MATLAB versions, use bsxfun(@eq, mini_batch_activations, V) instead):
one_hot = mini_batch_activations==V;
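A minimal sketch of the vectorized accuracy count, with made-up activations `A` and targets `targets` standing in for `mini_batch_activations` and the target mini-batch. One difference from the comparison above: if a column contains its maximum twice, `A == V` marks both entries, while the original loop's `max` marks only the first. Taking the row indices from `max(A, [], 1)` and placing the 1s with `sub2ind` keeps the loop's behaviour; `all(..., 1)` then replaces the per-column `isequal` check.

```matlab
% Made-up activations (3 classes x 4 samples) and one-hot targets
A = [0.1 0.7 0.2 0.5;
     0.6 0.2 0.2 0.1;
     0.3 0.1 0.6 0.4];
targets = [0 1 0 0;
           1 0 0 0;
           0 0 1 1];
% Row index of the first maximum in each column (same tie rule as the loop)
[~, ind] = max(A, [], 1);
% Place a single 1 per column at (ind(c), c)
one_hot = zeros(size(A));
one_hot(sub2ind(size(A), ind, 1:size(A, 2))) = 1;
% A sample is correct when its whole column matches the target column
num_correct = sum(all(one_hot == targets, 1));
disp(num_correct)
```

Here only the fourth column misses its target, so `num_correct` is 3.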

Optimize nested for loop for calculating xcorr of matrix rows

I have 2 nested loops which do the following:
Get two rows of a matrix
Check if indices meet a condition or not
If they do: calculate xcorr between the two rows and put it into new vector
Find the index of the maximum value of sub vector and replace element of LAG matrix with this value
I don't know how I can speed this code up by vectorizing or otherwise.
b=size(data,1);
F=size(data,2);
LAG= zeros(b,b);
for i=1:b
for j=1:b
if j>i
x=data(i,:);
y=data(j,:);
d=xcorr(x,y);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(i,j)=I-1;
d=xcorr(y,x);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(j,i)=I-1;
end
end
end
First, a note on floating point precision...
You mention in a comment that your data contains the integers 0, 1, and 2. You would therefore expect a cross-correlation to give integer results. However, since the calculation is being done in double-precision, there appears to be some floating-point error introduced. This error can cause the results to be ever so slightly larger or smaller than integer values.
Since your calculations involve looking for the location of the maxima, then you could get slightly different results if there are repeated maximal integer values with added precision errors. For example, let's say you expect the value 10 to be the maximum and appear in indices 2 and 4 of a vector d. You might calculate d one way and get d(2) = 10 and d(4) = 10.00000000000001, with some added precision error. The maximum would therefore be located in index 4. If you use a different method to calculate d, you might get d(2) = 10 and d(4) = 9.99999999999999, with the error going in the opposite direction, causing the maximum to be located in index 2.
The solution? Round your cross-correlation data first:
d = round(xcorr(x, y));
This will eliminate the floating-point errors and give you the integer results you expect.
Now, on to the actual solutions...
Solution 1: Non-loop option
You can pass a matrix to xcorr and it will perform the cross-correlation for every pairwise combination of columns. Using this, you can forgo your loops altogether like so:
d = round(xcorr(data.'));
[~, I] = max(d(F:(2*F)-1,:), [], 1);
LAG = reshape(I-1, b, b).';
Solution 2: Improved loop option
There are limits to how large data can be for the above solution: xcorr returns one column per pair of rows, so d is (2F-1)-by-b^2, and together with the intermediate variables this can exceed the maximum array size available. In such a case, for loops may be unavoidable, but you can improve upon the for-loop solution above. Specifically, you can compute the cross-correlation once for a pair (x, y), then just flip the result for the pair (y, x):
% Loop over rows:
for row = 1:b
% Loop over upper matrix triangle:
for col = (row+1):b
% Cross-correlation for upper triangle:
d = round(xcorr(data(row, :), data(col, :)));
[~, I] = max(d(:, F:(2*F)-1));
LAG(row, col) = I-1;
% Cross-correlation for lower triangle:
d = fliplr(d);
[~, I] = max(d(:, F:(2*F)-1));
LAG(col, row) = I-1;
end
end
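If the Signal Processing Toolbox (which provides xcorr) is not available, the improved loop can be sketched with base MATLAB's conv: for real row vectors x and y of equal length, conv(x, fliplr(y)) returns the same lag-ordered sequence as xcorr(x, y), from lag -(F-1) to F-1. The small data matrix below is made up, and round is kept as suggested above for the floating-point reason.

```matlab
% Made-up data: b signals (rows) of length F with values 0, 1 and 2
data = [0 1 2 0 1;
        1 2 0 1 0;
        2 0 1 2 1];
[b, F] = size(data);
LAG = zeros(b, b);
for row = 1:b
    for col = (row+1):b
        % Same lag-ordered sequence as xcorr(data(row,:), data(col,:))
        d = round(conv(data(row, :), fliplr(data(col, :))));
        [~, I] = max(d(F:(2*F)-1));     % lags 0 .. F-1 only
        LAG(row, col) = I - 1;
        % Reversing the lag order gives the pair (col, row)
        d = fliplr(d);
        [~, I] = max(d(F:(2*F)-1));
        LAG(col, row) = I - 1;
    end
end
disp(LAG)
```

The flip works because the cross-correlation of (y, x) at lag m equals that of (x, y) at lag -m, so reversing d swaps the roles of the two rows without a second correlation.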

delete elements from a matrix and calculate mean

I have an N-by-M matrix as input called GR, which consists of the following numbers: -3,0,2,4,7,10,12
And I have to return a vector. If M=1, then it should just return the input.
If M>1, it should remove the lowest number from each row and then calculate the mean of the remaining numbers.
However, if one of the numbers in the row is -3, it should return the value -3 in the output.
My thoughts of the problem:
Is it possible to make a for loop?
for i=1:length(GR(:,1))
If length(GR(1,:))==1
GR=GR
end
If length(GR(1,:))>1
x=min(GR(i,:))=[] % for removing the lowest number in the row
GR=sum(x)/length(x(i,:))
I just don't have any idea how to detect whether any of the numbers in the row is -3 and then return that value instead of calculating the mean. And when I tried to delete the lowest number in the matrix using x=min(GR(i,:))=[], MATLAB gave me this error message: 'Deletion requires an existing variable.'
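I put in break and continue statements: if the matrix has only one column, the loop returns the input and stops; if a row contains -3, that row's result is -3 and the loop moves on to the next row.
Note that GR is an N-by-M matrix, so the loop runs over its rows and stores one result per row in res.
res = zeros(size(GR,1),1);
for i=1:size(GR,1)
    if size(GR,2)==1
        res = GR;          % single column: return the input unchanged
        break
    end
    row = GR(i,:);
    if any(row==-3)
        res(i) = -3;       % a -3 anywhere in the row decides the result
        continue
    end
    [~,k] = min(row);      % position of the lowest number in the row
    row(k) = [];           % delete it (deletion needs an indexed variable)
    res(i) = mean(row);
end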
You can use NaNs, nanmean, any, and the dim argument of these functions (randsample and nanmean come from the Statistics and Machine Learning Toolbox):
% generate random matrix
M = randi(3);
N = randi(3);
nums = [-3,0,2,4,7,10,12];
GR = reshape(randsample(nums,N*M,true),[N M]);
% computation:
% find if GR has only one column
if size(GR,2) == 1
res = GR;
else
% find indexes of rows with -3 in them
idxs3 = any(GR == -3,2);
% the (column) index of the min. value in each row
[~,minCol] = min(GR,[],2);
% convert [row,col] index pair into linear index
minInd = sub2ind(size(GR),1:size(GR,1),minCol');
% set minimum value in each row to nan - to ignore it on averaging
GR(minInd) = nan;
% averaging each rows (except for the Nans)
res = nanmean(GR,2); % in base MATLAB (R2015a+): mean(GR,2,'omitnan')
% set each row with (-3) in it to (-3)
res(idxs3) = -3;
end
disp(res)
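Because exactly one lowest value is removed from each row, the mean of the remaining M-1 values equals (row sum minus row minimum) divided by M-1. A minimal toolbox-free sketch based on that identity, with a made-up 3-by-3 GR, so no NaN handling or nanmean is needed:

```matlab
% Made-up input using the allowed values -3, 0, 2, 4, 7, 10, 12
GR = [ 2  4  7;
      -3 10 12;
       0  0 12];
if size(GR, 2) == 1
    res = GR;                             % single column: return the input
else
    M = size(GR, 2);
    % Mean after removing one lowest value = (row sum - row min) / (M - 1)
    res = (sum(GR, 2) - min(GR, [], 2)) / (M - 1);
    % Any row that contains -3 returns -3
    res(any(GR == -3, 2)) = -3;
end
disp(res)
```

This gives 5.5, -3 and 6: the first row drops 2 and averages 4 and 7, the second contains -3, and the third drops only one of its two 0s.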

Assign labels based on given examples for a large dataset effectively

I have matrix X (100000 X 10) and vector Y (100000 X 1). X rows are categorical and assume values 1 to 5, and labels are categorical too (11 to 20);
The rows of X are repetitive, and only ~25% of the rows are unique. I want Y to hold the statistical mode of all the labels for a particular unique row.
And then there comes another dataset P (90000 X 10), and I want to predict labels Q based on the previous exercise.
What I tried is finding unique rows of X using unique in MATLAB, and then assigning the statistical mode of the labels to each unique row. For P, I can use ismember and carry out the same.
The issue is the size of the dataset: it takes 1.5-2 hours to complete the process. Is there a vectorized version possible in MATLAB?
Here is my code:
[X_unique,~,ic] = unique(X,'rows','stable');
labels=zeros(length(X_unique),1);
for i=1:length(X_unique)
labels(i)=mode(Y(ic==i));
end
Q=zeros(length(P),1);
for j=1:length(X_unique)
Q(all(repmat(X_unique(j,:),length(P),1)==P,2))=labels(j);
end
You will be able to accelerate your first loop a great deal if you replace it entirely with:
labels = accumarray(ic, Y, [], @(y) mode(y));
The second loop can be accelerated by using all(bsxfun(@eq, X_unique(j,:), P), 2) inside Q(...). This is a good vectorized approach, assuming your arrays are not extremely large w.r.t. the available memory on your machine. In addition, to save more time, you could apply the same unique trick you used on X to P, and run all the comparisons on a much smaller array:
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
EDIT:
Then compute Q_unique on the smaller array:
Q_unique = zeros(size(P_unique,1),1);
for i = 1:size(X_unique,1)
Q_unique(all(bsxfun(@eq, X_unique(i,:), P_unique), 2)) = labels(i);
end
and convert it back to Q_full to match the original P input:
Q_full = Q_unique(IC_P);
END EDIT
Finally, if memory is an issue, in addition to everything above, you might want to use a semi-vectorized approach inside your second loop:
for i = 1:size(X_unique,1)
idx = true(size(P,1), 1);
for j = 1:size(X_unique,2)
idx = idx & (X_unique(i,j) == P(:,j));
end
Q(idx) = labels(i);
% Q(all(bsxfun(@eq, X_unique(i,:), P), 2)) = labels(i);
end
This takes about 3x longer than the bsxfun version, but if memory is limited, you have to pay with speed.
ANOTHER EDIT
Depending on your version of Matlab, you could also use containers.Map to your advantage by mapping textual representations of the numeric sequences to the calculated labels. See example below.
% find unique members of X to work with a smaller array
[X_unique, ~, IC_X] = unique(X, 'rows', 'stable');
% compute labels
labels = accumarray(IC_X, Y, [], @(y) mode(y));
% convert X to cellstr -- textual representation of the number sequence
X_cellstr = cellstr(char(X_unique+48)); % 48 is ASCII for 0
% map each X to its label
X_map = containers.Map(X_cellstr, labels);
% find unique members of P to work with a smaller array
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
% convert P to cellstr -- textual representation of the number sequence
P_cellstr = cellstr(char(P_unique+48)); % 48 is ASCII for 0
% --- EDIT --- avoiding error on missing keys in X_map --------------------
% find which P's exist in map
isInMapP = X_map.isKey(P_cellstr);
% pre-allocate Q_unique to the size of P_unique (can be any value you want)
Q_unique = nan(size(P_cellstr)); % NaN is safe to use since not a label
% find the labels for each P_unique that exists in X_map
Q_unique(isInMapP) = cell2mat(X_map.values(P_cellstr(isInMapP)));
% --- END EDIT ------------------------------------------------------------
% convert back to full Q array to match original P
Q_full = Q_unique(IC_P);
This takes about 15 seconds to run on my laptop, most of which is consumed by the computation of mode.
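Since the question already mentions ismember: with the 'rows' option, its second output gives, for each row of P, the index of the matching row in X_unique (0 when there is none), which can replace the second loop entirely. A minimal sketch with made-up data; rows of P that never occur in X get NaN, as in the containers.Map version. 'stable' is left out of unique here because the mapping does not depend on row order.

```matlab
% Made-up training data: rows of X with their labels Y
X = [1 2; 1 2; 3 4; 1 2; 3 4; 5 5];
Y = [11; 11; 12; 13; 12; 20];
% Unique rows of X and, for each, the mode of its labels
[X_unique, ~, ic] = unique(X, 'rows');
labels = accumarray(ic, Y, [], @mode);
% Made-up query rows; [9 9] never occurs in X
P = [3 4; 9 9; 1 2; 5 5];
% loc(r) is the row of X_unique that matches P(r,:), or 0 if none
[tf, loc] = ismember(P, X_unique, 'rows');
Q = nan(size(P, 1), 1);                   % NaN marks "no known label"
Q(tf) = labels(loc(tf));
disp(Q)
```

The row [1 2] appears with labels 11, 11 and 13, so its mode is 11, and Q comes out as 12, NaN, 11 and 20.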

Variable has "incorrect" value when submitted to Matlab Grader

I am struggling with my Matlab homework:
Write a script to do the following:
Generate a matrix called grades of size 8 x 25 that contains random numbers of type double in the range of 1 to 6.
Calculate the mean of matrix rows (mrow), the mean of matrix columns (mcol), and the overall mean (mall) of the matrix grades.
Copy the matrix grades to a new variable, in which you replace the elements in the 5th row and 20th to 23rd column with NaN. Compute the overall mean (mall_2) of this matrix again, i.e., the mean of the remaining values.
I am done with tasks 2-5; however, task 1 is not correct. I am not sure what I am doing wrong. I assume that it has something to do with the type of number (double), but I was unable to convert it.
We have to submit our homework to the online tool "Matlab Grader". The system says:
Matrix of random numbers : Variable grades has an incorrect value.
Here is my code:
% Generate matrix 'grades' with random numbers in the range 1 to 6
a = 1;
b = 6;
grades = (b-a).*rand(8,25) + a;
% calculate mean values 'mrow', 'mcol', 'mall'
mrow = mean(grades,2)
mcol = mean(grades,1)
mall = mean(grades(:))
% Replace elements with NaN
grades(5,20:23) = NaN
%Calculate mean of elements omitting NaN
mall_2 = mean(grades(:),'omitnan')
I assume your homework validation system is checking that everything in the variable grades is a (random) number in the range 1 to 6, as required by question 1.
However, by the end of your computation there are also 4 NaN values in the grades variable (row 5, columns 20 to 23), because you missed this step of question 3:
Copy the matrix grades to a new variable
Instead, you overwrote the elements in grades.
If you did this:
grades_mod = grades;
grades_mod(5,20:23) = NaN;
mall_2 = mean(grades_mod(:),'omitnan');
Then grades would retain its original values (no NaNs) and you can calculate mall_2.
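A minimal sketch of the corrected script, plus a few checks of the kind a grader might run. What Matlab Grader actually tests is not known here, so the checks are assumptions. The mean is taken over the non-NaN elements, which gives the same result as mean(..., 'omitnan').

```matlab
% Task 1: 8 x 25 matrix of doubles, uniformly distributed between 1 and 6
a = 1;
b = 6;
grades = (b - a) .* rand(8, 25) + a;
% Task 3: work on a copy so grades keeps its original values
grades_mod = grades;
grades_mod(5, 20:23) = NaN;               % 4 elements: columns 20, 21, 22, 23
mall_2 = mean(grades_mod(~isnan(grades_mod)));   % mean of the remaining values
% Checks a grader might plausibly run (assumptions, not Grader's real tests)
ok_type  = isa(grades, 'double') && isequal(size(grades), [8 25]);
ok_range = all(grades(:) >= 1 & grades(:) <= 6);
n_nan    = nnz(isnan(grades_mod));
fprintf('%d %d %d\n', ok_type, ok_range, n_nan)
```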