Indices of constant consecutive values in a matrix, and number of constant values - matlab

I have a matrix with constant consecutive values randomly distributed throughout the matrix. I want the indices of the consecutive values, and further, I want a matrix of the same size as the original matrix, where the number of consecutive values are stored in the indices of the consecutive values. For Example
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
I have struggled mightily to find a solution to this problem. It has relevance for meteorological data quality control. For example, if I have a matrix of temperature data from a number of sensors, and I want to know what days had constant consecutive values, and how many days were constant, so I can then flag the data as possibly faulty.
temperature matrix is number of days x number of stations and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
If you have a solution to that, please provide! Thank you.

For this kind of problems, I made my own utility function runlength:
function RL = runlength(M)
% calculates length of runs of consecutive equal items along columns of M
% work along columns, so that you can use linear indexing
% find locations where items change along column
jumps = diff(M) ~= 0;
% add implicit jumps at start and end
ncol = size(jumps, 2);
jumps = [true(1, ncol); jumps; true(1, ncol)];
% find linear indices of starts and stops of runs
ijump = find(jumps);
nrow = size(jumps, 1);
istart = ijump(rem(ijump, nrow) ~= 0); % remove fake starts in last row
istop = ijump(rem(ijump, nrow) ~= 1); % remove fake stops in first row
rl = istop - istart;
assert(sum(rl) == numel(M))
% make matrix of 'derivative' of runlength
% don't need last row, but needs same size as jumps for indices to be valid
dRL = zeros(size(jumps));
dRL(istart) = rl;
dRL(istop) = dRL(istop) - rl;
% remove last row and 'integrate' to get runlength
RL = cumsum(dRL(1:end-1,:));
It only works along columns since it uses linear indexing. Since you want do something similar along rows, you need to transpose back and forth, so you could use it for your case like so:
>> original = [1 1 1;2 2 3; 1 2 3];
>> original = original.'; % transpose, since runlength works along columns
>> output = runlength(original);
>> output = output.'; % transpose back
>> output(output == 1) = 0; % see hitzg's comment
>> output
output =
3 3 3
2 2 0
0 0 0

Related

MATLAB: Applying vectors of row and column indices without looping

I have a situation analogous to the following
z = magic(3) % Data matrix
y = [1 2 2]' % Column indices
So,
z =
8 1 6
3 5 7
4 9 2
y represents the column index I want for each row. It's saying I should take row 1 column 1, row 2 column 2, and row 3 column 2. The correct output is therefore 8 5 9.
I worked out I can get the correct output with the following
x = 1:3;
for i = 1:3
result(i) = z(x(i),y(i));
end
However, is it possible to do this without looping?
Two other possible ways I can suggest is to use sub2ind to find the linear indices that you can use to sample the matrix directly:
z = magic(3);
y = [1 2 2];
ind = sub2ind(size(z), 1:size(z,1), y);
result = z(ind);
We get:
>> result
result =
8 5 9
Another way is to use sparse to create a sparse matrix which you can turn into a logical matrix and then sample from the matrix with this logical matrix.
s = sparse(1:size(z,1), y, 1, size(z,1), size(z,2)) == 1; % Turn into logical
result = z(s);
We also get:
>> result
result =
8
5
9
Be advised that this only works provided that each row index linearly increases from 1 up to the end of the rows. This conveniently allows you to read the elements in the right order taking advantage of the column-major readout that MATLAB is based on. Also note that the output is also a column vector as opposed to a row vector.
The link posted by Adriaan is a great read for the next steps in accessing elements in a vectorized way: Linear indexing, logical indexing, and all that.
there are many ways to do this, one interesting way is to directly work out the indexes you want:
v = 0:size(y,2)-1; %generates a number from 0 to the size of your y vector -1
ind = y+v*size(z,2); %generates the indices you are looking for in each row
zinv = z';
zinv(ind)
>> ans =
8 5 9

How to generate this matrix in matlab

H matrix is n-by-n, n=10000. I can use loop to generate this matrix in matlab. I just wonder if there are any methods that can do this without looping in matlab.
You can see that the upper right portion of the matrix consists of 1 / sqrt(n*(n-1)), the diagonal elements consist of -(n-1)/sqrt(n*(n-1)), the first column consists of 1/sqrt(n) and the rest of the elements are zero.
We can generate the full matrix that consists of the first column having all 1 / sqrt(n), then having the rest of the columns with 1 / sqrt(n*(n-1)) then we'll need to modify the matrix to include the rest of what you want.
As such, let's concentrate on the elements that start from row 2, column 2 as these follow a pattern. Once we're done, we can construct the other things that build up the final matrix.
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Next, we will tackle the diagonal elements:
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Now, let's zero the rest of the elements:
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
Now that we're done, let's create a new matrix that pieces all of this together:
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Therefore, the full code is:
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Here's an example with n = 6:
>> H
H =
Columns 1 through 3
0.408248290463863 0.707106781186547 0.408248290463863
0.408248290463863 -0.707106781186547 0.408248290463863
0.408248290463863 0 -0.816496580927726
0.408248290463863 0 0
0.408248290463863 0 0
0.408248290463863 0 0
Columns 4 through 6
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
-0.866025403784439 0.223606797749979 0.182574185835055
0 -0.894427190999916 0.182574185835055
0 0 -0.912870929175277
Since you are working with a pretty large n value of 10000, you might want to squeeze out as much performance as possible.
Going with that, you can use an efficient approach based on cumsum -
%// Values to be set in each column for the upper triangular region
upper_tri = 1./sqrt([1:n].*(0:n-1));
%// Diagonal indices
diag_idx = [1:n+1:n*n];
%// Setup output array
out = zeros(n,n);
%// Set the first row of output array with upper triangular values
out(1,:) = upper_tri;
%// Set the diagonal elements with the negative triangular values.
%// The intention here is to perform CUMSUM across each column later on,
%// thus therewould be zeros beyond the diagonal positions for each column
out(diag_idx) = -upper_tri;
%// Set the first element of output array with n^(-1/2)
out(1) = -1/sqrt(n);
%// Finally, perform CUMSUM as suggested earlier
out = cumsum(out,1);
%// Set the diagonal elements with the actually expected values
out(diag_idx(2:end)) = upper_tri(2:end).*[-1:-1:-(n-1)];
Runtime Tests
(I) With n = 10000, the runtime at my end were - Elapsed time is 0.457543 seconds.
(II) Now, as the final performance-squeezing practice, you can edit the pre-allocation step for out with a faster pre-allocation scheme as listed in this MATLAB Undodumented Blog. Thus, the pre-allocation step would look like this -
out(n,n) = 0;
The runtime with this edited code was - Elapsed time is 0.400399 seconds.
(III) The runtime for n = 10000 with the other answer by #rayryeng yielded - Elapsed time is 1.306339 seconds.

frequency of each vector value in another vector matlab

I need to calculate the frequency of each value in another vector in MATLAB.
I can use something like
for i=1:length(pdata)
gt(i)=length(find(pf_test(:,1)==pdata(i,1)));
end
But I prefer not to use loop because my dataset is quite large. Is there anything like histc (which is used to find the frequency of values in one vector) to find the frequency of one vector value in another vector?
If your values are only integers, you could do the following:
range = min(pf_test):max(pf_test);
count = histc(pf_test,range);
gt = count(ismember(range,a));
gt(~ismember(unique(a),b)) = 0;
If you can't guarantee that the values are integers, it's a bit more complicated. One possible method of it would be the following:
%restrict yourself to values that appear in the second vector
filter = ismember(pf_test,pdata);
% sort your first vector (ignore this if it is already sorted)
spf_test = sort(pf_test);
% Find the first and last occurrence of each element
[~,last] = unique(spf_test(filter));
[~,first] = unique(spf_test(filter),'first');
% Initialise gt
gt = zeros(length(pf_test));
% Fill gt
gt(filter) = (last-first)+1;
EDIT: Note that I may have got the vectors the wrong way around - if this doesn't work as expected, switch pf_test and pdata. It wasn't immediately clear to me which was which.
You mention histc. Why are you not using it (in its version with two input parameters)?
>> pdata = [1 1 3 2 3 1 4 4 5];
>> pf_test = 1:6;
>> histc(pdata,pf_test)
ans =
3 1 2 2 1 0

Sum every n rows of matrix

Is there any way that I can sum up columns values for each group of three rows in a matrix?
I can sum three rows up in a manual way.
For example
% matrix is the one I wanna store the new data.
% data is the original dataset.
matrix(1,1:end) = sum(data(1:3, 1:end))
matrix(2,1:end) = sum(data(4:6, 1:end))
...
But if the dataset is huge, this wouldn't work.
Is there any way to do this automatically without loops?
Here are four other ways:
The obligatory for-loop:
% for-loop over each three rows
matrix = zeros(size(data,1)/3, size(data,2));
counter = 1;
for i=1:3:size(data,1)
matrix(counter,:) = sum(data(i:i+3-1,:));
counter = counter + 1;
end
Using mat2cell for tiling:
% divide each three rows into a cell
matrix = mat2cell(data, ones(1,size(data,1)/3)*3);
% compute the sum of rows in each cell
matrix = cell2mat(cellfun(#sum, matrix, 'UniformOutput',false));
Using third dimension (based on this):
% put each three row into a separate 3rd dimension slice
matrix = permute(reshape(data', [], 3, size(data,1)/3), [2 1 3]);
% sum rows, and put back together
matrix = permute(sum(matrix), [3 2 1]);
Using accumarray:
% build array of group indices [1,1,1,2,2,2,3,3,3,...]
idx = floor(((1:size(data,1))' - 1)/3) + 1;
% use it to accumulate rows (appliead to each column separately)
matrix = cell2mat(arrayfun(#(i)accumarray(idx,data(:,i)), 1:size(data,2), ...
'UniformOutput',false));
Of course all the solution so far assume that the number of rows is evenly divisble by 3.
This one-liner reshapes so that all the values needed for a particular cell are in a column, does the sum, and then reshapes the back to the expected shape.
reshape(sum(reshape(data, 3, [])), [], size(data, 2))
The naked 3 could be changed if you want to sum a different number of rows together. It's on you to make sure the number of rows in each group divides evenly.
Slice the matrix into three pieces and add them together:
matrix = data(1:3:end, :) + data(2:3:end, :) + data(3:3:end, :);
This will give an error if size(data,1) is not a multiple of three, since the three pieces wouldn't be the same size. If appropriate to your data, you might work around that by truncating data, or appending some zeros to the end.
You could also do something fancy with reshape and 3D arrays. But I would prefer the above (unless you need to replace 3 with a variable...)
Prashant answered nicely before but I would have a simple amendment:
fl = filterLength;
A = yourVector (where mod(A,fl)==0)
sum(reshape(A,fl,[]),1).'/fl;
There is the ",1" that makes the line run even when fl==1 (original values).
I discovered this while running it in a for loop like so:
... read A ...
% Plot data
hold on;
averageFactors = [1 3 10 30 100 300 1000];
colors = hsv(length(averageFactors));
clear legendTxt;
for i=1:length(averageFactors)
% ------ FILTERING ----------
clear Atrunc;
clear ttrunc;
clear B;
fl = averageFactors(i); % filter length
Atrunc = A(1:L-mod(L,fl),:);
ttrunc = t(1:L-mod(L,fl),:);
B = sum(reshape(Atrunc,fl,[]),1).'/fl;
tB = sum(reshape(ttrunc,fl,[]),1).'/fl;
length(B)
plot(tB,B,'color',colors(i,:) )
%kbhit ()
endfor

MATLAB: duplicating vector 'n' times [duplicate]

This question already has answers here:
Octave / Matlab: Extend a vector making it repeat itself?
(3 answers)
Closed 9 years ago.
I have a vector, e.g.
vector = [1 2 3]
I would like to duplicate it within itself n times, i.e. if n = 3, it would end up as:
vector = [1 2 3 1 2 3 1 2 3]
How can I achieve this for any value of n? I know I could do the following:
newvector = vector;
for i = 1 : n-1
newvector = [newvector vector];
end
This seems a little cumbersome though. Any more efficient methods?
Try
repmat([1 2 3],1,3)
I'll leave you to check the documentation for repmat.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is Using Tony's Trick. Repmat and Reshape are usually found to be slower than Tony's trick as it directly uses Matlabs inherent indexing. To answer you question,
Lets say, you want to tile the row vector r=[1 2 3] N times like r=[1 2 3 1 2 3 1 2 3...], then,
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
EDIT : Reply to #Li-aung Yip's doubts
I conducted a small Matlab test to check the speed differential between repmat and tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's-Trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used;
N= 10 ;% ASLO Try for values N= 10, 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b
Based on Abhinav's answer and some tests, I wrote a function which is ALWAYS faster than repmat()!
It uses the same parameters, except for the first parameter which must be a vector and not a matrix.
function vec = repvec( vec, rows, cols )
%REPVEC Replicates a vector.
% Replicates a vector rows times in dim1 and cols times in dim2.
% Auto optimization included.
% Faster than repmat()!!!
%
% Copyright 2012 by Marcel Schnirring
if ~isscalar(rows) || ~isscalar(cols)
error('Rows and cols must be scaler')
end
if rows == 1 && cols == 1
return % no modification needed
end
% check parameters
if size(vec,1) ~= 1 && size(vec,2) ~= 1
error('First parameter must be a vector but is a matrix or array')
end
% check type of vector (row/column vector)
if size(vec,1) == 1
% set flag
isrowvec = 1;
% swap rows and cols
tmp = rows;
rows = cols;
cols = tmp;
else
% set flag
isrowvec = 0;
end
% optimize code -> choose version
if rows == 1
version = 2;
else
version = 1;
end
% run replication
if version == 1
if isrowvec
% transform vector
vec = vec';
end
% replicate rows
if rows > 1
cc = vec(:,ones(1,rows));
vec = cc(:);
%indices = 1:length(vec);
%c = indices';
%cc = c(:,ones(rows,1));
%indices = cc(:);
%vec = vec(indices);
end
% replicate columns
if cols > 1
%vec = vec(:,ones(1,cols));
indices = (1:length(vec))';
indices = indices(:,ones(1,cols));
vec = vec(indices);
end
if isrowvec
% transform vector back
vec = vec';
end
elseif version == 2
% calculate indices
indices = (1:length(vec))';
% replicate rows
if rows > 1
c = indices(:,ones(rows,1));
indices = c(:);
end
% replicate columns
if cols > 1
indices = indices(:,ones(1,cols));
end
% transform index when row vector
if isrowvec
indices = indices';
end
% get vector based on indices
vec = vec(indices);
end
end
Feel free to test the function with all your data and give me feedback. When you found something to even improve it, please tell me.