save vectors of different sizes in matrix - matlab

I would like to divide a vector in many vectors and put all of them in a matrix. I got this error "Subscripted assignment dimension mismatch."
STEP = zeros(50,1);
STEPS = zeros(50,length(locate));
for i = 1:(length(locate)-1)
STEP = filtered(locate(i):locate(i+1));
STEPS(:,i) = STEP;
end
I take the value of "filtered" from (1:50) at the first time for example and I would like to stock it in the first row of a matrix, then for iterations 2, I take value of "filtered from(50:70) for example and I stock it in row 2 in the matrix, and this until the end of the loop..
If someone has an idea, I don't get it! Thank you!

As mentioned in the comments, to make it work you can edit the loopy code at the end with -
STEPS(1:numel(STEP),i) = STEP;
Also, output array STEPS doesn't seem to use the last column. So, the initialization could use one less column, like so -
STEPS = zeros(50,length(locate)-1);
All is good with the loopy code, but in the long run with a high level language like MATLAB, you might want to look for faster codes and one way to achieve that would be vectorized codes. So, let me suggest a vectorized solution using bsxfun's masking capability to process such ragged-arrays. The implementation to cover generic elements in locate would look something like this -
% Get differentiation, which represent the interval lengths for each col
diffs = diff(locate)+1;
% Initialize output array
out = zeros(max(diffs),length(locate)-1);
% Get elements from filtered array for setting into o/p array
vals = filtered(sort([locate(1):locate(end) locate(2:end-1)]));
% Use bsxfun to create a mask that are to be set in o/p array and set thereafter
out(bsxfun(#ge,diffs,(1:max(diffs)).')) = vals;
Sample run for verification -
>> % Inputs
locate = [6,50,70,82];
filtered = randi(9,1,120);
% Get extent of output array for number of rows
N = max(diff(locate))+1;
>> % Original code with corrections
STEP = zeros(N,1);
STEPS = zeros(N,length(locate)-1);
for i = 1:(length(locate)-1)
STEP = filtered(locate(i):locate(i+1));
STEPS(1:numel(STEP),i) = STEP;
end
>> % Proposed code
diffs = diff(locate)+1;
out = zeros(max(diffs),length(locate)-1);
vals = filtered(sort([locate(1):locate(end) locate(2:end-1)]));
out(bsxfun(#ge,diffs,(1:max(diffs)).')) = vals;
>> max_error = max(abs(out(:)-STEPS(:)))
max_error =
0

Related

Fast way to get mean values of rows accordingly to subscripts

I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix, where the last row is subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is, that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see commented part in the definition of these variables.
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = #(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check numel(uniqueSubs), 10,000 in this case, whether all N, 1,000,000 here, numbers belong to a certain category, which results in 10^12 checks. The proposed code sorts the rows (sorting is NlogN, thus 6,000,000 here), and then loop once over the full array without additional checks.
For completion, here is the original code, along with my version, and it shows the two are the same:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another

insert value in a matrix in a for loop

I wrote this matlab code in order to concatenate the results of the integration of all the columns of a matrix extracted form a multi matrix array.
"datimf" is a matrix composed by 100 matrices, each of 224*640, vertically concatenated.
In the first loop i select every single matrix.
In the second loop i integrate every single column of the selected matrix
obtaining a row of 640 elements.
The third loop must concatenate vertically all the lines previously calculated.
Anyway i got always a problem with the third loop. Where is the error?
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=0:224:(22400-224)
for j = 1:640
for k = 1:100
singleframe(:,:) = datimf([i+1:(i+223)+1],:);
int_frame_all(:,j) = trapz(singleframe(:,j));
conc(:,k) = vertcat(int_frame_all);
end
end
end
An alternate way to do this without using any explicit loops (edited in response to rayryeng's comment below. It's also worth noting that using cellfun may not be more efficient than explicitly looping.):
nmats = 100;
nrows = 224;
ncols = 640;
datimf = rand(nmats*nrows, ncols);
% convert to an nmats x 1 cell array containing each matrix
cellOfMats = mat2cell(datimf, ones(1, nmats)*nrows, ncols);
% Apply trapz to the contents of each cell
cellOfIntegrals = cellfun(#trapz, cellOfMats, 'UniformOutput', false);
% concatenate the results
conc = cat(1, cellOfIntegrals{:});
Taking inspiration from user2305193's answer, here's an even better "loop-free" solution, based on reshaping the matrix and applying trapz along the appropriate dimension:
datReshaped = reshape(datimf, nrows, nmats, ncols);
solution = squeeze(trapz(datReshaped, 1));
% verify solutions are equivalent:
all(solution(:) == conc(:)) % ans = true
I think I understand what you want. The third loop is unnecessary as both the inner and outer loops are 100 elements long. Also the way you have it you are assigning singleframe lots more times than necessary since it does not depend on the inner loops j or k. You were also trying to add int_frame_all to conc before int_frame_all was finished being populated.
On top of that the j loop isn't required either since trapz can operate on the entire matrix at once anyway.
I think this is closer to what you intended:
datimf = rand(224*100,640);
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=1:100
idx = (i-1)*224+1;
singleframe(:,:) = datimf(idx:idx+223,:);
% for j = 1:640
% int_frame_all(:,j) = trapz(singleframe(:,j));
% end
% The loop is uncessary as trapz can operate on the entire matrix at once.
int_frame_all = trapz(singleframe,1);
%I think this is what you really want...
conc(i,:) = int_frame_all;
end
It looks like you're processing frames in a video.
The most efficent approach in my experience would be to reshape datimf to be 3-dimensional. This can easily be achieved with the reshape command.
something along the line of vid=reshape(datimf,224,640,[]); should get you far in this regard, where the 3rd dimension is time. vid(:,:,1) then would display the first frame of the video.

Extract values from vector and save in new vector

I have a vector Cycle() that can contain several elements with a variable size.
I want to extract from this vector all the values which are in the odd columns, i.e. Cycle(1), Cycle(3), Cycle(5) ... and save them into a new vector Rcycle.
That's my code:
Rcycle = zeros(1, length(cycle)/2);
Rcycle(1) = cycle(1);
for j=3:length(cycle);
for i=2:length(Rcycle);
Rcycle(i) = cycle(j);
j = j+2;
end
end
Also I want to extract from Cycle() the even columns and save them in a vector Lcycle. My code:
Lcycle = zeros(1, length(cycle)/2);
Lcycle(1) = cycle(2);
for k=4:length(cycle);
for i=2:length(cycle);
Lcycle(i) = cycle(k);
k = k+2;
end
end
By running this for a sample Cycle() with 12 elements I get the right results for Lcycle, but the wrong ones for Rcycle. Also I get the error that my matrix have exceeded its dimension.
Has anyone any idea how to solve this in a more smooth way?
Use vector indexing!
Rcyle=cycle(1:2:end); %// Take from cycle starting from 1, each 2, until the end
Lcycle=cycle(2:2:end);%// same, but start at 2.

Matlab: how to implement a dynamic vector

I am refering to an example like this
I have a function to analize the elements of a vector, 'input'. If these elements have a special property I store their values in a vector, 'output'.
The problem is that at the begging I don´t know the number of elements it will need to store in 'output'so I don´t know its size.
I have a loop, inside I go around the vector, 'input' through an index. When I consider special some element of this vector capture the values of 'input' and It be stored in a vector 'ouput' through a sentence like this:
For i=1:N %Where N denotes the number of elements of 'input'
...
output(j) = input(i);
...
end
The problem is that I get an Error if I don´t previously "declare" 'output'. I don´t like to "declare" 'output' before reach the loop as output = input, because it store values from input in which I am not interested and I should think some way to remove all values I stored it that don´t are relevant to me.
Does anyone illuminate me about this issue?
Thank you.
How complicated is the logic in the for loop?
If it's simple, something like this would work:
output = input ( logic==true )
Alternatively, if the logic is complicated and you're dealing with big vectors, I would preallocate a vector that stores whether to save an element or not. Here is some example code:
N = length(input); %Where N denotes the number of elements of 'input'
saveInput = zeros(1,N); % create a vector of 0s
for i=1:N
...
if (input meets criteria)
saveInput(i) = 1;
end
end
output = input( saveInput==1 ); %only save elements worth saving
The trivial solution is:
% if input(i) meets your conditions
output = [output; input(i)]
Though I don't know if this has good performance or not
If N is not too big so that it would cause you memory problems, you can pre-assign output to a vector of the same size as input, and remove all useless elements at the end of the loop.
output = NaN(N,1);
for i=1:N
...
output(i) = input(i);
...
end
output(isnan(output)) = [];
There are two alternatives
If output would be too big if it was assigned the size of N, or if you didn't know the upper limit of the size of output, you can do the following
lengthOutput = 100;
output = NaN(lengthOutput,1);
counter = 1;
for i=1:N
...
output(counter) = input(i);
counter = counter + 1;
if counter > lengthOutput
%# append output if necessary by doubling its size
output = [output;NaN(lengthOutput,1)];
lengthOutput = length(output);
end
end
%# remove unused entries
output(counter:end) = [];
Finally, if N is small, it is perfectly fine to call
output = [];
for i=1:N
...
output = [output;input(i)];
...
end
Note that performance degrades dramatically if N becomes large (say >1000).

Bucketing Algorithm

I've got some code that works, but is a bit of a bottleneck, and I'm stuck trying to figure out how to speed it up. It's in a loop, and I can't figure how to vectorize it.
I've got a 2D array, vals, that represents timeseries data. Rows are dates, columns are different series. I'm trying to bucket the data by months to perform various operations on it (sum, mean, etc). Here is my current code:
allDts; %Dates/times for vals. Size is [size(vals, 1), 1]
vals;
[Y M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates
[Y M] = datevec(fomDates);
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);
newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed
for k = 1:length(fomDates);
This next line is the bottleneck because I call it so many times.(looping)
idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
bucketed = vals(idx, :);
newVals(k, :) = nansum(bucketed);
end %for
Any Ideas? Thanks in advance.
That's a difficult problem to vectorize. I can suggest a way to do it using CELLFUN, but I can't guarantee that it will be faster for your problem (you would have to time it yourself on the specific data sets you are using). As discussed in this other SO question, vectorizing doesn't always work faster than for loops. It can be very problem-specific which is the best option. With that disclaimer, I'll suggest two solutions for you to try: a CELLFUN version and a modification of your for-loop version that may run faster.
CELLFUN SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart); % Get unique start dates
valCell = mat2cell(vals(sortIndex,:),diff([0 uniqueIndex]));
newVals = cellfun(#nansum,valCell,'UniformOutput',false);
The call to MAT2CELL groups the rows of vals that have the same start date together into cells of a cell array valCell. The variable newVals will be a cell array of length numel(uniqueStarts), where each cell will contain the result of performing nansum on the corresponding cell of valCell.
FOR-LOOP SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart); % Get unique start dates
vals = vals(sortIndex,:); % Sort the values according to start date
nMonths = numel(uniqueStarts);
uniqueIndex = [0 uniqueIndex];
newVals = nan(nMonths,size(vals,2)); % Preallocate
for iMonth = 1:nMonths,
index = (uniqueIndex(iMonth)+1):uniqueIndex(iMonth+1);
newVals(iMonth,:) = nansum(vals(index,:));
end
If all you need to do is form the sum or mean on rows of a matrix, where the rows are summed depending upon another variable (date) then use my consolidator function. It is designed to do exactly this operation, reducing data based on the values of an indicator series. (Actually, consolidator can also work on n-d data, and with a tolerance, but all you need to do is pass it the month and year information.)
Find consolidator on the file exchange on Matlab Central