This is my one-dimensional array A, containing 10 numbers:
A = [-8.92100000000000 10.6100000000000 1.33300000000000 ...
-2.57400000000000 -4.52700000000000 9.63300000000000 ...
4.26200000000000 16.9580000000000 8.16900000000000 4.75100000000000];
I want the loop to compute the mean over sliding intervals of length 2, 4 and 8, like this:
(a(1)+a(2))/2 - value stored in one element of a matrix, say m = zeros(10)
then (a(1)+a(2)+a(3)+a(4))/4 - mean
then (a(1)+a(2)+...+a(8))/8
then shift the index by one:
(a(2)+a(3))/2 - mean
(a(2)+a(3)+a(4)+a(5))/4
(a(2)+a(3)+...+a(9))/8
So basically intervals of length 2^n.
You could do this with conv, without loops:
avg_2 = mean([A(1:end-1);A(2:end)])
avg_4 = conv(A,ones(1,4)/4,'valid')
avg_8 = conv(A,ones(1,8)/8,'valid')
Output for the sample Input:
avg_2 =
0.8445 5.9715 -0.6205 -3.5505 2.5530 6.9475 10.6100 12.5635 6.4600
avg_4 =
0.1120 1.2105 0.9662 1.6985 6.5815 9.7555 8.5350
avg_8 =
3.3467 5.4830 4.7506
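If the window lengths should follow the 2^n pattern automatically, the same conv call can be wrapped in a short loop over n. This is only a sketch: the names maxPow, avg and m are my own, and the NaN padding of m is an assumption about how the results should be stored (row k, column n holds the mean of the 2^n values starting at A(k)).

```matlab
A = [-8.921 10.61 1.333 -2.574 -4.527 9.633 4.262 16.958 8.169 4.751];

maxPow = floor(log2(numel(A)));    % largest n with 2^n <= numel(A), 3 here
avg = cell(1, maxPow);             % avg{n} holds the means for window length 2^n
m = nan(numel(A), maxPow);         % NaN marks start positions with no full window

for n = 1:maxPow
    w = 2^n;                                   % window length: 2, 4, 8
    avg{n} = conv(A, ones(1, w)/w, 'valid');   % one mean per window position
    m(1:numel(avg{n}), n) = avg{n};            % column n = results for 2^n
end
```

Here avg{1}, avg{2} and avg{3} correspond to avg_2, avg_4 and avg_8 above.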
Finding Standard Deviation for an example (std_4)
%// each 1x4 sliding sub-matrix is made a column
%// for eg:- if A is 1x6 you would get 1-2-3-4, 2-3-4-5, 3-4-5-6 each as a column
%// ending with 3 columns. for 1x10 matrix, you would get 7 columns
reshaped_4 = im2col(A,[1 4],'sliding'); %// change 4 to 2 or 8 for other examples
%// calculating the mean of every column
mean_4 = mean(reshaped_4);
%// Subtract each value of the column with the mean value of corresponding column
out1 = bsxfun(@minus,reshaped_4,mean_4);
%// finally element-wise squaring, mean of each column
%// and then element-wise sqrt to get the output.
std_4 = sqrt(mean(out1.^2))
Output for the sample Input:
std_4 =
7.0801 5.8225 5.4304 5.6245 7.8384 4.5985 5.0906
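Note that the code above divides by the window length, so it gives the population standard deviation (the same as std(x,1)), not MATLAB's default sample version. Also, im2col is part of the Image Processing Toolbox. If that toolbox is not available, a conv-only sketch along the following lines should give the same numbers; it relies on the identity var = E[x^2] - E[x]^2, and the variable names are my own.

```matlab
A = [-8.921 10.61 1.333 -2.574 -4.527 9.633 4.262 16.958 8.169 4.751];

w = 4;                               % window length, change to 2 or 8 as needed
k = ones(1, w)/w;                    % averaging kernel
mu  = conv(A,    k, 'valid');        % sliding mean, E[x]
mu2 = conv(A.^2, k, 'valid');        % sliding mean of squares, E[x^2]

% max(...,0) guards against tiny negative values caused by round-off
std_w = sqrt(max(mu2 - mu.^2, 0));   % population std, divides by w

% Sample std (divides by w-1), if that is what you need instead
std_w_sample = std_w * sqrt(w/(w-1));
```

For very large values with small spread this formula can lose precision, in which case the column-wise approach above is the safer choice.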
Full code for OP
clc;
clear;
close all;
A = [-8.92100000000000 10.6100000000000 1.33300000000000 ...
-2.57400000000000 -4.52700000000000 9.63300000000000 ...
4.26200000000000 16.9580000000000 8.16900000000000 4.75100000000000];
reshaped_2 = im2col(A,[1 2],'sliding'); %// Length Two
mean_2 = mean(reshaped_2);
out1 = bsxfun(@minus,reshaped_2,mean_2);
std_2 = sqrt(mean(out1.^2))
reshaped_4 = im2col(A,[1 4],'sliding'); %// Four
mean_4 = mean(reshaped_4);
out1 = bsxfun(@minus,reshaped_4,mean_4);
std_4 = sqrt(mean(out1.^2))
reshaped_8 = im2col(A,[1 8],'sliding'); %// Eight
mean_8 = mean(reshaped_8);
out1 = bsxfun(@minus,reshaped_8,mean_8);
std_8 = sqrt(mean(out1.^2))
Related
I am running a LASSO estimation method alongside a for loop.
Here is the code:
%Lasso
data = rand(246,3); %random data for illustrative purposes
XL1 = lagmatrix(data,1); %Lags the data matrix by one period
ydata = data; %Specifies the dependent variable
ydata([1],:)=[]; %Removes the top row due to the lagged X
XL1([1],:)=[]; %Removes the top row of the lagged X, which becomes NaN from lagmatrix
for ii = 1:3 %For loop to complete LASSO for all industries
y = ydata(:,ii); %y is the industry we are trying to forecast
rng default % For reproducibility, as the LASSO uses some random numbers
[B,FitInfo] = lasso([XL1],y,'CV',10,'PredictorNames',{'x1','x2','x3'});
idxLambdaMinMSE = FitInfo.IndexMinMSE;
ii
minMSEModelPredictors = FitInfo.PredictorNames(B(:,idxLambdaMinMSE)~=0)
end
The output that the LASSO provides is
ii = 1
minMSEModelPredictors =
1×1 cell array
{'x2'}
ii = 2
minMSEModelPredictors =
1×3 cell array
{'x1'} {'x2'} {'x3'}
ii = 3
minMSEModelPredictors =
1×2 cell array
{'x2'} {'x3'}
For the purposes of automating this, I need the result to be reported in the following manner,
Results = {[2],[1 2 3],[2 3]};
I know this is a long shot, but it would be helpful: the above is easy to type out by hand, but if I increase the dimensions it becomes a very difficult task.
Each output of minMSEModelPredictors is a cell array of the form
minMSEModelPredictors = {'x1', 'x2', 'x3'};
We can use strrep to get rid of the 'x' (or just don't have an 'x' in your predictor names to begin with), and str2double to convert the cell array to a numeric array.
Then storing the results is trivial...
Result = cell(1,3); % Initialise output
for ii = 1:3
% stuff...
minMSEModelPredictors = FitInfo.PredictorNames(B(:,idxLambdaMinMSE)~=0);
Result{ii} = str2double( strrep( minMSEModelPredictors, 'x', '' ) );
end
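If the predictor names are just x1, x2, ... in the column order of XL1 (an assumption on my part), you could also skip the names entirely and take the indices of the non-zero coefficients with find. A minimal sketch with a made-up coefficient matrix, so that it runs without calling lasso:

```matlab
% Hypothetical stand-ins for the lasso outputs:
% rows of B are predictors, columns are lambda values
B = [0   0.5;
     1.2 0.7;
     0   0  ];
idxLambdaMinMSE = 2;    % assumed index of the minimum-MSE lambda

% Indices of the predictors with non-zero coefficients, as a row vector
selected = find(B(:, idxLambdaMinMSE) ~= 0).';
% selected is [1 2]
```

Inside the loop this would become Result{ii} = find(B(:,idxLambdaMinMSE)~=0).';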
I have data, which can be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix in which the last column holds the subscripts.
Now I want to calculate nanmean() for each subscript, and also store the number of rows for each subscript. I have 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see the commented part in the definition of these variables).
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = @(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you test, for each of the numel(uniqueSubs) subscripts (10,000 in this case), whether each of the N rows (1,000,000 here) belongs to that subscript, which results in 10^10 checks. The proposed code sorts the rows (sorting is N log N, so on the order of 6,000,000 operations here, taking the logarithm base 10) and then loops once over the full array without additional checks.
For completeness, here is the original code along with my version, showing that the two give the same result:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another
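A different technique, not taken from either answer above, is to drop the loop over subscripts altogether and let accumarray do the grouping: replace the NaNs by zeros, then divide the per-group sums by the per-group counts of valid entries. The following is only a sketch on a small data set; avM3, isValid and the sizes are names and values I picked for illustration.

```matlab
rng(0);                          % fixed seed so the example is repeatable
N = 1e4; K = 50;                 % small sizes, chosen only for illustration
M = [randn(N,5) randi([1 K], N, 1)];
M(M < -1.2) = NaN;               % same NaN injection as in the question

[uniqueSubs, ~, g] = unique(M(:,6));   % g maps each row to its group number
nGroups = numel(uniqueSubs);
X = M(:,1:5);
isValid = ~isnan(X);
X(~isValid) = 0;                 % NaNs contribute nothing to the sums

avM3 = nan(nGroups, 6);
for c = 1:5                      % loop over the 5 data columns, not over groups
    s = accumarray(g, X(:,c), [nGroups 1]);                % per-group sum
    n = accumarray(g, double(isValid(:,c)), [nGroups 1]);  % per-group non-NaN count
    avM3(:,c) = s ./ n;          % 0/0 gives NaN if a group has no valid data
end
avM3(:,6) = accumarray(g, 1, [nGroups 1]);   % number of rows per group
```

The rows of avM3 are ordered like uniqueSubs, so they should line up with avM from the original loop up to round-off. I have not timed this against the sorted-loop version.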
Elements of a column matrix of non-sequential numbers (sourceData) should have their values incremented if their index positions lie between certain values as defined in a second column matrix (triggerIndices) which lists the indices sequentially.
This can be easily done with a for-loop but can it be done in a vectorized way?
%// Generation of example data follows
sourceData = randi(1e3,100,1);
%// sourceData = 1:1:1000; %// Would show more clearly what is happening
triggerIndices = randperm(length(sourceData),15);
triggerIndices = sort(triggerIndices);
%// End of example data generation
%// Code to be vectorized follows
increment = 75;
addOn = 100;
for index = 1:1:length(triggerIndices)-1
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) = ...
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) + addOn;
addOn = addOn + increment;
end
sourceData(triggerIndices(end):1:end) = ...
sourceData(triggerIndices(end):1:end) + addOn;
%// End of code to be vectorized
How about replacing everything with:
vals = sparse(triggerIndices, 1, increment, numel(sourceData), 1);
vals(triggerIndices(1)) = addOn;
sourceData(:) = sourceData(:) + cumsum(vals);
This is basically a variant of run-length decoding shown here.
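To see why the cumulative sum reproduces the loop, it helps to run it on a tiny case. The sketch below uses a zero vector and hand-picked trigger positions (both my own choices), and a full vector of step heights instead of sparse, which gives the same result.

```matlab
sourceData = zeros(10, 1);       % zeros make the added amounts visible
triggerIndices = [3 6 8];        % hypothetical sorted trigger positions
increment = 75;
addOn = 100;

% Step heights: addOn at the first trigger, increment at each later one
steps = zeros(numel(sourceData), 1);
steps(triggerIndices) = increment;
steps(triggerIndices(1)) = addOn;

% The running total of the steps is the amount to add at each position
result = sourceData + cumsum(steps);
% result.' is [0 0 100 100 100 175 175 250 250 250]
```

Elements before the first trigger get nothing, elements 3 to 5 get 100, elements 6 to 7 get 175, and everything from element 8 on gets 250, which is exactly what the for-loop produces.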
Suppose I have a non-square matrix, for example 30x35, and I want to split it into 4 blocks of 15x18 each, filling the added cells with zeros. Can that be done in MATLAB?
You can do it by copying the matrix (twice) and then setting the unwanted part to 0:
m = rand([30 35]);
mLeft = m;
mLeft(1:15, :) = 0;
mRight = m;
mRight(16:end, :) = 0;
Or it could be the other way around: first create a matrix full of 0's and then copy in the content you are interested in.
mLeft = zeros(size(m));
mLeft(16:end, :) = m(16:end, :);
A generalisation could be done as:
% find the splits, the position where blocks end
splits = round(linspace(1, numRows+1, numBlocks+1));
% and for each block
for s = 1:length(splits)-1
% create matrix with 0s the size of m
mAux = zeros(size(m));
% copy the content only in the block you are interested in
mAux( splits(s):splits(s+1)-1, : ) = m( splits(s):splits(s+1)-1, : );
% do whatever you want with mAux before it is overwritten on the next iteration
end
So with the 30x35 example (numRows = 30), and assuming you want 6 blocks (numBlocks = 6), splits will be:
splits = [1 6 11 16 21 26 31]
meaning that the i-th block starts at row splits(i) and finishes at row splits(i+1)-1.
Then you create an empty matrix:
mAux = zeros(size(m));
And copy the content of m from row splits(s) to row splits(s+1)-1:
mAux( splits(s):splits(s+1)-1, : ) = m( splits(s):splits(s+1)-1, : )
This example illustrates the case where each subdivision spans ALL the columns. If you want subsets of rows AND columns, you will have to find the splits in both directions and then use 2 nested loops:
for si = 1:length(splitsI)-1
for sj = 1:length(splitsJ)-1
mAux = zeros(size(m));
mAux( splitsI(si):splitsI(si+1)-1, splitsJ(sj):splitsJ(sj+1)-1 ) = ...
m( splitsI(si):splitsI(si+1)-1, splitsJ(sj):splitsJ(sj+1)-1 );
end
end
Have you looked at blockproc?
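If the goal is literally four 15x18 blocks from a 30x35 matrix, one possible reading of the question is to zero-pad the matrix up to 30x36 first and then cut it with mat2cell. A minimal sketch under that assumption (blockSize, padded and blocks are names I made up; the padding goes on the bottom and right edges):

```matlab
m = rand(30, 35);
blockSize = [15 18];                        % requested block size

nBlocks = ceil(size(m) ./ blockSize);       % blocks per dimension, [2 2] here
padded = zeros(nBlocks .* blockSize);       % 30x36, zero filled
padded(1:size(m,1), 1:size(m,2)) = m;       % original data in the top-left corner

% Cut into equally sized blocks; blocks{i,j} is 15x18
blocks = mat2cell(padded, ...
    repmat(blockSize(1), 1, nBlocks(1)), ...
    repmat(blockSize(2), 1, nBlocks(2)));
```

Only the last column of blocks{1,2} and blocks{2,2} contains the padding zeros.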
I have a time series in the following format:
time data value
733408.33 x1
733409.21 x2
733409.56 x3
etc..
The data runs from approximately 01-Jan-2008 to 31-Dec-2010.
I want to separate the data into columns of monthly length.
For example the first column (January 2008) will comprise of the corresponding data values:
(first 01-Jan-2008 data value):(data value immediately preceding the first 01-Feb-2008 value)
Then the second column (February 2008):
(first 01-Feb-2008 data value):(data value immediately preceding the first 01-Mar-2008 value)
et cetera...
Some ideas I've been thinking of but don't know how to put together:
1. Convert all serial time numbers (e.g. 733408.33) to character strings with datestr.
2. Use strmatch('01-January-2008',DatesInChars) to find the indices of the rows corresponding to 01-January-2008.
3. Tricky part (?): TransformedData(:,i) = OriginalData(start:end)? Here end = strmatch(1) - 1 and start = 1. Then change start at the end of the loop to strmatch(1), run step 2 again to find the next "starting index", and change end to the "new" strmatch(1)-1?
Having it speed optimized would be nice; I am going to apply it on data sampled ~2 million times.
Thanks!
I would use histc with a list of month boundaries as the second parameter (note: call histc with two return values, so that you also get the bin index of every sample).
The edge list can easily be created with datenum or datevec.
This way you avoid operations on strings, so it should be fast.
EDIT:
Example with result in a simple data structure (including some code from @Rody):
% Generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = tspan(:) + randn(size(tspan(:))); % add some noise so it's non-uniform
data = randn(size(tspan));
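% Generate list of edges: the first day of every month, plus 01-Jan-2011
% to close the last bin (histc's final bin only counts exact matches)
edge = [];
for y = 2008:2010
for m = 1:12
edge = [edge datenum(y, m, 1)];
end
end
edge = [edge datenum(2011, 1, 1)];
% Histogram
[number, bin] = histc(tspan, edge);
% Setup of result: one cell per month
result = {};
for n = 1:length(edge)-1
result{n} = [tspan(bin == n), data(bin == n)];
end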
% Test
% 04-Aug-2008 17:25:20
datestr(result{8}(4,1))
tspan(data == result{8}(4,2))
datestr(tspan(data == result{8}(4,2)))
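In newer MATLAB releases histc is flagged as not recommended, and discretize returns the bin index directly. The following is a sketch of the same binning idea with discretize and accumarray; the test data, the variable names and the closing edge at 01-Jan-2011 are my own choices.

```matlab
rng(1);                                           % repeatable test data
t = sort(datenum(2008,1,1) + 1096*rand(500,1));   % sorted times within 2008-2010
data = randn(size(t));

% Month boundaries: first day of each month, plus one closing edge
[mm, yy] = ndgrid(1:12, 2008:2010);
edges = [datenum(yy(:), mm(:), 1); datenum(2011, 1, 1)];   % 37 edges, 36 months

bin = discretize(t, edges);      % month index 1..36 for every sample

% Collect the samples of each month into one cell; t is sorted, so the
% order inside each cell follows time
byMonth = accumarray(bin, data, [numel(edges)-1, 1], @(x) {x});
```

With real data that may fall outside the edges, discretize returns NaN for those samples, so they would need to be filtered out before the accumarray call.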
Assuming you have sorted, non-equally-spaced date numbers, the way to go here is to put the relevant data in a cell array, so that each entry corresponds to the next month and can hold a different number of elements.
Here's how to do that quite efficiently:
% generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = tspan(:) + randn(size(tspan(:))); % add some noise so it's non-uniform
data = randn(size(tspan));
% find month numbers
[~,M] = datevec(tspan);
% find indices where the month changes; append a sentinel one past the end
% so that the final month is closed off as well
inds = [find(diff([0; M])); numel(M)+1];
% extract data in columns
sz = numel(inds)-1;
cols = cell(sz,1);
for ii = 1:sz
cols{ii} = data( inds(ii) : inds(ii+1)-1 );
end
Note that it can be difficult to determine which entry in cols belongs to which month and year, so here's how to do it in a more human-readable way:
% change this line:
[y,M] = datevec(tspan);
% and change these lines:
cols = cell(sz,3);
for ii = 1:sz
cols{ii,1} = data( inds(ii) : inds(ii+1)-1 );
% also store the year and month
cols{ii,2} = y(inds(ii));
cols{ii,3} = M(inds(ii));
end
I'll assume you have timeVals, an Nx1 double vector holding the time value of each datum, and that data is also an Nx1 array. I also assume data and timeVals are sorted according to time: that is, the samples you have are ordered according to the time they were taken.
How about:
subs = @(x,i) x(:,i);
months = subs( datevec(timeVals), 2 ); % extract the month of year as a number from the time
r = find( months ~= [months(2:end); months(end)+1] ); % last index of each run of equal months
monthOfCell = months( r );
r( 2:end ) = r( 2:end ) - r( 1:end-1 ); % turn end indices into run lengths
dataByMonth = mat2cell( data, r ); % data and timeVals are Nx1, so split along rows
timeByMonth = mat2cell( timeVals, r );
After running this code, you have a cell array dataByMonth in which each cell contains all data for one specific month. The corresponding cell of timeByMonth holds the sampling times of that month's data. Finally, monthOfCell gives the month number (1-12) of each cell.
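One caveat: comparing month numbers alone would merge, say, February 2008 with February 2009 if no samples lie in between. A small variation, sketched below on six hand-made samples, groups on a combined year-month key instead; the key formula and the variable names are my own.

```matlab
% Six sorted sample times; note the jump from Feb 2008 straight to Feb 2009
timeVals = datenum([2008 2008 2008 2009 2009 2009], ...
                   [1    1    2    2    2    3   ], ...
                   [5    20   1    10   28   3   ]).';
data = (1:6).';

dv  = datevec(timeVals);
key = dv(:,1)*12 + dv(:,2);                      % unique number per year-month

lastOfRun = find(key ~= [key(2:end); Inf]);      % last index of each month
counts = diff([0; lastOfRun]);                   % samples per month: [2; 1; 2; 1]

dataByMonth = mat2cell(data,     counts);        % cells: [1;2], 3, [4;5], 6
timeByMonth = mat2cell(timeVals, counts);
yearMonthOfCell = dv(lastOfRun, 1:2);            % [year month] of each cell
```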