Split dataset into a specific number of samples with a for loop - MATLAB

I have a 35136-by-1 matrix containing power data for 366 days, with 96 measurements per day. I want to take samples covering 252 days: the power data of days 1 to 7 is the first sample, the power data of days 2 to 8 is the second sample, and so on. I then want to reshape the result to size [96 7 1 252].
I wrote the following code, but I get 36 samples instead of 252:
m=7;
for j=1
    sample([j:96*m],:)=solarpower_n([j:96*m],:);
    y([(96*m)+1:96*(m+1)],:)=solarpower_n([(96*m)+1:96*(m+1)],:);
    m=m+1;
    for j=2:246
        sample([(96*(j-1))+1:96*m],:)=solarpower_n([(96*(j-1))+1:96*m],:);
        y([(96*m)+1:96*(m+1)],:)=solarpower_n([(96*m)+1:96*(m+1)],:);
        m=m+1;
    end
end
I want to take a sample from every 7 consecutive days. Let D be the day index and M the index of the power measurement within a day. For 252 days, M=[1,2,3,...,96] and D=[1,2,...,252]. Thus the power of the first day, P1, is 96-by-1. I want sample1={P1,...,P7}, sample2={P2,...,P8}, ..., sample252={P246,...,P252}, stored as a [96 7 1 252] 4-D array.
How can I accomplish this?

Taking samples that way is rather inefficient, since you're copying each data point 7 times. You could simply use indexing:
A = rand(96*366, 1); % Sample data
B = reshape(A,[96 366]); % Reshape all your days in one go
B(:, 1:7) % first 7 days
B(:, 163:170) % Days 163 to 170, etc.
If you do want to copy your data seven times into your 4-D array, you can use a simple for loop:
A = rand(96*366, 1); % Sample data
% Note you need days 253:258, since sample 252 spans days 252 to 258
B = reshape(A(1:96*(252+6)),[96 (252+6)]); % Reshape your first 258 days
C = zeros(size(B,1), 7, 1, size(B,2)-6); % Initialise output
for ii = 1:size(B,2)-6
C(:, :, :, ii) = B(:, ii:ii+6); % Save each 7 day sample
end
Getting rid of the for loop is difficult, given you want a sliding window. There are probably specialised functions for that somewhere, but given your data size a loop should be sufficiently performant.
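If you want to try a loop-free version anyway, one option is to build a 7-by-252 matrix of day indices, where column k holds the days k to k+6, and index B with it in one go. This is a sketch under the same assumptions as the loop above (A holds 96 measurements per day); the names nSamples, dayIdx and C2 are introduced here for illustration.

```matlab
A = rand(96*366, 1);                          % sample data, as above
nSamples = 252;                               % number of 7-day windows
B = reshape(A(1:96*(nSamples+6)), 96, []);    % 96-by-258, one column per day
% Column k of dayIdx lists the days k, k+1, ..., k+6 of window k
dayIdx = bsxfun(@plus, (0:6)', 1:nSamples);   % 7-by-252
% B(:, dayIdx) takes the columns in column-major order of dayIdx, i.e.
% window by window, giving a 96-by-(7*252) matrix; reshape splits it
% into 252 blocks of 96-by-7
C2 = reshape(B(:, dayIdx), 96, 7, 1, nSamples);
```

Note that each day is still copied up to 7 times, exactly as in the loop, so this saves the loop but not the memory.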
For a short introduction on reshape() you can read this answer of mine.

Related

Read a complex and long text file in MATLAB

I have a very long text file which contains the data from 4 different stations with different time steps:
1:00
station 1
a number 1 (e.g. 0.6E-06)
matrix1 (41x36)
station 2
number 2 (e.g. 0.1E-06)
matrix2 (41x36)
station 3
number 3 (e.g. 0.2E-06)
matrix3 (41x36)
station 4
number 4 (e.g. 0.4E-06)
matrix4 (41x36)
2:00
station 1
a number (e.g. 0.24E-06)
matrix5 (41x36)
station 2
a number (e.g. 0.3E-06)
matrix6 (41x36)
station 3
number (e.g. 0.12E-06)
matrix7 (41x36)
station 4
number (e.g. 0.14E-06)
matrix8 (41x36)
.....
and so on
I need to read this data by station and by time step. Note that each matrix must be scaled by multiplying it by the number above it. An example file is here: https://files.fm/u/sn447ttc#/view/example.txt
Could you please help?
Thank you a lot.
My idea here would be to read the text file using fopen and textscan. Afterwards you can search for occurrences of the keyword FACTOR to subdivide the output. Here's the code:
fid=fopen('example.txt'); % open the document
dataRaw=textscan(fid,'%s','Delimiter',''); % read the file with no delimiter to get a cell array with 1 cell per line of the text file
fclose(fid); % close the document
rows=cellfun(@(x) strfind(x,'FACTOR'),dataRaw,'uni',0); % search for occurrences of 'FACTOR'
hasFactor=find(~cellfun(@isempty,rows{1})); % get the row numbers of the lines that contain the word FACTOR
dataRaw=dataRaw{1}; % unwrap the cell array for easier indexing
for ii=1:(numel(hasFactor)-1) % loop over occurrences of the word FACTOR
    array=cellfun(@str2num,dataRaw(hasFactor(ii)+2:hasFactor(ii+1)-1),'uni',0); % extract numerical data
    output{ii}=str2num(dataRaw{hasFactor(ii)+1})*cat(1,array{:}); % create output scaled by the factor
end
array=cellfun(@str2num,dataRaw(hasFactor(end)+2:end),'uni',0);
output{end+1}=str2num(dataRaw{hasFactor(end)+1})*cat(1,array{:}); % These last 2 lines add the last array to the output
outputMat=cat(3,output{:}); % convert to a 3-dimensional matrix
outputStations=[{output(1:4:end)} {output(2:4:end)} {output(3:4:end)} {output(4:4:end)}]; % Sort the output to have 1 cell for each station
outputColumnSums=cellfun(@(x) cellfun(@sum,x,'uni',0),outputStations,'uni',0); % To sum up all the columns of each matrix
outputRowSums=cellfun(@(x) cellfun(@(y) sum(y,2),x,'uni',0),outputStations,'uni',0); % To sum up all the rows of each matrix
This approach is fairly slow and could probably be vectorized, but if you don't need it to be fast it should do the job. I created a cell output with one cell per matrix, plus a 3-dimensional array as an optional output. I hope that works for you.
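To illustrate the FACTOR-splitting idea without the example file, here is a small sketch on an in-memory string. The layout it assumes (a FACTOR line, then the factor on the next line, then the matrix rows up to the next FACTOR) is the one the code above relies on; the sample text and the names txtLines, blockStart, blockEnd and scale are made up for illustration. Computing the last line of every block up front also removes the need to handle the final block separately.

```matlab
% Hypothetical miniature file: two blocks, each with a 2-by-2 matrix
txt = sprintf('FACTOR\n2\n1 2\n3 4\nFACTOR\n10\n5 6\n7 8');
txtLines = strsplit(txt, sprintf('\n'));            % one cell per text line
% Lines containing the keyword FACTOR mark the start of each block
isFactor = ~cellfun(@isempty, strfind(txtLines, 'FACTOR'));
blockStart = find(isFactor);
blockEnd = [blockStart(2:end) - 1, numel(txtLines)]; % last line of each block
output = cell(1, numel(blockStart));
for ii = 1:numel(blockStart)
    scale = str2double(txtLines{blockStart(ii) + 1});          % factor line
    rows = cellfun(@str2num, txtLines(blockStart(ii) + 2 : blockEnd(ii)), ...
                   'UniformOutput', false);                     % numeric rows
    output{ii} = scale * vertcat(rows{:});                      % scaled matrix
end
```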
I have looked into your situation, and it seems that the problem is not as trivial as anticipated. If I have made mistakes in my assumptions about where the data are located, let me know so I can edit this, or simply change the numbers to suit your case. I initially loaded the delimited file into an Excel spreadsheet, just to visualize it.
After reading up on dlmread, I found that one can specify the exact rows and columns to pull from example.txt, as shown here:
data = dlmread('example.txt', ' ', [4 1 45 37]); % [r1 c1 r2 c2]
data2 = dlmread('example.txt', ' ', [47 1 88 37]);
The result is two matrices that are 41-by-37 and contain only numbers. I started data at row 4 to skip the header information/strings. Noticing the pattern, I set it up as a loop:
No_of_matrices_expected = 4;
dataCell = cell(No_of_matrices_expected, 1);
iterations = length(dataCell);
% Initial Conditions
rowBeginning = 4;
col1 = 1; % Constant
rowEnd = rowBeginning + 40; % == 44, right before next header information
col2 = 36; % Constant
for n = 1 : iterations
    dataCell{n} = dlmread('example.txt', ' ', [rowBeginning, col1, rowEnd, col2]);
    rowBeginning = rowBeginning + 41 + 2; % skip previous matrix and skip header info
    rowEnd = rowBeginning + 40;
end
However, I then ran into what you stated earlier: there are four different stations, and each time step has its own timestamp. Running this loop more than 4 times led to unexpected results, and MATLAB crashed. The reason is that each new timestamp adds an extra row. You could either change the loop above to compensate for this extra row, or write a separate for loop for each station. This is your decision to make.
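The first option, adjusting the loop for the extra timestamp row, could look like the sketch below. It assumes the layout used above: the first matrix starts at zero-based row 4, each matrix has 41 rows preceded by 2 header rows (station and number), and every new time step adds exactly one timestamp row. nSteps = 2 is a hypothetical value, and only the row ranges are computed, so the sketch runs without example.txt.

```matlab
nStations = 4;          % matrices per time step
nSteps = 2;             % hypothetical number of time steps to read
rowsPerMatrix = 41;     % data rows per matrix
headerRows = 2;         % station line + number line before each matrix
firstRow = 4;           % zero-based row of the first matrix (as above)
k = (1:nStations*nSteps)';                 % running matrix counter
stepsBefore = floor((k - 1) / nStations);  % completed time steps before matrix k
% Each earlier matrix costs rowsPerMatrix + headerRows rows, and each
% earlier time step adds one extra timestamp row
rowBeginning = firstRow + (k - 1) * (rowsPerMatrix + headerRows) + stepsBefore;
rowEnd = rowBeginning + rowsPerMatrix - 1;
% Each range could then be read as before, e.g.
% dataCell{n} = dlmread('example.txt', ' ', [rowBeginning(n), 1, rowEnd(n), 36]);
```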
If you want to save the header information, I would recommend looking into textscan. You can use it to pull the first column of all the data into a cell array of strings, and then pick out the header information you want. Keep in mind that textscan needs a file identifier from fopen.
I'll let you use what I have found thus far, but let me know if you need more help.

Save to array in for loop, with steps - Matlab

Okay, this is a bit tricky to explain, but I have a long .txt file with data (only one column). It could look like this:
data=[18
32
50
3
19
31
48
2
18
33
51
4]
Now, every fourth value (e.g. 18, 19, 18) represents the same physical quantity, just from different measurements. I want MATLAB to take every fourth value and put it into an array X=[18 19 18], and likewise for the other quantities.
My solution so far looks like this:
for i=1:3;
for j=1:4:12;
X(i)=data(j);
end
end
... in this example, because there are three of each quantity (therefore i=1:3), and there are 12 data points in total (therefore j=1:4:12, in steps of 4). data is simply the loaded list of data points (this works fine; I can test it in the Command Window, e.g. data(2)=32).
My problem is that my array turns out as X=[18 18 18], i.e. only the last iteration is stored in the array.
Of course, in the end I would like to do this for all points, saving the 2nd, 6th, and 10th data points into Y and so on. But I guess this simply means more for loops.
I hope this question makes sense. I guess it is an easy problem to solve.
Why don't you just do this?
>> X = data(1:4:end)
X =
18
19
18
>> Y = data(2:4:end)
Y =
32
31
33
You can reshape the data and then either split it up into different variables or just know that each column is a different variable (I'm now assuming each measurement occurs the same number of times i.e. length(data) is a multiple of 4)
data = reshape(data, 4, []).';
So now if you want
X = data(:,1);
Y = data(:,2);
%// etc...
But also you could just leave it as data all in one variable since calling data(:,1) is hardly more hassle than X.
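A quick check of the reshape approach on the data from the question. Splitting the columns into separate variables with num2cell is optional; the names M, Z and W are only for illustration.

```matlab
data = [18; 32; 50; 3; 19; 31; 48; 2; 18; 33; 51; 4];  % data from the question
M = reshape(data, 4, []).';    % each row is one measurement round, each column one quantity
% Optionally split the columns into separate variables
cols = num2cell(M, 1);         % 1-by-4 cell, one column vector per cell
[X, Y, Z, W] = cols{:};
```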
Now, you should NOT use for loops for this, but I'm going to address what's wrong with your loops and how to solve this using loops, purely as an explanation of the logic. You have a nested loop:
for i=1:3;
for j=1:4:12;
X(i)=data(j);
end
end
Now what you were hoping was that i and j would each move one iteration forward together. So when i==1 then j==1, when i==2 then j==5 etc but this is not what happens at all. To best understand what's going on I suggest you print out the variables at each iteration:
disp(sprintf('i: \tj:'));
for i=1:3;
for j=1:4:12;
disp(sprintf(' %d\t %d',i,j));
end
end
This prints out
i: j:
1 1
1 5
1 9
2 1
2 5
2 9
3 1
3 5
3 9
What you wanted was
disp(sprintf('i: \tj:'));
for i=1:3;
disp(sprintf(' %d\t %d',i,4*i-3));
end
which outputs:
i: j:
1 1
2 5
3 9
Applied to your problem:
%// preallocation!
X = zeros(size(data,1)/4, 1);
for i=1:3
    X(i)=data(i*4 - 3);
end
Or alternatively you can keep a separate count of either i or j:
%// preallocation!
X = zeros(size(data,1)/4, 1);
i = 1;
for j=1:4:numel(data) %// "end" is only valid inside an indexing expression
    X(i)=data(j);
    i = i+1;
end
Just for completeness your own solution should have read
i = 0;
for j=1:4:12;
i = i+1;
X(i)=data(j);
end
Of course am304's answer is a better way of doing it.

How can I vectorize code that runs a function on subsets of a larger matrix?

Let's assume I have the following 9 x 5 matrix:
myArray = [
54.7 8.1 81.7 55.0 22.5
29.6 92.9 79.4 62.2 17.0
74.4 77.5 64.4 58.7 22.7
18.8 48.6 37.8 20.7 43.5
68.6 43.5 81.1 30.1 31.1
18.3 44.6 53.2 47.0 92.3
36.8 30.6 35.0 23.0 43.0
62.5 50.8 93.9 84.4 18.4
78.0 51.0 87.5 19.4 90.4
];
I have 11 "subsets" of this matrix and I need to run a function (let's say max) on each of these subsets. The subsets can be identified with the following matrix of logicals (identified column-wise, not row-wise):
myLogicals = logical([
0 1 0 1 1
1 1 0 1 1
1 1 0 0 0
0 1 0 1 1
1 0 1 1 1
1 1 1 1 0
0 1 1 0 1
1 1 0 0 1
1 1 0 0 1
]);
or via linear indexing:
starts = [2 5 8 10 15 23 28 31 37 40 43]; % index start of each subset
ends = [3 6 9 13 18 25 29 33 38 41 45]; % index end of each subset
such that the first subset is 2:3, the second is 5:6, and so on.
I can find the max of each subset and store it in a vector as follows:
finalAnswers = NaN(11,1);
for n=1:length(starts) % i.e. 1 through the number of subsets
finalAnswers(n) = max(myArray(starts(n):ends(n)));
end
After the loop runs, finalAnswers contains the maximum value of each of the data subsets:
74.4 68.6 78.0 92.9 51.0 81.1 62.2 47.0 22.5 43.5 90.4
Is it possible to obtain the same result without the use of a for loop? In other words, can this code be vectorized? Would such an approach be more efficient than the current one?
EDIT:
I did some testing of the proposed solutions. The data I used was a 1,510 x 2,185 matrix with 10,103 subsets that varied in length from 2 to 916 with a standard deviation of subset length of 101.92.
I wrapped each solution in tic;for k=1:1000 [code here] end; toc; and here are the results:
for loop approach --- Elapsed time is 16.237400 seconds.
Shai's approach --- Elapsed time is 153.707076 seconds.
Dan's approach --- Elapsed time is 44.774121 seconds.
Divakar's approach #2 --- Elapsed time is 127.621515 seconds.
Notes:
I also tried benchmarking Dan's approach by wrapping the k=1:1000 for loop around just the accumarray line (since the rest could theoretically be run just once). In this case the time was 28.29 seconds.
Benchmarking Shai's approach with the lb = ... line left out of the k loop, the time was 113.48 seconds.
When I ran Divakar's code, I got "Non-singleton dimensions of the two input arrays must match each other." errors for the bsxfun lines. I "fixed" this by using conjugate transposition (the apostrophe operator ') on trade_starts(1:starts_extent) and intv(1:starts_extent) in the lines of code calling bsxfun. I'm not sure why this error was occurring...
I'm not sure if my benchmarking setup is correct, but it appears that the for loop actually runs the fastest in this case.
One approach is to use accumarray. Unfortunately in order to do that we first need to "label" your logical matrix. Here is a convoluted way of doing that if you don't have the image processing toolbox:
sz=size(myLogicals);
s_ind(sz(1),sz(2))=0;
%// OR: s_ind = zeros(size(myLogicals))
s_ind(starts) = 1;
labelled = cumsum(s_ind(:)).*myLogicals(:);
So that just does what Shai's bwlabeln implementation does (but this will be numel(myLogicals)-by-1 in shape as opposed to size(myLogicals) in shape)
Now you can use accumarray:
accumarray(labelled(myLogicals), myArray(myLogicals), [], @max)
or else it may be faster to try
result = accumarray(labelled+1, myArray(:), [], @max);
result = result(2:end)
This is fully vectorized, but is it worth it? You'll have to do speed tests against your loop solution to know.
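A self-contained sketch of the accumarray idea on the question's data. It differs from the code above in one detail: instead of taking myLogicals as given, it builds the subset mask from starts and ends with a cumulative sum, which assumes that starts is sorted and the subsets do not overlap. delta, inSubset, labels and vals are names introduced here.

```matlab
myArray = [
    54.7  8.1 81.7 55.0 22.5
    29.6 92.9 79.4 62.2 17.0
    74.4 77.5 64.4 58.7 22.7
    18.8 48.6 37.8 20.7 43.5
    68.6 43.5 81.1 30.1 31.1
    18.3 44.6 53.2 47.0 92.3
    36.8 30.6 35.0 23.0 43.0
    62.5 50.8 93.9 84.4 18.4
    78.0 51.0 87.5 19.4 90.4
    ];
starts = [2 5 8 10 15 23 28 31 37 40 43];
ends   = [3 6 9 13 18 25 29 33 38 41 45];
n = numel(myArray);
% +1 at each start and -1 just after each end (one spare slot for ends == n);
% the running sum is then 1 inside a subset and 0 outside
delta = zeros(n + 1, 1);
delta(starts) = 1;
delta(ends + 1) = delta(ends + 1) - 1;
inSubset = logical(cumsum(delta(1:n)));
% Label every element with the number of starts seen so far
startMark = zeros(n, 1);
startMark(starts) = 1;
labels = cumsum(startMark);
% One max per label, using only the elements inside a subset
vals = myArray(:);
finalAnswers = accumarray(labels(inSubset), vals(inSubset), [], @max);
```

As with the other vectorized versions, whether this beats the plain loop on large inputs has to be measured.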
Use bwlabeln with a vertical connectivity:
lb = bwlabeln( myLogicals, [0 1 0; 0 1 0; 0 1 0] );
Now you have a label 1..11 for each region.
To get max value you can use regionprops
props = regionprops( lb, myArray, 'MaxIntensity' );
finalAnswers = [props.MaxIntensity];
You can use regionprops to get some other properties of each subset, but it is not too general.
If you wish to apply a more general function to each region, e.g., median, you can use accumarray:
finalAnswer = accumarray( lb( myLogicals ), myArray( myLogicals ), [], @median );
Ideas behind vectorization and optimization
One approach to vectorizing this problem is to convert the subsets into regularly shaped blocks and then find the max of the elements of those blocks in one go. The issue with converting to regularly shaped blocks here is that the subsets have unequal lengths. To avoid this, one can create a 2D matrix of indices starting from each of the starts elements and extending up to the maximum subset length. The good thing about this is that it allows vectorization, but at the cost of extra memory, which depends on how scattered the subset lengths are.
Another issue with this vectorization technique is that it can create out-of-range indices for the final subsets.
To avoid this, there are two possible ways:
Use a bigger input array, extending the input array so that the starts indices plus the maximum subset length still lie within the extended array.
Use the original input array for the starts as long as we stay within its limits, and use the original loop code for the rest of the subsets. We can call this mixed programming, just to have a short title. It saves the memory needed to create the extended array in the other approach.
These two approaches are listed next.
Approach #1: Vectorized technique
[m,n] = size(myArray); %// store no. of rows and columns in input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
max_intv_arr = [0:max_intv]'; %// array of max indices extent
[row1,col1] = ind2sub([m n],starts); %// get starts row and column indices
m_ext = max([m, row1+max_intv]); %// no. of rows in extended input array (never fewer than m)
myArrayExt(m_ext,n)=0; %// extended form of input array
myArrayExt(1:m,:) = myArray;
%// New linear indices for extended form of input array
idx = bsxfun(@plus,max_intv_arr,(col1-1)*m_ext+row1);
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArrayExt(idx);
selected_ele(bsxfun(@gt,max_intv_arr,intv))= nan;
%// Get the max of the valid ones for the desired output
out = nanmax(selected_ele); %// desired output
Approach #2: Mixed programming
%// PART - I: Vectorized technique for subsets that when normalized
%// with max extents still lie within limits of input array
intv = ends-starts; %// intervals (starts and ends are assumed to be row vectors, as in the question)
max_intv = max(intv); %// max interval
%// Find the last subset that when extended by max interval would still
%// lie within the limits of input array
starts_extent = find(starts+max_intv<=numel(myArray),1,'last');
max_intv_arr = [0:max_intv]'; %// Array of max indices extent
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArray(bsxfun(@plus,max_intv_arr,starts(1:starts_extent)));
selected_ele(bsxfun(@gt,max_intv_arr,intv(1:starts_extent))) = nan;
out(numel(starts)) = 0; %// storage for output
out(1:starts_extent) = nanmax(selected_ele); %// output values for part-I
%// PART - II: Process rest of input array elements
for n = starts_extent+1:numel(starts)
out(n) = max(myArray(starts(n):ends(n)));
end
Benchmarking
In this section we will compare the two approaches and the original loop code against each other for performance. Let's set up the code before starting the actual benchmarking:
N = 10000; %// No. of subsets
M1 = 1510; %// No. of rows in input array
M2 = 2185; %// No. of cols in input array
myArray = rand(M1,M2); %// Input array
num_runs = 50; %// no. of runs for each method
%// Form the starts and ends by getting a sorted random integers array from
%// 1 to one minus no. of elements in input array. That minus one is
%// compensated later on into ends because we don't want any subset with
%// starts and ends as the same index
y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]);
starts = y1(1,:);
ends = y1(1,:)+1;
%// Remove identical starts elements
invalid = [false any(diff(starts,[],2)==0,1)];
starts = starts(~invalid);
ends = ends(~invalid);
%// Create myLogicals
myLogicals = false(size(myArray));
for k1=1:numel(starts)
myLogicals(starts(k1):ends(k1))=1;
end
clear invalid y1 k1 M1 M2 N %// clear unnecessary variables
%// Warm up tic/toc.
for k = 1:100
tic(); elapsed = toc();
end
Now, the placeholder code that gets us the runtimes:
disp('---------------------- With Original loop code')
tic
for iter = 1:num_runs
%// ...... original loop code
end
toc
%// clear out variables used in the above approach
%// repeat this for approach #1,2
Benchmark Results
In your comments, you mentioned using a 1510 x 2185 matrix, so let's do two runs with that size, with 10000 and 2000 subsets, plus one run on a bigger input.
Case 1 [Input - 1510 x 2185 matrix, Subsets - 10000]
---------------------- With Original loop code
Elapsed time is 15.625212 seconds.
---------------------- With Approach #1
Elapsed time is 12.102567 seconds.
---------------------- With Approach #2
Elapsed time is 0.983978 seconds.
Case 2 [Input - 1510 x 2185 matrix, Subsets - 2000]
---------------------- With Original loop code
Elapsed time is 3.045402 seconds.
---------------------- With Approach #1
Elapsed time is 11.349107 seconds.
---------------------- With Approach #2
Elapsed time is 0.214744 seconds.
Case 3 [Bigger Input - 3000 x 3000 matrix, Subsets - 20000]
---------------------- With Original loop code
Elapsed time is 12.388061 seconds.
---------------------- With Approach #1
Elapsed time is 12.545292 seconds.
---------------------- With Approach #2
Elapsed time is 0.782096 seconds.
Note that the number of runs num_runs was varied to keep the runtime of the fastest approach close to 1 sec.
Conclusions
So, I guess mixed programming (approach #2) is the way to go! As future work, if performance suffers because the subset lengths are scattered, one could add the standard deviation of the lengths to the scattered-ness criterion and offload the most scattered subsets (in terms of their lengths) to the loop code.
Efficiency
Measure both the vectorised and the for-loop code samples on your own platform (be it localhost or cloud-based) to see the difference:
MATLAB:7> tic(); max( myArray( startIndex(:):endIndex(:) ) ); toc()   %% details below: the method,
Elapsed time is 0.0312 seconds.                                        %% not the code, is the point
%% ( with vector operands the colon operator uses only their first elements,
%%   so this line computes the max of the first subset only )
and
tic();                                            %% a for loop
for n = 1:length( startIndex )                    %% may be significantly
    max( myArray( startIndex(n):endIndex(n) ) );  %% faster than vectorised
end                                               %% set-up overheads
toc();
Elapsed time is 0.125 seconds.
%% As commented below, subsequent re-runs yield unrealistic results due to caching artifacts:
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
%% These artifacts are not so visible when encapsulated in an artificial in-vitro setup
%% with outer re-run repetitions ( for k=1:1000 ) et al. ( see the text below )
For a better interpretation of the test results, test on much larger sizes than just a few tens of rows/columns.
EDIT:
Erroneous code removed; thanks to Dan for pointing it out. I have put more emphasis on quantitative validation, which may support the assumption that vectorised code may, but need not in all circumstances, be faster. That is no excuse for faulty code, of course.
Output - quantitatively comparative data:
While it is often recommended, IMHO it is not fair to exclude memory-allocation and similar overheads from in-vivo testing. Test re-runs typically benefit from VM-page hits and other caching artifacts, while the raw first "virgin" run is what typically happens in real code deployment (excluding external iterators, of course). So consider the results with care and retest in your real environment (which is sometimes a virtual machine inside a bigger system, where VM-swap mechanics must also be taken into account once huge matrices start to hurt real-life memory-access patterns).
On other projects I am used to [usec] granularity in real-time test timing, but that requires even more care about the test-execution conditions and O/S background activity.
So nothing but testing gives relevant answers for your specific code and deployment situation; just be methodical and compare only data that are comparable in principle.
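One way to follow this advice is to time each run separately, instead of only the total of an outer for k=1:1000 loop, so that the first "virgin" run can be compared with the warmed-up ones. A minimal sketch; data, nRuns and the max call are placeholders for your own data and code under test.

```matlab
data = rand(1e6, 1);          % placeholder test data
nRuns = 5;
runTimes = zeros(1, nRuns);
for r = 1:nRuns
    t0 = tic;                 % independent timer for this run
    result = max(data);       % code under test
    runTimes(r) = toc(t0);
end
firstRun = runTimes(1);                 % closest to a first run in deployment
typicalRerun = median(runTimes(2:end)); % warmed-up runs
```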
Alarik's code:
MATLAB:8> tic(); for k=1:1000 % ( flattens memalloc issues & al )
> for n = 1:length( startIndex )
> max( myArray( startIndex(n):endIndex() ) );
> end;
> end; toc()
Elapsed time is 0.2344 seconds.
%% time is 0.0002 seconds per k-for-loop <--[ ref.^ remarks on testing ]
Dan's code:
MATLAB:9> tic(); for k=1:1000
> s_ind( size( myLogicals ) ) = 0;
> s_ind( startIndex ) = 1;
> labelled = cumsum( s_ind(:) ).*myLogicals(:);
> result = accumarray( labelled + 1, myArray(:), [], #max );
> end; toc()
error: product: nonconformant arguments (op1 is 43x1, op2 is 45x1)
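%%
%% The error comes from my transcription, not from Dan's code:
%% s_ind( size( myLogicals ) ) = 0 assigns to elements 9 and 5 of s_ind (a 1x9 vector)
%% instead of allocating a 9 x 5 array, so after s_ind( startIndex ) = 1 it has only
%% 43 elements, while myLogicals(:) has 45. Dan's s_ind( sz(1), sz(2) ) = 0, or
%% s_ind = zeros( size( myLogicals ) ), avoids this.
%%
%% ( Both myArray and myLogicals had the correct 9 x 5 shape. )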

Daily values to Monthly Means for Multiple Years Matlab

I have observed daily data that I need to compare to generated monthly data, so I need the mean of each month over the thirty-year period.
My observed data set is currently 365x31, with one row per day (no leap years!), one column per year, and an extra column holding the month number (1-12).
The problem I am having is that I can only get my script to compute the mean over all years, i.e. I cannot figure out how to make it do this for each column separately. An example of the data is below:
1 12 14
1 -15 10
2 13 3
2 2 37
...all the way to 12 for 365 rows
SO: to recap, I need to get the mean of [12; -15; 13; 2] then [14; 10; 3; 37] and so on.
I have been trying to use the unique() function to loop through the months, which works for getting the number of rows to average, but gives incorrect means. Now I need it to handle each month (28-31 rows) and each column individually. The result should be a 12x30 matrix. I feel like I am missing something SIMPLE. Code:
u = unique(m); %get unique values of m (months) ie) 1-12
for i=1:length(u)
month(i) = mean(obatm(u(i), (2:31)); % the average for each month of each year
end
Appreciate any ideas! Thanks!
You can simply filter the rows for each month and then apply mean, like so:
month = zeros(12, size(obatm, 2) - 1);   % 12 months x number of years
for k = 1:12
    month(k, :) = mean(obatm(obatm(:, 1) == k, 2:end), 1);   % skip the month column
end
EDIT:
If you want something fancy, you can also do this:
cols = size(obatm, 2) - 1;
subs = bsxfun(@plus, obatm(:, 1), (0:12:12 * (cols - 1)));
vals = obatm(:, 2:end);
month = reshape(accumarray(subs(:), vals(:), [12 * cols, 1], @mean), 12, cols)
Look, Ma, no loops!
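A small run of the accumarray version on the example rows from the question, to show how the subscripts work: each month/year pair gets its own bin, month + 12*(year - 1). The result is called monthMeans here only to avoid reusing the name month; months with no data come out as 0, accumarray's default fill value.

```matlab
% Example rows from the question: month number, then one column per year
obatm = [1  12 14
         1 -15 10
         2  13  3
         2   2 37];
cols = size(obatm, 2) - 1;                               % number of years
subs = bsxfun(@plus, obatm(:, 1), 0:12:12*(cols - 1));   % bin = month + 12*(year-1)
vals = obatm(:, 2:end);
monthMeans = reshape(accumarray(subs(:), vals(:), [12*cols, 1], @mean), 12, cols);
```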

matlab updating time vector

I have 19 cells (19x1) with temperature data for an entire year where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365days.
In each cell there should be 7200 measurements (apart from cell 19) where each measurement is taken every 4 minutes thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as a day number, i.e. 1, 2, 3 and so on (thus no decimal day), which is therefore displayed as 360 x 1's, and so on.
As the sensor failed on some days, some of the cells contain fewer than 7200 measurements; one in particular contains only 858 rows, and looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
Knowing that each day number should be repeated 360 times, is there a method for adding extra rows for every value from 1:20 to make up the 360? For example, the first column requires 79 x 1's, 46 x 2's, 360 x 3's, and so on, so the final array should have 7200 values in order from 1 to 20.
If this is possible, the second and third columns should be changed to NaN in the rows where these values have been added.
I realise that this is an unusual question and that it is difficult to understand what is being asked, but I hope I have been clear in expressing what I'm attempting to achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell array:
full=zeros(7200,3)+NaN;
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update: to do this for each element of the cell array, the main change is that instead of searching for i as the day number, you calculate the day number based on how far along the cell array you are. You'd do something like this (untested):
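filled = cell(size(cellarray));              % one padded array per cell
for k = 1:length(cellarray)
    a = cellarray{k};                        % data of the k-th cell
    nDays = 20;                              % the last cell holds only 5 days
    if k == length(cellarray), nDays = 5; end
    full = NaN(nDays*360, 3);
    for i = 1:nDays
        starti = (i-1)*360;                  % ... as before
        day = (k-1)*20 + i;                  % first cell is days 1-20, second is 21-40,...
        full( starti + (1:360),1 ) = day;    % <-- replace i with day
        idx = find(a(:,1)==day);             % <-- replace i with day
        full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
    end
    filled{k} = full;                        % store the padded array for this cell
end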
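As the PS above suggests, the per-cell step can also be done without a loop. This is a sketch, assuming the rows of a are sorted by day and no day has more than 360 rows; it puts each day's data at the top of that day's 360-row block, the same layout as the single-cell loop above. The names perDay, nDays, posInDay, target and filled are introduced for illustration (filled avoids shadowing the built-in function full).

```matlab
% Example cell contents from the question (rows sorted by day)
a = rand(858, 3);
a(1:281, 1) = 1;   a(281:327, 1) = 2;  a(327:328, 1) = 5;
a(329:330, 1) = 9; a(331:498, 1) = 19; a(499:858, 1) = 20;
perDay = 360;
nDays = 20;
day = a(:, 1);
n = numel(day);
newGroup = [true; diff(day) ~= 0];      % true where a new day starts
firstRow = find(newGroup);              % first row of each day's block in a
g = cumsum(newGroup);                   % block number of every row
posInDay = (1:n)' - firstRow(g) + 1;    % 1, 2, ... within each day
target = (day - 1) * perDay + posInDay; % destination row in the padded array
filled = NaN(perDay * nDays, 3);
filled(:, 1) = reshape(repmat(1:nDays, perDay, 1), [], 1);  % day column
filled(target, 2:3) = a(:, 2:3);        % copy data; missing rows stay NaN
```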
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and adds the corresponding rows at the bottom of your 'a' matrix, so that you get the full 7200x3 matrix.
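nbMissing = 7200 - size(a,1);             % total number of missing rows
a1 = nan(nbMissing, 3);                   % NaN rows to append
l = 0;                                    % number of rows of a1 labelled so far
for i = 1:20
    nbMissing_i = 360 - sum(a(:,1) == i); % missing measurements on day i
    a1(l+1:l+nbMissing_i, 1) = i;         % label the new rows with day i
    l = l + nbMissing_i;
end
a_filled = [a; a1];                       % sortrows(a_filled, 1) would sort the rows by day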