Calculate a "running" maximum of a vector - matlab

I have the following matrix which keeps track of the starting and ending points of data ranges (the first column represents "starts" and the second column represents the "ends"):
myMatrix = [
162 199; %// this represents the range 162:199
166 199; %// this represents the range 166:199
180 187; %// and so on...
314 326;
323 326;
397 399;
419 420;
433 436;
576 757;
579 630;
634 757;
663 757;
668 757;
676 714;
722 757;
746 757;
799 806;
951 953;
1271 1272
];
I need to eliminate all the ranges (i.e. rows) which are contained within a larger range present in the matrix. For example, the ranges [166:199] and [180:187] are contained within the range [162:199], and thus rows 2 and 3 would need to be removed.
The solution I thought of was to calculate a sort of "running" max on the second column to which subsequent values of the column are compared to determine whether or not they need to be removed. I implemented this with the use of a for loop as follows:
currentMax = myMatrix(1,2); %//set first value as the maximum
[sizeOfMatrix,~] = size(myMatrix); %//determine the number of rows
rowsToRemove = false(sizeOfMatrix,1); %//pre-allocate final vector of logicals
for m=2:sizeOfMatrix
if myMatrix(m,2) > currentMax %//if new max is reached, update currentMax...
currentMax = myMatrix(m,2);
else
rowsToRemove(m) = true; %//... else mark that row for removal
end
end
myMatrix(rowsToRemove,:) = [];
This correctly removes the "redundant" ranges in myMatrix and produces the following matrix:
myMatrix =
162 199
314 326
397 399
419 420
433 436
576 757
799 806
951 953
1271 1272
Onto the questions:
1) It would seem that there has to be a better way of calculating a "running" max than a for loop. I looked into accumarray and filter, but could not figure out a way to do it with those functions. Is there a potential alternative that skips the for loop (some kind of vectorized code that is more efficient)?
2) Is there a completely different (that is, more efficient) way to accomplish the final goal of removing all the ranges that are contained within larger ranges in myMatrix? I don't know if I'm over-thinking this whole thing...

Approach #1
bsxfun based brute-force approach -
myMatrix(sum(bsxfun(@ge,myMatrix(:,1),myMatrix(:,1)') & ...
bsxfun(@le,myMatrix(:,2),myMatrix(:,2)'),2)<=1,:)
A few notes on the proposed solution:
Compare all starts indices against each other for "contained-ness", and similarly for the ends indices. Note that the "contained-ness" criterion must be one of these two:
Greater than or equal to for starts and less than or equal to for ends.
Less than or equal to for starts and greater than or equal to for ends.
I just so happened to go with the first option.
See which rows satisfy "contained-ness" with respect to any row other than themselves and remove those to get the desired result.
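As a side note (not part of the original answer): on R2016b or newer, implicit expansion lets you write the same brute-force comparison without bsxfun. A minimal sketch:
%// Pairwise "contained-ness" tests via implicit expansion (R2016b+)
contained = myMatrix(:,1) >= myMatrix(:,1)' & myMatrix(:,2) <= myMatrix(:,2)';
out = myMatrix(sum(contained,2) <= 1, :) %// keep rows contained only in themselves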
Approach #2
If you are okay with an output whose rows are sorted according to the first column, and if the number of local maxima is small, you can try this alternative approach -
myMatrix_sorted = sortrows(myMatrix,1);
col2 = myMatrix_sorted(:,2);
max_idx = 1:numel(col2);
while 1
col2_selected = col2(max_idx);
N = numel(col2_selected);
labels = cumsum([true ; diff(col2_selected)>0]);
idx1 = accumarray(labels, 1:N ,[], @(x) findmax(x,col2_selected));
if numel(idx1)==N
break;
end
max_idx = max_idx(idx1);
end
out = myMatrix_sorted(max_idx,:); %// desired output
Associated function code -
function ix = findmax(indx, s)
[~,ix] = max(s(indx));
ix = indx(ix);
return;

I ended up using the following for the "running maximum" problem (but have no comment on its efficiency relative to other solutions):
function x = cummax(x)
% Cumulative maximum along dimension 1
% Adapted from http://www.mathworks.com/matlabcentral/newsreader/view_thread/126657
% Is recursive, but magically so, such that the number of recursions is proportional to log(n).
n = size(x, 1);
%fprintf('%d\n', n)
if n == 2
x(2, :) = max(x);
elseif n % had to add this condition relative to the web version, otherwise it would recurse infinitely with n=0
x(2:2:n, :) = cummax(max(x(1:2:n-1, :), x(2:2:n, :)));
x(3:2:n, :) = max(x(3:2:n, :), x(2:2:n-1, :));
end

Related

How can I vectorize code that runs a function on subsets of a larger matrix?

Let's assume I have the following 9 x 5 matrix:
myArray = [
54.7 8.1 81.7 55.0 22.5
29.6 92.9 79.4 62.2 17.0
74.4 77.5 64.4 58.7 22.7
18.8 48.6 37.8 20.7 43.5
68.6 43.5 81.1 30.1 31.1
18.3 44.6 53.2 47.0 92.3
36.8 30.6 35.0 23.0 43.0
62.5 50.8 93.9 84.4 18.4
78.0 51.0 87.5 19.4 90.4
];
I have 11 "subsets" of this matrix and I need to run a function (let's say max) on each of these subsets. The subsets can be identified with the following matrix of logicals (identified column-wise, not row-wise):
myLogicals = logical([
0 1 0 1 1
1 1 0 1 1
1 1 0 0 0
0 1 0 1 1
1 0 1 1 1
1 1 1 1 0
0 1 1 0 1
1 1 0 0 1
1 1 0 0 1
]);
or via linear indexing:
starts = [2 5 8 10 15 23 28 31 37 40 43]; %index start of each subset
ends = [3 6 9 13 18 25 29 33 38 41 45]; %index end of each subset
such that the first subset is 2:3, the second is 5:6, and so on.
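The linear indices and the logical mask describe the same subsets, since MATLAB linear indexing runs down the columns; a quick consistency check (a sketch):
mask = false(size(myArray));
for n = 1:numel(starts)
mask(starts(n):ends(n)) = true; % linear (column-wise) indexing
end
isequal(mask, myLogicals) % should return true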
I can find the max of each subset and store it in a vector as follows:
finalAnswers = NaN(11,1);
for n=1:length(starts) %i.e. 1 through the number of subsets
finalAnswers(n) = max(myArray(starts(n):ends(n)));
end
After the loop runs, finalAnswers contains the maximum value of each of the data subsets:
74.4 68.6 78.0 92.9 51.0 81.1 62.2 47.0 22.5 43.5 90.4
Is it possible to obtain the same result without the use of a for loop? In other words, can this code be vectorized? Would such an approach be more efficient than the current one?
EDIT:
I did some testing of the proposed solutions. The data I used was a 1,510 x 2,185 matrix with 10,103 subsets that varied in length from 2 to 916 with a standard deviation of subset length of 101.92.
I wrapped each solution in tic;for k=1:1000 [code here] end; toc; and here are the results:
for loop approach --- Elapsed time is 16.237400 seconds.
Shai's approach --- Elapsed time is 153.707076 seconds.
Dan's approach --- Elapsed time is 44.774121 seconds.
Divakar's approach #2 --- Elapsed time is 127.621515 seconds.
Notes:
I also tried benchmarking Dan's approach by wrapping the k=1:1000 for loop around just the accumarray line (since the rest could theoretically be run just once). In this case the time was 28.29 seconds.
Benchmarking Shai's approach, while leaving the lb = ... line out of the k loop, the time was 113.48 seconds.
When I ran Divakar's code, I got "Non-singleton dimensions of the two input arrays must match each other." errors for the bsxfun lines. I "fixed" this by using conjugate transposition (the apostrophe operator ') on trade_starts(1:starts_extent) and intv(1:starts_extent) in the lines of code calling bsxfun. I'm not sure why this error was occurring...
I'm not sure if my benchmarking setup is correct, but it appears that the for loop actually runs the fastest in this case.
One approach is to use accumarray. Unfortunately in order to do that we first need to "label" your logical matrix. Here is a convoluted way of doing that if you don't have the image processing toolbox:
sz=size(myLogicals);
s_ind(sz(1),sz(2))=0;
%// OR: s_ind = zeros(size(myLogicals))
s_ind(starts) = 1;
labelled = cumsum(s_ind(:)).*myLogicals(:);
So that just does what Shai's bwlabeln implementation does (but labelled will be numel(myLogicals)-by-1 in shape, as opposed to size(myLogicals) in shape)
Now you can use accumarray:
accumarray(labelled(myLogicals), myArray(myLogicals), [], @max)
or else it may be faster to try
result = accumarray(labelled+1, myArray(:), [], @max);
result = result(2:end)
This is fully vectorized, but is it worth it? You'll have to do speed tests against your loop solution to know.
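For those speed tests, timeit (built in since R2013b) is less noisy than raw tic/toc because it runs the function several times and returns the median of the measurements; a minimal sketch, where the anonymous wrapper is my own addition:
f_vec = @() accumarray(labelled(myLogicals), myArray(myLogicals), [], @max);
t_vec = timeit(f_vec) % robust timing of the vectorized version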
Use bwlabeln with a vertical connectivity:
lb = bwlabeln( myLogicals, [0 1 0; 0 1 0; 0 1 0] );
Now you have a label 1..11 for each region.
To get the max value you can use regionprops:
props = regionprops( lb, myArray, 'MaxIntensity' );
finalAnswers = [props.MaxIntensity];
You can use regionprops to get some other properties of each subset, but it is not too general.
If you wish to apply a more general function to each region, e.g., median, you can use accumarray:
finalAnswer = accumarray( lb( myLogicals ), myArray( myLogicals ), [], @median );
Ideas behind vectorization and optimization
One of the approaches that one can employ to vectorize this problem is to convert the subsets into regular-shaped blocks and then find the max of the elements of those blocks in one go. Converting to regular-shaped blocks has one issue here: the subsets are unequal in length. To avoid this issue, one can create a 2D matrix of indices starting from each of the starts elements and extending to the maximum of the subset lengths. The good thing about this is that it allows vectorization, but at the cost of higher memory requirements, which depend on how scattered the subset lengths are.
Another issue with this vectorization technique is that it could create out-of-bounds indices for the final subsets.
To avoid this, one can think of two possible ways -
Use a bigger input array, extended so that the maximum of the subset lengths plus the starts indices still lies within the confines of the extended array.
Use the original input array for the starts that stay within its limits, and fall back to the original loop code for the rest of the subsets. We can call it mixed programming just for the sake of a short title. This saves the memory otherwise required to create the extended array discussed in the first approach.
These two ways/approaches are listed next.
Approach #1: Vectorized technique
[m,n] = size(myArray); %// store no. of rows and columns in input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
max_intv_arr = [0:max_intv]'; %// array of max indices extent
[row1,col1] = ind2sub([m n],starts); %// get starts row and column indices
m_ext = max(row1+max_intv); %// no. of rows in extended input array
myArrayExt(m_ext,n)=0; %// extended form of input array
myArrayExt(1:m,:) = myArray;
%// New linear indices for extended form of input array
idx = bsxfun(@plus,max_intv_arr,(col1-1)*m_ext+row1);
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArrayExt(idx);
selected_ele(bsxfun(@gt,max_intv_arr,intv))= nan;
%// Get the max of the valid ones for the desired output
out = nanmax(selected_ele); %// desired output
Approach #2: Mixed programming
%// PART - I: Vectorized technique for subsets that when normalized
%// with max extents still lie within limits of input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
%// Find the last subset that when extended by max interval would still
%// lie within the limits of input array
starts_extent = find(starts+max_intv<=numel(myArray),1,'last');
max_intv_arr = [0:max_intv]'; %// Array of max indices extent
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArray(bsxfun(@plus,max_intv_arr,starts(1:starts_extent)));
selected_ele(bsxfun(@gt,max_intv_arr,intv(1:starts_extent))) = nan;
out(numel(starts)) = 0; %// storage for output
out(1:starts_extent) = nanmax(selected_ele); %// output values for part-I
%// PART - II: Process rest of input array elements
for n = starts_extent+1:numel(starts)
out(n) = max(myArray(starts(n):ends(n)));
end
Benchmarking
In this section we will compare the two approaches and the original loop code against each other for performance. Let's set up the codes before starting the actual benchmarking -
N = 10000; %// No. of subsets
M1 = 1510; %// No. of rows in input array
M2 = 2185; %// No. of cols in input array
myArray = rand(M1,M2); %// Input array
num_runs = 50; %// no. of runs for each method
%// Form the starts and ends by getting a sorted random integers array from
%// 1 to one minus no. of elements in input array. That minus one is
%// compensated later on into ends because we don't want any subset with
%// starts and ends as the same index
y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]);
starts = y1(1,:);
ends = y1(2,:)+1;
%// Remove identical starts elements
invalid = [false any(diff(starts,[],2)==0,1)];
starts = starts(~invalid);
ends = ends(~invalid);
%// Create myLogicals
myLogicals = false(size(myArray));
for k1=1:numel(starts)
myLogicals(starts(k1):ends(k1))=1;
end
clear invalid y1 k1 M1 M2 N %// clear unnecessary variables
%// Warm up tic/toc.
for k = 1:100
tic(); elapsed = toc();
end
Now, the placebo code that gets us the runtimes -
disp('---------------------- With Original loop code')
tic
for iter = 1:num_runs
%// ...... approach #1 codes
end
toc
%// clear out variables used in the above approach
%// repeat this for approach #1,2
Benchmark Results
In your comments, you mentioned using a 1510 x 2185 matrix, so let's do runs with that size and subsets of size 10000 and 2000, plus a bigger third case.
Case 1 [Input - 1510 x 2185 matrix, Subsets - 10000]
---------------------- With Original loop code
Elapsed time is 15.625212 seconds.
---------------------- With Approach #1
Elapsed time is 12.102567 seconds.
---------------------- With Approach #2
Elapsed time is 0.983978 seconds.
Case 2 [Input - 1510 x 2185 matrix, Subsets - 2000]
---------------------- With Original loop code
Elapsed time is 3.045402 seconds.
---------------------- With Approach #1
Elapsed time is 11.349107 seconds.
---------------------- With Approach #2
Elapsed time is 0.214744 seconds.
Case 3 [Bigger Input - 3000 x 3000 matrix, Subsets - 20000]
---------------------- With Original loop code
Elapsed time is 12.388061 seconds.
---------------------- With Approach #1
Elapsed time is 12.545292 seconds.
---------------------- With Approach #2
Elapsed time is 0.782096 seconds.
Note that the number of runs num_runs was varied to keep the runtime of the fastest approach close to 1 sec.
Conclusions
So, I guess the mixed programming (approach #2) is the way to go! As future work, one could incorporate the standard deviation into the scattered-ness criterion if performance suffers because of the scattered-ness, and offload the work for the most scattered subsets (in terms of their lengths) to the loop code.
Efficiency
Measure both the vectorised & for-loop code samples on your respective platform ( be it a <localhost> or Cloud-based ) to see the difference:
MATLAB:7> tic();max( myArray( startIndex(:):endIndex(:) ) );toc() %% Details
Elapsed time is 0.0312 seconds. %% below.
%% Code is not
%% the merit,
%% method is:
and
tic(); %% for/loop
for n = 1:length( startIndex ) %% may be
max( myArray( startIndex(n):endIndex(n) ) ); %% significantly
end %% faster than
toc(); %% vectorised
Elapsed time is 0.125 seconds. %% setup(s)
%% overhead(s)
%% As commented below,
%% subsequent re-runs yield unrealistic results due to caching artifacts
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
%% which are not so straight visible if encapsulated in an artificial in-vitro
%% via an outer re-run repetitions ( for k=1:1000 ) et al ( ref. in text below )
For a better interpretation of the test results, test on much larger sizes rather than just on a few tens of rows/cols.
EDIT:
An erroneous piece of code was removed, thanks to Dan for the notice. More attention has now been paid to quantitative validation; the assumption that vectorised code may, but need not in all circumstances, be faster is no excuse for faulty code, sure.
Output - quantitatively comparative data:
While recommended, it is IMHO not fair to assume that memalloc and similar overheads should be excluded from the in-vivo testing. Test re-runs typically show VM-page-hit improvements and other caching artifacts, while the raw first "virgin" run is what typically appears in real code deployment ( excl. external iterators, for sure ). So consider the results with care and retest in your real environment ( sometimes being run as a Virtual Machine inside a bigger system -- that also makes VM-swap mechanics necessary to take into account once huge matrices start to hurt on real-life memory-access patterns ).
On other projects I am used to using [usec] granularity for realtime test timing, but more care is then necessary about the test-execution conditions and O/S background.
So nothing but testing gives relevant answers to your specific code/deployment situation; however, be methodic and compare data comparable in principle.
Alarik's code:
MATLAB:8> tic(); for k=1:1000 % ( flattens memalloc issues & al )
> for n = 1:length( startIndex )
> max( myArray( startIndex(n):endIndex(n) ) );
> end;
> end; toc()
Elapsed time is 0.2344 seconds.
%% time is 0.0002 seconds per k-for-loop <--[ ref.^ remarks on testing ]
Dan's code:
MATLAB:9> tic(); for k=1:1000
> s_ind( size( myLogicals ) ) = 0;
> s_ind( startIndex ) = 1;
> labelled = cumsum( s_ind(:) ).*myLogicals(:);
> result = accumarray( labelled + 1, myArray(:), [], @max );
> end; toc()
error: product: nonconformant arguments (op1 is 43x1, op2 is 45x1)
%%
%% [Work in progress] to find my mistake -- sorry for not being able to reproduce
%% Dan's code and to make it work
%%
%% Both myArray and myLogicals shape was correct ( 9 x 5 )

Find extremum of multidimensional matrix in matlab

I am trying to find the Extremum of a 3-dim matrix along the 2nd dimension.
I started with
[~,index] = max(abs(mat),[],2), but I don't know how to advance from here. How is the index vector to be used together with the original matrix? Or is there a completely different solution to this problem?
To illustrate the task assume the following matrix:
mat(:,:,1) =
23 8 -4
-1 -26 46
mat(:,:,2) =
5 -27 12
2 -1 18
mat(:,:,3) =
-10 49 39
-13 -46 41
mat(:,:,4) =
30 -24 18
-40 -16 -36
The expected result would then be
ext(:,:,1) =
23
-46
ext(:,:,2) =
-27
18
ext(:,:,3) =
49
-46
ext(:,:,4) =
30
-40
I don't know how to use the index vector with mat to get the desired result ext.
1) If you want to find the maximum just along, let's say, the 2nd dimension, your variable index will be an array of size (N,1,M), where N and M are the numbers of elements of your matrix in the first and third dimensions respectively. To remove the dummy dimension, the function squeeze() exists: index = squeeze(index). After that, size(index) gives [N M]; see the sketch after point 2 below.
2) Depending on your problem, you probably need the MATLAB function ind2sub(). First, you take a slice of your matrix, then find its maximum with linear indexing, and then you can restore your indices with ind2sub(). Here is an example for a 2D matrix:
M = randn(5,5);
[C,I] = max(M(:));
[index1,index2] = ind2sub(size(M),I);
The same method allows finding the element with the largest absolute value in the whole 3D matrix.
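For instance, two short sketches on the example mat above (written against the example data, so adjust for your own arrays):
[~, index] = max(abs(mat), [], 2); % point 1: size(index) is [2 1 4] here
index = squeeze(index)             % now 2-by-4, dummy dimension removed
% point 2, applied to the whole 3D matrix:
[~, I] = max(abs(mat(:)));             % linear index of the largest |value|
[i1, i2, i3] = ind2sub(size(mat), I);  % recover the three subscripts
extremum = mat(i1, i2, i3)             % 49 for the example data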
Use ndgrid to generate the values along dimensions 1 and 3, and then sub2ind to combine the three indices into a linear index:
[~, jj] = max(abs(mat),[],2); %// jj: returned by max
[ii, ~, kk] = ndgrid(1:size(mat,1),1,1:size(mat,3)); %// ii, kk: all combinations
result = mat(sub2ind(size(mat), ii, jj, kk));
A fancier, one-line alternative:
result = max(complex(mat),[],2);
This works because, according to max documentation,
For complex input A, max returns the complex number with the largest complex modulus (magnitude), computed with max(abs(A)).
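Since the output of this trick is complex-valued (with zero imaginary part), you may want to cast it back to a real array; a small usage note:
result = real(max(complex(mat), [], 2)); % real() strips the zero imaginary parts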

Find median value of the largest clump of similar values in an array in the most computationally efficient manner

Sorry for the long title, but that about sums it up.
I am looking to find the median value of the largest clump of similar values in an array in the most computationally efficient manner.
for example:
H = [99,100,101,102,103,180,181,182,5,250,17]
I would be looking for the 101.
The array is not sorted, I just typed it in the above order for easier understanding.
The array is of a constant length and you can always assume there will be at least one clump of similar values.
What I have been doing so far is basically computing the standard deviation with one of the values removed, finding the value which corresponds to the largest reduction in STD, and repeating that for the number of elements in the array, which is terribly inefficient.
for j = 1:7
G = double(H);
for i = 1:7
G(i) = NaN;
T(i) = nanstd(G);
end
best = find(T==min(T));
H(best) = NaN;
end
x = find(H==max(H));
Any thoughts?
This approach bins your data and looks for the bin with the most elements. If your distribution consists of well separated clusters this should work reasonably well.
H = [99,100,101,102,103,180,181,182,5,250,17];
nbins = length(H); % <-- set # of bins here
[v bins]=hist(H,nbins);
[vm im]=max(v); % find max in histogram
bl = bins(2)-bins(1); % bin size
bm = bins(im); % position of bin with max #
ifb = find(abs(H-bm)<bl/2) % elements within bin
median(H(ifb)) % median over those elements in the bin
Output:
ifb = 1 2 3 4 5
H(ifb) = 99 100 101 102 103
median = 101
The more challenging parameters to set are the number of bins and the size of the region to look around the most populated bin. In the example you provided neither of these is so critical, you could set the number of bins to 3 (instead of length(H)) and it still would work. Using length(H) as the number of bins is in fact a little extreme and probably not a good general choice. A better choice is somewhere between that number and the expected number of clusters.
It may help for certain distributions to change bl within the find expression to a value you judge better in advance.
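On newer releases (R2014b+), histcounts is the recommended replacement for hist; a rough equivalent of the binning step above, as a sketch (edge handling may differ slightly from the bin-centre version):
[v, edges] = histcounts(H, nbins);       % bin counts and bin edges
[~, im] = max(v);                        % fullest bin
ifb = H >= edges(im) & H <= edges(im+1); % elements in that bin
median(H(ifb))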
I should also note that there are clustering methods (kmeans) that may work better, but perhaps less efficiently. For instance this is the output of [H' kmeans(H',4) ]:
99 2
100 2
101 2
102 2
103 2
180 3
181 3
182 3
5 4
250 3
17 1
In this case I decided in advance to attempt grouping into 4 clusters.
Using kmeans you can get an answer as follows:
nbin = 4;
km = kmeans(H',nbin);
[mv iv]=max(histc(km,[1:nbin]));
H(km==km(iv))
median(H(km==km(iv)))
Notice however that kmeans does not necessarily return the same value every time it is run, so you might need to average over a few iterations.
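Alternatively, kmeans (Statistics Toolbox) accepts a 'Replicates' option that reruns the clustering from several random initializations and keeps the best solution, which reduces that run-to-run variability:
km = kmeans(H', nbin, 'Replicates', 5); % best of 5 random starts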
I timed the two methods and found that kmeans takes ~10 X longer. However, it is more robust since the bin sizes adapt to your problem and do not need to be set beforehand (only the number of bins does).

Brain teaser - filtering algorithm using moving averages

I have a one-second dataset of 86400 wind speed (WS) values in Matlab and need assistance in filtering it. It requires a certain level of cleverness.
If the average WS exceeds:
25m/s in a 600s time interval
28m/s in a 30s time interval
30m/s in a 3 s time interval
If any of these criteria is met, the WS is deemed 'invalid' until the average WS remains below 22 m/s over a 300 s time interval.
Here is what I have for the 600 second requirement. I do a 600 and a 300 second moving average on the data contained in 'dataset'. I flag the intervals from the first appearance of an average above 25 m/s to the next appearance of a value below 22 m/s as NaN. After filtering, I will do another 600 second average, and the intervals with values flagged with a NaN will be left as NaN.
i.e.
Rolling600avg(:,1) = tsmovavg(dataset(:,2), 's', 600, 1);
Rolling300avg(:,1) = tsmovavg(dataset(:,2), 's', 300, 1);
a = find(Rolling600avg(:,2)>25)
b = find(Rolling300avg(:,2)<22)
dataset(a:b(a:find(b==1)),2)==NaN; %?? Not sure
This is going to require a clever use of 'find' and some indexing. Could someone help me out? The 28m/s and 30m/s filters will follow the same method.
If I follow your question, one approach is to use a for loop to identify where the NaNs should begin and end.
m = [19 19 19 19 28 28 19 19 28 28 17 17 17 19 29 18 18 29 18 29]; %Example data
a = find(m>25);
b = find(m<22);
m2 = m;
% Use a loop to isolate segments that should be NaNs;
for ii = 1:length(a)
firstNull = a(ii)
lastNull = b( find(b>firstNull,1) )-1 % THIS TRIES TO FIND A VALUE IN B GREATER THAN A(II)
% IF THERE IS NO SUCH VALUE THEN NANS SHOULD FILL TO THE END OF THE VECTOR
if isempty(lastNull),
lastNull=length(m);
end
m2(firstNull:lastNull) = NaN
end
Note that this only works if tsmovavg returns a vector of equal length to the one passed to it. If not, then it's trickier and will require some modifications.
There's probably some way of avoiding a for loop, but this is a pretty straightforward solution.
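For reference, here is one way the loop could be avoided by forward-filling a trigger/recovery "latch"; this is my own sketch on the same example vector m, so verify it against your real averaged data before relying on it:
marker = zeros(size(m));
marker(m < 22) = 2;                 % recovery marker
marker(m > 25) = 1;                 % trigger marker (the two conditions never overlap)
idx = find(marker ~= 0);            % positions of all markers
run = cumsum(marker ~= 0);          % which marker governs each sample (0 = none yet)
state = zeros(size(m));
state(run > 0) = marker(idx(run(run > 0)));
m2 = m;
m2(state == 1) = NaN;               % samples governed by a trigger are invalid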

Matlab: Cannot plot timeseries with repeated x values. How to get rid of repeated rows?

so I have a matrix Data in this format:
Data = [Date Time Price]
Now what I want to do is plot the Price against the Time, but my data is very large and has lines where there are multiple Prices for the same Date/Time, e.g. the 1st and 2nd lines here:
29 733575.459548611 40.0500000000000
29 733575.459548611 40.0600000000000
29 733575.459548612 40.1200000000000
29 733575.45954862 40.0500000000000
I want to take an average of the prices with the same Date/Time and get rid of any extra lines. My goal is to do linear interpolation on the values, which is why I must have only one Price value for each Time.
How can I do this? I did the following (which reduces the matrix so that it only takes the first line for the lines with repeated date/times), but I don't know how to take the average:
function [ C ] = test( DN )
[Qrows, cols] = size(DN);
C = DN(1,:);
for i = 1:(Qrows-1)
if DN(i,2) == DN(i+1,2)
%n = 1;
%while DN(i,2) == DN(i+n,2) && i+n<Qrows
% n = n + 1;
%end
% somehow take average;
else
C = [C;DN(i+1,:)];
end
end
[C,ia,ic] = unique(A,'rows') also returns index vectors ia and ic
such that C = A(ia,:) and A = C(ic,:)
If you use as input A only the columns you do not want to average over (here: date & time), ic will contain one value for every row, and the rows you want to combine will share the same value in ic.
Getting from there to the means you want is probably most intuitive for MATLAB beginners with a for loop: using logical indexing, e.g. DN(ic==n,3), you get a vector of all the values you want to average (where n is the index of the date-time combination they belong to). You need to do this for all the different date-time combinations.
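A minimal sketch of that loop (variable names are my own):
[DateAndTime, ~, ic] = unique(DN(:,1:2), 'rows');
meanPrice = zeros(size(DateAndTime,1), 1);
for n = 1:size(DateAndTime,1)
meanPrice(n) = mean(DN(ic==n, 3)); % average all prices sharing this date/time
end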
A more vector-oriented way would be to use accumarray, which leads to a solution of your problem in two lines:
[DateAndTime,~,idx] = unique(DN(:,1:2),'rows');
Price = accumarray(idx,DN(:,3),[],@mean);
I'm not quite sure what you want the result to look like, but [DateAndTime Price] gives you the three-column format of the input again.
Note that if your input contains something like:
1 0.1 23
1 0.2 47
1 0.1 42
1 0.1 23
then applying unique(...,'rows') to the whole input before the above lines will give a different result for 1 0.1 than using the above directly: the latter calculates the mean of 23, 23 and 42, while in the former case one 23 is eliminated as a duplicate beforehand, so the differing row with 42 gets a greater weight in the average.
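To make that concrete with the 1 0.1 rows above:
mean([23 23 42]) % ans = 29.3333 (accumarray on the raw rows)
mean([23 42])    % ans = 32.5000 (after removing the duplicate row first)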
Try the following:
[Qrows, cols] = size(DN);
% C is your result matrix
C = DN;
% this will give you the indexes i where DN(i,2)==DN(i+1,2)
i = find(diff(DN(:,2))==0);
% replace C(i,:) with the average of rows i and i+1
C(i,:) = (DN(i,:)+DN(i+1,:))/2;
% delete the now-redundant C(i+1,:) rows
C(i+1,:) = [];
Hope this works.
This should work if the repeated time values come in pairs (the average is calculated between i and i+1). Should you have time repeats of 3 or more, you will need to rethink these steps.
Something like this would work, but I did not run the code so I can't promise there's no bugs.
newX = unique(DN(:,2));
newY = zeros(1,length(newX));
for ix = 1:length(newX)
allOccurrences = find(DN(:,2)==newX(ix));
% If there's duplicates, take their mean
if numel(allOccurrences)>1
newY(ix) = mean(DN(allOccurrences,3));
else
% If not, use the only Y value
newY(ix) = DN(allOccurrences,3);
end
end
end