MATLAB: How to access 'Duration' datatype

Consider the following example
xDATA = data_timestamp;
[~,~,Days,Hour,Min,~] = datevec(xDATA(2:end) - xDATA(1:end - 1));
BadSamplingTime = find((Days)> 0 | (Hour)> 0 |(Min)> 5 );
Here xDATA contains a vector of timestamps, and I am trying to find the samples where the interval between consecutive timestamps is greater than 5 minutes. The algorithm works fine, but it creates 3 extra vectors (Days, Hour and Min), each as big as my timestamp vector, which is very large. If instead I do this:
DurationTime = xDATA(2:end) - xDATA(1:end - 1);
instead of the second line, it creates just one vector of the same length, of the 'duration' data type, which is much easier to handle. The problem is that I can't seem to access the components of individual elements of the duration array. For example:
DurationTime(5,1)
ans =
26:00:01
I need to access the 26-hour part. Does anyone have an idea how to do that, or a better suggestion?

You can create a duration object and compare it with the duration vector DurationTime. The result of a>b is a logical vector that can be used directly to index DurationTime, giving you all the values where the duration is greater than 5 minutes.
Side note: you can compute the differences (durations) directly with diff.
Code:
% create example data
xDATA = (([0:4,4+26*60,4+26*60+1:4+26*60+5])/24/60+datetime('now')).';
% calculate the durations
DurationTime = xDATA(2:end) - xDATA(1:end-1); % as in the question
%DurationTime = diff(xDATA); % alternative
% get index and values of all durations greater than 5 minutes
ind = find(DurationTime>duration(0,5,0))
DurationTime(ind)
% get values of all durations greater than 5 minutes (direct solution, if no index needed)
DurationTime(DurationTime>duration(0,5,0));
Result:
ind =
5
ans =
26:00:00
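To read off the "26 hours" part the question asks about, the duration conversion functions hours and minutes return plain doubles that can be indexed and compared like any numeric array. The following is a minimal sketch on a small hand-made timestamp vector (the start date and gap are hypothetical, not the asker's data); it also shows that diff on a datetime vector returns durations directly.

```matlab
% Hypothetical timestamps: 1-minute steps, then one gap of 26 h 0 min 1 s
t0 = datetime(2020, 1, 1, 0, 0, 0);
xDATA = t0 + minutes([0 1 2 3 4]).';
xDATA(end+1) = xDATA(end) + duration(26, 0, 1);

% diff on a datetime vector returns a duration vector
DurationTime = diff(xDATA);

% Total length of each interval as a double, measured in hours
totalHours = hours(DurationTime);

% Whole-hour part of the 5th interval (the "26" displayed as 26:00:01)
wholeHours = floor(totalHours(5));

% The 5-minute check can equally be done on plain numbers
BadSamplingTime = find(minutes(DurationTime) > 5);
```

Note that hours returns the total length in hours (here about 26.0003), not an hour field, which is why floor is used to get the integer part.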

Related

Efficient way to get the value that is the most duplicated in a Matrix

I am trying to get the value that is the most duplicated (and the percentage of times that it is duplicated). Here is an example:
A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];
mostduplicatevalue(A) should return 8 and the percentage is 4/length(A).
I am currently doing the following (see below), but it takes approx 5/6 seconds to obtain the result for a matrix of 1300*5000. What is a better way to achieve this result?
function [MostDuplicateValue, MostDuplicatePerc] = mostduplicatevalue(A)
% Which value is duplicated the most, and what percentage of the
% sample does it represent?
% Value that is Most Duplicated
tbl = tabulate(A(:));
[~,bi] = max(tbl(:,2));
MostDuplicateValue = tbl(bi,1);
MostDuplicatePerc = tbl(bi,3)/100;
end
Here is one possible answer:
function [MostDuplValue, MostDuplPerc, MostDuplCount] = mostduplicatevalue(A)
% Which value is duplicated the most, and what percentage of the
% sample does it represent?
[MostDuplValue,MostDuplCount] = mode(A(:));
MostDuplPerc = MostDuplCount / sum(sum(~isnan(A)));
end
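When several values tie for the highest count, the first output of mode returns only the smallest of them. mode also has a third output, a cell array holding all values that share the top frequency. A minimal sketch on the example vector from the question, plus a hypothetical tied vector:

```matlab
A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];

% M: most frequent value (the smallest one if there is a tie)
% F: how often it occurs; C: cell with a sorted column of all tied values
[M, F, C] = mode(A(:));
MostDuplPerc = F / numel(A);
allModes = C{1};

% Hypothetical tie: 1 and 2 both occur twice
[~, ~, Ctie] = mode([1; 1; 2; 2; 3]);
tiedModes = Ctie{1};
```

This divides by numel(A), as the question does; the answer above divides by the number of non-NaN elements instead, which matters only if A contains NaN.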
Here is a solution based on first sorting the array (a relatively costly operation) and then finding the longest run of identical numbers with diff. Empirically it seems to be slightly faster (it takes about 2/3 of the time of your version at 1300x5000). It has the side benefit that if several numbers tie for the most occurrences, it returns all of them.
% sort array and pad it with -inf and inf
B = [-inf; sort(A(:)); inf];
% find indexes where the streak of each number begins
C = find(diff(B));
% count the length of the streaks
D = diff(C);
% extract the numbers with the longest streak
MostDuplValue = B(C(logical([0; D==max(D)])));
% calc percentage of most occurring value
MostDuplPerc = max(D)/numel(A);
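Running the sort-based snippet on a small hypothetical vector with a tie shows the "returns all of them" behaviour. C marks the last index of each run in the padded vector B, so B(C(k+1)) is the value of the k-th run and D(k) is its length.

```matlab
% Hypothetical example with a tie: 1 and 2 both occur twice
A = [2 1 3 2 1];

B = [-inf; sort(A(:)); inf];   % sorted values, padded at both ends
C = find(diff(B));             % positions where the value changes
D = diff(C);                   % length of each run of equal values
MostDuplValue = B(C(logical([0; D==max(D)])));   % all values with the longest run
MostDuplPerc = max(D)/numel(A);
```

Here MostDuplValue is the column [1; 2] and MostDuplPerc is 0.4.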

Create new data series based on two data series with different sampling time

I have two data sets with speed and directional data, recorded with different time steps.
One data set (A) is recorded every 10 minutes, and the other (B) is recorded every hour.
The start times are not exactly the same.
A (speed and directional data) is sampled every 10 minutes e.g. 00.00, 00.10, 00.20, ...
B (directional data) is sampled every hour e.g. 23.54, 00.54, 01.54, ...
I would like to create a new version of data set B with directional data (a kind of synthesized data set) based on data set A, where I fill in the 10-minute recordings from data set A and keep the original hourly recordings of data set B.
Example data:
% columns: timestamp, direction, speed
A = [732381.006944445 22.70 2.23
732381.013888889 18.20 3.41
732381.020833333 31.00 6.97
732381.027777778 36.90 5.63];
% columns: timestamp, direction
B = [732381.038078704 3.01
732381.079745370 5.63
732381.121412037 0.68
732381.163078704 359.56];
..and I want something like this..
% columns: timestamp, direction
B_new = [732381.038078704 'some value based on value in A at that time'
732381.079745370 'some value based on value in A at that time'
732381.121412037 'some value based on value in A at that time'
732381.163078704 'some value based on value in A at that time'];
So the first column of the B_new matrix should contain timestamps every 10 minutes, not the original hourly timestamps, i.e. we create a new time series (B_new) sampled every 10 minutes. So something like what you already showed, @Wolfie, but with the time step of matrix A.
What is the best way to assign the direction data in B as the direction data at the closest available time in A while still keeping the same data sampling as A in the new matrix B?
This is easily achieved with interp1 (a table lookup function).
Interpolating to slower sampling
Let's say you have some nice clean data A and B for this demo...
% Columns: time (0.1s timestep), data (just the row number)
A = [ (1:0.1:2); (1:11) ].';
% Columns: time (1.0s timestep), data (doesn't even matter, to be removed)
B = [ (1:1:2); rand(1,2) ].';
Now we use interp1 to get the closest data value (in terms of the time column) from A and assign it to B_new.
B_new = zeros(size(B)); % Initialise
B_new(:,1) = B(:,1); % Get time data from B
% Get nearest neighbour by specifying the 'nearest' method.
% Using 'extrap' means we extrapolate if B's times aren't contained by A's
B_new(:,2) = interp1(A(:,1), A(:,2), B_new(:,1), 'nearest', 'extrap');
% Output
disp(B_new)
% >> [ 1 1
% 2 11 ]
% This is as expected, because 1 and 11 are the values at t = 1 and 2
% in the A data, where t = 1 and 2 are the time values in the B data.
Interpolating to higher sampling
We can do the opposite too. You suggested that you want to take some base data, A, and fill in the points you have for B (at the nearest match).
B_new = A; % Initialise to fast sample data
% Get row indices of nearest neighbour (in time) by using interp1 and mapping
% onto a list of integers 1 to number of rows in A
idx = interp1(A(:,1), 1:size(A,1), B(:,1), 'nearest', 'extrap');
% Overwrite those values (which were originally from A) with values from B
B_new(idx,2) = B(:,2);
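Another reading of the question is that every 10-minute timestamp of A should receive the direction of the nearest hourly sample in B. That is the same interp1 call with the roles of A and B swapped. A minimal sketch reusing the demo time bases, with hypothetical direction values in B:

```matlab
% Fast series: time (0.1 s timestep), value = row number
A = [ (1:0.1:2); (1:11) ].';
% Slow series: time (1.0 s timestep), hypothetical direction values
B = [ 1 10; 2 20 ];

% Keep A's time column; for each A time take the nearest B value
B_new = [A(:,1), interp1(B(:,1), B(:,2), A(:,1), 'nearest', 'extrap')];
```

Using 'nearest' rather than 'linear' also matters for directional data: linearly interpolating between, say, 359.56 and 0.68 degrees would give a value near 180, pointing the wrong way.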

Splitting a numerical matrix by column values in MATLAB

I have a matrix in MATLAB of 50572x4 doubles. The last column has datenum format dates, increasing values from 7.3025e+05 to 7.3139e+05. The question is:
How can I split this matrix into sub-matrices, each that cover intervals of 30 days?
If I'm not being clear enough… the difference between the first element in the 4th column and the last element in the 4th column is 7.3139e5 − 7.3025e5 = 1.1376e3, or 1137.6. I would like to partition this into 30-day segments and get a set of matrices whose 4th columns each span a range of 30. I'm not quite sure how to go about doing this... I'm quite new to MATLAB, but the dataset I'm working with only comes in this representation, so I need to do this.
Note that a unit interval between datenum timestamps represents 1 day, so your data in fact covers a time period of 1137.6 days. The straightforward approach is to compare each timestamp with the interval edges in order to determine which 30-day interval it belongs to:
t = A(:, end) - min(A(:, end));              %// Normalize timestamps to start from 0
idx = sum(bsxfun(@lt, t, 30:30:max(t)));     %// Last row of each interval (t sorted)
rows = diff([0, idx, numel(t)]);             %// Number of rows in each interval
Here A is your data matrix, whose last column is assumed to contain the timestamps in increasing order. rows stores the number of rows in each 30-day interval. Finally, you can use a cell array to split the original data matrix:
C = mat2cell(A, rows, size(A, 2)); %// Split matrix into intervals
C = C(~cellfun('isempty', C)); %// Remove empty matrices
Hope it helps!
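A possible alternative to the bsxfun comparison is to compute each row's interval number with floor and count rows per interval with accumarray; empty intervals get a count of 0 and are removed as before. This is a sketch under the same assumption (timestamps sorted in increasing order), on hypothetical data with one row per day:

```matlab
% Hypothetical data: 100 rows, daily datenum timestamps in the last column
A = [rand(100, 3), 730250 + (0:99).'];

t = A(:, end) - min(A(:, end));   % days since the first sample
bin = floor(t / 30) + 1;          % 30-day interval number of each row
rows = accumarray(bin, 1);        % rows per interval (0 for empty ones)

C = mat2cell(A, rows, size(A, 2));   % split into sub-matrices
C = C(~cellfun('isempty', C));       % drop empty intervals
```

With 100 daily samples this gives three 30-row blocks and a final 10-row block.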
Well, all you need is to find the edge times and the matrix indices between them. If your numbers are in datenum format, one unit is one day, which means we can step through the data in increments of 30 units until we get as close as we can to the end, as follows:
startTime = originalMatrix(1,4);
endTime = originalMatrix(end,4);
edgeTimes = (startTime+30):30:endTime; % end of each complete 30-day cycle
% And then loop through the edges, collecting the samples that complete a cycle:
nEdges = numel(edgeTimes);
totalMeasures = size(originalMatrix,1);
subMatrixes = cell(1,nEdges);
prevEdgeIdx = 0; % last row already assigned to a sub-matrix
for curEdgeIdx = 1:nEdges
    % estimated row closest to the current edge
    nearIdx = getNearestIdx(originalMatrix(:,4),edgeTimes(curEdgeIdx));
    % step back if that sample lies past the edge
    if originalMatrix(nearIdx,4)>edgeTimes(curEdgeIdx)
        nearIdx = nearIdx-1;
    end
    if nearIdx>0 && nearIdx<=totalMeasures
        subMatrixes{curEdgeIdx} = originalMatrix(prevEdgeIdx+1:nearIdx,:);
        prevEdgeIdx = nearIdx;
    else
        error('For some reason the edge was not inbound.');
    end
end
% Now collect the remaining days after the last edge, which do not complete a 30-day cycle:
if prevEdgeIdx<totalMeasures
    subMatrixes{end+1} = originalMatrix(prevEdgeIdx+1:end,:);
end
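The function getNearestIdx was discussed here. It estimates the index of the nearest point by linear interpolation between the first and last values instead of checking all possible points, so it is only accurate when the values are (roughly) evenly spaced.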
function vIdx = getNearestIdx(values,point)
if isempty(values) || ~numel(values)
vIdx = [];
return
end
vIdx = 1+round((point-values(1))*(numel(values)-1)...
/(values(end)-values(1)));
if vIdx < 1, vIdx = []; end
if vIdx > numel(values), vIdx = []; end
end
Note: this is untested and may contain errors. Please adapt it to your problem.

Matlab: Cannot plot timeseries with repeated x values. How to get rid of repeated rows?

So I have a matrix Data in this format:
Data = [Date Time Price]
Now I want to plot the Price against the Time, but my data is very large and has lines where there are multiple Prices for the same Date/Time, e.g. the 1st and 2nd lines below:
29 733575.459548611 40.0500000000000
29 733575.459548611 40.0600000000000
29 733575.459548612 40.1200000000000
29 733575.45954862 40.0500000000000
I want to take the average of the prices with the same Date/Time and get rid of the extra lines. My goal is to do linear interpolation on the values, which is why I must have exactly one Price per Time value.
How can I do this? I wrote the following, which reduces the matrix by keeping only the first line of each group of repeated date/times, but I don't know how to take the average:
function [ C ] = test( DN )
[Qrows, cols] = size(DN);
C = DN(1,:);
for i = 1:(Qrows-1)
if DN(i,2) == DN(i+1,2)
%n = 1;
%while DN(i,2) == DN(i+n,2) && i+n<Qrows
% n = n + 1;
%end
% somehow take average;
else
C = [C;DN(i+1,:)];
end
end
[C,ia,ic] = unique(A,'rows') also returns index vectors ia and ic
such that C = A(ia,:) and A = C(ic,:)
If you pass as A only the columns you do not want to average over (here: date and time), ic contains one value for every row, and the rows you want to combine share the same value.
Getting from there to the means you want is probably more intuitive for MATLAB beginners with a for loop: using logical indexing, DN(ic==n,3) gives you a vector of all the values you want to average (where n is the index of the date-time row they belong to). You need to do this for every distinct date-time combination.
A more vector-oriented way would be to use accumarray, which leads to a solution of your problem in two lines:
[DateAndTime,~,idx] = unique(DN(:,1:2),'rows');
Price = accumarray(idx,DN(:,3),[],@mean);
I'm not quite sure what you want the result to look like, but [DateAndTime Price] gives you the three-column format of the input again.
Note that if your input contains something like:
1 0.1 23
1 0.2 47
1 0.1 42
1 0.1 23
then applying unique(...,'rows') to the input before the above lines gives a different result for 1 0.1 than using the above lines directly: the latter calculates the mean of 23, 23 and 42, while in the former case one 23 is eliminated as a duplicate first, so the differing row with 42 gets a greater weight in the average.
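As a quick check, here is the two-line unique/accumarray approach applied to the four sample rows from the question. unique sorts the date/time rows, so the first output row is the duplicated time, whose price becomes the mean of 40.05 and 40.06.

```matlab
% The four sample rows from the question: [Date Time Price]
DN = [29 733575.459548611 40.05
      29 733575.459548611 40.06
      29 733575.459548612 40.12
      29 733575.45954862  40.05];

% One row per distinct date/time; idx maps each input row to its group
[DateAndTime, ~, idx] = unique(DN(:,1:2), 'rows');

% Average the prices within each group
Price = accumarray(idx, DN(:,3), [], @mean);

result = [DateAndTime Price];   % back to the [Date Time Price] layout
```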
Try the following:
[Qrows, cols] = size(DN);
% C is your result matrix
C = DN;
% this will give you the indices i where DN(i,2)==DN(i+1,2)
i = find(diff(DN(:,2))==0);
% replace C(i,:) with the average of rows i and i+1
C(i,:) = (DN(i,:)+DN(i+1,:))/2;
% delete the now-redundant C(i+1,:) rows
C(i+1,:) = [];
Hope this works.
This should work if the repeated time values come in pairs (the average is calculated between rows i and i+1). If some times are repeated 3 or more times, you will need to rethink these steps.
Something like this would work, but I did not run the code, so I can't promise there are no bugs.
newX = unique(DN(:,2));
newY = zeros(1,length(newX));
for ix = 1:length(newX)
    % rows whose time equals the ix-th unique time
    allOcurrences = find(DN(:,2)==newX(ix));
    % If there are duplicates, take their mean
    if numel(allOcurrences)>1
        newY(ix) = mean(DN(allOcurrences,3));
    else
        % If not, use the only Y value
        newY(ix) = DN(allOcurrences,3);
    end
end

matlab updating time vector

I have 19 cells (19x1) with temperature data for an entire year, where the first 18 cells represent 20 days each and the last cell represents 5 days, hence (18*20)+5 = 365 days.
In each cell there should be 7200 measurements (apart from cell 19), where each measurement is taken every 4 minutes, thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as a day number, i.e. 1, 2, 3 and so on (thus no decimal day), which is therefore displayed as 360 x 1's, 360 x 2's, and so on.
As the sensor failed on some days, some of the cells contain fewer than 7200 measurements; one in particular only contains 858 rows, which looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
Knowing that each day number should be repeated 360 times, is there a method for adding the missing rows for every value from 1 to 20 to make up the 360? For example, the first column requires 79 x 1's, 46 x 2's, 360 x 3's and so on, so that the final array has 7200 values in order from 1 to 20.
If this is possible, the second and third columns should be set to NaN in the rows that have been added.
I realise that this is an unusual question and that it is difficult to understand what is being asked, but I hope I have been clear in expressing what I'm attempting to achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full=zeros(7200,3)+NaN;
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update - to do this for each element in the cell array, the main change is that instead of searching for i as the day number, you calculate it based on how far along the cell array you are. You'd do something like (untested):
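result = cell(size(cellarray));
for k = 1:length(cellarray)
    a = cellarray{k};                  % data of the k-th cell
    nDays = min(20, 365 - (k-1)*20);   % 20 days per cell, 5 in the last one
    full = nan(nDays*360, 3);
    for i = 1:nDays
        starti = (i-1)*360; % ... as before
        day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
        full( starti + (1:360),1 ) = day; % <-- replace i with day
        idx = find(a(:,1)==day); % <-- replace i with day
        full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
    end
    result{k} = full;
end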
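The first loop above can also be vectorized, in the accumarray direction the PS hints at. The sketch below reproduces the same layout (each day's existing rows at the start of its 360-row block, NaN rows after), assuming the rows of a are grouped by day number, day numbers are integers 1 to 20, and no day has more than 360 rows. The names perDay, rankInDay and filled are illustrative; filled is used instead of full to avoid shadowing the built-in function full. repelem needs a reasonably recent MATLAB release.

```matlab
% Example cell from the question (rows grouped by day number)
a = rand(858,3);
a(1:281,1)=1; a(281:327,1)=2; a(327:328,1)=5;
a(329:330,1)=9; a(331:498,1)=19; a(499:858,1)=20;

perDay = 360; nDays = 20;
filled = nan(perDay*nDays, 3);
filled(:,1) = repelem((1:nDays).', perDay);   % 360 copies of each day number

n = size(a,1);
[~, ~, g] = unique(a(:,1));                    % group number of each row
firstRow = accumarray(g, (1:n).', [], @min);   % first row of each day in a
rankInDay = (1:n).' - firstRow(g) + 1;         % 1, 2, 3, ... within each day
target = (a(:,1) - 1)*perDay + rankInDay;      % destination row in filled
filled(target, 2:3) = a(:, 2:3);               % copy the measured data
```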
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and appends the corresponding rows at the bottom of your 'a' matrix, so you get the full 7200x3 matrix.
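nbMissing = 7200-size(a,1);            % total number of missing rows
a1 = nan(nbMissing,3);
l = 0;
for i = 1:20
    nbMissing_i = 360-sum(a(:,1)==i);  % missing rows for day i
    a1(l+1:l+nbMissing_i,1) = i;
    l = l+nbMissing_i;
end
a_filled = [a;a1];
a_sorted = sortrows(a_filled, 1);      % optional: order the rows by day number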