I have a matrix ('data') which is composed of day of year in the first column and temperature data in the second column and also a vector ('dDates') of values which also represent day of year. e.g.
clear all
n = 366;
day = linspace(1+1/24,n,(n-1)*24)';
temp = rand(8760,1);
data = [day,temp];
dDates = [12, 32, 45, 67];
I'm trying to alter 'data' so that it only contains data which is measured for the 120 rows following the day number specified in 'dDates' although being the same size as the original 'data' i.e. the other rows filled with nans.
Each measurement in 'data' refers to one hour so 120 refers to 5 days worth of data.
So far I have used:
[r,c,v] = find(data(:,1)>dDates(1),1,'first');
inside a loop to find the row number of each element of 'dDates' in 'dates' but am finding it difficult to bring everything together to produce the outcome that I need. How should I go about doing this?
There's a lot going on here so I made several changes. Generating data can be done avoiding the LINSPACE command. Also, the temperature data is initialized to NaN:
>> n = 366;
>> data = [(1:1/24:(n-1/24))' ones(8760, 1)*NaN];
You can avoid using a loop to find the start indices of each date of interest by calculating the start indices since the date column in data is evenly spaced in time:
>> dDates = [12 32 45 67];
>> startIndex = (dDates - 1) * 24 + 1;
Verify that this worked:
>> data(startIndex)
ans =
12 32 45 67
Generate the random temperature data for the dates of interest:
>> temp = rand(120*length(dDates), 1);
Determine the indices into data that correspond to the start dates of interest and the following 120 records:
>> targetIndex = repmat((0:119)', 1, length(dDates)) + repmat(startIndex, 120, 1);
Slot the random data into the target locations:
>> data(targetIndex, 2) = temp;
Verify that this worked:
Related
Consider the following example
xDATA = data_timestamp;
[~,~,Days,Hour,Min,~] = datevec(xDATA(2:end) - xDATA(1:end - 1));
BadSamplingTime = find((Days)> 0 | (Hour)> 0 |(Min)> 5 );
In which xData contains a vector of time stamps and I am trying to find the samples with sampling time greater than 5 mins in between , the algorithm works fine but it creates 3 extra vectors for the data as big as my timestamp vector(the size to time stamp vector is pretty huge) whereas if I do this
DurationTime = xDATA(2:end) - xDATA(1:end - 1);
Instead of the second line it will just create one vector of same length of 'duration' data type which will be much easier to handle because but the problem I cant seem to access each index of the duration data type
for example
DurationTime(5,1)
ans =
26:00:01
I need to access this 26 hours part , does anyone have any idea how to do that ? or a better suggestion
You can create a duration-object and then use it to compare it with the duration vector DurationTime. The result of a>b is a logical vector that can be directly used to index the elements of DurationTime and thus giving you all the values where the duration is greater than 5 minutes.
Sidenote: You can calculate the difference/duration directly with diff.
Code:
% create example data
xDATA = (([0:4,4+26*60,4+26*60+1:4+26*60+5])/24/60+datetime('now')).';
% calculate the durations
DurationTime = xDATA(2:end) - xDATA(1:end-1); % as in the question
%DurationTime = diff(xDATA); % alternative
% get index and values of all durations greater than 5 minutes
ind = find(DurationTime>duration(0,5,0))
DurationTime(ind)
% get values of all durations greater than 5 minutes (direct solution, if no index needed)
DurationTime(DurationTime>duration(0,5,0));
Result:
ind =
5
ans =
26:00:00
I have a matrix in MATLAB of 50572x4 doubles. The last column has datenum format dates, increasing values from 7.3025e+05 to 7.3139e+05. The question is:
How can I split this matrix into sub-matrices, each that cover intervals of 30 days?
If I'm not being clear enough… the difference between the first element in the 4th column and the last element in the 4th column is 7.3139e5 − 7.3025e5 = 1.1376e3, or 1137.6. I would like to partition this into 30 day segments, and get a bunch of matrices that have a range of 30 for the 4th columns. I'm not quite sure how to go about doing this...I'm quite new to MATLAB, but the dataset I'm working with has only this representation, necessitating such an action.
Note that a unit interval between datenum timestamps represents 1 day, so your data, in fact, covers a time period of 1137.6 days). The straightforward approach is to compare each timestamps with the edges in order to determine which 30-day interval it belongs to:
t = A(:, end) - min(A:, end); %// Normalize timestamps to start from 0
idx = sum(bsxfun(#lt, t, 30:30:max(t))); %// Starting indices of intervals
rows = diff([0, idx, numel(t)]); %// Number of rows in each interval
where A is your data matrix, where the last column is assumed to contain the timestamps. rows stores the number of rows of the corresponding 30-day intervals. Finally, you can employ cell arrays to split the original data matrix:
C = mat2cell(A, rows, size(A, 2)); %// Split matrix into intervals
C = C(~cellfun('isempty', C)); %// Remove empty matrices
Hope it helps!
Well, all you need is to find the edge times and the matrix indexes in between them. So, if your numbers are at datenum format, one unit is the same as one day, which means that we can jump from 30 and 30 units until we get as close as we can to the end, as follows:
startTime = originalMatrix(1,4);
endTime = originalMatrix(end,4);
edgeTimes = startTime:30:endTime;
% And then loop though the edges checking for samples that complete a cycle:
nEdges = numel(edgeTimes);
totalMeasures = size(originalMatrix,1);
subMatrixes = cell(1,nEdges);
prevEdgeIdx = 0;
for curEdgeIdx = 1:nEdges
nearIdx=getNearestIdx(originalMatrix(:,4),edgeTimes(curEdgeIdx));
if originalMatrix(nearIdx,4)>edgeTimes(curEdgeIdx)
nearIdx = nearIdx-1;
end
if nearIdx>0 && nearIdx<=totalMeasures
subMatrix{curEdgeIdx} = originalMatrix(prevEdgeIdx+1:curEdgeIdx,:);
prevEdgeIdx=curEdgeIdx;
else
error('For some reason the edge was not inbound.');
end
end
% Now we check for the remaining days after the edges which does not complete a 30 day cycle:
if curEdgeIdx<totalMeasures
subMatrix{end+1} = originalMatrix(curEdgeIdx+1:end,:);
end
The function getNearestIdx was discussed here and it gives you the nearest point from the input values without checking all possible points.
function vIdx = getNearestIdx(values,point)
if isempty(values) || ~numel(values)
vIdx = [];
return
end
vIdx = 1+round((point-values(1))*(numel(values)-1)...
/(values(end)-values(1)));
if vIdx < 1, vIdx = []; end
if vIdx > numel(values), vIdx = []; end
end
Note: This is pseudocode and may contain errors. Please try to adjust it into your problem.
I have observed daily data that I need to compare to generated Monthly data so I need to get a mean of each month over the thirty year period.
My observed data set is currently in 365x31 with rows being each day (no leap years!) and the extra column being the month number (1-12).
the problem I am having is that I can only seem to get a script to get the mean of all years. ie. I cannot figure how to get the script to do it for each column separately. Example of the data is below:
1 12 14
1 -15 10
2 13 3
2 2 37
...all the way to 12 for 365 rows
SO: to recap, I need to get the mean of [12; -15; 13; 2] then [14; 10; 3; 37] and so on.
I have been trying to use the unique() function to loop through which works for getting the number rows to average but incorrect means. Now I need it to do each month(28-31 rows) and column individually. Result should be a 12x30 matrix. I feel like I am missing something SIMPLE. Code:
u = unique(m); %get unique values of m (months) ie) 1-12
for i=1:length(u)
month(i) = mean(obatm(u(i), (2:31)); % the average for each month of each year
end
Appreciate any ideas! Thanks!
You can simply filter the rows for each month and then apply mean, like so:
month = zeros(12, size(obatm, 2));
for k = 1:12
month(k, :) = mean(obatm(obatm(:, 1) == k, :));
end
EDIT:
If you want something fancy, you can also do this:
cols = size(obatm, 2) - 1;
subs = bsxfun(#plus, obatm(:, 1), (0:12:12 * (cols - 1)));
vals = obatm(:, 2:end);
month = reshape(accumarray(subs(:), vals(:), [12 * cols, 1], #mean), 12, cols)
Look, Ma, no loops!
so I have a matrix Data in this format:
Data = [Date Time Price]
Now what I want to do is plot the Price against the Time, but my data is very large and has lines where there are multiple Prices for the same Date/Time, e.g. 1st, 2nd lines
29 733575.459548611 40.0500000000000
29 733575.459548611 40.0600000000000
29 733575.459548612 40.1200000000000
29 733575.45954862 40.0500000000000
I want to take an average of the prices with the same Date/Time and get rid of any extra lines. My goal is to do linear intrapolation on the values which is why I must have only one Time to one Price value.
How can I do this? I did this (this reduces the matrix so that it only takes the first line for the lines with repeated date/times) but I don't know how to take the average
function [ C ] = test( DN )
[Qrows, cols] = size(DN);
C = DN(1,:);
for i = 1:(Qrows-1)
if DN(i,2) == DN(i+1,2)
%n = 1;
%while DN(i,2) == DN(i+n,2) && i+n<Qrows
% n = n + 1;
%end
% somehow take average;
else
C = [C;DN(i+1,:)];
end
end
[C,ia,ic] = unique(A,'rows') also returns index vectors ia and ic
such that C = A(ia,:) and A = C(ic,:)
If you use as input A only the columns you do not want to average over (here: date & time), ic with one value for every row where rows you want to combine have the same value.
Getting from there to the means you want is for MATLAB beginners probably more intuitive with a for loop: Use logical indexing, e.g. DN(ic==n,3) you get a vector of all values you want to average (where n is the index of the date-time-row it belongs to). This you need to do for all different date-time-combinations.
A more vector-oriented way would be to use accumarray, which leads to a solution of your problem in two lines:
[DateAndTime,~,idx] = unique(DN(:,1:2),'rows');
Price = accumarray(idx,DN(:,3),[],#mean);
I'm not quite sure how you want the result to look like, but [DataAndTime Price] gives you the three-row format of the input again.
Note that if your input contains something like:
1 0.1 23
1 0.2 47
1 0.1 42
1 0.1 23
then the result of applying unique(...,'rows') to the input before the above lines will give a different result for 1 0.1 than using the above directly, as the latter would calculate the mean of 23, 23 and 42, while in the former case one 23 would be eliminates as duplicate before and the differing row with 42 would have a greater weight in the average.
Try the following:
[Qrows, cols] = size(DN);
% C is your result matrix
C = DN;
% this will give you the indexes where DN(i,:)==DN(i+1)
i = find(diff(DN(:,2)==0);
% replace C(i,:) with the average
C(i,:) = (DN(i,:)+DN(i+1,:))/2;
% delete the C(i+1,:) rows
C(i,:) = [];
Hope this works.
This should work if the repeated time values come in pairs (the average is calculated between i and i+1). Should you have time repeats of 3 or more then try to rethink how to change these steps.
Something like this would work, but I did not run the code so I can't promise there's no bugs.
newX = unique(DN(:,2));
newY = zeros(1,length(newX));
for ix = 1:length(newX)
allOcurrences = find(DN(:,2)==DN(i,2));
% If there's duplicates, take their mean
if numel(allOcurrences)>1
newY(ix) = mean(DN(allOcurrences,3));
else
% If not, use the only Y value
newY(ix) = DN(ix,3);
end
end
I have 19 cells (19x1) with temperature data for an entire year where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365days.
In each cell there should be 7200 measurements (apart from cell 19) where each measurement is taken every 4 minutes thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as day number i.e. 1,2,3...and so on (thus no decimal day),
which is therefore displayed as 360 x 1's... and so on.
As the sensor failed during some days, some of the cells contain less than 7200 measurements, where one in
particular only contains 858 rows, which looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
By knowing that each day number should be repeated 360 times is there a method for including an additional
amount of every value from 1:20 in order to make up the 360. For example, the first column requires
79 x 1's, 46 x 2's, 360 x 3's... and so on; where the final array should therefore have 7200 values in
order from 1 to 20.
If this is possible, in the rows where these values have been added, the second and third column should
changed to nan.
I realise that this is an unusual question, and that it is difficult to understand what is asked, but I hope I have been clear in expressing what i'm attempting to
acheive. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full=zeros(7200,3)+NaN;
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update - to do this for each element in the cell array the only change is that instead of searching for i as the day number you calculate it based on how far allong the cell array you are. You'd do something like (untested):
for k = 1:length(cellarray)
for i = 1:length(cellarray{k})
starti = (i-1)*360; % ... as before
day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
full( starti + (1:360),1 ) = day; % <-- replace i with day
idx = find(a(:,1)==day); % <-- replace i with day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
end
end
I am not sure I understood correctly what you want to do but this below works out how many measurements you are missing for each day and add at the bottom of your 'a' matrix additional lines so you do get the full 7200x3 matrix.
nbMissing = 7200-size(a,1);
a1 = nan(nbmissing,3)
l=0
for i = 1:20
nbMissing_i = 360-sum(a(:,1)=i);
a1(l+1:l+nbMissing_i,1)=i;
l = l+nb_Missing_i;
end
a_filled = [a;a1];