I am struggling with the code for calculating the sum of molecules produced in an interval; I think the issue is with establishing the interval over which the sum is taken.
prod_time_seq = total_time; % total_time as a vector
bin_interval= 1;
number_min= 720;
prod_per_min= zeros(bin_interval,number_min);
for i=2:number_min
int(i) = 1 + ceil((prod_time_seq(i)-prod_time_seq(1))./60); %time interval
prod_per_min(i) = sum(int(i)); %sum of the molecules of each array
end
Assuming you are trying to calculate the sum of the elements in each interval, this could be a solution (adding zero as the starting point of the summation interval):
for i=1:number_min
int(i) = 1 + ceil((prod_time_seq(i)-prod_time_seq(1))./60); %time interval
prod_per_min(i) = sum(0:int(i)); %sum of the molecules of each array
end
In your case, int(i) is a single value, so summing it just returns the value itself. To create an interval for each case, you need to define a starting point, e.g. 0:int(i), which means all integer values from zero up to int(i).
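Side note: if the underlying goal is the number of molecules produced in each one-minute bin - assuming (on my part) that prod_time_seq holds one timestamp in seconds per produced molecule, and that all timestamps fall within the first number_min minutes - a minimal accumarray-based sketch would be:
min_idx = 1 + floor((prod_time_seq - prod_time_seq(1))/60); % minute bin of each molecule
prod_per_min = accumarray(min_idx(:), 1, [number_min, 1]);  % molecule count per minute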
Assume that I have the vector shown in the figure below. By common sense, we can see that there are 2 values which suddenly depart from the trend of the vector.
How do I eliminate these sudden changes? That is, how do I automatically detect these noise values and replace them with the average of their neighbors?
Define a threshold, compute the average values, then compare the relative error between the values and the averages of their neighbors:
threshold = 5e-2;
averages = [v(1); (v(3:end) + v(1:end-2)) / 2; v(end)];
is_outlier = (v - averages).^2 > threshold^2 * averages.^2; % squared relative-error test
Then replace the outliers:
v(is_outlier) = averages(is_outlier);
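A runnable end-to-end illustration on made-up data (note a caveat of this scheme: a very large spike also contaminates the neighbour averages of the adjacent points, so those can get flagged too; a moderate spike and threshold are used here):
v = [10; 10.2; 9.9; 13; 10.1; 9.8; 10];  % column vector with a spike at index 4
threshold = 0.15;                        % tolerate 15% relative error
averages = [v(1); (v(3:end) + v(1:end-2)) / 2; v(end)];
is_outlier = (v - averages).^2 > threshold^2 * averages.^2;
v(is_outlier) = averages(is_outlier);    % only v(4) is replaced, by 10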
Consider the following example:
xDATA = data_timestamp;
[~,~,Days,Hour,Min,~] = datevec(xDATA(2:end) - xDATA(1:end - 1));
BadSamplingTime = find(Days > 0 | Hour > 0 | Min > 5);
where xDATA contains a vector of timestamps and I am trying to find the samples with a sampling time of more than 5 minutes in between. The algorithm works fine, but it creates 3 extra vectors as big as my timestamp vector (which is pretty huge), whereas if I do this
DurationTime = xDATA(2:end) - xDATA(1:end - 1);
instead of the second line, it will just create one vector of the same length, of 'duration' data type, which is much easier to handle; the problem is that I can't seem to access each index of the duration data type. For example:
DurationTime(5,1)
ans =
26:00:01
I need to access this 26-hours part. Does anyone have an idea how to do that, or a better suggestion?
You can create a duration object and compare it with the duration vector DurationTime. The result of a>b is a logical vector that can be used directly to index DurationTime, giving you all the values where the duration is greater than 5 minutes.
Sidenote: You can calculate the difference/duration directly with diff.
Code:
% create example data
xDATA = (([0:4,4+26*60,4+26*60+1:4+26*60+5])/24/60+datetime('now')).';
% calculate the durations
DurationTime = xDATA(2:end) - xDATA(1:end-1); % as in the question
%DurationTime = diff(xDATA); % alternative
% get index and values of all durations greater than 5 minutes
ind = find(DurationTime>duration(0,5,0))
DurationTime(ind)
% get values of all durations greater than 5 minutes (direct solution, if no index needed)
DurationTime(DurationTime>duration(0,5,0));
Result:
ind =
5
ans =
26:00:00
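If you specifically need the numeric number of hours (the 26) out of a single duration element, the hours conversion function returns a plain double:
h = hours(DurationTime(ind))  % 26, as a double
wholeHours = floor(h);        % integer hour part, in case of fractional durations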
I have a matrix in MATLAB of 50572x4 doubles. The last column has datenum format dates, increasing values from 7.3025e+05 to 7.3139e+05. The question is:
How can I split this matrix into sub-matrices, each covering an interval of 30 days?
If I'm not being clear enough… the difference between the first element in the 4th column and the last element in the 4th column is 7.3139e5 − 7.3025e5 = 1.1376e3, or 1137.6. I would like to partition this into 30-day segments and get a set of matrices whose 4th columns each span a range of 30. I'm not quite sure how to go about doing this... I'm quite new to MATLAB, but the dataset I'm working with has only this representation, necessitating such an action.
Note that a unit interval between datenum timestamps represents 1 day, so your data in fact covers a time period of 1137.6 days. The straightforward approach is to compare each timestamp with the edges in order to determine which 30-day interval it belongs to:
t = A(:, end) - min(A(:, end)); % Normalize timestamps to start from 0
idx = sum(bsxfun(@lt, t, 30:30:max(t))); % Number of samples below each interval edge
rows = diff([0, idx, numel(t)]); % Number of rows in each interval
where A is your data matrix, whose last column is assumed to contain the timestamps. rows stores the number of rows in each corresponding 30-day interval. Finally, you can employ cell arrays to split the original data matrix:
C = mat2cell(A, rows, size(A, 2)); % Split matrix into intervals
C = C(~cellfun('isempty', C)); % Remove empty matrices
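A sketch of an equivalent approach that computes a bin index per row instead (assuming, as stated in the question, that the last column is sorted in increasing order; histc or the newer discretize could also produce the bin index):
t = A(:, end) - min(A(:, end));      % normalize timestamps to start from 0
bin = 1 + floor(t / 30);             % 30-day bin index for every row
counts = accumarray(bin, 1);         % rows per bin (zero for empty bins)
C = mat2cell(A, counts, size(A, 2)); % split; works because A is sorted
C = C(counts > 0);                   % drop empty intervals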
Hope it helps!
Well, all you need is to find the edge times and the matrix indices between them. If your numbers are in datenum format, one unit is the same as one day, which means that we can jump 30 units at a time until we get as close as we can to the end, as follows:
startTime = originalMatrix(1,4);
endTime = originalMatrix(end,4);
edgeTimes = startTime:30:endTime;
% And then loop through the edges checking for samples that complete a cycle:
nEdges = numel(edgeTimes);
totalMeasures = size(originalMatrix,1);
subMatrixes = cell(1,nEdges);
prevEdgeIdx = 0;
for curEdgeIdx = 1:nEdges
nearIdx=getNearestIdx(originalMatrix(:,4),edgeTimes(curEdgeIdx));
if originalMatrix(nearIdx,4)>edgeTimes(curEdgeIdx)
nearIdx = nearIdx-1;
end
if nearIdx>0 && nearIdx<=totalMeasures
subMatrixes{curEdgeIdx} = originalMatrix(prevEdgeIdx+1:nearIdx,:);
prevEdgeIdx=nearIdx;
else
error('For some reason the edge was not in bounds.');
end
end
% Now we check for the remaining days after the last edge, which do not complete a 30-day cycle:
if prevEdgeIdx<totalMeasures
subMatrixes{end+1} = originalMatrix(prevEdgeIdx+1:end,:);
end
The function getNearestIdx was discussed here; it gives you the index of the nearest point among the input values without checking all possible points.
function vIdx = getNearestIdx(values,point)
if isempty(values)
vIdx = [];
return
end
vIdx = 1+round((point-values(1))*(numel(values)-1)...
/(values(end)-values(1)));
if vIdx < 1, vIdx = []; end
if vIdx > numel(values), vIdx = []; end
end
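For illustration, a quick call (bearing in mind that the interpolation inside getNearestIdx effectively assumes the values are roughly uniformly spaced):
vals = (0:10:100)';            % uniformly spaced sample values
idx = getNearestIdx(vals, 42)  % returns 5; vals(5) == 40 is nearest to 42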
Note: This is pseudocode and may contain errors. Please adapt it to your problem.
I have 19 cells (19x1) with temperature data for an entire year, where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365 days.
In each cell there should be 7200 measurements (apart from cell 19); a measurement is taken every 4 minutes, thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as day number i.e. 1,2,3...and so on (thus no decimal day),
which is therefore displayed as 360 x 1's... and so on.
As the sensor failed during some days, some of the cells contain fewer than 7200 measurements; one in particular only contains 858 rows, and looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
Knowing that each day number should be repeated 360 times, is there a method for adding the missing copies of every value from 1:20 in order to make up the 360? For example, the first column requires 79 more 1's, 46 more 2's, 360 x 3's... and so on; the final array should therefore have 7200 values, in order from 1 to 20.
If this is possible, then in the rows where these values have been added, the second and third columns should be changed to NaN.
I realise that this is an unusual question and that it is difficult to understand what is being asked, but I hope I have been clear in expressing what I'm attempting to achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full = nan(7200,3); % preallocate with NaN
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
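A hypothetical sketch of that refactoring (fillday is a made-up name that wraps the loop above; 'UniformOutput' must be false because each call returns a matrix, and the final 5-day cell would need 1800 rows instead of 7200):
function filled = fillday(a)
% fill one cell's data out to 20 days x 360 samples, padding gaps with NaN
filled = nan(7200,3);
for i = 1:20
starti = (i-1)*360;
filled( starti + (1:360), 1 ) = i;
idx = find(a(:,1)==i);
filled( starti + (1:length(idx)), 2:3 ) = a(idx,2:3);
end
end
% apply it to every cell of the 19x1 cell array:
filledCells = cellfun(@fillday, cellarray, 'UniformOutput', false);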
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update - to do this for each element of the cell array, the only change is that instead of using i as the day number, you calculate the day from how far along the cell array you are. You'd do something like (untested):
for k = 1:length(cellarray)
full = nan(7200,3); % one filled matrix per cell
for i = 1:20 % 20 days per cell
starti = (i-1)*360; % ... as before
day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
full( starti + (1:360), 1 ) = day; % <-- replace i with day
idx = find(cellarray{k}(:,1)==day); % <-- search this cell's data for that day
full( starti + (1:length(idx)), 2:3 ) = cellarray{k}(idx,2:3); % same as before
end
filled{k} = full; % collect the result for this cell
end
I am not sure I understood correctly what you want to do, but the code below works out how many measurements you are missing for each day and adds extra rows at the bottom of your 'a' matrix, so you do get the full 7200x3 matrix.
nbMissing = 7200-size(a,1);
a1 = nan(nbMissing,3); % padding rows; data columns stay NaN
l = 0;
for i = 1:20
nbMissing_i = 360-sum(a(:,1)==i); % rows missing for day i
a1(l+1:l+nbMissing_i,1) = i;
l = l+nbMissing_i;
end
a_filled = [a;a1];
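One caveat: a_filled has the padding rows appended at the bottom, so the day column is no longer in ascending order; if that matters, sortrows restores it:
a_filled = sortrows(a_filled, 1); % re-sort by the day column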
I have a bunch of time-series, each described by two components: a timestamp vector (in seconds) and a vector of measured values. The time vector is non-uniform (i.e. sampled at non-regular intervals).
I am trying to compute the mean/SD of each 1-minute interval of values (take an X-minute interval, compute its mean, move on to the next interval, ...).
My current implementation uses loops. This is a sample of what I have so far:
t = (100:999)' + rand(900,1); % non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
interval = 1; % 1-min interval
tt = ( floor(t(1)):interval*60:ceil(t(end)) )'; % stopping points of each interval
N = length(tt)-1;
mu = zeros(N,1);
sd = zeros(N,1);
for i=1:N
indices = ( tt(i) <= t & t < tt(i+1) ); % find t between tt(i) and tt(i+1)
mu(i) = mean( x(indices) );
sd(i) = std( x(indices) );
end
I am wondering if there is a faster vectorized solution. This is important because I have a large number of time-series to process, each much longer than the sample shown above.
Any help is welcome.
Thank you all for the feedback.
I corrected the way t is generated so that it is always monotonically increasing (sorted); this was not really an issue.
Also, I may not have stated this clearly, but my intention was to have a solution for any interval length in minutes (1 min was just an example).
The only logical solution seems to be...
Ok. I find it funny that to me there is only one logical solution, but many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,
t = sort((100:999)' + 3*rand(900,1)); % non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
tt = ( floor(t(1)):1*60:ceil(t(end)) )';
(Note that I sorted t above.)
I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequal in spacing, I would use histc to determine which intervals the data series falls in. Given they are uniform, just do this:
int = 1 + floor((t - t(1))/60);
Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and standard deviation.
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);
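For the arbitrary, unequally spaced breaks mentioned above, a sketch of the same reduction driven by histc's bin-index output (an inf guard edge keeps samples at or beyond the last break in the final bin):
[~, int] = histc(t, [tt; inf]); % bin index of every sample; last bin open-ended
mu = accumarray(int, x, [], @mean);
sd = accumarray(int, x, [], @std);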
You could try and create a cell array and apply mean and std via cellfun. It's ~10% slower than your solution for 900 entries, but ~10x faster for 90000 entries.
[t,sortIdx]=sort(t); % we only need to sort in case t is not monotonically increasing
x = x(sortIdx);
tIdx = floor(t/60); % convert seconds to minutes - divide by 300 instead for 5-minute bins
tIdx = tIdx - min(tIdx) + 1; % tIdx is now a vector of indices - i.e. it starts at 1 and runs like your iteration variable
% the next few commands count how many 1's, 2's, 3's etc. there are in tIdx
dt = [tIdx(2:end)-tIdx(1:end-1);1];
stepIdx = [0;find(dt>0)];
nIdx = stepIdx(2:end) - stepIdx(1:end-1); % number of times each index appears
% convert to cell array
xCell = mat2cell(x,nIdx,1);
% use cellfun to calculate the mean and sd
mu(tIdx(stepIdx+1)) = cellfun(@mean,xCell); % indexed this way since there may be missing steps
sd(tIdx(stepIdx+1)) = cellfun(@std,xCell);
Note: my solution does not give exactly the same results as yours, since you skip a few time values at the end (1:60:90 is [1,61]), and since the start of the interval is not exactly the same.
Here's a way that uses binary search. It is 6-10x faster for 9900 elements and about 64x faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I'm not sure which is faster at that size. It uses almost no extra memory if you consider making tx directly from the generated data. Other than that, it just needs four extra scalar variables (prevind, first, mid, and last).
% Sort the data so that we can use binary search (O(N log N) time complexity).
tx = sortrows([t x]);
prevind = 1;
for i=1:N
% First do a binary search to find the end of this section
first = prevind;
last = length(tx)+1; % search the half-open range [first, last), so the final sample is not dropped
while first ~= last
mid = floor((first+last)/2);
if tt(i+1) > tx(mid,1)
first = mid+1;
else
last = mid;
end;
end;
mu(i) = mean( tx(prevind:last-1,2) );
sd(i) = std( tx(prevind:last-1,2) );
prevind = last;
end;
It uses all of the variables that you had originally. I hope that it suits your needs. It is faster because binary search takes O(log N) to find each boundary, whereas finding them the way you were doing takes O(N).
You can compute indices all at once using bsxfun:
indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );
This is faster than looping but requires storing them all at once (a time vs. space tradeoff).
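If you do keep the full logical matrix, the reduction itself can also be vectorized; a sketch (indices is samples-by-intervals as built above; multiplying a logical matrix by a double vector sums the selected values):
counts = sum(indices, 1).';                         % samples per interval
mu = (indices.' * x) ./ counts;                     % per-interval means
ex2 = (indices.' * x.^2) ./ counts;                 % per-interval mean of squares
sd = sqrt((ex2 - mu.^2) .* counts ./ (counts - 1)); % sample std dev (n-1 normalization)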
Disclaimer: I worked this out on paper, but haven't yet had the opportunity to check it "in silico"...
You may be able to avoid loops and cell arrays by doing some tricky cumulative sums and indexing, and calculating the means and standard deviations yourself. Here's some code that I believe will work, although I am unsure how it stacks up speed-wise against the other solutions:
[t,sortIndex] = sort(t); % Sort the time points
x = x(sortIndex); % Sort the data values
interval = 60; % Interval size, in seconds
intervalIndex = floor((t-t(1))./interval)+1; % Collect t into intervals
nIntervals = max(intervalIndex); % The number of intervals
mu = zeros(nIntervals,1); % Preallocate mu
sd = zeros(nIntervals,1); % Preallocate sd
sumIndex = [find(diff(intervalIndex)); ...
numel(intervalIndex)]; % Find indices of the interval ends (column-oriented, since t is a column)
n = diff([0; sumIndex]); % Number of samples per interval
xSum = cumsum(x); % Cumulative sum of x
xSum = diff([0; xSum(sumIndex)]); % Sum per interval
xxSum = cumsum(x.^2); % Cumulative sum of x^2
xxSum = diff([0; xxSum(sumIndex)]); % Squared sum per interval
intervalIndex = intervalIndex(sumIndex); % Find index into mu and sd
mu(intervalIndex) = xSum./n; % Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1)); % Compute std dev
The above computes the standard deviation using the simplification of the formula found on this Wikipedia page.
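(For reference, the simplification in question is s^2 = (sum(x.^2) - (sum(x))^2/n) / (n-1), which is exactly what the xxSum, xSum, and n terms evaluate per interval.)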
The same answer as above, but with a parametric interval (window_size). An issue with the vector lengths is resolved as well.
window_size = 60; % interval length in seconds; can be any value (60, 5, 0.1, ...)
t = sort((100:999)' + 3*rand(900,1)); % non-uniform time
x = 5*rand(900,1) + 10; % x(i) is the value at time t(i)
int = 1 + floor((t - t(1))/window_size);
tt = ( floor(t(1)):window_size:ceil(t(end)) )';
% mean value and standard deviation within each window
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);
% resolve a size mismatch between tt and mu (e.g. for window_size = 1 instead of 60)
while numel(tt) > numel(mu)
tt(end) = [];
end
errorbar(tt,mu,sd);