I have a 1-by-1000 vector containing datenums for events, and I want to count total events per date (simplified for this example). The dimensions of dates and events agree.
My code is
i = 0
for d = unique(dates)
i = i + 1
result(i) = length(events(d == dates))
end
I get a dimension mismatch for d == dates. I understand why (d is a 1-by-1 vector), but how do I write this properly?
Bonus points: The solution with i is pretty ugly... hints?
Thanks!
edit by request:
dates contains datenums
729028
729028
729028
729028
729028
and events contains floats:
0.1205
0.2932
2.0384
2.0384
1.0411
0.5425
The problem is that unique(dates) is a column vector, and for steps through columns of whatever is on the right-hand side of the equal sign. Thus, d is a vector, not a scalar in your original code.
To make the code do what you want, transpose the result so that for steps through its elements one at a time:
for d = unique(dates)'
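For completeness, here is a minimal sketch of the whole corrected loop. The sample values are made up for illustration, and the manual counter i is replaced by an index over the unique dates, which also lets result be preallocated:

```matlab
% Illustrative sample data (assumed, not taken from the question)
dates  = [729028; 729028; 729000; 729028; 729100];
events = [0.1205; 0.2932; 2.0384; 1.0411; 0.5425];

u = unique(dates);          % column vector of distinct dates
result = zeros(size(u));    % preallocate one count per distinct date
for k = 1:numel(u)
    % u(k) is a scalar, so the comparison with dates is well defined
    result(k) = numel(events(dates == u(k)));
end
```

Indexing with k avoids depending on whether unique returns a row or a column.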
To avoid the loop:
d = hist(dates,unique(dates));
You only need to compute how many times each value of dates is repeated. You can do this with bsxfun:
uniqueDates = unique(dates);
count = sum(bsxfun(@eq, uniqueDates(:), dates(:).'),2);
Each entry of count corresponds to the same-index entry of uniqueDates.
Example: for dates = [729028; 729028; 729000; 729028; 729100] the result is
uniqueDates =
729000
729028
729100
count =
1
3
1
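As an alternative sketch (not part of the answer above), the same counts can be obtained from the third output of unique combined with accumarray, which avoids building the full comparison matrix. The variable names mirror the example above:

```matlab
dates = [729028; 729028; 729000; 729028; 729100];

% idx(k) is the position of dates(k) within uniqueDates
[uniqueDates, ~, idx] = unique(dates);

% Add 1 to the bin of every occurrence; idx(:) forces a column of subscripts
count = accumarray(idx(:), 1);
```

This may be preferable for long vectors, because the bsxfun version allocates a matrix with numel(uniqueDates) rows and numel(dates) columns.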
I am facing an issue with counting number of occurrences by date, suppose I have an excel file where the data is as follows:
1/1/2001 23
1/1/2001 29
1/1/2001 24
3/1/2001 22
3/1/2001 23
My desired output is:
1/1/2001 3
2/1/2001 0
3/1/2001 2
Though 2/1/2001 doesn't appear in the input, I want it included in the output with a count of 0. This is my current code:
[Value, Time] = xlsread('F:\1km\fire\2001- 02\2001_02.xlsx','Sheet1','A2:D159','',@convertSpreadsheetExcelDates);
tm=datenum(Time);
val=Value(:,4);
data=[tm val];
% a=(datestr(tm));
T1=datetime('9/23/2001');
T2=datetime('6/23/2002');
T = T1:T2;
tm_all=datenum(T);
[~, idx] = ismember(tm_all,data(:,1));
% idx=idx';
out = tm_all(idx);
The ismember function does not seem to work, because the length of tm_all is 274 and the size of data is 158x2
I suggest using datetime instead of datenum to convert your date strings; this makes the whole computation (and much else) easier:
tm = datetime({
'1/1/2001';
'1/1/2001';
'1/1/2001';
'3/1/2001';
'3/1/2001'
},'InputFormat','dd/MM/yyyy');
Once you have obtained your datetime vector, the calculation can be achieved as follows:
% Create a sequence of datetimes from the first date to the last date...
T = (min(tm):max(tm)).';
% Map every occurrence to its position in the sequence...
[~,idx] = ismember(tm,T);
% Count the occurrences for every date in the sequence...
C = accumarray(idx,1);
% Put unique dates and occurrences together into a single variable...
res = table(T,C)
Here is the output:
res =
T C
___________ _
01-Jan-2001 3
02-Jan-2001 0
03-Jan-2001 2
For more information about the functions used within the computation:
accumarray function
ismember function
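If the output should cover a fixed date range rather than min(tm):max(tm), as in the question, a hedged variation is to pass the output size to accumarray explicitly and to skip dates that fall outside the range. The range endpoints below are arbitrary values chosen for illustration:

```matlab
tm = datetime({'1/1/2001'; '1/1/2001'; '1/1/2001'; '3/1/2001'; '3/1/2001'}, ...
              'InputFormat', 'dd/MM/yyyy');

% Fixed range (assumed for this sketch): 1 Jan 2001 to 5 Jan 2001
T = (datetime(2001,1,1):datetime(2001,1,5)).';

% tf marks the dates that lie inside T, idx gives their position in T
[tf, idx] = ismember(tm, T);

% The size argument guarantees one row per day of T, even for trailing empty days
C = accumarray(idx(tf), 1, [numel(T) 1]);
res = table(T, C);
```

Without the size argument, accumarray stops at the largest index it sees, so trailing days with no events would be missing from the result.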
On a side note, I couldn't tell whether your dates are in dd/MM/yyyy or MM/dd/yyyy format. With the latter, my approach cannot produce that output; you would instead need to extract the month of each date and group your data by month (and also by year, if your dates span more than 2001):
tm = datetime({
'1/1/2001';
'1/1/2001';
'1/1/2001';
'3/1/2001';
'3/1/2001'
},'InputFormat','MM/dd/yyyy');
M = month(tm);
M_seq = (min(M):max(M)).';
[~,idx] = ismember(M,M_seq);
C = accumarray(idx,1);
res = table(datetime(2001,M_seq,1),C)
res =
Var1 C
___________ _
01-Jan-2001 3
01-Feb-2001 0
01-Mar-2001 2
I'll first give the code and then explain step by step.
code:
[Value, Time] = xlsread('stack','A1:D159','',@convertSpreadsheetExcelDates);
tm=datenum(Time);
val=Value(:,4);
data=[tm val];
a=(datestr(tm));
T1=datetime('1/1/2001');
T2=datetime('6/23/2002');
T = T1:T2;
tm_all=datenum(T);
[~, idx] = ismember(tm_all,data(:,1)); % get indices
[occurence,dates]= hist(data(:,1),unique(data(:,1))); % count occurrences of dates from file
t = [0;data(:,1)]; % prepend 0 to the dates (needed later because MATLAB indexing starts at 1)
[~,idx] = ismember(t(idx+1),dates); % get indices
q = [0 occurence]; % prepend 0 to the occurrences (needed later because MATLAB indexing starts at 1)
occ = q(idx+1); % make vector with occurences
out = [tm_all' occ']; % output
idx of ismember is a 1-by-length(tm_all) vector that, at position i, contains the lowest index at which tm_all(i) is found in data(:,1). Take for example A = [1 2 3 4] and B = [1 1 2 4]; then for [~,idx] = ismember(A,B) the result will be
idx = [1 3 0 4]
because A(1) = 1 and the first 1 in B is found at position 1. If a number in A doesn't occur in B, the corresponding entry of idx is 0.
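The small example above can be run directly. The first output tf is not used in the answer's code, but it is shown here because it makes the "not found" case explicit:

```matlab
A = [1 2 3 4];
B = [1 1 2 4];

% tf(k) is true when A(k) occurs in B; idx(k) is the lowest matching
% position in B, or 0 when A(k) is not found
[tf, idx] = ismember(A, B);
```

Here 3 does not occur in B, so tf(3) is false and idx(3) is 0.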
[occurence,dates]= hist(data(:,1),unique(data(:,1))); gives the number of occurrences of each date.
t = [0;data(:,1)]; adds a zero at the beginning, so t looks like:
0
'date 1'
'date 2'
'date 3'
'date 4'
...
Why this is done will be explained next.
t(idx+1) is a vector with length(tm_all) elements; it is a copy of tm_all, except that a date that doesn't occur in the file is replaced by zero. How does this work? t(i) gives you the value of t at position i, so t([1 5 4 2 9]) is a vector with the values of t at positions 1, 5, 4, 2 and 9. Remember that idx contains the indices of the dates in data(:,1), with 0 marking dates that were not found. Because MATLAB indexing starts at 1, idx+1 is needed, and the dates in data(:,1) must be shifted by one position as well. That is done by adding the zero at the beginning.
[~,idx] = ismember(t(idx+1),dates); is the same as before, but idx now contains the indices into dates.
q = [0 occurence]; again adds a zero at the beginning, and occ = q(idx+1); is the row of occurrences of the dates.
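The zero-padding trick can be seen in isolation with a tiny sketch. The numbers are invented for illustration: two dates occur in the file, and the middle date of a three-day range is missing:

```matlab
counts = [3 2];      % occurrences of the two dates found in the file
idx    = [1 0 2];    % position of each day of the range in the file; 0 = not found

q   = [0 counts];    % q(1) is the count reported for "not found"
occ = q(idx + 1);    % shift by one so that index 0 maps to q(1)
```

Indexing counts(idx) directly would fail, because 0 is not a valid MATLAB index.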
I have a datetime array that highlights the peaks of a function "datepeak", for every day in one year. I obtained it using a datetime array "date" and the array with the position of the peaks "position".
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
I need to take the n peaks of day 1 and put them in the first row of the matrix, and so on for each day.
Since the number of peaks per day is not constant (min 3, max 4), I tried to initialize the matrix like this:
matrix=NaN(365,4)
Then I overwrite the NaN of every row with this double for loop:
for i=1:365
v=datepeak(day(datepeak,'dayofyear')==i);
for c=1:length(v)
matrix(i,c)=(v(c));
end
end
This loop works (I tried it with the peaks), but with datetime I get an error.
Here's an example to paste:
year=2016;
position=[128 458 950];
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
matrix=NaN(365,4);
for i=1:365
v=datepeak(day(datepeak,'dayofyear')==i);
for c=1:length(v)
matrix(i,c)=(v(c));
end
end
The NaN array is of class double whereas datepeak is of class datetime, so you can't store them in the same array. The way you represent your data should be driven by what you want to do with it later (and by what is feasible). In your case, I'll assume that a list of 365 elements, each containing the peak times of that day (however many there are), is OK.
year=2016;
position=[128 458 950];
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
peaktimes_list = cell(365,1);
for i=1:365
peaktimes_list{i} = datepeak(day(datepeak,'dayofyear')==i);
end
EDIT: For a 365x4 cell array, replace the last part with:
peaktimes = cell(365,4);
for i=1:365
v = datepeak(day(datepeak,'dayofyear')==i);
nv = numel(v);
peaktimes(i,1:nv) = num2cell(v);
end
When there are fewer than 4 values, the remaining columns will be empty.
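If a regular datetime matrix is preferred over a cell array, one possible sketch is to preallocate with NaT (the datetime counterpart of NaN) instead of NaN. This assumes a MATLAB release that provides the NaT function. The names yr, dates and peakMatrix are chosen here to avoid shadowing the built-in year and date functions, and 366 rows are used because 2016 is a leap year:

```matlab
yr = 2016;
position = [128 458 950];
t1 = datetime(yr,1,1,0,0,0);
t2 = datetime(yr,12,31,23,59,0);
dates = t1:minutes(1):t2;
datepeak = dates(position);

nDays = 366;                          % 2016 is a leap year
peakMatrix = NaT(nDays,4);            % datetime matrix filled with "not a time"
dayIdx = day(datepeak,'dayofyear');   % day of year of every peak
for i = 1:nDays
    v = datepeak(dayIdx == i);
    peakMatrix(i,1:numel(v)) = v;     % unused columns stay NaT
end
```

Rows with fewer than four peaks keep NaT in the remaining columns, which can be detected later with isnat.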
I have data1,data2,...,data20 and I want to set all values above 45 for each of them to zero, and then create n1,n2,...,n20 that contain the number of non-zero values. I've tried this:
for i = 1:20
data{i}(data{i} > 45) = 0;
n{i} = nnz(data{i});
end
It's not working and I can't think of an alternative approach.
As mentioned in the comments, you have 20 separate variables and are trying to index them in a loop. That cannot work the way you wrote it: data{i} assumes that data is a cell array, but what you have is 20 different variables in your workspace, none of which is a cell array.
Having so many variables is usually bad practice. One thing I'd recommend is to place all of the variables inside a 2D matrix where each column holds one of the data vectors (this assumes they are column vectors). Once you do that, you can build a logical matrix that marks the values that would survive the thresholding and sum each column individually. Something like this:
data = [data1 data2 data3 data4 data5 data6 data7 data8 data9 data10 ...
data11 data12 data13 data14 data15 data16 data17 data18 data19 data20];
n = sum(data ~= 0 & data <= 45, 1);
The last line creates a logical matrix where each value is 1 if the corresponding entry is non-zero and less than or equal to 45 (that is, it would still be non-zero after setting everything above 45 to zero) and 0 otherwise, and then sums along the columns of this result.
n will be a 1 x 20 vector where the ith element tells you how many non-zero values remain in the ith column.
However, the above code assumes that each vector has the same number of elements. If this isn't the case, then I would make a cell array that stores all of these together, then loop over each cell and do the same logic:
data = {data1, data2, data3, data4, data5, data6, data7, data8, data9, data10, ...
data11, data12, data13, data14, data15, data16, data17, data18, data19, data20};
n = zeros(1, numel(data));
for idx = 1 : numel(data)
d = data{idx};
d(d > 45) = 0;
n(idx) = nnz(d);
end
n should still be in the same format as the previous code.
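As a compact variation on the loop above, the same count can be sketched with cellfun. The three short vectors are invented sample data standing in for data1 to data20:

```matlab
% Sample cell array (assumed values); vectors may differ in length and orientation
data = {[10 50 0 45], [46 47], [1; 2; 3; 100]};

% Keep only the values that are not above 45, then count the non-zeros among them
n = cellfun(@(d) nnz(d(d <= 45)), data);
```

This counts the same elements as zeroing everything above 45 and calling nnz, but it leaves the original vectors unmodified.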
I have a matrix in MATLAB of 50572x4 doubles. The last column has datenum format dates, increasing values from 7.3025e+05 to 7.3139e+05. The question is:
How can I split this matrix into sub-matrices, each that cover intervals of 30 days?
If I'm not being clear enough… the difference between the first element in the 4th column and the last element in the 4th column is 7.3139e5 − 7.3025e5 = 1.1376e3, or 1137.6. I would like to partition this into 30 day segments, and get a bunch of matrices that have a range of 30 for the 4th columns. I'm not quite sure how to go about doing this...I'm quite new to MATLAB, but the dataset I'm working with has only this representation, necessitating such an action.
Note that a unit interval between datenum timestamps represents 1 day, so your data in fact covers a time period of 1137.6 days. The straightforward approach is to compare each timestamp with the interval edges in order to determine which 30-day interval it belongs to:
t = A(:, end) - min(A(:, end)); %// Normalize timestamps to start from 0
idx = sum(bsxfun(@lt, t, 30:30:max(t)), 1); %// Number of rows before each 30-day edge
rows = diff([0, idx, numel(t)]); %// Number of rows in each interval
where A is your data matrix, whose last column is assumed to contain the (sorted) timestamps. rows stores the number of rows in each 30-day interval. Finally, you can employ cell arrays to split the original data matrix:
C = mat2cell(A, rows, size(A, 2)); %// Split matrix into intervals
C = C(~cellfun('isempty', C)); %// Remove empty matrices
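An alternative sketch, not part of the answer above, assigns every row a bin number with floor and lets accumarray collect the rows of each bin. The small matrix A is made-up sample data with the timestamps in the last column:

```matlab
% Sample data (assumed): first column is a row label, last column is a datenum
A = [(1:7).', [730250; 730260; 730279.9; 730280; 730300; 730345; 730345.5]];

t   = A(:, end) - A(1, end);     % days elapsed since the first timestamp
bin = floor(t / 30) + 1;         % 1 for [0,30), 2 for [30,60), and so on

% Collect the row numbers of each bin and return the matching sub-matrix
% in a cell; sort(r) restores row order, which accumarray does not guarantee
C = accumarray(bin, (1:size(A, 1)).', [], @(r) {A(sort(r), :)});
C = C(~cellfun('isempty', C));   % drop 30-day intervals without any rows
```

With this sample, the third interval (days 60 to 90) has no rows and is removed, leaving three sub-matrices.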
Hope it helps!
Well, all you need is to find the edge times and the matrix indices between them. Since your numbers are in datenum format, one unit equals one day, which means we can step in increments of 30 units until we get as close as we can to the end, as follows:
startTime = originalMatrix(1,4);
endTime = originalMatrix(end,4);
edgeTimes = startTime:30:endTime;
% Then loop through the edges, collecting the rows up to each edge:
nEdges = numel(edgeTimes);
totalMeasures = size(originalMatrix,1);
subMatrix = cell(1,nEdges);
prevIdx = 0; % last row already assigned to a sub-matrix
for curEdgeIdx = 1:nEdges
    nearIdx=getNearestIdx(originalMatrix(:,4),edgeTimes(curEdgeIdx));
    if originalMatrix(nearIdx,4)>edgeTimes(curEdgeIdx)
        nearIdx = nearIdx-1;
    end
    if nearIdx>0 && nearIdx<=totalMeasures
        % Use the row index nearIdx (not the edge counter) to slice the matrix
        subMatrix{curEdgeIdx} = originalMatrix(prevIdx+1:nearIdx,:);
        prevIdx=nearIdx;
    else
        error('For some reason the edge was not inbound.');
    end
end
% Now we check for the remaining rows after the last edge, which do not complete a 30 day cycle:
if prevIdx<totalMeasures
    subMatrix{end+1} = originalMatrix(prevIdx+1:end,:);
end
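The function getNearestIdx was discussed here. It estimates the index of the nearest point by linear interpolation between the first and last values instead of checking all possible points, so it assumes the values are sorted and roughly evenly spaced.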
function vIdx = getNearestIdx(values,point)
if isempty(values) || ~numel(values)
vIdx = [];
return
end
vIdx = 1+round((point-values(1))*(numel(values)-1)...
/(values(end)-values(1)));
if vIdx < 1, vIdx = []; end
if vIdx > numel(values), vIdx = []; end
end
Note: This is pseudocode and may contain errors. Please try to adjust it into your problem.
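When the timestamps are sorted but not evenly spaced, the interpolated estimate can be far off. A simple, slower fallback (an assumption of this sketch, not part of the answer) is to search for the last row at or before the edge with find:

```matlab
% Sorted but unevenly spaced sample timestamps (assumed values)
values = [730250; 730251; 730290; 730400];
point  = 730280;

% Index of the last timestamp that does not exceed the edge;
% the result is empty when every timestamp lies after the edge
lastIdx = find(values <= point, 1, 'last');
```

This checks every element, so it trades the speed of getNearestIdx for correctness on irregular data.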
so I have a matrix Data in this format:
Data = [Date Time Price]
Now what I want to do is plot the Price against the Time, but my data is very large and has lines where there are multiple Prices for the same Date/Time, e.g. 1st, 2nd lines
29 733575.459548611 40.0500000000000
29 733575.459548611 40.0600000000000
29 733575.459548612 40.1200000000000
29 733575.45954862 40.0500000000000
I want to take an average of the prices with the same Date/Time and get rid of any extra lines. My goal is to do linear interpolation on the values, which is why I must have only one Price value for each Time.
How can I do this? I wrote the following (it reduces the matrix so that only the first line is kept for lines with repeated date/times), but I don't know how to take the average:
function [ C ] = test( DN )
[Qrows, cols] = size(DN);
C = DN(1,:);
for i = 1:(Qrows-1)
if DN(i,2) == DN(i+1,2)
%n = 1;
%while DN(i,2) == DN(i+n,2) && i+n<Qrows
% n = n + 1;
%end
% somehow take average;
else
C = [C;DN(i+1,:)];
end
end
[C,ia,ic] = unique(A,'rows') also returns index vectors ia and ic
such that C = A(ia,:) and A = C(ic,:)
If you pass as input A only the columns you do not want to average over (here: date and time), ic contains one value for every row of A, and rows you want to combine share the same value.
Getting from there to the means you want is probably more intuitive for MATLAB beginners with a for loop: using logical indexing, e.g. DN(ic==n,3), you get a vector of all the values you want to average (where n is the index of the date-time row they belong to). You need to do this for every distinct date-time combination.
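The loop described above could look roughly like this. The matrix DN holds the four sample rows from the question, and Price is a name chosen for this sketch:

```matlab
DN = [29 733575.459548611 40.05;
      29 733575.459548611 40.06;
      29 733575.459548612 40.12;
      29 733575.45954862  40.05];

% One row of DateAndTime per distinct date/time; ic maps every row of DN to it
[DateAndTime, ~, ic] = unique(DN(:,1:2), 'rows');

Price = zeros(size(DateAndTime,1), 1);
for n = 1:size(DateAndTime,1)
    Price(n) = mean(DN(ic == n, 3));   % average all prices of the n-th date/time
end
result = [DateAndTime Price];
```

Only the first two rows share a date/time, so the result has three rows and the first price is the mean of 40.05 and 40.06.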
A more vector-oriented way would be to use accumarray, which leads to a solution of your problem in two lines:
[DateAndTime,~,idx] = unique(DN(:,1:2),'rows');
Price = accumarray(idx,DN(:,3),[],@mean);
I'm not quite sure how you want the result to look, but [DateAndTime Price] gives you the three-column format of the input again.
Note that if your input contains something like:
1 0.1 23
1 0.2 47
1 0.1 42
1 0.1 23
then applying unique(...,'rows') to the input before the above lines will give a different result for 1 0.1 than using the above lines directly: the latter calculates the mean of 23, 23 and 42, while in the former case one 23 is eliminated as a duplicate first, so the differing row with 42 has a greater weight in the average.
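The difference can be checked with the four rows above. The variable names direct and dedup are chosen for this sketch:

```matlab
DN = [1 0.1 23;
      1 0.2 47;
      1 0.1 42;
      1 0.1 23];

% Averaging directly: the duplicate 23 counts twice
[~, ~, idx] = unique(DN(:,1:2), 'rows');
direct = accumarray(idx, DN(:,3), [], @mean);     % mean(23,42,23) and 47

% Removing identical rows first: the duplicate 23 counts once
DNu = unique(DN, 'rows');
[~, ~, idxU] = unique(DNu(:,1:2), 'rows');
dedup = accumarray(idxU, DNu(:,3), [], @mean);    % mean(23,42) and 47
```

Which of the two is appropriate depends on whether repeated identical rows are genuine observations or accidental duplicates.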
Try the following:
[Qrows, cols] = size(DN);
% C is your result matrix
C = DN;
% this will give you the indexes where DN(i,2)==DN(i+1,2)
i = find(diff(DN(:,2))==0);
% replace C(i,:) with the average
C(i,:) = (DN(i,:)+DN(i+1,:))/2;
% delete the C(i+1,:) rows
C(i+1,:) = [];
Hope this works.
This should work if the repeated time values come in pairs (the average is calculated between rows i and i+1). If the same time is repeated 3 or more times in a row, these steps will need to be reworked.
Something like this would work, but I did not run the code so I can't promise there are no bugs.
newX = unique(DN(:,2));
newY = zeros(1,length(newX));
for ix = 1:length(newX)
    % Rows whose time equals the ix-th unique time
    allOcurrences = find(DN(:,2)==newX(ix));
    % If there are duplicates, take their mean
    if numel(allOcurrences)>1
        newY(ix) = mean(DN(allOcurrences,3));
    else
        % If not, use the only Y value
        newY(ix) = DN(allOcurrences,3);
    end
end