How to construct a moving time average with different weights for different months? - matlab

So I want to construct a moving time average with different weights for different months. E.g. see the filter function at http://www.mathworks.com/help/matlab/data_analysis/filtering-data.html, where b = # of days in each month and a = # of days in a year.
The issue, though, is that the time series consists of one temperature per month, and I want to construct a yearly average temperature for every possible 12-month window (a year could run from March to February, for example). With this approach, the first month in each window would always be weighted as 31/365, regardless of whether that month is January or June.
In that case, the standard filter algorithm wouldn't work. Is there an alternative?
A solution that incorporates leap years would also be nice, but is not necessary for an initial solution.

A weighted average is defined as sum(x .* weights) / sum(weights). If you want to calculate this in a moving average kind of way, I guess you could do (untested):
moving_sum = @(n, x) filter(ones(1,n), 1, x);
moving_weighted_avg = moving_sum(12, temperature .* days_per_month) ...
    ./ moving_sum(12, days_per_month);
If temperature is a vector of monthly temperatures and days_per_month is a vector of the same size containing the actual number of days in the corresponding months, this should even work for leap years.
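To check that the filter-based expression really is a 12-month weighted mean, you can compare one output against a direct sum over the same window. A minimal sketch with made-up inputs (the values of temperature and days_per_month below are hypothetical). Note that filter starts from zero initial conditions, so the first 11 outputs average over fewer than 12 months; you would normally keep only entries 12 onward.

```matlab
% Hypothetical inputs: 24 months of temperatures, days per month (no leap year)
temperature = (1:24)';
days_per_month = repmat([31 28 31 30 31 30 31 31 30 31 30 31]', 2, 1);

% Moving sum over the last n samples (filter with an all-ones kernel)
moving_sum = @(n, x) filter(ones(1, n), 1, x);
moving_weighted_avg = moving_sum(12, temperature .* days_per_month) ...
    ./ moving_sum(12, days_per_month);

% Only entries 12:end cover a complete 12-month window
full_windows = moving_weighted_avg(12:end);

% Direct weighted mean for the window ending at month k, for comparison
k = 15;
w = days_per_month(k-11:k);
direct = sum(temperature(k-11:k) .* w) / sum(w);
fprintf('filter: %.6f  direct: %.6f\n', moving_weighted_avg(k), direct);
```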
Edit to answer a comment:
You can reconstruct days_per_month like so:
start_year = 2003;
start_month = 10;
nmonth = 130;
month_offset = 0:nmonth - 1;
month = mod(start_month + month_offset - 1, 12) + 1;
year = start_year + floor((start_month + month_offset - 1) / 12);
days_per_month = eomday(year, month); % row vector; transpose with .' if temperature is a column
disp([month_offset; year; month; days_per_month]') % print table to check
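Putting the two pieces together, here is a sketch that reconstructs days_per_month with eomday and then keeps only the windows running from March to the following February (the kind of "year" mentioned in the question). The temperature series is a made-up seasonal cycle, and yr and mo are renamed from year and month only to avoid shadowing the built-in functions of the same name. The window ending at index k starts at index k-11, which is how the March start is selected.

```matlab
start_year = 2003;
start_month = 10;
nmonth = 130;
month_offset = (0:nmonth - 1)';
mo = mod(start_month + month_offset - 1, 12) + 1;
yr = start_year + floor((start_month + month_offset - 1) / 12);
days_per_month = eomday(yr, mo);          % 29 for February in leap years

% Hypothetical monthly temperatures (a simple seasonal cycle)
temperature = 10 + 8 * sin(2 * pi * (mo - 4) / 12);

moving_sum = @(n, x) filter(ones(1, n), 1, x);
avg12 = moving_sum(12, temperature .* days_per_month) ./ moving_sum(12, days_per_month);
ndays12 = moving_sum(12, days_per_month); % days covered by each window

% Keep complete windows (k >= 12) whose first month is March
k = (12:nmonth)';
k = k(mo(k - 11) == 3);
march_to_feb = [yr(k - 11), avg12(k), ndays12(k)]; % [start year, average, days]
disp(march_to_feb)
```

Each selected window covers 365 or 366 days, so leap years are handled through the weights themselves.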

Related

How can I convert time from seconds to decimal year in Matlab?

I have a dataset which includes the seconds that have passed since 2000-01-01 00:00:00.0 and I would like them to be converted to decimal years (for example 2013.87).
An example from the dataset:
416554767.293262
416554768.037637
416554768.782013
416554769.526386
416554770.270761
416554771.015136
416554771.759509
416554772.503884
416554773.248258
416554773.992632
416554774.737007
416554775.481381
416554776.225757
416554776.970131
416554777.714504
416554778.458880
Can anyone help me out on this?
Thanks!
You should be able to perform these computations using datetime and duration methods, something like this. I've tried to be careful about the number of seconds per year, since that varies depending on whether the year in question is a leap year.
% Original data
data = [416554767.293262
416554768.037637
416554768.782013
416554769.526386
416554770.270761
416554771.015136
416554771.759509
416554772.503884
416554773.248258
416554773.992632
416554774.737007
416554775.481381
416554776.225757
416554776.970131
416554777.714504
416554778.458880];
% Original data is seconds since 'base':
base = datetime(2000,1,1);
% Get datetimes corresponding to 'data'
dates = base + seconds(data);
% Extract the year portion from the dates
wholeYears = year(dates);
% Extract the remainder of the dates as seconds
remainderInSeconds = seconds(dates - datetime(wholeYears,1,1));
% Calculate the number of seconds in each of the years
secondsPerYear = seconds(datetime(wholeYears + 1, 1, 1) - datetime(wholeYears, 1, 1));
% Final result is whole years + remainder expressed as years
result = wholeYears + (remainderInSeconds ./ secondsPerYear);
fprintf('%.16f\n', result);
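If datetime is not available (it was introduced in MATLAB R2014b), the same calculation can be sketched with serial date numbers, where one unit equals one day. Like the datetime version, this assumes every day has exactly 86400 seconds, i.e. leap seconds are ignored. Only the first three values from the question are used here.

```matlab
% Seconds since 2000-01-01 00:00:00 (first three values from the question)
data = [416554767.293262; 416554768.037637; 416554768.782013];

% Convert to serial date numbers (1 unit = 1 day)
dn = datenum(2000, 1, 1) + data / 86400;

% Calendar year of each sample
v = datevec(dn);
yr = v(:, 1);

% Fraction of the year elapsed, using that year's actual length (365 or 366 days)
year_start = datenum(yr, 1, 1);
year_length = datenum(yr + 1, 1, 1) - year_start;
result = yr + (dn - year_start) ./ year_length;
fprintf('%.10f\n', result);
```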

Finding minimum temperature for a month in a data set

I have already made a function which extracts all temperatures from a data set for a certain time of a given month and year.
It looks like exctractperiod(data, year, month, time).
I now want to find the minimum temperature for a certain month, say January, across many years. For example, looking at January between 1997 and 2006, I want the lowest temperature registered in January over 1997-2006.
My progress so far (keep in mind this is just a rough idea of what I want):
for i = 1:12
for z = 1:x+1
year=startyear:1:endyear;
year(z)
p = extractperiodtwo(DATA, year, month, time);
end
I want to know how to write my for loops so that, for month 1, they go through the years 1997-2006 and find the lowest temperature. Then the next iteration goes through the years 1997-2006 for month 2, and so on up to month 12.
The variable p stores all the temperatures for year YYYY and month MM.
Don't take my program too seriously; it was just a rough write-up to give myself an idea of how it should look. Maybe it clarifies my question.
You're probably looking for something like this:
mintemp = inf(1,12); % initialize to infinity for each month
for month = 1:12
    for year = 1997:2006
        p = extractperiodtwo(DATA, year, month, time);
        temp = min(p); % assuming `p` contains multiple temperatures?
        mintemp(month) = min(mintemp(month), temp); % update current month's min temp
    end
end
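If the raw readings are available as plain columns rather than through extractperiodtwo, the two loops can be replaced by a single accumarray call. This sketch assumes a hypothetical layout with one row per reading and columns [year month temperature]; the question does not show the real format of DATA. Months with no readings in the chosen years come out as NaN.

```matlab
% Hypothetical layout: one row per reading, columns [year month temperature]
DATA = [1997 1  -3.2
        1998 1  -7.5
        2006 1  -1.0
        1997 2  -4.4
        2005 2  -9.1
        2010 1 -12.0];   % outside 1997-2006, so it should be ignored

startyear = 1997;
endyear = 2006;

% Keep only the years of interest
in_range = DATA(:, 1) >= startyear & DATA(:, 1) <= endyear;
months = DATA(in_range, 2);
temps = DATA(in_range, 3);

% Minimum temperature per calendar month; months without data get NaN
mintemp = accumarray(months, temps, [12 1], @min, NaN);
disp(mintemp')
```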

Checking if a given date hour is within a predefined interval with datenum()

I have a table with dates (among other things) that I extracted from a CSV file. To do some processing of my data (including plotting), I decided to convert all my date strings to date numbers. For simplicity, below I exclude the rest of the data and concentrate on the dates only, so don't mind the step from dates to timetable and the fact that it could be omitted:
dates = [7.330249777777778e+05;7.330249291666667e+05;7.330246729166667;7.330245256944444;7.330246763888889;7.330245284722222;7.330245326388889;7.330246625000000];
timetable = table(dates);
timetable
_________
7.330249777777778e+05
7.330249291666667e+05
7.330246729166667
7.330245256944444
7.330246763888889
7.330245284722222
7.330245326388889
7.330246625000000
I'm facing the following issue: based on the time of day, I want to tell the user whether a date is in the morning (24-hour scale: 5-12h), noon (12-13h), afternoon (13-18h), evening (18-21h), or night (21-5h), based on the dates stored in my table. If I had a date vector (with elements year, month, day, hour, minute, second), it would be pretty straightforward:
for date = 1:size(timetable)
    switch timetable(date).hour
        case {5,12}
            'morning'
        case {12,13}
            'noon'
        case {13,18}
            'afternoon'
        case {18,21}
            'evening'
        otherwise
            'night'
    end
end
With 7.330246729166667 and the rest, this is not so obvious, at least to me. Any idea how to avoid converting to another date format just for this step, while also avoiding a complex formula for extracting the required data (not only the hour; I'm interested in the other components too)?
One unit in MATLAB serial dates equals 1 day, i.e. 24 hours. Knowing this, you can bin the fractional part of the dates into the intraday buckets you defined (note that your switch will only work for values exactly equal to the entries in the case lists):
bins = {'morning', 'noon', 'afternoon', 'evening', 'night'};
edges = [5,12,13,18,21,25]./24; % As fraction of a day
% Take fractional part
time = mod(dates,1);
% Bin with lb <= x < ub, where e.g. lb = 5/24 and ub = 12/24
[counts,~,pos] = histcounts(time, edges);
% Make sure unbinned x in [0,5) are assigned 'night'
pos(pos==0) = 5;
bins(pos)'
ans =
'night'
'night'
'morning'
'morning'
'morning'
'morning'
'morning'
'morning'
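If you also want the other components (minutes, seconds, the day itself), datevec splits serial dates into [year month day hour minute second] without any manual formula, and a 24-entry lookup table then maps each hour to a label. A sketch, assuming all values are of order 7.33e+05 like the first two in the question (several of the listed values lack the e+05 exponent, which is why the output above shows 'morning' for them):

```matlab
% Serial dates, assuming the intended values are all of order 7.33e+05
dates = [733024.9777777778; 733024.9291666667; 733024.6729166667; 733024.5256944444];

% datevec splits serial dates into [year month day hour minute second]
v = datevec(dates);
hr = v(:, 4);

% Lookup table: entry h+1 holds the label for hour h (0-23)
labels = cell(24, 1);
labels(:) = {'night'};          % 21:00-04:59
labels(6:12) = {'morning'};     % 05:00-11:59
labels(13) = {'noon'};          % 12:00-12:59
labels(14:18) = {'afternoon'};  % 13:00-17:59
labels(19:21) = {'evening'};    % 18:00-20:59

part_of_day = labels(hr + 1);
disp([num2cell(hr), part_of_day])
```

The columns v(:, 5) and v(:, 6) hold minutes and seconds if finer buckets are needed.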

average wind direction using histc matlab

Hello, this question might be easy, but I am struggling to get average wind directions for one year. I need hourly averages to compare with concentration measurements. My wind measurements are taken every minute, in degrees. So my idea was to use the histc function in MATLAB to get the most common wind direction within each hour. This works for one hour, but how do I create a loop that gives me hourly values for a whole year?
Here is the code:
% wdd: wind directions in degrees, one per minute (e.g. 525600 values for a year)
binranges = [0:10:360];
[bincounts,ind] = histc(wdd(1:60),binranges);
[num idx] = max(bincounts(:));
wd_out=binranges(idx);
kind regards matthias
How about this one -
binranges = [0:10:360]
[bincounts,ind] = histc(reshape(wdd,60,[]),binranges)
[nums idxs] = max(bincounts)
wd_out=binranges(idxs)
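The same per-hour "most common 10-degree bin" can also be obtained without histc (which newer MATLAB releases list as not recommended): snap every reading to the lower edge of its bin and take the mode of each column. Because max returns the first (lowest) bin on ties and mode returns the smallest value on ties, the two agree. A sketch on hypothetical random integer directions in [0, 359]:

```matlab
% Hypothetical minute-resolution wind directions: 5 hours of integer degrees
wdd = randi([0 359], 60 * 5, 1);

% Approach from the answer: histogram per hour, take the most populated bin
binranges = 0:10:360;
bincounts = histc(reshape(wdd, 60, []), binranges);
[~, idxs] = max(bincounts);
wd_out = binranges(idxs);

% Equivalent: snap each reading to its bin's lower edge, then take the mode per hour
binned = floor(reshape(wdd, 60, []) / 10) * 10;
wd_mode = mode(binned, 1);
disp([wd_out; wd_mode])
```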
What I would do is:
wdd_phour=reshape(wdd,60,525600/60); % get a matrix of size 60 (minutes) x hours per year
mean_phour=mean(wdd_phour,1); % compute the average of each 60 minutes for every hour in a year
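One caveat with taking mean directly: directions are angles, so the arithmetic mean of 350 and 10 degrees is 180 degrees, the opposite of the true average (0 degrees). A common fix, swapped in here as an alternative to the plain mean, is the circular mean: average the sine and cosine components and convert back with atan2. A sketch on hypothetical data (hour 1 alternates between 350 and 10 degrees, hour 2 is a steady 90 degrees); if the readings within an hour cancel out completely, the mean direction is undefined.

```matlab
% Hypothetical minute data for two hours
wdd = [repmat([350; 10], 30, 1); 90 * ones(60, 1)];

wdd_phour = reshape(wdd, 60, []);   % 60 minutes x number of hours

% Arithmetic mean: gives 180 for hour 1, i.e. the opposite direction
naive_mean = mean(wdd_phour, 1);

% Circular mean: average the unit vectors, then convert back to an angle in [0, 360)
s = mean(sind(wdd_phour), 1);
c = mean(cosd(wdd_phour), 1);
circ_mean = mod(atan2(s, c) * 180 / pi, 360);
disp([naive_mean; circ_mean])
```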

Error in data source: correct iteratively the vector without for loop?

Hello everyone, I have a new small problem:
The data I am using have an unusual trading session that runs from 17:00 of one day to 16:15 of the next.
That means that, e.g., for the trading day 09/27/2013, my source records the transactions as follows:
DATE , TIME , PRICE
09/27/2013,17:19:42,3225.00,1 #%first obs of the vector
09/27/2013,18:37:59,3225.00,1 #%second obs of the vector
09/27/2013,08:31:32,3200.00,1
09/27/2013,08:36:17,3203.00,1
09/27/2013,09:21:34,3210.50,1 #%fifth obs of the vector
Now the first and second observations are incorrect for me: they belong to the 9/27 trading day, but they were executed on 9/26. Since I am working on some MATLAB functions that rely on non-decreasing times, I need to fix this. The date format I am using is MATLAB's datenum format, so I am trying to solve the problem by simply subtracting one from the incorrect observations:
% With time as the time vector, I can identify the 'incorrect' observations
idx=find(diff(time)<0);
time(idx)=time(idx)-1;
It is easy to see that this will only fix the last incorrect observation of each run. In the previous example, it would only correct the second element, so I would have to run the code several times (I thought about a while loop) until idx is empty. This is not a big issue for small series, but I have up to 20 million observations and probably hundreds of thousands of consecutive incorrect ones.
Is there a way to fix this in a vectorized way?
idx=find(diff(time)<0);
while ~isempty(idx)
    time(idx)=time(idx)-1;
    idx=find(diff(time)<0);
end
However, since the computation is not complex, I thought a for loop could solve the issue efficiently. My idea was the following:
[N]=size(time,1);
for i=N:-1:1
if diff(time(i,:)<0)
time(i,:)=time(i,:)-1;
end
end
Sadly, it does not seem to work.
Here is an example of data I am actually using.
735504.591157407
735507.708030093 %# I made this up to give you an example of two consecutively wrong observations
735507.708564815 %# This is an incorrect observation
735507.160138889
735507.185358796
735507.356562500
Thanks everyone in advance
Sensible version -
for count = 1:numel(time)
    dtime = diff([0 ;time]);
    ind1 = find(dtime<0,1,'last')-1;
    time(ind1) = time(ind1)-1;
end
Faster-but-crazier version -
dtime = diff([0 ;time]);
for count = 1:numel(time)
    ind1 = find(dtime<0,1,'last')-1;
    time(ind1) = time(ind1)-1;
    dtime(ind1+1) = 0;
    dtime(ind1) = dtime(ind1)-1;
end
Even crazier version -
dtime = diff([0 ;time]);
ind1 = numel(dtime);
for count = 1:numel(time)
    ind1 = find(dtime(1:ind1)<0,1,'last')-1;
    time(ind1) = time(ind1)-1;
    dtime(ind1) = dtime(ind1)-1;
end
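On the question of a vectorized fix: one loop-free heuristic (not taken from the answers here) is to flag every observation that is later than some subsequent observation, using a reversed cumulative minimum, and subtract one day from all flagged entries at once. This sketch assumes that each mis-stamped observation is exactly one day ahead and that correctly stamped observations are never later than any subsequent one; genuine out-of-order errors elsewhere in the data would also be shifted. It is a single O(N) pass, shown here on the example series from the question.

```matlab
% Example series from the question (serial datenums)
time = [735504.591157407
        735507.708030093
        735507.708564815
        735507.160138889
        735507.185358796
        735507.356562500];

% Smallest value among all later observations (Inf for the last one)
later_min = [flipud(cummin(flipud(time(2:end)))); Inf];

% An observation is flagged if some later observation is earlier than it
wrong = time > later_min;
time(wrong) = time(wrong) - 1;   % assumes each flagged stamp is exactly one day ahead
disp(issorted(time))
```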
Some average computation runtimes for these versions with various data sizes -
Datasize 1: 3432 elements
Version 1 - 0.069 sec
Version 2 - 0.042 sec
Version 3 - 0.034 sec
Datasize 2: 20 Million elements
Version 1 - 37029 sec
Version 2 - 23303 sec
Version 3 - 20040 sec
So apparently I had three other problems in the data source that I think could have stalled the routine Divakar proposed. Anyway, I found it too slow, so I started thinking about another solution and came up with a super quick vectorized one.
Given that the observations I want to modify fall within a known time-of-day interval, the function simply looks for every observation falling in that interval and modifies it as needed (subtracting 1 in my case).
function [ datetime ] = correct_date( datetime, starttime, endtime )
% datetime  : vector of dates and times in MATLAB datenum format
% starttime : start of the interval as a time string, e.g. '17:00:00'
% endtime   : end of the interval as a time string, e.g. '23:59:59'
if (nargin < 1) || (nargin > 3)
    error('Requires 1 to 3 input arguments.')
end
% default values
if nargin == 1
    starttime = '17:00';
    endtime = '23:59:59';
elseif nargin == 2
    endtime = '23:59:59';
end
tvec = [datenum(starttime) datenum(endtime)];
tvec = tvec - floor(tvec); % working across multiple days, so keep only HH:MM:SS of the interval limits
temp = datetime - floor(datetime); % same motivation as in the previous line
idx = temp >= tvec(1) & temp <= tvec(2); % logical index of observations inside the interval
datetime(idx) = datetime(idx) - 1; % shift them back one day
end
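The core of correct_date is a mask on the fractional part of each datenum. A minimal inline sketch of that logic, using numeric limits instead of parsing time strings with datenum (tstart and tend mirror the function's defaults of 17:00 and 23:59:59), applied to the five trades from the first example, all stamped 09/27/2013:

```matlab
% The five trades from the first example, all stamped 09/27/2013
h  = [17; 18; 8; 8; 9];
mi = [19; 37; 31; 36; 21];
s  = [42; 59; 32; 17; 34];
time = datenum(2013 * ones(5, 1), 9 * ones(5, 1), 27 * ones(5, 1), h, mi, s);

% Interval limits as fractions of a day (17:00 to 23:59:59)
tstart = 17 / 24;
tend = (23 * 3600 + 59 * 60 + 59) / 86400;

% Time of day of every observation, then shift those inside the interval back one day
tod = time - floor(time);
in_window = tod >= tstart & tod <= tend;
time(in_window) = time(in_window) - 1;
disp(datestr(time))
```

After the shift, the two evening trades fall on 09/26/2013 and the series is non-decreasing.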