average wind direction using histc matlab - matlab

Hello this question might be easy but i am struggling to get average wind directions for 1 year. I need hourly averages to compare with concentration measurements. My wind measurements are every minute in degree. So my idea was to use the histc function in matlab to get the most common winddirection within the hour. this works for 1 h but how do i create a loop which gives me hourly values for a year.
here is the code
wdd=winddirections in degree(vectorsize e.g for a year 525600)
binranges = [0:10:360];
[bincounts,ind] = histc(wdd(1:60),binranges);
[num idx] = max(bincounts(:));
wd_out=binranges(idx);
kind regards matthias

How about this one -
binranges = [0:10:360]
[bincounts,ind] = histc(reshape(wdd,60,[]),binranges)
[nums idxs] = max(bincounts)
wd_out=binranges(idxs)

What I would do is:
wdd_phour=reshape(wdd,60,525600/60); % get a matrix of size 60(min) X hours per year
mean_phour=mean(wdd_phour,1); % compute the average of each 60 mins for every our in a year

Related

Getting the correct output units from the PLOMB (Lomb-scargle periodogram) function

I am trying to analyze timeseries of wheel turns that were sampled at 1 minute intervals for 10 days. t is a 1 x 14000 array that goes from .1666 hours to 240 hours. analysis.timeseries.(grp).(chs) is a 1 x 14000 array for each of my groups of interest and their specific channels that specifize activity at each minute sampled. I'm interested in collecting the maximum power and the frequency it occurs at. My problem is I'm not sure what units f is coming out in. I would like to have it return in cycles per hour and span to a maximum period of 30 hours. I tried to use the Galileo example in the documentation as a guide, but it didn't seem to work.
Below is my code:
groups = {'GFF' 'GMF' 'SFF' 'SMF'};
chgroups = {chnamesGF chnamesGM chnamesSF chnamesSM};
t1 = (t * 3600); %matlab treats this as seconds so convert it to an hour form
onehour = seconds(hours(1));
for i = 1:4
grp = groups{1,i};
chn = chgroups{1,i};
for channel = 1:length(chn)
chs = chn{channel,1};
[pxx,f]= plomb(analysis.timeseries.(grp).(chs),t, 30/onehour,'normalized');
analysis.pxx.(grp).(chs) = pxx;
analysis.f.(grp).(chs) = f;
analysis.lsp.power.(grp).(chs) = max(pxx);
[row,col,v] = find(analysis.pxx.(grp).(chs) == analysis.lsp.power.(grp).(chs));
analysis.lsp.tau.(grp).(chs) = analysis.f.(grp).(chs)(row);
end
end
Not really an answer but it is hard to put a image in a comment.
Judging by this (plomb manual matlab),
I think that pxx is without dimension as for f is is the frequency so 1/(dimension of t) dimension. If your t is in hours I would say h^-1.
So I'd rather say try
[pxx,f]= plomb(analysis.timeseries.(grp).(chs),t*30.0/onehour,'normalized');

matlab: how to find interval of data

I have a dataset of trajectories of users: every current location of the traiectories has these fields:_ [userId year month day hour minute second latitude longitude regionId]. Based on the field day, I want to divide trajectories based on daily-scale in interval of different hours: 3 hours, 4 hours, 2 hours. I have realized this code that run for interval of 4 hours
% decomposedTraj is a struct that contains the trajectories based on daily scale
for i=1:size(decomposedTraj,2)
if ~isempty(decomposedTraj(i).dailyScaled)
% find the intervals
% interval [0-4]hours
Interval(i).interval_1=(decomposedTraj(i).dailyScaled(:,5)>=0&decomposedTraj(i).dailyScaled(:,5)<4);
% interval [4-8]hours
Interval(i).interval_2=(decomposedTraj(i).dailyScaled(:,5)>=4&decomposedTraj(i).dailyScaled(:,5)<8);
% interval [8-12]hours
Interval(i).interval_3=(decomposedTraj(i).dailyScaled(:,5)>=8&decomposedTraj(i).dailyScaled(:,5)<12);
% interval [12-16]hours
Interval(i).interval_4=(decomposedTraj(i).dailyScaled(:,5)>=12&decomposedTraj(i).dailyScaled(:,5)<16);
% interval [16-20]hours
Interval(i).interval_5=(decomposedTraj(i).dailyScaled(:,5)>=16&decomposedTraj(i).dailyScaled(:,5)<20);
% interval [20-0]hours
Interval(i).interval_6=(decomposedTraj(i).dailyScaled(:,5)>=20);
end
end
or more easily to understand the logic of the code:
A=[22;19;15;15;0;20;22;19;15;15;0;20;20;0;22;21;17;23;22]';
A(A>=0&A<4)
A(A>=4&A<8)
A(A>=8&A<12)
A(A>=12&A<16)
A(A>=16&A<20)
A(A>=20)
It runs and gives the right answer but it's not smart: if I want to change the interval, I have to change all the code... can you help me to find a smart solution more dinamical of this? thanks
0 Comments
Interval k is defined as [(k-1)*N k*N] where N=4 in your example. Therefore you can do the same using a for loop:
for k=1:floor(24/N)
Interval(k) = A(A>=(k-1)*N & A<k*N);
end
Note that in this example A(A>=(k-1)*N & A<k*N) is not necessarily the same size for each k so Interval should be a cell array.

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Lets say frames = [100 252 504 756] - I would like calculate the four functions above on over each of the (time-)frames, on a daily basis - so the return for day 300 in the case with 100 day-frame, would be [mean variance skewness kurtosis] from the period day201-day300 (100 days in total)... and so on.
I know this means I would get an array output, and the the first frame number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as #PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(#minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window,1);
st_dev = zeros(length(returnsHF)-window,1);
skew = zeros(length(returnsHF)-window,1);
kurt = zeros(length(returnsHF)-window,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assinging the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

Error in data source: correct iteratively the vector without for loop?

Hello everyone I have a new small problem:
The data I am using have a weird trade time that goes from 17.00 of one day to 16.15 of the day after.
That means that, e.g., for the day 09-27-2013 The source I am using registers the transactions occurred as follows:
DATE , TIME , PRICE
09/27/2013,17:19:42,3225.00,1 #%first obs of the vector
09/27/2013,18:37:59,3225.00,1 #%second obs of the vector
09/27/2013,08:31:32,3200.00,1
09/27/2013,08:36:17,3203.00,1
09/27/2013,09:21:34,3210.50,1 #%fifth obs of the vector
Now first and second obs are incorrect for me: they belong to 9/27 trading day but they have been executed on 9/26. Since I am working on some functions in matlab that relies on non-decremental times I need to solve this issue. The date format I am using is actually the datenum Matlab format so I am trying to solve the problem just subtracting one from the incorrect observations:
%#Call time the time vector, I can identify the 'incorrect' observations
idx=find(diff(time)<0);
time(idx)=time(idx)-1;
It is easy to tell that this will only fix the 'last' incorrect observations of a series. In the previous example this would only correct the second element. And I should run the code several times (I thought about a while loop) until idx will be empty. This is not a big issue when working with small series but I have up to 20millions observations and probably hundred of thousands consecutively incorrect ones.
Is there a way to fix this in a vectorized way?
idx=find(diff(time)<0);
while idx
However, given that the computation would not be so complex I thought that a for loop could efficiently solve the issue and my idea was the following:
[N]=size(time,1);
for i=N:-1:1
if diff(time(i,:)<0)
time(i,:)=time(i,:)-1;
end
end
sadly it does not seems to work.
Here is an example of data I am actually using.
735504.591157407
735507.708030093 %# I made this up to give you an example of two consecutively wrong observations
735507.708564815 %# This is an incorrect observation
735507.160138889
735507.185358796
735507.356562500
Thanks everyone in advance
Sensible version -
for count = 1:numel(time)
dtime = diff([0 ;time]);
ind1 = find(dtime<0,1,'last')-1;
time(ind1) = time(ind1)-1;
end
Faster-but-crazier version -
dtime = diff([0 ;time]);
for count = 1:numel(time)
ind1 = find(dtime<0,1,'last')-1;
time(ind1) = time(ind1)-1;
dtime(ind1+1) = 0;
dtime(ind1) = dtime(ind1)-1;
end
More Crazier version -
dtime = diff([0 ;time]);
ind1 = numel(dtime);
for count = 1:numel(time)
ind1 = find(dtime(1:ind1)<0,1,'last')-1;
time(ind1) = time(ind1)-1;
dtime(ind1) = dtime(ind1)-1;
end
Some average computation runtimes for these versions with various datasizes -
Datasize 1: 3432 elements
Version 1 - 0.069 sec
Version 2 - 0.042 sec
Version 3 - 0.034 sec
Datasize 2: 20 Million elements
Version 1 - 37029 sec
Version 2 - 23303 sec
Version 3 - 20040 sec
So apparently I had 3 other different problems in the data source that I think could have stucked the routine Divakar proposed. Anyway I thought it was being too slow so I started thinking to another solution and came up with a super quick vectorized one.
Given that the observations I wanted to modify fall in a determined known interval of time the function just look for every observation falling in that interval and modifies it as I want (-1 in my case).
function [ datetime ] = correct_date( datetime,starttime, endtime)
%#datetime is my vector of dates and times in matlab numerical format
%#starttime is the starting hour of the interval expressed in datestr format. e.g. '17:00:00'
%#endtime is the ending hour of the interval expressed in datestr format. e.g. '23:59:59'
if (nargin < 1) || (nargin > 3),
error('Requires 1 to 3 input arguments.')
end
% default values
if nargin == 1,
starttime='17:00';
endtime='23:59:59';
elseif nargin == 2,
endtime='23:59:59';
end
tvec=[datenum(starttime) datenum(endtime)];
tvec=tvec-floor(tvec); %#As I am working on multiples days I need to isolate only HH:MM:SS for my interval limits
temp=datetime-floor(datetime); %#same motivation as in the previous line
idx=find(temp>=tvec(1)&temp<=tvec(2)); %#logical find the indices
datetime(idx)=datetime(idx)-1; %#modify them as I want
clear tvec temp idx
end

matlab - averaging for a given interval

I can calculate daily averages of a data set like so:
Jday = datenum('2010-11-01 00:00','yyyy-mm-dd HH:MM'):60/(60*24):...
datenum('2011-02-31 23:00','yyyy-mm-dd HH:MM');
Dat = rand(length(Jday),1);
DateV = datevec(Jday);
[~,~,b] = unique(DateV(:,1:3),'rows');
AvDat = abs(accumarray(b,Dat,[],#nanmean));
AvJday = abs(accumarray(b,Jday,[],#nanmean));
However, I would like to take an average of a data set given a number for the output resolution. For example, if I wrote
outRes = 86400; % in seconds
I would like to average the values so that the output resolution is equal to 86400 seconds, and if the outRes defined is shorter than the resolution of the data then no averaging will take place.
How can this be done?
You should be able to detect the resolution of the data by:
CurrentRes=mean(diff(Jday))*86400;
if (~all(diff(Jday)==CurrentRes))
error('Inequally distributed sampling times');
end
if (CurrentRes>outRes)
return;
Then the amount you need to downsample by is:
AveragingFactor=outRes/CurrentRes;
Then you can average by discarding some samples at the end (or start) and averaging:
Dat = reshape(Dat(1:end-mod(length(Dat),AveragingFactor),:),AveragingFactor,[]);
AvDat = mean(Dat,1)'; % Transpose to keep it as a column vector if necessary.