analyse time series at a specific frequency - matlab

I have a long data set of water temperature:
t = 1/24:1/24:365;
y = 1 + (30-1).*rand(1,length(t));
plot(t,y)
The series extends for one year and the number of measurements per day is 24 (i.e. hourly). I expect the water temperature to follow a diurnal pattern (i.e. have a period of 24 hours), therefore I would like to evaluate how the 24 hour cycle varies throughout the year. Is there a method for only looking at specific frequencies when analyzing a signal? If so, I would like to draw a plot showing how the 24 hour periodicity in the data varies through the year (showing for example if it is greater in the summer and less in the winter). How could I do this?

You could use reshape to transform your data to a 24x365 matrix. In the new matrix every column is a day and every row a time of day.
temperature=reshape(y,24,365);
time=(1:size(temperature,1))-1;
day=(1:size(temperature,2))-1;
[day,time]=meshgrid(day,time);
surf(time,day,temperature)
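Once the data is in that day-by-hour layout, a simple per-day measure of the diurnal cycle is the daily range (maximum minus minimum of each column). The following is only a sketch: the synthetic series and the names nDays, A_day and dailyRange are assumptions made for illustration, not part of the original data.

```matlab
% Synthetic hourly series (assumption: stands in for the measured data)
nDays = 365;
t = (1:24*nDays)/24;                         % time in days, hourly steps
A_day = 2 + sin(2*pi*((1:nDays) - 80)/365);  % made-up diurnal amplitude per day
A = kron(A_day, ones(1,24));                 % repeat each daily amplitude 24 times
y = 15 + A.*sin(2*pi*t);                     % mean level plus diurnal cycle

temperature = reshape(y, 24, nDays);         % rows: hour of day, columns: day
% Daily range: max minus min down each column (dimension 1)
dailyRange = max(temperature, [], 1) - min(temperature, [], 1);
```

Calling plot(1:nDays, dailyRange) then shows how the strength of the daily cycle changes through the year. On real data the range also picks up noise and weather events, so it is a rough indicator only.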

My first thought would be a Fourier transform. This will give you a frequency spectrum.
At high frequencies (1 cycle per day and above) you would see the pattern within a day, at low frequencies the pattern over longer times (see lowpass and highpass filters).
You could also go for a time-frequency visualization (a spectrogram, for example) that shows how the frequency content changes over the year.
A bit more work, but you could also write a simple model and create a Kalman filter for it.
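One way to look at only the 24-hour frequency, sketched here under assumptions, is to take the Fourier transform of each day separately and keep just the one-cycle-per-day bin. The synthetic series and the names A_true and amp24 are made up for illustration; with real data, replace y with the measured vector.

```matlab
% Synthetic hourly series (assumption: stands in for the measured data)
t = (1:24*365)/24;                       % time in days, hourly steps
A_true = 2 + sin(2*pi*(t - 80)/365);     % hypothetical diurnal amplitude, largest in summer
y = 15 + A_true.*sin(2*pi*t);            % mean level plus diurnal cycle

Y = reshape(y, 24, []);                  % one column per day, 24 samples each
F = fft(Y, [], 1);                       % DFT of every day (along dimension 1)
% Row 1 is the daily mean; row 2 is the bin at exactly one cycle per 24 samples.
amp24 = 2*abs(F(2,:))/24;                % amplitude of the 24-hour component, per day
```

plot(1:365, amp24) then gives the amplitude of the 24-hour cycle as a function of day of year. A slow trend within a day leaks slightly into this bin, so treat the result as an estimate rather than an exact amplitude.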

Related

Quantizing timeline data for averaging and histograms

I have some raw spreadsheet data that's in a format, like:
12/7/2016 3:07:00, 88.05,
12/7/2016 3:08:00, 89.10,
12/7/2016 3:13:00, 87.00,
etc
These data points are not sampled at a regular interval, but are randomly collected throughout the day.
Using Google Sheets I'm able to graph this easily onto a Timeline chart. This puts the values at the correct position on the timeline and takes the uneven sampling intervals into account.
I would like to generate a histogram of the timeline data while taking into account the timestamps and calculate an average value over a timeframe. I believe if I simply run this through the built-in histogram chart or select my data values and run it through an averaging function, it will be skewed by the uneven sampling intervals.
What's the easiest way to quantize the sampling intervals (ideally within Google Sheets) for generating my histogram and averaging?
or
Is there a built-in method to generate histograms/averaging of values while taking timestamp data into account, eliminating the need for quantized data?
You can calculate the appropriate average as follows (assuming your data is in the range A2:B50)
=sum(arrayformula((A3:A50-A2:A49)*(B3:B50+B2:B49)/2))/(A50-A2)
This formula implements the Trapezoidal rule: the value assigned to each time interval is the average of observed values at the ends of that interval.
There isn't a built-in "weighted histogram" tool, so you need to resample the data to create a representative histogram. Here is one way to resample. Let's say you want 20 samples; then in C2 enter
=arrayformula(A2+(row(1:20)-1)*(A50-A2)/19)
to get 20 uniformly distributed time values. (Division by 19 because of the fence-post distinction.) Then in D2,
=arrayformula(vlookup(C2:C21, A2:B50, 2))
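will look up a value for each sample time (the most recent observation at or before it, since vlookup defaults to an approximate match on sorted data). Then you can build a histogram from column D.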
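To see what the trapezoidal average does, here is the same calculation as a small MATLAB sketch using the three sample rows from the question. The timestamps are converted by hand to minutes after 3:07:00, and the variable names are assumptions for illustration only.

```matlab
% Minutes after 3:07:00 and the corresponding values from the question
tMin = [0 1 6];
v    = [88.05 89.10 87.00];

plainMean = mean(v);                                    % ignores the uneven spacing

% Time-weighted average via the trapezoidal rule
weightedMean = trapz(tMin, v) / (tMin(end) - tMin(1));

% The same thing written out term by term, like the spreadsheet formula
manual = sum(diff(tMin) .* (v(1:end-1) + v(2:end))/2) / (tMin(end) - tMin(1));
```

The plain mean is 88.05, while the time-weighted mean is 88.1375, because the 5-minute gap between the last two samples counts five times as much as the 1-minute gap between the first two.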

How to average values in plot, to make a plot with fewer values

I have a script that plots wind speed in m/s (measured every second) against time in minutes over a period of 24 hours. I want to make a new plot that instead of plotting wind speed each second, averages the wind speed over a period of 10 minutes and then plots this against the time.
Here is a sample image of my data:
Any ideas of how I can do this?
You can use a Moving Average filter using the smooth function as suggested by m.s. in a comment. This is fairly simple:
y = smooth(x,span);
This uses a symmetric smoothing filter, so the span (i.e. the number of samples it takes for smoothing) must be odd: take the current sample plus n before and n after the current sample. That way you still have one sample for every second, they are just smoothed to damp noise and measurement errors.
If you want to reduce the number of points, such that only one point every 10 minutes exists, you can do the following: You take the first 10min * 60s = 600 samples of the vector and put them in the first column of a new matrix. Then take the next 600 samples and put them in the second column, and so on. Now you can column-wise take the mean of the matrix. That way you have a new vector where every element is the mean of 600 samples.
In MATLAB this is easily possible:
X = reshape(x,600,[]); % create matrix with 600 elements per column
y = mean(X,1); % take column-wise mean
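Note that reshape only works if the number of samples is a multiple of 600. A slightly more defensive sketch, assuming a synthetic 1 Hz signal in place of the real wind data, trims the incomplete last block and builds a matching time axis. It also shows a moving average with conv for the case where smooth is not available. The names nFull, tMin and xs are made up for this example.

```matlab
% Synthetic 1 Hz "wind speed" (assumption), a little longer than 24 hours
x = 5 + sin(2*pi*(0:86436)/86400);

n = 600;                              % samples per 10-minute block
nFull = floor(numel(x)/n) * n;        % largest multiple of n that fits
X = reshape(x(1:nFull), n, []);       % drop the incomplete trailing block
y = mean(X, 1);                       % one mean per 10-minute block
tMin = ((1:numel(y)) - 0.5) * 10;     % centre of each block, in minutes

% Moving average over 601 samples; values near the ends are biased
% towards zero because conv pads with zeros.
xs = conv(x, ones(1,601)/601, 'same');
```

plot(tMin, y) then gives the 10-minute averages against time in minutes.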

compute time series weighted average

I have an 8760x1 vector with the 1-hour average ambient temperature time series.
I want to calculate the weighted average temperature, weighted by the percentage of operating hours at each temperature level.
What I thought of is to split the temperature range into a number of bins given by:
ceil(Tmax-Tmin)
and then use hist.
Are there any other suggestions?
Thank you in advance.
mean(temperatures) should do it.
Since you have hourly measurements, the frequency of a given value will be reflecting the operating hours at that temperature level. A value that occurs frequently will therefore automatically have more weight in the average.
Let's say you have two vectors that are the same length, one is the temperature (temp), and the other is the amount of time at that temperature (time_at_temp). The weighted average formula is this:
wt_avg_temp = sum(temp .* time_at_temp) / sum(time_at_temp);
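As a quick check that the two answers agree for hourly data, the following sketch counts the hours spent at each distinct temperature and applies the weighted average formula. The short temps vector is made-up example data, and levels and hours are names chosen for illustration.

```matlab
% Made-up hourly temperatures (assumption: stands in for the 8760x1 vector)
temps = [10 10 12 15 15 15];

[levels, ~, idx] = unique(temps);        % distinct temperature levels
hours = accumarray(idx(:), 1);           % number of hours at each level

% Weighted average, weighted by hours at each level
wt_avg_temp = sum(levels(:) .* hours) / sum(hours);

plain_avg = mean(temps);                 % same value for evenly sampled data
```

Both give 77/6, about 12.83, which is why mean(temperatures) is enough when every sample represents the same amount of time.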

interpolation of fortnightly annual temperature data into hourly measurements in matlab

I have a dataset of annual temperature measurements recorded at fortnightly intervals. The data looks similar to the following:
t = 1:14:365;
% GENERATE DATA
y = 1 + (30-1).*rand(1,length(t));
y1 = 20*sin(2*pi*t/max(t)); % Annual variation °C
y1(y1<0) = [];
tt = 365/14;
time = 1:tt:365;
plot(time,y1,'-o');
where it clearly follows an annual temperature cycle.
From this I am wondering if it is possible to add a sine function (which would represent a diurnal temperature range) onto the data? For example, from the fortnightly data, if we were to interpolate the series to have 8760 measurements i.e. hourly measurements, for the series to be believable it would need to be characterized by a diurnal temperature cycle in addition to the annual temperature cycle. Furthermore, the diurnal temperature cycle would need to be a function of the temperature measurements at that time i.e. would be greater in the summer than in winter. So maybe it would be better to first use linear interpolation to get the data to represent hourly intervals and then add the sine function. Is there a method for writing this into a script? Or does anyone have an opinion on how to accurately achieve this?
You could first interpolate your data (down to hourly resolution) using something like
x = 1:1/24:365;
T_interp = interp1(t,y1,x,'spline');
Check out the MATLAB documentation for interp1 (example 2)
and then add a sine onto it. The following is a sine of period 1 day (24 hours) with amplitude A and a minimum at 3 am.
T_diurn = -A*sin(2*pi*x+(3/24)*2*pi);
Then
T_total = T_diurn + T_interp;
First: you know that good-looking plots are the most misleading things in existence? Interpolating data gathered every 14 days so that it looks like data collected every hour is considered bad practice, at the very least, in most circles...
Having said that, I would use splines to do the interpolation -- they are a lot more flexible when it comes to changing from fortnightly and hourly to some arbitrary other combination, plus the annual temperature variation will be a lot smoother.
Here's how:
% Create spline through data
pp = spline(time, y1);
% define diurnal variation (this one is minimal at 4 AM)
T_diurn = @(t) -A*cos(2*pi*(t-(4/24)));
% plot example
t = 150 : 1/24 : 250;
plot( t, ppval(pp,t)+T_diurn(t) , 'b')
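Neither snippet makes the diurnal amplitude depend on the season, which the question asked for. One possible way, sketched here under assumptions, is to let A scale with the interpolated temperature. The synthetic fortnightly data and the scaling rule A = 1 + 0.2*T_interp are both invented for illustration, so pick whatever relationship suits your site.

```matlab
% Synthetic fortnightly data (assumption: stands in for the measurements)
time = 1:14:365;                            % sample days
y1 = 10 + 10*sin(2*pi*(time - 110)/365);    % annual cycle in degrees C

x = 1 + (0:364*24)/24;                      % hourly time axis, in days
T_interp = interp1(time, y1, x, 'spline');  % smooth annual curve

% Hypothetical rule: larger daily swing when it is warmer
A = 1 + 0.2*T_interp;

% Diurnal cycle with its minimum at 4 AM, as in the answer above
T_diurn = -A .* cos(2*pi*(x - 4/24));
T_total = T_interp + T_diurn;
```

plot(x, T_total) shows the combined series. The caveat about over-interpreting interpolated data still applies: the hourly values are a plausible-looking construction, not measurements.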

generate signal with seasonal and diurnal component

This is a rather vague question but here we go - I would like to generate a time series for hourly measurements of one year, so for 2011 it would have 8760 values within the series. To make it easier to understand what I am trying to do I will use a real world example:
If we had a time series of hourly air temperature measurements and then plotted the entire series it would look similar to a bell shaped curve i.e.
a = 0; b = 30;
x = a + (b-a) * rand(1, 8760);
m = (a + b)/2;
s = 12;
p1 = -.5 * ((x - m)/s) .^ 2;
p2 = (s * sqrt(2*pi));
f = exp(p1) ./ p2;
plot(x,f,'.')
with the maximum values occurring in mid summer and lowest values during the winter. However, by zooming in on specific days we would see that the temperature also fluctuates between the day and the evening where maximum temperatures would occur at approximately 15:00 and minimum temperature at approximately 06:00.
So, my question is how would I generate this series, i.e. a time series which had a maximum value of say 30 degrees in mid summer i.e. value (8760/2) and also had the daily pattern mentioned above incorporated into the overall pattern?
The obvious way to do this would be to add together 2 sine waves, one for the diurnal variations and one for the annual variations.
Whether or not a sine wave is close enough to a bell-shaped curve for your liking I don't know, but I could make a vague argument that since the variation in annual and diurnal temperatures is (in part) a product of (approximately) circular motions you should be using sine waves anyway.
If you need help generating the sine waves update your question.
If I understand the question correctly, you'd like to have a superposition of the two series of known shape, right? If so, you just have to add them up. The important part is to shift the daily temperature fluctuation signal so that its mean is 0, provided the "year" curve expresses the average temperature.
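A minimal sketch of the two-sine-wave idea, with made-up numbers: an annual curve that peaks at 25 degrees at mid-year, plus a zero-mean daily cycle of amplitude 5 that peaks at 15:00, so the overall maximum is close to 30 degrees. All amplitudes and names here are assumptions. Note that a pure sine puts the daily minimum 12 hours after the maximum (03:00), not at 06:00, so matching both times would need a different daily shape.

```matlab
h = 0:8759;                                  % hour index over one year

% Annual curve: mean 15, amplitude 10, maximum at mid-year (h = 4380)
annual = 15 - 10*cos(2*pi*h/8760);

% Daily cycle: zero mean, amplitude 5, maximum at 15:00 every day
diurnal = 5*cos(2*pi*(h - 15)/24);

T = annual + diurnal;                        % superposition of the two
[Tmax, iMax] = max(T);                       % overall maximum and its position
```

Because the daily signal has zero mean, the annual curve stays the daily average temperature, which is the point made in the last answer.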