Quantizing timeline data for averaging and histograms - charts

I have some raw spreadsheet data in a format like this:
12/7/2016 3:07:00, 88.05,
12/7/2016 3:08:00, 89.10,
12/7/2016 3:13:00, 87.00,
etc
These data points are not sampled at a regular interval, but are randomly collected throughout the day.
Using Google Sheets I'm able to graph this easily onto a Timeline chart. This puts the values at the correct position on the timeline and takes the uneven sampling intervals into account.
I would like to generate a histogram of the timeline data while taking into account the timestamps and calculate an average value over a timeframe. I believe if I simply run this through the built-in histogram chart or select my data values and run it through an averaging function, it will be skewed by the uneven sampling intervals.
What's the easiest way to quantize the sampling intervals (ideally within Google Sheets) for generating my histogram and averaging?
or
Is there a built-in method to generate histograms/averaging of values while taking timestamp data into account, eliminating the need for quantized data?

You can calculate the time-weighted average as follows (assuming your data is in the range A2:B50):
=sum(arrayformula((A3:A50-A2:A49)*(B3:B50+B2:B49)/2))/(A50-A2)
This formula implements the Trapezoidal rule: the value assigned to each time interval is the average of observed values at the ends of that interval.
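For readers who want to check the arithmetic outside the spreadsheet, here is a minimal Python sketch of the same trapezoidal average. It is an illustration, not part of the Sheets solution: the function name `time_weighted_mean` is made up, the three rows are the sample rows from the question, and reading the dates as month/day/year is an assumption.

```python
from datetime import datetime

def time_weighted_mean(samples):
    """Trapezoidal time-weighted mean of (timestamp, value) pairs sorted by time."""
    area = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = (t1 - t0).total_seconds()      # width of this interval
        area += dt * (v0 + v1) / 2.0        # mean of the two end values
    total = (samples[-1][0] - samples[0][0]).total_seconds()
    return area / total

fmt = "%m/%d/%Y %H:%M:%S"  # assumed date order: month/day/year
samples = [
    (datetime.strptime("12/7/2016 3:07:00", fmt), 88.05),
    (datetime.strptime("12/7/2016 3:08:00", fmt), 89.10),
    (datetime.strptime("12/7/2016 3:13:00", fmt), 87.00),
]
weighted = time_weighted_mean(samples)
plain = sum(v for _, v in samples) / len(samples)
print(weighted, plain)
```

The two numbers differ because the five-minute gap counts five times as much as the one-minute gap in the weighted version.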
There isn't a built-in "weighted histogram" tool, so it appears you need to resample the data to create a representative histogram. Here is one way to resample. Let's say you want 20 samples; then in C2 enter
=arrayformula(A2+(row(1:20)-1)*(A50-A2)/19)
to get 20 uniformly distributed time values. (Division by 19 because of the fence-post distinction.) Then in D2,
=arrayformula(vlookup(C2:C21, A2:B50, 2))
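will look up a value for each sample time. With the default approximate match on sorted data, vlookup returns the most recent observation at or before each sample time. Then you can build a histogram from column D.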
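The resampling step above can be sketched in Python as follows. This is only an analogue of the two Sheets formulas, under the assumption that times are plain numbers (for example, minutes since the first reading) and already sorted; `resample_hold` is a hypothetical helper name.

```python
from bisect import bisect_right

def resample_hold(times, values, n):
    """Return n evenly spaced times and, for each, the last value observed at or before it."""
    start, end = times[0], times[-1]
    step = (end - start) / (n - 1)          # n points span n-1 gaps (fence-post)
    grid = [start + i * step for i in range(n)]
    grid[-1] = end                          # guard against rounding on the last point
    held = [values[max(bisect_right(times, t) - 1, 0)] for t in grid]
    return grid, held

# Sample rows from the question, as minutes since the first reading
times = [0.0, 1.0, 6.0]
values = [88.05, 89.10, 87.00]
grid, held = resample_hold(times, values, 7)
print(list(zip(grid, held)))
```

A histogram built from `held` then reflects how long each value was in effect, not how many times it happened to be recorded.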

Related

Plotting x-axis and y-axis with different (indep) limits in Matlab

I developed an Android app in which each scan lasts 1 minute, and during this time the sensor collects many readings at random. I want to plot the sensor data of a single scan as follows:
The time of the scan is entered manually in seconds for only 1 minute (from 1:60 sec) on the x-axis, and the vector of random readings collected from the sensor (sometimes hundreds of values) goes on the y-axis.
How can I do this in Matlab?
I tried using this code, but it gives me the error "Vectors must be the same length."
This is my code:
x1 = linspace(0,60);
plot(x1,vector1,'o-r',x1,vector2,'+-k','LineWidth',lw,'MarkerSize',msz);
xlabel('Time (s)');
ylabel('sensor readings')
To match the number of values, you have to modify the input to linspace:
x1 = linspace(0,60,length(vector1));
This way you automatically get the right number of entries for your x-axis vector.
You are telling linspace to create a vector that ranges from 0 to 60 with length(vector1) entries, so that it matches the length of your data set.
Note that if your second data set has a different number of entries than your first, you will need a separate x-axis vector that matches its length.
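To make the length matching concrete, here is a small Python sketch (Python rather than MATLAB, purely as an illustration). The `linspace` helper below is a hand-written stand-in for the MATLAB function for n >= 2, and the two reading vectors are made-up values.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (requires n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

# Hypothetical readings of different lengths from two scans
vector1 = [3.1, 2.7, 4.0, 3.3, 2.9]
vector2 = [1.0, 1.4, 1.2]

# One x-axis vector per data set, each sized to its own readings
x1 = linspace(0, 60, len(vector1))
x2 = linspace(0, 60, len(vector2))
print(x1, x2)
```

Each x vector now has exactly as many entries as the readings it is plotted against, which is what the plotting call requires.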

How to predict temperature for the 4th day, given temperatures for previous days, using a linear perceptron?

I have four sets of data (3 for training, 1 for testing) that include the hour of the day and temperatures in this format:
Time | Temperature
5, 60
6, 63
7,70
8,73
9,78
10,81.5
11,85.1
12,87
13,90
I need to train and test a perceptron and then predict what the temperatures will be on the next day at the same hours.
I am trying to use Matlab to do this and I know I am supposed to normalize the data and use time-series prediction. However I can't figure out how to start.
I don't understand what the inputs and outputs are, and what activation function to use so that the output is linear, ranging from -infinity to +infinity.
I'm pretty sure you won't have to use a perceptron for this task, as you want to perform regression and not classification. (A perceptron is a binary classifier; see the Matlab documentation.)
To start with normalization: You need to adjust your data such that the mean is zero and the standard deviation equals 1. For example:
data = rand(1,100);
data = (data - mean(data))/sqrt(var(data));
You can interpret your input and output as follows:
You have an underlying function which maps your time-values to the temperature values (f:time->temperature). Time is the independent variable and temperature the dependent variable (see for example Wikipedia). And you want to find an approximation for f based on your input data.
For time series regression you will find a detailed example here. If you are required to use a feedforward network you can also take a look at this.
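As a rough illustration of the two ideas above (normalization and approximating f: time -> temperature), here is a Python sketch. It uses an ordinary least-squares line, which is a swapped-in technique and not a perceptron or a neural network; the helper names are hypothetical, and the data are the nine rows shown in the question.

```python
from statistics import mean, stdev

def zscore(xs):
    """Rescale to zero mean and unit (sample) standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

hours = [5, 6, 7, 8, 9, 10, 11, 12, 13]
temps = [60, 63, 70, 73, 78, 81.5, 85.1, 87, 90]

z = zscore(temps)                     # normalized targets
slope, intercept = fit_line(hours, temps)
predicted = [slope * h + intercept for h in hours]
print(slope, intercept)
```

A single linear unit with an identity (linear) activation trained on the same data would approximate the same kind of straight line, which is why the output can range over all real values.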

How to create a vertical scatter graph using range specific abscissas using Sigmaplot

I am currently working on a project in which I will be creating a vertical scatter plot using on average 6 points of y-axis data using Sigmaplot. The units of the graph are depth of snow in cm vs time. However the data I have collected is gathered over a range of days (i.e. 173-176) and I am having trouble applying my data sets to their respective ranged abscissa. I've noticed inputting the data in this manner finds the difference in the abscissa (i.e. 173-176 would correspond to 3) rather than interpreting the data as a range. Can anyone help me find a way in which to input abscissa not as singular values but rather ranges of those values using Sigmaplot?
Leaving the abscissa ranges intact, plot all of the data. Sigmaplot does not sort these data and instead places them in their respective chronological order. I did not find a way to make the points as wide as the abscissa range.

Finding 15th and 85th percentile in matlab

I have come up with Matlab code to plot a probability density and a cumulative graph. I have also used Matlab to compute the standard deviation and the mean.
My next task is to find the 15th and 85th percentiles of the cumulative graph. I tried to use 'prctile (prob, 15)' to calculate the 15th percentile, but it does not seem to give the same value as what I have observed from the graph.
Are there any other ways to find the 15th and 85th percentiles?
This should give you the 15% and 85% percentile values as you see in your cumulative graph:
p15 = prob(find(prob<prctile(prob,15),1));        % variable names cannot start with a digit
p85 = prob(find(prob>prctile(prob,85),1,'last'));
There are several ways to calculate a percentile (see http://en.wikipedia.org/wiki/Percentile)
The gotcha here is that Matlab and Excel don't agree (Excel uses the definition employed by the National Institute of Standards and Technology in the US, which is also the default for R). This is worth considering if you swap data and analysis between Matlab and Excel.
Use the prctile function if you have the Statistics Toolbox (type help prctile).
http://www.mathworks.com/help/stats/prctile.html
Alternatively, write it yourself! A percentile is simply the value in the sorted data closest to the rank you want. For example, if you have 1000 values, your 15th percentile will be the (15/100)*1000 = 150th value. Make sure you sort the data from smallest to largest.
There is a special way to deal with values that fall in between samples, but these depend on the definition you use. Some take the nearest, others take the average between two samples, and some others calculate how close they are to the samples and take a value that is linearly proportional to that.
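Here is a minimal Python sketch of two of the definitions described above: nearest rank, and linear interpolation between neighbouring samples. The function names are hypothetical, and the interpolating variant shown is just one common convention; it is not claimed to match what prctile or Excel compute.

```python
import math

def percentile_nearest_rank(data, p):
    """Nearest-rank percentile: the value at rank ceil(p * n / 100) in the sorted data."""
    ordered = sorted(data)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]

def percentile_linear(data, p):
    """Linear interpolation between the two samples around position p/100 * (n - 1)."""
    ordered = sorted(data)
    pos = p * (len(ordered) - 1) / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo                      # how far between the two samples
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

data = list(range(1, 1001))              # 1000 made-up values: 1, 2, ..., 1000
print(percentile_nearest_rank(data, 15), percentile_linear(data, 15))
print(percentile_nearest_rank(data, 85), percentile_linear(data, 85))
```

On the same data the two definitions give slightly different answers, which is the kind of disagreement you can see between tools.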

analyse time series at a specific frequency

I have a long data set of water temperature:
t = 1/24:1/24:365;
y = 1 + (30-1).*rand(1,length(t));
plot(t,y)
The series extends for one year and the number of measurements per day is 24 (i.e. hourly). I expect the water temperature to follow a diurnal pattern (i.e. have a period of 24 hours), therefore I would like to evaluate how the 24 hour cycle varies throughout the year. Is there a method for only looking at specific frequencies when analyzing a signal? If so, I would like to draw a plot showing how the 24 hour periodicity in the data varies through the year (showing for example if it is greater in the summer and less in the winter). How could I do this?
You could use reshape to transform your data to a 24x365 matrix. In the new matrix every column is a day and every row a time of day.
temperature=reshape(y,24,365);
time=(1:size(temperature,1))-1;
day=(1:size(temperature,2))-1;
[day,time]=meshgrid(day,time);
surf(time,day,temperature)
My first thought would be a Fourier transform. This will give you a frequency spectrum.
At high frequencies (> 1/d) you would have the pattern within a day; at low frequencies, the pattern over longer times (see low-pass and high-pass filters).
Also you could go for a frequency/time visualization that will show how the frequencies change over a year.
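One simple way to look at a single frequency over time is to evaluate one Fourier coefficient per day and track its magnitude. The Python sketch below does this for the one-cycle-per-day component. It is a sketch under assumptions: the helper name is hypothetical, the data must be hourly with complete days, and the two-day test signal is synthetic.

```python
import cmath
import math

def daily_cycle_amplitude(hourly, samples_per_day=24):
    """Amplitude of the one-cycle-per-day component, computed separately for each full day."""
    amplitudes = []
    for start in range(0, len(hourly) - samples_per_day + 1, samples_per_day):
        day = hourly[start:start + samples_per_day]
        # Single DFT bin (k = 1): exactly one cycle across the day's samples
        coeff = sum(v * cmath.exp(-2j * math.pi * n / samples_per_day)
                    for n, v in enumerate(day))
        amplitudes.append(2 * abs(coeff) / samples_per_day)
    return amplitudes

# Synthetic check: a 15-degree baseline with a diurnal swing of 2, then 5 degrees
signal = [15 + amp * math.sin(2 * math.pi * n / 24)
          for amp in (2.0, 5.0) for n in range(24)]
amps = daily_cycle_amplitude(signal)
print(amps)
```

Plotting the returned list against day number would show whether the 24-hour cycle is stronger in some parts of the year than in others.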
A bit more work - but you could write a simple model and create a Kalman filter for it.