Daily maximum 8-Hour running mean/moving average - matlab

I have hourly data and I want to find the daily maximum 8-hour average. Basically, for each hour of the day, I want to compute an 8-hour average: take the average of 0:00 to 8:00, then 1:00 to 9:00, etc., so I end up with 24 8-hour averaging periods (with some running into the next day, of course). Then I need to take the maximum of those 24 8-hour averages to get the daily max.
The .mat file used can be found here: https://www.dropbox.com/sh/9e2dgm0imvr0hpe/tAUOtpZEEa
A note about the format of the file: The O3.mat file has a variable called O3_Sorted that is a cell array. It contains all data, sorted already. But the data contains information from more than one site (i.e. there is information from different places). The information for each site is sorted together, but in the code, when I try to find the 8-hour averages, I have to pull out one site at a time so that the averages don't run into the beginning of the data for another place.
Here's a sample of what things look like. I included one day for one site and half a day of another site. The actual file has a month of data for each of these sites and other sites as well. As you can see, sometimes, the data is missing.
Column 1 - Site name
Column 2 - Date
Column 3 - Hour
Column 4 - Data
003-0010 2007-05-31 00:00 0.016
003-0010 2007-05-31 01:00 0.015
003-0010 2007-05-31 02:00 0.002
003-0010 2007-05-31 03:00 0.03
003-0010 2007-05-31 04:00 0.019
003-0010 2007-05-31 05:00 0.013
003-0010 2007-05-31 06:00 0.018
003-0010 2007-05-31 07:00 0.024
003-0010 2007-05-31 08:00 0.031
003-0010 2007-05-31 09:00 0.029
003-0010 2007-05-31 10:00 0.031
003-0010 2007-05-31 11:00 0.035
003-0010 2007-05-31 12:00 0.026
003-0010 2007-05-31 13:00 0.026
003-0010 2007-05-31 14:00 0.033
003-0010 2007-05-31 15:00 0.039
003-0010 2007-05-31 16:00 0.036
003-0010 2007-05-31 17:00 0.035
003-0010 2007-05-31 18:00 0.031
003-0010 2007-05-31 19:00 0.03
003-0010 2007-05-31 20:00 0.03
003-0010 2007-05-31 21:00 0.017
003-0010 2007-05-31 22:00 0.017
003-0010 2007-05-31 23:00 0.007
027-0007 2007-05-31 00:00 0.045
027-0007 2007-05-31 01:00 0.043
027-0007 2007-05-31 02:00
027-0007 2007-05-31 03:00 0.038
027-0007 2007-05-31 04:00 0.037
027-0007 2007-05-31 05:00 0.034
027-0007 2007-05-31 06:00 0.034
027-0007 2007-05-31 07:00 0.038
027-0007 2007-05-31 08:00 0.044
027-0007 2007-05-31 09:00 0.05
027-0007 2007-05-31 10:00 0.054
027-0007 2007-05-31 11:00 0.051
027-0007 2007-05-31 12:00 0.047
Here is what I have so far:
for i = 1:numel(O3_sites) % loop over every site
I = ismember(D(:,6), O3_sites(i)); % Rows where the cell array O3_sorted has data corresponding to a certain site
site = D(I,:);
%% Convert O3 from ppm to ppb, 1ppm = 1000ppb
x = 1000;
y = str2double(O3);
O3_data = bsxfun(@times,x,y); % ppb
% Find size of array
[M, N]= size(O3_data);
% Create empty array
O3_MD8 = zeros(N,M-7); % double
% Do a loop to calculate the running mean
for j = 1:M-7
A = O3_data(j:j+7);
O3_MD8(:,j) = mean(A);
end
% Find max from each 8-hour loop
end
After I get the 8-hour averages, how can I ask MATLAB to find the max of each set of 24 averages? Basically, I want the maximum of the 8-hour averages for each day.
Also, the method I'm using now is a bit risky: I'm not using datenum, so if a day of data is missing, I won't know. But I have no idea how to account for that when writing the code.

You could just use the filter function, assuming your data is already in a suitable format (a 1-D vector):
hours = 8; % size of hour window defining the moving average
movAV = filter(ones(1,hours)/hours,1,O3_data);
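One detail worth checking before using this result: filter produces a trailing average, so each output value is the mean of the current sample and the seven before it, and the first seven outputs are computed as if the earlier samples were zero. The sketch below illustrates this on a small synthetic vector (the values are made up for illustration, not taken from O3.mat).

```matlab
% Small synthetic series (hypothetical values, not from O3.mat)
x = (1:10)';
hours = 8;                                  % window length, as in the answer
movAV = filter(ones(1,hours)/hours, 1, x);

% movAV(k) averages x(k-7:k), i.e. the window ENDS at sample k.
% For k < 8 the missing earlier samples are treated as zeros,
% so the first hours-1 outputs are not full 8-hour means.
fullWindow = movAV(hours);   % mean(x(1:8)) = 4.5
startup    = movAV(1);       % x(1)/8 = 0.125

% Optionally blank out the startup values so they cannot win a later max
movAV(1:hours-1) = NaN;
```

If the windows should instead start at each hour, as described in the question, the same numbers apply with a shift of seven samples.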
For the daily maximum you need to split movAV into 24-hour blocks. Assuming you have exactly one value per hour, you can simply reshape your result into a 24 x N array:
%example
x = 1:240; % data for 10 days
y = reshape(x,24,[])
then use the additional parameters of the max function to search the max columnwise:
% in this case the max is always the last value of every day
dailyMax = max(y,[],1)
dailyMax =
24 48 72 96 120 144 168 192 216 240
and correspondingly, for the moving average:
dailyMax = max(reshape(movAV,24,[]),[],1)
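As a minimal sketch of the whole calculation with windows that start at each hour (0:00 to 8:00, 1:00 to 9:00, and so on, as the question describes), the loop below can be used for a single site. It assumes a gap-free hourly column vector whose length is a multiple of 24; the data and the names o3, win and avg8 are hypothetical.

```matlab
% Synthetic hourly series for one site: 3 days, values 1..72 (hypothetical data)
o3  = (1:72)';
win = 8;

% Forward-looking 8-hour mean: entry k averages hours k..k+7.
% Windows that would run past the end of the record are left as NaN.
nHours = numel(o3);
avg8 = nan(nHours, 1);
for k = 1:nHours - win + 1
    avg8(k) = mean(o3(k:k+win-1));
end

% One column per day (24 window start hours each), then column-wise max.
% max skips NaN by default, so incomplete windows on the last day drop out.
dailyMax = max(reshape(avg8, 24, []), [], 1);
```

Because mean returns NaN when any hour in the window is NaN, a window containing a missing measurement is excluded from the daily maximum in this sketch; whether that is the desired treatment depends on the data-completeness rule you want to apply.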
Alternatively, findpeaks (Signal Processing Toolbox required) returns all local maxima directly, which may be the most convenient option in your case.
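On the concern about missing hours or days raised in the question: the reshape approach only works if every hour is present. One possible way to guarantee that, sketched here under the assumption that the date and hour columns are available as strings in the format shown in the sample, is to build a complete hourly time axis and fill unobserved hours with NaN. All variable names and values below are hypothetical.

```matlab
% Hypothetical sample in the layout of columns 2-4, with the 02:00 row absent
dates = {'2007-05-31'; '2007-05-31'; '2007-05-31'; '2007-05-31'};
hrs   = {'00:00'; '01:00'; '03:00'; '04:00'};
vals  = [0.045; 0.043; 0.038; 0.037];

% Combine date and hour strings into datetime values
t = datetime(strcat(dates, {' '}, hrs), 'InputFormat', 'yyyy-MM-dd HH:mm');

% Gap-free hourly axis from the first to the last observation
tFull = (t(1):duration(1,0,0):t(end))';

% Place each observation at its slot; hours without data stay NaN
filled = nan(size(tFull));
[found, loc] = ismember(t, tFull);
filled(loc(found)) = vals(found);
```

After this step, filled has one entry per hour, so 24-hour blocks line up with calendar days as long as the axis starts at 00:00.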

Related

MATLAB: How to calculate total precipitation per day using hourly data ? (netcdf)

I have hourly data from ECMWF ERA5 for each day in a specific year. I want to convert that data from hourly to daily. Copernicus has Python code for this here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation.
I want to know what the MATLAB code to do this would be. I uploaded the netCDF file to my Google Drive here:
https://drive.google.com/open?id=1qm5AGj5zRC3ifD1_V-ne2nDT1ch_Khik
time steps of each day are:
0:00
1:00
2:00
3:00
4:00
5:00
6:00
7:00
8:00
9:00
10:00
11:00
12:00
13:00
14:00
15:00
16:00
17:00
18:00
19:00
20:00
21:00
22:00
23:00
Note that to cover total precipitation for 1st January 2017, for example, we need two days of data:
1st January 2017 time = 01 - 23 will give you total precipitation data to cover 00 - 23 UTC for 1st January 2017
2nd January 2017 time = 00 will give you total precipitation data to cover 23 - 24 UTC for 1st January 2017
Here is the output of ncdisp():
>> ncdisp(filename)
Source:
C:\Users\Behzad\Desktop\download.nc
Format:
64bit
Global Attributes:
Conventions = 'CF-1.6'
history = '2019-11-01 07:36:15 GMT by grib_to_netcdf-2.14.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data6/adaptor.mars.internal-1572593007.3569295-19224-27-449cad76-bcd6-4cfa-9767-8a3c1219c0bb.nc /cache/tmp/449cad76-bcd6-4cfa-9767-8a3c1219c0bb-adaptor.mars.internal-1572593007.35751-19224-4-tmp.grib'
Dimensions:
longitude = 49
latitude = 41
time = 8760
Variables:
longitude
Size: 49x1
Dimensions: longitude
Datatype: single
Attributes:
units = 'degrees_east'
long_name = 'longitude'
latitude
Size: 41x1
Dimensions: latitude
Datatype: single
Attributes:
units = 'degrees_north'
long_name = 'latitude'
time
Size: 8760x1
Dimensions: time
Datatype: int32
Attributes:
units = 'hours since 1900-01-01 00:00:00.0'
long_name = 'time'
calendar = 'gregorian'
tp
Size: 49x41x8760
Dimensions: longitude,latitude,time
Datatype: int16
Attributes:
scale_factor = 3.0792e-07
add_offset = 0.010089
_FillValue = -32767
missing_value = -32767
units = 'm'
long_name = 'Total precipitation'
tp is my variable, which has 3 dimensions (lon*lat*time) = 49*41*8760.
I want it as 49*41*365 for a non-leap year.
The result should be the daily values for the whole year.
While some vectorized versions may exist that reshape your vector into 4 dimensions, a simple for loop will do the job.
tp_daily=zeros(size(tp,1),size(tp,2),365);
for ii=0:364
day=tp(:,:,ii*24+1:(ii+1)*24); %grab an entire day
tp_daily(:,:,ii+1)=sum(day,3); % sum over the third dimension (the 24 hours)
end
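The loop above sums time steps 00 to 23 of each day. The question notes that the daily total should instead use time steps 01 to 23 of the day plus time step 00 of the following day. A minimal sketch of that shifted grouping follows, assuming the first time step in the file is 00:00 of the first day; the small all-ones array is a hypothetical stand-in for tp.

```matlab
% Hypothetical stand-in for tp: 2 x 2 grid, 3 days of hourly values, all ones
tp = ones(2, 2, 72);
nDays = size(tp, 3) / 24;

tp_daily = nan(size(tp, 1), size(tp, 2), nDays);
for d = 1:nDays
    first = (d - 1) * 24 + 2;   % index of 01:00 on day d
    last  = d * 24 + 1;         % index of 00:00 on day d+1
    if last <= size(tp, 3)
        tp_daily(:, :, d) = sum(tp(:, :, first:last), 3);
    end
    % otherwise the slice stays NaN: the closing hour is not in the file
end
```

With this grouping the last day of the year cannot be completed from a single-year file, because its final hour is stored at 00:00 of 1st January of the next year; the sketch leaves that day as NaN.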

R: Why does st_join give invalid times error?

I am trying to join 2 SpatialPointsDataFrames by nearest neighbour analysis using sf::st_join(). Both files have been converted using st_as_sf() but when I try the join I get the error
Error in rep(seq_len(nrow(x)), lengths(i)) : invalid 'times' argument
At this point I have tried swapping the x and y arguments and adjusting countless variations of the arguments, but nothing seems to work. I have checked the help file for sf::st_join() but don't see anything about a times argument, so I am unsure where this error comes from and why it keeps being thrown.
Below is a sample of my data set, which produces the same error with the code further down:
> head(sf.eSPDF[[1]])
Simple feature collection with 6 features and 8 fields
geometry type: POINT
dimension: XY
bbox: xmin: 35.9699 ymin: -3.74514 xmax: 35.97065 ymax: -3.74474
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
# A tibble: 6 x 9
TIME ELEVATION LATITUDE LONGITUDE DATE V1 V2 Survey geometry
<dttm> <chr> <dbl> <dbl> <date> <dttm> <dttm> <dbl> <POINT [°]>
1 2012-01-20 07:26:05 1018 m -3.74 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.97047 -3.74474)
2 2012-01-20 07:27:35 1018 m -3.74 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.97057 -3.74486)
3 2012-01-20 07:27:39 1019 m -3.74 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.9706 -3.74489)
4 2012-01-20 07:27:47 1020 m -3.74 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.97065 -3.74489)
5 2012-01-20 07:28:05 1020 m -3.74 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.97035 -3.74498)
6 2012-01-20 07:28:26 1019 m -3.75 36.0 2012-01-20 2012-01-20 00:00:00 2012-01-31 00:00:00 1 (35.9699 -3.74514)
> head(sf.plt.centr)
Simple feature collection with 6 features and 1 field
geometry type: POINT
dimension: XY
bbox: xmin: 35.75955 ymin: -3.91594 xmax: 36.0933 ymax: -3.401
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
PairID geometry
1 1 POINT (36.0933 -3.6731)
42 92 POINT (36.02593 -3.91594)
83 215 POINT (36.06496 -3.75837)
124 225 POINT (35.83156 -3.401)
165 251 POINT (35.75955 -3.54388)
206 2 POINT (36.08752 -3.69128)
Below is the code that I am using to check for working solutions
sf.eSPDF<-lapply(eSPDF, function(x){
st_as_sf(as(x, "SpatialPointsDataFrame"))
})
sf.plt.centr<-st_as_sf(as(plt.centr, "SpatialPointsDataFrame"))
x1<-head(sf.eSPDF[[1]])
x2<-head(sf.plt.centr)
check<-st_join(x1, x2, join=st_nn, maxdist = Inf, returnDist = T, progress = TRUE)
As you can see, the object I want to join to is an element of a list. All objects within that list have the same structure as the example given. Eventually I want code that joins sf.plt.centr to each of the objects within the list. Something like...
big.join<-lapply(sf.eSPDF, function(x){
st_join('['(x), sf.plt.centr, st_nn, maxdist = Inf, returnDist = T, progress = TRUE)
})

Plot histogram where x axis of the plot is a date

I know how to make a plot, but the data would be better represented as a histogram. Is there any way I can easily convert this to a histogram?
figure();
plot(two_weeks,xAxis);
This is a datetime data type
disp(two_weeks)
21-Nov-2018 00:00:00 22-Nov-2018 00:00:00 23-Nov-2018 00:00:00 24-Nov-2018 00:00:00 25-Nov-2018 00:00:00 26-Nov-2018 00:00:00 27-Nov-2018 00:00:00 28-Nov-2018 00:00:00
Columns 9 through 14
29-Nov-2018 00:00:00 30-Nov-2018 00:00:00 01-Dec-2018 00:00:00 02-Dec-2018 00:00:00 03-Dec-2018 00:00:00 04-Dec-2018 00:00:00
disp(xAxis) =
5
12
1
7
13
24
2
27
62
0
3
17
74
4
Again, I want something that looks like this plot, except as a histogram. I've looked through the histogram documentation and the MATLAB help forum, but nothing answers my question or helps me make the histogram in the desired way.
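Since xAxis already holds one count per date, one possible approach is a bar chart rather than the histogram function, which bins raw observations itself. The sketch below rebuilds the two variables from the values printed above and assumes a MATLAB release in which bar accepts datetime x values.

```matlab
% Rebuild the data shown in the question: 14 consecutive days and one count per day
two_weeks = datetime(2018, 11, 21) + caldays(0:13);
xAxis = [5 12 1 7 13 24 2 27 62 0 3 17 74 4]';

figure();
bar(two_weeks, xAxis);   % one bar per date, bar height = count
ylabel('Count');
```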

reading only the daily julian values of a data given for every 3 hr

The data I am using is available at a 3-hour time step. The Julian date therefore increases by 3/24 = 0.125 for each row of data. I am interested only in the daily time step, and I would like help reading only the daily Julian values, which are recorded every 8 Excel rows, using MATLAB.
Example of my data:
0.125
0.25
0.375
0.5
0.625
0.75
0.875
1
1.125
1.25
1.375
1.5
1.625
1.75
1.875
2
2.125
2.25
2.375
2.5
2.625
2.75
2.875
3
3.125
3.25
3.375
3.5
3.625
3.75
3.875
4
.
.
[continues until 360 and starts back from 0.125]
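A minimal sketch of two ways to pick out the daily rows, assuming the Julian column has already been read into a numeric vector (the vector jd below is a hypothetical stand-in covering the first four days):

```matlab
% Hypothetical stand-in for the Julian date column: 0.125, 0.25, ..., 4
jd = (0.125:0.125:4)';

% Option 1: fixed stride, every 8th row (relies on no rows being missing)
everyEighth = jd(8:8:end);

% Option 2: keep rows whose Julian value is a whole number
isDaily = abs(jd - round(jd)) < 1e-9;
dailyJd = jd(isDaily);
```

The logical index isDaily can also be applied to the other columns of the same table to extract the matching data rows. Option 2 does not depend on the row count, so it keeps working if some 3-hourly rows are absent.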

Subset a time-series using matlab

I have a one minute interval time series from which I want to subset 3 columns of data.
The time format is dd/mm/yy hh:mm:ss
I want to extract the samples at 20-minute steps (19:00, 19:20, 19:40, 20:00) for every day in the series.
I already created a time series using
ts = timeseries(data, time)
samples=getdatasamples(ts, i)
But I am having trouble defining the logical vector i that can do such extraction
Please try this code:
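pat_19='19:[024]0:00'; % matches 19:00:00, 19:20:00 and 19:40:00
pat_20='20:00:00';
% logical masks: true where the time string in column 1 matches the pattern
out_19=~cellfun('isempty',regexpi(a(:,1),pat_19,'match'));
out_20=~cellfun('isempty',regexpi(a(:,1),pat_20,'match'));
out=a(out_19 | out_20,:); % keep rows matching either pattern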
Here, I assumed that the seconds value is always '00'.
Please see the example below:
"a" is a cell array with the date string in the first column and the time-series values in the second.
a =
'15/08/81 19:00:00' 0.01
'15/08/81 19:10:00' 0.02
'15/08/81 19:20:00' 0.03
'15/08/81 19:30:00' 0.04
'15/08/81 19:40:00' 0.06
'15/08/81 19:50:00' 0.07
'15/08/81 20:00:00' 0.01
'15/08/81 20:10:00' 0.02
'15/08/81 20:20:00' 0.03
'15/08/81 20:30:00' 0.03
After executing the above code, the output is stored in the cell array "out":
out =
'15/08/81 19:00:00' 0.01
'15/08/81 19:20:00' 0.03
'15/08/81 19:40:00' 0.06
'15/08/81 20:00:00' 0.01
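As an alternative to pattern matching on strings, the same selection can be sketched with datetime values, which gives the kind of logical vector the question asks for. This assumes the time strings follow the dd/mm/yy hh:mm:ss format stated in the question; the names minutesOfDay, wanted and keep are hypothetical.

```matlab
% Example cell array from above
a = {'15/08/81 19:00:00', 0.01; '15/08/81 19:10:00', 0.02;
     '15/08/81 19:20:00', 0.03; '15/08/81 19:30:00', 0.04;
     '15/08/81 19:40:00', 0.06; '15/08/81 19:50:00', 0.07;
     '15/08/81 20:00:00', 0.01; '15/08/81 20:10:00', 0.02;
     '15/08/81 20:20:00', 0.03; '15/08/81 20:30:00', 0.03};

% Parse the time strings
t = datetime(a(:,1), 'InputFormat', 'dd/MM/yy HH:mm:ss');

% Minutes since midnight for each sample, and the wanted clock times
minutesOfDay = hour(t) * 60 + minute(t);
wanted = [19*60, 19*60 + 20, 19*60 + 40, 20*60];

% Logical vector: true for samples at exactly 19:00, 19:20, 19:40 or 20:00
keep = ismember(minutesOfDay, wanted) & second(t) == 0;
out = a(keep, :);
```

Because the test uses only the time of day, it selects the same clock times on every date in the series.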