I have hourly data from ECMWF ERA5 for each day of a specific year, and I want to convert it from hourly to daily. Copernicus provides Python code for this here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation.
What is the MATLAB code to do this? I have uploaded the NetCDF file to my Google Drive here:
https://drive.google.com/open?id=1qm5AGj5zRC3ifD1_V-ne2nDT1ch_Khik
The time steps of each day are:
0:00, 1:00, 2:00, 3:00, 4:00, 5:00, 6:00, 7:00, 8:00, 9:00, 10:00, 11:00, 12:00, 13:00, 14:00, 15:00, 16:00, 17:00, 18:00, 19:00, 20:00, 21:00, 22:00, 23:00
Note that to cover the total precipitation for 1st January 2017, for example, we need two days of data:
1st January 2017 time = 01 - 23 will give you total precipitation data to cover 00 - 23 UTC for 1st January 2017
2nd January 2017 time = 00 will give you total precipitation data to cover 23 - 24 UTC for 1st January 2017
Here is the output of ncdisp():
>> ncdisp(filename)
Source:
C:\Users\Behzad\Desktop\download.nc
Format:
64bit
Global Attributes:
Conventions = 'CF-1.6'
history = '2019-11-01 07:36:15 GMT by grib_to_netcdf-2.14.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data6/adaptor.mars.internal-1572593007.3569295-19224-27-449cad76-bcd6-4cfa-9767-8a3c1219c0bb.nc /cache/tmp/449cad76-bcd6-4cfa-9767-8a3c1219c0bb-adaptor.mars.internal-1572593007.35751-19224-4-tmp.grib'
Dimensions:
longitude = 49
latitude = 41
time = 8760
Variables:
longitude
Size: 49x1
Dimensions: longitude
Datatype: single
Attributes:
units = 'degrees_east'
long_name = 'longitude'
latitude
Size: 41x1
Dimensions: latitude
Datatype: single
Attributes:
units = 'degrees_north'
long_name = 'latitude'
time
Size: 8760x1
Dimensions: time
Datatype: int32
Attributes:
units = 'hours since 1900-01-01 00:00:00.0'
long_name = 'time'
calendar = 'gregorian'
tp
Size: 49x41x8760
Dimensions: longitude,latitude,time
Datatype: int16
Attributes:
scale_factor = 3.0792e-07
add_offset = 0.010089
_FillValue = -32767
missing_value = -32767
units = 'm'
long_name = 'Total precipitation'
tp is my variable, which has 3 dimensions (lon*lat*time) = 49*41*8760.
I want it as 49*41*365 for a non-leap year.
The result should be the daily values for the whole year.
While some vectorized versions may exist that reshape your vector into 4 dimensions, a simple for loop will do the job.
tp_daily = zeros(size(tp,1), size(tp,2), 365);
for ii = 0:364
    day = tp(:,:,ii*24+1:(ii+1)*24);  % grab the 24 hourly steps of one day
    tp_daily(:,:,ii+1) = sum(day,3);  % sum over the third (time) dimension
end
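The loop above sums the 00:00 to 23:00 steps of each day. The question notes that ERA5 accumulations are stamped at the end of each hour, so a day's total would instead run from 01:00 of that day to 00:00 of the next. The following is a minimal sketch of both variants using reshape. It uses a small synthetic array in place of the real tp, and the variable names (tp_shift, tp_daily_shift) are illustrative only. Because 00:00 of the following year is not in the file, the last day of the shifted version is padded with NaN here rather than guessed.

```matlab
% Synthetic stand-in for tp (lon x lat x time): two days of hourly data
tp = rand(3, 2, 48);
[nLon, nLat, nT] = size(tp);
nDays = nT / 24;

% (a) Vectorized equivalent of the loop: split time into hour-of-day x day
tp_daily = squeeze(sum(reshape(tp, nLon, nLat, 24, nDays), 3));

% (b) Shifted by one step: 01:00 of day d through 00:00 of day d+1
tp_shift = tp(:, :, 2:end);      % drop the very first 00:00 step
tp_shift(:, :, end+1) = NaN;     % placeholder for the missing 00:00 of the next year
tp_daily_shift = squeeze(sum(reshape(tp_shift, nLon, nLat, 24, nDays), 3));
```

With this padding the final day of tp_daily_shift is NaN. Appending the first hour of the following year's file, if it is available, would give a complete last day.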
Related
I have a line chart in Power BI that shows the price of an index every hour. How can I show the daily average of prices in the same chart?
I have created a measure that calculates it, but when I plot it in the hourly chart, the average is no longer daily but hourly.
Here is an example: for simplicity, let us say that days have 3 hours, what I want to compute in PowerBi is the last column:
day        hour  price  daily_average
1/1/2023   1     100    150
1/1/2023   2     150    150
1/1/2023   3     200    150
1/2/2023   1     50     60
1/2/2023   2     60     60
1/2/2023   3     70     60
I would like to plot a graph with both "price" and "daily_average".
You need to create a measure that removes hour from the filter context, using ALL(Sample1[hour]):
DailyAVG = CALCULATE( AVERAGE(Sample1[price]), ALL(Sample1[hour]) )
I know how to make a plot, but the data would be better represented as a histogram. Is there any way I can easily convert this to a histogram?
figure();
plot(two_weeks,xAxis);
This is a datetime data type
disp(two_weeks)
21-Nov-2018 00:00:00 22-Nov-2018 00:00:00 23-Nov-2018 00:00:00 24-Nov-2018 00:00:00 25-Nov-2018 00:00:00 26-Nov-2018 00:00:00 27-Nov-2018 00:00:00 28-Nov-2018 00:00:00
Columns 9 through 14
29-Nov-2018 00:00:00 30-Nov-2018 00:00:00 01-Dec-2018 00:00:00 02-Dec-2018 00:00:00 03-Dec-2018 00:00:00 04-Dec-2018 00:00:00
disp(xAxis) =
5
12
1
7
13
24
2
27
62
0
3
17
74
4
Again, I want something that looks like this plot, except that it would be a histogram. I've tried looking through the histogram documentation and the MATLAB help forum, but nothing answers my question or helps me make the histogram in the desired way.
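One possible approach, sketched under the assumption that xAxis already holds one count per day in two_weeks: the data are then already binned, so a bar chart gives the histogram-like look directly. The values are copied from the output shown above, and bar accepts datetime x-values in recent MATLAB releases.

```matlab
% Rebuild the data shown in the question
two_weeks = datetime(2018, 11, 21) + caldays(0:13);    % 21-Nov-2018 .. 04-Dec-2018
xAxis = [5 12 1 7 13 24 2 27 62 0 3 17 74 4];          % one count per day

figure();
h = bar(two_weeks, xAxis);   % one bar per day, height = count
ylabel('Count');
```

If the raw, unbinned timestamps were available instead, histogram would bin them itself; that is not the situation shown in the question.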
Say I have the following data, S =
Year Week Postcode
2009 24 2035
2009 24 4114
2009 24 4127
2009 26 4114
2009 26 4556
2009 27 7054
2009 27 6061
2009 27 4114
2009 27 2092
2009 27 2315
2009 27 7054
2009 27 4217
2009 27 4551
2009 27 2035
2010 1 4132
2010 1 2155
2010 5 4114 ... (>60000 rows)
In Matlab, I would like to create a matrix with:
column 1: year (2006-2014)
column 2: week (1-52 for each year)
then the next n columns are unique postcodes where the data in each of these columns counts the occurrences from my data, S.
For example:
year week 2035 4114 4127 4556 7054
2009 24 1 1 1 0 0
2009 25 0 0 0 0 0
2009 26 0 1 0 1 0
2009 27 1 1 0 0 2
2009 28 0 0 0 0 0
Thanks if you can help!
Here is a working script which achieves this tabulation. The output is in the data table. You should:
Read the documentation on unique, tables, logical indexing and sortrows, as these are the key tools I used below.
Adapt the script to work with your data. This may involve changing matrices to cell arrays to deal with string inputs etc.
Possibly adapt this to be a function, for cleaner use if this is used regularly / on different data.
Code, fully commented for explanation:
% Use rng for repeatability in rand, n = num data entries
rng('default')
n = 100;
% Set up test data. You would use 3 equal length vectors of real data here
years = floor(rand(n,1)*9 + 2006); % random integer between 2006,2014
weeks = floor(rand(n,1)*52 + 1); % random integer between 1, 52
postcodes = floor(rand(n,1)*10)*7 + 4000; % arbitrary integers over 4000
% Create year/week values like 2017.13, get unique indices
[~, idx, ~] = unique(years + weeks/100);
% Set up table with year/week data
data = table();
data.Year = years(idx);
data.Week = weeks(idx);
% Get columns
uniquepostcodes = unique(postcodes);
% Cycle over unique columns, assign data
for ii = 1:numel(uniquepostcodes)
% Variable names cannot start with a numeric value, make start with 'p'
postcode = ['p', num2str(uniquepostcodes(ii))];
% Create data column variable for each unique postcode
data.(postcode) = zeros(size(data.Year,1),1);
% Count occurrences of postcode in each date row
% This uses logical indexing of original data, looking for all rows
% which satisfy year and week of current row, and postcode of column.
for jj = 1:numel(data.Year)
data.(postcode)(jj) = sum(years == data.Year(jj) & ...
weeks == data.Week(jj) & ...
postcodes == uniquepostcodes(ii));
end
end
% Sort week/year data so all is chronological
data = sortrows(data, [1,2]);
% To check all original data was counted, you could run
% sum(sum(table2array(data(:,3:end))))
% ans = n, means that all data points were counted somewhere
On my PC, this takes less than 2.4 seconds for n = 60,000. There are almost definitely optimisations which can be made, but for something which may be used infrequently, this seems acceptable.
There is a linear increase in processing time, relative to the number of unique postcodes. This is because of the loop structure. So if you double the unique postcodes (20 rather than my example of 10) the time is nearer 4.8 seconds - twice as long.
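As one example of such an optimisation, the nested loops can be replaced with a single accumarray call. This is a minimal sketch on a small hypothetical sample, not a drop-in replacement for the table-based script: it returns a plain numeric matrix, and like the script above it only produces rows for year/week pairs that occur in the data.

```matlab
% Hypothetical small sample in the same layout as S
years     = [2009; 2009; 2009; 2009; 2009; 2009];
weeks     = [24;   24;   26;   27;   27;   27];
postcodes = [2035; 4114; 4114; 7054; 7054; 2035];

% Map each entry to a row (year/week pair) and a column (postcode)
[yw, ~, rowIdx] = unique([years, weeks], 'rows');  % sorted unique year/week pairs
[pc, ~, colIdx] = unique(postcodes);               % sorted unique postcodes

% Count occurrences of each (row, column) combination
counts = accumarray([rowIdx, colIdx], 1, [size(yw, 1), numel(pc)]);
result = [yw, counts];   % columns: year, week, then one column per postcode in pc
```

Here pc holds the postcode that belongs to each count column, so it can be used to build table variable names as in the script above.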
If this solves your problem, consider accepting this as the answer.
I have a vector of randomly distributed values from 0 to 10 in increasing order, e.g. [1 3 4 9 10]. How can I convert this vector to a datetime object with time values between, e.g., November and December, such that these numbers represent the corresponding times in between?
Example, if x = [1 2 3] and I want the time period the whole January, then the output should be [1st January, 15th January, 30th January], according to their relative values.
Example, if x = [0 0.5 9 10] and we have entire January then 0 should map to the first day in January and 10 to the last day in January. 0.5 will map to the date at part 0.5/10 = 1/20 starting from the first January to the last. That date will be approximately 30 * 1 / 20 = 1 day and a half into January. Now, the 9 will in the same way be in position 9 / 10 of 30 days. That is 30 * 9 / 10 = 27. That is the 27th day of January. So the output should be [1st January, 1.5th January, 27th January, 30th January] in datetime format.
You can use datenum and some basic arithmetic to arrive at the following solution:
formatIn = 'dd.mm.yyyy';
d1 = '01.01.2017'; % user input, should be the earlier date
d2 = '31.01.2017'; % user input, should be the later date
x = [0 0.5 5 7 10]; % user input
d1 = datenum(d1,formatIn);
d2 = datenum(d2,formatIn);
daysAfter_d1 = d2-d1;
x = x/max(x);
addDays = round(daysAfter_d1*x);
interpolatedDates = d1 + addDays;
datestr(interpolatedDates,formatIn)
ans =
01.01.2017
03.01.2017
16.01.2017
22.01.2017
31.01.2017
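Since the question asks for a datetime result, the same arithmetic can also be sketched with datetime and duration values instead of datenum. This is a minimal sketch assuming the same normalisation by max(x) as above; it does not round to whole days, so fractional positions keep their time of day.

```matlab
x  = [0 0.5 5 7 10];            % user input
d1 = datetime(2017, 1, 1);      % earlier date
d2 = datetime(2017, 1, 31);     % later date

frac = x / max(x);                          % relative position in [0, 1]
interpolatedDates = d1 + frac .* (d2 - d1); % datetime array between d1 and d2
```

Wrapping the result in dateshift(interpolatedDates, 'start', 'day') would truncate each value to midnight if whole days are wanted.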
I have hourly data and I want to find the daily max 8-hour average. Basically, for each hour of the day, I want to compute an 8-hour average: take the average of 0:00 to 8:00, then 1:00 to 9:00, etc., so I end up with 24 8-hour average periods (with some running into the next day, of course). Then I need to take the maximum of those 24 8-hour averages to get the daily max.
The .mat file used can be found here: https://www.dropbox.com/sh/9e2dgm0imvr0hpe/tAUOtpZEEa
A note about the format of the file: The O3.mat file has a variable called O3_Sorted that is a cell array. It contains all data, sorted already. But the data contains information from more than one site (i.e. there is information from different places). The information for each site is sorted together, but in the code, when I try to find the 8-hour averages, I have to pull out one site at a time so that the averages don't run into the beginning of the data for another place.
Here's a sample of what things look like. I included one day for one site and half a day of another site. The actual file has a month of data for each of these sites and other sites as well. As you can see, sometimes, the data is missing.
Column 1 - Site name
Column 2 - Date
Column 3 - Hour
Column 4 - Data
003-0010 2007-05-31 00:00 0.016
003-0010 2007-05-31 01:00 0.015
003-0010 2007-05-31 02:00 0.002
003-0010 2007-05-31 03:00 0.03
003-0010 2007-05-31 04:00 0.019
003-0010 2007-05-31 05:00 0.013
003-0010 2007-05-31 06:00 0.018
003-0010 2007-05-31 07:00 0.024
003-0010 2007-05-31 08:00 0.031
003-0010 2007-05-31 09:00 0.029
003-0010 2007-05-31 10:00 0.031
003-0010 2007-05-31 11:00 0.035
003-0010 2007-05-31 12:00 0.026
003-0010 2007-05-31 13:00 0.026
003-0010 2007-05-31 14:00 0.033
003-0010 2007-05-31 15:00 0.039
003-0010 2007-05-31 16:00 0.036
003-0010 2007-05-31 17:00 0.035
003-0010 2007-05-31 18:00 0.031
003-0010 2007-05-31 19:00 0.03
003-0010 2007-05-31 20:00 0.03
003-0010 2007-05-31 21:00 0.017
003-0010 2007-05-31 22:00 0.017
003-0010 2007-05-31 23:00 0.007
027-0007 2007-05-31 00:00 0.045
027-0007 2007-05-31 01:00 0.043
027-0007 2007-05-31 02:00
027-0007 2007-05-31 03:00 0.038
027-0007 2007-05-31 04:00 0.037
027-0007 2007-05-31 05:00 0.034
027-0007 2007-05-31 06:00 0.034
027-0007 2007-05-31 07:00 0.038
027-0007 2007-05-31 08:00 0.044
027-0007 2007-05-31 09:00 0.05
027-0007 2007-05-31 10:00 0.054
027-0007 2007-05-31 11:00 0.051
027-0007 2007-05-31 12:00 0.047
Here is what I have so far:
for i = 1:numel(O3_sites)
I = ismember(D(:,6), O3_sites(i)); % Rows where the cell array O3_sorted has data corresponding to a certain site
site = D(I,:);
%% Convert O3 from ppm to ppb, 1ppm = 1000ppb
x = 1000;
y = str2double(O3);
O3_data = bsxfun(@times,x,y); % ppb
% Find size of array
[M, N]= size(O3_data);
% Create empty array
O3_MD8 = zeros(N,M-7); % double
% Do a loop to calculate the running mean
for j = 1:M-7
A = O3_data(j:j+7);
O3_MD8(:,j) = mean(A);
end
% Find max from each 8-hour loop
end
After I get the 8-hour averages, how can I ask MATLAB to find the max for each 24 averages? Basically, get the max of the hourly averages.
Also, the method I'm trying to do now is a bit risky because I'm not using datenum and so if data is missing a day, I won't know. But I have no idea how to consider that when writing the code.
You could simply use the filter function, assuming your data are already in a suitable format (a 1-D vector):
hours = 8; % size of hour window defining the moving average
movAV = filter(ones(1,hours)/hours,1,O3_data);
For the daily maximum you need to split your "hour"-vector and movAV in 24h brackets. Assuming you have one value per hour you could just reshape your result into a 24 x N array:
%example
x = 1:240; % data for 10 days
y = reshape(x,24,[])
then use the additional parameters of the max function to search the max columnwise:
% in this case the max is always the last value of every day
dailyMax = max(y,[],1)
dailyMax =
24 48 72 96 120 144 168 192 216 240
respectively:
dailyMax = max(reshape(movAV,24,[]),[],1)
Probably for your case the most convenient would be to use findpeaks which would directly output all local maxima (Signal Processing Toolbox required).
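Note that filter as used above averages each hour with the 7 hours before it, and its first 7 outputs are underestimated because it assumes zeros before the start of the data. If the window should instead start at each hour and run forward, as described in the question, movmean (R2016a or later) is one alternative. This is a minimal sketch on hypothetical data for a single site with complete days; 'omitnan' skips missing hours inside a window, and windows at the very end of the series shrink to the available samples.

```matlab
O3_data = (1:48)';    % hypothetical: two complete days of hourly values, one site

% Forward-looking 8-hour mean: the current hour plus the next 7
avg8 = movmean(O3_data, [0 7], 'omitnan');

% One column per day, then the maximum of the 24 averages in each column
dailyMax = max(reshape(avg8, 24, []), [], 1);
```

The reshape step still relies on every day having exactly 24 rows, so days with missing rows would need to be padded with NaN first.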