In google trends there is possibility to export data as CSV. Obtained CSV has the following structure:
Week,subject 1, subject 2
2004-01-04 - 2004-01-10,13,6
2004-01-11 - 2004-01-17,9,9
2004-01-18 - 2004-01-24,11,4
I know that there is DateObject[], but it contain only one date. I want to obtain stepped chart of subjects 1 and 2 in time domain, and calculate correlation of them in range between two given dates.
My problem is: how structure of data, should I use to represent time range?
As google trends calls the time variable "week" take
StringTake["2004-01-04 - 2004-01-10", 10]
to get the first day of the range, then use
DateList[{"2004-01-04", {"Year", "Month", "Day"}}]
to create a date list and
DateString[{2004, 1, 4, 0, 0, 0}, {"Week"}]
to express the time in terms of the calendar week of the year. So, the function
RangeToWeek[timerangestring_] := DateString[ DateList[{
StringTake[timerangestring, 10],
{"Year", "Month", "Day"}}], {"Week"}]
gives for the firs date in your list 01, because the time span from 04.01.2004 to 10.01.2004 corresponds to the first callendar week of that year.
Related
I have a netcdf file that has monthly global data from 1991 to 2000 (10 years).
Using CDO, how can I modify the netcdf from monthly to daily timesteps by repeating the monthly values each day of each month?
for eaxample,
convert from
Month 1, value = 0.25
to
Day 1, value = 0.25
Day 2, value = 0.25
Day 3, value = 0.25
....
Day 31, value = 0.25
convert from
Month 2, value = 0.87
to
Day 1, value = 0.87
Day 2, value = 0.87
Day 3, value = 0.87
....
Day 28, value = 0.87
Thanks
##############
Update
my monthly netcdf has the monthly values not on the first day of each month, but in sparse order. e.g. on the 15th, 7th, 9th, etc.. however one value for each month.
The question is perhaps ambiguously worded. Adrian Tompkins' answer is correct for interpolation. However, you are actually asking to set the value for each day of the month to that for the first day of the month. You could do this by adding a second CDO call as follows:
cdo -inttime,1991-01-01,00:00:00,1day in.nc temp.nc
cdo -monadd -gtc,100000000000000000 temp.nc in.nc out.nc
Just set the value after gtc to something much higher than anything in your data.
You can use inttime which interpolates in time at the interval required, but this is not exactly what you asked for as it doesn't repeat the monthly values and your series will be smoothed by the interpolation.
If we assume your dataset starts on the 1st January at time 00:00 (you don't state in the question) then the command would be
cdo inttime,1991-01-01,00:00:00,1day in.nc out.nc
This performs a simple linear interpolation between steps.
Note: This is fine for fields like temperature and seems to be want you ask for, but readers should note that one has to be more careful with flux fields such as rainfall, where one might want to scale and/or change the units appropriately.
I could not find a solution with CDO but I solved the issue with R, as follows:
library(dplyr)
library(ncdf4)
library(reshape2)
## Read ncfile
ncpath="~/my/path/"
ncname="my_monthly_ncfile"
ncfname=paste(ncpath, ncname, ".nc", sep="")
ncin=nc_open(ncfname)
var=ncvar_get(ncin, "nc_var")
## melt ncfile
var=melt(var)
var=var[complete.cases(var), ] ## remove any NA
## split ncfile by gridpoint (lat and lon) into a list
var=split(var, list(var$lat, var$lon))
var=var[lapply(var,nrow)>0] ## remove any empty list element
## create new list and replicate, for each gridpoint, each monthly value n=30 times
var_rep=list()
for (i in 1:length(var)) {
var_rep[[i]]=data.frame(value=rep(var[[i]]$value, each=30))
}
I have a vector of dates in either dmY formats and Ymd format.
These are all dates in the last century.
From each, I need to extract just the year (Y).
I use the following code
library(lubridate)
sampleDates <- c(20100517,17052010)
result <- year(parse_date_time(x, guess_formats(as.character(x), c("Ymd","dmY"))))
result
517 2010
However, I expect something like
result
2010 2010
Here is a base R solution to your problem that takes a particular difficulty into account with your date format. Let's say you have the date 20112020, i.e. November 20th in the year 2020. For your function, it is not easy to distinguish which part of the string is the year - is it 2011 or 2020? The following code takes this difficulty into account, though let me mention that there surely must be simpler solutions.
Code
NonID <- grepl("^2", sampleDates) & (substr(sampleDates, 5, 5) == "2")
ID <- !NonID
dates_normal <- sampleDates[!NonID]
dates_special <- sampleDates[NonID]
normal_years <- as.numeric(c(substr(dates_normal, nchar(dates_normal) - 3, nchar(dates_normal)), substr(dates_normal, 1, 4)))
normal_years <- normal_years[normal_years > 1999]
special_years <- as.numeric(substr(dates_special, nchar(dates_special) - 3, nchar(dates_special)))
all_years <- c(normal_years, special_years)
all_years
> all_years
[1] 2010 2010
Explanation
First, we divide the date vector into those dates which exhibit the indistinguishability (dates_normal) and those which do not (dates_special). Then, for the normal dates, we use the substr() function to extract the first four and last four digits of the string and keep only those values which exceed 2000. For the special dates, we only keep the last four digits because the year can't be possibly included in the first four digits for this date format.
How to add a vector of seconds to time HH:mm:ssPM in MATAB?
I usually have this nice way in Excel to convert normal number format to hour and minutes and sec. format using simple cell custom formatting, but when I put down code below in MATLAB, instead of incrementing in seconds, it adds in days!
time = 1+0:50000+0; % sec
% To show date as plot label it should be converted from numbers to letters
hr_matlab = time' + datenum('4:10:44 PM');
hr= datestr(hr_matlab, 'HH:MM:ssPM');
figure(222)
plot(hr,S,'-b','LineWidth',2)
I am using MATLAB2014a and don't have access to function datetime.
datenum converts the date to a number that represents days as whole numbers. For that reason, when you add the vector [1,2,3,...], you acturally add days to your fixed time ('4:10:44 PM').
if you want to add it as seconds, you need to divide time in the amount of seconds per day:
hr_matlab = (time')/86400 + datenum('4:10:44 PM');
One simple option is to add two date numbers:
hr_matlab = datenum('4:10:44 PM') + datenum(0, 0, 0, 0, 0, time.');
So, I'm beginning to use timeseries in MATLAB and I'm kinda stuck.
I have a list of timestamps of events which I imported into MATLAB. It's now a 3000x25 array which looks like
2000-01-01T00:01:01+00:00
2000-01-01T00:01:02+00:00
2000-01-01T00:01:03+00:00
2000-01-01T00:01:04+00:00
As you can see, each event was recorded by date, hour, minute, second, etc.
Now, I would like to count the number of events by date, hour, etc. and then do various analyses (regression, etc.).
I considered creating a timeseries object for each day, but considering the size of the data, that's not practical.
Is there any way to manipulate this array such that we have "date: # of events"?
Perhaps there's just a simpler way to count events using timeseries?
As others have suggested, you should convert the string dates to serial date numbers. This makes it easy to work with the numeric data.
An efficient way to count number of events per interval (days, hours, minutes, etc...) is to use functions like HISTC and ACCUMARRAY. The process will involve manipulating the serial dates into units/format required by such functions (for example ACCUMARRAY requires integers, whereas HISTC needs to be given the bin edges to specify the ranges).
Here is a vectorized solution (no-loop) that uses ACCUMARRAY to count number of events. This is a very efficient function (even of large input). In the beginning I generate some sample data of 5000 timestamps unevenly spaced over a period of 4 days. You obviously want to replace it with your own:
%# lets generate some random timestamp between two points (unevenly spaced)
%# 1000 timestamps over a period of 4 days
dStart = datenum('2000-01-01'); % inclusive
dEnd = datenum('2000-01-5'); % exclusive
t = sort(dStart + (dEnd-dStart).*rand(5000,1));
%#disp( datestr(t) )
%# shift values, by using dStart as reference point
dRange = (dEnd-dStart);
tt = t - dStart;
%# number of events by day/hour/minute
numEventsDays = accumarray(fix(tt)+1, 1, [dRange*1 1]);
numEventsHours = accumarray(fix(tt*24)+1, 1, [dRange*24 1]);
numEventsMinutes = accumarray(fix(tt*24*60)+1, 1, [dRange*24*60 1]);
%# corresponding datetime range/interval label
days = cellstr(datestr(dStart:1:dEnd-1));
hours = cellstr(datestr(dStart:1/24:dEnd-1/24));
minutes = cellstr(datestr(dStart:1/24/60:dEnd-1/24/60));
%# display results
[days num2cell(numEventsDays)]
[hours num2cell(numEventsHours)]
[minutes num2cell(numEventsMinutes)]
Here is the output for the number of events per day:
'01-Jan-2000' [1271]
'02-Jan-2000' [1258]
'03-Jan-2000' [1243]
'04-Jan-2000' [1228]
And an extract of the number of events per hour:
'02-Jan-2000 09:00:00' [50]
'02-Jan-2000 10:00:00' [54]
'02-Jan-2000 11:00:00' [53]
'02-Jan-2000 12:00:00' [74]
'02-Jan-2000 13:00:00' [49]
'02-Jan-2000 14:00:00' [59]
similarly for minutes:
'03-Jan-2000 08:54:00' [1]
'03-Jan-2000 08:55:00' [1]
'03-Jan-2000 08:56:00' [1]
'03-Jan-2000 08:57:00' [0]
'03-Jan-2000 08:58:00' [0]
'03-Jan-2000 08:59:00' [0]
'03-Jan-2000 09:00:00' [1]
'03-Jan-2000 09:01:00' [2]
You can convert those timestamps to a number with datenum:
A serial date number represents the whole and fractional number of days from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns the number 1. (The year 0000 is merely a reference point and is not intended to be interpreted as a real year in time.)
This way, it's easier to check where a period starts and end. Eg: the week your looking for starts at x and ends at x+7.999... ; all you have to do to find events in that period is checking if the datenum value is between x and x+8:
week_x_events = find(dn_timestamp>=x & dn_timestamp<x+8)
The difficulty is in converting your timestamp to datenum acceptable format, which is doable using regexp, good luck!
I don't know what +00:00 means (maybe time zone?), but you can simply convert your string timestamps into numerical format:
>> t = datenum('2000-01-01T00:01:04+00:00', 'yyyy-mm-ddTHH:MM:SS')
t =
7.3049e+005
>> datestr(t)
ans =
01-Jan-2000 00:01:04
I have a dataset for which I have extracted the date at which an event occurred. The date is in the format of MMDDYY although MatLab does not show leading zeros so often it's MDDYY.
Is there a method to find the mean or median (I could use either) date? median works fine when there is an odd number of days but for even numbers I believe it is averaging the two middle ones which doesn't produce sensible values. I've been trying to convert the dates to a MatLab format with regexp and put it back together but I haven't gotten it to work. Thanks
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
You can use datenum to convert dates to a serial date number (1 at 01/01/0000, 2 at 02/01/0000, 367 at 01/01/0001, etc.):
strDate='27112011';
numDate = datenum(strDate,'ddmmyyyy')
Any arithmetic operation can then be performed on these date numbers, like taking a mean or median:
mean(numDates)
median(numDates)
The only problem here, is that you don't have your dates in a string type, but as numbers. Luckily datenum also accepts numeric input, but you'll have to give the day, month and year separated in a vector:
numDate = datenum([year month day])
or as rows in a matrix if you have multiple timestamps.
So for your specified example data:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
years = mod(dates,100);
dates = (dates-years)./100;
days = mod(dates,100);
months = (dates-days)./100;
years = years + 1900; % set the years to the 20th century
numDates = datenum([years(:) months(:) days(:)]);
fprintf('The mean date is %s\n', datestr(mean(numDates)));
fprintf('The median date is %s\n', datestr(median(numDates)));
In this example I converted the resulting mean and median back to a readable date format using datestr, which takes the serial date number as input.
Try this:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
d=zeros(1,length(dates));
for i=1:length(dates)
d(i)=datenum(num2str(dates(i)),'ddmmyy');
end
m=mean(d);
m_str=datestr(m,'dd.mm.yy')
I hope this info to be useful, regards
Store the dates as YYMMDD, rather than as MMDDYY. This has the useful side effect that the numeric order of the dates is also the chronological order.
Here is the pseudo-code for a function that you could write.
foreach date:
year = date % 100
date = (date - year) / 100
day = date % 100
date = (date - day) / 100
month = date
newdate = year * 100 * 100 + month * 100 + day
end for
Once you have the dates in YYMMDD format, then find the median (numerically), and this is also the median chronologically.
You see above how to present dates as numbers.
I will add no your issue of finding median of the list. The default matlab median function will average the two middle values when there are an even number of values.
But you can do it yourself! Try this:
dates; % is your array of dates in numeric form
sdates = sort(dates);
mediandate = sdates(round((length(sdates)+1)/2));