lubridate assigns unrealistic date - date

I have a vector of dates in either dmY or Ymd format.
These are all dates in the last century.
From each, I need to extract just the year (Y).
I use the following code
library(lubridate)
sampleDates <- c(20100517,17052010)
result <- year(parse_date_time(sampleDates, guess_formats(as.character(sampleDates), c("Ymd","dmY"))))
result
517 2010
However, I expect something like
result
2010 2010

Here is a base R solution to your problem that takes a particular difficulty into account with your date format. Let's say you have the date 20112020, i.e. November 20th in the year 2020. For your function, it is not easy to distinguish which part of the string is the year - is it 2011 or 2020? The following code takes this difficulty into account, though let me mention that there surely must be simpler solutions.
Code
NonID <- grepl("^2", sampleDates) & (substr(sampleDates, 5, 5) == "2")
ID <- !NonID
dates_normal <- sampleDates[!NonID]
dates_special <- sampleDates[NonID]
normal_years <- as.numeric(c(substr(dates_normal, nchar(dates_normal) - 3, nchar(dates_normal)), substr(dates_normal, 1, 4)))
normal_years <- normal_years[normal_years > 1999]
special_years <- as.numeric(substr(dates_special, nchar(dates_special) - 3, nchar(dates_special)))
all_years <- c(normal_years, special_years)
all_years
> all_years
[1] 2010 2010
Explanation
First, we split the date vector into the dates that do not exhibit the ambiguity (dates_normal) and those that do (dates_special). For the normal dates, we use substr() to extract the first four and the last four digits of each string and keep only the values greater than 1999. For the special dates, we keep only the last four digits, because the year cannot be in the first four digits for this date format. Note that all_years lists the normal dates first and the special dates second, so the original order is not preserved when both kinds are present.
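As an alternative to slicing by position, one can test both readings of each value and keep the one that forms a plausible date. The sketch below is in JavaScript rather than R and is only an illustration: yearFromDigits, minYear and maxYear are hypothetical names, and the 1900 to 2100 window is an assumption about the data.

```javascript
// Hypothetical helper: returns the year of an 8-digit date written as either
// YYYYMMDD or DDMMYYYY, or null if neither reading (or both) looks plausible.
function yearFromDigits(value, minYear = 1900, maxYear = 2100) {
  // Numeric input loses a leading zero (e.g. 01052010), so pad back to 8 digits.
  const s = String(value).padStart(8, "0");
  if (!/^\d{8}$/.test(s)) return null;

  // Coarse plausibility check; it does not validate days per month.
  const plausible = (y, m, d) =>
    y >= minYear && y <= maxYear && m >= 1 && m <= 12 && d >= 1 && d <= 31;

  const ymd = [Number(s.slice(0, 4)), Number(s.slice(4, 6)), Number(s.slice(6, 8))];
  const dmy = [Number(s.slice(4, 8)), Number(s.slice(2, 4)), Number(s.slice(0, 2))];

  const okYmd = plausible(...ymd);
  const okDmy = plausible(...dmy);
  if (okYmd === okDmy) return null; // neither reading fits, or both do
  return okYmd ? ymd[0] : dmy[0];
}

console.log([20100517, 17052010, 20112020].map((d) => yearFromDigits(d)));
// [ 2010, 2010, 2020 ]
```

With this year window the two readings cannot both be plausible, because a year starting with 19 or 20 in the last four digits would put 19 or 20 in the month position of the other reading.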

Related

Strange Date object in MongoDB [duplicate]

As per MDN
"Date objects are based on a time value that is the number of milliseconds since 1 January, 1970 UTC."
Then why does it accept negative values ?
Even if it did shouldn't negative value mean values before Jan 1, 1970 ?
new Date('0000', '00', '-1'); // "1899-12-30T05:00:00.000Z"
new Date('0000', '00', '00'); // "1899-12-31T05:00:00.000Z"
new Date('-9999', '99', '99'); // "-009991-07-08T04:00:00.000Z"
What is happening ?
Update
For some positive values , the year begins from 1900
new Date(100); // "1970-01-01T00:00:00.100Z" // it says 100Z
new Date(0100); // "1970-01-01T00:00:00.064Z" // it says 64Z
new Date("0006","06","06"); // "1906-07-06T04:00:00.000Z"
Also note that, in the last one, the date is shown as 4 which is wrong.
I suspect this is some sort of Y2K bug ?!!
This is hard and inconsistent, yes. The JavaScript Date object was based on the one in Java 1.0, which is so bad that Java redesigned a whole new package.
JavaScript is not so lucky.
Date is "based on" the Unix epoch because of how it is defined; that is an internal detail.
"1 January 1970" is the actual time of this baseline.
"Since" is the direction of the timestamp value: forward for positive, backward for negative.
Externally, the Date constructor has several different usages, based on parameters:
Zero parameters = current time
new Date() // Current datetime. Every tutorial should teach this.
The time is absolute, but 'displayed' timezone may be UTC or local.
For simplicity, this answer will use only UTC. Keep timezone in mind when you test.
One numeric parameter = timestamp @ 1970
new Date(0) // 0ms from 1970-01-01T00:00:00Z.
new Date(100) // 100ms from 1970 baseline.
new Date(-10) // -10ms from 1970 baseline.
One string parameter = iso date string
new Date('000') // Short years are invalid, need at least four digits.
new Date('0000') // 0000-01-01. Valid because there are four digits.
new Date('1900') // 1900-01-01.
new Date('1900-01-01') // Same as above.
new Date('1900-01-01T00:00:00') // Same as above.
new Date('-000001') // 2 BC, see below. Yes you need all those zeros.
Two or more parameters = year, month, and so on @ 1900 or 0000
new Date(0,0) // 1900-01-01T00:00:00Z.
new Date(0,0,1) // Same as above. Date is 1 based.
new Date(0,0,0) // 1 day before 1900 = 1899-12-31.
new Date(0,-1) // 1 month before 1900 = 1899-12-01.
new Date(0,-1,0) // 1 month and 1 day before 1900 = 1899-11-30.
new Date(0,-1,-1) // 1 month and *2* days before 1900 = 1899-11-29.
new Date('0','1') // 1900-02-01. Two+ params always cast to year and month.
new Date(100,0) // 0100-01-01. Year > 99 use year 0 not 1900.
new Date(1900,0) // 1900-01-01. Same as new Date(0,0). So intuitive!
Negative year = BC
new Date(-1,0) // 1 year before 0000-01-01 = 1 year before 1 BC = 2 BC.
new Date(-1,0,-1) // 2 days before 2 BC. Fun, yes? I'll leave this as an exercise.
There is no 0 AD. There is 1 AD and the year before it is 1 BC. Year 0 is 1 BC by convention.
2 BC is displayed as year "-000001".
The extra zeros are required because it is outside normal range (0000 to 9999).
If you new Date(12345,0) you will get "+012345-01-01", too.
Of course, the Gregorian calendar, adopted as late as 1923 in Europe, will cease to be meaningful long before we reach BC.
In fact, scholars accept that Jesus wasn't born in 1 BC.
But with the stars and the land moving at this scale, calendar is the least of your worries.
The remaining examples in the question are just variations of these cases. For example:
new Date(0100) // One number = epoch. 0100 (octal) = 64ms since 1970
new Date('0100') // One string = iso = 0100-01-01.
new Date(-9999, 99, 99) // 9999 years before BC 1 and then add 99 months and 98 days
Hope you had some fun time. Please don't forget to vote up. :)
To stay sane, keep all dates in ISO 8601 and use the string constructor.
And if you need to handle timezone, keep all datetimes in UTC.
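The cases above can be checked with a short sketch. It uses Date.UTC, which takes the same (year, month, day, ...) arguments as the multi-argument constructor but interprets them as UTC, so the output does not depend on the local timezone. The names iso, examples and year6 are only illustrative.

```javascript
// Format a millisecond timestamp as an ISO 8601 string.
const iso = (ms) => new Date(ms).toISOString();

const examples = {
  epoch: iso(0),                        // one number: ms since 1970
  yearZeroMapped: iso(Date.UTC(0, 0)),  // years 0..99 map to 1900..1999
  dayZero: iso(Date.UTC(0, 0, 0)),      // day 0 = last day of the previous month
  year100: iso(Date.UTC(100, 0)),       // years >= 100 are taken literally
};

// To get a literal year below 100, set it explicitly afterwards.
const year6 = new Date(Date.UTC(2000, 6, 6));
year6.setUTCFullYear(6);
examples.literalYear6 = year6.toISOString();

console.log(examples);
// epoch:          1970-01-01T00:00:00.000Z
// yearZeroMapped: 1900-01-01T00:00:00.000Z
// dayZero:        1899-12-31T00:00:00.000Z
// year100:        0100-01-01T00:00:00.000Z
// literalYear6:   0006-07-06T00:00:00.000Z
```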
Well, firstly, you're passing in strings instead of integers, so that might have something to do with your issues here.
Check this out, it explains negative dates quite nicely, and there is an explanation for your exact example.
Then why does it accept negative values ?
You are confusing the description of how the data is stored internally with the arguments that the constructor function takes.
Even if it did shouldn't negative value mean values before Jan 1, 1970 ?
No, for the above reason. Nothing stops the year, month or day from being negative. You just end up adding a negative number to something.
Also note that, in the last one, the date is shown as 4 which is wrong.
Numbers which start with a 0 are expressed in octal, not decimal. 0100 === 64.
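A small sketch of the octal point. It uses the 0o prefix because the legacy 0100 form is rejected in strict mode; both forms denote the same value. The variable names are only illustrative.

```javascript
const octalValue = 0o100;            // octal literal, same value as legacy 0100
const fromString = Number("0100");   // numeric strings are read as decimal
const octalDate = new Date(octalValue).toISOString();

console.log(octalValue, fromString); // 64 100
console.log(octalDate);              // 1970-01-01T00:00:00.064Z
```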
Please have a look at the documentation
Year: Values from 0 to 99 map to the years 1900 to 1999
1970 with appropriate timezone: new Date(0); // int MS since 1970
1900 (or 1899 with applied timezone): new Date(0,0) or new Date(0,0,1) - date is 1 based, month and year are 0 based
1899: new Date(0,0,-1)

Dates from the last month of specific ranges

How can I extract data from the previous month into three sections? I can use the function LASTMONTH but that gives me the whole month.
I need it split into three sections:
1st - 10th of last month
11th - 20th of last month
21st to end of last month
Sounds like a two step problem:
Write a formula which comes up with 3 values based on date
Group and Sort your data based on that formula
The formula in question will look something like the following:
IF {yourDate} in date(year(currentdate),month(currentdate)-1, 1) to date(year(currentdate),month(currentdate)-1, 10)
THEN "A"
ELSE IF {yourDate} in date(year(currentdate),month(currentdate)-1, 11) to date(year(currentdate),month(currentdate)-1, 20)
THEN "B"
ELSE IF {yourDate} in date(year(currentdate),month(currentdate)-1, 21) to date(year(currentdate),month(currentdate),1)-1
THEN "C"
ELSE "D"
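Here A, B, and C label the three date ranges of last month, and D catches every date outside last month. Substitute as appropriate. Note that month(currentdate)-1 evaluates to 0 in January, so the year boundary needs separate handling.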
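The same bucketing logic, sketched in JavaScript rather than as a report formula, purely to illustrate the ranges and the January rollover. lastMonthSection is a hypothetical helper, and both arguments are assumed to be Date objects interpreted in UTC.

```javascript
// Hypothetical helper: classifies `date` relative to the month before `today`.
// Returns "A" (1st-10th), "B" (11th-20th), "C" (21st-end) or "D" (not last month).
function lastMonthSection(date, today) {
  const y = today.getUTCFullYear();
  const m = today.getUTCMonth();
  const start = Date.UTC(y, m - 1, 1); // month -1 rolls back to December of y - 1
  const end = Date.UTC(y, m, 1);       // first day of the current month (exclusive)
  const t = date.getTime();
  if (t < start || t >= end) return "D";
  const day = date.getUTCDate();
  if (day <= 10) return "A";
  if (day <= 20) return "B";
  return "C";
}

const today = new Date(Date.UTC(2024, 0, 15)); // 15 January 2024
console.log(lastMonthSection(new Date(Date.UTC(2023, 11, 5)), today));  // A
console.log(lastMonthSection(new Date(Date.UTC(2023, 11, 31)), today)); // C
```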

displaying dates of values - time series data

I am trying to maintain a table using some panel data. I have all the data outputting fine, but I am having difficulty getting the correct dates to display. The method I am using is the following:
gen ymdny = date(date,"MDY"); /*<- date var from panel dataset that i import*/
sort name ymdny;
summ ymdny;
local lastdate : disp %tdM-D r(max);
local lastdate2 : disp %tdM-D (r(max)-1);
local lastw : disp %tdM-D (r(max)-7);
This would work fine if the data were daily, but the dataset I have is actually business daily (i.e. there are no observations for weekends and national bank holidays). It seems silly, but I have not been able to figure out a workaround that does the job. Ideally, there would be a function that I can use to print the date corresponding to a particular value.
For example:
gen resbal_1d = round(l1.resbal,0.1);
gen dateOf = dateOf(resbal_1d); /* <- pseudocode example of what I would like */
I'm not sure what you're asking for, but my guess is that you want to see a date in human-readable form as the output, given a numerical input. (This is your last sentence.) So simply try something like:
display %td 10
The format is important as the following shows (see help format):
display %tq 10
Same numerical input, different format, different output.
Two other examples from the manual:
* string to integer
display date("5-12-1998", "MDY")
* string to date format
display %td date("5-12-1998", "MDY")
As for your example code, I don't get what you're aiming for. In effect, you can summarize the date variable because in Stata, dates are just integers. It's legal, but I couldn't say if it's good form. Below is a simple example.
clear all
set more off
set obs 10
gen date = _n // create the data
format date %td // give date format
list
summarize date
local onedate = r(max)
display %td `onedate'
Some references:
[U] 24 Working with dates and time
help datetime
help datetime business calendars
http://www.stata.com/support/faqs/data-management/creating-date-variables/
http://www.ats.ucla.edu/stat/stata/modules/dates.htm
(Maybe you can explain with more detail and context what it is you want.)
Edit
Your comment
I do not see how this helps with the date output. For example,
displaying r(max) - 1 on a monday will still display the sunday date.
does not explain, at all, the problems you're having with Stata's business calendars.
I'm adding what is basically an example taken from the help file I already referenced. I do this with the hope of convincing you that (re)-reading the help files is worthwhile.
*clear all
set more off
* import string dates
infile str10 sdate float x using http://www.stata-press.com/data/r13/bcal_simple
list
*----- Regular dates -----
* create elapsed dates - Stata's way of managing dates
generate rdate = date(sdate, "MD20Y")
format rdate %td
drop sdate x
list
* compute previous and next dates
generate tomorrow1 = rdate + 1
format tomorrow1 %td
generate yesterday1 = rdate - 1
format yesterday1 %td
list
*----- Business dates -----
* convert regular date to business dates
generate bdate = bofd("simple", rdate)
format bdate %tbsimple
* compute previous and next dates
generate tomorrow2 = bdate + 1
format tomorrow2 %tbsimple
generate yesterday2 = bdate - 1
format yesterday2 %tbsimple
order yesterday1 rdate tomorrow1 yesterday2 bdate tomorrow2
list
/*
The stbcal-file for simple, the calendar shown below,
       November 2011
Su  Mo  Tu  We  Th  Fr  Sa
---------------------------
         1   2   3   4   X
 X   7   8   9  10  11   X
 X  14  15  16  17  18   X
 X  21  22  23   X   X   X
 X  28  29  30
---------------------------
*/
Notice that if you add or subtract 1 from a regular date, business days are not taken into account. If you do the same with a business calendar date, you get what you want. Business calendars are defined by .stbcal files; the example uses a built-in calendar called simple. You may need to write your own .stbcal file, but it is not difficult. Again, the details are in the help files.
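To make the difference concrete outside Stata, here is a rough JavaScript sketch of what "subtract one business day" means for the November 2011 calendar above. previousBusinessDay and holidays2011 are hypothetical names; this is not how Stata implements business calendars, only an illustration of the idea.

```javascript
// Hypothetical helper: steps back one calendar day at a time until it lands
// on a day that is neither a weekend nor in `holidays` (a Set of "YYYY-MM-DD"
// strings). All dates are handled in UTC.
function previousBusinessDay(isoDate, holidays = new Set()) {
  const d = new Date(isoDate + "T00:00:00Z");
  const key = () => d.toISOString().slice(0, 10);
  do {
    d.setUTCDate(d.getUTCDate() - 1);
  } while (d.getUTCDay() === 0 || d.getUTCDay() === 6 || holidays.has(key()));
  return key();
}

// The two weekday closures marked X in the calendar above.
const holidays2011 = new Set(["2011-11-24", "2011-11-25"]);
console.log(previousBusinessDay("2011-11-28", holidays2011)); // 2011-11-23
console.log(previousBusinessDay("2011-11-07", holidays2011)); // 2011-11-04
```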

Vectorising Date Array Calculations

I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lastly I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
Which seems to suggest that the function can't be vectorised? If this is true is there anyway to tell in advance which functions can be vectorised and which can not?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert them to strings with:
datestr(t);
And here's a neat one-liner using arrayfun:
datestr(arrayfun(@(n)addtodate(today, n, 'year'), 0:CurveLength-1))
If your sequence has a constant known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here is a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.
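For comparison, the same "add to the year field and let the date normalise" idea can be sketched in JavaScript. yearlyDates is a hypothetical helper, and UTC fields are used so the result does not depend on the local timezone.

```javascript
// Hypothetical helper: returns `count` dates spaced one year apart,
// starting at `start` (a Date). The start date itself is not modified.
function yearlyDates(start, count) {
  return Array.from({ length: count }, (_, n) => {
    const d = new Date(start.getTime());
    d.setUTCFullYear(start.getUTCFullYear() + n);
    return d;
  });
}

const series = yearlyDates(new Date(Date.UTC(2024, 2, 15)), 3);
console.log(series.map((d) => d.toISOString().slice(0, 10)));
// [ '2024-03-15', '2025-03-15', '2026-03-15' ]
```

A start date of 29 February rolls over to 1 March in non-leap years, which may or may not be what you want.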

Unix gettimeofday() - compatible algorithm for determining week within month?

If I've got a time_t value from gettimeofday() or compatible in a Unix environment (e.g., Linux, BSD), is there a compact algorithm available that would be able to tell me the corresponding week number within the month?
Ideally the return value would work in similar to the way %W behaves in strftime() , except giving the week within the month rather than the week within the year.
I think Java has a W formatting token that does something more or less like what I'm asking.
[Everything below written after answers were posted by David Nehme, Branan, and Sparr.]
I realized that to return this result in a similar way to %W, we want to count the number of Mondays that have occurred in the month so far. If that number is zero, then 0 should be returned.
Thanks to David Nehme and Branan in particular for their solutions which started things on the right track. The bit of code returning [using Branan's variable names] ((ts->tm_mday - 1) / 7) tells the number of complete weeks that have occurred before the current day.
However, if we're counting the number of Mondays that have occurred so far, then we want to count the number of integral weeks, including today, then consider if the fractional week left over also contains any Mondays.
To figure out whether the fractional week left after taking out the whole weeks contains a Monday, we need to consider ts->tm_mday % 7 and compare it to the day of the week, ts->tm_wday. This is easy to see if you write out the combinations, but if we ensure the day is not Sunday (tm_wday > 0), then any time ts->tm_wday <= (ts->tm_mday % 7) we need to increment the count of Mondays by 1. This comes from considering the number of days since the start of the month, and whether, based on the current day of the week within the first fractional week, the fractional week contains a Monday.
So I would rewrite Branan's return statement as follows:
return (ts->tm_mday / 7) + ((ts->tm_wday > 0) && (ts->tm_wday <= (ts->tm_mday % 7)));
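The formula can be checked against a brute-force count. The sketch below is in JavaScript rather than C; mondaysSoFar mirrors the return statement above, and mondaysByCounting is a hypothetical reference implementation used only for comparison.

```javascript
// mday: day of month (1..31); wday: day of week (0 = Sunday .. 6 = Saturday),
// mirroring tm_mday and tm_wday.
function mondaysSoFar(mday, wday) {
  const wholeWeeks = Math.floor(mday / 7);
  const extraMonday = wday > 0 && wday <= mday % 7 ? 1 : 0;
  return wholeWeeks + extraMonday;
}

// Brute-force reference: count Mondays from the 1st up to and including mday.
// `month` is 0-based, as in the JavaScript Date API.
function mondaysByCounting(year, month, mday) {
  let count = 0;
  for (let day = 1; day <= mday; day++) {
    if (new Date(Date.UTC(year, month, day)).getUTCDay() === 1) count++;
  }
  return count;
}

// 11 November 2011 was a Friday (wday 5); only Monday the 7th precedes it.
console.log(mondaysSoFar(11, 5), mondaysByCounting(2011, 10, 11)); // 1 1
```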
If you define the first week to be days 1-7 of the month, the second week days 8-14, ... then the following code will work.
#include <time.h>

/* Week 1 is days 1-7, week 2 is days 8-14, and so on. */
int week_of_month(const time_t *my_time)
{
    struct tm *timeinfo = localtime(my_time);
    return 1 + (timeinfo->tm_mday - 1) / 7;
}
Assuming your first week is week 1:
#include <time.h>

/* Week of the month for the current local time, starting at 1. */
int getWeekOfMonth(void)
{
    time_t my_time = time(NULL);
    struct tm *ts = localtime(&my_time);
    return ((ts->tm_mday - 1) / 7) + 1;
}
For 0-index, drop the +1 in the return statement.
The following C function uses localtime() and mktime() to find the weekday of the first day of the month and counts weeks from there. Adapt or expand for your language of choice.
#include <time.h>

int week_num_in_month(time_t timestamp) {
    struct tm first = *localtime(&timestamp);
    int day_of_month = first.tm_mday;
    first.tm_mday = 1;    /* move to the 1st of the same month */
    first.tm_hour = 12;   /* midday avoids DST shifts around midnight */
    first.tm_isdst = -1;  /* let mktime() determine DST */
    mktime(&first);       /* normalizes the struct and fills in tm_wday */
    return (day_of_month + first.tm_wday - 1) / 7 + 1;  /* tm_wday: 0 = Sunday */
}
I am assuming that you want to handle weeks of the month the way the standard time functions handle weeks of the year (here with weeks starting on Sunday), as opposed to just days 1-7, 8-14, etc.