Stata: adding a number to a date variable


I am working with hospital admission data, where admission and discharge dates are stored in clock format %tcCCYY-NN-DD_hh:MM_AM, for example:
discharge date
2009-04-25 9:00 AM
So the information is stored as milliseconds since January 1, 1960, and transforming this into a numeric double variable gives me
discharge date
1556269200000
Now, I would like to shift some of my date variables by 1 minute (just an example), and generate a new variable
gen new_discharge_date = discharge_date + 60*1000
This shifts the discharge date by exactly one minute only occasionally.
In the example above this will instead give me
new_discharge_date
2009-04-25 9:00 AM
or as double
new_discharge_date
1556269236224
The difference between new_discharge_date and discharge_date is only 36224 milliseconds instead of 60000.
The problem occurs systematically, sometimes the number of milliseconds since January 1, 1960, will even be lower than before.
Any idea what I am doing wrong?

Executive summary: Adding a constant to a date-time variable with units milliseconds creates another date-time variable. Both variables should be type double.
First note that clock is not a storage format in Stata. Clock date-time variables are stored as integers; clock format is a numeric display format, which is quite different. In fact the description in the original question is backwards: the date-time data arrive as strings, which are then converted to milliseconds with the clock() function.
You are correct that clock date-times should be stored as doubles, as they are often very large integers, but for precisely that reason your shifted date-time (1 minute more than the original values) should not be stored in a float, which is what your generate does by default. You need to specify double in the generate statement. Using float instead just gives a crude approximation, which is why you observe errors. This is easy to check using your example as sandbox.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen s_discharge_date = "2009-04-21 9:00 AM"
. gen double discharge_date = clock(s_discharge_date, "YMD hm")
. format discharge_date %tc
. gen double new_discharge_date = discharge_date + 60*1000
. format new %tc
. gen long new_discharge_date2 = discharge_date + 60*1000
. format new_discharge_date2 %tc
. list
+--------------------------------------------------------------+
1. | s_discharge_date | discharge_date | new_discharge_date |
| 2009-04-21 9:00 AM | 21apr2009 09:00:00 | 21apr2009 09:01:00 |
|--------------------------------------------------------------|
| new_di~2 |
| . |
+--------------------------------------------------------------+
The advice given in a comment to use long is wrong, as the last experiment shows immediately: new_discharge_date2 is missing. Fairly recent date-times have values in the trillions, some orders of magnitude larger than could be held in a long, whose largest value is 2,147,483,620. help data types shows the limits on values in the various storage types.
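To see the arithmetic outside Stata, here is a minimal Python sketch. It assumes that a Stata float is IEEE single precision and a double is IEEE double precision, and it uses the standard struct module to round a value to single precision; the helper names are illustrative, not part of Stata. It reproduces the numbers from the question: 25 April 2009 9:00 AM is 1556269200000 ms after 01jan1960, and adding 60000 and then storing the result as a float lands on 1556269236224, because near 1.5 trillion a float can only hold multiples of 131072.

```python
import struct
from datetime import datetime

# Stata's clock (%tc) values count milliseconds from 01jan1960 00:00:00.000,
# ignoring leap seconds.
STATA_EPOCH = datetime(1960, 1, 1)

# Largest value Stata's long type can store (see help data types).
STATA_LONG_MAX = 2_147_483_620


def to_clock(dt: datetime) -> int:
    """Return the %tc value (ms since the epoch) for a naive datetime."""
    delta = dt - STATA_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def as_float(x: float) -> float:
    """Round x to IEEE single precision, which is what a Stata float holds."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


discharge = to_clock(datetime(2009, 4, 25, 9, 0))
shifted = discharge + 60 * 1000          # exact, as a double would hold it
shifted_float = int(as_float(shifted))   # what generate stores by default (float)

print(discharge)                  # 1556269200000
print(shifted - discharge)        # 60000
print(shifted_float - discharge)  # 36224
print(shifted > STATA_LONG_MAX)   # True: far too large for a long, hence missing
```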

Related

How to transform date in Stata?

I've looked for help on the internet for the following, but I could not find a satisfying answer: for an assignment, I need to plot the time series of a certain variable (the term spread in percentages), with years on the x-axis.
However, we use daily data. Does anybody know a convenient way to do this? The 'date' variable I've got is stored in the following form: 20111017 represents the 17th of October 2011.
I tried to extract the first 4 digits of 'date' with substr(date, 1, 4), but got the error 'type mismatch'. I'm also not sure it gives the right information if I use only the years to plot daily data over several years. It currently gives a graph that doesn't look that nice.

Answering the question in your title.
The date() function expects a string. If your variable with value 20111017 is in a numeric format you can convert it like this: tostring datenum , gen(datestr).
Then when using the date() function you must provide a mask that tells Stata what format the date string is in. Below is a reproducible example you can run to see how this works.
* Example generated by -dataex-. For more info, type help dataex
clear
* long, not float: a float cannot hold every 8-digit integer exactly
input long datenum
20111017
end
* Convert numeric variable to string
tostring datenum , gen(datestr)
* Convert string to date
gen date = date(datestr, "YMD")
* Display date as date
format date %td
If this does not help you, try to provide a reproducible example.
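The storage type matters for run-together YYYYMMDD integers too. A Python sketch, assuming that a Stata float is IEEE single precision: above 16,777,216 (2^24) a float can no longer hold every integer, so 20111017 is rounded to its even neighbour 20111016, whereas a long holds it exactly. The sketch then does what tostring followed by date(datestr, "YMD") does, parsing the string and counting days from 01jan1960. The helper names are hypothetical.

```python
import struct
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)  # day 0 for Stata daily (%td) dates


def as_float(x: float) -> float:
    """Round x to IEEE single precision, which is what a Stata float holds."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def daily_from_ymd_string(s: str) -> int:
    """Days since 01jan1960 for a string like "20111017" (mask "YMD")."""
    parsed = datetime.strptime(s, "%Y%m%d").date()
    return (parsed - STATA_EPOCH).days


datenum = 20111017
print(int(as_float(datenum)))          # 20111016: a float cannot hold 20111017
datestr = str(datenum)                 # analogous to tostring on a long
print(daily_from_ymd_string(datestr))  # 18917, which %td shows as 17oct2011
```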

This adds some details to the helpful answer by @TheIceBear.
As he indicates, one way to get a Stata daily date from your run-together date variable is to convert it to a string first. But tostring is just one way to do that and is not essential. (I have nothing against tostring, as its original author, but it is better suited to other tasks.)
Here I use daily() rather than date(). The results are identical, but daily() is the better habit: date() is all too often misunderstood as a generic date function, whereas all it does is produce daily dates (or missings).
To get a numeric year variable, just divide by 10000 and round down. You could convert to a string, extract the first 4 characters, and then convert to numeric, but that's more operations.
clear
set obs 1
gen long date = 20111017
format date %8.0f
gen ddate = daily(strofreal(date, "%8.0f"), "YMD")
format %td ddate
gen year = floor(date/10000)
list
+-----------------------------+
| date ddate year |
|-----------------------------|
1. | 20111017 17oct2011 2011 |
+-----------------------------+
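The divide-and-round-down idea is plain integer arithmetic, so it can be checked in any language. A short Python sketch (the function names are illustrative) showing that floor division by 10000 and the longer string route (convert, take the first 4 characters, convert back) agree for a YYYYMMDD value:

```python
def year_by_arithmetic(yyyymmdd: int) -> int:
    """Like Stata's floor(date/10000): drop the last four digits."""
    return yyyymmdd // 10000


def year_by_string(yyyymmdd: int) -> int:
    """The longer route: to string, first 4 characters, back to a number."""
    return int(str(yyyymmdd)[:4])


date = 20111017
print(year_by_arithmetic(date), year_by_string(date))  # 2011 2011
```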

How do I convert Stata dates (%td format e.g. 30jan2015) into YYYYMMDD format (e.g. 20150130)

* date is in %td format
gen date1 = real(string(mofd(daily(date, "DMY")), "%tmCYN"))
* type mismatch error
tostring date, gen(dt)
gen date1 = real(string(mofd(daily(dt, "DMY")), "%tmCYN"))
* the code runs but generates no results
tostring date, gen(dt)
gen date2=date(dt, "YMD")
* the code runs but generates no results

If a date variable has a display format %td it must be numeric and stored as some kind of integer. The display format is, and is only, an instruction to Stata on how to display such integers. Confusions about conversion often seem to hinge on a misunderstanding about what format means, as format is an overloaded word in computing, referring variously to file format (as in graphics file format, .png or .jpg or whatever); data layout (as in wide or long layout, structure or format); variable or storage type; and (here) display format. There could well be yet other meanings.
A date displayed as 30jan2015 is stored as an integer, namely
. display mdy(1, 30, 2015)
20118
and a glance at help data types shows that your variable date could be stored as an int, float, long or double. All would work, although int is least demanding of memory. You would need (e.g.) to run describe date to find out which type is being used in your case, but nothing to come in this answer depends on knowing that type. What Stata is doing and thinking can often be illuminated by running display on simple, single examples.
Your question is ambiguous.
Want to change display format? If you wish merely to see your dates in a display format exemplified by 20150130, then help datetime display formats shows which display format you need. Here it is tested with display, which can be abbreviated all the way down to di:
. di %tdCCYYNNDD 20118
20150130
so
format date %tdCCYYNNDD
is what you need. That instructs Stata to change the display format, but the numbers stored remain precisely as they were.
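The split between stored value and display format can be mimicked in Python. In this sketch (helper names are hypothetical), the stored value is a plain count of days since 01jan1960, and a display format is just a function from that count to text; the "%Y%m%d" pattern plays the role of %tdCCYYNNDD, and the stored integer never changes.

```python
from datetime import date, timedelta

STATA_EPOCH = date(1960, 1, 1)  # day 0 for Stata daily dates


def mdy(month: int, day: int, year: int) -> int:
    """Days since 01jan1960, mirroring Stata's mdy()."""
    return (date(year, month, day) - STATA_EPOCH).days


def show(days: int, pattern: str) -> str:
    """Render a stored day count with a strftime pattern (a 'display format')."""
    return (STATA_EPOCH + timedelta(days=days)).strftime(pattern)


stored = mdy(1, 30, 2015)
print(stored)                    # 20118
print(show(stored, "%Y%m%d"))    # 20150130, like %tdCCYYNNDD
print(show(stored, "%Y-%m-%d"))  # 2015-01-30, another display of the same value
print(stored)                    # still 20118
```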
Want such dates as variables held as integers? If you want the dates to be held as integers like 20150130 then you could convert it to string using the display format above, and then to a real value. A minimal sandbox dataset shows this:
. clear
. set obs 1
Number of observations (_N) was 0, now 1.
. gen date = 20118
. gen wanted = real(strofreal(date, "%tdCCYYNNDD"))
. format wanted %8.0f
. l
+------------------+
| date wanted |
|------------------|
1. | 20118 20150130 |
+------------------+
A display format such as %8.0f is needed to see such values directly.
Another method is to generate a large integer directly. You need to be explicit about a suitable storage type and (as just mentioned) need to set an appropriate display format, but it can be made to work:
. gen long also = 10000 * year(date) + 100 * month(date) + day(date)
. format also %8.0f
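Both routes to a YYYYMMDD integer can be cross-checked in a Python sketch: one goes through the formatted string (like real(strofreal(date, "%tdCCYYNNDD"))), the other builds the number from year, month and day (like 10000 * year(date) + 100 * month(date) + day(date)). The helper names are illustrative. Python integers do not overflow, so there is no analogue of the storage-type issue; in Stata, long is requested because values like 20150130 exceed 16,777,216, beyond which a float cannot hold every integer.

```python
from datetime import date, timedelta

STATA_EPOCH = date(1960, 1, 1)  # day 0 for Stata daily dates


def via_string(days: int) -> int:
    """Format as CCYYNNDD text, then convert the text to a number."""
    return int((STATA_EPOCH + timedelta(days=days)).strftime("%Y%m%d"))


def via_arithmetic(days: int) -> int:
    """Build the number directly from year, month and day."""
    d = STATA_EPOCH + timedelta(days=days)
    return 10000 * d.year + 100 * d.month + d.day


print(via_string(20118), via_arithmetic(20118))  # 20150130 20150130
```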
Want such dates as variables held as strings? This is the previous solution, but leave off the real(). The default display format will work fine.
. gen WANTED = strofreal(date, "%tdCCYYNNDD")
. l
+-----------------------------+
| date wanted WANTED |
|-----------------------------|
1. | 20118 20150130 20150130 |
+-----------------------------+
I have not used tostring here but as its original author I have no bias against it. The principles needed here are better illustrated using the underlying function strofreal(). The older name string() will still work.
Turning to your code,
tostring date, gen(dt)
will just put integers like 20118 in string form, so "20118", and Stata has no way to understand that on its own as a daily date. You could have run tostring with a format() option, which would have been equivalent to the code above. tostring would only have an advantage if you had several such variables to convert at once, as it loops over them for you.
I can't follow why you thought that conversion to a monthly date or use of a monthly date display format was needed or helpful, as at best you'd lose the information on day of the month. Thus at best Stata can only map a monthly date back to the first day of that month, and at worst a monthly date (here 660) could not be understood as anything you want.
. di mofd(20118)
660
. di %td mofd(20118)
22oct1961
. di %td dofm(mofd(20118))
01jan2015
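The monthly arithmetic behind these results is simple enough to reproduce. A Python sketch with hypothetical helpers mirroring mofd() and dofm(), assuming Stata's monthly dates count months from 1960m1 (as 0) and daily dates count days from 01jan1960: mofd() keeps only year and month, so dofm() can only return the first of the month, and reading a month count as a day count gives an unrelated date.

```python
from datetime import date, timedelta

STATA_EPOCH = date(1960, 1, 1)  # day 0 and month 0 (1960m1) in Stata


def mofd(days: int) -> int:
    """Month count since 1960m1 for a daily date (day of month is lost)."""
    d = STATA_EPOCH + timedelta(days=days)
    return (d.year - 1960) * 12 + (d.month - 1)


def dofm(months: int) -> int:
    """Daily date of the first day of a monthly date."""
    year, month0 = divmod(months, 12)
    return (date(1960 + year, month0 + 1, 1) - STATA_EPOCH).days


def as_td(days: int) -> date:
    """Interpret a number as a daily date, as %td would."""
    return STATA_EPOCH + timedelta(days=days)


m = mofd(20118)
print(m)               # 660
print(as_td(m))        # 1961-10-22, i.e. 22oct1961: a month count misread as days
print(as_td(dofm(m)))  # 2015-01-01: the 30th is gone
```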
There is no shortcut to understanding how Stata thinks about dates that doesn't involve reading the needed parts of help datetime and help datetime display formats.
Yet more explanation and examples can be found at https://www.stata-journal.com/article.html?article=dm0067

Referring to an exact date in calculation

I have a date variable (dd/mm/yyyy).
I need to create a similar variable that is equivalent to Dec. 31 2016 to use it in a calculation.
How would I do this?

You need to use the daily() function and then format the numeric variable accordingly:
clear
set obs 1
generate date = daily("31Dec2016", "DMY")
format %tdMonDDCCYY date
list
+-----------+
| date |
|-----------|
1. | Dec312016 |
+-----------+
Type help daily() and help format from Stata's command prompt for details.

I take it that you have a numeric daily date variable. Some people hold dates as strings, which isn't very useful in Stata, and there are other kinds of numeric date variable.
A date like 31 December 2016 is a constant which can be calculated as
. di mdy(12, 31, 2016)
20819
and for display could be
. di %td mdy(12, 31, 2016)
31dec2016
You can get the same result in other ways, such as
. di daily("31 Dec 2016", "DMY")
20819
Nothing stops you putting this constant in a variable, but that just copies the same value as many times as you have observations, and is for most purposes pointless. Either use it directly or make your code easier to understand by using some evocative macro or scalar name:
. local Dec_31_2016 = mdy(12, 31, 2016)
. local today = mdy(8, 7, 2018)
. di `today' - `Dec_31_2016'
584
I have guessed that the most likely use for a date constant is to calculate time elapsed since some benchmark date.
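The constants and the elapsed-time calculation can be checked with a Python sketch. It assumes only that Stata daily dates count days from 01jan1960; mdy() here is a hypothetical stand-in for the Stata function, and the named constants play the role of the local macros.

```python
from datetime import date

STATA_EPOCH = date(1960, 1, 1)  # day 0 for Stata daily dates


def mdy(month: int, day: int, year: int) -> int:
    """Days since 01jan1960, mirroring Stata's mdy()."""
    return (date(year, month, day) - STATA_EPOCH).days


DEC_31_2016 = mdy(12, 31, 2016)
TODAY = mdy(8, 7, 2018)  # the fixed "today" used above, not the real current date

print(DEC_31_2016)          # 20819
print(TODAY - DEC_31_2016)  # 584 days elapsed
```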

Transform string monthly dates in Stata

I have a problem in Stata with the format of the dates. I believe it is a very simple question but I can't see how to fix it.
I have a csv file (file.csv) that looks like
v1 v2
01/01/2000 1.1
01/02/2000 1.2
01/03/2000 1.3
...
01/12/2000 1.12
01/01/2001 1.1
...
01/12/2001 1.12
The form of v1 is dd/mm/yyyy.
I import the file in Stata using import delimited ...file.csv
v1 is a string variable, v2 is a float.
I want to transform v1 in a monthly date that Stata can read.
My attempts:
1)
gen Time = date(v1, "DMY")
format Time %tm
which gives me
Time
3177m7
3180m2
3182m7
...
that looks wrong.
2) Alternatively
gen v1_1=v1
replace v1_1 = substr(v1_1,4,length(v1_1))
gen Time_1 = date(v1_1, "MY")
format Time_1 %tm
which gives exactly the same result.
And if I type
tsset Time, format(%tm)
it tells me that there are gaps but there are no gaps in the data.
Could you help me to understand what I'm doing wrong?

Stata has wonderful documentation on dates and times, which you should read from beginning to end if you plan on using time-related variables. Reading this documentation will not only solve your current problem, but will potentially prevent costly errors in the future. The section related to your question is titled "SIF-to-SIF conversion." SIF means "Stata internal form."
To explain your current issue:
Stata stores dates as numbers; you interpret them as "dates" when you assign a format. Consider the following:
set obs 1
gen dt = date("01/01/2003", "DMY")
list dt
// 15706
So that date is assigned the value 15706. Let's format it to look like a day:
format dt %td
list
// 01jan2003
Now let's format it to be a month:
format dt %tm
list
// 3268m11
Notice that dt is just a number that you can format and use like a day or month. To get a "month number" from a "day number", do the following:
gen mt = mofd(dt) // mofd = month of day
format mt %tm
list
// dt mt
// 3268m11 2003m1
The variable mt now equals 516: January 2003 is 516 months after January 1960. Stata's "epoch time" is January 1, 1960 00:00:00.000. Daily date variables are stored as days since the epoch, and datetime variables as milliseconds since the epoch. A monthly date variable is stored as months since January 1960 (that's how the %tm format determines which month to show).
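To see where values like 3177m7 come from, and what converting first produces, here is a Python sketch. It assumes Stata's conventions (daily dates count days from 01jan1960, monthly dates count months from 1960m1), and the helper names are hypothetical: formatting a day count with %tm reads it as a month count, while applying mofd() first gives the intended month. Applied to the question's data, the corresponding Stata would be something like gen Time = mofd(daily(v1, "DMY")) followed by format Time %tm.

```python
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)  # day 0 for %td, month 0 (1960m1) for %tm


def daily_dmy(s: str) -> int:
    """Days since 01jan1960 for a "dd/mm/yyyy" string, like date(s, "DMY")."""
    return (datetime.strptime(s, "%d/%m/%Y").date() - STATA_EPOCH).days


def tm_label(months: int) -> str:
    """Render a month count the way %tm does, e.g. 516 -> "2003m1"."""
    year, month0 = divmod(months, 12)
    return f"{1960 + year}m{month0 + 1}"


def mofd(days: int) -> int:
    """Month count since 1960m1 for a daily date."""
    d = date.fromordinal(STATA_EPOCH.toordinal() + days)
    return (d.year - 1960) * 12 + (d.month - 1)


for s in ["01/01/2000", "01/02/2000", "01/01/2003"]:
    days = daily_dmy(s)
    # wrong: day count shown as %tm; right: mofd() first, then %tm
    print(s, days, tm_label(days), tm_label(mofd(days)))
```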

How to produce a formatted date string in Q/KDB?

How can one produce an ISO date string "yyyy-MM-dd" from a Q date type? I looked at concatenating the various parts but am not even able to get the day/month, e.g. d:2015.12.01;d.month prints 2015.12, i.e. more than just the month.

If you plan to do it on a large scale (i.e. a large vector/list of dates or a column in a table) and you're sure your dates are always well-formed, then you could use a dot-amend:
q)update .[;(::;4 7);:;"-"]string date from ([] date:2#.z.D)
date
------------
"2016-01-04"
"2016-01-04"
This way you wouldn't have to apply to "each" entry of the vector/list, it works on the vector/list itself.

q)"-" sv "." vs string[2015.12.01]
"2015-12-01"
vs (vector from scalar) splits the string at "." above;
sv (scalar from vector) joins the pieces with "-" above.
Remember a string is just a char vector, so you can grab each part you need by indexing. But the above is useful, as vs gives a 3-element list that you can manipulate any way you like.
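For readers coming from other languages, the two string tricks in these answers map onto ordinary operations. A Python sketch (not q): splitting q's "2015.12.01" rendering on "." and joining with "-" mirrors "-" sv "." vs string d, and overwriting fixed positions 4 and 7 mirrors the dot-amend .[;(::;4 7);:;"-"], which relies on every string being a well-formed 10-character date. The function names are illustrative.

```python
def iso_by_split_join(q_date: str) -> str:
    """Like "-" sv "." vs s: split on ".", join with "-"."""
    return "-".join(q_date.split("."))


def iso_by_position(q_date: str) -> str:
    """Like amending indices 4 and 7 with "-": assumes a "yyyy.mm.dd" layout."""
    chars = list(q_date)
    for i in (4, 7):
        chars[i] = "-"
    return "".join(chars)


print(iso_by_split_join("2015.12.01"))  # 2015-12-01
print(iso_by_position("2015.12.01"))    # 2015-12-01
```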

I believe the shortest (and cleanest) option for an ISO 8601 UTC timestamp, available since at least kdb+ v3.4, is the .h.iso8601 builtin,
i.e.
q).h.iso8601 .z.p
"2020-11-09T15:42:19.292301000"
Or, if you just need milliseconds similar to what JS toISOString() does, use:
q).isotime:{(23#.h.iso8601 x),"Z"}
q).isotime[.z.p]
"2020-11-09T16:02:02.601Z"
q).isotime[2015.12.01]
"2015-12-01T00:00:00.000Z"
Note that using .z.p (UTC) matters: .h.iso8601 .z.P would silently give you local time with no timezone offset (+0100 etc.). A parser may then take it as UTC, and with .isotime the appended "Z" states outright that it is UTC, which would be wrong. :(
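The .z.P pitfall has a direct parallel in other languages. A Python sketch (not q) of the .isotime idea, assuming a timezone-aware datetime as input: convert to UTC, truncate to milliseconds and mark UTC with "Z". Rejecting naive (offset-free, effectively local) datetimes avoids exactly the mislabelling described above; iso_utc_ms is a hypothetical name.

```python
from datetime import datetime, timezone


def iso_utc_ms(ts: datetime) -> str:
    """ISO 8601 with millisecond precision and a "Z" suffix, like toISOString()."""
    if ts.tzinfo is None:
        # A naive datetime carries no offset; appending "Z" would mislabel it as UTC.
        raise ValueError("expected a timezone-aware datetime")
    utc = ts.astimezone(timezone.utc)
    return utc.replace(tzinfo=None).isoformat(timespec="milliseconds") + "Z"


print(iso_utc_ms(datetime(2015, 12, 1, tzinfo=timezone.utc)))  # 2015-12-01T00:00:00.000Z
print(iso_utc_ms(datetime.now(timezone.utc)))
```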

Check out this GitHub library for datetime formatting. It supports Excel-style date and time format strings, though it might not be the right fit for formatting a large number of values.
q).dtf.format["yyyy-mm-dd"; 2018.06.08T01:02:03.456]
"2018-06-08"
Time formatting:
q).dtf.format["yyyy-mmmm-dd hh:uu AM/PM"; 2018.01.08T01:02:03.456]
"2018-January-08 01:02 AM"

I am using something like this:
q)ymd:{[x;s](4#d),s,(2#-5#d),s,-2#d:string[x]}
q)ymd[.z.D;"-"]
"2016-01-25"
q)ymd[.z.D;"/"]
"2016/01/25"
q)ymd[.z.D;""]
"20160125"
Or for tables:
q)t:([]a:5#1;5#.z.d)
q)update s:ymd[;"-"] each d from t
a d s
-------------------------
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
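The take operations in ymd pick fixed slices of the "yyyy.mm.dd" string: 4#d is the first four characters, 2#-5#d the first two of the last five (the month), and -2#d the last two (the day). A Python sketch of the same slicing, assuming the input is q's 10-character date rendering; this ymd is only a Python re-creation, not a q function.

```python
def ymd(q_date: str, sep: str) -> str:
    """Rebuild "yyyy.mm.dd" with a chosen separator using fixed slices."""
    year = q_date[:4]        # 4#d
    month = q_date[-5:][:2]  # 2#-5#d
    day = q_date[-2:]        # -2#d
    return year + sep + month + sep + day


for sep in ["-", "/", ""]:
    print(ymd("2016.01.25", sep))  # 2016-01-25, then 2016/01/25, then 20160125
```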

Change the separator (- or /) as needed in the update statement:
update s:{ssr[string x;".";y]}'[d;"-"] from ([]a:5#1;5?.z.d)
a d s
-------------------------
1 2010.12.31 "2010-12-31"
1 2012.08.24 "2012-08-24"
1 2004.12.05 "2004-12-05"
1 2000.10.02 "2000-10-02"
1 2006.09.10 "2006-09-10"