I have a problem in Stata with the format of the dates. I believe it is a very simple question but I can't see how to fix it.
I have a csv file (file.csv) that looks like
v1 v2
01/01/2000 1.1
01/02/2000 1.2
01/03/2000 1.3
...
01/12/2000 1.12
01/02/2001 1.1
...
01/12/2001 1.12
The form of v1 is dd/mm/yyyy.
I import the file in Stata using import delimited ...file.csv
v1 is a string variable, v2 is a float.
I want to transform v1 into a monthly date that Stata can read.
My attempts:
1)
gen Time = date(v1, "DMY")
format Time %tm
which gives me
Time
3177m7
3180m2
3182m7
...
that looks wrong.
2) Alternatively
gen v1_1=v1
replace v1_1 = substr(v1_1,4,length(v1_1))
gen Time_1 = date(v1_1, "MY")
format Time_1 %tm
which gives exactly the same result.
And if I type
tsset Time, format(%tm)
it tells me that there are gaps but there are no gaps in the data.
Could you help me to understand what I'm doing wrong?
Stata has wonderful documentation on dates and times, which you should read from beginning to end if you plan on using time-related variables. Reading this documentation will not only solve your current problem, but will potentially prevent costly errors in the future. The section related to your question is titled "SIF-to-SIF conversion." SIF means "Stata internal form."
To explain your current issue:
Stata stores dates as numbers; you interpret them as "dates" when you assign a format. Consider the following:
set obs 1
gen dt = date("01/01/2003", "DMY")
list dt
// 15706
So that date is assigned the value 15706. Let's format it to look like a day:
format dt %td
list
// 01jan2003
Now let's format it to be a month:
format dt %tm
list
// 3268m11
Notice that dt is just a number that you can format and use like a day or month. To get a "month number" from a "day number", do the following:
gen mt = mofd(dt) // mofd = month of day
format mt %tm
list
// dt mt
// 3268m11 2003m1
The variable mt now equals 516. January 2003 is 516 months from January 1960. Stata's "epoch time" is January 1, 1960 00:00:00.000. Date variables are stored as days since the epoch time, and datetime variables are stored as milliseconds since the epoch time. A month variable can be stored as months since the epoch time (that's how the %tm formatting determines which month to show).
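The arithmetic behind this can be checked outside Stata. The following is a minimal Python sketch, not Stata code; the function names are made up for illustration, and format_as_month only imitates how %tm reads an integer as "months after 1960m1".

```python
from datetime import date

EPOCH = date(1960, 1, 1)  # the epoch described above

def days_since_epoch(d):
    """Day number, i.e. what a daily date variable stores."""
    return (d - EPOCH).days

def months_since_epoch(d):
    """Month number, i.e. what mofd() derives from a day number."""
    return (d.year - EPOCH.year) * 12 + (d.month - 1)

def format_as_month(n):
    """Read any integer as 'n months after 1960m1' (imitates %tm)."""
    return f"{1960 + n // 12}m{n % 12 + 1}"

d = date(2003, 1, 1)
print(days_since_epoch(d))                     # 15706
print(format_as_month(days_since_epoch(d)))    # 3268m11: a day number misread as months
print(months_since_epoch(d))                   # 516
print(format_as_month(months_since_epoch(d)))  # 2003m1
```

The same misreading explains the 3177m7 in the question: 1 January 2000 is day 14610, and 14610 months after January 1960 falls in the year 3177.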
I've looked for help on the internet for the following, but I could not find a satisfying answer: for an assignment, I need to plot the time series of a certain variable (the term spread in percentages), with years on the x-axis.
However, we use daily data. Does anybody know a convenient way in which this can be done? The 'date' variable that I've got is formulated in the following way: 20111017 represents the 17th of October 2011.
I tried to extract the first 4 numbers of the variable 'date', by using the substr(date, 1, 4) command, but the message 'type mismatch' popped up. Also, I'm not quite sure if it gives the right information if I only use the years to plot daily data (over the years). It now gives the following graph, which doesn't look that nice.
Answering the question in your title.
The date() function expects a string. If your variable with value 20111017 is in a numeric format you can convert it like this: tostring datenum , gen(datestr).
Then when using the date() function you must provide a mask that tells Stata what format the date string is in. Below is a reproducible example you can run to see how this works.
* Example generated by -dataex-. For more info, type help dataex
clear
input float datenum
20111016
end
* Convert numeric variable to string
tostring datenum , gen(datestr)
* Convert string to date
gen date = date(datestr, "YMD")
* Display date as date
format date %td
If this does not help you, try to provide a reproducible example.
This adds some details to the helpful answer by @TheIceBear.
As he indicates, one way to get a Stata daily date from your run-together date variable is to convert it to a string first. But tostring is just one way to do that and not essential. (I have nothing against tostring, as its original author, but it is better suited to other tasks.)
Here I use daily() not date(): the results are identical, but it's a good idea to use daily(): date() is all too often misunderstood as a generic date function, whereas all it does is produce daily dates (or missings).
To get a numeric year variable, just divide by 10000 and round down. You could convert to a string, extract the first 4 characters, and then convert to numeric, but that's more operations.
clear
set obs 1
gen long date = 20111017
format date %8.0f
gen ddate = daily(strofreal(date, "%8.0f"), "YMD")
format %td ddate
gen year = floor(date/10000)
list
+-----------------------------+
| date ddate year |
|-----------------------------|
1. | 20111017 17oct2011 2011 |
+-----------------------------+
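The "divide by 10000 and round down" step is plain integer arithmetic, so it can be sketched in any language. Here is a small Python illustration; split_yyyymmdd is a hypothetical helper, and pulling out the month and day the same way is an extension beyond what the answer needs.

```python
from datetime import date

def split_yyyymmdd(n):
    """Split an integer such as 20111017 into (year, month, day)."""
    year = n // 10000        # same as floor(n / 10000)
    month = n // 100 % 100   # drop the day, then keep the last two digits
    day = n % 100            # last two digits
    return year, month, day

year, month, day = split_yyyymmdd(20111017)
print(year)                    # 2011
print(date(year, month, day))  # 2011-10-17
```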
I have dataset with dates stored as strings in a format ddMonyy e.g. 19Dec16.
When converting the strings using date7. informat to SAS date, some years are interpreted as 19yy and some as 20yy.
Here is a sample code
data strDates;
infile cards;
input StringDate $;
cards;
31Dec99
01Jan00
19Dec16
31Dec25
01Jan26
;
run;
data convertTest;
set strDates;
format Date date9.;
Date=input(StringDate,date7.);
run;
Running the code today (19 Dec 2016) produces the following results
StringDate Date
31Dec99 31DEC1999
01Jan00 01JAN2000
19Dec16 19DEC2016
31Dec25 31DEC2025
01Jan26 01JAN1926
Dates between 01Jan00 and 31Dec25 are assigned to years 2000-2025, while dates from 01Jan26 to 31Dec99 are treated as years 1926-1999.
Question:
How is it determined whether 2000 or 1900 is to be added to the year? I suspect it depends on the runtime (the calendar year when the code is run?), but I was not able to find any reference to this in the SAS documentation.
There is a system option, YEARCUTOFF, which depending on your system and version probably has a value of either 1920 or 1926. See KB note 46368 for more information on the change.
It sounds like you're using SAS 9.4, which means the default is 1926: two-digit years from 00-25 are read as '20xx' and those from 26-99 as '19xx'. You can change the YEARCUTOFF option if that value does not work for your data (or construct the 4-digit year yourself).
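The cutoff defines a fixed 100-year window, so the result does not depend on the day the code runs. Below is a minimal Python sketch of that windowing rule; it is an illustration of the idea under the assumption that the window starts at the cutoff year, not SAS's actual implementation, and expand_two_digit_year is a made-up name.

```python
def expand_two_digit_year(yy, yearcutoff=1926):
    """Place a two-digit year in the 100-year window starting at yearcutoff."""
    century = yearcutoff - yearcutoff % 100  # 1900 for a cutoff of 1926
    year = century + yy
    if year < yearcutoff:                    # falls before the window: move up a century
        year += 100
    return year

for yy in (99, 0, 16, 25, 26):
    print(f"{yy:02d} -> {expand_two_digit_year(yy)}")
# 99 -> 1999
# 00 -> 2000
# 16 -> 2016
# 25 -> 2025
# 26 -> 1926
```

With a cutoff of 1920 the same function maps 25 to 1925, which is why the results differ between versions with different defaults.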
I want to assign the current year in a YY format to either a macro or data set variable.
I am able to use the automatic macro variables &sysdate or &sysdate9 to get the current date. However, extracting the year in a YY format is proving to be a nightmare. Below are some examples of what I've been trying.
There exists the YEARw. format. But when I try to use it I get errors or weird results. For instance, running
data _null_;
yy = year(input("&sysdate9.", year2.));
put yy=;
run;
produces the error
ERROR 48-59: The informat YEAR was not found or could not be loaded.
If I try to format the variable in the output, I get 1965 instead of the current year. The following
data _null_;
yy = year(input("&sysdate9.", date9.));
put yy= yy year2.;
run;
outputs
yy=2016 65
Please help.
This works to get you the 2-digit year number of the current year:
DATA _NULL_;
YEAR = PUT(TODAY(),YEAR2.);
PUT YEAR;
RUN;
/* Returns: 16 */
To break down what I am doing here:
I use TODAY() to get the current date as a date value. &SYSDATE would first need to be converted to a date, and it holds the date on which the SAS session started, whereas TODAY() returns the current date.
PUT allows us to pass in a non-character (numeric/date) value, which is why it is used with TODAY() as opposed to INPUT.
I think it is worth exploring the issues here in more detail.
First, Formats are patterns for converting numeric values to a human readable format. That's what you want to do here: convert a date value to a human readable format, in this case to a year.
Informats, on the other hand, convert human readable information to numeric values. That's not what you're doing here; you have a value already.
Second, put matches with Formats, and input matches with informats, exclusively.
Third, you get close in your last try: but you misuse the year format. Formats are basically value mappings, so they map every possible numeric value in their range (sometimes "all values" is the range, sometimes not) to a display value (string). You need to know what kind of value is expected on the input. YEARw. expects a date value as input, not a year value: meaning input is "number of days from 1/1/1960", mapped to "year". So you cannot take a value you've already mapped to a year value and map it again with that method; it will not make any sense.
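To make the third point concrete: treating the number 2016 as "days from 1/1/1960" lands in 1965, which is where the 65 in the question comes from. This is a Python sketch of that arithmetic only; year2_of_sas_date is a hypothetical helper that imitates what a two-digit year format would display.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 of SAS date values

def year2_of_sas_date(value):
    """Two-digit year of the calendar date that lies `value` days after the epoch."""
    return f"{(SAS_EPOCH + timedelta(days=value)).year % 100:02d}"

print(SAS_EPOCH + timedelta(days=2016))  # 1965-07-09
print(year2_of_sas_date(2016))           # 65: the year number misread as a day count

real_date_value = (date(2016, 12, 19) - SAS_EPOCH).days
print(year2_of_sas_date(real_date_value))  # 16

print(2016 % 100)  # 16: the mod route, starting from the year number
```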
Let's look at it:
data _null_;
yy = year(input("&sysdate9.", date9.));
put yy= yy year2.;
run;
yy contains the result of the year function - 2016. Good so far. Now, you need the 2 digit year (16); you can get that through mod function, if you like, or put/substr/input:
data _null_;
yy = input(substr(put(year(input("&sysdate9.", date9.)),4.),3,2),2.);
put yy=;
run;
mod is probably easier though, since it's a number. But of course you could have applied the year2. format directly to the date value:
data _null_;
yy = put(input("&sysdate9.", date9.),year2.);
put yy=;
run;
Now, yy is character, so you could wrap that with input(...,2.) or leave it character depending on your purposes.
Finally - a use note on &sysdate9.. You can easily make this a date without input:
"&sysdate9."d
So:
yy = put("&sysdate9."d,year2.);
That's called a date literal (and "..."dt and "..."t also work for datetime,time). They require things in the standard SAS formats to work properly.
And as pointed out in Nicarus' answer, today() is a bit better than &sysdate9 since it is guaranteed to be today. If you're running this in batch or restart your session daily, this won't matter, but it will if you have a long-running session.
Apply the year function to the date variable
Convert to string
Take last 2 digits
EDIT: change input to PUT
Year = substr(put(year(today()), 4.), 3);
Hi, I have a date conversion problem in SAS.
I imported an Excel file which has the following dates:
2012-01-09
2011-01-31
2010-06-28
2005-06-10
2012-09-19
2012-09-19
2007-06-12
2012-09-20
2004-11-01
2007-03-27
2008-06-23
2006-04-20
2012-09-20
2010-07-14
After the import, the dates have changed to this:
40917
40574
40357
38513
41171
41171
39245
41172
38292
39168
39622
38827
41172
40373
I have used the input function to convert the dates, but it gives a strange result.
The code I used:
want_date=input(have_date, anydtdte12.);
informat want_date date9.; format have_date date9.;run;
I get very strange, out-of-this-world dates. Any idea how I can convert these?
You can encourage SAS to convert the data as date during the import, although this isn't necessarily a panacea.
proc import file=whatever out=whatever dbms=excel replace;
dbdsopts=(dbSasType=( datevar=date ) );
run;
where datevar is your date column name. This tells SAS to expect this to be a date and to try to convert it.
See So Your Data Are in Excel for more information, or the documentation.
From : http://www2.sas.com/proceedings/sugi29/068-29.pdf
Times are counted internally in SAS as seconds since midnight and
date/time combinations are calculated as the number of seconds since
midnight 1 January 1960.
Excel also uses simple numerical values for dates and times
internally. For the date values the difference with the SAS date is
only the anchor point. Excel uses 1 January 1900 as day one.
So the conversion is just a constant offset.
EXAMPLES:
SAS_date = Excel_date - 21916;
SAS_time = Excel_time * 86400;
SAS_date_time = (Excel_date_time - 21916) * 86400;
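The constant 21916 can be sanity-checked with ordinary date arithmetic. The Python sketch below assumes that the effective day 0 of Excel's 1900 date system is 30 December 1899 (the shift from "1 January 1900 as day one" comes from Excel counting a 29 February 1900 that never existed); the names are illustrative only.

```python
from datetime import date, timedelta

EXCEL_DAY_ZERO = date(1899, 12, 30)  # assumption: effective day 0 for serials after Feb 1900
SAS_EPOCH = date(1960, 1, 1)         # day 0 of SAS date values

OFFSET = (SAS_EPOCH - EXCEL_DAY_ZERO).days
print(OFFSET)  # 21916

def excel_serial_to_date(serial):
    """Convert an Excel serial day to a calendar date via the SAS date value."""
    sas_value = serial - OFFSET
    return SAS_EPOCH + timedelta(days=sas_value)

print(excel_serial_to_date(40917))  # 2012-01-09
print(excel_serial_to_date(40574))  # 2011-01-31
```

The two serials are the first two imported values from the question, and they map back to the first two dates in the Excel file.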
As Justin wrote, you need to correct for the different zero date (SAS vs. Excel).
Then you just need to apply a format (if you want to get a date variable to do calculations):
want_date = have_date-21916;
format want_date date9.;
Or convert it to a string:
want_date = put(have_date-21916, date9.);
In either case you can choose the date format you prefer.
I'm trying to use joda-time to parse a date string of the form YYYY-MM-DD. I have test code like this:
DateTimeFormatter dateDecoder = DateTimeFormat.forPattern("YYYY-MM-DD");
DateTime dateTime = dateDecoder.parseDateTime("2005-07-30");
System.out.println(dateTime);
Which outputs:
2005-01-30T00:00:00.000Z
As you can see, the DateTime object produced is 30 Jan 2005, instead of 30 July 2005.
Appreciate any help. I just assumed this would work because it's one of the date formats listed here.
The confusion is with what the ISO format actually is. YYYY-MM-DD is not the ISO format, the actual resulting date is.
So 2005-07-30 is in ISO-8601 format, and the spec uses YYYY-MM-DD to describe the format. There is no connection between the use of YYYY-MM-DD as a pattern in the spec and any piece of code. The only constraint the spec places is that the result consists of a 4-digit year followed by a dash, followed by a 2-digit month, followed by a dash, followed by a 2-digit day-of-month.
As such, the spec could have used $year4-$month2-$day2, which would equally well define the output format.
You will need to search and replace any input pattern to convert "Y" to "y" and "D" to "d".
I've also added some enhanced documentation of formatting.
Your answer is in the docs: http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
The string format should be something like: "yyyy-MM-dd".
The date format described in the w3 document and JodaTime's DateTimeFormat are different.
More specifically, in DateTimeFormat, the pattern DD is for Day in year, so the value for DD of 30 is the 30th day in the year, ie. January 30th. As the formatter is reading your date String, it sets the month to 07. When it reads the day of year, it will overwrite that with 01 for January.
You need to use the pattern strings expected by DateTimeFormat, not the ones expected by the w3 date and time formats. In this case, that would be
DateTimeFormatter dateDecoder = DateTimeFormat.forPattern("yyyy-MM-dd");
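The day-of-year effect is not specific to Joda-Time and can be illustrated in a few lines of Python (not Java, and not Joda's API). Here from_day_of_year is a made-up helper showing what "day 30 of 2005" means, and %d in strptime plays the role that dd plays in the corrected pattern.

```python
from datetime import date, datetime, timedelta

def from_day_of_year(year, day_of_year):
    """Calendar date of the given day-of-year (1 = 1 January)."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

# Reading 30 as a day of year gives January, whatever month was parsed before it.
print(from_day_of_year(2005, 30))  # 2005-01-30

# Reading 30 as a day of month keeps July.
parsed = datetime.strptime("2005-07-30", "%Y-%m-%d").date()
print(parsed)                       # 2005-07-30
print(parsed.timetuple().tm_yday)   # 211: the day-of-year of 30 July 2005
```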