Using logical comparisons to group date ranges

I have a dataset with dates and I need to group all of the data into 4 groups. Below is the code I have tried to run.
I get the following error:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT,
LE, LT, MAX, MIN, NE, NG, NL, OR, [, ^=, {, |, ||, ~=.
Data _null_;
call symput ('timenow',put (time(),time.));
call symput ('datenow',put (date(), date9.));
run;
data Unemployment_Groups;
set WORK.import;
if missing(observation_date) then unemployment_Grp = .;
else if observation_date le 1969-12-31 THEN Unemployment_Grp = 1;
else if observation_date ge 1970-01-01 AND le 1984-12-31 THEN Unemployment_Grp = 2;
else if observation_date ge 1985-01-01 AND le 2007-12-31 THEN Unemployment_Grp = 3;
else if observation_date ge 2008-01-01 THEN Unemployment_Grp = 4;
run;
title "The current time is &timenow and the date is &datenow";
proc print data=Unemployment_Groups (obs=10) noobs;
run;

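Your date values aren't being recognised as dates. The general syntax for a date literal is 'DDMONYYYY'd (e.g. 27 November 2022 would be '27NOV2022'd). Written bare, 1969-12-31 is just arithmetic (1969 minus 12 minus 31, i.e. 1926).
You are also not using the logical operators correctly: each side of AND has to be a complete comparison, so "observation_date ge X AND le Y" is what triggers the syntax error. Either repeat the variable on both sides or use a chained comparison.
Try this: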
data unemployment_groups;
set import;
if missing(observation_date) then unemployment_Grp = .;
else if observation_date le '31DEC1969'd then Unemployment_Grp = 1;
else if '01JAN1970'd <= observation_date <='31DEC1984'd then Unemployment_Grp = 2;
else if '01JAN1985'd <= observation_date <='31DEC2007'd then Unemployment_Grp = 3;
else if observation_date ge '01JAN2008'd then Unemployment_Grp = 4;
run;
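To see the difference between the two notations, a small sketch like the following can be run on its own. The variable names (bare, literal, in_group2) and the sample date are made up for illustration and are not part of the original dataset.

```sas
data _null_;
  bare    = 1969-12-31;      /* plain arithmetic: 1969 minus 12 minus 31 */
  literal = '31DEC1969'd;    /* a SAS date value (days since 01JAN1960) */
  put bare= literal= date9.; /* writes both values to the log */

  /* Each side of AND is a complete comparison */
  observation_date = '15JUN1975'd;   /* hypothetical sample value */
  in_group2 = (observation_date ge '01JAN1970'd
               and observation_date le '31DEC1984'd);
  put in_group2=;
run;
```

The chained form used in the answer ('01JAN1970'd <= observation_date <= '31DEC1984'd) should behave the same way as the explicit AND shown here.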

Modification of a SAS macro to print dichotomous variable information

I'm trying to modify the following SAS macro so that it includes percentages for the variable CHD when it is equal to both 0 and 1. Currently this macro is only set up to print out the results of baseline variables when CHD (coronary heart disease) is equal to 1. I think the modification needs to occur within the data routfreq&i step, but I'm not quite sure how to set it up. I would then also need an additional column to print out "No Coronary Heart Disease * % (n)".
%macro categ(pred,i);
proc freq data = heart;
tables &pred * chd / chisq sparse outpct out = outfreq&i ;
output out = stats&i chisq;
run;
proc sort data = outfreq&i;
by &pred;
run;
proc means data = outfreq&i noprint;
where chd ne . and &pred ne .;
by &pred;
var COUNT;
output out=moutfreq&i(keep=&pred total rename=(&pred=variable)) sum=total;
run;
data routfreq&i(rename = (&pred = variable));
set outfreq&i;
length varname $20.;
if chd = 1 and &pred ne .;
rcount = put(count,8.);
rcount = "(" || trim(left(rcount)) || ")";
pctnum = round(pct_row,0.1) || " " || (rcount);
index = &i;
varname = vlabel(&pred);
keep &pred pctnum index varname;
run;
data rstats&i;
set stats&i;
length p_value $8.;
if P_PCHI <= 0.05 then do;
p_value = round(P_PCHI,0.0001) || "*";
if P_PCHI < 0.0001 then p_value = "<0.0001" || "*";
end;
else p_value = put(P_PCHI,8.4);
keep p_value index;
index = &i;
run;
data _null_;
set heart;
call symput("fmt",vformat(&pred));
run;
proc sort data = moutfreq&i;
by variable;
run;
proc sort data = routfreq&i;
by variable;
run;
data temp&i;
merge moutfreq&i routfreq&i;
by variable;
run;
data final&i;
merge temp&i rstats&i;
by index;
length formats $20.;
formats=put(variable,&fmt);
if not first.index then do;
varname = " ";
p_value = " ";
end;
drop variable;
run;
%mend;
%categ(gender,1);
%categ(smoke,2);
%categ(age_group,3);
%macro names(j,k,dataname);
%do i=&j %to &k;
&dataname&i
%end;
%mend names;
data categ_chd;
set %names(1,3,final);
label varname = "Demographic Characteristic"
total = "Total"
pctnum = "Coronary Heart Disease * % (n)"
p_value = "p-value * (2 sided)"
formats = "Category";
run;
ods listing close;
ods rtf file = "c:\nesug\table1a.rtf" style = forNESUG;
proc report data = categ_chd nowd split = "*";
column index varname formats total pctnum p_value;
define index /group noprint;
compute before index;
line ' ';
endcomp;
define varname / order = data style(column) = [just=left] width = 40;
define formats / order = data style(column) = [just=left];
define total / order = data style(column) = [just=center];
define pctnum / order = data style(column) = [just=center];
define p_value / order = data style(column) = [just=center];
title1 " NESUG PRESENTATION: TABLE 1A (NESUG 2004)";
title2 " CROSSTABS OF CATEGORICAL VARIABLES WITH CORONARY HEART DISEASE OUTCOME";
run;
ods rtf close;
ods listing;
Also, this code has the following error when it is run:
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1:2
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
3:111
I think this macro needs to be modified so that it doesn't crash when it runs with categorical/character variables.
The line
if chd = 1 and &pred ne .;
is what restricts your output to CHD = 1. To keep both levels, you would change it to:
if chd in (0, 1) and &pred ne .;
I do not understand your request for an additional column. Perhaps post an example of the current output and the output that you want?
As for the "errors" (actually notes, as they do not cause the system to stop processing), they occur when a variable is automatically converted from numeric to character or vice versa. The note gives the line and column of the code where each conversion happens. I prefer to eliminate these notes wherever possible to avoid unintended consequences of inappropriate coercion. To do this, you would make use of the PUT and INPUT functions.
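As a rough sketch of what explicit conversion looks like, the step below builds a "% (n)" string with PUT and reads a character value into a number with INPUT. The values and the names chd_char and chd_num are invented for illustration; pct_row, count and pctnum merely echo the names used in the macro.

```sas
data _null_;
  pct_row = 42.3456;   /* made-up row percentage */
  count   = 17;        /* made-up cell count */

  /* numeric -> character: PUT with an explicit format, so no conversion note */
  pctnum = strip(put(round(pct_row, 0.1), 8.1))
           || " (" || strip(put(count, 8.)) || ")";

  /* character -> numeric: INPUT with an explicit informat */
  chd_char = "1";
  chd_num  = input(chd_char, 8.);

  put pctnum= chd_num=;
run;
```

Concatenating a numeric variable directly with || (as in round(pct_row,0.1) || " ") is the kind of expression that produces the "Numeric values have been converted to character values" note.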

Count consecutive consonants in e-mail address in SAS SQL

I would like to identify the max number of consecutive consonants and vowels in an e-mail address, using SAS SQL (proc sql). The output should look like the one below in columns Max of consecutive consonants and max of consecutive vowels (I listed characters in first row for illustrative purposes only).
A few things to note:
Treat special and numeric characters as a count terminator. The 3rd email is a good example: you've got 2 consonants (hf), then numbers (98), and then again 2 consonants (jl). The output should be just 2 (hf).
I am only interested in the first part of the email (before #).
How do I achieve this, dear community?
E-mail                        Max of consecutive consonants   Max of consecutive vowels
asifhajhtysiofh#gmail.com     5 (jhtys)                       2 (io)
chris.nashfield#hotmail.com   3                               2
ahf98jla#gmail.com            2                               1
There is a routine called prxnext that proves very handy here.
Generate sample data
data emails;
input email $32.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
Do the counting
data checkEmails(keep = email maxCons maxVow);
set emails;
* Consonants;
re = prxparse("/[bcdfghjklmnpqrstvwxyz]+/");
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxCons = max(maxCons, len);
end;
* Vowels (y is already counted as a consonant above, so it is left out here);
re = prxparse("/[aeiou]+/");
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxVow = max(maxVow, len);
end;
run;
Results
Email                         MaxCons   MaxVow
asifhajhtysiofh#gmail.com     5         2
chris.nashfield#hotmail.com   3         2
ahf98jla#gmail.com            2         1
This was much trickier than I expected it to be, but I have a solution using macro loops that roughly follows the logic in @DaBigNikoladze's comment:
data temp;
input email $40.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
proc sql noprint;
select max(length(email)) into: max_email_length from temp;
quit;
%let vowels = "a" "e" "i" "o" "u";
%let consonants = "q" "w" "r" "t" "y" "p" "s" "d" "f" "g" "h" "j" "k" "l" "z" "x" "c" "v" "b" "n" "m";
%macro counter;
data temp_count;
set temp;
/* limit email to just the part before the #*/
email_short = substrn(email, 0, find(email, "#"));
email_vowels_only = email_short;
email_consonants_only = email_short;
/* keep only the vowels or consonants, respectively*/
%do i = 1 %to &max_email_length.;
if substr(email_vowels_only, &i., 1) notin(&vowels.) then substr(email_vowels_only, &i., 1) = " ";
if substr(email_consonants_only, &i., 1) notin(&consonants.) then substr(email_consonants_only, &i., 1) = " ";
%end;
run;
/* determine the max number of strings we have to scan through*/
proc sql noprint;
select max(max(countw(email_vowels_only)), max(countw(email_consonants_only))) into: loops from temp_count;
quit;
/* separate each string out into its own variable, find the max length of those variables, and drop those variables*/
proc sql;
create table temp_count_expand (drop = vowel_word: consonant_word:) as select
*,
%do j = 1 %to &loops.; scan(email_vowels_only, &j.) as vowel_word&j., %end;
%do k = 1 %to &loops.; scan(email_consonants_only, &k.) as consonant_word&k., %end;
max(%do j = 1 %to &loops.; length(calculated vowel_word&j.), %end; .) as max_vowels,
max(%do k = 1 %to &loops.; length(calculated consonant_word&k.), %end; .) as max_consonants
from temp_count;
quit;
%mend counter;
%counter;
I'm not sure why you specify proc sql for this task. A data step is much more suitable as you can loop through the email, treating everything that is either a non-consonant or a non-vowel as a delimiter. I've used a regular expression (prxchange) to remove the # portion of the email, although substr works just as well.
data have;
input Email $50.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
data want;
set have;
length _w1 _w2 $50;
_short_email=prxchange('s/#.+//',-1,email); /* remove everything from # onwards */
do _i = 1 by 1 until (_w1=''); /* loop through email, using everything other than consonants as the delimiter */
_w1 = scan(_short_email,_i,'bcdfghjklmnpqrstvwxyz','ki');
consonant = max(consonant,ifn(missing(_w1),0,length(_w1))); /* keep longest value */
end;
do _j = 1 by 1 until (_w2=''); /* loop through email, using everything other than vowels as the delimiter */
_w2 = scan(_short_email,_j,'aeiou','ki');
vowel = max(vowel,ifn(missing(_w2),0,length(_w2))); /* keep longest value */
end;
drop _: ; /* drop temporary variables */
run;

Iterating over multiple txt files and creating a new dataset for each in SAS

I am stuck with a problem in SAS. I have a bunch of monthly weather data in individual txt files. My current goal is to read those in and create a separate data set for each. Alternatively, it might be possible to skip this step and go straight to the end goal of merging all these data sets into another data set by date and time. Below is my attempt at the problem. I thought a macro that iterates through the file names and creates matching data set names would work, but apparently it does not. Also, I think the if/else if statements could be replaced by a DO loop to make it more efficient, but I could not figure out how. Help is much appreciated!
%macro loop;
%do i = 11 %to 13;
%do j = 01 %to 12;
%let year = i;
%let month = j;
data _&year&month ;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
length Time $ 4 Month $ 3 Day $ 2 Year $ 4 temp 3;
input time $ Month $ 10-13 Day Year temp 32-34;
Date = Day||Month||Year;
if time = '12AM' then time = 2400;
else if time = '1AM ' then time = 100;
else if time = '2AM ' then time = 200;
else if time = '3AM ' then time = 300;
else if time = '4AM ' then time = 400;
else if time = '5AM ' then time = 500;
else if time = '6AM ' then time = 600;
else if time = '7AM ' then time = 700;
else if time = '8AM ' then time = 800;
else if time = '9AM ' then time = 900;
else if time = '10AM' then time = 1000;
else if time = '11AM' then time = 1100;
else if time = '12PM' then time = 1200;
else if time = '1PM ' then time = 1300;
else if time = '2PM ' then time = 1400;
else if time = '3PM ' then time = 1500;
else if time = '4PM ' then time = 1600;
else if time = '5PM ' then time = 1700;
else if time = '6PM ' then time = 1800;
else if time = '7PM ' then time = 1900;
else if time = '8PM ' then time = 2000;
else if time = '9PM ' then time = 2100;
else if time = '10PM' then time = 2200;
else if time = '11PM' then time = 2300;
_time = input(time,4.);
time = _time;
drop month day year;
run;
%end;
%end;
%mend;
%loop; run:
In case anyone is wondering this is how a typical txt file looks: http://www.erh.noaa.gov/pbz/hourlywx/hr_pit_13.01
Here is a list of txt files in the same shape and form:
http://www.erh.noaa.gov/pbz/hourlyclimate.htm
The first fixes are in these two lines:
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
They resolve the macro variables (&i and &j rather than the literal text i and j) and add a leading zero to the month.
The rest of the changes just deal with AM/PM.
Also, Date is now numeric.
Full code:
%macro loop;
%do i = 11 %to 13;
%do j = 1 %to 12;
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
data _&year&month ;
length Date 5 _Time $4 Time 8 Month $3 Day $2 Year $4 temp 3;
format Date DATE9.;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
input _time $ Month $ 10-13 Day Year temp 32-34;
_time = right(_time);
Date = input(Day||Month||Year, date9.);
if _time = '12AM' or (_time ne '12PM' and index(_time, 'PM') > 1 )
then time=input(_time, 2.) + 12;
else time=input(_time, 2.);
time = time * 100;
drop month day year;
run;
/* gather all data in one table */
proc append base=work.all_data data=work._&year&month;
run;
%end;
%end;
%mend;
proc sql;
drop table work.all_data;
quit;
%let path=E:;
%loop;
It sounds like the best approach may be to read all the files into one dataset and then merge that into the final dataset. I also think you would be better served by a real time value than by 100-2400 (and 2400 for 12AM is inconsistent; it really should be 000 if you keep that scheme), because then you can just use input.
Anyway, if you just read the text files in like so:
data my_text_files;
infile "c:\mydirectory\*.txt" lrecl=whatever eov=eovmark;
*firstobs=27 is only respected for the first file - so we have to track with eovmark;
input @; *read the next line and hold it, which is what sets eovmark at a file boundary;
if eovmark then do;
eovmark=0;
linecounter=0;
end;
linecounter+1;
if linecounter ge 27 then do;
input (input statement);
(any other code you want to execute here);
output;
end;
run;
Then merge by (whatever). If you need to know some information about the filename you can use the filename option to get access to that in the infile statement.
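A filled-in version of that template might look like the sketch below. The directory, the wildcard pattern, the dataset name all_weather and the variables fname, source_file and time_txt are assumptions for illustration; the INPUT statement is borrowed from the question, with day and year read as numbers here.

```sas
data all_weather;
  length source_file fname $260 time_txt $4 month $3;
  /* Hypothetical location; the wildcard reads every matching file in turn */
  infile "c:\mydirectory\hr_pit_*.txt" eov=eovmark filename=fname truncover;

  input @;                        /* load the next line and hold it */
  if eovmark then do;             /* first line of a subsequent file */
    eovmark = 0;
    linecounter = 0;
  end;
  linecounter + 1;

  if linecounter ge 27 then do;   /* skip the 26 header lines of each file */
    input time_txt $ month $ 10-13 day year temp 32-34;
    source_file = fname;          /* FILENAME= variable is not saved, so copy it */
    output;
  end;
  drop linecounter;
run;
```

Keeping source_file on every row makes it possible to recover the year and month from the file name later, if the merge needs them.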

How to compare date values in a macro?

Here is the macro I'm running....
%macro ControlLoop(ds);
%global dset nvars nobs;
%let dset=&ds;
/* Open data set passed as the macro parameter */
%let dsid = %sysfunc(open(&dset));
/* If the data set exists, then check the number of obs ,,,then close the data set */
%if &dsid %then %do;
%If %sysfunc(attrn(&dsid,nobs))>0 %THEN %DO;;
%local dsid cols rctotal ;
%let dsid = %sysfunc(open(&DS));
%let cols=%sysfunc(attrn(&dsid, nvars));
%do %while (%sysfunc(fetch(&dsid)) = 0); /* outer loop across rows*/
/*0:Success,>0:NoSuccess,<0:RowLocked,-1:eof reach*/
%If fmt_start_dt<=&sysdate9 and fmt_end_dt>=sysdate9 %then %Do;
%do i = 1 %to &cols;
%local v t; /*To get var names and types using
varname and vartype functions in next step*/
%let v=%sysfunc(varname(&dsid,&i)); /*gets var names*/
%let t = %sysfunc(vartype(&dsid, &i)); /*gets variable type*/
%let &v = %sysfunc(getvar&t(&dsid, &i));/*To get Var values Using
GetvarC or GetvarN functions based on var data type*/
%end;
%CreateFormat(dsn=&dsn, Label=&Label, Start=&Start, fmtName=&fmtName, type=&type);
%END;
%Else %put ###*****Format Expired*****;
%END;
%END;
%else %put ###*****Data set &dset has 0 rows in it.*****;
%let rc = %sysfunc(close(&dsid));
%end;
%else %put ###*****open for data set &dset failed - %sysfunc(sysmsg()).*****;
%mend ControlLoop;
%ControlLoop(format_control);
Format_Control data:
DSN :$12. Label :$15. Start :$15. fmtName :$8. type :$3. fmt_Start_dt :mmddyy. fmt_End_dt :mmddyy.;
ssin.prd prd_nm prd_id mealnm 'n' 01/01/2013 12/31/9999
ssin.prd prd_id prd_nm mealid 'c' 01/01/2013 12/31/9999
ssin.fac fac_nm onesrc_fac_id fac1SRnm 'n' 01/01/2013 12/31/9999
ssin.fac fac_nm D3_fac_id facD3nm 'n' 01/01/2013 12/31/9999
ssin.fac onesrc_fac_id D3_fac_id facD31SR 'n' 01/01/2013 02/01/2012
oper.wrkgrp wrkgrp_nm wrkgrp_id grpnm 'n' 01/01/2013 12/31/9999
How can I compare fmt_Start_dt and fmt_end_dt with sysdate?
I tried something like %If fmt_start_dt<=&sysdate9 and fmt_end_dt>=sysdate9 %then %Do; in the code, but the values are not being picked up in the loop. Any ideas?
Thanks in advance.
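I'm not entirely sure what you want, but I think this might work:
%if &fmt_start_dt <= %sysfunc(today()) and &fmt_end_dt >= %sysfunc(today()) %then %do;
FETCH only moves to the next row; the values reach macro variables through your GETVAR loop (or a %SYSCALL SET call), so that loop has to run before the %IF, and the variables must be referenced with an ampersand. Without the ampersand, the macro processor compares the literal text fmt_start_dt rather than its value. Also, you should use the TODAY() function rather than the SYSDATE9 macro variable: TODAY() returns a numeric SAS date that compares directly with the stored dates, whereas &sysdate9 is text such as 27NOV2022 and holds the date on which the SAS session started.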
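A stripped-down sketch of that pattern is shown below. It uses %SYSCALL SET to link the dataset variables to macro variables of the same name, so each FETCH refreshes them. The macro name CheckDates and the cut-down control dataset are made up for illustration; only the column names come from the question.

```sas
/* Hypothetical, reduced version of the control data */
data format_control;
  input fmtName :$8. fmt_start_dt :mmddyy10. fmt_end_dt :mmddyy10.;
  datalines;
mealnm 01/01/2013 12/31/9999
facD31SR 01/01/2013 02/01/2012
;
run;

%macro CheckDates(ds);
  %local dsid rc today fmtName fmt_start_dt fmt_end_dt;
  %let today = %sysfunc(today());       /* numeric SAS date */
  %let dsid = %sysfunc(open(&ds));
  %if &dsid = 0 %then %do;
    %put Open failed: %sysfunc(sysmsg());
    %return;
  %end;
  /* Link dataset variables to the same-named macro variables */
  %syscall set(dsid);
  %do %while (%sysfunc(fetch(&dsid)) = 0);
    %if &fmt_start_dt <= &today and &fmt_end_dt >= &today %then
      %put &fmtName is active;
    %else
      %put &fmtName has expired;
  %end;
  %let rc = %sysfunc(close(&dsid));
%mend CheckDates;

%CheckDates(format_control)
```

Because the dates are stored as numbers, the comparison in %IF is a plain integer comparison and needs no date literal.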

Date validation from a textbox and inverted day/month values

I need to check if the date entered in a textbox is valid. It has to be a single textbox, so no workaround this way.
Now, I have this code:
Private Sub cmdOK_Click()
Dim dataAnalisi As Date
If IsDate(txtDataAnalisi.Value) Then
dataAnalisi = txtDataAnalisi.Value
Dim giornoAnalisi, meseAnalisi As Integer
giornoAnalisi = Format(dataAnalisi, "dd")
meseAnalisi = Format(dataAnalisi, "mm")
If giornoAnalisi <= 31 And meseAnalisi <= 12 Then
Call arrayList(dataAnalisi)
Unload Me
Else
GoTo DateError
End If
Else
DateError:
MsgBox "Please enter a correctly formatted date!", vbCritical, "Input error!"
txtDataAnalisi.SetFocus
End If
End Sub
Sorry if it has text in Italian. The function works decently, the only problem is that if I input for instance 11/14/12 (where the date is dd/mm/yy and 14 was a mistype) it inverts the day and month values. Instead, I want the sub to tell the user to check his input again! Can you help me? Thank you!
There are variations of this question every month or so. I am convinced that Excel will treat a date that is a valid American date as an American date. I have thought this for many years but others disagree.
I use functions like the one below which check for formats I believe Excel will misinterpret and convert them to an unambiguous format.
I use the English abbreviations for months. I believe French is the only language that does not permit three character abbreviations for months so perhaps you have your own set. You will have to adapt that part of the routine to your requirement.
Hope this helps.
Function MyDateValue(ByVal DateIn As String, ByRef DateOut As Date) As Boolean
' DateIn is a value to be checked as a valid date.
' If it is a valid date, DateOut is set to its value and the function
' returns True.
' Excel misinterprets dates such as "4/14/11" as 14 April 2011. This routine
' checks for such dates and, if necessary, changes them to an unambiguous
' format before calling IsDate and DateValue.
Dim DatePart() As String
Dim MonthNum As Long
Const MonthAbbr As String = "janfebmaraprmayjunjulaugsepoctnovdec"
' Replace popular delimiters with Microsoft standard
DateIn = Replace(DateIn, "-", "/")
DateIn = Replace(DateIn, "\", "/")
DatePart = Split(DateIn, "/")
If UBound(DatePart) = 2 Then
' DateIn is three values separated by delimiters
' Check middle part
If IsNumeric(DatePart(1)) Then
MonthNum = Val(DatePart(1))
If MonthNum >= 1 And MonthNum <= 12 Then
' Middle part could be numeric month
' Convert to format Excel does not misinterpret
'Debug.Assert False
DatePart(1) = Mid(MonthAbbr, ((MonthNum - 1) * 3) + 1, 3)
DateIn = Join(DatePart, "-")
If IsDate(DateIn) Then
DateOut = DateValue(DateIn)
MyDateValue = True
Exit Function
End If
Else
' Middle part cannot be a month
'Debug.Assert False
MyDateValue = False
Exit Function
End If
Else
'Debug.Assert False
' The middle part is not a number. It could be a month abbreviation
MonthNum = InStr(1, MonthAbbr, LCase(DatePart(1)))
If MonthNum = 0 Then
' The middle portion is neither a month number nor a month abbreviation
'Debug.Assert False
MyDateValue = False
Else
' The middle portion is a month abbreviation.
' Excel will handle date correctly
'Debug.Assert False
MonthNum = (MonthNum - 1) / 3 + 1
DateIn = Join(DatePart, "-")
If IsDate(DateIn) Then
'Debug.Assert False
DateOut = DateValue(DateIn)
MyDateValue = True
Exit Function
End If
End If
End If
Else
' Debug.Assert False
' Use IsDate for other formats
If IsDate(DateIn) Then
' Debug.Assert False
DateOut = DateValue(DateIn)
MyDateValue = True
Exit Function
Else
' Debug.Assert False
MyDateValue = False
End If
End If
End Function