I am new to SPSS macros. I intend to automate CTABLES production. In the CTABLES everything remains the same except the variable.
My command is:
CTABLES
/VLABELS VARIABLES=usevar anzahl gesamt F0passthrough DISPLAY=LABEL
/TABLE usevar [C][COLPCT.VALIDN '' PCT40.1] + anzahl [C][COUNT '' F40.0] BY gesamt + F0passthrough
/SLABELS POSITION=ROW
/CATEGORIES VARIABLES=usevar anzahl ORDER=A KEY=VALUE EMPTY=INCLUDE
/CATEGORIES VARIABLES=gesamt F0passthrough ORDER=A KEY=VALUE EMPTY=EXCLUDE.
filter off.
usevar is the variable I aim to exchange with the macro (my variables, for example, are F5, F6, F7).
So I tried:
DEFINE !usevar()
F1 F5
!ENDDEFINE.
CTABLES
/VLABELS VARIABLES=usevar anzahl gesamt F0passthrough DISPLAY=LABEL
/TABLE usevar [C][COLPCT.VALIDN '' PCT40.1] + anzahl [C][COUNT '' F40.0] BY gesamt + F0passthrough
/SLABELS POSITION=ROW
/CATEGORIES VARIABLES=usevar anzahl ORDER=A KEY=VALUE EMPTY=INCLUDE
/CATEGORIES VARIABLES=gesamt F0passthrough ORDER=A KEY=VALUE EMPTY=EXCLUDE.
filter off.
Any help is much appreciated. I did not provide sample data; I just need a hint in the right direction.
First of all, if you define the macro with the name "!usevar", you have to use the same name in the syntax - "usevar" won't do.
Anyway I suggest a different approach to the macro:
define !MyCtabMacro (!pos=!cmdend)
CTABLES
/VLABELS VARIABLES=!1 anzahl gesamt F0passthrough DISPLAY=LABEL
/TABLE !1 [C][COLPCT.VALIDN '' PCT40.1] + anzahl [C][COUNT '' F40.0] BY gesamt + F0passthrough
/SLABELS POSITION=ROW
/CATEGORIES VARIABLES=!1 anzahl ORDER=A KEY=VALUE EMPTY=INCLUDE
/CATEGORIES VARIABLES=gesamt F0passthrough ORDER=A KEY=VALUE EMPTY=EXCLUDE.
filter off.
!enddefine.
Now you can call your macro to create a table for each of your variables, for example:
!MyCtabMacro F5.
!MyCtabMacro F6.
If you are going to do this for many variables, you can let the macro loop through them:
define !MyCtabMacro (!pos=!cmdend)
!do !onevar !in(!1)
CTABLES
/VLABELS VARIABLES=!onevar anzahl gesamt F0passthrough DISPLAY=LABEL
/TABLE !onevar [C][COLPCT.VALIDN '' PCT40.1] + anzahl [C][COUNT '' F40.0] BY gesamt + F0passthrough
/SLABELS POSITION=ROW
/CATEGORIES VARIABLES=!onevar anzahl ORDER=A KEY=VALUE EMPTY=INCLUDE
/CATEGORIES VARIABLES=gesamt F0passthrough ORDER=A KEY=VALUE EMPTY=EXCLUDE.
filter off.
!doend
!enddefine.
Now to call the macro:
!MyCtabMacro F5 F6 F7 F8 F9.
Note: for a macro loop you can't use "F5 to F9", you have to list all the variables separately as in my example.
The documentation guide for DEFINE / ENDDEFINE can be a bit scary at first, so to understand all its features it's best to play around with examples.
I share three examples below, which should give you some indication of where you might be going wrong:
GET FILE="C:\Program Files\IBM\SPSS\Statistics\24\Samples\English\Employee data.sav".
/* Example1: Using macro as a global string substitution for variable names */.
DEFINE !MyMac1 () educ jobcat !ENDDEFINE.
FREQ !MyMac1.
/* Example2: Having command in the body of macro with variable input as an argument */.
/* Result: Notice only single FREQ command is run with two variables */.
DEFINE !MyMac2 (VARS=!CMDEND).
SET MPRINT ON.
FREQ !VARS.
SET MPRINT OFF.
!ENDDEFINE.
!MyMac2 vars=educ jobcat.
/* Example3: Having command in the body of macro with variable input as an argument */
/* but looping over each variable */.
/* Result: Notice two separate FREQ commands are run */.
/* with one variable each, i.e. looped for each variable */.
DEFINE !MyMac3 (VARS=!CMDEND).
SET MPRINT ON.
!DO !I !IN (!VARS)
FREQ !i.
!DOEND
SET MPRINT OFF.
!ENDDEFINE.
!MyMac3 vars=educ jobcat.
The features of DEFINE/ENDDEFINE can be used in various ways. Once you build some knowledge of them all, you'll soon develop a particular style of how you prefer to code your macros. If you are learning SPSS macros for the first time and have some knowledge of (or interest in) Python, then I would encourage you NOT to start a journey of learning SPSS macros, but instead to learn Python: this type of macro building is much more efficient (and fun!) to code in Python, among many other benefits.
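As a taste of that Python route, here is a minimal sketch (my own, not from the question) that builds the same CTABLES syntax string per variable; inside SPSS you would pass each string to spss.Submit() from a BEGIN PROGRAM block:

```python
# Build the CTABLES syntax for one row variable; everything except the
# variable name is fixed, exactly as in the macro above.
def ctables_syntax(var):
    return (
        "CTABLES\n"
        "  /VLABELS VARIABLES={v} anzahl gesamt F0passthrough DISPLAY=LABEL\n"
        "  /TABLE {v} [C][COLPCT.VALIDN '' PCT40.1] + anzahl [C][COUNT '' F40.0]"
        " BY gesamt + F0passthrough\n"
        "  /SLABELS POSITION=ROW\n"
        "  /CATEGORIES VARIABLES={v} anzahl ORDER=A KEY=VALUE EMPTY=INCLUDE\n"
        "  /CATEGORIES VARIABLES=gesamt F0passthrough ORDER=A KEY=VALUE EMPTY=EXCLUDE."
    ).format(v=var)

for var in ["F5", "F6", "F7"]:
    syntax = ctables_syntax(var)
    # Inside SPSS (Python integration plug-in): spss.Submit(syntax)
    print(syntax)
```

Unlike the macro loop, a Python loop happily works over any iterable, so "F5 to F9"-style ranges can be generated programmatically.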
Could anybody help me with these two problems, please?
The first one is almost solved for me by the question regular expression for multiline commentary in matlab, but I do not know exactly how I should use ^.*%\{(?:\R(?!.*%\{).*)*\R\h*%\}$, or where in the grammar, if I want to use it with ANTLR4. I have been using the MATLAB grammar from this source.
The second one is related to another type of comment in MATLAB, e.g. a = 3 % type any ascii I want.... In this case it worked when I added an alternative to the rule unary_expression in this form:
unary_expression
: postfix_expression
| unary_operator postfix_expression
| postfix_expression COMMENT
;
where COMMENT: '%' [ a-zA-Z0-9]*;, but when I use [\x00-\x7F] instead of [ a-zA-Z0-9]* (which I found here), parsing goes wrong; see the example below:
INPUT FOR PARSER: a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
ANTLR OUTPUT : Exception in thread "main" java.lang.RuntimeException: set is empty
at org.antlr.v4.runtime.misc.IntervalSet.getMaxElement(IntervalSet.java:421)
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:169)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
at org.antlr.v4.Tool.generateInterpreterData(Tool.java:745)
at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:400)
at org.antlr.v4.Tool.process(Tool.java:361)
at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:328)
at org.antlr.v4.Tool.main(Tool.java:172)
line 1:9 token recognition error at: '$'
line 1:20 token recognition error at: '"'
line 1:21 token recognition error at: '!'
line 1:22 token recognition error at: '"'
line 1:38 token recognition error at: '$'
line 1:43 token recognition error at: '"'
line 1:10 missing {',', ';', CR} at 'L'
line 1:32 missing {',', ';', CR} at '3'
Can anybody please tell me what I have done wrong? And what is the best practice for this problem? (I am not exactly a regex person...)
Let's take the simple one first.
This looks (to me) like a typical "comment everything through the end of the line" comment.
Assuming I'm correct, it's best not to consider all the valid characters that might be contained, but rather to think about what not to consume.
Try: COMMENT: '%' ~[\r\n]* '\r'? '\n';
(I notice that you did not include anything in your rule to terminate it at the end of the line, so I've added that).
This basically says: once I see a %, consume everything that is not a \r or \n, and stop when you see an optional \r followed by a required \n.
Generally, comments can occur just about anywhere within a grammar structure, so it's VERY useful to "shove them off to the side" rather than inject them everywhere you allow them in the grammar.
So, a short grammar:
grammar test;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
You'll notice that I removed the COMMENT element from the test rule.
test file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
(be sure to include the \n)
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,49:48='<EOF>',<EOF>,2:0]
(test a = 3)
You still get a COMMENT token, it's just ignored when matching the parser rules.
Now for the multiline comments:
ANTLR uses a rather "regex-like" syntax for lexer rules but, don't be fooled, it's not regex (it's actually more powerful, as it can pair up nested brackets, etc.)
From a quick reading, MATLAB multiline comments start with a %{ and consume everything until a %}. This is very similar to the prior rule (it just doesn't care about \r or \n), so:
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
Included in grammar:
grammar test;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
Input file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
%{
A whole bunch of stuff
on several
lines
%}
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,50:106='%{\n A whole bunch of stuff\n on several\n lines\n%}',<MLCOMMENT>,channel=1,3:0]
[#5,108:107='<EOF>',<EOF>,8:0]
(test a = 3)
I am having an issue with my Perl grammar; here are the relevant parts:
element
: element (ASTERISK_CHAR | SLASH_CHAR | PERCENT_CHAR) element
| word
;
SLASH_CHAR: '/';
REGEX_STRING
: '/' (~('/' | '\r' | '\n') | NEW_LINE)* '/'
;
fragment NEW_LINE
: '\r'? '\n'
;
If the rule REGEX_STRING is not commented out, then the following Perl doesn't parse:
$b = 1/2;
$c = 1/2;
<2021/08/20-19:24:37> <ERROR> [parsing.AntlrErrorLogger] - Unit 1: <unknown>:2:6: extraneous input '/2;\r\n$c = 1/' expecting {<EOF>, '=', '**=', '+=', '-=', '.=', '*=', '/=', '%=', CROSS_EQUAL, '&=', '|=', '^=', '&.=', '|.=', '^.=', '<<=', '>>=', '&&=', '||=', '//=', '==', '>=', '<=', '<=>', '<>', '!=', '>', '<', '~~', '++', '--', '**', '.', '+', '-', '*', '/', '%', '=~', '!~', '&&', '||', '//', '&', '&.', '|', '|.', '^', '^.', '<<', '>>', '..', '...', '?', ';', X_KEYWORD, AND, CMP, EQ, FOR, FOREACH, GE, GT, IF, ISA, LE, LT, OR, NE, UNLESS, UNTIL, WHEN, WHILE, XOR, UNSIGNED_INTEGER}
Note that it doesn't matter where the lexer rule REGEX_STRING is used; even if it is not present anywhere in the parser rules, just being there makes the parsing fail (so the issue is on the lexer side).
If I remove the lexer rule REGEX_STRING, then it gets parsed just fine, but then I can't parse :
$dateCalc =~ /^([0-9]{4})([0-9]{2})([0-9]{2})/
Also, I noticed that the following Perl parses, so there seems to be some kind of interaction between the first and the second '/'.
$b = 12; # Removed the / between 1 and 2
$c = 1/2; # Removing the / here would work as well.
I can't seem to find how to write my regex lexer rule to not make something fail.
What am I missing? How can I parse both expressions correctly?
The basic issue here is that ANTLR4, like many other parsing frameworks, performs lexical analysis independently of the syntax; the same tokens are produced regardless of which tokens might be acceptable to the parser. So it is the lexical analyser which must decide whether a given / is a division operator or the start of a regex, a decision which can really only be made using syntactic information. (There are parsing frameworks which do not have this limitation and thus can be used to implement scannerless parsers; these include PEG-based and GLL/GLR parsers.)
There's an example of solving this lexical ambiguity, which also shows up in parsing ECMAScript, in the ANTLR4 example directory. (That's a github permalink so that the line numbers cited below continue to work.)
The basic strategy is to decide whether a / can start a regular expression based on the immediately previous token. This works in ECMAScript because the syntactic contexts in which an operator (such as / or /=) can appear are disjoint from the contexts in which an operand can appear. This will probably not translate directly into a Perl parser, but it might help show the possibilities.
Line 780-782: The regex token itself is protected by a semantic guard:
RegularExpressionLiteral
: {isRegexPossible()}? '/' RegularExpressionBody '/' RegularExpressionFlags
;
Lines 154-182: The guard function itself is simple, but obviously required a certain amount of grammatical analysis to generate the correct test. (Note: The list of tokens has been abbreviated; see the original file for the complete list):
private boolean isRegexPossible() {
if (this.lastToken == null) {
return true;
}
switch (this.lastToken.getType()) {
case Identifier:
case NullLiteral:
...
// After any of the tokens above, no regex literal can follow.
return false;
default:
// In all other cases, a regex literal _is_ possible.
return true;
}
}
}
Lines 127-147: In order for that to work, the scanner must retain the previous token in the member variable lastToken. (Comments removed for space):
@Override
public Token nextToken() {
Token next = super.nextToken();
if (next.getChannel() == Token.DEFAULT_CHANNEL) {
this.lastToken = next;
}
return next;
}
Suppose I have a char variable in Matlab like this:
x = 'hello ### my $ name is Sean Daley.';
I want to replace the first '###' with the char '&', and the first '$' with the char '&&'.
Note that the character groups I wish to swap have different lengths [e.g., length('###') is 3 while length('&') is 1].
Furthermore, if I have a more complicated char such that pairs of '###' and '$' repeat many times, I want to implement the same swapping routine. So the following:
y = 'hello ### my $ name is ### Sean $ Daley ###.$.';
would be transformed into:
'hello & my && name is & Sean && Daley &.&&.'
I have tried coding this (for any arbitrary char) manually via for loops and while loops, but the code is absolutely hideous and does not generalize to arbitrary character group lengths.
Are there any simple functions that I can use to make this work?
y = replace(y,["###" "$"],["&" "&&"])
The function strrep is what you are looking for.
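If you ever need the same multi-pattern, one-pass replacement outside MATLAB, here is a small Python sketch (the swap helper and its mapping dict are my own names) using re.sub with an alternation:

```python
import re

# One pass over the string: each match of '###' or '$' is replaced via the
# mapping. re.escape protects '$', which is a regex metacharacter.
def swap(text, mapping):
    pattern = "|".join(re.escape(k) for k in mapping)
    return re.sub(pattern, lambda m: mapping[m.group()], text)

y = "hello ### my $ name is ### Sean $ Daley ###.$."
print(swap(y, {"###": "&", "$": "&&"}))
# -> hello & my && name is & Sean && Daley &.&&.
```

Because every occurrence is rewritten in a single pass, a replacement like '&&' can never itself be re-matched, which is the pitfall of naive repeated substitution.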
Is it possible to use Perl to remove parameters from a function call? E.g. if my file contains the following text:
var1 = myfunc(sss,'ROW DRILL',1,1,0);
var2 = myfunc(fff,'COL DRILL',1,1,0);
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space',1,1,0);
donotcapture=myfunc2(rr,'some string',1,1,0);
I need to change it so that it becomes:
var1 = myfunc(sss,'ROW DRILL');
var2 = myfunc(fff,'COL DRILL');
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space');
donotcapture=myfunc2(rr,'some string',1,1,0);
Essentially just removing ,1,1,0 from all instances where myfunc is called, but preserving the first two parameters.
I have tried the following, but this approach would mean I have to write rules for each permutation...
perl -pi -w -e "s/myfunc\(rr,'COL SUBSET',1,1,0\)/myfunc\(rr,'COL SUBSET'\)/g;" *.txt
In order to reduce complexity, generalize your regex using flexible sub-patterns.
The regex for "anything between ( and ,, except , " : \(([^,]+),
The regex for "anything between ' and ', except ' " : \'([^']+)\'
In order to get the output right for the specific input (in spite of the flexibility),
use capture groups, i.e. (...).
They populate variables, which you can use as $1 in the substitution.
To prevent matching functions with names ending in your functions name, e.g. notmyfunc(),
use the regex for word boundary, i.e. \b.
Ikegami's edit (separated to keep visible what you and I learned the hard way):
Avoid double-quotes for the program argument.
It's just asking for trouble and requires so much extra escaping.
Note that \x27 is a single quote when used inside double-quotes or regex literals.
\' -> \x27
Only use one capture group (myfunc\([^,]+,\x27[^\x27]+\x27)
Remove the ;, which is not needed for a single statement.
Add a . to the input file wildcard, assuming you actually meant it like that.
Working code
(Compared to the chat, note the \((; the backslash got lost, eaten by the chat, I believe.):
perl -pi -w -e "s/(\bmyfunc)\(([^,]+),\'([^']+)\'(?:,\d+){3}\)/\$1\(\$2,\'\$3\'\)/g;" *txt
Ikegami's nice edit
(the detail which was so time-consuming in our chat is not easily visible anymore,
because the ( for the capture group was moved somewhere else):
perl -i -wpe's/\b(myfunc\([^,]+,\x27[^\x27]+\x27)(?:,\d+){3}\)/$1)/g' *.txt
Input:
var1 = myfunc(sss,'ROW DRILL',1,1,0);
var2 = myfunc(fff,'COL DRILL',1,1,0);
var3 = myfunc(s,'ROW SUBSET',1,1,0);
var4 = myfunc(rr,'COL SUBSET',1,1,0);
var5 = myfunc(rr,'COL SUBSET',2,12,50); with different values
var6 = notmyfunc(rr,'COL SUBSET',1,1,0); tricky different name
var1 = myfunc(sss,'ROW DRILL',1,1,0);
var2 = myfunc(fff,'COL DRILL',1,1,0);
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space',1,1,0);
donotcapture=myfunc2(rr,'some string',1,1,0);
Output (version "even more relaxed"):
var1 = myfunc(sss,'ROW DRILL');
var2 = myfunc(fff,'COL DRILL');
var3 = myfunc(s,'ROW SUBSET');
var4 = myfunc(rr,'COL SUBSET');
var5 = myfunc(rr,'COL SUBSET'); with different values
var6 = notmyfunc(rr,'COL SUBSET',1,1,0); tricky different name
var1 = myfunc(sss,'ROW DRILL');
var2 = myfunc(fff,'COL DRILL');
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space');
donotcapture=myfunc2(rr,'some string',1,1,0);
Lessons learned:
I made a habit of creating regexes as tightly fitting the input as possible.
But that caught us/me unprepared, when applied to sample input by someone inexperienced with regexes. (Absolutely no blame on you.)
Posting code-quotes into chat is dangerous, be careful with \.
I want to know what the . (dot) means in the following SAS line statement: (line &ls.*"_";)
I know ls is a macro variable, but what does the dot mean?
option pageno=1 nodate center;
%let ls=68;
%let ps=20;
proc report data=class2 LS=&ls PS=&ps SPLIT="/" center headline headskip nowd spacing=5 out=outdata1;
column sex age name height weight notdone;
define sex / order order=internal descending width=6 LEFT noprint;
define age / order order=internal width=3 spacing=0 "age" right;
define name / display width=8 left "name" flow;
define height / sum width=11 right "height";
define weight / sum width=11 right "weight";
define notdone / sum format= notdone. width=15 left "status";
compute before;
nd=notdone.sum;
endcomp;
compute before _page_/left;
line "gender group: " sex $gender.;
line &ls.*"_";
line ' ';
endcomp;
The period delimits the end of a macro variable name. Often this isn't necessary, as SAS will recognize the end of a macro variable name as soon as it sees a character that is not valid in a SAS name (e.g. space, semicolon). Most importantly, the period allows you to tell SAS where the macro variable name ends if it's in the middle of a string.
%let mv=var;
%put &mv.3;
returns var3 to the log, whereas &mv3 would fail to resolve without there being a macro variable named mv3 defined.
Also, realize that the delimiting period is not contained in the resolved code, e.g.:
%let lib=sashelp;
data cars;
set &lib..cars;
run;
The set statement after resolving the macro variable is
set sashelp.cars;