C language preprocessor behavior - macros

There are different kinds of macros in the C language; nested macros are one of them.
Consider a program with the following macros:
#define HYPE(x,y) (SQUR(x)+SQUR(y))
#define SQUR(x) (x*x)
With these definitions the program compiles and produces the expected result.
As we all know, the C preprocessor replaces every occurrence of a macro identifier with its replacement string. For the example above, I would like to know how many times the C preprocessor traverses the program to replace the macros with their replacement values. I assume it cannot be done in one pass.

The replacement takes place when "HYPE" is actually used. It is not expanded when the #define statement occurs.
For example:
1 #define FOO 1
2
3 void foo() {
4 printf("%d\n", FOO);
5 }
So the replacement takes place on line 4, where FOO is used, and not on line 1. Hence the answer to your question is: once.
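A minimal sketch of that point, using the two macros from the question. HYPE mentions SQUR before SQUR is defined, which is fine because the body of a #define is only rescanned for further macros at the point where HYPE is used. The helper names hype_of and check_hype are hypothetical and only serve the illustration.

```c
#include <assert.h>

/* HYPE refers to SQUR before SQUR exists. Nothing is expanded here:
   the replacement list is simply stored. */
#define HYPE(x,y) (SQUR(x)+SQUR(y))
#define SQUR(x) (x*x)

/* Hypothetical helper: HYPE is expanded at this point of use, and the
   result is rescanned, so SQUR is expanded in the same pass. */
int hype_of(int a, int b)
{
    return HYPE(a, b);   /* becomes ((a*a)+(b*b)) */
}

/* Hypothetical self-check for the 3-4-5 triangle. */
void check_hype(void)
{
    assert(hype_of(3, 4) == 25);
}
```

Note that SQUR as written does not parenthesize its argument, so this sketch only passes plain variables to it.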

A #define'd macro invocation is expanded until there are no more terms to expand, except it doesn't recurse. For example:
#define TIMES *
#define factorial(n) ((n) == 0 ? 1 : (n) TIMES factorial((n)-1))
// Doesn't actually work, don't use.
Suppose you say factorial(2). It will expand to ((2) == 0 ? 1 : (2) * factorial((2)-1)). Note that factorial is expanded, then TIMES is also expanded, but factorial isn't expanded again afterwards, as that would be recursion.
However, note that nesting (arguably a different type of "recursion") is in fact expanded multiple times in the same expression:
#define ADD(a,b) ((a)+(b))
....
ADD(ADD(1,2),ADD(3,4)) // expands to ((((1)+(2)))+(((3)+(4))))
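Both behaviours can be checked with a small sketch. The variable and macro named depth and the helper functions are hypothetical names chosen for this illustration. The self-referential macro relies on the rule described above: while depth is being expanded, the depth inside its own replacement is left alone and therefore refers to the variable.

```c
#include <assert.h>

static int depth = 10;

/* Self-referential macro: the inner `depth` is not expanded again,
   so it names the variable declared above. */
#define depth (depth + 1)

#define ADD(a,b) ((a)+(b))

/* Expands once to (depth + 1), i.e. the variable plus one. */
int read_depth(void)
{
    return depth;
}

/* Nested invocations are not recursion: each argument is expanded,
   then the outer ADD is expanded around the results. */
int nested_add(void)
{
    return ADD(ADD(1,2),ADD(3,4));
}

/* Hypothetical self-check of both cases. */
void check_expansion(void)
{
    assert(read_depth() == 11);
    assert(nested_add() == 10);
}
```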

Related

Wrapping TimerOutputs macros

I've got a situation where it would be handy to have a variable, to, which can either be a TimerOutput or nothing. I'm interested in providing a macro that takes the same arguments as @timeit from TimerOutputs (e.g. @timeit to "time spent" s = foo()). Because to is potentially set to nothing, I can't simply disable a TimerObject.
If to is set, my preference would be to pass the arguments along to @timeit, but could live with calling timer_expr(__module__, false, args...).
If to is not set, I'd want to just return the remaining arguments (something like args[3:end], maybe) as an expression.
I've been fussing with this for a day or so, and can handle each of the cases in isolation. For the case where I'm not involving TimerOutput, this seems to work:
macro no_timer(args...)
    args[3:end][1]
end
And for the case where I am using TimerOutput, I can do this:
macro with_timer(args...)
    timer_expr(__module__, false, args...)
end
Not surprising, as that's just what @timeit does.
I haven't figured out how to handle both cases in one macro. I've gotten the closest by wrapping everything in a ternary operator - i.e. return :(isnothing($(args[1])) ? <expression stuff> : <TimerOutput stuff>), but there is some level of abstraction mismatch that I haven't unsnarled.
Addendum: I've come to the conclusion that my original framing was an "X-Y" problem. I didn't actually need a new macro to solve my problem - hence my accepting the answer I did. That said, I am struck by the fact that both of the answers proffered stayed well away from defining a macro.
You already have this functionality in TimerOutputs.
Just use the disable_timer! and enable_timer! methods.
julia> const to = TimerOutput();
julia> disable_timer!(to);
julia> @timeit to "sleep" sleep(0.02)
julia> to
────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 23.6s / 0.0% 464KiB / 0.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────
julia> enable_timer!(to);
julia> @timeit to "sleep" sleep(0.02)
julia> to
────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 37.3s / 0.1% 777KiB / 0.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────
sleep 1 22.6ms 100.0% 22.6ms 320B 100.0% 320B
────────────────────────────────────────────────────────────────────
You can define your own more specific method of the timer_expr function that the macro calls, where the returned expression checks whether the to argument is nothing and then does the appropriate thing.
import TimerOutputs: timer_expr
function timer_expr(m::Module, is_debug::Bool, to::Symbol, label::String, ex::Expr)
    unescaped(ex) = ex.head == :escape ? ex.args[1] : ex
    # this is from the original timer_expr functions,
    # to be used when `to` isn't nothing
    timer_ex = TimerOutputs.is_func_def(ex) ?
        unescaped(TimerOutputs.timer_expr_func(m, is_debug, to, ex, label)) :
        TimerOutputs._timer_expr(m, is_debug, to, label, ex)
    cond_ex = esc(:(if isnothing($to)
                        $ex
                    else
                        placeholder # dummy symbol, to be replaced
                    end))
    # args[3] = "else" section,
    # the placeholder is args[2] within that
    unescaped(cond_ex).args[3].args[2] = timer_ex
    cond_ex
end

SPSS/macro: split string into multiple variables

I am trying to split a string variable into multiple dummy coded variables. I used these sources to get an idea of how one would achieve this task in SPSS:
https://www.ibm.com/support/pages/making-multiple-string-variables-single-multiply-coded-field
https://www.spss-tutorials.com/spss-split-string-variable-into-separate-variables/
But when I try to adapt the first one to my needs or when I try to convert the second one to a macro, I fail.
In my dataset I have (multiple) variables that contain a comma-separated string that represents different combinations of selected items (as well as missing values). For each item of a specific variable I want to create a dummy variable. If the item was selected, it should be represented with a 1 in the new dummy variable. If it was not selected, that case should be represented with a 0.
Different input variables can contain different numbers of items.
For example:
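| ID | VAR1 | VAR2    | DMMY1_1 | DMMY1_2 | DMMY1_3 |
|----|------|---------|---------|---------|---------|
| 1  | 1, 2 | 8       | 1       | 1       | 0       |
| 2  | 1    | 1, 3    | 1       | 0       | 0       |
| 3  | 3, 1 | 2, 3, 1 | 1       | 0       | 1       |
| 4  |      | 2, 8    | 0       | 0       | 0       |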
Here is what I came up with so far ...
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21   1, 3
33, 12, 3, 1
4    2, 8
END DATA.
* MACRO SYNTAX.
* DEFINE VARIABLES (in the long run these should/will be inside the macro function, but for now I will leave them outside).
NUMERIC v1 TO v3 (F1).
VECTOR v = v1 TO v3.
STRING #char (A1).
DEFINE split_var(vr = TOKENS(1)).
!DO !#pos=1 !TO char.length(!vr).
COMPUTE #char = char.substr(!vr, !#pos, 1).
!IF (!#char !NE "," !AND !#char !NE " ") !THEN
COMPUTE v(NUMBER(!#char, F1)) = 1.
!IFEND.
!DOEND.
!ENDDEFINE.
split_var vr=VAR1.
EXECUTE.
As I got more errors than I can count, it's hard to narrow down my problem. But I think the problem has something to do with the way I use the char.length() function (and I am a bit confused when to use the bang operator).
If anyone has some insights, I would really appreciate some help :)
There is a fundamental issue to understand about SPSS macro - the macro does not read or interact in any way with the data. All the macro does is manipulate text to write syntax. The syntax created will later work on the actual data when you run it.
So, for example, your first error is using char.length(!vr) as the bound of the !DO loop. You are trying to get the macro to read the data, calculate the length and use it, but that simply can't be done - the macro can only work with what you gave it.
Another example in your code: you calculate #char and then try to use it in the macro as !#char. That won't work: ! precedes only macro functions or arguments. #char, in your code, is neither, and it can't become one, because the data can't be read into the macro.
To give you a little push forward: I understand you want the macro loop to run a different number of times for each variable, but you can't use char.length(!vr). I suggest instead having the macro loop as many times as necessary to be sure you can deal with the longest variable you'll need to work with.
And another general strategy hint - first, create syntax to deal with one specific variable and one specific delimiter. Once this works, start working on a macro, keeping in mind that the only purpose of the macro is to recreate the same working syntax, only changing the parameters of variable name and delimiter.
With my new understanding of the SPSS macro logic (thanks to @eli-k) the problem was quite easy to solve. Here is the working solution.
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21   1, 3
33, 12, 3, 1
4    2, 8
END DATA.
* DEFINE MACRO.
DEFINE #split_var(src_var = !TOKENS(1)
/dmmy_var_label = !DEFAULT(dmmy) !TOKENS(1)
/dmmy_var_lvls = !TOKENS(1))
NUMERIC !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (F1).
VECTOR #dmmy_vec = !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls).
STRING #char (A1).
LOOP #pos=1 TO char.length(!src_var).
COMPUTE #char = char.substr(!src_var, #pos, 1).
DO IF (#char NE "," AND #char NE " ").
COMPUTE #index = NUMBER(#char, F1).
COMPUTE #dmmy_vec(#index) = 1.
END IF.
END LOOP.
RECODE !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (SYSMIS=0) (ELSE=COPY).
EXECUTE.
!ENDDEFINE.
* CALL MACRO.
#split_var src_var=VAR2 dmmy_var_lvls=8.

Calculating the e number using Raku

I'm trying to calculate the e constant (AKA Euler's Number) by calculating the formula
$$e = \sum_{k=0}^{\infty} \frac{1}{k!}$$
In order to calculate the factorial and division in one shot, I wrote this:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
say reduce * + * , @e[^10];
But it didn't work out. How to do it correctly?
I analyze your code in the section Analyzing your code. Before that I present a couple fun sections of bonus material.
One liner One letter1
say e; # 2.718281828459045
"A treatise on multiple ways"2
The heading above refers to Damian Conway's extraordinary article on computing e in Raku.
The article is a lot of fun (after all, it's Damian). It's a very understandable discussion of computing e. And it's a homage to Raku's bicarbonate reincarnation of the TIMTOWTDI philosophy espoused by Larry Wall.3
As an appetizer, here's a quote from about halfway through the article:
Given that these efficient methods all work the same way—by summing (an initial subset of) an infinite series of terms—maybe it would be better if we had a function to do that for us. And it would certainly be better if the function could work out by itself exactly how much of that initial subset of the series it actually needs to include in order to produce an accurate answer...rather than requiring us to manually comb through the results of multiple trials to discover that.
And, as so often in Raku, it’s surprisingly easy to build just what we need:
sub Σ (Unary $block --> Numeric) {
(0..∞).map($block).produce(&[+]).&converge
}
Analyzing your code
Here's the first line, generating the series:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
The closure ({ code goes here }) computes a term. A closure has a signature, either implicit or explicit, that determines how many arguments it will accept. In this case there's no explicit signature. The use of $_ (the "topic" variable) results in an implicit signature that requires one argument that's bound to $_.
The sequence operator (...) repeatedly calls the closure on its left, passing the previous term as the closure's argument, to lazily build a series of terms until the endpoint on its right, which in this case is *, shorthand for Inf aka infinity.
The topic in the first call to the closure is 1. So the closure computes and returns 1 / (1 * 1) yielding the first two terms in the series as 1, 1/1.
The topic in the second call is the value of the previous one, 1/1, i.e. 1 again. So the closure computes and returns 1 / (1 * 2), extending the series to 1, 1/1, 1/2. It all looks good.
The next closure computes 1 / (1/2 * 3) which is 0.666667. That term should be 1 / (1 * 2 * 3). Oops.
Making your code match the formula
Your code is supposed to match the formula:
$$e = \sum_{k=0}^{\infty} \frac{1}{k!}$$
In this formula, each term is computed based on its position in the series. The kth term in the series (where k=0 for the first 1) is just factorial k's reciprocal.
(So it's got nothing to do with the value of the prior term. Thus $_, which receives the value of the prior term, shouldn't be used in the closure.)
Let's create a factorial postfix operator:
sub postfix:<!> (\k) { [×] 1 .. k }
(× is an infix multiplication operator, a nicer looking Unicode alias of the usual ASCII infix *.)
That's shorthand for:
sub postfix:<!> (\k) { 1 × 2 × 3 × .... × k }
(I've used pseudo metasyntactic notation inside the braces to denote the idea of adding or subtracting as many terms as required.)
More generally, putting an infix operator op in square brackets at the start of an expression forms a composite prefix operator that is the equivalent of reduce with => &[op],. See Reduction metaoperator for more info.
Now we can rewrite the closure to use the new factorial postfix operator:
my @e = 1, { state $a=1; 1 / $a++! } ... *;
Bingo. This produces the right series.
... until it doesn't, for a different reason. The next problem is numeric accuracy. But let's deal with that in the next section.
A one liner derived from your code
Maybe compress the three lines down to one:
say [+] .[^10] given 1, { 1 / [×] 1 .. ++$ } ... Inf
.[^10] applies to the topic, which is set by the given. (^10 is shorthand for 0..9, so the above code computes the sum of the first ten terms in the series.)
I've eliminated the $a from the closure computing the next term. A lone $ is the same as (state $), an anonymous state scalar. I made it a pre-increment instead of post-increment to achieve the same effect as you did by initializing $a to 1.
We're now left with the final (big!) problem, pointed out by you in a comment below.
Provided neither of its operands is a Num (a float, and thus approximate), the / operator normally returns a 100% accurate Rat (a limited precision rational). But if the denominator of the result exceeds 64 bits then that result is converted to a Num, which trades accuracy for performance, a tradeoff we don't want to make. We need to take that into account.
To specify unlimited precision as well as 100% accuracy, simply coerce the operation to use FatRats. To do this correctly, just make (at least) one of the operands be a FatRat (and none others be a Num):
say [+] .[^500] given 1, { 1.FatRat / [×] 1 .. ++$ } ... Inf
I've verified this to 500 decimal digits. I expect it to remain accurate until the program crashes due to exceeding some limit of the Raku language or Rakudo compiler. (See my answer to Cannot unbox 65536 bit wide bigint into native integer for some discussion of that.)
Footnotes
1 Raku has a few important mathematical constants built in, including e, i, and pi (and its alias π). Thus one can write Euler's Identity in Raku somewhat like it looks in math books. With credit to RosettaCode's Raku entry for Euler's Identity:
# There's an invisible character between <> and i⁢π character pairs!
sub infix:<⁢> (\left, \right) is tighter(&infix:<**>) { left * right };
# Raku doesn't have built in symbolic math so use approximate equal
say e**i⁢π + 1 ≅ 0; # True
2 Damian's article is a must read. But it's just one of several admirable treatments that are among the 100+ matches for a google for 'raku "euler's number"'.
3 See TIMTOWTDI vs TSBO-APOO-OWTDI for one of the more balanced views of TIMTOWTDI written by a fan of python. But there are downsides to taking TIMTOWTDI too far. To reflect this latter "danger", the Perl community coined the humorously long, unreadable, and understated TIMTOWTDIBSCINABTE -- There Is More Than One Way To Do It But Sometimes Consistency Is Not A Bad Thing Either, pronounced "Tim Toady Bicarbonate". Strangely enough, Larry applied bicarbonate to Raku's design and Damian applies it to computing e in Raku.
There are fractions in $_. Thus you need 1 / (1/$_ * $a++) or rather $_ /$a++.
In Raku you could do this calculation step by step:
1.FatRat,1,2,3 ... * #1 1 2 3 4 5 6 7 8 9 ...
andthen .produce: &[*] #1 1 2 6 24 120 720 5040 40320 362880
andthen .map: 1/* #1 1 1/2 1/6 1/24 1/120 1/720 1/5040 1/40320 1/362880 ...
andthen .produce: &[+] #1 2 2.5 2.666667 2.708333 2.716667 2.718056 2.718254 2.718279 2.718282 ...
andthen .[50].say #2.71828182845904523536028747135266249775724709369995957496696762772
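For comparison, the same three steps (running factorials, their reciprocals, and the running sum) can be sketched in C, the language used for the other examples in this collection. This is only an illustration under stated assumptions: e_partial_sum is a hypothetical name, and it uses double, so the result is approximate rather than exact as with FatRat.

```c
#include <assert.h>

/* Hypothetical helper: sum of 1/k! for k = 0 .. terms-1.
   Uses double precision, so it is approximate (unlike Raku's FatRat). */
double e_partial_sum(int terms)
{
    double factorial = 1.0;  /* holds k!, starting at 0! = 1 */
    double sum = 0.0;

    for (int k = 0; k < terms; ++k) {
        if (k > 0)
            factorial *= k;      /* k! = (k-1)! * k */
        sum += 1.0 / factorial;  /* add the k-th term */
    }
    return sum;
}

/* Hypothetical self-check: the first three partial sums are 1, 2 and 2.5. */
void check_e(void)
{
    assert(e_partial_sum(1) == 1.0);
    assert(e_partial_sum(2) == 2.0);
    assert(e_partial_sum(3) == 2.5);
}
```

Each term depends only on its position k, not on the value of the previous term, which is the point made in the analysis above.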

Why are macros based on abstract syntax trees better than macros based on string preprocessing?

I am beginning my journey of learning Rust. I came across this line in Rust by Example:
However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.
Why is an abstract syntax tree better than string preprocessing?
If you have this in C:
#define X(A,B) A+B
int r = X(1,2) * 3;
The value of r will be 7, because the preprocessor expands it to 1+2 * 3, which is 1+(2*3).
In Rust, you would have:
macro_rules! X { ($a:expr,$b:expr) => { $a+$b } }
let r = X!(1,2) * 3;
This will evaluate to 9, because the compiler will interpret the expansion as (1+2)*3. This is because the compiler knows that the result of the macro is supposed to be a complete, self-contained expression.
That said, the C macro could also be defined like so:
#define X(A,B) ((A)+(B))
This would avoid any non-obvious evaluation problems, including the arguments themselves being reinterpreted due to context. However, when you're using a macro, you can never be sure whether or not the macro has correctly accounted for every possible way it could be used, so it's hard to tell what any given macro expansion will do.
By using AST nodes instead of text, Rust ensures this ambiguity can't happen.
A classic example using the C preprocessor is
#define MUL(a, b) a * b
// ...
int res = MUL(x + y, 5);
The use of the macro will expand to
int res = x + y * 5;
which is very far from the expected
int res = (x + y) * 5;
This happens because the C preprocessor really just does simple text-based substitutions; it's not really an integral part of the language itself. Preprocessing and parsing are two separate steps.
If the preprocessor instead parsed the macro like the rest of the compiler, which happens for languages where macros are part of the actual language syntax, this is no longer a problem as things like precedence (as mentioned) and associativity are taken into account.
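The precedence problem from both answers, and the parenthesized fix, can be reproduced in a few lines. The _BARE and _SAFE macro names and the helper functions are hypothetical names chosen so that both variants can live in one file.

```c
#include <assert.h>

/* Unparenthesized versions, as in the examples above. */
#define X_BARE(A,B) A+B
#define MUL_BARE(a, b) a * b

/* Defensive versions: parenthesize every argument and the whole body. */
#define X_SAFE(A,B) ((A)+(B))
#define MUL_SAFE(a, b) ((a) * (b))

int x_bare(void) { return X_BARE(1,2) * 3; }   /* 1+2 * 3       */
int x_safe(void) { return X_SAFE(1,2) * 3; }   /* ((1)+(2)) * 3 */

int mul_bare(int x, int y) { return MUL_BARE(x + y, 5); }  /* x + y * 5       */
int mul_safe(int x, int y) { return MUL_SAFE(x + y, 5); }  /* ((x + y) * (5)) */

/* Hypothetical self-check of the values discussed in the answers. */
void check_precedence(void)
{
    assert(x_bare() == 7);
    assert(x_safe() == 9);
    assert(mul_bare(2, 3) == 17);
    assert(mul_safe(2, 3) == 25);
}
```

The parentheses are a convention the macro author has to remember; nothing in the preprocessor enforces them, which is the contrast with Rust's macros drawn above.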

Preprocessor Quoting macro arguments

Suppose I have some macro #define NAME name, and I want to define some other macro which will expand to the quoted value. That is, as if I had also defined #define NAME_STR "name". Is there a neater way than the following?
#define QUOT(str) #str
#define QUOT_ARG(str) QUOT(str)
#define NAME_STR QUOT_ARG(NAME)
Not really, due to the fact that macro arguments are not expanded when used in stringification. From the GNU C Preprocessor manual:
Unlike normal parameter replacement, the argument is not macro-expanded first. This is called stringification.
From the same source:
If you want to stringify the result of expansion of a macro argument, you have to use two levels of macros.
...which continues with an example:
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str (foo)
==> "foo"
xstr (foo)
==> xstr (4)
==> str (4)
==> "4"
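Applied to the macros from the question, the difference between one and two levels looks like this. The helper functions one_level, two_levels and check_quoting are hypothetical names added for the illustration.

```c
#include <assert.h>
#include <string.h>

#define NAME name
#define QUOT(str) #str
#define QUOT_ARG(str) QUOT(str)

/* One level: the argument of # is not macro-expanded, so NAME is
   stringified as written. */
const char *one_level(void)
{
    return QUOT(NAME);
}

/* Two levels: QUOT_ARG expands its argument first (NAME becomes name),
   then QUOT stringifies the result. */
const char *two_levels(void)
{
    return QUOT_ARG(NAME);
}

/* Hypothetical self-check of both results. */
void check_quoting(void)
{
    assert(strcmp(one_level(), "NAME") == 0);
    assert(strcmp(two_levels(), "name") == 0);
}
```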