Replace spaces with underscores in a macro? - macros

Can I write a single-parameter macro which takes a sequence of words/tokens separated by whitespace, and produces the same sequence but with underscores between each word/token?
e.g.
MAGIC_MACRO(brave new world)
would evaluate to
brave_new_world
Notes:
I don't mind whether each whitespace character becomes an underscore, just that at least one be used.
If I can't do this generally, I would at least like to know if this is possible with exactly two words.

Can I write a single-parameter macro which takes a sequence of words/tokens separated by whitespace, and produces the same sequence but with underscores between each word/token?
Of course, it's not possible. There are no string manipulation utilities in the preprocessor.
Och, who am I kidding. First, you have to build a dictionary containing every word you want to handle. For the purpose of this answer, we will use a small dictionary with a few words:
#define WORD_world world,
#define WORD_new new,
// etc.
You might get the pattern. Then let's implement the macro that will do the following:
brave new world // our starting argument
WORD_##brave new world // add WORD_ to all arguments and join arguments with spaces
WORD_brave new world
brave, new world // expand WORD_brave macro
WORD_brave WORD_new world // add WORD_ to all arguments and join arguments with spaces
brave, new, world // expand WORD_* macros
WORD_brave WORD_new WORD_world // add WORD_ to all arguments and join arguments with spaces
brave, new, world, // expand WORD_* macros
/* --- repeat above steps up to maximum words you need to handle --- */
brave_new_world // join arguments with `_` ignoring last empty one
The following code:
// our dictionary
#define WORD_world world,
#define WORD_new new,
#define WORD_brave brave,
#define WORD_hello hello,
#define WORD_Hello Hello,
// the classics
#define COMMA(...) ,
#define FIRST(a, ...) a
// apply function f for each argument recursively with tail
#define FOREACHTAIL_1(f,a) f(a,)
#define FOREACHTAIL_2(f,a,...) f(a,FOREACHTAIL_1(f,__VA_ARGS__))
#define FOREACHTAIL_3(f,a,...) f(a,FOREACHTAIL_2(f,__VA_ARGS__))
#define FOREACHTAIL_4(f,a,...) f(a,FOREACHTAIL_3(f,__VA_ARGS__))
#define FOREACHTAIL_N(_4,_3,_2,_1,N,...) \
FOREACHTAIL_##N
#define FOREACHTAIL(f,...) \
FOREACHTAIL_N(__VA_ARGS__,4,3,2,1)(f,__VA_ARGS__)
// if there are two arguments, expand to true. Otherwise false.
#define IFTWO_N(_0,_1,N,...) N
#define IFTWO(true, false, ...) IFTWO_N(__VA_ARGS__, true, false)
// If empty, expand to true, otherwise false.
// https://gustedt.wordpress.com/2010/06/08/detect-empty-macro-arguments/
#define IFEMPTY(true, false, ...) IFTWO(true, false, COMMA __VA_ARGS__ ())
// Join arguments with `_`.
#define JOIN_U(a, b) a##_##b
#define JOIN_TWO_IN(a,b) IFEMPTY(FIRST, JOIN_U, b)(a, b)
#define JOIN_TWO(a,b) JOIN_TWO_IN(a,b)
#define JOIN(...) FOREACHTAIL(JOIN_TWO, __VA_ARGS__)
// Prepend WORD_ to each argument and join arguments with spaces.
#define WORD_ /* the last one expands to empty */
#define WORDS_TWO(a, b) WORD_##a b
#define WORDS(...) FOREACHTAIL(WORDS_TWO, __VA_ARGS__)
#define MAGIC_MACRO(a) JOIN(WORDS(WORDS(WORDS(WORDS(WORDS(a))))))
MAGIC_MACRO(brave new world)
MAGIC_MACRO(Hello world)
Produces:
brave_new_world
Hello_world
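For the exactly-two-words case raised in the question, the same dictionary idea shrinks to a few lines. The following is only a minimal sketch, assuming a standard-conforming preprocessor (GCC, Clang, or MSVC with /Zc:preprocessor); TWO_WORDS, JOIN_EXPAND, STR and two_words_demo are hypothetical names that do not appear in the answer above, and the dictionary holds just two sample entries.

```cpp
#include <string>

// Tiny dictionary: each known first word expands to itself plus a comma.
// (Hypothetical entries; every supported first word needs its own line.)
#define WORD_hello hello,
#define WORD_brave brave,

// Paste two tokens together with an underscore in between.
#define JOIN_U(a, b) a##_##b
// Extra level of expansion so the comma produced by WORD_* splits the arguments.
#define JOIN_EXPAND(...) JOIN_U(__VA_ARGS__)
// Prefix the first word with WORD_, then join the resulting pair.
#define TWO_WORDS(x) JOIN_EXPAND(WORD_##x)

// Stringify after expansion so the result can be inspected at run time.
#define STR_(x) #x
#define STR(x) STR_(x)

// Returns the text that TWO_WORDS(hello world) expands to.
std::string two_words_demo() { return STR(TWO_WORDS(hello world)); }
```

Only the first word has to be in the dictionary here. If it is missing, WORD_<word> is not a macro, no comma is produced, and JOIN_U receives a single argument, which is a compile error.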

Related

How does the SystemVerilog compiler know to separate two arguments in a macro?

I have this macro:
`define do_code(DO_SOMETHING, ID) \
fork \
begin \
``DO_SOMETHING`` \
end \
begin \
$display("%s",ID.name()); \
end \
join_any \
disable fork; \
and I use it as such:
`do_code($display("%s",argA.name()), argB)
How does the compiler know to separate the macro's two input arguments correctly:
DO_SOMETHING = $display("%s",argA.name())
ID = argB
Why not break it to:
DO_SOMETHING = $display("%s"
ID = argA.name()), argB
???
The compiler knows because the IEEE 1800-2017 SystemVerilog LRM
says in section 22.5.1 `define
Actual arguments and defaults shall not contain comma or
right parenthesis characters outside matched pairs of left and right parentheses (), square brackets [],
braces {}, double quotes "", or an escaped identifier.
The comma in your first argument is inside a matched pair of parentheses.
BTW, you should not be using `` unless you are trying to create a new identifier by joining a macro argument with text in the body of the macro.

How to replace groups of characters between flags in MATLAB

Suppose I have a char variable in Matlab like this:
x = 'hello ### my $ name is Sean Daley.';
I want to replace the first '###' with the char '&', and the first '$' with the char '&&'.
Note that the character groups I wish to swap have different lengths [e.g., length('###') is 3 while length('&') is 1].
Furthermore, if I have a more complicated char such that pairs of '###' and '$' repeat many times, I want to implement the same swapping routine. So the following:
y = 'hello ### my $ name is ### Sean $ Daley ###.$.';
would be transformed into:
'hello & my && name is & Sean && Daley &.&&.'
I have tried coding this (for any arbitrary char) manually via for loops and while loops, but the code is absolutely hideous and does not generalize to arbitrary character group lengths.
Are there any simple functions that I can use to make this work?
y = replace(y,["###" "$"],["&" "&&"])
The function strrep is what you are looking for. It replaces one pattern per call, so nest two calls: strrep(strrep(y,'###','&'),'$','&&').

SWI-Prolog: How to get unicode char from escaped string?

I have an escaped string, for example "\\u0026", and I need to transform it into the Unicode char '\u0026'.
Tricks like
string_concat('\\', S, "\\u0026"), write(S).
didn't help, because that removes the \ itself, not just the escaping. So basically my problem is how to remove the escape characters from the string.
EDIT: Oh, I've just noticed that Stack Overflow also plays with the escape \.
write_canonical/1 gives me "\\u0026", how to transform that into a single '&' char?
In ISO Prolog a char is usually considered an atom of length 1.
Atoms and chars are enclosed in single quotes, or written without
quotes if possible. Here are some examples:
?- X = abc. /* an atom, but not a char */
X = abc
?- X = a. /* an atom and also a char */
X = a
?- X = '\u0061'.
X = a
The \u notation is SWI-Prolog specific and is not found in ISO
Prolog. SWI-Prolog also has a string data type, again not found
in ISO Prolog, which is always enclosed in double quotes. Here are
some examples:
?- X = "abc". /* a string */
X = "abc"
?- X = "a". /* again a string */
X = "a"
?- X = "\u0061".
X = "a"
If you have a string at hand of length 1, you can convert it to a char
via the predicate atom_string/2. This is a SWI-Prolog specific predicate,
not in ISO Prolog:
?- atom_string(X, "\u0061").
X = a
?- atom_string(X, "\u0026").
X = &
A recommendation: start by learning the ISO Prolog atom predicates first;
there are quite a number of them. Then learn the SWI-Prolog atom and string predicates.
You don't have to learn that many new SWI-Prolog predicates, since in SWI-Prolog most of the ISO Prolog predicates also accept strings. Here is an example of the ISO Prolog predicate atom_codes/2 used with a string in the first argument:
?- atom_codes("\u0061\u0026", L).
L = [97, 38].
?- L = [0'\u0061, 0'\u0026].
L = [97, 38].
?- L = [0x61, 0x26].
L = [97, 38].
P.S.: The 0' notation is defined in ISO Prolog. It is neither a char, an atom, nor a string; it denotes an integer. The value is the code of the char that follows the 0'. I have combined it with the SWI-Prolog \u notation.
P.P.S.: Combining the 0' notation with the \u notation is of course redundant; in ISO Prolog one can directly use the hex prefix 0x for integer values.
The thing is that "\\u0026" is already what you are searching for because it represents \u0026.

How to return next string without >> with stringstream?

Instead of:
stringstream szBuffer;
szBuffer>>string;
muFunc(string);
how can I do something like:
muFunc(szBuffer.NextString());
I don't want to create a temporary variable just to pass it to a function.
If you want to read the whole string in:
// .str() returns a string with the contents of szBuffer
muFunc(szBuffer.str());
// Once you've taken the string out, clear it
szBuffer.str("");
If you want to extract the next line (up to the next \n character), use istream::getline:
// There are better ways to do this, but for the purposes of this
// demonstration we'll assume the lines aren't longer than 255 bytes
char buf[ 256 ];
szBuffer.getline(buf, sizeof(buf));
muFunc(buf);
getline() can also take a delimiter as an optional third parameter (\n by default), so you can read the stream word by word.
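If the goal is simply to avoid a temporary at the call site, the extraction can be wrapped in a small helper that returns the token by value. This is only a sketch: next_string and next_field are hypothetical helper names, not members of std::stringstream, and the second one uses the free function std::getline with an explicit delimiter, so no fixed-size buffer is needed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the next whitespace-delimited token,
// exactly as operator>> would. Returns an empty string if extraction fails.
std::string next_string(std::istream& in) {
    std::string token;
    in >> token;
    return token;
}

// Hypothetical helper: extracts everything up to the next `delim`
// (the delimiter itself is consumed and discarded).
std::string next_field(std::istream& in, char delim) {
    std::string field;
    std::getline(in, field, delim);
    return field;
}
```

With these in place the call from the question becomes muFunc(next_string(szBuffer)); the temporary still exists, but it lives inside the helper rather than in the caller.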

Why are constants in C-header files of libraries always defined as hexadecimal?

No matter which C-compatible library I use, when I look at the constants defined in its headers, they are always written as hexadecimal values. Here, for instance, in GL/gl.h:
#define GL_POINTS 0x0000
#define GL_LINES 0x0001
#define GL_LINE_LOOP 0x0002
#define GL_LINE_STRIP 0x0003
#define GL_TRIANGLES 0x0004
#define GL_TRIANGLE_STRIP 0x0005
#define GL_TRIANGLE_FAN 0x0006
#define GL_QUADS 0x0007
#define GL_QUAD_STRIP 0x0008
#define GL_POLYGON 0x0009
Is there any particular reason for this convention, why not simply use decimal values instead?
There are a number of possible reasons:
1) Bit flags are much easier to express as hex, since each hex digit represents exactly 4 bits.
2) Even for values which aren't explicitly bit flags, there are often intentional bit patterns that are more evident when written as hex.
For instance, all the AlphaFunctions start with 0x02 and differ in only a single byte:
#define GL_NEVER 0x0200
#define GL_LESS 0x0201
#define GL_EQUAL 0x0202
#define GL_LEQUAL 0x0203
#define GL_GREATER 0x0204
#define GL_NOTEQUAL 0x0205
#define GL_GEQUAL 0x0206
#define GL_ALWAYS 0x0207
3) Hex values are allowed to have leading zeroes, so it is easier to line up the values. This can make reading (and proof-reading) easier. You might be surprised that leading zeroes are allowed in hex and octal literals but not decimal, but the C++ spec says quite explicitly
A decimal integer literal (base ten) begins with a digit other than 0 and consists of a sequence of decimal digits.
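To make the leading-zero point concrete: padding a decimal-looking literal with a zero silently turns it into an octal literal, whereas padding a hex literal changes nothing. A small sketch; the constant names are made up for illustration.

```cpp
// The same digits "10" read in three bases. A leading 0 selects octal,
// a leading 0x selects hexadecimal, and neither selects decimal.
constexpr int kDecimal   = 10;     // ten
constexpr int kOctal     = 010;    // eight, not ten
constexpr int kHex       = 0x10;   // sixteen
constexpr int kHexPadded = 0x0010; // still sixteen: padding is harmless in hex

static_assert(kOctal != kDecimal, "a zero-padded 'decimal' is really octal");
static_assert(kHex == kHexPadded, "leading zeroes do not change a hex value");
```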
If the constant values refer to bit flags, and are intended to be combined, then hex notation is a convenient way of showing which bits are set.
For example, from a Boost header:
// Type encoding:
//
// bit 0: callable builtin
// bit 1: non member
// bit 2: naked function
// bit 3: pointer
// bit 4: reference
// bit 5: member pointer
// bit 6: member function pointer
// bit 7: member object pointer
#define BOOST_FT_type_mask 0x000000ff // 1111 1111
#define BOOST_FT_callable_builtin 0x00000001 // 0000 0001
#define BOOST_FT_non_member 0x00000002 // 0000 0010
#define BOOST_FT_function 0x00000007 // 0000 0111
#define BOOST_FT_pointer 0x0000000b // 0000 1011
#define BOOST_FT_reference 0x00000013 // 0001 0011
#define BOOST_FT_non_member_callable_builtin 0x00000003 // 0000 0011
#define BOOST_FT_member_pointer 0x00000020 // 0010 0000
#define BOOST_FT_member_function_pointer 0x00000061 // 0110 0001
#define BOOST_FT_member_object_pointer 0x000000a3 // 1010 0001
It is shorter, but more importantly, if they are bit flags, it is easier to combine them and make masks.
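As a rough illustration of combining flags and building masks, here is a minimal sketch. The flag names and helper functions are hypothetical and are not taken from OpenGL or Boost; each flag occupies one bit, so one hex digit covers four flags.

```cpp
#include <cstdint>

// Hypothetical single-bit flags, written in hex so the bit position is visible.
constexpr std::uint32_t kFlagRead    = 0x01; // 0000 0001
constexpr std::uint32_t kFlagWrite   = 0x02; // 0000 0010
constexpr std::uint32_t kFlagExecute = 0x04; // 0000 0100
constexpr std::uint32_t kFlagMask    = 0xff; // 1111 1111, the low eight bits

// Turn the bits of `flag` on in `value`.
constexpr std::uint32_t set_flag(std::uint32_t value, std::uint32_t flag) {
    return value | flag;
}

// Turn the bits of `flag` off in `value`.
constexpr std::uint32_t clear_flag(std::uint32_t value, std::uint32_t flag) {
    return value & ~flag;
}

// True if every bit of `flag` is set in `value`.
constexpr bool has_flag(std::uint32_t value, std::uint32_t flag) {
    return (value & flag) == flag;
}
```

For example, set_flag(kFlagRead, kFlagWrite) yields 0x03, and 0x1234 & kFlagMask keeps only the low byte, 0x34; both results are easy to check by eye in hex and much less obvious in decimal.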