m4 stops processing if new value contains hash (#)

How can I force m4 to continue processing a line if the value contains a hash?
$ echo a a a | m4 -D a=B+
B+ B+ B+
$ echo a a a | m4 -D a=B#
B# a a
I want identical behavior in the second case, with all three occurrences replaced. Is that possible?
As I understand it, the observed behavior is inconsistent, and I couldn't find an explanation in the user manual.

By default, the # character starts a comment and a newline ends it. m4 parses the first a and replaces it with B#. The expansion is rescanned, so the # starts a comment that runs to the end of the line, and the remaining a's are copied through without being expanded.
The solution is to change the comment delimiters with changecom:
$ echo "changecom(BC,EC)a a a" | m4 -D a=B#
B# B# B#
Of course, you can choose better comment begin and end sequences.
P.S. You can also turn comments off entirely by calling changecom without arguments: echo changecom a a a. This is described in the manual :)
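To avoid editing every input file, the changecom call can be prepended on the fly. The following is only a sketch: m4_nocomment is a hypothetical helper name, and BC and EC are the same arbitrary placeholder delimiters used above, which are assumed never to occur in the input.

```shell
# Hypothetical helper (not part of m4): expand stdin with m4 after
# replacing the default "#" comment delimiter.
# BC and EC are arbitrary placeholders; any strings that never occur
# in the input would do.
m4_nocomment() {
    # Emit the changecom call first, then pass the real input through.
    { printf 'changecom(BC,EC)'; cat; } | m4 "$@"
}
```

Called as printf 'a a a\n' | m4_nocomment -D 'a=B#', this should reproduce the B# B# B# output shown above.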

sed `D` with address range

As explained in the manual, D deletes a portion of the pattern space, up to the first embedded newline. But I cannot find any documentation explaining how D behaves when combined with address ranges. For example:
$ cat /tmp/test
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2,$D'
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
It looks like, if there are 2 or more lines in the pattern space, the portion up to the first embedded newline is deleted.
Is there any official documentation for this?
And why does 2D not work at all?
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2D'
The command above prints nothing.

What does the range-operator in "sed" actually do, is it broken in GNU/busybox?

I wonder whether the GNU and BusyBox implementations of "sed" may be broken.
My default sed implementation is the one from GNU.
POSIX says:
An editing command with two addresses shall select the inclusive
range from the first pattern space that matches the first address
through the next pattern space that matches the second.
But then why does this print
$ { echo ha; echo ha; echo ha; } | sed '0,/ha/ !d'
ha
instead of
ha
ha
? Clearly the 2nd "ha" here is the "next" pattern space which matches, so it should be output as well!
Even stranger,
$ { echo ha; echo ha; echo ha; } | busybox sed '0,/ha/ !d'
does not output anything at all!
But even if sed did what the POSIX definition says, it is still unclear what should happen when a range expression is actually checked.
Does every range condition have its own internal state? Or is there a single global state for all range conditions in a sed script?
Obviously, a range condition needs at least to remember whether it is currently in the "search for a match of the first address"-state or in the "search for a match of the second address"-state. Perhaps it even needs to remember a third state "I have already processed the range and will not match again, no matter what".
It certainly matters when those conditions are updated: Every time a new pattern space is read? Every time the pattern space is modified, say by an s-command? Or just if the control flow reaches a range condition?
So, what is it?
Until I know better, I will avoid range conditions in my sed-scripts and consider them to be a dubious feature.
Two answers:
0 is not a valid POSIX address (lines count from 1)
0,/re/ is a GNU extension
The GNU sed man page includes:
0,addr2
Start out in "matched first address" state, until addr2 is
found. This is similar to 1,addr2, except that if addr2 matches
the very first line of input the 0,addr2 form will be at the end
of its range, whereas the 1,addr2 form will still be at the
beginning of its range. This works only when addr2 is a regular
expression.
Perhaps this will help clarify:
$ { echo ha1; echo ha2; echo ha3; } | sed '0,/ha/ !d'
ha1
$ { echo ha1; echo ha2; echo ha3; } | sed '1,/ha/ !d'
ha1
ha2
$ { echo ha1; echo ha2; echo ha3; } | sed --posix '0,/ha/ !d'
sed: -e expression #1, char 8: invalid usage of line address 0
The busybox code explicitly checks that addr1 is greater than 0, and so never enters the matching state. See the busybox source code, line 1121:
|| (sed_cmd->beg_line > 0
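If the goal of 0,/re/ is simply to keep everything up to and including the first matching line, one portable alternative is the q command, which uses only POSIX features. This is a sketch and not something taken from the answer above; first_through_match is a hypothetical helper name.

```shell
# Print stdin up to and including the first line matching the basic
# regular expression given as $1, then quit.
# Assumption: the pattern contains no "/" characters.
first_through_match() {
    sed "/$1/q"
}

{ echo ha1; echo ha2; echo ha3; } | first_through_match ha
# prints: ha1
```

Unlike 0,/re/ !d, this stops reading input after the match, so it does not suit scripts that need to keep processing later lines.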
Each match maintains its own state, as multiple can be active simultaneously.
POSIX says:
An editing command with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second. (If the second address is a number less than or equal to the line number first selected, only one line shall be selected.) Starting at the first line following the selected range, sed shall look again for the first address. Thereafter, the process shall be repeated.
The test happens each time it is encountered:
$ { echo ..a; echo ..b; echo ..c; } |\
sed -n '
=;
y/cba/ba:/;
1 ,/b/ s/$/ 1/p;
/a/,/c/ s/$/ 2/p;
2, 3 s/$/ 3/p;
'
1
..: 1
2
..a 1
..a 1 2
..a 1 2 3
3
..b 1
..b 1 2
..b 1 2 3
This is also demonstrated by, for example, the busybox source code - see the sed_cmd_s typedef.
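A smaller sketch of the same point, that each range keeps its own state: two overlapping ranges in one script, each printing the lines it selects. The helper name overlapping_ranges is made up for illustration.

```shell
# Two independent ranges in one script. Lines selected by both
# ranges are printed twice, once by each p command.
overlapping_ranges() {
    sed -n -e '/a/,/c/p' -e '/b/,/d/p'
}

printf '%s\n' a b c d e | overlapping_ranges
# prints, one per line: a b b c c d
```

The first range is already active when the second one starts at b, and it ends at c while the second is still active, which would not be possible with a single shared state.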

How to use number flags in sed

I have read the sed info manual. In chapter 3.5, "The s Command", there is this description:
The s command can be followed by zero or more of the following flags:
number
Only replace the numberth match of the regexp.
Note: the posix standard does not specify what should happen when you mix
the g and number modifiers, and currently there is no widely agreed
upon meaning across sed implementations. For GNU sed, the interaction
is defined to be: ignore matches before the numberth, and then match
and replace all matches from the numberth on.
I do not know how to use it. Can someone give an example?
echo a1 | sed -n 's/\(a\)1/\13/p'
The result is no different from
echo a1 | sed -n 's/\(a\)1/\13/1p'
Try this:
echo "hi hi hi" | sed 's/hi/hello/2'
echo "hi hi hi" | sed 's/hi/hello/3'
The number obviously only makes sense when there is more than one match.
sed 's/a/b/4' <<<aaaaa
aaaba
If there isn't a fourth match, obviously, no substitution takes place.
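As an illustration of that last point, combining the number flag with -n and the p flag prints only the lines on which the numbered match actually exists. This is a minimal sketch; nth_only is a hypothetical helper name.

```shell
# Replace only the 2nd "hi" on each line and print just the lines
# where that substitution happened.
nth_only() {
    sed -n 's/hi/hello/2p'
}

printf 'hi hi hi\nhi\n' | nth_only
# prints: hi hello hi
```

The second input line has only one match, so nothing is substituted there and -n suppresses it.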

How to assign number for a repeating pattern

I am doing some calculations using gaussian. From the gaussian output file, I need to extract the input structure information. The output file contains more than 800 structure coordinates. What I did so far is, collect all the input coordinates using some combinations of the grep, awk and sed commands, like so:
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"}1' | sed '/--/d' > test.out
This helped me to grep all the input coordinates and insert a line with "structure number". So now I have a file that contains a pattern which is being repeated in a regular fashion. The file is like the following:
structure Number
4.176801 -0.044096 2.253823
2.994556 0.097622 2.356678
5.060174 -0.115257 3.342200
structure Number
4.180919 -0.044664 2.251182
3.002927 0.098946 2.359346
5.037811 -0.103410 3.389953
Here, "structure Number" is repeated. I want to append an increasing number, such as "structure number 1", "structure number 2", and so on.
How can I solve this problem?
Thanks for your help in advance.
I am not familiar at all with a program called gaussian, so I have no clue what the original input looked like. If someone posts an example I might be able to give an even shorter solution.
However, as far as I understand, the OP is happy with the output of their code, except that they want to append an increasing number to the lines inserted with awk.
This can be achieved with the following line (adjusting the OP's code):
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"++i}1' | sed '/--/d' > test.out
Addendum:
Even without knowing the actual input, I am sure that one can at least get rid of the sed command leaving that piece of work to awk. Also, there is no need to quote a single character grep pattern:
grep -A 7 "Input orientation:" test.log | grep -A 5 C | awk '/C/{print "structure number"++i}!/--/' > test.out
I am not sure since I cannot test, but it should be possible to let awk do the grep's work, too. As a first guess I would try the following:
awk '/Input orientation:/{li=7}!li{next}{--li}/C/{print "structure number"++i;lc=5}!lc{next}{--lc}!/--/' test.log > test.out
While this might be a little bit longer in code it is an awk-only solution doing all the work in one process. If I had input to test with, I might come up with a shorter solution.
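If test.out has already been generated in the form shown in the question, the headers can also be numbered after the fact. This is a sketch under the assumption that every header line starts with the word "structure" and no coordinate line does; number_structures is a hypothetical helper name.

```shell
# Replace each header line with a numbered one and pass all other
# lines through unchanged.
number_structures() {
    awk '/^structure/ { print "structure number " ++n; next } { print }'
}

number_structures <<'EOF'
structure Number
4.176801 -0.044096 2.253823
structure Number
4.180919 -0.044664 2.251182
EOF
# prints:
# structure number 1
# 4.176801 -0.044096 2.253823
# structure number 2
# 4.180919 -0.044664 2.251182
```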

Insert between each occurrence of two characters

If I can have somewhere in my input a series of two or more characters (in my case, >), how can I insert something between each occurrence of >?
For example: >> to >foo>, but also:
>>> to >foo>foo> and:
>>>> to >foo>foo>foo>.
Using 's/>>/>foo>/g' gives me of course >foo>>foo>, which is not what I need.
In other words, how can I push a character back to the pattern space, or match a character without consuming it (does that make any sense?)
Using Perl, you can do it iteratively
$ echo '>>>>' | perl -pe 's/>>/>foo>/ while />>/'
>foo>foo>foo>
or use a look-ahead assertion, which does not consume the 2nd >
$ echo '>>>>' | perl -pe 's/>(?=>)/>foo/g'
>foo>foo>foo>
This should also work
sed ':b; s/>>/>foo>/; tb'
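Some sed implementations do not accept a label followed by ; on the same line, so a variant with one -e per command may be more portable. A minimal sketch, with insert_between as a hypothetical helper name:

```shell
# Repeat the substitution until no ">>" is left: "t" branches back
# to label "b" whenever the preceding s command changed something.
insert_between() {
    sed -e ':b' -e 's/>>/>foo>/' -e 'tb'
}

echo '>>>>' | insert_between
# prints: >foo>foo>foo>
```

The loop ends because each pass removes one adjacent pair of > and the replacement never creates a new one.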