sed n doesn't seem to work quite the way I thought it would

I was trying to copy an example I found here : http://www.grymoire.com/Unix/Sed.html#uh-35a
here is the sed pattern
/^begin$/,/^end$/{
/begin/n
/end/!d
}
here's the first file
begin
one
end
last line
and here's the second
begin
end
last line
When I run the sed script on the first file, it deletes what's between begin and end, and all is well. When I run it on the second, it appears to miss the "end" and deletes the rest of the file.
running on first file
$ sed -f x.sed a
begin
end
last line
running on second
$ sed -f x.sed b
begin
end
Notice how "last line" is missing from the second run.
I thought that n would print the current pattern space and read in the next line, which would then reach the /end/ command and be processed there.
As it is, it seems to be causing the end of the range to be missed. Can somebody explain what is happening?

It should be:
/^begin$/,/^end$/{
/^begin$\|^end$/!d
}
Why was your command wrong?
The n command is the problem. In the second example, sed does the following:
1. begin ---> the range starts; n prints begin and reads the next line. Important: a line read by n is not checked against the range address (begin,end).
2. end ---> /end/! does not apply, so the line is not deleted. But the range address never saw this line, so the range stays open.
3. last line ---> /end/! applies, so the line is deleted. sed is still searching for a line that matches end, because the n command consumed that line.
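As a quick check of the explanation above, the following sketch runs both the original script and the suggested fix on the second file's contents. It assumes GNU sed (the \| alternation is a GNU extension in basic regular expressions); the variable names are only illustrative.

```shell
# Original script from the question: n swallows the "end" line.
broken='/^begin$/,/^end$/{
/begin/n
/end/!d
}'

# Suggested fix: no n, so every line is seen by the range address.
fixed='/^begin$/,/^end$/{
/^begin$\|^end$/!d
}'

# Contents of the second file: begin is immediately followed by end.
broken_out=$(printf '%s\n' begin end 'last line' | sed "$broken")
fixed_out=$(printf '%s\n' begin end 'last line' | sed "$fixed")

printf 'original script:\n%s\n\n' "$broken_out"
printf 'fixed script:\n%s\n' "$fixed_out"
```

With the original script "last line" is dropped because the range never closes; with the fix all three lines survive.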

I found another way around this after #hek2mgl's help: I can add a branch around the second statement. I actually need this because I want to see the begin label, so you can also do this:
/^begin$/,/^end$/{
/begin/{ b skip }
/end/!d
:skip
}

I think you were close to getting it to do what you wanted. When you want to delete the line after a match, you simply need to pull it in with sed's n command and then delete it with d.
It looks like you want to drop the line that follows the begin line, unless that line is end, and print all the other lines.
If so, the following should suffice:
/^begin$/,/^end$/{
/begin/{n;/end/!d}
}
It works by deleting the line that follows begin, unless that line contains end (/end/!).
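A small sketch of what this variant does, assuming GNU sed and a made-up input with two lines between begin and end. It only removes the single line that follows begin, so "two" is kept.

```shell
script='/^begin$/,/^end$/{
/begin/{n;/end/!d}
}'

# Hypothetical input: two lines between the markers instead of one.
out=$(printf '%s\n' begin one two end 'last line' | sed "$script")
printf '%s\n' "$out"
```

If every line between the markers should go, the branch-based script or the single /^begin$\|^end$/!d test is the better fit.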
Also see: sed or awk: delete n lines following a pattern

Related

Trying to understand why sed emulating rev loops on same line until all reversed

sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
The above command reverses each line, emulating rev.
As I understand it, sed is a line editor: it executes all the commands separated by ; on one line and then reads the next line.
So why, after executing all the commands once, does it keep looping until every character is reversed?
I already know how the command itself works; that is not my question. What I can't understand is why sed keeps re-executing the commands instead of running them once per line.
eg.
echo 'Hey i am fine' | sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
Should be :
Pattern space
-->(1st action) Hey i am fine\n
--> (2nd action) Hey i am fine\ney i am fine\nH
-->(3rd action) ey i am fine\nH
-->(4th action) At this point it should execute s/.// and exit or read the next line
But why is the 1st action repeated after the 3rd action?
It seems that as long as the pattern space is not empty, sed does not read the next line and keeps repeating the cycle on the same pattern space.
The D command seems to be doing this.
But why, after the first D, is s/.// not executed, and the script restarted from the beginning instead?
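The looping can be made visible with a sketch like the one below, which assumes GNU sed. It is the same one-liner run with -n, plus an l command (print the pattern space unambiguously) after the G step and an explicit p at the end. Whenever D finds a newline, it removes the text up to that newline and restarts the script from the top without reading input, so s/.// is only reached on the final pass, when the s command no longer matches and //D is skipped.

```shell
# Each line ending in "$" is one pass through the script, as shown by l.
trace=$(printf 'abc\n' |
  sed -n '/\n/!G;l;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//;p')
printf '%s\n' "$trace"
```

The trace should show four passes over the same input line, with the reversed text growing after the newline, followed by the final result cba.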

SED - using $ inserts string at beginning of line instead of end

I am not getting the expected results from sed 's/$/|2021-07-21/' demotoytable.csv
Before the command, the top lines look like:
urlhm|main_code|description|taxable|itemnum|xtras
t3mr.com/guitar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522831|bag
t3mr.com/guitar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522835|box
t3mr.com/guitar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522839|case
But after running the command sed 's/$/|2021-07-21/' demotoytable.csv
I get this result:
|2021-07-21code|description|taxable|itemnum|xtras
|2021-07-21itar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522831|bag
|2021-07-21itar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522835|box
|2021-07-21itar/qrc/G19RTE000000753|G19RTE0000007530|Promo_labor_day_006|Consignment|7522839|case
Any ideas on why this is happening, or better yet, how to fix it? I want each line to end with "|2021-07-21", not begin with it. I'm on a Mac Pro running Big Sur.
Thanks
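The file has Windows-style line endings (CRLF), so each line ends with a carriage return. s/$/.../ appends the text after that carriage return, and the terminal then moves the cursor back to the start of the line and overwrites what was printed there. Remove the carriage returns first and then append the text:
sed 's/\r$//; s/$/|2021-07-21/' demotoytable.csv
s/\r$// removes a carriage return at the end of each line, and s/$/|2021-07-21/ then appends the value of your choice at the end of each line. Note that \r is understood by GNU sed; the BSD sed that ships with macOS may not recognize it, in which case you can pass a literal carriage return from bash or zsh with sed $'s/\r$//; s/$/|2021-07-21/' demotoytable.csv.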
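A portable way to try this without relying on \r support is to put a literal carriage return into a shell variable. This is only a sketch: the two input lines are made up, and cr is an illustrative variable name.

```shell
# A literal carriage return; command substitution only strips newlines.
cr=$(printf '\r')

# Two made-up CRLF-terminated lines stand in for the CSV file.
fixed=$(printf 'a|b\r\nc|d\r\n' | sed "s/${cr}\$//; s/\$/|2021-07-21/")
printf '%s\n' "$fixed"
```

Because the carriage return is removed before the append, the date lands at the visible end of each line.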

A way to append the beginning of every line before a pattern to the end of each same line?

I am trying to copy the beginning of every line in a text file before a certain character to the end of the same line.
I've tried duplicating each line onto the end of itself and then deleting everything after the character. The trouble is that I haven't been able to figure out how to skip the first instance of the character, so the duplicated text gets deleted along with everything beyond the first instance of the character.
I've tried things like
sed '/S/ s/$/ append text/' sample.txt > cleaned.txt
but this only adds a fixed text. I've also tried using:
s/\(.*\)/\1 \1/
to duplicate the line, and then deleting everything after the S, but I can't figure out how to get it to go to the 3rd S not the 1st to start deleting.
What I have to start with:
dog 50_50_S5_Scale
cat 10_RV_17_S76_Scale
mouse 15_EQ_S81_Scale
What I'm trying to get:
dog 50_50_S5_Scale dog 50_50_
cat 10_RV_17_S76_Scale cat 10_RV_17_
mouse 15_EQ_S81_Scale mouse 15_EQ_
Where everything before the first S gets copied to the end of the line.
You may use
sed 's/\([^S]*\)S.*/& \1/' file
Details
\([^S]*\) - Capturing group 1 (\1): any 0+ chars other than S
S.* - S and the rest of the string (actually, line, since sed processes line by line by default).
The replacement is the concatenation of the whole match (&), space and Group 1 value.
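To see the substitution at work, here is a short sketch that feeds two sample lines through the same sed command (the input is inlined with printf rather than read from a file):

```shell
out=$(printf '%s\n' 'dog 50_50_S5_Scale' 'mouse 15_EQ_S81_Scale' |
  sed 's/\([^S]*\)S.*/& \1/')
printf '%s\n' "$out"
```

Each line is kept whole (&) and followed by a space and whatever came before its first S.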
You could try:
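awk '{print $0 " " substr($0, 1, index($0,"S") - 1)}' file
We take the substring from the first character up to but not including the first occurrence of "S". (awk strings are indexed from 1; with a start of 0, some awk implementations return one character fewer.)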
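The awk approach can be checked the same way. This sketch uses a start index of 1 for substr, which behaves the same across awk implementations, and inlines two sample lines:

```shell
out=$(printf '%s\n' 'dog 50_50_S5_Scale' 'mouse 15_EQ_S81_Scale' |
  awk '{ print $0 " " substr($0, 1, index($0, "S") - 1) }')
printf '%s\n' "$out"
```

index returns the 1-based position of the first S, so the length passed to substr is exactly the number of characters before it.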

Explain this sed conditional branching behavior

I have the following (gnu) sed script, which is intended to parse another sed script, and output distinct commands on a separate line.
In words, this script should put a newline after each semicolon ;, except semicolons that are inside a matching or substitution command.
Sed script:
#!/bin/sed -rf
# IDEA:
# replace ';' by ';\n' except when it's inside a match expression or subst. expression.
# Ignored patterns:
/^#/b # commented lines
/^$/b # empty lines
# anything in a single line, without semicolon except at the end
/^[^\n;]*;?$/b
# Processed patterns (put on separate lines):
# Any match preceding a semicolon, or the end of the line, or a substitution
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
s/^\\(.)[^\1]+\1[^;s]*;?/&\n/;t printtopline
# Any substitution (TODO)
# Any other command, separated by semicolon
s/\;/\;\n/; t printtopline;
:printtopline
P;D; # print top line, delete it, start new cycle
For example, I tested it with the following file (actually adapted from an answer of #ctac_ to one of my previous sed questions):
Input file:
#!/bin/sed -f
#/^>/N;
:A;
/\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label
#h;
#s/\(.*\)\n.*/\1/p;
s/^>//g;P
#x;
#s/.*\n//;
D
bA;
Output
The above script produces the right output, for example, the line /\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label becomes:
/\n>/!{s/\n/ /;
N;
bA};
# join next line if not a sequence label
Question
However, could you help me understand why this part of the script works:
s/\;/\;\n/; t printtopline;
:printtopline
?
It seems to me that the branching command t printtopline is useless here. I thought that whether or not the substitution succeeded, the next thing to be executed would be :printtopline.
However, if I comment out the t command, or if I replace it with b, the script produces the following output lines:
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
From info sed, here is the explanation of t:
't LABEL'
Branch to LABEL only if there has been a successful 's'ubstitution
since the last input line was read or conditional branch was taken.
The LABEL may be omitted, in which case the next cycle is started.
Why doesn't a t command that is immediately followed by its label behave like no command at all, or like the b command?
The crucial part is this:
Branch to label only if there has been a successful substitution since the last input line was read or conditional branch was taken.
I.e. t looks into the past and takes into account the success of all recent substitutions, going back to the most recent input or the most recent conditional branch taken, whichever came later.
Consider the input line you're asking about. After all the substitutions we have
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
in our pattern space when we reach P;D;. The P command outputs the first line, then D deletes the first line and restarts the main loop. Now we just have
N;bA}; # join next line if not a sequence label
Note that this didn't involve reading any additional lines. No input occurred; D just removed parts of the pattern space.
We process the remaining text (which does nothing because none of the other patterns match) until we reach this part of the code:
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
The substitution fails (the pattern space doesn't contain /^). But the t command doesn't check the status of just this one s command; it looks at the history of all substitutions since the most recent input or conditional branch taken.
The most recent input occurred when /\n>/!{s/\n/ /;N;bA}; was read.
The most recent conditional branch taken was
s/\;/\;\n/; t printtopline;
:printtopline
in the original version of your code. Since then no other substitution succeeded, so the t command does nothing. The rest of the program continues as expected.
But in the modified version of your code there was no conditional branch at this point (b is an unconditional branch):
s/\;/\;\n/; b printtopline;
:printtopline
That means the t from s_/^[^/]+/[^;s]*;?_&\n_; t printtopline "sees" the s/\;/\;\n/; as having succeeded, so it immediately jumps to the P;D; part. This is what outputs
N;bA}; # join next line if not a sequence label
unmodified.
In summary: t makes a difference here not because of its immediate effect of jumping to a label, but because it serves as a dynamic delimiter for the next t that gets executed. Without t here, the previously executed s command is taken into account for the next t.
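The effect can be reproduced with a much smaller script. The sketch below assumes GNU sed; the input a;b;c and the dummy first substitution s/^Z// (which never matches) are made up for illustration. The only difference between the two runs is the t after the second s command.

```shell
input='a;b;c'

# With t after the successful substitution: the flag is cleared each time.
with_t=$(printf '%s\n' "$input" |
  sed -e 's/^Z//; t print' -e 's/;/;\n/; t print' -e ':print' -e 'P;D')

# Without it: the flag survives the D restart, and the first t jumps.
without_t=$(printf '%s\n' "$input" |
  sed -e 's/^Z//; t print' -e 's/;/;\n/' -e ':print' -e 'P;D')

printf 'with t:\n%s\n\n' "$with_t"
printf 'without t:\n%s\n' "$without_t"
```

In the second run, the restart caused by D reaches the first t with the flag still set, so the script jumps straight to P;D and b;c is printed without being split.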
Part 1 - how the P;D; sequence works.
Compare the output of these two commands: sed 's/;/;\n/' and sed 's/;/;\n/; P;D;'.
First:
$ sed 's/;/;\n/' <<< 'one;two;three;four'
one;
two;three;four
Second:
$ sed 's/;/;\n/; P;D;' <<< 'one;two;three;four'
one;
two;
three;
four
Why the difference? Let me explain.
The first command substitutes only the first occurrence of the ; character. To substitute all occurrences, the g modifier should be added to the s command: sed 's/;/;\n/g'.
The second command works this way:
s/;/;\n/; - the same as the first command, no difference. Before this command the pattern space is one;two;three;four; after it, one;\ntwo;three;four.
P; - from man: "Print up to the first embedded newline of the current pattern space."
That is, it prints up to the first newline: one;. The pattern space stays unchanged: one;\ntwo;three;four
D; - from man: "If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input."
In our case the pattern space contains a newline: one;\ntwo;three;four. D; removes the one;\n part and repeats the whole command cycle from the beginning. Now the pattern space is two;three;four.
Then s/;/;\n/; runs again, giving the pattern space two;\nthree;four; P; prints two; and leaves the pattern space unchanged; D; removes two;\n, so the pattern space becomes three;four. And so on.
Part 2 - what happens with branching.
I looked at the sed source code and found the following:
When the s command executes and finds a match, the replaced flag is set to true:
/* We found a match, set the 'replaced' flag. */
replaced = true;
The t command branches only if the replaced flag is true, and it then resets the flag to false:
case 't':
if (replaced)
{
replaced = false;
So in the first case, s/\;/\;\n/; t printtopline;, the substitution succeeds, so the replaced flag is set to true. The following t command then runs and resets the flag to false.
In the second case, without the t command (s/\;/\;\n/;), the substitution succeeds too, so the replaced flag is again set to true.
But now the flag carries over into the next cycle, which is started by the D command. So when the first t command in the new cycle is reached (s_/^[^/]+/[^;s]*;?_&\n_; t printtopline), it checks the replaced flag, sees that it is true, and jumps to the label :printtopline, skipping all the commands before the label.
The pattern space has no newlines, so the P;D; sequence just prints the pattern space and starts the next cycle with a new line of input.

manipulating sed context

Here is my one-liner:
sed -n '/BEGIN/,/END/{$d;1d;p}' query
And query:
trash
BEGIN first
labas
END
nieko nėra
BEGIN second
iki
END
nesimato
I expect this result:
labas
iki
However, I get this:
BEGIN first
labas
END
BEGIN second
iki
END
What do I misunderstand about sed context? Shouldn't {$d;1d;p} delete the first and last line of the matching input?
No. The addresses 1 and $ refer to line positions in the whole input, not within the matched range, so it deletes any line of the matching input that is the first or last line of the file. You can see the effect if you remove the first two lines of query (so that the first line is "BEGIN").
This might work for you:
sed -n '/BEGIN/,/END/{//!p}' file
labas
iki
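For reference, here is a sketch of that command run on inlined input resembling the query file (one line of the original is replaced with an ASCII stand-in). The empty regex // reuses the last regular expression that was applied, which inside the range is /BEGIN/ on the opening line and /END/ on every later line, so //!p prints everything except the two marker lines.

```shell
out=$(printf '%s\n' trash 'BEGIN first' labas END 'nothing here' \
  'BEGIN second' iki END nesimato |
  sed -n '/BEGIN/,/END/{//!p}')
printf '%s\n' "$out"
```

This avoids line-number addresses altogether, so it works no matter where the blocks sit in the file.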