how to delete lines connected with "+" signs with sed - sed

In this example, "+" sign means it connects the previous line and the current line. So I like to delete a specific group of lines that are connected by "+".
For example, I'd like to remove from 1st line to 4th line(.groupA ~ + G H I). Please help me on how to do it with sed.

To delete lines starting with .groupA and all consecutive +-prefixed lines, one easy to understand approach is:
sed '/\.groupA/,/^[^+]/ { /\.groupA/d; /\.groupA/!{/^\+/d} }' file
We first select everything between .groupA and the first non +-prefixed line (inclusive), then for that selection of lines, we delete the first line (containing .groupA), and of the remaining lines, we delete all with + prefix.
Note you need to escape regex metacharacters (like . and +) if you want to match them literally.
A little bit more advanced, but more elegant (only one use of starting block pattern) approach uses a loop to skip the first line of the matched block, and all the following lines that start with +:
sed -n '/\.groupA/ { :a; n; s/^\+//; ta }; p' file

IMHO this is more readily done with awk, but kindly just ignore if that is not an option for you.
So, every time I see a line starting with .groupA, I set a flag d to say I am deleting, and then skip to the next line. If I see a line starting with a + and I am currently deleting, I skip to the next line. If I see anything else, I change the flag to say I am no longer deleting and print the line:
awk '/^\.groupA/ {d=1; next}
/^+/ && d==1 {next}
{d=0; print}' file
Sample Output
** Example **
abcdef ghijkl
.groupB abc def
+ JKL
+ MNO
+ GHI
opqrst vwxyz
You can cast it as a one-liner like this:
awk '/^\.groupA/{d=1; next} d==1 && /^+/ {next} {d=0;print}' file

Related

sed/awk conditionally delete lines from the start and end of a file

I have several thousand text files which might start with
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
perl is also ok
my attempt would be something like this with fish shell. awk is probably more performant though
if head -1 | grep \"
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
that might actually work I'm going to try it
The simplest way to do this is a 2-pass approach where on the first pass you figure out the beginning and ending line numbers for the "good" lines and on the second you print the lines between those numbers:
awk '
NR==FNR { if (NF && !/^"$/) { if (!beg) beg=NR; end=NR } next }
(beg <= FNR) && (FNR <= end)
' file file
For example given this input:
$ cat file
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
We can do the following using any awk in any shell on every UNIX box:
$ awk 'NR==FNR{if (NF && !/^"$/) {if (!beg) beg=NR; end=NR} next} (beg <= FNR) && (FNR <= end)' file file
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
You can use ed to do it in a single pass, too:
Something like
printf '%s\n' '1g/^"$/.,/^./-1d' '$g/^"$/?^.?+1,$d' w | ed -s "$file"
Translated: If the first line is nothing but a quote, delete it and any following empty lines. If the last line is nothing but a quote, delete all preceding empty lines and it. Finally write the file back to disk.
This might work for you (GNU sed):
sed '1{/^"$/d};/\S/!d;:a;${/^"$/Md};/\S/{n;ba};$d;N;ba' file
Delete the first line if contains a single ".
Delete all empty lines from the start of the file.
Form a loop for the remainder of the file.
Delete the last line(s) if it/they contains a single ".
If the current line(s) is/are not empty, print it/them, fetch the next and repeat.
If the current line(s) is/are the last and empty, delete it/them.
The current line(s) is/are empty so append the next line and repeat.
N.B. This is a single pass solution and allows for empty lines within the body of the file.
Alternative, memory intensive:
sed -Ez 's/^"?\n+//;s/\n+("\n)?$/\n/' file
In addition to the two-pass processing, here's a one-pass:
awk '!/^"*$/{print b $0;f=1;b=""} f&&/^"*$/{b=b $0 ORS}' file
The program consists of two small parts:
Whenever there's content (lines that contain more than "), print possibly buffered lines and the current input line, set a flag that content has started, and clear the buffer.
If content had started (f), but the current line doesn't contain any content, we may have reached the end, so we buffer these empty lines. Later, (1) will print them or they will be discarded on EOF.

Can't replace '\n' with '\\' for whatever reason

I have a whole bunch of files, and I wish to change something like this:
My line of text
My other line of text
Into
My line of text\\
My other line of text
Seems simple, but somehow it isn't. I have tried sed s,"\n\n","\\\\\n", as well as tr '\n' '\\' and about 20 other incarnations of these commands.
There must be something going on which I don't understand... but I'm completely lost as to why nothing is working. I've had some comical things happening too, like when cat'ing out the file, it doesn't print newlines, only writes over where the rest was written.
Does anyone know how to accomplish this?
sed works on lines. It fetches a line, applies your code to it, fetches the next line, and so forth. Since lines are treated individually, multiline regexes don't work quite so easily.
In order to use multiline regexes with sed, you have to first assemble the file in the pattern space and then work on it:
sed ':a $!{ N; ba }; s/\n\n/\\\\\n/g' filename
The trick here is the
:a $!{ N; ba }
This works as follows:
:a # jump label for looping
$!{ # if the end of the input has not been reached
N # fetch the next line and append it to what we already have
ba # go to :a
}
Once this is over, the whole file is in the pattern space, and multiline regexes can be applied to it. Of course, this requires that the file is small enough to fit into memory.
sed is line-oriented and so is inappropriate to try to use on problems that span lines. You just need to use a record-oriented tool like awk:
$ awk -v RS='^$' -v ORS= '{gsub(/\n\n/,"\\\\\n")}1' file
My line of text\\
My other line of text
The above uses GNU awk for multi-char RS.
Here is an awk that solve this:
If the the blank lines could contains tabs or spaces, user this:
awk '!NF{a=a"//"} b{print a} {a=$0;b=NF} END {print a}' file
My line of text//
My other line of text
If blank line is just blank with nothing, this should do:
awk '!NF{a=a"//"} a!=""{print a} {a=$0} END {print a}' file
This might work for you (GNU sed):
sed 'N;s|\n$|//|;P;D' file
This keeps 2 lines in the pattern space at any point in time and replaces an empty line by a double slash.

How to remove empty lines to one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I used sed, gawk, grep but they dont work. :(. How can I do now? Thanks.
Myfile: Desired file:
a a
b b
c c
. .
d d
e e
f f
g g
. .
h
i
h j
i k
j .
k
.
You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or the compressed one liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses multi-blank-lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2
3
4
5
6
7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2
3
4
5
6
7
8
9
10
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'
To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2
Here's one way using sed:
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
The Perl solution is even shorter:
perl -00 -pe '' file
You could do like this also,
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
RS="\0" Once we set the null character as Record Seperator value, awk will read the whole file as single record.
gsub(/\n\n+/,"\n\n"); this replaces one or more blank lines with a single blank line. Note that \n\n regex matches a blank line along with the previous line's new line character.
Here is an other awk
awk -v p=1 'p=="" {p=1;next} 1; {p=$0}' file

How to find and replace every match except the first using sed?

I am using sed to find and replace text, e.g.:
set -i 's/a/b/g' ./file.txt
This replaces every instance of a with b in the file. I need to add an exception, such that sed replaces every instance of a with b, except for the first appearance in the file, e.g.:
There lived a bird who liked to eat fish.
One day he fly to a tree.
This becomes:
There lived a bird who liked to ebt fish.
One dby he fly to b tree.
How can I modify my sed script to only replace every instance of a with b, except for the first occurrence?
I have GNU sed version 4.2.1.
This might work for you (GNU sed):
sed 's/a/b/2g' file
or
sed ':a;s/\(a[^a]*\)a/\1b/;ta' file
This can be taylored e.g.
sed ':a;s/\(\(a[^a]*\)\{5\}\)a/\1b/;ta' file
will start replacing a with b after 5 a's
You can do a more complete implementation with a script that's more complex:
#!/bin/sed -nf
/a/ {
/a.*a/ {
h
s/a.*/a/
x
s/a/\n/
s/^[^\n]*\n//
s/a/b/g
H
g
s/\n//
}
: loop
p
n
s/a/b/g
$! b loop
}
The functionality of this is easily explained in pseudo-code
if line contains "a"
if line contains two "a"s
tmp = line
remove everything after the first a in line
swap tmp and line
replace the first a with "\n"
remove everything up to "\n"
replace all "a"s with "b"s
tmp = tmp + "\n" + line
line = tmp
remove first "\n" from line
end-if
loop
print line
read next line
replace all "a"s with "b"s
repeat loop if we haven't read the last line yet
end-loop
end-if
One way is to replace all and then reverse the first replacement (thanks potong):
sed -e 'y/a/\n/' -e 's/\n/a/g' -e 'y/\n/b/'
Newline serves as an intermediate so strings beginning with b work correctly.
The above works line-wise, if you want to apply it to the whole file, first make the whole file into one line:
<infile tr '\n' '^A' | sed 'y/a/\n/; s/\n/a/; y/\n/b/' | tr '^A' '\n'
Or more briefly using the sed command from potong's answer:
<infile tr '\n' '^A' | sed 's/a/b/2g' | tr '^A' '\n'
Note ^A (ASCII 0x01) can be produced with Ctrl-vCtrl-a. ^A in tr can be replaced by \001.
This assumes that the file contains no ^A.

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceeded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceeding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g replace newlines preceeding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands were desired for other manipulations of the data you can simply stick them in before 1!H, while sed -z is immediately loading the entire file into the pattern space. That means you aren't manipulating single lines at any point. Same for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3