Remove empty line before a pattern using sed

Context
E.g. I have this test file foo.py:
#!/usr/bin/env python3
'''foo'''
# comment
import ...
# [END import]
import ...
# [END import]
import ...
# [END import]
# [END import]
import even...
# [END import]
# [END import]
import odd...
# [END import]
# [END import]
Expected
I would like to remove the blank line before each # [END import] line:
#!/usr/bin/env python3
'''foo'''
# comment
import ...
# [END import]
import ...
# [END import]
import ...
# [END import]
# [END import]
import even...
# [END import]
# [END import]
import odd...
# [END import]
# [END import]
Can someone give me a working version using sed and/or explain why the following attempts didn't work?
Test 0
sed '$!N;s/^\n\(# \[END\)/\1/g' foo.py
Observed
#!/usr/bin/env python3
'''foo'''
# comment
import ...
# [END import]
import ...
# [END import]
import ...
# [END import]
# [END import]
import even...
# [END import]
# [END import]
import odd...
# [END import]
# [END import]
Here only the "even" lines changed, since N consumes two lines at a time
and never goes back to re-examine the second one.
Test 1
sed ':r;$!{N;br};s/^\n\(# \[END\)/\1/g' foo.py
Observed
Nothing changes. I don't understand why this is not working (i.e. why the pattern is not matched).
Test 2
The same command without the ^ anchor:
sed ':r;$!{N;br};s/\n\(# \[END\)/\1/g' foo.py
Observed
#!/usr/bin/env python3
'''foo'''
# comment
import ...
# [END import]
import ...
# [END import]
import ...# [END import]# [END import]
import even...
# [END import]
# [END import]
import odd...
# [END import]
# [END import]
Notice the doubled # [END on the same line: that is what the command asks for, but it is not the result I want.
Test 3
sed ':r;$!{N;br};s/\n\(\n# \[END\)/\1/g' foo.py
Observed
This works as expected, but I can't figure out why it can match \n\n, i.e. two consecutive newlines.

You'll need to add the m flag for Test 1, so that the ^ and $ anchors match at the start and end of every line; otherwise they match only at the start and end of the entire string. This assumes your implementation supports the m flag, as GNU sed does.
sed ':r;$!{N;br};s/^\n\(# \[END\)/\1/mg'
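The effect of the m flag can be checked outside sed. The following is a minimal sketch in Python, whose re module draws the same distinction between string anchors and line anchors; the sample text is made up for illustration and is not the asker's file.

```python
import re

# Hypothetical sample: one blank line before the marker.
text = "import a\n\n# [END import]\n"
pattern = r"^\n(# \[END)"

# Without MULTILINE, ^ matches only at the very start of the string,
# so the blank line in the middle is never found.
unchanged = re.sub(pattern, r"\1", text)

# With MULTILINE, ^ also matches right after every newline.
fixed = re.sub(pattern, r"\1", text, flags=re.MULTILINE)

print(repr(unchanged))
print(repr(fixed))
```

This is the same situation as Test 1: once the whole file is in one buffer, an unflagged ^ can only ever match at offset 0.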
Test 3 works because the line before the empty line ends with a newline of its own, so an empty line shows up as two consecutive newlines. The example below might help you visualize it:
$ printf 'a\nb\nc\n'
a
b
c
$ printf 'a\nb\n\nc\n'
a
b

c
With perl:
perl -0777 -pe 's/\n\K\n(?=# \[END)//g'
-0777 will slurp the entire input as a single string
\n\K\n(?=# \[END) will match a newline provided there's a newline character before it and # [END after it
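For comparison, here is a rough equivalent of the slurp-and-substitute approach in Python. Python's re module has no \K, so this sketch swaps in a lookbehind instead; the function name and sample text are hypothetical.

```python
import re

def drop_blank_before_end(text: str) -> str:
    # (?<=\n) requires a newline just before, (?=# \[END) requires the
    # marker just after; only the newline in between is removed.
    return re.sub(r"(?<=\n)\n(?=# \[END)", "", text)

# Made-up input: the first marker has a blank line before it, the second does not.
sample = "import a\n\n# [END import]\nimport b\n# [END import]\n"
print(drop_blank_before_end(sample), end="")
```

Because both neighbours are only asserted and never consumed, the marker line itself is left untouched.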
Another alternative with GNU sed, which doesn't need to read the whole file in one go:
sed '/^$/{N; s/\n\(# \[END\)/\1/; P; D}'
/^$/ matches an empty line
N appends the next line to the pattern space
s/\n\(# \[END\)/\1/ removes the newline if the regexp matches
P and D are crucial here, so I'll quote the manual:
P Print out the portion of the pattern space up to the first newline.
D If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input.

This might work for you (GNU sed):
sed 'N;/^\n# \[END import\]/!P;D' file
Open a window of 2 lines that slides through the file; if the first line is empty and the second line is # [END import], do not print the first line.
N.B. The idiom N;...;P;D prints all lines in a file but allows the programmer to reason about 2 lines at a time.
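The two-line window can also be written out explicitly. The following Python sketch is only an illustration of the same reasoning (look at the current line and the next one, skip the current line if it is empty and the next starts with the marker); the function name and sample data are assumptions, not part of the answer.

```python
def drop_blank_before(lines, marker="# [END"):
    """Skip an empty line when the line right after it starts with marker."""
    out = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if line == "" and nxt is not None and nxt.startswith(marker):
            continue  # same effect as not running P for this window
        out.append(line)
    return out

# Made-up sample input.
sample = ["import a", "", "# [END import]", "", "x"]
print("\n".join(drop_blank_before(sample)))
```

As with the sed version, only the one blank line directly in front of the marker is dropped; other blank lines are kept.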

Related

Using SED to Remove Anything but a Pattern

I have a bunch of .pdf file names. For example:
901201_HKW_RNT_HW21_136_137_DE_442_Freigabe_DE_CLX.pdf
and I am trying to remove everything but the pattern XXX_XXX, where X is always a digit.
The result should be:
136_137
So far I have done the opposite: I managed to match the pattern (and remove it) by using:
set NoSpacesString to do shell script "echo " & quoted form of insideName & " | sed 's/([0-9][0-9][0-9]_[0-9][0-9][0-9])//'"
My goal is to set NoSpacesString to 136_137
Little bit of help please.
Thank you !
P.S. The rest of the code is in AppleScript if this matters
Fixing sed command...
You can use
sed -n 's/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/p'
Details
-n - suppresses the default line output
s/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/ - replaces the text matched by the .*\([0-9]\{3\}_[0-9]\{3\}\).* pattern with the Group 1 value; the pattern matches:
.* - any zero or more chars
\([0-9]\{3\}_[0-9]\{3\}\) - Group 1 (the \1 in the RHS refers to this group value): three digits, _, three digits
.* - any zero or more chars
p - prints the result of the substitution only.
The regex above is a POSIX BRE compliant pattern. The same can be written in POSIX ERE:
sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\1/p'
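To see what the greedy leading .* does, the same regex can be tried in Python. This is a minimal sketch, not part of the answer; the function name last_code is hypothetical.

```python
import re
from typing import Optional

def last_code(name: str) -> Optional[str]:
    """Return the ddd_ddd run picked by a greedy leading .*, or None."""
    # The greedy .* grabs as much as it can, so the group ends up on the
    # rightmost position where three digits, _, three digits can match.
    m = re.search(r".*([0-9]{3}_[0-9]{3})", name)
    return m.group(1) if m else None

print(last_code("901201_HKW_RNT_HW21_136_137_DE_442_Freigabe_DE_CLX.pdf"))
```

When nothing matches, the function returns None, which corresponds to sed -n ... p printing nothing for that line.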
Final AppleScript code
set noSpacesString to do shell script "sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\\1/p' <<<" & insideName's quoted form
This might work for you (GNU sed):
sed -E '/\n/{P;D};s/[0-9]{3}_[0-9]{3}/\n&\n/;D' file
This solution will print all occurrences of the pattern on a separate line.
The initial command depends on what follows it.
The second command wraps the desired pattern in newlines, prepending one and appending one.
The D command removes text up to the first newline and, because the pattern space is not empty, restarts the sed cycle without appending the next line.
Now the initial command comes into play. The front of the line is printed and then deleted along with its appended newline.
Again, the sed cycle is restarted as if the line had never been presented but minus any characters up to and including the first desired pattern.
This flip-flop pattern of control is repeated until nothing is left and then repeated on subsequent lines until the end of the file.
Here is a copy of the debug log for a suitable one line input containing two representations of the desired pattern:
SED PROGRAM:
/\n/ {
P
D
}
s/[0-9]{3}_[0-9]{3}/
&
/
D
INPUT: 'file' line 1
PATTERN: aaa123_456bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: aaa\n123_456\nbbb123_456ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'aaa'
PATTERN: \n123_456\nbbb123_456ccc
COMMAND: D
PATTERN: 123_456\nbbb123_456ccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: bbb\n123_456\nccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'bbb'
PATTERN: \n123_456\nccc
COMMAND: D
PATTERN: 123_456\nccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
PATTERN: ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'ccc'
PATTERN:
COMMAND: D
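The net effect of the P/D loop traced above is that every non-overlapping match, scanned left to right, ends up on its own line. As a cross-check, a short Python sketch can produce the same list for the input used in the log; the function name is an assumption for illustration.

```python
import re

def all_codes(line: str) -> list:
    # findall scans left to right and returns non-overlapping matches,
    # which is the order in which the sed loop prints them.
    return re.findall(r"[0-9]{3}_[0-9]{3}", line)

for code in all_codes("aaa123_456bbb123_456ccc"):
    print(code)
```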

Replace one matched pattern with another in multiline text with sed

I have file with this text:
mirrors:
docker.io:
endpoint:
- "http://registry:5000"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://registry:5000"
I need to replace it with this text in POSIX shell script (not bash):
mirrors:
docker.io:
endpoint:
- "http://docker.io"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://localhost"
The replacement should be done dynamically in all places, without hard-coded names. That is, we should take the sub-string from the first line ("docker.io", "registry:5000", "localhost") and use it to replace the sub-string "registry:5000" in the third line.
I've figured out a regex that splits it into 5 groups: (^ )([^ ]*)(:[^"]*"http:\/\/)([^"]*)(")
Then I tried to use sed to print group 2 instead of group 4, but this didn't work: sed -n 's/\(^ \)\([^ ]*\)\(:[^"]*"http:\/\/\)\([^"]*\)\("\)/\1\2\3\2\5/p'
Please help!
This might work for you (GNU sed):
sed -E '1N;N;/\n.*endpoint:.*\n/s#((\S+):.*"http://)[^"]*#\1\2#;P;D' file
Open up a three line window into the file.
If the second line contains endpoint:, replace the last piece of text following http:// with the first piece of text before :
Print/Delete the first line of the window and then replenish the three line window by appending the next line.
Repeat until the end of the file.
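The three-line window can be sketched in Python as follows. This is only an illustration under assumptions: the indentation of the sample is guessed (the question as shown here has lost its leading spaces), and the function name is hypothetical.

```python
import re

def fix_endpoints(text: str) -> str:
    lines = text.split("\n")
    for i in range(len(lines) - 2):
        # Line 1 of the window: "<name>:" with nothing after the colon.
        key = re.match(r"\s*(\S+):\s*$", lines[i])
        # Line 2 of the window must be the endpoint: line.
        if key and lines[i + 1].strip() == "endpoint:":
            # Line 3: replace whatever follows "http:// with the name.
            lines[i + 2] = re.sub(
                r'("http://)[^"]*',
                lambda m: m.group(1) + key.group(1),
                lines[i + 2],
            )
    return "\n".join(lines)

# Assumed indentation, for illustration only.
sample = (
    "mirrors:\n"
    "  docker.io:\n"
    "    endpoint:\n"
    '      - "http://registry:5000"\n'
    "  localhost:\n"
    "    endpoint:\n"
    '      - "http://registry:5000"\n'
)
print(fix_endpoints(sample), end="")
```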
Awk would be a better candidate for this: pass the replacement string as the variable str and the section to change (" docker.io", " localhost" or " registry:5000") as the variable findstr, like so:
awk -v findstr=" docker.io" -v str="http://docker.io" '
$0 ~ findstr { dockfound=1 # We have found the section passed in findstr and so we set the dockfound marker
}
/endpoint/ && dockfound==1 { # We encounter endpoint after the dockfound marker is set and so we set the found marker
found=1;
print;
next
}
found==1 && dockfound==1 { # We know from the found and the dockfound markers being set that we need to process this line
match($0,/^[[:space:]]+-[[:space:]]"/); # Match the start of the line to the beginning quote
$0=substr($0,RSTART,RLENGTH)str"\""; # Rebuild the line: the matched prefix, the replacement string (str) and the closing quote
found=0; # Reset the markers
dockfound=0
}1' file
One liner:
awk -v findstr=" docker.io" -v str="http://docker.io" '$0 ~ findstr { dockfound=1 } /endpoint/ && dockfound==1 { found=1;print;next } found==1 && dockfound==1 { match($0,/^[[:space:]]+-[[:space:]]"/);$0=substr($0,RSTART,RLENGTH)str"\"";found=0;dockfound=0 }1' file

How to replace a block of code between two patterns with blank lines?

I am trying replace a block of code between two patterns with blank lines
Tried using below command
sed '/PATTERN-1/,/PATTERN-2/d' input.pl
But it only removes the lines between the patterns
PATTERN-1 : "=head"
PATTERN-2 : "=cut"
input.pl contains below text
=head
hello
hello world
world
morning
gud
=cut
Required output :
=head





=cut
Can anyone help me on this?
$ awk '/=cut/{f=0} {print (f ? "" : $0)} /=head/{f=1}' file
=head





=cut
To modify the given sed command, try
$ sed '/=head/,/=cut/{//! s/.*//}' ip.txt
=head





=cut
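//! selects the lines in the range that match neither the start nor the end address (the empty regex // reuses the last regex that was applied). Whether this tracks both range addresses dynamically or only one of them statically may depend on the sed implementation; it works on GNU sed.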
s/.*// to clear these lines
awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
# OR
# ^ to take care of line starts with regexp
awk '/^=cut/{found=0}found{print "";next}/^=head/{found=1}1' infile
Explanation:
awk '/=cut/{ # if line contains regexp
found=0 # set variable found = 0
}
found{ # if variable found is nonzero value
print ""; # print ""
next # go to next line
}
/=head/{ # if line contains regexp
found=1 # set variable found = 1
}1 # 1 at the end does default operation
# print current line/row/record
' infile
Test Results:
$ cat infile
=head
hello
hello world
world
morning
gud
=cut
$ awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
=head





=cut
This might work for you (GNU sed):
sed '/=head/,/=cut/{//!z}' file
Zap (empty) the lines between =head and =cut; z is a GNU sed command that clears the pattern space.
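The flag-based logic shared by the awk answers can be written out in Python as a rough sketch. The function name and default arguments are assumptions for illustration; the sample is the input.pl content from the question.

```python
def blank_between(lines, start="=head", end="=cut"):
    """Replace every line strictly between start and end with an empty line."""
    out = []
    inside = False
    for line in lines:
        if end in line:
            inside = False      # the closing line itself is kept
        out.append("" if inside else line)
        if start in line:
            inside = True       # blanking begins on the next line
    return out

sample = ["=head", "hello", "hello world", "world", "morning", "gud", "=cut"]
print("\n".join(blank_between(sample)))
```

Clearing the flag before printing and setting it after printing is what keeps both delimiter lines intact.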

Why version is not printable?

I have this one liner:
perl -Mversion -e 'our $VERSION = v1.02; print $VERSION'
The output is not visible; it consists of two characters with code points 1 and 2:
Why is the module version not printable? I expected to see v1.02.
I have found this in the documentation:
print v9786; # prints SMILEY, "\x{263a}"
print v102.111.111; # prints "foo"
print 102.111.111; # same
Answering my own question:
Although v1.02 is written like a string, a v-string is stored internally as the characters with the given ordinals, not as the text "v1.02". To print it in readable form we need an extra step, for example using the version module as suggested above.
UPD
I found another solution (from the documentation):
printf "%vd", $VERSION; # prints "1.2"
UPD
This is also worth reading:
There are two ways to enter v-strings: a bare number with two or more decimal points, or a bare number with one or more decimal points and a leading 'v' character (also bare). For example:
$vs1 = 1.2.3; # encoded as \1\2\3
$vs2 = v1.2; # encoded as \1\2
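Python has no v-strings, but the encoding described above is simple enough to imitate. The sketch below is an analogy only; vstring and format_vd are hypothetical helper names, with format_vd mimicking what the %vd format does.

```python
def vstring(*parts: int) -> str:
    # Each number becomes the character with that ordinal.
    return "".join(chr(p) for p in parts)

def format_vd(s: str) -> str:
    # Join the ordinals of the characters with dots, like printf "%vd".
    return ".".join(str(ord(c)) for c in s)

version = vstring(1, 2)            # analogous to v1.02: characters 1 and 2
print(repr(version))               # two control characters, not "v1.02"
print(format_vd(version))
print(vstring(102, 111, 111))
```

Printing the raw value sends two control characters to the terminal, which is why nothing appears to be printed.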

Replacing strings in files with bash sed or a scripting language (TCL, perl)

I have a list of C++ source files, which have the following structure:
// A lot of stuff
#include <current/parser/support/base.hpp>
// ...
#include <current/parser/iterators/begin.hpp>
// ...
I need to replace lines like
#include <current/parser/support/base.hpp>
with
#include <support_base.hpp>
Namely, omit the current/parser and replace the separator (/) with _.
Is this possible to do with bash sed or a scripting language?
EDIT: Sorry, forgot to mention that I want to replace anything like
#include <current/parser/*/*/*/*>
Anything can go after current/parser, and with any depth.
Going with Tcl:
# Open the file for reading
set fin [open filein.c r]
# Open the file to write the output
set fout [open fileout.c w]
# Loop through each line
while {[gets $fin line] != -1} {
# Check for lines beginning with "^#include <current/parser/"
#
# ^ matches the beginning of the line
# ([^>]*) matches the part after "#include <current/parser/" and stores it
# in the variable 'match'
if {[regexp {^#include <current/parser/([^>]*)>} $line - match]} {
# the edited line is now built using the match from above after replacing
# forward slashes with underscores
set newline "#include <[string map {/ _} $match]>"
} else {
set newline $line
}
# Put output to the file
puts $fout $newline
}
# Close all channels
close $fin
close $fout
Output with the provided input:
// A lot of stuff
#include <support_base.hpp>
// ...
#include <iterators_begin.hpp>
// ...
Demo on codepad (I edited the code a bit since I can't have a channel open to read/write in files there)
Using sed:
sed -i -e '/#include <current\/parser\/support\/base\.hpp>/{ s|current/parser/||; s|/|_|; }' -- file1 file2 file3
Edit:
sed -i -e '/#include <current\/parser\/.*>/{ s|current/parser/||; s|/|_|g; }' -- file1 file2 file3
This removes current/parser/ and replaces every / with _. Example result file:
// A lot of stuff
#include <support_base.hpp>
// ...
#include <iterators_begin.hpp>
// ...
Some details:
/#include <current\/parser\/.*>/ -- Matcher.
s|current/parser/|| -- Deletes `current/parser/` in matched line.
s|/|_|g -- Replaces all `/` with `_` in same line.
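The same two-step idea (strip the prefix, then turn the remaining separators into underscores) can be sketched in Python. The function name is hypothetical; unlike the sed command, this version only touches the text inside the angle brackets.

```python
import re

INCLUDE = re.compile(r"^(#include <)current/parser/([^>]*)(>)")

def rewrite_include(line: str) -> str:
    # Keep "#include <" and ">", drop the prefix, and replace every /
    # in the remaining path with _.
    return INCLUDE.sub(
        lambda m: m.group(1) + m.group(2).replace("/", "_") + m.group(3),
        line,
    )

print(rewrite_include("#include <current/parser/support/base.hpp>"))
```

Lines that do not start with the expected include prefix are returned unchanged.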
You can try it with sed and -r for extended regular expressions:
sed -r 's|#include <current/parser/support/base\.hpp>|#include <support_base.hpp>|g' file
But doing it this way could break your code, so be careful :)
Using a perl one-liner
perl -i -pe 's{^#include <\Kcurrent/parser/([^>]*)}{$1 =~ y|/|_|r}e;' file.cpp
Or, without relying on features newer than Perl 5.10 (the /r modifier used above needs 5.14):
perl -i -pe 's{(?<=^#include <)current/parser/([^>]*)}{join "_", split "/", $1}e;' file.cpp
Explanation:
Switches:
-i: Edit files in place (makes backup if extension supplied)
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.