sed delete line containing a pair of () parentheses - sed

Hi, I have created a large set of sed commands to manipulate a text file.
The one last thing I can't seem to solve is a line that consists of nothing but a pair of ()
some text
()
more text
(leave this line as is)
I want to delete the entire () line
some text
more text
(leave this line as is)
In my script,
this sed command -e '/()/ s/()//' finds and removes the () but leaves an empty line behind
some text
more text
(leave this line as is)
Here is a snipped version of the bigger picture
FILEPATH=*.chordpro
for fn in $FILEPATH; do
echo $fn
fnbak=$fn.bak
mv "$fn" "$fnbak" #Create an untouched backup
sed `: # these are comments` \
`: # Insert after subtitle; author,book,keywords,tempo,time` \
-e '/^{subtitle.*/a {author:mds}\n{book:CatStevens}\n{keywords:70s,Tillerman}\n{tempo:120}\n{time:4/4}' \
-e 's/{subtitle:/{artist:/' `: # swap subtitle for artist` \
-e 's/{time:/{duration:/' `: # modify original meta "time" for "duration"` \
....lots of other commands
`: # Tidy up` \
-e '/()/ s/()//' `: # Remove any () pairs created by script` \
"$fnbak" >"$fn"
Here is snippet of the input test cases.
{c:Verse2}
{c: Verse 2: Bass single}
{c: Verse 2 Rock it}
{c: verse 1}
{c: verse 1}
{c: Verse 1:}
{c:Verse}
Here it is converted. Out#1
Verse 2:
()
Verse 2:
( Bass single)
Verse 2:
( Rock it)
Verse 1:
()
Verse 1:
()
Verse 1:
()
Verse :
()
This is the tidy up result using -e '/()/ s/()//' Out#2
Verse 2:
Verse 2:
( Bass single)
Verse 2:
( Rock it)
Verse 1:
Verse 1:
Verse 1:
Verse :
This is the result using -e '/()/d'
Verse 2:
( Bass single)
Verse 2:
( Rock it)
where has everything after Rock it gone and why????
NOTE: ok it has something to do with it being in the loop/interaction with other sed commands.
If I put Out#1 into a file on its own and run just the sed -e '/()/d' command on it, it works
This is what I hoped to achieve
Verse 2:
Verse 2:
( Bass single)
Verse 2:
( Rock it)
Verse 1:
Verse 1:
Verse 1:
Verse :

Don't write lengthy sed scripts, use awk instead for clarity, efficiency, robustness, portability, etc. Instead of this:
sed `: # these are comments` \
`: # Insert after subtitle; author,book,keywords,tempo,time` \
-e '/^{subtitle.*/a {author:mds}\n{book:CatStevens}\n{keywords:70s,Tillerman}\n{tempo:120}\n{time:4/4}' \
-e 's/{subtitle:/{artist:/' `: # swap subtitle for artist` \
-e 's/{time:/{duration:/' `: # modify original meta "time" for "duration"` \
....lots of other commands
`: # Tidy up` \
-e '/()/ s/()//' `: # Remove any () pairs created by script` \
"$fnbak" >"$fn"
try this which I think is the equivalent of what you're doing above but also fixes your code to remove () pairs:
awk '
# Insert after subtitle; author,book,keywords,tempo,time
/^{subtitle/ {
$0 = $0 \
"\n{author:mds}" \
"\n{book:CatStevens}" \
"\n{keywords:70s,Tillerman}" \
"\n{tempo:120}" \
"\n{time:4/4}"
}
{
sub(/{subtitle:/,"{artist:") # swap subtitle for artist
sub(/{time:/,"{duration:") # modify original meta "time" for "duration"
....lots of other commands
# Tidy up
# Remove any () pairs created by script
gsub(/\n\(\)\n/,"\n") # Convert every \n()\n to \n ..
gsub(/\n\(\)\n/,"\n") # .. done twice to handle \n()\n()\n
gsub(/^\(\)\n|\n\(\)$/,"") # Remove ()\n at the start and \n() at the end
gsub(/\(\)/,"") # Remove every remaining ()
print
}
' "$fnbak" > "$fn"
Your comment says Remove any () pairs created by script but the script you posted cannot create any () pairs so I'm assuming that your lots of other commands can do so and I'm just guessing in my awk script about what it really is you want to do in the "Tidy up" section since you didn't provide any sample input/output we could test against.
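If the stray () always sits alone on its own line, as in the Out#1 sample from the question, anchoring the pattern may be all the tidy-up needs. A minimal sketch, assuming the leftover lines contain nothing but the two parentheses (the function name drop_empty_parens is made up for illustration):

```shell
# Hypothetical helper: delete lines that consist solely of "()".
# In a basic regular expression ( and ) are literal characters, so no
# escaping is needed; ^ and $ anchor the match to the whole line.
drop_empty_parens() {
    sed '/^()$/d' "$@"
}

# Example run on a few lines taken from the question's Out#1 sample.
printf '%s\n' 'Verse 2:' '()' 'Verse 2:' '( Bass single)' | drop_empty_parens
```

Because the address is anchored at both ends, lines such as ( Bass single) or (leave this line as is) are left untouched.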
By the way, the more common way to modify an input file would be:
fnbak=$(mktemp) || exit 1
cmd 'script' "$fn" > "$fnbak" &&
mv -- "$fnbak" "$fn"
rather than
fnbak=$fn.bak
mv "$fn" "$fnbak"
cmd 'script' "$fnbak" > "$fn"
The former only keeps the backup file around long enough to modify the original, only uses a single backup for all files rather than 1 per file, and won't wipe out your original input file if there isn't enough disk space or you don't have write permission to create the backup.
You only have to create the backup file once before entering your loop:
FILEPATH=*.chordpro
fnbak=$(mktemp) || exit 1
for fn in $FILEPATH; do
echo "$fn"
cmd 'script' "$fn" > "$fnbak" &&
mv -- "$fnbak" "$fn"
done
but of course you don't need a loop or temp files at all if you use GNU awk, all you need is:
gawk -i inplace 'script' *.chordpro
(add FNR==1{print FILENAME | "cat>&2"} to the awk script to see the input file names printed as they're worked on).

Here's a GNU sed script for you. It allows inline comments, so there's no need for complicated quoting.
Save it in tmp.sh:
#!/bin/bash
# tmp.sh
sed -E '
# strip initial piece
s/\{c: *//
# strip terminal piece
s/ *} *$//
# munge verse with number
s/verse *([0-9]+):? */Verse \1:/i
# munge verse without number
s/verse *$/Verse :/i
# put description on new line
s/: *(.+)$/:\n(\1)/
'
Test with heredoc:
$ ./tmp.sh <<EOF
{c:Verse2}
{c: Verse 2: Bass single}
{c: Verse 2 Rock it}
{c: verse 1}
{c: verse 1}
{c: Verse 1:}
{c:Verse}
EOF
Verse 2:
Verse 2:
(Bass single)
Verse 2:
(Rock it)
Verse 1:
Verse 1:
Verse 1:
Verse :
It's hard to say why your single delete command behaves unexpectedly without seeing the entire script.
You could try the relatively new --debug option offered in GNU sed 4.6, or the l and = commands, which I find very useful for showing the state of the line at any point in a series of transformations.
My script is more of a starting point for you to compare against and troubleshoot your own script, and to modify for other test cases.
Hope it helps
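To illustrate the l and = commands mentioned above, here is a small sketch using only standard sed (show_lines is a made-up name). It prints each line number followed by an unambiguous dump of the line, which can reveal trailing blanks or carriage returns that would stop an anchored pattern from matching:

```shell
# Hypothetical helper: for every input line print its number (=) and an
# unambiguous dump of the pattern space (l). -n suppresses normal output.
show_lines() {
    sed -n -e '=' -e 'l' "$@"
}

printf 'Verse 1:\n()\n' | show_lines
```

In the dump, l marks the end of each line with $ and shows non-printing characters as escapes such as \t or \r.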

Related

Replace one matched pattern with another in multiline text with sed

I have file with this text:
mirrors:
docker.io:
endpoint:
- "http://registry:5000"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://registry:5000"
I need to replace it with this text in POSIX shell script (not bash):
mirrors:
docker.io:
endpoint:
- "http://docker.io"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://localhost"
The replacement should be done dynamically in all places, without hard-coded names. That is, we should take the substring from the first line of each block ("docker.io", "registry:5000", "localhost") and use it to replace the substring "registry:5000" in the third line.
I've figured out a regex that splits it into 5 groups: (^ )([^ ]*)(:[^"]*"http:\/\/)([^"]*)(")
Then I tried to use sed to print group 2 instead of group 4, but this didn't work: sed -n 's/\(^ \)\([^ ]*\)\(:[^"]*"http:\/\/\)\([^"]*\)\("\)/\1\2\3\2\5/p'
Please help!
This might work for you (GNU sed):
sed -E '1N;N;/\n.*endpoint:.*\n/s#((\S+):.*"http://)[^"]*#\1\2#;P;D' file
Open up a three line window into the file.
If the second line contains endpoint:, replace the last piece of text following http:// with the first piece of text before :
Print/Delete the first line of the window and then replenish the three line window by appending the next line.
Repeat until the end of the file.
Awk would be a better candidate for this. Pass in the replacement string as the variable str and the section to change (" docker.io" or " localhost" or " registry:5000") as the variable findstr, like so:
awk -v findstr=" docker.io" -v str="http://docker.io" '
$0 ~ findstr { dockfound=1 # We have found the section passed in findstr and so we set the dockfound marker
}
/endpoint/ && dockfound==1 { # We encounter endpoint after the dockfound marker is set and so we set the found marker
found=1;
print;
next
}
found==1 && dockfound==1 { # We know from the found and the dockfound markers being set that we need to process this line
match($0,/^[[:space:]]+-[[:space:]]"/); # Match the start of the line to the beginning quote
$0=substr($0,RSTART,RLENGTH)str"\""; # Print the matched section followed by the replacement string (str) and the closing quote
found=0; # Reset the markers
dockfound=0
}1' file
One liner:
awk -v findstr=" docker.io" -v str="http://docker.io" '$0 ~ findstr { dockfound=1 } /endpoint/ && dockfound==1 { found=1;print;next } found==1 && dockfound==1 { match($0,/^[[:space:]]+-[[:space:]]"/);$0=substr($0,RSTART,RLENGTH)str"\"";found=0;dockfound=0 }1' file
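The question asks for the replacement to happen without hard-coded names. A minimal awk sketch along those lines, assuming every section key is a line ending in a colon, that endpoint: is the only such line that is not a key, and that the quoted URL follows on a later line (fix_endpoints is a made-up name):

```shell
# Hypothetical helper: rewrite each quoted http:// URL so that its host
# is the most recent section key seen above it.
fix_endpoints() {
    awk '
        # A key line such as "  docker.io:" (anything ending in a colon
        # except the endpoint: line itself). Remember it, trimmed.
        /:[[:space:]]*$/ && !/endpoint:/ {
            key = $0
            sub(/^[[:space:]]+/, "", key)
            sub(/:[[:space:]]*$/, "", key)
        }
        # A URL line: swap whatever follows http:// for the remembered key.
        /"http:\/\// && key != "" {
            sub(/"http:\/\/[^"]*"/, "\"http://" key "\"")
        }
        { print }
    ' "$@"
}
```

It could be run as fix_endpoints file. Indentation is preserved because only the quoted URL is substituted. A key containing & or a backslash would need escaping before being used as a replacement, which this sketch does not attempt.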

Conditional substitution of patterns in bash strings depending on the beginning of a string

I am new to bash, so excuse me if I do not use the right terms.
I need to substitute certain patterns of six characters in a set of files. The order in which the patterns are substituted depends on the beginning of each string of text.
This is an example of input:
chr1:123-123 5GGGTTAGGGTTAGGGTTAGGGTTAGGGTTA3
chr1:456-456 5TTAGGGTTAGGGTTAGGGTTAGGGTTAGGG3
chr1:789-789 5GGGCTAGGGTTAGGGTTAGGGTTA3
chr1:123-123 etc. is the name of each string; it is separated by a tab from the string I need to work with. That string is delimited by the characters 5 and 3, but I can change them.
I want every pattern containing T, A, G in any one of these orders to be substituted with X: TTAGGG, TAGGGT, AGGGTT, GGGTTA, GGTTAG, GTTAGG.
Similarly, patterns containing CTAGGG, as in row 3, in orders similar to the previous ones will be substituted with a different character.
The game is repeated, with some specific differences, for all 6 characters composing each pattern.
I started writing something like this:
#!/bin/bash
NORMAL=`echo "\033[m"`
RED=`echo "\033[31m"` #red
#read filename for the input file and create a copy and a folder for the output
read -p "Insert name for INPUT file: " INPUT
echo "Creating OUTPUT file " "${RED}"$INPUT"_sub.txt${NORMAL}"
mkdir -p ./"$INPUT"_OUTPUT
cp $INPUT.txt ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
echo
#start the first set of instructions
perfrep
#starting a second set of instructions to substitute pattern with one difference from TTAGGG
onemism
Instructions are
perfrep() {
sed -i -e 's/TTAGGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/TAGGGT/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/AGGGTT/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGGTTA/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGTTAG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GTTAGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
}
# starting a second set of instructions to substitute pattern with one difference from TTAGGG
onemism(){
sed -i -e 's/[GCA]TAGGG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/G[GCA]TAGG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GG[GCA]TAG/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/GGG[GCA]TA/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/AGGG[GCA]T/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
sed -i -e 's/TAGGG[GCA]/L/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
}
I will need to repeat also with T[GCA]AGGG, TT[TCG]GGG, TTA[ACT]GG, TTAG[ACT]G and TTAGG[ACT].
Using this procedure, I get these results for the inputs shown:
5GGGXXXXTTA3
5XXXXX3
5GGGLXXTTA3
From my point of view, for my job, the first and second strings are both made of X repeated five times, and the order of characters is just slightly different. The third one, on the other hand, could be masked like this:
5LXXX3
How do I tell the script that if the string starts with 5GGGTTA instead of 5TTAGGG, it must start substituting with
sed -i -e 's/GGGTTA/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
instead of
sed -i -e 's/TTAGGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
?
I will need to repeat with all cases; for instance, if the string starts with GTTAGG I will need to start with
sed -i -e 's/GTTAGG/X/g' ./"$INPUT"_OUTPUT/"$INPUT"_sub.txt
and so on, plus a couple of variations of my pattern.
I need to repeat the substitution with TTAGGG and the variations for all the rows of my input file.
Sorry for the very long question. Thank you all.
Adding information asked by Varun.
Patterns of 6 characters would be TTAGGG , [GCA]TAGGG , T[GCA]AGGG , TT[TCG]GGG , TTA[ACT]GG , TTAG[ACT]G , TTAGG[ACT].
Each one must be checked for a different frame, for instance for TTAGGG we have 6 frames TTAGGG , GTTAGG , GGTTAG, GGGTTA , AGGGTT , TAGGGT.
The same frames must be applied to the pattern containing a variable position.
I will have a total of 42 patterns to check, divided in 7 groups: one containing TTAGGG and derivative frames, 6 with the patterns with a variable position and their derivatives.
TTAGGG and derivatives are the most important and need to be checked first.
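The six frames described above are the rotations of the base pattern. A small sketch in plain POSIX shell that lists them, so they need not be typed by hand (rotations is a made-up name):

```shell
# Hypothetical helper: print every rotation ("frame") of a string,
# one per line, starting with the string itself.
rotations() {
    s=$1
    n=${#s}
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s\n' "$s"
        rest=${s#?}          # everything after the first character
        first=${s%"$rest"}   # the first character
        s=$rest$first        # move the first character to the end
        i=$((i + 1))
    done
}

rotations TTAGGG
```

For TTAGGG this prints TTAGGG, TAGGGT, AGGGTT, GGGTTA, GGTTAG and GTTAGG, one per line.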
#! /usr/bin/awk -f
# generate a "frame" by moving the first char to the end
function rotate(base){ return substr(base,2) substr(base,1,1) }
# Unfortunately awk arrays do not store regexps
# so I am generating the list of derivative strings to match
function generate_derivative(frame,arr, i,j,k,head,read,tail) {
# the caller has already stored frame itself in its array (X or L)
for(j=1; j<=length(frame); j++) {
head=substr(frame,1,j-1);
read=substr(frame,j,1);
tail=substr(frame,j+1);
for( k=1; k<=3; k++) {
# use a global index to simplify
arr[++Z]= head substr(snp[read],k,1) tail
}
}
}
BEGIN{
FS=OFS="\t"; # awk variable names are case-sensitive; OFS keeps the tab when $2 is modified
# alternatives to a base
snp["A"]="TCG"; snp["T"]="ACG"; snp["G"]="ATC"; snp["C"]="ATG";
# the primary target
frame="TTAGGG";
Z=1; # warning GLOBAL
X[Z] = frame;
# primary derivatives
generate_derivative(frame, X);
xn = Z;
# secondary shifted targets and their derivatives
for(i=1; i<length(frame); i++){
frame = rotate(frame);
L[++Z] = frame;
generate_derivative(frame, L);
}
}
/^chr[0-9:-]*\t5[ACTG]*3$/ {
# because we care about the order of the primary matches
for (i=1; i<=xn; i++) {gsub(X[i],"X",$2)}
# since we don't care about the order of the secondary matches
for (hit in L) {gsub(L[hit],"L",$2)}
print
}
END{
# print the matches in the order they are generated
#for (i=1; i<=xn; i++) {print X[i]};
#print ""
#for (i=1+xn; i<=Z; i++) {print L[i]};
}
If (and only if) you can settle on a static matching order you can live with, then something like the above awk script could work. But you say the primary patterns should take precedence and also that a secondary rule would be better applied first in some cases; a fixed order cannot do both.
If you need more flexible matching, I would suggest looking at "recursive descent parsing with backtracking" or "parsing expression grammars".
But then you are not in a bash shell anymore.
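For the narrower question of starting with the frame the string itself begins with, a shell sketch is possible. It assumes the 5...3 delimiters from the question and handles only the six perfect frames; the one-mismatch variants would need the same treatment added. mask_by_frame is a made-up name:

```shell
# Hypothetical helper: mask a sequence using only the frame it starts with.
# Assumes the sequence is wrapped in the 5...3 delimiters from the question.
mask_by_frame() {
    s=$1
    for frame in TTAGGG TAGGGT AGGGTT GGGTTA GGTTAG GTTAGG; do
        case $s in
            5"$frame"*)
                # The sequence opens with this frame: substitute it alone.
                printf '%s\n' "$s" | sed "s/$frame/X/g"
                return
                ;;
        esac
    done
    # No perfect frame at the start: leave the sequence unchanged.
    printf '%s\n' "$s"
}

mask_by_frame 5GGGTTAGGGTTAGGGTTAGGGTTAGGGTTA3   # prints 5XXXXX3
mask_by_frame 5TTAGGGTTAGGGTTAGGGTTAGGGTTAGGG3   # prints 5XXXXX3
```

To process a whole file, the function could be called on the second tab-separated field of each row inside a while read loop.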

sed editing multiple lines

Sed editing is always a new challenge to me when it comes to multiple line editing. In this case I have the following pattern:
RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6 \
,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
,82,1000 ,84,1 ,85,1
which I want to convert into:
#RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
# ,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6\
# ,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
# ,82,1000 ,84,1 ,85,1
Besides this, I would like to preserve the entirety of these 4 lines (there may be more or fewer than 4; the number is unpredictable, as they appear in the input) as one (long) line without the backslashes or line wraps.
Two tasks in one so to say.
sed is mandatory.
It's not terribly clear how you recognize the blocks you want to comment out, so I'll use blocks from a line that starts with RECORD and process as long as there are backslashes at the end (if your requirements differ, the patterns used will need to be amended accordingly).
For that, you could use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; s/^/#/ }' filename
This works as follows:
/^RECORD/ { # if you find a line that starts with
# RECORD:
:a # jump label for looping
/\\$/ { # while there's a backslash at the end
# of the pattern space
N # fetch the next line
ba # loop.
}
# After you got the whole block:
s/[[:space:]]*\\\n[[:space:]]*/ /g # remove backslashes, newlines, spaces
# at the end, beginning of lines
s/^/#/ # and put a comment sign at the
# beginning.
}
Addendum: To keep the line structure intact, instead use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/\(^\|\n\)/&#/g }' filename
This works pretty much the same way, except the newline-removal is removed, and the comment signs are inserted after every line break (and once at the beginning).
Addendum 2: To just put RECORD blocks onto a single line:
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g }' filename
This is just the first script with the s/^/#/ bit removed.
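The one-liners above put labels, braces and commands on a single line, which GNU sed accepts but some other sed implementations may not. A sketch of the same joining logic split into separate -e expressions, which should be closer to what POSIX sed requires (join_records is a made-up name):

```shell
# Hypothetical helper: join each RECORD block (lines continued with a
# trailing backslash) into one long line. Other lines pass through as-is.
join_records() {
    sed -e '/^RECORD/{' \
        -e ':a' \
        -e '/\\$/{' \
        -e 'N' \
        -e 'ba' \
        -e '}' \
        -e 's/[[:space:]]*\\\n[[:space:]]*/ /g' \
        -e '}' "$@"
}

# Small example with a three-line record followed by an ordinary line.
printf 'RECORD 1,2 \\\n ,3,4 \\\n ,5,6\nother line\n' | join_records
```

The example prints RECORD 1,2 ,3,4 ,5,6 on one line, followed by other line.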
Addendum 3: To isolate RECORD blocks while putting them onto a single line at the same time,
sed -n '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; p }' filename
The -n flag suppresses the normal default printing action, and the p command replaces it for those lines that we want printed.
To write those records out to a file while commenting them out in the normal output at the same time,
sed -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w saved_records.txt' -e 'x; s/\(^\|\n\)/&#/g }' foo.txt
There's actually new stuff in this. Briefly annotated:
#!/bin/sed -f
/^RECORD/ {
:a
/\\$/ {
N
ba
}
# after assembling the lines
h # copy them to the hold buffer
s/[[:space:]]*\\\n[[:space:]]*/ /g # put everything on a line
w saved_records.txt # write that to saved_records.txt
x # swap the original lines back
s/\(^\|\n\)/&#/g # and insert comment signs
}
When specifying this code directly on the command line, it is necessary to split it into several -e options because the w command is not terminated by ;.
This problem does not arise when putting the code into a file of its own (say foo.sed) and running sed -f foo.sed filename instead. Or, for the advanced, putting a #!/bin/sed -f shebang on top of the file, chmod +xing it and just calling ./foo.sed filename.
Lastly, to edit the input file in-place and print the records to stdout, this could be amended as follows:
sed -i -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w /dev/stdout' -e 'x; s/\(^\|\n\)/&#/g }' filename
The new things here are the -i flag for inplace editing of the file, and to have /dev/stdout as target for the w command.
sed '/^RECORD.*\\$/,/[^\\]$/ s/^/#/
s/^RECORD.*/#&/' YourFile
After several remarks from @Wintermute and more information from the OP
Assuming:
lines starting with RECORD are the trigger to modify the following lines
the structure is always the same (no line ending in \ directly followed by a RECORD line or by empty lines)
Explanation:
take each block that starts at a RECORD line ending with \ and runs to the first line not ending with \
add # in front of each line
then take each line (after any modification by the earlier block command, which leaves untouched only RECORD lines without a trailing \ and non-RECORD lines) and add a # at the start if it begins with RECORD

Is it possible to tell sed to perform a maximum of one substitution per line?

Is it possible to encapsulate the following pseudocode using sed?
for line in lines:
if line == "foo":
print "FOO"
else:
print "- " + line
Here's the first thing I tried:
> echo 'foo
> bar
> baz' | sed -e 's/^foo$/FOO/' -e 's/^/- /'
- FOO
- bar
- baz
This is incorrect since both substitutions are applied to the first line.
Is it possible to tell sed to perform a maximum of one substitution per line?
You can limit what lines a substitution affects, by prefixing it with a pattern:
sed -e '/^foo$/! s/^/- /' -e '/^foo$/ s//FOO/' infile
A better alternative is to use the t branch command which, when given no label, jumps to the end of the script (skipping the remaining commands for that line) if a substitution has succeeded on the current line:
sed 's/^foo$/FOO/; t; s/^/- /' infile
Or the more portable:
sed -e 's/^foo$/FOO/' -e t -e 's/^/- /' infile
Output in both cases:
FOO
- bar
- baz

Using sed to delete a case insensitive matched line

How do I match a case insensitive regex and delete it at the same time
I read that to get case insensitive matches, use the flag "i"
sed -e "/pattern/replace/i" filepath
and to delete use d
sed -e "/pattern/d" filepath
I've also read that I could combine multiple flags like 2iw
I'd like to know if sed could combine both i and d
I've tried the following but it didn't work
sed -e "/pattern/replace/id" filepath > newfilepath
For a case-insensitive address match, put I after the closing slash instead of i (this is a GNU sed extension):
sed -e "/pattern/Id" filepath
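The I address modifier is specific to GNU sed. Where it is not available, grep can do the same job. A minimal sketch, assuming the goal is simply to drop matching lines and write the rest to standard output (delete_matching_ci is a made-up name):

```shell
# Hypothetical helper: print every input line that does NOT contain the
# pattern, comparing case-insensitively (-i) and inverting the match (-v).
delete_matching_ci() {
    pattern=$1
    shift
    grep -iv -e "$pattern" "$@"
}

printf '%s\n' 'Keep me' 'has PATTERN here' 'pattern again' 'also kept' |
    delete_matching_ci 'pattern'
```

The example prints Keep me and also kept. Unlike sed, grep exits with a non-zero status when no line survives, which matters in scripts running under set -e.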
You can use gawk as well (the IGNORECASE variable is specific to GNU awk).
# print case insensitive
awk 'BEGIN{IGNORECASE=1}/pattern/{print}' file
# replace with case insensitive
awk 'BEGIN{IGNORECASE=1}/pattern/{gsub(/pattern/,"replacement")}1' file
OR just with the shell(bash)
#!/bin/bash
shopt -s nocasematch
while IFS= read -r line
do
case "$line" in
*pattern* ) printf '%s\n' "$line" ;;
esac
done <"file"
I produced this one-liner because Ansible cannot handle different LVs with the same name. It converts the near-CSV output of lvs into proper JSON. You may want to set the -F flag to change the field separator.
lvs | perl -ane '
local %tmp,$i=0;
while($i<@f){
$tmp{$f[$i]}=$F[$i] if $F[$i];
$i++
};
if(@f){push @ans,\%tmp}
else{ @f=@F };
END { print to_json(\@ans,{pretty=>1}) }
' -MJSON