SED Replace string surrounded by spaces with a new string - sed

I am a beginner in scripting...
I have a file containing several pages of text, and I want to replace a word in the middle of a line with two words, as in this example:
Original line:
BUREAU DES DOCKS ET TRANSPORTS POSTE M.G. CN
Modified line:
BUREAU DES DOCKS ET TRANSPORTS POSTE M.G. CN NICOLAS
To do this, I used sed in this script, with this syntax:
while read var1 var2
do
sed -i -e "/POSTE M.G./ s/ $var1 / $var1 $var2 /g" /exploit/scripts/file-MOD.txt
done < liste_MOD
But it doesn't work. Can someone help me?

Your solution works for me. Can you show what doesn't work for you?
bjb@rhino:~$ cat /tmp/mod
CN NICOLAS
bjb@rhino:~$ cat /tmp/sed.in
BUREAU DES DOCKS ET TRANSPORTS POSTE M.G. CN
bjb@rhino:~$ while read var1 var2; do sed -i -e "/POSTE M.G./ s/ $var1 / $var1 $var2 /g" /tmp/sed.in; done < /tmp/mod
bjb@rhino:~$ cat /tmp/sed.in
BUREAU DES DOCKS ET TRANSPORTS POSTE M.G. CN NICOLAS
bjb@rhino:~$
I would use this as the address for the substitution though:
/POSTE M\.G\./
Otherwise it will match things like POSTE MAGA, POSTE M9G2, etc.
Also, to match whole words without requiring the spaces before and after, you can use \b:
s/\b$var1\b/$var1 $var2/g
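Here is a small sketch of both suggestions together, run on the sample line from the question. It assumes GNU sed (\b is a GNU extension) and hard-codes var1 and var2 instead of reading liste_MOD. It also shows one possible reason the original command appeared to do nothing: if CN is the last word on the line with no trailing space, the pattern " CN " never matches.

```shell
line='BUREAU DES DOCKS ET TRANSPORTS POSTE M.G. CN'
var1=CN
var2=NICOLAS

# Space-delimited pattern: needs a space on BOTH sides of $var1,
# so it cannot match a word at the very end of the line.
spaces=$(printf '%s\n' "$line" | sed "/POSTE M\.G\./ s/ $var1 / $var1 $var2 /g")

# Word-boundary pattern (GNU sed): matches $var1 wherever it is a whole word.
bounds=$(printf '%s\n' "$line" | sed "/POSTE M\.G\./ s/\b$var1\b/$var1 $var2/g")

printf 'spaces: %s\n' "$spaces"
printf 'bounds: %s\n' "$bounds"
```

With this input, only the second command appends NICOLAS; the first leaves the line as it was.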

Related

Syntax error using sed for a find and replace

I am trying to do a find and replace using sed. I want to find purge: [], and replace it with purge: ["./src/**/*.{js,jsx,ts,tsx}", "./public/index.html"], but it does not work. Here is my command:
sed -i -e 's/purge: [],/purge: ["./src/**/*.{js,jsx,ts,tsx}", "./public/index.html"],/g' myfile.txt
Could you help me please ?
Thank you very much !
You need to:
Change delimiters to a char other than a /, like ~ or !, or escape / delimiter chars inside the pattern (I'd suggest the former)
Escape the [ char in the pattern.
You can use
sed -i 's!purge: \[],!purge: ["./src/**/*.{js,jsx,ts,tsx}", "./public/index.html"],!g' myfile.txt
See the online demo:
#!/bin/bash
s='Blah..purge: [], blah...'
sed 's!purge: \[],!purge: ["./src/**/*.{js,jsx,ts,tsx}", "./public/index.html"],!g'<<< "$s"
# => Blah..purge: ["./src/**/*.{js,jsx,ts,tsx}", "./public/index.html"], blah...

Treat Two Columns as One

Sample Text:
$ cat X
Birth Death Name
02/28/42 07/03/69 Brian Jones
11/27/42 09/18/70 Jimi Hendrix
11/19/43 10/04/70 Janis Joplin
12/08/43 07/03/71 Jim Morrison
11/20/46 10/29/71 Duane Allman
After Processing With Perl, column & sed:
$ perl -lae 'print "$F[2]_$F[3] $F[0]"' X | column -t | sed 's/_/ /g'
Name Birth
Brian Jones 02/28/42
Jimi Hendrix 11/27/42
Janis Joplin 11/19/43
Jim Morrison 12/08/43
Duane Allman 11/20/46
This is the exact output I want. But the issue is, I do not want to use column -t | sed 's/_/ /g' at the end.
My intuition is that this can be done with a Perl one-liner alone (without the need for sed or column).
Is it possible? How can I do that?
P.S. I have an awk solution (awk '{print $3"_"$4" "$1}' X | column -t | sed 's/_/ /g') as well for this exact same result. However, I am looking for a Perl-only solution.
One way
perl -wlnE'say join " ", (split " ", $_, 3)[-1,0]' input.txt
This limits the split to three terms -- first two fields obtained by normally splitting by the given pattern, and then the rest, here comprising the name.
It won't line up nicely as in the shown output.
If the proper alignment is a must, then there's more to do since one must first see the whole file in order to know what the field width should be. Then the "one"-liner (command-line program) is
perl -MList::Util=max -wlne'
push @recs, [ (split " ", $_, 3)[-1,0] ];
END {
$m = max map { length $_->[0] } @recs;
printf("%-${m}s %s\n", @$_) for @recs
}' input.txt
If a field width fixed a priori is acceptable, as brought up in a comment, we can do
perl -wlne'printf "%-20s %s\n", (split " ", $_, 3)[-1,0]' input.txt
The saving grace for the obvious shortcoming here (what about names that are longer than the fixed width?) is that only those particular lines will be misaligned.
See if the following one-liner is an acceptable solution (the single quotes keep a POSIX shell from expanding $1 and $2):
perl -ne '/(\S+)\s+\S+\s+(.*)/ and printf "%-13s %s\n", $2, $1' birth_data.dat
Input birth_data.dat
Birth Death Name
02/28/42 07/03/69 Brian Jones
11/27/42 09/18/70 Jimi Hendrix
11/19/43 10/04/70 Janis Joplin
12/08/43 07/03/71 Jim Morrison
11/20/46 10/29/71 Duane Allman
Output
Name Birth
Brian Jones 02/28/42
Jimi Hendrix 11/27/42
Janis Joplin 11/19/43
Jim Morrison 12/08/43
Duane Allman 11/20/46

Replace first 3 occurrences of a character in each line

I have a tab-delimited file of genetic variants with an INFO column of many semicolon-delimited tags:
Chr Start End Ref Alt ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS Otherinfo QUAL DP Chr Start Ref Alt QUAL FILTER INFO
1 15847952 15847952 G C . . . . . . . . . 241.9 76196 1 15847952 . G C 241.9 PASS AC=2;AF=0;AN=18332;BaseQRankSum=0.731;ClippingRankSum=-0.731;DP=76196;ExcessHet=3.1;FS=0;InbreedingCoeff=-0.0456;MLEAC=2;MLEAF=0;MQ=38.93;MQRankSum=0.515;NEGATIVE_TRAIN_SITE;QD=10.52;ReadPosRankSum=0.89;SOR=0.481;VQSLOD=-1.406 culprit=MQ
1 15847963 15847963 A C . . . . . . . . . 1607.1 126156 1 15847963 . A C 1607.1 PASS AC=2;AF=0;AN=22004;BaseQRankSum=0.851;ClippingRankSum=-0.419;DP=126156;ExcessHet=3.4904;FS=0;InbreedingCoeff=0.0299;MLEAC=2;MLEAF=0;MQ=59.29;MQRankSum=0.18;QD=1.55;ReadPosRankSum=0.067;SOR=0.651;VQSLOD=0.995 culprit=QD
1 15847964 15847966 GCC - . . . . . . . . . 1607.1 126156 1 15847963 . AGCC A 1607.1 PASS AC=63;AF=0.003;AN=22004;BaseQRankSum=0.851;ClippingRankSum=-0.419;DP=126156;ExcessHet=3.4904;FS=0;InbreedingCoeff=0.0299;MLEAC=55;MLEAF=0.002;MQ=59.29;MQRankSum=0.18;QD=1.55;ReadPosRankSum=0.067;SOR=0.651;VQSLOD=0.995 culprit=QD
1 15847978 15847978 C T . . . . . . . . . 648.41 234344 1 15847978 . C T 648.41 PASS AC=9;AF=0;AN=25894;BaseQRankSum=-0.572;ClippingRankSum=-0.404;DP=234344;ExcessHet=3.348;FS=2.639;InbreedingCoeff=-0.0098;MLEAC=6;MLEAF=0;MQ=58.71;MQRankSum=-0.456;NEGATIVE_TRAIN_SITE;QD=4.13;ReadPosRankSum=-0.456;SOR=0.452;VQSLOD=-1.238 culprit=QD
I want to split the first 3 semicolon-delimited terms in the INFO column:
AC=2;AF=0;AN=18332
So that they become:
AC=2 AF=0 AN=18332 BaseQRankSum=0.731;ClippingRankSum=-0.731;DP=76196;ExcessHet=3.1;FS=0;InbreedingCoeff=-0.0456;MLEAC=2;MLEAF=0;MQ=38.93;MQRankSum=0.515;NEGATIVE_TRAIN_SITE;QD=10.52;ReadPosRankSum=0.89;SOR=0.481;VQSLOD=-1.406 culprit=M
So far I've tried the following expression with sed:
sed -i .bk 's/\(A.=.*\);/\1 /g' allChr_ExAC38.hg38_multianno.txt
But this yields no changes.
Ideally I was looking for a way to tell sed to replace the first 3 occurrences of a semicolon ; with a tab, but 's/;/ /g3' doesn't seem to mean that.
Use Perl instead of sed:
perl -i.bk -pe '$c = 0; s/;/\t/ while $c++ < 3' -- file.txt
You can try this awk
awk '{for(i=1;i<4;i++)sub(";","\t")}1' infile
The .* in your regex is greedy, and will match as much text as possible on the line, up to just before the last semicolon (but not beyond, because then the entire regex won't match at all).
You cannot usefully combine a numeric flag with g here. The g flag replaces every match on the line, while a numeric flag such as 3 replaces only the third match, not the first three. (GNU sed does accept 3g, but it means "the third match and every one after it"; POSIX leaves the combination unspecified.)
"No changes" seems wrong, though; if your regex matched at all, the last semicolon on matching lines will have been replaced.
Some regex engines support non-greedy matching, but sed isn't one of them. As long as there is a single delimiter character you can use to limit the greediness, using that is a much better solution anyway. In your case, simply replace . with [^;] to say "any character except (newline or) semicolon" instead of "any character (except newline)", and repeat the substitution once per semicolon you want to replace, since each run replaces the first match that is still left:
sed -e 's/\(A.=[^;]*\);/\1 /' -e 's/\(A.=[^;]*\);/\1 /' -e 's/\(A.=[^;]*\);/\1 /' allChr_ExAC38.hg38_multianno.txt
(This will print to standard output for verification; put back the -i .bk once you see the result is correct.)
Based on your example data, perhaps consider replacing the remaining . in the expression with [A-Z] and [^;] with [^;=] or even [0-9]. The more specific you can make your regex, the better.
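For reference, a small sketch of how greedy matching and the s flags behave. It uses a shortened, made-up INFO string, and a space stands in for the tab so the results are easy to read. The variable names are only for illustration.

```shell
info='AC=2;AF=0;AN=18332;MLEAC=2;DP=76196'

# Greedy .* runs to the LAST semicolon, so only that one is replaced.
greedy=$(printf '%s\n' "$info" | sed 's/\(A.=.*\);/\1 /g')

# Numeric flag 3: only the third semicolon on the line is replaced.
third=$(printf '%s\n' "$info" | sed 's/;/ /3')

# Three separate substitutions: each replaces the first semicolon still left.
first3=$(printf '%s\n' "$info" | sed -e 's/;/ /' -e 's/;/ /' -e 's/;/ /')

printf 'greedy: %s\nthird:  %s\nfirst3: %s\n' "$greedy" "$third" "$first3"
# greedy: AC=2;AF=0;AN=18332;MLEAC=2 DP=76196
# third:  AC=2;AF=0;AN=18332 MLEAC=2;DP=76196
# first3: AC=2 AF=0 AN=18332 MLEAC=2;DP=76196
```

Only the last form turns the first three semicolons into separators, which is what the question asks for.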
Could you please try following and let me know if this helps you.
awk '
FNR==1{
print;
next}
{
num=split($(NF-1),array,";");
for(i=4;i<=num;i++){
val=val?val ";"array[i]:array[i]};
$(NF-1)=array[1] OFS array[2] OFS array[3] OFS val;
val="";
$1=$1
}
1
' OFS="\t" Input_file
This might work for you (GNU sed):
sed -i.bak 's/;/\n/3;h;y/;/\t/;G;s/\n.*\n/\t/' file
Replace the third ; with a newline, make a copy of the line, replace all ;'s with \t's, append the copy and replace the end of the first line to the middle of the second line with a \t.
Since by definition a line is demarcated by a newline, lines cannot contain a newline unless they are introduced by a programmer.
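To make those steps easier to follow, here is the same command run on a short made-up line, with the pattern space after each step written out in comments. This is a sketch that assumes GNU sed, since it relies on \n and \t escapes in the script.

```shell
# Pattern space after each command, for the input a;b;c;d;e
#   s/;/\n/3      ->  a;b;c<NL>d;e              (third ; becomes a newline marker)
#   h             ->  hold space keeps a copy of that
#   y/;/\t/       ->  a<TAB>b<TAB>c<NL>d<TAB>e  (every remaining ; becomes a tab)
#   G             ->  a<TAB>b<TAB>c<NL>d<TAB>e<NL>a;b;c<NL>d;e
#   s/\n.*\n/\t/  ->  a<TAB>b<TAB>c<TAB>d;e     (first newline through last newline)
out=$(printf 'a;b;c;d;e\n' | sed 's/;/\n/3;h;y/;/\t/;G;s/\n.*\n/\t/')
printf '%s\n' "$out"
```

The tail after the third semicolon comes from the untouched copy, which is why its semicolons survive.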
If the number of occurrences is reasonable, you can pipe sed multiple times, e.g.
sed -E -e 's/[0-9]{4}/****/'| sed -E -e 's/[0-9]{4}/****/'| sed -E -e 's/[0-9]{4}/****/'
masks the first three 4-digit groups of a credit card number, like so:
Input:
1234 5678 9101 1234
Output:
**** **** **** 1234

sed — joining a range of selected lines

I'm a beginner to sed. I know that it's possible to apply a command (or a set of commands) to a certain range of lines like so
sed '/[begin]/,/[end]/ [some command]'
where [begin] is a regular expression that designates the first line of the range and [end] is a regular expression that designates the last line of the range (which is itself included in the range).
I'm trying to use this to specify a range of lines in a file and join them all into one line. Here's my best try, which didn't work:
sed '/[begin]/,/[end]/ {
N
s/\n//
}
'
I'm able to select the set of lines I want without any problem, but I just can't seem to merge them all into one line. If anyone could point me in the right direction, I would be really grateful.
One way using GNU sed:
sed -n '/begin/,/end/ { H;g; s/^\n//; /end/s/\n/ /gp }' file.txt
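A sketch of what that one-liner does, on a tiny made-up input where the range markers are the literal words begin and end. Each line in the range is appended to the hold space (H) and copied back (g); the leading newline that H adds is stripped, and only when the end line arrives are the newlines turned into spaces and the result printed. GNU sed is assumed, as in the answer.

```shell
joined=$(printf '%s\n' 1 begin a b end 2 |
    sed -n '/begin/,/end/ { H;g; s/^\n//; /end/s/\n/ /gp }')
printf '%s\n' "$joined"
# begin a b end
```

Lines outside the range (1 and 2 here) are dropped because of -n.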
This is straightforward if you only want to select some lines and join them. Use Steve's answer or my pipe-to-tr alternative:
sed -n '/begin/,/end/p' | tr -d '\n'
It becomes a bit trickier if you want to keep the other lines as well. Here is how I would do it (with GNU sed):
join.sed
/\[begin\]/ {
:a
/\[end\]/! { N; ba }
s/\n/ /g
}
So the logic here is:
When [begin] line is encountered start collecting lines into pattern space with a loop.
When [end] is found stop collecting and join the lines.
Example:
seq 9 | sed -e '3s/^/[begin]\n/' -e '6s/$/\n[end]/' | sed -f join.sed
Output:
1
2
[begin] 3 4 5 6 [end]
7
8
9
I like your question. I also like Sed. Regrettably, I do not know how to answer your question in Sed; so, like you, I am watching here for the answer.
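Since no Sed answer has yet appeared here, here is how to do it in Perl (the brackets are escaped so that [begin] and [end] match literally rather than as character classes, and there is no -n because the script has its own read loop):
perl -we 'my $flag = 0; while (<>) { chomp; if (/\[begin\]/) {$flag = 1;} print if $flag; if (/\[end\]/) {print "\n" if $flag; $flag = 0;} } print "\n" if $flag;'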

sed property substitution

I've got a property file where I want to do a property substitution, so I wrote a sed pattern to change
host = 1234
with another value, but when I execute
echo "host = 1234" | sed 's/\#*\(host[ \t]*\)\=\([ \t]\d*\)/\1\=\1/g'
I got that the substitution is done (host =host) but the \2 atom is also appended to the end of the string (1234). How can I remove it?
host =host 1234
The first problem is that \d doesn't do what you think. Use [0-9] at least.
You still get host =host out, which seems crazy to me.
EDIT:
Okay
echo "host = 1234" | sed 's/#*host[ \t]*=[ \t]*\([0-9]*\)/host = asdf/g'
Why capture 'host' if it's always the same? Just rewrite it.
Why preserve the exact tab/space information? Just rewrite it.
Why escape things which are not special?
I hope you get the idea.
But here's what you probably want:
sed '/^#/!s/[ \t]*\([^ \t]*\)[ \t]*=[ \t]*\([^ \t]*\)/\1 = newvalue/g' input_file
This will change anything = anything to anything = newvalue in non-commented lines of input_file. To make it a specific key which is replaced by newvalue, use:
sed '/^#/!s/[ \t]*\(host\)[ \t]*=[ \t]*\([^ \t]*\)/\1 = newvalue/g' input_file
to e.g. replace only lines reading host = anything.
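A quick sketch of that last command on a made-up three-line property file, to show that the commented line and the other key are left alone. The sample content and variable names are assumptions for illustration, and the [ \t] bracket is read as "space or tab" as GNU sed does.

```shell
# Hypothetical property file content: one commented line, the target key, another key.
props=$(printf '%s\n' '# host = old' 'host = 1234' 'port = 80')

# Skip lines starting with #, rewrite the value of host everywhere else.
updated=$(printf '%s\n' "$props" |
    sed '/^#/!s/[ \t]*\(host\)[ \t]*=[ \t]*\([^ \t]*\)/\1 = newvalue/g')

printf '%s\n' "$updated"
```

Note that the pattern is not anchored, so a key that merely ends in host (for example myhost) would be rewritten too; adding ^ in front is one way to tighten it.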
Does this suit your needs?
echo "host = 1234" | cut -d"=" -f 1
yields
host
Then,
echo "host = 1234" | cut -d"=" -f 2
yields
1234