Looking for a Perl or sed one-liner to replace a string with contents from another file

In an HTML file, call it index.html, I want to replace a comment string, say //gac goes here, with the (multi-line) contents of a separate file called gac.js. Is there a nice one-liner for that?
I found something like sed -e "/$str/r FileB" -e "/$str/d" FileA, but it does not work as promised.
I'd like it as short as possible, as it will be called after an SVN revert (I don't want any of that google.script polluting my development environment).

This should work, even though it is nasty:
perl -pe 'BEGIN{open F,"gac.js";@f=<F>}s#//gac goes here#@f#' index.html
In the case that gac.js is supposed to be dynamic:
perl -pe 's#//(\S+) goes here#open+F,"$1.js";join"",<F>#e' index.html

perl -mFile::Slurp -pe 's/\/\/(\w+) goes here/@{[File::Slurp::read_file("$1.js")]}/;'
Obviously this requires File::Slurp.

Not very nice, but seems to work:
cat index.html | perl -pe 'open(GAC, "gac.js");@gac=<GAC>;$data=join("", @gac); s/gac goes here/$data/g'

After going through man sed, this tutorial and some experimenting I came up with:
sed -i '\_//gac goes here_ {
r gac.js
d
}' index.html
Which does exactly what I want. It's not exactly a one-liner (if I put it on one line I get sed: -e expression #1, char 0: unmatched '{', which I don't understand). However, the expression above fits nicely in my update script.
Lessons learned: sed is very powerful, -i behaves differently on Mac OS X and Linux, and /string/ can easily be replaced with \[other delimiter]string[other delimiter].
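On the "unmatched '{'" error: the r command treats everything up to the end of the line as the file name, so on a single line the closing brace becomes part of the name. One way around it, sketched here with made-up sandbox files, is to give each command its own -e, much like the form quoted in the question but with a custom delimiter for the slashes. It is shown without -i so it runs the same on GNU and BSD sed.

```shell
# Hypothetical sandbox files, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' '<script>' '//gac goes here' '</script>' > "$tmp/index.html"
printf '%s\n' 'var a = 1;' 'var b = 2;' > "$tmp/gac.js"

# First -e: queue the contents of gac.js for output after the matching line.
# Second -e: delete the matching line itself; the queued file is still printed.
out=$(cd "$tmp" && sed -e '\_//gac goes here_r gac.js' \
                       -e '\_//gac goes here_d' index.html)
printf '%s\n' "$out"
rm -rf "$tmp"
```

The placeholder line is gone and the two lines of gac.js sit in its place.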

Related

The perl -pe command

I've done some research on the perl -pe command, and I know that it reads records from a file and writes the processed records to its output. I'm a bit confused about how the command below works, because it's slightly modified and I can't figure out exactly what role perl -pe plays in it. Here's the command:
cd /usr/kplushome/entities/Standalone/config/webaccess/WebaccessServer/etc
(PATH=/usr/ucb:$PATH; ./checkall.sh;) | perl -pe "s,^,          ,g;"
Any idea how it works here?
What's even more confusing in the above statement is this part: "s,^,          ,g;"
Any help would be much appreciated. Let me know if you guys need more info. Thank you!
It simply takes the expression given by the -e flag (in this case, s,^,          ,g) and applies it to every line of the input, printing the modified line (i.e. the result of the expression) to the output.
The expression itself is something called a regular expression (or "regexp" or "regex") and is a field of learning in and of itself. Quick googles for "regular expression tutorial" and "getting started with regular expressions" turn up tons of results, so that might be a good place to start.
This expression, s,^,          ,g, adds ten spaces to the start of the line, and as I said earlier, perl -p applies it to every line.
"s,^,          ,g;"
s is used for substitution. The syntax is s/pattern/replacement/.
In your command, the delimiter is , instead of /.
g means global: replace all occurrences rather than only the first. Since ^ matches only once per line, g makes no difference here.
For example:
perl -p -i -e "s/oldstring/newstring/g" file.txt;
In file.txt, every oldstring will be replaced with newstring.
-i is for in-place file editing.
See these docs for more information:
perlre
perlretut
perlop

Find and replace wildcard that spans across lines?

I would like to input some text like this:
foo
stuff
various stuff
could be anything here
variable number of lines
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
and get out this:
foo
teststuff
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
So far I've read and messed around with using sed, grep, and pcregrep and not been able to make it work. It looks like I could probably do such a thing with Vim, but I've never used Vim before and it looks like a royal pain.
Stuff I've tried includes:
sed -e s/foo.*bar/teststuff/g -i file.txt
and
pcregrep -M 'foo\s+bar\s' file.txt | xargs sed 's/foo.*bar/teststuff/g'
I'm not sure if it's just that I'm not using these commands correctly, or if I'm using the wrong tool. It's hard for me to believe that there is no way to use the terminal to find and replace wildcards that span lines.
For clarity, simplicity, robustness, maintainability, portability and most other desirable qualities of software, just use awk:
$ awk 'f&&/bar/{print "teststuff";f=0} !f; /foo/{f=1}' file
foo
teststuff
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
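The awk one-liner above is terse, so here is the same logic spelled out with a longer flag name (skip, my own naming) and comments. The sample input is a shortened version of the question's text.

```shell
out=$(printf '%s\n' foo stuff 'could be anything here' bar 'Stuff I want to keep' |
  awk '
    skip && /bar/ { print "teststuff"; skip = 0 }  # closing marker: emit the replacement, stop skipping
    !skip                                          # not skipping: print the line as is
    /foo/ { skip = 1 }                             # opening marker (already printed): skip what follows
  ')
printf '%s\n' "$out"
```

The order of the rules matters: the foo line is printed before the flag is set, and the flag is cleared before the bar line is printed, so both markers survive while everything between them is replaced.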
Try this with GNU sed:
sed -e '/^foo/,/^bar/{/^foo/b;/^bar/{i teststuff' -e 'b};d}'
See: man sed
This might work for you (GNU sed):
sed '/foo/,/bar/cteststuff' file
This will replace everything between foo and bar with teststuff.
However what you appear to want is everything following foo before bar, perhaps:
sed '/foo/,/bar/cfoo\nteststuff\nbar' file

sed and perl not replacing a letter in a file

I have a file 1.htm. I want to replace a letter ṣ (s with dot below). I tried with both sed and perl and it does not replace.
sed -i 's/ṣ/s/g' "1.htm"
perl -i -pe 's/ṣ/s/g' "1.htm"
Can anyone suggest what to do?
1.html (not replacing ṣ)
I have also found another strange thing: sed (same command as above) replaces the character in one file but not in the other. Here are the links:
replacable.html
unreplacable.html (same as 1.html)
Why is this happening? sed is able to replace ṣ in one file but not in the other.
You have combining characters in the HTML file. That is, the "ṣ" is really an "s" followed by a " ̣" (U+0323 COMBINING DOT BELOW). One way to fix the one-liner is:
perl -C -i -pe 's/s\x{0323}/s/g' "1.htm"
That is, turn on UTF-8 mode for stdin/stdout (-C) and explicitly write the two characters on the left side of the s///.
Another possibility is to normalize all the combining characters using Unicode::Normalize, e.g.:
perl -C -MUnicode::Normalize -Mutf8 -i -pe '$_=NFKC($_); s/ṣ/s/g' "1.htm"
But this would also normalize all the other characters in the input file, which may or may not be OK for you.
This might work for you (GNU sed):
sed 's/\o341\o271\o243/s/g' file
To find sed's octal representation of a character, use:
echo 'ṣ'| sed l
This returns (for me):
\341\271\243$
ṣ
Then use \onnn (or combinations of it) to build the correct pattern in the left-hand side (LHS) of the substitute command.
N.B. \onnn may also be used in the RHS of the substitute command.

capturing groups in sed

I have many lines of the form
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA
and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'
which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.
The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.
Any help would be greatly appreciated!
sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
\d -> [0-9] since GNU sed doesn't recognize \d
{} -> \{\} since GNU sed by default uses basic regular expressions.
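For comparison, here are the two regex flavours side by side, as a sketch. [[:space:]] replaces \t so the command does not depend on GNU escapes, and the space between \1 and \2 in the replacement is my addition to keep the columns separated.

```shell
# POSIX basic regular expressions: groups and intervals are backslash-escaped.
bre=$(echo "ko05414 ko:ITGA4" |
  sed 's/^\(ko[0-9]\{5\}\)[[:space:]]\{1,\}ko:\(.*\)$/\1 \2/')

# Extended regular expressions (-E) drop most of the backslashes.
ere=$(echo "ko05414 ko:ITGA4" |
  sed -E 's/^(ko[0-9]{5})[[:space:]]+ko:(.*)$/\1 \2/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands print ko05414 ITGA4.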
This should do it. You can also skip the last group and simply use \1 instead, but since you're learning sed and regex this is good practice. I wanted to use a non-capturing group in the middle, (?: ), but sed does not support that syntax (it is a Perl-style extension).
sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result
And of course you can use
sed --posix 's/ko://'
You don't need sed for this
Here is how you can do it with bash:
var="ko05414 ko:ITGA4"
echo ${var//"ko:"}
${var//"ko:"} replaces all "ko:" with ""
See Manipulating Strings for more info
@OP, if you just want to get rid of "ko:", then
$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA
$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA
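One thing to watch with the awk approach: assigning to a field makes awk rebuild the line using OFS, which is a single space by default. If the columns are tab-separated (an assumption, based on the \t in the question's regex), setting FS and OFS keeps the tabs. A minimal sketch with made-up sample lines:

```shell
out=$(printf 'ko04062\tko:CXCR3\nko04080\tko:GZMA\n' |
  awk 'BEGIN { FS = OFS = "\t" }      # split on tabs and rebuild lines with tabs
       { sub(/^ko:/, "", $2) }        # strip the prefix from the second field only
       1')                            # print every (possibly modified) line
printf '%s\n' "$out"
```

Anchoring the pattern with ^ also ensures that only a leading ko: in the second column is removed.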
Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big one, a bash while read loop is still slower than sed or awk.

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//'
But that removed all the newlines and put everything one line. Is there someway I can tell Perl to not include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
@sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).
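A small sketch of the tr approach on ASCII input, plus a complement form that is closer to what \W removes. The second sample line is made up to show that the underscore survives. As far as I know GNU tr works on bytes, so the complement form would also strip non-ASCII letters such as ó.

```shell
# Delete blanks and punctuation, as in the answer above.
out=$(printf 'my line - some words & text\n' | tr -d '[:blank:][:punct:]')

# Closer to \W: delete everything except letters, digits, underscore and newline.
out2=$(printf 'my line - some_words & text\n' | tr -cd '[:alnum:]_\n')

printf '%s\n%s\n' "$out" "$out2"
```

Note that [:punct:] includes the underscore, which \w treats as a word character, so the two forms differ on input that contains one.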