Perl substitution fails with multiline - perl

I have two web pages: one was created by hand, the other was published with Visual Studio 2010 (.aspx). I want to modify the content of these files, replacing a bunch of script tags with a single script tag. To achieve this, I simply run some Perl code from a batch file. Here is the Perl code and the HTML before and after substitution:
Perl in a batch file:
perl -pi.backup -e "s/<!--\s*<pack>\s*-->.*?<!--\s*<\/pack>\s*-->/<script src=\"pack.js\"><\/script>/s" file.aspx
HTML input:
<!-- <pack> -->
<script src="file1.js" type="text/javascript"></script>
<script src="file2.js" type="text/javascript"></script>
<!-- </pack> -->
HTML output:
<script src="pack.js"></script>
Everything works fine for the hand-created file, while the generated file is not updated unless all lines are gathered into one. I guess the issue comes from line breaks, but I can't figure out why it works only for the first file, since the code is exactly the same.

Your problem is that running Perl with the -p switch causes it to execute the code for each line and print the result. Thus the regex is only seeing one line of the file at a time, and is never able to match the entire pattern.
You could do something like this:
perl -i.backup -e "undef $/; $_=<>; s/<!--\s*<pack>\s*-->.*?<!--\s*<\/pack>\s*-->/<script src=\"pack.js\"><\/script>/s; print" file.aspx
It slurps the whole file into $_, then performs your substitution and prints the result to the same file.

Related

Using sed, prepend line only once, if there's a match later in file content

I'd like to add a line on top of my output if my input file has a specific word.
However, if I'm just looking for specific string, then as I understand it, it's too late. The first line is already in the output and I can't prepend to it anymore.
Here's an example of input.
one
two
two
three
If I can find a line with, say, the word two, I'd like to add a new line before the first one, with for example FOUND. I want that line prepended only once, even if there are several matches.
So an input file without any two would remain unchanged, and the example file above would become:
FOUND
one
two
two
three
I know how to prepend with i\, but can't get the context right. From what I understood that would be around:
1{
/two/{ # This will search for "two" in the first line only; how do I look for it in the whole file?
1i\
FOUND
}
}
EDIT:
I know how to do it using other languages/methods, that's not my question.
Sed has advanced features to work on several lines at once, append/prepend lines and is not limited to substitution. I have a sed file already filled with expressions to modify a python source file, which is why I'd prefer to avoid using something else. I want to be able to add an import at the beginning of a file if a certain class is used.
A Perl solution:
perl -i.bak -0777 -pE 'say "FOUND" if /two/;' in_file
The Perl one-liner uses these command line flags:
-p : Loop over the input one record at a time (one line by default; the whole file when combined with -0777), assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
-E : Tells Perl to look for code in-line, instead of in a file. Also enables all optional features. Here, enables say.
-0777 : Slurp files whole.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
sed is for doing s/old/new on individual strings; that's not what you're trying to do, so you shouldn't bother trying to use sed. There are lots of ways to do this; this one is efficient, robust, and portable to all Unix systems:
$ grep -Fq 'two' file && echo "FOUND"; cat file
FOUND
one
two
two
three
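To operate on a stream instead of (or in addition to) a file, buffering only the lines up to the first match rather than the whole input (if nothing matches, the END block prints the buffered lines unchanged):
awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1} END{if (!f) for (i=1;i<=NR;i++) print buf[i]}'
e.g.:
$ cat file | awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1} END{if (!f) for (i=1;i<=NR;i++) print buf[i]}'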
FOUND
one
two
two
three
That awk script will also work using any awk in any shell on every Unix box.
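For readers who, like the asker, would rather stay within sed, one possible approach is to collect every line in the hold space and decide on the last line. This is only a sketch: it assumes GNU sed (the \n in the replacement is a GNU extension), it holds the whole file in memory, and the function and file names are invented for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' one two two three > "$dir/with_match.txt"
printf '%s\n' one three > "$dir/no_match.txt"

# 1h / 1!H : copy line 1 into the hold space, append every later line to it
# ${...}   : on the last line, fetch the collected text, prepend FOUND once
#            if "two" occurs anywhere in it, then print everything
prepend_found() {
  sed -n '1h;1!H;${g;/two/s/^/FOUND\n/;p;}' "$1"
}

prepend_found "$dir/with_match.txt"
prepend_found "$dir/no_match.txt"
```

Because the test runs once against the whole file, FOUND is added at most once, and a file without any match is printed unchanged.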

Use processed output from stdin as a replacement string in Sed

The following command gives me the output I want:
$ sed '/^<template.*>/,/<\/template>/!d;//d' src/components/**/*.vue | html2jade
in that it converts each template containing HTML into its Pug equivalent.
Would it now be possible to somehow replace the originally found HTML in all those files with this processed output? There is also some other content outside the template tags, namely some script and style tags, which should stay as it is.

What is the purpose of filtering a log file using this Perl one-liner before displaying it in the terminal?

I came across this script which was not written by me, but because of an issue I need to know what it does.
What is the purpose of filtering the log file using this Perl one-liner?
cat log.txt | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g'
The log.txt file contains the output of a series of commands. I do not understand what is being filtered here, and why it might be useful.
It looks like the code replaces ANSI escape sequences in the input with a space, i.e., the codes that set colors, the window title, and so on. Since some of these codes can cause harm, it might be a security measure in case an attacker manages to get such escape codes into the log file. A log file usually does not contain any such codes, which would also explain why you don't see any effect of this filter on normal log files.
For more information about this kind of attack see A Blast From the Past: Executing Code in Terminal Emulators via Escape Sequences.
BTW, while your question looks bad at first glance, it actually is not. But you might improve it by at least formatting it properly; otherwise you risk the question getting down-voted quickly.
First, the command line suffers from a useless use of cat. perl is fully capable of reading from a file name on the command line.
So,
$ perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g' log.txt
would have done the same thing, but avoided spawning an extra process.
Now, -e is followed by a script for perl to execute. In this case, we have a single global substitution.
\e in a Perl regex pattern corresponds to the escape character, \x1b.
The pattern following \e looks like the author wants to match ANSI escape sequences.
The -p option essentially wraps the script specified with -e in a while loop, so the s/// is executed for each line of the input.
The pattern probably does the job for this simple purpose, but one might benefit from using Regexp::Common::ANSIescape as in:
$ perl -MRegexp::Common=ANSIescape,no_defaults -pe 's/$RE{ANSIescape}/ /g' log.txt
Of course, if one uses a script like this very often, one might want to either use an alias, or even write a very short script that does this, as in:
#!/usr/bin/env perl
use strict;
use Regexp::Common 'ANSIescape', 'no_defaults';
# Replace every ANSI escape sequence with a space, line by line.
while (<>) {
    s/$RE{ANSIescape}/ /g;
    print;
}

Invalid preceding regular expression error given by sed

I'm part of a rotating team that manages a lot of websites, and we inherited some particularly bad code for one website that we're in the process of completely redesigning. Horrifically enough, there are links on the development server that lead you to the live server and to old domains and a lot of other terrible, terrible things.
I've been trying to write a grep/sed command to replace all of these links with the user-defined php function full_link that we use across our websites now to prevent all the linking to different places. So (using a placeholder for our domain) instead of writing http://www.place.com/foo/bar, you'd write <?php echo full_link('foo/bar'); ?> and it will work when we move it from one server to another.
Here's what I got:
grep -v 'echo' * -r -P | grep "(?<=<a href=['\"])(http:\/\/foo\.bar\.net\/|10\.41\.6\.118\/|http:\/\/foo2\.bar\.net\/)([^<]*?)(?=['\"])" -P | sed -r "s#(?<=<a href=['\"])(http://foo\.bar\.net/|10\.41\.6\.118/|http://foo2\.bar\.net/)([^<]*?)(?=['\"])#<?php echo full_link('\2'); ?>#gpw output"
(If you're wondering about the first grep or the [^<], they're both just a basic attempt to keep from putting php tags inside existing php tags. Since it's just a first pass to make manual editing less full of copy-pasting links and getting redirected to the wrong server, it doesn't need to be perfect, but I am open to better ways to do that.)
I've got the grep statements working and picking up what I want them to get, but when I add the sed to the end, this is what happens:
sed: -e expression #1, char 159: Invalid preceding regular expression
From what research I've done, it seems like I probably escaped something in my sed statement incorrectly, and I've tried a number of things, but it only ever gives me the same message pointing to one of the last few characters of the expression.
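sed's regular expressions, even the extended ones enabled by -r, do not support Perl-style lookarounds such as (?<=...) and (?=...) or non-greedy quantifiers such as *?. In "(?" the ? has nothing before it to repeat, which is what triggers "Invalid preceding regular expression". A simpler approach uses two plain substitutions:
sed "s.http://[^/]*/.<?php echo full_link('.;s/$/'); ?>/"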
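look for http://[^/]*/ (the scheme and host)
replace with <?php echo full_link('
look for $ (the end of the line)
replace with '); ?>
Note that the second substitution appends to the end of every line it is given, so this only makes sense on the lines already selected by the grep commands, with the link at the end of the line.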

Adding characters to the beginning and end of a line starting with a specific string

I have quite a few html files where I need to comment out a specific line of JavaScript:
<script src="/common/javascript/jquery/jquery.tools-1.2.4.min.js" type="text/javascript"></script>
What I would like to do via the command line is search the .htm files in the directory for the string "/common/javascript/jquery/jquery.tools-1.2.4.min.js" and add <!-- to the beginning of the line containing the string, and --> to the end of the line.
Some files include type= and some do not, which is why I'd like to search using the src value and add to the beginning and end of line.
Thanks for the help. I appreciate it.
This will output a modified file if you replace "that whole line" with the entire JavaScript line you want to comment out.
sed 's/\(that whole line\)/<!--\1-->/' file.htm
Now just iterate that over all the files in the directory:
for f in *.htm; do
sed 's/\(that whole line\)/<!--\1-->/' "$f" > "$f.new"
done
I'll let you figure out how to handle moving them back to the right filenames. (Maybe a new dir? Maybe a mv command? Whatever's best in your situation.)
Is this what you want?
for f in *.htm*; do
mv "$f" "$f.tmp"
sed 's#^\(.*common/javascript/jquery/jquery.tools-1.2.4.min.js.*\)#<!-- \1 -->#' "$f.tmp" > "$f"
rm "$f.tmp" ;
done
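If your sed supports in-place editing, the loop and the temporary files can be avoided. The following is a sketch under that assumption (-i is not in POSIX, though GNU and BSD sed both accept -i with an attached backup suffix); page.htm and the temporary directory are made-up names for illustration.

```shell
dir=$(mktemp -d)
cat > "$dir/page.htm" <<'EOF'
<head>
<script src="/common/javascript/jquery/jquery.tools-1.2.4.min.js"></script>
<script src="/common/javascript/other.js"></script>
</head>
EOF

# \#...# is an address that uses # as its delimiter, so the slashes in the
# path need no escaping. On matching lines only, & in the replacement stands
# for the whole original line. A .bak copy of each file is kept.
sed -i.bak '\#/common/javascript/jquery/jquery\.tools-1\.2\.4\.min\.js# s#.*#<!-- & -->#' "$dir"/*.htm

cat "$dir/page.htm"
```

Running it a second time would wrap the line in another pair of comment markers, so it is best run once, with the .bak files kept until the result has been checked.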