Sed expression to match this multiline code?

Assume the following code snippet:
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
some stuff
a change
more stuff
more changes
more stuff
}
}
}
}
final changes
</script>
</body>
I need to add something right before the last </script>, marked above as "final changes". How can I tell sed to match that spot? The "final changes" line doesn't actually exist: the script ends with four or five } lines, so I'd need to match across multiple lines.
I made all the other changes by matching a line and replacing it with the line plus the change. But I don't know how to match across lines to replace </script></body> with final changes</script></body>.
I tried the same tactic I use for replacing with multiple lines, but it didn't work; sed keeps reporting "unterminated substitute pattern".
sed 's|</script>\
</body>|lalalalala\
</script>\
</body>|' file.html
I've read the question "Sed regexp multiline - replace HTML", but it doesn't suit my case because it matches everything between the two search patterns. I need to match something, then add something before the first pattern.

sed, grep, awk etc. are NOT for XML/HTML processing.
Use a proper XML/HTML parser.
xmlstarlet is one of them.
Sample file.html:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
var data = [0, 1, 2];
console.log(data);
</script>
</body>
</html>
The command:
xmlstarlet ed -O -P -u '//body/script' -v 'alert("success")' file.html
The output:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>alert("success")</script>
</body>
</html>
http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html

I finally got this working by following xara's answer at https://unix.stackexchange.com/questions/26284/how-can-i-use-sed-to-replace-a-multi-line-string
In summary, instead of trying to do magic with sed, replace the newlines with a character that sed can match within a single line (such as \r), do the replacement, and then turn that character back into newlines.
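A minimal sketch of that trick, applied to the substitution from the question. It assumes GNU sed, which understands \r in both the pattern and the replacement (BSD/macOS sed does not), and an input file that contains no carriage returns of its own. The function name is hypothetical, and lalalalala stands in for the text to insert:

```shell
#!/bin/sh
# Newline-swap trick: tr turns every newline into \r so the whole file reaches
# sed as a single line, sed substitutes across the former line breaks, and
# tr turns the \r characters back into newlines afterwards.
# Assumes GNU sed and an input file without existing \r characters.
insert_before_last_script() {
  tr '\n' '\r' < "$1" \
    | sed 's|</script>\r</body>|lalalalala\r</script>\r</body>|' \
    | tr '\r' '\n'
}
```

Usage: insert_before_last_script file.html > file.new.html. Only a </script> that is immediately followed by </body> matches, so the <script> elements in <head> are left alone.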

Related

sed unterminated `s' command - removing js call from html

I want to remove script calls from the HTML with the following script.
var=$(sed -e '/^<script.*</script>$/d' -e '/.js/!d' testFile.html)
sed -i -e "/$var/d" testFile.html
Sample input file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
<script type="text/javascript" src="script.js" language="javascript">
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
Sample output file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
But it gives the following error:
sed: -e expression #1, char 23: unterminated `s' command
Thanks in advance
Try this:
root#isadora:~/temp# sed -e '/^<script/,/<\/script>/d' aaaa.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
</head>
<body>
</body>
</html>
root#isadora:~/temp#
It is unclear why you break this up into two separate scripts or what you hope for the variable to contain. This can be performed trivially with a single script.
The immediate problem is that you cannot use a literal unescaped slash in a regex if you use slash as the regex separator. Either use a different separator, or backslash-escape any literal slashes.
sed -i -e '\#^<script.*</script>$#d' -e '/\.js/!d' testFile.html
Notice also the backslash before the dot (an unescaped dot in a regex matches any character, so /.js/ matches e.g. the string notjs.)
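Note that -e '/\.js/!d' deletes every line that does not contain .js, so with -i the command above would leave only the script.js line in testFile.html. If the goal is the sample output shown in the question (which keeps the orphaned </script> line), a single address appears to be enough. A minimal sketch, assuming the external script tag always starts its line and is the only <script line that mentions .js; the function name is hypothetical:

```shell
#!/bin/sh
# Delete only the line that opens the external script: it starts with
# <script and mentions .js. The inline <script> block is kept.
# Prints to stdout; add GNU sed's -i to edit the file in place instead.
remove_external_script_line() {
  sed '/^<script.*\.js/d' "$1"
}
```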

Sed only in a specific place

For example:
I'd like to replace the /test src path only within the <img> tag.
However, <p>test</p> should not be touched.
$ cat test.html
<img src="/test" width="18" alt="" /><br>
<p>test</p>
For now I can run something like:
sed -i 's|/test|/hoge|g' test.html
However, it changes the word everywhere in the file.
sed '/<img/s|/test|/hoge|g' test.html would work for <img> tags that fit on a single line.
Sed allows the s///g replacement to be prefixed with another /PATTERN/ to restrict the replacement to lines matching PATTERN.
But you should really use an XML parser to be safe.
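A quick way to see the address restriction at work on the sample file from the question. The function name is hypothetical, and the sketch prints to stdout instead of editing in place:

```shell
#!/bin/sh
# The /<img/ address limits the s command to lines containing "<img";
# other lines, such as <p>test</p>, are printed unchanged.
# Every "/test" on a matching line is replaced, not only the src value.
replace_in_img_lines() {
  sed '/<img/s|/test|/hoge|g' "$1"
}
```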
Another approach with sed:
sed -i 's|\(<img *src="/\)test|\1hoge|' test.html
<img *src="/ is captured and backreferenced using \1 in the substitution string.
The string that follows it (test) is replaced with hoge.

How to sed stuff within pairs of quotes?

I want to change lines like:
<A HREF="classes_index_additions.html"class="hiddenlink">
to
<A HREF="classes_index_additions.html" class="hiddenlink">
(note the added ' ' before class) but it should leave lines like
<meta name="generator" content="JDiff v1.1.1">
alone. sed -e 's|\("[^"]*"\)\([^ />]\)|\1 \2|g' satisfies the first condition, but it changes the other text to
<meta name="generator" content=" JDiff v1.1.1">
How do I get sed to process the correct pairs of double quotes?
You can try this:
sed -e 's/"\([^" ]*\)=/" \1=/g'
But with sed, the regular expression may match parts of your document you didn't intend, so it's best to try it and check the results for unintended side effects.
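A small check of that expression against the two sample lines from the question, wrapped in a hypothetical function that filters stdin:

```shell
#!/bin/sh
# Insert a space between a closing quote and an attribute name glued to it.
# [^" ]* cannot cross a quote or a space, so only name= runs that follow a
# quote directly are touched; content="JDiff v1.1.1" is left alone.
space_glued_attributes() {
  sed -e 's/"\([^" ]*\)=/" \1=/g'
}
```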
You can try putting each attribute on a new line, then trimming trailing spaces on each line before removing the newlines.
sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
This works as follows:
s/(\w*="[^"]*")/\n\1/g
Put every attribute on a new line, so your node looks like this:
<A
HREF="classes_index_additions.html"
class="hiddenlink">
After that, remove the trailing spaces:
s/ *\n/\n/g
Finally, turn each newline back into a single space:
s/\n/ /g
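The three steps combined, run against both sample lines. This is a sketch that assumes GNU sed, since -r, \w and \n in the replacement are GNU extensions; the function name is hypothetical:

```shell
#!/bin/sh
# Three substitutions applied to each line:
#   1. put a newline before every name="value" attribute,
#   2. drop any spaces left in front of those newlines,
#   3. turn each newline into a single space.
normalize_attribute_spacing() {
  sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
}
```

The <meta> line comes out unchanged because its attributes were already separated by exactly one space.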

Perl substitution fails with multiline

I have two web pages: one was created by hand, the other was published with Visual Studio 2010 (.aspx). I want to modify the content of these files, replacing a bunch of script tags with a single script tag. To achieve this, I simply run some Perl code from a batch file. Here is the Perl code and the HTML before and after substitution:
Perl in a batch file:
perl -pi.backup -e "s/<!--\s*<pack>\s*-->.*?<!--\s*<\/pack>\s*-->/<script src=\"pack.js\"><\/script>/s" file.aspx
HTML input:
<!-- <pack> -->
<script src="file1.js" type="text/javascript"></script>
<script src="file2.js" type="text/javascript"></script>
<!-- </pack> -->
HTML output:
<script src="pack.js"></script>
Everything works fine for the hand-created file, while the generated file is not updated unless all lines are joined into one. I guess the issue comes from line breaks, but I can't figure out why it works only for the first file, since the code is exactly the same.
Your problem is that running Perl with the -p switch causes it to execute the code for each line and print the result. Thus the regex is only seeing one line of the file at a time, and is never able to match the entire pattern.
You could do something like this:
perl -i.backup -e "undef $/; $_=<>; s/<!--\s*<pack>\s*-->.*?<!--\s*<\/pack>\s*-->/<script src=\"pack.js\"><\/script>/s; print" file.aspx
It slurps the whole file into $_, then performs your substitution and prints the result to the same file.

Using sed to find and append multiple files with multiple lines?

Hi, I'm trying to append a few lines of code to a couple of thousand HTML files in a directory (and its subdirectories). What I'm trying to do is add xxx lines of code to every HTML file after the </h1> tag. I've tried sed, but I'm having trouble with the / character inside the search pattern and with adding several lines of code in the sed command.
I'm thinking of putting the lines I want to add in a txt file and using sed to place the contents of that txt file after the </h1> tag.
Much appreciate any help.
Say sample.html contains this:
<html>
<head>
</head>
<h1>Title</h1>
<body>
etc
I want to add this after the </h1> element:
<script>
etc.
</script>
<iframe>
</iframe>
to produce this:
<html>
<head>
</head>
<h1>Title</h1>
<script>
etc.
</script>
<iframe>
</iframe>
<body>
etc
Assuming you want to place the text after the H1 end tag:
sed -i '/<\/h1>/r new_text.html' sample.html
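The question also mentions a couple of thousand files spread over subdirectories. One way to run the r command above over all of them, assuming GNU sed (for -i without a backup suffix) and find's standard -exec ... {} + form; the function name is hypothetical, and the snippet file should be kept outside the searched tree, since otherwise it would be edited as well if it ends in .html:

```shell
#!/bin/sh
# Run the `r` answer over every .html file below a directory.
# Assumes GNU sed; with -i, each file is processed and rewritten separately.
add_snippet_after_h1() {
  dir=$1      # top-level directory to search
  snippet=$2  # file whose contents are inserted after each </h1> line
  find "$dir" -type f -name '*.html' -exec sed -i "/<\/h1>/r $snippet" {} +
}
```

Usage (placeholder paths): add_snippet_after_h1 ./site ./new_text.html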
Another solution:
Content of script.sed
/<\/h1>/ {
a\
<script>\
etc.\
</script>\
<iframe>\
</iframe>
}
Run it like:
sed -i -f script.sed sample.html