sed unterminated `s' command - removing js call from html - sed

I want to remove script calls from the HTML with the following script.
var=$(sed -e '/^<script.*</script>$/d' -e '/.js/!d' testFile.html)
sed -i -e "/$var/d" testFile.html
Sample input file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
<script type="text/javascript" src="script.js" language="javascript">
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
Sample output file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
But it gives the following error:
sed: -e expression #1, char 23: unterminated `s' command
Thanks in advance

Try deleting the range from the opening tag to the closing tag:
root#isadora:~/temp# sed -e '/^<script/,/<\/script>/d' aaaa.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
</head>
<body>
</body>
</html>
root#isadora:~/temp#
Att.

It is unclear why you break this up into two separate scripts or what you hope for the variable to contain. This can be performed trivially with a single script.
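The immediate problem is that you cannot use a literal unescaped slash in a regex if you use slash as the regex separator. sed takes the slash in </script> as the end of the address, then reads the s that follows as the start of an s command, which is why it reports an unterminated `s' command. Either use a different separator, or backslash-escape any literal slashes.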
sed -i -e '\#^<script.*</script>$#d' -e '/\.js/!d' testFile.html
Notice also the backslash before the dot: an unescaped dot in a regex matches any character, so /.js/ also matches, for example, the string notjs.
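For illustration, here is a minimal sketch of a single-script variant. It assumes the goal is to drop the whole external script element, closing tag included (the question's sample output keeps the stray </script>, so this is a guess at the intent). The src="...js" pattern and the shortened input are assumptions made for the example. The \# delimiter means the slash in </script> needs no escaping.

```shell
# Shortened version of the sample input from the question.
input='<head>
<title>JavaScript</title>
<script type="text/javascript" src="script.js" language="javascript">
</script>
<script>
// script code
</script>
</head>'

# Delete from a line that opens a <script> tag with a src ending in .js
# through the next line that contains </script>.
# \#...# selects # as the regex delimiter instead of /.
output=$(printf '%s\n' "$input" |
  sed -e '\#^<script[^>]*src="[^"]*\.js"#,\#</script>#d')

printf '%s\n' "$output"
```

One caveat: sed only starts looking for the end of a range on the line after the one that opened it, so if the opening and closing tags share a line, this would delete up to the next </script> further down.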

Related

sed: how to replace a specific string

I have an issue I am trying to solve with sed. My goal is to quote the content after content= if the content is not already quoted.
Here is the concrete example:
<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">
I would like to add quotes around Word.Document so at the end have:
<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">
I was trying with
sed -i 's#content="\(.*\)"#content="\1"/#g' "$1"
However this is not working.
Thank you.
There is no " in the input behind content=, so you shouldn't match it. You could match up until a space or >.
sed 's#content=\([^"][^ >]*\)#content="\1"#'
Note that you should use XML aware tools to parse XML documents.
This should work:
sed -E 's/content=([^">]+)/content="\1"/'
Explanation:
This tells sed to replace whatever follows content= up to the next " or >, but only if it does not start with ". The capture group lets the replacement reuse the matched text, wrapped in double quotes.
Input:
<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">
Output:
<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">

Sed expression to match this multiline code?

Assume the following code snippet:
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
some stuff
a change
more stuff
more changes
more stuff
}
}
}
}
final changes
</script>
</body>
I need to add something right before the last </script>, at the place marked final changes. How can I tell sed to match that spot? The final changes line does not exist yet, and the last lines of the script are just four or five } lines, so I would need to match across multiple lines.
All the other changes were made by matching a line and replacing it with that line plus the changes. But I don't know how to match across lines to replace </script></body> with final changes </script></body>.
I tried the same tactic I use for replacing with multiple lines, but it didn't work; sed keeps reporting an unterminated substitute pattern.
sed 's|</script>\
</body>|lalalalala\
</script>\
</body>|' file.hmtl
I've read this question Sed regexp multiline - replace HTML but it doesn't suit my particular case because it matches everything between the search options. I need to match something, then add something before the first search operator.
sed, grep, awk etc. are NOT for XML/HTML processing.
Use a proper XML/HTML parsers.
xmlstarlet is one of them.
Sample file.html:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
var data = [0, 1, 2];
console.log(data);
</script>
</body>
</html>
The command:
xmlstarlet ed -O -P -u '//body/script' -v 'alert("success")' file.html
The output:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>alert("success")</script>
</body>
</html>
http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
I finally got this working by following xara's answer in https://unix.stackexchange.com/questions/26284/how-can-i-use-sed-to-replace-a-multi-line-string
In summary, instead of trying to do magic with sed, replace the newlines with a character that sed can match within a single line (like \r), do the replacement, and then turn that character back into newlines.
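The newline-swapping approach described above could be sketched roughly like this. The input is a cut-down, hypothetical version of the snippet from the question, and it assumes the file contains no carriage returns of its own. A literal carriage return is placed in a shell variable so the sed script does not depend on \r being understood inside the regex.

```shell
# Cut-down stand-in for the HTML in the question.
input='<body>
<script>
}
</script>
</body>'

# A literal carriage return, used as a temporary stand-in for newlines.
cr=$(printf '\r')

# 1. newlines -> CR, so the whole file becomes a single line
# 2. match "</script>CR</body>" and put the new text in front of it
# 3. CR -> newlines again
output=$(printf '%s\n' "$input" |
  tr '\n' '\r' |
  sed "s|</script>${cr}</body>|final changes${cr}</script>${cr}</body>|" |
  tr '\r' '\n')

printf '%s\n' "$output"
```

Because the whole file is handled as one line, this is only practical for files that comfortably fit in memory.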

Sed only with specific place

For example, I'd like to replace the /test src path only within the <img> tag.
However, <p>test</p> should not be touched.
$ cat test.html
<img src="/test" width="18" alt="" /><br>
<p>test</p>
For now I can run something like:
sed -i 's|/test|/hoge|g' test.html
However it changes the word globally.
sed '/<img/s|/test|/hoge|g' test.html would work for <img tags that sit on a single line.
Sed allows the s///g replacement to be prefixed with another /PATTERN/ to restrict the replacement to lines matching PATTERN.
But you should really use an xml parser to be safe.
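As a quick check of the address-restricted substitution, here is a small sketch. The second input line is made up for the example (the question's <p>test</p> contains no /test at all), so that there is something outside the <img> tag that an unrestricted substitution would have changed.

```shell
# First line is from the question; the second line is a hypothetical
# addition that contains /test outside of an <img> tag.
input='<img src="/test" width="18" alt="" /><br>
<p>see /test for details</p>'

# The /<img/ address limits the s command to lines containing "<img".
output=$(printf '%s\n' "$input" | sed '/<img/s|/test|/hoge|g')

printf '%s\n' "$output"
```

Only the <img> line is rewritten; the paragraph keeps its /test. Note that the restriction is per line, so any other /test on the same line as an <img tag would be replaced as well.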
Another approach with sed:
sed -i 's|\(<img *src="/\)test|\1hoge|' test.html
The <img *src="/ part is captured and referenced as \1 in the replacement string.
The string that follows it (test) is replaced with hoge.

How to sed stuff within pairs of quotes?

I want to change lines like:
<A HREF="classes_index_additions.html"class="hiddenlink">
to
<A HREF="classes_index_additions.html" class="hiddenlink">
(note the added ' ' before class) but it should leave lines like
<meta name="generator" content="JDiff v1.1.1">
alone. sed -e 's|\("[^"]*"\)\([^ />]\)|\1 \2|g' satisfies the first condition but it changes the other text to
<meta name="generator" content=" JDiff v1.1.1"/>
How do I get sed to process the correct pairs of double quotes?
You can try this:
sed -e 's/"\([^" ]*\)=/" \1=/g'
But with sed, it may be possible that the regular expression matches other parts of your document that you didn't intend, so best to try it and look over the results to see if there are any unintended side effects!
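To make that warning concrete, here is a small sketch that runs the expression on the two lines from the question plus one hypothetical line (the href with a query string is my own addition, not from the question). The fix_attrs helper name is also just for illustration.

```shell
# Wrap the expression from the answer in a helper (hypothetical name).
fix_attrs() { sed -e 's/"\([^" ]*\)=/" \1=/g'; }

# Line that should get a space inserted before class=.
out1=$(printf '%s\n' '<A HREF="classes_index_additions.html"class="hiddenlink">' | fix_attrs)

# Line that should be left alone.
out2=$(printf '%s\n' '<meta name="generator" content="JDiff v1.1.1">' | fix_attrs)

# Hypothetical line showing a side effect: the value itself contains "=".
out3=$(printf '%s\n' '<a href="page?id=1">' | fix_attrs)

printf '%s\n' "$out1" "$out2" "$out3"
```

The first two lines come out as the question wants, but the third gets a space inserted inside the attribute value, because a quote followed by non-space characters and an = sign looks the same to the regex whether it closes one attribute or opens a value.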
You can try putting each attribute on a new line, then trimming trailing spaces on each line before removing the newlines.
sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
This works as follows:
s/(\w*="[^"]*")/\n\1/g
Put every attribute on a new line, so your node looks like this:
<A
HREF="classes_index_additions.html"
class="hiddenlink">
After that you remove trailing spaces
s/ *\n/\n/g
And remove new lines
s/\n/ /g

Using sed to find and append multiple files with multiple lines?

Hi, I'm trying to append a few lines of code to a couple thousand HTML files in a directory (and its sub-directories). What I'm trying to do is add xxx lines of code to all HTML files following the tag. I've tried to explore sed, but I'm having issues with the / sign inside the search and with adding several lines of code to the sed command.
I'm thinking of adding the lines I want to add in a txt file and use sed to place all content in that txt file after the tag.
Much appreciate any help.
Say sample.html contains this:
<html>
<head>
</head>
<h1>Title</h1>
<body>
etc
I want to add this after the </h1> element:
<script>
etc.
</script>
<iframe>
</iframe>
to produce this:
<html>
<head>
</head>
<h1>Title</h1>
<script>
etc.
</script>
<iframe>
</iframe>
<body>
etc
Assuming you want to place the text after the H1 end tag, and that the lines to insert are saved in new_text.html:
sed -i '/<\/h1>/r new_text.html' sample.html
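The question also asks about many files in sub-directories. One way to extend the command above is to let find hand the files to sed. This is only a sketch: it assumes GNU sed (where -i takes no argument), and the temporary directory, file names and contents are all made up so it can be tried without touching real files.

```shell
# Build a throwaway tree to experiment on (hypothetical names throughout).
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub"
printf '%s\n' '<html>' '<h1>Title</h1>' '<body>' > "$workdir/site/a.html"
printf '%s\n' '<h1>Other</h1>' '<p>text</p>' > "$workdir/site/sub/b.html"

# The lines to insert live outside the tree being edited.
printf '%s\n' '<script>' 'etc.' '</script>' > "$workdir/new_text.txt"

# For every .html file below site/, append the contents of new_text.txt
# after each line containing </h1>. GNU sed -i edits each file in place.
find "$workdir/site" -type f -name '*.html' \
  -exec sed -i "/<\/h1>/r $workdir/new_text.txt" {} +

result_a=$(cat "$workdir/site/a.html")
result_b=$(cat "$workdir/site/sub/b.html")
printf '%s\n' "$result_a" "$result_b"

rm -rf "$workdir"
```

Keeping the snippet file outside the directory being searched avoids find picking it up as one of the files to edit. It is worth running this on a copy of the real files first.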
Another solution:
Content of script.sed
/<\/h1>/ {
a\
<script>\
etc.\
</script>\
<iframe>\
</iframe>
}
Run it like:
sed -i -f script.sed sample.html