I have an issue I am trying to solve with sed. My goal is to quote the value after content= if it is not already quoted.
Here is the concrete example:
<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">
I would like to add quotes around Word.Document so that I end up with:
<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">
I was trying with
sed -i 's#content="\(.*\)"#content="\1"/#g' "$1"
However this is not working.
Thank you.
There is no " in the input behind content=, so you shouldn't match it. You could match up until a space or >.
sed 's#content=\([^"][^ >]*\)#content="\1"#'
Note that you should use XML aware tools to parse XML documents.
This should work:
sed -E 's/content=([^">]+)/content="\1"/'
Explanation:
This tells sed to substitute everything after content= and before > only if it doesn't start with ". I used a capture group to replace the content with itself surrounded by ".
Input:
<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">
Output:
<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">
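As a quick check, here is a minimal sketch that runs the first answer's expression against the two sample lines from the question. The variable names html and quoted are only for illustration; the expression itself is the one given above.

```shell
# Sample input taken from the question.
html='<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">'

# Quote the value only when it does not already start with a double quote.
# The value is assumed to end at the first space or ">".
quoted=$(printf '%s\n' "$html" | sed 's#content=\([^"][^ >]*\)#content="\1"#')

printf '%s\n' "$quoted"
```

One difference between the two answers: the -E variant uses [^">]+, which also consumes spaces, so on a line such as content=Word.Document id=x> it would quote Word.Document id=x as a single value. The variant that stops at a space avoids that, at the cost of not handling unquoted values that contain spaces.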
Related
I want to add a new line to an HTML file using the sed command.
The line I want to add is
<link href="https://newvalue.css" rel="test1" id="test2">
After
<link href="test.css" rel="test1" id="test2">
in an HTML file.
Can anyone help?
Use sed with the a (append) command:
sed -i '/<link href="test.css" rel="test1" id="test2">/a<link href="https://newvalue.css" rel="test1" id="test2">' file
Search for the line by using /.../ and then use a for append followed by the string to add.
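The one-line a form above is a GNU sed convenience. A minimal sketch of the same append using the a\ form with the text on the following line, which should also work in other sed implementations, might look like this. The surrounding <head> lines are made up for the test, and the dot in test.css is escaped so it only matches a literal dot.

```shell
# Hypothetical three-line input; only the <link> line comes from the question.
html='<head>
<link href="test.css" rel="test1" id="test2">
</head>'

# /.../ selects the line, a\ appends the text given on the next line.
result=$(printf '%s\n' "$html" | sed '/<link href="test\.css" rel="test1" id="test2">/a\
<link href="https://newvalue.css" rel="test1" id="test2">')

printf '%s\n' "$result"
```

The slashes in https:// are harmless here because they sit in the appended text, not in the /.../ address.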
I want to remove script calls from the HTML with the following script.
var=$(sed -e '/^<script.*</script>$/d' -e '/.js/!d' testFile.html)
sed -i -e "/$var/d" testFile.html
Sample input file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
<script type="text/javascript" src="script.js" language="javascript">
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
Sample output file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
<script>
// script code
</script>
</head>
<body>
</body>
</html>
But it gives the following error:
sed: -e expression #1, char 23: unterminated `s' command
Thanks in advance
Try deleting the range from the opening tag to the closing tag:
root#isadora:~/temp# sed -e '/^<script/,/<\/script>/d' aaaa.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
</script>
</head>
<body>
</body>
</html>
root#isadora:~/temp#
Regards.
It is unclear why you break this up into two separate scripts or what you hope for the variable to contain. This can be performed trivially with a single script.
The immediate problem is that you cannot use a literal unescaped slash in a regex if you use slash as the regex separator. Either use a different separator, or backslash-escape any literal slashes.
sed -i -e '\#^<script.*</script>$#d' -e '/\.js/!d' testFile.html
Notice also the backslash before the dot (an unescaped dot in a regex matches any character, so /.js/ matches e.g. the string notjs.)
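A minimal sketch of both points, using a made-up three-line input (the <p>notjs</p> line is hypothetical and is there only to show the over-match):

```shell
html='<script type="text/javascript" src="script.js"></script>
<p>notjs</p>
<title>JavaScript</title>'

# \#...# uses # as the address delimiter, so the slash in </script>
# needs no escaping.
kept=$(printf '%s\n' "$html" | sed -e '\#^<script.*</script>$#d')

# Count matching lines: the unescaped dot also matches "tjs" in "notjs".
loose=$(printf '%s\n' "$html" | sed -n '/.js/p' | wc -l | tr -d ' ')
strict=$(printf '%s\n' "$html" | sed -n '/\.js/p' | wc -l | tr -d ' ')

printf '%s\n' "$kept"
printf 'unescaped dot: %s line(s), escaped dot: %s line(s)\n' "$loose" "$strict"
```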
Assume the following code snippet:
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
some stuff
a change
more stuff
more changes
more stuff
}
}
}
}
final changes
</script>
</body>
I need to add something right before the last </script>, at the spot marked final changes. How can I tell sed to match that one? The text final changes doesn't exist in the real file; the last lines of the script are four or five } lines, so I'd need to match across multiple lines.
All the other changes were made by matching a line and replacing it with the line plus the changes. But I don't know how to match across lines to replace </script></body> with final changes </script></body>.
I tried the same tactic I use for replacing with multiple lines, but it didn't work; sed keeps reporting unterminated substitute pattern.
sed 's|</script>\
</body>|lalalalala\
</script>\
</body>|' file.html
I've read this question Sed regexp multiline - replace HTML but it doesn't suit my particular case because it matches everything between the search options. I need to match something, then add something before the first search operator.
sed, grep, awk etc. are NOT for XML/HTML processing.
Use a proper XML/HTML parser.
xmlstarlet is one of them.
Sample file.html:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>
var data = [0, 1, 2];
console.log(data);
</script>
</body>
</html>
The command:
xmlstarlet ed -O -P -u '//body/script' -v 'alert("success")' file.html
The output:
<html>
<head>
<script>....</script>
<script>....</script>
</head>
<body>
<script>alert("success")</script>
</body>
</html>
http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
I finally got this working by following xara's answer at https://unix.stackexchange.com/questions/26284/how-can-i-use-sed-to-replace-a-multi-line-string
In summary, instead of trying to do magic with sed, replace the newlines with a character which sed understands (like \r), do the replace and then replace the character with newline again.
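The newline-swapping idea can be sketched roughly as follows. This is an illustration under assumptions, not the exact command from the linked answer: the input is a shortened version of the snippet in the question, the file is assumed to contain no carriage returns of its own, and the cr variable holds a literal carriage return so the sed script does not rely on \r being understood.

```shell
html='<body>
<script>
some stuff
}
</script>
</body>'

# A literal carriage return, used as a stand-in for newline.
cr=$(printf '\r')

# 1. newline -> CR, so the whole file becomes one "line" for sed
# 2. match </script> followed by </body> across the former line break
# 3. CR -> newline to restore the layout
result=$(printf '%s\n' "$html" \
  | tr '\n' '\r' \
  | sed "s|</script>${cr}</body>|final changes${cr}</script>${cr}</body>|" \
  | tr '\r' '\n')

printf '%s\n' "$result"
```

Because the whole file is held in one pattern space, this approach is better suited to small files, and a file that already uses CRLF line endings would need a different placeholder character.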
I want to change lines like:
<A HREF="classes_index_additions.html"class="hiddenlink">
to
<A HREF="classes_index_additions.html" class="hiddenlink">
(note the added ' ' before class) but it should leave lines like
<meta name="generator" content="JDiff v1.1.1">
alone. sed -e 's|\("[^"]*"\)\([^ />]\)|\1 \2|g' satisfies the first condition but it changes the other text to
<meta name="generator" content=" JDiff v1.1.1"/>
How do I get sed to process the correct pairs of double quotes?
You can try this:
sed -e 's/"\([^" ]*\)=/" \1=/g'
But with sed, it may be possible that the regular expression matches other parts of your document that you didn't intend, so best to try it and look over the results to see if there are any unintended side effects!
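A small sketch of that "try it and look over the results" step, run against the two lines from the question (the variable names are illustrative):

```shell
input='<A HREF="classes_index_additions.html"class="hiddenlink">
<meta name="generator" content="JDiff v1.1.1">'

# Insert a space after a double quote that is directly followed by name=.
fixed=$(printf '%s\n' "$input" | sed -e 's/"\([^" ]*\)=/" \1=/g')

printf '%s\n' "$fixed"
```

One side effect to watch for: an attribute value that itself contains = with no space before it, such as href="a?x=1", also matches and becomes href=" a?x=1".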
You can try putting each attribute on a new line, then trimming trailing spaces on each line before removing the newlines.
sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
This works as follows:
s/(\w*="[^"]*")/\n\1/g
Put every attribute on a new line so your node looks like this
<A
HREF="classes_index_additions.html"
class="hiddenlink">
After that you remove trailing spaces
s/ *\n/\n/g
And remove new lines
s/\n/ /g
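To see the three stages separately, a sketch along these lines may help. It assumes GNU sed, since \w and \n in the replacement are GNU extensions, and uses -E, which GNU sed accepts as a synonym for -r.

```shell
line='<A HREF="classes_index_additions.html"class="hiddenlink">'

# Stage 1: put each name="value" pair on its own line inside the pattern space.
step1=$(printf '%s\n' "$line" | sed -E 's/(\w*="[^"]*")/\n\1/g')

# Stages 1+2: additionally strip spaces that precede a newline.
step2=$(printf '%s\n' "$line" | sed -E 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g')

# All three stages: join the pieces back together with single spaces.
final=$(printf '%s\n' "$line" | sed -E 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g')

printf '%s\n---\n%s\n---\n%s\n' "$step1" "$step2" "$final"
```

A line whose attributes are already separated by single spaces, such as the meta line from the question, passes through all three stages unchanged.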
Hi, I'm trying to append a few lines of code to a couple thousand HTML files in a directory (and its sub-directories). What I'm trying to do is add xxx lines of code to all HTML files following the tag. I've tried to explore sed, but I'm having issues with the / sign inside the search pattern and with adding several lines of code in the sed command.
I'm thinking of putting the lines I want to add in a txt file and using sed to place the content of that txt file after the tag.
I'd much appreciate any help.
Say sample.html contains this:
<html>
<head>
</head>
<h1>Title</h1>
<body>
etc
I want to add this after the </h1> element:
<script>
etc.
</script>
<iframe>
</iframe>
to produce this:
<html>
<head>
</head>
<h1>Title</h1>
<script>
etc.
</script>
<iframe>
</iframe>
<body>
etc
Assuming you want to place the text after the H1 end tag, and that the text to insert is stored in new_text.html:
sed -i '/<\/h1>/r new_text.html' sample.html
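Since the question mentions thousands of files in sub-directories, the r command could be combined with find. The following is a rough sketch under several assumptions: GNU sed (BSD sed needs -i ''), a snippet kept in a file named snippet.txt (a hypothetical name, chosen so it is not itself picked up by the *.html pattern), and a throwaway directory created with mktemp for the demonstration.

```shell
# Build a small throwaway tree to try the command on.
work=$(mktemp -d)
mkdir -p "$work/site/sub"
printf '%s\n' '<html>' '<h1>Title</h1>' '<body>' > "$work/site/a.html"
cp "$work/site/a.html" "$work/site/sub/b.html"

# The lines to insert live in a separate text file, outside the tree.
printf '%s\n' '<script>' 'etc.' '</script>' > "$work/snippet.txt"

# For every .html file under site/, read snippet.txt in after the </h1> line.
find "$work/site" -type f -name '*.html' \
  -exec sed -i "/<\/h1>/r $work/snippet.txt" {} +

cat "$work/site/sub/b.html"
```

Running it first on a copy of the directory is a sensible precaution, because -i rewrites the files in place.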
Another solution:
Content of script.sed
/<\/h1>/ {
a\
<script>\
etc.\
</script>\
<iframe>\
</iframe>
}
Run it like:
sed -i -f script.sed sample.html