sed find and replace (in-place replace) using a regex

I need to find and replace certain text in many files. I am trying to use sed to do the replacement. Here is what I am trying to do:
Find:
<font size="4" face="verdana, arial,geneva"><b>([^<]*)</b></font>
Replace with:
<font size="4" face="verdana, arial,geneva"><b><title>$1</title></b></font>
Essentially, I want to add a <title></title> tag around whatever I find.
For example, if the text is:
<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>
I want to replace it with:
<font size="4" face="verdana, arial,geneva"><b><title>THIS IS MY TITLE</title></b></font>
I have tried various commands, but none of them seem to work. Here are the commands I have tried so far:
sed -e 's/<font size="4" face="verdana, arial,geneva"><b>\([^<]*\)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'
sed -r 's/<font size="4" face="verdana, arial,geneva"><b>([^<]*)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'
sed -E 's/<font size="4" face="verdana, arial,geneva"><b>([^<]*)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'

For me, this works:
sed '/font *size *= *"4" *face/s|<b>\([^<]*\)</b>|<b><title>\1</title></b>|g'
My idea is to avoid as many escapes as possible and to split matching and substitution into two steps: the address selects the lines, and the s command does the replacement on them.
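The question asks for an in-place edit across many files. A minimal sketch of that, assuming GNU sed (whose -i takes no argument; BSD/macOS sed needs -i '') and a hypothetical function name add_title_tags; the files to edit are whatever you pass to it:

```shell
#!/bin/sh
# Wrap the bold text of the matching <font> lines in <title>...</title>,
# editing every file given as an argument in place (GNU sed -i syntax).
add_title_tags() {
  # The address /font *size.../ picks the lines; s|...|...| rewrites them.
  sed -i '/font *size *= *"4" *face/s|<b>\([^<]*\)</b>|<b><title>\1</title></b>|g' "$@"
}

# Usage (hypothetical file set): add_title_tags *.html
```

Lines that do not match the address are left byte-for-byte unchanged.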

This sed line was built mostly by copying and pasting your pattern ^_^. Please try it:
$ echo '<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>'|sed -r 's#(<font size="4" face="verdana, arial,geneva"><b>)([^<]*)(</b></font>)#\1<title>\2</title>\3#'
<font size="4" face="verdana, arial,geneva"><b><title>THIS IS MY TITLE</title></b></font>
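-r is the GNU sed spelling for extended regular expressions; BSD/macOS sed uses -E, and current GNU sed accepts -E as well. A sketch of the same three-group substitution as a filter, using -E on the assumption that it is the more portable spelling (wrap_title is a hypothetical name):

```shell
# Read HTML on stdin and wrap the bold title text in <title> tags.
# Groups 1 and 3 keep the surrounding markup; group 2 is the title text.
wrap_title() {
  sed -E 's#(<font size="4" face="verdana, arial,geneva"><b>)([^<]*)(</b></font>)#\1<title>\2</title>\3#'
}

echo '<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>' | wrap_title
```

Using # as the delimiter means the slashes in </b> and </font> need no escaping.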

Related

remove everything between two characters with sed

I'd like to remove everything between the two delimiters, including the delimiters themselves:
<img src=\"/wp-content/uploads/9e580e68ed249dec8fc0e668da78d170.jpg\" / hspace=\"5\" vspace=\"0\" align=\"left\">
I was trying:
sed -i -e 's/<img src.*align=\\"left\\">//g' file
You do not say what version of sed you are using, or what shell.
With GNU sed and bash, your attempt was almost there. Try:
sed -i 's/<img src[^>]*align=\\"left\\">//g' file
Explanation:
s/<img src[^>]*align=\\"left\\">/ search for <img src_STUFF_align=\"left\">, where _STUFF_ cannot contain any >
// and replace it with nothing
/g and repeat for every match on the line, not just the first
-i and modify the file in place
I believe this should work with most versions of sed (except for the -i option, which is not part of POSIX sed).
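The difference between .* and [^>]* only shows up when a line holds more than one such tag: .* is greedy and runs to the last align=\"left\"> on the line. A sketch with a made-up input line containing two tags (the function names are hypothetical):

```shell
# Greedy: ".*" can cross ">" and runs to the LAST align=\"left\"> on the line.
strip_img_greedy() {
  sed 's/<img src.*align=\\"left\\">//g'
}
# Bracket expression: "[^>]*" cannot cross a ">", so each tag is removed on its own.
strip_img() {
  sed 's/<img src[^>]*align=\\"left\\">//g'
}

line='before <img src=\"/a.jpg\" / hspace=\"5\" vspace=\"0\" align=\"left\"> middle <img src=\"/b.jpg\" align=\"left\"> after'
printf '%s\n' "$line" | strip_img_greedy   # " middle " is deleted along with the tags
printf '%s\n' "$line" | strip_img          # only the two tags are deleted
```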

Can I prepend a line without creating a new line?

If I have a text file containing:
This is a line
Using sed, how can I turn it into this:
<p>This is a line</p>
I have tried the following script:
i\<p> a\</p>
but this gives me
<p>
This is a line
</p>
How can I achieve this?
Use s///, not append or insert:
$ echo 'This is a line' | sed 's~.*~<p>&</p>~'
<p>This is a line</p>
& in the replacement part refers to the whole match.
Or you could do it like this:
$ echo 'This is a line' | sed 's~^~<p>~;s~$~</p>~'
<p>This is a line</p>
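Both sed forms work line by line, so they also handle a multi-line file. A sketch that wraps each form in a hypothetical function and runs it on two lines of input:

```shell
# Both filters act on every input line independently.
wrap_p_amp()     { sed 's~.*~<p>&</p>~'; }        # & stands for the whole match
wrap_p_anchors() { sed 's~^~<p>~;s~$~</p>~'; }    # insert at line start, then at line end

printf 'This is a line\nAnd another\n' | wrap_p_amp
printf 'This is a line\nAnd another\n' | wrap_p_anchors
```

Using ~ as the delimiter avoids escaping the / in </p>.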
You can also use awk:
echo 'This is a line' | awk '$0="<p>"$0"</p>"'
<p>This is a line</p>
Or, more robustly (the trailing 1 prints every line regardless of what the assignment evaluates to):
echo 'This is a line' | awk '{$0="<p>"$0"</p>"}1'
<p>This is a line</p>

Retrieve information Text/Word from HTML code using awk/sed

awk/sed newbie here. I have an HTML file from which I would like to retrieve a word of text.
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version: 4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc01_process.txt>abc01</a> NDK Version: 4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc045_process.txt>abc045</a> NDK Version: 4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/cdf_process.txt>cdf</a> NDK Version: 4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/Manhattan_process.txt>Manhattan</a> NDK Version: 4.0 </li>
For example, from the first line I would like to retrieve abc, which sits between .txt> and </a>.
I have used the following command, but as you can see, the number of letters in the word keeps changing: abc, abc01, abc045, cdf, Manhattan.
awk -F\/ '{print substr($4,0,3)}' list.html
So this command only works for the three-letter word. I want to extract the same information (abc01, abc045, cdf, Manhattan) from all the lines of the HTML. Please help.
Using awk:
awk -F'[<>]' '{print $7}' urls
abc
abc01
abc045
cdf
Manhattan
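Why $7? Splitting on either < or > turns every tag boundary into a field separator, so empty fields appear between adjacent tags. A sketch that prints each field of the first sample line with its number; it relies on every line keeping exactly this tag layout:

```shell
# Split one sample line on "<" or ">" and print every field with its index.
sample='<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version: 4.0 </li>'
printf '%s\n' "$sample" | awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
```

Field 1 is empty (the line starts with <), fields 2, 4 and 6 hold the three opening tags, and field 7 is the link text.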
You could try:
perl -nE 'say $1 if /<a href.*?>(.*?)<\/a>/' file
Output:
abc
abc01
abc045
cdf
Manhattan
$ sed -n 's/.*txt>\([[:alnum:]]\+\)<.*/\1/p' list.html
abc
abc01
abc045
cdf
Manhattan
Or:
$ awk -F'(txt>|</a)' '{print $2}' list.html
abc
abc01
abc045
cdf
Manhattan
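In the sed command above, \+ is a GNU extension to basic regular expressions, and BSD/macOS sed may not support it. A sketch of the same extraction using the POSIX interval \{1,\} instead, wrapped in a hypothetical function that reads the files it is given, or stdin if none:

```shell
# Print the link text that follows "txt>" on each matching line.
# \{1,\} is the POSIX way of writing "one or more" in a basic regex.
extract_names() {
  sed -n 's/.*txt>\([[:alnum:]]\{1,\}\)<.*/\1/p' "$@"
}

# Usage: extract_names list.html
```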
I use sed or awk to extract it. Here, I saved the original data in the file /tmp/html.txt.
Both of them use a regular expression with a back-reference.
Via sed
$ sed -r -n 's#.*<a [^>]*>(.*)</a>.*#\1#p' /tmp/html.txt
abc
abc01
abc045
cdf
Manhattan
Via awk
using gensub(), which is specific to GNU awk (gawk):
$ awk '{print gensub(/.*<a [^>]*>(.*)<\/a>.*/,"\\1",1,$0)}' /tmp/html.txt
abc
abc01
abc045
cdf
Manhattan
Using GNU grep (-P enables Perl-compatible regexes, and \K drops everything matched so far from the output):
grep -Po "<a href.*?>\K[^<]*" file

How to add new line using sed on MacOS?

I want to add a newline between </a> and <a><a>, so that
</a><a><a>
becomes
</a>
<a><a>
I tried this:
sed 's#</a><a><a>#</a>\n<a><a>#g' filename
but it didn't work.
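Two ways that work on a Mac:
echo foo | sed 's/f/f\'$'\n/'
echo foo | gsed 's/f/f\n/g'
The first uses bash's $'\n' quoting to put a real newline after the backslash; the second uses gsed, i.e. GNU sed (for example installed via Homebrew), which understands \n.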
Some seds, notably Mac / BSD, don't interpret \n as a newline, you need to use an actual newline, preceded by a backslash:
$ echo foo | sed 's/f/f\n/'
fnoo
$ echo foo | sed 's/f/f\
> /'
f
oo
Or you can use:
echo foo | sed $'s/f/f\\\n/'
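A sketch of the backslash-newline form applied to the question's own input. It is wrapped in a hypothetical function so the embedded newline lives in one place; GNU sed accepts this form too, which makes it the portable choice:

```shell
# Break "</a><a><a>" after the "</a>". In the replacement, a backslash
# followed by a real newline inserts a newline (BSD/macOS and GNU sed).
split_anchors() {
  sed 's#</a><a><a>#</a>\
<a><a>#g'
}

printf '%s\n' '</a><a><a>' | split_anchors
```

The second line of the script must start in column 0; any leading spaces there would become part of the replacement.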
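Or you can just pound on it! This worked for me for an insert on macOS (note that -i placed after the script with no backup suffix is GNU sed syntax; the stock BSD sed on macOS expects -i '' before the script):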
sed "2 i \\\n${TEXT}\n\n" -i ${FILE_PATH_NAME}
sed "2 i \\\nSomeText\n\n" -i textfile.txt

Find and replace string with sed

I need to do a multi-file find and replace with nothing (i.e. delete) using sed. I want to replace the line:
<meta name="keywords" content="there could be anything here">
With '' (nothing) in all files in and under the current dir.
I have got this so far:
sed -e 's/<meta name="keywords" content=".*>//g' myfile.html
But I know this is only going to remove the < or > tags. How can I match from
<meta name="keywords" content="
and delete everything from there up to the next
>
I also need to do it for all files in and under (recursively) the current directory.
Thanks in advance!
sed has a delete command, d; try:
sed -e '/<meta name="keywords"/d' myfile.html
Note that d removes the entire line containing the match, not just the tag.
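If the meta tag shares its line with other markup, deleting the whole line goes too far. A sketch that removes only the tag, in every file in and under a directory, assuming GNU sed's -i syntax, files named *.html, and an attribute value that contains no > (strip_keywords_in is a hypothetical name):

```shell
# Remove only <meta name="keywords" ...>; [^>]* stops at the first ">",
# so the rest of the line is kept. find recurses into subdirectories.
strip_keywords_in() {
  find "$1" -type f -name '*.html' \
    -exec sed -i 's/<meta name="keywords"[^>]*>//g' {} +
}

# Usage: strip_keywords_in .
```

Drop or widen -name '*.html' if other file types need the same edit.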