sed command to find .css file inside link - sed

I am using sed to read the .css file name (after "href=") from an HTML file. The command is as follows:
cssFiles=$(echo "$BODY" | sed -rn 's/<link\s.*href=\W(.*.css).*/\1/p')
But it does not work correctly. Sample input, output, and expected output are given below. Where am I wrong?
Sample input:
<link href="/css/default.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="js/flexslider/flexslider.css">
Sample output:
/css/default.css" rel="stylesheet" type="text/css
js/flexslider/flexslider.css
Expected output:
/css/default.css
js/flexslider/flexslider.css

Try this:
cssFiles=$(echo "$BODY" | sed -rn 's/<link\s.*href=\W(.*.css).*/\1/p' | awk -F'"' '{print $1}')
The greedy .* in the sed expression leaves the trailing attributes in the line; splitting on the double quote and keeping the first field trims them off.
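Alternatively, you can fix it in sed alone: the greedy (.*.css) group is what runs past the closing quote, and a negated character class stops at it. A minimal sketch, assuming the href values are double-quoted as in the samples:
cssFiles=$(echo "$BODY" | sed -rn 's/<link\s.*href="([^"]*\.css)".*/\1/p')
For the sample input this prints /css/default.css and js/flexslider/flexslider.css directly.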

Related

Why is the last slash of a node missing in xidel output?

$ xidel --output-format html -s -e 'x:replace-nodes(//script,())' - <<< '<html><head><script>x=1</script><link href="http://example.com" type='text/css' rel='stylesheet' /></head><body>abc</body></html>'
<!DOCTYPE html>
<html><head><link href="http://example.com" type="text/css" rel="stylesheet"></head><body>abc
</body></html>
In the above example, <link href="http://example.com" type="text/css" rel="stylesheet" /> is contracted to <link href="http://example.com" type="text/css" rel="stylesheet">.
Shouldn't the last slash be maintained in the output?
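For what it's worth, this looks like the HTML serializer at work: link is a void element in HTML5, so the trailing solidus is dropped on output and the document's meaning does not change. If you need the self-closing form, asking for XML serialization instead should keep it; a sketch, assuming your xidel build accepts xml as an --output-format value:
xidel --output-format xml -s -e 'x:replace-nodes(//script,())' - <<< '<html><head><script>x=1</script><link href="http://example.com" type="text/css" rel="stylesheet" /></head><body>abc</body></html>'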

Extract data using grep/sed from html tag with special class/id

I need to grep info from a website, where it is stored like:
<div class="name">Mark</div>
<div class="surname">John</div>
<div class="phone">8434</div>
and so on.
I tried to grep it and parse it later with sed:
grep -o '<div class="name">.*</div>' | sed -e 's?<div class="name">?|?g'
but when I try to replace with sed -e 's?<\/div><div class="phone">?|?g' I get no result,
and I would have to do the same thing for every class. I cannot delete all HTML tags (sed 's/<[^>]\+>//g'); I need to do it only for divs with these classes.
The output format should be like
|Mark|John|8434|
I need to do it with grep/sed
Using awk should do the job:
awk -F"[<>]" '{printf "%s|",$3}' file
Mark|John|8434|
If you need a new line at the end:
awk -F"[<>]" '{printf "%s|",$3} END {print ""}' file
It splits each line into fields separated by < or >, then prints the third field with | as the separator.
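Since the question asks for grep/sed, here is a minimal sketch along those lines; page.html is a hypothetical file name, and it assumes GNU grep (for -o and \| alternation) and that each div sits on its own line:
grep -o '<div class="\(name\|surname\|phone\)">[^<]*</div>' page.html | sed 's/<[^>]*>/|/; s/<[^>]*>//' | tr -d '\n'; echo '|'
The first sed substitution turns the opening tag into |, the second drops the closing tag, and tr/echo join the lines into |Mark|John|8434|.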

Can I prepend a line without creating a new line?

If I have a text file containing:
This is a line
Using sed, how can I turn it into this:
<p>This is a line</p>
I have tried the following script:
i\<p> a\</p>
but this gives me
<p>
This is a line
</p>
How can I achieve this?
Use s/// not append or insert.
$ echo 'This is a line' | sed 's~.*~<p>&</p>~'
<p>This is a line</p>
& in the replacement part refers to the whole match.
OR
You could also do it like this:
$ echo 'This is a line' | sed 's~^~<p>~;s~$~</p>~'
<p>This is a line</p>
You can also use awk:
echo 'This is a line' | awk '$0="<p>"$0"</p>"'
<p>This is a line</p>
Or, more robustly ({...}1 prints every line unconditionally instead of relying on the assigned string being truthy):
echo 'This is a line' | awk '{$0="<p>"$0"</p>"}1'
<p>This is a line</p>

Using curlmirror.pl gives different outputs

Using http://curl.haxx.se/programs/curlmirror.txt [Edit: current version at https://github.com/cudeso/tools/blob/master/curlmirror.txt], I'm looking to download a website and check for changes between the newly downloaded copy and one I downloaded previously. However, when I download the same website, the links sometimes use relative paths and sometimes absolute paths, and that counts as a "change" even though the website did not change.
Usage: curlmirror.pl -l -d 3 -o someOutputFileDirectory/url http://url
Output 1: <td>LINK</td>
Output 2: <td>LINK</td>
Is there a way to convert all relative paths to absolute paths or the other way around? I just need to standardize the download so that these links do not appear as "changes"
UPDATED
I assume that the URL is placed in the $url variable. Then you can try something like the example below:
perl -pe 'BEGIN {$url="http://mymain.org"}
s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' << XXX
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
XXX
Output:
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="http://mymain.org/home">
It replaces every href="..." or url="..." (case-insensitive) pattern with href="$url/..." or url="$url/..." if the ... part does not contain a / character.
If the input is a file, you can replace these patterns in the file directly:
cat >tfile << XXX
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
XXX
cat tfile
perl -i -pe 'BEGIN {$url="http://mymain.org"}
s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' tfile
echo "---"
cat tfile
Output:
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
---
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="http://mymain.org/home">
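If you would rather not hard-code the base URL inside the one-liner, a small variation (a sketch, not part of the original answer) is to pass it as the first argument and shift it off @ARGV in the BEGIN block before the in-place edit starts:
perl -i -pe 'BEGIN {$url=shift(@ARGV)} s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' http://mymain.org tfile
Because BEGIN runs before the main loop, only tfile is left in @ARGV to be edited.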

Find and replace string with sed

I need to do a multi-file find and replace with nothing (i.e. delete the match) using sed. So I want to replace the line:
<meta name="keywords" content="there could be anything here">
with '' (nothing) in all files in and under the current directory.
I have got this so far:
sed -e 's/<meta name="keywords" content=".*>//g' myfile.html
But I know this is only going to remove the < or > tags. How can I match against
<meta name="keywords" content="
and delete everything from that to the next
>
I also need to do it for all files in and under (recursively) the current directory.
Thanks in advance!
sed has a delete command; try using:
sed -e '/<meta name="keywords"/d' myfile.html
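To cover the recursive part, one common sketch (assuming GNU sed for -i, and assuming the files you want to edit are the .html files under the current directory) is to drive it with find:
find . -type f -name '*.html' -exec sed -i '/<meta name="keywords"/d' {} +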