Why is the last slash of a node missing in xidel output? - xidel

$ xidel --output-format html -s -e 'x:replace-nodes(//script,())' - <<< '<html><head><script>x=1</script><link href="http://example.com" type="text/css" rel="stylesheet" /></head><body>abc</body></html>'
<!DOCTYPE html>
<html><head><link href="http://example.com" type="text/css" rel="stylesheet"></head><body>abc
</body></html>
In the above example, <link href="http://example.com" type="text/css" rel="stylesheet" /> is contracted to <link href="http://example.com" type="text/css" rel="stylesheet">.
Shouldn't the last slash be maintained in the output?

Related

sed command to find .css file inside link

I am using sed to read the .css file name (after "href=") from an HTML file. The command is as follows:
cssFiles=$(echo "$BODY" | sed -rn 's/<link\s.*href=\W(.*.css).*/\1/p')
But it does not work correctly. Sample input, actual output and expected output are given below. Where am I going wrong?
Sample input:
<link href="/css/default.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="js/flexslider/flexslider.css">
Sample output:
/css/default.css" rel="stylesheet" type="text/css
js/flexslider/flexslider.css
Expected output:
/css/default.css
js/flexslider/flexslider.css
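In your pattern the capture (.*.css) is greedy and its dot is unescaped, so on the first line it runs on to the "/css" in type="text/css". Try stopping the capture at the closing quote of the href value instead:
cssFiles=$(echo "$BODY" | sed -rn 's/<link\s.*href="([^"]*\.css)".*/\1/p')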
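A possible alternative, sketched here under the assumption that every href value is double-quoted: let grep -o isolate each href attribute first, then strip the wrapper with sed. The function name extract_css is made up for illustration. Unlike the one-match-per-line sed approach, this should also cope with several link tags on the same line.

```shell
# Hypothetical helper: print the value of every href="....css" attribute on stdin.
extract_css() {
  # grep -o prints each match on its own line; [^"]* cannot run past the
  # closing quote, so the match ends at the end of the href value.
  grep -oE 'href="[^"]*\.css"' | sed -E 's/^href="//; s/"$//'
}

BODY='<link href="/css/default.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="js/flexslider/flexslider.css">'

cssFiles=$(printf '%s\n' "$BODY" | extract_css)
printf '%s\n' "$cssFiles"
```

For the sample input this prints the two expected paths, one per line. Single-quoted or unquoted href values are not handled by this sketch.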

Using curlmirror.pl gives different outputs

Using http://curl.haxx.se/programs/curlmirror.txt [Edit: Current version at https://github.com/cudeso/tools/blob/master/curlmirror.txt ], I'm looking to download a website and check for changes between the newly downloaded website and one that I have downloaded previously. However when I download the same website sometimes the links on the website use relative paths, sometimes they use absolute paths, and that counts as a "change" even though the website did not change.
Usage: curlmirror.pl -l -d 3 -o someOutputFileDirectory/url http://url
Output 1: <td>LINK</td>
Output 2: <td>LINK</td>
Is there a way to convert all relative paths to absolute paths or the other way around? I just need to standardize the download so that these links do not appear as "changes"
UPDATED
I assume that the URL is stored in the $url variable. Then you can try something like the following:
perl -pe 'BEGIN {$url="http://mymain.org"}
s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' << XXX
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
XXX
Output:
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="http://mymain.org/home">
It replaces every href="..." or url="..." pattern (case-insensitive) with href="$url/..." or url="$url/...", provided that ... does not contain a / character.
If the input is a file, you can replace these patterns in the file directly:
cat >tfile << XXX
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
XXX
cat tfile
perl -i -pe 'BEGIN {$url="http://mymain.org"}
s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' tfile
echo "---"
cat tfile
Output:
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="home">
---
<td>LINK</td>
<td>LINK</td>
<meta http-equiv="Refresh" content="0;URL="http://mymain.org/home">
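The same idea can be sketched with sed for setups without Perl. This is only an approximation of the substitution above: the function name absolutize and the sample anchor tags are made up for illustration, case-insensitivity is spelled out with bracket expressions so that it does not rely on GNU-only flags, and the path class also excludes the double quote so a match cannot run past the end of the attribute value. It assumes the base URL contains no # or & characters.

```shell
# Hypothetical helper: prefix relative href="..."/url="..." values with a base URL.
# $1 = base URL without a trailing slash (assumed to contain no '#' or '&').
absolutize() {
  # Group 1: attribute name plus opening quote; group 3: a value with no '/'.
  # Values that already contain a slash (absolute paths, full URLs) are skipped.
  sed -E "s#(([Hh][Rr][Ee][Ff]|[Uu][Rr][Ll])=\")([^/\"]+)\"#\\1${1}/\\3\"#g"
}

printf '%s\n' \
  '<a href="page.html">LINK</a>' \
  '<a href="/abs/page.html">LINK</a>' |
  absolutize "http://mymain.org"
```

The first sample line should come out with href="http://mymain.org/page.html", while the second is left untouched because its value already contains a slash.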

Find and replace string with sed

I need to do a multi-file find and replace with nothing (i.e. delete) using sed. I need to replace the line:
<meta name="keywords" content="there could be anything here">
With '' (nothing) in all files in and under the current dir.
I have got this so far:
sed -e 's/<meta name="keywords" content=".*>//g' myfile.html
But I know this is only going to remove the < or > tags. How can I match against
<meta name="keywords" content="
and delete everything from that to the next
>
I also need to do it for all files in and under (recursively) the current directory.
Thanks in advance!
sed has a delete command (d); try using:
sed -e '/<meta name="keywords"/d' myfile.html
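The command above only prints the result for a single file. For the recursive, in-place part of the question, one possible sketch combines find with sed -i. The function name strip_keywords and the *.html filter are assumptions for illustration; the .bak suffix is attached directly to -i because that form is accepted by both GNU and BSD sed.

```shell
# Hypothetical helper: delete the keywords <meta> line from every .html file
# under a directory (default: the current one), keeping a .bak copy of each.
strip_keywords() {
  find "${1:-.}" -type f -name '*.html' \
    -exec sed -i.bak '/<meta name="keywords"/d' {} +
}
```

Calling strip_keywords with no argument works on the current directory. Note that d removes the whole line, so this only fits if the meta tag sits on a line of its own, as in the question.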

sed find replace (inplace replace) using regex

I need to find and replace certain text in many files. I am trying to use sed to do the replacement. Here is what I am trying to do:
Find:
<font size="4" face="verdana, arial,geneva"><b>([^<]*)</b></font>
replace with:
<font size="4" face="verdana, arial,geneva"><b><title>$1</title></b></font>
Essentially I want to add a <title></title> tag around whatever I find.
e.g. if the text is like:
<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>
I want to replace it with:
<font size="4" face="verdana, arial,geneva"><b><title>THIS IS MY TITLE</title></b></font>
I have tried various commands, but none of them seems to work. Here are the commands that I have tried so far:
sed -e 's/<font size="4" face="verdana, arial,geneva"><b>\([^<]*\)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'
sed -r 's/<font size="4" face="verdana, arial,geneva"><b>([^<]*)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'
sed -E 's/<font size="4" face="verdana, arial,geneva"><b>([^<]*)<\/b><\/font>/<font size="4" face="verdana, arial,geneva"><b><title>\1<\/title><\/b><\/font>/g'
For me this works
sed '/font *size *= *"4" *face/s|<b>\([^<]*\)</b>|<b><title>\1</title></b>|g'
My idea is to avoid as many escapes as possible and to split matching and substitution into two steps.
This sed line was basically built by copy and paste ^_^. Please try it:
kent$ (master|✔) echo '<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>'|sed -r 's#(<font size="4" face="verdana, arial,geneva"><b>)([^<]*)(</b></font>)#\1<title>\2</title>\3#'
<font size="4" face="verdana, arial,geneva"><b><title>THIS IS MY TITLE</title></b></font>
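Since the question asks for the replacement across many files, here is a small sketch that wraps the substitution in a function. The name wrap_title is made up for illustration, and the exact font tag from the question is assumed to appear verbatim on one line. Using # as the delimiter avoids escaping the slashes in the closing tags.

```shell
# Hypothetical helper: wrap the text inside <b>...</b> of the specific
# <font> heading in <title>...</title>. Reads stdin, or the files given.
wrap_title() {
  sed -E 's#(<font size="4" face="verdana, arial,geneva"><b>)([^<]*)(</b></font>)#\1<title>\2</title>\3#g' "$@"
}

echo '<font size="4" face="verdana, arial,geneva"><b>THIS IS MY TITLE</b></font>' | wrap_title
```

To change a file in place without depending on a particular sed -i dialect, the output can be written to a temporary file and moved back, e.g. wrap_title page.html > page.html.tmp && mv page.html.tmp page.html (page.html being a placeholder name).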

Add CSS File To Layout Zend Framework

I'm trying to add a CSS file in my Zend application's layout.php. The problem is that it resides in
/application/media/css/style.css
When I do
<?php echo $this->headLink()->appendStylesheet('/media/css/style.css') ?>
it generates a path like
appname/public/media/css/style.css
whereas I need it to generate a path like
appname/application/media/css/style.css
How can I tell Zend to look for the CSS file at a specific location in layout.php?
I have solved the problem. It is always simple. :D
I added
<link rel="stylesheet" href="css/site.css" type="text/css" media="screen, projection">
to the head tag, and the css folder is in the public folder.
thanks
ohm
Nothing in the application directory is accessible via your web server. You would either have to move the file to the public directory or set up a symlink to it.
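The symlink option could look roughly like the sketch below. The function name link_media is made up, and the layout (application/ and public/ side by side under the project root) is an assumption based on the paths in the question.

```shell
# Hypothetical helper: expose application/media under public/media via a symlink.
# $1 = project root, assumed to contain both application/ and public/.
link_media() {
  # The target is relative to the link's own directory (public/), so the
  # link keeps working if the whole project directory is moved.
  ln -s ../application/media "$1/public/media"
}
```

After running link_media /path/to/appname, the stylesheet should be reachable as /media/css/style.css. Apache only serves files through such a link if symlink following is enabled for that directory (Options FollowSymLinks or SymLinksIfOwnerMatch).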
Just to complete the other answers: it is not "correct" to put your CSS, JS, fonts and other media in the 'application' folder. The 'public' folder is there for that.
Then, to remove '/public' from all your URLs, just make a simple change in your .htaccess:
RewriteEngine On
# you could put some subfolder here, if you need
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ index.php [NC,L]
And in your index.php
<?php
define('RUNNING_FROM_ROOT', true);
include 'public/index.php';
And then you could use this to append your styles
<?php echo $this->headLink()->appendStylesheet('/public/media/css/style.css') ?>
Hope it helps.
You should use this code:
<link rel="stylesheet" type="text/css" media="screen" href="<?=$this->baseUrl();?>/yourpathtocss" />
I think the correct way is to do this from layout.phtml as follows
<?php echo $this->headLink()->appendStylesheet('/css/site.css') ?>
Here 'TestZend' is the project name:
<?php echo $this->headLink()->appendStylesheet('/TestZend/public/css/bootstrap.css'); ?>
Just add this code snippet to the head section of the layout, and make sure the media folder is created inside the public folder, so that the file ends up at yourProjectFolder/public/media/css/style.css
<head>
<!-- some code here... -->
<?= $this->headLink()
// added from the new template
->prependStylesheet($this->basePath('media/css/style.css'))
?>
<!-- some other code here -->
</head>