My problem is this: I'm trying to read an HTML file (index.html), find all the links in it, and put them into a second file named salida.html. I read this answer and this other answer and tried to follow them, but it didn't work for me.
This is my Perl code:
use strict;
use warnings;
use 5.010;
use Tie::File;
my $entrada='index.html';
my $salida='salida.html';
open(A,"<$entrada");
my @links;
foreach my $linea (<A>){
print "Renglon => $linea\n" if $linea =~ m/a href/;
#print $B $linea if $linea =~ m/a href/;
push @links, $linea if $linea =~ m/a href/;
}
tie my @resultado, 'Tie::File', 'salida.html' or die "Nelson";
for (@resultado) {
if ($_ =~ m/<main class="contenido">/){
foreach my $found (@links){
$_ .= '<br/>'.$found;
}
last;
}
}
close(A);
My Perl code runs without errors, but in the for loop I'm trying to write the links stored in my @links array into a specific part of my salida.html file:
<!DOCTYPE html>
<html lang="es-mx">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Resultados de la busqueda</title>
<link rel="stylesheet" href="style-salida.css">
</head>
<body>
<div class="contenedor">
<header class="header">
<h2>Resultados de la busqueda</h2>
</header>
*<main class="contenido">
</main>*
<footer class="footer">
<h4>
Gerardo Saucedo Arevalo - 15092087 - Topicos selectos de tecnologias web - Búsqueda de enlaces dentro de
una página web
</h4>
</footer>
</div>
</body>
</html>
But my code always adds the lines at the end of the file. I ran this code once and it worked perfectly, but then I added some lines, and when I ran it again it no longer worked.
Even after restoring my file to the state in which it worked, it still does not work.
What am I doing wrong?
Always process HTML or XML with an appropriate parser and then implement your processing on the DOM. My solution uses HTML::TreeBuilder. As your question doesn't include the contents of index.html I have appended my own to the solution:
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;
# Extract links from <DATA>
my $root1 = HTML::TreeBuilder->new->parse_file(\*DATA)
or die "HTML: $!\n";
my @links = $root1->look_down(_tag => 'a');
# Process salida.html from STDIN
my $root2 = HTML::TreeBuilder->new;
$root2->ignore_unknown(0);
$root2->parse_file(\*STDIN)
or die "HTML: $!\n";
# insert links in correct section
if (my @nodes = $root2->look_down(class => 'contenido')) {
$nodes[0]->push_content(@links);
}
print $root2->as_HTML(undef, ' '), "\n";
# IMPORTANT: must delete manually
$root2->delete;
$root1->delete;
exit 0;
__DATA__
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<div>
Link 1
Link 2
</div>
</body>
</html>
Test run:
$ perl dummy.pl <dummy.html
<!DOCTYPE html>
<html lang="es-mx">
...
<main class="contenido"> Link 1Link 2</main>
...
</html>
My web page uses the UTF-8 charset to allow Chinese character input in a textarea form field. I want to test whether the input contains a certain character. I've written a test script to see how Perl handles the Chinese input, but it doesn't find a match even when there is a known match.
Here is my test form:
<!DOCTYPE html>
<head>
<meta charset="utf-8">
</head>
<body>
<form method="post" action="http://www.my_domain.com/cgi-bin/my_test_script.pl">
<textarea name="user_input" rows="" cols=""></textarea>
<input type="submit" name="submit" value="submit">
</form>
</body>
</html>
Here is my code:
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use utf8;
print "Content-type: text/html; charset=UTF-8\n\n";
print "<meta http-equiv='content-type' content='text/html;charset=UTF-8'>";
my $query = new CGI;
my $msg = $query->param('user_input');
chomp $msg;
my $msg_code = ord($msg);
print "<p> Message was: ".$msg."\n";
print "<p> Message Code is: ".$msg_code."\n";
my $char_from_code_point = "\N{U+89C6}";
my $char_from_code_point_reverse_code = ord($char_from_code_point);
print "<p> char_from_code_point= ".$char_from_code_point."\n";
print "<p> char_from_code_point_reverse_code = ".$char_from_code_point_reverse_code."\n";
if ($msg =~ m/$char_from_code_point/) {
print "<p>Matched!\n";
}
else {
print "<p> NOT matched\n";
}
And here is the output from submitting the correct character:
Message was: 视
Message Code is: 232
char_from_code_point= 视
char_from_code_point_reverse_code = 35270
NOT matched
Could someone please point out what I'm doing wrong?
Thank you.
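One reading of the output above: the reported code 232 is 0xE8, which is the first byte of the UTF-8 encoding of U+89C6 (E8 A7 86). That suggests $msg holds undecoded bytes, while the pattern is a single decoded character. The sketch below reproduces this outside CGI, assuming the parameter arrives as raw UTF-8 bytes; the variable names are illustrative and not taken from the script above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the form value arrives as raw UTF-8 bytes (here, U+89C6 encoded)
my $raw    = "\xE8\xA7\x86";
my $target = "\N{U+89C6}";

# Undecoded: three separate bytes, so the one-character pattern cannot match
my $raw_hit = $raw =~ /$target/ ? 'yes' : 'no';
printf "raw:     ord=%d matched=%s\n", ord($raw), $raw_hit;

# Decoded: a single character whose code point is 0x89C6
my $decoded     = decode('UTF-8', $raw);
my $decoded_hit = $decoded =~ /$target/ ? 'yes' : 'no';
printf "decoded: ord=%d matched=%s\n", ord($decoded), $decoded_hit;
```

If this is indeed the cause, decoding the value returned by param() before matching (and encoding again before printing) would be the place to start.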
Suppose we have the following HTML file:
test.htm
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<b>weight:</b> 120kg<br>
<b>length:</b> 10cm<br>
</body>
</html>
How can I get the following data from it?
{
'weight' => '120kg',
'length' => '10cm',
}
parser.pl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use HTML::TreeBuilder;
my $root = HTML::TreeBuilder->new;
$root->parse_file('test.htm');
#what to do here?
$root->delete( );
This gets you very close to what you want (you'll need to tweak the key and value strings slightly).
But I think you'll find it far simpler to use a tool like Web::Scraper.
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
use HTML::TreeBuilder;
my $root = HTML::TreeBuilder->new;
$root->parse_file(\*DATA);
my $data;
foreach my $elem ($root->find('b')) {
$data->{($elem->content_list)[0]} = $elem->right;
}
say Dumper $data;
__END__
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<b>weight:</b> 120kg<br>
<b>length:</b> 10cm<br>
</body>
</html>
Output:
$VAR1 = {
'length:' => ' 10cm',
'weight:' => ' 120kg'
};
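The tweak mentioned above could look something like this: a minimal sketch that starts from the raw pairs shown in the output, strips the trailing colon from each key and trims whitespace from each value. The hash names %raw and %clean are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Raw key/value pairs as they appear in the output above
my %raw = ('length:' => ' 10cm', 'weight:' => ' 120kg');

my %clean;
for my $key (keys %raw) {
    my $value = $raw{$key};
    (my $name = $key) =~ s/:\s*$//;   # drop the trailing colon
    $value =~ s/^\s+|\s+$//g;         # trim surrounding whitespace
    $clean{$name} = $value;
}

$Data::Dumper::Sortkeys = 1;          # stable key order for display
print Dumper \%clean;
```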
Two solutions using Mojo::DOM:
use strict;
use warnings;
use Mojo::DOM;
use Data::Dump;
my $dom = Mojo::DOM->new(do {local $/; <DATA>});
my %hash = do {
my $text = $dom->at('body')->all_text();
split ' ', $text;
};
dd \%hash;
my %hash2 = map {
$_->all_text() => $_->next_node() =~ s{^\s+|\s+$}{}gr
} $dom->find('b')->each;
dd \%hash2;
__DATA__
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<b>weight:</b> 120kg<br>
<b>length:</b> 10cm<br>
</body>
</html>
Outputs:
{ "length:" => "10cm", "weight:" => "120kg" }
{ "length:" => "10cm", "weight:" => "120kg" }
Is there any way I can extract style tag data from an HTML page using Perl?
#!/usr/bin/perl
use strict;
my $HTML = <<"EOF";
<HTML>
<head>
<style type='text/css'>
#yui-dt0-bdrow0 td{background:#CFF;}
#yui-dt0-bdrow1 td{background:#CFF;}
#yui-dt0-bdrow2 td{background:#CFF;}
</style>
</head>
</HTML>
EOF
I need to extract yui-dt0-bdrow0 td{background:#CFF;} information from the above HTML code.
I googled for a lot of modules but didn't find the right one, and I haven't yet tried writing any code to extract the information.
Any help is appreciated.
Use Mojo::DOM
Sample:
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
my $HTML = <<"EOF";
<HTML>
<head>
<style type='text/css'>
#yui-dt0-bdrow0 td{background:#CFF;}
#yui-dt0-bdrow1 td{background:#CFF;}
#yui-dt0-bdrow2 td{background:#CFF;}
</style>
</head>
</HTML>
EOF
my $dom = Mojo::DOM->new( $HTML );
print $dom->at('style')->text;
Output
chankey#pathak:~/myscripts$ perl mojo.pl
#yui-dt0-bdrow0 td{background:#CFF;}
#yui-dt0-bdrow1 td{background:#CFF;}
#yui-dt0-bdrow2 td{background:#CFF;}
You can now filter out the desired data.
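As a rough illustration of that filtering step, the sketch below takes the style text as a plain string and maps each selector to its declaration block with a regular expression. It assumes simple, un-nested rules like the ones in the question; real-world CSS would call for a proper CSS parser. The %rules hash is an illustrative name.

```perl
use strict;
use warnings;

# The text extracted from the <style> element, as shown in the output above
my $css = <<'EOF';
#yui-dt0-bdrow0 td{background:#CFF;}
#yui-dt0-bdrow1 td{background:#CFF;}
#yui-dt0-bdrow2 td{background:#CFF;}
EOF

# Map "selector" => "declarations"; assumes flat rules with no nested braces
my %rules;
while ($css =~ /^\s*([^{}]+?)\s*\{([^}]*)\}/mg) {
    $rules{$1} = $2;
}

print "$rules{'#yui-dt0-bdrow0 td'}\n";
```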
For an 8-minute video tutorial on Mojo::DOM and Mojo::UserAgent, check out Mojocast Episode 5.
Here is my HTML code:
<html>
<title>Results</title>
<body><h1> Here are your results</h1>
<p>Please click the Button to see your result run by Ravi's team.</p>
<form action='index.pl' method='post'>
<input type='submit' value='submit'>
</form>
</body>
</html>
index.pl is my Perl script, and my subroutine is as follows:
sub my_result{
my $run;
my $dir="/kbio/sraja/BenzoExposedDataSet/database/Output";
my $parsebphtml = "/parse_bphtml.pl";
my $olgacsvfile = "/database/Output/sample.csv";
my @bp=<$dir/*.bp>;
$run ="perl $parsebphtml > $olgacsvfile";
# print "$com\n";
system($run)==0 or my_err("Could not run $run\n");
#printing the table
open(F,"$olgacsvfile") or my_err("Could not open the csv ($olgacsvfile) file");
print "<h2> Average Results </h2>";
print "<table border=1>";
while(my $line=<F>){
print "<tr>";
my @cells= split ',',$line;
foreach my $cell (@cells)
{
print "<td colspan=1>$cell</td>";
}
print "</tr>";
}
print "</table>";
}
As you can see, the table is what I need to return to results.html.
Any help would be much appreciated.
Thanks.
Geet
I don't know how much work you want to do, but if you want to keep it simple, give the HTML::Template module a try. Here is a simple usage example.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset=utf-8>
<title>A random page</title>
</head>
<body>
<TMPL_VAR NAME=page_content>
</body>
</html>
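My Perl code contained something like this. Better yet, check the documentation at http://metacpan.org/pod/HTML::Template.
use strict;
use warnings;
use HTML::Template;
sub my_result {
my $html_string = '<table>...</table>'; # placeholder: build the table markup here
return $html_string;
}
my $master_template = HTML::Template->new(filename => "Path to html template file");
$master_template->param('page_content' => my_result());
# Send the CGI header, then the filled-in template
print "Content-type: text/html\n\n", $master_template->output;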
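For my_result() to hand its table to the template, it has to return the markup as a string rather than print it. Here is a minimal sketch of that change, using a hypothetical helper csv_to_table() and made-up sample rows; it does no HTML escaping and assumes plain comma-separated cells with no quoting.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CSV lines into an HTML table string.
# Assumes simple comma-separated cells (no quoted fields, no escaping).
sub csv_to_table {
    my @lines = @_;
    my $html = qq{<table border="1">\n};
    for my $line (@lines) {
        chomp $line;
        my @cells = split /,/, $line;
        $html .= '<tr>' . join('', map { "<td>$_</td>" } @cells) . "</tr>\n";
    }
    return $html . "</table>\n";
}

# Made-up sample rows, for illustration only
print csv_to_table("sample,average\n", "s1,0.5\n");
```

In the subroutine from the question, the lines read from the CSV file handle would take the place of the sample rows.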
Depending on how far you plan on going with this, I would recommend that you use a more advanced templating system, such as the one used by the Mojolicious framework (http://mojolicio.us/perldoc/Mojo/Template).
Cheers,
MrMcKizzle
I am fairly new to Perl.
I have a form that submits to script.pl, which does the validation checks and so on.
How can I make it so that, once it has finished showing the validation results, it automatically goes back to the home page after a few seconds?
I tried using the following and it didn't work:
use strict;
use warnings;
my $url = "http://google.com";
print "Location: $url\n\n";
An example of the HTML for this would be: <META HTTP-EQUIV="REFRESH" CONTENT="10;URL=index.htm">
Here is what I have:
#!/usr/bin/perl
use strict;
use warnings;
my $url = "google.com";;
print "Location: $url\n\n";
print "Content-type: text/html\n\n";
%form=&parse_form();
etc....etc...
You could use the following alternative:
use strict;
use warnings;
my $url = "http://google.com";
print "Content-type: text/html\n\n";
print qq[
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Redirecting...</title>
<meta HTTP-EQUIV="REFRESH" CONTENT="10;URL=$url">
</head>
<body>
</body>
</html>
];
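The same delay can usually also be requested with a Refresh response header instead of the meta tag. This header is widely supported by browsers but is not part of the HTTP standard, so treat the following as a sketch. The helper name delayed_redirect() and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI response that shows a message and asks
# the browser to load $url after $seconds seconds.
sub delayed_redirect {
    my ($url, $seconds, $body) = @_;
    return "Refresh: $seconds; url=$url\n"
         . "Content-type: text/html\n\n"
         . "<html><body>$body</body></html>\n";
}

print delayed_redirect('index.htm', 5, '<p>Validation finished.</p>');
```

Unlike a Location header, which redirects immediately and so hides the validation output, this lets the page be displayed first.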
The following is valid and should work fine:
use strict;
use warnings;
my $url = "http://google.com";
print "Location: $url\n\n";