Submit multiple FASTA sequences using WWW::Mechanize

I want to submit multiple protein sequences in FASTA format to this server using the following Perl script:
use WWW::Mechanize;
# get the webpage with the form
my $mech = WWW::Mechanize->new();
$mech->show_progress(1);
my $url = "http://harrier.nagahama-i-bio.ac.jp/sosui/sosu/sosuigsubmit.html";
$mech->get($url);
# just checks to see if scripts call properly
sub validateInput{
my $file = shift;
my $inFh = IO::File->new( $file ) || die "can't open input file\n";
close($inFh);
return 1;
}
validateInput($ARGV[0]);
# fill the fields with the appropriate data and submit
my $fields = {
'in' => $ARGV[0],
'value' => 'Exec'
};
$mech->submit_form(
fields => $fields,
);
# print results
print $mech->content();
But every time I get a result like this:
<HTML>
<bR><bR><bR>
<TABLE>
<TR><TD ALIGN=left WIDTH=300>TOTAL PROTEINS</TD><TH>0</TH></TR>
<TR><TD ALIGN=left WIDTH=300>TOTAL MEMBRANE PROTEINS</TD><TH>0</TH></TR>
<TR><TD ALIGN=left WIDTH=300>PERCENTAGE</TD><TH> 0.0 %</TH></TR>
</TABLE>
</HTML>
This is the result page you get when you submit the form without input, so I suspect there is a problem with how my sequences are submitted. My input file looks like this:
>ATCG00420
MQGTLSVWLAKRGLVHRSLGFDYQGIETLQIKPEDWHSIAVILYVYGYNYLRSQCAYDVAPGGLLASVYHLTRIEYGV NQAEEVCIKVFTHRSNPRIPSVFWVWKSTDFQERESYDMLGITYDSHPRLKRILMPESWIGWPLRKDYIAPNFYEIQDAY
>ATCG00360
MSAQSEGNYAEALQNYYEAMRLEIDPYDRSYILYNIGLIHTSNGEHTKALEYYFRALERNPFLPQAFNNMAVICHYRGEQAIQQGDSEMAEAWFAQAAEYWKQAITLTPGNYIEAQNWLTITRRFE
and I am calling my script like this
perl my_script input.seq >output
Thanks for helping me out.

For starters, this line:
'in' => $ARGV[0],
means that you're sending them a filename rather than the contents of the file. You'll need to read the contents first and send those. Libraries like File::Slurper are handy for this.
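A minimal sketch of that idea, using only core Perl so it runs without extra modules (File::Slurper's read_text would do the same job). The helper names slurp_file and build_fields are made up for illustration, and it assumes the form field named in accepts the sequence text itself, which I have not verified against the server.

```perl
use strict;
use warnings;

# Read a whole file into one string (core-Perl slurp idiom).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!\n";
    local $/;                 # undef record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Build the form fields from the file *contents*, not the file name.
# Assumption: the field called 'in' takes the FASTA text directly.
sub build_fields {
    my ($path) = @_;
    return { in => slurp_file($path) };
}

# Hypothetical usage with the $mech object from the question:
#   $mech->submit_form( fields => build_fields($ARGV[0]) );
```

If the results still come back empty, it is worth checking what kind of input the form expects for that field (text area or file upload), since the two are filled in differently.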

Bypass optional field in pattern match

I'm trying to pull out the names of the players and their totals, but in some cases there is an extra HTML tag following the player's number in the list. How can I bypass that extra field when it appears? I can't put parentheses around it because then the regex will try to match it, correct?
<tr><td>10<td>MANNY MACHADO - FA</td><td>37</td></tr>
<tr><td>107</td><td>ALEDMYS DIAZ - HOU</td><td>18</td></tr>
while($content =~ /<tr><td>\d+?\S+?<td>(.*?)\s-.*?<\/td><td>(\d+?)</g) {
my $player = $1;
my $total = $2;
print "\nPlayer => $player Total => $total\n";
}
I tried using the '\S+?' to bypass it, but in this case it doesn't print out anything where the number of the player is less than 10.

It is generally a bad idea to use regexes for HTML, XML, etc.
Instead you should use an appropriate parser to convert it to a DOM and then implement your algorithm in the DOM domain. Using your example:
1. parse the HTML from file or string
2. find the correct table in the document (left out in the example as I don't have the complete HTML)
3. loop over the rows in the table
4. extract the information you are looking for from the columns of a row
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;
my $parser = new HTML::TreeBuilder;
my $root = $parser->parse_file(\*DATA)
or die "HTML\n";
foreach my $row ($root->look_down(_tag => 'tr')) {
if (my @columns = $row->look_down(_tag => 'td')) {
my $player = $columns[1]->as_text();
my $total = $columns[2]->as_text();
print "Player => $player Total => $total\n";
}
}
exit 0;
__DATA__
<body>
<tr><td>10<td>MANNY MACHADO - FA</td><td>37</td></tr>
<tr><td>107</td><td>ALEDMYS DIAZ - HOU</td><td>18</td></tr>
</body>
Test run:
$ perl dummy.pl
Player => MANNY MACHADO - FA Total => 37
Player => ALEDMYS DIAZ - HOU Total => 18

With Mojo::DOM:
use strict;
use warnings;
use Mojo::DOM;
my $html = <<'EOD';
<tr><td>10<td>MANNY MACHADO - FA</td><td>37</td></tr>
<tr><td>107</td><td>ALEDMYS DIAZ - HOU</td><td>18</td></tr>
EOD
my $dom = Mojo::DOM->new($html);
foreach my $tr ($dom->find('tr')->each) {
my @cells = $tr->children('td')->each;
my $player = $cells[1]->all_text;
my $total = $cells[2]->all_text;
# or alternatively
# my $player = $tr->at('td:nth-of-type(2)')->all_text;
# my $total = $tr->at('td:nth-of-type(3)')->all_text;
print "\nPlayer => $player Total => $total\n";
}

You need to match an optional </td>, which you can do by adding (?:<\/td>)? to your regex. The ?: at the start makes it a non-capturing group, and the trailing ? makes it match 0 or 1 times. So your new regex is
/<tr><td>\d+(?:<\/td>)?<td>(.*?)\s-.*?<\/td><td>(\d+?)</g
Normally I'd add a bit about not using regex to parse HTML, but as this is not well formed HTML I'll let it pass. However if you can exercise some control over what is creating the HTML try and fix it so that the <td> and </td> tags are balanced.
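As a quick check, here is a small self-contained sketch that runs that regex over the two sample rows from the question. The sub name extract_players is made up for the example.

```perl
use strict;
use warnings;

# Extract [player, total] pairs; the </td> after the number is optional.
sub extract_players {
    my ($content) = @_;
    my @rows;
    while ($content =~ /<tr><td>\d+(?:<\/td>)?<td>(.*?)\s-.*?<\/td><td>(\d+?)</g) {
        push @rows, [ $1, $2 ];
    }
    return @rows;
}

# The two sample rows from the question, one per line.
my $content = join '',
    '<tr><td>10<td>MANNY MACHADO - FA</td><td>37</td></tr>', "\n",
    '<tr><td>107</td><td>ALEDMYS DIAZ - HOU</td><td>18</td></tr>', "\n";

for my $row (extract_players($content)) {
    print "Player => $row->[0] Total => $row->[1]\n";
}
```

Both rows match, with and without the closing tag after the number.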

I would also go for a proper HTML or XML module to extract the information, as others have already said, so I will not elaborate on that.
If I had to extract from the malformed HTML you showed, I'd stick with a multi-step approach:
1. cleanup
2. extract
3. more cleanup
For cleanup I'd first check what's common. In this case every line starts with <tr> so I'd settle for that to find my lines, skipping those not starting with <tr>, after some optional whitespace:
while (<>) {
next unless /^\s*<tr>/;
The next common thing I noticed is that every interesting field starts with <td>. So I'd replace it with something easier to handle, like a tab. Assuming there could be tabs already, I'd first replace them with spaces:
tr/\t/ /;
s/<td>/\t/g;
Now what I have is some tags sprinkled around the data I really need. And the data I really need is prepended with a tab. So let's delete the tags:
s/<.*?>//g;
Finally I can extract my data:
my($dummy, $number, $player, $total)= split /\t/;
But since the player has some stuff appended (after -) let's remove that as well
$player=~ s/\s-.*//;
print "\nPlayer => $player Total => $total\n";
}
Putting it together and using DATA:
while (<DATA>) {
next unless /^\s*<tr>/;
tr/\t/ /;
s/<td>/\t/g;
s/<.*?>//g;
my($dummy, $number, $player, $total)= split /\t/;
$player=~ s/\s-.*//;
print "\nPlayer => $player Total => $total\n";
}
__DATA__
<tr><td>10<td>MANNY MACHADO - FA</td><td>37</td></tr>
<tr><td>107</td><td>ALEDMYS DIAZ - HOU</td><td>18</td></tr>
Be aware that this approach fails if you come across data with more whitespace, for instance rows spread over several lines.
Example:
<tr>
<td>10
<td>MANNY MACHADO - FA</td>
<td>37</td>
</tr>
<tr><td>107</td>
<td>ALEDMYS DIAZ - HOU</td>
<td>18</td>
</tr>
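One possible way to cope with rows spread over several lines is to read the whole input and split it on <tr> instead of on line breaks. This is only a sketch under the assumption that every row starts with a literal <tr> and that the cells appear in the same order as above; the sub name parse_rows is made up.

```perl
use strict;
use warnings;

# Same cleanup/extract/cleanup steps, but per <tr> chunk instead of per line.
sub parse_rows {
    my ($html) = @_;
    my @out;
    for my $chunk (split /<tr>/, $html) {
        next unless $chunk =~ /<td>/;      # skip anything that is not a data row
        $chunk =~ tr/\t\r\n/ /;            # tabs and line breaks become spaces
        $chunk =~ s/<td>/\t/g;             # each cell now starts with a tab
        $chunk =~ s/<.*?>//g;              # drop the remaining tags
        my (undef, $number, $player, $total) = split /\t/, $chunk;
        $player =~ s/\s-.*//;              # remove the " - TEAM" suffix
        $total  =~ s/\s+//g;               # remove leftover whitespace
        push @out, [ $player, $total ];
    }
    return @out;
}

my $html = <<'EOD';
<tr>
<td>10
<td>MANNY MACHADO - FA</td>
<td>37</td>
</tr>
<tr><td>107</td>
<td>ALEDMYS DIAZ - HOU</td>
<td>18</td>
</tr>
EOD

print "Player => $_->[0] Total => $_->[1]\n" for parse_rows($html);
```

It is still text munging rather than parsing, so a real HTML parser remains the more robust choice.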

Perl GD::Graph Invalid data set: 0 at (pie)

I am trying to write a little Perl script (I am a beginner!).
I took some example code and edited it to my needs.
The task is to read data from a CSV file, put it into an HTML table, and also show a pie chart.
The table already works; only the pie chart is a problem. I have tried many changes to the chart part of the code, but none of them helped.
Here is my code:
#!C:\Perl64\bin\perl.exe -w
### Variable declarations and module imports ###
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use DBI;
my $DBH = DBI->connect('DBI:CSV:');
my $STH;
use CGI::Carp 'fatalsToBrowser';
### Statement preparation ###
$DBH->{'csv_tables'}->{'daten'} = { 'file' => 'daten.csv'}
or die "Could not open database:$!";
$STH = $DBH->prepare("SELECT * FROM daten")
or die "Could not execute SQL statement:$!";
$STH->execute()
or die "Could not execute database query:$!";
print <<HERE_TEXT;
Content-type:text/html
<html>
<head>
<title>Datenanzeige CSV-File</title>
</head>
<body>
<center>
<h1>Folgende Umsatzdaten sind ausgelesen worden:</h1>
<hr>
<table border>
<tr>
<td width="200"><b>Filiale:</b></td>
<td width="100"><b>Leiter:</b></td>
<td width="200"><b>Mitarbeiter:</b></td>
<td width="100"><b>Umsatz:</b></td>
</tr>
HERE_TEXT
my @data;
my @diagarray;
while (@data = $STH->fetchrow_array()) {
my $filiale = $data[0];
my $leiter = $data[1];
my $mitarbeiter = $data[2];
my $umsatz = $data[3];
push (@diagarray, $umsatz);
print qq§<tr>\n<td><b>$filiale</b></td>\n<td>$leiter</td>\n<td>$mitarbeiter</td>\n<td>$umsatz</td>\n</tr>\n§;
}
print ("<br><br>");
use GD::Graph::pie;
my $graph = GD::Graph::pie->new(300, 300);
$graph->set(
title => 'Umsatzverteilung Filialen',
) or die $graph->error;
#my @diagram = (\@data,\@diagarray);
#Debug
#my $diagram;
# foreach $diagram(@diagram)
# {
# print ("$diagram\n");
# }
my $gd = $graph->plot(\@diagarray) or die $graph->error;
my $format = $graph->export_format;
print header("image/$format");
binmode STDOUT;
print $graph->plot(\@diagarray)->$format();
Would be great if anyone could give me the last needed hint.
Greetings

When debugging, always confirm your data and script flow, never assume anything to be correct.
Try
use Data::Dumper; # at the top of your script
[...]
print Dumper(\@diagarray); # just before your $graph->plot call
You'll probably notice that your data format differs from what is shown on http://search.cpan.org/~ruz/GDGraph-1.52/Graph.pm#USAGE
You're passing an ArrayRef to ->plot while the sample shows an ArrayRef of ArrayRefs:
[
['Desc1','Desc2'],
[250000, 350000],
]
I suggest extracting the drawing part and trying it with static data until you get a working result. Then copy it back into your script and replace the static data with your data, for example:
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard); # provides header()
use GD::Graph::pie;
my $graph = GD::Graph::pie->new(300, 300);
$graph->set(
title => 'Umsatzverteilung Filialen',
) or die $graph->error;
# plot() expects an array ref of array refs: labels first, then values
my @diagarray = (
['Title1', 'Title2', ],
[ 100, 200 ],
);
my $gd = $graph->plot(\@diagarray) or die $graph->error;
my $format = $graph->export_format;
print header("image/$format");
binmode STDOUT;
print $gd->$format();
Also check the line reported in the error message. Each of your ->plot calls may be the reason.
Two additional remarks:
Don't put code between the use lines of your script: use statements are processed at compile time, while other code runs at run time. Mixing them doesn't harm your script, but it makes it look as if my $DBH = DBI->connect('DBI:CSV:'); would run before use CGI::Carp.
Printing HTML source from a script is OK for testing and learning, but shouldn't be done in production environments as it makes maintenance harder. Try using Template::Toolkit or something similar.

Getting two array refs into @diagarray isn't any different to how you're pushing a scalar in to it.
push(@diagarray,\@labels);
push(@diagarray,\@values);
But you want that to happen outside of the while-loop. Inside the while-loop is where you'd populate @labels and @values. Both arrays have to be the same size.
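Put together, that could look roughly like the sketch below. It uses made-up sample rows in place of the $STH->fetchrow_array results, and the sub name build_pie_data is hypothetical; the column positions (label in the first column, value in the fourth) follow the question's script.

```perl
use strict;
use warnings;

# Turn rows of (branch, manager, staff, revenue) into the
# ( \@labels, \@values ) pair that is stored in @diagarray.
sub build_pie_data {
    my (@rows) = @_;
    my (@labels, @values);
    for my $row (@rows) {
        push @labels, $row->[0];   # first column: label
        push @values, $row->[3];   # fourth column: value
    }
    return ( \@labels, \@values ); # pushed once, after the loop
}

# Made-up sample rows, standing in for the rows read from daten.csv.
my @diagarray = build_pie_data(
    [ 'Berlin',  'Meier',   12, 250000 ],
    [ 'Hamburg', 'Schmidt',  8, 350000 ],
);
# @diagarray now has the shape expected by $graph->plot(\@diagarray)
```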
Also your script is trying to output the HTML and piechart in one go which won't work as your browser will just treat it all as just one lump of HTML. Your HTML needs to have an "img" tag in it that points at another URL. That URL can be the same script but with a different query string. For example
use CGI;
my $query = CGI->new;
if($query->param("piechart")) {
# print out the graph
} else {
print "<img src=\"",$ENV{"SCRIPT_NAME"},"?piechart=1\"/>";
}
Or alternatively you could split the piechart code into an entirely separate script, but that makes it less easy to maintain as you'd have to update two scripts if the code for reading in the data ever changed.

CGI and Perl script one file, passing arguments

I have a script which fetches the summary file from the NCBI website using command line argument (accession number).
Example:
./efetch.pl NM_000040
Now I am trying to fetch the same file using a HTML webpage which takes the form request via a CGI script.
My question: Is it possible to combine the CGI and my Perl script in one file and pass the HTML form argument from the CGI portion of the code to the perl script in single run?
I have tried to do some scripting but it seems that the argument from the CGI is not getting passed to the Perl script.
Any help will be greatly appreciated.
CGI and Perl Script in one single file:
#!/usr/bin/perl -wT
use strict;
use warnings;
use LWP::Simple;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
################### Environmental Variables ###################
my ($buffer, @pairs, $pair, $name, $value, %FORM);
# Read in text
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "POST")
{
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
$buffer = $ENV{'QUERY_STRING'};
}
#print "$buffer\n";
# Split information into name/value pairs
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
#$value =~ s/%(..)/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}
my $access = $FORM{accession};
if ($access =~ m{\A(\w+\d+)\z}) {
$access = $1;
}
print "Content-type:text/html\r\n\r\n";
print "<html>";
print "<head>";
print "<title> CGI Program</title>";
print "</head>";
print "<body>";
if ($access eq "") {
print "<h2> Please check the accession number</h2>";
exit;
}
print "<h2>$access</h2>";
print "</body>";
print "</html>";
print <<HEADING
<html>
<head>
<title> Output result of the program </title>
</head>
<body>
<h1> Summary result </h1>
<table border=1>
<tr>
<th>S.No.</th>
<th>Fragment</th>
<th>Position</th>
<th>Region</th>
<th>GC%</th>
</tr>
HEADING
;
######################## INPUT PARAMETERS #####################
my $utils = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
my $db = "nuccore";
my $query = $access; #"$ARGV[0]" or die "Please provide input for the accession number. $!";
############### END OF INPUT PARAMETERS ######################
############### FILE DOWNLOAD FROM NCBI ######################
my $report = "gb"; # downloads the summary text file
open (IN,">", $query.".summary");
my $esearch = "$utils/esearch.fcgi?" . "db=$db&retmax=1&usehistory=y&term=";
my $esearch_result = get($esearch . $query);
$esearch_result =~ m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s;
my $Count = $1; my $QueryKey = $2; my $WebEnv = $3;
my $retstart; my $retmax=3;
for($retstart = 0; $retstart < $Count; $retstart += $retmax) {
my $efetch = "$utils/efetch.fcgi?" .
"rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
"db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
my $efetch_result = get($efetch);
print IN $efetch_result, "\n";
}
close (IN);
The print statement in the Perl script prints $access, but the value of $access does not seem to reach $query.
HTML form:
<form action="/cgi-bin/efetch.cgi" method="post" id="myform">
<div>
NCBI accession number:<label for="accession"> <input type="text" name="accession"> </label><br>
<input type="submit" value="Submit" form="myform">
</div>
</form>

Your script is much more complicated than it needs to be. Specifically - you're using the CGI module (which is deprecated, so you might want to consider something else*) but then you're trying to roll your own input handling in your script.
You can write a single script that sends 'POST' or 'GET' data to itself for processing. That's not too difficult at all.
A simple example might be
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
print "Content-Type: text/html\n\n";
my %param;
while ( <STDIN> ) {
my ( $key, $value ) = split ( "=" );
$param{$key} = $value;
}
print Dumper \%param;
print "<FORM METHOD=\"POST\">\n";
print " <INPUT TYPE=\"text\" NAME=\"access\">\n";
print " <INPUT TYPE=\"submit\">\n";
print "</FORM>\n";
This isn't a good example, but it'll work, and hopefully it'll give you an idea of what's going on - POSTed stuff comes on STDIN. GET stuff comes in the URL.
You can test for the existence of such input, and either render your basic form or process the input you got.
if ( $param{'access'} ) {
#process it;
} else {
#print form;
}
There are many modules that make this easier (you're even using one already, in the form of CGI), so I wouldn't EVER suggest doing it this way 'for real' - this is purely an illustration of the basics.
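To make "the basics" a little more concrete, here is a core-Perl sketch of what decoding a urlencoded request body or query string involves. The sub name parse_form_data is made up, and this is an illustration only, not a substitute for CGI's param() or a framework: it ignores multipart uploads, repeated parameters, and character encodings.

```perl
use strict;
use warnings;

# Split "a=1&b=two+words" into a hash, undoing '+' and %XX escapes.
sub parse_form_data {
    my ($buffer) = @_;
    my %form;
    for my $pair (split /[&;]/, $buffer // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value //= '';
        for ($name, $value) {                        # decode both in place
            tr/+/ /;                                 # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX encodes one byte
        }
        $form{$name} = $value;                       # last value wins
    }
    return %form;
}

my %form = parse_form_data('accession=NM_000040&note=a+b%21');
print "$_ => $form{$_}\n" for sort keys %form;
```

In a CGI script the argument would be $ENV{QUERY_STRING} for GET, or the bytes read from STDIN for POST.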
With the CGI module, which is perhaps the thing that'd require least code alteration, you could use the 'CGI::param()' method to retrieve parameters:
use CGI;
print CGI::header;
print CGI::param('access');
#form stuff.
A more thorough approach would be a deeper rewrite using one of the more up-to-date web frameworks. There really are a lot of potential gotchas. (Although it does rather depend on how much control over your environment you have: I'm a lot more relaxed about internal, limited-user scripts than about internet-facing ones.)
* See: CGI::Alternatives

Writing a CGI program in 2014 is a lot like using a typewriter. Sure, it'll work, but people will look at you very strangely.
But given that you already have a CGI program, let's look at what it might look like if you used techniques that weren't out of date in the last millennium.
There are basically two underlying problems with your code.
You open a file using a name that comes from user input. And that violates the taint mode rules, so your program dies. You would have seen this in your web server error log, had you looked there.
You don't actually need to write the data to a file, because you want to send the data to the user's web browser.
So here's an improved version of your code. It fixes the two problems I mentioned above but it also uses modern tools.
The CGI module has a param() method which makes it far easier for us to get the parameters passed to our program. We also use its header() method to output the CGI header (basically just the Content-type header).
We use the Template module to move all of the HTML out of our code and put it in a separate area. Here I've cheated slightly and have just put it in the DATA section of the CGI program. Usually we'd put it in a completely separate file. Notice how separating the Perl and the HTML makes the program look cleaner and easier to maintain.
It wasn't clear to me exactly how you wanted to format the data you're getting back from the other web site. So I've just stuck it in a "pre" tag. You'll need to work that out for yourself.
Here's the code:
#!/usr/bin/perl -T
use strict;
use warnings;
use LWP::Simple;
use Template;
use CGI ':cgi';
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
my $access = param('accession');
my $utils = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
my $db = "nuccore";
my $query = $access;
my $report = "gb"; # downloads the summary text file
my $esearch = "$utils/esearch.fcgi?" . "db=$db&retmax=1&usehistory=y&term=";
my $esearch_result = get($esearch . $query);
my $data = '';
if (my ($Count, $QueryKey, $WebEnv) = $esearch_result =~ m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s) {
my $retstart;
my $retmax=3;
for ($retstart = 0; $retstart < $Count; $retstart += $retmax) {
my $efetch = "$utils/efetch.fcgi?" .
"rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
"db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
my $efetch_result = get($efetch);
$data .= $efetch_result;
}
}
my $tt = Template->new;
print header;
$tt->process(\*DATA, { data => $data })
or die $tt->error;
__END__
<html>
<head>
<title> CGI Program</title>
</head>
<body>
<h1>Input</h1>
<form action="/cgi-bin/efetch.cgi" method="post" id="myform">
<div>NCBI accession number:<label for="accession"> <input type="text" name="accession"></label><br>
<input type="submit" value="Submit" form="myform"></div>
</form>
[% IF data -%]
<h1>Summary Result</h1>
<pre>
[% data %]
</pre>
[% END -%]
</body>
</html>
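If you did want to keep writing to a file named after the accession number, taint mode requires the value to be untainted first by capturing it with a regex. A minimal sketch follows; the sub name untaint_accession is made up, and the pattern is only my assumption about what an accession looks like (a few letters, an optional underscore, digits, an optional version suffix), so adjust it to the identifiers you actually accept.

```perl
use strict;
use warnings;

# Return a validated copy of the accession, or undef if it looks wrong.
# Under -T, only values captured by a regex group count as untainted.
sub untaint_accession {
    my ($input) = @_;
    return unless defined $input;
    # Assumed format, e.g. NM_000040 or NM_000040.3
    if ($input =~ /\A([A-Za-z]{1,4}_?\d+(?:\.\d+)?)\z/) {
        return $1;
    }
    return;
}

my $ok  = untaint_accession('NM_000040');
my $bad = untaint_accession('../etc/passwd');
print defined $ok  ? "accepted: $ok\n" : "rejected\n";
print defined $bad ? "accepted: $bad\n" : "rejected\n";
```

Only the returned value, never the raw parameter, should then be used to build the file name.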

Final piece of the puzzle required ( returning a value requested by the user)

Following up on an earlier problem: I have since changed my code to the version below (I am getting closer), but it now prints the whole file rather than the line I am asking for. I want to print the line that contains the value the user enters in the form.
Form code:
#!\xampp\perl\bin\perl.exe
use CGI qw/:standard/; # load standard CGI routines
use CGI::Carp('fatalsToBrowser');
print header(); # create the HTTP header
print <<HTML
<head>
<title>Shop Here</title>
</head>
<body>
<h1>list</h1>
<br />
<form action="doSearch.pl">
animalname: <input type="text", name="search" size=5><br><br>
<input type="submit" value="select">
</form>
</body>
</html>
HTML
# <>;
Response script:
use CGI qw(:standard);
use CGI::Carp('fatalsToBrowser');
$search = new CGI;
@animallist = param;
print header, start_html("animal list"); #prints title on tab
$inFile = "animal.txt";
open (IN, $inFile) or
die "Can't find file: $inFile";
@animallist = (<IN>);
# print @animallist, "\n" ;
foreach $line (@animallist)
{
if ($line =~ $value)
{
print $line;
}
}
print end_html;

You really should put a question in your question.
I assume your code is not working.
The line $line =~ $value may very well not do what you want if $value contains regex metacharacters.
If that is the problem, have a look at \Q and \E; they may be what you want.
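Note also that the response script shown never assigns $value, so the pattern is empty and does not filter by what the user typed. A sketch of the filtering step, assuming the value is meant to come from the form field named search (for example via my $value = param('search');); the sub name matching_lines is made up:

```perl
use strict;
use warnings;

# Return only the lines that contain $value as a literal string.
sub matching_lines {
    my ($value, @lines) = @_;
    # An empty or missing search term gives nothing useful to match on.
    return () unless defined $value && length $value;
    # \Q...\E quotes regex metacharacters, so "a.b" only matches "a.b".
    return grep { /\Q$value\E/ } @lines;
}

# Made-up sample lines standing in for the contents of animal.txt.
my @animallist = ( "cat 4.50\n", "dog 7.00\n", "catfish 2.10\n" );
print matching_lines( 'dog', @animallist );
```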

Get contents from HTML tag using MyParser in Perl

I have the following HTML:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body bgcolor="white">
<h1>foo.c</h1>
<form method="post" action=""
enctype="application/x-www-form-urlencoded">
Compare this file to the similar file:
<select name="file2">
<option value="...">...</option>
</select>
<input type="hidden" name="file1" value="foo.c" /><br>
Show the results in this format:
</form>
<hr>
<p>
<pre>
some code
</pre>
I need to get the value of the input named 'file1' and the contents of the HTML pre tag. I don't know Perl; by googling I wrote this small program (which I believe isn't "elegant"):
#!/usr/bin/perl
package MyParser;
use base qw(HTML::Parser);
#Store the file name and contents obtaind from HTML Tags
my($filename, $file_contents);
#This value is set at start() calls
#and use in text() routine..
my($g_tagname, $g_attr);
#Process tag itself and its attributes
sub start {
my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
$g_tagname = $tagname;
$g_attr = $attr;
}
#Process HTML tag body
sub text {
my ($self, $text) = @_;
#Gets the filename
if($g_tagname eq "input" and $g_attr->{'name'} eq "file1") {
$filename = $attr->{'value'};
}
#Gets the filecontents
if($g_tagname eq "pre") {
$file_contents = $text;
}
}
package main;
#read $filename file contents and returns
#note: it works only for text/plain files.
sub read_file {
my($filename) = @_;
open FILE, $filename or die $!;
my ($buf, $data, $n);
while((read FILE, $data, 256) != 0) {
$buf .= $data;
}
return ($buf);
}
my $curr_filename = $ARGV[0];
my $curr_file_contents = read_file($curr_filename);
my $parser = MyParser->new;
$parser->parse($curr_file_contents);
print "filename: ",$filename,"file contents: ",$file_contents;
Then I call ./foo.pl html.html But I'm getting empty values from $filename and $file_contents variables.
How to fix this?

Like always, there's more than one way to do it. Here's how to use the DOM Parser of Mojolicious for this task:
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
# slurp all lines at once into the DOM parser
my $dom = Mojo::DOM->new(do { local $/; <> });
print $dom->at('input[name=file1]')->attr('value');
print $dom->at('pre')->text;
Output:
foo.c
some code

Using xpath and HTML::TreeBuilder::XPath Perl module ( very few lines ):
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new_from_content( <> );
print $tree->findvalue( '//input[@name="file1"]/@value' );
print $tree->findvalue( '//pre/text()' );
USAGE
./script.pl file.html
OUTPUT
foo.c
some code
NOTES
In the past, I was using the HTML::TreeBuilder module to do some web scraping. Now, I can't go back to that complexity. HTML::TreeBuilder::XPath does all the magic with useful XPath expressions.
You can use the new_from_file method to open a file or a filehandle instead of new_from_content; see perldoc HTML::TreeBuilder (HTML::TreeBuilder::XPath inherits its methods from HTML::TreeBuilder).
Using <> in this way is allowed here because HTML::TreeBuilder::new_from_content() specifically allows reading multiple lines in that way. Most constructors will not allow this usage. You should provide a scalar instead or use another method.

You don't generally want to use plain HTML::Parser unless you're writing your own parsing module or doing something generally tricky. In this case, HTML::TreeBuilder, which is a subclass of HTML::Parser, is the easiest to use.
Also, note that HTML::Parser has a parse_file method (and HTML::TreeBuilder makes it even easier with a new_from_file method), so you don't have to do all of this read_file business. Besides, there are better ways to do it than the one you picked, including File::Slurp and the old do { local $/; <$handle> } trick.
use HTML::TreeBuilder;
my $html_file = $ARGV[0];
my $tree = HTML::TreeBuilder->new_from_file($html_file);
my $filename = $tree->look_down(
_tag => 'input',
type => 'hidden',
name => 'file1'
)->attr('value');
my $file_contents = $tree->look_down(_tag => 'pre')->as_trimmed_text;
print "filename: ",$filename,"file contents: ",$file_contents;
For information on look_down, attr, and as_trimmed_text, see the HTML::Element docs; HTML::TreeBuilder both is a, and works with, elements.