First cgi script in perl and don't know what it does - perl

I am new to Perl and I am looking into CGI programs.
I tried the following from Perl Monks and it works. But I have no idea what it does.
1) What is the END_HERE? that is followed by HTML? :
print <<END_HERE;
<html>
<head>
<title>My First CGI Script</title>
</head>
<body bgcolor="#FFFFCC">
<h1>This is a pretty lame Web page</h1>
<p>Who is this Ovid guy, anyway?</p>
</body>
</html>
END_HERE
2) I modified the sample script by adding:
my $query = new CGI;
my $p= $query->param('myparam');
I.e. the new script is:
#!C:\perl\bin\perl.exe -wT
use strict;
use CGI;
my $query = new CGI;
print $query->header( "text/html" );
my $time = $query->param('fromDate');
print <<END_HERE;
<html>
<head>
<title>My First CGI Script $time</title>
</head>
<body bgcolor="#FFFFCC">
<h1>This is a pretty lame Web page</h1>
<p>Who is this Ovid guy, anyway?</p>
</body>
</html>
END_HERE
# must have a line after "END_HERE" or Perl won't recognize
# the token
It stopped working. I get the following error message:
Undefined subroutine &main::param called at C:/.../test2.cgi line 10.
How can I get the parameters send by the browser if not this way?

... <<END_HERE ...
foo
bar
END_HERE
means
... "foo
bar
" ...
The choice of terminator is up to you. You can use any bareword or any string if you add quotes. Both the following are equivalent to "foo\nbar\n":
<<MEOW
foo
bar
MEOW
<<"And they lived happily ever after."
foo
bar
And they lived happily ever after.
The script you posted has two problems, neither of them resulting in the error you specified.
Perl can't find the end of the here-doc since no line contains solely END_HERE. You have one that contains END_HERE with a whole bunch of leading spaces, but that's not the same thing. Remove the leading spaces.
It allows an arbitrary string to be placed in the HTML. Do escape (using, say, HTML::Entities's encode_entities)! Consider what happens if someone passes the following to the fromDate parameter:
<script>alert("owned")</script>

Related

How to trigger Perl multiline substitution

I have a folder of HTML files which have the below DOCTYPE declaration which I need to remove, so that a not-very-good parser can successfully load it as XML.
I've been trying to use perl to do the substitution in place, but no change is made when I run the substitution and I can't figure out why. Can anyone identify the correct flags or specification I need to make in order to remove the DOCTYPE processing instruction here.
Here's an example file I'd like to manipulate.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
</body>
</html>
Here's the perl one-liner I'm trying to use, which looks for the angle brackets, the exclamation mark, and everything before the close angle bracket. It incorporates perl substitution flags which other postings suggest should work for a multiline match - m for multiline, s for allowing newlines to be matched by regex. I'm then replacing the match with the empty string.
perl -i -e 's/<![^>]+>//gsm' `find . -name '*.html'`
I can't figure out why, but the DOCTYPE is not removed from the file after running this command. Does anyone else know why?
What you need is the -0777 switch which will cause the entire file to be read into a single string. If this is not used, the files will be read in line-by-line mode, and you can never match a multi-line statement that way.
Also, as Andomar points out, you are missing the -p switch, but I assume you figured that out.
The modifiers on the regex won't matter in this case, except the /g modifier. /m only affects ^ and $, and /s causes wildcard . to also match newlines. None of this applies to your regex.
So basically, you want something like:
perl -0777 -pi -e 's/<![^>]+>//g' ...
Side note:
Html should be handled with parsers, ideally, so I spent a few minutes working on using HTML::Parser which has a convenient option to strip declarations by adding a handler. Something like this seems to print ok for a single file:
perl -MHTML::Parser -we '
$p = HTML::Parser->new(default_h => [sub {print #_},"text"] );
$p->handler(declaration => "");
$p->parse_file(shift) or die $!; ' yourfile.html
I figured it would be overkill so I abandoned trying to fix it with the -pi in-place edit switches, but it is (probably) easily implemented in a script.
First, you seem to be missing the -p parameter, for processing the input line by line. -i doesn't seem to do much without -p.
Second, since -pi processes the input line-by-line, it can't replace a regex that spans more than one line.
You could write a Perl script instead. This script should run your regex on the entire content of all files passed on the command line:
use IO::All;
foreach my $file (#ARGV) {
my $content = io($file)->slurp;
$content =~ s/<![^>]+>//g;
$content > io($file);
}
The command cpan IO:All should install the IO:All module, if it is not present on your system.
P.S. The m and s options only affect ., ^ and $. I think you can omit them.

Perl CGI display image to browser from file

Friends,
I have been scouring the web for a solution to displaying images to a web browser with Perl and have found nothing that works for me.
I've tried possible solutions such as:
How To Display an Image with Perl
Outputting Image Data
Return an Image From a Script
and none of it works for what I'm doing. I want to deny client access to the image (Or even simply place the image file out of the www root) and dish it out server-side.
Here is an example of what I'm doing:
In my main perl file:
...
my $query = CGI->new();
sub main {
### This grabs page content from a module, depending on the page name.
### The module returns the HTML.
my $html = get_page('page', 'session');
### Perform any special conditionals here before printing the header...
print $query->header(some cookie/session data here);
print $html
}
In one of the modules:
sub return_page_content {
return <<HTML;
<html>
<body>
<img src="GET IMAGE HERE..." />
</body>
</html>
HTML
}
I've thought about just creating a copy of the image in a temp directory location, but that seems like it would defeat the entire purpose of keeping the image out of client-side access.
The probable solutions do not generate the image. I'm not sure where to go from here, so I am hoping someone here has an idea. Thank you so much!
Please let me know if I need to provide additional information. I feel like this could be beneficial to a lot of people. I hope at least ;-)
For a simple 'send the file to the browser solution', just send the browser the correct headers (to let the browser know what's coming), and then open your image and print the content to STDOUT.
select(STDOUT); $| = 1; #unbuffer STDOUT
print "Content-type: image/png\n\n";
open (IMAGE, '<', '/image_outside_webroot/image.png');
print <IMAGE>;
close IMAGE;
Once that is working, take a look at ImageMagick. There are all kinds of on-the-fly, fun image manipulations you can do (resizing, colorizing, etc.)
Your cgi script would contain code that looks something like this:
select(STDOUT); $| = 1; #unbuffer STDOUT
my $image = Image::Magick->new();
my $x = $image->Read(filename =>"/images_outside_web_root/image1.png");
#some manipulation of the image here
print "Content-type: image/png\n\n";
binmode STDOUT;
$x = $image->Write(.png:-');
You can read more about this on the site linked above.
Hope that helps.
Well, it turns out the reason this wasn't working is because I completely forgot I do not operate with CGI out of the standard cgi_bin directory. Instead, I use an .htaccess file to tell the server how and when to interpret files from the root directory as cgi.
So, this is what I ended up using in my image dishing script:
imagedish.pm
use CGI;
my $cgi = new CGI;
open (IMAGE, 'logo.png');
my $size = -s "logo.png";
my $data;
read IMAGE, $data, $size;
close (IMAGE);
print $cgi->header(-type=>'image/png'), $data;
This did the trick, as it should have been doing from the beginning, but in my .htaccess file, I added:
<files imagedish.pm>
SetHandler cgi-script
</files>
And that did the trick (Well, that, as well as going into Terminal and running chmod +x imagedish.pm to make it executable)! Of course the next steps are additional security measures, but at least it's working now! :-)
The full solution:
mainfile:
!/usr/bin/perl
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use CGI::Session qw/-ip-match/;
use DBI;
use strict;
use warnings;
# Variables
my $query = CGI->new();
my %vars = $query->Vars();
sub main {
my $p1 = $vars{p1};
$p1 = 'Home' if (!$p1);
my $html = get_page();
#I use this method in case we have multiple sessions
#I've omitted how I acquire the session, as this is not part of the solution ;-)
print $query->header(-cookie=>[$query->cookie($vars{p1}=>$session->id)]);
print $html;
}
sub get_page {
return <<HTML;
<!DOCTYPE HTML>
<html>
<head>
<title>Image Disher</title>
<link rel="shortcut icon" href="images/favicon.ico" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="container">
<div class="addentry">
<div class="iaddentry">
<form name="client" action="" method="POST">
<div class="form-header" action="">
<center>
<img src="imagedish.pm" width="305" alt=""/><br>
</center>
<br>
</div>
</form>
</div>
</div>
</div>
</body>
</html>
HTML
}
main();
In here, the img tag looks for the source "imagedish.pm", and once it finds it, the .htaccess file tells it to execute as a CGI script. At that point, it dishes out information appropriately, not like before.
Please note, this is not the most-secure way to do it, but it gets me going in the right direction.
The links you found (except the first one) describe how to do it, I think you are just getting confused with the difference between delivering an image and delivering an html file with an img tag on it. Keep in mind that when your browser is parsing an html file and it encounters an img tag, it takes the src url in the tag and makes an additional get request for it.
Try capturing the raw output from a request for an image using curl, wireshark, etc. The result is what you want to try to create. It's just a matter of returning the content type http header, followed by the binary image data.
Have another look at this example, and get rid of the random_file sub, and replace this line:
my $image = random_file( IMAGE_DIRECTORY, '\\.(png|jpg|gif)$' );
with this:
my $image = "path to an image file accessible by www-user";
Hopefully once you that working and understand what it's doing, you'll know what you need to do next.
Yet another way is to embed the image using the <img src="data:image/...,base64,..."> format.
This defeats browser caching and isn't great for very large images. But is useful if its easier to construct the image as part of the initial page load or you don't want the hassle of managing/serving them via a file system.
#!/user/bin/perl
use warnings; use strict;
use MIME::Base64 qw();
sub return_page_content {
my $image_type = shift;
my $image_data = shift;
my $image_base64 = MIME::Base64::encode($image_data);
$image_base64 =~ s{\n}{}g; # lose newlines
return <<HTML;
<html>
<body>
<img src="data:image/${image_type};base64,${image_base64}" />
</body>
</html>
HTML
}
my $image_path = "/tmp/test.jpg";
open(my $fh, '<', $image_path)
or die "unable to open file $image_path: $!";
binmode($fh);
my $image_data = do {local $/; <$fh>};
close($fh);
print return_page_content("jpeg", $image_data);

Ignore Text in HTML::TreeBuilder Output Perl

I need to ignore or remove all text in between all HTML elements so I can generate a blank template from a given web page.
I am parsing using the perl module HTML::TreeBuilder and HTML::Element.
I have tried the ignore_text method noted in the documentation but that doesn't provide correct results.
I have also tried using DOMXpath with PHP to do the same thing and results seemed too cumbersome to manage. Regex's might work but are a last resort to me.
This is part of my current code, very basic. Bottom is just output to file. All code is functional I just need formatting to work so I can generate template files.
my $url= "http://www.example.com";
my $page = get($url) or die $!;
my $tree = HTML::TreeBuilder->new_from_content($page);
$tree->parse_file($page);
$tree->ignore_text;
$tree->elementify;
open OUTPUT, "+>".$body;
my $output = $tree->as_HTML;
print OUTPUT $output;
close OUTPUT;
Thanks in advance for the help!
EDIT: I found the problem - the ignore text only works when you parse from a physical file. I had to save the page as a temp file to parse then output the way I wanted with no text then I just did unlink($tmp) at the bottom to delete the file. My script has since grown much more complicated with reading and writing to database and each time I need to create this temp file which is kind of annoying...
Thanks for the reply below!
You are very close.
It looks like you need to set ignore_text with a true value. $tree->ignore_text(1) and then make sure its set before calling parse_file.
Sorry this is a bit long but i hope it helps.
Here is quick pass at the new code, hard to test without example page:
my $tree = HTML::TreeBuilder->new;
$tree->ignore_text(1);
$tree->elementify;
$tree->parse_file( $page );
Here is my quick test script using a local file:
use strict;
use warnings;
use HTML::TreeBuilder;
my $page = 'test.html';
my $tree = HTML::TreeBuilder->new();
$tree->ignore_text(1);
$tree->parse_file($page);
$tree->elementify;
print $tree->as_HTML;
Input test.html:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title text</title>
</head>
<body>
<h1>Heading 1</h1>
<p>paragraph text</p>
</body>
</html>
And output:
<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title></head><body><h1></h1><p></body></html>
Good luck
Maybe you should use HTML::Parser for this task. It is maybe a little bit more code, but should not to complicated.

I want to check if number starts with 4 or 5 in CGI

I need to write some script in CGI which is new to me. I am trying to do if else with condition numbers starting with 5 or 6. So do one code if number starts with 5 and do another if number starts with 6.
use 5.013;
use warnings;
use Scalar::Util qw( looks_like_number );
use CGI;
my $param = CGI->new()->param('some_example');
given (substr $param, 0, 1) {
when (! looks_like_number($_) ) { say 'Not a number' }
when (5) { say 'starts with 5' }
when (6) { say 'starts with 6' }
}
Alternatively, rather than using substr to get the first letter, put $param and change (5) to your regex of choice.
I don't think you understand what CGI is. CGI is simply a set of environment variables that are set up by the webserver, and your program is executed with them. The output of the program becomes the webpage.
So if you want to write a CGI script in Python, PHP, C, Assembly, Whitespace... as long as it can be called and use environment variables, it's fine.
So this is really a language question. Which language are you using?
EDIT You specified Perl in a comment to this answer. I suggest you edit the question.
What's your input number? The Perl script will be run with a whole truckload of extra environment variables. Two of the most important are QUERY_STRING and REQUEST_METHOD. CGI consists of a specification of these environment variables, so any language can be used to write CGI.
Consider perl_cgi.cgi?something=else. The bit following the ? is the QUERY_STRING. You can specify this directly as part of an anchor:
Run with something equals else
or as part of a form (one of GET or POST, defaults to GET):
<form action="perl_cgi.cgi" method="[GET or POST]">
<input type="text" name="something" value="else"/>
<input type="submit" value="Submit!"/>
</form>
This will run your program with the same query string as above (or a different parameter, if the text box is changed) but REQUEST_METHOD will be either GET or POST depending.
So let's write a Perl CGI script to print the first number of the string we get (we're only passed strings):
use CGI;
$cgi=new CGI;
$x=$cgi->param('x');
$firstnum=substr($x, 0, 1);
print "Content-type: text/html\n\n";
print <<"EOF";
<html>
<head>
<title>My sample HTML page</title>
</head>
<body>
<p>The first number of $x is $firstnum</p>
</body>
</html>
EOF
This presupposes that this program is run as [program_name]?x=[some string]. It's up to you to make sure that's the case.
That should give you enough. You can check firstnum to see if its 5 or 6, then do different things depending.

Why doesn't my Perl CGI program work on Windows?

I have written following in index.pl which is the C:\xampp\htdocs\perl folder:
#!/usr/bin/perl
print "<html>";
print "<h2>PERL IT!</h2>";
print "this is some text that should get displyed in browser";
print "</html>";
When I browse to http://localhost:88/perl/ the above HTML doesn't get displayed (I have tried in IE FF and chrome).
What would be the reason?
I have xampp and apache2.2 installed on this Windows XP system.
See also How do I troubleshoot my Perl CGI Script?.
Your problem was due to the fact that your script did not send the appropriate headers.
A valid HTTP response consists of two sections: Headers and body.
You should make sure that you use a proper CGI processing module. CGI.pm is the de facto standard. However, it has a lot of historical baggage and CGI::Simple provides a cleaner alternative.
Using one of those modules, your script would have been:
#!/usr/bin/perl
use strict; use warnings;
use CGI::Simple;
my $cgi = CGI::Simple->new;
print $cgi->header, <<HTML;
<!DOCTYPE HTML>
<html>
<head><title>Test</title></head>
<body>
<h1>Perl CGI Script</h1>
<p>this is some text that should get displyed in browser</p>
</body>
</html>
HTML
Keep in mind that print has no problem with multiple arguments. There is no reason to learn to program like it's 1999.
Maybe it's because you didn't put your text between <body> tags. Also you have to specify the content type as text/html.
Try this:
print "Content-type: text/html\n\n";
print "<html>";
print "<h2>PERL IT!</h2>";
print "<body>";
print "this is some text that should get displyed in browser";
print "</body>";
print "</html>";
Also, from the link rics gave,
Perl:
Executable: \xampp\htdocs and \xampp\cgi-bin
Allowed endings: .pl
so you should be accessing your script like:
http://localhost/cgi-bin/index.pl
I am just guessing.
Have you started the apache server?
Is 88 the correct port for reaching your apache?
You may also try http://localhost:88/perl/index.pl (so adding the script name to the correct address).
Check this documentation for help.