Perl UTF8 in CGI problems - perl

I have a very simple Perl script which works correctly on the terminal, but when run as a CGI script it produces garbage. The script basically takes HTML-entity-encoded data, decodes it, and prints it. I have tried various setups, such as using "Encode" to change the output and setting STDOUT to utf8 mode, and none of them help. I have also tried changing the CGI environment to match the terminal environment. Still no luck.
Here is the script
#!/usr/bin/perl
use HTML::Entities qw(encode_entities_numeric decode_entities);
use Encode qw/encode decode/;
binmode(STDOUT, ":utf8");
#$ENV{'PERL_UNICODE'} = 'D';
#$ENV{'LANG'} = 'en_US.UTF-8';
#$ENV{'TERM'} = 'vt100';
#$ENV{'SHELL'} = '/bin/bash';
#binmode(STDOUT, ":utf8");
print "Content-type: text/html\n\n";
my $y = decode_entities("Συστήματα_&#x391;νίχνευσης_Εισ.pdf");
#print encode("UTF8",$y);
print $y;
On the terminal the output is clean:
perl test.pl
Content-type: text/html
Συστήματα_Ανίχνευσης_Εισ.pdf
But on the CGI print it is garbled
ΣυστηÌματα_ΑνιÌχνευσης_Εισ.pdf
I am sort of stuck as I cannot find any simple way to solve this. Tried "encode_utf8" and utf8::upgrade of the variable but still no luck. Anyone's experience here will help a lot!
Thanks
Vijay

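When interpreting an HTML document, the browser needs to know its encoding. If none is declared, the browser falls back to a default that is typically not UTF-8 (usually a legacy encoding such as ISO-8859-1 or windows-1252). Since the browser is then assuming the wrong encoding, it renders your UTF-8 bytes as garbage.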
Instead, you should specify the encoding explicitly, such as by printing a meta tag
<meta charset="utf-8">
or by including the encoding in the content type:
Content-type: text/html; charset=utf-8
Here, using the content type would seem most appropriate.
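As a rough sketch of how the two suggestions fit together, the script below declares the charset in the Content-Type header, repeats it in a meta tag, and encodes STDOUT as UTF-8. The helper name build_response is made up for illustration, and the sketch assumes the file itself is saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # the string literal passed in below is UTF-8 in this source file

# Hypothetical helper: build the whole CGI response as a character string.
sub build_response {
    my ($text) = @_;
    return "Content-Type: text/html; charset=utf-8\n\n"
         . "<!DOCTYPE html>\n"
         . "<html><head><meta charset=\"utf-8\"></head>"
         . "<body><p>$text</p></body></html>\n";
}

# Encode characters to UTF-8 bytes on the way out, matching the declared charset.
binmode(STDOUT, ':encoding(UTF-8)');
print build_response('Συστήματα_Ανίχνευσης_Εισ.pdf');
```

The point is that the declared charset and the bytes actually written have to agree; either declaration alone is usually enough, but the header takes precedence when both are present.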

Related

Perl - Validate Chinese character input from web page form?

My Perl script accepts and processes input from a text field in a form on a web page. It was written for the English version of the web page and works just fine.
There is also a Chinese version of the page (a separate page, not both languages on the same page), and now I need my script to work with that. The user input on this page is expected to be in Chinese.
Expecting to need to work in UTF-8, I added
use utf8;
This continues to function just fine on the English page.
But in order to, for example, define a string variable for comparison that uses Chinese characters, I have to save the Perl script itself with utf-8 encoding. As soon as I do that, I get the dreaded 500 server error.
Clearly I'm going about this wrong and any helpful direction will be greatly appreciated.
Thanks.
EDIT - please see my clarification post below.
To handle utf8 properly :
use strict; use warnings;
use utf8;                               # the source file itself is UTF-8
use open qw(:std :encoding(UTF-8));     # default layer for open() plus STDIN, STDOUT and STDERR
open(my $fh, '<:encoding(UTF-8)', '/file/path') or die "Cannot open: $!"; # if you need a file-handle
# code.....
Check
why-does-modern-perl-avoid-utf-8-by-default
perluniintro
I'm sorry - I think I poorly expressed my question by including too much information.
The issue is - if I save my script in ANSI format and upload it to the server, it works just fine for the English page. Expecting to want to use Chinese characters in the script, I saved it in UTF-8 format and re-uploaded, and suddenly it throws 500 for the English page.
I tested with a Hello World script:
#!/usr/bin/perl -T
use strict;
use warnings;
print "Content-type: text/html\n\n";
print "Hello, world!\n";
Works fine when saved as ANSI - fails 500 when saved as UTF8.

Perl drop down menus and Unicode

I've been going around on this for some time now and can't quite get it. This is Perl 5 on Ubuntu. I have a drop down list on my web page:
$output .= start_form . "Student: " . popup_menu(-name=>'student', -values=>['', @students], -labels=>\%labels, -onChange=>'Javascript:submit()') . end_form;
It's just a set of names in the form "Last, First" that are coming from a SQL Server table. The labels are created from the SQL columns like so:
$labels{uc($record->{'id'})} = $record->{'lastname'} . ", " . $record->{'firstname'};
The issue is that the drop down isn't displaying some Unicode characters correctly. For instance, "Søren" shows up in the drop down with the "ø" replaced by garbage characters. I have in my header:
use utf8;
binmode(STDOUT, ":utf8");
...and I've also played around with various takes on the "decode( )" function, to no avail. To me, the funny thing is that if I pull $labels into a test script and print the list to the console, the names appear just fine! So what is it about the drop down that is causing this? Thank you in advance.
EDIT:
This is the relevant functionality, which I've stripped down to this script that runs in the console and yields the correct results for three entries that have Unicode characters:
#!/usr/bin/perl
use DBI;
use lib '/home/web/library';
use mssql_util;
use Encode;
binmode(STDOUT, ":utf8");
$query = "[SQL query here]";
$dbh = &connect;
$sth = $dbh->prepare($query);
$result = $sth->execute();
while ($record = $sth->fetchrow_hashref())
{
    if ($record->{'id'})
    {
        $labels{uc($record->{'id'})} = Encode::decode('UTF-8', $record->{'lastname'} . ", " . $record->{'nickname'} . " (" . $record->{'entryid'} . ")");
    }
}
$sth->finish();
print "$labels{'ST123'}\n";
print "$labels{'ST456'}\n";
print "$labels{'ST789'}\n";
The difference in what the production script is doing is that instead of printing to the console like above, it's printing to HTTP:
$my_output = "<p>$labels{'ST123'}</p><br>
<p>$labels{'ST456'}</p><br>
<p>$labels{'ST789'}</p>";
$template =~ s/\$body/$my_output/;
print header(-cookie=>$cookie) . $template;
This gives garbled versions of strings like "Zoë" and "Søren" on the page (the accented characters come out as mojibake). BUT, if I remove binmode(STDOUT, ":utf8"); from the top of the production script, then the strings appear just fine on the page (i.e. I get "Zoë" and "Søren").
I believe that the binmode( ) line is necessary when writing UTF-8 to output, and yet removing it here produces the correct results. What gives?
Problem #1: Decoding inputs
53.C3.B8.72.65.6E is the UTF-8 encoding for Søren. When you instruct Perl to encode it all over again (by printing it to a handle with the :utf8 layer), you are producing garbage.
You need to decode your inputs ($record->{id}, $record->{lastname}, $record->{firstname}, etc)! This will transform the UTF-8 bytes 53.C3.B8.72.65.6E ("encoded text") into the Unicode Code Points 53.F8.72.65.6E ("decoded text").
In this form, you will be able to use uc, regex matches, etc. You will also be able to print them out to a handle with an encoding layer (e.g. :encoding(UTF-8), or the improper :utf8).
You let on that these inputs come from a database. Most DBD drivers have a flag that causes strings to be decoded. For example, if it's a MySQL database, you should pass mysql_enable_utf8mb4 => 1 to connect.
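To make the byte sequences above concrete, here is a small sketch using the core Encode module. The hard-coded $bytes value stands in for an undecoded column value; it is an assumption for illustration, not what any particular driver returns.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The six bytes 53.C3.B8.72.65.6E, as they might arrive from a driver
# that does not decode for you (illustrative value).
my $bytes = "S\xC3\xB8ren";

# Decode once on input: five characters, 53.F8.72.65.6E.
my $chars = decode('UTF-8', $bytes);

# Character-aware operations now work on whole characters.
my $upper = uc $chars;

# Encode once on output (an :encoding(UTF-8) layer does the same job).
my $out = encode('UTF-8', $chars);

# What goes wrong when the undecoded bytes are encoded a second time:
# each of C3 and B8 is treated as a character and becomes two bytes.
my $double = encode('UTF-8', $bytes);

printf "%d bytes in, %d characters, %d bytes out, %d bytes if double-encoded\n",
    length $bytes, length $chars, length $out, length $double;
```

Decoding at the boundary where data enters the program and encoding at the boundary where it leaves keeps everything in between working on characters.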
Problem #2: Communicating encoding
If you're going to output UTF-8, don't tell the browser it's ISO-8859-1!
$ perl -e'use CGI qw( :standard ); print header()'
Content-Type: text/html; charset=ISO-8859-1
Fixed:
$ perl -e'use CGI qw( :standard ); print header( -type => "text/html; charset=UTF-8" )'
Content-Type: text/html; charset=UTF-8
Hard to give a definitive solution as you don't give us much useful information. But here are some pointers that might help.
use utf8 only tells Perl that your source code is encoded as UTF-8. It does nothing useful here.
Reading perldoc perlunitut would be a good start.
Do you know how your database tables are encoded?
Do you know whether your database connection is configured to automatically decode data coming from the database into Perl characters?
What encoding are you telling the browser that you have encoded your HTTP response in?
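The point about use utf8 can be checked with a short sketch like the following, which assumes the file is saved as UTF-8. It only changes how string literals in the source are read; it has no effect on data arriving from a database or on output handles.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8.
# With the pragma, the literal is read as five characters;
# without it, the same literal is read as six raw bytes.
my $with_pragma    = do { use utf8; length "Søren" };
my $without_pragma = do { no utf8;  length "Søren" };

print "with use utf8: $with_pragma, without: $without_pragma\n";
```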

perl cgi print header charset not work

I have a Perl CGI program that outputs a simple HTML form for user data input.
The form is in the Chinese Big5 charset.
When I open the CGI script, I have to manually switch the web browser's character encoding to Big5.
I searched on Google and found a way to set the charset. I changed the
original code
$q = new CGI;
print $q->header;
to new code
$q = new CGI;
print $q->header(-charset=>'big5');
However, it just outputs a blank HTML page.
This works for me:
use CGI;
my $q = CGI->new();
print $q->header(-charset => 'big5');
print '簡體字';
When I try it, the text is displayed correctly. (Make sure that your script is also saved in Big5.)
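If saving the script in Big5 is awkward, one alternative (not what the answer above does) is to keep the source in UTF-8 and encode the output explicitly. This is only a sketch: it assumes the file is saved as UTF-8 and uses Encode's big5-eten encoding, printing the header by hand instead of through CGI.pm.

```perl
use strict;
use warnings;
use utf8;                          # this sketch assumes the file is saved as UTF-8
use Encode qw(encode decode);

my $text = '簡體字';                        # three characters in the source
my $big5 = encode('big5-eten', $text);      # Big5 bytes, two per character here

binmode(STDOUT, ':raw');                    # bytes are already encoded; add no layer
print "Content-Type: text/html; charset=big5\n\n";
print $big5, "\n";
```

Either way, the charset named in the header has to match the bytes that are actually sent.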
If those are the only two lines, then it's probably working.
Run the cgi from command line and you should see:
Content-Type: text/html; charset=big5
You're printing headers, but no content, so the page will be blank. Use Firebug or similar to verify the response from the server.

Corrupt Spanish characters when saving variables to a text file in Perl

I think I have an encoding problem. My knowledge of perl is not great. Much better with other languages, but I have tried everything I can think of and checked lots of other posts.
I am collecting a name and address. This can contain non english characters. In this case Spanish.
A php process uses curl to execute a .pl script and passes the values URLEncoded
The .pl executes a function in a .pm which writes the data to a text file. No database is involved.
Both the .pl and .pm have
use Encode;
use utf8;
binmode (STDIN, 'utf8');
binmode (STDOUT, 'utf8');
defined. Below is the function which is writing the text to a file
sub bookingCSV(@){
my $filename = "test.csv";
utf8::decode($_[1]{booking}->{LeadNameFirst});
open OUT, ">:utf8", $filename;
$_="\"$_[1]{booking}->{BookingNo}¦¦$_[1]{booking}->{ShortPlace}¦¦$_[1]{booking}->{ShortDev}¦¦$_[1]{booking}->{ShortAcc}¦¦$_[1]{booking}->{LeadNameFirst}¦¦$_[1]{booking}->{LeadNameLast}¦¦$_[1]{booking}->{Email}¦¦$_[1]{booking}->{Telephone}¦¦$_[1]{booking}->{Company}¦¦$_[1]{booking}->{Address1}¦¦$_[1]{booking}->{Address2}¦¦$_[1]{booking}->{Town}¦¦$_[1]{booking}->{County}¦¦$_[1]{booking}->{Zip}¦¦$_[1]{booking}->{Country}¦¦";
print OUT $_;
close (OUT);
All Spanish characters are corrupted in the text file. I have tried decode on one specific field "LeadNameFirst" but that has not made a difference. I left the code in place just in case it is useful.
Thanks for any help.
What is the encoding of the input? If the input encoding is not utf-8, then it will not do you any good to decode it as utf-8 input.
Does the input come from an HTML form? Then the encoding probably matches the encoding of the web page it came from. ISO-8859-1 is a common default encoding for American/European locales. Anyway, once you discover the encoding, you can decode the input with it:
$name = decode('iso-8859-1',$_[1]{booking}->{LeadNameFirst});
print OUT "name is $name\n"; # utf8 layer already enabled
Some browsers look for and respect an accept-charset attribute inside a <form> tag, e.g.,
<form action="/my_form_processor.php" accept-charset="UTF-8">
...
</form>
This will (cross your fingers) cause you to receive the form input as utf-8 encoded.
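The decode-then-write step described above can be sketched end to end as follows. The input value Jos%E9 and the ISO-8859-1 assumption are made up for illustration; substitute whatever encoding the input really uses.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical URL-encoded value as it might arrive from the caller;
# %E9 is "é" in ISO-8859-1 (the assumed input encoding).
my $raw = 'Jos%E9';

# Undo the URL encoding to get raw bytes, then decode them into characters.
(my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
my $name = decode('iso-8859-1', $bytes);

# Write through an encoding layer so the file ends up as UTF-8.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;
open my $out, '>:encoding(UTF-8)', $filename or die "Cannot write $filename: $!";
print {$out} "name is $name\n";
close $out or die "Cannot close $filename: $!";

# Read the file back as raw bytes to inspect what was stored.
open my $in, '<:raw', $filename or die "Cannot read $filename: $!";
my $stored = do { local $/; <$in> };
close $in;
```

If the stored bytes show "é" as C3 A9, the input encoding guess was right; anything else suggests the input was in a different encoding.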

Why doesn't my Perl CGI program work on Windows?

I have written the following in index.pl, which is in the C:\xampp\htdocs\perl folder:
#!/usr/bin/perl
print "<html>";
print "<h2>PERL IT!</h2>";
print "this is some text that should get displyed in browser";
print "</html>";
When I browse to http://localhost:88/perl/ the above HTML doesn't get displayed (I have tried in IE FF and chrome).
What would be the reason?
I have xampp and apache2.2 installed on this Windows XP system.
See also How do I troubleshoot my Perl CGI Script?.
Your problem was due to the fact that your script did not send the appropriate headers.
A valid HTTP response consists of two sections: Headers and body.
You should make sure that you use a proper CGI processing module. CGI.pm is the de facto standard. However, it has a lot of historical baggage and CGI::Simple provides a cleaner alternative.
Using one of those modules, your script would have been:
#!/usr/bin/perl
use strict; use warnings;
use CGI::Simple;
my $cgi = CGI::Simple->new;
print $cgi->header, <<HTML;
<!DOCTYPE HTML>
<html>
<head><title>Test</title></head>
<body>
<h1>Perl CGI Script</h1>
<p>this is some text that should get displyed in browser</p>
</body>
</html>
HTML
Keep in mind that print has no problem with multiple arguments. There is no reason to learn to program like it's 1999.
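For reference, the headers-then-body structure mentioned above can be sketched without any module. The helper name cgi_response is hypothetical; it just joins the header lines, the blank separator line, and the body.

```perl
use strict;
use warnings;

# Hypothetical helper: header lines, then one blank line, then the body.
sub cgi_response {
    my ($headers, $body) = @_;
    my $head = join '', map { "$_: $headers->{$_}\n" } sort keys %$headers;
    return $head . "\n" . $body;
}

print cgi_response(
    { 'Content-Type' => 'text/html; charset=utf-8' },
    "<html><body><h2>PERL IT!</h2></body></html>\n",
);
```

A module such as CGI::Simple does this for you; the sketch only shows what ends up on the wire.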
Maybe it's because you didn't put your text between <body> tags. Also you have to specify the content type as text/html.
Try this:
print "Content-type: text/html\n\n";
print "<html>";
print "<h2>PERL IT!</h2>";
print "<body>";
print "this is some text that should get displyed in browser";
print "</body>";
print "</html>";
Also, from the link rics gave,
Perl:
Executable: \xampp\htdocs and \xampp\cgi-bin
Allowed endings: .pl
so you should be accessing your script like:
http://localhost/cgi-bin/index.pl
I am just guessing.
Have you started the apache server?
Is 88 the correct port for reaching your apache?
You may also try http://localhost:88/perl/index.pl (so adding the script name to the correct address).
Check this documentation for help.