Perl CGI: printing a header with a charset does not work

I have a Perl CGI program that outputs a simple HTML form for user data input.
The form is in the Chinese Big5 charset.
When I open the CGI script, I have to manually switch the browser's character encoding to Big5.
I searched on Google and found a way to set the charset, so I changed the
original code
$q = new CGI;
print $q->header;
to the new code
$q = new CGI;
print $q->header(-charset=>'big5');
However, it just outputs a blank HTML page.

This works for me:
use CGI;
my $q = CGI->new();
print $q->header(-charset => 'big5');
print '簡體字';
When I try it, the text is displayed correctly. (Make sure that your script is also saved in Big5.)
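If you would rather keep the script itself saved as UTF-8, one option is to write the text as Perl characters and encode it to Big5 only when printing. This is a minimal sketch assuming the core Encode module; the helper name to_big5 is hypothetical. The header line is printed by hand so the sketch runs without CGI.pm installed; $q->header(-charset => 'big5') sends the equivalent header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into Big5 bytes.
sub to_big5 {
    my ($text) = @_;
    return encode('big5', $text);
}

# U+7C21 U+9AD4 U+5B57 are the characters 簡體字; writing them as \x{...}
# escapes keeps the sketch independent of how the file itself is saved.
my $text = "\x{7C21}\x{9AD4}\x{5B57}";

# Same header that $q->header(-charset => 'big5') would send.
print "Content-Type: text/html; charset=big5\r\n\r\n";
print to_big5($text);
```

The browser then receives Big5 bytes together with a header that says Big5, so no manual switching should be needed.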

If those are the only two lines, then it's probably working.
Run the CGI script from the command line and you should see:
Content-Type: text/html; charset=big5

You're printing headers, but no content, so the page will be blank. Use Firebug or similar to verify the response from the server.
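A minimal sketch of a complete response, headers followed by a body, is below. Instead of calling encode() it pushes an :encoding(big5) layer onto the output handle, so Perl characters in the body are converted on the way out. The helper name write_response is hypothetical, and the handle is passed in so the same code can write to STDOUT or to any other handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a complete CGI response (headers, blank line,
# body) to the given handle. Without the body the browser shows a blank page.
sub write_response {
    my ($fh, $body) = @_;
    # Header first, then the blank line that ends the header section.
    print {$fh} "Content-Type: text/html; charset=big5\r\n\r\n";
    # The encoding layer converts the Perl characters in $body to Big5 bytes.
    binmode($fh, ':encoding(big5)') or die "binmode: $!";
    print {$fh} "<html><body><p>$body</p></body></html>\n";
    return;
}

write_response(\*STDOUT, "\x{7C21}\x{9AD4}\x{5B57}");    # 簡體字
```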

Related

Properly displaying UTF-8 chars in Perl

I am running perl 5, version 24, subversion 3 (v5.24.3) built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail) (ActiveState).
I'm trying to parse an HTML page encoded in UTF-8:
$request = new HTTP::Request('GET', $url);
$response = $ua->request($request);
$content = $response->content();
I parse $content as one giant string using the index and substr functions, and that works fine.
The HTML page contains the string ÖBB, and I need to insert it into the database exactly as ÖBB.
When I print it or insert it into the database, I get some ASCII characters instead of Ö.
NOTE: this question is not database related; MySQL handles UTF-8 just fine, so if I insert the value "ÖBB" directly, it takes it without a problem.
I've looked at a great number of similar questions and answers here and in other forums, and I am none the wiser.
use utf8; and binmode(STDOUT, ":utf8") have not worked for me...
I would greatly appreciate a code snippet that solves the issue. Thank you.
Decode inputs; encode outputs.
First of all, you don't decode your inputs.
$response->content returns the raw content that could be in any encoding. Use $response->decoded_content(); to get the decoded response if it's HTML.
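To see what "decode inputs; encode outputs" does at the byte level, here is a minimal sketch using the core Encode module. The hard-coded byte string is an assumption standing in for $response->content; with LWP, decoded_content performs the decoding step for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $response->content: the raw UTF-8 bytes of "ÖBB".
my $raw = "\xC3\x96BB";

# Decode input: bytes -> Perl characters. "Ö" becomes one character, U+00D6.
my $text = decode('UTF-8', $raw);
printf "%d bytes in, %d characters after decoding\n", length($raw), length($text);

# Encode output: characters -> UTF-8 bytes, right before they leave the program.
my $out = encode('UTF-8', $text);
print $out, "\n";
```

Inside the program you work with $text (so length, substr and regexes see three characters); only what you print or store gets encoded.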
Second of all, you might not be encoding your outputs.
You didn't specify which database driver you use. Most DBI drivers have an option you need to specify. For example, with MySQL, you want
my $dbh = DBI->connect(
'dbi:mysql:...',
$user, $password,
{
mysql_enable_utf8mb4 => 1,
...
},
);
You mentioned use utf8;. That tells Perl that your source code is encoded using UTF-8 rather than ASCII. Do use it if your source code is encoded using UTF-8.
This is not directly related to your issue.
You mentioned binmode(STDOUT, ":utf8"). That's a very poor way of writing
use open ':std', ':encoding(UTF-8)';
The above handles that for STDIN, STDOUT and STDERR, and does so at compile time. It also sets the default for files opened in the scope of the pragma.
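A minimal sketch to check what the pragma actually does: it inspects the layers on STDOUT with the built-in PerlIO::get_layers and then prints a character string without any explicit encode() call. The exact layer names printed depend on your Perl build.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Applied at compile time: STDIN, STDOUT and STDERR get an :encoding(UTF-8)
# layer, and files opened in this lexical scope default to it too.
use open ':std', ':encoding(UTF-8)';

# Inspect the layers now sitting on STDOUT.
my @layers = PerlIO::get_layers(*STDOUT);
print "STDOUT layers: @layers\n";

# Perl characters are encoded to UTF-8 on the way out; no explicit encode() needed.
print "\x{D6}BB\n";    # ÖBB
```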
But that's assuming the terminal expects UTF-8. That would be the case if you used chcp 65001. For a version that handles whatever encoding the terminal expects, you can use the following:
BEGIN {
    require Win32;
    my $cie = "cp" . Win32::GetConsoleCP();
    my $coe = "cp" . Win32::GetConsoleOutputCP();
    my $ae  = "cp" . Win32::GetACP();
    binmode(STDIN,  ":encoding($cie)");
    binmode(STDOUT, ":encoding($coe)");
    binmode(STDERR, ":encoding($coe)");
    require open;
    "open"->import(":encoding($ae)");
}
This has a few more details.
This is not directly related to your issue.
This is what worked:
use Win32::API;
binmode(STDOUT, ":unix:utf8");
$SetConsoleOutputCP = new Win32::API('kernel32.dll',
    'SetConsoleOutputCP', 'N', 'N');
$SetConsoleOutputCP->Call(65001);
All this was on the surface and I simply overlooked it ;-)
For the MySQL database to work right and accept UTF-8 encoded strings, this connection parameter had to be enabled:
mysql_enable_utf8 => 1,
Several components are involved when you capture a web page and output it to the screen.
For the moment, let's assume that you use Windows and run the following script in a terminal window.
First, confirm that your terminal uses UTF-8 encoding: run the command chcp and check whether it outputs 65001.
If it does, you are set; if it does not, issue the command chcp 65001.
Run the script with the command perl script_name.pl, and you should get output that includes ÖBB in the terminal window.
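use strict;
use warnings;
use utf8;
use feature 'say';
use HTTP::Tiny;
my $url = shift || 'https://www.thetrainline.com/en/train-companies/obb';
my $response = HTTP::Tiny->new->get($url);
if ($response->{success}) {
    my $html = $response->{content};
    # Only use $1 if the match succeeded; otherwise it is undefined or stale
    if ($html =~ m/(<p>Planning.+pets.<\/p>)/) {
        say $1;
    }
}
else {
    die "Request failed: $response->{status} $response->{reason}\n";
}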
To store UTF-8 data in a database, the database must be configured to support UTF-8 encoding.
For a MySQL database, the command should look like the following:
CREATE DATABASE mydb
CHARACTER SET utf8
COLLATE utf8_general_ci;
See the following MySQL documentation webpage.

Perl UTF8 in CGI problems

I have a very simple Perl script that works correctly in the terminal, but when run as a CGI script it produces garbage. The script basically takes HTML-entity-encoded data and decodes it for printing. I have tried all the different setups, like using Encode to change the output and setting STDOUT to UTF-8 mode, and nothing helps. I have also tried changing the CGI environment to see if it would behave like the terminal environment. Still no luck.
Here is the script
#!/usr/bin/perl
use HTML::Entities qw(encode_entities_numeric decode_entities);
use Encode qw/encode decode/;
binmode(STDOUT, ":utf8");
#$ENV{'PERL_UNICODE'} = 'D';
#$ENV{'LANG'} = 'en_US.UTF-8';
#$ENV{'TERM'} = 'vt100';
#$ENV{'SHELL'} = '/bin/bash';
#binmode(STDOUT, ":utf8");
print "Content-type: text/html\n\n";
my $y = decode_entities("Συστήματα_&#x391;νίχνευσης_Εισ.pdf");
#print encode("UTF8",$y);
print $y;
The output on terminal it is clean like
perl test.pl
Content-type: text/html
Συστήματα_Ανίχνευσης_Εισ.pdf
But on the CGI print it is garbled
ΣυστηÌματα_ΑνιÌχνευσης_Εισ.pdf
I am sort of stuck, as I cannot find any simple way to solve this. I tried encode_utf8 and utf8::upgrade on the variable, but still no luck. Anyone's experience here will help a lot!
Thanks
Vijay
When interpreting an HTML document, the browser needs to know its encoding. The default encoding per the HTML standard is not UTF-8, so the browser assumes the wrong encoding and reads garbage.
Instead, you should specify the encoding explicitly, for example by printing a meta tag
<meta charset="utf-8">
or by including the encoding in the content type:
Content-type: text/html; charset=utf-8
Here, using the content type would seem most appropriate.
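A minimal sketch of the content-type approach: the header names the encoding, and the body is encoded to match, so the two agree. The helper name utf8_response is hypothetical, and the Greek text is written with \x{...} escapes (an assumption standing in for what decode_entities returns). With CGI.pm, header(-charset => 'utf-8') should produce an equivalent Content-Type line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a CGI response whose Content-Type says which
# encoding the body uses, then encode the body to match.
sub utf8_response {
    my ($html) = @_;
    my $head = "Content-Type: text/html; charset=utf-8\r\n\r\n";
    return $head . encode('UTF-8', $html);
}

# "\x{391}" is what &#x391; (GREEK CAPITAL LETTER ALPHA) decodes to; the word is Ανίχνευσης.
print utf8_response("<p>\x{391}\x{3BD}\x{3AF}\x{3C7}\x{3BD}\x{3B5}\x{3C5}\x{3C3}\x{3B7}\x{3C2}</p>\n");
```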

Perl - Validate Chinese character input from web page form?

My Perl script accepts and processes input from a text field in a form on a web page. It was written for the English version of the web page and works just fine.
There is also a Chinese version of the page (a separate page, not both languages on the same page), and now I need my script to work with that. The user input on this page is expected to be in Chinese.
Expecting to need to work in UTF-8, I added
use utf8;
This continues to function just fine on the English page.
But in order to, for example, define a string variable for comparison that uses Chinese characters, I have to save the Perl script itself with UTF-8 encoding. As soon as I do that, I get the dreaded 500 server error.
Clearly I'm going about this wrong, and any helpful direction will be greatly appreciated.
Thanks.
EDIT - please see my clarification post below.
To handle UTF-8 properly:
use strict; use warnings;
use utf8;                               # the source code itself is saved as UTF-8
use open qw( :std :encoding(UTF-8) );   # STDIN/STDOUT/STDERR and new handles default to UTF-8
open(my $fh, '<', '/file/path') or die "Cannot open /file/path: $!";  # if you need a file handle
# code.....
Check "Why does modern Perl avoid UTF-8 by default?" and perluniintro.
I'm sorry - I think I poorly expressed my question by including too much information.
The issue is this: if I save my script in ANSI format and upload it to the server, it works just fine for the English page. Expecting to use Chinese characters in the script, I saved it in UTF-8 format and re-uploaded it, and suddenly it throws a 500 error for the English page.
I tested with a Hello World script:
#!/usr/bin/perl -T
use strict;
use warnings;
print "Content-type: text/html\n\n";
print "Hello, world!\n";
It works fine when saved as ANSI, but fails with a 500 error when saved as UTF-8.

Perl drop down menus and Unicode

I've been going around in circles on this for some time now and can't quite get it. This is Perl 5 on Ubuntu. I have a drop-down list on my web page:
$output .= start_form . "Student: " . popup_menu(-name=>'student', -values=>['', @students], -labels=>\%labels, -onChange=>'Javascript:submit()') . end_form;
It's just a set of names in the form "Last, First" that are coming from a SQL Server table. The labels are created from the SQL columns like so:
$labels{uc($record->{'id'})} = $record->{'lastname'} . ", " . $record->{'firstname'};
The issue is that the drop-down isn't displaying some Unicode characters correctly. For instance, "Søren" shows up garbled in the drop-down. At the top of my script I have:
use utf8;
binmode(STDOUT, ":utf8");
...and I've also played around with various takes on the decode() function, to no avail. To me, the funny thing is that if I pull %labels into a test script and print the list to the console, the names appear just fine! So what is it about the drop-down that is causing this? Thank you in advance.
EDIT:
This is the relevant functionality, which I've stripped down to this script that runs in the console and yields the correct results for three entries that have Unicode characters:
#!/usr/bin/perl
use DBI;
use lib '/home/web/library';
use mssql_util;
use Encode;
binmode(STDOUT, ":utf8");
$query = "[SQL query here]";
$dbh = &connect;
$sth = $dbh->prepare($query);
$result = $sth->execute();
while ($record = $sth->fetchrow_hashref())
{
    if ($record->{'id'})
    {
        $labels{uc($record->{'id'})} = Encode::decode('UTF-8', $record->{'lastname'} . ", " . $record->{'nickname'} . " (" . $record->{'entryid'} . ")");
    }
}
$sth->finish();
print "$labels{'ST123'}\n";
print "$labels{'ST456'}\n";
print "$labels{'ST789'}\n";
The difference in what the production script is doing is that instead of printing to the console like above, it's printing to HTTP:
$my_output = "<p>$labels{'ST123'}</p><br>
<p>$labels{'ST456'}</p><br>
<p>$labels{'ST789'}</p>";
$template =~ s/\$body/$my_output/;
print header(-cookie=>$cookie) . $template;
This gives garbled versions of strings like "Zoë" and "Søren" on the page. BUT, if I remove binmode(STDOUT, ":utf8"); from the top of the production script, then the strings appear just fine on the page (i.e. I get "Zoë" and "Søren").
I believe that the binmode( ) line is necessary when writing UTF-8 to output, and yet removing it here produces the correct results. What gives?
Problem #1: Decoding inputs
53.C3.B8.72.65.6E is the UTF-8 encoding of Søren. When you instruct Perl to encode it all over again (by printing it to a handle with the :utf8 layer), you produce garbage.
You need to decode your inputs ($record->{id}, $record->{lastname}, $record->{firstname}, etc.)! This will transform the UTF-8 bytes 53.C3.B8.72.65.6E ("encoded text") into the Unicode code points 53.F8.72.65.6E ("decoded text").
In this form, you will be able to use uc, regex matches, etc. You will also be able to print them out to a handle with an encoding layer (e.g. :encoding(UTF-8), or the improper :utf8).
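The difference between encoding twice and decoding once can be seen directly with the core Encode module. In this minimal sketch the byte string is an assumption standing in for what the database returns when no decoding flag is set; each result is printed as dot-separated hex, matching the notation above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for what the database hands back without a decoding flag:
# the UTF-8 bytes 53.C3.B8.72.65.6E ("Søren" as encoded text).
my $from_db = "\x53\xC3\xB8\x72\x65\x6E";

# Wrong: treating the bytes as characters and encoding again (what printing
# to a :utf8 handle does) turns C3 into C3.83 and B8 into C2.B8.
my $double = encode('UTF-8', $from_db);

# Right: decode once; "ø" becomes the single code point U+00F8.
my $decoded = decode('UTF-8', $from_db);

printf "double-encoded: %s\n", join '.', map { sprintf '%02X', ord } split //, $double;
printf "decoded:        %s\n", join '.', map { sprintf '%02X', ord } split //, $decoded;
```

The first line shows 53.C3.83.C2.B8.72.65.6E, which a browser renders as mojibake; the second shows 53.F8.72.65.6E.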
You let on that these inputs come from a database. Most DBD drivers have a flag that causes strings to be decoded. For example, if it's a MySQL database, you should pass mysql_enable_utf8mb4 => 1 to connect.
Problem #2: Communicating encoding
If you're going to output UTF-8, don't tell the browser it's ISO-8859-1!
$ perl -e'use CGI qw( :standard ); print header()'
Content-Type: text/html; charset=ISO-8859-1
Fixed:
$ perl -e'use CGI qw( :standard ); print header( -type => "text/html; charset=UTF-8" )'
Content-Type: text/html; charset=UTF-8
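Putting both fixes together, here is a minimal sketch that runs without CGI.pm or DBI. Its assumptions: the raw byte string stands in for an undecoded database value, the helper name render_page is hypothetical, and the page is built in an in-memory handle with an :encoding(UTF-8) layer so the result can be inspected before it is printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: stand-in for a value fetched without the driver's decoding flag,
# i.e. the UTF-8 bytes 53.C3.B8.72.65.6E ("Søren").
my $raw_from_db = "\x53\xC3\xB8\x72\x65\x6E";

# Hypothetical helper: builds the full response as a byte string.
sub render_page {
    my ($raw) = @_;
    my $name = decode('UTF-8', $raw);    # Problem #1: decode inputs
    my $out  = '';
    # In-memory handle with an encoding layer: characters are encoded exactly once.
    open(my $fh, '>:encoding(UTF-8)', \$out) or die "Cannot open in-memory handle: $!";
    # Problem #2: tell the browser which encoding the body uses.
    print {$fh} "Content-Type: text/html; charset=UTF-8\r\n\r\n";
    print {$fh} "<p>$name</p>\n";
    close($fh) or die "Cannot close in-memory handle: $!";
    return $out;
}

print render_page($raw_from_db);
```

In a real script the decoding would more likely come from the driver's flag, and the encoding layer would sit on STDOUT instead of an in-memory handle.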
It's hard to give a definitive solution, as you don't give us much useful information, but here are some pointers that might help.
use utf8 only tells Perl that your source code is encoded as UTF-8. It does nothing useful here.
Reading perldoc perlunitut would be a good start.
Do you know how your database tables are encoded?
Do you know whether your database connection is configured to automatically decode data coming from the database into Perl characters?
What encoding are you telling the browser that you have encoded your HTTP response in?

Why doesn't my Perl CGI program work on Windows?

I have written the following in index.pl, which is in the C:\xampp\htdocs\perl folder:
#!/usr/bin/perl
print "<html>";
print "<h2>PERL IT!</h2>";
print "this is some text that should get displayed in browser";
print "</html>";
When I browse to http://localhost:88/perl/, the above HTML doesn't get displayed (I have tried IE, Firefox and Chrome).
What would be the reason?
I have xampp and apache2.2 installed on this Windows XP system.
See also How do I troubleshoot my Perl CGI Script?.
Your problem was due to the fact that your script did not send the appropriate headers.
A valid HTTP response consists of two sections, the headers and the body, separated by an empty line.
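For comparison, here is a minimal sketch of those two sections without any module; the helper name cgi_page is hypothetical. The empty line after the headers is what tells the web server where the body starts. The sketch uses \r\n line endings; a plain \n\n, as used elsewhere on this page, is also commonly accepted for CGI output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: headers, one empty line, then the body.
sub cgi_page {
    my ($body_html) = @_;
    my $headers = "Content-Type: text/html; charset=UTF-8\r\n";
    return $headers . "\r\n" . $body_html;    # the empty line ends the header section
}

print cgi_page(<<'HTML');
<!DOCTYPE html>
<html>
<head><title>Test</title></head>
<body>
<h2>PERL IT!</h2>
<p>this is some text that should get displayed in browser</p>
</body>
</html>
HTML
```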
You should make sure that you use a proper CGI processing module. CGI.pm is the de facto standard. However, it has a lot of historical baggage and CGI::Simple provides a cleaner alternative.
Using one of those modules, your script would have been:
#!/usr/bin/perl
use strict; use warnings;
use CGI::Simple;
my $cgi = CGI::Simple->new;
print $cgi->header, <<HTML;
<!DOCTYPE HTML>
<html>
<head><title>Test</title></head>
<body>
<h1>Perl CGI Script</h1>
<p>this is some text that should get displayed in browser</p>
</body>
</html>
HTML
Keep in mind that print has no problem with multiple arguments. There is no reason to learn to program like it's 1999.
Maybe it's because you didn't put your text between <body> tags. Also, you have to specify the content type as text/html.
Try this:
print "Content-type: text/html\n\n";
print "<html>";
print "<body>";
print "<h2>PERL IT!</h2>";
print "this is some text that should get displayed in browser";
print "</body>";
print "</html>";
Also, from the link rics gave,
Perl:
Executable: \xampp\htdocs and \xampp\cgi-bin
Allowed endings: .pl
so you should be accessing your script like:
http://localhost/cgi-bin/index.pl
I am just guessing.
Have you started the Apache server?
Is 88 the correct port for reaching your Apache?
You may also try http://localhost:88/perl/index.pl (that is, adding the script name to the address).
Check this documentation for help.