www::curl - how to upload (post) large files - perl

I use WWW::Curl to upload files:
use WWW::Curl::Easy 4.14;
use WWW::Curl::Form;
my $url = 'http://example.com/backups/?sid=12313qwed323';
my $params = {
name => 'upload',
action => 'keep',
backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
};
my $form = WWW::Curl::Form->new();
foreach my $k (keys %{$params}) {
if (ref $params->{$k}) {
$form->formaddfile(#{$params->{$k}}[0], $k, 'multipart/form-data');
} else {
$form->formadd($k, $params->{$k});
}
}
my $curl = WWW::Curl::Easy->new() or die $!;
$curl->setopt(CURLOPT_HTTPPOST, $form);
$curl->setopt(CURLOPT_URL, $url);
my $body;
$curl->setopt(CURLOPT_WRITEDATA, \$body);
my $retcode = $curl->perform();
my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
nothing special here and this code works well.
I want to upload large files and I don't want to preload everything in the memory. At least that is what I heard that libcurl is doing.
CURLOPT_READFUNCTION accepts callbacks which returns parts of the content. That means that I cannot use WWW::Curl::Form to set POST parameters but that I have to return the whole content through this callback. Is that right?
I think that the code could look like this:
use WWW::Curl::Easy 4.14;
my $url = 'http://example.com/backups/?sid=12313qwed323'
my $params = {
name => 'upload',
action => 'keep',
backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
};
my $fields;
foreach my $k (keys %{$params}) {
$fields .= "$k=".(ref $params->{$k} ? '#'.#{$params->{$k}}[0] : uri_escape_utf8($params->{$k}))."&";
}
chop($fields);
my $curl = WWW::Curl::Easy->new() or die $!;
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_POSTFIELDS, $fields); # is it needed with READFUNCTION??
$curl->setopt(CURLOPT_URL, $url);
my #header = ('Content-type: multipart/form-data', 'Transfer-Encoding: chunked');
$curl->setopt(CURLOPT_HTTPHEADER, \#header);
#$curl->setopt(CURLOPT_INFILESIZE, $size);
$curl->setopt(CURLOPT_READFUNCTION, sub {
# which data to return here?
# $params (without file) + file content?
return 0;
});
Which data does CURLOPT_READFUNCTION callback have to return? $params + File(s) content? In which format?
Do I really have to create the data (returned by CURLOPT_READFUNCTION) by myself or is there a simple way to create it in the right format?
Thanks

Test 16formpost.t is relevant. As you can see, it's completely disabled. This fact and my fruitless experiments with various return values for the callback function lets me believe the CURLOPT_READFUNCTION feature is known broken in the Perl binding.
I have to return the whole content through this callback. Is that right?
No, you can feed it the request body piecewise, suitable for chunked encoding. The callback will be necessarily called several times, according to the limit set in CURLOPT_INFILESIZE.
Which data does CURLOPT_READFUNCTION callback have to return?
A HTTP request body. Since you do a file upload, this means Content-Type multipart/form-data. Following is an example using HTTP::Message. CURLOPT_HTTPPOST is another way to construct this format.
use HTTP::Request::Common qw(POST);
use WWW::Curl::Easy 4.14;
my $curl = WWW::Curl::Easy->new or die $!;
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_URL, 'http://localhost:5000');
$curl->setopt(CURLOPT_HTTPHEADER, [
'Content-type: multipart/form-data', 'Transfer-Encoding: chunked'
]);
$curl->setopt(CURLOPT_READFUNCTION, sub {
return POST(undef, Content_Type => 'multipart/form-data', Content => [
name => 'upload',
action => 'keep',
backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
])->content;
});
my $r = $curl->perform;

The CURLOPT_READFUNCTION callback is only used for chunked tranfer encoding. It may work, but I haven't been able to get it to and found that doing so wasn't necessary anyway.
My use case was for upload of data to AWS, where it's not ok to upload the data as multi-part form data. Instead, it's a straight POST of the data. It does require that you know how much data you're sending the server, though. This seems to work for me:
my $infile = 'file-to-upload.json';
my $size = -s $infile;
open( IN, $infile ) or die("Cannot open file - $infile. $! \n");
my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_HEADER, 1);
$curl->setopt(CURLOPT_NOPROGRESS, 1);
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_URL, $myPostUrl);
$curl->setopt(CURLOPT_HTTPHEADER,
['Content-Type: application/json']); #For my use case
$curl->setopt(CURLOPT_POSTFIELDSIZE_LARGE, $size);
$curl->setopt(CURLOPT_READDATA, \*IN);
my $retcode = $curl->perform;
if ($retcode == 0) {
print("File upload success\n");
}
else {
print("An error happened: $retcode ".$curl->strerror($retcode)."\n");
}
The key is providing an open filehandle reference to CURLOPT_READDATA. After that, the core curl library handles the reads from it without any need for callbacks.

Related

In mojolicious, how to protect images from public view

Somebody can help me, please. I have app in mojolicious Lite
I wanna block images from all without session login
when i type http://m.y.i.p:3000/images/imageX.jpg
i wanna show images only with session login.
my html code is
shwo image 1
Same way as any other content. Set up a handler for requests, render (or don't render) the content.
get '/images/*img' => sub {
my $c = shift;
if (!$c->session("is_authenticated")) {
return $c->render( text => "Forbidden", status => 403 );
}
my $file = $c->param("img");
if (!open(my $fh, '<', $IMAGE_DIR/$file)) {
return $c->render( text => "Not found", status => 404 );
}
my $data = do { local $/; <$fh> };
close $fh;
$c->render( data => $data, format => 'jpg' );
};
Your request handler will take priority over the default handler that serves content from public folders, but once you have this handler in place, you don't need to store the files it serves in a public folder.
another sololution is
get '/pays/*img' => sub {
my $self = shift;
my $img = $self->param('img');
plugin 'RenderFile';
my #imgext = split(/\./, $img);
my $ext = $imgext[-1];
$self->render_file(
'filepath' => "/directiry/$img",
'format' => "$ext", # will change Content-Type "application/x-download" to "application/pdf"
'content_disposition' => 'inline', # will change Content-Disposition from "attachment" to "inline"
# delete file after completed
);
It is use plugin 'RenderFile';

How do I do a chunked transfer-encoding upload with WWW:Mechanize?

I'm attempting to use a particular web service, and I can successfully perform the upload with the following command:
curl -X POST --header "Transfer-Encoding: chunked" -d #Downloads/file.pdf https://some.webservice/upload
I get back a json response indicate success.
However, I'm unable to figure out how to do the same with WWW::Mechanize.
$mech->post("https://" . $server . "/upload", Content_Type => 'multipart/form-data', Content => [upID => $upid, name => $dlfile, userID => 0, userK => 0, file_0 => [$dlfile]]);
This receives a similar json response with a big fat error message in it.
Do I need to explicitly set the Transfer-Encoding header first? Is there some other trick to it? Google's not shedding much light on this, Perlmonks neither, and the documentation's a little obtuse.
You can do it using HTTP::Request::StreamingUpload
my $starttime = time();
my $req = HTTP::Request::StreamingUpload->new(
POST => $url,
path => $file,
headers => HTTP::Headers->new(
'Transfer-Encoding' => 'chunked'
),
);
my $gen = $req->content;
die unless ref($gen) eq "CODE";
my $total = 0;
$req->content(sub {
my $chunk = &$gen();
$total += length($chunk);
print "\r$total / $size bytes ("
. int($total/$size*100)
. "%) sent, "
. int($total/1000/(time()-$starttime+1))
. " k / sec ";
return $chunk;
});
my $resp = $ua->request($req);
print "\n";
unless ($resp->is_success) {
die "Failed uploading the file: ", $resp->status_line;
}
my $con = $resp->content;
return $con;
Do you really need WWW::Mechanize? It is a subclass of LWP::UserAgent with additional functionality that gives browser-like functionality like filling in and submitting forms, clicking links, a page history with a "back" operation etc. If you don't need all of that then you may as well use LWP::UserAgent directly
Either way, the post method is inherited unchanged from LWP::UserAgent, and it's fine to use it directly as you have done
The way to send a chunked POST is to set the Content to a reference to a subroutine. The subroutine must return the next chunk of data each time it is called, and finally ann empty string or undef when there is no more to send
Is the data supposed to be a JSON string?
It's easiest to write a factory subroutine that returns a closure, like this
sub make_callback {
my ($data) = shift;
sub { substr($data, 0, 512, "") }
}
Then you can call post like this
my $payload = to_json(...);
$mech->post(
"https://$server/upload",
Content_Type => 'multipart/form-data',
Content => make_callback($payload)
);
Please be aware that all of this is untested

Upload media to twitter using perl

Has anyone successfully uploaded media to Twitter, ie posted a tweet with an image using Perl?
I would like to upload images from my blog without doing it all manually.
update_with_media(status, media[]) is deprecated, and crashes. Twitter says to use plain update(), passing a media id. However it is first necessary to upload the media to get the id, and I can find no code examples.
Any experience in this area?
Cheers,
Peter
I thought I'd add an update here even though the thread is quite old. I too was looking for an answer to how to use Perl to upload media to twitter.
Net::Twitter is perfectly capable of sending PNG images up to Twitter, though the documentation is poor. The OP is correct that update_with_media is deprecated and crashed for us.
You do need to use the $nt->upload AND $nt->update methods combined. You upload the RAW PNG file encoded with base64, I could not get the RAW PNG file upload to work but no issues when base64 encoded. The upload method returns, on success, a JSON structure which has the media_ids in. These id's are then passed with the $nt->update method. Here's some actual code
use Net::Twitter;
use File::Slurp;
use MIME::Base64;
use Data::Dumper;
my $nt = Net::Twitter->new(
ssl => 1,
traits => [qw/API::RESTv1_1/],
consumer_key => $config->{twitter}{api_key},
consumer_secret => $config->{twitter}{api_secret},
access_token => $config->{twitter}{access_token},
access_token_secret => $config->{twitter}{access_token_secret},
);
my $filename = "<somelink to a PNG file>";
my $file_contents = read_file ($filename , binmode => ':raw');
my $status = $nt->upload(encode_base64($file_contents));
print "SendTweet: status = ".Dumper($status);
my $status2;
eval {
$status2 = $nt->update({status => $s , media_ids => $status->{media_id} });
print "status2 = ".Dumper($status2);
};
if ($#)
{
print "Error: $#\n";
}
The code is pulled directly from our working test code so should work. This code is purely proof of concept so you will need to add in all your twitter authentication etc.
We have only done PNG files but I see no reason why video etc shouldn't work fine as we simply followed the Twitter docs.
Rob
In the end I used readpipe with twurl. Had I known about twurl in the first place I likely would not have bothered with Net::Twitter! twurl works great, and returns a full json string to tell you what went wrong, if anything.
Pre-requisite: you need to get the Oauth keys for your twitter account. (See here - https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html). There are some step by step online exmaples elsewhere also that can help.
Here's the code I ended up with.
First call the tweet module (and these are not valid keys by the way - just insert yours)
use Jimtweet;
my $tweet=Jimtweet->new();
$tweet->consumer_key('KvfevhjwkKJvinvalidkeycvhejwkKJVCwe');
$tweet->consumer_secret('KvfevhjwkKJvcvnvalidkeyhejwkKJVCwe');
$tweet->oauth_token('KvfevhjwkKJvcvhenvalidkeyjwkKJVCwKvfevhjwkKJvcvhejwkKJVCweVU');
$tweet->oauth_token_secret('KvfevhjwkKJvcvhejwnvalidkeykKJVCwe');
my $res = $tweet->update_status($message, $ENV{DOCUMENT_ROOT}.$li);
Jimtweet is a free module I found online (I forget where), but I had to modify it to do the image upload. It follows below.
$message is the text status message to send
$li contains the local path to the file I want to upload. This is a file local on the server. $ENV{DOCUMENT_ROOT} contains the server path to the public HTML files on my website.
$res contains a JSON reply from twitter you can look at if you need to.
If you want to use this, cut & paste everything from 'package Jimtweet;' and on into a file called Jimtweet.pm which the above code should use. If your perl installation can't find the module try adding the line use lib "/home/your/path/to/jimtweet/directory;" before the use Jimtweet; line.
Twitter requires an image to be uploaded, it then returns a media_id, then you include the media_id in a regular message you want to post. See Jimtweet package below:
package Jimtweet;
#####JimTweet 0.1 By James Januszka 2010
#####Email:jimjanuszka#gmail.com
#####Twitter:#jamesjanuszka
#####Modifications by John Bell to include image upload. Twitter: #BellUkcbajr
use strict;
use warnings;
use LWP;
use HTTP::Headers;
use URI::Encode qw(uri_encode);
use URI::Escape qw(uri_escape);
use Digest::HMAC_SHA1;
####Constructor####
sub new {
my $self={};
$self->{OAUTH_VERSION}=uri_escape("1.0");
$self->{OAUTH_SIGNATURE_METHOD}=uri_escape("HMAC-SHA1");
$self->{OAUTH_TIMESTAMP}=undef;
$self->{OAUTH_NONCE}=undef;
$self->{AGENT}="jimtweet/0.1";
#####################
#Use this for status updates
$self->{URLx}="https://api.twitter.com/1.1/statuses/update.json";
#####################
#Use this for image upload
$self->{URL}="https://upload.twitter.com/1.1/media/upload.json";
$self->{BROWSER}=LWP::UserAgent->new(agent =>$self->{AGENT});
$self->{CONSUMER_KEY}=undef;
$self->{CONSUMER_SECRET}=undef;
$self->{OAUTH_TOKEN}=undef;
$self->{OAUTH_TOKEN_SECRET}=undef;
$self->{STATUS}=undef;
$self->{MEDIAurl}=undef;
bless($self);
return $self;
}
sub consumer_key{
my $self=shift;
if (#_) { $self->{CONSUMER_KEY}=uri_escape(shift) }
return $self->{CONSUMER_KEY};
}
sub consumer_secret{
my $self = shift;
if (#_) { $self->{CONSUMER_SECRET}=uri_escape(shift) }
return $self->{CONSUMER_SECRET};
}
sub oauth_token{
my $self = shift;
if (#_) { $self->{OAUTH_TOKEN}=uri_escape(shift) }
return $self->{OAUTH_TOKEN};
}
sub oauth_token_secret{
my $self = shift;
if (#_) { $self->{OAUTH_TOKEN_SECRET}=uri_escape(shift) }
return $self->{OAUTH_TOKEN_SECRET};
}
sub update_status(#){
sleep(2);
my $self = shift;
if (#_) { $self->{STATUS}=uri_escape(shift) }
if (#_) { $self->{MEDIAurl}=shift }
#Got parameters. Now create the POST to upload an image
my $seconds = time();
$self->{OAUTH_TIMESTAMP}=uri_escape($seconds);
$self->{OAUTH_NONCE}=$self->{OAUTH_TIMESTAMP};
#####################
#Use this for uploads. Parameters have to be in alphabetical order!
my $query=qq(oauth_consumer_key=$self->{CONSUMER_KEY}&oauth_nonce=$self->{OAUTH_NONCE}&oauth_signature_method=$self->{OAUTH_SIGNATURE_METHOD}&oauth_timestamp=$self->{OAUTH_TIMESTAMP}&oauth_token=$self->{OAUTH_TOKEN}&oauth_version=$self->{OAUTH_VERSION});
my $sig="POST&";
$sig .=uri_encode($self->{URL},1);
$sig .="&";
$sig .=uri_encode($query,1);
my $sig_key=$self->{CONSUMER_SECRET};
$sig_key .="&";
$sig_key .=$self->{OAUTH_TOKEN_SECRET};
my $hmac = Digest::HMAC_SHA1->new($sig_key);
$hmac->add($sig);
my $oauth_signature_base64=$hmac->b64digest;
$oauth_signature_base64 .="=";
my $utf8_oauth_signature_base64=uri_escape($oauth_signature_base64);
my $header=qq(OAuth oauth_nonce="$self->{OAUTH_NONCE}", oauth_signature_method="$self->{OAUTH_SIGNATURE_METHOD}", oauth_timestamp="$self->{OAUTH_TIMESTAMP}", oauth_consumer_key="$self->{CONSUMER_KEY}", oauth_token="$self->{OAUTH_TOKEN}", oauth_signature="$utf8_oauth_signature_base64", oauth_version="$self->{OAUTH_VERSION}");
my $res = $self->{BROWSER}->post(
$self->{URL},
'Authorization' => $header,
'content-type' => 'form-data',
'Content' => [ media => ["$self->{MEDIAurl}"] ]
);
use JSON;
my $response = decode_json ($res->content);
my $media_id = $response->{'media_id'};
$seconds = time();
$self->{OAUTH_TIMESTAMP}=uri_escape($seconds);
$self->{OAUTH_NONCE}=$self->{OAUTH_TIMESTAMP}
my $queryx=qq(media_ids=$media_id&oauth_consumer_key=$self->{CONSUMER_KEY}&oauth_nonce=$self->{OAUTH_NONCE}&oauth_signature_method=$self->{OAUTH_SIGNATURE_METHOD}&oauth_timestamp=$self->{OAUTH_TIMESTAMP}&oauth_token=$self->{OAUTH_TOKEN}&oauth_version=$self->{OAUTH_VERSION}&status=$self->{STATUS});
my $sigx="POST&";
$sigx .=uri_encode($self->{URLx},1);
$sigx .="&";
$sigx .=uri_encode($queryx,1);
my $hmacx = Digest::HMAC_SHA1->new($sig_key);
$hmacx->add($sigx);
my $oauth_signature_base64x=$hmacx->b64digest;
$oauth_signature_base64x .="=";
my $utf8_oauth_signature_base64x=uri_escape($oauth_signature_base64x);
my $headerx=qq(OAuth oauth_nonce="$self->{OAUTH_NONCE}", oauth_signature_method="$self->{OAUTH_SIGNATURE_METHOD}", oauth_timestamp="$self->{OAUTH_TIMESTAMP}", oauth_consumer_key="$self->{CONSUMER_KEY}", oauth_token="$self->{OAUTH_TOKEN}", oauth_signature="$utf8_oauth_signature_base64x", oauth_version="$self->{OAUTH_VERSION}");
#And done generating content. Now to POST to twitter.
$res = $self->{BROWSER}->post(
$self->{URLx},
'Authorization' => $headerx,
'content-type' => 'application/x-www-form-urlencoded',
'Content' => qq(media_ids=$media_id&status=$self->{STATUS})
);
return $res;
}
####Footer####
1; #so the require or use succeeds

Send a HTTP POST Request(xml data ) using WWW::Curl in perl

I want to use WWW::Curl instead of LWP::UserAgent to send a post request.
Below is the Code using LWP::UserAgent which works fine.
my $agent = LWP::UserAgent->new(agent => 'perl post');
push #{ $agent->requests_redirectable }, 'POST';
my $header = HTTP::Headers->new;
$header->header('Content-Type' => "text/xml; charset=UTF-8");
$header->content_encoding('gzip');
utf8::encode( my $utf8_content = $args{content} );
sinfo $utf8_content;
$error->description($utf8_content);
$error->log;
my $request = HTTP::Request->new(POST => $args{url}, $header, $utf8_content);
my $response = $agent->request($request);
I need to rewrite this code using WWW::Curl as Curl is faster than LWP.
I have tried the below code but it returns me code '35' as response, which
means the request is invalid.
my $curl = WWW::Curl::Easy->new();
$curl->setopt(WWW::Curl::Easy::CURLOPT_HEADER,1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_URL,$self->uri());
$curl->setopt(WWW::Curl::Easy::CURLOPT_POST, 1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_POSTFIELDS, $utf8_content);
my $response;
$curl->setopt(WWW::Curl::Easy::CURLOPT_WRITEDATA,\$response);
my $retcode = $curl->perform();
The data i pass in the post request ($utf8_content) is a xml string ,sample xml :
<Request>
<Source>
<RequestorID Password="PASS" Client="Client" EMailAddress="email#address.com"/>
<RequestorPreferences Language="en">
<RequestMode>SYNCHRONOUS</RequestMode>
</RequestorPreferences>
</Source>
<RequestDetails>
<SearchRequest>
<ItemDestination DestinationType="area" DestinationCode="XYZ"/>
</ItemDestination>
</SearchRequest>
</RequestDetails>
</Request>
Moreover the response too will be a xml string which can be retrieved from $response;
In theory, this should work, but doesn't. The problem is that $utf8_content_gzip contains a \0 in the middle and the C API truncates the request body. If this is a bug and not just a misunderstanding of mine how to talk to WWW::Curl, then either have the bug fixed or work around by simply not encoding the request.
use utf8;
use strictures;
use Devel::Peek qw(Dump);
use Encode qw(encode);
use HTTP::Response qw();
use IO::Compress::Gzip qw(gzip $GzipError);
use WWW::Curl::Easy qw();
my $utf8_content_gzip;
{
my $utf8_content = encode('UTF-8', '<root>Třistatřicettři stříbrných stříkaček stříkalo přes třistatřicettři stříbrných střech.</root>', Encode::LEAVE_SRC | Encode::FB_CROAK);
gzip(\$utf8_content, \$utf8_content_gzip)
or die sprintf 'gzip error: %s', $GzipError;
}
Dump $utf8_content_gzip;
my $xml;
{
my $curl = WWW::Curl::Easy->new;
$curl->setopt(WWW::Curl::Easy::CURLOPT_HEADER(), 1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_URL(), 'http://localhost:5000');
$curl->setopt(WWW::Curl::Easy::CURLOPT_HTTPHEADER(), ['Content-Type: text/xml; charset=UTF-8', 'Content-Encoding: gzip']);
$curl->setopt(WWW::Curl::Easy::CURLOPT_POST(), 1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_POSTFIELDS(), $utf8_content_gzip);
my $response;
$curl->setopt(WWW::Curl::Easy::CURLOPT_WRITEDATA(), \$response);
my $retcode = $curl->perform;
if (0 == $retcode) {
$response = HTTP::Response->parse($response);
$xml = $response->decoded_content;
} else {
die sprintf 'libcurl error %d (%s): %s', $retcode, $curl->strerror($retcode), $curl->errbuf;
}
}
Have you tried $curl->setopt(CURLOPT_SSLVERSION, CURL_SSLVERSION_SSLv3);?

How to post non-latin1 data to non-UTF8 site using perl?

I want to post russian text on a CP1251 site using LWP::UserAgent and get following results:
# $text="Русский текст"; obtained from command line
FIELD_NAME => $text # result: Г?в г'В?г'В?г'В?г?вєг?вёг?в? Г'В'Г?вчг?вєг'В?г'В'
$text=Encode::decode_utf8($text);
FIELD_NAME => $text # result: Р с?с?с?рєрёр? С'Рчрєс?с'
FIELD_NAME => Encode::encode("cp1251", $text) # result: Г?гіг+г+гЄгёгЏ ГІгҐгЄг+гІ
FIELD_NAME => URI::Escape::uri_escape_utf8($text) # result: D0%a0%d1%83%d1%81%d1%81%d0%ba%d0%b8%d0%b9%20%d1%82%d0%b5%d0%ba%d1%81%d1%82
How can I do this? Content-Type must be x-www-form-urlencoded. You can find similar form here, but there you can just escape any non-latin character using &#...; form, trying to escape it in FIELD_NAME results in 10561091108910891 10901077108210891 (every &, # and ; stripped out of the string) or 1056;усский текст (punctuation characters at the beginning of the string are stripped out) depending on what the FIELD_NAME actually is.
UPDATE: Anybody knows how to convert the following code so that it will use LWP::UserAgent::post function?
my $url=shift;
my $fields=shift;
my $request=HTTP::Request->new(POST => absURL($url));
$request->content_type('application/x-www-form-urlencoded');
$request->content_encoding("UTF-8");
$ua->prepare_request($request);
my $content="";
for my $k (keys %$fields) {
$content.="&" if($content ne "");
my $c=$fields->{$k};
eval {$c=Encode::decode_utf8($c)};
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
$content.="$k=".URI::Escape::uri_escape($c);
}
$request->content($content);
my $response=$ua->simple_request($request);
This code actually solves the problem, but I do not want to add the third request wrapper function (alongside with get and post).
One way around it appears to be (far from the best, I think) to use recode system command if you have it avialable. From http://const.deribin.com/files/SignChanger.pl.txt
my $boardEncoding="cp1251"; # encoding used by the board
$vals{'Post'} = `fortune $forunePath | recode utf8..$boardEncoding`;
$res = $ua->post($formURL,\%vals);
Another approach seems to be in http://mail2lj.nichego.net/lj.txt
my $formdata = $1 ;
my $hr = ljcomment_string2form($formdata) ;
my $req = new HTTP::Request('POST' => $ljcomment_action)
or die "new HTTP::Request(): $!\n" ;
$hr->{usertype} = 'user' ;
$hr->{encoding} = $mh->mime_attr('content-type.charset') ||
"cp1251" ;
$hr->{subject} = decode_mimewords($mh->get('Subject'));
$hr->{body} = $me->bodyhandle->as_string() ;
$req->content_type('application/x-www-form-urlencoded');
$req->content(href2string($hr)) ;
my $ljres = submit_request($req, "comment") ;
if ($ljres->{'success'} eq "OK") {
print STDERR "journal updated successfully\n" ;
} else {
print STDERR "error updating journal: $ljres->{errmsg}\n" ;
send_bounce($ljres->{errmsg}, $me, $mh->mime_attr("content-type.charset")) ;
}
Use WWW::Mechanize, it takes care of encoding (both character encoding and form encoding) automatically and does the right thing if a form element's accept-charset attribute is set appropriately. If it's missing, the form defaults to UTF-8 and thus needs correction. You seem to be in this situation. By the way, your example site's encoding is KOI8-R, not Windows-1251. Working example:
use utf8;
use WWW::Mechanize qw();
my $message = 'Русский текст';
my $mech = WWW::Mechanize->new(
cookie_jar => {},
agent => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/533.9 SUSE/6.0.401.0-2.1 (KHTML, like Gecko)',
);
$mech->get('http://zhurnal.lib.ru/cgi-bin/comment?COMMENT=/z/zyx/index_4-1');
$mech->current_form->accept_charset(scalar $mech->response->content_type_charset);
$mech->submit_form(with_fields => { TEXT => $message });
HTTP dump (essential parts only):
POST /cgi-bin/comment HTTP/1.1
Content-Length: 115
Content-Type: application/x-www-form-urlencoded
FILE=%2Fz%2Fzyx%2Findex_4-1&MSGID=&OPERATION=store_new&NAME=&EMAIL=&URL=&TEXT=%F2%D5%D3%D3%CB%C9%CA+%D4%C5%CB%D3%D
These functions solve the issue (first for posting application/x-www-form-urlencoded data and second for multipart/form-data):
#{{{2 postue
sub postue($$;$) {
my $url=shift;
my $fields=shift;
my $referer=shift;
if(defined $referer and $referer eq "" and defined $fields->{"DIR"}) {
$referer=absURL($url."?DIR=".$fields->{"DIR"}); }
else {
$referer=absURL($referer); }
my $request=HTTP::Request->new(POST => absURL($url));
$request->content_type('application/x-www-form-urlencoded');
$request->content_encoding("UTF-8");
$ua->prepare_request($request);
my $content="";
for my $k (keys %$fields) {
$content.="&" if($content ne "");
my $c=$fields->{$k};
if(not ref $c) {
$c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
$c=URI::Escape::uri_escape($c);
}
elsif(ref $c eq "URI::URL") {
$c=$c->canonical();
$c=URI::Escape::uri_escape($c);
}
$content.="$k=$c";
}
$request->content($content);
$request->referer($referer) if(defined $referer);
my $i=0;
print STDERR "Doing POST request to url $url".
(($::o_verbose>2)?(" with fields:\n".
::YAML::dump($fields)):("\n"))
if($::o_verbose>1);
REQUEST:
my $response=$ua->simple_request($request);
$i++;
my $code=$response->code;
if($i<=$o_maxtries and 500<=$code and $code<600) {
print STDERR "Failed to request $url with code $code... retrying\n"
if($::o_verbose>2);
sleep $o_retryafter;
goto REQUEST;
}
return $response;
}
#{{{2 postfd
sub postfd($$;$) {
my $url=absURL(shift);
my $content=shift;
my $referer=shift;
$referer=absURL($referer) if(defined $referer);
my $i=0;
print STDERR "Doing POST request (form-data) to url $url".
(($::o_verbose>2)?(" with fields:\n".
::YAML::dump($content)):("\n"))
if($::o_verbose>1);
my $newcontent=[];
while(my ($f, $c)=splice #$content, 0, 2) {
if(not ref $c) {
$c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
}
push #$newcontent, $f, $c;
}
POST:
my $response=$ua->post($url, $newcontent,
Content_type => "form-data",
((defined $referer)?(referer => $referer):()));
$i++;
my $code=$response->code;
if($i<=$o_maxtries and 500<=$code and $code<600) {
print STDERR "Failed to download $url with code $code... retrying\n"
if($::o_verbose>2);
sleep $o_retryafter;
goto POST;
}
return $response;
}