LWP GET large file download - perl

I have a special case for file download. I need to do chunked download for large files and also need to pass parameters to the CGI script prior to download.
It is really a REST interface. I have searched all over the Internet; there is plenty of material on the download part and plenty on the parameter part, but when I put them together I get errors. Also note that I do a POST in a similar way and it works fine. Here is my code snippet:
# $filename, $target, $url, $bs, etc. are all set...
my $bytes_received = 0;
open (FH, ">", "$filename") or $logger->error("Couldn't open $filename for writing: $!" );
my $ua = LWP::UserAgent->new();
my $res = $ua->get(
    $url,
    ':content_cb' => \&callback,
    'Content'     => {
        "api"    => 'olfs',
        "cmd"    => 'rfile',
        "target" => $target,
        "bs"     => $bs });
print $logger->info("$bytes_received bytes received");

sub callback {
    my ($chunk, $res) = @_;
    $bytes_received += length($chunk);
    print FH $chunk;
}
Here are the errors:
Not a SCALAR reference at /usr/local/share/perl5/HTTP/Message.pm line 163.
at /usr/local/share/perl5/HTTP/Message.pm line 163
HTTP::Message::add_content('HTTP::Request=HASH(0x1956a88)', 'HASH(0x7fdfda565e88)') called at /usr/local/share/perl5/HTTP/Request/Common.pm line 111
HTTP::Request::Common::_simple_req(undef, undef) called at /usr/local/share/perl5/HTTP/Request/Common.pm line 20
HTTP::Request::Common::GET('http://10.0.0.15:8084/cgi-bin/olss.cgi', 'Content', 'HASH(0x7fdfda565e88)') called at /usr/local/share/perl5/LWP/UserAgent.pm line 410
LWP::UserAgent::get('LWP::UserAgent=HASH(0x191a220)', 'http://10.0.0.15:8084/cgi-bin/olss.cgi', ':content_cb', 'CODE(0x1845818)', 'Content', 'HASH(0x7fdfda565e88)') called at ./olfs_get.pl line 72

$ua->get( $url )
$ua->get( $url , $field_name => $value, ... )
This method will dispatch a GET request on the given $url. Further arguments can be given to initialize the headers of the request.
There's no such thing as a Content header. ->post uses the Content argument to create the message body, and a GET request carries no message body, so Content is never used for GET requests. If you want to put the parameters into the URL instead, you can use URI.
$ua->post( $url, $field_name => $value, ... Content => \%form )
$ua->post( $url, $field_name => $value, ... Content => \@form )
$ua->post( $url, $field_name => $value, ... Content => $content )
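Since GET parameters have to travel in the query string, one way to combine the two parts is to build the URL with URI's query_form and keep the :content_cb callback for the chunked download. A rough sketch along those lines, reusing the variable names from the question (treat it as an illustration rather than a tested drop-in):

use strict;
use warnings;
use URI;
use LWP::UserAgent;

# Build the query string from the parameters instead of passing Content
my $uri = URI->new($url);
$uri->query_form(
    api    => 'olfs',
    cmd    => 'rfile',
    target => $target,
    bs     => $bs,
);

open my $fh, '>', $filename or die "Couldn't open $filename for writing: $!";
binmode $fh;

my $bytes_received = 0;
my $ua  = LWP::UserAgent->new;
my $res = $ua->get(
    $uri,
    ':content_cb' => sub {
        my ($chunk, $response) = @_;
        $bytes_received += length $chunk;
        print {$fh} $chunk;    # write each chunk as it arrives
    },
);

close $fh;
print "$bytes_received bytes received\n";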

Related

Adding a .zip file to the body of an LWP::UserAgent POST request

I believe I have a general Perl problem, rather than an LWP::UserAgent problem... however, it's somewhat complex.
The task is to write a test-script that does a SWORD deposit.
I create tests by first writing code to prove the thing works, then add in the Test::More wrappers to make it a test.
BACKGROUND
A SWORD deposit is simply an HTTP POST request with a bunch of defined headers, and the content of the body being the thing to be ingested. This all works fine: I can perform the actions through curl, and I've written scripts to do this.... but within a larger application environment (that'll be EPrints).
CODE
My problem, I believe, comes when I try to attach the contents of the file on the disk.
#!/home/cpan/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
##use WWW::Mechanize;
use File::Slurp;
use MIME::Base64;
my $auth = 'username:password';
my $domain = 'devel.example.com';
my $ua = LWP::UserAgent->new();
my $basedir = "./test_files";
my $package = 'http://opendepot.org/europePMC/2.0';
my $filename = "$basedir/PMC165035.zip";
my $mime = 'application/zip';
print "filename: $filename\n";
my $deposit_url = $domain . '/sword-app/deposit/archive';
my $file = read_file( $filename, { binmode => ':raw' } );
# Set up the SWORD deposit
my $autho = "Basic " . MIME::Base64::encode( $auth, '' );
my %headers = (
    'X-Packaging'         => $package,
    'X-No-Op'             => 'false',
    'X-Verbose'           => 'true',
    'Content-Disposition' => "filename=$filename",
    'Content-Type'        => $mime,
    'User-Agent'          => 'Broker Test Harness',
    'Authorization'       => $autho,
);
my $r = $ua->post( $deposit_url, %headers, Content => $file );
# DEBUG TEST
write_file('foo.zip', $file);
my $ret = $r->decoded_content;
print "Content: $ret\n";
if ( $r->is_success ) { print "Deposited $package successfully" }
WHAT WORKS, WHAT DOESN'T
This code is lifted pretty much directly from working code I have - the only difference is that the working code gets the content for $file via an object-call within EPrints.
I know the file exists on disk: if I do an ls -l on the filename printed, I can see the file, and it's readable.
In the code above, there is a line write_file('foo.zip', $file); - that writes a file which unzip -l foo.zip happily tells me has 3 files in it.
The line print "Content: $ret\n"; should print an atom response - for me, it prints nothing....
The Access log reports an error 500, but there's diddly-squat in the error-log.
THE HELP
What I need to know is how I get the actual contents of the .zip file into the content part of the LWP::UserAgent post request...
(I'm going to spend much time not trying to dig into EPrints, to track where the error-500 is coming from, and why nothing appears in the log file.... but that's probably going to be down to an issue with what's been posted)
The solution lies in realizing what LWP POST is doing.
my $filename = "$basedir/PMC165035.zip";
my $file = read_file( $filename, { binmode => ':raw' } );
my %headers = (
    'X-Packaging'         => $package,
    'X-No-Op'             => 'false',
    'X-Verbose'           => 'true',
    'Content-Disposition' => "filename=$filename",
    'Content-Type'        => $mime,
    'User-Agent'          => 'Broker Test Harness',
    'Authorization'       => $autho,
);
It all works by setting $filename to something like /home/services/foo/testing/test_files/PMC165035.zip and passing this (full) filename to the server example.com.
The problem is that the server is looking for a filename, not a filename-with-path... so when the service does its thing with the file by dumping the content into its temporary upload location, it then goes looking for ~~~temp_location/home/services/foo/testing/test_files/PMC165035.zip and can't find it!
The solution is to read in the file as before, but ensure that the filename given in the headers is just the filename, not one with a path.
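In code that just means stripping the path off before building the Content-Disposition header, for example with File::Basename; a minimal sketch using the variables already defined in the question:

use File::Basename qw(basename);

my $file     = read_file( $filename, { binmode => ':raw' } );
my $basename = basename($filename);    # "PMC165035.zip", no path

my %headers = (
    'X-Packaging'         => $package,
    'X-No-Op'             => 'false',
    'X-Verbose'           => 'true',
    'Content-Disposition' => "filename=$basename",    # name only, no path
    'Content-Type'        => $mime,
    'User-Agent'          => 'Broker Test Harness',
    'Authorization'       => $autho,
);

my $r = $ua->post( $deposit_url, %headers, Content => $file );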

Perl LWP::UserAgent only reading first line of posted jpg file

I'm sure this has been asked, but I could not find a good answer to my question. I've got two scripts: one makes a POST using LWP::UserAgent and the other basically receives the data; in this case I'm just looking to write a file. The file does get written, but it can't be opened, and its size is 1262 bytes, which leads me to believe only some of it is being read.
Here's what I'm doing (it's worth noting that I was uri/base64 encoding the file and passing everything via JSON, which was working, but my new task is to split the files out and pass everything as form params):
post script:
open (IMAGE, "./flower.jpg") or die "$!";
$raw_string1 = do{ local $/ = undef; <IMAGE>; };
my $req = HTTP::Request->new(POST => $url);
$req->content_type("application/x-www-form-urlencoded");
$req->content("json_string=$json&file_1=$raw_string1");
my $ua = LWP::UserAgent->new;
$res = $ua->request($req);
print $res->content;
Receiver script:
$cgi = CGI->new;
my $json_post = $cgi->param('json_string');
my $file_1 = $cgi->param('file_1');
open my $fh, '>', "$path/flower.jpg" or die $!;
binmode $fh;
print $fh $file_1;
close $fh;
Thanks for help in advance!
As previously mentioned, you have an encoding problem. The solution is simple:
use HTTP::Request::Common qw(POST);

my $req = POST($url, [
    json_string => $json,
    file_1      => $raw_string1,
]);
which is short for
my $req = POST($url,
    Content_Type => 'application/x-www-form-urlencoded',
    Content      => [
        json_string => $json,
        file_1      => $raw_string1,
    ],
);
It's far more typical to use multipart/form-data to upload files, though. And if you do, you can even let HTTP::Request::Common load the file for you!
my $req = POST($url,
    Content_Type => 'multipart/form-data',
    Content      => [
        json_string => $json,
        file_1      => [ 'flower.jpg' ],
    ],
);
CGI.pm will handle that no problem. See the section of the docs titled "PROCESSING A FILE UPLOAD FIELD".
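For reference, a sketch of what the receiving side might look like once the upload arrives as multipart/form-data; CGI.pm's upload() then hands you a filehandle rather than the raw bytes ($path here is just a placeholder for wherever the original receiver writes its files):

use strict;
use warnings;
use CGI;

my $cgi  = CGI->new;
my $json = $cgi->param('json_string');
my $path = '/tmp';    # placeholder for the original $path

# upload() returns a filehandle for a file-upload field
my $in = $cgi->upload('file_1') or die "file_1 was not uploaded";

open my $out, '>', "$path/flower.jpg" or die $!;
binmode $out;

my $buf;
print {$out} $buf while read( $in, $buf, 64 * 1024 );
close $out;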
You're not encoding the $raw_string1 data before sticking it into the HTTP POST body. That means that if the data in flower.jpg includes a hex 0x26 byte (the & char) -- at position 1263, let's say -- then the POST data will look like this:
json_string={ ... JSON data ... }&file_1=...1262 bytes of raw JPEG data...&...more JPEG data...
... which means that anything parsing the form body for form variables will truncate file_1 after 1262 bytes.
I'd recommend continuing to encode the $raw_string1 data with base64 or something similar, even if you pass it through as its own POST variable rather than adding it to the JSON data.
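If you do keep the hand-built application/x-www-form-urlencoded body, the values have to be escaped one way or another. A sketch of the base64 route suggested above (MIME::Base64 and URI::Escape are not in the original post script, so treat the exact calls as an illustration):

use MIME::Base64 qw(encode_base64);
use URI::Escape  qw(uri_escape);

# Encode the raw JPEG bytes so no '&' or '=' characters can break the form body
my $encoded = encode_base64( $raw_string1, '' );    # '' => no line breaks

$req->content( 'json_string=' . uri_escape($json)
             . '&file_1='     . uri_escape($encoded) );

# The receiver then reverses it:
#   my $file_1 = decode_base64( $cgi->param('file_1') );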

How to quote this parameter in perl module

I'm trying to use the AnyEvent::Twitter::Stream module and I want to reference a file that lists the Twitter uids I want to follow. I can put the uids in the code itself and it works as follows:
my $done = AnyEvent->condvar;
my $nt_filter = AnyEvent::Twitter::Stream->new(
    username => $cf{account},
    password => $cf{password},
    method   => 'filter',
    follow   => '15855509,14760150,18598536',
    on_tweet => sub {
        # some code.....
    },
    on_error => sub {
        my $error = shift;
        debug "ERROR: $error";
    },
    timeout => 45,
);
$done->recv;
But when I try to do the same using a file as such:
my $done = AnyEvent->condvar;
my $nt_filter = AnyEvent::Twitter::Stream->new(
    open UID_FILE, "/tmp/uids" or die $!;
    my @uid_line = <UID_FILE>;
    username => $cf{account},
    password => $cf{password},
    method   => 'filter',
    follow   => @uid_file,
    on_tweet => sub {
        # some code....
    },
    on_error => sub {
        my $error = shift;
        debug "ERROR: $error";
    },
    timeout => 45,
);
$done->recv;
it fails. The uids file has the following contents:
'15855509,14760150,18598536'
I'm getting a 406 error from Twitter, suggesting the format is not correct. I'm guessing the quotes are not correct somehow?
The AnyEvent::Twitter::Stream module subclasses AnyEvent and you don't need to access the base module at all. All the functionality is provided by the new method and the callbacks that you specify there, and you shouldn't call AnyEvent->condvar or $done->recv.
The open call and assignment to @uid_line don't belong inside the call to AnyEvent::Twitter::Stream->new.
Furthermore, the variable you are using to supply the value for the follow parameter is @uid_file instead of @uid_line.
You must use strict; and use warnings; at the start of your programs, especially if you are asking for help with them. This will trap simple mistakes like this that you could otherwise overlook.
You can't in general use an array to supply a single scalar value. In this instance it may be OK as long as the file has only a single line (so there is only one element in the array) but there is a lot that could go wrong.
In addition you are passing single-quotes in the value that don't belong there: they appear in the code only to mark the start and end of the Perl string.
I suggest you read all decimal strings from your file like this
open my $uid_fh, '<', '/tmp/uids' or die $!;
my @uids;
push @uids, /\d+/g for <$uid_fh>;
(Note that these lines belong before the call to AnyEvent::Twitter::Stream->new)
Then you can supply the follow parameter by writing
follow => join(',', @uids),
I hope this is clear. Please ask again if you need further help.
Edit
These changes incorporated into your code should look like this, but it is incomplete and I cannot guarantee that it will work properly.
use strict;
use warnings;

use AnyEvent::Twitter::Stream;

open my $uid_fh, '<', '/tmp/uids' or die $!;
my @uids;
push @uids, /\d+/g for <$uid_fh>;

my %cf = (
    account  => 'myaccount',
    password => 'password',
);

my $nt_filter = AnyEvent::Twitter::Stream->new(
    username => $cf{account},
    password => $cf{password},
    method   => 'filter',
    follow   => join(',', @uids),
    on_tweet => sub {
        # some code....
    },
    on_error => sub {
        my $error = shift;
        debug "ERROR: $error";
    },
    timeout => 45,
);
It looks to me like you are trying to pass an array in a scalar context so it is possible that you are setting follow to 1, the number of elements in the array (the number of lines in the file).
Assuming there is only a single line of IDs in your file, does the following work:
open UID_FILE, "/tmp/uids" or die $!;

# Load the first line of uids and drop the trailing newline
my $uid_line = <UID_FILE>;
chomp $uid_line;

# Remove apostrophes from input.
# This may or may not be necessary
$uid_line =~ s/'//g;

# Don't forget to close the file
close(UID_FILE);
my $done = AnyEvent->condvar;
my $nt_filter = AnyEvent::Twitter::Stream->new(
    username => $cf{account},
    password => $cf{password},
    method   => 'filter',
    follow   => $uid_line,
    on_tweet => sub {
        # some code....
    },
    on_error => sub {
        my $error = shift;
        debug "ERROR: $error";
    },
    timeout => 45,
);
$done->recv;
You will probably need to strip the leading and trailing apostrophes from your input line. For example:
$uid_line =~ s/^'//;
$uid_line =~ s/'$//;
You could probably get away with just removing all apostrophes, e.g.:
$uid_line =~ s/'//g;

How do I upload a file from the local machine to Sharepoint using Perl SOAP::Lite?

#use SOAP::Lite ( +trace => all, maptype => {} );
use SOAP::Lite maptype => {};
use LWP::UserAgent;
use HTTP::Request::Common;
use MIME::Base64 qw(encode_base64);    # needed for encode_base64() below

# credentials file
require "c:\\test\\pass.pl";
my $userAgent = LWP::UserAgent->new(keep_alive => 1);
sub SOAP::Transport::HTTP::Client::get_basic_credentials {
return $username => $password;
}
my $soap
= SOAP::Lite
->uri('<mysite>/_vti_bin/lists.asmx')
->on_action(sub {join '/', 'http://schemas.microsoft.com/sharepoint/soap/CopyIntoItemsLocal', $_[1]})
->proxy('<mysite>/_layouts/viewlsts.aspx?BaseType=0', keep_alive => 1);
# my $method = SOAP::Data->name('CopyIntoItemsLocal')
#     ->attr({xmlns => 'http://schemas.microsoft.com/sharepoint/soap/'});
# my @params = (SOAP::Data->name(SourceUrl => $source),
#               SOAP::Data->name(DestinationUrl => $destination) );
# print $soap->call($method => @params)->result;
my $fileName = 'c:\test\abc.txt';
my $destDir = "<mysite>/Lists/sharepoint1/";
# load and encode the file data
my $data;
my $buf;
open(FILE, $fileName) or die "$!";
binmode FILE;
# read in multiples of 57 bytes so encode_base64 adds no padding
# except at the very end of the stream
while (read(FILE, $buf, 60 * 57)) {
    $data .= encode_base64($buf);
}
close(FILE);
#make the call
print "uploading $fileName...";
$lists = $soap->GetList();
my $method = SOAP::Data->name('CopyIntoItemsLocal')->attr({xmlns => 'http://schemas.microsoft.com/sharepoint/soap/'});
my @params = (
    SOAP::Data->name('SourceUrl')->value($fileName)->type(''),
    SOAP::Data->name('DestinationUrls')->type('')->value(
        \SOAP::Data->name('string')->type('')->value($destDir . $fileName)
    ),
    SOAP::Data->name('Fields')->type('')->value(
        \SOAP::Data->name('FieldInformation')->type('')->attr({Type => 'File'})->value('')
    ),
    SOAP::Data->name('Stream')->value("$data")->type('')
);

# print results
print $soap->call($method => @params)->result;
# print $response->headerof('//CopyResult')->attr->{ErrorCode};
#use SOAP::Lite ( +trace => all, maptype => {} );
use SOAP::Lite maptype => {};
use LWP::UserAgent;
use HTTP::Request::Common;
use MIME::Base64 qw(encode_base64);
require "c:\\test\\pass.pl";
my $userAgent = LWP::UserAgent->new(keep_alive=>1);
#setup connection
sub SOAP::Transport::HTTP::Client::get_basic_credentials {
return $username => $password;
}
my $soap = SOAP::Lite
-> uri('http://<mysite>')
-> on_action( sub{ join '/', 'http://schemas.microsoft.com/sharepoint/soap', $_[1] })
-> proxy('http://<mysite>/_vti_bin/lists.asmx',keep_alive => 1);
$lists = $soap->GetListCollection();
quit(1, $lists->faultstring()) if defined $lists->fault();

my @result = $lists->dataof('//GetListCollectionResult/Lists/List');
foreach my $data (@result) {
    my $attr = $data->attr;
    foreach my $a (qw(Title Description DefaultViewUrl Name ID WebId ItemCount)) {
        printf "%-16s %s\n", $a, $attr->{$a};
    }
    print "\n";
}
The authentication seems to be working. At first I thought the GetListCollection web service was working, since when I made a call using it I got a page back, but I think the call is just returning the page I specified in the proxy argument.
I am able to get the collection of lists on the particular SharePoint site.
I have used GetListCollection; however, I did not really understand the code that prints the list. I just copied it from squish.net. Now I am trying to use the CopyIntoItemsLocal web service.
We have a repository of files on one server (SVN), and I have to write a Perl script which, when executed, will copy the files and directories from SVN to SharePoint along with the directory structure.
I will appreciate any help or tips. Since it is a big task, I am doing it in modules.
I would start by using soapUI (formerly by Eviware, now by SmartBear), an open-source SOAP testing tool. It will allow you to send SOAP transactions back and forth without any other user interface. Once you are sure your transactions work and you can parse the data to get what you want, then I would move to Perl to automate those transactions.
This helps you eliminate errors in your requests early on, figure out how to parse responses, and familiarize yourself with the API.

How do I work with just one key and value from Data::Dumper output

I have Data::Dumper outputting a remotely hosted XML file into a local text file, and I am getting the following:
$VAR1 = {
    'resource' => {
        '005cd410-41d6-4e3a-a55f-c38732b73a24.xml' => {
            'standard' => 'DITA',
            'area'     => 'holding',
            'id'       => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
        },
        '003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml' => {
            'standard' => 'DITA',
            'area'     => 'holding',
            'id'       => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
        },
etc. What I want to do is work with just one key and value in each resource, i.e. pick out the id and then create a URL from that.
I would normally use a regex on the file and pull out the info I need, but I'm thinking there must be an easier/proper way; I just can't think of the right term to use in a search and am therefore not finding it.
Here is the code I am using to write this output to a file:
#-----------------------------------------------
sub request_url {
#-----------------------------------------------
    my $useragent = LWP::UserAgent->new;
    my $request = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
    $resource = $useragent->request( $request );
}

#-----------------------------------------------
sub file_write {
#-----------------------------------------------
    open OUT, ">$OUT" or Log_message ("\n$DATE - $TIME - Could not create filelist.doc \t");
    Log_message ("\n$DATE - $TIME - Opened the output file");
    print OUT Dumper (XML::Simple->new()->XMLin( $resource->content ));
    Log_message ("\n$DATE - $TIME - Written the output file");
}
thanks
I'm not really understanding your question, but I'm guessing you want to access some data from the hash.
You don't need a regex or other strange stuff; just `do` your data and get the value from the hashref you get back:
A simple one liner as an example (assuming your file is called `dumper.out`):
perl -Mstrict -wE 'my $hashref = do{ do "dumper.out" }; say $hashref->{resource}{"005cd410-41d6-4e3a-a55f-c38732b73a24.xml"}{id}'
HTH, Paul
Maybe you want to walk the data structure built by XML::Simple.
Each resource is inside a hashref that you get via the resource key of the $doc data structure.
use XML::Simple;
use LWP;
use Data::Dumper;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
my $res = $ua->request( $req );

my $xs  = XML::Simple->new();
my $doc = $xs->XMLin( $res->content );

printf "resources: %s\n", scalar keys %{ $doc->{ resource } };

foreach ( keys %{ $doc->{ resource } } ) {
    printf "resource => %s, id => %s\n", $_, $doc->{ resource }->{ $_ }->{ id };
}
The output is this:
resources: 7
resource => 005cd410-41d6-4e3a-a55f-c38732b73a24.xml, id => Comp_UKCLRONLINE_UKCLR_2000UKCLR0278
resource => 003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0059
resource => 0033d4d3-c397-471f-8cf5-16fb588b0951.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_67
resource => 002a770a-db47-41ef-a8bb-0c8aa45a8de5.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_308
resource => 000fff79-45b8-4ac3-8a57-def971790f16.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0502
resource => 00493372-c090-4734-9a50-8f5a06489591.xml, id => Comp_UKCLRONLINE_COMPCS_2010_10_0002
resource => 004377bf-8e24-4a69-9411-7c6baca80b87.xml, id => Comp_CLJONLINE_CLJ_2002_01_11
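To go from each id to a URL, it is just string-building inside that same loop; here is a sketch with a made-up base URL, since the question doesn't say what the real one should look like:

# Hypothetical base URL -- substitute whatever your site actually uses
my $base = 'http://example.com/content/';

foreach my $file ( keys %{ $doc->{ resource } } ) {
    my $id  = $doc->{ resource }{ $file }{ id };
    my $url = $base . $id;
    print "$url\n";
}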