Uploading large files with WebService::Dropbox - perl

Can someone here give me some example code for using the WebService::Dropbox module to upload files bigger than 1GB?
I followed the instructions and successfully uploaded files of less than 150MB, but I don't understand how to upload larger files.

The module documentation says this about the upload method:
Do not use this to upload a file larger than 150 MB. Instead, create an upload session with upload_session/start.
And this is presumably why you have mentioned 150MB in your question.
The documentation for upload_session has this:
Uploads large files by upload_session API
# File Handle
my $content = IO::File->new('./mysql.dump', '<');

my $result = $dropbox->upload_session($path, $content);

my $result = $dropbox->upload_session($path, $content, {
    mode       => 'add',
    autorename => JSON::true,
    mute       => JSON::false,
});
Note that, just like the documentation for upload, those two examples of calling upload_session are alternatives: you should choose the second only if you have special requirements that call for non-default option values.
There is also no need to use IO::File to open a file: the standard Perl open call will work fine, and you should add a :raw layer whether you are using IO::File or not, like this:
open my $content, '<:raw', './mysql.dump' or die $!;
There is also no need for JSON::true and JSON::false: a simple 1 and 0 will do fine.
This is pretty much identical to the upload use case, which you say you have working fine. What exactly are you having problems with?
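Putting those pieces together, here is a minimal sketch of the whole flow; the app key, secret, access token, dump file and Dropbox path are all placeholders:

use strict;
use warnings;
use WebService::Dropbox;

# Placeholder credentials -- substitute your own
my $dropbox = WebService::Dropbox->new({ key => 'APP_KEY', secret => 'APP_SECRET' });
$dropbox->access_token('ACCESS_TOKEN');

# Open the local file with a :raw layer so no line-ending translation happens
open my $content, '<:raw', './mysql.dump' or die "Cannot open dump: $!";

# upload_session sends the file in chunks via the upload_session API,
# so it works for files larger than 150 MB
my $result = $dropbox->upload_session('/backups/mysql.dump', $content)
    or die $dropbox->error;

close $content;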

Related

PerlMagick Chokes on ICO File?

I've been working on a script that retrieves favicons from sites, which is now mostly working, but I've run into a huge roadblock. When I call on the ImageMagick module in Perl, it doesn't seem to know what to do with the venerable favicon.ico file (everything is working great when a site has a non-ICO favicon). I can find lots of information on converting to ICO to create a favicon, but not much about converting from ICO.
After I retrieve the favicon, I use PerlMagick's ping function to figure out what kind of file I'm dealing with (so I'm not dependent on the icon's server to report accurately):
use Image::Magick;
my $im = Image::Magick->new();
my ($width, $height, $size, $format) = $im->Ping( $saveFile );
When the file is an ICO file, $format comes back empty (the server I'm requesting it from reports it as image/x-icon). I also have a little subroutine that creates JPEG thumbnails of everything I download. It works great on non-ICO files, but ImageMagick creates a blank file when converting from an ICO:
open my $in, '<:raw', $params->{'openFile'}
    or die "Could not open file '" . $params->{'openFile'} . "'.";
my $imageData = do { local $/; <$in> };
close $in;
my $image = Image::Magick->new;
$image->BlobToImage($imageData);
$image->SetAttribute(quality => 80);
$image->SetAttribute(compression => 'JPEG');
$image->SetAttribute(geometry => $thumbnailWidth . "x" . $thumbnailHeight);
$image->Thumbnail();
my $thumbnailData = $image->ImageToBlob();
open my $out, '>:raw', $params->{'saveFile'}
    or die "Could not open file '" . $params->{'saveFile'} . "'.";
print $out $thumbnailData;
close $out;
Do I need to somehow coax ImageMagick into recognizing the file? I've been saving the favicons I download and the initial file is a valid ICO, even though ImageMagick won't recognize it.
Update: Here is a link to one of the ico files that is acting up. All the ico files I've tried have acted up, however.
If I try the command line ImageMagick convert tool, here is the result:
[root@local favicons]# convert 1299 1299-jpg.jpg
convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/564.
convert: no images defined `1299-jpg.jpg' @ error/convert.c/ConvertImageCommand/3235.
Based on @MarkSetchell's comments, I can add code to deal with the issue laid out above. I had been depending on PerlMagick's ping function to determine the file type, hoping to avoid possible bad information from a server I connect to. What I've done now is examine the Content-Type header if ImageMagick cannot determine the file type and return it in $format:
if ((! $format) and (($mech->content_type() eq "image/x-icon") or ($mech->content_type() eq "image/vnd.microsoft.icon"))) {
    $format = "ICO";
}
I then manually pass along the ICO format to ImageMagick before giving it the file blob:
my %imParam;
%imParam = ( 'magick' => 'ico' ) if ($params->{'format'} eq "ICO");
my $image = Image::Magick->new( %imParam );
This seems to be working so far. Thankfully on GIF, PNG, SVG and JPEG, ImageMagick is working fine on its own, which is even better, since I'd rather trust ImageMagick than the remote server's headers.
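For reference, here is a condensed sketch of that fallback logic in one place; $mech and $saveFile stand for whatever user agent and download path the script already uses:

use Image::Magick;

# Ask ImageMagick first; fall back to the server's Content-Type header
my $probe = Image::Magick->new();
my (undef, undef, undef, $format) = $probe->Ping($saveFile);

if (!$format && $mech->content_type() =~ m{^image/(x-icon|vnd\.microsoft\.icon)$}) {
    $format = 'ICO';
}

# Hint the decoder explicitly when we had to guess
my %imParam = (defined $format && $format eq 'ICO') ? (magick => 'ico') : ();
my $image   = Image::Magick->new(%imParam);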

Change output filename from WGET when using input file option

I have a Perl script that gets some image URLs, puts the URLs into an input file, and runs wget with the --input-file option. This works perfectly... or at least it did as long as the image filenames were unique.
I have a new company sending me data and they use a very TROUBLESOME naming scheme. All files have the same name, 0.jpg, in different folders.
for example:
cdn.blah.com/folder/folder/202793000/202793123/0.jpg
cdn.blah.com/folder/folder/198478000/198478725/0.jpg
cdn.blah.com/folder/folder/198594000/198594080/0.jpg
When I run my script with this, wget works fine and downloads all the images, but they are titled 0.jpg.1, 0.jpg.2, 0.jpg.3, etc. I can't just count them and rename them because files can be broken, not available, whatever.
I tried running wget once for each file with -O, but it's embarrassingly slow: starting the program, connecting to the site, downloading, and ending the program. Thousands of times. It's an hour vs minutes.
So, I'm trying to find a method to change the output filenames from wget without it taking so long. The original approach works so well that I don't want to change it too much unless necessary, but I am open to suggestions.
Additional:
LWP::Simple is too simple for this. Yes, it works, but very slowly. It has the same problem as running individual wget commands: each get() or getstore() call makes the system reconnect to the server. The files are so small (60kB on average) and there are so many to process (1851 for this one test file alone) that the connection time is considerable.
The filename I will be using can be found with /\/(\d+)\/(\d+.jpg)/i, where the new filename will simply be $1$2, giving 2027931230.jpg. Not really important for this question.
I'm now looking at LWP::UserAgent with LWP::ConnCache, but it times out and/or hangs on my PC. I will need to adjust the timeout and retry values. The inaugural run of the code downloaded 693 images (43MB) in just a couple of minutes before it hung. Using LWP::Simple, I only got 200 images in 5 minutes.
use LWP::UserAgent;
use LWP::ConnCache;

chomp(@filelist = <INPUTFILE>);

my $browser = LWP::UserAgent->new;
$browser->conn_cache(LWP::ConnCache->new());

foreach (@filelist) {
    /\/(\d+)\/(\d+\.jpg)/i;
    my $newfilename = $1 . $2;
    $response = $browser->mirror($_, $folder . $newfilename);
    die 'response failure' if $response->is_error();
}
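Since the hangs seem to come from the default timeout, here is a variant worth trying; the timeout value, keep_alive count, input filename and download folder are only placeholders to illustrate the knobs:

use strict;
use warnings;
use LWP::UserAgent;

# keep_alive gives LWP::UserAgent an in-memory connection cache without
# creating an LWP::ConnCache object by hand
my $browser = LWP::UserAgent->new(
    keep_alive => 10,    # cache up to 10 persistent connections
    timeout    => 30,    # give up on a stalled request after 30 seconds
);

open my $in, '<', 'inputfile.txt' or die "Cannot open input file: $!";
while (my $url = <$in>) {
    chomp $url;
    next unless $url =~ m{/(\d+)/(\d+\.jpg)}i;
    my $newfilename = $1 . $2;

    # mirror() skips the download if the local copy is already up to date
    my $response = $browser->mirror($url, "downloads/$newfilename");
    warn "Failed $url: " . $response->status_line if $response->is_error;
}
close $in;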
LWP::Simple's getstore function allows you to specify a URL to fetch from and the filename to store the data from it in. It's an excellent module for many of the same use cases as wget, but with the benefit of being a Perl module (i.e. no need to outsource to the shell or spawn off child processes).
use LWP::Simple;

# Grab the filename from the end of the URL
my $filename = (split '/', $url)[-1];

# If the file exists, increment its name
while (-e $filename) {
    $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
        or die "Unexpected filename encountered";
}

getstore($url, $filename);
The question doesn't specify exactly what kind of renaming scheme you need, but this will work for the examples given by simply incrementing the filename until the current directory doesn't contain that filename.

How do I properly format plain text data for a simple Perl dictionary app?

I have a very simple dictionary application that does search and display. It's built with the Win32::GUI module. I put all the plain text data needed for the dictionary under the __DATA__ section. The script itself is very small, but with everything under the __DATA__ section its size reaches 30 MB. In order to share the work with my friends, I then packed the script into a stand-alone executable using the pp utility of the PAR::Packer module with the highest compression level (9), and now I have a single-file dictionary app of about 17MB.
But although I'm very comfortable with the idea of a single-file script, placing such a huge amount of text data under the script's DATA section does not feel right. For one thing, when I try opening the script in Padre (Notepad++ is okay), I get an error like this:
Can't open my script as the script is over the arbitrary file size limit which is currently 500000.
My questions:
Does moving everything under the DATA section to a separate text file bring me any extra benefits besides eliminating Padre's file-opening issue?
If I do so, what should I do to reduce the size of the separate file? Zip it and uncompress it while doing search and display?
How do people normally format the text data needed for a dictionary application?
Any comments, ideas or suggestions? Thanks like always :)
If I do so, what should I do to reduce the size of the separate file? Zip it and uncompress it while doing search and display?
Well, it depends on WHY you want to reduce the size. If it is to minimize disk space usage (a rather weird goal most of the time these days), then zip/unzip is the way to go.
However if the goal is to minimize memory usage, then a better approach is to split up the dictionary data into smaller chunks (for example indexed by a first letter), and only load needed chunks.
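As a rough illustration of that chunking idea (the file layout and names here are made up, e.g. one tab-separated file per initial letter such as dict/a.txt):

use strict;
use warnings;

my %chunk_cache;    # first letter => { word => definition }

sub lookup {
    my ($word) = @_;
    my $letter = lc substr $word, 0, 1;

    # Load the chunk for this letter only the first time it is needed
    $chunk_cache{$letter} //= do {
        my %defs;
        if (open my $fh, '<:encoding(UTF-8)', "dict/$letter.txt") {
            while (my $line = <$fh>) {
                chomp $line;
                my ($w, $def) = split /\t/, $line, 2;
                $defs{$w} = $def;
            }
        }
        \%defs;
    };

    return $chunk_cache{$letter}{$word};
}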
How do people normally format the text data needed for a dictionary application?
IMHO the usual approach is the logical conclusion of the approach mentioned above (partitioned and indexed data): using a back-end database, which lets you retrieve only the data that is actually needed.
In your case probably something simple like SQLite or Berkeley DB/DBM files should be OK.
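A minimal SQLite sketch, assuming the DBD::SQLite driver and a hypothetical two-column words table in dictionary.db:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=dictionary.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
my $sth = $dbh->prepare('SELECT definition FROM words WHERE word = ?');

sub lookup {
    my ($word) = @_;
    $sth->execute($word);
    my ($definition) = $sth->fetchrow_array;
    $sth->finish;
    return $definition;    # undef if the word is not in the dictionary
}

print lookup('aardvark') // 'not found', "\n";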
Does moving everything under the DATA section to a separate text file bring me any extra benefits besides eliminating Padre's file-opening issue?
This depends somewhat on your usage... if it's a never-changing script used by 3 people, there may be no tangible benefits.
In general, it will make maintenance much easier (you can change the dictionary and the code logic independently - think of a virus definitions file vs. the antivirus executable for a real-world example).
It will also decrease the process memory consumption if you go with the approaches I mentioned above.
Since you are using PAR::Packer already, why not move it to a separate file or module and include it in the PAR file?
The easy way (no extra command-line options to pp; it will see the use statement and do the right thing):
words.pl
#!/usr/bin/perl

use strict;
use warnings;

use Words;

for my $i (1 .. 2) {
    print "Run $i\n";
    while (defined(my $word = Words->next_word)) {
        print "\t$word\n";
    }
}
Words.pm
package Words;

use strict;
use warnings;

my $start = tell DATA
    or die "could not find current position: $!";

sub next_word {
    if (eof DATA) {
        seek DATA, $start, 0
            or die "could not seek: $!";
        return undef;
    }
    chomp(my $word = scalar <DATA>);
    return $word;
}

1;
__DATA__
a
b
c

Why is my .zip file corrupted after an HTTP file upload?

I'm trying to use a CGI script to accept and save a file from a program that is using an HTTP POST to send a zip file.
In the MIME section of the HTTP header it looks something like this:
Content-Disposition: form-data; name="el_upload_file_0"; filename="BugReport.zip";\r\n
Content-Type: application/octet-stream\r\n\r\n
In my CGI code I'm using this:
use CGI;
use strict;
my $cgi = CGI->new;
my $upload_file = $cgi->upload('el_upload_file_0');
my $time = time;
my $filename = "/tmp/$time.zip";
open TMP, ">$filename";
binmode TMP;
while (<$upload_file>) {
    print TMP $_;
}
close TMP;
The file that keeps getting saved is somehow getting corrupt and is not a valid zip file. The HTTP request is being sent by a C# app and it's possible that it might be sending a corrupt zip file, but I doubt it. Is there anything I can do to troubleshoot further?
You're reading in a .zip file line by line, which is a big mistake. Lines are only relevant for text files, after all.
Read the whole thing in one shot, or if you must, do it in reasonably sized chunks. In this example it's read in 1024-byte chunks, but you can easily use a much larger value, like 16MB (1 << 24) or whatever seems appropriate:
my $data;
while (read($upload_file, $data, 1024)) {
    print TMP $data;
}
What's the difference between the file you uploaded and the file that you ended up saving? Look at a hex dump of each.
I know it sounds stupid, but try merely copying the file you are trying to upload to the server without using the CGI script. Can you still unzip it there? Likewise, can you take the uploaded file from the server, copy it back to your client machine, and unzip it there?
What's the rest of the HTTP header look like? Are you changing character sets or anything?
I don't suspect a problem with file translations since CGI should already set that for you. So, once anyone says "should", you have to double check :). There's a line in CGI.pm that auto-detects systems that need binmode, so I don't think you need that on $upload_file. However, maybe CGI.pm gets it wrong for you:
$needs_binmode = $OS=~/^(WINDOWS|DOS|OS2|MSWin|CYGWIN|NETWARE)/;
You might try setting the variable to true yourself just to make sure:
use CGI;
$CGI::needs_binmode = 1;
The "\r\n"'s in the header might be a clue. Does the output file contain "\r\n" sequences? Can you/should you do a binmode on the $upload_file filehandle?

How do I serve a large file for download with Perl?

I need to serve a large file (500+ MB) for download from a location that is not accessible to the web server. I found the question Serving large files with PHP, which is identical to my situation, but I'm using Perl instead of PHP.
I tried simply printing the file line by line, but this does not cause the browser to prompt for download before grabbing the entire file:
use Tie::File;

open my $fh, '<', '/path/to/file.txt';
tie my @file, 'Tie::File', $fh
    or die 'Could not open file: $!';
my $size_in_bytes = -s $fh;

print "Content-type: text/plain\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";

for my $line (@file) {
    print $line;
}

untie @file;
close $fh;
exit;
Does Perl have an equivalent to PHP's readfile() function (as suggested with PHP) or is there a way to accomplish what I'm trying to do here?
If you just want to slurp input to output, this should do the trick.
use Carp ();

{    # lexical scope for the filehandle and $/
    open my $fh, '<', '/path/to/file.txt' or Carp::croak("File Open Failed");
    local $/ = undef;
    print scalar <$fh>;
    close $fh or Carp::carp("File Close Failed");
}
I guess this is in response to "Does Perl have a PHP readfile() equivalent?", and my answer would be "it doesn't really need one".
I've used PHP's manual file IO controls and they're a pain; Perl's are so easy to use by comparison that shelling out for a one-size-fits-all function seems like overkill.
Also, you might want to look at X-SendFile support, and basically send a header to your webserver to tell it what file to send: http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/ ( assuming of course it has permissions enough to access the file, but the file is just NOT normally accessible via a standard URI )
Edit: Noted, it is better to do it in a loop. I tested the above slurping code against a hard drive and it does implicitly try to store the whole thing in an invisible temporary variable and eat all your RAM.
Alternative using blocks
The following improved code reads the given file in blocks of 8192 bytes, which is much more memory efficient, and gets throughput respectably comparable with my disk's raw read rate. (I also pointed it at /dev/full for fits and giggles and got a healthy 500MB/s throughput, and it didn't eat all my RAM, so that must be good.)
{
    open my $fh, '<', '/dev/sda';
    local $/ = \8192;    # this tells readline to return 8192-byte chunks
    print $_ while defined( $_ = scalar <$fh> );
    close $fh;
}
Applying jrockway's suggestions:
{
    open my $fh, '<', '/dev/sda5';
    print $_ while ( sysread $fh, $_, 8192 );
    close $fh;
}
This literally doubles performance... and in some cases gets me better throughput than dd does O_o.
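Returning to the X-SendFile idea above: with a server module that understands it enabled (mod_xsendfile for Apache, for instance), the Perl side reduces to printing headers. A minimal sketch with a placeholder path:

#!/usr/bin/perl
use strict;
use warnings;

# The web server, not this script, streams the file once it sees X-Sendfile,
# so the data never passes through Perl at all
print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n";
print "X-Sendfile: /path/outside/docroot/file.txt\n\n";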
The readline function is called readline (and can also be written as <>).
I'm not sure what problem you're having. Perhaps it's that for loops aren't lazily evaluated (which they're not). Or perhaps Tie::File is screwing something up? Anyway, the idiomatic Perl for reading a file a line at a time is:
open my $fh, '<', $filename or die ...;
while (my $line = <$fh>) {
    # process $line
}
No need to use Tie::File.
Finally, you should not be handling this sort of thing yourself. This is a job for a web framework. If you were using Catalyst (or HTTP::Engine), you would just say:
open my $fh, '<', $filename ...
$c->res->body( $fh );
and the framework would automatically serve the data in the file efficiently. (Using stdio via readline is not a good idea here; it's better to read the file in blocks from the disk. But who cares, it's abstracted!)
You could use my Sys::Sendfile module. It should be highly efficient (as it uses sendfile underneath the hood), but it's not entirely portable (only Linux, FreeBSD and Solaris are currently supported).
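A rough sketch of how that might look, assuming the sendfile($out, $in, $count) interface the module's documentation describes and a toy socket just to have somewhere to send the data; treat the details as placeholders:

use strict;
use warnings;
use Sys::Sendfile qw(sendfile);
use IO::Socket::INET;

# Toy listening socket; in practice $client would be whatever connection
# your server code already holds
my $server = IO::Socket::INET->new(Listen => 5, LocalPort => 8080, ReuseAddr => 1)
    or die "listen: $!";
my $client = $server->accept;

open my $fh, '<:raw', '/path/to/file.txt' or die "open: $!";
my $size = -s $fh;

# Minimal response headers, then let the kernel stream the file body
print {$client} "HTTP/1.0 200 OK\r\n",
                "Content-Type: application/octet-stream\r\n",
                "Content-Length: $size\r\n\r\n";
sendfile($client, $fh, $size) or die "sendfile: $!";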
When you say "this does not cause the browser to prompt for download" -- what's "the browser"?
Different browsers behave differently, and IE is particularly wilful: it will ignore headers and decide for itself what to do based on reading the first few kB of the file.
In other words, I think your problem may be at the client end, not the server end.
Try lying to "the browser" and telling it the file is of type application/octet-stream. Or why not just zip the file, especially as it's so huge.
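In terms of the question's code, that is a one-header change (sketch of the relevant lines only):

# application/octet-stream instead of text/plain nudges most browsers
# toward a download prompt rather than inline display
print "Content-Type: application/octet-stream\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";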
Don't use for/foreach (<$input>) because it reads the whole file at once and then iterates over it. Use while (<$input>) instead. The sysread solution is good, but sendfile is the best performance-wise.
Answering the (original) question ("Does Perl have an equivalent to PHP's readline() function ... ?"), the answer is "the angle bracket syntax":
open my $fh, '<', '/path/to/file.txt';
while (my $line = <$fh>) {
    print $line;
}
Getting the content-length with this method isn't necessarily easy, though, so I'd recommend staying with Tie::File.
NOTE
Using:
for my $line (<$filehandle>) { ... }
(as I originally wrote) copies the contents of the file to a list and iterates over that. Using
while (my $line = <$filehandle>) { ... }
does not. When dealing with small files the difference isn't significant, but when dealing with large files it definitely can be.
Answering the (updated) question ("Does Perl have an equivalent to PHP's readfile() function ... ?"), the answer is slurping. There are a couple of syntaxes, but Perl6::Slurp seems to be the current module of choice.
The implied question ("why doesn't the browser prompt for download before grabbing the entire file?") has absolutely nothing to do with how you're reading in the file, and everything to do with what the browser thinks is good form. I would guess that the browser sees the mime-type and decides it knows how to display plain text.
Looking more closely at the Content-Disposition problem, I remember having similar trouble with IE ignoring Content-Disposition. Unfortunately I can't remember the workaround. IE has a long history of problems here (old page, refers to IE 5.0, 5.5 and 6.0). For clarification, however, I would like to know:
What kind of link are you using to point to this big file (i.e., are you using a normal <a href="perl_script.cgi?filename.txt"> link or are you using JavaScript of some kind)?
What system are you using to actually serve the file? For instance, does the webserver make its own connection to the other computer without a webserver, and then copy the file to the webserver and then send the file to the end user, or does the user make the connection directly to the computer without a webserver?
In the original question you wrote "this does not cause the browser to prompt for download before grabbing the entire file" and in a comment you wrote "I still don't get a download prompt for the file until the whole thing is downloaded." Does this mean that the file gets displayed in the browser (since it's just text), that after the browser has downloaded the entire file you get a "where do you want to save this file" prompt, or something else?
I have a feeling that there is a chance the HTTP headers are getting stripped out at some point or that a Cache-control header is getting added (which apparently can cause trouble).
I've successfully done it by telling the browser it was of type application/octet-stream instead of type text/plain. Apparently most browsers prefer to display text/plain inline instead of giving the user a download dialog option.
It's technically lying to the browser, but it does the job.
The most efficient way to serve a large file for download depends on the web server you use.
In addition to @Kent Fredric's X-Sendfile suggestion:
File Downloads Done Right has some links that describe how to do it for Apache, lighttpd (mod_secdownload: security via URL generation), and nginx. There are examples in PHP, Ruby (Rails), and Python which can be adapted for Perl.
Basically it boils down to:
Configure paths and permissions for your web server.
Generate valid headers for the redirect in your Perl app (Content-Type, Content-Disposition, Content-Length?, X-Sendfile or X-Accel-Redirect, etc.).
There are probably CPAN modules and web-framework plugins that do exactly that; e.g., @Leon Timmermans mentioned Sys::Sendfile in his answer.
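For instance, with nginx the Perl side only emits an internal-redirect header and the mapping from a protected location to the real directory lives in the nginx config; the location name and path below are placeholders:

#!/usr/bin/perl
use strict;
use warnings;

# nginx must define an "internal" location (here /protected/) that maps
# to the real directory holding the files
print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n";
print "X-Accel-Redirect: /protected/file.txt\n\n";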