Perl - Piping gunzip Output to File::ReadBackwards

I have a Perl project (a CGI script running on Apache) that has always used gunzip and tac (piping gunzip to tac, and piping that to a filehandle) to accomplish its workload, which is to process large, flat text files, sometimes on the order of 10GB or more each in size. More specifically, in my use case these files need to be decompressed on the fly AND, at times, read backwards as well (both are sometimes required at once, mainly for speed).
When I started this project, I looked at using File::ReadBackwards but decided on tac instead for performance reasons. After some discussion on a slightly-related topic last night and several suggestions to try and keep the processing entirely within Perl, I decided to give File::ReadBackwards another shot to see how it stands up under this workload.
Some preliminary tests indicate that it may in fact be comparable to, and possibly even better than, tac. However, so far I've only been able to test this on uncompressed files. But it has now grabbed my interest, so I'd like to see if I could make it work with compressed files as well.
Now, I'm pretty sure I could unzip a file to another file and then read that backwards, but I think that would have terrible performance, especially because the user has the option to limit results to X lines precisely to help performance, so I do not want to have to decompress the entirety of a file every single time I pull any lines out of it. Ideally I would like to do what I do now, which is to decompress and read backwards on the fly, with the ability to bail out as soon as I hit my quota if needed.
So, my dilemma is that I need to find a way to pipe output from gunzip, into File::ReadBackwards, if possible.
On a side note, I would be willing to give IO::Uncompress::Gunzip a chance as well (compare the decompression performance against a plain, piped gunzip process), either for performance gain (which would surprise me) or for convenience/the ability to pipe output to File::ReadBackwards (which I find slightly more likely).
Does anyone have any ideas here? Any advice is much appreciated.

You can't. File::ReadBackwards requires a seekable handle (i.e. a plain file and not a pipe or socket).
To use File::ReadBackwards, you'd have to first send the output to a named temporary file (which you could create using File::Temp).
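For instance, a minimal sketch of that approach (assuming $file holds the compressed filename; untested, so treat it as an outline rather than a drop-in):
use File::Temp ();
use File::ReadBackwards;

my $file = 'big_log.gz';              # assumed input name
my $tmp  = File::Temp->new;           # plain, seekable file; removed when the object goes away
open my $gz, '-|', 'gunzip', '-c', '--', $file or die "Can't run gunzip: $!";
print {$tmp} $_ while <$gz>;
close $gz or die "gunzip failed: $?";
$tmp->flush;

my $bw = File::ReadBackwards->new("$tmp") or die "Can't open $tmp: $!";
while (defined(my $line = $bw->readline)) {
    # process lines from last to first; bail out once the quota is hit
}
Of course, this still decompresses the whole file up front, which is exactly the cost the question is trying to avoid; it only buys you File::ReadBackwards' interface.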

While File::ReadBackwards won't work as desired, here is another take.
In the original approach you gunzip before tac-ing, and the whole file is read just to get to its end; tac is there only for convenience. (For a plain uncompressed file one can get the file size from metadata and then seek toward the end of the file, so as not to read the whole thing.)
So try the same, or something similar, in Perl. The IO::Uncompress::Gunzip module also has a seek method. It does have to uncompress data up to the seek point:
Note that the implementation of seek in this module does not provide true random access to a compressed file/buffer
but with it we still avoid copying uncompressed data into variables, and so pay only the minimal price here, of uncompressing data in order to seek. In my timings this saves upwards of an order of magnitude, making it far closer to the system's gunzip (competitive for file sizes on the order of 10MB).
For that we also need the uncompressed size, which the module's seek uses; I get it with the system's gzip -l. Thus I still need to parse the output of an external tool; so there's that issue.†
use warnings;
use strict;
use feature 'say';
use IO::Uncompress::Gunzip qw($GunzipError);
my $file = shift;
die "Usage: $0 file\n" if not $file or not -f $file;
my $z = IO::Uncompress::Gunzip->new($file) or die "Error: $GunzipError";
my $us = (split ' ', (`gunzip -l $file`)[1])[1]; # CHECK gunzip's output
say "Uncompressed size: $us";
# Go to 1024 bytes before uncompressed end (should really be more careful
# since we aren't guaranteed that size estimate)
$z->seek($us-1024, 0);
while (my $line = $z->getline) {
print $line if $z->eof;
}
(Note: docs advertise SEEK_END but it didn't work for me, neither as a constant nor as 2. Also note that the constructor does not fail for non-existent files so the program doesn't die there.)
This only prints the last line. Collect those lines into an array instead, for more involved work.
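For example, a sketch of that collection step (with $num_lines as an assumed, user-supplied cap on how many trailing lines to keep):
my @tail;
while (my $line = $z->getline) {
    push @tail, $line;
    shift @tail if @tail > $num_lines;   # keep only the last $num_lines lines
}
# @tail now holds up to $num_lines lines nearest the end of the file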
For compressed text files on the order of 10MB in size this runs as fast as gunzip | tac. For files around 100MB in size it is slower by a factor of two. This is a rather rudimentary estimate, and it depends on all manner of details. But I am comfortable saying that it will be noticeably slower for larger files.
However, the code above has a particular problem with the file sizes possible in this case, in the tens of GB. The good old gzip format has a limitation, nicely stated in the gzip manual:
The gzip format represents the input size modulo 2^32 [...]
So sizes obtained by --list for files larger than 4GB undermine the above optimization: we'll seek to a place early in the file instead of near its end (for a 17GB file the size is reported by -l as 1GB, so we seek there) and then in fact read the bulk of the file with getline.
The best solution would be to use the known value for the uncompressed data size, if that is known. Otherwise, if the compressed file size exceeds 4GB, seek to its compressed size (as far as we can safely go), and after that use read with very large chunks:
my $buf;
my $len = 10*1_024_000;   # only hundreds of reads, but a large buffer
$z->read($buf, $len) while not $z->eof;
my @last_lines = split /\n/, $buf;
The last step depends on what actually needs to be done. If it is indeed to read lines backwards then you can do while (my $el = pop @last_lines) { ... } for example, or reverse the array and work from there. Note that the last read will likely be far smaller than $len.
On the other hand, it could happen that the last buffer read is too small for what's needed, so one may want to always copy the needed number of lines and keep them across reads.
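A sketch of that rolling tail, under the same assumptions as above ($z is the IO::Uncompress::Gunzip handle, $num_lines the user-supplied count); note the care taken with a line that straddles two reads:
my ($buf, $partial, @tail) = ('', '');
my $len = 10 * 1_024_000;
while (not $z->eof) {
    $z->read($buf, $len) or last;
    my @lines = split /\n/, $partial . $buf, -1;
    $partial  = pop @lines;                              # possibly incomplete last line
    push @tail, @lines;
    splice @tail, 0, @tail - $num_lines if @tail > $num_lines;
}
push @tail, $partial if length $partial;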
The buffer size to read ($len) clearly depends on specifics of the problem.
Finally, if this is too much bother you can pipe gunzip and keep a buffer of lines.
use String::ShellQuote qw(shell_quote);
my $num_lines = ...; # user supplied
my @last_lines;
my $cmd = shell_quote('gunzip', '-c', '--', $file);
my $pid = open(my $fh, '-|', $cmd) or die "Can't open $cmd: $!";
push @last_lines, scalar <$fh> for 0..$num_lines;  # pre-fill, so we don't have to check the size below
while (<$fh>) {
    push @last_lines, $_;
    shift @last_lines;
}
close $fh;
while (my $line = pop @last_lines) {
    print $line;  # process backwards
}
I put $num_lines lines on the array right away so as not to have to test the size of @last_lines against $num_lines on every shift, that is, on every read. (This improves runtime by nearly 30%.)
Any hint of the number of lines (of uncompressed data) is helpful, so that we skip ahead and avoid copying data into variables, as much as possible.
# Stash $num_lines on array
<$fh> for 0..$num_to_skip; # skip over an estimated number of lines
# Now push+shift while reading
This can help quite a bit, depending on how well we can estimate the number of lines. Altogether, in my tests this is still slower than gunzip | tac | head, by around 50% in the very favorable case where I skip 90% of the file.
† The uncompressed size can be found without going to external tools as
my $us = do {
my $d;
open my $fh, '<', $file or die "Can't open $file: $!";
seek($fh, -4, 2) and read($fh, $d, 4) >= 4 and unpack('V', $d)
or die "Can't get uncompressed size: $!";
};
Thanks to mosvy for a comment with this.
If we still stick with using the system's gunzip, then the safety of running an external command with user input (a filename), practically bypassed here by checking for that file, needs to be taken into account, by using String::ShellQuote to compose the command:
use String::ShellQuote qw(shell_quote);
my $cmd = shell_quote('gunzip', '-l', '--', $file);
# my $us = ... qx($cmd) ...;
Thanks to ikegami for the comment.

Related

How to pipe to and read from the same tempfile handle without race conditions?

Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This is discouraged and was removed for Perl v5.22's POSIX.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing as roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs think it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my( $fh, $name ) = File::Temp->tempfile;
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
my( $fh, $name ) = File::Temp->tempfile;
print $fh ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp->tempfile(
    sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect data without needing the file outside the program. Opening it read-write means you can write some stuff, then move around in that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
$fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
@lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
@lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

Perl : Parallel or multithread or Bloom-Faster or fork to fill hash from a file of 500 million lines

What is the best way to make this script faster, e.g. with parallel runs?
#!/usr/bin/perl
use warnings;
use strict;
use threads;
open(R1 ,"<$ARGV[0]") || die " problem in oppening $ARGV[0]: $!\n";
my %dict1 : shared;
my $i=0;
while (my $l = <R1>){
chomp($l);
$l=~ s/\s$//;
$l=~ s/^\s//;
if ($l=~ /(.*)\s(.*)/){
$i++;
#print $1,"\n";
#my $t = threads->create($dict1{$1}++);
$dict1{$1}++;
}
}
print $i, "\n";
close R1;
You can make an array of $N elements which correspond to equal parts of the file:
my $N = 6;
my $step = int( $file_size/$N );
my @arr = map { ($_-1) * $step } 1 .. $N;
Then correct these numbers by seeking to those file positions (perldoc -f seek), reading the rest of the line (perldoc -f readline), and taking the corrected file position with tell (perldoc -f tell).
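A rough sketch of that boundary correction (reusing $N and @arr from above; $file naming the input is an assumption, as the question reads from $ARGV[0]):
open my $in, '<', $file or die "Can't open $file: $!";
my @starts = (0);
for my $pos (@arr[1 .. $#arr]) {
    seek $in, $pos, 0 or die "Can't seek: $!";
    <$in>;                       # discard the rest of a partial line
    push @starts, tell $in;      # corrected start of the next chunk
}
close $in;
# thread $i then processes bytes from $starts[$i] up to $starts[$i+1] (or EOF)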
Start $N threads, where each already knows where to start its extracting job, and join their results at the end. However, you may find that reading from the media is the actual bottleneck, as @ikegami already pointed out.
The most likely case is that you're being limited by the speed that your data can be read from the disk ("I/O bound"), not by processing time ("CPU bound"). If that's the case, then there is nothing you can do with threads or parallel execution to speed this up - if it has any effect at all, parallelization would slow you down by forcing the disk to jump back and forth between the parts of the file being read by the different processes/threads.
An easy way to test whether this is the case would be to open a shell and run the command cat my_data_file > /dev/null. This should tell you how long it takes to just read the file from disk, without actually doing anything with it. If it's roughly the same as the time it takes to run your program on my_data_file, then don't bother trying to optimize or speed it up. You can't. There are only two ways to improve performance in that situation:
Change the way your code works so that you don't need to read the entire file. If you're dealing with something that will run multiple times, indexing records in the file or using a database may help, but that doesn't do any good if it's a one-shot operation (since you'd still need to read the whole file once to create the index/database).
Use faster storage media.
If you're not I/O bound, the next most likely case is that you're memory bound - the data won't all fit in memory at once, causing the disk to thrash as it moves chunks of data into and out of virtual memory. Once again, parallelizing the process would make things worse, not better.
The solutions in this case are similar to before:
Change what you're doing so that you don't need all the data in memory at once. In this case, indexing or a database would probably be beneficial even for a one-off operation.
Buy more RAM.
Unless you're doing much heavier processing of the data than just the few regexes and stuffing it in a hash that you've shown, though, you're definitely not CPU bound and parallelization will not provide any benefit.

How can I make writes to a gzip file from my Perl script non-blocking?

I'm currently writing a script that takes a database as input and generates all valid combinations from the 10+ tables, following certain rules. Since the output is pretty darn huge, I'm dumping this through gzip into a file, like this:
open( my $OUT, '|-', "gzip > file" );
for ( @data ) {
my $line = calculate($_);
print $OUT $line;
}
Due to the nature of the beast, though, I end up having to make hundreds of thousands of small writes, one for each line. This means that between each calculation it waits for gzip to receive the data and finish compressing it. At least I think so; I might be wrong.
In case I'm right, though, I'm wondering how I can make this print asynchronous, i.e. so it fires the data at gzip and then goes on processing the data.
Give IO::Compress::Gzip a try. It accepts a filehandle to write to. You can set O_NONBLOCK on that filehandle.
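A minimal sketch of that route, writing through a filehandle (whether you actually need O_NONBLOCK on it is a separate question; calculate() and @data are the placeholders from the question):
use IO::Compress::Gzip qw($GzipError);

open my $out, '>:raw', 'file.gz' or die "Can't open file.gz: $!";
my $z = IO::Compress::Gzip->new($out) or die "IO::Compress::Gzip failed: $GzipError";
for (@data) {
    $z->print( calculate($_) );   # compression happens in-process, no pipe to wait on
}
$z->close;
close $out;
Note that this moves the compression work into your own process, so it trades the pipe wait for CPU time spent in-process.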
Pipes already use a buffer so that the writing program doesn't have to wait for the reading program. However, that buffer is usually fairly small (it's normally only 64KB on Linux) and not easily changed (it requires recompiling the kernel). If the standard buffer is not enough, the easiest thing to do is include a buffering program in the pipeline:
open( my $OUT, '|-', "bfr | gzip > file" );
bfr simply reads STDIN into an in-memory buffer, and writes to STDOUT as fast as the next program allows. The default is a 5MB buffer, but you can change that with the -b option (e.g. bfr -b10m for a 10MB buffer).
Naturally, I'd do it in a thread or with a fork, as you wish.
http://hell.jedicoder.net/?p=82

How can I write compressed files on the fly using Perl?

I am generating relatively large files using Perl. The files I am generating are of two kinds:
Table files, i.e. textual files I print line by line (row by row), which contain mainly numbers. A typical line looks like:
126891 126991 14545 12
Serialized objects I create then store into a file using Storable::nstore. These objects usually contain some large hash with numeric values. The values in the object might have been packed to save on space (and the object unpacks each value before using it).
Currently I'm usually doing the following:
use IO::Compress::Gzip qw(gzip $GzipError);
# create normal, uncompressed file ($out_file)
# ...
# compress file using gzip
my $gz_out_file = "$out_file.gz";
gzip $out_file => $gz_out_file or die "gzip failed: $GzipError";
# delete uncompressed file
unlink($out_file) or die "can't unlink file $out_file: $!";
This is quite inefficient since I first write the large file to disk, then gzip reads it again and compresses it. So my questions are as follows:
Can I create a compressed file without first writing a file to disk? Is it possible to create a compressed file sequentially, i.e. printing line-by-line like in scenario (1) described earlier?
Does gzip sound like an appropriate choice? Are there any other recommended compressors for the kind of data I have described?
Does it make sense to pack values in an object that will later be stored and compressed anyway?
My considerations are mainly saving on disk space and allowing fast decompression later on.
You can use IO::Zlib or PerlIO::gzip to tie a file handle to compress on the fly.
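For example, with the PerlIO::gzip layer (a sketch; 'table.gz' and the sample row are illustrative):
use PerlIO::gzip;

open my $out, '>:gzip', 'table.gz' or die "Can't open table.gz: $!";
print {$out} "126891 126991 14545 12\n";   # rows are compressed as they are written
close $out or die "Can't close table.gz: $!";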
As for what compressors are appropriate, just try several and see how they do on your data. Also keep an eye on how much CPU/memory they use for compression and decompression.
Again, test to see how much pack helps with your data, and how much it affects your performance. In some cases, it may be helpful. In others, it may not. It really depends on your data.
You can also open() a filehandle to a scalar instead of a real file, and use this filehandle with IO::Compress::Gzip. Haven't actually tried it, but it should work. I use something similar with Net::FTP to avoid creating files on disk.
Since v5.8.0, Perl has been built with PerlIO by default. Unless you've changed this (i.e., Configure -Uuseperlio), you can open filehandles directly to Perl scalars via:
open($fh, '>', \$variable) || ..
(from the documentation for open())
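Putting those two ideas together, a sketch (untested, as noted above) of compressing into an in-memory scalar:
use IO::Compress::Gzip qw($GzipError);

my $compressed = '';
open my $mem_fh, '>', \$compressed or die "Can't open in-memory handle: $!";
my $z = IO::Compress::Gzip->new($mem_fh) or die "gzip failed: $GzipError";
$z->print("126891 126991 14545 12\n");
$z->close;
# $compressed now holds gzipped bytes, ready to hand to e.g. Net::FTP without touching disk
(IO::Compress::Gzip can also take a scalar reference directly as its output target, which skips the intermediate filehandle.)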
IO::Compress::Gzip has an OO interface that can be used for this.
use strict;
use warnings;
use IO::Compress::Gzip;
my $z = IO::Compress::Gzip->new('out.gz');
$z->print($_, "\n") for 0 .. 10;

How do I serve a large file for download with Perl?

I need to serve a large file (500+ MB) for download from a location that is not accessible to the web server. I found the question Serving large files with PHP, which is identical to my situation, but I'm using Perl instead of PHP.
I tried simply printing the file line by line, but this does not cause the browser to prompt for download before grabbing the entire file:
use Tie::File;
open my $fh, '<', '/path/to/file.txt';
tie my @file, 'Tie::File', $fh
or die 'Could not open file: $!';
my $size_in_bytes = -s $fh;
print "Content-type: text/plain\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";
for my $line (@file) {
print $line;
}
untie @file;
close $fh;
exit;
Does Perl have an equivalent to PHP's readfile() function (as suggested with PHP) or is there a way to accomplish what I'm trying to do here?
If you just want to slurp input to output, this should do the trick.
use Carp ();
{ #Lexical For FileHandle and $/
open my $fh, '<' , '/path/to/file.txt' or Carp::croak("File Open Failed");
local $/ = undef;
print scalar <$fh>;
close $fh or Carp::carp("File Close Failed");
}
I guess this is in response to "Does Perl have a PHP readfile equivalent?", and my answer would be "It doesn't really need one".
I've used PHP's manual file IO controls and they're a pain; Perl's are so easy to use by comparison that shelling out for a one-size-fits-all function seems like overkill.
Also, you might want to look at X-SendFile support, and basically send a header to your webserver to tell it what file to send: http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/ ( assuming of course it has permissions enough to access the file, but the file is just NOT normally accessible via a standard URI )
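The X-Sendfile idea, in CGI terms, is roughly this (the exact header name depends on the server module, e.g. mod_xsendfile for Apache; treat this as an illustration):
print "X-Sendfile: /path/to/file.txt\n";
print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";
# the web server streams the file itself; the CGI script never reads it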
Edit: Noted, it is better to do it in a loop. I tested the above code with a hard drive and it does implicitly try to store the whole thing in an invisible temporary variable and eat all your RAM.
Alternative using blocks
The following improved code reads the given file in blocks of 8192 chars, which is much more memory efficient, and gets throughput respectably comparable to my disk's raw read rate. (I also pointed it at /dev/full for fits and giggles and got a healthy 500MB/s throughput, and it didn't eat all my RAM, so that must be good.)
{
open my $fh , '<', '/dev/sda' ;
local $/ = \8192; # this tells IO to use 8192 char chunks.
print $_ while defined ( $_ = scalar <$fh> );
close $fh;
}
Applying jrockway's suggestions
{
open my $fh , '<', '/dev/sda5' ;
print $_ while ( sysread $fh, $_ , 8192 );
close $fh;
}
This literally doubles performance, ... and in some cases gets me better throughput than dd does. O_o
The readline function is called readline (and can also be written as <>).
I'm not sure what problem you're having. Perhaps that for loops
aren't lazily evaluated (which they're not). Or, perhaps Tie::File is
screwing something up? Anyway, the idiomatic Perl for reading a file
a line at a time is:
open my $fh, '<', $filename or die ...;
while(my $line = <$fh>){
# process $line
}
No need to use Tie::File.
Finally, you should not be handling this sort of thing yourself. This
is a job for a web framework. If you were using
Catalyst (or
HTTP::Engine), you would
just say:
open my $fh, '<', $filename ...
$c->res->body( $fh );
and the framework would automatically serve the data in the file
efficiently. (Using stdio via readline is not a good idea here, it's
better to read the file in blocks from the disk. But who cares, it's
abstracted!)
You could use my Sys::Sendfile module. It should be highly efficient (as it uses sendfile under the hood), but not entirely portable (only Linux, FreeBSD and Solaris are currently supported).
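A rough sketch of what that looks like (assuming you hold the client socket $client yourself, e.g. in a standalone server, rather than printing through CGI; names are illustrative):
use Sys::Sendfile qw(sendfile);

open my $in, '<:raw', '/path/to/file.txt' or die "Can't open file: $!";
my $size = -s $in;
# ... print the status line and headers (Content-Length: $size, etc.) to $client first ...
sendfile($client, $in, $size) or die "sendfile failed: $!";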
When you say "this does not cause the browser to prompt for download" -- what's "the browser"?
Different browsers behave differently, and IE is particularly wilful: it will ignore headers and decide for itself what to do based on reading the first few KB of the file.
In other words, I think your problem may be at the client end, not the server end.
Try lying to "the browser" and telling it the file is of type application/octet-stream. Or why not just zip the file, especially as it's so huge.
Don't use for/foreach (<$input>) because it reads the whole file at once and then iterates over it. Use while (<$input>) instead. The sysread solution is good, but the sendfile is the best performance-wise.
Answering the (original) question ("Does Perl have an equivalent to PHP's readline() function ... ?"), the answer is "the angle bracket syntax":
open my $fh, '<', '/path/to/file.txt';
while (my $line = <$fh>) {
print $line;
}
Getting the content-length with this method isn't necessarily easy, though, so I'd recommend staying with Tie::File.
NOTE
Using:
for my $line (<$filehandle>) { ... }
(as I originally wrote) copies the contents of the file to a list and iterates over that. Using
while (my $line = <$filehandle>) { ... }
does not. When dealing with small files the difference isn't significant, but when dealing with large files it definitely can be.
Answering the (updated) question ("Does Perl have an equivalent to PHP's readfile() function ... ?"), the answer is slurping. There are a couple of syntaxes, but Perl6::Slurp seems to be the current module of choice.
The implied question ("why doesn't the browser prompt for download before grabbing the entire file?") has absolutely nothing to do with how you're reading in the file, and everything to do with what the browser thinks is good form. I would guess that the browser sees the mime-type and decides it knows how to display plain text.
Looking more closely at the Content-Disposition problem, I remember having similar trouble with IE ignoring Content-Disposition. Unfortunately I can't remember the workaround. IE has a long history of problems here (old page, refers to IE 5.0, 5.5 and 6.0). For clarification, however, I would like to know:
What kind of link are you using to point to this big file (i.e., are you using a normal <a href="perl_script.cgi?filename.txt"> link, or are you using JavaScript of some kind)?
What system are you using to actually serve the file? For instance, does the webserver make its own connection to the other computer without a webserver, and then copy the file to the webserver and then send the file to the end user, or does the user make the connection directly to the computer without a webserver?
In the original question you wrote "this does not cause the browser to prompt for download before grabbing the entire file" and in a comment you wrote "I still don't get a download prompt for the file until the whole thing is downloaded." Does this mean that the file gets displayed in the browser (since it's just text), that after the browser has downloaded the entire file you get a "where do you want to save this file" prompt, or something else?
I have a feeling that there is a chance the HTTP headers are getting stripped out at some point or that a Cache-control header is getting added (which apparently can cause trouble).
I've successfully done it by telling the browser it was of type application/octet-stream instead of type text/plain. Apparently most browsers prefer to display text/plain inline instead of giving the user a download dialog option.
It's technically lying to the browser, but it does the job.
The most efficient way to serve a large file for download depends on the web server you use.
In addition to @Kent Fredric's X-Sendfile suggestion:
File Downloads Done Right has some links that describe how to do it for Apache, lighttpd (mod_secdownload: security via URL generation), and nginx. There are examples in PHP, Ruby (Rails), and Python which can be adapted for Perl.
Basically it boils down to:
Configure paths, and permissions for your web-server.
Generate valid headers for the redirect in your Perl app (Content-Type, Content-Disposition, Content-length?, X-Sendfile or X-Accel-Redirect, etc).
There are probably CPAN modules or web-framework plugins that do exactly that; e.g., @Leon Timmermans mentioned Sys::Sendfile in his answer.