How can I create a binary file in Perl?

For example, I want to create a file called sample.bin and put a number in it, like 255, so that 255 is saved in the file in little-endian order as FF 00. Or 3826 as F2 0E.
I tried using binmode, as the perldoc said.

The Perl pack function will return "binary" data according to a template.
open(my $out, '>:raw', 'sample.bin') or die "Unable to open: $!";
print $out pack('s<', 255);
close($out);
In the above example, the 's' tells it to output a short (16 bits), and the '<' forces it to little-endian mode.
In addition, ':raw' in the call to open tells it to put the filehandle into binary mode on platforms where that matters (it is equivalent to using binmode). The PerlIO manual page has a little more information on doing I/O in different formats.
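To check that the bytes came out as expected, you can read the file back with the same layer and template and unpack it; a quick sketch (the file name and values are just the ones from the question):
open(my $in, '<:raw', 'sample.bin') or die "Unable to open: $!";
my $buf;
read($in, $buf, 2) == 2 or die "Short read";
printf "%d\n", unpack('s<', $buf);   # prints 255
close($in);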

You can use pack to generate your binary data. For complex structures, Convert::Binary::C is particularly nice.
CBC parses C header files (either from a directory or from a variable in your script). It uses the information from the headers to pack or unpack binary data.
Of course, if you want to use this module, it helps to know some C.
CBC gives you the ability to specify the endianness and sizes for your C types, and you can even specify functions to convert between native Perl types and the data in the binary file. I've used this feature to handle encoding and decoding fixed point numbers.
For your very basic example you'd use:
use strict;
use warnings;
use IO::File;
use Convert::Binary::C;
my $c = Convert::Binary::C->new('ByteOrder' => 'LittleEndian');
my $packed = $c->pack( 'short int', 0xFF );
print $packed;
my $fh = IO::File->new( 'outfile', '>' )
or die "Unable to open outfile - $!\n";
$fh->binmode;
$fh->print( $packed );
CBC doesn't really get to shine in this example, since it is just working with a single short int. If you need to handle complex structures that may have typedefs pulled from several different C headers, you will be very happy to have this tool on hand.
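As a rough sketch of what that can look like (the struct and its fields here are made up for illustration, and assume a platform where a C short is 16 bits):
my $c = Convert::Binary::C->new( 'ByteOrder' => 'LittleEndian' );
$c->parse('struct point { short x; short y; };');   # parse C declarations from a string
my $packed = $c->pack( 'point', { x => 255, y => 3826 } );   # FF 00 F2 0E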
Since you are new to Perl, I'll suggest that you always use strict and use warnings. Also, you can use diagnostics to get more detailed explanations of error messages. Both this site and PerlMonks have lots of good information for beginners and many very smart, skilled people willing to help you.
BTW, if you decide to go the pack route, check out the pack tutorial, it helps clarify the somewhat mystifying pack documentation.
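For the 16-bit values in the question, the templates you will most likely reach for are 'v' (unsigned 16-bit, always little-endian), 'n' (unsigned 16-bit, big-endian) and 's<' (signed 16-bit, byte order forced to little-endian). A quick comparison:
# 3826 == 0x0EF2
print unpack('H*', pack('v',  3826)), "\n";   # f20e (little-endian)
print unpack('H*', pack('n',  3826)), "\n";   # 0ef2 (big-endian)
print unpack('H*', pack('s<', 3826)), "\n";   # f20e (same bytes as 'v' for positive values)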

Yes, use binmode
For your entertainment (if not education) my very first attempt at creating a binary file included binmode STDOUT and the following:
sub output_word {
$word = $_[0];
$lsb = $word % 256;
$msb = int($word/256);
print OUT chr($lsb) . chr($msb);
return $word;
}
FOR PITY'S SAKE DON'T USE THIS CODE! It comes from a time when I didn't know any better.
It could be argued I still don't, but it's reproduced here to show that you can control the order of the bytes, even with brain-dead methods, and because I need to 'fess up.
A better method would be to use pack as Adam Batkin suggested.
I think I committed the atrocity above in Perl 4. It was a long time ago. I wish I could forget it...

Related

Perl - Piping gunzip Output to File::ReadBackwards

I have a Perl project (CGI script, running on Apache) that has previously always used gunzip and tac (piping gunzip to tac, and piping that to filehandle) in order to accomplish its workload, which is to process large, flat text files, sometimes on the order of 10GB or more each in size. More specifically, in my use case these files are needing to be decompressed on-the-fly AND read backwards at times as well (there are times when both are a requirement - mainly for speed).
When I started this project, I looked at using File::ReadBackwards but decided on tac instead for performance reasons. After some discussion on a slightly-related topic last night and several suggestions to try and keep the processing entirely within Perl, I decided to give File::ReadBackwards another shot to see how it stands up under this workload.
Some preliminary tests indicate that it may in fact be comparable, and possibly even better, than tac. However, so far I've only been able to test this on uncompressed files. But it now has grabbed my interest so I'd like to see if I could make it work with compressed files as well.
Now I'm pretty sure I could probably unzip a file to another file, then read that backwards, but I think that would have terrible performance. Especially because the user has the option to limit results to X number for the exact reason of helping performance so I do not want to have to process/decompress the entirety of a file every single time I pull any lines out of it. Ideally I would like to be able to do what I do now, which is to decompress and read it backwards on-the-fly, with the ability to bail out as soon as I hit my quota if needed.
So, my dilemma is that I need to find a way to pipe output from gunzip, into File::ReadBackwards, if possible.
On a side note, I would be willing to give IO::Uncompress::Gunzip a chance as well (compare the decompression performance against a plain, piped gunzip process), either for performance gain (which would surprise me) or for convenience/the ability to pipe output to File::ReadBackwards (which I find slightly more likely).
Does anyone have any ideas here? Any advice is much appreciated.
You can't. File::ReadBackwards requires a seekable handle (i.e. a plain file and not a pipe or socket).
To use File::ReadBackwards, you'd have to first send the output to a named temporary file (which you could create using File::Temp).
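A rough sketch of that workaround, assuming the compressed file name is in $file (the temporary file is unlinked automatically; error handling kept minimal):
use File::Temp qw(tempfile);
use File::ReadBackwards;
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
open my $gz, '-|', 'gunzip', '-c', '--', $file or die "Can't run gunzip: $!";
print {$tmp_fh} $_ while <$gz>;
close $gz     or die "gunzip failed: $?";
close $tmp_fh or die "Can't write $tmp_name: $!";
my $bw = File::ReadBackwards->new($tmp_name) or die "Can't read $tmp_name: $!";
while (defined(my $line = $bw->readline)) {
    # process lines last-to-first; bail out early once the quota is hit
}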
While File::ReadBackwards won't work as desired, here is another take.
In the original approach you gunzip before tac-ing, and the whole file has to be read just to get to its end; tac is there only for convenience. (For a plain uncompressed file one can get the file size from its metadata and then seek toward the end of the file, so as not to have to read the whole thing.)
Then try the same, or similar, in Perl. The IO::Uncompress::Gunzip module also has a seek method. It does have to uncompress data up to that point:
Note that the implementation of seek in this module does not provide true random access to a compressed file/buffer
but with it we still avoid copying uncompressed data (into variables) and so pay the minimal price here, of uncompressing data in order to seek. In my timings this saves upward of an order of magnitude, making it far closer to the system's gunzip (competitive around 10 MB file sizes).
For that we also need the uncompressed size, which the module's seek uses, and which I get with the system's gzip -l. So I still need to parse the output of an external tool; there's that issue.†
use warnings;
use strict;
use feature 'say';
use IO::Uncompress::Gunzip qw($GunzipError);
my $file = shift;
die "Usage: $0 file\n" if not $file or not -f $file;
my $z = IO::Uncompress::Gunzip->new($file) or die "Error: $GunzipError";
my $us = (split ' ', (`gunzip -l $file`)[1])[1]; # CHECK gunzip's output
say "Uncompressed size: $us";
# Go to 1024 bytes before uncompressed end (should really be more careful
# since we aren't guaranteed that size estimate)
$z->seek($us-1024, 0);
while (my $line = $z->getline) {
print $line if $z->eof;
}
(Note: docs advertise SEEK_END but it didn't work for me, neither as a constant nor as 2. Also note that the constructor does not fail for non-existent files so the program doesn't die there.)
This only prints the last line. Collect those lines into an array instead, for more involved work.
For compressed text files on the order of 10 MB in size this runs as fast as gunzip | tac. For files around 100 MB in size it is slower by a factor of two. This is a rather rudimentary estimate, and it depends on all manner of detail. But I am comfortable saying that it will be noticeably slower for larger files.
However, the code above has a particular problem with file sizes possible in this case, in the tens of GB. The good old gzip format has a limitation, nicely stated in the gzip manual:
The gzip format represents the input size modulo 2^32 [...]
So the sizes obtained by --list for files larger than 4 GB undermine the above optimization: we'll seek to a place early in the file instead of near its end (for a 17 GB file the size is reported by -l as 1 GB, so we seek there), and then in fact read the bulk of the file with getline.
The best solution would be to use the known value for the uncompressed data size -- if that is known. Otherwise, if the compressed file size exceeds 4 GB, seek to its compressed size (as far as we can safely go), and after that use read with very large chunks:
my $buf;                      # holds the most recently read chunk
my $len = 10*1_024_000;       # only hundreds of reads, but a large buffer
$z->read($buf, $len) while not $z->eof;
my @last_lines = split /\n/, $buf;
The last step depends on what actually needs to be done. If it is indeed to read lines backwards then you can do while (my $el = pop @last_lines) { ... } for example, or reverse the array and work from there. Note that the last read will likely be far smaller than $len.
On the other hand, it could so happen that the last read buffer is too small for what's needed; so one may want to always copy the needed number of lines and keep that across reads.
The buffer size to read ($len) clearly depends on specifics of the problem.
Finally, if this is too much bother you can pipe gunzip and keep a buffer of lines.
use String::ShellQuote qw(shell_quote);
my $num_lines = ...; # user supplied
my @last_lines;
my $cmd = shell_quote('gunzip', '-c', '--', $file);
my $pid = open my $fh, '-|', $cmd or die "Can't open $cmd: $!";
push @last_lines, scalar <$fh> for 1..$num_lines;  # pre-fill so the loop below need not check the count
while (<$fh>) {
    push @last_lines, $_;
    shift @last_lines;
}
close $fh;
while (my $line = pop @last_lines) {
    print $line;   # process backwards
}
I fill the array with $num_lines lines right away, so as not to have to test the size of @last_lines against $num_lines on every shift, that is, on every read. (This improves runtime by nearly 30%.)
Any hint of the number of lines (of uncompressed data) is helpful, so that we skip ahead and avoid copying data into variables, as much as possible.
# Stash $num_lines on array
<$fh> for 0..$num_to_skip; # skip over an estimated number of lines
# Now push+shift while reading
This can help quite a bit, but it depends on how well we can estimate the number of lines. Altogether, in my tests this is still slower than gunzip | tac | head, by around 50% in the very favorable case where I skip 90% of the file.
† The uncompressed size can be found without going to external tools as
my $us = do {
my $d;
open my $fh, '<', $file or die "Can't open $file: $!";
seek($fh, -4, 2) and read($fh, $d, 4) >= 4 and unpack('V', $d)
or die "Can't get uncompressed size: $!";
};
Thanks to mosvy for a comment with this.
If we still stick with using the system's gunzip, then the safety of running an external command with user input (the filename), largely sidestepped here by checking that the file exists, needs to be taken into account, for instance by using String::ShellQuote to compose the command:
use String::ShellQuote qw(shell_quote);
my $cmd = shell_quote('gunzip', '-l', '--', $file);
# my $us = ... qx($cmd) ...;
Thanks to ikegami for comment.

What is the Perl's IO::File equivalent to open($fh, ">:utf8",$path)?

It's possible to write a UTF-8 encoded file as follows:
open my $fh,">:utf8","/some/path" or die $!;
How do I get the same result with IO::File, preferably in 1 line?
I got this one, but does it do the same and can it be done in just 1 line?
my $fh_out = IO::File->new($target_file, 'w');
$fh_out->binmode(':utf8');
For reference, the script starts as follows:
use 5.020;
use strict;
use warnings;
use utf8;
# code here
Yes, you can do it in one line.
open accepts one, two or three parameters. With one parameter, it is just a front end for the built-in open function. With two or three parameters, the first parameter is a filename that may include whitespace or other special characters, and the second parameter is the open mode, optionally followed by a file permission value.
[...]
If IO::File::open is given a mode that includes the : character, it passes all the three arguments to the three-argument open operator.
So you just do this.
my $fh_out = IO::File->new('/some/path', '>:utf8');
It is the same as your first open line because it gets passed through.
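The same trick works if you prefer the stricter encoding layer, since any mode containing a ':' is passed through to three-argument open as well:
my $fh_out = IO::File->new('/some/path', '>:encoding(UTF-8)');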
I would suggest trying out Path::Tiny. For example, to open and write out your file:
use Path::Tiny;
path('/some/path')->spew_utf8(@data);
From the docs, on spew, spew_raw, spew_utf8
Writes data to a file atomically. [ ... ]
spew_raw is like spew with a binmode of :unix for a fast, unbuffered, raw write.
spew_utf8 is like spew with a binmode of :unix:encoding(UTF-8) (or PerlIO::utf8_strict ). If Unicode::UTF8 0.58+ is installed, a raw spew will be done instead on the data encoded with Unicode::UTF8.
The module integrates many tools for handling files and directories, paths and content. It is often simple calls like the above, but also method chaining, a recursive directory iterator, hooks for callbacks, etc. There is error handling throughout, consistent and thoughtful handling of edge cases, flock on input/output handles, its own tiny and useful class for exceptions ... see the docs.
Edit:
You could also use File::Slurp, if its use weren't discouraged, e.g.:
use File::Slurp qw(write_file);
write_file( 'filename', {binmode => ':utf8'}, $buffer ) ;
The first argument to write_file is the filename. The next argument is
an optional hash reference and it contains key/values that can modify
the behavior of write_file. The rest of the argument list is the data
to be written to the file.
Some good reasons not to use it:
Not reliable
Has some bugs
And, as @ThisSuitIsBlackNot said, File::Slurp is broken and wrong.

Writing a macro in Perl

open $FP, '>', $outfile or die $outfile." Cannot open file for writing\n";
I have this statement a lot of times in my code.
I want to keep the format same for all of those statements, so that when something is changed, it is only changed at one place.
In Perl, how should I go about resolving this situation?
Should I use macros or functions?
I have seen this SO thread How can I use macros in Perl?, but it doesn't say much about how to write a general macro like
#define fw(FP, outfile) open $FP, '>', \
$outfile or die $outfile." Cannot open file for writing\n";
First, you should write that as:
open my $FP, '>', $outfile or die "Could not open '$outfile' for writing:$!";
including the reason why open failed.
If you want to encapsulate that, you can write:
use Carp;
sub openex {
    my ($mode, $filename) = @_;
    open my $h, $mode, $filename
        or croak "Could not open '$filename': $!";
    return $h;
}
# later
my $FP = openex('>', $outfile);
Starting with Perl 5.10.1, autodie is in the core and I will second Chas. Owens' recommendation to use it.
Perl 5 really doesn't have macros (there are source filters, but they are dangerous and ugly, so ugly even I won't link you to the documentation). A function may be the right choice, but you will find that it makes it harder for new people to read your code. A better option may be to use the autodie pragma (it is core as of Perl 5.10.1) and just cut out the or die part.
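A minimal sketch of the autodie route (reusing the question's $outfile; the output string is just illustrative):
use autodie;                   # open, close, etc. now throw an exception with a useful message on failure
open my $FP, '>', $outfile;    # no 'or die' needed
print {$FP} "some output\n";
close $FP;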
Another option, if you use Vim, is to use snipMate. You just type fw<tab>FP<tab>outfile<tab> and it produces
open my $FP, '>', $outfile
or die "Couldn't open $outfile for writing: $!\n";
The snipMate text is
snippet fw
open my $${1:filehandle}, ">", $${2:filename variable}
or die "Couldn't open $$2 for writing: $!\n";
${3}
I believe other editors have similar capabilities, but I am a Vim user.
There are several ways to handle something similar to a C macro in Perl: a source filter, a subroutine, Template::Toolkit, or use features in your text editor.
Source Filters
If you gotta have a C / CPP style preprocessor macro, it is possible to write one in Perl (or, actually, any language) using a precompile source filter. You can write fairly simple to complex Perl classes that operate on the text of your source code and perform transformations on it before the code goes to the Perl compiler. You can even run your Perl code directly through a CPP preprocessor to get the exact type of macro expansions you get in C / CPP using Filter::CPP.
Damian Conway's Filter::Simple is part of the Perl core distribution. With Filter::Simple, you could easily write a simple module to perform the macro you are describing. An example:
package myopinion;
# save in your Perl's @INC path as "myopinion.pm"...
use Filter::Simple;
FILTER {
s/Hogs/Pigs/g;
s/Hawgs/Hogs/g;
}
1;
Then a Perl file:
use myopinion;
print join(' ',"Hogs", 'Hogs', qq/Hawgs/, q/Hogs/, "\n");
print "In my opinion, Hogs are Hogs\n\n";
Output:
Pigs Pigs Hogs Pigs
In my opinion, Pigs are Pigs
If you rewrote the FILTER to make the substitution for your desired macro, Filter::Simple should work fine. Filter::Simple can be restricted to making substitutions only in parts of your code, such as the executable part but not the POD; only in strings; only in code.
Source filters are not widely used, in my experience. I have mostly seen them in lame attempts to encrypt Perl source code or in humorous Perl obfuscators. In other words, I know it can be done this way but I personally don't know enough about them to recommend them or say not to use them.
Subroutines
Sinan Ünür's openex subroutine is a good way to accomplish this. I will only add that a common older idiom that you will see involves passing a typeglob around like this:
sub opensesame {
my $fn=shift;
local *FH;
return open(FH,$fn) ? *FH : undef;
}
$fh=opensesame('> /tmp/file');
Read perldata for why it is this way...
Template Toolkit
Template::Toolkit can be used to process Perl source code. For example, you could write a template along the lines of:
[% fw(fp, outfile) %]
running that through Template::Toolkit can result in expansion and substitution to:
open my $FP, '>', $outfile or die "$outfile could not be opened for writing:$!";
Template::Toolkit is most often used to separate the messy HTML and other presentation code from the application code in web apps. Template::Toolkit is very actively developed and well documented. If your only use is a macro of the type you are suggesting, it may be overkill.
Text Editors
Chas. Owens has a method using Vim. I use BBEdit and could easily write a Text Factory to replace the skeleton of a open with the precise and evolving open that I want to use. Alternately, you can place a completion template in your "Resources" directory in the "Perl" folder. These completion skeletons are used when you press the series of keys you define. Almost any serious editor will have similar functionality.
With BBEdit, you can even use Perl code in your text replacement logic. I use Perl::Critic this way. You could use Template::Toolkit inside BBEdit to process the macros with some intelligence. It can be set up so the source code is not changed by the template until you output a version to test or compile; the editor is essentially acting as a preprocessor.
There are two potential issues with using a text editor. First, it is a one-way / one-time transform. If you want to change what your "macro" does, you can't, since the previous text of your "macro" was already used; you have to change each instance manually. Second, if you use a template form, you can't send the macro version of the source code to someone else, because of the preprocessing being done inside the editor.
Don't Do This!
If you type perl -h to get valid command switches, one option you may see is:
-P run program through C preprocessor before compilation
Tempting! Yes, you can run your Perl code through the C preprocessor and expand C style macros and have #defines. Put down that gun; walk away; don't do it. There are many platform incompatibilities and language incompatibilities.
You get issues like this:
#!/usr/bin/perl -P
#define BIG small
print "BIG\n";
print qq(BIG\n);
Prints:
BIG
small
In Perl 5.12 the -P switch has been removed...
Conclusion
The most flexible solution here is just to write a subroutine. All your code is visible in the subroutine, easily changed, and the call is shorter. No real downside, other than possibly the readability of your code.
Template::Toolkit is widely used. You can write complex replacements that act like macros or even more complex than C macros. If your need for macros is worth the learning curve, use Template::Toolkit.
For very simple cases, use the one way transforms in an editor.
If you really want C style macros, you can use Filter::CPP. This may have the same incompatibilities as the perl -P switch. I cannot recommend this; just learn the Perl way.
If you want to run Perl one liners and Perl regexs against your code before it compiles, use Filter::Simple.
And don't use the -P switch. You can't on newer versions of Perl anyway.
For something like open, I think it's useful to include close in your factored-out routine. Here's an approach that looks a bit weird but encapsulates a typical open/close idiom.
our $FP;   # the handle variable used inside the block
sub with_file_do (&$$) {
    my ($code, $mode, $file) = @_;
    open my $fp, $mode, $file or die "Could not open '$file' for writing: $!";
    local $FP = $fp;
    $code->();   # perhaps wrap in an eval
    close $fp;
}
# usage
with_file_do {
print $FP "whatever\n";
# other output things with $FP
} '>', $outfile;
Having the open params specified at the end is a bit weird, but it allows you to avoid having to specify the sub keyword.

Are there reasons to ever use the two-argument form of open(...) in Perl?

Are there any reasons to ever use the two-argument form of open(...) in Perl rather than the three-or-more-argument versions?
The only reason I can come up with is the obvious observation that the two-argument form is shorter. But assuming that verbosity is not an issue, are there any other reasons that would make you choose the two-argument form of open(...)?
One- and two-arg open applies any default layers specified with the -C switch or the open pragma; three-arg open does not. In my opinion, this functional difference is the strongest reason to choose one or the other (and the choice will vary depending on what you are opening). Which is easiest, most descriptive, or "safest" (you can safely use two-arg open with arbitrary filenames, it's just not as convenient) takes a back seat in module code; in script code you have more discretion to choose whether you will support default layers or not.
Also, one-arg open is needed for Damian Conway's file slurp operator
$_ = "filename";
$contents = readline!open(!((*{!$_},$/)=\$_));
Imagine you are writing a utility that accepts an input file name. People with reasonable Unix experience are used to substituting - for STDIN. Perl handles that automatically only when the magical form is used where the mode characters and file name are one string, else you have to handle this and similar special cases yourself. This is a somewhat common gotcha, I am surprised no one has posted that yet. Proof:
use IO::File qw();
my $user_supplied_file_name = '-';
IO::File->new($user_supplied_file_name, 'r') or warn "IO::File/non-magical mode - $!\n";
IO::File->new("<$user_supplied_file_name") or warn "IO::File/magical mode - $!\n";
open my $fh1, '<', $user_supplied_file_name or warn "non-magical open - $!\n";
open my $fh2, "<$user_supplied_file_name" or warn "magical open - $!\n";
__DATA__
IO::File/non-magical mode - No such file or directory
non-magical open - No such file or directory
Another small difference: the two-argument form trims whitespace from the filename.
$foo = " fic";
open(MH, ">$foo");
print MH "toto\n";
This writes to a file named fic.
On the other hand
$foo = " fic";
open(MH, ">", $foo);
print MH "toto\n";
This will write to a file whose name begins with a space.
For short admin scripts with user input (or configuration file input), not having to bother with such details as trimming filenames is nice.
The two argument form of open was the only form supported by some old versions of perl.
If you're opening from a pipe, the three-argument form isn't really helpful. Getting the equivalent of the three-argument form involves doing a safe pipe open (open(FILE, '-|')) and then executing the program yourself.
So for simple pipe opens (e.g. open(FILE, 'ps ax |')), the two argument syntax is much more compact.
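For comparison, the list form of the three-argument pipe open looks like this, and it avoids the shell entirely (useful when arguments come from user input):
open my $fh, '-|', 'ps', 'ax' or die "Can't run ps: $!";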
I think William's post pretty much hits it. Otherwise, the three-argument form is going to be more clear, as well as safer.
See also:
What's the best way to open and read a file in Perl?
Why is three-argument open calls with autovivified filehandles a Perl best practice?
One reason to use the two-argument version of open is if you want to open something which might be a pipe, or a file. If you have one function
sub strange
{
    my ($file) = @_;
    open my $input, $file or die $!;
}
then you want to call this either with a filename like "file":
strange ("file");
or a pipe like "zcat file.gz |"
strange ("zcat file.gz |");
Depending on what kind of "file" you are handed, the two-argument version may be used. You will actually see the above construction in "legacy" Perl. However, the most sensible thing might be to open the filehandle appropriately and pass the filehandle to the function, rather than passing the file name like this.
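A small sketch of that last suggestion (the names are made up for illustration):
sub process_lines {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        # work with $line
    }
}
open my $in, '<', 'file' or die $!;
process_lines($in);
open my $zcat, '-|', 'zcat', 'file.gz' or die $!;
process_lines($zcat);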
When you are combining strings or using a variable, it can be rather unclear whether a '<' or '>' etc. is already included. In such cases I personally prefer readability, which means I use the longer form:
open($FILE, '>', $varfn);
When I simply use a constant, I prefer the ease of typing (and actually consider the short version more readable anyway, or at least on a par with the long version).
open($FILE, '>somefile.xxx');
I'm guessing you mean open(FH, '<filename.txt') as opposed to open(FH, '<', 'filename.txt') ?
I think it's just a matter of preference. I always use the former out of habit.

How do I serve a large file for download with Perl?

I need to serve a large file (500+ MB) for download from a location that is not accessible to the web server. I found the question Serving large files with PHP, which is identical to my situation, but I'm using Perl instead of PHP.
I tried simply printing the file line by line, but this does not cause the browser to prompt for download before grabbing the entire file:
use Tie::File;
open my $fh, '<', '/path/to/file.txt';
tie my @file, 'Tie::File', $fh
or die 'Could not open file: $!';
my $size_in_bytes = -s $fh;
print "Content-type: text/plain\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";
for my $line (@file) {
    print $line;
}
untie @file;
close $fh;
exit;
Does Perl have an equivalent to PHP's readfile() function (as suggested with PHP) or is there a way to accomplish what I'm trying to do here?
If you just want to slurp input to output, this should do the trick.
use Carp ();
{ #Lexical For FileHandle and $/
open my $fh, '<' , '/path/to/file.txt' or Carp::croak("File Open Failed");
local $/ = undef;
print scalar <$fh>;
close $fh or Carp::carp("File Close Failed");
}
I guess this is in response to "Does Perl have a PHP readfile equivalent", and my answer would be "it doesn't really need one".
I've used PHP's manual file I/O controls and they're a pain; Perl's are just so easy to use by comparison that shelling out for a one-size-fits-all function seems like overkill.
Also, you might want to look at X-SendFile support, and basically send a header to your webserver to tell it what file to send: http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/ (assuming, of course, that it has sufficient permissions to access the file, but the file is just not normally accessible via a standard URI).
Edit: Noted, it is better to do it in a loop. I tested the above code with a hard drive, and it does implicitly try to store the whole thing in an invisible temporary variable and eat all your RAM.
Alternative using blocks
The following improved code reads the given file in blocks of 8192 chars, which is much more memory efficient, and it gets throughput respectably comparable with my disk's raw read rate. (I also pointed it at /dev/full for fits and giggles and got a healthy 500 MB/s throughput, and it didn't eat all my RAM, so that must be good.)
{
open my $fh , '<', '/dev/sda' ;
local $/ = \8192; # this tells IO to use 8192 char chunks.
print $_ while defined ( $_ = scalar <$fh> );
close $fh;
}
Applying jrockway's suggestions
{
open my $fh , '<', '/dev/sda5' ;
print $_ while ( sysread $fh, $_ , 8192 );
close $fh;
}
This literally doubles performance ... and in some cases gets me better throughput than dd does O_o.
The readline function is called readline (and can also be written as <>).
I'm not sure what problem you're having. Perhaps that for loops aren't lazily evaluated (which they're not). Or perhaps Tie::File is screwing something up? Anyway, the idiomatic Perl for reading a file a line at a time is:
open my $fh, '<', $filename or die ...;
while(my $line = <$fh>){
# process $line
}
No need to use Tie::File.
Finally, you should not be handling this sort of thing yourself. This is a job for a web framework. If you were using Catalyst (or HTTP::Engine), you would just say:
open my $fh, '<', $filename ...
$c->res->body( $fh );
and the framework would automatically serve the data in the file efficiently. (Using stdio via readline is not a good idea here; it's better to read the file in blocks from the disk. But who cares, it's abstracted!)
You could use my Sys::Sendfile module. It should be highly efficient (as it uses sendfile under the hood), but it is not entirely portable (only Linux, FreeBSD and Solaris are currently supported).
When you say "this does not cause the browser to prompt for download" -- what's "the browser"?
Different browsers behave differently, and IE is particularly wilful, it will ignore headers and decide for itself what to do based on reading the first few kb of the file.
In other words, I think your problem may be at the client end, not the server end.
Try lying to "the browser" and telling it the file is of type application/octet-stream. Or why not just zip the file, especially as it's so huge.
Don't use for/foreach (<$input>) because it reads the whole file at once and then iterates over it. Use while (<$input>) instead. The sysread solution is good, but the sendfile is the best performance-wise.
Answering the (original) question ("Does Perl have an equivalent to PHP's readline() function ... ?"), the answer is "the angle bracket syntax":
open my $fh, '<', '/path/to/file.txt';
while (my $line = <$fh>) {
print $line;
}
Getting the content-length with this method isn't necessarily easy, though, so I'd recommend staying with Tie::File.
NOTE
Using:
for my $line (<$filehandle>) { ... }
(as I originally wrote) copies the contents of the file to a list and iterates over that. Using
while (my $line = <$filehandle>) { ... }
does not. When dealing with small files the difference isn't significant, but when dealing with large files it definitely can be.
Answering the (updated) question ("Does Perl have an equivalent to PHP's readfile() function ... ?"), the answer is slurping. There are a couple of syntaxes, but Perl6::Slurp seems to be the current module of choice.
The implied question ("why doesn't the browser prompt for download before grabbing the entire file?") has absolutely nothing to do with how you're reading in the file, and everything to do with what the browser thinks is good form. I would guess that the browser sees the mime-type and decides it knows how to display plain text.
Looking more closely at the Content-Disposition problem, I remember having similar trouble with IE ignoring Content-Disposition. Unfortunately I can't remember the workaround. IE has a long history of problems here (old page, refers to IE 5.0, 5.5 and 6.0). For clarification, however, I would like to know:
What kind of link are you using to point to this big file (i.e., are you using a normal a href="perl_script.cgi?filename.txt" link, or are you using JavaScript of some kind)?
What system are you using to actually serve the file? For instance, does the webserver make its own connection to the other computer without a webserver, and then copy the file to the webserver and then send the file to the end user, or does the user make the connection directly to the computer without a webserver?
In the original question you wrote "this does not cause the browser to prompt for download before grabbing the entire file" and in a comment you wrote "I still don't get a download prompt for the file until the whole thing is downloaded." Does this mean that the file gets displayed in the browser (since it's just text), that after the browser has downloaded the entire file you get a "where do you want to save this file" prompt, or something else?
I have a feeling that there is a chance the HTTP headers are getting stripped out at some point or that a Cache-control header is getting added (which apparently can cause trouble).
I've successfully done it by telling the browser it was of type application/octet-stream instead of type text/plain. Apparently most browsers prefer to display text/plain inline instead of giving the user a download dialog option.
It's technically lying to the browser, but it does the job.
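In CGI terms that just means changing the Content-type line in the question's header block, e.g.:
print "Content-Type: application/octet-stream\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";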
The most efficient way to serve a large file for download depends on a web-server you use.
In addition to @Kent Fredric's X-Sendfile suggestion:
File Downloads Done Right has some links that describe how to do it for Apache, lighttpd (mod_secdownload: security via URL generation), and nginx. There are examples in PHP, Ruby (Rails), and Python which can be adapted for Perl.
Basically it boils down to:
Configure paths, and permissions for your web-server.
Generate valid headers for the redirect in your Perl app (Content-Type, Content-Disposition, Content-Length?, X-Sendfile or X-Accel-Redirect, etc.); a rough sketch follows below.
There are probably CPAN modules or web-framework plugins that do exactly that; e.g., @Leon Timmermans mentioned Sys::Sendfile in his answer.
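As a rough illustration of the header-generation step, assuming Apache with mod_xsendfile enabled and configured to allow the (hypothetical) path:
print "X-Sendfile: /not/under/docroot/file.txt\n";
print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";
# no body is printed; the web server streams the file itself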