I want to extract some data from a large-ish (3+ GB, gzipped) FTP download, and do this on-the-fly, to avoid dumping then full download on my disk.
To extract the desired data I need to examine the uncompressed stream line-by-line.
So I'm looking for the moral equivalent of
use PerlIO::gzip;
my $handle = open '<:gzip', 'ftp://ftp.foobar.com/path/to/blotto.txt.gz'
or die $!;
for my $line (<$handle>) {
# etc.
}
close($handle);
FWIW: I know how to open a read handle to ftp://ftp.foobar.com/path/to/blotto.txt.gz (with Net::FTP::repr), but I have not yet figured out how to add a :gzip layer to this open handle.
It took me a lot longer than it should have to find the answer to the question above, so I thought I'd post it for the next person who needs it.
OK, the answer is (IMO) not at all obvious: binmode($handle, ':gzip').
Here's a fleshed-out example:
use strict;
use Net::FTP;
use PerlIO::gzip;
my $ftp = Net::FTP->new('ftp.foobar.com') or die $#;
$ftp->login or die $ftp->message; # anonymous FTP
my $handle = $ftp->retr('/path/to/blotto.txt.gz') or die $ftp->message;
binmode($handle, ':gzip');
for my $line (<$handle>) {
# etc.
}
close($handle);
The code below is from IO::Compress FAQ
use Net::FTP;
use IO::Uncompress::Gunzip qw(:all);
my $ftp = new Net::FTP ...
my $retr_fh = $ftp->retr($compressed_filename);
gunzip $retr_fh => $outFilename, AutoClose => 1
or die "Cannot uncompress '$compressed_file': $GunzipError\n";
To get the data line by line, change it to this
use Net::FTP;
use IO::Uncompress::Gunzip qw(:all);
my $ftp = new Net::FTP ...
my $retr_fh = $ftp->retr($compressed_filename);
my $gunzip = new IO::Uncompress::Gunzip $retr_fh, AutoClose => 1
or die "Cannot uncompress '$compressed_file': $GunzipError\n";
while(<$gunzip>)
{
...
}
Related
I am facing read the file error while i am uploading a file using perl like this
fileparse_set_fstype('MSWin32');
my ($OriginalName,$OriginalPath) = fileparse( $CgiRef->{'filename'} );
my $LocalName = $_ . $OriginalName;
open(FILE, ">$config->{'BASE_PATH'}/files/$LocalName")
or die "Could not open file:$!";
my $Req = new CGI;
while (read($Req->param('filename'), my $Buffer, 1024))
{
print FILE $Buffer;
}
close(FILE)
And There is no problem in accesing $CgiRef->{'$filename'} or any refernce variables.
please let me know where is the actual problem while uploading.
now it shows the error
read() on unopened filehandle
You're trying to read from the wrong place. In CGI-land, use $cgi->upload('varname') to get a filehandle on the object you're trying to receive.
This modified version of your snippet should work:
fileparse_set_fstype('MSWin32');
my ($OriginalName,$OriginalPath) = fileparse( $CgiRef->{'filename'} );
my $LocalName = $_ . $OriginalName;
open(FILE, ">", "$config->{'BASE_PATH'}/files/$LocalName")
or die "Could not open file:$!";
my $Req = CGI->new();
# Get the filehandle for the upload content
my $Req_file = $Req->upload('filename');
# Save to FILE
while (<$Req_file>) {
print FILE;
}
close(FILE);
Please note, always use the 3 param version of open. It's cleaner, safer, and clearer. See Modern Perl for an explanation.
A full example of the whole process from HTML form to CGI processing can be found here.
A page that prints out a file (on the server) contents and provides a direct download link.
Download File HERE
Start contents of file:
line 1
line 2
line 3
...
I am not sure of the best way and the right header that will allow a download link and HTML text. This prints out blank
print $mycgi->header(
-cookie => $mycookie,
-Type => "application/x-download"
-'Content-Disposition'=>'attachment; filename="FileName"'
);
You can include a link to a script and pass the filename as a parameter. The link might look something like this:
http://url/to/script?action=download&file=foo
Below that, simply print the contents of the file:
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI qw/escapeHTML/;
my $q = CGI->new;
print $q->header,
$q->start_html('foo'),
$q->a({ -href => 'http://url/to/script?action=download&file=foo' }, 'Click to download'),
"<pre>";
open my $fh, "<", "/path/to/file" or die $!;
print escapeHTML($_) while <$fh>;
close $fh;
print "</pre>", $q->end_html;
Note that you should use escapeHTML() to prevent the browser from rendering anything in the file as HTML (which the <pre> tag alone does not take care of).
When the script is called with the action parameter set to download, use the application/x-download content type as you did above:
my $q = CGI->new;
# Untaint parameters
my ($action) = ($q->param('action') =~ /^(\w+)$/g);
my ($file) = ($q->param('file') =~ /^([-.\w]+)$/g);
# Map file parameter to the actual file name on your filesystem.
# The user should never know the actual file name. There are many
# ways you could implement this.
???
if ($action eq "download") {
print $q->header(
-type => "application/x-download",
-'Content-Disposition' => 'attachment; filename="FileName"'
);
open my $fh, "<", $file or die "Failed to open `$file' for reading: $!";
print while <$fh>;
close $fh;
}
Note that you also need to print the contents of the file in the body of the response.
I'm using an XPath like "school/student[4]". Could the setNodeText function save to harddisk? My changes only seem to be made in memory.
If I understand correctly, you are trying to change a document then write it to disk.
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
my $doc = $parser->parse_fh(...);
my $root = $doc->documentElement();
for my $node ($root->findnodes('//school/student[4]')) {
$node->removeChildNodes();
$node->appendText("New text");
}
open(my $fh, '>:raw', ...) or die $!;
print($fh $doc->toString());
You can dump the XML structure using the undocumented method getNodeAsXML. The output isn't garanteed to be valid XML (e.g. no header), but it usually does the trick.
my $str = $xp->getNodeAsXML();
print $str;
Source: http://www.perlmonks.org/?node_id=567212
I have a problem when I use Apache::DBI in child processes. The problem is that Apache::DBI provides a single handle for all processes which use it, so I get
DBD::mysql::db selectall_arrayref
failed: Commands out of sync; you
can't run this command now at
/usr/local/www/apache22/data/test-fork.cgi
line 20.
Reconnection doesn't help, since Apache::DBI reconnects in all processes, as I understood the following error
The server encountered an internal
error and was unable to complete your
request.
Error message: DBD driver has not
implemented the AutoCommit attribute
at
/usr/local/lib/perl5/site_perl/5.8.9/Apache/DBI.pm
line 283. ,
Here's the origin code:
use Data::Dumper 'Dumper';
use DBI ();
my $dbh = DBI->connect($dsn, $username, $password, {
RaiseError => 1,
PrintError => 0,
});
my $file = "/tmp/test-fork.tmp";
my $pid = fork;
defined $pid or die "fork: $!";
if ($pid) {
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
print "Content-Type: text/plain\n\n";
print $rows ? "parent: " . Dumper($rows) : $#;
}
else {
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
open FH, '>', $file or die "$file: $!";
print FH $rows ? "child: " . Dumper($rows) : $#;
close FH;
}
The code I used for reconnection:
...
else {
$dbh->disconnect;
$dbh = DBI->connect($dsn, $username, $password, $attrs);
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
open FH, '>', $file or die "$file: $!";
print FH $rows ? "child: " . Dumper($rows) : $#;
close FH;
}
Is there a safe way to use Apache::DBI with forking? Is there a way to make it create a new connection perhaps?
I see a few options:
Explicitly close your DB handles when you fork, and reopen them as needed.
e.g.:
my $dbh = DBI->connect(...);
my $pid = fork;
defined $pid or die "fork: $!";
if ($pid) {
# parent...
}
else {
# child...
undef $dbh;
This could be made easier by storing the $dbh in an object, and passing around that object as needed to parts of your system. The object would be responsible for reopening the $dbh as needed, so the rest of the application doesn't have to concern itself with the details. Keep code encapsulated and well-separated from other parts of the system.
Don't use Apache::DBI. I can highly recommend DBIx::Connector, which opens a new connection as needed and doesn't preserve the bad behaviour of either plain DBI or Apache::DBI: see http://search.cpan.org/~dwheeler/DBIx-Connector-0.32/lib/DBIx/Connector.pm#Description for a detailed description of how it differs.
I use DBIx::Connector in my system inside a Moose object, which uses a method delegation to provide the dbh. The application simply does:
my $dbh = $db_dbj->dbh;
my $sth = $dbh->prepare(...);
# more boring DBI code here
...And the dbh is reconnected/regenerated as needed, invisibly.
As an aside, you should be really careful of using bare filehandles in a multiprocess environment. You could be very easily clobbering your data. open (my $fh, $file) or die "Cannot open $file: $!" is much safer.
I'm also a little nervous by seeing you using eval {} blocks without checking the contents of $#. You're just masking errors, rather than dealing with them, so there may be more things going on than you are aware of. Check your result values (or better, use an explicit exception-handling module, such as Try::Tiny. use use strict; use warnings;.
PS. I just noticed that you are explicitly including DBI in your code. Don't do that. If you use Apache::DBI in your startup_modperl.pl (or whatever you call your bootstrap file), you should never have to include DBI itself. I can't say for sure but I wouldn't be confident the right package is getting called (it's been a while since I looked at Apache::DBI's guts; it might take care of this for you though).
Don't fork under mod_perl2. Use Apache2::Subprocess. See also Is it a bad idea to fork under mod_perl2?
You are given either an IO::File object or a typeglob (\*STDOUT or Symbol::symbol_to_ref("main::FH")); how would you go about determining if it is a read or write handle? The interface cannot be extended to pass this information (I am overriding close to add calls to flush and sync before the actual close).
Currently I am attempting to flush and sync the filehandle and ignoring the error "Invalid argument" (which is what I get when I attempt to flush or sync a read filehandle):
eval { $fh->flush; 1 } or do {
#this seems to exclude flushes on read handles
unless ($! =~ /Invalid argument/) {
croak "could not flush $fh: $!";
}
};
eval { $fh->sync; 1 } or do {
#this seems to exclude syncs on read handles
unless ($! =~ /Invalid argument/) {
croak "could not sync $fh: $!";
}
};
Have a look at the fcntl options. Maybe F_GETFL with O_ACCMODE.
Edit: I did a little googling and playing over lunch and here is some probably non-portable code but it works for my Linux box, and probably any Posix system (perhaps even Cygwin, who knows?).
use strict;
use Fcntl;
use IO::File;
my $file;
my %modes = ( 0 => 'Read only', 1 => 'Write only', 2 => 'Read / Write' );
sub open_type {
my $fh = shift;
my $mode = fcntl($fh, F_GETFL, 0);
print "File is: " . $modes{$mode & 3} . "\n";
}
print "out\n";
$file = new IO::File();
$file->open('> /tmp/out');
open_type($file);
print "\n";
print "in\n";
$file = new IO::File();
$file->open('< /etc/passwd');
open_type($file);
print "\n";
print "both\n";
$file = new IO::File();
$file->open('+< /tmp/out');
open_type($file);
Example output:
$ perl test.pl
out
File is: Write only
in
File is: Read only
both
File is: Read / Write