Encrypt file in Perl

I created the algorithm to encrypt and decrypt a string in Perl (using AES in CFB mode). Now I want to extend the encryption to the file level. How should I get the content of the file? What would be a good approach?
Read the file normally
open(my $fh, "<", "myTestFile.ext");
Read the file in binmode
open(my $fh, "<", "myTestFile.ext");
binmode $fh;
Then how should I store the content of the files?
a) Read all the content of the file in one string and provide the string to the implemented program
my $document = do {
    local $/ = undef;
    <$fh>; # file handle opened previously
};
encryptionAlgorithm($document);
b) Read the content of the file line by line
while (my $line = <$fh>) {
    encryptionAlgorithm($line);
}
In both cases should I chomp the \n's ?

AES encrypts blocks of 128 bits (16 bytes), so you'll want to read your file 16 bytes at a time. To do this, you need to binmode your file, then read it with the read builtin:
open my $fh, '<', 'myTestFile.ext' or die $!;
binmode $fh;
while (read($fh, my $block, 16)) {
    # encrypt $block
}
Note that I've added or die $! after opening the file: you want to always make sure your open worked.
Also, don't forget that if the block you read is less than 16 bytes long, you'll have to do some padding. (I don't recall how the blocks are padded for AES, but I trust you do since you are implementing it)
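For instance, here is a minimal sketch of PKCS#7-style padding for that short final block (the padding scheme is an assumption on my part; use whatever scheme your implementation actually expects):
my $block_size = 16;
# Pad the last block up to the block size with bytes whose value is the
# number of padding bytes added (PKCS#7-style; assumed, not prescribed).
my $pad = $block_size - length($block) % $block_size;
$block .= chr($pad) x $pad;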
About the approaches you thought of:
Reading the entire file at once will potentially consume a lot of memory if the file is big.
Reading the file line by line: if the file contains no newline, then you'll read it entirely at once, which might consume a lot of memory. And if lines contain a number of bytes which isn't a multiple of 16, then you'll have to combine bytes from different lines, which will require more work than simply reading blocks of 16 bytes.
Also, you definitely don't want to chomp anything! You should have decrypt(encrypt(file)) == file, but if you chomp the newlines, that won't be the case anymore.

Related

Out of memory when serving a very big binary file over HTTP

The code below is the original code of a Perl CGI script we are using. Even for very big files it seems to be working, but not for really huge files.
The current code is:
$files_location = $c->{target_dir}.'/'.$ID;
open(DLFILE, "<$files_location") || Error('open', 'file');
@fileholder = <DLFILE>;
close (DLFILE) || Error ('close', 'file');
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
print @fileholder;
binmode $DLFILE;
If I understand the code correctly, it is loading the whole file into memory before "printing" it. Of course I suppose it would be a lot better to load and send it in chunks? But after having read many forums and tutorials I am still not sure how to do it best, with standard Perl libraries...
Last question: why is "binmode" specified at the end?
Thanks a lot for any hint or advice,
I have no idea what binmode $DLFILE is for. $DLFILE has nothing to do with the file handle DLFILE, and it's a bit late to set the binmode of the file now that it has been read to the end. It's probably just a mistake.
You can use this instead. It uses modern Perl best practices and reads and sends the file in 8K chunks.
The file name seems to be made from $ID, so I'm not sure that $name would be correct, but I can't tell.
Make sure to keep the braces, as the block makes Perl restore the old value of $/ and close the open file handle.
my $files_location = "$c->{target_dir}/$ID";
{
    print "Content-Type: application/x-download\n";
    print "Content-Disposition: attachment; filename=$name\n\n";
    open my $fh, '<:raw', $files_location or Error('open', "file $files_location");
    local $/ = \( 8 * 1024 );
    print while <$fh>;
}
You're pulling the entire file at once into memory. Best to loop over the file line-by-line, which eliminates this problem.
Note also that I've modified the code to use the proper 3-arg open, and to use a lexical file handle instead of a global bareword one.
open my $fh, '<', $files_location or die $!;
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
while (my $line = <$fh>) {
    print $line;
}
The binmode call appears to be useless in the context of what you've shown here, as $DLFILE doesn't appear to be a valid, in-use variable (add use strict; and use warnings; at the top of your script...)
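If the downloads can be binary files, a binary-safe variant of the same loop is to open with the :raw layer and read fixed-size chunks rather than lines (a sketch, not the original script):
open my $fh, '<:raw', $files_location or die $!;
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
# read() returns the number of bytes read, so the loop ends at EOF
while (read($fh, my $chunk, 8 * 1024)) {
    print $chunk;
}
close $fh;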

Holding files in memory in perl while using them like file handles

I have a script in perl that I need to modify. The script opens, reads and seeks through two large (ASCII) files (they are several GB in size). Since it does that quite a bit, I would like to put these two files completely into RAM. The easiest way of doing this while not modifying the script a lot would be to load the files into the memory in a way that I can treat the resulting variable just as a file handle - and for example use seek to get to a specific byte position. Is that possible in perl?
Update: Using File::Slurp as proposed does the job only for small files. If the files are larger than about 2GB, it doesn't work.
Minimum example:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
use File::Slurp 'read_file';
my $fn="testfile";
#buffer, then open as file, read first line:
read_file($fn, buf_ref => \my $file_contents_forests) or die "Could not read file!";
my $filehandle;
open($filehandle, "<", \$file_contents_forests) or die "Could not open buffer: $!\n";
my $line = "the first line:".<$filehandle>;
print $line."\n";
close($filehandle);
#open as file, read first line:
open( FORESTS, "<",$fn) or die "Could not open file.\n";
my $line = "the first line:".<FORESTS>;
print $line;
close(FORESTS);
The output in this case is identical for the two methods if the file size is < 2 GB. If the file is larger, then slurping returns an empty line.
Read in the file:
use File::Slurp 'read_file';
read_file( "filename", buf_ref => \my $file_contents );
and open a filehandle to it:
open my $file_handle, '<', \$file_contents;
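If File::Slurp is the part that breaks past 2 GB, a plain-Perl alternative is to slurp with open and local $/ and then open an in-memory filehandle on the resulting scalar; on a 64-bit Perl this should not be limited to 2 GB. A sketch, reusing the question's testfile name:
open my $in, '<:raw', 'testfile' or die "Could not open: $!";
my $file_contents = do { local $/; <$in> };   # slurp the whole file
close $in;
open my $mem_fh, '<', \$file_contents or die "Could not open in-memory handle: $!";
seek $mem_fh, 1024, 0;                        # seek works on in-memory handles
read $mem_fh, my $buffer, 16;                 # and so does read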

Read & Seek in gzip files Perl

I am trying to read a given set of gzip/plain XML files and print some portions of these files into output XML files, based on given offset and length values.
The offset values are the keys of the hash %offhash and the corresponding values are the lengths.
Here is the function I used for generating the output files:
sub fileproc {
    my $infile  = shift;
    my $outfile = shift;
    my $FILEH;
    $| = 1;
    $outfile =~ s/.gz$//;
    if ($infile =~ m/\.gz$/i) {
        open($FILEH, "gunzip -c $infile | ") or die "Could not open input $infile";
    }
    else {
        open($FILEH, "<", $infile) or die "Could not open input $infile";
    }
    open(my $OUTH, ">", $outfile) or die "Couldn't open file, $!";
    foreach my $offset (sort { $a <=> $b } keys %offhash) {
        my $record = "";
        seek($FILEH, $offset, 0);
        read($FILEH, $record, $offhash{$offset}, 0);
        print $OUTH "$record";
    }
    close $FILEH;
    close $OUTH;
}
This function works properly for plain XML input files, but it creates a buffering issue when some (or all) of the input files are .xml.gz: the output file then contains data from previously read (.gz) input files.
It seems the problem is in this line:
open( $FILEH,"gunzip -c $infile | ") or die "Could not open input $infile";
Can anyone help me to resolve this issue?
Thanks in advance.
You can only seek in regular files, not in the output of programs, STDIN, etc. If you want to do this, you need to add a buffering layer yourself, but note that you might need to buffer the whole uncompressed file just to be able to seek in it.
Even if you don't gunzip with an external program but use something like IO::Gzip, you will not be able to seek, because of the inherent way gzip (and other compression formats) work: you need to read all the previous data to be able to decompress the data at the current file position. There are ways around this that limit the amount of previous data needed, but then you would need to specifically prepare your gzip file, and it will grow bigger. I'm not aware of any module which currently implements this, but I did a proof of concept once, so I know it works.
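A sketch of that buffering approach, using IO::Uncompress::Gunzip to decompress the whole file into a scalar and then seeking in an in-memory filehandle (this assumes the uncompressed data fits in RAM):
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $uncompressed;
gunzip($infile => \$uncompressed)
    or die "gunzip failed for $infile: $GunzipError";
# An in-memory filehandle behaves like a regular file, so seek/read work on it.
open(my $FILEH, '<', \$uncompressed) or die "Could not open in-memory handle: $!";
foreach my $offset (sort { $a <=> $b } keys %offhash) {
    seek($FILEH, $offset, 0);
    read($FILEH, my $record, $offhash{$offset});
    # print $record to the output file as before
}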

Perl printing binary to files - cr lf

I am not a regular Perl programmer and I could not find anything about this in the forum or few books I have.
I am trying to write binary data to a file using the construct:
print filehandle $record
I note that all of my records truncate when an x'0A' is encountered, so apparently Perl uses the LF as an end-of-record indicator. How can I write the complete records, using, for example, a length specifier? I am worried about Perl tampering with other binary "non printables" as well.
thanks
Fritz
You want to use
open(my $fh, '<', $qfn) or die $!;
binmode($fh);
or
open(my $fh, '<:raw', $qfn) or die $!;
to prevent modifications. Same goes for output handles.
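For the writing side described in the question, the same idea looks like this (a sketch; $out_qfn and $record are placeholders):
open(my $out_fh, '>:raw', $out_qfn) or die $!;   # $out_qfn: output file name (placeholder)
print {$out_fh} $record;                         # written byte for byte, no newline translation
close($out_fh);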
This "truncation at 0A" talk makes it sound like you're using readline and expect to do something other than read a line.
Well, actually, it can! You just need to tell readline you want it to read fix width records.
local $/ = \4096;
while (my $rec = <$fh>) {
    ...
}
The other alternative would be to use read.
while (1) {
    my $rv = read($fh, my $rec, 4096);
    die $! if !defined($rv);
    last if !$rv;
    ...
}
See also: binmode, open, read, readline (aka <> and <$fh>), and $/.
Perl is not "tampering" with your writes. If your records are being truncated when they encounter a line feed, then that's a problem with the code that reads them, not the code that writes them. (Unless the format specifies that line feeds must be escaped, in which case the "problem" with the code writing the file is that it doesn't tamper with the data (by escaping line feeds) and instead writes exactly what you tell it to.)
Please provide a small (but runnable) code sample demonstrating your issue, ideally including both reading and writing, along with the actual result and the desired result, and we'll be able to give more specific help.
Note, however, that \n does not map directly to a single data byte (ASCII character) unless you're in binary mode. If the file is being read or written in text mode, \n could be just a CR, just a LF, or a CRLF, depending on the operating system it's being run under.
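If you do want the length specifier mentioned in the question, one common approach is to prefix each record with its length and read it back the same way. A sketch, where the 32-bit big-endian 'N' prefix and the @records array are assumptions rather than part of any existing format:
# Writing length-prefixed records
open my $out, '>:raw', 'records.bin' or die $!;
for my $record (@records) {
    print {$out} pack('N', length $record), $record;
}
close $out;

# Reading them back
open my $in, '<:raw', 'records.bin' or die $!;
while (read($in, my $len_buf, 4) == 4) {
    my $len = unpack('N', $len_buf);
    read($in, my $record, $len) == $len or die "short record";
    # process $record ...
}
close $in;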

Write PDF file bytes to a file with Perl?

I'm reading a PDF file into an array of bytes byte[] and sending this to a Perl SOAP::Lite web service. Once the Perl service receives this array of bytes, I'd like to write them to a file (with a PDF extension of course).
How can I achieve this? All the examples I can dig up assume that I'd like to begin with opening a file, reading, then writing... but what if I only have the raw data to work with?
I don't think an array of bytes is a good use of Perl data structures; you would waste a lot of memory that way. Just use a string for the file contents and write it to a binary file (the :raw setting in open):
my $pdf_data = 'contents of PDF ...';
open my $ofh, '>:raw', 'output.pdf'
    or die "Could not write: $!";
print {$ofh} $pdf_data;
close $ofh;
Does this work for you? My Perl is a little rusty.
open(OUTFILE,">>output.pdf");
binmode OUTFILE;
foreach my $byte (#bytes){
print OUTFILE $byte;
}
close(OUTFILE);
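If what arrives from SOAP::Lite really is a Perl array of numeric byte values, it can also be collapsed into one binary string first, which avoids printing byte by byte (a sketch; it assumes @bytes holds integers in the 0-255 range):
my $pdf_data = pack('C*', @bytes);   # pack each numeric value as one byte
open my $ofh, '>:raw', 'output.pdf' or die "Could not write: $!";
print {$ofh} $pdf_data;
close $ofh;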