Write PDF file bytes to a file with Perl? - perl

I'm reading a PDF file into an array of bytes byte[] and sending this to a Perl SOAP::Lite web service. Once the Perl service receives this array of bytes, I'd like to write them to a file (with a PDF extension of course).
How can I achieve this? All the examples I can dig up assume that I'd like to begin with opening a file, reading, then writing...but what if I only have the raw data to work with?

I don't think an array of bytes is a good use of Perl data structures; you would waste a lot of memory that way. Just use a string for the file contents and write it to a binary file (the :raw layer in open):
my $pdf_data = 'contents of PDF ...';
open my $ofh, '>:raw', 'output.pdf'
or die "Could not write: $!";
print {$ofh} $pdf_data;
close $ofh;

Does this work for you? My Perl is a little rusty.
open(OUTFILE, '>>', 'output.pdf') or die "Could not open: $!";
binmode OUTFILE;
foreach my $byte (@bytes) {
    print OUTFILE $byte;
}
close(OUTFILE);
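If the SOAP layer really does hand you a list of numeric byte values rather than a single string, you can pack them back into one raw string before writing. A minimal sketch; the byte values here are just the ASCII codes for "%PDF":

```perl
# Hypothetical helper: turn numeric byte values back into raw bytes,
# then write them out through a :raw handle as in the answer above.
my @bytes = (37, 80, 68, 70);        # ASCII codes for "%PDF"
my $pdf_data = pack 'C*', @bytes;    # one scalar holding the raw bytes

open my $ofh, '>:raw', 'output.pdf' or die "Could not write: $!";
print {$ofh} $pdf_data;
close $ofh;
```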

Related

Encrypt file in Perl

I created an algorithm to encrypt and decrypt a string in Perl (using AES in CFB mode). Now I want to extend the encryption to the file level. How should I get the content of the file? What would be a good approach?
Read the file normally
open(my $fh, "<", "myTestFile.ext");
Read the file in binmode
open(my $fh, "<", "myTestFile.ext");
binmode $fh;
Then how should I store the content of the files?
a) Read all the content of the file in one string and provide the string to the implemented program
my $document = do {
    local $/ = undef;
    <$fh>;   # file handle opened previously
};
encryptionAlgorithm($document);
b) Read the content of the file line by line
while (my $line = <$fh>) {
    encryptionAlgorithm($line);
}
In both cases should I chomp the \n's ?
AES encrypts blocks of 128 bits (16 bytes), so you'll want to read your file 16 bytes at a time. To do this, you need to binmode your file, then read it with the read builtin:
open my $fh, '<', 'myTestFile.ext' or die $!;
binmode $fh;
while (read($fh, my $block, 16)) {
    # encrypt $block
}
Note that I've added or die $! after opening the file: you want to always make sure your open worked.
Also, don't forget that if the block you read is less than 16 bytes long, you'll have to do some padding. (I don't recall how the blocks are padded for AES, but I trust you do since you are implementing it)
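For reference, one common block-padding scheme is PKCS#7, which appends N bytes each with value N. A sketch only; pkcs7_pad is an illustrative name, and your AES implementation may expect a different scheme:

```perl
# Sketch of PKCS#7 padding: append N bytes, each with value N, where
# N is the number of bytes needed to fill the last block.
sub pkcs7_pad {
    my ($block, $size) = @_;
    my $n = $size - length($block) % $size;   # 1..$size bytes of padding
    return $block . chr($n) x $n;             # a full extra block if already aligned
}
```

On decryption you strip the padding by reading the value of the last byte and removing that many bytes from the end.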
About the approaches you thought of:
Reading the entire file at once will potentially consume a lot of memory if the file is big.
Reading the file line by line: if the file contains no newline, then you'll read it entirely at once, which might consume a lot of memory. And if lines contain a number of bytes which isn't a multiple of 16, then you'll have to combine bytes from different lines, which will require more work than simply reading blocks of 16 bytes.
Also, you definitely don't want to chomp anything! You should have decrypt(encrypt(file)) == file, but if you chomp the newlines, that won't be the case anymore.

Out of memory when serving a very big binary file over HTTP

The code below is the original code of a Perl CGI script we are using. Even for very big files it seems to be working, but not for really huge files.
The current code is :
$files_location = $c->{target_dir}.'/'.$ID;
open(DLFILE, "<$files_location") || Error('open', 'file');
@fileholder = <DLFILE>;
close (DLFILE) || Error ('close', 'file');
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
print @fileholder;
binmode $DLFILE;
If I understand the code correctly, it is loading the whole file into memory before "printing" it. Of course I suppose it would be a lot better to load and print it in chunks? But after having read many forums and tutorials I am still not sure how best to do it with standard Perl libraries...
Last question, why is "binmode" specified at the end ?
Thanks a lot for any hint or advice,
I have no idea what binmode $DLFILE is for. $DLFILE has nothing to do with the file handle DLFILE, and it's a bit late to set the binmode of the file now that it has been read to the end. It's probably just a mistake.
You can use this instead. It uses modern Perl best practices and reads and sends the file in 8K chunks.
The file name seems to be made from $ID, so I'm not sure that $name would be correct, but I can't tell.
Make sure to keep the braces, as the block makes Perl restore the old value of $/ and close the open file handle.
my $files_location = "$c->{target_dir}/$ID";

{
    print "Content-Type: application/x-download\n";
    print "Content-Disposition: attachment; filename=$name\n\n";

    open my $fh, '<:raw', $files_location or Error('open', "file $files_location");
    local $/ = \( 8 * 1024 );
    print while <$fh>;
}
You're pulling the entire file at once into memory. Best to loop over the file line-by-line, which eliminates this problem.
Note also that I've modified the code to use the proper 3-arg open, and to use a lexical file handle instead of a global bareword one.
open my $fh, '<:raw', $files_location or die $!;
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
while (my $line = <$fh>) {
    print $line;
}
The binmode call appears to be useless in the context of what you've shown here, as $DLFILE doesn't appear to be a valid, in-use variable (add use strict; and use warnings; at the top of your script...)
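Since the file being served is binary, a fixed-size read() loop is arguably safer than reading "lines" that may never end at a newline. A sketch; send_chunked and its arguments are illustrative names, not part of the original script:

```perl
# Sketch: stream a file to an output handle in 8K chunks via read(),
# instead of slurping it whole or relying on newlines.
sub send_chunked {
    my ($path, $out) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    binmode $out;                               # keep the output stream binary-safe
    while (read($fh, my $chunk, 8 * 1024)) {    # 8K at a time
        print {$out} $chunk;
    }
    close $fh;
}
```

In the CGI script you would print the two headers first, then call send_chunked($files_location, \*STDOUT).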

Print other language character in csv using perl file handling

I am scraping a German-language site and trying to store its content in a CSV using Perl, but I am getting garbage values in the CSV. The code I use is:
open my $fh, '>> :encoding(UTF-8)', 'output.csv';
print {$fh} qq|"$title"\n|;
close $fh;
For example: I expect Weiß, Römersandalen, but I get WeiÃŸ, RÃ¶mersandalen
Update :
Code
use strict;
use warnings;
use utf8;
use WWW::Mechanize::Firefox;
use autodie qw(:all);
my $m = WWW::Mechanize::Firefox->new();
print "\n\n *******Program Begins********\n\n";
$m->get($url) or die "unable to get $url";
my $Home_Con=$m->content;
my $title='';
if ($Home_Con =~ m/<span id="btAsinTitle">([^<]*?)<\/span>/is) {
    $title = $1;
    print "title ::$1\n";
}
open my $fh, '>> :encoding(UTF-8)', 's.txt'; #<= (Weiß)
print {$fh} qq|"$title"\n|;
close $fh;
open $fh, '>> :encoding(UTF-8)', 's1.csv'; #<= (Weiß)
print {$fh} qq|"$title"\n|;
close $fh;
print "\n\n *******Program ends********";
<>;
This is the part of code. The method works fine in text files, but not in csv.
You've shown us the code where you're encoding the data correctly as you write it to the file.
What we also need to see is how the data gets into your program. Are you decoding it correctly at that point?
Update:
If the code was really just my $title='Weiß ,Römersandalen' as you say in the comments, then the solution would be as simple as adding use utf8 to your code.
The point is that Perl needs to know how to interpret the stream of bytes that it's dealing with. Outside your program, data exists as bytes in various encodings. You need to decode that data as it enters your program (decoding turns a stream of bytes into a string of characters) and encode it again as it leaves your program. You're doing the encoding step correctly, but not the decoding step.
The reason that use utf8 fixes that in the simple example you've given is that use utf8 tells Perl that your source code should be interpreted as a stream of bytes encoded as utf8. It then converts that stream of bytes into a string of characters containing the correct characters for 'Weiß ,Römersandalen'. It can then successfully encode those characters into bytes representing those characters encoded as utf8 as they are written to the file.
Your data is actually coming from a web page. I assume you're using LWP::Simple or something like that. That data might be encoded as utf8 (I doubt it, given the problems you're having) but it might also be encoded as ISO-8859-1 or ISO-8859-9 or CP1252 or any number of other encodings. Unless you know what the encoding is and correctly decode the incoming data, you will see the results that you are getting.
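A quick sketch with the core Encode module shows the symptom: UTF-8 bytes mistakenly decoded as CP1252 (a typical wrong guess for web data) produce exactly this kind of garbage:

```perl
use Encode qw(encode decode);

# "Weiß" encoded as UTF-8 bytes, then decoded with the wrong encoding,
# becomes classic mojibake; decoding with the right one restores it.
my $title    = "Wei\x{df}";                 # "Weiß" as Perl characters
my $bytes    = encode('UTF-8', $title);     # the raw bytes on the wire
my $mojibake = decode('cp1252', $bytes);    # wrong decode: "WeiÃŸ"
my $correct  = decode('UTF-8', $bytes);     # right decode: "Weiß"
```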
Check whether there are any weird characters at the start of the file, or anywhere in it, using commands like head or tail.

Force UTF-8 Byte Order Mark in Perl file output

I'm writing out a CSV file using Perl. The data going into the CSV contains Unicode characters. I'm using the following to write the CSV out:
#OPEN THE FILE FOR WRITE
open(my $fh, ">:utf8", "rpt-".$datestring.".csv")
or die "cannot open < rpt.csv: $!";
That is writing the characters correctly inside the file but doesn't appear to be including the UTF8 Byte Order Mark. This in turn throws off my users trying to open the file in Excel. Is there a way to force the Byte Order Mark to be written?
I attempted it the following way:
print $fh "\x{EFBBBF}";
I ended up with gibberish at the top of the file. Any help would be greatly appreciated.
Try doing this:
print $fh chr(65279);
after opening the file.
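Equivalently, since chr(65279) is the character U+FEFF, printing "\x{FEFF}" through an :encoding(UTF-8) layer writes the three BOM bytes EF BB BF. A sketch, with an illustrative file name:

```perl
# With an :encoding(UTF-8) layer on the handle, the U+FEFF character
# is encoded to the UTF-8 BOM bytes EF BB BF at the top of the file.
open my $fh, '>:encoding(UTF-8)', 'rpt-bom.csv' or die "cannot open: $!";
print $fh "\x{FEFF}";        # the UTF-8 BOM, same character as chr(65279)
print $fh qq|col1,col2\n|;   # then the CSV content as usual
close $fh;
```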

PERL CGI: Filehandle that uses both binary and text mode

I have a Perl program that writes an application/zip document containing binary data. I can do the following in my CGI script:
print $cgi->header(-type=>"application/x-zip; name=\"$filename.zip\"", -content_disposition=> "attachment; filename=\"$filename.zip\""). "\n";
print $result;
where $result is the binary data. This will then output a page that prompts the user to download the zip
What I want to do though is pass that entire 'webpage' as form parameter, so I did this:
open $resultfh, ">", \$output_buffer or die "could not open buffer";
print $resultfh $cgi->header(-type=>"application/x-zip; name=\"$filename.zip\"", -content_disposition=> "attachment; filename=\"$filename.zip\""). "\n";
print $resultfh $result;
and then I can pass the $output_buffer around as variable.
The problem is that this doesn't work, something seems to get passed because I'm prompted to download the zipfile, but the zipfile is corrupted, I get a mismatch between the expected bytes and the actual bytes or something.
I think this has to do with that output buffer not being in binary mode, but I can't read the content header in binary mode, so can I have a file handle be partially in text and partially in binary?
If not, what options do I have?
EDIT: The problem actually seems to happen when I pass the binary data as a cgi form param. Anyone know what the problem might be? Maybe a size limit?
Set the filehandle to use binary. When you need to print something that you know is "text", use the appropriate end-of-line sequence explicitly. For example, for data that will be processed on Windows:
binmode $handle;
print $handle $some_text, "\r\n";
print $handle $some_binary_data;
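Applied to the in-memory handle from the question, that looks roughly like this; binmode works on in-memory handles too, so the header text and raw zip bytes can share one buffer ($zip_bytes stands in for the real archive data):

```perl
# Sketch: a binary-mode in-memory handle holding both the explicit
# header text and the raw zip bytes in one buffer.
my $zip_bytes = "PK\x03\x04";   # stand-in for real zip data
my $output_buffer = '';
open my $resultfh, '>', \$output_buffer or die "could not open buffer: $!";
binmode $resultfh;
print {$resultfh} "Content-Type: application/zip\r\n\r\n";
print {$resultfh} $zip_bytes;
close $resultfh;
```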