Download file with Perl on form submit - perl

I am trying to initiate a Save As dialog on a form submit. My form is pretty simple. I'm using Dropzonejs for drag-and-drop file upload, and it looks like this:
<form action="action.epl" class="dropzone" id="dropzone" method="post">
</form>
So when the user drops a file, the form submits and kicks off action.epl. In action.epl, I handle the file and it gets saved to the server. Then I'm trying to send back an encrypted version of the file. The encryption works, and I have removed it to make sure it is not the source of the problem. The problem I have now is that I can't get the file to download from the server. I have the following (also in action.epl):
$fileName = 'file.pdf';
$filepath= "/server/path/$fileName";
open (FILE, "<$filepath") or die "can't open : $!";
@fileholder = <FILE>;
close FILE;
print "Content-Type:application/x-downloadn";
print "Content-Disposition:attachment;filename=$fileName";
print @fileholder
It's doing /something/ because the submit takes 5x as long as it did without this snippet. What I thought I would get was the "Save As" dialog but nothing happens. This tutorial is where I got my info.
Edit, now I have:
$fileName ='file.pdf';
$filepath = "/server/path/$fileName";
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$fileName\n\n";
open FILE, '<', $filepath or die "can't open: $!";
print while <FILE>;
close FILE;
However, there is still no dialog. I see you have the "$" sigil in your filehandle. I tried that too, but I don't think you need it, right?

I see you addressed the typo in the tutorial. The
"Content-Type:application/x-downloadn"
should be "Content-Type: application/x-download\n", to specify the type of the content as "application/x-download" and the "\n" to end the line of the header field.
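To make the header layout concrete, here is a minimal sketch of the header block, written in Python only for illustration (the script in the question is Perl). The function name build_download_headers is hypothetical. The point is just the shape: one newline after each header field, then one empty line before the body.

```python
def build_download_headers(filename, content_type="application/x-download"):
    """Return a CGI-style header block, ending with the required blank line."""
    lines = [
        "Content-Type: " + content_type,
        'Content-Disposition: attachment; filename="{}"'.format(filename),
    ]
    # Each header field ends with a newline; the extra newline at the end
    # separates the headers from the body that follows.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # repr() makes the newlines visible.
    print(repr(build_download_headers("file.pdf")))
```

Anything printed after this block is treated as the file content, so the headers must be written before any byte of the file.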
After that, you're getting into how the browser handles the response. If you provide the Content-Disposition:attachment;filename=$fileName header, you're asserting that the attachment ought to have the given filename $fileName. Many browsers will take a peek at the file name, and try to sniff a suitable MIME type for the extension. So, if you're specifying that $fileName is a .pdf then to modern browsers, "if it looks like a pdf and smells like a pdf, then it's a pdf". Not only are you saying "this is a pdf" with your specification of the Content-Disposition, you're also providing the name for the file that they download. In most situations, this should prevent the fall-back "save as" behavior.
Your best bet would be to not provide the Content-Disposition. That way, you're not specifying any default name to save the file as, and as such there's no extension for the browser to snoop. Unfortunately, some browsers simply default to the name of the script even if the extension is absurd compared to the contents. In some of the "enterprise solutions" that I deal with in my day-to-day, I get .csv files named as "report.cgi" because they use a MIME type that only Internet Explorer recognizes and they don't provide a Content-Disposition. Buyer beware.
The bottom line is that you can't force the browser to open the "Save As" from the server side unless you have information about the browser and know how to trick it, or you simply don't give it anything to go by (and even then some browsers may have default conventions).
By specifying the Content-Disposition and a filename, you're giving hints as to what the file should be, and what it should be saved as. On the other hand, if you don't give any other hints to the browser other than that the Content-Type is 'application/x-download' then you'll probably get a "Save As" dialog box, but the user will have no idea what kind of content the data is. This puts you at the mercy of the browser's default naming conventions. This is how I get my .csv files as "report.cgi", even when the server is providing a MIME type for csv files (though an IE-only flavor).
What I do is use Perl's File::Type and its mime_type function to get the MIME type, and simply specify a name. If you use mime_type to determine the MIME type but don't specify a file name to return, you'll get silly things like .xlsx files being downloaded as zip files, or other absurdity.
How important is it that they get the "Save As" dialog box? At the end of the day, what file type they choose is irrelevant if the content of the file is not appropriate for the type and they try to open an Excel file in Acrobat, or vice versa.
In all of my years of experience doing server side programming, I have always found it futile to try to control the client side.

Related

doc or docx: Is there a safe way to identify the type from 'requests' in python3?

1) How can I differentiate doc and docx files from requests?
a) For instance, if I have
url='https://www.iadb.org/Document.cfm?id=36943997'
r = requests.get(url,timeout=15)
print(r.headers['content-type'])
I get this:
application/vnd.openxmlformats-officedocument.wordprocessingml.document
This file is a docx.
b) If I have
url='https://www.iadb.org/Document.cfm?id=36943972'
r = requests.get(url,timeout=15)
print(r.headers['content-type'])
I get this
application/msword
This file is a doc.
2) Are there other options?
3) If I save a docx file as doc or vice versa, might I have recognition problems (for instance, when converting to pdf)? Is there any kind of best practice for dealing with this?
The mime headers you get appear to be the correct ones: What is a correct mime type for docx, pptx etc?
However, the sending software can only go on what file its user selected – and there still are a lot of people sending files with the wrong extension. Some software can handle this, others cannot. To see this in action, change the name of a PNG image to end with JPEG instead. I just did on my Mac and Preview still is able to open it. When I press ⌘+I in the Finder it says it is a JPEG file, but when opened in Preview it gets correctly identified as a "Portable Network Graphics" file. (Your OS may or may not be able to do this.)
But after the file is downloaded, you can unambiguously distinguish between a DOC and a DOCX file, even if the author got its extension wrong.
A DOC file starts with a Microsoft OLE header, which is quite a complicated structure. A DOCX file, on the other hand, is a package of lots of smaller XML files, compressed together using standard ZIP compression. Therefore, this file type will always start with the two characters PK.
This check is compatible with Python 2.7 and 3.x (only 3.x needs the decode):
import sys
if len(sys.argv) == 2:
    print ('testing file: '+sys.argv[1])
    with open(sys.argv[1], 'rb') as testMe:
        startBytes = testMe.read(2).decode('latin1')
        print (startBytes)
        if startBytes == 'PK':
            print ('This is a DOCX document')
        else:
            print ('This is a DOC document')
Technically it will confidently state "This is a DOC document" for anything that does not start with PK, and, conversely, it will say "This is a DOCX document" for any zipped file (or even a plain text file that happens to start with those two characters). So if you further process the file based on this decision, you may find out it's not a Microsoft Word document after all. But at least you will have tried with the proper decoder.
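If the false positives described above matter, a somewhat stricter check is possible. The following is a sketch, not a definitive detector: the function name sniff_word_format is made up for illustration, and it assumes the main document part of a DOCX is stored under the conventional name word/document.xml, which is what Word writes but is not strictly guaranteed by the format.

```python
import io
import zipfile

# Signature of the OLE / Compound File Binary container used by legacy DOC files.
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_word_format(data):
    """Return 'doc', 'docx', or None for bytes that look like neither."""
    if data.startswith(OLE_MAGIC):
        # Any OLE container matches here, so this can still be an XLS or PPT file.
        return "doc"
    if data.startswith(b"PK") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the main document part uses the conventional name.
            if "word/document.xml" in archive.namelist():
                return "docx"
    return None
```

With the requests code from the question, this would be called as sniff_word_format(r.content). A None result means the bytes are neither an OLE container nor a ZIP archive with a Word document part, so the Content-Type header was probably wrong.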

Get information about file attachment via perl Mechanize

BACKGROUND
I am experimenting with Mechanize on a web forum. The forum has some file attachments in its threads. The attachment can be of various media types. Each attachment has a link to a server-side program called "attachment.php?" and a unique id which identifies the file. When you visit it in a normal browser, a file is returned and the browser decides what to do with it. If it's an image, the file is displayed in the browser window and the titlebar is set to the filename. If it's another type of file, the browser will ask if you want to download the file (and it automatically sets the filename to the name of the file).
QUESTION
My question is how can I explore the details of such file attachments with Mechanize so that I can determine filetype and filename?
I've already successfully downloaded a file using my program, but I have to tell Mechanize what the filename should be. I would prefer to keep the original filename, but to do that I have to be able to discover it somehow. I know it can be done because my browser is able to determine the filetype and filename.
As a secondary objective I would also like to query the size of the file, if this is possible.
I hope my question makes sense and thank you in advance to anyone who takes the time to answer.
To inspect the file type and name, use $mech->res() after the request (in the example below, $m is the WWW::Mechanize object). It returns an HTTP::Response object, and this class provides the filename method.
Example:
foreach (@media)
{
    print "Fetching " . $_->url() . "\n";
    $m->get($_);
    my $res = $m->res();
    if($res->is_success)
    {
        my $filename = $res->filename();
        print "$filename\n";
    }
}
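The filename, media type, and size that a browser shows normally come from three response headers: Content-Disposition, Content-Type, and Content-Length. As a language-neutral illustration (in Python rather than Perl), here is a sketch of pulling those three values out of a set of headers. The function name attachment_info and the sample header values are invented for the example, and a server is free to omit any of these headers.

```python
from email.message import Message


def attachment_info(headers):
    """Extract filename, media type, and size from a dict of response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    size = msg.get("Content-Length")
    return {
        # Taken from the filename parameter of Content-Disposition; None if absent.
        "filename": msg.get_filename(),
        # Falls back to 'text/plain' when there is no Content-Type header.
        "content_type": msg.get_content_type(),
        # Servers may leave Content-Length out, e.g. for chunked responses.
        "size": int(size) if size is not None else None,
    }


if __name__ == "__main__":
    sample = {
        "Content-Type": "image/jpeg",
        "Content-Disposition": 'attachment; filename="cat.jpg"',
        "Content-Length": "2048",
    }
    print(attachment_info(sample))
```

In the Perl code above, the same header values should be reachable through the HTTP::Response object's header method, for example $res->header('Content-Length') for the size.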

Open a local web page from Perl

I'm writing a Perl script that creates HTML output and I would like to have it open in the user's preferred browser. Is there a good way to do this? I can't see a way of using ShellExecute since I don't have an http: address for it.
Assuming you saved your output to "../data/index.html",
$ret = system( 'start ..\data\index.html' );
should open the file in the default browser.
Added:
Advice here:
my $filename = "/xyzzy.html"; #whatever
system("start file://$filename");
If I understand what you're trying to do, this will not work. You would have to set up a web server, like Apache, and configure it to execute your script. This wouldn't be a trivial task if you've never done it before.
Since this is Windows, the easy option is to dump the data to a temporary file using File::Temp (making sure it has an extension .htm or .html, and that it isn't cleaned up immediately on script exit, so that the file remains, i.e., you probably want something like File::Temp->new(UNLINK => 0, SUFFIX => '.htm')). Then you ought to be able to use Win32::FileOp's ShellExecute to open the file regularly. This does make all sorts of assumptions about file types being associated with file extensions, but then, that's how Windows tends to work.
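The temp-file approach is not specific to Perl or Windows. For comparison, here is a rough sketch of the same idea in Python, where the standard webbrowser module takes the place of ShellExecute. The function names write_report and open_report are hypothetical. The file is deliberately left on disk (delete=False) for the same reason as UNLINK => 0: the browser may read it after the script has exited.

```python
import tempfile
import webbrowser
from pathlib import Path


def write_report(html):
    """Write html to a temporary .html file that survives script exit."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
    return Path(handle.name)


def open_report(html):
    """Write the report and ask the user's default browser to open it."""
    path = write_report(html)
    # as_uri() turns the absolute path into a file:// URL.
    webbrowser.open(path.as_uri())
    return path
```

Calling open_report("<h1>Results</h1>") should hand the file to whatever browser the operating system has registered as the default.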

TFMail : How to keep original name of attachments

TFMail was a popular CGI Form Mail script at one time.
Unfortunately, my client insists on continuing to use it. I hope that there are people who still use it and are experts in using it. The best documentation I can find is someone's home made reference sheet.
In my HTML form, I have an input named attachment1 :
<input type="file" name="attachment1" id="attachment1" />
In my trc config file, I specify the types the attachment can be
# Upload File Types
upload_attachment1: jpg jpeg pdf xxx
In the email template, I display the original name of the file:
Original File Name of Attachment 1: {= param.attachment1 =}
So I fill out the form, and attach a file called myImage32.jpg
BUT in the email, the file gets renamed and attached as attachment1.jpg
How or where can I specify the name of the file? I'm going to end up with hundreds of files named attachment1.jpg if I leave it like this.
I don't know anything about TFMail, but I just glanced at the source code. On line 700 of TFmail.pl it is assigning the name of your input tag to be the filename for the attachment. It doesn't appear to be checking for any config options to set this filename.
It might be easy to modify. The actual file name ($filename variable) is assigned a few lines earlier. If you go this route, make sure to clean up $filename. Depending on the uploading browser, it might be just a filename or the whole file path.

How can I limit file types in CGI file uploads in Perl?

I am using CGI to allow the user to upload some files. I just want the user to be able to upload .txt or .csv files. If the user uploads a file with any other format, then I want to be able to put out an error message.
I saw that this can be done by javascript: http://www.codestore.net/store.nsf/unid/DOMM-4Q8H9E
But is there a better way to achieve this? Is there some functionality in Perl that allows this?
The disclaimer on the site you link to is important:
Note: This is not entirely foolproof as people can easily change the extension of a file before uploading it, or do some other trickery, as in the case of the "LoveBug" virus.
If you really want to do this right, let the user upload the file, and then use something like File::MimeInfo::Magic (or file(1), the UNIX utility) to guess the actual file type. If you don't like the file type, delete the file and give the user an error message.
I just want the user to be able to upload .txt or .csv files.
Sounds easy, doesn't it? It's not. And then some.
The simple approach is just to test that the file ends in ‘.txt’ or ‘.csv’ before storing it on the filesystem. This should be part of a much more in-depth validation of what the filename is allowed to contain before you let a user-submitted filename anywhere near the filesystem.
Because the rules about what can go in a filename are complex on some platforms (especially Windows) it's usually best to create your own filename independently with a known-good name and extension.
In any case there is no guarantee that the browser will send you a file with a usable name at all, and even if it does there is no guarantee that name will have ‘.txt’ or ‘.csv’ at the end, even if it is a text or CSV file. (Some platforms simply do not use extensions for file typing.)
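As an illustration of generating the stored name yourself, here is a minimal sketch in Python (the thread is about Perl, but the idea carries over directly). The function name safe_storage_name and the choice of a random UUID are assumptions made for the example. The client-supplied name is used only to read off the extension and is never passed to the filesystem.

```python
import os
import uuid

ALLOWED_EXTENSIONS = {".txt", ".csv"}


def safe_storage_name(client_filename):
    """Return a server-generated name, or None if the extension is not allowed."""
    # Drop any directory part; some browsers send a full client-side path.
    base = client_filename.replace("\\", "/").rsplit("/", 1)[-1]
    ext = os.path.splitext(base)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # The stored name contains nothing chosen by the user except the
    # whitelisted extension.
    return uuid.uuid4().hex + ext
```

A None result covers both a disallowed extension and a missing one, so the caller still has to decide what to do with uploads that arrive without a usable name.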
Whilst you can try to sniff the contents of the file to see what type it might be, this is highly unreliable. For example:
<html>,<body>,</body>,</html>
could be plain text, CSV, HTML, XML, or a variety of other formats. Better to give the user an explicit control to say what file type they're uploading (or use one file upload field per type).
Now here's where it gets really nasty. Say you've accepted the upload and stored it as /data/mygoodfilename.txt, and the web server is correctly serving it as the Content-Type ‘text/plain’. What do you think the browser interprets it as? Plain text? You should be so lucky.
The problem is that browsers (primarily IE) don't trust your Content-Type header, and instead sniff the contents of the file to see if it looks like something else. Serve the above snippet as plain text, and IE will happily treat it as HTML. This can be a huge problem, because HTML can include client-side scripts that will take over the user's access to the site (a cross-site-scripting attack).
At this point you might be tempted to sniff the file on the server-side, for example using the ‘file’ command, to check it doesn't contain ‘<html>’. But this is doomed to failure. The ‘file’ command does not sniff for all the same HTML tags as IE does, and other browsers sniff differently anyway. It's quite easy to prepare a file that ‘file’ will claim is not HTML, but that IE will nevertheless treat as if it is (with security-disaster implications).
Content-sniffing approaches such as ‘file’ will give you only a false sense of security. This is a convenience tool for loose guessing of filetypes and not an effective security measure.
At this point your last desperate possibilities are things like:
serving all user-uploaded files from a separate hostname, so that a script injection attack can't purloin the credentials of your main site;
serving all user-uploaded files through a CGI wrapper, adding the header ‘Content-Disposition: attachment’ so that browsers won't attempt to display them directly;
only accepting uploads from trusted users.
On Unix the easiest way is to do as JRockway suggested. If not on Unix then your options are limited. You can examine the file extension and you can examine the contents to verify. I'm assuming for your specific case that you only want "* separated value" text files. So one of the Text::CSV::* modules may be useful in verifying the file is the type you asked for.
Security for this operation is a whole other ball of wax.
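To show what verifying the contents with a CSV parser could look like, here is a sketch using Python's standard csv module in place of Text::CSV. The function name looks_like_csv and the rule that every row must have the same number of fields, with at least two columns, are assumptions made for the example. They are not a definition of valid CSV.

```python
import csv
import io


def looks_like_csv(text, min_columns=2):
    """True if every non-empty row parses to the same number of fields."""
    try:
        reader = csv.reader(io.StringIO(text, newline=""), strict=True)
        rows = [row for row in reader if row]
    except csv.Error:
        return False
    if not rows:
        return False
    width = len(rows[0])
    return width >= min_columns and all(len(row) == width for row in rows)
```

Note that a line such as <html>,<body>,</body>,</html> passes this check as a perfectly good one-row, four-column CSV. A structural check like this catches accidental wrong uploads, but it is not a security measure.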
Try this:
$file_name = "file.txt";
$file_cmd = "file \"$file_name\"";
$file_type = `$file_cmd`;
return 0 unless ($file_type =~ /(ASCII|text)/i);