How can I limit file types in CGI file uploads in Perl? - perl

I am using CGI to allow the user to upload some files. I just want the user to be able to upload .txt or .csv files. If the user uploads a file in any other format, I want to be able to put out an error message.
I saw that this can be done with JavaScript: http://www.codestore.net/store.nsf/unid/DOMM-4Q8H9E
But is there a better way to achieve this? Is there some functionality in Perl that allows this?

The disclaimer on the site you link to is important:
Note: This is not entirely foolproof as people can easily change the extension of a file before uploading it, or do some other trickery, as in the case of the "LoveBug" virus.
If you really want to do this right, let the user upload the file, and
then use something like File::MimeInfo::Magic (or file(1), the
UNIX utility) to guess the actual file type. If you don't like the
file type, delete the file and give the user an error message.

I just want the user to be able to upload .txt or .csv files.
Sounds easy, doesn't it? It's not. And then some.
The simple approach is just to test that the file ends in ‘.txt’ or ‘.csv’ before storing it on the filesystem. This should be part of a much more in-depth validation of what the filename is allowed to contain before you let a user-submitted filename anywhere near the filesystem.
Because the rules about what can go in a filename are complex on some platforms (especially Windows) it's usually best to create your own filename independently with a known-good name and extension.
In any case there is no guarantee that the browser will send you a file with a usable name at all, and even if it does there is no guarantee that name will have ‘.txt’ or ‘.csv’ at the end, even if it is a text or CSV file. (Some platforms simply do not use extensions for file typing.)
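The extension test and the server-generated filename described above might be sketched as follows. This is a minimal illustration, not a complete upload validator: allowed_extension and safe_storage_name are hypothetical helper names, the random-hex naming scheme is an assumption, and rand is not a cryptographically strong source.

```perl
use strict;
use warnings;

# Extensions we are willing to accept (compared case-insensitively).
my %ALLOWED = map { $_ => 1 } qw(txt csv);

# Return the lower-cased extension if the client-supplied name ends in an
# allowed one, otherwise undef. The client's name is only inspected here;
# it is never used as a path on disk.
sub allowed_extension {
    my ($client_name) = @_;
    return unless defined $client_name;
    my ($ext) = $client_name =~ /\.([A-Za-z0-9]+)\z/;
    return unless defined $ext;
    $ext = lc $ext;
    return $ALLOWED{$ext} ? $ext : undef;
}

# Build our own known-good filename: 32 random hex digits plus the vetted
# extension (hypothetical scheme; rand is not cryptographically strong).
sub safe_storage_name {
    my ($ext) = @_;
    my $id = join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
    return "$id.$ext";
}

my $ext = allowed_extension('report.CSV');
print defined $ext ? safe_storage_name($ext) : 'rejected', "\n";
```

Only the last extension counts, so a name like evil.txt.exe is rejected, and a name with no extension at all is rejected too, which is exactly the limitation the paragraph above warns about.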
Whilst you can try to sniff the contents of the file to see what type it might be, this is highly unreliable. For example:
<html>,<body>,</body>,</html>
could be plain text, CSV, HTML, XML, or a variety of other formats. Better to give the user an explicit control to say what file type they're uploading (or use one file upload field per type).
Now here's where it gets really nasty. Say you've accepted the upload and stored it as /data/mygoodfilename.txt, and the web server is correctly serving it as the Content-Type ‘text/plain’. What do you think the browser interprets it as? Plain text? You should be so lucky.
The problem is that browsers (primarily IE) don't trust your Content-Type header, and instead sniff the contents of the file to see if it looks like something else. Serve the above snippet as plain text, and IE will happily treat it as HTML. This can be a huge problem, because HTML can include client-side scripts that will take over the user's access to the site (a cross-site-scripting attack).
At this point you might be tempted to sniff the file on the server-side, for example using the ‘file’ command, to check it doesn't contain ‘<html>’. But this is doomed to failure. The ‘file’ command does not sniff for all the same HTML tags as IE does, and other browsers sniff differently anyway. It's quite easy to prepare a file that ‘file’ will claim is not HTML, but that IE will nevertheless treat as if it is (with security-disaster implications).
Content-sniffing approaches such as ‘file’ will give you only a false sense of security. This is a convenience tool for loose guessing of filetypes and not an effective security measure.
At this point your last desperate possibilities are things like:
serving all user-uploaded files from a separate hostname, so that a script injection attack can't purloin the credentials of your main site;
serving all user-uploaded files through a CGI wrapper, adding the header ‘Content-Disposition: attachment’ so that browsers won't attempt to display them directly;
only accepting uploads from trusted users.
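As a rough sketch of the CGI-wrapper option, the headers could be built like this. download_headers is a hypothetical helper, and the X-Content-Type-Options: nosniff line is an addition not mentioned above; it asks browsers that support it not to sniff the content, and older browsers will ignore it.

```perl
use strict;
use warnings;

# Build the response headers for serving a stored upload as a download.
# $stored_name should be the server-generated name, not the client's original.
sub download_headers {
    my ($stored_name, $length) = @_;
    # Refuse anything that could break out of the quoted header value.
    die "unsafe filename\n" unless $stored_name =~ /\A[A-Za-z0-9._-]+\z/;
    return join '',
        "Content-Type: application/octet-stream\r\n",
        "Content-Disposition: attachment; filename=\"$stored_name\"\r\n",
        "X-Content-Type-Options: nosniff\r\n",
        "Content-Length: $length\r\n",
        "\r\n";
}

# The wrapper would print these headers, then the file body.
print download_headers('3f2a9c.txt', 42);
```

This reduces the chance that a browser renders the upload inline, but it does not replace serving uploads from a separate hostname.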

On Unix the easiest way is to do as JRockway suggested. If you are not on Unix, your options are limited: you can examine the file extension, and you can examine the contents to verify them. I'm assuming that in your specific case you only want "* separated value" text files, so one of the Text::CSV::* modules may be useful in verifying that the file is the type you asked for.
Security for this operation is a whole other ball of wax.

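Try this:
my $file_name = "file.txt";
# List-form pipe open bypasses the shell, so the filename needs no quoting.
# The -b flag makes file(1) omit the filename from its output.
open(my $file_fh, '-|', 'file', '-b', $file_name) or die "Cannot run file: $!";
my $file_type = <$file_fh>;
close($file_fh);
return 0 unless defined $file_type && $file_type =~ /(ASCII|text)/i;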

Related

How do I configure mplayer to use a default edl file name?

I want to configure mplayer to look for an edl when playing a video. Specifically, I want it to use "show.edl" when playing "show.mp4", assuming both are in the same directory. Very similar to how it looks for subtitles.
I can add a default edl in the config file by adding the following:
edl=default.edl
And this will look for the file "default.edl" IN THE CURRENT DIRECTORY, rather than in the directory where the media file is. And it isn't named after the media file either, and thus even if it did look in the right place, I'd have one single edl file for every media file in that directory.
Not really what I wanted.
So, is there a way, in the "~/.mplayer/config" file, to specify the edl relative to the input file name?
Mplayer's config file format doesn't seem to support any sort of replacement syntax. So there's no way to do this?
MPlayer does not have a native method to specify strings in the config file relative to the input file name. So there's no native way to deal with this.
There's a variety of approaches you could use to get around that. Writing a wrapper around mplayer to parse out the input file and add an "-edl" parameter is fairly general, but it will fail on playlists and, I'm sure, lots of other edge cases. The most general solution would of course be to add the functionality to mplayer's config parser (m_parse.c, iirc).
The simplest, though, is to (ab)use media-specific configuration files.
pros:
Doesn't require recompiling mplayer!
Well-defined and limited failure modes, i.e. how and when it fails is easily understood, and there are no "oops, didn't expect that" behaviors hidden anywhere.
Construction and updating of the edl files is easily automated.
cons:
Fails if you move the media around, as the config files need the full path to the edl file to function correctly.
Requires you have a ".conf" file as well as an EDL file, which adds clutter to the file system.
Malicious config files in the media directory may be a security issue. (Though if you're allowing general upload of media files, you probably have bigger problems. mplayer is not at all a security-hardened codebase, nor generally are the codecs it uses.)
To make this work:
Add "use-filedir-conf=yes" to "/etc/mplayer.conf" or "~/.mplayer/config". This is required, as looking in the media directory for config files is turned off by default.
For each file "clip.mp4" which has an edl "clip.edl" in the same directory, create a file "clip.mp4.conf" which contains the single line "edl=/path/to/clip.edl". The complete path is required.
Enjoy!
Automatic creation and updating of the media-specific .conf files is left as an exercise for the student.
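For anyone who would rather not do the exercise by hand, here is one possible sketch in Perl. It assumes the layout described above ("clip.mp4" next to "clip.edl"); write_edl_confs is a hypothetical helper name and the default list of media extensions is an assumption. Paths containing spaces may need quoting in the .conf file, which this sketch does not attempt.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Spec;

# For every media file "<base>.<ext>" in $dir that has a sibling "<base>.edl",
# write "<base>.<ext>.conf" containing the absolute path to the EDL file.
# Returns the list of .conf files written.
sub write_edl_confs {
    my ($dir, @media_exts) = @_;
    @media_exts = qw(mp4 avi mkv) unless @media_exts;    # assumed defaults
    my $ext_re = join '|', map { quotemeta($_) } @media_exts;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort readdir $dh;
    closedir $dh;

    my @written;
    for my $name (@names) {
        next unless $name =~ /\A(.+)\.(?:$ext_re)\z/i;
        my $edl = File::Spec->catfile($dir, "$1.edl");
        next unless -f $edl;
        my $conf = File::Spec->catfile($dir, "$name.conf");
        open(my $out, '>', $conf) or die "Cannot write $conf: $!";
        print {$out} 'edl=', abs_path($edl), "\n";
        close($out) or die "Cannot close $conf: $!";
        push @written, $conf;
    }
    return @written;
}

# Process each directory named on the command line; does nothing without arguments.
print "$_\n" for map { write_edl_confs($_) } @ARGV;
```

Re-running it after moving the media regenerates the .conf files with the new absolute paths, which works around the first "con" listed above.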

Is there a difference between the Outlook .MSG and .OFT file formats?

This question is somewhat of a long shot, but I've spent hours on it to no avail. I have some code that generates an email file on a webserver, and allows the user to download that email and open it in Outlook. From here, they can make various manual changes to the email before they send it to a bunch of people.
Right now, I generate a .OFT file, which is basically an email template. What I want to do is generate a .MSG file, which is an actual email. From a binary point of view, it seems these file formats are identical. They have the same Stream IDs and properties and stuff.
My approach was to first create a blank email message in Outlook and then just save it to a file called Base.oft. In my code, I open the document and modify Stream ID __substg1.0_1013001E, which is the ID for the HTML email body. I then save the file and write it out to the client. This works perfectly.
I tried the same approach with the MSG format. I created a blank email message, saved it as Base.msg, and modify the same Stream ID. If I look at the resulting file, the new body is actually in there and saved. However, if I open the email, the body is still blank.
What's even weirder is if I type in a body in Outlook and save that to the base file, I can see that body under stream 0_1013001E. If I then modify that stream with a different body, I can verify the new body is indeed saved in the file, but if I open the message in Outlook, I see the old, original body. It's as if the email body is stored in a different place in the file for the .MSG format, however I've looked through each stream and cannot find anything else that looks like it could be an email body.
Perhaps .MSG files are encrypted, or their bodies are stored in some proprietary binary format unlike .OFT files? Hopefully someone has some insight on this, as I scoured the Internet and found basically nothing on these formats.
Update:
It seems the .MSG format stores the body in Stream ID __substg1.0_10090102, which is encoded in some binary form (not sure what). If I delete the stream (or set it to a single \0), the file becomes corrupt.
First of all, to find more information on this and related topics, move away from raw substream numbers and google for the corresponding MAPI properties. For example, 1013 is PR_HTML and 1009 is PR_RTF_COMPRESSED. MAPI has ways of synching the body from one format to the other.
See this article on MSDN for a good overview of all content-related MAPI properties (i.e. the different "streams" inside the .MSG file).
To write PR_RTF_COMPRESSED, wrap the stream inside WrapCompressedStream. On the other hand, in your particular situation you might want to avoid the MAPI dependencies in your code, so maybe you're better off finding the PR_STORE_SUPPORT_MASK and setting the STORE_UNCOMPRESSED_RTF bit. This will allow you to use straight RTF in the PR_RTF_COMPRESSED substream. Or Outlook's fancy HTML-wrapped-in-RTF, if you are feeling brave.
None of this stuff is for the faint of heart, but seeing how you are already handling raw .MSG substream writing, I'm guessing it would be feasible.
When it comes to the format, there is no difference.
The only difference is that OFT files have CLSID_TemplateMessage ({0006F046-0000-0000-C000-000000000046}) as the storage class (WriteClassStg), while MSG files use CLSID_MailMessage ({00020D0B-0000-0000-C000-000000000046}).

Create Numbers file and open it with Numbers on iPad

I would like to do a task that is quite simple on other OS, but it is not so trivial on iOS. Namely, I want to create file and open it in Numbers.
I can preview the file with UIDocumentInteractionController and then offer the user the option to open it.
This seems to me quite a reasonable solution. However, I need to offer a proper file format. I suppose CSV and XLS would be reasonable to implement and would most probably work, but I would still like to do it in the native Numbers format if possible. However, I can't find any info about this file format.
Basically, this task is about exporting data to another app and then working further with them.
I don't know of a library that can create native Numbers files. There are, however, some libraries that allow creating XLS files. Since Numbers fully supports XLS, this is probably the way to go.
There is a commercial library available that might work on the iPhone (costs $200): http://www.libxl.com/
As for free XLS libraries, I only know xlwt, a Python module. You could set up a webservice that creates an XLS file for your app, using xlwt on the server side.
If you want to pass information to Numbers, you can probably also use CSV files. If you do, you must be aware of a few things. There are two kinds of CSV files: the comma-separated version (used in English-speaking countries) and the semicolon-separated version (used in continental Europe).
The comma separated CSV files look for example like this:
"ID","First Name","Last Name","Salary"
1,"John","Malkovich",3400.20
2,"Fred","Astaire",2000.60
The second kind of CSV files are semicolon separated and use a comma as decimal mark. They look like this:
"ID";"First Name";"Last Name";"Salary"
1;"John";"Malkovich";3400,20
2;"Fred";"Astaire";2000,60
On the Macintosh, Numbers expects a different format depending on the Region setting. If you have your Region set to the US, it will expect the first kind. If you choose Germany, it will expect the second kind.
I don't know what kind of files Numbers on the iPad expects.
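To make the two dialects concrete, here is a small sketch that formats the same row either way. It is written in Perl purely for illustration (the logic is language-neutral and would translate directly to Objective-C); csv_row is a hypothetical helper, and treating any field that looks like a plain decimal number as numeric is an assumption.

```perl
use strict;
use warnings;

# Format one row. Strings are double-quoted (embedded quotes doubled);
# numbers are left bare, with the decimal mark swapped for the
# semicolon dialect.
sub csv_row {
    my ($dialect, @fields) = @_;
    my ($sep, $decimal) = $dialect eq 'semicolon' ? (';', ',') : (',', '.');
    my @out;
    for my $field (@fields) {
        if ($field =~ /\A-?\d+(?:\.\d+)?\z/) {
            (my $number = $field) =~ s/\./$decimal/;
            push @out, $number;
        }
        else {
            (my $string = $field) =~ s/"/""/g;
            push @out, qq("$string");
        }
    }
    return join $sep, @out;
}

# Salary is passed as a string so the trailing zero is preserved.
print csv_row('comma',     1, 'John', 'Malkovich', '3400.20'), "\n";
print csv_row('semicolon', 1, 'John', 'Malkovich', '3400.20'), "\n";
```

The two print statements reproduce the "John Malkovich" rows from the examples above, one per dialect.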
Another alternative would be using copy and paste. Try to copy tab separated text into the clipboard.
I hope this may help you. I've contacted the libxl team and they responded with a link to the demo version of their iPhone library: http://www.libxl.com/download/libxl-iphone.zip

How do non-executable exploits work?

Hello. My question is how non-executable exploits work. By non-executable I mean those that don't have the .exe file extension, such as Word (.doc) exploits. How do they perform executable actions if they are not compiled?
That varies from exploit to exploit.
While .doc isn't an executable format, it can contain interpreted VBA code, which is generally where the malicious content was hidden. When you opened the document, an onOpen event or some such would fire and execute the malicious payload. That is why most Office installations have macros disabled by default these days: there is far too much scope for abuse.
There are also plenty of things that will run on your system without being a .exe, for example .com, .vbs and .hta files.
Then there are formats which have no normal executable content but can be attacked in other ways, usually by taking advantage of poorly written file-loading routines, which can allow things like buffer overflows.
The other way is to exploit bugs in the code that handles those files. Often this will be a 'buffer overflow'. Perhaps the code is expecting a header of 100 bytes, but the malicious file has 120 bytes. That causes the program to overwrite some other data in its memory, and if you can smash the 'stack' with your extra bytes it's possible to redirect the processor to a 'payload' code embedded in your file.
Search for "buffer overflow exploit" for more.

How to read a file from the disk if less than X days old, if older, refetch the html file

I wish to read an html file off of the internet and cache it. Then when I go back, because I'm debugging, I don't want to hammer the servers with the numerous requests I'll need. I don't want to get my IP banned for slamming the server over and over again just because I'm debugging. So my code needs to look something like:
if (!(file exists) || (file age > days_old))
    fetch html file from internet
    save file to disk
else
    read it from the disk
Because there will be multiple files, I'll need to include a variable name in the file name so the file is unique and I can easily look it up again.
I just learned Perl this semester and we only covered the basics and a bit of regex; once I get this, I should be mostly fine.
Thanks!
Use an existing module:
Cache::Cache
HTTP::Cache::Transparent
If you really want to implement your own, you'll want to look at the If-Modified-Since and ETag HTTP headers to determine when to re-fetch a file, rather than an arbitrary days_old number you suck out of your thumb. You will also have to generate a unique filename, preferably with a hash function, while retaining the original URL to cater for hash collisions.
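If you do roll your own, a minimal age-based version of the pseudocode from the question might look like the sketch below. It uses only core modules (HTTP::Tiny, Digest::MD5); cache_path_for, is_stale and fetch_cached are hypothetical helper names. It deliberately keeps the simple days_old test rather than If-Modified-Since or ETag, and it does not store the original URL alongside the file, so hash collisions are not handled.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use HTTP::Tiny;

# Map a URL to a cache file name via a hash of the URL.
sub cache_path_for {
    my ($cache_dir, $url) = @_;
    return File::Spec->catfile($cache_dir, md5_hex($url) . '.html');
}

# True if the file is missing or was last modified more than $max_days ago.
sub is_stale {
    my ($path, $max_days) = @_;
    return 1 unless -e $path;
    my $age_seconds = time() - (stat $path)[9];
    return $age_seconds > $max_days * 24 * 60 * 60 ? 1 : 0;
}

# Return the page content, going to the network only when the cached
# copy is missing or too old.
sub fetch_cached {
    my ($cache_dir, $url, $max_days) = @_;
    my $path = cache_path_for($cache_dir, $url);

    if (is_stale($path, $max_days)) {
        my $res = HTTP::Tiny->new->get($url);
        die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
        open(my $out, '>:raw', $path) or die "Cannot write $path: $!";
        print {$out} $res->{content};
        close($out) or die "Cannot close $path: $!";
        return $res->{content};
    }

    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;    # slurp the whole file
    my $html = <$in>;
    close $in;
    return $html;
}
```

Typical use would be something like my $html = fetch_cached('/tmp/cache', $url, 2); where the cache directory already exists. While debugging, repeated runs then read from disk until the copy is more than two days old.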