How is mime type of an uploaded file determined by browser? - forms

I have a web app where the user needs to upload a .zip file. On the server-side, I am checking the mime type of the uploaded file, to make sure it is application/x-zip-compressed or application/zip.
This worked fine for me on Firefox and IE. However, when a coworker tested it, it failed for him on Firefox (sent mime type was something like "application/octet-stream") but worked on Internet Explorer. Our setups seem to be identical: IE8, FF 3.5.1 with all add-ons disabled, Windows XP SP3, WinRAR installed as native .zip file handler (not sure if that's relevant).
So my question is: How does the browser determine what mime type to send?
Please note: I know that the mime type is sent by the browser and, therefore, unreliable. I am just checking it as a convenience--mainly to give a more friendly error message than the ones you get by trying to open a non-zip file as a zip file, and to avoid loading the (presumably heavy) zip file libraries.

Chrome
Chrome (version 38 as of writing) has 3 ways to determine the MIME type and does so in a certain order. The snippet below is from file src/net/base/mime_util.cc, method MimeUtil::GetMimeTypeFromExtensionHelper.
// We implement the same algorithm as Mozilla for mapping a file extension to
// a mime type. That is, we first check a hard-coded list (that cannot be
// overridden), and then if not found there, we defer to the system registry.
// Finally, we scan a secondary hard-coded list to catch types that we can
// deduce but that we also want to allow the OS to override.
The hard-coded lists come a bit earlier in the file: https://cs.chromium.org/chromium/src/net/base/mime_util.cc?l=170 (kPrimaryMappings and kSecondaryMappings).
An example: when uploading a CSV file from a Windows system with Microsoft Excel installed, Chrome will report this as application/vnd.ms-excel. This is because .csv is not specified in the first hard-coded list, so the browser falls back to the system registry. HKEY_CLASSES_ROOT\.csv has a value named Content Type that is set to application/vnd.ms-excel.
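The lookup order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two tables are tiny made-up stand-ins for kPrimaryMappings and kSecondaryMappings (their entries are not copied from Chromium), and the os_registry dictionary is a hypothetical substitute for the real system lookup.

```python
# Illustrative stand-ins only; NOT the real contents of Chromium's tables.
PRIMARY = {"html": "text/html", "png": "image/png"}
SECONDARY = {"csv": "text/csv", "zip": "application/zip"}


def guess_mime(extension, os_registry):
    """Mimic the three-step order: primary list, OS registry, secondary list."""
    ext = extension.lower().lstrip(".")
    if ext in PRIMARY:          # 1. hard-coded list that cannot be overridden
        return PRIMARY[ext]
    if ext in os_registry:      # 2. whatever the operating system reports
        return os_registry[ext]
    return SECONDARY.get(ext)   # 3. fallback list the OS is allowed to override


# A machine where Excel registered .csv, versus one with no registration.
with_excel = {"csv": "application/vnd.ms-excel"}
print(guess_mime(".csv", with_excel))
print(guess_mime(".csv", {}))
```

With this ordering, the same file produces different answers on two machines purely because of what is registered with the OS, which matches the CSV behaviour described above.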
Internet Explorer
Again using the same example, the browser will report application/vnd.ms-excel. I think it's reasonable to assume Internet Explorer (version 11 as of writing) uses the registry. Possibly it also makes use of a hard-coded list like Chrome and Firefox, but its closed source nature makes it hard to verify.
Firefox
As indicated in the Chrome code, Firefox (version 32 as of writing) works in a similar way. The snippet below is from file uriloader\exthandler\nsExternalHelperAppService.cpp, method nsExternalHelperAppService::GetTypeFromExtension:
// OK. We want to try the following sources of mimetype information, in this order:
// 1. defaultMimeEntries array
// 2. User-set preferences (managed by the handler service)
// 3. OS-provided information
// 4. our "extras" array
// 5. Information from plugins
// 6. The "ext-to-type-mapping" category
The hard-coded lists come earlier in the file, somewhere near line 441. You're looking for defaultMimeEntries and extraMimeEntries.
With my current profile, the browser will report text/csv because there's an entry for it in mimeTypes.rdf (item 2 in the list above). With a fresh profile, which does not have this entry, the browser will report application/vnd.ms-excel (item 3 in the list).
Summary
The hard-coded lists in the browsers are pretty limited. Often, the MIME type sent by the browser will be the one reported by the OS. And this is exactly why, as stated in the question, the MIME type reported by the browser is unreliable.

Kip, I spent some time reading RFCs, MSDN and MDN. Here is what I could understand. When a browser encounters a file for upload, it looks at the first buffer of data it receives and runs tests on it. These tests try to determine whether the file is of a known MIME type; if it is, further tests narrow down which known type it is, and the browser acts accordingly. I think IE tries this first rather than just determining the file type from the extension. This page explains it for IE: http://msdn.microsoft.com/en-us/library/ms775147%28v=vs.85%29.aspx. For Firefox, as far as I could understand, it reads file info from the filesystem or directory entry and determines the file type from that. Here is a link for Firefox: https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIFile. I would still like to have more authoritative info on this.

This is probably OS dependent and possibly browser dependent, but on Windows, the MIME type for a given file extension can be found by looking in the registry under HKCR.
For example:
HKEY_CLASSES_ROOT\.zip
- Content Type
To go from a MIME type to a file extension, look at the keys under
HKEY_CLASSES_ROOT\Mime\Database\Content Type
to get the default extension for a particular MIME type.
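For anyone who wants to check this on their own machine, here is a minimal sketch using Python's standard winreg module. It assumes the value is named "Content Type" (as in the CSV example from the Chrome answer), and the function name is my own. On non-Windows systems winreg does not exist, so the sketch simply returns None there.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None


def registry_content_type(extension):
    """Return the "Content Type" registered for an extension, or None."""
    if winreg is None:
        return None  # not running on Windows
    try:
        # Opens HKEY_CLASSES_ROOT\<extension>, e.g. HKEY_CLASSES_ROOT\.zip
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, extension) as key:
            value, _value_type = winreg.QueryValueEx(key, "Content Type")
            return value
    except OSError:
        # The key or the value is missing for this extension.
        return None


print(registry_content_type(".zip"))
```

The result depends on which programs have registered themselves for the extension, so two seemingly identical machines can print different things.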

While this is not an answer to your question, it does solve the problem you are trying to solve. YMMV.
As you wrote, the MIME type is not reliable, as each browser has its own way of determining it. However, browsers also send the original name of the file, including its extension. So the best way to deal with the problem is to inspect the file's extension instead of the MIME type.
If you still need the MIME type, you can determine it server-side from the extension, using your own Apache mime.types file.
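A rough sketch of the server-side check, written in Python since the question does not say which language the app uses. It assumes the upload is already available as a file name plus raw bytes; looks_like_zip is a made-up helper name. Besides the extension, it also asks the standard zipfile module whether the content has a zip structure, which is cheap and does not trust anything the browser claims.

```python
import io
import os
import zipfile


def looks_like_zip(filename, data):
    """Cheap sanity check: .zip extension and zip-structured content."""
    extension = os.path.splitext(filename)[1].lower()
    if extension != ".zip":
        return False
    # is_zipfile accepts a file-like object and looks for the zip trailer.
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny archive in memory to try the check out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
sample = buffer.getvalue()

print(looks_like_zip("upload.zip", sample))
print(looks_like_zip("upload.zip", b"definitely not a zip"))
```

This is still only a convenience check for a friendlier error message; a file that passes can turn out to be a damaged archive when actually extracted.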

I agree with johndodo: there are so many variables that the MIME types sent by browsers are unreliable. I would ignore the subtype that is received and focus only on the top-level type, such as 'application'. If your app is PHP-based, you can easily do this with the explode() function.
In addition, check the file extension to make sure it is .zip or whichever compression format you are looking for.
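The same splitting idea, sketched in Python rather than PHP (split_mime is a hypothetical helper name, not a library function). It also drops any parameters after a semicolon and lowercases the value, on the assumption that the header may arrive in mixed case.

```python
def split_mime(value):
    """Split 'type/subtype; params' into (type, subtype), lowercased."""
    essence = value.split(";", 1)[0].strip().lower()
    major, _slash, minor = essence.partition("/")
    return major, minor


major, minor = split_mime("application/x-zip-compressed")
print(major)  # the part worth checking, per the suggestion above
print(minor)
```

Keep in mind that application/octet-stream also has the top-level type 'application', so this check alone cannot tell a zip from any other binary file.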

According to RFC 1867 (Form-based File Upload in HTML):
Each part should be labelled with an appropriate content-type if the
media type is known (e.g., inferred from the file extension or
operating system typing information) or as application/octet-stream.
So my understanding is that application/octet-stream is a catch-all identifier, used when the type cannot be inferred.
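To make the fallback concrete, here is a small sketch of how a client could pick the label for an upload part. It uses Python's standard mimetypes module to infer the type from the extension; this only illustrates the rule quoted from the RFC and is not how any particular browser implements it. The function name is my own.

```python
import mimetypes


def part_content_type(filename):
    """Infer a content type from the file name, else use the catch-all."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(part_content_type("archive.zip"))
print(part_content_type("mystery.unknownext123"))
```

Because mimetypes may consult system tables, the first result can differ between machines, which is the same effect the question ran into.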

Related

Browser's view-source: Can files be "downloaded" this way?

As you probably know, one can view the original response HTML code for any website URL by prefixing it with view-source: in the browser (e.g. view-source:https://www.google.de/).
Now interestingly, this also works for URLs that lead to files with types other than HTML. For instance, view-source:https://d3.7-zip.org/a/7z2107.exe will show the .exe file (here of 7zip) as byte stream (probably interpreted as latin1 or another encoding). You would get a similar result if you downloaded the .exe file normally and then open it in Notepad.
My question is this: when I manually copy the text that view-source: gives me for a .exe file, paste it into Notepad and then save it as .exe, the file is of roughly the correct size but corrupted. Can anything be done to fix this?
(If you wonder why anyone would want to do this: the admittedly exotic case is browser automation with Selenium, which is not really able to download files normally, for a resource that is protected in such a way that it can practically only be downloaded by real browsers.)
When an application is compiled, it contains static references to parts of the executable, calculated as offsets in bytes. These can be as broad as the .text and .data sections of the executable, or as low-level as function call addresses and jumps.
If you open an exe in a real disassembler, you'll see hard-coded jumps, function addresses and so on, all expressed in bytes. Once the exe has passed through a text editor, those bytes no longer line up, so the jumps would make the processor run random code and cause an exception. That makes Windows believe it's not a valid executable anymore.
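A small experiment shows how fragile byte offsets are once binary data is treated as text. This is a sketch of one plausible failure mode, not a diagnosis of what the browser and Notepad actually did: it assumes the bytes were decoded as text in some encoding and later encoded again. The 256 test bytes stand in for the contents of an executable.

```python
def round_trip(data, encoding):
    """Decode bytes as text, then encode the text back to bytes."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


original = bytes(range(256))  # every possible byte value, once

via_latin1 = round_trip(original, "latin-1")
via_utf8 = round_trip(original, "utf-8")

print(via_latin1 == original)        # latin-1 maps every byte to one character
print(via_utf8 == original)          # invalid UTF-8 bytes get replaced
print(len(original), len(via_utf8))  # the replacements change the length
```

As soon as a single byte is replaced or the length changes, every offset after that point refers to the wrong place, which fits the "roughly correct size but corrupted" symptom.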

How to find out the source of a request (in chrome dev tools)?

I have a weird network request in my page, which refers to JavaScript files, which I removed from every html file earlier. Cache is cleared and there is no single reference to be found in the source html and the JavaScript files. For fixing that and also out of general curiosity I would like to know if there is a simple way to find out where a request was triggered, preferably using the chrome-devtools.
Update:
Thanks to jaredwilli I found the Initiator column under the Network tab. However, this only shows Other. What I would like to know is the (HTML or JavaScript) file where those requests were triggered.
On the Network panel, you can determine what the initiator of a request was by viewing the Initiator column. It gives you the file, line number and type of resource it was, either Script or something else.

Recognize my mime type without file extensions on iOS

I am writing an application that needs to recognize my custom MIME type, so that my application is launched when such a file is downloaded from a server. I read Brad's great article on how to write a MIME type recognizer under iOS at How do I associate file types with an iPhone application? It works well if and only if the file's extension is also specified in the UTExportedTypeDeclarations / UTTypeTagSpecification section of my plist and the server serves the files with the same extension. If the server serves the file with a different extension, or if no extensions are specified in the plist but the MIME type matches, the following happens:
The browser (or the application that received the file) shows the correct icon of my file type with the correct [Open in myApplication] button but clicking on the button does nothing, my application is not launched and if it is running, no application:openURL:sourceApplication:annotation: message is sent.
Is there any way to write a file type recognizer based only on the mime-type, without a specific file extension?
This has been answered. You'd want to use NSURLRequest, which lets you get at the mimeType; you can use that to determine the file extension as needed. The full code and additional hints and tips are available in this post:
https://stackoverflow.com/a/1401918/728261

how do I set a Thunderbird signature to use a dynamic url's html?

I want to use a dynamic email signature in Thunderbird that is context aware (depends on date, events in db, etc.).
If I have a PHP script that can generate the signature HTML (e.g. http://www.site.com/email_sign.php), how do I force Thunderbird to use it?
(The only options I see use static HTML, whether inline or from a local system file.)
Any ideas?
You can use the Signature Switch add-on and a batch file calling wget to achieve what you want. I wrote a simple executable to replace the bat file; you can read about it (and download it if you want to) from http://www.else.co.nz/portfolio/020-code/dynamic-email-signatures
I doubt you can do this simply. Thunderbird does allow scripting via the creation of plugins, but I wouldn't personally know how to do it or how easy it might be.
The best answer I can think of would be to set up a scheduled task / cron job that downloads the PHP output to a local file, then follow the instructions in the knowledge base, namely:
You can use Thunderbird to create signature files, or you can use your operating system tools to create them—for example, a plain text editor.
Thunderbird does not provide any special place to store signature files. You could create a Signatures directory in your profile to store them, making them easy to back up along with the rest of your profile. Or you could store them somewhere else.
To use a signature file, specify it in Account Settings as the signature for an identity. Check the box "Attach the signature from a file instead" and specify the signature file.
This will work unless Thunderbird caches the HTML internally; however, I see no indication in the FAQ that this is the case.
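The download step for the scheduled task could look roughly like the sketch below. It uses only Python's standard library; refresh_signature is a made-up name, and the URL and target path are placeholders you would replace with your own. Writing to a temporary file first and then renaming it is my own precaution, so that Thunderbird never reads a half-written signature.

```python
import urllib.request
from pathlib import Path


def refresh_signature(url, target):
    """Download the generated signature HTML and store it at target."""
    target = Path(target)
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read()
    # Write next to the target first, then swap the file into place.
    temporary = target.with_name(target.name + ".tmp")
    temporary.write_bytes(html)
    temporary.replace(target)
    return len(html)
```

A cron entry or scheduled task would then call something like refresh_signature("http://www.site.com/email_sign.php", "signature.html") at whatever interval suits you, with the second argument pointing at the file selected in Account Settings.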
For years I updated the signature for my email client manually – until I got fed up… That’s why I wrote a PHP script to create a randomized signature block automatically from an RSS feed! Check it out: https://github.com/birdy1976/signature :-D

Hyperlink that will open document using DAV protocol?

I have a DAV server (Oracle Portal in this case). If I open Word and then enter the DAV URL of a document, I'm correctly prompted for username/password and the document is checked out. I can edit it and just click Save to save it back to the server. So far, so good.
What I need is a link on a web page that will open the document for editing in Word. If I just use the same URL as I use in the File Open dialog in Word, I get a read-only copy, and the File Save dialog suggests to save it locally.
Is there a way to open a document for DAV editing directly from a hyperlink?
According to this thread, you should be able to get DAV supported by adding special headers to your response so that Word knows that the document is editable via DAV.
It seems to be the default behavior of Word to open such links as read-only. However, there seem to be two workarounds. You can either tweak a registry setting or use the SharePoint.OpenDocuments ActiveX control.
See here: http://www.webdavsystem.com/server/documentation/hyperlinked_ms_office_docs
No. The DAV protocol uses standard HTTP transactions, and unless the client is aware of the support for DAV, it won't know to use it.
Word is likely not DAV-aware, and you're relying on people mounting the DAV share as a network drive.
I.e., as far as Word is concerned, it's just like any other URL.
(Unless there is a way to tell Word that the document is specifically on a DAVFS system, via a URL with a different protocol specifier, for example davfs://www.google.com/, if davfs happened to be a registered protocol that your client recognised. This of course makes too much sense, and for that reason alone, you are unlikely to find it supported in Windows.)