Using Fiddler2, how can I detect if the response stream was in fact GZipped?
With Fiddler, simply click the Inspectors tab, then click the Transformer response inspector. It will tell you exactly what encodings have been applied.
Be sure that the "AutoDecode" option isn't checked in the Fiddler toolbar.
A gzip stream starts with a magic marker: its first two bytes, 0x1f and 0x8b, identify it as gzip data. RFC 1952 gives more information on this.
You can find out more by checking wotsit and also Wikipedia.
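If you have the raw response bytes saved to disk or in a script, the magic-number check can be sketched in a few lines of Python. The helper name `looks_gzipped` is made up for illustration. It only inspects the first two bytes, so it says "probably gzip", not "valid gzip".

```python
import gzip

# ID1 and ID2, the two leading bytes of every gzip member (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


if __name__ == "__main__":
    print(looks_gzipped(gzip.compress(b"hello world")))
    print(looks_gzipped(b"hello world"))
```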
Related
I want to modify the response body.
However, the data is compressed with Brotli, which Fiddler does not support.
I want to set a response breakpoint.
When Fiddler breaks on the response,
I decompress the data with a tool, modify it, and compress the modified data.
Then I copy the modified data into the FiddlerScript and save the script.
But I find that the response breakpoint only breaks after the response has arrived.
So when I re-save the FiddlerScript, the script does not apply to the response that is already paused.
How can I modify a response body compressed with an unsupported algorithm?
You can install the Compressibility add-on from http://www.telerik.com/fiddler/add-ons. At present, this adds Brotli support to Fiddler.
I'm trying to edit the content of a request in mitmproxy and pass it on, but the body is gzip-encoded. I can see the structure of the data, which looks like XML, but I cannot edit it and save it back in gzip format. How can I resolve this issue? I tried different tutorials, but none of them go into that level of detail.
I was not able to get this to work using mitmproxy 0.11.1, because every time I tried to edit the response, the body would open in my text editor as the raw gzipped source. However, it did work in mitmproxy 0.11.3. Unfortunately, there appear to be no release notes for the 0.11.2 or 0.11.3 releases.
I set up an i ~bs (response body) intercept hook, and a l ~bs filter to display the intercepted message. I loaded the page in a browser, opened the request, pressed tab to view the response body, hit e to edit, and r for raw body. That opened my editor with the body response as unformatted ASCII text, not the raw gzipped encoding. After saving a one-character change and exiting the editor, I hit a to accept and send the updated message, and saw the change in the web browser developer tools.
However, on several other occasions while doing this and changing a lot of characters in the response body, mitmproxy crashed.
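Independent of the proxy UI, the underlying round trip (decompress, edit, recompress) can be sketched with Python's standard gzip module. This is not mitmproxy's API. `edit_gzipped_body` is a hypothetical helper, and it assumes the body is a single complete gzip stream. Whatever sends the result on still has to fix up Content-Length.

```python
import gzip


def edit_gzipped_body(raw: bytes, old: bytes, new: bytes) -> bytes:
    """Decompress a gzip body, replace `old` with `new`, and recompress it."""
    plain = gzip.decompress(raw)       # raises an error if raw is not valid gzip
    edited = plain.replace(old, new)   # the edit itself, on the plain bytes
    return gzip.compress(edited)       # a fresh gzip stream with the edited content


if __name__ == "__main__":
    body = gzip.compress(b"<order><qty>1</qty></order>")
    patched = edit_gzipped_body(body, b"<qty>1</qty>", b"<qty>2</qty>")
    print(gzip.decompress(patched))
```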
I support a web application that displays reports from a database. Occasionally, a report will contain an attachment (typically an image or document, which is stored in the database as well).
We serve the attachment via a dynamic .htm resource which streams the attachment from the database and sets the content-type based on what type of attachment it is (we support PDFs, RTFs, and various image formats).
For RTFs we've come across a problem. It seems a lot of Windows users don't have a default association for the 'application/rtf' content-type (they do have an association for the *.rtf file extension). As a result, clicking on the link to the attachment doesn't do anything in Internet Explorer 6.
Returning 'application/msword' as the content-type seems to make the RTF viewable when clicking on the link, but only for people who have MS Office installed (some of the users won't have this installed, and will use alternate RTF readers, like OpenOffice).
This application is accessed publicly, so we don't have control of the user's machine settings.
Has anybody here solved this before? And how? Thanks!
Use application/octet-stream content-type to force download. Once it's downloaded, it should be viewable in whatever is registered to handle .rtf files.
In addition to the Content-Type header, you also need to add the following:
Content-Disposition: attachment; filename=my-document.rtf
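As a rough sketch of how the two headers fit together on the server side, here is a small Python helper. The name `download_headers` is invented for this example, and the sanitising is deliberately minimal. How you attach the headers to the response depends on your framework.

```python
def download_headers(filename: str) -> dict:
    """Build headers that ask the browser to save the response as a file."""
    # Drop characters that would break the quoted filename parameter.
    safe = filename.replace('"', "").replace("\r", "").replace("\n", "")
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{safe}"',
    }


if __name__ == "__main__":
    for name, value in download_headers("my-document.rtf").items():
        print(f"{name}: {value}")
```

The file extension in the suggested name is what lets the client pick whatever program is registered for .rtf once the file is saved.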
Wordpad (which is on pretty much every Windows machine) can view RTF files. Is there an 'application/wordpad' content-type?
Alternatively, given the rarity of RTF files, your best solution might be to use a server-side component to open the RTF file, convert it to some other format (like PDF or straight HTML), and serve that to the requesting client. I don't know what language/platform you're using on the server side, so I don't know what to tell you to use for this.
I have a web app where the user needs to upload a .zip file. On the server-side, I am checking the mime type of the uploaded file, to make sure it is application/x-zip-compressed or application/zip.
This worked fine for me on Firefox and IE. However, when a coworker tested it, it failed for him on Firefox (sent mime type was something like "application/octet-stream") but worked on Internet Explorer. Our setups seem to be identical: IE8, FF 3.5.1 with all add-ons disabled, Windows XP SP3, WinRAR installed as native .zip file handler (not sure if that's relevant).
So my question is: How does the browser determine what mime type to send?
Please note: I know that the mime type is sent by the browser and, therefore, unreliable. I am just checking it as a convenience--mainly to give a more friendly error message than the ones you get by trying to open a non-zip file as a zip file, and to avoid loading the (presumably heavy) zip file libraries.
Chrome
Chrome (version 38 as of writing) has 3 ways to determine the MIME type and does so in a certain order. The snippet below is from file src/net/base/mime_util.cc, method MimeUtil::GetMimeTypeFromExtensionHelper.
// We implement the same algorithm as Mozilla for mapping a file extension to
// a mime type. That is, we first check a hard-coded list (that cannot be
// overridden), and then if not found there, we defer to the system registry.
// Finally, we scan a secondary hard-coded list to catch types that we can
// deduce but that we also want to allow the OS to override.
The hard-coded lists come a bit earlier in the file: https://cs.chromium.org/chromium/src/net/base/mime_util.cc?l=170 (kPrimaryMappings and kSecondaryMappings).
An example: when uploading a CSV file from a Windows system with Microsoft Excel installed, Chrome will report this as application/vnd.ms-excel. This is because .csv is not specified in the first hard-coded list, so the browser falls back to the system registry. HKEY_CLASSES_ROOT\.csv has a value named Content Type that is set to application/vnd.ms-excel.
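The lookup order described in that comment can be sketched in Python as follows. This is not Chromium's code. The three tables are tiny hypothetical stand-ins for the primary list, the system registry and the secondary list, filled in only so the CSV example above can be replayed.

```python
from typing import Mapping, Optional


def mime_from_extension(
    ext: str,
    primary: Mapping[str, str],
    os_registry: Mapping[str, str],
    secondary: Mapping[str, str],
) -> Optional[str]:
    """Check the primary list, then the OS, then the secondary list."""
    key = ext.lower().lstrip(".")
    for table in (primary, os_registry, secondary):
        if key in table:
            return table[key]
    return None


# Hypothetical table contents, for illustration only.
PRIMARY = {"html": "text/html"}
OS_REGISTRY = {"csv": "application/vnd.ms-excel"}  # e.g. set by an Excel install
SECONDARY = {"csv": "text/csv"}

if __name__ == "__main__":
    print(mime_from_extension(".csv", PRIMARY, OS_REGISTRY, SECONDARY))
    print(mime_from_extension(".csv", PRIMARY, {}, SECONDARY))
```

With the registry entry present, the OS value wins over the secondary list. Without it, the secondary list is used.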
Internet Explorer
Again using the same example, the browser will report application/vnd.ms-excel. I think it's reasonable to assume Internet Explorer (version 11 as of writing) uses the registry. Possibly it also makes use of a hard-coded list like Chrome and Firefox, but its closed source nature makes it hard to verify.
Firefox
As indicated in the Chrome code, Firefox (version 32 as of writing) works in a similar way. Snippet from file uriloader\exthandler\nsExternalHelperAppService.cpp, method nsExternalHelperAppService::GetTypeFromExtension:
// OK. We want to try the following sources of mimetype information, in this order:
// 1. defaultMimeEntries array
// 2. User-set preferences (managed by the handler service)
// 3. OS-provided information
// 4. our "extras" array
// 5. Information from plugins
// 6. The "ext-to-type-mapping" category
The hard-coded lists come earlier in the file, somewhere near line 441. You're looking for defaultMimeEntries and extraMimeEntries.
With my current profile, the browser will report text/csv because there's an entry for it in mimeTypes.rdf (item 2 in the list above). With a fresh profile, which does not have this entry, the browser will report application/vnd.ms-excel (item 3 in the list).
Summary
The hard-coded lists in the browsers are pretty limited. Often, the MIME type sent by the browser will be the one reported by the OS. And this is exactly why, as stated in the question, the MIME type reported by the browser is unreliable.
Kip, I spent some time reading RFCs, MSDN and MDN. Here is what I could understand. When a browser encounters a file for upload, it looks at the first buffer of data it receives and runs tests on it. These tests try to determine whether the file is of a known MIME type; if it is, the browser tests further to work out which known type it is and acts accordingly. I think IE tries to do this first rather than just determining the file type from the extension. This page explains this for IE: http://msdn.microsoft.com/en-us/library/ms775147%28v=vs.85%29.aspx. For Firefox, as far as I could understand, it reads the file info from the filesystem or directory entry and then determines the file type. Here is a link for Firefox: https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIFile. I would still like to have more authoritative info on this.
This is probably OS and possibly browser dependent, but on Windows, the MIME type for a given file extension can be found by looking in the registry under HKCR.
For example:
HKEY_CLASSES_ROOT\.zip
- Content Type
To go from MIME type to file extension, you can look at the keys under
HKEY_CLASSES_ROOT\Mime\Database\Content Type
to get the default extension for a particular MIME type.
While this is not an answer to your question, it does solve the problem you are trying to solve. YMMV.
As you wrote, the MIME type is not reliable, as each browser has its own way of determining it. However, browsers send the original name of the file (including the extension). So the best way to deal with the problem is to inspect the file's extension instead of the MIME type.
If you still need the MIME type, you can use your own Apache mime.types file to determine it server-side.
I agree with johndodo: there are so many variables that make the MIME types sent by browsers unreliable. I would ignore the subtype that is received and just focus on the type, like 'application'. If your app is PHP-based, you can easily do this with the explode() function.
In addition, just check the file extension to make sure it is .zip or whatever other compression format you are looking for!
According to RFC 1867 - Form-based File Upload in HTML:
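If the server side happens to be Python, a cheap pre-check along these lines could look like the sketch below. `looks_like_zip_upload` is a made-up name. It combines the extension test with the standard library's `zipfile.is_zipfile`, which looks at the archive's structure rather than trusting anything the browser sent.

```python
import io
import zipfile


def looks_like_zip_upload(filename: str, data: bytes) -> bool:
    """Cheap sanity check: .zip extension plus a recognisable zip structure."""
    if not filename.lower().endswith(".zip"):
        return False
    # is_zipfile accepts a file-like object and returns False for non-archives.
    return zipfile.is_zipfile(io.BytesIO(data))


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hi")
    print(looks_like_zip_upload("upload.zip", buf.getvalue()))
    print(looks_like_zip_upload("upload.zip", b"this is definitely not a zip archive"))
```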
Each part should be labelled with an appropriate content-type if the
media type is known (e.g., inferred from the file extension or
operating system typing information) or as application/octet-stream.
So my understanding is, application/octet-stream is kind of like a blanket catch-all identifier if the type cannot be inferred.
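The same "infer from the extension, otherwise fall back" rule can be sketched with Python's mimetypes module. This is only an illustration of the rule quoted above, not what any browser runs. `part_content_type` is a hypothetical helper, and it uses a `MimeTypes()` instance so that, as far as I know, only Python's built-in table is consulted.

```python
import mimetypes

# A private instance, so the lookup does not depend on global state.
_TYPES = mimetypes.MimeTypes()


def part_content_type(filename: str) -> str:
    """Guess a content type from the extension, else use the catch-all."""
    guessed, _encoding = _TYPES.guess_type(filename)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(part_content_type("archive.zip"))
    print(part_content_type("mystery.unknownext"))
```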
You might know that HTML-related file formats are often compressed with gzip on the server side (by mod_gzip on Apache servers, for example) and decompressed by compatible browsers ("content encoding").
Does this only work for HTML/XML files? Let's say my PHP/Perl file generates some simple comma-delimited data and sends that to the browser; will it be encoded by default?
What about platforms like Silverlight or Flash, when they download such data will it be compressed/decompressed by the browser/runtime automatically? Is there any way to test this?
Does this only work for HTML/XML files?
No: it is quite often used for CSS and JS files, for instance. Apart from images, those are among the biggest things that websites are made of, because of JS frameworks and full-JS applications, so it represents a huge gain!
Actually, any text-based format can be compressed quite well (by contrast, images cannot, as they are generally already compressed). Sometimes JSON data returned from Ajax requests is compressed too; it's text data, after all ;-)
Let's say my PHP/Perl file generates some simple comma delimited data, and sends that to the browser, will it be encoded by default?
It's a matter of configuration: if you configured your server to compress that kind of content, it'll probably be compressed :-)
(If the browser says it accepts gzip-encoded data.)
Here's a sample configuration for Apache 2 (using mod_deflate) that I use on my blog:
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript application/x-javascript application/xml
</IfModule>
Here, I want HTML/XML/CSS/JS to be compressed.
And here is the same thing, plus or minus a few configuration options, that I used once under Apache 1 (mod_gzip):
<IfModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_minimum_file_size 256
mod_gzip_maximum_file_size 500000
mod_gzip_dechunk Yes
mod_gzip_item_include file \.css$
mod_gzip_item_include file \.html$
mod_gzip_item_include file \.txt$
mod_gzip_item_include file \.js$
mod_gzip_item_include mime text/html
mod_gzip_item_exclude mime ^image/
</IfModule>
Things to notice here: I don't want files that are too small (the gain would be negligible) or too big (compressing them would eat too much CPU) to be compressed, and I want css/html/txt/js files to be compressed, but not images.
If you want your comma-separated data to be compressed the same way, you'll have to add either its content-type or its extension to your web server's configuration to activate gzip compression for it.
Is there any way to test this?
For any content returned directly to the browser, the Firefox extensions Firebug or LiveHTTPHeaders are must-haves.
For content that doesn't go through the browser's standard communication channel, it might be harder; in the end, you may have to use something like Wireshark to "sniff" what is really going through the pipes... Good luck with that!
What about platforms like Silverlight or Flash, when they download such data will it be compressed/decompressed by the browser/runtime automatically?
To answer your question about Silverlight and Flash, if they send an Accept-Encoding header indicating they support compressed content, Apache will use mod_deflate or mod_gzip. If they don’t support compression they won’t send the header. It will “just work.” – Nate
I think Apache’s mod_deflate is more common than mod_gzip, because it’s built-in and does the same thing. Look at the documentation for mod_deflate (linked above) and you’ll see that it’s easy to specify which file types to compress, based on their MIME types. Generally it’s worth compressing HTML, CSS, XML and JavaScript. Images are already compressed, so they don’t benefit from compression.
The browser sends an "Accept-Encoding" header with the types of compression that it knows how to understand. The server looks at this, along with the user-agent and decides how to encode the result. Some browsers lie about what they can understand, so this is more complex than just searching for "deflate" in the header.
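To show why a plain substring search is not enough, here is a minimal Python sketch that parses the header's q-values. The function names are invented. It handles only the basic "coding;q=value" syntax and ignores the user-agent quirks mentioned above.

```python
def accepted_encodings(header: str) -> dict:
    """Parse an Accept-Encoding value into a {coding: qvalue} mapping."""
    result = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0  # a coding listed without a q parameter counts as q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        result[coding.strip().lower()] = q
    return result


def client_accepts(header: str, coding: str) -> bool:
    """True if the coding (or the * wildcard) is listed with q greater than 0."""
    prefs = accepted_encodings(header)
    return prefs.get(coding.lower(), prefs.get("*", 0.0)) > 0


if __name__ == "__main__":
    print(client_accepts("gzip, deflate", "deflate"))
    print(client_accepts("gzip, deflate;q=0", "deflate"))
```

The second call is the case a naive search for "deflate" gets wrong: the token is present, but with q=0 the client is explicitly refusing it.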
Technically, any HTTP 2xx response with content can be content-encoded using any of the valid content encodings (gzip, compress, deflate, etc.), but in practice it's wasteful to apply compression to common image types: they are already compressed, so recompressing them gains nothing and can even make them slightly larger.
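A quick way to see the difference for yourself is to gzip some repetitive text and some incompressible bytes and compare sizes. In this Python sketch the CSV rows are made-up sample data, and seeded pseudo-random bytes stand in for an already-compressed image, which is an assumption made to keep the example self-contained.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)


# Repetitive comma-delimited text, like a generated CSV response.
text = b"id,name,amount\n" + b"42,example,19.99\n" * 500

# Pseudo-random bytes as a stand-in for already-compressed image data.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

if __name__ == "__main__":
    print(f"text ratio:  {gzip_ratio(text):.3f}")
    print(f"noise ratio: {gzip_ratio(noise):.3f}")
```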
You can definitely compress the response from dynamic PHP pages. The simplest method is to add:
<?php ob_start("ob_gzhandler"); ?>
to the start of every PHP page. It's better to set it up through the PHP configuration, of course.
There are many test pages, easily found with Google:
http://www.whatsmyip.org/http_compression/
http://www.gidnetwork.com/tools/gzip-test.php
http://nontroppo.org/tools/gziptest/
http://www.nibbleguru.com/tools/gzip-test.php