In what situations does the HTTP_REFERER not work?

I have used the referrer before in foo.php to decide whether the page iframing foo.php has a particular URL (using $_SERVER['HTTP_REFERER']).
It turned out that it worked most of the time (about 98% of the time), but it also seemed like some users arrived at the page with $_SERVER['HTTP_REFERER'] not set in foo.php, which therefore broke the code. [Update: these users claimed that they followed the usual page flow, did not open the URL of foo.php by itself in the browser (i.e. they let it be an iframe), and never altered their browser settings.]
I wonder what the reasons are that this could happen?

The HTTP/1.1 RFC does not make it mandatory to send an HTTP Referer header. You can't make any assumptions about its presence when writing robust code; perfectly conformant browsers may not include it.
Moreover, the RFC advises that "The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard", and "We suggest, though do not require, that a convenient toggle interface be provided for the user to enable or disable the sending of From and Referer information".
The latter is not very common (though some browsers have a "Private" mode that fulfils the requirement). More likely for your 2% is that people bookmarked the URL, which fulfils the first criterion (a URI obtained from a source that does not have its own URI), so the browser sends no referer.

Not by default AFAIK, but it's easy to turn it off (for privacy), e.g. in Firefox via about:config, and surely some users could be using browsers distributed to them (e.g. by their IT department) with such settings. So you should try to avoid relying on REFERER for any important functionality (also because it's misspelled, of course ;-).
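Given that, here is a minimal PHP sketch of how foo.php might treat the referer as a hint rather than a requirement; the allowed host below is a placeholder for the site doing the iframing:
<?php
// foo.php: HTTP_REFERER is optional; it is absent for bookmarks, direct entry,
// privacy settings, some proxies, etc., so never let its absence break the page.
$referer     = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$allowedHost = 'www.example.com'; // placeholder for the framing site's host

if ($referer !== '' && parse_url($referer, PHP_URL_HOST) === $allowedHost) {
    // Referer present and matches the expected framing page: normal flow.
} else {
    // Referer missing or different: degrade gracefully instead of failing.
}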


HTTP Redirect Status Code

I have an ASP.NET website. A user can access the URL /partners/{partner-id} in my app. When that url is invoked, I do two things:
1) I want to log the partner ID and user that requested it and
2) Redirect the url to the partner's website.
My question is, which HTTP status code should I use? I was using 301. However, that introduced a problem where my logging code was getting skipped. I suspect it's because a 301 represents a permanent redirect. However, I basically want to remain the middleman so that I properly log the details.
What HTTP status code should I use?
Thanks!
Taking a look here:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
you should use the 302 status code. Two useful points about the 302 redirect:
Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests
In other words, because the redirect may be temporary, clients should keep using the initial URI rather than going straight to the redirect target by default, meaning they will pass through your logging system each time rather than going directly to the redirected URI on subsequent requests. The 302 response also states:
This response is only cacheable if indicated by a Cache-Control or Expires header field.
By default, a 301 redirect is cacheable unless you explicitly specify otherwise, but a 302 is not cacheable unless you explicitly make it so.
However, it's probably a good idea to explicitly add 'do not cache' headers to the redirect to let the client know it should not be cached, just in case you have a client that doesn't follow the default spec behavior. There are a number of other answers on Stack Overflow regarding this; here's a decent one:
How to control web page caching, across all browsers?
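Putting the two points together, the response from /partners/{partner-id} would look roughly like this (the Location value is a placeholder for the partner's site):
HTTP/1.1 302 Found
Location: https://www.partner-example.com/
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0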

Non-SEO anti-spoofing external link redirect: Status code?

I've read several documents on the merits of the different HTTP redirect status codes, but those have all been very SEO-centric. I've now got a problem where search engines don't factor in, because the section of the site in question is not publicly viewable.
However, we do want our website to be as accurate / helpful with meta-data as possible, especially for accessibility reasons.
Now, our application takes external links provided by third parties and routes them across an anti-spoofing page with a disclaimer. Since this redirector page can effectively also be embedded via an Ajax call in certain configurations, we also want to strip any query parameters from the referer (for privacy purposes; the target site has no business finding out what internal page the user was on before).
To do this, the confirmation button triggers a server-side script which in turn redirects (rather than just opening the page for the user).
So much for why our anti-spoofing disclaimer page ends up triggering a redirect.
The question is:
Does it effectively make any difference which status code I use? Do non-typical browsers (e.g. screen readers) care? If so, what's the best practice for such redirects? The most semantically sound, if you will? They all seem various degrees of insincere to me.
I'm thinking of a 302, but since it makes no sense to try to bookmark the page (it's protected with a CSRF token), there's probably no harm in a 301 either, is there? So I'm wondering if there's a reason for me to prefer one over the other.
Hmm. Here's the list. 301 sounds okay (emphasis mine):
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.
302 doesn't fit in my opinion:
The requested resource resides temporarily under a different URI
However, my favourite is 303 See Other:
The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. The new URI is not a substitute reference for the originally requested resource.
But that might be so rare (I've never seen it used in the wild) that some clients may not understand it - which would render your desire for maximum compatibility moot. 301 is probably the closest choice.
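For reference, if you did use a 303, the exchange would look roughly like this (host, paths and target URL are placeholders). The confirmation button submits:
POST /external-redirect HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
and the redirect script answers:
HTTP/1.1 303 See Other
Location: https://external.example.org/page
after which the browser issues a plain GET on the Location URI, which is exactly the behaviour the quoted paragraph describes for POST-activated scripts.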

HTTP POST with URL query parameters -- good idea or not?

I'm designing an API to go over HTTP and I am wondering if using the HTTP POST command, but with URL query parameters only and no request body, is a good way to go.
Considerations:
"Good Web design" requires non-idempotent actions to be sent via POST. This is a non-idempotent action.
It is easier to develop and debug this app when the request parameters are present in the URL.
The API is not intended for widespread use.
It seems like making a POST request with no body will take a bit more work, e.g. a Content-Length: 0 header must be explicitly added.
It also seems to me that a POST with no body is a bit counter to most developers' and HTTP frameworks' expectations.
Are there any more pitfalls or advantages to sending parameters on a POST request via the URL query rather than the request body?
Edit: The reason this is under consideration is that the operations are not idempotent and have side effects other than retrieval. See the HTTP spec:
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
...
Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE SHOULD NOT have side effects, and so are inherently idempotent.
If your action is not idempotent, then you MUST use POST. If you don't, you're just asking for trouble down the line. GET, PUT and DELETE methods are required to be idempotent. Imagine what would happen in your application if the client was pre-fetching every possible GET request for your service – if this would cause side effects visible to the client, then something's wrong.
I agree that sending a POST with a query string but without a body seems odd, but I think it can be appropriate in some situations.
Think of the query part of a URL as a command to the resource to limit the scope of the current request. Typically, query strings are used to sort or filter a GET request (like ?page=1&sort=title) but I suppose it makes sense on a POST to also limit the scope (perhaps like ?action=delete&id=5).
Everyone is right: stick with POST for non-idempotent requests.
What about using both a URI query string and request content? Well, it's valid HTTP (see Note 1), so why not?!
It is also perfectly logical: URLs, including their query string part, are for locating resources, whereas HTTP method verbs (POST, and its optional request content) are for specifying actions, or what to do with resources. Those should be orthogonal concerns. (But they are not beautifully orthogonal concerns for the special case of Content-Type: application/x-www-form-urlencoded; see Note 2 below.)
Note 1: the HTTP specification (1.1) does not state that query parameters and content are mutually exclusive for an HTTP server that accepts POST or PUT requests. So any server is free to accept both. I.e. if you write the server, there's nothing to stop you choosing to accept both (except maybe an inflexible framework). Generally, the server can interpret query strings according to whatever rules it wants. It can even interpret them with conditional logic that refers to other headers like Content-Type too, which leads to Note 2:
Note 2: if a web browser is the primary way people are accessing your web application, and application/x-www-form-urlencoded is the Content-Type they are posting, then you should follow the rules for that Content-Type. And the rules for application/x-www-form-urlencoded are much more specific (and frankly, unusual): in this case you must interpret the URI as a set of parameters, and not a resource location. [This is the same point of usefulness Powerlord raised; that it may be hard to use web forms to POST content to your server. Just explained a little differently.]
Note 3: what are query strings originally for? RFC 3986 defines HTTP query strings as a URI part that works as a non-hierarchical way of locating a resource.
In case readers of this question are also wondering what good RESTful architecture is: the RESTful architecture pattern doesn't require URI schemes to work a specific way. RESTful architecture concerns itself with other properties of the system, like cacheability of resources, the design of the resources themselves (their behavior, capabilities, and representations), and whether idempotence is satisfied. Or in other words, achieving a design which is highly compatible with the HTTP protocol and its set of HTTP method verbs. :-) (In other words, RESTful architecture is not very prescriptive about how resources are located.)
Final note: sometimes query parameters get used for yet other things, which are neither locating resources nor encoding content. Ever seen a query parameter like 'PUT=true' or 'POST=true'? These are workarounds for browsers that don't allow you to use PUT and POST methods. While such parameters are seen as part of the URL query string (on the wire), I argue that they are not part of the URL's query in spirit.
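To make Note 1 concrete, here is a minimal PHP sketch (the endpoint and parameter names are made up) of a server-side script that reads the query string for scope and the request body for content on the same POST, mirroring the ?action=delete&id=5 idea above:
<?php
// Sketch only: handles e.g. POST /resource.php?action=delete&id=5 with an
// optional form-encoded body. The query string limits the scope of the request;
// the body, if any, carries the content to act on.
$action = isset($_GET['action']) ? $_GET['action'] : 'create'; // from the URL query
$id     = isset($_GET['id'])     ? (int) $_GET['id'] : null;   // from the URL query
$fields = $_POST;                                              // from the request body
// ... dispatch on $action/$id here, using $fields as the payload ...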
You want reasons? Here's one:
A web form can't be used to send a request to a page that uses a mix of GET and POST. If you set the form's method to GET, all the parameters are in the query string. If you set the form's method to POST, all the parameters are in the request body.
Source: HTML 4.01 standard, section 17.13 Form Submission
From a programmatic standpoint, for the client it's packaging up parameters and appending them onto the URL and conducting a POST vs. a GET. On the server side, it's evaluating inbound parameters from the query string instead of the posted bytes. Basically, it's a wash.
Where there could be advantages or disadvantages is in how specific client platforms work with POST and GET routines in their networking stack, as well as how the web server deals with those requests. Depending on your implementation, one approach may be more efficient than the other. Knowing that would guide your decision here.
Nonetheless, from a programmer's perspective, I prefer allowing either a POST with all parameters in the body, or a GET with all params on the URL, and explicitly ignoring URL parameters with any POST request. It avoids confusion.
I would think it could still be quite RESTful to have query arguments that identify the resource on the URL while keeping the content payload confined to the POST body. This would seem to separate the considerations of "What am I sending?" versus "Who am I sending it to?".
The REST camp have some guiding principles that we can use to standardize the way we use HTTP verbs. This is helpful when building RESTful API's as you are doing.
In a nutshell:
GET should be Read Only i.e. have no effect on server state.
POST is used to create a resource on the server.
PUT is used to update or create a resource.
DELETE is used to delete a resource.
In other words, if your API action changes the server state, REST advises us to use POST/PUT/DELETE, but not GET.
User agents usually understand that doing multiple POSTs is bad and will warn against it, because the intent of POST is to alter server state (eg. pay for goods at checkout), and you probably don't want to do that twice!
Compare to a GET which you can do as often as you like (idempotent).
I find it perfectly acceptable to use query parameters on a POST end-point if they refer to an already-existing resource that must be updated through the POST end-point (not created).
For example:
POST /user_settings?user_id=4
{
"use_safe_mode": 1
}
The POST above has a query parameter referring to an existing resource, mirroring the GET end-point definition to get the same resource.
The body parameter defines how to update the existing resource.
Edited:
I prefer this to having the path of the end-point point directly to the already-existing resource, as some suggest doing, like so:
POST /user_settings/4
{
...
}
The reason is three-fold:
I find it has better readability, since the query parameters are named, like "user_id" in the above, instead of just "4".
Usually, there is also a GET endpoint to get the same resource. In that case the path of the end-point and the query parameters will be the same and I like that symmetry.
I find the nesting can become cumbersome and difficult to read in case multiple parameters are needed to define the already-existing resource:
POST /user_settings/{user_id}/{which_settings_id}/{xyz}/{abc}/ ...
{
...
}

Can I change the headers of the HTTP request sent by the browser?

I'm looking into a RESTful design and would like to use the HTTP methods (POST, GET, ...) and HTTP headers as much as possible. I already found out that the HTTP methods PUT and DELETE are not supported by the browser.
Now I'm looking to get different representations of the same resource and would like to do this by changing the Accept header of the request. Depending on this Accept header, the server can serve a different view on the same resource.
Problem is that I didn't find a way to tell my browser to change this header.
The <a> tag has a type attribute that can hold a MIME type and looked like a good candidate, but the header was still the browser default (in Firefox it can be changed in about:config with the network.http.accept.default key).
I would partially disagree with Milan's suggestion of embedding the requested representation in the URI.
If anyhow possible, URIs should only be used for addressing resources and not for tunneling HTTP methods/verbs. Eventually, specific business action (edit, lock, etc.) could be embedded in the URI if create (POST) or update (PUT) alone do not serve the purpose:
POST http://shonzilla.com/orders/08/165;edit
Requesting a particular representation in the URI would require you to disrupt your URI design, eventually making it uglier, mixing two distinct REST concepts in the same place (i.e. the URI), and making it harder to generically process requests on the server side. What Milan is suggesting, and what many sites (including Flickr) do, is exactly this.
Instead, a more RESTful approach would be to use a separate place to encode the preferred representation: the Accept HTTP header, which is used for content negotiation, where the client tells the server which content types it can handle/process and the server tries to fulfil the client's request. This approach is part of the HTTP 1.1 standard and is supported by compliant software, including web browsers.
Compare this:
GET /orders/08/165.xml HTTP/1.1
or
GET /orders/08/165?format=xml HTTP/1.1
to this:
GET /orders/08/165 HTTP/1.1
Accept: application/xml
From a web browser you can request any content type by using the setRequestHeader method of the XMLHttpRequest object. For example:
function getOrder(year, yearlyOrderId, contentType) {
    var client = new XMLHttpRequest();
    client.open("GET", "/order/" + year + "/" + yearlyOrderId);
    client.setRequestHeader("Accept", contentType);
    client.send(); // GET request, so no body is sent
}
To sum it up: the address, i.e. the URI, of a resource should be independent of its representation, and the XMLHttpRequest.setRequestHeader method allows you to request any representation using the Accept HTTP header.
Cheers!
Shonzilla
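To complement the client-side snippet above, the server side of that negotiation might look roughly like this in PHP (a deliberately naive sketch; real Accept parsing should also honour q-values and wildcards):
<?php
// Serve two representations of the same resource, chosen by the Accept header.
$order  = array('year' => '08', 'id' => 165); // stand-in data for /orders/08/165
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : 'text/html';
if (strpos($accept, 'application/xml') !== false) {
    header('Content-Type: application/xml');
    echo '<order><year>' . $order['year'] . '</year><id>' . $order['id'] . '</id></order>';
} else {
    header('Content-Type: text/html');
    echo '<p>Order ' . $order['id'] . ' (' . $order['year'] . ')</p>';
}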
I was looking to do exactly the same thing (RESTful web service), and I stumbled upon this Firefox add-on, which lets you modify the Accept header (actually, any request header). It works perfectly.
https://addons.mozilla.org/en-US/firefox/addon/967/
I don't think it's possible to do it in the way you are trying to do it.
Indication of the accepted data format is usually done through adding the extension to the resource name. So, if you have resource like
/resources/resource
and GET /resources/resource returns its HTML representation, then to indicate that you want its XML representation instead you can use the following pattern:
/resources/resource.xml
You have to do the accepted content type determination magic on the server side, then.
Or use Javascript as James suggests.
The ModHeader extension for Google Chrome is also a good option. You can just set the headers you want and enter the URL in the browser; it will automatically apply those headers when you hit the URL. The only thing is, it will send the headers for each and every URL you hit, so you have to disable or delete it after use.
Use some JavaScript!
var xmlhttp = new XMLHttpRequest();
xmlhttp.open('PUT', 'http://www.mydomain.org/documents/standards/browsers/supportlist');
xmlhttp.send("page content goes here");

Why does Fiddler break my site's redirects?

Why does using Fiddler sometimes break my site on page transitions?
After a server-side redirect, in the HTTP response (as seen in Fiddler) I get this:
Object moved
Object moved to here.
The site is an ASP.NET 1.1 / VB.NET 1.1 [sic] site.
Why doesn't Fiddler just go there for me? I don't get it.
I'm fine with this issue when developing, but I'm worried that other proxy servers might cause this issue for 'real customers'. I'm not even clear on exactly what is going on.
That's actually what Response.Redirect does. It sends a 302 - Object moved response to the user-agent. The user-agent then automatically goes to the URL specified in the 302 response. If you need a real server-side redirect without round-tripping to the client, try Server.Transfer.
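To make that concrete: the "Object moved" page you see in Fiddler is just the small HTML body that ASP.NET puts inside the 302 response; a browser normally never shows it because it follows the Location header straight away. The response looks roughly like this (the target path is a placeholder):
HTTP/1.1 302 Found
Location: /newpage.aspx
Content-Type: text/html

<html><head><title>Object moved</title></head>
<body><h2>Object moved to <a href="/newpage.aspx">here</a>.</h2></body></html>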
If you merely constructed the request using the request builder, you're not going to see Fiddler automatically follow the returned redirect.
In contrast, if you are using IE or another browser, it will generally check the redirect header and follow it.
For IE specifically, I believe there's a timing corner case where the browser will fail to follow the redirect in obscure situations. You can often fix this by clicking Tools / Fiddler Options, and enabling both the "Server" and "Client" socket reuse settings.
Thanks user15310, it works with Server.Transfer
Server.Transfer("newpage.aspx", true);
Firstly, transferring to another page using Server.Transfer conserves server resources. Instead of telling the browser to redirect, it simply changes the "focus" on the Web server and transfers the request. This means you don't get quite as many HTTP requests coming through, which therefore eases the pressure on your Web server and makes your applications run faster.
But watch out: because the "transfer" process can work on only those sites running on the server, you can't use Server.Transfer to send the user to an external site. Only Response.Redirect can do that.
Secondly, Server.Transfer maintains the original URL in the browser. This can really help streamline data entry techniques, although it may make for confusion when debugging.
That's not all: The Server.Transfer method also has a second parameter—"preserveForm". If you set this to True, using a statement such as Server.Transfer("WebForm2.aspx", True), the existing query string and any form variables will still be available to the page you are transferring to.
Read more here:
http://www.developer.com/net/asp/article.php/3299641/ServerTransfer-Vs-ResponseRedirect.htm