Non-SEO anti-spoofing external link redirect: Status code? - redirect

I've read several documents on the merits of the different HTTP redirect status codes, but those've all been very SEO-centric. I've now got a problem where search-engines don't factor in, because the section of the site in question is not publicly viewable.
However, we do want our website to be as accurate / helpful with meta-data as possible, especially for accessibility reasons.
Now, our application takes external links provided by third parties and routes them across an anti-spoofing page with a disclaimer. Since this redirector page can effectively also be embedded via an Ajax call in certain constellations, we also want to strip any query parameters from the referer (for privacy purposes; the target site has no business finding out what internal page the user was on before).
To do this, the confirmation button triggers a server-side script which in turn redirects (rather than just opening the page for the user).
So much as to why our anti-spoofing disclaimer page ends up triggering a redirect.
The question is:
Does it effectively make any difference which status code I use? Do non-typical browsers (e.g. screen-readers) care? If so, what's the best practise for such redirects? The most semantically sound, if you so will? They all seem various degrees of insincere to me.
I'm thinking of a 302 - but as it makes no sense trying to bookmark the page (it's protected with a crsf token), so there's probably no harm in a 301, either, is there? So I'm wondering if there's a reason for me to prefer the one over the other.

Hmm. Here's the list. 301 sounds okay (emphasis mine):
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.
302 doesn't fit n my opinion:
The requested resource resides temporarily under a different URI
However, my favourite is 303 see other:
The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. The new URI is not a substitute reference for the originally requested resource.
But that might be so rare (I've never seen it used in the wild) that some clients may not understand it - which would render your desire for maximum compatibility moot. 301 is probably the closest choice.

Related

How to model REST URL for which there is no equivalent HTTP method?

I have customers and I want to activate and cancel their plans. I am trying to be as RESTful as possible.
Should the action to perform 'active' or 'suspend' be part of URI ?
1) POST customers/{customerId}/activatePlan/{planName}
2) POST customers/{customerId}/suspendPlan/{planName}
My problem is that both activate and cancel are verbs or actions. They do not have any equivalent HTTP action ( GET, POST, PATCH etc.)
Are my URL's restful ? if not, how to make them REST ful.
Everything is a resource on the RESTful paradigm and these resources are manipulated with one of the HTTP methods (GET, POST, PUT, DELETE, etc ...).
You can create a plan with POST:
POST customers/{customerId}/{planName}
once a plan is created we have to activate or deactivate it and here we have a couple of choices:
Using the action in the URI. We use PUT in this case as the planName resource exists (so it is an update):
PUT customers/{customerId}/{planName}/activate
Set a property on the planName resource (still a PUT as it is an update on the planName resource). The activate property in the body of the HTTP PUT request (i.e.: activate=true or activate=false):
PUT customers/{customerId}/{planName}
then you can use GET to return the status of the planName resource
GET customers/{customerId}/{planName}
and DELETE if you want to remove planName from a customer:
GET customers/{customerId}/{planName}
Are my URL's restful ?
All URI are RESTful -- REST doesn't care what spelling conventions you use for your resource identifiers.
Notice, please, that both of the following URI work exactly as you would expect them to:
https://www.merriam-webster.com/dictionary/activate
https://www.merriam-webster.com/dictionary/cancel
My problem is that both activate and cancel are verbs or actions. They do not have any equivalent HTTP action ( GET, POST, PATCH etc.)
There are a couple of different answers. Perhaps the easiest is to think about how you would design a web site that allowed you to activate and cancel plans. You would probably have a web page for this customers enrollment, and on that page might be a link with some hint, like "cancel". Clicking the link would load a new web page, with a form on it, including a bunch of fields for the customer to fill in. When the customer submits the form, the browser looks at the form, and its meta data, and figures out what request to send to the server.
A "REST API" isn't about putting your domain model on the web - it's about putting your interaction model on the web. We create a collection of documents (resources) that describe the domain, and we get useful work done by moving those documents around the network. See Jim Webber 2011.
Because we are working in the language of documents, REST interfaces are usually based on one of the following ideas
Here is a document; make local edits to your copy of the document, and send it back to the server (GET, PUT, PATCH, DELETE are the primary method tokens you see in this style)
Here is a form; fill in the details and submit it (GET and POST are the method tokens you see in these cases)
If you are unhappy with the use of POST, it may help to review Fielding 2009
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”

RESTful design: Should renamed resources block old URIs forever? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I'm trying to understand RESTful Web Services page 274 section HTTP PUT. Issuing PUT against a non-existent resource creates the resource. If PUT causes an existing resource to move, HTTP 301 (Moved Permanently) is returned with the new location. Requests to the old URI return HTTP 301, 404 or 410.
My question is about returning HTTP 301. This seems to imply that resources retain ownership of old URIs forever.
Consider: /companies/{companyName}/departments/{departmentName}
I see the following benefits of using HTTP 301:
Concurrency: If one user renames a company while another is in the process of navigating to a department, the latter will get HTTP 404 in spite of the fact that they did nothing wrong. HTTP 301 allows us to seamlessly redirect the second user to the new URI.
Bookmarks: Both humans and computers need to bookmark URIs for long-term storage. Humans post links in discussion forums. Computers use URIs for caching purposes and user preferences.
I see the following problems with HTTP 301:
Blocks long-term resource evolution: Over its life-time, department A is renamed to B, C and D. A few years later someone would like to create department A and is prevented from doing so by D. To be fair, I can't think of any practical example where this would happen so maybe it's a non-issue.
API versions limit its use: Even root resources change over time as new API versions are released and old versions are removed. What's the point of returning HTTP 301 if the client can't access the new resource the same way as it could the old?
What is the appropriate course of action? Should the URL hierarchy be modeled differently? Should the behavior/response be different?
If PUT causes an existing resource to move, HTTP 301 is returned with the new location.
Technically, you can't move resources in HTTP. If you are manipulating resources as a client, what you are doing is:
GET /oldurl
PUT /newurl
DELETE /oldurl
The server will not know that the new URL is a new location for the resource at the old URL, and there is no concept of URL persistence through redirection which can be established by the client using universal HTTP methods. If the service provides an API which allows you to move items, for example by passing certain parameters, and essentially doing the above behind the scenes but also creating a redirect in the process, then it is up to the service as to whether that redirect is overwritable with a new PUT operation or not, as well as what kind of redirect is to be used.
It's my understanding that when a post title gets renamed, we're supposed to keep track of the old name and return HTTP 301 when a client references the old address.
This has nothing to do with REST and is a manifestation of Cool URIs don't change. In a RESTful service, it is sufficient to link to the new URL via a resource or sequence of resources reachable from your service's entry point.
e.g. a HTML website is the normative example of a REST implementation
service entry point = /blog
/blog links to /blog/archive
/blog/archive links to /blog/new-post-title
since the blog service is designed to be consumed by humans using a web browser, it is expected that they may want to bookmark a URL to revisit it later. This is what redirects are for.
a machine-to-machine API has to do something similar:
service entry point = /
/ links to /companies with a rel of "http://myservice/rels/companies"
/companies links to /companies/new-company-name with a rel of "item"
No mention needs to be made of the old company name, because machines are not expected to bookmark a location, but instead to begin navigating from the entry point each time.
I would return a simple 404. This signifies that there is nothing matching the requested uri, and from the documentation, specifies:
No indication is given of whether the condition is temporary or permanent.
This way it is perfectly valid to created new resourced to replace those that existed sometime in the past.
Otherwise, if you're going to move a resource by renaming it, you could always return a 302, specifying the resource resides at a different location. You could return this status until a replacement is generated. The user must continue to use these headers to make sure the resource is still relocated:
Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
Answering my own question:
I can't think of any practical example where keeping HTTP 301 forever would cause problems.
Links are not meant to survive across API versions. Resources from one version should not reference resources from another version (even using HTTP 301). If a resource's URI changes as the result of an API change, the old API should keep on using the old URI forever.
UPDATE: According to Willy Tarreau idempotency only requires the application to redirect for a short period of time. Once the client has finished retrying a request, the redirect is no longer needed (at least not for idempotency). You might want to keep redirects around for a longer period of time, to support features such as permalinks, but nothing in the specification forces you to.

REST Web Services API Design

Just wanted to get feedback on how I am planning to architect my API. Dummy methods below. Here's the structure:
GET http://api.domain.com/1/users/ <-- returns a list of users
POST http://api.domain.com/1/users/add.xml <-- adds user
POST http://api.domain.com/1/users/update.xml <-- updates user
DELETE (or POST?) http://api.domain.com/1/users/delete.xml <-- deletes user
Questions:
Is it OK to use just GET and POST?
Is it a good idea that I plan to rely on the filename to indicate what operation to do (e.g. add.xml to add)? Would it be better to do something like this: POST http://api.domain.com/1/users/add/data.xml?
What's a good way to keep these resources versioned? In my example, I use a /1/ after domain name to indicate version 1. Alternatives would be: http://api1.domain.com... or http://api-1.domain.com... or http://apiv1.domain.com... or http://api-v1.domain.com... or http://api.domain.com/v1/... or
What's the best way to authenticate?
Before you dig into REST, here are some terms you really need to grasp:
Resource - The things/data you want to make available in your API (in your case a "User")
URI - A universally unique ID for a resource. Should mention nothing about the method being performed (e.g. shouldn't contain "add" or "delete"). The structure of your URI however doesn't make your app any more or less RESTful - this is a common misconception.
Uniform Interface - A fixed set of operations you can perform on your resources, in most cases this is HTTP. There are clear definitions for the purpose of each of these HTTP methods.
The most unrestful thing about your URIs as they are right now is that they have information about the operation being performed right in them. URIs are IDs and nothing more!
Let's take a real world example. My name is Nathan. "Nathan" could be considered my ID (or in restful terms URI – for the purpose of this example assume I'm the only "Nathan"). My name/ID doesn't changed based on how you would like to interact with me, e.g. My name wouldn't change to "NathanSayHello" when you wanted to greet me.
It's the same for REST. Your user identified by http://api.domain.com/users/1 doesn't change to http://api.domain.com/users/1/update.xml when you want to update that user. The fact that you want to update that user is implied by the method you're using (e.g. PUT).
Here is my suggestion for your URIs
# Retrieve info about a user
GET http://api.domain.com/user/<id>
# Retrieve set all users
GET http://api.domain.com/users
# Update the user IDed by api.domain.com/user/<id>
PUT http://api.domain.com/user/<id>
# Create a new user. The details (even <id>) are based as the body of the request
POST http://api.domain.com/users
# Delete the user ID'd by api.domain.com/user/<id>
DELETE http://api.domain.com/user/<id>
As for your questions:
Use PUT and DELETE when appropriate and avoid overloading POST to handle these functions as it breaks HTTP's definition of POST. HTTP is your uniform interface. It is your contract with the API user about how they can expect to interact with your service. If you break HTTP, you break this contract.
Remove "add" altogether. Use HTTP's Content-Type header for specifying the mime-type of posted data.
Are you referring to the version of your API or the version of the resource? ETag and other response headers can be used to version the resources.
Many options here. Basic HTTP Auth (easy but insecure), Digest Auth, custom auth like AWS. OAuth is also a possibility. If security is of main importance, I use client side SSL certs.
1) On your design probably not. POST is not idempotent! So you should not use for the update or the delete, instead use PUT and DELETE from Rest
2) A better choice is to use the header Content-Type on the WS call, like: application/xml
3) Also on the header Content-Type u can use it: application-v1.0/xml
4) Not sure if it is the best, but probably the easiest way is to use HTTP's built-in authentication mechanisms in RFC 2617. An example: AWS Authentication
In REST, the HTTP "verb" is used to denote the operation type: you won't be able to express all the CRUD operations with only "GET" and "POST"
no: the URL of the resource is usually where the "document identifier" should appear
The version of the "document" can be transmitted in an HTTP response header upon creation/modification of the said resource. It should be the duty of the server to uniquely identify the resources - trying to do this on the client side will prove a daunting challenge i.e. keeping consistency.
Of course, there are many variations on the topic...
I did authentication based on headers. Something like
X-Username:happy-hamster
X-Password:notmyactualpassword
If you're concerned about security - do it through SSL.
Other implementations exist, of course. For instance, Amazon with their S3:
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?RESTAuthentication.html
If you don't have ability to make PUT and DELETE requests, it's considered a good practice to tunnel them through POST. In this case the action is specified in URL. If I recall correctly, RoR does exactly this:
POST http://example.com/foos/2.xml/delete
or
POST http://example.com/foos/3.xml/put
...
<foo>
<bar>newbar</bar>
</foo>
It's a bit offtop, but in regards to versioning and REST overall you might want to take a look at CouchDB. Here is a good book available on-line
Using post for create and delete functionality is not a good rest api design strategy. Use Put to create, post to update and delete to delete the resources.
For more information on designing rest apis follow the link - best practices to design rest apis

in what situation does the HTTP_REFERER not work?

I have used referrer before in foo.php to decide whether the page iframing foo.php is of a particular URL. (using $_SERVER['HTTP_REFERER'])
It turned out that most of the time, it worked (about 98% of the time), but it also seemed like some users arrived the page and $_SERVER['HTTP_REFERER'] was not set in foo.php and therefore broke the code. [update: These user claimed that they followed the usual page flow and didn't use the URL of foo.php all by itself on the browser (that they let it be an iframe) and the users never altered their browser settings.]
I wonder what the reasons are that it could happen?
The HTTP/1.1 RFC does not make it mandatory to send an HTTP referer header. You can't make any assumptions about its presence when writing robust code; perfectly conforment browsers may not include it.
Moreoever, the RFC advises that "The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard", and "We suggest, though do not require, that a convenient toggle interface be provided for the user to enable or disable the sending of From and Referer information".
The later is not very common (though some browsers have a "Private" mode that fulfils the requirements). More likely for your 2% is that people Bookmarked the URL, which fulfils the first criteria (URI obtained from a source without a URI), and so the browser sends no referer.
Not by default AFAIK, but it's easy to turn it off (for privacy) e.g. in Firefox via about:config, and surely some users could be using browsers distributed to them (e.g. by their IT department) with such kinds of setting. So you should try to avoid relying on REFERER for any important functionality (also because it's mis-spelled, of course;-).

HTTP POST with URL query parameters -- good idea or not?

I'm designing an API to go over HTTP and I am wondering if using the HTTP POST command, but with URL query parameters only and no request body, is a good way to go.
Considerations:
"Good Web design" requires non-idempotent actions to be sent via POST. This is a non-idempotent action.
It is easier to develop and debug this app when the request parameters are present in the URL.
The API is not intended for widespread use.
It seems like making a POST request with no body will take a bit more work, e.g. a Content-Length: 0 header must be explicitly added.
It also seems to me that a POST with no body is a bit counter to most developer's and HTTP frameworks' expectations.
Are there any more pitfalls or advantages to sending parameters on a POST request via the URL query rather than the request body?
Edit: The reason this is under consideration is that the operations are not idempotent and have side effects other than retrieval. See the HTTP spec:
In particular, the convention has been
established that the GET and HEAD
methods SHOULD NOT have the
significance of taking an action other
than retrieval. These methods ought to
be considered "safe". This allows user
agents to represent other methods,
such as POST, PUT and DELETE, in a
special way, so that the user is made
aware of the fact that a possibly
unsafe action is being requested.
...
Methods can also have the property of
"idempotence" in that (aside from
error or expiration issues) the
side-effects of N > 0 identical
requests is the same as for a single
request. The methods GET, HEAD, PUT
and DELETE share this property. Also,
the methods OPTIONS and TRACE SHOULD
NOT have side effects, and so are
inherently idempotent.
If your action is not idempotent, then you MUST use POST. If you don't, you're just asking for trouble down the line. GET, PUT and DELETE methods are required to be idempotent. Imagine what would happen in your application if the client was pre-fetching every possible GET request for your service – if this would cause side effects visible to the client, then something's wrong.
I agree that sending a POST with a query string but without a body seems odd, but I think it can be appropriate in some situations.
Think of the query part of a URL as a command to the resource to limit the scope of the current request. Typically, query strings are used to sort or filter a GET request (like ?page=1&sort=title) but I suppose it makes sense on a POST to also limit the scope (perhaps like ?action=delete&id=5).
Everyone is right: stick with POST for non-idempotent requests.
What about using both an URI query string and request content? Well it's valid HTTP (see note 1), so why not?!
It is also perfectly logical: URLs, including their query string part, are for locating resources. Whereas HTTP method verbs (POST - and its optional request content) are for specifying actions, or what to do with resources. Those should be orthogonal concerns. (But, they are not beautifully orthogonal concerns for the special case of ContentType=application/x-www-form-urlencoded, see note 2 below.)
Note 1: HTTP specification (1.1) does not state that query parameters and content are mutually exclusive for a HTTP server that accepts POST or PUT requests. So any server is free to accept both. I.e. if you write the server there's nothing to stop you choosing to accept both (except maybe an inflexible framework). Generally, the server can interpret query strings according to whatever rules it wants. It can even interpret them with conditional logic that refers to other headers like Content-Type too, which leads to Note 2:
Note 2: if a web browser is the primary way people are accessing your web application, and application/x-www-form-urlencoded is the Content-Type they are posting, then you should follow the rules for that Content-Type. And the rules for application/x-www-form-urlencoded are much more specific (and frankly, unusual): in this case you must interpret the URI as a set of parameters, and not a resource location. [This is the same point of usefulness Powerlord raised; that it may be hard to use web forms to POST content to your server. Just explained a little differently.]
Note 3: what are query strings originally for? RFC 3986 defines HTTP query strings as an URI part that works as a non-hierarchical way of locating a resource.
In case readers asking this question wish to ask what is good RESTful architecture: the RESTful architecture pattern doesn't require URI schemes to work a specific way. RESTful architecture concerns itself with other properties of the system, like cacheability of resources, the design of the resources themselves (their behavior, capabilities, and representations), and whether idempotence is satisfied. Or in other words, achieving a design which is highly compatible with HTTP protocol and its set of HTTP method verbs. :-) (In other words, RESTful architecture is not very presciptive with how the resources are located.)
Final note: sometimes query parameters get used for yet other things, which are neither locating resources nor encoding content. Ever seen a query parameter like 'PUT=true' or 'POST=true'? These are workarounds for browsers that don't allow you to use PUT and POST methods. While such parameters are seen as part of the URL query string (on the wire), I argue that they are not part of the URL's query in spirit.
You want reasons? Here's one:
A web form can't be used to send a request to a page that uses a mix of GET and POST. If you set the form's method to GET, all the parameters are in the query string. If you set the form's method to POST, all the parameters are in the request body.
Source: HTML 4.01 standard, section 17.13 Form Submission
From a programmatic standpoint, for the client it's packaging up parameters and appending them onto the url and conducting a POST vs. a GET. On the server-side, it's evaluating inbound parameters from the querystring instead of the posted bytes. Basically, it's a wash.
Where there could be advantages/disadvantages might be in how specific client platforms work with POST and GET routines in their networking stack, as well as how the web server deals with those requests. Depending on your implementation, one approach may be more efficient than the other. Knowing that would guide your decision here.
Nonetheless, from a programmer's perspective, I prefer allowing either a POST with all parameters in the body, or a GET with all params on the url, and explicitly ignoring url parameters with any POST request. It avoids confusion.
I would think it could still be quite RESTful to have query arguments that identify the resource on the URL while keeping the content payload confined to the POST body. This would seem to separate the considerations of "What am I sending?" versus "Who am I sending it to?".
The REST camp have some guiding principles that we can use to standardize the way we use HTTP verbs. This is helpful when building RESTful API's as you are doing.
In a nutshell:
GET should be Read Only i.e. have no effect on server state.
POST is used to create a resource on the server.
PUT is used to update or create a resource.
DELETE is used to delete a resource.
In other words, if your API action changes the server state, REST advises us to use POST/PUT/DELETE, but not GET.
User agents usually understand that doing multiple POSTs is bad and will warn against it, because the intent of POST is to alter server state (eg. pay for goods at checkout), and you probably don't want to do that twice!
Compare to a GET which you can do as often as you like (idempotent).
I find it perfectly acceptable to use query parameters on a POST end-point if they refer to an already-existing resource that must be updated through the POST end-point (not created).
For example:
POST /user_settings?user_id=4
{
"use_safe_mode": 1
}
The POST above has a query parameter referring to an existing resource, mirroring the GET end-point definition to get the same resource.
The body parameter defines how to update the existing resource.
Edited:
I prefer this to having the path of the end-point to point directly to the already-existing recourse, like some suggest to do, like so:
POST /user_settings/4
{
...
}
The reason is three-fold:
I find it has better readability, since the query parameters are named, like "user_id" in the above, in stead of just "4".
Usually, there is also a GET endpoint to get the same resource. In that case the path of the end-point and the query parameters will be the same and I like that symmetry.
I find the nesting can become cumbersome and difficult to read in case multiple parameters are needed to define the already-existing resource:
POST /user_settings/{user_id}/{which_settings_id}/{xyz}/{abc}/ ...
{
...
}