I have read a lot of discussions here on SO, watched Jon Moore's presentation (which explained a lot, btw) and read over Roy Fielding's blog post on HATEOAS but I still feel a little in the dark when it comes to client design.
API Question
For now, I'm simply returning xhtml with forms/anchors and definition lists to represent the resources. The following snippet details how I lay out forms/anchors/lists.
# anchors
<li class='docs_url/#resourcename'>
<a rel='self' href='resource location'></a>
</li>
# forms
<form action='action_url' method='whatever_method' class='???'></form>
# lists
<dl class='docs_url/#resourcename'>
<dt>property</dt>
<dd>value</dd>
</dl>
My question is mainly for forms. In Jon's talk he documents form types such as (add_location_form) etc. and the required inputs for them. I don't have a lot of resources but I was thinking of abstract form types (add , delete, update, etc) and just note in the documentation that for (add, update) that you must send a valid representation of the target resource and with delete that you must send the identifier.
Question 1: With the notion of HATEOAS, shouldn't we really just make the client "discover" the form (by classing them add,delete,update etc) and just send back all the data we gave them? My real question here (not meant to be a discussion) is does this follow good practice?
Client Question
Following HATEOAS, with our actions on resources being discoverable, how does this affect client code (consumers of the API) and their UI? It sounds great that, following these principles, the UI should only display actions that are available, but how is that implemented?
My current approach is parsing the response as XML and using XPath to look for the actions which are known at the time of client development (documented form classes, i.e. add, delete, update) and displaying the UI controls if they are available.
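To make that approach concrete, here is a minimal sketch in Python. It assumes the response is well-formed XHTML without an XML namespace, and the form classes, URLs and the discover_actions name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Form classes the client was built to understand (documented in advance).
KNOWN_ACTIONS = ("add", "update", "delete")

def discover_actions(xhtml_text):
    """Return {form_class: (method, action_url)} for each known form present."""
    root = ET.fromstring(xhtml_text)
    found = {}
    for name in KNOWN_ACTIONS:
        # ElementTree supports a limited XPath subset, enough for this lookup.
        form = root.find(f".//form[@class='{name}']")
        if form is not None:
            found[name] = (form.get("method", "get").lower(), form.get("action"))
    return found

# Hypothetical response body: the server offers "add" and "delete" only.
SAMPLE = """<div>
  <form action='/locations' method='post' class='add'></form>
  <form action='/locations/7' method='delete' class='delete'></form>
</div>"""

actions = discover_actions(SAMPLE)
# The UI would show only the controls whose form class was discovered.
visible_controls = sorted(actions)
```

Here no "update" control would be shown, because the server did not offer that form in this representation.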
Question 2: Am I wrong in my way of discovery? Or is this too much magic as far as the client is concerned ( knowing the form classes )? Wouldn't this assume that the client knows which actions are available for each resource ( which may be fine because it is sort of a reason for creating the client, right? ) and should the mapping of actions (form classes) to resources be documented, or just document the form classes and allow the client (and client developer) to research and discover them?
I know I'm everywhere with this, but any insight is much appreciated. I'll mark answered a response that answers any of these two questions well. Thanks!
No, you're pretty much spot on.
Browsers simply render the HTML payload and rely on the Human to actually interpret, find meaning, and potentially populate the forms appropriately.
Machine clients, so far, tend to do quite badly at the "interpret" part. So, instead developers have to make the decisions in advance and guide the machine client in excruciating detail.
Ideally, a "smart" HATEOAS client would have certain facts, and be aware of context so that it could better map those facts to the requirements of the service.
Because that's what we do, right? We see a form "Oh, they want name, address, credit card number". We know not only what "name", "address", and "credit card" number mean, we also can intuit that they mean MY name, or the name of the person on the credit card, or the name of the person being shipped to.
Machines fail pretty handily at the "intuit" part as well. So as a developer, you get to code in the logic of what you think may be necessary to determine the correct facts and how they are placed.
But, back to the ideal client, it would see each form, "know" what the fields wanted, consult its internal list of "facts", and then properly populate the payload for the request and finally make the request.
You can see that a trivial, and obviously brittle, way to do that is to simply map the parameter names to the internal data. When the parameter name is "name", you may hard code that to something like: firstName + " " + lastName. Or you may consider the actual rel to "know" they're talking about shipping, and use: shipTo.firstName + " " + shipTo.lastName.
Over time, ideally you could build up a collection of mappings and such so that if suddenly a payload introduced a new field, and it happened to be a field you already know about, you could fill that in as well "automatically" without change to the client.
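A rough sketch of such a mapping table, in Python. The facts, the "shipping" rel and every name here are hypothetical; the point is only that a known field gets filled in and an unknown one is skipped without touching the client code.

```python
# Hypothetical "facts" the client knows about its user.
FACTS = {
    "firstName": "Ada",
    "lastName": "Lovelace",
    "shipTo": {"firstName": "Charles", "lastName": "Babbage"},
}

def full_name(person):
    return person["firstName"] + " " + person["lastName"]

# (rel, parameter name) -> function producing the value; rel None is the default.
MAPPINGS = {
    (None, "name"): lambda facts: full_name(facts),
    ("shipping", "name"): lambda facts: full_name(facts["shipTo"]),
}

def fill(rel, field_names, facts=FACTS):
    """Fill in every requested field the client has a mapping for."""
    payload = {}
    for field in field_names:
        # Prefer a mapping specific to this rel, then fall back to the default.
        resolver = MAPPINGS.get((rel, field)) or MAPPINGS.get((None, field))
        if resolver is not None:
            payload[field] = resolver(facts)
    return payload
```

For example, fill("shipping", ["name", "giftWrap"]) yields the ship-to name and silently leaves out giftWrap, which this client knows nothing about.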
But the simple truth is that, while this can be done, it's pretty much not done. The semantics are usually way too vague, and you'd have to code in new "intuition" each time for each new payload anyway, so you may as well code to the payload directly and be done with it.
The key thing, though, especially about HATEOAS, is that you don't "force" your data on to the server. The server tells you what it wants, especially if they're giving you forms.
So the thought process is not "Oh, if I want a shipping invoice, I see that, right now, they want name, address and order number, and they want it url encoded, and they want it sent to http://example.com/shipping_invoice. so I'll just always send: name + "&" + address + "&" + orderNumber every time to http://example.com/shipping_invoice. Easy!".
Rather what you want to do is "I see they want a name, address, and order number. So what I'll do is for each request, I will read their form. I will check what fields they want each time. If they want name, I will give them name. If they want address, I will give them address. If they want order number, I will give them order number. And if they have any PRE-POPULATED fields (or even "hidden" fields), I will send those back too, and I will send it in the encoding they asked for, assuming I support it, to the URL I got from the action field of the FORM tag.".
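Here is one way that second thought process might look in code. This is a sketch only: it assumes the form arrives as well-formed XML, supports just URL encoding, and uses a made-up form and made-up field values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

URLENCODED = "application/x-www-form-urlencoded"

def build_request(form_xml, known_values):
    """Build (method, url, content_type, body) from the form as served right now."""
    form = ET.fromstring(form_xml)
    enctype = form.get("enctype", URLENCODED)
    if enctype != URLENCODED:
        # Only one encoding is supported in this sketch; fail loudly otherwise.
        raise ValueError(f"unsupported encoding: {enctype}")
    fields = []
    for inp in form.iter("input"):
        name = inp.get("name")
        if name is None:
            continue
        if inp.get("type") == "hidden":
            # Hidden fields always go back exactly as the server sent them.
            fields.append((name, inp.get("value", "")))
        elif name in known_values:
            fields.append((name, known_values[name]))
        elif inp.get("value") is not None:
            # Pre-populated default we have no opinion about: echo it back.
            fields.append((name, inp.get("value")))
    method = form.get("method", "get").upper()
    return method, form.get("action"), enctype, urlencode(fields)

# Hypothetical form: the server no longer asks for an address.
FORM = ("<form action='http://example.com/shipping_invoice' method='post'>"
        "<input type='text' name='name'/>"
        "<input type='text' name='orderNumber'/>"
        "<input type='hidden' name='token' value='abc123'/>"
        "<input type='text' name='priority' value='normal'/>"
        "</form>")

method, url, content_type, body = build_request(
    FORM, {"name": "Ada Lovelace", "address": "12 St James Sq", "orderNumber": "42"})
```

The client knows an address but does not send it, because this version of the form did not ask for one.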
You can see in the former case, you're ASSUMING that they want that payload every time. Just like if you were hard coding URLs. Whereas with the second, maybe they decided that the name and address are redundant, so they don't ask for it any more. Maybe they added some nice defaults for new functionality that you may not support yet. Maybe they changed the encoding to multi-part? Or changed the endpoint URL. Who knows.
You can only send what you know when you code the client, right? If they change things, then you can only do what you can do. If they add fields, hopefully they add fields that are not required. But if they break the interface, hey, they break the interface and you get to log an error. Not much you can do there.
But the more you leverage the HATEOAS features they make available to you (forms to fill out, following redirects properly, paying attention to encoding and media types), the more flexible your client becomes.
In the end, most folks simply don't do it in their clients. They hard code the heck out of them because it's simple, and they assume that the back end is not changing rapidly enough to matter, or that any downtime if such change does happen is acceptable until they correct the client. More typically, especially with internal systems, you'll simply get an email from the developers "hey were changing XYZ API, and it's going live on March 1st. Please update your clients and coordinate with the release team during integration testing. kthx".
That's just the reality. That doesn't mean you shouldn't do it, or that you shouldn't make your servers more friendly to smarter clients. Remember a bad client that assumes everything does not invalidate a good REST based system. These systems work just fine with awful clients. wget ftw, eh?
In my years specifying and designing REST APIs, I'm increasingly finding that it's very similar to designing a website where the user's journey and the actions and links are story-boarded and critical to the UX.
With my API designs currently, I return links in items and at the bottom of resources. They perform actions, mutate state or bring back other resources.
But it's as if each link opens in a new tab; the client explores down a new route and their next options may narrow as they go.
If this were a website, it wouldn't necessarily be a good design. The user would have to either open links in new tabs or back-up the stack all the time to get things done.
Good sites are forward only, or indeed have a way to indicate a branch off the main flow, i.e. links automatically opening in new windows (via anchor tag target).
So should a good REST API be designed as if the client discards the current resource and advances to the next and is always advancing forward?
Or do we assume the client is building a map as it goes, like um a Roomba exploring our living room?
The thing with the map concept is that knowing which of the many previously seen resources to return to is, for a sentient human, a guess. Computers are incapable of guessing, so it needs programming, and this implies out-of-band static documentation and breaks REST.
In my years specifying and designing REST APIs, I'm increasingly finding that it's very similar to designing a website
Yes - a good REST API looks a lot like a machine readable web site.
So should a good REST API be designed as if the client discards the current resource and advances to the next and is always advancing forward?
Sort of - the client is permitted to cache representations; so if you present a link, the client may "follow" the link to the cached representation rather than using the server.
That also means that the client may, at its discretion, "hit the back button" to go off and do something else (for example, if the link that it was hoping to find isn't present, it might try to achieve its goal another way). This is part of the motivation for the "stateless" constraint; the server doesn't have to pretend to know the client's currently displayed page to interpret a message.
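As a toy illustration of why the server never needs to know which "page" the client is on, here is a sketch of a client that keeps its own cache and history. The class and the fetch callable are assumptions, and a real cache would honour the HTTP caching headers rather than keep everything forever.

```python
class HypermediaClient:
    """Tiny navigation model: a cache of representations plus a history stack."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable(url) -> representation (assumed)
        self._cache = {}
        self._history = []

    def follow(self, url):
        # Serve from cache when possible; otherwise ask the server.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        self._history.append(url)
        return self._cache[url]

    def back(self):
        # Purely client-side: no request is made and the server is not told.
        if len(self._history) < 2:
            raise IndexError("no earlier representation to return to")
        self._history.pop()
        return self._cache[self._history[-1]]

# Stand-in for real HTTP so the sketch runs without a network.
calls = []
def fake_fetch(url):
    calls.append(url)
    return {"url": url}

client = HypermediaClient(fake_fetch)
client.follow("/")
client.follow("/orders")
previous = client.back()      # back to "/" without contacting the server
client.follow("/orders")      # answered from the cache
```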
Computers are incapable of guessing, so it needs programming, and this implies out-of-band static documentation and breaks REST.
Fielding, writing in 2008
Of course the client has prior knowledge. Every protocol, every media type definition, every URI scheme, and every link relationship type constitutes prior knowledge that the client must know (or learn) in order to make use of that knowledge. REST doesn’t eliminate the need for a clue. What REST does is concentrate that need for prior knowledge into readily standardizable forms. That is the essential distinction between data-oriented and control-oriented integration.
I found this nugget in Fielding's original work.
https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations. Not surprisingly, this exactly matches the user interface of a hypermedia browser. However, the style does not assume that all applications are browsers. In fact, the application details are hidden from the server by the generic connector interface, and thus a user agent could equally be an automated robot performing information retrieval for an indexing service, a personal agent looking for data that matches certain criteria, or a maintenance spider busy patrolling the information for broken references or modified content [39].
It reads like a great REST application would be built to be forward only, like a great website should be simple to use even without a back button, including advancing to a previously-seen representation (home and search links always available).
Interestingly we tend to really think about user journeys in web design, and the term journey is a common part of our developer language, but in API design this hasn't yet permeated.
Imagine I have a fully implemented REST API that offers HATEOAS as well.
Let's assume I browse the root and besides the self link two other links (e.g. one for /users and one for /orders) are returned. As far as I have heard, HATEOAS eliminates the need for out-of-band information. How should a client know what users means? Where are the semantics stored?
I know that is kind of a stupid question, but I really would like to know that.
Suppose you've just discovered Twitter and are using it for the very first time. In your Web browser you see a column of paragraphs with a bunch of links spread around the page. You know there's a way to do something with this, but you don't know specifically what actions are available. How do you figure out what they are?
Well, you look at the links and consider what their names mean. Some you recognize right away based on convention: As an experienced Web user, you have a pretty good idea what clicking on the "home", "search" and "sign out" links is meant to accomplish.
But other links have names you don't recognize. What does "retweet" do? What does that little star icon do?
There are basically two ways you, or anyone, will figure this out:
Through experimentation, which is to say, clicking on the links and seeing what happens, then deriving a meaning for each link from the results.
Through some source of out-of-band information, such as the online help, a tutorial found through a Google search or a friend sitting next to you explaining how the site works.
It's the same with REST APIs. (Recall that REST is intended to model the way the Web enables interaction with humans.)
Although in principle computers (or API-client developers) could deduce the semantics of link relations through experimentation, obviously this isn't practical. That leaves:
Convention, based on for instance IANA's list of standardized link relations and their meanings.
Out-of-band information, such as API documentation.
There is nothing inconsistent in the notion of REST requiring client developers to rely on something beyond the API itself to understand the meaning of link relations. This is standard practice for humans using websites, and humans using websites is what REST models.
What REST accomplishes is removing the need for out-of-band information regarding the mechanics of interacting with the API. Going back to the Twitter example, you probably had to have somebody explain to you at some point what, exactly, the "retweet" link does. But you didn't have to know the specific URL to type in to make the retweet happen, or the ID number of the tweet you wanted to act on, or even the fact that tweets have unique IDs. The Web's design meant all this complexity was taken care of for you once you figured out which link you wanted to click.
And so it is with REST APIs. It's true that in most cases, the computer or programmer will just need to be told what each link relation means. But once they have that information, they can navigate through the entire API without needing to know anything else about the details of how it's all put together.
REST doesn't eliminate the need for out-of-band information. You still have to document your media types. What REST eliminates is the need for out-of-band information about the client's interaction with the API's underlying protocol.
The semantics are documented by the media-type. Your API root is a resource of a media-type, let's say something like application/vnd.mycompany.dashboard.v1+json, and the documentation for that media type would explain that the link relation users leads to a collection of application/vnd.mycompany.user.v1+json related to the currently authenticated user, and orders leads to a collection of application/vnd.mycompany.order.v1+json.
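In client code, that documented knowledge might end up as a small lookup table. The sketch below assumes a JSON layout with a "links" array of rel/href pairs, which is an assumption of this example and not something the media type names above prescribe.

```python
import json

# What the (hypothetical) media type documentation tells the client developer.
DASHBOARD_RELS = {
    "users": "application/vnd.mycompany.user.v1+json",
    "orders": "application/vnd.mycompany.order.v1+json",
}

def link_for(dashboard_json, rel):
    """Return (href, expected media type) for a documented link relation."""
    if rel not in DASHBOARD_RELS:
        raise KeyError(f"undocumented link relation: {rel}")
    for link in json.loads(dashboard_json)["links"]:
        if link["rel"] == rel:
            return link["href"], DASHBOARD_RELS[rel]
    return None  # documented, but not offered in this representation

# Hypothetical root document; note that the URLs are the server's business.
ROOT = ('{"links": [{"rel": "self", "href": "/"}, '
        '{"rel": "users", "href": "/people"}, '
        '{"rel": "orders", "href": "/orders"}]}')
```

The client hard codes the meaning of users and orders, but never the URLs behind them: here users happens to live at /people.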
The library analogy works here. When you enter a library after a book, you know how to read a book, you know how to walk to a bookshelf and pick up the book, and you know how to ask the librarian for directions. Each library may have a different layout and bookshelves may be organized differently, but as long as you know what you're looking for and you and the librarian speak the same language, you can find it. However, it's too much to expect the librarian to teach you what a book is.
I'm writing a web application for public consumption... How do you get over or deal with the fear of user input? As a web developer, you know the tricks and holes that can be exploited, particularly on the web, and which are made all the easier by add-ons like Firebug.
Sometimes it's so overwhelming you just want to forget the whole deal (does make you appreciate Intranet Development though!)
Sorry if this isn't a question that can be answered simply, but perhaps ideas or strategies that are helpful...Thanks!
One word: server-side validation (ok, that may have been three words).
There's lots of sound advice in other answers, but I'll add a less "programming" answer:
Have a plan for dealing with it.
Be ready for the contingency that malicious users do manage to sneak something past you. Have plans in place to mitigate damage, restore clean and complete data, and communicate with users (and potentially other interested parties such as the issuers of any credit card details you hold) to tell them what's going on. Know how you will detect the breach and close it. Know that key operational and development personnel are reachable, so that a bad guy striking at 5:01pm on the Friday before a public holiday won't get 72+ clear hours before you can go offline let alone start fixing things.
Having plans in place won't help you stop bad user input, but it should help a bit with overcoming your fears.
If it's "security" related concerns, you just need to push through it. Security and exploits are a fact of life in software, and they need to be addressed head-on as part of the development process.
Here are some suggestions:
Keep it in perspective - Security, Exploits and compromises are going to happen to any application which is popular or useful, be prepared for them and expect them to occur
Test it, then test it again - QA, Acceptance testing and sign off should be first class parts of your design and production process, even if you are a one-man shop. Enlist users to test as a dedicated (and vocal) user will be your most useful tool in finding problems
Know your platform - Make sure you know the technology, and hardware you are deploying on. Ensure that relevant patches and security updates are applied
Research - look at applications similar to your own and see what issues they experience, surf their forums, read their bug logs etc.
Be realistic - You are not going to be able to fix every bug and close every hole. Pick the most impactful ones and address those
Lots of eyes - Enlist as many people to review your designs and code as possible. This should be in addition to your QA resources
You don't get over it.
Check everything at server side - validate input again, check permissions, etc.
Sanitize all data.
That's very easy to write in bold letters and a little harder to do in practice.
Something I always did was wrap all user strings in an object, something like StringWrapper which forces you to call an encoding method to get the string. In other words, just provide access to s.htmlEncode() s.urlEncode().htmlEncode() etc. Of course you need to get the raw string so you can have a s.rawString() method, but now you have something you can grep for to review all uses of raw strings.
So when you come to 'echo userString' you will get a type error, and you are then reminded to encode/escape the string through the public methods.
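A minimal Python version of that idea might look like the following. Python has no compile-time type check here, so the "type error" shows up at runtime; the method names are adapted to Python style, and the chaining of encodings mentioned above is left out of this sketch.

```python
import html
from urllib.parse import quote

class StringWrapper:
    """Holds untrusted text; callers must pick an encoding to get a str out."""

    def __init__(self, raw):
        self._raw = raw

    def html_encode(self):
        return html.escape(self._raw, quote=True)

    def url_encode(self):
        return quote(self._raw, safe="")

    def raw_string(self):
        # Deliberately distinctive name so reviews can grep for raw uses.
        return self._raw

    def __str__(self):
        # Printing or formatting the wrapper directly is refused.
        raise TypeError("choose html_encode(), url_encode() or raw_string()")
```

With this in place, print(user_string) fails loudly, while print(user_string.html_encode()) works, and every raw_string() call is easy to find in a review.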
Some other general things:
Prefer white-lists over black lists
Don't go overboard with stripping out bad input. I want to be able to use the < character in posts/comments/etc! Just make sure you encode data correctly
Use parameterized SQL queries. If you are SQL escaping user input yourself, you are doing it wrong.
First, I'll try to comfort you a bit by pointing out that it's good to be paranoid. Just as it's good to be a little scared while driving, it's good to be afraid of user input. Assume the worst as much as you can, and you won't be disappointed.
Second, program defensively. Assume any communication you have with the outside world is entirely compromised. Take in only parameters that the user should be able to control. Expose only that data that the user should be able to see.
Sanitize input. Sanitize sanitize sanitize. If it's input that will be displayed on the site (nicknames for a leaderboard, messages on a forum, anything), sanitize it appropriately. If it's input that might be sent to SQL, sanitize that too. In fact, don't even write SQL directly, use an intermediary of some sort.
There's really only one thing you can't defend against if you're using plain HTTP. If you use a cookie to identify somebody, there's nothing you can do to prevent somebody else in a coffeehouse from sniffing that cookie if they're both using the same wireless connection. As long as they're not using a secure connection, nothing can save you from that. Even Gmail isn't safe from that attack. The only thing you can do is make sure an authorization cookie can't last forever, and consider making them re-login before they do something big like change password or buy something.
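Those last two measures could be sketched like this. The session dictionary, its field names and both time limits are assumptions picked for the example, not recommendations.

```python
import time

MAX_SESSION_AGE = 8 * 60 * 60     # seconds a login cookie stays valid (assumed)
RECENT_AUTH_WINDOW = 5 * 60       # how fresh a login must be for sensitive actions

def session_is_valid(session, now=None):
    """A session expires a fixed time after it was created."""
    now = time.time() if now is None else now
    return now - session["created_at"] < MAX_SESSION_AGE

def may_do_sensitive_action(session, now=None):
    """Require a valid session and a recent password check."""
    now = time.time() if now is None else now
    return (session_is_valid(session, now)
            and now - session["last_login_at"] < RECENT_AUTH_WINDOW)
```

A stolen cookie is then useful only until it expires, and not useful at all for changing the password unless the thief also knows it.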
But don't sweat it. A lot of the security details have been taken care of by whatever system you're building on top of (you ARE building on top of SOMETHING, aren't you? Spring MVC? Rails? Struts? ). It's really not that tough. If there's big money at stake, you can pay a security auditing company to try and break it. If there's not, just try to think of everything reasonable and fix holes when they're found.
But don't stop being paranoid. They're always out to get you. That's just part of being popular.
P.S. One more hint. If you have javascript like this:
if( document.forms["myForm"]["payment"].value < 0 ) {
    alert("You must enter a positive number!");
    return false;
}
Then you'd better make sure you have code in the backend that goes:
verify( input.payment >= 0 )
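Spelled out a little more, the backend check might look like this in Python. The function name and error messages are made up, and the input is assumed to arrive as a raw string from the request.

```python
from decimal import Decimal, InvalidOperation

def parse_payment(raw):
    """Validate a payment amount on the server, whatever the browser claimed."""
    try:
        amount = Decimal(str(raw).strip())
    except InvalidOperation:
        raise ValueError("payment must be a number") from None
    # Reject NaN and infinities first, then negative amounts.
    if not amount.is_finite() or amount < 0:
        raise ValueError("payment must be a non-negative number")
    return amount
```

The JavaScript check stays as a convenience for honest users; this one is the check that actually counts.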
"Quote" everything so that it can not have any meaning in the 'target' language: SQL, HTML, JavaScript, etc.
This will get in the way of course, so you have to be careful to identify when this needs special handling, like through administrative privileges to deal with some of the data.
There are multiple types of injection and cross-site scripting (see this earlier answer), but there are defenses against all of them. You'll clearly want to look at stored procedures, white-listing (e.g. for HTML input), and validation, to start.
Beyond that, it's hard to give general advice. Other people have given some good tips, such as always doing server-side validation and researching past attacks.
Be vigilant, but not afraid.
No validation in web-application layer.
All validations and security checks should be done by the domain layer or business layer.
Throw exceptions with valid error messages and let these exceptions be caught and processed at the presentation layer or web application.
You can use a validation framework to automate validations with the help of custom validation attributes.
http://imar.spaanjaars.com/QuickDocId.aspx?quickdoc=477
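The layering itself, without any framework, can be sketched in a few lines of Python. The Account class, its rules and the status codes are invented for the example, and this hand-rolled version does not use validation attributes.

```python
class ValidationError(Exception):
    """Raised by the domain layer with a message safe to show to users."""

class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        # The rule lives with the domain object, not in the web layer.
        if not isinstance(amount, int) or amount <= 0:
            raise ValidationError("Amount must be a positive whole number.")
        if amount > self.balance:
            raise ValidationError("Insufficient funds.")
        self.balance -= amount

def handle_withdraw_request(account, form):
    """Presentation layer: pass input along and translate domain errors."""
    try:
        account.withdraw(form.get("amount"))
    except ValidationError as err:
        return 400, str(err)
    return 200, "OK"
```

Any other entry point (a batch job, a second UI) gets the same checks for free, because they are not tied to the web layer.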
There should be some documentation of known exploits for the language/system you're using. I know the Zend PHP Certification covers that issue a bit and you can read the study guide.
Why not hire an expert to audit your applications from time to time? It's a worthwhile investment considering your level of concern.
Our client always says: "Deal with my users as if they don't differentiate between the date and text fields!!"
I code in Java, and my code is full of asserts. I assume everything from the client is wrong and I check it all at the server.
#1 thing for me is to always construct static SQL queries and pass your data as parameters. This limits the quoting issues you have to deal with enormously. See also http://xkcd.com/327/
This also has performance benefits, as you can re-use the prepared queries.
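For illustration, here is what that looks like with Python's built-in sqlite3 module. The table and the hostile input are made up, with a nod to the comic linked above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# The SQL text is a constant; user data travels separately as a parameter.
INSERT_STUDENT = "INSERT INTO students (name) VALUES (?)"
FIND_STUDENT = "SELECT name FROM students WHERE name = ?"

hostile = "Robert'); DROP TABLE students;--"
conn.execute(INSERT_STUDENT, (hostile,))

# The hostile text is stored and matched as plain data, never run as SQL.
rows = conn.execute(FIND_STUDENT, (hostile,)).fetchall()
table_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

No quoting or escaping code appears anywhere, which is the point: the driver keeps the statement and the data apart.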
There are actually only 2 things you need to take care with:
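Avoid SQL injection. Use parameterized queries to save user-controlled input in the database. In Java terms: use PreparedStatement. In PHP terms: use PDO or mysqli prepared statements (mysql_real_escape_string() only escapes, and the mysql extension it belongs to was removed in PHP 7).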
Avoid XSS. Escape user-controlled input during display. In Java/JSP terms: use JSTL <c:out>. In PHP terms: use htmlspecialchars().
That's all. You don't need to worry about the format of the data. Just about the way how you handle it.
Yes, I realize this question was asked and answered, but I have specific questions about this that I feel were not clear on that thread and I'd prefer not to get lost in the shuffle on another thread as well.
Previous threads said that rendering the email address to an image the way Facebook does is overkill and unprofessional user experience for business/professional websites. And it seems that the general consensus is to use a JavaScript document.write solution using html entities or some other method that breaks up and/or makes the string unreadable by a simple bot. The application I'm building doesn't even need the "mailto:" functionality, I just need to display the email address. Also, this is a business web application, so it needs to look/act as professional as possible. Here are my questions:
If I go the document.write route and pass the html entity version of each character, are there no web crawlers sophisticated enough to execute the javascript and pull the rendered text anyway? Or is this considered best practice and completely (or almost completely) spammer proof?
What's so unprofessional about the image solution? If Facebook is one of the highest trafficked applications in the world and not at all run by amateurs, why is their method completely dismissed in the other thread about this subject?
If your answer (as in the other thread) is to not bother myself with this issue and let the users' spam filters do all the work, please explain why you feel this way. We are displaying our users' email addresses that they have given us, and I feel responsible to protect them as much as I can. If you feel this is unnecessary, please explain why.
Thanks.
It is not spammer proof. If someone looks at the code for your site and determines the pattern that you are using for your email addresses, then specific code can be written to try and decipher that.
I don't know that I would say it is unprofessional, but it prevents copy-and-paste functionality, which is quite a big deal. With images, you simply don't get that functionality. What if you want to copy a relatively complex email address to your address book in Outlook? You have to resort to typing it out which is prone to error.
Moving the responsibility to the users' spam filters is really a poor response. While I believe that users should be diligent in guarding against spam, that doesn't absolve the person publishing the address from responsibility.
To that end, trying to do this in an absolutely secure manner is nearly impossible. The only way to do that is to have a shared secret which the code uses to decipher the encoded email address. The problem with this is that because the javascript is interpreted on the client side, there isn't anything that you can keep a secret from scrapers.
Encoders for email addresses nowadays generally work because most email bot harvesters aren't going to concern themselves with coding specifically for every site. They are going to try and have a minimal algorithm which will get maximum results (the payoff isn't worth it otherwise). Because of this, simple encoders will defeat most bots. But if someone REALLY wants to get at the emails on your site, then they can, and probably easily as well, since the code that writes the addresses is publicly available.
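Both halves of that argument fit in a few lines of Python. The regex below is a deliberately naive stand-in for a harvester's pattern, and the address is made up.

```python
import html
import re

# A deliberately simple pattern of the kind a bulk harvester might use.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def to_entities(address):
    """Encode every character as a decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)

page = "<p>Contact: " + to_entities("joe@example.com") + "</p>"

# A regex run over the raw markup finds nothing, since even "@" is encoded.
naive_hits = EMAIL_RE.findall(page)
# A scraper that decodes entities first needs only one extra step.
decoded_hits = EMAIL_RE.findall(html.unescape(page))
```

So the encoding stops the lazy bot and does nothing against one that bothers to decode, which matches the trade-off described above.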
Taking all this into consideration, it makes sense that Facebook went the image route. Because they can alter the image to make OCR all but impossible, they can virtually guarantee that email addresses won't be harvested. Given that they are probably one of the largest email address repositories in the world, it could be argued that they carry a heavier burden than any of us, and while inconvenient, are forced down that route to ensure security and privacy for their vast user base.
Quite a few reasons Javascript is a good solution for now (that may change as the landscape evolves).
Javascript obfuscation is a better mouse trap for now
You just need to outrun the others. As long as there are low hanging fruit, spammers will go for those. So unless everyone starts moving to javascript, you're okay for now at least
Most spammers use HTTP-based scripts which GET and parse using regex. Using a JavaScript engine to parse is certainly possible but will slow things down.
Regarding the facebook solution, I don't consider it unprofessional but I can clearly see why purists may disagree.
It breaks accessibility standards (it cannot be parsed by browsers or voice readers, and it cannot be clicked).
It breaks semantic construct (it's an image, not a mailto link anymore)
It breaks the presentational layer. If you increase browser default font size or use high contrast custom CSS, it won't apply to the email.
Here is a nice blog post comparing a few methods, with benchmarks.
http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/
What is the benefit of Connectedness as defined by Resource Oriented Architecture (ROA)? The way I understand it, the crux of Connectedness is the ability to crawl the entire application state using only the root URIs.
But how useful is that really?
For example, imagine that HTTP GET http://example.com/users/joe returns a link to http://example.com/users/joe/bookmarks.
Unless you're writing a dumb web crawler (and even then I wonder), you still need to teach the client what each link means at compile-time. That is, the client needs to know that the "bookmarks URI" returns a URI to Bookmark resources, and then pass control over to special Bookmark-handling algorithms. You can't just pass links blindly to some general client method. Since you need this logic anyway:
What's the difference between the client figuring out the URI at runtime versus providing it at compile-time (making http://example.com/users/bookmarks a root URI)?
Why is linking using http://example.com/users/joe/bookmarks/2 preferred to id="2"?
The only benefit I can think of is the ability to change the path of non-root URIs over time, but this breaks cached links so it's not really desirable anyway. What am I missing?
You are right that changing URIs is not desirable, but it does happen, and using complete URIs instead of constructing them makes change easier to deal with.
One other benefit is that your client application can easily retrieve resources from multiple hosts. If you allowed your client to build the URIs, the client would need to know on which host certain resources reside. This is not a big deal when all of the resources live on a single host but it becomes more tricky when you are aggregating data from multiple hosts.
My last thought is that maybe you are oversimplifying the notion of connectedness by looking at it as a static network of links. Sure the client needs to know about the possible existence of certain links within a resource but it does not necessarily need to know exactly what are the consequences of following that link.
Let me try to give an example: A user is placing an order for some items and they are ready to submit their cart. The submit link may actually go to two different places depending on whether the order will be delivered locally or internationally. Maybe orders over a certain value need to go through an extra step. The client just knows that it has to follow the submit link, but it does not have compiled-in knowledge of where to go next. Sure, you could build a common "next step" type of resource so the client could have this knowledge explicitly, but by having the server deliver the link dynamically you introduce a lot less client-server coupling.
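A sketch of both sides of that example in Python. The URLs, the 1000 threshold, the home country and the JSON-like shape of the representation are all invented for illustration.

```python
def cart_representation(total, destination_country, home_country="US"):
    """Server side: decide where 'submit' leads for this particular cart."""
    if destination_country != home_country:
        href = "http://example.com/orders/international"
    elif total > 1000:
        href = "http://example.com/orders/review"
    else:
        href = "http://example.com/orders"
    return {"total": total, "links": [{"rel": "submit", "href": href}]}

def submit_url(representation):
    """Client side: the only compiled-in knowledge is the name 'submit'."""
    for link in representation["links"]:
        if link["rel"] == "submit":
            return link["href"]
    raise LookupError("no submit link offered")
```

The server can add a fourth destination, or move one to another host, and submit_url keeps working unchanged.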
I think of the links in resources as placeholders for what the user could choose to do. Who will do the work and how it will be done is determined by what uri the server attaches to that link.
It's easier to extend, and you could write small apps and scripts to work along with the core application fairly easily.
Added: Well the whole point starts with the idea that you don't specify at compile-time how to convert URIs to uids in a hardcoded fashion, instead you might use dictionaries or parsing to do that, giving you a much more flexible system.
Then later on, say someone else decides to change the URI syntax: that person could write a small script that translates URIs without ever touching your core application. Another benefit is that if your URIs are logical, other users, even within a corporate scenario, can easily write mash-ups to make use of your system, without touching your original app or even recompiling it.
Of course the counter side to the whole argument is that it'll take you longer to implement a URI-based system than a simple UID system. But if your app will be used by others in a regular fashion, that initial time investment will pay off handsomely (it could be said to have a good extensibility-based ROI).
Added: And another point, which is a matter of taste to some degree, is that the URI itself will be a better name, because it conveys a logical and defined meaning.
I'll add my own answer:
It is far easier to follow server-provided URIs than construct them yourself. This is especially true as resource relationships become too complex to be expressed in simple rules. It's easier to code the logic once in the server than re-implement it in numerous clients.
The relationship between resources may change even if individual resource URIs remain unchanged. For example, imagine Google Maps indexes their map tiles from 0 to 100, counting from the top-left to the bottom-right of the screen. If Google Maps were to change the scale of their tiles, clients that calculate relative tile indexes would break.
Custom IDs identify a resource. URIs go a step further by identifying how to retrieve the resource representation. This simplifies the logic of read-only clients such as web-crawlers or clients that download opaque resources such as video or audio files.