Graceful Degradation with REST in CakePHP - rest

Alright, so a better title here may have been "Progressive Enhancement with REST in CakePHP", but at least now I'll know you didn't read the question if your answer just refers to the difference between the two ;)
I'm pretty familiar with REST and how to integrate it with CakePHP, but I'm not 100% on board with how to still maintain a conventionally functioning website. Using Router::mapResources sounds like a great idea, but this creates a problem with maintaining the "gracefully degradation" version of the site, because both POST requests to /resource/ AND GET requests for /resource/add will route to the same action (add). Clearly I'll want this action to return a JSON object if they're using the REST api, but if they're using the degraded version of the site (no JS perhaps), it should be a add form, right?
What's the best way to deal with this. Do you route your REST requests to other action names using Router::resourceMap()? Do you do that crazy hack I saw to have the /api/ prefix part of the resourceMap so you can use api_action functions? Do you have the actions handle both REST and conventional requests via checking isAjax()? If so, how do you ensure that you can rely on the browser to properly support the other two request types?
I've searched around quite a bit but haven't found anything about how to keep conventional requests available in Cake along side REST, so if anyone has any advice or experience, I'd love to hear it!

CakePHP uses extension routing as well, via Router::parseExtension() so;
/test/action will render views/test/action.ctp
/test/action.html also
/test/action.json will render views/test/json/action.ctp
/test/action.xml will render views/test/xml/action.ctp
If all views are designed to handle the same data as set by your controller, you'll be able to show a regular HTML form and handle the posted data the same way as you'd handle the AJAX request.
You'll probably might have to add checks if any data is posted/submitted inside the /add, /edit, /delete actions to prevent items being deleted without a form being posted (haven't tested that though, it might be that cake blocks these urls if mapresources is set for the controller)
REST in CakePHP:
http://book.cakephp.org/2.0/en/development/rest.html
(Extension) Routing
http://book.cakephp.org/2.0/en/development/routing.html#file-extensions

Related

Deal with huge forms in Spring

I hope you can help me. I've tried to look for a solution to this problem or for a similar question here in StackOverflow but couldn't find any, so here it is.
We must develop a feature in which we will have a multi-page form. After filling all the pages of the form, the user will submit it. The problem is that the final submit will send many parameters (around 500), and we're afraid we may encounter problems with request size in many cases.
An initial approach would be having an object in session, which would be partially filled when the user navigates through the pages. I.e. when the user fills the fields in page 1, the object in session is partially filled with that data, and so on. That way, we wouldn't have to pass all the request parameters in every step and the final submit wouldn't have to send so many data. But we don't want to use this approach because we don't want to use the session to store data that are specific to a single functionality or bunch of pages.
Another approach would be saving data to a database after the user fills each page of the form, and retrieving it after the final submit so we can deal with the whole thing. Maybe we could do this, but it would delay the development of the project since it's not a trivial task.
I wonder if there's a better approach to handle this. Maybe using #Cacheable in some intelligent way, maybe using Spring WebFlow (which I've never worked with), maybe other alternatives I can't think of. Is there any strategy or technology I could use for this? Currently we are working with Spring 3.2. We are using jQuery as well, just in case it's relevant.
Thank you.
Writing as answer as I would not fit into comment:
There is no limit to request body size for POST requests. Only GET requests are limited (i.e. when parameters are sent via query parameters). No need to worry here.
I don't understand why you don't want to use session (#SessionAttributes). Having multi-step forms is one of the use-case this was designed for I would say.
Storing incomplete model objects in database is also a good approach as it is very close to REST principles. We have used this multiple times in our company.
Spring WebFlow is also a good approach if you don't want to handle all the transitional logic yourself. However SWF is not that simple technology to learn and you should include that fact in your effort estimations.
There is another approach, which I would say is becoming more and more popular: doing all the logic dynamically on a single webpage (e.g. via AngularJS or some jQuery plugin) and submit the result as a JSON object.
There is no definitive answer to your question without being very specific about your use-case and your application. And even with exhaustive description it is question about personal preference.
The single dynamic page approach (e.g. AngularJS) would be good if your overall application architecture is going to be designed that way.
Spring WebFlow would be nice if you are familiar with that technology or if you are planning on having more multi-step forms throughout the application (i.e. I would not go for SWF if I need to solve just one use-case with it).
I would probably go for #SessionAttributes if I need to quickly solve a single multi-step form. There are some complexities connected to that (partial validation and partial binding namely)... so again this might not be the simplest approach in the end.
Spring Webflow would handle your use case nicely through its flowScope.
Anyway, I you don't want to go through the pain of integrating its infrastructure only for that, the session attribute you mentioned will work perfectly and it's a correct approach. Just make sure you remove it when it's not neccesary anymore to prevent memory leaks.

Scraping WebObjects website & REST

I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not urls). This means that all urls look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Does urls like this not completely destroy local and shared caching opportunities (cachable constraint in REST)? I imaging the only effective caching with such urls is the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore (I think) that WebObjects also makes too old URLs invalid since they "time-out" after a period of time. I'm not sure whether this applies only to urls with sessions though.
Regarding the scraping I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST urls, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible what would be a good approach then? The only options I can think of is:
Using a full-blown browser client to interact with the website (e.g. WatiR or Selenium) and extract & process the HTML from their responses
Manually extracting the dynamic end-points by first request the page where they are on and then find the place in the HTML where they're located. Then use them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Does urls like this not completely destroy local and shared caching
opportunities (cachable constraint in REST)? I imaging the only
effective caching with such urls is the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore (I think) that
WebObjects also makes too old URLs invalid since they "time-out" after
a period of time. I'm not sure whether this applies only to urls with
sessions though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST urls, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
A little late, but could help.
I use the Apache's mod_ext_filter (little modified) to pre/post filter the requests/responses from our WebObjects application. The filter calls PHP scripts and can read the dynamical hyperrefs and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programatically add/remove parameters from the request to implement new workflows in front of the legacy app and cleanup the requests before they will reach WebObjects. It is also possible to handle an additional database within the scripts and store some things over multiple requests.
So you can get the dynamically created links (maybe a button's name or HTML form destination) and can recognize these names within the request.
It is also possible to "remote control" such applications with little scripts like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions which the browser would do (i.e. create the HTTP request manually and send it as POST to the extracted form destination href). The only problem is the Javascript code, which we analyze and reprogram within PHP (i.e. enable/disable input elements, so they will not be transmitted within the requests)
There were some problems within the WebObjects Adapter Module for Apache. It still uses Content-Length within the HTTP header, which you cannot change in mod_ext_filter. If you change the HTML or the parameters within the request, the length of the content will not longer match. But it is possible to change that.
Theoretically it could also be possible to control such an closed-source legacy application from a new UI on a tablet or smartphone, which delegates the user interaction to the backend WebObjects app.
The scripts depends on the page structure, so if your WebObjects app will be changed, you have to correct some things in the scripts (i.e. third button could be now the fourth button).
It should also be possible to add a Restful interface in front of the application and query the data from the legacy app by the filter scripts.

Backbone with non-Restful services

I am looking at backbone for a project. I have many legacy services which are non-restful that I have no choice but to use them as is. I see that I have to override Backbone.Model.sync, parse and many other methods and handle the ajax service calls. I am not sure how the routing is going to work but I can see that there will be a lot of extra code to make that work. My question is: Is Backbone really recommended if I have to work with non-restful services? I don't find any examples or discussion anywhere online which talks about it.
Backbone's automatic understanding of REST conventions boils down to probably about 50 lines of code. If your back-end APIs are all odd and unique, yes, you'll need to write code to talk to them, but you'll need that no matter which framework you use because no framework is going to understand the unique weirdnesses of your back end services. If you feel good about the basic MVC with event bindings design of backbone, stick with it. That's the core of it. And it's a tiny core, that's why it's called backbone.
As per routing, that's really handled in the browser as a single page app and the browser URL routing and associated backbone router/view code is entirely separate from the API patterns and URLs that provide the back end services. The two can be utterly unrelated and that's fine. You'll still be able to define your own browser routing however you see fit.

How to add edit-layer as Plack-middleware?

I have an idea to add a edit-layer to website as a Plack middleware.
Explanation: let's say, we create a website, based on some framework and templates and CSS (requesting it like /some/page). Now we could create a middleware so that every request to pages starting with adm (like /adm/some/page) shows the same page, but adds a layer for content editing. So we could easily look and use the page as visitors do, but with double-click on block-level element we could modify or add content. So middleware should bind certain block-elements with certain events (double-click) and set handlers too (with some Javascript library).
For now it is just an idea and i have not seen such approach in any CMS. I am looking for hints and ideas and examples, how to start and implement such system. I hope, there is already done something like that.
You could do it, but I don't think you want to do this. My understanding is that Plack::Middleware's are supposed to be generic, and implementing a CMS as a plack middleware limits its re-usability, and its out of place, there is no inherent connection between a middleware and a CMS.
See these as examples Plack::Middleware::OAuth, Plack::Middleware::Debug, Plack::Middleware::iPhone, Plack::Middleware::Image::Scale, Plack::Middleware::HTMLMinify
It would be trivial to add a middleware filter to insert a form in your html based on /adm/ or /admin/ or whatever ... and mapping the url to the dispatch would highly depend on the underlying CMS model/view/controller framework, which is why frameworks such as Catalyst, Mojolicious and other already provide this feature
See http://advent.plackperl.org/2009/12/day-23-write-your-own-middleware.html
Basically, I think this is a job for a view/controller of your application, a plugin, not a wrapper for your application (middleware)
I know my explanation is lacking but hopefully you catch my drift

What is the advantage of using REST instead of non-REST HTTP?

Apparently, REST is just a set of conventions about how to use HTTP. I wonder which advantage these conventions provide. Does anyone know?
I don't think you will get a good answer to this, partly because nobody really agrees on what REST is. The wikipedia page is heavy on buzzwords and light on explanation. The discussion page is worth a skim just to see how much people disagree on this. As far as I can tell however, REST means this:
Instead of having randomly named setter and getter URLs and using GET for all the getters and POST for all the setters, we try to have the URLs identify resources, and then use the HTTP actions GET, POST, PUT and DELETE to do stuff to them. So instead of
GET /get_article?id=1
POST /delete_article id=1
You would do
GET /articles/1/
DELETE /articles/1/
And then POST and PUT correspond to "create" and "update" operations (but nobody agrees which way round).
I think the caching arguments are wrong, because query strings are generally cached, and besides you don't really need to use them. For example django makes something like this very easy, and I wouldn't say it was REST:
GET /get_article/1/
POST /delete_article/ id=1
Or even just include the verb in the URL:
GET /read/article/1/
POST /delete/article/1/
POST /update/article/1/
POST /create/article/
In that case GET means something without side-effects, and POST means something that changes data on the server. I think this is perhaps a bit clearer and easier, especially as you can avoid the whole PUT-vs-POST thing. Plus you can add more verbs if you want to, so you aren't artificially bound to what HTTP offers. For example:
POST /hide/article/1/
POST /show/article/1/
(Or whatever, it's hard to think of examples until they happen!)
So in conclusion, there are only two advantages I can see:
Your web API may be cleaner and easier to understand / discover.
When synchronising data with a website, it is probably easier to use REST because you can just say synchronize("/articles/1/") or whatever. This depends heavily on your code.
However I think there are some pretty big disadvantages:
Not all actions easily map to CRUD (create, read/retrieve, update, delete). You may not even be dealing with object type resources.
It's extra effort for dubious benefits.
Confusion as to which way round PUT and POST are. In English they mean similar things ("I'm going to put/post a notice on the wall.").
So in conclusion I would say: unless you really want to go to the extra effort, or if your service maps really well to CRUD operations, save REST for the second version of your API.
I just came across another problem with REST: It's not easy to do more than one thing in one request or specify which parts of a compound object you want to get. This is especially important on mobile where round-trip-time can be significant and connections are unreliable. For example, suppose you are getting posts on a facebook timeline. The "pure" REST way would be something like
GET /timeline_posts // Returns a list of post IDs.
GET /timeline_posts/1/ // Returns a list of message IDs in the post.
GET /timeline_posts/2/
GET /timeline_posts/3/
GET /message/10/
GET /message/11/
....
Which is kind of ridiculous. Facebook's API is pretty great IMO, so let's see what they do:
By default, most object properties are returned when you make a query.
You can choose the fields (or connections) you want returned with the
"fields" query parameter. For example, this URL will only return the
id, name, and picture of Ben:
https://graph.facebook.com/bgolub?fields=id,name,picture
I have no idea how you'd do something like that with REST, and if you did whether it would still count as REST. I would certainly ignore anyone who tries to tell you that you shouldn't do that though (especially if the reason is "because it isn't REST")!
Simply put, REST means using HTTP the way it's meant to be.
Have a look at Roy Fielding's dissertation about REST. I think that every person that is doing web development should read it.
As a note, Roy Fielding is one of the key drivers behind the HTTP protocol, as well.
To name some of the advandages:
Simple.
You can make good use of HTTP cache and proxy server to help you handle high load.
It helps you organize even a very complex application into simple resources.
It makes it easy for new clients to use your application, even if you haven't designed it specifically for them (probably, because they weren't around when you created your app).
Simply put: NONE.
Feel free to downvote, but I still think there are no real benefits over non-REST HTTP. All current answers are invalid. Arguments from the currently most voted answer:
Simple.
You can make good use of HTTP cache and proxy server to help you handle high load.
It helps you organize even a very complex application into simple resources.
It makes it easy for new clients to use your application, even if you haven't designed it specifically for them (probably, because they weren't around when you created your app).
1. Simple
With REST you need additional communication layer for your server-side and client-side scripts => it's actually more complicated than use of non-REST HTTP.
2. Caching
Caching can be controlled by HTTP headers sent by server. REST does not add any features missing in non-REST.
3. Organization
REST does not help you organize things. It forces you to use API supported by server-side library you are using. You can organize your application the same way (or better) when you are using non-REST approach. E.g. see Model-View-Controller or MVC routing.
4. Easy to use/implement
Not true at all. It all depends on how well you organize and document your application. REST will not magically make your application better.
IMHO the biggest advantage that REST enables is that of reducing client/server coupling. It is much easier to evolve a REST interface over time without breaking existing clients.
Discoverability
Each resource has references to other resources, either in hierarchy or links, so it's easy to browse around. This is an advantage to the human developing the client, saving he/she from constantly consulting the docs, and offering suggestions. It also means the server can change resource names unilaterally (as long as the client software doesn't hardcode the URLs).
Compatibility with other tools
You can CURL your way into any part of the API or use the web browser to navigate resources. Makes debugging and testing integration much easier.
Standardized Verb Names
Allows you to specify actions without having to hunt the correct wording. Imagine if OOP getters and setters weren't standardized, and some people used retrieve and define instead. You would have to memorize the correct verb for each individual access point. Knowing there's only a handful of verbs available counters that problem.
Standardized Status
If you GET a resource that doesn't exist, you can be sure to get a 404 error in a RESTful API. Contrast it with a non-RESTful API, which may return {error: "Not found"} wrapped in God knows how many layers. If you need the extra space to write a message to the developer on the other side, you can always use the body of the response.
Example
Imagine two APIs with the same functionality, one following REST and the other not. Now imagine the following clients for those APIs:
RESTful:
GET /products/1052/reviews
POST /products/1052/reviews "5 stars"
DELETE /products/1052/reviews/10
GET /products/1052/reviews/10
HTTP:
GET /reviews?product_id=1052
POST /post_review?product_id=1052 "5 stars"
POST /remove_review?product_id=1052&review_id=10
GET /reviews?product_id=1052&review=10
Now think of the following questions:
If the first call of each client worked, how sure can you be the rest will work too?
There was a major update to the API that may or may not have changed those access points. How much of the docs will you have to re-read?
Can you predict the return of the last query?
You have to edit the review posted (before deleting it). Can you do so without checking the docs?
I recommend taking a look at Ryan Tomayko's How I Explained REST to My Wife
Third party edit
Excerpt from the waybackmaschine link:
How about an example. You’re a teacher and want to manage students:
what classes they’re in,
what grades they’re getting,
emergency contacts,
information about the books you teach out of, etc.
If the systems are web-based, then there’s probably a URL for each of the nouns involved here: student, teacher, class, book, room, etc. ... If there were a machine readable representation for each URL, then it would be trivial to latch new tools onto the system because all of that information would be consumable in a standard way. ... you could build a country-wide system that was able to talk to each of the individual school systems to collect testing scores.
Each of the systems would get information from each other using a simple HTTP GET. If one system needs to add something to another system, it would use an HTTP POST. If a system wants to update something in another system, it uses an HTTP PUT. The only thing left to figure out is what the data should look like.
I would suggest everybody, who is looking for an answer to this question, go through this "slideshow".
I couldn't understand what REST is and why it is so cool, its pros and cons, differences from SOAP - but this slideshow was so brilliant and easy to understand, so it is much more clear to me now, than before.
Caching.
There are other more in depth benefits of REST which revolve around evolve-ability via loose coupling and hypertext, but caching mechanisms are the main reason you should care about RESTful HTTP.
It's written down in the Fielding dissertation. But if you don't want to read a lot:
increased scalability (due to stateless, cache and layered system constraints)
decoupled client and server (due to stateless and uniform interface constraints)
reusable clients (client can use general REST browsers and RDF semantics to decide which link to follow and how to display the results)
non breaking clients (clients break only by application specific semantics changes, because they use the semantics instead of some API specific knowledge)
Give every “resource” an ID
Link things together
Use standard methods
Resources with multiple representations
Communicate statelessly
It is possible to do everything just with POST and GET? Yes, is it the best approach? No, why? because we have standards methods. If you think again, it would be possible to do everything using just GET.. so why should we even bother do use POST? Because of the standards!
For example, today thinking about a MVC model, you can limit your application to respond just to specific kinds of verbs like POST, GET, PUT and DELETE. Even if under the hood everything is emulated to POST and GET, don't make sense to have different verbs for different actions?
Discovery is far easier in REST. We have WADL documents (similar to WSDL in traditional webservices) that will help you to advertise your service to the world. You can use UDDI discoveries as well. With traditional HTTP POST and GET people may not know your message request and response schemas to call you.
One advantage is that, we can non-sequentially process XML documents and unmarshal XML data from different sources like InputStream object, a URL, a DOM node...
#Timmmm, about your edit :
GET /timeline_posts // could return the N first posts, with links to fetch the next/previous N posts
This would dramatically reduce the number of calls
And nothing prevents you from designing a server that accepts HTTP parameters to denote the field values your clients may want...
But this is a detail.
Much more important is the fact that you did not mention huge advantages of the REST architectural style (much better scalability, due to server statelessness; much better availability, due to server statelessness also; much better use of the standard services, such as caching for instance, when using a REST architectural style; much lower coupling between client and server, due to the use of a uniform interface; etc. etc.)
As for your remark
"Not all actions easily map to CRUD (create, read/retrieve, update,
delete)."
: an RDBMS uses a CRUD approach, too (SELECT/INSERT/DELETE/UPDATE), and there is always a way to represent and act upon a data model.
Regarding your sentence
"You may not even be dealing with object type resources"
: a RESTful design is, by essence, a simple design - but this does NOT mean that designing it is simple. Do you see the difference ? You'll have to think a lot about the concepts your application will represent and handle, what must be done by it, if you prefer, in order to represent this by means of resources. But if you do so, you will end up with a more simple and efficient design.
Query-strings can be ignored by search engines.