Can the REST support in CF10 be used to replace URL Rewrite / mod_rewrite for SEO-friendly URLs? For example, by writing a thin layer that defines the GET and POST methods and uses <cfinclude> to pull in the correct page?
Or would it tax the server too much, so that it is better left to the web server?
Once the rules are in CFML, they'd be much easier to version control and maintain.
Thanks
If I understand what you are saying (and perhaps I do not) you would create a handler that would intercept a request, parse out the variables, then request the appropriate page via REST? If that's what you have in mind then I'm not sure I follow what you would gain by this. REST (in general) is more of a generic HTTP API for getting at methods - not so much a page / content paradigm (though I suppose it could be).
If what you are looking for is to use CF as an SEO URL rewrite handler, you can do this now. To use an IIS example, you can create a "custom 404" handler - a CFM page - that gets all the requests that are not tied to a specific document. The handler teases out the variables by parsing through the URL, then "includes" the correct CFM code or page. That sounds a bit like what you want - but it's not really REST.
Perhaps you are thinking of doing some sort of CFHTTP call where you grab the content you need by constructing the query string from the URL. So if someone loads a url like:
blah.com/productid/550
You could write code like so -
<cfhttp
url="http://blah.com/index.cfm?#listfirst(cgi.script_name,'/')#=#listlast(cgi.script_name,'/')#"/>
<cfoutput>#cfhttp.filecontent#</cfoutput>
While this would do the trick you would be better off using cfinclude rather than this approach. An approach like the one above would actually generate an additional thread per request - one thread for the browser's request and another for the cfhttp request.
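For illustration only, here is a minimal sketch of the parsing step such a handler would do before including the page. It is written in Python rather than CFML purely to show the logic; the function name path_to_query and the default page index.cfm are assumptions, not anything CF provides.

```python
from urllib.parse import urlencode


def path_to_query(path, default_page="index.cfm"):
    """Turn a path like '/productid/550' into 'index.cfm?productid=550'.

    Hypothetical helper: segments are read as alternating key/value pairs.
    """
    segments = [s for s in path.split("/") if s]
    # Pair segments up as (key, value); a trailing unpaired segment is ignored.
    pairs = list(zip(segments[0::2], segments[1::2]))
    if not pairs:
        return default_page
    return f"{default_page}?{urlencode(pairs)}"
```

A 404 handler or onMissingTemplate() would do the equivalent and then include the target page directly, with no second HTTP request.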
Finally, I would politely suggest that URL Rewrite (in Apache or IIS) is more efficient and more "conventional", and therefore probably a better choice in general.
@Henry
REST is not a replacement for the URL rewriting.
First of all the REST URLs have a format.
http://localhost:8500/rest/App_Name/Rest_Path
"rest" part is mandatory. If you want to change "rest" you can change it in the web.xml (Change the URL Mapping).
App_Name is not mandatory. A server can have a default REST application, and for the default application you do not need to specify the App_Name. To access other (non-default) REST applications, you must specify the App_Name. You can make an application the default on the REST Service registration page in the admin.
Rest_Path identifies the CFC and the function in the CFC that needs to be invoked on the HTTP call.
If this URL format is acceptable, then URLs of this format can be mapped to a specific function in a CFC. Whenever an HTTP call is made to the URL, the corresponding CFFunction will be invoked. By using REST, you are accessing a function in the CFC. It is not possible to access a CFC or a CFM directly in this way. But in the function you can implement whatever you want (like invoking a CFC, invoking another CFM, etc.).
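To make the URL format concrete, here is a rough sketch of how a path of that shape breaks down into its parts. This is not how CF resolves it internally (I don't know the implementation); the set of registered app names and the function split_rest_url are made up for the example.

```python
# Hypothetical names of REST applications registered in the admin.
REGISTERED_APPS = {"shop", "blog"}


def split_rest_url(path, mapping="rest", apps=REGISTERED_APPS):
    """Split '/rest/App_Name/Rest_Path' into (app_name, rest_path).

    Returns None if the path does not start with the mandatory mapping.
    app_name is None when the default application is addressed.
    """
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != mapping:
        return None  # not a REST request at all
    rest = segments[1:]
    if rest and rest[0] in apps:
        return rest[0], "/".join(rest[1:])
    # No registered app name in the URL: treat it as the default application.
    return None, "/".join(rest)
```

The point is that the leading "rest" segment is always there, which is why these URLs cannot stand in for arbitrary SEO-friendly URLs.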
Does this reply answer your question?
Thanks,
Paul
Even if one could do this, I'd say it's co-opting the wrong tool to do the wrong job. URL rewriting is the web server's job, not the CF server's, and the web server will be a hell of a lot better at it than CF will be. CF's REST interface is for building APIs, not for doing URL rewriting.
If one was to want to handle URL rewriting with CF, then using the 404 handler or onMissingTemplate() handler would be a better fit here, would it not? At least you're using a tool intended for the job (if not the best one).
As for version control... an .htaccess file is just a text file, like a CFML file is. I've not looked too closely at IIS's rewrite module, but can it not use a text file to configure / maintain its rewrites? Obviously Apache can, and we use Helicon's ISAPI Rewrite module, which uses a mod_rewrite-compatible .htaccess file.
It seems to me like you're trying to make the developer's job easier by using an approach that would penalise the production performance. "Making the developer's life easier" should never be grounds for compromising the production environment (IMO, obviously).
Related
I've written a REST server in Delphi XE (using the wizard) and I want to change the URLs a bit so that instead of having
http://192.168.1.84:8080/datasnap/rest/TServerMethods1/GetListings
I get something that looks more like http://192.168.1.84:8080/GetListings
Is there a nice, easy way of doing this?
The naming convention is (Delphi XE3):
http://my.site.com/datasnap/rest/URIClassName/URIMethodName[/inputParameter]
You can easily change the "datasnap" and "rest" part of the URL in the TDSHTTPWebDispatcher component properties. You can change the Class Name and Method Name of the URL by simply changing the name of your class and method. However, you still have to have 4 components to the URL, so for example it could be:
http://my.site.com/api/v1/People/Listing
See here:
http://docwiki.embarcadero.com/RADStudio/XE3/en/REST#Customizing_the_URL_for_REST_requests
You could put IIS or Apache in between to accomplish this, and indeed rewrite the URL to point to your service the way you like.
That provides some more advantages anyway (security and scalability mostly). For example, you can create a fail-safe setup with double servers, or you can create multiple machines with your service, and have your web server do the load balancing for example.
You'll get extra logging capabilities, and if you easily want to serve other web content it's easy to have a full fledged web server anyway.
URL rewriting is usually done in the web server configuration; in Apache, that means entries in the .htaccess file.
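As a rough illustration of what such a rewrite rule does (in Python, not in web server configuration syntax), the short public path is simply mapped onto the full DataSnap path. The prefix is taken from the question; the function name rewrite is an assumption for this sketch.

```python
# Prefix from the question: /datasnap/rest/<class name>
PREFIX = "/datasnap/rest/TServerMethods1"


def rewrite(path, prefix=PREFIX):
    """Map a short public path such as '/GetListings' to the DataSnap path."""
    if path.startswith("/datasnap/"):
        return path  # already in the long form, leave untouched
    return prefix + "/" + path.lstrip("/")
```

A front-end web server would apply the same mapping before proxying the request to the Delphi service.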
I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not urls). This means that all urls look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs is in the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore (I think) that WebObjects also makes too old URLs invalid since they "time-out" after a period of time. I'm not sure whether this applies only to urls with sessions though.
Regarding the scraping I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST urls, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible, what would be a good approach then? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. WatiR or Selenium) and extracting and processing the HTML from the responses.
Manually extracting the dynamic endpoints by first requesting the page they are on and then finding the place in the HTML where they're located, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Do URLs like this not completely destroy local and shared caching
opportunities (the cacheable constraint in REST)? I imagine the only
effective caching with such URLs is in the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore (I think) that
WebObjects also makes too old URLs invalid since they "time-out" after
a period of time. I'm not sure whether this applies only to urls with
sessions though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST urls, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
A little late, but could help.
I use Apache's mod_ext_filter (slightly modified) to pre- and post-filter the requests and responses of our WebObjects application. The filter calls PHP scripts and can read the dynamic hrefs and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add or remove parameters to implement new workflows in front of the legacy app and clean up the requests before they reach WebObjects. It is also possible to use an additional database within the scripts and store some things across multiple requests.
So you can get the dynamically created links (maybe a button's name or HTML form destination) and can recognize these names within the request.
It is also possible to "remote control" such applications with little scripts like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions which the browser would do (i.e. create the HTTP request manually and send it as POST to the extracted form destination href). The only problem is the Javascript code, which we analyze and reprogram within PHP (i.e. enable/disable input elements, so they will not be transmitted within the requests)
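To give an idea of the extraction step, here is a minimal sketch using Python's standard html.parser instead of our PHP scripts. It only collects form destinations and link hrefs from one page; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class ActionExtractor(HTMLParser):
    """Collect form action URLs and link hrefs from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.form_actions = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("action"):
            self.form_actions.append(attrs["action"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_actions(html):
    """Return (form_actions, links) in document order."""
    parser = ActionExtractor()
    parser.feed(html)
    return parser.form_actions, parser.links
```

Because the component action URLs change per response, the extracted values are only good for the very next request in the same session.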
There were some problems with the WebObjects Adapter Module for Apache. It still uses Content-Length in the HTTP header, which you cannot change in mod_ext_filter. If you change the HTML or the parameters within the request, the length of the content will no longer match. But it is possible to change that.
Theoretically it could also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone, which delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app is changed, you have to correct some things in the scripts (e.g. the third button could now be the fourth button).
It should also be possible to add a RESTful interface in front of the application and query the data from the legacy app through the filter scripts.
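The Content-Length issue boils down to recomputing the header after the body has been changed. A small sketch of the idea in Python (this is not mod_ext_filter code; the helper name fix_content_length is an assumption):

```python
def fix_content_length(headers, body):
    """Return a copy of headers whose Content-Length matches the body.

    headers: dict of header name -> value; body: bytes after modification.
    """
    # Drop any existing Content-Length, whatever its capitalisation.
    fixed = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    fixed["Content-Length"] = str(len(body))
    return fixed
```

The length has to be counted in bytes, not characters, so the body should already be encoded when this runs.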
Alright, so a better title here may have been "Progressive Enhancement with REST in CakePHP", but at least now I'll know you didn't read the question if your answer just refers to the difference between the two ;)
I'm pretty familiar with REST and how to integrate it with CakePHP, but I'm not 100% on board with how to still maintain a conventionally functioning website. Using Router::mapResources sounds like a great idea, but this creates a problem with maintaining the "graceful degradation" version of the site, because both POST requests to /resource/ AND GET requests for /resource/add will route to the same action (add). Clearly I'll want this action to return a JSON object if they're using the REST API, but if they're using the degraded version of the site (no JS perhaps), it should be an add form, right?
What's the best way to deal with this? Do you route your REST requests to other action names using Router::resourceMap()? Do you do that crazy hack I saw to have the /api/ prefix part of the resourceMap so you can use api_action functions? Do you have the actions handle both REST and conventional requests via checking isAjax()? If so, how do you ensure that you can rely on the browser to properly support the other two request types?
I've searched around quite a bit but haven't found anything about how to keep conventional requests available in Cake alongside REST, so if anyone has any advice or experience, I'd love to hear it!
CakePHP uses extension routing as well, via Router::parseExtensions(), so:
/test/action will render views/test/action.ctp
/test/action.html also
/test/action.json will render views/test/json/action.ctp
/test/action.xml will render views/test/xml/action.ctp
If all views are designed to handle the same data as set by your controller, you'll be able to show a regular HTML form and handle the posted data the same way as you'd handle the AJAX request.
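The mapping listed above can be sketched like this. It is written in Python only to show the rule, not Cake's actual implementation; view_for is a made-up name and it assumes paths of the form /controller/action with an optional extension.

```python
import posixpath


def view_for(path):
    """Map '/test/action.json' to the view file extension routing would pick."""
    base, ext = posixpath.splitext(path.strip("/"))
    ext = ext.lstrip(".")
    controller, action = base.split("/", 1)
    if ext in ("", "html"):
        # No extension or .html: the regular view.
        return f"views/{controller}/{action}.ctp"
    # Any other extension selects a subfolder named after it.
    return f"views/{controller}/{ext}/{action}.ctp"
```

The controller action stays the same in all cases; only the view that renders the data changes.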
You'll probably have to add checks inside the /add, /edit and /delete actions for whether any data was posted/submitted, to prevent items being deleted without a form being posted (I haven't tested that though; it might be that Cake blocks these URLs if mapResources is set for the controller).
REST in CakePHP:
http://book.cakephp.org/2.0/en/development/rest.html
(Extension) Routing
http://book.cakephp.org/2.0/en/development/routing.html#file-extensions
I am currently re-writing an old web-application and I want it to be RESTful. Now, one important philosophy behind a RESTful app, is that each request to the end-point has to be stateless.
With the application I am aiming for a common codebase for the API as for normal browsing. In other words, I want to avoid special URLs like http://api.domain.tld or http://domain.tld/api. I intend to interpret the HTTP Accept header for this.
One challenge I ran into is request parameters that a user browsing the page usually chooses only once. A good example of this is the language. Again, I can use the Accept-Language header to pick an initial language. But what if the user wishes to change this? It would be unusable if the user needed to switch the language after each request.
In my opinion, this is really a request parameter, and should be passed on as such. For example: http://domain.tld/resource?lang=en. So once the user switched the language, I would need to append this parameter to each URL on the page.
In a way, this makes the browsing session stateful. Are there any "best practices" for this? How would you approach this? One idea I have is to store these "global" parameters in the session, but add them to each URL nevertheless, if only to make the API easily discoverable.
On a sidenote: I am currently building the web-page using Flask which provides a method url_for to build URLs. I am considering overriding this, so each generated URL will have the parameter. But this is not a Flask specific problem. This is something most RESTful services should consider, so I will tag it neither with python, nor flask!
State Transfer
REST doesn't require each request to be stateless. The requirement is that the server does not have to manage state on behalf of the client. In effect, each request has to carry sufficient state to allow the server to process it.
Your approach, passing the user's language with each request, is perfectly sensible. Others might prefer to retrieve it from a shared database, but that can raise scalability concerns.
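As a sketch of carrying that state in every URL, something like the following could be applied to each generated link. It uses only the standard library rather than Flask's url_for, and with_lang is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url, lang):
    """Return url with its 'lang' query parameter set to lang."""
    parts = urlsplit(url)
    # Keep all existing parameters except a previous 'lang' value.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "lang"
    ]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Every request then carries the language itself, so the server needs no per-user session to render the right one.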
I want to ask some questions about REST calls. I am new to REST and I would like to know what a REST call is and how to use a URL to send a REST call to the server. Can anyone give me a basic tutorial or a link for reference?
Besides, if I want to send a REST call to the server, what should I do? Do I need to set something in the URL? or set something in the server? Thank you.
REST is just a software architecture style for exposing resources.
Use HTTP methods explicitly.
Be stateless.
Expose directory structure-like URIs.
Transfer XML, JavaScript Object Notation (JSON), or both.
A typical REST call to return information about customer 34456 could look like:
http://example.com/customer/34456
Have a look at the IBM tutorial for REST web services
REST is somewhat of a revival of old-school HTTP, where the actual HTTP verbs (commands) have semantic meaning. Until recently, apps that wanted to update stuff on the server would supply a form containing an 'action' variable and a bunch of data. The HTTP command would almost always be GET or POST, and would be almost irrelevant. (Though there's almost always been a proscription against using GET for operations that have side effects, in reality a lot of apps don't care about the command used.)
With REST, you might instead PUT /profiles/cHao and send an XML or JSON representation of the profile info. (Or rather, I would -- you would have to update your own profile. :) That'd involve logging in, usually through HTTP's built-in authentication mechanisms.) In the latter case, what you want to do is specified by the URL, and the request body is just the guts of the resource involved.
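To make that concrete, here is a minimal sketch of building such a PUT request with Python's standard library. The host example.com and the profile fields are placeholders, build_put is a made-up helper, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_put(url, resource):
    """Build a PUT request whose body is the JSON form of the resource."""
    body = json.dumps(resource).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder URL and data; urllib.request.urlopen(req) would actually send it.
req = build_put("http://example.com/profiles/cHao", {"name": "cHao"})
```

The URL names the resource, the verb says what to do with it, and the body is just the resource's representation.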
http://en.wikipedia.org/wiki/Representational_State_Transfer has some details.