I wish to make a specific AEM component un-cached in the dispatcher.
Can anyone let me know how this can be implemented?
Thanks in Advance!
The dispatcher caches at the page level; it has no knowledge of the components and constructs within the pages.
What you are looking for is called 'Server Side Include', or more aptly for AEM, 'Sling Dynamic Include'. This is more to do with the Apache web server than with the dispatcher itself.
In a nutshell, you program your pages to have placeholders for pulling in cached fragments. The page cache then becomes independent of the component cache and performs more efficiently as a whole.
Read the article for more details.
Related
I want to host a RESTful web service from CQ5. Basically the intention is to expose all the users present in CQ5 to external systems, based on some parameters like modified date, user state, etc.
I went through https://chanchal.wordpress.com/2015/01/11/using-jax-rs-and-jersey-to-write-restful-services-in-osgi-apache-felix-adobe-cq5aem/ as it was the only post I could find online, but as I am a beginner I couldn't implement it. I need guidance in implementing such a RESTful web service in CQ5.
CQ5 is based on Apache Sling, which is inherently RESTful, so you don't usually need additional libraries. In your case (and unless the user info is already available as Sling resources; I don't remember if that's the case), implementing a Sling ResourceProvider is enough to provide a browseable RESTful representation of those resources. See the Sling docs for more info; they point to a simple PlanetResourceProvider as a minimal example.
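If it helps, here is a minimal, hedged sketch of that approach using the CQ5-era org.apache.sling.api.resource.ResourceProvider interface. The mount path /services/users, the resource type, and the way users would be looked up are assumptions for illustration; you would register the class as an OSGi service with a provider.roots property pointing at the mount path.

```java
// Minimal ResourceProvider sketch (CQ5-era Sling API). Assumptions: mounted at
// /services/users via the provider.roots OSGi property; resource type is made up.
import java.util.Iterator;
import javax.servlet.http.HttpServletRequest;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceProvider;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.SyntheticResource;

public class UserResourceProvider implements ResourceProvider {

    private static final String ROOT = "/services/users"; // assumed mount point

    public Resource getResource(ResourceResolver resolver, HttpServletRequest request, String path) {
        return getResource(resolver, path);
    }

    public Resource getResource(ResourceResolver resolver, String path) {
        if (ROOT.equals(path)) {
            // Synthetic resource for the collection; requesting it with a
            // .json selector already gives a RESTful representation.
            return new SyntheticResource(resolver, path, "myapp/users");
        }
        // Resolve /services/users/<id> to a single user here (omitted).
        return null;
    }

    public Iterator<Resource> listChildren(Resource parent) {
        // Return one synthetic child per user, filtered by modified date,
        // user state, etc. (omitted).
        return null;
    }
}
```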
I couldn't get the REST web services working with AEM/CQ5, even after installing the packages for JAXB for CQ5. It seems like Sling overrides the resolving before it reaches the JAXB annotation handler. Due to lack of time I had to implement an alternative approach, where CQ5 periodically writes the JSON data to a shared location as files and the third-party applications fetch the files from there.
This will however impact performance, as schedulers have to be written, and it's not a recommended approach, but it will work in my scenario.
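For anyone taking the same detour, the workaround could look roughly like the sketch below: a job registered with the Sling scheduler that periodically writes the user data to a shared folder as JSON. The cron expression, output path, and the user-loading step are assumptions, not the exact code used here.

```java
// Rough sketch only: periodic export of user data to a shared JSON file.
// Register as an OSGi service of type java.lang.Runnable with a property such as
// scheduler.expression = "0 0 * * * ?" (assumed hourly schedule).
import java.io.FileWriter;
import java.io.Writer;

public class UserExportJob implements Runnable {

    public void run() {
        Writer out = null;
        try {
            out = new FileWriter("/shared/export/users.json"); // assumed shared location
            out.write("[\n");
            // Iterate the CQ5 users here (e.g. via the JCR UserManager) and write
            // one JSON object per user with modified date, state, etc. (omitted).
            out.write("\n]");
        } catch (Exception e) {
            // Log and let the next scheduled run try again.
        } finally {
            try { if (out != null) out.close(); } catch (Exception ignore) {}
        }
    }
}
```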
Thanks all for helping me.
I know the question could probably be solved by a different approach, but let me specify in detail what I want to ask and how much I understand about it.
We have two MVC approaches used in ATG (and many other frameworks): pull-based and push-based.
As I understand it, form handlers and droplets both play the part of the controller for different needs, repositories are our model, and JSPs provide the views.
If I am right up to this point, then what purpose does the servlet chain serve? How does it fit into this picture of MVC?
If possible, please explain with the help of a flow diagram from request to response (end to end).
Thanks a lot in advance to the experts.
Please help. I could not find this kind of explanation anywhere.
The first thing to remember with ATG is that it is an old platform. So when trying to understand the different mechanisms through an MVC lens, remember that nothing was designed with MVC in mind; the platform predates widespread knowledge and acceptance of the MVC pattern in web development.
Droplets are a generalized mechanism for invoking Java code from a JSP (and previously, JHTML) template. The closest analogue in J2EE-land would be tag libraries. So you can use them to accomplish many different tasks. You can also use them as a poor man's MVC controller. This is done by coding a custom droplet class dedicated to handling the business logic of your page, and setting relevant page state as request parameters. Then you invoke the droplet as the first step in your JSP page. This is not very different from J2EE Model 2, with the odd quirk that your compiled JSP begins executing first, then invokes your "controller" code, and then resumes processing the "view."
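As a hedged illustration of that "poor man's controller" pattern (class, parameter, and component path names here are invented, not ATG conventions):

```java
// Sketch of a page-level "controller" droplet: derive the page state, expose it
// as a request parameter, then render the droplet's "output" open parameter.
import java.io.IOException;
import javax.servlet.ServletException;
import atg.servlet.DynamoHttpServletRequest;
import atg.servlet.DynamoHttpServletResponse;
import atg.servlet.DynamoServlet;

public class PageSetupDroplet extends DynamoServlet {

    public void service(DynamoHttpServletRequest request, DynamoHttpServletResponse response)
            throws ServletException, IOException {
        // Business logic for this page goes here.
        Object pageState = "example"; // placeholder for real work
        request.setParameter("pageState", pageState);
        // Render the body of the <dsp:droplet> tag that invoked us.
        request.serviceLocalParameter("output", request, response);
    }
}
```

In the JSP you would then invoke it as the first step with something like <dsp:droplet name="/myapp/droplet/PageSetupDroplet"> and read pageState inside its output open parameter.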
Form handlers, as the name indicates, are designed for processing form submissions. These are executed by a link in the servlet pipeline (DAFDropletEventServlet), before the page which was posted to begins rendering. Typical usage with form handlers is that they will send a redirect after executing, and cancel rendering of the page, which will abort any further processing by the servlet pipeline. Form handlers are the closest thing to a "controller" ATG provides. But they are awful for handling GET requests and result in some very unfortunate URLs when used for this purpose.
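And a similarly rough sketch of a form handler, showing the redirect-and-cancel-rendering behaviour described above (the property, handler, and URL names are made up):

```java
// Sketch of a form handler: handleSubmit runs on POST via DAFDropletEventServlet,
// sends a redirect and returns false so the posted-to page is not rendered.
import java.io.IOException;
import javax.servlet.ServletException;
import atg.droplet.GenericFormHandler;
import atg.servlet.DynamoHttpServletRequest;
import atg.servlet.DynamoHttpServletResponse;

public class ContactFormHandler extends GenericFormHandler {

    private String message; // bound from the page with <dsp:input bean=".../message"/>

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }

    public boolean handleSubmit(DynamoHttpServletRequest request, DynamoHttpServletResponse response)
            throws ServletException, IOException {
        // Validate and process the submission here.
        response.sendLocalRedirect("/contact/thanks.jsp", request); // assumed success URL
        return false; // stop any further rendering of the posted-to page
    }
}
```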
Why did they create two different mechanisms? Why have they not subsequently introduced an updated MVC model as part of the platform? These questions I cannot answer.
ATG, at the page rendering level, is not MVC. Don't look at it as MVC. ATG, at its core, is an extension of the basic Servlets API.
Forms and form processing, with the successUrl and errorUrl, are kind of MVC. Droplets certainly are not.
It is perfectly acceptable in an ATG application to be fetching data as the page is rendering (i.e. via droplets).
I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not URLs). This means that all URLs look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens within the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore, I think WebObjects also invalidates URLs once they get too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST URLs, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this, though, or whether it's even possible.
If it's not possible, what would be a good approach? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting & processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they appear on and then finding the place in the HTML where they're located, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens within the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore, I think WebObjects also invalidates URLs once they get too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry (a rough sketch follows below).
Using the ERRest framework to create a 100% RESTful application.
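That first option, in rough sketch form (the class name, action name, and payload below are examples only, not taken from the application you're scraping):

```java
// Hedged sketch: a direct action reachable at a stable URL such as
// .../wa/DirectAction/products, independent of any session-backed page state.
import com.webobjects.appserver.WOActionResults;
import com.webobjects.appserver.WODirectAction;
import com.webobjects.appserver.WORequest;
import com.webobjects.appserver.WOResponse;

public class DirectAction extends WODirectAction {

    public DirectAction(WORequest request) {
        super(request);
    }

    // "productsAction" maps to the "products" action name in the URL.
    public WOActionResults productsAction() {
        WOResponse response = new WOResponse();
        response.setHeader("application/json", "Content-Type");
        response.appendContentString("{\"products\": []}"); // placeholder payload
        return response;
    }
}
```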
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST URLs, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
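If you do go the browser-automation route, the sketch below shows the general shape in Java with Selenium (the URL and the link text are placeholders): you navigate and click exactly as a user would, so the generated component action URLs and the session cookie are handled for you.

```java
// Hedged sketch of browser-level automation against a component-action app.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class WOAppScraper {

    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://example.com/WOApp/WebObjects/WOApp.woa"); // entry point from the question
            // Follow links and forms as a real user would; Selenium carries the
            // session cookie and the freshly generated URLs for us.
            WebElement link = driver.findElement(By.linkText("Orders")); // assumed link text
            link.click();
            String html = driver.getPageSource();
            // Parse 'html' here and extract whatever data you need.
            System.out.println(html.length());
        } finally {
            driver.quit();
        }
    }
}
```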
A little late, but it could help.
I use Apache's mod_ext_filter (slightly modified) to pre/post filter the requests/responses from our WebObjects application. The filter calls PHP scripts and can read the dynamically generated hrefs and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from a request to implement new workflows in front of the legacy app and clean up the requests before they reach WebObjects. It is also possible to maintain an additional database within the scripts and store some state across multiple requests.
So you can get the dynamically created links (maybe a button's name or HTML form destination) and can recognize these names within the request.
It is also possible to "remote control" such applications with little scripts like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is the JavaScript code, which we analyze and reimplement within PHP (e.g. enabling/disabling input elements so they are not transmitted within the requests).
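The same idea, independently of PHP: the sketch below uses jsoup in Java purely as an illustration of parsing a rendered page, pulling out the generated form destination, and re-posting to it. The URL, field names, and values are placeholders.

```java
// Illustration only: parse a rendered page, extract the dynamically generated
// form action and submit the POST a browser would send, carrying the cookies along.
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FormReplay {

    public static void main(String[] args) throws Exception {
        String pageUrl = "http://example.com/WOApp/WebObjects/WOApp.woa"; // placeholder
        Connection.Response page = Jsoup.connect(pageUrl).execute();
        Map<String, String> cookies = page.cookies(); // keep the session cookie
        Document doc = page.parse();

        Element form = doc.select("form").first();
        String action = form.absUrl("action"); // the generated, per-request destination

        // Rebuild the POST the browser would send (field names are placeholders).
        Document result = Jsoup.connect(action)
                .cookies(cookies)
                .data("quantity", "1")
                .post();
        System.out.println(result.title());
    }
}
```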
There were some problems with the WebObjects adapter module for Apache. It still uses Content-Length within the HTTP header, which you cannot change in mod_ext_filter. If you change the HTML or the parameters within the request, the length of the content will no longer match. But it is possible to change that.
Theoretically it could also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone, which delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app changes, you have to correct some things in the scripts (e.g. the third button could now be the fourth button).
It should also be possible to add a RESTful interface in front of the application and query the data from the legacy app via the filter scripts.
I have experience with other enterprise CMSes like TeamSite & Tridion, but no hands-on experience with CQ5. I'm wondering, how is CQ5 usually integrated with a large site that has both content & functionality? Functionality here meaning pages generated with data from a non-CMS repository or web service.
My question is: is CQ5 content read in as a back-end service? I know the API is HTTP based, but is that API typically called from the server or the client? For my example, let's say I have a page that is mostly driven by a web service linked to some non-CMS enterprise system, but I want the footer & right rail to be "content" so that the users can change it easily. At what point would the different page sources typically be combined?
I'm wondering because I work with ASP.NET. I know CQ5 is Java, so I would expect that most customers are Java shops, but I would think the HTTP API would be easy to consume from an ASP.NET site if it is really just another backend web service.
Your question is really not all that clear to me, to be honest, so I'm going to answer it rather broadly.
To answer your question about the different page sources:
The client usually initiates an HTTP request to the server for HTML or JSON (although server-to-server calls are not uncommon in the case of an extended infrastructure), and the server simply executes the necessary calls (using the APIs) and delivers the answer to the request. By the time the response is returned, all calls to the API have been made by the server, and the server is just returning rendered HTML or JSON, or whatever structured form you want your data and/or content in.
A page consists of various components. Some components are fairly static. Others are very dynamic and pull their data, for example, from a web service, an external database, or even another CMS. The combining of these resources happens upon the rendering of the page, which in turn was triggered by a request for that page. The obvious exception is of course the dispatcher caching system, which will return a cached version of the page if possible. But in short, all the rendering and API calls are made server side.
CQ5 is fairly flexible since it's split into two instances: the backend (author), which is where the actual authoring of pages happens, and the frontend (publish), which does the actual rendering for clients (usually).
Whether you choose to use the publish instance as a backend service is up to you, to be honest. I've seen CQ5 used as what it was meant for (CQ5 being the frontend), and I've seen CQ5 used as a backend service (for example, as a backend service provider for Hybris). And I've seen the combination, where one part was used as a backend service for another system and the other part was used for a public website frontend.
CQ's rich HTTP API (based on Apache Sling) gives full access to the CQ content in various formats including JSON and XML, so integrating CQ content in other systems is easy.
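As a hedged illustration of that direction: any backend, be it Java or ASP.NET, can simply request a resource with a JSON selector over HTTP. The host and content path below are placeholders; the .infinity.json selector asks Sling for the node plus all of its children.

```java
// Sketch: fetch a CQ-managed fragment (e.g. the footer content) as JSON over HTTP.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FooterContentClient {

    public static String fetchFooterJson() throws Exception {
        URL url = new URL("http://publish.example.com/content/site/footer.infinity.json"); // placeholder
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        try {
            StringBuilder json = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
            return json.toString();
        } finally {
            in.close();
            conn.disconnect();
        }
    }
}
```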
In the other direction, you can use Sling's ResourceProvider mechanism to access external content and make it part of the CQ content tree. See "custom resource providers" in the Sling Resources docs at http://sling.apache.org/documentation/the-sling-engine/resources.html .
I have an idea to add an edit layer to a website as a Plack middleware.
Explanation: let's say we create a website based on some framework, templates, and CSS (requested like /some/page). Now we could create a middleware so that every request to pages starting with adm (like /adm/some/page) shows the same page, but adds a layer for content editing. So we could easily view and use the page as visitors do, but with a double-click on a block-level element we could modify or add content. The middleware would bind certain block elements to certain events (double-click) and set handlers too (with some JavaScript library).
For now it is just an idea and I have not seen such an approach in any CMS. I am looking for hints, ideas, and examples of how to start and implement such a system. I hope something like this has already been done.
You could do it, but I don't think you want to. My understanding is that Plack::Middlewares are supposed to be generic, and implementing a CMS as a Plack middleware limits its re-usability and is out of place; there is no inherent connection between a middleware and a CMS.
See these as examples: Plack::Middleware::OAuth, Plack::Middleware::Debug, Plack::Middleware::iPhone, Plack::Middleware::Image::Scale, Plack::Middleware::HTMLMinify.
It would be trivial to add a middleware filter to insert a form into your HTML based on /adm/ or /admin/ or whatever ... but mapping the URL to the dispatch would depend heavily on the underlying CMS's model/view/controller framework, which is why frameworks such as Catalyst, Mojolicious, and others already provide this feature.
See http://advent.plackperl.org/2009/12/day-23-write-your-own-middleware.html
Basically, I think this is a job for a view/controller of your application, a plugin, not a wrapper for your application (middleware).
I know my explanation is lacking, but hopefully you catch my drift.