Is CQ5 used as a backend service? (AEM)

I have experience with other enterprise CMSs like TeamSite and Tridion, but no hands-on experience with CQ5. I'm wondering how CQ5 is usually integrated with a large site that has both content and functionality, where "functionality" means pages generated with data from a non-CMS repository or web service.
My question is: is CQ5 content read in as a back-end service? I know the API is HTTP based, but is that API typically called from the server or the client? For my example, let's say I have a page that is mostly driven by a web service linked to some non-CMS enterprise system, but I want the footer and right rail to be "content" so that users can change them easily. At what point would the different page sources typically be combined?
I'm wondering because I work with ASP.NET. I know CQ5 is Java, so I would expect that most customers are Java shops, but I would think the HTTP API would be easy to consume from an ASP.NET site if it really is just another backend web service.

Your question is really not all that clear to me, to be honest, so I'm going to answer it rather broadly.
To answer your question about the different page sources:
The client usually initiates an HTTP request to the server, expecting HTML or JSON (although server-to-server calls are not uncommon in the case of extended infrastructure), and the server simply executes the necessary calls (using the APIs) and delivers the answer to the request. By the time the response is returned, all calls to the API have been made by the server, and the server is just returning rendered HTML or JSON, or whatever structured form you want your data and/or content in.
A page consists of various components. Some components are fairly static. Others are very dynamic and pull their data from, for example, a web service, an external database or even another CMS. The combining of these resources happens when the page is rendered, which in turn is triggered by a request for that page. The obvious exception is of course the dispatcher caching system, which will return a cached version of the page if possible. But in short, all the rendering and API calls are made server side.
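As a rough illustration of that server-side composition, here is a sketch of a CQ/Sling component servlet that pulls its data from an external system while the page is being rendered. The resource type, service URL and markup are made up, and the older Felix @SlingServlet annotation is assumed for registration; it is only a sketch, not a definitive implementation.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    import org.apache.felix.scr.annotations.sling.SlingServlet;
    import org.apache.sling.api.SlingHttpServletRequest;
    import org.apache.sling.api.SlingHttpServletResponse;
    import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

    // Renders one component of the page; CQ composes it with the authored
    // footer/right-rail components before the response leaves the server.
    @SlingServlet(resourceTypes = "myapp/components/accountSummary", methods = "GET")
    public class AccountSummaryComponent extends SlingSafeMethodsServlet {

        @Override
        protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
                throws IOException {
            // Server-side call to the hypothetical non-CMS enterprise service.
            URL serviceUrl = new URL("http://erp.example.com/api/account/12345");
            HttpURLConnection connection = (HttpURLConnection) serviceUrl.openConnection();

            StringBuilder payload = new StringBuilder();
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    payload.append(line);
                }
            }

            // Emit this component's markup; the surrounding page template adds
            // the authored content around it during the same render pass.
            response.setContentType("text/html");
            response.getWriter().write("<div class=\"account-summary\">" + payload + "</div>");
        }
    }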
CQ5 is fairly flexible since it's split into two instances: the backend (author) instance, where the actual authoring of pages happens, and the frontend (publish) instance, which is basically the front end and (usually) does the actual rendering for a client.
Whether you choose to use the publish instance as a backend service is really up to you. I've seen CQ5 used as what it was meant for (CQ5 being the frontend), and I've seen CQ5 used as a backend service (for example, as a backend service provider for hybris). And I've seen the combination, where one part was used as a backend service for another system, and the other part was used for a public website frontend.

CQ's rich HTTP API (based on Apache Sling) gives full access to the CQ content in various formats including JSON and XML, so integrating CQ content in other systems is easy.
In the other direction, you can use Sling's ResourceProvider mechanism to access external content and make it part of the CQ content tree. See "custom resource providers" in the Sling Resources docs at http://sling.apache.org/documentation/the-sling-engine/resources.html .
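For the first direction, here is a minimal sketch of consuming that HTTP API from plain Java; any HTTP client (including an ASP.NET one) would do the same. The host and content path are hypothetical.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SlingJsonClient {
        public static void main(String[] args) throws Exception {
            // Sling's default GET servlet exposes a resource as JSON via the
            // ".json" selector; ".1.json" limits the depth to one level.
            URL url = new URL("http://publish.example.com/content/mysite/en/footer.1.json");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("Accept", "application/json");

            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }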

Why use the Model from MVC in the frontend?

I've done almost all of my web projects using Java (Spring MVC + Thymeleaf) and its MVC technologies. Recently I heard about REST and started learning some stuff about it. I realized that it is one of the coolest things I've ever seen in my whole life! (I'm just joking around, it's definitely not.)
All we need to do is serialize the data to JSON and return it to the frontend. And on the frontend we no longer need to use the Model and its objects. The frontend can get all the required data in a nice-to-work-with JSON format using a single GET request!
We don't need to use some weird Thymeleaf constructs to handle errors or to iterate through a list in our template! We can handle all events and process all data using JavaScript and its frameworks. It is much more powerful.
Is there something I've missed? When should I use the Model, and when should I use JSON data?
These are two (somewhat overlapping) approaches to frontend development: generating pages on the backend versus on the frontend; a short code sketch contrasting them follows the lists below.
Using a backend model and Thymeleaf templates (or any other HTML templates), you generate your web page on the server side. This approach has the following benefits:
you can write frontend and backend logic in one language (Java);
you can enforce data and security constraints in one place - on the backend, using Java code or configuration;
in many cases, it's faster for the user to get their first page, since the rendering happens on the server, and the user gets an already rendered page;
But this approach has the following drawbacks:
the server has to render the pages, which means more load on the server;
most of the modern web sites and applications use JavaScript anyway, so you'll have to write at least some JavaScript;
server-generated pages can't come close to the dynamic, interactive UIs that can be rendered in the browser.
Providing a REST API for a JavaScript frontend, you have the following benefits:
you offload work from your server, since most of the UI-related work happens on the user's device;
you can achieve much more with modern frontend frameworks;
navigating an already rendered UI is faster, since you don't need the server to render every page; you only need to get a relatively small JSON response and then update the page dynamically.
But this approach has the following tradeoffs:
the user generally waits longer to see their first page, as the browser needs to download a lot of scripts and then spend time on dynamic rendering of the page;
in a large application, you need to either have a full-stack developer team, or two teams working on frontend and backend separately;
you have to write your UI logic in JavaScript. Make of it what you will :)
you sometimes have to duplicate constraints on both the frontend and the backend: e.g., if a user can't edit a field, you have to both show it as read-only on the frontend and add validation in your REST service, since the user may call your REST API directly and bypass the frontend validation;
you have to be more aware of securing your REST API in general.
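To make the contrast concrete, here is a minimal sketch in Spring (assuming Spring Boot with Thymeleaf and Jackson on the classpath; Customer and CustomerRepository are hypothetical placeholder types): the first controller renders the page on the server, the second only returns JSON for a JavaScript frontend to render.

    import java.util.List;

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Hypothetical domain types, kept minimal for the sketch.
    record Customer(long id, String name) {}
    interface CustomerRepository { List<Customer> findAll(); }

    // 1) Server-side rendering: the controller fills the Model and Thymeleaf
    //    renders the "customers" template on the server.
    @Controller
    class CustomerPageController {
        private final CustomerRepository repository;

        CustomerPageController(CustomerRepository repository) {
            this.repository = repository;
        }

        @GetMapping("/customers")
        String customersPage(Model model) {
            model.addAttribute("customers", repository.findAll());
            return "customers";                  // resolved to customers.html
        }
    }

    // 2) REST API: the controller returns data only; a JavaScript frontend
    //    fetches it with a GET request and renders it in the browser.
    @RestController
    @RequestMapping("/api/customers")
    class CustomerApiController {
        private final CustomerRepository repository;

        CustomerApiController(CustomerRepository repository) {
            this.repository = repository;
        }

        @GetMapping
        List<Customer> customers() {
            return repository.findAll();         // serialized to JSON
        }
    }

Either way the business rules behind repository.findAll() stay on the server; only the place where the HTML is produced changes.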

Scraping WebObjects website & REST

I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not urls). This means that all urls look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Don't URLs like this completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens within the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Regarding the scraping I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST urls, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible, what would be a good approach? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting & processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they appear on and finding where they are located in the HTML, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Don't URLs like this completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens within the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST urls, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
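If you go the browser-automation route, a minimal Selenium sketch in Java might look like the following. The URL and element locators are hypothetical; the point is that you navigate by clicking the same elements a user would, since the component-action URLs themselves aren't stable.

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class WebObjectsScraper {
        public static void main(String[] args) {
            // Drive a real browser so session cookies and the dynamically
            // generated component-action URLs are handled for us.
            WebDriver driver = new ChromeDriver();
            try {
                driver.get("http://example.com/WOApp/WebObjects/WOApp.woa");

                // Hypothetical locators: click through the UI instead of
                // relying on stable URLs.
                driver.findElement(By.linkText("Orders")).click();

                for (WebElement row : driver.findElements(By.cssSelector("table.orders tr"))) {
                    System.out.println(row.getText());
                }
            } finally {
                driver.quit();
            }
        }
    }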
A little late, but could help.
I use Apache's mod_ext_filter (slightly modified) to pre/post-filter the requests and responses of our WebObjects application. The filter calls PHP scripts that can read the dynamic hyperlinks and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from the request to implement new workflows in front of the legacy app and clean up the requests before they reach WebObjects. It is also possible to maintain an additional database within the scripts and store some things across multiple requests.
So you can pick up the dynamically created links (for example a button's name or an HTML form destination) and recognize those names in the request.
It is also possible to "remote control" such applications with little scripts like "click the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is the JavaScript code, which we analyze and reimplement in PHP (e.g. enabling/disabling input elements so they are not transmitted within the requests).
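The same "parse the page, then rebuild the request" idea can be sketched without Apache filters, for example with jsoup in Java. The URL and the overridden form field name are hypothetical; the point is that the form action and session cookies are re-extracted on every run.

    import java.util.Map;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class FormReplay {
        public static void main(String[] args) throws Exception {
            // Fetch the page that contains the dynamically generated form,
            // keeping the session cookies the application hands out.
            Connection.Response first = Jsoup.connect("http://example.com/WOApp/WebObjects/WOApp.woa")
                    .method(Connection.Method.GET)
                    .execute();
            Map<String, String> cookies = first.cookies();
            Document page = first.parse();

            // Extract the per-request form destination and its existing fields.
            Element form = page.selectFirst("form");
            String action = form.absUrl("action");

            Connection post = Jsoup.connect(action).cookies(cookies);
            for (Element input : form.select("input[name]")) {
                post.data(input.attr("name"), input.val());
            }
            // Override the field we actually want to submit (hypothetical name).
            post.data("searchTerm", "widgets");

            Document result = post.post();
            System.out.println(result.title());
        }
    }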
There were some problems with the WebObjects adaptor module for Apache. It still sets Content-Length in the HTTP header, which you cannot change in mod_ext_filter, so if you change the HTML or the parameters within the request, the content length no longer matches. But it is possible to work around that.
Theoretically it would also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone, which would delegate the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app changes, you have to adjust the scripts (e.g. the third button might now be the fourth button).
It should also be possible to put a RESTful interface in front of the application and have the filter scripts query the data from the legacy app.

Servlet API versus REST API

I am working with a CRM system that provides two types of APIs: a servlet API and a REST API, both of which work over HTTP.
I used to integrate with the REST API from my ASP.NET MVC web application by calling the REST URLs and processing the returned JSON or XML. But I cannot figure out what is meant by a "servlet API": can these APIs be called over the web from my ASP.NET MVC application, or do they have to be called from inside a Java application?
Sorry if my question seems trivial to someone.
The Java Servlet API refers to a set of classes used to implement server side programs. The main player is the Servlet:
A servlet is a small Java program that runs within a Web server. Servlets receive and respond to requests from Web clients, usually across HTTP, the HyperText Transfer Protocol.
If you want a very simplistic analogy, the Servlet is Java's version of CGI (Common Gateway Interface).
A REST API is a way to build applications that fully uses the architecture of the web. Leaving all the details of REST aside and grossly simplifying, it's basically an HTTP API.
If you want to build an HTTP API, you can use servlets. So you can also use servlets to build a REST API, although there are better alternatives (e.g. JAX-RS), because servlets are a "low level" component and nothing shields you from all the boilerplate code you need to write.
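For illustration, here is what a bare-bones servlet endpoint might look like (a sketch only; the path and JSON body are made up). It also shows why frameworks like JAX-RS exist: everything is hand-rolled.

    import java.io.IOException;

    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical endpoint: any HTTP client (ASP.NET, a browser, curl) can
    // call GET /customers/ping on the server this servlet is deployed to.
    @WebServlet("/customers/ping")
    public class PingServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // The "REST-ness" is entirely up to us: we hand-craft the
            // status code, content type and body.
            resp.setContentType("application/json");
            resp.getWriter().write("{\"status\":\"ok\"}");
        }
    }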
You can of course call a Java application built on top of the Servlet API from other clients (e.g. from ASP.NET MVC); that's what it was built for. For this reason I don't really understand what exactly your CRM system means by a Servlet API and (a separate!?) REST API... so maybe ask the CRM provider?
EDIT: Based on what I've read about the ManageEngine ServiceDesk Plus APIs, I'm thinking that this is just an unfortunate name chosen by the provider.
As I mentioned in my comments, when you say "REST API" you already convey some information from the start. Most people, when told about REST, understand that you have some abstract resources, that these resources can have multiple representations (JSON, XML, whatever), that each resource is identified by a URI, that /customers refers to a list of customer resources, that /customers/1 is one customer and /customers/2 is another, that you use GET /customers/1 to find out details about that customer and DELETE /customers/1 to delete it, etc.
REST is one way to interact with an application; another is to expose operations that can be called by clients, which is, for example, what SOAP does. Before REST became the new kid in town, people were doing this with SOAP. Unlike accessing resources, SOAP is focused on invoking operations. When you mention SOAP to someone, she knows that it's a protocol that can use HTTP POST to transmit messages, that each message has an XML payload containing the operation name to invoke and the parameters needed for the call, etc.
But even before SOAP and REST became widely known, people realized that they could use a form submit to sneak RPC calls through HTTP. HTTP form-based submission is one of the methods of the API in ManageEngine ServiceDesk Plus. But the form-based submission method (as far as I know) doesn't have a cool name like SOAP or REST... so maybe that's why it was named after the Servlet API?! (I'm once again emphasizing that this is just the server-side implementation, which is not important in the context of the HTTP protocol.)
So to conclude: yes, you CAN call the ManageEngine ServiceDesk Plus Servlet API from ASP.NET, or even from a web browser or any kind of HTTP-capable client.
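For example, a plain form-encoded POST from Java 11's built-in HttpClient (the host, path and parameter names below are invented for illustration and are not the actual ServiceDesk Plus API) shows there is nothing Java-specific about calling such an API:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class FormApiClient {
        public static void main(String[] args) throws Exception {
            // A form-encoded POST, exactly what a browser form submit would send.
            String body = "operation=AddRequest&subject=Printer+broken&apikey=YOUR_KEY";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://crm.example.com/servlet/RequestServlet"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }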

What is the best way to protect a REST service? Should it be part of the site?

I have started working at a new company. We have a REST service (XML exchange with an external system) and a web site. The REST service runs on a subdomain, for example rest.mycompany.com; the company site is mycompany.com. The site and the REST service work like this:
REST -> DB <- SITE. This means the REST service is not part of the site; it's an independent system. The REST service and the site work with one database and share 90% of the same code (models, mappers, etc.). The problem for me is the duplicated code, and I wonder why the REST service can't be part of the site (import/export controller, XML parser and one logging system)? On the other hand, it may be better to have separate systems in terms of security and load handling for each subdomain... separated traffic for each subdomain?
The site and the REST service work like this: REST -> DB <- SITE. This means the REST service is not part of the site; it's an independent system. The REST service and the site work with one database and share 90% of the same code (models, mappers, etc.).
That's a big problem, especially since one system might produce a bug (inconsistent data) that only shows up in the other system. That's quite hard to debug.
The problem for me is the duplicated code, and I wonder why the REST service can't be part of the site (import/export controller, XML parser and one logging system)?
The REST service and the website are just UI layers. The actual business logic should be moved to a third project (class library / module / lib) that both UI layers use.
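Sketched in Java with hypothetical names (the same idea applies in any stack), the shared module might look like this, with the site and the REST service both depending on it:

    // Shared "core" module (hypothetical names): all business rules live here,
    // so the site and the REST service cannot drift apart.
    public class OrderService {

        public interface OrderRepository {          // data-access abstraction
            Order save(Order order);
        }

        public record OrderRequest(String product, int quantity) {}
        public record Order(String product, int quantity) {}

        private final OrderRepository repository;

        public OrderService(OrderRepository repository) {
            this.repository = repository;
        }

        public Order placeOrder(OrderRequest request) {
            // Validation is written exactly once and shared by both front ends.
            if (request.quantity() <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            return repository.save(new Order(request.product(), request.quantity()));
        }
    }

    // The website project and the REST project both depend on this module and
    // only translate HTTP <-> OrderService calls in their own way:
    //   site: controller renders an HTML confirmation page
    //   REST: controller returns the created Order as XML/JSON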
On the other hand, it may be better to have separate systems in terms of security and load handling for each subdomain... separated traffic for each subdomain?
I would stick with separate sites, not for performance but because they have distinct responsibilities.

What is the underlying technology behind the ServiceAsync interface?

I developed a web application (which works properly) that registers a user with the system and allows the user to upload a file to the system via HTTPS. The client-side code is developed entirely with GWT 2.4 and the back end is several servlets. Except for the upload code, all client-server communication is done via the ServiceAsync interface, as is common practice in GWT. The upload code is based on a form that communicates with the upload servlet directly.
This project was developed as course work, and my professor is keen on knowing the underlying architecture of the Google Web Toolkit, specifically the client-server communication. His question was,
"How the client code knows the server's url so that all the communication is done?"
His question is legitimate for the ServiceAsync interface. I am calling a function on the server side, which seems interesting to him, and he wants to know the underlying process behind it.
For uploading, I just defined uploadForm.setAction(GWT.getModuleBaseURL()+"upload"); where upload is the name of the upload servlet in web.xml.
I told him that the compiler generates JavaScript code which contains all the web application code, and that the URL to the servlet is placed in that script file; however, the answer did not satisfy him. Please let me know the inner details of client-server communication with GWT.
Please give me some answers that can help my Professor to understand GWT's asynchronous client-to-server RPC communication.
The underlying technology is shown here as a diagram. Google says "GWT provides an RPC mechanism based on Java Servlets to provide access to server side resources. This mechanism includes generation of efficient client and server side code to serialize objects across the network using deferred binding."
The client knows the URL to query because you would have annotated your service interface with @RemoteServiceRelativePath. This associates the service with a default path relative to the module base URL. That URL is where the JavaScript sends your request.
There's a lot more to learn about GWT's RPC if you care to, and you can start picking it apart here and here.
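To make that concrete, here is the standard GWT-RPC wiring in a minimal sketch; the interface and method names are made up.

    import com.google.gwt.core.client.GWT;
    import com.google.gwt.user.client.rpc.AsyncCallback;
    import com.google.gwt.user.client.rpc.RemoteService;
    import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;

    // Synchronous interface: the annotation ties the service to
    // <module base URL> + "greet", which is also mapped to the servlet in web.xml.
    @RemoteServiceRelativePath("greet")
    interface GreetingService extends RemoteService {
        String greet(String name);
    }

    // Async counterpart used on the client.
    interface GreetingServiceAsync {
        void greet(String name, AsyncCallback<String> callback);
    }

    // Client-side call: GWT.create() returns a generated proxy that already
    // knows the URL from the annotation and serializes the call over HTTP.
    class ClientCode {
        void callServer() {
            GreetingServiceAsync service = GWT.create(GreetingService.class);
            service.greet("Professor", new AsyncCallback<String>() {
                public void onSuccess(String result) { /* update the UI */ }
                public void onFailure(Throwable caught) { /* handle the error */ }
            });
        }
    }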
There are multiple ways in GWT to bind an RPC service to a specific URL.
First is the @RemoteServiceRelativePath annotation, which is placed on the synchronous interface. Using a deferred-binding rule, GWT discovers this URL and sets it on the service instance automatically.
Second is casting the instance of the GWT-RPC async service to ServiceDefTarget and setting the URL manually.
But this answer alone will not satisfy your professor, because most likely he will want to know other details as well, so I recommend you learn how exactly GWT-RPC works.
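A small sketch of that second approach, assuming hypothetical GreetingService/GreetingServiceAsync GWT-RPC interfaces and a made-up "customGreet" path:

    import com.google.gwt.core.client.GWT;
    import com.google.gwt.user.client.rpc.ServiceDefTarget;

    class ManualEndpointExample {
        GreetingServiceAsync createService() {
            // Create the generated proxy as usual...
            GreetingServiceAsync service = GWT.create(GreetingService.class);
            // ...then override the target URL relative to the module base URL.
            ((ServiceDefTarget) service).setServiceEntryPoint(GWT.getModuleBaseURL() + "customGreet");
            return service;
        }
    }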
Regarding the basic question of how the client knows the URL of the server, it sounds as though the professor may be asking about the full URL of the site (the domain name), and not just the sub-paths for the services as they are defined in @RemoteServiceRelativePath and web.xml.
For this more fundamental aspect of the question, I think the "Same Origin Policy" (SOP) that browsers enforce for JavaScript security could be an important part of the answer. This is explained in one of the GWT FAQs. The first thing the browser does on the client side (after the HTTPS connection is established, which I think could be another important part of the answer) is read the host HTML file, where the bootstrap *.nocache.js file is referenced. Once this file is loaded, the SOP guarantees that all subsequent JS application files come from the same server as the bootstrap and host HTML files. Once the application files are loaded, everything happens within that context, with specific internal URL paths being defined for RPCs as already mentioned.