What do these terms mean in a CQ5/AEM environment?

What do the terms below mean? Are they any different from each other?
URL Redirect Rules
Resource resolver settings for URL shortening
Sling Mapping
Vanity URL
Vanity Domain
Update: Reconstructed the question.
As per my understanding the above terms mean the same thing. I have read the documentation but haven't clearly understood it.

Found an awesome link that answers my questions.
AEM URL Rewriting
The typical website structure for an Adobe CQ5/AEM project begins with /content in the URL structure and typically contains the application name. My example application’s homepage has the URL /content/cookbook/en/home.html, which matches the JCR structure for the website. This is not the URL path most people would like for their site. To address this concern we will use two methods for rewriting URLs within AEM.
Sling Resource Resolver
Inside AEM you can configure the Sling Resource Resolver to filter out the initial path of your site structure. To achieve this, edit the Apache Sling Resource Resolver Factory in the system console’s configuration section (/system/console/configMgr). Add an entry under the URL Mapping property to remove the beginning portion of the URL you want remapped. In my case I have entered /content/cookbook/-/ so that /en/books.html now resolves to /content/cookbook/en/books.html. This applies to all sites on your instance, so you may want to review your site structure to avoid a conflict.
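The prefix mapping described above can be illustrated with a small plain-Python sketch. It is only a simulation of the idea (strip the prefix on the way out, add it back on the way in) and does not use the Sling API; the constant and function names are made up for this example.

```python
# Plain-Python illustration only; this is not the Sling API.
# The prefixes come from the example above; the names are hypothetical.
INTERNAL_PREFIX = "/content/cookbook/"
EXTERNAL_PREFIX = "/"


def resolve(request_path):
    """Incoming direction: turn a short request path into the JCR path."""
    if request_path.startswith(INTERNAL_PREFIX):
        return request_path  # already a full content path
    if request_path.startswith(EXTERNAL_PREFIX):
        return INTERNAL_PREFIX + request_path[len(EXTERNAL_PREFIX):]
    return request_path


def map_outgoing(resource_path):
    """Outgoing direction: shorten a JCR path for use in rendered links."""
    if resource_path.startswith(INTERNAL_PREFIX):
        return EXTERNAL_PREFIX + resource_path[len(INTERNAL_PREFIX):]
    return resource_path


print(resolve("/en/books.html"))
print(map_outgoing("/content/cookbook/en/books.html"))
```

Because the prefix is applied to every request in this sketch, it also shows why a second site under /content would conflict with the same short paths.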
Vanity URLs
For some sites there might be a requirement to create a friendly URL for navigating into your site. In my case I want to type http://localhost:4502/books to navigate to the /en/books.html page. In this scenario I may just decide to edit the Vanity URL property for the books.html page. I can specify that /books is the vanity URL and any traffic to that URL will be redirected to books.html. This can be convenient for sites with only a couple of vanity URLs, but it isn’t ideal since it can be edited by an author.
Sling Resource Mappings
If you wish to keep URL mapping rules outside of the author’s control then you should use the Resource Mapping features in Sling. Under /etc/map/http you can create nodes of the JCR type sling:Mapping that allow you to do the same thing as vanity URLs. These nodes require two properties to be set: sling:match and sling:internalRedirect. The sling:match property uses a regular expression to evaluate the URL to match. If the URL matches, the request is redirected internally to the path set in the sling:internalRedirect property. In the example application, the matched path localhost.4502/authors is redirected to the /content/cookbook/en/authors.html page.
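As a rough illustration of how a sling:match / sling:internalRedirect pair behaves, here is a plain-Python sketch using the values from the example. It is a simplification under stated assumptions: real Sling also takes the scheme and the node path under /etc/map into account, and the function name and the full-match rule used here are choices made for this example only.

```python
import re

# (sling:match, sling:internalRedirect) pairs; the values are taken from
# the example above. The list itself is a stand-in for nodes in /etc/map/http.
MAPPINGS = [
    (r"localhost.4502/authors", "/content/cookbook/en/authors.html"),
]


def internal_redirect(host_and_port, path, mappings=MAPPINGS):
    """Return the internal target for a request, or None if nothing matches.

    The host is written as host.port (as in the example), so the colon is
    replaced with a dot before matching. Note that "." in the pattern is a
    regular-expression wildcard and therefore also matches the literal dot.
    """
    candidate = host_and_port.replace(":", ".") + path
    for pattern, target in mappings:
        if re.fullmatch(pattern, candidate):
            return target
    return None


print(internal_redirect("localhost:4502", "/authors"))
print(internal_redirect("localhost:4502", "/books"))
```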

I'll give it a try:
URL Redirect Rules -> This sounds to me more like mod_rewrite in Apache.
ResourceResolver settings -> Can be configured in OSGi (Apache Sling Resource Resolver Factory). Usually the path to a page starts with /content/sitename/language. The language may be interesting for the visitor, but the first two segments are not, so you want to map /content/sitename/ to / so that you can call mydomain.com/language in the browser.
Sling mapping -> More or less the same logic as the ResourceResolver settings, but instead of configuring the ResourceResolver in OSGi you keep a mapping below /etc/map/http.
Vanity URL -> This is more like an alias for a path, mostly used for marketing URLs such as mydomain.com/product1, which could point to /content/sitename/language/products/product1. It would not make sense to have a ResourceResolver or Sling mapping for each product.
Vanity Domain -> Linked to the vanity URL, so you can have the same vanity URL for different domains: mydomain.com/product1 points to a different site than myseconddomain.com/product1.

Related

ACS AEM Commons Sitemap (Adobe Experience Manager)

I have an AEM 6.3 site which uses ACS AEM Commons 3.15.12 sitemap functionality. It's configured on publish instances to use the 'publish' externalizer domain. The rendered sitemap has the correct hostname in the sitemap URLs.
When I add an additional homepage component (for the new site) in the same sitemap config as the existing working one, keeping publish as the externalizer domain, the new site's sitemap doesn't have the new site's domain name in the generated URLs; instead it has http://localhost:4503.
The working site (sitemap) does have some /etc/map/http mappings, which I recreated in kind for the new site, but again, when using the same config (with a home page component for each site), http://localhost:4503 remained as the domain name for my new site in its ACS AEM Commons generated sitemap.xml.
I did not try creating a new config with the new site (and its home page component), using publish as the externalizer domain, and with the new mappings I created.
I did however create a new config, with the new site's homepage component, and using a custom externalizer domain, which I created to match my new site's correct domain name, and did not have any /etc/map/http maps for the new site. In this case, the generated sitemap had the correct domain name in its sitemap.xml.
I'm trying to understand what's going on. Why the different behavior in the domain names printed in the generated sitemap.xml files for each site? Also, why does ACS AEM Commons want a home page component when a path could indicate the root of a site? It makes me wonder if my new site's home page component is missing something, so as not to work (i.e. causing the ACS AEM Commons sitemap to show http://localhost:4503 instead of the site's domain name), or maybe it's mapping related, or something else?
Seeking clarity (09/08/21):
The first site in my AEM to use ACS Commons Sitemap is using "publish" (which maps to http://localhost:4503) as the externalizer domain. How is the generated sitemap for this site getting the correct domain in this case? The only other info in the ACS Commons Site Map config for this site is the sling resource type for this site's homepage component.
Additionally, there are several /etc/map/http/<xxx_site:80> entries for this site, including one for sitemap (a redirect to home.sitemap.xml). I have a feeling these entries are how the sitemap has the correct domain while only having "publish" as the externalizer domain? The protocol shows as http however, could this be changed to https by creating similar /etc/map/https entries?
Instead of creating: publish1 https://www.yourfirstdomain.com, publish2 https://www.yourseconddomain.com for additional sites as suggested (and this does seem to work), could I use the same "publish" externalizer domain, in a new/separate ACS Site Map config, as the first site does, in conjunction with similar /etc/map/http(s) entries for the additional sites/domains?
The default externalizer configuration for the "publish" domain is "http://localhost:4503".
For your new or existing domains, first configure the Day CQ Link Externalizer with one entry per site (publish1, publish2, and so on):
publish1 https://www.yourfirstdomain.com
publish2 https://www.yourseconddomain.com
After this, you can enter the respective domain name (publish1, publish2, ...) as the externalizer domain in the ACS AEM Commons sitemap servlet configuration.
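To see why the plain "publish" entry yields http://localhost:4503 while a dedicated entry yields the real hostname, here is a small plain-Python sketch of a name-to-base-URL lookup. It only mimics the idea of the externalizer's "name URL" entries; the function names and the sample content path are hypothetical and this is not the AEM Externalizer API.

```python
def parse_domains(entries):
    """Parse 'name base-url' entries into a dictionary."""
    domains = {}
    for entry in entries:
        name, base_url = entry.split(None, 1)
        domains[name] = base_url.strip().rstrip("/")
    return domains


def external_link(domains, domain_name, path):
    """Prefix a path with the base URL configured for the given name."""
    return domains[domain_name] + path


DOMAINS = parse_domains([
    "publish http://localhost:4503",
    "publish1 https://www.yourfirstdomain.com",
    "publish2 https://www.yourseconddomain.com",
])

# Hypothetical page path, used only for illustration.
print(external_link(DOMAINS, "publish", "/en/home.html"))
print(external_link(DOMAINS, "publish2", "/en/home.html"))
```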

How do I extend the vanity URL functionality within AEM

I'm working in AEM 6.3. I'm trying to extend the default behavior of vanity URLs such that the following will happen:
User navigates to vanity URL and is redirected to actual URL
An ID that is associated (and authorable) is appended to the URL
Profit?
So I'd be extending the default page properties vanity functionality to include an ID.
Example:
Vanity URL: /foobar
ID: 123
Actual URL: www.test.com/plans
Resulting URL: www.test.com/plans?123
I've been able to modify the page properties to include a new field associated with the vanity URL in the UI. It doesn't appear to be saving the actual value though.
I haven't yet tried to apply this saved value to the URL through the dispatcher, and how that would work is still very vague to me.
Yes, you could do that, but you will have to build your page rendering logic based on the URL used for accessing the page. I guess you could manage with two public-facing URLs: the vanity URL and the final URL constructed from the ID.
As for dispatcher configuration, see the official documentation linked below on handling vanity URLs:
https://docs.adobe.com/content/help/en/experience-manager-dispatcher/using/configuring/dispatcher-configuration.html#enabling-access-to-vanity-urls-vanity-urls
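The redirect behaviour asked about in the question can be sketched in plain Python as follows. This is only an illustration of the intended result (vanity path in, target URL with the authored ID appended out); the table, the function name and the default host are assumptions for this example, and nothing here is AEM or dispatcher code.

```python
# Hypothetical lookup table: vanity path -> (actual path, authored ID).
VANITY = {
    "/foobar": ("/plans", "123"),
}


def redirect_target(vanity_path, host="www.test.com"):
    """Build the redirect target for a vanity path, or None if unknown."""
    entry = VANITY.get(vanity_path)
    if entry is None:
        return None
    target_path, authored_id = entry
    url = host + target_path
    # Append the ID as the query string only when one was authored.
    return f"{url}?{authored_id}" if authored_id else url


print(redirect_target("/foobar"))
```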

TYPO3 construct URL with several placeholders

I am not a developer but I need help talking to my web developer. We have a company website with staff profiles made with TYPO3. On each profile page, we want to embed a URL to a repository website, which will show a list of the person's publications. The URL encodes a SOLR search string for this repository. So, the URL is the same for all staff profiles except for a personal identifier somewhere in the middle. Instead of typing the complete URL into every staff profile page, I would prefer to have a URL constructed with placeholders like this
placeholderA+staffID+placeholderB
In case the SOLR search string changes in the future (i.e. excluding certain document types in this search or changing the sorting) we would have to change only the placeholders, not the complete URL in each and every profile page. There has to be a simple way to do this, but my web developer tells me this would require a database with the staff IDs and a lot of scripting. Is she right? Since we have to touch every profile page to include the URL initially, it would be ok to enter and store the unique staff identifier on each profile page. I just want to avoid touching each page again when changes are required.
It is such a useful concept, there has to be a plugin or something to do this already. Can you throw me some keywords or hints for our next discussion?
How is your URL built?
In general, URLs are built from multiple parts:
[[[<protocol>]<domain>]<path>[?<parameters>]][#<anchor>]
Now we need to identify where you want to insert the staffID:
is it in <path> or in <parameter>?
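The parts named above can be inspected with Python's standard library. The URL below is a made-up example, not the repository's real search URL.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, used only to show where each part sits.
parts = urlsplit("https://www.example.org/staff/profile?id=123&sort=date#publications")

print(parts.scheme)    # protocol
print(parts.netloc)    # domain
print(parts.path)      # path
print(parts.query)     # parameters
print(parts.fragment)  # anchor
print(parse_qs(parts.query))
```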
In TYPO3 we don't have a real <path>. It is just a facade that hides the basic URL index.php with some parameters, first of all the parameter &id=123.
Further parameters can appear as query parameters or as path segments. Either way, TYPO3 handles the translation between the virtual path and the parameters (through an extension like realurl or through the core). For URL generation you call TYPO3 with a list of parameters; when a 'page' is requested, you end up with a list of parameters that determines the rendering.
You could instead mix up the URL generation yourself:
You take a generated URL and add path segments as you 'guess' them, without involving TYPO3. This can cause problems, because you will call the server with a URL that TYPO3 does not know about, since it has not generated it. When TYPO3 generates a URL, it stores it in the database together with its 'translation' into parameters.
realurl can guess the translation, but sometimes it fails, especially if cHashes are used.
What are cHashes?
With cHashes, TYPO3 secures the page cache against unrelated parameters. How a page is rendered depends on its parameters, and any additional parameter might result in a different page, so TYPO3 stores a hash of the parameters with each cached page. For verification, this hash can be added to the URL. This additional parameter is also stored in the 'translation' table.
If you now add parameters to a stored URL and it gets translated back to parameters, you have a cHash parameter that identifies the cached page, but only for some of the parameters. Your added parameters were not known and not considered when the page was generated and stored in the cache. If the cached page is delivered, your additional parameters are 'lost'.
So it is necessary to include all parameters when generating the URL with TYPO3.
Adding the staffID must therefore be done by TYPO3 and cannot be a concatenation of path segments in the HTML templates (or JavaScript).
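The 'lost parameter' effect described above can be illustrated with a toy cache in plain Python. This is a sketch of the behaviour as described in this answer, not TYPO3's actual implementation: the hash function, the secret, and all function names are assumptions made for the example.

```python
import hashlib
from urllib.parse import urlencode

SECRET = "hypothetical-encryption-key"  # stand-in, not a real TYPO3 setting
page_cache = {}


def c_hash(params):
    """Hash the parameters in a stable order, together with a secret."""
    canonical = urlencode(sorted(params.items()))
    return hashlib.sha256((canonical + SECRET).encode("utf-8")).hexdigest()[:16]


def render(params):
    """Pretend to render a page whose content depends on the parameters."""
    return "page for " + urlencode(sorted(params.items()))


def generate_url(params):
    """Generate a URL and cache the page under the hash of its parameters."""
    digest = c_hash(params)
    page_cache[digest] = render(params)
    return "index.php?" + urlencode({**params, "cHash": digest})


def serve(query):
    """Deliver the cached page if the cHash is known, else render fresh."""
    params = dict(query)
    digest = params.pop("cHash", None)
    if digest in page_cache:
        return page_cache[digest]  # parameters added later are ignored here
    return render(params)


url = generate_url({"id": "123"})
digest = c_hash({"id": "123"})
print(url)
print(serve({"id": "123", "cHash": digest, "staffID": "42"}))
```

In this sketch the hand-appended staffID never reaches the rendering step, because the known cHash short-circuits to the cached page.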
If you later change your parameters, you need to change how your URLs are generated.
I would recommend adding the staffID as a field to the record and generating the URL to the document list with TYPO3.
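The placeholderA+staffID+placeholderB idea from the question amounts to keeping the two fixed parts in one place and storing only the staff ID per record. A minimal plain-Python sketch, assuming a made-up repository URL and search syntax (in a real TYPO3 site this logic would live in the site's own templating or configuration, not in Python):

```python
from urllib.parse import quote

# Hypothetical fixed parts of the search URL; change them here once
# and every generated profile link follows.
PREFIX = "https://repository.example.org/search?q=author_id:"
SUFFIX = "&sort=year+desc"


def publications_url(staff_id):
    """Build the publication-list URL for one staff member."""
    return PREFIX + quote(str(staff_id), safe="") + SUFFIX


print(publications_url("a42"))
```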

How Linkchecker works in AEM 6.2

I am working on the link checker and want to know when AEM saves URLs in /var/linkchecker, and on what basis.
Does it save a link when I open it, or does it poll, traversing the complete content and putting the links in /var/linkchecker?
Which Java class stores the valid and invalid links in this storage directory?
LinkChecker is based on an event handler that reacts to creates and updates of /content (and child) nodes. All content is parsed and links are validated against allowed protocols and (configurable) external site links.
External Links
All the validation is done asynchronously in the background and the HTML is updated based on verification results.
/var/linkchecker is the cache for external links. The results are based on simple GET requests to the external links. An HTTP 200/30x response means that the link is valid. AEM looks at this cache before requesting validation of an external link in order to optimize page processing. This also means that link validation is NOT real time and the delay is proportional to the load on your server.
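The cache-before-request behaviour described above can be sketched in plain Python. This is an illustration of the idea only, not the AEM link checker: the class and function names are made up, and the HTTP call is passed in as a function so the sketch stays self-contained.

```python
def is_valid_status(status):
    """Treat HTTP 200 and 30x responses as valid, as described above."""
    return status == 200 or 300 <= status < 400


class LinkCache:
    """Remember validation results so each external URL is fetched once."""

    def __init__(self, fetch_status):
        # fetch_status is any callable that takes a URL and returns an
        # HTTP status code (for example a wrapper around a GET request).
        self._fetch_status = fetch_status
        self._results = {}

    def check(self, url):
        if url not in self._results:
            self._results[url] = is_valid_status(self._fetch_status(url))
        return self._results[url]

    def revalidate(self, url):
        """Drop the cached result and check the link again."""
        self._results.pop(url, None)
        return self.check(url)
```

A caller would construct `LinkCache` with its own status-fetching function and call `check(url)` for each external link found in the content.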
All the links that have been checked can be seen on the /etc/linkchecker.html screen, where you can request revalidation and refresh the status of the links.
You can configure the frequency of this background check via the Day CQ Link Checker Service configuration under /system/console/configMgr. The default interval is 5 seconds (scheduler.period parameter).
Under the config manager /system/console/configMgr you will find a lot of other Day CQ Link * configurations that control this feature.
For example, Day CQ Link Checker Transformer contains config for all the elements that need to be transformed by the link checker.
Similarly Day CQ Link Checker Info Storage Service configures the link cache.
Internal Links
Internal links are ignored unless they use an FQDN, like external URLs (which is not normally the case on author). The only exception is a multi-tenant environment where a page from one site links to another site and all the mapping information is stored in Sling mappings.

Vanity URL to a content page when activated does not reflect in publish instance in AEM 6.1

What are the best practices for assigning a vanity URL to a content page in AEM 6.1?
When an author sets a vanity path on a page and activates it, the vanity path does not take effect on publish. The problem I observed is this: when the vanity property of the page is saved, a rewrite rule is saved at the map location, which is generally /etc/map unless it has been changed.
When the page containing the vanity path is activated, this rewrite rule is not activated along with it, although the JCR resource resolver map location is the same (/etc/map) on the publish and author instances.
Therefore, I would like to understand how to activate the resource resolver rewrite rule along with the page activation. Or is it best practice that control over vanity URLs should not be given to page editors, and that they should only be set by an administrator directly on the publish instance?
/etc/map has nothing to do with vanity URLs. In /etc/map you can manually add path/host to resource mappings.
When an editor adds a vanity URL, the resource resolver(s) catch that event and add the URL to their list. If a page with a vanity URL is published, the AEM publish servers also add the vanity URL automatically to their resolvers.
Take a look at /system/console/jcrresolver on both author and publish. You should see the vanity url on both machines.
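The point that vanity URLs travel with the page, rather than with /etc/map, can be pictured with a plain-Python sketch. Each instance builds its own lookup from the pages it holds, so a vanity path appears on publish only once its page has been activated. The page paths and the function name are hypothetical; sling:vanityPath is the page property that holds the vanity path, and the rest is a simulation rather than resolver code.

```python
def vanity_map(pages):
    """Collect vanity path -> page path from the pages an instance holds."""
    lookup = {}
    for page_path, props in pages.items():
        for vanity in props.get("sling:vanityPath", []):
            lookup[vanity] = page_path
    return lookup


# Hypothetical content: author holds both pages, publish only the
# one that has been activated.
author_pages = {
    "/content/site/en/products/product1": {"sling:vanityPath": ["/product1"]},
    "/content/site/en/draft": {"sling:vanityPath": ["/draft"]},
}
publish_pages = {
    "/content/site/en/products/product1": {"sling:vanityPath": ["/product1"]},
}

print(vanity_map(author_pages))
print(vanity_map(publish_pages))
```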
If you want your vanity url on root level, it should start with "/". Other things that might prevent the vanity url from working:
You might have additional mapping rules on /etc/map(.*).
There is a dispatcher in front of your CQ/AEM Publish server that is filtering or manipulating the incoming urls.
What exactly is the vanity url added by your editor (content path and content of field vanity url) and what is the url you are calling to get the page via vanity url on the publish instance?