What am I missing for this CQ5/AEM URL rewriting scenario? - aem

I basically want short URLs to get resolved and HTML pages to be generated with short URLs for a CQ5 website. So far short URLs are getting mapped to long URLs as expected, but links in the generated HTML pages are not getting shortened.
For example, I am expecting the src attribute of the following <script> tag:
<script type="text/javascript" src="/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/style/clientlibs.1395978029951.js"></script>
To be shortened to:
<script type="text/javascript" src="/style/clientlibs.1395978029951.js"></script>
But it is not and remains intact. href attributes in anchor elements are not getting shortened either.
In JCR, the website is stored under /content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/ and I have configured my /etc/hosts and Apache config files to make it accessible via http://site-1:4503 in my local development environment.
I have defined the following URL mappings:
{
"jcr:primaryType":"sling:Folder",
"http":{
"jcr:primaryType":"sling:Folder",
"site-1.4503":{
"sling:internalRedirect":[
"/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae"
],
"jcr:primaryType":"sling:Mapping",
"redirect":{
"sling:internalRedirect":[
"/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/$1",
"/$1"
],
"jcr:primaryType":"sling:Mapping",
"sling:match":"(.+)$"
}
},
"site_1.4503":{
"sling:internalRedirect":[
"/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/home.html"
],
"jcr:primaryType":"sling:Mapping",
"sling:match":"site-1.4503/$"
}
}
}
When I test this mapping in JCR Resolver (http://localhost:4503/system/console/jcrresolver), it is working as expected. For example,
/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/style/clientlibs.1395978029951.js
is mapped to
http://site-1:4503/style/clientlibs.1395978029951.js
and
http://site-1:4503/style/clientlibs.1395978029951.js
is resolved to:
JcrNodeResource,
type=cq:ClientLibraryFolder,
superType=null,
path=/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/style/clientlibs
Also when I go to http://site-1:4503/style/clientlibs.1395978029951.js in my browser, the JS file is rendered as expected.
However when I view the HTML source for the home page, as I mentioned earlier, none of the long URLs are rewritten to their shortened forms.
Any ideas what I am missing here?

By default, CQ rewrites links in a, area and form tags. If you'd also like to rewrite paths in script tags, open the OSGi configuration for the LinkCheckerTransformerFactory service on publish and add the following string to the Rewrite Elements option:
script:src
BTW: /content is not the best place for storing clientlibs. Usually we put this stuff in /etc/designs/YOUR_APP.

We finally managed to pinpoint the issue and fix this.
Somebody had added a com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl.xml under /apps/myapp/config.publish with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
jcr:primaryType="sling:OsgiConfig"
service.bad_link_tolerance_interval="{Long}48"
service.check_override_patterns="[^.]"
service.special_link_patterns=".*
"
service.special_link_prefix="[javascript:,data:,mailto:,#,<!--,${]"/>
I think the combination of check_override_patterns and special_link_patterns had disabled link shortening.
Removing this file made link shortening work again.
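To confirm that shortening is back, it can help to scan the rendered HTML for paths that still carry the long JCR prefix. Below is a rough sketch in plain JavaScript (run with Node.js). The helper name findUnshortenedLinks is made up for illustration and is not part of CQ, and the regex only looks at double-quoted src, href and action attributes:

```javascript
// Hypothetical helper (not a CQ API): returns every double-quoted
// src/href/action value that still starts with the long JCR prefix.
function findUnshortenedLinks(html, longPrefix) {
  const attrPattern = /\b(?:src|href|action)\s*=\s*"([^"]*)"/gi;
  const hits = [];
  let match;
  while ((match = attrPattern.exec(html)) !== null) {
    if (match[1].startsWith(longPrefix)) {
      hits.push(match[1]);
    }
  }
  return hits;
}

// Sample markup built from the paths in the question.
const prefix = '/content/foo/c0/06/9d/3d93a858-efb4-4619-8f9e-5edc65d0f5ae/';
const sample =
  '<script type="text/javascript" src="' + prefix +
  'style/clientlibs.1395978029951.js"></script>' +
  '<a href="/home.html">Home</a>';

// Lists the one script path that was not shortened.
console.log(findUnshortenedLinks(sample, prefix));
```

An empty array for the home page's source would suggest the rewriter is shortening those attributes again.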


How to prevent foreign GET parameters in TYPO3's canonical tag?

If an uncached page is called in the frontend with a GET parameter that is not foreseen and has been appended to the URL from a link of an external source, like a tracking parameter or something worse e.g. …
https://www.example.com/?note=any-value
… then this foreign parameter is passed on in the automatically generated canonical tag, created by TYPO3's core extension ext:seo. It looks like this:
<link rel="canonical" href="https://www.example.com/?note=any-value&cHash=f2c206f6f14a424fdbf82f683e8bf383"/>
In addition, the page is saved in the cache with this parameter. This means that subsequent visitors will also receive this incorrect canonical tag, even if they call up the page https://www.example.com/ without the parameter.
Is this a bug (tested on TYPO3 10.4.15) or can it be disabled for all unknown parameters by configuration?
If you know the parameter, you can exclude it in the global configuration …
[FE][cacheHash][excludedParameters] = L,pk_campaign,pk_kwd,utm_source,utm_medium,…
… or via ext_localconf.php in the sitepackage:
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'][] = 'tlbid';
I am only concerned with parameters that were not expected. It might make sense to turn the concept around and basically exclude all parameters except for a few self-defined allowed parameters, but I don't know if that is possible so far.
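To illustrate the allow-list idea: the sketch below is only the concept written in plain JavaScript, not TYPO3 code or configuration. The function name and the allow-list contents are made up, and parameter names are compared exactly (so an array-style name such as sword_list[0] would need its own handling):

```javascript
// Concept sketch only (not TYPO3 API): keep the query parameters that are
// explicitly allowed and drop everything else before building the canonical URL.
function canonicalWithAllowedParams(rawUrl, allowedParams) {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  for (const [name, value] of url.searchParams) {
    if (allowedParams.includes(name)) {
      kept.append(name, value);
    }
  }
  url.search = kept.toString();
  url.hash = '';
  return url.toString();
}

// The foreign "note" parameter is dropped because it is not on the allow-list.
console.log(canonicalWithAllowedParams('https://www.example.com/?note=any-value', ['L']));
```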
Got it. Actually, TYPO3 handles these already for other common tracking and additional params, like L, utm_campaign, fbclid etc. The whole list of excluded params can be found in the source code.
To add your own, edit (or create) the typo3conf/AdditionalConfiguration.php file, for example:
<?php
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'][] = 'note';
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'][] = 'foo';
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'][] = 'bar';
or
<?php
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'] = array_merge(
$GLOBALS['TYPO3_CONF_VARS']['FE']['cacheHash']['excludedParameters'],
['note', 'foo', 'bar']
);
Don't forget to clear the caches afterwards :D (that should be TYPO3's slogan)
It's a bug. The extension urlguard2 solves this issue.
It doesn't work for me in TYPO3 v11.5.16.
LocalConfig:
[FE][cacheHash][excludedParameters] = L,tx_solr,sword_list,utm_source,utm_medi…
Browser URL:
https://www.example.org/testfaelle/test?sword_list%5B0%5D=testf%C3%A4lle&no_cache=1
The HTML Frontend canonical is:
<link rel="canonical" href="https://www.example.org/testfaelle/test?sword_list%5B0%5D=testf%C3%A4lle&cHash=e81add4ca148ad10189b9cbfa4d57100">
Debugging:
If I open the file "/typo3/sysext/frontend/Classes/Utility/CanonicalizationUtility.php" and add the parameter directly with $paramsToExclude[] = 'sword_list'; it works:
<link rel="canonical" href="https://www.example.org/testfaelle/test">

Submit form to rewritten URLs?

I am trying to create nice URLs for my Magento search form, to make:
http://domain.com/catalogsearch/result/?q=KEYWORD
look like this:
http://domain.com/search/KEYWORD
I have written this in my .htaccess file:
RewriteRule ^search/([^/]+)/?$ /catalogsearch/result/?q=$1 [QSA,P,NC]
Which works nicely, when I type in http://domain.com/search/KEYWORD it displays the results as it should.
BUT...
I can't work out how to get my search form to go to the nice format URL; it still goes to the original.
My search form is currently like this:
<form id="search_form" action="http://domain.com/catalogsearch/result/" method="get">
<input id="search" type="search" name="q" value="KEYWORD" maxlength="128">
<button type="submit">search</button>
</form>
Any point in the right direction much appreciated.
There are a couple of things going on here, so let me try to explain the best I can.
First and foremost, your main issue is the generation of this new "pretty" search URL. When you use a <form> with method="GET", each input (i.e. <input name="q">) will get appended to the form's action as a query parameter (you'll get /search?q=foo instead of /search/foo).
In order to fix this, you need to do two things:
1. Change your form tag to look like this:
<form id="search_form" action="<?php echo Mage::getUrl('search'); ?>" method="GET">
This will ensure that the form is submitted to /search instead of /catalogsearch/result. (You'll still get a ?q=foo, though, and that will be resolved in #2.)
2. Add a bit of JavaScript which hijacks the form submission and forms the desired URL:
var form = document.getElementById('search_form'),
input = document.getElementById('search');
form.onsubmit = function() {
// navigate to the desired page; encode the keyword so spaces and special characters survive in the path
window.location = form.action + encodeURIComponent(input.value);
// don't actually submit the form
return false;
};
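If you'd rather keep the URL building separate from the DOM wiring, a variant could look like the sketch below. The function name buildSearchUrl is made up for illustration; it assumes the form's action may or may not end in a slash, and it trims the keyword before encoding it:

```javascript
// Builds the pretty search URL from the form's action and the typed keyword.
// Adds the separating slash if the action lacks one and URL-encodes the keyword.
function buildSearchUrl(action, keyword) {
  const base = action.endsWith('/') ? action : action + '/';
  return base + encodeURIComponent(keyword.trim());
}

// Browser-only wiring; skipped entirely when there is no DOM (e.g. under Node.js).
if (typeof document !== 'undefined') {
  const form = document.getElementById('search_form');
  const input = document.getElementById('search');
  if (form && input) {
    form.addEventListener('submit', function (event) {
      // Stop the normal GET submission and navigate to the pretty URL instead.
      event.preventDefault();
      window.location.href = buildSearchUrl(form.action, input.value);
    });
  }
}
```

One caveat: a keyword containing a slash is encoded as %2F, and Apache rejects encoded slashes in the path unless AllowEncodedSlashes is enabled, so such searches may need special handling.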
That'll get you up and running, but there are still some other issues which you should resolve.
Using RewriteRule-based rewrites with Magento does not work well. I haven't quite figured out the technical reason for this, but I've had the same trouble that you're having. Your rewrite works with the P flag because that flag turns the rewrite into a proxy request. This means that your web server will make another request to itself with the new URL, which avoids the typical RewriteRule trouble you'd run into.
So, how do you utilize a custom pretty URL without using RewriteRule? You use Magento's internal rewrite logic! Magento offers regex-based rewrite logic similar to RewriteRule through its configuration XML:
<config>
<global>
<rewrite>
<some_unique_identifier>
<from><![CDATA[#/search/(.*)/?$#]]></from>
<to><![CDATA[/catalogsearch/result/index/q/$1/]]></to>
<complete />
</some_unique_identifier>
</rewrite>
</global>
</config>
By putting that configuration in one of your modules, Magento will internally rewrite requests of the form /search/foo to /catalogsearch/result/index/q/foo/. Note that you have to use Magento's custom parameter structure (name-value pairs separated by /), as it will not parse query string parameters after it performs this internal rewrite. Also note that you have to specify the full module-controller-action trio (/catalogsearch/result/index/) because otherwise q would be interpreted as an action name, not a parameter name.
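To make the mapping concrete, here is the same transformation emulated in plain JavaScript. It only illustrates what the from/to pair is meant to do and is not Magento code; the function name rewriteSearchPath is made up, and the capture group is narrowed to [^/]+ here so that an optional trailing slash is not swallowed into the keyword:

```javascript
// Illustration only: mimics the intended /search/<keyword> rewrite.
// Paths that do not match are returned unchanged.
function rewriteSearchPath(path) {
  const match = /^\/search\/([^\/]+)\/?$/.exec(path);
  if (!match) {
    return path;
  }
  // Full module/controller/action trio, then the name/value pair q/<keyword>.
  return '/catalogsearch/result/index/q/' + match[1] + '/';
}

console.log(rewriteSearchPath('/search/foo'));
console.log(rewriteSearchPath('/checkout/cart'));
```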
This is much better than using a proxy request because it doesn't issue a secondary request, and the rewrite happens in Magento's core route handling logic.
This should be enough to get you completely up and running on the right path. However, if you're interested, you could take this one step further.
By using the above techniques, you'll end up with three URLs for your searches: /search/foo, /catalogsearch/result/?q=foo, and /catalogsearch/result/index/q/foo. This means that you essentially have three pages for each search query, all with the same content. This is not great for SEO purposes. To combat this drawback, you can create a 301 permanent redirect from the latter two URLs to your pretty URL, or you can use a <link rel="canonical"> tag to tell search engines that your pretty URL is the main one.
Anyways, I hope that all of this helps and puts you on the right track!

Does the testandtarget.js file have a purpose?

While analyzing some requests on our dispatcher, we noticed that we continually get a 0 byte file generated from hitting the following path
/etc/clientlibs/foundation/testandtarget
This file is a ClientLibraryFolder. Its js.txt defines the base file as such:
#base=source
There is no "source" folder that is a direct child of testandtarget. The testandtarget folder contains two ClientLibraryFolders, mbox and util. The js in these folders is loaded on the page just fine. This is why Test&Target still works. However, the testandtarget ClientLib seems to be wrong by default (this is the OOB 5.5 setup). We get a 0 byte file because the js.txt file's base points to a folder that does not exist.
Is anyone else seeing the behavior? It appears that I could just rewrite the js.txt file. Are there any ramifications for doing so?
As best I can tell, that node is an empty clientlib, but it has a child node of "mbox" with the same clientlibrary category. That clientlibrary WILL produce content, and references a source folder beneath it.
http://{localhost}/libs/cq/ui/content/dumplibs.test.html?categories=testandtarget
http://{localhost}/libs/cq/ui/content/dumplibs.html?categories=testandtarget&type=JS&theme=
I am not aware of the version history, and whether it used to have valid content, or is planned to in the future.
I would be more tempted to remove or change the category than to play with the js.txt file. Editing the js.txt file will change what content goes into the clientlib. Changing/removing the category would no longer cause a call out to the zero byte file.
<cq:includeClientLib categories="testandtarget" />
=>
<script type="text/javascript" src="/etc/clientlibs/foundation/testandtarget/mbox.js"></script>
<script type="text/javascript" src="/etc/clientlibs/foundation/testandtarget.js"></script>

Lazy load github gist files to display source code on the website

I have a couple of gists which I need to include in a website post to showcase the source code. Currently, I'm inlining each of the gists at various places in the HTML with script tags; however, each of these is a blocking call. So, is there a way to load the gists dynamically and insert them at specific places in the page?
I tried something like below :-
<html>
<body>
<div id="bookmarklet_1.js"></div>
<div id="bookmarklet_2.js"></div>
<div id="bookmarklet_3.js"></div>
<script type="text/javascript">
var scriptMap = {'bookmarklet_1.js' : 'https://gist.github.com/892232.js?file=bookmarklet_1.js',
'bookmarklet_2.js' : 'https://gist.github.com/892234.js?file=bookmarklet_2.js',
'bookmarklet_3.js' : 'https://gist.github.com/892236.js?file=bookmarklet_3.js'};
var s, scr, holder;
for(s in scriptMap){
holder = document.getElementById(s);
scr= document.createElement('script');
scr.type= 'text/javascript';
scr.src= scriptMap[s];
holder.appendChild(scr);
}
</script>
</body>
</html>
The above didn't work for me; it seems that each script does a document.write internally to write the CSS and source code. Has anyone tried this before or got it working?
I started a project exactly for this purpose. Dynamically-embedded Gists
Try it now: http://urlspoiler.herokuapp.com/gists?id=992729
Use the above url as the src of a dynamically-created iframe, or add &format=html to get the Gist html snippet via ajax, then put it anywhere you want. (The gist in the above url also happens to be the documentation for how to use this project.)
I myself wanted to do exactly the same thing (with the addition of even removing the default gist style link) - ended up building a "generic" script loader that handles document.write calls :
https://github.com/kares/script.js
Here's how one can use it for embedding gists (and pasties) :
https://github.com/kares/script.js/blob/master/examples/gistsAndPasties.html
You can now get the HTML + CSS directly using JSONP.
I wrote a fuller answer in response to this question, but the key is that you can get the HTML + CSS using JSONP.
For example: https://gist.github.com/anonymous/5446989.json?callback=callback12345
callback12345({
"description": "Function to load a Gist without an iframe",
"public": true,
...
"div": <HTML code>,
"stylesheet": <URL of CSS file>
})
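A minimal sketch of how that JSONP response could be consumed in the browser follows. The function names and the generated callback names are made up for illustration; the only things taken from the response above are the "div" and "stylesheet" fields:

```javascript
// Builds the JSONP URL in the same shape as the example above.
function buildGistJsonpUrl(gistPath, callbackName) {
  return 'https://gist.github.com/' + gistPath + '.json?callback=' +
    encodeURIComponent(callbackName);
}

let gistCounter = 0;

// Browser-only: loads one gist via JSONP and injects its HTML into `holder`.
function loadGist(gistPath, holder) {
  const callbackName = 'gistCallback' + (gistCounter++);
  const script = document.createElement('script');

  // Temporary global callback that the JSONP response will invoke.
  window[callbackName] = function (data) {
    holder.innerHTML = data.div;

    // Attach the gist stylesheet referenced by the response.
    const link = document.createElement('link');
    link.rel = 'stylesheet';
    link.href = data.stylesheet;
    document.head.appendChild(link);

    // Clean up the callback and the script element.
    delete window[callbackName];
    script.remove();
  };

  script.src = buildGistJsonpUrl(gistPath, callbackName);
  script.async = true;
  document.body.appendChild(script);
}
```

It could then be called as loadGist('anonymous/5446989', document.getElementById('bookmarklet_1.js')) once the holder element exists. Since nothing calls document.write, the scripts can load asynchronously.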

Facelets charset problem

In my earlier post there was a problem with JSF charset handling, but also the other part of the problem was MySQL connection parameters for inserting data into db. The problem was solved.
But I migrated the same application from JSP to Facelets and the same problem happened again. Characters from input fields are replaced when inserted into the database (č is replaced with Ä), but data inserted into the db from SQL scripts with the proper charset is displayed correctly. I'm still using the registered filter, and the page templates include the following meta tag in the head:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
If I insert into h:form tag the following attribute:
acceptcharset="iso-8859-2"
I get correct characters in Firefox, but not in IE7.
Is there anything else I should do to make it work?
Thanks in advance.
Add the following line to the filter:
response.setContentType("text/html;charset=ISO-8859-2");
Don't use acceptcharset attribute. IE has serious bugs with it.
Also, when you're using a <?xml?> declaration at the top of a Facelets XHTML page, ensure that it uses the desired charset, or just remove the whole declaration; it's not strictly required.
<?xml version="1.0" encoding="ISO-8859-2"?>
I think you can look at the implementation of org.springframework.web.filter.CharacterEncodingFilter,
and you can start Tomcat with -Dfile.encoding=ISO-8859-2 as a JVM argument.
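For what it's worth, the č to Ä symptom from the question is consistent with UTF-8 bytes being read as ISO-8859-2 somewhere along the way: č is encoded in UTF-8 as the two bytes C4 8D, and C4 is Ä in ISO-8859-2. The snippet below reproduces that mismatch in plain JavaScript (Node.js, assuming a build with full ICU so that the iso-8859-2 decoder is available); it only demonstrates the effect and says nothing about where in the JSF stack it happens:

```javascript
// Encode "č" as UTF-8, then deliberately decode the bytes with the wrong charset.
const utf8Bytes = new TextEncoder().encode('č');
const misread = new TextDecoder('iso-8859-2').decode(utf8Bytes);

// Hex values of the UTF-8 bytes.
console.log(Array.from(utf8Bytes, function (b) { return b.toString(16); }));

// First character produced by the wrong decoder.
console.log(misread.charAt(0));
```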