CQ - Moving content from one page to another - aem

I realize that this is a pretty specific question, but I would imagine someone has run into this before. I've got about fifty pages or so that were created about a year ago. We're trying to revamp the pages with new components, specifically in the header and the footer, while the content in the main-content area stays the same. So I'm trying to move everything over from the old pages to the new pages but keep just the main-content area. The problem is I can't simply change the resource type on the old pages to point to the new page components, because the content is different and I'd end up with a bunch of nodes in the header and footer that I don't want. For example, here is my current content structure:
Old Content
star-trek
  jcr:content
    header
      nav
      social
      chat
    main-content
      column-one
      column-two
    footer
      sign-up
      mega-menu
New Content
star-wars
  jcr:content
    masthead
      mega-menu
    main-content
      column-one
      column-two
    bottom-footer
      left-links
      right-links
Does anybody have any ideas on how to move just the content in the main-content node and somehow remove the other nodes? I'm trying to do this programmatically because I don't want to create 50 pages from scratch. Any help is appreciated!

You can use the JCR API to move things around at will. I would:
1. Block users from accessing the content in question. This can be done with temporary ACLs, or by closing access on the front end if you can.
2. Run a script or servlet that changes the content using the JCR APIs.
3. Check the results.
4. Let users access the content again.
For the content modification script I suggest a script that modifies a single page (i.e. you call it with an HTTP request that ends in /content/star-trek.modify.txt) so that you can run it either on a single page, for testing, or on a group of pages once it's good.
The script starts from the current node, recurses into it to find nodes that it knows how to modify (based on their sling:resourceType), modifies them, and reports what it did in the logs or on its output.
To modify nodes, the script uses the JCR Node APIs to move things around (and maybe Workspace.move).
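For illustration only, here is a minimal, untested sketch of what such a servlet could look like, assuming the star-trek/star-wars structure from the question and a naming convention that maps each old page to its new counterpart; the selector, paths, and replace rule are all assumptions, not a definitive implementation:
package com.example;

import java.io.IOException;

import javax.jcr.Node;
import javax.jcr.Session;
import javax.servlet.ServletException;

import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

// Hypothetical servlet: registered on the default resource type with a "modify" selector,
// so it can be called as /content/star-trek.modify.txt on one page for testing,
// and then on the remaining pages once the result looks right.
@SlingServlet(resourceTypes = "sling/servlet/default", selectors = "modify", extensions = "txt", methods = "GET")
public class MoveMainContentServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        try {
            Session session = request.getResourceResolver().adaptTo(Session.class);
            Node oldPage = request.getResource().adaptTo(Node.class);                 // e.g. /content/star-trek
            String newPagePath = oldPage.getPath().replace("star-trek", "star-wars"); // assumed naming convention
            Node newContent = session.getNode(newPagePath + "/jcr:content");

            // drop the placeholder main-content that came with the new page, if present
            if (newContent.hasNode("main-content")) {
                newContent.getNode("main-content").remove();
            }
            // move the old main-content subtree (columns and all) under the new page
            session.move(oldPage.getPath() + "/jcr:content/main-content",
                         newContent.getPath() + "/main-content");
            session.save();
            response.getWriter().println("moved main-content to " + newPagePath);
        } catch (Exception e) {
            response.getWriter().println("failed: " + e.getMessage());
        }
    }
}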

It is indeed possible to write code which does what you need:
package com.test;

import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.Repository;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.commons.JcrUtils;

public class Test {

    public void test() throws RepositoryException {
        Session session = null;
        try {
            // Create a connection to the CQ repository running on localhost
            Repository repository = JcrUtils.getRepository("http://localhost:4502/crx/server");
            System.out.println("repository is created");

            // Create a session
            session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
            System.out.println("session is created");

            String starTrekNodePath = "/content/path/";
            String starWarsNodePath = "/content/anotherPath";

            if (session.nodeExists(starTrekNodePath) && session.nodeExists(starWarsNodePath)) {
                System.out.println("to infinity and beyond");
                Node starTrekNode = session.getNode(starTrekNodePath);
                Node starWarsNode = session.getNode(starWarsNodePath);

                // Apply nested looping logic here to loop through all pages under these nodes.
                // Assumption 1: you have similar page titles (or some other convention)
                //               to pair up the source and destination pages.
                // Assumption 2: the destination pages already exist with the new component structure.
                // Once you have a pair of page nodes, the following segment should be all you need.
                Node starTrekChildNode = starTrekNode.getNodes().nextNode(); // replace with your pairing/iteration logic
                Node starWarsChildNode = starWarsNode.getNodes().nextNode(); // replace with your pairing/iteration logic

                // Next we get the jcr:content node of the source and destination pages
                Node starTrekChildJcrNode = starTrekChildNode.getNode("jcr:content");
                Node starWarsChildJcrNode = starWarsChildNode.getNode("jcr:content");

                // Now we fetch the main-content nodes
                Node starTrekChildMCNode = starTrekChildJcrNode.getNode("main-content");
                Node starWarsChildMCNode = starWarsChildJcrNode.getNode("main-content");

                // Now we fetch each component node
                Node starTrekChildC1Node = starTrekChildMCNode.getNode("column-one");
                Node starTrekChildC2Node = starTrekChildMCNode.getNode("column-two");
                Node starWarsChildC1Node = starWarsChildMCNode.getNode("column-one");
                Node starWarsChildC2Node = starWarsChildMCNode.getNode("column-two");

                // Fetch the properties of column-one and column-two from the source
                String prop1 = null;
                String prop2 = null;
                PropertyIterator c1Iterator = starTrekChildC1Node.getProperties();
                while (c1Iterator.hasNext()) {
                    Property prop = c1Iterator.nextProperty();
                    if (!prop.isMultiple()) {
                        prop1 = prop.getString();
                    }
                }
                PropertyIterator c2Iterator = starTrekChildC2Node.getProperties();
                while (c2Iterator.hasNext()) {
                    Property prop = c2Iterator.nextProperty();
                    if (!prop.isMultiple()) {
                        prop2 = prop.getString();
                    }
                }

                // And now we set the values on the destination ("property-name" is a placeholder)
                starWarsChildC1Node.setProperty("property-name", prop1);
                starWarsChildC2Node.setProperty("property-name", prop2);
                // end loops as appropriate

                session.save();
            }
        } finally {
            if (session != null) {
                session.logout();
            }
        }
    }
}
Hopefully this should set you on the right track. You'd have to figure out how you want to identify destination and target pages based on your folder structure in /content, but the essential logic should be the same.
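As a hedged illustration of that pairing step (not part of the original answer), one simple convention is to match pages by name under the two content roots; the root paths and the name-matching rule below are assumptions:
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class PagePairing {

    // Pairs each page under /content/star-trek with the page of the same name
    // under /content/star-wars. Both root paths and the name match are assumptions.
    public void pairPages(Session session) throws RepositoryException {
        Node oldRoot = session.getNode("/content/star-trek");
        Node newRoot = session.getNode("/content/star-wars");
        NodeIterator oldPages = oldRoot.getNodes();
        while (oldPages.hasNext()) {
            Node oldPage = oldPages.nextNode();
            if (!oldPage.isNodeType("cq:Page")) {
                continue; // skip jcr:content and any other non-page children
            }
            if (newRoot.hasNode(oldPage.getName())) {
                Node newPage = newRoot.getNode(oldPage.getName());
                // hand the (oldPage, newPage) pair to the copy logic shown above
            }
        }
    }
}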

The problem with the results I'm seeing here is that you are writing servlets that execute JCR operations to move things around. While technically that works, it's not really a scalable or reusable way to do this. You have to write some very specific code, deploy it, execute it, then delete it (or it lives out there forever). It's kind of impractical and not totally RESTful.
Here are two better options:
One of my colleagues wrote the CQ Groovy Console, which gives you the ability to use Groovy to script changes to the repository. We frequently use it for content transformations like you've described. The advantage of using Groovy is that it's a script (not compiled, deployed code). You still have access to the JCR API if you need it, but the console has a bunch of helper methods to make things even easier. I highly recommend this approach.
https://github.com/Citytechinc/cq-groovy-console
The other approach is to use the Bulk Editor tool in AEM. You can export a TSV of content, make changes, then reimport. You'll have to turn the import feature on using an administrative function, but I've used this with some success. Beware, though: it's a bit buggy with array-valued properties.

Related

Moving Wagtail pages in a migration

I'm restructuring my Wagtail app to remove an IndexPage that only has a single item in it, and moving that item to be a child of the current IndexPage's parent.
Basically, moving from this:
Page--|
      |--IndexPage--|
                    |--ChildPages (there's only ever 1 of these)
to this:
Page--|
      |--ChildPage
I've made the changes to the models so that this structure is used for creating new content and fixed the relevant views to point to the ChildPage directly. But now I want to migrate the current data to the new structure and I'm not sure how to go about it... Ideally this would be done in a migration so that we would not have to do any of this manipulation by hand.
Is there a way to move these ChildPages up the tree programmatically during a migration?
Unfortunately there's a hard limitation that (probably) rules out the possibility of doing page tree adjustments within migrations: tree operations such as inserting, moving and deleting pages are implemented as methods on the Page model, and within a migration you only have access to a 'dummy' version of that model, which only gives you access to the database fields and basic ORM methods, not those custom methods.
(You might be able to work around this by putting from wagtail.wagtailcore.models import Page in your migration and using that instead of the standard Page = apps.get_model("wagtailcore", "Page") approach, but I wouldn't recommend that - it's liable to break if the migration is run at a point in the migration sequence where the Page model is still being built up and doesn't match the 'real' state of the model.)
Instead, I'd suggest writing a Django management command to do the tree manipulation - within a management command it is safe to import the Page model from wagtailcore, as well as your specific page models. Page provides a method move(target, pos) which works as per the Treebeard API - the code for moving your child pages might look something like:
from myapp.models import IndexPage

# ...
for index_page in IndexPage.objects.all():
    for child_page in index_page.get_children():
        child_page.move(index_page, 'right')
    index_page.delete()
Theoretically it should be possible to build a move() using the same sort of manipulations that Daniele Miele demonstrates in Django-treebeard and Wagtail page creation. It'd look something like this Python pseudocode:
def move(page, target):
    # assuming pos='last_child' but other cases follow similarly,
    # just with more bookkeeping

    # first, cut it out of its old tree
    page.parent.numchild -= 1
    for sib in page.right_siblings:  # i.e. those with a greater path
        old = sib.path
        new = sib.path[:-4] + f"{int(sib.path[-4:]) - 1:04d}"
        sib.path = new
        for nib in sib.descendants:
            nib.path = nib.path.replace_prefix(old, new)

    # now, update itself
    old_path = page.path
    new_path = target.path + f"{target.numchild + 1:04d}"
    page.path = new_path

    old_url_path = page.url_path
    new_url_path = target.url_path + page.url_path.last
    page.url_path = new_url_path

    old_depth = page.depth
    new_depth = target.depth + 1
    page.depth = new_depth

    # and its descendants
    depth_change = new_depth - old_depth
    for descendant in page.descendants:
        descendant.path = descendant.path.replace_prefix(old_path, new_path)
        descendant.url_path = descendant.url_path.replace_prefix(old_path, new_path)
        descendant.depth += depth_change

    # finally, update its new parent
    target.numchild += 1
The core concept that makes this manipulation simpler than it looks is: when a node gets reordered or moved, all its descendants need to be updated, but the only update they need is the exact same update their ancestor got. It's applied as a prefix replacement (if str) or a difference (if int), neither of which requires knowing anything about the descendant's exact value.
That said, I haven't tested it; it's complex enough to be easy to mess up; and there's no way of knowing if I updated every invariant that Wagtail cares about. So there's something to be said for the management command way as well.

Load only component scripts that are in the current page

What I'm trying to achieve is that if I have two component nodes:
component1
  clientlib
    component1.js
component2
  clientlib
    component2.js
and I drag them into page1, then when page1 is generated, only component1.js and component2.js will be loaded when navigating to page1.
One approach I saw is to use a custom Tag Library as described here: http://www.icidigital.com/blog/best-approaches-clientlibs-aem-part-3/
I have two questions :
1) Is there an existing feature in AEM to do this?
2) If not, what is the easiest way to create such a custom Tag Library?
EDIT:
Assume that there is no ability to just include all component clientlibs; rather, load only those that are added to the page.
There is no built-in feature to do this, although I've heard that the clientlib infrastructure is being looked at for a rewrite, so I'm optimistic that something like this will be added in the future.
We have, and I know other companies have, created a "deferred script tag." Ours is a very simple tag that takes a chunk of HTML, like a clientlib include, adds it to a unique list, and then on an out call in the footer spits it all out one after another.
Here's the core of a simple tag implementation that extends BodyTagSupport. Then in your footer grab the attribute and write it out.
public int doEndTag() throws JspException {
    SlingHttpServletRequest request = (SlingHttpServletRequest) pageContext.getAttribute("slingRequest");
    Set<String> delayed = (Set<String>) request.getAttribute(DELAYED_INCLUDE);
    if (delayed == null) {
        delayed = new HashSet<String>();
    }
    if (StringUtils.isNotBlank(this.bodyContent.getString())) {
        delayed.add(this.bodyContent.getString().trim());
    }
    request.setAttribute(DELAYED_INCLUDE, delayed);
    return EVAL_PAGE;
}
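For the footer side the answer only describes what to do, so here is a hedged sketch of a matching output tag, assuming the same DELAYED_INCLUDE attribute key as the collecting tag (the class name and key value are hypothetical):
import java.io.IOException;
import java.util.Set;

import javax.servlet.jsp.JspException;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.tagext.TagSupport;

import org.apache.sling.api.SlingHttpServletRequest;

public class DeferredScriptOutputTag extends TagSupport {

    // must match the attribute key used by the collecting tag above
    private static final String DELAYED_INCLUDE = "com.example.delayedInclude";

    @Override
    public int doEndTag() throws JspException {
        SlingHttpServletRequest request = (SlingHttpServletRequest) pageContext.getAttribute("slingRequest");
        @SuppressWarnings("unchecked")
        Set<String> delayed = (Set<String>) request.getAttribute(DELAYED_INCLUDE);
        try {
            if (delayed != null) {
                JspWriter out = pageContext.getOut();
                for (String include : delayed) {
                    out.println(include); // each entry is a chunk of HTML captured earlier in the page
                }
            }
        } catch (IOException e) {
            throw new JspException(e);
        }
        return EVAL_PAGE;
    }
}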
Theoretically, a possible way of doing this is to write a script in your page component/abstract page component that does something like this:
Step 1: String path = currentPage.getPath()
Step 2: Query this path for components (one way is to have a master list and do a contains clause on sling:resourceType)
Step 3: Use the resource resolver to resolve the resourceType from Step 2; this will give you the resource under your apps.
Step 4: From the above resource, get the sub-resource with primary type cq:ClientLibraryFolder.
Step 5: From the clientlib resource in Step 4, get the categories and include the JS from them.
You could actually write a model to adapt a component resource to a client library to clean up the code.
Let me know if you need actual code, I can write that in my free time.
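A rough, untested sketch of those steps is below; it walks the page's content tree instead of running a query, assumes components live under /apps, and the class and method names are placeholders:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ValueMap;

public class PageClientLibCollector {

    // Collects the clientlib categories of every component used on a page by
    // resolving each component's sling:resourceType and looking for a
    // cq:ClientLibraryFolder child under the component in /apps.
    public Set<String> collectCategories(ResourceResolver resolver, Resource pageContent) {
        Set<String> categories = new HashSet<String>();
        collect(resolver, pageContent, categories);
        return categories;
    }

    private void collect(ResourceResolver resolver, Resource resource, Set<String> categories) {
        String type = resource.getResourceType();
        if (type != null) {
            Resource component = resolver.getResource("/apps/" + type); // assumption: components live under /apps
            if (component != null) {
                Iterator<Resource> folders = resolver.listChildren(component);
                while (folders.hasNext()) {
                    ValueMap props = folders.next().adaptTo(ValueMap.class);
                    if (props != null && "cq:ClientLibraryFolder".equals(props.get("jcr:primaryType", String.class))) {
                        for (String category : props.get("categories", new String[0])) {
                            categories.add(category);
                        }
                    }
                }
            }
        }
        Iterator<Resource> children = resolver.listChildren(resource);
        while (children.hasNext()) {
            collect(resolver, children.next(), categories); // recurse into nested components (parsys, columns, ...)
        }
    }
}
The resulting category names could then be handed to the clientlib include in the page's footer.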

CQ5 / AEM5.6 Workflow: Access workflow instance properties from inside OR Split

TL;DR version:
In CQ workflows, is there a difference between what's available to the OR Split compared to the Process Step?
Is it possible to access the /history/ nodes of a workflow instance from within an OR Split?
How?!
The whole story:
I'm working on a workflow in CQ5 / AEM5.6.
In this workflow I have a custom dialog, which stores a couple of properties on the workflow instance.
The path to the property I'm having trouble with is: /workflow/instances/[this instance]/history/[workItem id]/workItem/metaData and I've called the property "reject-or-approve".
The dialog sets the property fine (via a dropdown that lets you set it to "reject" or "approve"), and I can access other properties on this node via a process step (in ECMA script) using:
var actionReason;
var history = workflowSession.getHistory(workItem.getWorkflow());
// loop backwards through workItems
// and as soon as we find an Action Reason that is not empty
// store that as 'actionReason' and break.
for (var index = history.size() - 1; index >= 0; index--) {
    var previous = history.get(index);
    var tempActionReason = previous.getWorkItem().getMetaDataMap().get('action-message');
    if ((tempActionReason != '') && (tempActionReason != null)) {
        actionReason = tempActionReason;
        break;
    }
}
The process step is not the problem though. Where I'm having trouble is when I try to do the same thing from inside an OR Split.
When I try the same workflowSession.getHistory(workItem.getWorkflow()) in an OR Split, it throws an error saying workItem is not defined.
I've tried storing this property on the payload instead (i.e. storing it under the page's jcr:content), and in that case the property does seem to be available to the OR Split, but my problems with that are:
This reject-or-approve property is only relevant to the current workflow instance, so storing it on the page's jcr:content doesn't really make sense. jcr:content properties will persist after the workflow is closed, and will be accessible to future workflow instances. I could work around this (i.e. don't let workflows do anything based on the property unless I'm sure this instance has written to the property already), but this doesn't feel right and is probably error-prone.
For some reason, when running through the custom dialog in my workflow, only the Admin user group seems to be able to write to the jcr:content property. When I use the dialog as any other user group (which I need to do for this workflow design), the dialog looks as though it's working, but never actually writes to the jcr:content property.
So for a couple of different reasons I'd rather keep this property local to the workflow instance instead of storing it on the page's jcr:content -- however, if anyone can think of a reason why my dialog isn't setting the property on the jcr:content when I use any group other than admin, that would give me a workaround even if it's not exactly the solution I'm looking for.
Thanks in advance if anyone can help! I know this is kind of obscure, but I've been stuck on it for ages.
A couple of days ago I ran into the same issue. The issue here is that you don't have the workItem object, because you don't really have an existing workItem. Imagine the following: as you go through the workflow you get a couple of workItems, which means either a process step or an inbox item. When you are in an OR Split, you don't have existing workItems; you can confirm this by visiting the /workItems node of the workflow instance. Your workaround seems to be the only way to get around this issue.
I've solved it. It's not all that elegant looking, but it seems to be a pretty solid solution.
Here's some background:
Dialogs seem to reliably let you store properties either on:
the payload's jcr:content node (which wasn't practical for me, because the payload is locked during the workflow, and doesn't let non-admins write to its jcr:content)
the workItem/metaData for the current workflow step
However, Split steps don't have access to workItem. I found a fairly un-helpful confirmation of that here: http://blogs.adobe.com/dmcmahon/2013/03/26/cq5-failure-running-script-etcworkflowscriptscaworkitem-ecma-referenceerror-workitem-is-not-defined/
So basically the issue was, the Dialog step could store the property, but the OR Split couldn't access it.
My workaround was to add a Process step straight after the Dialog in my workflow. Process steps do have access to workItem, so they can read the property set by the Dialog. I never particularly wanted to store this data on the payload's jcr:content, so I looked for another location. It turns out the workflow metaData (at the top level of the workflow instance node, rather than workItem/metaData, which is inside the /history sub-node) is accessible to both the Process step and the OR Split. So, my Process step now reads the workItem's approveReject property (set by the Dialog), and then writes it to the workflow's metaData node. Then, the OR Split reads the property from its new location, and does its magic.
The way you access the workflow metaData from the Process step and the OR Split is not consistent, but you can get there from both.
Here's some code: (complete with comments. You're welcome)
In the dialog where you choose to approve or reject, the name of the field is set to rejectApprove. There's no ./ or anything before it. This tells it to store the property on the workItem/metaData node for the current workflow step under /history/.
Straight after the dialog, a Process step runs this:
var rejectApprove;
var history = workflowSession.getHistory(workItem.getWorkflow());
// loop backwards through workItems
// and as soon as we find a rejectApprove that is not empty
// store that as 'rejectApprove' and break.
for (var index = history.size() - 1; index >= 0; index--) {
    var previous = history.get(index);
    var tempRejectApprove = previous.getWorkItem().getMetaDataMap().get('rejectApprove');
    if ((tempRejectApprove != '') && (tempRejectApprove != null)) {
        rejectApprove = tempRejectApprove;
        break;
    }
}
// steps up from the workflow step into the workflow metaData,
// and stores the rejectApprove property there
// (where it can be accessed by an OR Split)
workItem.getWorkflowData().getMetaData().put('rejectApprove', rejectApprove);
Then after the Process step, the OR Split has the following in its tabs:
function check() {
    var match = 'approve';
    if (workflowData.getMetaData().get('rejectApprove') == match) {
        return true;
    } else {
        return false;
    }
}
Note: use this for the tab for the "approve" path, then copy it and replace var match = 'approve' with var match = 'reject'.
So the key here is that from a Process step:
workItem.getWorkflowData().getMetaData().put('rejectApprove', rejectApprove);
writes to the same property that:
workflowData.getMetaData().get('rejectApprove') reads from when you execute it in an OR Split.
To suit our business requirements, there's more to the workflow I've implemented than just this, but the method above seems to be a pretty reliable way to get values that are entered in a dialog, and access them from within an OR Split.
It seems pretty silly that the OR Split can't access the workItem directly, and I'd be interested to know if there's a less roundabout way of doing this, but for now this has solved my problem.
I really hope someone else has this same problem and finds this useful, because it took me way too long to figure out, only to apply it once!

How to obtain wicket URL from PageClass and PageParameters without running Wicket application (i.e. without RequestCycle)?

In my project, there are additional (non-wicket) applications, which need to know the URL representation of some domain objects (e.g. in order to write a link like http://mydomain.com/user/someUserName/ into a notification email).
Now I'd like to create a spring bean in my wicket module, exposing the URLs I need without having a running wicket context, in order to make the other application depend on the wicket module, e.g. offering a method public String getUrlForUser(User u) returning "/user/someUserName/".
I've been stalking around the web and through the wicket source for a complete workday now, and did not find a way to retrieve the URL for a given PageClass and PageParameters without a current RequestCycle.
Any ideas how I could achieve this? Actually, all the information I need is somehow stored by my WebApplication, in which I define mount points and page classes.
Update: Because the code below caused problems under certain circumstances (in our case, being executed repeatedly by a Quartz-scheduled job), I dived a bit deeper and finally found a more lightweight solution.
Pros:
No need to construct and run an instance of the WebApplication
No need to mock a ServletContext
Works completely independent of web application container
Contra (or not, depends on how you look at it):
Need to extract the actual mounting from your WebApplication class and encapsulate it in another class, which can then be used by standalone processes. You can no longer use WebApplication's convenient mountPage() method then, but you can easily build your own convenience implementation, just have a look at the wicket sources.
(Personally, I have never been happy with all the mount configuration making up 95% of my WebApplication class, so it felt good to finally extract it somewhere else.)
I cannot post the actual code, but having a look at this piece of code will give you an idea how you should mount your pages and how to get hold of the URL afterwards:
CompoundRequestMapper rm = new CompoundRequestMapper();
// mounting the pages
rm.add(new MountedMapper("mypage",MyPage.class));
// ... mount other pages ...
// create URL from page class and parameters
Class<? extends IRequestablePage> pageClass = MyPage.class;
PageParameters pp = new PageParameters();
pp.add("param1", "value1");
IRequestHandler handler = new BookmarkablePageRequestHandler(new PageProvider(pageClass, pp));
Url url = rm.mapHandler(handler);
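To connect this back to the getUrlForUser(User u) method mentioned in the question, a small wrapper bean could look roughly like the sketch below; UserPage, the "user/${name}" mount point, and User.getName() are assumptions, and the exact import packages depend on your Wicket version:
public class WicketUrlService {

    private final CompoundRequestMapper mapper = new CompoundRequestMapper();

    public WicketUrlService() {
        // repeat the mounts exactly as the real WebApplication does
        mapper.add(new MountedMapper("user/${name}", UserPage.class));
        // ... mount other pages ...
    }

    public String getUrlForUser(User user) {
        PageParameters pp = new PageParameters();
        pp.add("name", user.getName());
        IRequestHandler handler = new BookmarkablePageRequestHandler(new PageProvider(UserPage.class, pp));
        return "/" + mapper.mapHandler(handler).toString(); // e.g. "/user/someUserName"
    }
}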
Original solution below:
After deep-diving into the intestines of the wicket sources, I was able to glue together this piece of code
IRequestMapper rm = MyWebApplication.get().getRootRequestMapper();
IRequestHandler handler = new BookmarkablePageRequestHandler(new PageProvider(pageClass, parameters));
Url url = rm.mapHandler(handler);
It works without a current RequestCycle, but still needs to have MyWebApplication running.
However, from Wicket's internal test classes, I have put the following together to construct a dummy instance of MyWebApplication:
MyWebApplication dummy = new MyWebApplication();
dummy.setName("test-app");
dummy.setServletContext(new MockServletContext(dummy, ""));
ThreadContext.setApplication(dummy);
dummy.initApplication();

Problem importing ZEXP files programmatically

I'm developing a Plone Product that needs to programmatically import objects previously exported to ZEXP files. It's working perfectly, except for the navigation bar. When an object is imported, it does so correctly, but the navigation bar is not updated. The object can be accessed through its URL and its parent container's Contents tab.
Below is the code I used to import the objects. It's based on Zope's ObjectManager._importObjectFromFile implementation.
def importDocument(app, fileName, container):
    app._p_jar.sync()
    owner = 1
    connection = container._p_jar
    ob = connection.importFile(config.REMOTE_DIR + fileName,
                               customImporters={magic: importXML})
    id = ob.id
    if hasattr(id, 'im_func'):
        id = id()
    try:
        container._setObject(id, ob, set_owner=owner, suppress_events=False)
    except AttributeError:
        print "AttributeError"
    # Try to make ownership implicit if possible in the context
    # that the object was imported into
    ob = container._getOb(id)
    ob.manage_changeOwnershipType(explicit=0)
    transaction.commit()
    return True
I've noticed that the _setObject implementation fires an ObjectAddedEvent in its code, and it's after that event that the menu gets updated when I use the ZMI interface to import an object, so I figure something is listening to this event and handling the menu; oddly, it doesn't happen when using my code.
Generally speaking, importing zexp objects is not supported (in part due to cases like this where unexpected or unintended results may occur). If it works, great. If it doesn't, you are "on your own" and probably better off copying the Data.fs file to a new software stack.
That said, I'm not sure I understand why clear and rebuild the catalog (ZMI -> portal_catalog -> tab 'advance' -> 'clear & rebuild') is not the answer here. According to its description its job is to "walk the entire portal looking for content objects which should be indexed in the catalog and index them".
Unless I misunderstand, you've just described a situation where newly imported content "should be indexed" because it hasn't been indexed yet.
If you are worried about the length of time required to clear and rebuild, try running it from the command line with something like this:
http://svn.plone.org/svn/plone/plone.org/Products.PloneOrg/trunk/scripts/catalog_rebuild.py
If you are worried about crawling the whole site, then call indexObject() on each object (http://dev.plone.org/plone/browser/plone.org/Products.PloneOrg/trunk/scripts/catalog_rebuild.py#L109)
Maybe try manually rebuilding the whole catalog after the import is complete? It might give you more hints as to what is wrong:
ZMI -> portal_catalog -> tab 'advance' -> 'clear & rebuild'.
You may need to "publish" the object after import to make it visible.
Use the manage_importObject method instead.