Does it make sense to build a Knowledge Base using a headless CMS? - content-management-system

Situation:
I work for a multinational/global organization as an Engineering Manager in the HR department.
HR needs a lot of content related to Immigration, Benefits, Leaves, Disability, Transfers, New Hire Onboarding, Covid Policies, and Expense Policies.
This content is delivered through documents/knowledge bases. As you can imagine, for a global corporation present in multiple countries this problem can become very complex very quickly.
Almost all of the content is text/documents that are not really structured.
Today we use AEM as the content management platform. AEM was being used in a headful manner, but it imposed a lot of restrictions when we had to develop applications on top of it.
So we are going to use AEM in a headless manner and bring all the content into content fragments, so that those fragments can be rendered on different portals (some use cases need more than 15 portals).
Questions:
Does it make sense imposing structure on these documents?
Does continuing to use AEM make sense here?
We want to enable reuse of pages: one page is rendered on multiple platforms.
We want to enable reuse of text blocks: one block of text could be used on multiple platforms.
How do we derive information such as breadcrumbs?
How do we build an information tree? E.g. articles A, B, C should be shown under US -> Leaves -> Maternity Leaves, while D, E, F should show under Global -> Leaves -> Bereavement Leaves. That information is not going to be present in the content fragments.
How do we build a site map?
How do authors discover information? If I write a content fragment - how do I manage its taxonomy?

Just in case - AEM also delivers any content as JSON or XML. Or, if your Sling Models are set up for model.json, you even get a JSON representation with all the context you need.
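For illustration, a minimal TypeScript sketch of a portal consuming that JSON export; the host and content path are placeholders, not part of any real setup:

```typescript
// Minimal sketch: fetching the Sling Model JSON export for a page.
// The host and content path are hypothetical; adjust to your instance.
interface PageModel {
  ":type"?: string;
  title?: string;
  [key: string]: unknown;
}

async function fetchPageModel(path: string): Promise<PageModel> {
  const res = await fetch(`https://author.example.com${path}.model.json`, {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`AEM returned ${res.status} for ${path}`);
  return (await res.json()) as PageModel;
}

// Usage: fetchPageModel("/content/hr/us/en/leaves").then(console.log);
```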
So I am not really convinced that this statement is true:
"AEM imposed a lot of restrictions when we had to develop applications on top of AEM"
I have seen projects using Experience Fragments combined with Content Fragments in "real" AEM pages. That way you can reuse and combine content parts on several levels, and even make use of the Multi Site Manager feature.
Using tags or custom metadata fields (based on a central taxonomy) will help you add information you might need to display content parts without enclosing elements. All you need is a servlet that returns all the content or experience fragments with the right tags attached.
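As a rough sketch of that idea, here is how a portal could ask AEM's stock QueryBuilder endpoint for fragments carrying a given tag; the fragment root path, tag ID, and host below are illustrative, not prescriptive:

```typescript
// Minimal sketch: asking AEM's QueryBuilder servlet for all content
// fragments carrying a given tag. Paths, tag IDs and the host are
// illustrative; the servlet itself ships with AEM.
async function fragmentsByTag(tagId: string): Promise<string[]> {
  const params = new URLSearchParams({
    path: "/content/dam/hr",                          // hypothetical fragment root
    type: "dam:Asset",                                // content fragments live as DAM assets
    "tagid.property": "jcr:content/metadata/cq:tags", // where DAM assets store tags
    tagid: tagId,                                     // e.g. "hr:leaves/maternity"
    "p.limit": "-1",                                  // no paging for this sketch
  });
  const res = await fetch(`https://author.example.com/bin/querybuilder.json?${params}`);
  const json = await res.json();
  return json.hits.map((hit: { path: string }) => hit.path);
}
```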
It's hard to tell you more here without doing the complete requirements engineering ;-)

In AEM:
reuse of pages: do this by writing different scripts that use different selectors
reuse of text blocks: do this by using reference components
breadcrumbs: perhaps render them from the page path (see the sketch at the end of this answer)
information tree: use a tree structure of tags and tag the items accordingly
author discovery: you may want to create a custom left-side authoring pane picker
The other things are more complex (site maps!) but all can be done with AEM.
You can also use AEM to do some of those things and do other ones outside of it.
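A minimal sketch of the breadcrumbs-from-page-path idea; it assumes the path segments mirror the desired hierarchy, whereas a real implementation would read each ancestor page's jcr:title instead:

```typescript
// Minimal sketch: deriving breadcrumb labels from a content path.
// Assumes the path segments mirror the desired hierarchy; in a real
// implementation you would look up each ancestor's jcr:title instead.
function breadcrumbs(pagePath: string, rootPath: string): string[] {
  const relative = pagePath.replace(rootPath, "");
  return relative
    .split("/")
    .filter(Boolean)
    .map((segment) => segment.replace(/-/g, " "))        // "maternal-leaves" -> "maternal leaves"
    .map((s) => s.charAt(0).toUpperCase() + s.slice(1)); // capitalize each crumb
}

// breadcrumbs("/content/hr/us/leaves/maternal-leaves", "/content/hr")
//   -> ["Us", "Leaves", "Maternal leaves"]
```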

I tried my best to answer each question:
Does it make sense imposing structure on these documents?
Structuring the documents will give you more control over the content and make it easier to strategize around it. It helps with several things, such as planning, searching, and filtering.
Does continuing to use AEM make sense here?
I don't know your exact business entirely; however, if the content is mostly static (unlike real-time websites with live data updates), it will surely help.
In addition, AEM compares well with other CMSes on the market, such as Sitecore. Other CMSes use databases, whereas AEM uses a content repository; which is better is debatable.
We want to enable reuse of pages: one page is rendered on multiple platforms.
If by "pages" you mean an HTML experience (a section of a page), AEM has a good feature for this: Experience Fragments. Yes, there are challenges, but this will fit efficiently.
We want to enable reuse of text blocks: one block of text could be used on multiple platforms.
Here too an Experience Fragment can be a good fit: make as many variants as you want and reuse them.
How do we derive information such as breadcrumbs?
I do not know exactly what you want here, but I would implement multiple breadcrumbs for small segments, or implement custom breadcrumbs that target small content segments accordingly. If your content is well designed, breadcrumbs will not be a real challenge.
How do we build an information tree? E.g. articles A, B, C should be shown under US -> Leaves -> Maternity Leaves, while D, E, F should show under Global -> Leaves -> Bereavement Leaves. That information is not going to be present in the content fragments.
You can manage this through templates: restrict a template so that pages can only be created under a certain tree. Alternatively, build the taxonomy in such a way that it creates the structure you described, for example a parent page (HR, Engineering) for each and every business unit; it's OK to have redundant content. Use the MSM feature, and also tag the content in a meaningful way.
How do we build a site map?
As with breadcrumbs, you can build sitemaps for small segments, such as one for HR/US and one for Engineering/US, and render them individually or together; either way it will still be a well-designed sitemap.
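A minimal sketch of building such a per-segment sitemap from a flat list of page paths; the base URL and paths are illustrative, and how you fetch the paths (QueryBuilder, GraphQL, ...) is left open:

```typescript
// Minimal sketch: building a sitemap for one content segment from a
// flat list of page paths. How the paths are obtained is out of scope.
function toSitemapXml(baseUrl: string, paths: string[]): string {
  const urls = paths
    .map((p) => `  <url><loc>${baseUrl}${p}.html</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

// toSitemapXml("https://hr.example.com", ["/content/hr/us/leaves", "/content/hr/us/benefits"])
```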
How do authors discover information? If I write a content fragment - how do I manage its taxonomy?
Either organize the folder structure in a certain way, or create variants and use the tagging framework.
To conclude: no product will be a 100% fit for every requirement; you have to use the product in a way that makes it as suitable as possible for yours.
Good luck!

Related

AEM6 CQ how to handle Component Development and Content Authoring happening at the same time?

I just started at my new job and found myself right in the middle of a big project using Adobe AEM CQ, which I've never used before. Currently there are developers creating and tweaking components while content authors are busy authoring about 65 pages of content using those components.
Obviously, every time a component changes, someone needs to update all the authored content with the new component changes. This is a huge time-waster, as it seems like the only way to do this is through a custom-made script that looks for nodes in the XML files and tries to convert them to the new component specs. Sometimes this is not even possible and authors need to re-author tons of stuff and lose lots of time.
Can anyone with AEM experience please let me know if:
1) There is a more painless way to migrate authored content to new components?
2) There is a better way to have developers and authors work simultaneously?
I know that the ideal way is to develop components first and then author on top of those, but that seems unrealistic, especially with a big client project where things change all the time.
Thanks
Firstly, it sounds like a business process problem. The components should be fully developed and fully tested before content is added by the authors. If the edits to components are so different that you're having this problem, I would recommend having functional and technical requirements written before the build starts.
With that said, the Groovy console for AEM is an excellent tool for updating nodes and content within an AEM site. Take a look at it here: https://github.com/Citytechinc/cq-groovy-console
I would not agree that content production should happen only after all the components have been developed. It's beneficial, especially when content production will take a lot of time, to start it while development is happening.
On the other hand, I completely agree with the other part of the answer. The Groovy Console is the way to go when dealing with content migration (both before go-live and afterwards, during the BAU process). The ideal situation is one where all the current content can be mapped to data in the new version of the component; then you should be able to migrate all the content with scripts. If that's not the case, there is no way around authors entering the content manually.
Components should definitely be fully developed before they are used.
But if you want to change something that stays the same across the entire website, like a logo or header component, you can look into the Design Dialog.
The advantage of it is:
If you have already authored n pages, then when you change the component using the Design Dialog, the change is automatically reflected on all the pages where the component is used.
AEM is a CMS where, to put it in simpler terms, content is your data. If your development process is such that data is inconsistent with the UI after every release, then your delivery process might be at fault. You can use the following ways to make things better:
Make components backward compatible with the data (see the sketch after this list)
Make components versionable, i.e. new versions of components work with new models of data, and it's left to the user to adopt the new versions.
Provision for data or component migration in your project plan.
In practice, most AEM implementations make components backward compatible and provide an upgrade path to new versions. This is not a technical problem; it's more of a project governance issue.
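A minimal sketch of the backward-compatibility idea, using hypothetical property names: the component reads its data through a normalizer that accepts both the old and the new shape, so pages authored before the release keep rendering:

```typescript
// Minimal sketch of "backward compatible with the data": a component
// reads its model through a normalizer that accepts both the old and
// the new data shape. Property names are hypothetical.
interface TeaserModelV2 {
  headline: string;
  link: string;
}

// v1 stored "title" and "url"; v2 renamed them to "headline" and "link".
function normalizeTeaser(raw: Record<string, unknown>): TeaserModelV2 {
  return {
    headline: (raw.headline ?? raw.title ?? "") as string, // fall back to the v1 property
    link: (raw.link ?? raw.url ?? "#") as string,
  };
}
```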
This post is resurfacing, so I don't want people to get the wrong idea from the current state of the answers (some of which should be comments, IMHO): the approach to dealing with components and releases is, in general, not a technical problem of the platform.

per-paragraph commenting system

I'm very interested in the emerging trend of comments-per-paragraph systems (also called "annotation systems"), such as the ones implemented by medium.com and qz.com, and I'm looking at the idea of developing one of my own.
Question: it seems they are mainly implemented via JavaScript that runs through the text's HTML paragraphs, uniquely identified by an id attribute (or, in the case of Medium, a name attribute). Does that mean their CMS actually stores each paragraph as a separate entry in the database? That seems overly complex to me, but otherwise, how do they manage the fact that a paragraph can be deleted, edited, or moved around in the overall text? How would the unique id be preserved if the author changes the paragraph?
How is that unique id logically structured? (post_id + position_in_post)?
Thank you for your insights...
I can't speak to the medium side, but as one of the developers for Quartz, I can give insight into how qz.com annotations work.
The annotations code is custom PHP code and is independent of the CMS used for publishing articles (WordPress VIP). We do indeed store a reference to each paragraph as a row in the database, in order to track any updates to the article content. We call this an annotation thread, and when a user saves an annotation, the threadId gets stored along with the annotation.
We do not have a unique id stored on the WordPress side for each paragraph; instead we store the paragraph's relative position in that article (nodeIndex "3" and nodeSelector "p" == the third p-tag in the content body for a given article) and the JavaScript determines where exactly to place the annotation block. We went this route to avoid heavier customizations on the WordPress side, though depending on your CMS it may be easier to address this directly in the CMS code and add unique ids in the HTML before sending it to the client.
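A minimal client-side sketch of that positioning scheme; the container selector is illustrative:

```typescript
// Minimal sketch of the (nodeSelector, nodeIndex) scheme described above:
// resolve a stored pair to the DOM element an annotation thread belongs to.
function resolveThreadAnchor(nodeSelector: string, nodeIndex: number): Element | null {
  const body = document.querySelector(".article-body"); // hypothetical content container
  if (!body) return null;
  return body.querySelectorAll(nodeSelector)[nodeIndex] ?? null; // e.g. the 3rd <p>
}

// resolveThreadAnchor("p", 2) -> the third paragraph of the article body
```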
Every time an update to an article is published, each paragraph in the updated article is compared against what was previously stored with the annotation threads for that article. If the position and paragraph text do not match up, it attempts to find the paragraph that is the closest match and updates the row for that thread; new threads are created and deleted where appropriate. All of this is handled server-side whenever changes are published to an article.
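The re-matching step might look something like the sketch below. qz.com's real code is PHP, so this TypeScript only illustrates the idea, and the 0.5 overlap threshold is an arbitrary assumption:

```typescript
// Illustrative sketch of the server-side re-matching step: for a stored
// thread, find the paragraph in the updated article that matches most closely.
function similarity(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/));
  const wordsB = new Set(b.toLowerCase().split(/\s+/));
  let shared = 0;
  for (const w of wordsA) if (wordsB.has(w)) shared++;
  return shared / Math.max(wordsA.size, wordsB.size); // crude word-overlap score
}

function rematchThread(storedText: string, newParagraphs: string[]): number {
  let best = -1, bestScore = 0.5; // below 0.5 overlap, treat the thread as orphaned
  newParagraphs.forEach((p, i) => {
    const score = similarity(storedText, p);
    if (score > bestScore) { bestScore = score; best = i; }
  });
  return best; // new nodeIndex, or -1 if no close match survived the edit
}
```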
A couple of alternate implementations that are also worth looking at are Gawker's Kinja text annotations (currently in use on Jalopnik) and the word-for-word annotations of rapgenius.com.
(disclaimer: I'm a factlink dev.)
I work for a company trying to allow per-paragraph (or per-phrase) commenting on arbitrary sites. Essentially, you've got two choices to identify the anchor of a comment.
Option 1: remember the structure of the page (e.g. some path from a root to a paragraph), and place comments at the same position next time.
Option 2: identify the content of the paragraph and place comments near identical or similar content next time.
Both systems have their downsides, but you pretty much need to go with option 2 if you want a robust system. Structural identification is fragile in the face of changing structure. Irrelevant changes in particular, such as theming or the precise HTML tags used, can significantly impact the "path". When that happens, you really can't fix it - unless you inspect the content, i.e. option 2.
Sam describes what comes down to a server-side content-based approach in his answer. Purely client-side content-based matching is what Factlink and (IIRC) Hypothesis use. Most browsers support non-standard but fast substring search in page content using either window.find or TextRange.findText. Alternatively, you could walk the DOM, which is slower but gives you the flexibility to implement (e.g.) fuzzy matching.
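A minimal sketch of the DOM-walking variant; real systems layer fuzzy matching on top of something like this:

```typescript
// Minimal sketch of client-side, content-based anchoring: walk the DOM's
// elements and return the first leaf element containing the quoted phrase.
function findAnchorByContent(quote: string): Element | null {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT);
  let node = walker.nextNode();
  while (node) {
    const el = node as Element;
    // Only consider elements without element children, a simplification
    // that anchors as precisely as this sketch allows.
    if (el.children.length === 0 && (el.textContent ?? "").includes(quote)) {
      return el;
    }
    node = walker.nextNode();
  }
  return null;
}
```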
It may seem like client-side matching is overkill or complex, but really, it's simpler: it's a very robust way to decouple your content-management from your commenting. Neither is really simple, so decoupling those concerns can be a win.
I had created a fiddle along the same lines to demonstrate the power of jQuery during a training session.
http://fiddle.jshell.net/fotuzlab/Lwhu5/
It might help as a starting point, along with Sam's detailed and useful insights. You get the value of the text field in the jQuery function, from where you can send it across to your CMS using AJAX/APIs.
PS: The function is not production-ready. It's only meant as a starting point; a little tweaking will make it usable.
I've recently published a post on how to do this with WordPress building on an existing plugin.
Like qz.com, I assign paragraph ids on the client and then provide that info to WordPress to store as comment meta when a new comment is created. I used hashing of the paragraph text to create the id, which means that the order of paragraphs is unimportant, but it does mean that if a paragraph is edited then any associated comments become orphaned.
At first I thought this was an issue but thinking about it, if a reader comments on a paragraph then editing that text subsequently seems a little sneaky.
The code is freely available on GitHub if you feel like forking it and enhancing it.
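A minimal sketch of the hashing idea described above; the hash function here is a simple illustrative one, not what the plugin actually uses:

```typescript
// Minimal sketch: derive a stable id from normalized paragraph text, so
// paragraph order doesn't matter. Editing the text changes the hash,
// which is exactly what orphans comments attached to the old version.
function paragraphId(text: string): string {
  const normalized = text.trim().replace(/\s+/g, " ").toLowerCase();
  let hash = 0;
  for (let i = 0; i < normalized.length; i++) {
    hash = (hash * 31 + normalized.charCodeAt(i)) | 0; // simple 32-bit rolling hash
  }
  return "para-" + (hash >>> 0).toString(16);
}

// <p id="para-3f2a9c01">...</p> : the id survives reordering but not edits.
```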
There is another WordPress plugin called "CommentPress" which has existed for a long time.
I use an old version of this plugin for my blog and it works very well.
You can choose to comment per line or per paragraph, and the ergonomics are really well thought out!
A demo here:
http://futureofthebook.org/
and all the code is on github:
https://github.com/IFBook/commentpress-core
After a quick look at the code, it seems they use the second approach, as Eamon Nerbonne explains in his answer.
They parse each paragraph to make a signature based on the first character of each word. Here is the function that does that.
In case someone comes looking here: I've implemented Medium-like functionality as a Django app.
It is open source and can be found as a package on PyPI, and on GitHub.
I used one of my other apps, blogging, to allocate unique paragraph IDs to each content object (currently we're only looking at <p> tags) and to store some extra internal metadata in the backend alongside it in the DB (MySQL currently, but we've JSONed the blob, so the method is more natively suited to document-oriented DBs). The frontend is mainly jQuery-driven, with a REST API connecting the backend to the frontend.
I took cues from this post, but then rejected creating some kind of digest value from the paragraph, because content can change. What I wanted was to preserve the annotations as long as the paragraph was not completely overwritten. For the complete-overwrite case, I provided for collecting the annotations in an orphaned bucket.
More in these tutorials
A legacy version of the same is running on those tutorial pages; that was the first revision. (You won't be able to post without logging in, but you can always log in using a social account to check it out :-) )

Can we automate migrating to SDL Tridion?

We are done migrating a website from an old CMS to SDL Tridion. We have thousands of clients, of which fewer than five are migrated. Now let's say we need to automate migrating the remaining thousands of clients; obviously we cannot use manual effort. Is there a way to develop an automated solution against SDL using any APIs it may provide? If yes, where can we find the API documentation? Any books or online tutorials for the same?
All very technical answers. Whatever route you choose, you need to weigh up the option of doing a technical migration (and trying to get that right) versus employing a load of students to copy and paste.
Regardless of the CMS, the complexity of a migration can be measured by how organized your content is in the system you want to migrate from.
I categorize migrations into 3 types related to the origin and destination:
1--> CMS to CMS
2--> Database to CMS
3--> Website to CMS
If the original source is a database or another CMS, the complexity is typically reduced, as the content is already structured.
You have to extract it and map the existing content to the structure it will have in the new system.
If the goal is to migrate an existing website into a CMS, the complexity increases, as the content is more disorganized than it would be in a CMS.
Again, if the content in the site is properly structured, it is still possible to automate the migration, but most cases are old sites maintained manually.
There are commercial tools that crawl the content from the sites and apply patterns to identify common elements, common content, common metadata, and structure, and they are able to massage the original content and apply rule-based logic to structure it. However, even the best tool has hard work to do when the source is disorganized.
I have also seen migrations that cut the final HTML into pieces and put that in the CMS. That is an easy approach but of course a wrong one, as you are not taking any advantage of the CMS.
And 3 types related to the source format we migrate from and the format we want to obtain:
1--> Content to Content
2--> (HTML + Content all together) into (HTML) + (Content) separated
3--> (HTML + Content + Code all together) into (HTML) + (Content) + (Code) separated
Content-to-content migration is the least complex.
The second option is of course more complex, as you have to separate the content from the HTML, which will become templates.
The third option is even more complex: if you are extracting the HTML of the page (using an HTTP client, for instance, as most of the commercial tools do), you are not capturing the logic of the page. For this case you need to work at the file level.
Do a very deep analysis before you enter into a migration, as things can turn complex.
Only if you have very good knowledge of the original system and solid patterns to apply can you consider automation.
Tridion has extensive APIs and these are thoroughly documented. Your starting point for SDL Tridion 2011 is https://www.sdltridionworld.com/downloads/documentation/SDLTridion2011SP1/index.aspx
Automated migrations are perfectly possible, however API support is not the limiting factor here. Understanding your data in your source and target scenarios is much more important.
I would consider contacting Kapow or Vamosa who both specialize in crawling sites and then importing them to a CMS. They both have connectors for SDL Tridion. This may save your clients both time and money.
Every migration is different, unless you are migrating "thousands of" sites (assuming a client is a site) from the same source type to the same destination (SDL Tridion in this case) with extremely close data models. Several SDL Tridion partners are already solving this problem and have built, or are building, assisted migration automation tools. Get in touch with us if you need more information.

Searching for a document format: flowing layout + page control

I am bouncing around the idea of creating a custom document versioning system to use on business rule manuals. These manuals are broken up into outlined sections which contain one rule per section, outlined in various ways (1.1, 1.2, etc.). There are many manuals which contain the same rule for different locations in the country (down to the state/county level), but many locations will have different versions of the rules depending on business needs or whatnot.
My thought is to create a system which will manage versions of each section/rule separately. This would make the management of this mess much easier to maintain (think hundreds of manuals times hundreds of rules), and it would make fielding query requests from management much quicker.
Ok, it's a fairly easy and straightforward design to this point. Now for the monkey wrench. These rules are regulated by government agencies, so they must be submitted to and approved by state agencies. In doing this, many states require only the exact pages which are updated for each request to be submitted for approval. Once they are approved, these pages will get a new effective date and the rest of the manual will remain the same. There are business reasons for this process.
So my choice of document format has to allow for a flowing layout much like Word; however, I need to be able to programmatically determine the page range of these sections and whether changes or additions will cause repagination.
The most complex layout will contain only tables, headers/footers, and a table of contents. I have thought about using OOXML, but I don't see a way to determine pagination without loading Word, which is something I would prefer to avoid. I could create my own pagination algorithm, but that sounds a lot like reinventing the wheel.
Can anyone offer pointers to a solution whether it is an open document format, a book, or something else? Thank you for taking the time to read this.
If you want a truly modular document, then DocBook might be worth a look. You have all the rich formatting you need but it does need a bit of work. It really depends on who's doing the authoring and what tools they're comfortable using. DocBook is a rich mark-up language and you can do anything from work in the base plain text file or look at a number of WYSIWYG editors, e.g. ArborText.
It's not Word though - which might be enough to put your authors off!
If you did go with DocBook, you would maintain each document section in a separate text file so your versioning solution would work well. DocBook can produce output in a number of formats simultaneously so you could have an HTML version, an OOXML version, and a PDF version produced from the same source. A PDF version of each changed section might be appropriate to send to government agencies for approval.
On pagination, you could make life a lot easier for yourself by not having continuous page numbers. Use section or chapter based page numbering, e.g. page I-1, I-2, ..., II-1, II-2.

How to create a deliverable for a front-end engineer?

This is a question about the development workflow of front end engineers. I am starting a project for a rather large site with lots of pages, each page has multiple steps, and it's very difficult to lay out all the content in a spreadsheet.
The content of each page will be delivered in a spreadsheet cell, and some pages have multiple variable sections that are determined by the user's preferences.
I was asked my opinion about how to structure the deliverable. I am wondering if there is a best practice out there for structuring this kind of deliverable, because a poorly structured deliverable can be almost as mind-numbing as using pen and paper to write code.
Do you have any tools, formats, practices for creating deliverables that are easy to work with?
It sounds like you are just doing the UI design and then giving it to the front-end engineers.
If that is correct, I would suggest that you see if you can do the rough HTML/CSS work to get the page to look the way you want; then they can go in and give it the functionality, and that way you have an idea of what is possible.
You can do much of the work, then leave comments about trying to center something a bit better, for example.
I am not a big fan of just getting the design on paper or as an image, it would be easier to just get the html/css.
There are plenty of tools now that make CSS and HTML easy to do. Even if you have the CSS inside the HTML, they can separate the two, but it would be a huge help to them.
Just do one page, and give it to them, and then come back in a day or two and get feedback as to what their thoughts are, and how you can improve what you give them.
As you go through this process, after a while both groups will know what to expect and you can get the rest done quickly.
This is more of an agile methodology with the front-end engineers as your customers.
My suggestion would be mockups or wireframes for the pages. Mockups would be examples of the pages in various states while the wireframe is a detailed document of the structure of the page.
HTML and CSS are way too complicated for mockup use. I usually first create a requirements backlog for the UI/functionalities as well (just a list of prioritized requirements in Excel).
Especially for a large site development, you should also have the process and data flow definitions done (UML or another form of description) to help you define the mentioned requirements.
Based on these you will know what kind of steps the whole site functionality needs (i.e. pages) and what the page hierarchy and structure will be like. This way it's much easier to get a grasp of the whole thing.
After that we create fast wireframes and visualize the end result with fast mockups done as images in Photoshop or similar. These are absolutely vital in my experience, as they help the customer (and other stakeholders) actually understand what is being done. For this, HTML and CSS are simply too slow to run multiple iterations with.