Is there a limit to the no. of versions a content item can have in AEM? I want to retain all the versions of my page. As in, unlimited.
Want to know if AEM has a limit internally after which it automatically removes older versions?
Appreciate any thoughts on this.
Although this is not recommended, you can disable version purging by setting versionmanager.purgingEnabled to false. You will need to configure this as described in the document below:
https://docs.adobe.com/docs/en/aem/6-3/deploy/configuring/version-purging.html#Version Manager
Retaining lots of versions will gradually slow down your instance and result in poor authoring performance as the storage (Tar or Mongo) will grow large with stale data.
It is normally recommended to retain versions for a fixed number of days or up to a fixed number of versions.
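To make the two retention rules above concrete, here is a minimal sketch of a count-based and age-based purge decision. It is not AEM's implementation; the `Version` class and the `max_count` / `max_age_days` parameter names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Version:
    label: str
    created: datetime


def versions_to_purge(versions, now, max_count=None, max_age_days=None):
    """Return the versions a count/age retention policy would remove.

    max_count and max_age_days are hypothetical names; None disables a rule.
    """
    newest_first = sorted(versions, key=lambda v: v.created, reverse=True)
    purge = []
    for index, version in enumerate(newest_first):
        # Rule 1: keep only the newest max_count versions.
        too_many = max_count is not None and index >= max_count
        # Rule 2: drop anything older than max_age_days.
        too_old = (
            max_age_days is not None
            and now - version.created > timedelta(days=max_age_days)
        )
        if too_many or too_old:
            purge.append(version)
    return purge


now = datetime(2024, 1, 31)
# Five versions aged 0, 10, 20, 30 and 40 days.
history = [Version(f"1.{i}", now - timedelta(days=10 * i)) for i in range(5)]
print([v.label for v in versions_to_purge(history, now, max_count=3)])
```

With both rules set to `None` nothing is purged, which corresponds to the "keep everything" case the question asks about.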
For performance reasons, it is better to back up your AEM instance to archive older versions and rely on a restore to access those versions.
I once asked Adobe DayCare this question and received a response similar to the one in i.net post: it is possible to disable purging of page versions, but it comes with the risk of authoring performance issues, as pages can start loading very slowly.
The solutions that were suggested (depending on the requirements):
backing up an instance, which is not the best option if you need to be able to retrieve or compare old content at any time and recover it if needed; the disadvantage is that a full copy of the instance needs to be stored, and the backup needs to be repeated from time to time (when you notice performance issues)
designing and implementing a custom solution with an additional instance responsible for storing these versions - I don't have many details on that solution, but as I understood it, it would require a deep analysis of how it can be done
if access to previous content is needed only for historical reasons (no need to retrieve it and publish it again), then using the page-to-PDF extraction mechanism and storing the history in DAM or another place; you can also consider saving a PDF screenshot of the whole page with its design (not only the content), at different browser breakpoints, with annotations, etc., depending on requirements
I am working on a website of 3,000+ pages that is updated on a daily basis. It's already built on an open source CMS. However, we cannot simply continue to apply hot fixes on a regular basis. We need to replace the entire system and I anticipate the need to replace the entire system on a 1-2 year basis. We don't have the staff to work on a replacement system while the other is being worked on, as it results in duplicate effort. We also cannot have a "code freeze" while we work on the new site.
So, this amounts to changing the tire while driving. Or fixing the wings while flying. Or all sorts of analogies.
This brings me to a concept called "continuous migration." I read this article here: https://www.acquia.com/blog/dont-wait-migrate-drupal-continuous-migration
The writer's suggestion is to use a CDN like Fastly. The idea is that a CDN allows you to switch between a legacy system and a new system on a URL basis. This idea, in theory, sounds like a great idea that would work. This article claims that you can do this with Varnish but Fastly makes the job easier. I don't work much with Varnish, so I can't really verify its claims.
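The per-URL switching described above boils down to a routing decision made at the edge. A minimal sketch of that decision in Python, assuming made-up origin hostnames and a made-up list of already migrated URL prefixes (a real CDN or Varnish setup would express this in its own configuration language):

```python
# Hypothetical origins; replace with the real legacy and new backends.
LEGACY_ORIGIN = "https://legacy.example.com"
NEW_ORIGIN = "https://new.example.com"

# Hypothetical list of site sections that have already been migrated.
MIGRATED_PREFIXES = ("/blog/", "/about/")


def choose_origin(path, migrated_prefixes=MIGRATED_PREFIXES):
    """Send migrated sections to the new system, everything else to legacy."""
    if path.startswith(tuple(migrated_prefixes)):
        return NEW_ORIGIN
    return LEGACY_ORIGIN


print(choose_origin("/blog/2024/hello"))
print(choose_origin("/products/42"))
```

Migrating another section then means adding one prefix to the list, which is what makes a phased rollout possible without a code freeze.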
I also don't know if this is a good idea or if there are better alternatives. I looked at Fastly's pricing scheme, and I simply cannot translate what it means to a specific price point. I don't understand these cryptic cloud-service pricing plans, they don't make sense to me. I don't know what kind of bandwidth the website uses. Another agency manages the website's servers.
Can someone help me understand whether using an online CDN would be better than using something like Varnish? Are there free or cheaper solutions? Can someone tell me what this amounts to, approximately, on a monthly or annual basis? Are there any other, better ways to roll out a new website on a phased basis for a large website?
Thanks!
I don't think I have exact answers to your questions, but maybe my answer helps a little bit.
I don't think that the CDN itself gives you an advantage. The advantage is that you have more than one system.
Changes to the code
In professional environments I'm used to having three different CMS installations. The first is the development system, usually on my PC. That system is used to develop the extensions, fix bugs and so on, supported by unit tests. The code is committed to a revision control system (like SVN, CVS or Git). A continuous integration system checks the commits to the RCS. When a feature is implemented (or some bugs are fixed), a named tag is created. Then this tagged version is installed on a test system where developers, customers and users can test the implementation. After a successful test, exactly this tagged version is installed on the production system.
At first sight this looks time-consuming. But it isn't, because most of the steps can be automated. And the biggest advantage is that the customer can test the change on a test system. And it is very unlikely that an error occurs only on your production system. (A precondition is that your systems are built on similar or identical environments.)
Changes to the content
If your code changes the way your content is processed, it is an advantage when your CMS has strong workflow support. Then you can easily add a step to your workflow which decides whether the content is old and has to be migrated for the current document. This way you have a continuous migration of the content.
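A minimal sketch of such a workflow step, assuming each document carries a `schema` number and that the only change so far is a made-up one (a single `body` string split into a `paragraphs` list). All names here are illustrative, not part of any particular CMS:

```python
CURRENT_SCHEMA = 2


def migrate_v1_to_v2(doc):
    """Hypothetical change: v1 stored one 'body' string, v2 a list of paragraphs."""
    migrated = dict(doc)  # work on a copy, leave the stored document untouched
    migrated["paragraphs"] = migrated.pop("body", "").split("\n\n")
    migrated["schema"] = 2
    return migrated


# One migration function per old schema number.
MIGRATIONS = {1: migrate_v1_to_v2}


def migration_step(doc):
    """Workflow step: upgrade the document only if it is older than the code."""
    while doc.get("schema", 1) < CURRENT_SCHEMA:
        doc = MIGRATIONS[doc.get("schema", 1)](doc)
    return doc


old = {"title": "Hi", "body": "first\n\nsecond"}
print(migration_step(old))
```

Documents that are already current pass through unchanged, so the step can run on every document the workflow touches.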
HTH
Varnish is a cache rather than a CDN. It intercepts page requests and delivers a cached version if one exists.
A CDN will serve up contents (images, JS, other resources etc) from an off-server location, typically in the cloud.
The cloud-based solutions pricing is often very cryptic as it's quite complicated technology.
I would be careful with continuous migration. I've done both methods in the past (continuous and full migrations) and I have to say, continuous is a pain. It means double the admin time for everything, and assumes your requirements are the same at all points in time.
Unfortunately, I would say you're better off with a proper rebuild on a 1-2 year basis than with a continuous migration, but obviously you know best about that.
I would suggest you maybe also consider a hybrid approach? Build yourself an export tool to keep all of your content in a transferable format like CSV/XML/JSON so you can just import it into a new system when ready. This means you can incorporate new build requests when you need them in a new system (what's the point of a new system if it does exactly the same as the old one?) and you get to keep all your content. Plus you don't need to build and maintain two CMSs all the time.
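As a rough idea of what such an export tool could look like, here is a sketch that writes pages to JSON and reads them back. The field names (`url`, `title`, `body`) and the `format_version` marker are assumptions for illustration; a real export would mirror whatever the CMS actually stores.

```python
import json


def export_content(pages):
    """Serialize pages (an iterable of dicts) to a JSON document string."""
    records = [
        {"url": page["url"], "title": page["title"], "body": page["body"]}
        for page in pages
    ]
    # format_version is a hypothetical marker so importers can detect changes.
    return json.dumps(
        {"format_version": 1, "pages": records}, indent=2, ensure_ascii=False
    )


def import_content(text):
    """Read an export back, refusing formats this importer does not know."""
    data = json.loads(text)
    if data.get("format_version") != 1:
        raise ValueError("unsupported export format")
    return data["pages"]


pages = [{"url": "/about", "title": "About us", "body": "<p>Hello</p>"}]
print(export_content(pages))
```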
We are getting a lot of problems with dispatcher. As per the CQ5 documentation, dispatcher is a caching and/or load balancing tool, so as per my analysis we can also go without dispatcher. Am I correct? I want to integrate a Squid or Varnish web cache with my Apache, so I want to shut down the dispatcher. Will that be a good option?
Any views/help is appreciated.
Yes, it's perfectly possible to run a website without the Dispatcher in front. Your options would then seem to come down to:
No caching
Implementing a cache in front of the Publish instance (e.g. Squid/Varnish, as you mentioned; configuration required)
Integrate a caching solution in Java that you can apply to parts of your templates/components individually (development required)
Also, you'd need to check with Adobe what level of support they'd give you for any of the above solutions before undertaking them. If you like, you could post specific questions to SO around the problems you're facing with the Dispatcher and you may get some resolutions too.
I was told that you should use dispatcher servers for your publish instance, because it really helps the loading times. There also was a documentation with a table showing how much it affects the performance depending on the number of documents served.
To avoid caching problems, you can specify files, folders or file types which should never be cached. You can also specify caching behaviour in the source code of the pages. Also, making changes to content on your author instance triggers a flush on the dispatcher for the affected content, to make sure that no cached old version is being served.
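The two mechanisms above (never-cache rules and flush on change) can be illustrated with a small in-memory sketch. This is not how the Dispatcher is implemented or configured; `PageCache`, `never_cache` and `flush` are names invented for this example.

```python
from fnmatch import fnmatchcase


class PageCache:
    """Toy page cache with never-cache patterns and per-path flushing."""

    def __init__(self, render, never_cache=()):
        self._render = render                    # function: path -> page content
        self._never_cache = tuple(never_cache)   # glob patterns, e.g. "/search/*"
        self._store = {}

    def get(self, path):
        # Paths matching a never-cache rule always go to the backend.
        if any(fnmatchcase(path, pattern) for pattern in self._never_cache):
            return self._render(path)
        if path not in self._store:
            self._store[path] = self._render(path)
        return self._store[path]

    def flush(self, path):
        """Called when the content behind path changes."""
        self._store.pop(path, None)


cache = PageCache(lambda path: f"<html>{path}</html>", never_cache=["/search/*"])
print(cache.get("/news/today.html"))
```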
Last but not least using an apache server also allows you to handle virtual hosts and rewrite rules easily.
It's a must.
If you are getting problems with dispatcher, this could be a sign that you are using the wrong platform for your development needs, seeing as you need to fall back on technologies that AEM does not require.
I believe WordPress stores multiple entries of posts as "revisions", but I think that's a terribly inefficient use of space?
Is there a better way? I think gitit is a wiki that uses Git for version control, but how is it done? E.g. my application is in PHP; must I make it talk to Git to commit and retrieve data?
So, what is a good way of implementing version control in web apps (eg. in a blog it might be the post content)
I've recently implemented just such a system, which uses the concept of superseded records together with a previous and current link. I did a considerable amount of research into how best to achieve this. In the end the model I arrived at is similar to that of WordPress (and other systems): store each change as a new record and use that.
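The answer does not show its schema, so the following is only a guess at what "superseded records with a previous and current link" could look like, written as an in-memory sketch. `Revision`, `RevisionStore` and the field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    rev_id: int
    post_id: int
    content: str
    previous_id: Optional[int] = None  # link to the revision this one replaced
    superseded: bool = False           # True once a newer revision exists


class RevisionStore:
    def __init__(self):
        self._revisions = {}  # rev_id -> Revision
        self._current = {}    # post_id -> rev_id of the live revision
        self._next_id = 1

    def save(self, post_id, content):
        """Store the full content as a new record and supersede the old one."""
        previous_id = self._current.get(post_id)
        if previous_id is not None:
            self._revisions[previous_id].superseded = True
        revision = Revision(self._next_id, post_id, content, previous_id)
        self._revisions[revision.rev_id] = revision
        self._current[post_id] = revision.rev_id
        self._next_id += 1
        return revision

    def current(self, post_id):
        return self._revisions[self._current[post_id]]

    def history(self, post_id):
        """Walk the previous links, newest revision first."""
        result = []
        rev_id = self._current.get(post_id)
        while rev_id is not None:
            revision = self._revisions[rev_id]
            result.append(revision)
            rev_id = revision.previous_id
        return result


store = RevisionStore()
store.save(post_id=1, content="draft")
store.save(post_id=1, content="final")
print([r.content for r in store.history(1)])
```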
Considering all of the options available, space is really the last concern for authored content such as posts - media files take up way more space and these can't be stored as deltas anyway.
In any case the way that Git works is virtually identical in that it stores the entire content for every revision except that it will eventually pack down into deltas (or when you ask it to).
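To illustrate the "whole content per revision" idea, here is a sketch of a content-addressed snapshot store that names each snapshot the way Git names a blob (SHA-1 over a `blob <size>\0` header plus the content). The `SnapshotStore` class is made up for this example; real Git also compresses objects and later packs them.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class SnapshotStore:
    """Keeps every revision as a full snapshot, keyed by its content hash."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        object_id = git_blob_id(content)
        self.objects[object_id] = content  # identical content is stored once
        return object_id

    def get(self, object_id: str) -> bytes:
        return self.objects[object_id]


store = SnapshotStore()
first = store.put(b"My first post\n")
second = store.put(b"My first post, edited\n")
print(first != second, len(store.objects))
```

Because the key is derived from the content, saving an unchanged post again costs no extra space even without deltas.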
Going back to 1990, we were using SCCS or RCS, and sometimes with only 30 MB of disk space free we really needed the version control to be efficient to avoid running out of storage.
Using deltas to save space is not really worth all of the associated aggravation, given the average amount of available storage on modern systems. You could argue it's wasteful of space; however, I'd argue that it is much more efficient in the long run to store things uncompressed in their original form:
it's faster
it's easier to search through old versions
it's quicker to view
it's easier to jump into the middle of a set of changes without having to process a lot of deltas.
it's a lot easier to implement because you don't have to write delta generation algorithms.
Also, markup doesn't fare as well as plain text with deltas, especially when editing with a WYSIWYG editor.
Keep one table with the most recent version of, e.g., the article.
When a new version is saved, move the current one over to an archive table and put a version number on it, while keeping the most recent version in the first table.
The archive table can have the property ROW_FORMAT=COMPRESSED (MySQL InnoDB example) to take up less space, and it won't be a performance issue since it is rarely accessed. Yes, it is some overhead not to store only changesets, but if you do the math you can keep a huge number of revisions in almost no space, as your articles are highly compressible text anyway.
For example, the source code of this entire page is 11 KB compressed. That gives you almost 100 versions in 1 MB. In comparison, normal articles are quite a bit smaller and may on average give you 500-1000 articles/versions in 1 MB. You can probably afford that.
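A minimal sketch of the current-table plus archive-table scheme described above, using SQLite from Python so that it runs anywhere. SQLite has no ROW_FORMAT=COMPRESSED, so the archive body is compressed with zlib in application code as a stand-in; the table and column names are assumptions.

```python
import sqlite3
import zlib


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE article ("
        "id INTEGER PRIMARY KEY, version INTEGER NOT NULL, body TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE article_archive ("
        "id INTEGER NOT NULL, version INTEGER NOT NULL, body_z BLOB NOT NULL, "
        "PRIMARY KEY (id, version))"
    )
    return db


def save_article(db, article_id, body):
    """Archive the current row (compressed), then store the new version."""
    with db:  # one transaction: both statements succeed or neither does
        row = db.execute(
            "SELECT version, body FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO article (id, version, body) VALUES (?, 1, ?)",
                (article_id, body),
            )
            return 1
        version, old_body = row
        db.execute(
            "INSERT INTO article_archive (id, version, body_z) VALUES (?, ?, ?)",
            (article_id, version, zlib.compress(old_body.encode("utf-8"))),
        )
        db.execute(
            "UPDATE article SET version = ?, body = ? WHERE id = ?",
            (version + 1, body, article_id),
        )
        return version + 1


def load_archived(db, article_id, version):
    row = db.execute(
        "SELECT body_z FROM article_archive WHERE id = ? AND version = ?",
        (article_id, version),
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None


db = open_store()
save_article(db, 1, "first draft")
save_article(db, 1, "second draft")
print(load_archived(db, 1, 1))
```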
I have a website that is dynamic in the sense that a lot of data is generated from a database, but the contents of the database changes rarely (about 1-3 times a week). These changes are manual and controlled.
Instead of having the overhead of a dynamic website, I prefer to use static pages. I'm debating what the best solution is:
curl/wget/spider
This question mentions it. The disadvantages I see might be:
manual clean up needed (links, missing images, etc.)
cannot mix static and dynamic pages
proxy
I could use a proxy to cache the static pages for a certain number of days. Disadvantages:
hard to manage the cache of each page
need to clear the cache after each manual change?
Use program to generate static pages
My current choice: I use Perl programs to generate static pages from dynamic content. This doesn't scale very well, as I have to hard-code a lot of HTML, especially the page structure.
Any other ways to do it? What would you/do you prefer?
A Memcache-based full-page cache with a long expiry time. A tag extension could allow you to invalidate only a selected range of pages.
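The tag idea can be sketched without a running Memcache server. The class below is an in-memory stand-in, not a Memcache client, and `TaggedCache`, `set`, `get` and `invalidate_tag` are names chosen for this example only.

```python
import time


class TaggedCache:
    """Full-page cache with expiry times and tag-based invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, page)
        self._tags = {}     # tag -> set of keys carrying that tag

    def set(self, key, page, ttl_seconds, tags=()):
        self._entries[key] = (self._clock() + ttl_seconds, page)
        for tag in tags:
            self._tags.setdefault(tag, set()).add(key)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, page = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return page

    def invalidate_tag(self, tag):
        """Drop every page that was stored with this tag."""
        for key in self._tags.pop(tag, set()):
            self._entries.pop(key, None)


cache = TaggedCache()
week = 7 * 24 * 3600
cache.set("/products/1", "<html>1</html>", week, tags=["products"])
cache.set("/about", "<html>about</html>", week, tags=["static"])
cache.invalidate_tag("products")
print(cache.get("/products/1"), cache.get("/about"))
```

After a manual database change, only the tags affected by that change need to be invalidated; everything else stays cached.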
Any particular reason you want to do it this way instead of just setting up a database caching solution to stop the queries from actually having to hit the database?
Whether it's possible or not depends on the amount of dynamic data that's on your site, and the amount of memory available in your server, but it wouldn't have any of the problems you're worried about.
I would do it the same way you're doing it right now, using a script to generate static pages. You can use a templating system to avoid having to write new HTML every time.
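For instance, a generator along these lines keeps the page structure in one template instead of hard-coding it per page. This is only a sketch in Python using the standard library's `string.Template`; the row fields (`slug`, `title`, `body`) are assumed, and the same idea carries over to any Perl templating module.

```python
import html
from pathlib import Path
from string import Template

# The page structure lives in one place; $title and $body are filled per row.
PAGE = Template(
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head><title>$title</title></head>\n"
    "<body>\n"
    "<h1>$title</h1>\n"
    "<p>$body</p>\n"
    "</body>\n"
    "</html>\n"
)


def render_page(row):
    """Fill the template with one database row (a dict), escaping its values."""
    return PAGE.substitute(
        title=html.escape(row["title"]), body=html.escape(row["body"])
    )


def build_site(rows, out_dir):
    """Write one static .html file per row and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        target = out / f"{row['slug']}.html"
        target.write_text(render_page(row), encoding="utf-8")
        written.append(target)
    return written


print(render_page({"slug": "hello", "title": "Hello", "body": "Static & fast"}))
```

Rerunning `build_site` after each of the 1-3 weekly database changes regenerates the whole site.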
You have not mentioned how important it is to show the changed data as soon as possible to your user.
We have used a proxy cache successfully for our website to handle dynamic pages which get lots of hits. Depending on how soon we want the updated data to be seen by customers, we set a different cache age for each category.
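One common way to express a per-category cache age is through the `Cache-Control` response header that the proxy honours. The categories and ages below are invented for illustration; they are not the values used on the site described above.

```python
# Hypothetical categories and cache ages, in seconds.
CACHE_AGE_SECONDS = {
    "news": 60,         # should show changes quickly
    "catalog": 3600,    # changes a few times a day at most
    "archive": 86400,   # practically static
}
DEFAULT_AGE_SECONDS = 300


def cache_control_for(category):
    """Build the Cache-Control header value for a page of this category."""
    max_age = CACHE_AGE_SECONDS.get(category, DEFAULT_AGE_SECONDS)
    return f"public, max-age={max_age}"


print(cache_control_for("news"))
```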
We have a web application which contains a bunch of content that the system operator can change (e.g. news and events). Occasionally we publish new versions of the software. The software is being tagged and stored in subversion. However, I'm a bit torn on how to best version control the content that may be changed independently. What are some mechanisms that people use to make sure that content is stored and versioned in a way that the site can be recreated or at the very least version controlled?
When you identify two sets of files which have their own life cycles (software files on one side, "news and events" on the other), you know that:
you cannot version them together at the same time
you should not put the same label on them
You need to save the "news and events" files separately (either in the VCS, or in a DB like Ian Jacobs suggests, or in a CMS - Content Management System), and find a way to link the two together (an id, a timestamp, a meta-label, ...)
Do not forget that you are not only talking about two different sets of files in terms of life cycle, but also about different sets of files in terms of their very nature:
Consider the terminology introduced in this SO question "Is asset management a superset of source control" by S.Lott
software files: Infrastructure information, that is "representing the processing of the enterprise information asset". Your code is part of that asset and is managed by a VCS (Version Control System), as part of the Configuration management discipline.
"news and events": Enterprise Information, that is data (not processing); this is often split between Content Managers and Relational Databases.
So not everything should end up in Subversion.
Keep everything in the DB, and give every transaction to the DB a timestamp. That way you can keep standard DB backups and load the site content as of whatever date you want if the worst happens.
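A minimal sketch of the timestamp idea, using SQLite from Python: every change is appended with its timestamp, and the site content as of any date is the latest row per item at or before that date. The table and column names are assumptions, and timestamps are stored as ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_log ("
        "item_id TEXT NOT NULL, body TEXT NOT NULL, changed_at TEXT NOT NULL)"
    )
    return db


def record_change(db, item_id, body, changed_at):
    """Append a change; rows are never updated or deleted."""
    with db:
        db.execute(
            "INSERT INTO content_log (item_id, body, changed_at) VALUES (?, ?, ?)",
            (item_id, body, changed_at),
        )


def content_as_of(db, as_of):
    """Return {item_id: body} as the site looked at the given timestamp."""
    rows = db.execute(
        "SELECT c.item_id, c.body FROM content_log AS c "
        "WHERE c.changed_at = ("
        "  SELECT MAX(changed_at) FROM content_log "
        "  WHERE item_id = c.item_id AND changed_at <= ?"
        ") ORDER BY c.item_id",
        (as_of,),
    ).fetchall()
    return dict(rows)


db = open_db()
record_change(db, "news-1", "Launch announced", "2024-01-01T09:00:00")
record_change(db, "news-1", "Launch postponed", "2024-02-01T09:00:00")
print(content_as_of(db, "2024-01-15T00:00:00"))
```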
I suppose part of the answer depends on what CMS you're using, and how your web app is designed, but in general, I'd regard data such as news items or events as "content". In other words, it's not part of your application - it's the data which your application processes.
Of course, there will be versioning issues between your CMS code and your application code. You could manage this by defining the interface between the two. Personally, I'd publish the data to the web app as XML, which gives you the possibility of using XML schema to define exactly what the CMS is required to produce, and what the web app should expect to process.
This ought to mean that most changes in the web app can be made without a corresponding alteration in the rendering of the data. When functionality changes require this, you can create a new version of the schema and continue to make progress. In this scenario, I'd check the schema in with the web app code, but YMMV.
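As a rough illustration of the web app side of that contract: Python's standard library does not validate against XML Schema, so the sketch below uses hand-written checks as a stand-in for real schema validation. The element names (`news`, `item`, `title`, `date`, `body`) and the `schemaVersion` attribute are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract: every <item> must carry these child elements.
REQUIRED_FIELDS = ("title", "date", "body")


def parse_news_feed(xml_text, supported_versions=("1",)):
    """Parse the CMS output, rejecting anything outside the agreed contract."""
    root = ET.fromstring(xml_text)
    if root.tag != "news":
        raise ValueError(f"unexpected root element <{root.tag}>")
    version = root.get("schemaVersion")
    if version not in supported_versions:
        raise ValueError(f"unsupported schema version: {version!r}")
    items = []
    for item in root.findall("item"):
        record = {}
        for field in REQUIRED_FIELDS:
            value = item.findtext(field)
            if value is None:
                raise ValueError(f"item is missing <{field}>")
            record[field] = value
        items.append(record)
    return items


feed = (
    '<news schemaVersion="1">'
    "<item><title>Open day</title><date>2024-05-01</date><body>Join us.</body></item>"
    "</news>"
)
print(parse_news_feed(feed))
```

When the schema changes, the web app can accept both the old and the new version for a while by passing both in `supported_versions`.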
It isn't easy, and it gets more complicated again if you need additional data fields in your CMS. Expect to plan for a fairly complex release process (also depending on how complex your Dev-Test-Acceptance-Production scenario is.)
If you aren't using a CMS, then you should consider it. (Of course, if the operation is very small, it may still fall into the category where doing it by hand is acceptable.) Simply putting raw data into a versioning system doesn't solve the problem - you need to be able to control the format in which your data is published to the web app. Almost certainly this format should be something intended for consumption by software, and therefore not usually suitable for hand-editing by the kind of people who write news items or events.