I believe WordPress stores multiple entries of posts as "revisions", but I think that's a terribly inefficient use of space.
Is there a better way? I think gitit is a wiki that uses Git for version control, but how is it done? E.g. my application is in PHP, so must I make it talk to Git to commit and retrieve data?
So, what is a good way of implementing version control in web apps (e.g. in a blog it might be the post content)?
I've recently implemented just such a system, which uses the concept of superseded records together with a previous and current link. I did a considerable amount of research into how best to achieve this. In the end, the model I arrived at is similar to that of WordPress (and other systems): store each change as a new record and use that.
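A minimal sketch of the superseded-records idea, using Python's built-in sqlite3 module. The table name `post_version`, its columns, and the helpers `save_version` and `history` are hypothetical names chosen for illustration; they are not taken from WordPress or from the system described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE post_version (
        id          INTEGER PRIMARY KEY,
        post_key    TEXT NOT NULL,     -- stable identifier shared by all versions
        body        TEXT NOT NULL,     -- full content of this version, not a delta
        previous_id INTEGER REFERENCES post_version(id),
        is_current  INTEGER NOT NULL DEFAULT 1
    )
    """
)


def save_version(conn, post_key, body):
    """Insert a new full copy and mark the old current row as superseded."""
    with conn:  # run both statements in one transaction
        row = conn.execute(
            "SELECT id FROM post_version WHERE post_key = ? AND is_current = 1",
            (post_key,),
        ).fetchone()
        previous_id = row[0] if row else None
        if previous_id is not None:
            conn.execute(
                "UPDATE post_version SET is_current = 0 WHERE id = ?",
                (previous_id,),
            )
        cur = conn.execute(
            "INSERT INTO post_version (post_key, body, previous_id) VALUES (?, ?, ?)",
            (post_key, body, previous_id),
        )
        return cur.lastrowid


def history(conn, post_key):
    """Follow the previous_id links from the current version back to the first."""
    row = conn.execute(
        "SELECT body, previous_id FROM post_version "
        "WHERE post_key = ? AND is_current = 1",
        (post_key,),
    ).fetchone()
    bodies = []
    while row is not None:
        bodies.append(row[0])
        row = conn.execute(
            "SELECT body, previous_id FROM post_version WHERE id = ?",
            (row[1],),
        ).fetchone()
    return bodies


save_version(conn, "hello-world", "First draft")
save_version(conn, "hello-world", "Second draft")
print(history(conn, "hello-world"))
```

Reading the live post is a single lookup on `is_current = 1`, and older versions are only touched when someone asks for the history.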
Considering all of the options available, space is really the last concern for authored content such as posts - media files take up way more space and these can't be stored as deltas anyway.
In any case, Git works in much the same way: it stores the entire content of every revision, except that it will eventually pack objects down into deltas (or do so when you ask it to).
Going back to 1990, we were using SCCS or RCS, and sometimes with only 30 MB of disk space free we really needed the version control to be efficient to avoid running out of storage.
Using deltas to save space is not really worth all of the associated aggravation given the average amount of available storage on modern systems. You could argue it's wasteful of space, but I'd argue that it is much more efficient in the long run to store things uncompressed in their original form:
it's faster
it's easier to search through old versions
it's quicker to view
it's easier to jump into the middle of a set of changes without having to process a lot of deltas.
it's a lot easier to implement because you don't have to write delta generation algorithms.
Also, markup doesn't fare as well as plain text with deltas, especially when it is edited with a WYSIWYG editor.
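Storing full copies does not mean giving up diffs: a diff for display can be computed on demand from any two stored versions. A small sketch using Python's standard difflib module follows; the function name `diff_versions` and the sample texts are made up for illustration.

```python
import difflib


def diff_versions(old_text, new_text):
    """Return a unified diff between two full copies, computed on demand."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


old = "Hello world\nThis is a post\n"
new = "Hello world\nThis is an edited post\n"
for line in diff_versions(old, new):
    print(line)
```

Nothing here needs to be persisted; the diff is thrown away after it is shown, so the stored data stays in its original, searchable form.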
Keep one table with the most recent version of, e.g., the article.
When a new version is saved, move the current one over to an archive table and put a version number on it, while keeping the most recent version in the first table.
The archive table can have the property ROW_FORMAT=COMPRESSED (MySQL InnoDB example) to take up less space, and it won't be a performance issue since it is rarely accessed. Yes, storing full copies rather than only changesets adds some overhead, but if you do the math you can keep a huge number of revisions in almost no space, as your articles are highly compressible text anyway.
For example, the source code of this entire page is 11 KB compressed. That gives you almost 100 versions in 1 MB. In comparison, normal articles are quite a bit smaller and may on average give you 500-1000 articles/versions in 1 MB. You can probably afford that.
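The current-table-plus-archive layout can be sketched roughly as follows. This is an illustration under stated assumptions, not the MySQL setup itself: it uses SQLite through Python's sqlite3 module, and it compresses the archived body with zlib in application code as a stand-in for ROW_FORMAT=COMPRESSED. The table names `article` and `article_archive` and the helper functions are hypothetical.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (
        id      INTEGER PRIMARY KEY,
        version INTEGER NOT NULL,
        body    TEXT NOT NULL
    );
    CREATE TABLE article_archive (
        article_id INTEGER NOT NULL,
        version    INTEGER NOT NULL,
        body_z     BLOB NOT NULL,       -- zlib-compressed full copy
        PRIMARY KEY (article_id, version)
    );
    """
)


def save_article(conn, article_id, new_body):
    """Archive the current row (if any), then store new_body as the current version."""
    with conn:  # archive and update in one transaction
        row = conn.execute(
            "SELECT version, body FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO article (id, version, body) VALUES (?, 1, ?)",
                (article_id, new_body),
            )
            return 1
        version, old_body = row
        conn.execute(
            "INSERT INTO article_archive (article_id, version, body_z) "
            "VALUES (?, ?, ?)",
            (article_id, version, zlib.compress(old_body.encode("utf-8"))),
        )
        conn.execute(
            "UPDATE article SET version = ?, body = ? WHERE id = ?",
            (version + 1, new_body, article_id),
        )
        return version + 1


def load_archived(conn, article_id, version):
    """Return the decompressed body of an archived version, or None if absent."""
    row = conn.execute(
        "SELECT body_z FROM article_archive WHERE article_id = ? AND version = ?",
        (article_id, version),
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None


save_article(conn, 1, "First version")
save_article(conn, 1, "Second version")
print(load_archived(conn, 1, 1))
```

On the space estimate: at 11 KB per compressed page, 1 MB (1024 KB) holds 1024 / 11 ≈ 93 versions, which matches "almost 100".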
Is there a limit to the number of versions a content item can have in AEM? I want to retain all the versions of my page, i.e. an unlimited number.
I want to know whether AEM has an internal limit after which it automatically removes older versions.
Appreciate any thoughts on this.
Although this is not recommended, you can disable version purging in the Version Manager by setting versionmanager.purgingEnabled to false. You will need to configure this as described in the document below:
https://docs.adobe.com/docs/en/aem/6-3/deploy/configuring/version-purging.html#Version Manager
Retaining lots of versions will gradually slow down your instance and result in poor authoring performance as the storage (Tar or Mongo) will grow large with stale data.
It is normally recommended to retain versions for a fixed number of days or up to a fixed number of versions.
For performance reasons, it is better to back up your AEM instance to preserve older archived versions and rely on a restore to access them.
I once asked Adobe DayCare this question and received a response similar to the one in i.net's post: it is possible to disable version purging for pages, but it comes with the risk of authoring performance issues, as pages can start loading very slowly.
The solutions that were suggested (depending on the requirements):
backing up an instance, which is not the best option if you need to be able to retrieve, compare, or recover old content at any time; the disadvantage is that a full copy of the instance needs to be stored, and the backup needs to be repeated from time to time (when you notice performance issues)
designing and implementing a custom solution with an additional instance that would be responsible for storing these versions; I don't have many details on that solution, but as I understood it, it would require a deep analysis of how it could be done
if access to previous content is needed only for historical reasons (no need to retrieve it and publish it again), making use of the page-to-PDF extraction mechanism and storing the history in DAM or another place; you can then also consider saving a full-page screenshot to PDF with the design included (not the content only), covering different browser breakpoints, annotations, etc., depending on requirements
Now (6:13pm Jun 1, 2012): I resign myself to learning Git and GitHub so that I can do version control. I won't need to mail copies of the (compressed) code to myself, but I still don't understand the mechanism after a day of looking at this stuff.
I get the SHA-1 concept for uniquely identifying a file, and using the first 2 characters of the hash as a directory name. But I'm still confused about the updates, pointers, and merge business.
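For reference, the SHA-1 naming described above can be reproduced in a few lines of Python. This is a sketch of how Git names a file's content (a "blob") in a repository using the default SHA-1 object format; the function names are made up for illustration, and the sketch only computes the name and path, it does not write the (zlib-compressed) object itself.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash the content the way Git does for a blob: header, NUL byte, then data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters form the directory, the remaining 38 the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_blob_id(b"hello\n")
print(oid)
print(loose_object_path(oid))
```

Because the name depends only on the content, two identical files share one object, and any change to a file produces a new object with a new name.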
Previously: I have multiple versions of programs, so I can regress to an earlier one to solve a problem.
I used to like to compress the one I was using and send it to myself via email, but today when I did that the compressed version was too small (49 KB instead of 6 MB). So I guess I am referencing the "workspace" (the extension on the app is ".xcworkspace").
I probably shouldn't waste too much time on this problem, since it is merely a backup, but on the other hand, having the full size is an indication that the whole app is self contained, instead of pointers elsewhere that may be inadvertently changed or destroyed.
Is there any way to "undo" my current version to have all the correct data, or is it really tough?
From personal experience, I agree with the other commenters that Git is the way to go, or even Mercurial. The learning curve flattens out after a while, especially if your needs are modest.
As to the need for a "Poor Man's Version Control", sometimes you do need one. For example, you may work for an employer that does not allow downloading and using non-corporate software, and where the centralized VCS may not be used for ad hoc, experimental, or skunkworks projects.
Related post: poor mans source control zip project files on build
I'm not sure how to get back any changes without knowing more about your set up, but I can recommend that you look into a slightly new setup: your email-an-archive-to-yourself system sounds like a poor man's Revision Control System, except worse than poor, because there are plenty of great RCS tools available for free.
I recommend you spend an hour or so and read about git. If you learn a few commands you can have a complete change history of your project, and jump back to any point in time you like. (And then change history, creating alternate timelines, become your own grandparent, and cause all sorts of problems/adventures.) Most of the time version control is used in the context of a development team, but it provides a lot of benefit even for a lone wolf.
I am setting up a simple online CMS/editing system with a few editors and would like a simple audit trail with diff, history, comparison, and rollback functionality for small bits of text.
Our editors have gotten used to the benefits of using XML / Svn and I really would like to create a simple version of this in my system.
I realise I could probably create my own using, say, a versions/history DB with linked IDs like this, but I wondered whether this is the best way or whether there is an equivalent of an SVN-API-style interface available.
Btw I am totally new to MongoDB so go easy on me :-)
Cheers
Putting the data files that make up the database under version control is not a good idea, since they consist only of binary data. Additionally, they are rather large from the beginning, since MongoDB preallocates some disk space for them. So you gain nothing by putting the data folders under version control.
If you want to track changes, you could export the data into its serialized form and store that in your VCS. As the export grows, the advantage of the VCS may diminish, since it will become very slow.
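One way to make such an export diff-friendly is to write each document as its own text file with a stable key order, so that an unchanged document produces an identical file. The sketch below assumes the documents have already been fetched as plain Python dicts with a string `_id`; real MongoDB documents may contain types such as ObjectId or dates that need converting first. The function name `export_documents` and the sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_documents(documents, out_dir):
    """Write each document to <out_dir>/<_id>.json with sorted keys."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in documents:
        path = out / f"{doc['_id']}.json"
        # sort_keys and a fixed indent keep the output stable between exports,
        # so the VCS only sees real content changes.
        text = json.dumps(doc, sort_keys=True, indent=2, ensure_ascii=False)
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = export_documents(
        [{"_id": "intro", "title": "Welcome", "text": "Hello"}], tmp
    )
    print(paths[0].read_text(encoding="utf-8"))
```

The export directory can then be committed like any other set of text files; one file per document keeps each diff small.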
I assume you need to track the changes from within the data but since you deal with binary data, you are out of luck.
I know that a VCS is absolutely critical for a developer to increase productivity and protect the code, no doubts about it. But what about a designer, using say, Photoshop (though it's not specific to any tools, just to make my point clearer).
VCSs use delta compression to store different versions of files. This works very well for code, but for images, that's a problem. Raster image files are binary formats, though vector image files are text (SVG comes to mind) and pose no problem. The problem comes with .psd files (and any other image "source" file): those can get pretty big, and since I'm not familiar with the format, I'll consider them binary files. How would a VCS work in this situation?
The repository could be pretty darned big if the VCS server isn't able to diff the files efficiently (or worse, not at all) and over time this can become a really big pain when someone needs to check out the repository (or clone it if using a DVCS).
Have any of you used a VCS for this purpose? How well does it work? I'm mostly interested in Mercurial, though this is a general situation that applies to any VCS.
Designers usually use specialized tools like AlienBrain, Adobe VersionCue or similar, which are essentially version control systems that understand images and other media assets, which allows things like diffing two images.
Designers IMHO should definitely use a VCS, at least as a means of versioning and backup. Their stuff is just as important as specs, documentation, code, deploy scripts, and everything else that makes up a project.
I do not know if there are bridges between "Asset Management Systems" like the ones mentioned and developer VCSs, though.
Version Control Systems are useful for ANYONE who is doing work that they might need an older version of at some later date. That said, I have set up all my creative friends with Subversion (in the past), and now I recommend Git, even for those who are doing video editing with hundreds of gigs of video. They can archive off the projects when they get final payment. Drive space is CHEAP, cheaper than ever before; size isn't an issue in any modern VCS. Being able to revert to a previous working state or experiment with something without losing data and manually managing multiple "temp" directories is invaluable if you bill by the hour.
Yes
Don't worry about the size, if you run out of space, just buy a larger hard drive.
Losing information will be far costlier.
In addition to a VCS (any will do, as you won't be needing delta storage), do regular backups.
When doing a checkout, you shouldn't be checking out the root of the repository but rather the specific branch for your project; that way it won't be slower than a simple copy operation of that folder.
I definitely recommend using version control for any type of file you care about or can't afford to lose. Disk space is cheap, and as has already been pointed out, it'd be far worse to lose a bunch of important files than to spend a few extra bucks on a new HDD. I recommend Subversion since it has file locking, an important feature when working with binary files under version control, to prevent ugly or impossible merge conflicts.
I believe so. Especially if you wish to track changes over time or need to rollback to previous versions. Centralized source control may be the way to go if you're worried about the size.
I am looking into improving the backup process a group of animators use. Currently they back up their work into external hard drives or DVDs manually, taking full copies of everything. The data consists of thousands of high resolution images, project files of various video editing software and sound files. Basically everything is binary data and nothing should ever be merged on checkin.
Should I investigate version control systems that I would use as a software developer (Subversion, Git etc.), or is there a class of version control systems intended for non-SW data that would suit these needs better?
You could also check out AlienBrain. It's a project asset management system designed for artists.
If your scope is just "backup" then I'd say stick to backup solutions.
But if you are thinking about the whole lifecycle of the animator's work, then the type of use typically falls into the "Digital Asset Management" category for the very reasons you mention: huge data volumes; binary formats.
Since version control (SCM) software is usually designed for text files that can be diff'd and merged, it tends not to do so well with binary formats in high volume. While your average web graphics are not going to be an issue for (software) version control tools, you mention video, which puts you in another league.
The bad news (maybe; it depends on your business) is that DAM is dominated by the big end of town. @Atmospherian has mentioned AlienBrain, which is a good representative of a niche offering for artists. At the other end of the spectrum you have more general-purpose offerings like Oracle's UCM (formerly Stellent). Make sure you check the price tags though.
There must be open source or lower cost alternatives available - but I don't know them, sorry.
What does seem to be very common are custom in-house solutions. Unlike managing code, where changes to the files themselves have their own significance, managing digital assets tends to focus on the metadata (the image/video is just an associated blob). And since many shops have their own particular production workflow, it makes the territory ripe for some skunkworks programming (if that's your bent, go for it!).
So while I'm not recommending any particular products, I suggest if you think "digital asset management" rather than "version control" when scouting for solutions you will probably find answers more suited to your needs.
Your question is a little unclear - you seem to have conflated version control and backup.
If what you want is version control, then take a look at the list on Wikipedia: Comparison of revision control software. That shows most of the widely known version control systems and their basic features. You're looking for something that you can set up to force users to check out before they edit. Be aware that commercial solutions range in price from moderately expensive up to 'You want HOW much?'
If what you want is backup software, then I'd start at List of backup software on Wikipedia. There are a lot more choices in the backup software arena, and there are a lot of price points.
Either way, figure in the creation of an admin position (either as part of someone's job or a new person altogether, if you're big enough). I've worked with backup and version control systems that didn't have an admin, and it's a problem. Either no one takes care of problems, or everyone gets their fingers in there and really screws things up. Either way, making it part of someone's job (officially) is the best way to limit damage.
I think ClearCase would work for you, the reason being that everything is stored in a VOB (Versioned Object Base), no matter what it is! Give it a look.
From your description, it sounds like you would do pretty well with some basic backup software such as Retrospect. Using daily backups of workstations, only changed data would be backed up and it would be easy to roll back to an earlier version of a file if needed.
What you don't get from such a setup is the ability to check out / check in files and get warnings about conflicts.
Vidyatel has editing software that can compare video content and find the differences between video versions, relying on the video alone.
The result is in EDL/TC.
It might help.
You should take a look at boar. It is exactly what you want, "version control and backup for photos, videos and other binary files". It is version control designed for large binary files.