How can I code, or directly use, a version control system such as Subversion with MongoDB? - mongodb

I am setting up a simple online CMS/editing system with a few editors and would like a simple audit trail with diff, history, comparison and rollback functionality for small bits of text.
Our editors have gotten used to the benefits of using XML/SVN, and I would really like to create a simple version of this in my system.
I realise I could probably build my own using, say, a versions/history DB with linked IDs, but I wondered whether this is the best way, or whether there is an equivalent to an SVN-style API available?
BTW, I am totally new to MongoDB, so go easy on me :-)
Cheers
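A minimal sketch of the versions/history approach described in the question, in Python. The two dicts stand in for two MongoDB collections (one holding the current text, one holding every saved version linked by document ID); the class and method names are hypothetical, and a real app would replace the dicts with database calls. Diffs come from the standard library's `difflib`, and a rollback is saved as a new version so no history is lost.

```python
import difflib
from datetime import datetime, timezone


class VersionedStore:
    """Hypothetical in-memory stand-in for a `documents` and a `history` collection."""

    def __init__(self):
        self.documents = {}  # doc_id -> {"version": int, "text": str}
        self.history = []    # one entry per saved version, linked by doc_id

    def save(self, doc_id, text, editor):
        current = self.documents.get(doc_id)
        version = 1 if current is None else current["version"] + 1
        self.documents[doc_id] = {"version": version, "text": text}
        # Every version, including the newest, is appended to the history.
        self.history.append({
            "doc_id": doc_id,
            "version": version,
            "text": text,
            "editor": editor,
            "saved_at": datetime.now(timezone.utc),
        })
        return version

    def versions(self, doc_id):
        return [h for h in self.history if h["doc_id"] == doc_id]

    def get_version(self, doc_id, version):
        for h in self.versions(doc_id):
            if h["version"] == version:
                return h["text"]
        raise KeyError((doc_id, version))

    def diff(self, doc_id, old, new):
        """Unified diff between two saved versions, like `svn diff -r old:new`."""
        a = self.get_version(doc_id, old).splitlines(keepends=True)
        b = self.get_version(doc_id, new).splitlines(keepends=True)
        return "".join(difflib.unified_diff(a, b, f"v{old}", f"v{new}"))

    def rollback(self, doc_id, version, editor):
        # Re-save the old text as a new version instead of deleting history.
        return self.save(doc_id, self.get_version(doc_id, version), editor)
```

Calling `save` twice and then `diff(doc_id, 1, 2)` gives a patch-style comparison; `rollback(doc_id, 1, editor)` restores the first text as version 3.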

Putting the data files that make up the database under version control is not a good idea, since they consist only of binary data. Additionally, they are rather large from the beginning, because MongoDB allocates disk space for them up front. So you gain nothing by putting the data folders under version control.
If you want to track changes, you could export the data in a serialized form and store that in your VCS. As the export grows, though, the advantage of the VCS may shrink, since it will become very slow.
I assume you need to track changes within the data itself, but since the data files are binary, you are out of luck there.
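As a rough illustration of exporting to a serialized form, the Python sketch below writes each document to its own pretty-printed JSON file with sorted keys, so a VCS can diff the files line by line. It assumes the documents arrive as plain dicts (for instance from a pymongo `find()` call, which is an assumption rather than something the answer specifies); the function names are hypothetical.

```python
import json
from pathlib import Path


def serialize_doc(doc):
    """Return (filename, text) for one document.

    Sorted keys and a fixed indent keep the output stable between exports,
    so a VCS diff shows only the fields that actually changed.
    default=str turns values JSON can't encode (e.g. ObjectId, datetime) into strings.
    """
    text = json.dumps(doc, indent=2, sort_keys=True, default=str) + "\n"
    return f"{doc['_id']}.json", text


def export_for_vcs(docs, out_dir):
    """Write every document to out_dir, one file per document, ready to commit."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for doc in docs:
        name, text = serialize_doc(doc)
        (out / name).write_text(text, encoding="utf-8")
```

One file per document keeps each commit's diff small, which helps a little with the slowdown the answer mentions, though it does not remove it.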

Related

How can revisions/version control be implemented for web apps' data

I believe WordPress stores multiple copies of posts as "revisions", but isn't that a terribly inefficient use of space?
Is there a better way? I think gitit is a wiki that uses Git for version control, but how is it done? For example, my application is in PHP; must I make it talk to Git to commit and retrieve data?
So, what is a good way of implementing version control in web apps (e.g. for the post content in a blog)?
I've recently implemented just such a system, which uses the concept of superseded records together with previous and current links. I did a considerable amount of research into how best to achieve this; in the end, the model I arrived at is similar to that of WordPress (and other systems): store each change as a new record and use that.
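One way to sketch the superseded-records model in Python, assuming an in-memory store with hypothetical names: each save creates a new revision that links back to the one it replaced, the replaced revision is flagged as superseded, and a per-post pointer tracks the current revision.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # simple id generator for the sketch


@dataclass
class Revision:
    content: str
    previous_id: Optional[int] = None  # link back to the revision this one replaced
    superseded: bool = False           # True once a newer revision exists
    id: int = field(default_factory=lambda: next(_ids))


class PostStore:
    """Hypothetical store: every edit becomes a new Revision record."""

    def __init__(self):
        self.revisions = {}  # revision id -> Revision
        self.current = {}    # post slug -> id of the current revision

    def save(self, slug, content):
        prev_id = self.current.get(slug)
        if prev_id is not None:
            self.revisions[prev_id].superseded = True
        rev = Revision(content, previous_id=prev_id)
        self.revisions[rev.id] = rev
        self.current[slug] = rev.id
        return rev

    def history(self, slug):
        """Walk the previous links from newest to oldest."""
        rev_id = self.current.get(slug)
        while rev_id is not None:
            rev = self.revisions[rev_id]
            yield rev
            rev_id = rev.previous_id
```

Reading the live post is a single lookup through `current`, while the full history is only walked when someone asks for it.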
Considering all of the options available, space is really the last concern for authored content such as posts; media files take up far more space, and they can't be stored as deltas anyway.
In any case, Git works in a virtually identical way: it stores the entire content of every revision, and only eventually packs it down into deltas (or does so when you ask it to).
Back in 1990 we were using SCCS or RCS, sometimes with only 30 MB of free disk space, so we really needed version control to be efficient to avoid running out of storage.
Using deltas to save space is not really worth all of the associated aggravation, given the average amount of storage available on modern systems. You could argue that it's wasteful of space; however, I'd argue that in the long run it is much more efficient to store things uncompressed in their original form:
it's faster
it's easier to search through old versions
it's quicker to view
it's easier to jump into the middle of a set of changes without having to process a lot of deltas.
it's a lot easier to implement because you don't have to write delta generation algorithms.
Also, markup doesn't fare as well as plain text with deltas, especially when edited with a WYSIWYG editor.
Keep one table with the most recent version of, e.g., each article.
When a new version is saved, move the current version into an archive table and give it a version number, while keeping the most recent version in the first table.
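A small sketch of the current-table-plus-archive-table scheme using Python's built-in `sqlite3`. SQLite has no `ROW_FORMAT=COMPRESSED` option, so this sketch compresses archived bodies with `zlib` instead; the table, column and function names are hypothetical.

```python
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    id      INTEGER PRIMARY KEY,
    version INTEGER NOT NULL,
    body    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS article_archive (
    article_id INTEGER NOT NULL,
    version    INTEGER NOT NULL,
    body_z     BLOB NOT NULL,  -- zlib-compressed body
    PRIMARY KEY (article_id, version)
);
"""


def save_article(conn, article_id, body):
    """Move the current row into the archive, then store the new version."""
    with conn:  # one transaction: archive + update succeed or fail together
        row = conn.execute(
            "SELECT version, body FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            conn.execute("INSERT INTO article (id, version, body) VALUES (?, 1, ?)",
                         (article_id, body))
            return 1
        version, old_body = row
        conn.execute(
            "INSERT INTO article_archive (article_id, version, body_z) VALUES (?, ?, ?)",
            (article_id, version, zlib.compress(old_body.encode("utf-8"))),
        )
        conn.execute("UPDATE article SET version = ?, body = ? WHERE id = ?",
                     (version + 1, body, article_id))
        return version + 1


def load_version(conn, article_id, version):
    """Read from the hot table if it is the current version, else decompress from the archive."""
    row = conn.execute("SELECT version, body FROM article WHERE id = ?",
                       (article_id,)).fetchone()
    if row is not None and row[0] == version:
        return row[1]
    arch = conn.execute(
        "SELECT body_z FROM article_archive WHERE article_id = ? AND version = ?",
        (article_id, version),
    ).fetchone()
    if arch is None:
        raise KeyError((article_id, version))
    return zlib.decompress(arch[0]).decode("utf-8")
```

Usage: open a connection with `sqlite3.connect(...)`, run `conn.executescript(SCHEMA)` once, then call `save_article` on every edit. Normal page views only ever touch the small `article` table.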
The archive table can use ROW_FORMAT=COMPRESSED (a MySQL InnoDB example) to take up less space, and it won't be a performance issue since it is rarely accessed. Yes, storing whole copies rather than only changesets adds some overhead, but if you do the math, you can keep a huge number of revisions in very little space, as articles are highly compressible text anyway.
For example, the source code of this entire page is 11 KB compressed. That gives you almost 100 versions per MB. In comparison, normal articles are quite a bit smaller and may on average give you 500-1000 articles/versions per MB. You can probably afford that.
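The arithmetic behind those figures, taking 1 MB as 1024 KB:

```latex
\[
\frac{1024\ \text{KB}}{11\ \text{KB per version}} \approx 93\ \text{versions per MB}
\]
\[
\frac{1024\ \text{KB}}{1000\ \text{versions}} \approx 1\ \text{KB per version},
\qquad
\frac{1024\ \text{KB}}{500\ \text{versions}} \approx 2\ \text{KB per version}
\]
```

So the 500-1000 range corresponds to articles of roughly 1-2 KB each after compression.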

Do any source-control systems use a document database for storage?

One of those questions that's difficult to google.
We were running into issues the other day with the speed of our SVN repository. The standard solution to this seems to be "more RAM! more CPU!" etc., which got me wondering: are there any source-control systems that use a document/NoSQL database (MongoDB, CouchDB, etc.) for storage? It seems like it might be a natural fit, but I'm no expert on source-control database theory. Perhaps there's a way to configure a more recent source-control system to use a document DB as storage?
None that I know of do, and they wouldn't want to. Given the difference in degrees of testing, it would likely hurt robustness (a really bad thing for a source code repository). It would probably also end up hurting performance, because of the inability to do delta storage.
Note that Subversion has two very different storage mechanisms: one backed by the embedded Berkeley DB, and the other (FSFS) backed by plain files. One or the other of these might be better suited to your usage.
Also, since you posed your question pretty broadly, I'll comment on Git and TFS.
Git uses very efficiently packed files in the filesystem to store the repository. Frequently, the entire history is smaller than a checkout. For one very old project that my lab has, the entire history is 57MiB, and a working tree (not counting history) is 56MiB.
TFS stores a lot (possibly all) of its data in a SQL database.
Git uses memory-mapped files just like MongoDB :)
Git doesn't actually use MongoDB, though, and I don't think it would want to. If you look at Git, it doesn't really need a NoSQL DB; it basically is a DB.
As far as I know, none of the VCSs uses a NoSQL/document-based database. The idea of using CouchDB etc. is not new, but no one has implemented such a thing so far.
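To illustrate the point that Git "basically is a DB", here is a toy content-addressed store in Python. The object id follows Git's blob format (SHA-1 over a `blob <size>` header, a NUL byte and the content, stored zlib-compressed), so `blob_id` should agree with `git hash-object` for the same bytes; the class itself and its in-memory dict are a simplification, not Git's on-disk layout.

```python
import hashlib
import zlib


def _raw_blob(data: bytes) -> bytes:
    # Git's blob encoding: "blob <size>\0" followed by the raw content.
    return b"blob " + str(len(data)).encode() + b"\0" + data


class ObjectStore:
    """Toy key/value store where the key is the hash of the value (like .git/objects)."""

    def __init__(self):
        self.objects = {}  # sha1 hex -> zlib-compressed raw object

    @staticmethod
    def blob_id(data: bytes) -> str:
        return hashlib.sha1(_raw_blob(data)).hexdigest()

    def put(self, data: bytes) -> str:
        raw = _raw_blob(data)
        oid = hashlib.sha1(raw).hexdigest()
        # Identical content always gets the same id, so it is stored only once.
        if oid not in self.objects:
            self.objects[oid] = zlib.compress(raw)
        return oid

    def get(self, oid: str) -> bytes:
        raw = zlib.decompress(self.objects[oid])
        return raw[raw.index(b"\0") + 1:]  # strip the header
```

Because keys are derived from content, storing the same file twice costs nothing extra, which is part of why a full-snapshot model stays affordable.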

Is a VCS appropriate for usage by a designer?

I know that a VCS is absolutely critical for a developer to increase productivity and protect the code, no doubts about it. But what about a designer, using say, Photoshop (though it's not specific to any tools, just to make my point clearer).
VCSs use delta compression to store different versions of files. This works very well for code, but for images it's a problem. Raster image files are binary formats, though vector image files are text (SVG comes to mind) and pose no problem. The problem comes with .psd files (and any other image "source" files): those can get pretty big, and since I'm not familiar with the format, I'll treat them as binary files. How would a VCS work in this situation?
The repository could be pretty darned big if the VCS server isn't able to diff the files efficiently (or worse, not at all) and over time this can become a really big pain when someone needs to check out the repository (or clone it if using a DVCS).
Have any of you used a VCS for this purpose? How well does it work? I'm mostly interested in Mercurial, though this is a general situation that applies to any VCS.
Designers usually use specialized tools like AlienBrain, Adobe VersionCue or similar, which are essentially version control systems that understand images and other media assets, allowing things like diffing two images.
Designers IMHO should definitely use VCSs, at least as a means of versioning and backup; their work is just as important as specs, documentation, code, deploy scripts and everything else that makes up a project.
I do not know whether there are bridges between "asset management systems" like the ones mentioned and developer VCSs, though.
Version control systems are useful for ANYONE doing work that they might need an older version of at some later date. That said, I have set up all my creative friends with Subversion (in the past), and now I recommend Git, even to those doing video editing with hundreds of gigs of video. They can archive the projects when they get final payment. Drive space is CHEAP, cheaper than ever before; size isn't an issue in any modern VCS. Being able to revert to a previous working state, or to experiment with something without losing data and manually managing multiple "temp" directories, is invaluable if you bill by the hour.
Yes
Don't worry about the size, if you run out of space, just buy a larger hard drive.
Losing information will be far costlier.
In addition to a VCS (any will do, as you won't be needing delta storage), do regular backups.
When checking out, you shouldn't check out the root of the repository, but rather a specific branch of your project; that way it won't be slower than a simple copy of that folder.
I definitely recommend using version control for any type of file you care about or can't afford to lose. Disk space is cheap, and, as has already been pointed out, it'd be far worse to lose a bunch of important files than to spend a few extra bucks on a new HDD. I recommend Subversion since it has file locking, an important feature when versioning binary files, because it prevents ugly or impossible merge conflicts.
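A hypothetical Python sketch of the lock-modify-unlock idea behind that recommendation. It models the concept only and is not Subversion's API: a user must hold a file's lock before committing, so two people cannot produce competing edits of a file that can't be merged. The lock is released on a successful commit, which is also Subversion's default behaviour.

```python
class LockError(Exception):
    pass


class LockingRepo:
    """Hypothetical lock-modify-unlock workflow for unmergeable (binary) files."""

    def __init__(self):
        self.files = {}  # path -> committed bytes
        self.locks = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def commit(self, path, data, user):
        # Only the lock holder may commit, so edits to a binary file are serialized.
        if self.locks.get(path) != user:
            raise LockError(f"{user} must lock {path} before committing")
        self.files[path] = data
        del self.locks[path]  # release the lock after a successful commit
```

The second designer who tries to lock `logo.psd` while it is held gets an error up front, instead of discovering an unmergeable conflict after hours of work.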
I believe so. Especially if you wish to track changes over time or need to rollback to previous versions. Centralized source control may be the way to go if you're worried about the size.

Source control system for lazy people?

I've used CVS and SVN. The problem I run into with both of them is that you have to explicitly perform all of the add/delete/update/move etc. operations using a tool that remembers those actions so that they can be committed. Tools like TortoiseSVN make life easier, but not as easy as I would like. IDE integration is nice, too, but I don't like being bound to do everything in an IDE. My problem is that I'll accidentally make updates or rename folders without using the appropriate tools, and then my source gets messed up.
Is there a simple source control tool that will let me work however I want in a folder structure, and allow me to sync everything when I'm done?
I realize that this would make some features of traditional source control impossible, but I'd be fine with that.
Newer distributed source control systems like Git, Bazaar and Mercurial (aka Hg) all tend to be better at detecting broad changes made in the file system, such as moving directories, renaming files and even replacing large chunks of the file system.
From my reading, Bazaar and Mercurial were essentially built from the ground up to handle this sort of free-form editing specifically because of how explicit SVN required you to be.
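Git, for example, does not record renames at all; it infers them by comparing file contents between two snapshots. A simplified Python sketch of that idea, with hypothetical function names, follows. It only pairs files whose content is byte-for-byte identical, whereas Git also detects renames of files that were edited, using a content-similarity threshold.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to a hash of its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha1(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def detect_changes(old, new):
    """Compare two snapshots; pair deleted and added paths with identical content as renames."""
    deleted = {path: h for path, h in old.items() if path not in new}
    added = {path: h for path, h in new.items() if path not in old}
    by_hash = {}
    for path, h in deleted.items():
        by_hash.setdefault(h, []).append(path)
    renames = []
    for path, h in sorted(added.items()):
        if by_hash.get(h):
            renames.append((by_hash[h].pop(0), path))
    renamed_from = {a for a, _ in renames}
    renamed_to = {b for _, b in renames}
    return {
        "renamed": renames,
        "added": sorted(set(added) - renamed_to),
        "deleted": sorted(set(deleted) - renamed_from),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Because detection happens at comparison time, it does not matter which tool (or no tool at all) was used to move the files.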
Dropbox is a nice versioned source control-style app that requires a little setup and zero maintenance. Reverting to old versions isn't as efficient as the usual source control, and it tracks all your changes (you can't choose whether to commit).
Basically, you create a Dropbox folder on your computer, and everything you save is automatically synchronized. It's pretty fast (it reacts in minutes, not hours), and you get a gig or two of space.
So, you get less control, but it's super easy. I personally use it for my Password Safe database and my "to-do" list, so I can access them from any computer.
Even though you've had some issues with SVN in the past, I still think it's the way to go. VisualSVN Server provides a simple interface for setting everything up, and there are a ton of free or low-cost tools to use.
While this does not answer the question itself, I'd strongly recommend training yourself to use the version control system of your choice for everything. Once using the VC system becomes a habit, there isn't much risk of accidentally messing up your source folder anymore (of course, the human factor still remains), and you get much better logs when explicitly working version-controlled than from any tool doing automatic change tracking.
On Windows, TortoiseSVN is a really nice tool that makes it quite easy not to overlook the VC system.
(To give an analogy for training oneself to use a VC system: once you've trained yourself to touch-type, you no longer know how you managed to live without it. That happened to me several years ago, when I decided I should learn it, at least to be able to judge it. I really don't understand how I typed before.)
As for the renaming issue, you might want to look into Git. It copes with renames/moves made without going through the VC system itself.

need to implement versioning in Online backup tool

I am working on the development of an application that will perform online backup of the files and folders on a PC, automatically or manually. Currently, I keep only the latest version of each file on the server. Now I have to implement versioning so that only the changes are transferred to the online server, and the user must be able to download any available version of a file from the backup server.
I need to perform deduplication for this. I am able to do it using fixed-size blocks, but I am facing the overhead of transferring a file containing the CRC information with each version backup.
I have never worked with such technology, so I lack experience. Is there any feasible way to embed this functionality in the application without much pain? Would any third-party tool help with this? Please let me know.
Note: I am using FTP protocol to transfer the data.
There's a program called dump that does something similar, but it operates on filesystem blocks rather than files. rsync also may be of interest.
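The core trick in rsync is a weak checksum that can be rolled forward one byte at a time, so the sender can find blocks the receiver already has even at shifted offsets, and then confirm each candidate with a strong hash. A Python sketch of that idea follows; the weak checksum follows the published rsync algorithm, while using SHA-256 as the strong hash and all function names are choices made for this sketch.

```python
import hashlib

M = 1 << 16  # the weak checksum's two halves are kept modulo 2**16


def weak_checksum(block):
    """rsync-style weak checksum of a whole block, returned as (a, b)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte: drop out_byte, append in_byte, in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def signatures(old, block_size):
    """Checksums of the full fixed blocks of the old file (what the server already holds)."""
    sigs = {}
    for off in range(0, len(old) - block_size + 1, block_size):
        block = old[off:off + block_size]
        sigs.setdefault(weak_checksum(block), []).append(
            (hashlib.sha256(block).digest(), off // block_size))
    return sigs


def find_matches(new, sigs, block_size):
    """Return (offset_in_new, old_block_index) for each block of `new` the server already has."""
    matches = []
    if len(new) < block_size:
        return matches
    i = 0
    a, b = weak_checksum(new[:block_size])
    while True:
        candidates = sigs.get((a, b))
        if candidates:  # cheap weak match; confirm with the strong hash
            strong = hashlib.sha256(new[i:i + block_size]).digest()
            hit = next((idx for h, idx in candidates if h == strong), None)
            if hit is not None:
                matches.append((i, hit))
                i += block_size  # jump past the matched block
                if i + block_size > len(new):
                    break
                a, b = weak_checksum(new[i:i + block_size])
                continue
        if i + block_size >= len(new):
            break
        a, b = roll(a, b, new[i], new[i + block_size], block_size)
        i += 1
    return matches
```

Everything not covered by a match is literal data that actually has to be uploaded; the matched blocks are sent as references to the server's existing copy.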
You will need to keep track of a large number of blocks with multiple versions and how they fit into the various versions of the original files, so you will need some kind of database to track this information, and an efficient way to query it to determine which blocks in a given file need to be transferred. Also note that adding something to the beginning of a file will cause all your blocks to be "new" if you use a naive blocking and diff scheme.
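To make the last point concrete, here is a Python sketch comparing naive fixed-size blocks with content-defined chunking, where a rolling hash over a small window picks chunk boundaries from the content itself, so an insertion near the start only disturbs the first chunk. A dict keyed by SHA-256 stands in for the block database, and each version is stored as a list of chunk keys. The window size, mask and hash constants are arbitrary illustrative choices; production chunkers also enforce minimum and maximum chunk sizes, which this sketch omits.

```python
import hashlib


def fixed_chunks(data, size=256):
    """Naive scheme: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data, window=16, mask=0xFF):
    """Content-defined chunking: cut wherever the rolling hash of the last
    `window` bytes has all the bits selected by `mask` equal to zero.
    With mask=0xFF, chunks average roughly 256 bytes on random-looking data."""
    base, mod = 257, (1 << 61) - 1
    out_weight = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * out_weight) % mod
        h = (h * base + byte) % mod
        end = i + 1
        if end >= window and (h & mask) == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(old_chunks, new_chunks):
    """How many chunks of the old version reappear in the new one."""
    seen = set(new_chunks)
    return sum(1 for c in old_chunks if c in seen)


def store_version(store, data):
    """Add one version to the chunk store and return its recipe (list of chunk keys)."""
    recipe = []
    for chunk in cdc_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # only chunks not stored yet would need uploading
        recipe.append(key)
    return recipe


def restore(store, recipe):
    """Rebuild any stored version from its recipe."""
    return b"".join(store[key] for key in recipe)
```

Prepending a single byte to a file leaves essentially none of its fixed-size blocks reusable, while with content-defined chunks everything after the first chunk is shared with the previous version. The recipes are the "database" the answer refers to: one small list per file version, pointing into a shared pool of blocks.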
To do this well will be very complex. I highly recommend you thoroughly research already-available solutions, and if you decide you need to write your own, consider the benefits of their designs carefully.