Are there version control systems that allow you to permanently delete files? - version-control

I need to keep some large files (several gigabytes each) under version control.
I don't need to keep every version of these files, and I can't afford to.
I want to be able to remove old versions of large files from my VCS at some point.
The files that I want to keep under version control are big .zip files or ISO images.
These files may contain executable software or data (seismic data, SAR images, GNSS data), and they are provided by my company's software supplier.
Which version control system could I use?

In CVS you can do that by removing the files from the repo. Subversion allows it by dumping the content of the repo and filtering the dump to remove the files (which is a bit cumbersome). Perforce has an obliterate command for that. Many of the newer distributed VCSs make it rather difficult because they use hashes all over the place, and the fact that your repo may have been replicated elsewhere also complicates things. Hg has a strip command (part of the MQ extension); Git can also do that, I think.

I don’t think there’s any version control system that allows you to do that regularly, because it goes against everything version control systems stand for.

Perforce generally allows files to be stored in two ways: as head revision only (so you'd only ever have one copy) or with all revisions. Perforce also has the admin-level obliterate command, which can be used to delete revisions. It's up to you to query for a list of files, possibly by date or number of revisions, and to pass the revisions to the obliterate command. As the name suggests, obliterate deletes the revisions permanently from the database, so I always generate scripts to do this and review them before running them. If the obliterate command is NOT run with the -y flag, it only generates a list of what would be obliterated, which is also very useful.
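The "generate a script, review it, then run it" habit described above could look roughly like the sketch below. The depot paths, head revision numbers and the `keep` threshold are made up for illustration, and the `#1,#N` revision-range form is an assumption you should check against your Perforce documentation. The commands are only built as text and carry no flag that performs the deletion, so nothing is removed.

```python
def obliterate_preview_commands(head_revs, keep=1):
    """Build preview-only `p4 obliterate` command lines.

    head_revs maps a depot path to its current head revision number.
    The newest `keep` revisions of each file are left alone.
    """
    commands = []
    for path, head in sorted(head_revs.items()):
        last_to_remove = head - keep
        if last_to_remove < 1:
            continue  # nothing old enough to remove for this file
        # Assumed revision-range syntax: file#first,#last
        commands.append(f"p4 obliterate {path}#1,#{last_to_remove}")
    return commands


if __name__ == "__main__":
    # Hypothetical depot paths and head revisions.
    sample = {"//depot/images/install.iso": 5, "//depot/data/sar.zip": 1}
    for line in obliterate_preview_commands(sample, keep=2):
        print(line)
```

Writing the output to a file gives you something to review (and to hand to a colleague) before anyone runs the real thing.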

Somehow I get the impression that you should not use a version control system at all. As said before, what you're trying to do goes against everything you would need a version control system for in the first place.
I suggest you create a file system directory structure that makes sense for what you're trying to accomplish, so that you can structure your data. And just make backups of those files.

TFS has a destroy command that you can use to permanently delete files or revisions as you see fit.
There is more information at this MSDN article.

Many version control systems can be configured to store only the differences between versions of a file, which saves space.
For example, if you have a 1 GB file committed, change part of it and commit it again, only the changed part will be stored in the version control system.
That uses 1 GB + sizeOfChanges rather than 2 GB (initial and new file).
There's just one downside: if you're storing files whose whole content changes from revision to revision, this can be counter-productive, because the changes take almost as much space as the original version. Archive files are an example: a small change in the (real) content can lead to completely different content in the archive file.
I'd suggest testing several version control systems yourself, with your specific needs and environment, and monitoring on the server side how the storage requirements of each system change.
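To get a feel for the archive-file effect before testing real systems, a rough experiment like the one below may help. It is only a sketch: the fixed-size block comparison is a crude stand-in for whatever delta algorithm a given VCS uses, and the sample data is synthetic. It changes one byte of some text-like content and counts how many 256-byte blocks differ, first in the raw data and then in the zlib-compressed data.

```python
import zlib


def changed_blocks(old, new, block_size=256):
    """Count fixed-size blocks that differ between two byte strings."""
    length = max(len(old), len(new))
    return sum(
        old[i:i + block_size] != new[i:i + block_size]
        for i in range(0, length, block_size)
    )


def make_sample(lines=20000):
    """Compressible, text-like stand-in for real file content."""
    return b"".join(b"record %06d value %06d\n" % (i, i * 7) for i in range(lines))


if __name__ == "__main__":
    original = make_sample()
    edited = b"R" + original[1:]  # change a single byte at the very start

    raw = changed_blocks(original, edited)
    packed = changed_blocks(zlib.compress(original), zlib.compress(edited))
    print("changed blocks, uncompressed:", raw)
    print("changed blocks, compressed:  ", packed)
```

The uncompressed pair differs in exactly one block; the compressed pair differs in more, which is the situation where delta storage stops paying off.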

Some distributed version control systems let you create "checkpoints": a checkpoint serves as a kind of base revision and saves you from pulling all the history before it on every checkout. So you can remove the big files, create a checkpoint, and check out/clone the repository from that checkpoint into a new directory. That gives you a new, small repository without the history before the checkpoint. If you don't need that history, you can burn the old repository to CD and use the new, partial one from now on.
I've only tested it in darcs, and there it works, but YMMV depending on version control system and use cases.

It sounds to me like you need an intelligent backup system, rather than version control.
I use SyncBackSE; it allows you to keep a number of previous versions, and can also do things like "ignore all files changed more than 30 days ago".
It's one of the few bits of paid-for software I use. I think it's worth checking out.

I think you're talking about something like AlienBrain's "bucket" system, aren't you? That is, the ability to remove some revisions from version control.
If you want to destroy an item, it's normally called "obliterate" and it's supported by a number of systems out there.
Buckets, AFAIK are supported by:
AlienBrain
Accurev
PlasticSCM

I would save such files under a unique name (datestamped, perhaps), and perhaps additionally make a textual reference to the external file in the version control system.
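One way this could work in practice is sketched below. The function name, the `.ref` pointer suffix and the pointer's three fields are all invented for the example. The large file is copied to a date-stamped name outside the repository, and a small text file recording its name, size and SHA-256 is written where the VCS can track it.

```python
import hashlib
from datetime import date
from pathlib import Path


def archive_with_pointer(source, archive_dir, repo_dir, today=None):
    """Copy `source` to a date-stamped name in `archive_dir` and write a small
    text pointer (name, size, SHA-256) into `repo_dir` for committing."""
    source, archive_dir, repo_dir = Path(source), Path(archive_dir), Path(repo_dir)
    stamp = (today or date.today()).strftime("%Y%m%d")
    stored = archive_dir / f"{source.stem}-{stamp}{source.suffix}"

    archive_dir.mkdir(parents=True, exist_ok=True)
    repo_dir.mkdir(parents=True, exist_ok=True)

    # Hash while copying, 1 MiB at a time, so huge files never sit in memory.
    digest = hashlib.sha256()
    with source.open("rb") as src, stored.open("wb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            digest.update(chunk)
            dst.write(chunk)

    pointer = repo_dir / f"{source.name}.ref"  # hypothetical pointer-file suffix
    pointer.write_text(
        f"file: {stored.name}\n"
        f"size: {stored.stat().st_size}\n"
        f"sha256: {digest.hexdigest()}\n"
    )
    return stored, pointer
```

Only the pointer file goes into version control, so old archives can be deleted from the archive directory at any time without rewriting history.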

Fossil allows you to do this via the "shun" mechanism. Fossil being a distributed SCM, however, means that this does not affect all repositories (for obvious reasons).

Related

Avoid Mercurial adding Local/Other tags to original file when merging

I am using mercurial via tortoiseHg (windows) as a source control management tool.
I am used to merging with Beyond Compare. Today I have to perform a very complex merge, and I just discovered a new feature (my client was updated some days ago) that is extremely annoying.
When I have a conflict and ask Mercurial to take the "other" file and keep the original in a .orig file, the .orig file gets <<<<<<< local and >>>>>>> other markers added to it, and, worse, the other part is merged into the original one!
The two parts are then unaligned, and it's impossible to guarantee that the merge is OK, because you have to review it line by line with no help from the comparison tool (see screen below).
http://s13.postimg.org/yor6gno47/Untitled.jpg
I want to disable this feature, but so far I have been unable to. Thanks so much for any help, as this is blocking my work.
Regards.
The launching of a specific merge tool isn't something Mercurial controls. It does, however, have a robust mechanism for Merge Tool Configuration that allows you to provide a preference order and it will use the first one it can find. The builders of various Mercurial installation packages (ubuntu, etc.) and tools that include Mercurial (TortoiseHG, etc.) all provide their own Merge tool configuration preference list.
Either the old merge tool configuration list you had no longer points to Beyond Compare at the right location (you upgraded BC and the directory name changed, etc.), or you got a new merge tool configuration list when you updated some software that included Mercurial. Either way, that page on MergeToolConfiguration will help you find your preference list in your hgrc files and update or correct it.
TL;DR: this isn't a "new" feature; it's your new installation being less tailored to your system than your old one. Maybe find out who packaged the old one and copy its merge tool config.
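To see what a preference list amounts to, here is a deliberately simplified sketch that reads an hgrc-style text and reports the explicit `ui.merge` choice plus the `[merge-tools]` entries ranked by their `priority` setting. The sample configuration and the executable path in it are made up. Mercurial's real selection logic considers more than this (for instance whether the executable can actually be found), so treat the output as a hint about where to look, not as what Mercurial will do.

```python
import configparser

# Hypothetical hgrc content; the executable path is only an example.
SAMPLE_HGRC = """
[ui]
merge = beyondcompare3

[merge-tools]
beyondcompare3.executable = C:/Program Files/Beyond Compare 3/BComp.exe
beyondcompare3.priority = 10
kdiff3.priority = 1
"""


def merge_tool_preferences(hgrc_text):
    """Return (explicit ui.merge value or None, tool names by descending priority)."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(hgrc_text)

    explicit = parser.get("ui", "merge", fallback=None)

    priorities = {}
    if parser.has_section("merge-tools"):
        for key, value in parser.items("merge-tools"):
            tool, _, setting = key.rpartition(".")
            if setting == "priority":
                priorities[tool] = int(value)

    ranked = sorted(priorities, key=priorities.get, reverse=True)
    return explicit, ranked


if __name__ == "__main__":
    explicit, ranked = merge_tool_preferences(SAMPLE_HGRC)
    print("ui.merge:", explicit)
    print("by priority:", ranked)
```

If the explicit choice or the top-ranked tool is not Beyond Compare, or its executable path is stale, that is the entry to fix.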

How can I do a > 5GB commit to Mercurial?

I'm trying to import an existing project into Mercurial. The project is a bit over 5GB.
When I try to do an hg push I always get an error about being out of buffer space.
Does anyone know of a good way of doing the initial commit?
If you are not tied down to using Mercurial, then another possibility would be to use boar. It is not a DVCS like Mercurial, instead you have a central repository in which you store your data, and "check out" versions of files - in much the same way as with Subversion.
The important part is that it is written with the express purpose of storing large, binary files.
I have not used it, so I cannot comment on how good it is at its job, or how stable it is, but it is a possible alternative that may well suit your needs.
For a brief explanation of why storing binary files in mercurial is discouraged please read
https://www.mercurial-scm.org/wiki/BinaryFiles and http://kiln.stackexchange.com/questions/1074/why-is-it-bad-to-store-binary-files-in-mercurial
In our case we handle binary files using Dropbox. It allows you to both keep the history of files and sync the folder between team members. If you don't need to keep history of files, you can use rsync to keep binaries sync'ed.
Assuming you do actually need to put such a large commit into Mercurial, I would guess that rather than a few million tiny files, the size of your commit is primarily due to a handful of biiiig files. In this case you could investigate the Large Files Extension, which should suit your needs. When you add a large file, it is tracked by checksum rather than content, so what Mercurial itself tracks is relatively small. The extension will take care of the versions for you.
However, as Alex Stuckey mentions, you shouldn't normally be committing things such as compiled binaries (object code, resulting executables, ...), which are the most likely reason you have such a big commit. You would do well to create a decent .hgignore file (one that removes the usual suspects - *.o, *.pdb, whatever, ...), which will help eliminate accidentally adding files like that in the future. I have a standard .hgignore which gets put into nearly all my repositories as the first commit, and has served me well.
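Before the first commit, it can help to find out which of the two cases you are in. The sketch below walks a project tree and lists files over a size threshold (candidates for the Large Files Extension) and files matching the "usual suspects" patterns mentioned above (candidates for .hgignore). The 10 MB default threshold and the function name are arbitrary choices for the example.

```python
import os
from fnmatch import fnmatch

IGNORE_GLOBS = ("*.o", "*.pdb")  # the patterns named above; extend as needed


def audit_tree(root, big_bytes=10 * 1024 * 1024):
    """Walk `root` and return (big_files, ignorable) as sorted relative paths."""
    big_files, ignorable = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Mercurial's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if any(fnmatch(name, pattern) for pattern in IGNORE_GLOBS):
                ignorable.append(rel)
            elif os.path.getsize(full) >= big_bytes:
                big_files.append(rel)
    return sorted(big_files), sorted(ignorable)


if __name__ == "__main__":
    big, ignorable = audit_tree(".")
    print("large files:", big)
    print("probably should be ignored:", ignorable)
```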

Working with folders in RCS

I have been following the tutorial http://www.burlingtontelecom.net/~ashawley/rcs/tutorial.html on how to work with files using RCS. This works well but only with one file. Is there a way to create an RCS file with directories as well?
I have a project folder called myproject, and in this directory I have all my files for that project. I want to create a revision control system for the myproject folder and all its files that are inside.
As William's comment says, RCS only works with single files. (It also doesn't seem to be particularly suitable for multiple-user stuff.)
Of course, nothing stops you from putting each (source) file in a directory under RCS control; in fact, this is essentially what CVS does (though in recent versions it handles the RCS data itself, rather than invoking RCS to do it as it used to do). Unfortunately, this fragments the change history rather badly; a commit affecting many files ends up as separate commits to each file, which just happen to have the same commit message (and timestamp?), and in general every file will have a different revision in what the user might like to think of as the "same" revision. (This makes tags quite essential.) CVS also has issues with the atomicity of commits: you could end up with commit A and commit B getting tangled up, such that in file foo commit A precedes commit B, but in file bar commit B precedes commit A!
SVN (Subversion) is an attempt to rectify some of the problems in CVS, though it also brings some new limitations, and keeps many of the existing ones; it is probably wiser (as William implies) to just use a distributed version control system (DVCS) for your multi-file projects. There are many choices:
Darcs uses a unique patch-based model: a repository is treated as a sequence of patches, which can be applied to an empty tree to build the current revision; patches can often be reordered by "commuting" pairs of patches, and cherry-picking patches from other repositories is quite easy. The downside is that the change history is a bit less clear than in most DVCSes. See http://wiki.darcs.net/Using/Model, http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory.
Directed-acyclic-graph (DAG) based DVCSes model a repository as a directed acyclic graph of revisions, where each revision can have one parent, two parents, or perhaps more. Each revision has an associated file tree state; sometimes renames are also tracked somehow.
Git, as already mentioned. Has a very simple model, but a very complicated interface: there are many commands, some of which are not really intended for humans to use (owing to many parts of it having been prototyped in shell script, probably), so it can be hard to find the ones you want. Also, its model might be a bit too simple: it doesn't track renames at all.
Bazaar (a.k.a. bzr) has a more complicated model, including support for file/directory renames. It's difficult to say how much more complicated, though, because whatever documentation may exist is not nearly as accessible as Git's. It does, however, have a rather simpler user interface, and there are a number of useful plugins, including a distributed-development-friendly SVN plugin: committing from a branch back to SVN need not interfere with the validity of others' branches of your branches, and bzr metadata is even committed back to SVN. Can make things much less painful if you want to start hacking on an SVN-based project without having commit access, but hope to get your changes committed eventually. Bazaar is my personal favorite DAG-based DVCS.
Mercurial (a.k.a. hg) seems fairly similar to Bazaar, though I think it tracks renames only for individual files, not for directories. It also supports plugins, though its SVN plugin isn't as nice as Bazaar's: it doesn't support lossless commits, so branching from other peoples' branches is unwise. I don't have much experience with it, so I can't really evaluate it in-depth.
As the comments already mention, if you are starting out with version control, you would be well advised to choose a newer system than RCS (git, mercurial, fossil, subversion, ...). That said, RCS still works fine for a single developer working primarily on a single machine - I still use it for my own code because I've not yet worked out how to get the (20+ years of) history I want into git in the way I want it.
Anyway, to use RCS, make sure you have an RCS sub-directory in each directory where you have working source code under RCS management. The RCS files will be placed in the sub-directory automatically, and retrieved automatically. If your version of make is not already aware of RCS, then you can train it so that it is - or get a version of make that is (GNU Make, for example).
TL;DR - Look into DCVS as an alternative to RCS. It uses CVS, which uses RCS, but it's more modular for working in a repository that is distributed, as well as having a hierarchy of directories.
I'm currently going through a similar issue, and may have found something worthy of note, especially for people who are being forced to use a light, command-line based revision control system with multiple team members.
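For an existing project tree such as the myproject folder in the question, creating those sub-directories by hand gets tedious. A small sketch that does it for every directory under a root follows; the function name is made up.

```python
import os


def ensure_rcs_dirs(project_root):
    """Create an RCS/ sub-directory in every directory under project_root.

    Returns the sorted list of RCS directories that were newly created.
    """
    created = []
    for dirpath, dirnames, _ in os.walk(project_root):
        # Do not descend into RCS directories that already exist.
        dirnames[:] = [d for d in dirnames if d != "RCS"]
        rcs_dir = os.path.join(dirpath, "RCS")
        if not os.path.isdir(rcs_dir):
            os.mkdir(rcs_dir)
            created.append(rcs_dir)
    return sorted(created)


if __name__ == "__main__":
    for path in ensure_rcs_dirs("."):
        print("created", path)
```

After that, checking a file in from any of those directories puts its ,v file into the local RCS/ sub-directory. Each file still has its own independent history, though.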
My manager will not get off this idea of using RCS as our version control. But for the specifications, he wants developers to be able to create and edit on their own repository on a localized server within our company. Two issues with this:
RCS does not create, nor hold any sort of 'repository'. It is software that keeps track of file edits, on a Per File Basis. Meaning that the 'repository' is nothing more than another directory with RCS checked-in files. This is sub-par for team-geared projects, to say the least.
On a large project with multiple directories and tens of individual working files, even the prospect of creating a top-level RCS directory with a symbolic link in the working directories gives rise to complications such as naming conventions, as well as forgetting which file came from which bottom-level / working directory.
With what SamB posted, even CVS gives additional problems with RCS that we now have to account for, but gives us a slight ability for some additional hierarchy. But one suggestion he forgot was DCVS.
It's nothing more than an extension of CVS, CVSup, and:
contains functionality to distribute CVS repositories with local lines of development and automatically handles synchronization of the distributed repositories in the background.

Is there a version control system that is completely invisible to use while writing code?

I would like to use version control but I don't want to continuously commit several times an hour. Is there a version control system that records everything while you program so you don't have to commit, but still lets you go back to a previous state of your code?
Dropbox can do it. It records every change that you make.
You can do something like this if you're using the ZFS filesystem.
However, from a version-control point of view, I really don't think it's a good idea to store every change. The size of your repository will become huge really fast.
And FYI, you don't have to commit several times an hour, I rarely do more than 4-5 commits a day.
Some operating systems make mounting WebDAV shares into the filesystem very easy; you could configure an SVN server to export WebDAV, mount the export into your filesystem, and get to work.
Don't forget to configure your editor to store temporary files or backup files somewhere other than your source tree or current working directory. Otherwise you'll have a ton of useless files cluttering up your source control system, making it harder to use in the future.
But finding which version to revert to can be pretty difficult without check in comments or changesets linking related changes together; it might not be worth the effort of configuring the entire system if it is too difficult to use to undo specific changes.

Choosing a version control package for document control

We have a legal requirement to ensure that the latest versions of documents (mainly Word and Excel) are readily accessible. We currently implement document control by manually updating the page footer with a new version number, but we want a better system.
I've played around with TortoiseSVN and the functionality is good; but the problem is that unless I've missed a configuration variable somewhere, Subversion applies version numbers to the whole project (i.e. every file in the repository), not to the individual files. What I want is to be able to create a folder in the repository and all our documents go in there, and the version numbers of the files are only changed when that particular document is changed. Currently if we had 30 files and each was printed and put on display including the version numbers, if someone went to the repository the version number would almost certainly have changed even if the document contents were identical. Not ideal.
The alternative to this would be to create a new repository for each and every document but the administrative overhead on that will be prohibitive. I'm essentially looking for something that does much of what TortoiseSVN does, but treats files as individual projects with their own independent version number.
Whatever solution we come up with we would want the version of the document to be automatically shown in the page footer of the document. Tortoise can do this with a macro http://insights.oetiker.ch/windows/SvnProperties4MSOffice/.
Appreciate any help, thanks.
Greg
I think what you're actually looking for is a document management system.
Version Control Systems (VCS), such as Subversion (the underlying technology behind TortoiseSVN), are fundamentally unsuited for your task due to their focus on tracking changesets among all files within a given project (i.e. one changeset/version can involve changes to many files within the project).
Another advantage of using a document management system is that they typically allow you to attach extensible metadata attributes to your documents in a much richer way than version control systems, as well as providing search capabilities.
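The difference between the two numbering schemes in the question can be made concrete with a toy model. In the sketch below (file names are hypothetical), each commit is a list of the files it changed. The repository-wide revision grows with every commit, whereas a document's own version only grows when that document is touched. Subversion does record a per-file "Last Changed Rev", which behaves like the second number returned here. Whether that is enough for a document-control requirement is a separate question.

```python
def per_document_versions(commits):
    """Given commits as lists of changed file names (oldest first), return
    {file: (own_version, last_changed_revision)}.

    own_version counts only the commits that touched that file;
    last_changed_revision is the repository-wide number of the latest one.
    """
    versions = {}
    for revision, changed in enumerate(commits, start=1):
        for name in changed:
            own, _ = versions.get(name, (0, 0))
            versions[name] = (own + 1, revision)
    return versions


if __name__ == "__main__":
    history = [
        ["policy.docx", "rates.xlsx"],  # repository revision 1
        ["rates.xlsx"],                 # repository revision 2
        ["rates.xlsx"],                 # repository revision 3
    ]
    for name, (own, last) in sorted(per_document_versions(history).items()):
        print(f"{name}: document version {own}, last changed in r{last}")
```

Here the repository is at revision 3, but policy.docx is still at its first version, which is the behaviour the printed footers need.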