Small Shop, Why DVCS? - version-control

We have a small programming shop of at most 5 people working on a single project. I fully grok why DVCS is better for open source projects and for large companies, but what advantages does it have for smaller companies, other than "you can work on the airplane"? Even that would require extra SA work to make sure that the repositories on our DEV boxes were properly backed up every night.
We also have several non-technical people (artists, translators) who can (sort of) deal with SVN. In people's experience, how much training is required to get them to move to a DVCS?

I'm going to speak from my experience, which is primarily with SVN and Hg, often working with designers and programmers who are not comfortable with version control.
My big beef with SVN and other CVCS's I've used is that they block you from making commits, not just when the network is down, but also in case of a conflict (or worse, someone locking a file so no one else can make changes to it!). You could of course commit to a branch, but between network bandwidth required to switch branches and the pain involved in merging, you still have a problem.
Of course, SVN blocks you from committing conflicted files so that you don't accidentally overwrite someone else's work; SVN requires you to at least acknowledge that you know one version or the other (or a custom combination of the two) is right. Mercurial, however, has a better solution (2, actually):
1. You can always commit to the local repository now and merge later. (All DVCS's have this feature.)
2. Even if you pull or push conflicting changes, instead of being blocked from committing, you have multiple heads via anonymous branches. (Sorry I can't really explain this in detail here, but you can google it.)
So your workflow goes from:
1. Get the latest, make changes, test them.
2. Get the latest, resolve conflicts, test the result.
3. Commit.
and becomes:
1. Get the latest, make changes, test them.
2. Commit (so you have a place to fall back to).
3. Get the latest, resolve conflicts, test the result.
4. Commit and push.
That extra commit means you're doing less work per commit, so you have more checkpoints to fall back on. And there are other ways you can make more commits without getting in others' way.
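To make the "multiple heads" idea concrete, here is a minimal sketch of the four-step workflow above. It is a toy model, not how Mercurial stores history: the `graph` dictionary, the `heads` helper, and the changeset names are all made up for illustration.

```python
# Toy model of a changeset graph: each changeset maps to a list of parent IDs.
# Hypothetical names throughout; this is not Mercurial's storage format.

def heads(graph):
    """Return the changesets that no other changeset names as a parent."""
    parents = {p for ps in graph.values() for p in ps}
    return sorted(c for c in graph if c not in parents)

# Step 1: you start from the shared history.
graph = {"base": []}

# Step 2: commit locally, so you have a checkpoint to fall back to.
graph["mine"] = ["base"]

# Step 3: pulling brings in a colleague's changeset with the same parent.
# Nothing is blocked; the repository simply has two heads for a while.
graph["theirs"] = ["base"]
two_heads = heads(graph)

# Steps 3 and 4: resolve conflicts in a merge changeset, then push.
graph["merge"] = ["mine", "theirs"]
one_head = heads(graph)

print(two_heads)
print(one_head)
```

The point of the sketch is that the local commit exists before any merging happens, so a botched merge can be thrown away without losing work.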
SVN is just slow enough to break my concentration and tempt me to go to Facebook; Mercurial is fast and Git is faster. The speed issue becomes very important when reviewing a log or changes to a working copy. With TortoiseHg, I can click through a list of files and instantly see the changes to each file; it takes a couple of seconds per file with TortoiseSVN + WinMerge (not sure how much of this is due to the DVCS). The more I use these tools, the more I feel that a VCS needs to be fast, just like a text editor or a mouse cursor--fast enough that you shouldn't require the network to do it.
Subjectively, I find TortoiseHg to be a heck of a lot easier to use than TortoiseSVN (or other tortoises I've used). TortoiseHg is multi-platform, too. :)
One more thing: As I understand, an SVN working copy is defined recursively: Each folder is a working copy. This allows you to do some fancy-pants stuff (e.g. having a working copy that contains folders from disparate locations in the repository). I don't know if Hg has a similar feature, but in my experience, SVN's implementation of this feature causes only problems where I work, especially for those not quite comfy with SVN. When they copy-and-paste a WC folder on their machine via the OS shell instead of via svn copy, it goofs up their WC. I've goofed up my WC this way as well. It's less of a problem with Hg--you normally work with an entire repo at once, whether you clone, update, or commit.

SVN has improved a lot concerning merging since its release. But it still lacks file rename tracking, often resulting in tree conflicts. The article "Renaming is the killer app of distributed version control" tackles this issue, and its comment section adds some interesting links.
DVCS lets you push to a central repository, at the cost of one additional command compared to Subversion. Occasional users should be able to adapt to this minor change in workflow. In return, power users get the freedom of 'local' commits and branches without cluttering the central repository.
Concerning tooling, which might be of importance for user acceptance, Mercurial is on par with Subversion.

Related

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able to modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix, feature addition, or what not. Given that your files are really independent "projects", it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues to commit changes to only a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most of them to follow the practice; for the few obstinate ones, just stop taking their commits.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.
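A rough sketch of why single-file commits make this kind of cherry-picking easier. The changeset records and the `pick_changesets` helper below are hypothetical; the sketch only models the selection step and does not call Mercurial or the transplant extension.

```python
def pick_changesets(incoming, wanted_files):
    """Keep only changesets whose touched files are all in wanted_files."""
    wanted = set(wanted_files)
    return [cs for cs in incoming if set(cs["files"]) <= wanted]

# Hypothetical changesets sitting in an "incoming" clone of a collaborator.
incoming = [
    {"id": "c1", "files": ["A.m"]},          # cohesive: touches only A
    {"id": "c2", "files": ["B.m"]},          # cohesive: touches only B
    {"id": "c3", "files": ["A.m", "B.m"]},   # mixed: cannot be taken for A alone
]

picked = [cs["id"] for cs in pick_changesets(incoming, ["A.m"])]
print(picked)
```

Changeset `c3` is left out because it mixes both files, which is exactly the situation the advice about cohesive commits is meant to avoid.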

Keeping experimental history out of shared repository in Mercurial

I'm fairly new to Mercurial, but one of the advantages I see using Mercurial is that while writing a feature you can be more free to experiment, check in changes, share them, etc, while still maintaining a "clean" repo for the finished feature.
The issue is one of history. If I tried 6 different ways to get something to work, now I'm stuck with all of the history for all my mistakes. What I'd like to do is go through and clean up my changes and "collapse" them into one changeset that can be pushed into a shared repository. This is complicated by the fact that I might pull in new changesets from the shared repository, and have those changesets intermingled with my own.
The best way I know of to do that is to use hg export to create a patch of my changes since cloning, clone a fresh repository, and apply the patch to the fresh repository.
Those steps seem a little bit cumbersome and easy to mess up, particularly if this methodology is rolled out to the whole dev team, some of whom are a little resistant to change (don't get me started). TortoiseHg makes the process slightly better since you can highlight the changesets you want to be included in an export.
My question is this: Am I making this more complex than it needs to be? Is there a better workflow I can use to ease my troubles? Is it too much to expect a clean history where entire (small-ish) features are included in one changeset?
Or maybe my whole question could be summed up this way:
Is there an equivalent in Mercurial for this: "Collapsing a git repository's history"?
Although I think you should reconsider your use of branches in Mercurial (as per my comment on your post), using named branches doesn't really help with your concern of maintaining useless or unnecessary history - it just organizes them a bit.
I would recommend a combination of these tools:
mercurial queues
histedit (not distributed with Hg when this was written; bundled as an extension since Mercurial 2.3)
the mq changeset strip feature
to rework a messy history before pushing to a blessed or master repo. The easiest thing would be to use strip to permanently remove any changeset with no children. Once you've done that you can use mq or histedit to combine, relocate, or modify existing commits. Histedit will even let you redo the comment associated with a changeset.
Some pitfalls:
In your opening paragraph you mention sharing changesets during feature development. Please understand that once you've shared a changeset, it's not a good idea to modify it using mq, histedit, or strip. Using these extensions can result in a change to the revision hash, which will make the changeset look like a new one to everyone else.
Also, I agree with Paul Nathan's comment that mq (and histedit) are power features and can easily destroy a history. It's a good idea to make a safety clone before using these extensions.
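The hash pitfall can be illustrated with a small sketch. It assumes a toy ID scheme in which a changeset's ID is derived from its parent's ID, its message, and its diff; Mercurial's real hash covers more than this, and `changeset_id`, `commit_all`, and `fold` are made-up helpers, not mq or histedit commands.

```python
import hashlib

def changeset_id(parent_id, message, diff):
    """Derive a short ID from the parent ID and the content (toy scheme)."""
    data = "\n".join([parent_id, message, diff]).encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]

def commit_all(base_id, changes):
    """Commit (message, diff) pairs in order and return their IDs."""
    ids, parent = [], base_id
    for message, diff in changes:
        parent = changeset_id(parent, message, diff)
        ids.append(parent)
    return ids

def fold(changes, message):
    """Collapse several changes into one (message, diff) pair."""
    return (message, "\n".join(diff for _, diff in changes))

experiments = [("try 1", "+a"), ("try 2", "-a\n+b"), ("fix typo", "+c")]
original_ids = commit_all("000000000000", experiments)

# Folding the experiments yields one changeset with a brand-new ID.
folded_ids = commit_all("000000000000", [fold(experiments, "add feature")])

# Merely rewording the first message changes every descendant's ID too.
reworded = [("first try", "+a")] + experiments[1:]
reworded_ids = commit_all("000000000000", reworded)

print(original_ids, folded_ids, reworded_ids)
```

Anyone who already pulled the original changesets would see the rewritten ones as unrelated new work, which is why rewriting is best kept to history that has not been shared.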
Named branches are the simplest solution. Each experimental approach gets its own branch. This retains the history of the experiments.
The next solution is to have a fresh clone for each experiment. The working one gets pushed back to the main repo.
The next solution - and probably what you are really looking for - is the mq extension, which can "squash" a series of patches into a single commit. I consider mq to be "advanced", and "subject to accidentally shooting yourself in the foot". I also don't care to squash my commits - I like having my version history present for reference.

Working with folders in RCS

I have been following the tutorial http://www.burlingtontelecom.net/~ashawley/rcs/tutorial.html on how to work with files using RCS. This works well but only with one file. Is there a way to create an RCS file with directories as well?
I have a project folder called myproject, and in this directory I have all my files for that project. I want to create a revision control system for the myproject folder and all its files that are inside.
As William's comment says, RCS only works with single files. (It also doesn't seem to be particularly suitable for multiple-user stuff.)
Of course, nothing stops you from putting each (source) file in a directory under RCS control; in fact, this is essentially what CVS does (though in recent versions it handles the RCS data itself, rather than invoking RCS to do it as it used to do). Unfortunately, this fragments the change history rather badly; a commit affecting many files ends up as separate commits to each file, which just happen to have the same commit message (and timestamp?), and in general every file will have a different revision in what the user might like to think of as the "same" revision. (This makes tags quite essential.) CVS also has issues with the atomicity of commits: you could end up with commit A and commit B getting tangled up, such that in file foo commit A precedes commit B, but in file bar commit B precedes commit A!
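As a rough illustration of why tags become essential when every file carries its own revision number, here is a toy model. It uses plain integers instead of RCS-style numbers such as 1.1 and 1.2, and the `PerFileHistory` class is hypothetical, not part of RCS or CVS.

```python
class PerFileHistory:
    """Toy model: each file has its own independent revision counter."""

    def __init__(self):
        self.revisions = {}  # file name -> current revision number
        self.tags = {}       # tag name -> {file name: revision number}

    def commit(self, files):
        # One logical commit becomes a separate bump for each file.
        for name in files:
            self.revisions[name] = self.revisions.get(name, 0) + 1

    def tag(self, label):
        # A tag records which revision of each file belongs together.
        self.tags[label] = dict(self.revisions)

repo = PerFileHistory()
repo.commit(["foo.c", "bar.c"])
repo.commit(["foo.c"])
repo.commit(["foo.c"])
repo.tag("release-1")
print(repo.tags["release-1"])
```

After three commits the two files sit at different revision numbers, so only the tag says which revisions make up the "same" state of the project.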
SVN (Subversion) is an attempt to rectify some of the problems in CVS, though it also brings some new limitations, and keeps many of the existing ones; it is probably wiser (as William implies) to just use a distributed version control system (DVCS) for your multi-file projects. There are many choices:
Darcs uses a unique patch-based model: a repository is treated as a sequence of patches, which can be applied to an empty tree to build the current revision; patches can often be reordered by "commuting" pairs of patches, and cherry-picking patches from other repositories is quite easy. The downside is that the change history is a bit less clear than in most DVCSes. See http://wiki.darcs.net/Using/Model, http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory.
Directed-acyclic-graph (DAG) based DVCSes model a repository as a directed acyclic graph of revisions, where each revision can have one parent, two parents, or perhaps more. Each revision has an associated file tree state; sometimes renames are also tracked somehow.
Git, as already mentioned. Has a very simple model, but a very complicated interface: there are many commands, some of which are not really intended for humans to use (owing to many parts of it having been prototyped in shell script, probably), so it can be hard to find the ones you want. Also, its model might be a bit too simple: it doesn't track renames at all.
Bazaar (a.k.a. bzr) has a more complicated model, including support for file/directory renames. It's difficult to say how much more complicated, though, because whatever documentation may exist is not nearly as accessible as Git's. It does, however, have a rather simpler user interface, and there are a number of useful plugins, including a distributed-development-friendly SVN plugin: committing from a branch back to SVN need not interfere with the validity of others' branches of your branches, and bzr metadata is even committed back to SVN. Can make things much less painful if you want to start hacking on an SVN-based project without having commit access, but hope to get your changes committed eventually. Bazaar is my personal favorite DAG-based DVCS.
Mercurial (a.k.a. hg) seems fairly similar to Bazaar, though I think it tracks renames only for individual files, not for directories. It also supports plugins, though its SVN plugin isn't as nice as Bazaar's: it doesn't support lossless commits, so branching from other peoples' branches is unwise. I don't have much experience with it, so I can't really evaluate it in-depth.
As the comments already mention, if you are starting out with version control, you would be well advised to choose a newer system than RCS (git, mercurial, fossil, subversion, ...). That said, RCS still works fine for a single developer working primarily on a single machine - I still use it for my own code because I've not yet worked out how to get the (20+ years of) history I want into git in the way I want it.
Anyway, to use RCS, make sure you have an RCS sub-directory in each directory where you have working source code under RCS management. The RCS files will be placed in the sub-directory automatically, and retrieved automatically. If your version of make is not already aware of RCS, then you can train it so that it is - or get a version of make that is (GNU Make, for example).
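For a project tree like the myproject folder in the question, creating those sub-directories by hand gets tedious. A minimal sketch, assuming you simply want an RCS directory in every existing directory; `add_rcs_dirs` is a made-up helper and the demo runs in a temporary directory rather than a real project.

```python
import os
import tempfile

def add_rcs_dirs(root):
    """Create an RCS sub-directory in every directory under root."""
    created = []
    for dirpath, dirnames, _ in os.walk(root):
        # Do not descend into existing RCS directories.
        dirnames[:] = [d for d in dirnames if d != "RCS"]
        rcs_dir = os.path.join(dirpath, "RCS")
        if not os.path.isdir(rcs_dir):
            os.mkdir(rcs_dir)
            created.append(rcs_dir)
    return created

# Demo on a throwaway tree: myproject/src/lib
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "myproject")
    os.makedirs(os.path.join(project, "src", "lib"))
    created = sorted(os.path.relpath(p, project) for p in add_rcs_dirs(project))
    second_run = add_rcs_dirs(project)  # nothing left to create

print(created)
```

The script only prepares the directories; checking files in and out is still done per file with the usual RCS commands.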
TL;DR - Look into DCVS as an alternative to RCS. It uses CVS, which uses RCS, but it's more modular for working in a repository that is distributed, as well as having a hierarchy of directories.
I'm currently going through a similar issue, and may have found something worthy of note, especially for people who are being forced to use a light, command-line based revision control system with multiple team members.
My manager will not get off this idea of using RCS as our version control. But for the specifications, he wants developers to be able to create and edit on their own repository on a localized server within our company. Two issues with this:
RCS does not create, nor hold any sort of 'repository'. It is software that keeps track of file edits, on a Per File Basis. Meaning that the 'repository' is nothing more than another directory with RCS checked-in files. This is sub-par for team-geared projects, to say the least.
On a large project with multiple directories and tens of individual working files, even the prospect of creating a top-level RCS directory with a symbolic link in the working directories gives rise to complications such as naming conventions, as well as forgetting which file came from which bottom-level / working directory.
With what SamB posted, even CVS gives additional problems with RCS that we now have to account for, but gives us a slight ability for some additional hierarchy. But one suggestion he forgot was DCVS.
It's nothing more than an extension of CVS, CVSup, and:
contains functionality to distribute CVS repositories with local lines of development and automatically handles synchronization of the distributed repositories in the background.

The theory (and terminology) behind Source Control

I've tried using source control for a couple of projects but still don't really understand it. For these projects, we've used TortoiseSVN and have only had one line of revisions. (No trunk, branch, or any of that.) If there is a recommended way to set up source control systems, what is it? What are the reasons and benefits for setting it up that way? What are the underlying differences between the workings of a centralized and a distributed source control system?
Think of source control as a giant "Undo" button for your source code. Every time you check in, you're adding a point to which you can roll back. Even if you don't use branching/merging, this feature alone can be very valuable.
Additionally, by having one 'authoritative' version of the source control, it becomes much easier to back up.
Centralized vs. distributed... the difference is really that in distributed, there isn't necessarily one 'authoritative' version of the source control, although in practice people usually still do have the master tree.
The big advantage to distributed source control is two-fold:
When you use distributed source control, you have the whole source tree on your local machine. You can commit, create branches, and work pretty much as though you were all alone, and then when you're ready to push up your changes, you can promote them from your machine to the master copy. If you're working "offline" a lot, this can be a huge benefit.
You don't have to ask anybody's permission to become a distributor of the source control. If person A is running the project, but person B and C want to make changes, and share those changes with each other, it becomes much easier with distributed source control.
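The two points above can be sketched with a toy model in which every clone is a full repository. The `Repo` class is hypothetical and treats history as a flat list of named changes, which is far simpler than what Git or Mercurial actually do.

```python
class Repo:
    """Toy repository: history is just an ordered list of change names."""

    def __init__(self, history=()):
        self.history = list(history)

    def commit(self, change):
        # Purely local: no other repository is involved.
        self.history.append(change)

    def clone(self):
        # A clone carries the whole history, not just the latest files.
        return Repo(self.history)

    def push(self, other):
        # Send the changes that the other repository is missing.
        for change in self.history:
            if change not in other.history:
                other.history.append(change)

    def pull(self, other):
        other.push(self)

master = Repo(["initial"])
person_b = master.clone()
person_c = master.clone()

person_b.commit("b: parser")   # works offline
person_b.commit("b: tests")
person_c.pull(person_b)        # B and C share work without involving A
person_b.push(master)          # later, promote the work to the master copy

print(master.history)
```

In a centralized system the equivalent of `commit` would need the server each time, and B could not hand changes to C except through it.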
I recommend checking out the following from Eric Sink:
http://www.ericsink.com/scm/source_control.html
Having some sort of revision control system in place is probably the most important tool a programmer has for reviewing code changes and understanding who did what to whom. Even for single person projects, it is invaluable to be able to diff current code against previous known working version to understand what might have gone wrong due to a change.
Here are two articles that are very helpful for understanding the basics. Besides writing informative articles, Sink runs a company that sells a great source control product called Vault, which is free for single users (I am not affiliated in any way with that company).
http://www.ericsink.com/scm/source_control.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Vault info at www.vault.com.
Even if you don't branch, you may find it useful to use tags to mark releases.
Imagine that you rolled out a new version of your software yesterday and have started making major changes for the next version. A user calls you to report a serious bug in yesterday's release. You can't just fix it and copy over the changes from your development trunk, because the changes you've just made have left the whole thing unstable.
If you had tagged the release, you could check out a working copy of it and use it to fix the bug.
Then, you might choose to create a branch at the tag and check the bug fix into it. That way, you can fix more bugs on that release while you continue to upgrade the trunk. You can also merge those fixes into the trunk so that they'll be present in the next release.
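A small sketch of that tag, branch, and merge sequence, using plain lists as a stand-in for history. The revision names and the `v1.0` tag are invented for illustration, and a real merge involves far more than appending one entry.

```python
# Trunk history up to yesterday's release.
trunk = ["r1: initial", "r2: feature", "r3: release build"]

# Tagging records how much of the trunk made up the release.
tags = {"v1.0": len(trunk)}

# Work on the next version continues and leaves the trunk unstable.
trunk.append("r4: risky rewrite")

# Branch at the tag: the branch contains the release, not the new work.
release_branch = list(trunk[:tags["v1.0"]])
release_branch.append("fix: crash on startup")

# Merge the fix back so the next release has it as well.
trunk.append(release_branch[-1])

print(release_branch)
print(trunk)
```

The release branch never sees the risky rewrite, while the trunk ends up with both the rewrite and the fix.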
The common standard for setting up Subversion is to have three folders under the root of your repository: trunk, branches and tags. The trunk folder holds your current "main" line of development. For many shops and situations, this is all they ever use... just a single working repository of code.
The tags folder takes it one step further and allows you to "checkpoint" your code at certain points in time. For example, when you release a new build or sometimes even when you simply make a new build, you "tag" a copy into this folder. This just allows you to know exactly what your code looked like at that point in time.
The branches folder holds different kinds of branches that you might need in special situations. Sometimes a branch is a place to work on an experimental feature, or on features that might take a long time to get stable (therefore you don't want to introduce them into your main line just yet). Other times, a branch might represent the "production" copy of your code, which can be edited and deployed independently from your main line of code, which contains changes intended for a future release.
Anyway, this is just one aspect of how to set up your system, but I think giving some thought to this structure is important.
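If you want to prepare that three-folder layout locally before importing it, a minimal sketch could look like the following. The `make_layout` helper and the `myrepo-import` folder name are made up, and the demo uses a temporary directory; it does not talk to Subversion at all.

```python
import tempfile
from pathlib import Path

def make_layout(root):
    """Create trunk, branches and tags under root and list what exists."""
    root = Path(root)
    for name in ("trunk", "branches", "tags"):
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())

with tempfile.TemporaryDirectory() as tmp:
    layout = make_layout(Path(tmp) / "myrepo-import")

print(layout)
```

Your existing code would go under trunk before the import, leaving branches and tags empty until you need them.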

What is the difference between all the different types of version control?

After being told by at least 10 people on SO that version control is a good thing even if it's just me, I now have a follow-up question.
What is the difference between all the different types of version control and is there a guide that anybody knows of for version control that's very simple and easy to understand?
We seem to be in the golden age of version control, with a ton of choices, all of which have their pros and cons.
Here are the ones I see most used:
svn - currently the most popular open source?
git - very hot since Linus switched to it
mercurial - some smart people I know swear by it
cvs - the one everybody is switching from
perforce - imho, the best features, but it's not open source. The two-user license is free, though.
visual sourcesafe - I'm not much in the Microsoft world, so I have no idea about this one, other than people like to rag on it as they rag on everything from Microsoft.
sccs - for historical interest we mention this, the great-grandaddy of many of the above
rcs - and the grandaddy of many of the above
My recommendation: you're safest with either git, svn or perforce, since a lot of people use them, they are cross platform, have good guis, you can buy books about them, etc.
Don't consider cvs, sccs, or rcs; they are antique.
The nice thing is that, since your projects will be relatively small, you will be able to move your code to a new system once you're more experienced and decide you want to work with another system.
Eric Sink has a good overview of source control. There are also some existing questions here on SO.
To everyone just starting using version control:
Please do not use git (or hg or bzr) because of the hype
Use git (or hg or bzr) because they are better tools for managing source code than SVN.
I used SVN for a few years at work, and switched over to git 6 months ago. Without learning SVN first I would be totally lost when it comes to using a DVCS.
For people just starting out with version control:
Start by downloading SVN
Learn why you need version control
Learn how to commit, checkout, branch
Learn why merging in SVN is such a pain
Then switch over to a DVCS and learn:
How to clone/branch/commit
How easy it is to merge your branches back (go branch crazy!)
How easy it is to rewrite commit history and keep your branches up to date with the main line (git rebase -i)
How to publish your changes so others can benefit
For the tl;dr crowd:
Start with SVN and learn the basics, then graduate to a DVCS.
I would start with:
A Visual Guide to Version Control
Wikipedia
Then once you have read up on it, download and install SVN, TortoiseSVN and skim the first few chapters of the book and get started.
Version Control is essential to development, even if you're working by yourself because it protects you from yourself. If you make a mistake, it's a simple matter to rollback to a previous version of your code that you know works. This also frees you to explore and experiment with your code because you're free of having to worry about whether what you're doing is reversible or not. There are two major branches of Version Control Systems (VCS), Centralized and Distributed.
Centralized VCS are based on using a central server, where everyone "checks out" a project, works on it, and "commits" their changes back to the server for anybody else to use. The major Centralized VCS are CVS and SVN. Both have been heavily criticized because "merging" "branches" is extremely painful with them. [TODO: write explanation on what branches are and why merging is hard with CVS & SVN]
Distributed VCS let everyone have their own server, where you can "pull" changes from other people and "push" changes to a server. The most common Distributed VCS are Git and Mercurial. [TODO: write more on Distributed VCS]
If you're working on a project, I heavily recommend using a distributed VCS. I recommend Git because it's blazingly fast, but it has been criticized as being too hard to use. If you don't mind using a commercial product, BitKeeper is supposedly easy to use.
The answer to another question also applies here, most importantly
Jon Works said:
The most important thing about version control is:
JUST START USING IT
His answer goes into more detail, and I don't want to be accused of plagiarism, so take a look.
The simple answer is, do you like Undo buttons? The answer is of course yes, because we as human beings make mistakes all the time.
As programmers, though, it's often the case that it takes several hours of testing, code changes, overwrites, deletions, file moves and renames before we work out that the method we are using to fix a problem is entirely the wrong one, and the code is more broken than when we started.
As such, Source Control is a massive Undo button to revert the code to an earlier time when the grass was green and the food plentiful. And not only that, because of how source control works, you can still keep a copy of your broken code, in case a few weeks down the line you want to refer to it again and cherry pick any good ideas that did come out of it.
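The "Undo button that keeps the broken code too" idea can be sketched in a few lines. The `History` class below is a toy, not any real tool's API: it stores whole snapshots and reverts by checking an old snapshot in again, so nothing is ever erased.

```python
class History:
    """Toy history: a list of full snapshots, oldest first."""

    def __init__(self):
        self.snapshots = []

    def check_in(self, content):
        """Store a snapshot and return its revision number."""
        self.snapshots.append(content)
        return len(self.snapshots) - 1

    def revert_to(self, revision):
        """Undo by re-checking-in an old snapshot; later ones are kept."""
        return self.check_in(self.snapshots[revision])

    def current(self):
        return self.snapshots[-1]

history = History()
good = history.check_in("working code")
broken = history.check_in("half-finished rewrite")
history.revert_to(good)

print(history.current())
print(history.snapshots[broken])
```

After the revert the current version is the working code again, yet the half-finished rewrite is still there to cherry pick ideas from later.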
I personally (though it could be called overkill) use a free single-user license version of SourceGear Fortress (which is their Vault source control product with bug tracking features). I find the UI really simple to use; it supports both the checkout > edit > checkin model and the edit > merge > commit model. It can be a little tricky to set up though, requiring you to run a local copy of IIS and SQL Server. You might want to try a smaller program, like those recommended by other answers here. See what you like and what you can afford.
Mark said:
git - very hot since Linus switched to it
I just want to point out that Linus didn't switch to it, Linus wrote it.
If you are working by yourself in a Windows environment, then the single user license for SourceGear's Vault is free.
We use and like Mercurial. It follows a distributed model - it eliminates some of the sense of having to "check in" work. Mozilla has moved to Mercurial, which is a good sign that it's not going to go away any time soon. One con, in my opinion, is that there isn't a very good GUI for it. If you're comfortable with the command line, though, it's pretty handy.
Mercurial Documentation
Unofficial Manual
Just start using source control, no matter what type you use. What you use doesn't matter; it's the use of it that is important.
As everyone else has said, the right SC really depends on your needs, your budget, your environment, etc.
At its root, source control is designed to provide a central repository of all your code, and track who did what to it when. There should be a complete history, and you can get products that do full changelogs, auditing, access control, and on and on...
Each product that is out there starts to shine (so to speak) when you start to look at how you want or need to incorporate SC into your environment (whether it's your personal code and documents or a large corporation's). And as people use them, they discover that the tool has limitations, so people write new ones. SVN was born out of limitations that the creators saw with CVS. Linus wanted something better for the Linux kernel, so now we have git.
I would say start using one (something like SVN which is very popular and pretty easy to use) and see how it goes. As time progresses you may find that you need some other functionality, or need to interface with other systems, so you may need SourceSafe or another tool.
Source control is always important, and while you can get away with manually re-numbering versions of PSD files or something as you work on them, you're going to forget to run that batch script once or twice, or likely forget which number went with which change. That's where most of these SC tools can help (as long as you check-in/check-out).
See also this SO question:
Difference between GIT and CVS