Do you feel comfortable merging code? - version-control

This morning, I read two opinions on refactoring.
Opinion 1 (Page not present)
Opinion 2 (Page not present)
They recommend branching (and subsequently merging) code to:
Keep the trunk clean.
Allow a developer to walk away from risky changes.
In my experience (particularly with Borland's StarTeam), merging is a non-trival operation. And for that reason, I branch only when I must (i.e. when I want to freeze a release candidate).
In theory, branching makes sense, but the mechanics of merging make it a very risky operation.
My questions:
Do you feel comfortable merging code?
Do you branch code for reasons other than freezing a release
candidate?

Branching might be painful but it shouldn't be.
That's what git-like projects (mercurial, bazar) tells us about CVS and SVN. On git and mercurial, branching is easy. On SVN it's easy but with big projects it can be a bit hardcore to manage (because of time spent on the branching/merging process that can be very long -- compared to some others like git and mercurial -- and difficult if there are non-obvious conflicts). That don't help users that are not used to branch often to have confidence in branching. Lot of users unaware of the powerful uses of branching just keep it away to not add new problems to their projects, letting the fear of the unknown make them far from efficiency.
Branching should be an easy and powerful tool we'd have to use for any reason good enough to branch.
Some good reasons to branchs:
working on a specific feature in parallel with other people (or while working on other features alternatively if you're alone on the project);
having several brand versions of the application;
having parallel versions of the same application -- like concurrent techniques developped in the same time by to part of the team to see what works the better;
having resources of the application being changed on a artist/designers (for example in games) specific branch where the application is "stable" while other branches and trunk are used for features addition and debugging;
[add here useful usages]

Some loose guiding principles:
Branch late and only when you need to
Merge early and often
Get the right person to do the merge, either the person who made the changes or the person who wrote the original version are best
Branching is just another tool, you need to learn how to use it effectively if you want the maximum benefit.
Your attitude to branching should probably differ between distributed open source projects (such as those on Git) and your company's development projects (possibly running on SVN). For distributed projects you'll want to encourage branching to maximize innovation and experimentation, for the latter variety you'll want tighter control and to dictate checkin policies for each code line that dictate when branching should / should not occur, mostly to "protect" the code.
Here is a guide to branching:
http://www.vance.com/steve/perforce/Branching_Strategies.html
Here is a shorter guide with some high level best practices:
https://www.perforce.com/pdf/scm-best-practices.pdf

Branching is trivial. Merging is not. For that reason, we rarely branch anything.

Using SVN, I've found branching to be relatively painless. Especially if you periodically merge the trunk into your branch to keep it from getting too far out of sync.

We use svn. It only takes us about 5 minutes to branch code. It's trivial compared to the amount of pain it saves us from messing up trunk.

Working in a code base of millions of lines of code with hundreds of developers branching is an everyday occurrence. The life of the branch varies depending on the amount of work being done.
For a small fix:
designer makes a sidebranch off the main stream
makes changes
tests
reviews
merges accumulated changes from main stream to sidebranch
iterates through one or more of the previous steps
merges back to main stream
For a multi-person team feature:
team makes a feature sidebranch off the main stream
individual team member operates on feature sidebranch as in "small fix" approach and merges to feature sidebranch.
sidebranch prime periodically merges accumulated changes from main stream to feature sidebranch. Small incremental merges from the mainstream to feature sidebranch are much easier to deal with.
when feature works, do final merge from main stream to feature sidebranch
merge feature sidebranch to main stream
For a customer software release:
make a release branch
deliver fixes as needed to release branch
fixes are propogated to/from the main stream as needed
Customer release streams can be very expensive to support. Requires testing resources - people and equipment. After a year or two, developer knowledge on specific streams starts to get stale as the main stream moves forward.
Can you imagine how much it must cost for Microsoft to support XP, Vista and Windows 7 concurrently? Think about the test beds, the administration, documentation, customer service, and finally the developer teams.
Golden rule: Never break the main stream since you can stall a large number of developers. $$$

The branching problem is why I use a Distributed Version Control system (Git in my case, but there are also Mercurial and Bazaar) where creating a branch is trivial.
I use short lived branches all the time for development. This lets me mess around in my own repository, make mistakes and bad choices, and then rebase the changes to the main branch so only clean changes are kept in history.
I use tags to mark frozen code, and it is easy in these systems to go back and branch off these for bug fixes without having a load of long lived branches in the code base.

I use Subversion and consider branching very simple and easy. So to answer question 1.. Yes.
The reason for branching can vary massively. I branch if I feel I should. Quite hard to put rules and reasons down for all possibilities.
However, as far as the "Allow a developer to walk away from risky changes." comment. I totaly agree with that one. I create a branch whenever I want to really play around with the code and wish I was the only developer working on it.. When you branch, you can do that...

I've been on a project using svn and TFS and branching by itself is a really simple thing.
We used branching for release candidate as well as for long lasting or experimental features and for isolating from other team's interference.
The only painful moment in branching is merging, because an old or intensely developed branch may differ a lot from trunk and might require significant effort to merge back.
Having said the above, I would say that branching is a powerful and useful practice which should be taken into account while developing.

If merging is too much of a pain, consider migrating to a better VCS. That will be a bigger pain, but only once.

We use svn and have adopted a rule to branch breaking changes. Minor changes are done right in the trunk.
We also branch releases.
Branching and merging have worked well for us. Granted there are times we have to sit and think about how things fit together, but typically svn does a great job of merging everything.

I use svn, it takes less than a minute to branch code. I used to use Clearcase, it took less than a minute to branch code. I've also used other, lesser, SCMs and they either didn't support branches or were too painful to use. Starteam sounds like the latter.
So, if you cannot migrate to a more useful one (actually, I've only heard bad things about Starteam) then you might have to try a different approach: manual branching. This involves checking out your code, copying it to a different directory and then adding it as a new directory. When you need to merge, you'd check out both directories and use WinMerge to perform the merge, checking in the results to the original directory. Awkward and potentially difficult if you continue to use the branch, but it works.
the trick with Branching is not to treat it as a completely new product. It is a branch - a relatively short-lived device used to make changes separately and safely to a main product trunk. Anyone who thinks merging is difficult is either refactoring the code files so much (ie they are renaming, copying, creating new, deleting old) that the branch becomes a completely different thing, or they are keeping the branch so long that the accumulated changes bear little resemblance to the original.
You can keep a branch for a long time, you just have to merge your changes back regularly. Do this and branching/merging becomes very easy.

I've only done it a couple times, so I'm not exactly comfortable with it.
I've done it to conduct design experiments that would span over some checkins, so branching is an easy way to wall off yourself a garden to play in. Also, it allowed me to tinker while other people worked on the main branch, so we didn't lose much time.
I've also done it when making wide ranging changes that would render the trunk uncompilable. It became clear in my project that I'd have to remove compile-time type safety for a large portion of the codebase (go from generics to system.object). I knew this would take a while and would require changes all over the codebase which would interfere with other people's work. It would also break the build until I was complete. So I branched and stripped out the generics, working until that branch compiled. I then merged it back into the trunk.
This turned out pretty well. Prevented a lot of toe-stepping, which was great. Hopefully nothing like this will ever come up again. Its kind of a rare thing that a design will change requiring this kind of wide ranging edits that don't result in a lot of code being thrown out...

Branched have to be managed correctly to make merging painless. In my experience (with Perforce) regular integration to the branch from the main line meant that the integration back into the main line went very smoothly.
There were only rare occasions when the merging failed. The constant integration from the main line to the branch may well have involved merges, but they were only of small edits that the automatic tools could handle without human intervention. This meant that the user didn't "see" these happening.
Thus any merges required in the final integration could often be handled automatically too.
Perforces 3-way merge tools were a great help when they were actually needed.

Do you feel comfortable branching code?
It really depends of the tool I'm using. With Starteam, branching is indeed non trivial (TBH, Starteam sucks at branching). With Git, branching is a regular activity and is very easy.
Do you branch code for reasons other than freezing a release candidate?
Well, this really depends of your version control pattern but the short answer is yes. Actually, I suggest to read the following articles:
Version Control for Multiple Agile Teams by Henrik Kniberg
FeatureBranch by Martin Fowler
I really like the pattern described in the first article and it can be applied with any (non Distributed) Version Control System, including Starteam.
I might consider the second approach (actually, a mix of the both strategies) with (and only with) a Distributed Version Control Systems (DVCS) like Git, Mercurial...

We use StarTeam and we only branch when we have a situation that requires it (i.e. hotfix to production during release cycle or some long reaching project that spans multiple release windows). We use View Labels to identify releases and that makes it a simple matter to create branches later as needed. All builds are based on these view labels and we don't build non-labeled code.
Developers should be following a "code - test - commit" model and if they need a view for some testing purpose or "risky" development they create it and manage it. I manage the repository and create branches only when we need them. Those times are (but not limited to):
Production hotfix
Projects with long or overlapping development cycles
Extensive rewriting or experimental development
The merge tool in StarTeam is not the greatest, but I have yet to run into an issue caused by it. Whoever is doing the merge just needs to be VERY certain they know what they're doing.
Creating a "Read Only Reference" view in Star Team and setting it to a floating configuration will allow changes in the trunk to automatically show in the branch. Set items to branch on change. This is good for concurrent development efforts.
Creating a "Read Only Reference" view with a labeled configuration is what you'd use for hot fixes to existing production releases (assuming you've labeled them).

Branching is trivial, as most have answered, but merging, as you say, is not.
The real keys are decoupling and unit tests. Try to decouple before you branch, and keep an eye on the main to be sure that the decoupling and interface are maintained. That way when it comes time to merge, it's like replacing a lego piece: remove the old piece, and the new piece fits perfectly in its place. The unit tests are there to ensure that nothing got broken.

Branching and merging should be fairly straightforward.
I feel very comfortable branching/merging.
Branching is done for different reasons, depending on your development process model/
There's a few different branch models:
Here's a one
Trunk
.
.
.
..
. ....
. ...
. ..Release1
.
.
...
. ....
. ...Release2
.
.
..
. ...
. ..
. ...Release3
.
.
Now here's a curious thing. Suppose Release1 needed some bugfixing. Now you need to branch Release1 to develop 1.1. That is OK, because now you can branch R1, do your work, and then merge back to R1 to form R1.1. Notice how this keeps the diffs clear between releases?
Another branching model is to have all development done on the Trunk, and each release gets tagged, but no further development gets done on that particular release. Branches happen for development.
Trunk
.
.
.
.Release1
.
.
.
.
.Release2
.
.......
. ......
. ...DevVer1
. .
. .
. ...DevVer2
. ....
. ....
...
.Release3
.
There may be one or two other major branch models, I can't recall them off the top of my head.
The bottom line is, your VCS needs to support flexible branching and merging.
Per-file VCS systems present a major pain IMO(RCS, Clearcase, CVS).
SVN is said to be a hassle here as well, not sure why.
Mercurial does a great job here, as does(I think)git.

Related

Is using “feature branches” compatible with refactoring?

“feature branches” is when each feature is developed in its own branch and only merged into the main line when it has been tested and is ready to ship. This allows the product owner to choose the features that go into a given shipment and to “park” feature that are part written if more important work comes in (e.g. a customer phones up the MD to complain).
“refactoring” is transforming the code to improve its design so as to reduce to cost of change. Without doing this continually you tend to get uglier code bases which is more difficult to write tests for.
In real life there are always customers that have been sold new features and due to politics all the customers have to see that progress is being made on “their” group of features. So it is very rarely that there is a time without a lot of half-finished features sitting on branches.
If any refactoring has been done, the merging in the “feature branches” become a lot harder if not impossible.
Do we just have to give up on being able to do any refactoring?
See also "How do you handle the tension between refactoring and the need for merging?"
My view these days is that due to the political reasons that resulted in these long living branches and the disempowerment of the development director that prevented him from taking action, I should have quicker started looking for a new job.
Feature branches certainly make refactoring much harder. They also make things like continuous integration and deployment harder, because you are ballooning the number of parallel development streams that need to be built an tested. You are also obviating the central tenet of "continuous integration" -- that everyone is working on the same codebase and "continuously" integrating their changes with the rest of the team's changes. Typically, when feature branches are in use, the feature branch isn't continuously built or tested, so the first time the "feature branch" code gets run through the production build/test/deploy process is when it is "done" and merged into the trunk. This can introduce a whole host of problems at a late and critical stage of your development process.
I hold the controversial opinion that you should avoid feature branches at (nearly) all costs. The cost of merging is very high, and (perhaps more importantly) the opportunity cost of failing to "continuously integrate" into a shared code base is even higher.
In your scenario, are you sure you need a separate feature branch for each client's feature(s)? Could you instead develop those features in the trunk but leave them disabled until they are ready?. Generally, I think it is better to develop "features" this way -- check them in to trunk even if they aren't production-ready, but leave them out of the application until they are ready. This practice also encourages you to keep your components well-factored and shielded behind well-designed interfaces. The "feature branch" approach gives you the excuse to make sweeping changes across the code base to implement the new feature.
I like this provoking thesis ('giving up refactoring'), because it enriches discussion :)
I agree that you have to be very careful with bigger refactoring when having lots of parallel codelines, because conflicts can increase integration work a lot and even cause introducing regression-bugs during merging.
Because of this with refactoring vs. feature-branches problem, there are lots of tradeoffs. Therefore I decide on a case by case basis:
On feature-branches I only do refactorings if they prepare my feature to be easier to implement. I always try to focus on the feature only. Branches should differ from trunk/mainline at least as possible.
Taking it reverse I sometimes even have refactoring branches, where I do bigger refactorings (reverting multiple steps is very easy and I don't distract my trunk colleagues). Of course I will tell my team, that I am doing this refactoring and try to plan to do it during a clean-up development cycle (call it sprint if you like).
If your mentioned politics are a big thing, then I would encapsulate the refactoring efforts internally and add it to estimation. In my view customers in middle-terms will see faster progress when having better code-quality. Most likely the won't understand refactoring (which makes sense, because this out of their scope...), so I hide this from them
What I would never do is to refactor on a release-branch, whose target is stability. Only bug-fixes are allowed there.
As summary I would plan my refactorings depending on codeline:
feature-branch: only smaller ones (if they "help" my feature)
refactoring-branch: for bigger ones, where the refactoring target isn't completely clear (I often call them "scribble refactorings")
trunk/mainline: OK, but I have to communicate with developers on feature-branches to not create an integration nightmare.
release-branch: never ever
Refactoring and merging are the two combined topics Plastic SCM focuses on. In fact there are two important areas to focus: one is dealing (during merge) with files that have been moved or renamed on a branch. The good news here is that all the "new age" SCMs will let you do that correctly (Plastic, Git, Hg) while the old ones simply fail (SVN, Perforce and the even older ones).
The other part is dealing with refactored code inside the same file: you know, you move your code and other developer modifies it in parallel. It is a harder problem but we do focus on it too with the new merge/diff toolset. Find the xdiff info here and the xmerge (cross-merging) here. A good discussion about how to find moved code here (compared to "beyond compare").
While the "directory merging" or structure merging issue is a core one (whether the system does it or not), the second one is more a tooling problem (how good your three-way merge and diff tools are). You can have Git and Hg for free to solve the first problem (and even Plastic SCM is now free too).
Part of the problem is that most merge tools are just too stupid to understand any refactoring. A simple rename of a method should be merged as a rename of the method, not as an edit to 101 lines of code. Therefore for example additional calls to the method in anther branch should be cope with automatically.
There are now some better merge tools (for example SemanticMerge) that are based on language parsing, designed to deal with code that has been moved and modified. JetBrains (the create of ReShaper) has just posted a blog on this.
There has been lots of research on this over the years, at last some products are coming to market.

Version control best practices

I just made the move to version control the other day, and after a bad experience with Subversion, I switched to Mercurial, and so far am happy with it.
Although I understand and appreciate the idea of version control, I don't really have any practical experience with it.
Right now, I am using it for a couple websites I am working on, and a couple questions have come to mind:
When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
Should I branch? What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?
Any other advice for a version control newbie?
So far, everyone has given me good advice, but very team-oriented. I would like to clarify:
At the moment, I am just using VC on some websites I do on the side. Not quite full-out freelance work, but for the purposes of VC, I am the only one that really touches the website code.
Also, since I am using PHP on the sites, there is no compiling to be done.
Does this change your answers significantly?
Most of the questions you're asking about depends mostly on who you are working with. If you're a lone developer it shouldn't matter a lot, since you can do whatever you'd like. But if you're in a team where you have to share your code then you should discuss with your team members what the code of conduct should be since sharing changes between one another can become tricky at times.
The discussion regarding code of conduct doesn't need to be lengthy, it can be very brief; as long everyone is on the same page on how to use the repository that is shared between the programmers in the team. If you want to use the more advanced features in Mercurial, such as cherry picking or patch queues, then try using them so that it won't impact your team members in a negative way, such as rebasing on a public repository.
Remember version control has to be easy to use for everyone in the team, or else it won't be used.
When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
While working with a team there are several approaches, but the common rule is to commit early and often. The main reason on why you should commit often is to make merge conflicts easier to handle.
A merge conflict is simply put whenever merging a file that has been changed by at least two people doesn't work because they've been editing on the same lines. If you're holding on to a commit that involves a very large change with several lines of changes across several files, it will become very difficult to manage for the receiver to manage the conflicts that may occur. The merge conflict becomes even more difficult to handle if the said set of changes are held on for too long.
There are some exceptions to the rule of committing often and one is whenever you have a breaking change. although if you have the ability to commit locally (which you are doing in Mercurial and git inherently) you could commit breaking changes. As long as you fix whatever broke, you should push it upstream to the shared repository when you've fixed your own breaking change.
Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
Should I branch?
There are many branching strategies to choose from (there is the Streamed Lines paper from 1998 that has an exhaustive pattern list of branching strategies) and when you're making them for yourself it should be open game for yourself. However when working in teams, you'd better discuss openly with the team if you need to branch or not. Whenever you have the urge to branch though you should ask yourself the following questions:
Will my future changes be breaking the work of others?
Will my team have a direct negative impact from the changes I'll be doing until I'm done?
Is my code throwaway code?
If the answer is yes in any of the questions above you should probably branch publically, or keep it for yourself (since you can do that in Mercurial in several ways). You should first discuss with your team on how to execute the whole endavour to see if there is any other way of doing it and if you're going to merge your changes back in, sometimes there are factors at play where there is no need to branch (this is mostly related to how modular the code is).
When you decide to branch be prepared to handle a merge conflict. It is sane to assume the one who created the branch and made the commits to be able to merge it back into the "main branch". At these times it would be great if everyone in the team made relevant commit comments.
As a side note: You do write good commit comments, right? RIGHT!? A good commit comment usually tells why that particular change was made or what feature the committer was working on instead of a nondescript "I made a commit" kind of comment. This makes it easier for the one who is handling the big merge conflict to figure out what line changes can be overwritten and which ones to keep while going through the revision history.
Compile times, or build times rather, sometimes play into the branch discussion you may have. If your project has a slow build time then it might be a good idea to use a staging strategy in your branches. This strategy takes into account that all developers should integrate to a "main line" and changes that are approved are elevated (or "promoted") to the next stage, such as testing or release lines. It is classically illustrated with tag names for open source software like this:
main -o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-> ...
\ \ \
test o-----------o--------------o---------> ...
1.0 RC1 \ 1.0 RC2 2.0 RC1
release o----------------------> ...
1.0
The point with this is that testers can work without being interrupted by the programmers and that there is a known baseline for those who are in release management. In distributed version control, the different lines could be cloned repositories and it may look a bit different since repositories share the versioning graph. The principle however is the same.
Regarding web development, there are virtually no build times. But branching in stages (or by tagging your release revisions) it becomes easier to roll-back if you want to check a difficult-to-track-down bug.
However, a whole other thing comes into play and that is the time it takes to deploy the site. Version control tools in my experience are really bad at asset management. Handling art assets that are in total up to several GB usually is a huge pain in the butt to handle in Subversion (more so in Mercurial). Assets may require you to handle them in another way that is less time consuming, such as putting them in a shared space that are synched and backed up in a traditional manner (art assets are usually not worked on concurrently as with source code files).
What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?
The concepts of branching and keeping remote repositories are closer now than with centralized version control tools. You could almost consider them being the same thing. In Mercurial (and in git) you can "branch" either by:
Cloning a repository
Creating a named branch
Creating a named branch means that you're making a new path in the versioning graph for the repository you're creating it on. Creating a cloned repository means you're copying the source repository into a new location, and making a new path in the cloned repository's versioning graph. They are both two different implementations of branching as a general concept in version control.
In practice, the only difference between both methods that you should care about is in usage. You clone a repository to have a copy of the source code and have a place to store your own changes in and you create named branches whenever you want to do small experiments for yourself.
Since browsing through branches is a bit quirky for those who accustomed to a straight line of commits, advanced users know how to manipulate their versions so the version history is "clean" with e.g. cherry picking or rebase. At the moment git docs actually explain rebase rather well.
These are the practices that I follow
Each commit should make sense: one bug fix (or a set of bugs related to each other), one (small) new feature, etc. The idea is that if you need to rollback, your rollbacks fall on well defined "boundaries"
Every commit should have a good message explaining what you are committing. Really get into this habit, you will thank yourself later. Doesn't have to be verbose, a few sentences can do. If you are using a bug tracking system, associating a bug number with your commit is also extremely helpful
Now that I use git and branching is so incredibly fast and cheap, I tend to make a new branch for each new feature I'm about to implement. I'd never even consider doing this for many other VCSes. So branching depends on the system you are using, your codebase, your team, etc, there are no hard rules there.
I prefer to always use the command line and get to know my VCS's commands directly. The disconnect that a GUI based frontend can cause can be a pain, and even damaging. Controlling your source code is very important, it's worth getting in there and doing it directly. But that's just my preference.
Back up your VCS. I back up my local repository with Time Machine, and then I push out to a remote repository on my server, and that server is backed up as well. VCS alone is not really a "backup", it can go down too just like anything else.
When/how often should I commit?
You'll probably get lots of contradictory answers on this one. My view is that you should commit changes when they are working, and each commit (or checkin) should contain exactly one "edit". An "edit" is an atomic set of changes that go together to fix a bug or implement a new feature.
There is a theory that you should check in code every few hours even if it's not working, but in that case you will need to be working on your own branch - you don't want to be checking in broken code to your main line, or onto a shared branch.
The advantage of checking in every night is that you have backups (assuming that your repository is on a different machine).
As for branching:
you should have main line that contains always working code.
you should have a current development branch that contains the latest code. When you are happy with this (and it's passed all it's tests) you should merge it back into the main line.
you might want a branch that contains the last released version. This can be used for testing/debugging bugs and releasing patches (in extremis).
update before each commit
provide commit comments
commit as soon as you have something finished
don't commit anything that makes the code in the repository not compiling or buggy
update every morning
sometimes verbally communicate with colleages if there is something important to update
commit code relevant to exactly one thing (i.e. fixing a bug, implementing a feature)
don't worry to make very small commits, as long as they conform to the previous rule
Btw, what's the bad experience with Subversion?
I commit when I am finished a piece of work and only if it is working. It's bad practise to commit to somewhere where other people use the code.
Branching is something that people will argue about. Some people say never branch and just have switches to get something working or not. Do what you feel more comfortable but don't branch just because you can. I use branching and Branch when i am working on a major bit of work where if I commit broken code by accident its not going to affect everyone else.
Q: When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
A: Whenever you are feeling comfortable, I am commiting as soon as a unit of work is finished and working (which does not mean that the complete task has to be finished). But you should not commit something that does not compile (might inhibit other people in the team,if any). Also, you should not commit incomplete stuff to the trunk if there is any possibility that you have to implement a quick fix or small change before completing it.
Q: Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
A: Only if there is a possibility that you have to implement a quick fix or small change before completing your task.
The nice thing about branching is that all commits you are doing in the branch will still be available for future reference (if necessary). Also it is much simpler and faster than cloning the repo, I think ;-)
I agree with others on commit times.
Regarding branching, I generally branch only when working on something that breaks what others are doing or when a patch needs to be rolled to production in a file that already has changes that should not go to production. If you're only one developer, then the first scenario doesn't really apply.
I use tags to manage releases - the "production" tag is always associated with the current prod code, and each release is tagged with "release-YYYYMMDD". This allows you to roll back if necessary.

What are the benefits of "Staircase" branching?

I've encountered a project where we use staircase branching. To clarify, this is the approach where code is branched for development and when the development completes, rather than cutting back to a trunk/main line, the dev branch is designated as the live code set and a new branch is taken from it. This continues ad infinitum.
I personally prefer the cut back to a trunk approach to maintain one single "live" codeset. However I am wondering if there are any compeling benefits to the staircase approach.
Thanks,
Tom
This method will bite you in the derriere the moment you want to create multiple branches and develop against them concurrently. Merging back to the trunk allows you to have parallel branches and easily bring them together one branch at a time.
I believe it's a way to make it easier to roll back, and a little clearer the differences between versions. It's also easier to continue developing a previous version (for instance, having a version for developers and a version for designers).
I myself prefer the other approach with tags indicating the checkpoints. I use git, and the process of branching and doing this kind of stuff is really well built with it.
Right from the start of the first stair, you are in the same position as were when developing the first release. this is it, no late decisions about what to move back into the main trunk. The build from the stair is your latest candidate.
Penalty, if lots of change and fix in original you have merging up to do, and that might be disruptive. I'm guessing that there may be a break-even point when rate of change for the new release and rate of patch in the old make staircase suboptimal.
There are a couple of (potentially minor) benefits to this approach.
The most compelling benefits arise from the lack of need to merge changes back into the main branch. This makes it easy to keep a branch (the old "trunk") for the version where you branched, as well as requiring no effort on the dev branch to continue on. In reality, this is no different than having one live trunk and tagging or branching out for a release, though - except that you "move" your development instead of moving your tagged branch. This can make it easier to maintain clean branches with less effort, since there's no need to "tag" a new branch for each release - it just kind of happens automatically. This is a minor timesavings, though.
In my experience, though, there is one potential disadvantage. I've often found that this approach often makes it easier for developers to accidentally break binary compatibility in libraries, since you're always working on a development copy, and each "release" is a separate branch from that. Since there's no effort required in merging back to a trunk, it can be easy to accidentally break an API. This isn't a major concern IMO, but is something to be aware of, since there is no effort during the merge process (which seems to be where most of these mistakes are often discovered).

Preferred Version Control Methodology

I am a novice in the world of source/version control and I have been doing as much reading as physically possible to get my head around the different techniques that people use for their own source/version control.
One thing that I have noticed is a pretty distinct break in the methods of developers into two (possibly more?) groups: one group prefers to keep their trunk in an always-stable state and performs all maintenance and future development in the branches, while others prefer to do all of their development in the trunk and keep it in a not-so-stable state.
I am curious as to what the community here at StackOverflow prefers or if you have your own methods.
Note: If it would help tailor the answers, I should note that I am a single developer (at most there would be two or three others in the same project) who works primarily in ASP.NET and SQL Server 2005
As I'm sure you've noticed from searching the web for answers on this topic, this is one of those things where the best answer is "It depends.", and as most of the responses have indicated, it's a trade-off between how easy do you want to be able to commit/merge new code vs. managing an extensive version history that you can easily roll back for support or debugging purposes.
I work for a small company, which means that at any given time, we could have 3 or 4 different versions of code on developer machines that have not yet been committed to the repository. We use TortoiseSVN for our version control system, which gives us the ability to branch/merge without too much difficulty, as well as being able to view the update log or revert our local copies to an earlier revision pretty easily.
Based on your question, I suppose we would fall under the group of developers who attempts to keep, at all times, a stable Trunk, and we branch new code and test it before merging it back into the Trunk. We also make an effort to keep "snapshots" of each version release so that, if necessary, we can easily check out an earlier version and re-build it, without incorporating any new features intended for a future release (This is also a great mechanism for tracking down bugs, as you can use earlier versions of code to help determine when a particular bug was first introduced into your code. However, one thing to be careful of is if your application references common code that is maintained separately from your version-ed code, you will need to keep track of that too!).
On the repository, it ends up looking something like this:
Trunk
v1.0.1.x Release
v1.0.2.x Release
v1.0.2.x Bug-Fix A <-- (These get merged back into Trunk, but remain on the repo)
v1.0.2.x Bug-Fix B
v1.1.1.x Release
v1.2.1.x Development <-- (This will get merged back to Trunk, and replaced by a Release folder)
v1.2.1.x New Feature A <-- (These get merged back into the Development branch)
v1.2.1.x New Feature B
When I first started at the company, our version structure was not quite as sophisticated, and in my experience, I would say that if you have any need whatsoever to keep track of earlier versions, it is will worth the effort to put something like this together (like I said earlier, it doesn't have to look exactly like this, so long as it fits your individual needs), keep it well documented so that all contributors can maintain it (the alternative is that the creator ends up "babysitting" the repo, which quickly becomes an incredible waste of time), and encourage all your developers to follow it. It may feel like a lot of effort in the beginning, but you'll appreciate it the first time you need to take advantage of it.
Good luck!
I do all my development in the trunk. I'm a single developer and don't want to deal with the hassle of branching. When my code is stable I just tag that current version. For example I'd tag version 1.0, 2.0 beta, 2.0 release candidate 1, version 2.0, etc. Branching would probably a better alternative if you’re maintaining old versions for bug fixes and support but since I don't do this I don't worry about it.
The differences may have to do with how painful merging is or isn't in a given version control system.
With git, branching and merging is practically effortless, so it's a common workflow for me to keep my master clean and do all my work in branches. Branching and merging in svn, particularly in previous versions, isn't quite so cheap and easy, so when I was using svn I tended to work directly on the trunk.
Always stable. Even if I'm a single developer -- almost especially if I'm a lone developer.
Having a broken tree to me means one less way to know what I should be doing.
Big changes go in branches, as well as stable releases, and do the smallest unit of changes possible at any given point so as to keep moving forward at a good pace.
This is the methodology which we follow:
Any stable release should be taken from the trunk. Any further work or modifications should go inside the working branch and should be merged with trunk when ready to release.
If have multiple independent developments, each group should have there on branch which they should sync with trunc periodically and merge it back to trunk when ready.
I've always used the main trunk as head of code. Generally new development code goes in there.
We branch for releases and we may branch for a "big" destabilizing experiment.
When we make bug fixes they go into in main first and then they get merged (back-ported) into the appropriate version branch if required.
If the big experiment works out it get's merged back into main.
We use tags for build numbers in the version branches and the main. That way we can get back to a specific version and build if we have to.
I'm in for the always-stable trunk. You need to be able to rebuild the latest stable version at any time...
In your case, I'd strongly recommend avoiding a lot of branching. It's really a fairly advanced process and not necessary for small projects and small teams.
Try and keep it simple to start with, I always try to have a known working build that can reproduced for testing and deployments etc. Depending on your repository you could use revision number (SVN), or just label the known working versions as they are required.
If you find you have multiple people touching the same files then you will need to consider a branching strategy, other than that for such a small dev team it will just be un-necessary overhead...(IMO)
One aspect is how long will the changes be in an unstable state.
When a change I make might affect other people, or the nightly build, then I do my work on a branch, and merge when stable.
When the changes I make won't affect other people (because it is my private code at home, rather than code at work), then I'm OK with checking in non-working intermediate states if that's what I want. Sometimes, I'll make a few checkins in a row which are not stable; that's OK for me when it is just me who is affected and the workflow will be continuous. It's if I come back in a few years time (as opposed to just a few days) and things aren't working that it gets problematic (one disadvantage of having been around as long as I have - I do have some projects that are still in development and maintenance that are old enough to vote).
I use a variant of tagging to achieve repeatable builds - so if I need to go back to a stable version for a bug-fix, I can use the tag information to do that. It is crucial to be able to get a stable version on demand.
One key distinction is how big files tend to be on average. Big files (1000 lines +) tend to have many independent changes that are trivially automatically mergeable. So the fact that someone else is actively changing a file you are about to start work on is probably uninteresting, so it is ok if the system makes that hard to discover.
So you tend to end up a VC strategy that has a lot of branches, and easy merges. New functionality is written in new branches.
If you are working with the smaller, highly-cohesive files typical of an OO design, in a language like Java, such accidental conflicts are a lot rarer. Even as an artificial example, it is pretty hard to come up with two sets of changes that can be made to a class (and corresponding JUnit test cases) that can sensibly be made in isolation and then automatically weaved back together by a text merge tool.
If you do a lot of refactoring (renaming and splitting files) then that stresses out the merge tool even more.
So you tend to be best off with a VC strategy that has an identifiable and usable stable trunk and minimal branches.
In the first approach, new functionality is written in new branches, and merged when complete. In the second, it is written in new files, and enabled when complete.
Of course, if you do the second, you definitely need strong protection from the baseline becoming unusable for any length of time (i.e. continuous integration and a strong automatically-run test suite).

How do you handle the tension between refactoring and the need for merging?

Our policy when delivering a new version is to create a branch in our VCS and handle it to our QA team. When the latter gives the green light, we tag and release our product. The branch is kept to receive (only) bug fixes so that we can create technical releases. Those bug fixes are subsequently merged on the trunk.
During this time, the trunk sees the main development work, and is potentially subject to refactoring changes.
The issue is that there is a tension between the need to have a stable trunk (so that the merge of bug fixes succeed -- it usually can't if the code has been e.g. extracted to another method, or moved to another class) and the need to refactor it when introducing new features.
The policy in our place is to not do any refactoring before enough time has passed and the branch is stable enough. When this is the case, one can start doing refactoring changes on the trunk, and bug-fixes are to be manually committed on both the trunk and the branch.
But this means that developpers must wait quite some time before committing on the trunk any refactoring change, because this could break the subsequent merge from the branch to the trunk. And having to manually port bugs from the branch to the trunk is painful. It seems to me that this hampers development...
How do you handle this tension?
Thanks.
This is a real practical problem. It gets worse if you have several versions you need to support and have branched for each. Even worse still if you have a genuine R&D branch too.
My preference was to allow the main trunk to proceed at its normal rate and not to hold on because in an environment where release timings were important commercially I could never argue the case that we should allow the code to stabilise ("what, you mean you released it in an unstable state?").
The key was to make sure that the unit tests that were created for the bug fixes were transitioned across when the bug was migrated into the main branch. If your new code changes are genuinely just re-factoring then the old tests should work equally well. If you changes are such that they are no longer valid then you can't just port you fix in any case and you'll need to have someone think hard about the fix in the new code stream.
After quite a few years managing this sort of problem I concluded that you probably need 4 code streams at a minimum to provide proper support and coverage, and a collection of pretty rigorous processes to manage code across them. It's a bit like the problem of being able to draw any map in 4 colours.
I never found any really good literature on this subject. It will inevitably be linked to your release strategy and the SLAs that you sign with your customers.
Addendum: I should also mention that it was necessary to write the branch merging as specific milestones into the release schedule of the main branch. Don't under-estimate the amount of work that might be entailed in bring your branches together if you have a collection of hard-working developers doing their job implementing features.
Where I work, we create temporary, short-lived (less than day -- a few weeks) working branches for every non-trivial change (feature add or bugfix). Trunk is stable and (ideally) potentially releasable all the time; only done items get merged into it. Everything committed from trunk gets merged into the working branches every day; this can be largely automated (we use Hudson, Ant and Subversion). (This last point because it's usually better to resolve any conflicts sooner than later, of course.)
The current model we use was largely influenced by an excellent article (which I've plugged before) by Henrik Kniberg: Version Control for Multiple Agile Teams.
(In our case, we have two scrum teams working on one codebase, but I've come to think this model can be beneficial even with one team.)
There is some overhead about the extra branching and merging, but not too much, really, once you get used to it and get better with the tools (e.g. svn merge --reintegrate is handy). And no, I do not create a temp branch always, e.g. for smaller, low-risk refactorings (unrelated to the main items currently under work) that can easily be completed with one commit to trunk.
We also maintain an older release branch in which critical bugs are fixed from time to time. Admittedly, there may be manual (sometimes tedious) merging work if some particular part of code has evolved in trunk significantly compared to the branch. (This hopefully becomes less of an issue as we move towards continually releasing increments from trunk (internally), and letting marketing & product management decide when they want to do an external release.)
I'm not sure if this answers your question directly, or if you can apply this in your environment (with the separate QA team and all) - but at least I can say that the tension you describe does not exist for us and we are free to refactor whenever. Good luck!
Maybe Git (or other DVCS) are better at handling merges to updated code thanks to the fact that they (really) manage changes rather than just compare files... As Joel says :
With distributed version control, merges are easy and work fine. So you can actually have a stable branch and a development branch, or create long-lived branches for your QA team where they test things before deployment, or you can create short-lived branches to try out new ideas and see how they work.
Not tried yet, though...
Where I work, we keep with the refactoring in the main branch. If the merges get tricky, they just have to be dealt with on an ad-hoc basis, they're all do-able, but occasionally take a bit of time.
Maybe our problem comes from the fact that we have branches that must have quite a long life (up to 18 months), and there are many fixes that have to be done against them.
Making sure that we only branch from extremely stable code would probably help, but will not be so easy... :(
I think the tension can be handled by adding following ingredients to your development process:
Continuous integration
Automated functional tests (I presume you already count with unit tests)
Automated delivery
With continuous integration, every commit implies a build where all unit tests get executed and you are alarmed if anything goes wrong. You start working more with head and you are less prone to branching the code base.
With automated functional tests, you are able to test your application at the click of the button. Generally, since these tests take more time, these are run nightly.
With this, the classic role of versioning starts to lose the importance. You do not make your decision on when to release based on the version and its maturity, it’s more of the business decision. If you have implemented unit and functional testing and your team is submitting tested code, head should always in state that can be released. Bugs are continuously discovered and fixed and release delivered but this is not any more cyclical process, it’s the continuous one.
You will probably have two types of detractors, since this implies changing some long rooted practices. First, the paradigm shift of continuous delivery seems counter-intuitive to managers. “Aren’t we risking to ship a major bug?” If you take a look at Linux or Windows distros, this is exactly what they are doing: pushing releases towards customers. And since you count with suite of automated tests, dangers are further diminished.
Next, QA team or department. (Some would argue that the problem is their existence in itself!) They will generally be aversive towards automating tests. It means learning new and sometimes complicated tool. Here, the best is to preach by doing it. Our development team started working on continuous integrations and in the same time writing the suite of functional tests with Selenium. When QA team saw the tool in action, it was difficult to oppose its implementation.
Finally, the thurth is that the process I described is not as simple as addig 3 ingredinents to your development process. It implies a deep change to the way you develop software.