Managing branches in software version control

I work on a software code base that goes into several similar platforms. The platforms differ in hardware and in some software features, but the bulk of the software is common. What is the best way to manage such a project in a version control system? If I create a branch for each platform, then I have to commit every shared change to each applicable branch. How do I maintain these branches? How do I comply with the "single source of truth" philosophy? Aren't branches primarily for bug fixes, feature development, etc. that eventually get merged to mainline?

Rearrange your projects such that you have your common code in one directory (or set of files), and your platform-specific code in its own files/directories. In other words, make a logical separation of this code in a single codebase.
Branches will be a rapid descent into madness and really aren't a good way to make this separation in the first place.
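As a rough illustration of that logical separation, here is a minimal Python sketch. Everything in it is hypothetical (the platform names `board_a` and `board_b`, the `Platform` fields, and the `report` function are invented for the example); in a real tree the platform table would be split into one file or directory per platform.

```python
from dataclasses import dataclass


# Common code: shared by every platform, lives in one place.
def format_reading(raw: int, scale: float) -> str:
    """Convert a raw sensor value to a display string."""
    return f"{raw * scale:.2f}"


# Platform-specific code: only the parts that really differ.
# (Hypothetical fields; a real project would pick its own.)
@dataclass(frozen=True)
class Platform:
    name: str
    adc_scale: float
    has_display: bool


PLATFORMS = {
    "board_a": Platform("board_a", adc_scale=0.5, has_display=True),
    "board_b": Platform("board_b", adc_scale=0.25, has_display=False),
}


def report(platform_name: str, raw: int) -> str:
    """Common entry point; the platform is chosen by name, not by branch."""
    platform = PLATFORMS[platform_name]
    text = format_reading(raw, platform.adc_scale)
    return f"[{platform.name}] {text}" if platform.has_display else text
```

A fix to `format_reading` is committed once and reaches every platform, which is what keeps a single source of truth.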

Related

Source control system to branch by user instead of version

Once again, I'm a bit stumped about the best stack-exchange site on which to post this question. But I think developers are best suited to answer questions about source control, so here it is.
I am considering a crowd-sourced, user-rated game development project and am wondering what, if any, source control and merging systems might best be capable of hosting the kinds of source control I'm interested in. By user-rated, I mean that there will be some kind of rating/voting system like that found here on StackOverflow. For some details on the project idea, you can read my posting about it at http://gamedev.enigmadream.com/index.php?topic=1589.0. What I think I need is:
Ability to branch by user and maximize merging capabilities. I know source control systems are mainly focused on branching by version, and we could maybe think of each user maintaining their own version. But I guess we need some really robust merging capabilities to maximize the abilities of one user to merge changes from another user into their own branch, for example. So I think I would like the ability for "cross-branch" merging without having to merge into the common root branch first. (I'm most familiar with Team Foundation Server (TFS), which doesn't easily support this.)
Massive branching and merging. If there are hundreds or thousands of people wanting to incorporate their own changes into the project, there could be a lot of branches, and the system would need to be able to handle that without a meltdown. A single user might want to create multiple branches deriving from multiple other users' branches under their own name too, ideally, with the ability to merge among them to some extent.
Permission control by branch. I see SourceForge supports Subversion and Mercurial, but does not currently support permission controls by path/branch on these (as far as I can tell), although that does appear to be a feature under consideration. Users should be limited from pushing their code into other branches. I suspect the normal operations for a user would be pulling edits from other branches into their own branch, and checking in additional changes in their own branch.
A voting system. I know I shouldn't expect a source control system to support voting natively, but anything that could contribute to making this possible would be helpful. For example, maybe a voting system would involve or rely on the ability to label the best edits from various branches and pull them into a single file based on a label or a set of labels. And anything that would assist in merging the results of a selected set of labels from various branches (perhaps applying a new label to the set) could help too.
Very few files and possibly no directories. I would be willing to give up the ability to manage a large number of files or directories in exchange for any of the above, because the format for the game file I'm considering is generally contained in a single text (XML or HTML5 -- haven't decided yet) file. But this does mean that the system should be pretty good at merging edits to relatively large text files efficiently. I know Team Foundation Server does a pretty good job of maintaining just changes to a file. I hope other source control systems do at least as well.
Or is source control not the proper paradigm to be talking about here? Is there some other technology ideal for merging code like this, one that doesn't involve source control and/or branching the way I'm thinking about it?

Any VCS, because "...source control systems are mainly focused on branching by version..." is just wrong: a VCS supports divergent changes to code over time, nothing more and nothing less
Any DVCS, because they have reasonably good branch and merge capabilities from the ground up
Mercurial, which has branch-level ACLs; SVN has path-based ACLs. And because a Subversion repository is (to some degree) a physical tree, ACLs can be applied to any part of the tree, i.e. to branches as well
Any code review tool, integrated with the VCS and modified for your specific requirements
Fossil SCM is a single-file portable executable, and its repository is a single file; any DVCS likewise adds only one repository directory to the existing tree and handles big files without headache

Is using “feature branches” compatible with refactoring?

“Feature branches” is when each feature is developed in its own branch and only merged into the main line when it has been tested and is ready to ship. This allows the product owner to choose the features that go into a given shipment and to “park” features that are partly written if more important work comes in (e.g. a customer phones up the MD to complain).
“Refactoring” is transforming the code to improve its design so as to reduce the cost of change. Without doing this continually you tend to get uglier code bases, which are more difficult to write tests for.
In real life there are always customers that have been sold new features, and due to politics all the customers have to see that progress is being made on “their” group of features. So there is very rarely a time without a lot of half-finished features sitting on branches.
If any refactoring has been done, merging in the “feature branches” becomes a lot harder, if not impossible.
Do we just have to give up on being able to do any refactoring?
See also "How do you handle the tension between refactoring and the need for merging?"
My view these days is that, given the political reasons that resulted in these long-lived branches and the disempowerment of the development director that prevented him from taking action, I should have started looking for a new job sooner.

Feature branches certainly make refactoring much harder. They also make things like continuous integration and deployment harder, because you are ballooning the number of parallel development streams that need to be built and tested. You are also undermining the central tenet of "continuous integration" -- that everyone is working on the same codebase and "continuously" integrating their changes with the rest of the team's changes. Typically, when feature branches are in use, the feature branch isn't continuously built or tested, so the first time the "feature branch" code gets run through the production build/test/deploy process is when it is "done" and merged into the trunk. This can introduce a whole host of problems at a late and critical stage of your development process.
I hold the controversial opinion that you should avoid feature branches at (nearly) all costs. The cost of merging is very high, and (perhaps more importantly) the opportunity cost of failing to "continuously integrate" into a shared code base is even higher.
In your scenario, are you sure you need a separate feature branch for each client's feature(s)? Could you instead develop those features in the trunk but leave them disabled until they are ready? Generally, I think it is better to develop "features" this way -- check them in to trunk even if they aren't production-ready, but leave them out of the application until they are ready. This practice also encourages you to keep your components well-factored and shielded behind well-designed interfaces. The "feature branch" approach gives you the excuse to make sweeping changes across the code base to implement the new feature.
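A minimal sketch of "in trunk but disabled", written in Python purely for illustration. The feature names (`export_csv`, `bulk_import`) and the menu function are hypothetical; a real project would more likely read the enabled set from configuration than hard-code it.

```python
# Hypothetical set of features that are finished and switched on.
# Unfinished work is committed to trunk but simply not listed here.
ENABLED_FEATURES = frozenset({"export_csv"})


def is_enabled(feature: str, enabled=ENABLED_FEATURES) -> bool:
    """Single place where the rest of the code asks about a feature."""
    return feature in enabled


def available_menu_items(enabled=ENABLED_FEATURES) -> list:
    """Build the menu; half-finished features stay invisible to users."""
    items = ["Open", "Save"]
    if is_enabled("export_csv", enabled):
        items.append("Export CSV")
    if is_enabled("bulk_import", enabled):  # still under development
        items.append("Bulk import")
    return items
```

Because the unfinished code is compiled and tested with everything else, a refactoring in trunk touches it immediately instead of surfacing as a merge conflict weeks later.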

I like this provocative thesis ('giving up refactoring'), because it enriches discussion :)
I agree that you have to be very careful with bigger refactorings when you have lots of parallel codelines, because conflicts can increase integration work a lot and even introduce regression bugs during merging.
Because of this, the refactoring vs. feature-branches problem involves lots of tradeoffs. Therefore I decide on a case-by-case basis:
On feature branches I only do refactorings if they make my feature easier to implement. I always try to focus on the feature only. Branches should differ from trunk/mainline as little as possible.
Conversely, I sometimes even have refactoring branches, where I do bigger refactorings (reverting multiple steps is very easy and I don't distract my trunk colleagues). Of course I will tell my team that I am doing this refactoring and try to plan it for a clean-up development cycle (call it a sprint if you like).
If the politics you mention are a big thing, then I would encapsulate the refactoring efforts internally and add them to the estimate. In my view customers will see faster progress in the medium term when the code quality is better. Most likely they won't understand refactoring (which makes sense, because it is outside their scope...), so I hide it from them.
What I would never do is refactor on a release branch, whose target is stability. Only bug fixes are allowed there.
In summary, I would plan my refactorings depending on the codeline:
feature-branch: only smaller ones (if they "help" my feature)
refactoring-branch: for bigger ones, where the refactoring target isn't completely clear (I often call them "scribble refactorings")
trunk/mainline: OK, but I have to communicate with developers on feature-branches to not create an integration nightmare.
release-branch: never ever

Refactoring and merging are the two combined topics Plastic SCM focuses on. In fact there are two important areas to focus on: one is dealing (during merge) with files that have been moved or renamed on a branch. The good news here is that all the "new age" SCMs will let you do that correctly (Plastic, Git, Hg) while the old ones simply fail (SVN, Perforce and the even older ones).
The other part is dealing with refactored code inside the same file: you know, you move your code and another developer modifies it in parallel. It is a harder problem, but we do focus on it too with the new merge/diff toolset. Find the xdiff info here and the xmerge (cross-merging) here. A good discussion about how to find moved code here (compared to "beyond compare").
While the "directory merging" or structure merging issue is a core one (whether the system does it or not), the second one is more a tooling problem (how good your three-way merge and diff tools are). You can have Git and Hg for free to solve the first problem (and even Plastic SCM is now free too).

Part of the problem is that most merge tools are just too stupid to understand any refactoring. A simple rename of a method should be merged as a rename of the method, not as an edit to 101 lines of code. Then, for example, additional calls to the method in another branch would be coped with automatically.
There are now some better merge tools (for example SemanticMerge) that are based on language parsing and designed to deal with code that has been moved and modified. JetBrains (the creator of ReSharper) has just posted a blog on this.
There has been lots of research on this over the years; at last some products are coming to market.

Version control for multiple instances of a developing code

I work in an engineering lab, not a computer science lab. As such, our in-house software is not the deliverable product. Instead, the in-house software is used to analyze engineering problems, and we deliver the results.
This makes version control a living hell. Or perhaps I should just say that the standard "trunk and branch" version control tree structure doesn't seem to apply. I'm hoping someone can suggest a better way of doing things.
For example, each engineering project requires adding case specific input files, run-time files, and post-processing files. None of these really belong in trunk, because they aren't general, but each new project needs these files. We tried putting templates in trunk but there was no clear best practice as to when templates should be merged up.
Similarly, the in-house code is always evolving as we add new capabilities. Many of these should be merged into trunk so they will be available for future applications. However, there are also quite a few case-specific hacks which the trunk doesn't need to see.
How should we organize this mess? Obviously, the simpler the better.

We really try for our projects to keep separate:
source files (managed in any VCS of your choice, like SVN)
configuration files (specific to a team or an environment)
Branches are for development effort and those "input files, run-time files, and post-processing files" will evolve at their own pace.
For that kind of file, what we manage in a VCS is:
templates
scripts able to take that template and generate the (private, as in not versioned) config file with the right values in it.
The values come from another referential, like a database, where the teams (or environment administrators) can update them at will, without any concern about checkout/check-in/merge.
That database can then be versioned in its own VCS if needed (see this SO question for instance, or, as an alternative, that one)
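The template-plus-script approach described above might look roughly like this in Python. This is only a sketch: the placeholder names (`solver`, `mesh_file`, `max_iterations`) are invented, and a plain dictionary stands in for the external database of values.

```python
from string import Template

# Versioned in the VCS: the template, with named placeholders.
CONFIG_TEMPLATE = Template(
    "solver = $solver\n"
    "mesh_file = $mesh_file\n"
    "max_iterations = $max_iterations\n"
)


def render_config(values: dict) -> str:
    """Fill the template with project-specific values.

    The result is the private, non-versioned config file content.
    substitute() raises KeyError if a placeholder has no value,
    so an incomplete set of values fails loudly.
    """
    return CONFIG_TEMPLATE.substitute(values)


# Not versioned with the code: values a team would keep in its own
# referential (here a dict standing in for the database).
project_values = {
    "solver": "implicit",
    "mesh_file": "wing.msh",
    "max_iterations": 200,
}
```

Calling `render_config(project_values)` yields the text to write to the local config file, so case-specific values never need to be merged into trunk.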

In engineering, version control is often underestimated, even though it is essential for restoring given settings in order to repeat experiments. For general adoption, easy-to-use, mostly GUI-oriented tools help a lot.
Combining version control with issue tracking that relates issues to code commits increases productivity tremendously.
Concerning repository structure, at least in Subversion, there are just conventions and no strict rules imposed by the tool. What about having a tree called 'trunk' where all 'common code' is managed?
For every engineering task a branch is created, which is nothing more than a 'project folder' under version control. Source code relevant to other projects is merged back to trunk.

Large development teams and SVN

A few questions regarding this topic:
1) What's the largest development team (doing actual commits, not counting read-only) you've had on a single SVN repository? Did you have any issues?
2) What's the largest size team you'd be comfortable with on a single SVN repository? Is a different version control tool better for very large teams? (Don't name IBM Rational, because it will get ignored and flamed, but others may be possible if a valid justification can be made. Solid Eclipse and Flex/Flash Builder IDE compatibility is a must.)
2a) Obviously this depends on the project, but are there any major shortcomings with reliance on splitting up 'large' dev teams into small, modular teams all of which utilize their own SVN repos?
3) Does it make sense for an organization to have two standard versioning tools, one for large systems (if needed) and one for small (~5 devs or less) systems?
For extra points:
4) What would you consider a "large" team (counting only developers since this is relating to SVN use, not QA, management, testers, etc)?

1/ Among our many repos, we have some that have been used by 50 to 100 developers for many years.
The issues are then:
bad naming conventions (for branches or files, with special characters used when they really shouldn't be)
polling performance issues (with FishEye for instance)
2/ A central VCS usually has no special limit in terms of repository size.
Large teams appreciate Perforce, which is very quick at checking out a workspace.
2a/ As you say, it depends on the project. For a truly monolithic project with many inter-dependent parts, the major shortcoming is the content synchronization you need to do between repos (you cannot update a module without impacting the others).
3/ Sure, that is what we have.
Usually, the one reserved for large projects is a non-freeware one (especially because managers need to know there is an actual VCS product support team they can rely on in case of major issues with the tool).
For smaller projects, an open-source VCS (freeware) is enough.
But SVN can still manage both project sizes while being "free" (you still have to pay for an administrator and for the infrastructure -- server, disk, backups, ... -- to run any tool, freeware or not).
4/ Any team larger than (on average) 15 people is likely to develop different parts of an application at different paces. That becomes modular development, and involves structuring its SVN repo carefully.

I've worked on an SVN repository that had well over a hundred active committers and a revision number of over 80,000, and that had been migrated from CVS 3 years before.
Generally, I'd say that SVN is not a likely bottleneck when it comes to large projects and large development teams. Sure, it may lack some features that could make some aspects easier, but that's completely insignificant compared to the organizational problems.

Strategies for Developing Multiple Products from One Codebase

I'm working on a project that will (soon) be branched into multiple different versions (Trial, Professional, Enterprise, etc).
I've been using Subversion since it was first released (and CVS before that), so I'm comfortable with the abstract notion of branches and tags. But in all my development experience, I've only ever really worked on trunk code. In a few rare cases, some other developer (who owned the repository) asked me to commit changes to a certain branch and I just did whatever he asked me to do. I consider "merging" a bizarre black art, and I've only ever attempted it under careful supervision.
But in this case, I'm responsible for the repository, and this kind of thing is totally new to me.
The vast majority of the code will be shared between all products, so I assume that code will always reside in trunk. I also assume I'll have a branch for each version, with tags for release builds of each product.
But beyond that, I don't know much, and I'm sure there are a thousand and one different ways to screw it up. If possible, I'd like to avoid screwing it up.
For example, let's say I want to develop a new feature, for the pro and enterprise versions, but I want to exclude that feature from the demo version. How would I accomplish that?
In my day-to-day development, I also assume I need to switch my development snapshot from branch to branch (or back to trunk) as I work. What's the best way to do that, in a way that minimizes confusion?
What other strategies, guidelines, and tips do you guys suggest?
UPDATE:
Well, all right then.
Looks like branching is not the right strategy at all. So I've changed the title of the question to remove the "branching" focus, and I'm broadening the question.
I suppose some of my other options are:
1) I could always distribute the full version of the software, with all features, and use the license to selectively enable and disable features based on authorization in the license. If I took this route, I can imagine a rat's nest of if/else blocks calling into a singleton "license manager" object of some sort. What's the best way of avoiding code-spaghettiism in a case like this?
2) I could use dependency injection. But generally, I hate it (since it moves logic from the source code into configuration files, which make the project more difficult to grok). And even then, I'm still distributing the full app and selecting features at runtime. If possible, I'd rather not distribute the enterprise version binaries to demo users.
3) If my platform supported conditional compilation, I could use #IFDEF blocks and build flags to selectively include features. That'd work well for big, chunky features like whole GUI panels. But what about smaller, cross-cutting concerns...like logging or statistical tracking, for example?
4) I'm using ANT to build. Is there something like build-time dependency injection for ANT?

A most interesting question. I like the idea of distributing everything and then using a license key to enable and disable certain features. You have a valid concern about it being a lot of work to go through the code and repeatedly check whether the user is licensed for a certain feature. It sounds a lot like you're working in Java, so what I would suggest is that you look into using an aspect weaver to insert the license-checking code at build time. There is still going to be one object into which all license-checking calls go, but that isn't as bad a practice if you're using an aspect; I would say that it is good practice.
For the most part you only need to read if something is licensed and you'll have a smallish number of components so the table could be kept in memory at all times and because it is just reads you shouldn't have too much trouble with threading.
As an alternative you could distribute a number of jars, one for each component which is licensed and only allow loading the classes which are licensed. You would have to tie into the class loader to achieve this.
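To illustrate the idea of keeping license checks out of the feature code itself, here is a small Python sketch. Note the swap: it uses a Python decorator as a stand-in for the Java aspect weaver suggested above, so the check is applied when the function is defined rather than woven in at build time. The feature names, `NotLicensedError`, and the decorated functions are all hypothetical.

```python
import functools

# Hypothetical in-memory license table. It is read-only after startup,
# which is why concurrent reads are not a concern.
LICENSED_FEATURES = frozenset({"reports"})


class NotLicensedError(Exception):
    """Raised when an unlicensed feature is invoked."""


def requires_license(feature: str):
    """Wrap a function so the license check runs before every call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if feature not in LICENSED_FEATURES:
                raise NotLicensedError(feature)
            return func(*args, **kwargs)
        return wrapper
    return decorate


@requires_license("reports")
def build_report(rows):
    return f"{len(rows)} rows"


@requires_license("clustering")
def start_cluster():
    return "started"
```

The feature functions contain no if/else license logic; the one place that knows about licensing is `requires_license`, which plays the role of the single license-checking object.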

Do you want to do this via Subversion? I would use Subversion to maintain different releases (a branch per release, e.g. v1.0, v2.0 etc.) but I would look at building the different editions (trial/pro etc.) from the same codebase.
That way you're simply enabling or disabling various features via a build and you're not having to worry about synchronising different branches. If you use Subversion to manage different releases and different versions, I can see an explosion of branches/tags in the near future.
For switching, you can simply maintain a checked-out codebase, and use svn switch to checkout differing versions. It's a lot less time-consuming than performing new checkouts for each switch.

You are right not to jump on the branching and merging cart so fast. It's a PITA.
The only reason I would want to branch a subversion repository is if I want to share my code with another developer. For example, if you work on a feature together and it is not done yet, you should use a branch to communicate. Otherwise, I would stay on trunk as much as possible.
I second the recommendation of Brian to differentiate the releases on build and not on the code base.
