Good github structure when dealing with many small projects that have a common code base? - version-control

I'm working for a web development company and we're thinking about using GitHub for version control. We work with several different .NET-based CMS-platforms and of course with a lot of different customers.
We have a standard code base for each CMS which we start from when building a new site. We of course would like to maintain that and make it possible to merge some changes from a developed site (when the standard code base has been improved in some way).
We often need to make small changes to a published site at a later date and would like to be able to do this with minimal effort (i.e. the customer gladly pays for us to fix his problem in 2 hours, but doesn't want to pay for a 2 hour set up first).
How should we set this up to be able to work in an efficient fashion? I'm not very used to distributed version control (I've worked with CVS, Subversion and Clear Case before), but from my imagination we could:
Set up one repository for each customer, start with a copy of the standard code base and go from there. Lots of repositories of course.
Set up one repository for each CMS and then branch off one branch for each customer. This is probably (?) the best way, but what happens when we have 100 customers (=branches) in the same repository? It also doesn't feel entirely nice that we create a lot of branches that we don't really have any intention of ever merging back to the main branch.
I don't know, perhaps lots of branches is only a problem in my imagination or perhaps there are better ways to do this that I haven't thought about. I would be interested in any experince in similar problems.
Thank you for your time and help.

With Git, several repos make sense for submodules purpose (sharing common component, see nature of Git submodules, in the third part of the answer)
But in your case, one repo with a branch per customer can work, provided you are using the branches to:
isolate some client-specific changes (and long-lived branch with no merging back to master are ok),
while rebasing those same branches on top of master (which contains the common code, with common evolutions needed by all the branches).

Related

Mantaining a CMS and websites with Mercurial

I'm pretty new to mercurial and after reading a few tutorials I'm still doubtful on what's the best way to do what I intend to do with it.
My goal would be to mantain a CMS that I'm developing (adding new features, fixing bugs, etc.) and being able to easily distributes those updates to websites I make with said CMS.
I started by making a repository for the CMS itself, then when I want to make a new website clone the CMS repository and work on it.
Now the questions: working on a website there are changes that will be specific for that and changes that I'd like to see also on the main CMS repository. How to distinguish them?
Should I create a new branch and commit all the website specific changes to that branch and the general changes to the default branch? Or shall I use tags?
What I'm looking for is an easy way to push back changes to the CMS repository, then continue to develop the CMS (in other websites for example) and eventually update all websites I made with the CMS with new features and bug fixes without too much hassle.
What's the best way to deal with the situation I described?
Thanks in advance.
Well, you really ask at least two questions (as I see)
How to maintain diverged lines of development?
How to easy distribute changes from one (?) DEVEL env to different PRODs env
Full final answer on second question will require to clarify many of the specific details, and I propose that we'll postpone it for a while.
On the first question: you are right, named branches (into which you periodically merge default branch with shared changes) for sites inside single repo may be good choice (not tags, which are only easy memorable labels for changesets).
Alternative solution may use mq on top of single default branch (with separate mq-queues for each site)

DVCS strategies for mixed-role small team?

I've done a lot of reading, and have been trialing GIT, GIT Tortoise, Tortoise SVN and PlasticSCM, to find the right source control for our small team (5-10 users).
Some background on our team: 6 copy writers/editors (2 remote), 2 developers, 2 graphic designers. We are not always working on projects together, sometimes up to 5 of us might be working on a given project. I'm unconcerned about the developers with DVCS, my concern is mainly around the other roles who are (in the nicest way) limited in their technical capability. Some of our copy writers update multiple source files (HTML, PDFs and adding concept graphics) to live, unversioned build directories (backed up as build.23.06.11.new.new.final.zip!). The copy and GD team will not have time, or to be brutally honest, the inclination to merge/resolve conflicts, or probably even remember to switch branches.
A few SO questions have shed light on what what seems to be a fairly consistent approach - main trunk (no junk in the trunk!) with teams having their own branches, and having release branches etc.
Every time I've re-read the links...
https://stackoverflow.com/questions/3854583/version-control-system-for-small-in-house-team
Getting started with Version Control
http://svn-ref.assembla.com/subversion-how-tos.html
...and google in general, I still end up asking myself the same questions:
Is it a Bad Idea to create role-specific branches for "trouble points" (copy team), where they can push to the repo, then our developers will merge their work into the actual project branch?
Should I still try to enforce a task-per-branch for everyone else?
Should I do task-per-branch for everyone but let the copy team create very broad tasks?
Is there usually a team/group/person who is considered an "admin" role for a repo who does crucial merges?
(is there an alternative suggested workflow where copy writers don't touch source?)
Unfortunately, the copy teams play a vital role in updating files which in turn affect layouts and all sorts of things, on a continual basis during dev. Its not like I can keep them in a bubble until the end of a project and chuck their work in.
... the good news is that hopefully, after a number of years, I'm ready to force everyone to move to version control! We've also settled on PlasticSCM for its intuitive GUI and Windows integration.
The best answer to this question would try to answer the 4 points above - tackle point 5 if you like - explain weak points if possible, and provide advice, gotchyas, etc.
cheers!
So basically you want to know how to get team-member of different skill-levels to use SCM and play nice with each other.
Buy-in from your team is priority #1. If you can't make them learn it, then you're left with providing a path of least resistance. So you really need to be flexible. There might be a wrong-way and a right-way to use the tool, but if the users won't accept the right-way, then the wrong-way is better than them not using it at all. How you achieve this balance is going to be different for every team.
Is it a Bad Idea to create role-specific branches for "trouble points" (copy team), where they can push to the repo, then our developers will merge their work into the actual project branch?
No, maybe its not optimal, but if this makes it easy for the Copy Team, then thats what you're left with. You could probably go even further and setup each user with their own branch. Then they never have to worry about merging other peoples changes.
Should I still try to enforce a task-per-branch for everyone else?
Each dev should have a unique "local" branch, that is not tracking an upstream branch. For example, use something generic like mydev. This makes it easy for them to switch between their local code and the current upstream branch.
You don't necessarily need to force everyone to create a local branch for every task, cause in the end, you're going to want them just to rebase their working branch onto the upstream one, and commit so it just becomes a fast-forward (i.e. linear commit).
Now for tasks that multiple devs are working on, or it is a feature that involves groups of smaller commits, then yes it does make sense to force them to create a new specific task branch. When they merge they can make sure to force a merge-commit, then it is clear that a set of commits are grouped together and all were part of a specific task. The merge commit will display like merged branch feature-X.
Should I do task-per-branch for everyone but let the copy team create very broad tasks?
It's really up to how much buy-in you can get from the Copy Team. I think if they really get confused with the DVCS tools, then you have to scale back until you can find something that does not cause too much of an impact.
One solution, is to have one of your devs help integrate the Copy Teams changes into another branch that everyone else will look at. That will help offload the learning-curve of the tool onto someone outside of the Copy Team.
Is there usually a team/group/person who is considered an "admin" role for a repo who does crucial merges?
Yes, this makes sense. However the great thing about SCM, is that everyone will be able to go back and do a code review on a merge. So if a merge breaks the code, you can either append the corrections after the merge, or remove the merge, and do it over.
(is there an alternative suggested workflow where copy writers don't touch source?)
Well, one possible technique is the Integration Manager model. The developers commit changes to their own share repos, but its up to the integration manger, to merge in the changes to the blessed repository.
I'm sure there are other methods that might work for your users, but this question is slightly ambiguous.

Best practice to maintain source code under version control with multiple companies?

I'm wondering if there is any best practice for maintaining your source code under version control among different companies. In Open Source there is a maintainer, who receives patches, decides on them and applies them. But what about closed sourced projects where different companies get different workloads and just commit them to the trunk and branches? Is this maintainer concept applicable to a project on which multiple companies work on?
You can choose from a wide range of version control systems. (Not only subversion)
With the "versioning" concept you are safe that no one damages the project permanently.
So there is no need for a manual approval process, especially when there are contracts for example between the participating companies.
I'd also set up a commit mailinglist so you have some kind of peer review of changes. So no changes can be done without anyone noticing them.
If applicable set up some kind of continous integration environment to keep the quality up.
I don't understand the question about the branches. The decision whether to use them or not is IMHO not depending on the fact that the commiters are employed in the same company or not.
Its really up to you to decide which workflow works best for the companies involved. Subversion has the ability to add permissions to your trunk and branches allowing you to lock down certain parts of your repository to people who are "trusted" with merge access to trunk. You'll need good communication amongst the companies. Using the open source Trac provides a wiki, integrated RSS feeds of the commits to the project and code browser.
Usually, each site works on its dedicated branch and can import the other remote site branch, to decide what to integrate in its own work.
But if a site need to work directly on the other site branch, one possible practice is the concept of branch membership which allows only one site at a time to work on a given branch.
(not sure it is possible with SVN though)
That allows for two remote site (with a large time shift) to work on the same task in a tightly integrated manner.
My recommendation : subversion, with that configured you give away a url and then checkout, update, get things done and when you guess that the project is ready, snapshot and deliver.

Parallel Dev: Should developers work within the same branch?

Should multiple developers work within the same branch, and update - modify - commit ? Or should each developer have his/her own each branch exclusively? And how would sharing branches impact an environment where you are doing routine maintenance as opposed to unmaintained code streams? Also, how would this work if you deploy each developers work as soon as it is done and passes testing (rapidly, as opposed to putting all of their work into a single release).
In general, I have found that having developers (who are working on the same project) use the same branch is better for finding integration problems sooner. If developers are each using an individual branch, then you're just delaying possible integration problems until later, when you merge the branches.
Of course, having developers work on the same branch means you need to have actual communication between those developers, but that's a social problem and not a technical one.
Developers would work on separate branches when there is a good reason for that branch to exist in the first place (such as a patch release of a previous version of the software, or a special build for a specific customer).
Note that tools such as Git and Mercurial allow developers to easily create their own private branches to organise their own work. This is a different situation from more than one developer sharing a branch, and (usually short-lived) private branches should be encouraged.
Branches are meant as a way to version control any feature or experimental piece of code that may break the mainline/trunk.
While it is common for developers to have their own personal branches for deep experimentation, often branches center around a new feature being added. These new features often require more than one person to be committing.
For example, on a web project, two developers and a designer may be doing a facelift to their company website. They still need to keep their mainline/trunk code clean in case they need to make a quick change to it before the facelift is complete. So they create a "facelift" branch and work on that instead. While the developers are committing javascript, the designer can be committing CSS and images. Once the facelift feature is complete, they can merge it into the mainline and send it live.
The only reason any of them would need personal branches would be for experimenting. Perhaps the designer is trying to implement "sliding door" tabs and can't get the padding right in IE6, for example. If he solves the problem, he can merge it into the facelift branch, if he can't, he simply ignores it and continues with the rest of the design back in the facelift branch.
To some extent, the version control software you are using will nudge you into a particular approach. GIT is geared toward open-source contributors and resembles the "one developer" model (branching isn't even a concept in GIT. GIT is more about managing changes). Clearcase is more corporate, so you do have multiple developers on a branch, but each developer gets to play in his or her own view.
I agree with Greg's answer, this is more a social planning issue. Lots of devs on one branch will step on each other's toes. I've been on a project where there were more developers than individual source files :)
I think that merging of branches can be problematic (dropped or inconsistent functionality), regardless of how good the source control tools are. I would more readily opt for multiple developers working on a single main branch. There could be other branches for things like production bug fixes or proof-of-concepts (POCs), where merging could/should happen very soon after change (bug fixes) or good chance that merging may not need to happen (POCs).

Version control best practices

I just made the move to version control the other day, and after a bad experience with Subversion, I switched to Mercurial, and so far am happy with it.
Although I understand and appreciate the idea of version control, I don't really have any practical experience with it.
Right now, I am using it for a couple websites I am working on, and a couple questions have come to mind:
When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
Should I branch? What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?
Any other advice for a version control newbie?
So far, everyone has given me good advice, but very team-oriented. I would like to clarify:
At the moment, I am just using VC on some websites I do on the side. Not quite full-out freelance work, but for the purposes of VC, I am the only one that really touches the website code.
Also, since I am using PHP on the sites, there is no compiling to be done.
Does this change your answers significantly?
Most of the questions you're asking about depends mostly on who you are working with. If you're a lone developer it shouldn't matter a lot, since you can do whatever you'd like. But if you're in a team where you have to share your code then you should discuss with your team members what the code of conduct should be since sharing changes between one another can become tricky at times.
The discussion regarding code of conduct doesn't need to be lengthy, it can be very brief; as long everyone is on the same page on how to use the repository that is shared between the programmers in the team. If you want to use the more advanced features in Mercurial, such as cherry picking or patch queues, then try using them so that it won't impact your team members in a negative way, such as rebasing on a public repository.
Remember version control has to be easy to use for everyone in the team, or else it won't be used.
When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
While working with a team there are several approaches, but the common rule is to commit early and often. The main reason on why you should commit often is to make merge conflicts easier to handle.
A merge conflict is simply put whenever merging a file that has been changed by at least two people doesn't work because they've been editing on the same lines. If you're holding on to a commit that involves a very large change with several lines of changes across several files, it will become very difficult to manage for the receiver to manage the conflicts that may occur. The merge conflict becomes even more difficult to handle if the said set of changes are held on for too long.
There are some exceptions to the rule of committing often and one is whenever you have a breaking change. although if you have the ability to commit locally (which you are doing in Mercurial and git inherently) you could commit breaking changes. As long as you fix whatever broke, you should push it upstream to the shared repository when you've fixed your own breaking change.
Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
Should I branch?
There are many branching strategies to choose from (there is the Streamed Lines paper from 1998 that has an exhaustive pattern list of branching strategies) and when you're making them for yourself it should be open game for yourself. However when working in teams, you'd better discuss openly with the team if you need to branch or not. Whenever you have the urge to branch though you should ask yourself the following questions:
Will my future changes be breaking the work of others?
Will my team have a direct negative impact from the changes I'll be doing until I'm done?
Is my code throwaway code?
If the answer is yes in any of the questions above you should probably branch publically, or keep it for yourself (since you can do that in Mercurial in several ways). You should first discuss with your team on how to execute the whole endavour to see if there is any other way of doing it and if you're going to merge your changes back in, sometimes there are factors at play where there is no need to branch (this is mostly related to how modular the code is).
When you decide to branch be prepared to handle a merge conflict. It is sane to assume the one who created the branch and made the commits to be able to merge it back into the "main branch". At these times it would be great if everyone in the team made relevant commit comments.
As a side note: You do write good commit comments, right? RIGHT!? A good commit comment usually tells why that particular change was made or what feature the committer was working on instead of a nondescript "I made a commit" kind of comment. This makes it easier for the one who is handling the big merge conflict to figure out what line changes can be overwritten and which ones to keep while going through the revision history.
Compile times, or build times rather, sometimes play into the branch discussion you may have. If your project has a slow build time then it might be a good idea to use a staging strategy in your branches. This strategy takes into account that all developers should integrate to a "main line" and changes that are approved are elevated (or "promoted") to the next stage, such as testing or release lines. It is classically illustrated with tag names for open source software like this:
main -o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-> ...
\ \ \
test o-----------o--------------o---------> ...
1.0 RC1 \ 1.0 RC2 2.0 RC1
release o----------------------> ...
1.0
The point with this is that testers can work without being interrupted by the programmers and that there is a known baseline for those who are in release management. In distributed version control, the different lines could be cloned repositories and it may look a bit different since repositories share the versioning graph. The principle however is the same.
Regarding web development, there are virtually no build times. But branching in stages (or by tagging your release revisions) it becomes easier to roll-back if you want to check a difficult-to-track-down bug.
However, a whole other thing comes into play and that is the time it takes to deploy the site. Version control tools in my experience are really bad at asset management. Handling art assets that are in total up to several GB usually is a huge pain in the butt to handle in Subversion (more so in Mercurial). Assets may require you to handle them in another way that is less time consuming, such as putting them in a shared space that are synched and backed up in a traditional manner (art assets are usually not worked on concurrently as with source code files).
What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?
The concepts of branching and keeping remote repositories are closer now than with centralized version control tools. You could almost consider them being the same thing. In Mercurial (and in git) you can "branch" either by:
Cloning a repository
Creating a named branch
Creating a named branch means that you're making a new path in the versioning graph for the repository you're creating it on. Creating a cloned repository means you're copying the source repository into a new location, and making a new path in the cloned repository's versioning graph. They are both two different implementations of branching as a general concept in version control.
In practice, the only difference between both methods that you should care about is in usage. You clone a repository to have a copy of the source code and have a place to store your own changes in and you create named branches whenever you want to do small experiments for yourself.
Since browsing through branches is a bit quirky for those who accustomed to a straight line of commits, advanced users know how to manipulate their versions so the version history is "clean" with e.g. cherry picking or rebase. At the moment git docs actually explain rebase rather well.
These are the practices that I follow
Each commit should make sense: one bug fix (or a set of bugs related to each other), one (small) new feature, etc. The idea is that if you need to rollback, your rollbacks fall on well defined "boundaries"
Every commit should have a good message explaining what you are committing. Really get into this habit, you will thank yourself later. Doesn't have to be verbose, a few sentences can do. If you are using a bug tracking system, associating a bug number with your commit is also extremely helpful
Now that I use git and branching is so incredibly fast and cheap, I tend to make a new branch for each new feature I'm about to implement. I'd never even consider doing this for many other VCSes. So branching depends on the system you are using, your codebase, your team, etc, there are no hard rules there.
I prefer to always use the command line and get to know my VCS's commands directly. The disconnect that a GUI based frontend can cause can be a pain, and even damaging. Controlling your source code is very important, it's worth getting in there and doing it directly. But that's just my preference.
Back up your VCS. I back up my local repository with Time Machine, and then I push out to a remote repository on my server, and that server is backed up as well. VCS alone is not really a "backup", it can go down too just like anything else.
When/how often should I commit?
You'll probably get lots of contradictory answers on this one. My view is that you should commit changes when they are working, and each commit (or checkin) should contain exactly one "edit". An "edit" is an atomic set of changes that go together to fix a bug or implement a new feature.
There is a theory that you should check in code every few hours even if it's not working, but in that case you will need to be working on your own branch - you don't want to be checking in broken code to your main line, or onto a shared branch.
The advantage of checking in every night is that you have backups (assuming that your repository is on a different machine).
As for branching:
you should have main line that contains always working code.
you should have a current development branch that contains the latest code. When you are happy with this (and it's passed all it's tests) you should merge it back into the main line.
you might want a branch that contains the last released version. This can be used for testing/debugging bugs and releasing patches (in extremis).
update before each commit
provide commit comments
commit as soon as you have something finished
don't commit anything that makes the code in the repository not compiling or buggy
update every morning
sometimes verbally communicate with colleages if there is something important to update
commit code relevant to exactly one thing (i.e. fixing a bug, implementing a feature)
don't worry to make very small commits, as long as they conform to the previous rule
Btw, what's the bad experience with Subversion?
I commit when I am finished a piece of work and only if it is working. It's bad practise to commit to somewhere where other people use the code.
Branching is something that people will argue about. Some people say never branch and just have switches to get something working or not. Do what you feel more comfortable but don't branch just because you can. I use branching and Branch when i am working on a major bit of work where if I commit broken code by accident its not going to affect everyone else.
Q: When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches it's next stable iteration? After any bugfixes?
A: Whenever you are feeling comfortable, I am commiting as soon as a unit of work is finished and working (which does not mean that the complete task has to be finished). But you should not commit something that does not compile (might inhibit other people in the team,if any). Also, you should not commit incomplete stuff to the trunk if there is any possibility that you have to implement a quick fix or small change before completing it.
Q: Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
A: Only if there is a possibility that you have to implement a quick fix or small change before completing your task.
The nice thing about branching is that all commits you are doing in the branch will still be available for future reference (if necessary). Also it is much simpler and faster than cloning the repo, I think ;-)
I agree with others on commit times.
Regarding branching, I generally branch only when working on something that breaks what others are doing or when a patch needs to be rolled to production in a file that already has changes that should not go to production. If you're only one developer, then the first scenario doesn't really apply.
I use tags to manage releases - the "production" tag is always associated with the current prod code, and each release is tagged with "release-YYYYMMDD". This allows you to roll back if necessary.