How to manage source control changesets with multiple overlapping changes and daily rebuilds? - version-control

I am working at a company which uses CVS for version control.
We develop and maintain an online system involving around a hundred executables - which share large amounts of code, and a large number of supporting files.
We have a head branch, and live branches taken from the head branch. The live branches represent major releases which are released about every three months.
In addition there are numerous daily bug fixes which must be applied both to the live branch - so they can be taken to the live environment immediately - and merged back to the head branch, so they will be in the next major release.
Our most obvious difficulty is with the daily fixes. As we have many daily modifications there are always multiple changes on the testing environment. Often when the executables are rebuilt for one task, untested changes to shared code get included in the build and taken to the live environment.
It seems to me we need some tools to better manage changesets.
I'm not the person who does the builds, so I am hoping to find a straightforward process for managing this, as it will make it easier for me to get the build manager interested in adopting it.

I think what you need is a change in repository layout. If I understand correctly, your repository looks like this:
Mainline
|
-- Live branch January (v 1.0)
|
-- Live branch April (v 2.0)
|
-- Live branch July (v 3.0)
So each of the branches contains all your sites (hundreds) as well as folders for shared code.
There is no scientific way to tell exactly the chance of an error appearing after a release, but let's have a look at the two most important factors:
Number of code lines committed per time unit. You cannot / will not want to change this globally, as it is the developers' productivity output.
Test coverage, i.e. how often code gets executed BEFORE going live and how much of your codebase is involved. This could easily be changed by giving people more time to test before a release or by implementing automated tests. It's a resources issue.
If your company neither wants to spend money on extra testing nor decrease release frequency (not necessarily productivity!), you will indeed have to find a way to release fewer changes, effectively decreasing the number of changed lines of code per release.
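As a back-of-envelope illustration of how these two factors interact (the linear model and every number here are made up for illustration, not taken from the question):

```python
def expected_live_defects(changed_loc, defects_per_kloc, test_coverage):
    """Toy model: defects reaching production scale with the amount of
    changed code and with the fraction NOT caught by pre-release testing."""
    introduced = changed_loc / 1000.0 * defects_per_kloc
    return introduced * (1.0 - test_coverage)

# One big release of 2000 changed lines vs. one of four smaller
# 500-line releases, at 5 defects/KLOC and 60% test coverage:
big_release = expected_live_defects(2000, 5, 0.6)    # 4.0 expected defects
small_release = expected_live_defects(500, 5, 0.6)   # 1.0 expected defects
```

In this toy model each smaller release puts only a quarter of the risk live at once, which is exactly the "fewer changed lines per release" effect described below.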
As a result of this insight, having all developers committing into the same branch and going live from there multiple times a day doesn't sound like a good idea, does it?
You want increased Isolation.
Isolation in most version control systems is implemented by
Incrementing revision numbers per atomic commit
Branching
You could try to implement a solution that packs changes from multiple revisions into release packages, a bit like the version control system Perforce does. I wouldn't do that, though, as branching is almost always easier. Keep the KISS principle in mind.
So how could branching help?
You could try to isolate changes that have to go live today from changes that might have to go live tomorrow or next week.
Iteration Branches
Mainline
|
-- Live branch July (v 3.0)
|
-- Monday (may result in releases 3.0.1 - 3.0.5)
|
-- Tuesday (may result in releases 3.0.6 - 3.0.8)
|
-- Wednesday (may result in releases 3.0.9 - 3.0.14)
People need to put more thought into "targeting" their changes at the right release, but it could lead to not-so-urgent changes (especially on shared/library code) staying OUTSIDE of a release for longer and inside the live branch, where by chance or by systematic testing they could be discovered before going live (see the test-coverage factor).
Additional merging down is required, of course, and sometimes cherry-picking of changes out of the live branch into the daily branch.
Now please don't take me too literally with the daily branches. In my company we have two-week iterations with a release branch for each iteration, and it is already enough overhead to maintain that branch.
Instead of isolating by day you could try to isolate by product/site.
Project Branches
Mainline
|
-- Live branch July (v 3.0)
|
-- Mysite-A (released when something in A changed and only released to the destination of A)
|
-- Mysite-B
|
-- Mysite-C
In this scenario the code of the single site AND all needed shared code and libraries would reside in such a site branch.
If shared code has to be altered for something to work within site A, you only change the shared code in site A. You also merge the change down so anyone can catch up on your changes. Catch-up cycles may be a lot longer than releases, so the code has time to "ripen".
In your deploy/build process you have to make sure that the shared code of site A does NOT overwrite the shared code site B uses, of course. You are effectively "forking" your shared code, with all the implications (incompatibility, overhead for integrating team changes).
Once in a while there should be forced merges down to the live branch (you might want to rename that then, too) to integrate all changes that have been made to shared code. Your 3-month iteration will force you to do that anyway, I guess, but you might find out that 3 months is too long for hassle-free integration.
The third approach is the most extreme.
Project & Iteration Branches
Mainline
|
-- Live branch July (v 3.0)
|
-- Mysite-A
|
-- Week 1
|
-- Week 2
|
-- Week 3
|
-- Mysite-B
|
-- Week 1
|
-- Week 2
|
-- Week 3
|
-- Mysite-C
|
-- Week 1
|
-- Week 2
|
-- Week 3
This certainly brings a huge amount of overhead and potential headache if you are not paying attention. On the good side, you can very accurately deploy only the changes that are needed NOW for THIS project/site.
I hope this all gives you some ideas.
Applied source control is a lot about risk control for increased product quality.
While the decision about what level of quality your company wants to deliver might not be in your hands, knowing it will help you decide what changes to suggest. It might turn out your customers are adequately content with your quality and further efforts to increase it do not pay off.
Good luck.
Christoph

Related

UCM ClearCase: How do I force an activity merge?

Often when I'm delivering activities for a build, I get an issue where one or two activities have dependencies on other activities that are not yet ready to be deployed.
What I want to do in most of these situations is force a merge between the two changes and deploy the stream, so that any changes in development that are lost during merge can be recovered.
What happens instead is that ClearCase forces me to move these changes to a new activity and include the activity if I want to make the delivery at all.
I have heard that I can render a branch obsolete - which would be satisfactory in some cases, but occasionally there are changes I might want to include in the deployment - is there any way for me to force a merge between two changes before making a deployment?
Sometimes, UCM doesn't let you make a deliver because of "linked" activities; that is, a previous deliver has created a timeline which made those activities linked (meaning you can no longer deliver one without the other).
In those instances, you still can merge any activity you want in a non-UCM fashion with cleartool findmerge: see "How to merge changes from a specific UCM activity from one ClearCase stream to another" for a complete example.
Then later, you will make your deliver (with all activities from the source stream).
Adding on to @VonC's answer...
There are a couple of ways that you can wind up with activities linked together:
Version dependency: Activity A has versions 1, 2, & 3 of foo.c; Activity B has version 4 of foo.c. You can sometimes also have "1 & 3" and "2 & 4".
Baseline Dependency: Activities A and B were delivered in the same deliver operation from Stream X to sibling stream Y. From that time forward, A&B have to be delivered together since they are in the same "deliverbl" baseline.
Number 1 can be changed by rearranging the changesets using
cleartool chact -fcsets {Activity X} -tcsets {activity Z} foo.c
Number 2 is pretty much set in stone...

Mercurial workflow with stable and default branches

We are trying to migrate from Subversion to Mercurial but we are encountering some problems. First a bit of background:
Desired workflow:
We would like to have just two named branches, stable and default, within one repository.
Development takes place on default branch.
Bug fixes are committed to stable branch and merged to default.
After every Sprint we tag our default branch.
Eventually we can release a new version, for which we bring some code (possibly the latest Sprint tag) from default over to stable (update stable, merge Sprint_xyz), tag the branch (tag Release_xyz) and release.
We also want the following jobs on our Jenkins build server for CI:
End-of-Sprint job: This job should tag default with something like Sprint_xyz
Release job: This job should bring the latest "Sprint" tag changes over to the stable branch, then tag stable with something like Release_6.0.0 and build a release.
Some more background:
Mercurial is new to us, but from what we have seen, this seems like a sane approach. We chose tags to mark releases over named branches and cloned branches, trying to make the development workflow as straightforward as possible (single merge step, single checkout, only a couple of branches to keep track of...).
We use Scrum and potentially (but not necessarily) release a version after each sprint, which may (or may not) become part of the stable branch and turn into a "shippable" release.
The problem we are encountering (and which is making us wonder if we are approaching this the right way...) is the following:
We work on the default branch ('d' on the poor man's graphs that follow):
d -o-o-o-o-
We finish a sprint and trigger an End-of-Sprint job (using Jenkins) which tags default with "Sprint 1":
d -o-o-o-o-o-
|
Sprint 1
To release Sprint 1 we update to stable branch ('s') and merge changes from the Sprint 1 tag revision and commit:
Sprint 1
|
d -o-o-o-o-o-
\
s -o-o-o-o-o-o-
Tag stable and commit:
Sprint 1
|
d -o-o-o-o-o-
\
s -o-o-o-o-o-o-o-
|
Release 1
Update to default and merge stable since default should stay a superset of stable, commit and push:
Sprint 1
|
d -o-o-o-o-o-o-o-o-o-
\ /
s -o-o-o-o-o-o-o-
|
Release 1
The problem is that when merging .hgtags from 's' to 'd', Mercurial encounters a conflict which keeps the release job from completing. The resulting .hgtags should contain information from both of the involved tags.
We have searched for a solution to this, and could probably automate this type of merge conflict away with some hooks and scripts, but it looks like an unnecessary and error-prone hack to support a workflow that otherwise seems nothing out of the ordinary.
Is there something inherently wrong with our approach that causes us to encounter these problems?
If not, what is the best way to solve these issues without having to rely on a scripts/hooks approach?
Is there a better approach that would support our workflow?
I would go for the special case hooks. The problem you're seeing is related to the Mercurial philosophy of versioning metadata in the same way as normal repository data. This is simple and effective, and leads to a system that's overall easier to understand. But in this case it also leads to your merge conflict.
The reason it leads to a merge conflict is relatively simple. The .hgtags file is just a text file with a bunch of lines in it. Each line contains a hash and the associated tag. In one branch you've added the Sprint 1 tag. In another branch you've added the Release 1 tag. These show up as one line being added to the end of the file in one branch, and a different line being added to the end of the file in another branch.
Then you merge the two branches. Suddenly Mercurial is faced with a decision. Which line should it take? Should it take both of them? If it were source code, there would really be no way to tell without human intervention.
But it isn't source code. It's a bunch of tags. The rule should be 'if the two lines being added refer to different tags, just take both of them'. But it isn't because Mercurial is treating it like a bog-standard text file that could be important source code.
Really, the .hgtags file should be handled in a fairly special way for merges. And it might actually be good to add code that handles it that way into mainline Mercurial to support your use-case.
IMHO Mercurial should be modified so that the .hgtags file would only give you a conflict warning if you have two different hashes for the same tag. The other weird case would be if you have a tag with a hash that isn't an ancestor of the change in which the tag appears. That case should be called out somehow when doing a merge, but it isn't really a conflict.
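The special-case handling described above could be sketched as a custom merge tool for .hgtags: a union merge that treats only two different hashes for the same tag as a real conflict. This is an illustrative sketch, not Mercurial's actual behavior (the function name and the simplified one-entry-per-tag handling are mine; real .hgtags files can also contain null-hash removal entries):

```python
def merge_hgtags(local: str, other: str, base: str) -> str:
    """Union-merge two .hgtags revisions: keep tag lines from both
    sides, and fail only if the same tag maps to different hashes."""
    def parse(text):
        tags = {}
        for line in text.splitlines():
            if line.strip():
                node, tag = line.split(" ", 1)
                tags[tag] = node  # later lines override earlier ones
        return tags

    merged = parse(base)
    local_tags, other_tags = parse(local), parse(other)
    for tag in set(local_tags) | set(other_tags):
        a, b = local_tags.get(tag), other_tags.get(tag)
        if a and b and a != b:
            raise ValueError("real conflict: tag %r has two hashes" % tag)
        merged[tag] = a or b
    return "".join("%s %s\n" % (node, tag)
                   for tag, node in sorted(merged.items()))
```

Hooked up as a merge tool scoped to the .hgtags path, something along these lines would let the Sprint_1/Release_1 merge above complete unattended while still flagging genuinely contradictory tags.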
I suspect you're merging the tagged changeset from default to stable. If you merge the tagging changeset instead, you shouldn't get the merge conflict when you merge the second (probably also tagging!) changeset back to default.

Branch per promotion tradeoffs in TFS

Suppose we have a big TFS 2010 project with three branches: MAIN, TST and PRD.
The strategy is: whenever a Sprint finishes MAIN is copied/merged into TST. Whenever TST is deemed stable it is copied/merged into PRD. Whenever TST or PRD have fixes, they're merged back to MAIN, or MAIN and TST. (Don't ask me why, I can't control this and I don't particularly like it.)
At each promotion step, as I understand, one can either:
delete the target branch and branch again - this entails losing immediate access to that branch's history (it can always be recovered, right?);
merge and resolve with acceptTheirs - this entails losing changes that may not have been merged back from target to origin.
For the merge-backs, it is important to have ancestry information. With 1. I would expect ancestry information to be kept. With 2. I am unsure.
So, two questions:
Are those two the possible/desirable ways to go about promoting software between branches?
In which cases is ancestry information not kept?
Extra points for any additional tradeoffs that might be relevant for big-size repositories.
1. Are those two the possible/desirable ways to go about promoting software between branches?
If MAIN has a child branch TST, which has a child branch PRD then without resorting to baseless merges, these are the only merges possible to promote changes between branches.
Whether this is a desirable branching strategy depends on many factors, like how many parallel releases are put out and team sizes. A good reference guide on this is the branching guidance of the TFS Rangers: http://vsarbranchingguide.codeplex.com/ The version you seem to be using is a variation of the basic dual-branch plan (what you call MAIN they call dev, and your production branches aren't uniquely labeled). This branching strategy works best if only one version is in production and releases should always contain everything made.
2. In which cases is ancestry information not kept?
If files are copied or branches are destroyed. However, if you need to delete and/or recreate branches all the time, and/or need to use acceptTheirs continuously, then it's often an indication of an inadequate branching strategy, inadequate TFS training, or issues with the testing and patching strategy (bugs are found and fixed in production and development at the same time, resulting in merge conflicts).

Performing Historical Builds with Mercurial

Background
We use a central repository model to coordinate code submissions between all the developers on my team. Our automated nightly build system has a code submission cut-off of 3AM each morning, when it pulls the latest code from the central repo to its own local repository.
Some weeks ago, a build was performed that included Revision 1 of the repo. At that time, the build system did not in any way track the revision of the repository that was used to perform the build (it does now, thankfully).
-+------- Build Cut-Off Time
|
|
O Revision 1
An hour before the build cut-off time, a developer branched off the repository and committed a new revision in their own local copy. They did NOT push it back to the central repo before the cut-off and so it was not included in the build. This would be Revision 2 in the graph below.
-+------- Build Cut-Off Time
|
| O Revision 2
| |
| |
|/
|
O Revision 1
An hour after the build, the developer pushed their changes back to the central repo.
O Revision 3
|\
| |
-+-+----- Build Cut-Off Time
| |
| O Revision 2
| |
| |
|/
|
O Revision 1
So, Revision 1 made it into the build, while the changes in Revision 2 would've been included in the following morning's build (as part of Revision 3). So far, so good.
Problem
Now, today, I want to reconstruct the original build. The seemingly obvious steps to do this would be to
determine the revision that was in the original build,
update to that revision, and
perform the build.
The problem comes with Step 1. In the absence of a separately recorded repository revision, how can I definitively determine what revision of the repo was used in the original build? All revisions are on the same named branch and no tags are used.
The log command
hg log --date "<cutoff_of_original_build" --limit 1
gives Revision 2 - not Revision 1, which was in the original build!
Now, I understand why it does this - Revision 2 is now the revision closest to the build cut-off time - but it doesn't change the fact that I've failed to identify the correct revision on which to rebuild.
Thus, if I can't use the --date option of the log command to find the correct historical version, what other means are available to determine the correct one?
Considering that whatever history might have been in the undo files is gone by now (the only thing I can think of that could give an indication), I think the only way to narrow it down to a specific revision will be a brute-force approach.
If the range of possible revisions is fairly large, and the build product changes in size or in some other non-date aspect that is linear or near enough to linear, you may be able to use the bisect command to do a binary search to narrow down what revision you're looking for (or maybe just get close to it). At each revision that bisect stops to test, you would build at that revision and compare whatever aspect you're using against what the scheduled build produced that night. It might not even require building, depending on the test.
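The bisect idea boils down to an ordinary binary search over the candidate revisions. A sketch of that search, under the stated assumptions (the revision list, the measure function, and the monotonic-size property are all hypothetical; in practice measure would update to the revision and build):

```python
def find_built_revision(revisions, measure, target):
    """Binary-search candidate revisions (oldest first) for the one whose
    measured build property (e.g. binary size) matches the original
    build's. Assumes the property never decreases from one revision to
    the next, which is what makes bisection valid."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(revisions[mid]) < target:
            lo = mid + 1   # built product too small: the build was later
        else:
            hi = mid       # matches or overshoots: here or earlier
    return revisions[lo] if measure(revisions[lo]) == target else None
```

With N candidate revisions this needs roughly log2(N) builds instead of N, which is the whole appeal of bisection here.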
If it really is as simple as the graph you depict and the range of possibilities is short, you could just start from the latest revision it might be and walk backwards a few revisions, testing against the original build.
As for a definitive test comparing the two builds, hashing the test build and comparing it to a hash of the original build might work. If a compile on the nightly build machine and a compile on your machine of the same revision do not produce binary-identical builds, you may have to use binary diffing (such as with xdelta or bsdiff) and look for the smallest diff.
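The hash comparison itself is simple; a minimal sketch (the file paths are hypothetical):

```python
import hashlib

def file_digest(path, algo="sha256"):
    """Hash a build artifact in chunks so large binaries need not
    fit in memory at once."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_build(candidate_path, original_path):
    """True if the rebuilt artifact is byte-identical to the original."""
    return file_digest(candidate_path) == file_digest(original_path)
```

If the toolchain embeds timestamps, so that identical sources never hash equal, this degrades into the binary-diffing comparison mentioned above.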
Mercurial does not have the information you want:
Mercurial does not, out of the box, make it its business to log and track every action performed regarding a repository, such as push, pull, update. If it did, it would be producing a lot of logging information. It does make available hooks that can be used to do that if one so desires.
It also does not care what you do with the contents of the working directory, such as opening files or compiling, so of course it is not going to track that at all. It's simply not what Mercurial does.
It was a mistake to not know exactly what the scheduled build was building. You agree implicitly because you now log that very information. The lack of that information before has simply come back to bite you, and there is no easy way out of it. Mercurial does not have the information you need. If the central repo is just a shared directory rather than a web-hosted repository that might have tracked activity, the only information about what was built is in the compiled version. Whether it is some metadata declared in the source that becomes part of the build, a naive aspect like filesize, or you truly are stuck hashing files, you can't get your answer without some effort.
Maybe you don't need to test every revision; there may be revisions you can be certain are not candidates. Knowing the time of the compile merely gives you the upper bound on the range of revisions to test. You know that revisions after that time could not possibly be candidates. What you don't know is what had been pushed to the server at the time the build server pulled from it, but you do know that revisions from that day are the most likely. You also know that revisions in parallel unnamed branches are less likely candidates than linear revisions and merges. If there are a lot of parallel unnamed branches and you know all your developers merge in a particular way, you might know whether the revisions under parent1 or parent2 should be tested first.
Maybe you don't even need to compile if there is metadata you can parse from the source code to compare with what you know about the specific build.
And you can automate your search. It would be easiest to do so with a linear search: fewer heuristics to design.
The bottom line is simply that Mercurial does not have a magic button to help in this case.
Apologies, it's probably bad form to answer one's own question, but there wasn't enough room to properly respond in a comment box.
To Joel, a couple of things:
First - and I mean this sincerely - thanks for your response. You provided an option that was considered, but which was ultimately rejected because it would be too complex to apply to my build environment.
Second, you got a little preachy there. In the question, it was understood that because a separately recorded repository revision was absent, there would be 'some effort' to figure out the correct revision. In a response to Lance's comment (above), I agree that recording the 40-byte repository hash is the 'correct' way of archiving the necessary build info. However, this question was about what CAN be done IF you do not have that information.
To be clear, I posted my question on StackOverflow for two reasons:
I figured that others must have run into this situation before and that, perhaps, someone may have determined a means to get at the requisite information. So, it was worth a shot.
Information sharing. If others run into this problem in the future, they will have an online reference that clearly explained the problem and discussed viable options for remediation.
Solution
In the end, perhaps my greatest thanks should go to Chris Morgan, who got me thinking to use the central server's mercurial-server logs. Using those logs, and some scripting, I was able to definitively determine the set of revisions that were pushed to the central repository at the time of the build. So, my thanks to Chris and to everyone else who responded.
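For anyone hitting this later: the scripting in question amounts to filtering the server's push log by timestamp. A sketch of the idea — the log line format below is invented for illustration; mercurial-server's real log format will differ:

```python
from datetime import datetime

def revisions_pushed_before(log_lines, cutoff):
    """Collect changeset IDs from push-log lines stamped before the
    nightly build's pull. Expects lines shaped like:
    '2011-09-14 02:10:33 push user=alice node=abc123'."""
    revs = []
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4 or parts[2] != "push":
            continue
        stamp = datetime.strptime(parts[0] + " " + parts[1],
                                  "%Y-%m-%d %H:%M:%S")
        if stamp < cutoff:
            fields = dict(p.split("=", 1) for p in parts[3:])
            if "node" in fields:
                revs.append(fields["node"])
    return revs
```

The last changeset pushed before the cutoff is then the best candidate for what the nightly build actually pulled.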
As Joel said, it is not possible. However there are certain solutions that can help you:
maintain a database of nightly build revisions (date + changeset id)
build server can automatically tag revision it is based on (nightly/)
switch to Bazaar, which manages version numbers differently (branched versions are in the form REVISION_FORKED.BRANCH_NUMBER.BRANCH_REVISION, so your change number 2 would be 1.1.1)

Branching Patterns / Policies

I'm currently revamping the way our shop does version control. We will be using a server-centric version control solution.
I would like to see how other people are branching, and doing it successfully, without all of the headaches I read about.
What are some field-tested branching patterns you have seen work well in a production environment, e.g. branching per release? What policies have been put in place to ensure that branching goes smoothly?
Thanks
It depends on what kind of software you are developing.
For us, we are a web shop, so we do not have any numbered 'releases'. We keep trunk as what is 'production' worthy and only directly commit small changes.
When we have a large project we create a branch and work it up to production ready, all the while syncing trunk changes into it.
If the project involves a large restructuring of the code base we will generally create a tag on the last revision before merging the branch changes.
Again, if you are creating packaged software where you need to maintain different versions this won't work nearly as well.
For the record, we use Subversion.
The subversion book describes some commonly used branching patterns (e.g. release branches, feature branches, etc.).
Three things to consider regarding branching.
First: branching is fine, as long as you intend to merge things back later on. Sure, you can always have a branch with a specific patch for one of your customers with a specific problem, but eventually you want to merge most of the patches back to the main trunk.
Second: assess your needs. I've seen trees of all sizes depending on the size of the department, the number of customers, etc...
Third: check how good your source control is at branching AND merging. For example, CVS is notoriously poor at this kind of operation. SVN, "CVS done right" as they claim, is somewhat better. But Linus Torvalds, who created Git (which is especially good at this kind of operation), would tell you CVS can't be done right (he said so in a very interesting presentation on Git). So if you have a real need for branching and merging, at least get SVN, not CVS.
Have a look at branching patterns:
http://www.cmcrossroads.com/bradapp/acme/branching/
It describes a number of patterns for working with branches. I've generally worked in two ways:
Stable Receiving Line - all development is done in branches and merged into trunk only when required. This means you always have a single stable release point.
Main Development Line - all work is carried out in trunk. When it comes to releasing you take a release tag and use that. If major experimental rework is required it's carried out in a branch and merged back into trunk when stable.
This is the way we do it, and it works well for us...
Project
|
+--01-Development
| |
| +--Release1.0
| | |
| | +--Solution Files
| |
| +--Release2.0
| |
| +--Solution Files
|
+--02-Integration
| |
| +--Release1.0
| | |
| | +--Solution Files
| |
| +--Release2.0
| |
| +--Solution Files
|
+--03-Staging
|
+--04-Production
well you get the idea...
NOTE: This is the directory structure in Team Foundation Server. Branches exist only between 01-Development/Release1.0 and 02-Integration/Release1.0,
02-Integration/Release1.0 and 03-Staging/Release1.0,
03-Staging/Release1.0 and 04-Production/Release1.0
In other words you wouldn't be able to merge 03-Staging/Release1.0 to 04-Production/Release2.0, etc...
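One way to picture the constraint is as a whitelist of promotion paths: a merge is legal only between adjacent environments within the same release. A hypothetical sketch of that rule (the stage names mirror the directory tree above; this is my illustration, not a TFS feature):

```python
# Promotion order of the four environments, mirroring the tree above.
STAGES = ["01-Development", "02-Integration", "03-Staging", "04-Production"]

def can_promote(source, target):
    """True only for a merge one step up the chain within the SAME release,
    e.g. 03-Staging/Release1.0 -> 04-Production/Release1.0."""
    src_env, src_rel = source.split("/")
    dst_env, dst_rel = target.split("/")
    return (src_rel == dst_rel
            and STAGES.index(dst_env) - STAGES.index(src_env) == 1)
```

Skipping an environment or crossing releases both fail the check, which is exactly the merge restriction the branch relationships enforce.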
What this does for us is give us four separate environments: Development, Integration (alpha server), Staging (beta server), and Production.
Code starts in Development and then gets promoted as it is tested by QA (Integration/alpha) and users (Staging/beta), and finally to Production.
Features/changes are collected and grouped into releases that occur every few months.
But let's say you are in development for Release2.0 and you get a production issue on Release1.0... I can easily get the latest version of Release1.0, fix the issue, and promote it up without affecting anything that I have been working on for Release2.0.
Not saying this will work for everyone in every situation but this works very well for us.