Organize a version control system for a legacy system

We are investigating how to use a modern version control system for a legacy system.
However, there are some difficulties that we do not really know how to solve. How do you think we should organize this?
We are responsible for some parts of the code, and the customer is responsible for other parts.
The code in our development system is the master: all changes are made there and then delivered to the customer. So far it is like an ordinary project. But for some directories containing a few hundred files, the customer is responsible for the code. The customer can change these files as they like and does not have to inform us. When we prepare an update for them, we need to get a copy of these files and merge our changes into them before we give the new version to the customer.
We have released four base versions of the system over the 25 years it has existed. Our six customers, however, use different base versions, which we have to take into account. In addition, each customer has their own requirements, and we have made major customer-specific adjustments for each of them.
We have several parallel projects that touch the same files. Today the various projects interfere with each other, which causes major problems, so we would benefit greatly from keeping changes in separate branches. But how do we organize this?
Some projects affect only one customer, other projects involve several customers, and some projects may be introduced into a future base version.
So my question is how to organize this in our new version control system.

Related

Pros and cons of different strategies for managing shared resources in TFS 2012

Background
According to the Visual Studio ALM Rangers, there are two major approaches to sharing resources (e.g. common libraries which are used in many separate products) in TFS 2012:
Workspace mapping, setting up workspaces so that they point to the appropriate version of each required library and product.
Shared folders, using branch/merge to get and update the shared resource.
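As an illustration of the shared-folders approach, the shared library's folder is branched into each consuming product's tree, roughly like this (the server paths are purely illustrative):

    $/SharedLibraries/LibX/Main        <- main branch of the shared library
    $/ProductA/Main/Lib/LibX           <- branch of LibX inside Product A's tree
    $/ProductB/Main/Lib/LibX           <- branch of LibX inside Product B's tree

A library change made inside Product A is reverse-integrated (merged) back to $/SharedLibraries/LibX/Main and then forward-integrated into Product B, which is the workflow discussed under "One particular problem" below.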
At a glance, shared folders seem like the way to go, but a client I am working with has experienced a lot of problems with that approach in Starteam and is reluctant to try it again in TFS. I am currently assisting the client in migrating from Starteam to TFS.
I have listed pros and cons with each approach, but I am uncertain if I have missed something.
Workspace mapping:
Simple to setup and understand
Easy to test a library change in several products
Easy to get latest changes in a library, and to submit changes to a library
No traceability, or at least less traceability; e.g. if a change in a library was introduced in Product A, how do you track that change in Product B?
Changes in libraries may affect products in an uncontrolled manner
Build gets more complicated
Each user must set up his/her workspace individually (but there are workspace templates in TFS 2012 Power Tools)
Shared folders:
Everything that is needed is configured in a given branch
Isolation between products and branches
Builds are simplified
More control of changes
Requires more disk space
Requires more administration in the form of branching/merging and setup of branches
One particular problem is how to test library changes in several products. As I understand it, that would require testing in product A, then reverse-integrating to the library and forward-integrating to product B, then testing that product, and so on.
Conclusion, and final question
The client has successfully used something similar to workspace mapping in Starteam for 10 years and plans to continue with that approach in TFS, although they do have trouble keeping track of library changes that affect several products.
They are afraid that folder sharing will get messy and complicated.
My question is: have I missed something in my list above? Are there more reasons why an organisation should not use workspace mapping, or why it should use folder sharing?

How to manage source code that runs on different customers' systems?

We have an application that was implemented for our own company.
Over time, the application has been purchased by various companies.
For each company, we created a new TFS branch in source control, and each branch has been changed to meet that customer's specific requirements.
That is why the source code now exists in many versions.
Making a change has become very difficult, because if the change is to a common structure it has to be implemented and tested separately in each branch.
What is the best and most conventional way to manage the source code?
Is it recommended to have a SINGLE SOLUTION that can run on each customer's systems?
There are several ways to handle customer-specific customizations, among them:
Keep a completely separate branch per customer and eventually merge code between branches. This is the approach you have in place right now.
Architect the application so that you have a customer-independent "kernel" with pluggable customization hooks. Only the customizations would be kept in separate, independent repositories (see the sketch at the end of this answer).
Put the customizations into a common application and make them configurable ("on/off").
Which route to take depends on the nature of the application and the amount of customization per customer. If the context allows it, separate branches are the least favourable option due to the overhead of manual merging, bug fixing, and testing.
In a specific industry (telecom billing systems) I have seen all three: suppliers who work with dedicated code branches, others with pluggable customizations and configurable off-the-shelf products. Naturally, each supplier has a different level of customization flexibility, level of productification and integration approach.
As a software supplier, the big trade-off is balancing the level of customization flexibility against the level of productification.
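As a rough illustration of the second and third options above, the kernel can expose a small customization interface and select an implementation per customer at startup. This is only a sketch; all of the type names (InvoiceCustomization and so on) are invented:

    // All type names here are invented for illustration.

    // Hook exposed by the customer-independent kernel.
    interface InvoiceCustomization {
        String formatInvoiceNumber(long sequence);
    }

    // Default behaviour shipped with the kernel.
    class DefaultInvoiceCustomization implements InvoiceCustomization {
        public String formatInvoiceNumber(long sequence) {
            return "INV-" + sequence;
        }
    }

    // Customer-specific override, kept in that customer's own repository or module.
    class AcmeInvoiceCustomization implements InvoiceCustomization {
        public String formatInvoiceNumber(long sequence) {
            return "ACME/" + sequence;
        }
    }

At startup the kernel reads a per-customer configuration entry and instantiates the matching implementation (for example via java.util.ServiceLoader or a simple factory); with the third option all implementations ship in the common application and the configuration merely switches them on or off.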

How to bring several branches of a tool back onto one track on a common platform

I'm currently working with a tool that over the years has evolved naturally from a number of perl scripts accessed through an apache web server, to a huge collection of tools using a common database and web site (still running apache, but using catalyst instead of CGI).
The problem we're having is that different departments have created local branches from the common main branch to implement their own new functionality and adaptations.
We're now charged with the task of deciding how a common platform can be made available, where certain base functionality follows a single track instead of being spread across all these different branches.
These kinds of problems must spring up all the time, so I'm hoping someone has a good strategy to offer for how we should continue from here. Any thoughts would be appreciated.
In general, make sure you have buy in from everybody involved. Trying to do this kind of project without having people on board will just make your life more difficult.
Look for the quick wins. What functionality, if it changed, would have the fastest and clearest beneficial effect across all departments? If it takes you three months to get some good out of it, people won't rate the good results very highly.
Break functionality down as far as you can. One of the biggest problems in forked legacy systems is that a seemingly innocuous change in one place can have huge ramifications elsewhere because of the assumptions made about state. Isolating state in different features should help you out there.

Arguments against zip files as source control [closed]

What arguments can be used against using zip files of source code as a form of version control?
In general each developer is working on their own program and has a responsibility for it. But there are times of course when other developers are involved in work on that program.
Each developer has their own naming convention for zip files, ranging from appending the date or a number after the program name to appending _old / _oldold / _newversion, etc. When there is collaboration on development of some code, it has to be checked who has the 'latest' version of the code and where it resides; usually the correct version is identified.
There is no easy existing method to diff source trees, and during development unwanted changes occasionally slip into the code.
The zip files corresponding to software releases that have been released to manufacturing are archived. This at least adds some traceability.
Also, before RTM the code is peer reviewed against the previously released version, so some quality assurance does exist.
Are there any formal white papers explaining the advantages of source control, making clear that the above isn't a fully valid form of source control? Arguments exist here that since the end products (manufacturing releases) are under control and are reviewed, there is no problem with the process. Developers do not have too much of a problem working with zip files in this way, but may not be aware of the advantages.
Creating and managing zip files is error-prone.
Real source control gives you tools to understand your code:
History browsing
Diffs between revisions
Annotation of source files to track the origin of a change
Real source control isn't difficult, there's lots of help out there.
The best argument is surely that using a version control system like Subversion or Mercurial is much, much easier and more secure than messing about with zip files. I doubt there has been much paper writing on the subject, as the use of zip files for this purpose is fairly obviously wrong.
There are a number of SO questions on the general advantages of version control. For example How can I convince my department to implement a version control system? and https://stackoverflow.com/questions/250984/do-i-really-need-version-control
I assume you currently work at a company that practices this method of zip control, and you're looking for ammunition to help you change this practice. There are a lot of questions on StackOverflow about source control, and the community here are in near-total consensus on the benefits of proper source control and the horrors of working without it (for very good reason).
I'll add something here to benefit your battle: YOUR COMPANY IS #$#%&$## CRAZY!!! ZIP FILES??? ARE YOU ##$##% KIDDING ME???
I am assuming that this question was asked because the original poster is working in an office where the standard practice is to share zip files.
Zip files are obviously bad, for the reasons given by Ned Batchelder. The biggest reason, I would suggest, is that they're clunky: it's difficult to merge changes or to get diffs between past revisions easily.
I would recommend you read A Visual Guide to Version Control for some good arguments about why version control systems are very useful, and a superior way of managing code.
I suspect there'll be as many white papers comparing zip files to proper source control as there'll be white papers comparing cutting one's genitals off with a rusty butter knife with buying a puppy.
Zip files work as a very basic form of version control. It's a way to separate "states" of the source. However, it's not a good form of version control because you have to do a lot of work to perform basic source control management tasks. For example:
Bob's team is working on a major feature that requires changing dozens of files. He works in his own private zip-controlled area for a while. He's created 30 new files, added features to 12 existing files, and made changes to existing behavior in 3 existing files over 4 months. How do you merge Bob's work with the main trunk that has also evolved over the last 4 months? Do you hand-diff thousands of lines of code and decide how to merge them? How do you ensure that anything that uses the 15 existing files isn't broken? How do you ensure that Bob's features or main trunk features aren't accidentally omitted?
Alice is investigating a bug in her code and realizes that one of Sam's classes has changed its behavior. Sam says he didn't make the change. How does Alice find when and why the change was made? How does Alice know who depends on the change?
A major customer has reported a bug in an older version of the program. This customer needs a fix and is important enough to warrant a patch. How do you add the code to the old zip file in a way that it also exists in the new files? Also, how do you record that there is a relationship between the two changes?
These are just three scenarios that a version control system handles well. Situation 1 is handled by development branches. Almost every version control system has a notion of branches that can be developed in parallel and merged as needed. Situation 2 is easily addressed by any source control system with a "blame" feature and less easily addressed by just searching commit logs. Situation 3 is a variant of situation 1, but when you merge branches most version control systems make a note. For example, you'd make a branch off of the old version, fix the bug, then merge that branch into the new code. Now when someone asks "Where did this change come from?" they see it was merged from the patch branch and the change was made to fix a bug.
By the way, I've been in each of these 3 situations and used both SVN and Perforce; both made finding a solution very easy.
These people already know all the arguments for SCM; there is nothing anyone can say to them that will sell them on it. These things must happen:
You install SCM on your local machine and use it. If you must, have it autogenerate these .zip files at every build, so no one outside your cube knows the difference.
Some kind of disaster occurs, like loss of work, a show-stopper bug being re-introduced, or some other worst-case scenario of the kind that is the real reason we all use SCM (the other features we learn to appreciate later).
You are unaffected by the disaster, and/or use your personal copy of the code in SCM to fix the problem/recover the lost work/whatever.
You are a hero and everyone wants to know how you did it.
Only by experiencing firsthand the pain of loss caused by poor SCM practices will your organization realize the benefits of SCM. You're smart enough to learn from the mistakes of others, but not everyone is. The rest of the time, you'll just be 2-3x more productive than the rest of the team and maybe, just maybe, they'll wonder how.
By the way, this is how you get agile, continuous integration, unit testing, etc into the organization: lead by example.
The ZIP solution requires a pro-active step at the end of the development cycle, when things tend to get dropped, because no one outside the dev group notices when it doesn't happen. Sort of like that final code cleanup you always plan on doing when things slow down.
An SCM integrated into the dev environment pretty much enforces/encourages keeping a version history with a small amount of effort all the way through the process. This makes it more likely that a version history will actually be created.
On Using ZIP as an SCM
I'm not going to take as hard a line as some of the others on the ZIP file solution. It is at least better than nothing. It is a perfectly valid way of keeping version histories; it is just a lot more labor-intensive and error-prone, and it lacks a lot of useful features.
Know who you are selling to
Someone in the Dev Group: Focus your arguments on features like ease of troubleshooting by using change histories, safety to experiment with big code changes (because of rollback), and avoiding accidents where work is overwritten by other developers.
Non-Tech Managers/Bean-counters: There are free/low-cost tools that will reduce the labor cost of version control and give greater accountability and transparency into what each developer is doing and the source of coding mistakes.
I wrote a Version Control tool long ago for a company who did the authoring for DVD titles. Before that they had nothing, just a directory full of clips, icons, scripts etc. which anyone could hack away at, and no way to backtrack if it went wrong etc. HOWEVER these people were 'artists', not programmers, so they could not (would not???!) be trained to use a decent Version Control system. So as a bare-minimum, get-out-of-the-mud level tool I wrote a utility which zipped up the current state of the directory, gave the Zip a meaningful name (date + comment supplied by user) and stuck it in a Backups directory, and also allowed you to restore one of these backups.
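A minimal sketch of such a "zip snapshot" tool in Java might look like the following; the directory name, date format, and class name are made up for illustration, and the original tool was not necessarily written this way:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    class ZipSnapshot {

        // Zips every file under workDir into Backups/<timestamp>_<comment>.zip
        // and returns the path of the archive that was created.
        static Path snapshot(Path workDir, String comment) throws IOException {
            Path backupDir = Paths.get("Backups");
            Files.createDirectories(backupDir);

            String stamp = LocalDateTime.now()
                    .format(DateTimeFormatter.ofPattern("yyyy-MM-dd_HHmmss"));
            Path zipFile = backupDir.resolve(stamp + "_" + comment + ".zip");

            // Collect the files first, then write them into the archive.
            List<Path> files;
            try (Stream<Path> walk = Files.walk(workDir)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }

            try (OutputStream out = Files.newOutputStream(zipFile);
                 ZipOutputStream zip = new ZipOutputStream(out)) {
                for (Path file : files) {
                    // Store paths relative to the working directory inside the archive.
                    zip.putNextEntry(new ZipEntry(workDir.relativize(file).toString()));
                    Files.copy(file, zip);
                    zip.closeEntry();
                }
            }
            return zipFile;
        }
    }

Restoring a backup is then just a matter of unzipping one of the archives from the Backups directory back over the working directory.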
So zips CAN provide minimum-level version control, and I speak as someone who endorsed that approach when it was right for the skill-level (in terms of programming, I don't want to imply that they couldn't manipulate pixels!) of the people using it.
But as a programmer, you should be thinking to use a tool which really helps you. As such you want to be able to compare differences for individual files, compare differences between complete milestone sets, and (if you are working on anything other than trivial programmes) handle branching and merging. If you want these features you need something BETTER than zip files.
I used to use ComponentSoftware RCS, and if it wasn't for its poor performance over a WAN we might still be using it: it is cheap (even free for single-developer use, in which form I used to use it at home) and simple to use. However nowadays I would suggest looking at SubVersion. It is very flexible, reasonably simple to understand, has a good set of Windows tools to make it even easier (e.g. Tortoise, Ankh), and ... best of all ... you can get it running for free.
It's not good: only creating a zip before a release means losing a lot of the power you get with version control.
Usually you should check in to the repository after you have added/removed/changed a functional aspect, so that you can go back later when an error occurs that you think might be because of this change, or when you say "damn, this worked before the file format changed some day in March." Naming revisions after changes also makes things easier to remember, because you will have forgotten what was done on 27 March 2009.
"In general each developer is working on their own program and has a responsibility for it. But there are times of course when other developers are involved in work on that program."
In a normal development shop, this is not at all true. Different people work on the same source code all the time. XP makes it almost mandatory. Even if you separate the code into modules, there will still be interaction points with code that concerns at least two programmers.
Of course, it's almost impossible to collaborate without major problems if you don't use source control. But the scenario you describe is much more a way to adjust to this limitation than a sane project structure.
Having only a single person working on a module means that nothing will happen when that person is on vacation and you have a major problem when he leaves the company, gets sick for a long time, or dies.
How do you do a merge? How do you do an annotate? How do you bisect? Where are changelogs stored? Just go to wikipedia and look up "Version control" and go down the list: zip files can kind of sort of do about 2 things out of the whole page.
This is like asking "What arguments can be used against shorthand as a form of double-entry bookkeeping?". It's a completely different thing.
For arguments, there's Walter Tichy's original paper on RCS.
For missing features, among many others there's the ability to merge changes from different versions. This is especially well supported by tools like git and darcs, and to a lesser extent mercurial.
P.S. To Mercurial fans: the problem is that Mercurial delegates the merge process to external tools, and it's very difficult for the mercurial novice to know which tool to use, or to understand how they work—the mercurial model of merging seems far more powerful than others but correspondingly difficult to get a grip on.
I haven't seen an answer include Eric Sink's Source Control HOWTO, but it's a valuable reference. I haven't seen any formal white papers on version control, but I'm not sure the argument about "validity" is your strongest one. The problems you describe in your question indicate some pretty serious drawbacks with the current approach. If "the powers that be" in your environment aren't convinced by that, change the argument entirely.
If you make it a question of quality control, and point to continuous integration as a practice that encourages it, then the zip file approach to version control isn't a "not fully valid form of version control", but an obstacle to implementing continuous integration as a practice.
Your question doesn't indicate whether or not the end product "under control" is tested in any automated fashion (in addition to being reviewed). If the process you describe would prevent that from taking place as well, certainly add that to your argument too.
I think your best argument is showing a GOOD form of source control and showing how powerful it is. Don't trash what is currently being done (as someone is surely emotionally attached to that). You don't want to trash the "ZIP Source Control Method." Show the power of something like SVN. Make it very easy to explain. Show common use cases. (A solid demo would help.)
Let the source control version sell itself.

Strategies for Developing Multiple Products from One Codebase

I'm working on a project that will (soon) be branched into multiple different versions (Trial, Professional, Enterprise, etc).
I've been using Subversion since it was first released (and CVS before that), so I'm comfortable with the abstract notion of branches and tags. But in all my development experience, I've only ever really worked on trunk code. In a few rare cases, some other developer (who owned the repository) asked me to commit changes to a certain branch and I just did whatever he asked me to do. I consider "merging" a bizarre black art, and I've only ever attempted it under careful supervision.
But in this case, I'm responsible for the repository, and this kind of thing is totally new to me.
The vast majority of the code will be shared between all products, so I assume that code will always reside in trunk. I also assume I'll have a branch for each version, with tags for release builds of each product.
But beyond that, I don't know much, and I'm sure there are a thousand and one different ways to screw it up. If possible, I'd like to avoid screwing it up.
For example, let's say I want to develop a new feature, for the pro and enterprise versions, but I want to exclude that feature from the demo version. How would I accomplish that?
In my day-to-day development, I also assume I need to switch my development snapshot from branch to branch (or back to trunk) as I work. What's the best way to do that, in a way that minimizes confusion?
What other strategies, guidelines, and tips do you guys suggest?
UPDATE:
Well, all right then.
Looks like branching is not the right strategy at all. So I've changed the title of the question to remove the "branching" focus, and I'm broadening the question.
I suppose some of my other options are:
1) I could always distribute the full version of the software, with all features, and use the license to selectively enable and disable features based on authorization in the license. If I take this route, I can imagine a rat's nest of if/else blocks calling into a singleton "license manager" object of some sort. What's the best way of avoiding code-spaghettiism in a case like this? (See the sketch after this list.)
2) I could use dependency injection. But generally, I hate it (since it moves logic from the source code into configuration files, which makes the project more difficult to grok). And even then, I'm still distributing the full app and selecting features at runtime. If possible, I'd rather not distribute the enterprise version binaries to demo users.
3) If my platform supported conditional compilation, I could use #IFDEF blocks and build flags to selectively include features. That'd work well for big, chunky features like whole GUI panels. But what about smaller, cross-cutting concerns... like logging or statistical tracking, for example?
4) I'm using ANT to build. Is there something like build-time dependency injection for ANT?
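On the first option: one common way to keep the license checks from turning into a rat's nest of if/else blocks is to declare the switchable features in one place and route every check through a single lookup. The sketch below assumes invented names (Feature, License, LicenseManager); it is not an existing API:

    import java.util.EnumSet;

    // Features that the license can switch on or off (names are invented).
    enum Feature {
        REPORTING, AUDIT_LOG, MULTI_TENANT
    }

    // A license is, at its simplest, just the set of features it unlocks.
    final class License {
        private final EnumSet<Feature> enabled;

        License(EnumSet<Feature> enabled) {
            this.enabled = EnumSet.copyOf(enabled);
        }

        boolean allows(Feature feature) {
            return enabled.contains(feature);
        }
    }

    // The single place the rest of the code asks "is this feature licensed?".
    final class LicenseManager {
        private static volatile License current =
                new License(EnumSet.noneOf(Feature.class));

        static void install(License license) { current = license; }

        static boolean isEnabled(Feature feature) {
            return current.allows(feature);
        }
    }

    // Call sites then stay flat instead of nesting:
    //     if (LicenseManager.isEnabled(Feature.REPORTING)) { showReportingMenu(); }

Bigger features can still be left out of the shipped binaries entirely, along the lines the answers below suggest.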
A most interesting question. I like the idea of distributing everything and then using a license key to enable and disable certain features. You have a valid concern about it being a lot of work to go through the code and continually check whether the user is licensed for a certain feature. It sounds a lot like you're working in Java, so what I would suggest is that you look into using an aspect weaver to insert the code for license checking at build time. There is still going to be one object through which all license-checking calls go, but it isn't as bad a practice if you're using an aspect; I would say it is good practice.
For the most part you only need to read whether something is licensed, and you'll have a smallish number of components, so the table could be kept in memory at all times; and because these are just reads, you shouldn't have much trouble with threading.
As an alternative, you could distribute a number of jars, one for each licensed component, and only allow loading of the classes that are licensed. You would have to tie into the class loader to achieve this.
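A sketch of the aspect-weaving suggestion in AspectJ's annotation style could look roughly like this; the @Licensed annotation is invented, and LicenseManager/Feature are the hypothetical types from the earlier sketch, not part of any existing library:

    // LicenseManager and Feature are the hypothetical types from the earlier sketch;
    // the @Licensed annotation is likewise invented.
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;

    // Mark any method that belongs to a licensable feature.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Licensed {
        Feature value();
    }

    // Woven in at build time, so the business code contains no explicit license checks.
    @Aspect
    class LicenseCheckAspect {

        @Around("execution(* *(..)) && @annotation(licensed)")
        public Object enforceLicense(ProceedingJoinPoint pjp, Licensed licensed) throws Throwable {
            if (!LicenseManager.isEnabled(licensed.value())) {
                throw new IllegalStateException(
                        "Feature not covered by this license: " + licensed.value());
            }
            return pjp.proceed();
        }
    }

With build-time weaving the check is injected into every @Licensed method, so the business code itself stays free of license plumbing.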
Do you want to do this via Subversion? I would use Subversion to maintain different releases (a branch per release, e.g. v1.0, v2.0 etc.) but I would look at building different editions (trial/pro etc.) from the same codebase.
That way you're simply enabling or disabling various features via a build and you're not having to worry about synchronising different branches. If you use Subversion to manage different releases and different versions, I can see an explosion of branches/tags in the near future.
For switching, you can simply maintain a checked-out codebase, and use svn switch to checkout differing versions. It's a lot less time-consuming than performing new checkouts for each switch.
You are right not to jump on the branching and merging cart so fast. It's a PITA.
The only reason I would want to branch a subversion repository is if I want to share my code with another developer. For example, if you work on a feature together and it is not done yet, you should use a branch to communicate. Otherwise, I would stay on trunk as much as possible.
I second the recommendation of Brian to differentiate the releases on build and not on the code base.