How to manage multiple small parts of a library with version control - version-control

I'm new to source version control, so I don't want to make a mistake in choosing the wrong setup for my project.
I have kind of a "library" that is made of many small "procedures" (they are written in a pseudo-language specific of a third paty software). Each procedure is a small stand alone "package" of 2/3 files (just the procedure itself, the documentation, and maybe one or two other sub-procedures that are needed only to the main one).
So I have like hundreds of those procedure-packages, archived in subfolders depending on the area of application, and some of them more complex may use others more basic.
I modify those procedures pretty often in the early stages, to improve them, but of course sometimes the modifications break the compatibility since thei involve adding/removing input/output parameters, so I suppose I must somehow "tag" versions of each procedure as if it was a single piece of software...
So I'm wondering what's the best way to manage them with a version control (I'm using Mercurial): am I supposed to make like hundreds repositories? o_O Or keep everything in one big repository and tag it everytime a procedure is revised? or maybe learn and use subrepositories?
Thanks for your help.
Simone

I can be subrepositories (or GuestRepo - no updates from 2013) with tags
Each changeset in "main" repo have linked to it changesets from all repositories, i.e. when you update to old changeset in master, all subrepos also updated accordinly
Tags in main repository will allow you to mark stable|functional combinations for re-use

Subrepos make sense, if each package can stand for itself without any connection to the master software. If this condition is not met, I would stay with one single repository. Especially since you stated that your packages contain few files with few changes it seems that the subrepo approach does not make sense here.

Related

Work-arounds for the lack of version control branching support in Enterprise Architect

From the user guide
Currently, Enterprise Architect does not support Version Control Branching.
Work-arounds to achieve similar results might be possible for certain version-control products; contact Sparx Support for advice.
The support channel available to me has not been very responsive so let's see what SO has to say about it. Basically I want to manage my model the same way I manage the product it's modelling so when my product repository is branched I want the model to branch with it. What are the different ways to deal with this? Pros/cons?
First of all: a model is not code. The difficulty with a model (in XMI format) is that it looks like being readable but in fact it isn't. So the problem comes with merging. While a merge in code will create obvious results (compiler errors due to bad merges can be corrected more or less easily) a wrong merge for a model simply creates rubbish. For that reason you can not branch and merge a model.
Now, how can you go about this? As said: there is no easy solution. One way is to create separate repositories for each branch. As long as the branches live independent you're fine. But how can you merge changes from one branch to the other? Sad to say, but the only way I can recommend it the manual one. There is actually no usable compare tool for UML models. You need to locate the changes per package and export that package.
If (and it's impossible to explain in a few sentences) the changes are just local to that package you can import the changes via a simple XMI import.
If you are able to locate changes to single elements you can create a dummy package, move those elements temporarily into that package and move it over. That will update the elements in the other branch but also move them from their original position (as you did in the first branch) and you need to move the elements back to where they were in both branches and get rid of the dummy package.
In case of structural changes you're probably better off doing the change manually in the other branch and have an eye review.
To sum up: model branches are nothing you like to work with. Look for alternative ways. The best is likely to have everything in one model and find ways to mark the branches inside the model using packages structures, tagged values or whatever. Not easy to handle either.

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix/feature addition/or what not. Given your files are really independent "projects" it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues in best practices to commit changes only for a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most to follow the practice and the few obstinate ones just stop taking their commits for.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.

When to start to use source control in early stages of development?

We have 2 kinds of people at my shop:
The ones that starts to check-in the code since the first successful compilation.
The others that only checks-in the code when the project is almost done.
I am part of group 1, and trying to convince people of group 2 to act like me. Their arguments are like the following:
I'm the solo developer of this project.
It's just a prototype, maybe I'll have to rewrite from scratch again.
I don't want to pollute the Source Control with incomplete versions.
If I am right, please help me to raise arguments to convince them. If you agree with them tell me why.
When someone asked for good excuses not to use version control, they got 75 answers and 45 upvotes.
And when they asked Why should my team adopt source control, they got 26 answers.
Maybe you'll find something helpful there.
You don't need "arguments to convince them." Discourse is not a game, and you should not use your work as a debating platform. That's what your spouse is for :) Seriously, though, you need to explain why you care how other devs work on solo projects in which other people are not involved. What are you missing because they don't use source control? Do you need to see their early ideas to understand their later code? If you can sucessfully do that, you may be able to convince them.
I personally use version control at all times, but only because I don't walk a tightrope without a net. Other people have more courage, less time to spend on infrastructure, etc. Note that in 2009, in my opinion, hard disks rarely fail and rewritten code is often better than the code that it replaces.
While I'm answering a question with a question, let me ask another one: does your code need to compile/work/not-break-the-build to be checked in? I like my branches to get good and broken, then fixed, working, debugged, etc. At the same time, I like other devs to use source control however they want. Branches were invented for just that reason: so that people who can't get along do not have to cohabitate.
Here's my view to your points.
1) Even solo developers need somewhere to keep their code when their PC fails. What happens if they accidentally delete a file without source control?
2/3) Prototypes belong in source control so other team members can look at the code. We put our prototype code in a seperate location to the mainline branch. We call it Spike. Here's a great article on why you should keep Spike code- http://odetocode.com/Blogs/scott/archive/2008/11/17/12344.aspx
If I'm the sole developer on a project (in other words, the repository, or part of it, is under my complete control), then I start committing source code as soon as it's written, and I tend to check in after every incremental change, whether or not it works or represents any kind of milestone.
If I'm working in a repository on a project with others, then I tend to try and make my commits such that they don't break the mainline development, pass any tests, etc.
Whether or not it's a prototype, it deserves to go into source control; prototypes represent a lot of work, and lessons learned from them are valuable. Plus, prototypes have an awful habit of becoming production code, which you'll want in source control.
I try to only write code that compiles (everything else is commented out with a TODO/FIXME tag)... and also add everything to source control.
Argument 1: Even as a single dev it's nice to roll back to a running version, to track your progress, etc.
Argument 2: Who cares if it's just a prototype? You might stumble upon a similar problem in six months or so, and then just start looking for this other code...
Argument 3: Why not use more than one repo? I like to file misc stuff to my personal repo.
Start using source control about 20 minutes before you write your first line of your first artifact. There is never a good time to start after you're begun writing things.
some people can only learn from experience.
like a hard drive failure. or coding yourself into a dead-end after deleting code that actually worked
now, i'm not saying that you should erase their hard drive and then taunt them with "if only you had used source control"...but if something like were to happen, hopefully there would be a backup done first ;-)
Early and Often. As the Pragmatic Programmers say, source control is like a time machine, and you never know when you'll want to go back.
I would say to them...
I'm the solo developer of this project.
And when you leave or hand it off we'll have 0 developers. All the more reason to use source control.
The code belongs to the company not you and the company would like some accountability. Checking in code doesn't require too much effort:
svn ci <files> -m " implement ajax support for grid control
Next time someone new wants to make some changes on the grid control or do something related, they will have a great starting point. All projects start off with one or two people. Source control is easier now than it ever was--have they arranged a 30 minute demo of Tortoise SVN with them?
It's just a prototype, maybe I'll have to rewrite from scratch again.
Are they concerned about storage? Storage is cheap. Are they concerned about time wasted on versioning? It takes less time then the cursory email checks. If they are re-writing bits then source control is even more important to be able to reference old bits.
I don't want to pollute the Source Control with incomplete versions.
That's actually a good concern. I used to think the same thing at one point and avoided checking in code until it was nice and clean which is not a bad thing in and of itself but many times I just wanted to goof around. At this point learning about branching helps. Though I wish wish SVN had full support for purging folders like Perforce.
Let see their arguments:
I'm the solo developer of this project.
It's just a prototype, maybe I'll have to rewrite from scratch again.
I don't want to pollute the Source Control with incomplete versions.
First, the 3rd one. I can see the reasoning, but it is based on a bad assumption.
At work, we use Perforce, a centralized VCS, and indeed we only check in source that compile successfully and doesn't break anything (in theory, of course!), after peer review.
So when I start a non trivial change, I feel the need to intermediary commits. For example, recently I started to make some changes (somehow, in solo for this particular task, so I address point 1) on a Java code using JDom (XML parsing). Then I was stuck and wanted to use Java 1.6's built in XML parsing. It was obviously time to keep a trace of the current work, in case my attempt was failed and wanted to go back. Note this case somehow addresses the point 2.
The solution I chose is simple: I use an alternative SCM! Although some centralized VCS like SVN are usable in local (on the developer's computer), I was seduced by distributed VCS and after briefly testing Mercurial (which is good), I found Bazaar better suited to my needs and taste.
DVCS are well suited for this task because they are lightweight, flexible, allow alternative branches, doesn't "pollute" the source directory (all data is in one directory at the root of the project), etc.
By making a parallel source management, you don't pollute the source of other developers, while keeping the possibility to go back or quickly try alternative solutions.
At the end, by committing the final version to the official SCM, the result is the same, but with added security at the level of the developer.
I'd like to add two things. With version control you can:
Revert to last version that worked, or at least check how it looked like. For that you would need SCM which supports changesets / uses whole-tree commits.
Use it to find bugs, by using so called 'diff debugging' by finding commit in history that introduced the bug. You would want SCM which support it in automated or semi-automated fashion.
Personally, I often start version control after the first sucessful compile.
I just wonder why nobody mentioned distributed version control systems in this context: If you could manage to switch over to a distributed system (git, bazaar, mercury), most arguments of your second group would become pointless since they can just start their repository locally and push it to the server when they want (and they can also just remove it, if they want to restart from scratch).
For me, it's about having a consistent process. If you are writing code, it should follow the same source control process that your production code does. That helps build and enforce good development practices across the development team.
Categorizing the code as a prototype or other non-production type of project should just be used to determine where in the source control tree you put it.
We use both CVS (for non .NET projects) and TFS (for .NET projects) where I work, and the TFS repository has a Developer Sandbox folder where developers can check in personal experimental projects (prototypes).
If and when a project starts to get used in production, the code is moved out of the Developer Sandbox folder into it's own folder in the main tree.
I would say you should start adding the source and checking in before you even build the first time. It is then much easier to avoid checking in generated artifacts. I always use some source control, even for my small hobby hacks, just because it automatically filters the relevant from the noise.
So when I start prototyping I might create a project and then before building it I do "git init, git add ., git commit -a -m ..." just so that when I want to move the interesting parts I just clone over using git and then I can add it to the subversion repository or whatever is used where I am working at the moment.
It's called branching people try to get with the program :p Prototyping? Work in a branch. Experimenting? Work in a branch. New feature? Work in a branch.
Merge your branches into the main trunk when it makes sense.
I guess people tend to be laid back when it comes to setting up source control initially if the code may never be used. I have projects I coded belonging to both groups and the ones outside source control are not less important. It is one of those things that gets postponed everyday when it really should not.
On the other hand I sometimes commit too seldom complicating a revert once I screw up some CSS code and not knowing what I changed e.g. to make the footer of the site end up behind the header.
I check-in the project in source control before I start coding.
The first thing I do is create and organize the projects and support files (such as .sln files in .NET development) with the necessary support libraries and tools (usually in a lib folder) I know I will use in my project.
If I already have some code written, then I add it too, even if it is an incomplete application. Then I check-in everything. From there, everything is as usual, write some code, compile it, test it, check-in it...
You probably won't need to branch from this point or revert your changes, but I think it is a good practice to have everything under source control since the beginning, even if you don't have anything to compile.
I create a directory in source control before I start writing code for a project. I do the first commit after creating the project skeleton.
i'm drunk and and i do first git -init and then vim foo.cpp.
Any decent modern source control platform (of which VSS is not one) should not in any way be polluted by putting source code files into it. I am of the opinion that anything that has a life expectancy of more than about 1/2 an hour should be in source control as early as possible. Solo develpment is no longer a valid excuse for not using source control. It is not about security it is about bugs and long term history.

When to use a Tag/Label and when to branch?

Using TFS, when would you label your code and when would you branch?
Is there a concept of mainline/trunk in TFS?
A label in TFS is a way of tagging a collection of files. The label contains a bunch of files and the version of the file. It is a very low cost way of marking which versions of files make up a build etc.
A branch can be thought of as a copy of the files (of a certain version) in a different directory in TFS (with TFS knowing that this is a branch and will remember what files and versions it was a branch of).
As Eric Sink says, a branch is like a puppy. It takes some care and feeding.
Personally, I label often but branch rarely. I create a label for every build, but only branch when I know that I need to work on a historical version or that I need to work in isolation from the main line of code. You can create a branch from any point in time (and also a label) so that works well and means that we don't have branches lying around that are not being used.
Hope that helps,
Martin.
In any VCS, one usually tags when you want a snapshot of the code, to be kept as reference for the future. You branch when you want to develop a new feature, without disturbing the current code.
Andrew claims that labeling is lazier than branching; it's actually more efficient in most cases, not lazy. Labeling can allow users to grab a project at any point in time, keep a history of files changed for a version or build, and branch off of/work with the code at any point and later merge back into the main branch. Instead of what Andrew said, you're advised to only branch when more than one set of binaries is desired- when QC and Dev development are going on simultaneously or when you need to apply a hotfix to an old version, for example.
I always see labels as the lazy man's branch. If you are going to do something so significant that it requires a full-source label then it is probably best to denote this with a branch so that all tasks associated with that effort are in an organized place with only the effected code.
Branching is very powerful however and something worth learning about. TFS is not the best source control but it is not the worst either. TFS does support the concept of a trunk from which all branches sprout as well.
I would recommend this as a good place to read up on best practices - at least as far as TFS is concerned.

The theory (and terminology) behind Source Control

I've tried using source control for a couple projects but still don't really understand it. For these projects, we've used TortoiseSVN and have only had one line of revisions. (No trunk, branch, or any of that.) If there is a recommended way to set up source control systems, what are they? What are the reasons and benifits for setting it up that way? What is the underlying differences between the workings of a centralized and distributed source control system?
Think of source control as a giant "Undo" button for your source code. Every time you check in, you're adding a point to which you can roll back. Even if you don't use branching/merging, this feature alone can be very valuable.
Additionally, by having one 'authoritative' version of the source control, it becomes much easier to back up.
Centralized vs. distributed... the difference is really that in distributed, there isn't necessarily one 'authoritative' version of the source control, although in practice people usually still do have the master tree.
The big advantage to distributed source control is two-fold:
When you use distributed source control, you have the whole source tree on your local machine. You can commit, create branches, and work pretty much as though you were all alone, and then when you're ready to push up your changes, you can promote them from your machine to the master copy. If you're working "offline" a lot, this can be a huge benefit.
You don't have to ask anybody's permission to become a distributor of the source control. If person A is running the project, but person B and C want to make changes, and share those changes with each other, it becomes much easier with distributed source control.
I recommend checking out the following from Eric Sink:
http://www.ericsink.com/scm/source_control.html
Having some sort of revision control system in place is probably the most important tool a programmer has for reviewing code changes and understanding who did what to whom. Even for single person projects, it is invaluable to be able to diff current code against previous known working version to understand what might have gone wrong due to a change.
Here are two articles that are very helpful for understanding the basics. Beyond being informative, Sink's company sells a great source control product called Vault that is free for single users (I am not affiliated in any way with that company).
http://www.ericsink.com/scm/source_control.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Vault info at www.vault.com.
Even if you don't branch, you may find it useful to use tags to mark releases.
Imagine that you rolled out a new version of your software yesterday and have started making major changes for the next version. A user calls you to report a serious bug in yesterday's release. You can't just fix it and copy over the changes from your development trunk because the changes you've just made the whole thing unstable.
If you had tagged the release, you could check out a working copy of it and use it to fix the bug.
Then, you might choose to create a branch at the tag and check the bug fix into it. That way, you can fix more bugs on that release while you continue to upgrade the trunk. You can also merge those fixes into the trunk so that they'll be present in the next release.
The common standard for setting up Subversion is to have three folders under the root of your repository: trunk, branches and tags. The trunk folder holds your current "main" line of development. For many shops and situations, this is all they ever use... just a single working repository of code.
The tags folder takes it one step further and allows you to "checkpoint" your code at certain points in time. For example, when you release a new build or sometimes even when you simply make a new build, you "tag" a copy into this folder. This just allows you to know exactly what your code looked like at that point in time.
The branches folder holds different kinds of branches that you might need in special situations. Sometimes a branch is a place to work on experimental feature or features that might take a long time to get stable (therefore you don't want to introduce them into your main line just yet). Other times, a branch might represent the "production" copy of your code which can be edited and deployed independently from your main line of code which contains changes intended for a future release.
Anyway, this is just one aspect of how to set up your system, but I think giving some thought to this structure is important.