Using different "paths" in Mercurial - also called branching - version-control

Is it possible to have different development "paths" from a given point in Mercurial, without having to clone my project? I currently have 2-3 different implementations options for a project and I'd like to try them out. If I could just use one and at any point come back and start in another "path" without losing data from the older one that would be nice, but I am not even sure it is possible.
Thanks

This is exactly what branching is designed for:
https://www.mercurial-scm.org/wiki/Branch
The easiest way to create a branch in Mercurial is to simply checkout an older version, and then commit again with something different from what you committed after it the first time. You won't lose the old following commit, the new commit will simply branch out into a new line of development and the original commit(s) will remain on the previous line of development.

Yes, you probably want bookmarks for this - they're a lightweight way of marking various heads without recording the names forever in the revision (which branches do.) See BookmarksExtension for more details.
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/ may also be helpful - it's essentially the canonical document on branch management strategies in Mercurial.

Related

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix/feature addition/or what not. Given your files are really independent "projects" it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues in best practices to commit changes only for a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most to follow the practice and the few obstinate ones just stop taking their commits for.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.

Keeping experimental history out of shared repository in Mercurial

I'm fairly new to Mercurial, but one of the advantages I see using Mercurial is that while writing a feature you can be more free to experiment, check in changes, share them, etc, while still maintaining a "clean" repo for the finished feature.
The issue is one of history. If I tried 6 different ways to get something to work, now I'm stuck with all of the history for all my mistakes. What I'd like to do is go through and clean up my changes and "collapse" them into one changeset that can be pushed into a shared repository. This is complicated by the fact that I might pull in new changesets from the shared repository, and have those changesets intermingled with my own.
The best way I know of to do that is to use hg export to create a patch of my changes since cloning, clone a fresh repository, and apply the patch to the fresh repository.
Those steps seems a little bit cumbersome and easy to mess up, particularly if this methodology is rolled out to the whole dev team, some of whom are a little resistant to change (don't get me started). TortoiseHg makes the process slightly better since you can highlight the changesets you want to be included in an export.
My question is this: Am I making this more complex than it needs to be? Is there a better workflow I can use to ease my troubles? Is it too much to expect a clean history where entire (small-ish) features are included in one changeset?
Or maybe my whole question could be summed up this way:
Is there an equivalent for this in mercurial? Collapsing a git repository's history
Although I think you should reconsider your use of branches in Mercurial (as per my comment on your post), using named branches doesn't really help with your concern of maintaining useless or unnecessary history - it just organizes them a bit.
I would recommend a combination of these tools:
mercurial queues
histedit (not distributed with Hg)
the mq changeset strip feature
to rework a messy history before pushing to a blessed or master repo. The easiest thing would be to use strip to permanently remove any changeset with no children. Once you've done that you can use mq or histedit to combine, relocate, or modify existing commits. Histedit will even let you redo the comment associated with a changeset.
Some pitfalls:
In your opening paragraph you mention sharing changesets during feature development. Please understand that once you've shared a changeset it's not a good idea to modify using mq or histedit, or strip. Using these extensions can result in a change to the revision hash, which will make them look like a new changeset to everyone else.
Also, I agree with Paul Nathan's comment that mq (and histedit) are power features and can easily destroy a history. It's a good idea to make a safety clone before using these extensions.
Named branches are the simplest solution. Each experimental approach gets its own branch.This retains the history of the experiments.
The next solution is to have a fresh clone for each experiment. The working one gets pushed back to the main repo.
The next solution - and probably what you are really looking for - is the mq extension, which can "squash" a series of patches into a single commit. I consider mq to be "advanced", and "subject to accidently shooting yourself in the foot". I also don't care to squash my commits - I like having my version history present for reference.

Know of any source-control "stash"-like programs?

I once ran across a commercial tool for Windows that allowed you to "stash" code changes outside of source control but now I can't remember the name of it. It would copy the current version of a document to a backup location and undo your checkout in source control. You could then reintroduce your backed up changes later. I believe it worked with multiple source control systems. Does anyone know what program I'm trying to describe?
The purpose of my asking is twofold: The first is to find a good way to do this. The second is because I just can't remember what that darn program was and it's driving me crazy.
Git: http://git-scm.com/
You can use git stash to temporarily put away your current set of changes: http://git-scm.com/docs/git-stash . This stores your changes locally (without committing them), and lets you reintroduce them into your working copy later.
Git isn't commercial, but is quickly gaining many converts from tools like Subversion.
I think the product you're thinking of is "CodePickle" by SmartBear Software. However, it appears to be a discontinued product.
The term that seems to be used for the type of functionality you're looking for seems to be 'shelving'.
Microsoft's Team system has a 'shelve' feature.
Perforce has several unsupported scripts, including p4tar and p4 shelve
There are several 'shelving' tools for Subversion, but I don't know how robust they are.
I'm no git user myself, but several of my colleagues are, and they seem to like it precisely for this purpose. They use the various git wrappers to commit "real" changes to the SCM system used by their company, but keep private git repositories on their drives which they can keep their changes which they don't necessarily want to commit.
When they're ready to commit to the company's SCM server, then they just merge and commit upstream. Is that what you're looking to do?
Wouldn't it be a better idea to store your private changes in private branch, using e.g. svn switch to change to main branch whenever you need to?
Mercurial has the Shelve Extension which does what you want.
You can even select which changes from a single file that you want to shelve if you really want.
In Darcs, you either don't record the changes you want stashed (it asks you about including each change independently when you record a new patch), or put them in separate patches that you don't push upstream.
There's no need to fully synchronize your local private repos with public/upstream/other ones. You can just cherry pick the patches you want to push elsewhere. Selecting patches can also be done with patterns, so if you adopt a naming convention for your stashed patches you can push everything but them easily.
That way, your private changes are still in revision control, but they aren't shared until you want them to be.
I found an excellent article about obtaining similar functionality using Subversion branches:
Shelves-in-subversion
And then there's the old fallback... 'patch', or even the old "copy everything to another location, then revert".
Both of these are less convenient than using tools that are part of most VCS systems, though.

When to use a Tag/Label and when to branch?

Using TFS, when would you label your code and when would you branch?
Is there a concept of mainline/trunk in TFS?
A label in TFS is a way of tagging a collection of files. The label contains a bunch of files and the version of the file. It is a very low cost way of marking which versions of files make up a build etc.
A branch can be thought of as a copy of the files (of a certain version) in a different directory in TFS (with TFS knowing that this is a branch and will remember what files and versions it was a branch of).
As Eric Sink says, a branch is like a puppy. It takes some care and feeding.
Personally, I label often but branch rarely. I create a label for every build, but only branch when I know that I need to work on a historical version or that I need to work in isolation from the main line of code. You can create a branch from any point in time (and also a label) so that works well and means that we don't have branches lying around that are not being used.
Hope that helps,
Martin.
In any VCS, one usually tags when you want a snapshot of the code, to be kept as reference for the future. You branch when you want to develop a new feature, without disturbing the current code.
Andrew claims that labeling is lazier than branching; it's actually more efficient in most cases, not lazy. Labeling can allow users to grab a project at any point in time, keep a history of files changed for a version or build, and branch off of/work with the code at any point and later merge back into the main branch. Instead of what Andrew said, you're advised to only branch when more than one set of binaries is desired- when QC and Dev development are going on simultaneously or when you need to apply a hotfix to an old version, for example.
I always see labels as the lazy man's branch. If you are going to do something so significant that it requires a full-source label then it is probably best to denote this with a branch so that all tasks associated with that effort are in an organized place with only the effected code.
Branching is very powerful however and something worth learning about. TFS is not the best source control but it is not the worst either. TFS does support the concept of a trunk from which all branches sprout as well.
I would recommend this as a good place to read up on best practices - at least as far as TFS is concerned.

The theory (and terminology) behind Source Control

I've tried using source control for a couple projects but still don't really understand it. For these projects, we've used TortoiseSVN and have only had one line of revisions. (No trunk, branch, or any of that.) If there is a recommended way to set up source control systems, what are they? What are the reasons and benifits for setting it up that way? What is the underlying differences between the workings of a centralized and distributed source control system?
Think of source control as a giant "Undo" button for your source code. Every time you check in, you're adding a point to which you can roll back. Even if you don't use branching/merging, this feature alone can be very valuable.
Additionally, by having one 'authoritative' version of the source control, it becomes much easier to back up.
Centralized vs. distributed... the difference is really that in distributed, there isn't necessarily one 'authoritative' version of the source control, although in practice people usually still do have the master tree.
The big advantage to distributed source control is two-fold:
When you use distributed source control, you have the whole source tree on your local machine. You can commit, create branches, and work pretty much as though you were all alone, and then when you're ready to push up your changes, you can promote them from your machine to the master copy. If you're working "offline" a lot, this can be a huge benefit.
You don't have to ask anybody's permission to become a distributor of the source control. If person A is running the project, but person B and C want to make changes, and share those changes with each other, it becomes much easier with distributed source control.
I recommend checking out the following from Eric Sink:
http://www.ericsink.com/scm/source_control.html
Having some sort of revision control system in place is probably the most important tool a programmer has for reviewing code changes and understanding who did what to whom. Even for single person projects, it is invaluable to be able to diff current code against previous known working version to understand what might have gone wrong due to a change.
Here are two articles that are very helpful for understanding the basics. Beyond being informative, Sink's company sells a great source control product called Vault that is free for single users (I am not affiliated in any way with that company).
http://www.ericsink.com/scm/source_control.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Vault info at www.vault.com.
Even if you don't branch, you may find it useful to use tags to mark releases.
Imagine that you rolled out a new version of your software yesterday and have started making major changes for the next version. A user calls you to report a serious bug in yesterday's release. You can't just fix it and copy over the changes from your development trunk because the changes you've just made the whole thing unstable.
If you had tagged the release, you could check out a working copy of it and use it to fix the bug.
Then, you might choose to create a branch at the tag and check the bug fix into it. That way, you can fix more bugs on that release while you continue to upgrade the trunk. You can also merge those fixes into the trunk so that they'll be present in the next release.
The common standard for setting up Subversion is to have three folders under the root of your repository: trunk, branches and tags. The trunk folder holds your current "main" line of development. For many shops and situations, this is all they ever use... just a single working repository of code.
The tags folder takes it one step further and allows you to "checkpoint" your code at certain points in time. For example, when you release a new build or sometimes even when you simply make a new build, you "tag" a copy into this folder. This just allows you to know exactly what your code looked like at that point in time.
The branches folder holds different kinds of branches that you might need in special situations. Sometimes a branch is a place to work on experimental feature or features that might take a long time to get stable (therefore you don't want to introduce them into your main line just yet). Other times, a branch might represent the "production" copy of your code which can be edited and deployed independently from your main line of code which contains changes intended for a future release.
Anyway, this is just one aspect of how to set up your system, but I think giving some thought to this structure is important.