Keeping experimental history out of shared repository in Mercurial - version-control

I'm fairly new to Mercurial, but one of the advantages I see using Mercurial is that while writing a feature you can be more free to experiment, check in changes, share them, etc, while still maintaining a "clean" repo for the finished feature.
The issue is one of history. If I tried 6 different ways to get something to work, now I'm stuck with all of the history for all my mistakes. What I'd like to do is go through and clean up my changes and "collapse" them into one changeset that can be pushed into a shared repository. This is complicated by the fact that I might pull in new changesets from the shared repository, and have those changesets intermingled with my own.
The best way I know of to do that is to use hg export to create a patch of my changes since cloning, clone a fresh repository, and apply the patch to the fresh repository.
Those steps seems a little bit cumbersome and easy to mess up, particularly if this methodology is rolled out to the whole dev team, some of whom are a little resistant to change (don't get me started). TortoiseHg makes the process slightly better since you can highlight the changesets you want to be included in an export.
My question is this: Am I making this more complex than it needs to be? Is there a better workflow I can use to ease my troubles? Is it too much to expect a clean history where entire (small-ish) features are included in one changeset?
Or maybe my whole question could be summed up this way:
Is there an equivalent for this in mercurial? Collapsing a git repository's history

Although I think you should reconsider your use of branches in Mercurial (as per my comment on your post), using named branches doesn't really help with your concern of maintaining useless or unnecessary history - it just organizes them a bit.
I would recommend a combination of these tools:
mercurial queues
histedit (not distributed with Hg)
the mq changeset strip feature
to rework a messy history before pushing to a blessed or master repo. The easiest thing would be to use strip to permanently remove any changeset with no children. Once you've done that you can use mq or histedit to combine, relocate, or modify existing commits. Histedit will even let you redo the comment associated with a changeset.
Some pitfalls:
In your opening paragraph you mention sharing changesets during feature development. Please understand that once you've shared a changeset it's not a good idea to modify using mq or histedit, or strip. Using these extensions can result in a change to the revision hash, which will make them look like a new changeset to everyone else.
Also, I agree with Paul Nathan's comment that mq (and histedit) are power features and can easily destroy a history. It's a good idea to make a safety clone before using these extensions.

Named branches are the simplest solution. Each experimental approach gets its own branch.This retains the history of the experiments.
The next solution is to have a fresh clone for each experiment. The working one gets pushed back to the main repo.
The next solution - and probably what you are really looking for - is the mq extension, which can "squash" a series of patches into a single commit. I consider mq to be "advanced", and "subject to accidently shooting yourself in the foot". I also don't care to squash my commits - I like having my version history present for reference.

Related

Mercurial-like named branches experience in Perforce

I am new to Perforce and find it really hard to follow its workflow..
I have used Mercurial before (not in any advanced ways), but what I lack most in Perforce is the idea of named branches.
Let me explain what I'd like to do:
I get the latest revisions of all files and want to work on a new feature/story/task..
I create a brach, say "Feature 3021"
I code, save changes in this branch (hg commit)
I can save changes to a central server (hg push)
When I'm done coding, I merge the changes from "Feature 3021" with the main branch (default, master, etc.) - after that the main branch has the code I wrote
I can close the named brach ("Feature 3021") so that further commits are not possible.
I don't need this exact behavior in Perforce, but something analogous. I know that Perforce is centralized, so the commit-push step would be probably one, but this is a minor problem here.
All I care is to be able to save my work in version control at any time, even if it's not 100% ready - perhaps to a different branch. I'd also like other users to be able to be able to get my code (from this different branch), but only if they want this - the default branch should stay unafected as long as I don't merge my changes with it.
Is it possible? I am using Server 2012.2
Can you upgrade to Server 2013.1 or later? There's a great feature there, called Task Streams.
Here's some references:
http://www.perforce.com/blog/130627/task-streams-even-if-you-are-classic-perforce-shop
http://www.perforce.com/blog/130206/task-streams
http://www.perforce.com/perforce/doc.current/manuals/p4v/streams_task.html
The analogous flow in Perforce would be:
Maintain a main-line "branch" at some path, say //depot/default
To create a feature branch, integrate from //depot/default to //depot/feature-3021
Work under //depot/feature-3021 and submit
When you are ready to merge back, integrate //depot/feature-3021 to //depot/default
Regarding closing the branch after its use, there are a couple of options that I can think of. You could either change permissions or simply delete it. The delete could always be recovered.
Also note the paths don't need to be at the same hierarchy. A more reasonable branch strategy might use paths like this.
Main-line: //depot/default
Developer branches: //depot/dev/${user}/${feature}

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix/feature addition/or what not. Given your files are really independent "projects" it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues in best practices to commit changes only for a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most to follow the practice and the few obstinate ones just stop taking their commits for.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.

Small Shop, Why DVCS?

We have a small programming shop of at most 5 people working on a single project. I fully grok why DVCS is better for open source projects, and for large companies, but what advantages does it have for smaller companies other than "you can work on the airplane." Which would require extra SA work to make sure that our repositories on DEV boxes was properly backed up every night.
We also a have several non technical people (artists, translators) who can (sort of) deal with SVN, in peoples experience how much training is required to get them to move to a DVCS?
I'm going to speak from my experience, which is primarily with SVN and Hg, often working with designers and programmers who are not comfortable with version control.
My big beef with SVN and other CVCS's I've used is that they block you from making commits, not just when the network is down, but also in case of a conflict (or worse, someone locking a file so no one else can make changes to it!). You could of course commit to a branch, but between network bandwidth required to switch branches and the pain involved in merging, you still have a problem.
Of course, SVN blocks you from committing conflicted files so that you don't accidentally overwrite someone else's work; SVN requires you to at least acknowledge that you know one version or the other (or a custom combination of the two) is right. Mercurial, however, has a better solution (2, actually):
1. You can always commit to the local repository now and merge later. (All DVCS's have this feature.)
2. Even if you pull or push conflicting changes, instead of being blocked from committing, you have multiple heads via anonymous branches. (Sorry I can't really explain this in detail here, but you can google it.)
So your workflow goes from:
1. Get the latest, make changes, test them.
2. Get the latest, resolve conflicts, test the result.
3. Commit.
and becomes:
1. Get the latest, make changes, test them.
2. Commit (so you have a place to fall back to).
3. Get the latest, resolve conflicts, test the result.
4. Commit and push.
That extra commit means you're doing less work per commit, so you have more checkpoints to fall back on. And there are other ways you can make more commits without getting in others' way.
SVN is just slow enough to break my concentration and tempt me to go to facebook; mercurial is fast and git is faster. The speed issue becomes very important when reviewing a log or changes to a working copy. With TortoiseHg, I can click through a list of files, and instantly see changes to that file; it takes about a couple seconds per file with TortoiseSVN + WinMerge (not sure how much of this is due to the DVCS). The more I use these tools, the more I feel that a VCS needs to be fast, just like a text editor or a mouse cursor--fast enough that you shouldn't require the network to do it.
Subjectively, I find TortoiseHg to be a heck of a lot easier to use than TortoiseSVN (or other tortoises I've used). TortoiseHg is mutli-platform, too. :)
One more thing: As I understand, a SVN working copy is defined recursively: Each folder is a working copy. This allows you to do some fancy-pants stuff (e.g. having a working copy that contains folders from disparate locations in the repository). I don't know if Hg has a similar feature, but in my experience, SVN's implementation of this feature causes only problems where I work, especially for those not quite comfy with SVN. When they copy-and-paste a WC folder on their machine via the OS shell instead of via svn copy, it goofs up their WC. I've goofed up my WC this way as well. It's less of a problem with Hg--you normally work with an entire repo at once, whether you clone, update, or commit.
SVN improved a lot concerning merging since its release. But it still lacks file rename tracking, often resulting in tree conflicts. Renaming is the killer app of distributed version control tackles this issue, adding some interesting links in the comment section.
DVCS lets you push to a central repository, at the cost of one additional command compared to Subversion. Occasional users should be able to adapt to this minor change in workflow. But giving the freedom of 'local' commits and branches to power users without cluttering a central repository.
Concerning tooling, which might be of importance for user acceptance, Mercurial is on par with Subversion.

Using different "paths" in Mercurial - also called branching

Is it possible to have different development "paths" from a given point in Mercurial, without having to clone my project? I currently have 2-3 different implementations options for a project and I'd like to try them out. If I could just use one and at any point come back and start in another "path" without losing data from the older one that would be nice, but I am not even sure it is possible.
Thanks
This is exactly what branching is designed for:
https://www.mercurial-scm.org/wiki/Branch
The easiest way to create a branch in Mercurial is to simply checkout an older version, and then commit again with something different from what you committed after it the first time. You won't lose the old following commit, the new commit will simply branch out into a new line of development and the original commit(s) will remain on the previous line of development.
Yes, you probably want bookmarks for this - they're a lightweight way of marking various heads without recording the names forever in the revision (which branches do.) See BookmarksExtension for more details.
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/ may also be helpful - it's essentially the canonical document on branch management strategies in Mercurial.

Know of any source-control "stash"-like programs?

I once ran across a commercial tool for Windows that allowed you to "stash" code changes outside of source control but now I can't remember the name of it. It would copy the current version of a document to a backup location and undo your checkout in source control. You could then reintroduce your backed up changes later. I believe it worked with multiple source control systems. Does anyone know what program I'm trying to describe?
The purpose of my asking is twofold: The first is to find a good way to do this. The second is because I just can't remember what that darn program was and it's driving me crazy.
Git: http://git-scm.com/
You can use git stash to temporarily put away your current set of changes: http://git-scm.com/docs/git-stash . This stores your changes locally (without committing them), and lets you reintroduce them into your working copy later.
Git isn't commercial, but is quickly gaining many converts from tools like Subversion.
I think the product you're thinking of is "CodePickle" by SmartBear Software. However, it appears to be a discontinued product.
The term that seems to be used for the type of functionality you're looking for seems to be 'shelving'.
Microsoft's Team system has a 'shelve' feature.
Perforce has several unsupported scripts, including p4tar and p4 shelve
There are several 'shelving' tools for Subversion, but I don't know how robust they are.
I'm no git user myself, but several of my colleagues are, and they seem to like it precisely for this purpose. They use the various git wrappers to commit "real" changes to the SCM system used by their company, but keep private git repositories on their drives which they can keep their changes which they don't necessarily want to commit.
When they're ready to commit to the company's SCM server, then they just merge and commit upstream. Is that what you're looking to do?
Wouldn't it be a better idea to store your private changes in private branch, using e.g. svn switch to change to main branch whenever you need to?
Mercurial has the Shelve Extension which does what you want.
You can even select which changes from a single file that you want to shelve if you really want.
In Darcs, you either don't record the changes you want stashed (it asks you about including each change independently when you record a new patch), or put them in separate patches that you don't push upstream.
There's no need to fully synchronize your local private repos with public/upstream/other ones. You can just cherry pick the patches you want to push elsewhere. Selecting patches can also be done with patterns, so if you adopt a naming convention for your stashed patches you can push everything but them easily.
That way, your private changes are still in revision control, but they aren't shared until you want them to be.
I found an excellent article about obtaining similar functionality using Subversion branches:
Shelves-in-subversion
And then there's the old fallback... 'patch', or even the old "copy everything to another location, then revert".
Both of these are less convenient than using tools that are part of most VCS systems, though.