Recording transition of code between two files using Mercurial - version-control

I organized code incorrectly in a game I'm developing, and intend to move the state update code from the GameView class into the Level class. I would like to record this cut-and-paste in some way. I use Mercurial for versioning. Is this possible with Mercurial? Does any other VCS provide this feature?
Reading a bit more and watching Linus' talk about git at Google, as well as reviewing answers and comments, I understand that this is a feature of git's blame command and that it works via heuristics.
I could get this functionality by using hg-git, exporting the Mercurial changesets, and then just using the git blame -M -C command. Is there an easier way that does not involve git?
If there is no easier way, I'll accept an existing answer that mentions git and best describes how to use that functionality.

git does this automatically. See How does Git track history during a refactoring?
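For example (a sketch; the file name comes from the question, and these are standard git options), blame's copy detection will attribute moved lines to the file they came from:
git blame -C -C Level.ext      # -C given twice widens the search for moved/copied lines to other files
git log --follow -p Level.ext  # --follow continues a file's history across renames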

If the level class doesn't already exist you can do it with:
hg copy GameView.ext Level.ext
and then delete from GameView whatever you're moving to Level, and modify Level to reflect the correct name and exclude everything that's staying in GameView.
If Level already existed I don't think there's any good way to do it unless you're willing to extract that code out into its own class that could start out as a copy of GameView and be included (via #include, or composition, or extension) in Level.
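A quick sketch of the whole sequence (using the file names from the question): because hg copy records the copy in the changeset, later history queries can follow it.
hg copy GameView.ext Level.ext
hg log --follow Level.ext   # history reaches back through GameView.ext via the recorded copy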

I don't think that Mercurial tracks file renames/moves explicitly, at least it's not a part of a changeset (although it can guess where a particular file came from based on its content).
That being said, I'm afraid that I'm not aware of any VCS that tracks movement of code between files, just addition to the target and subtraction from the source.

Mercurial supports file rename detection as follows:
hg addremove -s 100
The -s means "similarity" and the 100 means 100%. It will look for files whose name has changed but whose content remains identical.
I quite often use this command with an 85 or 90% similarity figure. And in combination with the -n switch, which allows a dry run (i.e. do nothing but report), it can be very powerful.
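For example (a sketch; pick whatever threshold suits your project), dry-run first, inspect the report, then run it for real:
hg addremove -s 90 -n   # dry run: report which add/remove pairs would be recorded as renames
hg addremove -s 90      # record them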
I don't think detection of actual moved code is really possible, though.

What I do for this sort of situation is isolate the changes from my other development, then commit the movement in a single commit with a note such as "moved function munge() between files".
To my knowledge, no VCS tracks the movement of data. Renames are trackable with git and ClearCase.

Related

Mercurial from MPLAB: what are the very first steps?

I have never used a version control system before, and am rather confused about the very first steps. In a personal PIC/MPLAB programming project, under Linux, (no one else involved), with a single source code file that will go through several development stages as I add and verify features, I want to keep the set of "working so far" source code files, and Mercurial looks like a very good way to do it, preferable to my usual, and error prone, ad hoc approach.
So I have a very elementary question: I have installed Mercurial and accessed it from within MPLAB, and I have created a file
"/home/Harry/MPLABXProjects/flash675.X/.hg"
Please, I want to know 1) what to do next to start it off and 2) is it going to be obvious how I go about using the Mercurial system from then on? I find the documentation confusing on this very basic point.
(Yes, the first stage is just flash an LED, using a TMR0 interrupt, and that is working ok; I will then use that as a "prover" that the hardware is still working when I get to the inevitable "nothing happening" situations. I am building an ammeter for a p.s.u. with a Hall effect transducer, so I later will be adding an 8-LED serial display for debugging, a 12-bit A/D converter and a 4-digit 7-segment display, all using I2C serial control)
A good answer to this question would be the size of a full tutorial. There are many existing tutorials on the Internet, http://hginit.com/ for example. You can walk through it, but I suggest looking at Mercurial: The Definitive Guide (link). The beginning of the book takes the form of a tutorial, so it won't take a lot of time either.
The best helpers in learning are practice and experimentation. Just don't forget to keep a clean copy of your repository; it will save you when you break one of the repository's clones. Here is a quick instruction so my answer won't be too philosophical.
Create your "prover". For example, a file called prover.c.
Look at what you have: hg status. You will see your file with a question mark, meaning it isn't under version control yet.
Add it to the repository: hg add prover.c. This registers the file in the repository; after you commit, Mercurial will start tracking changes to it.
Commit: hg commit. You will be asked to enter a commit message. Good messages will help you make sense of the history in the future.
That's it. If you change prover.c now and run hg status, you will see that Mercurial knows about the changes. You can ask Mercurial to show them by running hg diff.
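Put together, the whole session might look like this (a sketch; the file name and commit message are just the examples from above):
hg status                          # shows "? prover.c": not yet tracked
hg add prover.c                    # schedule it for addition
hg commit -m "Add working prover"  # -m supplies the commit message inline
hg diff                            # after further edits, shows what changed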

Starting to version an already medium size project

I am about to start participating in the development of a medium-sized project (~50k lines) that was until now written by a single person, and not versioned; as a result folders are cluttered with different versions of the same file (named file1, file2, file3, etc.).
I proposed to start using a VCS for it (a priori Mercurial, which is the only one I've ever used, for my personal projects, but I'm open to suggestions), so I'm taking any good ideas as to how to "start" the repository. E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed? Or something else?
(Constructive remarks on Mercurial vs. Bazaar vs. Git vs. whatever are also welcome.)
Thanks for your tips.
E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed?
If the size of the repository is not a concern, then yes, that is a good starting point. Otherwise you can just commit what's actually used, and go from there.
As for which system, all DVCSes stick to the same core principles. Which one you pick is entirely subjective — the only way to truly know which one you like is to try each one.
I would say use what you are most comfortable with and what meets your needs. As far as where to start, I would personally seed the repo with the current source as-is; that way you can verify that everything builds and runs as expected. You can make this initial seed a branch, so you can always go back to your starting point before refactoring.
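A minimal way to do that initial seeding might be (a sketch, run at the project root; the commit message is arbitrary):
cd myproject
hg init
hg addremove                                    # add every existing file
hg commit -m "Initial import of existing code"  # a baseline you can always return to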
My approach to this was:
create a Mercurial repository in the existing project folder ("existing")
commit all project files to "existing"
create an empty repository in a different location ("new")
As files were tested and QA'd (this was necessary because there was so much dross in "existing"), pull them from "existing" into "new".
Once files had been pulled into "new", delete the corresponding files from "existing". If access is needed to those files while the migration is under way, push them back from "new" to "existing".
This gave me the advantage of putting everything under some sort of control for recovery purposes, and control over introducing the project to the DVCS. Eventually everything in the existing project folder had been tested and approved, and the project moved forward. At that point "existing" could be deleted or turned into a working folder, and "new" became the actual project folder.
I think Mercurial is a good choice. Lightweight, fast, very simple to use and well-integrated with Windows (if that's the platform you're dealing with).
I would probably get rid of all the clutter before the first commit. Delete everything you don't care about, run all the necessary tests and only then do the commit.
Yes, I'm dead set against the 0-day cluttering of repos.
Granted, a 50K SLOC project isn't very big, but if you commit files you already know you won't need, they will make your repo slightly bigger.
Also, remember to check that the tree doesn't contain large binary files. If it does, get rid of them if at all possible.

Working with folders in RCS

I have been following the tutorial http://www.burlingtontelecom.net/~ashawley/rcs/tutorial.html on how to work with files using RCS. This works well but only with one file. Is there a way to create an RCS file with directories as well?
I have a project folder called myproject, and in this directory I have all my files for that project. I want to create a revision control system for the myproject folder and all its files that are inside.
As William's comment says, RCS only works with single files. (It also doesn't seem to be particularly suitable for multiple-user stuff.)
Of course, nothing stops you from putting each (source) file in a directory under RCS control; in fact, this is essentially what CVS does (though in recent versions it handles the RCS data itself, rather than invoking RCS to do it as it used to do). Unfortunately, this fragments the change history rather badly; a commit affecting many files ends up as separate commits to each file, which just happen to have the same commit message (and timestamp?), and in general every file will have a different revision in what the user might like to think of as the "same" revision. (This makes tags quite essential.) CVS also has issues with the atomicity of commits: you could end up with commit A and commit B getting tangled up, such that in file foo commit A precedes commit B, but in file bar commit B precedes commit A!
SVN (Subversion) is an attempt to rectify some of the problems in CVS, though it also brings some new limitations, and keeps many of the existing ones; it is probably wiser (as William implies) to just use a distributed version control system (DVCS) for your multi-file projects. There are many choices:
Darcs uses a unique patch-based model: a repository is treated as a sequence of patches, which can be applied to an empty tree to build the current revision; patches can often be reordered by "commuting" pairs of patches, and cherry-picking patches from other repositories is quite easy. The downside is that the change history is a bit less clear than in most DVCSes. See http://wiki.darcs.net/Using/Model, http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory.
Directed-acyclic-graph (DAG) based DVCSes model a repository as a directed acyclic graph of revisions, where each revision can have one parent, two parents, or perhaps more. Each revision has an associated file tree state; sometimes renames are also tracked somehow.
Git, as already mentioned. Has a very simple model, but a very complicated interface: there are many commands, some of which are not really intended for humans to use (owing to many parts of it having been prototyped in shell script, probably), so it can be hard to find the ones you want. Also, its model might be a bit too simple: it doesn't track renames at all.
Bazaar (a.k.a. bzr) has a more complicated model, including support for file/directory renames. It's difficult to say how much more complicated, though, because whatever documentation may exist is not nearly as accessible as Git's. It does, however, have a rather simpler user interface, and there are a number of useful plugins, including a distributed-development-friendly SVN plugin: committing from a branch back to SVN need not interfere with the validity of others' branches of your branches, and bzr metadata is even committed back to SVN. This can make things much less painful if you want to start hacking on an SVN-based project without having commit access, but hope to get your changes committed eventually. Bazaar is my personal favorite DAG-based DVCS.
Mercurial (a.k.a. hg) seems fairly similar to Bazaar, though I think it tracks renames only for individual files, not for directories. It also supports plugins, though its SVN plugin isn't as nice as Bazaar's: it doesn't support lossless commits, so branching from other peoples' branches is unwise. I don't have much experience with it, so I can't really evaluate it in-depth.
As the comments already mention, if you are starting out with version control, you would be well advised to choose a newer system than RCS (git, mercurial, fossil, subversion, ...). That said, RCS still works fine for a single developer working primarily on a single machine - I still use it for my own code because I've not yet worked out how to get the (20+ years of) history I want into git in the way I want it.
Anyway, to use RCS, make sure you have an RCS sub-directory in each directory where you have working source code under RCS management. The RCS files will be placed in the sub-directory automatically, and retrieved automatically. If your version of make is not already aware of RCS, then you can train it so that it is - or get a version of make that does (GNU Make, for example).
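A bare-bones RCS cycle looks like this (a sketch; file.c is a placeholder name):
mkdir RCS        # RCS stores its ,v history files here automatically
ci -u file.c     # check in; -u keeps a read-only working copy in place
co -l file.c     # check out locked (writable) for editing
ci -u file.c     # check in the new revision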
TL;DR - Look into DCVS as an alternative to RCS. It uses CVS, which uses RCS, but it's more modular, working with a repository that is distributed as well as having a hierarchy of directories.
I'm currently going through a similar issue, and may have found something worthy of note, especially for people who are being forced to use a light, command-line-based revision control system with multiple team members.
My manager will not get off this idea of using RCS as our version control. But for the specifications, he wants developers to be able to create and edit on their own repository on a localized server within our company. Two issues with this:
RCS does not create or hold any sort of 'repository'. It is software that keeps track of file edits on a per-file basis, meaning that the 'repository' is nothing more than another directory with RCS checked-in files. This is sub-par for team-oriented projects, to say the least.
On a large project with multiple directories and tens of individual working files, even the prospect of creating a top-level RCS directory with a symbolic link in the working directories gives rise to complications such as naming conventions, as well as forgetting which file came from which bottom-level / working directory.
With what SamB posted, even CVS introduces additional problems on top of RCS that we now have to account for, but it gives us a slight ability to add some hierarchy. But one suggestion he forgot was DCVS.
It's nothing more than an extension of CVS and CVSup, and it:
contains functionality to distribute CVS repositories with local lines of development and automatically handles synchronization of the distributed repositories in the background.

Is the Mercurial .hgignore my only option for handling hundreds of temp files generated when compiling?

I've been all over google and SO looking for someone who has asked this question, but am coming up completely empty. I'll apologize in advance for the lengthy round-about way of asking the question. (If I was able to figure out how to encapsulate the problem, maybe I would have been successful in finding an answer.)
How are large projects managed in Mercurial, when the act of building / compiling generates hundreds of temporary files in order to create the end result? Is .hgignore the only answer?
Example Scenario:
You have a project that wants to use some open source package for some feature, and needs to compile it from source. So you go get the package, unpack the .tgz, and then slap it into its own Mercurial repository so you can start tracking changes. Then you make all your changes and run a build.
You test your end result, are happy with the results, and are ready to commit back to your local clone of the repository. So you do an hg status to check your changes prior to committing. The hg status output causes you to immediately start using all those words that would make your mother ashamed, because you now have screens and screens of "build cruft".
For the sake of argument, say this package is MySQL or Apache: something that
you don't control and will be changing regularly,
leaves a whole lot of cruft behind in a whole lot of places, and
there is no guarantee the cruft won't change each time you get a new version from the external source.
Now what? The particular project causing this angst is going to be worked on by multiple developers in multiple physical locations, so it needs to be as straightforward as possible. If there is too much involved, they're not going to do it, and we'll have a bigger problem on our hands. (Sadly, some old dogs are not keen on learning new tricks...)
One proposed solution was that they would just have to commit everything locally before doing a make, so they have a "clean slate" they would then have to clone from to actually do the build in. That got shot down as (a) too many steps, and (b) not wanting to cruft up the history with a bunch of "time to build now" changesets.
Someone else has proposed that all the cruft just be committed into the Mercurial repository. I am strongly against that because then the next time around those files will turn up as "modified" and therefore be included in the changeset's file list.
We can't possibly be the only people who have run into this problem. So what is the "right" solution? Is our only recourse to try to create a massively intelligent .hgignore file? This makes me uneasy, because if I tell Mercurial to "ignore everything in this directory I haven't already told you about", then what happens if the next applied patch adds files into that ignored directory? (Mercurial will never see that new file, right?)
Hopefully this is not a completely stupid question with an obvious answer. I've compiled things from source many times before, but have never needed to apply version control on top of that. Plus we're new to Mercurial.
Two options:
The best option is to do an out-of-tree build, if you can. This is a build where you place the object files outside of the source tree. Some build systems, such as CMake, support this directly. For other systems, you need to be lucky, since the upstream project must have added support for this in their Makefile or similar. (A sketch follows these two options.)
A more general option is to tell Mercurial to ignore specific types of files, not entire directories. This works well in my experience.
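As an illustration of the first option, here is what an out-of-tree build typically looks like (a sketch, assuming the upstream project uses CMake; the directory name is arbitrary):
mkdir build
cd build
cmake ..        # generate build files outside the source tree
make            # all object files land under build/
With this layout, the only .hgignore entry you need is build/* (or you can place the build directory outside the repository entirely).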
To test the second option, I wanted to compile Apache. However, it requires APR, so I tested with that instead. After checking in a clean apr-1.3.8.tar.bz2 I did ./configure; make and looked at the output of hg status. The first few patterns were easy:
syntax: glob
*~
*.o
*.lo
*.la
*.so
.libs/*
The remaining new files look like they are specific files generated by the build process. It's easy to add them too:
% hg status --unknown --no-status >> .hgignore
That also added .hgignore itself, since I hadn't yet scheduled it for addition. Removing that line, I ended up with this .hgignore file:
syntax: glob
*~
*.o
*.lo
*.la
*.so
.libs/*
.make.dirs
Makefile
apr-1-config
apr-config.out
apr.exp
apr.pc
build/apr_rules.mk
build/apr_rules.out
build/pkg/pkginfo
config.log
config.nice
config.status
export_vars.c
exports.c
include/apr.h
include/arch/unix/apr_private.h
libtool
test/Makefile
test/internal/Makefile
I consider this a quite robust way to go about this in Mercurial or any other revision control system for that matter.
The best solution would be to fix the build process so that it behaves in a 'nice' manner, namely allowing you to specify a separate directory to store intermediate files in (which could then be completely ignored via a very simple .hgignore entry, or not even be within the version-controlled directory structure at all).
For what it's worth, I've found that in this situation a smart .hgignore is the only solution that has worked for me so far. With the inclusion of regular expression support, it's very powerful, but tricky, too, since a pattern that is cruft in one directory may well be source in another.
At least you can check in the .hgignore and share it with your developers. That way the work is only done once.
[Edit] At least, however, it's possible, as noted above by Martin Geisler, to have full path specifications in your .hgignore file; you can, therefore, have test/Makefile in the .hgignore and still have Mercurial notice a new test2/Makefile.
His process for creating the file should give you almost what you want, and you can tune it from there.
One option you have is to clean your working directory after verifying a build.
make clean
hg status
Of course you may not want to clean your project if it takes more than a few minutes to build.
If the files you want to track are already known to hg, you can hgignore everything. Then you need to use hg import to apply patches, rather than just using the patch command (since hg needs to be aware of any new files that should be tracked).
How about a shell (or whatever) script that walks your build directory recursively, finds every file created after your build process started running, and moves all those files (with exceptions you can specify, of course) into a cruft_dir subdirectory? Then you can just put cruft_dir/* in .hgignore.
EDIT: I forgot to add, though it's fairly obvious, that this script should run automatically as soon as your build finishes; maybe it's even called as the last command in your Makefile/ant/whatever file. A rough sketch is below.
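Such a script might look like this (hypothetical; it doesn't handle name collisions when two cruft files share a basename, and you would extend the exclusion list to taste):
#!/bin/sh
touch .build_start    # marker: everything newer than this is build output
make
mkdir -p cruft_dir
# Move files created/modified during the build, skipping repository
# metadata and cruft_dir itself.
find . -type f -newer .build_start \
    -not -path './.hg/*' -not -path './cruft_dir/*' \
    -exec mv {} cruft_dir/ \;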

Know of any source-control "stash"-like programs?

I once ran across a commercial tool for Windows that allowed you to "stash" code changes outside of source control but now I can't remember the name of it. It would copy the current version of a document to a backup location and undo your checkout in source control. You could then reintroduce your backed up changes later. I believe it worked with multiple source control systems. Does anyone know what program I'm trying to describe?
The purpose of my asking is twofold: The first is to find a good way to do this. The second is because I just can't remember what that darn program was and it's driving me crazy.
Git: http://git-scm.com/
You can use git stash to temporarily put away your current set of changes: http://git-scm.com/docs/git-stash . This stores your changes locally (without committing them), and lets you reintroduce them into your working copy later.
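In practice (these are standard git-stash subcommands):
git stash        # set aside all uncommitted changes
git stash list   # review what has been stashed
git stash pop    # reapply the latest stash and drop it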
Git isn't commercial, but is quickly gaining many converts from tools like Subversion.
I think the product you're thinking of is "CodePickle" by SmartBear Software. However, it appears to be a discontinued product.
The term that seems to be used for the type of functionality you're looking for is 'shelving'.
Microsoft's Team System has a 'shelve' feature.
Perforce has several unsupported scripts, including p4tar and p4 shelve
There are several 'shelving' tools for Subversion, but I don't know how robust they are.
I'm no git user myself, but several of my colleagues are, and they seem to like it precisely for this purpose. They use the various git wrappers to commit "real" changes to the SCM system used by their company, but keep private git repositories on their drives where they keep the changes they don't necessarily want to commit.
When they're ready to commit to the company's SCM server, then they just merge and commit upstream. Is that what you're looking to do?
Wouldn't it be a better idea to store your private changes in a private branch, using e.g. svn switch to change to the main branch whenever you need to?
Mercurial has the Shelve Extension which does what you want.
You can even select which changes from a single file that you want to shelve if you really want.
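A quick sketch of its use (assuming a Mercurial version where shelve ships with the distribution; in older versions it was a third-party extension). Enable it in your .hgrc:
[extensions]
shelve =
Then:
hg shelve          # set aside working-directory changes
hg shelve --list   # show existing shelves
hg unshelve        # restore the most recent shelve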
In Darcs, you either don't record the changes you want stashed (it asks you about including each change independently when you record a new patch), or put them in separate patches that you don't push upstream.
There's no need to fully synchronize your local private repos with public/upstream/other ones. You can just cherry pick the patches you want to push elsewhere. Selecting patches can also be done with patterns, so if you adopt a naming convention for your stashed patches you can push everything but them easily.
That way, your private changes are still in revision control, but they aren't shared until you want them to be.
I found an excellent article about obtaining similar functionality using Subversion branches:
Shelves-in-subversion
And then there's the old fallback... 'patch', or even the old "copy everything to another location, then revert".
Both of these are less convenient than using tools that are part of most VCS systems, though.