I'm new to Mercurial and trying to understand how things work.
I wonder: what is the difference between changesets and revisions?
Thanks.
From the Understanding Mercurial page:
When you commit, the state of the working directory relative to its
parents is recorded as a new changeset (also called a new
"revision")...
and further down the page:
Mercurial groups related changes to multiple files into single atomic
changesets, which are revisions of the whole project.
(emphasis mine)
Even though this question is old, someone might stumble on it, and I would say that there's a crucial difference. They are related, as @Edward pointed out. Still, according to Mercurial's FAQ, they are not the same.
A revision number is a simple decimal number that corresponds with the ordering of commits in the local repository.
The important part is "local repository", and further:
It is important to understand that this ordering can change from machine to machine due to Mercurial's distributed, decentralized architecture. This is where changeset IDs come in. A changeset ID is a 160-bit identifier that uniquely describes a changeset and its position in the change history, regardless of which machine it's on.
You should always use some form of changeset ID rather than the local revision number when discussing revisions with other Mercurial users as they may have different revision numbering on their system.
From experience I can tell you that revision numbers do sometimes differ between repositories and are not unique.
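To make the distinction concrete, here is a toy model (an assumption for illustration, not Mercurial's storage code): the changeset ID is a SHA-1 computed from the parent IDs and the content, so it is identical in every clone, while the local revision number is just the position at which a clone happened to append the changeset. Real Mercurial hashes the binary parent IDs plus the full changelog entry; the names `changeset_id` and `ToyRepo` are hypothetical.

```python
import hashlib

NULL_ID = "0" * 40  # stand-in ID for "no parent"


def changeset_id(parents, content):
    """Toy changeset ID: SHA-1 over the sorted parent IDs plus the content.

    Simplified assumption; the key property is the same as in Mercurial:
    the ID depends only on what the changeset is, not where it is stored.
    """
    h = hashlib.sha1()
    for parent in sorted(parents):
        h.update(parent.encode())
    h.update(content.encode())
    return h.hexdigest()  # 40 hex characters = 160 bits


class ToyRepo:
    """Append-only store: a changeset's local revision number is its index."""

    def __init__(self):
        self.log = []

    def add(self, node):
        if node not in self.log:
            self.log.append(node)

    def rev(self, node):
        return self.log.index(node)


root = changeset_id([NULL_ID], "initial import")
fix = changeset_id([root], "alice: fix parser")
feature = changeset_id([root], "bob: add feature")

# Two clones receive the same changesets, but in a different order.
repo_a, repo_b = ToyRepo(), ToyRepo()
for node in (root, fix, feature):
    repo_a.add(node)
for node in (root, feature, fix):
    repo_b.add(node)

print(repo_a.rev(fix), repo_b.rev(fix))  # 1 2  (different local numbers)
print(repo_a.log[1] == repo_b.log[2])    # True (same changeset ID)
```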
I use the "show annotation" functionality quite often. Now, I accidentally broke our SVN repository and fixed it by re-committing everything. Now, every time I use the "show annotation" function, it shows this last commit on every line.
Can I revert this somehow?
I'm assuming you didn't destroy the entire SVN repository and "solve" that by starting over from rev 1. Rather, I'm assuming some intermediate revision got corrupted and you had to touch and commit every file in a new revision, but the older revisions are still visible and accessible in the SVN history. Both the annotation feature and Plan B rely on that.
What the textbook offers
Excluding a single revision from the middle of the history is not possible. You can only exclude ranges at the head or tail of the history, by specifying a revision other than 1 for "From" or other than HEAD for "To".
Say the "repair" revision you want to exclude is r1000. To exclude it, you can choose to consider either (from-to) r1-r999 or r1001-HEAD, leaving out r1000. So you are confined to either viewing the changes before or after the repair.
You can read up on the possibilities and options of what's internally called svn blame in the SVN documentation.
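A toy model of why limiting the "To" revision helps, and what it costs. This is an assumption for illustration (svn does not store history this way, and the "From" bound is not modeled): each line maps to the revisions that changed it, and blame reports the newest of those that is not above the upper bound. The revision numbers are made up; `blame_up_to` is a hypothetical name.

```python
def blame_up_to(line_history, upper):
    """Attribute each line to the newest revision <= upper that touched it."""
    blamed = []
    for revs in line_history:
        touched = [r for r in revs if r <= upper]
        blamed.append(max(touched) if touched else None)
    return blamed


# Every line was re-committed by the "repair" revision r1000;
# the last line was changed again afterwards, in r1005.
history = [[12, 340, 1000], [57, 1000], [880, 1000], [5, 1000, 1005]]

print(blame_up_to(history, 2000))  # [1000, 1000, 1000, 1005]
print(blame_up_to(history, 999))   # [340, 57, 880, 5]
```

Note how the r1005 change disappears from the second result: with "To" set to r999 you only ever see the state before the repair.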
Plan B
Now, that's not really satisfying, I imagine. Here's something else you can try, but please create a backup of your repo first.
With the help of the SVN history viewer, or log viewer, find the last revision before the corrupted revision, say r997.
Make a branch based off that last good revision.
Then delete or move the current trunk, using the corresponding SVN commands.
In the last step, move or branch (i.e., copy) the branch back to the trunk location.
You have effectively cut out the corrupt revisions. The branch-now-trunk has a "hole" in its revision numbers, because branching off r997 created a new revision that is younger than the corrupted and repairing revisions. Afterwards, showing annotations on the new trunk will work as before, but won't include the corruption or your "repair".
Here, I made an illustration for you: [illustration not included]
This operation can screw up some ancestry operations like merging, but I've done it successfully before, even with large merging operations later on, so you might as well try it, too. Good luck!
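The Plan B steps map onto three URL-to-URL svn operations. Below is a minimal sketch that only builds and prints the commands for review. It assumes the standard trunk/branches layout; the repository URL, branch names and log messages are placeholders, and `plan_b_commands` is a hypothetical helper. It moves the old trunk aside rather than deleting it, which keeps it around in case something goes wrong.

```python
import shlex


def plan_b_commands(repo_url, good_rev):
    """Return the svn commands for Plan B as argument lists.

    Hypothetical helper: assumes trunk/ and branches/ exist at repo_url.
    """
    base = repo_url.rstrip("/")
    saved = f"{base}/branches/trunk-r{good_rev}"
    broken = f"{base}/branches/trunk-broken"
    return [
        # 1. Branch off the last good revision (URL@REV is svn's peg-revision syntax).
        ["svn", "copy", f"{base}/trunk@{good_rev}", saved,
         "-m", f"Branch trunk at r{good_rev}, before the corruption"],
        # 2. Move the current trunk out of the way (deleting it also works).
        ["svn", "move", f"{base}/trunk", broken,
         "-m", "Move corrupted trunk aside"],
        # 3. Move the good branch back to the trunk location.
        ["svn", "move", saved, f"{base}/trunk",
         "-m", f"Restore trunk from r{good_rev}"],
    ]


# Print the commands for review instead of running them.
for cmd in plan_b_commands("https://svn.example.com/repo", 997):
    print(shlex.join(cmd))
```

Once the printed commands look right (and the backup exists), each list could be passed to `subprocess.run(cmd, check=True)`.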
First off: I know that hg branches are immutable and cannot be renamed. I am also aware of the mutable branches extension for hg. But I'd prefer a different approach, as I can never be sure that all of our developers have it installed and enabled, and it's still "only" an extension.
My question: We have a repo with about 20 branches in it. Due to various reasons (inexperienced use, bad choices, experiments that became production environments) some of those branches were named badly and now our repo is a little confusing. What we'd like to do is rename a few of those branches, because obviously, the more we work with them, the more it's becoming a problem.
Do you have any suggestions? I already thought of a "tool" or some kind of script that recreates the whole repo from scratch, getting changesets of the old repo and committing them - with new branch names - to a new repo, "rebuilding" it. But before I go and waste time in writing something like that, I'd like to hear if there are other possibilities.
FYI: there are about 600 commits with frequent merges across the various branches.
You can rebuild the repository by doing a Mercurial to Mercurial conversion using hg convert. Enable the convert extension first and create a branchmap to do the mapping of branch names from old to new:
a-bad-name new-name
another-bad-name better-name
You can use that to map multiple bad names into a single good name, for example.
After the conversion, you will have a new repository with the same history, but with different branch names. The changeset hashes will thus be different and people will have to reclone (but I think you're aware of this already).
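For reference, the actual conversion is run as `hg convert --branchmap branchmap.txt old-repo new-repo`, after enabling the extension with `convert =` under `[extensions]` in your hgrc. The Python below is not part of hg convert. It is a toy sketch of what the branchmap does to each changeset's branch name, assuming branch names contain no spaces. The extra `experiment-2` entry is hypothetical and shows two bad names merging into one.

```python
def parse_branchmap(text):
    """Parse "old-name new-name" lines into a dict (blank lines ignored).

    Simplifying assumption: branch names contain no spaces.
    """
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            old, new = line.split()
            mapping[old] = new
    return mapping


def rename_branches(branches, mapping):
    """Branch name each changeset would get; unmapped names are kept."""
    return [mapping.get(name, name) for name in branches]


branchmap = """\
a-bad-name new-name
another-bad-name better-name
experiment-2 better-name
"""
mapping = parse_branchmap(branchmap)
print(rename_branches(["default", "a-bad-name", "experiment-2", "another-bad-name"], mapping))
# ['default', 'new-name', 'better-name', 'better-name']
```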
Whenever I commit, I want to save in a file the revision number of the changeset that I'm creating. I also want that file to be added to the same changeset.
Note that the revision number of the parent of the working directory is not what I want because the changeset being created will have a higher revision number. Usually it's just the parent revision number + 1, but if someone committed since the time I checked out my working directory, it may be higher.
UPDATE:
It's obviously very strange that I'd be interested in this information, since, as the comments below say, it's repo-specific and won't match what others see. However, I am the only developer, using a single repository. I find the repo revision numbers super convenient for keeping track of which code was used to generate various research results. I can see how it's not great, but it works in this specific scenario.
Obviously, I could use the hash, but that's harder to remember and use in a conversation. If I did want to use the hash, my question would still remain: how to get the hash of the changeset that's being committed.
Related:
The answers to "mercurial - I want to add some custom code to be run after commit" do not seem to achieve the desired outcome.
This article is clearly relevant, but unless I miss something, it relies on the fact that nobody committed to the same repository since the last checkout by the current user.
I'm under Windows 7, TortoiseHG, latest version.
You can probably just put this in the hook script:
TIP=$(hg id --num --rev tip)
NEXT=$(($TIP + 1))
but please do keep in mind that those numbers are almost entirely meaningless. When someone else clones that repository the revision numbers can change. Only the nodeids have any meaning outside the repository in which you looked them up.
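The reason the snippet uses tip rather than the working directory's parent is that local revision numbers are assigned densely from 0, so the next changeset always gets tip + 1. A small Python sketch of the same arithmetic, using the scenario from the question; `predicted_revision` is a hypothetical name, and like the shell version it assumes nothing else is committed between computing the number and committing.

```python
def predicted_revision(existing_revs):
    """Local revision number the next commit will get.

    Revisions are numbered densely from 0 in the order they were added,
    so the next one is tip + 1 (or 0 in an empty repository).
    """
    return max(existing_revs, default=-1) + 1


# Scenario from the question: the working directory was updated to rev 5,
# but revs 6 and 7 were committed to the repository since then.
existing = range(8)
parent = 5
print(parent + 1)                    # 6 (wrong: rev 6 already exists)
print(predicted_revision(existing))  # 8
```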
Background
We use a central repository model to coordinate code submissions between all the developers on my team. Our automated nightly build system has a code submission cut-off of 3AM each morning, when it pulls the latest code from the central repo to its own local repository.
Some weeks ago, a build was performed that included Revision 1 of the repo. At that time, the build system did not in any way track the revision of the repository that was used to perform the build (it does now, thankfully).
-+------- Build Cut-Off Time
|
|
O Revision 1
An hour before the build cut-off time, a developer branched off the repository and committed a new revision in their own local copy. They did NOT push it back to the central repo before the cut-off and so it was not included in the build. This would be Revision 2 in the graph below.
-+------- Build Cut-Off Time
|
| O Revision 2
| |
| |
|/
|
O Revision 1
An hour after the build, the developer pushed their changes back to the central repo.
O Revision 3
|\
| |
-+-+----- Build Cut-Off Time
| |
| O Revision 2
| |
| |
|/
|
O Revision 1
So, Revision 1 made it into the build, while the changes in Revision 2 would've been included in the following morning's build (as part of Revision 3). So far, so good.
Problem
Now, today, I want to reconstruct the original build. The seemingly obvious steps to do this would be to
determine the revision that was in the original build,
update to that revision, and
perform the build.
The problem comes with Step 1. In the absence of a separately recorded repository revision, how can I definitively determine what revision of the repo was used in the original build? All revisions are on the same named branch and no tags are used.
The log command
hg log --date "<cutoff_of_original_build" --limit 1
gives Revision 2 - not Revision 1, which was in the original build!
Now, I understand why it does this - Revision 2 is now the revision closest to the build cut-off time - but it doesn't change the fact that I've failed to identify the correct revision on which to rebuild.
Thus, if I can't use the --date option of the log command to find the correct historical version, what other means are available to determine the correct one?
Considering whatever history might have been in the undo files is gone by now (the only thing I can think of that could give an indication), I think the only way to narrow it down to a specific revision will be a brute force approach.
If the range of possible revisions is fairly large, and some non-date property of the build output (such as its size) changes linearly, or close to it, from revision to revision, you may be able to use the bisect command to do what is essentially a binary search and narrow down the revision you're looking for (or at least get close to it). At each revision where bisect stops for testing, you would build that revision and compare whatever property you're using against what the scheduled build produced that night. Depending on the test, it might not even require building.
If it really is as simple as the graph you depict and the range of possibilities is short, you could just start from the latest revision it might be and walk backwards a few revisions, testing against the original build.
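In Mercurial itself this is driven by `hg bisect --good REV` / `hg bisect --bad REV`, optionally automated with `hg bisect --command`. The sketch below shows the underlying binary search in plain Python, assuming a build property that never decreases from one candidate revision to the next. The build sizes are made up, and `first_at_least` is a hypothetical name; in practice each `metric` call would mean updating to the revision, building, and measuring.

```python
def first_at_least(revs, metric, target):
    """Binary search for the first revision whose metric reaches target.

    Returns that revision if its metric equals target exactly, else None.
    Assumes metric(rev) never decreases along revs; if it does, the
    result is meaningless.
    """
    lo, hi = 0, len(revs)
    while lo < hi:
        mid = (lo + hi) // 2
        if metric(revs[mid]) >= target:  # in practice: build rev, measure output
            hi = mid
        else:
            lo = mid + 1
    if lo < len(revs) and metric(revs[lo]) == target:
        return revs[lo]
    return None


# Hypothetical build sizes (bytes) per candidate revision.
sizes = {10: 1000, 11: 1010, 12: 1010, 13: 1030, 14: 1050, 15: 1060}
revs = sorted(sizes)
print(first_at_least(revs, sizes.get, 1030))  # 13
print(first_at_least(revs, sizes.get, 1020))  # None
```

When neighbouring revisions produce the same value (11 and 12 above), the search returns the first one, so you would still have to check the neighbours with a stronger test.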
As for a definitive test comparing the two builds, hashing the test build and comparing it to a hash of the original build might work. If a compile on the nightly build machine and a compile on your machine of the same revision do not produce binary-identical builds, you may have to use binary diffing (such as with xdelta or bsdiff) and look for the smallest diff.
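A minimal sketch of that comparison, assuming you already have the original build and each candidate rebuild as bytes. The exact-match check uses SHA-256. The fallback "distance" is a deliberately crude stand-in (differing byte positions plus length difference), not what xdelta or bsdiff compute; it breaks down when an insertion shifts the rest of the file, which is exactly why real binary diff tools are worth using there. The function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def byte_difference(a: bytes, b: bytes) -> int:
    """Crude stand-in for a binary diff size: differing byte positions
    plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def best_match(original: bytes, candidates: dict) -> str:
    """Return the candidate revision whose build is identical to the
    original or, failing that, the one with the smallest byte difference."""
    target = sha256_of(original)
    for rev, build in candidates.items():
        if sha256_of(build) == target:
            return rev
    return min(candidates, key=lambda rev: byte_difference(original, candidates[rev]))


print(best_match(b"abc", {"r1": b"abd", "r2": b"abc"}))           # r2
print(best_match(b"abcdef", {"r1": b"abXdef", "r2": b"XXXdef"}))  # r1
```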
Mercurial does not have the information you want:
Mercurial does not, out of the box, make it its business to log and track every action performed regarding a repository, such as push, pull, update. If it did, it would be producing a lot of logging information. It does make available hooks that can be used to do that if one so desires.
It also does not care what you do with the contents of the working directory, such as opening files or compiling, so of course it is not going to track that at all. It's simply not what Mercurial does.
It was a mistake to not know exactly what the scheduled build was building. You agree implicitly because you now log that very information. The lack of that information before has simply come back to bite you, and there is no easy way out of it. Mercurial does not have the information you need. If the central repo is just a shared directory rather than a web-hosted repository that might have tracked activity, the only information about what was built is in the compiled version. Whether it is some metadata declared in the source that becomes part of the build, a naive aspect like filesize, or you truly are stuck hashing files, you can't get your answer without some effort.
Maybe you don't need to test every revision; there may be revisions you can be certain are not candidates. The time of the compile merely gives you an upper bound on the range of revisions to test: revisions after that time cannot possibly be candidates. What you don't know is what had been pushed to the server at the time the build server pulled from it. But you do know that revisions from that day are the most likely. You also know that revisions on parallel unnamed branches are less likely candidates than linear revisions and merges. If there are many parallel unnamed branches and you know how your developers usually merge, you might know whether the revisions under parent1 or parent2 should be tested first.
Maybe you don't even need to compile if there is metadata you can parse from the source code to compare with what you know about the specific build.
And you can automate your search. A linear search would be the easiest to automate, since it needs fewer heuristics to design.
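A sketch of such an automated linear search, under stated assumptions: commit times are trustworthy (they come from the committers' clocks), anything committed after the cut-off is dropped, and commits from the last day before the cut-off are tried first, newest first. The toy history and dates are made up to mirror the question, and `matches_original_build` is a hypothetical callback standing in for "update, build, compare".

```python
from datetime import datetime, timedelta


def candidate_order(revisions, cutoff, window=timedelta(days=1)):
    """Order revisions for a linear search.

    revisions: (rev, commit_time) pairs. Commits after the cutoff are
    dropped; commits within `window` before the cutoff come first,
    newest first.
    """
    eligible = [(rev, when) for rev, when in revisions if when <= cutoff]
    newest_first = sorted(eligible, key=lambda item: item[1], reverse=True)
    # sorted() is stable, so newest-first order is kept inside each group.
    recent_first = sorted(newest_first, key=lambda item: cutoff - item[1] > window)
    return [rev for rev, _ in recent_first]


def linear_search(order, matches_original_build):
    for rev in order:
        if matches_original_build(rev):  # e.g. update, build, compare hashes
            return rev
    return None


# Toy history modelled on the question (dates are made up).
cutoff = datetime(2024, 1, 10, 3, 0)
revisions = [
    (0, datetime(2024, 1, 5, 12, 0)),
    (1, datetime(2024, 1, 9, 18, 0)),
    (2, datetime(2024, 1, 10, 2, 0)),  # committed before the cut-off, pushed after
    (3, datetime(2024, 1, 10, 4, 0)),  # the merge, after the cut-off
]
order = candidate_order(revisions, cutoff)
print(order)                                       # [2, 1, 0]
print(linear_search(order, lambda rev: rev == 1))  # 1
```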
The bottom line is simply that Mercurial does not have a magic button to help in this case.
Apologies, it's probably bad form to answer one's own question, but there wasn't enough room to properly respond in a comment box.
To Joel, a couple of things:
First - and I mean this sincerely - thanks for your response. You provided an option that was considered, but which was ultimately rejected because it would be too complex to apply to my build environment.
Second, you got a little preachy there. In the question, it was understood that because a separately recorded repository revision was absent, there would be 'some effort' to figure out the correct revision. In a response to Lance's comment (above), I agree that recording the 40-character repository hash is the 'correct' way of archiving the necessary build info. However, this question was about what CAN be done IF you do not have that information.
To be clear, I posted my question on StackOverflow for two reasons:
I figured that others must have run into this situation before and that, perhaps, someone may have determined a means to get at the requisite information. So, it was worth a shot.
Information sharing. If others run into this problem in the future, they will have an online reference that clearly explained the problem and discussed viable options for remediation.
Solution
In the end, perhaps my greatest thanks should go to Chris Morgan, who got me thinking to use the central server's mercurial-server logs. Using those logs, and some scripting, I was able to definitively determine the set of revisions that were pushed to the central repository at the time of the build. So, my thanks to Chris and to everyone else who responded.
As Joel said, it is not possible. However, there are some measures that can help you in the future:
maintain a database of nightly build revisions (date + changeset id)
the build server can automatically tag the revision it is based on (nightly/)
switch to Bazaar, which manages version numbers differently (branched versions take the form REVISION_FORKED.BRANCH_NUMBER.BRANCH_REVISION, so your change number 2 would be 1.1.1)
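The first suggestion can be as small as one SQLite table. Below is a minimal sketch using Python's built-in `sqlite3`; the table and function names are hypothetical. On the build server, the full changeset ID of the checked-out revision can be obtained with `hg log -r . --template "{node}\n"`; the node below is a placeholder.

```python
import sqlite3


def open_build_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nightly_builds ("
        " built_at TEXT PRIMARY KEY,"  # e.g. ISO 8601 date of the nightly build
        " node TEXT NOT NULL)"         # full 40-character changeset ID
    )
    return conn


def record_build(conn, built_at, node):
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO nightly_builds (built_at, node) VALUES (?, ?)",
            (built_at, node),
        )


def node_for_build(conn, built_at):
    row = conn.execute(
        "SELECT node FROM nightly_builds WHERE built_at = ?", (built_at,)
    ).fetchone()
    return row[0] if row else None


conn = open_build_log()
record_build(conn, "2024-01-10", "1f0e" * 10)  # placeholder node ID
print(node_for_build(conn, "2024-01-10"))
```

With this in place, reconstructing a build becomes a lookup followed by `hg update` to the recorded node.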
Quick question on Mercurial. Suppose my colleague and I both have an up to date copy of trunk. We both make changes, then we both push/pull the changes from each other.
I am guessing Mercurial keeps the changes in order based on the date of the commit (since there is no incrementing revision, just a GUID). So what happens if my computer's date is one day behind and I commit half a day after my colleague. Would my change show up half a day before my colleague's?
@Martijn has the correct answer, but it doesn't seem to be clicking. If this makes anything clearer, pick his:
The commit times on Mercurial changesets have absolutely nothing to do with how they're merged, interleaved, or combined
Mercurial and other DVCSs do all history tracking based entirely on the DAG (Directed Acyclic Graph), which means:
the only piece of metadata that matters is the parent or parents of a changeset
Before you commit you can see what the parent of your changeset is going to be by typing hg parents and once you commit you can see it in the hg log.
In your example, if you've pulled your colleague's changesets and updated your working directory to reflect them (hg pull ; hg update), then his or her changeset will be the parent of your changeset. If you haven't pulled/updated to reflect his or her changeset, then both of your changesets will have the same parent (they'll be siblings), and when they're merged neither will take precedence over the other in any way.
The ordering when you do a hg log is determined by the order in which the changeset arrived in your local repository, and can differ from repo to repo depending on where they've pulled from first -- that's why you can't use the integer numbers shown next to changesets for cross-repo operations -- only the hash is global.
In short the date is never consulted in normal operations and is purely metadata with no more relevance than the author or commit description.
Indeed, your changesets are timestamped (internally stored as seconds since the UNIX epoch plus a timezone offset), and when displaying changesets, they are ordered by timestamp, so your changesets will be displayed in the wrong place because their timestamps will be incorrect. See https://www.mercurial-scm.org/wiki/ChangeSet
This is not really much of a problem, though; timestamps have no bearing on how your changes are merged with those of your colleague. Only the changeset IDs matter here, and merging uses a common ancestor: Mercurial follows the graph of your changesets back, looking for a changeset ID that your colleague also has, and then merges your changes and his from there on. See https://www.mercurial-scm.org/wiki/Merge
So, merges are not affected, only the display of changesets, where your changesets will be sorted into the wrong location. It'll confuse you and your colleague, but your changes themselves are safe.
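A toy illustration of finding the common ancestor from parent links alone. This is a simplified sketch, not Mercurial's actual ancestor algorithm: it collects the common ancestors of both heads and keeps the ones that are not ancestors of another common ancestor. In criss-cross histories this can yield more than one candidate, and Mercurial has its own rules for choosing in that case, which are not modeled here. The graph is made up, and commit dates are deliberately absent because they play no part.

```python
def ancestors(parents, node):
    """All ancestors of node, including node itself."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Toy graph: parent links only, no timestamps.
parents = {
    "root": [],
    "base": ["root"],
    "mine-1": ["base"],
    "mine-2": ["mine-1"],
    "theirs-1": ["base"],
}
print(merge_bases(parents, "mine-2", "theirs-1"))  # {'base'}
```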
The order of changesets is not affected by the timestamp in the changesets. The order is determined solely by the order in which the changesets were created in your local repository.
Mercurial has an append-only architecture so when you pull from another repository, the changesets you pull in are placed after your own changesets, regardless of the timestamps in the new changesets.
The revision number (the local integer counter) is just a counter: each new changeset gets the next available revision number. The simple range operator in Mercurial works directly on revision numbers, so X:Y means "give me changesets with revision numbers from X to Y (inclusive)". This is not very useful, since the revision numbers reflect only the order in which changesets were created in or pulled into a repository.
Use X::Y instead, that gives you changesets that are descendants of X and ancestors of Y, i.e., it follows the branches in the history.
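To see the difference, here is a toy model of both operators on a small made-up graph (an illustration, not Mercurial's revset implementation). Revision numbers follow local arrival order, so a parent always has a lower number than its children. In a real repository you would simply run `hg log -r "1:3"` versus `hg log -r "1::3"`.

```python
# Toy graph (made up): revision number -> parent revision numbers.
parents = {
    0: [],
    1: [0],     # your commit
    2: [0],     # colleague's sibling commit, pulled in later
    3: [1],     # your next commit
    4: [3, 2],  # merge
}


def ancestors(rev):
    """rev and everything reachable through parent links."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def descendants(rev):
    """rev and everything that has it as an ancestor."""
    found = {rev}
    for r in sorted(parents):  # parents are always visited before children
        if any(p in found for p in parents[r]):
            found.add(r)
    return found


def range_op(x, y):
    """X:Y, revision numbers x..y inclusive (assumes x <= y)."""
    return list(range(x, y + 1))


def dag_range(x, y):
    """X::Y, descendants of x that are also ancestors of y."""
    return sorted(descendants(x) & ancestors(y))


print(range_op(1, 3))   # [1, 2, 3]  includes the unrelated sibling 2
print(dag_range(1, 3))  # [1, 3]
```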