How does Bazaar handle file renaming?

I was curious to know more about how different version control systems track file renames in a repository, especially in the case of a merge. On this question, comparing Git, SVN, and Mercurial's strategies for file renaming, someone posted a blog post whose author claims that Bazaar's file rename strategy is much more robust than that of any other VCS. The author states that Bazaar "treats renames as a first class operation".
What the author didn't explain was how this works, what it means to treat renames as a first class operation, and why Bazaar's strategy is better than, e.g., Git's "best guess" rename detection algorithm.
I have no experience of Bazaar, but I'd like to know:
How does it handle file renaming?
What makes its algorithm more reliable than those of other popular VCSes (if anything)?
I couldn't find this information easily from Bazaar's own docs.

Preface
Bzr is dead
A text from 2007 without supporting facts is worthless.
Facts
In bzr or hg you have two things: the file object (which doesn't change across a rename) and the filename (which does). Both applications keep this information, but the difference is in the implementation. In hg, the primary storage of a file object depends on the filename, with a pointer to the previous file object, i.e. the pair (filename, revision), whenever the previous file object doesn't have the same filename. In bzr, the file object is stored directly on disk.
Source: https://www.markshuttleworth.com/archives/123#comment-110483
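To see the two models side by side, here is a minimal sketch (file names are hypothetical; hg debugrename and bzr file-id are low-level inspection commands):

hg mv old.txt new.txt && hg commit -m rename
hg debugrename new.txt   # shows the (old filename, revision) pointer hg stores

bzr mv old.txt new.txt && bzr commit -m rename
bzr file-id new.txt      # shows the file id that survives the rename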
Answers
Bzr record(ed) renames in the repository as a single "rename" transaction; in hg it's (still) "remove" + "add", also recorded in the revlog, which gives the same result: traceability of the whole history of every object.
In 2022, Bzr's rename handling isn't "more reliable" than Hg's, but both are better than Git's guessing (simply because any guessing, as opposed to a recorded event, may be wrong in edge cases).
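For comparison, Git stores a rename as a plain delete + add and reconstructs it afterwards by content similarity. A minimal sketch (file names hypothetical):

git mv old.txt new.txt && git commit -m rename
git log --follow -- new.txt       # --follow re-runs the similarity heuristic
git diff -M50% --summary HEAD~1   # -M sets the similarity threshold for guessing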
Summary
In 2022 you have no reason to prefer Bzr (or its successor Breezy), Darcs, or Pijul over Hg purely for rename capabilities.

Related

How to recover Perforce history on a moved directory

I have a branch in Perforce where I changed the directory structure of the project using the Rename/Move command.
During the merge back to the mainline, something went wrong that caused Perforce to treat the new structure as whole new directories.
Subsequently, the history of the files in the new directory structure is totally unrelated to the history of the same files before changing the structure.
Is there any way to recover from this situation, or to ask Perforce to append the old history to the new history?
Something went wrong that caused Perforce to treat the new structure as whole new directories.
Usually if this happens it means someone didn't use the "rename/move" command and used some other method to rename instead (i.e. they did something that adds the new directory as a new set of files independent of the originals rather than an atomic rename of an existing set of files). It's impossible for me to say how to "recover" without seeing what the history of the files looks like now so I can reverse-engineer what the "something went wrong" was.
I'd recommend either posting on the Perforce forums or contacting Perforce technical support so that somebody with expertise can wheedle the necessary data out of you and propose a solution. (I can intuit that this will require an amount of back and forth that Stack Overflow frowns on -- "what were the branches you were merging from and to?", "okay, now run THIS command to see the history of that branch and send me the output", "okay, which of these five merge operations I can see in the history is the one you're talking about?")
From another answer:
So, for a file a/b/c, you can look at the history by using the -i option where appropriate. For example, p4 filelog -li a/b/c.
This is not necessary if files are renamed via the "move/rename" command, so if you need to use "filelog -i" to see file history, the files were definitely renamed by some other method. (The "p4 move" command was added in 2009 so long-time Perforce users will sometimes use other workflows.)
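For reference, the atomic workflow looks roughly like this (a sketch with a hypothetical path; the file must be opened for edit before p4 move):

p4 edit old/dir/name.c
p4 move old/dir/name.c new/dir/name.c
p4 submit -d "Rename via p4 move"
p4 filelog -li new/dir/name.c   # -i follows history across rename/branch points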

Search code history for closest matching version based on content

I have a file that was forked from a project at an unknown moment in the past. I want to identify as closely as possible the moment of that fork. The file has been changed since the fork-moment.
Winmerge highlights about 20% of the lines, with about half of those being just a few characters within the line: a path change, or an inline function turned into a variable or function call, for instance. (That is 20% after ignoring whitespace changes and enabling moved-block detection; closer to ~40% without.)
I don't have to worry about branches; the original version control system was CVS. (I don't have access to the CVS file system.) I have a git-imported version with tags corresponding to the CVS commits, and could generate the same with Mercurial for little effort if need be.
I don't care about matching the specific CVS commit date/time/number/whatever. The goal is to identify when the content of the new file started drifting, then step forward through the revision history, cherry-picking what to merge into the forked file.
For this project I could brute-force it: there are only a dozen or so revisions where the fork most likely occurred, and the file is less than 500 lines. However, it's not hard to imagine a scenario where this is not feasible, and I'm curious about what an elegant solution might be.
How would you go about solving this?
"Brute force" sounds as if you were contemplating testing all revisions. Normally one would use a binary search. To decide if it was a good match, I'd normally use just the numbers from diffstat (since you say there are post-fork changes). Accounting for block-moves complicates things, though.

tracking file-name version control in a real version control system?

Is there an established method to tell the SCM, Mercurial in my case, that files matching the pattern foobaz_1_2_3.csv should all be considered versions of foobaz.csv?
In my application I rely on data tables from an external source that puts the version number in the filename. The importance of tracking changes across their versions was made painfully sharp recently when I spent days troubleshooting a bug on my side of the fence, only to discover it was caused by a change in their data content, and notification of said change never reached me.
If the filename were constant, Hg would have informed me immediately of the internal change and I could have responded appropriately in an hour or two, with very little stress. I could just adopt the habit of renaming foobaz_2_3_4 to foobaz myself before checking in, or of running diff old new, and one or both of those is likely what I'll do from now on.
The whole experience has me wondering, though, whether there might be other methods I've not thought of that don't mess with the external file. (For example, what if I have a downstream user who doesn't use SCM and relies on the filename + version number, which I've thrown away?)
If you receive data in a file whose name changes every time and whose contents may also change, you can:
Store the data file under version control (Mercurial is OK)
Replace the old file with the new one every time
Run hg addremove -s nnn (see hg help addremove), which will detect the likely rename and record the new file as a continuation of the old file's history
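A minimal sketch of that workflow, using the filenames from the question (the similarity threshold of 90 is an arbitrary choice):

cp /incoming/foobaz_2_3_4.csv .    # the new version arrives beside the old one
rm foobaz_1_2_3.csv
hg addremove --similarity 90       # files >=90% similar are recorded as renames
hg commit -m "Import foobaz 2.3.4"
hg log --follow foobaz_2_3_4.csv   # history now spans both filenames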

Multiple repositories in one directory (same level) - is it possible?

My original problem is that I have a directory where I write various scripts. Each of them is independent of others, and usually one-file-long. I want to have some versioning applied to them, but I have the following problems/requirements:
I don't want to have to store each small script in a separate directory!
I don't want to store them all in one repository OTOH, as they are completely unrelated, and:
some of them may later grow to more files (and then they will need a separate dir),
I sometimes want to copy one of them to a different machine (and I want to clone the whole repo).
I want to benefit from (distributed) version control mechanisms -- at least:
"infinite" number of revisions,
ability to clone repositories on different computers,
ability to do "atomic" multi-file commits.
Is it possible?
I'd prefer to do it in some mainstream distributed VCS (a solution using Mercurial would be preferable, but I'm not set on it).
EDIT: the solution has to be free (at least "as in beer") and cross-platform (at least Win32 & Linux).
Related, but didn't help:
"two-git-repositories-in-one-directory" -- didn't find it helpful: the accepted answer looks like point 2. (above) to me; the current "community voted" answer sounds like 1.
"Version control of single files using Subversion" -- also too much of 2. or 1.
These requirements seem pretty "special" to me, so here is a solution on par with them ^^
You may use two completely different VCSes in the same directory. Even two "instances" of SVN might work: SVN stores its metadata in a directory called .SVN and has (for historical reasons regarding ASP.NET) the option to use _SVN instead. The directory listing would look like this:
.SVN // Metadata for rep1
_SVN // Metadata for rep2
script1 // in rep1
script2 // in rep2
...
Of course, you will need to hide or ignore the foreign scripts or folders from each VCS...
Added:
This only accounts for two scripts in one folder and needs one additional VCS per script beyond that, so if you do consider this route and need more repositories, rename each metadata dir and use a script to rename it back before updating:
MOVE .SVN-script1 .SVN
svn update
MOVE .SVN .SVN-script1
Why don't you simply create a separate branch (in the git sense) for each (group of) script(s)?
You can develop them individually as you please. Switching to a branch will show you only the scripts from that branch. It's sort of like directories, but managed by the version control system. If you later want to pluck a branch out into another repository, you can do that, and if you want to combine two scripts into a single project, you can do that as well. The copying-to-a-different-machine point might be a problem, but you can clone just the branch you're interested in, and it should work for you.
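A sketch of the branch-per-script idea (branch and file names are made up):

git checkout --orphan script1   # start an unrelated history for this script
git rm -rf .                    # clear the index inherited from the old branch
cp ~/drafts/script1.sh . && git add script1.sh
git commit -m "Import script1"
# On the other machine, fetch only that script's history:
git clone --branch script1 --single-branch ssh://host/path/repo.git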
Another proposition for my own consideration is the "Using Convert to Decompose Your Repository" article on hgtip.com. It fails as a "standalone" solution, but could be helpful as an addition to the "mv .hgN .hg / MOVE .SVN-script1 .SVN" idea.
You can create multiple hidden repository directories and symlink .hg to whichever one you want to be active. So if you have two repositories, create directories for them:
.hg_production
.hg_staging
Then to activate either of them just do:
ln -sf .hg_production .hg
You could easily create a bash command to do this. So instead you could write something like activate-repo production, which would run ln -sf .hg_production .hg.
Note: the BSD ln on macOS follows an existing .hg symlink into the target directory instead of replacing it, so there you'll need to do:
rm .hg; ln -s .hg_production .hg
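A tiny helper along these lines (a sketch; the activate-repo name and the .hg_* layout come from this answer):

#!/bin/sh
# activate-repo: point .hg at one of several hidden repo dirs.
# Usage: activate-repo production
rm -f .hg            # portable: remove the old symlink first
ln -s ".hg_$1" .hg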
I can only think of these two lightweight versioning systems:
1) Using Dropbox with the Pack-Rat upgrade, to keep a full history of versions for each file automatically backed up and with the possibility to be shared with multiple Dropbox users: https://www.dropbox.com/help/113
If you have multiple machines managed by the same user (you), the synching would be automatic. Also if the machines are in the same LAN, Dropbox is smart enough to sync the files over the local network, so big files shouldn't be a worry.
2) Using a 'Versions' aware text editor for Mac OS X Lion. I'd expect TextMate, Coda and other popular Mac code editors to be updated to support this feature when Lion is released.
How about a compromise between 1 and 2? Instead of a folder+repo for each script, can you bundle them into loosely related groups, such as "database", "backup", etc. and then make one folder+repo for each group? Then if you clone a repo on another machine, you're only pulling down a smaller number of unrelated files. (Is the bandwidth/drivespace really a concern?) To me, this sounds WAAAY simpler than all of the other suggestions so far.
(Technically this approach meets your requirements because (1) each script isn't in its own directory, (2) not all scripts are in the same repository, and (3) you can easily do this with any popular DVCS. :D)
UPDATE (2016): Apparently, a guy named Cosmin Apreutesei created a tool named multigit, which seems to implement what I wished for in this question! If you ever read it, thanks a lot Cosmin! I've started using your tool this year and find it awesome.
I'm starting to think of some kind of an overlay over Mercurial/git/... which would keep a couple "disabled" repository meta-directories, let's say:
.hg1/
.hg2/
.hg3/
etc., and then, on hg commit FILENAME, it would find the particular .hgN linked to FILENAME, and would temporarily:
mv .hgN .hg
hg commit FILENAME
mv .hg .hgN
The main disadvantage is that it would require me to spend some time writing the tool. Or does anybody know of a ready-made one like this? If you do, please post it as a full-featured answer (not a comment); I'm more than willing to accept it.
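For what it's worth, the overlay could start out as small as this (a sketch; the .repomap file mapping each script to its metadata dir is my own invention):

#!/bin/sh
# hgn-commit FILE [MSG]: commit FILE via the repo mapped to it in .repomap,
# which holds lines like "script1 .hg1". Illustrative only: no locking,
# no error handling.
dir=$(awk -v f="$1" '$1 == f {print $2}' .repomap)
mv "$dir" .hg
hg commit -m "${2:-update $1}" "$1"
mv .hg "$dir"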

How to deal with committer name change in Mercurial

I have a project in Mercurial with a group of committers. Unfortunately, some of the committers have changed names several times, e.g. first it was "nickname", then it became "Name Surname ", and then something else.
Most repository analysis tools expect a committer to have the same name over the course of the project, so ideally I'd like to rename the committers of previous revisions in our main repository and have everyone make a fresh clone. Is that possible?
Are there any other ways to deal with this problem?
Yes, it's possible. Use the Convert extension: hg convert from the repository with the bad names to a new repository with the good names, using an authormap. There are many things you can accomplish with the convert extension by converting from Mercurial to another Mercurial repo.
Authormap file, supposing Eric Hopper <bumpy@bar.com> is the canonical name for the author:
Eric Hopper <bouncy@foo.com>=Eric Hopper <bumpy@bar.com>
Eric M. Hopper <bouncy@foo.com>=Eric Hopper <bumpy@bar.com>
Eric Hopper <bouncy@baz.com>=Eric Hopper <bumpy@bar.com>
Then:
hg convert -s hg -d hg --authormap authormap badnamesrepo goodnamesrepo
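To sanity-check the result, listing the distinct author strings in each repository is a quick way to confirm the remap took (a sketch):

hg log -R badnamesrepo --template '{author}\n' | sort -u
hg log -R goodnamesrepo --template '{author}\n' | sort -u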
Note that while converting an hg repository to another hg repository will not always create lots of new changesets, in this case it will, and they will be equivalent to (but distinct from) the changesets in the original repository. This means that everybody using this repository is going to have to erase any clones they have and fetch new ones.
In the general case, converting an hg repository to an hg repository is likely to create at least a few new changesets or there wouldn't be a reason to do it. And that will almost certainly necessitate everybody destroying all their clones and re-cloning.
If your analysis tool has the ability to remap author names, that's probably the better way to go. But that's not what you asked for, so I gave you the answer you asked for. :-)