How to join two files in a version control system - version-control

I am doing a refactoring of my C++ project containing many source files.
The current refactoring step includes joining two files (say, x.cpp and y.cpp) into a bigger one (say, xy.cpp) with some code being thrown out, and some more code added to it.
I would like to tell my version control system (Perforce, in my case) that the resulting file is based on two previous files, so that in future, when I look at the revision history of xy.cpp, I also see all the changes ever made to x.cpp and y.cpp.
Perforce supports renaming files, so if y.cpp didn't exist I would know exactly what to do. Perforce also supports merging, so if I had two different versions of xy.cpp it could create one version from them. From this, I gather that joining two different files might be possible (though I am not sure); however, I searched through some documentation on Perforce and other source control systems and didn't find anything useful.
Is what I am trying to do possible at all?
Does it have a conventional name (searching the documentation on "merging" or "joining" was unsuccessful)?

You could try integrating with baseless merges (-i on the command line). If I understand the documentation correctly (and I've never used it myself), this will force the integration of two files. You would then need to resolve the integration however you choose, resulting in something close to the file you are envisioning.
After doing this, I assume the Perforce history would show the integration from the unrelated file in its integration history, allowing you to track back to that file when desired.

I don't think it can be done in a classic VCS.
Those versioning systems come in two flavors (slide 50+ of Getting git by Scott Chacon):
delta-based history: you take one file, and record its delta. In this case, the unit being the file, you cannot associate its history with another file.
DAG-based history: you take one content and record its patches. In this case, the file itself can vary (it can be renamed/moved at will), and it can be the result of two other contents (so it is close to what you want)... but still within the history of one file (the contents coming from different branches of its DAG).
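To make the DAG idea concrete, here is a minimal Python sketch of a revision that has two parents. The `Revision` class and the `history` function are hypothetical names invented for illustration; they do not reflect how any particular VCS stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One node in a history graph (hypothetical model, for illustration only)."""
    name: str
    parents: list = field(default_factory=list)


def history(rev):
    """Return the names of rev and all its ancestors, each once, in depth-first order."""
    seen, order, stack = set(), [], [rev]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        order.append(node.name)
        # Push parents in reverse so the first parent is visited first.
        stack.extend(reversed(node.parents))
    return order


# A joined file is simply a node whose parents are the tips of both old files.
x1 = Revision("x.cpp#1")
x2 = Revision("x.cpp#2", [x1])
y1 = Revision("y.cpp#1")
xy = Revision("xy.cpp#1", [x2, y1])
print(history(xy))
```

Walking the graph from the joined node reaches the revisions of both original files, which is the behaviour the question is asking for.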

The easy part would be this:
p4 edit x.cpp y.cpp
p4 move x.cpp xy.cpp
p4 move y.cpp xy.cpp
Then the tricky part becomes resolving the move of y.cpp and doing your refactoring. But this will tell Perforce that the files are combined.

Related

Perforce CLI - how to determine which changelist a file is checked out under?

In the absence of development branches (or sometimes in the overabundance thereof), it is common to have at least half a dozen of other people's shelves unshelved in a workspace, as well as multiple of one's own: both local changes (divided by directory, of course) and shelves from other workspaces.
(As an aside, yes, this is almost as bad as pre-version control email/tarball/fileshare-based nonsense, and no, there's nothing I can do about it.)
As a result of this situation, it is often necessary to identify which of myriad changelists refers to a given file.
Is there a way of retrieving this information, using the p4 command line tool, without recourse to elaborate shell scripting or wrapper programmes?
p4 opened FILE
Note that this tells you which changelist you have the file open in -- it doesn't necessarily correspond to which changelist you may have unshelved from originally. If this is important, make sure that when you unshelve it's into distinct changelists (rather than opening everything in the same changelist) so you can keep track of what came from where.

Search code history for closest matching version based on content

I have a file that was forked from a project at an unknown moment in the past. I want to identify as closely as possible the moment of that fork. The file has been changed since the fork-moment.
WinMerge highlights about 20% of the lines, with about half of those differing by just a few characters within the line: a path change, or an inline function turned into a variable or function call, for instance. (That is 20% after ignoring whitespace changes and enabling moved-block detection; it is closer to 40% without that.)
I don't have to worry about branches, the original version control system was CVS. (I don't have access to the CVS file system). I have a git imported version with tags corresponding to the CVS commits, and could generate the same with Mercurial for little effort if need be.
I don't care about matching the specific CVS commit date/time/number/whatever. The goal is to identify when the content of the new file started drifting, and then step forward through the revision history, cherry-picking what to merge into the forked file.
For this project I could brute-force it: there are only a dozen or so revisions where the fork has most likely occurred, and the file is less than 500 lines. However, it's not hard to imagine a scenario where this is not feasible, and I'm curious about what an elegant solution might be.
How would you go about solving this?
"Brute force" sounds as if you were contemplating testing all revisions. Normally one would use a binary search. To decide if it was a good match, I'd normally use just the numbers from diffstat (since you say there are post-fork changes). Accounting for block-moves complicates things, though.
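The binary search suggested above could look roughly like the following Python sketch. It uses the standard library's `difflib` as a stand-in for diffstat, and it assumes the diff size against the forked file shrinks as you approach the fork point and grows afterwards (a single minimum); if that assumption does not hold, a linear scan over all revisions is the safe fallback. The function names and the `revisions` list (revision contents in chronological order) are hypothetical, and block moves are not accounted for.

```python
import difflib


def changed_lines(a, b):
    """Count lines deleted from a plus lines inserted in b (a rough diffstat)."""
    matcher = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines(), autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


def closest_revision(revisions, forked):
    """Return the index of the revision with the smallest diff against forked.

    Assumes the diff size has a single minimum over the revision sequence.
    """
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if changed_lines(revisions[mid], forked) <= changed_lines(revisions[mid + 1], forked):
            hi = mid       # the minimum is at mid or earlier
        else:
            lo = mid + 1   # the diff is still shrinking; look later
    return lo
```

In practice the revision contents would come from the imported repository (for example, one string per tagged revision of the file), and the result is best checked by eye against its neighbouring revisions.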

Mercurial: Pre and Post merge operations (per file)

We are using Mercurial as an SCM to handle the source script files of a program. Each project we manage has ~5000 files, and each file contains a section with product-specific information about the file itself (version list, date, time, etc.). Because of the way this section is structured, it is the only section with conflicts in 80% of merges. The conflicts are easily resolved, but when merging around 300 files, it gets tiresome.
The problem is: I have no control over the way this section is written and I cannot change the format of the section itself, as it would make the file unusable by the program.
My question: is there a way in mercurial (hooks?), that allows me to
pre-process the file with a script
let mercurial do the merge
if merged correctly: post-process the file with a script. otherwise: "resolve-conflicts" as usual.
You could probably get away with it by creating a custom merge tool:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration
A simple script that invokes 'diff' after removing the ever changing sections might be enough.
It sounds like those sections are the sort of nonsense that the (disrecommended) KeywordsExtension is built to handle, but I gather you don't have a lot of flexibility around them.
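The pre-process and post-process steps of such a custom merge tool might be sketched in Python as follows. The marker lines `# BEGIN-INFO` and `# END-INFO` and the placeholder line are assumptions made up for this example; the real section delimiters depend on the file format, which the question does not show. The merge itself is left to Mercurial or whatever diff/merge program the tool wraps.

```python
# Hypothetical placeholder line that stands in for the volatile section.
PLACEHOLDER = "# INFO-SECTION-PLACEHOLDER\n"


def split_section(text, begin="# BEGIN-INFO", end="# END-INFO"):
    """Pre-process: return (section, body), with the section replaced by a placeholder in body."""
    section, body, inside = [], [], False
    for line in text.splitlines(keepends=True):
        if not inside and line.strip() == begin:
            inside = True
            body.append(PLACEHOLDER)
        if inside:
            section.append(line)
            if line.strip() == end:
                inside = False
        else:
            body.append(line)
    return "".join(section), "".join(body)


def restore_section(body, section):
    """Post-process: put the chosen section back where the placeholder is."""
    return body.replace(PLACEHOLDER, section, 1)
```

A wrapper script would call `split_section` on the local, base, and other versions, run the normal merge on the three bodies, and then call `restore_section` with whichever section should win (or a freshly generated one) when the merge succeeds.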

Archiving successive beta versions: how to save hard disk space?

I archive successive versions of a work in progress:
MySoftware-v1.01beta.rar [2 GB]
MySoftware-v1.02beta.rar [2 GB]
MySoftware-v1.03beta.rar [2 GB]
MySoftware-v1.04beta.rar [2 GB]
etc.
Lots of files are modified, so it's not possible to back up only modified files: most of the files are modified each time.
How can I make a .rar file that only saves the "difference"? (Should I use something like "patch" or "diff"? I have never used them.) There are lots of "difference" tools, okay, but the resulting file won't be a .rar, it will only be a "difference file": so each time I want to re-open such an archive, I'll have to "de-diff" it, and only THEN will I have a .rar again.
I'm on Windows, and if possible, I'd like to use winrar or command-line tool (it would be great if no third party software is needed).
Thanks a lot in advance!
You say 90% of your product is .wav files. Since diff on two wav files that are different is likely to produce huge differences, this is not likely to save you any space. Nor are .wav files really compressible, so zip or rar likely doesn't help much, either.
However, if, like most of us programmers, you derive your next version of the product from the previous one, by mostly retaining files unchanged (whether that be source or be .wav files), then what you really want to do is simply store, for each version, the files that changed. This is called "de-duplication" in the backup/compression world.
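As a rough illustration of the de-duplication idea, the following Python sketch hashes every file in a version directory and reports which files are new or changed relative to the previous version, so that only those need to be archived. The function names are hypothetical, and the sketch only detects changes; it does not create or unpack archives.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to the hash of its content."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """Paths in the new manifest that are absent from, or differ from, the old one."""
    return sorted(path for path, digest in new.items() if old.get(path) != digest)
```

Storing each version's manifest next to an archive of just its changed files is enough to rebuild any version by walking back through earlier archives, which is essentially what a version control system automates.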
You can organize a complicated scheme yourself to do this (e.g., your self-suggested "do this with WinRAR"). But if you use a decent "source control system" (SVN or Git would be fine), this will happen automatically as you check in changed files (and don't re-check in unchanged ones). These tools work by keeping track of "differences" between versions; you can tell the tools to track text ("diff") style differences, or simply store the entire thing.
Also, since your individual versions occupy 2 GB, I'd go waste $100 on a 2 or 4 terabyte (external) drive. That should last you, in the worst case, through some 1000 iterations. (SVN/Git will likely extend this a lot further.)
You should really be using a source control system. A popular one is called 'git'. There are many others, each with their own strengths and weaknesses and the debate about which is 'best' is long and tedious.
Source control systems take care of storing and managing revisions of your files. The actual methods vary, but as a programmer who uses version control you 'check in' files for storage and version control, 'tag' them with revision numbers and then 'check out' files for modifying.
If you've ever downloaded source off the Internet using 'svn' or 'cvs', that's the type of thing I mean.
The source control system usually uses some sort of difference system to only store differences between modified files. Its purpose is to save you from having to even think about copying and backing up files - all you have to do is ensure your 'repository' is backed up correctly.
Also, as an added advantage you can make changes to source files and always have backups in case your changes need reverting. So suppose you want to try out a new file handling system you can use the source control system to create a testing (or whatever you want to call it) 'branch' and do all your changes in there without damaging a working copy of your software. If the changes are good you can then 'merge' the changes into the non testing branch of your repository.

Same file in multiple changelists in perforce

Is there any way to have the same file be part of multiple changelists in Perforce? By that I mean that, from the set of changed lines in the file, one subset will belong to one changelist, while the other subset will belong to a second changelist.
Bonus question: If perforce does not support this, then which Source Control Systems, if any, do?
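To answer the bonus question: Git allows for per-line changelists, in the sense that you can stage and commit only some of a file's changed hunks (for example, with git add -p).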
For a comparison between the two view this question: GIT vs. Perforce- Two VCS will enter... one will leave.
The same copy of the file? No, unfortunately this isn't possible.
Another way to do this without branching is create additional workspaces (clients). Unless you really know what you're doing, be sure to set a different root directory in each of your workspaces. To save time (and disk), don't bother syncing the whole depot in the new workspace.
Sometimes, I'll have two copies of a depot (using two workspaces); one which contains work-in-progress and one which I keep unmodified. If I need to make a quickie change on a file that's heavily modified in my WIP workspace, I can use the 'virgin' workspace to make the change and submit it.
If you are using p4 server 2009.2, there is a workaround. You can shelve a particular file, and the shelved changes are stored on the server. After shelving, you can revert the file to its original version and then work on it in another changelist.
I know this is not the way you wanted it, but it is quite easy to create another workspace/client and then sync the code. The latter exercise becomes more tedious when you have volumes of code that go into another application.
For more info read:
http://blog.perforce.com/blog/?p=1872
http://www.perforce.com/perforce/doc.current/manuals/cmdref/shelve.html
You could make a copy of the file with all of the changes, then revert, edit the file, copy one set of changes into it, and submit; then edit again, copy the next set of changes, submit, and so on.
Bonus answer: I found this feature in Rational Team Concert (http://www-03.ibm.com/software/products/en/rtc/). You can have the same file in many changesets. If you want to add File1 to Changeset1 and Changeset2, you must complete Changeset1 first. This allows you to add File 2 to Changeset2, but a dependency between the changesets is then created, so you cannot deliver Changeset2 without delivering Changeset1 too. Moreover, you cannot make changes to a completed changeset.