Is there an established method to tell the SCM (Mercurial, in my case) that files matching the pattern foobaz_1_2_3.csv should all be considered versions of foobaz.csv?
In my application I rely on data tables from an external source that puts the version number in the filename. The importance of tracking changes across their versions was made painfully sharp recently when I spent days troubleshooting a bug on my side of the fence, only to discover it was because they changed some data content and notification of that change never reached me.
If the filename were constant, Hg would have informed me immediately of the internal change and I could have responded appropriately in an hour or two, with very little stress. I could simply adopt the habit of renaming foobaz_2_3_4 to foobaz myself before checking in, or of running diff old new; one or both of those is likely what I'll do from now on.
The whole experience has me wondering, though, whether there might be other methods I've not thought of that don't mess with the external file. (For example, what if I have a downstream user who doesn't use SCM and relies on the filename+version number, which I've thrown away?)
If you receive data in a file whose name changes every time and whose contents may also change, you can:
Store the data file under version control (Mercurial is fine)
Replace the old file with the new one each time
Run hg addremove -s nnn (see hg help addremove); the similarity detection will treat the new file as a rename of the old one and keep its history (see the sketch below)
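A minimal sketch of that workflow, assuming the newly delivered file is called foobaz_1_2_4.csv and a similarity threshold of 90 (both are placeholders, not from the question):
# remove the previously tracked copy and drop in the newly delivered file
rm foobaz_1_2_3.csv
cp /incoming/foobaz_1_2_4.csv .
# -s 90: treat add/remove pairs that are at least 90% similar as renames
hg addremove -s 90
hg commit -m "import foobaz 1.2.4"
Mercurial then records foobaz_1_2_4.csv as a rename of foobaz_1_2_3.csv, so hg log --follow and hg diff show one continuous history while the versioned filename stays intact for downstream users.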
I'd like to know whether there are any revision control systems that, when a user checks out or creates a work area, create symbolic links to the files rather than a local copy of everything. Once we "edit" a file, the symbolic link is replaced with a local copy.
To make this clearer, let's say we have a repository like this:
proj/
    data/
        big_data.csv
    src/
        script.py
Users mostly work on script.py, but the data folder is huge. If every user keeps a local copy of big_data.csv, it consumes a lot of disk space. If the revision control system keeps a copy of every version of every file, then all we need is a link to it; users don't need to hold a local copy unless they actually have to edit that file.
proj/
    data/
        #big_data.csv -> /depot/proj/data/big_data.csv#5
    src/
        script.py
So which revision control tools have a feature like this? Is it possible to get something similar in Perforce?
Thanks!
You can have a symlink checked into a Perforce depot, and it can point wherever you like (I assume in this case you'd have a read-only shared network filer that the file lives on), but when you p4 edit it you'll just be editing where the symlink points.
I can envision a way you'd set this up in Perforce with streams (you could do it with classic branches and template clients too, but streams make it a lot easier).
You'd have one stream (let's call it dw) where big_data is the actual file, and one stream (let's call it dr) where big_data is the symlink, and big_data is specified as an isolate Path so that changes won't merge between the two.
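A rough sketch of what the read-only stream's spec might look like, assuming dr is a child of dw (the //proj depot path and the data/big_data.csv location are placeholders, not from the question):
# p4 stream -o //proj/dr  -- hypothetical spec for the symlink stream
Stream:  //proj/dr
Parent:  //proj/dw
Type:    development
Paths:
    share ...
    isolate data/big_data.csv
The isolate line keeps big_data.csv from ever being merged or copied between dr and dw, so the symlink revisions in dr stay independent of the real file's history in dw.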
There's a background script (maybe a change-commit trigger on dw/big_data) that writes each revision of dw/big_data to the filer as a unique read-only file, and then edits and submits dr/big_data to create a symlink revision that points to the filer revision. Hence a user syncing dr/big_data will get (via symlink) the matching revision of dw/big_data. dr/big_data is made read-only (via p4 protect) to all normal users to make sure nobody accidentally edits it; it needs to be strictly managed by the script that ties it to the filer.
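For the trigger itself, a hypothetical entry in the p4 triggers table might look like this (the trigger name, script path, and depot paths are all placeholders):
# fires after each submitted change that touches the real file in dw
big_data_publish change-commit //proj/dw/data/big_data.csv "/scripts/publish_big_data.sh %change%"
The script would then p4 print the new revision out to the filer under a unique read-only name, rewrite the dr symlink to point at it, and submit that as a new symlink revision.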
So users who don't want to edit big_data use the dr stream; they can sync big_data to whatever revision they want, and it behaves normally, but they can't edit it. Other files in the stream (e.g. script.py) are normal files and they can edit them normally.
If a user wants to edit big_data, they do p4 switch dw and everything resyncs (for most of the files this is probably a no-op) and now big_data is a normal editable file. When they submit a new revision to it, the trigger fires and a second later there's a matching dr/big_data symlink, so users in the dr branch will see the change when they sync. (The one symlink rev per revision is so that you still get the benefits of versioning; big_data ONLY updates when you sync to the new symlink rev. If you didn't want this behavior you could just ignore keeping the file in Perforce and symlink straight to the filer and edit it there directly, but I assume you want actual version control on it.) Once they're done editing, they're free to p4 switch dr and go back to the read-only symlink (the head revision of which will now reference the revision they just submitted).
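A sketch of that edit cycle from the user's side, following the answer's p4 switch usage (stream names as above are hypothetical):
p4 switch dw                    # resync the workspace onto the writable stream
p4 edit data/big_data.csv       # big_data.csv is a normal file here
# ... make changes, then:
p4 submit -d "update big_data"  # the trigger publishes the new revision for dr
p4 switch dr                    # back to the read-only symlink stream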
Whether dr is a parent of dw or vice versa, and how this fits into the rest of your codeline hierarchy, is left as an exercise for the reader -- it depends on whether you need to make edits to big_data in more than one codeline or whether having a linear "master" history of it is sufficient.
Here's a rather advanced question for true ClearCase experts:
I frequently perform a rebase on a ClearCase snapshot view that has just a very limited number of small changes in a few files (e.g. file1.c, file2.c, file3.c).
I do it with the following on the UNIX command line:
cleartool rebase -recommended -complete
Sometimes, while this command runs, out of the blue and for no explained reason (yet), I get prompted for manual input to resolve some "merge" conflicts. They make no sense to me, as they happen in file(s) that I NEVER EVER TOUCHED -- and which ONLY ONE OTHER DEVELOPER EVER TOUCHES.
The "merge" prompts I see when this scenario happens during a rebase look usually like:
"Do you want INSERTION from file x? [yes/no]"
or
"Do you want DELETION from file y? [yes/no]"
or
"Do you want CHANGES from file z? [yes/no]".
Etc.
I have no clue why these "conflicts" are happening. Additionally, it's really hard (read: impossible) to make good decisions because the details are shown in a very narrow column width, and there's hardly any way to guess right. Using graphical merging is not an option here because this is meant for an automation script that should ideally never ask for user input.
What I do know about this scenario is:
We have a team of 6 developers. 5 of us usually work on the same limited set of files... say file1.c, file2.c, file3.c
I work on a child development stream on these three files. And when I'm done, I normally deliver up to the default parent stream.
On the occasions where the "merge conflicts" on rebase happened, it was always on a totally DIFFERENT FILE -- one that is ONLY EVER TOUCHED by JUST ONE other developer in the team (it's a module that HE owns, NO ONE EVER TOUCHES THAT FILE BUT HIM). Let's call him developer #6.
When this strange "merge conflict" on rebase happens, I've usually been working for an extended time in my own development child stream (always with a snapshot view), and I've done a couple of rebases (at least 3) to bring in other changes, ALL made by other developers (in file1.c, file2.c and file3.c), which I needed to complete my work.
But the other developer (#6), the ONLY ONE working on banana.c, had made changes to that file in at least two of the three rebase activities that were created in the snapshot view of my child development stream.
Again, I repeat, I NEVER touched banana.c, and none of the 5 other developers ever did, except for the guy (#6) who owns the banana.c module.
And there, it happened - ClearCase asked me for manual input to solve a "merge" conflict with banana.c when doing "cleartool rebase -recommended -complete".
How can this be possible???
How can a file require "merging" when doing rebase if there is ONLY EVER one single developer making changes to it?
It's as if ClearCase got confused between different versions of banana.c in at least two of the three rebase activities automatically created in my stream (which both modified banana.c) and prompted me for "merge conflict" resolution (even though I never ever touched banana.c and only one developer (#6) ever did modify that file).
UPDATE: AUGUST 31ST 2015
Here's a log from an occurrence of the problem on August 28th, 2015:
Needs Merge "/view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp" to /main/MAIN_INT_STREAM/SUB_STREAM/CHECKEDOUT from /main/MAIN_INT_STREAM/SUB_STREAM/MY_DEV_STREAM/37 base /main/MAIN_INT_STREAM/SUB_STREAM/150
********************************
<<< file 1: /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp##/main/MYDYNAMICVIEW/SUB_STREAM/150
>>> file 2: /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp##/main/MYDYNAMICVIEW/SUB_STREAM/MY_DEV_STREAM/37
>>> file 3: /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp
********************************
...CUT FOR BREVITY...
*** Automatic: Applying DELETION from file 3 [deleting base line 123]
...CUT FOR BREVITY...
*** Automatic: Applying INSERT from file 3 [lines 123-124]
...CUT FOR BREVITY...
*** Automatic: Applying CHANGE from file 3 [lines 1329-1335]
...CUT FOR BREVITY...
*** No Automatic Decision Possible
merge: Warning: *** Aborting...
Missing charsets in String to FontSet conversion
Missing charsets in String to FontSet conversion
Missing charsets in String to FontSet conversion
Cannot convert string "-misc-kochi mincho-medium-r-normal--0-*-*-*-*-*-*-*" to type FontSet
...GMERGE POPPED HERE...
Moved contributor "/view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp" to "/view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp.contrib".
Output of merge is in "/view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp".
Recorded merge of "/view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp".
I never touched SOMEFILE.cpp - only ONE other developer ever changes it - why does it require a merge?
My net impression at the moment is that ClearCase's automatic merge is doing a bad job.
Could it be a good idea to use the "-qall" or "-qntrivial" options to disable ALL/MOST automatic merging, and do EVERY/MOST merge manually (or with an external tool)?
1 & 2 How can this be possible???
This "Do you want the CHANGE made in file 2? [yes] no" message only appears when 2 contributors differ from the base contributor.
In that case, a cleartool lsvtree (not with -graph, since you don't have an X-Window server) might help you see the versions involved; you can then run some cleartool diff commands to see the differences compared to the base contributor.
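For instance, a sketch only, using the version-extended @@ paths taken from the "Needs Merge" line in the log above (adjust them for your own view and versions):
cleartool lsvtree /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp
cleartool diff -diff_format \
  /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp@@/main/MAIN_INT_STREAM/SUB_STREAM/150 \
  /view/MYDYNAMICVIEW/vobs/DIRLEVEL1/DIRLEVEL2/SOMEFILE.cpp@@/main/MAIN_INT_STREAM/SUB_STREAM/MY_DEV_STREAM/37
The first path is the base contributor, the second is the contributor coming from the development stream; the output shows exactly which hunks the rebase is trying to bring across.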
3: That is one example where, if possible, working collaboratively on the same stream/branch (instead of each developer working in his or her own stream) would be better.
Regarding the update of August 2015, the key error message is:
Missing charsets in String to FontSet conversion
See technote "Using GUI results in "Missing charsets in String to FontSet conversion" warning"
Possible causes include:
An improper setting of the locale variable. For example it may be set to UTF-8.
The palette file does not contain the correct fontset. (The file of interest is Linux/palette, which defines the actual fonts used in the environment; it is read to determine the fonts that can be displayed in the ClearCase GUI.)
This issue was identified as a product defect and logged under APAR PK30799.
I have a file that was forked from a project at an unknown moment in the past. I want to identify as closely as possible the moment of that fork. The file has been changed since the moment of the fork.
Winmerge highlights about 20% of the lines, with about half of those being just a few characters within the line, a path change or an inline function turned into a variable or function call, for instance. (That's 20% after ignoring whitespace changes and enabling moved-block detection; closer to ~40% without that.)
I don't have to worry about branches; the original version control system was CVS (I don't have access to the CVS file system). I have a git-imported version with tags corresponding to the CVS commits, and could generate the same with Mercurial with little effort if need be.
I don't care about matching the specific CVS commit date/time/number/whatever. The goal is to identify when the content of the new file started drifting, and then step forward through the revision history, cherry-picking what to merge into the forked file.
For this project I could brute-force it; there are only a dozen or so revisions where the fork most likely occurred, and the file is less than 500 lines. However, it's not hard to imagine a scenario where this is not feasible, and I'm curious what an elegant solution might look like.
How would you go about solving this?
"Brute force" sounds as if you were contemplating testing all revisions. Normally one would use a binary search. To decide if it was a good match, I'd normally use just the numbers from diffstat (since you say there are post-fork changes). Accounting for block-moves complicates things, though.
I am doing a refactoring of my C++ project containing many source files.
The current refactoring step includes joining two files (say, x.cpp and y.cpp) into a bigger one (say, xy.cpp) with some code being thrown out, and some more code added to it.
I would like to tell my version control system (Perforce, in my case) that the resulting file is based on the two previous files, so that in the future, when I look at the revision history of xy.cpp, I also see all the changes ever made to x.cpp and y.cpp.
Perforce supports renaming files, so if y.cpp didn't exist I would know exactly what to do. Perforce also supports merging, so if I had 2 different versions of xy.cpp it could create one version from them. From this, I figure that joining two different files should be possible (though I'm not sure about it); however, I searched through some documentation on Perforce and other source control systems and didn't find anything useful.
Is what I am trying to do possible at all?
Does it have a conventional name (searching the documentation for "merging" or "joining" was unsuccessful)?
You could try integrating with baseless merges (-i on the command line). If I understand the documentation correctly (and I've never used it myself), this will force the integration of two files. You would then need to resolve the integration however you choose, resulting in something close to the file you are envisioning.
After doing this, I assume the Perforce history would show the integration from the unrelated file in its integration history, allowing you to track back to that file when desired.
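A minimal sketch of what that might look like (the depot paths are placeholders, and the flag behaviour should be checked against p4 help integrate for your server version):
# force a baseless integration of y.cpp's history into xy.cpp
p4 integrate -i //depot/proj/y.cpp //depot/proj/xy.cpp
p4 resolve //depot/proj/xy.cpp
p4 submit -d "record that xy.cpp also derives from y.cpp"
Afterwards, p4 filelog //depot/proj/xy.cpp should list an integration record pointing back at y.cpp.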
I don't think it can be done in a classic VCS.
Those versioning systems come in two flavors (slide 50+ of Getting git by Scott Chacon):
delta-based history: you take one file and record its deltas. In this case the unit is the file, so you cannot associate its history with another file.
DAG-based history: you take one content and record its patches. In this case the file itself can vary (it can be renamed/moved at will), and it can be the result of two other contents (so it is close to what you want)... but still within the history of one file (the contents coming from different branches of its DAG).
The easy part would be this:
p4 edit x.cpp y.cpp
p4 move x.cpp xy.cpp
p4 move y.cpp xy.cpp
Then the tricky part becomes resolving the move of y.cpp and doing your refactoring. But this will tell Perforce that the files are combined.
If a project has multiple people, say A, B, and C, working together and they all edit the same source file.
A couple of months later, they realize that what A has been doing is wrong, and they want to roll back the file in such a way that only the parts/functions/lines/... that A "touched" are removed, while the work B and C did remains in the rolled-back version. In other words, the rolled-back version has only the work of B and C up to the time they decide to remove A's work.
Is there any version/source control software out there (free/commercial) can do that?
Thanks.
Git and a bit of scripting will do that. Probably a bit of hand work too, but you can reorder commits using an interactive rebase.
Most VCSs should be able to do this -- it's a reverse merge. In Subversion you would identify the revisions made by A and merge them in again, but the other way round. To oversimplify, this means turning line additions into line removals, and vice versa.
# Don't want revision 37 because A made it.
$ svn merge -r 37:36 path
http://svnbook.red-bean.com/en/1.5/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.undo
I use TFS and Git, but there are a lot of free and open-source version control systems. You can find a list of source control software here.
In Git, you would probably do something like
# revert every commit authored by A, newest first (HEAD limits rev-list to the current branch)
git revert --no-edit `git rev-list --author=A HEAD`
[Note: completely untested.]
I bet it can (easily) be done with Monotone by using the `mtn local kill_certs selector certname [certval]` command (see the reference), which:
This command deletes certs with the given name on revisions that match the given selector. If a value is given, it restricts itself to only delete certs that also have that same value. Like kill_revision, it is a very dangerous command; it permanently and irrevocably deletes historical information from your database.
So, by using A's certificate, the above command will eliminate 'wrong work' done by him.