Sharing unopened files rather than getting local copy of everything - version-control

I'd like to know whether there are any revision control systems that, when a user checks out, gets, or creates a workarea, create symbolic links to the files rather than a local copy of everything. Once we "edit" a file, the system would replace the symbolic link with a local copy.
To make this clearer, suppose we have a repository like this:
proj/
    data/
        big_data.csv
    src/
        script.py
Users mostly work on script.py, but the data folder is huge. If every user keeps a local copy of big_data.csv, it consumes a lot of disk space. If the revision control system keeps a copy of every version of every file, then all we need is a link to it. Users don't need to hold a local copy unless they have to edit that file.
proj/
    data/
        #big_data.csv -> /depot/proj/data/big_data.csv#5
    src/
        script.py
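As a rough sketch of the mechanism being asked about (a hypothetical tool, not a feature of any particular VCS), a workspace can be populated with symlinks into a shared, read-only store, and "editing" a file replaces its link with a private copy. The store layout and the names populate_workspace and open_for_edit are assumptions for illustration only.

```python
import shutil
import stat
from pathlib import Path


def populate_workspace(store: Path, workspace: Path) -> None:
    """Mirror `store` into `workspace` using symlinks instead of copies."""
    for src in store.rglob("*"):
        if src.is_file():
            dest = workspace / src.relative_to(store)
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.symlink_to(src.resolve())  # link to the shared file, no copy


def open_for_edit(path: Path) -> None:
    """Replace a symlink with a private, writable copy of its target."""
    if path.is_symlink():
        target = path.resolve()
        path.unlink()                  # removes the link, not the shared file
        shutil.copy2(target, path)     # private copy inside the workspace
        path.chmod(path.stat().st_mode | stat.S_IWUSR)  # ensure it is writable
```

Only files passed to open_for_edit cost disk space in the workspace; everything else stays a link into the shared store.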
So which revision control tools have a feature like this? Is it possible to get something similar in Perforce?
Thanks!

You can have a symlink checked into a Perforce depot, and it can point wherever you like (I assume in this case you'd have a read-only shared network filer that the file lives on), but when you p4 edit it you'll just be editing where the symlink points.
I can envision a way you'd set this up in Perforce with streams (you could do it with classic branches and template clients too, but streams make it a lot easier).
You'd have one stream (let's call it dw) where big_data is the actual file, and one stream (let's call it dr) where big_data is the symlink, and big_data is specified as an isolate Path so that changes won't merge between the two.
There's a background script (maybe a change-commit trigger on dw/big_data) that writes each revision of dw/big_data to the filer as a unique read-only file, and then edits and submits dr/big_data to create a symlink revision that points to the filer revision. Hence a user syncing dr/big_data will get (via symlink) the matching revision of dw/big_data. dr/big_data is made read-only (via p4 protect) to all normal users to make sure nobody accidentally edits it; it needs to be strictly managed by the script that ties it to the filer.
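The filer side of such a trigger might look roughly like the following. This is a minimal sketch under several assumptions: the trigger has already fetched the submitted revision of dw/big_data to a local file (for example with p4 print -o), the ".revN" naming scheme and the helper names are hypothetical, and the follow-up p4 edit / p4 submit of the dr/big_data symlink is left out.

```python
import os
import shutil
import stat
from pathlib import Path


def publish_revision(rev_file: Path, filer_root: Path, name: str, rev: int) -> Path:
    """Store one revision on the filer as a unique, read-only file."""
    filer_root.mkdir(parents=True, exist_ok=True)
    dest = filer_root / f"{name}.rev{rev}"   # hypothetical naming scheme
    if not dest.exists():                    # each revision is written only once
        shutil.copyfile(rev_file, dest)
        dest.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only
    return dest


def point_symlink_at(link: Path, target: Path) -> None:
    """(Re)point the dr/big_data symlink at a published revision."""
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, link)                    # swap the new link into place
```

Because old revisions are never overwritten, syncing dr/big_data to an older symlink revision still resolves to the matching content on the filer.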
So users who don't want to edit big_data use the dr stream; they can sync big_data to whatever revision they want, and it behaves normally, but they can't edit it. Other files in the stream (e.g. script.py) are normal files and they can edit them normally.
If a user wants to edit big_data, they do p4 switch dw and everything resyncs (for most of the files this is probably a no-op) and now big_data is a normal editable file. When they submit a new revision to it, the trigger fires and a second later there's a matching dr/big_data symlink, so users in the dr branch will see the change when they sync. (The one symlink rev per revision is so that you still get the benefits of versioning; big_data ONLY updates when you sync to the new symlink rev. If you didn't want this behavior you could just ignore keeping the file in Perforce and symlink straight to the filer and edit it there directly, but I assume you want actual version control on it.) Once they're done editing, they're free to p4 switch dr and go back to the read-only symlink (the head revision of which will now reference the revision they just submitted).
Whether dr is a parent of dw or vice versa, and how this fits into the rest of your codeline hierarchy, is left as an exercise for the reader -- it depends on whether you need to make edits to big_data in more than one codeline or whether having a linear "master" history of it is sufficient.

Related

How to recover Perforce history on a moved directory

I have a branch in Perforce where I changed the directory structure of the project using the Rename/Move command.
While merging back to the main stream, something went wrong that caused Perforce to treat the new structure as entirely new directories.
As a result, the history of the files in the new directory structure is completely unrelated to the history of the same files before the restructuring.
Is there any way to recover from this? Or can I ask Perforce to append the old history to the new history?
> Something went wrong that caused Perforce to think of the new structure as a whole-new directories.
Usually if this happens it means someone didn't use the "rename/move" command and used some other method to rename instead (i.e. they did something that adds the new directory as a new set of files independent of the originals rather than an atomic rename of an existing set of files). It's impossible for me to say how to "recover" without seeing what the history of the files looks like now so I can reverse-engineer what the "something went wrong" was.
I'd recommend either posting on the Perforce forums or contacting Perforce technical support, so that somebody with expertise can wheedle the necessary data out of you and propose a solution. (I suspect this will require the kind of back-and-forth that Stack Overflow frowns on: "Which branches were you merging from and to?", "Okay, now run THIS command to see the history of that branch and send me the output," "Okay, which of these five merge operations in the history is the one you're talking about?")
From another answer:
So, for a file a/b/c, you can look at its history by using the -i option (which includes history inherited from the files it was branched from) where appropriate. For example: p4 filelog -li a/b/c.
This is not necessary if files are renamed via the "move/rename" command, so if you need to use "filelog -i" to see file history, the files were definitely renamed by some other method. (The "p4 move" command was added in 2009 so long-time Perforce users will sometimes use other workflows.)

replace a library in perforce

I need to replace a library in a perforce depot. The library is checked in in the form of source files which are all managed by perforce.
Now the problem is that in the new version of the library there may be:
- unchanged files
- changed files
- new files, and
- files that have been deleted.
Of course, I could just mark the whole source tree for delete, submit, copy the new version of the library into the directory in question, mark it for add, and submit again. But that would create a short window during which no one should sync, or their next build will break. Maybe that's the best option, but I'd like to know whether there is a better approach.
A second solution is to copy the new version of the library to some other directory, update all references to reflect the new location, and then delete the old library and mark the new one for add. This can be done in one changelist. The unpleasant and error-prone part is updating the references. Also, changing the directory names is not really desirable.
Does anyone know a way to do this in one step, with one changelist? I experimented with a single-file example: it is actually possible to mark a file for delete and then immediately create a file with the same name and mark it for add. If you do that and submit, the result is exactly what I want for that single file. This procedure, however, seems to require touching each file manually, and I could not figure out how to do it for a whole directory or directory tree.
One possibility is to use p4 reconcile to do most of the work, using a process such as:
1. In your workspace, remove the current copy of the source tree entirely: rm -rf top-directory-name (or del /s /q if you're on Windows).
2. Copy the entire new copy of the library's source tree into that location.
3. Run p4 reconcile and let it figure out which files to open for add, for edit, and for delete. CAREFULLY inspect the results by looking closely at p4 opened, p4 diff, etc.
4. Submit the new changelist.
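To get an independent preview of what reconcile ought to open, you can compare the old and new library trees yourself. This sketch only illustrates the add/edit/delete classification; it is not how Perforce implements reconcile (which compares your workspace against what the server records you have synced), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def _digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a hash of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def classify(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Classify files as they would be opened for add, delete, or edit."""
    old, new = _digests(old_root), _digests(new_root)
    return {
        "add": sorted(new.keys() - old.keys()),
        "delete": sorted(old.keys() - new.keys()),
        "edit": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Unchanged files appear in none of the lists, which matches the goal of touching only what actually differs in one changelist.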

With TortoiseHg, how do I exclude a file from checkins/pushs, but still get updates to it? [duplicate]

Possible Duplicate:
how to ignore files in kiln/mercurial using tortoise hg “that are part of the repository”
I have a config file that I don't want to check in, but I do want to get updates whenever someone checks in a change to it.
In most systems I just uncheck the tick mark next to the config file at check-in time, but Hg seems to make life a lot harder!
In Perforce this is even easier: I can just check the config file out in a different changelist. How do I do the same in TortoiseHg?
In general, you don't. The usual way to handle this is not to put the config file in source control, but instead to put a template for it in source control. Something like config.sample. You can even tweak your run/build script to copy config.sample to config if config doesn't already exist.
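The "copy config.sample to config if it doesn't already exist" step can be a few lines in a run/build script. A minimal Python sketch, assuming both files live in the project root; the ensure_config name is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_config(project_dir: Path) -> bool:
    """Create `config` from the tracked `config.sample` if it is missing.

    Returns True if a new config file was created.
    """
    config = project_dir / "config"
    if config.exists():
        return False  # never overwrite a developer's local settings
    shutil.copyfile(project_dir / "config.sample", config)
    return True
```

Listing config in .hgignore keeps the local copy from showing up as an untracked file, while changes to config.sample still arrive with every pull.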
There are plenty of other ways to try to get at this, using mq or an alias like mycommit = commit -X config, but at its core a file is either tracked or it isn't, and a file that everyone has to change themselves shouldn't be tracked.
If you uncheck the file in the file list before committing, it won't go into the changeset. This means it won't be included in a push (as pushes are per changeset).
This is one feature of TortoiseHg that makes it more useful than the command line.
If you do a pull with an edited file, you will create multiple heads. You can merge these if you want the file to feature changes, but this might be a manual step.
Alternatively, for a config file it is useful to use Mercurial's Patch Queue functionality. From the command line you can do this as follows (assuming the file is changed in your working directory):
hg qnew "localConfigs"
hg qrefresh
This creates a new patch queue item called "localConfigs", and puts the edited files (your config file) into the item. You can then:
hg qpop
To unapply it (take it off your changeset path; the patch itself stays in the queue). Or:
hg qpush
To reapply it on your changeset path. This is an easier way to manage file changes that you make regularly while keeping pace with the central repository: you pop your queue items, pull and update, then push the queue items back on (handling any merge conflicts, though these are rare if your items are small). This way you avoid multiple heads.
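The pop/pull/push cycle above can be wrapped in a small helper. This sketch assumes the mq extension is enabled and hg is on the PATH; the run parameter is a hypothetical hook that makes the command sequence easy to inspect or test. If any command exits with a non-zero status (for instance, a patch that should fail to reapply cleanly), check=True raises CalledProcessError and the cycle stops so you can fix things by hand.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    subprocess.run(cmd, check=True)  # raise if hg reports a failure


def sync_with_local_patches(run: Runner = _run) -> None:
    """Pop all local mq patches, pull and update, then reapply them."""
    run(["hg", "qpop", "--all"])     # unapply every local patch
    run(["hg", "pull", "--update"])  # fetch new changesets and update to them
    run(["hg", "qpush", "--all"])    # reapply the patches on top
```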
https://www.mercurial-scm.org/wiki/MqExtension
We tend to use this mechanism in our office.
Note that pushing and popping work like a stack: if "localConfigs" is on top of "moreLocalChanges", you will need to push both if you wish to push "localConfigs". My example assumes that "localConfigs" is the only patch in the queue. The mq extension is disabled by default, but it comes bundled with Mercurial, so you can enable it simply by adding this to your Mercurial configuration file (e.g. ~/.hgrc):
[extensions]
mq =

tracking file-name version control in a real version control system?

Is there an established method to tell the SCM, mercurial in my case, that files of the pattern foobaz_1_2_3.csv should all be considered versions of foobaz.csv ?
In my application I rely on data tables from an external source that puts the version number in the filename. The importance of tracking changes across their versions was made painfully clear recently when I spent days troubleshooting a bug on my side of the fence, only to discover it was because they had changed some data content and notification of the change had not reached me.
If the filename were constant, Hg would have told me about the internal change immediately, and I could have responded appropriately in an hour or two, with very little stress. I could just adopt the habit of renaming foobaz_2_3_4 to foobaz myself before checking in, or of running diff old new, and one or both of those is likely what I'll do from now on.
The whole experience has me wondering, though, whether there are other methods I haven't thought of that don't mess with the external file. (For example, what if I have a downstream user who doesn't use SCM and relies on the filename plus version number, which I've thrown away?)
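The rename-before-check-in habit can be scripted without touching the vendor's file: copy it into the repository under a version-free name and leave the original, version number and all, where it is for downstream users. The filename pattern is an assumption based on the foobaz_1_2_3.csv example, and the helper names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed pattern: a base name, one or more "_<number>" groups, then the
# extension, e.g. foobaz_1_2_3.csv.
_VERSIONED = re.compile(r"^(?P<base>.+?)(?:_\d+)+(?P<ext>\.[^.]+)$")


def stable_name(filename: str) -> str:
    """foobaz_1_2_3.csv -> foobaz.csv; other names are returned unchanged."""
    m = _VERSIONED.match(filename)
    return m.group("base") + m.group("ext") if m else filename


def import_release(src: Path, repo_dir: Path) -> Path:
    """Copy a vendor file into the repo under its version-free name."""
    dest = repo_dir / stable_name(src.name)
    shutil.copyfile(src, dest)  # the original file is left untouched
    return dest
```

After each import, hg status and hg diff on foobaz.csv show exactly what the vendor changed.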
If you get data in a file whose name changes with every release and whose contents may change, you can:
1. Store the data file under version control (Mercurial is fine).
2. Replace the old file with the new one every time.
3. Run hg addremove -s nnn (see hg help addremove; nnn is a similarity percentage from 0 to 100). It will detect the probable rename and record the new file as a continuation of the old file's history.
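Choosing the -s threshold is easier if you can estimate how similar two releases are. This sketch uses Python's difflib line matching as a rough stand-in; it is not Mercurial's own similarity computation, so treat its numbers only as a guide. The function names are hypothetical.

```python
import difflib
from pathlib import Path


def similarity_percent(old: Path, new: Path) -> float:
    """Rough line-based similarity (0-100) between two text files."""
    a = old.read_text().splitlines()
    b = new.read_text().splitlines()
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def looks_like_rename(old: Path, new: Path, threshold: float) -> bool:
    """Mimic the idea behind `hg addremove -s threshold`, not its algorithm."""
    return similarity_percent(old, new) >= threshold
```

If consecutive releases typically score well above your threshold, addremove should pair them up; a release that rewrites most of the table may fall below it and be recorded as an unrelated add and remove.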

Same file in multiple changelists in perforce

Is there any way to have the same file be part of multiple changelists in Perforce? I mean that, of the changed lines in the file, one subset would belong to one changelist while the other subset would belong to a second changelist.
Bonus question: If perforce does not support this, then which Source Control Systems, if any, do?
To answer the bonus question: Git lets you stage individual hunks or lines of a file (git add -p), so the changes in one file can be split across commits.
For a comparison of the two, see this question: "GIT vs. Perforce- Two VCS will enter... one will leave."
The same copy of the file? No, unfortunately this isn't possible.
Another way to do this without branching is to create additional workspaces (clients). Unless you really know what you're doing, be sure to set a different root directory in each of your workspaces. To save time (and disk), don't bother syncing the whole depot in the new workspace.
Sometimes, I'll have two copies of a depot (using two workspaces); one which contains work-in-progress and one which I keep unmodified. If I need to make a quickie change on a file that's heavily modified in my WIP workspace, I can use the 'virgin' workspace to make the change and submit it.
If you are using Perforce server 2009.2 or later, there is a workaround: you can shelve a particular file, and the changes are stored on the server. After shelving, you may want to revert the file to its original version and then work on it in another changelist.
I know this is not the way you wanted it, but it is quite easy to create another workspace/client and then sync the code. The latter approach becomes more tedious when you have large volumes of code that go into another application.
For more info read:
http://blog.perforce.com/blog/?p=1872
http://www.perforce.com/perforce/doc.current/manuals/cmdref/shelve.html
You could make a copy of the file with all of the changes and revert the file. Then edit it, copy one set of changes from the copy into the file, and submit; edit again, copy the next set of changes, submit, and so on.
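The copy/revert/copy-back loop above can be partly automated by applying only chosen hunks of the fully edited copy to the original file. A minimal sketch using difflib; the helper name and the hunk numbering are assumptions, and you would still p4 edit the file, write the result into it, and submit between rounds.

```python
import difflib


def apply_selected_hunks(original: list[str], modified: list[str],
                         keep: set[int]) -> list[str]:
    """Return `original` with only the chosen change hunks from `modified`.

    Hunks are numbered 0, 1, 2, ... in file order. Hunks not in `keep`
    stay as in `original`, so they can go into a later changelist.
    """
    result: list[str] = []
    hunk = 0
    matcher = difflib.SequenceMatcher(None, original, modified)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(original[i1:i2])
            continue
        result.extend(modified[j1:j2] if hunk in keep else original[i1:i2])
        hunk += 1
    return result
```

For example, keep={0} produces a file containing only the first hunk for the first changelist; after submitting it, running the helper again against the saved copy with the remaining hunks kept produces the second changelist's content.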
Bonus answer: I found this feature in Rational Team Concert (http://www-03.ibm.com/software/products/en/rtc/). You can have the same file in many changesets. If you want to add File1 to Changeset1 and Changeset2, you must complete Changeset1 first. This allows you to add File1 to Changeset2, but it creates a dependency between the changesets, so you cannot deliver Changeset2 without also delivering Changeset1. Moreover, you cannot make changes to a completed changeset.