How do people manage changes to common library files stored across multiple (Mercurial) repositories? - version-control

This is perhaps not a question unique to Mercurial, but that's the SCM that I've been using most lately.
I work on multiple projects and tend to copy source code for libraries or utilities from a previous project to get a leg up on starting a new project. The problem comes in when I want to merge all the changes I made in my latest project, back into a "master" copy of those shared library files.
Since the files stored in disjoint repositories will have distinct version histories, Mercurial won't be able to perform an intelligent merge if I just copy the files back to the master repo (or even between two independent projects).
I'm looking for an easy way to preserve the change history so I can merge library files back to the master with a minimum of external record keeping (which is one of the reasons I'm using SVN less, since its merges require remembering when copies were made across branches).
Perhaps I need to do a bit more up-front organization of my repository to prepare for a future merge back to a common master.

Three solutions, pick your favorite:
Put all projects into one repository.
Make a separate repository for shared code and a separate repository for each project.
One repository with subrepositories: https://www.mercurial-scm.org/wiki/subrepos, keeping all common code in one subrepo and a different subrepo for each project (a short setup sketch follows below).
Copying actual files between repositories with no common ancestors will never be optimal as history is not preserved.
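For option 3, a minimal sketch of wiring up a subrepository (the repository URL and paths are only illustrative):
$ cd myproject
$ hg clone https://example.com/hg/common lib/common   # a nested clone becomes the subrepo
$ echo "lib/common = https://example.com/hg/common" > .hgsub
$ hg add .hgsub
$ hg commit -m "Track shared code as a subrepository"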

I'd recommend against your "copy the source code" practice; use binary distribution for your custom libraries instead. These binaries are checked in alongside the source code.
reduces build-time
no overhead of tracking changes in all copies of the library
you can use different versions of the same library in different projects.
EDIT: And for the issue with "common" or "toolbox" libraries in general, read this post from ayende.
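A rough sketch of that binary check-in practice, with made-up library and directory names, in a Mercurial project like the one in the question:
$ cp ../mylib/dist/mylib-1.0.1.jar libs/       # versioned build artifact from the library project
$ hg add libs/mylib-1.0.1.jar
$ hg commit -m "Upgrade to mylib 1.0.1"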

Use the transplant extension.
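A minimal sketch of what that looks like, assuming the changes you want live in a project clone at ../projectA and with a made-up revision number:
# enable the extension once in ~/.hgrc:
#   [extensions]
#   transplant =
$ cd master-libs                               # the hypothetical "master" library repository
$ hg transplant --source ../projectA 1234      # copy changeset 1234 across as a new commit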

Related

Multiple Git repositories for each Eclipse project or one Git repository

I am in the process of moving to Git from SVN. In SVN I had multiple Eclipse projects in a single SVN repository, which was convenient for browsing projects. I was going to move to one Git repository per Eclipse project, but EGit suggests doing otherwise.
The guide for EGit suggests putting multiple projects into a single Git repository.
Looking at similar questions such as this one suggests one project per repository.
Which approach is best practice and what do people implement?
It depends on how closely-related these projects are. Ask yourself the following questions:
Will they always need to be branched/tagged together?
Will you want to commit across all projects at once, or does a commit mostly touch only one project?
Does the build system operate on all of them or do they have a boundary there?
If you put them all in one, some things from above will be easier. You will only have to branch/tag/stash/commit in one repository, as opposed to doing it for every repository separately.
But if you need to have e.g. separate release cycles for the projects, then it's necessary to have each project in an independent repository.
Note that you can always split up a repository later, or combine multiple repositories into one again without losing history.
Combining is a bit harder to do than splitting, so I would go for one repository first and see how it goes.
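As a rough illustration of both directions, assuming the git subtree command is available and using made-up repository and directory names:
# split: extract one project's history onto its own branch, then clone just that branch
$ cd bigrepo
$ git subtree split --prefix=projectA -b projectA-only
$ cd ..
$ git clone -b projectA-only bigrepo projectA-repo
# combine: graft another repository into a subdirectory while keeping its history
$ cd combined-repo
$ git subtree add --prefix=projectB ../projectB master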
I use 1 repo per project.
Some reasoning:
When you discover you messed something up after several commits, it's much easier to fix when only one project is involved. Imagine you have also committed to two other projects in the meantime and now need to fix a commit you made to the third project.
As Fedir said, your history and log are much cleaner. They only show the commits for that project.
It works better with the development workflow I have: a master branch for production, a develop branch for, well, development, and feature branches to implement features (a short sketch of this workflow follows these points; you can read more about it here: http://blog.avirtualhome.com/development-workflow-using-git/)
When you work in a team, and so "share" the git repo, do the team members really need all the other projects as well?
Just a few thoughts, but what it boils down to: Do what works for you.
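For reference, a minimal sketch of that master/develop/feature workflow (the feature name is invented):
$ git checkout -b develop master              # long-lived development branch
$ git checkout -b feature/login develop       # start a feature off develop
# ...commit work on the feature...
$ git checkout develop
$ git merge --no-ff feature/login             # fold the finished feature back into develop
$ git checkout master
$ git merge --no-ff develop                   # release: bring develop into master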
I have multiple projects (Eclipse projects) and have tried different things to find out what worked best in terms of actual daily development. Here is what I found, and I think most people would find the same thing if they kept track of the results and analyzed them objectively.
In short, applying the following rules will give the best results:
Make a separate repository for each project group.
Each project group consists of a group of projects that are tightly connected to each other, that should be administered together, and that cannot be easily decoupled from each other.
A project group can contain a single project.
A project group that contains multiple projects should be examined to see if some of its projects can be decoupled from each other, so it can be split into smaller project groups that still contain projects that are tightly connected to each other, that should be administered together, and that cannot be easily decoupled from each other.
The following guidelines explain this process for determining which projects to put in the same repository in more detail:
If a project is not closely connected to any other project (for example, the project can be opened without other projects being opened, and no other project relies on it being opened when they are opened) then you should definitely place it in its own repository, for the reasons explained in the answers above this one.
If a project is dependent upon other projects, or other projects depend upon it, then it comes down to exactly how dependent they are on each other, how well they can be packaged together, and how easily they can be decoupled from each other.
A) For example, a testing project that contains JUnit test classes to test the classes of a main project is a case where the two projects are very closely connected, can easily be packaged together and cannot be easily decoupled from each other. These projects should be placed in the same repository, for the reasons explained in part C below.
B) In a case where one project relies on another project to provide some sort of shared resources, it really comes down to how well they can be administered together and how easily they can be decoupled from each other. For example, if the project with the shared resources is relied upon by many projects, then it should be put in its own repository, because the other, unrelated projects are impacted by changes to the shared source code project. In a case like this, the shared resources project should be decoupled from the dependent projects instead of being directly connected to them. (For example, it would be better to create versioned archive files [Jar files with a name like "projectName".1.0.1.0.jar, for example] and include a copy of those in each project instead of sharing the resources by linking the projects together.)
C) If the multiple projects are connected, can be easily administered together but cannot be easily decoupled from each other, then it depends upon how tightly connected they are with each other.
I) If the projects are put into one repository, then the projects will be kept in sync with each other in the repository each time there is a commit, which can be a real life saver if the projects are tightly connected. However, this also creates the issues noted in the answers above this one.
II) If the projects are put into separate repositories, then you will have to take care to keep their commits in sync with each other and be sure to include some sort of mechanism to indicate which commits belong to the same sync point across the projects (perhaps something like including the same sync point number in the comments for the commit of each project when a group of commits is done across the projects; a tiny example follows part III below).
III) So in cases like this, it is almost always better to put these projects together into a single repository to reduce the overhead of human effort in syncing the commits and to avoid human error should the commits need to be backed out. The only time that it might be better to place them in separate repositories is when only one of the projects is being changed regularly and the other connected projects are rarely changed.
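As a tiny illustration of the sync-point idea in II (the project names, messages and number are invented; an identical tag in each repository works just as well):
$ (cd projectA && git commit -am "Add pagination API (sync-point 42)")
$ (cd projectB && git commit -am "Use the new pagination API (sync-point 42)")
# or mark the point with the same tag in each repository
$ (cd projectA && git tag sync-42)
$ (cd projectB && git tag sync-42)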
I think this question is related to one I answered here. Basically, Git by its nature supports a very fine-grained structure when it comes to projects/repositories. I have read and been taught that one repository per project is almost always best practice. You lose almost nothing by keeping the projects separate and gain a lot, as others have been describing.
It will probably be more performant to work with multiple git repositories.
If you make a branch, only that project's files will be branched, not all the projects.
A small project is faster to analyze and commit; operations will take less time.
The log will also be clearer, and you can have more granular configuration if you have multiple git repositories.

How to share code across multiple repositories with Mercurial?

Over time, I developed a variety of utility functions, classes and controls that I reuse across multiple projects. For each of those projects I have a Mercurial repository, and I copy the reusable projects into it. Obviously this is bad, since if I fix a bug in one of the reusable components I have to copy the code manually into every repository, and I could make a mistake in the process.
How do you handle such a situation? How can I share code across multiple repositories with Mercurial in such a way that if I make an update in one repository, I can easily integrate it with the others?
Check out subrepositories: https://www.mercurial-scm.org/wiki/Subrepository
They won't help you keep the other copies up to date (you'll have to do that manually), but they will make that easy (you'd use hg pull; hg update in the subrepo, then commit the parent repo).
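A minimal sketch of that update cycle, assuming the shared code is a subrepo at lib/common (the path is illustrative):
$ cd myproject/lib/common
$ hg pull
$ hg update                                    # bring the subrepo up to the new tip
$ cd ../..
$ hg commit -m "Pull in latest shared code"    # the parent records the new subrepo revision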
Another option (which I use on a different project) is to mandate a layout, then simply assume that the "utilities" repository is stored at ../utils, relative to each "real" repository.

Assets repository

I use Mercurial as my main SCM tool. Before that I worked with SVN and others, but all of these systems are designed specifically to work with plain-text code.
My assets and documents are binaries and are usually big files (10~50 MB each), and my problem is not about keeping track of what has changed exactly, but I want at least the basic functionality that SCMs give me for code: tracking what files have changed (who, when, comments...), historical backup and distributed sync.
What tools do you use for this and how do you link changes in assets with changes in code if you are using different repository kinds?

Splitting up a project into multiple smaller projects

I have a library containing a few classes. Now I want to split up this library into two separate libraries. What is the correct/best way to handle this in combination with source control?
My initial thought is to create a new repository for each new project and, in the initial commit, mention that it was split off from a now unmaintained project.
While I only have a few commits so far, an issue with this method is that the history of the project is lost.
It depends on which version control you are using. For instance, in git you can use filter-branch to do the trick.
You can make a copy of the original repository, then use git filter-branch to keep the history of the part you are interested in and drop everything else.
$ git filter-branch --subdirectory-filter mydir1 -- --all   # rewrite all refs so only mydir1's history remains
$ git gc --aggressive                                       # repack the repository
$ git prune                                                 # drop the now-unreachable objects
Beware: this is destructive. You will see a considerable reduction in the repository size, as only the history of mydir1 is kept and all the unreachable objects are removed.
Then, repeat the same for other libraries/subdirectories. In that way, you will keep only the history that belongs to each part/library/directory.
If you are using a different version control system, then you have to figure out the equivalent way to do it.
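In Mercurial, for instance, the rough equivalent is the convert extension with a filemap (a sketch; the directory name mirrors the git example above):
# the convert extension must be enabled ([extensions] convert = in ~/.hgrc)
$ printf 'include mydir1\nrename mydir1 .\n' > filemap.txt
$ hg convert --filemap filemap.txt originalrepo mydir1-repo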
The rule of thumb I follow depends on whether you will be developing and/or deploying the libraries independently. If you are separating the libraries simply for code organization and the code is deployed as a single solution, then there is no need or benefit to creating separate repositories.
On the other hand, if you will be versioning and releasing the libraries independently, then having the code in separate repositories helps. So, for instance, if you are separating the code because some of it belongs in a shared framework, then put the framework code in its own repository. This will allow you to maintain, build and release the framework separately from any applications that are built using it.
HTH
You can create a new repository, but you can also create the new projects under the same repository and delete the old one in time. Actually, that's up to you. If you see the previous project as test-level or pre-alpha work, you may want to create a new repository; other than that, using the same repository is a reasonable choice for this situation.

Working with folders in RCS

I have been following the tutorial http://www.burlingtontelecom.net/~ashawley/rcs/tutorial.html on how to work with files using RCS. This works well but only with one file. Is there a way to create an RCS file with directories as well?
I have a project folder called myproject, and in this directory I have all my files for that project. I want to create a revision control system for the myproject folder and all its files that are inside.
As William's comment says, RCS only works with single files. (It also doesn't seem to be particularly suitable for multiple-user stuff.)
Of course, nothing stops you from putting each (source) file in a directory under RCS control; in fact, this is essentially what CVS does (though in recent versions it handles the RCS data itself, rather than invoking RCS to do it as it used to do). Unfortunately, this fragments the change history rather badly; a commit affecting many files ends up as separate commits to each file, which just happen to have the same commit message (and timestamp?), and in general every file will have a different revision in what the user might like to think of as the "same" revision. (This makes tags quite essential.) CVS also has issues with the atomicity of commits: you could end up with commit A and commit B getting tangled up, such that in file foo commit A precedes commit B, but in file bar commit B precedes commit A!
SVN (Subversion) is an attempt to rectify some of the problems in CVS, though it also brings some new limitations, and keeps many of the existing ones; it is probably wiser (as William implies) to just use a distributed version control system (DVCS) for your multi-file projects. There are many choices:
Darcs uses a unique patch-based model: a repository is treated as a sequence of patches, which can be applied to an empty tree to build the current revision; patches can often be reordered by "commuting" pairs of patches, and cherry-picking patches from other repositories is quite easy. The downside is that the change history is a bit less clear than in most DVCSes. See http://wiki.darcs.net/Using/Model, http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory.
Directed-acyclic-graph (DAG) based DVCSes model a repository as a directed acyclic graph of revisions, where each revision can have one parent, two parents, or perhaps more. Each revision has an associated file tree state; sometimes renames are also tracked somehow.
Git, as already mentioned. Has a very simple model, but a very complicated interface: there are many commands, some of which are not really intended for humans to use (owing to many parts of it having been prototyped in shell script, probably), so it can be hard to find the ones you want. Also, its model might be a bit too simple: it doesn't track renames at all.
Bazaar (a.k.a. bzr) has a more complicated model, including support for file/directory renames. It's difficult to say how much more complicated, though, because whatever documentation may exist is not nearly as accessible as Git's. It does, however, have a rather simpler user interface, and there are a number of useful plugins, including a distributed-development-friendly SVN plugin: committing from a branch back to SVN need not interfere with the validity of others' branches of your branches, and bzr metadata is even committed back to SVN. Can make things much less painful if you want to start hacking on an SVN-based project without having commit access, but hope to get your changes committed eventually. Bazaar is my personal favorite DAG-based DVCS.
Mercurial (a.k.a. hg) seems fairly similar to Bazaar, though I think it tracks renames only for individual files, not for directories. It also supports plugins, though its SVN plugin isn't as nice as Bazaar's: it doesn't support lossless commits, so branching from other peoples' branches is unwise. I don't have much experience with it, so I can't really evaluate it in-depth.
As the comments already mention, if you are starting out with version control, you would be well advised to choose a newer system than RCS (git, mercurial, fossil, subversion, ...). That said, RCS still works fine for a single developer working primarily on a single machine - I still use it for my own code because I've not yet worked out how to get the (20+ years of) history I want into git in the way I want it.
Anyway, to use RCS, make sure you have an RCS sub-directory in each directory where you have working source code under RCS management. The RCS files will be placed in the sub-directory automatically, and retrieved automatically. If your version of make is not already aware of RCS, you can train it so that it is - or get a version of make that already is (GNU Make, for example).
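A minimal sketch of that per-directory arrangement (file names are illustrative):
$ cd myproject
$ mkdir RCS                  # ci/co will store the ,v history files here automatically
$ ci -u main.c               # initial check-in; -u keeps a read-only working copy
$ co -l main.c               # lock and check out for editing
$ ci -u main.c               # check in the next revision
$ rlog main.c                # show the revision history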
TL;DR - Look into DCVS as an alternative to RCS. It builds on CVS, which itself uses RCS, but it is more modular and better suited to a distributed repository with a hierarchy of directories.
I'm currently going through a similar issue, and may have found something worthy of note, especially for people who are being forced to use a light, command-line based revision control systems with multiple team members.
My manager will not get off this idea of using RCS as our version control. But for the specifications, he wants developers to be able to create and edit on their own repository on a localized server within our company. Two issues with this:
RCS does not create, nor hold, any sort of 'repository'. It is software that keeps track of file edits on a per-file basis, meaning that the 'repository' is nothing more than another directory with RCS checked-in files. This is sub-par for team-oriented projects, to say the least.
On a large project with multiple directories and tens of individual working files, even the prospect of creating a top-level RCS directory with symbolic links in the working directories gives rise to complications such as naming conventions, as well as forgetting which file came from which bottom-level working directory.
As SamB posted, even CVS introduces additional problems on top of RCS that we now have to account for, but it gives us a slight ability to add some hierarchy. One suggestion he forgot, though, was DCVS.
It's nothing more than an extension of CVS and CVSup that:
contains functionality to distribute CVS repositories with local lines of development and automatically handles synchronization of the distributed repositories in the background.