How to share code across multiple repository with Mercurial? - version-control

Over time, I developed a variety of utility functions, classes and controls that I reuse across multiple projects. For each of those projects I have a Mercurial repository and I copy the re-usable projects. Obviously this is bad since if I fix a bug in one of the reusable components, I have to copy the code manually in all repository and I could make a mistake in the process.
How do you handle such situation? How to share code across multiple repository with Mercurial in such way that if I do an update in one repository, I can easily integrate with the others.

Check out subrepositories: https://www.mercurial-scm.org/wiki/Subrepository
They won't help you keep the other copies up to date (you'll have to do that manually), but they will make that easy (you'd use hg pull; hg update in the subrepo, then commit the parent repo).
Another option (which I use on a different project) is to mandate a layout, then simply assume that the "utilities" repository is stored at ../utils, relative to each "real" repository.

Related

Tracking in-house libraries in subrepositories

We are developing an in-house framework that several projects are going to use.
The idea is to have the entire framework tracked as a mercurial
subrepository of each project's repository. This resulted in the following
subrepo tree (see thin-shell
repository):
ProjectMaster/
Project/
CommonLib/
FrameworkMaster/
Framework/
CommonLib/
Does this make any sense to you? Is there a better/simpler way to handle these
dependencies which doesn't involve subrepositories?
Specifically, does it make sense to have both CommonLib subrepositories?
If not, would it make sense for Project to use FrameworkMaster/CommonLib? This
could get messy if the dependencies were more complicated.
Where would you open feature branches? On the master? Only in the relevant
subrepository?
If you don't have feature branches on the master, every time you clone the
repository you end up getting the subrepo state of the last commit, which may put
any subrepo in any random feature branch. Very confusing.
If you have feature branches on the master, you still need a feature branch
in at least one subrepo to avoid having unnamed heads there.
In general, this solution sounds difficult to work with. Any suggestions?
As per description in the thin-shell repository documentation:
For a thin-shell repository, all repositories containing 'real' code
have no subrepositories of their own (ie. they are leaf nodes). They can
thus be treated as completely ordinary repositories and a developer can
largely ignore the additional complexities of subrepositories. Work can
continue in these repositories even if their siblings become unavailable.
The resulting structure you have described has nested subrepositories that contains 'real' code and hence its not the recommended way. As per mercurial documentation, the recommended structure would be as following (I don't know if /FrameworkMaster/ was included as just place holder for nested subrepositories or it also have 'real' code. If /FrameworkMaster/ also has 'real' code then it should also be included in following as sibling leaf node):
ProjectMaster/
Project/
Framework/
CommonLib/
So, to answer your questions:
Does this make any sense to you? Is there a better/simpler way to handle these dependencies which doesn't involve subrepositories?
The better/simpler way would be to use the thin-shell repository to simplify the complex dependencies.
Specifically, does it make sense to have both CommonLib subrepositories?
If both Project and Framework are depending on same version or branch of CommonLib then it doesn't make sense to have it at both places. But if for some legacy reasons Project and Framework are requiring different version or branch of CommonLib then it does make sense to have CommonLib at both places.
Where would you open feature branches? On the master? Only in the relevant subrepository?
Not sure here what you mean by feature branches. Did you meant to say 'future' here?

Multiple Git repositories for each Eclipse project or one Git repository

I am in the process of moving to Git from SVN. In SVN I had multiple eclipse projects in a single SVN repository that is convenient for browsing projects. I was going to move to having one git repository per eclipse project but EGit suggests doing otherwise.
The guide for EGit suggests putting multiple projects into a single Git repository.
Looking at similar questions such as this suggest one project per repository.
Which approach is best practice and what do people implement?
It depends on how closely-related these projects are. Ask yourself the following questions:
Will they always need to be branched/tagged together?
Will you want to commit over all projects, or does a commit mostly only touch one project?
Does the build system operate on all of them or do they have a boundary there?
If you put them all in one, some things from above will be easier. You will only have to branch/tag/stash/commit in one repository, as opposed to doing it for every repository separately.
But if you need to have e.g. separate release cycles for the projects, then it's necessary to have each project in an independent repository.
Note that you can always split up a repository later, or combine multiple repositories into one again without losing history.
Combining is a bit harder to do than splitting, so I would go for one repository first and see how it goes.
I use 1 repo per project.
Some reasoning:
When you discover you messed up something after several commits, it's much easier to fix when it's just one project. Just think about, you did commits to two other projects and now you need to fix the commit you did on the 3rd project.
As Fedir said, your history and log is much cleaner. It only shows the commits for that project.
It works better with the development workflow I have. I have a master branch for production, develop branch for, well, development, and I create branches to implement features (you can read more about it here: http://blog.avirtualhome.com/development-workflow-using-git/)
When you work in a team, and so "share" the git repo, do the team members really need all the other projects as well?
Just a few thoughts, but what it boils down to: Do what works for you.
I have multiple projects (Eclipse projects) and have tried different things to find out what worked best in terms of actual daily development. Here is what I found and I think that most people would find the same thing if they kept track of the results and analyzed the results objectively.
In short applying the following rules will give the best results:
Make a separate repository for each project group.
Each project group consist of a group of projects that are tightly connected to each other, that should be administered together and that cannot be easily decoupled from each other.
A project group can contain a single project.
A project group that contains multiple projects should be examined to see if some of its projects can be decoupled from each other so it can be split into smaller project groups that are still contain projects that are tightly connected to each other, that should be administered together and that cannot be easily decoupled from each other.
The following guidelines explain this process for determining which projects to put in the same repository in more detail:
If a project is not closely connected to any other project (for example, the project can be opened without other projects being opened and no other projects relies on the project being opened when they are opened) then you should definitely place it in its own repository for the reasons explained in the answers above this one.
If a project is dependent upon other projects or other projects depend upon the project then it comes down to exactly how connected are they upon each other, how well they can be packaged together and how easily can they be decoupled from each other.
A) For example a testing project that contains junit test classes to test the classes of a main project is a case where the two projects are very connected with each other, can easily be packaged together and cannot be easily decoupled from each other. These projects should be placed in the same repository for the reasons explained in part C below.
B) In a case where one project relies on another project to provide some sort of shared resources it really comes down to how well that they can be administered together and how easily that they can be decoupled from each other. For example if the project with the shared resources is relied upon by many projects, then it should be put in its own repository because the other unrelated projects are impacted by changes to the shared source code project. In a case like this, the shared resources project should be decoupled from the dependent projects instead of being directly connected to the dependent projects. (For example, it would be better to create versioned archive files [Jar files with a name like "projectName".1.0.1.0.jar for example] and include a copy of those in each project instead of sharing the resources by linking the projects together.)
C) If the multiple projects are connected, can be easily administered together but cannot be easily decoupled from each other, then it depends upon how tightly connected they are with each other.
I) If the projects are put into one repository, then the projects will be kept in sync with each other in the repository each time there is a commit, which can be a real life saver if the projects are tightly connected. However, this also creates the issues noted in the answers above this one.
II) If the projects are put into separate repositories, then you will have to take care to keep there commits in sync with each other and be sure to include some sort of mechanism to indicate which commits belong to the same sync point across the projects (Perhaps something like including the same sync point number in the comments for the commit of each project when a group of commits is done across the projects.)
III) So in cases like this, it is almost always better to put these projects together into a single repository to reduce the overhead of human effort in syncing the commits and to avoid human error should the commits need to be backed out. The only time that it might be better to place them in separate repositories is when only one of the projects is being changed regularly and the other connected projects are rarely changed.
I think this question is related to one I answered here. basically Git by its nature supports a very fine granular structure when it comes to projects/repositories. I have read and been taught that 1 repository per project is almost always best practice. You lose almost nothing by keeping the projects separate and gain a lot as other have been describing.
Probably, it will be more performant to work with if You will create multiple git repositories.
If You will make a branch, only project's files would be branched, and not all the projects.
Small project it will be faster to analyze, to commit. Operations will take less of time.
The log will be more clear also, You could make more granulated configuration if You will have multiple git repositories.

Splitting up a project into multiple smaller projects

I have a library containing a few classes. Now I want to split up this library into two separate libraries. What is the correct/best way to handle this in combination with source control?
My initial thought is to create a new repository for each new project and in the initial commit mention that it was split of from a now unmaintained project.
While I only have a few commits so far, an issue with this method is that the history of the project is lost.
It depends on which version control you are using. For instance, in git you can use filter-branch to do the trick.
You can make a copy of the original repository, then use git filter-branch to keep the history of the part you are interested in and dropping everything else.
$ git filter-branch --subdirectory-filter mydir1
$ git gc --aggressive
$ git prune
Beware this is destructive. You will see a considerable reduction of the repository size, only having the history of mydir1 and removing all those unreachable objects.
Then, repeat the same for other libraries/subdirectories. In that way, you will keep only the history that belongs to each part/library/directory.
If you are using a different version control system, then you have to figure out the equivalent way to do it.
The rule of thumb I follow depends on whether you will be developing and/or deploying the libraries independently. If you are separating the libraries simply for code organization and the code is deployed as a single solution, then there is no need or benefit to creating separate repositories.
On the other hand, if you will be versioning and releasing the libraries independently, then having the code in separate repositories helps this. So, for instance, if you are separating the code because some of it belongs in a share framework, then put the framework code in its own repository. This will allow you to maintain, build and release the framework separate from any applications that are built using the framework.
HTH
you can create a new repository but also you can create new projects under the same repository and delete the old one in time. actually, that's up to you. if you see the previous project as test level or pre-alpha stage, you may want to create a new repository. but other than that, using the same repository is very likely for this situation.

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix/feature addition/or what not. Given your files are really independent "projects" it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues in best practices to commit changes only for a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most to follow the practice and the few obstinate ones just stop taking their commits for.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.

How do people manage changes to common library files stored across mutiple (Mercurial) repositories?

This is perhaps not a question unique to Mercurial, but that's the SCM that I've been using most lately.
I work on multiple projects and tend to copy source code for libraries or utilities from a previous project to get a leg up on starting a new project. The problem comes in when I want to merge all the changes I made in my latest project, back into a "master" copy of those shared library files.
Since the files stored in disjoint repositories will have distinct version histories, Mercurial won't be able to perform an intelligent merge if I just copy the files back to the master repo (or even between two independent projects).
I'm looking for an easy way to preserve the change history so I can merge library files back to the master with a minimum of external record keeping (which is one of the reasons I'm using SVN less as merges require remembering when copies were made across branches).
Perhaps I need to do a bit more up-front organization of my repository to prepare for a future merge back to a common master.
Three solutions, pick your favorite:
Put all projects into one repository.
Make a separate repository for shared code and different repository for each project.
One repository with Subrepositories: https://www.mercurial-scm.org/wiki/subrepos, keep all common code in one subrepo and different subrepos for each project.
Copying actual files between repositories with no common ancestors will never be optimal as history is not preserved.
I'd recommend against your "copy the sourcecode" practice but use binary distribution for your custom libraries instead. These binaries are checked in along the sourcecode.
reduces build-time
no overhead of tracking changes in all copies of the library
you can use different versions of the same library in different projects.
EDIT: And for the issue with "common" or "toolbox" libaries in general, read this post from ayende.
use the transplant extension