Tracking in-house libraries in subrepositories

We are developing an in-house framework that several projects are going to use.
The idea is to have the entire framework tracked as a Mercurial
subrepository of each project's repository. This resulted in the following
subrepo tree (see thin-shell repository):
ProjectMaster/
    Project/
    CommonLib/
    FrameworkMaster/
        Framework/
        CommonLib/
Does this make any sense to you? Is there a better/simpler way to handle these
dependencies which doesn't involve subrepositories?
Specifically, does it make sense to have both CommonLib subrepositories?
If not, would it make sense for Project to use FrameworkMaster/CommonLib? This
could get messy if the dependencies were more complicated.
Where would you open feature branches? On the master? Only in the relevant
subrepository?
If you don't have feature branches on the master, every time you clone the
repository you end up getting the subrepo state of the last commit, which may put
any subrepo in any random feature branch. Very confusing.
If you have feature branches on the master, you still need a feature branch
in at least one subrepo to avoid having unnamed heads there.
In general, this solution sounds difficult to work with. Any suggestions?

As per description in the thin-shell repository documentation:
For a thin-shell repository, all repositories containing 'real' code
have no subrepositories of their own (ie. they are leaf nodes). They can
thus be treated as completely ordinary repositories and a developer can
largely ignore the additional complexities of subrepositories. Work can
continue in these repositories even if their siblings become unavailable.
The structure you have described has nested subrepositories that contain 'real' code, and hence it is not the recommended layout. As per the Mercurial documentation, the recommended structure would be as follows. (I don't know whether FrameworkMaster/ was included just as a placeholder for the nested subrepositories or whether it also has 'real' code. If FrameworkMaster/ also has 'real' code, then it should be included below as a sibling leaf node too.)
ProjectMaster/
    Project/
    Framework/
    CommonLib/
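As a rough sketch of how a flat layout like this could be declared: Mercurial reads the list of subrepositories from a `.hgsub` file in the shell repository, one `path = source` line per subrepository. The helper name and the hg.example.com URLs below are placeholders of mine, not something taken from the question.

```shell
# Sketch: write the .hgsub of a thin-shell repository.
# The source URLs are placeholders (assumptions), not real locations.
write_thin_shell_hgsub() {
    # $1 = directory of the already initialized ProjectMaster repository
    cat > "$1/.hgsub" <<'EOF'
Project = https://hg.example.com/Project
Framework = https://hg.example.com/Framework
CommonLib = https://hg.example.com/CommonLib
EOF
}
```

Once each leaf repository has been cloned into its path, running `hg add .hgsub` and then `hg commit` in ProjectMaster records the current revision of every subrepository in `.hgsubstate`.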
So, to answer your questions:
Does this make any sense to you? Is there a better/simpler way to handle these dependencies which doesn't involve subrepositories?
The better/simpler way would be to use the thin-shell repository to simplify the complex dependencies.
Specifically, does it make sense to have both CommonLib subrepositories?
If both Project and Framework depend on the same version or branch of CommonLib, then it doesn't make sense to have it in both places. But if, for some legacy reason, Project and Framework require different versions or branches of CommonLib, then it does make sense to have CommonLib in both places.
Where would you open feature branches? On the master? Only in the relevant subrepository?
I'm not sure what you mean by feature branches here. Did you mean to say 'future'?

Related

Multiple Git repositories for each Eclipse project or one Git repository

I am in the process of moving to Git from SVN. In SVN I had multiple eclipse projects in a single SVN repository that is convenient for browsing projects. I was going to move to having one git repository per eclipse project but EGit suggests doing otherwise.
The guide for EGit suggests putting multiple projects into a single Git repository.
Similar questions, such as this one, suggest one project per repository.
Which approach is best practice and what do people implement?

It depends on how closely-related these projects are. Ask yourself the following questions:
Will they always need to be branched/tagged together?
Will you want to commit over all projects, or does a commit mostly only touch one project?
Does the build system operate on all of them or do they have a boundary there?
If you put them all in one, some things from above will be easier. You will only have to branch/tag/stash/commit in one repository, as opposed to doing it for every repository separately.
But if you need to have e.g. separate release cycles for the projects, then it's necessary to have each project in an independent repository.
Note that you can always split up a repository later, or combine multiple repositories into one again without losing history.
Combining is a bit harder to do than splitting, so I would go for one repository first and see how it goes.
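For the combining direction, one possible approach in Git (my assumption; no particular method is named above) is to fetch the other repository as a remote and merge it with `--allow-unrelated-histories`, which is available since Git 2.9. The function name and the `other-repo` remote name are made up for illustration.

```shell
# Sketch: merge the full history of a second repository into the current one.
# Run from inside the repository that should receive the history.
# Assumes Git 2.9 or later (for --allow-unrelated-histories).
combine_repo() {
    other=$1    # path or URL of the repository to pull in
    branch=$2   # branch of that repository to merge, e.g. master
    git remote add other-repo "$other" &&
    git fetch other-repo &&
    git merge --allow-unrelated-histories \
        -m "Merge history of $other" "other-repo/$branch"
}
```

For example, `combine_repo ../project-b master` run inside project-a leaves the commits of both projects reachable from `git log`. If the two projects contain files at the same paths, moving each project into its own subdirectory first avoids conflicts.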

I use 1 repo per project.
Some reasoning:
When you discover you messed something up after several commits, it's much easier to fix when it's just one project. Just think about it: you made commits to two other projects, and now you need to fix the commit you made on the third project.
As Fedir said, your history and log is much cleaner. It only shows the commits for that project.
It works better with the development workflow I have. I have a master branch for production, develop branch for, well, development, and I create branches to implement features (you can read more about it here: http://blog.avirtualhome.com/development-workflow-using-git/)
When you work in a team, and so "share" the git repo, do the team members really need all the other projects as well?
Just a few thoughts, but what it boils down to: Do what works for you.

I have multiple projects (Eclipse projects) and have tried different things to find out what worked best in terms of actual daily development. Here is what I found and I think that most people would find the same thing if they kept track of the results and analyzed the results objectively.
In short applying the following rules will give the best results:
Make a separate repository for each project group.
Each project group consist of a group of projects that are tightly connected to each other, that should be administered together and that cannot be easily decoupled from each other.
A project group can contain a single project.
A project group that contains multiple projects should be examined to see if some of its projects can be decoupled from each other, so that it can be split into smaller project groups that still contain projects that are tightly connected to each other, that should be administered together and that cannot be easily decoupled from each other.
The following guidelines explain this process for determining which projects to put in the same repository in more detail:
If a project is not closely connected to any other project (for example, the project can be opened without other projects being opened, and no other project relies on it being open when they are opened), then you should definitely place it in its own repository for the reasons explained in the answers above this one.
If a project depends on other projects, or other projects depend on it, then it comes down to exactly how connected they are to each other, how well they can be packaged together and how easily they can be decoupled from each other.
A) For example a testing project that contains junit test classes to test the classes of a main project is a case where the two projects are very connected with each other, can easily be packaged together and cannot be easily decoupled from each other. These projects should be placed in the same repository for the reasons explained in part C below.
B) In a case where one project relies on another project to provide some sort of shared resources, it really comes down to how well they can be administered together and how easily they can be decoupled from each other. For example, if the project with the shared resources is relied upon by many projects, then it should be put in its own repository, because the other unrelated projects are impacted by changes to the shared source code project. In a case like this, the shared resources project should be decoupled from the dependent projects instead of being directly connected to them. (For example, it would be better to create versioned archive files [Jar files with a name like "projectName".1.0.1.0.jar for example] and include a copy of those in each project instead of sharing the resources by linking the projects together.)
C) If the multiple projects are connected, can be easily administered together but cannot be easily decoupled from each other, then it depends upon how tightly connected they are with each other.
I) If the projects are put into one repository, then the projects will be kept in sync with each other in the repository each time there is a commit, which can be a real life saver if the projects are tightly connected. However, this also creates the issues noted in the answers above this one.
II) If the projects are put into separate repositories, then you will have to take care to keep their commits in sync with each other and be sure to include some sort of mechanism to indicate which commits belong to the same sync point across the projects. (Perhaps something like including the same sync point number in the comments for the commit of each project when a group of commits is done across the projects.)
III) So in cases like this, it is almost always better to put these projects together into a single repository to reduce the overhead of human effort in syncing the commits and to avoid human error should the commits need to be backed out. The only time that it might be better to place them in separate repositories is when only one of the projects is being changed regularly and the other connected projects are rarely changed.
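If the projects do end up in separate repositories, one way to mark a sync point is sketched below. It uses Git tags rather than the commit-comment numbering suggested in II (that swap is my assumption, as is the function name): the same annotated tag is placed on the current commit of every project.

```shell
# Sketch: put the same annotated tag on the current commit of several
# repositories so that one sync point can be found in all of them.
tag_sync_point() {
    tag=$1      # sync point name, e.g. sync-0042 (hypothetical)
    shift       # the remaining arguments are repository paths
    for repo in "$@"; do
        git -C "$repo" tag -a "$tag" -m "Sync point $tag" || return 1
    done
}
```

A call such as `tag_sync_point sync-0042 ../main-project ../test-project` then lets `git checkout sync-0042` in each repository restore a matching set of commits.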

I think this question is related to one I answered here. Basically, Git by its nature supports a very fine-grained structure when it comes to projects/repositories. I have read and been taught that one repository per project is almost always best practice. You lose almost nothing by keeping the projects separate and gain a lot, as others have been describing.

It will probably perform better if you create multiple Git repositories.
When you make a branch, only that project's files are branched, not all the projects.
A small project is faster to analyze and to commit; operations take less time.
The log is also clearer, and you can configure each repository at a finer granularity.

How to share code across multiple repository with Mercurial?

Over time, I have developed a variety of utility functions, classes and controls that I reuse across multiple projects. For each of those projects I have a Mercurial repository, and I copy the reusable projects into it. Obviously this is bad: if I fix a bug in one of the reusable components, I have to copy the code manually into every repository, and I could make a mistake in the process.
How do you handle such a situation? How can I share code across multiple repositories with Mercurial in such a way that if I make an update in one repository, I can easily integrate it into the others?

Check out subrepositories: https://www.mercurial-scm.org/wiki/Subrepository
They won't help you keep the other copies up to date (you'll have to do that manually), but they will make that easy (you'd use hg pull; hg update in the subrepo, then commit the parent repo).
Another option (which I use on a different project) is to mandate a layout, then simply assume that the "utilities" repository is stored at ../utils, relative to each "real" repository.
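With a mandated layout, a build or setup script can at least fail early when the sibling checkout is missing. A minimal sketch, where the function name and the error message are my own invention and only the `../utils` path comes from the answer:

```shell
# Sketch: verify that the shared "utilities" repository sits where the
# mandated layout says it should, relative to the current repository.
require_utils() {
    utils_dir=${1:-../utils}   # default location from the mandated layout
    if [ ! -d "$utils_dir" ]; then
        echo "expected the utilities repository at $utils_dir" >&2
        return 1
    fi
}
```

Calling `require_utils || exit 1` at the top of a build script makes a missing or misplaced checkout obvious before anything else runs.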

Splitting up a project into multiple smaller projects

I have a library containing a few classes. Now I want to split up this library into two separate libraries. What is the correct/best way to handle this in combination with source control?
My initial thought is to create a new repository for each new project and mention in the initial commit that it was split off from a now unmaintained project.
While I only have a few commits so far, an issue with this method is that the history of the project is lost.

It depends on which version control you are using. For instance, in git you can use filter-branch to do the trick.
You can make a copy of the original repository, then use git filter-branch to keep the history of the part you are interested in and drop everything else.
$ git filter-branch --subdirectory-filter mydir1
$ git gc --aggressive
$ git prune
Beware this is destructive. You will see a considerable reduction of the repository size, only having the history of mydir1 and removing all those unreachable objects.
Then, repeat the same for other libraries/subdirectories. In that way, you will keep only the history that belongs to each part/library/directory.
If you are using a different version control system, then you have to figure out the equivalent way to do it.

The rule of thumb I follow depends on whether you will be developing and/or deploying the libraries independently. If you are separating the libraries simply for code organization and the code is deployed as a single solution, then there is no need or benefit to creating separate repositories.
On the other hand, if you will be versioning and releasing the libraries independently, then having the code in separate repositories helps. So, for instance, if you are separating the code because some of it belongs in a shared framework, then put the framework code in its own repository. This will allow you to maintain, build and release the framework separately from any applications that are built using the framework.
HTH

You can create a new repository, but you can also create the new projects under the same repository and delete the old one in time. Actually, that's up to you. If you see the previous project as test-level or pre-alpha stage, you may want to create a new repository. Other than that, using the same repository is the likely choice in this situation.

revision control for many unrelated files

I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?

It is considered a best practice to check in code after each bug fix, feature addition or similar change. Given that your files are really independent "projects", it seems unlikely that a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues to follow the practice of committing changes for only a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most of them to follow the practice; for the few obstinate ones, just stop taking their commits.

It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.

It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.

I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.

SVN Branching in Eclipse (Conceptual)

I understand the basic concept of a branch and merge. All of the explanations I've found talk about branching your entire trunk to create a branch project and working on it and then merging it back. Is it possible to branch a subset of a project?
I think an example will help me explain best what I want to do. Suppose I have an application with eleven files, file0 through file10. All files are interdependent, and to be able to test any one file all the others need to be included in the build. I want to work on file0 but don't need to make changes to file1 through file10. Can I branch file0 so changes committed to file0 will update something like myrepos/branches/a-branch/file0 but all the other files in my working copy will simply be from the trunk?
The reason I want to do this is that I'm working on a huge J2EE application with tens of thousands of files, and it seems like branching the entire thing will take a really long time. Also, I'm using Eclipse with Subclipse, and (I could be wrong about this) it seems like if I branch a project in Eclipse then I will have to set up a new Eclipse project to point to the branch. Unfortunately, importing this particular project from SVN into Eclipse takes several hours due to the size of the application. It isn't realistic for me to spend this much time.
I suppose that I could have the concepts wrong. Perhaps branching an entire project doesn't require a new working copy at all?
Thanks for any light shed on this issue.

Branching an entire tree in Subversion, even a very large one, is a very cheap operation: files are copied lazily, in O(1) time.
You don't necessarily have to change your entire working copy to work on just one changed file. You can use svn switch to switch one file or one directory in your working copy to be a checked out version of the file on the branch.
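A sketch of that combination, to be run from the root of a trunk working copy. The `a-branch` and `file0` names come from the question; the function name and the conventional trunk/branches repository layout are assumptions.

```shell
# Sketch: create a cheap server-side branch of the trunk, then point a
# single file of the existing working copy at the branch.
branch_one_file() {
    branch=$1   # e.g. a-branch
    file=$2     # e.g. file0
    svn copy --parents "^/trunk" "^/branches/$branch" \
        -m "Create branch $branch" &&
    svn switch "^/branches/$branch/$file" "$file"
}
```

After `branch_one_file a-branch file0`, commits to file0 go to the branch while every other file in the working copy still tracks the trunk. Running `svn switch "^/trunk/file0" file0` points the file back at the trunk.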

In Subversion, making a branch is simply making a copy of a hierarchy of directories. Therefore, you can branch a subset, but only if that subset can be defined by a hierarchy of directories.
Can I branch file0 so changes committed to file0 will update something like myrepos/branches/a-branch/file0 but all the other files in my working copy will simply be from the trunk?
To answer this question: No, you can't branch a single file. However, what I think you want to do instead is to make a branch and work on file0 there. As you make changes to trunk files, you simply merge them into your branch where you're working on file0.
In this way, you'll always have the latest information from trunk, which will let you test the file0 changes independently of trunk. Then you can use svn switch to move your "file lens" between the trunk and the branch (but beware, Eclipse may complain about such shenanigans).

SVN branching is based on a lazy copy mechanism, so you can safely branch your whole project: that would not take long.
As mentioned in the question "How do I branch an individual file in SVN?", you could branch a subset, but I believe this would be dangerous with the svn:mergeinfo property mechanism: it works better if that property is set from the root of the project.

Branching in SVN is an O(1) operation. Also, as SVN internally employs lazy copying, you only pay a space penalty for what you change.
So if you are unsure, why not go ahead and branch the whole project?
(As quark mentioned, one problem with branching big projects is that, if you checkout several branches/the trunk in parallel, this might take a lot of local disk space.)
