SVN Branching in Eclipse (Conceptual) - eclipse

I understand the basic concept of a branch and merge. All of the explanations I've found talk about branching your entire trunk to create a branch project and working on it and then merging it back. Is it possible to branch a subset of a project?
I think an example will help me explain best what I want to do. Suppose I have an application with ten files, file0 through file9. All files are interdependent, and to be able to test any one file all the others need to be included in the build. I want to work on file0 but don't need to make changes to file1 through file9. Can I branch file0 so changes committed to file0 will update something like myrepos/branches/a-branch/file0 but all the other files in my working copy will simply be from the trunk?
The reason I want to do this is that I'm working on a huge J2EE application with tens of thousands of files, and it seems like branching the entire thing will take a really long time. Also, I'm using Eclipse with Subclipse, and (I could be wrong about this) it seems like if I branch a project in Eclipse then I will have to set up a new Eclipse project to point to the branch. Unfortunately, importing this particular project from SVN into Eclipse takes several hours due to the size of the application. It isn't realistic for me to spend this much time.
I suppose that I could have the concepts wrong. Perhaps branching an entire project doesn't require a new working copy at all?
Thanks for any light shed on this issue.

Branching an entire tree in Subversion, even a very large one, is a very cheap operation: the copy is lazy, so it takes O(1) time regardless of how many files the tree contains.
You don't necessarily have to change your entire working copy to work on just one changed file. You can use svn switch to switch one file or one directory in your working copy to be a checked out version of the file on the branch.
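A minimal sketch of those two steps, written as a small Python helper that only builds (and by default just prints) the svn command lines. The repository URL, the trunk/branches layout and the commit message are assumptions for illustration; only the path myrepos/branches/a-branch/file0 comes from the question.

```python
import shlex
import subprocess


def branch_and_switch_commands(repo_url, branch_name, file_path, message):
    """Return the svn command lines for a cheap branch plus a one-file switch.

    Assumes the conventional layout with trunk/ and branches/ directly
    under repo_url (an assumption, not something the question states).
    """
    trunk_url = f"{repo_url}/trunk"
    branch_url = f"{repo_url}/branches/{branch_name}"
    return [
        # URL-to-URL copy: the branch is created on the server,
        # the working copy is not touched.
        ["svn", "copy", trunk_url, branch_url, "-m", message],
        # Point only this one file at the branch; every other file in
        # the working copy keeps tracking trunk.
        ["svn", "switch", f"{branch_url}/{file_path}", file_path],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = branch_and_switch_commands(
    "http://svn.example.com/myrepos",  # hypothetical repository URL
    "a-branch",
    "file0",
    "Create a-branch for work on file0",
)
run_all(commands)  # dry run: prints the commands without calling svn
```

Run from the root of the trunk working copy with dry_run=False, this would leave file0 committing to the branch while the rest of the working copy stays on trunk.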

In Subversion, making a branch is simply making a copy of a hierarchy of directories. Therefore, you can branch a subset, but only if that subset can be defined by a hierarchy of directories.
"Can I branch file0 so changes committed to file0 will update something like myrepos/branches/a-branch/file0 but all the other files in my working copy will simply be from the trunk?"
To answer this question: No, you can't branch a single file. However, what I think you want to do instead is to make a branch and work on file0 there. As you make changes to trunk files, you simply merge them into your branch where you're working on file0.
In this way, you'll always have the latest information from trunk, which will let you test the file0 changes independently of trunk. Then you can use svn switch to move your "file lens" between the trunk and the branch (but beware, Eclipse may complain about such shenanigans).

SVN branching is based on a lazy-copy mechanism, so you can safely branch your whole project: that will not take long.
As mentioned in the question "How do I branch an individual file in SVN?", you could branch a subset, but I believe this would be dangerous with the svn:mergeinfo property mechanism: it works better if that property is set from the root of the project.

Branching in SVN is an O(1) operation. Also, as SVN internally employs lazy copying, you only pay a space penalty for what you change.
So if you are unsure, why not go ahead and branch the whole project?
(As quark mentioned, one problem with branching big projects is that, if you checkout several branches/the trunk in parallel, this might take a lot of local disk space.)
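To make "you only pay for what you change" concrete, here is a toy model of lazy copying in Python. It is a sketch of the general idea (a tree whose nodes are shared until modified), not Subversion's actual storage code; the Node class and the helper names are made up for illustration.

```python
class Node:
    """A directory or file node. Existing nodes are never modified in place."""

    def __init__(self, children=None, content=None):
        self.children = dict(children or {})
        self.content = content


def branch(root):
    """Branching is O(1): the branch is just another reference to the tree."""
    return root


def write_file(root, path, content):
    """Return a new root with `path` updated.

    Only the nodes along `path` are copied; every other subtree is
    shared with the original tree.
    """
    if not path:
        return Node(content=content)
    head, rest = path[0], path[1:]
    child = root.children.get(head, Node())
    new_children = dict(root.children)  # shallow copy of one directory level
    new_children[head] = write_file(child, rest, content)
    return Node(new_children, root.content)


trunk = Node({"src": Node({
    "file0": Node(content="v1"),
    "file1": Node(content="v1"),
})})

my_branch = branch(trunk)                                  # no files copied
my_branch = write_file(my_branch, ["src", "file0"], "v2")  # copies 2 nodes

print(trunk.children["src"].children["file0"].content)      # trunk is unchanged
print(my_branch.children["src"].children["file0"].content)  # branch sees the edit
print(my_branch.children["src"].children["file1"]
      is trunk.children["src"].children["file1"])           # untouched file is shared
```

In this model the cost of a branch does not depend on the number of files, and storage grows only with the files that are actually changed on the branch.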

Related

Version Control Branching

I'm using TFS version control and am very new to version control in general. The project I'm working on needs a new feature added. Would I branch the entire project or would I only branch those individual files or folders as I work on them? It seems easier to just copy the solution folder and work on an entire new copy but then if I have to go back and fix bugs in the original version I'll have to do them again in the new version. I'm a bit nervous because I've never used branching before and I don't want to screw my project up.
It's personal preference, but check out the Rangers Guide to branching here: http://blogs.msdn.com/b/visualstudioalm/archive/2012/10/17/alm-rangers-ship-the-new-branching-and-merging-guide-v2-1.aspx
For your situation, I would branch the whole solution; it is easier to develop when you have the whole solution. When you are happy, you can then merge your 'feature' branch back into the original branch. If changes have occurred on the original branch in the meantime, you may have to manually merge files during the merge operation.

Buildfile with trunk/branch or separate

The issue that I am trying to figure out is whether it is best to keep the buildfile together with the source code, i.e. trunk and branches, or in some separate location (obviously still under SCM).
The question: (to keep in mind while reading the rest of the text) Buildfile that ensures branch re-buildability at any time (at expense of maintenance), or one that ensures latest bugfixes/improvements (to build process) are quickly used by multiple branches and different projects (at expense of backward compatibility with older branches)
It doesn't matter what we are building, or what technologies are used, but just for the sake of completeness: making a mobile application, building/packaging with ANT, using SVN for SCM.
Buildfile = instructions for your builder/compiler/packager to compile and package an application from source.
Buildfile with code
This is what we have right now. Ant's build.xml is stored alongside the main code in SVN. A number of other supporting "packaging" files (Apple's provisioning profiles and certs) are stored there as well.
Pros:
Single checkout from DEV perspective. When developers checkout the trunk or one of its branches, the buildfile is right there. They don't need to search for it elsewhere. A simple ant build after checkout is all they need.
When changes to the build/packaging process are done on trunk that require some reorganization in code (different file locations, support for compiler constants, etc), I need not worry about breaking existing branches, since each branch gets its own revision.
Cons:
When changes to the build/packaging process are done on trunk that improve the process and fix bugs, I now need to worry about merging those changes to all active branches, which means having to keep track of all dev/feature branches in addition to release branches.
No reusability. A technologically identical project, that only requires a few switches/property changes to the buildfile should be able to use identical buildfile. But because they are spread across multiple project locations (in addition to multiple branches, as from the point above), it becomes a nightmare to do a generic improvement that affects all those locations. Mainly due to the fact that no matter what, these files end up with little "patch-works" here and there, and eventually with conflicting merges and ever-so-slightly different processes that cannot be resolved without putting one of the projects on hold and modifying that process to "catch up" with the other.
Buildfile separate from code
To address the cons of the previous scenario in regards to re-using a single file and avoid a plethora of small fixes all over the place, I was thinking of keeping the build file separate. Shareable between the trunk, branches and other similar projects.
Pros:
Single file to modify, improve and bugfix, reusable by multiple other projects.
Cons:
No "single checkout" for DEVs (but it can be solved with svn externals or another linking solution).
Breaking old/existing builds. Since there is only one version of the file now, introducing an improvement that requires code restructuring would make it incompatible with older branches. When that older branch needs to be rebuilt (urgent fix to already released software), the build file will no longer work. Yes, it's solvable by getting a previous revision of the file, however:
It is not directly obvious which previous revision had worked with this branch
The older revision may be missing some other critical bugfixes to the build process.
Toss up question
So for me it is a toss up between making my life easier and only maintaining one file for bugfixes and improvements, and thus ensuring that projects use identical processes, latest bugfixes to the build process, etc. Or making developer's life easier by providing a single point of checkout, and ensuring branch "stability/re-buildability" because the buildfile that's checked in with the branch is guaranteed to work with that branch.
Is there a proper way for this? What is the proper way for this? Am I approaching this wrong?
I would suggest tagging your code with a version number every time you build. If you ever want to roll back to a revision, you can always create a new branch from the tag of that revision and use the build.xml from that revision.
You can also publish all your artefacts into a revision-named directory. If you want to roll back, you can just use the artefacts from this directory instead of going through the build process again.
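A rough sketch of both ideas in Python, assuming a conventional tags/ directory in the repository and an archive folder on disk. The function names, the r<revision> naming scheme, the file name app.ipa and the paths are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def tag_url(repo_url, version):
    """URL a build would be tagged at, assuming a tags/ directory exists."""
    return f"{repo_url}/tags/{version}"


def publish_artifacts(build_dir, archive_root, revision):
    """Copy build outputs into archive_root/r<revision> and return that path.

    copytree raises FileExistsError if the target already exists, so a
    given revision can only be published once.
    """
    target = Path(archive_root) / f"r{revision}"
    shutil.copytree(build_dir, target)
    return target


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()
    (build / "app.ipa").write_text("placeholder for a packaged app")

    published = publish_artifacts(build, Path(tmp) / "archive", 1234)
    published_name = published.name
    published_files = sorted(p.name for p in published.iterdir())

print(published_name, published_files)
print(tag_url("http://svn.example.com/myrepos", "1.0.3"))
```

Rolling back then means picking the directory for the wanted revision instead of rebuilding from source.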
Your build file should always be part of your code. That way a developer can check out his/her code into an IDE and start building from there for local testing.
If you have environment-related configurations that differ per environment, you should separate them out into a deployment configuration that is used when the code is deployed.
If you want to reuse your build files across projects, you can create a master build file with macros that is fetched prior to starting a build. The only thing you need to do is override the macros in your local project if you want to override the default behaviour.

Starting to version an already medium size project

I am about to start participating in the development of a medium-sized project (~50k lines) that was until now written by a single person, and not versioned; as a result folders are cluttered with different versions of the same file (named file1, file2, file3, etc.).
I proposed to start using a VCS for it (a priori Mercurial, which is the only one I've ever used -- for my personal projects --, but I'm open to suggestions), so I'm taking any good ideas as to how to "start" the repository. E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed? Or something else?
(constructive remarks on mercurial vs bazaar vs git vs whatever are also welcome.)
Thanks for your tips.
E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed?
If the size of the repository is not a concern, then yes, that is a good starting point. Otherwise you can just commit what's actually used, and go from there.
As for which system, all DVCSes stick to the same core principles. Which one you pick is entirely subjective — the only way to truly know which one you like is to try each one.
I would say use what you are most comfortable with and what meets your needs. As far as where to start, I personally would seed the repo with the current source as is; that way you can verify that everything builds and runs as expected. You can make this initial seed a branch. That way you can always go back to your starting point before refactoring.
My approach to this was:
create a Mercurial repository in the existing project folder ("existing")
commit all project files to "existing"
create an empty repository in a different location ("new")
As files are tested and QA'd (this was necessary because there was so much dross in "existing"), pull them from "existing" to "new".
Once files have been pulled into "new", delete the corresponding files from "existing". If access is needed to these files while the migration is under way, push them back from "new" to "existing".
This gave me the advantage of putting everything under some sort of control for recovery purposes, plus control over how the project was introduced to the DVCS. Eventually "new" held everything that had been tested and approved for the project moving forward. At this point the "existing" directory could be deleted or changed into a working folder, and "new" became the actual project folder.
I think Mercurial is a good choice. Lightweight, fast, very simple to use and well-integrated with Windows (if that's the platform you're dealing with).
I would probably get rid of all the clutter before the first commit. Delete everything you don't care about, run all the necessary tests and only then do the commit.
Yes, I'm dead set against the 0-day cluttering of repos.
Granted, a 50K SLOC project isn't very big, but if you commit files you already know you won't need, they will make your repo slightly bigger.
Also, remember to check that the tree doesn't contain large binary files. If it does, get rid of them if at all possible.
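One way to run that check before the first commit is a small script that walks the tree and lists anything above a size threshold. This is only a sketch: the 1 MB default and the list of skipped directories are arbitrary assumptions, and the demo uses a temporary directory rather than a real project.

```python
import os
import tempfile
from pathlib import Path


def find_large_files(root, min_bytes=1_000_000, skip_dirs=(".hg", ".git", ".svn")):
    """Return (size, relative path) pairs for files of at least min_bytes,
    largest first. VCS metadata directories are not descended into."""
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk skips these directories entirely.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, path.relative_to(root).as_posix()))
    return sorted(hits, reverse=True)


# Demo on a throwaway tree with one small and one "large" file.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "main.py").write_text("print('hi')\n")
    (project / "data").mkdir()
    (project / "data" / "big.bin").write_bytes(b"\0" * 2000)

    large = find_large_files(project, min_bytes=1000)

print(large)
```

Anything it reports can then be deleted, moved out of the tree, or added to the ignore file before the initial commit.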

[SVN]: Synchronizing folders

I have a folder where I keep the working copy checked out with the Aptana Subversive SVN plugin. I have another folder where the copy checked out from Eclipse resides. Both Aptana and Eclipse are using the same repository, but I am using two different working folders. Sometimes I use Eclipse to work with the same set of files in the repository and sometimes I use Aptana.
I want a tool that can synchronize the two working folders automatically. Is there any free tool?
Actually, SVN is the tool to do just that. If you fight SVN, you will run into trouble: the two working copies might not be updated to the same revision, the merge tool might mess up the hidden .svn folders, and whatnot.
Why do you think you need to manually synchronize those two working copies? If you want to work on both simultaneously without disrupting other's work because you keep checking in half-baked things, consider working on a branch. Doing so, you make use of SVN, which was designed to keep two working copies in sync. If you're done with whatever you're doing, merge that branch into the trunk (or whatever branch you were working at) and throw it away.
If you feel like all this checking in might make your repository become too big, get a bigger disk to store it on. The very first time you or that tool messes up a manual merge, the bigger disk will have paid off. If you're afraid of bumping SVN's revision count without doing actual work, get a grip.
Araxis Merge has automated merge.

When to use a Tag/Label and when to branch?

Using TFS, when would you label your code and when would you branch?
Is there a concept of mainline/trunk in TFS?
A label in TFS is a way of tagging a collection of files. The label contains a bunch of files and the version of the file. It is a very low cost way of marking which versions of files make up a build etc.
A branch can be thought of as a copy of the files (of a certain version) in a different directory in TFS (with TFS knowing that this is a branch and will remember what files and versions it was a branch of).
As Eric Sink says, a branch is like a puppy. It takes some care and feeding.
Personally, I label often but branch rarely. I create a label for every build, but only branch when I know that I need to work on a historical version or that I need to work in isolation from the main line of code. You can create a branch from any point in time (and also a label) so that works well and means that we don't have branches lying around that are not being used.
Hope that helps,
Martin.
In any VCS, you usually tag when you want a snapshot of the code, to be kept as a reference for the future. You branch when you want to develop a new feature without disturbing the current code.
Andrew claims that labeling is lazier than branching; it's actually more efficient in most cases, not lazy. Labeling can allow users to grab a project at any point in time, keep a history of files changed for a version or build, and branch off of/work with the code at any point and later merge back into the main branch. Instead of what Andrew said, you're advised to only branch when more than one set of binaries is desired- when QC and Dev development are going on simultaneously or when you need to apply a hotfix to an old version, for example.
I always see labels as the lazy man's branch. If you are going to do something so significant that it requires a full-source label, then it is probably best to denote this with a branch so that all tasks associated with that effort are in an organized place with only the affected code.
Branching is very powerful, however, and something worth learning about. TFS is not the best source control but it is not the worst either. TFS does support the concept of a trunk from which all branches sprout as well.
I would recommend this as a good place to read up on best practices - at least as far as TFS is concerned.