Buildfile with trunk/branch or separate - version-control

The issue that I am trying to figure out is whether it is best to keep the buildfile together with the source code, i.e. trunk and branches, or in some separate location (obviously still under SCM).
The question: (to keep in mind while reading the rest of the text) Buildfile that ensures branch re-buildability at any time (at expense of maintenance), or one that ensures latest bugfixes/improvements (to build process) are quickly used by multiple branches and different projects (at expense of backward compatibility with older branches)
It doesn't matter what we are building, or what technologies are used, but just for the sake of completeness: making a mobile application, building/packaging with ANT, using SVN for SCM.
Buildfile = instructions for your builder/compiler/packager to compile and package an application from source.
Buildfile with code
This is what we have right now. Ant's build.xml is stored alongside the main code in SVN. A number of other supporting "packaging" files (Apple's provisioning profiles and certs) are stored there as well.
Pros:
Single checkout from DEV perspective. When developers checkout the trunk or one of its branches, the buildfile is right there. They don't need to search for it elsewhere. A simple ant build after checkout is all they need.
When changes to the build/packaging process are done on trunk that require some reorganization in code (different file locations, support for compiler constants, etc), I need not worry about breaking existing branches, since each branch gets its own revision.
Cons:
When changes to the build/packaging process are done on trunk that improve the process and fix bugs, I now need to worry about merging those changes to all active branches, which means having to keep track of all dev/feature branches in addition to release branches.
No reusability. A technologically identical project, that only requires a few switches/property changes to the buildfile should be able to use identical buildfile. But because they are spread across multiple project locations (in addition to multiple branches, as from the point above), it becomes a nightmare to do a generic improvement that affects all those locations. Mainly due to the fact that no matter what, these files end up with little "patch-works" here and there, and eventually with conflicting merges and ever-so-slightly different processes that cannot be resolved without putting one of the projects on hold and modifying that process to "catch up" with the other.
Buildfile separate from code
To address the cons of the previous scenario in regards to re-using a single file and avoid a plethora of small fixes all over the place, I was thinking of keeping the build file separate. Shareable between the trunk, branches and other similar projects.
Pros:
Single file to modify, improve and bugfix, reusable by multiple other projects.
Cons:
No "single checkout" for DEVs (but it can be solved with svn externals or other linking solution
Breaking old/existing builds. Since there is only one version of the file now, introducing an improvement that requires code restructuring would make it incompatible with older branches. When that older branch needs to be rebuild (urgent fix to already released software), the build file will no longer work. Yes, it's solvable by getting a previous revision of the file, however:
It is not directly obvious which previous revision had worked with this branch
The older revision may be missing some other critical bugfixes to the build process.
Toss up question
So for me it is a toss up between making my life easier and only maintaining one file for bugfixes and improvements, and thus ensuring that projects use identical processes, latest bugfixes to the build process, etc. Or making developer's life easier by providing a single point of checkout, and ensuring branch "stability/re-buildability" because the buildfile that's checked in with the branch is guaranteed to work with that branch.
Is there a proper way for this? What is the proper way for this? Am I approaching this wrong?

I would suggest tagging your code with a version number every time you build. IF you ever want to roll back to a revision , you can always create a new branch from the tag of that revision and use the build.xml from that revision
You can also publish all your artefacts into a revision named directory. If you want to rollback you can just use the artefacts from this directory instead of going through the build process again .
Your build file should always be part of your code . That way a developer can check out his/her code into an IDE and start building from there for his local testing.
If you have env related configurations that are different per environment you should separate it out into a deployment configuration that is used when the code is deployed.
If you want to reuse your build files across projects , you can create a master build file with macros that is fetched prior to starting a build. The only thing you need to do is override the macros in your local project if you want to override the default behaviour

Related

Is publishing a package to npm necessary when a file that is ignored by npm is changed and published to GitHub?

The project contains a test folder that is ignored by npm but is not ignored by GitHub. When a change occurs in a file under the test folder, should it be also published to npm in order to keep versions matching? Also, in that case, semantic versioning should be increased while there is no change for npm.
Assume that there is a repo in GitHub which has a test folder that is ignored by npm. It also has the package.json file which is tracking version number inside the repo.
Q1. When a change occurred in a file under the test folder, should the patch version number be increased?
Q2. Somehow (if the answer of Q1 is yes, as it is in the first question but there might be other similar cases), when a minor version increase happened in the package.json file, but nothing is changed in the files on npm side, what should be done?
Last edit first:
The short answer to your question is no. The repo will march on ahead of whatever is published in your feeds until you cut a new release.
So the problem here is that you apparently track version numbers either in a file in your repo, or in the commit labels/tags. Don't do that, it's pointless. I know it's a very common practice (I've made that mistake myself), but it is born out of lazy thinking. Repo's are not the right kind of database to track this sort of thing. Only published packages need to have any kind of version label on them. The data-flow should be from repo => build system => package repository. The arrows should never be reversed.
When you apply that rule, your question is moot. Your test content is a separate package feed from your release content (NPM). It already has repo hashes, that are unique, unambiguous and immutable. Those hashes should flow to the build system, then to the package feed/repository system.
There is always a difference in the release date/time stamps between test and product development. Product releases always lag test releases. The purpose of a particular version of the test suite, is to validate a product release. So you should always see test suite X.Y.Z+repoHashN, precede product version X.Y.Z+repoHashN. Note that the values of X, Y and Z for test and product, need not have the same values (a product patch, might be the result of fixing known bug surfaced by the test suite), but there should always be a 'repoHashN' that uniquely matches a test version triple and product version triple.
You produce two products, an app and a test suite, from the same repo, and most of the tooling out there tracks a version per branch that is applied to all packages produced from that branch. I can't directly address any specific NPM behavioral issues, I don't use it enough, but I am fairly sure you are not encountering a bug, rather you are most likely using it incorrectly.
Because your two products develop on separate cadences, you should consider maintaining a test branch for versioning test code. There's a seemingly endless variety of possibilities for your workflows here.
I recommend using a master or main branch for the tip of all development. Always cut prerelease (-<#>+master.<repoHash>) versions from this branch, and you might as well only ever bump either the patch number or the prerelease tag number (NPM supports either scenario). Then, when you are ready to cut a release, you fork master or main into a release branch (named after the target major.minor version) and cut only release packages out of that branch. Because your test code version is not relevant to the release package version, you don't need to track that specifically, it's always updated when you merge from main/master or test/dev branches to cut the next patch level release. The patch level of the release branch only gets bumped when you are ready to release it to the public.
Do test development in a test branch to hide the churn that doesn't need to show up in the master/main history. A dozen "BUG:### WIP" entries aren't needed there so you squash merge from test to master or main. Same thing holds for product code development. Do that in a Dev branch. Test and dev branches should only ever cut packages with something like a -<#>+<devNam>.<repoHash> tag on them and only published to private feeds.
The purpose of master/main branch is to provide frequent (at least daily) build and test cycles that include merged content from test and dev branches. This is where you maintain ground truth. At any given time you should be able to safely fork a new branch from here and count on it to build a product that meets your quality specs. This allows you the freedom to cut release branches at whatever cadence you need, independent of the current state of the test or dev branches. Test and dev, should attempt to merge their work into master or main, at least daily, and immediately fix any merge conflicts or build failures.
In the case where test == dev, you can combine them into a single branch. Generally, each developer will have many branches in progress, for independent work on various tasks. Having them merge directly to main or master is a valid workflow and even considered a best practice in many shops. Sometimes, too many cooks in the kitchen however, can cause problems, and when you find your dev's are spending too much time resolving merge conflicts with master or main. You'll want separate branches for different lines of effort, for them to stage their work into, before taking it to master. Then flow into master or main isn't quite so random, and will be easier to manage.

Project files under version control?

I work on a large project where all the source files are stored in a version control except the project files. This was the lead developer's decision. His reasoning was:
Its to time consuming to reconcile the differences among developers' working directories.
It allows developers to work independently until their changes are stable
Instead, a developer initially gets a copy of a fellow developer's project files. Then when new files are added each developer notifies all the rest about the change. This strikes me as far more time consuming in the long run.
In my opinion the supposed benefits of not tracking changes to the project files are outweighed by the danger. In addition to references to its needed source files each project file has configuration settings that would be very time consuming and error prone to reproduce if it became corrupted or there was a hardware failure. Some of them have source code embedded in them that would be nearly impossible to recover.
I tried to convince the lead that both of his reasons can be accomplished by:
Agreeing on a standard folder structure
Using relative paths in the project files
Using the version control system more effectively
But so far he's unwilling to heed my suggestions. I checked the svn log and discovered that each major version's history begins with an Add. I have a feeling he doesn't know how to use the branching feature at all.
Am I worrying about nothing or are my concerns valid?
Your concerns are valid. There's no good reason to exclude project files from the repository. They should absolutely be under version control. You'll need to standardize on a directory structure for automated builds as well, so your lead is just postponing the inevitable.
Here are some reasons to check project (*.*proj) files into version control:
Avoid unnecessary build breaks. Relying on individual developers to notify the rest of the team every time the add, remove or rename a source file is not a sustainable practice. There will be mistakes and you will end up with broken builds and your team will waste valuable time trying to determine why the build broke.
Maintain an authoritative source configuration. If there are no project files in the repository, you don't have enough information there to reliably build the solution. Is your team planning to deliver a build from one of your developer's machines? If so, which one? The whole point of having a source control repository is to maintain an authoritative source configuration from which you build and deliver releases.
Simplify management of your projects. Having each team member independently updating their individual copies of your various project files gets more complicated when you introduce project types that not everyone is familiar with. What happens if you need to introduce a WiX project to generate an MSI package or a Database project?
I'd also argue that the two points made in defense of this strategy of not checking in project files are easily refuted. Let's take a look at each:
Its to time consuming to reconcile the differences among developers' working directories.
Source configurations should always be setup with relative paths. If you have hard coded paths in your source configuration (project files, resource files, etc.) then you're doing it wrong. Choosing to ignore the problem is not going to make it go away.
It allows developers to work independently until their changes are stable
No, using version control lets developers work in isolation until their changes are stable. If you each continue to maintain your own separate copies of the project files, as soon as someone checks in a change that references a class in a new source file, you've broken everyone on the team until they stop what they're doing and carefully update their project files. Compare that experience with just "getting latest" from source control.
Generally, a project checked out of SVN should be working, or there should be tools included to make it work (e.g. autogen.sh). If the project file is missing or you need knowledge about which files should be in the project, there is something missing.
Automatically generated files should not be in SVN, as it is pointless to track the changes to these.
Project files with relative path belong under source control.
Files that don't: For example in .Net, I would not put the .suo (user options) web.config (or app.config under source control. You may have developers using different connection strings, etc.
In the case of web.config, I like to put a web.config.example in. That way you copy the file to web.config upon initial checkout and tweak what settings you'd like. If you add something that needs to be added to all web.config, you merge those lines into the .example version and notify the team to merge that into their local version.
I think it depends on the IDE and configuration of the project. Some IDEs have hard-coded absolute paths and that's a real problem with multiple developers working on the same code with different local copies and configurations. Avoid absolute path references to libraries, for example, if you can.
In Eclipse (and Java), it's fine to commit .project and .classpath files (so long as the classpath doesn't have absolute references). However, you may find that using tools like Maven can help having some independence from the IDE and individual settings (in which case you wouldn't need to commit .project, .settings and .classpath in Eclipse since m2eclipse would re-create them for you automatically). This might not apply as well to other languages/environments.
In addition, if I need to reference something really specific to my machine (either configuration or file location), it tend to have my own local branch in Git which I rebase when necessary, committing only the common parts to the remote repository. Git diff/rebase works well: it tends to be able to work out the diffs even if the local changes affect files that have been modified remotely, except when those changes conflict, in which case you get the opportunity to merge the changes manually.
That's just retarded. With a set up like that, I can have a perfectly working project containing files that are subtly different from everyone else. Imagine the havoc this would cause if someone accidentally propagates this mess into QA and everyone is trying to figure out what's going on. Imagine the catastrophe that would ensue if it ever got released to the production environment...!

SVN Branching in Eclipse (Conceptual)

I understand the basic concept of a branch and merge. All of the explanations I've found talk about branching your entire trunk to create a branch project and working on it and then merging it back. Is it possible to branch a subset of a project?
I think an example will help me explain best what I want to do. Suppose I have an application with ten files file0 through file10. All files are interdependent and to be able to test any one file all the others need to be included in the build. I want to work on file0 but don't need to make changes to file1 through file10. Can I branch file0 so changes committed to file0 will update something like myrepos/branches/a-branch/file0 but all the other files in my working copy will simply be from the trunk?
The reason I want to do this is that I'm working on a huge j2ee application with tens of thousands of files and it seems like branching the entire thing will take a really long time. Also, I'm using eclipse with subclipse (and I could be wrong about this) but it seem like if I branch a project in eclipse then I will have to set up a new eclipse project to point to the branch. Unfortunately importing this particular project from SVN to eclipse takes several hours due to the size of the application. It isn't realistic for me to spend this much time.
I suppose that I could have the concepts wrong. Perhaps branching an entire project doesn't require a new working copy at all?
Thanks for any light shed on this issue.
Branching an entire (even) very large tree in Subversion is a very cheap operation, which does lazy (O(1) time) file copying.
You don't necessarily have to change your entire working copy to work on just one changed file. You can use svn switch to switch one file or one directory in your working copy to be a checked out version of the file on the branch.
In Subversion, making a branch is simply making a copy of a hierarchy of directories. Therefore, you can branch a subset, but only if that subset can be defined by a hierarchy of directories.
Can I
branch file0 so changes committed to
file0 will update something like
myrepos/branches/a-branch/file0 but
all the other files in my working copy
will simply be from the trunk?
To answer this question: No, you can't branch a single file. However, what I think you want to do instead is to make a branch and work on file0 there. As you make changes to trunk files, you simply merge them into your branch where you're working on file0.
In this way, you'll always have the latest information from trunk, which will let you test the file0 changes independently of trunk. Then you can use svn switch to move your "file lens" between the trunk and the branch (but beware, Eclipse may complain about such shenanigans).
svn branching is based on lazy copy mechanism, so you can branch safely your all project: that would not take long.
As mentioned in the the question "How do I branch an individual file in SVN?", you could branch a subset, but I believe this would be dangerous with the svn:merginfo properties mechanism: it works better it that property is set from the root of the project.
Branching in SVN is an O(1) operation. Also, as SVN internally employs lazy copying, you only pay a space penalty for what you change.
So if you are unsure, why not go ahead and branch the whole project?
(As quark mentioned, one problem with branching big projects is that, if you checkout several branches/the trunk in parallel, this might take a lot of local disk space.)

Best practices, building trunk against trunk?

We have many projects that use a common base of shared components (dlls).
Currently the development build for each project links against dlls built from the trunk of the components. (ie trunk builds use the dlls from other trunk builds)
When we do a release build, we have a script that goes through the project files and replaces the trunk references to specific numbered versions of the components (that are built from a tagged branch)
I think this weakens the testing that we do during development because the project that I am actually working on is using diferent dlls to what the release build will be using. I would like to always develop against the numbered versions of the components and only ever update them when there is a specific need.
However others in the team argue that unless we develop against trunk (and update to the newer versions of the components with each release) we will have the problem that (a) our products will hardly ever update to the newer version of the components then (b) when we do need to update it will be a huge task because the component source/interfaces will have changed so much.
What practices do you follow, and why?
Edit: Sorry all, I have just realised I have confused things by mentioning that there are several main products sharing components - although they share the components they don't run on the same PCs. My concern relates to the fact the because the components are likely to change with each release of a product (even though there was no specific requirement to update the component) that testing would miss some subtle change that was done in a component and not related to the specific work being done on the product.
Hmm, I may be in a minority here, but this comes down to release management.
Developing against the trunk of a set of shared components means, by definition, that the components are a "moving target" -- a developer using those shared components won't necessarily know if a newly found defect or failure is due to the project code or the shared components, which leads to a loss of productivity, IMNSHO.
The "shared components" have a release cycle all their own. Give your other developers a break and fix the version of the shared components that the projects are going to use and use tags, labels or branches to identify the shared component release. On the next iteration for the projects, bump up to the latest "stable" or "production" build of the shared components.
There's another "smell" here, if you'll pardon the expression. Having "shared components" whose "source/interfaces will have changed so much" between project releases sounds like the components aren't so solid or shouldn't necessarily be shared.
See also the answer to this question Shared components throughout all projects, is there a better alternative than svn:externals?
You should have strong interfaces that rarely change, so changing versions shouldn't be that hard.
Separating the versions and working against specific versions will increase overhead when you need to change, but it should also encourage less interface changes overall, which will help in the long term.
We develop against multiple branches and trunk simultaneously and we have chosen to build and test each branch with the code we'll be pushing out to production. I don't think it is safe any other way.
Basically, if a developer is working on trunk, all they have to do is worry about building from trunk and committing code to trunk.
Any developer working on a branch needs to build and test off that branch (there are multiple projects all branched/tagged the same build/release). When they commit changes to that branch, they must also merge those individual changes into trunk.
We expect all developers to be familiar with SCM (SVN) and to be capable of maintaining multiple branches of code. As a team we handle major framework shifts or huge code changes to minimize troublesome merging.
Two things here. First, I think you're right; you want to build against the most current development versions, not against the old versions. If you haven't already, you will see a situation in which the build-for-release blows up and you have to do an all-nighter cleaning up the diffs.
I'm personally a fan of the "commit to trunk, release from branch" model anyway. All commits go to the trunk, overnight builds or CI builds are against the trunk, and people create branches freely. When you have a trunk that meets acceptance criteria, tag a release candidate, BUT KEEP MAKING UPDATES TO THE TRUNK. If you hae a long release cycle, then you might have changes for release n+1 being added to the trunk, but ideally you should just shorten your release cycle instead. If there are changes to the trunk that shouldn't be in the released version, AND you have a problem that requires changes, create a branch against the tagged version --- and make sure you merge any changes back to the trunk once you have an actual release.
We are using the scons building system, and have our own file in the root directory which specifies what version of each library we're going to use when building the application.
That reduces the need to change version names in several locations like you mentioned.
Whether (b) is a valid argument depends on how often your shared components change and by how much. If they change often in your workplace, it might be a valid that you are "forced" to develop off the newest version. Whether that in itself is a problem is a valid question.
However, from your side of things, I don't see how you can push code into production without it being tested against the shared components being used in production. Do you do a second test cycle against the release build? Do you just pray that nothing breaks? Frankly, (b) can be reversed in these cases to support your point of view: If the trunk is different enough from the most recent tagged branch, then that effort has to be made to ensure your app works properly with it.
If your shared components are tagged often enough, then your colleagues are probably right, and it's easier to manage the incremental changes from the most recent tag to the trunk than it is to manage the change from arbitrary version X (determined at the last build) to arbitrary version Y (determined when you choose to upgrade).

The theory (and terminology) behind Source Control

I've tried using source control for a couple projects but still don't really understand it. For these projects, we've used TortoiseSVN and have only had one line of revisions. (No trunk, branch, or any of that.) If there is a recommended way to set up source control systems, what are they? What are the reasons and benifits for setting it up that way? What is the underlying differences between the workings of a centralized and distributed source control system?
Think of source control as a giant "Undo" button for your source code. Every time you check in, you're adding a point to which you can roll back. Even if you don't use branching/merging, this feature alone can be very valuable.
Additionally, by having one 'authoritative' version of the source control, it becomes much easier to back up.
Centralized vs. distributed... the difference is really that in distributed, there isn't necessarily one 'authoritative' version of the source control, although in practice people usually still do have the master tree.
The big advantage to distributed source control is two-fold:
When you use distributed source control, you have the whole source tree on your local machine. You can commit, create branches, and work pretty much as though you were all alone, and then when you're ready to push up your changes, you can promote them from your machine to the master copy. If you're working "offline" a lot, this can be a huge benefit.
You don't have to ask anybody's permission to become a distributor of the source control. If person A is running the project, but person B and C want to make changes, and share those changes with each other, it becomes much easier with distributed source control.
I recommend checking out the following from Eric Sink:
http://www.ericsink.com/scm/source_control.html
Having some sort of revision control system in place is probably the most important tool a programmer has for reviewing code changes and understanding who did what to whom. Even for single person projects, it is invaluable to be able to diff current code against previous known working version to understand what might have gone wrong due to a change.
Here are two articles that are very helpful for understanding the basics. Beyond being informative, Sink's company sells a great source control product called Vault that is free for single users (I am not affiliated in any way with that company).
http://www.ericsink.com/scm/source_control.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Vault info at www.vault.com.
Even if you don't branch, you may find it useful to use tags to mark releases.
Imagine that you rolled out a new version of your software yesterday and have started making major changes for the next version. A user calls you to report a serious bug in yesterday's release. You can't just fix it and copy over the changes from your development trunk because the changes you've just made the whole thing unstable.
If you had tagged the release, you could check out a working copy of it and use it to fix the bug.
Then, you might choose to create a branch at the tag and check the bug fix into it. That way, you can fix more bugs on that release while you continue to upgrade the trunk. You can also merge those fixes into the trunk so that they'll be present in the next release.
The common standard for setting up Subversion is to have three folders under the root of your repository: trunk, branches and tags. The trunk folder holds your current "main" line of development. For many shops and situations, this is all they ever use... just a single working repository of code.
The tags folder takes it one step further and allows you to "checkpoint" your code at certain points in time. For example, when you release a new build or sometimes even when you simply make a new build, you "tag" a copy into this folder. This just allows you to know exactly what your code looked like at that point in time.
The branches folder holds different kinds of branches that you might need in special situations. Sometimes a branch is a place to work on experimental feature or features that might take a long time to get stable (therefore you don't want to introduce them into your main line just yet). Other times, a branch might represent the "production" copy of your code which can be edited and deployed independently from your main line of code which contains changes intended for a future release.
Anyway, this is just one aspect of how to set up your system, but I think giving some thought to this structure is important.