Version control of deliverables - version-control

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up to date environment to build and test at. Due to nature of the project, updates must be done often and on-demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check-in updated DLL after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or may be a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project - more like extensible system with a heap of "plugins", including thrid-party ones. We need to make sure those modules-plugins works nicely with recent versions of each other and the core. Centralised build as was suggested was considered initially, but it's not an option.

I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart in deciding what part of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.

Another option is unison

You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with the keeping your local files in sync, but I think you have bigger problems with your process.

Building the project should be a centralized process in order to allow for better control soon your solution will be caos in the long run. Anyway here is what I'd do.
Create the usual repositories for
source files, resources,
documentation, etc for each project.
Create a repository for resources.
There will be the latest binary
versions for each project as well as
any required resources, files, etc.
Keep a good folder structure for
each project so developers can
"reference" the files directly.
Create a repository for final buidls
which will hold the actual stable
release. This will get the stable
files, done in an automatic way (if
possible) from the checked in
sources. This will hold the real
product, the real version for
integration testing and so on.
While far from being perfect you'll be able to define well established protocols. Check in your latest dll here, generate the "real" versión from latest source here.

What about embedding a 'what' string in the executables and libraries. Then you can synchronise the desired list of versions with a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "#(#)INETOPS_filter_ip_$Revision: 1.9 $";
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see if the versions in a bundle of libraries and executables match the list in a associated manifest.
HTH.
cheers,
Rob

Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can push and update to) then just type "svn update" at the command line, or use TortoiseSVN: right click on the folder, click "SVN Update" and it'll update all the files and tell you what's changed.

Related

Should I put included code under SCM?

I'm developing a web app.
If I include a jQuery plugin (or the jQuery file itself), this has to be put under my static directory, which is under SCM, to be served correctly.
Should I gitignore it, or add it, even if I don't plan on modifying anything from it?
And what about binary files (graphic resources) that might come with it?
Thanks in advance for any advice!
My view is that everything you need for your application to run correctly needs to be managed. This includes third-party code.
If you don't put it under SCM, how is it going to get deployed correctly on your production systems? If you have other ways of ensuring that, that's fine, but otherwise you run the risk that successful deployment is a matter of people remembering to do all the right things, rather than some automated low-risk "push the button" procedure.
If you don't manage it under SCM or something similar, how do you ensure that the versions you develop against and test against are the same? And that they're the same as production? Debugging an issue caused by a version difference you don't notice can be horrible.
I generally add external resources to my project directly. Doing so facilitates deployment and ensures that if someone changes the version of this file in your project, you have a clear audit history of what happened in case it causes issues in the code that you've written. Developers should know not to modify these external resources.
You could use something like git submodules, I suppose, but I haven't felt that this is worth the hassle in the past.
Binary files from external sources can be checked in to the project as well, although if they're extremely large you may want to consider a different approach.
There aren't a lot of reasons not to put external resources like jQuery into your repo:
If you pull it down from jQuery every time you check out or deploy, you have less control over which version you're using. This holds true for most third-party libraries; you probably don't want to upgrade your libraries without testing with your code to see if it breaks something.
You'll always have a complete copy of your site when you check out your repository and you won't need to go seeking resources that may have become unavailable.
For small (in terms of filesize) things like jQuery and images, I'd just add them unless you're really, really concerned about space.
It depends.
These arguments relate to having a copy of the library on your system and not pulling it from it's original location.
Arguments in favour:
It will ensure that everything needed for your project can be found in one place when someone else joins your development team. I've lost count of the number of times I've had to scramble around looking for the right versions of libraries in order to be able to get something working.
If you make any modifications to the library you can make these changes to the source controlled version so when a new version comes out you use the source control's merging tools to ensure your edits don't go missing.
Arguments against:
It could mean everyone has a copy of the library locally - unless you map the 3rd party tools to a central server.
Deploying could be problematical - again unless you map the 3rd party tools to a central server and don't include them in the deploy script.

Do you put your development/runtime tools in the repository?

Putting development tools (compilers, IDEs, editors, ...) and runtime environments (jre, .net framework, interpreters, ...) under the version control has a couple of nice reasons. First, you can easily compile/run your program just by checking out your repository. You don't have to have anything else. Second, the triple is surely version compatible as you once tested it. However, it has its own drawbacks. The main one is the big volume of large binary files that must be put under version control system. That may cause the VCS slower and the backup process harder. What's your idea?
Tools and dependencies actually used to compile and build the project, absolutely - it is very useful if you ever have to debug an issue or develop a fix for an older version and you've moved on to newer versions that aren't quite compatible with the old ones.
IDE's & editors no - ideally you're project should be buildable from a script so these would not be necessary. The generated output should still be the same regardless of what you used to edit the source.
I include a text (and thus easily diff-able) file in every project root called "How-to-get-this-project-running" that includes any and all things necessary, including the correct .net version and service packs.
Also for proprietry IDE's (e.g. Visual Studio), there can be licensing issues as this makes it difficult to manage who is using which pieces of software.
Edit:
We also used to store batch files that automatically checked out the source code automatically (and all dependencies) in source control. Developers just check out the "Setup" folder and run the batch scripts, instead of having to search the repository for appropriate bits and pieces.
What I find is very nice and common (in .Net projects I have experience with anyway) is including any "non-default install" dependencies in a lib or dependencies folder with source control. The runtime is provided by the GAC and kind of assumed.
First, you can easily compile/run your program just by checking out your repository.
Not true: it often isn't enough to just get/copy/check out a tool, instead the tool must also be installed on the workstation.
Personally I've seen libraries and 3rd-party components in the source version control system, but not the tools.
I keep all dependencies in a folder under source control named "3rdParty". I agree that this is very convinient and you can just pull down the source and get going. This really shouldnt affect the performance of the source control.
The only real draw back is that the initial size to pull down can be fairly large. In my situation anyone who pulls downt he code usually will run it also, so it is ok. But if you expect many people to pull down the source just to read then this can be annoying.
I've seen this done in more than one place where I worked. In all cases, I've found it to be pretty convenient.

What to put under version control?

Almost any IDE creates lots of files that have nothing to do with the application being developed, they are generated and mantained by the IDE so he knows how to build the application, where the version control repository is and so on.
Should those files be kept under version control along with the files that really have something to do with the aplication (source code, application's configuration files, ...)?
The things is: on some IDEs if you create a new project and then import it into the version-control repository using the version-control client/commands embedded in the IDE, then all those files are sent to the respitory. And I'm not sure that's right: what is two different developers working on the same project want to use two different IDEs?
I want to keep this question agnostic avoiding references to any particular IDE, programming language or version control system. So this question is not exactly the same as these:
SVN and binaries - but this talks about binaries and SVN
Do you keep your build tools in version control? - but this talks about build tools (e.g. putting the jdk under version control)
What project files shouldn’t be checked into SVN - but this talks about SVN and dll's
Do you keep your project files under version control? - very similar (haven't found it before), thanks VonC
Rules of thumb:
Include everything which has an influence on the build result (compiler options, file encodings, ASCII/binary settings, etc.)
Include everything to make it possible to open the project from a clean checkout and being able to compile/run/test/debug/deploy it without any further manual intervention
Don't include files which contain absolute paths
Avoid including personal preferences (tab size, colors, window positions)
Follow the rules in this order.
[Update] There is always the question what should happen with generated code. As a rule of thumb, I always put those under version control. As always, take this rule with a grain of salt.
My reasons:
Versioning generated code seems like a waste of time. It's generated right? I can get it back at a push of a button!
Really?
If you had to bite the bullet and generate the exact same version of some previous release without fail, how much effort would it be? When generating code, you not only have to get all the input files right, you also have to turn back time for the code generator itself. Can you do that? Always? As easy as it would be to check out a certain version of the generated code if you had put it under version control?
And even if you could, could you ever be sure that didn't miss something?
So on one hand, putting generated code under version control make sense since it makes it dead easy to do what VCS are meant for: Go back in time.
Also it makes it easy to see the differences. Code generators are buggy, too. If I fix a bug and have 150'000 files generated, it helps a lot when I can compare them to the previous version to see that a) the bug is gone and b) nothing else changed unexpectedly. It's the unexpected part which you should worry about. If you don't, let me know and I'll make sure you never work for my company ever :-)
The major pain point of code generators is stability. It doesn't do when your code generator just spits out a random mess of bytes every time you run (well, unless you don't care about quality). Code generators need to be stable and deterministic. You run them twice with the same input and the output must be identical down to least significant bit.
So if you can't check in generated code because every run of the generator creates differences that aren't there, then your code generator has a bug. Fix it. Sort the code when you have to. Use hash maps that preserve order. Do everything necessary to make the output non-random. Just like you do everywhere else in your code.
Generated code that I might not put under version control would be documentation. Documentation is somewhat of a soft target. It doesn't matter as much when I regenerate the wrong version of the docs (say, it has a few typos more or less). But for releases, I might do that anyway so I can see the differences between releases. Might be useful, for example, to make sure the release notes are complete.
I also don't check in JAR files. As I do have full control over the whole build and full confidence that I can get back any version of the sources in a minute plus I know that I have everything necessary to build it without any further manual intervention, why would I need the executables for? Again, it might make sense to put them into a special release repo but then, better keep a copy of the last three years on your company's web server to download. Think: Comparing binaries is hard and doesn't tell you much.
I think it's best to put anything under version control that helps developers to get started quickly, ignoring anything that may be auto-generated by an IDE or build tools (e.g. Maven's eclipse plugin generates .project and .classpath - no need to check these in). Especially avoid files that change often, that contain nothing but user preferences, or that conflict between IDEs (e.g. another IDE that uses .project just like eclipse does).
For eclipse users, I find it especially handy to add code style (.settings/org.eclipse.jdt.core.prefs - auto formatting on save turned on) to get consistently formatted code.
Everything that can be automatically generated from the source+configuration files should not be under the version control! It only causes problems and limitations (like the one you stated - using 2 different project files by different programmers).
Its true not only for IDE "junk files" but also for intermediate files (like .pyc in python, .o in c etc).
This is where build automation and build files come in.
For example, you can still build the project (the two developers will need the same build software obviously) but they then could in turn use two different IDE's.
As for the 'junk' that gets generated, I tend to ignore most if it. I know this is meant to be language agnostic but consider Visual Studio. It generates user files (user settings etc..) this should not be under source control.
On the other hand, project files (used by the build process) most certainly should. I should add that if you are on a team and have all agreed on an IDE, then checking in IDE specific files is fine providing they are global and not user specific and/or not needed.
Those other questions do a good job of explaining what should and shouldn't be checked into source control so I wont repeat them.
In my opinion it depends on the project and environment. In a company environment where everybody is using the same IDE it can make sense to add the IDE files to the repository. While this depends a bit on the IDE, as some include absolute paths to things.
For a project which is developed in different environments it doesn't make sense and will be pain in the long run as the project files aren't maintained by all developers and make it harder to find "relevant" things.
Anything that would be devastating if it were lost, should be under version control.
In my opinion, anything needed to build the project (code, make files, media, databases with required program info, etc) should be in repositories. I realise that especially for media/database files this is contriversial, but to me if you can't branch and then hit build the source control's not doing it's job. This goes double for distributed systems with cheap branch creation/merging.
Anything else? Store it somewhere different. Developers should choose their own working environment as much as possible.
From what I have been looking at with version control, it seems that most things should go into it - e.g. source code and so on. However, the problem that many VCS's run into is when trying to handle large files, typically binaries, and at times things like audio and graphic files. Therefore, my personal way to do it is to put the source code under version control, along with general small sized graphics, and leave any binaries to other systems of management. If it is a binary that I created myself using the build system of the IDE, then that can definitily be ignored, because it is going to be regenerated every build. For dependancy libraries, well this is where dependancy package managers come in.
As for IDE generated files (I am assuming these are ones that aren't generated during the build process, such as the solution files for Visual Studio) - well, I think it would depend on whether or not you are working alone. If you are working alone, then go ahead and add them - they will allow you to revert settings in the solution or whatever you make. Same goes for other non-solution like files as well. However, if you are collaborating, then my recomendation is no - most IDE generated files tend to be, well, user specific - aka they work on your machine, but not neccesarily on others. Hence, you may be better of not including IDE generated files in that case.
tl;dr you should put most things that relate to your program into version control, excluding dependencies (things like libraries, graphics and audio should be handled by some other dependancy management system). As for things directly generated by the IDE - well, it would depend on if you are working alone or with other people.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
#Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, #Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.
We don't version files that can be automatically generated using scripts included in the repository itself. The reason for this is that after a checkout, these files can be rebuild with a single click or command. In our projects we always try to make this as easy as possible, and thus preventing the need for versioning these files.
One scenario I can imagine where this could be useful if 'tagging' specific releases of a product, for use in a production environment (or any non-development environment) where tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. This can be uploaded to a production server, or a HTTP server for downloading by users of your products.
I am using Tortoise SVN for small system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense to have these source-code versioned in theory, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.
Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.

Storing third-party libraries in source control

Should libraries that the application relies on be stored in source control? One part of me says it should and another part say's no. It feels wrong to add a 20mb library that dwarfs the entire app just because you rely on a couple of functions from it (albeit rather heavily). Should you just store the jar/dll or maybe even the distributed zip/tar of the project?
What do other people do?
store everything you will need to build the project 10 years from now.I store the entire zip distribution of any library, just in case
Edit for 2017:
This answer did not age well:-). If you are still using something old like ant or make, the above still applies. If you use something more modern like maven or graddle (or Nuget on .net for example), with dependency management, you should be running a dependency management server, in addition to your version control server. As long as you have good backups of both, and your dependency management server does not delete old dependencies, you should be ok. For an example of a dependency management server, see for example Sonatype Nexus or JFrog Artifcatory, among many others.
As well as having third party libraries in your repository, it's worth doing it in such a way that makes it easy to track and merge in future updates to the library easily (for example, security fixes etc.). If you are using Subversion using a proper vendor branch is worthwhile.
If you know that it'd be a cold day in hell before you'll be modifying your third party's code then (as #Matt Sheppard said) an external makes sense and gives you the added benefit that it becomes very easy to switch up to the latest version of the library should security updates or a must-have new feature make that desirable.
Also, you can skip externals when updating your code base saving on the long slow load process should you need to.
#Stu Thompson mentions storing documentation etc. in source control. In bigger projects I've stored our entire "clients" folder in source control including invoices / bills/ meeting minutes / technical specifications etc. The whole shooting match. Although, ahem, do remember to store these in a SEPARATE repository from the one you'll be making available to: other developers; the client; your "browser source view"...cough... :)
Don't store the libraries; they're not strictly speaking part of your project and uselessy take up room in your revision control system. Do, however, use maven (or Ivy for ant builds) to keep track of what versions of external libraries your project uses. You should run a mirror of the repo within your organisation (that is backed up) to ensure you always have the dependencies under your control. This ought to give you the best of both worlds; external jars outside your project, but still reliably available and centrally accessible.
We store the libraries in source control because we want to be able to build a project by simply checking out the source code and running the build script. If you aren't able to get latest and build in one step then you're only going to run into problems later on.
never store your 3rd party binaries in source control. Source control systems are platforms that support concurrent file sharing, parallel work, merging efforts, and change history. Source control is not an FTP site for binaries. 3rd party assemblies are NOT source code; they change maybe twice per SDLC. The desire to be able to wipe your workspace clean, pull everything down from source control and build does not mean 3rd party assemblies need to be stuck in source control. You can use build scripts to control pulling 3rd party assemblies from a distribution server. If you are worried about controlling what branch/version of your application uses a particular 3rd party component, then you can control that through build scripts as well. People have mentioned Maven for Java, and you can do something similar with MSBuild for .Net.
I generally store them in the repository, but I do sympathise with your desire to keep the size down.
If you don't store them in the repository, the absolutely do need to be archived and versioned somehow, and your build system needs to know how to get them. Lots of people in Java world seem to use Maven for fetching dependencies automatically, but I've not used I, so I can't really recommend for or against it.
One good option might be to keep a separate repository of third party systems. If you're on Subversion, you could then use subversion's externals support to automatically check out the libraries form the other repository. Otherwise, I'd suggest keeping an internal Anonymous FTP (or similar) server which your build system can automatically fetch requirements from. Obviously you'll want to make sure you keep all the old versions of libraries, and have everything there backed up along with your repository.
What I have is an intranet Maven-like repository where all 3rd party libraries are stored (not only the libraries, but their respective source distribution with documentation, Javadoc and everything). The reason are the following:
why storing files that don't change into a system specifically designed to manage files that change?
it dramatically fasten the check-outs
each time I see "something.jar" stored under source control I ask "and which version is it?"
I put everything except the JDK and IDE in source control.
Tony's philosophy is sound. Don't forget database creation scripts and data structure update scripts. Before wikis came out, I used to even store our documentation in source control.
My preference is to store third party libraries in a dependency repository (Artifactory with Maven for example) rather than keeping them in Subversion.
Since third party libraries aren't managed or versioned like source code, it doesn't make a lot of sense to intermingle them. Remote developers also appreciate not having to download large libraries over a slow WPN link when they can get them more easily from any number of public repositories.
At a previous employer we stored everything necessary to build the application(s) in source control. Spinning up a new build machine was a matter of syncing with the source control and installing the necessary software.
Store third party libraries in source control so they are available if you check your code out to a new development environment. Any "includes" or build commands that you may have in build scripts should also reference these "local" copies.
As well as ensuring that third party code or libraries that you depend on are always available to you, it should also mean that code is (almost) ready to build on a fresh PC or user account when new developers join the team.
Store the libraries! The repository should be a snapshot of what is required to build a project at any moment in time. As the project requires different version of external libraries you will want to update / check in the newer versions of these libraries. That way you will be able to get all the right version to go with an old snapshot if you have to patch an older release etc.
Personally I have a dependancies folder as part of my projects and store referenced libraries in there.
I find this makes life easier as I work on a number of different projects, often with inter-depending parts that need the same version of a library meaning it's not always feasible to update to the latest version of a given library.
Having all dependancies used at compile time for each project means that a few years down the line when things have moved on, I can still build any part of a project without worrying about breaking other parts. Upgrading to a new version of a library is simply a case of replacing the file and rebuilding related components, not too difficult to manage if need be.
Having said that, I find most of the libraries I reference are relatively small weighing in at around a few hundred kb, rarely bigger, which makes it less of an issue for me to just stick them in source control.
Use git subprojects, and either reference from the 3rd party library's main git repository, or (if it doesn't have one) create a new git repository for each required library. There's nothing reason why you're limited to just one git repository, and I don't recommend you use somebody else's project as merely a directory in your own.
store everything you'll need to build the project, so you can check it out and build without doing anything.
(and, as someone who has experienced the pain - please keep a copy of everything needed to get the controls installed and working on a dev platform. I once got a project that could build - but without an installation file and reg keys, you couldn't make any alterations to the third-party control layout. That was a fun rewrite)
You have to store everything you need in order to build the project.
Furthermore different versions of your code may have different dependencies on 3rd parties.
You'll want to branch your code into maintenance version together with its 3rd party dependencies...
Personally what I have done and have so far liked the results is store libraries in a separate repository and then link to each library that I need in my other repositories through the use of the Subversion svn:externals feature. This works nice because I can keep versioned copies of most of our libraries (mainly managed .NET assemblies) in source control without them bulking up the size of our main source code repository at all. Having the assemblies stored in the repository in this fashion makes it so that the build server doesn't have to have them installed to make a build. I will say that getting a build to succeed in absence of Visual Studio being installed was quite a chore but now that we got it working we are happy with it.
Note that we don't currently use many commercial third-party control suites or that sort of thing much so we haven't run into licensing issues where it may be required to actually install an SDK on the build server but I can see where that could easily become a problem. Unfortunately I don't have a solution for that and will plan on addressing it when I first run into it.