File version control of binaries/executables that use resource metadata

I have a large repository that builds many files such as DLLs and EXEs. These files are built with resource information embedded in them, such as file version, company name and so on.
If I wish to make an upgrade package that includes only files that have been updated, I have to know whether any changes have been made to the built binaries/executables. This causes two problems.
Version control software like Perforce handles source code well, but source code in one project does not necessarily mean it doesn't end up in another project's DLL.
Because we build binaries with version information applied at compile time, the files "look" different if they have different file version numbers even when the underlying source code is identical.
So basically, how do you manage file versions of binaries/executables in an automated process in order to determine which of the built files differ from the previous version?
One option could be to run the build process twice: first build with the old version number for comparison, then build with the new version number for actual use. Does anyone have a better suggestion or applications to recommend?
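For illustration, a minimal sketch in Python of the comparison step for that build-twice approach (the folder names are hypothetical): hash the previously released binaries and the binaries rebuilt with the old version number, then report which files actually changed.

import hashlib
from pathlib import Path

def hash_tree(root, patterns=("*.dll", "*.exe")):
    # Map each binary's relative path to a SHA-256 digest of its contents.
    hashes = {}
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            hashes[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes

previous_release = hash_tree("previous_release")        # binaries shipped last time
rebuilt = hash_tree("rebuilt_with_old_version_number")  # fresh build, old version number
changed = [name for name, digest in rebuilt.items()
           if previous_release.get(name) != digest]
print("Files to include in the upgrade package:", changed)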

If possible, my main suggestion would be to improve the version string generation. If you can't determine from the version string what source the binary was built from, what's the point of even having a version string? Ideally if the source doesn't change, the version string doesn't change either.
If it's not possible for your actual binaries to have useful version strings, my suggestion would be to embed that information (i.e. what source files were used and what the latest change to those source files was) in the change description when you submit the binaries -- that way you can at least look it up after the fact, and can use that information to compare binaries.
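As a rough illustration of that suggestion, assuming Perforce: something like the following could look up the latest submitted change under a project's source path and fold it into the version string or the submit description. The depot path and the version scheme are made up.

import subprocess

def latest_change(depot_path="//depot/myproject/src/..."):
    # "p4 changes -m1 <path>" prints something like:
    # "Change 12345 on 2009/06/01 by user@client 'description'"
    out = subprocess.run(["p4", "changes", "-m1", depot_path],
                         capture_output=True, text=True, check=True).stdout
    return out.split()[1]

change = latest_change()
version_string = f"1.0.0.{change}"  # embed the change number in the file version
print(version_string)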

Related

Do you put your development/runtime tools in the repository?

Putting development tools (compilers, IDEs, editors, ...) and runtime environments (JRE, .NET framework, interpreters, ...) under version control has a couple of advantages. First, you can easily compile/run your program just by checking out your repository; you don't need anything else. Second, the combination is known to be version-compatible, since you once tested it that way. However, it has its own drawbacks. The main one is the large volume of big binary files that must be put under version control, which may make the VCS slower and the backup process harder. What's your idea?
Tools and dependencies actually used to compile and build the project, absolutely - it is very useful if you ever have to debug an issue or develop a fix for an older version and you've moved on to newer versions that aren't quite compatible with the old ones.
IDEs and editors, no - ideally your project should be buildable from a script, so these would not be necessary. The generated output should be the same regardless of what you used to edit the source.
I include a text (and thus easily diff-able) file in every project root called "How-to-get-this-project-running" that includes any and all things necessary, including the correct .net version and service packs.
Also, for proprietary IDEs (e.g. Visual Studio), there can be licensing issues, as this makes it difficult to manage who is using which pieces of software.
Edit:
We also used to store batch files in source control that automatically checked out the source code and all dependencies. Developers just check out the "Setup" folder and run the batch scripts, instead of having to search the repository for the appropriate bits and pieces.
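The same idea, sketched as a small Python script instead of a batch file (the repository URLs and folder names are hypothetical): one checkout per piece, so a new developer runs a single script and gets everything in a predictable layout.

import subprocess

CHECKOUTS = {
    "src":      "http://svn.example.com/repos/myapp/trunk",
    "3rdparty": "http://svn.example.com/repos/3rdparty/trunk",
    "tools":    "http://svn.example.com/repos/buildtools/trunk",
}

for local_dir, url in CHECKOUTS.items():
    # Check out each piece into a well-known local folder.
    subprocess.run(["svn", "checkout", url, local_dir], check=True)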
What I find is very nice and common (in .Net projects I have experience with anyway) is including any "non-default install" dependencies in a lib or dependencies folder with source control. The runtime is provided by the GAC and kind of assumed.
First, you can easily compile/run your program just by checking out your repository.
Not true: it often isn't enough to just get/copy/check out a tool, instead the tool must also be installed on the workstation.
Personally I've seen libraries and 3rd-party components in the source version control system, but not the tools.
I keep all dependencies in a folder under source control named "3rdParty". I agree that this is very convenient and you can just pull down the source and get going. This really shouldn't affect the performance of the source control.
The only real drawback is that the initial size to pull down can be fairly large. In my situation anyone who pulls down the code usually runs it as well, so it is OK. But if you expect many people to pull down the source just to read it, then this can be annoying.
I've seen this done in more than one place where I worked. In all cases, I've found it to be pretty convenient.

Arguments for and against including 3rd-party libraries in version control? [closed]

I've met quite a few people lately who say that 3rd-party libraries don't belong in version control. These people haven't been able to explain to me why, so I hoped you guys could come to my rescue :)
Personally, I think that when I check out the trunk of a project, it should just work - no need to go to other sites to find libraries. Otherwise, more often than not, you end up with different developers on multiple versions of the same 3rd-party lib - and sometimes with incompatibility problems.
Is it so bad to have a libs folder up there, with "guaranteed-to-work" libraries you could reference?
In SVN, there is a pattern used to store third-party libraries called vendor branches. This same idea would work for any other SVN-like version control system. The basic idea is that you include the third-party source in its own branch and then copy that branch into your main tree so that you can easily apply new versions over your local customizations. It also cleanly keeps things separate. IMHO, it's wrong to directly include the third-party stuff in your tree, but a vendor branch strikes a nice balance.
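For the sake of illustration, here is roughly what that vendor-branch workflow looks like when scripted (Python driving the svn client; the repository URL, library name and version numbers are made up):

import subprocess

REPO = "http://svn.example.com/repos"

def svn(*args):
    subprocess.run(["svn", *args], check=True)

# 1. Import the pristine third-party drop into a vendor area.
svn("import", "libfoo-1.0/", f"{REPO}/vendor/libfoo/current",
    "-m", "Import libfoo 1.0 vendor drop")
# 2. Tag that drop so later upgrades can be diffed against it.
svn("copy", f"{REPO}/vendor/libfoo/current", f"{REPO}/vendor/libfoo/1.0",
    "-m", "Tag libfoo 1.0")
# 3. Copy the vendor code into the main tree where local patches live.
svn("copy", f"{REPO}/vendor/libfoo/1.0", f"{REPO}/myproject/trunk/libfoo",
    "-m", "Bring libfoo 1.0 into the project")

When a new upstream version arrives, you apply it to the vendor "current" area first and then merge the differences into your tree, so your local customizations survive the upgrade.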
Another reason to check in libraries to your source control which I haven't seen mentioned here is that it gives you the ability to rebuild your application from a specific snapshot or version. This allows you to recreate the exact version that someone may report a bug on. If you can't rebuild the exact version you risk not being able to reproduce/debug problems.
Yes you should (when feasible).
You should be able to take a fresh machine and build your project with as few steps as possible. For me, it's:
Install IDE (e.g. Visual Studio)
Install VCS (e.g. SVN)
Checkout
Build
Anything more has to have very good justification.
Here's an example: I have a project that uses Yahoo's YUI Compressor to minify JS and CSS. The YUI .jar files go into source control in a tools directory alongside the project. The Java runtime, however, does not -- that has become a prereq for the project, much like the IDE. Considering how popular the JRE is, it seems like a reasonable requirement.
No - I don't think you should put third party libraries into source control. The clue is in the name 'source control'.
Although source control can be used for distribution and deployment, that is not its prime function. And the arguments that you should just be able to check out your project and have it work are not realistic. There are always dependencies. In a web project, they might be Apache, MySQL, the programming runtime itself, say Python 2.6. You wouldn't pile all those into your code repository.
Extra code libraries are just the same. Rather than include them in source control for ease of deployment, create a deployment/distribution mechanism that allows all dependencies to be easily obtained and installed. This makes the steps for checking out and running your software something like:
Install VCS
Sync code
Run setup script (which downloads and installs the correct version of all dependencies)
To give a specific example (and I realise this is quite web centric), a Python web application might contain a requirements.txt file which reads:
simplejson==1.2
django==1.0
otherlibrary==0.9
Run pip install -r requirements.txt and the job is done. Then when you want to upgrade to use Django 1.1 you simply change the version number in your requirements file and re-run the setup.
The source of 3rd-party software doesn't belong (except maybe as a static reference), but the compiled binaries do.
If your build process will compile an assembly/dll/jar/module, then only keep the 3rd party source code in source control.
If you won't compile it, then put the binary assembly/dll/jar/module into source control.
This could depend on the language and/or environment you have, but for projects I work on I place no libraries (jar files) in source control. It helps to be using a tool such as Maven which fetches the necessary libraries for you. (Each project maintains a list of required jars; Maven automatically fetches them from a common repository - http://repo1.maven.org/maven2/)
That being said, if you're not using Maven or some other means of managing and automatically fetching the necessary libraries, by all means check them into your version control system. When in doubt, be practical about it.
The way I've tended to handle this in the past is to take a pre-compiled version of 3rd party libraries and check that in to version control, along with header files. Instead of checking the source code itself into version control, we archive it off into a defined location (server hard drive).
This kind of gives you the best of both worlds: a one-step fetch process that fetches everything you need, but it doesn't bog down your version control system with a pile of third-party source files you rarely touch. Also, by fetching pre-compiled binaries, you can skip that phase of compilation, which makes your builds faster.
You should definitely put 3rd-party libraries under source control. Also, you should try to avoid relying on stuff installed on an individual developer's machine. Here's why:
All developers will then share the same version of the component. This is very important.
Your build environment will become much more portable. Just install source control client on a fresh machine, download your repository, build and that's it (in theory, at least :) ).
Sometimes it is difficult to obtain an old version of some library. Keeping them under your source control makes sure you won't have such problems.
However, you don't need to add 3rd party source code in your repository if you don't plan to change the code. I tend just to add binaries, but I make sure only these libraries are referenced in our code (and not the ones from Windows GAC, for example).
We do because we want to have tested an updated version of the vendor branch before we integrate it with our code. We commit changes to this when testing new versions. We have the philosophy that everything you need to run the application should be in SVN so that
You can get new developers up and running
Everyone uses the same versions of various libraries
We can know exactly what code was current at a given point in time, including third party libraries.
No, it isn't a war crime to have third-party code in your repository, but I find that it upsets my sense of aesthetics. Many people here seem to be of the opinion that it's good to have your whole development team on the same version of these dependencies; I say it is a liability. You end up dependent on a specific version of that dependency, and it becomes a lot harder to use a different version later. I prefer a heterogeneous development environment - it forces you to decouple your code from specific versions of dependencies.
IMHO the right place to keep the dependencies is on your tape backups, and in your escrow deposit, if you have one. If your specific project requires it (and projects are not all the same in this respect), then also keep a document under your version control system that links to these specific versions.
I like to check 3rd party binaries into a "lib" directory that contains any external dependencies. After all, you want to keep track of specific versions of those libraries right?
When I compile the binaries myself, I often check in a zipped up copy of the code along side the binaries. That makes it clear that the code is not there for compiling, manipulating, etc. I almost never need to go back and reference the zipped code, but a couple times it has been helpful.
If I can get away with it, I keep them out of my version control and out of my file system. The best case of this is jQuery where I'll use Google's AJAX Library and load it from there:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js" type="text/javascript"></script>
My next choice would be to use something like Git submodules. And if neither of those suffice, they'll end up in version control, but at that point, it's only as up to date as you are...

Version control of deliverables

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up-to-date environment to build and test in. Due to the nature of the project, updates must be done often and on demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check-in updated DLL after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or may be a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project - it's more like an extensible system with a heap of "plugins", including third-party ones. We need to make sure those modules/plugins work nicely with recent versions of each other and of the core. A centralised build, as was suggested, was considered initially, but it's not an option.
I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart in deciding what part of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.
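As a sketch of that wrapper-script idea, in Python rather than a .CMD file (the source and destination paths are hypothetical; -a, -v, -z and --delete are standard rsync options):

import subprocess

SOURCE = "rsync://buildserver/deliverables/"  # central, read-mostly location
DEST = "C:/dev/deliverables/"                 # local copy of the binaries

# Pull down only what changed; --delete removes files no longer on the server.
subprocess.run(["rsync", "-avz", "--delete", SOURCE, DEST], check=True)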
Another option is unison
You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with the keeping your local files in sync, but I think you have bigger problems with your process.
Building the project should be a centralized process in order to allow for better control; otherwise your solution will be chaos in the long run. Anyway, here is what I'd do.
Create the usual repositories for source files, resources, documentation, etc. for each project.
Create a repository for resources. This will hold the latest binary versions for each project as well as any required resources, files, etc. Keep a good folder structure for each project so developers can "reference" the files directly.
Create a repository for final builds, which will hold the actual stable release. This will get the stable files, generated in an automatic way (if possible) from the checked-in sources. This will hold the real product, the real version for integration testing and so on.
While far from being perfect, you'll be able to define well-established protocols: check in your latest DLL here, generate the "real" version from the latest source here.
What about embedding a 'what' string in the executables and libraries? Then you can synchronise the desired list of versions with a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "@(#)INETOPS_filter_ip_$Revision: 1.9 $";
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see if the versions in a bundle of libraries and executables match the list in an associated manifest.
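A rough sketch of that manifest check in Python (the file names and manifest format are made up; the pattern matching is only an approximation of what the 'what' command does):

import re
from pathlib import Path

MANIFEST = {
    "filter_ip": "INETOPS_filter_ip_$Revision: 1.9 $",
}

def extract_what_strings(path):
    data = Path(path).read_bytes()
    # what-strings start with "@(#)"; grab the printable text that follows.
    return [m.group(1).decode("ascii", "replace")
            for m in re.finditer(rb"@\(#\)([ -~]+)", data)]

for name, expected in MANIFEST.items():
    found = extract_what_strings(name)
    status = "OK" if any(expected in s for s in found) else "MISMATCH"
    print(name, status)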
HTH.
cheers,
Rob
Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can push and update to) then just type "svn update" at the command line, or use TortoiseSVN: right click on the folder, click "SVN Update" and it'll update all the files and tell you what's changed.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
@Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, @Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.
We don't version files that can be automatically generated using scripts included in the repository itself. The reason is that after a checkout, these files can be rebuilt with a single click or command. In our projects we always try to make this as easy as possible, thus preventing the need to version these files.
One scenario I can imagine where this could be useful is when 'tagging' specific releases of a product for use in a production environment (or any non-development environment) where the tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. These can be uploaded to a production server, or to an HTTP server for downloading by users of your products.
I am using TortoiseSVN for small-system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense in theory to have these under source control, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.
Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
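As a sketch of such a release procedure for the pstool example (the exact generation command and file list are assumptions), a small Python script could regenerate the derived files and bundle everything into a versioned tarball:

import subprocess
import tarfile

VERSION = "1.0"
# Regenerate the derived files from the master file; the exact command that
# pstool.tex needs is assumed here.
subprocess.run(["latex", "pstool.tex"], check=True)

with tarfile.open(f"pstool-{VERSION}.tar.gz", "w:gz") as tar:
    for name in ["pstool.tex", "pstool.sty", "pstool.pdf", "README"]:
        tar.add(name)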
In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.

Solution deployment, CM, InstallShield

People,
We have 4 or 5 utilities that work in conjunction with our application. These utilities are either .bat files, or VB apps, PowerBuilder, etc. I am trying to manage these utils in source control, and am trying to figure out a better way to assign versions to them. Right now, the developers use the version control's meta-data -- specifically label -- to store the version number of the tool.
My goal is to have individual InstallShield packages for each utility, and an easy means to manage and assign version numbers to these packages.
Would you recommend a separate .ini file with the info, or store the info in InstallShield .ism file itself, or just use the meta-data info from version control tool?
UPDATE:
I like the idea, Orion. I have one concern though. The script that increments the version number cannot be intelligent enough to increment the major number etc., right? E.g. if one of the utils is at version 1.2.3 and we are at a point where the new version should be 2.0.0, the script may not be able to handle this.
I think this has to do a lot with our branching techniques -- we don't have any. The folks thought since the utils are so small, the source may not need branches.
PowerBuilder in particular has a nice trick you can do to incorporate the build number from an ini file into the compiled application.
Details here: http://www.pbdr.com/pbtips/ex/autorev.htm
We have ini file inside source control that stores the build number and its value is used in our build scripts to determine what label to apply to the source tree after a successful build. Works very nicely for our needs. When we branch, we do have to manually kick the file to increment the proper number though.
I managed our build system at my last job, which seemed to have some parallels to what you're asking.
There were ~30 C++ projects which needed compiling, and various .NET/Java things, and the odd perl script.
This was all built on our build machine using NAnt - If I were doing it today I'd use rake, but the idea is the same.
We basically had an auto-incrementing build number which was stored in a version.txt file in the root of the repository.
Each time we did a build (automatically done each night, or also on demand if necessary) the script would increment this number and check the file back into source control.
All the other apps referenced this file for their version number; for things which didn't support working like this, the script would set environment variables or perform other workarounds.
I'm pretty sure that our InstallShield programs referenced an environment variable for their version number, but we deprecated them in favour of WiX as InstallShield really did suck.
In the case of Visual Studio, grep/replace the number within the .csproj files, and check them back in.
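A sketch of that version-bump step in Python (the file layout and the XML element being patched are assumptions; checking the modified files back into source control is left out):

import re
from pathlib import Path

root = Path(".")
version_file = root / "version.txt"

# Read the current build number, bump it, and write it back.
build = int(version_file.read_text().strip()) + 1
version_file.write_text(str(build))

# Stamp the new number into every .csproj; the <Version> element name is an
# assumption and may differ in your project files.
new_version = f"1.0.0.{build}"
for proj in root.rglob("*.csproj"):
    text = proj.read_text()
    proj.write_text(re.sub(r"<Version>.*?</Version>",
                           f"<Version>{new_version}</Version>", text))
print("Stamped version", new_version)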
Hope this gives you some ideas
Using the metadata from your version control system should keep things simpler. It's how your developers already use the system. There is no additional file to maintain. My personal experience has taught me to version the satellite applications with the same version as the main app. K.I.S.S.