Abusing the word "library" - version-control

I see a lot of questions, both here on SO and elsewhere, about "maintaining
common libraries in a VCS". That is, projects foo and bar both depend on
libbaz, and the questioner is wondering how they should import the source
for libbaz into the VCS for each project.
My question is: WTF? If libbaz is a library, then foo doesn't need its
source code at all. There are some libraries that are reasonably designed
to be used in this manner (e.g. gnulib), but for the most part foo and bar
ought to just link against the library.
I guess my thinking is: if you cut-and-paste source for a library into
your own source tree, then you obviously don't care about future updates
to the library. If you care about updates, then just link against the
library and trust the library maintainers to maintain a stable API.
If you don't trust the API to remain stable, then you can't blindly
update your own copy of the source anyway, so what is gained?
To summarize the question: why would anyone want to maintain a copy of a
library in the source code for a project rather than just linking against
that library and requiring it as a dependency?
If the only answer is "don't want the dependency", then why not just
distribute a copy of the library along with your app, but keep them
totally separate?

The use of vendor branches to control 3rd party dependencies is discussed in some depth in the Subversion book. As I understand it, the basic advantages are guaranteeing a stable API and uniformity of libraries for all developers, and the ability to control custom modifications in house in the same versioning system.

On the project I'm working on right now, we've got the main code (which is in one Subversion project) and a host of assorted libraries from various places that are in their own Subversion modules. The Visual Studio solution maintains separate projects for each of them and links them together at the end. If we were working on Unix or similar OSs, we'd do the same thing.
The only downside I see is that I sometimes forget to update one of the libraries that changes more frequently, and my code doesn't compile until I notice that. If we had the libraries in the same module, then we wouldn't have that problem. (Not that I'd ever do it that way. The gains in flexibility and the ability to use different libraries with different main projects are just too great.)
The API is a red herring here: either it stays the same or it changes, and if it changed we'd have to update the main code either way. So is the question of source vs. binary libraries (either we compile them with the main project, or we don't).

Related

Need a sensible version-control scheme for shared library code

Using Matlab for development and Mercurial for version-control, how do I properly version all code for each of my projects, when they share some common classes and functions?
My current scheme addresses this imperfectly; I have a repository for each project and a repository for the common library. This necessitates a manual manipulation protocol, including:
Manually referencing the project name/version in the commit description for the library
Manually updating the changeset for both the project and library, if reverting to a previous state
This has worked reasonably well so far, but does run the risk of human error in following the protocol and unintended consequences of a library modification on another project. The latter can be addressed with hg update -r on the library, but is error-prone since I have to remember to go back, as I move between projects.
Searching here (and elsewhere), I thought I had found salvation in sub-repository branches, only to discover the practice is basically frowned upon and considered a feature of last resort.
I then found that some folks eschew direct versioning with the project in favor of treating the library as a package for the build software to manage. Taking the library off the Matlab path, creating version clones and telling the builder which one to use, for any particular project, is a brilliant idea - except that I also use the Matlab interpreter to run/debug my code, as well as use the library in various scripts, so I need the library on the Matlab path - which means the builder will automatically pick up the library version that's on the path.
The only other scheme I can think of is to copy the library dependencies into the Project folder for revision control by the project repository. A change protocol would have to include copying the affected library class/function back to the library folder and typing an appropriate commit message. The trick here would be in manually updating project copies of library files, unless there's a Mercurial command to selectively pull from a foreign repo.
Does anybody have a better, more robust way to manage shared library code among projects in both interpreter and build environments?
Thanks to everyone who commented and to those who took the time to read my question. I am loath to ask questions, since I never think my queries are so novel as to be previously unasked. But in this case, I was finding it hard to come up with the right search terms/phrase; hence the less-than-concise phrasing of my question.
I still don't know if there's a standard approach to managing software configuration for a project, when it includes non-project-specific dependencies, but the scheme I've decided to adopt is outlined below. I should say that the development framework I'm using is Matlab, which may well be argued isn't a terribly good framework for developing a GUI application, but it's the only one I have for now. Should I move to .Net, or some other framework, then maybe some of the issues I'm having would be much more readily resolved.
I decided the ability to version a project in its entirety took precedence and so I copied all of the project-agnostic dependencies (that is, functions and classes that I've developed) from a central library repo to a folder within the project repo.
It just means I have to be disciplined in managing the Matlab search path, as well as the protocol for copying changes made to these dependencies back to the central library - and for polling the library for any changes that originated from another project.
This doesn't seem elegant, but it does make me think more carefully about the functions and interfaces that I put into the library, which should be a good thing.
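The "polling the library for changes" step is the part most prone to human error, so it may be worth scripting. Below is a minimal sketch in Python (not Matlab or Mercurial tooling), assuming the project keeps its copy of the library in one folder and the central library lives in another working copy. The function name find_drift and the folder layout are hypothetical, not part of the scheme described above.

```python
import filecmp
from pathlib import Path
from typing import Dict, List, Set


def find_drift(project_copy: Path, central_lib: Path) -> Dict[str, List[str]]:
    """Compare a project's copy of the shared library with the central library.

    Returns relative paths grouped as:
      'changed'      - present in both trees but with different contents
      'only_project' - present only in the project's copy
      'only_central' - present only in the central library
    """
    def relative_files(root: Path) -> Set[str]:
        found = set()
        for path in root.rglob("*"):
            rel = path.relative_to(root)
            # Skip Mercurial metadata so only library sources are compared.
            if path.is_file() and ".hg" not in rel.parts:
                found.add(rel.as_posix())
        return found

    project_files = relative_files(project_copy)
    central_files = relative_files(central_lib)
    common = sorted(project_files & central_files)

    # shallow=False compares file contents instead of trusting size and mtime.
    _, mismatch, errors = filecmp.cmpfiles(
        str(project_copy), str(central_lib), common, shallow=False
    )

    return {
        "changed": sorted(mismatch + errors),
        "only_project": sorted(project_files - central_files),
        "only_central": sorted(central_files - project_files),
    }
```

Running this before each commit lists the files that need to be copied one way or the other. It does not decide which side is newer; that still needs the commit history of both repositories.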

Should libs and frameworks be subject to a version control repository?

Our software project uses various libs (popular as well as specialised ones) and a framework in its build process. The libs never change, whereas the framework may be updated to a newer version from time to time.
For further development we want to use a version control system. Which of these solutions is the most elegant one:
1. The full project with all libs and the framework gets uploaded to the version control system's repository, so everyone has exactly the same files, but the use of space in the repository is enormous.
2. Only the artifacts of the project that actually change over time (the main program) are in the repository. The libs and the framework are stored on a central NAS, so the files could be used for other projects, too.
3. Like 2, but everyone has a copy of the libs and the framework in their local workspace.
To my taste, solution 2 or 3 sounds better, because I think that the repository should be as light as possible. What do you recommend?
This is obviously a matter of opinion, but my opinion is that the most important characteristic of version control is the ability to reproduce source at a particular point. That includes libraries. There are downsides (boost is huge, for example), but it guarantees that everyone gets the same code, especially in the case of obsolete or unsupported libraries.
You can have both; structure your source control so that it separates your source and your lib/framework. People can put them in different places locally if they so choose, but everybody will have the same codebases.
Disks are cheap; time wasted trying to figure out why people aren't all seeing the same thing isn't. The most important thing is that everyone stay in synch.

When using a package or framework is there a standard way to use version control?

That is, do you put the whole package under VCS or just the components you are programming? Packages by their nature will get upgraded and that code will need to be added into the VCS, plus there is a lot of code that is static.
Specifically I am going to be working on Joomla, adding and building modules, customising modules and the look and feel. Initially this will be just me but will expand to possibly two more developers as the project ramps up. My reaction would be just to VCS the lot; it means that I know it is all there and deployment via CI is easier(?).
The alternative is to exclude the bulk of the code that is not being altered, which could be error-prone and laborious.
As there is not a specific answer for this and I am looking for either experience or best-practice advice, I have marked it community wiki.
I usually do one of two things (I use SVN):
Put a release version (no SCM meta-data) of the library/framework in a separate folder in my SVN repository. That way I know that the code is stable, and if something stops working, it's not because of changes to the framework, but instead my own fault (and can easily be fixed by me.)
Use svn:externals to automatically update from the official SVN of the library/framework. This is less safe, but is sometimes nice, especially when you are a contributor to the library/framework and can fix bugs that may occur yourself.
If you're using SVN, and want to use externals, do this:
svn propset svn:externals "foldername http://libdomain.com/svn/trunk" libs
...where libs is your library folder (onto which this SVN property will get set), and foldername is the name of the sub-folder in which this particular library should be placed.

Arguments for and against including 3rd-party libraries in version control? [closed]

I've met quite a few people lately who say that 3rd party libraries don't belong in version control. So far these people haven't been able to explain to me why not, so I hoped you guys could come to my rescue :)
Personally, I think that when I check the trunk of a project out, it should just work, with no need to go to other sites to find libraries. More often than not, you'd otherwise end up with different developers using different versions of the same 3rd party lib, and sometimes with incompatibility problems.
Is it so bad to have a libs folder up there, with "guaranteed-to-work" libraries you could reference?
In SVN, there is a pattern used to store third-party libraries called vendor branches. This same idea would work for any other SVN-like version control system. The basic idea is that you include the third-party source in its own branch and then copy that branch into your main tree so that you can easily apply new versions over your local customizations. It also cleanly keeps things separate. IMHO, it's wrong to directly include the third-party stuff in your tree, but a vendor branch strikes a nice balance.
Another reason to check in libraries to your source control which I haven't seen mentioned here is that it gives you the ability to rebuild your application from a specific snapshot or version. This allows you to recreate the exact version that someone may report a bug on. If you can't rebuild the exact version you risk not being able to reproduce/debug problems.
Yes you should (when feasible).
You should be able to take a fresh machine and build your project with as few steps as possible. For me, it's:
Install IDE (e.g. Visual Studio)
Install VCS (e.g. SVN)
Checkout
Build
Anything more has to have very good justification.
Here's an example: I have a project that uses Yahoo's YUI compressor to minify JS and CSS. The YUI .jar files go in source control into a tools directory alongside the project. The Java runtime however, does not--that has become a prereq for the project much like the IDE. Considering how popular JRE is, it seems like a reasonable requirement.
No - I don't think you should put third party libraries into source control. The clue is in the name 'source control'.
Although source control can be used for distribution and deployment, that is not its prime function. And the arguments that you should just be able to check out your project and have it work are not realistic. There are always dependencies. In a web project, they might be Apache, MySQL, the programming runtime itself, say Python 2.6. You wouldn't pile all those into your code repository.
Extra code libraries are just the same. Rather than include them in source control for ease of deployment, create a deployment/distribution mechanism that allows all dependencies to easily be obtained and installed. This makes the steps for checking out and running your software something like:
Install VCS
Sync code
Run setup script (which downloads and installs the correct version of all dependencies)
To give a specific example (and I realise this is quite web centric), a Python web application might contain a requirements.txt file which reads:
simplejson==1.2
django==1.0
otherlibrary==0.9
Run that through pip and the job is done. Then when you want to upgrade to use Django 1.1 you simply change the version number in your requirements file and re-run the setup.
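To make the "change the version number and re-run" step a bit more concrete, here is a small sketch that reads pinned requirements like the ones above and reports which pins differ between two revisions of the file. The function names parse_pinned and upgrade_plan are hypothetical helpers, not part of pip, and the parser only understands plain name==version lines (no extras, URLs or version ranges).

```python
from typing import Dict, Optional, Tuple


def parse_pinned(requirements_text: str) -> Dict[str, str]:
    """Parse 'name==version' lines into a {name: version} mapping.

    Blank lines and '#' comments are skipped. Any other line that is not
    an exact '==' pin raises ValueError, so unpinned entries get noticed.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def upgrade_plan(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old_version, new_version)} for every pin that differs.

    A package that was added or removed shows up with None on one side.
    """
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}
```

Because the requirements file itself lives in version control, a diff like this is effectively the history of your dependencies, even though the libraries are not in the repository.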
The source of 3rd party software doesn't belong (except maybe as static reference), but the compiled binaries do.
If your build process will compile an assembly/dll/jar/module, then only keep the 3rd party source code in source control.
If you won't compile it, then put the binary assembly/dll/jar/module into source control.
This could depend on the language and/or environment you have, but for projects I work on I place no libraries (jar files) in source control. It helps to be using a tool such as Maven which fetches the necessary libraries for you. (Each project maintains a list of required jars, Maven automatically fetches them from a common repository - http://repo1.maven.org/maven2/)
That being said, if you're not using Maven or some other means of managing and automatically fetching the necessary libraries, by all means check them into your version control system. When in doubt, be practical about it.
The way I've tended to handle this in the past is to take a pre-compiled version of 3rd party libraries and check that in to version control, along with header files. Instead of checking the source code itself into version control, we archive it off into a defined location (server hard drive).
This kind of gives you the best of both worlds: a one-step fetch process that fetches everything you need, but it doesn't bog down your version control system with a bunch of unnecessary files. Also, by fetching pre-compiled binaries, you can skip that phase of compilation, which makes your builds faster.
You should definitely put 3rd party libraries under source control. Also, you should try to avoid relying on stuff installed on individual developers' machines. Here's why:
All developers will then share the same version of the component. This is very important.
Your build environment will become much more portable. Just install source control client on a fresh machine, download your repository, build and that's it (in theory, at least :) ).
Sometimes it is difficult to obtain an old version of some library. Keeping them under your source control makes sure you won't have such problems.
However, you don't need to add 3rd party source code in your repository if you don't plan to change the code. I tend just to add binaries, but I make sure only these libraries are referenced in our code (and not the ones from Windows GAC, for example).
We do because we want to have tested an updated version of the vendor branch before we integrate it with our code. We commit changes to this when testing new versions. We have the philosophy that everything you need to run the application should be in SVN so that
You can get new developers up and running
Everyone uses the same versions of various libraries
We can know exactly what code was current at a given point in time, including third party libraries.
No, it isn't a war crime to have third-party code in your repository, but I find that it upsets my sense of aesthetics. Many people here seem to be of the opinion that it's good to have your whole development team on the same version of these dependencies; I say it is a liability. You end up dependent on a specific version of that dependency, and it becomes a lot harder to use a different version later. I prefer a heterogeneous development environment: it forces you to decouple your code from the specific versions of dependencies.
IMHO the right place to keep the dependencies is on your tape backups, and in your escrow deposit, if you have one. If your specific project requires it (and projects are not all the same in this respect), then also keep a document under your version control system that links to these specific versions.
I like to check 3rd party binaries into a "lib" directory that contains any external dependencies. After all, you want to keep track of specific versions of those libraries, right?
When I compile the binaries myself, I often check in a zipped-up copy of the code alongside the binaries. That makes it clear that the code is not there for compiling, manipulating, etc. I almost never need to go back and reference the zipped code, but a couple of times it has been helpful.
If I can get away with it, I keep them out of my version control and out of my file system. The best case of this is jQuery where I'll use Google's AJAX Library and load it from there:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js" type="text/javascript"></script>
My next choice would be to use something like Git Submodules. And if neither of those suffice, they'll end up in version control, but at that point, it's only as up to date as you are...

Storing third-party libraries in source control

Should libraries that the application relies on be stored in source control? One part of me says they should and another part says no. It feels wrong to add a 20 MB library that dwarfs the entire app just because you rely on a couple of functions from it (albeit rather heavily). Should you just store the jar/dll or maybe even the distributed zip/tar of the project?
What do other people do?
Store everything you will need to build the project 10 years from now. I store the entire zip distribution of any library, just in case.
Edit for 2017:
This answer did not age well :-). If you are still using something old like Ant or make, the above still applies. If you use something more modern like Maven or Gradle (or NuGet on .NET, for example), with dependency management, you should be running a dependency management server in addition to your version control server. As long as you have good backups of both, and your dependency management server does not delete old dependencies, you should be OK. For examples of a dependency management server, see Sonatype Nexus or JFrog Artifactory, among many others.
As well as having third party libraries in your repository, it's worth doing it in a way that makes it easy to track and merge in future updates to the library (for example, security fixes). If you are using Subversion, using a proper vendor branch is worthwhile.
If you know that it'd be a cold day in hell before you'll be modifying your third party's code then (as @Matt Sheppard said) an external makes sense and gives you the added benefit that it becomes very easy to switch up to the latest version of the library should security updates or a must-have new feature make that desirable.
Also, you can skip externals when updating your code base saving on the long slow load process should you need to.
@Stu Thompson mentions storing documentation etc. in source control. In bigger projects I've stored our entire "clients" folder in source control including invoices / bills / meeting minutes / technical specifications etc. The whole shooting match. Although, ahem, do remember to store these in a SEPARATE repository from the one you'll be making available to: other developers; the client; your "browser source view"...cough... :)
Don't store the libraries; they're not strictly speaking part of your project and uselessly take up room in your revision control system. Do, however, use Maven (or Ivy for Ant builds) to keep track of what versions of external libraries your project uses. You should run a mirror of the repo within your organisation (that is backed up) to ensure you always have the dependencies under your control. This ought to give you the best of both worlds; external jars outside your project, but still reliably available and centrally accessible.
We store the libraries in source control because we want to be able to build a project by simply checking out the source code and running the build script. If you aren't able to get latest and build in one step then you're only going to run into problems later on.
Never store your 3rd party binaries in source control. Source control systems are platforms that support concurrent file sharing, parallel work, merging efforts, and change history. Source control is not an FTP site for binaries. 3rd party assemblies are NOT source code; they change maybe twice per SDLC. The desire to be able to wipe your workspace clean, pull everything down from source control and build does not mean 3rd party assemblies need to be stuck in source control. You can use build scripts to control pulling 3rd party assemblies from a distribution server. If you are worried about controlling what branch/version of your application uses a particular 3rd party component, then you can control that through build scripts as well. People have mentioned Maven for Java, and you can do something similar with MSBuild for .NET.
I generally store them in the repository, but I do sympathise with your desire to keep the size down.
If you don't store them in the repository, they absolutely do need to be archived and versioned somehow, and your build system needs to know how to get them. Lots of people in the Java world seem to use Maven for fetching dependencies automatically, but I've not used it, so I can't really recommend for or against it.
One good option might be to keep a separate repository of third party systems. If you're on Subversion, you could then use Subversion's externals support to automatically check out the libraries from the other repository. Otherwise, I'd suggest keeping an internal anonymous FTP (or similar) server which your build system can automatically fetch requirements from. Obviously you'll want to make sure you keep all the old versions of libraries, and have everything there backed up along with your repository.
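If the libraries live outside the repository, one way to keep the "archived and versioned" guarantee is to commit a small manifest of checksums and have the build verify whatever it fetched. A rough sketch, assuming a manifest that maps file names to SHA-256 digests; the names sha256_of and verify_libs and the manifest format are made up for illustration, and the fetching itself is left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_libs(lib_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the names of libraries that are missing from lib_dir or
    whose SHA-256 digest differs from the one recorded in the manifest."""
    problems = []
    for name, expected in sorted(manifest.items()):
        candidate = lib_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            problems.append(name)
    return problems
```

The manifest is tiny, so it can sit next to the source and be branched and tagged with it, while the binaries themselves stay on the internal server.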
What I have is an intranet Maven-like repository where all 3rd party libraries are stored (not only the libraries, but their respective source distribution with documentation, Javadoc and everything). The reasons are the following:
why store files that don't change in a system specifically designed to manage files that change?
it dramatically speeds up check-outs
each time I see "something.jar" stored under source control I ask "and which version is it?"
I put everything except the JDK and IDE in source control.
Tony's philosophy is sound. Don't forget database creation scripts and data structure update scripts. Before wikis came out, I used to even store our documentation in source control.
My preference is to store third party libraries in a dependency repository (Artifactory with Maven for example) rather than keeping them in Subversion.
Since third party libraries aren't managed or versioned like source code, it doesn't make a lot of sense to intermingle them. Remote developers also appreciate not having to download large libraries over a slow WPN link when they can get them more easily from any number of public repositories.
At a previous employer we stored everything necessary to build the application(s) in source control. Spinning up a new build machine was a matter of syncing with the source control and installing the necessary software.
Store third party libraries in source control so they are available if you check your code out to a new development environment. Any "includes" or build commands that you may have in build scripts should also reference these "local" copies.
As well as ensuring that third party code or libraries that you depend on are always available to you, it should also mean that code is (almost) ready to build on a fresh PC or user account when new developers join the team.
Store the libraries! The repository should be a snapshot of what is required to build a project at any moment in time. As the project requires different versions of external libraries you will want to update / check in the newer versions of these libraries. That way you will be able to get all the right versions to go with an old snapshot if you have to patch an older release etc.
Personally I have a dependencies folder as part of my projects and store referenced libraries in there.
I find this makes life easier as I work on a number of different projects, often with inter-depending parts that need the same version of a library meaning it's not always feasible to update to the latest version of a given library.
Having all dependencies used at compile time for each project means that a few years down the line when things have moved on, I can still build any part of a project without worrying about breaking other parts. Upgrading to a new version of a library is simply a case of replacing the file and rebuilding related components, not too difficult to manage if need be.
Having said that, I find most of the libraries I reference are relatively small weighing in at around a few hundred kb, rarely bigger, which makes it less of an issue for me to just stick them in source control.
Use git subprojects, and either reference from the 3rd party library's main git repository, or (if it doesn't have one) create a new git repository for each required library. There's no reason why you're limited to just one git repository, and I don't recommend you use somebody else's project as merely a directory in your own.
Store everything you'll need to build the project, so you can check it out and build without doing anything.
(and, as someone who has experienced the pain - please keep a copy of everything needed to get the controls installed and working on a dev platform. I once got a project that could build - but without an installation file and reg keys, you couldn't make any alterations to the third-party control layout. That was a fun rewrite)
You have to store everything you need in order to build the project.
Furthermore different versions of your code may have different dependencies on 3rd parties.
You'll want to branch your code into maintenance version together with its 3rd party dependencies...
Personally, what I have done (and have so far liked the results of) is store libraries in a separate repository and then link to each library that I need in my other repositories through the Subversion svn:externals feature. This works nicely because I can keep versioned copies of most of our libraries (mainly managed .NET assemblies) in source control without them bulking up the size of our main source code repository at all. Having the assemblies stored in the repository in this fashion means that the build server doesn't have to have them installed to make a build. I will say that getting a build to succeed in the absence of Visual Studio being installed was quite a chore, but now that we have it working we are happy with it.
Note that we don't currently use many commercial third-party control suites or that sort of thing much so we haven't run into licensing issues where it may be required to actually install an SDK on the build server but I can see where that could easily become a problem. Unfortunately I don't have a solution for that and will plan on addressing it when I first run into it.