Assets repository - version-control

I use Mercurial as my main SCM tool. I've previously worked with svn and others, but all of these systems are designed specifically to work with plain-text code.
My assets and documents are binaries and are usually big files (10~50 MB each). My problem is not about keeping track of exactly what has changed, but I want at least the basic functionality that SCMs give me for code: tracking which files have changed (who, when, comments...), historical backup and distributed sync.
What tools do you use for this and how do you link changes in assets with changes in code if you are using different repository kinds?

Related

How can I do a > 5GB commit to Mercurial?

I'm trying to import an existing project into Mercurial. The project is a bit over 5GB.
When I try to do an hg push I always get an error about being out of buffer space.
Does anyone know of a good way of doing the initial commit?
If you are not tied down to using Mercurial, then another possibility would be to use boar. It is not a DVCS like Mercurial, instead you have a central repository in which you store your data, and "check out" versions of files - in much the same way as with Subversion.
The important part is that it is written with the express purpose of storing large, binary files.
I have not used it, so I cannot comment on how good it is at its job, or how stable it is, but it is a possible alternative that may well suit your needs.
For a brief explanation of why storing binary files in mercurial is discouraged please read
https://www.mercurial-scm.org/wiki/BinaryFiles and http://kiln.stackexchange.com/questions/1074/why-is-it-bad-to-store-binary-files-in-mercurial
In our case we handle binary files using Dropbox. It allows you to both keep the history of files and sync the folder between team members. If you don't need to keep history of files, you can use rsync to keep binaries sync'ed.
Assuming you do actually need to put such a large commit into Mercurial, I would guess that rather than a few million tiny files, the size of your commit is primarily due to a handful of biiiig files. In this case you could investigate the Large Files Extension, which should suit your needs. When you add a large file, it is tracked by checksum rather than content, so what Mercurial itself tracks is relatively small. The extension will take care of the versions for you.
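To make the "tracked by checksum rather than content" idea concrete, here is a minimal sketch of the general technique. It is not the extension's actual code: the function names, the store directory and the ".standin" suffix are all made up for illustration. The big file goes into a side store keyed by its hash, and only a tiny text file holding that hash would be put under version control.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def store_large_file(path: Path, store: Path) -> str:
    """Copy `path` into `store` under its SHA-1 digest and return the digest."""
    digest = hashlib.sha1(path.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(path, target)
    return digest


def write_standin(path: Path, digest: str) -> Path:
    """Write a tiny text file holding only the checksum (hypothetical naming)."""
    standin = path.with_name(path.name + ".standin")
    standin.write_text(digest + "\n")
    return standin


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        asset = root / "texture.bin"
        asset.write_bytes(b"\x00" * 1_000_000)  # placeholder for a big asset
        digest = store_large_file(asset, root / "store")
        standin = write_standin(asset, digest)
        # The versioned stand-in is a few dozen bytes; the asset is about 1 MB.
        print(asset.stat().st_size, standin.stat().st_size)
```

The point of the design is that the history only ever grows by the size of the stand-in, while the side store can be synced or pruned separately.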
However, as Alex Stuckey mentions, you shouldn't normally be committing things such as compiled binaries (object code, resulting executables, ...), which are the most likely reason you have such a big commit. You would do well to create a decent .hgignore file (one that removes the usual suspects - *.o, *.pdb, whatever, ...), which will help eliminate accidentally adding files like that in the future. I have a standard .hgignore which gets put into nearly all my repositories as the first commit, and has served me well.
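As a rough illustration of what such glob patterns filter out, here is a small sketch. The pattern list is only an example, not my actual .hgignore, and the helper names are invented. In a real .hgignore the same patterns would sit one per line under a "syntax: glob" header.

```python
from fnmatch import fnmatchcase

# Example patterns only; adjust them to your own toolchain.
IGNORE_PATTERNS = ["*.o", "*.obj", "*.pdb", "*.exe"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if the file name matches any of the glob patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def files_to_track(filenames):
    """Keep only the files that no ignore pattern matches."""
    return [name for name in filenames if not is_ignored(name)]


if __name__ == "__main__":
    candidates = ["main.c", "main.o", "build/app.pdb", "README"]
    print(files_to_track(candidates))
```

Running something like this over an existing working directory is a quick way to see how much of a huge initial commit is really build output.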

Project files under version control?

I work on a large project where all the source files are stored in a version control except the project files. This was the lead developer's decision. His reasoning was:
It's too time consuming to reconcile the differences among developers' working directories.
It allows developers to work independently until their changes are stable
Instead, a developer initially gets a copy of a fellow developer's project files. Then when new files are added each developer notifies all the rest about the change. This strikes me as far more time consuming in the long run.
In my opinion the supposed benefits of not tracking changes to the project files are outweighed by the danger. In addition to references to the source files it needs, each project file has configuration settings that would be very time consuming and error prone to reproduce if the file became corrupted or there was a hardware failure. Some of them have source code embedded in them that would be nearly impossible to recover.
I tried to convince the lead that both of his reasons can be accomplished by:
Agreeing on a standard folder structure
Using relative paths in the project files
Using the version control system more effectively
But so far he's unwilling to heed my suggestions. I checked the svn log and discovered that each major version's history begins with an Add. I have a feeling he doesn't know how to use the branching feature at all.
Am I worrying about nothing or are my concerns valid?
Your concerns are valid. There's no good reason to exclude project files from the repository. They should absolutely be under version control. You'll need to standardize on a directory structure for automated builds as well, so your lead is just postponing the inevitable.
Here are some reasons to check project (*.*proj) files into version control:
Avoid unnecessary build breaks. Relying on individual developers to notify the rest of the team every time they add, remove or rename a source file is not a sustainable practice. There will be mistakes, you will end up with broken builds, and your team will waste valuable time trying to determine why the build broke.
Maintain an authoritative source configuration. If there are no project files in the repository, you don't have enough information there to reliably build the solution. Is your team planning to deliver a build from one of your developer's machines? If so, which one? The whole point of having a source control repository is to maintain an authoritative source configuration from which you build and deliver releases.
Simplify management of your projects. Having each team member independently updating their individual copies of your various project files gets more complicated when you introduce project types that not everyone is familiar with. What happens if you need to introduce a WiX project to generate an MSI package or a Database project?
I'd also argue that the two points made in defense of this strategy of not checking in project files are easily refuted. Let's take a look at each:
It's too time consuming to reconcile the differences among developers' working directories.
Source configurations should always be set up with relative paths. If you have hard-coded paths in your source configuration (project files, resource files, etc.) then you're doing it wrong. Choosing to ignore the problem is not going to make it go away.
It allows developers to work independently until their changes are stable
No, using version control lets developers work in isolation until their changes are stable. If you each continue to maintain your own separate copies of the project files, as soon as someone checks in a change that references a class in a new source file, you've broken everyone on the team until they stop what they're doing and carefully update their project files. Compare that experience with just "getting latest" from source control.
Generally, a project checked out of SVN should be working, or there should be tools included to make it work (e.g. autogen.sh). If the project file is missing or you need knowledge about which files should be in the project, there is something missing.
Automatically generated files should not be in SVN, as it is pointless to track the changes to these.
Project files with relative paths belong under source control.
Files that don't: for example in .NET, I would not put the .suo (user options) file or web.config (or app.config) under source control. You may have developers using different connection strings, etc.
In the case of web.config, I like to put a web.config.example in. That way you copy the file to web.config upon initial checkout and tweak what settings you'd like. If you add something that needs to be added to all web.config, you merge those lines into the .example version and notify the team to merge that into their local version.
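The "copy the example on initial checkout" step can be scripted. Below is a minimal sketch, assuming the two file names used above; the helper name is made up. It deliberately never overwrites an existing local file, so a developer's own settings survive repeated runs.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_local_config(example: Path, local: Path) -> bool:
    """Copy the versioned example to the untracked local config if it is missing.

    Returns True when a copy was made, False when a local file already exists.
    """
    if local.exists():
        return False  # never overwrite a developer's own settings
    shutil.copyfile(example, local)
    return True


if __name__ == "__main__":
    # Demo in a scratch directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        example = Path(tmp) / "web.config.example"
        example.write_text("<configuration/>\n")
        local = Path(tmp) / "web.config"
        print(ensure_local_config(example, local))  # first run copies the file
        print(ensure_local_config(example, local))  # second run leaves it alone
```

Merging later additions from the .example file into each local copy is still a manual step, as described above.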
I think it depends on the IDE and configuration of the project. Some IDEs have hard-coded absolute paths and that's a real problem with multiple developers working on the same code with different local copies and configurations. Avoid absolute path references to libraries, for example, if you can.
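If you want to catch absolute paths before they get committed, a crude check is easy to script. This is only a sketch: the regular expressions are a heuristic I made up (Windows drive paths, UNC paths and Unix-style absolute paths inside quoted attribute values), and the sample file content is hypothetical.

```python
import re

# Any double-quoted value, e.g. XML attribute values in a project file.
QUOTED_VALUE = re.compile(r'"([^"]*)"')
# Heuristic: C:\..., C:/..., \\server\share, or /unix/style paths.
ABSOLUTE_PATH = re.compile(r'^(?:[A-Za-z]:[\\/]|\\\\|/)')

# Hypothetical project file content, for illustration only.
SAMPLE = r'''<classpath>
  <classpathentry kind="src" path="src"/>
  <classpathentry kind="lib" path="/home/alice/libs/foo.jar"/>
  <classpathentry kind="lib" path="lib/bar.jar"/>
  <classpathentry kind="lib" path="C:\tools\baz.jar"/>
</classpath>'''


def find_absolute_paths(project_text):
    """Return the quoted values that look like absolute paths."""
    values = QUOTED_VALUE.findall(project_text)
    return [value for value in values if ABSOLUTE_PATH.match(value)]


if __name__ == "__main__":
    for path in find_absolute_paths(SAMPLE):
        print("absolute path found:", path)
```

A check like this could run as a pre-commit hook, but it will miss paths that are not inside double quotes.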
In Eclipse (and Java), it's fine to commit .project and .classpath files (so long as the classpath doesn't have absolute references). However, you may find that tools like Maven give you some independence from the IDE and individual settings (in which case you wouldn't need to commit .project, .settings and .classpath in Eclipse, since m2eclipse would re-create them for you automatically). This might not apply as well to other languages/environments.
In addition, if I need to reference something really specific to my machine (either configuration or file location), I tend to have my own local branch in Git which I rebase when necessary, committing only the common parts to the remote repository. Git diff/rebase works well: it tends to be able to work out the diffs even if the local changes affect files that have been modified remotely, except when those changes conflict, in which case you get the opportunity to merge the changes manually.
That's just asking for trouble. With a setup like that, I can have a perfectly working project containing files that are subtly different from everyone else's. Imagine the havoc this would cause if someone accidentally propagates this mess into QA and everyone is trying to figure out what's going on. Imagine the catastrophe that would ensue if it ever got released to the production environment...!

How do people manage changes to common library files stored across multiple (Mercurial) repositories?

This is perhaps not a question unique to Mercurial, but that's the SCM that I've been using most lately.
I work on multiple projects and tend to copy source code for libraries or utilities from a previous project to get a leg up on starting a new project. The problem comes in when I want to merge all the changes I made in my latest project, back into a "master" copy of those shared library files.
Since the files stored in disjoint repositories will have distinct version histories, Mercurial won't be able to perform an intelligent merge if I just copy the files back to the master repo (or even between two independent projects).
I'm looking for an easy way to preserve the change history so I can merge library files back to the master with a minimum of external record keeping (which is one of the reasons I'm using SVN less as merges require remembering when copies were made across branches).
Perhaps I need to do a bit more up-front organization of my repository to prepare for a future merge back to a common master.
Three solutions, pick your favorite:
Put all projects into one repository.
Make a separate repository for shared code and a different repository for each project.
One repository with Subrepositories: https://www.mercurial-scm.org/wiki/subrepos, keep all common code in one subrepo and different subrepos for each project.
Copying actual files between repositories with no common ancestors will never be optimal as history is not preserved.
I'd recommend against your "copy the source code" practice and suggest using binary distribution for your custom libraries instead. These binaries are checked in alongside the source code. Advantages:
reduces build-time
no overhead of tracking changes in all copies of the library
you can use different versions of the same library in different projects.
EDIT: And for the issue with "common" or "toolbox" libraries in general, read this post from ayende.
Use the transplant extension.

Best practice for source control of a customized Open Source project

I have been using an open source Java project and had to make some custom changes for our site. I have downloaded the source code via Subversion, modified two files and built a custom JAR file. Now I need to store these custom changes into OUR Subversion source control system. What is the best way to do this?
Should I check the entire tagged version of the open source code into our system and then create a branch with our change in it? Or should I just check-in our custom files and rely on the open source tagged version to always be around? Or perhaps something else altogether?
Take a good look at Subversion vendor branches, which are meant for "maintain[ing] custom modifications to third-party data in your own version control system". This sounds like exactly what you want. You would create a vendor branch for the open source Java project in your main repo (from their last SVN revision before your modifications). Then, check in your modifications. In the future, you can merge in upstream changes.
The Subversion book is free and available online, with a section devoted to choosing a repository layout.
the Subversion community recommends that you choose a repository location for each project root—the “topmost” directory that contains data related to that project—and then create three subdirectories beneath that root: trunk, meaning the directory under which the main project development occurs; branches, which is a directory in which to create various named branches of the main development line; and tags, which is a collection of tree snapshots that are created, and perhaps destroyed, but never changed.
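For what it's worth, the layout described in that quote can be sketched locally before anything is imported. This is just an illustration: the helper and the project name are made up, and it only creates plain directories on disk.

```python
import tempfile
from pathlib import Path

# The three conventional subdirectories named in the quote above.
STANDARD_DIRS = ("trunk", "branches", "tags")


def create_project_skeleton(root: Path, project: str) -> list:
    """Create <root>/<project>/{trunk,branches,tags} and return the paths."""
    created = []
    for name in STANDARD_DIRS:
        directory = root / project / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_project_skeleton(Path(tmp), "custom-jar"):
            print(path.relative_to(tmp))
```

A tree like this could then be brought into the repository with svn import, with your modified copy of the open source code going under trunk.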
I'm happy to elaborate if you are having trouble determining what exactly this means for your project.
First, try to avoid this for as long as you can. If possible, try to get your changes into the open-source project (less work for yourself in the future...).
But if that is not an option, I would follow Matthew Flaschen's advice about vendor branches.

Source Control for multiple projects/solutions with shared libraries

I am currently working on a project to convert a number of Excel VBA powered workbooks to VSTO solutions. All of the workbooks will share a number of class libraries and third party assemblies, in fact most of the work is done in the class libraries. I currently have my folder structure laid out like this.
Base
    Libraries
    Assemblies
    Workbooks
        Workbook1
        Workbook2
Each of the workbooks will be its own solution, and the workbook solutions just reference the assemblies in the folder structure. My question is how would you lay out the source control? Would you start the repository at the base? Or would you create a repository for each workbook solution? Would you rearrange the folders?
Now that we have the initial development done, we're about to have a bunch of outside developers come on to the project to help us convert the rest of the workbooks, and I really like the idea of them being able to check out from the base directory and having all of the dependencies ready to go. I also worry that there are other concerns that come with having 20+ solutions/projects under one source control repository.
I want everything to be as simple as possible for people joining the project but I don't want to sacrifice long term usability. In my mind I've been going back and forth, what's simpler one repository or one repository per solution?
I'd appreciate any insight you have, because I'm fresh out.
Additional Information: Currently, I am using Mercurial personally, but the project will probably get moved to StarTeam unless I can make some convincing arguments for something else.
You don't mention in your question what source control you are using. As it doesn't sound like you need to limit your outside developers' access to the rest of the repository, I would not bother with setting up multiple repositories. I would assume that unless your code runs into millions of lines, repository size is not an issue.
It all depends on what functionality your revision control system supports. In Subversion you can declare other folders as external and provide a URL for the content of that folder; this causes Subversion to treat that folder as a separate repository even though it sits within your folder structure.