For binary files should I use bfiles or bigfiles? - version-control

There are a few mercurial extensions for dealing with large binary files.
Bfiles
BigFiles
Snap
kbfiles
others?
I'd like to use the one that is most likely to become official (i.e. distributed with Mercurial).
Kiln 2.0 uses a fork of Bfiles for its binary files. Does that make it more likely to become official?
Which is the preferred (semi-official) extension for handling binary files?

Mercurial incorporated the 'largefiles' extension in the 2.0 release. This extension is a descendant of 'kbfiles' (from Kiln), which is in turn a descendant of the bfiles extension.
It integrates largefile support into the Mercurial commands much more tightly than bfiles did, and supports pushing to HTTP(S) URLs, which I believe bfiles did not.
Kiln repo
preparation work on mercurial-devel
release expectations
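Whichever extension is used, it helps to know which files in a working copy are large enough to be worth treating specially. The following is a minimal Python sketch for that, not part of any of the extensions above; the function name `find_large_files`, the byte threshold, and the set of skipped metadata directories are all assumptions made for illustration.

```python
import os


def find_large_files(root, min_bytes):
    """Return (relative_path, size) pairs for files of at least min_bytes, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in {".hg", ".git", ".svn"}]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size >= min_bytes:
                hits.append((os.path.relpath(path, root), size))
    # Largest files first; ties are broken by path so the order is stable.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Calling `find_large_files(".", 10 * 1024 * 1024)` from the repository root would list everything of 10 MiB or more; the cut-off is arbitrary and should be picked to suit the project.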

It's too early to tell. And it is way too early to start talking about including any of these extensions with Mercurial. IMHO they should all be considered experimental.
(I'm the author of one of those extensions (bfiles), so this is as authoritative an answer as you are likely to get. If someone proposed shipping any one of these extensions with Mercurial today, including mine, I would be strenuously opposed.)
Also, there is no logical link between game development and which extension to choose. It doesn't matter if you're tracking movies, game data, jar files, medical imaging data, or what: most source-control systems are not very good at handling it, and there is no clear answer yet which is the right way to do it with Mercurial.
IMHO stackoverflow is really not the right place for this sort of discussion; the mercurial-devel list is.

It seems that BigFiles is recommended by game developers using Mercurial, so maybe you should go with it. However, if you want to know which one is being worked on for inclusion in a coming version of Mercurial, you should ask on, or read, the developers' mailing list.

Errr... Nexus. Or any other artifact repositories (or any other backup systems if you only need the latest version).
Because no binary file (especially a large one) really belongs in a VCS, where the point is to be able to diff or merge.
Sure, you could use a VCS, and there are actually good arguments for it, but a VCS is simply not designed for that at its core.

Related

How does one handle big library dependencies in source control?

My C++ application depends on Boost. I'd like someone to just be able to check out my repository and build the whole thing in one step. But the boost distribution is some 100MB and thousands of files, and it seems to bog down source control -- plus I really don't need it to be versioned.
What's the best way to handle this kind of problem?
Most version control tools/systems provide mechanics to add references to other repositories into your repository.
This way you can keep your repository clean from files of other repositories but still be able to just point to the correct library and version.
In Git it’s called submodules. In SVN it’s called externals.
In the end you’ll have to decide on whether you want to include the files into your repo so others won’t have to checkout the other repos as well, even when the references (submodule/external) make just that pretty easy. I’d prefer a clean repo though and reference other repositories, if available. This will also make maintaining and upgrading those libraries a lot easier.
I've found the book "Large-Scale C++ Software Design" by John Lakos very useful as far as organising large C++ projects is concerned. Recommended.

What to do when there are so many version control systems?

Do you use several version control clients (TortoiseSVN, Bazaar Explorer, SmartGit, etc.)?
Or one client that supports all version control systems (CVS, SVN, bzr, Git, etc.)? Which one?
Do you keep converting between them (I imagine myself converting gif -> jpg -> gif -> jpg...)?
UPDATE:
If I pick one, do I really have to give up contributing to all the software that uses the other ones?
Choose the one that suits your needs and stick with it
Why use many? Pick one, and stick with it. Normally, the choice is between Mercurial, SVN, TFS and GIT today.
Joel says to use Mercurial and provides a really nice write-up for us Subversion adherents to avoid going crazy during the switch-over. Read his article and decide for yourself. I went from ignoring these other "weirdo" version control systems and sticking with Subversion to thinking hmm maybe we should switch - this is actually starting to make sense to me now.
Joel on dvcs
I think one solution when having to deal with multiple working copies from different VCSs is to stick to a uniform interface.
For instance, TortoiseSVN, TortoiseHG and TortoiseGit share much (I think) of their UI.
OK, so TortoiseHG's UI differs somewhat, but so does the working model, and it is still an Explorer extension.
Of course this only makes sense if you are working with a different data set in each of them; using several on a single data set is a really dangerous and often lossy operation.
UPDATE: Judging by your update, this really is your case: you are using each one to work with a different repository.
Different projects demand different solutions.
If you are working with Linux - git is the solution.
If you are working with Firefox - (not SVN) Mercurial is the solution.
If you are working with drupal - CVS is the solution (they are migrating to GIT...).
If you are working with KDE - SVN is the solution.
Anyway - there is no solution to this, this is part of the world we live in. It's like asking "why so many programming languages?"
(I myself, use git-svn to checkout SVN repositories... sometimes at least, GIT is the weapon of choice).
We use PVCS (Merant) at work for legacy projects that were using it, TFS for new Visual Studio work, and I use Mercurial for my personal projects.
The mental changes required to work with the different systems are just part of the territory, just like what I have to do when switching from C# to PowerBuilder to scripting language du task to VBA to C at work.
PVCS pisses me off, TFS is tolerable, and Mercurial is pretty unobtrusive as far as I'm concerned. They each serve their particular purpose.
I searched for bzr svn in Synaptic and found a Subversion integrator for Bazaar Explorer, so I can download the latest svn revision files in a Subversion repository from Bazaar Explorer. There are Git and Mercurial integrators for Bazaar Explorer in Synaptic too. Search for bzr git, then bzr hg (it's Mercurial!). I'm wondering if the cvs importer works like a cvs integrator... I think it's missing!!! :-(

Version Control: Taking on a project without any

I've recently taken on a project with no version control. I don't have any experience with version control myself. I feel it is the only way to go with this project (and probably any future projects now I think of it - I always trust myself too much..)
My question is - where do I begin with implementing version control on a project already in production? Bearing in mind I haven't used version control before so really it's two separate questions:
Starting out with version control
Implementing it on an already live project
For background, the project is a php/mysql driven website using bits of javascript, I'm working on a (Windows) XAMPP server and I'm very keen to learn this new world of version control!
Congratulations, you are headed in the right direction!
You'll first need to choose a version control system. My current favorite is Git. Unfortunately, I don't think that Git is an easy introduction to version control. I have also used Subversion and Perforce.
Subversion (http://subversion.tigris.org/) works on many platforms, is used in a lot of projects, and has some nice GUI tools available (such as TortoiseSVN on Windows). Command-line tools are also available. It's also free. You can run it in "local filesystem" mode, meaning that you don't need to set up a separate server. It's come a long way from its "better than CVS" roots.
Perforce (http://www.perforce.com/) is pretty nice. Its Windows implementation seems the best (last I checked, their cross-platform GUI was pretty lousy). You primarily use a GUI to interact with it, though again there are command-line tools. It's commercial software, but open source projects can get free licenses by contacting the company. The biggest drag is that you will need to set up a server. To get started, you could run the server on the same box that you develop on, but that's probably a bad idea in the long run. I found Perforce to be very good for 2-8 person teams; I don't know how well it would work with more.
The big advantage to Git (http://git-scm.com/) is that it requires virtually no set-up. Once installed, you can execute git init in any directory to create a new git repository. The revision history is kept inside the project's directory. You can start out with just local versioning, and you can scale up from there. If Git seems scary, you could also check out Mercurial (https://www.mercurial-scm.org/). I haven't used it, but I understand that it shares some of the same underlying principles as Git.
Avoid CVS. It's on its way out, and no new project should be using it unless they need to do so.
Adding source control to an existing project is easy. The hard part would be making sure that everybody is willing to use source control. If you're working alone, then it's just personal discipline. If you're part of a team, though, and some people have reservations, you will have problems. Try to get everybody on board, and be available to try to answer their questions. If people don't know how to use a tool, they simply won't use it.
Start here: http://svnbook.red-bean.com/
I've found SVN to be the easiest version control system to use, especially for beginners. It's pretty simple to start, the only real decision you have to make is where to host your stuff. There are a couple free svn servers available, but if you're really serious about your work you should host your own.
The first thing to do is to pick a version control paradigm (centralized versus distributed). To answer that, you'll need to take a look at your team and how you intend to handle check-in, check-out, merging, and branching. Once you pick a paradigm, you can choose a version control system. The mainstream systems are Subversion for centralized version control and Git and Mercurial for distributed version control.
If the project is live and working, then that should be your initial check-in to whatever version control you are using. You need a reliable baseline that you can revert to and have 0 work to deploy something that works. If your project is not functional...well, good luck. You might want to check in to start using version control and then decide how you want to proceed (either get the project to a stable and functioning state and then restart your repository or have your initial check-in be a broken system).
If the rest of the team doesn't see the benefit with version control, I would recommend installing your own system on your machine and, at the very least, use it for your own work.
Be prepared for some resistance from management and/or your co-workers. Management may not want to invest the resources for a repository machine -- these things need to be installed, maintained, backed-up, etc. Or they may object to you spending time on an "extra" like a RCS.
Your co-workers, especially if they're unfamiliar with any RCS, are likely to resist using it, or complain that it's too hard to use. There's a learning curve to any new tool, and source control systems are no exception. It's worth the time to learn, though.
My advice is to pick one -- any one that strikes your fancy -- and start using it. Don't worry about getting it 100% perfect the 1st time, it probably won't be any worse than what you have now, which is one misplaced keystroke away from oblivion.
Play with it. Check files out into a separate workspace and hack things up, knowing that it doesn't matter; you can always revert it. Learn how to use your new tool with some GUI frontends (I'm fond of 'svn diff --diff-cmd=kdiff3', myself). Get to the point where you know how to check in & out, tag things, branch, and merge. Then show your co-workers.
Personally, I'm fond of svn, but I didn't choose it; it chose me.
Step 1: Download Mercurial.
Step 2: In your favorite command line, go to the root of your source directory and type hg init.
Step 3: Do a make clean or equivalent (i.e. all you want is source, no generated files).
Step 4: Type hg addremove.
Step 5: Type hg commit.
From this point on you can:
Examine the changes between your most recent commit and now: hg diff or hg status.
Make checkpoints in your code: hg commit.
Return to previous checkpoints: hg update -C -r 0
Congratulations, you are now using version control: It's really not that hard, and it's very, very useful (if for no other reason than you can look at the changes you've made to see if they make sense).
At some point you'll probably want to learn about branching (if only so that you have a backup copy of your repository on another machine) at which point you can turn to the documentation or the book.
I don't know if there is something similar for php etc, but an interesting resource here is "Brownfield Application Development in .NET". In many ways, this only uses .NET for the examples; most of the book is really about tackling policies exactly like you mention:
how to introduce source control
how to introduce unit testing
how to introduce continuous integration
etc
and all the concerns/consideration that go with them.
Partly relating to the code; but also relating to the "human" factor; colleagues, managers, etc. I highly recommend it; but you might decide the .NET background is inappropriate for you (it is a good fit for me ;-p).
You can look here: git-for-beginners-the-definitive-practical-guide
Git is a distributed version control system that currently has good Windows support, with Git on Windows and a shell extension in TortoiseGit
An addition to other answers:
If the project you want to put under version control has had some releases, and if those versions are available for example as tarfiles (e.g. project-0.1.tar.gz, project-0.2.tar.gz, project-0.3.tar.gz, ...), you might want to consider importing those versions into your chosen version control system. Git for example has import-tars.perl and import-zips.py in the contrib/fast-import/ directory, and writing support for other files in other programming languages for git fast-import should be easy.
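When importing old releases like this, the archives presumably need to be processed oldest first, and a plain alphabetical sort would put project-0.10 before project-0.2. Here is a small Python sketch of ordering them by version number; the helper names `version_key` and `import_order` and the set of recognised archive suffixes are assumptions for illustration, not part of Git or its contrib scripts.

```python
import re

# Matches a trailing dotted version such as "-0.10" followed by a known archive suffix.
_VERSION_RE = re.compile(r"-(\d+(?:\.\d+)*)\.(?:tar\.gz|tgz|tar\.bz2|zip)$")


def version_key(filename):
    """Return the version in a name like 'project-0.10.tar.gz' as a tuple of ints."""
    match = _VERSION_RE.search(filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def import_order(filenames):
    """Sort release archives numerically by version, oldest first."""
    return sorted(filenames, key=version_key)
```

The resulting list can then be handed, in that order, to whichever import script your version control system provides.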
Sidenote: my preferred version control system is Git.
See also: Good link or book for basics and theory of version control question.
I learned the concepts from the pragmatic series:
for example the one on Subversion; they also have books on Git.
I only have experience with SourceSafe and SVN.
SourceSafe seems to have issues with corrupting its own database; on a team of 5 we were repairing the db probably once a month. It's easy, but still something you shouldn't have to deal with. It's also difficult to label code and use that label for anything practical.
SVN is nice, it's simple to install on Linux or Windows. Most IDEs have a plugin for it, and if you're using Windows there is an Explorer extension (TortoiseSVN) that allows you to do all your operations right from Windows Explorer. There's a lot of SVN tools out there for every OS, it's very well supported. SVN also integrates with TRAC (a bug tracking system), and Bugzilla so you can tie your work tickets to code.
I will say that HOW you use version control is probably just as important as, if not more important than, which package you use. Using it simply as a library is a very rudimentary application of it, but for a 1-2 man team making a website or an app where you won't be maintaining builds and versions, you'll be ok.
When it comes to version control, anything is better than nothing.
Wow, everyone is just boosting his favorite version control utility.
OK, to answer your question, how do you put a project under version control?
It's not that hard, once you pick a version control utility (be it, git, svn, hg, bzr .. whatever) there's usually a command or two to initialize a repository then add all the relevant files to it.
For instance, in git it might be something like:
$ git init
$ git add --all
$ git commit -m "First commit"
Now, about choosing a version control utility, that's a tough question and highly depends on what you want. You might want to have a look at this question:
Popularity of Git/Mercurial/Bazaar vs. which to recommend
The only tools you should consider choosing among are:
git
svn (Subversion)
hg (Mercurial)
bzr (Bazaar)
mtn (Monotone)
Everything else is either old or commercial.
svn follows a client-server model; there's a central repository. If you're a one-man team then the only thing this means to you is that you have to set up a server and make sure it starts with the computer. Though I heard that you can do away with the server. A bit of googling turns up this guide for using svn without a server
All other tools follow a distributed model; again, if you're a one-man team, the only thing this means to you is that there's no server to set up.
The advantage of svn is that it's been around for a while and has many GUI front-ends and better IDE integration.
I can't compare git to hg (Mercurial) since I haven't used the latter, but git has a unique storage model compared to svn and hg.
bzr is said to be easier to use, but slower (it's written in Python).
I'm personally satisfied with git, but you should do your own research; or maybe just choose one and stick with it. As far as I can tell, they're all mature and stable.

best practice in storing non-source files under version control

I have a project under version control, but the project has some images, videos and zip files that change every so often. I don't want to store these files under version control because they take up a lot of space and make updates and commits very slow.
What's a good way of dealing with this issue and still commit non-source files that have changed? is there a better way?
I'm currently using subversion, if there is another version control client that is better for dealing with this issue, please recommend it!
I have lots of non-source files in SVN and the only time it slows down the commit is when I change them. I don't see how this is an issue if they're only changing "every so often". Also the size really shouldn't be a concern. If your repository is on a server and you're worried about how much space it's taking up you need to upgrade. Hard drives are cheap. Buy them.
Some people feel strongly that non-source files don't belong in source control, I say an entire project should be stored in source control. That way if my development system goes down I can switch to another and after a couple minutes of downloading the project I'm back to coding.
You added in a comment:
The problem I have with this is that I don't care about the version history of those zip/video files, as long as they're the newest ones, there is no problem.
That means you have a linear workflow of development, only working on the LATEST of one main branch.
You do not seem to deal with the phase "after release", where you have to:
maintain what runs in production
develop small evolutions...
..while doing massive refactoring for experimenting some big evolutions
In those three cases, the question of "what were the exact images, videos and zip files I have to use" or "I was using at the time" might become important.
Anyhow, if you feel SVN does not handle them appropriately, I would still recommend having some way to remember that, for SVN revision xxx to yyy, you were using version 'z' of your set of binaries.
For that, you could set up an external repository like Maven. See question "Is it acceptable/good to store binaries in SVN?" (my answer in that question is near the top of the page, but I link directly to Evan's answer as he mentions Maven).
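If a full artifact repository feels like too much, one lightweight way to "remember" which set of binaries went with a revision is to commit a small text manifest of checksums instead of the binaries themselves. This is only a sketch of that idea under my own assumptions: the function names, the manifest layout (hash, two spaces, relative path), and the choice of SHA-256 are illustrative and not something SVN or Maven prescribes.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos or zips are never loaded whole into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(asset_dir):
    """Return 'hash  relative/path' lines for every file under asset_dir, sorted by path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(asset_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Use forward slashes so the manifest is identical on Windows and Unix.
            rel = os.path.relpath(path, asset_dir).replace(os.sep, "/")
            entries.append((rel, sha256_of(path)))
    return [f"{digest}  {rel}" for rel, digest in sorted(entries)]
```

Writing the returned lines to a file that is committed alongside the source gives every revision a record of exactly which binaries it expected, while the binaries themselves live wherever is convenient.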
While I find it a huge pain to deal with non-text files in version control, especially ones that change a lot, I've accepted the practice of "if it is needed for the build/installer it should go in version control". This of course is not a hard rule. I don't keep 3rd party libraries under version control (though I know people who do).
I came to this opinion after setting up a Continuous Integration server at my shop. Having everything needed for the build that may change in version control makes it much easier. As previously mentioned I don't keep libs under version control, but that is due to the fact that we rarely upgrade/add new libraries. If this is not the case for your shop then you may consider doing so. Also, if your images/videos/zips change more than once a year then I'd recommend keeping them under version control.

What is the difference between all the different types of version control?

After being told by at least 10 people on SO that version control was a good thing even if it's just me I now have a followup question.
What is the difference between all the different types of version control and is there a guide that anybody knows of for version control that's very simple and easy to understand?
We seem to be in the golden age of version control, with a ton of choices, all of which have their pros and cons.
Here are the ones I see most used:
svn - currently the most popular open source?
git - very hot since Linus switched to it
mercurial - some smart people I know swear by it
cvs - the one everybody is switching from
perforce - imho, the best features, but it's not open source. The two-user license is free, though.
visual sourcesafe - I'm not much in the Microsoft world, so I have no idea about this one, other than people like to rag on it as they rag on everything from Microsoft.
sccs - for historical interest we mention this, the great-granddaddy of many of the above
rcs - and the granddaddy of many of the above
My recommendation: you're safest with either git, svn or perforce, since a lot of people use them, they are cross platform, have good guis, you can buy books about them, etc.
Don't consider cvs, sccs or rcs; they are antique.
The nice thing is that, since your projects will be relatively small, you will be able to move your code to a new system once you're more experienced and decide you want to work with another system.
Eric Sink has a good overview of source control. There are also some existing questions here on SO.
To everyone just starting using version control:
Please do not use git (or hg or bzr) because of the hype
Use git (or hg or bzr) because they are better tools for managing source code than SVN.
I used SVN for a few years at work, and switched over to git 6 months ago. Without learning SVN first I would have been totally lost when it came to using a DVCS.
For people just starting out with version control:
Start by downloading SVN
Learn why you need version control
Learn how to commit, checkout, branch
Learn why merging in SVN is such a pain
Then switch over to a DVCS and learn:
How to clone/branch/commit
How easy it is to merge your branches back (go branch crazy!)
How easy it is to rewrite commit history and keep your branches up to date with the main line (git rebase -i)
How to publish your changes so others can benefit
For the tl;dr crowd:
Start with SVN and learn the basics, then graduate to a DVCS.
I would start with:
A Visual Guide to Version Control
Wikipedia
Then once you have read up on it, download and install SVN, TortoiseSVN and skim the first few chapters of the book and get started.
Version Control is essential to development, even if you're working by yourself because it protects you from yourself. If you make a mistake, it's a simple matter to rollback to a previous version of your code that you know works. This also frees you to explore and experiment with your code because you're free of having to worry about whether what you're doing is reversible or not. There are two major branches of Version Control Systems (VCS), Centralized and Distributed.
Centralized VCS are based on using a central server, where everyone "checks out" a project, works on it, and "commits" their changes back to the server for anybody else to use. The major Centralized VCS are CVS and SVN. Both have been heavily criticized because "merging" "branches" is extremely painful with them. [TODO: write explanation on what branches are and why merging is hard with CVS & SVN]
Distributed VCS let everyone have their own server, where you can "pull" changes from other people and "push" changes to a server. The most common Distributed VCS are Git and Mercurial. [TODO: write more on Distributed VCS]
If you're working on a project I heavily recommend using a distributed VCS. I recommend Git because it's blazingly fast, but it has been criticized as being too hard to use. If you don't mind using a commercial product, BitKeeper is supposedly easy to use.
The answer to another question also applies here, most importantly
Jon Works said:
The most important thing about version control is:
JUST START USING IT
His answer goes into more detail, and I don't want to be accused of plagiarism, so take a look.
The simple answer is, do you like Undo buttons? The answer is of course yes, because we as human beings make mistakes all the time.
As programmers, it's often the case that it takes several hours of testing, code changes, overwrites, deletions, file moves and renames before we work out that the method we are using to fix a problem is entirely the wrong one and the code is more broken than when we started.
As such, Source Control is a massive Undo button to revert the code to an earlier time when the grass was green and the food plentiful. And not only that, because of how source control works, you can still keep a copy of your broken code, in case a few weeks down the line you want to refer to it again and cherry pick any good ideas that did come out of it.
I personally (though it could be called overkill) use a free single-user license version of SourceGear Fortress (which is their Vault source control product with bug tracking features). I find the UI really simple to use, and it supports both the checkout > edit > checkin model and the edit > merge > commit model. It can be a little tricky to set up though, requiring you to run a local copy of IIS and SQL Server. You might want to try a smaller program, like those recommended by other answers here. See what you like and what you can afford.
Mark said:
git - very hot since Linus switched to it
I just want to point out that Linus didn't switch to it, Linus wrote it.
If you are working by yourself in a Windows environment, then the single user license for SourceGear's Vault is free.
We use and like Mercurial. It follows a distributed model - it eliminates some of the sense of having to "check in" work. Mozilla has moved to Mercurial, which is a good sign that it's not going to go away any time soon. One con, in my opinion, is that there isn't a very good GUI for it. If you're comfortable with the command line, though, it's pretty handy.
Mercurial Documentation
Unofficial Manual
Just start using source control, no matter what type you use. What you use doesn't matter; it's the use of it that is important
As everyone else has said, the choice of SC really depends on your needs, your budget, your environment, etc.
At its root, source control is designed to provide a central repository of all your code, and track who did what to it when. There should be a complete history, and you can get products that do full changelogs, auditing, access control, and on and on...
Each product that is out there starts to shine (so to speak) when you start to look at how you want or need to incorporate SC into your environment (whether it's your personal code and documents or a large corporation's). And as people use them, they discover that the tool has limitations, so people write new ones. SVN was born out of limitations that the creators saw with CVS. Linus wanted something better for the Linux kernel, so now we have git.
I would say start using one (something like SVN which is very popular and pretty easy to use) and see how it goes. As time progresses you may find that you need some other functionality, or need to interface with other systems, so you may need SourceSafe or another tool.
Source control is always important, and while you can get away with manually re-numbering versions of PSD files or something as you work on them, you're going to forget to run that batch script once or twice, or likely forget which number went with which change. That's where most of these SC tools can help (as long as you check-in/check-out).
See also this SO question:
Difference between GIT and CVS