Best Practices for versioning web site? - version-control

What are the best practices for versioning web sites?
Which revision control systems are well suited for such a job?
What special-purpose tools exist?
What other questions should I be asking?

Firstly, you can - and should - use a revision control system. Most will handle binary files, although unlike text files you can't merge two different sets of changes, so you may want to set the system up to lock these files whilst they are being changed (assuming that's not the default mode of operation for your RCS in the first place).
Where things get a bit more interesting for Websites is managing those files that are required for the site but don't actually form part of the site - the most obvious example being something like .psd files from which web graphics are produced but which don't get deployed.
We therefore have a tree for each site which has two folders: assets and site. Assets are things that aren't in the site, and site is - well the site.
What you have to watch with this is that designers tend to have their own "systems" for "versioning" graphic files (count the layers in the PSD). You don't necessarily need to stop them doing this, but you do need to ensure that they commit each change too.
Other questions?
Deployment. We're still working on this one (-: But we're getting better (I'm happier now with what we do!)
Murph

In response to Christian Lescuyer's post, you also need to enable the "svn:keywords" property on the file with that line in it. Subversion won't bother looking in your files for keywords like $Revision$ unless that property is set.
Also, if using PHP like in his example, you may want to put $Revision$ inside a single-quoted string instead of a double quoted string to prevent PHP from trying to parse $Revision as a PHP variable and throwing a warning. :)

I use Subversion.
As an easy way to reference the website version (production, testing, development), I use a very simple trick. I add the revision number somewhere on the site (e.g. in the admin footer). Something like this:
<?php print("$Revision: 1 $"); ?>
Each time you check out (development versions) or export (for production), the "1" will be replaced by the last revision in which that file changed in your repository, thus making it easy to set up the customer version on your test server, for example.
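If a script needs to read that number back (for example, to compare a test server against production), the expanded keyword can be parsed. This is only a minimal sketch, written in Python rather than PHP; the function name `revision_number` and the pattern are illustrative assumptions, not part of Subversion itself.

```python
import re

# Matches an expanded Subversion keyword such as "$Revision: 1234 $".
# An unexpanded "$Revision$" (svn:keywords not set) will not match.
REVISION_KEYWORD = re.compile(r"\$Revision:\s*(\d+)\s*\$")


def revision_number(keyword_text):
    """Return the revision as an int, or None if the keyword is unexpanded."""
    match = REVISION_KEYWORD.search(keyword_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(revision_number("$Revision: 1234 $"))
    print(revision_number("$Revision$"))
```

Returning None for the unexpanded form makes it easy to spot files where the svn:keywords property was never set.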

Related

Should I put included code under SCM?

I'm developing a web app.
If I include a jQuery plugin (or the jQuery file itself), this has to be put under my static directory, which is under SCM, to be served correctly.
Should I gitignore it, or add it, even if I don't plan on modifying anything from it?
And what about binary files (graphic resources) that might come with it?
Thanks in advance for any advice!
My view is that everything you need for your application to run correctly needs to be managed. This includes third-party code.
If you don't put it under SCM, how is it going to get deployed correctly on your production systems? If you have other ways of ensuring that, that's fine, but otherwise you run the risk that successful deployment is a matter of people remembering to do all the right things, rather than some automated low-risk "push the button" procedure.
If you don't manage it under SCM or something similar, how do you ensure that the versions you develop against and test against are the same? And that they're the same as production? Debugging an issue caused by a version difference you don't notice can be horrible.
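One way to picture the "same version everywhere" check is to compare content hashes of the third-party files against a pinned list. The following is a rough sketch under assumptions: the names `sha256_of` and `mismatched_files` and the idea of keeping a pinned-hash table are hypothetical, and checking the files into SCM makes a check like this largely unnecessary.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def mismatched_files(pinned: dict, actual: dict) -> list:
    """Return names that are missing from `actual` or whose content
    hash differs from the pinned digest.

    pinned: {file name: expected hex digest}
    actual: {file name: file content as bytes}
    """
    return sorted(
        name
        for name, digest in pinned.items()
        if name not in actual or sha256_of(actual[name]) != digest
    )


if __name__ == "__main__":
    pinned = {"jquery.js": sha256_of(b"contents of the tested version")}
    deployed = {"jquery.js": b"contents of some other version"}
    print(mismatched_files(pinned, deployed))
```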
I generally add external resources to my project directly. Doing so facilitates deployment and ensures that if someone changes the version of this file in your project, you have a clear audit history of what happened in case it causes issues in the code that you've written. Developers should know not to modify these external resources.
You could use something like git submodules, I suppose, but I haven't felt that this is worth the hassle in the past.
Binary files from external sources can be checked in to the project as well, although if they're extremely large you may want to consider a different approach.
There aren't a lot of reasons not to put external resources like jQuery into your repo:
If you pull it down from jQuery every time you check out or deploy, you have less control over which version you're using. This holds true for most third-party libraries; you probably don't want to upgrade your libraries without testing with your code to see if it breaks something.
You'll always have a complete copy of your site when you check out your repository and you won't need to go seeking resources that may have become unavailable.
For small (in terms of filesize) things like jQuery and images, I'd just add them unless you're really, really concerned about space.
It depends.
These arguments relate to having a copy of the library on your system and not pulling it from its original location.
Arguments in favour:
It will ensure that everything needed for your project can be found in one place when someone else joins your development team. I've lost count of the number of times I've had to scramble around looking for the right versions of libraries in order to be able to get something working.
If you make any modifications to the library you can make these changes to the source controlled version so when a new version comes out you use the source control's merging tools to ensure your edits don't go missing.
Arguments against:
It could mean everyone has a copy of the library locally - unless you map the 3rd party tools to a central server.
Deploying could be problematical - again unless you map the 3rd party tools to a central server and don't include them in the deploy script.

Is there a revision control system that allows us to manage multiple parallel versions of the code and switch between them at runtime?

If I want to enable a new piece of functionality for a subset of known users first, is there any automated system or framework that exists to do this?
Perhaps not directly with version control - you might be interested to read how flickr goes about selectively deploying functionality: http://code.flickr.com/blog/page/2/
And this guy talks about implementing something similar in a rails app: http://www.alandelevie.com/2010/05/19/feature-flippers-with-rails/
Most programming languages have if statements.
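To make that concrete, here is a minimal sketch of a feature flag that turns new functionality on for a subset of known users. The flag table `EARLY_ACCESS`, the feature name, and the user ids are all made up for illustration; a real application would probably load them from configuration or a database.

```python
# Hypothetical flag table: feature name -> ids of users with early access.
EARLY_ACCESS = {
    "new_uploader": {"alice", "bob"},
}


def is_enabled(feature, user_id, flags=EARLY_ACCESS):
    """True if `user_id` is in the early-access group for `feature`."""
    return user_id in flags.get(feature, set())


def render_upload_page(user_id):
    # Both code paths ship in the same deployed version; the flag picks one
    # per request, so no version control operation happens at runtime.
    if is_enabled("new_uploader", user_id):
        return "new uploader"
    return "classic uploader"


if __name__ == "__main__":
    print(render_upload_page("alice"))
    print(render_upload_page("carol"))
```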
I don't know what "switching between them at runtime" means. You usually don't check executable code into an SCM system. There's a separate process to check out, build, package, and deploy. That's the province of continuous integration and automated builds in agile techniques.
SCM systems like Subversion allow you to have tags and branches for parallel development. You're always free to build, package, and deploy those as you see fit.
As far as I know, no...
If you want a revision control system that holds multiple versions you can switch between, find an SCM you like and look up branching.
But it sounds like you want to be able to switch versions in the SCM programmatically at runtime. The problem with that is, for a revision control system to be able to do that, it would have to be aware of the language and how it's implemented.
It would have to know how to load and run the next version. For example, if it was C code it would have to dynamically compile and run it on the fly. If it was PHP it would have to magically load the script in a sandbox HTTP server that has PHP support. Etc... In which case, it isn't possible.
You can write an app to change the version in the scm by using the command line.
To do it during runtime, that functionality has to be part of the application itself.
The best (only) way I can think of doing it is to have one common piece of code that acts like a 'bootloader', which uses a system call to checkout the correct branch based on whatever your requirements are. It then (if necessary) compiles that code, and runs it.
It's not technically 'at runtime', but it appears that way if it works.
Another option is something that dynamically loads code, but that's very language-dependent, and you'd need to specify the language.
The other is to permanently have both in the working codebase (which doubles your size if it's a full duplication), and switch at runtime. You can save a good bit of space by using objects that are shared between both branches, and things like conditional compilation to use the same source files for both targets.

How do you organize your temporary workfiles?

I do a lot of bugfixing and implementing new features for several different customers. These customers all report their bugs, change requests and new feature requests in our Trac system.
Sometimes these requests result in me creating some SQL change scripts; sometimes there are Excel documents or Access databases with test data, Word documents from the customer and so on. That's a lot of files that are used to fix one ticket and can then be deleted when the ticket is closed.
I usually handle this by creating folders in the filesystem like this: /customerXX/TicketNNNNN and then just dumping everything in there.
How do you organize your workfiles? Have you found some fantastic tool to do this?
I would say for scripts or files that are related to a particular ticket, the best thing to do would be to attach the file to that ticket in your issue tracking software - almost all issue trackers that I've worked with will allow you to do this. That way, you can look back and a) see exactly what you did in case something goes wrong, or b) do exactly the same thing if the issue comes up again later. That's almost certainly the best place to keep files with extra info from the customer, too (or at least the first place most people will look).
For frequently re-used scripts that aren't specific to a particular ticket, I would create a scripts/ or bin/ directory in the associated project, and keep them in there.
I also have a small handful of useful files that I keep in src/misc/ off my home directory, with things like SQL queries to get readable "explain" output out of Oracle and such, that aren't specific to any particular project. The number of these is small enough that subdirectories aren't necessary, though - I suspect if you ended up with a large number of these files, many of them could/should be moved to specific projects or your issue tracking system.
JIRA has been quite helpful for this at my site. It supports issue tracking and file attachments, and you can easily customize and categorize your projects and issues.
I use FogBugz and I add all files to the case. I believe that no matter what application you use, the important thing is to keep these files for future reference. If your bug-tracking tool does not let you attach files, then add the files to version control.
We use CaWeb4 and find it very easy to use for our bug tracking.

Version control of deliverables

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up-to-date environment to build and test in. Due to the nature of the project, updates must be done often and on demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check in the updated DLLs after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or maybe a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project - more like an extensible system with a heap of "plugins", including third-party ones. We need to make sure those modules/plugins work nicely with recent versions of each other and the core. A centralised build, as was suggested, was considered initially, but it's not an option.
I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart in deciding what part of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.
Another option is Unison.
You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with keeping your local files in sync, but I think you have bigger problems with your process.
Building the project should be a centralized process to allow for better control; otherwise your solution will turn into chaos in the long run. Anyway, here is what I'd do.
Create the usual repositories for source files, resources, documentation, etc. for each project.
Create a repository for resources. This will hold the latest binary versions for each project as well as any required resources, files, etc. Keep a good folder structure for each project so developers can "reference" the files directly.
Create a repository for final builds which will hold the actual stable release. This will get the stable files, built in an automatic way (if possible) from the checked-in sources. This will hold the real product, the real version for integration testing and so on.
While far from being perfect, you'll be able to define well established protocols. Check in your latest DLL here, generate the "real" version from the latest source here.
What about embedding a 'what' string in the executables and libraries? Then you can synchronise the desired list of versions with a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "@(#)INETOPS_filter_ip_$Revision: 1.9 $";
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see if the versions in a bundle of libraries and executables match the list in an associated manifest.
HTH.
cheers,
Rob
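The manifest comparison described above could be scripted along these lines. This is a hedged sketch, not the poster's tooling: it imitates what(1) by scanning for the "@(#)" marker and reading up to the usual terminators (double quote, ">", newline, backslash or NUL), and the names `what_strings` and `stale_deliverables` plus the manifest layout are assumptions made for illustration.

```python
import re

# what(1) prints the text that follows each "@(#)" marker, stopping at a
# double quote, '>', newline, backslash or NUL byte.
WHAT_MARKER = re.compile(rb'@\(#\)([^"\n>\\\x00]*)')


def what_strings(blob: bytes) -> list:
    """Return every ident string embedded in a binary blob."""
    return [m.decode("ascii", "replace") for m in WHAT_MARKER.findall(blob)]


def stale_deliverables(manifest: dict, binaries: dict) -> list:
    """Return names whose binary does not contain the ident string that
    the manifest expects.

    manifest: {deliverable name: expected ident string}
    binaries: {deliverable name: file content as bytes}
    """
    return sorted(
        name
        for name, expected in manifest.items()
        if expected not in what_strings(binaries.get(name, b""))
    )


if __name__ == "__main__":
    blob = b"\x7fELF\x00@(#)INETOPS_filter_ip_$Revision: 1.9 $\x00more"
    print(what_strings(blob))
```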
Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can push and update to) then just type "svn update" at the command line, or use TortoiseSVN: right click on the folder, click "SVN Update" and it'll update all the files and tell you what's changed.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
@Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, @Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.
We don't version files that can be automatically generated using scripts included in the repository itself. The reason for this is that after a checkout, these files can be rebuilt with a single click or command. In our projects we always try to make this as easy as possible, which removes the need for versioning these files.
One scenario I can imagine where this could be useful is 'tagging' specific releases of a product, for use in a production environment (or any non-development environment) where the tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. This can be uploaded to a production server, or an HTTP server for downloading by users of your products.
I am using TortoiseSVN for small-system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense in theory to have these versioned alongside the source code, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.
Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
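A release procedure of that kind might look roughly like the sketch below, which packs the master file and its derived files into a gzip-compressed tarball. The function name `build_release_tarball` and the in-memory file table are assumptions for illustration; a real script would first run the build step and read the generated files from disk.

```python
import io
import tarfile


def build_release_tarball(files: dict) -> bytes:
    """Pack {archive path: content bytes} into a .tar.gz held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)  # required so the member data is written
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


if __name__ == "__main__":
    release = build_release_tarball({
        "pstool/pstool.tex": b"% master file (versioned)\n",
        "pstool/pstool.sty": b"% derived file (generated at release time)\n",
    })
    with open("pstool-release.tar.gz", "wb") as out:
        out.write(release)
```

With this approach the repository only tracks pstool.tex, while users who want the generated files download the tarball instead.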
In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more of a linear audit trail than branching-and-tagging source control.