Distributed Version Control - Git & Mercurial... multiple sites - version-control

I'm looking for a best-practice scenario for managing multiple "sites" in Mercurial. I'm likely to have multiple sites in a web root, all of which are different but somewhat similar (as they are 'customizations' of a root app).
Should I:
A) make a single repository of the wwwroot folder (catching all changes across all sites), or
B) make EACH site's folder a separate repository?
The issue is that each site needs a distinct physical directory, due to vhost pointing for development and a current need to have "some" physical file differences across sites.
What's the best practice here? I'm leaning towards a separate repository for each directory, which would make tracking any branching and merging for that ONE site cleaner...

It depends on how your software is structured and how independent the different sites are. The best situation is when you can use your core code like a library that lives in its own directory, so that no site ever needs to change a single file of the core. Then you have a free choice of whether to develop the core along with the different sites in a single repo, or to separate the core from the sites. When the core and the different sites depend on each other, you will very probably have to deal with all of them in a single repo.
Since in my experience development works better when the different parts are independent of each other, I strongly recommend extracting the core stuff into something that can be included into each site as a directory inclusion.
The next point is how the different sites are developed. If they share lots of code, they can be developed as different branches. But there are two disadvantages to this scheme:
the different sites are normally not visible to the developer, since typically only one is checked out
the developer has to take great care where changes are made, so that only the intended changes end up in the other branches and nothing specific to a single branch leaks out
If the sites share lots of common code, you might consider moving those common parts into the core.
Things are much better if the sites have nothing in common. Then you need to decide whether they should live in different repos or as different directories in a single repo. When the sites are somehow related to each other (say they all belong to the same company), it might be better to put them into a common repo as different subdirectories. When they are unrelated (each site belongs to a different customer, and changes to these sites are not made in sync with each other), one repo per site is better.
With the one-repo-per-site approach, it can also help to first create a template site that includes the core component and the basic configuration, and to derive your site repos as clones of this template. Then, when you change something in the core that also affects the sites, you make that change in the template and afterwards merge it into the site repos. (Just take care NOT to make such a change in one of the site repos: if you later merge from a site back into the template, site-specific material could end up in the template, which you don't want.)
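A minimal sketch of that template workflow in Mercurial (the repository and site names here are hypothetical):

hg init template
cd template
# add the core inclusion and the basic configuration, then:
hg commit -Am "Initial template: core plus base configuration"
cd ..
hg clone template site-a        # each site starts life as a clone of the template
hg clone template site-b
# later, after committing a core-related change in the template:
cd site-a
hg pull ../template
hg merge                        # bring the template change into the site
hg commit -m "Merge template changes"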
So I suggest:
develop the core as a single independent product
choose the correct development model for your sites:
all in one repo, with branches, when a lot of code exchange is going on between the different sites
(but better to refactor the sites so they don't share code, since the branches approach has the drawbacks described above)
all in one repo, no branches but different folders, if there is no code exchange between the different sites
one repo per site if they are completely independent

I think you should try Mercurial Queues with one repo, i.e.:
you store the "base" site in the repository
all site-specific changes are separated into a set of MQ patches (roughly one patch per site)
you can create "push-only" repos on the sites, add them to the [paths] section of your "working" repo, and push changes there (or use an export/copy technique)
after applying the site's patch to the codebase you get ready-to-use code for each and every site
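A rough sketch of how that MQ workflow could look (the patch names are hypothetical, and the mq extension has to be enabled under [extensions] in .hgrc):

hg qnew site-a.patch            # start the patch that will hold site A's specific changes
# ... edit site-specific files, then record them in the patch:
hg qrefresh
hg qpop                         # drop back to the clean base before starting the next site
hg qnew site-b.patch
hg qrefresh
# to produce deployable code for a single site:
hg qpop -a                      # unapply all patches
hg qpush --move site-a.patch    # apply only site A's patch on top of the base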

Related

GitHub Multiple Repositories vs. Branching for multiple environments

This might be a very beginner question, but I'm working on a large production website in a startup environment. I just recently started using Heroku, GitHub, and Ruby on Rails because I'm looking for much more flexibility and version control compared to just making changes locally and uploading them to a server.
My question, which might be very obvious, is if I should use a different repository for each environment (development, testing, staging, production, etc.) or just a main repository and branches to handle new features.
My initial thought is to create multiple repositories. For instance, if I add a new feature, like an image uploader, I would work from the code in the development repository, make the changes, and commit along the way to keep track of the small steps. Once I had tested it locally, I would push it to the test repository as a single commit that names the feature added (e.g. "Added Image Uploader to account page").
My thought is that this would allow micro-managing of commits within the development environment, while the commits in the testing environment would be more focused on bug fixes, etc.
This makes sense in my mind because as you move up in environments you remove the extraneous commits and focus on what is needed for each environment. I could also see how this could be achieved with branches though, so I was looking for some advice on how this is handled. Pros and cons, personal examples, etc.
I have seen a couple other related questions, but none of them seemed to touch on the same concerns I had.
Thanks in advance!
-Matt
Using different repos makes sense with a Distributed VCS, and I mention that publication aspect (push/pull) in:
"How do you keep changes separate and isolated across multiple deployment environments in git?"
"Reasons for not working on the master branch in Git"
The one difficult aspect of managing different environments is the configuration files, which can contain different values per environment.
For that, I recommend a content filter driver:
It helps generate the actual config files with the right values in them, depending on the current deployment environment.
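A minimal sketch of such a filter driver in Git (the attribute name, config file path and helper scripts are hypothetical; the scripts simply read the file content on stdin and write the transformed content to stdout):

# .gitattributes: run the filter on the templated config file
config/app.conf filter=envconfig

# register the filter driver in the clone used for a given environment
git config filter.envconfig.smudge "fill-env-values.sh"
git config filter.envconfig.clean  "restore-placeholders.sh"

On checkout the smudge script replaces placeholders with the values for the current deployment environment; on commit the clean script puts the placeholders back, so only the template ever gets versioned.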

Maintaining a CMS and websites with Mercurial

I'm pretty new to Mercurial, and after reading a few tutorials I'm still unsure of the best way to do what I intend to do with it.
My goal is to maintain a CMS that I'm developing (adding new features, fixing bugs, etc.) and to be able to easily distribute those updates to the websites I make with said CMS.
I started by making a repository for the CMS itself; then, when I want to make a new website, I clone the CMS repository and work on the clone.
Now the questions: while working on a website there are changes that are specific to that site and changes that I'd also like to see in the main CMS repository. How do I distinguish them?
Should I create a new branch and commit all the website-specific changes to that branch and the general changes to the default branch? Or should I use tags?
What I'm looking for is an easy way to push changes back to the CMS repository, then continue to develop the CMS (on other websites, for example) and eventually update all the websites I made with the CMS with new features and bug fixes, without too much hassle.
What's the best way to deal with the situation I described?
Thanks in advance.
Well, you're really asking at least two questions (as I see it):
How to maintain diverged lines of development?
How to easily distribute changes from one (?) DEVEL environment to different PROD environments?
A full answer to the second question would require clarifying many specific details, so I propose we postpone it for a while.
On the first question: you are right, named branches for the sites inside a single repo (into which you periodically merge the default branch carrying the shared changes) can be a good choice (not tags, which are only easy-to-remember labels for changesets).
An alternative solution is to use mq on top of a single default branch (with a separate mq queue for each site).
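A small sketch of the named-branch approach recommended above (the branch and site names are hypothetical):

hg branch site-a                 # mark the working copy as being on site A's branch
# ... make site-specific edits ...
hg commit -m "Site A specific changes"
hg update default                # shared CMS work stays on the default branch
# ... fix a bug in the shared code ...
hg commit -m "Core bug fix"
hg update site-a                 # periodically bring shared changes into each site branch
hg merge default
hg commit -m "Merge core changes into site A"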

New to Mercurial and VCS: shared code multi-server setup

In our small office we're setting up Mercurial - our first time using a "real" version control system. We've got three servers - a live server, a staging server and a development server.
We've also got three relatively large web sites - one for visitors, one for users and an intranet site, for the office staff.
The three web sites share some code (for instance, a PHP class library, some commonly used code snippets, etc.).
Before version control, we just used symbolic links to the shared libraries. For example, each site had a symbolic link to an "ObjectClasses" directory; any change made to a file in ObjectClasses would be instantly available to all the sites. You'd just upload the changed file to staging and to live, and you were done.
But... Mercurial doesn't follow symbolic links. So I've set up a subrepository for the shared libraries in each of the three sites on the three servers (actually 'four' servers if you count the fact that there are two programmers with two separate clones of the repository on the development server).
So there are 12 working copies of the shared object library.
So here's the question:
Is there any way to simplify the above set up?
Here's an example of what our workflow will be and it seems too complicated - but maybe this is what it's like using version control and we just need to get used to it:
Programmer A makes a change to Object Foo in the subrepo in Site 1. He wants to make this available everywhere, so he commits it and then pushes it to the staging server. I set up hooks on the staging server to automatically propagate the changes to the three sites on the staging server, and again to the three sites on the live server. That takes care of the 6 working copies on the staging and live servers. So far, so good.
But what about the development server, where there may be work in progress on these files?
Programmer A now needs to manually pull the shared subrepo into Sites 2 and 3 on the development server. He also needs to tell Programmer B to manually pull the shared subrepo into Sites 1, 2 and 3 in his copy of the site on the development server. And what if he's editing Object Foo on Site 1 while making different edits to Object Foo on Site 2? He's going to have to resolve two separate conflicts.
We change the objects relatively frequently, so this is going to drive us nuts. I really love the idea of version control, but after two weeks of wrestling with finding the best setup, the old sloppy way of having one copy of the shared files and calling out "hey - are you working on that file? I want to make a change" is looking pretty good right now.
Is there really no simpler way to set this up?
Without more information about the specific web platform and technologies you're using (e.g., .NET, LAMP, ColdFusion, etc.), this answer may be inadequate, but let me take a stab nevertheless. First, if I understand you correctly, it's your working paradigm that's the problem: you're having developers make changes to files and then push them to three different sites. I suggest separating the development concerns from the build/deploy concerns altogether.
It sounds like you're using subrepositories in Mercurial to handle shared code, which is smart, by the way, so that's good. That takes care of sharing code across multiple projects. But rather than having each programmer push to a given server after he updates something, have the programmers push to a separate "staging" repository. You could have one for each of your servers if you wish, though I think it probably makes more sense to keep all development in a single staging or "master" repository, which is then used to build/deploy to your staging and/or live server.
If you wish to automate this process, there are a number of tools that can do it. I usually prefer NAnt with CruiseControl for build integration, but my work is mostly .NET, which makes that a great fit. If you can provide more specifics I can provide more details, but I think the main problem for you to overcome is the workflow. Use Mercurial to keep multiple developers happily pulling from and pushing to a single repository, and then worry about deploying to your servers for testing as a separate step.
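To make that concrete, a rough sketch (the server names and paths are hypothetical): the shared library stays a subrepo, declared in each site's .hgsub, and developers publish finished work to one central repository instead of pushing to the web servers directly.

# .hgsub in each site repository
ObjectClasses = http://hg.example.local/shared/ObjectClasses

# developer workflow: commit locally, then publish to the central repo
hg commit -m "Change Object Foo"
hg push http://hg.example.local/central/site1

# deployment is a separate step on each target server (run manually or from a hook)
hg pull -u http://hg.example.local/central/site1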

I want to separate binary files (media) from my code repositories. Is it worth it? If so, how can I manage them?

Our repositories are getting huge because of the tons of media we have (hundreds of 1 MB JPEGs, hundreds of PDFs, etc.).
Because of this, our developers have to wait an abnormally long time when they check out certain repos.
Has anyone else had this dilemma before? Am I going about it the right way by separating code from media? Here are some issues/worries I had:
If I migrate these to a media server, I'm afraid it might be a pain for the developer to use. Instead of making updates to one server, he/she will now have to update two servers when doing both programming-logic and media updates.
If I migrate these to a media server, I'll still have to revision-control the media, no? So the developer would have to commit code updates and commit media updates.
How would the developer test locally? I could make my site use absolute URLs, e.g. src="http://media.domain.com/site/blah/image.gif", but this wouldn't work locally. I assume I'd have to change my site templating to decide whether it's a local/development or production environment and change the BASE_URL based on that.
Is it worth all the trouble to do this? We deal with about 100-150 sites, not a dozen or so major sites, so we have around 100-150 repositories. We won't have the time or resources to change existing sites, and we can only implement this on brand-new sites.
I would still have to keep the scripts that generate media (PDF generators) and the generated media in the code repository, right? It would be a huge pain to update all those PDF generators to POST files to external media servers, and an extra pain once caching is taken into account.
I'd appreciate any insight into the questions I have regarding managing media and code.
First, yes, separating media and generated content (like the generated PDFs) from source control is a good idea.
That is because of:
disk space and checkout time (as you describe in your question)
the fact that these kinds of files barely use any VCS features (no diff, no merge; only labels and branches)
That said, any transition of this kind is costly to put in place.
You need to separate the release management process (generating the right files in the right places) from the development process (getting the right material from one or two repositories in order to develop/update your projects).
Binaries fall generally into two categories:
non-generated binaries:
They are best kept in an artifact repository (like Nexus, for instance), under a label that matches the label used for the text sources in the VCS.
generated binaries (like your PDFs):
Ideally, they shouldn't be kept in any repository at all, but only generated during the release management phase in order to be deployed.
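For example, with a Nexus "raw" hosted repository you could publish each media file under a path that mirrors the source-control label (the repository name, URL and credentials below are hypothetical):

# publish a non-generated binary under the label v1.4.0
curl -u deploy:secret --upload-file images/banner.jpg \
  "https://nexus.example.com/repository/site-media/mysite/v1.4.0/banner.jpg"

# the build/deploy script later fetches the version matching the source label
curl -o public/images/banner.jpg \
  "https://nexus.example.com/repository/site-media/mysite/v1.4.0/banner.jpg"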

Good GitHub structure when dealing with many small projects that have a common code base?

I'm working for a web development company and we're thinking about using GitHub for version control. We work with several different .NET-based CMS platforms and, of course, with a lot of different customers.
We have a standard code base for each CMS which we start from when building a new site. We would of course like to maintain that, and make it possible to merge in some changes from a developed site (when the standard code base has been improved in some way).
We often need to make small changes to a published site at a later date and would like to be able to do this with minimal effort (i.e. the customer gladly pays for us to fix his problem in 2 hours, but doesn't want to pay for a 2 hour set up first).
How should we set this up to be able to work in an efficient fashion? I'm not very used to distributed version control (I've worked with CVS, Subversion and ClearCase before), but as far as I can tell we could:
Set up one repository for each customer, start with a copy of the standard code base and go from there. Lots of repositories of course.
Set up one repository for each CMS and then branch off one branch per customer. This is probably (?) the best way, but what happens when we have 100 customers (= 100 branches) in the same repository? It also doesn't feel entirely right to create a lot of branches that we have no real intention of ever merging back into the main branch.
I don't know; perhaps lots of branches is only a problem in my imagination, or perhaps there are better ways to do this that I haven't thought about. I would be interested in any experience with similar problems.
Thank you for your time and help.
With Git, several repos make sense for submodule purposes (sharing a common component; see the nature of Git submodules in the third part of that answer).
But in your case, one repo with a branch per customer can work, provided you use the branches to:
isolate the client-specific changes (long-lived branches with no merging back to master are fine),
while rebasing those same branches on top of master (which contains the common code, with the common evolutions needed by all the branches).
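A minimal sketch of that setup (the customer, repository and URL names are hypothetical):

# shared components live in their own repos and are pulled in as submodules
git submodule add https://github.com/example/cms-core.git core

# one long-lived branch per customer, carrying only that customer's changes
git checkout -b customer-acme master
# ... customer-specific commits ...

# when the standard code base (master) is improved, replay the
# customer's changes on top of it instead of merging back
git checkout customer-acme
git rebase master

Note that the rebase rewrites the customer branch, so this works best when each customer branch has a single, well-defined place it is pushed to.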