What is the cleverest use of a source repository that you have ever seen? - version-control

This actually stems from my earlier question, where one of the answers made me wonder how people are using their SCM/repository in different ways for development.

Pre-tested commits
Before (TeamCity, build manager):
The concept is simple: the build system stands as a roadblock between your commit and trunk. Only after the build system determines that your commit doesn't break things does it allow the commit to be introduced into version control, where other developers can sync and integrate that change into their local working copies.
After (using a DVCS like Git as the source repository):
My workflow with Hudson for pre-tested commits involves three separate Git repositories:
my local repo (local),
the canonical/central repo (origin)
and my "world-readable" (inside the firewall) repo (public).
For pre-tested commits, I utilize a constantly changing branch called "pu" (potential updates) on the world-readable repo.
Inside of Hudson I created a job that polls the world-readable repo (public) for changes in the "pu" branch and will kick off builds when updates are pushed.
My workflow for taking a change from inception to origin is:
* hack, hack, hack
* commit to local/topic
* git pup public
* Hudson polls public/pu
* Hudson runs potential-updates job
* Tests fail?
    o Yes: Rework commit, try again
    o No: Continue
* Rebase onto local/master
* Push to origin/master
Using this pre-tested commit workflow I can offload the majority of my testing requirements to the build system's cluster of machines instead of running them locally, meaning I can spend the majority of my time writing code instead of waiting for tests to complete on my own machine in between coding iterations.
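(The "git pup public" step in the list above is presumably a custom alias; its definition is not given in the answer. A hypothetical definition that force-pushes the current topic branch to the "pu" branch of whichever remote you name could look like this:
git config alias.pup '!f() { git push -f "$1" HEAD:refs/heads/pu; }; f'
With something like that in place, "git pup public" updates public/pu, which is what Hudson polls.)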
(Variation) Private Build (David Gageot, Algodeal)
Same principle as above, but the build is done on the same workstation as the one used for development, only in a cloned repo:
How can we avoid using a CI server in the long term, without suffering the increasing time lost watching builds run locally?
With git, it’s a piece of cake.
First, we ‘git clone’ the working directory to another folder. Git does the copy very quickly.
The next times, we don't need to clone; we just tell git to fetch the deltas. Net result: instant cloning. Impressive.
What about the consistency?
A simple 'git pull' from the working directory will realize, using the deltas' digests, that the changes have already been pushed to the shared repository.
Nothing to do. Impressive again.
Of course, while the build is running in the second directory, we can keep on working on the code. No need to wait.
We now have a private build with no maintenance, no additional installation, not dependent on the IDE, run from a single command line. No more broken builds in the shared repository. We can recycle our CI server.
Yes. You’ve heard well. We’ve just built a serverless CI. Every additional feature of a real CI server is noise to me.
#!/bin/bash
# Figure out the remote repository to publish to once the private build passes.
if [ 0 -eq `git remote -v | grep -c push` ]; then
    REMOTE_REPO=`git remote -v | sed 's/origin//'`
else
    REMOTE_REPO=`git remote -v | grep "(push)" | sed 's/origin//' | sed 's/(push)//'`
fi
# Optionally commit pending work, using the first argument as the message.
if [ ! -z "$1" ]; then
    git add .
    git commit -a -m "$1"
fi
git pull
# Clone the working directory into a hidden private-build copy (first run only).
if [ ! -d ".privatebuild" ]; then
    git clone . .privatebuild
fi
cd .privatebuild
git clean -df
git pull
# Run the build; publish to the remote only if it succeeds.
if [ -e "pom.xml" ]; then
    mvn clean install
    status=$?
    if [ $status -eq 0 ]; then
        echo "Publishing to: $REMOTE_REPO"
        git push $REMOTE_REPO master
    else
        echo "Unable to build"
        exit $status
    fi
fi
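Saved as, say, privatebuild.sh (the file name is just an assumption, it isn't given above) and run from the working directory, a typical invocation would be:
./privatebuild.sh "Fix rounding bug in invoice total"
which commits pending changes with that message, runs the Maven build in .privatebuild, and pushes to the remote only if the build passes.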
Dmitry Tashkinov, who has an interesting question on DVCS and CI, asks:
I don't understand how "We've just built a serverless CI" coheres with Martin Fowler's statement:
"Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository. However my commit doesn't finish my work. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn't properly updated."
Do you ignore or bend it?
#Dmitry: I neither ignore nor bend the process described by Martin Fowler in his ContinuousIntegration entry.
But you have to realize that DVCS adds publication as an orthogonal dimension to branching.
The serverless CI described by David is just an implementation of the general CI process detailed by Martin: instead of having a CI server, you push to a local copy where a local CI runs, then you push "valid" code to a central repo.
#VonC, but the idea was to run CI NOT locally, precisely in order not to miss something in the transition between machines.
When you use so-called local CI, it may pass all the tests just because it is local, but break down later on another machine.
So is it integration? I'm not criticizing here at all; the question is difficult for me and I'm trying to understand.
#Dmitry: "So is it integeration"?
It is one level of integration, which can help get rid of all the basic checks (like format issue, code style, basic static analysis detection, ...)
Since you have that publication mechanism, you can chain that kind of CI to another CI server if you want. That server, in turn, can automatically push (if this is still fast-forward) to the "central" repo.
David Gageot didn't need that extra level, being already at target in terms of deployment architecture (PC->PC), and needed only that basic kind of CI.
That doesn't prevent him from setting up a more complete system integration server for more thorough testing.
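As a rough sketch of the chaining described above (pushing validated code on to a central repo; the remote name "central" is an assumption), the final step of a successful local CI run could simply be:
git push central master
Since an ordinary push is refused unless it is a fast-forward, nothing gets promoted if the central branch has moved on in the meantime.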

My favorite? An unreleased tool which used Bazaar (a DSCM with very well-thought-out explicit rename handling) to track tree-structured data by representing the datastore as a directory structure.
This allowed an XML document to be branched and merged, with all the goodness (conflict detection and resolution, review workflow, and of course change logging and the like) made easy by modern distributed source control. Splitting the components of the document and its metadata into their own files kept proximity from creating false conflicts, and otherwise let all the work the Bazaar team put into versioning filesystem trees carry over to tree-structured data of other kinds.
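As a purely hypothetical illustration of the idea (the tool and its on-disk format were never released), a document such as a product catalogue might be exploded into a directory tree like this, so that each component and its metadata is versioned, renamed and merged as an ordinary file:
catalogue/
    document.xml          (root element and document-level metadata)
    sections/
        introduction/
            content.xml
            metadata.xml
        pricing/
            content.xml
            metadata.xml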

Definitely Polarion Track & Wiki...
The entire bug-tracking and wiki database is stored in Subversion, in order to keep a complete revision history.
http://www.polarion.com/products/trackwiki/features.php

Related

Auto commit and auto push changes in local repo to git

I have a local development setup with an Ubuntu-Server VM, and I use Eclipse on a Windows host, developing via Remote System Explorer & SSH. I want that whenever I save a file or make some changes in the ubuntu-server's /var/www/site-folder, it automatically commits and pushes the changes to my git repo. I tried Google but it wasn't much help. Any help is appreciated, guys. I really want to improve my workflow.
This sounds like something you'll have to script. If you save as much as I do (a lot), you'll end up with a lot of commits, and unless you're careful about when you save, the history will be messy unless you squash things later.
Are you sure that you want to commit and push automatically every time you save? It also matters whether or not you're pushing to your own private branch or repo.
Actually I think there are use cases where this /is/ a good idea. If you work on two different machines (even if not simultaneously), for instance, you cannot share the Eclipse workspace. One simple way to overcome this is to put a bare git repository on a cloud server (Dropbox, Copy, OneDrive, etc.) and push all work, completed or otherwise, to it every time you close Eclipse.
Will the repo be messy? Sure, but that's not the point.
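A rough sketch of that setup (the Dropbox path is an assumption):
git init --bare ~/Dropbox/project.git
cd ~/workspace/project
git remote add dropbox ~/Dropbox/project.git
git push dropbox --all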
I could find no easy hooks within Eclipse itself to automate this, so I simply put an invocation of Eclipse in a script and finished it off with:
git commit -a -m "WIP commit"
git push origin
You just have to watch out for newly-created files and remember to add those before you exit.
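A minimal sketch of such a wrapper, assuming the Eclipse binary lives at /opt/eclipse/eclipse and the working tree is /var/www/site-folder (both paths are assumptions):
#!/bin/bash
# start Eclipse and block until it exits
/opt/eclipse/eclipse -data "$HOME/workspace"
# then commit and push whatever changed during the session
cd /var/www/site-folder
git add -A                      # -A also stages newly-created files
git commit -m "WIP commit"
git push origin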

Managing multiple users with GitHub

Can anyone please advise me on one thing?
We have a project, and we just decided to hire more programmers to work on it.
Up to now I was the only programmer, backing up the code on GitHub.
But now I need to find a safe way to manage multiple programmers.
So there is a master branch of the private project, which the other programmers should be able to clone.
But they should not be able to commit changes to the master branch themselves.
Perhaps they should make their own branches, and commit changes there.
And I should be the only person able to review their work and merge it into the master branch if it works correctly.
Can anyone please tell me how exactly I should set it up?
Or send some good tutorials?
Thanks so much
Collaborative coding is rather the whole point behind Github. Here's an illustrative workflow to get you started. A similar development flow is absolutely essential to open source projects (which by nature must work via the internet). In fact, a lot of open source projects use Github. You can use this process, too; although there are some caveats, which I've listed at the bottom.
The key to understanding this workflow is that each developer will manage 2 repositories:
Personal work, on their local machine
Work visible to all participants, hosted by Github.com
Setup process, integrator (project lead):
Create github account
Host project there
Setup process, developer:
Create github account
Fork the project
git clone the forked repository to the local machine
Development process, developer:
Work on local copy
Push to the Github copy. (Remember, each developer has their own Github fork.)
Submit a pull request to the project lead's Github repository.
Rinse and repeat.
Development process, integrator:
Look on Github for pull requests
Review them, approve them, merge them
Work on local copy, as above, sans the pull requests
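In command-line terms, the developer side of this loop might look roughly like the following (user names and URLs are made up for the example; the project lead's repository is added as a second remote, conventionally called upstream):
git clone git@github.com:developer/project.git
cd project
git remote add upstream git@github.com:project-lead/project.git
# hack, commit locally
git commit -a -m "Implement feature X"
git push origin master
# then open a pull request on Github from developer/project to project-lead/project
# to pick up the integrator's latest work later:
git pull upstream master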
However, this isn't the only possible workflow. For example, Github makes it convenient for your developers to send pull requests to each other, e.g. if two of them are working together on a feature. This way, all of them may work in the "integrator" role somewhat.
Caveats:
If your program is not open source, then there is one caveat to using Github: you must pay to host private repositories. They have a mechanism (disclaimer: never used it) for organizing collections of people to work on either public or private repositories, however, and I believe the cost can be paid entirely by the organization owner, which would be great for your developers and cost some extra for you.
If you only have a few contributors, you might be able to get by with a free private repository by using Bitbucket instead of Github. They have an option to host private repos for free, and the workflow would be about the same as what I've outlined above.
The best way to achieve this is with a pre-receive hook which looks at the username of the person carrying out the push and the branch they are trying to push to. If the username isn't in the list of allowed users and the branch is master, deny the push.
e.g.
#!/bin/bash
allowedUsers=( 'bob' 'john' 'george' ); # list of usernames allowed to push to master
while read oldrev newrev ref ; do
    # check whether the pushing user is in the allowed list
    echo ${allowedUsers[@]} | grep -q $(whoami);
    if [ $? -eq 1 ] && [ "$ref" = 'refs/heads/master' ] ; then
        echo "You are not allowed to push to the master branch";
        exit 1;
    fi
done
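To run, the script has to be installed on the server side of the repository you push to, i.e. a Git server you administer yourself rather than a repository hosted on github.com, as hooks/pre-receive in the bare repository and marked executable (the path and file name below are just examples):
cp deny-master-push.sh /srv/git/project.git/hooks/pre-receive
chmod +x /srv/git/project.git/hooks/pre-receive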
When working with GitHub, the best and easiest way is to use pull requests. In your case, every programmer is given only pull access to the repository and has to clone it. When he or she is ready with the changes, that programmer sends you a pull request. You then review the request and, if it is OK, merge it into the main repository.

Sharing a core codebase between multiple projects

We have several product lines built around a common core and currently maintain them in SVN using externals. Moving to mercurial, it is natural to move to use hg sub-repositories.
The thing is, the core is quite large (probably over a gigabyte, judging by the SVN repo), and a typical developer sometimes wishes to work simultaneously on several products, say 3-4.
Did I get it right that this usually means each developer would have the core replicated 3-4 times, with its entire history?
Also, if a developer wishes to perform some simple operation in another product, would it mean the core has to be pulled first, even though it is already available on the client (several times over)?
In order to truly share the subrepository (and not its working copy), you can use the share extension. However, that makes the cloning process a bit counter-intuitive:
hg clone -U remote_core core
hg clone -U remote_projectA projectA
cd projectA
hg share ../core core
hg update
cd ..
hg clone -U remote_projectB projectB
cd projectB
hg share ../core core
hg update
And so on. But I warn you that you are going to have more than one headache with this setup. At work, we have a similar setup, but instead the shared subrepository has a branch (not a named branch, but a clone branch, a dedicated master repository) for each project that uses it. That way projects can modify the shared code independently while still having the easy merging between them.
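As a rough sketch of that layout (repository names and URLs are invented for the example), each project's .hgsub maps the core subrepository path to that project's dedicated clone of the core:
projectA/.hgsub:
    core = https://hg.example.com/core-for-projectA
projectB/.hgsub:
    core = https://hg.example.com/core-for-projectB
Merging core changes between projects is then just a pull and merge between core-for-projectA and core-for-projectB.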

How do you use Git within Eclipse as it was intended?

I've recently been looking at using Git to eventually replace the CVS repository we have at work. However, after watching Linus Torvalds' video on YouTube about Git, it seems that every tutorial I find suggests using Git in the same way CVS is used, except that you have a local repository, which I agree is very useful for speed and distribution.
However the tutorials suggest that what you do is each clone the repository you want to develop on from a remote location and that when changes are made you commit locally building up a history to help with merge control. When you are ready to commit your changes you then push them to the remote location, but first you fetch changes to check for merge conflicts (just like CVS).
However, in Linus' video he describes Git being used by a group of developers working on some code, pushing to and fetching from each other as needed rather than using a remote, i.e. centralized, location. He also describes people pushing their changes out to verifiers who fetch and push code as well. So you can see it's possible to create a scalable structure within a company too.
My question is: can anybody point me in the direction of some tutorials that actually explain how to do this distributed development with Git, so that developers push and fetch code from each other without committing to a central remote repository? If possible, it would be very nice for these tutorials to be Eclipse-based.
Thanks in advance,
Alexei Blue.
I don't know any specific tutorial about this. In general, to connect to a repository you have to be running a git server that listens for (and authenticates) git requests.
Having a specific repository for each developer is possible, but each repository needs that server component. It is possible to store multiple repositories on the same computer, which reduces the number of servers required.
However, for other reasons it is beneficial to have some kind of central structure (e.g. a repository for stuff to be released; or a repository for stuff not verified yet). This structure is not required to be a single central repository, but multiple ones with well-defined workflows regarding the data move between repositories (e.g. if code from the verification repository is validated, it should be pushed to the release repository).
In other words, you should be ready to create Git servers (e.g. see http://tumblr.intranation.com/post/766290565/how-set-up-your-own-private-git-server-linux for details; but there are other tutorials for this as well), and define workflows for your own company to use it.
Additionally, I recommend looking at AlBlue's blog series called Git Tip of the Week.
Finally, to ease the transition I suggest first introducing Git as a direct replacement for CVS, and then presenting the other changes one by one.
Take a look at AlBlue's blog entry on Gerrit.
This shows a workflow different from the classic centralized server such as CVS or SVN. Superficially it looks similar, as you pull the source from a central Git server, but you push your commits to the Gerrit server, which can compile and test the code to make sure it works before eventually pushing the changes on to the central Git server.
Now instead of pushing the changes to Gerrit, you could have pushed the code to your pair programming buddy and he could have manually reviewed and tested the code.
Or maybe you're going on holiday and a colleague will finish the task you've started. No problem, just push your changes to their Git repo.
Git doesn't treat any of these other Git instances differently from each other. From Git's perspective, none of them are special.
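As a rough sketch (host name and path are made up), exchanging work with a colleague is just a matter of adding their repository as another remote:
git remote add alice ssh://alice-host/home/alice/project.git
git fetch alice                  # see what she has been working on
git push alice my-topic-branch   # hand your unfinished branch over to her repository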

How should I work on a CVS hosted project to both (1) fix bugs and (2) maintain my own private fork with additional features

The question
An open source program uses CVS for version control. I would like to make a number of bug-fixes and submit patch bombs to the developers with commit access. I would also like to maintain my own semi-private fork that mainly tracks the main code-base but that includes my own features (these features, right now, should not be incorporated into the main code-base.)
I prefer to use mercurial for my own version control needs, but I am open to other version control systems if necessary.
I'd like to:
Be able to easily create patch-bombs against the current CVS source with my own bug-fixes
Keep track of history on my own features
Have fixes and improvements from the main tree easily incorporated into my new-feature fork
Easily apply my own bug-fixes to my new-feature fork
Be able to work and track change history without an Internet connection.
What suggestions do you have for doing this?
My current idea
My own best guess is below, to give you a better idea of what I am thinking about.
I will have 3 mercurial repositories.
The first two repos are managed as specified at (https://wiki.mozilla.org/Using_Mercurial_locally_with_CVS). One just mirrors the latest changes from the CVS upstream. I do "cvs update" then "hg commit" in this repo. The second repo holds my bug-fixes as patches using the mq extension, and I pull from the first repo and rebase my patches every so often. When my patches are incorporated into the main tree, I remove the patches from the patch queue / make them permanent commits.
The third repo is my local fork. It will start out as a clone of the first repo. Then each time I do an update of the first repo, I'll pull from it into repo 3. My own features will be directly present as commits in this repo. When I fix a bug, I'll export a patch from repo 2 and apply it to the appropriate pull from repo 1.
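In rough command-line terms, the plan above might look like this (the CVS server path, repo directories and patch names are invented; repo 2 relies on the mq extension being enabled, and all commands are run from a common parent directory):
# repo 1: mirror of the CVS upstream
cvs -d :pserver:anonymous@cvs.example.org:/cvsroot checkout project
cd project
hg init
hg add
hg commit -m "initial CVS import"
cd ..
# repo 2: bug-fix patch queue on top of the mirror (mq extension)
hg clone project project-fixes
cd project-fixes
hg qnew fix-crash.patch          # edit, then: hg qrefresh
cd ..
# repo 3: private new-feature fork
hg clone project project-fork
# later, after another "cvs update" + "hg commit" in the mirror:
cd project-fixes
hg qpop -a
hg pull -u ../project
hg qpush -a
cd ../project-fork
hg pull ../project
hg merge
hg commit -m "merge upstream"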
I have used Git to manage changes on top of a CVS repository in a similar way. My solution in Git uses local branches instead of multiple repositories, but it sounds essentially similar to your proposed idea.
I found that this arrangement works best if you commit all the CVS metadata (in the CVS/ subdirectories) to your mirrored repository. This means that the CVS metadata gets replicated in the other repositories, but it doesn't cause any harm (and lets you run commands like cvs diff if you need to).