Is there an open source equivalent to Piper, Google's version control tool?

Google stores its entire codebase in a single repository called Piper [1] [2] [3].
Its approach is very different from that of open source alternatives (it is a centralized 'cloud' service), and it aims to scale to a repository with billions of files, thousands of developers and millions of commits [1].
Google does not seem to have open-sourced it, nor does it appear to plan to (unlike its build system Blaze and some other tools [4]).
Are you aware of any open source version control system with an approach similar to Piper's?

The short answer is no, it doesn't seem to exist.
As you can read in a Quora article, "it’s hard to tell where the version control system ends, and where some of the other parts of the development toolchain begin".
So, first, you need to be clear about which "features" you are interested in, since you may be interested in a feature that is not Piper's responsibility.
Also, keep in mind that your server's disk space and OS would limit the file count and size before the chosen VCS does.
If you need a centralized VCS and billions of files, you could go with SVN or OpenCVS.
If you need a distributed one with thousands of developers and millions of commits, take a look at Git, Bazaar or Mercurial (hosted, for example, on Bitbucket).
But do you really have all those requirements?
AFAIK there's no open source equivalent of Piper on the market.
To better understand centralized and distributed VCSs, take a look at this Comparison between Centralized and Distributed Version Control Systems.
Also, take a look at "What is Google's repository like?"

Two recent developments bring Piper-like features to Git: VFS for Git and sparse-checkout.
The first: Microsoft recently open-sourced VFS for Git, which feels like it brings some of Piper's monorepo features to Git.
VFS for Git virtualizes the filesystem beneath your Git repository so that Git tools see what appears to be a normal repository when, in fact, the files are not actually present on disk. VFS for Git only downloads files as they are needed.
VFS for Git also manages Git's internal state so that it only considers the files you have accessed, instead of having to examine every file in the repository. This ensures that operations like status and checkout are as fast as possible.
Microsoft uses this for >4000 developers working in a >300GB repository with >2 million commits (their Windows Git repository).
The second: the git sparse-checkout command, added in Git v2.25.0, allows you to check out just a subset of your monorepo. This should speed up commands like git pull and git status. See this blog post for more info. Unfortunately, you have to specify manually which subdirectories you want to check out with Git sparse-checkout, whereas Piper handles this transparently for developers.
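To make the sparse-checkout part concrete, here is a minimal sketch, assuming Git 2.25 or newer. The directory layout (services/api, services/web, docs) and the commit identity are made up for illustration; a real monorepo would be cloned rather than created on the spot.

```shell
set -e

# Build a tiny stand-in "monorepo" in a temporary directory (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
mkdir -p services/api services/web docs
echo "api"  > services/api/main.txt
echo "web"  > services/web/main.txt
echo "docs" > docs/index.txt
git add .
git commit -q -m "initial monorepo layout"

# Restrict the working tree to one subdirectory (needs Git >= 2.25).
git sparse-checkout init --cone
git sparse-checkout set services/api

# services/web and docs are still in the history, just no longer on disk.
ls services
git sparse-checkout list
```

Note that the subdirectory has to be named explicitly, which is exactly the manual step mentioned above.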

Google has built more than one version control tool. Piper is specialized for the needs of the Google monorepo.
When Google built Android, it built Gerrit and Repo to handle version control. Repo is used to work with many Git repositories at once, each of which may have its own maintainers and release cycles. Open source dependencies don't lend themselves to a monorepo without a single organization in control to enforce things such as a global build status or global refactoring. Also, Piper's requirements, such as commit performance keeping up with the volume of requests, simply don't apply in most places.
Repo Docs
Gerrit Home

There is no open-source equivalent to Piper.
Note that Piper is old and has an old-fashioned API dating from the Perforce era. I guess you would want a more modern workflow, similar to what modern DVCSs offer.
I'm pretty sure your codebase isn't as large as Google's 86TB repository. Do you really need the same thing?
I'm pretty sure you could use a monorepo based on Git or Mercurial, and maybe evolve to a virtual file system such as VFS for Git if you ever need it.

Meta is open sourcing Sapling, which is based on our internal source control system but adds a layer that allows it to be backed by a regular GitHub repository if you want the semantics without the scalability.
I've never worked at Google and don't know how different this is from their monorepo setup, but I've been at Meta for six years, and now that Sapling is open source I have immediately transitioned to it in all of my personal projects.

Related

Source control solutions for the traveling man

I'm travelling a lot and I have personal projects that I need to work on during this time: sometimes when waiting for the train at the terminal, sometimes at home, and sometimes at work. I need to keep the source versioned somehow. I don't mind installing an application on each desktop. I need a way to sync between them, maybe with Dropbox? What options do I have?
Please, not GitHub and not a paid solution; something private and simple.
I would take a look at using a hosted version control solution like Github or Bitbucket. You have to pay for private repos on Github, but private repos on Bitbucket are free.
Use a distributed version control system such as Mercurial or Git. This allows you to
make off-line commits: you can make commits in the train, at home, or at work
synchronize the repositories with each other
host your code on sites like Bitbucket (Mercurial and Git hosting, free private repositories) or GitHub (Git hosting, paid private repositories).
As was already mentioned, use the DVCS of your choice. If you don't want a hosted solution (although private Bitbucket repositories are good), you can
copy the repository as is (Git, Mercurial...) from one workplace to another and back
use portable clients that need no installation (PortableGit, Syntevo Smart* portable versions)
or
use Fossil SCM, a small, cross-platform, single-executable DVCS that stores the repository in one file. Easy, fast, extremely mobile
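As a rough illustration of the "copy the repository as is" option with Git, here is a small sketch. The directory names (office, usb-stick) and the commit identity are placeholders; in practice the copy would live on a USB stick or in a synced folder.

```shell
set -e
base=$(mktemp -d)

# "Office" machine: an ordinary repository with one commit.
git init -q "$base/office"
cd "$base/office"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "start project at the office"

# Carry the whole repository along: the copy includes the .git directory,
# so the full history (and the repository config) travels with it.
cp -R "$base/office" "$base/usb-stick"

# On the train: commit offline inside the copy.
cd "$base/usb-stick"
echo "written on the train" >> notes.txt
git commit -q -am "offline work while travelling"

# Back at the office: pull the new commits straight from the copy.
cd "$base/office"
git pull -q --ff-only "$base/usb-stick" HEAD
git log --oneline
```

Pulling from the copy, rather than overwriting the office directory with it, keeps any commits made at the office in the meantime.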
Try GitHub. It's a great hosting service for version control that is free so long as you don't mind your files being public. If you need privacy you can pay for private repositories.
I realize you mentioned "source" above, but there is a rather famous article written by Joey Hess titled Keeping Your Life in Subversion. Apparently, he's still doing it even though the article is from 2005 (although he's moved to git).
There's even a mailing list for people who keep their homes in version control repositories!

Local Source control repository - cross platform

I am looking for 'local' source control software; it doesn't necessarily need to be available on a network. It's meant for personal use only.
What I am looking for is something like:
It needs to be cross-platform. The biggest problem is that I need the same local repository to be available on both Windows and Linux! (Is this even possible? :s ) I dual boot Windows 7 and Ubuntu and have managed to set up a workspace that works in both OSes without changes; now I need source control software!
Easy installation; I have never installed one before! :)
And it has an Eclipse plugin.
I have used VSS for this purpose before, but that is only on Windows!
I looked at Mercurial, but I am not sure if I can use the same repository on both OSes!
Any suggestions are appreciated!
UPDATE: Thanks for your replies. Yes, I do want the same repository to be accessed from different operating systems. Everyone has suggested an online repository, but I need it to be local. The Internet is not something I can depend on (I now know Git takes care of this! :)), and I would not want versions of, say, my personal recordings of some home functions, tweaked in Audacity, to be hosted online! Right now, I am trying out Git as a local repository solution.
If you definitely want a repository that's always available on a local filesystem, I'd probably go for Mercurial or Git. Most likely Mercurial, as it has the best Windows support (including the TortoiseHg GUI), but Git works similarly.
But there are two other issues:
Do you make frequent backups?
What file system type will you use for the shared repository?
In this particular case, I would not trust a single shared filesystem as the best basket to put your eggs in. In each boot environment, I would maintain a working repository separate from the shared one. This would give you some redundancy.
Here's how this would work:
Two repositories U and W, for Ubuntu and Windows respectively, and one shared repository S, accessible from either boot environment.
Assuming a stable situation, with all three repositories in sync:
1. Commit any new code to repository U in Ubuntu.
$ hg commit -m 'changes from linux'
2. Push the changes to S.
$ hg push
3. Reboot into Windows.
...
4. Pull the latest changesets from S into W.
W> hg pull -u
5. Update your code, commit frequently.
6. Push prior to rebooting into Linux.
W> hg push
7. Reboot.
8. Repeat step 4, but now from Linux.
$ hg pull -u # pull, then update (hg fetch also works if the fetch extension is enabled)
9. Rinse, lather, repeat.
That said, with both Mercurial and Git, you can synchronise your repositories across the net at any time, so I would certainly recommend you try that out some time.
And note: the best backup is having a copy of your data on a live file system on another computer, preferably at another location.
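Since the asker ended up trying Git, the same three-repository arrangement could look roughly like this there. It is only a sketch: the paths stand in for the Ubuntu working copy (U), the Windows working copy (W) and the shared repository (S), and the commit identity is a placeholder. Here everything runs in one temporary directory instead of across two boot environments.

```shell
set -e
base=$(mktemp -d)
S="$base/shared.git"      # shared repository on the common partition (hypothetical path)
U="$base/ubuntu-work"     # working repository used under Ubuntu
W="$base/windows-work"    # working repository used under Windows

git init -q --bare "$S"

# Under Ubuntu: commit locally, then push to S.
git clone -q "$S" "$U"
cd "$U"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "from linux" > code.txt
git add code.txt
git commit -q -m "changes from linux"
git push -q -u origin HEAD

# After rebooting into Windows: fetch the latest state from S, work, push.
git clone -q "$S" "$W"    # first time only; later a plain 'git pull --ff-only' is enough
cd "$W"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "from windows" >> code.txt
git commit -q -am "changes from windows"
git push -q origin HEAD

# Back under Ubuntu: pull what was pushed from Windows.
cd "$U"
git pull -q --ff-only
git log --oneline
```

S is created as a bare repository because nobody edits files in it directly; it only receives pushes and serves pulls.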
I'm pretty sure you can use Mercurial, since the whole repository is in the .hg folder.
Try TortoiseHG - it's easy to install and use.
Why do you want it to be local? The benefit of source control, is that you can have multiple clients working on the same source, without worrying too much about conflicts etc.
Even though it doesn't really answer your question, this advice might solve your problem:
Just create a project for yourself at https://github.com/ or http://sourceforge.net/ or any other free online repository hosting provider. SVN, CVS and Git all come with excellent IDE integration, and clients run on almost all operating systems.
Hope this helps. Regards.
Do you really want to have a duplicate repository on different operating systems? That doesn't make sense to me. What would be the purpose of doing that?
I think you instead want to have a single repository that you can access from any operating system.
In this case, you can just install Subversion (or whatever source control system you prefer) on a server and access it from the operating systems you use. There are plenty of client tools for Mac/Windows/Linux that can talk to Subversion repositories; RapidSVN, for one, is free and cross-platform.
If you don't have your own server, there are plenty of places online that will host Subversion for you.

Recommendations for handling source code inhouse

Hi
I'm currently seeing a need for handling source code for a few projects I'm working on. I have no need for external hosting, but I do need to have a structure internal in my development environment.
So, how would you recommend handling this? Do you just place the files on a file share in your environment, or do you set up some kind of versioning system? I'm quite new to this, but I would like to have some way of getting back to old versions of my code, and I would like to have the source code stored centrally so I can reach it from both my laptop and my workstation.
/Andy.l
Use a source control management system - I would suggest using a distributed one such as Git or Mercurial, so you don't need a server or need to be online to work.
You can still have a central location where you push and pull stuff from if you really want to.
If you must have a server, go with SVN - it is easy to setup and widely used.
With all of these options, there are hosted services that you can use as a central store.
If you are using Windows, then VisualSVN Server is quite good. You can install it on the server and use a client like TortoiseSVN to connect to it from other machines. The basic version is free to use.
Definitely use a version control system; it will allow you to set up some nice workflows in your coding day and keep everything securely stored. There are several good free VCSs (Git, Mercurial, Subversion, etc.). For some time I used a combination of Git + Dropbox or SugarSync to back up and share my repos.
http://git-scm.com/
Do set up a source control repository. Using an SCM has nothing but benefits.
As for which SCM system to choose, two very simple ones to set up and learn are Mercurial (distributed) and Subversion (centralized). I know you said you wanted centralized access to your sources, but keep in mind that this doesn't mean you can't use Mercurial for that purpose.
Here's a great tutorial on Mercurial by Joel Spolsky.
Lots of choices based on environment, etc.
SVN is an excellent all-around choice for centralized source control. You can also use Mercurial and Git internally if you prefer DVCS (even in a local environment).
In any case, regardless of what version control system you have - get one. Even if it's just one developer doing personal projects, source control is a must.
There's no question that setting up an SCM makes sense and has only advantages. Which SCM to use depends on several circumstances:
Do your co-workers already know any SCM? We're using SVN, and I think it would be quite hard to teach my colleagues the concepts of a DVCS like Git.
In my opinion, using a DVCS like Git needs more discipline during work: you have to remember to push to the central repository.
But this is also an advantage: you can create your own development branches and work on them without publishing them to the rest of your colleagues (saves reputation in some cases :-))
If you or your co-workers often work remotely, using a DVCS is more comfortable than using a centralized one like SVN: you need no connection to your central repository but can still check in, create branches and (quite important) view the complete history of your project without connecting (e.g. via VPN) to your servers at work.
For a centralized VCS, I can recommend SVN (set up as Hps suggested).
As a DVCS, I can recommend Git (msysGit with TortoiseGit).
If you decide to use SVN, you can still use git-svn on the clients: the repository is run with SVN, but you still get the advantages of a DVCS while offline.

What kind of code can I use GitHub for?

What kind of code (what coding languages) can I use GitHub for? Can I use it for websites? Flash? Can I upload images files and other resources?
(I am completely unfamiliar with Git and SVN.)
On git, svn and mercurial:
Git, SVN and Mercurial are all version control systems. SVN was a great improvement over CVS, a commonly used version control system prior to the emergence of newer VCSs. SVN, like CVS, has a client-server model. Git and Mercurial are distributed version control systems that do not depend on the network, as every repository is self-contained, with all the history and change records. Of course, there are other goodies.
Remember that a version control system solves the problem of "the cat ate my code". You can use it to track any kind of development: code, text documents, etc.
On GitHub, Bitbucket, code.google.com and CodePlex:
These provide additional goodies on top of what a version control system provides.
They provide you storage for keeping your repository, which you can access and share with the world.
When you share code, you would want to also provide documentation. They provide wiki support for this purpose.
They also provide ticketing / bug management system which can ease a development project.
In short, they provide various tools that can help in project management and development of your code.
Since you are just getting acquainted with these areas, the following links will be very useful introductions:
a-visual-guide-to-version-control
intro-to-distributed-version-control-illustrated
You can use GitHub for any source code you want to manage.
But you actually can also use GitHub for your blog(!), the idea being that you would manage your articles and their revisions as you would for a source code base.
(Example: git-blog)
More general documentation: GitHub features (wiki, issue tracking, code review...).
Git does not restrict the kind of files you can track with it... use GitHub for anything your project needs to track!
Check out http://help.github.com/ for some documentation on how to get started using Git in conjunction with GitHub.
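To illustrate that Git tracks non-code files just as readily, here is a minimal sketch that commits a web page next to a stand-in "image". The file names and the commit identity are made up, and logo.png is just a handful of arbitrary bytes, not a valid picture.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

# A web page (text) and a stand-in for an image (arbitrary binary bytes).
printf '<h1>Hello</h1>\n' > index.html
printf '\211PNG\r\n\032\n\000\001\002' > logo.png

git add index.html logo.png
git commit -q -m "add page and image"

# Both files are tracked; binary content is marked "Bin" in the stat summary.
git ls-files
git log --stat --oneline
```

Pushing such a repository to GitHub works the same way as for source code.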
Version control can be applied to ANY file type. From text to images to Flash to whatever. Subversion is my Version control system of choice, and I host my own Subversion server.
As for Git: well, Git is just another version control system. Again, the same rules apply: you can version any file type. GitHub is a public Git server that you can register for and use. You can make your repository public or private.
You cannot, however, host a site on GitHub. You can do rudimentary blogs and use the Git feeds to feed your site, but you can't really use it as a traditional web site.

Introduction to Mercurial

I have just begun working on a project which uses Mercurial as a version control system, and I need some basic tips on how to use this. Please use this question to give some introductory tips on this technology.
The official Mercurial site
In particular, I am looking for tips on the best programs to use and the best techniques to use (branches, checking in and out, etc.). I need to learn the best practices!
I know you already have the Mercurial site but the resource most useful to me was the Mercurial book. It's an excellent overview of the program and how to use it.
I found the best way to learn Mercurial was just to use it on a project. I imported into Mercurial a project I had exported from Subversion and did some regular development with it. I made sure to clone the repository for different changesets so that I could get used to the merging and updating. I haven't learned all of the advanced uses but I'm now on a pretty firm footing with it and haven't switched back to Subversion yet.
A lot of projects have different techniques for commit workflow. Some have changes pushed from the developers, like centralized systems, and some will pull the changes from contributors (Linux, for example). It's hard to generalize too much without knowing the process for your project.
This is how I do my development:
Centralized tree on a file share or http, called project-trunk or project that is the definitive project version
A clean tree on my system that I clone from the remote repository and use to push back to the repository. I then clone from this tree for my changes. I call this tree project-local
Clone the project-local tree for each of my changes: e.g. project-addusers, project-141, etc.
After I am finished with the commits to a tree, I then push the changes to the project-local repository
Finally, push the changes in the project-local to project-trunk
I have the clean project-local tree because then I can push all the changesets back to the trunk at one time, which is helpful if there is a group of related changes that need to push back together.
As for tools, it depends on your platform. I just use the vanilla command line tool. Coming from TortoiseSVN, it was a bit of a change to go to the command line. But I'm fine with it now. I tried using TortoiseHg but it didn't function well on my Windows 7 x64 virtual machine. I hear it's much better on the supported 32-bit platforms.
Here is a helpful tutorial on Mercurial written by Joel Spolsky.
It covers basic usage and commands, as well as how to work with Mercurial at a more conceptual level. If you are already familiar with SVN, then the first part is definitely worth reading: it talks about the major conceptual differences between SVN and Mercurial, because trying to use Mercurial in the same way that you use SVN is asking for trouble.
Have a look at the Mercurial book, or at this Mercurial tutorial.
Depending on your background with other source control tools, I would also suggest a specific SCM-whatever to Mercurial guide. For example, have a look at this guide for Subversion users.
Another good resource for getting your head around the whole "distributed" source control idea is: http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/ ... with helpful diagrams!
If you use the latest TortoiseHG client and include the install directory in your PATH environment, you will be able to use both the nice GUI they provide, and the command line 'hg'
I cannot recommend the mq extension highly enough. It makes for a great 'working repository' environment.
I use the queues to manage local changes against a Subversion repository. I make my local short-term changes and use Mercurial to keep in sync with Subversion and the rest of the team.
A few of Steve Losh's blog posts are good, even though they're a couple of years old now. They mainly deal with how to work with branching.
Guide to Branching in Mercurial
Branch Workflows - Branching as needed
Branch Workflows - Stable and Default
It's also worth looking at his hgtip.com site.
In addition to the Mercurial Book and the Hg Init tutorial, I'd like to mention the example-driven guide I've written:
Mercurial Kick Start
It shows how to get started with Mercurial and also covers some more advanced concepts such as named branches and hgsubversion. I've used it when teaching Mercurial to new users and they seemed to like it.