Is there any version control system available with an MVFS-like virtual file system, in addition to ClearCase?
I can't find any.
Thanks,
Mart
No (not with read/write remote access).
MVFS (MultiVersion File System) is about encapsulating the native filesystem to combine:
network access
with versioned files through dynamic views
To my knowledge, only ClearCase offers that (especially on so many platforms: Unix, Linux, Windows, HP-UX).
Other VCSs offer read-only remote access, like gitfs and svnfs.
From "Filesystem Interface for the Git Version Control System" (pdf, from Reilly GRANT):
The Filesystem Interface to Git (known by the acronym “figfs”, pronounced like “figs”) allows developers to work with a project in a Git repository just like a local filesystem. This means that all the branchs, tags, and revisions are available for browsing without having to check anything out.
The ability to access past revisions in a repository via the filesystem has been implemented before.
Gitfs and svnfs[12] (which is the same as gitfs except that it uses Subversion)
implement a read-only view of repository history.
The advantage of gitfs over svnfs is that Git is a distributed system and thus maintains a copy of the entire repository on the local machine, eliminating network lag when fetching revisions.
A commercial system, Rational ClearCase[9], offers a writable filesystem view of the repository, MVFS (MultiVersion File System), as an alternative to checking out files to the local filesystem. As with svnfs the performance of this system suffers from the need to query over the network for uncached file data.
Figfs eliminates this problem because a Git repository is stored entirely locally.
FYI, one of the nice things about ClearCase is that it monitors system calls to typical file operations and can determine your real dependencies in a build. This can be important when building complex systems. This capability has been added to GNU make (runs on *nix systems only though) in http://sourceforge.net/projects/posixamake/; the author's currently working on adding a derived object cache using MySQL.
Related
Google stores all its codebase in a single repository called Piper [1] [2] [3].
It has an approach that is very different from what open-source alternatives do (a centralized 'cloud' service) and aims at scaling to a repository with billions of files, thousands of developers, and millions of commits [1].
It doesn't seem that Google has open-sourced it, nor plans to do so (contrary to their build system Blaze and some other tools [4]).
Are you aware of any open-source version control system with an approach similar to Piper?
The short answer is no, it doesn't seem to exist.
As you can read in a Quora article, "it’s hard to tell where the version control system ends, and where some of the other parts of the development toolchain begin".
So, first, you need to be clear about which "features" you are interested in, since you may be interested in a feature that is not actually Piper's responsibility.
Also, keep in mind that your server disk space and OS will limit the file count/size before the chosen VCS does.
If you need a centralized VCS and billions of files, you could go with SVN or OpenCVS.
If you need a distributed one with thousands of developers and millions of commits, take a look at Git, Bazaar, or Mercurial (Bitbucket is a hosting service for Git and Mercurial rather than a VCS itself).
But do you really have all those requirements?
AFAIK there's no open-source equivalent of Piper on the market.
To better understand centralized and distributed VCSs, take a look at this Comparison between Centralized and Distributed Version Control Systems.
Also, take a look at What is Google's repository like?
Two recent developments bring Piper-like features to Git: VFS for Git and sparse-checkout.
The first: Microsoft recently open-sourced VFS for Git which feels like it brings some of Piper's monorepo features to Git.
VFS for Git virtualizes the filesystem beneath your Git repository so that Git tools see what appears to be a normal repository when, in fact, the files are not actually present on disk. VFS for Git only downloads files as they are needed.
VFS for Git also manages Git's internal state so that it only considers the files you have accessed, instead of having to examine every file in the repository. This ensures that operations like status and checkout are as fast as possible.
This is used by Microsoft's Windows team: more than 4,000 developers working in a Git repository of over 300 GB with more than 2 million commits.
The second: sparse-checkout, added in Git v2.25.0, allows you to check out just a subset of your monorepo. This should speed up commands like git pull and git status. See this blog post for more info. Unfortunately, you have to manually specify which subdirectories you want to check out with git sparse-checkout, whereas Piper handles this transparently for developers.
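For illustration, a minimal sparse-checkout setup with Git 2.25+ might look like this (the repository URL, branch name, and directory names are placeholders):
$ git clone --filter=blob:none --no-checkout https://example.com/monorepo.git
$ cd monorepo
$ git sparse-checkout init --cone             # enable cone-mode sparse checkout
$ git sparse-checkout set services/web docs   # only materialize these subtrees
$ git checkout main                           # the working tree now contains just the chosen paths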
Google has built more than one version control tool. Piper is specialized for the needs of the Google monorepo.
When Google built Android, it built Gerrit and Repo to handle version control. Repo is used to work with many Git repositories at once, each of which may have its own maintainers and release cycles. Open-source dependencies don't lend themselves to a monorepo without the control of a single organization enforcing things such as a global build status or global refactoring. Also, the requirements of Piper simply don't apply in most places, such as commit performance keeping up with the request rate. A minimal sketch of the Repo workflow follows the links below.
Repo Docs
Gerrit Home
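To illustrate the multi-repository workflow that Repo manages, a minimal sketch (the manifest URL and branch are hypothetical placeholders):
$ repo init -u https://example.com/platform/manifest -b main   # fetch the manifest describing all the git repos
$ repo sync                                                    # clone or update every repository in the manifest
$ repo start my-feature --all                                  # create a topic branch across all projects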
There is no open-source equivalent to Piper.
Note that Piper is old and has an old-fashioned API dating from the Perforce era. I guess you would want a more modern workflow, similar to what modern DVCSs offer.
I'm pretty sure your codebase isn't as large as Google's 86TB repository. Do you really need the same thing?
I'm pretty sure you could use a monorepo based on Git or Mercurial, and maybe evolve to a virtual file system such as VFS for Git if you ever need it.
Meta is open-sourcing Sapling, which is based on our internal source control system but also has an added layer that allows it to be backed by a regular GitHub repository if you want the semantics without the scalability.
I've never worked at Google and don't know how different this is from their monorepo setup, but I've been at Meta for six years, and now that this is open source I have immediately transitioned to Sapling in all of my personal projects.
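As a loose illustration only (I am recalling Sapling's sl command names from its public documentation, so treat the exact commands as an assumption; the repository URL is a placeholder), working against an existing GitHub repository looks roughly like:
$ sl clone https://github.com/example/project.git   # clone a GitHub-backed repository
$ cd project
$ sl status                                          # show pending changes
$ sl commit -m "my change"                           # commit locally, with the GitHub repo as the backing remote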
I have a small Debian VPS-box on which I host and develop a few small, private PHP websites.
I develop on a Windows desktop with PHPStorm.
Most of my projects only have a few dozen source files but also contain a few thousand lib files.
I don't want to run a webserver on my local machine because this creates a whole set of problems I don't want to be bothered with for such small projects (e.g. setting up another webserver; syncing files between my desktop and the VPS box; managing different configurations for Windows and Debian (different hosts, paths...); keeping the db schema and data in sync).
I am looking for a good way to work with PhpStorm on a large number of remote files.
My approaches so far:
Mounting the remote file system in Windows (tried via pptp/smb, ftp, webdav) and working on it with PHPStorm as if it were local files.
=> Indexing, syncing, and PhpStorm's VCS support became unusably slow. This is probably due to the high latency of file access.
PHPStorm offers the possibility to automatically copy the remote files to the local machine and then synching them when changes are made.
=> After the initial copying, this is fast. Unfortunately, with this setup, PHPStorm is unable to provide VCS support, which I use heavily.
Any ideas on this are greatly appreciated :)
I use PhpStorm in a setup very similar to your second approach (local copies, automatically synced changes) AND, importantly, with VCS support.
Ideal; easiest: In my experience the easiest solution is to check out/clone your VCS branch on your local machine and use your remote file system as a staging platform that remains ignorant of VCS; a plain file system.
Real world; remote VCS required: If however (as in my case) it is necessary to have VCS on each system (perhaps your remote environment is the standard for your shop, or your shop's proprietary review/build tools are platform specific), then a slightly different remote setup is required. Treating your remote system as staging is still the best approach.
Example: Perforce - centralized VCS (client work-space)
In my experience, workspace-based VCS systems (e.g. Perforce) are best handled by sharing the same client workspace between the local and remote systems, which has the benefit that VCS file status changes have to be applied only once. The disadvantage is that file system changes on the remote system typically must be handled manually. In my case I manually chmod (or the OS equivalent) my remote files and wash my hands (problem solved). The alternative (dual-workspace) approach requires more moving parts, which I do not advise.
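For example, since a Perforce workspace leaves files read-only until p4 edit is run, making a file writable on the remote box by hand might look like this (the file name is just an illustration):
$ chmod u+w index.php    # manual stand-in for what 'p4 edit' would normally do on the machine owning the workspace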
Example: Git - distributed VCS
The easier approach is certainly Git, which has its wonderful magic of detecting file changes without file permissions being directly coupled to the VCS. This makes life easy: you can simply start with a common working branch and create two separate branches, "my-feature" and "my-feature-remote-proxy" for example. Once you decide to merge your changes upstream, you do so (ideally) from your local environment. The remote proxy branch can be reverted or whatever you want. NOTE: in the case of Git I always have two branches because it's easy, and when your hard drive melts in a freak lightning strike you have extra redundancy :|
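A minimal sketch of that two-branch setup (branch names are taken from above; the remote name origin and the starting branch master are assumptions):
$ git checkout -b my-feature origin/master              # local development branch
$ git branch my-feature-remote-proxy my-feature         # disposable branch to check out on the remote box
$ git push origin my-feature my-feature-remote-proxy    # publish both so either machine can pull them
Work and commit locally on my-feature, pull it into the proxy branch on the remote box for testing, and merge my-feature upstream when done; the proxy branch can simply be reset or deleted.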
Hope this helps.
We are a small team of 3 developers (boss, me, and another developer working mostly remotely), and I am tasked with setting up a repository server for Mercurial (hg).
It seems like I could simply put our centralized repository on a shared network drive. That would be extremely easy to set up, but there seems to be a risk that any one of us could abuse the convenience of working on or modifying the source repository directly. That is why I am thinking about using an HgWebdir server as a way to control access to the central repository: direct access to the central source repository would not be encouraged, but the shared drive would still be there just in case.
I guess it is a question of defining our in-house version-control procedure rather than really a version-control question, but I will go ahead and ask anyway. I don't feel I am experienced enough to make the decision, and if I am not 100% sure that my reasons and means are valid, it will probably be hard for me to enforce how the version-control system should be used by the other developers.
Edit:
I can see that there are potential issues with a shared folder when working with version-control software. But could anyone explain a bit more about what happens behind the scenes when pushing to a shared folder? My understanding is that a shared drive is essentially a shared link/shortcut, so Mercurial on a local machine only holds the lock for that link; each user's machine could have its own Mercurial instance holding a lock on that link, while the server's Mercurial instance would hold its own lock on the physical drive. I can see that it is complicated, but how exactly does it fail? I can understand the conclusion, but I can't by myself link the facts to the conclusion.
You should not place the Mercurial repository on a shared folder on a network server because Mercurial cannot reliably hold locks in all situations in such a setup, and during pushes to that central repository, locks are crucial to avoid corrupting the repository.
In fact, I would remove the "not encouraged" and replace it with "not possible", and only serve the repository either with hgweb or hg serve, the former being the recommended setup for long-running servers.
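For a quick setup, Mercurial's built-in server already gives you push/pull over HTTP instead of the shared drive (paths, port, and host name below are examples; web.allow_push and web.push_ssl are only relaxed like this on a trusted LAN):
$ hg serve -R /srv/hg/central -p 8000 --config web.allow_push='*' --config web.push_ssl=false   # lightweight built-in HTTP server
$ hg clone http://server:8000/ project    # developers clone over HTTP...
$ hg push                                 # ...and push back over HTTP, letting the server side hold the locks
For a long-running, always-on service, hgweb behind a real web server is the recommended route, as mentioned above.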
If you have a centralized server, you can install hgweb there and push and pull from it as a central and BACKED-UP source. We still have Windows 2003 servers (I am in no position to change that), and with a little searching on the web I was able to find info on how to set up hgweb on a Windows server, though most of it referred to Windows Server 2007.
I am looking for 'local' source control software; it doesn't necessarily need to be available over a network. It's meant only for personal use.
What I am looking for is something like:
Need it to be cross-platform. The biggest problem is that I need the same local repository to be available on both Windows and Linux! (Is this even possible? :s) I dual-boot Windows 7 and Ubuntu and have managed to set up a workspace that works in both OSes without changes; now I need source control software!
Easy installation; I have never installed one before! :)
And has an Eclipse plugin.
I have used VSS for this purpose before, but that is Windows-only!
I looked at Mercurial, but I am not sure if I can use the same repository on both OSes!
Any suggestions are appreciated!
UPDATE: Thanks for your replies. Yes, I do want the same repository to be accessed from different operating systems. Everyone has suggested an online repository, but I need it to be local. Internet is not something I can depend on (I now know Git takes care of this! :)), and I would not want, say, my personal recordings of some home functions, tweaked in Audacity, to be hosted online! Right now, I am trying out Git as a local repository solution.
If you definitely want a repository that's always available on a local filesystem, I'd probably go for Mercurial or Git. Most likely Mercurial, as it has the best Windows support (including the TortoiseHg GUI), but Git works similarly.
But there are two other issues:
Do you make frequent backups?
What file system type will you use for the shared repository?
In this particular case, I would not trust a single shared filesystem as the best basket to put your eggs in; in each boot environment, I would maintain working repositories separate from the shared one. This would give you some redundancy.
Here's how this would work:
Two repositories, U and W, for Ubuntu and Windows respectively, and one shared repository S, accessible from either boot environment.
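A possible way to set that up initially (paths are purely illustrative; the shared partition will have a different mount point or drive letter in each OS):
$ hg init /mnt/shared/S              # shared repository S on the common partition
$ hg clone /mnt/shared/S ~/work/U    # Ubuntu working repository U
Then, from Windows, clone S once more (e.g. with TortoiseHg) to create the working repository W.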
Assuming a stable situation, with all three repositories in sync:
1. Commit any new code to repository U in Ubuntu.
   $ hg commit -m 'changes from linux'
2. Push the changes to S.
   $ hg push
3. Reboot into Windows.
   ...
4. Pull the latest changesets from S into W.
   W> hg fetch
5. Update your code, committing frequently.
6. Push prior to rebooting into Linux.
   W> hg push
7. Reboot.
8. Repeat step 4, but now from Linux.
   $ hg fetch # performs an hg pull, followed by an update.
Rinse, lather, repeat.
That said, with both Mercurial and Git you can synchronise your repositories across the net at any time, so I would certainly recommend you try that out some time.
And note: the best backup is having a copy of your data on a live file system on another computer, preferably at another location.
I'm pretty sure you can use Mercurial, since the whole repository is in the .hg folder.
Try TortoiseHG - it's easy to install and use.
Why do you want it to be local? The benefit of source control is that you can have multiple clients working on the same source without worrying too much about conflicts, etc.
Even though it doesn't really answer your question, this advice might solve your problem:
Just create a project for yourself at https://github.com/ or http://sourceforge.net/ or any other free online repository hosting provider. SVN, CVS, and Git all come with excellent IDE integration, and clients run on almost all operating systems.
Hope this helps. Regards.
Do you really want to have a duplicate repository on different operating systems? That doesn't make sense to me. What would be the purpose of doing that?
I think you instead want to have a single repository that you can access from any operating system.
In this case, you can just install Subversion (or whatever source control system you prefer) on a server and access it from whichever operating systems you use. There are plenty of client tools for Mac/Windows/Linux that can talk to Subversion repositories, RapidSVN being a free, cross-platform one.
If you don't have your own server, there are plenty of places online that will host Subversion for you.
What kind of code (what coding languages) can I use GitHub for? Can I use it for websites? Flash? Can I upload images files and other resources?
(I am completely unfamiliar with Git and SVN.)
On git, svn and mercurial:
git, svn, and mercurial are all version control systems. svn was a great improvement over cvs, a version control system commonly used before the emergence of newer VCSs. svn, like cvs, has a client-server model. git and mercurial provide a distributed version control system that does not depend on the network, as any repository is self-contained with all the history and change records. Of course, there are other goodies.
Remember that a version control system solves the problem of "the cat ate my code". You can use it to track any kind of development: code, text documents, etc.
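For example, because a DVCS such as git keeps the full history in every clone, day-to-day work needs no network connection at all (the repository URL below is a placeholder):
$ git clone https://example.com/project.git   # network needed only here...
$ cd project
$ git log                                     # ...browsing the history is local
$ git commit -am "work offline"               # ...and so is committing
$ git push                                    # network again only when you share your work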
On GitHub, Bitbucket, code.google.com and CodePlex:
These provide additional goodies on top of what a version control system provides.
They provide you storage for keeping your repository, which you can access and share with the world.
When you share code, you would also want to provide documentation. They provide wiki support for this purpose.
They also provide ticketing / bug management systems, which can ease a development project.
In short, they provide various tools that can help in project management and development of your code.
Since you are just getting acquainted with some of these areas, the following links are very useful introductions:
a-visual-guide-to-version-control
intro-to-distributed-version-control-illustrated
You can use GitHub for any source code you want to manage.
But you can actually also use GitHub for your blog(!), the idea being that you would manage your articles and their revisions as you would a source code base.
(Example: git-blog)
More general documentations: GitHub features (wiki, issue tracking, code review...).
Git does not restrict the kind of files you can track with it... use Github for anything your project needs to track!
Check out http://help.github.com/ for some documentation on how to get started using Git in conjunction with GitHub.
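As a very small getting-started sketch (the remote URL is a placeholder for your own GitHub repository):
$ git init                                             # create a local repository
$ git add .                                            # stage your files (code, images, whatever)
$ git commit -m "initial commit"                       # record them locally
$ git remote add origin https://github.com/you/project.git
$ git push -u origin master                            # publish to GitHub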
Version control can be applied to ANY file type: from text to images to Flash to whatever. Subversion is my version control system of choice, and I host my own Subversion server.
As for Git: well, Git is just another version control system. Again, the same rules apply; you can version any file type. GitHub is a public Git server that you can register for and use. You can make your repository public or private.
You cannot, however, host a site on GitHub. You can do rudimentary blogs and use the Git feeds to feed your site, but you can't really use it as a traditional website.