Git: Access Control? How to do in practice - eclipse

How would one protect a Git repository containing a complete (Java) application so that a developer cannot get access to all of the source code in the repository? I know Git is a distributed version control system in which a developer normally "downloads/fetches" the complete(!) repository.
My questions:
1) How do I separate "modules/autonomous parts" in Git? For example, having a module "payment layer", a "database layer", a "processing layer" and so forth, all abstracted via APIs/interfaces. Do I have to set up a separate Git repository for each of those modules?
2) Is there a way to have one large repository in Git but somehow restrict access by path? (A client should only receive those files he was granted access to.)
3) Is there a way to have one large repository in Git but somehow restrict access by branch/tags? (A client should only receive those files he was granted access to.)
4) Just in case someone knows this too: Is there a way in Eclipse to check out content from multiple Git repositories into one project and also (the other way round) commit code within one Eclipse project to multiple different Git repositories (based on package names/paths or in the context menu)?
Thank you very much
Markus!

You will have to split up the code into multiple git repositories if you want differential control. You cannot control by branches or whatever. Git downloads the entire repo. Period.
You can look into git submodules as a mechanism that makes it easier to work with a project built from multiple git repositories.

1) and 4) depend a lot on your build environment. In git you try to have a separate repository per module, but if the setup of the source tree becomes painful you can use git submodules (though not many people like them) or the repo tool the Android project uses. This allows you to have an "umbrella" project composed of several subprojects. I am not sure it is worth it for just a few components; a single git repo may still make more sense.
For questions 2) and 3):
For access, I would recommend that every sub-team keeps its own fork (repository) and somebody reviews what they push to the integration repository. If you don't like this approach, you can use git server hooks to enforce policies by writing scripts.
In this case, the hook could check who is pushing, and the path or refspec (branch) against some config file describing the policy. This is documented here:
https://git-scm.com/book/en/v2/Customizing-Git-An-Example-Git-Enforced-Policy
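As a rough illustration of the hook approach described above, here is a minimal Python sketch of the policy check such a hook might perform. The `POLICY` table, the user names and the `denied_paths` function are all hypothetical; a real hook would obtain the pushing user from its environment and the changed paths from the pushed commits. Note that a server-side hook of this kind can only reject pushes; it does not stop anyone with read access from cloning the whole repository.

```python
import fnmatch

# Hypothetical policy (an assumption for illustration, not from any real setup):
# user name -> path patterns that user may modify.
POLICY = {
    "alice": ["payment/*", "docs/*"],
    "bob": ["database/*"],
}


def denied_paths(user, changed_paths, policy=POLICY):
    """Return the changed paths that `user` is not allowed to modify."""
    allowed = policy.get(user, [])
    return [
        path
        for path in changed_paths
        # fnmatchcase's "*" also matches "/", so "payment/*" covers subfolders.
        if not any(fnmatch.fnmatchcase(path, pattern) for pattern in allowed)
    ]
```

A hook script would call `denied_paths` with the pusher and the list of files touched by the push, and exit with a non-zero status if the returned list is not empty, which makes git reject the push.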

1). Look at Git submodules http://linux.die.net/man/1/git-submodule
2,3). Look at Gitolite https://github.com/sitaramc/gitolite/blob/pu/doc/gitolite.conf.mkd
4). I don't think any Eclipse Git plugin allows that. However, you can use an external/command-line client to achieve what you want.

Related

GitHub + Eclipse Workflows

I am sure this is an RTFM thing, but after a few days of research I still cannot determine the correct (or best) workflow for this.
I have an Eclipse Workspace with a number of Java Projects in it and a number of C++/Arduino projects.
I want to start using GitHub as an online repository (easily reachable from outside my private LAN dev environment) for my projects.
I was thinking I would like a separate C++/Arduino and Java GitHub repos. More could come for Python, PowerShell, etc. (But I will happily entertain other recommendations for repo structures).
Outside of the actual mechanics of using EGit, I cannot figure out the most appropriate workflow/folder structure for accomplishing this. Should I create local Git repos and push to GitHub as a remote? Should I use GitHub's web interface to import the entire Eclipse Workspace? Should I work directly with the Eclipse Workspace or have separate Git folders?
I guess the crux of my problem is that after reading a few related posts on this site I get conflicting advice about creating a local repo from the Eclipse workspace vs. a separate local repo. I think I need to understand this distinction first before I ultimately determine the best overall workflow.
I apologize for the broad nature of this question, but I hope that the community can help me narrow the workflow process design (or the question itself).
Two things up front:
Never put your entire workspace in source control; projects: yes, workspace: no. The .metadata folder contains data specific to that location and your machine, and that's ignoring any potential security risks with making it public.
EGit works with your git clones' own metadata, so if you're more comfortable doing certain things from the command line, go for it. I know I am, but I still appreciate the UI and decorations that EGit provides. Just make sure any automatic refresh/update preferences on the Workspace or Git preference pages are turned on.
You probably want the repository to contain multiple projects, rather than having a separate repository for each one. That way histories and changes that belong together are together. Nest the layout however you like, but remember that you're not constrained to a single repository for everything, either.
I don't know that there's a best practice for this, especially with projects that already exist, but projects should themselves be relocatable. My recommendation, after backing it all up:
Make the GitHub projects and clone their repos locally. I do it this way, from the command line, to save me any headaches with the history, remotes, and refs. I don't think you can modify the repository metadata in this method, though.
Move the workspace projects into the local clone. You can delete them from the workspace (be sure not to delete the underlying files), physically move the directories while outside of Eclipse, and then import them as projects back in from the Git Repositories View--unless they're Maven projects, in which case it's better to use M2E's import wizard.
Stage, commit, and push the projects up to the remote origin. For Java projects, don't forget to set the JRE System Library in the Java Build Path to use an Execution Environment. It's a simple bit of indirection that makes them more portable across machines.

Is there an open source equivalent to piper, Google's version control tool?

Google stores its entire codebase in a single repository called Piper [1] [2] [3].
Its approach is very different from that of open source alternatives (it is a centralized 'cloud' service) and it aims at scaling to a repository with billions of files, thousands of developers and millions of commits [1].
It doesn't seem that Google has open-sourced it or plans to do so (contrary to their build system Blaze and some other tools [4]).
Are you aware of any open source version control system with an approach similar to Piper?
The short answer is no, it doesn't seem to exist.
As you can read in a Quora article, "it’s hard to tell where the version control system ends, and where some of the other parts of the development toolchain begin".
So, first, you need to be clear in what "features" you are interested in since you can be interested in a feature that is not Piper's responsibility.
Also, keep in mind that your server disk space and OS would limit the file count/size before the chosen VCS.
If you need a centralized VCS and billions of files, you could go with SVN or OpenCVS.
If you need a distributed one with thousands of developers and millions of commits, take a look at Git, Bazaar or Mercurial (Bitbucket is a hosting service for such repositories rather than a VCS itself).
But do you really have all those requirements?
AFAIK there's no open source equivalent of Piper on the market.
In order to better understand Centralized and Distributed VCS, take a look at this Comparison between Centralized and Distributed Version Control Systems
Also, take a look at what is Google's repository like?
Two recent developments bring Piper-like features to Git: VFS for Git and sparse-checkout.
The first: Microsoft recently open-sourced VFS for Git which feels like it brings some of Piper's monorepo features to Git.
VFS for Git virtualizes the filesystem beneath your Git repository so that Git tools see what appears to be a normal repository when, in fact, the files are not actually present on disk. VFS for Git only downloads files as they are needed.
VFS for Git also manages Git's internal state so that it only considers the files you have accessed, instead of having to examine every file in the repository. This ensures that operations like status and checkout are as fast as possible.
This is used by Microsoft for >4000 developers in a >300GB repo with >2 million commits in their Windows Git repository.
The second: the sparse-checkout command in Git v2.25.0 allows you to check out just a subset of your monorepo. This should speed up commands like git pull and git status. See this blog post for more info. Unfortunately, you have to manually specify which subdirectories you want to check out with Git sparse-checkout, whereas Piper handles this transparently for developers.
Google has built more than one version control tool. Piper is specialized for the needs of the Google monorepo.
When Google built Android, it built Gerrit and Repo to handle version control. Repo is used to work with many git repositories at once, each of which may have its own maintainers and release cycles. Open source dependencies don't lend themselves to a monorepo, without the control of a single organization enforcing things such as a global build status or global refactoring. Also, the requirements of Piper simply don't apply in most places, such as performance of commits keeping up with requests.
Repo Docs
Gerrit Home
There is no open-source equivalent to Piper.
Note that Piper is old and has an old-fashioned API dating from the Perforce era. I guess you would want a more modern workflow, similar to what modern DVCSs offer.
I'm pretty sure your codebase isn't as large as Google's 86TB repository. Do you really need the same thing?
I'm pretty sure you could use a monorepo based on Git or Mercurial, and maybe evolve to a virtual file system such as VFS for Git if you ever need it.
Meta is open sourcing Sapling, which is based on our internal source control system but also has an added layer that allows it to be backed by a regular GitHub repository if you want the semantics without the scalability.
I've never worked at Google and don't know how different it is to their mono-repo setup, but I've been at Meta for six years and now that this is open source I have immediately transitioned to Sapling in all of my personal projects.

How to set up GIT as version control tool for a small team

We are using Eclipse with a SVN client plug-in. This client needs a server running; what about Git? We need to work in a LAN environment without internet access. I have read some basic tutorials about using Git with Eclipse. If I got a Java project in my Git repository, how can I share it with my teammate?
Even though you can share your local repositories, I would suggest setting up a server. There are many free alternatives, such as:
gitlab (http://gitlab.org)
gitorious (http://gitorious.org)
gitolite (https://github.com/sitaramc/gitolite)
gitblit (http://gitblit.com/)
But IMO the best one is Atlassian Stash, which for a small team will cost you only $10.
If you need to share it, each of you needs some way to access it. Bitbucket is great for small teams who need private code.
If you are always using it from inside a LAN, one of you should set up a shared location that you can all push your git changes to (a shared folder or shared drive is good enough), but I would recommend using GitHub / Bitbucket if possible.
From a command line (you can probably do this within Eclipse too):
git clone file:////192.168.1.100/code
and then you can push to and pull from 192.168.1.100/code, assuming you have write permissions there.
If you're coming from Subversion to Git, you will be faced with the concept of a local repository vs. a shared repository. You will be able to have a local repository on your computer where you can do as many commits as you want and then only push relevant changes to the shared repository (the one that your teammates will be able to see).
Here's a useful link on the possibilities to share a repository: http://www.jedi.be/blog/2009/05/06/8-ways-to-share-your-git-repository/ (ignore the last one, GitHub, which will require internet access).
In your particular situation I would recommend sharing via SSH or via the Git daemon.
I also really recommend that you take a look at Eric Sink's book here. He's even offering hardcopies for free!
As suggested, you can run your own instance of gitolite or gitlab, but for a rudimentary solution I suggest you just check the following answer:
https://serverfault.com/a/113688/181010
Basically, you can use any folder as a shared repository as long as all users can access the files, either locally or via ssh. That link describes how to tell git to create its files with rights that are appropriate for use by all users of one unix group (instead of only the single user owning the files).
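A minimal sketch of that idea in Python, assuming the setting in question is the `--shared=group` option of `git init` (that is my assumption about what the linked answer relies on; the function names and the example path are hypothetical):

```python
import subprocess
from typing import List


def shared_repo_command(path: str) -> List[str]:
    """Build the git command that creates a bare, group-writable repository."""
    # --bare: no working tree, so the repository is suitable as a push target.
    # --shared=group: git creates its files writable by the owning unix group.
    return ["git", "init", "--bare", "--shared=group", path]


def create_shared_repo(path: str) -> None:
    """Run the command; requires git to be installed and `path` to be writable."""
    subprocess.run(shared_repo_command(path), check=True)
```

Calling `create_shared_repo("/srv/git/project.git")` on the machine that hosts the shared folder would create the repository; the directory still has to belong to the unix group that all teammates are members of.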

How do you use Git within Eclipse as it was intended?

I've recently been looking at using Git to eventually replace the CVS repository we have at work. However after watching Linus Torvalds' video on YouTube about Git it seems that every tutorial I find suggests using Git in the same way CVS is used except that you have a local repository which I agree is very useful for speed and distribution.
However the tutorials suggest that what you do is each clone the repository you want to develop on from a remote location and that when changes are made you commit locally building up a history to help with merge control. When you are ready to commit your changes you then push them to the remote location, but first you fetch changes to check for merge conflicts (just like CVS).
However, in Linus' video he describes the functionality of Git as a group of developers working on some code, pushing and fetching from each other as needed, not using a remote location, i.e. a centralized location. He also describes people pushing their changes out to verifiers who fetch and push code as well. So you can see it's possible to create a scalable structure within a company too.
My question is: can anybody point me in the direction of some tutorials that actually explain how to do this distributed development of code using Git, so that developers push and fetch code from each other without committing to the remote repository? If possible, it would be very nice to have these tutorials Eclipse-based.
Thanks in advance,
Alexei Blue.
I don't know any specific tutorial about this. In general, for connecting to a repository, you have to be running a git server that listens (and authenticates) to git requests.
To have a specific repository for each developer is possible - but each repository needs that server component. It is possible to store multiple repositories on the same computer, that allows reducing the number of servers required.
However, for other reasons it is beneficial to have some kind of central structure (e.g. a repository for stuff to be released; or a repository for stuff not verified yet). This structure is not required to be a single central repository, but multiple ones with well-defined workflows regarding the data move between repositories (e.g. if code from the verification repository is validated, it should be pushed to the release repository).
In other words, you should be ready to create Git servers (e.g. see http://tumblr.intranation.com/post/766290565/how-set-up-your-own-private-git-server-linux for details; but there are other tutorials for this as well), and define workflows for your own company to use it.
Additionally, I recommend looking at AlBlue's blog series called Git Tip of the Week.
Finally, to ease the introduction, I suggest first introducing Git as a direct replacement for CVS, and then presenting the other changes one by one.
Take a look at AlBlue's blog entry on Gerrit.
This shows a workflow different from the classic centralized server such as CVS or SVN. Superficially it looks similar as you pull the source from a central Git server, but you push your commits to the Gerrit server that can compile and test the code to make sure it works before eventually pushing the changes to the central Git server.
Now instead of pushing the changes to Gerrit, you could have pushed the code to your pair programming buddy and he could have manually reviewed and tested the code.
Or maybe you're going on holiday and a colleague will finish the task you've started. No problem, just push your changes to their Git repo.
Git doesn't treat any of these other Git instances any different from each other. From Git's perspective, none of them are special.

How should I work on a CVS hosted project to both (1) fix bugs and (2) maintain my own private fork with additional features

The question
An open source program uses CVS for version control. I would like to make a number of bug-fixes and submit patch bombs to the developers with commit access. I would also like to maintain my own semi-private fork that mainly tracks the main code-base but that includes my own features (these features, right now, should not be incorporated into the main code-base.)
I prefer to use mercurial for my own version control needs, but I am open to other version control systems if necessary.
I'd like to:
Be able to easily create patch-bombs against the current CVS source with my own bug-fixes
Keep track of history on my own features
Have fixes and improvements from the main tree easily incorporated in my new-feature fork
Easily apply my own bug-fixes to my new-feature fork
Be able to work and track change history without an Internet connection.
What suggestions do you have for doing this?
My current idea
My own best guess is below, to give you a better idea of what I am thinking about.
I will have 3 mercurial repositories.
The first two repos are managed as specified at (https://wiki.mozilla.org/Using_Mercurial_locally_with_CVS). One just mirrors the latest changes from the CVS upstream. I do "cvs update" then "hg commit" in this repo. The second repo holds my bug-fixes as patches using the mq extension, and I pull from the first repo and rebase my patches every so often. When my patches are incorporated into the main tree, I remove the patches from the patch queue/make them permanent commits.
The third repo is my local fork. It will start out as a clone of the first repo. Then each time I do an update of the first repo, I'll pull from it into repo 3. My own features will be directly present as commits in this repo. When I fix a bug, I'll export a patch from repo 2 and apply it to the appropriate pull from repo 1.
I have used Git to manage changes on top of a CVS repository in a similar way. My solution in Git uses local branches instead of multiple repositories, but it sounds essentially similar to your proposed idea.
I found that this arrangement works best if you commit all the CVS metadata (in the CVS/ subdirectories) to your mirrored repository. This means that the CVS metadata gets replicated in the other repositories, but it doesn't cause any harm (and lets you run commands like cvs diff if you need to).