How do DVCS (Github, BitBucket, etc...) ensure private project code integrity? - github

Sure, the companies claim no intellectual rights when you upload your code to their online repositories, but how is the privacy of the project ensured so that only the people with write/commit access to such repositories can actually view the data?
What happens if you decide to, let's say, move your project to a private server or another host? Will your project be "deleted" or only "removed" from the public index?
How can you be sure that the CEO of the company where you host your project will not be able to view your data?
Do these companies go through some sort of regular certification? Or is this whole deal based on trust and understanding?

Unless those providers explicitly mention offering encrypted repos (which Assembla alludes to here, but it could refer only to HTTPS encryption), you don't have a 100% guarantee.
The only way to add that level of security would be to use user-controlled end-to-end encryption, leveraging Git's smudge/clean filter driver:
See "Transparent Git Encryption":
User controlled end-to-end encryption solves the problem:
Before data is pushed to the remote repository to store, it is encrypted with an encryption key which is known only to the data owner itself. Management of the encryption key(s) and the encryption/decryption processes is always tedious and easy to get wrong.
In the following, we shall demonstrate how to use Git with encryption in a way transparent to the end user.
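As a rough illustration of the clean/smudge idea described above, here is a minimal Python sketch of a filter script. Everything named in it is an assumption made for illustration: the script name `toy_filter.py`, the `TOY_FILTER_KEY` environment variable, the `TOYENC1` header and, above all, the cipher itself, which is a toy keystream built from HMAC-SHA256 and not a vetted scheme. The referenced article's own method may well differ, and a real setup should delegate to an audited encryption tool rather than this code.

```python
import hashlib
import hmac
import os
import sys

MAGIC = b"TOYENC1\n"  # hypothetical header marking blobs this filter encrypted
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand key + nonce into `length` bytes using HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def clean(plaintext: bytes, key: bytes) -> bytes:
    """Working-tree content -> content that Git stores and pushes."""
    if plaintext.startswith(MAGIC):
        return plaintext  # already encrypted; keep the filter idempotent
    # The nonce is derived from the content, so the same file always yields
    # the same ciphertext and Git does not report spurious modifications.
    nonce = hmac.new(key, b"nonce" + plaintext, hashlib.sha256).digest()[:NONCE_LEN]
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return MAGIC + nonce + body


def smudge(stored: bytes, key: bytes) -> bytes:
    """Stored content -> working-tree content on checkout."""
    if not stored.startswith(MAGIC):
        return stored  # not produced by this filter; pass through unchanged
    nonce = stored[len(MAGIC):len(MAGIC) + NONCE_LEN]
    body = stored[len(MAGIC) + NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def main(argv):
    """Filter entry point: reads stdin, writes stdout, as Git filters do."""
    key = os.environ["TOY_FILTER_KEY"].encode()  # assumed key source
    data = sys.stdin.buffer.read()
    transform = clean if argv[1] == "clean" else smudge
    sys.stdout.buffer.write(transform(data, key))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Git would be pointed at such a script through its `filter.<name>.clean` and `filter.<name>.smudge` configuration entries (for example `git config filter.toycrypt.clean "python3 toy_filter.py clean"`, with `toycrypt` being a made-up driver name) plus a `.gitattributes` line such as `*.c filter=toycrypt`. The hosting site then only ever sees the output of `clean`.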

As VonC says, they don't ensure anything more than promising you that only admins have access to your data.
Some hosting sites may talk about how they encrypt data on disk. That makes sense if we're talking about a laptop that might physically end up in the wrong hands, but it makes less sense for a disk sitting in a data center. The problem is that the services that run on the machine must have access to the unencrypted data, so the volume will typically be mounted while the service is running. At that point the encryption no longer protects the data, and you're back to normal operating system access control.
If you really want, you can of course run all data through the decode/encode filters for Mercurial or use the equivalent filters for Git. That means that you save encrypted data at the hosting site, but you lose most of the advantages of sites like GitHub or Bitbucket. You can no longer
browse the code online in a meaningful way
review pull requests
offer tarball downloads
etc.
So I wouldn't recommend such an approach. If your data is so sensitive that you cannot host it online, then you should set up your own internal server. For that I can recommend Kallithea, which supports both Git and Mercurial.

As far as I know, the whole deal is based on trust, understanding, and their desire for you not to sue them to death.

Related

Security Concerns with Private Repos in GitHub

I have signed up my organization to a GitHub Teams free plan, and we are considering pushing our code to private repositories on GitHub. Our projects consist of decades old legacy code and there are lots of hard-coded credentials (not only in the code, but also in comments) for various servers and databases.
I do not want to make my team change all this code to store credentials in config files, and I am not 100% sure our various tech stacks support this. It would also be very time consuming, and there is no guarantee we can find every single reference to credentials. I'm just wondering if it is safe to push the code with all these credentials, given that the repositories we create are not public?
Storing your code on GitHub is no less secure than storing it anywhere else. For example, GitHub generally takes significant effort to secure repositories, and staff are not permitted to look at the contents of private repositories without the consent of the repository owner. Pushing this code to GitHub will not intrinsically expose it any more than storing it on any other server.
However, having said that, storing credentials in your repository is a security problem regardless of where you host that code. It is easy for a repository to accidentally leak for many reasons, due to server misconfiguration, laptop theft, or various other situations. You would be well served to put at least a modicum of effort into using a more secure practice for storing credentials, if for no other reason than that you will have them stored in a single, secure place where you can find them all. For example, rotating credentials is much easier when they all live in a tool like Vault and you can easily rotate a compromised credential across all systems.
So, in general, what you are doing is not very secure, but using or not using GitHub will not change that.
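To make the "single, secure place" idea from this answer concrete, here is a minimal Python sketch that looks credentials up by name instead of hard-coding them. It is only an illustration under stated assumptions: the environment is used as a stand-in for a real secrets manager, and `get_credential`, `MissingCredentialError` and `DB_PASSWORD` are hypothetical names, not part of Vault's or any other tool's API.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a named credential has not been provisioned."""


def get_credential(name: str, env=os.environ) -> str:
    """Look a credential up by name instead of embedding it in source code.

    `env` defaults to the process environment; a deployment could pass in
    values fetched from a secrets manager instead (assumption).
    """
    value = env.get(name)
    if not value:
        raise MissingCredentialError(f"credential {name!r} is not configured")
    return value
```

Legacy code that currently contains a literal password would call something like `get_credential("DB_PASSWORD")` instead, so rotating the password means changing one stored value rather than hunting through source files and comments.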

How to hide some code part and run full in machine

I want to know whether the following can be achieved with GitHub Actions or another GitHub feature:
I have a repository with hundreds of files, and I want to share only a few of them with my developers/team. (They should only be able to see the few files I shared.)
The program can only run successfully if a developer has all the files (both hidden and unhidden).
So, is there any way to hide most of my code from the team, so that whenever they pull the repository, the hidden files are downloaded in encrypted form, the unhidden files remain accessible to the developer, and they can still run the whole repository successfully?
If it's not possible with GitHub, is there an alternative tool with which I can achieve this?
Thanks
Git does not provide access control to only parts of repositories, and it's not intended to. From gitnamespaces(7):
The fetch and push protocols are not designed to prevent one side from stealing data from the other repository that was not intended to be shared. If you have private data that you need to protect from a malicious peer, your best option is to store it in another repository. This applies to both clients and servers.
So if you want to give a user access to only a few files, they need to live in another repository with separate access control (that is not a fork of the original). That will involve a separate history.
If the user needs the other files in order to develop the software, then you'll just have to give them access, or you'll have to provide pre-built binary assets they can download to build against.
In general, it's not practical to not trust your developers with full code access. Usually one protects this access with legal means, like non-disclosure agreements, not technical measures, since technical measures are usually easily bypassed.

What mechanisms do different VCS's have for restricting access to files?

I have asked this question before, but that was within the context of Subversion only. As there is a possibility that we'll move to a different VCS, I'll ask again with a broader scope.
We're dealing with a repository that contains files that are subject to ITAR. Several teams will have access to the repository, but some of them are not allowed to even see ITAR sensitive data, so we're talking about read access here, not just commit access.
What we'd like to have, is access control where we can restrict access
Right away (on commit)
Post hoc (marking an already committed file as sensitive, if possible)
Version based (if possible)
A scenario could be:
Version 148 is not sensitive and is accessible to everybody
Version 149 is sensitive and should be inaccessible to those without clearance, right after commit.
Version 150 is not sensitive anymore and is again accessible to everybody.
Is there any VCS (preferably a DVCS) that provides these options?
Additional information: we're doing Scrum. There are four teams, doing their sprints out of sync with each other. There has been talk of synchronizing our sprints.
There is some overlap in the code the teams handle, but not much.
We want to move to continuous integration some time in the future, but we have a long way to go.
I don't think you'll ever find a DVCS that does what you want -- each copy of the repo needs to have all the code (and history). I suppose in theory, it's possible (encrypt each file, then manage the keys, etc.). But the management overhead would probably be equivalent to saying "Before you check in an ITAR file, encrypt it with a key only people in the US know."
Your best bet is to go modular and put all ITAR code in one repo. (Often, there will be a 'stub' implementation in a regular repo that uses null encryption or whatever.) Then you can do absolute read access control for that repo.
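The stub arrangement mentioned above could look roughly like the following Python sketch. The names are all hypothetical: `restricted_crypto` stands for a package built from the access-controlled repository, `RealCipher` for its implementation, and `NullCipher` for the stub that ships in the regular repository. It is one possible layout, not a description of any particular project.

```python
import importlib
from typing import Protocol


class Cipher(Protocol):
    """Interface that both the stub and the restricted implementation satisfy."""

    def encrypt(self, data: bytes) -> bytes: ...


class NullCipher:
    """Stub kept in the unrestricted repository: performs no encryption."""

    def encrypt(self, data: bytes) -> bytes:
        return data


def load_cipher(module_name: str = "restricted_crypto") -> Cipher:
    """Use the restricted implementation when installed, else the stub.

    `module_name` names a hypothetical package that only cleared teams can
    check out and install from the separate, access-controlled repository.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return NullCipher()
    return module.RealCipher()
```

Teams without clearance never see the restricted repository, yet their builds and tests still run against the stub.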
You can also apply for an ITAR exemption. Sometimes having the software on a computer that blocks IPs from bad countries is enough. It's a long shot, but it would simplify your life.
Possibly relevant: in the 1990s, Sun Microsystems realized that ITAR only hampers US programmers, so they developed encryption software exclusively outside of the US as a simple way to "get around" ITAR.

Version Control from a different age

At my work I'm on a separate network to my colleague due to clearance reasons, and we both need to share code. I am wondering what the best versioning system would be? There's got to be something better than having project1.zip, project2.zip , etc - but something not as expansive as git or hg.
I would still recommend Git, as it allows you to:
make a bundle (only one file, and it can be an incremental bundle)
mail that bundle to your colleague (meaning it will work even if your separate networks have no other way to communicate)
The idea is to exchange one file (from which you can pull any new history bundled in it).
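The bundle exchange described above might be scripted along these lines. This is a sketch, not a vetted workflow: the helper names, the file name `project.bundle`, the branch `main` and the marker `last-sent` are assumptions, while `git bundle create`, `git bundle verify` and pulling from a bundle file are standard Git commands.

```python
import subprocess
from typing import List, Optional


def bundle_create_cmd(bundle_path: str, branch: str,
                      since: Optional[str] = None) -> List[str]:
    """Command to write `branch` into a single bundle file.

    With `since` (a tag or commit the receiver already has), only the
    history after that point is bundled, giving an incremental bundle.
    """
    rev = f"{since}..{branch}" if since else branch
    return ["git", "bundle", "create", bundle_path, rev]


def bundle_verify_cmd(bundle_path: str) -> List[str]:
    """Command the receiver runs to check the bundle applies to their repo."""
    return ["git", "bundle", "verify", bundle_path]


def bundle_pull_cmd(bundle_path: str, branch: str) -> List[str]:
    """Command the receiver runs to pull `branch` out of the bundle file."""
    return ["git", "pull", bundle_path, branch]


def run(cmd: List[str], repo_dir: str) -> None:
    """Execute a command inside a repository, raising if Git reports failure."""
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

The sender would run something like `run(bundle_create_cmd("project.bundle", "main", since="last-sent"), repo_dir)`, transfer the file through whatever channel is approved, and the receiver would run the verify and pull commands in their own clone.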
And Git is very cheap for creating and adding a repo when an existing code base is already there.
That being said, any communication procedure will have to be approved by your employer: don't bypass any security measure ;)

Should I use a software hosting solution for my personal projects?

Right now, I keep all of my projects on my laptop. I'm thinking that I shouldn't do this, but instead use a version control system and check them in/out from an external hosting repository (Google Code, SourceForge, etc). I see several benefits here - first, I don't have to worry about losing my code if my computer crashes and burns or my external HDD crashes and burns; second, I can share my code with the world and perhaps even get more help when I need it.
Is this a good idea? If so, what are some other project hosts that I should investigate (other than Google Code and SourceForge)?
Assembla is awesome.
EDIT: Yes, this is a good idea - I used to use a personal copy of Vault and found it was more than I cared to manage (in case my server went down or hard drive crashed - not only was it painful to worry about losing and backing up data, but the downtime). Of course, it doesn't hurt to have your own backup as well. Cover all your bases!
After losing some freelance work to a hard drive crash, I've become keen on the philosophy that "It doesn't exist until it's in source control". As I don't necessarily want to share the source for my projects with the rest of the world, I pay for web hosting (using Dreamhost, who have great deals on basic shared hosting and easy one-click installs for things like Subversion) and store my data that way. They don't claim to be any sort of backup service, but all I really want is a second copy offsite somewhere.
If I do decide to share the code I can always make it public later. Do note that SourceForge does not allow private/personal projects, and Google Code forces you to license your code using an open source license. Both have some limitations on the number of projects you can create (and aren't really intended to store everybody and their brother's personal projects).
Assembla looks pretty slick although it is hard to tell what all you get for free. I'm definitely going to try it out.
There is an extensive list at Wikipedia.
GitHub is a really great option for git.
Most of the free, public hosting sites will insist that you license your code with an OSS license (and, possibly, your documentation). That's potentially a different thing from what you're talking about (backups).
For just backups, you may want to try a for-pay service or even something like Mozy.
I use Assembla - You can share your code if you want, but you are not required to. That's a big plus to me.
Online backup is cheap and easy. Why would you not?
I host most of my non-code backups on Amazon's S3 service.
Code goes on a Slicehost virtual server that has automated snapshot backups (daily as well as weekly) and runs Subversion and the Trac web interface to it.
GitHub is a really great hosting service if you use Git; and of course everyone should use Git. The default is free public project hosting, but if your stuff is proprietary (or perhaps embarrassing) you can get private hosting from them for a monthly fee.
If you want to make your projects public in some form, then a hosting solution may be useful for you.
I made a listing of project hosting sites at this question. Of that list, only Origo also allows you to host a closed-source project. As long as you want to open up your source, you can choose any site on the list.
For my personal projects I use a git repository on a local Fedora Server (that is backed up daily). I .tgz the repository and mysqldb (for bugzilla) and back it up on Carbonite AND a local, redundant hard drive.
I can clone the git repository from any of my other machines into all other environments.
With this you have a backup and version control. I think my system is better than the one I have at work, LOL.
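The ".tgz the repository" step from this answer could be automated with a few lines of Python. This is only a sketch of the archiving part: the function name `archive_repo` and the timestamped file naming are assumptions, and the database dump and the upload to an offsite service are left out.

```python
import tarfile
import time
from pathlib import Path


def archive_repo(repo_dir: Path, backup_dir: Path) -> Path:
    """Write a timestamped .tgz of `repo_dir` (including .git) into `backup_dir`.

    `backup_dir` should live outside `repo_dir`, otherwise the archive
    would try to include itself.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{repo_dir.name}-{stamp}.tgz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the repo folder
        tar.add(repo_dir, arcname=repo_dir.name)
    return target
```

A daily scheduled job could call `archive_repo(Path("/srv/git/myproject"), Path("/backups"))` (both paths are placeholders) and then hand the resulting file to whatever offsite backup is in use.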
As long as you want to publish your personal projects as open source, you have a lot of possibilities to choose from, because there are lots of hosters that provide this.
If you just want to store your code somewhere online, but not share it with the world:
Some hosters also allow private repositories, but the only free one that I know of is Bitbucket (which I use myself for my private and open source projects).
They allow an unlimited number of public and private Mercurial and Git repositories; the only limitation is that no more than five users can access your private repositories (you can have more, but then it's not free anymore).