Security Concerns with Private Repos in GitHub

I have signed my organization up for the GitHub Teams free plan, and we are considering pushing our code to private repositories on GitHub. Our projects consist of decades-old legacy code, and there are lots of hard-coded credentials (not only in the code, but also in comments) for various servers and databases.
I do not want to make my team change all this code to store credentials in config files, and I am not 100% sure our various tech stacks support this. It would also be very time-consuming, and there is no guarantee we could find every single reference to a credential. Is it safe to push the code with all these credentials, given that the repositories we create will be private rather than public?

Storing your code on GitHub is no less secure than storing it anywhere else. GitHub generally puts significant effort into securing repositories, and staff are not permitted to look at the contents of private repositories without the consent of the repository owner. Pushing this code to GitHub will not intrinsically expose it any more than storing it on any other server.
That said, storing credentials in your repository is a security problem regardless of where you host the code. A repository can leak accidentally in many ways: server misconfiguration, laptop theft, or any number of other situations. You would be well served to put at least a modicum of effort into a more secure practice for storing credentials, if for no other reason than that you will have them in a single, secure place where you can find them all. For example, when credentials all live in a tool like Vault, you can rotate a compromised credential across all systems far more easily.
So, in general, what you are doing is not very secure, but using or not using GitHub will not change that.
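Since the question worries about not being able to find every credential, a rough first inventory can be automated. The following is a minimal sketch, not a complete secret scanner: the regular expressions and the names `CREDENTIAL_PATTERNS`, `scan_text`, and `scan_tree` are illustrative assumptions, and a legacy code base will almost certainly need additional patterns.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration; extend them for your own code base.
CREDENTIAL_PATTERNS = [
    # keyword = value or keyword: value, in code or in comments
    re.compile(r"(?i)\b(password|passwd|pwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
    # user:password@host embedded in a connection URL
    re.compile(r"(?i)\b[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@"),
]


def scan_text(text):
    """Return (line_number, line) pairs that match any credential pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CREDENTIAL_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root):
    """Scan every readable file under root, skipping anything inside .git."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `scan_tree(".")` from the top of a working copy returns a dictionary that maps each file path to its suspicious lines. The result is a starting list for review and rotation; it will contain false positives and can miss credentials that match none of the patterns.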

Related

How not to expose the whole codebase to a remote developer?

For a small startup, I have hired some remote developers. However, I want to reveal only the necessary code to each developer, not the entire source code.
Is this kind of feature offered by GitHub? If not, please suggest a workaround.
Many thanks
With git repositories in GitHub there is no way to prevent a developer from cloning the whole repository and GitHub can't filter the contents of the repository to leave out part of the data. Permissions in GitHub can only prevent access to a repository, make the whole repo read-only or grant write access to the repository.
If you really want to limit access, you'll need to split your solution into multiple pieces, each in their own git repository. You can then set permissions for each repository in GitHub.
As a developer myself, I caution you against this. A developer with only part of the sources would have a hard time verifying that their changes work the way you intend, and it might be much harder for them to debug any issues that come up during development.

How to hide part of the code but still run the full program on a developer's machine

I want to know whether the following can be achieved with GitHub Actions or another GitHub feature:
I have a repository with hundreds of files, and I want to share only a few of them with my developers/team. (They should be able to see only the few files I share.)
The program can only run successfully if a developer has all the files (both hidden and unhidden).
So, is there any way to hide the rest of my code from the team, so that whenever they pull the repository, the hidden files are downloaded in encrypted form, while the unhidden files remain accessible to the developers and they can still run the whole repository successfully?
If it's not possible with GitHub, is there an alternative tool with which I can achieve this?
Thanks
Git does not provide access control to only parts of repositories, and it's not intended to. From gitnamespaces(7):
The fetch and push protocols are not designed to prevent one side from stealing data from the other repository that was not intended to be shared. If you have private data that you need to protect from a malicious peer, your best option is to store it in another repository. This applies to both clients and servers.
So if you want to give a user access to only a few files, they need to live in another repository with separate access control (that is not a fork of the original). That will involve a separate history.
If the user needs the other files in order to develop the software, then you'll just have to give them access, or you'll have to provide pre-built binary assets they can download to build against.
In general, it's not practical to not trust your developers with full code access. Usually one protects this access with legal means, like non-disclosure agreements, not technical measures, since technical measures are usually easily bypassed.

Does GitHub limit the number of personal access tokens per user?

Using GitHub Enterprise, I have a service/bot account for which I'd like to generate a number of personal access tokens and provide them to a number of teams.
Is there any limit on how many personal access tokens can be generated per user?
As far as I'm aware, there is no limit, but if you want to be sure, you should ask either the GitHub support team or on the GitHub community forums.
GitHub itself has such a bot account and PATs are frequently used there, but do be aware that the UI may be a little (or, depending on how many tokens you issue, very) slow, since it isn't designed for people to have huge numbers of PATs.
You may find it more desirable to use deploy keys if you're accessing a repo, since these have a smaller scope (one repository) and won't have the UI problems mentioned above, but of course that won't work for the API.

GitHub limit for SSH deploy keys

Is there a limit on the number of GitHub SSH deploy keys? Let's say I need 2000 or even 4000 deploy keys added to a single Git repository. Is that possible, or will I hit a limit at some point?
The reason is that we would have 4000 devices that need to be provisioned, and we want to control which devices can access the repository and, if necessary, disable a device's access. Another option is access tokens, but as far as I understand they are linked to the account, not the repository.
https://help.github.com/articles/git-automation-with-oauth-tokens/
And that would also mean that we would need to manage the permissions separately to which repository they have access to.
First of all, why would you need up to 4000 deploy keys? That is a pretty large number, and I think you should explain why you need so many deploy keys for a single repository.
However, I contacted GitHub support after I couldn't find anything about this in the GitHub documentation, and got the following response:
I don't believe we have a fixed limit on SSH keys or deploy keys, although as the settings pages weren't designed with this sort of usage in mind, I think it would be rather difficult to manage. When someone needs to control access to such a large number of machines, we'd usually recommend creating personal access tokens instead, as these can be automated and will provide similar access. If the huge number of keys was necessary and causing problems, we'd do our best to help.
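If the settings pages are the main obstacle, deploy keys can also be registered without the UI, through the "create a deploy key" endpoint of GitHub's REST API (`POST /repos/{owner}/{repo}/keys`). The sketch below only builds the request and does not send it. The helper name `build_deploy_key_request` and the organization, repository, and device names in the usage note are hypothetical, and the token is assumed to have administrative access to the repository.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_deploy_key_request(owner, repo, title, public_key, token, read_only=True):
    """Build (but do not send) the request that registers one deploy key."""
    body = json.dumps({"title": title, "key": public_key, "read_only": read_only})
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{repo}/keys",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A provisioning script could call this once per device, for example `build_deploy_key_request("example-org", "firmware", "device-0001", public_key, token)`, and pass the result to `urllib.request.urlopen`. The response includes the key's numeric id, and sending `DELETE /repos/{owner}/{repo}/keys/{key_id}` later would revoke that one device's access. Whether several thousand keys on one repository behaves well in practice is something to confirm with GitHub support, as the reply above suggests.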

How do DVCS (GitHub, Bitbucket, etc.) ensure private project code integrity?

Sure, the companies claim no intellectual rights when you upload your code to their online repositories, but how is the privacy of the project ensured so that only the people with write/commit access to such repositories can actually view the data?
What happens if you decide to, let's say, move your project to a private server or another host? Will your project be "deleted" or only "removed" from the public index?
How can you be sure that the CEO of the company where you host your project will not be able to view your data?
Do these companies go through some sort of regular certification? Or is this whole deal based on trust and understanding?
Unless those providers explicitly mention offering encrypted repos (which Assembla alludes to, although that may refer only to HTTPS encryption), you don't have a 100% guarantee.
The only way to add that level of security would be to use user-controlled end-to-end encryption, leveraging Git's smudge/clean filter driver:
See "Transparent Git Encryption":
User controlled end-to-end encryption solves the problem:
Before data is pushed to the remote repository to store, it is encrypted with an encryption key which is known only to the data owner itself. Management of the encryption key(s) and the encryption/decryption processes is always tedious and easy to get wrong.
In the following, we shall demonstrate how to use Git with encryption in a way transparent to the end user.
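To make the filter idea concrete, here is a minimal sketch of the two transformations such a filter pair would perform. It is an illustration only and not the method from the quoted article: the cipher is a toy construction built from HMAC-SHA256 in the standard library, and the names `clean`, `smudge`, `_subkey`, and `_keystream` are assumptions. A real setup should rely on a vetted encryption tool. The point it demonstrates is that the clean side must be deterministic, because Git reruns the clean filter when comparing the working tree with the index, and a random nonce would make every file look modified.

```python
import hashlib
import hmac

NONCE_LEN = 16


def _subkey(key, label):
    """Derive an independent subkey so the nonce and keystream keys differ."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key, nonce, length):
    """Produce `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def clean(plaintext, key):
    """Working tree -> repository: emit nonce + ciphertext.

    The nonce is derived from the content itself, so identical input
    always yields identical output.
    """
    nonce = hmac.new(_subkey(key, b"nonce"), plaintext, hashlib.sha256).digest()[:NONCE_LEN]
    stream = _keystream(_subkey(key, b"stream"), nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def smudge(blob, key):
    """Repository -> working tree: split off the nonce and undo the XOR."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(_subkey(key, b"stream"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

To wire something like this into Git, each function would be wrapped in a small program that reads standard input and writes standard output, registered under the `filter.<name>.clean` and `filter.<name>.smudge` configuration keys, and assigned to paths with a `filter=<name>` attribute in `.gitattributes`. The key itself stays on the developers' machines and is never pushed.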
As VonC says, they don't ensure anything more than promising you that only admins have access to your data.
Some hosting sites may talk about how they encrypt data on disk. That makes sense if we're talking about a laptop that might physically end up in the wrong hands, but it makes less sense for a disk sitting in a data center. The problem is that the services running on the machine must have access to the unencrypted data, so the volume will typically be mounted whenever the service is running. At that point the encryption no longer protects the data, and you're back to normal operating system access control.
If you really want, you can of course run all data through the decode/encode filters for Mercurial or use the equivalent filters for Git. That means that you save encrypted data at the hosting site, but you lose most of the advantages of sites like GitHub or Bitbucket: you can no longer browse the code online in a meaningful way, review pull requests, offer tarball downloads, and so on.
So I wouldn't recommend such an approach. If your data is so sensitive that you cannot host it online, then you should set up your own internal server. There I can recommend Kallithea, which supports both Git and Mercurial.
As far as I know, the whole deal is based on trust, understanding, and their desire for you not to sue them to death.