I am finally moving to GitHub for source control. We can only use a public repo for a project we're doing, but how public is public? Is it safe to assume that if I do not publicize the project at all, no one will really find it among the 3 million repos they already have?
I cannot really have people seeing the source code right now, but $7/month is a little steep for needing just 1 private repo.
No. That's not a safe assumption. GitHub has a search engine for public repos, and people use it (myself included). So there's always a decent possibility that someone will see your source code. If you want a free private repo, I suggest using BitBucket instead, or another service that offers free private repos. Note that BitBucket is only free if you have 5 or fewer users working in your repo.
When asking yourself such questions, you should consider that a lot of website visitors are other computers, not humans. Googlebot indexes every page it can find, ohloh.net creates statistics on open source projects on GitHub, and so on and so forth. So if something is on the Internet, people will find it. :)
If you are in an organisation, there may be GitHub repositories that are private (i.e. you don't have access to them), but it would be useful to know that they exist, and then you could arrange access where appropriate.
In other words, we are trying to enable discoverability in a way that can lead to access. This could be done by sharing readmes (noting that people need to have some discipline to write sensible readmes).
The blog post "Solving the innersource discoverability problem" looks like a potential solution, but it may require that the user has access to see all the repos in the portal? I'd like the user to be able to view readmes for all repos. If they don't have access, they can contact whoever is listed on the readme.
I see another option for making a file public from a private repo (using gitexporter to create a public repo with only the readme, example here). This makes it public, which is not my first preference, and would require every repo to do some work, which is far from ideal. While it doesn't give a neat portal, it should allow GitHub search functionality to find it by topic or keyword?
A related, perhaps simpler option is proposed here, where a student shares a readme from a private repo as a public GitHub page. Again, it requires a little work from every repo and gives no neat portal, but can be found with GitHub search? While public GitHub Pages can be made private, would they then only be visible to those with repo access?
So, if I'm summarising basic requirements:
All org repos (public, private or team) have a readme that can be accessed by search by someone in the org (preferably not requiring each individual to modify their repo).
Additional nice to have features:
All readmes can be viewed in a portal with search
Bonus for being able to make super private (only collaborators can see readme - flag in readme?), org private (only people in the org - default) and public (flag in readme?).
Simple to implement!
Suggestions?
I think you have already provided a suitable solution within your question. Alternatively, you can use the APIs (GET repos, GET README of a repo) to fetch each repository's README, save it to a database or JSON file on a cron schedule, and create a web interface based on that data.
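As a rough illustration of the API approach, here is a minimal Python sketch. It assumes the GitHub REST endpoints GET /orgs/{org}/repos and GET /repos/{owner}/{repo}/readme and a token that can read the org's repos. The function names, the `fetch` parameter, and the output file name are made up for this example, and error handling is deliberately thin.

```python
import base64
import json
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def repos_url(org, page=1):
    # Lists an organisation's repositories, 100 per page.
    return f"{API_ROOT}/orgs/{org}/repos?per_page=100&page={page}"


def readme_url(owner, repo):
    # Returns metadata plus the base64-encoded body of the preferred README.
    return f"{API_ROOT}/repos/{owner}/{repo}/readme"


def decode_readme(payload):
    # The README body arrives base64-encoded in the "content" field.
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")


def fetch_json(url, token):
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_readmes(org, token, fetch=fetch_json):
    """Return {"org/repo": readme_text} for every repo that has a README."""
    readmes = {}
    page = 1
    while True:
        repos = fetch(repos_url(org, page), token)
        if not repos:  # an empty page means we have seen everything
            break
        for repo in repos:
            try:
                payload = fetch(readme_url(org, repo["name"]), token)
            except urllib.error.HTTPError:
                continue  # e.g. 404 when the repo has no README
            readmes[repo["full_name"]] = decode_readme(payload)
        page += 1
    return readmes


def save_readmes(readmes, path="readmes.json"):
    # A cron job could call this, and the web interface could read the file.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(readmes, handle, indent=2)
```

A scheduled job would call `save_readmes(collect_readmes("my-org", token))`, where "my-org" and the token are placeholders. Passing `fetch` as a parameter is only there so the collection logic can be exercised without network access.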
But I'm going to elaborate on a few areas of improvement. The problem I see with this is the nature of the search. We aren't always looking for keywords. Sometimes we are trying to find a fuzzy match for our problem, especially in a larger organization with more than a couple of thousand repositories. In those cases, a search engine implementation will provide much better results. In my opinion, we should collect the READMEs and FAQs, put them into Elasticsearch, and expose a search API for queries. The collection of READMEs and FAQs should be part of the CI/CD pipeline, and pushing new versions to Artifactory must publish the metadata as well.
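To make the keyword-versus-fuzzy distinction concrete, here is a small Python sketch. It is not Elasticsearch: it uses the standard library's difflib as a stand-in, and the function names, the sample data, and the 0.8 similarity cutoff are assumptions chosen for illustration. A real search engine would add ranking, stemming, and indexing on top of this idea.

```python
import difflib
import re


def tokenize(text):
    # Lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def fuzzy_score(query, document, cutoff=0.8):
    """Fraction of query words with a close (not necessarily exact) match."""
    doc_tokens = sorted(set(tokenize(document)))
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    hits = sum(
        1 for token in query_tokens
        if difflib.get_close_matches(token, doc_tokens, n=1, cutoff=cutoff)
    )
    return hits / len(query_tokens)


def search(query, readmes, cutoff=0.8):
    """Return repo names ordered by score, dropping repos with no match."""
    scored = [(fuzzy_score(query, text, cutoff), name)
              for name, text in readmes.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


if __name__ == "__main__":
    readmes = {
        "org/auth": "Authentication service for single sign-on",
        "org/billing": "Invoice and payment processing",
    }
    # A misspelled query still finds the right repository.
    print(search("authentcation", readmes))
```

An exact keyword search would miss the misspelled query entirely, which is the gap the answer above is pointing at.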
This looks like a use case for internal repositories to me. You can find more about internal repositories here.
Whether you can use internal repositories or not highly depends on your company's policies.
Another thing to consider is that this will expose the whole of each repository, not just the README.
As someone who is just beginning to think about using private repositories, if I understand correctly, they basically let you make commits in private until you are ready to open-source your app/program to the world and then, once you do, your entire GitHub/Bitbucket commit history becomes visible to everyone (as if you had been developing out in the open the entire time).
Now what happens if someone open-sources something before you do and claims provenance in the field/area/app/etc.? Can you basically open-source your software in return (or contact the authors directly) and "counter-claim" provenance? Obviously, the open-source person wouldn't have known about your existence since you're developing in private mode, so whose "right-of-way" would it be in such a hypothetical situation?
I can clearly see the utility of private repos for potential forking by competitors who have many more resources than you do and can hypothetically out-code you to the finish line and/or refactor your code significantly (potentially without attribution), but beyond that I don't really see much of a direct benefit to software development in private repos. Can anyone clarify the above points for me? For the record, I have investigated related posts like: https://softwareengineering.stackexchange.com/questions/87577/whats-the-benefit-of-having-a-private-repository-for-personal-projects
A private repository is about visibility: whether the repo is visible only to you or to everyone.
It is not about content: you can store anything (not too big) in a Git repo (public or private), whether a project or just a collection of files. It is not limited to "software development". You can keep private simple text files representing notes you want to remember, for instance.
Typically, the three ways of claiming ownership of an open-source project, as described in "Ownership and Open Source" by Eric S. Raymond, have nothing to do with private/public repo.
One, the most obvious, is to found the project. When a project has had only one maintainer since its inception and the maintainer is still active, custom does not even permit a question as to who owns the project.
(See also "How do I navigate to the earliest commit in a Github repository?")
The second way is to have ownership of the project handed to you by the previous owner.
The third way to acquire ownership of a project is to observe that it needs work and the owner has disappeared or lost interest.
So this is more about communication, and less about repository management.
I am a student working with a GitHub account that has an .edu suffix. There are some repositories I would like to work on with my current account, which was assigned by my school. All I know is that my school account is some kind of GitHub Enterprise account. Basically, I would like to always work in only one account rather than switch between two. Is this an unusual request? I haven't seen anyone ask for it before.
So what I want to know is whether this is possible. I also feel like I am misunderstanding some fundamental principle of GitHub that I can't pin down.
I hope someone can point it out. Thanks.
So the problem I read from your question is that you want to access public repositories on GitHub. The answer is that you can do this with your student account. With the Enterprise license you should have even more features, like public repositories etc.
For the second part, here is some information about GitHub which might help you understand the first answer.
You can always access public repositories, clone them to your machine, modify them, and use all the features of Git locally. The only thing that is not possible is pushing to a public repository if you don't have write access to it.
If you want to push some modifications, you can fork the repository. The fork then lives under your name and you have write access to it, so you can push your changes there.
If you want your modifications in the original repository as well, you can open a pull request with the commits you pushed to your own repository. The owner of the original repository can then decide whether to accept it. Accepting the request merges your changes into the original repository.
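The clone, fork, and push steps described above can be sketched as a list of Git commands. The Python helper below only builds the commands, so nothing is executed. The URLs, branch name, working directory, and commit message are placeholders, and the helper's name is made up for this example. Creating the fork and opening the pull request happen on the GitHub website, so they are not part of the list.

```python
import shlex


def fork_workflow(fork_url, upstream_url, branch, workdir="project"):
    """Return the Git commands for contributing through a fork."""
    return [
        # Clone your own fork, where you have write access.
        ["git", "clone", fork_url, workdir],
        # Remember the original repository so you can fetch its updates later.
        ["git", "-C", workdir, "remote", "add", "upstream", upstream_url],
        # Do the work on a separate branch.
        ["git", "-C", workdir, "checkout", "-b", branch],
        # After editing files: stage and commit the changes.
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", "Describe the change"],
        # Push the branch to the fork, then open the pull request on GitHub.
        ["git", "-C", workdir, "push", "--set-upstream", "origin", branch],
    ]


if __name__ == "__main__":
    commands = fork_workflow(
        "https://github.com/you/project.git",       # placeholder fork URL
        "https://github.com/original/project.git",  # placeholder upstream URL
        "fix-typo",
    )
    for command in commands:
        print(shlex.join(command))
```

The printed commands can be run by hand in a shell once the placeholders are replaced with real values.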
I hope this was the information you were looking for.
If my entire project is stored in a public GitHub repo, what's to stop people from downloading it and publishing it on Google Play before I do?
I want to use GitHub to keep track of changes in a group Android project. I was told an active account would also benefit future employment, and I want to make the repo public to help other students. But what if I wanted to eventually publish it to Google Play? See the above question.
I'm new to software development.
If you find your code online somewhere without your permission or in violation of your license, you can submit a DMCA claim. Here is Google Play's takedown form and GitHub's DMCA guide.
If you want to keep your code in a private repository, try BitBucket. It offers private code repositories and is free for up to five users.
GitHub specifies the meaning of a public repository as follows:
Public Repositories can be viewed and cloned by anyone. Choose this if your repository:
is an open source project
should be easy for other members to fork and contribute back their modifications.
It would be a better option to choose a private repository (a paid one) to keep your code safe.
Yes, you have the Transfer ownership option, so you can transfer your code to your future employer using GitHub. Hope it helps. Let me know if you need any further information.
Transfer this repo to another user or to an organization where you have admin rights.
If my entire project is stored in a public GitHub repo, what's to stop people from downloading it and publishing it on Google Play before I do?
In theory, nothing.
In practice, few people are going to run across your repository, unless you promote it (e.g., publish links to it). There are many repositories in GitHub.
Any open source project (such as a public GitHub repo) needs to declare a software license. On GitHub, failing to do so does not imply any default license. Default copyright applies, so others may view and fork the repo under GitHub's terms of service but have no permission to use, modify, or redistribute the code.
GPL licenses are copyleft (often called "viral"), meaning any distributed work that incorporates GPL code must itself be released under the GPL. Anyone who distributes a modified version must make the corresponding source available under the same license, although not necessarily by publishing it back to the original repo. This seems more suited to your repo. That would include pull requests required to license an app clone, which you could block.
Google will pull any app that clearly violates a license, and right quickly too, since they have some liability to do so. Also, developers have been known to globally broadcast their displeasure when violated.
The boilerplate license types are fairly clear and bullet-proof, as long as you don't need to change them - for that you should seek legal counsel.
Most corporations stay away from anything not clearly licensed as MIT or weaker. You can get fired for pulling GPL code into (compromising the ownership of) a proprietary codebase. GPL is for educational and partnership arrangements.
So there is obviously a clear trade-off between repo license strength and number of active contributors. An unconditional license can attract contributors, a conditional license can orphan the repo. Business is about relationships.
GitHub has created a nice website to help people decide on a license: https://choosealicense.com/
The solution is to use a private repo. You can set anyone as a contributor with write access.
Right now, I keep all of my projects on my laptop. I'm thinking that I shouldn't do this, but instead use a version control system and check them in/out from an external hosting repository (Google Code, SourceForge, etc). I see several benefits here - first, I don't have to worry about losing my code if my computer crashes and burns or my external HDD crashes and burns; second, I can share my code with the world and perhaps even get more help when I need it.
Is this a good idea? If so, what are some other project hosts that I should investigate (other than Google Code and SourceForge)?
Assembla is awesome.
EDIT: Yes, this is a good idea - I used to use a personal copy of Vault and found it was more than I cared to manage (in case my server went down or hard drive crashed - not only was it painful to worry about losing and backing up data, but the downtime). Of course, it doesn't hurt to have your own backup as well. Cover all your bases!
After losing some freelance work to a hard drive crash, I've become keen on the philosophy that "It doesn't exist until it's in source control". As I don't necessarily want to share the source for my projects with the rest of the world, I pay for web hosting (using Dreamhost, who have great deals on basic shared hosting and easy one-click installs for things like Subversion) and store my data that way. They don't claim to be any sort of backup service, but all I really want is a second copy offsite somewhere.
If I do decide to share the code I can always make it public later. Do note that SourceForge does not allow private/personal projects, and Google Code forces you to license your code using an open source license. Both have some limitations on the number of projects you can create (and aren't really intended to store everybody and their brother's personal projects).
Assembla looks pretty slick although it is hard to tell what all you get for free. I'm definitely going to try it out.
There is an extensive list at Wikipedia.
GitHub is a really great option for Git.
Most of the free, public hosting sites will insist that you license your code with an OSS license (and, possibly, your documentation). That's potentially a different thing than what you're talking about (backups).
For just backups, you may want to try a for-pay service or even something like Mozy.
I use Assembla - You can share your code if you want, but you are not required to. That's a big plus to me.
Online backup is cheap and easy. Why would you not?
I host most of my non-code backups on Amazon's S3 service.
Code goes on a Slicehost virtual server that has automated snapshot backups (daily as well as weekly) and runs Subversion and the Trac web interface to it.
GitHub is a really great hosting service if you use Git; and of course everyone should use Git. The default is free public project hosting, but if your stuff is proprietary (or perhaps embarrassing) you can get private hosting from them for some cost per month.
If you want to make your projects public in some form, then a hosting solution may be useful for you.
I made a listing of project hosting sites at this question. Of that list, only Origo also allows you to host a closed-source project. As long as you want to open up your source, you can choose any one on this list.
For my personal projects I use a Git repository on a local Fedora server (that is backed up daily). I .tgz the repository and the MySQL database (for Bugzilla) and back them up to Carbonite AND a local, redundant hard drive.
I can clone the git repository from any of my other machines into all other environments.
With this you have a backup and version control. I think my system is better than the one I have at work, LOL.
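The ".tgz the repository" step could look roughly like the following Python sketch, which uses the standard tarfile module. The function name, the timestamped file name, and the paths are assumptions for illustration. Dumping the MySQL database and uploading to an offsite service are left out.

```python
import tarfile
import time
from pathlib import Path


def archive_repository(repo_dir, backup_dir):
    """Write a gzip-compressed tarball of repo_dir into backup_dir."""
    repo_dir = Path(repo_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps earlier backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{repo_dir.name}-{stamp}.tgz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the repo name.
        tar.add(repo_dir, arcname=repo_dir.name)
    return archive
```

A daily cron job could call `archive_repository("/srv/git/myproject.git", "/mnt/backup")`, where both paths are placeholders, and then copy the resulting file offsite.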
As long as you want to publish your personal projects as open source, you have a lot of possibilities to choose from, because there are lots of hosters that provide this.
If you just want to store your code somewhere online, but not share it with the world:
Some hosters also allow private repositories, but the only free one that I know of is Bitbucket (which I use myself for my private and open source projects).
They allow an unlimited number of public and private Mercurial and Git repositories. The only limitation is that no more than five users can access your private repositories (you can have more, but then it's not free anymore).