What is the purpose of non-collaborator reviews in GitHub?

GitHub documentation states:
After a pull request is opened, anyone with read access can review and comment on the changes it proposes.
In public GitHub repositories, what is the point of allowing people other than the repository owner or collaborators (that is, people who don't have write access) to review pull requests?
I suppose hypothetically it could be a learning opportunity, but it seems far more likely to be a waste of time. The people with write access get to decide what goes in and what changes need to be made in the code for that to happen. It's highly unlikely that a random read-access-only developer will know exactly what the owner wants.
Also, what would be the proper etiquette in this situation for the pull requester and the owner?

Sometimes a non-collaborator is a former contributor or other subject matter expert who has knowledge about the code or algorithm and can be invaluable as a reviewer. In an open source project, this could be an emeritus member of the core team, or for a company, it could be a colleague who has moved to another team and no longer has write permission on the repo. I've actually requested such reviews from former core team members before.
I agree that in many cases it's possible for a rando to pop up to an open source project and give a useless review, and I see those myself. I usually just ignore them, or if a particular party is providing lots of unhelpful reviews, I usually deal with it like I deal with any other unhelpful behavior and ask them nicely to stop.

Is it possible for GitHub to remove my email address from public repositories from a purely technical standpoint?

On GitHub, I contributed to Free/Libre & Open Source projects. Now, I want my email address from those commits to be removed.
I expect my user name to be in those commits, too. It contains my surname and at least the initial of my given name. Some commits might even contain my full name.
I expect GDPR to treat the combination of email address and personal name as personal data protected within its scope. GDPR states that personal data may only be stored with consent and for a limited time.
(About the latter, I guess I will have to disable the auto-enabled option to "store my data for future generations in an arctic vault", but let's discuss that at another time.)
It would be cumbersome to write to each maintained repository.
Most of the time, they even have 10+ forks with no commit activity by the forking users, which still expose the information publicly. (GitHub does sometimes enforce public forks via an option, which at least works for lazy "fork-button-clickers".)
Therefore, I do not actually expect to get my personal data completely removed even if I put in a lot of manual work.
From a technical standpoint, git history has to be rewritten. Every DVCS user has to accept those changes [1].
Legally speaking, the case is clear. But:
Is it feasible with the help of GitHub to enforce my right to privacy in many projects? Would published NPM modules be affected as well? (I expect to have only changed their documentation, not actual executable scripts. But it is exactly the documentation that is often linked to on GitHub from npm.)
It would require all public repositories to accept such a change of history, and perhaps even put in the work to bulk-remove the mail address?
EDIT:
Accepted answer: GitHub can change these projects and all forks to private. Works for me, but would hurt these open source projects as well.
The effort to auto-rewrite history (via a script/programming) seems to be out of scope for such an infrequent request.
TY. I do regret asking too broadly and not about recent historical examples.
[1]: What I do not expect is that every user will purge my email address from their private repositories. My problem is with the easy accessibility of my email address to web scrapers at a central location.

GitHub stores repositories, so from a purely technical standpoint, they are physically capable of editing the data to change it in any way, shape, or form. This is true of literally anybody who stores data on a standard storage medium.
However, because GitHub retains relatively few legal rights to host repositories, they won't modify repositories without the consent of the owner. If there's a legal challenge, they just disable the access to the repository; they don't edit it in any way. The issue of whether your data can and should be removed is left to the project maintainers. As a result, there's no tooling for GitHub to modify any repositories in any way outside of the normal permissions model, and sending any sort of request, legal or otherwise, won't be effective in getting your data actually removed.
I am of course not a lawyer and nothing I say is legal advice, and if you have questions about the law, you should contact an attorney licensed in your jurisdiction. However, you should note that projects that use the Developer Certificate of Origin (such as Linux and Git) explicitly require you to assent to the recording of your personal information for the life of the project, and if you've signed off your commit, you've made a legal statement agreeing to that, which people may rely on in good faith. If you cannot make a binding statement to that effect in your jurisdiction, then as a consequence the only legal thing would be to refrain from contributing at all.

Managing benefits/drawbacks of private repos

As someone who is just beginning to think about using private repositories, if I understand correctly, they basically let you make commits in private until you are ready to open-source your app/program to the world and then, once you do, your entire GitHub/Bitbucket commit history becomes visible to everyone (as if you were developing out in the open the entire time).
Now what happens if someone open-sources something before you do and claims provenance in the field/area/app/etc.? Can you basically open-source your software in return (or contact the authors directly) and "counter-claim" provenance? Obviously, the open-source person wouldn't have known about your existence since you're developing in private mode, so whose "right-of-way" would it be in such a hypothetical situation?
I can clearly see the utility of private repos for potential forking by competitors who have many more resources than you do and can hypothetically out-code you to the finish line and/or refactor your code significantly (potentially without attribution), but beyond that I don't really see much of a direct benefit to software development in private repos. Can anyone clarify the above points for me? For the record, I have investigated related posts like: https://softwareengineering.stackexchange.com/questions/87577/whats-the-benefit-of-having-a-private-repository-for-personal-projects

Private repository is about visibility: visible only by you or by all.
It is not about content: you can store anything (not too big) in a Git repo (public or private): a project, or just a collection of files. It is not limited to "software development". You can keep simple text files private, representing notes you want to remember, for instance.
Typically, the three ways of claiming ownership of an open-source project, as described in "Ownership and Open Source" by Eric S. Raymond, have nothing to do with private/public repo.
One, the most obvious, is to found the project. When a project has had only one maintainer since its inception and the maintainer is still active, custom does not even permit a question as to who owns the project.
(See also "How do I navigate to the earliest commit in a Github repository?")
The second way is to have ownership of the project handed to you by the previous owner.
The third way to acquire ownership of a project is to observe that it needs work and the owner has disappeared or lost interest.
So this is more about communication, and less about repository management.

GitHub: Is it common practice for a business to store solutions on there?

I'm looking at GitHub and it looks great. I see there are business accounts you can set up to version control your work. I know there is a lot of open source stuff on there, but is it common practice for businesses to store solutions on there? And more importantly, is it safe? The solutions are not to be viewed by anyone else.

For what it's worth, I just transitioned my company's source to GitHub, using private repositories. Also, I've been keeping commercial products of my own on GitHub in the same way for some time.
It's working great. Your account has a list of 'contributors' for each repository, which controls who can view / commit to each one.
The business accounts on GitHub are suitable for you if you do not want to store your code on someone else's server. Sign up for this if you want to keep your repositories "behind the firewall" by installing the software on your own server.
References:
GitHub Enterprise (this is the "business" plan)
GitHub Security

Concerning safety - there was a similar question a few months ago.
Check it (and my answer there :-) out:
How safe is it to host sensitive data on repository sites like github, bitbucket, etc.?
I don't know if it's common practice for companies to store their code online...but I guess that a lot of companies don't like the idea of hosting their intellectual property at some third party.
Probably "company culture" makes a big part of it.
I'd say that "hip" internet startups are more likely to host their stuff online than "conservative" enterprises/"non-techy" companies.
Some of the "hip internet" companies (for example Facebook, Twitter, GitHub...) at least have open-sourced part of their stuff, but I don't know which of them also host their private stuff there and which don't.
(except GitHub, I read somewhere that they host ALL of their stuff themselves...makes sense :-)
Another example: Headspring Software (where quite a few known .NET developers work) runs nearly completely on online services.
The linked blog post doesn't explicitly mention where they host their source code, but I wanted to mention this example anyway because of all the other stuff they have outsourced.
Many "conservative" companies wouldn't even want their e-mail/calendar/sales data at some third-party provider in the cloud...let alone their source code.

What Check-In Policies should be considered for version control?

I'm tasked with helping to set up the process templates and check-in policies for my company's TFS 2008 installation.
Aside from three check-in policies (a check-in action must have comments against it, a code file must be peer-reviewed, there must be a work item associated with a check-in), I have been asked to consider and implement any others.
What are some of the most important or useful policies to enforce for version control?

The fewer the better.
Usually in an organization you want to ease the friction of check-in to ensure that you are encouraging developers to make frequent small discrete check-ins rather than checking in a load of stuff at once. Then again you want to ensure that you have a working codebase for everyone who needs it and are capturing the data that you need to improve your software delivery process.
Personally, a policy to enforce changeset comments and a work item association policy are ok - as they capture meta-data that is very easy to remember at the time but hard to find afterwards. It also encourages developers to get into the habit of having a work item to track all pieces of work - even experimental development or spikes.
The peer review process might be better performed using branching or another process rather than forcing a peer review on every check-in - however that depends on your process. Remember as well that you can have mandatory check-in notes in TFS to capture meta-data such as code reviewer. A check-in note is slightly different to a check-in policy and is often confused.
If you want to read more discussion about check-in policies, take a look at a blog post I did on the balancing act a while ago. Also, to hear some more discussion about check-in policies, I recorded a podcast recently with a fellow Team System MVP talking about their use of TFS and it might be interesting (Radio TFS, Using TFS with Ed Blankenship). Finally, we also did a Radio TFS episode all about check-in policies in 2008 that might be of interest.

Don't break the build! Of course, finding an automated way to check for that and reject the check-in is the challenge.

Some rules that we follow in our company:
Commit all changes related to the same task at once (that will help review the changes and future rollbacks or merges if needed).
Use template-based comments (e.g. prefix all comments with a code that represents what was done: + for adds, - for removes, * for updates, ! for important modifications, etc.).
Obviously, always check in code that compiles, and only finished work, to the main line.
Check in unfinished work daily to branches.
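
The prefix convention in the rules above could be checked mechanically. Here is a minimal sketch in Python, assuming the format is one of the four listed prefixes, then a space, then a description. The function name and the exact format are my own assumptions, not something the rules spell out:

```python
from typing import Optional

# Prefix characters and their meanings, taken from the rule list above.
PREFIXES = {"+": "add", "-": "remove", "*": "update", "!": "important"}


def classify_comment(comment: str) -> Optional[str]:
    """Return the kind of change a check-in comment declares, or None if malformed."""
    text = comment.strip()
    # Assumed format: one prefix character, a space, then a non-empty description.
    if len(text) < 3 or text[0] not in PREFIXES or text[1] != " ":
        return None
    return PREFIXES[text[0]]
```

A check like this could run in whatever pre-check-in hook your version control system offers; rejecting a `None` result enforces the template without anyone reviewing comments by hand.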

The ones we use where I work on TFS are:
Code Analysis
This ensures that all the code was compiled on the dev's machine before it was checked in
Work Item Association
If you've done a change there should have been an assigned task!
Last Build Successful
Using the TFS Build Server to check that the current code in source control compiled on an independent machine
Check In Comments (part of the TFS Powertools - http://msdn.microsoft.com/en-us/teamsystem/bb980963.aspx)
It's good to be able to see a summary of the check in without having to go to the work item(s)
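
Policies like the ones listed above boil down to a set of yes/no checks that run before a check-in is accepted. Here is a rough sketch of that idea in Python. It is not the TFS policy API; the `CheckIn` fields and the function name are invented purely for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckIn:
    # Hypothetical model of a pending check-in; not a TFS type.
    comment: str = ""
    work_items: List[int] = field(default_factory=list)
    last_build_ok: bool = True


def policy_failures(checkin: CheckIn) -> List[str]:
    """Return one message per violated policy; an empty list means accepted."""
    failures = []
    if not checkin.comment.strip():
        failures.append("check-in comment is required")
    if not checkin.work_items:
        failures.append("at least one work item must be associated")
    if not checkin.last_build_ok:
        failures.append("last build was not successful")
    return failures
```

Returning every failure at once, rather than stopping at the first, means the developer can fix everything in one go.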

Try to keep the number of developers working on the same branch small. That way the branch stays stable with respect to compilation, the unit tests, and regressions. It's a nightmare if a developer does a check in which compiles but his code breaks a key area of the application (such as login).
If you really have to have more than 10 developers checking code into the same branch, we've started an email policy where the developer checking in warns everyone that they're checking in, so that no one attempts to update their copy of the branch in the midst of a check-in. Sometimes, we've had to do the converse, where we set aside a time in the day to prohibit check-ins, so that updates are safe.

Frankly, the fewer policies, the better. The more policies you have, the greater the incentive for NOT using version control. What happens then is:
Code is developed on parallel, uncontrolled source control systems, and just the final revision goes to the official one.
People delay committing as much as possible, decreasing visibility of what they are doing to other developers.
People will actually avoid committing something if they can get away with it, and some will find a way to get away with it.
In fact, I think your three check-in policies are already too much. For instance:
Requiring code to be peer-reviewed before check-in makes it much more difficult to have work-in-progress stored there. Instead, if the source control system allows it (and many do), use it to track whether the source has been peer-reviewed or not. With some systems you can create a life cycle for a revision, with others you might create branches, and with still others you might use tags.
Requiring a work item to be associated with a check-in makes it impossible for developers to do exploratory programming, or to take the initiative on possible improvements. It stifles the developers. Instead, make sure that any revision going into integration tests or user acceptance tests, not to mention production itself, is associated with a work item.
This might sound anti-Enterprise, but it's just some things we have learned in a few decades of software development. Most enterprise organizations haven't been clued in to this, but, eventually, they will. So, you might go the very opposite way, but don't say no one ever told you.
I recommend the Agile Manifesto, and, particularly, Lean Software Development for general principles.
Or, taking Stack Overflow design philosophy into account, make the system reward the behavior you want.

What is an appropriate level of detail for check-in notes?

When I check in code, I sometimes write very long, detailed check-in notes; other times I write very short ones (or no note at all). The longer notes tend to include information about why the change was made (business reasons, customer interactions, etc). However, I'm not sure if check-in notes are the right place for such detail. Most check-in notes I've seen tend to be short and simply reference a bug.
What is an appropriate level of detail for check-in notes?

Whatever your manager or company documentation tells you ;)
That being said, shorter is better. It's not the correct tool for lengthy documentation - your bug/feature tracking software is built for this and in most cases, can integrate with your source control.

Just enough that, when reading the log a few weeks later, you have an idea of what happened.
I use these logs to check what has been done in the last day (or days) in the project I'm leading.
Shorter messages aren't necessarily better, nor are longer ones. Just keep in mind the goal of those comments: to give an overview of the activity in the version control system.

The right answer, I've found, is dependent on the needs of your organization. It sounds fuzzy, but the primary reason to provide detail for a code check-in is for context and understanding if that check-in needs to be reviewed or revisited. It might be incredibly verbose, or it may be remarkably simple.
In one company, our code check-ins would reference #+ticket-number. This mapped our SVN commits against a Trac ticket number, which held all of our details about a given issue or feature we were implementing. We referenced everything through Trac, so keeping our details in that form worked best for us.
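
To illustrate how a `#` + ticket-number convention like that can be used by tooling, here is a small Python sketch that pulls ticket references out of a commit message. The regular expression and function name are my own assumptions; this is not how Trac itself parses commit messages:

```python
import re
from typing import List

# Assumed convention: a ticket is referenced as '#' followed by digits, e.g. "#142".
TICKET_RE = re.compile(r"#(\d+)")


def ticket_numbers(message: str) -> List[int]:
    """Return every ticket number referenced in a commit message, in order."""
    return [int(n) for n in TICKET_RE.findall(message)]
```

With something like this, the commit message can stay short ("Fix login redirect, refs #142") while the detail lives in the ticket it points to.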
For you, it depends on how you and your team work. I would base what info you keep in your check-ins on the need for the data, how often it's referenced, and what happens if you lose context (i.e., have no idea why a change was implemented).
Another consideration may be accessing those notes outside your code repository, which may not be the most effective mechanism for storing that information. Nonetheless, I find it's personal preference.

In my version-control experience, I tend to curse the ones that left no note at all, or a note that takes 5 minutes to dig through.
If you use your version control system to browse the history of a file to find a specific change, it's best to include a short comment on the why and the what. The how belongs in the source code documentation.

Whenever I write a comment or a commit log message I ask myself "what will the next guy need to know? what are they likely to ask me about?"
Answering a question seems to be the easy way to keep comments brief and useful. It also avoids anti-documentation (rephrasing code, often in unintentionally ironic ways) or rephrasing the metadata the VCS will be tracking anyway (added foo.java, Tuesday change, new tag "bar-1-1-4").
