Is it possible for GitHub to remove my email address from public repositories from a purely technical standpoint? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
On GitHub, I contributed to Free/Libre & Open Source projects. Now, I want my email address removed from those commits.
I expect my user name to be in those commits, too. It contains at least my surname and the initial of my given name. Some commits might even contain my full name.
I expect the GDPR to treat the combination of email address and personal name as personal data protected within its scope. The GDPR states that personal data has to be stored with consent and for a limited time.
(About the latter, I guess I will have to disable the auto-enabled option to "store my data for future generations in an arctic vault", but let's discuss that at another time.)
It would be cumbersome to write to the maintainers of each repository.
Most of the time, the repositories even have 10+ forks with no commit activity by their owners, which nonetheless expose the information publicly. (GitHub does sometimes enforce public forks via an option, which at least works for lazy "fork-button-clickers".)
Therefore, I do not actually expect to get my personal data completely removed even if I put in a lot of manual work.
From a technical standpoint, git history has to be rewritten. Every DVCS user has to accept those changes [1].
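Why every copy has to accept the rewrite follows from how Git names commits: in the default SHA-1 object format, a commit's ID is the hash of its raw text, and that text contains the author line (with the email address) and the parent commit's ID. The sketch below rebuilds simplified commit objects to show that changing one email changes that commit's ID and, through the `parent` line, the ID of every commit built on top of it. The names, emails, timestamp, and messages are made up, and real commits may carry extra headers (such as signatures) that this ignores.

```python
import hashlib
from typing import Optional, Tuple

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's ID for an empty tree


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 ID as Git computes it: a '<type> <size>' header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(parent: Optional[str], author: str, email: str, message: str,
              tree: str = EMPTY_TREE, when: str = "1600000000 +0000") -> str:
    """ID of a simplified commit object (no signatures or extra headers)."""
    ident = f"{author} <{email}> {when}"
    headers = [f"tree {tree}"]
    if parent is not None:
        headers.append(f"parent {parent}")
    headers += [f"author {ident}", f"committer {ident}"]
    text = "\n".join(headers) + "\n\n" + message + "\n"
    return git_object_id("commit", text.encode("utf-8"))


def two_commit_history(my_email: str) -> Tuple[str, str]:
    """My commit, followed by a maintainer's commit on top of it."""
    mine = commit_id(None, "A. Contributor", my_email, "Fix typo in README")
    theirs = commit_id(mine, "Maintainer", "maintainer@example.org", "Release 1.0")
    return mine, theirs


before = two_commit_history("me@example.com")
after = two_commit_history("noreply@example.com")
# Both IDs differ, although the maintainer's commit itself was not edited.
print(before)
print(after)
```

Tools such as `git filter-repo` perform this kind of rewrite across a whole repository, but every existing clone and fork keeps the old commit IDs until its owner adopts the new history.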
Legally speaking, the case is clear. But:
Is it feasible, with the help of GitHub, to enforce my right to privacy across many projects? Would published npm modules be affected as well? (I expect I only changed their documentation, not actual executable scripts. But it is exactly the documentation that npm often links to on GitHub.)
Would it require all public repositories to accept such a change of history, and perhaps even to put in the work to bulk-remove the email address?
EDIT:
Accepted answer: GitHub can change these projects and all forks to private. That works for me, but it would also hurt these open source projects.
The effort to auto-rewrite history (via a script/programming) seems to be out of scope for such an infrequent request.
Thank you. I regret asking so broadly rather than about recent historical examples.
[1]: What I do not expect is that every user will purge my email address from their private repositories. My problem is the easy accessibility of my email address to web scrapers at a central location.

GitHub stores repositories, so from a purely technical standpoint, they are physically capable of editing the data to change it in any way, shape, or form. This is true of literally anybody who stores data on a standard storage medium.
However, because GitHub retains relatively few legal rights to host repositories, they won't modify repositories without the consent of the owner. If there's a legal challenge, they just disable access to the repository; they don't edit it in any way. The issue of whether your data can and should be removed is left to the project maintainers. As a result, there's no tooling for GitHub to modify any repositories in any way outside of the normal permissions model, and sending any sort of request, legal or otherwise, won't be effective in getting your data actually removed.
I am of course not a lawyer and nothing I say is legal advice, and if you have questions about the law, you should contact an attorney licensed in your jurisdiction. However, you should note that projects that use the Developer's Certificate of Origin (such as Linux and Git) explicitly require you to assent to the recording of your personal information for the life of the project, and if you've signed off your commit, you've made a legal statement agreeing to that, which people may rely on in good faith. If you cannot make a binding statement to that effect in your jurisdiction, then as a consequence the only legal thing would be to refrain from contributing at all.
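Concretely, a sign-off is a trailer line in the commit message itself, of the form `Signed-off-by: Name <email>`; `git commit -s` appends one using the committer identity. So in projects that require it, the address appears both in the author metadata and in the message text. A minimal sketch that extracts sign-offs from a message; the regular expression is a simplification, not Git's own trailer parser (`git interpret-trailers`), and the names and addresses are invented.

```python
import re

# Simplified pattern for "Signed-off-by: Name <email>" trailer lines.
SIGNOFF = re.compile(r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$",
                     re.MULTILINE)


def signoffs(message: str):
    """Return (name, email) pairs for every sign-off line in a commit message."""
    return [(m.group("name"), m.group("email")) for m in SIGNOFF.finditer(message)]


message = """Fix off-by-one in parser

Signed-off-by: A. Contributor <me@example.com>
Signed-off-by: Maintainer <maintainer@example.org>
"""
print(signoffs(message))
```

Rewriting only the author and committer fields would therefore leave the address in these trailers.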

Related

What is the purpose of non-collaborator reviews in GitHub?

GitHub documentation states:
After a pull request is opened, anyone with read access can review and comment on the changes it proposes.
In public GitHub repositories, what is the point of allowing people other than the repository owner or collaborators (that is, people who don't have write access) reviewing pull requests?
I suppose hypothetically it could be a learning opportunity, but it seems far more likely to be a waste of time. The people with write access get to decide what goes in and what changes need to be made in the code for that to happen. It's highly unlikely that a random read-access-only developer will know exactly what the owner wants.
Also, what would be the proper etiquette in this situation for the pull requester and the owner?
Sometimes a non-collaborator is a former contributor or other subject matter expert who has knowledge about the code or algorithm and can be invaluable as a reviewer. In an open source project, this could be an emeritus member of the core team, or for a company, it could be a colleague who has moved to another team and no longer has write permission on the repo. I've actually requested such reviews from former core team members before.
I agree that in many cases it's possible for a rando to pop up to an open source project and give a useless review, and I see those myself. I usually just ignore them, or if a particular party is providing lots of unhelpful reviews, I usually deal with it like I deal with any other unhelpful behavior and ask them nicely to stop.
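One way to keep outside reviews useful without letting them decide anything is to read them as advisory and count only approvals from people with write access toward merging. A minimal sketch, assuming reviews are given as hypothetical `(login, state)` pairs in chronological order and the set of logins with write access is known; this is not GitHub's API.

```python
from typing import Iterable, List, Set, Tuple

Review = Tuple[str, str]  # (reviewer login, state), e.g. ("alice", "APPROVED")


def split_reviews(reviews: Iterable[Review], writers: Set[str]) -> Tuple[List[Review], List[Review]]:
    """Separate binding reviews (reviewer has write access) from advisory ones."""
    binding: List[Review] = []
    advisory: List[Review] = []
    for login, state in reviews:
        (binding if login in writers else advisory).append((login, state))
    return binding, advisory


def ready_to_merge(reviews: Iterable[Review], writers: Set[str], required: int = 1) -> bool:
    """True if enough writers approved and no writer currently requests changes."""
    binding, _ = split_reviews(reviews, writers)
    latest = dict(binding)  # a later review by the same person overrides an earlier one
    approvals = sum(1 for state in latest.values() if state == "APPROVED")
    blocked = any(state == "CHANGES_REQUESTED" for state in latest.values())
    return approvals >= required and not blocked


reviews = [("emeritus", "APPROVED"), ("alice", "CHANGES_REQUESTED"), ("alice", "APPROVED")]
print(ready_to_merge(reviews, {"alice"}))
```

The advisory list is still worth reading: the outside reviewer may be a former core member or a subject matter expert.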

Should I commit all my computer science homework assignments to GitHub? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
After reading a community wiki on Quora, I decided it would be good to start experimenting with GitHub. I thought, "What better way to experiment than with introductory computer science homework?" However, this practice opens my solutions up to the web, and I am concerned that other students might plagiarize them. I have read other questions on Stack Overflow about version control and homework.
Thus, a few questions come to mind as I consider this practice:
Does putting homework code on GitHub open it up to be copied?
Are people that plagiarize familiar with GitHub?
Should I be concerned?
Would plagiarism detection software scan GitHub?
Does putting homework code on GitHub open it up to be copied?
If you create a public repository, then yes. Private repositories cost money ($7/month for 5 private repositories), though, as pointed out by carols10cents, there is a free student version: https://github.com/edu
Are people that plagiarize familiar with GitHub?
Open source is all about sharing. That is kind of its point. Don't store things you want to keep private in a public place.
Should I be concerned?
For general homework no. Again, don't put essays and personal writing in a public repository. That would be similar to putting your essays on a public blog.
Would plagiarism detection software scan GitHub?
I don't know. Probably, eventually.
Git can be used without GitHub. To really learn Git, you do not need GitHub or Bitbucket or any other paid service. GitHub is just a public set of servers to store/share/back up your work on.
Git is great for tracking revisions. If you have ever used Google Docs (Google Drive) and looked at its history feature, you are probably familiar with how nice it is to be able to revisit changes and old versions of your work. Git formalizes this by allowing you to comment on your commits, branch your work into multiple versions, or just experiment without worrying about overwriting the original work.
Update
I read the Quora post and thought I might add this.
The very best thing that you can do to improve your skills is rent a server of your own from a vendor like Rackspace, DigitalOcean, or Linode, to name just a few of the providers. These services can cost as little as $5/month, though $10-$20 a month is more typical. From there you will have to learn how to configure a Linux machine. You can install a Git repository, mail servers, web servers, whatever you want, in a very low-risk environment. Make a mistake and you can just reset the server to its virgin state. I recommend installing an Ubuntu distro because of its large community and relative ease of installing new software.
One of the problems with developers is that they often are too dependent on sysadmins for tasks that really should be part of their repertoire.
Does putting homework code on GitHub open it up to be copied?
It depends. If the repository is public, anyone can see it, and fork it. They may even send you pull requests! If the repository is private, on the other hand, it can only be seen by people that you allow. You need a paid subscription to create private repos.
Are people that plagiarize familiar with GitHub?
That's off-topic. But IMO, you should always assume plagiarizers are familiar with everything.
Should I be concerned?
It's just homework. Why do you care? It's not like it's your doctoral thesis or your next patent material, is it?
Would plagiarism detection software scan GitHub?
I know there's software that does that with Wikipedia. I wouldn't be surprised if someone made that for GitHub. But usually such software checks whether you've copied something from well-known sites; if you are the author of the original content, you have nothing to worry about. If other people are plagiarizing you, it means you are good at what you're doing.
Last but not least: you might want to read about Creative Commons. Unless you really want to keep your work top secret, it's better to use a CC license than to lose a night's sleep over people copying your work.
Yes, unless you use a private account.
How could we know?
By publicizing your work, you're not doing anything wrong. Those who would cheat by copying your work and pretending it is theirs would be the bad guys. But if your teacher receives two identical homework submissions, you'll have to prove your innocence, which might not be so easy.
I guess so.
My advice:
experiment by opening a private account that only you have access to, or
experiment with Git (which is what matters, more than GitHub) by installing your own Git server on one of your own machines.

How important is version control integration with your bug tracking software [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Currently we use FogBugz for tracking issues and have found it to be OK. I'm looking for something else that gives end users the ability to track their cases along with us, and something that actually works well with email. I've found a few alternatives that support those features, but they don't integrate with version control. We've got all the SVN hooks in FogBugz and we use them, but I haven't really found them all that useful. Has anyone found a really good reason to need version control integration with the bug tracker?
Clearly, this kind of integration is not something that is essential to the operation of the software. With a bit of discipline every check-in can be accompanied with a bug number manually, and every bug resolution can manually have a version control tag added to it.
All else being equal, however, I personally will always prefer automation over 'discipline of the users', because the latter will sooner or later let you down. Not because the users are malicious or incompetent, but simply because people cannot be 100% alert all of the time.
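As an example of automation replacing discipline, a Git `commit-msg` hook can refuse commits that do not mention a ticket. Git runs `.git/hooks/commit-msg` with the path of the file holding the proposed message as its only argument and aborts the commit if the hook exits with a non-zero status. The accepted ticket syntax (`#123`, `bug 123`, `ticket 123`) is a hypothetical convention; adapt the pattern to your tracker. With SVN, the equivalent check would live in a server-side `pre-commit` hook.

```python
#!/usr/bin/env python3
"""commit-msg hook: reject commits whose message has no ticket reference."""
import re
import sys

# Hypothetical convention: "#123", "bug 123" or "ticket 123" anywhere in the message.
TICKET_REF = re.compile(r"(?:#|\b(?:bug|ticket)\s+)(\d+)\b", re.IGNORECASE)


def has_ticket_reference(message: str) -> bool:
    """True if the message mentions at least one ticket number."""
    return TICKET_REF.search(message) is not None


def main(message_path: str) -> int:
    """Return the exit status Git should see: 0 accepts, 1 aborts the commit."""
    with open(message_path, encoding="utf-8") as f:
        message = f.read()
    if has_ticket_reference(message):
        return 0
    sys.stderr.write("commit rejected: reference a ticket, e.g. 'refs #123'\n")
    return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Save it as `.git/hooks/commit-msg` and make it executable (`chmod +x`). Note that client-side hooks are not shared by cloning, so each developer has to install it.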
I find the integration of SVN with Trac very helpful. Through SVN hooks, commits to the repository that mention a ticket number insert a comment on the ticket with a link to a nice visual HTML view of the revision, showing inserts, deletes, and diffs.
As a supervisor over a small team of programmers, I find this a helpful tool for code reviews, so I can verify that the commit truly addresses the associated issue. I wouldn't exactly call this integration essential, but it was a nice free extra on my issue tracker that I've grown to love.
It is absolutely critical for us.
Here is a typical commit log for one of our projects (sample):
Make sure filedes is cleared in child list prior to reallocating
When p->child-filedes is > 0, the child list is active and can not
be collected.
[ Impact: Closes bug 123457 ]
Note the [ Impact: ] line, which could also be "Relates-To", "Caused" or any number of other things.
This lets us use simple greps and automated scripts, allowing the person committing to automatically close, or even re-open, a bug.
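Such a script could look roughly like this: scan each commit message for `[ Impact: <Verb> bug <number> ]` lines and turn them into tracker actions. The line format is inferred from the single sample commit above, and the verb-to-action mapping is hypothetical; a real script would then call the tracker's own API.

```python
import re

# Matches lines such as "[ Impact: Closes bug 123457 ]" or "[ Impact: Relates-To bug 42 ]".
IMPACT = re.compile(r"^\[\s*Impact:\s*([A-Za-z-]+)\s+bug\s+(\d+)\s*\]",
                    re.MULTILINE | re.IGNORECASE)

# Hypothetical mapping from the verb in the commit to a tracker status change.
ACTIONS = {"closes": "close", "reopens": "reopen", "relates-to": "comment", "caused": "comment"}


def parse_impacts(message: str):
    """Return (verb, bug number) pairs found in a commit message."""
    return [(verb.lower(), int(bug)) for verb, bug in IMPACT.findall(message)]


def tracker_actions(message: str):
    """Translate Impact lines into (action, bug number); unknown verbs only add a comment."""
    return [(ACTIONS.get(verb, "comment"), bug) for verb, bug in parse_impacts(message)]


sample = """Make sure filedes is cleared in child list prior to reallocating

[ Impact: Closes bug 123457 ]
"""
print(tracker_actions(sample))  # [('close', 123457)]
```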
Though we typically use Git and Mercurial, these sorts of hooks would work on (almost) any VCS, especially proprietary ones that do not feature some modular plug-in that you need.
If you think of your bug system as just another part of your VCS, it's really easy to see how they depend upon each other.
Other stuff, such as fetching patches submitted with bugs is possible, too.
It is a question of your code size and how many bugs you need to track.
It is also really useful for non-coders in the organisation, such as managers and customer support. They can find answers to questions like "When and where was this bug fixed?"
I think it's helpful to distinguish between bugs found internal to the development organization, e.g. from peer code review, versus bugs found by a test group that is external to the development organization.
The (small) benefit to coordinating version control with bugs found by an external test group would be for historical reference.
The larger benefit is in coordinating bugs found via peer code review with version control: by doing so you can certify that all bugs found in peer review have been fixed before releasing the code to external test groups, a common requirement.
FYI, Code Collaborator from SmartBear, Inc. handles this nicely.
I have found version control integration to be extremely helpful in maintaining and managing multiple versions (stable, development trunk, etc.) of a project.
Using the version control integration and a bit of discipline from coders to reference bug tickets in commits (or some pre-commit hooks to forcibly require ticket references) has allowed us to quickly and easily generate lists of changesets that are required to fix any given bug. This is instrumental when merging the fixes into various stable branches of the code.
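Once commits reference tickets, the list of changesets required to fix a given bug is a simple grouping over the log. A minimal sketch, assuming the log has already been exported as `(revision, message)` pairs in commit order (for example from `svn log` or `git log`) and that tickets are written as `#123` (a hypothetical convention); the revisions and messages are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TICKET_REF = re.compile(r"#(\d+)\b")  # hypothetical "#123" convention


def changesets_by_ticket(log: Iterable[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Map each ticket number to the revisions that mention it, in log order."""
    result: Dict[int, List[str]] = defaultdict(list)
    for revision, message in log:
        for ticket in sorted({int(t) for t in TICKET_REF.findall(message)}):
            result[ticket].append(revision)
    return dict(result)


log = [
    ("r101", "Guard against null config (refs #17)"),
    ("r102", "Unrelated cleanup"),
    ("r105", "Fix follow-up regression from r101, refs #17 and #20"),
]
print(changesets_by_ticket(log)[17])  # ['r101', 'r105']
```

Merging the fix for ticket 17 into a stable branch then means merging exactly those revisions.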
It's not a necessity, but it certainly makes life easier for release management.
I've used SVN + Trac and Atlassian's Jira with the Fisheye SVN plugin and have found both tools to be very good. Trac seems to be a bit simpler, but very easy to use. Jira, in my opinion, had a nicer look and feel and quite a few more bells and whistles, but was almost too much at times.

Managing multiple code branches and deliveries [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I work for a small one-product, one-customer company that is transitioning to a one-product, multiple-customer company. Even though we've had just one customer, we've had different projects with different delivery dates, but for each project we've been able to deliver the latest monthly release, which we've kept in a separate code branch in case we had to deliver bug fixes for that specific release.
Recently, we've acquired a number of new customers and a new problem has arisen: the head branch will typically solve (without breaking functionality) many different customer-specific problems, and not all customers want all the changes; they would rather cherry-pick fixes and features.
Do you have any experience with that situation, and how to handle it practically without being overloaded by testing and work (our monthly release tests take about 3 days of computer time)? And version-control-wise, how do you manage it (I guess CVS will finally have to go...)?
The simplest solution is to cut the product into a core product and make each feature a plug-in. That way, every customer can cherry-pick the features they want. But even this solution can quickly overwhelm a small company.
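A minimal sketch of the core-plus-plug-ins split: the core defines a small interface, each feature registers itself, and each customer's configuration lists the plug-ins it enables. The `Plugin` interface, feature names, and customer names are hypothetical.

```python
from typing import Callable, Dict, List, Protocol


class Plugin(Protocol):
    """What the core expects from every feature plug-in (hypothetical interface)."""
    name: str

    def process(self, record: dict) -> dict: ...


# Feature name -> plug-in class; filled in by the @register decorator.
REGISTRY: Dict[str, Callable[[], Plugin]] = {}


def register(cls):
    """Class decorator that makes a plug-in available to the core."""
    REGISTRY[cls.name] = cls
    return cls


@register
class UppercaseCodes:
    name = "uppercase-codes"

    def process(self, record: dict) -> dict:
        return {**record, "code": record["code"].upper()}


@register
class AuditTrail:
    name = "audit-trail"

    def process(self, record: dict) -> dict:
        return {**record, "audited": True}


# Per-customer selection; the core code below is the same for everyone.
CUSTOMER_FEATURES: Dict[str, List[str]] = {
    "customer-a": ["uppercase-codes", "audit-trail"],
    "customer-b": ["uppercase-codes"],
}


def run_core(customer: str, record: dict) -> dict:
    """Core processing: apply only the plug-ins this customer has enabled."""
    for feature in CUSTOMER_FEATURES.get(customer, []):
        record = REGISTRY[feature]().process(record)
    return record
```

The catch stated in the answer still applies: every combination of plug-ins a customer actually enables is a configuration that has to be tested.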
In reality, you usually are in a worse situation: You have a new feature which helps customer A and breaks something for customer B (say, customer B is not ready to modify their database and the new feature doesn't work without the change, so this in fact makes the new version unusable for customer B). If you were big, you could simply ignore customer B.
As it stands, you really need to find a way to convince your customers to move on. The simplest way is money: tell them how much it will cost them to get a tailor-made product and how much they will all save if you can find a solution that suits everyone. Invite your customers over, build a list of changes together and have everyone agree on the plan.
Also, you really must have automatic unit tests, so you can be 100% sure that the product which leaves the house today can't possibly be worse than what you sold four weeks ago.
Even with the best version control system out there (for me, that would be Git), you can't solve the fan-out you get if you can't get everyone to pull in the same direction (unless, of course, you can really split each customer into a plug-in).
We have a similar setup of one (fairly specialist) product and multiple (but only the order of hundreds) customers who all want their own pet feature.
As far as I can see you can either go down the 'off-the-shelf' route where your product is non-customer specific, and any features you add are for the good of the product (possibly at customers' request); or you go down the bespoke, consultancy route where every customer has their own unique version of the product.
If your customers all require basically the same product then you should go down the first route and that means all features go to all customers.
Hiding features is easy; maintaining multiple concurrent versions is a nightmare!
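Hiding features in a single shared release can be as simple as per-customer feature flags checked at runtime: every customer runs the same version and only the configuration differs. A sketch with hypothetical flag and customer names; a real product would load this table from a config file or database.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

# One release for everyone; these features are on unless a customer opts out.
DEFAULT_FLAGS: FrozenSet[str] = frozenset({"new-reporting"})


@dataclass(frozen=True)
class CustomerConfig:
    name: str
    enabled: FrozenSet[str] = field(default_factory=frozenset)
    disabled: FrozenSet[str] = field(default_factory=frozenset)

    def is_enabled(self, flag: str) -> bool:
        """Customer overrides win; otherwise fall back to the product default."""
        if flag in self.disabled:
            return False
        return flag in self.enabled or flag in DEFAULT_FLAGS


CUSTOMERS: Dict[str, CustomerConfig] = {
    "customer-a": CustomerConfig("customer-a", enabled=frozenset({"bulk-import"})),
    # Customer B keeps the old behaviour for now.
    "customer-b": CustomerConfig("customer-b", disabled=frozenset({"new-reporting"})),
}
```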
A solution which could work if your customers demand to cherry pick their features is to maintain branches for each of them and then very carefully copy relevant changes from your head branch.
This means that your commits need to be as atomic as possible - only fix exactly one issue - and that no changes should go directly into customer branches. But that approach is still very dangerous.
It's possible to use CVS in this situation (although I would recommend you take a look at other options like SVN).
I worked on some similar projects, and what we did was to have a "Common" branch for the core features of the system and a "Customer" branch for each variation of the product. This way you can implement the specific features and bug fixes of each client and still apply the same changes in "Common" to all variations of the product.
This approach takes a lot of configuration management effort (merging and building), though, so you might want a specific person to handle these tasks.
EDIT:
Additionally, you should (if you don't already) have a bug tracking system in which you document the client/branch you are working on.
Only support the head/main trunk, unless there is a branch that has a bug fix/feature that is not present in the main line.
I know you said some customers do not want that, but I have seen the end result of supporting many branches. You do not want that. It is a nightmare and will cripple your development, product and test teams.
Don't do it.
Be firm.

Is our bugtracking workflow so unique? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
We're currently using Mantis as our bugtracker, and we're pretty much sick and tired of it. The developers want more SVN integration; the customers want an easier system to work with.
As such, we're looking for a new bugtracker and at the moment we're looking at Redmine. However, in its default setup it doesn't match our desired workflow, or at least not much better than Mantis does.
We have the following workflow, and would like a bugtracker to match it.
A bug is reported (often by the customer), and is considered 'new'.
These bugs are regularly reviewed and either acknowledged (it's a bug) or marked as a feature (customer often needs to pay) and delayed until the financial part has been worked out.
The bugs are then assigned and handled by a developer
when finished, it's marked as 'ready-for-review' (by another developer)
when reviewed it's marked as 'reviewed'
when marked as 'reviewed', the original developer places the new code at the staging environment and marks the bug as 'ready-to-be-tested' (by the bug-reporter)
bug-reporter marks the bug as 'resolved'
when placed on production, bug-reporter closes the bug
Of course, feedback is often required, especially during the early stages. We're looking for a way to distinguish between who is required to take the next step and who the bug is assigned to (the developer). We also want the customer to be able to do so using a simple GUI: asking them to change the assignee from their own account to the developer, or, even more difficult, to a third party (think: design agency), is just too much to ask with the regular GUIs.
The GUI should show them what to do and which options there are, rather than making them search for them.
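The workflow described here is essentially a state machine in which each state records whose turn it is, separately from the assigned developer. A minimal sketch of that idea: the state names come from the list above, while the role names and the exact transition table (including the feedback loops back to `acknowledged`) are an interpretation, not any tracker's configuration format. A GUI could show each user only the transitions returned by `options_for`.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Role(Enum):
    TRIAGE = "triage"        # whoever regularly reviews incoming reports
    DEVELOPER = "developer"  # the assignee
    REVIEWER = "reviewer"    # another developer
    REPORTER = "reporter"    # usually the customer


# state -> (whose turn it is, states that person may move the bug to)
WORKFLOW: Dict[str, Tuple[Optional[Role], List[str]]] = {
    "new": (Role.TRIAGE, ["acknowledged", "feature"]),
    "feature": (Role.TRIAGE, ["acknowledged"]),  # parked until the financial part is settled
    "acknowledged": (Role.DEVELOPER, ["ready-for-review"]),
    "ready-for-review": (Role.REVIEWER, ["reviewed", "acknowledged"]),
    "reviewed": (Role.DEVELOPER, ["ready-to-be-tested"]),  # after deploying to staging
    "ready-to-be-tested": (Role.REPORTER, ["resolved", "acknowledged"]),
    "resolved": (Role.REPORTER, ["closed"]),  # once the fix is on production
    "closed": (None, []),
}


def next_actor(state: str) -> Optional[Role]:
    """Who has to act next; independent of who the bug is assigned to."""
    return WORKFLOW[state][0]


def options_for(state: str, role: Role) -> List[str]:
    """Transitions a GUI should offer this user; empty when it is not their turn."""
    actor, targets = WORKFLOW[state]
    return list(targets) if actor == role else []


def move(state: str, role: Role, target: str) -> str:
    """Apply a transition, rejecting moves the workflow does not allow."""
    if target not in options_for(state, role):
        raise ValueError(f"{role.value} cannot move a bug from {state!r} to {target!r}")
    return target
```

Because `next_actor` is kept apart from the assignee, a bug can stay assigned to its developer while it appears on the reporter's to-do list during testing.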
Does anyone have any experience with a bugtracker that works this way? Is our workflow really wack? How do you make sure everyone involved understands where the bug stands, and who is required to take which step?
Last year we had the same problem, and we figured out that the best solution for us was Jira.
With respect, our workflow is more robust and complicated than yours.
We have pretty much the same kind of workflow which we are managing using Redmine with email integration. The customer logs bugs into Redmine directly. Notification comes to the project manager who decides which developer can work on the bug.
The developer opens the bug and puts it into the Investigating state.
If it's a feature, he replies to it stating the reasons and puts it into the Replied state, which is then revisited later.
If it's a bug, then the developer starts development. Before this, he puts the bug in the Coding state.
Once the coding is over, he changes the state of the bug to Review and the peer reviews happen.
If there is any rework, then the developer changes the state to Rework.
Once everything is OK, the developer changes the state to Delivered.
QA verifies the bug and finally closes it by changing the state to Closed.
We've defined all of this workflow in Redmine and have been using this pretty effectively without any hassles. Email integration makes everything easy for the project manager to track whenever any bug changes its state.
You can also create and save custom reports, which is a cool feature as well.
I've been using Trac for a small personal project, and at work we used Bugzilla for this.
The workflow you described also sounds like how Red Hat utilizes Bugzilla.
As others have said, Jira is very good. I especially like its ability to create a custom issue workflow.