Does GitHub rate-limit access to public "raw" files? - github

Does GitHub have public access restrictions?
Example file:
https://raw.githubusercontent.com/vuejs/vue/dev/package.json
What will happen if a million users download this file?

This is from a GitHub employee in regard to "raw" file access:
I spoke with our engineering team and learnt that there's a limit of
5000 requests per hour per IP address. Additionally, due to internal
routing and caching, that 5000 figure isn't going to be exact. We may
accept more but it's sometimes possible that we'll accept less too.
As was pointed out to me, if you're at risk of hitting this limit,
then you're probably doing something wrong and there's a better way to
obtain or even store the file.
After more than a year of waiting, they still haven't confirmed whether this is accurate or updated the docs, so I'm guessing that routing requests via the GitHub API and using tokens might be more reliable (a rough sketch follows the references below).
Ref: https://github.com/littlebizzy/slickstack/issues/180
Ref: https://github.com/github/docs/issues/8031
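If you do route requests through the API, a minimal sketch might look like the following. This is only illustrative: YOUR_TOKEN is a placeholder, and the exact media type and per-token limits should be checked against the current GitHub docs.

# fetch the raw file through the REST API with a personal access token
# (authenticated requests get their own, per-token rate limit)
curl -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Accept: application/vnd.github.v3.raw" \
     "https://api.github.com/repos/vuejs/vue/contents/package.json?ref=dev"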

I don't think there are any limitations. I have deployed many simple static websites on GitHub that are accessed by a lot of users. At times I have seen them slow down a little (during heavy traffic), but I have never run into any limitations.

GitHub's definitions for "public" code access are very vague online, so I hope this helps anyone who was as confused as I was!
GitHub confuses "public" with "open source".
The former is a permission-based access designation and "git" workflow strategy on GitHub; the latter is a licensing issue and a broader code-access paradigm. But GitHub mixes the two together to create a new workflow on their website for how code gets shared using git source control. That confused me.
In general, GitHub "public" repositories means close to the same thing as "open source" in terms of access and use. In general it means any public GitHub repo can be viewed, downloaded, forked, etc. But anything beyond that starting with "write" access on the owners original code base requires the "owner" of the repo to add that person as a "collaborator". I interpret that to mean unlimited and unrestricted access to copy, download, and view your code by any known person, machines, process., etc.!
However, the sample open source licenses (like GNU GPL 3.0, etc.) they recommend you create or use for your projects might legally limit some uses of your code. But they are not going to help you enforce or limit that. Once your code is online, there is no script or lawyer or enforcing entity that can stop any of that. That is why it's called "open source". I have used the GNU "free beer" license for distribution of my personal code before and like it, though I've never seen a need to enforce it as far as limiting much. The main thing it helps with is making sure you remain the copyright owner of the code in the USA and a few other countries, and stopping big corporate entities from taking your code and claiming copyright, limiting free use, etc.
HOW GITHUB DEFINES "public"
Note: The following applies to GitHub individual accounts, not organization or enterprise accounts, which have much more granular control over GitHub code projects and repositories.
When you go public on GitHub, meaning you turn your repo to "public" access, you are allowing some form of "open source" or "free" use of the code. In the "git" world this could be many different things as far as both access and use. But in the GitHub world it implies full rights for people or machines to have "read" access by default when your repo is "public". What does that really mean as far as access and use? Well it means:
Anyone or any machine can view the code (they call it "visible") or the code files online for free, including manually copying the code in a web browser. That means unlimited views and use of your code.
Anyone or any machine can "download" the code via their code download link. In the GitHub world that means a zip or other compacted wrapper of all the code files into a format you can download in one file. That means unlimited downloads of your code.
Anyone or any machine can "fork" (not "clone") the code. In the GitHub world that means GitHub copies the code and sticks that copy into your GitHub online web account, if you have one. This copy is a "fork" to them, though traditionally that's not what "forked software" means. With this copy a user can then download a "clone" of the forked code to their local machine and start modifying it and push changes to the GitHub forked copy. They cannot do anything with those changes as far as changing your original code base without you setting them up as a "collaborator". But it does includes sharing that with the world as well, which increases views and downloads of your code base to even more people you cannot track! So "public" means all the public clones, mirrors, or forks can be downloaded and shared as well.
BTW, "forking" the code in the GitHub world means copying the code, with all the commit and git source history, to their GitHub account so that later - with more permissions granted by you - they can submit changes back to the original repository's code base with a pull request.
This confused me at first, as I thought a "public" repository at GitHub meant anyone can "clone" the original repo to their local box only, which would allow anyone to use a local copy of the GitHub remote repo and pull code updates. In that model they could never do push or pull request updates without additional permissions, which makes sense, but also could never share copies of your code online (unless they explicitly created a new repo at GitHub from your code base).
But that is not what "public" means to them. They want people to directly fork or copy projects into the public site and modify code on their platform using forks. That is the workflow GitHub encourages on "public" projects on their site. This allows any user or machine to make a full copy of everything and do whatever they like to that copy, including sharing and distributing it to others. This is why "public access" does open up your code to lots of crazy things including copies of your code spreading quickly across GitHub with no way to know how many people have truly used it in projects or even care to contribute back to your original.
Personally, at all the companies I have worked at that use Git, I have never seen that type of model for distribution of repositories. We always cloned a master in a development environment and built branches remotely and locally from there. It feels like this was not thought through as it opens up distribution of your code into millions of versions of forks most people never asked for, cannot sync, and will forget about over time.

Related

Share PR comments with team outside of organisation

We have a team outside of our organisation that is writing firmware for us. They have an internal source control that we do not have access to. They share code with us by sharing a zip file with a .git inside it and we recreate the repo in our internal source control.
We want to conduct an overall code review. This will likely take some back and forth, multiple comments on multiple lines and files of code.
Is there a way to comment internally in our source control, then share these comments in a zip file with this external team? Or is the only way to do this by creating another source control system that is shared between us and the external team?
Create a GitLab account
Add them to your organization as a developer
Create a repository
Have them add that repository as a "remote" to theirs, so that when they push, their commits also go to that GitLab repository.
Then, do your collaboration via GitLab. It has a decent interface for creating Pull/Merge Requests, adding comments to code, and accepting them.
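A rough sketch of the "remote" step above, assuming the GitLab project lives at gitlab.com/yourorg/firmware (a made-up path):

# run inside the external team's existing clone
git remote add gitlab https://gitlab.com/yourorg/firmware.git
git push gitlab --all
git push gitlab --tags
# from then on, pushing to the "gitlab" remote keeps the review copy up to date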

Github folder organisation efficiency

I have a very simple question regarding the general organisation of folders in GitHub.
Because I often forget to commit specific GitHub projects, I started to group different projects and folders into large generic folders.
For instance, I would have a general folder called all_projects, and put project 1, project 2, and so on inside it. Then I would simply git add . everything at once.
As my general folders get bigger and bigger, I am wondering whether there are major drawbacks to this kind of organisation and how you would do it differently.
I think it's bad practice to put everything you do in the same folder. Not locally, but on GitHub.
Locally, you can create a folder called Programming or my_work or all_projects and create each project separately inside it. Each project should also have a proper name. For example, if the project scrapes Instagram, name it Instagram web scraper or InstagramScraper or something that lets you almost instantly remember what it does just by reading the project (folder) name.
Also, check if the project is large or small.
If this project is something you've built over a long time, with a big project structure and lots of files connected to each other, then you can push it to GitHub as its own repo with a proper name and a README.md (a minimal sketch follows below). It is self-contained and it shouldn't be part of some other project that has nothing to do with it.
If it's small, like one script, or just something you practiced while learning, consider using https://gist.github.com . It's connected to your GitHub account and it can hold small scripts.
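For the large-project case above, a minimal sketch of turning one subfolder into its own repository. The folder and remote names are just examples, and you would create the empty GitHub repo first.

cd ~/all_projects/InstagramScraper
git init
git add .
git commit -m "Initial commit"
# after creating an empty repo named InstagramScraper on GitHub:
git remote add origin https://github.com/YOUR_USERNAME/InstagramScraper.git
git push -u origin main   # or "master", depending on your default branch name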
P.S.
This is my personal opinion.

Should GitHub be used during development of a website?

I am currently developing a website using html, css, php, and javascript. During this development, I am constantly modifying the css to get just the right look.
I feel that it is impractical to re-upload or modify files on GitHub as I continue this development. Am I right to think that GitHub should only be used near the final stages of development, or is there some functionality of GitHub I do not yet know about?
GitHub (and git in general) can be helpful in the development of a website, but it really depends on your needs. In your case, you mention CSS. If you manage to ruin your CSS beyond repair, then git has your back, since you can revert to a working version (a small sketch follows below). GitHub makes this easier by letting you view the CSS file with all insertions and deletions.
If your content does not undergo drastic changes, and you make occasional smaller changes, then git should work well. Version control is the main goal of git, and since small changes to a file don't involve a brand new copy of the file but rather the inserted and deleted lines, there may be an advantage to using git rather than backing up your entire website every now and then.
So in the end, it depends on what your website is like and how you plan to tweak/update it.
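As a concrete example of the CSS safety net mentioned above, suppose you have broken style.css (a hypothetical file name):

# see the recent commits that touched the file
git log --oneline -- style.css
# restore the file as it was at a known-good commit
git checkout <good-commit-hash> -- style.css
# or simply throw away uncommitted edits to it
git checkout -- style.css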
I see no reason why it should not be. The so-called development stage of any project is (should be?) especially messy. Although the benefits of version control are not apparent during this stage, they cannot be stressed enough, because version control gives you the ability
to "go back" in time,
to provide a sensible, repeatable, dependable work environment,
to lead into a development process that you can rely upon.
A branching model is also needed. This, coupled with a quick and easy devops procedure (which works so well with git), should be employed from day 1.
When I did my website development, I used git extensively (although not GitHub), and since GitHub has excellent support for private repos, there is no reason not to use it. You could also take a look at GitHub Pages, which creates websites from your repos!
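If you do try GitHub Pages, one common setup (among several; check the current GitHub docs and your repository settings) is to publish from a dedicated branch:

# publish the current state of the site from a gh-pages branch
git checkout -b gh-pages
git push origin gh-pages
# the site is then typically served at https://YOUR_USERNAME.github.io/YOUR_REPO/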

Github and Dropbox conflict risk?

I have a dev folder with all my projects. Some of these are on github and some are not. I also use Dropbox (with symlinks) to keep my data synchronised across several computers.
For example if I add something to my Documents folder on one PC I can then see it in the corresponding folder on another PC.
My question is: If I do the same with my dev folder (so the dev folder is synced by Dropbox on both PCs) will it cause problems with my pushing to github?
You don't ever want to mix code versioning strategies. Either all of your code lives in git (which is a good idea), or it all lives in Dropbox (which doesn't give you any history, hence a very bad idea).
When you add a source file to git, you should push it to GitHub so it can be pulled at a later date.
I get the feeling that you will run into issues when pushing the code - you'll be adding new files through one source but pulling through another - and it'd turn into a headache more than a benefit.
I'm not sure exactly how you could "prove" that it's OK, but I have used exactly this development model with no issues. I personally don't use symlinks in my Dropbox, but that shouldn't affect anything. All of my git repos are on my Dropbox, and I've been working this way for over a year across OS X, Windows, and Ubuntu. All of my commits and pushes have worked just fine.
Also, this may be a repeat of this question: Using Git and Dropbox together effectively?
[edit:]
Actually one thing was recently brought to my attention is that you might run into an issue with line endings across systems. This post from GitHub (with a link back to an SO question) explains how to deal with line endings.
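The usual fix described in that kind of post, roughly sketched (double-check the exact recommendation against the linked article):

# one-off, per-machine setting:
git config --global core.autocrlf true    # on Windows: check out CRLF, commit LF
git config --global core.autocrlf input   # on macOS/Linux: commit LF, leave files as-is
# and/or commit a .gitattributes file containing "* text=auto"
# so the normalization travels with the repository:
echo "* text=auto" >> .gitattributes
git add .gitattributes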
I had the same question and now my answer is "simply move your repository out of Dropbox".
As you can see, Using Git and Dropbox together effectively? is not quite the same question, but if you search for the keyword "GitHub" there, you will see the debate around this confusion, and maybe you can make your own decision.
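If you do decide to pull the working copy out of Dropbox, the move itself is trivial, since the GitHub remote is stored inside .git and does not depend on where the folder lives (the paths below are made up):

mv ~/Dropbox/dev/myproject ~/dev/myproject
cd ~/dev/myproject
git remote -v    # the GitHub remote still points at the same URL
git push         # pushes work exactly as before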

How do I do source code management without version control tools?

I work on a software project which has a suite of source code that undergoes periodic change. The code is typically promoted to a production environment, and development continues in a development environment. Emergency hotfixes in production need to be backported to development. A third environment for testing may also exist from time to time. Many developers work on this code at the same time, often needing to make changes to the same individual file.
In short, a classic use-case for version control software. Unfortunately, we have a stone age IT department, and we do all our development in a stock Windows XP environment with absolutely no possibility of using any other software without approval - which never happens. We are lucky to have Winzip.
So what's the best way of managing the above workflow without any real tools? At the moment we are just editing files on a Windows shared drive, making ad-hoc working copies into folders with names like "James's Copy of X", doing backups with Winzip, and calling across the room, "is anybody working on this file at the moment?"
Thanks,
James
Edit: Some clarifications:
The irony is that the system is hardly locked down at all - I could download, install and configure TortoiseHg in about 7 minutes. But I need to do this by the book.
I am also actively pursuing getting version control software through official channels, but ETA for that is 6-9 months if ever, so I'm just trying to do the best I can with what I have now.
Finally, trust me, you will be reading about this project on TheDailyWTF one day, so please help me out with what I can do now rather than what management should have done last week.
Get source control. Talk to management, refuse to work, do whatever you can to get it in.
Bring in a netbook, install an SVN server and use that. Run Git off USB drives (sketched below).
Really - anything.
It is not just an industry standard now - it is irresponsible of you and your management to continue working like this.
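The USB-drive idea, sketched roughly. The drive letter and paths are hypothetical, and you would still need a portable Git build on the machine:

# create a bare repository on the USB stick to act as the shared "server"
git init --bare E:/project.git
# turn the current working folder into a repository and push to the stick
cd C:/work/project
git init
git add .
git commit -m "Baseline of the current code"
git remote add usb E:/project.git
git push usb master   # branch name may differ on newer Git versions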
After notifying management and explaining that this is an issue, if they do nothing, just let the inevitable happen. Something that shouldn't have been promoted to production will be (regression, bug, new feature, whatever). When they come to blame you, explain how source control can help ensure that such things do not happen again. Perhaps they will listen then.
Ok, two actual options occur to me here.
First. You have Winzip, and you appear to have web access since you're posting on Stack Overflow. Assuming you have the ability to upload files (which isn't a given, since you're still using a generic StackOverflow avatar) you could find - or build - an externally-hosted service that'll allow you to upload a ZIP file via a web browser, unzip it, and then commit the unzipped contents to a Git or Subversion repository. Stick a secure web front end on it (Apache + mod-dav-svn) and you'll have the ability to browse, review and commit changes to individual files. You won't get the benefits of local SVN/GIT capabilities like merging, but you'll have centralized project history. There could even be a quite lucrative business model in this - selling web-based SCM to developers who are stuck on IE6 and WinXP and can't install anything.
Second: you find a junior/admin in your IT team who's just as frustrated as you are at the draconian restrictions being enforced, persuade them that you know what you're doing, and get them to 'accidentally' set up a local administrator account on your workstation. WinXP is sufficiently insecure out of the box that it shouldn't be too hard to make this look like an accident.
Copy and paste the files into a separate folder and call that folder vers_x, or get the Windows backup utility to save them each day, or use another backup utility to do the same. Though the first post is correct: strike until version control is implemented.
Get Git. Run git init on the current source "repository". Install Git on everyone's PCs.
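A rough sketch of that, assuming the code currently lives on a shared drive (the UNC path is made up):

# turn the existing shared folder into a repository
cd //fileserver/share/project
git init
git add .
git commit -m "Baseline of the current source"
# developers then clone it locally instead of editing the share directly
git clone //fileserver/share/project C:/dev/project
# (to push changes back, you'd typically keep a separate bare copy,
#  as in the USB-drive sketch in the earlier answer)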
Also, the strike idea is definitely not so bad in this case.