I have a very simple question regarding the general organisation of folders on GitHub.
Because I often forget to commit specific GitHub projects, I started to group different projects and folders in large generic folders.
For instance, I would have a general folder called all_projects, and put project 1, project 2, and so on inside it. Then I would simply git add . everything at once.
As my general folders get bigger and bigger, I was wondering if there are major drawbacks to this kind of organisation and how you would do it differently.
I think it's a bad practice to put everything you do in the same folder. Not locally, but on GitHub.
Locally, you can create a folder called Programming or my_work or all_projects and create your projects separately inside it. The projects should also have proper names, not arbitrary ones. For example, if the project does web scraping of Instagram, name it Instagram web scraper or InstagramScraper or something that lets you remember almost instantly what you did just by reading the project (folder) name.
Also, consider whether the project is large or small.
If the project is something you've built over a long time, with a big project structure and lots of files connected to each other, then you can push it to GitHub with a properly named repo and a README.md. It is self-contained and it shouldn't be part of some other project that has nothing to do with it.
If it's small, like one script, or just something you practiced while learning, consider using https://gist.github.com . It's connected to your GitHub account and it can hold small scripts.
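For instance, a minimal sketch of publishing one small script as a gist with the GitHub CLI (assuming gh is installed and authenticated; the filename and description are placeholders):

```
# Publish a single small script as a gist tied to your GitHub account.
gh gist create instagram_scraper.py --desc "Instagram web scraper practice"
```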
P.S.
This is my personal opinion.
Does GitHub have public access restrictions?
Example file:
https://raw.githubusercontent.com/vuejs/vue/dev/package.json
What will happen if a million users download this file?
This is from a GitHub employee in regard to "raw" file access:
I spoke with our engineering team and learnt that there's a limit of 5000 requests per hour per IP address. Additionally, due to internal routing and caching, that 5000 figure isn't going to be exact. We may accept more but it's sometimes possible that we'll accept less too. As was pointed out to me, if you're at risk of hitting this limit, then you're probably doing something wrong and there's a better way to obtain or even store the file.
After 1+ year of waiting, they still haven't confirmed if this is accurate or updated the docs, so I'm guessing routing requests via the GitHub API and using tokens might be more reliable.
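For example, a hedged sketch of fetching the same file through the REST contents API with a token (assuming a personal access token in $GITHUB_TOKEN; authenticated API requests get their own documented hourly quota):

```
# Check your current rate limit status.
curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit

# Fetch the raw file via the contents API instead of raw.githubusercontent.com.
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
     -H "Accept: application/vnd.github.raw" \
     "https://api.github.com/repos/vuejs/vue/contents/package.json?ref=dev"
```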
Ref: https://github.com/littlebizzy/slickstack/issues/180
Ref: https://github.com/github/docs/issues/8031
I don't think there are any limitations. I have deployed many simple static websites on GitHub that get accessed by lots of users. At times I have seen them slow down a little during heavy traffic, but I have never hit any hard limitation.
GitHub definitions for "public" code access are very vague online, so I hope this helps anyone who was as confused as I was!
GitHub confuses "public" with "open source".
The former is a permission-based access designation and "git" workflow strategy on GitHub; the latter is a licensing issue and a broader code access paradigm. But they mix the two together to create a new workflow on their website for how code gets shared using source control with git. That confused me.
In general, a GitHub "public" repository means close to the same thing as "open source" in terms of access and use. It means any public GitHub repo can be viewed, downloaded, forked, etc. But anything beyond that, starting with "write" access on the owner's original code base, requires the "owner" of the repo to add that person as a "collaborator". I interpret that to mean unlimited and unrestricted access to copy, download, and view your code by any person, machine, process, etc.!
However, the sample open source licenses (like the GNU GPL 3.0, etc.) they recommend you create or use for your projects might legally limit some use of your code. But they are not going to help you enforce or limit that. Once your code is online, there is no script or lawyer or enforcing entity that can stop any of that. That is why it's called "open source". I have used the GNU "free beer" license for distribution of my personal code before and like it, though I've never seen a need to enforce it as far as limiting much. The main thing it would help with is making sure you remain the copyright owner of the code in the USA and a few other countries, and stopping big corporate entities from taking your code, claiming copyright, limiting free use, etc.
HOW GITHUB DEFINES "public"
Note: The following applies to GitHub individual accounts, not organization or enterprise accounts, which have much more granular control over GitHub code projects and repositories.
When you go public on GitHub, meaning you turn your repo to "public" access, you are allowing some form of "open source" or "free" use of the code. In the "git" world this could be many different things as far as both access and use. But in the GitHub world it implies full rights for people or machines to have "read" access by default when your repo is "public". What does that really mean as far as access and use? Well it means:
Anyone or any machine can view the code (they call it "visible") or code files online for free, including manually copying the code in a web browser. That means unlimited views and use of your code.
Anyone or any machine can "download" the code via their code download link. In the GitHub world that means a zip or other compacted wrapper of all the code files into a format you can download in one file. That means unlimited downloads of your code.
Anyone or any machine can "fork" (not "clone") the code. In the GitHub world that means GitHub copies the code and sticks that copy into your GitHub online web account, if you have one. This copy is a "fork" to them, though traditionally that's not what "forked software" means. With this copy a user can then download a "clone" of the forked code to their local machine, start modifying it, and push changes to the GitHub forked copy. They cannot do anything with those changes as far as changing your original code base without you setting them up as a "collaborator". But it does include sharing the fork with the world as well, which increases views and downloads of your code base to even more people you cannot track! So "public" means all the public clones, mirrors, or forks can be downloaded and shared as well.
BTW...."forking" the code in the GitHub world means copying the code with all the commit and git source history to their GitHub account so later - with more permissions granted by you - they can submit your code back to the original repository code base with a pull request for changes.
This confused me at first, as I thought a "public" repository at GitHub meant anyone can "clone" the original repo to their local box only, which would allow anyone to use a local copy of the GitHub remote repo and pull code updates. In that model they could never do push or pull request updates without additional permissions, which makes sense, but also could never share copies of your code online (unless they explicitly created a new repo at GitHub from your code base).
But that is not what "public" means to them. They want people to directly fork or copy projects into the public site and modify code on their platform using forks. That is the workflow GitHub encourages on "public" projects on their site. This allows any user or machine to make a full copy of everything and do whatever they like to that copy, including sharing and distributing it to others. This is why "public access" does open up your code to lots of crazy things including copies of your code spreading quickly across GitHub with no way to know how many people have truly used it in projects or even care to contribute back to your original.
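As a hedged illustration of the two paths (repository names here are placeholders; the fork command assumes the GitHub CLI gh is installed):

```
# Anyone can clone a public repo read-only; no permissions needed.
git clone https://github.com/someowner/somerepo.git

# Forking first copies the repo into your own GitHub account,
# then (with --clone) clones that copy locally.
gh repo fork someowner/somerepo --clone
```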
Personally, at all the companies I have worked at that use Git, I have never seen that type of model for distribution of repositories. We always cloned a master in a development environment and built branches remotely and locally from there. It feels like this was not thought through as it opens up distribution of your code into millions of versions of forks most people never asked for, cannot sync, and will forget about over time.
I tried a few searches related to this question on here and didn't see anything similar.
As part of a bootcamp I am working on, I have a few mini-projects I have to submit. These projects take between 2-10 hours to complete and typically only contain 1-5 files each. I've seen other students keep separate repositories for each of the projects, which allows them to have a clean readme.txt. Initially, I grouped these mini-projects into folders within one repository dedicated to them to reduce clutter. So I can see potential advantages to either approach. Either way, these small projects were designed to demonstrate very specific skills that are relevant to future employers, so I want them to be available for inspection.
So is there a "best practice" if I am intending to use GitHub as a portfolio of my code? Would it be better to declutter my repositories by keeping small projects together so my bigger projects can shine? Or to separate them so each project is readily in sight?
have separate repositories for each of the projects, which allows them to have a clean readme.txt.
You can have a clean readme.txt/readme.md with one repo, which would:
explain what the subfolder represents
list your subfolders
link to sub-readme.md (one readme.md per subfolder, per project)
That being said, you can create an account dedicated for this bootcamp, and keep your repos separate, that way you avoid the clutter with your main GitHub account.
I have a dev folder with all my projects. Some of these are on github and some are not. I also use Dropbox (with symlinks) to keep my data synchronised across several computers.
For example if I add something to my Documents folder on one PC I can then see it in the corresponding folder on another PC.
My question is: If I do the same with my dev folder (so the dev folder is synced by Dropbox on both PCs) will it cause problems with my pushing to github?
You don't ever want to mix code versioning strategies. Either all of your code lives in git (which is a good idea), or it all lives in Dropbox (which doesn't give you any history, hence a very bad idea).
When you add a source file to git, you should then push it to GitHub so it can be pulled at a later date.
I get the feeling that you will run into issues when pushing the code - you'll be adding new files in through one source, but pulling through another - it'd turn into a headache more than a benefit.
I'm not sure exactly how you could "prove" that it's OK. But I have used exactly this development model with no issues. Personally, I don't use symlinks in my Dropbox, but that shouldn't affect anything. All of my git repos are in my Dropbox. I've been working this way for over a year across OS X, Windows, and Ubuntu. All of my commits and pushes have worked just fine.
Also, this may be a repeat of this question: Using Git and Dropbox together effectively?
[edit:]
Actually one thing was recently brought to my attention is that you might run into an issue with line endings across systems. This post from GitHub (with a link back to an SO question) explains how to deal with line endings.
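As a minimal sketch of the usual fix (settings as documented by Git; pick the value that matches each platform):

```
# In the repo's .gitattributes, normalize line endings on commit:
#   * text=auto

# On Windows: check out CRLF, commit LF.
git config --global core.autocrlf true

# On macOS/Linux: keep LF on checkout, convert CRLF on commit.
git config --global core.autocrlf input
```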
I had the same question and now my answer is "simply move your repository out of Dropbox".
As you can see, Using Git and Dropbox together effectively? is not the same question, but if you just search the keyword "GitHub", you will see the debate around your confusion. And maybe you will make your own decision.
I am in the process of moving to Git from SVN. In SVN I had multiple Eclipse projects in a single SVN repository, which was convenient for browsing projects. I was going to move to having one Git repository per Eclipse project, but EGit suggests doing otherwise.
The guide for EGit suggests putting multiple projects into a single Git repository.
Looking at similar questions such as this one suggests one project per repository.
Which approach is best practice and what do people implement?
It depends on how closely-related these projects are. Ask yourself the following questions:
Will they always need to be branched/tagged together?
Will you want to commit over all projects, or does a commit mostly only touch one project?
Does the build system operate on all of them or do they have a boundary there?
If you put them all in one, some things from above will be easier. You will only have to branch/tag/stash/commit in one repository, as opposed to doing it for every repository separately.
But if you need to have e.g. separate release cycles for the projects, then it's necessary to have each project in an independent repository.
Note that you can always split up a repository later, or combine multiple repositories into one again without losing history.
Combining is a bit harder to do than splitting, so I would go for one repository first and see how it goes.
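If you do split later, a hedged sketch of one common route (git subtree is built in; directory and repository names are placeholders):

```
# Extract the history of one subdirectory into its own branch...
git subtree split --prefix=project1 -b project1-only

# ...then push that branch to a fresh, empty repository.
git push git@github.com:you/project1.git project1-only:master
```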
I use 1 repo per project.
Some reasoning:
When you discover you messed something up after several commits, it's much easier to fix when it's just one project. Just think about it: you made commits to two other projects, and now you need to fix the commit you did on the third project.
As Fedir said, your history and log are much cleaner, showing only the commits for that project.
It works better with the development workflow I have. I have a master branch for production, develop branch for, well, development, and I create branches to implement features (you can read more about it here: http://blog.avirtualhome.com/development-workflow-using-git/)
When you work in a team, and so "share" the git repo, do the team members really need all the other projects as well?
Just a few thoughts, but what it boils down to: Do what works for you.
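For reference, a minimal sketch of that kind of branch layout (branch names follow the linked workflow; the feature name is a placeholder):

```
# Long-lived branches: master for production, develop for integration.
git checkout -b develop master

# Feature work branches off develop and merges back when done.
git checkout -b feature/login develop
# ...commit work...
git checkout develop
git merge --no-ff feature/login
```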
I have multiple projects (Eclipse projects) and have tried different things to find out what worked best in terms of actual daily development. Here is what I found, and I think most people would find the same thing if they kept track of the results and analyzed them objectively.
In short, applying the following rules will give the best results:
Make a separate repository for each project group.
Each project group consists of a group of projects that are tightly connected to each other, that should be administered together, and that cannot be easily decoupled from each other.
A project group can contain a single project.
A project group that contains multiple projects should be examined to see if some of its projects can be decoupled from each other, so it can be split into smaller project groups that still contain projects that are tightly connected to each other, that should be administered together, and that cannot be easily decoupled from each other.
The following guidelines explain this process for determining which projects to put in the same repository in more detail:
If a project is not closely connected to any other project (for example, the project can be opened without other projects being opened and no other projects relies on the project being opened when they are opened) then you should definitely place it in its own repository for the reasons explained in the answers above this one.
If a project is dependent upon other projects, or other projects depend upon it, then it comes down to exactly how dependent they are on each other, how well they can be packaged together, and how easily they can be decoupled from each other.
A) For example a testing project that contains junit test classes to test the classes of a main project is a case where the two projects are very connected with each other, can easily be packaged together and cannot be easily decoupled from each other. These projects should be placed in the same repository for the reasons explained in part C below.
B) In a case where one project relies on another project to provide some sort of shared resources it really comes down to how well that they can be administered together and how easily that they can be decoupled from each other. For example if the project with the shared resources is relied upon by many projects, then it should be put in its own repository because the other unrelated projects are impacted by changes to the shared source code project. In a case like this, the shared resources project should be decoupled from the dependent projects instead of being directly connected to the dependent projects. (For example, it would be better to create versioned archive files [Jar files with a name like "projectName".1.0.1.0.jar for example] and include a copy of those in each project instead of sharing the resources by linking the projects together.)
C) If the multiple projects are connected, can be easily administered together but cannot be easily decoupled from each other, then it depends upon how tightly connected they are with each other.
I) If the projects are put into one repository, then the projects will be kept in sync with each other in the repository each time there is a commit, which can be a real life saver if the projects are tightly connected. However, this also creates the issues noted in the answers above this one.
II) If the projects are put into separate repositories, then you will have to take care to keep their commits in sync with each other, and be sure to include some sort of mechanism to indicate which commits belong to the same sync point across the projects (perhaps something like including the same sync point number in the commit comments of each project when a group of commits is done across the projects; see the sketch after this list).
III) So in cases like this, it is almost always better to put these projects together into a single repository to reduce the overhead of human effort in syncing the commits and to avoid human error should the commits need to be backed out. The only time that it might be better to place them in separate repositories is when only one of the projects is being changed regularly and the other connected projects are rarely changed.
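A hedged sketch of one such mechanism, using an annotated tag as the shared sync point (the tag name is a made-up convention):

```
# Run in each related repository after a coordinated set of commits.
git tag -a sync-0042 -m "Sync point 42 across the project group"
git push origin sync-0042
```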
I think this question is related to one I answered here. Basically, Git by its nature supports a very fine-grained structure when it comes to projects/repositories. I have read and been taught that one repository per project is almost always best practice. You lose almost nothing by keeping the projects separate, and you gain a lot, as others have been describing.
It will probably be more performant to work with multiple Git repositories.
If you make a branch, only that project's files will be branched, not all the projects.
A small project is faster to analyze and to commit; operations will take less time.
The log will be clearer as well, and you can set up more granular configuration if you have multiple Git repositories.
I am currently working on a project to convert a number of Excel VBA powered workbooks to VSTO solutions. All of the workbooks will share a number of class libraries and third party assemblies, in fact most of the work is done in the class libraries. I currently have my folder structure laid out like this.
Base
  Libraries
  Assemblies
  Workbooks
    Workbook1
    Workbook2
Each of the workbooks will be its own solution, and the workbook solutions just reference the assemblies in the folder structure. My question is how would you lay out the source control? Would you start the repository at the base? Or would you create a repository for each workbook solution? Would you rearrange the folders?
Now that we have the initial development done, we're about to have a bunch of outside developers come onto the project to help us convert the rest of the workbooks, and I really like the idea of them being able to check out from the base directory and have all of the dependencies ready to go. I also worry that there are other concerns that come with having 20+ solutions/projects under one source control repository.
I want everything to be as simple as possible for people joining the project but I don't want to sacrifice long term usability. In my mind I've been going back and forth, what's simpler one repository or one repository per solution?
I'd appreciate any insight you have, because I'm fresh out.
Additional Information: Currently, I am using Mercurial personally, but the project will probably get moved to StarTeam unless I can make some convincing arguments for something else.
You don't mention in your question which source control system you are using. As it doesn't sound like you need to limit your outside developers' access to the rest of the repository, I would not bother with setting up multiple repositories. I would assume that unless your code runs into millions of lines, repository size is not an issue.
It all depends on what functionality your revision control system supports. In Subversion you can declare other folders as externals and provide a URL for the content of each folder; this causes Subversion to treat that folder as a separate repository even though it sits within your folder structure.
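As a hedged sketch of that Subversion feature (the repository URL and folder name are placeholders):

```
# Map the Libraries folder to a path in another (or the same) repository.
svn propset svn:externals "Libraries https://svn.example.com/shared/Libraries" .
svn commit -m "Add Libraries as an external"
svn update   # pulls the external content into the working copy
```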