I need an index page that shows links to all GitHub repositories.
I think the lack of such a page is the reason why many repos are not found by crawlers like the Wayback Machine.
I think if there were such a site with a high ranking, they would start crawling it.
The developer site says there is an API for getting all repos.
Warning: GitHub hosts a huge number of repositories. You'll have to take this into account when designing your index.
I can think of a few options:
The legacy GitHub search API. You'll have to cope with the API rate limit though.
This StackOverflow answer could be a good start to get a rough grasp of the number of repos per language.
Leveraging the GitHub Archive project which records the public GitHub timeline. (Note: As the project only exposes events going back to February 12, 2011, you won't get any data about repositories that have shown no activity since that date.)
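As a rough illustration of walking the full list through the API, here is a minimal sketch. It assumes the REST endpoint `GET /repositories`, which pages through public repositories using a `since` parameter holding the last repository id seen. The function names (`fetch_page`, `iter_public_repos`, `to_link`) and the `max_pages` cap are made up for this example, and unauthenticated calls will hit the rate limit quickly.

```python
import html
import json
import urllib.request
from typing import Callable, Iterator, Optional

API_URL = "https://api.github.com/repositories"


def fetch_page(since: int, token: Optional[str] = None) -> list:
    """Fetch one page of public repositories with an id greater than `since`."""
    request = urllib.request.Request(
        f"{API_URL}?since={since}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_public_repos(
    fetch: Callable[[int], list] = fetch_page,
    since: int = 0,
    max_pages: int = 3,  # hypothetical safety cap; the full list is enormous
) -> Iterator[dict]:
    """Yield repository records page by page, stopping after `max_pages`."""
    for _ in range(max_pages):
        page = fetch(since)
        if not page:
            return
        yield from page
        # The id of the last repository on this page is the next cursor.
        since = page[-1]["id"]


def to_link(repo: dict) -> str:
    """Render one repository record as an HTML link for the index page."""
    url = html.escape(repo["html_url"])
    name = html.escape(repo["full_name"])
    return f'<a href="{url}">{name}</a>'
```

Calling `"\n".join(to_link(r) for r in iter_public_repos())` would produce the links for the first few pages. A real index would need to persist the last `since` value between runs and split the output across many pages.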
If you are in an organisation, there may be GitHub repositories that are private (i.e. you don't have access to them), but it would be useful to know that they exist, and then you could arrange access where appropriate.
In other words, we are trying to enable discoverability in a way that can lead to access. This could be done by sharing readmes (noting that people need to have some discipline to write sensible readmes).
This blog post, Solving the innersource discoverability problem, looks like a potential solution, but may require that the user has access to see all the repos in the portal? I'd like the user to be able to view readmes for all repos; if they don't have access, they can contact whoever is listed in the readme.
I see another option for making a file public from a private repo (using gitexporter to create a public repo with only the readme, example here). This makes it public, which is not my first preference, and would require every repo to do some work, which is far from ideal. While it doesn't give a neat portal, it should allow GitHub search functionality to find it by topic or keyword?
A related, perhaps simpler option is proposed here, where a student shares a readme from a private repo as a public GitHub page. Again, it requires a little work from every repo and gives no neat portal, but can it be found with GitHub search? While public GitHub Pages sites can be made private, they would then only be visible to those with repo access?
So, if I'm summarising basic requirements:
All org repos (public, private or team) have a readme that can be accessed by search by someone in the org (preferably not requiring each individual to modify their repo).
Additional nice to have features:
All readmes can be viewed in a portal with search
Bonus for being able to make super private (only collaborators can see readme - flag in readme?), org private (only people in the org - default) and public (flag in readme?).
Simple to implement!
Suggestions?
I think you have already provided a suitable solution within your question. Alternatively, you can use the APIs (GET repos, GET README of a repo) to fetch each repository's README, save it to a database or JSON file on a cron schedule, and build a web interface on top of that data.
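A minimal sketch of that collection step follows. It assumes the REST endpoints `GET /orgs/{org}/repos` and `GET /repos/{owner}/{repo}/readme`, with the README body returned base64-encoded in a `content` field. The helper names and the token handling are hypothetical, and the token must be allowed to read the org's private repositories.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional

API = "https://api.github.com"


def get_json(path: str, token: str):
    """Perform an authenticated GET against the GitHub REST API."""
    request = urllib.request.Request(
        f"{API}{path}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_readme(full_name: str, token: str) -> Optional[dict]:
    """Return the README payload, or None when the repository has no README."""
    try:
        return get_json(f"/repos/{full_name}/readme", token)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def decode_readme(payload: dict) -> str:
    """Decode the base64 `content` field into text."""
    raw = base64.b64decode(payload["content"])
    return raw.decode("utf-8", errors="replace")


def collect_readmes(
    full_names: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
) -> Dict[str, str]:
    """Map each repository name to its README text ('' when missing)."""
    readmes = {}
    for full_name in full_names:
        payload = fetch(full_name)
        readmes[full_name] = decode_readme(payload) if payload else ""
    return readmes
```

A cron job could list the repositories with `get_json(f"/orgs/{org}/repos?per_page=100", token)` (first page only; larger orgs need pagination), pass each `full_name` to `collect_readmes` with `lambda name: fetch_readme(name, token)`, and write the result out with `json.dump` for the web interface to read.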
But I'm going to elaborate on a few areas for improvement. The problem I see with this is the nature of the search. We aren't always looking for keywords; sometimes we are trying to find a fuzzy match for our problem, especially in a larger organization with more than a couple of thousand repositories. In those cases, a search engine implementation will provide much better results. In my opinion, we should collect the READMEs and FAQs, put them into Elasticsearch, and expose a search API for queries. The collection of READMEs and FAQs should be part of the CI/CD pipeline, so that pushing a new version to Artifactory also publishes the metadata.
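To show what fuzzy matching buys over plain keyword search, here is a small stand-in that uses Python's standard `difflib` instead of Elasticsearch. It is only a sketch: the `fuzzy_search` function, the 0.8 cutoff and the sample repository names are assumptions for illustration, and a real search engine would handle ranking, stemming and scale far better.

```python
import difflib
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fuzzy_search(query: str, readmes: Dict[str, str], cutoff: float = 0.8) -> List[str]:
    """Rank repositories by how many query terms approximately occur in the README."""
    terms = tokenize(query)
    scored = []
    for name, text in readmes.items():
        vocabulary = sorted(set(tokenize(text)))
        # A term counts as a hit when some README word is similar enough to it.
        hits = sum(
            1
            for term in terms
            if difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        )
        if hits:
            scored.append((hits, name))
    # Most hits first, then alphabetical for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


readmes = {
    "acme/billing": "Invoice generation and payment reconciliation",
    "acme/auth": "Authentication service with OAuth tokens",
}
print(fuzzy_search("paymnet invoices", readmes))
```

The misspelled query still finds the billing repository, which an exact keyword match would miss.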
This looks like a use case for internal repositories to me. You can find more about internal repositories here.
Whether you can use internal repositories or not highly depends on your company's policies.
Another thing to consider is that this will expose the entire contents of your repositories, not just the README.
We often have epic stories which span multiple repositories. I am looking for a mechanism to track all the work that is associated with a single story. GitHub has Issues, which is close to the solution I seek. The problem with Issues is that they do not span multiple repositories. On deployment day I still need to scan ~10 repositories (there are 100 repos, 10 are commonly used) to discover which ones have commits related to the story.
As a manual workaround I create multiple Issues, one for each repository. Then I manually list the issue numbers related to the epic story in Jira.
Is there a tool or alternative technique I can use to automatically combine these issues and treat them as one?
It would be a bit unusual to use both JIRA and GitHub Issues together. JIRA offers virtually everything that GitHub Issues does and more.
This guide from GitHub shows how you can integrate JIRA directly with GitHub, skipping Issues altogether. When properly configured you will see links to GitHub in mentioned JIRA issues. You can also trigger JIRA workflow changes based on keywords in your commit messages, much like GitHub Issues does out of the box.
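If commit messages carry the JIRA issue key, the deployment-day scan can also be scripted. The sketch below assumes you have already gathered commit subjects per repository (for example from `git log --format=%s` or the commits endpoint of the GitHub API). The function name and the key `PROJ-123` are hypothetical.

```python
import re
from typing import Dict, List


def repos_touching_story(
    commits_by_repo: Dict[str, List[str]],
    story_key: str,
) -> Dict[str, List[str]]:
    """Return, per repository, the commit messages that mention `story_key`."""
    # (?!\d) stops PROJ-123 from matching inside PROJ-1234.
    pattern = re.compile(rf"\b{re.escape(story_key)}(?!\d)", re.IGNORECASE)
    matches = {}
    for repo, messages in commits_by_repo.items():
        related = [message for message in messages if pattern.search(message)]
        if related:
            matches[repo] = related
    return matches


commits = {
    "web": ["PROJ-123 add banner", "fix typo"],
    "api": ["PROJ-1234 unrelated change"],
    "db": ["Migrate schema (proj-123)"],
}
for repo, messages in repos_touching_story(commits, "PROJ-123").items():
    print(repo, messages)
```

This only works if the team is disciplined about putting the key in every commit message, which the JIRA integration encourages anyway.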
I would like to review my GitHub activities for the past 12 months. On my public profile I can see as much as one month only. I can go to an individual repository and review all the commits for any period of time, of course, but this becomes unfeasible when I have to interleave data from dozens of repositories.
Is there an advanced query or page that allows me to see or download this aggregated data?
It looks like GitHub does not allow viewing contribution data for more than a month by default, but if you use its APIs you can build something that keeps track of your contributions for however long you want.
Here's where GitHub talks about contribution.
https://help.github.com/articles/viewing-contributions-on-your-profile-page/
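One possible way to build this is the GraphQL API. The sketch below assumes the `contributionsCollection` field on `user`, queried with `from` and `to` timestamps (as far as I know the span is limited to one year per query), and rolls the daily counts up by month. The function names and the token handling are illustrative only.

```python
import json
import urllib.request
from collections import Counter
from typing import Dict

QUERY = """
query($login: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $login) {
    contributionsCollection(from: $from, to: $to) {
      contributionCalendar {
        totalContributions
        weeks { contributionDays { date contributionCount } }
      }
    }
  }
}
"""


def fetch_contributions(login: str, start: str, end: str, token: str) -> dict:
    """POST the query; `start` and `end` are ISO 8601 timestamps."""
    body = json.dumps(
        {"query": QUERY, "variables": {"login": login, "from": start, "to": end}}
    ).encode("utf-8")
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def monthly_totals(response: dict) -> Dict[str, int]:
    """Sum daily contribution counts into 'YYYY-MM' buckets."""
    calendar = response["data"]["user"]["contributionsCollection"]["contributionCalendar"]
    totals = Counter()
    for week in calendar["weeks"]:
        for day in week["contributionDays"]:
            totals[day["date"][:7]] += day["contributionCount"]
    return dict(sorted(totals.items()))
```

For example, `monthly_totals(fetch_contributions("octocat", "2021-01-01T00:00:00Z", "2021-12-31T23:59:59Z", token))` would give one number per month for that year, assuming a valid personal access token.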
I have several projects on GitHub, and they all have the traffic graph where I can view how much traffic my repository is getting.
The blog post I had linked is very vague about visitors. It states:
..how many unique visitors it's had..
I just find it odd that some of my repositories have daily activity, but I'm not sure if most of those views are me, and if they are, why does it say "unique visitors" when I would be the only unique visitor?
Question:
Does the traffic graph used on GitHub include yourself when navigating through your own source? It's very minor, but I'm genuinely curious whether the views I'm getting are from me navigating through the source, or whether people are actually browsing through my source.
Specifically, I mean the line that shows "Views", not "Unique visitors", because unique visitors will obviously mean new people browsing the repository.
For those who think this is offtopic, re-read the on-topic post.
Most notably:
but if your question generally covers… software tools commonly used by programmers
OK I just contacted support and received a response:
Hello -
> Do the numbers in the traffic graphs include your own views? What about the view of contributors?
Thanks for getting in touch! Yes, the numbers include everyone's
views including repository owners and contributors. There's no way to
filter this information at the moment, but I can definitely add that
as a feature request for the team to consider.
Hope that answers your question - thanks!
So it does include your own views, but they might add the option to filter it later.
It looks like this behavior has changed, and the repository owner's views are no longer counted when the owner is logged in.
A recent support question asked this, among others, and received the following reply from a member of staff:
My visit to my repository also count as a visit?
No, viewing your own repository while signed in doesn’t count towards this data.
I have checked that this holds by looking at one of my repositories: the graph shows no views even on days when I visited the repository several times.
According to the latest update from GitHub support, as of 22 December 2021, the owner's own views of the owner's repositories are still recorded.
From what I have experienced, as of November 2021, it does count the owner as a viewer. It is unclear as to why it counts as a unique viewer.
To be specific, I have seven repositories on GitHub, all of which have been inactive. On the 8th of November, 2021, I decided to check the traffic for all of them. Besides the portfolio, none of them had gained traffic. The next day, all the ones I had checked the day before had gained traffic. Coincidence? No.
Yep, it appears that GitHub counts your own visits to your repositories too. In this image, the "Traffic" page has 8 views from 1 visitor. Given that the Traffic page is available only to the owner of the repository, you can deduce that GitHub counts your own visits too.
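Since the answers disagree, you may want to check this for your own repositories. The sketch below assumes the REST endpoint `GET /repos/{owner}/{repo}/traffic/views`, which returns a `views` list with a `timestamp`, `count` and `uniques` per day and needs a token with push access. The helper names and the idea of keeping your own visit log are assumptions for illustration.

```python
import json
import urllib.request
from typing import Dict, List, Set, Tuple


def fetch_views(full_name: str, token: str) -> dict:
    """Fetch the daily view counts for one repository."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}/traffic/views",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def views_by_day(payload: dict) -> Dict[str, Tuple[int, int]]:
    """Map 'YYYY-MM-DD' to (views, unique visitors)."""
    return {
        entry["timestamp"][:10]: (entry["count"], entry["uniques"])
        for entry in payload["views"]
    }


def unexplained_days(payload: dict, my_visit_days: Set[str]) -> List[str]:
    """Days with recorded views on which you did not visit the repo yourself."""
    return sorted(
        day
        for day, (count, _uniques) in views_by_day(payload).items()
        if count > 0 and day not in my_visit_days
    )
```

If you note the days on which you browsed a repository while signed in, days that show exactly one unique visitor and match your notes suggest your own visits are being counted, while `unexplained_days` points to views that came from someone else.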
My company has a project where we need to share an SDK we have developed with initially a limited set of developers, but ultimately all developers, so that they can:
- download the SDK
- submit issues
- browse and post to a forum where other developers troubleshoot issues
- post reference clients that use the code which developers can download, contribute to...
Note: We don't want/need other developers to edit/contribute to the SDK itself. My company will retain control and edit/publish the SDK.
I am seeing sites like GitHub but I'm not sure it's the right fit. Could GitHub meet my needs? Are there other sites that might do this better?
GitHub provides private repositories for a fee. There's also Bitbucket, which also does Git and is now part of the Atlassian suite; it provides private repositories for free, but the number of users is limited. Launchpad does the same with Bazaar instead of Git, Microsoft has CodePlex and TFS on the Azure cloud, and I'm sure there's a gaggle of providers for other version control platforms.
In short, there are many, many options; one of them is sure to fit your needs and budget.