get historical languages data from GitHub API - github

I want to build a timeline of language usage across my private repositories. The GitHub API reports the number of bytes per language with GET /repos/:owner/:repo/languages,
and the metadata for each commit with GET /repos/:owner/:repo/commits/:sha, but the commit response has no 'languages' attribute.
The brute-force approach is to clone my repositories and measure the languages myself at each commit in the history. Is there a smarter way to do this?
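For reference, the brute-force approach described above could be sketched roughly as follows in Python. It walks the history of a local clone and sums blob sizes per language from `git ls-tree -r -l`. The extension map and the function names are assumptions made for illustration; GitHub's own numbers come from its Linguist library, so the results will not match the API exactly.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; extend it for your own repositories.
EXTENSION_LANGUAGES = {
    ".py": "Python", ".js": "JavaScript", ".java": "Java", ".kt": "Kotlin", ".c": "C",
}


def bytes_per_language(ls_tree_output):
    """Sum blob sizes per language from `git ls-tree -r -l <commit>` output."""
    totals = Counter()
    for line in ls_tree_output.splitlines():
        # Each line is "<mode> <type> <object> <size>\t<path>".
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if len(fields) != 4 or fields[1] != "blob":
            continue  # skips submodule entries, whose type is "commit"
        size = fields[3]
        if not size.isdigit():
            continue
        language = EXTENSION_LANGUAGES.get(PurePosixPath(path).suffix.lower())
        if language:
            totals[language] += int(size)
    return dict(totals)


def language_timeline(repo_path):
    """Yield (commit_sha, committer_date, bytes_per_language), oldest commit first."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--format=%H %cI"],
        capture_output=True, text=True, check=True,
    ).stdout
    for entry in log.splitlines():
        sha, date = entry.split(" ", 1)
        tree = subprocess.run(
            ["git", "-C", repo_path, "ls-tree", "-r", "-l", sha],
            capture_output=True, text=True, check=True,
        ).stdout
        yield sha, date, bytes_per_language(tree)
```

Because `git ls-tree` reads sizes straight from the object database, no checkout per commit is needed; on long histories it may still be worth sampling commits (for example one per month) rather than visiting all of them.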

Related

Programmatically create a commit containing Git-LFS data to a Github Repository

Either by using the GraphQL API (preferred, but seems unlikely), the REST API, or any other means.
Ideally it would be possible to commit the LFS objects and regular objects in the same commit.
I have been using the GraphQL API to successfully query and download LFS objects. I was hoping that it would be possible to also create commits that contain LFS objects.
Unfortunately it seems like createCommitOnBranch doesn't provide a documented way of doing that. So I am left with looking for alternatives.
A version of this question is in the Github Community GraphQL category: https://github.community/t/create-a-commit-containing-lfs-data-via-graphql/252637

View repository activity across all repositories under an account at the same time?

Is there a way to view repository traffic for all repositories on your account at the same time? (without creating your own custom dashboard using the Github API). It would be very convenient. I suspect a bash script might do this without too much effort (e.g. get all repo names, get the traffic/stars stats for each repo in the list). But I want to be sure something obvious doesn't already exist before writing anything myself
I am not aware of any native dashboard that aggregates multiple GitHub repositories into one convenient view.
You would therefore have to rely on third-party scripts such as nchah/github-traffic-stats (Python):
Get statistics on web traffic to your GitHub repositories.
Since the traffic data is limited to the last two weeks, you might have to record those statistics over time (example: Microsoft/GitHubTelemetryParsor)
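If you do end up writing the script yourself, a minimal Python sketch could look like the one below. It uses the REST endpoints GET /user/repos and GET /repos/{owner}/{repo}/traffic/views; the function names and the GITHUB_TOKEN environment variable are assumptions for illustration, and the token needs push access to each repository to read its traffic.

```python
import json
import os
import urllib.request

API = "https://api.github.com"


def github_get(path, token):
    """Perform an authenticated GET against the GitHub REST API and decode the JSON body."""
    request = urllib.request.Request(
        API + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def traffic_summary(repo_names, fetch):
    """Return {full_name: (views, unique_visitors)} for the last 14 days.

    `fetch` is any callable mapping an API path to decoded JSON, which keeps
    the aggregation logic testable without network access.
    """
    summary = {}
    for full_name in repo_names:
        views = fetch(f"/repos/{full_name}/traffic/views")
        summary[full_name] = (views["count"], views["uniques"])
    return summary


def main():
    # Assumption: the token is supplied through the GITHUB_TOKEN environment variable.
    token = os.environ["GITHUB_TOKEN"]

    def fetch(path):
        return github_get(path, token)

    # Only the first page (up to 100 repositories) is read in this sketch.
    repos = [repo["full_name"] for repo in fetch("/user/repos?per_page=100")]
    for name, (count, uniques) in sorted(traffic_summary(repos, fetch).items()):
        print(f"{name}\t{count}\t{uniques}")
```

Calling `main()` from a daily scheduled job and appending the output to a file would be one way to keep more than two weeks of history.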

Is there a way to get the raw diff of a commit via the Azure Devops API?

As part of our application, we're building an ability to integrate with Azure DevOps' REST API. One key component that we're interested in is being able to see actual diffs of specific commits, so that we can look at and analyze the line content. We've already created this integration for GitHub, GitLab, and Bitbucket, and each time it was easy: There's a fairly simple diff endpoint for each that takes in a specific commit ID and diffs it (sometimes with a specific parent commit).
I've not had much luck finding this same functionality in Azure DevOps, however: The diffs endpoint has some data related to this, but it is really just an overview of which files changed and the high-level nature of those changes, along with the IDs of specific blobs that represent the files in each state (before and after).
It's theoretically possible to use those blobs to manually construct what I'm after, and indeed I've been able to query for the before and after blobs to get a diff on each file. But that's two separate endpoint queries per file -- take a twenty-file commit, and suddenly we'd need 40 API calls just to construct a reasonable diff. That doesn't really fit our performance needs, unfortunately.
Is there a separate API endpoint or technique that lets us get to the raw diff? It doesn't need to be a raw diff a la git diff directly, just anything that lets us see the before and after state of each line (rather than each file) with minimal API calls (preferably just one). I've done much scouring through the docs and here on StackOverflow, and not found anything that accomplishes this.
There is no existing REST API that meets your needs, but you could use the following steps to get the content of the git diff.
Step 1: Use the REST API to get the commit ID.
GET https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/commits?api-version=5.0
Step 2: Use the REST API to get the commit by its commit ID.
GET https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/commits/{commitId}?changeCount={changeCount}&api-version=5.0
From the REST API result, record the values of "parentsid" and "path".
Step 3: Use the following API to get the diff content.
POST https://dev.azure.com/{organization}/{project}/_api/_versioncontrol/fileDiff?__v=5&diffParameters={value}&repositoryId={repositoryId}
The {value} is a JSON object.
Here is an example:
{"originalPath":"filepath","originalVersion":"Parentsid","modifiedPath":"filepath","modifiedVersion":"commitid","partialDiff":true,"includeCharDiffs":true}
Add this value to the API URL.
Then call the API; the result will contain the git diff content (2 means a removed line, 1 means an added line).
Here is a result sample:
This is the ticket I referred to; I hope it helps.
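The URL from Step 3 could be assembled along these lines. This is only a sketch: the function name `file_diff_url` is hypothetical, the fileDiff endpoint is taken from the answer above rather than from the published REST reference, and its behaviour may change without notice.

```python
import json
from urllib.parse import quote


def file_diff_url(organization, project, repository_id, path, parent_id, commit_id):
    """Build the fileDiff URL for one file between a parent commit and a commit."""
    diff_parameters = {
        "originalPath": path,
        "originalVersion": parent_id,   # the parent commit recorded in Step 2
        "modifiedPath": path,
        "modifiedVersion": commit_id,
        "partialDiff": True,
        "includeCharDiffs": True,
    }
    # The JSON value must be percent-encoded before it is placed in the query string.
    encoded = quote(json.dumps(diff_parameters, separators=(",", ":")), safe="")
    return (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_api/_versioncontrol/fileDiff?__v=5&diffParameters={encoded}"
        f"&repositoryId={quote(repository_id, safe='')}"
    )
```

Note that this still needs one request per changed file, so a twenty-file commit takes twenty calls rather than forty.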

Is there a way to retrieve Github repositories whose language has been changed?

I am trying to find all repositories whose language was Java but has changed to Kotlin, and vice versa.
Does anyone know if it's possible to filter these repositories with the GitHub API?
If you want to compare the programming language of particular GitHub repos before and after a change, I'm not sure you can do that short of running a big-data project.
If you want to filter GitHub repos by programming language, the GitHub API documentation states:
Suppose you want to search for popular Tetris repositories written in Assembly. Your query might look like this.
curl "https://api.github.com/search/repositories?q=tetris+language:assembly&sort=stars&order=desc"
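The same search URL can be built programmatically. Here is a small Python sketch; the helper name `build_search_url` is an assumption, and it only constructs the URL without sending the request.

```python
from urllib.parse import urlencode


def build_search_url(keyword, language, sort="stars", order="desc"):
    """Build a GitHub repository search URL filtered by the current primary language."""
    query = {
        "q": f"{keyword} language:{language}",
        "sort": sort,
        "order": order,
    }
    # urlencode turns the space into '+' and percent-encodes the ':' qualifier separator.
    return "https://api.github.com/search/repositories?" + urlencode(query)
```

The `language:` qualifier matches a repository's current primary language only, so on its own it cannot tell you what the language used to be.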
Also,
Check out my open-source project Git-Captain, which may help you.
It's an open-source web application built with Node.js that uses the GitHub API to find, create, and delete a branch across numerous GitHub repositories.
It can be set up for organizations or for a single user.
I have a step-by-step guide on how to set it up on a server to communicate with the GitHub API.

GitHub to share a set of SPARQL queries

I am using GitHub to share a set of SPARQL queries:
http://www.boisvert.me.uk/opendata/sparql_aq+.html?file=specific%20sensor.txt
Currently the simple work allows end-users to access queries stored on the github repository, but ultimately I want to allow them to also modify the queries, as with a pastebin, and make use of the repository to better manage the shared system. Ideally I would want end-users who may not be very tech-savvy, to be able to make minor changes to queries to an open, linked data endpoint: so to keep the technology barrier low.
My problem is this: how best to structure the github project and exploit the API to make the most of the available information? I can think of different points:
Currently the project (https://github.com/boisvert/unshaql) holds client code and example queries. Does it make a difference to create an independent project (separate from the web client code) for SPARQL queries?
I would use directories within the project to classify/tag queries, and file names to title them. Are there better alternatives? It strikes me that a hierarchical structure is not a good fit to tags.
When end-users save, a simpler (and cruder) option is to allow them to push their file into just one branch, which holds the examples. A better engineered one would be to allow them to use their github credentials to fork the set of SPARQL queries and edit theirs, but with unaware users, how do I avoid creating a mess?
I think that a regular GitHub repository is a rather bad fit for this kind of content. If your users have a GitHub account, you should probably use Gists instead: https://help.github.com/articles/about-gists/ I have never used this myself, but it seems well suited to what you are planning. Your site could become a DB of tags over user-provided gists. That would, however, lock you into GitHub-specific solutions.
Even if you go for a regular repository, you should not allow the users to commit into the repository hosting your code: that would be a serious security hazard as you won't be able to control the parts of the repository to which they are allowed to commit.
If you set up two repositories, it's rather easy to keep the code of the webpage in one repository and have the queries automatically committed to another repository (under an anonymous identity, so that your users don't have to create a GitHub account).
Also, note that the OAuth token should never be stored in a public repository (or the GitHub robots will invalidate it in a matter of hours).
See Hiding GitHub token in .gitconfig for a solution to this sub-problem.
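As an alternative to the .gitconfig approach, a common pattern is to read the token from the server's environment at run time. The sketch below assumes a Python backend and a variable named GITHUB_TOKEN; both are illustrative choices, not part of the setup described above.

```python
import os


def load_github_token(env_var="GITHUB_TOKEN", environ=None):
    """Read the token from the environment so it never appears in the repository."""
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running; do not hard-code the token.")
    return token
```

The token must stay on the server side; code shipped to the browser cannot keep it secret.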