Using GitHub's API to get lines of code added/deleted per commit (on a branch)?

The following gets a raw list of commits for a project's master branch:
https://api.github.com/repos/<organization_name>/<repo_name>/commits?page=0&per_page=30
Question 1: How can one get a similar list but for a specific <branchname>?
Question 2: The list of commits above doesn't include any data about the lines of code added/deleted per commit (i.e., a very rough productivity metric). Is there a way to get this data in the query?

You can list the commits of a specific branch by passing sha=<branchName> as a query parameter to the /commits endpoint. From the documentation:
sha (string): SHA or branch to start listing commits from. Default: the repository’s default branch (usually master).
https://api.github.com/repos/<org_name>/<repo_name>/commits?sha=<branchName>&page=0&per_page=30
To get the per-file changes for each commit, follow the url field of each commit entity in the response to the request above. That endpoint returns more detailed information about a single commit, including a files field that lists the changes made in that commit, with the lines added and deleted per file.
An example with my repo:
https://api.github.com/repos/buraequete/orikautomation/commits?sha=master&page=0&per_page=30
If we follow the first commit's url:
https://api.github.com/repos/buraequete/orikautomation/commits/89792e6256dfccc5e9151d81bf04145ba02fef8f
This response contains the changes you want as a list in the files field:
"files": [
  {
    "sha": "8aaaa7de53bed57fc2865d2fd84897211c3e70b6",
    "filename": "lombok.config",
    "status": "added",
    "additions": 1,
    "deletions": 0,
    "changes": 1,
    "blob_url": "https://github.com/buraequete/orikautomation/blob/89792e6256dfccc5e9151d81bf04145ba02fef8f/lombok.config",
    "raw_url": "https://github.com/buraequete/orikautomation/raw/89792e6256dfccc5e9151d81bf04145ba02fef8f/lombok.config",
    "contents_url": "https://api.github.com/repos/buraequete/orikautomation/contents/lombok.config?ref=89792e6256dfccc5e9151d81bf04145ba02fef8f",
    "patch": "@@ -0,0 +1 @@\n+lombok.accessors.chain = true"
  },
  ...
]
Sorry, but I don't think there is a way to get those per-file changes from the original /commits list call; you have to make one additional call per commit.
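The two-step approach above (list the commits, then follow each commit's url) could be sketched roughly as follows. This is only a sketch under a few assumptions: the function names are made up for illustration, the runtime is assumed to provide a global fetch (for example Node 18+), and the calls are unauthenticated, so they are subject to rate limits.

```javascript
// Sum additions/deletions across the "files" array of one commit-detail response.
function sumLineChanges(commitDetail) {
  const files = commitDetail.files || [];
  return files.reduce(
    (totals, file) => ({
      additions: totals.additions + (file.additions || 0),
      deletions: totals.deletions + (file.deletions || 0),
    }),
    { additions: 0, deletions: 0 }
  );
}

// Hypothetical driver (not called here): list commits on a branch,
// then make one extra request per commit to read its "files" list.
async function lineChangesPerCommit(owner, repo, branch) {
  const listUrl =
    `https://api.github.com/repos/${owner}/${repo}/commits` +
    `?sha=${encodeURIComponent(branch)}&per_page=30`;
  const commits = await (await fetch(listUrl)).json();
  const results = [];
  for (const commit of commits) {
    const detail = await (await fetch(commit.url)).json();
    results.push({ sha: commit.sha, ...sumLineChanges(detail) });
  }
  return results;
}
```

Only the summing helper is pure; the driver makes one request per commit, which matches the "multiple calls" limitation described in the answer.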

Related

GitHub REST and GraphQL API are returning different data

I am scraping some data from GitHub. The RESTful URL to this particular PR shows that it has a merge_commit_sha value: https://api.github.com/repos/ansible/ansible/pulls/15088
However, when I try to get the same PR using the GitHub GraphQL API, the response has no mergeCommit value.
query {
  resource(
    url: "https://github.com/ansible/ansible/pull/15088"
  ) {
    ...on PullRequest {
      id
      number
      title
      merged
      mergeCommit {
        message
      }
    }
  }
}
For context, the PR of interest is actually merged and should have a merged-commit value. I am looking for an explanation of the difference between these two APIs.
This link posted in the other answer contains the explanation:
As in, Git doesn’t have the originalCommit (which makes sense).
Presumably the original commit SHA is there, but the graphQL API actually checks to see if git has it, whereas the REST API doesn’t?
If you search for the commit SHA the API returns, you can't find it in the repo.
https://github.com/ansible/ansible/commit/d7b54c103050d9fc4965e57b7611a70cb964ab25
Since this is a very old pull request on an active repo, there's a good chance some old commits were cleaned up or other maintenance was done on the repo. It's hard to tell, as that kind of maintenance obviously isn't version controlled.
Another option is the pull request was merged with fast-forward, which does not involve a merge commit. But that wouldn't explain the SHA on the REST API response.
So probably at some point they removed old merge commits to save some space, or something similar. Some objects still point to removed SHAs, but GraphQL API filters on existing objects.
This feels like a bug to me, because if you query another PR such as 45454, it does return the mergeCommit:
{
  "data": {
    "resource": {
      "id": "MDExOlB1bGxSZXF1ZXN0MjE0NDYyOTY2",
      "number": 45454,
      "title": "win_say - fix up syntax and test issues (#45450)",
      "merged": true,
      "mergeCommit": {
        "message": "win_say - fix up syntax and test issues (#45450)\n\n\n(cherry picked from commit c9c141fb6a51d6b77274958a2340fa54754db692)",
        "oid": "f2d5954d11a1707cdb70b01dfb27c722b6416295"
      }
    }
  }
}
I also found that others have encountered the same problem, and there is another similar issue as well. I suggest raising this issue with them.
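Because mergeCommit can come back empty even for a merged PR, client code that reads this field probably needs to handle that case explicitly. Below is a minimal sketch, assuming the response shape shown above; the function name and the status labels are invented for illustration.

```javascript
// Classify a GraphQL response shaped like
// { data: { resource: { merged, mergeCommit: { oid, message } | null } } }.
function summarizeMergeCommit(response) {
  const pr = response && response.data ? response.data.resource : null;
  if (!pr) {
    return { status: "not-found" };
  }
  if (pr.mergeCommit) {
    return { status: "has-merge-commit", oid: pr.mergeCommit.oid || null };
  }
  // merged is true but mergeCommit is missing: the situation described above
  return { status: pr.merged ? "merged-without-commit" : "not-merged" };
}
```

A caller could fall back to the REST merge_commit_sha value when the status is "merged-without-commit", keeping in mind that, as noted above, that SHA may no longer exist in the repository.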

Find a string in a GitHub Pull Request

I'd like to build a bot to let you know if a certain string, like DONT_MERGE_ME appears in a GitHub Pull Request, so I can block the commit with a failed check and add a helpful comment for the developer.
Let's say you had committed code like the following, which you don't want to accidentally merge with your PR (e.g. you're hacking around).
const bar = 'some-hack-value'; // DONT_MERGE_ME
Given the PR id, I'd like to figure out if the PR still has the string DONT_MERGE_ME in it. However,
the GitHub Code Search API has many limits, like 384KiB max file size and only searches the default branch
the GitHub Commit Search API only searches the default branch
the GitHub Pull Requests Search API only searches by title/body/comment
Given the above limitations, it looks like the only way to figure this out, for a given PR id and commit, would be to find all commits in the PR up to this commit, download the diffs, and sum them up.
Is there a simpler way to do this with the GitHub API?
The approach I would recommend is to subscribe to the pull_request event. If payload.action is either opened or synchronize, load the diff of the pull request and look for the string in all lines that have been changed.
You can preview the diff response for a pull request by adding .diff to any pull request URL, e.g. https://patch-diff.githubusercontent.com/raw/gr2m/sandbox/pull/194.diff
Find the lines starting with a + and look for your string in them.
If you use the JavaScript octokit package, you can load a pull request's diff like this:
const { data: diff } = await octokit.rest.pulls.get({ owner, repo, pull_number, mediaType: { format: "diff" } });
Also check out the TODO GitHub App; it is open source, too.
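Scanning the diff for added lines can be sketched without any extra library. The helper below is a simplified illustration, not the TODO app's implementation: the function name is made up, and it treats any line starting with "+++ " as a file header, so an added line whose own content begins with "++ " would be misread.

```javascript
// Return every added line in a unified diff that contains `needle`,
// together with the file it belongs to.
function findAddedLinesContaining(diffText, needle) {
  const matches = [];
  let currentFile = null;
  for (const line of diffText.split("\n")) {
    if (line.startsWith("+++ ")) {
      // "+++ b/path" names the new version of the file
      currentFile = line.slice(4).replace(/^b\//, "");
    } else if (line.startsWith("+") && line.includes(needle)) {
      // Drop the leading "+" so only the source line remains
      matches.push({ file: currentFile, line: line.slice(1) });
    }
  }
  return matches;
}

// Small hand-written diff used only as sample input
const sampleDiff = [
  "diff --git a/src/foo.js b/src/foo.js",
  "--- a/src/foo.js",
  "+++ b/src/foo.js",
  "@@ -1,2 +1,2 @@",
  " const a = 1;",
  "+const bar = 'some-hack-value'; // DONT_MERGE_ME",
  "-const old = 'DONT_MERGE_ME';",
].join("\n");

const found = findAddedLinesContaining(sampleDiff, "DONT_MERGE_ME");
```

Removed lines (starting with -) are ignored on purpose, so a PR that deletes the marker does not fail the check.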
I discovered (thanks for the tip, @Gregor) that there is a GitHub API for getting the pull request as a diff, if you pass certain headers.
Here is how we can get the diff for a repo's PR:
const pullId = 14956; // NOTE: 73 files changed!
const repoFullname = 'eslint/eslint';
const url = `https://api.github.com/repos/${repoFullname}/pulls/${pullId}.diff`;
const diffStr = (await axios.get(url,requestConfig)).data;
Then we can use the parse-diff library to parse the diff and keep only the added lines whose content matches what we are looking for.
const parse = require('parse-diff');

// Search for this word
const KEYWORD = 'Requirements';
// Analyze all files
const files = parse(diffStr);
const filesWithMatchingAdds = files
  .map(file => ({
    file: file.to,
    adds: file.chunks
      .map(chunk =>
        chunk.changes
          // Only look for added lines
          .filter(change => change.type === 'add')
          // That match our keyword
          .filter(change => change.content.includes(KEYWORD))
      )
      .flat(), // collapse into one array
  }))
  // Only files with at least one match
  .filter(file => file.adds.length);
The output looks something like this:
[
  {
    "file": "tests/tools/internal-rules/multiline-comment-style.js",
    "adds": [
      {
        "type": "add",
        "add": true,
        "ln": 4,
        "content": "+// Requirements"
      }
    ]
  }
]
Full gist here.

How to get a commit SHA from a release or tag on Github API V3

Neither the release nor the tag response seems to include information (the SHA) about the commit it was made from. How can I get it if I only have a tag/release name like v1.2.3?
There's no specific endpoint in GitHub API v3 to get the commit SHA from tag/release name.
For your use-case, you can use the List tags endpoint to get all the tags for a particular repo, iterate over the response and get the desired tag details with the commit SHA.
Endpoint: GET /repos/:owner/:repo/tags
Sample response below:
[
  {
    "name": "v0.1",
    "commit": {
      "sha": "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      "url": "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc"
    },
    "zipball_url": "https://github.com/octocat/Hello-World/zipball/v0.1",
    "tarball_url": "https://github.com/octocat/Hello-World/tarball/v0.1"
  }
]
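Iterating over that response to pick out one tag could look roughly like this. It is a minimal sketch that works on an already-fetched array; the function name is hypothetical, and since list endpoints are paginated, a real client would probably need to repeat the lookup across pages until the tag is found.

```javascript
// Look up a tag by name in a "List tags" response and return its commit SHA,
// or null when the tag is not on this page of results.
function findTagCommitSha(tags, tagName) {
  const tag = tags.find((t) => t.name === tagName);
  return tag ? tag.commit.sha : null;
}

// Sample input copied from the response shown above (trimmed to the fields used)
const sampleTags = [
  {
    name: "v0.1",
    commit: {
      sha: "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      url: "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
    },
  },
];

const sha = findTagCommitSha(sampleTags, "v0.1");
```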

Obtain TFS GIT Commit Details From TFS Work Item Artifact Link

Is it possible to use the TFS or VSTS REST API to obtain details for a Git commit from the work item commit "ArtifactLink" url?
So you want to get detailed commit information based on a work item's artifact link (where the artifact link type is a commit).
You can achieve that with two REST API calls. The detailed steps are below:
1. Get the work item fully expanded
GET https://{instance}/DefaultCollection/_apis/wit/workitems/{id}?api-version=1.0&$expand=all
For TFS2015, the format looks like:
GET http://tfsServer:8080/tfs/DefaultCollection/_apis/wit/workitems?ids={id}&$expand=all&api-version=1.0
For VSTS, the format looks like:
GET https://account.visualstudio.com/DefaultCollection/_apis/wit/workitems?ids=7&$expand=all&api-version=1.0
2. Get commit(s) and related repo(s) linked in the above work item
Search the response from step 1 for the entries whose rel is ArtifactLink and whose url starts with vstfs:///Git/Commit. The URL format is:
vstfs:///Git/Commit/{project ID}%2F{repo ID}%2F{commit ID}
For example, part of the REST API response looks like this:
{
  "rel": "ArtifactLink",
  "url": "vstfs:///Git/Commit/b959f22b-eeb7-40dc-b37e-986377eaa86f%2F4cfde261-fec3-451c-9d41-a400ba816110%2Fb3c3c5b8718f403402be770cb3b5912df7c64dd6",
  "attributes": {
    "authorizedDate": "2017-09-26T03:14:03.98Z",
    "id": 92,
    "resourceCreatedDate": "2017-09-26T03:14:03.98Z",
    "resourceModifiedDate": "2017-09-26T03:14:03.98Z",
    "revisedDate": "9999-01-01T00:00:00Z",
    "name": "Fixed in Commit"
  }
}
The project ID is b959f22b-eeb7-40dc-b37e-986377eaa86f, the repo ID is 4cfde261-fec3-451c-9d41-a400ba816110, and the commit ID is b3c3c5b8718f403402be770cb3b5912df7c64dd6.
3. Get commit(s) details
Use the project ID, repo ID and commit ID from step 2 to get a single commit:
GET https://{instance}/DefaultCollection/{project ID}/_apis/git/repositories/{repo ID}/commits/{commit ID}?api-version={version}
For TFS 2015, the format looks like:
GET http://tfsServer:8080/tfs/DefaultCollection/{project ID}/_apis/git/repositories/{repo ID}/commits/{commit ID}?api-version=1.0
For VSTS, the format looks like:
GET https://account.visualstudio.com/DefaultCollection/{project ID}/_apis/git/repositories/{repo ID}/commits/{commit ID}?api-version=1.0
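Steps 2 and 3 can be sketched as a pair of small helpers: one splits the artifact link on the encoded %2F separators, the other fills in the commit URL template. The function names are hypothetical, and the relations field name on the work item is an assumption about the expanded response rather than something stated above.

```javascript
// Split "vstfs:///Git/Commit/{project ID}%2F{repo ID}%2F{commit ID}" into its parts.
// Returns null for links that are not Git commit artifact links.
function parseGitCommitArtifactLink(url) {
  const prefix = "vstfs:///Git/Commit/";
  if (!url.startsWith(prefix)) return null;
  // %2F is an encoded "/", so decode first and then split on "/"
  const parts = decodeURIComponent(url.slice(prefix.length)).split("/");
  if (parts.length !== 3) return null;
  const [projectId, repoId, commitId] = parts;
  return { projectId, repoId, commitId };
}

// Fill in the "get a single commit" URL template from step 3.
function commitDetailsUrl(instance, ids, apiVersion = "1.0") {
  return (
    `https://${instance}/DefaultCollection/${ids.projectId}` +
    `/_apis/git/repositories/${ids.repoId}/commits/${ids.commitId}` +
    `?api-version=${apiVersion}`
  );
}

// Assumption: the expanded work item lists its links under "relations".
function commitUrlsForWorkItem(workItem, instance) {
  return (workItem.relations || [])
    .filter((rel) => rel.rel === "ArtifactLink")
    .map((rel) => parseGitCommitArtifactLink(rel.url))
    .filter((ids) => ids !== null)
    .map((ids) => commitDetailsUrl(instance, ids));
}

// The artifact link from the sample response above
const sampleLink =
  "vstfs:///Git/Commit/b959f22b-eeb7-40dc-b37e-986377eaa86f" +
  "%2F4cfde261-fec3-451c-9d41-a400ba816110" +
  "%2Fb3c3c5b8718f403402be770cb3b5912df7c64dd6";
const sampleIds = parseGitCommitArtifactLink(sampleLink);
```

Decoding before splitting avoids the easy mistake of leaving the "2F" of the separator attached to the repo ID.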

Are the GitHub repository id numbers permanent?

The GitHub v.3 API for repository data returns an integer identifier for a given repository. This identifier is the value of the field id in the data returned. In this example, it is the number 1296269:
[
  {
    "id": 1296269,
    "owner": {
      "login": "octocat",
      ... stuff omitted for brevity ...
    },
    "name": "Hello-World",
    "full_name": "octocat/Hello-World",
    ... stuff omitted for brevity ...
  }
]
Is a given identifier value ever reused once it is assigned to a repository, even if a repository or its owner account is subsequently deleted? In other words, are the identifiers unique and permanent, and never reused for any other repositories, ever?
In this context, I don't mean simply renaming a repository; that would not count as "reusing" it in the same sense, nor would replacing the content of a repository completely with other content. I am trying to understand specifically whether the GitHub id values are ever "recycled", if you will.
(I honestly searched in the GitHub documentation and the web, but could not find a statement either way. If I missed it, I apologize and would be happy to be pointed to the appropriate documentation.)
The Ruby toolkit for the GitHub API has relied on the uniqueness of a GitHub id since 2014: see issue 483 and PR 485.
At the time (2014), renamed repos were not supported, but since April 2015 they are:
If you have information about the repo before it was renamed, you should have the id which is returned by the API. If you do, then to have resilient access to the repository, you just need to do
GET /repositories/1234
And you'll always get the repository, regardless of whether the name changes (assuming you still have access to it).
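In practice this means a client can store the numeric id next to the name and resolve the repository by id later. A minimal sketch, assuming the same https://api.github.com base URL used elsewhere on this page; the function name is made up for illustration.

```javascript
// Build the id-based lookup URL for a repository whose numeric id was stored earlier.
function repositoryUrlById(id) {
  if (!Number.isInteger(id) || id <= 0) {
    throw new TypeError("repository id must be a positive integer");
  }
  return `https://api.github.com/repositories/${id}`;
}

// The id from the example response in the question
const helloWorldUrl = repositoryUrlById(1296269);
```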