GitHub REST and GraphQL API are returning different data

I am scraping some data from GitHub. The RESTful URL to this particular PR shows that it has a merge_commit_sha value: https://api.github.com/repos/ansible/ansible/pulls/15088
However, when I try to get the same PR using the GitHub GraphQL API, it shows no mergeCommit value.
{
  resource(url: "https://github.com/ansible/ansible/pull/15088") {
    ... on PullRequest {
      id
      number
      title
      merged
      mergeCommit {
        message
      }
    }
  }
}
For context, the PR of interest is actually merged and should have a merged-commit value. I am looking for an explanation of the difference between these two APIs.

This link posted in the other answer contains the explanation:
As in, Git doesn’t have the originalCommit (which makes sense).
Presumably the original commit SHA is there, but the graphQL API actually checks to see if git has it, whereas the REST API doesn’t?
If you search for the commit SHA the API returns, you can't find it in the repo.
https://github.com/ansible/ansible/commit/d7b54c103050d9fc4965e57b7611a70cb964ab25
Since this is a very old pull request on an active repo, there's a good chance some old commits were cleaned up or the repo went through other maintenance. It's hard to tell, as that kind of maintenance isn't version controlled.
Another possibility is that the pull request was merged with a fast-forward, which does not create a merge commit. But that wouldn't explain the SHA in the REST API response.
So at some point old merge commits were probably removed to save space, or something similar. Some objects still point to removed SHAs, but the GraphQL API only returns objects that still exist.
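To check whether other PRs show the same discrepancy, you could compare the two responses side by side. This is only a sketch: it assumes you have already fetched and parsed both responses, and that the GraphQL query asked for mergeCommit { oid }. The function name and the returned labels are made up for illustration.

```python
from typing import Optional


def describe_merge_mismatch(rest_pr: dict, graphql_pr: dict) -> str:
    """Compare the merge commit reported by REST with the one from GraphQL.

    rest_pr: parsed JSON of the REST pull request response.
    graphql_pr: the PullRequest object from the GraphQL response,
    assumed to have been queried with mergeCommit { oid }.
    """
    rest_sha: Optional[str] = rest_pr.get("merge_commit_sha")
    # mergeCommit is null in GraphQL when no commit object is returned.
    merge_commit = graphql_pr.get("mergeCommit") or {}
    graphql_oid: Optional[str] = merge_commit.get("oid")

    if rest_sha and graphql_oid:
        return "match" if rest_sha == graphql_oid else "different commits"
    if rest_sha:
        # The situation in the question: REST has a SHA, GraphQL has none.
        return "REST only"
    if graphql_oid:
        return "GraphQL only"
    return "neither"
```

A result of "REST only" would be consistent with the explanation above, where the REST API still reports a SHA that no longer exists in the repo.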

This feels like a bug to me, because if you query another PR such as 45454, it does return the mergeCommit:
{
  "data": {
    "resource": {
      "id": "MDExOlB1bGxSZXF1ZXN0MjE0NDYyOTY2",
      "number": 45454,
      "title": "win_say - fix up syntax and test issues (#45450)",
      "merged": true,
      "mergeCommit": {
        "message": "win_say - fix up syntax and test issues (#45450)\n\n\n(cherry picked from commit c9c141fb6a51d6b77274958a2340fa54754db692)",
        "oid": "f2d5954d11a1707cdb70b01dfb27c722b6416295"
      }
    }
  }
}
I also found that others have encountered the same problem, along with another similar issue. I suggest raising this issue with GitHub.

Related

How to get the default merge method for a pull request from github graphql query?

Using a Github graphql query below I can see which merge options a repository has available:
{
  node(id: "<id>") {
    ... on PullRequest {
      number
      repository {
        mergeCommitAllowed
        squashMergeAllowed
        rebaseMergeAllowed
      }
    }
  }
}
that returns:
{
  "data": {
    "node": {
      "number": 666,
      "repository": {
        "mergeCommitAllowed": false,
        "squashMergeAllowed": true,
        "rebaseMergeAllowed": true
      }
    }
  }
}
But, I don't see a way to know which is the default. When I look at the pull request on Github (see below), it knows that rebase is the preferred method for merging my pull request. Perhaps there is some kind of "sticky" data attached to my user? (I'm sure the last pull request I closed was using rebase)
Is it possible to know which merge method is the default with a graphql query?
Here's an example query which returns the default merge method for the authenticated user. Replace owner and name as appropriate.
{
  repository(owner: "TeamCodeStream", name: "codestream") {
    viewerDefaultMergeMethod
  }
}
I believe this is sticky to your user and possibly the repository. I've used a non-default merge method on a repository in the past, and when I came back, the merge button suggested the same value, which was definitely not the value I typically used.
The API wouldn't show you this, since that information is not a property of the repository alone, but of both the repository and the user. API responses are not supposed to change based on who makes the request, unless there's an access control issue. In any case, when performing a merge through the GraphQL API, the API defaults to the MERGE method unless you specify something different.
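If you want to send the viewerDefaultMergeMethod query from a script, a minimal sketch of building the request body could look like this. The helper name is hypothetical, and it only builds the JSON; you would still POST it to https://api.github.com/graphql with your own token in the Authorization header.

```python
import json

# Same query as above, but with owner and name passed as GraphQL variables.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    viewerDefaultMergeMethod
  }
}
"""


def build_default_merge_method_request(owner: str, name: str) -> str:
    """Return the JSON body for the GraphQL request (hypothetical helper)."""
    payload = {"query": QUERY, "variables": {"owner": owner, "name": name}}
    return json.dumps(payload)
```

Using variables instead of string formatting avoids having to escape the owner and repository name yourself.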

How to trigger azure pipeline via API in a way it does not report it was manually triggered

We have an Azure pipeline building a static site. When there is a change in a content repository the site needs to be rebuilt. For that, we're using webhooks and Azure DevOps API. The request to queue the build is very simple and is illustrated for example here.
What I don't like about this is that in the build listing it says "Manually triggered for person XY", where person XY is the one who generated the credentials used in the API request. This is confusing, because it seems strange to label an API request as "manually requested". What would be the best way to get a more semantically correct message?
I've found there is a reason property which can be sent in the request. But none of the values seems to represent what I want and some of them do not work (probably need additional properties and there is no documentation for that).
Based on my test, when you use the REST API to queue a build and set the build reason, that reason is shown in the build listing (except for batchedCI and buildCompletion).
Here is a REST API example:
POST https://dev.azure.com/Organization/Project/_apis/build/builds?api-version=4.1
Request body:
{
  "definition": {
    "id": 372
  },
  "reason": "pullRequest"
}
The values checkInShelveset, individualCI, pullRequest, and schedule are displayed under their own names.
The values manual and none are displayed as manually triggered.
Other values (e.g. all, userCreated) are displayed as "Other build reason".
The values batchedCI and buildCompletion cannot be used:
batchedCI: continuous integration (CI) triggered by a Git push or a TFVC check-in, with the "Batch changes" option selected. This trigger requires batched changes, so it cannot be used to queue a build through the REST API.
buildCompletion: according to this ticket, this reason is not supported when queuing a build through the REST API.
Note: If you enter a custom or misspelled value, the build is always displayed as manually triggered.
In the end, I went with the all value and also overrode the person via the requestedFor property. This leads to the message "Other build reason", which seems usable to me.
{
  "definition": {
    "id": 17
  },
  "reason": "all",
  "requestedFor": {
    "id": "4f9ff423-0e0d-4bfb-9f6b-e76d2e9cd3ae"
  }
}
However, I'm not sure if there aren't any unwanted consequences of this "All reasons" value.
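For anyone scripting this, here is a rough sketch of a helper that builds the request bodies shown in the two answers above. The function name is made up, and the set of rejected reasons only reflects what was reported from testing above, not official documentation.

```python
from typing import Optional

# Reasons reported above as not usable when queuing a build via the REST API.
NOT_QUEUEABLE = {"batchedCI", "buildCompletion"}


def build_queue_body(
    definition_id: int,
    reason: str,
    requested_for_id: Optional[str] = None,
) -> dict:
    """Build the JSON body for the queue-build request (hypothetical helper)."""
    if reason in NOT_QUEUEABLE:
        raise ValueError(f"reason {reason!r} cannot be used to queue a build")
    body = {"definition": {"id": definition_id}, "reason": reason}
    if requested_for_id is not None:
        # Overrides the person shown in the build listing.
        body["requestedFor"] = {"id": requested_for_id}
    return body
```

The helper does not validate other reason values, since a misspelled value is reported to fall back to the manual trigger label rather than fail.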

Recommended way to list all repos/commits for a given user using github3.py

I'm building a GitHub application to pull commit information from our internal repos. I'm using the following code to iterate over all commits:
from github3 import login

# gc is the asker's own module holding the access token (not shown here)
gh = login(token=gc.ACCESS_TOKEN)
for repo in gh.iter_repos():
    for commit in repo.iter_commits():
        print(commit.__dict__)
        print(commit.additions)
        print(commit.author)
        print(commit.commit)
        print(commit.committer)
        print(commit.deletions)
        print(commit.files)
        print(commit.total)
The additions/deletions/total values are all coming back as 0, and the files attribute is always []. When I click on the url, I can see that this is not the case. I've verified through curl calls that the API indeed has record of these attributes.
Reading more in the documentation, it seems that iter_commits is deprecated in favor of iter_user_commits. Might this be why it is not returning all the information about the commits? However, this method does not return any repositories for me when I use it like this:
gh = login(token=gc.ACCESS_TOKEN)
user = gh.user()
for repo in gh.iter_user_repos(user):
In short, I'm wondering what the recommended method is to get all commits for all the repositories a user has access to.
There's nothing wrong with iter_repos with a logged in GitHub instance.
Here's what's happening (this is described in github3.py's documentation): when listing a resource from GitHub's API, not all of the attributes are actually returned. If you want all of the information, you have to request it for each commit individually. Your code should look like this:
from github3 import login

gh = login(token=gc.ACCESS_TOKEN)
for repo in gh.iter_repos():
    for commit in repo.iter_commits():
        # fetch the full commit; the listing only returns a subset of attributes
        commit.refresh()
        print(commit.additions)
        print(commit.deletions)
        # etc.

Are the GitHub repository id numbers permanent?

The GitHub v.3 API for repository data returns an integer identifier for a given repository. This identifier is the value of the field id in the data returned. In this example, it is the number 1296269:
[
  {
    "id": 1296269,
    "owner": {
      "login": "octocat",
      ... stuff omitted for brevity ...
    },
    "name": "Hello-World",
    "full_name": "octocat/Hello-World",
    ... stuff omitted for brevity ...
  }
]
Is a given identifier value ever reused once it is assigned to a repository, even if a repository or its owner account is subsequently deleted? In other words, are the identifiers unique and permanent, and never reused for any other repositories, ever?
In this context, I don't mean simply renaming a repository; that would not count as "reusing" it in the same sense, nor would replacing the content of a repository completely with other content. I am trying to understand specifically whether the GitHub id values are ever "recycled", if you will.
(I honestly searched in the GitHub documentation and the web, but could not find a statement either way. If I missed it, I apologize and would be happy to be pointed to the appropriate documentation.)
The Ruby toolkit for the GitHub API has relied on the uniqueness of a GitHub id since 2014: see issue 483 and PR 485.
At the time (2014), renamed repos were not supported, but since April 2015 they are:
If you have information about the repo from before it was renamed, you should have the id which is returned by the API. If you do, then to get resilient access to the repository, you just need to do
GET /repositories/1234
And you'll always get the repository, regardless of whether the name changes (assuming you still have access to it).
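In practice this means you can key your own records on the id rather than on the name. A small sketch, assuming you have a list of repository objects shaped like the response in the question (the helper names are made up):

```python
from typing import Dict, Iterable

API_ROOT = "https://api.github.com"


def index_by_id(repos: Iterable[dict]) -> Dict[int, str]:
    """Map each repository's numeric id to its current full_name."""
    return {repo["id"]: repo["full_name"] for repo in repos}


def url_for_id(repo_id: int) -> str:
    """URL for the GET /repositories/:id lookup mentioned above."""
    return f"{API_ROOT}/repositories/{repo_id}"
```

If a repository is renamed later, the stored id still resolves and only the full_name in your index goes stale.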

How to get all of a user's public github commits

Regardless of project, I'd like to know if there's an easy way of getting all commits to all public repositories for a single username.
Since I belong to multiple organizations, I'm trying to compile a list of the projects on which I'm a contributor, as well as projects that I have accepted pull requests.
So far my google-fu and looking through the GitHub API docs have proved insufficient.
https://connectionrequired.com/gitspective is your friend. :-) Filter out all but "Push", and you have your view, albeit without the coding work to implement it yourself first.
Inspecting what goes on with the Chrome DevTools "Network" tab might help you mimic the API queries, if you want to redo the work yourself.
The correct way to do this is via the Events API.
First you need to fetch the user's events:
GET /users/:username/events
Then you will want to filter the response array for items where type is set to PushEvent. Each one of these items corresponds to a git push by the user. The commits from that push are available in reverse chronological order in the payload.commits array.
The next step is to filter out commits made by other users by checking the author.email property of each commit object. You also have access to properties like sha, message and url on the same object, and you can eliminate duplicate commits across multiple pushes by using the distinct property.
EDIT: As pointed out by Adam Taylor in the comments, this approach is wrong. I failed to RTFM, sorry. The API lets you fetch at most 300 events and events are also limited to the last 90 days. I'll leave the answer here for completeness but for the stated question of fetching all commits, it won't work.
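For recent activity that fits within those limits, the filtering steps described above could be sketched like this. It assumes the events have already been fetched and parsed into a list of dicts with the payload shape described in this answer; the function name is hypothetical.

```python
from typing import Iterable, List


def commits_from_events(events: Iterable[dict], author_email: str) -> List[dict]:
    """Collect distinct commits by author_email from PushEvent items."""
    commits = []
    for event in events:
        # Only pushes carry a payload.commits array.
        if event.get("type") != "PushEvent":
            continue
        for commit in event.get("payload", {}).get("commits", []):
            # Skip commits already seen in an earlier push.
            if not commit.get("distinct", True):
                continue
            # Skip commits authored by someone else.
            if commit.get("author", {}).get("email") != author_email:
                continue
            commits.append(
                {
                    "sha": commit["sha"],
                    "message": commit["message"],
                    "url": commit["url"],
                }
            )
    return commits
```

Commits without a distinct field are kept here, which is an arbitrary choice for the sketch.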
UPDATE 2018-11-12
The URLs mentioned below have now moved to a single URL that looks like https://github.com/AurelienLourot?from=2018-10-09 but the idea remains the same. See github-contribs.
I'd like to know if there's an easy way of getting all commits to all public repositories for a single username.
The first challenge is to list all repos a user has ever contributed to. As pointed out by others, the official API won't allow you to get this information since the beginning of time.
Still you can get that information by querying unofficial pages and parsing them in a loop:
https://github.com/users/AurelienLourot/created_commits?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_repositories?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_pull_requests?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_pull_request_reviews?from=2018-05-17&to=2018-05-17
(Disclaimer: I'm the maintainer.)
This is exactly what github-contribs does for you:
$ sudo npm install -g @ghuser/github-contribs
$ github-contribs AurelienLourot
✔ Fetched first day at GitHub: 2015-04-04.
⚠ Be patient. The whole process might take up to an hour... Consider using --since and/or --until
✔ Fetched all commits and PRs.
35 repo(s) found:
AurelienLourot/lsankidb
reframejs/reframe
dracula/gitk
...
The GitHub GraphQL API v4 ContributionsCollection object provides contributions grouped by repository between two dates, up to a maximum of 100 repositories. from and to can be a maximum of one year apart, so to retrieve all contributions you will need to make multiple requests.
query ContributionsView($username: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $username) {
    contributionsCollection(from: $from, to: $to) {
      commitContributionsByRepository(maxRepositories: 100) {
        repository {
          nameWithOwner
        }
        contributions {
          totalCount
        }
      }
      pullRequestContributionsByRepository(maxRepositories: 100) {
        repository {
          nameWithOwner
        }
        contributions {
          totalCount
        }
      }
    }
  }
}
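Since from and to can be at most one year apart, you need to split the full date range into windows and run the query once per window. A minimal sketch of that splitting (the function name and the 365-day default are my own choices, not part of the API):

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def yearly_windows(
    start: datetime,
    end: datetime,
    max_span: timedelta = timedelta(days=365),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (from, to) pairs covering start..end.

    Each pair spans at most max_span; the end of one window is the
    start of the next.
    """
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_span, end)
        yield cursor, window_end
        cursor = window_end
```

Each pair can then be passed as the $from and $to variables, formatted as ISO 8601 timestamps. Because adjacent windows share a boundary instant, you may want to check for double counting at the edges.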
I know this question is quite old, but I've ended up coding my own solution to this.
The solution is to find all potential repositories where the user contributed, using the organization_repositories and list_repositories services (I'm using octokit).
Then we find all active branches (the branches service) on these repositories, and for each of them find only the commits from our user (the commits service).
The sample code is a little bit extensive, but can be found here
Note: As pointed out, this solution does not cover organizations and repositories that you contributed to but are not a member of.
You can get info about the user using the API method get-a-single-user.
After that you can find all of the user's repositories and then their commits with a function like this:
import requests


def get_github_email(user_login, user_name, key):
    '''
    :param str user_login: user login for GitHub
    :param str user_name: user GitHub name (may differ from user_login)
    :param str key: your client_id + client_secret from GitHub,
        string like '&client_id=your_id&client_secret=yoursecret'
    :return: email (str or None) or False
    '''
    url = "https://api.github.com/users/{}/repos?{}".format(user_login, key)
    # get repositories
    reps_req = requests.get(url)
    for i in reps_req.json():
        # take only repositories created by the user, not forks
        if "fork" in i and not i["fork"]:
            commits_url = "https://api.github.com/repos/{}/{}/commits?{}".format(user_login, i["name"], key)
            # get commits
            commits_req = requests.get(commits_url)
            for j in commits_req.json():
                # check that the author is the user (there may be commits from someone else)
                if j.get("commit", {}).get("author", {}).get("name") == user_name:
                    return j["commit"]["author"]["email"]
    return False