How to get all of a user's public github commits - github

Regardless of project, I'd like to know if there's an easy way of getting all commits to all public repositories for a single username.
Since I belong to multiple organizations, I'm trying to compile a list of the projects I've contributed to, as well as projects where I've had pull requests accepted.
So far my google-fu and searches through the GitHub API docs have proved insufficient.

https://connectionrequired.com/gitspective is your friend. :-) Filter out all but "Push", and you have your view, albeit without the coding work to implement it yourself first.
Inspecting what goes on with the Chrome DevTools "Network" tab might help you mimic the API queries, if you want to redo the work yourself.

The correct way to do this is via the Events API.
First you need to fetch the user's events:
GET /users/:username/events
Then you will want to filter the response array for items where type is set to PushEvent. Each one of these items corresponds to a git push by the user. The commits from that push are available in reverse chronological order in the payload.commits array.
The next step is to filter out commits made by other users by checking the author.email property of each commit object. You also have access to properties like sha, message and url on the same object, and you can eliminate duplicate commits across multiple pushes by using the distinct property.
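As a sketch of the filtering described above, assuming the events JSON has already been fetched (the helper name is my own; deduplication uses the `sha` together with the `distinct` flag):

```python
def commits_from_events(events, author_email):
    """Collect the distinct commits authored by `author_email` from a list
    of event objects returned by GET /users/:username/events."""
    commits, seen = [], set()
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        for commit in event.get("payload", {}).get("commits", []):
            # `distinct` is false when the same commit already appeared
            # in an earlier push; skip it to avoid double counting.
            if not commit.get("distinct", True):
                continue
            # Keep only commits authored by the user we care about.
            if commit.get("author", {}).get("email") != author_email:
                continue
            if commit["sha"] not in seen:
                seen.add(commit["sha"])
                commits.append(commit)
    return commits
```

The events themselves would be fetched with something like `requests.get("https://api.github.com/users/<username>/events").json()`, paginating as needed.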
EDIT: As pointed out by Adam Taylor in the comments, this approach is wrong. I failed to RTFM, sorry. The API lets you fetch at most 300 events and events are also limited to the last 90 days. I'll leave the answer here for completeness but for the stated question of fetching all commits, it won't work.

UPDATE 2018-11-12
The URLs mentioned below have now moved to a single URL that looks like https://github.com/AurelienLourot?from=2018-10-09 but the idea remains the same. See github-contribs.
I'd like to know if there's an easy way of getting all commits to all public repositories for a single username.
The first challenge is to list all repos a user has ever contributed to. As pointed out by others, the official API won't allow you to get this information since the beginning of time.
Still you can get that information by querying unofficial pages and parsing them in a loop:
https://github.com/users/AurelienLourot/created_commits?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_repositories?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_pull_requests?from=2018-05-17&to=2018-05-17
https://github.com/users/AurelienLourot/created_pull_request_reviews?from=2018-05-17&to=2018-05-17
(Disclaimer: I'm the maintainer.)
This is exactly what github-contribs does for you:
$ sudo npm install -g @ghuser/github-contribs
$ github-contribs AurelienLourot
✔ Fetched first day at GitHub: 2015-04-04.
⚠ Be patient. The whole process might take up to an hour... Consider using --since and/or --until
✔ Fetched all commits and PRs.
35 repo(s) found:
AurelienLourot/lsankidb
reframejs/reframe
dracula/gitk
...

The GitHub GraphQL API v4 ContributionsCollection object provides contributions grouped by repository between two dates, up to a maximum of 100 repositories. from and to can be at most one year apart, so to retrieve all contributions you will need to make multiple requests.
query ContributionsView($username: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $username) {
    contributionsCollection(from: $from, to: $to) {
      commitContributionsByRepository(maxRepositories: 100) {
        repository {
          nameWithOwner
        }
        contributions {
          totalCount
        }
      }
      pullRequestContributionsByRepository(maxRepositories: 100) {
        repository {
          nameWithOwner
        }
        contributions {
          totalCount
        }
      }
    }
  }
}
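To run such a query year by year, here is a minimal sketch using only the standard library (the helper names are my own, and a personal access token with the appropriate scopes is assumed):

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Abbreviated version of the query above (commit contributions only).
QUERY = """query ContributionsView($username: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $username) {
    contributionsCollection(from: $from, to: $to) {
      commitContributionsByRepository(maxRepositories: 100) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
    }
  }
}"""

def yearly_windows(start_year, end_year):
    """Yield (from, to) ISO-8601 pairs at most one year apart, as
    required by contributionsCollection."""
    for year in range(start_year, end_year + 1):
        yield (f"{year}-01-01T00:00:00Z", f"{year}-12-31T23:59:59Z")

def fetch_contributions(username, token, start_year, end_year):
    """POST the query once per yearly window and yield each JSON response."""
    for frm, to in yearly_windows(start_year, end_year):
        payload = json.dumps({
            "query": QUERY,
            "variables": {"username": username, "from": frm, "to": to},
        }).encode()
        req = urllib.request.Request(
            GRAPHQL_URL, data=payload,
            headers={"Authorization": f"bearer {token}",
                     "Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            yield json.load(resp)
```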

I know this question is quite old, but I ended up coding my own solution to this.
In the end the solution is to find all potential repositories the user contributed to, using the organization_repositories and list_repositories services (I'm using octokit).
Then we find all active branches (branches service) on these repositories and, for each of them, keep only the commits from our user (commits service).
The sample code is a little bit extensive, but can be found here
Note: as pointed out, this solution does not cover organizations and repositories that you contributed to but are not a member of.

You can get info about the user using the API method: get-a-single-user
After that you can list the user's repositories and their commits with a function like this:
import requests

def get_github_email(user_login, user_name, key):
    '''
    :param str user_login: user login for GitHub
    :param str key: your client_id + client_secret from GitHub,
                    string like '&client_id=your_id&client_secret=yoursecret'
    :param str user_name: user GitHub name (may not be equal to user_login)
    :return: email (str or None) or False
    '''
    url = "https://api.github.com/users/{}/repos?{}".format(user_login, key)
    # get repositories
    reps_req = requests.get(url)
    for i in reps_req.json():
        # take only repositories created by the user, not forks
        if i.get("fork") == False:
            commits_url = "https://api.github.com/repos/{}/{}/commits?{}".format(user_login, i["name"], key)
            # get commits
            commits_req = requests.get(commits_url)
            for j in commits_req.json():
                # check if the author is the user (there may be commits from someone else)
                if j.get("commit", {}).get("author", {}).get("name") == user_name:
                    return j["commit"]["author"]["email"]
    return False

Related

Github API get user code merged into master branch of specific repo

Just had a question regarding GitHub's REST API; I am looking through the documentation but can't seem to find what I am looking for. Is there a way to look up how much code a user has merged into the master branch of a specific repo over a period of time?
Essentially I would like a result like:
user X committed X lines of code in the range of May 2020 to May 2021. Something like that.
GITHUB API
Unfortunately, the GitHub API doesn't provide this information (yet?); you'll only get the number of contributions the user made to the repository via the List repository contributors service.
WORKAROUND
There seems to be a workaround:
Consulting this URL: https://github.com/<repo_owner>/<repo_name>/graphs/contributors, I observed that the following request was made to fill in the contributors' data:
Request URL: https://github.com/<repo_owner>/<repo_name>/graphs/contributors-data
Request Method: GET
Request Headers: Not sure...
This endpoint will return something similar to the following object:
[
  {
    "total": N,
    "author": {
      "id": <user_id>,
      "login": "<user_name>",
      "avatar": "<user_avatar>",
      "path": "/<user_name>",
      "hovercard_url": "/users/<user_name>/hovercard"
    },
    "weeks": [
      {
        "w": 1586044800,
        "a": 0,
        "d": 0,
        "c": 0
      },
      {
        "w": 1586649600,
        "a": 0,
        "d": 0,
        "c": 0
      },
      ...
    ]
  }
]
In this response, you'll have a list of contributors with each contributor's details, where:
total: the total number of contributions this user made to the repository.
author: the GitHub user's data.
weeks: a list of weekly contributions where, for each week:
w: the week's start as a Unix timestamp
a: the number of lines added
d: the number of lines deleted
c: the number of commits made
Therefore, to get what you want, you'll have to sum all weekly contributions for each user.
Observation: to do so, you need access to the repository insights (they may not be available to every user).
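Summing the weekly buckets is a small aggregation; here is a sketch assuming the response shape shown above (the function name and the optional since/until timestamp filters are my own):

```python
def summarize_contributors(contributors, since=None, until=None):
    """Aggregate the per-week buckets from the contributors-data response
    into per-user totals. `since`/`until` are optional Unix timestamps
    restricting which weeks are counted."""
    summary = {}
    for entry in contributors:
        # Keep only the weeks that fall inside the requested range.
        weeks = [w for w in entry["weeks"]
                 if (since is None or w["w"] >= since)
                 and (until is None or w["w"] <= until)]
        summary[entry["author"]["login"]] = {
            "additions": sum(w["a"] for w in weeks),
            "deletions": sum(w["d"] for w in weeks),
            "commits": sum(w["c"] for w in weeks),
        }
    return summary
```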

Getting listing of all repositories on Github with tags listing for a specific 'topic'

I am working on a simple package manager GUI which uses npm to install packages from Github.
In order for this to work I need the following information for all repositories that have a specific "topic":
Full name of the repository (account/repository)
Description of the repository.
The list of "topics"
The list of tag names on the repository (all tags beginning with a 'v').
My GUI would cache this data per session but potentially a lot of people would be using the GUI on separate computers.
Looking at the Github v3 API documentation I have been able to find the following two commands:
GET /search/repositories?q=topic:my-package-marker (with Accept: application/vnd.github.mercy-preview+json header to include the topics data).
For each of the repositories retrieved by the previous command, GET /repos/{repositoryName}/git/refs/tags
I have a few concerns with this approach:
The Github documentation says:
Find repositories via various criteria. This method returns up to 100 results per page.
Which means that the client would need to loop through and download all of the pages each time.
The Github documentation also says:
Rate limit
The Search API has a custom rate limit. For requests using Basic Authentication, OAuth, or client ID and secret, you can make up to 30 requests per minute. For unauthenticated requests, the rate limit allows you to make up to 10 requests per minute.
If there are more than 30 pages of packages then this limit would be exceeded quite quickly. But perhaps even with fewer pages since for each page I would also need to get the list of tags for each of the 100 per-page repositories.
Here is pseudo-code for what it seems that I must do to get this data:
dataPages = { };
for each page in GET /search/repositories?q=topic:my-package-marker
with "Accept: application/vnd.github.mercy-preview+json"
{
dataPage = new DataPage();
for each entry in page
{
entry = new Entry();
entry.FullName = entry.full_name
entry.Description = entry.description
entry.Keywords = entry.topics
entry.Versions = { };
for each tag in GET /repos/{entry.full_name}/git/refs/tags
{
if (tag.ref starts with "refs/tags/v")
{
entry.Versions.Add(tag.ref substring from character index 11);
}
}
dataPage.Add(entry);
}
dataPages.Add(dataPage);
}
Why do I want to do this?
As of npm v5 you can install packages directly from Git repositories. This is really useful because it means npm can be used for packages that don't quite fit into the Node.js/npm ecosystem: npm itself is a perfect fit for them, but the npm registry is not a suitable place to host them.
What is the best way to achieve this?

Recommended way to list all repos/commits for a given user using github3.py

I'm building a GitHub application to pull commit information from our internal repos. I'm using the following code to iterate over all commits:
gh = login(token=gc.ACCESS_TOKEN)
for repo in gh.iter_repos():
    for commit in repo.iter_commits():
        print(commit.__dict__)
        print(commit.additions)
        print(commit.author)
        print(commit.commit)
        print(commit.committer)
        print(commit.deletions)
        print(commit.files)
        print(commit.total)
The additions/deletions/total values are all coming back as 0, and the files attribute is always []. When I click on the URL, I can see that this is not the case; I've verified through curl calls that the API indeed has a record of these attributes.
Reading more in the documentation, it seems that iter_commits is deprecated in favor of iter_user_commits. Could this be why it is not returning all of the information about the commits? However, this method does not return any repositories for me when I use it like this:
gh = login(token=gc.ACCESS_TOKEN)
user = gh.user()
for repo in gh.iter_user_repos(user):
In short, I'm wondering what the recommended method is to get all commits for all the repositories a user has access to.
There's nothing wrong with iter_repos with a logged in GitHub instance.
In short, here's what's happening (this is described in github3.py's documentation): when listing a resource from GitHub's API, not all of the attributes are actually returned. If you want all of the information, you have to request it for each commit, so your code should look like this:
gh = login(token=gc.ACCESS_TOKEN)
for repo in gh.iter_repos():
    for commit in repo.iter_commits():
        commit.refresh()
        print(commit.additions)
        print(commit.deletions)
        # etc.

Count open pull requests and issues on GitHub

I'd like to count all open pull requests and issues in a repository with the help of the GitHub API. I found out that the result of the API endpoint /repos/:owner/:repo contains an open_issues property. However, this is the sum of the number of open issues and pull requests.
Is there a way to get or calculate the number of open issues and pull requests in a repository?
osowskit is correct, the easiest way to do this is to iterate over the list of issues and the list of pull requests in a repository (I'm assuming you would like to get separate counts for each, reading between the lines of your question).
The issues API will return both issues and pull requests, so you will need to count both and subtract the number of pull requests from the number of issues to get the count of issues that aren't also pull requests. For example, using the wonderful github3.py Python library:
import github3
gh = github3.login(token='your_api_token')
issues_count = len(list(gh.repository('owner', 'repo').issues()))
pulls_count = len(list(gh.repository('owner', 'repo').pull_requests()))
print('{} issues, {} pull requests'.format(issues_count - pulls_count, pulls_count))
A more efficient way (than the accepted answer) is to use the search API.
To get the number of open issues you can call (replace with your org and repo):
https://api.github.com/search/issues?q=repo:realm/realm-java%20is:issue%20is:open&per_page=1
and to get the number of PR's:
https://api.github.com/search/issues?q=repo:realm/realm-java%20is:pr%20is:open&per_page=1
You can of course remove is:open if you want both open and closed issues/pr's.
Both will return "total_count" with the result. Note that I added per_page=1 to not actually retrieve all the issues.
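Building those search URLs programmatically is straightforward; here is a sketch using only the standard library (the helper name is my own):

```python
import urllib.parse

def search_count_url(repo, kind, state="open"):
    """Build a Search API URL whose JSON response carries the number of
    matching items in `total_count`. `kind` is 'issue' or 'pr'; per_page=1
    keeps the response tiny since only the count is needed."""
    query = f"repo:{repo} is:{kind} is:{state}"
    return ("https://api.github.com/search/issues?"
            + urllib.parse.urlencode({"q": query, "per_page": 1}))

# Fetching search_count_url("realm/realm-java", "pr") and reading
# "total_count" from the JSON body gives the PR count (Search API
# rate limits apply).
```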
With the GitHub GraphQL API, you can now do this in a single request:
{
  repository(owner: "mui-org", name: "material-ui") {
    issues(states: OPEN) {
      totalCount
    }
    pullRequests(states: OPEN) {
      totalCount
    }
  }
}
Output:
{
  "data": {
    "repository": {
      "issues": {
        "totalCount": 471
      },
      "pullRequests": {
        "totalCount": 47
      }
    }
  }
}
You don't need to iterate over all pull requests. The GitHub API returns pages and in the link header, you have access to the first, previous, next and last pages. You can use that to implement a more efficient algorithm:
1) Fetch the first page and specify a page size of 1
2) Get the value of the last page link (i.e. the number of pages)
3) Since each page then holds exactly one PR, the number of pages equals the number of PRs, and you never have to fetch the pages themselves.
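The last-page trick boils down to reading the page number out of the Link header; a minimal sketch (the function name is my own):

```python
import re

def last_page_from_link_header(link_header):
    """Extract the page number of the rel="last" link from a GitHub Link
    header. With per_page=1, that number equals the total item count."""
    for part in link_header.split(","):
        # Match e.g. <...?per_page=1&page=347>; rel="last"
        match = re.search(r'[?&]page=(\d+)>;\s*rel="last"', part)
        if match:
            return int(match.group(1))
    return None  # single page: GitHub omits the Link header entirely
```

In practice you would issue `GET /repos/:owner/:repo/pulls?per_page=1`, read the response's Link header, and pass it to this function.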
In case anyone is wondering, the accepted answer of constructing a search query with the Github Search API can take an is:merged argument instead of is:open, even though that option isn't well documented in the Github API:
For example:
https://api.github.com/search/issues?q=repo:realm/realm-java%20is:issue%20is:merged&per_page=1

How can I get a list of all pull requests for a repo through the github API?

I want to obtain a list of all pull requests on a repo through the github API.
I've followed the instructions at http://developer.github.com/v3/pulls/ but when I query /repos/:owner/:repo/pulls it's consistently returning fewer pull requests than displayed on the website.
For example, when I query the torvalds/linux repo I get 9 open pull requests (there are 14 on the website). If I add ?state=closed I get a different set of 11 closed pull requests (the website shows around 20).
Does anyone know where this discrepancy arises, and if there's any way to get a complete list of pull requests for a repo through the API?
You can get all pull requests (closed, open, merged) through the state parameter.
Just set state=all in the GET query, like this:
https://api.github.com/repos/:owner/:repo/pulls?state=all
For more info: check the Parameters table at https://developer.github.com/v3/pulls/#list-pull-requests
Edit: As per Tomáš Votruba's comment:
the default value is per_page=30. The maximum is per_page=100. To get more than 100 results, you need to call it multiple times: "&page=1", "&page=2"...
PyGithub (https://github.com/PyGithub/PyGithub), a Python library to access the GitHub API v3, enables you to get paginated resources.
For example,
g = Github(login_or_token=$YOUR_TOKEN, per_page=100)
r = g.get_repo($REPO_NUMBER)
for pull in r.get_pulls('all'):
    # You can access pulls
See the documentation (http://pygithub.readthedocs.io/en/latest/index.html).
With Github's new official CLI (command line interface):
gh pr list --repo OWNER/REPO
which would produce something like:
Showing 2 of 2 pull requests in OWNER/REPO
#62 Doing something that-weird-branch-name
#58 My PR title wasnt-inspired-branch
See the gh CLI manual for additional details, options, and installation instructions.
There is a way to get a complete list and you're doing it. What are you using to communicate with the API? I suspect you may not be doing something correctly. For example (there are only 13 open pull requests currently) using my API wrapper (github3.py) I get all of the open pull requests. An example of how to do it without my wrapper in python is:
import requests
r = requests.get('https://api.github.com/repos/torvalds/linux/pulls')
len(r.json()) == 13
and I can also get that result (vaguely) in cURL by counting the results myself: curl https://api.github.com/repos/torvalds/linux/pulls.
If you, however, run into a repository with more than 25 (or 30) pull requests that's an entirely different issue but most certainly it is not what you're encountering now.
If you want to retrieve all pull requests (commits, comments, issues etc) you have to use pagination.
https://developer.github.com/v3/#pagination
The GET request "pulls" will only return open pull-requests.
If you want to get all pull requests, you either set the parameter state to all, or you use issues.
Extra information
If you need other data from GitHub, such as issues, then you can identify pull requests among the issues, and you can then retrieve each pull request no matter whether it is closed or open. This will also give you a couple more attributes (mergeable, merged, merge_commit_sha, number of commits, etc.)
If an issue is a pull request, it will contain a pull_request attribute. Otherwise, it is just an issue.
From the API: https://developer.github.com/v3/pulls/#labels-assignees-and-milestones
"Every pull request is an issue, but not every issue is a pull request. For this reason, “shared” actions for both features, like manipulating assignees, labels and milestones, are provided within the Issues API."
Edit: I just found that issues behave similarly to pull requests, so one needs to retrieve all of them by setting the state parameter to all
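Items returned by the Issues API can be split into true issues and pull requests by checking for the pull_request key; a minimal sketch over already-fetched JSON (the function name is my own):

```python
def split_issues_and_prs(items):
    """Partition items returned by the Issues API: an item carrying a
    `pull_request` key is a pull request, otherwise it is a plain issue."""
    issues, pulls = [], []
    for item in items:
        (pulls if "pull_request" in item else issues).append(item)
    return issues, pulls
```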
You can also use the GraphQL API v4 to request all pull requests for a repo. It requests all the pull requests by default if you don't specify the states field:
{
  repository(name: "material-ui", owner: "mui-org") {
    pullRequests(first: 100, orderBy: {field: CREATED_AT, direction: DESC}) {
      totalCount
      nodes {
        title
        state
        author {
          login
        }
        createdAt
      }
    }
  }
}
Try it in the explorer
The search API should help: https://help.github.com/enterprise/2.2/user/articles/searching-issues/
q = repo:org/name is:pr ...
GitHub provides a "Link" header which specifies the previous, next, and last URLs for fetching the values. E.g., a Link header response:
<https://api.github.com/repos/:owner/:repo/pulls?state=all&page=2>; rel="next", <https://api.github.com/repos/:owner/:repo/pulls?state=all&page=15>; rel="last"
rel="next" suggests the next set of values.
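The Link header above can be turned into a rel-to-URL map with a few lines of parsing; a sketch (the function name is my own; the requests library offers the same via `response.links`):

```python
def parse_link_header(link_header):
    """Map each rel (next, last, prev, first) in a GitHub Link header
    to its URL."""
    links = {}
    for part in link_header.split(","):
        url_part, _, rel_part = part.partition(";")
        url = url_part.strip().strip("<>")
        rel = rel_part.strip().removeprefix('rel="').rstrip('"')
        links[rel] = url
    return links

# Traversal sketch: start from the first page URL and keep following
# links["next"] until it is absent, collecting results from each page.
```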
Here's a snippet of Python code that retrieves information of all pull requests from a specific GitHub repository and parses it into a nice DataFrame:
import pandas as pd
organization = 'pvlib'
repository = 'pvlib-python'
state = 'all' # other options include 'closed' or 'open'
page = 1 # initialize page number to 1 (first page)
dfs = [] # create empty list to hold individual dataframes
# Note it is necessary to loop as each request retrieves maximum 30 entries
while True:
url = f"https://api.github.com/repos/{organization}/{repository}/pulls?" \
f"state={state}&page={page}"
dfi = pd.read_json(url)
if dfi.empty:
break
dfs.append(dfi) # add dataframe to list of dataframes
page += 1 # Advance onto the next page
df = pd.concat(dfs, axis='rows', ignore_index=True)
# Create a new column with usernames
df['username'] = pd.json_normalize(df['user'])['login']