GitHub API - how to compare 2 commits - github

Is it possible to get the list of changed files between two commits?
Something like the comparison between two commits in the web version, but using the GitHub API.

The official commit comparison API is Compare two commits:
GET /repos/:owner/:repo/compare/:base...:head
Both :base and :head can be either branch names in :repo or branch names in other repositories in the same network as :repo. For the latter case, use the format user:branch:
GET /repos/:owner/:repo/compare/user1:branchname...user2:branchname
Note that you can use tags or commit SHAs as well.
For instance:
https://api.github.com/repos/git/git/compare/v2.2.0-rc1...v2.2.0-rc2
Note the '...', not '..' between the two tags.
And you need to have the oldest tag first, then the newer tag.
That gives a status:
"status": "behind",
"ahead_by": 1,
"behind_by": 2,
"total_commits": 1,
And, for the comparison as a whole, information about the changed files:
"files": [
{
"sha": "bbcd538c8e72b8c175046e27cc8f907076331401",
"filename": "file1.txt",
"status": "added",
"additions": 103,
"deletions": 21,
"changes": 124,
"blob_url": "https://github.com/octocat/Hello-World/blob/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
"raw_url": "https://github.com/octocat/Hello-World/raw/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
"contents_url": "https://api.github.com/repos/octocat/Hello-World/contents/file1.txt?ref=6dcb09b5b57875f334f61aebed695e2e4193db5e",
"patch": "@@ -132,7 +132,7 @@ module Test @@ -1000,7 +1000,7 @@ module Test"
}
]
BUT:
The response will include a comparison of up to 250 commits. If you are working with a larger commit range, you can use the Commit List API to enumerate all commits in the range.
For comparisons with extremely large diffs, you may receive an error response indicating that the diff took too long to generate. You can typically resolve this error by using a smaller commit range.
Notes:
"same network" means two repositories hosted by the same Git repository hosting service (two repositories on github.com, for example, or on the same on-premise GHE, i.e. GitHub Enterprise, instance).
You can therefore compare two branches between a repo and its fork.
Example:
https://api.github.com/repos/030/learn-go-with-tests/compare/master...quii:master
(this example compares a fork to its original repo, not the original repo to the fork: that is because the fork, in this case, is behind the original repo)
As noted by Tom Carver in the comments:
this suggested API silently maxes out at 300 files shown;
I haven't yet found an API that avoids this limitation
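As a minimal sketch of the URL format described above, here is how the compare URL could be assembled in Python. The helper name compare_url is hypothetical (it is not part of any GitHub library), and it only builds the URL; sending the request is left to the caller.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def compare_url(owner: str, repo: str, base: str, head: str) -> str:
    """Build the REST URL for comparing base...head (hypothetical helper).

    base and head may be branch names, tags, commit SHAs, or user:branch
    for a branch in another repository of the same network.
    """
    # Assumption: ':' and '/' are left unescaped so that the user:branch
    # form and branch names containing slashes are passed through as-is.
    base_ref = quote(base, safe=":/")
    head_ref = quote(head, safe=":/")
    # Three dots, not two, separate the two references.
    return f"{API_ROOT}/repos/{owner}/{repo}/compare/{base_ref}...{head_ref}"


print(compare_url("git", "git", "v2.2.0-rc1", "v2.2.0-rc2"))
print(compare_url("030", "learn-go-with-tests", "master", "quii:master"))
```

The two calls reproduce the example URLs given in this answer.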

Investigating answers coming with the official API, one can find a barely mentioned way to get diffs from Github. Try this:
wget --header='Accept: application/vnd.github.v3.diff' \
http://github.com/github/linguist/compare/96d29b76...a20631af.diff
wget --header='Accept: application/vnd.github.v3.diff' \
http://github.com/github/linguist/compare/a20631af...96d29b76.diff
This is the link you provided as an example, with .diff appended. And the reverse diff of the same.
The header makes sure the request is handled by GitHub's v3 API. That's currently the default, but it might change in the future. See Media Types.
Why two downloads?
Github serves linear diffs from older to newer versions, only. If the requested diff is indeed linear and from an older to a newer version, the second download will be empty.
If the requested diff is linear, but from a newer to an older version, the first download is empty. Instead, the whole diff is in the second download. Depending on what one wants to achieve, one can apply it normally to the older version or reverse-apply it (patch -R) to the newer version.
If there is no linear relationship between the pair of requested commits, both downloads return non-empty content: one from the common ancestor to the first commit and another, reversed one, from this common ancestor to the other commit. Applying one diff normally and the other one reversed gives what applying the output of git diff 96d29b76..a20631af would give, too.
As far as I can tell, these raw diffs aren't subject to Github's API limitations. Requests for 540 commits with 1002 file changes went flawlessly.
Note: one can also append .patch instead of .diff. One then still gets one big file per download, but it contains a separate patch for each commit.
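The "two downloads" logic above can be sketched in Python as follows. The function names are hypothetical, the actual download step is left out, and the "both empty" case is my own addition for completeness rather than something covered above.

```python
def diff_urls(owner: str, repo: str, first: str, second: str) -> tuple:
    """Return the forward and reverse .diff URLs for a pair of commits."""
    root = f"https://github.com/{owner}/{repo}/compare"
    return (
        f"{root}/{first}...{second}.diff",
        f"{root}/{second}...{first}.diff",
    )


def describe(forward: str, reverse: str) -> str:
    """Interpret the two downloaded diff bodies (empty string = empty download)."""
    if forward and not reverse:
        return "linear, older to newer: the whole diff is in the forward download"
    if reverse and not forward:
        return "linear, newer to older: the whole diff is in the reverse download"
    if forward and reverse:
        return "not linear: apply one diff normally and the other one reversed"
    # Not discussed in the answer; assumed to mean there is nothing to diff.
    return "both downloads empty"


forward_url, reverse_url = diff_urls("github", "linguist", "96d29b76", "a20631af")
print(forward_url)
print(reverse_url)
```

After fetching both URLs, passing the two response bodies to describe() tells which of the three situations applies.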

Traumflug's answer isn't correct if you are using the API to access private repos. Actually, I think that answer doesn't require the header since it works without it in a public repo anyways.
You should not put the .diff at the end of the url and you should use the api subdomain. If you want the diff specifically, you only need to put the appropriate media type header in the request (and the token for authentication).
So for example:
wget --header='Accept: application/vnd.github.v3.diff' \
--header='Authorization: token 123' \
https://api.github.com/repos/github/linguist/compare/96d29b76...a20631af
GitHub's documentation is super confusing since it says it only works for branch names, but it also accepts commit shas. Also, the returned JSON includes a diff_url that is just a direct link to the diff but does not work if the repo is private, which isn't very helpful.
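For illustration, the same request can be prepared in Python with only the standard library. This is a sketch: diff_request is a hypothetical helper, "123" is a placeholder token, and sending the token in an Authorization header is my assumption about how to authenticate.

```python
import urllib.request


def diff_request(owner, repo, base, head, token=None):
    """Build (but do not send) a request for the diff of base...head."""
    url = f"https://api.github.com/repos/{owner}/{repo}/compare/{base}...{head}"
    req = urllib.request.Request(url)
    # The media type selects the diff representation instead of JSON.
    req.add_header("Accept", "application/vnd.github.v3.diff")
    if token:
        # Assumption: the token is sent as a header, needed for private repos.
        req.add_header("Authorization", f"token {token}")
    return req


req = diff_request("github", "linguist", "96d29b76", "a20631af", token="123")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the result to urllib.request.urlopen(req) and reading the body should return the diff text; that step needs network access and, for a private repo, a valid token.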

Here's another actual executable example using the HEAD and HEAD~1 references on my public repo DataApp--ParamCompare which should help illuminate the :owner and :repo notation once substituted with clear parameters.
curl -X GET https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD
As a sanity check the equivalent browser representation can be seen at https://github.com/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD
In general, the URL takes the following form, shown here with an alternative placeholder syntax for the API route:
https://api.github.com/repos/<owner_name>/<repo_name>/compare/HEAD~1...HEAD
One can also invoke a url such as
https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/80f0bb42606888ce7fc66b4402fcc90a1709c9e8...255fe089543f5569f90af54168af904e88fc150f
There should be an equivalent GraphQL way to pare down the results and select just the filename values under the files list, giving something like git diff --name-only output straight from the remote. I'll update this answer if I figure it out.
My take on this is that the GraphQL API doesn't perform operations, which is what a diff is, but rather allows you to query primitive types and properties of the repo itself. You can see the sort of entities you're dealing with by looking at the schema itself: https://developer.github.com/v4/public_schema/
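Until a GraphQL route turns up, the name-only output can be produced on the client from the REST response. A minimal Python sketch, assuming the JSON has already been fetched and parsed into a dict; the sample data below is made up for illustration and is not from a real repository.

```python
def changed_files(comparison: dict) -> list:
    """Return the filename of every entry in the comparison's "files" list."""
    return [entry["filename"] for entry in comparison.get("files", [])]


# Trimmed sample shaped like the compare response (hypothetical values).
sample = {
    "status": "ahead",
    "files": [
        {"filename": "file1.txt", "status": "added"},
        {"filename": "src/app.py", "status": "modified"},
    ],
}

# One filename per line, similar in spirit to git diff --name-only.
print("\n".join(changed_files(sample)))
```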

Related

Using a public access token for GitHub with Neo4j

I'm trying to use LOAD CSV with a CSV file stored in GitHub. It works fine with the 10 minute, temporary token you get when viewing the raw file, but I want something that's more persistent, as I need to be able to deploy this to multiple environments. Ten minutes just won't cut it.
I figured a private access token would be the way forward, but (once again) GitHub's spectacularly poor quality documentation made this much harder than it should be.
I set up a private access token with the repo and read:org permissions and with this I can get at my files using CURL, e.g.
curl -s https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv
This works fine and I see the contents of my test file.
But if I try to navigate to that URL I just get a 404 error, and if I use it in Neo4j with a LOAD CSV statement, I get the error "couldn't load the external resource at:".
I'm basically doing this:
LOAD CSV WITH HEADERS FROM '<URL that worked in CURL>' AS row
...and it fails miserably.
Where:
LOAD CSV WITH HEADERS FROM '<URL for raw file from GitHub with 10 minute token>' AS row
works fine, so I know I can access external files, i.e. files not in the import directory.
Is this just a failing with GitHub, or am I doing something wrong?
Although I hate answering my own questions, I left this kicking around a while and nobody came back with anything that helped.
I now know a whole lot more about public access tokens than I ever wanted to, but it was all worthwhile, as it helped me get around this issue.
There's an apoc.load.jsonParams function that accepts bearer tokens. From here it didn't take too much work to get this working the same way that LOAD CSV had done.
There was one last gotcha though, I soon discovered that the URLs for the repository can't include spaces or other non-alphanumeric characters, but that's a small price to pay for success?
So this doesn't work:
LOAD CSV WITH HEADERS FROM 'https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv' AS row...
Instead I had to use:
CALL apoc.load.jsonParams("https://raw.githubusercontent.com/<my repo>/<path>/<my file>.json", {Authorization: "Bearer <token>"}, null) YIELD value WITH value AS row...
There's an equivalent apoc.load.csvParams procedure, but I never tested this.
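Outside Neo4j, the header-based form of the request can be sketched in Python like this. The helper name and the example path are hypothetical. Whether percent-encoding the path would have avoided the problem with spaces inside Neo4j was not tested in this answer, so treat that part as an assumption.

```python
import urllib.request
from urllib.parse import quote


def raw_file_request(repo_path: str, file_path: str, token: str):
    """Build (but do not send) an authenticated request for a raw file."""
    # Percent-encode spaces and other special characters, keeping '/' as-is.
    url = f"https://raw.githubusercontent.com/{quote(repo_path)}/{quote(file_path)}"
    req = urllib.request.Request(url)
    # Same header as the one passed to apoc.load.jsonParams above.
    req.add_header("Authorization", f"Bearer {token}")
    return req


# "me/repo/main" and "data/my file.json" are placeholder values.
req = raw_file_request("me/repo/main", "data/my file.json", "<token>")
print(req.full_url)
```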

Bitbucket REST API does not handle second query parameter

I want to find out the changed files between two given commits/branches/tags using the Bitbucket Rest API.
I tried to use the diff command from here
curl -u USER:PASSWORD https://REPO-URL/rest/api/latest/projects/PROJECT/repos/REPO/compare/diff?from=COMMITHASH1&to=COMMITHASH2
where CAPITAL words are placeholders for actual values I cannot post here.
The result of the request is always something like
The command "to" is spelled wrong or cannot be found
(the original result is in German, so that might be the translation).
However, if I switch the query parameters, as in .../diff?to=...&from=..., it says that the command from is unknown. I also tried other similar diff queries like .../compare/changes?from=...&to=... or .../diff?since=...&until=..., but the result was similar to the above. Giving branch names instead of commit hashes made no difference either.
Therefore, my assumption was that the second query parameter cannot be handled correctly by the API.
Other basic queries on the API like .../branches work fine, so authentication or whatever is no problem.
What am I doing wrong? Do I need to wrap the commit hashes in "" or something like that?
Thank you very much!
PS: As the repository is commercially used, I cannot give you the actual url, user or password to try for yourself.
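One likely explanation, judging only from the error message: in most shells an unquoted & ends the command and runs it in the background, so whatever follows it (here to=COMMITHASH2) is treated as a separate command. The Python sketch below builds the same URL and quotes it for the shell; diff_command is a hypothetical helper and the capitalized values are the placeholders from the question.

```python
import shlex
from urllib.parse import urlencode


def diff_command(base_url: str, credentials: str, params: dict) -> str:
    """Return a curl command line whose URL is safely quoted for the shell."""
    url = f"{base_url}?{urlencode(params)}"
    # shlex.quote wraps the URL in single quotes because it contains
    # characters such as '?' and '&' that the shell would interpret.
    return f"curl -u {shlex.quote(credentials)} {shlex.quote(url)}"


print(diff_command(
    "https://REPO-URL/rest/api/latest/projects/PROJECT/repos/REPO/compare/diff",
    "USER:PASSWORD",
    {"from": "COMMITHASH1", "to": "COMMITHASH2"},
))
```

The printed command is the original one with the URL wrapped in single quotes, which keeps both query parameters in the same request.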

Github REST API - how to retrieve specific lines of codes (code snippet)

I would like to retrieve specific lines of codes via the REST API.
After a user has authorized access by connecting his GitHub account (via the Web Application flow), I'd like to be able to programmatically retrieve a block of lines from a repo's file with the REST API.
On the github.com UI, it's quite easy to get only certain lines: you can select multiple lines and get a "permalink", for example from line 3 to line 7:
https://github.com/{username}/{repo_name}/blob/{specific file ex: ce3f225c2025556705353f8369097e760d063c6bbce3}/{file_path_in_the_repo}#L3-L7
With the API, however, I can't manage to do it. I can get the code, but only for the WHOLE file, not restricted to certain lines, with:
https://api.github.com/repos/{username}/{repository_name}/contents/{file_path}
For example the following code works:
https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb
The result is
{
"name": "send_event_job.rb",
"path": "sentry-rails/app/jobs/sentry/send_event_job.rb",
"sha": "55314dd99703fc121516513a59e20377b2534f48",
"size": 980,
"url": "https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb?ref=master",
"html_url": "https://github.com/getsentry/sentry-ruby/blob/master/sentry-rails/app/jobs/sentry/send_event_job.rb",
"git_url": "https://api.github.com/repos/getsentry/sentry-ruby/git/blobs/55314dd99703fc121516513a59e20377b2534f48",
"download_url": "https://raw.githubusercontent.com/getsentry/sentry-ruby/master/sentry-rails/app/jobs/sentry/send_event_job.rb",
"type": "file",
"content": "aWYgZGVmaW5lZD8oQWN0aXZlSm9iKQogIG1vZHVsZSBTZW50cnkKICAgIHBh\ncmVudF9qb2IgPQogICAgICBpZiBkZWZpbmVkPyg6OkFwcGxpY2F0aW9uSm9i\nKSAmJiA6OkFwcGxpY2F0aW9uSm9iLmFuY2VzdG9ycy5pbmNsdWRlPyg6OkFj\ndGl2ZUpvYjo6QmFzZSkKICAgICAgICA6OkFwcGxpY2F0aW9uSm9iCiAgICAg\nIGVsc2UKICAgICAgICA6OkFjdGl2ZUpvYjo6QmFzZQogICAgICBlbmQKCiAg\nICBjbGFzcyBTZW5kRXZlbnRKb2IgPCBwYXJlbnRfam9iCiAgICAgICMgdGhl\nIGV2ZW50IGFyZ3VtZW50IGlzIHVzdWFsbHkgbGFyZ2UgYW5kIGNyZWF0ZXMg\nbm9pc2UKICAgICAgc2VsZi5sb2dfYXJndW1lbnRzID0gZmFsc2UgaWYgcmVz\ncG9uZF90bz8oOmxvZ19hcmd1bWVudHM9KQoKICAgICAgIyB0aGlzIHdpbGwg\ncHJldmVudCBpbmZpbml0ZSBsb29wIHdoZW4gdGhlcmUncyBhbiBpc3N1ZSBk\nZXNlcmlhbGl6aW5nIFNlbnRyeUpvYgogICAgICBpZiByZXNwb25kX3RvPyg6\nZGlzY2FyZF9vbikKICAgICAgICBkaXNjYXJkX29uIEFjdGl2ZUpvYjo6RGVz\nZXJpYWxpemF0aW9uRXJyb3IKICAgICAgZWxzZQogICAgICAgICMgbWltaWMg\nd2hhdCBkaXNjYXJkX29uIGRvZXMgZm9yIFJhaWxzIDUuMAogICAgICAgIHJl\nc2N1ZV9mcm9tIEFjdGl2ZUpvYjo6RGVzZXJpYWxpemF0aW9uRXJyb3IgZG8K\nICAgICAgICAgIGxvZ2dlci5lcnJvciAiRGlzY2FyZGVkICN7c2VsZi5jbGFz\nc30gZHVlIHRvIGEgI3tleGNlcHRpb259LiBUaGUgb3JpZ2luYWwgZXhjZXB0\naW9uIHdhcyAje2Vycm9yLmNhdXNlLmluc3BlY3R9LiIKICAgICAgICBlbmQK\nICAgICAgZW5kCgogICAgICBkZWYgcGVyZm9ybShldmVudCwgaGludCA9IHt9\nKQogICAgICAgIFNlbnRyeS5zZW5kX2V2ZW50KGV2ZW50LCBoaW50KQogICAg\nICBlbmQKICAgIGVuZAogIGVuZAplbHNlCiAgbW9kdWxlIFNlbnRyeQogICAg\nY2xhc3MgU2VuZEV2ZW50Sm9iOyBlbmQKICBlbmQKZW5kCgo=\n",
"encoding": "base64",
"_links": {
"self": "https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb?ref=master",
"git": "https://api.github.com/repos/getsentry/sentry-ruby/git/blobs/55314dd99703fc121516513a59e20377b2534f48",
"html": "https://github.com/getsentry/sentry-ruby/blob/master/sentry-rails/app/jobs/sentry/send_event_job.rb"
}
}
But if I add L3-L7, like below, it does not change anything. I would have liked it to change, for example, the download_url so that it only includes lines 3 to 7:
https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb#L3-L7
I can't find in the GitHub docs which URL to call to retrieve this type of multi-line code snippet PROGRAMMATICALLY with the REST API.
Note: I know how to get the whole file from the "download_url" (https://raw.githubusercontent.com/getsentry/sentry-ruby/master/sentry-rails/app/jobs/sentry/send_event_job.rb) and then parse it to keep only lines X to Y, but I would like to know if there's a direct API call that does what you can easily do with the UI.
Thanks
GitHub's REST API does not provide a way to extract just a few lines of a file. In the web interface, you get the entire rendered file with just a few lines highlighted, not just a snippet.
This is because extracting a limited number of lines from a file is actually more work than extracting the entire file. All files are stored as Git blobs, and since blobs are stored compressed, there is no way to extract only certain lines from a blob without reading the entire file up to that point. GitHub would therefore have to read the file into memory and then restrict it to just the lines you wanted, and as a result such an API would be much more restricted and could not handle files nearly as large.
Also, in some cases, there is no sane answer to what constitutes a line. While Git normally wants files to be stored with LF endings, if a file has been checked in with CRLF endings, should those be handled? (If so, that's additional work to properly handle them.) If you have a binary file, like a JPEG, there are no lines. Similarly, while files in UTF-16 probably have lines, Git considers them binary files, so they probably wouldn't be able to be handled.
Note that the reason your #L3-L7 doesn't work as part of the API, besides the API not supporting it, is that this is a fragment, which is generally not sent to the server. It's supposed to identify a specific portion of a document, and that's typically done client-side, in the web browser. Since with your API request there is no client to do this, the server never even sees that part of your request.
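So the slicing has to happen on the client. A minimal Python sketch, assuming the "content" field of the contents response is already at hand; the sample below is a made-up ten-line stand-in rather than the real file, and lines_from_content is a hypothetical helper.

```python
import base64
from urllib.parse import urlsplit


def lines_from_content(content_b64: str, start: int, end: int) -> list:
    """Decode a base64 "content" field and return lines start..end (1-based, inclusive)."""
    # b64decode ignores the newlines that the API embeds in the encoded text.
    text = base64.b64decode(content_b64).decode("utf-8")
    return text.splitlines()[start - 1:end]


# Stand-in for the "content" field; not real repository data.
sample = base64.b64encode(
    "".join(f"line {n}\n" for n in range(1, 11)).encode()
).decode()
print(lines_from_content(sample, 3, 7))

# The fragment is split off by the client and is not part of the request path.
parts = urlsplit(
    "https://api.github.com/repos/getsentry/sentry-ruby/contents/"
    "sentry-rails/app/jobs/sentry/send_event_job.rb#L3-L7"
)
print(parts.path)
print(parts.fragment)
```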

Github Search API filename only returns 1 result

I'm trying to search for all projects (or at least several thousand) from the github search API. I've gotten everything else to work, except the filters on filename.
For example, sending the following request to the search API only returns 1 result:
https://api.github.com/search/code?q=django+in:requirements.txt+filename:requirements.txt+language:python+org:openmicroscopy
Likewise, sending the following
https://api.github.com/search/repositories?q=filename:Makefile&per_page=100
only returns 1 result as well. I'm willing to bet that there is more than 1 repo on github with a Makefile or a dependency on Django. I must be doing something wrong, but I can't seem to figure out what it is.
According to this post on GitHub's developer site, to support the expected volume of requests they have added restrictions to code queries, which require us to specify a set of users, organizations, or repositories with the query. Read about the considerations for code search in the documentation.
Now, about your search API requests, in the first one the in qualifier is provided with file name requirements.txt which is wrong.
The documentation states that in should be provided with file to restrict the search to the file contents, path to restrict the search to the file path or both.
Like this, in:file, in:path, in:file,path
So, if you want to search in file contents the correct API call should be
https://api.github.com/search/code?q=django+in:file+filename:requirements.txt+org:openmicroscopy
I removed the language qualifier since you are searching in a .txt file and doing this improved the result.
Check out this URL; it produces the same results on the website:
https://github.com/search?utf8=%E2%9C%93&q=org%3Aopenmicroscopy+django+in%3Afile+filename%3Arequirements.txt&type=Code
Your second query is a repository search; it cannot take filename as a qualifier. See the documentation for the available qualifiers.
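To avoid hand-assembling the query string, the corrected code search URL can be built like this in Python. The helper code_search_url is hypothetical; urlencode percent-encodes the colons, which is equivalent to writing them literally.

```python
from urllib.parse import urlencode


def code_search_url(term: str, qualifiers: dict) -> str:
    """Build a code search URL from a search term and qualifier:value pairs."""
    parts = [term] + [f"{name}:{value}" for name, value in qualifiers.items()]
    return "https://api.github.com/search/code?" + urlencode({"q": " ".join(parts)})


print(code_search_url(
    "django",
    {"in": "file", "filename": "requirements.txt", "org": "openmicroscopy"},
))
```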

Want to automatically process email attachments based on username and subject

I'm seeking advice about setting up an email gateway so students can email me homework and the email will be processed automatically.
For example, if studenta@univ.edu emails me with a subject of "CS208 hw1", I would cross-check studenta against a list of students taking CS208, then take all the attached files, dump them in that student's hw1 folder and respond with an email stating what files were received and when. If the student's email was malformed in some way, such as a bad subject or missing files, the service would send an appropriate email.
I have administrative access to an on-campus Linux machine that could be configured as an email server.
Offhand I was thinking of using fetchmail and a cron job to regularly read a designated user's email and perform the appropriate responses with some sort of script. Does this sound like a good way to go? I would welcome better ideas.
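To make the routing step concrete, here is a rough Python sketch of the cross-check described above. The roster layout (course name mapped to a set of usernames) and the accepted subject format are assumptions, and fetching the mail, saving attachments and sending replies are not shown.

```python
import re
from email.utils import parseaddr

# Assumed subject format: a course code followed by "hwN", e.g. "CS208 hw1".
SUBJECT_RE = re.compile(
    r"^\s*(?P<course>[A-Z]+\d+)\s+(?P<assignment>hw\d+)\s*$", re.IGNORECASE
)


def route_submission(sender: str, subject: str, roster: dict):
    """Return (student, course, assignment), or None if the mail should be rejected."""
    _, address = parseaddr(sender)
    student = address.partition("@")[0]
    match = SUBJECT_RE.match(subject)
    if not student or match is None:
        return None  # malformed sender or subject
    course = match["course"].upper()
    if student not in roster.get(course, set()):
        return None  # student is not enrolled in that course
    return student, course, match["assignment"].lower()


roster = {"CS208": {"studenta"}}  # hypothetical roster
print(route_submission("Student A <studenta@univ.edu>", "CS208 hw1", roster))
```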
I expect that in practice there will be far, far more exceptions to whatever rules you prescribe than there will be conforming mail which is properly handled. You'll be buying yourself a headache of manual fixups and "the computer ate my homework" claims.
Since this is a CS 200 level class, require them to use some version control system and save yourself the hassles of parsing free-format e-mail with the rigid structure that a VCS imposes. Your students will benefit too from the requirement. If my 10-year-old could appreciate the merit of automatic revision control within Google Docs, I'm guessing your students can handle Mercurial or git or even (gasp!) Subversion.
added in response to comment
Yes, but with Mercurial (and presumably git) "repository" is a fancy word for "directory" and is not the heavyweight DBMSy thingy that older VCS models may have led you to expect.
Here is how as a student I would expect to work on a hypothetical assignment:
studenta@dorm$ hg clone https://Rich.univ.edu/studenta/cs208
$ cd cs208 ; browser ./hw1.html
$ mkdir hw1 ; cd hw1 ; make my work files
$ hg add * ; hg commit -m "perfect the first time!" # updates locally only
$ make lots of bug fixes
$ hg commit -m "okay really done now"
$ hg push
# sleep, party, go to class with hangover
$ hg pull
$ browse hw2.html ; mkdir hw2
...
The assignment files that you placed in the student's repository were just for the sake of demonstration. Since you "own" the Rich.univ.edu machine, their pushes become authoritative. You'd
Write a (tiny) script to hg init $student/cs208 on Rich.univ.edu for each student in the roster.
Figure whether HTTPS or SSH works best in your environment
Add commentary - if desired - to the student's files that they'd pick up on their next pull
Have a managed, convenient, logged record of all the interactions.
The students get affirmative feedback at the moment of push that it was accepted
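The "(tiny) script" from the first item could look roughly like this in Python. It only generates the commands; the root directory /srv/hg and the roster names are placeholders I made up.

```python
from pathlib import PurePosixPath


def init_commands(roster, course, root="/srv/hg"):
    """Return one 'hg init' argument list per student (root is a hypothetical path)."""
    return [
        ["hg", "init", str(PurePosixPath(root) / student / course)]
        for student in roster
    ]


for command in init_commands(["studenta", "studentb"], "cs208"):
    print(" ".join(command))
```

On the server, each argument list could then be handed to subprocess.run(command, check=True) to actually create the repositories.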
Finally, should the repository server be down they could
$ hg export tip | mail -s "server down; assignment done" Rich@univ.edu
And you'd still have a timestamped, digested version of their submission which has a rigid format which you could commit for them, or better still:
"Dr. Rich, the server was down!!!"
"But you sent me an export via e-mail, yes?"
"Of course, sir."
"Well, just push when the machine is back up, I already have proof that you completed it on time."
"Oh gee, Dr. Rich, you're swell!"
Personally, I would go for a page with an upload dialog, plus the possibility to list current files, and maybe an FTP server. The problem with email is that the transmission is out of your control until it reaches your server, since the mail is processed by servers other than your own on the way. Mails could be lost or altered on the way, and not all servers accept attachments of a certain size or type. Although the idea is quite good, I think it would produce a less optimal solution than the others, like the mentioned page or FTP server.
edit
I'd prefer msw's way. A version control system would spare you much hassle and many problems. *tips hat to msw*