How to search for code in GitHub with the GitHub API?

I'm trying to search for some piece of code using the GitHub API V3 given only the keyword, not limiting by user, organization, or repository.
For example, if I want to search for all pieces of code that contain the keyword "addClass", the results would be
https://github.com/search?q=addClass&type=Code&ref=searchresults without using GitHub API.
But how can I do the same thing through GitHub API? I tried https://api.github.com/search/code?q=addClass
It says "Must include at least one user, organization, or repository". How can I fix this?

You can do a code search without specifying a user/org/repo if you authenticate.
First, generate a personal access token for this purpose from your profile on GitHub's website:
Settings -> Developer Settings -> Personal Access Token -> Generate New Token (you can leave all access options unticked, since you're just using it to make web requests)
Now, your original GET request will work and return results, if you append the token to it:
https://api.github.com/search/code?q=addClass&access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
UPDATE: OCT 2021
As pointed out by a comment below, passing the token in via a query parameter (like above) is deprecated. You must now add it as an Authorization header.
e.g.
curl --location --request GET 'https://api.github.com/search/code?q=addClass+in:file+language:csharp' \
  --header 'Authorization: Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
or in Python:
import requests

url = "https://api.github.com/search/code"
# Let requests URL-encode the query; qualifiers are separated by spaces.
params = {"q": "addClass in:file language:csharp"}
headers = {
    "Authorization": "Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
response = requests.get(url, params=params, headers=headers)
print(response.text)

2020: As detailed in Mark Z.'s answer, authenticating ('Authorization': 'Token xxxx') allows for a code search without a user, organization, or repository qualifier.
GET /search/code
You can use:
either a dedicated command-line tool like feinoujc/gh-search-cli
ghs code --extension js "import _ from 'lodash'"
or the official GitHub CLI gh (after a gh auth login), as shown in issue 5117:
gh api --method=GET "search/code?q=filename:test+extension:yaml+org:new-org"
Or even:
gh api --method=GET search/code -f q='filename:test extension:yaml org:new-org' \
  --jq '.items[] | [.repository.full_name,.path,.sha] | @tsv'
That would get a line-based, tab-separated list of fields in this order: repo name, file path, git sha. (see gh help formatting)
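For readers who fetch the JSON themselves rather than through gh, the same flattening can be sketched in Python. This is only an illustration: the function name items_to_tsv and the sample values are hypothetical, and the sample merely mimics the three fields the jq filter above reads from each item.

```python
def items_to_tsv(search_response):
    """Return one tab-separated line per hit: repo name, file path, git sha."""
    lines = []
    for item in search_response.get("items", []):
        fields = [item["repository"]["full_name"], item["path"], item["sha"]]
        lines.append("\t".join(fields))
    return lines


# Hypothetical sample shaped like a /search/code response (values are made up).
sample = {
    "total_count": 2,
    "items": [
        {"repository": {"full_name": "new-org/app"}, "path": "ci/test.yaml", "sha": "abc123"},
        {"repository": {"full_name": "new-org/lib"}, "path": "test.yaml", "sha": "def456"},
    ],
}

for line in items_to_tsv(sample):
    print(line)
```

The function takes an already-parsed response, so it works the same whether the JSON came from gh api, curl, or requests.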
2014 (original answer): That seems related to the new restriction "New Validation Rule for Beta Code Search API" (Oct. 2013)
In order to support the expected volume of requests, we’re applying a new validation rule to the Code Search API. Starting today, you will need to scope your code queries to a specific set of users, organizations, or repositories.
So, the example of the API search code mentions now:
Suppose you want to find the definition of the addClass function inside jQuery. Your query would look something like this:
https://api.github.com/search/code?q=addClass+in:file+language:js+repo:jquery/jquery

While GitHub does not currently support code search without a repo, user, or organization (see VonC's answer), searchcode does index some sources from GitHub and exposes them through its codesearch API, albeit with a less fully featured API than GitHub's.
For example, to search for wget invocations indexed from Github, call
curl "https://searchcode.com/api/codesearch_I/?q=wget&src=2"
The optional src parameter picks the code source (e.g., GitHub, Bitbucket) to search. You can find the integer value for a source by altering the parameters of the faceted search in the searchcode UI. The current value of src for GitHub is 2.
You can verify that the results returned by the above example come from github.com by viewing the repo property of the result items.

Related

How do I programmatically extract GitHub repositories that contain a code string?

I am looking for a way to extract GitHub repositories containing files with a certain code string. I can do this manually using the GitHub search bar. For instance, if I'm looking for usages of the library pymc3, I could search for it in the search bar and then click on Code.
How does one do this programmatically?
I tried going over the GitHub Search API documentation. The Search Code functionality allows looking into code, but it seems to search only within a given user, organization, or repository. The Search Repositories functionality only looks into the description, title and README.
Update 1:
While browsing this post, I believe I found the answer to identify some repositories that contain a code string.
If I write the following code -
import requests

url = "https://api.github.com/search/code?q=pymc3+in:file"
headers = {
    'Authorization': 'Token xxxxxxxxxxxxxxxxx'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
I get the following result -
"total_count":43642,"incomplete_results":false,"items":[{"name":"pymc3_stoch_vol ...
However, the result gives me a bunch of information such as the git URL, HTML URL and some of the repositories that contain this string. I need to find a way to extract all the repositories that contain this string.
Update 2:
I now understand that GitHub limits results to 100 per page and 1000 results overall.
The only remaining question is why I couldn't find this information in the GitHub Search API documentation. Please do let me know if my understanding or the linked answer is wrong.
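Given the limits described in Update 2 (100 results per page, 1000 overall), collecting the distinct repositories amounts to walking the pages and de-duplicating on each item's repository name. The following is a minimal sketch, not a definitive implementation: collect_repositories, make_github_fetcher and fake_fetch are hypothetical names, the real fetcher is defined but never called here, and it assumes the token-in-header authentication shown in the code above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def collect_repositories(fetch_page, per_page=100, max_results=1000):
    """Collect unique repository names from paged code search results.

    fetch_page(page, per_page) must return the parsed JSON of one page.
    """
    repos, seen = [], set()
    for page in range(1, max_results // per_page + 1):
        items = fetch_page(page, per_page).get("items", [])
        for item in items:
            name = item["repository"]["full_name"]
            if name not in seen:
                seen.add(name)
                repos.append(name)
        if len(items) < per_page:
            break  # a short page means there is nothing left to fetch
    return repos


def make_github_fetcher(query, token):
    """Build a fetch_page function that calls the code search endpoint."""
    def fetch_page(page, per_page):
        qs = urlencode({"q": query, "per_page": per_page, "page": page})
        req = urllib.request.Request(
            "https://api.github.com/search/code?" + qs,
            headers={"Authorization": f"Token {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


# Offline demonstration with made-up data: two pages of size 2.
def fake_fetch(page, per_page):
    pages = {
        1: ["alice/models", "bob/stats"],
        2: ["alice/models"],
    }
    return {"items": [{"repository": {"full_name": n}} for n in pages.get(page, [])]}


print(collect_repositories(fake_fetch, per_page=2, max_results=10))
```

With a real token, collect_repositories(make_github_fetcher("pymc3 in:file", token)) would issue at most ten requests; repositories whose matches fall beyond the first 1000 results would still be missed.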
This kind of query would be better served by the GraphQL API, but that API still does not support searching code.
Only the new code-search (presented here) might be able to provide that, but:
it is still in beta
its API is not yet public.
So for now, code search in all GitHub repositories is not supported.

APIs Explorer's Firestore's projects.databases.documents.list has incorrect regex

Summary: APIs Explorer "Try this API" for projects.databases.documents.list appears (!?) to have an overly restrictive/incorrect regex on parent and requirement on collectionId.
Neither gcloud firestore nor firebase firestore: provides functionality to list collections, so I'm planning to write a simple app to do so.
As always, I explored the APIs methods using the excellent APIs Explorer but the projects.databases.documents.list "Try this API" appears (!?) to have an overly restrictive|incorrect regex on parent.
The documentation correctly states that:
https://firestore.googleapis.com/v1/{parent=projects/*/databases/*/documents/*/**}/{collectionId}
And:
Required. The parent resource name. In the format:
projects/{project_id}/databases/{database_id}/documents or
projects/{project_id}/databases/{database_id}/documents/{document_path}.
For example: projects/my-project/databases/my-database/documents or projects/my-project/databases/my-database/documents/chatrooms/my-chatroom
Using Google's first format example: projects/my-project/databases/my-database/documents does not work:
APIs Explorer only accepts the second format example for parent but then requires a value for collectionId which may not be desired:
APIs Explorer appends collectionId to the parent to form the URL. This would make sense if parent could end in /documents (which isn't permitted), to access the chatrooms collection, or if parent ends in /documents/chatrooms/my-chatroom, to get the messages collection (within my-chatroom). However, the requirement prevents using APIs Explorer (!) to query projects/my-project/databases/my-database/documents/chatrooms: collectionId is required and would need to be chatrooms, but a parent of projects/my-project/databases/my-database/documents is not permitted.
Using one of my projects (${PROJECT}) and (default) for {database_id}, I can use the documentation's examples correctly in curl:
TOKEN=$(gcloud auth print-access-token)
PROJECT=...
PARENT="projects/${PROJECT}/databases/(default)/documents"
COLLECTION=...
curl \
--header "Authorization: Bearer ${TOKEN}" \
--header 'Accept: application/json' \
--write-out '%{response_code}' \
--output /dev/null \
--silent \
https://firestore.googleapis.com/v1/${PARENT}/${COLLECTION}
200
The API works correctly when navigating down through subcollections too.
Posting this here in the hopes that, if APIs Explorer is indeed incorrect, fixing it can help other developers not encounter this issue and be discouraged.
Note: Since I'm posting feedback, the tool doesn't correctly adjust cURL, HTTP or JavaScript generated examples to reflect the checkbox value on "API Key"; when the "API Key" is deselected, the parameter should not be included in the calls.
It seems to be a problem with the regular expression in the API explorer. I need to enter the parent as projects/fire-template-24/databases/(default)/documents/users/ (including the forward slash) and also enter the collectionId as it's required (users in my case). This seems to be making a query at:
https://[GOOGLE+APIS_URL]/projects/[ID]/databases/(default)/documents/users//users?key=[API_KEY]
This returns an error about the extra trailing slash, which the API explorer forced me to add, and the collectionId field is then appended to it with another /.
The following regular expression seems to be working:
^projects\/[^/]+\/databases\/[^/]+\/documents(\/[^/]+(\/[^/]+\/[^/]+)*)$
The above RegEx doesn't require a trailing forward slash in parent and matches a collection path only. If the collectionId is removed then it should work. Working example of the RegEx can be found here.
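To check what the suggested expression accepts, it can be run against a few candidate parent values. This is a quick sketch using Python's re module; the pattern is the one proposed in this answer, not whatever APIs Explorer actually uses, and the sample paths reuse the documentation's chatrooms example.

```python
import re

# Pattern proposed in the answer: matches collection paths only.
COLLECTION_PATH = re.compile(
    r"^projects\/[^/]+\/databases\/[^/]+\/documents(\/[^/]+(\/[^/]+\/[^/]+)*)$"
)

BASE = "projects/my-project/databases/(default)/documents"
candidates = [
    BASE,                                       # database root, no collection
    BASE + "/chatrooms",                        # top-level collection
    BASE + "/chatrooms/",                       # trailing slash
    BASE + "/chatrooms/my-chatroom",            # a document, not a collection
    BASE + "/chatrooms/my-chatroom/messages",   # subcollection
]

for path in candidates:
    print(bool(COLLECTION_PATH.match(path)), path)
```

Only the top-level collection and the subcollection match; the database root, the trailing-slash form, and the document path are rejected.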
I could also observe the same in the API explorer page. I observed that there is an existing issue in the Public Issue Tracker for the same. As mentioned in the #2 comment of the issue, this issue report has been forwarded to the Firestore Engineering team to investigate, but there is no ETA for a resolution. I would suggest you star the issue so that you will get notified whenever there is any update on the issue.

List all (private) repositories of a GitHub organization

I am the owner of a GitHub organization. All repos in that org are set to private.
In the web UI dashboard, I can see that there are 112 repos in my organization. However, when I request all repositories via API (https://docs.github.com/en/rest/reference/repos#list-organization-repositories) I only get around 30 of these back.
curl -i -u username:oauth-token https://api.github.com/orgs/org/repos
Adding a query string like ?type=all to the URL does not make any difference.
Thank you for your help and ideas.
K
The trick is to use the paging query parameters AND to quote the request URI.
curl -i -u username:oauth-token "https://api.github.com/orgs/org/repos?per_page=100&page=1"
By default, the GitHub API returns 30 results per page here. As the linked documentation says, try setting per_page (max 100) to get more:
https://api.github.com/orgs/org/repos?per_page=100
And use page parameter to get next pages:
https://api.github.com/orgs/org/repos?per_page=100&page=2
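For the 112 repositories mentioned in the question, the page arithmetic works out to two requests at per_page=100. A small sketch of generating those URLs follows; the helper name repo_page_urls is hypothetical, and "org" is the placeholder organization name used above.

```python
import math


def repo_page_urls(org, total_repos, per_page=100):
    """Return the URLs needed to list total_repos repositories of org."""
    per_page = min(per_page, 100)  # the API caps per_page at 100
    pages = math.ceil(total_repos / per_page)
    return [
        f"https://api.github.com/orgs/{org}/repos?per_page={per_page}&page={page}"
        for page in range(1, pages + 1)
    ]


for url in repo_page_urls("org", 112):
    print(url)
```

When the total is not known in advance, the usual alternative is to keep requesting the next page until one comes back with fewer than per_page entries.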

Azure DevOps/VSTS REST API does not get changes of a changeset

I'm trying to get the changes of a changeset but it returns 404. I used this:
https://<myname>.visualstudio.com/<projectname>/_apis/tfvc/changesets/291/changes
changeset exists
without the '/changes' it works, returns the changeset info but I also need the merge sources
tried to specify the API version (e.g.: api-version-5.0)
I created a full control Personal Access Token for the client app but no luck. I tried to use this link in the browser and I got the same result: it works only without '/changes'.
What did I do wrong?
As this is an old question, this is for anyone else who has the same problem: the project name needs to be removed from the request.
https://<myname>.visualstudio.com/_apis/tfvc/changesets/291/changes
If you look at the docs, sure enough, the project name is not there, but most other REST calls require one, so it can be confusing.
Also the docs are not very clear that you can interchange https://{myName}.visualstudio.com/ for the documented https://dev.azure.com/{organization}
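The two points above (no project segment, and interchangeable hosts) can be captured in a small URL builder. This is a sketch under assumptions: the function name and the legacy_host flag are hypothetical, and passing api-version as a query parameter with the value 5.0 is an assumption based on the version mentioned in the question.

```python
def changeset_changes_url(organization, changeset_id, api_version="5.0", legacy_host=False):
    """Build the organization-level URL for the changes of a TFVC changeset."""
    if legacy_host:
        base = f"https://{organization}.visualstudio.com"
    else:
        base = f"https://dev.azure.com/{organization}"
    # Note: no project name between the host part and _apis.
    return f"{base}/_apis/tfvc/changesets/{changeset_id}/changes?api-version={api_version}"


print(changeset_changes_url("myname", 291))
print(changeset_changes_url("myname", 291, legacy_host=True))
```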

Github API to get the author/committer of specific line of code

I'm using the GitHub code search API to search for some text in a given repository. The response doesn't seem to contain any information about the content that matched, the sha/hash of the commit, or the author/committer:
https://api.github.com/search/code?q=addClass+repo:jquery/jquery
Is there any way to get the author/committer of the given line of code in a repo using the Github API?
Thanks!
To search commits, we can use the new Commit Search API, which is currently available to developers as a preview. During the preview period we need to explicitly specify a custom media type in the 'Accept' header. For example, your search using this API could be:
curl -H 'Accept: application/vnd.github.cloak-preview' \
  'https://api.github.com/search/commits?q=addClass+repo:jquery/jquery'
Ref: Search Commits
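The same commit search request can be prepared in Python. The sketch below only builds the request object and sends nothing; commit_search_request is a hypothetical helper, and the optional token header follows the 'Authorization: Token ...' form used for code search, which is an assumption for this endpoint. Note that this searches commit messages, so it does not by itself identify who wrote a particular line.

```python
import urllib.request
from urllib.parse import urlencode


def commit_search_request(query, token=None):
    """Prepare a GET request for /search/commits with the preview media type."""
    url = "https://api.github.com/search/commits?" + urlencode({"q": query})
    headers = {"Accept": "application/vnd.github.cloak-preview"}
    if token:
        headers["Authorization"] = f"Token {token}"
    return urllib.request.Request(url, headers=headers)


req = commit_search_request("addClass repo:jquery/jquery")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the result to urllib.request.urlopen would perform the call.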