How do I programmatically extract GitHub repositories that contain a code string? - github

I am looking for a way to extract GitHub repositories containing files with a certain code string. I can do this manually using the GitHub search bar. For instance, if I'm looking for usages of the library pymc3, I can search for it in the search bar and then click on Code.
How does one do this programmatically?
I tried going over the GitHub Search API documentation. The Search Code functionality allows looking into code, but it seems to search only within a given user, organization, or repository. The Search Repositories functionality only looks into the description, title and README.
Update 1:
While browsing this post, I believe I found the answer to identify some repositories that contain a code string.
If I write the following code -
import requests
# Query terms are joined with "+", which the API reads as a space.
url = "https://api.github.com/search/code?q=pymc3+in:file"
headers = {
    'Authorization': 'Token xxxxxxxxxxxxxxxxx'
}
response = requests.get(url, headers=headers)
print(response.text)
I get the following result -
"total_count":43642,"incomplete_results":false,"items":[{"name":"pymc3_stoch_vol ...
However, the result gives me a bunch of information such as the git URL, HTML URL and some of the repositories that contain this string. I need to find a way to extract all the repositories that contain this string.
Update 2:
I now understand that GitHub limits results to 100 per page and 1000 results overall.
The only remaining question is why I didn't find this information in the GitHub Search API documentation. Please let me know if my understanding or the linked answer is wrong.
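A minimal sketch of how those limits shape a script that collects repository names: it requests 100 items per page and stops after 1000 results or the first short page. It uses the standard library's urllib instead of requests to stay dependency-free. The function names (fetch_page, collect_repos) are made up for illustration, the token is a placeholder, and rate limiting is not handled.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.github.com/search/code"
PER_PAGE = 100      # page-size limit mentioned above
MAX_RESULTS = 1000  # overall result limit mentioned above


def fetch_page(query, page, token):
    """Fetch one page of code-search results and return the decoded JSON."""
    params = urllib.parse.urlencode(
        {"q": query, "per_page": PER_PAGE, "page": page}
    )
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_repos(query, token, fetch=fetch_page):
    """Collect unique repository names across the reachable result pages.

    `fetch` is injectable so the paging logic can be tried without network access.
    """
    repos = set()
    for page in range(1, MAX_RESULTS // PER_PAGE + 1):
        items = fetch(query, page, token).get("items", [])
        repos.update(item["repository"]["full_name"] for item in items)
        if len(items) < PER_PAGE:  # a short page means there is nothing left
            break
    return sorted(repos)
```

Calling collect_repos("pymc3 in:file", "<your token>") would return at most the repositories behind the first 1000 matches, so it cannot enumerate every repository for a query with tens of thousands of hits.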

This kind of query would be a better fit for the GraphQL API, but searching code is still not supported there.
Only the new code-search (presented here) might be able to provide that, but:
it is still in beta
its API is not yet public.
So for now, code search in all GitHub repositories is not supported.

Related

Confluence Cloud - How to update content status via the API

I'm struggling with how to find the content status of a page in Confluence.
My end goal is to be able to change/update via the API.
I've added the list of statuses already in the Manage space section. I have successfully pulled the content of a page as well as its properties, but I can’t seem to find where the content status is stored.
Here is the URI I'm using:
https://MyDomain.atlassian.net/wiki/rest/api/content/145468621376?expand=space,body.storage,view,version.status,container,extentions
That Status is not present in OOB Confluence, so I suppose it is some third-party app from Atlassian Marketplace and you need to check with their documentation how to interact with it.
Of course, you can directly use the REST API to get page content as a string (HTML) and change its content, e.g. using Python (Atlassian Python API’s documentation)
Page actions
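# Sketch using atlassian-python-api; `confluence` is a configured Confluence client,
# and page_id, parent_id and title are placeholders you supply.
my_page = confluence.get_page_by_id(page_id, expand="body.storage")
new_body = my_page["body"]["storage"]["value"].replace("<macro .....>", "<new status in HTML>")
confluence.update_or_create(parent_id, title, new_body, representation="storage")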
Finally sorted it out with the help of Atlassian support. If their documentation was correct it would've been super easy to do this.
https://developer.atlassian.com/cloud/confluence/rest/api-group-content-states/#api-group-content-states
Here's the catch: when you GET the state, you have to include the status parameter even though it's optional. So your GET request needs to look like this:
your-domain.atlassian.net/wiki/rest/api/content/{id}/state?status=current
The same goes for setting the new state: you have to add the status parameter to the URI.
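As a rough illustration of that catch, here is a small helper that always appends the status parameter when building the content-state URL. The helper name content_state_url is made up for this sketch, and the domain is a placeholder.

```python
from urllib.parse import urlencode


def content_state_url(domain, content_id, status="current"):
    """Build the content-state URL, always including the status parameter."""
    query = urlencode({"status": status})
    return f"https://{domain}/wiki/rest/api/content/{content_id}/state?{query}"


# Example: the URL to use for both reading and setting the state of one page.
print(content_state_url("your-domain.atlassian.net", "145468621376"))
```

The same URL would be used for the GET and for the request that sets the new state; only the HTTP method and the request body differ.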

How to fetch JSON about a particular repository using github API

I've read the official documentation (GitHub docs) but it's not very clear about fetching information about a particular repo.
I want to fetch information (in JSON) about a particular repo of mine using browser only (and not Postman).
I tried this URL and it is fetching me all the repositories.
https://api.github.com/users/tmtanzeel/repos
But I need info about a particular repo only. I tried these:
https://api.github.com/users/tmtanzeel/repos/angular-project
https://api.github.com/users/tmtanzeel/repos?name=angular-project
https://api.github.com/users/tmtanzeel/repos?id=191101189
But none is working. Please pitch in.
Ok, I got it. I figured it out.
Solution:
Instead of /users/<username>/repos, we should be using /repos/<username>/<repo-name>.
Sample:
https://api.github.com/repos/tmtanzeel/repo-name
You'll find a lot more information with the above URL.
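For anyone who later wants the same JSON from a script rather than the browser, a minimal sketch with the standard library could look like this. The function names repo_url and fetch_repo are made up for illustration; unauthenticated requests only work for public repositories and are rate limited.

```python
import json
import urllib.request


def repo_url(owner, repo):
    """Build the API URL for a single repository."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def fetch_repo(owner, repo):
    """Fetch the repository metadata and return it as a dict."""
    with urllib.request.urlopen(repo_url(owner, repo)) as response:
        return json.load(response)
```

For example, fetch_repo("tmtanzeel", "angular-project") requests https://api.github.com/repos/tmtanzeel/angular-project.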

I am not able to filter issue via github search api

I am a collaborator for a private repository and able to edit, push code, create issues, close issues, etc on it. I am trying to create a report of issues open and closed on the repository. To achieve this I needed to get issues based on time interval and label. I found that the GitHub search API will be useful for me.
I started out by creating a token (PAT) giving it the whole repo scope
Then to test the API I hit the below URL with the token
https://api.github.com/search/issues?q=repo:orgname/reponame
I am able to get the results.
Then I tried to narrow down by adding is:issue and is:closed qualifier using the same token
https://api.github.com/search/issues?q=repo:orgname/reponame+is:issue+is:closed
I got the below response
{
"message": "Validation Failed",
"errors": [
{
"message": "The listed users and repositories cannot be searched either because the resources do not exist or you do not have permission to view them.",
"resource": "Search",
"field": "q",
"code": "invalid"
}
],
"documentation_url": "https://docs.github.com/v3/search/"
}
The issues are present and I can search for them on the GitHub website, but not via the GitHub search API. I am able to apply a repo qualifier but cannot add any other qualifiers.
What am I missing here?
There are two things I've found that can cause this.
In your case it's likely permissions since your encoding appears to be fine.
Searching specifically for PRs or Issues on private repos requires 'content' permissions (this is incorrectly documented in the GitHub docs as requiring metadata permissions). If a user has no public repos but they do have private ones then you get a permission error like the above, rather than the empty response that you get if they have no repos of any kind or only public repos but no matching results.
The other thing I've found that causes this is incorrect encoding of the query. An easy error to make there (from experience) is having the + sign in the query and then encoding it. This encodes the + as %2B, whereas what you want is a space between each query term. The space is then encoded as +. Making this encoding mistake will also result in the same permission/not found error.
I appreciate this is probably a little late for you, but hopefully it helps others.
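To make the encoding point concrete, here is a small sketch using Python's urllib.parse. The query terms are joined with spaces before encoding, so each space comes out as +. The helper name search_issues_url is made up for illustration.

```python
from urllib.parse import quote_plus, urlencode


def search_issues_url(terms):
    """Join the qualifiers with spaces, then URL-encode the whole query once."""
    query = urlencode({"q": " ".join(terms)})
    return f"https://api.github.com/search/issues?{query}"


good = search_issues_url(["repo:orgname/reponame", "is:issue", "is:closed"])
print(good)

# The mistake: the terms are already joined with "+", and that string is then
# encoded, so the literal plus signs are escaped instead of acting as separators.
bad = quote_plus("repo:orgname/reponame+is:issue+is:closed")
print(bad)
```

In the first URL the separators are plain + characters; in the second string they have become %2B, which the API treats as literal plus signs inside a single term.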

How to get a link to a file from a VSO repo

When I browse a GitHub repo, I can copy the URL from the browser, and I can share it like this -
https://github.com/zlatko-michailov/onesql/blob/master/lang/src/onesql.syntax.ts. The file content is returned in the http response stream without any decorations.
How can I do the same thing for a VSO repo? If I have to tweak the URL a little bit, that's OK.
I see the browser uses a REST API that is documented here - https://learn.microsoft.com/en-us/rest/api/azure/devops/git/items/get?view=azure-devops-rest-5.0. I played with different combinations of includeContent, $format, download, etc., but I could only get the content as a separate download, not in the http response body.
The subject file is some CSV data, and the client is Excel, which doesn't seem to be able to handle downloads.
I solved my own problem. There is no need to create a feed.
The API that fetches raw files is sourceProviders. The link is here: https://learn.microsoft.com/en-us/rest/api/azure/devops/build/source%20providers/get%20file%20contents?view=azure-devops-rest-5.0
It is not very well documented - examples for the required parameters are missing. The tricky one is sourceProvider. It has to be tfsgit. Skipping serviceEndpointId worked for me.
Here is the pattern:
GET https://dev.azure.com/{organization}/{project}/_apis/sourceProviders/tfsgit/filecontents?repository={repository}&commitOrBranch={commitOrBranch}&path={path}&api-version=5.0-preview.1
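A short sketch of filling in that pattern safely, so that slashes and spaces in the file path get URL-encoded. The function name file_contents_url and the sample values are made up for illustration; only the URL shape comes from the pattern above.

```python
from urllib.parse import quote, urlencode


def file_contents_url(organization, project, repository, commit_or_branch, path):
    """Build the sourceProviders file-contents URL for a tfsgit repository."""
    query = urlencode({
        "repository": repository,
        "commitOrBranch": commit_or_branch,
        "path": path,
        "api-version": "5.0-preview.1",
    })
    return (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_apis/sourceProviders/tfsgit/filecontents?{query}"
    )


print(file_contents_url("org", "proj", "repo", "master", "/data/file.csv"))
```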

How to search for code in GitHub with GitHub API?

I'm trying to search for some piece of code using the GitHub API V3 given only the keyword, not limiting by user, organization, or repository.
For example, if I want to search for all pieces of code that contain the keyword "addClass", the results would be
https://github.com/search?q=addClass&type=Code&ref=searchresults without using GitHub API.
But how can I do the same thing through GitHub API? I tried https://api.github.com/search/code?q=addClass
It says "Must include at least one user, organization, or repository". How can I fix this?
You can do a code search without specifying a user/org/repo if you authenticate.
First, generate a personal access token for use for this purpose, from your Profile on GitHub's website:
Settings -> Developer Settings -> Personal Access Token -> Generate New Token (you can leave all access options unticked, since you're just using it to make web requests)
Now, your original GET request will work and return results, if you append the token to it:
https://api.github.com/search/code?q=addClass&access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
UPDATE: OCT 2021
As pointed out by a comment below, passing the token in via a query parameter (like above) is deprecated. You must now add it as an Authorization header.
e.g.
curl --location --request GET 'https://api.github.com/search/code?q=addClass+in:file+language:csharp' \
--header 'Authorization: Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
or in Python:
import requests
url = "https://api.github.com/search/code?q=addClass +in:file +language:csharp"
headers = {
'Authorization': 'Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
2020: As detailed in Mark Z.'s answer, authenticating ('Authorization': 'Token xxxx') allows for a code search.
GET /search/code
You can use:
either a dedicated command-line tool like feinoujc/gh-search-cli
ghs code --extension js "import _ from 'lodash'"
or the official GitHub CLI gh (after a gh auth login), as shown in issue 5117:
gh api --method=GET "search/code?q=filename:test+extension:yaml+org:new-org"
Or even:
gh api --method=GET search/code -f q='filename:test extension:yaml org:new-org' \
--jq '.items[] | [.repository.full_name,.path,.sha] | @tsv'
That would get a line-based, tab-separated list of fields in this order: repo name, file path, git sha. (see gh help formatting)
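If you are already in Python, the same three fields can be pulled out of the decoded JSON response. This is only a sketch: the function name items_to_tsv is made up, and the payload below is a hand-written stand-in for a real search/code response.

```python
def items_to_tsv(payload):
    """Turn a search/code response into tab-separated lines: repo, path, sha."""
    rows = []
    for item in payload.get("items", []):
        fields = [item["repository"]["full_name"], item["path"], item["sha"]]
        rows.append("\t".join(fields))
    return "\n".join(rows)


# Hand-written sample shaped like a search/code response (values are invented).
sample = {
    "items": [
        {"repository": {"full_name": "new-org/demo"}, "path": "ci/test.yaml", "sha": "abc123"},
    ]
}
print(items_to_tsv(sample))
```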
2014 (original answer): That seems related to the new restriction "New Validation Rule for Beta Code Search API" (Oct. 2013)
In order to support the expected volume of requests, we’re applying a new validation rule to the Code Search API. Starting today, you will need to scope your code queries to a specific set of users, organizations, or repositories.
So, the example of the API search code mentions now:
Suppose you want to find the definition of the addClass function inside jQuery. Your query would look something like this:
https://api.github.com/search/code?q=addClass+in:file+language:js+repo:jquery/jquery
While GitHub does not currently support code search without a repo, user, or organization (see VonC's answer), searchcode does index some sources from GitHub via its codesearch API, albeit with an API less fully featured than GitHub's.
For example, to search for wget invocations indexed from Github, call
curl "https://searchcode.com/api/codesearch_I/?q=wget&src=2"
The optional src parameter picks the code source (e.g., GitHub, Bitbucket) that should be searched, and you can find the integer value for a source by altering the parameters of faceted search in the searchcode UI. The current value of src for GitHub is 2.
You can verify that the returned results from the above example come from github.com by viewing the repo property of the result items.
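A minimal sketch of that check in Python. It assumes the response JSON keeps its hits in a "results" list whose items carry a "repo" URL, which is worth confirming against a real response. The function names searchcode_url and github_only are made up for illustration, and the sample payload is hand-written.

```python
from urllib.parse import urlencode


def searchcode_url(query, src=2):
    """Build the search URL; src=2 is the GitHub source value quoted above."""
    return "https://searchcode.com/api/codesearch_I/?" + urlencode({"q": query, "src": src})


def github_only(payload):
    """Keep only results whose repo property points at github.com.

    Assumption: hits live under a "results" key and each has a "repo" field.
    """
    return [r for r in payload.get("results", []) if "github.com" in r.get("repo", "")]


# Hand-written sample payload (values are invented).
sample = {
    "results": [
        {"repo": "https://github.com/example/one.git"},
        {"repo": "https://bitbucket.org/example/two"},
    ]
}
print(searchcode_url("wget"))
print(github_only(sample))
```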