Is there a way to get the number of repositories per language using GitHub's API?

I would like to use the GitHub API to retrieve the number of repositories for each language. For example:
C++ 200,134
Java 175,432
C# 123,453
...

The only API with a filter parameter would be the repository search one:
GET /legacy/repos/search/:keyword
with the optional parameter language.
But that would return a list of repositories spread over multiple pages, so you would still need to compute the sum yourself.
Note that, as of very recently (early March 2013), the API might limit a search to 1000 results.

Following up on VonC's answer, the search API will now give you the total number of results matched by your query. So you can use this to get the total number of repositories for one particular language:
GET /search/repositories?q=language:languagename
The language name is case-insensitive, must be URL-encoded, and spaces must be replaced with dashes. For example (Objective C++):
GET /search/repositories?q=language:objective-c%2B%2B
{
"total_count": 2090,
...
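
To make the encoding rules above concrete, here is a minimal Python sketch. The function names are my own, and per_page=1 is only an assumption to keep the response small, since only total_count is needed. The network call is kept in a separate function; unauthenticated requests are rate-limited.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.github.com/search/repositories"


def language_query_url(language: str) -> str:
    """Build the search URL for one language (hypothetical helper).

    Spaces become dashes, then the name is URL-encoded ("+" -> "%2B").
    """
    name = language.strip().lower().replace(" ", "-")
    return f"{SEARCH_URL}?q=language:{quote(name, safe='')}&per_page=1"


def total_count(response_body: str) -> int:
    """Extract the total_count field from a search response body."""
    return json.loads(response_body)["total_count"]


def count_repositories(language: str) -> int:
    """Perform the request; needs network access."""
    request = Request(
        language_query_url(language),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request) as response:
        return total_count(response.read().decode("utf-8"))
```

Calling count_repositories once per language of interest then gives the per-language table asked for in the question, one request per language.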

Related

REST Protocol for searching and filtering

The standard REST verb for returning a value GET can take different parameters to select what to "get". Often there is one that takes an id to get a single value, and often some sort of search criteria to get a list.
Is there a standard way to specify the filtering and sorting of the data that is being searched for? For example, if I have an invoice record I'd like to write a GET query that says "give me all invoices for customer 123, with total > $345 and return in descending order of date".
If I were writing this myself I'd have something like:
GET http://example.com/mydata?query="customer=123&&total>345.00"&order="date"
(Note I didn't urlencode the url for clarity, though obviously that is required in practice, but I hope you get what I mean.)
I can certainly write something for this, but I am wondering if there is a standardized way to do this?
Is there a standard way to specify the filtering and sorting of the data that is being searched for?
Not that I'm aware of.
Note that HTTP doesn't really have queries (yet); HTTP has resource identifiers.
We've got a standard for resource identifiers (RFC 3986) and a standard for URI templates (RFC 6570) that describes how to produce a range of identifiers via variable expansion.
But as far as I can tell there is no published "standard" that automatically transforms a URI into a SQL query.
It's possible that one of the "convention over configuration" frameworks (ex: Rails) might have something useful here, but I haven't found it.
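
For what it's worth, here is how the home-grown URL from the question looks once it is actually URL-encoded. This is only a sketch of that one ad-hoc convention (the query and order parameter names come from the question, not from any standard), using Python's standard library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def invoice_search_url(base: str, query: str, order: str) -> str:
    """Encode an ad-hoc filter expression and sort key into a query string."""
    return f"{base}?{urlencode({'query': query, 'order': order})}"


url = invoice_search_url(
    "http://example.com/mydata", "customer=123&&total>345.00", "date"
)
print(url)

# The server side can recover the original values unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0], decoded["order"][0])
```

The server still has to parse and validate the filter expression itself; nothing in HTTP or the URI RFCs does that for you.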

How can I specify multiple languages when sending a GET request to GitHub search API

I wonder how I can send a GET request to the GitHub search API, specifically https://api.github.com/search/repositories, and make the query include several languages instead of one.
Here's my current query.
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala&sort=stars&order=desc&per_page=10
I have tried doing something like this, but it didn't work either:
https://api.github.com/search/repositories?q=stars:%3E=1000+language:[scala, java]&sort=stars&order=desc&per_page=10
Thanks for your help
As per the docs, you need to repeat the language: qualifier, once for each language you want to include in the query.
For your specific case, the query would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc
with pagination applied it would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc&per_page=10
However, with pagination applied, the results shown in the browser will be limited to a single page (here, 10 items).
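
If you build these URLs in code, a small helper avoids hand-encoding the query. This is a sketch in Python; the function name and its parameters are my own. Note that urlencode also escapes the colons (%3A), which is equivalent to the literal colons in the URLs above.

```python
from typing import Iterable
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def multi_language_search_url(
    min_stars: int, languages: Iterable[str], per_page: int = 10
) -> str:
    """Build a search URL with one language: qualifier per language."""
    qualifiers = [f"stars:>={min_stars}"]
    qualifiers += [f"language:{name}" for name in languages]
    params = {
        "q": " ".join(qualifiers),  # spaces are encoded as "+"
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


print(multi_language_search_url(1000, ["scala", "java"]))
```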

GitHub Api: list of all repos with a given language

Yes, there is this question:
Github API: How to get all repositories written in a given language
however the answer provided only returns 100 results.
So how can I get the list of ALL repositories for a given language,
e.g. for Mathematica
curl https://api.github.com/search/repositories?q=language:mathematica
says there are 8000+ items that I should get, but this returns only top 30...
I have tried since
As suggested by @Bertrand Martel, adding
&page=<page>&per_page=100
works.
You just have to request page 1 with 1 result per page to get total results, and then iterate over pages as needed.
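
The page arithmetic can be sketched like this in Python. The helper name is my own, and the max_results=1000 default is an assumption reflecting the cap on search results; with that cap, a language with 8000+ repositories cannot be listed completely through a single query, so you would have to split the search into narrower ones.

```python
import math
from typing import List
from urllib.parse import quote

SEARCH_URL = "https://api.github.com/search/repositories"


def search_page_urls(
    language: str,
    total_count: int,
    per_page: int = 100,
    max_results: int = 1000,  # assumed cap on results reachable per query
) -> List[str]:
    """List the page URLs needed to walk the reachable results."""
    reachable = min(total_count, max_results)
    pages = math.ceil(reachable / per_page)
    query = f"q=language:{quote(language, safe='')}"
    return [
        f"{SEARCH_URL}?{query}&page={page}&per_page={per_page}"
        for page in range(1, pages + 1)
    ]


for url in search_page_urls("mathematica", total_count=250):
    print(url)
```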

REST API Get single latest resource

I'm designing a REST api and interested if anyone can help with best practice in the following scenario.
I have...
GET Customers/{customerId}/Orders - to get all customer orders
GET Customers/{customerId}/Orders/{orderId} - to get a particular order
I need to provide the ability to get a customer's most recent order. What is best practice in this scenario? Simply get all and sort by date, or provide a specific method?
I need to provide the ability to get a customer's most recent order.
Of course you could provide query parameters to filter, sort and slice the orders collection, but why not make it simpler and return the latest order directly if that is what the client needs?
You could use something like (returning a representation of a single order):
GET /customers/{customerId}/orders/latest
The above URL maps to an order that will change over time, and that's perfectly fine.
Say there is also a case where you need the last 5 orders. What would your route(s) look like?
The above approach focuses on the requirement to get a customer's most recent order. If a requirement to return the last 5 orders eventually comes up, I would probably introduce another mapping such as /recent that returns a representation of a collection with the recent orders and accepts a query parameter indicating the number of orders to return (5 would be the default if the parameter is omitted).
The /latest mapping would still be valid and would return a representation of the very latest order only.
Providing query parameters to filter, sort and slice the orders collection is still a valid approach.
The key is: if you know the client that will consume the API, target it to their needs. Otherwise, make it more generic. And when modifying the API, be careful with breaking changes; versioning the API is also a good idea.
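
As a rough, framework-free illustration of what the /latest and /recent mappings would return, here is a Python sketch. The in-memory order dictionaries, the created_at field name and both function names are assumptions for the example only.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Order = Dict[str, Any]


def latest_order(orders: List[Order]) -> Optional[Order]:
    """What GET .../orders/latest would return: one order, or None."""
    return max(orders, key=lambda order: order["created_at"], default=None)


def recent_orders(orders: List[Order], limit: int = 5) -> List[Order]:
    """What GET .../orders/recent would return: newest first, 5 by default."""
    newest_first = sorted(
        orders, key=lambda order: order["created_at"], reverse=True
    )
    return newest_first[:limit]


orders = [
    {"id": 1, "created_at": datetime(2021, 1, 5)},
    {"id": 2, "created_at": datetime(2021, 3, 9)},
    {"id": 3, "created_at": datetime(2021, 2, 1)},
]
print(latest_order(orders)["id"])
print([order["id"] for order in recent_orders(orders, limit=2)])
```

Note the difference in shape: /latest yields a single order representation, while /recent yields a collection even when the limit is 1.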
I think there is no need for another route.
Pass something like &order=-created_at&limit=1 in your GET request,
or &order=created_at&orderby=DESC&limit=1 (I'm not sure how you name your params, so maybe you would use &count=1 instead of &limit=1; the same goes for the order params).
I think it also depends on whether you are using pagination on that route, so additional params may be required.
Customers/{customerId}/Orders?order=-created_at&limit=1
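
Server-side, handling such a query string could look roughly like this. It is a Python sketch assuming the first naming variant above (order with a leading "-" for descending, plus limit) and ISO-formatted date strings, which sort correctly as plain strings; the apply_query name is mine.

```python
from typing import Any, Dict, List
from urllib.parse import parse_qs


def apply_query(orders: List[Dict[str, Any]], query_string: str) -> List[Dict[str, Any]]:
    """Sort and slice a collection according to order= and limit= params."""
    params = parse_qs(query_string)
    order = params.get("order", ["created_at"])[0]
    descending = order.startswith("-")  # "-created_at" means newest first
    field = order.lstrip("-")
    result = sorted(orders, key=lambda item: item[field], reverse=descending)
    if "limit" in params:
        result = result[: int(params["limit"][0])]
    return result


orders = [
    {"id": 1, "created_at": "2021-01-05"},
    {"id": 2, "created_at": "2021-03-09"},
    {"id": 3, "created_at": "2021-02-01"},
]
print(apply_query(orders, "order=-created_at&limit=1"))
```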
For a similar use case, the GitHub API uses latest to fetch the single most recent resource:
https://docs.github.com/en/rest/reference/repos#get-the-latest-release
So to fetch the single latest resource you can use:
GET /customers/{customerId}/orders/latest
However, I would like to know what the community thinks about this.
IMO resource/latest gives the impression that the response will be a list of resources sorted from latest to oldest.

GitHub API - latest public repositories

I would like to list public GitHub repositories with the latest create/update/push timestamps (for me any of these is acceptable). Can I achieve this with the GitHub API?
I have tried the following:
Tried using /repositories endpoint, and use the link header to navigate to the last page. However, the link header I receive only has first and next links, whereas I need a last link.
Tried using /search/repositories endpoint. This will work as long as I have a keyword or filter in the q parameter, but it will not accept an empty q parameter.
I got in touch with GitHub support, and there are two solutions to this:
Use binary search on the since parameter of the /repositories endpoint to find the last page.
Cons: may quickly exhaust the API rate limit.
Use the /search/repositories endpoint with an always-true predicate such as stars:>=0.
Cons: likely to cause a query timeout or incomplete results.
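
The binary search idea can be sketched as follows in Python. This assumes that /repositories returns repositories with an ID greater than since, and that you can pick an upper_bound for which the page is empty. fetch_page is a hypothetical callable standing in for the real HTTP request, so the example runs against fake data.

```python
from typing import Callable, List


def newest_repository_id(
    fetch_page: Callable[[int], List[int]], upper_bound: int
) -> int:
    """Find the highest repository ID using O(log upper_bound) requests.

    fetch_page(since) must return the IDs greater than `since` (one page,
    possibly empty). Assumes fetch_page(0) is non-empty and
    fetch_page(upper_bound) is empty.
    """
    low, high = 0, upper_bound  # page at `low` non-empty, page at `high` empty
    while high - low > 1:
        middle = (low + high) // 2
        if fetch_page(middle):
            low = middle
        else:
            high = middle
    return high


# Fake data standing in for the API: sparse IDs, 2 items per page.
fake_ids = [1, 5, 9, 42, 77]


def fake_fetch_page(since: int) -> List[int]:
    return [repo_id for repo_id in fake_ids if repo_id > since][:2]


print(newest_repository_id(fake_fetch_page, upper_bound=1000))
```

Each probe costs one API request, which is why this approach eats into the rate limit; once the newest ID is known, requesting a since value slightly below it lists the most recently created repositories.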