I am trying to find the most recently updated repos on GitHub.
I use these two methods:
https://api.github.com/search/repositories?q=user:github+sort:updated+&per_page=5&type=all
https://api.github.com/users/github/repos?type=all&sort=updated&per_page=5
Why do I get different repos? Which method is correct?
On the GitHub website I can see results like those from the first link:
https://github.com/github
I went through the results of both requests. It looks like in the first case sort:updated uses the pushed_at field to sort the results. In the second case, sort=updated uses the updated_at field. So, depending on which field you want to sort by, you could use either. Strangely, I could not find any documentation of this difference.
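To make the difference concrete, here is a minimal sketch that sorts the same repository records by each timestamp. The sample data is invented for illustration; only the field names pushed_at and updated_at come from the GitHub repository JSON.

```python
from typing import Dict, List


def sort_repos(repos: List[Dict[str, str]], field: str) -> List[str]:
    """Return repo names ordered newest-first by the given timestamp field."""
    # ISO 8601 timestamps in the same UTC format compare correctly as strings.
    ordered = sorted(repos, key=lambda r: r[field], reverse=True)
    return [r["name"] for r in ordered]


# Invented sample data, shaped like items returned by the repository API.
sample = [
    {"name": "alpha", "pushed_at": "2024-01-10T00:00:00Z", "updated_at": "2024-03-01T00:00:00Z"},
    {"name": "beta", "pushed_at": "2024-02-20T00:00:00Z", "updated_at": "2024-02-21T00:00:00Z"},
]

by_push = sort_repos(sample, "pushed_at")     # order by last push
by_update = sort_repos(sample, "updated_at")  # order by last metadata change
```

With this data the two orderings disagree, which is the same effect seen when comparing the two API endpoints.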
How can I send a GET request to the GitHub search API, specifically https://api.github.com/search/repositories, with a query that includes several languages instead of one?
Here's my current query.
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala&sort=stars&order=desc&per_page=10
I tried something like this, but it didn't work either:
https://api.github.com/search/repositories?q=stars:%3E=1000+language:[scala, java]&sort=stars&order=desc&per_page=10
Thanks for your help
As per the docs, you need to pass a separate language: qualifier for each language you want to include in the query.
For your specific case, the query would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc
with pagination applied it would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc&per_page=10
Note that with per_page=10 applied, each response is limited to 10 results.
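The query above can also be assembled programmatically. This is a minimal sketch using only the standard library; the function name build_repo_search_url and its defaults are illustrative assumptions, not part of any GitHub client.

```python
from urllib.parse import urlencode


def build_repo_search_url(languages, min_stars=1000, per_page=10):
    """Build a repository search URL with one language: qualifier per language."""
    qualifiers = [f"stars:>={min_stars}"] + [f"language:{lang}" for lang in languages]
    params = {
        "q": " ".join(qualifiers),  # qualifiers are separated by spaces
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    # urlencode turns spaces into "+" and percent-encodes ":", ">" and "=".
    return "https://api.github.com/search/repositories?" + urlencode(params)


url = build_repo_search_url(["scala", "java"])
```

The resulting URL percent-encodes the colons, which the API treats the same as the literal form shown above.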
I would like to list public GitHub repositories with the latest create/update/push timestamps (for me any of these is acceptable). Can I achieve this with the GitHub API?
I have tried the following:
Tried using /repositories endpoint, and use the link header to navigate to the last page. However, the link header I receive only has first and next links, whereas I need a last link.
Tried using /search/repositories endpoint. This will work as long as I have a keyword or filter in the q parameter, but it will not accept an empty q parameter.
I got in touch with GitHub support, and there are two solutions to this:
Use binary search on the since parameter of the /repositories endpoint to find the last page.
Cons: may quickly exhaust the API rate limit.
Use the /search/repositories endpoint with an always-true predicate such as stars:>=0.
Cons: likely to cause a query timeout or incomplete results.
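The binary search idea can be sketched as follows. The probe has_repos_after is a hypothetical callable that you would implement as one request to /repositories?since=<id>, returning whether the page is non-empty. The sketch assumes the page is non-empty for since=0 and empty for the chosen upper bound.

```python
from typing import Callable


def find_last_since(has_repos_after: Callable[[int], bool], upper: int) -> int:
    """Return the largest `since` value below `upper` whose page is non-empty.

    Assumes has_repos_after(0) is True and has_repos_after(upper) is False.
    """
    lo, hi = 0, upper  # invariant: page after lo is non-empty, page after hi is empty
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_repos_after(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in for the real API: pretend the newest repository has id 12345.
fake_probe = lambda since: since < 12345
last_since = find_last_since(fake_probe, 1 << 20)
```

Each probe costs one API request, so the search needs roughly log2(upper) requests, which is where the rate-limit concern comes from.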
I am using ckan 2.6.0
According to the documentation: http://docs.ckan.org/en/latest/api/legacy-api.html
I am trying to use the /rest/dataset endpoint. It works (only for public data), but it returns just an array of dataset names and nothing else. An example can be found here: http://demo.ckan.org/api/1/rest/dataset
Is there a way to get a complete listing of datasets? I also tried the search endpoint, and it returns the same array.
For example I would like to get the title, description, tags, file types, etc, like in the image below:
The REST api is deprecated/unmaintained and has been for a long time. Follow the up-to-date API documentation here.
package_search is your best bet: http://demo.ckan.org/api/action/package_search
That gives you a batch of datasets. Get more by paging through using the 'start' and 'rows' parameters.
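Paging with 'start' and 'rows' might look like the sketch below. It assumes the usual action API response shape, with the datasets under result.results and the total under result.count. The helper names fetch_page and iter_datasets are illustrative, and the fetch function is injectable so the loop can be tried without a network call.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://demo.ckan.org/api/action/package_search"


def fetch_page(start: int, rows: int) -> Dict:
    """Fetch one batch from package_search (performs a network request)."""
    with urlopen(BASE + "?" + urlencode({"start": start, "rows": rows})) as resp:
        return json.load(resp)


def iter_datasets(fetch: Callable[[int, int], Dict] = fetch_page,
                  rows: int = 100) -> Iterator[Dict]:
    """Yield every dataset by advancing `start` until `count` is reached."""
    start = 0
    while True:
        result = fetch(start, rows)["result"]
        batch = result["results"]
        if not batch:
            return
        yield from batch
        start += len(batch)
        if start >= result["count"]:
            return
```

Calling iter_datasets() with no arguments would walk the demo site; each yielded dict carries the full dataset metadata such as title, notes, tags, and resources.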
If you simply want them all, then it's much better still to use a bulk download that some sites offer, such as data.gov.uk, which supplies it complete as a simple JSONL download: Meta-data for data.gov.uk datasets.
For my team's weekly builds, I go through all pull requests from the company GitHub and pull out the PRs associated to my team. This requires an annoying sieving step that requires a walk-through of the company's previous week of code contribution.
I looked at the official GitHub search documentation (HERE) and found the "author" field could be used to narrow down the search in the way I want, but when I try this at https://github.com/pulls it only works on one author at a time.
Is there a way to search across a list of authors?
For a little extra context, my team operates across a large list of repos, all of which are under a blanket organization which houses all repos across the company.
Make sure that you are using the full search at https://github.com/search.
Then simply add extra author:<name> qualifiers to your query. The search engine will OR them together. For example:
is:pr author:username1 author:username2
(Note that this only works on https://github.com/search. The search syntax on other pages, like https://github.com/pulls, is severely limited and does not support searching by multiple authors. If you try the same search on https://github.com/pulls, GitHub will simply ignore all but one author that you list.)
To limit it to repositories by a specific owner, add the user:<owner> qualifier to the query.
Using the route github.com/search instead of github.com/pulls is the "right" answer in some sense, but I like the format of the /pulls page better. When working in a small team my approach is to use /pulls but substitute "involves" for "author", like this (for reference, the same query using /search and "author").
You will get "extra" hits where the author is someone outside the list, but it's another trick to know. (Names in the examples picked at random from recent public PRs)
You could simply use the advanced search for that: https://github.com/search/advanced 🤗
Option 1: Using Github's Search Query Language
Go to https://github.com/search
Type in a query following the format of this example (replacing the author: values with your usernames).
Example: is:pr repo:zino-hofmann/graphql-flutter author:apackin author:kvenn
Explained
is:pr - only PRs (since Github treats Issues and PRs both as "Issues")
repo: - only show PRs in that repo
author: - only show PRs for these authors
It shows as "Issues", but the list will only include PRs.
Option 2: Fancy Bookmark/Alfred/Spotlight Search
You can modify the query params in the following URL to have the list of people on your team.
Replace <username1> through <username4> with your teammates' GitHub usernames.
Replace <your_company> with your company's name in the URL (or remove it entirely if you are not on Enterprise).
https://github.<your_company>.com/search?q=author%3A<username1>+author%3A<username2>+author%3A<username3>+author%3A<username4>+is%3Apr&type=Issues
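If the team list changes often, the bookmark URL can be generated instead of edited by hand. This is a small sketch; the function name team_pr_search_url and the host parameter are assumptions for illustration, with host defaulting to the public site.

```python
from urllib.parse import quote_plus


def team_pr_search_url(usernames, host="github.com"):
    """Build a search URL listing PRs opened by any of the given authors."""
    # One author: qualifier per teammate, followed by is:pr.
    query = " ".join(f"author:{name}" for name in usernames) + " is:pr"
    # quote_plus encodes ":" as %3A and spaces as "+", matching the URL above.
    return f"https://{host}/search?q={quote_plus(query)}&type=Issues"


url = team_pr_search_url(["apackin", "kvenn"])
```

For an Enterprise install, pass the internal hostname as host.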
Option 3: Using Github's Advanced Search UI
You can use Github's "Advanced Search" to achieve what you're looking for without needing to learn Github's query language.
For public repos: http://github.com/search/advanced
For internal/enterprise repos: http://github.<your_company>.com/search/advanced
You can use the fields below for filtering:
To filter for specific repos, use "Advanced options" -> "In these repositories"
To filter for specific authors, use "Issues options" -> "Opened by the author"
It uses query params under the hood, so you can generate the search with the UI and copy and paste the resulting URL (to use for Option 2).
Note: You'll need to add "is:pr" to the resulting search query, since there is no way to do that in the UI.
I'm interested in getting a count of github repos for a certain set of languages (with historical data if possible.)
Here are things I've tried to start collecting the stats myself:
Screen scraping a page like:
https://github.com/search?q=language%3Aperl&type=&ref=simplesearch
Using the github API:
https://api.github.com/legacy/repos/search/KEYWORD?language=perl
but unfortunately this seems to require a KEYWORD to get any results. Also, I only need a count not the meta data on each repo.
I'm also interested in historical data, and it seems that those stats might already be available somewhere.
Any ideas on better ways to get repo counts by language and/or historical data?
You can try this:
https://api.github.com/search/repositories?q=language:Python
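Since only a count is needed, the total_count field of the search response is enough. The sketch below requests a single item per page to keep the response small. The helper names are illustrative, and unauthenticated requests are subject to the search API's rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(language: str) -> str:
    """Search URL for one language, asking for a single item per page."""
    params = {"q": f"language:{language}", "per_page": 1}
    return "https://api.github.com/search/repositories?" + urlencode(params)


def extract_count(body: str) -> int:
    """Read the total number of matches from a search response body."""
    return json.loads(body)["total_count"]


def repo_count(language: str) -> int:
    """Fetch the repository count for a language (performs a network request)."""
    with urlopen(count_url(language)) as resp:
        return extract_count(resp.read().decode("utf-8"))
```

Calling repo_count("perl") periodically and storing the result is one way to build up your own history going forward.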
Also, you can query the github archive.
Using the BigQuery command-line interface, the query should be:
bq query 'SELECT repository_language, count(repository_language) as pushes
FROM [githubarchive:github.timeline]
WHERE type="CreateEvent" and repository_fork == "false"
GROUP BY repository_language
ORDER BY pushes DESC'
This query generates statistics of number of repos per language.