So, I have created a query to retrieve some data from GitHub using its GraphQL API (I also tested with the REST API; same problem). The query is partially shown below (only what matters):
query listRepos($queryString: String!, $numberOfRepos: Int!, $afterCursor: String!) {
  rateLimit {
    cost
    remaining
    resetAt
  }
  search(query: $queryString, type: REPOSITORY, first: $numberOfRepos, after: $afterCursor) {
    repositoryCount
    pageInfo {
      endCursor
      startCursor
    }
    <data related to the repository here>
  }
}
I wrote a small piece of code, running in a browser, to retrieve the data. The code works without problems until its 50th request. At that point, the response comes back without any pagination data; that is, without any cursors in the pageInfo object.
Note that at this point I am still far from my request limits. I start with 5,000 points and each of these calls has a cost of 1, so at the moment I stop receiving pagination cursors I still have 4,950 points left.
Also, by this point I have collected data for almost 1,000 repositories, while repositoryCount shows more than 450,000 results for my search query. Hence, this is not a matter of reaching the end of the pages/list/data.
So I can't move further and retrieve the rest of the data. I even tried Insomnia and Postman with the last valid cursor I have, but I still only get a response without any cursor.
Why is that happening?
UPDATE
The same thing happens with the REST API. Using the search API and analyzing the response headers, I see that I am given only 34 pages of results (around 1,000 repositories), although the search count reports more than 400,000 results.
Both the REST API and GraphQL API cap the number of items returned from a search query at 1,000 results: https://developer.github.com/v3/search/#about-the-search-api.
Just like searching on Google, you sometimes want to see a few pages of search results so that you can find the item that best meets your needs. To satisfy that need, the GitHub Search API provides up to 1,000 results for each search.
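A common workaround (my suggestion, not from GitHub's docs) is to partition the search with a qualifier such as created: so that each slice returns fewer than 1,000 results, and then paginate each slice separately. A minimal sketch of generating such slices; the topic:graphql term is just a placeholder:

using System;

class SearchSlicer
{
    static void Main()
    {
        var from = new DateTime(2019, 1, 1);
        var to = new DateTime(2019, 12, 31);
        for (var start = from; start <= to; start = start.AddMonths(1))
        {
            var end = start.AddMonths(1).AddDays(-1);
            // Each slice becomes the $queryString variable of the search
            // query shown in the question; narrow the ranges further if a
            // slice still exceeds 1,000 results.
            var queryString =
                $"topic:graphql created:{start:yyyy-MM-dd}..{end:yyyy-MM-dd}";
            Console.WriteLine(queryString);
        }
    }
}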
Related
I am new to MongoDB, and I have the following issue in a web application we are currently developing.
We have an application where we use MongoDB to store data.
And we have an API where we search for documents via text search.
As an example: if the user types “New York”, the request should return all the data available in the collection for the keyword “New York”. (We call the API for each letter typed.) We have nearly 200,000 documents in the DB. For some keywords a search returns nearly 4,000 documents. We tried limiting the results to 5, which returns the top 5 documents but hides the rest of the available data. We also tried without a limit, and then it returns hundreds or thousands of documents, as mentioned, which slows the request down.
On the frontend we bind the search results to a dropdown (Next.js).
My question:
Is there a way to optimize searching for documents?
Are there any suggestions for a suitable way to implement this requirement using MongoDB and .NET 5.0?
Or any other implementation methods for this requirement?
The following code segment shows the query that retrieves data for the incoming keyword.
var hotels = await _hotelsCollection
    .Find(Builders<HotelDocument>.Filter.Text(keyword))
    .Project<HotelDocument>(hotelFields)
    .ToListAsync();
var terminals = await _terminalsCollection
    .Find(Builders<TerminalDocument>.Filter.Text(keyword))
    .Project<TerminalDocument>(terminalFields)
    .ToListAsync();
var destinations = await _destinationsCollection
    .Find(Builders<DestinationDocument>.Filter.Text(keyword))
    .Project<DestinationDocument>(destinationFields)
    .ToListAsync();
This is a classic "autocomplete" feature, and there are some known best practices you should follow:
On the client side you should debounce the input; this is a must. There is no reason to execute a request for each letter typed, and this is the most critical part of an autocomplete feature.
On the backend, things can get a bit more complicated. Naturally you want to use a database suited for this task; specifically, MongoDB has a service called Atlas Search, a Lucene-based text search engine.
That gets you autocomplete support out of the box. However, if you don't want to make big changes to your infrastructure, here are some suggestions:
Make sure the field you're searching on is indexed.
You're executing 3 separate requests; consider using something like Task.WhenAll to execute them all at once instead of one by one (see the sketch after this list). I am not sure how the client side is built, but if all 3 entities are shown in the same list, then ideally you'd merge the labels into 1 collection so you could paginate the search properly.
As mentioned in #2, you must add server-side pagination; no search engine can exist without it. I can't give specifics on how you should implement it, as you have 3 separate entities, which could make the implementation harder. I'd consider whether or not you need all 3 of these in the same API route.
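Here is a minimal sketch of points #2 and #3 together, reusing the collections and projections from the question, with an assumed page size of 10: the three text searches are started at once and each is capped server-side.

var hotelsTask = _hotelsCollection
    .Find(Builders<HotelDocument>.Filter.Text(keyword))
    .Project<HotelDocument>(hotelFields)
    .Limit(10) // server-side cap instead of returning thousands of documents
    .ToListAsync();
var terminalsTask = _terminalsCollection
    .Find(Builders<TerminalDocument>.Filter.Text(keyword))
    .Project<TerminalDocument>(terminalFields)
    .Limit(10)
    .ToListAsync();
var destinationsTask = _destinationsCollection
    .Find(Builders<DestinationDocument>.Filter.Text(keyword))
    .Project<DestinationDocument>(destinationFields)
    .Limit(10)
    .ToListAsync();

// Let the three round trips overlap; total latency is roughly that of the
// slowest query rather than the sum of all three.
await Task.WhenAll(hotelsTask, terminalsTask, destinationsTask);

var hotels = await hotelsTask;
var terminals = await terminalsTask;
var destinations = await destinationsTask;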
Trying to find documentation or figure out how to replicate GitHub's v3 API "Get all users" endpoint via their v4 GraphQL API.
It's easy enough to query just about anything for a specific user, but how can I retrieve a payload listing all users, similar to the v3 API payload?
Can someone point me to the correct documentation, or even better, provide an example that returns a list of users?
As far as I can tell there is no equivalent to "Get all users" in the v4 API; however, there is a hacky way to get close to it using a nodes query.
First you need to be able to generate node IDs to iterate over. Looking at mojombo (the first user by database ID), you can see how node IDs are derived.
$ curl https://api.github.com/users/mojombo
{
"login": "mojombo",
"id": 1,
"node_id": "MDQ6VXNlcjE=",
...
}
$ echo -n "MDQ6VXNlcjE=" | base64 -d
04:User1
It is the string 04:User followed by the user's id (database ID).
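Based on that observation, here is a small sketch of the encoding (a hypothetical helper, not part of any GitHub client):

using System;
using System.Text;

static class NodeIds
{
    // Base64-encodes "04:User{id}" exactly as decoded above;
    // UserNodeId(1) returns "MDQ6VXNlcjE=" (mojombo).
    public static string UserNodeId(long databaseId) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes($"04:User{databaseId}"));
}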
Knowing this, we can now generate a nodes query for 100 users at a time. NOTE: not all database IDs are users; some are organisations or deleted users, and as such you will get a lot of NOT_FOUND errors.
query($ids: [ID!]!) {
  nodes(ids: $ids) {
    ... on User {
      login
    }
  }
}

variables:
{
  "ids": ["MDQ6VXNlcjE=", "MDQ6VXNlcjI=", ...]
}
If you aren't limited to only using the v4 API but still want to take advantage of its improved performance, there is a slightly less hacky alternative: use the v3 API to get users 100 at a time, then use the returned node_id field to perform a bulk v4 query across all 100 users using the above technique. This method gives far fewer NOT_FOUND errors and hence makes better use of your rate limit.
To give an idea of the performance improvement you can get from the v4 API and this technique: the task I was performing went from an estimated ~474 days using the v3 API alone to less than 5 days using this method.
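To make that concrete, here is a rough sketch of one round of the hybrid approach (my illustration, not the answerer's code; assumes a personal access token in the GITHUB_TOKEN environment variable): fetch 100 users from the v3 endpoint, then resolve them all in a single v4 nodes query.

using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class HybridUserLister
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.UserAgent.ParseAdd("hybrid-lister-sketch");
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
            "Bearer", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

        // v3: one page of 100 users, starting after database ID 0.
        var v3Json = await http.GetStringAsync(
            "https://api.github.com/users?since=0&per_page=100");
        using var v3Doc = JsonDocument.Parse(v3Json);
        var nodeIds = v3Doc.RootElement.EnumerateArray()
            .Select(u => u.GetProperty("node_id").GetString())
            .ToArray();

        // v4: one bulk nodes query over all 100 node IDs.
        var body = JsonSerializer.Serialize(new
        {
            query = "query($ids:[ID!]!) { nodes(ids:$ids) { ... on User { login } } }",
            variables = new { ids = nodeIds }
        });
        var v4Response = await http.PostAsync("https://api.github.com/graphql",
            new StringContent(body, Encoding.UTF8, "application/json"));
        Console.WriteLine(await v4Response.Content.ReadAsStringAsync());
    }
}

To continue, feed the last user's database id back into the v3 call's since parameter.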
The search query works well for this. You can query based on specific fields, and it will return a list of users. This is usually good if you are looking for one specific user. If you are looking for a large list of users, though, the search API isn't what you want, since it is optimized for finding one value based on inputs. See the example below:
{
  search(query: "location:toronto language:Go", type: USER, first: 10) {
    userCount
    edges {
      node {
        ... on User {
          login
          name
          location
          email
          company
        }
      }
    }
  }
}
Let's paint a hypothetical picture for discussion.
Let's say a large company has 200 organizations each with 250 repositories and each of those repositories has 300 contributors.
Let's say I would like to build up a GraphQL query that answers the question:
Give me all contributors (and their privileges) of all repositories of all organizations in my account.
Obviously, pagination is needed.
But the way it is currently implemented, a pagination cursor is provided for each list of contributors, each list of repositories, and each list of organizations.
As a result, it is not possible to complete the query by following a single pagination cursor.
It is not clear to me that the query can be completed at all due to the ambiguity of specifying a pagination cursor for one list of contributors for one org/repo combo versus the next org/repo combo.
Thanks
Your initial query structure looks something like this (simplified):
query {
  organizations(first: 10) {
    repositories(first: 20) {
      contributors(first: 30) {
        name
        privileges
      }
    }
  }
}
Now imagine this query returned a single pagination cursor. What should the next page look like?
next 10 organizations (with first 20 repositories, with first 30 contributors)
same 10 organizations, but next 20 repositories (with first 30 contributors)
same 10 organizations, with the same 20 repositories, but next 30 contributors
some wild mix of the above
When you build your own GraphQL API, you can design your cursor pagination according to your needs. But the GitHub API has to serve a wide range of consumers, and they chose a very flexible schema design that enables clients to fetch exactly the data they need, without overfetching. In some cases, though, it may take additional roundtrips to get all the data you need.
Let's look at this from a frontend perspective:
After the initial request you will display the first 10 orgs, and for each org the first 20 repos, and for each repo the first 30 contributors.
Now the user can decide which data they want more of:
either load more orgs, or
load more repos for a specific org, or
load more contributors for a specific repo
Each of these decisions results in a simple paginated query using one of the cursors the GitHub API provided. No need for an almighty pagination cursor.
(I highly doubt that there's a UI/UX use case where you want to paginate everything at once.)
In this case I'd say that the GitHub API is perfectly suited as it is. In my opinion it's not reasonable to display 200 * 250 * 300 = 15,000,000 contributors at once, because from a user's perspective that's just way too much.
Let's look at this from a backend perspective:
If you want to gather the data you described for analysis, aggregation, or something similar on your backend server, and you already know that you need all the data, you may be able to skip pagination entirely by providing a large number for first. (This may not work for GitHub's API; as far as I know, it is limited to a maximum of 100 entries per page.)
Even if you are forced to use pagination, you can cache the results. Of course it still takes a few hundred roundtrips to the GitHub API, but this can be a scheduled job that runs once every night.
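As an illustration, a minimal sketch of such a job (hypothetical names, assuming a token in the GITHUB_TOKEN environment variable) that follows a single repositories cursor for one organization until hasNextPage is false:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class RepoPager
{
    const string Query = @"
        query($org: String!, $after: String) {
          organization(login: $org) {
            repositories(first: 100, after: $after) {
              pageInfo { endCursor hasNextPage }
              nodes { name }
            }
          }
        }";

    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.UserAgent.ParseAdd("repo-pager-sketch");
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
            "Bearer", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

        string after = null;
        var hasNextPage = true;
        while (hasNextPage)
        {
            var body = JsonSerializer.Serialize(new
            {
                query = Query,
                variables = new { org = "acme", after } // "acme" is a placeholder
            });
            var response = await http.PostAsync("https://api.github.com/graphql",
                new StringContent(body, Encoding.UTF8, "application/json"));
            using var doc = JsonDocument.Parse(
                await response.Content.ReadAsStringAsync());

            var repos = doc.RootElement.GetProperty("data")
                .GetProperty("organization").GetProperty("repositories");
            foreach (var node in repos.GetProperty("nodes").EnumerateArray())
                Console.WriteLine(node.GetProperty("name").GetString()); // cache here

            var pageInfo = repos.GetProperty("pageInfo");
            hasNextPage = pageInfo.GetProperty("hasNextPage").GetBoolean();
            after = pageInfo.GetProperty("endCursor").GetString();
        }
    }
}

The same loop, nested once per repository, would handle each inner cursor in turn.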
And because at this point you've already written all the necessary code, it's easy to implement some kind of partial refresh. For example, if you know that "repo 42 of org 13" is pretty active, you can just refetch the data for this specific repo (on demand or at a shorter interval) and update your cache.
I don't know your specific use case, but as long as you don't need (nearly) live updates of this huge data set, I'd say that GitHub's API is sufficient and flexible enough for most people's requirements.
Explanation:
I am able to query the Google Core Reporting API v3 using the client library to get pageview data for specific URLs of a website I am working on. I want to get data (pageviews) for each day within a specified range. So far I am simply looping through the range, sending an individual request to the API for each day; in each request I set the same value for the start date and the end date.
Problem:
Obviously this gets the job done, BUT it is certainly not the best way to go about it. Assuming I want data for the past 3 months for each of about 2,000 URIs, I would need 360,000 requests, which is well over the quota limit defined by Google.
Potential solution: one way I thought of solving this issue is to send a request with the start date and end date a week apart, but then the API returns a sum of the values rather than the individual daily values.
Main question: is there a way to insist that these values not be added up and returned as a sum, but rather returned separately for each day (as an associative array or something like that)?
I hope the question is clear and that there is a solution! Thank you!
Very straightforward:
Metric: ga:pageviews, dimension: ga:date, set a filter for your pagepath, and set a start-date and end-date.
Example:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Axxyyzz&dimensions=ga%3Adate&metrics=ga%3Apageviews&filters=ga%3Apagepath%3D%3D%2Ffaq.html&start-date=2013-06-27&end-date=2013-07-11&max-results=50
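For readability, the same query with its parameters URL-decoded:

ids         = ga:xxyyzz
dimensions  = ga:date
metrics     = ga:pageviews
filters     = ga:pagepath==/faq.html
start-date  = 2013-06-27
end-date    = 2013-07-11
max-results = 50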
This will return the pageviews for the faq.html page for each day in the time frame.
You should check out the Query Explorer. It's a great tool for figuring out how to structure queries.
I am performing a REST call to Facebook's search API using type=event, e.g.:
search?fields=id,name,picture,owner,description,start_time,end_time,location,venue,updated_time,ticket_uri&q=concert&type=event
I have looked through the documentation and still have a few questions about the specific pagination behavior of the event search API.
If I use a broad search term like "ma" and keep querying the pagination ['next'] URL, will I cycle through all Facebook events starting with "ma"? Does the pagination array give any indication when there are no more results to return?
Do these searches include past events? If so, is it possible to eliminate past events using the "since" parameter?
What is the maximum for the limit parameter?
Update:
As far as I can tell, the number of results you can get from a Facebook search is limited to 500, including results that can be accessed via pagination. In other words, a query with limit >= 500 will not return a pagination URL; likewise, a query with limit 250 will only return one page's worth of pagination.
You will "next page" until the count of results comes less then the limit
I'm not sure if that is possible using a simple Graph request. Maybe using FQL.
I don't know exactly, but I used a limit of 2000 one day and it worked.
For other questions, you can get answers by testing your requests with this tool:
https://developers.facebook.com/tools/explorer/
I am doing the same thing as you; I am collecting public posts using the Graph Search API.
When there are no more results available, or you reach the maximum limit, the paging section will not be in the response. So you can always check whether paging is present in the JSON response, something like this:
NextResult = DeserJsonFBResponce.paging != null ? DeserJsonFBResponce.paging.next : string.Empty;
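Expanded into a loop, a minimal sketch (hypothetical FbResponse/FbPaging types; assumes Newtonsoft.Json) that keeps following paging.next until the paging section disappears:

using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class FbPaging { public string next; }
class FbResponse { public FbPaging paging; }

class SearchDrainer
{
    static async Task Main()
    {
        using var http = new HttpClient();
        // Hypothetical starting URL; a real call needs a valid access token.
        var url = "https://graph.facebook.com/search?q=concert&type=event&limit=100&access_token=TOKEN";
        while (!string.IsNullOrEmpty(url))
        {
            var json = await http.GetStringAsync(url);
            var response = JsonConvert.DeserializeObject<FbResponse>(json);
            // ... process the data array of the response here ...
            // paging is absent (null) on the last page, which ends the loop.
            url = response.paging != null ? response.paging.next : string.Empty;
        }
    }
}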
I am not so sure about this with events, but for public posts I am able to eliminate posts using the since and until parameters.
The maximum for the limit parameter is 2000 per GET request.