I see that the following endpoint is paginated:
https://api.coingecko.com/api/v3/coins/bitcoin/tickers?page=1
Each page has 100 results, but in my app I need all the results at once. Since this is a free API, I did some research and found there are 65 pages available, which will exceed my quota: I only get 10-50 requests per minute, and here I'd be making 65 requests. Is there a way or a format that lets me get all pages' results in one call?
I tried
https://api.coingecko.com/api/v3/coins/bitcoin/tickers?page=1,2,3,4,5
but it always returns 100 elements, although the order is different.
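For context, this is roughly the page-by-page loop I'm trying to avoid; the 6-second sleep is just my assumption for staying near 10 requests per minute:

import time
import requests

BASE = "https://api.coingecko.com/api/v3/coins/bitcoin/tickers"

def fetch_all_tickers(pages=65, delay=6.0):
    # Fetch every page sequentially, sleeping between calls to respect the quota.
    tickers = []
    for page in range(1, pages + 1):
        resp = requests.get(BASE, params={"page": page})
        resp.raise_for_status()
        tickers.extend(resp.json().get("tickers", []))
        time.sleep(delay)  # assumed pause to stay around 10 requests/minute
    return tickers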
To make a long story short, I need to gather geolocation data and signup date on ~23 million GitHub users for a historical data visualization project. I need to do this within a week's time.
I'm rate-limited to 5000 API calls/hour while authenticated. This is a very generous rate limit, but unfortunately, I've run into a major issue.
The GitHub API's "get all users" feature is great (it gives me around 30-50 users per API call, paginated), but it doesn't return the complete user information I need.
If you look at the following call (https://api.github.com/users?since=0), and compare it to a call to retrieve one user's information (https://api.github.com/users/mojombo), you'll notice that only the user-specific call retrieves the information I need such as "created_at": "2007-10-20T05:24:19Z" and "location": "San Francisco".
This means that I would have to make 1 API call per user to get the data I need. Processing 23,000,000 users then requires 4,600 hours or a little over half a year. I don't have that much time!
Are there any workarounds to this, or any other ways to get geolocation and sign up timestamp data of all GitHub users? If the paginated get all users API route returned full user info, it'd be smooth sailing.
I don't see anything obvious in the API. It's probably specifically designed to make mass data collection very slow. There are a few things you could do to try to speed this up.
Increase the page size
In your https://api.github.com/users?since=0 call, add per_page=100. That will cut the number of calls needed to walk the whole user list to roughly a third.
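A minimal sketch of walking the list that way, assuming a personal access token and using the last user id on each page as the next "since" cursor:

import requests

def iter_user_pages(token, since=0):
    # Yield pages of up to 100 users each from GET /users.
    headers = {"Authorization": "token " + token}
    while True:
        resp = requests.get(
            "https://api.github.com/users",
            params={"since": since, "per_page": 100},
            headers=headers,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page:
            return
        yield page
        since = page[-1]["id"]  # the next page starts after the last id we saw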
Randomly sample the user list
With a set of 23 million people you don't need to poll every single one to get good data about GitHub signup and location patterns. Since you're sampling the entire population randomly, there's no polling bias to account for.
If user 1200 signed up on 2015-10-10 and user 1235 also signed up on 2015-10-10, then you know users 1201 to 1234 signed up on 2015-10-10 as well. I'm going to assume you don't need any more granularity than that.
Location can also be randomly sampled. If you randomly sample 1 in 10 users, or even 1 in 100 (one per page), that's 230,000 out of 23 million, which is a great polling sample. Professional national polls in the US use sample sizes of a few thousand people for an even bigger population.
How Much Sampling Can You Do In A Week?
A week gives you 168 hours, or 840,000 requests at 5,000 per hour.
You can get 1 user or 1 page of 100 users per request. Getting all the users, at 100 users per request, is 230,000 requests. That leaves you with 610,000 requests for individual users or about 1 in 37 users.
I'd go with 1 in 50 to account for download and debugging time. So you can poll two random users per page or about 460,000 out of 23 million. This is an excellent sample size.
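To make the 1-in-50 idea concrete, here is a rough sketch building on the page iterator above; error handling and rate-limit backoff are left out, and the field names come from the single-user response you linked:

import random
import requests

def sample_users(token, pages, per_page_sample=2):
    # For each page of ~100 users, fetch full profiles for a small random sample.
    headers = {"Authorization": "token " + token}
    sampled = []
    for page in pages:  # e.g. pages = iter_user_pages(token)
        for user in random.sample(page, min(per_page_sample, len(page))):
            detail = requests.get(user["url"], headers=headers).json()
            sampled.append({
                "login": detail["login"],
                "created_at": detail.get("created_at"),
                "location": detail.get("location"),
            })
    return sampled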
The number of results returned by the Spotify API varies with different values of offset and limit. For instance:
https://api.spotify.com/v1/search?query=madonna&offset=0&limit=50&type=playlist
playlists.total= 164
https://api.spotify.com/v1/search?query=madonna&offset=0&limit=20&type=playlist
playlists.total= 177
https://api.spotify.com/v1/search?query=madonna&offset=10&limit=50&type=playlist
playlists.total= 156
https://api.spotify.com/v1/search?query=madonna&offset=100&limit=50&type=playlist
playlists.total= 163
The real problem is that some items are missing when iterating over the results. This can be easily reproduced as follows:
Make the following request:
https://api.spotify.com/v1/search?query=redhouse&offset=0&limit=20&type=album
The response returns albums.total=27 and 20 items.
Make another request to get the next page:
https://api.spotify.com/v1/search?query=redhouse&offset=20&limit=20&type=album
The response returns albums.total=21 and 1 item. (6 missing items!)
Make the same request with offset=0 and limit=30:
https://api.spotify.com/v1/search?query=redhouse&offset=0&limit=30&type=album
The response returns albums.total=27 and 27 items, which is correct.
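For anyone who wants to reproduce it, a small sketch of those three requests; an access token is assumed, and the parameters mirror the URLs above:

import requests

def album_search(token, query, offset, limit):
    # Return (albums.total, number of items actually returned) for one search page.
    resp = requests.get(
        "https://api.spotify.com/v1/search",
        params={"query": query, "type": "album", "offset": offset, "limit": limit},
        headers={"Authorization": "Bearer " + token},
    )
    resp.raise_for_status()
    albums = resp.json()["albums"]
    return albums["total"], len(albums["items"])

# album_search(token, "redhouse", 0, 20)   -> total=27, 20 items
# album_search(token, "redhouse", 20, 20)  -> total=21, 1 item (6 missing!)
# album_search(token, "redhouse", 0, 30)   -> total=27, 27 items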
This happens in all searches for albums, artists, tracks, and playlists. A few people (including myself) have reported it as a (critical) bug in the Spotify issue tracking system.
I just wonder if there is any reliable way of iterating over the results of a search.
I want to do a query like this:
search?q=KEY_NAME&type=page&fields=id,name,location&limit=500&offset=0
When I do this the first time, I get about 470 results. Then I set offset to 471 and repeat the query:
search?q=KEY_NAME&type=page&fields=id,name,location&limit=500&offset=471
and the result is empty.
Why? The KEY_NAME is a common word like "fan", and I don't think there are only 471 results among Facebook pages!
What is the problem?
Never use a limit that high; AFAIK a limit of 100 is the maximum, and anything beyond that may be buggy. If you use this API call, you get more than 500 with paging:
/search?pretty=0&fields=id,name,location&q=fan&type=page&limit=100
Don't use "offset"; always use the "next" link in the JSON document to get the next batch of results: https://developers.facebook.com/docs/graph-api/using-graph-api/v2.4#paging
The next 100 entries would be available with the following endpoint for me:
/search?pretty=0&fields=id,name,location&q=fan&type=page&limit=100&after=OTkZD
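In code, the loop looks roughly like this: make the first call, then keep following paging.next (which is a complete URL) until it disappears. The access token handling here is assumed:

import requests

def search_pages(access_token, query="fan"):
    # Collect page search results by following paging.next instead of using offset.
    url = "https://graph.facebook.com/v2.4/search"
    params = {
        "q": query,
        "type": "page",
        "fields": "id,name,location",
        "limit": 100,
        "access_token": access_token,
    }
    results = []
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        results.extend(data.get("data", []))
        url = data.get("paging", {}).get("next")  # full URL with cursor included
        params = None  # "next" already carries every parameter
    return results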
Please refer to the following blog post, in which it says:
https://developers.facebook.com/blog/post/478/
The gist of it:
As such, when querying the following tables and connections, use time-based paging instead of “offset” to ensure you are getting back as many results as possible with each call. For these Graph API connections, use the “since” and “until” parameters
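As an illustration, time-based paging on such a connection looks roughly like this; the endpoint chosen and the use of Unix timestamps are assumptions based on the quoted advice, not something from this thread:

import requests

def feed_between(page_id, access_token, since, until):
    # Page a feed by time window instead of offset; since/until take Unix timestamps.
    resp = requests.get(
        "https://graph.facebook.com/" + page_id + "/feed",
        params={
            "access_token": access_token,
            "since": since,
            "until": until,
        },
    )
    resp.raise_for_status()
    return resp.json().get("data", [])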
I am currently creating an application that polls Facebook for data. First, I request a page in this fashion...
pageID/posts?fields=id,message,created_time,type&limit=250
This returns the top 250 posts from a page. I then check whether paging.next is set and, if it is, make another request for the next 250 posts. I continue this recursively until there are no more posts.
For each post that is returned, I also fetch the post details from the Graph API.
My question: if I had 500 posts on a page, would that equate to 502 requests (500 requests, one per post, plus 2 for paging through the page data to get the posts)? Or am I incorrect in my understanding of a "request"? I know that when batching calls, each query included in the batch counts as 1 request. The goal is to avoid the 600 calls / 600 seconds rate limiting. Thanks!
Every API call is...well, 1 request. So every time you use the /posts endpoint, with whatever limit, it is 1 request. For example, the call you posted is one request that returns 250 elements.
Batch requests are just faster, but each call in the batch counts as a request. So if you combine 10 calls in a batch, it will be 10 requests. The benefit of batch calls is really just that they are a lot faster: as fast as the slowest call in the batch.
If you want to get 500 posts with that example of yours, you would only need 2 calls: the first returns 250 elements, and the second uses the API call from the "next" value to get another 250. Just keep in mind that the default is usually 25 elements and you can't use any limit you want. There is a maximum limit for calls, and it gets changed from time to time AFAIK, so don't count on getting the same result every time.
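A small sketch of that counting: with 500 posts and limit=250, the loop below makes exactly 2 requests, because each page fetch is 1 request no matter how many posts it returns (token handling is assumed):

import requests

def fetch_posts(page_id, access_token, limit=250):
    # Each loop iteration is one request; 500 posts at limit=250 means 2 requests.
    url = "https://graph.facebook.com/" + page_id + "/posts"
    params = {
        "fields": "id,message,created_time,type",
        "limit": limit,
        "access_token": access_token,
    }
    posts, calls = [], 0
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        calls += 1
        data = resp.json()
        posts.extend(data.get("data", []))
        url = data.get("paging", {}).get("next")
        params = None  # the "next" URL already carries the parameters
    return posts, calls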
Btw, don't be too fixated on that 600 calls / 600 seconds limit; it's just a general guideline. The real limit is dynamic and depends on many factors. It's not public, of course. But if you really hit the limit, you are doing something wrong anyway.
Is there any restriction on the number of entries that are retrieved using a single call to the Network Updates API? I found this forum comment "The per-user limit is per call, so 300 requests with however many updates they have." on the thread
http://developer.linkedin.com/forum/increase-search-api-throttle-limit
I want to confirm that indeed there is no limit. I have received as many as 106 entries in a single call.
Thanks in advance.
The maximum number of updates returned from the Network Updates API appears to be 250. Performing the following query as an example:
http://api.linkedin.com/v1/people/~/network/updates?count=500
Even if I try to specify the start parameter at, say, 250, I can't get the next 250 updates from the API:
http://api.linkedin.com/v1/people/~/network/updates?count=250&start=250
So it looks like 250 is the max, with no ability to page beyond that.
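For reference, the two checks sketched in code; the authentication details are an assumption on my part (only the count and start parameters matter here):

import requests

def network_updates(token, count=250, start=0):
    # Ask for `count` updates starting at `start`; 250 appears to be the hard cap.
    resp = requests.get(
        "http://api.linkedin.com/v1/people/~/network/updates",
        params={"count": count, "start": start, "format": "json"},
        headers={"Authorization": "Bearer " + token},
    )
    resp.raise_for_status()
    return resp.json()

# network_updates(token, count=500)             -> still at most 250 updates
# network_updates(token, count=250, start=250)  -> does not page past the first 250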
UPDATE:
I've verified that 250 is the maximum number returned, either in a single call or via the paging parameters. It looks like the documentation has been updated to reflect this.