Retrieve large amount of data from REST API GitHub - github

How can I retrieve a large amount of data from the GitHub REST API? Currently it returns only a small amount of JSON data from the GitHub timeline, in many cases limited to only 300 events. I need a larger volume for my Master's research, and I need to know how to get it via the REST API.

GitHub's API (and, IMHO, most good APIs) uses pagination to reduce load on itself and on clients. You could write a simple script that goes through all the "pages" of results one at a time and then combines the results locally.
More info here:
http://developer.github.com/guides/traversing-with-pagination/
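A minimal sketch of such a script, assuming the API advertises the next page in a `Link` response header with `rel="next"`, as the guide above describes. The function names, the `get` parameter, and the `max_pages` safety cap are my own choices for illustration, not part of any GitHub library.

```python
import json
import re
import urllib.request


def parse_next_link(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    # Each entry looks like: <https://...>; rel="next"
    for url, params in re.findall(r'<([^>]+)>([^<]*)', link_header):
        if 'rel="next"' in params:
            return url
    return None


def http_get(url):
    """Fetch one page and return (parsed JSON list, Link header or None)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response), response.headers.get("Link")


def fetch_all_pages(first_url, get=http_get, max_pages=100):
    """Follow rel="next" links and combine every page into one list."""
    results = []
    url = first_url
    pages = 0
    # max_pages is an assumed safety cap so a bad header cannot loop forever.
    while url and pages < max_pages:
        items, link_header = get(url)
        results.extend(items)
        url = parse_next_link(link_header)
        pages += 1
    return results
```

You would call it with something like `fetch_all_pages("https://api.github.com/events?per_page=100")`. Keep in mind that unauthenticated requests are rate limited, and that the API may still cap how far back a feed goes no matter how you page through it.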

Related

how to transfer a large amount of data through spring rest api

I want to send a huge amount of data through a Spring REST API. Some people have suggested splitting it into chunks and then sending those via REST. Can anyone suggest how to make this possible?

REST versus more complicated data requests

REST APIs work great for get-one, get-a-list etc.
But our frontend has a dashboard, and one part of the dashboard is more complicated. It requires a query that aggregates/joins several different resources.
Returning the data is not a problem. But what of the taxonomy of the endpoint that returns this data? Since the data is not a resource, what should the URL look like?
As far as REST principles go, it does not matter much whether the returned data 'aggregates/joins several different resources'. That is an implementation detail of the underlying data store. The dashboard should not care how exactly that store is implemented, or whether it uses joins or multiple queries.
Whatever is displayed on the dashboard (a single item or a list of items) can still be treated as a resource.
Example: imagine a use case where the dashboard shows an aggregated user profile from multiple portals (Facebook, LinkedIn, etc.). You may still have a REST resource /user/id for that, even if obtaining that single resource requires many complex operations.
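To make that concrete, here is a rough sketch of what could sit behind /user/id. Everything in it is hypothetical: the `sources` mapping stands in for whatever portal clients or queries you really have, and the shape of the returned dictionary is just one possible representation.

```python
from typing import Callable, Dict

# A source is any callable that takes a user id and returns a dict.
# In a real system these would wrap portal clients or database queries.
Source = Callable[[str], dict]


def get_user_resource(user_id: str, sources: Dict[str, Source]) -> dict:
    """Build the single representation served at /user/<id>.

    The caller sees one resource; the fan-out to several backends
    stays an implementation detail of this function.
    """
    resource = {"id": user_id, "href": f"/user/{user_id}", "profiles": {}}
    for name, fetch in sources.items():
        resource["profiles"][name] = fetch(user_id)
    return resource
```

The client only ever requests /user/42 and never learns how many lookups were needed to answer it.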

exporting data from Bluemix Presence Insights

I'm trying to export data from Presence Insights on Bluemix. I followed this documentation:
https://presenceinsights.ng.bluemix.net/pidocs/analytics/export/
However, I can't seem to find the export button mentioned in the document.
Data can be exported from the IBM Presence Insights Dashboard if you have data available. There are also REST APIs for exporting data. They are documented in the Floors, Sites, and Zones sections of the API Reference.
There were REST APIs in the product some time ago, but they were found to have limitations that made them less useful in production. In particular, as data built up, the response time of the API grew beyond what the Bluemix infrastructure allowed, and the API requests would time out. For that reason the APIs were backed out, but it appears the documentation was left in place. That will be removed shortly.
Presence Insights still understands the value of exporting the data, so a new scheme is under investigation. For example, it would be ideal if the data could be exported under the covers to a production data storage facility, on a regular time frame.
In the interim, an alternative solution would be to use a Subscription to gather the backend enter/exit/dwell/timeout events directly and roll your own solution to store only what you need in whatever facility works for your application.

Programmatic export/dump/mass data retrieval (BaaS)

Does anyone have experience with programmatic exports of data from BaaS providers such as parse.com or StackMob?
I am aware that both providers (as far as I can tell from the marketing talk) offer a REST API which will allow for queries against the database, not only to be used by mobile clients but also by e.g. custom web apps.
I am also aware that both providers offer a manual export of data (parse.com via their web interface, StackMob via support).
But let's say I would like to dump all data nightly, so that I can import it into a reporting system, for instance. Or maybe simply to have an up-to-date backup.
In this case, I would need a programmatic way to export/replicate the data stored in the backend. Manual exports are not an option for obvious reasons.
The REST APIs on offer, however, seem to be designed for specific queries, not for mass reads (performance?). Not to mention the pricing: I assume none of the providers would be happy about a nightly X-gigabyte data export via their REST API, so there will probably be a price tag.
I just couldn't find any specific information on this topic so far, so I was wondering if anyone else has already gone through this. Also, any suggestions on StackMob/parse alternatives are welcome, especially if related to the data export topic.
Cheers, Alex
Did you see the section of the Parse REST API on Batch operations? Batch operations reduce the number of API calls needed to grab data so that you are not using a call for every row you retrieve. Keep in mind that there is still a limit (the default is 100, but you can set it to a maximum of 1000). That means you are still limited to pulling down 1000 rows per API call.
I can't comment on StackMob because I haven't used it. At my present job, we are using Parse and we wrote a C# app which compares the data in a Parse class with a SQL table and pulls down any changes.
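A rough sketch of pulling down a whole class under that per-call limit. It assumes the query endpoint accepts `limit` and `skip` parameters; `fetch_page` is a hypothetical callable that you would write yourself to wrap the actual REST request, so nothing here is the provider's own API.

```python
def fetch_all_rows(fetch_page, limit=1000):
    """Collect every row by requesting successive pages of `limit` rows.

    fetch_page(limit=..., skip=...) is a hypothetical callable that
    performs one REST query and returns a list of rows.
    """
    rows = []
    skip = 0
    while True:
        page = fetch_page(limit=limit, skip=skip)
        rows.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            break
        skip += limit
    return rows
```

A full export still costs one API call per 1000 rows, so for a nightly dump it is probably worth fetching only rows changed since the last run, which is the approach our app takes.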

Web API data paging without using OData syntax

What are the options in a web API for indicating that the returned data is paged and that further data is available?
ASP.Net Web API with OData uses a syntax similar to the following:
{
  "odata.metadata": "http://myapi.com/api/$metadata#MyResource",
  "value": [
    { "ID": 1, "Name": "foo" },
    ...
    { "ID": 100, "Name": "bar" }
  ],
  "odata.nextLink": "http://myapi.com/api/MyResource?$skip=20"
}
Are there any other ways to indicate the link to the next/previous 'page' of data without using a metadata wrapper around the results? Can this be achieved by using custom response headers instead?
Let's take a step back and think about WebAPI. WebAPI in essence is a raw data delivery mechanism. It's great for making an API and it elevates separation of concerns to a pretty good height (specifically eliminating UI concerns).
Using Web API, however, doesn't really change the core of the issue you are facing. You're asking "how do I query my data store in a performant manner and return the data to the client efficiently?" Your decisions here really parallel the same question when building a more traditional web app.
As you noted, OData is one method to return this information. The benefit here is that it's well known and well defined. The body of questions/blogs/articles on the topic is growing rapidly. The wrapper doesn't add any meaningful overhead.
Yet OData is by no means the only way you can do this. We've had to cope with this for as long as software has been displaying search results. It's tough to give you specific advice without really understanding your scenario. Here are some questions that bubbled up as I read your question:
Are your result sets huge, but users only see the first one or two pages?
Or do users tend to page through all of the results?
Are pages of results limited (like 20 or 50 per page), or do they run to 100s or 1000s?
Does the data set shift rapidly, so that records are added as the user is paging?
Are your result sets short, so that adding columns that repeat is tolerable?
Do you have enough control over the client to do something out of band, like custom HTTP headers or a separate HTTP request that just asks for a query summary?
There really are hundreds of options depending on your needs. I don't know what you're using as a data store, but I wrote a post on getting row count efficiently. The issues there are very germane here, albeit from the DB perspective. It might help you get some perspective.
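If you do go the custom-header route, one possible shape is a `Link` header carrying next/prev URLs plus a total count, which leaves the response body as a bare array. The sketch below makes several assumptions: `skip` and `top` as query parameter names (without the OData `$` prefix, which would be percent-encoded here), `X-Total-Count` as a conventional but non-standard header name, and `paging_headers` as a helper name I made up.

```python
from urllib.parse import urlencode


def page_url(base_url, skip, top):
    """Build the URL for one page of results."""
    return f"{base_url}?{urlencode({'skip': skip, 'top': top})}"


def paging_headers(base_url, skip, top, total):
    """Return response headers describing where the adjacent pages are.

    skip/top are the current offset and page size; total is the full
    row count. Header names other than Link are assumed conventions.
    """
    links = []
    if skip + top < total:
        links.append(f'<{page_url(base_url, skip + top, top)}>; rel="next"')
    if skip > 0:
        links.append(f'<{page_url(base_url, max(skip - top, 0), top)}>; rel="prev"')
    headers = {"X-Total-Count": str(total)}
    if links:
        headers["Link"] = ", ".join(links)
    return headers
```

The client then reads the headers to find the next page instead of unwrapping a metadata envelope. The trade-off is that the paging information is invisible to anyone who only looks at the body.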