Paging in inbox threads/comments does not work? - facebook

I am tryning to list all messages for a thread in the inbox. I notice that I get the 25 last messages by default by doing something like this:
https://graph.facebook.com/<threadID>/comments?access_token=<token>
I get data for the 25 last messages in the thread, in this case message 4 to 28. The first message has a created_time" of "2011-01-21", the last (newest) has a
"created_time" of "2013-09-24".
The data returned for the "comments" connection has paging, the "next" and "previous" links are present and looks like this:
"previous"
https://graph.facebook.com/<threadID>/comments?access_token=<token>&limit=25&since=1380049638&__paging_token=<threadID>_28"
"next"
https://graph.facebook.com/<threadID>/comments?access_token=<token>&limit=25&until=1295625728&__paging_token=<threadID>_4
However, both return empty data sets!
How can I get this to work?
Another obeservation: when experimenting with "until", I noticed that when setting "until=2013-02-23" or earlier the response is also an empty data set!
I have also noticed another thing: the default limit seems to be 25 messages, however even when setting limit to a high number (like "limit=100) you only get around 28-30 messages per request. So it seems that for the thread/comments connections there are two problems: 1) "limit=" does not work as expected 2) "until=" does not work as expected: going back before a certain date/time returns an empty data set (this is why the paging does not work I guess).
Any ideas on how to get around this?

If you have a problem with next URL for the pagination, try using the offset along with the limit parameters in the URI.
For example, instead of making an API call to <threadID>/comments, make a call to /comments?limit=100&offset=0. This will start the list of the messages from an offset of 0 and will display a list of 100 messages on each page. The next URL will work just fine in this case. You can however increase the limit of the messages per page.
Also, there are some issues with the paging. Have a look at this post to learn how it works actually.

Related

Keep all fields option creating duplicate records in http client origin in StreamSets

I have an http client origin which gives a json response. Pipeline uses pagination (by page number). When I am enabling ‘Keep all fields’ option in http client it is creates the duplicates of first record in every page. Say I have 10 records in the file, it writes first record 10 times in the same, 1st record in the second page 10 times and so on. basically it repeats first record for the entire page. Any way to fix this issue? We need to ‘Keep all fields’ enabled to get page properties for processing within the job.

How to design a REST API to fetch a large (ephemeral) data stream?

Imagine a request that starts a long running process whose output is a large set of records.
We could start the process with a POST request:
POST /api/v1/long-computation
The output consists of a large sequence of numbered records, that must be sent to the client. Since the output is large, the server does not store everything, and so maintains a window of records with a upper limit on the size of the window. Let's say that it stores upto 1000 records (and pauses computation whenever this many records are available). When the client fetches records, the server may subsequently delete those records and so continue with generating more records (as more slots in the 1000-length window are free).
Let's say we fetch records with:
GET /api/v1/long-computation?ack=213
We can take this to mean that the server should return records starting from index 214. When the server receives this request, it can assume that the (well-behaved) client is acknowledging that records up to number 213 are received by the client and so it deletes them, and then returns records starting from number 214 to whatever is available at that time.
Next if the client requests:
GET /api/v1/long-computation?ack=214
the server would delete record 214 and return records starting from 215.
This seems like a reasonable design until it is noticed that GET requests need to be safe and idempotent (see section 9.1 in the HTTP RFC).
Questions:
Is there a better way to design this API?
Is it OK to keep it as GET even though it appears to violate the standard?
Would it be reasonable to make it a POST request such as:
POST /api/v1/long-computation/truncate-and-fetch?ack=213
One question I always feel like that needs to be asked is, are you sure that REST is the right approach for this problem? I'm a big fan and proponent REST, but try to only apply to to situations where it's applicable.
That being said, I don't think there's anything necessarily wrong with expiring resources after they have been used, but I think it's bad design to re-use the same url over and over again.
Instead, when I call the first set of results (maybe with):
GET /api/v1/long-computation
I'd expect that resource to give me a next link with the next set of results.
Although that particular url design does sort of tell me there's only 1 long-computation on the entire system going on at the same time. If this is not the case, I would also expect a bit more uniqueness in the url design.
The best solution here is to buy a bigger hard drive. I'm assuming you've pushed back and that's not in the cards.
I would consider your operation to be "unsafe" as defined by RFC 7231, so I would suggest not using GET. I would also strongly advise you to not delete records from the server without the client explicitly requesting it. One of the principles REST is built around is that the web is unreliable. Under your design, what happens if a response doesn't make it to the client for whatever reason? If they make another request, any records from the lost response will be destroyed.
I'm going to second #Evert's suggestion that you absolutely must keep this design, you instead pick a technology that's build around reliable delivery of information, such as a messaging queue. If you're going to stick with REST, you need to allow clients to tell you when it's safe to delete records.
For instance, is it possible to page records? You could do something like:
POST /long-running-operations?recordsPerPage=10
202 Accepted
Location: "/long-running-operations/12"
{
"status": "building next page",
"retry-after-seconds": 120
}
GET /long-running-operations/12
200 OK
{
"status": "next page available",
"current-page": "/pages/123"
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "building next page",
"retry-after-seconds": 120
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "complete"
}
GET /pages/123
{
// a page of records
}
DELETE /pages/123
// remove this page so new records can be made
You'll need to cap out page size at the number of records you support. If the client request is smaller than that limit, you can background more records while they process the first page.
That's just spitballing, but maybe you can start there. No promises on quality - this is totally off the top of my head. This approach is a little chatty, but it saves you from returning a 404 if the new page isn't ready yet.

Facebook Request(s): what counts as 1 request?

I am currently creating an application that polls facebook for data. First request a page in this fashion...
pageID/posts?fields=id,message,created_time,type&limit=250
This returns the top 250 posts from a page. I then check if there page next is set and if it is make another request for the next 250 posts. I continue this recursively until there are no more posts.
With each post that is returned I go out and fetch the post details from the graph api as well.
My question is if I had 500 posts on a page. Would that equate to 502 requests? (500 requests for each post + 2 for parsing through page data to get posts) or am I incorrect in my understanding of a "request". I know when batching calls each query included in the batch actually counts as 1 request. The goal is to avoid the 600 calls / 600 second rate limiting. Thanks!
Every API call is...well, 1 request. So every time you use the /posts endpoint with whatever limit, it will be 1 request. For example, if you do that call you posted, it will be one request that returns 250 elements.
Batch requests are just faster, but each call in the batch counts as a request. So if you combine 10 calls in a batch, it will be 10 requests. The benefit of batch calls is really just that they are a lot faster: as fast as the slowest call in the batch.
If you want to get 500 posts with that example of yours, you would only need 2 calls. First one with 250 returned elements, second one by using the API call defined in the "next" value to get another 250. Just keep in mind that the default is usually 25 elements, and you can´t use any limit you want. There is a max limit for calls and it gets changed from time to time afaik so don´t count on getting the same result every time.
Btw, don't be to fixated on that 600calls/600seconds limit, it's just a general limit. The real limit is dynamic and depends on many factors. It's not public, of course. But if you really hit the limit, you are doing something wrong anyway.

FQL/GRAPH Api Logic

I have a logic problem I can't seem to solve (might be possible).
Example:
I am inside 100 facebook groups
I need the 10 lastest posts of EACH group I am in.
That's pretty much it but I can't seem to find a way to do this without making a foreach loop calling the api over and over again, if I had a couple hundred more groups it would be impossible.
PS: I'm using FQL atm but am able to use graph, I've coded this in like 3 different ways but no success.
This is the farthest I could get:
SELECT actor_id,source_id FROM stream WHERE source_id IN (select gid from group_member where uid = me())
It only returns from one page, maybe there's no way to return all of this without a foreach asking for each groups 10 lastest messages.
There's no need to use FQL of batching. This can be done with a simple Graph API request IMHO:
GET /me/groups?fields=id,name,feed.fields(id,message).limit(10)
This will return 10 posts for each of your groups. In case there too much data to be returned, try setting the limit parameter for the base query as well:
GET /me/groups?fields=id,name,feed.fields(id,message).limit(10)&limit=20
Then, you'll get a next field in the result JSON. By calling the URL contained in this field, you'll get your next results. Do this until the result is empty, then you reached the end.
You can use batch calls, described here https://developers.facebook.com/docs/graph-api/making-multiple-requests/
Using batch requests, you can request upto 50 calls in one go. Note than batch request doesn't increase the rate limits, so if you make 50 requests in batch, it will be considered as 50 calls, and not one. However you will get the response in a shorter time.
If you think you're making too many calls, you should put some delay in between calls and avoid rate limiting.

How to implement robust pagination with a RESTful API when the resultset can change?

I'm implementing a RESTful API which exposes Orders as a resource and supports pagination through the resultset:
GET /orders?start=1&end=30
where the orders to paginate are sorted by ordered_at timestamp, descending. This is basically approach #1 from the SO question Pagination in a REST web application.
If the user requests the second page of orders (GET /orders?start=31&end=60), the server simply re-queries the orders table, sorts by ordered_at DESC again and returns the records in positions 31 to 60.
The problem I have is: what happens if the resultset changes (e.g. a new order is added) while the user is viewing the records? In the case of a new order being added, the user would see the old order #30 in first position on the second page of results (because the same order is now #31). Worse, in the case of a deletion, the user sees the old order #32 in first position on the second page (#31) and wouldn't see the old order #31 (now #30) at all.
I can't see a solution to this without somehow making the RESTful server stateful (urg) or building some pagination intelligence into each client... What are some established techniques for dealing with this?
For completeness: my back-end is implemented in Scala/Spray/Squeryl/Postgres; I'm building two front-end clients, one in backbone.js and the other in Python Django.
The way I'd do it, is to make the indices from old to new. So they never change. And then when querying without any start parameter, return the newest page. Also the response should contain an index indicating what elements are contained, so you can calculate the indices you need to request for the next older page. While this is not exactly what you want, it seems like the easiest and cleanest solution to me.
Initial request: GET /orders?count=30 returns:
{
"start"=1039;
"count"=30;
...//data
}
From this the consumer calculates that he wants to request:
Next requests: GET /orders?start=1009&count=30 which then returns:
{
"start"=1009;
"count"=30;
...//data
}
Instead of raw indices you could also return a link to the next page:
{
"next"="/orders?start=1009&count=30";
}
This approach breaks if items get inserted or deleted in the middle. In that case you should use some auto incrementing persistent value instead of an index.
The sad truth is that all the sites I see have pagination "broken" in that sense, so there must not be an easy way to achieve that.
A quick workaround could be reversing the ordering, so the position of the items is absolute and unchanging with new additions. From your front page you can give the latest indices to ensure consistent navigation from up there.
Pros: same url gives the same results
Cons: there's no evident way to get the latest elements... Maybe you could use negative indices and redirect the result page to the absolute indices.
With a RESTFUL API, Application state should be in the client. Here the application state should some sort of time stamp or version number telling when you started looking at the data. On the server side, you will need some form of audit trail, which is properly server data, as it does not depend on whether there have been clients and what they have done. At the very least, it should know when the data last changed. No contradiction with REST here.
You could add a version parameter to your get. When the client first requires a page, it normally does not send a version. The server replies contains one. For instance, if there are links in the reply to next/other pages, those links contains &version=... The client should send the version when requiring another page.
When the server recieves some request with a version, it should at least know whether the data have changed since the client started looking and, dependending of what sort of audit trail you have, how they have changed. If they have not, it answer normally, transmitting the same version number. If they have, it may at least tell the client. And depending how much it knows on how the data have changed, it may taylor the reply accordingly.
Just as an example, suppose you get a request with start, end, version, and that you know that since version was up to date, 3 rows coming before start have been deleted. You might send a redirect with start-3, end-3, new version.
WebSockets can do this. You can use something like pusher.com to catch realtime changes to your database and pass the changes to the client. You can then bind different pusher events to work with models and collections.
Just Going to throw it out there. Please feel free to tell me if it's completely wrong and why so.
This approach is trying to use a left_off variable to sort through without using offsets.
Consider you need to make your result Ordered by timestamp order_at DESC.
So when I ask for first result set
it's
SELECT * FROM Orders ORDER BY order_at DESC LIMIT 25;
right?
This is the case when you ask for the first page (in terms of URL probably the request that doesn't have any
yoursomething.com/orders?limit=25&left_off=$timestamp
Then When receiving your data set. just grab the timestamp of last viewed item. 2015-12-21 13:00:49
Now to Request next 25 items go to: yoursomething.com/orders?limit=25&left_off=2015-12-21 13:00:49 (to lastly viewed timestamp)
In Sql you would just make the same query and say where timestamp is equal or less than $left_off
SELECT * FROM (SELECT * FROM Orders ORDER BY order_at DESC) as a
WHERE a.order_at < '2015-12-21 13:00:49' LIMIT 25;
You should get a next 25 items from the last seen item.
For those who sees this answer. Please comment if this approach is relevant or even possible in the first place. Thank you.