Multiple source_ids in a "stream" FQL query - facebook

For some time now I've been getting inconsistent results when using multiple source_ids in an FQL query against the stream table. The query is, for example:
SELECT source_id, post_id, created_time, message, permalink, type, attachment
FROM stream
WHERE (created_time >= 1338444667) and
((source_id = 74133697733 and actor_id = 74133697733) or
(source_id = 259126564951 and actor_id = 259126564951))
It seems that the timespan from which posts are returned is quite limited. But what's the cutoff value? Sometimes I'm not getting posts from 30 minutes ago, sometimes I'm getting multiple posts from between 2 and 3 hours ago, only to stop getting all of them on the next query. Is there a rule?
There's no mention of any special treatment of multi-source_id queries in https://developers.facebook.com/docs/reference/fql/stream/.

Related

How to query last 50 photos by friends on Facebook

I'm trying to query Facebook for the last 50 photos posted by my friends.
Sounds easy enough, right? So far I've found two different approaches using FQL, and neither works.
Photo Method
SELECT pid, owner, src_big, caption, created
FROM photo WHERE owner IN (
SELECT uid1 FROM friend WHERE uid2=me()
) ORDER BY created DESC LIMIT 50
The problem with this is that it's doing a depth-first search when I really want a breadth first search. It's going through my friends list, finding the first friend with a bunch of photos, and then giving me back her most recent photos, ignoring more recent photos from friends further down on my friends list.
Stream Method
SELECT pid, owner, src_big, caption, created
FROM photo where pid in (
SELECT attachment.media.photo.pid
FROM stream
WHERE filter_key IN (
SELECT filter_key FROM stream_filter WHERE uid=me()
) AND type = 247
ORDER BY created_time DESC
LIMIT 50
)
This is much closer to what I want, except that it's giving me photos from pages, too, and I only want photos from friends. As far as I can tell, there is no way to filter uid by type (e.g. friend, page, etc.).
What am I missing? Is there a third way? Is this impossible?
I tried your query and hit a timeout error (at the time of writing).
Put filter_key='app_2305272732' first as the main key so the stream returns only photos, instead of filtering by type=247 afterwards; that way you get more results:
SELECT pid, owner, src_big, caption, created FROM photo where pid IN(SELECT attachment.media.photo.pid FROM stream WHERE filter_key='app_2305272732' AND actor_id IN(SELECT uid1 FROM friend WHERE uid2=me()) AND created_time<=now() ORDER BY created_time DESC LIMIT 200)
As you can see, actor_id only includes friends. The response can be faster (~2 seconds for 2000 friends) if you list the IDs in actor_id directly, so consider caching the friend IDs (depending on your app flow):
actor_id IN(FRIEND_ID1,FRIEND_ID2,FRIEND_ID3...)
Also, there's no guarantee of getting 50 photos in one go, but you can move to the next page using created_time (taken from the last photo's created field). The created time in the photo table may differ slightly (~1 second) from the one in the stream table, but that should be acceptable:
created_time<CREATED
Finally, increase the limit so you get more results per query. I found 200 acceptable; with it I get more than 20 photos per page:
LIMIT 200
Update:
You should use source_id instead of actor_id to get many more results. So the correct query is:
SELECT pid, owner, src_big, caption, created FROM photo where pid IN(SELECT attachment.media.photo.pid FROM stream WHERE filter_key='app_2305272732' AND source_id IN(SELECT uid1 FROM friend WHERE uid2=me()) AND created_time<=now() ORDER BY created_time DESC LIMIT 200)
With this query you can lower the LIMIT, say to LIMIT 50, so that a single query runs faster and avoids timeouts (when there is too much photo data); that is up to you.
Be aware that you can't simply use created from the photo table to query the next page of the stream table, because one feed post may contain many photos whose created times differ widely (the difference can be hours!). If you want paging, consider a multiquery that also retrieves created_time from the stream table, for example:
{"query1":"SELECT attachment.media.photo.pid, created_time FROM stream WHERE filter_key='app_2305272732' AND source_id IN(SELECT uid1 FROM friend WHERE uid2=me()) AND created_time<=now() ORDER BY created_time DESC LIMIT 200", "query2":"SELECT pid, owner, src_big, caption, created FROM photo where pid IN(SELECT attachment.media.photo.pid FROM #query1)"}
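If you build that multiquery in code, letting a JSON library do the quoting avoids escaping mistakes. A minimal Python sketch, assuming you only want to vary the time bound and the limit; the function name build_multiquery and its parameters are made up for illustration and are not part of any Facebook SDK:

```python
import json

# Subquery for the current user's friends, as used in the answer above.
FRIENDS = "SELECT uid1 FROM friend WHERE uid2=me()"


def build_multiquery(before="now()", limit=200):
    """Return the multiquery as a JSON string.

    `before` is either the literal 'now()' or a Unix timestamp taken from
    the stream table's created_time (hypothetical parameter).
    """
    query1 = (
        "SELECT attachment.media.photo.pid, created_time FROM stream "
        "WHERE filter_key='app_2305272732' "
        f"AND source_id IN({FRIENDS}) "
        f"AND created_time<={before} "
        f"ORDER BY created_time DESC LIMIT {limit}"
    )
    query2 = (
        "SELECT pid, owner, src_big, caption, created FROM photo "
        "WHERE pid IN(SELECT attachment.media.photo.pid FROM #query1)"
    )
    return json.dumps({"query1": query1, "query2": query2})
```

For the next page you would pass the oldest created_time returned by query1 (minus one second, since the comparison is <=) as `before`.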
Also note that ORDER BY created_time DESC sorts the feed posts, not the photos, so it's normal for the photos' created times to be out of order.
Consider comparing each photo's created time with the next page's created_time, and showing only the photos that are not older than it. For example, if the next page's created_time is 5:00 PM and photo A (from the first page) was created at 2:00 PM, hold photo A back until the next page's created_time is older than 2:00 PM, and then display it. Of course, you have to re-sort after inserting photo A into the current page's photos.
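The hold-back idea can be sketched roughly like this in Python. This is only an illustration under my own assumptions: photos are dicts with a `created` Unix timestamp, `cursor` is the created_time at which the next stream page starts, and the name release_ready is made up:

```python
def release_ready(held, new_photos, cursor):
    """Split photos into those safe to show now and those to keep holding.

    held       -- photos held back from earlier pages
    new_photos -- photos that arrived with the current page
    cursor     -- created_time where the next stream page starts
    """
    pool = held + new_photos
    # Photos at or after the cursor cannot be overtaken by a later page.
    ready = sorted(
        (p for p in pool if p["created"] >= cursor),
        key=lambda p: p["created"],
        reverse=True,
    )
    # Older photos wait until the cursor has moved past them.
    still_held = [p for p in pool if p["created"] < cursor]
    return ready, still_held
```

Call it once per page, feeding the returned `still_held` list back in as `held` on the next call.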

Facebook FQL properly sorting stream table using ORDER BY created_time

I'm having trouble retrieving a sequence of posts using FQL when sorting by *created_time*. It seems that when I try to retrieve posts ordered by *created_time*, FQL first retrieves the first 50 posts (LIMIT 50) ordered by *updated_time* and then applies ORDER BY to those 50 posts.
The *updated_time* datetime gets updated whenever someone comments on that post, so if someone comments on a post that's 1 year old, the last post from my first query will be that post and my next sequence will start from that point in time (older than 1 year).
This causes me a problem because to get the next 50 posts in the sequence, I have to use the *created_time* datetime from the last post from the first query.
Any ideas on how to do this the correct way?
SELECT post_id,actor_id,message,created_time,updated_time FROM stream WHERE source_id=xxx AND created_time < xxxxxxxx ORDER BY created_time DESC LIMIT 50
Example:
There is a post with timestamps: 'updated_time': 1372837741 and 'created_time': 1372081023
With the following query, there are no results although the post's created_time is inside the specified created_time range.
SELECT post_id,actor_id,message,permalink,created_time,updated_time FROM stream WHERE source_id=xxxxxxxxx
AND created_time < 1372081033
AND created_time > 1372081013
But if I change the query's range to include the post's updated_time, the post above is returned.
SELECT post_id,actor_id,message,permalink,created_time,updated_time FROM stream WHERE source_id=xxxxxxxxxxx
AND created_time < 1372837742 AND created_time > 1372081013
This is probably a bug.
Thanks in advance!

Facebook FQL query returns empty result set when LIMIT is used with DESC order

You can try this using the Graph API Explorer:
SELECT post_id, id, fromid, time, text, user_likes, likes FROM comment WHERE post_id ='126757470715601_530905090300835' AND time < '1366318653' ORDER BY time DESC LIMIT 30
This will return an empty result set. BUT if I remove DESC from the query, it will return 30 results.
SELECT post_id, id, fromid, time, text, user_likes, likes FROM comment WHERE post_id ='126757470715601_530905090300835' AND time < '1366318653' ORDER BY time LIMIT 30
So adding DESC to the order by somehow changed the way LIMIT behaves. Can anyone shed some light on this?
Update:
The time is within range; sorry for my earlier miscalculation. What you can do is raise the comment limit to a large value, such as LIMIT 150, because there may be a lot of comments with is_privacy='0' among those 30 items.

Facebook FQL result set mismatch

When I run the following FQL query in two Graph API explorer windows at the same time, I get two different result sets:
SELECT
post_id,
actor_id,
target_id,
created_time,
type,
permalink,
message,
description,
attachment
FROM stream
WHERE filter_key IN (SELECT filter_key
FROM stream_filter
WHERE uid=me() AND type='newsfeed')
ORDER BY created_time ASC
LIMIT 50
Here are the differences:
The 1st result set has a few extra posts in the start, whereas the 2nd has a few extra posts in the end (I use ORDER BY created_time ASC in the FQL.)
One of the posts in between has an empty permalink in the 1st result set, whereas it's present in the 2nd result set.
There are few posts not present in the 1st result set which are present in 2nd, and vice versa.
Is this because of load balancing within the Facebook server farm?
How can one make sure they are getting all posts which can be seen on the user's newsfeed via the API?

Facebook FQL stream limit?

I want to get the full history of my wall. But I seem to hit a limit somewhere back in June.
I do multiple calls like this:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 50
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 51,100
and so on...
But I always end up on the same last (first) post on my wall.
Through facebook.com I can go back much further, so Facebook obviously has the data.
Why am I not getting older posts?
Is there another way to scrape my history?
From http://developers.facebook.com/docs/reference/fql/stream :
The stream table is limited to the last 30 days or 50 posts, whichever is greater
I am experiencing the same thing. I don't understand it at all, but it appears that the offset cannot be greater than the limit * 1.5
Theoretically, this means that always increasing the limit to match the offset would fix it, but I haven't been able to verify this (I'm not sure whether the problems I'm seeing are other bugs in my code or if there are other limitations I don't understand about getting the stream).
Can anyone explain what I'm seeing and whatever I'm missing?
You can reproduce my results by going to the FQL Test Console:
http://developers.facebook.com/docs/reference/rest/fql.query
pasting in this query:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id
FROM stream
WHERE filter_key IN
(
SELECT filter_key
FROM stream_filter
WHERE uid=me() AND type='newsfeed'
)
AND is_hidden = 0 limit 100 offset 150
When you click "Test Method" you will see one of the 2 results I am getting:
The results come back: [{post_id:"926... (which I expected)
It returns empty [] (which I didn't expect)
You will likely need to experiment by changing the "offset" value until you find the exact place where it breaks. Just now I found it breaks for me at 155 and 156.
Try changing both the limit and the offset and you'll see that the empty results don't occur at a particular location in the stream. Here are some examples of results I've seen:
"...limit 50 offset 100" breaks, returning empty []
"...limit 100 offset 50" works, returning expected results
"...limit 50 offset 74" works
"...limit 50 offset 75" breaks
"...limit 20 offset 29" works
"...limit 20 offset 30" breaks
Besides seeing the offset = limit * 1.5 relationship (the query breaks once the offset reaches 1.5 times the limit), I really don't understand what is going on here.
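To make the guessed rule concrete, here is a small Python check of "works only while offset < limit * 1.5" against the six limit/offset examples listed above. It only tests the guess against those observations; it says nothing about how FQL actually behaves, and the function name is made up:

```python
def predicted_to_work(limit, offset):
    """Guessed rule: results come back only while offset < 1.5 * limit."""
    # Integer form of offset < 1.5 * limit, avoiding floating point.
    return 2 * offset < 3 * limit


# (limit, offset, worked) as reported in the examples above.
observations = [
    (50, 100, False),
    (100, 50, True),
    (50, 74, True),
    (50, 75, False),
    (20, 29, True),
    (20, 30, False),
]

# Collect any observation the guessed rule fails to explain.
mismatches = [
    (limit, offset, worked)
    for limit, offset, worked in observations
    if predicted_to_work(limit, offset) != worked
]
print(mismatches)
```

An empty list means the rule is consistent with all six examples.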
Skip the FQL and go straight to graph. I tried FQL and it was buggy when it came to limits and getting specified date ranges. Here's the graph address. Put in your own page facebook_id and access_token:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN
Then if you want to get your history set your date range using since, until and limit:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN&since=START_DATE&until=END_DATE&limit=1000
Those start and end dates are in unix time, and I used limit because if I didn't it would only give me 25 at a time. Finally if you want insights for your posts, you'll have to go to each individual post and grab the insights for that post:
https://graph.facebook.com/POST_ID/insights?access_token=ACCESS_TOKEN
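The since/until/limit URL above can be assembled with the standard library so the parameters are encoded properly. A minimal Python sketch; the helper name posts_url is made up, and FACEBOOK_ID and ACCESS_TOKEN remain placeholders you fill in yourself:

```python
from urllib.parse import urlencode


def posts_url(facebook_id, access_token, since, until, limit=1000):
    """Build the /posts URL; since and until are Unix timestamps."""
    query = urlencode(
        {
            "access_token": access_token,
            "since": since,
            "until": until,
            "limit": limit,
        }
    )
    return f"https://graph.facebook.com/{facebook_id}/posts?{query}"


print(posts_url("FACEBOOK_ID", "ACCESS_TOKEN", 1348099200, 1379635200))
```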
I don't know why, but when I use filter_key = 'others', LIMIT xx works.
Here is my fql query
SELECT message, attachment, message_tags FROM stream WHERE type = 'xx' AND source_id = xxxx AND is_hidden = 0 AND filter_key = 'others' LIMIT 5
and now I get exactly 5 posts... when I use LIMIT 7 I get 7, and so on.
As @Subcreation said, something is wack with FQL on stream with LIMIT and OFFSET, and higher LIMIT/OFFSET ratios seem to work better.
I have filed a bug about it with Facebook at http://developers.facebook.com/bugs/303076713093995. I suggest you subscribe to it and indicate that you can reproduce it, to get it bumped up in priority.
In the bug I describe how a simple stream FQL returns very inconsistent response counts based on its LIMIT/OFFSET. For example:
433 - LIMIT 500 OFFSET 0
333 - LIMIT 500 OFFSET 100
100 - LIMIT 100 OFFSET 0
0 - LIMIT 100 OFFSET 100
113 - LIMIT 200 OFFSET 100
193 - LIMIT 200 OFFSET 20
You get a maximum of 1000 likes when using LIMIT, however large the value:
FQL: SELECT user_id FROM like WHERE object_id=10151751324059927 LIMIT 20000000
You could specify created_time in your FQL query.
The created_time field is a Unix timestamp. You can convert dates with a converter such as http://www.onlineconversion.com/unix_time.htm, or programmatically, depending on your language.
Template based on your request:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>BEGIN_OF_RANGE and created_time<END_OF_RANGE LIMIT 50
And a specific example, from 20.09.2012 to 20.09.2013:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>1348099200 and created_time<1379635200 LIMIT 50
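If you'd rather compute the timestamps in code than use an online converter, something like this Python sketch works. It assumes the dates mean midnight UTC (which is what the two numbers in the example correspond to); the helper name to_unix is made up, and the upper bound is written with `<` since a range needs an upper limit:

```python
from datetime import datetime, timezone


def to_unix(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


begin = to_unix(2012, 9, 20)
end = to_unix(2013, 9, 20)

# MY_USER_ID stays a placeholder, as in the template above.
query = (
    "SELECT created_time,message FROM stream "
    f"WHERE source_id=MY_USER_ID and created_time>{begin} "
    f"and created_time<{end} LIMIT 50"
)
print(query)
```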
I have a similar issue trying to download older posts from a public page, adding a filter ' AND created_time < t', and setting t for each query to the minimum created_time I got so far. The weird thing is that for some values of t this returns an empty set, but if I manually set t back by one or two hours, then I start getting results again. I tried to debug this using the explorer and got to a point where a certain t would get me 0 results, and t-1 would get results, and repeating would give me the same behavior.
I think this may be a bug, because obviously if created_time < t-1 gives me results, then created_time < t should too.
If it was a question of rate limits or access rights, then I should get an error, instead I get an empty set and only for some values of t.
My suggestion for you is to filter on created_time, and change it manually when you stop getting results.
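That manual workaround could be automated along these lines. This is a Python sketch under my own assumptions, not a tested client: `fetch` is a hypothetical callable you supply that runs the query with created_time < t and returns a list of dicts with a created_time key, and the function and parameter names are made up:

```python
def fetch_older_posts(fetch, start_t, step=3600, max_empty=3):
    """Page backwards by created_time, nudging t back on empty results.

    Stops after max_empty consecutive empty responses. Note that posts
    created inside a skipped window [t - step, t) would be missed, so a
    smaller step is safer but slower.
    """
    posts = []
    t = start_t
    empty_runs = 0
    while empty_runs < max_empty:
        page = fetch(t)
        if page:
            posts.extend(page)
            # Continue from the oldest post seen so far.
            t = min(p["created_time"] for p in page)
            empty_runs = 0
        else:
            # Empty set: move t back and try again.
            t -= step
            empty_runs += 1
    return posts
```

Since t-1 was enough to get results again in the case described above, a step of 1 may already do; one or two hours (3600 or 7200) matches the manual approach.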
Try it with a comma:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id FROM stream WHERE filter_key IN (SELECT filter_key FROM stream_filter WHERE uid=me() AND type='newsfeed') AND is_hidden = 0 limit 11,5