Identify duplicate posts in Graph API - facebook

Problem with identifying duplicate content on FB graph API.
If a photo album has been posted to the timeline more than once, the post is returned in the results multiple times with different link, object_id & post_id properties.
My code therefore can't tell that the number of likes is always n (number of duplicates) * actual_like_count.
How can I avoid counting twice?
EDIT: Here is some example data
type   status_type   link               likes  comments  shares
photo  added_photos  a.xxx.yyy.zzz/mmm  48     6         1
photo  ""            a.xxx.yyy.zzz/ppp  48     1         0
photo  added_photos  a.xxx.yyy.zzz/ppp  48     1         19
In this example the comment and share counts differ between rows, despite the posts having the same album id (xxx) and the same like count.
Here is an example of duplicate counts with the exact same link structure:
type   status_type   link               likes  comments  shares
photo  added_photos  a.xxx.yyy.zzz/qqq  63     3         0
photo  ""            a.xxx.yyy.zzz/rrr  63     3         0
photo  added_photos  a.xxx.yyy.zzz/sss  63     3         0
Notice in the first table, the portion after slash in the link matches for the second two rows, yet still different metrics.
object_id is always different

Each time an album is posted to the timeline, the post gets its own post_id, which will always be different, but the link to the album remains the same. You can try comparing the links for each album post.
https://www.facebook.com/photo.php?fbid=xxx&set=a.yyy.zzz.aaa&type=b&relevant_count=c
For the most part, the link will be the same. From what I've seen, the set parameter in the link usually identifies the album and is the same for duplicate posts. It usually looks something like:
set=a.xxx.yyy.zzz.zzz
The a.xxx is the album ID. You can check this by calling /xxx on the Graph API to get the album details.
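A minimal Python sketch of that comparison, assuming each post is a dict with a "link" field and that the set parameter follows the a.<album_id>.<...> layout described above. The helper names album_key and dedupe_by_album are made up for illustration, and the layout is an observation rather than documented behavior:

```python
from urllib.parse import urlparse, parse_qs


def album_key(link):
    """Return the album id from the `set` query parameter, or None if absent."""
    values = parse_qs(urlparse(link).query).get("set")
    if not values:
        return None
    parts = values[0].split(".")
    # Assumed layout: "a.<album_id>.<...>"; anything else is treated as unknown.
    if len(parts) < 2 or parts[0] != "a":
        return None
    return parts[1]


def dedupe_by_album(posts):
    """Keep the first post seen per album; posts without an album key are kept."""
    seen = set()
    unique = []
    for post in posts:
        key = album_key(post.get("link") or "")
        if key is not None:
            if key in seen:
                continue  # duplicate timeline post of an album already counted
            seen.add(key)
        unique.append(post)
    return unique
```

Summing likes over dedupe_by_album(posts) then counts each album once. Which duplicate to keep is a design choice; this sketch simply keeps the first one encountered.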

Related

How to obtain the total number of photos in one area using the panoramio data API?

I am trying to get the total number of photos from one specific area using the panoramio data API.
The following code
www.panoramio.com/map/get_panoramas.php?set=full&from=0&to=0&minx=-180&miny=-90&maxx=180&maxy=90&mapfilter=false
returns
{"count":72543250,"has_more":true,"map_location":{"lat":-46.647137999999998,"lon":-72.607527000000005,"panoramio_zoom":0},"photos":[]}
count should be around 100 000 000 (the total number of photos in panoramio) and not 72 543 250 and I guess the value of "has_more" should be false.
Thanks in advance.
The difference between the total number of photos according to the API and the actual total number of photos in Panoramio is that the API only returns geo-referenced photos and not all photos in Panoramio are geo-referenced.
The 100,000,000 figure comes from the most recent photo IDs. Because some IDs have been deleted, the real number may be lower.

Querying old links from public Facebook page returns an empty set

I am trying to fetch links that were posted on a public Facebook page sometime ago, in 2011 for example. Specifically, the Arabic CNN page on facebook: http://www.facebook.com/CNNArabic
Things I tried:
1- Graph API, a query like this:
CNNArabic/links?fields=id,name,link,created_time&limit=25&until=2012-05-15
2- FQL
SELECT link_id, url, created_time FROM link WHERE owner = 102581028206 and created_time < 1337085958 LIMIT 100
Both give an empty data set while there is data on the page on or before this date.
Other things I noticed:
1-If I changed the date to something like 2013-01-17 (which is yesterday), it works fine.
2-If I changed the date to something like 2012-12-17 (about a month ago), an empty data set is returned, however if I followed the next page links in the returned data set from the query in number 1 above until I pass by this date I actually get data.
I tried writing code that kept following the next page pointers until I reach the links on the date I want. However I need data much older (say in 2011) and the result set gets exhausted say 2 or 3 months earlier than now, in other words no more next links are returned so I actually can never reach that old data.
To cut this short:
is there a way I can query links that were posted on a public page before a specified date?
Querying the page's feed works in the Graph API:
/102581028206/feed?fields=id,link,name,created_time&limit=25&until=2012-05-15
This returns all the posts. There should be a way to filter this using field expansion, but I couldn't get the few things I tried to work.
You can get this filtered for links only with FQL on the stream table:
SELECT message, attachment, created_time FROM stream WHERE source_id = 102581028206
AND created_time < strtotime('2012-05-15') AND type=80 LIMIT 10
Links are type=80.
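If the FQL route is not available, the feed response could also be filtered on the client. The sketch below assumes the feed comes back as JSON with a top-level "data" list and that link posts carry a non-empty "link" field; the function name posts_with_links is hypothetical:

```python
import json


def posts_with_links(feed_json):
    """Return only the feed entries that carry a non-empty `link` field."""
    payload = json.loads(feed_json)
    # Entries without a link (plain statuses, for example) are dropped.
    return [post for post in payload.get("data", []) if post.get("link")]
```

This does not reduce the amount of data transferred, since the whole feed page is still fetched before filtering.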

Fetching object_id for all pictures of facebook friends

I am trying to retrieve the object_id of all pictures that my friends have on Facebook.
This is the method I use that I believe should work fine:
https://api.facebook.com/method/fql.query?access_token=[YOURTOKEN]&query=SELECT object_id FROM photo WHERE aid IN (SELECT aid FROM album WHERE owner IN (SELECT uid FROM friend WHERE uid1=me )) ORDER BY created DESC
My problem is that I only retrieve 5108 object_ids, which is nowhere close to the total number of pictures that all of my friends have.
Is there a restriction from Facebook? Any suggestions appreciated.
You can add LIMIT and OFFSET to the end of your query. So to get the first 1000 photos, you would have LIMIT 1000 OFFSET 0, then for the next group LIMIT 1000 OFFSET 1000, and so on.
You are also using a legacy endpoint. You should be using the newer one:
https://graph.facebook.com/fql?q=[QUERY]&access_token=[TOKEN]
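A small Python sketch of generating the paged queries. The offset advances by exactly the page size so that no row is skipped or repeated; paged_queries is an illustrative name, not part of any Facebook SDK:

```python
def paged_queries(base_query, page_size, pages):
    """Yield the base query with LIMIT/OFFSET appended for each page."""
    for page in range(pages):
        # Page 0 starts at offset 0, page 1 at page_size, and so on.
        yield f"{base_query} LIMIT {page_size} OFFSET {page * page_size}"


for query in paged_queries("SELECT object_id FROM photo WHERE aid IN (...)", 1000, 3):
    print(query)
```

In practice you would keep requesting pages until one comes back with fewer rows than the page size, rather than fixing the page count up front.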

How to Sort/ Limit the API call for Groups for member list

I see that we can only get 500 members of a group using the graph API.
and the doc says these are "the first 500 members",
Are these sorted by date signed up, or latest 500?
Is there any way I can further limit these to signed up in the last 24 hours/ 1 week?
Is the 500 limit there in using FQL also? (the docs don't specify that )
Is there any way I can further limit these to signed up in the last 24 hours/ 1 week using FQL?
i see that we can only get 500 members of a group using the graph API. and the doc says these are "the first 500 members",
are these sorted by date signed up, or latest 500,???
I’d say by date of joining the group, because otherwise calling them the “first” 500 would make little sense.
Is the 500 limit there in using FQL also? (the docs dont specify that )
From my tests there seems to be no such limit on the group_member table (I just tried it for the FB developer group using the Graph API Explorer, and my browser froze for about a minute loading the data).
is there any way i can further limit these to signed up in the last 24 hours/ 1 week using FQL?
No, there is no such info as signup date in the FQL table.

Facebook FQL stream limit?

I want to get the full history of my wall. But I seem to hit a limit somewhere back in June.
I do multiple calls like this:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 50
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 51,100
and so on...
But I always end up on the same last (first) post on my wall.
Through facebook.com I can go back much longer so Facebook obviously have the data.
Why am I not getting older posts?
Is there another way to scrape my history?
From http://developers.facebook.com/docs/reference/fql/stream :
The stream table is limited to the last 30 days or 50 posts, whichever is greater
I am experiencing the same thing. I don't understand it at all, but it appears that the offset cannot be greater than the limit * 1.5
Theoretically, this means that always increasing the limit to match the offset would fix it, but I haven't been able to verify this (I'm not sure whether the problems I'm seeing are other bugs in my code or if there are other limitations I don't understand about getting the stream).
Can anyone explain what I'm seeing and whatever I'm missing?
You can reproduce my results by going to the FQL Test Console:
http://developers.facebook.com/docs/reference/rest/fql.query
pasting in this query:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id
FROM stream
WHERE filter_key IN
(
SELECT filter_key
FROM stream_filter
WHERE uid=me() AND type='newsfeed'
)
AND is_hidden = 0 limit 100 offset 150
When you click "Test Method" you will see one of the 2 results I am getting:
The results come back: [{post_id:"926... (which I expected)
It returns empty [] (which I didn't expect)
You will likely need to experiment by changing the "offset" value until you find the exact place where it breaks. Just now I found it breaks for me at 155 and 156.
Try changing both the limit and the offset and you'll see that the empty results don't occur at a particular location in the stream. Here are some examples of results I've seen:
"...limit 50 offset 100" breaks, returning empty []
"...limit 100 offset 50" works, returning expected results
"...limit 50 offset 74" works
"...limit 50 offset 75" breaks
"...limit 20 offset 29" works
"...limit 20 offset 30" breaks
Besides seeing that queries break once the offset reaches limit * 1.5, I really don't understand what is going on here.
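The six limit/offset observations listed above can be checked against the hypothesis that rows come back only while offset < 1.5 * limit. This is a sketch of that arithmetic in Python; the rule is a guess fitted to these observations, not documented behavior:

```python
def returns_rows(limit, offset):
    """Hypothesised rule: offset < 1.5 * limit, written in integer arithmetic."""
    return 2 * offset < 3 * limit


# (limit, offset, worked) taken from the observations in the post.
observations = [
    (50, 100, False),
    (100, 50, True),
    (50, 74, True),
    (50, 75, False),
    (20, 29, True),
    (20, 30, False),
]

mismatches = [
    (limit, offset)
    for limit, offset, worked in observations
    if returns_rows(limit, offset) != worked
]
print(mismatches)
```

An empty list means every listed observation agrees with the hypothesised rule.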
Skip the FQL and go straight to graph. I tried FQL and it was buggy when it came to limits and getting specified date ranges. Here's the graph address. Put in your own page facebook_id and access_token:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN
Then if you want to get your history set your date range using since, until and limit:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN&since=START_DATE&until=END_DATE&limit=1000
Those start and end dates are in unix time, and I used limit because if I didn't it would only give me 25 at a time. Finally if you want insights for your posts, you'll have to go to each individual post and grab the insights for that post:
https://graph.facebook.com/POST_ID/insights?access_token=ACCESS_TOKEN
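The posts URL above can be assembled in Python with the standard library. This is a sketch only: posts_url is a hypothetical helper, and FACEBOOK_ID and ACCESS_TOKEN remain placeholders as in the answer:

```python
from urllib.parse import urlencode


def posts_url(facebook_id, access_token, since=None, until=None, limit=None):
    """Build the /posts URL, adding since/until/limit only when they are given."""
    params = {"access_token": access_token}
    if since is not None:
        params["since"] = since  # unix time, start of range
    if until is not None:
        params["until"] = until  # unix time, end of range
    if limit is not None:
        params["limit"] = limit
    return f"https://graph.facebook.com/{facebook_id}/posts?{urlencode(params)}"


print(posts_url("FACEBOOK_ID", "ACCESS_TOKEN", since=1348099200, until=1379635200, limit=1000))
```

Using urlencode keeps the token and any other values correctly escaped if they contain reserved characters.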
I don't know why, but when I use filter_key = 'others' the LIMIT xx works.
Here is my fql query
SELECT message, attachment, message_tags FROM stream WHERE type = 'xx' AND source_id = xxxx AND is_hidden = 0 AND filter_key = 'others' LIMIT 5
Now I get exactly 5 posts. When I use LIMIT 7 I get 7, and so on.
As @Subcreation said, something is wack with FQL on stream with LIMIT and OFFSET, and higher LIMIT/OFFSET ratios seem to work better.
I have filed an issue about it with Facebook at http://developers.facebook.com/bugs/303076713093995. I suggest you subscribe to it and indicate that you can reproduce it, to get it bumped up in priority.
In the bug I describe how a simple stream FQL returns very inconsistent response counts based on its LIMIT/OFFSET. For example:
433 - LIMIT 500 OFFSET 0
333 - LIMIT 500 OFFSET 100
100 - LIMIT 100 OFFSET 0
0 - LIMIT 100 OFFSET 100
113 - LIMIT 200 OFFSET 100
193 - LIMIT 200 OFFSET 20
You get a maximum of 1000 likes back, even when using a much larger LIMIT:
FQL: SELECT user_id FROM like WHERE object_id=10151751324059927 LIMIT 20000000
You could specify created_time in your Facebook query.
The created_time field is Unix time. You can convert dates with a converter such as http://www.onlineconversion.com/unix_time.htm, or with whatever date functions your language provides.
Here is a template based on your request:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>BEGIN_OF_RANGE and created_time<END_OF_RANGE LIMIT 50
And a specific example, from 20.09.2012 to 20.09.2013:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>1348099200 and created_time<1379635200 LIMIT 50
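For the conversion step, here is a short Python sketch that turns calendar dates into Unix time (taking midnight UTC, which is an assumption about the intended time zone) and fills in the range query. The lower bound uses > and the upper bound uses <. The helper names to_unix and stream_query are illustrative:

```python
from datetime import datetime, timezone


def to_unix(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def stream_query(source_id, begin, end, limit=50):
    """FQL query for posts strictly between the begin and end timestamps."""
    return (
        f"SELECT created_time,message FROM stream WHERE source_id={source_id} "
        f"and created_time>{begin} and created_time<{end} LIMIT {limit}"
    )


print(stream_query("MY_USER_ID", to_unix(2012, 9, 20), to_unix(2013, 9, 20)))
```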
I have a similar issue trying to download older posts from a public page, adding a filter ' AND created_time < t', and setting t for each query to the minimum created_time I have seen so far. The weird thing is that for some values of t this returns an empty set, but if I manually set t back by one or two hours, then I start getting results again. I tried to debug this using the explorer and got to a point where a certain t would get me 0 results and t-1 would get results, and repeating the queries gave me the same behavior.
I think this may be a bug, because obviously if created_time < t-1 gives me results, then created_time < t should too.
If it was a question of rate limits or access rights, then I should get an error, instead I get an empty set and only for some values of t.
My suggestion for you is to filter on created_time, and change it manually when you stop getting results.
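That workaround can be automated roughly as follows. This is a sketch under stated assumptions: fetch is a caller-supplied function (hypothetical, not a Facebook API) that runs the query with created_time < t and returns a list of dicts with a "created_time" field, and the step size and retry count are arbitrary choices:

```python
def walk_back(fetch, start_t, step=3600, max_empty=3):
    """Collect posts older than start_t, nudging t back when a query is empty."""
    t = start_t
    empty_runs = 0
    collected = []
    while empty_runs < max_empty:
        batch = fetch(t)  # expected: posts with created_time < t
        if batch:
            collected.extend(batch)
            # Continue from the oldest post seen so far.
            t = min(post["created_time"] for post in batch)
            empty_runs = 0
        else:
            # Empty result: move t back a little and try again.
            t -= step
            empty_runs += 1
    return collected
```

The loop gives up after max_empty consecutive empty results, so a genuinely exhausted history ends the walk instead of looping forever.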
Try it with a comma:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id FROM stream WHERE filter_key IN (SELECT filter_key FROM stream_filter WHERE uid=me() AND type='newsfeed') AND is_hidden = 0 limit 11,5