Facebook Open Graph API: weird behavior of the limit parameter when fetching a user's paginated news feed

I've written a small Java program that tests the limit parameter with four different values (10, 100, 1000 and 10000) when querying a user's Facebook news feed through the Open Graph API with the RestFB client. As you'll see, it behaves strangely...
Scenario:
public static void main(String[] args) {
// vars
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
FacebookClient client = new DefaultFacebookClient(accessToken);
Connection<Post> home;
List<Post> postList;
Map<String, Post> postMap;
int i;
// limits to test
String[] limits = {"10", "100", "1000", "10000"};
for (String limit : limits) {
// init list and map (looking for duplicate posts)
postList = new LinkedList<Post>();
postMap = new LinkedHashMap<String, Post>();
// get news feed
home = client.fetchConnection(id + "/home", Post.class, Parameter.with("limit", limit));
// going through pages
i = 1;
for (List<Post> page : home) {
for (Post post : page) {
// store into list
postList.add(post);
// store into map (unique post id)
postMap.put(post.getId(), post);
}
i++;
}
// sort posts by created time
Collections.sort(postList, new Comparator<Post>() {
@Override
public int compare(Post post1, Post post2) {
return post1.getCreatedTime().compareTo(post2.getCreatedTime());
}
});
// log
try {
FileWriter out = new FileWriter("log/output.txt", true);
out.write("LIMIT: " + limit + "\n");
out.write("\tPAGES: " + (i - 1) + "\n");
out.write("\tLIST SIZE: " + postList.size() + "\n");
out.write("\tMAP SIZE: " + postMap.size() + "\n");
out.write("\tOLDER POST: " + dateFormat.format(postList.get(0).getCreatedTime()) + "\n");
out.write("\tYOUGNER POST: " + dateFormat.format(postList.get(postList.size() - 1).getCreatedTime()) + "\n");
out.close();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
Output:
LIMIT: 10
PAGES: 7
LIST SIZE: 56
MAP SIZE: 56
OLDER POST: 2009-03-22 14:58:03
YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 100
PAGES: 3
LIST SIZE: 174
MAP SIZE: 172
OLDER POST: 2012-01-12 23:01:34
YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 1000
PAGES: 2
LIST SIZE: 294
MAP SIZE: 292
OLDER POST: 2009-03-22 14:58:03
YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 10000
PAGES: 2
LIST SIZE: 294
MAP SIZE: 292
OLDER POST: 2009-03-22 14:58:03
YOUGNER POST: 2012-05-11 15:48:49
Interpretations and questions:
1. Obviously, you can't get all the posts a user has had on their news feed since the account was created. Is limit itself capped?
2. With a limit of 100, 1000 and 10000, each time two posts were duplicated within the whole returned news feed (174 - 172 = 294 - 292 = 2). Why? I never saw the same post twice on my personal news feed...
3. With (and only with) a limit of 100, the oldest post I get was created in 2012, whereas the other values of limit retrieve a post created in 2009. I can understand that a higher limit (1000 or 10000) retrieves older posts. But why does a limit of 10 retrieve an older post than a limit of 100?
4. Last but not least: I'm not getting the same number of posts. The higher the limit, the more posts are retrieved. I first thought that the only consequence of a smaller limit would be a larger number of pages (which is indeed the case), and that the number of retrieved posts would not change. But it does. Why? That said, the number of posts seems to converge somewhere between a limit of 100 and 1000, because it is identical for a limit of 1000 and a limit of 10000.
PS: specifying a since and/or an until parameter in the query doesn't change anything.
Any answer/comment is welcome :)
Cheers.
Edit:
This is the best recall I have obtained so far:
LIMIT: 200
PAGES: 3
LIST SIZE: 391
MAP SIZE: 389
OLDER POST: 2012-01-27 14:17:16
YOUGNER POST: 2012-05-11 16:52:38
Why 200? Is it specified anywhere in the documentation?

It's not in the documentation, but I have personally tested the following for my project.
Facebook caps limit at 500 posts. Even if you set a limit higher than 500, it will fetch at most 500 results. Try with 500 (or more) and you will get the maximum number of posts.
You won't get 500 posts every time, but you will generally get more than 490.
Some posts get filtered out for various reasons (privacy, blocked users, content not suitable for a specific region, and so on).
This answers your 1st and 4th questions.
For question no. 2, I do not work in Java, so I can't say whether there is a problem in your code or logic.
For question no. 3, God help Facebook!
Edit
For the 4th problem, you may be hitting the queries-per-hour limit of the Graph API (Facebook uses it to prevent spamming; you can't query the API repeatedly in quick succession).
Also, the following passage from the documentation explains why you do not get all the results fetched by Facebook:
(if you specified a limit of “5” but the five posts returned are not
visible to the viewer, you will get an empty result set.)
In addition to the limits mentioned in the documentation for each of
the tables and connections listed above, it is helpful to know that
the maximum number of results we will fetch before running the
visibility checks is 5,000.
Reference: Paging with graph api and fql
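The behavior described in the quoted passage can be illustrated with a toy model. This is only a sketch of the idea (limit applied first, visibility checks afterwards), not Facebook's actual implementation; the names fetch_page and is_visible are made up for the example.

```python
from typing import Callable, List


def fetch_page(posts: List[dict], offset: int, limit: int,
               is_visible: Callable[[dict], bool]) -> List[dict]:
    """Toy model: cut the window first, then run the visibility check."""
    # The limit is applied to the raw list of candidate posts...
    window = posts[offset:offset + limit]
    # ...and only afterwards are posts the viewer cannot see dropped,
    # so a page can contain fewer than `limit` posts, or none at all.
    return [post for post in window if is_visible(post)]
```

Under this model a page shorter than the requested limit does not mean the feed is exhausted, which would explain why the number of posts per page rarely matches limit.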
Also, there is a limit on the number of results for each table. You can find the details in the documentation of the respective FQL tables.
For the stream table (the one for posts/feed):
Each query of the stream table is limited to the previous 30 days or
50 posts, whichever is greater, however you can use time-specific
fields such as created_time along with FQL operators (such as < or >)
to retrieve a much greater range of posts.
Reference: Fql stream table
Look here too:
Facebook FQL stream limit?

There is an ongoing bug in Facebook open graph API paging having to do with the limit parameter. The higher the limit, the more pages of posts --- as if a lower limit also culls a sampling of posts. The problem has surfaced and retreated ever since the post search function was down for a month in September.
A new bug has surfaced: at present a post search without an access_token and a small limit (like 12) will return few and sparsely populated results pages. The same search made with the access_token given in the API documentation example will give full pages of 12 results +/- and no skipping. I have no idea what kind of access_token they use, but no attempts on my part have duplicated their results. The post search without access token is more or less non-functional (again)!

There could be some logic on Facebook's side to prevent data mining. Try adding a delay between page requests and see whether the results improve.
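A minimal sketch of that suggestion, in Python rather than the question's Java. The function collect_posts and the fetch_page callback are hypothetical: fetch_page is assumed to take a page URL (None for the first page) and return the posts of that page together with the URL of the next page, or None when there is none. Duplicates are dropped by post id, as the question's map does.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# (posts on this page, URL of the next page or None)
Page = Tuple[List[dict], Optional[str]]


def collect_posts(fetch_page: Callable[[Optional[str]], Page],
                  delay_seconds: float = 1.0,
                  max_pages: int = 50) -> List[dict]:
    """Walk the pages, pausing between requests and dropping duplicate ids."""
    seen: Dict[str, dict] = {}
    url: Optional[str] = None  # None stands for "the first page"
    for _ in range(max_pages):
        posts, next_url = fetch_page(url)
        for post in posts:
            seen.setdefault(post["id"], post)  # keep the first copy of each id
        if not posts or next_url is None:
            break
        time.sleep(delay_seconds)  # throttle before requesting the next page
        url = next_url
    return list(seen.values())
```

Comparing the number of posts collected with and without a delay would show whether throttling is what makes the totals vary.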

Related

Get follower count on Scratch (API)

I am looking to find the follower count of a Scratch user using the Scratch API. I already know how to get their message count, with https://api.scratch.mit.edu/users/[USER]/messages/count/.
This answer targets the Scratch REST API, documented here.
You get the user's followers by requesting them: https://api.scratch.mit.edu/users/some_username/followers where some_username is to be replaced by the actual username.
This will return 0 to 20 results (20 is the default limit of objects returned by the REST API). If there are fewer than 20 results, you're done: the number of followers is simply the count of the objects returned.
If 20 objects are returned, we can't be certain we've requested all of the user's followers, as there might be more to come. Therefore, we skip the first 20 followers of that user by supplying the ?offset= parameter: https://api.scratch.mit.edu/users/some_username/followers?offset=20
This retrieves the second 'page' of followers. Now we simply repeat the procedure described above, incrementing offset by 20 each time until fewer than 20 results are returned. The number of followers of that user is the cumulative count of the objects returned.
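The loop described above could look roughly like this in Python. It is a sketch under assumptions: it requests the /followers endpoint (since the question is about followers), takes 20 as the page size as stated above, and the helper names fetch_followers_page and count_followers are invented for the example. The page fetcher is passed in as a parameter so the counting logic can be tried without network access.

```python
import json
import urllib.request
from typing import Callable, List

PAGE_SIZE = 20  # default number of objects per response, per the answer above


def fetch_followers_page(username: str, offset: int) -> List[dict]:
    """Request one page of followers from the Scratch REST API."""
    url = f"https://api.scratch.mit.edu/users/{username}/followers?offset={offset}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def count_followers(username: str,
                    fetch_page: Callable[[str, int], List[dict]] = fetch_followers_page) -> int:
    """Add up page sizes until a page comes back with fewer than PAGE_SIZE items."""
    total = 0
    offset = 0
    while True:
        page = fetch_page(username, offset)
        total += len(page)
        if len(page) < PAGE_SIZE:
            return total
        offset += PAGE_SIZE
```

For users with many followers this takes one request per 20 followers, so it can be slow.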
As mentioned by _nix on this forum thread, there is currently no API to achieve this. However, he/she rightly points out that the number can be obtained from a user's profile page.
You may write a script (in JavaScript, for example) to parse the HTML and get the follower count in the brackets at the top of the page.
Hope this helps!
There is a solution in Python:
import re
import requests

def followers(user):
    # Scrape the "Followers (N)" heading from the user's followers page.
    html = requests.get(f'https://scratch.mit.edu/users/{user}/followers').text
    count = int(re.search(r'Followers \(([0-9]+)\)', html, re.I).group(1))
    return f'{count} on [scratch](https://scratch.mit.edu/users/{user}/followers)'
Credit goes to 12944qwerty, in his code (adapted to remove some implementation specific stuff).
Use ScratchDB:
var user = "username here";
fetch(`https://scratchdb.lefty.one/v3/user/info/${user}`).then(res => res.json()).then(data => {
  // the follower count is read from the "statistics" object of the response
  console.log(`${user} has ` + data["statistics"]["followers"].toString() + " followers");
});
(Edit: this is javascript btw, I prefer Python but Python doesn't have a cloud.set function and this is how I did it)
Use ScratchDB (I used httpx, but you can GET with anything):
import httpx
import json
user = "griffpatch"
response = httpx.get(f"https://scratchdb.lefty.one/v3/user/info/{ user }")
userData = json.loads(response.text)
followers = userData["statistics"]["followers"]
https://api.scratch.mit.edu/users/griffpatch/followers
This gives the followers' names, Scratch status (Scratch Team or not), profile picture, and everything else in their profiles.

Filtering results from edge /[pageId]/posts

I would like to show the latest N Facebook posts on my website.
I am using this simple code:
var FB = require('fb')
FB.api('/flourandfire/posts', 'get', { fields: [ 'message', 'picture' ], access_token: '1694494XXXXXXXX|1ba36298123bbf9689942fXXXXXXXXXX' },
function (res) {
if (!res || res.error) {
console.log(!res ? 'error occurred' : res.error)
return
}
console.log(res)
})
However, I need to be able to have some kind of filtering, since there is "noise" in the feed due to the direct link with Instagram (which results in short posts with a picture, that I do NOT want to include in the website's feed)
I basically need to somehow "differentiate" specific posts, which will then be placed on the site.
I could fetch 100 posts and filter them myself based on a specific tag (like #pub); however, there is the risk of having lots of Instagram posts, more than 100, and ending up with ZERO posts on the website.
How would you solve this issue?
There is no official filtering. I would do it like this:
1. Get N entries by setting the limit parameter to N
2. Filter them on your own
3. If there are fewer than N elements after filtering, use another API call to get more items
4. Repeat from step 2
Just an idea though. You could also increase the initial limit, but that usually results in a slower API call, so be careful with that.
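The steps above can be sketched as a small loop, shown here in Python for brevity. Everything in it is an assumption made for illustration: latest_matching is a made-up name, and fetch_page stands for whatever call returns one page of posts plus a cursor for the next page (None when there are no more pages). A cap on the number of API calls keeps the loop from running forever when almost nothing matches.

```python
from typing import Any, Callable, List, Optional, Tuple

FetchPage = Callable[[Optional[Any], int], Tuple[List[dict], Optional[Any]]]


def latest_matching(fetch_page: FetchPage,
                    keep: Callable[[dict], bool],
                    wanted: int,
                    page_limit: int = 25,
                    max_calls: int = 10) -> List[dict]:
    """Fetch pages until `wanted` posts pass the filter or the pages run out."""
    kept: List[dict] = []
    cursor = None  # None stands for "start at the newest posts"
    for _ in range(max_calls):
        posts, cursor = fetch_page(cursor, page_limit)
        kept.extend(post for post in posts if keep(post))  # filter on our side
        if len(kept) >= wanted or cursor is None:
            break
    return kept[:wanted]
```

A filter such as lambda post: "#pub" in post.get("message", "") would keep only the tagged posts.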

GitHub Api: User followers - paging?

I am playing with some JavaScript and the GitHub API, and I've come across a problem.
Each time I request the followers of any user who has followers, the response I get from the server contains only 30 users. For example:
https://api.github.com/users/vojtajina/followers - 30 followers
and the user's followers on the original website:
https://github.com/vojtajina/followers - 1,039 followers
My question is: what is going on? There is no 'next page' in the response from the server. How can I get all of their followers?
The max number of items per page is 100, so using the per_page=100 querystring parameter will increase the result to have 100 users per page:
https://api.github.com/users/vojtajina/followers?per_page=100
Using the page querystring parameter, you control the pagination. For example, to get the second page, you should add page=2:
https://api.github.com/users/vojtajina/followers?per_page=100&page=2
If you want to get all the followers you have to iterate the pages until you receive an empty array.
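As a rough illustration of that iteration, here is a Python sketch using only the standard library. The helper names fetch_github_page and fetch_all are made up for this example, and the User-Agent value is arbitrary; unauthenticated requests are also subject to GitHub's rate limits.

```python
import json
import urllib.request
from typing import Callable, List


def fetch_github_page(path: str, page: int, per_page: int = 100) -> List[dict]:
    """Request one page of a GitHub API listing such as users/NAME/followers."""
    url = f"https://api.github.com/{path}?per_page={per_page}&page={page}"
    request = urllib.request.Request(url, headers={"User-Agent": "followers-example"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all(path: str,
              fetch_page: Callable[[str, int], List[dict]] = fetch_github_page) -> List[dict]:
    """Request page 1, 2, 3, ... until an empty array comes back."""
    items: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(path, page)
        if not batch:
            return items
        items.extend(batch)
        page += 1
```

Calling fetch_all("users/vojtajina/followers") would then return the complete list, at the cost of one request per 100 followers plus a final empty one.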
If you want to use this in a Node.js / client-side JavaScript app, you can use gh.js, a library I developed which handles this:
var GitHub = require("gh.js");
var gh = new GitHub({
token: "an optional token"
});
gh.get("users/vojtajina/followers", { all: true }, function (err, followers) {
console.log(err || followers); // do something with the followers
});

Facebook share count from debugger page

How can I get number of shares that is shown on facebook debugger page via API?
Empirically, I've found it to be the best match when comparing with share counters from other social networks, but this number does not seem to show up anywhere except on that debugger page.
Here are some details.
By now I've found 3 API calls that return somewhat relevant data:
via graph API: http://graph.facebook.com/?id=http%3A%2F%2Farzamas.academy%2Fspecial%2Fruslit
via FQL: https://graph.facebook.com/fql?q=SELECT%20url,%20normalized_url,%20share_count,%20like_count,%20comment_count,%20total_count,commentsbox_count,%20comments_fbid,%20click_count%20FROM%20link_stat%20WHERE%20url=%27http%3A%2F%2Farzamas.academy%2Fspecial%2Fruslit%27
via some old API: https://api.facebook.com/method/links.getStats?urls=http%3A%2F%2Farzamas.academy%2Fspecial%2Fruslit&format=json
The values in second and third call are identical, for my test url http://arzamas.academy/special/ruslit the current ones are
share_count: 492, like_count: 5042, comment_count: 491, total_count: 6025
The counter from the first call is named shares and is equal to total_count from second and third call.
When you paste the url in facebook debugger and click 'Show existing scrape information', one of the first rows in table is
Canonical URL: http://arzamas.academy/special/ruslit (6025 likes, 1635 shares)
Number of likes is equal to total_count from API calls, but how can I get that 1635 shares number via API?
I've found that specifying Graph API version newer than 2.0 gives another number (share.share_count), for some reason it is the sum of both numbers, that are shown in debugger (likes + shares).
https://developers.facebook.com/tools/explorer/145634995501895/?method=GET&path=%3Fid%3Dhttp%253A%252F%252Farzamas.academy%252Fspecial%252Fruslit&version=v2.3&
So now I can get counters from two calls and subtract to get the value that I need. There are obvious cons for this method:
2 calls
token requirement
looks not very reliable
But it should work. I'll try to implement it and will mark this answer as correct if there are no additional caveats and no better solution comes up.

Search Facebook events and pagination (Graph API)

I am requesting this page to get the events with the keyword
"conference":https://graph.facebook.com/search?q=conference&type=event
This works fine.
The problem is the pagination returned:
"paging": {
"previous":"https://graph.facebook.com/search?q=conference&type=event&limit=25&since=2010-12-18T17%3A00%3A00%2B0000",
"next":"https://graph.facebook.com/search?q=conference&type=event&limit=25&until=2010-11-04T16%3A29%3A59%2B0000"
}
There seem to be more events matching "conference", but requesting these 2 pagination URLs returns no data.
It's weird because it's the same for any requested keyword: the pagination URLs returned by the Facebook API always seem to return empty data.
Does anyone know what's the issue?
Thanks
I encountered similar confusion with a query against places. The "next" URL behaved exactly as you described it.
I could query location information using a url like this:
https://graph.facebook.com/search?access_token=INSERT_TOKEN&type=place&center=55.8660,-4.2715&distance=150&limit=10
And got back JSON with the first 10 places plus the following fragment which suggests the existence of paging params:
"paging": {
"next": "https://graph.facebook.com/search?access_token=INSERT_TOKEN&type=place&center=55.8660\u00252C-4.2715&distance=150&limit=10&offset=10"
Hitting that URL doesn't work. But I did figure out a combination of limit and offset params that gave me effective paging.
limit=10 & offset not defined => first 10 results
limit=20 & offset=10 => next 10 results
limit=30 & offset=20 => next 10 results
limit=40 & offset=30 => last 8 results (can stop here because less than 10 back)
limit=50 & offset=40 => confirmation that there are no more results
I realise that I've got "limit" and "offset" rather than the "limit" and "until" params that you get, but, hopefully you could apply the same technique i.e. keep incrementing the limit and inc the date/time to that of your last result?
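The limit/offset combination listed above (limit always equal to offset plus the page size) can be generated mechanically. The Python sketch below only reproduces that observed pattern; paging_params and fetch_all are hypothetical names, and fetch stands for whatever function performs the search request with the given parameters and returns the list of results.

```python
from typing import Callable, Dict, Iterator, List


def paging_params(page_size: int, max_pages: int) -> Iterator[Dict[str, int]]:
    """Yield {limit, offset} pairs following the pattern described above."""
    for page in range(max_pages):
        offset = page * page_size
        params = {"limit": offset + page_size}
        if page > 0:
            params["offset"] = offset  # the first request carries no offset
        yield params


def fetch_all(fetch: Callable[[Dict[str, int]], List[dict]],
              page_size: int = 10, max_pages: int = 100) -> List[dict]:
    """Collect results until a request returns fewer than page_size items."""
    results: List[dict] = []
    for params in paging_params(page_size, max_pages):
        batch = fetch(params)
        results.extend(batch)
        if len(batch) < page_size:
            break
    return results
```

Stopping on a short page saves the final confirmation request, but if some results can be filtered out server-side it may be safer to continue until an empty page comes back.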
I think this is standard practice in the Facebook Graph API. I think that if your request results in a non-empty JSON response, they will always give you the next paging link, even though the next page might be empty.
I am however not 100% sure, because Facebook Graph API does not seem to be very well documented... (for example they said we can modify this pagination thing but did not explain clearly how to do it).
Seems facebook has changed it recently.
Here's the fix:
For a datetime returned in next and previous as
"2011-01-18T08\u00253A42\u00253A35\u00252B0000",
replace all occurrences of "\u0025" with "%" and it should work fine.
Note that Facebook's datetime format is
2011-01-18T08:42:35+0000
(a format accepted by PHP's strtotime function)
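For illustration, the replacement described above is a one-liner in Python. The function names are made up for this sketch, and it assumes the paging URL is handled as raw text in which the six characters \u0025 appear literally.

```python
from urllib.parse import unquote


def fix_paging_url(url: str) -> str:
    """Replace the literal text \\u0025 with a percent sign."""
    return url.replace("\\u0025", "%")


def decoded_datetime(value: str) -> str:
    """Additionally undo the percent-encoding to get the plain datetime."""
    return unquote(fix_paging_url(value))
```

When the response is parsed with a real JSON parser, \u0025 is already decoded to % during parsing, so the manual replacement should only be needed when the URL is extracted from the raw response text.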