Facebook Graph API Batch - Iterative paging

I'm trying to get all posts from a number of public Facebook pages. The following, combined with some paging, does this, but it's slow, since separate calls are required to traverse each page of posts (and then comments, and then replies, if those are included).
curl -X GET "https://graph.facebook.com/FacebookDevelopers/feed?fields=id&access_token="$TOKEN
From what I can find in the Facebook API documentation, it seems like the batch endpoint is better for this, but I'm having trouble getting all pages of posts.
The following takes the initial request and traverses forward one page (i.e. returning a total of 2 pages), but I can't get more than 2 pages.
curl \
-F 'access_token='$TOKEN \
-F 'batch=[{ "method":"GET","name":"getfeed","omit_response_on_success":false,"relative_url":"FacebookDevelopers/feed?fields=id"},
{ "method":"GET","omit_response_on_success":false,"relative_url":"FacebookDevelopers/feed?fields=id&after={result=getfeed:$.paging.cursors.after}"}]' \
https://graph.facebook.com
Is there a way to get all posts from a public page using the batch endpoint?

For each paging request you will need the cursor parameter (for the after call). You can "nest" the batch calls only one level deep, so you can indeed only request two pages at the same time.
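Since batch nesting only buys you one extra page, the practical way to get all posts is to follow paging.next in an ordinary loop, one request per page. A minimal shell sketch, assuming jq is installed and $TOKEN is set as in the question:
URL="https://graph.facebook.com/FacebookDevelopers/feed?fields=id&access_token=$TOKEN"
while [ -n "$URL" ] && [ "$URL" != "null" ]; do
  PAGE=$(curl -s "$URL")
  echo "$PAGE" | jq -r '.data[].id'           # collect the post ids on this page
  URL=$(echo "$PAGE" | jq -r '.paging.next')  # becomes "null" after the last page
done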
Besides the technical limitation, please also be aware that scraping is not allowed and there are request throttling limits.

How to get count of followers through Github API?

I want to get the count of my followers each time the page loads.
The endpoint I found is api.github.com/users/tim-hub/followers, which returns all the followers' data (with pagination).
The problems (with the RESTful API)
I do not need the followers' details
Pagination is about 30 items per page, which makes it hard to count all of them
I am thinking about fetching a page, e.g. page 4 (?page=4), and doing a simple calculation: 30*(4-1) + [count of page 4]. But that means multiple calls, because the follower count keeps changing: if it increases or decreases, I have to call page 5 or page 3, and loop until I find the last page.
Why not GraphQL
The GraphQL API seems like it could be used to get the count. The problem is that I want to call this from the front end, and the GraphQL way requires authentication; I do not want to share my personal token with everyone.
Similarly to "How to find my organization Id over github?", you can use the GitHub user API to isolate the followers number.
curl -H "Accept: application/json" https://api.github.com/users/aUser | jq ".followers"
In my case:
curl -H "Accept: application/json" https://api.github.com/users/VonC| jq ".followers"
179
For a GraphQL-based solution, see "Github API - Find number of followers for all my followers".
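For reference, the GraphQL approach comes down to a single POST; a minimal sketch (it still needs a token, which is the front-end concern raised above; the tim-hub login is taken from the question):
curl -H "Authorization: bearer YOUR_TOKEN" \
     -X POST \
     -d '{"query":"{ user(login: \"tim-hub\") { followers { totalCount } } }"}' \
     https://api.github.com/graphql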

Searching for playlists on soundcloud api returns 500 error

I'm trying to use the soundcloud api to find playlists with a specific tag. My first step in doing so is pinging the soundcloud api for all playlists. I run the following in my command line to do so (client id replaced for privacy):
curl 'https://api.soundcloud.com/playlists.json?client_id=MY_CLIENT_ID'
This always returns a 500 internal server error, whether I ask for it in JSON or plain XML:
curl 'https://api.soundcloud.com/playlists?client_id=MY_CLIENT_ID'
However, when I make the analogous request for tracks, it works fine:
curl 'https://api.soundcloud.com/tracks?client_id=MY_CLIENT_ID'
curl 'https://api.soundcloud.com/tracks.json?client_id=MY_CLIENT_ID'
What gives? Is this an error on my side or their side?
I don't think you can grab all playlists. There must be millions of them, and that is a lot to return.
If you look here at the Soundcloud API Docs, they add a playlist id to the URL.
$ curl "http://api.soundcloud.com/playlists/405726.json?client_id=YOUR_CLIENT_ID"
Hopefully this helps!

Even though the Facebook API returns 'code' 200 for some 'nodes', accessing the webpage returns a 404

I'm developing a web app that uses FB data for some FB posts. I have a bunch of post ids and am fetching the data related to them using batched requests. Then I show a summary of each post (number of comments, shares, likes) and a link to the actual FB page (https://www.facebook.com/<node_id>). But clicking on the link shows a 404 page on FB!
For example, the node_id '69983322463_10152179775342464' will return data in the Graph API Explorer. But when you access https://www.facebook.com/69983322463_10152179775342464 it returns a 404!
In case my question is not clear:
GET https://graph.facebook.com/69983322463_10152179775342464?access_token={a valid access token} returns data.
But GET https://www.facebook.com/69983322463_10152179775342464 (with or without an access_token param) returns a 404
Is there some field in the API response that signifies that the page does not exist anymore?
Thanks,
mano
This is because not every post is public. Only publicly available posts can be accessed directly.
For the rest, you need a valid access token to GET their details. When you tried the post id in the Graph API Explorer it showed the result because an access token was applied.
So you simply use a valid access token, for example an app access token (app_id|app_secret), which never expires, and make the GET request.
E.g.: GET /69983322463_10152179775342464?access_token={app-access-token}
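As a concrete sketch (APP_ID|APP_SECRET stands in for your own app's credentials):
curl "https://graph.facebook.com/69983322463_10152179775342464?access_token=APP_ID|APP_SECRET"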

Facebook loads HTTPS-hosted iframe apps via HTTP POST (S3 & CloudFront errors)

I have been trying to write a bucket policy that will allow X-HTTP-Method-Override, because my research shows that Facebook loads HTTPS-hosted iframe apps via HTTP POST instead of HTTP GET, which causes S3 and CloudFront errors.
Can anyone please help me with this problem?
This is what's returned from S3 if I served my Facebook app directly from S3:
<?xml version="1.0" encoding="UTF-8" ?>
<Error>
<Code>MethodNotAllowed</Code>
<Message>The specified method is not allowed against this resource.</Message>
<ResourceType>OBJECT</ResourceType>
<Method>POST</Method>
<RequestId>B21565687724CCFE</RequestId>
<HostId>HjDgfjr4ktVxqlIBeIlvXT3UzBNuPg8b+WbhtNHOvNg3cDNpfLH5GIlyUUpJKZzA</HostId>
</Error>
This is what's returned from CloudFront if I served my Facebook app from CloudFront with S3 as the origin:
ERROR
The request could not be satisfied.
Generated by cloudfront (CloudFront)
I think the solution should be to write a bucket policy that makes use of X-HTTP-Method-Override... Probably I am wrong though. A solution to this problem would be highly appreciated.
After trying many different ways to get this to work, it turns out that it simply is not possible to make the POST to static content work on S3 as things stand. Even if you allow POST through Cloudfront, enable CORS, change the bucket policy so that the Cloudfront origin identity can GET/PUT etc. it will still throw an error.
As an aside, S3 is not the only thing that balks at responding to such a POST request to static content. If you configure nginx as an origin for a Facebook iframe you will get the same 405 error, though you can work around that problem in a couple of ways (essentially rewriting it to a GET under the covers). You can also change the page (though still static) to be a dynamic extension (.aspx or .php) to work around the issue with nginx.
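For the nginx case, one such under-the-covers rewrite is the error_page trick; a minimal sketch, assuming a plain static document root:
# hypothetical nginx location: answer POSTs for static files by
# converting the 405 (method not allowed) into a 200 for the same URI
location / {
    root /var/www/app;
    error_page 405 =200 $uri;
}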
You can host all your other content on S3 of course, and just move the page that you POST to onto a different origin. With a decent cache time you should see minimal traffic, but it will mean keeping your content in two places. What I ended up doing was:
Creating EC2 instances in an autoscaling group (just in case) to serve the content
They use a cron job to sync the content from S3 every 5 minutes (a possible crontab line is sketched below)
No change in workflow was required (still just upload content to S3)
It's not ideal, nor is it particularly efficient, but hopefully it will save others a lot of fruitless testing trying to get this to work on S3 alone.
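The sync step in that setup can be as small as one crontab line; a sketch, with the bucket name and web root as assumptions:
# hypothetical crontab entry: mirror the bucket into the web root every 5 minutes
*/5 * * * * aws s3 sync s3://my-app-bucket /var/www/app --delete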
You can set your Cloudfront distribution to allow POST methods.
If you go into your dashboard and edit the Behavior for the distribution, select Allowed HTTP Methods: GET, HEAD, PUT, POST, PATCH, DELETE, OPTIONS.
This allows the POST from Facebook to go through to your origin.
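If you prefer scripting it over clicking through the dashboard, the same change can be made with the AWS CLI; a rough sketch (DIST_ID and ETAG are placeholders):
aws cloudfront get-distribution-config --id DIST_ID > dist.json
# extract the DistributionConfig object from dist.json, add POST and the other
# methods to DefaultCacheBehavior.AllowedMethods, then push it back together
# with the ETag value returned by the first call:
aws cloudfront update-distribution --id DIST_ID --if-match ETAG \
    --distribution-config file://dist-config.json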
I was fighting with S3 and CloudFront for the last couple of days, and I can confirm that no bucket policy can redirect POST calls from Facebook to S3 static (JS-enriched) content.
The only solution seems to be the one Adam Comerford mentioned in this thread:
Having a light application which receives the Facebook calls and then fetches the content from S3 or CloudFront.
If anyone has any other solution or idea it will be appreciated.
You can't change POST to GET - that's the way Facebook loads the app page, because it also sends data about the current user as the POST body (see signed_request for more details). I would suggest you look into fixing your app to make sure it properly responds to POST requests.
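To test what your app must handle, you can simulate Facebook's canvas load with a plain form-encoded POST; a sketch (the URL is a placeholder and the signed_request value is elided):
curl -X POST \
     -d 'signed_request=ENCODED_PAYLOAD' \
     'https://your-app.example.com/canvas/'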

Is there a way to increase the API Rate limit or to bypass it altogether for GitHub?

I am developing a web application which needs to send a lot of HTTP requests to GitHub. After a certain number of successful requests, I get HTTP 403: Forbidden with the message "API Rate Limit Exceeded".
Is there a way to increase the API Rate limit or to bypass it altogether for GitHub?
This is a relative solution, because the limit is still 5000 API calls per hour, or ~80 calls per minute, which is really not that much.
I am writing a tool to compare over 350 repositories in an organization and to find their correlations. OK, the tool uses Python for git/GitHub access, but I think that is not the relevant point here.
After some initial success, I found out that the capabilities of the GitHub API are too limited in number of calls and also in bandwidth, if you really want to ask the repos a lot of deep questions.
Therefore, I switched the concept, using a different approach: instead of doing everything with the GitHub API, I wrote a GitHub mirror script that is able to mirror all of those repos in less than 15 minutes using my parallel Python script via pygit2. Then, I wrote everything possible using the local repositories and pygit2. This solution became faster by a factor of 100 or more, because there was neither an API nor a bandwidth bottleneck.
Of course, this did cost extra effort, because the pygit2 API is quite a bit different from github3.py, which I preferred for the GitHub solution part.
And that is actually my conclusion/advice. The most efficient way to work with lots of Git data is:
clone all repos you are interested in, locally
write everything possible using pygit2, locally
write other things, like public/private info, pull requests, access to wiki pages, issues etc. using the github3.py API or what you prefer
This way, you can maximize your throughput, while your limitation is now the quality of your program (which is also non-trivial).
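A rough shell equivalent of that mirroring step (sequential and using plain git rather than the parallel pygit2 script; ORG is a placeholder, and only the first 100 repos are listed, so real use would need to follow the API's pagination):
ORG=my-org
curl -s "https://api.github.com/orgs/$ORG/repos?per_page=100" \
  | jq -r '.[].clone_url' \
  | while read -r url; do
      name=$(basename "$url" .git)
      if [ -d "$name.git" ]; then
        git -C "$name.git" fetch --all --prune   # refresh an existing mirror
      else
        git clone --mirror "$url"                # first-time mirror clone
      fi
    done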
In order to increase the API rate limit you might
authenticate yourself at GitHub via your OAuth2 token, or
use a key/secret to increase the unauthenticated rate limit.
There are multiple ways of doing this:
Basic Auth + OAuth2Token
curl -u <token>:x-oauth-basic https://api.github.com/user
Set and Send OAuth2Token in Header
curl -H "Authorization: token OAUTH-TOKEN" https://api.github.com
Set and Send OAuth2Token as URL Parameter
curl https://api.github.com/?access_token=OAUTH-TOKEN
Set Key & Secret for Server-2-Server communication
curl 'https://api.github.com/users/whatever?client_id=xxxx&client_secret=yyyy'
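Whichever variant you pick, you can verify which limit is in effect via the rate_limit endpoint (checking it does not count against your quota):
curl -H "Authorization: token OAUTH-TOKEN" https://api.github.com/rate_limit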
Just create a new "Personal Access Token" here and use the simple fetch method below (if you are coding in JS, of course :D), replacing YOUR_ACCESS_TOKEN with your token.
The best way to test it is to use Postman
async function fetchGH() {
  const response = await fetch('https://api.github.com/repos/facebook/react/issues', {
    headers: {
      'Authorization': 'token YOUR_ACCESS_TOKEN',
    }
  })
  return await response.json()
}
Solution: Add authentication details or the client ID and secret (generated when you register your application on GitHub).
Found details here and here
"If you need to make unauthenticated calls but need to use a higher rate limit associated with your OAuth application, you can send over your client ID and secret in the query string"
While it seems there's still no way to increase the rate limit itself, GitHub now has a GraphQL API that can potentially lower your API call count.
Keep in mind that GitHub calculates the rate limit differently between the GraphQL and REST APIs. The GraphQL API rate limit is 5000 points per hour (not 5000 calls per hour! So one GraphQL call can cost you more than 1 point). You can read more here: https://docs.github.com/en/graphql/overview/resource-limitations (TL;DR: the more resources a query fetches, the more points it costs).
For example, if you have a use case similar to Christian's answer above, instead of making multiple calls to multiple endpoints:
GET /repos/{owner}/{repoA}
GET /repos/{owner}/{repoB}
You can just do one GraphQL call to https://api.github.com/graphql with this query:
query {
  repoA: repository(owner: "owner", name: "repoA") {
    ...
  }
  repoB: repository(owner: "owner", name: "repoB") {
    ...
  }
}
Depending on what you query for each repository, you can add even more repositories to a single call and still spend only 1 point per call.
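Sending such a query is itself a single POST to the endpoint mentioned above; a minimal sketch, with a placeholder token and the elided fields replaced by a single name field:
curl -H "Authorization: bearer YOUR_TOKEN" \
     -X POST \
     -d '{"query":"query { repoA: repository(owner: \"owner\", name: \"repoA\") { name } repoB: repository(owner: \"owner\", name: \"repoB\") { name } }"}' \
     https://api.github.com/graphql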
I observed this error during multibranch pipeline configuration in Jenkins.
I had selected the source as GitHub. After changing it to Git and passing the GitHub repo details, it worked. (Have the git executable path configured in Jenkins and a credential set for authentication to GitHub.)
May I suggest "become an archiver" https://github.com/github/site-policy/issues/56
BTW even "non-archivers" have access to public data...
All the tools are on GitHub.