Why did Facebook recently stop pushing updates to our webhook application?

I've inherited a Facebook webhook application written in Node that tracks posts for several Facebook pages associated with our organization. On 2017-07-02 (about 3 weeks ago), it appears Facebook stopped pushing webhook updates to this application. The application had been running successfully for a couple of years prior to this.
These are the last entries I see in our application's access log:
724661:173.252.105.118 - - [02/Jul/2017:16:40:23 -0700] "POST /callback/facebook HTTP/1.1" 200 5 "-" "Webhooks/1.0"
724662:66.220.145.151 - - [02/Jul/2017:16:55:29 -0700] "POST /callback/facebook HTTP/1.1" 200 5 "-" "Webhooks/1.0"
724663:31.13.114.11 - - [02/Jul/2017:16:55:30 -0700] "POST /callback/facebook HTTP/1.1" 200 5 "-" "Webhooks/1.0"
I have confirmed that our application is still functioning by sending a manual request with curl, which the application processed successfully.
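A minimal sketch of that kind of check (Node 18+ with the built-in fetch; the payload shape is a hypothetical example and the host is a placeholder, not the exact curl command that was run):

// Replay a webhook-style POST against the callback endpoint and check for a 200.
// The payload below is a hypothetical page-update shape; adjust it to whatever
// the handler actually expects. The host is a placeholder.
const payload = {
  object: 'page',
  entry: [{ id: '<PAGE_ID>', time: Math.floor(Date.now() / 1000), changes: [] }],
};

fetch('https://example.org/callback/facebook', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
})
  .then((res) => console.log('status:', res.status)) // expect 200 if the app still processes requests
  .catch(console.error);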
I see here that Facebook scheduled availability of v2.3 of its API to end on July 8:
https://developers.facebook.com/docs/apps/changelog
In the upgrade guide for v2.3 to v2.4, it notes:
There are a number of Page permissions changes between v2.3 and v2.4 that your apps will need to account for. Notably, a Page access token is now required to interface with /v2.4/{page_id}/promotable_posts, /v2.4/{page_id}/offers, and /v2.4/{page_id}/milestones.
Is this why Facebook has stopped pushing updates to our webhook endpoint? If so, where can I find more information on using Page access tokens with webhooks?

Facebook will stop sending updates to your webhook callback URL after a while if your app does not respond with a 200 status code quickly enough. In that case, you have to set up your callback URL again.
This is - somewhat - documented under
https://developers.facebook.com/docs/messenger-platform/webhook-reference#response,
Your webhook callback should always return a 200 OK HTTP response when invoked by Facebook. Failing to do so may cause your webhook to be unsubscribed by the Messenger Platform.
It is extremely important to return a 200 OK HTTP as fast as possible. Facebook will wait for a 200 before sending you the next message. In high volume bots, a delay in returning a 200 can cause significant delays in Facebook delivering messages to your webhook.
It does not explicitly mention this here, but I am pretty sure failing to respond with a 200 in a certain amount of time is also considered a failure.
Even less real-time tools such as the Facebook Scraper (which grabs the Open Graph metadata from links shared by users) have a timeout of 10 seconds, so Facebook's patience for webhooks is unlikely to be any longer; if anything it is probably much shorter for Messenger Platform webhooks in particular, since those response times directly affect the user experience.
Edit: I realize you actually asked about Graph API webhooks, not the Messenger Platform. But it is similar for those - https://developers.facebook.com/docs/graph-api/webhooks#callback
If any update sent to your server fails, we will retry immediately, then try a few more times with decreasing frequency over the next 24 hours. Your server should handle deduplication in these cases. Updates unaccepted for 24 hours will be dropped.
Response
Your endpoint should return a 200 OK HTTPS response for all update notifications.
And from discussions in the FB developers group I know that repeated timeouts also cause un-subscription from the webhooks in this case.
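In practice, a common way to stay within that tolerance is to acknowledge the delivery immediately and do the actual work outside the request/response cycle. A minimal Express sketch (the route path matches the access log above; handleUpdate is a placeholder for the app's real processing):

const express = require('express');
const app = express();

app.use(express.json());

// Respond 200 right away so Facebook sees a fast acknowledgement,
// then process the update asynchronously.
app.post('/callback/facebook', (req, res) => {
  res.sendStatus(200);

  setImmediate(() => {
    try {
      handleUpdate(req.body); // placeholder for the real processing
    } catch (err) {
      console.error('webhook processing failed:', err);
    }
  });
});

function handleUpdate(update) {
  // ... persist or queue the update here
}

app.listen(3000);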

Related

Facebook Graph API Rate limit for development

I have a Facebook Application in development mode that shows as having 3 daily_active_users. From my understanding of the Graph API documentation, I can make 200 * daily active users = total requests per hour, so I should be able to make 600 requests per hour.
I then create a Test User and try to create a page via the accounts endpoint:
https://developers.facebook.com/docs/graph-api/reference/user/accounts/#Creating
This works fine for a single request. I then tried to script it, with a 2-second delay between requests, and attempted to create 100 pages. After about 10 requests, I got the following response from the Facebook API:
{"error":{"message":"We limit how often you can post, comment or do other things in a given amount of time in order to help protect the community from spam. You can try again later. Learn More","type":"OAuthException","code":368,"error_data":{"sentry_block_data":"...","help_center_id":0},"error_subcode":1390008,"error_user_msg":"","fbtrace_id":"..."}}.
It states that my request is being rejected due to hitting some kind of limit, but what is this limit? I can't find it in the documentation anywhere. Is there a limit to the number of pages I can create with a test user per hour/day?
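For reference, the scripted loop described above would look roughly like this (Node 18+; the access token, test-user id, and page fields are placeholders, and stopping on error code 368 is just one way to react to the block):

const ACCESS_TOKEN = '<test-user-access-token>'; // placeholder
const TEST_USER_ID = '<test-user-id>';           // placeholder

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function createPages(count) {
  for (let i = 0; i < count; i++) {
    const params = new URLSearchParams({
      name: `Test Page ${i}`,      // illustrative page fields
      access_token: ACCESS_TOKEN,
    });

    const res = await fetch(`https://graph.facebook.com/${TEST_USER_ID}/accounts`, {
      method: 'POST',
      body: params,
    });
    const body = await res.json();

    if (body.error) {
      // Error code 368 is the spam-protection block quoted above;
      // once it appears there is little point in continuing.
      console.error(`stopped after ${i} pages:`, body.error.message);
      return;
    }

    await sleep(2000); // the 2-second delay between requests
  }
}

createPages(100).catch(console.error);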

Suddenly, pages_messaging_subscriptions permission is required

I have a simple message bot that was set up according to the Messenger Platform guide. It has been working fine for the last few months, with about half a dozen messages sent a day. I have not touched it at all, but suddenly, sending a message, i.e. calling https://graph.facebook.com/v2.6/me/messages?access_token=..., returns:
{"message":"(#230) Requires pages_messaging_subscriptions permission to manage the object","type":"OAuthException","code":230,"fbtrace_id":"DVs...."}
This was out of the blue. Things were working fine, I did not even log on to Facebook during this time, and I had not even looked at my webhook callback site. But at some point from Aug 17 onward, this exception was returned for every attempted message send.
Has something changed? In any case, I could not find a subscription field named pages_messaging_subscriptions on the Webhooks Page Subscription page.
What do I need to get my message bot to work again?
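For context, the call that now fails is the standard Send API request, roughly as below (Node 18+; the page access token and recipient id are placeholders):

const PAGE_ACCESS_TOKEN = '<page-access-token>'; // placeholder

fetch(`https://graph.facebook.com/v2.6/me/messages?access_token=${PAGE_ACCESS_TOKEN}`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    recipient: { id: '<PSID>' },             // placeholder recipient id
    message: { text: 'Hello from the bot' },
  }),
})
  .then((res) => res.json())
  .then(console.log) // returns the (#230) OAuthException until the permission is granted
  .catch(console.error);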
On August 15, Facebook updated its Messenger Platform policies.
See the official blog post.
Now, to send a message more than a day after the user's last activity, it is necessary to request an additional permission in the application settings.

The Google Admin SDK API errors out with no explanations

Regarding this API: https://developers.google.com/admin-sdk/email-audit/#accessing_account_information
I have been using the Admin SDK to retrieve login history for users in our Google Apps for Business setup. When I request users one at a time, a request sometimes takes a few hours to process (during which its state is PENDING). However, once those few hours pass, I still get the login history that I need.
The problem continues as I begin requesting more users. We have around 750 users, and of the ~750 requests I made, 725 gave me an error after waiting ONE WEEK for my requests to be processed. Even worse, the ones that did not error out are still pending! Here is the response I get when I check the status of a request that errored out:
{'status': 'ERROR', 'adminEmailAddress': '***@etsy.com', 'requestDate': '***', 'requestId': '***', 'userEmailAddress': '***@etsy.com'}
This has got to be the flakiest and most unreliable API I have ever been unfortunate enough to work with. Requests can take anywhere from an hour to over a week to process, with no indicator of success in the meantime. Errors can also happen for no apparent reason, with no message or explanation as to why.
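A rough sketch of the polling loop behind those status checks (fetchRequestStatus is a placeholder for however the Email Audit API is already being queried; only the loop itself is shown, and any status values other than PENDING and ERROR are left unspecified):

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchRequestStatus(requestId) is a placeholder for the existing call that
// returns objects like the one shown above ({ status, requestId, ... }).
async function waitForRequest(fetchRequestStatus, requestId, intervalMs = 60 * 60 * 1000) {
  for (;;) {
    const info = await fetchRequestStatus(requestId);

    if (info.status === 'ERROR') {
      // As described above, the ERROR response carries no message or explanation.
      throw new Error(`request ${requestId} failed with no further details`);
    }
    if (info.status !== 'PENDING') {
      return info; // finished (whatever non-pending, non-error state the API reports)
    }
    await sleep(intervalMs); // still pending; poll again later
  }
}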
It looks like this issue has been resolved by the Google engineers. Try running the calls again; they shouldn't stay pending for longer than the "normal" expected time. I tried earlier, and I was able to export login info for my users.

Facebook Crawler Bot Crashing Site

Did Facebook just implement some web crawler? My website has crashed a couple of times over the past few days, severely overloaded by IPs that I've traced back to Facebook.
I have tried googling around but can't find any definitive resource on controlling Facebook's crawler bot via robots.txt. There is a reference to adding the following:
User-agent: facebookexternalhit/1.1
Crawl-delay: 5
User-agent: facebookexternalhit/1.0
Crawl-delay: 5
User-agent: facebookexternalhit/*
Crawl-delay: 5
But I can't find any specific reference on whether the Facebook bot respects robots.txt. According to older sources, Facebook "does not crawl your site". But this is definitely false, as my server logs showed it crawling my site from a dozen-plus IPs in the 69.171.237.0/24 and 69.171.229.115/24 ranges, at a rate of many pages each second.
And I can't find any literature on this. I suspect it is something new that FB just implemented over the past few days, since my server never crashed previously.
Can someone please advise?
As discussed in this similar question on Facebook and Crawl-delay, Facebook does not consider itself a bot, and doesn't even request your robots.txt, much less pay attention to its contents.
You can implement your own rate-limiting code as shown in the linked question. The idea is simply to return HTTP code 503 when your server is over capacity or is being inundated by a particular user-agent.
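A minimal Express sketch of that idea (the user-agent string is the facebookexternalhit one seen in the logs; the "over capacity" check here is just a naive in-process counter with an arbitrary threshold, not a production-grade limiter):

const express = require('express');
const app = express();

// Naive per-minute counter for requests from Facebook's crawler. In production
// you would back this with something sturdier (nginx, redis, a real limiter).
let crawlerHits = 0;
setInterval(() => { crawlerHits = 0; }, 60 * 1000);
const MAX_CRAWLER_HITS_PER_MINUTE = 60; // arbitrary threshold for this sketch

app.use((req, res, next) => {
  const ua = req.get('User-Agent') || '';
  if (ua.includes('facebookexternalhit')) {
    crawlerHits += 1;
    if (crawlerHits > MAX_CRAWLER_HITS_PER_MINUTE) {
      res.set('Retry-After', '60'); // ask the crawler to back off
      return res.sendStatus(503);
    }
  }
  next();
});

// ... the site's normal routes go here

app.listen(3000);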
It appears those working for huge tech companies don't understand "improve your caching" is something small companies don't have budgets to handle. We are focused on serving our customers that actually pay money, and don't have time to fend off rampaging web bots from "friendly" companies.
We saw the same behaviour at about the same time (mid October) - floods of requests from Facebook that caused queued requests and slowness across the system. To begin with it was every 90 minutes; over a few days this increased in frequency and became randomly distributed.
The requests appeared not to respect robots.txt, so we were forced to think of a different solution. In the end we set up nginx to forward all requests with a Facebook user-agent to a dedicated pair of backend servers. If we had been using nginx > v0.9.6 we could have done a nice regex for this, but we weren't, so we used a mapping along the lines of:
map $http_user_agent $fb_backend_http {
    "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)"
        127.0.0.1:80;
}
This has worked nicely for us; during the couple of weeks that we were getting hammered this partitioning of requests kept the heavy traffic away from the rest of the system.
It seems to have largely died down for us now - we're just seeing intermittent spikes.
As to why this happened, I'm still not sure - there seems to have been a similar incident in April that was attributed to a bug
http://developers.facebook.com/bugs/409818929057013/
but I'm not aware of anything similar more recently.
Whatever Facebook has introduced, you definitely need to fix your server, as it is clearly possible to crash it with external requests.
Also, here is the first Google hit for facebookexternalhit: http://www.facebook.com/externalhit_uatext.php

How to Avoid Posting a Duplicate when Publishing to Facebook?

With the Graph API, I publish a story by POSTing to the /me/feed connection. I get back a success or an error result from Facebook. So far so good. Once in a while, the API takes a long time and the connection times out. In that case, I don't know for sure whether the request succeeded or failed (i.e. maybe the request never reached Facebook, or maybe it succeeded and the result never made it back to me). How do you handle this situation?
More details:
I publish a lot of posts to Facebook and Twitter, so the timeout situation happens often. With Twitter, the solution is easy. If the request times out the first time, I simply try again. Twitter detects duplicates, so if the post was successfully published the first time, then I'll get a "duplicate status" error on the second request and I know that I don't need to retry any more.
But Facebook doesn't detect duplicates, so if I retry the publish request, I risk having two copies of the post published to the user's wall, which is not nice. On the other hand, if I don't retry, I risk having the post not published at all. Thoughts?
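The Twitter-side handling described above can be sketched roughly like this (the publish and duplicate-detection functions are placeholders; the point is only that a duplicate error on the retry can be treated as success):

// Retry-on-timeout pattern: if the retry fails with a "duplicate status" error,
// the first attempt evidently went through, so treat it as success.
// publish() and isDuplicateError() are placeholders for the real client code.
async function publishWithRetry(publish, isDuplicateError) {
  try {
    return await publish();
  } catch (firstError) {
    if (!isTimeout(firstError)) throw firstError;
    try {
      return await publish(); // retry once after a timeout
    } catch (secondError) {
      if (isDuplicateError(secondError)) {
        return { alreadyPublished: true }; // the first attempt succeeded
      }
      throw secondError;
    }
  }
}

function isTimeout(err) {
  return err && err.code === 'ETIMEDOUT'; // placeholder; depends on the HTTP client
}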
I get back a success or an error result from Facebook.
Hmmm. When I post to the Graph API, I get back an error or the id of the post. I never see any success message. What SDK are you using around the API?
Once in a while, the API takes a long time and the connection times out.
Usually when things are running slowly, it's due to the channelUrl not being specified. See https://developers.facebook.com/docs/reference/javascript/
It is important for the channel file to be cached for as long as possible. When serving this file, you must send valid Expires headers with a long expiration period. This will ensure the channel file is cached by the browser which is important for a smooth user experience. Without proper caching, cross domain communication will become very slow and users will suffer a severely degraded experience.
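A sketch of serving such a channel file from Node with long-lived caching headers (the one-year lifetime and the /channel.html path are illustrative choices; the script tag is the one the JS SDK documentation of that era recommended for the channel file):

const express = require('express');
const app = express();

const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365; // illustrative lifetime

app.get('/channel.html', (req, res) => {
  res.set({
    'Cache-Control': `public, max-age=${ONE_YEAR_SECONDS}`,
    Expires: new Date(Date.now() + ONE_YEAR_SECONDS * 1000).toUTCString(),
    'Content-Type': 'text/html',
  });
  res.send('<script src="//connect.facebook.net/en_US/all.js"></script>');
});

app.listen(3000);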