Facebook Crawler Bot Crashing Site

Did Facebook just implement some web crawler? My website has crashed a couple of times over the past few days, severely overloaded by IPs that I've traced back to Facebook.
I have tried googling around but can't find any definitive resource on controlling Facebook's crawler bot via robots.txt. There is a reference that suggests adding the following:
User-agent: facebookexternalhit/1.1
Crawl-delay: 5

User-agent: facebookexternalhit/1.0
Crawl-delay: 5

User-agent: facebookexternalhit/*
Crawl-delay: 5
But I can't find any specific reference on whether the Facebook bot respects robots.txt. According to older sources, Facebook "does not crawl your site". But this is definitely false, as my server logs showed it crawling my site from a dozen-plus IPs in the range of 69.171.237.0/24 and 69.171.229.115/24, at a rate of many pages each second.
I can't find any literature on this either. I suspect it is something new that FB implemented over the past few days, since my server never crashed before.
Can someone please advise?

As discussed in this similar question on Facebook and Crawl-delay, Facebook does not consider itself a bot, and doesn't even request your robots.txt, much less pay attention to its contents.
You can implement your own rate limiting code as shown in the similar question link. The idea is to simply return HTTP code 503 when your server is over capacity, or is being inundated by a particular user agent.
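As a rough illustration of that approach (not from the linked question), here is a minimal Java servlet-filter sketch that counts requests per user agent over a short window and answers 503 once a threshold is passed; the threshold, window, and user-agent prefix are placeholders to tune for your own traffic:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: crude per-user-agent throttling. Once an agent matching the
// crawler prefix exceeds MAX_PER_WINDOW requests in a WINDOW_MILLIS window,
// answer 503 until the window rolls over. Both constants are placeholders.
public class CrawlerThrottleFilter implements Filter {
    private static final int MAX_PER_WINDOW = 50;      // placeholder threshold
    private static final long WINDOW_MILLIS = 10_000L; // placeholder window

    private final ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private volatile long windowStart = System.currentTimeMillis();

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        long now = System.currentTimeMillis();
        if (now - windowStart > WINDOW_MILLIS) { // roll the window over
            counts.clear();
            windowStart = now;
        }
        String ua = ((HttpServletRequest) req).getHeader("User-Agent");
        if (ua != null && ua.startsWith("facebookexternalhit")) {
            int seen = counts.computeIfAbsent(ua, k -> new AtomicInteger()).incrementAndGet();
            if (seen > MAX_PER_WINDOW) {
                ((HttpServletResponse) res).sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                return; // tell the crawler to come back later
            }
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}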
It appears those working for huge tech companies don't understand that "improve your caching" is something small companies don't have the budget to handle. We are focused on serving the customers who actually pay money, and don't have time to fend off rampaging web bots from "friendly" companies.

We saw the same behaviour at about the same time (mid October) - floods of requests from Facebook that caused queued requests and slowness across the system. To begin with it was every 90 minutes; over a few days this increased in frequency and became randomly distributed.
The requests appeared not to respect robots.txt, so we were forced to think of a different solution. In the end we set up nginx to forward all requests with a Facebook user agent to a dedicated pair of backend servers. If we had been using nginx > v0.9.6 we could have done a nice regex for this, but we weren't, so we used a mapping along the lines of:
map $http_user_agent $fb_backend_http {
    # route the Facebook crawler's exact user agent string to the dedicated
    # backend; anything else maps to an empty string and takes the normal path
    "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)" 127.0.0.1:80;
}
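(In the matching server block, that variable can then be used as the proxy_pass target when it is non-empty; user agents that don't match map to an empty string and stay on the normal backend.)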
This has worked nicely for us; during the couple of weeks that we were getting hammered this partitioning of requests kept the heavy traffic away from the rest of the system.
It seems to have largely died down for us now - we're just seeing intermittent spikes.
As to why this happened, I'm still not sure - there seems to have been a similar incident in April that was attributed to a bug
http://developers.facebook.com/bugs/409818929057013/
but I'm not aware of anything similar more recently.

Whatever Facebook invented, you definitely need to fix your server, as it should not be possible to crash it with external requests.
Also, here's the first Google hit for facebookexternalhit, which explains the user agent: http://www.facebook.com/externalhit_uatext.php

Related

Limit number of Facebook requests allowed?

My previous Facebook developer account was blocked, I guess because of making too many requests (no reason was given by them, and no answer after trying to contact them).
So before running into the same issue with a new account, I would like to know if someone has relevant info or experience about how many requests per second or per hour can safely be made to Facebook's API before being marked as abusing the service and getting banned.
Thanks.
There is an article about rate limits in the official docs: https://developers.facebook.com/docs/graph-api/advanced/rate-limiting
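Those docs describe per-app limits; on each call the Graph API also reports how much of the budget has been used in an X-App-Usage response header. A minimal Java sketch of reading it with plain HttpURLConnection (the access token is a placeholder):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: check how close the app is to its Graph API budget before firing
// more calls. YOUR_ACCESS_TOKEN is a placeholder. X-App-Usage is a small JSON
// object of usage percentages; values near 100 mean it is time to back off.
public class AppUsageCheck {
    public static void main(String[] args) throws IOException {
        URL url = new URL("https://graph.facebook.com/me?access_token=YOUR_ACCESS_TOKEN");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        System.out.println("HTTP " + conn.getResponseCode());
        // e.g. {"call_count":28,"total_time":25,"total_cputime":5}
        System.out.println("X-App-Usage: " + conn.getHeaderField("X-App-Usage"));

        conn.disconnect();
    }
}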

Facebook app blocked for posting too fast. What are the limits?

We (a local hackerspace) have a Tumblr blog and wanted to make ourselves a Facebook page. Before going live we wanted to import all our Tumblr content to Facebook so our fans on Facebook can browse it there as well. For this I have made an app that reads all the posts from our Tumblr blog and publishes them to our new Facebook page (backdating those posts as well). Here's my problem: after the app does about 130 re-posts (~260 operations: publish + backdate) I start getting an error:
Received Facebook error response of type OAuthException: It looks like you were misusing this feature by going too fast. You’ve been blocked from using it.
Learn more about blocks in the Help Center. (code 368, subcode 1390008)
The block is gone the next day, but after a similar amount of operations it's back. A couple of hours later, when the block was gone again, I introduced 6-second delays between operations, but that didn't help and after 19 re-posts I was blocked again. Some facts:
I am publishing posts to the feed of a (yet) unpublished page that I am the (only) owner of.
The app is a standalone Java application and uses restfb to work with Facebook.
The line that is causing the error: facebookClient.publish("me/feed", FacebookType.class, params.toArray(new Parameter[0]));
All publish operations contain a link, mostly to the respective posts on our Tumblr. Some contain a message, caption, or name (depending on post type).
I need to re-post ~900 posts from Tumblr; I have done ~250 so far. When done, I will likely put it on a server, scheduled, to keep syncing single new posts.
This app is not meant to be used publicly; it is rather a personal utility (but the code will be posted to GitHub, should anybody need it).
This is my first experience with the Facebook API and I wasn't able to find a place where I could officially address them with this question. I could proceed by doing 100 posts/day, but I'm afraid I will eventually get banned for good, even though I don't feel I'm doing anything wrong.
I haven't put any more code here, as the code itself does not seem to be the problem, but rather the rate at which it is executed.
So, should I proceed with 100 posts/day and hope I won't be banned, or is there another "correct" way of dealing with this?
Thanks in advance!
I'm answering a bit late, but I just had this problem too, so I did some research: it seems that besides the rate limits shown in the Facebook docs, there's also a much more limited and opaque rate for POST requests, meant to limit spam.
It's not clearly specified, but it could depend on your relationship to the page you're writing to (admin or not), whether you post to multiple pages, and finally whether you post too quickly.
To answer the question: it seems it would have been okay if you had done about 1 post per minute or less.
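Building on that guess, a hedged sketch of pacing the asker's restfb loop at one post per minute; the link list is a placeholder and the one-minute interval is the estimate above, not a documented limit:

import com.restfb.FacebookClient;
import com.restfb.Parameter;
import com.restfb.types.FacebookType;
import java.util.List;

// Sketch: publish the queued Tumblr posts one per minute instead of back to
// back. The one-minute spacing is the educated guess above, not a documented
// limit; the list of links is a placeholder for the real Tumblr data.
public class ThrottledReposter {
    public static void repost(FacebookClient facebookClient, List<String> links)
            throws InterruptedException {
        for (String link : links) {
            facebookClient.publish("me/feed", FacebookType.class,
                    Parameter.with("link", link));
            Thread.sleep(60_000L); // wait a minute before the next publish
        }
    }
}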
I think you exceeded the rate limit for your user ID.
- Your app can make 200 calls per hour per user in aggregate. As an example, if your app has 100 users, this means that your app can make 20,000 calls. One user could make 19,000 of those calls and another could make 1,000, so this isn't a per-user limit. It's a per-app limit.
- That hour is a sliding window, updated every few minutes.
- If your app is rate limited, all calls for that app will be limited, not just for a specific user.
- The number of users your app has is the average daily active users of your app, plus today's new logins.
Check this: https://developers.facebook.com/docs/graph-api/advanced/rate-limiting
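To make those numbers concrete, here is a small sketch of the arithmetic with a hypothetical user count - 200 calls per user per hour, aggregated per app, implies a minimum spacing between calls if you want to stay evenly under the budget:

import java.util.concurrent.TimeUnit;

// Sketch: derive a safe spacing between calls from the quoted budget of
// 200 calls per user per hour. activeUsers stands in for the app's average
// daily active users plus today's logins (a hypothetical figure here).
public class CallBudget {
    public static long minMillisBetweenCalls(int activeUsers) {
        long callsPerHour = 200L * activeUsers;           // aggregate app budget
        return TimeUnit.HOURS.toMillis(1) / callsPerHour; // spread calls evenly
    }

    public static void main(String[] args) {
        // 100 users -> 20,000 calls/hour -> at most one call every 180 ms
        System.out.println(minMillisBetweenCalls(100) + " ms between calls");
    }
}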

Ask Google to Stop Googlebot Crawl

Okay, so a WordPress gallery plugin led to a massive headache - with about 17 galleries having their own pagination, the links within created what might as well be an infinite number of variant URLs combining the various query variables from each gallery.
As such, Google has not been so smart and has been HAMMERING the server - to the tune of 4 GB an hour before I took action, and about 800 requests a minute to the same page, sending the server load up to 30 at one point.
It's been about 12 hours, and regardless of the changes I've made, Google is not listening (yet) and is still hammering away.
My question is: Is there a way to contact Google support and tell them to shut their misbehaving bot down on a particular website?
I want a more immediate solution as I do not enjoy the server being bombarded.
Before you say it, even though this isn't what I'm asking about, I've done the following:
Redirected all traffic using the misused query variable back to the Googlebot IP, in hopes that the bot being forwarded back to itself will be a wake-up call that something is not right with the URL. (I don't care if this is a bad idea.)
Blocked the most active IP address from accessing the site.
Disabled the URLs from being created by the troubled plugin.
In Google Webmaster Tools/Search Console, I've set the URL parameters to "No: Doesn't affect page content" for the query variables.
Regardless of all of this, Google is still hammering away at 800 requests per minute/13 requests a second.
Yes, I could just wait it out, but I'm looking for a "HEY GOOGLE! STOP WHAT YOU ARE DOING!" solution besides being patient and allowing resources to be wasted.
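There is no support hotline, but Google does document one immediate lever: when Googlebot receives 503 responses it backs off and retries later. The site in question is WordPress, so in practice this would live in PHP or in the web server config; the Java servlet-filter sketch below just illustrates the idea, and gallery_page is a placeholder for whatever query variable the plugin actually emits:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: answer Googlebot with 503 + Retry-After on the runaway gallery
// URLs so it backs off, while normal visitors are served as usual.
// "gallery_page" is a placeholder for the plugin's real query variable.
public class GooglebotBackoffFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String ua = request.getHeader("User-Agent");
        boolean isGooglebot = ua != null && ua.contains("Googlebot");
        boolean isGalleryUrl = request.getParameter("gallery_page") != null;

        if (isGooglebot && isGalleryUrl) {
            response.setHeader("Retry-After", "3600"); // ask it to return later
            response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            return;
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}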

Why Are Facebook Likes Blocked on My URL? I'm Not a Spammer

Good morning!
Since last week, my website URL - www.musiconline.xpg.com.br - and other sites from www.xpg.com.br have had the LIKE BUTTON blocked.
I need a solution to this problem urgently, because I'm NOT a spammer!!!
My fan page has a problem too: http://www.facebook.com/musicasonline
I've been trying to talk with Facebook in the forum and via support, but I still have no answer so far.
Thanks a lot for all!
I finally got my URL back. Facebook blocked use of our URL for years, yes, years. For some reason no one could include our website URL (a-fib.com) in any post, or even on our own FB page. They'd get an error message saying our site had been flagged as spammy. This had been going on for years! FB has zero customer service.
Here's what worked. After trying everything else over the course of three years, we finally resorted to writing a pleading letter on our letterhead, asking why would anyone block a non-profit trying to help heart patients? We sent it 'registered mail' to FB headquarters. (Advice I found among other ideas online from those with a similar problem.) It worked. Praise the lord!
Facebook does not care if you personally are a spammer or not – it just blocks the whole domain, in this case most likely xpg.com.br.
This is a risk you’re always taking when using a shared domain. To avoid it, the best way is to get your own domain – then it’s you and only you who’s responsible – whereas now, if one of the users on this domain does not behave, all other users of the same domain will get punished as well.

How can I save my application now - OAuth and HTTPS?

Well, I am a non-Facebook developer - a normal .NET programmer - who created a Facebook application for a regular website around one year ago using FBML. I have no time to read the roadmap every day and know what will come next. So I had no idea about OAuth and HTTPS till yesterday.
All of a sudden I received an email yesterday (27 Sep) saying: upgrade to OAuth 2.0 and HTTPS, otherwise the application will be disabled from 1 Oct.
Now, I read about OAuth 2.0 the whole day yesterday, and I think I can get around it by changing things, but I found that the site which currently hosts my Facebook application doesn't have SSL (HTTPS support) or a dedicated IP address, and that I need to invest around $20 to get SSL and another $20 for a dedicated IP address. I am ready to pay for it, but as you know, responses from hosting providers are not that quick.
I have emailed my host to make arrangements for it, but sadly they have not responded yet, and I only have one day left (I don't know whether they will respond or not).
So how can I save my application? I don't have any server that supports HTTPS for now. Even if I get a free SSL certificate, I am not able to implement it, as I don't have a dedicated IP address.
Is there some way I can put up a "will be back soon" kind of message? Or how can I save the application now?
I think I need to update that HTTPS canvas page URL in the settings anyhow.
FBML applications are not required to migrate and support HTTPS and signed requests.
Source: http://developers.facebook.com/blog/post/567/
Also, be aware that Facebook will stop supporting FBML apps at the end of the year.