When sharing one of my pages on Facebook, I want to display something different. The problem is, I'd prefer not to use the og: elements, but to recognize Facebook's user-agent instead.
What is it? I can't find it documented.
For a list of user-agent strings, look here. The most used, as of September 2015, are facebookexternalhit/* and Facebot. Since you haven't stated which language you're trying to recognize the user-agent in, I can't tell you more. If you want to recognize Facebook's bot in PHP, use:
if (
    strpos($_SERVER["HTTP_USER_AGENT"], "facebookexternalhit/") !== false ||
    strpos($_SERVER["HTTP_USER_AGENT"], "Facebot") !== false
) {
    // it is probably Facebook's bot
} else {
    // that is not Facebook
}
UPDATE: Facebook has added Facebot to the list of its possible user-agent strings, so I've updated my code to reflect the change. The code is now also more resilient to possible future changes.
"Facebook's user-agent string is facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)..."
A small yet important correction: Facebook's external hit crawler uses two different user agents:
facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Setting your filter to 1.1 only may cause you to miss requests from the 1.0 version.
For more information about the Facebook bot (and other bots), please refer to Botopedia.org, a community-sourced bot directory powered by Incapsula.
Besides user-agent data, the directory also offers an IP verification option, allowing you to cross-verify an IP/User-Agent, thus helping to prevent impersonation attempts.
Here are the Facebook crawler user agents:
FacebookExternalHit/1.1
FacebookExternalHit/1.0
or
facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Note that the version numbers might change. So use a regular expression to find the crawler name and then display your content.
Update:
You can use this code in PHP to check for the Facebook user agent:
$agent = $_SERVER['HTTP_USER_AGENT'];
if (preg_match('/^facebookexternalhit\//i', $agent)) {
    print "Facebook User-Agent";
    // process here for Facebook
}
Here is ASP.NET code. You can use this function to check whether the user agent is Facebook's.
public static bool IsFacebook(string userAgent)
{
    userAgent = userAgent.ToLower();
    return userAgent.Contains("facebookexternalhit");
}
Note:
Why would you need to do that? When you share a link to your site on Facebook, Facebook crawls and parses it to get some data for the preview (the thumbnail, title, and some content from your page), but the preview links back to your site.
Also, I think this would amount to cloaking the site, i.e. displaying different data to users and crawlers. Cloaking is not considered a good practice, and search engines and sites may take note of it.
Update: Facebook also added a new user agent as of May 28th, 2014:
Facebot
You can read more about the Facebook crawler at https://developers.facebook.com/docs/sharing/webmasters/crawler
Please do note that sometimes the agent is visionutils/0.2. You should check for it too.
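If you want to cover that case as well, here is a minimal sketch of a combined check; the pattern list is my assumption, not an official list:
<?php
// Sketch: extend the user-agent check to also match the occasional
// visionutils agent. The pattern list is an assumption, not exhaustive.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/facebookexternalhit|facebot|visionutils/i', $agent)) {
    // likely a Facebook-related fetcher
}
?>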
Facebook User-Agents are:
FacebookExternalHit/1.1
FacebookExternalHit/1.0
facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.0 (+https://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)
I'm using the code below to detect FB User-Agent in PHP and it works as intended:
$agent = $_SERVER['HTTP_USER_AGENT'];
if (stristr($agent, 'FacebookExternalHit')) {
    // Facebook User-Agent
} else {
    // Other User-Agent
}
A short solution is to check for the pattern, rather than sending all the extra markup to every user:
<?php
# Facebook optimized stuff
if (strstr($_SERVER['HTTP_USER_AGENT'], 'facebookexternalhit')) {
    $buffer .= '<link rel="image_src" href="images/site_thumbnail.png" />';
}
?>
Given possible user-agent changes on Facebook's side, it is probably safer to use a regex like this:
<?php
if (preg_match("/facebook|facebot/i", $_SERVER['HTTP_USER_AGENT'])) {
    do_something();
}
?>
You can find more information about the Facebook crawler in their docs: https://developers.facebook.com/docs/sharing/webmasters/crawler
And if you want to block the Facebook bot from accessing your website (assuming you're using Apache), add this to your .htaccess file:
<Limit GET POST>
BrowserMatchNoCase "Feedfetcher-Google" feedfetcher
BrowserMatchNoCase "facebookexternalhit" facebook
order deny,allow
deny from env=feedfetcher
deny from env=facebook
</Limit>
It also blocks Google's Feedfetcher, which can likewise be abused for cheap DDoS attacks.
Firstly, you should not use in_array, because you would need the full user agent string and not just a prefix, so it would quickly break with changes (e.g. a future version 1.2 from Facebook would not match if you follow the currently preferred answer). It is also slower to iterate through an array than to use a regex pattern.
Since you will no doubt want to look for more bots later, the example below matches two bot names in a single pattern, separated by the pipe | symbol. The /i at the end makes it case insensitive.
Also, you should not use $_SERVER['HTTP_USER_AGENT'] directly; filter it first, in case someone has put something nasty in there.
$pattern = '/(FacebookExternalHit|GoogleBot)/i';
$agent = filter_input(INPUT_SERVER, 'HTTP_USER_AGENT', FILTER_SANITIZE_ENCODED);
if (preg_match($pattern, $agent)) {
    echo "found one of the patterns";
}
This is a bit safer and faster.
You already have the answer for Facebook above, but one way to get any user agent is to place a script on your site that will mail you when there is a visit to it. For example, create this file on your domain at, say, https://example.com/user-agent.php :
<?php
mail('you@youremail.com', 'User Agent', $_SERVER['HTTP_USER_AGENT']);
Then go to Facebook and type the link to the script there, followed by a space. You don't actually have to share anything; just typing the link plus a space will cause Facebook to fetch a preview. You should then get an email with Facebook's user agent.
Another generic approach in PHP
$agent = $_SERVER['HTTP_USER_AGENT'];
$agent = trim($agent);
$agent = strtolower($agent);
if (
    strpos($agent, 'facebookexternalhit/1.1') === 0
    || strpos($agent, 'facebookexternalhit/1.0') === 0
) {
    // probably facebook
} else {
    // probably not facebook
}
Related
The "old" Facebook Graph API had a "username" field which could be used to create a human-readable profile URL. My username for example is "sebastian.trug" which results in a Facebook profile URL http://www.facebook.com/sebastian.trug.
With Graph API 2.0 Facebook has removed the "username" field from the user data as retrieved from "/me".
Is there any way to get this data via the 2.0 API or is the "username" now being treated as a deprecated field?
Facebook got rid of the username because the username was one way of sending emails via Facebook.
For example, given the URL http://www.facebook.com/sebastian.trug,
the corresponding Facebook email would be sebastian.trug@facebook.com,
which, if emailed, would be delivered directly to Messages (if the message setting is set to public), otherwise to the Other inbox.
The username field of the User object has been removed, and does not exist in Graph API v2.0. In v2.0 of the API there is no way to get the FB username of a user.
Source: https://developers.facebook.com/docs/apps/changelog#v2_0_graph_api
"/me/username is no longer available."
@Simon Cross - Its being deprecated is documented, yes. That is not the question; the question is how to get it, and furthermore I wonder why Facebook has made such a terrible choice and removed the username. Hundreds of applications that rely on the username to create accounts on their service will be broken.
@user3596238 - You can stick with the v1.0 API, which will be around until the end of April 2015. By far not the best solution, but Facebook might be irrelevant by then anyway.
https://developers.facebook.com/docs/apps/changelog
Solution: ask the user for a username besides the actual Facebook login? - In my opinion, that makes the Facebook login completely pointless anyway.
While the 2.0 SDK will not provide the username field any longer, it is quite easily scraped if you have the user id number (which you will likely be using to access the graph anyway).
The url facebook.com/<user id> will redirect to facebook.com/<username>, which can then be extracted however you like.
My approach is scraping the username with Nokogiri from the user profile.
Something like this (in Ruby):
html = RestClient.get("http://facebook.com/123123xxx")
doc = Nokogiri::HTML(html)
username = doc.css('head meta')[1].attributes["content"].value
One of the ways could be to access facebook.com/{userid} using cURL and then follow the redirect.
The page redirects to facebook.com/{username}.
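A minimal PHP sketch of that cURL approach; the numeric id is a placeholder, and Facebook may block anonymous requests:
<?php
// Sketch: follow the redirect from facebook.com/{userid} to
// facebook.com/{username} and read the final URL. Placeholder id.
$ch = curl_init('https://www.facebook.com/100005908663675');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the body
curl_exec($ch);
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // e.g. https://www.facebook.com/{username}
curl_close($ch);
$username = basename(parse_url($finalUrl, PHP_URL_PATH));
echo $username;
?>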
Inspired by @RifkiFauzi's answer, here is my solution in pure Python:
# get the HTML of a page in pure Python, ref. https://stackoverflow.com/a/23565355/248616
import requests
r = requests.get('http://fb.com/%s' % FB_USER_ID)  # open the profile page of the Facebook user
r.raise_for_status()
html = r.text

# search the string with a regex, ref. https://stackoverflow.com/a/4667014/248616
import re
# m = re.search(r'meta http-equiv="refresh" content="0; URL=/([^?]+)\?', html)
m = re.search(r'a class="profileLink" href="([^"]+)"', html)
href = m.group(1)  # was https://www.facebook.com/$FB_USER_NAME as of 2017-05-24
username = href.split('/')[-1]
print(href)
print(username)
https://graph.facebook.com/?id=100005908663675
Simply change id to whatever.
This self-answer is the result of agonizing over the deprecated Facebook sharer-URL-based API, which kept refusing to post the URL I had set up through the sharer URL format. The response was the following no matter what privacy setting I chose:
{
    "__ar": 1,
    "error": 1367001,
    "errorSummary": "Could not post to Wall",
    "errorDescription": "The message could not be posted to this Wall.",
    "payload": null,
    "bootloadable": {},
    "ixData": []
}
My mess of code follows. (Yes, it breaks every convention known to web development, and yes, I inherited this code.)
<?php
$url = urlencode(Domain::getDomain()."/".$details['urlname']);
$title=urlencode($details['name']);
$summary=$details['name'];
$image=urlencode(constant('BASE_IMAGES').'/'.$details['gallery']['listing'][0]['thumb']['src']);
?>
<a onClick="window.open('http://www.facebook.com/sharer.php?s=100&p[title]=<?php echo $title;?>&p[summary]=<?php echo $summary;?>&p[url]=<?php echo $url; ?>&p[images][0]=<?php echo $image;?>','sharer','toolbar=0,status=0,width=548,height=325'); return false" href="javascript: void(0)"><img src="/site/images/icon-facebook.png" alt="Facebook" border="0"></a>
The official question is as follows: Why does the sharer method of sharing links in facebook result in this error?
For my case, this was because I had not included the fully qualified URL including the protocol. My PHP generated URL was a mess of concatenation and did not include the protocol. After adding it, the post went through.
I had to change the second line listed above to
$url = urlencode('http://'.Domain::getDomain()."/".$details['urlname']);
There could be more reasons why Facebook throws this 1367001 Error, but this is at least one thing to check.
Yes, please start your URL with http://
e.g. http://www.google.com
There may be URL validation performed on their (FB's) end, hence start with the protocol name and end with the domain name (.com, .co.in, .ie).
This fixed my issue :)
I have a jquery mobile site that I want to share via facebook's dialog/feed system.
jQuery mobile uses #'s for their internal navigation system, so if I want to share a jqm url for page_3 of my jqm site, I would use something like: http://www.my_jqm_site.com/#page_3.
But that # is causing grief for facebook's dialog/feed:
https://www.facebook.com/dialog/feed
?app_id = ...
&link = http://apps.facebook.com/celjska_puzzle/#page_3
&redirect_uri = http://apps.facebook.com/celjska_puzzle/#page_3
&picture = ...
&name = .
&caption = ...
&description = ...
So is there any way to do it?
I have tried it both with and without encoding.
Currently I suppose I will use a ? instead and have the page make some alterations via JavaScript during loading, but I really hate the thought of doing it this way.
I'm working on a problem like this right now. The URL that goes in redirect_uri and link:
.../xfile.jsp?item=/contests/bhg_homeimprovement/bhg_splashsweeps_win2500_homeimprovement&temp=yes&hid=#HashID#&esrc=nwbhgsweeps072514a
The FB dialog gets an error as is. Encoding the full URL fixes that and does output the hashes around "HashID", but the equals sign before it is removed. Adding a second equals sign there will output two equals signs, but having just one will output none.
This isn't a complete solution, but it does look like hash marks are possible.
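For illustration, a PHP sketch of building the dialog/feed URL with the fragment percent-encoded; the app id and link values are placeholders:
<?php
// Sketch: rawurlencode() turns '#' into '%23' so the fragment survives
// inside the link and redirect_uri parameters. Values are placeholders.
$link = 'http://apps.facebook.com/celjska_puzzle/#page_3';
$dialog = 'https://www.facebook.com/dialog/feed'
    . '?app_id=YOUR_APP_ID'
    . '&link=' . rawurlencode($link)
    . '&redirect_uri=' . rawurlencode($link);
echo $dialog;
?>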
I am aware that Facebook caches the Like data for specific pages on your site once they're visited for the first time, and that entering the url into the debugger page clears the cache. However, we've now improved our Facebook descriptions/images/etc and we need to flush the cache for the entire site (about 300 pages).
Is there a simple way to do this, or if we need to write a routine to correct them one by one, what would be the best way to achieve this?
Is there a simple way to do this,
Not as simple as a button that clears the cache for a whole domain, no.
or if we need to write a routine to correct them one by one, what would be the best way to achieve this?
You can get an Open Graph URL re-scraped by making a POST request to:
https://graph.facebook.com/?id=<URL>&scrape=true&access_token=<app_access_token>
So you’ll have to do that in a loop for your 300 objects. But don’t do it too fast, otherwise you might hit your app rate limit – try to leave a few seconds between the requests, according to a recent discussion in the FB developers group that should work fine. (And don’t forget to URL-encode the <URL> value properly before inserting it into the API request URL.)
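For example, a minimal PHP sketch of one such re-scrape request; the token and page URL are placeholders you must supply:
<?php
// Sketch: POST to the Graph API to force a re-scrape of one URL.
// $appAccessToken and $pageUrl are placeholders.
$appAccessToken = 'APP_ID|APP_SECRET';
$pageUrl = 'https://example.com/some-page';

$ch = curl_init('https://graph.facebook.com/');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'id'           => $pageUrl, // http_build_query URL-encodes this for us
    'scrape'       => 'true',
    'access_token' => $appAccessToken,
]));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch);
curl_close($ch);
?>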
A simple solution in WordPress: go to Permalinks, switch to a custom permalink structure, and change it. In my case I just added an underscore, like this:
/_%postname%/
Facebook then has no info on the (now) new URLs, so it scrapes them all fresh.
I was looking for this same answer, and all the existing answers were super complicated for me as a non-coder.
It turned out there is a very simple answer, and I came up with it all by myself :) .
I have a WordPress website where, with a variety of plugins, I bulk-uploaded over 4,000 images, which created 4,000 posts.
The problem was that I uploaded them and then tried setting up the Facebook share plugins before sorting out the og:meta tag issue, so all 4,000 posts were scraped by FB with no og:meta, and when I then added the tags it made no difference. The FB debugger could not be used as I had over 4k posts.
I must admit I'm a bit excited; for many years I have gotten helpful answers from Google searches sending me to this forum. Often the suggestions I found were well over my head, as I'm not a coder, I'm a "copy-paster".
I'm so happy to be able to give back to this great forum and help someone else out :)
Well, I also hit the same scenario and used a hack that works, but obviously, as @Cbroe mentioned in his answer, the API call has rate limiting, so you should take care of that; in my case I only had 100 URLs to re-scrape.
So here is the solution:
$xml = file_get_contents('http://example.com/post-sitemap.xml'); // <-- I have a WordPress site, which has a sitemap
$xml = simplexml_load_string($xml); // Load it as XML
$applicationAccessToken = 'YourToken'; // Application access token; you can get it from https://developers.facebook.com/tools/explorer/
$urls = [];
foreach ($xml->url as $url) {
    $urls[] = (string) $url->loc; // Get the URLs from the sitemap into our new array
}
$file = fopen("response.data", "a+"); // Write API responses to another file so we can debug later
foreach ($urls as $url) {
    echo "Sending URL for scrape: $url\n";
    $data = file_get_contents('https://graph.facebook.com/?id=' . urlencode($url) . '&scrape=true&access_token=' . $applicationAccessToken);
    fwrite($file, $data . "\n"); // Put the response in the file
    sleep(5); // Sleep for 5 seconds
}
fclose($file); // Close the file once all the URLs are scraped
echo "Bingo, it's completed!";
I have a link pointing to a restricted page. When I access the link directly while logged out, it redirects to a 404 page. It should instead redirect to the login form.
I tried:
config {
typolinkLinkAccessRestrictedPages=PAGE_ID
typolinkLinkAccessRestrictedPages_addParams = &return_url=###RETURN_URL###&pageId=###PAGE_ID###
}
Not working.
Also I tried the login status redirect plugin, no use.
Anyone know how to do this? I am using TYPO3 version 4.4.8.
As this is still unanswered, does this help?
Valid for TYPO3 < 8.x
# Check if user is logged in:
[usergroup = *]
# do something
[else]
page.config >
page.config.additionalHeaders = Location: http://www.yourdomain.org/login.html
[end]
I recently posted this to another question, and it crossed my mind that it might be a suitable workaround for your problem.
Found here
I'm not sure how to make redirection work correctly, but perhaps a bit of background will be helpful.
typolinkLinkAccessRestrictedPages only interacts with link generation. That way, anywhere you have a link to an access restricted page, you should get a link that points to the "PAGE Id" page. I suspect you are using your login pid in place of PAGE Id, which I guess should work, but I haven't used this particular feature. I have typolinkLinkAccessRestrictedPages = NONE which makes all links show up, linked to the correct url, but only users who are logged in will successfully load those pages.
If anyone, without being logged in, uses a bookmark to an access restricted page, or they click on one of these links, or directly type in the address, or whatever, they will run into TYPO3's 404 handling (with the error message: ID was not an accessible page). To change how TYPO3 handles these errors, you need to change what TYPO3 does via this setting in localconf.php:
$TYPO3_CONF_VARS["FE"]["pageNotFound_handling"]
I don't know if there's a clean way to just automatically redirect to the login page without hacking the pageNotFound_handling.
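If you do go that route, a sketch of what the setting might look like in localconf.php; the /login/ target path is an assumption for illustration:
// Sketch: redirect all 404s (including access-restricted pages) to a
// login page. The /login/ path is an assumed example target.
$TYPO3_CONF_VARS['FE']['pageNotFound_handling'] = 'REDIRECT:/login/';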
As far as the typoscript solution, that wouldn't work for my site, because the trigger isn't whether or not someone is logged in (often they will not be logged in)--the trigger for my site is trying to access a protected page when you are not logged in. I don't want it to redirect everyone who isn't logged in because a lot of pages don't require any login.
fe_login alone cannot do this.
Follow these steps:
Install the "pagenotfoundhandling" extension after configuring the felogin login.
Configure the 403 page as the login page in the "pagenotfoundhandling" extension configuration.
Then, when you try to access an access-restricted page, "pagenotfoundhandling" will redirect to the login page, and after login it redirects back to the originally requested page. I have tested this on TYPO3 6.2.14.
And I found another workaround that looks like it should work fine.
# pages and subpages starting at 123 and 321 are restricted
[PIDinRootline = 123,321] && [loginUser = ]
page.headerData.666 = TEXT
page.headerData.666 {
data = getIndpEnv:TYPO3_REQUEST_URL
wrap = <meta http-equiv="refresh" content="0; URL=/passwort/?referer= | " />
}
[global]
Important notice: do not restrict the complete page, only all contents of the page. Otherwise RealURL will trigger the 404 handler.
At the moment page.config.additionalHeaders (as used by @Mateng) does not support stdWrap, so you cannot add a referrer to redirect to the desired page after login (see TYPO3 Forge and vote for the feature request).
Complete solution:
1. First, in typo3conf/LocalConfiguration.php you have to add:
'FE' => [
    'pageNotFound_handling' => 'REDIRECT:/login/',
    'pageNotFound_handling_statheader' => 'HTTP/1.1 404 Not Found',
    ...
],
2. Then add to your TypoScript:
config {
    typolinkLinkAccessRestrictedPages = YOUR_LOGIN_PAGE_ID
    typolinkLinkAccessRestrictedPages_addParams = &return_url=###RETURN_URL###
}
plugin.tx_felogin_pi1.redirectMode = referer
Because there seems to be no proper solution for this behaviour of TYPO3, I use the following workaround with RealURL:
Create a 404 page in TYPO3.
Set the Speaking URL path segment to "404-error" and check "Override the whole page path".
Add a text that describes what is happening (i.e. "Page doesn't exist or is restricted, please log in").
Add the felogin plugin to that page and hide it when users are logged in.
Set [FE][pageNotFound_handling] = /404-error/ in the Install Tool.
This 404-error page is shown every time a user requests a page that he is either not allowed to see or a page that does not exist. When the user uses the login form on the page, he will find the proper content immediately after login because the URI did not change at all (when there is no redirect configured for the fe_login plugin).