Facebook takes wrong canonical URL

Facebook takes wrong canonical URL - facebook

Scenario of the problem:
We enforced HTTPS on a website. Any URL with HTTP now redirects (301 permanent redirect) to an appropriate HTTPS URL.
To avoid Facebook like/share buttons (that are placed on many pages of the website) loosing previous numbers of likes/shares, we made the buttons to "link" to the old HTTP URLs via the "data-href" property.
Additionally we placed the "og:url" meta tag on some pages, pointing to the old HTTP URLs.
I then scraped that pages at the Facebook debugger tool https://developers.facebook.com/tools/debug to make sure the Facebook gets the fresh data. According to the scraped data, canonical URLs were indeed pointing to the old HTTP URLs just as it should be according to our actions listed above. This was also reflected in the like/share buttons on our pages keeping the old numbers.
A few days later I discovered that some pages lose the old numbers of likes/shares. Checking the pages in the Facebook debugger shows that Facebook now takes HTTPS URLs as canonical. We did not make any changes on our pages, and the "og:url" tag is still pointing to the HTTP URLs. But the Facebook wrongly takes HTTPS URLs as canonical URLs. Now if I scrape the information again in the debugger, it agains becomes normal, showing HTTP as canonical and restoring the old number of likes/shares. But obviously it's not a solution to the problem, because we cannot constantly monitor all our pages and scrape them again and again.
Any ideas of what may causing the problem?

Facebook follows HTTP redirects as well. You need to make your old HTTP URLs available to the scraper, without redirecting it to the HTTPS version. (The scraper can be recognized by its User-Agent, see social plugins FAQ.)
The old HTTP URLs need to be available to the scraper, and not redirected to HTTPS, as the FAQ also mentions:
“This also requires that the old URL still renders a document with Open Graph tags and returns a HTTP 200 response, at least when loaded by Facebook's crawler. If you want other clients to redirect when they visit the URL, you must send your 301 HTTP response to all non-Facebook crawler clients. The old URL should contain its own og:url tag that points to itself.”

Related

"Circular Redirect" errors in Facebook open graph debugger after API was upgraded to 2.6

On April 12th, Facebook upgraded the FB App we use to share new articles to our page to API version 2.6. Since then, stories get posted, but the image is usually not added to the story.
When I check with the opengraph debugger, I see strange errors like:
Circular Redirect
We could not resolve the canonical URL because the redirect path contained a cycle.
With redirect paths of:
Redirect Path
Input URL arrow-right https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
301 HTTP Redirect arrow-right https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
Even stranger, sometimes the redirect paths show urls like below, which are not from my site:
Input URL arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
301 HTTP Redirect arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/?utm_content=70198165&utm_medium=social&utm_source=facebook
og:url Meta Tag arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
Once I click on the scrape button again, it gets rid of the error, but the images still do not show in the post on my FB page.
It is almost as if the scraper is targeting URLs that are not mine, even though I am sending them correctly and I have the same urls in my og:url and canonical tags.
Any ideas?

Open Graph scraping base URL instead the URL it's given

The Facebook OpenGraph debug tool is scraping the wrong page.
If I give it a full URL (pointing to an individual page on my site) that I want it to scrape, instead of scraping that page and finding its meta tags, it scrapes my site's main page and returns those meta tags (which are obviously wrong in this context).
The weird thing is, it will even find and scrape my site's main page even if it's not located at the root of my domain. For example:
I want it to scrape http://mydomain.com/myhomepage/specific_page.html
Instead, it scrapes http://mydomain.com/myhomepage/
This implies to me that the error must be a setting someplace, either on my site or on my Facebook App settings. Would the App settings do that? Redirect to whatever URL is set if a requested URL is a descendent of it?
The URL I'm requesting is not doing a 302 or anything - I can click the link from the FB debug tool even and it will take me to the appropriate page.
A few notes:
specific_page.html is not an actual file, it is routed through index.php using mod_rewrite in Apache's htaccess. I tried being specific with http://mydomain.com/myhomepage/index.php/specific_page.html and it did not work then either.
Another SO question led me to believe that the user-agent might be getting redirected if it doesn't allow cookies (as the Facebook web crawler does not) so I opened a fresh browser, disabled cookies, tried again, and I still reached the appropriate page.

As mentioned in the comments above, in your case this was due to an og:url meta tag, redirecting Facebook's crawler to that URL
In general, cases like this are usually the og:url tag, a HTTP redirect, or a canonical meta tag pointing at the 'other' / 'wrong' URL - Facebook's crawler follows those redirects looking for the final URL

Prevent Facebook following/discarding short url to redirect page

We're trying to build a share widget for referral links (links that are essentially short urls - a.b.com/uniqueCode, which redirects to a client website but goes through our service and lets us track them) - essentially someone needs to share their own affiliate url on Facebook.
The problem is Facebook seems to always resolve the url to the final destination, and doesn't display the url we pass in. I can't find any documentation on whether there's a way to prevent this or not. We've tried both with a 301 and 302 redirect, with no change. I've tried different urls to make sure we're not seeing the result of their url caching.
Is there a way to instruct Facebook (and Google Plus, Linkedin) to keep the referring link we provide?

You could try doing the redirect using Javascript.
Also, you might be able to do it by showing the social networks' crawlers different pages based on their useragent or IP or so – look into your webserver logs or so and check what happens there immediately after you've posted the link.

Preserve Facebook Likes, 301 Redirects, href tags, and canonical URLs [duplicate]

I understand the og:url meta tag is the canonical url for the resource in the open graph.
What strategies can I use if I wish to support 301 redirecting of the resource, while preserving its place in the open graph? I don't want to lose my likes because i've changed the URLs.
Is the best way to do this to store the original url of the content, and refer to that? Are there any other strategies for dealing with this?
To clarify - I have page:
/page1, with an og:url of http://www.example.com/page1
I now want to move it to
/page2, using a 301 redirect to http://www.example.com/page2
Do I have any options to avoid losing the likes and comments other than setting the og:url meta to /page1?

Short answer, you can't.
Once the object has been created on Facebook's side its URL in Facebook's graph is fixed - the Likes and Comments are associated with that URL and object; you need that URL to be accessible by Facebook's crawler in order to maintain that object in the future. (note that the object becoming inaccessible doesn't necessarily remove it from Facebook, but effectively you'd be starting over)
What I usually recommend here is (with examples http://www.example.com/oldurl and http://www.example.com/newurl):
On /newpage, keep the og:url tag pointing to /oldurl
Add a HTTP 301 redirect from /oldurl to /newurl
Exempt the Facebook crawler from this redirect
Continue to serve the meta tags for the page on http://www.example.com/oldurl if the request comes from the Facebook crawler.
No need to return any actual content to the crawler, just a simple HTML page with the appropriate tags
Thus:
Existing instances of the object on Facebook will, when clicked, bring users to the correct (new) page via your redirect
The Like button on the (new) page will still produce a like of the correct object (but at the old URL)
If you're moving a lot of URLs around or completely rewriting your URL scheme you should use the new URLs for new articles/products/etc, but you'll need to keep the redirect in place if you want to retain likes, comments, etc on the older content.
This includes if you're changing domain.
The only problem here is maintaining the old URL -> new URL mapping somewhere in your code, but it's not technically difficult, just an additional thing to maintain in the future.
BTW, The Facebook crawler UA is currently facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

I'm having the same problem with my old sites. Domains are changing, admins want to change urls for seo etc
I came to conclusion its best to have some sort uniqe id in db just for facebook - from the beginning. For articles for example I have myurl.com/a/123 where 123 is ID of the article.
Real url is myurl.com/category/article-title. Article can then be put in different category, renamed etc with extensive logic for 301 redirects behind it. But the basic fb identifier can stay the same for ever.
Of course this is viable only when starting with a fresh site or when implementing fb comments for the first time.
Just an idea if you can plan ahead :) Let me know what you think.

How can I move a URL via 301 redirect and retain the page's Facebook likes and Open Graph information?

I understand the og:url meta tag is the canonical url for the resource in the open graph.
What strategies can I use if I wish to support 301 redirecting of the resource, while preserving its place in the open graph? I don't want to lose my likes because i've changed the URLs.
Is the best way to do this to store the original url of the content, and refer to that? Are there any other strategies for dealing with this?
To clarify - I have page:
/page1, with an og:url of http://www.example.com/page1
I now want to move it to
/page2, using a 301 redirect to http://www.example.com/page2
Do I have any options to avoid losing the likes and comments other than setting the og:url meta to /page1?

Short answer, you can't.
Once the object has been created on Facebook's side its URL in Facebook's graph is fixed - the Likes and Comments are associated with that URL and object; you need that URL to be accessible by Facebook's crawler in order to maintain that object in the future. (note that the object becoming inaccessible doesn't necessarily remove it from Facebook, but effectively you'd be starting over)
What I usually recommend here is (with examples http://www.example.com/oldurl and http://www.example.com/newurl):
On /newpage, keep the og:url tag pointing to /oldurl
Add a HTTP 301 redirect from /oldurl to /newurl
Exempt the Facebook crawler from this redirect
Continue to serve the meta tags for the page on http://www.example.com/oldurl if the request comes from the Facebook crawler.
No need to return any actual content to the crawler, just a simple HTML page with the appropriate tags
Thus:
Existing instances of the object on Facebook will, when clicked, bring users to the correct (new) page via your redirect
The Like button on the (new) page will still produce a like of the correct object (but at the old URL)
If you're moving a lot of URLs around or completely rewriting your URL scheme you should use the new URLs for new articles/products/etc, but you'll need to keep the redirect in place if you want to retain likes, comments, etc on the older content.
This includes if you're changing domain.
The only problem here is maintaining the old URL -> new URL mapping somewhere in your code, but it's not technically difficult, just an additional thing to maintain in the future.
BTW, The Facebook crawler UA is currently facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

I'm having the same problem with my old sites. Domains are changing, admins want to change urls for seo etc
I came to conclusion its best to have some sort uniqe id in db just for facebook - from the beginning. For articles for example I have myurl.com/a/123 where 123 is ID of the article.
Real url is myurl.com/category/article-title. Article can then be put in different category, renamed etc with extensive logic for 301 redirects behind it. But the basic fb identifier can stay the same for ever.
Of course this is viable only when starting with a fresh site or when implementing fb comments for the first time.
Just an idea if you can plan ahead :) Let me know what you think.