Open Graph scraping base URL instead of the URL it's given - facebook

The Facebook OpenGraph debug tool is scraping the wrong page.
If I give it a full URL (pointing to an individual page on my site) that I want it to scrape, instead of scraping that page and finding its meta tags, it scrapes my site's main page and returns those meta tags (which are obviously wrong in this context).
The weird thing is, it will even find and scrape my site's main page even if it's not located at the root of my domain. For example:
I want it to scrape http://mydomain.com/myhomepage/specific_page.html
Instead, it scrapes http://mydomain.com/myhomepage/
This suggests to me that the cause must be a setting somewhere, either on my site or in my Facebook App settings. Would the App settings do that? Redirect to whatever URL is set if a requested URL is a descendant of it?
The URL I'm requesting is not doing a 302 or anything - I can click the link from the FB debug tool even and it will take me to the appropriate page.
A few notes:
specific_page.html is not an actual file; it is routed through index.php using mod_rewrite in Apache's .htaccess. I tried being specific with http://mydomain.com/myhomepage/index.php/specific_page.html and it did not work then either.
Another SO question led me to believe that the user-agent might be getting redirected if it doesn't allow cookies (as the Facebook web crawler does not) so I opened a fresh browser, disabled cookies, tried again, and I still reached the appropriate page.

As mentioned in the comments above, in your case this was due to an og:url meta tag, which sent Facebook's crawler to that URL.
In general, cases like this are usually caused by the og:url tag, an HTTP redirect, or a rel="canonical" link tag pointing at the 'other' / 'wrong' URL. Facebook's crawler follows all of these while looking for the final URL.
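A quick way to check a page for the first and third of those causes is to pull the og:url and rel="canonical" values out of the HTML and compare them with the URL you asked the debugger to scrape. The sketch below uses only Python's standard library; the function names (canonical_hints, mismatched_hints) are made up for this example, and it does not look at HTTP redirects at all.

```python
from html.parser import HTMLParser


class CanonicalHintParser(HTMLParser):
    """Collects the og:url meta tag and the rel="canonical" link of a page."""

    def __init__(self):
        super().__init__()
        self.og_url = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:url":
            self.og_url = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def canonical_hints(html):
    """Return the og:url and canonical values found in an HTML string."""
    parser = CanonicalHintParser()
    parser.feed(html)
    return {"og:url": parser.og_url, "canonical": parser.canonical}


def mismatched_hints(requested_url, html):
    """Return only the hints that point somewhere other than requested_url."""
    wanted = requested_url.rstrip("/")
    return {
        name: value
        for name, value in canonical_hints(html).items()
        if value and value.rstrip("/") != wanted
    }


page = (
    '<html><head>'
    '<meta property="og:url" content="http://mydomain.com/myhomepage/" />'
    '</head></html>'
)
# A non-empty result means the page itself is steering the crawler elsewhere.
print(mismatched_hints("http://mydomain.com/myhomepage/specific_page.html", page))
```

If this prints a non-empty dict for the page you are debugging, the tag it names is the first thing to fix.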

Related

"Circular Redirect" errors in Facebook open graph debugger after API was upgraded to 2.6

On April 12th, Facebook upgraded the FB App we use to share new articles to our page to API version 2.6. Since then, stories get posted, but the image is usually not added to the story.
When I check with the opengraph debugger, I see strange errors like:
Circular Redirect
We could not resolve the canonical URL because the redirect path contained a cycle.
With redirect paths of:
Redirect Path
Input URL arrow-right https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
301 HTTP Redirect arrow-right https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
Even stranger, sometimes the redirect paths show urls like below, which are not from my site:
Input URL arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
301 HTTP Redirect arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/?utm_content=70198165&utm_medium=social&utm_source=facebook
og:url Meta Tag arrow-right https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
Once I click on the scrape button again, it gets rid of the error, but the images still do not show in the post on my FB page.
It is almost as if the scraper is targeting URLs that are not mine, even though I am sending them correctly and I have the same urls in my og:url and canonical tags.
Any ideas?

Facebook takes wrong canonical URL

Scenario of the problem:
We enforced HTTPS on a website. Any URL with HTTP now redirects (301 permanent redirect) to an appropriate HTTPS URL.
To keep the Facebook like/share buttons (which are placed on many pages of the website) from losing their previous like/share counts, we made the buttons "link" to the old HTTP URLs via the "data-href" property.
Additionally we placed the "og:url" meta tag on some pages, pointing to the old HTTP URLs.
I then scraped those pages with the Facebook debugger tool https://developers.facebook.com/tools/debug to make sure Facebook gets the fresh data. According to the scraped data, the canonical URLs were indeed pointing to the old HTTP URLs, just as they should given the actions listed above. This was also reflected in the like/share buttons on our pages keeping the old numbers.
A few days later I discovered that some pages had lost the old numbers of likes/shares. Checking the pages in the Facebook debugger shows that Facebook now takes the HTTPS URLs as canonical. We did not make any changes to our pages, and the "og:url" tag is still pointing to the HTTP URLs, but Facebook wrongly takes the HTTPS URLs as canonical. If I scrape the information again in the debugger, it goes back to normal, showing HTTP as canonical and restoring the old number of likes/shares. But obviously that is not a solution to the problem, because we cannot constantly monitor all our pages and scrape them again and again.
Any ideas what may be causing the problem?
Facebook follows HTTP redirects as well as og:url. (The scraper can be recognized by its User-Agent, see social plugins FAQ.)
The old HTTP URLs need to be available to the scraper, and not redirected to HTTPS, as the FAQ also mentions:
“This also requires that the old URL still renders a document with Open Graph tags and returns a HTTP 200 response, at least when loaded by Facebook's crawler. If you want other clients to redirect when they visit the URL, you must send your 301 HTTP response to all non-Facebook crawler clients. The old URL should contain its own og:url tag that points to itself.”
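A minimal sketch of that rule in Python, assuming the crawler identifies itself with a User-Agent containing "facebookexternalhit" or "Facebot" (check Facebook's current crawler documentation before relying on these strings). The function names and the (status, headers) return shape are invented for illustration; in a real site this decision would live in your web server or framework config.

```python
# Substrings assumed to identify Facebook's crawler in the User-Agent header.
FACEBOOK_CRAWLER_TOKENS = ("facebookexternalhit", "facebot")


def is_facebook_crawler(user_agent):
    """True if the User-Agent looks like Facebook's scraper."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in FACEBOOK_CRAWLER_TOKENS)


def respond_to_old_url(user_agent, https_url):
    """Decide how to answer a request for an old HTTP URL.

    Returns (status, headers). The crawler gets a 200, so the old page
    (with an og:url pointing to itself) is rendered; everyone else is
    sent to the HTTPS version with a 301.
    """
    if is_facebook_crawler(user_agent):
        return 200, {}
    return 301, {"Location": https_url}


print(respond_to_old_url("facebookexternalhit/1.1", "https://example.com/page"))
print(respond_to_old_url("Mozilla/5.0", "https://example.com/page"))
```

The important part is the 200 branch: the old URL must still serve a full document with its Open Graph tags to the crawler, not an empty response.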

Facebook debugger reports "Circular redirect path detected" (301) on certain website

Why isn't the Facebook debugger able to parse http://www.brandenburg-business-guide.de/ ? It reports 301 Circular redirect path detected. However, there is actually no redirection in place. Also, Apache's access.log shows no requests from Facebook.
See https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.brandenburg-business-guide.de%2F and hit Debug button.
The page does not contain any OpenGraph meta tags. Check out the documentation.
The Facebook scraper expects the page to contain either an og:url meta tag, which will serve as the canonical URL, or a link tag with rel="canonical". Since the document is missing both, Facebook cannot decide what the canonical URL is, hence the circular redirect path error.
For a reference, try Goodreads debugging information.
Hope this helps.

Open Graph URL is a permalink that redirects to page

I'm adding Open Graph meta tags to my website and testing whether they work with the Facebook URL Linter.
The only thing that is not working the way I would like is the og:url tag. In this meta tag I want to put the permalink URL of the current page.
The permalink actually redirects to the current page. I use this because my page URLs look like this: http://website.com/photos/243/hello-this-is-the-title/ and the last part of the URL can be changed by the user. If it is changed, the URL changes and is no longer associated with the "Likes" stored at Facebook.
This is why I have a permalink page that looks like this : http://website.com/permalink/243/ and this will redirect to http://website.com/photos/243/hello-this-is-the-title/, so that all the likes on Facebook are associated with the permalink instead of the other one.
When I use the Facebook URL linter it tells me that there are some critical errors that need to be fixed - Circular redirect path detected (see 'Redirect Path' section for details).
I don't know if what I want to do is possible. But I could really use a little help here.
Two options:
Exclude the Facebook scraper from being redirected, by looking for its user agent.
Don't redirect server-side, but do it client-side via JavaScript instead. (The scraper does not care about JavaScript.)
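To illustrate the second option, here is a rough sketch of what the permalink page could return instead of a server-side redirect: a normal 200 document whose og:url points at the permalink itself, plus a script that sends browsers on to the real page. render_permalink_page is a hypothetical helper written in Python only for illustration; the same HTML could come from any template.

```python
import html
import json


def render_permalink_page(permalink_url, target_url, title):
    """Build the HTML served (with status 200) at the permalink URL."""
    # json.dumps gives a quoted JavaScript string literal; escaping "</"
    # keeps a stray "</script>" in the URL from closing the script tag early.
    js_target = json.dumps(target_url).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta property="og:url" content="{html.escape(permalink_url, quote=True)}" />\n'
        f'<meta property="og:title" content="{html.escape(title, quote=True)}" />\n'
        # Browsers run this and move on; the scraper ignores JavaScript
        # and just reads the tags above.
        f"<script>window.location.replace({js_target});</script>\n"
        "</head><body></body></html>\n"
    )


print(render_permalink_page(
    "http://website.com/permalink/243/",
    "http://website.com/photos/243/hello-this-is-the-title/",
    "Hello, this is the title",
))
```

The photo page itself would then also declare the permalink as its og:url, so likes from either address end up on the permalink.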

Canonical OG URL fails in URL linter because of redirect

I recently changed the domain name for my site and migrated my content. Most URLs from the old site use a 301 redirect to the new site, as you would guess.
In an effort to retain FB like and comment data, I kept the og:url property set to the old URL, since it is the original and canonical identifier in Open Graph. I implemented this in August, and it was working properly, with previous like data retained. Now it no longer works: the previous like data is not shown, and the page fails in the URL Debugger.
Here is an example from the new site:
http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011
In the URL Debugger, it now returns this error:
There was an error in fetching the object at URL 'http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011/', or one of the the URLs specified via a redirect or the 'og:url' property including one of http://www.hometalkin.com/seattle/million-things-to-do-seattle/washington-brewers-festival-2011/.
Nothing has changed in my OG tags. Has something changed with canonical URL in open graph that causes it to fail when a redirect is in place?
This page:
http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011
has this og:url:
<meta content="http://www.hometalkin.com/seattle/million-things-to-do-seattle/washington-brewers-festival-2011/"
property="og:url" />
but when you actually go to (in fact - when Facebook crawler tries to go to) this URL from og:url the site redirects you back to:
http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011/
This is a circular reference.
In order to fix it you need to change your og:url to:
http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011/
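The loop can be reproduced with a toy model of the crawler. This is only a sketch of the behaviour described in this thread (follow HTTP redirects, then follow og:url, stop when a URL points at itself), not Facebook's actual algorithm; resolve_canonical and its two dict arguments are made up for the example.

```python
def resolve_canonical(start_url, redirects, og_urls, max_hops=10):
    """Follow redirects and og:url hints until a page points at itself.

    redirects: dict mapping a URL to the Location of its HTTP redirect
    og_urls:   dict mapping a URL to the og:url declared on that page
    Returns (canonical_url, path); raises ValueError on a cycle.
    """
    path = [start_url]
    current = start_url
    for _ in range(max_hops):
        if current in redirects:
            nxt = redirects[current]
        else:
            nxt = og_urls.get(current, current)
            if nxt == current:
                # Page is served directly and names itself: done.
                return current, path
        if nxt in path:
            raise ValueError("circular redirect: " + " -> ".join(path + [nxt]))
        path.append(nxt)
        current = nxt
    raise ValueError("too many hops")


new = "http://seattle.findwell.com/million-things-to-do-seattle/washington-brewers-festival-2011/"
old = "http://www.hometalkin.com/seattle/million-things-to-do-seattle/washington-brewers-festival-2011/"

# Current setup: the new page names the old URL, which 301s back to the new page.
try:
    resolve_canonical(new, redirects={old: new}, og_urls={new: old})
except ValueError as err:
    print(err)

# After the fix: og:url names the new page itself, so resolution stops there.
print(resolve_canonical(new, redirects={old: new}, og_urls={new: new}))
```

The first call fails because the path is new -> old -> new; the second returns the new URL after zero hops.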
I actually had a very difficult time with this when I was first starting out as a developer.
I made a tool for this exact purpose -- as I thought it might be helpful to others:
Facebook/Open Graph Like Button Generator
It generates (and stores) the open graph tag(s) so you don't need to put them in your page and the 'Redirect URL' tells it where to send all the traffic.
It detects the Facebook bot/scraper too so it won't interfere with anything :)
Good luck
I had exactly the same problem.
I changed my website's URL, and I have over 40,000 radio pages with Facebook likes and comments.
Example:
http://www.radioways.com/fr/radio/nrj.html
to
http://www.radioways.fr/radio/nrj.html
I spent days and days reading and checking the forum and did not find the answer...
Here is what to do:
1. In the og:url (canonical) tag, put your old website URL.
2. In the iframe URL, you can put either one. The like count shown will be the sum of the likes for the og:url address and the likes for the iframe address.
And this will not work (which is why I wasted so much time) until you validate the setup with FB.
Best regards.