Some time ago our users reported problems with sharing content from our page (text/image wouldn't show up in the share dialog). After some research we added the og:image:width and og:image:height tags. To reduce loading time, the Facebook scraper receives the page with an empty <body> (we had trouble with timeouts too).
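For context, serving the scraper a different page is usually keyed off the request's User-Agent. A minimal sketch of that check, assuming the crawler User-Agent strings Facebook documents; the helper name is mine, not from our actual setup:

```python
# Sketch: decide whether a request comes from Facebook's scraper by its
# User-Agent, so the server can return a stripped page (empty <body>,
# og tags only). The function name is illustrative.

FB_SCRAPER_TOKENS = ("facebookexternalhit", "Facebot")

def is_facebook_scraper(user_agent):
    """Return True if the User-Agent belongs to Facebook's crawler."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in FB_SCRAPER_TOKENS)

# Example: the crawler identifies itself like this
print(is_facebook_scraper(
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"))  # True
print(is_facebook_scraper("Mozilla/5.0 (Windows NT 10.0) Firefox/110.0"))          # False
```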
Everything worked great until about a week ago. Pages that had already been scraped showed errors in the Object Debugger:
Error parsing input URL, no data was cached, or no data was scraped.
This error shows up nearly every time I click Show existing scrape information. Eventually it goes away (without re-scraping the page), but then the following error shows up:
The 'og:type' property is required, but not present.
The Sharing Debugger additionally shows the following error:
The parser's result for this metadata did not match the input metadata. Likely, this was
caused by the data being ordered in an unexpected way, multiple values being given for a
property only expecting a single value, or property values for a given property being
mismatched. Here are the input properties that were not seen in the parsed result:
'fb:admins, og:type, og:description, og:title, og:site_name, og:image:url, og:image:width,
og:image:height'
Sometimes it also says that our images are too big and couldn't be downloaded, even though the image is shown in the preview. Sometimes it even goes as far as showing:
Could not scrape URL because it has been blocked
What doesn't add up here is that if I click on See exactly what our scraper sees for your URL, it shows me the source of our page with the empty <body> and the og: meta tags in the <head>.
The debugger shows me the correct og:url, og:type, og:title, og:description and og:image, the preview is alright, the response code is 206, and the last scrape was sometime in August.
After re-scraping a few times, the error messages are usually gone, but that can't be the solution. It seems as if the debugger reports random errors for whatever reason.
So what do these error messages really mean? Are they wrong? What am I missing here?
(Note: a third party is having trouble sharing our pages in their application due to those error messages; every time the errors appeared in the debugger, their data seemed to be broken somehow.)
After some back and forth with one of Facebook's support staff in the developer forum, they acknowledged my problem as a bug and assigned a team for further investigation. However, a few days later my bug report was closed with the following message:
Those messages are due to a bug in our Debugger when you scrape non-canonical URLs. In that case, the information about your URL is updated asynchronously, so it takes a little while for the error messages to go away.
If you input the canonical URL in the debugger, the error messages will go away after the first scrape.
Unfortunately, due to the way our systems work, we are not planning to fix that error in the near future.
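For what it's worth, you don't have to click the debugger repeatedly: the Graph API lets you force a re-scrape programmatically by POSTing the URL with scrape=true. A minimal sketch (the access token is a placeholder, and the helper name is illustrative):

```python
# Sketch: build the Graph API request that asks Facebook to re-scrape a URL,
# instead of clicking "Fetch new scrape information" in the debugger.
# A valid app access token is required; "APP_TOKEN" below is a placeholder.
import urllib.parse

GRAPH_ENDPOINT = "https://graph.facebook.com/"

def build_scrape_request(page_url, access_token):
    """Return the POST target that triggers a re-scrape of page_url."""
    params = urllib.parse.urlencode({
        "id": page_url,        # use the canonical URL, per Facebook's advice above
        "scrape": "true",
        "access_token": access_token,
    })
    return GRAPH_ENDPOINT + "?" + params

# The actual call would then be an HTTP POST to this URL, e.g. with requests.post().
print(build_scrape_request("https://example.com/page", "APP_TOKEN"))
```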
I have a simple Tumblr blog on which I post content.
However, since I changed my DNS, the Facebook Object Debugger sees really old data for my root URL, http://www.kofferbaque.nl/, and for every post (for instance http://kofferbaque.nl/post/96638253942/moodoid-le-monde-moo) it shows a 404 Not Found, which is bullshit because the actual content is there.
The full error message: Error parsing input URL, no data was cached, or no data was scraped.
I have tried the following things to fix it:
cleared my browser cache / cookies / history
appended ?fbrefresh=1 to the URL (didn't work)
added an FB app_id to the page (made sure the app was in production, added the correct namespaces, etc.; also didn't change anything)
checked out other questions regarding this subject
rechecked all my meta tags a dozen times
What other options are there to fix this issue?
If you need more info please ask in the comments.
2014-09-08 - Update
When I throw my URL into the static debugger, https://developers.facebook.com/tools/debug/og/echo?q=http://www.kofferbaque.nl/, the 'Net' tab in Firebug gives the following response:
<meta http-equiv="refresh" content="0; URL=/tools/debug/og/echo?q=http%3A%2F%2Fwww.kofferbaque.nl%2F&_fb_noscript=1" /><meta http-equiv="X-Frame-Options" content="DENY" />
2014-09-11 - Update
removed the duplicate <!DOCTYPE html> declaration
cleaned up the <html> start tag (i.e. removed IE support temporarily)
I placed a test blog post to see if it would work; it didn't. Somehow my root URL started 'magically' updating itself, or rather, it removed the old data, probably because I removed the old app it was still referring to. However, it still doesn't see the 'newer' tags correctly.
Still no success.
2014-09-12 - Update
Done:
moved the <meta> tags to the top of the <head> element
removed fb:app_id from the page + the body script, since it serves no purpose
This apparently doesn't change anything. It also appears that Tumblr injects lots of script tags at the start of the head element. Maybe that is why the Facebook scraper doesn't 'see' the meta tags.
The frustrating bit is that another og tag scanner, http://iframely.com/debug?uri=http%3A%2F%2Fkofferbaque.nl%2F, shows all the correct info.
First, the HTML is not valid. You have the doctype twice (at least on the post page), and there is content before the html tag (a script tag and IE conditionals).
This may be the problem, but make sure you put the og tags together at the beginning of the head section. The debugger only reads part of the page, AFAIK, so make sure the og tags are in that part. Put all the other og tags right after og:site_name.
Btw: ?fbrefresh=1 is not really necessary; you can use ANY parameter, just to create a different URL. But the debugger offers a button to refresh the scrape, so it's useless anyway.
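If you want to check that placement mechanically, here is a rough sketch; the 4096-byte cutoff is a guess, since Facebook doesn't document how much of the page the scraper actually reads:

```python
# Sketch: verify that every og: meta tag in a page occurs near the start of
# the document, on the (undocumented) assumption that the scraper may only
# read the beginning of the page. The regex match is deliberately simple.
import re

OG_PROP = re.compile(r'property=["\']og:')

def og_tags_in_head_start(html, limit=4096):
    """True if all og: meta properties appear within the first `limit` chars."""
    all_tags = OG_PROP.findall(html)
    early_tags = OG_PROP.findall(html[:limit])
    return len(all_tags) > 0 and len(all_tags) == len(early_tags)

html = '<head><meta property="og:title" content="t"/></head><body></body>'
print(og_tags_in_head_start(html))  # True
```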
I'm trying to troubleshoot a specific behavior. The last entry I wrote on a WordPress blog returns no data when run through Facebook's Object Debugger (linter). I just get an "Error Parsing URL: Error parsing input URL, no data was scraped."
However, if I try any previous post, all seems to be fine: the linter scrapes the page correctly.
If the Facebook button under the problematic entry is clicked, a snippet is correctly produced, except for the image thumbnail; the permalink, summary, and everything else are correct.
When I examine the source code of the permalink entry in my browser, I can see all tags correctly displayed, even the og:image tag (the URL is valid).
This behavior is sudden. I hadn't experienced any problems since I set up the Facebook Open Graph protocol on my blog.
P.
Got it. Sometime between the time I created my previous entry and the time I wrote the new one, the CDN (content delivery network) I run my blog through stopped working.
The Facebook linter wasn't happy because it couldn't find the image, since the image was no longer distributed through the CDN (the CDN handles media, not plain text, which is why Facebook was still able to scrape the title, summary, etc.).
Lesson learned: when running tests with the Facebook Object Debugger, first disable any caching layer (or make sure it works properly), or it may affect the results.
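A quick way to apply that lesson is to pull the og:image URL out of the page yourself and check that it resolves, before trusting the debugger's verdict. A simplified sketch (a real check would use an HTML parser rather than a regex):

```python
# Sketch: extract the og:image URL from a page so it can be checked directly
# against the CDN. Regex extraction is a simplification for illustration.
import re

OG_IMAGE = re.compile(
    r'<meta\s+property=["\']og:image["\']\s+content=["\']([^"\']+)["\']')

def extract_og_image(html):
    """Return the og:image content URL, or None if the tag is absent."""
    m = OG_IMAGE.search(html)
    return m.group(1) if m else None

# To verify reachability against the CDN (not run here):
#   import urllib.request
#   req = urllib.request.Request(image_url, method="HEAD")
#   urllib.request.urlopen(req).status  # expect 200

print(extract_og_image(
    '<meta property="og:image" content="http://cdn.example.com/a.jpg"/>'))
```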
I'm currently using the FB URL linter and can see that everything is there correctly; however, when a user likes the page, it misses key information.
The linter also states there is missing content, yet it shows that all the tags are there. Is there anything wrong with my code?
This is the result from the Facebook URL linter:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fdev.murchison-hume.com%2FProducts%2FSuperlative-Liquid-Hand-Soap%2FCoriander-Superlative-Liquid-Hand-Soap-Refill
It shows conflicting messages stating that things don't exist, when they obviously do, as they are outlined right below the error messages...
Any help greatly appreciated.
Recheck your og:type contents. The first error is telling you that you don't own the object type. For custom objects, you usually need to pass in
<meta property="og:type" content="MY_APP_NAMESPACE:product"/>
If that fails: og:type errors usually come up when you fail to set up your objects correctly in your Open Graph settings (at developers.facebook.com).
I suspect that fixing the first error will make the others disappear.
Having issues getting Timeline to work. It is a two-part problem.
First, there is an issue with caching parts of the OG meta tags. When the debugger goes to my URL, I know it is hitting it correctly, because the og:url it spits back is correct, which means the request was processed on my end (e.g. I send it to og.php?og=read&chapter=799, and it spits back the right book_id for the og:url, meaning my script processed it). But all the other information seems to be cached.

I originally and erroneously had an fb:app_id and og:site_url for an object, so I removed those. The output still shows an existing site_url, which throws an error. Having an fb:app_id forces an og:type of 'website', which I have set (correctly) to my namespace and object. When I try to POST the action, I get an OAuthException error back saying that an og:type of 'website' isn't valid for an object. Once again, that should be fixed, but it keeps caching the old OG data. I have tried adding ?fbrefresh=1, but that did nothing.
Another issue, possibly related: even though I know the request got there and my script processed it, Facebook doesn't report that. When I click on "See exactly what our scraper sees for your URL", it shows the authentication URL (see below)! As though the request never got there and the login popup was initiated, which isn't even how the code for og.php works! My guess is they got that from the base domain name itself (example.com) before trying the full request to example.com/og.php.
window.parent.location='https://www.facebook.com/dialog/oauth?client_id=164431733642252&redirect_uri=http%3A%2F%2Fapps.facebook.com%2Fexample%2F%3Fpage%3D&state=064bd26ff582a9ec7c96729e6b69bbd2&canvas=1&fbconnect=0&scope=email%2Cpublish_stream%2Cpublish_actions%2C';
I figured it out. I thought the og:url was the URL you wanted people to use to get to the correct page in your app, like an action link. It is, but it isn't. I now have it match the OBJECT_URL you send to the timeline.
I had a different URL (an action link into the app), which, when redirected, can't be reached by the crawler because it sits behind the application's authorization wall. This caused the og:type of 'website' and the data to appear cached.
To fix it, the object_url I post to the timeline and the og:url in the meta tag are now the same. You can still tell whether a request comes from the crawler or from the action link by looking for the query string ?fb_action_ids=SOME_ID, which is sent from the link on the timeline. If the request contains it, I forward the user to the needed application page from there.
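That routing check can be sketched like this (the function name is illustrative; the original is a PHP script, so this is just the logic):

```python
# Sketch: distinguish a timeline link click from a crawler fetch by the
# presence of the fb_action_ids query parameter. A click carries the
# parameter and should be forwarded to the real application page; a bare
# request is treated as the crawler reading object metadata.
from urllib.parse import urlparse, parse_qs

def is_timeline_click(url):
    """True if the URL carries the fb_action_ids query parameter."""
    query = parse_qs(urlparse(url).query)
    return "fb_action_ids" in query

print(is_timeline_click("http://example.com/og.php?og=read&fb_action_ids=123"))  # True
print(is_timeline_click("http://example.com/og.php?og=read&chapter=799"))        # False
```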
I'm having similar problems. It kept complaining about og:site_url being set, even though I never set it. The error messages it sends appear to be inaccurate: the problem is not that og:site_url is set, but that the og:url differs from the object URL. Sometimes a wrong error message is worse than no error message!
A further question is why an object URL has to correspond to a live page that a user will see. An object is a logical unit; it doesn't necessarily correspond to a single user-visible page. Your redirection trick might work, but it is not the proper way to do this. When I post an action related to an object, the object URL should be used to draw the object's information, but I should be able to send the user somewhere else. If this was an intended design, I think it is a mistake.
When debugging a website with a Like button here:
https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.arcosjapan.com%2Fscroll%2Fcaresox-hc.html
It throws a critical error:
"You must preload this data. TAAL[BLAME_file]".
I have installed hundreds of Like buttons and have never run across this. Does anybody know what it means? I am aware of the other issue it points out on that page ("The app ID specified within the "fb:app_id" meta tag is not allowed on this domain."), and I am pretty sure that is not the issue, as my app is correctly configured.
The Facebook Debugger breaks every once in a while. Now it seems to be working again; it gave the same error message just a couple of days ago.