Facebook changes my URL variable values when scraping it?

OK, here is a weird one...
Try putting this URL (http://dev.bride.ca/wedding-dresses/index.cfm?page=1&shopRegions=162&GownTypeID=1&maxGownPrice=1000&GownLabelID=12) through the Facebook debugger.
Here, just click this: http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fdev.bride.ca%2Fwedding-dresses%2Findex.cfm%3Fpage%3D1%26shopRegions%3D162%26GownTypeID%3D1%26maxGownPrice%3D1000%26GownLabelID%3D12
See the weird change?
When it creates a canonical URL to parse, it takes the very last URL variable (GownLabelID=12) and changes it to "GownLabelID=0".
Actually, it does the same with the previous variable: maxGownPrice=1000 becomes maxGownPrice=0.
But it does not do this to all the variables, just those two. And it is not simply whichever two come last: I swapped the order, moved the last one to the beginning, and it still changed the same variable.
I am out of ideas... anyone?
Thanks in advance! (Since this is on the DEV server, feel free to try different things. It is not live to consumers.)

When it creates a canonical URL to parse, it takes the very last URL variable (GownLabelID=12) and changes it to "GownLabelID=0"
The scraper does not “create” the canonical URL; it reads it from your page’s OG meta tags.
You are the one that put
<meta property="og:url" content="http://dev.bride.ca/wedding-dresses/index.cfm?page=1&shopRegions=162&GownTypeID=1&maxGownPrice=0&GownLabelID=0"/>
into the page’s HTML code, and the scraper just follows what you’re telling it.
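The page itself is ColdFusion (index.cfm), so what follows is only an illustration of the logic, written in JavaScript. It is a minimal sketch that assumes the fix is to build og:url from the query string the request actually carried, not from template variables that may still hold defaults such as 0. The parameter list comes from the question. Treating exactly those five parameters as the page's identity is an assumption.

```javascript
// Hypothetical list of the parameters that identify a distinct listing page,
// taken from the question's URL. The order here defines the canonical order.
const IDENTITY_PARAMS = ["page", "shopRegions", "GownTypeID", "maxGownPrice", "GownLabelID"];

// Build og:url from the values the request actually carried, never from defaults.
function canonicalOgUrl(requestUrl) {
  const incoming = new URL(requestUrl);
  const canonical = new URL(incoming.origin + incoming.pathname);
  for (const name of IDENTITY_PARAMS) {
    const value = incoming.searchParams.get(name);
    if (value !== null) canonical.searchParams.set(name, value); // skip params that were not sent
  }
  return canonical.toString();
}

// Escape the URL so it is safe inside an HTML attribute.
function ogUrlMetaTag(requestUrl) {
  const href = canonicalOgUrl(requestUrl).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<meta property="og:url" content="${href}"/>`;
}
```

Because the parameters are written out in a fixed order, a reordered URL or one with tracking parameters still maps to a single canonical og:url.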

Related

Facebook Object Debugger returns 404 not found when trying to scrape

I have a simple Tumblr website blog, upon which I post content.
However, since I changed my DNS, the Facebook Object Debugger sees really old data for my root URL, http://www.kofferbaque.nl/, and for every post (for instance http://kofferbaque.nl/post/96638253942/moodoid-le-monde-moo) it shows a 404 Not Found, which is nonsense because the actual content is there.
The full error message: Error parsing input URL, no data was cached, or no data was scraped.
I have tried the following things to fix it:
clear browser cache / cookies / history
using ?fbrefresh=1 after the URL (didn't work)
I've added a FB app_id to the page (made sure the app was in production - added the correct namespaces etc. - also didn't change anything)
Checked out other questions regarding this subject
Rechecked all my meta tags a dozen times
What other options are there to fix this issue?
If you need more info please ask in the comments.
2014-09-08 - Update
When I put my URL into the static debugger (https://developers.facebook.com/tools/debug/og/echo?q=http://www.kofferbaque.nl/), the Net tab in Firebug shows the following response:
<meta http-equiv="refresh" content="0; URL=/tools/debug/og/echo?q=http%3A%2F%2Fwww.kofferbaque.nl%2F&_fb_noscript=1" /><meta http-equiv="X-Frame-Options" content="DENY" />
2014-09-11 - Update
removed duplicate <!DOCTYPE html> declaration
cleaned up the <html> start tag (i.e. temporarily removed IE support)
I've placed a test blog post to see if it would work; it didn't. Somehow my root URL started 'magically' updating itself, or rather, the old data was removed, probably because I removed the old app it was still referring to. However, it still doesn't see the 'newer' tags correctly.
Still no success.
2014-09-12 - Update
Done:
moved the <meta> tags to the top of the <head> element
removed fb:app_id from the page, along with the body script, since it served no purpose.
This apparently doesn't change anything. It also appears that Tumblr injects lots of script tags at the start of the head element. Maybe that is why the Facebook scraper doesn't 'see' the meta tags.
The frustrating bit is that another OG tag scanner, http://iframely.com/debug?uri=http%3A%2F%2Fkofferbaque.nl%2F, shows all the correct info.
First, the HTML is not valid. You have the doctype twice (at least on the post page), and there is content before the html tag (a script tag and IE conditionals).
That may be the problem, but also make sure you put the og tags together at the beginning of the head section. As far as I know, the debugger only reads part of the page, so make sure the og tags are in that part. Put all the other og tags right after "og:site_name".
Btw: ?fbrefresh=1 is not really necessary. You can use ANY parameter, because its only purpose is to create a different URL. But the debugger offers a button to refresh the scraping, so it's useless anyway.
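To check whether the og tags land early enough in the page, this small sketch reports which og:* properties appear within the first N bytes of the HTML. The answer does not say how much of the page the debugger reads, so the 10 KB default is a placeholder assumption. The regex is a rough scan, not an HTML parser.

```javascript
// Placeholder budget: how much of the page the scraper reads is not documented here.
const DEFAULT_LIMIT_BYTES = 10 * 1024;

// Rough scan for <meta ... property="og:..."> tags (not a full HTML parser).
const OG_META = /<meta\s[^>]*property=["'](og:[^"']+)["']/gi;

function ogTagsWithinPrefix(html, limitBytes = DEFAULT_LIMIT_BYTES) {
  // Truncate by bytes, since a size limit would apply to bytes, not characters.
  const prefix = Buffer.from(html, "utf8").subarray(0, limitBytes).toString("utf8");
  const inPrefix = new Set([...prefix.matchAll(OG_META)].map((m) => m[1]));
  const everywhere = new Set([...html.matchAll(OG_META)].map((m) => m[1]));
  return {
    inPrefix: [...inPrefix],
    beyondPrefix: [...everywhere].filter((p) => !inPrefix.has(p)),
  };
}
```

Anything reported in `beyondPrefix` is a candidate for moving up, ahead of injected scripts.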

Variable og: tags

Hello and thanks for checking out my question.
<meta property="og:image" content="http://www.myurl.com/images/test.png"/>
I have an application that is intended to serve multiple "campaigns", each with its own graphic and description that I would like to show when someone likes the page. I can update the meta tags all I want, but FB will still only use the most recently scraped data (from the nightly scrape or from the linter).
I read that I might be able to cURL the linter to have it pull the new data right before the like is sent, but what happens when I am serving hundreds of people or more and multiple campaigns?
Is there any way around this? I have not found any solid solutions after several hours of searching.
tl;dr
I want my posted likes to respect the current meta tags and ignore or update the FB cache for that data.
Each object/page needs its own URL, even if those URLs are handled by the same code. Using a query string parameter in the URL to identify different objects is the most common way to achieve this; server-side URL rewriting is another.
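This is a minimal sketch of that approach. It assumes a `campaign` query parameter and an in-memory campaign table, both hypothetical, as is the like.php path used in the example. Every campaign gets its own URL, and the same code renders OG tags whose og:url points back to that campaign's URL. Each campaign is therefore scraped and cached separately.

```javascript
// Hypothetical campaign data; in practice this would come from your database.
const CAMPAIGNS = new Map([
  ["spring", { title: "Spring campaign", description: "Spring offer", image: "http://www.myurl.com/images/spring.png" }],
  ["summer", { title: "Summer campaign", description: "Summer offer", image: "http://www.myurl.com/images/summer.png" }],
]);

// One distinct URL per campaign, all handled by the same page.
function campaignUrl(baseUrl, campaignId) {
  const url = new URL(baseUrl);
  url.searchParams.set("campaign", campaignId);
  return url.toString();
}

function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Render the OG tags for whichever campaign the request URL identifies.
function campaignOgTags(requestUrl) {
  const incoming = new URL(requestUrl);
  const id = incoming.searchParams.get("campaign");
  const campaign = CAMPAIGNS.get(id);
  if (!campaign) return null; // unknown or missing campaign: caller decides (404, default page, ...)
  const pageUrl = campaignUrl(incoming.origin + incoming.pathname, id);
  return [
    `<meta property="og:url" content="${escapeAttr(pageUrl)}"/>`,
    `<meta property="og:title" content="${escapeAttr(campaign.title)}"/>`,
    `<meta property="og:description" content="${escapeAttr(campaign.description)}"/>`,
    `<meta property="og:image" content="${escapeAttr(campaign.image)}"/>`,
  ].join("\n");
}
```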

Facebook caching issue [duplicate]

I'm having trouble with my Open Graph meta tags. It seems as though Facebook is caching old values of my meta tags: old values for the og:title and og:url attributes are still used, even though I have already changed them.
I ran Lint on a page in my site, and this appeared:
Notice that there are two values each for og:title and og:url, and the last one prevailed. However, the last two entries are the OLD entries that I used for this site. I am currently using these meta tags (you can verify this by viewing the HTML source):
<meta property="og:title" content="Smart og rummelig pusletaske fra Petit Amour med god plads til alt – værdi 1.099 kr – køb nu kun 599 kr "/>
<meta property="og:description" content="Pinq.dk - Det gode liv for det halve"/>
<meta property="og:type" content="product"/>
<meta property="og:url" content="http://pinq.dk/tilbud/landsdaekkende/lissy/"/>
<meta property="og:image" content="http://pinq.dk/wp-content/themes/pinq/images/logo-top.png"/>
<meta property="og:site_name" content="Pinq" />
<meta property="fb:app_id" content="161840830532004" />
Why is Facebook caching og:title and og:url? Is anyone experiencing the same issue?
Go to http://developers.facebook.com/tools/debug
Enter the URL followed by fbrefresh=CAN_BE_ANYTHING
Examples:
http://www.example.com?fbrefresh=CAN_BE_ANYTHING
http://www.example.com?postid=1234&fbrefresh=CAN_BE_ANYTHING
OR visit:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.example.com%2F%3Fp%3D3568%26fbrefresh%3D89127348912
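That debugger link only works if the page URL passed in `q` is percent-encoded. Otherwise `&fbrefresh=...` is read as a parameter of the debugger URL, not of your page. Here is a small sketch that builds such links with the standard URL API. The debugger path is the legacy one quoted in this answer.

```javascript
// Build the legacy debugger link; URLSearchParams percent-encodes the page URL.
function debuggerLink(pageUrl) {
  const link = new URL("http://developers.facebook.com/tools/debug/og/object");
  link.searchParams.set("q", pageUrl);
  return link.toString();
}

// Append (or replace) an fbrefresh parameter on the page URL itself.
function withRefreshParam(pageUrl, token = Date.now().toString()) {
  const url = new URL(pageUrl);
  url.searchParams.set("fbrefresh", token);
  return url.toString();
}
```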
I was having the same issue last night, and I got this solution from some website.
Facebook caches your thumbnail. It won't refresh even if you delete the thumbnail/image from your server, but Facebook lets you force a refresh by using fbrefresh.
The most upvoted answer is quite outdated.
These are the only 2 options that should be used as of November 2014:
For non developers
Use the FB Debugger: https://developers.facebook.com/tools/debug/og/object
Paste the url you want to recache. (Make sure you use the same url included on your og:url tag)
Click the "Fetch Scrape information again" button
For Developers
Make a GET call programmatically to this URL: https://graph.facebook.com/?id=[YOUR_URL_HERE]&scrape=true (see: https://developers.facebook.com/docs/games_payments/takingpayments#scraping)
Make sure the og:url tag included on the head on that page matches with the one you are passing.
You can even parse the JSON response to get the number of shares of that URL.
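Here is a sketch of the developer option for Node 18+, which ships a global fetch. It makes two assumptions beyond this answer. First, the request is sent as a POST, which is the form Facebook's sharing documentation describes. Second, an access token is attached, because newer Graph API versions require one. `accessToken` is a value you must supply yourself. The response shape is not specified here, so the function just returns the parsed JSON.

```javascript
// Build the re-scrape request: POST id=<page URL>&scrape=true to the Graph API.
// Assumption: newer Graph API versions reject this call without an access token.
function buildScrapeRequest(pageUrl, accessToken) {
  const body = new URLSearchParams({ id: pageUrl, scrape: "true" });
  if (accessToken) body.set("access_token", accessToken);
  return {
    url: "https://graph.facebook.com/",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Send it (Node 18+ global fetch) and return the parsed JSON response.
async function rescrape(pageUrl, accessToken) {
  const { url, options } = buildScrapeRequest(pageUrl, accessToken);
  const res = await fetch(url, options);
  const json = await res.json();
  if (!res.ok) throw new Error(`Scrape failed (${res.status}): ${JSON.stringify(json)}`);
  return json;
}
```

Pass exactly the URL that appears in the page's og:url, e.g. `rescrape("http://www.example.com/", process.env.FB_ACCESS_TOKEN)`, where FB_ACCESS_TOKEN is a hypothetical environment variable.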
Additional Info About Updating Images
If the og:image URL stays the same but the image itself has changed, Facebook's scrapers will neither update nor re-cache it, even if you do the above (even appending ?last_update=[TIMESTAMP] to the image URL didn't work for me).
The only effective workaround for me has been to assign a new name to the image.
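One way to automate "assign a new name" is to derive the filename from the image's bytes. The og:image URL then changes exactly when the image does. This sketch uses Node's built-in crypto and path modules. The `name.<8 hex chars>.ext` scheme is an arbitrary choice, not a Facebook requirement.

```javascript
const { createHash } = require("crypto");
const path = require("path");

// Return e.g. "logo-top.<8 hex chars>.png": the same bytes give the same name,
// changed bytes give a new name (and therefore a new og:image URL).
function versionedImageName(fileName, contents) {
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const ext = path.extname(fileName);
  const base = path.basename(fileName, ext);
  return `${base}.${hash}${ext}`;
}
```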
Note regarding image or video updates on existing posts:
When you use the debugger to scrape changes to your page's og: tags, all previous Facebook shares of that URL will still show the old image/video. There is no way to update all previous posts; this is by design, for security reasons. Otherwise, someone could make it look as though a user shared something that he/she actually didn't.
If you have many pages and don't want to refresh them manually, you can do it automatically.
Let's say you have a user profile page with a photo:
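<?php
// $user_profile and $user_photo are assumed to already hold paths relative to the site root
$url = 'http://' . $_SERVER['HTTP_HOST'] . '/' . $user_profile;
$user_photo = 'http://' . $_SERVER['HTTP_HOST'] . '/' . $user_photo;
?>
<meta property="og:url" content="<?php echo htmlspecialchars($url); ?>"/>
<meta property="og:image" content="<?php echo htmlspecialchars($user_photo); ?>"/>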
Just add this to your page:
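// with jQuery: ask the Graph API to re-scrape this page's URL
$.post(
  'https://graph.facebook.com',
  {
    id: <?php echo json_encode($url); ?>, // json_encode() emits a safely quoted JS string
    scrape: true
  },
  function (response) {
    console.log(response); // scraped object, including "updated_time"
  }
);
// with "vanilla" JavaScript: the same request, form-encoded by hand
var fbxhr = new XMLHttpRequest();
fbxhr.open("POST", "https://graph.facebook.com", true);
fbxhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
// encodeURIComponent keeps "&" or "?" inside the page URL from splitting the form body
fbxhr.send("id=" + encodeURIComponent(<?php echo json_encode($url); ?>) + "&scrape=true");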
This will refresh Facebook's cache. If you use the jQuery version, look at the "response" logged by console.log: you will find an "updated_time" field and other useful information there.
The OG thumbnail does not seem to refresh even if passing the fbrefresh variable.
To update it without waiting for the automatic clearing, you'll need to change the thumbnail's filename (and thus the associated meta tag value) and then refresh.
Basically, the answer is patience ;)
I checked the Linter this morning, and og:title and og:url display correctly, without the redundant values. I guess Facebook automatically clears its cache at some interval; I just had to wait.
I had the same issue with og:image. Several attempts to rename the file or clear the FB cache did not work, either via the Facebook debugger or by testing with an actual account.
The new Facebook guidelines state the image should be 1200 x 630 or have that aspect ratio. This seemed to be wrong: the only thing that worked for me was using an image with square dimensions.
Edit: A few hours later I went back to 1200 x 630 and it magically worked.
I also renamed the files to f*^*kfacebook.jpg, not sure it helped but it felt good.
Yes, Facebook automatically clears the cache: it re-scrapes pages and updates its cache every 24 hours (https://developers.facebook.com/docs/reference/plugins/like/#scraperinfo).
Ooook, finally it helped (I use IP.Board). What I had to do was:
Change url of og:image on my website (General configuration).
Try this method with ?fbrefresh=1154464gd56
Thanks to author for this thread!
EDIT: You also need to keep the image requirements in mind. As of now (January 2013) they are:
- at least 200 px in both directions
- maximum ratio 3:1
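Those two rules translate directly into a check you can run before publishing an og:image. This sketch encodes only the January 2013 numbers quoted above, and Facebook's current limits may differ. Reading the dimensions from the image file is left out.

```javascript
// January 2013 rules quoted above: >= 200 px on each side, aspect ratio <= 3:1.
const MIN_SIDE_PX = 200;
const MAX_RATIO = 3;

function meetsOgImageRequirements(width, height) {
  if (width < MIN_SIDE_PX || height < MIN_SIDE_PX) return false;
  const ratio = Math.max(width, height) / Math.min(width, height); // orientation-independent
  return ratio <= MAX_RATIO;
}
```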
Visit the FB page https://developers.facebook.com/tools/debug/og/object/
Enter your domain.
Click the button "Fetch new scrape information"
Done
I'm sorry folks, but the correct answer is:
There is no foolproof way to update the Open Graph og:image URL with an immediate result. It is cached until Facebook updates it (reportedly every 24 hours).
Here are things that have been reported to work by others but I have had ZERO success with any of them.
Choosing "Fetch new scrape information"
Changing the actual image filename and/or deleting the original
Adding a query string to the image url by appending a PHP TIMESTAMP or ?anything
Adding the "...yoursite.com/?fbrefresh=anything" query string to the debugger fetch url
Choosing the graph API link at the bottom of the og dev page
Choosing "See exactly what our scraper sees for your URL": this does not appear to request real-time, uncached scrape data; it still shows the cached image URL even if the file no longer exists
Inspecting your code is always a spot-on way to confirm it is not an issue with the browser cache or some caching service. If the meta information is up to date in your code and you've tried all of the above (unless another suggestion comes to fruition), the correct answer is that you can do nothing but wait.
We just ran into this. As it turns out, we weren't linting the right URL, since the real URL had a query string (duh, a different page as far as a bot is concerned).
http://example.com/
!==
http://example.com/?utm_campaign=foo
The linter will recache your page, you don't have to wait.
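Here is a sketch that computes the URL to lint. It assumes every utm_* parameter is tracking-only and can be dropped, so keep any parameter that actually selects different content. The path is copied unchanged because, as the next answer points out, the linter treats URLs as case-sensitive.

```javascript
// Rebuild the URL without tracking parameters or fragment; keep everything else.
function urlToLint(sharedUrl) {
  const incoming = new URL(sharedUrl);
  const clean = new URL(incoming.origin + incoming.pathname); // path case preserved
  for (const [name, value] of incoming.searchParams) {
    if (!name.toLowerCase().startsWith("utm_")) clean.searchParams.append(name, value);
  }
  return clean.toString();
}
```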
One thing to add: the URL is case-sensitive. Note that:
apps.facebook.com/HELLO
is different, in the linter's eyes, than
apps.facebook.com/hello
Be sure to use the exact site url that was entered in the developer settings for the app. The linter will return the properties otherwise but will not refresh the cache.
I've found out that if your image is 72dpi it will give you the image size error. Use 96dpi instead. Hope this helps.
Go to http://developers.facebook.com/tools/debug
Paste in the URL of the page and click Debug. If your site uses URL aliases, make sure you are using the same URL that Facebook is using for the page you are sharing (for example, in Drupal use the node/* path instead of the alias if the page is shared via that URL).
In the "Share preview" section, click the "See this in the share dialog" link.
Facebook's developer documentation says the title property has an exception:
Once 50 actions (likes, shares and comments) have been associated with an object, you won't be able to update its title
https://developers.facebook.com/docs/sharing/opengraph/using-objects#update
Had a similar experience. The website link was showing a 404 in the preview that Facebook generated. It turned out the og:url metadata was wrong. We had already fixed it a few days back but were still seeing a 404 on the preview. We used the tool at https://developers.facebook.com/tools/debug/ and that forced the refresh (we didn't have to append any parameters, by the way).
In our case, Facebook didn't refresh the cache after 24 hours but the tool helped force it.
It is a cache; of course it refreshes, that's what a cache is meant to do once in a while. So waiting will eventually work, but sometimes you need it to happen faster. Changing the filename works.
I was having this issue too. The scraper shows the right information, but the share url was still populated with old data.
The way I got around this was to use the feed method instead of share, and then populate the data manually (which the share method doesn't expose).
Something like this:
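// class-property arrow function (uses this.props, e.g. inside a React component)
shareToFB = () => {
  window.FB.ui({
    method: 'feed',
    // absolute URL with scheme; the referrer id is URL-encoded
    link: `https://signup.yourdomain.com/?referrer=${encodeURIComponent(this.props.subscriber.sid)}`,
    name: 'THIS WILL OVERRIDE OG:TITLE TAG',
    description: 'THIS WILL OVERRIDE OG:DESCRIPTION TAG',
    caption: 'THIS WILL OVERRIDE THE OG:URL TAG'
  });
};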
Really easy solution. Tested and working. You just need to generate a new URL when you update your meta tags. It's as simple as adding "?cacheBuster=1" to your URL (or "&cacheBuster=1" if it already has a query string). If you change the meta tags again, just increment it to "cacheBuster=2".
Original URL:
www.example.com
URL when og meta tags are updated:
www.example.com?cacheBuster=1
URL when og meta tags are updated again:
www.example.com?cacheBuster=2
Facebook will treat each like a new url and get fresh meta data.
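Incrementing the parameter by hand is easy to forget, so here is a sketch that does it with the URL API, using the `cacheBuster` name from this answer. Where you store the current value (config, database, build step) is up to you. As the earlier answers note, the page's og:url should then carry the same new URL.

```javascript
// Return the URL with cacheBuster incremented (starting at 1), other params untouched.
function bumpCacheBuster(pageUrl) {
  const url = new URL(pageUrl);
  const current = Number.parseInt(url.searchParams.get("cacheBuster") ?? "0", 10);
  const next = Number.isNaN(current) ? 1 : current + 1;
  url.searchParams.set("cacheBuster", String(next)); // replaces in place if already present
  return url.toString();
}
```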
Years later this is still a common problem, but it's not always Facebook's cache: it is very often human error (allow me to elaborate).
OG:TYPE affects your image scrape:
https://ogp.me/#type_article is not the same as https://ogp.me/#type_website
Be aware that og:type=website will cause any /sub-pages/ of that url to become "canonical". This means you will have trouble getting your images to update using the scraper no matter what you do.
Consider this "assumption and common mistake":
-<meta property="og:type" content="website" /> => https://www.example.org (parent)
-<meta property="og:type" content="website" /> => https://www.example.org/sub-page/
-<meta property="og:type" content="website" /> => https://www.example.org/sub-page/child-2/
- Ergo: /sub-page/ and /child-2/ will inherit the og:image of the parent
Those are not all "websites": one is a website, the others are articles.
If you do that, Facebook will think all of those are canonical and will put the FIRST og:image into all of them (try it, you'll see). If you set the og:url to your root or parent domain, you've told Facebook they are all canonical (there is a good reason for that, but it's off topic).
Consider this solution (which is what most people "really want"):
-<meta property="og:type" content="article" /> => https://www.example.org/sub-page/
-<meta property="og:type" content="article" /> => https://www.example.org/sub-page/child-2/
If you do that, Facebook will give you far fewer problems scraping your NEW images.
In closing: YES, the cache busters, random vars, changing URLs and other suggestions here can work, but they will seem like "intermittent voodoo" if og:type is not specified correctly.
PS: remember that a CDN or server-side cache will serve its copy to Facebook's scraper even if you "think" you can see the most recent version. (I won't spend any time on this other than to point out that it will waste colossal amounts of your time if not double-checked.)
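Here is a sketch of that rule. It assumes the site root is the only "website" and every deeper path is an "article" with its own og:url and og:image. Real sites may need a finer rule than path depth. The example.org URLs mirror the ones above.

```javascript
// Root page is a "website"; every sub-page is an "article" (simplifying assumption).
function ogTypeFor(pageUrl) {
  const { pathname } = new URL(pageUrl);
  return pathname === "/" ? "website" : "article";
}

// Each page declares its own og:url and og:image, so nothing is inherited from the parent.
function ogTags(pageUrl, imageUrl) {
  const esc = (v) => String(v).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return [
    `<meta property="og:type" content="${ogTypeFor(pageUrl)}" />`,
    `<meta property="og:url" content="${esc(pageUrl)}" />`,
    `<meta property="og:image" content="${esc(imageUrl)}" />`,
  ].join("\n");
}
```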
I had a different but similar problem with Facebook recently, and found that the scraper/debug page mentioned simply does not appear to read any page in its entirety. My meta properties for Open Graph were further down in the head section, and the scraper would constantly inform me that the image specification was not correct, and would use a cached version regardless. I moved the Open Graph tags further up in the code, near the very top of the page, and then everything worked perfectly, every time.
I had the same problem with FB and Twitter caching old metadata. This threw me for a loop: I kept editing the code but nothing changed. I finally discovered they had cached my first request. Adding a query string to the URL worked for Twitter but not for FB (for me).

Caching OG data in the debugger, forcing og:type website, and authorization popup for "See exactly what our scraper sees for your URL" woes

Having issues getting Timeline to work. It is a two part problem.
First, there is an issue with parts of the OG meta tags being cached. When the debugger goes to my URL, I know it is hitting it correctly because the og:url it spits back is correct, which means it has been processed on my end (e.g. I send it to og.php?og=read&chapter=799, and it spits back the right book_id for the og:url, meaning my script processed it). But all the other information seems to be cached. I originally, and erroneously, had an fb:app_id and og:site_url for an object, so I removed those. The output still shows an existing site_url, which is throwing an error. Having an fb:app_id forces an og:type of 'website', even though I have set og:type (correctly) to my namespace and object. When I try to POST the action, I get an OAuthException back saying that an og:type of 'website' isn't valid for an object. Again, that should be fixed, but it keeps using the old cached OG data. I have tried adding ?fbrefresh=1, but that did nothing.
Another issue, possibly related: even though I know the request got there and my script processed it, Facebook doesn't report that. When I click on "See exactly what our scraper sees for your URL" it shows the authentication URL (see below)! It is as though the request never got there and the popup was initiated, which isn't even how the code for og.php works! My guess is they got that from the base domain name itself (example.com) before trying the full request with example.com/og.php.
window.parent.location='https://www.facebook.com/dialog/oauth?client_id=164431733642252&redirect_uri=http%3A%2F%2Fapps.facebook.com%2Fexample%2F%3Fpage%3D&state=064bd26ff582a9ec7c96729e6b69bbd2&canvas=1&fbconnect=0&scope=email%2Cpublish_stream%2Cpublish_actions%2C';
I figured it out. I thought the og:url was the URL you wanted people to use to get to the correct page in your app, like an action link. It is, but it isn't. I now have it match the OBJECT_URL you send to timeline.
I had a different URL (an action link to the app) which, when redirected, can't be reached by the crawler because it is behind the application's authorization wall. This caused the og:type of website and made the data appear cached.
To fix it, the object_url I post to timeline and the og:url in the meta tag are now the same. You can still tell whether a request comes from the crawler or from the action link by looking for the query string ?fb_action_ids=SOME_ID, which is sent by the link on the timeline. If it is present, I forward the user to the needed application page from there.
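Here is a sketch of that routing decision in JavaScript. The original script is og.php, so this only illustrates the logic. `APP_PAGE_URL` is a hypothetical placeholder. Carrying the object's own parameters (such as chapter) through to the app is an assumption about what the app page needs.

```javascript
// Hypothetical app page that timeline visitors should land on.
const APP_PAGE_URL = "https://apps.facebook.com/example/";

// Same object URL for crawler and users: render OG tags unless the request
// came from a timeline link (which carries fb_action_ids), then redirect.
function routeObjectRequest(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.searchParams.has("fb_action_ids")) return { action: "render-og-tags" };
  const target = new URL(APP_PAGE_URL);
  for (const [name, value] of url.searchParams) {
    if (name !== "fb_action_ids") target.searchParams.append(name, value); // keep object params
  }
  return { action: "redirect", location: target.toString() };
}
```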
I'm having similar problems to you. It kept complaining about og:site_url being set, even though I never set those. It appears that the error messages it sends are actually inaccurate, and the problem is not that og:site_url is being set, but that the og:url is different from the object url. Sometimes a wrong error message is worse than no error message!
A further question is why an object URL has to correspond to a live page that a user will see. An object is a logical unit, but it doesn't necessarily correspond to a single user-visible page. Your redirection trick might work, but it is not the proper way to do this. When I post an action related to an object, the object URL should be used to draw the object's information, but I should be able to send the user somewhere else. If this is an intended design, I think it is a mistake.

facebook sharer og:url content not displayed after share

I have set the meta tags as shown in the example
Everything works fine except the og:url tag. Before I hit the share button, it displays the correct URL as set in the tag. But after I hit share, the shared message displays the site's domain name instead. For example, if I set og:url to www.helloworld.com/hellouniverse, the shared message (visible after I hit share) displays www.helloworld.com instead.
Does anyone have an idea how to fix this?
Thanks
As far as I know, this is the default behavior of facebook, they take the domain name from the og:url parameter and display it.
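If that is the behavior, the displayed text is simply the host part of og:url, as this sketch shows. When og:url lacks a scheme (as in www.helloworld.com/hellouniverse), the sketch adds `http://`. That is an assumption made only so the value can be parsed.

```javascript
// Host part of og:url, which is what the share card is described as showing.
function displayedDomain(ogUrl) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(ogUrl);
  return new URL(hasScheme ? ogUrl : `http://${ogUrl}`).hostname;
}
```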
I've tried several approaches to deal with it, but eventually came up with nothing.
Because it is their internal script, there isn't much you can do about it.