Object Debugger Error Scraping Page ... near solution? - facebook

I have a very strange issue while sharing a page, probably connected to DNS used by Facebook.
I usually share pages from my own sites with no problem. In only one new site, I cannot correctly share any page.
where is the problem?
If I try to share a page from this new site (www.tarocchibluemoon.com), I expected to share an image, a page title etc.
However, I didn't see any images choosen from the ones in my page.
I used the debugger developers.facebook.com/tools/debug
and typed in the site http://www.tarocchibluemoon.com having a beautiful "Critical Errors must be fixed"
Looking deeper in Graph API I see:
{
"url": "http://www.tarocchibluemoon.com/",
"type": "website",
"title": "www.tarocchibluemoon.com",
"image": [
{
"url": "http://www.tarocchibluemoon.com/images/domain_reserviert.gif"
}
],
"updated_time": "2011-11-14T20:43:22+0000",
"id": "10150336639081017"
}
This means that debugger sees the site like it was a month ago when the provider showed the classic default page shown when you buy a new domain with written inside "The domain is reserved" (a page like this example).
Probably Facebook didn't received the update to the DNS done when I published the site!
I tried also to change again the IP address of my site but with no results.

I think the problem is the canonical tag in your head section (end of line 3):
<link href="http://bluemoon.thiellaconsulting.com/Default.aspx" rel="canonical" />
Facebook tries to scrape your canonical url - but in this case that url doesn't exist so you get a 'can't download' error.
If you switch that tag so it points to your current domain (or remove it altogether) you should allow Facebook to scrape the page and update it's graph entry.

Related

Best practices for linking [<a href] canonical URL that have redirects while using localhost

Say I have a website:
https://www.example.com
This website has many different HTML pages such as:
https://www.example.com/page.html
The website is hosted on AWS Amplify and has a variety of 301 redirects which are handled with JSON. Below is an example:
[
{
"source": "https://www.example.com/page.html",
"target": "https://www.example.com/page",
"status": "301",
"condition": null
}
]
So, as result, my page is always showing /page instead of /page.html on the client side, as expected. I read a lot about canonical URLS today and learned:
For the quickest effect, use 3xx HTTP (also known as server-side) redirects.
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
Pick one of those URLs as your canonical URL, and use redirects to send traffic from the other URLs to your preferred URL.
From: How to specify a canonical with rel="canonical" and other methods | Google Search Central | Documentation | Google Developers
Which is what I did with the JSON in AWS. I also found that using <link rel="canonical" href="desired page" in the <head> of my HTML is the best practice for telling google (Analytics, etc.) which page is the desired canonical. Which I have since updated all my pages with.
Now the main problem is whenever you hover a href or copy the link address, it includes the .HTML extension on the client side. As soon as this link is pasted and entered the server updates without the .HTML extension. My question is what is the best practice to exclude the extension and display the target address when copying the link address or hovering and the href appearing in the bottom left (Chrome MacOS 110.0.5481.77).
I've seen sites using absolute paths that include the full domain. This isn't a problem, however, most of the development of the site is done on a localhost. Doing this will make that a hassle as I would have to type in the full local path each time which includes the .html extension to get an accurate representation locally. Is there a certain way to do this, which is the correct way?
*Most of this is all new information to me so if something I'm saying is invalid, please correct me.

Facebook is not picking og:site_name tag

I have a page http://46.39.131.94:53157/FacebookShare/Go/f80b86a2-0b18-44c9-9de0-42ca465112f2 (don't mind the IP address, this is my home computer running IIS Express, when the app is finished I will publish it with proper DNS name). When I try to share this page on Facebook, the share post is created normally, but og:site_name tag doesn't seem to be displayed anywhere. Instead, my IP address is displayed.
I have tried the Share debugger, it show zero errors or warnings. In the "Based on the raw tags, we constructed the following Open Graph properties" section the og:site_name tag is missing, but it is displayed when I click the "Show All Raw Tags".
I have tried to "Scrape again" and refresh. Changed the URL several times. I am out of ideas.
I would expect the "SHARE TEST" (the content of og:site_name) to appear in grey caps just below the description instead of the IP address.
Am I missing something?
I have written to Facebook support team and this is what they said:
That's correct, we don't currently show the "site_name" Open Graph parameter in the Share Dialog. We don't necessarily use all of the Open Graph tags on Facebook and in this instance we show the hostname or IP address from the shared URL.
All the best,
David
So, the og:site_name is not currently used by Facebook Share dialog.
That answers my question.

Error parsing input URL, no data was scraped. only with new pages on my site

The problem i have is that i own a website where other people can post stuff ,creating new pages on my domain, but the problem that occured today is that all the new post pages created today are malfunctioning , sharing is not loading thumbnail picture and title and so on, but the weird this is that all the posts(new pages) created before today are all working fine
What caused an error to occur out of nowhere?
I also cannot debug any of the URL's of my website as the same error: Error parsing input URL, no data was scraped
The website im having problems with is here http://www.vabameedia.ee/vm/184/h%C3%A4da-ei-anna-h%C3%A4beneda.html
This is one of the sites where it says no error on page but facebook still cant reach it. http://www.vabameedia.ee/vm/178/craig-parks-%C3%BChek%C3%A4eline-krossisoitja.html
For people experiencing the same problem but for different causes, I discovered a few interesting things about how Facebook "scrapes" pages, checking the logs of the server while doing some trials.
First of all: if you never tried to share a page with FB, FB never tried to scrape it, and it will not try to do so if you only put the url in the Debug tool.
That's the first reason because you get the error: it just states that FB has no information on the page, you must "force" it to scrape the page.
The first time you try to share a page, FB scrapes it (asks your server the first 40k of the page and analyse the opengraph tags).
What can happen is that you do not see the image: Facebook Share Dialog does not display thumbnails one first load
The reason is that FB behind the scenes is still scraping your page and caching the image. The next time, in fact, you have also the image.
How to solve it? Pre caching: https://developers.facebook.com/docs/sharing/best-practices#precaching
or simply add
<meta property="og:image:width" content="450"/>
<meta property="og:image:height" content="298"/>
I was pulling my hair out trying to fix this issue. Hours and hours of troubleshooting to no avail. After speaking with one of our programmers about a topic unrelated I thought of something to try as a long shot.
Much to my surprise, it worked!!!
This is the reason behind the problem and my solution for it:
When you draft a post in WordPress it generates a link based on your article's title (unless you manually change it). The title of my article included special characters, however the auto-generated link didn't display these special characters, only hyphens to replace the spaces. Should be fine right? Wrong! Somewhere embedded in metadata and code in the WordPress platform are those special characters and they mess up the way Facebook pulls info from the article being linked to. This is a problem because certain special characters invalidate hyperlinks.
For example:
Article Title: R[eloaded]
Auto-generated hyperlink DISPLAYED in WordPress "Permalink" field: http://www.example.com/reloaded
Actual WordPress Auto-generated hyperlink: http://www.example.com/r[eloaded]
Those brackets will invalidate the link and Facebook will be unable to pull any information (ie pictures) from it.
Solution:
(1) Simply, manually change the WordPress hyperlink address to something that doesn't include any special characters (this will not change the title of your article).
(2) Click "Update" to change the post to include the new hyperlink.
(3) Click "Purge from Cache" in the WordPress window
(4) Refresh your Facebook browser window
(5) Paste the new hyperlink for your article
(6) Enjoy your Facebook post with a preview image and information
Sidenote: Don't pull your hair out over Facebook, it's not worth it. =)
If you're using Wordpress, edit the post in question to change the permalink (just alter it slightly), then update the post. Using the new permalink in the Facebook OG debugger should now work.
It's a weird fix, but I think it takes care of a problem caused by special characters being used in the title of a post, which is then used to make the permalink.
Its all about DNS issue, was having same issue and resolved it by updating domain name servers to actual name servers.
In my case my domain was pointed to ns1.websterz.net and ns2.websterz.net and on this server i had DNS redirect to my other server (where web site is hosted). I Just updated name servers of the domain to actual name servers where my web site is hosted on. This was account migration case i forgot to update name servers as of new server.
Everything works fine now.

Why does Object debugger say my URL is a facebook URL and isn't "scrapable"

In trying to create an "object" page for my first facebook app, I've run into some difficulty. I followed Facebook's Open Graph Tutorial nearly exactly.
After creating an "object" html page with the appropriate <meta property="og:... tags I tried running the URL through the Debugger Tool as suggested in the tutorial but I'm given the following error:
"Facebook URLs aren't scrapable by this Debugger. Try your own."
This page is in the same directory on my company's linux box as the canvas page, and is certainly not a "Facebook URL". If it matters, I'm using an IP instead of a domain name: xx.x.x.xxx/app/obj.html
...
I continued the tutorial anyway, but ultimately it does not seem to want to post a new action/object (is this even right?). I did however manage to get something to work, as in the app timeline view I apparently actioned one of those objects a couple hours ago. I assume this happened when I was pasting curl POST commands into the terminal.
I'm pretty new to the whole open graph, and facebook APIs, etc., so I'm probably operating under false assumptions of some sort, and I've been all over trying different things, but this error seems pretty bizarre to me and I can't seem to resolve it.
UPDATE
I just took the object page and put it on my own personal shared hosting acct. The debugger worked (inexplicably) fine on it, but I couldn't go too far since it's a different domain than the one authorized by my app.
Make sure og:url inside your html page does not point to facebook.
Also, make sure to look at the open graph protocol page (to see you formatted the og tags correctly.
Also, make sure the page is accessible to everyone, not just yourself.
Without knowing the URL it's hard to be sure, but it's most likely that your URL is either including a og:url tag pointing to a facebook.com address, or a HTTP 301/302 redirect to Facebook instead

Facebook not able to analyse my web pages to determine useful images etc for shared links

I am trying to ensure that meaningful information is supplied with Facebook shared links to the HTML5 pages on my website. Right now, links are just showing up as a URL, with no description or accompanying image.
I have been using the open graph tags to provide metadata on the page.
I have been trying to check the pages on the website using the Facebook debugging tool but the tool does not get any data from the page being checked.
For example, I try to debug the page: http://www.gaiaguide.info/do/Hierarchy
It responds with the error: "Could not retrieve data from URL."
The graph API data provided by the debugging tool has the following value:
{
"error": {
"message": "An unknown error has occurred.",
"type": "OAuthException",
"code": 1
}
}
When I look at what Facebook scraper sees for the page, all I get is the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
The entire HTML content is missing. What is even more peculiar is that it is not even the same doctype as the page I am trying to test.
It is not clear to me why an OAuthException would be raised. The page is visible to external sources for validation. For example, I have validated the page on an HTML5 validation site and it is definitely seeing the entirety of the page contents.
I haved tried URLs from other sites served on the same IP address from the same server and they are fine. An appropriate image and summary is provided in the set of information that would be used to construct the shared link.
I have found other HTML5 pages that validate fine on the Facebook debugger.
I have tried to remove the og:* meta tags from the pages to see if they were causing Facebook to think that the website should be requiring some kind of user authentication but that has not impacted upon the problem.
I have tried to remove the "sharethis.com" mark-up and Javascript that is responsible for the sharing icons to the pages but that has also had no effect.
Any insights into what should be a simple problem would be greatly appreciated.
Thanks
Geoff S
The only issue I can see on your page is the empty space at the top of your HTML code. I made a copy of your page (http://www.webniraj.com/hierarchy.html) and put it through the debugger after removing the odd characters at the beginning of the file.
The page seemed to work correctly after that:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.webniraj.com%2Fhierarchy.html