Best practices for linking (<a href>) to canonical URLs that have redirects while using localhost - redirect

Say I have a website:
https://www.example.com
This website has many different HTML pages such as:
https://www.example.com/page.html
The website is hosted on AWS Amplify and has a variety of 301 redirects, which are handled with a JSON redirects file. Below is an example:
[
  {
    "source": "https://www.example.com/page.html",
    "target": "https://www.example.com/page",
    "status": "301",
    "condition": null
  }
]
So, as a result, my page always shows /page instead of /page.html on the client side, as expected. I read a lot about canonical URLs today and learned:
For the quickest effect, use 3xx HTTP (also known as server-side) redirects.
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
Pick one of those URLs as your canonical URL, and use redirects to send traffic from the other URLs to your preferred URL.
From: How to specify a canonical with rel="canonical" and other methods | Google Search Central | Documentation | Google Developers
Which is what I did with the JSON in AWS. I also found that adding <link rel="canonical" href="desired page"> to the <head> of my HTML is the best practice for telling Google (Search, Analytics, etc.) which URL is the desired canonical, and I have since updated all my pages with it.
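For reference, the canonical link element for the example page above would look something like this (a sketch; the extensionless URL is assumed to be the preferred one):
<head>
  <!-- Point search engines at the extensionless, canonical URL -->
  <link rel="canonical" href="https://www.example.com/page">
</head>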
Now the main problem: whenever you hover over a link or copy its address, it includes the .html extension on the client side. As soon as the link is pasted and visited, the URL updates to the version without the .html extension. My question is: what is the best practice for excluding the extension, so that the target address is shown when copying the link address or hovering (the href shown in the bottom-left corner of Chrome, macOS, 110.0.5481.77)?
I've seen sites use absolute URLs that include the full domain. That isn't a problem in itself; however, most of the development of the site is done on localhost, and absolute URLs would make that a hassle, as I would have to type the full local path (including the .html extension) each time to get an accurate representation locally. Is there a certain way to do this, and which way is correct?
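One approach worth considering, sketched below, is to write the hrefs without the extension in the first place (e.g. <a href="/page">) and let the server map the clean URL back to the .html file. This is only a sketch: it assumes Amplify's redirects file also accepts 200 (rewrite) rules and path-only sources, and it reuses the hypothetical /page and page.html names from the example above. On localhost, the dev server would need an equivalent rewrite so the same extensionless links resolve.
[
  {
    "source": "/page",
    "target": "/page.html",
    "status": "200",
    "condition": null
  }
]
With root-relative, extensionless hrefs, hovering or copying a link never exposes the .html extension, and the markup itself does not need to change between localhost and production.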
*Most of this is new information to me, so if something I'm saying is invalid, please correct me.

Related

Facebook counter drops to 0 after moving to https

Following the instructions given by the Google folks, I added https support to our blog.
Nginx, behind the scenes, redirects everything non-https to https and proxies to a Ruby on Rails app.
Everything seems to work quite well, but the Facebook counters now appear buggy.
If you look at the source of this page: https://milesandlove.com/argentine/le-fitz-roy
I added a lot of og meta tags:
<meta property="og:url" content="https://milesandlove.com/argentine/le-fitz-roy"/>
<link rel="canonical" href="https://milesandlove.com/argentine/le-fitz-roy"/>
And the share button :
<a class="addthis_button_facebook_like" fb:like:layout="button_count" fb:like:href="https://milesandlove.com/argentine/le-fitz-roy"></a>
Note that even though it's an AddThis button, the result would be exactly the same with the official Facebook one.
The weird thing is that as long as nobody liked the page, it kept showing the old count. As soon as a new person came and liked the page, the counter suddenly reset to 0!
Is the count definitely lost?
Why is Facebook protocol-aware?
I read that a tricky solution would be to serve an http:// page to the Facebook crawler. Is that the only solution?
Essentially you've changed your URL - you might have to contact Facebook in order to "migrate" your likes (if that is even possible).
It is 100% possible to serve totally different content on the same domain over different protocols; just as http differs from ftp, http can differ from https. I would say that this is expected behavior.
I don't think that this is a "tricky" solution. There are many cases in which you would want a crawler to see slightly different content from a regular user in a browser. You could set this up to only respond to Facebook by using their specified IP addresses mentioned on this page.
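As an illustration of that idea, here is a minimal nginx sketch (nginx is what the question describes). It matches on the facebookexternalhit user agent rather than Facebook's published IP ranges, and the upstream address is a placeholder, so treat it as a sketch of the approach rather than the exact setup:
server {
    listen 80;
    server_name milesandlove.com;

    # Everyone except Facebook's crawler gets redirected to https
    if ($http_user_agent !~* "facebookexternalhit") {
        return 301 https://$host$request_uri;
    }

    # Facebook's crawler falls through and is served the plain http page
    location / {
        proxy_pass http://127.0.0.1:3000;  # placeholder for the real Rails upstream
    }
}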
Facebook will reset the like count on your pages when you move to https://, and there's no way around this. I have a 301 redirect on the old URL and Facebook doesn't follow it. It will not keep the old likes, and it will treat the https:// domain as a separate page. Which is BS, really! I don't know of a single site that serves different content on http:// and https://. So there's no solution to this issue at this stage.

How to prevent Google from indexing a redirect URL I do not own

A domain name that I do not own is redirecting to my domain. I don't know who owns it or why it is redirecting to my domain.
This domain, however, is showing up in Google's search results. When doing a whois, it also returns this message:
"Domain:http://[baddomain].com webserver returns 307 Temporary Redirect"
Since I do not own this domain, I cannot set a 301 redirect or disable it. When clicking the bad domain in Google, it shows the content of my website, but baddomain.com stays visible in the URL bar.
My question is: How can I stop Google from indexing and showing this bad domain in the search results and only show my website instead?
Thanks.
Some thoughts:
You cannot directly stop Google from indexing other sites, but what you could do is add the canonical tag to your pages so Google can see that the original content is located on your domain and not on the bad domain.
For example check out : https://support.google.com/webmasters/answer/139394?hl=en
Other actions can be taken SEO wise if the 'baddomain' is outscoring you in the search rankings, because then it sounds like your site could use some optimizing.
The better your site and domain rank in the SERPs, the less likely it is that people will see the scraped content and 'baddomain'.
You could, however, also look at the referrer of the request, and if it is 'baddomain' you should be able to redirect to your own domain, change the content, etc., because the code is being run from your own server (see the sketch after this paragraph).
But that might be more trouble than it's worth, as you'd need to investigate how 'baddomain' is doing things and code accordingly (probably an iframe or similar from what you describe, but that can still be circumvented using scripts).
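A minimal client-side sketch of that idea, assuming the bad domain is framing your pages and using www.example.com as a stand-in for your real domain:
// Jump to the canonical location if the page is framed by another site
// or was reached via the bad domain.
(function () {
  var canonicalHost = "https://www.example.com"; // hypothetical: your real domain
  var viaBadDomain = document.referrer.indexOf("baddomain.com") !== -1;
  if (window.top !== window.self || viaBadDomain) {
    window.top.location.href = canonicalHost + window.location.pathname + window.location.search;
  }
})();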
Depending on which countries you and 'baddomain' are located in, there are also legal options, such as DMCA complaints. This, however, can also be quite a task, and it's often not worth it because a new domain will just pop up.

Googlebot vs "Google Plus +1 Share Button bot"?

Site Setup
I have a fully client-side, single-page web app that is dynamically updated and routed on the client side. I redirect any #! requests to a headless server that renders the request with JavaScript executed and returns the final HTML to the bot. The head of the site also contains:
<meta name="fragment" content="!">
Fetch as Google works
Using the Fetch as Google webmaster tool, in the Fetch Status page, I can see that the jQuery I used to update the og:title, og:image, and og:description was executed and the default values replaced. Everything looks good, and if I mouseover the URL, the screenshot is correct.
However, with the Google Plus button, no matter what values og:title, og:image, and og:description tags are updated to, the share pop-up always uses the default/initial values.
Attempted use
I call this after each time the site content is updated, rerouted, and og meta content updated.
gapi.plusone.render("plusone-div");
I was assuming that if this approach works for the Googlebot, it should also work for the +1 button. Is there a difference between the Googlebot and whatever is used by +1 to retrieve the site metadata?
edit:
Passing a URL containing the #! results in a 'site not found':
gapi.plusone.render("plusone-div", {"href": "http://www.site.com/#!city/Paris"});
The Google crawler does not render the snippet when the +1 button is rendered, but rather when a user clicks the +1 button (or share button). What you should try to determine is what your server is sending to the Googlebot during this user-initiated, asynchronous load by the Google crawler.
You can emulate this by using the following cURL command:
curl -A "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)" http://myurl.com/path/to/page
You can output that command to a file by adding -o testoutput.html to the command.
This will give you an idea of what the Google crawler sees when it encounters your page. The structured data testing tool can also give you hints.
What you'll likely see is that unless you're doing your snippet preparation in a static file or on the server side, you're not going to get the snippet that you desire.
If you can provide real URLs to test, I can probably provide more specific feedback.
Google+ fetches the page using the _escaped_fragment_ query parameter, but without the equals sign.
So it would fetch http://www.site.com/?_escaped_fragment_ and NOT http://www.site.com/?_escaped_fragment_=
The Google Search crawler still uses the fragment with the equals sign; this applies only to the Google+ crawler.

Allowing multiple domains for 1 Facebook App (like Tumblr)

I am trying to get my website validated with the Facebook object debugger and I'm running into the following error:
Object at URL 'http://www.example.com/latest' of type 'smallteaser:teaser' is invalid because the domain 'www.example.com' is not allowed for the specified application id '597566643589666'.
This error makes perfect sense since I haven't allowed the example.com domain specific access to the Facebook app. But do I really have to?
What I would like to achieve is similar to how Tumblr works when a custom domain is used.
Say, for example, the website www.davidslog.com: it has the following meta tags:
<meta property="fb:app_id" content="48119224995" />
--> This is the Tumblr app ID
<meta property="og:url" content="http://www.davidslog.com/?og=1" />
--> This is a custom domain which points to a Tumblr blog
<meta property="og:type" content="tumblr-feed:tumblelog" />
--> This is a custom Tumblr object type (in namespace tumblr-feed)
And if you then compare this with, for instance, the domain theartofnotwriting.tumblr.com, which has the following metadata:
<meta property="fb:app_id" content="48119224995">
--> This is the same Tumblr app ID
<meta property="og:url" content="http://theartofnotwriting.tumblr.com/?og=1">
--> This is a different domain
<meta property="og:type" content="tumblr-feed:tumblelog">
You can clearly see that the same Tumblr app has multiple URLs and everything validates correctly.
So why is it that this Tumblr page validates correctly and mine doesn't? How can a Facebook app be configured to allow being used on multiple domains?
I ran into this same issue. I figured that Tumblr must have some sort of partnership in place with Facebook to get this special treatment (IP whitelist? special API?), so I contacted my former partnerships rep at Facebook to enquire.
I got to speak with a platform engineer at Facebook about this, and I was totally wrong. There is nothing special going on.
The reason all the domains running on Tumblr validate fine with a single app_id is that the Facebook debug tool only checks the validity of the og tags' structure (at least when it comes to the app_id). It does not validate whether the app_id is properly associated with the given domain.
You can test this by putting up a test page with your app_id on two different domains; they'll both validate fine in the debug tool.
When it comes to actual Facebook API access, Tumblr does everything on their domain. When people do use Facebook buttons/etc on Tumblr, it is often through a third party proxy tool (like ShareThis) or with a non-api button embed. I couldn't find a single custom-domain running on Tumblr that used the Facebook API or app_id related buttons. If you can, I'd love to see it.
It's not the answer you want (or I want), but that is what is happening. Tumblr's app_id appears on all the domains but only actually works on ".tumblr.com"; the Facebook debug tool doesn't actually validate the app_id.
How can a Facebook app be configured to allow being used on multiple domains?
If you try to add more than one domain in the app settings, you get an error that looks like this:
example.com must be derived from one of: Site URL, Mobile Site URL, Canvas URL, Secure Canvas URL, Page Tab URL or Secure Page Tab URL.
example.org must be derived from one of: Site URL, Mobile Site URL, Canvas URL, Secure Canvas URL, Page Tab URL or Secure Page Tab URL.
One solution is to set the "Page Tab URL" to a fake URL on example.org like so:
example.org/myfakepage
You don’t actually have to use the page tab for anything. This just allows you to add a second domain.
How can a facebook app be configured to allow being used on multiple domains?
It can’t. Facebook apps are tied to one domain (and subdomains thereof).
Imagine what would happen otherwise: someone could add lots of (big) websites to one single app, then e.g. embed the JS SDK on each of them, and recognize a user that is connected to that app across "half the internet", thereby tracking their (almost) every step.
Facebook, of course, does not want this¹, because they want to make money off the data they collect about users and their movements through the web (they can in theory track you on every single website that uses a simple Like button); they would be stupid if they gave that same ability to every app developer.
¹ OK, that’s my own assumption.
You cannot add multiple domains, unless the domains differ only by extension or subdomain.
In the example referenced below, cuponeados differs only by domain extension (.com vs .com.ar), so both cuponeados.com.ar and cuponeados.com are allowed.
See this answer: Need to add multiple domains in a single Facebook Application
The way Tumblr does this is to allow subdomains under their domain using *.example.com. This permits all the subdomains to work with their app (like odisharkins.example.com, facebook.example.com). There are certain aspects to adding several domains: look at the Facebook blog.
Further domains must be derived from one of: Site URL, Mobile Site URL, Canvas URL, Secure Canvas URL, Page Tab URL or Secure Page Tab URL.
odisharkins.tumblr.com would not be an issue: it would work fine!
However, harkinstech.com or odisharkins.com will not work.
Worked for me: "The trick is to specify multiple app domains and use a comma separated list of valid URL's for the website URL configuration."
https://www.sitepoint.com/community/t/single-facebook-app-with-multiple-domains/99834/4
Go to developers.facebook.com.
Click on your application and edit the settings.
Add domains to that in the following form: example.com, example.org, subdomain.example.com (no http).
Save.
That’s the only way to do it, at least for the present time. You either add domains (and subdomains) manually or you can’t proceed.

Site not valid - but it is

So, I'm building a website called "dagbok.nu", which is Swedish for "diary now" :)
Anyway, when creating the Facebook application, it claims that the site URL is invalid, as well as the app domain. For the site URL, I used "http://dagbok.nu" and for the site domain, I used "dagbok.nu". Please don't reply (as I've seen others do on similar issues) that I should type the site URL with the scheme and the domain without; that's exactly what I'm doing.
Right, so according to another question here, one could troubleshoot this using FB's own URL scraper, so I did just that:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fdagbok.nu
And the reply: Error Parsing URL: Error parsing input URL, no data was scraped
Right, so now I can assume that the reason for it being considered invalid is because of FB not being able to scrape the URL. But why?
According to this question, one of the reasons seems to be that FB has deemed the URL insecure or "spammy". I acquired this domain from a previous owner, so this wasn't all that impossible. But when doing the same thing as Matthew in that post, i.e. trying to post to my timeline using the domain "http://dagbok.nu", I didn't get any information. The status box expanded as if to include a thumbnail and information about the link, but it only contained a "(No title)" text and nothing more.
So now I don't know what to do. I've tried to check the dig and NS records from multiple servers around the web, and every one seems to resolve it correctly, and I've had friends double-check the URL from the States as well. I can't understand what's wrong and I have no idea how to ask someone at FB to resolve this. Does anyone here have good advice? Thanks in advance! :)
EDIT
When changing the domain to another domain that points to the exact same web server and document_root, it works! So this is definitely a problem with the domain "dagbok.nu" and not with the code on that page.
EDIT
When using the debug function above, I see no activity in the server log whatsoever. Facebook doesn't even contact the server. When using the alternate URL, the one from the last edit, it pops up in the logs as it should.
EDIT
I filed a bug report with Facebook, and their first response was that they were going to follow up. Now, a month later, I got an email that said "We are prioritizing bugs based on impact to the developer community. As this bug report has not received much attention from other developers, we are closing it so as to better focus on the top issues", and then they told me to come here to Stack Overflow to try to solve my issue. But the issue is WITH THEM, and of course no one else has reported that my site doesn't work; it affects only me, and I haven't launched it yet because of this bug!
EDIT
I wanted to file a new bug report, but I can't even do that now, since they are blocking bug reports with this URL as well!
I had to edit the URL - here is the new bug report
When Facebook tries to scrape your site for information, it sends a request to your server with a specific user agent called "facebookexternalhit"...
Facebook needs to scrape your page to know how to display it around the site.
Facebook scrapes your page every 24 hours to ensure the properties are up to date. The page is also scraped when an admin for the Open Graph page clicks the Like button and when the URL is entered into the Facebook URL Linter. Facebook observes cache headers on your URLs; it will look at "Expires" and "Cache-Control" in order of preference. However, even if you specify a longer time, Facebook will scrape your page every 24 hours.
The user agent of the scraper is: "facebookexternalhit/1.1(+http://www.facebook.com/externalhit_uatext.php)"
Make sure it is not blocked by your server firewall.
Look in your server log to see if it even tried to access your site (see the curl sketch below).
If you think this is a firewall issue, look at this link.
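To see what the scraper actually receives, you can fetch the page yourself with the scraper's user agent (a sketch, using the domain from the question):
curl -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" -i http://dagbok.nu/
The -i flag includes the response headers, which makes redirects and charset headers easy to spot.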
Your problem appears to be with your character encoding string. Your Apache server is currently sending the unsupported string latin1, while you've defined your meta content-type as iso-8859-1. See the W3C validator.
From what I've seen, the Facebook parser will stop immediately if it encounters either an unrecognized character encoding string or a mismatch in character encoding strings between your header and meta tags.
The problem could be originating from either your httpd.conf or php.ini files. Change these to match your meta and restart Apache. Since the problem seems to be domain-specific, I'd check httpd.conf first.
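For example, if the header really is coming from Apache, a single directive in httpd.conf (or the relevant virtual host) can bring it in line with the meta tag; this sketch assumes the pages should stay ISO-8859-1:
# Send the same charset in the Content-Type header that the meta tags declare
AddDefaultCharset ISO-8859-1
If PHP is overriding it instead, the equivalent setting is default_charset in php.ini.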
Could your domain be blacklisted? Could you try messaging your URL to someone and see if Facebook gives you a "This message contains blocked content..." error?
If you don't provide certain minimum Facebook markup on your page, it will respond with "Error Parsing URL: Error parsing input URL, no data was scraped." I only looked at the homepage, but it appears that dagbok.nu contains no Facebook markup. I'm not sure what things must be present at minimum, but in my implementation, I assume the fb:app_id meta tag and the JavaScript SDK script must be there. You may want to take a look at http://developers.facebook.com/docs/guides/web/#plugins , particularly the Authentication section.
I discovered your question because I had this same error today for an unknown reason. I found that it was caused because the content of my og:image meta tag used an incorrect URL to the image I was trying to use. So as you add Facebook markup to your page, make sure your values are correct or you may continue to receive this message.
This doesn't seem to be a Facebook problem if you take a look at what I've discovered.
Testing it with the W3C online validation tool gives one of two results.
Tested using dagbok.nu; note that http://dagbok.nu shows no difference in test results. Remove the last forward slash between tests.
Test: 1
Results: 72 Errors 0 Warning
Note: Shown here is a fragment of the source Frameset DOCTYPE webpage.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<NOSCRIPT><IMG SRC="http://svs.bystorm.se/rv?java=off"></NOSCRIPT><SCRIPT SRC="http://svs.bystorm.se/rvj"></SCRIPT>
<HTML STYLE="height:100%;">
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html;charset=iso-8859-1">
Test: 2
Results: 4 Errors 1 Warning
Note: Shown here is a fragment of the source Transitional DOCTYPE webpage.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html >
<head>
<title>Dagbok: Framsida</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="author" content="Jonas Eklundh Communication (http://jonas.eklundh.com)">
<meta name="author-email" content="jonas#eklundh.com">
<meta name="copyright" content="Jonas Eklundh Communication #2012">
<meta name="keywords" content="Atlas,Innehållssystem,Jonas Eklundh">
<meta name="description" content="">
<meta name="creation-time" content="0,079s">
<meta name="kort" content="DGB">
Repeated tests a couple of seconds apart cycle between these results, indicating that a page redirect is occurring.
Security warnings are seen in Firefox and Chrome when visiting your site using these secure URLs:
https://dagbok.nu
https://www.dagbok.nu
The browser indicates the site should not be trusted because it's impersonating another site using an invalid security certificate from *.loopiasecure.com.
Recommendation: check your .htaccess file, CMS settings, page redirection, and security settings. Use the source fragments above to identify which files are actually being served and discover what's set incorrectly.
Once that's done, I think Facebook will be happy to then debug your webpage and provide additional recommendations.
I had the same problem and discovered it was an incorrect IPv6 address in the AAAA record for my domain. The IPv4 record was correct, so the site worked in a browser, but FB obviously checks the IPv6 records!
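You can compare the two record types from the command line (a sketch, using the domain from the question):
# A broken AAAA (IPv6) record only shows up on the second query
dig A dagbok.nu +short
dig AAAA dagbok.nu +short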
This issue may also happen when Cloudflare is used. This is because Cloudflare protects the page from Facebook, which is then unable to collect the data, which in turn makes Facebook think the page is invalid.
My fix was:
Turn off Cloudflare for the page.
Scrape the page using Facebook's Dev Tools: https://developers.facebook.com/tools/debug/og/object
Click the "Fetch new scrape information" button and let it run.
Re-enable Cloudflare protection for the page.
You should then be able to continue adding the page where you needed it.