Specify sitemap language (same language for the whole sitemap) - google-search-console

According to Google, you can specify languages in a sitemap like this:
<url>
<loc>http://www.example.com/english/page.html</loc>
<xhtml:link
rel="alternate"
hreflang="de"
href="http://www.example.com/deutsch/page.html"/>
<xhtml:link
rel="alternate"
hreflang="en"
href="http://www.example.com/english/page.html"/>
</url>
However, I only need to specify that the whole sitemap/website is in Spanish. It isn't a multi-language sitemap; it's a single-language sitemap whose language happens to be Spanish.
Should I include an hreflang tag for each and every URL, or is there a better way to do this, such as specifying it in the header section?

No. Setting a header on the sitemap XML applies only to sitemap.xml itself, not to the locations declared in it. You have to declare the language for every location.
Check with the URL Inspection tool to see whether Google reports any errors when trying to index your site.
If you have access to the server, you can set a Link response header along the following lines.
Nginx
add_header Link '<$scheme://$host$request_uri>; rel="alternate"; hreflang="es"';
Apache
Header set Link "expr=<%{REQUEST_SCHEME}://%{HTTP_HOST}%{REQUEST_URI}>; rel=\"alternate\"; hreflang=\"es\""
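If the site is served by a Python WSGI application rather than directly by nginx or Apache, the same header could be added in code. The following is a minimal sketch, assuming a WSGI app; hreflang_middleware and link_header_value are hypothetical names, and the header value mirrors the nginx example above (the full request URI, including any query string).

```python
from wsgiref.util import request_uri


def link_header_value(url: str, lang: str = "es") -> str:
    """Format a Link header value like the nginx/Apache examples."""
    return f'<{url}>; rel="alternate"; hreflang="{lang}"'


def hreflang_middleware(app, lang: str = "es"):
    """Wrap a WSGI app so every response carries the hreflang Link header."""
    def wrapped(environ, start_response):
        # Rebuild the absolute URL of this request (scheme, host, path, query).
        url = request_uri(environ, include_query=True)
        link = link_header_value(url, lang)

        def start_with_link(status, headers, exc_info=None):
            # Append our header to whatever headers the wrapped app sends.
            return start_response(status, list(headers) + [("Link", link)], exc_info)

        return app(environ, start_with_link)

    return wrapped
```

Wrapping at this level means individual views never need to know about the header, which matches the "set it once on the server" idea of the nginx/Apache lines.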
Also, you could set the lang attribute on the <html> tag (for example <html lang="es">), or add a <link rel="alternate" hreflang="es"> tag, on every page of your website. If your site is built with a static site generator, you can do this once in a template.
If you only have access to the Cloud console, you have to add an entry for every location of your website to sitemap.xml.
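If you generate sitemap.xml yourself, the per-location annotation does not have to be written by hand. Below is a minimal sketch using Python's xml.etree.ElementTree; build_sitemap and the example URLs are hypothetical, and each <url> gets a self-referencing xhtml:link with hreflang="es" in the same shape as the snippet in the question.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and xhtml with the "xhtml:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(urls, lang: str = "es") -> str:
    """Return sitemap XML where every <url> declares its own language."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Self-referencing alternate: this page, in this language.
        ET.SubElement(
            entry,
            f"{{{XHTML_NS}}}link",
            {"rel": "alternate", "hreflang": lang, "href": url},
        )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Usage: write the result of build_sitemap(["https://www.example.com/", "https://www.example.com/contacto"]) to sitemap.xml as part of your build.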

Related

Best practices for linking (<a href>) to canonical URLs that have redirects while developing on localhost

Say I have a website:
https://www.example.com
This website has many different HTML pages such as:
https://www.example.com/page.html
The website is hosted on AWS Amplify and has a variety of 301 redirects which are handled with JSON. Below is an example:
[
{
"source": "https://www.example.com/page.html",
"target": "https://www.example.com/page",
"status": "301",
"condition": null
}
]
As a result, my page always shows /page instead of /page.html on the client side, as expected. I read a lot about canonical URLs today and learned:
For the quickest effect, use 3xx HTTP (also known as server-side) redirects.
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
Pick one of those URLs as your canonical URL, and use redirects to send traffic from the other URLs to your preferred URL.
From: How to specify a canonical with rel="canonical" and other methods | Google Search Central | Documentation | Google Developers
That is what I did with the JSON in AWS. I also found that adding <link rel="canonical" href="desired page"> to the <head> of my HTML is the best practice for telling Google (Analytics, etc.) which page is the preferred canonical, and I have since added it to all my pages.
The main problem now is that whenever you hover over a link or copy its address, it includes the .html extension on the client side. As soon as that link is pasted and entered, the server redirects to the URL without the .html extension. My question is: what is the best practice for excluding the extension, so that the target address is what appears when copying the link address or when hovering (the href shown in the bottom left of Chrome 110.0.5481.77 on macOS)?
I've seen sites use absolute URLs that include the full domain. That isn't a problem in itself; however, most of the development of the site is done on localhost. Doing this would make that a hassle, as I would have to type in the full local path, including the .html extension, each time to get an accurate representation locally. Is there a correct way to do this?
*Most of this is new information to me, so if something I'm saying is wrong, please correct me.

Sitemap not working in TYPO3 9.5.x

I'm trying to get a sitemap working in TYPO3 9.5.x. If I go to https://domain.tld/?type=1533906435, I get the following page.
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/typo3/sysext/seo/Resources/Public/CSS/Sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://domain.tld/index.html?sitemap=pages&cHash=38eee382dd3fc2edb80b67944d477100</loc>
<lastmod>2019-10-07T13:57:04-07:00</lastmod>
</sitemap>
</sitemapindex>
So far so good. The link in there should take me to the actual sitemap, but instead it takes me straight to the root page without any redirection. This happens on two different sites. I didn't configure anything special; I just enabled the seo system extension and included the static template as described here.
When I submitted the sitemap to Google Search Console, it said "could not fetch", but the next day the status was "Success" and it had discovered URLs. I guess Google crawled the root page and found the links on it.
How do I get the sitemap working or is there a bug somewhere?
The index.html part in the path to the pages sitemap looks weird to me.
Can you please try to open the loc URL without the index.html part?
If you can see the sitemap then, we have to look at where this index.html is coming from.
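To try this for every entry in the index at once, a small sketch like the following (Python standard library only; loc_urls_without_index is a hypothetical helper) lists the loc URLs with a trailing index.html removed from the path, keeping the query string intact. It assumes the raw index is well-formed XML, i.e. the & in the URL is escaped as &amp; in the actual file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def loc_urls_without_index(sitemap_index_xml: bytes) -> list:
    """Return each <loc> of a sitemap index with a trailing 'index.html' dropped."""
    root = ET.fromstring(sitemap_index_xml)
    urls = []
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        parts = urlsplit(loc.text.strip())
        path = parts.path
        if path.endswith("/index.html"):
            path = path[: -len("index.html")]  # keep the trailing slash
        # The query string (sitemap=pages&cHash=...) is left untouched.
        urls.append(urlunsplit(parts._replace(path=path)))
    return urls
```

Opening the resulting URLs in a browser is then the same test as suggested above, just repeated for each entry.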

GitHub Pages: Image From Link Not Showing

I used to have my website hosted through Shopify, and when I linked to it in my LinkedIn job description the logo showed up. I've since moved my website to GitHub Pages, and now the logo is blank when I link to it in LinkedIn (or anywhere else for that matter). Is there something I can do to fix this, or is it just a con of GH Pages?
It always helps to include a link to the codebase for reference, but it looks like you're likely working with this repo on your GitHub profile.
It's possible that Shopify or a theme you were using before included these by default, but typically you have to specify the preview image in your site's metadata. The preview images for formatted links are pulled from an Open Graph image property, which you define in a meta tag in your HTML's <head> section (see the OG documentation here). So, in your head include file, you'd add a meta tag like this:
<meta property="og:image" content="https://graemeharrison.com/assets/img/logo.png" />
Then, ideally, you'll include this head file in each layout file so that it's included in each page's HTML.
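Before re-testing in LinkedIn, it can help to confirm that the generated HTML (for example a file in _site) really contains the tag. A minimal sketch using Python's html.parser; OpenGraphParser and find_og_image are hypothetical helpers that return the first og:image value, or None if the tag is missing.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> values from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # A self-closing <meta ... /> also ends up here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content:
            self.properties.setdefault(prop, content)  # keep the first occurrence


def find_og_image(html_text: str):
    parser = OpenGraphParser()
    parser.feed(html_text)
    parser.close()
    return parser.properties.get("og:image")
```

Usage: find_og_image(open("_site/index.html", encoding="utf-8").read()) should return the image URL once the head include is in every layout.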
A couple of things that worked for me:
Put your image in your 'public' directory near index.html, and in your meta tag retrieve it with content="http://yourdomain.com/yourimage.png". (https didn't work for me but http did)
Also, https://www.linkedin.com/post-inspector is a good tool to check your og image appears.

What is the best approach for redirection of old pages in Jekyll and GitHub Pages?

I have a blog on GitHub Pages (Jekyll).
What is the best way to handle a migration to a new URL structure?
I found that the common best practice is to create an .htaccess file like so:
Redirect 301 /programovani/2010/04/git-co-to-je-a-co-s-tim/ /2010/04/05/git-co-to-je-a-co-s-tim.html
But it doesn't seem to work with GitHub. Another solution I found is to create a rake task that generates redirect pages. But since those are plain HTML pages, they can't send a 301 header, so search engine crawlers won't recognize them as redirects.
The best solution is to use both <meta http-equiv="refresh"> and <link rel="canonical">.
It works very well: Googlebot reindexed my entire website under the new links without losing positions, and users are redirected to the new posts right away.
<meta http-equiv="refresh" content="0; url=http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/">
<link rel="canonical" href="http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/" />
Using <meta http-equiv="refresh"> redirects each visitor to the new post.
As for Googlebot, it treats <link rel="canonical"> much like a 301 redirect; the effect is that your pages get reindexed, which is what you want.
I described the whole process of moving my blog from WordPress to Octopress here:
http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/#redirect-301-on-github-pages
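If you generate these stub pages yourself (for example from a build script instead of the rake task mentioned in the question), a minimal Python sketch could look like this. redirect_stub and write_stub are hypothetical names; each old path gets an index.html containing both tags, with the target URL HTML-escaped.

```python
import html
from pathlib import Path


def redirect_stub(target_url: str) -> str:
    """HTML page that redirects visitors (meta refresh) and names the canonical URL."""
    url = html.escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={url}">\n'
        f'<link rel="canonical" href="{url}" />\n'
        "</head>\n<body>\n"
        f'<a href="{url}">Click here if you are not redirected.</a>\n'
        "</body>\n</html>\n"
    )


def write_stub(site_root: Path, old_path: str, target_url: str) -> Path:
    """Write the stub so that old_path (a directory-style URL) serves it."""
    out = site_root / old_path.strip("/") / "index.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(redirect_stub(target_url), encoding="utf-8")
    return out
```

Writing to old_path/index.html relies on the host serving directory URLs from index.html, which GitHub Pages does.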
Have you tried the Jekyll Alias Generator plugin?
You put the alias URLs in the YAML front matter of a post:
---
layout: post
title: "My Post With Aliases"
alias: [/first-alias/index.html, /second-alias/index.html]
---
When a user visits one of the alias URLs, they are redirected to the main URL via a meta refresh tag:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta http-equiv="refresh" content="0;url=/blog/my-post-with-aliases/" />
</head>
</html>
See also this blog post on the subject.
redirect-from plugin
https://github.com/jekyll/jekyll-redirect-from#redirect-to
It is supported by GitHub and makes it easy:
_config.yml
gems:
- jekyll-redirect-from
a.md
---
permalink: /a
redirect_to: 'http://example.com'
---
as explained at: https://help.github.com/articles/redirects-on-github-pages/
Now:
firefox localhost:4000/a
will redirect you to example.com.
The plugin takes over whenever the redirect_to is defined by the page.
Tested on GitHub pages v64.
Note: this version has a serious, recently fixed bug that wrongly reuses the default layout for the redirect: https://github.com/jekyll/jekyll-redirect-from/pull/106
Manual layout method
If you don't feel like using https://github.com/jekyll/jekyll-redirect-from it's easy to implement it yourself:
a.md
---
layout: 'redirect'
permalink: /a
redir_to: 'http://example.com'
sitemap: false
---
_layouts/redirect.html, based on "Redirect from an HTML page":
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Redirecting...</title>
{% comment %}
Don't use 'redirect_to' to avoid conflict
with the page redirection plugin: if that is defined
it takes over.
{% endcomment %}
<link rel="canonical" href="{{ page.redir_to }}"/>
<meta http-equiv="refresh" content="0;url={{ page.redir_to }}" />
</head>
<body>
<h1>Redirecting...</h1>
<a href="{{ page.redir_to }}">Click here if you are not redirected.</a>
<script>location='{{ page.redir_to }}'</script>
</body>
</html>
Like this example, the redirect-from plugin does not generate 301s, only meta + JavaScript redirects.
We can verify what is going on with:
curl localhost:4000/a
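The curl output should be an ordinary page containing a refresh tag rather than a 301 response. As a sketch, assuming the Jekyll dev server on localhost:4000, this Python snippet fetches a path without following redirects and extracts the meta refresh target; fetch, RefreshFinder and meta_refresh_target are hypothetical helpers.

```python
import http.client
import re
from html.parser import HTMLParser

REFRESH_RE = re.compile(r"\s*\d+\s*;\s*url\s*=\s*(.+)", re.IGNORECASE)


class RefreshFinder(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "meta" or self.target is not None:
            return
        if (a.get("http-equiv") or "").lower() == "refresh":
            # content looks like "0;url=/path/" or "0; url=http://..."
            match = REFRESH_RE.match(a.get("content") or "")
            if match:
                self.target = match.group(1).strip().strip("'\"")


def meta_refresh_target(html_text: str):
    finder = RefreshFinder()
    finder.feed(html_text)
    finder.close()
    return finder.target


def fetch(host: str, port: int, path: str):
    """GET path without following redirects; return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", "replace")
    finally:
        conn.close()
```

For example, status, body = fetch("localhost", 4000, "/a") followed by meta_refresh_target(body): if the plugin behaves as described, you would expect a 200 status and the redirect_to URL, not a 301.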
This solution lets you use true HTTP redirects via .htaccess; however, nothing involving .htaccess will work on GitHub Pages, because it does not use Apache.
As of May 2014 GitHub Pages supports redirects, but according to the jekyll-redirect-from gem documentation they are still based on HTTP-REFRESH (using <meta> tags), which requires a full page load before redirection can occur.
I don't like the <meta> approach so I whipped up a solution for anyone looking to provide real HTTP 301 redirects within an .htaccess file using Apache, which serves a pre-generated Jekyll site:
First, add .htaccess to the include property in _config.yml
include: [.htaccess]
Next, create an .htaccess file and be sure to include YAML front matter. Those dashes are important because now Jekyll will parse the file with Liquid, Jekyll's templating language:
---
---
DirectoryIndex index.html
RewriteEngine On
RewriteBase /
...
Make sure your posts that require redirects have two properties like so:
---
permalink: /my-new-path/
original: blog/my/old/path.php
---
Now in .htaccess, just add a loop:
{% for post in site.categories.post %}
{% if post.original %}
RewriteRule ^{{ post.original }} {{ post.permalink }} [R=301,L]
{% endif %}
{% endfor %}
This will dynamically generate .htaccess every time you build the site, and the include in your config file ensures that .htaccess makes it into the _site directory. For the example post above, the generated rule is:
RewriteRule ^blog/my/old/path.php /my-new-path/ [R=301,L]
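If you'd rather generate the rules outside Liquid (say, in a deploy script that reads the same front matter), the loop translates directly. This sketch is hypothetical: rewrite_rules takes the posts as plain dicts, skips posts without an original property, and additionally escapes the old path and anchors it with $, so that path.php does not also match something like path.php.bak.

```python
import re


def rewrite_rules(posts) -> str:
    """Build one RewriteRule per post that declares an 'original' path.

    posts: iterable of dicts with 'permalink' and an optional 'original'
    (the old path relative to RewriteBase /, as in the front matter above).
    """
    lines = []
    for post in posts:
        original = post.get("original")
        if not original:
            continue  # an empty pattern '^' would match every request
        # Escape regex metacharacters such as '.' and anchor the whole path.
        pattern = "^" + re.escape(original.lstrip("/")) + "$"
        lines.append(f"RewriteRule {pattern} {post['permalink']} [R=301,L]")
    return "\n".join(lines)
```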
From there, it's up to you to serve _site using Apache. I normally clone the full Jekyll repo into a non-webroot directory, then my vhost is a symlink to the _site folder:
ln -s /path/to/my-blog/_site /var/www/vhosts/my-blog.com
Tada! Now Apache can serve the _site folder from your virtual root, complete with .htaccess-powered redirects that use whichever HTTP response code you desire!
You could even get super fancy and use a redirect property within each post's front matter to designate which redirect code to use in your .htaccess loop.
The best option is to avoid url changes altogether by setting the permalink format in _config.yml to match your old blog.
Beyond that, the most complete solution is generating redirect pages, but it isn't necessarily worth the effort. I ended up simply making my 404 page a bit friendlier, with JavaScript to guess the correct new URL. It doesn't do anything for search, but actual users can get to the page they were looking for, and there's no legacy stuff to support in the rest of the code.
http://tqcblog.com/2012/11/14/custom-404-page-for-a-github-pages-jekyll-blog/
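The linked post does the guessing in JavaScript on the 404 page. The matching idea itself is simple; here is a sketch of it in Python using difflib, assuming you have a list of the site's current URLs. guess_new_url is hypothetical and compares only the last path segment (with any .html suffix removed).

```python
import difflib
from urllib.parse import urlsplit


def _slug(url: str) -> str:
    """Last path segment without a trailing slash or .html suffix."""
    last = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    return last[: -len(".html")] if last.endswith(".html") else last


def guess_new_url(requested: str, known_urls, cutoff: float = 0.6):
    """Return the known URL whose slug is closest to the requested one, or None."""
    by_slug = {}
    for url in known_urls:
        by_slug.setdefault(_slug(url), url)
    matches = difflib.get_close_matches(_slug(requested), list(by_slug), n=1, cutoff=cutoff)
    return by_slug[matches[0]] if matches else None
```

Comparing slugs rather than full paths works here because the migration in the question changes the date prefix but keeps the post slug.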
Since GitHub doesn't allow 301 redirects (which isn't surprising), you'll have to decide between moving to your new URL structure (and taking a search engine hit) or leaving the URLs the way they are. I suggest you go ahead and make the move. Let the search engine chips fall where they may. If someone hits one of your old links via a search engine, they'll be redirected to the new location. Over time, the search engines will pick up your changes.
Something you can do to help matters is to create a Sitemap where you only list your new pages and not the old ones. This should speed up the replacement of old URLs with the new ones. Additionally, if all your old URLs are in your '/programovani' directory, you can also use a robots.txt file to tell future crawls they should ignore that directory. For example:
User-agent: *
Disallow: /programovani/
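Before deploying the robots.txt, you can check which URLs it blocks with Python's urllib.robotparser. A minimal sketch; is_crawlable is a hypothetical wrapper and the URLs are examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /programovani/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if robots_txt allows agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)
```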
It will take a little while for the search engines to catch up with the changes. This isn't really a big deal. As long as the old URLs still exist and redirect actual people to the active pages, you'll be fine.
As others have mentioned, the best solution is to preserve working URLs or duplicate the pages and specify a canonical URL.
Since GitHub Pages doesn't support true redirects, I set up rerouter on Heroku to return 301 (permanent) redirects from my site's old domain to the new one. I described the details here:
http://joey.aghion.com/simple-301-redirects/
Jekyll has gone through some major updates in the past few months, so maybe this wasn't possible when this question was originally posted...
Jekyll supports a permalink attribute in the YAML front-matter section of your blog posts. You can specify the URL that you would like your post to have and Jekyll will use that (instead of the filename) when generating your site.
---
title: My special blog post
permalink: /programovani/2010/04/git-co-to-je-a-co-s-tim
---
My blog post markdown content

Using wget to download dokuwiki pages in plain xhtml format only

I'm currently modifying the offline-dokuwiki[1] shell script to fetch the latest documentation for an application, so that it can be embedded automatically in instances of that application. This works quite well, except that in its current form it grabs three versions of each page:
1. The full page, including header and footer
2. Just the content, without header and footer
3. The raw wiki syntax
I'm only interested in number 2. It is linked from the main pages by an HTML <link> tag in the <head>, like so:
<link rel="alternate" type="text/html" title="Plain HTML"
href="/dokuwiki/doku.php?do=export_xhtml&id=documentation:index" />
It has the same URL as the main wiki page, except that 'do=export_xhtml' is added to the query string. Is there a way to instruct wget to download only these versions, or to automatically append '&do=export_xhtml' to every link it follows? That would be a great help.
[1] http://www.dokuwiki.org/tips:offline-dokuwiki.sh (author: samlt)
DokuWiki also accepts the do parameter as an HTTP header, so you could run wget with --header "X-DokuWiki-Do: export_xhtml".
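If you replace the wget call with a script, the same header can be sent from Python's urllib.request. A minimal sketch, assuming the wiki is reachable over plain HTTP(S); export_request and fetch_export are hypothetical names, and the header name is taken from the answer above.

```python
import urllib.request


def export_request(url: str) -> urllib.request.Request:
    """Build a GET request asking DokuWiki for the plain XHTML export."""
    # Same effect the answer describes for wget --header "X-DokuWiki-Do: export_xhtml".
    return urllib.request.Request(url, headers={"X-DokuWiki-Do": "export_xhtml"})


def fetch_export(url: str) -> str:
    """Download one page and decode it using the charset the server reports."""
    with urllib.request.urlopen(export_request(url), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, "replace")
```

For example, fetch_export("http://localhost/dokuwiki/doku.php?id=documentation:index") would return the content-only export of that page, assuming the server honors the header as described.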