Duplicate or link to WebSite JSON-LD? - schema.org

I'm replacing the microdata (itemscope et al) on our sites with JSON-LD. Do I need to declare the WebSite on every page, or can I place it once on the home page?
If the latter, will processors (by which I mean Google) tie each page to it automatically via the domain name, or is there some way to link to it? Given that "Linked Data" is right there in the name, I've found no examples that make use of it. They all replicate or embed the data directly in the thing that's linking.
For example, I want to link to our YouTube videos that we embed in articles, but Google doesn't understand a URL for the video property. If I expand it into a VideoObject, Google complains that I don't know the width, height, duration, etc. All that data is on youtube.com at the URL I'm specifying. Why can't it pull the video information itself?

Do I need to declare the WebSite on every page, or can I place it once on the home page?
From the perspectives of Schema.org and Linked Data, it’s perfectly fine (and I would say it’s even the best practice) to provide an item only once, and reference it via its URI whenever it’s needed.
In JSON-LD, this can be done with @id. For example:
<!-- on the homepage -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "@id": "http://example.com/#site",
  "hasPart": {
    "@type": "WebPage",
    "@id": "http://example.com/"
  }
}
</script>
<!-- on another page -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebPage",
  "@id": "http://example.com/foobar",
  "isPartOf": {"@id": "http://example.com/#site"}
}
</script>
Whether Google actually follows these references is not clear (as far as I know, it’s undocumented)¹. It’s clear that their testing tool doesn’t show the data from referenced URIs, but that doesn’t have to mean much. At least their testing tool displays the URI (as "ID") in case one is provided.
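To illustrate what following such a reference would mean for a consumer, here is a minimal Python sketch that collects nodes from two pages and resolves an isPartOf reference through the JSON-LD @id keyword. It is not a description of how Google or any other processor works; the function name merge_by_id and the "name" value on the WebSite are assumptions added for the example.

```python
import json


def merge_by_id(*documents):
    """Merge top-level JSON-LD nodes that share the same @id into one dict each."""
    merged = {}
    for raw in documents:
        node = json.loads(raw)
        node_id = node.get("@id")
        if node_id is None:
            continue  # a node without an identifier cannot be referenced
        merged.setdefault(node_id, {}).update(node)
    return merged


# "name" is a hypothetical property, added only so there is something to look up.
homepage = (
    '{"@context": "http://schema.org", "@type": "WebSite",'
    ' "@id": "http://example.com/#site", "name": "Example"}'
)
other_page = (
    '{"@context": "http://schema.org", "@type": "WebPage",'
    ' "@id": "http://example.com/foobar",'
    ' "isPartOf": {"@id": "http://example.com/#site"}}'
)

graph = merge_by_id(homepage, other_page)
page = graph["http://example.com/foobar"]
site = graph[page["isPartOf"]["@id"]]  # resolve the reference to the WebSite node
print(site["@type"], site["name"])
```

A real consumer would also have to fetch the referenced page and handle nested nodes and @graph arrays, which this sketch ignores.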
If you want to provide a URL value for the video property, note that URL is not one of its expected values. While Schema.org still allows this (any property can have a text or URL value), it’s likely that some consumers will handle only expected values. It’s also perfectly fine to provide a VideoObject value if you only provide a url property. The fact that Google’s testing tool gives errors doesn’t mean that something’s wrong; it just means that Google won’t consider this video for their video-related rich results.
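As a rough sketch of the second option, the following Python builds a VideoObject that carries only a url property and attaches it to an article. The Article type, the helper name video_reference, and the placeholder video URL are assumptions for illustration; they are not taken from the question.

```python
import json


def video_reference(url):
    """Return a minimal VideoObject that only points at the video's URL."""
    return {"@type": "VideoObject", "url": url}


article = {
    "@context": "http://schema.org",
    "@type": "Article",  # assumed type of the embedding page
    # VIDEO_ID is a placeholder, not a real video identifier
    "video": video_reference("https://www.youtube.com/watch?v=VIDEO_ID"),
}

print(json.dumps(article, indent=2))
```

The printed JSON can be placed in a script element of type application/ld+json. Google's testing tool would presumably still report the missing video properties, as described above.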
¹ But for the few rich result features Google offers, authors would typically not need to reference something from another page anyway, I guess. Referencing of URIs is typically done for other Semantic Web and Linked Data cases.

Related

Best practices for linking [<a href] canonical URL that have redirects while using localhost

Say I have a website:
https://www.example.com
This website has many different HTML pages such as:
https://www.example.com/page.html
The website is hosted on AWS Amplify and has a variety of 301 redirects which are handled with JSON. Below is an example:
[
{
"source": "https://www.example.com/page.html",
"target": "https://www.example.com/page",
"status": "301",
"condition": null
}
]
So, as a result, my page always shows /page instead of /page.html on the client side, as expected. I read a lot about canonical URLs today and learned:
For the quickest effect, use 3xx HTTP (also known as server-side) redirects.
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
Pick one of those URLs as your canonical URL, and use redirects to send traffic from the other URLs to your preferred URL.
From: How to specify a canonical with rel="canonical" and other methods | Google Search Central | Documentation | Google Developers
That is what I did with the JSON in AWS. I also found that putting <link rel="canonical" href="desired page"> in the <head> of my HTML is the best practice for telling Google (Analytics, etc.) which page is the desired canonical, so I have since updated all my pages with it.
Now the main problem is that whenever you hover over a link or copy its address, it includes the .html extension on the client side. As soon as this link is pasted and entered, the server redirects to the address without the .html extension. My question is: what is the best practice for excluding the extension, so that the target address is displayed when copying the link address or when hovering and the href appears in the bottom left (Chrome MacOS 110.0.5481.77)?
I've seen sites using absolute paths that include the full domain. This isn't a problem, however, most of the development of the site is done on a localhost. Doing this will make that a hassle as I would have to type in the full local path each time which includes the .html extension to get an accurate representation locally. Is there a certain way to do this, which is the correct way?
*Most of this is all new information to me so if something I'm saying is invalid, please correct me.
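The redirect rule from this question can be read as a simple lookup, which explains why a copied /page.html link still ends up at /page. The Python sketch below only does exact matching on the "source" field; it is an illustration of the rule as written in the question, not a model of how AWS Amplify evaluates rules. The function name resolve is an assumption.

```python
import json

# The rule exactly as given in the question.
RULES = json.loads("""
[
  {
    "source": "https://www.example.com/page.html",
    "target": "https://www.example.com/page",
    "status": "301",
    "condition": null
  }
]
""")


def resolve(url, rules=RULES):
    """Return (status, final_url); status is None when no rule matches."""
    for rule in rules:
        if rule["source"] == url:  # exact match only, a simplification
            return int(rule["status"]), rule["target"]
    return None, url


print(resolve("https://www.example.com/page.html"))
print(resolve("https://www.example.com/page"))
```

The href in the HTML is what the browser shows on hover, so the redirect only becomes visible after the request is made.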

How to describe a website's favicon using schema.org structured data?

Is there an accepted way to represent information about a website's favicon using https://schema.org? I am using image (to list here some of the icon variations that I generated using realfavicongenerator.net) with https://schema.org/WebSite in the following way:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "http://localhost:4000/",
  "name": "WebSite Name",
  "image": [
    "/android-chrome-512x512.png",
    "/android-chrome-192x192.png",
    "/favicon-194x194.png"
  ]
}
</script>
A web search led me to discover the favicon for the schema.org website, but nothing about how to convey information about a website's favicon using schema.org.
Check the description of the logo property on Schema.org. As in your example, this property has:
Values expected to be one of these types: ImageObject
However, there is also a difference from your example:
Used on these types: Brand, Organization, Place, Product, Service
You can see that the WebSite type used in your example is not among them.
Additionally, you can use Google's guide for the logo property. Here you should pay attention to the following Google requirements:
logo URL
URL of a logo that is representative of the organization.
Additional image guidelines:
The image must be 112x112px, at minimum. The image URL must be crawlable and indexable. The image file format must be supported by Google Images.
From this info, we can conclude that Google requires a square, although this is not explicitly indicated.
Like any other image, the logo should also be compressed. Because it carries little visual weight, the compression can be aggressive; I use a 65% compression level, for example.
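A minimal Python sketch of the approach described in this answer: attach the icon as a logo on an Organization rather than on a WebSite, and check the 112px minimum from the quoted guideline before emitting the markup. The function name organization_with_logo and the example.com URL are assumptions; the file name is taken from the question.

```python
import json
from urllib.parse import urljoin

MIN_LOGO_SIDE = 112  # minimum side length from the Google guideline quoted in the answer


def organization_with_logo(site_url, logo_path, width, height):
    """Build Organization markup with a logo, rejecting images below the minimum size."""
    if width < MIN_LOGO_SIDE or height < MIN_LOGO_SIDE:
        raise ValueError(f"logo must be at least {MIN_LOGO_SIDE}x{MIN_LOGO_SIDE}px")
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "url": site_url,
        "logo": {
            "@type": "ImageObject",
            # resolve the root-relative path against the site URL
            "url": urljoin(site_url, logo_path),
        },
    }


markup = organization_with_logo(
    "https://www.example.com/", "/android-chrome-512x512.png", 512, 512
)
print(json.dumps(markup, indent=2))
```

The width and height are passed in by the caller here; reading them from the image file would need an imaging library and is left out.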

How to ignore Microdata due to JSON-LD?

We are using a theme which unfortunately relies on Microdata like: <div itemscope itemtype="http://schema.org/Article">
We would like to use JSON-LD instead. However, the theme is constantly updated by the company that created it, and redoing the Microdata removal after each update would take too much time and labor. I wondered if there is a tag that says "ignore Microdata", so the theme could stay as it is and we could include our JSON-LD snippet without modifying the whole template.
There is no way to convey that the Microdata should be ignored.
In the ideal case, you would give the Microdata and the JSON-LD items that are about the same thing the same URI (itemid in Microdata, @id in JSON-LD).
<div itemscope itemtype="http://schema.org/Article" itemid="#the-article">
</div>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "@id": "#the-article"
}
</script>
That way, supporting consumers can learn that these items describe the same thing, i.e., there are not two articles, only one, and properties added to one item are also relevant for the other item.
If that’s not possible, you could try to "destroy" the Microdata without making the document invalid. You could do this with a script, after each release of a new version of the theme. Simply remove every itemtype attribute. Your document will still keep the Microdata, but it’s no longer using a vocabulary, so the structured data will likely not be re-used.
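The script mentioned above could look roughly like the following Python sketch, which strips every itemtype attribute from a template string with a regular expression. This is a naive approach under the assumption that the string " itemtype=" only ever appears as an attribute; it would also match that text inside visible content or scripts. The name strip_itemtype is hypothetical.

```python
import re

# Matches whitespace, the attribute name, and a quoted or unquoted value.
ITEMTYPE_ATTR = re.compile(
    r"""\s+itemtype\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_itemtype(html):
    """Remove every itemtype attribute, leaving itemscope and itemprop in place."""
    return ITEMTYPE_ATTR.sub("", html)


template = '<div itemscope itemtype="http://schema.org/Article"><h1 itemprop="name">Title</h1></div>'
print(strip_itemtype(template))
```

Running it over the theme files after each update keeps the manual work small; a proper HTML parser would be the safer choice for anything beyond simple templates.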

Set noindex for Google structured data

I'm using multiple blocks of structured data on my website:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Something",
  "url": "http://www.example.com/?id=123"
}
</script>
This will show another bar under the search results of my website to directly visit the page to see more details about the event.
But if I provide links like http://www.example.com/?id=123, Google also shows this link as a normal link in the search results.
And if I set noindex for this page, Google will also refuse to list the events, won't it?
Does anybody know a solution?
Put a canonical link with value http://www.example.com/ on the http://www.example.com/?id=123 page. All pages with query parameters will be merged into http://www.example.com/ regarding rankings.
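As a small sketch of this suggestion, the Python function below derives the canonical URL by dropping the query string and fragment, then renders the link element for the page head. The function name canonical_link is an assumption, and dropping the whole query string only fits sites where the parameters never select distinct content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url):
    """Return a <link rel="canonical"> element pointing at the URL without its query."""
    parts = urlsplit(url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_link("http://www.example.com/?id=123"))
```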

Object Debugger Error Scraping Page ... near solution?

I have a very strange issue while sharing a page, probably connected to DNS used by Facebook.
I usually share pages from my own sites with no problem. In only one new site, I cannot correctly share any page.
Where is the problem?
If I try to share a page from this new site (www.tarocchibluemoon.com), I expect to share an image, a page title, etc.
However, I don't see any of the images from my page offered as choices.
I used the debugger at developers.facebook.com/tools/debug
and typed in the site http://www.tarocchibluemoon.com, getting a beautiful "Critical Errors must be fixed".
Looking deeper in Graph API I see:
{
"url": "http://www.tarocchibluemoon.com/",
"type": "website",
"title": "www.tarocchibluemoon.com",
"image": [
{
"url": "http://www.tarocchibluemoon.com/images/domain_reserviert.gif"
}
],
"updated_time": "2011-11-14T20:43:22+0000",
"id": "10150336639081017"
}
This means that the debugger sees the site as it was a month ago, when the provider showed the classic default page you get when you buy a new domain, with "The domain is reserved" written on it (a page like this example).
Probably Facebook didn't receive the DNS update made when I published the site!
I also tried changing the IP address of my site again, but with no results.
I think the problem is the canonical tag in your head section (end of line 3):
<link href="http://bluemoon.thiellaconsulting.com/Default.aspx" rel="canonical" />
Facebook tries to scrape your canonical url - but in this case that url doesn't exist so you get a 'can't download' error.
If you switch that tag so it points to your current domain (or remove it altogether), you should allow Facebook to scrape the page and update its graph entry.
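A quick way to spot this kind of mismatch is to compare the host of the canonical link with the host of the page itself. The Python sketch below does that with the standard library HTML parser; the class and function names are hypothetical, and it checks only the first canonical link it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_host_matches(html, page_url):
    """Return True when there is no canonical link or its host equals the page's host."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return True
    return urlsplit(finder.canonical).hostname == urlsplit(page_url).hostname


head = '<head><link href="http://bluemoon.thiellaconsulting.com/Default.aspx" rel="canonical" /></head>'
print(canonical_host_matches(head, "http://www.tarocchibluemoon.com/"))
```

A canonical URL on another host is not wrong in itself, so a False result is only a hint to check whether that URL actually resolves.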