Set noindex for Google structured data - robots.txt

I'm using multiple blocks of structured data on my website:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Event",
"name": "Something",
"url": "http://www.example.com/?id=123"
}
</script>
This shows an additional bar under my website's search result that links directly to the page with more details about the event.
But when I provide links like http://www.example.com/?id=123, Google also shows them as normal links in the search results.
And if I set noindex for this page, Google will also refuse to list the events, won't it?
Does anybody know a solution?
Here's an image of what I mean:

Put a canonical link pointing to http://www.example.com/ on the http://www.example.com/?id=123 page. All pages with query parameters will then be merged into http://www.example.com/ for ranking purposes.
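As a minimal sketch of that suggestion, the tag below would go in the head of the http://www.example.com/?id=123 page. It uses only the URLs from the question; how Google then treats the event links is not something this snippet can guarantee.

```html
<!-- in the <head> of http://www.example.com/?id=123 -->
<link rel="canonical" href="http://www.example.com/">
```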

Related

How to describe a website's favicon using schema.org structured data?

Is there an accepted way to represent information about a website's favicon using https://schema.org? I am using image (to list here some of the icon variations that I generated using realfavicongenerator.net) with https://schema.org/WebSite in the following way:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "WebSite",
"url": "http://localhost:4000/",
"name": "WebSite Name",
"image":
[
"/android-chrome-512x512.png",
"/android-chrome-192x192.png",
"/favicon-194x194.png"
]
}
</script>
A web search led me to discover the favicon for the schema.org website, but nothing about how to convey information about a website's favicon using schema.org.
Check the description of Schema.org's logo property. As in your example, this property accepts a URL:
Values expected to be one of these types: ImageObject, URL
However, there is also a difference from your example:
Used on these types: Brand, Organization, Place, Product, Service
Note that WebSite, the type used in your example, is not in this list.
Additionally, you can use Google's guide for the logo property. Here you should pay attention to the following Google requirements:
logo (URL)
URL of a logo that is representative of the organization.
Additional image guidelines:
The image must be 112x112px, at minimum.
The image URL must be crawlable and indexable.
The image file format must be supported by Google Images.
From this, we can infer that Google expects a square image, although this is not stated explicitly.
Like any other image, the logo should also be compressed. Since its visual detail matters little, the compression can be aggressive; for example, I use a 65% compression level.
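A minimal sketch of how the logo could be declared on an Organization rather than a WebSite. The organization name and the example.com URLs are placeholders (assumptions, not taken from the question). The file name reuses one of the icons listed in the question, written as an absolute URL so that it can be crawled.

```html
<!-- "Organization Name" and the example.com URLs are hypothetical placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Organization Name",
  "url": "https://www.example.com/",
  "logo": "https://www.example.com/android-chrome-512x512.png"
}
</script>
```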

Duplicate or link to WebSite JSON-LD?

I'm replacing the microdata (itemscope et al) on our sites with JSON-LD. Do I need to declare the WebSite on every page, or can I place it once on the home page?
If the latter, will processors (by which I mean Google) tie each page to it automatically via the domain name, or is there some way to link to it? Given that "Linked Data" is right there in the name, I've found no examples that make use of it. They all replicate or embed the data directly in the thing that's linking.
For example, I want to link to our YouTube videos that we embed in articles, but Google doesn't understand a URL for the video property. If I expand it into a VideoObject, Google complains that I don't know the width, height, duration, etc. All that data is on youtube.com at the URL I'm specifying. Why can't it pull the video information itself?
Do I need to declare the WebSite on every page, or can I place it once on the home page?
From the perspectives of Schema.org and Linked Data, it’s perfectly fine (and I would say it’s even the best practice) to provide an item only once, and reference it via its URI whenever it’s needed.
In JSON-LD, this can be done with @id. For example:
<!-- on the homepage -->
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "WebSite",
"@id": "http://example.com/#site",
"hasPart": {
"@type": "WebPage",
"@id": "http://example.com/"
}
}
</script>
<!-- on another page -->
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "WebPage",
"@id": "http://example.com/foobar",
"isPartOf": {"@id": "http://example.com/#site"}
}
</script>
Whether Google actually follows these references is not clear (as far as I know, it’s undocumented)¹. It’s clear that their testing tool doesn’t show the data from referenced URIs, but that doesn’t have to mean much. At least their testing tool displays the URI (as "ID") in case one is provided.
If you want to provide a URL value for the video property, note that URL is not one of its expected values. While Schema.org still allows this (any property can have a text or URL value), it’s likely that some consumers will handle only expected values. It’s also perfectly fine to provide a VideoObject value if you only provide a url property. The fact that Google’s testing tool gives errors doesn’t mean that something’s wrong; it just means that Google won’t consider this video for their video-related rich results.
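A minimal sketch of the VideoObject variant described above, carrying only a url property. The Article type, the page URI, and the VIDEO_ID placeholder are assumptions for illustration. Google's testing tool would likely still report missing fields for this, as noted above.

```html
<!-- the Article @id and VIDEO_ID are hypothetical placeholders -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "@id": "http://example.com/foobar#article",
  "video": {
    "@type": "VideoObject",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID"
  }
}
</script>
```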
¹ But for the few rich result features Google offers, authors would typically not need to reference something from another page anyway, I guess. Referencing of URIs is typically done for other Semantic Web and Linked Data cases.

What does Google mean by "The full body of the article"

If I'm wanting to enable article rich snippets on a page using JSON, Google says to do this:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "NewsArticle",
"headline": "Article headline",
"alternativeHeadline": "The headline of the Article",
"image": ["thumbnail1.jpg", "thumbnail2.jpg"],
"datePublished": "2015-02-05T08:00:00+08:00",
"description": "A most wonderful article",
"articleBody": "The full body of the article"
}
</script>
Under articleBody it says to place The full body of the article. Does that literally mean the entire article from beginning to end?
Yes, it means the entire article. Search engines can use it to index the content of your article and let users search on it.
Also, avoid including any HTML tags in it.

Google improved search box within the search results not working

I followed Google's tutorial for the improved search box within the search results, as in the screenshot above.
I added this code to my site's front page:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "WebSite",
"url": "https://www.xxx.com/",
"potentialAction": {
"@type": "SearchAction",
"target": "https://www.xxx.com/search/site/{keys}",
"query-input": "required name=keys"
}
}
</script>
Am I doing something wrong? My site uses Drupal 7.
Looks correct (assuming that /search/site/strawberry successfully searches for "strawberry").
Note that Google is not displaying the Sitelinks Search Box for all sites/queries:
Search box not displaying? The sitelinks search box appears only for navigational queries and when relevant for users. Google algorithms use a variety of factors to determine when the box appears, including the information on the site and different types of navigational queries from Search users.
Edit: 2017-05-09 Updated Sitelinks Search Box Information and URL

Object Debugger Error Scraping Page ... near solution?

I have a very strange issue when sharing a page, probably connected to the DNS used by Facebook.
I usually share pages from my own sites with no problem. Only on one new site can I not share any page correctly.
Where is the problem?
When I share a page from this new site (www.tarocchibluemoon.com), I expect it to show an image, a page title, etc.
However, none of the images from my page are offered.
I used the debugger at developers.facebook.com/tools/debug
and entered the site http://www.tarocchibluemoon.com, which produced a lovely "Critical Errors must be fixed" message.
Looking deeper in Graph API I see:
{
"url": "http://www.tarocchibluemoon.com/",
"type": "website",
"title": "www.tarocchibluemoon.com",
"image": [
{
"url": "http://www.tarocchibluemoon.com/images/domain_reserviert.gif"
}
],
"updated_time": "2011-11-14T20:43:22+0000",
"id": "10150336639081017"
}
This means that the debugger sees the site as it was a month ago, when the provider still showed the default placeholder page you get when you buy a new domain, with the text "The domain is reserved" (a page like this example).
Facebook probably didn't receive the DNS update made when I published the site!
I also tried changing my site's IP address again, but with no result.
I think the problem is the canonical tag in your head section (end of line 3):
<link href="http://bluemoon.thiellaconsulting.com/Default.aspx" rel="canonical" />
Facebook tries to scrape your canonical URL, but in this case that URL doesn't exist, so you get a 'can't download' error.
If you change that tag so it points to your current domain (or remove it altogether), Facebook should be able to scrape the page and update its graph entry.
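As a sketch of the first option, the canonical tag on the home page could look like the line below. It assumes the home page URL from the question is the one you want treated as canonical; inner pages would each point to their own URL on the same domain.

```html
<!-- assumes http://www.tarocchibluemoon.com/ is the intended canonical URL -->
<link href="http://www.tarocchibluemoon.com/" rel="canonical" />
```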