How to describe a website's favicon using schema.org structured data?

Is there an accepted way to represent information about a website's favicon using https://schema.org? I am using the image property (to list some of the icon variations I generated with realfavicongenerator.net) on https://schema.org/WebSite, like this:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "http://localhost:4000/",
  "name": "WebSite Name",
  "image": [
    "/android-chrome-512x512.png",
    "/android-chrome-192x192.png",
    "/favicon-194x194.png"
  ]
}
</script>
A web search turned up the favicon of the schema.org website itself, but nothing about how to convey information about a website's favicon using schema.org.

Check the description of the logo property on Schema.org. As with the image property in your example, it has:
Values expected to be one of these types: ImageObject
However, there is also a difference from your example:
Used on these types: Brand, Organization, Place, Product, Service
You can see that the WebSite type you used in your example is not among them.
Additionally, you can use Google's guide for the logo property. There, pay attention to the following Google requirements:
logo URL
URL of a logo that is representative of the organization.
Additional image guidelines:
The image must be 112x112px, at minimum.
The image URL must be crawlable and indexable.
The image file format must be supported by Google Images.
From this info we can conclude that Google requires a square image, although this is not stated explicitly.
Like any other image, the logo should also be compressed; given its minor visual importance, compression can be aggressive. I use a 65% compression level, for example.
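For illustration, here is a minimal sketch of how the logo property could be used on an Organization instead of the image property on WebSite; the domain, organization name, and icon file name are placeholders, not values from the question:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "url": "https://example.com/",
  "name": "Organization Name",
  "logo": {
    "@type": "ImageObject",
    "url": "https://example.com/android-chrome-512x512.png",
    "width": 512,
    "height": 512
  }
}
</script>
A square 512x512 icon like this comfortably clears the 112x112px minimum from Google's guidelines.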

Related

Set noindex for Google structured data

I'm using multiple blocks of structured data on my website:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Something",
  "url": "http://www.example.com/?id=123"
}
</script>
This shows an extra line under my site's search results that links directly to the page with more details about the event.
But if I provide links like http://www.example.com/?id=123, Google will also show this link as a normal link in the search results.
And if I set noindex for this page, won't Google also refuse to list the events?
Does anybody know a solution?
Put a canonical link with the value http://www.example.com/ on the http://www.example.com/?id=123 page. All pages with query parameters will then be consolidated into http://www.example.com/ as far as rankings are concerned.
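The canonical tag on http://www.example.com/?id=123 would then look roughly like this (using the URLs from the question):
<link rel="canonical" href="http://www.example.com/">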

Duplicate or link to WebSite JSON-LD?

I'm replacing the microdata (itemscope et al) on our sites with JSON-LD. Do I need to declare the WebSite on every page, or can I place it once on the home page?
If the latter, will processors (by which I mean Google) tie each page to it automatically via the domain name, or is there some way to link to it? Even though "Linked Data" is right there in the name, I've found no examples that actually make use of it; they all replicate or embed the data directly in the thing that's linking.
For example, I want to link to our YouTube videos that we embed in articles, but Google doesn't understand a URL for the video property. If I expand it into a VideoObject, Google complains that the width, height, duration, etc. are missing. All that data is on youtube.com at the URL I'm specifying. Why can't it pull the video information itself?
Do I need to declare the WebSite on every page, or can I place it once on the home page?
From the perspectives of Schema.org and Linked Data, it’s perfectly fine (and I would say it’s even the best practice) to provide an item only once, and reference it via its URI whenever it’s needed.
In JSON-LD, this can be done with @id. For example:
<!-- on the homepage -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "@id": "http://example.com/#site",
  "hasPart": {
    "@type": "WebPage",
    "@id": "http://example.com/"
  }
}
</script>
<!-- on another page -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebPage",
  "@id": "http://example.com/foobar",
  "isPartOf": {"@id": "http://example.com/#site"}
}
</script>
Whether Google actually follows these references is not clear (as far as I know, it’s undocumented)¹. It’s clear that their testing tool doesn’t show the data from referenced URIs, but that doesn’t have to mean much. At least their testing tool displays the URI (as "ID") in case one is provided.
If you want to provide a URL value for the video property, note that URL is not one of its expected values. While Schema.org still allows this (any property can have a text or URL value), it's likely that some consumers will handle only the expected values. It's also perfectly fine to provide a VideoObject value even if you only specify a url property (see the sketch at the end of this answer). The fact that Google's testing tool gives errors doesn't mean that something's wrong; it just means that Google won't consider this video for their video-related rich results.
¹ But for the few rich result features Google offers, authors would typically not need to reference something from another page anyway, I guess. Referencing of URIs is typically done for other Semantic Web and Linked Data cases.
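As a rough sketch of the url-only VideoObject mentioned above (the article headline and YouTube URL are placeholders, not from the question):
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "Example article with an embedded video",
  "video": {
    "@type": "VideoObject",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID"
  }
}
</script>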

Google improved search box within the search results not working

I followed Google's tutorial for the improved search box within the search results (the sitelinks search box).
I added this code to my front page:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "url": "https://www.xxx.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://www.xxx.com/search/site/{keys}",
    "query-input": "required name=keys"
  }
}
</script>
Am I doing something wrong? My site uses Drupal 7.
Looks correct (assuming that /search/site/strawberry successfully searches for "strawberry").
Note that Google is not displaying the Sitelinks Search Box for all sites/queries:
Search box not displaying? The sitelinks search box appears only for navigational queries and when relevant for users. Google algorithms use a variety of factors to determine when the box appears, including the information on the site and different types of navigational queries from Search users.
Edit (2017-05-09): Updated sitelinks search box information and URL.

Object Debugger Error Scraping Page ... near solution?

I have a very strange issue while sharing a page, probably connected to the DNS information used by Facebook.
I usually share pages from my own sites with no problem, but on one new site I cannot correctly share any page.
Where is the problem?
When I try to share a page from this new site (www.tarocchibluemoon.com), I expect to share an image, a page title, etc.
However, I don't see any of the images from my page offered for selection.
I used the debugger at developers.facebook.com/tools/debug
and typed in http://www.tarocchibluemoon.com, only to get a lovely "Critical Errors must be fixed".
Looking deeper in the Graph API, I see:
{
  "url": "http://www.tarocchibluemoon.com/",
  "type": "website",
  "title": "www.tarocchibluemoon.com",
  "image": [
    {
      "url": "http://www.tarocchibluemoon.com/images/domain_reserviert.gif"
    }
  ],
  "updated_time": "2011-11-14T20:43:22+0000",
  "id": "10150336639081017"
}
This means the debugger sees the site as it was a month ago, when the provider was still showing the classic default page you get after buying a new domain, with "The domain is reserved" written on it.
Facebook probably didn't receive the DNS update made when I published the site!
I also tried changing my site's IP address again, but with no results.
I think the problem is the canonical tag in your head section (end of line 3):
<link href="http://bluemoon.thiellaconsulting.com/Default.aspx" rel="canonical" />
Facebook tries to scrape your canonical URL, but in this case that URL doesn't exist, so you get a 'can't download' error.
If you switch that tag so it points to your current domain (or remove it altogether), you should allow Facebook to scrape the page and update its Graph entry.
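For instance, assuming the page really lives on www.tarocchibluemoon.com and keeps the same path (the exact path is a guess), the corrected tag might read:
<link href="http://www.tarocchibluemoon.com/Default.aspx" rel="canonical" />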

From where does Google get the abstract for each of its site results, that it displays on its search result page?

I am working on a project in which I have to search for terms on a search engine and then cluster the results by their contextual sense, so I have to treat each result as a document. Unfortunately, the data present alongside each result on the results page is too little for clustering. Hence, I wanted to know where the search engines get the abstract for each result that they show. If I could get that entire abstract, I could cluster the results by treating them as separate documents.
Where does Google get the abstract?
For example: if you search for "1000 Mile" on Google, the second result shows the following abstract:
"The women's 1000 Mile Collection is based on classic designs and reflects Wolverine's long heritage of crafting quality footwear. Complementing these classics ..."
This abstract is not present in the meta tags of the page.
Where does Google find this data?
Thanks.
From Does Google use the Meta Description Tag for Description of Page?
Google will choose your search results snippets from the following places (not necessarily in this order):
The page's Meta Description tag
The page's Open Directory Project (ODP) Listing
Page content relevant to the search query
If you do not want Google to use the ODP listing's description then you can tell them not to do so with the following Meta tag:
<meta name="robots" content="NOODP">
If you want to encourage Google to use your Meta Description tag then make sure it is unique to each page. Also make sure it contains an accurate description of the page's content.
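For instance, a page-specific tag might look like this (the content text here is purely illustrative):
<meta name="description" content="A short, unique summary of what this specific page is about.">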
In the absence of an ODP description and Meta Description tag, Google will use a portion of the page's text as the description. This text will contain the closest matches to the search query. I have not seen any official limit on how long this can be, but a couple of sentences seems about right.
On a related note, if you don't want a snippet to be shown with a particular page you can use the following Meta tag to prevent one from being shown:
<meta name="robots" content="nosnippet">
See this blog post for Google's tips on using the meta description tag.
According to this site, "The meta description should typically be at most 145 to 150 characters in length as these are the maximum number of characters typically displayed at Yahoo! and Google, respectively."
That site is Flash-based, and Google can index Flash content, so given that the snippet isn't in the HTML source of the page as you point out, nor is it in the cached version of the page, I'm guessing that it's somewhere in the Flash movie.
It's kind of arbitrary that the snippet mentions 'The women's 1000 Mile Collection' while the site link itself is to the parent category of 1000 mile, not just women's, so I'm guessing here that gathering snippet-friendly metadata from a Flash site is an imprecise science. That's my best guess.
In this Google Webmaster blog post, they explain how they use external text or HTML files loaded into the Flash movie, and in one of the comments Jonathan Simon says:
"We try our best to crawl Flash content but the results can sometimes be less than ideal. You are only seeing a title in the search results for your site because that's the only bit of HTML text that you have outside of your Flash content. You could add a Meta description element to offer more information in HTML. You could also add some other text that's not a part of your Flash content. Just doing this should improve the snippet you see associated with your site in the search results."