Schema.org: Use Microdata, RDFa or JSON-LD?

Are there any advantages or disadvantages in using a specific format for http://www.schema.org/Product? Something like "search engines understand Microdata better than JSON-LD"? I would like to use JSON-LD because it doesn't clutter your HTML code, but I'm not sure whether Microdata would work better for search engines.

There is no general answer, it depends on the consumer of the data.
A specific consumer supports a specific set of syntaxes, and might or might not recommend a subset of these supported syntaxes.
Because search engines usually try to make sure they don't get led astray (e.g., a page about X claiming via its Schema.org markup to be about Y), it seems natural that they would prefer a syntax that couples the Schema.org metadata to the visible content of the page (in HTML5, that would be Microdata or RDFa); for the same reasons that many meta tags are dead for SEO.
However, this is not necessarily always the case. Google, for example, recommends the use of JSON-LD for a few of their features:
Promote Critic Reviews:
Though we strongly recommend using JSON-LD, Google can also read schema.org fields embedded in a web page with the microdata or RDFa standards.
Sitelinks Search Box:
We recommend JSON-LD. Alternatively, you can use microdata.
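To make the difference concrete, here is a minimal sketch of the same Product expressed both ways; the product name and description are made-up placeholders:

```html
<!-- JSON-LD: lives in its own script block, separate from the visible markup -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "description": "A placeholder product used only for illustration."
}
</script>

<!-- Microdata: the same data woven into the visible HTML as attributes -->
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Example Widget</h1>
  <p itemprop="description">A placeholder product used only for illustration.</p>
</div>
```

A consumer that supports both syntaxes should extract the same Product either way; the choice mainly affects how closely the metadata is tied to the markup you already maintain.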

Related

Will a custom field in micro data (schema.org) make the whole microdata wrong?

I want to use JSON-LD micro-data from schema.org to reference images on my website, but at the same time retrieve the JSON-LD object in my JavaScript to use it for other things. The problem is that I want to add custom fields that do not match any type in Schema.org. My question is: will search engines just ignore the fields they don't recognize, or discard the whole micro-data?
Short answer: yes, they'll ignore properties they don't define, and it's quite normal to use JSON-LD for other purposes, such as driving your UI. That said, it's best if these properties and types come from a known vocabulary, or at least resolve to one of your own. It's always good to test your data patterns first, using any of the online tools available.
Also, it's not "JSON-LD micro-data": JSON-LD and Microdata are two different things, both of which (along with RDFa) can be used to provide schema.org markup. JSON-LD is contained in a script element, while the others are expressed using HTML attributes.
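A minimal sketch of what that can look like; the element id, image URL, and the custom property name are hypothetical:

```html
<script type="application/ld+json" id="image-data">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/photo.jpg",
  "name": "Example photo",
  "myCustomField": "anything your own UI needs"
}
</script>

<script>
  // Read the same block back for your own scripts; search engines simply
  // skip properties they don't define, such as myCustomField above.
  const data = JSON.parse(
    document.getElementById("image-data").textContent
  );
  console.log(data.contentUrl, data.myCustomField);
</script>
```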

Is there a relationship between schema.org and WAI-ARIA?

Is there a relationship between schema.org and WAI-ARIA?
Is one a subset of the other? Are they different? Are there any commonalities?
No, there is no relationship. Neither is a subset of the other.
Schema.org is intended to provide search engines with additional information about content, via microdata, RDFa, and JSON-LD; within the HTML markup itself you would typically use microdata. You can read more on using microdata at schema.org. There is no formal standards body behind it; it is defined by the major search engines.
ARIA (Accessible Rich Internet Applications) is a bridging technology that allows authors to add additional information to HTML so that assistive technology can make better use of it. Ideally it will go away as browsers catch up. It has no bearing on search engines. It is maintained by W3C, where you can read an overview on ARIA.
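To illustrate how the two coexist without interacting (the organization name below is a placeholder):

```html
<!-- itemscope/itemtype/itemprop are schema.org microdata, read by search
     engines; role and aria-label are WAI-ARIA, read by assistive
     technology. Neither set of attributes affects the other. -->
<div role="navigation" aria-label="Site header">
  <span itemscope itemtype="https://schema.org/Organization">
    <span itemprop="name">Example Corp</span>
  </span>
</div>
```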

Is there such a thing as RESTful URLs

I'm new to REST. I think I'm understanding most of it. However, whether a particular style of URL has anything to do with REST is still confusing. Part of what I read on the web talks about the URL style and in other sources I have seen it argued that the URL style has absolutely nothing to do with REST. So what's the correct answer? And if the answer is that the URL style has nothing to do with it, then why do so many frameworks that "support" REST enforce (or at least support) one style of URL? I would just like to put this issue to rest (bad pun intended).
I think there are no strict rules that require some fixed URL style, and you can use different styles. But it is better if you use lowercase characters and meaningful words that at least give some idea of the operation being performed.

Can we measure the complexity of a web site?

I am familiar with using cyclomatic complexity to measure software. However, for web sites, do we have any kind of metric to measure the complexity of a website?
If you count the HTML tags in the displayed HTML pages as "operators", you can compute a Halstead number for each web page.
If you inspect the source code that produces the web pages, you can compute complexity measures (Halstead, McCabe, SLOC, ...) of those. To do that, you need tools that can compute such metrics from the web page sources.
Our SD Source Code Search Engine (SCSE) is normally used to search across large code bases (e.g., the web site code), even if the code base is a set of mixed languages (HTML, PHP, ASP.NET, ...). As a side effect, the SCSE happens to compute Halstead, McCabe, SLOC, comment counts, and a variety of other basic measurements for each file it can search (has indexed).
Those metrics are exported as an XML file; see the web link above for an example.
This would give you a rough but immediate ability to compute web site complexity metrics.
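A minimal sketch of the tag-counting idea, placed at the end of the body of the page you want to measure. The answer above only says to treat tags as operators; treating attribute values and visible text words as the "operands" is my own assumption here, and the Halstead volume used is V = N * log2(n), with N the total token count and n the distinct token count:

```html
<script>
  // Rough Halstead-style measure for the page currently loaded.
  // Assumption: tag names are the "operators"; attribute values and
  // visible text words stand in for the "operands".
  function halsteadForPage() {
    const operators = [];
    const operands = [];
    for (const el of document.querySelectorAll("*")) {
      operators.push(el.tagName.toLowerCase());
      for (const attr of Array.from(el.attributes)) operands.push(attr.value);
    }
    for (const word of document.body.innerText.split(/\s+/)) {
      if (word) operands.push(word);
    }
    const N1 = operators.length, n1 = new Set(operators).size;
    const N2 = operands.length,  n2 = new Set(operands).size;
    const length = N1 + N2;                         // program length
    const vocabulary = n1 + n2;                     // program vocabulary
    const volume = length * Math.log2(vocabulary);  // Halstead volume
    return { n1, n2, N1, N2, volume };
  }
  console.log(halsteadForPage());
</script>
```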
Though the question was asked 6 months ago...
Unless your website is a 100% static site with no JavaScript at all, it is powered by a back end programmed in some programming language. So, indirectly, the complexity measures that afflict the back-end programming will also affect the complexity of maintaining the site.
Typically, I've observed a correlation between the maintainability and quality (or lack thereof) of the web pages themselves and the quality (or lack thereof) exhibited, through software metrics, in the back-end programming. Don't quote me on that, and take it with a grain of salt; it is purely an observation I've made in the gigs I've worked on.
If your site - dynamic content or not - also has JavaScript in it, then that is also source code that demonstrates measurable attributes in terms of software complexity metrics. And since JavaScript is typically used for rendering HTML content, it stands as a possibility (but not a certainty) that atrocious, hard-to-maintain JavaScript will render similarly atrocious, hard-to-maintain HTML (or be embedded in atrocious, hard-to-maintain markup).
For a completely static site, you could still devise some type of metrics, though I'm not aware of any that are publicized.
Regardless, a good web site should have uniform linking.
It should provide uniform navigation.
Also, HTML pages shouldn't be replicated or duplicated, and there should be little to no dead links.
Links within the site should be relative (either to their current location or to the logical root '/') and not absolute. That is, don't hard-code the domain name.
URI naming patterns should be uniform (preferably lower case). Mixing cases in links makes no sense: the scheme and host of a URL are case-insensitive anyway, and paths may be mapping to stuff in actual filesystems that is case-sensitive. Which leads to the next point.
URIs that represent phrases should be uniform (either use - or _ to separate words, but not both, and certainly no spaces). Avoid camel case (see previous comment on lower casing.)
I'm not aware of any published or advocated software-like metrics for web sites, but I would imagine that if there are, they might try to measure some of the attributes I mentioned above.
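As a small illustration of the linking and naming points above (all paths hypothetical):

```html
<!-- Relative, lowercase, hyphen-separated links -->
<a href="/products/blue-widget">Blue widget</a>
<a href="/about-us/contact">Contact</a>

<!-- Avoid: hard-coded domain, mixed case, mixed separators -->
<a href="http://www.Example.com/Products/Blue_widget-Page">Blue widget</a>
```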
I suppose you could consider "hub scores" to be a complexity metric, since they take into account how many external sites are referenced. "Authoritative sources in a hyperlinked environment" by Jon Kleinberg discusses them and is a great paper on the topic.

What criteria should I take into consideration when allowing users to enter rich text on my website?

I've worked with several different types of "user-generated content" sites: wikis, a message board, blogs... These systems can differ greatly: a blog post editor allows more control over presentation than that for comments on the blog post, a wiki topic editor encourages wiki links over raw URLs, etc.
However, one key design decision is common to each: should I provide the user with a simplified markup language such as Wikitext, Markdown or BBCode, forcing users to learn it, or should I give them a WYSIWYG editor like CKEditor or TinyMCE and filter or transform the resulting HTML behind the scenes?
There was a time when I thought this was a simple matter of identifying my intended audience: tech-minded users get markup, non-technical users get WYSIWYG. In practice, this hasn't worked out all that well: occasional users struggle with markup, and the WYSIWYG editors provide at best a leaky abstraction over the underlying HTML.
So with my initial confidence thoroughly crushed, I come looking for advice:
What factors should I be taking into account when making this decision?
Have simple markup systems become commonplace enough that I can rely on most users having at least some familiarity with them?
...Or should I abandon them as merely a relic of the past, and work on finding ways to make WYSIWYG work more effectively...?
I'm not looking to go back and tear apart what I've already done. For better or worse, these systems are working, their few users comfortable or at least competent by now. But it would be nice to have some better guidelines when putting together future designs.
One approach that seems to work fairly well is the use of Markdown, as done here on SO. Stupid and/or lazy people (with apologies to all who are) can simply throw text in the box; it comes out looking messy, but it's mostly there and readable. People who care about how their text looks can do some simple things that are for the most part almost intuitive (like leaving blank lines between paragraphs, or putting asterisks or numbers before list items), and it Just Works™.
This is Good Enough™ for a lot of applications and people. Some of the glitzier sites, such as Google Blogs, give you your choice (changeable at the click of a button) of editing raw HTML or using a WYSIWYG editor (that fails just often enough that I usually opt for raw HTML). In theory, you could even give your users 3 alternatives, such as Markdown, HTML and WYSIWYG; but at some point you'll be wondering why you even bothered. Some users will always struggle with some aspect of the interface and they'll blame you. I believe in finding a happy medium and not bothering to make everybody happy.
From my point of view, the most important considerations are those of security. If you allow raw HTML, your users can insert spam and malware and basically hijack your site for their purposes, so you have to carefully limit what's allowed. Another consideration is that if you allow, e.g., H1 headers, people can take up a lot of space and attention with posts that should really be subordinate. If you allow CSS (including style attributes in HTML tags), then again there are ways to deface your "real" content. Another big problem stems from unclosed or unmatched tags. These are the really serious problems, and you want to err on the side of strictness to avoid them.
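A minimal sketch of that kind of whitelist filtering, using the browser's DOMParser purely for illustration; the allowed-tag list is a made-up example, and in a real system the same filtering must also happen on the server, since anything done client-side can be bypassed:

```html
<script>
  // Hypothetical allow-list: everything else is stripped or unwrapped.
  const ALLOWED_TAGS = new Set(["P", "EM", "STRONG", "A", "UL", "OL", "LI"]);

  function sanitize(html) {
    // DOMParser builds a detached document; scripts inside it do not run.
    const doc = new DOMParser().parseFromString(html, "text/html");
    for (const el of Array.from(doc.body.querySelectorAll("*"))) {
      if (el.tagName === "SCRIPT" || el.tagName === "STYLE") {
        el.remove();                       // drop dangerous content outright
      } else if (!ALLOWED_TAGS.has(el.tagName)) {
        el.replaceWith(...el.childNodes);  // unwrap: keep the text, lose the tag
      } else {
        // Strip attributes, keeping only plain http(s) hrefs on links.
        for (const attr of Array.from(el.attributes)) {
          const keep = el.tagName === "A" && attr.name === "href" &&
                       /^https?:/i.test(attr.value);
          if (!keep) el.removeAttribute(attr.name);
        }
      }
    }
    return doc.body.innerHTML;
  }

  // Example: inline event handlers and styles are removed, the text survives.
  console.log(sanitize('<p style="font-size:80px" onclick="x()">Hi</p>'));
</script>
```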
"What factors should I be taking into account when making this decision?" What do your customers want? Can you not have a 'fall back' kind of system where the 'simplified' WYSIWYG can be used until they need the added features of raw markup? What kinds of things do the most users use most often? What features are used less often but, when they are needed, you customers cannot live without?
"Have simple markup systems become commonplace enough that I can rely on most users having at least some familiarity with them?" For people using wikis and blogs, I think the answer is yes. Even commenters on blogs to some extent for the simplest things but again, I think you should let them do markup in-line if they are able (or some common sub set of markup) and have the option of more power if they need it.
"..Or should I abandon them as merely a relic of the past, and work on finding ways to make WYSIWYG work more effectively...?" I would not take this all on at once. Work from a solid kernel of functionality and work outward to a complete system.