What is the difference between the Schema.org properties isPartOf and hasPart and when to use the one instead of the other?
As noted on their pages, they are inverse properties.
As an example, let’s take a webpage that is part of a website. You could then state one of these:
WebSite hasPart WebPage
WebPage isPartOf WebSite
It doesn’t matter which one you choose. (But there might of course be consumers that only recognize one of these properties.)
Note: Most of the time, Schema.org doesn’t define an inverse equivalent for a property. For example, there is author, but no authorOf. This is because you can use every property for both directions, with the help of the syntax:
RDFa:
rev
(example)
Microdata:
itemprop-reverse (non-standard, which is one of the reasons to prefer RDFa over Microdata)
(example)
JSON-LD:
@reverse
(example)
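To make the two directions concrete, here is a minimal sketch that builds the JSON-LD as plain Python dictionaries. The example.com URLs and the person are placeholders I made up; only isPartOf, hasPart, author and the JSON-LD keyword @reverse come from Schema.org and JSON-LD themselves.

```python
import json

SITE_ID = "https://example.com/"       # hypothetical site URL
PAGE_ID = "https://example.com/about"  # hypothetical page URL

# Direction 1: the page states that it is part of the site.
page_is_part_of = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": PAGE_ID,
    "isPartOf": {"@type": "WebSite", "@id": SITE_ID},
}

# Direction 2: the site states that it has the page as a part.
site_has_part = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "@id": SITE_ID,
    "hasPart": {"@type": "WebPage", "@id": PAGE_ID},
}

# No inverse property exists for author, so the statement
# "this page has this person as author" is made from the person's
# side with the JSON-LD keyword @reverse.
person_reverse_author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/people/alice",  # hypothetical person
    "@reverse": {
        "author": {"@type": "WebPage", "@id": PAGE_ID},
    },
}

print(json.dumps(person_reverse_author, indent=2))
```

The first two documents describe the same relationship; which one you publish mostly depends on which item you are already marking up.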
I am reading up on Schema.org to be able to add the markup to a website I am working on. However, I'm already running into something I don't understand.
In the example for Product, it shows you should have a div whose itemprop is of type offers, but in the Product definition at http://schema.org/Product, I don't see offers as a property of Product at all.
If you look at http://schema.org/offers, it says offers is a property of Thing, but I don't see offers listed as a property of Thing at http://schema.org/Thing. What am I misunderstanding here?
Product does define the offers property. If you don’t see the offers property in the first table on that page (under the table heading "Properties from Product"), you are probably affected by a (known) bug. It typically works again when reloading the page later.
offers doesn’t have Thing as domain (but: AggregateOffer, CreativeWork, Event, MenuItem, Product, Service, Trip). If you are referring to the line "Thing > Property > offers", it doesn’t mean that the offers property is defined for/at Thing, it means that the offers property is a Thing. You can ignore that detail. What matters is the domain ("Used on these types") and the range ("Values expected to be one of these types") of a property.
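As a rough illustration of domain and range, the following sketch checks a JSON-LD-style dictionary against the domain listed above and against the range shown on the offers page (Offer or Demand). The function name and the sample product are my own, and the check deliberately ignores subtypes (a Book, for instance, is a CreativeWork and would also be fine).

```python
# Domain of offers, as listed on the property page ("Used on these types").
OFFERS_DOMAIN = {
    "AggregateOffer", "CreativeWork", "Event", "MenuItem",
    "Product", "Service", "Trip",
}
# Range of offers ("Values expected to be one of these types").
OFFERS_RANGE = {"Offer", "Demand"}


def offers_usage_is_expected(item: dict) -> bool:
    """Return True if the item type and its offers value match domain/range.

    Simplification: only exact type names are compared, subtypes are ignored.
    """
    value = item.get("offers")
    if not isinstance(value, dict):
        return False
    return item.get("@type") in OFFERS_DOMAIN and value.get("@type") in OFFERS_RANGE


# Hypothetical product used only for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example widget",
    "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
}

print(offers_usage_is_expected(product))
```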
It's perhaps worth highlighting the distinction between "types" and "properties". The vocabulary is a hierarchical taxonomy of the tangible and intangible things around us, which it calls types. In microdata, these use the itemtype attribute.
Properties describe the attributes of and relationships between the types, and in microdata use the itemprop attribute.
So, the type Product has the property offers (it's definitely there, you must be missing it [1]). A product can offer various things, one of which is the possibility of having some right to own or use it, which is described by the type Offer.
The property offers is indeed listed under Thing, but Thing is at the very top of the taxonomy, i.e. everything the ontology describes is a "thing", tangible or otherwise. So Thing is then broken down into more specific types of things:
Thing
- Intangible
-- Offer
-- Property
--- offers
So offers is a Thing in the same way that you and I are things: it's true, but we could be a lot more specific. In this case, offers is of the type Property, which in turn is a more specific type of Intangible, which is a Thing.
[1] Image of the "offers" property under /Product.
As there are a limited number of options available in Schema.org, I wonder what the best schema to use is when the content doesn't fit into any of the other categories. For example, if I'm writing about a Car (assuming there is no car schema, as I've not seen one), should I use the Article or WebPage schemas?
Official documentation suggests three options:
If you publish content of an unsupported type, you have three options:
Do nothing (don't mark up the content in any way). However, before you decide to do this, check to see if any of the types supported by schema.org - such as reviews, comments, images, or breadcrumbs - are relevant.
Use a less-specific markup type. For example, schema.org has no "Professor" type. However, if you have a directory of professors in your university department, you could use the "person" type to mark up the information for every professor in the directory.
If you are feeling ambitious, use the schema.org extension system to define a new type.
Also, if you do not explicitly declare the type of a web page, it is considered to be of type http://schema.org/WebPage, which is the most general type that you can use in this case.
Quote source
(Schema.org has a type for cars, Car, which is a Product. I’m using a parrot as example in this answer.)
You might want to differentiate between the thing the page is about and the page.
You can mark up your page with WebPage, but that doesn’t convey what the page is about / what it contains. To denote that, you need another item that can be used as value for the about / mainEntity property.
If Schema.org doesn’t offer a specific type, go up in the type hierarchy. There’s always a type that works: Thing. Or in other words: start at Thing and go down until you find the most specific type. See my answer on Webmasters SE with more details.
So a page (WebPage) about a specific parrot (Thing) could be marked up like this:
<body typeof="schema:WebPage">
<article property="schema:mainEntity" typeof="schema:Thing">
</article>
</body>
And if possible, it can be a good idea to use suitable specific types from other vocabularies (e.g., from animal or even parrot ontologies) in addition to the Schema.org types. For example, you could use the Parrot type from DBpedia:
<body typeof="schema:WebPage" prefix="dbpedia: http://dbpedia.org/resource/">
<article property="schema:about" typeof="schema:Thing dbpedia:Parrot">
</article>
</body>
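If you publish JSON-LD instead of RDFa, a roughly equivalent structure could be built like this. It is only a sketch: the helper name, the page URL and the parrot's name are placeholders, and the dbpedia prefix simply mirrors the one declared in the RDFa snippet.

```python
import json


def page_about(page_url: str, entity_types: list, entity_name: str) -> dict:
    """Build a WebPage item whose about value carries one or more types."""
    return {
        "@context": [
            "https://schema.org",
            # Same prefix as in the RDFa example.
            {"dbpedia": "http://dbpedia.org/resource/"},
        ],
        "@type": "WebPage",
        "@id": page_url,
        "about": {"@type": entity_types, "name": entity_name},
    }


# Hypothetical page about a specific parrot.
doc = page_about(
    "https://example.com/parrots/polly",
    ["Thing", "dbpedia:Parrot"],
    "Polly",
)
print(json.dumps(doc, indent=2))
```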
I have seen in a post that the slash-based mechanism for creating new extensions in Schema.org is outdated.
I am using Microdata and would prefer to stick to it across my site.
What is the new way to create a new extension?
For example, I want to create a new extension for MedicalTourism under the category TravelAgency. Previously, it would have been:
http://schema.org/TravelAgency/MedicalTourism
What is the new way?
And what would the code look like?
You may still use Schema.org’s "slash-based" extension mechanism. It’s "outdated", but not invalid.
But it’s not (and never was) a good idea to use this mechanism if you want other consumers to understand or make special use of your extensions.
In some cases you could use Schema.org’s Role type, which allows you to give some additional data about a property, but not about types.
Alternatives
Propose new types/properties: If they are useful and the sponsors agree, they might get added to the Schema.org vocabulary at some point.
Use an existing vocabulary that defines types/properties for your use case (or create a new vocabulary if you don’t find one):
Either instead of Schema.org,
or in addition to Schema.org (while this works nicely with RDFa, Microdata is pretty limited: you’d have to use Schema.org’s additionalType property for additional types and full URIs for additional properties).
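For the Microdata case, the additionalType route might look like the markup produced by this small sketch. The vocabulary URI under example.com is invented for illustration; you would replace it with a type from a real vocabulary (or your own), and the function name is mine as well.

```python
from html import escape


def travel_agency_microdata(name: str, additional_type: str) -> str:
    """Render a TravelAgency item that points to an extra type URI."""
    return (
        '<div itemscope itemtype="http://schema.org/TravelAgency">\n'
        # additionalType takes a URL, so a link element is used for it.
        f'  <link itemprop="additionalType" href="{escape(additional_type, quote=True)}">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        '</div>'
    )


# Hypothetical type URI; not part of Schema.org.
markup = travel_agency_microdata(
    "Example Medical Travel",
    "https://example.com/vocab/MedicalTourism",
)
print(markup)
```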
In writing a scraper, we typically use some kind of selector to identify particular nodes of interest. Ideally the selectors should continue to work even as the page changes over time. A lot of the common approaches like grabbing nodes by id are fragile on frequently updated pages and impossible on some nodes. I'm trying to find good algorithms for generating robust selectors, but since there doesn't seem to be a standard terminology for this problem, it's hard to find everything that's out there.
Here are the selector DSLs I already know.
XPath selectors - Implemented everywhere from JS to the popular Python and Ruby scraping libraries.
CSS selectors - Found in many of the places where you can find XPath selectors.
High level selectors - Here I'll give the example of Chickenfoot, which allows users to write click("begin tutorial") to find a link with the text "Begin Tutorial." Usually these are implemented on top of XPath and CSS selectors. I'd love to find out about more members of this language family.
Visual selectors - This would be the approach taken by, for instance, Sikuli, which makes it appear as though the program is calling a function on a screengrab of the relevant node. I don't know any web-specific instances of this approach, but I imagine there are some.
Here are the selector generation algorithms I already know. By a selector generation algorithm I mean an algorithm that takes a node as input and produces a robust selector as output.
iMacros: Finds all elements with the same node type and text as the target element, then finds the target element's index in this list. Uses the node type, text, and index as the selector. Also includes id for forms and form elements.
CoScripter: Uses element's text if available. If not, uses preceding text.
Selenium: Uses id where available. Uses various other attributes otherwise, such as image alt text, links' displayed texts, buttons' displayed texts.
Wargo System: Uses element text.
Many systems: Use the XPath from the root to the target node, or some suffix of that XPath.
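To show what the first of these looks like in practice, here is a minimal sketch of a selector built from node type, text and index, as described for iMacros above. It is not iMacros code: the function names, the 1-based index and the use of Python's xml.etree.ElementTree on a well-formed snippet are all my assumptions.

```python
import xml.etree.ElementTree as ET


def _matches(root, tag, text):
    """All elements with the given tag whose own text equals `text`."""
    return [el for el in root.iter(tag) if (el.text or "").strip() == text]


def text_index_selector(root, target):
    """Describe `target` as (tag, text, 1-based index among equal matches)."""
    text = (target.text or "").strip()
    candidates = _matches(root, target.tag, text)
    position = next(i for i, el in enumerate(candidates) if el is target)
    return (target.tag, text, position + 1)


def resolve(root, selector):
    """Find the element a selector points at, or None if it no longer exists."""
    tag, text, index = selector
    candidates = _matches(root, tag, text)
    return candidates[index - 1] if index <= len(candidates) else None


page = ET.fromstring(
    "<body><a>Next</a><p>intro</p><div><a>Next</a><a>Back</a></div></body>"
)
second_next = page.findall(".//a")[1]
selector = text_index_selector(page, second_next)
print(selector)  # ('a', 'Next', 2)
```

The weakness is visible right away: if another "Next" link is inserted earlier in the page, the index silently points at a different node.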
All of these selector generation algorithms fail on some nodes. Are there better approaches out there? Or other approaches that I could combine with these algorithms to produce a better hybrid algorithm?
When I started investigating this topic for some work I am doing, I was also surprised by how little information is available on this topic.
I did find this 2003 paper, but unfortunately, I only have access to the abstract:
Abe, Mari, and Masahiro Hori. “Robust Pointing by XPath Language: Authoring Support and Empirical Evaluation.” In Proceedings of the 2003 Symposium on Applications and the Internet, 156 – . SAINT ’03. Washington, DC, USA: IEEE Computer Society, 2003.
For my own use, I followed the approach in Tim Vasil's 50-line jQuery plugin. I won't reproduce the code, which is available at that link, but instead I'll describe it:
It recursively traverses up the DOM tree from the element, building a selector "backwards". At each level:
If the node has an ID, just use that and skip all the parents; they aren't added to the selector.
If the node has a tag name or a set of classes that is unique among its siblings, use that as the selector. Otherwise, use :nth-child.
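The steps above can be sketched in Python as follows. This is my own reading of the description, not a port of the plugin: it works on an xml.etree.ElementTree tree rather than a browser DOM, joins the levels with the child combinator, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def build_selector(root, target):
    """Build a CSS selector for `target` by walking up towards `root`."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parts = []
    node = target
    while node is not None:
        if node.get("id"):
            # An id ends the walk: ancestors are not added.
            parts.append("#" + node.get("id"))
            break
        parent = parent_of.get(node)
        if parent is None:
            parts.append(node.tag)  # reached the root element
            break
        siblings = list(parent)
        classes = node.get("class", "").split()
        same_tag = [s for s in siblings if s.tag == node.tag]
        same_classes = [
            s for s in siblings
            if set(classes) <= set(s.get("class", "").split())
        ]
        if len(same_tag) == 1:
            parts.append(node.tag)
        elif classes and len(same_classes) == 1:
            parts.append("." + ".".join(classes))
        else:
            # Fall back to the 1-based position among all element siblings.
            parts.append(f"{node.tag}:nth-child({siblings.index(node) + 1})")
        node = parent
    return " > ".join(reversed(parts))


doc = ET.fromstring(
    "<html><body>"
    '<div id="main"><ul><li>a</li><li class="hit">b</li><li>c</li></ul></div>'
    "</body></html>"
)
print(build_selector(doc, doc.find(".//li[@class='hit']")))  # #main > ul > .hit
print(build_selector(doc, doc.findall(".//li")[2]))  # #main > ul > li:nth-child(3)
```

The second result shows where this approach is fragile: as soon as a list item is added or removed before the target, :nth-child points somewhere else.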
Since I will be storing element contents between visits to a page, I'm thinking of implementing some "blunder detection" here, maybe using a percentage change from last visit to detect if the selector may be grabbing the wrong element.
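One possible shape for that check, as a sketch only: compare the stored text with the newly scraped text and flag the selector when too much has changed. The use of difflib, the function names and the 0.6 threshold are assumptions I picked for illustration; the right threshold depends on how volatile the content is.

```python
from difflib import SequenceMatcher


def change_ratio(previous: str, current: str) -> float:
    """Fraction of change between two texts: 0.0 is identical, 1.0 is unrelated."""
    return 1.0 - SequenceMatcher(None, previous, current).ratio()


def looks_like_wrong_element(previous: str, current: str, threshold: float = 0.6) -> bool:
    """Flag a selector whose content changed more than `threshold` since last visit."""
    return change_ratio(previous, current) > threshold


# A small edit stays below the threshold; unrelated content exceeds it.
print(looks_like_wrong_element("Price: 10 USD", "Price: 12 USD"))
print(looks_like_wrong_element("Price: 10 USD", "qqqq"))
```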
I would like to know whether anyone has implemented the subject scheme maps of DITA 1.2 in their work. If yes, can you please break down an example to show:
how to do it?
when not to use it?
I am aware of the theory behind it, but I have yet to implement it, and I wanted to know if there are things I must keep in mind during the planning and implementation phase.
An example is here:
How to use DITA subjectSchemes?
The DITA 1.2 spec also has a good example (3.1.5.1.1).
What you can currently do with subject scheme maps is:
define a taxonomy
bind the taxonomy to a profiling or flagging attribute, so that the attribute only takes a value that you have defined
filter or flag elements that have a defined value with a DITAVAL file.
Advantage 1: Since you have a taxonomy, filtering a parent value also filters its children, which is convenient.
Advantage 2: You can fully define and thus control the list of values, which prevents tag bloat.
Advantage 3: You can reuse the subject scheme map in many topic maps, in the usual modular DITA way, so you can apply the same taxonomies anywhere.
These appear to be the main uses for a subject scheme map at present.
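To illustrate Advantage 1, here is a small Python model of how excluding a parent value also excludes its children. It only mimics the behaviour and has nothing to do with how the DITA-OT implements it; the taxonomy values and the function name are made up for the example.

```python
def expand_excluded(taxonomy: dict, excluded: str) -> set:
    """Return the excluded value together with all of its descendants."""
    result = {excluded}
    stack = [excluded]
    while stack:
        for child in taxonomy.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


# Hypothetical taxonomy bound to a profiling attribute such as platform.
taxonomy = {
    "os": ["linux", "windows"],
    "linux": ["redhat", "suse"],
}

# Excluding "linux" also excludes every value defined below it.
print(sorted(expand_excluded(taxonomy, "linux")))  # ['linux', 'redhat', 'suse']
```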
The only disadvantage I have found is that I can think of other hypothetical uses for subject scheme maps, such as faceted browsing, but I don't think any implementation exists. The DITA-OT doesn't have anything like that yet, anyway.