How to serialize an annotated axiom to the RDF form?

Let's take the axiom
SubClassOf( DataAllValuesFrom( <d> xsd:boolean ) ObjectSomeValuesFrom( <o> owl:Thing ) Annotation( rdfs:comment "comm"^^xsd:string ) ).
What should this axiom look like in the form of RDF?
If I understand the specification correctly,
there is one and only one way:
Example 1:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<o> a owl:ObjectProperty .
[ a owl:Axiom ;
rdfs:comment "comm" ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedSource [ a owl:Restriction ;
rdfs:subClassOf _:c2 ;
owl:allValuesFrom xsd:boolean ;
owl:onProperty <d>
] ;
owl:annotatedTarget _:c2
] .
<d> a owl:DatatypeProperty .
_:c2 a owl:Restriction ;
owl:onProperty <o> ;
owl:someValuesFrom owl:Thing .
But it turned out that some people read the specification differently,
and hold that the axiom above may, or even must, be written as follows:
Example 2:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<o> a owl:ObjectProperty .
<d> a owl:DatatypeProperty .
[ a owl:Axiom ;
rdfs:comment "comm" ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedSource [ a owl:Restriction ;
rdfs:subClassOf [ a owl:Restriction ;
owl:allValuesFrom owl:Thing ;
owl:onProperty <o>
] ;
owl:onProperty <d> ;
owl:someValuesFrom xsd:boolean
] ;
owl:annotatedTarget [ a owl:Restriction ;
owl:allValuesFrom owl:Thing ;
owl:onProperty <o>
]
] .
So the question is: who is right? Which example is correct?
In my opinion, the second RDF (Example 2) violates the idea of RDF reification and breaks the connectivity of the data.
But I could not convey this to my opponent.
I have arguments based on the specification
(which I may offer as an answer later),
but he found them untenable,
so I am appealing to the wider community of specialists here to get new arguments
or, perhaps, to correct my own understanding of the concept:
so far nobody except me has said that Example 1 is the only correct way.
So it would be nice to obtain, from the specification, a proof that the first (or the second) example is correct.
If I understood correctly, my opponent appeals to the following sentence from the specification:
In the mapping, each generated blank node (i.e., each blank node that does not correspond to an anonymous individual) is fresh in each application of a mapping rule.
In his view, this means that the superclass ObjectSomeValuesFrom( <o> owl:Thing ) must get a b-node twice when it is written to RDF.
How can one prove that this is not true (or that it is)?
Thank you.

Since there are no answers yet, here is my own, based on my understanding of the official specification https://www.w3.org/TR/owl2-mapping-to-rdf/.
Any comments and improvements are welcome.
1. Introduction
The spec defines only the operators T(E) and TANN(ann, y), where ann is Annotation( AP av ), and E and y are some objects.
The spec also says: The definition of the operator T uses the operator TANN in order to translate annotations.
The operations described in Section 2.1 Translation of Axioms without Annotations
and Section 2.3 Translation of Axioms with Annotations have no names of their own.
The operator TANN is defined in Table 2, Section 2.2 Translation of Annotations,
but it covers annotations of annotations, producing a b-node with the root triple _:x rdf:type owl:Annotation.
The operator that creates top-level annotations with the root triple _:x rdf:type owl:Axiom is described in Section 2.3.1 Axioms that Generate a Main Triple, but it does not have a proper name either.
For the sake of demonstration, I am going to introduce a new name for this "operator": ANN.
Note 1: do not confuse it with the function ANN from Section 3.2.2 Parsing of Annotations. We don't need that one; this answer is only about mapping, not parsing.
Note 2: I am not writing my own spec; I am just trying to explain my view using the new abbreviation.
In the general case introducing it may not be correct, but for demonstration purposes I think it is OK.
Also, let's treat the axiom SubClassOf as an operator with two operands.
It is described in Table 1 of Section 2.1 Translation of Axioms without Annotations like this:
SubClassOf( CE1 CE2 ) = T(CE1) rdfs:subClassOf T(CE2) .
Let's also consider an overloaded operator SubClassOf with two operands and a vararg of annotations.
SubClassOf( CE1 CE2 annotations { n > 1 } ) is defined in Section 2.3.1 Axioms that Generate a Main Triple as follows:
s p xlt .
_:x rdf:type owl:Axiom .
_:x owl:annotatedSource s .
_:x owl:annotatedProperty p .
_:x owl:annotatedTarget xlt .
TANN(annotation1, _:x)
...
TANN(annotationm, _:x)
For simplicity, let's dwell on the case where there is only one top-level annotation.
That operator is SubClassOf( CE1, CE2, ann ), and it looks like this:
T(CE1) rdfs:subClassOf T(CE2) .
ANN(CE1, CE2, rdfs:subClassOf, ann) .
This is the new operator ANN. It is similar to TANN, but in addition to the annotation it takes the two operands and a constant that defines the predicate.
It produces the root triple _:x rdf:type owl:Axiom, and the remaining triples are the same as in the listing above, so ANN(s, xlt, p, ann) is:
_:x rdf:type owl:Axiom .
_:x owl:annotatedSource s .
_:x owl:annotatedProperty p .
_:x owl:annotatedTarget xlt .
TANN(ann, _:x)
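The ANN listing above can be sketched in a few lines of Python. This is only an illustration of the listing, not code from the spec or from any library: triples are plain tuples of strings, the helper names fresh_bnode, tann and ann are made up for this sketch, and tann handles only a simple annotation without nested annotations.

```python
from itertools import count

_ids = count(1)


def fresh_bnode(prefix="x"):
    """Return a new blank node label on every call (hypothetical helper)."""
    return f"_:{prefix}{next(_ids)}"


def tann(annotation, subject):
    """TANN for a simple annotation (no nested annotations): a single triple."""
    prop, value = annotation
    return [(subject, prop, value)]


def ann(s, xlt, p, annotation):
    """The ANN(s, xlt, p, ann) operator as named in this answer.

    s and xlt are nodes that already exist; ANN does not translate them again.
    """
    x = fresh_bnode()
    return [
        (x, "rdf:type", "owl:Axiom"),
        (x, "owl:annotatedSource", s),
        (x, "owl:annotatedProperty", p),
        (x, "owl:annotatedTarget", xlt),
    ] + tann(annotation, x)


for triple in ann("_:c1", "_:c2", "rdfs:subClassOf", ("rdfs:comment", '"comm"')):
    print(*triple, ".")
```

The point of the sketch is that ann receives the nodes s and xlt as they are, so the reification refers to the very same nodes that appear in the main triple.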
2. An ontology without annotations.
Now let's consider the example from the question, where the first operand is DataAllValuesFrom and the second is ObjectSomeValuesFrom:
SubClassOf( DataAllValuesFrom( <d> xsd:boolean ) ObjectSomeValuesFrom( <o> owl:Thing ) ) .
In TURTLE it would look like this:
<d> a owl:DatatypeProperty .
<o> a owl:ObjectProperty .
[ rdf:type owl:Restriction ;
owl:onProperty <d> ;
owl:allValuesFrom xsd:boolean ;
rdfs:subClassOf [ rdf:type owl:Restriction ;
owl:onProperty <o> ;
owl:someValuesFrom owl:Thing
]
] .
Or the same ontology in NTRIPLES syntax:
<d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
<o> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
_:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2 .
_:c1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:c1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:c2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:c2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
SubClassOf is an axiom that generates a main triple (see Section 2.3.1 Axioms that Generate a Main Triple).
So the main triple (s p xlt) here is _:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2,
where s (the subject, here DataAllValuesFrom( <d> xsd:boolean )) is _:c1, p (the predicate) is rdfs:subClassOf, and xlt (which stands for a blank node, an IRI, or a literal; here it is the object, ObjectSomeValuesFrom( <o> owl:Thing )) is _:c2.
Note that in ONT-API this TURTLE can be generated by the following code:
OntModel m = OntModelFactory.createModel().setNsPrefixes(OntModelFactory.STANDARD);
m.createDataAllValuesFrom(m.createDataProperty("d"), m.getDatatype(XSD.xboolean))
.addSuperClass(m.createObjectSomeValuesFrom(m.createObjectProperty("o"),
m.getOWLThing()));
m.write(System.out, "ttl");
3. Behavior of the operator T.
The spec says: In the mapping, each generated blank node (i.e., each blank node that does not correspond to an anonymous individual) is fresh in each application of a mapping rule.
I believe this is only about the operator T.
This statement roughly matches what is said under Parsing OWL, Structure Sharing in the OWL 1 spec:
In practice, this means that blank nodes (i.e. those with no name) which are produced during the transformation and represent arbitrary expressions in the abstract syntax form should not be "re-used".
In the ordinary case this is not a problem for either ONT-API or OWL-API; both behave the same way.
The following code produces identical RDF for both OWL-API (default impl) and ONT-API (with the OWL-API interfaces used):
OWLOntologyManager m = OntManagers.createONT();
OWLDataFactory df = m.getOWLDataFactory();
OWLClassExpression ce = df.getOWLObjectComplementOf(df.getOWLThing());
OWLOntology o = m.createOntology();
o.add(df.getOWLSubClassOfAxiom(ce, ce));
o.saveOntology(OntFormat.TURTLE.createOwlFormat(), System.out);
For the two equal class expressions ObjectComplementOf( owl:Thing ), which are the operands of SubClassOf( CE1, CE2 ), there will be two different b-nodes.
So nobody disputes the fact that there is no object sharing in OWL.
But, in my opinion, this must not be applied to the relationship between an axiom and its annotations, which is the case of the operator ANN; see the next paragraph.
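The "fresh blank node" rule for T can be illustrated with a small Python sketch. It is a toy model, not the OWL-API or ONT-API implementation: class expressions are tuples, only ObjectComplementOf is supported, and the names t, sub_class_of and fresh_bnode are invented for the example.

```python
from itertools import count

_ids = count(1)


def fresh_bnode():
    """Return a new blank node label on every call (hypothetical helper)."""
    return f"_:b{next(_ids)}"


def t(ce, triples):
    """Translate a class expression and return the node that represents it.

    A named class is a plain string and maps to itself. An anonymous
    expression such as ("ObjectComplementOf", "owl:Thing") gets a fresh
    blank node on every application.
    """
    if isinstance(ce, str):
        return ce
    kind, operand = ce
    if kind != "ObjectComplementOf":
        raise ValueError(f"unsupported expression: {kind}")
    node = fresh_bnode()
    triples.append((node, "rdf:type", "owl:Class"))
    triples.append((node, "owl:complementOf", t(operand, triples)))
    return node


def sub_class_of(ce1, ce2):
    """SubClassOf( CE1 CE2 ) = T(CE1) rdfs:subClassOf T(CE2)."""
    triples = []
    s = t(ce1, triples)
    o = t(ce2, triples)
    triples.append((s, "rdfs:subClassOf", o))
    return s, o, triples


ce = ("ObjectComplementOf", "owl:Thing")
s, o, triples = sub_class_of(ce, ce)
print(s != o)  # the same expression used twice gets two different b-nodes
for triple in triples:
    print(*triple, ".")
```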
4.1 An annotated axiom that generates a main triple. Reification with SPO.
Now let's add the annotation Annotation( rdfs:comment "comm" ) to SubClassOf( DataAllValuesFrom( <d> xsd:boolean ) ObjectSomeValuesFrom( <o> owl:Thing ) ) (see paragraph 2) in the way that I think is the only correct one.
Remember that the operator SubClassOf(CE1, CE2, ann) generates the following triples:
T(CE1) rdfs:subClassOf T(CE2) .
ANN(CE1, CE2, rdfs:subClassOf, ann) .
or
s p xlt .
_:x rdf:type owl:Axiom .
_:x owl:annotatedSource s .
_:x owl:annotatedProperty p .
_:x owl:annotatedTarget xlt .
TANN(ann, _:x)
Here, the triple s p xlt is the result of applying the operator SubClassOf(CE1, CE2).
According to Table 2 in Section 2.2 Translation of Annotations, the operator TANN(Annotation( AP av ), _:x) applied to Annotation( rdfs:comment "comm"^^xsd:string ) gives the triple _:x rdfs:comment "comm"^^xsd:string, so for SubClassOf(CE1, CE2, Annotation( rdfs:comment "comm"^^xsd:string )) we have:
s p xlt .
_:x rdf:type owl:Axiom .
_:x owl:annotatedSource s .
_:x owl:annotatedProperty p .
_:x owl:annotatedTarget xlt .
_:x rdfs:comment "comm"^^xsd:string .
The triple s p xlt here is _:c1 rdfs:subClassOf _:c2 (see paragraph 2);
so finally we get the following annotated axiom:
_:c1 rdfs:subClassOf _:c2 .
_:x rdfs:comment "comm"^^xsd:string .
_:x rdf:type owl:Axiom .
_:x owl:annotatedSource _:c1 .
_:x owl:annotatedProperty rdfs:subClassOf .
_:x owl:annotatedTarget _:c2 .
The full ontology (without ontology id) in NTRIPLES syntax would look like this:
<o> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
<d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "comm" .
_:x <http://www.w3.org/2002/07/owl#annotatedTarget> _:c2 .
_:x <http://www.w3.org/2002/07/owl#annotatedProperty> <http://www.w3.org/2000/01/rdf-schema#subClassOf> .
_:x <http://www.w3.org/2002/07/owl#annotatedSource> _:c1 .
_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> .
_:c2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:c2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2 .
_:c1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:c1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
Or the same in TURTLE:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<o> a owl:ObjectProperty .
[ a owl:Axiom ;
rdfs:comment "comm" ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedSource [ a owl:Restriction ;
rdfs:subClassOf _:c2 ;
owl:allValuesFrom xsd:boolean ;
owl:onProperty <d>
] ;
owl:annotatedTarget _:c2
] .
<d> a owl:DatatypeProperty .
_:c2 a owl:Restriction ;
owl:onProperty <o> ;
owl:someValuesFrom owl:Thing .
The triple _:c1 rdfs:subClassOf _:c2 (SPO) is present in the graph and has its reification:
_:x owl:annotatedTarget _:c2 .
_:x owl:annotatedProperty rdfs:subClassOf .
_:x owl:annotatedSource _:c1 .
Note, that this ontology can be generated by the following code:
OntModel m = OntModelFactory.createModel().setNsPrefixes(OntModelFactory.STANDARD);
m.createDataAllValuesFrom(m.createDataProperty("d"), m.getDatatype(XSD.xboolean))
.addSubClassOfStatement(m.createObjectSomeValuesFrom(m.createObjectProperty("o"), m.getOWLThing()))
.annotate(m.getRDFSComment(), "comm");
m.write(System.out, "ttl");
System.out.println(".......");
m.write(System.out, "nt");
4.2 An annotated axiom that generates a main triple. Reification with (S*)P(O*).
The spec also says: In the mapping, each generated blank node (i.e., each blank node that does not correspond to an anonymous individual) is fresh in each application of a mapping rule.
This is about the operator T, not about the operators TANN, ANN, SubClassOf(CE1, CE2) or SubClassOf(CE1, CE2, ann).
But the SubClassOf operators are composed of T and ANN (TANN), so they must also implicitly generate a blank node for each operand.
Recall that the operator SubClassOf(CE1, CE2, ann) originally (see p.1) looks like this:
T(CE1) rdfs:subClassOf T(CE2) .
ANN(CE1, CE2, rdfs:subClassOf, ann) .
But it is still not fully clear what should actually happen with its second part, the operator ANN(CE1, CE2, rdfs:subClassOf, ann).
Let's take my opponent's assumption (as far as I understand it) that class expressions must not be shared even within a single axiom, including its whole tree of annotations.
This is definitely true for the operator SubClassOf(CE1, CE2), wrong for the operator TANN, and the subject of controversy for the operator ANN (which includes TANN).
But for the sake of experiment, let's assume that the rule must also apply to the operands of ANN.
So SubClassOf(CE1, CE2, ann) is now defined as follows:
SubClassOf(CE1, CE2) .
ANN(T(CE1), T(CE2), rdfs:subClassOf, ann) .
or
T(CE1) rdfs:subClassOf T(CE2) .
ANN(T(CE1), T(CE2), rdfs:subClassOf, ann) .
The SubClassOf(CE1, CE2) will give the following NTRIPLES (see p.2):
<d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
<o> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
_:c2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:c2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2 .
_:c1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:c1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
Here, the b-node _:c1 corresponds to the class expression DataAllValuesFrom( <d> xsd:boolean ),
and the b-node _:c2 corresponds to the ObjectSomeValuesFrom( <o> owl:Thing ).
Then we do T in ANN for the subject (the first operand T(CE1)):
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:b1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:b1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
and for the object (the second operand T(CE2)):
_:b2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:b2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
And print ANN itself:
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "comm" .
_:x <http://www.w3.org/2002/07/owl#annotatedTarget> _:b2 .
_:x <http://www.w3.org/2002/07/owl#annotatedProperty> <http://www.w3.org/2000/01/rdf-schema#subClassOf> .
_:x <http://www.w3.org/2002/07/owl#annotatedSource> _:b1 .
_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> .
Notice that we now have fresh b-nodes for CE1 and CE2 (_:b1 and _:b2, respectively),
and the annotation (_:x) refers to these two nodes.
Inside the annotation graph structure there are _:b1 and _:b2, not _:c1 and _:c2,
simply because we first apply the operator T to the input class expressions
and only then pass the results on to the operator ANN.
The full ontology is just the concatenation of all the parts above (NTRIPLES):
<o> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
<d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
_:c2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:c2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2 .
_:c1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:c1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "comm" .
_:x <http://www.w3.org/2002/07/owl#annotatedTarget> _:b2 .
_:x <http://www.w3.org/2002/07/owl#annotatedProperty> <http://www.w3.org/2000/01/rdf-schema#subClassOf> .
_:x <http://www.w3.org/2002/07/owl#annotatedSource> _:b1 .
_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> .
_:b2 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:b2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:b1 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:b1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
Or the same in TURTLE:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<o> a owl:ObjectProperty .
[ a owl:Restriction ;
rdfs:subClassOf [ a owl:Restriction ;
owl:onProperty <o> ;
owl:someValuesFrom owl:Thing
] ;
owl:allValuesFrom xsd:boolean ;
owl:onProperty <d>
] .
<d> a owl:DatatypeProperty .
[ a owl:Axiom ;
rdfs:comment "comm" ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedSource [ a owl:Restriction ;
owl:allValuesFrom xsd:boolean ;
owl:onProperty <d>
] ;
owl:annotatedTarget [ a owl:Restriction ;
owl:onProperty <o> ;
owl:someValuesFrom owl:Thing
]
] .
As you can see, the triple _:c1 rdfs:subClassOf _:c2 (SPO) is present in the graph, but has no reification.
Instead, there is a reification for the triple _:b1 rdfs:subClassOf _:b2 ((S*)P(O*)), which does not actually exist in the graph:
_:x owl:annotatedTarget _:b2 .
_:x owl:annotatedProperty rdfs:subClassOf .
_:x owl:annotatedSource _:b1 .
Since the triple _:b1 rdfs:subClassOf _:b2 does not exist in the graph, this exercise, in my opinion, demonstrates invalid behavior.
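The difference between paragraphs 4.1 and 4.2 can be checked mechanically. The Python sketch below is an illustration under simplifying assumptions: triples are tuples of strings, the restriction triples are left out because they do not affect the check, and the function names dangling_reifications and reification are hypothetical. For every owl:Axiom node it rebuilds the reified statement and reports it if that statement is not a triple of the graph.

```python
def dangling_reifications(triples):
    """Return reified (s, p, o) statements that are not triples of the graph."""
    graph = set(triples)
    axiom_nodes = {s for (s, p, o) in graph if p == "rdf:type" and o == "owl:Axiom"}
    missing = []
    for x in sorted(axiom_nodes):
        props = {p: o for (s, p, o) in graph if s == x}
        main = (
            props["owl:annotatedSource"],
            props["owl:annotatedProperty"],
            props["owl:annotatedTarget"],
        )
        if main not in graph:
            missing.append(main)
    return missing


def reification(source, target):
    """The annotation node _:x pointing at the given source and target."""
    return [
        ("_:x", "rdf:type", "owl:Axiom"),
        ("_:x", "owl:annotatedSource", source),
        ("_:x", "owl:annotatedProperty", "rdfs:subClassOf"),
        ("_:x", "owl:annotatedTarget", target),
        ("_:x", "rdfs:comment", '"comm"'),
    ]


main_triple = [("_:c1", "rdfs:subClassOf", "_:c2")]
shared = main_triple + reification("_:c1", "_:c2")  # paragraph 4.1
fresh = main_triple + reification("_:b1", "_:b2")   # paragraph 4.2

print(dangling_reifications(shared))
print(dangling_reifications(fresh))
```

With the nodes shared as in 4.1 nothing is reported, while the 4.2 variant reports the statement _:b1 rdfs:subClassOf _:b2, which is reified but never asserted.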
4.3 An annotated axiom that generates a main triple, by OWL-API. Reification with SP(O*).
As you might guess, my opponent defends the current behavior of OWL-API (v5.1.11).
So let's see what OWL-API does.
The code that generates the ontology:
OWLOntologyManager man = OntManagers.createOWL();
OWLDataFactory df = man.getOWLDataFactory();
OWLAxiom a = df.getOWLSubClassOfAxiom(df.getOWLDataSomeValuesFrom(df.getOWLDataProperty("d"),
df.getBooleanOWLDatatype()),
df.getOWLObjectAllValuesFrom(df.getOWLObjectProperty("o"), df.getOWLThing()),
Collections.singletonList(df.getRDFSComment("comm")));
OWLOntology o = man.createOntology();
o.add(a);
o.saveOntology(new TurtleDocumentFormat(), System.out);
NTRIPLES (the Ontology ID is omitted):
<o> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
<d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
_:u <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:u <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:u <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "comm" .
_:x <http://www.w3.org/2002/07/owl#annotatedTarget> _:u .
_:x <http://www.w3.org/2002/07/owl#annotatedProperty> <http://www.w3.org/2000/01/rdf-schema#subClassOf> .
_:x <http://www.w3.org/2002/07/owl#annotatedSource> _:c1 .
_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> .
_:c1 <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:c2 .
_:c1 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://www.w3.org/2001/XMLSchema#boolean> .
_:c1 <http://www.w3.org/2002/07/owl#onProperty> <d> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:c2 <http://www.w3.org/2002/07/owl#allValuesFrom> <http://www.w3.org/2002/07/owl#Thing> .
_:c2 <http://www.w3.org/2002/07/owl#onProperty> <o> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
The original TURTLE:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://www.w3.org/2002/07/owl#> .
[ rdf:type owl:Ontology
] .
#################################################################
# Object Properties
#################################################################
### o
<o> rdf:type owl:ObjectProperty .
#################################################################
# Data properties
#################################################################
### d
<d> rdf:type owl:DatatypeProperty .
#################################################################
# General axioms
#################################################################
[ rdf:type owl:Axiom ;
owl:annotatedSource [ rdf:type owl:Restriction ;
owl:onProperty <d> ;
owl:someValuesFrom xsd:boolean ;
rdfs:subClassOf [ rdf:type owl:Restriction ;
owl:onProperty <o> ;
owl:allValuesFrom owl:Thing
]
] ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedTarget [ rdf:type owl:Restriction ;
owl:onProperty <o> ;
owl:allValuesFrom owl:Thing
] ;
rdfs:comment "comm"
] .
### Generated by the OWL API (version 5.1.11) https://github.com/owlcs/owlapi/
And the reformatted TURTLE (again, without ontology id):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<o> a owl:ObjectProperty .
<d> a owl:DatatypeProperty .
[ a owl:Axiom ;
rdfs:comment "comm" ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedSource [ a owl:Restriction ;
rdfs:subClassOf [ a owl:Restriction ;
owl:allValuesFrom owl:Thing ;
owl:onProperty <o>
] ;
owl:onProperty <d> ;
owl:someValuesFrom xsd:boolean
] ;
owl:annotatedTarget [ a owl:Restriction ;
owl:allValuesFrom owl:Thing ;
owl:onProperty <o>
]
] .
As you can see, the triple _:c1 rdfs:subClassOf _:c2 (SPO) is present in the graph, but has no reification, just like in the previous paragraph (p4.2).
Instead, there is a reification for the triple _:c1 rdfs:subClassOf _:u (SP(O*)), which does not actually exist in the graph:
_:x owl:annotatedTarget _:u .
_:x owl:annotatedProperty rdfs:subClassOf .
_:x owl:annotatedSource _:c1 .
Also note that, for this example, the operator SubClassOf(CE1, CE2, ann) must be defined as follows:
T(CE1) rdfs:subClassOf T(CE2) .
ANN(CE1, T(CE2), rdfs:subClassOf, ann) .
Here, the first operand is passed as is, but the second one goes through the T transformation, which produces a fresh b-node.
Since the triple _:c1 rdfs:subClassOf _:u does not exist anywhere in the graph,
this example also demonstrates wrong behavior.
So, in my opinion,
OWL-API (v5.1.11) does not produce correct RDF when an annotated axiom consists of anonymous expressions.
5. Conclusion and notes.
So why do both specs prohibit reusing b-nodes in the mapping?
I see only one explanation: the authors want axioms to be atomic. If some components of an axiom were shared, it would not be possible to switch individual axioms on or off separately while reasoning.
Does the example from paragraph 4.1 violate this principle? No: the annotation still belongs to a single axiom and cannot refer to another one.
The examples from paragraphs 4.2 and 4.3 are wrong: the corresponding reified statements do not actually exist in the graph.
But, as far as I can see, my opponent, while defending the correctness of 4.3, gives arguments that lead to the correctness of 4.2.
I think this strange fact is also an implicit proof of the correctness of 4.1.
The operator SubClassOf(CE1, CE2, ann) from example 4.3 is asymmetric. There are no clues in the spec that could lead to such an unbalanced outcome. Why is there a transformation T for the second operand but not for the first? That remains an open question.
Source (a comment in a GitHub issue): https://github.com/owlcs/owlapi/issues/874#issuecomment-527399645

using sed to capture groups of indeterminate length and a multitude of characters

I am struggling to grasp the sed command.
I am working with gene annotation files. In particular, I convert gff3 to gtf files needed to execute cellranger-arc mkref. Both gffread and agat fail to do so perfectly on gff3 files from ncbi. My agat-gtf file doesn't contain 'transcript_id' as is.
The gtf format is tab-delimited, with the final column holding the attributes. The attributes are separated by semicolons. Currently, my agat-gtf file has 'locus_tag' descriptors that I want to replace with 'transcript_id', adding the necessary quotation marks around the name of the transcript. As an example, I want
... ; locus_tag AbcdE_f1 ; ...
to be replaced with
... ; transcript_id "AbcdE_f1" ; ...
I have tried
sed -i.bak "s/locus_tag\([0-9a-zA-Z ,._-]{1,}\);/transcript_id \"1\";/g" myFile.gtf, but it does nothing. Thanks for any help.
As requested, here are two lines of typical input:
ChrPT RefSeq exon 956 981 . + . Dbxref "GeneID:38831453" ; ID "nbis-exon-1" ; Parent PhpapaC_p1 ; gbkey exon ; gene "3' rps12" ; locus_tag PhpapaC_p1 ; product "ribosomal protein S12"
ChrPT RefSeq gene 1033 1500 . + . Dbxref "GeneID:2546745" ; ID "nbis-gene-17" ; Name rps7 ; gbkey Gene ; gene rps7 ; gene_biotype protein_coding ; locus_tag PhpapaCp002
Desired output:
ChrPT RefSeq exon 956 981 . + . Dbxref "GeneID:38831453" ; ID "nbis-exon-1" ; Parent PhpapaC_p1 ; gbkey exon ; gene "3' rps12" ; transcript_id "PhpapaC_p1" ; product "ribosomal protein S12"
ChrPT RefSeq gene 1033 1500 . + . Dbxref "GeneID:2546745" ; ID "nbis-gene-17" ; Name rps7 ; gbkey Gene ; gene rps7 ; gene_biotype protein_coding ; transcript_id "PhpapaCp002"

Using GNU sed
$ sed -E 's/\<locus_tag\>[ \t]([^ \t]*)/transcript_id "\1"/' input_file
ChrPT RefSeq exon 956 981 . + . Dbxref "GeneID:38831453" ; ID "nbis-exon-1" ; Parent PhpapaC_p1 ; gbkey exon ; gene "3' rps12" ; transcript_id "PhpapaC_p1" ; product "ribosomal protein S12"
ChrPT RefSeq gene 1033 1500 . + . Dbxref "GeneID:2546745" ; ID "nbis-gene-17" ; Name rps7 ; gbkey Gene ; gene rps7 ; gene_biotype protein_coding ; transcript_id "PhpapaCp002"
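If the conversion is part of a larger script, the same substitution can be written in Python. This is a rough equivalent of the sed command above, not a replacement for a proper GTF tool; the function name locus_tag_to_transcript_id is made up for the example, and like the sed command it rewrites only the first locus_tag on each line.

```python
import re

# \b plays the role of GNU sed's \< and \> word boundaries.
LOCUS_TAG = re.compile(r"\blocus_tag\b[ \t]([^ \t]*)")


def locus_tag_to_transcript_id(line):
    """Replace 'locus_tag NAME' with 'transcript_id "NAME"' (first match only)."""
    return LOCUS_TAG.sub(r'transcript_id "\1"', line, count=1)


line = "gbkey exon ; locus_tag PhpapaC_p1 ; product \"ribosomal protein S12\""
print(locus_tag_to_transcript_id(line))
```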

FYI using AGAT properly should definitely provide a proper GTF file with transcript_id

Data losing original format

I am relatively new to PowerShell and am having a strange problem with a script. I have searched the forums and haven't been able to find anything that works.
The issue is that when I convert the output of commands to and from Base64, for transport via a custom protocol we use in our environment, the output loses its formatting. Commands are executed on the remote systems by passing the command string to IEX and storing the output in a variable. I convert the output to Base64 using the following commands:
$Bytes = [System.Text.Encoding]::Unicode.GetBytes($str1)
$EncodedCmd = [Convert]::ToBase64String($Bytes)
At the other end, when we receive the output, we convert it back using the command
[System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($EncodedCmd))
The problem is that although the output is correct, its formatting has been lost. For example, if I run the ipconfig command, I get:
Windows IP Configuration Ethernet adapter Local Area Connection 2: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Ethernet
adapter Local Area Connection 3: Connection-specific DNS Suffix . : Link-local IPv6 Address . . . . . : fe80::3cd8:3c7f:c78b:a78f%14 IPv4 Address. . . . . . . . . . .
: 192.168.10.64 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : 192.168.10.100 Ethernet adapter Local Area Connection: Connection-sp
ecific DNS Suffix . : IPv4 Address. . . . . . . . . . . : 172.10.15.201 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : 172.10.15
1.200 Tunnel adapter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel ada
pter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel adapter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel adapter Teredo Tunneling Pseudo-Inter
face: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . :
The formatting is all over the place and hard to read. I have played around with it a bit, but I can't find a good way of returning the command output in the correct format. I would appreciate any ideas on how to fix the formatting.

What happens here is that the $str1 variable is an array of strings. It doesn't contain newline characters; instead, each line is a separate element of the array.
When the variable is converted to Base64, all the elements of the array are concatenated together. This can be seen easily enough:
$Bytes[43..60] | % { "$_ -> " + [char] $_}
0 ->
105 -> i
0 ->
111 -> o
0 ->
110 -> n
0 ->
32 ->
0 ->
32 ->
0 ->
32 ->
0 ->
69 -> E
0 ->
116 -> t
0 ->
104 -> h
Here the 0 bytes are caused by the two-byte Unicode (UTF-16) encoding. Pay attention to 32, which is the space character. So one sees that there is just space padding and no line terminators in the source string:
Windows IP Configuration
Ethernet
As a solution, either add line feed characters or serialize the whole array as XML.
Adding line feed characters is done by joining the array elements with -join, using [Environment]::NewLine as the separator. Like so:
$Bytes = [System.Text.Encoding]::Unicode.GetBytes( $($str1 -join [environment]::newline))
$Bytes[46..67] | % { "$_ -> " + [char] $_}
105 -> i
0 ->
111 -> o
0 ->
110 -> n
0 ->
13 ->
0 ->
10 ->
0 ->
13 ->
0 ->
10 ->
0 ->
13 ->
0 ->
10 ->
0 ->
69 -> E
0 ->
116 -> t
0 ->
Here, the 13 and 10 are the CR and LF characters that Windows uses as its line terminator. After adding them, the resulting string looks like the source. Be aware that though it looks the same, it is not the same: the source is an array of strings, while the outcome is a single string containing line breaks.
If you must preserve the original, serialization is the way to go.

sed: delete lines that match a pattern in a given field

I have a tab-delimited file that looks like this:
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
53_234 78 . CCG GAT 999 . . GT:PL:DP:DPR
45_569 5 . TCCG GTTA 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
I am trying to use sed to delete all the lines that contain more than one letter in the 4th field (in the case above, lines 7 and 8 from the top). I have tried the following regular expression, but there must be a glitch somewhere that I cannot find:
sed '5,${;/\([^.]*\t\)\{3\}\[A-Z][A-Z]\+\t/d;}' input.vcf>new.vcf
The syntax is as follows:
5,$ #start at line 5 until the end of the file ($)
([^.]*\t) #matching group is any single character followed by a zero or more characters followed by a tab.
{3} #previous block repeated 3 times (presumably for the 4th field)
[A-Z][A-Z]+\t #followed by any string of two letters or more followed by a tab.
Unfortunately, this doesn't work, but I know I am close to making it work. Any hints or help will make this a great teaching moment.
Thanks.

If awk is okay for you, you can use the command below:
awk '(FNR<5){print} (FNR>=5)&&length($4)<=1' input.vcf
The default field separator is whitespace; to switch it to tab, put -F"\t" right after awk, for instance awk -F"\t" ....
(FNR<5){print}: FNR is the record (line) number within the current file; while it is less than 5, the whole line is printed.
(FNR>=5) && length($4)<=1 handles the remaining lines and keeps only those whose 4th field has one character or less.
Output:
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
You can redirect the output to an output file.
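As a minimal sketch of the tab-separated variant mentioned above, the filter can be wrapped in a small shell function and tried on a throwaway sample. The function name filter_vcf and the temporary sample file are assumptions for illustration only; the awk condition is the one from this answer, with -F'\t' added so that fields are split on tabs only.

```shell
# Hypothetical helper: keep the first 4 lines (headers) unchanged, then keep
# only the lines whose 4th tab-separated field is at most one character long.
filter_vcf() {
    awk -F'\t' 'FNR < 5 || length($4) <= 1' "$1"
}

# Build a small tab-delimited sample in a temporary file.
sample=$(mktemp)
{
    printf '##INFO=<ID=AC1>\n##INFO=<ID=MQ>\n##INFO=<ID=FQ>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\n'
    printf '53_344\t2\t.\tC\tG\n'
    printf '53_234\t78\t.\tCCG\tGAT\n'
    printf '45_569\t5\t.\tTCCG\tGTTA\n'
    printf '3_67687\t2\t.\tT\tG\n'
} > "$sample"

# Print the filtered result; redirect it to a file if needed.
filter_vcf "$sample"
```

Splitting on tabs only matters when a field may contain spaces; with the default separator such a field would be counted as several fields and $4 would point at the wrong column.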

$ awk 'NR<5 || $4~/^.$/' file
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR

Fixed your sed filter (it took me a while; I almost went crazy over it):
5,${/^\([^\t]\+\t\)\{3\}[A-Z][A-Z]\+\t/d}
Your errors:
[^.]*: matches everything but a dot.
Thanks to Ed, now I know that. I thought the dot had to be escaped, but that does not apply between brackets. Anyhow, this could match a tab character and so span 2 or 3 fields instead of one, failing to match your line (regexes are greedy by default).
\[A-Z][A-Z]: bad backslash. The \[ matches a literal [ instead of opening a bracket expression, so the pattern looked for the literal text [A-Z] and never matched your data.
test:
$ sed '5,${/^\([^\t]\+\t\)\{3\}[A-Z][A-Z]\+\t/d}' foo.Txt
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
conclusion: to process delimited fields, awk is better :)
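The \t and \+ escapes used in the sed expression above are extensions that not every sed implementation recognizes. As a hedged sketch, assuming a POSIX-style sed, the same filter can be written with a literal tab character and the \{1,\} interval. The function name filter_vcf_sed, the tab variable and the sample file are illustrative assumptions, not part of the original answer.

```shell
# A literal tab character, stored in a variable (hypothetical name).
tab=$(printf '\t')

# Hypothetical helper: from line 5 on, delete lines whose 4th tab-separated
# field consists of two or more uppercase letters.
filter_vcf_sed() {
    sed "5,\${/^\([^$tab]\{1,\}$tab\)\{3\}[A-Z][A-Z]\{1,\}$tab/d;}" "$1"
}

# Build a small tab-delimited sample in a temporary file.
sample=$(mktemp)
{
    printf '##INFO=<ID=AC1>\n##INFO=<ID=MQ>\n##INFO=<ID=FQ>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\n'
    printf '53_344\t2\t.\tC\tG\n'
    printf '53_234\t78\t.\tCCG\tGAT\n'
    printf '45_569\t5\t.\tTCCG\tGTTA\n'
    printf '3_67687\t2\t.\tT\tG\n'
} > "$sample"

# Print the filtered result.
filter_vcf_sed "$sample"
```

The script is in double quotes so that $tab is expanded by the shell, which is why the end-of-file address is written as \$.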

Postgres table with select distinct and multiple sub queries

I have a table in Postgres 9.2 with 38 variables and I need a selection of the "best" results.
What I need is:
distinct var1 and var2 then from that:
min var3 and also var4 from that same row
max var5 and if more than one result then where min var3, var6 to var12 from that same row
var13 sorted by conditions (3 first, 6 second 0 last) and also var14-var18 from that same row
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 ...
1 1 2 a 2 a . . . . . . 0 . . . . .
1 1 1 b 1 b . . . . . . 3 . . . . .
1 2 4 c 3 c . . . . . . 3 . . . . .
1 2 3 d 4 d . . . . . . 6 . . . . .
2 1 1 a 3 a . . . . . . 3 . . . . .
3 1 3 a 2 a . . . . . . 6 . . . . .
3 1 2 b 4 b . . . . . . 0 . . . . .
4 1 3 a 4 a . . . . . . 3 . . . . .
4 1 6 b 2 b . . . . . . 0 . . . . .
4 2 2 c 2 c . . . . . . 0 . . . . .
4 3 5 d 3 d . . . . . . 3 . . . . .
4 3 4 e 4 e . . . . . . 6 . . . . .
4 3 7 f 4 f . . . . . . 3 . . . . .
...
The result should be:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18
1 1 1 b 2 a . . . . . . 3 . . . . .
1 2 3 d 4 d . . . . . . 3 . . . . .
2 1 1 a 3 a . . . . . . 3 . . . . .
3 1 2 b 4 b . . . . . . 6 . . . . .
4 1 3 a 4 a . . . . . . 3 . . . . .
4 2 2 c 2 c . . . . . . 0 . . . . .
4 3 4 e 4 e . . . . . . 3 . . . . .
...
here is also an image of the table where the colored fields show what should be selected:
Hope this makes sense.
EDIT:
Got a pointer in another post to provide CREATE and INSERT for the table.
create table parent (
v1 character varying,
v2 character varying,
v3 character varying,
v4 character varying,
v5 character varying,
v6 character varying,
v7 character varying,
v8 character varying,
v9 character varying,
v10 character varying,
v11 character varying,
v12 character varying,
v13 character varying,
v14 character varying,
v15 character varying,
v16 character varying,
v17 character varying,
v18 character varying
);
insert into parent values('1','1','2','a','2','a','x1','x1','x1','x1','x1','x1','0','x1','x1','x1','x1','x1');
insert into parent values('1','1','1','b','1','b','x2','x2','x2','x2','x2','x2','3','x2','x2','x2','x2','x2');
insert into parent values('1','2','4','c','3','c','x3','x3','x3','x3','x3','x3','3','x3','x3','x3','x3','x3');
insert into parent values('1','2','3','d','4','d','x4','x4','x4','x4','x4','x4','6','x4','x4','x4','x4','x4');
insert into parent values('2','1','1','a','3','a','x1','x1','x1','x1','x1','x1','3','x1','x1','x1','x1','x1');
insert into parent values('3','1','3','a','2','a','x1','x1','x1','x1','x1','x1','6','x1','x1','x1','x1','x1');
insert into parent values('3','1','2','b','4','b','x2','x2','x2','x2','x2','x2','0','x2','x2','x2','x2','x2');
insert into parent values('4','1','3','a','4','a','x1','x1','x1','x1','x1','x1','3','x1','x1','x1','x1','x1');
insert into parent values('4','1','6','b','2','b','x2','x2','x2','x2','x2','x2','0','x2','x2','x2','x2','x2');
insert into parent values('4','2','2','c','2','c','x3','x3','x3','x3','x3','x3','0','x3','x3','x3','x3','x3');
insert into parent values('4','3','5','d','3','d','x4','x4','x4','x4','x4','x4','3','x4','x4','x4','x4','x4');
insert into parent values('4','3','4','e','4','e','x5','x5','x5','x5','x5','x5','6','x5','x5','x5','x5','x5');
insert into parent values('4','3','7','f','4','f','x6','x6','x6','x6','x6','x6','3','x6','x6','x6','x6','x6');

xml::Twig and findnodes

I have the following XML code snippet:
<a>
<b> textb </b>
<c> textc </c>
<d> textd </d>
</a>
<a>
<b> textb </b>
<c> textc </c>
<d> textd </d>
</a>
I use XML::Twig to parse it as below:
my @c = map { $_->text."\n" } $_->findnodes('./a/');
and get textbtextctextd as one element of the array. Is there an option to get textb, textc and textd from findnodes as 3 array elements and not one?

Use the star at the end of the expression:
$_->findnodes( './a/*');
The '*' matches any tag, so you get the 3 child nodes - your current example only matches the 'a', and its text is the concatenation of the text of the nested elements.

In XML::Twig 3.39 (and above) you can use findvalue to get an array of strings.
my @c = $_->findvalue('./a/');
