Understanding the right data type & structure for schema.org JSON representation - schema.org

I've been looking at schema.org and it seems like a great idea for a public project that models the schema for several common types of data entities (Person, Place, Thing, Book, Movie, etc...).
I'm having trouble understanding two concepts regarding the data types and structure.
I'll use the Recipe schema as an example, specifically the (simplified) raw JSON representation from the bottom of that page:
{
    "@context": "http://schema.org",
    "@type": "Recipe",
    "author": "John Smith",
    "name": "Mom's World Famous Banana Bread",
    "nutrition": {
        "@type": "NutritionInformation",
        "calories": "240 calories",
        "fatContent": "9 grams fat"
    },
    "recipeIngredient": [
        "3 or 4 ripe bananas, smashed",
        "1 egg",
        "3/4 cup of sugar"
    ]
}
The author field should be of type Organization or Person, but the above JSON simply represents it as a string ("John Smith"). On the other hand, the nutrition field is of type NutritionInformation, and it's represented as a fully structured object (i.e. not just a string). In what situations should we use the former versus the latter? Is it assumed that each object can optionally be distilled down to a simple string if more detail is not needed?
The recipeIngredient field is a list/array of items, but nothing in the specification document mentions that it should be a list. Can it also just be a single element? How do we know when to use a list versus a single element?

Expected types
Every Schema.org property can have a Text¹, a URL¹, or a Role value, even if they are not listed under "Values expected to be one of these types".
Quote from the data model documentation:
We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string, even if our schemas don't formally document that expectation. In the spirit of "some data is better than none", search engines will often accept this markup and do the best we can. Similarly, some types such as Role and URL can be used with all properties, and we encourage this kind of experimentation amongst data consumers.
If Text is not listed as an expected type, it’s usually better to provide one of the expected types instead of Text. In your example, going with an expected type would at least convey whether the author is a Person or an Organization, and it would give you the chance to provide an @id value, which allows others to make statements about the author of this recipe, or to understand that two authors are the same.
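As a rough illustration of the difference, here is a small Python sketch that builds the recipe with a structured author instead of a bare string. The helper name make_author and the example.com URL are made up for this example; only the @type, @id and name keys come from Schema.org/JSON-LD.

```python
import json


def make_author(name, kind="Person", identifier=None):
    """Return a structured author node instead of a bare string."""
    node = {"@type": kind, "name": name}
    if identifier is not None:
        # @id gives the author a stable identifier that other data can point to
        node["@id"] = identifier
    return node


recipe = {
    "@context": "http://schema.org",
    "@type": "Recipe",
    "name": "Mom's World Famous Banana Bread",
    # The URL below is a placeholder, not a real identifier
    "author": make_author("John Smith", identifier="https://example.com/people/john-smith"),
}

print(json.dumps(recipe, indent=2))
```

A consumer that only wants a display string can still read author["name"], so nothing is lost compared to the plain-text form.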
Multiple values
Every property can have multiple values. This is a core feature in all three syntaxes supported by Schema.org (JSON-LD, Microdata, RDFa).
Unless the property’s definition says otherwise, you should not put multiple values into one property value, not least because no delimiter is defined. So cramming several ingredients into a single string instead of using an array would be incorrect for recipeIngredient, as this property expects "A single ingredient".
¹ As Text and URL are subtypes of DataType, these types should not be specified. If it’s a string value, it’s of type Text; if it’s an @id value, it’s of type URL.
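On the consuming side, the "single value or array" question can be handled with a small normalisation step. The following is only a sketch in Python; as_list is a hypothetical helper name, and it assumes the JSON-LD has already been parsed into plain dicts and lists.

```python
def as_list(value):
    """Treat a missing value, a single value, or an array uniformly."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


single = {"@type": "Recipe", "recipeIngredient": "1 egg"}
several = {
    "@type": "Recipe",
    "recipeIngredient": ["3 or 4 ripe bananas, smashed", "1 egg", "3/4 cup of sugar"],
}

for recipe in (single, several):
    for ingredient in as_list(recipe.get("recipeIngredient")):
        print(ingredient)
```

With this, a property holding one value and a property holding an array with one element are processed the same way.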

Related

Should a birthdate/deathdate class be a composition or an aggregation of an Individual class?

The entity is a person.
So the entity has a birthdate and may already have a deathdate.
But these dates may or may not be provided (depending on the entity and the availability of the information), so the entity might have neither of them.
I feel I am making a mess of the cardinality and the relationship type.
How should I represent that?
I have created an abstract class Individual. It is specialized into two final classes: Person (an identified person) and Pseudonym (an anonymous person).
It is linked to a class Birthdate and a class Deathdate (both generalized as a class Date).
The [Birthdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the birthdate can be omitted and an individual can have only one date of birth.
1..*: because a birthdate must concern at least one individual, but can concern several.
The [Deathdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the individual may not be dead yet and can die only once.
1..*: because a deathdate must concern at least one individual, but can concern several.
But since, theoretically, everyone has a birthdate (and will have a deathdate), I was tempted by a composition. However, some people might prefer to keep these dates secret, and I wondered whether composition would allow that.
Furthermore, one date can correspond to several individuals, so I guess composition isn't possible in that case either. Or else I am confusing the Individual class with its instances (the individuals), in which case composition would be possible, but not with the aforementioned cardinality.
At the moment I have chosen this:
Aggregation:
___________                 _______________
|Birthdate|0..1-----1..*< >|               |
___________                | <<Individual>>|
|Deathdate|0..1-----1..*< >|_______________|
But I am hesitating between it and this one:
Composition:
___________              _______________
|Birthdate|0..1-----1<#>|               |
___________             | <<Individual>>|
|Deathdate|0..1-----1<#>|_______________|
What is the right answer? Thanks for your attention.
There are a number of issues with this approach.
First, using a class for dates is simply overkill. Both birthdate and deathdate are attributes of a specific person and can easily be modelled as inline properties of the Individual class. Unless there is some significant reason to use something more than the good old Date DataType, stick with the standard approach.
As for the visibility issue, object-oriented principles say you should not expose the properties directly anyway. Instead, you should have an operation responsible for retrieving the birthdate and the deathdate, which controls whether the date can be read or not. You may add boolean attributes to support that, but they aren't necessary if the ability to see the dates depends on some state of the Individual or on other things (e.g. "who" asks). In the former case (when it depends on the state of the Individual) you may still wish to show those boolean attributes explicitly, as derived ones.
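To make the two points above concrete, here is a minimal Python sketch, not a definitive design. The dates are plain optional attributes of Individual, and reading them goes through operations that decide whether the value may be shown. The flag names birthdate_public and deathdate_public are assumptions made for this example.

```python
from datetime import date
from typing import Optional


class Individual:
    """Dates are inline, optional attributes; access goes through operations."""

    def __init__(self, name: str,
                 birthdate: Optional[date] = None,
                 deathdate: Optional[date] = None,
                 birthdate_public: bool = True,
                 deathdate_public: bool = True) -> None:
        self.name = name
        self._birthdate = birthdate      # None means unknown / not provided
        self._deathdate = deathdate      # None means unknown or still alive
        self.birthdate_public = birthdate_public
        self.deathdate_public = deathdate_public

    def get_birthdate(self) -> Optional[date]:
        # Return the date only if it is allowed to be shown
        return self._birthdate if self.birthdate_public else None

    def get_deathdate(self) -> Optional[date]:
        return self._deathdate if self.deathdate_public else None


someone = Individual("Some Person", birthdate=date(1970, 1, 1), birthdate_public=False)
print(someone.get_birthdate())  # None, because the birthdate is kept secret
```

The 0..1 multiplicity from the question shows up here simply as Optional[date]; no separate Birthdate or Deathdate class is needed.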
If you insist on using a class for dates (e.g. because you want Wikipedia-style "Born on date"/"Deceased on date" collections), you should create just one class Date and build associations to this class, much as you did in your second approach. In that situation the multiplicity does not work "database style" but is a property of the association itself. In a particular association you have one birthdate/deathdate and one Individual. By default you will have two 1-0..1 associations, one for each date, but depending on the approach the model may become considerably more complex.
I'll later add diagrams for more clarity.
One last remark.
Do not use << >> for the class name. Those are reserved to indicate stereotypes.
If you want to indicate that Individual is abstract, either show its name in italics or (if your tool doesn't allow that) use the <<abstract>> stereotype.

Retrieving arbitrary data into nested object with ORM

I am attempting to develop an API in Go that lets the user specify an arbitrary data structure and easily set up endpoints that perform CRUD operations on an auto-generated Postgres database, based on the structure they define.
For now, I have been using gorm, and I am able to have a database generated automatically from a user-defined set of structs that supports all types of relations (has one, one to many, etc.). I am also able to insert into the generated database when JSON is sent in through the endpoints.
The issue I have discovered arises when I try to retrieve the data. Where many of the Go ORMs seem to fall short is in mapping data from all tables back into the nested structs of the parent struct.
For example, if the user defines:
type Member struct {
	ID        string
	FirstName string
	Hometown  Hometown `gorm:"ForeignKey:MemberRefer"`
}

type Hometown struct {
	ID          string
	City        string
	Province    string
	MemberRefer string
}
The database creates the tables:
Members
    id
    first_name
Hometowns
    id
    city
    province
    member_refer
However, when retrieving the data, all that is mapped back is:
{
    "id": "dc2bb591-506f-40a5-a141-bdc0c8410ba1",
    "name": "Kevin Krishna",
    "hometown": {
        "id": "",
        "city": "",
        "province": ""
    }
}
Does anyone know of a Go ORM that supports this kind of behaviour?
Thanks
5 sec google search showed me the answer:
Preloading associations
Now that you actually have them properly related, you can use .Preload() to get the nested object you want:
db.Preload("GoogleAccount").First(&user)
Get nested object in structure in gorm
https://www.google.com/search?q=gorm+nested+struct+golang
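For the structs in the question, the corresponding call would presumably be db.Preload("Hometown").First(&member). Conceptually, preloading means running a second query for the association and attaching the result to the parent. The Python/sqlite3 sketch below imitates that idea outside of gorm; it is not gorm's implementation. The table and column names come from the question, while load_member_with_hometown and the sample rows are made up.

```python
import sqlite3


def load_member_with_hometown(conn, member_id):
    """Fetch a member, then fetch its hometown in a second query and nest it."""
    conn.row_factory = sqlite3.Row
    member = conn.execute(
        "SELECT id, first_name FROM members WHERE id = ?", (member_id,)
    ).fetchone()
    if member is None:
        return None
    # Second query: follow the foreign key from hometowns back to the member
    hometown = conn.execute(
        "SELECT id, city, province FROM hometowns WHERE member_refer = ?",
        (member_id,),
    ).fetchone()
    result = dict(member)
    result["hometown"] = dict(hometown) if hometown is not None else None
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (id TEXT PRIMARY KEY, first_name TEXT);
CREATE TABLE hometowns (id TEXT PRIMARY KEY, city TEXT, province TEXT, member_refer TEXT);
INSERT INTO members VALUES ('m1', 'Kevin');
INSERT INTO hometowns VALUES ('h1', 'Toronto', 'Ontario', 'm1');
""")
print(load_member_with_hometown(conn, "m1"))
```

Without the second query the nested object stays empty, which matches the output shown in the question.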

Selected Updates in Nested AppSync Schema

I am trying to carry out selective updates of individual nested fields in a DynamoDB table that is connected to an AppSync interface. I am able to update individual top-level fields, but when it comes to nested fields I am unsure how to approach it. I am a newbie to this, so perhaps I am thinking about it wrong and I need to flatten the data through the schema so that the data is flat in the DynamoDB tables. I have struggled to find an example of how to tackle this kind of situation with fairly complex tables. I am using the Custom Types to bring some standardisation across the app and the different resolvers.
We have an AppSync schema defined approximately like this:
type Main_entries {
    id: String!
    title: String!
    recordInfo: CustomType
}
type CustomType {
    fieldA: String
    fieldB: String
    fieldC: String
}
What I have are some main types and also some Custom Types used throughout the application. What I want to be able to do is update fieldB whilst keeping the rest of the data intact.
I have used the UpdateItem approach here
With this I can, say, update title whilst keeping the rest of the record intact. But if my mutation asks for fieldB to be updated, a SET is created that updates the entire recordInfo type, so fieldA and fieldC are lost.
Does anyone have any ideas or, even better, know where there may be some examples?
Many thanks in advance.
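One possible direction, sketched here in Python under the assumption that the resolver ultimately issues a DynamoDB UpdateItem with an update expression: address the nested field by its document path (recordInfo.fieldB) instead of SETting the whole recordInfo map. The helper build_nested_update is hypothetical, and how its output is handed to DynamoDB (resolver template, SDK call, value encoding) is deliberately left out.

```python
def build_nested_update(changes, parent=None):
    """Build update parameters that SET only the supplied fields.

    `changes` maps attribute names to new values. When `parent` is given,
    each name is addressed as a document path below that map attribute.
    """
    names, values, assignments = {}, {}, []
    if parent is not None:
        names["#parent"] = parent
    for i, (field, value) in enumerate(sorted(changes.items())):
        name_key, value_key = f"#f{i}", f":v{i}"
        names[name_key] = field
        values[value_key] = value
        path = f"#parent.{name_key}" if parent is not None else name_key
        assignments.append(f"{path} = {value_key}")
    return {
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


params = build_nested_update({"fieldB": "new value"}, parent="recordInfo")
print(params["UpdateExpression"])  # SET #parent.#f0 = :v0
```

Because only recordInfo.fieldB appears in the expression, fieldA and fieldC are left untouched. One caveat: a document path like this only works if the recordInfo map already exists on the item; otherwise DynamoDB rejects the update.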

REST API URI for entities with two different keys

I must design an API to manage a Document entity. The peculiarity of this entity is that it can have two different ids:
id1 (number, e.g. 1234)
id2 (number, e.g. 89)
For each document, one and only one id is available (id1 or id2, not both).
Usually I solve this issue by using query parameters to perform some kind of "search" feature:
GET /documents?id1=1234
GET /documents?id2=89
But it works only if there is no sub-entity...
Let's say I want to get the authors of the documents:
GET /documents/1234/authors
Impossible, because I can't know what type of id I get: is it id1 or id2?
GET /documents/authors?id1=1234
Not really REST, I think, because id1 then refers to the "Author" entity, not "Document" anymore...
GET /id1-documents/1234/authors
GET /id2-documents/1234/authors
Then you create two URIs that return the same entity (/author), which is not really REST compliant.
GET /documents/id1=1234/authors
GET /documents/id2=89/authors
It looks like a composite key created only for the API; it has no "backend" meaning. To me it seems strange to create a "composite" key on the fly.
GET /document-authors?id1=1234
GET /document-authors?id2=89
In this case you completely lose the notion of a tree... You end up with an API that contains only root entities.
Do you see another alternative?
Which one looks best?
Thank you very much.
It seems to me that you're conflating two different resources here: documents and authors. A document has a relationship with an author, but they should be separate resources because authors exist independently of any individual document. With that in mind you need to ask whether your clients are searching for authors or documents. If it's authors, then they should be querying an authors API rather than a documents API.
E.g., for all the authors of documents with id1 89 or id1 1234 or id2 4444, you might query like this...
GET /authors?docId1=89&docId1=1234&docId2=4444
That should return a list of author representations. If people care about the documents themselves, the author representations could contain links to the documents.
Alternatively, if you're looking for documents then you should be querying that directly...
GET /documents?id1=89&id1=1234&id2=4444
What you're modelling as a sub-resource isn't really a subresource. It's a relationship between 2 independent resources and should be modelled as a set of links. Each document returned from the documents api should contain a set of authors links (if people really care about the authors) and vice versa from the authors to the documents.
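The repeated query parameters used in the examples above are easy to produce and to read back with standard tooling. Here is a small Python sketch; documents_query and parse_document_filters are names invented for this example, and the ids are assumed to be integers as in the question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def documents_query(id1=(), id2=()):
    """Build a /documents search URL with one query pair per requested id."""
    pairs = [("id1", v) for v in id1] + [("id2", v) for v in id2]
    return "/documents?" + urlencode(pairs)


def parse_document_filters(url):
    """Read the repeated id1/id2 parameters back into lists of ints."""
    query = parse_qs(urlsplit(url).query)
    return {key: [int(v) for v in query.get(key, [])] for key in ("id1", "id2")}


url = documents_query(id1=[89, 1234], id2=[4444])
print(url)  # /documents?id1=89&id1=1234&id2=4444
print(parse_document_filters(url))
```

Since the kind of id is carried by the parameter name, the server never has to guess whether a number is an id1 or an id2.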
Here's an opinionated solution from SlashDB, which allows for record filtering and traversing to related resources at the same time.
The example is similar to yours - two entities Artist and Album.
Let's identify the Artist first.
Artist by ID:
https://demo.slashdb.com/db/Chinook/Artist/ArtistId/2
Artist by Name:
https://demo.slashdb.com/db/Chinook/Artist/Name/Accept
An Artist may have issued Albums. The two entities are related. We allow extending the URL with the name of the related entity, like so:
https://demo.slashdb.com/db/Chinook/Artist/Name/Accept/Album
You can keep "going", say to get to the Tracks from those albums
https://demo.slashdb.com/db/Chinook/Artist/Name/Accept/Album/Track
And you can even continue filtering, e.g. only tracks that are shorter than 300000 milliseconds:
https://demo.slashdb.com/db/Chinook/Artist/Name/Accept/Album/Track/Milliseconds/..300000

RESTful way to reference resource with unique fields

One of the requirements for our REST interface is that each resource be identifiable by unique fields (aside from the primary identifier). The reason for this is that we want to be able to handle bulk importing data - in which case the client can't know the system generated primary identifiers.
This means we have to be able to reference our resources by unique fields. Using a primary key our read requests look like this:
GET example.com/rest/customers/1
and to get orders related to that customer
GET example.com/rest/customers/1/orders
Now, let's assume two fields in customer identify it uniquely: name ("foo") and businessId ("bar"). Given that, I came up with the following URI to get the orders for this customer:
GET example.com/someotherpath/customers/foo,bar/orders
But I don't like that I have a different path to identify that this is a resource being accessed via unique fields. How would you structure the above query in a RESTful way using unique fields instead of the primary key?
Further, an order looks like this:
{
    <SNIP>
    "orderId" : "42",
    "_links": {
        "customer": {
            "href" : "rest/customers/1",
            "key": [ "foo", "bar" ]
        }
    }
}
Are there any issues with allowing the client to specify href OR key interchangeably when communicating with the interface?
For the first bit, I just wouldn't do it. If a customer has a unique id (and they should), I wouldn't allow end users to specify N other fields that happen to also uniquely identify the customer. It's messy for the user (which field goes first?) and also messy for you on the back end.
For the second bit, the issue is: what happens when they specify both? Which takes precedence? Are they going to remember? Do you want to have to support both? It's generally a good idea to only allow one way to do any particular thing if you can get away with it.
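If both forms have to be supported anyway, one way to sidestep the precedence question is to reject requests that specify both. This is only a sketch of that idea in Python, not a recommendation to support both; resolve_customer_ref is a hypothetical helper and the link shape is taken from the order example in the question.

```python
def resolve_customer_ref(link):
    """Return ("href", value) or ("key", value); reject ambiguous input."""
    has_href = "href" in link
    has_key = "key" in link
    if has_href and has_key:
        # No precedence rule to remember: specifying both is simply an error
        raise ValueError("specify either 'href' or 'key', not both")
    if not (has_href or has_key):
        raise ValueError("one of 'href' or 'key' is required")
    if has_href:
        return ("href", link["href"])
    return ("key", tuple(link["key"]))


print(resolve_customer_ref({"href": "rest/customers/1"}))
print(resolve_customer_ref({"key": ["foo", "bar"]}))
```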