Controlling edge multiplicity and direction in OrientDB, can it be improved? - orientdb

It seems there is a way to control multiplicity of edges in ODB through edge constraints.
http://orientdb.com/docs/last/Tutorial-Using-schema-with-graphs.html (towards the bottom)
The edge constraints are one of those usability things that make ODB less user friendly IMHO.
Here are some multiplicity types taken from Titan's manual (because the way they are explained is simple to understand):
MULTI: Allows multiple edges of the same label between any pair of
vertices. In other words, the graph is a multi graph with respect to
such edge label. There is no constraint on edge multiplicity.
SIMPLE: Allows at most one edge of such label between any pair of vertices. In
other words, the graph is a simple graph with respect to the label.
Ensures that edges are unique for a given label and pairs of vertices.
MANY2ONE: Allows at most one outgoing edge of such label on any vertex
in the graph but places no constraint on incoming edges. The edge
label mother is an example with MANY2ONE multiplicity since each
person has at most one mother but mothers can have multiple children.
ONE2MANY: Allows at most one incoming edge of such label on any vertex
in the graph but places no constraint on outgoing edges. The edge
label winnerOf is an example with ONE2MANY multiplicity since each
contest is won by at most one person but a person can win multiple
contests.
ONE2ONE: Allows at most one incoming and one outgoing edge
of such label on any vertex in the graph. The edge label marriedTo is
an example with ONE2ONE multiplicity since a person is married to
exactly one other person.
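To make the five rules concrete, here is a minimal sketch in Python (not Titan or OrientDB code) that checks a list of edges of one label against a multiplicity. The function name `violates` and the sample data are made up for illustration; each edge is just an (out vertex, in vertex) pair.

```python
from collections import Counter

def violates(multiplicity, edges):
    """Return True if the (out_vertex, in_vertex) pairs of one edge label break the rule."""
    pairs = Counter(edges)                      # edges per vertex pair
    out_deg = Counter(src for src, _ in edges)  # outgoing edges per vertex
    in_deg = Counter(dst for _, dst in edges)   # incoming edges per vertex

    def too_many(counts):
        return any(n > 1 for n in counts.values())

    if multiplicity == "MULTI":
        return False                            # no constraint at all
    if multiplicity == "SIMPLE":
        return too_many(pairs)                  # at most one edge per pair
    if multiplicity == "MANY2ONE":
        return too_many(out_deg)                # at most one outgoing edge
    if multiplicity == "ONE2MANY":
        return too_many(in_deg)                 # at most one incoming edge
    if multiplicity == "ONE2ONE":
        return too_many(out_deg) or too_many(in_deg)
    raise ValueError(multiplicity)

# Hypothetical "mother" edges: child -> mother
mother = [("alice", "carol"), ("bob", "carol")]
print(violates("MANY2ONE", mother))  # each child has one mother
print(violates("ONE2MANY", mother))  # carol has two incoming edges
```

With the sample data, the mother label satisfies MANY2ONE but would be rejected under ONE2MANY or ONE2ONE, which matches the definitions above.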
Needless to say, because of this kind of abstraction, implementing such multiplicity while creating edges in Titan is fairly straightforward. In ODB, it isn't quite so transparent or simple. This is along the lines of the usability abstraction requested in this issue on GitHub.
Let's go through the possibilities and see how ODB handles them. This is according to my understanding of ODB, which admittedly isn't great, so I could be wrong. Please correct me if I am!
MULTI
This is a standard heavy weight edge in ODB. So, creating a heavy weight edge follows the MULTI multiplicity rule automatically.
SIMPLE
This seems like it would fall under the UNIQUE index on an edge class. However, I don't think that is completely right, because the UNIQUE index enforces a single in and out between two vertices. So using a UNIQUE index is more like ONE2ONE. I believe SIMPLE might be equivalent to a lightweight edge with an added unique index, but I am not sure.
MANY2ONE
I believe this can be done with the in and out constraints.
ONE2MANY
Same as above.
ONE2ONE
Available through the UNIQUE constraint.
So, through this exercise, I think I have learned that ODB can cover all multiplicity scenarios, though, I am absolutely not certain. Why must I be uncertain? This whole concept could be simplified by using the same terms Titan uses. It seems the abstraction is necessary. I believe it would make ODB easier to understand.
Maybe these suggestions could be worth thinking about. Starting from the Cars database example taken from the docs.
Creating the edge classes stays the same.
orientdb> CREATE CLASS Owns EXTENDS E
orientdb> CREATE CLASS Lives EXTENDS E
SIMPLE (a standard lightweight edge, but it automatically allows multiple lightweight edges, which seems a step above Titan!)
orientdb> CREATE EDGE Owns FROM ( SELECT FROM Person ) TO ( SELECT FROM Car )
MULTI (a heavyweight edge with properties is created, with in and out mandatory)
orientdb> CREATE MULTI EDGE Owns FROM ( SELECT FROM Person ) TO ( SELECT FROM Car )
MANY2ONE (not sure what needs to happen with in and out here)
orientdb> CREATE MANY2ONE EDGE Lives FROM ( SELECT FROM Country ) TO ( SELECT FROM Person )
ONE2MANY (same as above, not sure about what needs to happen with in and out)
orientdb> CREATE ONE2MANY EDGE Owns FROM ( SELECT FROM Person ) TO ( SELECT FROM Car )
ONE2ONE (this is a heavy weight edge, with an automatic UNIQUE constraint)
orientdb> CREATE ONE2ONE EDGE Owns FROM ( SELECT FROM Person ) TO ( SELECT FROM Car )
UNIQUE (an additional constraint, only for light weight edges)
orientdb> CREATE UNIQUE EDGE Owns FROM ( SELECT FROM Person ) TO ( SELECT FROM Car )
To be honest, I am really not sure this covers all needed or wanted possibilities when it comes to edge direction constraints or multiplicity. However, I know for a fact that the SQL suggested above is a lot easier for me to understand, which is the goal of a declarative language like SQL. As I see it, we are abstracting out three things: edge type creation (lightweight or heavyweight), edge direction, and multiplicity.
As I look back at what I just wrote, I guess what I am uncertain about is how to actually create many-to-one and one-to-many edges in ODB.
Any other thoughts on this and corrections to my thinking would be greatly appreciated.
Scott

I've got a bite in the ODB Google Group.
https://groups.google.com/forum/#!topic/orient-database/sZ4GjSvEKtI
Scott

Related

OrientDB traverse (children) while connected to a vertex and get an other vertex

I'm not sure the title is the best way to phrase it, here's the structure:
Structure
Here's the db json backup if you want to import it to test it: http://pastebin.com/iw2d3uuy
I'd like to get the Dishes eaten by the Humans living in Continent 1 until a _Parent Human moved to Continent 2.
Which means the target is Dish 1 & 2.
If a parent moved to another Continent, I don't want their dish nor the dishes of their children, even if they move back to Continent 1.
I don't know if it matters, but a Human can have multiple children.
If there wasn't the condition about the children of a Human who has moved from the Continent, this query would have worked:
SELECT expand(in('_Is_in').in('_Lives').in('_Eaten_by'))
FROM Continent WHERE continent_id = 1
But I guess here we're forced to use (among other things)
TRAVERSE out('_Parent') FROM Human WHILE
I've tried to use the WHILE of TRAVERSE with a subquery to get all the Humans I'm interested in before trying to get the Dishes, but I'm not even sure we can use WHILE with a subquery.
I hope the structure will help other users to quickly find out if this query is useful to them. If anyone is wondering, I used the Graph tab of OrientDB Studio to make it, along with GIMP.
As a bonus, if anyone knows the Gremlin syntax, it would also be useful to learn it.
Please feel free to edit this post as you see fit and contribute your thoughts :)
SELECT expand(in('_Eaten_by'))
FROM (TRAVERSE out('_Parent')
FROM (SELECT from Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1)
Explanation:
TRAVERSE out('_Parent')
FROM (SELECT FROM Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1
returns Human 1 and 2.
That query traverses Human, starting from Human 1 while the Human is connected to Continent 1.
It starts from in('_Parent').size() = 0 which are the Humans without any _Parent (there's only Human 1 in this case) (size() is the size of the collection of vertices coming in from _Parent).
And SELECT expand(in('_Eaten_by')) FROM
gets the Dishes, starting from the Humans we got from the traversal and going through the edge _Eaten_by.
Note: be sure to always put ' around the vertex and edge names, otherwise the names don't seem to be taken into account.
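For readers who want to see the logic of TRAVERSE ... WHILE outside OrientDB, here is a rough Python sketch. The data is made up for illustration (it is not the pastebin export), and `traverse_while` is a hypothetical helper: it walks from the root Humans down the _Parent edges and stops descending as soon as the condition fails, so the descendants of a Human who moved away are never visited.

```python
from collections import deque

# Hypothetical data: parent -> children, human -> continent, human -> dishes
children = {1: [2, 3], 2: [], 3: [4], 4: []}
continent = {1: 1, 2: 1, 3: 2, 4: 1}   # Human 4 moved back to Continent 1
dishes = {1: ["Dish 1"], 2: ["Dish 2"], 3: ["Dish 3"], 4: ["Dish 4"]}

def traverse_while(roots, keep):
    """Visit nodes from roots; a node failing keep() is dropped with its subtree."""
    seen, queue, kept = set(), deque(roots), []
    while queue:
        human = queue.popleft()
        if human in seen or not keep(human):
            continue
        seen.add(human)
        kept.append(human)
        queue.extend(children[human])
    return kept

# Roots are the Humans that are nobody's child (in('_Parent').size() = 0)
all_children = {c for kids in children.values() for c in kids}
roots = [h for h in children if h not in all_children]

humans = traverse_while(roots, lambda h: continent[h] == 1)
eaten = [dish for h in humans for dish in dishes[h]]
print(humans, eaten)
```

In this made-up data Human 3 lives in Continent 2, so neither Human 3 nor the child Human 4 contributes a dish, even though Human 4 is back in Continent 1.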

OrientDB query for hierarchical data

I am using OrientDB Server v2.0.10.
I am trying to come up with a query for the following scenario.
I have 2 hierarchies: A->B->C and D->E->F
The number of nodes in the hierarchy can change.
A node in the first hierarchy can be connected to a node in the other hierarchy through some relation, say 'Assigned'.
What I want is the top-level parent node of the second hierarchy if there is an incoming edge from the first hierarchy to any node in the second.
For example, say we have Car-Child->Engine-Child->Piston and Country-Child->State-Child->City
And a relationship Made_In which relates Car or Engine or Piston to either Country or State or City
So if there is a relation with any of Country, State or City, the Country should be returned. For example, Engine1-Made_In->Berlin would return Germany.
Sorry for such a toyish example. I hope it is clear.
Thanks.
You should consider reading the chapter about "traversing" - that should be the missing link to answer your question. You can find it here: http://orientdb.com/docs/last/SQL-Traverse.html
Basically, if you think of your graph as a family tree, you want to achieve 3 things:
Find all children, grand-children, grand-grand-children (and so on) from tree 1 for a given family member (=Hierarchy1)
Find those who have relations to members of another family tree (=ASSIGNED)
Show me who's on top of this tree (=Hierarchy2)
One of the possible solutions should look a little something like this:
Since you want to end up on top of hierarchy2, you have to start on the other side, i.e. hierarchy1.
Get hierarchy1 (top-to-bottom)
TRAVERSE out("CHILD") FROM Car
Choose all relations
SELECT expand(out("MADE_IN")) FROM ([1])
and from those, go bottom-to-top
TRAVERSE in("CHILD") FROM ([2])
Who's on top?
SELECT FROM ([3]) WHERE @class="Country"
Combined into one sql, it looks as ugly as this:
SELECT FROM (
  TRAVERSE in("CHILD") FROM (
    SELECT expand(out("MADE_IN")) FROM (
      TRAVERSE out("CHILD") FROM Car
    )
  )
) WHERE @class="Country"
You could replace Car with any #rid in hierarchy1 to get a list of countries it or any part of it was made in.
There might be better solutions for sure. But at least this one should work, so I hope it will help.
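The same four steps can be sketched in plain Python on an in-memory edge list, which may help to check the reasoning before running it against the database. All node names, the `node_class` map and the helper functions below are made up for illustration; only the Engine1-Made_In->Berlin example comes from the question.

```python
# Hypothetical sample data: CHILD edges point from parent to child
child_edges = [
    ("Car1", "Engine1"), ("Engine1", "Piston1"),
    ("Germany", "BerlinState"), ("BerlinState", "Berlin"),
]
made_in = [("Engine1", "Berlin")]
node_class = {
    "Car1": "Car", "Engine1": "Engine", "Piston1": "Piston",
    "Germany": "Country", "BerlinState": "State", "Berlin": "City",
}

def reachable(starts, edges):
    """All nodes reachable from starts along edges, including the starts."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, stack = set(), list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, []))
    return seen

def countries_for(start):
    parts = reachable([start], child_edges)                     # step 1: top-to-bottom
    places = {dst for src, dst in made_in if src in parts}      # step 2: follow MADE_IN
    reversed_child = [(dst, src) for src, dst in child_edges]
    ancestors = reachable(places, reversed_child)               # step 3: bottom-to-top
    return sorted(n for n in ancestors if node_class[n] == "Country")  # step 4

print(countries_for("Car1"))
```

Starting from Piston1 instead of Car1 finds nothing in this sample, because only Engine1 has a MADE_IN edge and the traversal goes downwards only.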

How to set street obstructed in planet_osm_line/pg_route

I am working on a project that needs to find the shortest/fastest route from point A to point B. I've been looking at the tables generated by osm2pgsql, and I'm wondering how I would represent a road as obstructed after the OSM data has been loaded into our database. Our project will rely on OSM to map out all of the roads, and we will also have an operator looking at live video footage of roads. If the operator sees that a road is obstructed, say by a downed tree, we want to update the database to reflect this.
I've been looking at all of the columns, and the only one that stands out to me is barrier. I have been unable to find any documentation on what each column represents and how pg_route takes each into consideration when creating a route. What I'm looking for is a column that makes pg_route, when it looks in the database and sees a road, say "that road is blocked, skip it".
This is a good question for gis.se.
First, pgRouting can't route over data generated by osm2pgsql, because this data is not a network. You need data generated by osm2po or osm2pgrouting, and that data is quite different.
Second, there is no such column. Every pgRouting function takes an SQL query that selects the data for the route search, so you decide which edges are in this dataset and which are not. It's not a problem to add an extra column to the table with edges.
Here is a link to the pgRouting workshop; it will guide you through the whole process from importing the data to the first generated route. It uses osm2pgrouting to import data, but I suggest you use osm2po instead.
So as Jendrusk mentioned, when you generate a route you pass the function a SQL query that selects the edges for the graph you want to solve, e.g. 'select * from edges where the_geom && <bbox>'. You can model blockages as a point and radius, lines, or polygons that you want the route to avoid, by adding avoidance zones to the query above, like:
'select * from edges where the_geom && <bbox> and not st_dwithin(the_geom, point, radius) and not st_dwithin(the_geom, line_or_polygon, 0.0)'
If you have lots of these avoidances, put them in a table and do a join to eliminate those edges from the set used to build the graph. If the edges are not there, the route is forced to find a way around the avoidance.
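The idea that a blocked road is simply an edge left out of the input can be illustrated without a database. The following is a small Python sketch with a made-up four-edge network and a plain Dijkstra search; it is not pgRouting code, and the `blocked` parameter is a hypothetical stand-in for whatever filter you put into the edge-selecting SQL.

```python
import heapq

# Hypothetical network: (edge id, source, target, cost)
edges = [
    (1, "A", "B", 1.0),
    (2, "B", "C", 1.0),
    (3, "A", "D", 2.0),
    (4, "D", "C", 2.0),
]

def shortest_path(edges, start, goal, blocked=frozenset()):
    """Dijkstra over an undirected graph built only from non-blocked edges."""
    graph = {}
    for edge_id, src, dst, cost in edges:
        if edge_id in blocked:
            continue  # an obstructed road never enters the graph
        graph.setdefault(src, []).append((dst, cost))
        graph.setdefault(dst, []).append((src, cost))
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None  # no route left

print(shortest_path(edges, "A", "C"))
print(shortest_path(edges, "A", "C", blocked={2}))
```

With edge 2 blocked, the search has no choice but to take the longer way through D, which is the behaviour described above for the filtered edge query.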

Can edges have a Set of properties? (By set I mean multi-valued attributes)

Suppose I have two vertices, A and B.
Can the edge between these vertices have a Set of properties? By Set I mean a Set, not a map of key/value pairs.
E.g. the edge from A to B has a Set called tags.
I want to model something like A workswith B.
Now workswith has a property like ondays [monday, tuesday, friday]:
tags values = ['Monday','Tuesday','Friday'];
Here tags is a single property, but its type is Set. Is it possible?
Now, while traversing, I would like to find something like:
Find everyone A works with on Monday?
Find everyone A works with on any day?
Note: This is a simple example depicting my use case. My real use case is more complex.
Yes you can. An edge is a document, so it can be very complex, with collections, maps and nested documents.
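As a rough illustration of the data shape (plain Python, not the OrientDB API), an edge can be pictured as a small document whose ondays field holds a set. The edge list and the `works_with` helper are made up for this sketch, and "on any day" is read here as "has at least one day set", which is an assumption about the question.

```python
# Hypothetical edges: each edge is a small document with a set-valued field
edges = [
    {"out": "A", "in": "B", "label": "workswith",
     "ondays": {"Monday", "Tuesday", "Friday"}},
    {"out": "A", "in": "C", "label": "workswith", "ondays": {"Wednesday"}},
    {"out": "A", "in": "D", "label": "workswith", "ondays": set()},
]

def works_with(person, day=None):
    """Colleagues of person on a given day, or on any day if day is None."""
    result = []
    for edge in edges:
        if edge["out"] != person or edge["label"] != "workswith":
            continue
        if day is None:
            matches = bool(edge["ondays"])   # at least one day is set
        else:
            matches = day in edge["ondays"]  # set membership test
        if matches:
            result.append(edge["in"])
    return sorted(result)

print(works_with("A", "Monday"))
print(works_with("A"))
```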

Tag hierarchies and handling of

This is a real issue that applies on tagging items in general (and yes, this applies to StackOverflow too, and no, it is not a question about StackOverflow).
The whole tagging issue helps cluster similar items, whatever items they may be (jokes, blog posts, SO questions etc). However, there (usually but not strictly) is a hierarchy of tags, meaning that some tags imply other tags too. To use a familiar example, the "c#" SO tag also implies ".net"; as another example, in a jokes database, a "blondes" tag implies the "derisive" tag, similarly to "irish" or "belge" or "canadian" etc, depending on the joke's country of origin.
How have you handled this, if you have, in your projects? I will supply an answer describing two different methods I have used in two separate cases (actually, the same mechanism but implemented in two different environments), but I am interested not only in similar mechanisms, but also in your opinion on the hierarchy issue.
This is a tough question. The two extremes are an ontology (everything is hierarchical) and a folksonomy (tags have no hierarchy). I have answered this on WikiAnswers, with a reference to Clay Shirky's "Ontology is Overrated" article which claims you should set no hierarchy.
Actually I would say that it is not so much a hierarchical system as a semantic net with perceived distances between tag meanings. What I mean: mathematics is closer to experimental physics than to gardening.
One way to build such a net: build pairs of tags and let people judge the perceived distance (using a measure like 1-10, meaning something like [synonyms, alike, ..., antonyms]), and when searching, search for all tags within a certain distance.
Does a measure have to be the same distance when coming from the opposite direction ([a,b] close -> [b,a] close)? Or is proximity transitive ([a,b] close and [b,c] close -> [a,c] close)?
Maybe the first word will by default trigger another semantic field? If you start at "social worker", "analyst" is near. If you start at "programmer", "analyst" is near as well. But starting at either of these points, you probably would not count the other as near ("social worker" is by no means close to "programmer").
You would therefore have only pairs judged, and judged in both directions (in random order).
[TagRelations]
tagId integer
closeTagId integer
proximity integer
Example for selection of similar tags:
select closeTagId from TagRelations where tagId = :tagID and proximity < 3
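The same selection can be sketched in Python with a dictionary keyed on ordered tag pairs. The distances below are invented for illustration; the point is that each direction is stored separately and nothing is inferred transitively.

```python
# Hypothetical judged distances, keyed on (tag, close_tag); lower means closer
proximity = {
    ("social worker", "analyst"): 2,
    ("programmer", "analyst"): 2,
    ("analyst", "programmer"): 2,
    ("social worker", "programmer"): 8,
}

def close_tags(tag, max_distance=3):
    """Tags judged closer than max_distance when starting from tag."""
    return sorted(
        close for (start, close), distance in proximity.items()
        if start == tag and distance < max_distance
    )

print(close_tags("social worker"))
print(close_tags("analyst"))
```

Note that "analyst" is near both "social worker" and "programmer" here, yet "programmer" is not returned for "social worker", because that pair was judged on its own.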
The mechanism I have implemented was to not use the tags given themselves, but an indirect lookup table (not strictly DBMS terms) which links a tag to many implied tags (obviously, a tag is linked with itself for this to work).
In a python project, the lookup table is a dictionary keyed on tags, with values sets of tags (where tags are plain strings).
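A minimal sketch of such a lookup table, with made-up entries based on the examples from the question (the original project's data and function names are not shown here, so `implied` and `expand_tags` are hypothetical):

```python
# Lookup table: tag -> set of implied tags (every tag implies itself)
implied = {
    "c#": {"c#", ".net"},
    ".net": {".net"},
    "blondes": {"blondes", "derisive"},
    "derisive": {"derisive"},
}

def expand_tags(tags):
    """Union of all tags implied by the given tags; unknown tags imply only themselves."""
    result = set()
    for tag in tags:
        result |= implied.get(tag, {tag})
    return result

print(expand_tags(["c#"]))
```

An item tagged "c#" would then also be found when searching for ".net", without the ".net" tag ever being stored on the item itself.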
In a database project (indifferent which RDBMS engine it was), there were the following tables:
[Tags]
tagID integer primary key
tagName text
[TagRelations]
tagID integer # first part of two-field key
tagID_parent integer # second part of key
trlValue float
where trlValue was a value in the interval (0, 1], used to give a weight to each linked tag; a self-to-self tag relation always carries 1.0 in trlValue, while the rest are algorithmically calculated (it's not important how exactly). Think of the example jokes database I gave; a ['blonde', 'derisive', 0.5] record would correlate with a ['pondian', 'derisive', 0.5] record and therefore suggest all derisive jokes given another.