How to find most connected child nodes in OrientDB - orientdb

So I'm new to OrientDB, and while I'm pretty good at SQL the syntax to get what I want in OrientDB is escaping me.
I know I can do something like select *, in().size() as size from Users order by size desc to find the most connected node of a certain class (Users in this case), but how do I find the most connected children a couple levels down?
I.e., let's say I have Organizations --> PROMOTES (edge) --> Platform --> MANAGES (edge) --> Suggestion
How do I find the most connected Suggestions at the Organization level? I.e., I know I can easily find the most connected suggestions one level out using the query I shared, but what about the most connected another level beyond that?
I'd ultimately like a result which lists each Suggestion along with how indirectly connected (number of edges) it is to Organizations.
Thank you!

Use TRAVERSE
You should use the command TRAVERSE to do that
SELECT out(PROMOTES).size() AS connectedOrg,*
FROM (
TRAVERSE out(MANAGES)
FROM Suggestion WHILE $depth < 2
)
WHERE $depth > 0
Result will be the Platforms linked to a Suggestion. Records can be duplicated.
Along with each Platform, you get the number of Organisation called connectedOrg.
About traverse :
Traverse follow the record ids present in a record and aggregates them in the results. With WHILE $depth < 2 you can limit search to only one level and with WHERE $depth > 0 you can remove the original record from the result. More info here.
Use OrientDB Functions
If you need to know how many Organization is linked to each Suggestion (through Plateform), use this syntax.
SELECT *, set(out(MANAGES).out(PROMOTES), null).size() FROM Suggestion
Note : , null allows to switch from AGGREGATE to INLINE. It prevents from having all sizes aggregated in a single record. See the doc for more info.

Related

Alternative way to list all part tables that have existing entries/tuples

I have one single master Subject table which have several part tables. One convenient thing about Table.delete() is that is displays a prompt with all existing entries that were created under that Subject. Aside from delete() is there an alternative way to print what part table entries were created under a single Subject entry?
Thank you
To get the entry count of all the part tables restricted by some restriction (e.g. subject_name), you can do something like this:
restriction = {'subject_name': 'my_star_subject'}
for part_table in Subject.parts(as_objects=True):
part_table_query = part_table & restriction
print(f'{part_table.table_name}: len(part_table_query)')
It's a slow process, but I think I would do the following to see where entries were:
(subject.Subject & 'subject="<NAME>"').descendants(as_objects=True)
Not sure if you'd be better off with children (1 level down) or descendants (all the way down). delete gives the full set of descendants, using table.delete_quick(get_count=True).
EDIT:
To just get the counts, you might want:
[print(i.table_name,len(i)) for i in (subject.Subject & 'subject="<NAME>"').descendants(as_objects=True)]

OrientDB - Find related items of a vertex

We are building a system that will audit searches of a user (as well as other actions by the user). We will be tracking data such as User1 searched for Term1. There is an edge called 'searched' between a User and a Term vertex. What we are trying to find is some related information like users that searched for Term1 also searched for these terms (Term2, Term3, etc) and possibly some other related information between users and terms like "you may know these users". I am guessing a traversal is needed, however what I'm wondering is if depth of a traversal matters and will tell us the data we want. If we go the traversal route how deep do we set before we lose actual relevance?
So far this is what we pieced together but we aren't entirely sure if it is the correct approach.
SELECT $depth, * FROM (TRAVERSE * FROM (SELECT FROM Searched where q = 'al') STRATEGY BREADTH_FIRST ) WHERE #class in ['Term']
Update: what I ended up going with so far was the following query. It will tell me what other users searched for and a count of how many times. So I sort by how close a vertex is and how many times it was searched for. I feel this should hopefully provide a fairly good sample of what other users are searching for that also searched for that term.
SELECT $depth, q, in().size() AS count FROM (TRAVERSE * FROM (select from Term where q.toLowerCase() = 'aluminum') STRATEGY BREADTH_FIRST) WHERE #class = 'Term' AND $depth <> 0 ORDER BY $depth ASC, count DESC

OrientDB traverse (children) while connected to a vertex and get an other vertex

I'm not sure the title is the best way to phrase it, here's the structure:
Structure
Here's the db json backup if you want to import it to test it: http://pastebin.com/iw2d3uuy
I'd like to get the Dishes eaten by the Humans living in Continent 1 until a _Parent Human moved to Continent 2.
Which means the target is Dish 1 & 2.
If a parent moved to another Continent, I don't want their dish nor the dishes of their children, even if they move back to Continent 1.
I don't know if it matters, but a Human can have multiple children.
If there wasn't the condition about the children of a Human who has moved from the Continent, this query would have worked:
SELECT expand(in('_Is_in').in('_Lives').in('_Eaten_by'))
FROM Continent WHERE continent_id = 1
But I guess here we're forced to use (among other things)
TRAVERSE out('_Parent') FROM Human WHILE
I've tried to use the while of traverse with a subquery to get all the Humans I'm interested in, before to try to get the Dishes, but I'm not even sure we can use while with a subquery.
I hope the structure will help other users to quickly find out if this query is useful to them. If anyone is wondering, I used the Graph tab of OrientDB Studio to make it, along with GIMP.
As a bonus, if anyone knows the Gremlin syntax, it would also be useful to learn it.
Please feel free to edit this post as you see fit and contribute your thoughts :)
SELECT expand(in('_Eaten_by'))
FROM (TRAVERSE out('_Parent')
FROM (SELECT from Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1)
Explanation:
TRAVERSE out('_Parent')
FROM (SELECT FROM Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1
returns Human 1 and 2.
That query traverses Human, starting from Human 1 while the Human is connected to Continent 1.
It starts from in('_Parent').size() = 0 which are the Humans without any _Parent (there's only Human 1 in this case) (size() is the size of the collection of vertices coming in from _Parent).
And SELECT expand(in('_Eaten_by')) FROM
gets the Dishes, starting from the Humans we got from the traversal and going through the edge _Eaten_by.
Note: be sure to always use ' around the vertices and edges names, otherwise the names don't seem to be taken in account.

Find most common shared vertices in OrientDB

I'm currently evaluating OrientDB (2.1.16) as a possible solution to building a similarity recommender. To that end, I'd love some help writing an initial query that accomplishes the following:
Vertex:Maker -(Edge:Produced)-> Vertex:Item -(Edge:TaggedBy)-> Vertex:Tag
I'd like to select a particular Item (V1) and get a list back of other Items (Vn) ordered by the number of Tags shared in common with V1;
By extension, I'd like to take a selected Maker (V2) and traverse through Items to get an ordered list of Makers (and the traversed Items, if possible) who share Tags.
There isn't an awful lot of detailed documentation on the application of intersect in this way. No unusual constraints in particular. There would be thousands of Items and Makers and probably 10x that many Tags.
I tried with this little graph example
I used this query
select item.name, count(tag)from (
select from (
MATCH {
CLASS:Item, AS:item, WHERE: (name<>'v1')
}
.out("TaggedBy"){AS:tag}
return item, tag
) where tag in (
select expand(tag) from (
MATCH {
CLASS:Item, AS:item, WHERE: (name='v1')
}.out("TaggedBy"){AS:tag}
return tag
)
)
) group by item order by count desc
and I got this result
Hope it helps.

OrientDB query for hierarchical data

OrientDB Server v2.0.10 ,
I am trying to come up with a query for the following scenario.
I have 2 hierarchies: A->B->C and D->E->F
The number of nodes in the hierarchy can change.
The node in 1st hierarchy can be connected to the other hierarchy using some relation say 'Assigned'.
What I want is the parent node of the 2nd hierarchy if there is any incoming edge to any of the node in that 2nd hierarchy from the 1st.
For example, say we have Car-Child->Engine-Child->Piston and Country-Child->State-Child->City
And a relationship Made_In which relates Car or Engine or Piston to either Country or State or City
So if there is a relation with either of Country or State or City, the Country should be returned. Example, Engine1-Made_In->Berlin, this would return Germany.
Sorry for such a toyish example. I hope it is clear.
Thanks.
You should consider reading the chapter about "traversing" - that should be the missing link to answer your question. You can find it here: http://orientdb.com/docs/last/SQL-Traverse.html
Basically, if you think of your graph as a family tree, you want to achieve 3 things:
Find all children, grand-children, grand-grand-children (and so on) from tree 1 for a given family member (=Hierarchy1)
Find those who have relations to members of another family tree (=ASSIGNED)
Show me who's on top of this tree (=Hierarchy2)
One of the possible solutions should look a little something like this:
Since you want to end up on top of hierarchy2, you have to start on the other side, i.e. hierarchy1.
Get hierarchy1 (top-to-bottom)
TRAVERSE out("CHILD") FROM Car
Choose all relations
SELECT out("MADE_IN) FROM ([1])
and from those, go bottom-to-top
TRAVERSE in("CHILD") FROM ([2])
Who's on top?
SELECT FROM ([3]) WHERE #class="Country"
Combined into one sql, it looks as ugly as this:
SELECT FROM (
TRAVERSE in("CHILD") FROM (
SELECT out("MADE_IN") FROM (
TRAVERSE out("CHILD") FROM Car
)
)
) WHERE #class="Country"
You could replace Car with any #rid in hierarchy1 to get a list of countries it or any part of it was made in.
There might be better solutions for sure. But at least this one should work, so I hope it will help.