Fetching same vertex data for multiple levels in OrientDB

I am a beginner in OrientDB.
Consider that I have two vertex classes, Cat and Val.
Cat contains a property called category and Val contains a property called value.
Categories can have sub-categories, which can in turn have their own sub-categories, and so on. Both categories and sub-categories are stored as Cat vertices. A sub-category is linked to its parent by an edge of class CatEdge, whose endpoints are both Cat vertices.
For example, consider a category 'Education' which has two sub-categories, 'School' and 'College'. 'College' has further sub-categories, 'Bachelors' and 'Masters'. So there are CatEdge edges from 'Education' to 'School' and 'College', and from 'College' to 'Bachelors' and 'Masters'.
Education
|- School
|- College
   |- Bachelors
   |- Masters
Apart from these, Cat can also contain categories that have no sub-categories at all, for example 'FirstName', 'LastName', etc.
Every 'leaf' category (one with no further sub-categories) has a ValEdge edge from its Cat vertex to a Val vertex.
I want to retrieve the value from Val for all the categories and sub-categories.
What I have done:
First, I ran the following query to retrieve all categories that have no sub-categories and are not a sub-category of another category:
select from Cat where #rid not in (select #rid, expand(both('CatEdge')) from Cat)
Then, programmatically, I loop through the fetched categories and find their corresponding values:
select expand(out('ValEdge')) from Cat where category = 'FirstName'
Second, I fetch all the categories that have sub-categories or are themselves a sub-category using:
select from (traverse out('CatEdge') from Cat) where out('CatEdge').size() > 0
and store them in a list called SubList.
The above query gives me 'Education' and 'College'.
Then, for each item in this list, I fetch its sub-categories using:
select expand(out('CatEdge')) from Cat where category = 'Education'
The above query gives 'School' and 'College'. Then, programmatically, I check whether 'School' and 'College' exist in SubList.
If a sub-category exists in SubList, I first remove it from SubList and run the above query again for it, and this continues until I get zero rows.
If it does not exist in SubList, it is a 'leaf' category, and I look up its value in the Val vertex.
As you can see, this is getting too complex. Is there a simpler way to achieve the same result?

If this is your situation:
create class Cat extends V
create property Cat.category string
create class CatEdge extends E
create class Val extends V
create property Val.value integer
create class ValEdge extends E
create vertex Cat set category = 'Education'
create vertex Cat set category = 'School'
create vertex Cat set category = 'College'
create vertex Cat set category = 'Bachelors'
create vertex Cat set category = 'Masters'
create vertex Val set value = 1
create vertex Val set value = 2
create vertex Val set value = 3
create edge CatEdge from (select from Cat where category = 'Education') to (select from Cat where category = 'School')
create edge CatEdge from (select from Cat where category = 'Education') to (select from Cat where category = 'College')
create edge CatEdge from (select from Cat where category = 'College') to (select from Cat where category = 'Bachelors')
create edge CatEdge from (select from Cat where category = 'College') to (select from Cat where category = 'Masters')
create edge ValEdge from (select from Cat where category = 'School') to (select from Val where value = 1)
create edge ValEdge from (select from Cat where category = 'Bachelors') to (select from Val where value = 2)
create edge ValEdge from (select from Cat where category = 'Masters') to (select from Val where value = 3)
And if I understood your intention correctly, this query will work:
select in("ValEdge").category, value from Val
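To make the direction of that lookup concrete, here is a small Python model (not OrientDB code) of what the query computes on the sample data above: for each Val vertex it follows the incoming ValEdge back to the Cat vertex and reads its category. The lists and the function name are hypothetical stand-ins for the vertices and edges. Because in("ValEdge").category yields a collection in OrientDB, each category is returned as a list here.

```python
# Hypothetical in-memory stand-ins for the sample data (not OrientDB code).
VAL_VALUES = [1, 2, 3]                                          # Val.value
VAL_EDGES = [("School", 1), ("Bachelors", 2), ("Masters", 3)]   # ValEdge: Cat.category -> Val.value


def category_and_value(values, val_edges):
    """Mimic select in("ValEdge").category, value from Val:
    one row per Val, pairing its value with the categories on incoming ValEdges."""
    incoming = {}
    for category, value in val_edges:
        incoming.setdefault(value, []).append(category)
    # In this model, a Val with no incoming ValEdge still produces a row, with an empty list.
    return [(incoming.get(value, []), value) for value in values]


for categories, value in category_and_value(VAL_VALUES, VAL_EDGES):
    print(categories, value)
```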
UPDATE
select category, $subcategories, $value from Cat
let
$subcategories = ( select category from (traverse out('CatEdge') from $parent.$current ) where $depth >=1 ),
$value = ( select out('ValEdge').value as value from $current )
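This returns one record per category, with the fields category, $subcategories and $value.
Note that each category has either a list of sub-categories (all of its descendants, since TRAVERSE follows CatEdge recursively) or, if it is a leaf, its value.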
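The LET query above relies on TRAVERSE to walk the CatEdge hierarchy recursively, which is what replaces the manual SubList loop from the question. As a rough illustration only, the following Python sketch models the same walk over an in-memory copy of the sample data (the dictionaries and function names are hypothetical): descendants() corresponds to traversing out('CatEdge') and keeping $depth >= 1, and leaf_values() collects every value reachable from a category. The depth-first order is a property of this sketch; OrientDB may return sub-categories in a different order.

```python
# Hypothetical in-memory copy of the sample data (not OrientDB code).
CAT_EDGES = {                        # CatEdge: parent category -> sub-categories
    "Education": ["School", "College"],
    "College": ["Bachelors", "Masters"],
}
VAL_EDGES = {"School": 1, "Bachelors": 2, "Masters": 3}   # ValEdge: leaf -> Val.value
CATEGORIES = ["Education", "School", "College", "Bachelors", "Masters"]


def descendants(category):
    """All categories reachable through CatEdge at depth >= 1 (depth-first).
    Each category is visited once, so a cycle cannot loop forever."""
    found, seen = [], set()
    stack = list(reversed(CAT_EDGES.get(category, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.append(current)
        stack.extend(reversed(CAT_EDGES.get(current, [])))
    return found


def leaf_values(category):
    """Values of every leaf at or below `category`, following ValEdge from each leaf."""
    nodes = [category] + descendants(category)
    return [VAL_EDGES[c] for c in nodes if c in VAL_EDGES]


# One row per category, like the LET query: sub-categories, or the value of a leaf.
for cat in CATEGORIES:
    print(cat, descendants(cat), VAL_EDGES.get(cat))
```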

Related

OrientDB query to receive last vertex before a given date

Let's say I have the following list of vertices (connected by edges) in the orient database:
[t=1] --> [t=2] --> [t=3] --> [t=4] --> [t=5] --> [t=6] --> [t=7]
Each vertex has a timestamp t. I now want to receive the last vertex before a given date. Example: give me the last vertex before t=5, which is t=4.
I'm currently using the following query to do this:
SELECT FROM ANYVERTEX WHERE t < 5 ORDER BY t DESC LIMIT 1
This works fine with up to, say, 1000 elements, but the query's performance drops as more elements are inserted into the list. I already tried using an index, which improved overall performance, but the performance still degrades as the number of elements grows.
When building queries, always try to use what you know about the relationships in your data to improve performance. In this case you don't need the sort (which is an expensive operation): you know that the vertex you need has an edge pointing to the vertex you start from, so you can use that information directly in your query.
For example, let's say I have the following setup:
CREATE CLASS T EXTENDS V
CREATE VERTEX T SET t = 1
CREATE VERTEX T SET t = 2
CREATE VERTEX T SET t = 3
CREATE VERTEX T SET t = 4
CREATE VERTEX T SET t = 5
CREATE CLASS link EXTENDS E
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 1) TO (SELECT * FROM T WHERE t = 2)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 2) TO (SELECT * FROM T WHERE t = 3)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 3) TO (SELECT * FROM T WHERE t = 4)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 4) TO (SELECT * FROM T WHERE t = 5)
Then I can select the vertex that precedes any T vertex like this:
SELECT expand(in('link')) FROM T WHERE t = 2
This query does the following:
Select the vertex from T where t=2
From that vertex, traverse the incoming edge(s) of type link
expand() the vertex that the edge comes from, to get all of its information
The result is exactly what you want: the vertex with t = 1.
This should give better performance (especially if you add an index on the property t of the vertices) because you are using everything you know in advance about the relationship: the node you need has an edge to the node you select.
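To illustrate the difference in work done, here is a hedged Python sketch over an in-memory copy of the T/link example (hypothetical names, not OrientDB code). previous_by_scan() mirrors the original WHERE t < ... ORDER BY t DESC LIMIT 1 query by looking at every vertex, while previous_by_edge() mirrors in('link') by reading an incoming-edge map built once, much as OrientDB keeps each vertex's edges on the vertex record. With an index on t the real database would not need a full scan, so this only shows the shape of the cost, not OrientDB's actual execution plan.

```python
# Hypothetical model of the example: vertices T(t = 1..5) linked t -> t+1 by 'link' edges.
VERTICES = [1, 2, 3, 4, 5]
LINKS = [(1, 2), (2, 3), (3, 4), (4, 5)]

# Built once: for each vertex, the vertices with a 'link' edge pointing to it.
INCOMING = {}
for src, dst in LINKS:
    INCOMING.setdefault(dst, []).append(src)


def previous_by_scan(vertices, limit_t):
    """Like SELECT ... WHERE t < limit ORDER BY t DESC LIMIT 1: examines every vertex."""
    candidates = [t for t in vertices if t < limit_t]
    return max(candidates) if candidates else None


def previous_by_edge(t):
    """Like SELECT expand(in('link')) FROM T WHERE t = ?: one lookup, no sort."""
    return INCOMING.get(t, [])


print(previous_by_scan(VERTICES, 5))   # the last vertex before t = 5
print(previous_by_edge(5))             # the vertex with a 'link' edge into t = 5
```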
Hope that helps you out.

OrientDB select unique Vertices from multiple Edges

I have two vertex classes, User and Stamp. They are related by three edge classes: Have, WishToHave and Selling.
I wish to select the unique Stamps that have any relation with a User. To do this I was running this command:
select expand(out('Have', 'WishToHave', 'Selling')) from #12:0
The problem with this command is that it returns 'Stamp1' more than once, because Stamp1 is connected to the user by both a Have and a Selling edge.
How can I select all unique/distinct Stamps related to User1?
To initialize test data for this example:
create class User extends V
create class Stamp extends V
create class Have extends E
create class WishToHave extends E
create class Selling extends E
create vertex User set name = 'User1'
create vertex Stamp set name = 'Stamp1'
create vertex Stamp set name = 'Stamp2'
create vertex Stamp set name = 'Stamp3'
create edge Have from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge WishToHave from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp2')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp3')
I reproduced your case with your structure.
To retrieve unique vertices you could use the DISTINCT() function. I can give you two examples:
Query 1: Using EXPAND() in the target query
SELECT EXPAND(DISTINCT(#rid)) FROM (SELECT EXPAND(out('Have', 'WishToHave', 'Selling')) FROM #12:0)
Query 2: Using UNWIND in the target query
SELECT EXPAND(DISTINCT(out)) FROM (SELECT out('Have', 'WishToHave', 'Selling') FROM #12:0 UNWIND out)
Hope it helps
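The duplicates come from out() returning one entry per matching edge, which DISTINCT() then collapses. A minimal Python model of that behaviour on the test data above (hypothetical names, not the OrientDB API) might look like this:

```python
# Hypothetical in-memory copy of the test data: (edge class, from, to).
EDGES = [
    ("Have", "User1", "Stamp1"),
    ("WishToHave", "User1", "Stamp2"),
    ("Selling", "User1", "Stamp1"),
    ("Selling", "User1", "Stamp3"),
]


def out(vertex, *classes):
    """Like out('Have', 'WishToHave', 'Selling'): one entry per matching edge, duplicates included."""
    return [dst for cls, src, dst in EDGES if src == vertex and cls in classes]


def distinct(items):
    """Like DISTINCT(): keep the first occurrence of each item, preserving order."""
    return list(dict.fromkeys(items))


print(distinct(out("User1", "Have", "WishToHave", "Selling")))
```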

OrientDB graph database design: storing properties on edge vs nodes

I am using OrientDB to store information about video rentals. I represent members and movies as nodes, and whenever a member borrows a movie I add an edge between them. In the dataset, users borrow the same movie multiple times. I am also required to store the month or season in which the movie was rented (I'm still deciding which suits our needs, but that is beside the point). I was planning to store this detail on the edge, but I came across this:
http://orientdb.com/docs/2.1/Performance-Tuning-Graph.html
It recommends avoiding properties on edges. Should I change my approach? If so, what is the alternative?
Thanks in advance.
I think in your case you can create the property directly on the edge. The alternative is to store the rental data in a third node (e.g. RentalData) between Member and Movie and use PK and FK fields, but that would mimic a relational DB design and isn't necessary.
I reproduced a small DB:
create class Member extends V;
create property Member.id integer;
create property Member.name string;
create property Member.surname string;
create index Member.id unique;
create class Movie extends V;
create property Movie.id integer;
create property Movie.title string;
create property Movie.minutes integer;
create index Movie.id unique;
create class borrows extends E;
create property borrows.rentaldate Datetime;
create vertex Member set id = 1, name = "Paul", surname = "Green";
create vertex Member set id = 2, name = "John", surname = "Smith";
create vertex Member set id = 3, name = "Frank", surname = "Redding";
create vertex Movie set id = 1, title = "Interstellar", minutes = 170;
create vertex Movie set id = 2, title = "The Gladiator", minutes = 176;
create edge borrows from (select from Member where id = 1) to (select from Movie where id = 1) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 1) to (select from Movie where id = 2) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 2) to (select from Movie where id = 2) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 3) to (select from Movie where id = 1) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 3) to (select from Movie where id = 2) set rentaldate = sysdate();
I stored the "rentaldate" property directly on the "borrows" edge, which associates the member with the movie borrowed, and I think you could do the same.
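One reason a property on the edge works well here is that every rental becomes its own edge, so repeated rentals of the same movie by the same member simply produce several edges, each with its own date. The Python sketch below models that with hypothetical, hard-coded dates (the source uses sysdate()) and shows how per-month counts can be read straight off the edges; it is an illustration, not OrientDB code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical 'borrows' edges (member id -> movie id) with made-up dates.
# Each rental is its own edge, so a repeat rental is simply another edge.
BORROWS = [
    {"member": 1, "movie": 1, "rentaldate": datetime(2015, 1, 10)},
    {"member": 1, "movie": 2, "rentaldate": datetime(2015, 1, 22)},
    {"member": 1, "movie": 1, "rentaldate": datetime(2015, 3, 5)},   # repeat rental
    {"member": 2, "movie": 2, "rentaldate": datetime(2015, 3, 9)},
]


def times_borrowed(edges, member_id, movie_id):
    """How many times a member borrowed a movie: one edge per rental."""
    return sum(1 for e in edges if e["member"] == member_id and e["movie"] == movie_id)


def rentals_per_month(edges, movie_id):
    """Rentals of one movie per (year, month), read off the edge property."""
    return Counter((e["rentaldate"].year, e["rentaldate"].month)
                   for e in edges if e["movie"] == movie_id)


print(times_borrowed(BORROWS, 1, 1))
print(rentals_per_month(BORROWS, 1))
```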
From the very same link you provided:
Use the schema
Starting from OrientDB 2.0, if fields are declared in the schema, field names are not stored in document/vertex/edge themselves. This improves performance and saves a lot of space on disk.
source

Recursive query to get desired resultset

I have three tables: SuperObject, object_master and object_child.
SuperObject contains superobj_id and obj_id. object_master contains all the details about each object.
object_child has two columns, obj_id and child_id, and maps each object to its children. A child can also have children of its own, so an object can have multiple children.
SuperObject table:
superobj_id  obj_id
sobj1        obj1
sobj1        obj2
sobj1        obj3

object_child table:
obj_id   child_id
obj1     ch_obj1
obj1     ch_obj2
ch_obj1  ch_obj3
I want the result set in this format:
obj1 ch_obj1
obj1 ch_obj2
obj1 ch_obj3
obj2 ------
obj2 ------
obj3 ------
I am using the following query:
with recursive objects as (
select objectid
from object_master
where objectid in ('obj1', 'obj2', 'obj3')
union
select a.child_id
from object_child a join objects b on a.objectid = b.objectid
)
select * from objects
It returns all the children of these objects, but not in the desired format.
The trick with recursive queries is that both the seed part and the recursive part of the UNION must carry every column you need, so that you can A) perform the next lookup and B) display whatever you need when you select from the recursive CTE you've built.
So, for your requirements, we need to store the root node (the first objectid you select from your master table), and then the parent and child as we recursively select.
Also, because you want that root node to survive through every level of the recursive lookup, you need to keep passing it through in the recursive part of the union.
This will look something like:
WITH RECURSIVE objects AS (
SELECT objectid AS root, CAST(NULL AS VARCHAR(10)) AS parent, objectid AS child
FROM object_master
WHERE objectid IN ('obj1', 'obj2', 'obj3')
UNION
SELECT b.root AS root, b.child AS parent, a.child_id AS child
FROM object_child a
INNER JOIN objects b
ON a.objectid = b.child
)
SELECT root, child FROM objects
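To check the approach end to end, here is a sketch that runs the same recursive CTE in SQLite from Python, on the sample rows from the question. Assumptions: SQLite stands in for the asker's database, the child table's parent column is named objectid (as in the queries above) rather than obj_id, and the object IDs are quoted string literals. The final SELECT goes slightly beyond the answer: it drops each root's seed row and, for roots with no children, emits a single row with a NULL child (printed as ------) to match the requested format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_master (objectid TEXT PRIMARY KEY);
    CREATE TABLE object_child (objectid TEXT, child_id TEXT);
    INSERT INTO object_master VALUES ('obj1'), ('obj2'), ('obj3'),
                                     ('ch_obj1'), ('ch_obj2'), ('ch_obj3');
    INSERT INTO object_child VALUES ('obj1', 'ch_obj1'), ('obj1', 'ch_obj2'),
                                    ('ch_obj1', 'ch_obj3');
""")

QUERY = """
WITH RECURSIVE objects AS (
    -- seed: each requested object is its own root, with no parent yet
    SELECT objectid AS root, CAST(NULL AS TEXT) AS parent, objectid AS child
    FROM object_master
    WHERE objectid IN ('obj1', 'obj2', 'obj3')
    UNION
    -- recursive step: carry the root forward and go one level down
    SELECT b.root, b.child, a.child_id
    FROM object_child a
    JOIN objects b ON a.objectid = b.child
)
-- every descendant of each root ...
SELECT root, child FROM objects WHERE parent IS NOT NULL
UNION ALL
-- ... plus one NULL row for roots that have no descendants at all
SELECT o.root, NULL FROM objects o
WHERE o.parent IS NULL
  AND NOT EXISTS (SELECT 1 FROM objects d WHERE d.root = o.root AND d.parent IS NOT NULL)
ORDER BY 1, 2
"""

rows = conn.execute(QUERY).fetchall()
for root, child in rows:
    print(root, child if child is not None else "------")
```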

In an OrientDB query, how can the WHERE clause reference a column in the SELECT?

I want to write an OrientDB query that performs WHERE filtering on some fields of the SELECTed vertices.
Here is the equivalent query implemented with nested SELECTs:
SELECT FROM (SELECT EXPAND(OUT('Foo')) FROM #13:1 ) WHERE prop = 'bar'
How can I write this query with a single SELECT?
Given the following setup, where the three vertices end up with the RIDs #9:0, #9:1 and #9:2 (in creation order):
create class Foo extends E
create vertex
create vertex set prop = 'bar'
create vertex set prop = 'baz'
create edge Foo from #9:0 to #9:1
create edge Foo from #9:0 to #9:2
You can write:
select expand(out('Foo')[prop = 'bar']) from #9:0
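Conceptually, out('Foo')[prop = 'bar'] applies the condition to the collection returned by out('Foo') before expand() turns it into result rows, which is what the nested SELECT did in two steps. A small Python model of the example (hypothetical data structures, not OrientDB code):

```python
# Hypothetical in-memory copy of the example: RID -> properties, plus Foo edges.
VERTICES = {"#9:0": {}, "#9:1": {"prop": "bar"}, "#9:2": {"prop": "baz"}}
FOO_EDGES = [("#9:0", "#9:1"), ("#9:0", "#9:2")]


def out_foo(rid):
    """Like out('Foo'): the RIDs at the other end of outgoing Foo edges."""
    return [dst for src, dst in FOO_EDGES if src == rid]


def out_foo_where(rid, prop_value):
    """Like out('Foo')[prop = 'bar']: filter the collection before expanding it."""
    return [dst for dst in out_foo(rid) if VERTICES[dst].get("prop") == prop_value]


print(out_foo_where("#9:0", "bar"))
```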