OrientDB select unique Vertices from multiple Edges

I have two vertex classes, User and Stamp, related by three edge classes: Have, WishToHave and Selling.
I want to select the unique Stamps that have any relation with a User. To do this, I ran the following command:
select expand(out('Have', 'WishToHave', 'Selling')) from #12:0
The problem with this command is that it returns 'Stamp1' twice, because 'Stamp1' is reached through both a Have and a Selling edge.
How can I select all unique/distinct Stamps related to User1?
To initialize the test data for this example:
create class User extends V
create class Stamp extends V
create class Have extends E
create class WishToHave extends E
create class Selling extends E
create vertex User set name = 'User1'
create vertex Stamp set name = 'Stamp1'
create vertex Stamp set name = 'Stamp2'
create vertex Stamp set name = 'Stamp3'
create edge Have from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge WishToHave from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp2')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp3')

I tried your case with your structure:
To retrieve unique vertices you could use the DISTINCT() function. I can give you two examples:
Query 1: Using EXPAND() in the target query
SELECT EXPAND(DISTINCT(@rid)) FROM (SELECT EXPAND(out('Have', 'WishToHave', 'Selling')) FROM #12:0)
Query 2: Using UNWIND in the target query
SELECT EXPAND(DISTINCT(out)) FROM (SELECT out('Have', 'WishToHave', 'Selling') FROM #12:0 UNWIND out)
Hope it helps
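To see why the plain out() call returns duplicates and what DISTINCT() removes, the same logic can be sketched outside the database. This is only an illustration in Python: the EDGES list is a hypothetical in-memory copy of the test data, and out() and distinct() are stand-ins written for this sketch, not OrientDB APIs. The order of results in OrientDB itself may differ.

```python
# Hypothetical in-memory copy of the test data: (edge class, from, to).
EDGES = [
    ("Have", "User1", "Stamp1"),
    ("WishToHave", "User1", "Stamp2"),
    ("Selling", "User1", "Stamp1"),
    ("Selling", "User1", "Stamp3"),
]


def out(vertex, *edge_classes):
    """Return every target reached from `vertex`, one entry per edge (duplicates kept)."""
    return [dst for cls, src, dst in EDGES if src == vertex and cls in edge_classes]


def distinct(items):
    """Drop repeated items while keeping first-seen order."""
    return list(dict.fromkeys(items))


reached = out("User1", "Have", "WishToHave", "Selling")
unique_stamps = distinct(reached)
print(reached)        # Stamp1 appears once per edge that reaches it
print(unique_stamps)  # each stamp appears once
```

Stamp1 shows up twice in the first list because two edges point at it; the second list is what the DISTINCT() queries above are meant to return.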

OrientDB graph database design: storing properties on edge vs nodes

I am using OrientDB to store information about video rentals. I represent members and movies as nodes, and whenever a member borrows a movie I add an edge between them. In the dataset, a user can borrow the same movie multiple times. I am also required to store the month or season in which the movie was rented (I am still deciding which suits our needs, but that is beside the point). I was planning to store this detail on the edge, but I came across this:
http://orientdb.com/docs/2.1/Performance-Tuning-Graph.html
It recommends avoiding storing properties on edges. Should I change my approach? If so, what is the alternative?
Thanks in advance.
I think in your case you can create the property directly on the edge. The alternative would be to store the rental data in a third node (e.g. RentalData) between Member and Movie and link it through PK and FK fields, but that would mimic a relational DB and is not necessary here.
I reproduced a small DB:
create class Member extends V;
create property Member.id integer;
create property Member.name string;
create property Member.surname string;
create index Member.id unique;
create class Movie extends V;
create property Movie.id integer;
create property Movie.title string;
create property Movie.minutes integer;
create index Movie.id unique;
create class borrows extends E;
create property borrows.rentaldate Datetime;
create vertex Member set id = 1, name = "Paul", surname = "Green";
create vertex Member set id = 2, name = "John", surname = "Smith";
create vertex Member set id = 3, name = "Frank", surname = "Redding";
create vertex Movie set id = 1, title = "Interstellar", minutes = 170;
create vertex Movie set id = 2, title = "The Gladiator", minutes = 176;
create edge borrows from (select from Member where id = 1) to (select from Movie where id = 1) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 1) to (select from Movie where id = 2) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 2) to (select from Movie where id = 2) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 3) to (select from Movie where id = 1) set rentaldate = sysdate();
create edge borrows from (select from Member where id = 3) to (select from Movie where id = 2) set rentaldate = sysdate();
I stored the "rentaldate" property directly on the "borrows" edge to associate the member with the borrowed movie, and I think you could do the same.
From the very same link you provided, under "Use the schema":
"Starting from OrientDB 2.0, if fields are declared in the schema, field names are not stored in document/vertex/edge themselves. This improves performance and saves a lot of space on disk."

Finding the most common vertex connected to neighbors by a certain edge and using this, and edge information, to perform calculations

I want to figure out where a person might be from based on who they follow, and get the country together with an approximate latitude and longitude. I've got two types of nodes: Accounts (containing a name and possibly lat and lng) and Countries (containing a name). I've also got two types of edges: Follows and LivesIn (containing lat and lng).
At the moment both the Account vertex and the LivesIn edge contain the lat and lng, because I'm not sure where they belong; I'm leaning towards keeping them on the edge.
Below is an example network with five users, three of whom have a known location. Now I want to make an educated guess where Alice is from:
Alice follows four users
Two of these four users are from Germany, one from Belgium and one unknown
We can assume that Alice is from Germany
The average lat and lng for the German users are (51.165691+51.115691)/2 and (10.451526+10.481526)/2
We can assume that Alice is somewhere around (51.140691; 10.466526)
CREATE CLASS Account EXTENDS V
CREATE PROPERTY Account.name string
CREATE PROPERTY Account.lat double
CREATE PROPERTY Account.lng double
CREATE CLASS Country EXTENDS V
CREATE PROPERTY Country.countryname string
CREATE CLASS LivesIn EXTENDS E
CREATE PROPERTY LivesIn.lat double
CREATE PROPERTY LivesIn.lng double
CREATE CLASS Follows EXTENDS E
CREATE VERTEX Account SET name='Alice'
CREATE VERTEX Account SET name='Bob', lat=50.503887, lng=4.469936 /* Belgium */
CREATE VERTEX Account SET name='Carol', lat=51.165691, lng=10.451526 /* Germany */
CREATE VERTEX Account SET name='Eve', lat=51.115691, lng=10.481526 /* Germany */
CREATE VERTEX Account SET name='Dave'
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Bob')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Carol')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Eve')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Dave')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Bob') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Carol') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Eve') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Dave') TO (SELECT FROM Account WHERE name='Alice')
CREATE VERTEX Country SET countryname='Belgium'
CREATE VERTEX Country SET countryname='Germany'
CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Bob') TO (SELECT FROM Country WHERE countryname='Belgium') SET lat=50.503887, lng=4.469936
CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Carol') TO (SELECT FROM Country WHERE countryname='Germany') SET lat=51.165691, lng=10.451526
CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Eve') TO (SELECT FROM Country WHERE countryname='Germany') SET lat=51.115691, lng=10.481526
My question is whether there's an efficient way to achieve this using plain SQL commands in OrientDB, or whether it needs a new function.
I got some small things figured out like getting all outgoing Follows connections:
SELECT out("Follows") FROM Account WHERE name='Alice'
But I can't really manage to get all the LivesIn edges from there.
Alternatively I can create a new function in OrientDB as they also did here. Something like:
var gdb = orient.getGraphNoTx();
// Look up the starting account by name ("name" is the function parameter).
var v = gdb.command("sql", "select from Account where name='" + name + "'");
// Iterate over the outgoing Follows edges of the first match.
var neighbours = v[0].getRecord().field("out_Follows").iterator();
var result = [];
var country_dict = {};
print('\n');
while (neighbours.hasNext()) {
  var neighbour = neighbours.next();
  // "in" is the followed account; out_LivesIn holds its LivesIn edges, if any.
  var temp = neighbour.field("in").field("out_LivesIn");
  if (temp) {
    var it = temp.iterator();
    print(it.next());
    // Count each country and keep track of sum of lat and lng so it can be divided
    // once all neighbours have been visited
  }
}
But that doesn't really use any (possibly more efficient?) built-in SQL methods, which matters because a single person can follow tens of thousands of other accounts.
Would anyone have a suggestion how I can solve this?
Try this query:
select countryname, eval('sum / _count') as average_lat, eval('sum2 / _count') as average_lng from
  (select countryname, sum(_lat), sum(_lng), count(*) as _count from
    (select outE("LivesIn").lat as _lat, outE("LivesIn").lng as _lng, out("LivesIn").countryname as countryname from
      (select expand(out("Follows")) from Account where name="Alice") unwind _lat, _lng, countryname)
  group by countryname order by _count desc limit 1)
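To make the intent of the query concrete, here is the same calculation in plain Python: group the followed accounts by country, pick the most common country, and average its coordinates. It is a sketch under assumptions, not OrientDB code. FOLLOWS and LIVES_IN are hypothetical in-memory copies of the sample data from the question, and guess_location is a name made up for this example.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample data: account -> (country, lat, lng).
LIVES_IN = {
    "Bob": ("Belgium", 50.503887, 4.469936),
    "Carol": ("Germany", 51.165691, 10.451526),
    "Eve": ("Germany", 51.115691, 10.481526),
}
FOLLOWS = {"Alice": ["Bob", "Carol", "Eve", "Dave"]}


def guess_location(name):
    """Return (country, avg_lat, avg_lng) for the most common country among followed accounts."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # country -> [count, sum_lat, sum_lng]
    for followed in FOLLOWS.get(name, []):
        if followed not in LIVES_IN:
            continue  # e.g. Dave has no LivesIn edge
        country, lat, lng = LIVES_IN[followed]
        entry = totals[country]
        entry[0] += 1
        entry[1] += lat
        entry[2] += lng
    if not totals:
        return None  # nobody with a known location is followed
    # Equivalent of "order by _count desc limit 1".
    country, (count, sum_lat, sum_lng) = max(totals.items(), key=lambda kv: kv[1][0])
    return country, sum_lat / count, sum_lng / count


print(guess_location("Alice"))
```

For Alice this picks Germany, with coordinates close to the (51.140691; 10.466526) worked out by hand in the question. Ties between countries are broken arbitrarily here, as they would be with a bare "limit 1".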

2 vertices connected two times with the same edge on lightweight mode

The OrientDB documentation says the following about lightweight edges:
two vertices are connected by maximum 1 edge, so if you already have one edge between two vertices and you're creating a new edge between the same vertices, the second edge will be regular
Looking at the following script:
drop database plocal:../databases/test-lightweight admin admin;
create database plocal:../databases/test-lightweight admin admin;
connect plocal:../databases/test-lightweight admin admin;
alter database custom useLightweightEdges=true;
// Vertices
CREATE class Driver extends V;
CREATE PROPERTY Driver.name STRING;
// Edges
CREATE class Knows extends E;
CREATE PROPERTY Knows.in LINK Driver MANDATORY=true;
CREATE PROPERTY Knows.out LINK Driver MANDATORY=true;
// DATA
CREATE VERTEX Driver SET name = 'Jochen';
CREATE VERTEX Driver SET name = 'Ronnie';
// Jochen and Ronnie are very good friends
CREATE EDGE Knows FROM (SELECT FROM Driver WHERE name = 'Jochen') to (SELECT FROM Driver WHERE name = 'Ronnie');
CREATE EDGE Knows FROM (SELECT FROM Driver WHERE name = 'Jochen') to (SELECT FROM Driver WHERE name = 'Ronnie');
SELECT expand(out()) FROM (SELECT FROM Driver WHERE name = 'Jochen'); // 2 times Ronnie
SELECT count(*) FROM Knows; // 0
I would expect the last count to return 1, but it returns 0.
When I run the same script with lightweight mode disabled, the result is 2 (as expected).

$parent and $current in OrientDB query

I read about them in the OrientDB documentation but can't get the hang of them.
It would be great if someone could explain the use of $parent and $current in detail.
In a few examples I tried, $parent.$parent.$current and $parent.$current both gave the same results, which I feel should not happen. Below are my assumptions:
$current gives access to the record/node currently being processed
$parent gives access to the parent of current record/node being processed
Your second assumption is wrong. It gives you access to the variables of the parent query (useful when calling traverse in a sub-query as stated here).
An example:
create class User extends V
create class Follows extends E
create vertex User set name = 'u1'
create vertex User set name = 'u2'
create vertex User set name = 'u3'
create edge Follows from (select from User where name = 'u1') to (select from User where name = 'u2')
create edge Follows from (select from User where name = 'u2') to (select from User where name = 'u3')
select $all from User
let $all = ( traverse out('Follows') from $parent.$current)
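One way to picture this: the LET sub-query is evaluated once per record of the outer query, and $parent.$current is how the sub-query refers to that outer record. The Python below is only an analogy under that reading. FOLLOWS is a hypothetical in-memory copy of the example graph, and traverse_out is a helper written for this sketch, not an OrientDB function.

```python
# Hypothetical in-memory copy of the example graph: user -> users they follow.
FOLLOWS = {"u1": ["u2"], "u2": ["u3"], "u3": []}


def traverse_out(start):
    """Depth-first walk along Follows edges; the start record itself is included."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(FOLLOWS[node]))
    return order


# The outer loop plays the role of "select ... from User"; `current` stands in
# for $parent.$current inside the LET sub-query.
result = {current: traverse_out(current) for current in FOLLOWS}
print(result)
```

Each user gets its own traversal result, which is the point of referring to the outer record from inside the sub-query.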

Fetching same vertex data for multiple levels in orientdb

I am a beginner in OrientDB.
Consider that I have two vertex classes, Cat and Val.
Cat contains a property called category and Val contains a property called value.
Categories can have sub-categories and those can further have sub-categories and so on. The categories and sub-categories are stored in Vertex Cat. The sub-categories are mapped using an edge called CatEdge whose from and to are the same vertex i.e. Cat.
For example, consider a category 'Education' which has two sub-categories 'School' and 'College'. The 'College' sub-category has further sub-categories 'Bachelors' and 'Masters'. So, there will be an edge in CatEdge from 'Education' to 'School' and 'College', and from 'College' to 'Bachelors' and 'Masters'.
Education
|- School
|- College
   |- Bachelors
   |- Masters
Apart from these, the Cat Vertex can have categories that do not have any sub-categories, for example 'FirstName', 'LastName', etc.
All the 'leaf' categories (that do not have further sub-category) have an edge called ValEdge from Vertex Cat to Vertex Val.
I want to retrieve all 'value' from Val for all the categories and sub-categories.
What I have done:
First, I fired the following query to retrieve all categories that do not have sub-categories and which are not a sub-category of other category:
select from Cat where @rid not in (select @rid, expand(both('CatEdge')) from Cat)
Then, programmatically, I loop through all the categories fetched and find their corresponding values:
select expand(out('ValEdge')) from Cat where category = 'FirstName'
Second, I fetch all the categories that have sub-categories or are themselves sub-categories using:
select from (traverse out('CatEdge') from Cat) where out('CatEdge').size() > 0
And store it in a list called SubList.
The above query will give me 'Education' and 'College'.
Using this list, for each item, I check if there exists its sub-category using:
select expand(out('CatEdge')) from Cat where category = 'Education'
The above query will give 'School' and 'College'. Then, programmatically, I check if 'School' and 'College' exists in SubList.
If it exists, I first remove it from SubList and fire the above query again, and this continues until I get zero rows.
If it does not exist in SubList, then it is a 'leaf' category and I look up its value in the Val vertex.
As you may have noticed, this is getting too complex. Is there any other way that I can achieve the same?
If this is your situation:
create class Cat extends V
create property Cat.category string
create class CatEdge extends E
create class Val extends V
create property Val.value integer
create class ValEdge extends E
create vertex Cat set category = 'Education'
create vertex Cat set category = 'School'
create vertex Cat set category = 'College'
create vertex Cat set category = 'Bachelors'
create vertex Cat set category = 'Masters'
create vertex Val set value = 1
create vertex Val set value = 2
create vertex Val set value = 3
create edge CatEdge from (select from Cat where category = 'Education') to (select from Cat where category = 'School')
create edge CatEdge from (select from Cat where category = 'Education') to (select from Cat where category = 'College')
create edge CatEdge from (select from Cat where category = 'College') to (select from Cat where category = 'Bachelors')
create edge CatEdge from (select from Cat where category = 'College') to (select from Cat where category = 'Masters')
create edge ValEdge from (select from Cat where category = 'School') to (select from Val where value = 1)
create edge ValEdge from (select from Cat where category = 'Bachelors') to (select from Val where value = 2)
create edge ValEdge from (select from Cat where category = 'Masters') to (select from Val where value = 3)
And if I understood your intention correctly, this query will work:
select in("ValEdge").category, value from Val
UPDATE
select category, $subcategories, $value from Cat
let
$subcategories = ( select category from (traverse out('CatEdge') from $parent.$current ) where $depth >=1 ),
$value = ( select out('ValEdge').value as value from $current )
returns this JSON.
Note that for every category you get either a list of its subcategories or, if it is a leaf, its value.
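The shape of that result can be sketched in plain Python, which may help when checking the query against your own data. This is an illustration only: CAT_EDGE, VAL_EDGE and CATEGORIES are hypothetical in-memory copies of the sample data above, and the two functions are stand-ins for the $subcategories and $value sub-queries. In OrientDB the value column would typically come back as a collection rather than a bare number.

```python
# Hypothetical in-memory copy of the sample data.
CAT_EDGE = {
    "Education": ["School", "College"],
    "College": ["Bachelors", "Masters"],
}
VAL_EDGE = {"School": 1, "Bachelors": 2, "Masters": 3}
CATEGORIES = ["Education", "School", "College", "Bachelors", "Masters"]


def subcategories(category):
    """All descendants of `category` (depth >= 1), depth-first."""
    found = []
    for child in CAT_EDGE.get(category, []):
        found.append(child)
        found.extend(subcategories(child))
    return found


def describe(category):
    """Mirror the three projected columns: category, $subcategories, $value."""
    return {
        "category": category,
        "subcategories": subcategories(category),
        "value": VAL_EDGE.get(category),  # None for non-leaf categories
    }


rows = [describe(c) for c in CATEGORIES]
for row in rows:
    print(row)
```

Non-leaf categories such as Education carry a list of descendants and no value, while leaves such as School carry a value and an empty list.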