OrientDB query to receive last vertex before a given date - orientdb

Let's say I have the following list of vertices (connected by edges) in the orient database:
[t=1] --> [t=2] --> [t=3] --> [t=4] --> [t=5] --> [t=6] --> [t=7]
Each vertex has a timestamp t. I now want to receive the last vertex before a given date. Example: give me the last vertex before t=5, which is t=4.
Currently I'm using the following query to do this:
SELECT FROM ANYVERTEX WHERE t < 5 ORDER BY t DESC LIMIT 1
This works fine with up to, say, 1000 elements, but the performance of the query drops as the number of elements in the list grows. I already tried using an index, which improved the overall performance, but the query still gets slower as more elements are added.

When building queries, always try to use what you know about the relationships in your data to improve performance. In this case you don't need the sort (which is an expensive operation): you know that the vertex you need has an outgoing edge to the vertex with the given timestamp, so you can simply follow that edge in your query.
For example, let's say I have the following setup:
CREATE CLASS T EXTENDS V
CREATE VERTEX T SET t = 1
CREATE VERTEX T SET t = 2
CREATE VERTEX T SET t = 3
CREATE VERTEX T SET t = 4
CREATE VERTEX T SET t = 5
CREATE CLASS link EXTENDS E
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 1) TO (SELECT * FROM T WHERE t = 2)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 2) TO (SELECT * FROM T WHERE t = 3)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 3) TO (SELECT * FROM T WHERE t = 4)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 4) TO (SELECT * FROM T WHERE t = 5)
Then I can select the vertex before any T as such:
SELECT expand(in('link')) FROM T WHERE t = 2
This query does the following:
Select the vertex from T where t=2
From that vertex, traverse the incoming edge(s) of type link
expand() the vertex that edge comes from to get all of its information
The result is exactly what you want: the vertex with t = 1.
This should give better performance (especially if you add an index on the attribute t of the vertices) because you are using everything you know in advance about the relationship: the node you need has an edge to the node you select.
Hope that helps you out.
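To make the difference between the two approaches concrete, here is a minimal in-memory Python sketch. It is not OrientDB code; the names `timestamps`, `incoming`, `last_before_by_sort` and `last_before_by_edge` are made up for illustration. Like the query above, the edge-based lookup assumes that a vertex with exactly the given t exists.

```python
# In-memory model of the chain [t=1] --> [t=2] --> ... --> [t=7].
timestamps = [1, 2, 3, 4, 5, 6, 7]

# Incoming 'link' edges: map each vertex to the vertex that points at it.
incoming = {later: earlier for earlier, later in zip(timestamps, timestamps[1:])}


def last_before_by_sort(t):
    """Mirrors ORDER BY t DESC LIMIT 1: filter, sort, take the first row."""
    candidates = sorted((v for v in timestamps if v < t), reverse=True)
    return candidates[0] if candidates else None


def last_before_by_edge(t):
    """Mirrors expand(in('link')): find the vertex, follow its incoming edge."""
    # Returns None when t is the first vertex or is not in the chain at all.
    return incoming.get(t)


print(last_before_by_sort(5))
print(last_before_by_edge(5))
```

The sort-based version touches every earlier vertex, while the edge-based version does one lookup and one hop, which is why it should not slow down as the chain grows.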

OrientDB select unique Vertices from multiple Edges

I have 2 vertices User and Stamp. Vertices are related by three edges Have, WishToHave and Selling.
I wish to select the unique Stamps that have any relation with a User. To do so, I ran this command:
select expand(out('Have', 'WishToHave', 'Selling')) from #12:0
The problem with this command is that it returns 'Stamp1' more than once, because it has both a Have and a Selling edge.
How can I select all unique/distinct Stamps related to User1?
To init test data for this example:
create class User extends V
create class Stamp extends V
create class Have extends E
create class WishToHave extends E
create class Selling extends E
create vertex User set name = 'User1'
create vertex Stamp set name = 'Stamp1'
create vertex Stamp set name = 'Stamp2'
create vertex Stamp set name = 'Stamp3'
create edge Have from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge WishToHave from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp2')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp1')
create edge Selling from (select from User where name = 'User1') to (select from Stamp where name = 'Stamp3')
I tried your case with your structure.
To retrieve unique vertices you could use the DISTINCT() function. I can give you two examples:
Query 1: Using EXPAND() in the target query
SELECT EXPAND(DISTINCT(@rid)) FROM (SELECT EXPAND(out('Have', 'WishToHave', 'Selling')) FROM #12:0)
Query 2: Using UNWIND in the target query
SELECT EXPAND(DISTINCT(out)) FROM (SELECT out('Have', 'WishToHave', 'Selling') FROM #12:0 UNWIND out)
Hope it helps
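As a rough illustration of why the plain out() call returns duplicates and what DISTINCT() changes, here is a small Python sketch over the test data from the question. It only models the logic in memory; the helper names `out` and `distinct` are chosen for this example and are not an OrientDB API.

```python
# Edges from the question's test data: (edge class, source, target).
edges = [
    ("Have", "User1", "Stamp1"),
    ("WishToHave", "User1", "Stamp2"),
    ("Selling", "User1", "Stamp1"),
    ("Selling", "User1", "Stamp3"),
]


def out(vertex, *edge_classes):
    """Return one target per matching outgoing edge, so duplicates are possible."""
    return [dst for cls, src, dst in edges if src == vertex and cls in edge_classes]


def distinct(items):
    """Drop repeated items while keeping the first-seen order."""
    return list(dict.fromkeys(items))


related = out("User1", "Have", "WishToHave", "Selling")
unique_related = distinct(related)

print(related)
print(unique_related)
```

The duplicate appears because the traversal yields one row per edge, not per vertex; deduplicating on the record identity is what both queries above do.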

OrientDB: Find Connected Components Values during the visit

I have a schema with 3 main classes: Transaction, Address and ValueTx (an edge class).
I am trying to find connected components within a range of time.
Right now I am running this query, based on the one from "OrientDB: connected components OSQL query":
SELECT distinct(traversedElement(0)) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044))
This returns the rid of the 'head' of each traversal; by running another DFS from it, I can get every node and edge of the connected component I am interested in.
How can I, using the query above, also get the number of the transactions within the connected component and also the sum of their values? (The value of a tx is a property of the class Transaction)
I want to do something like:
SELECT distinct(traversedElement(0)) as head, count(Transaction), sum(valueTot) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) group by head
But of course this does not work: I get only one row, with the last head and the sum over all the transactions.
Thanks in advance.
Edit:
This is an example of what I'm looking for:
[Figure: Connected Transactions. Every transaction shown is within the same range of height.]
Using my query (the first one in my post) I get the rid of the first node of each group of transactions that are linked through several addresses.
example:
#15:27
#15:28
#15:30
#15:34
#15:35
#15:36
#15:37
#15:41
#15:47
#15:53
What I'm trying to get is a list of every first node together with the total number of transactions (only the transactions, not the addresses) in the group it belongs to, and the sum of the values of those transactions (stored in valueTot inside the class Transaction).
Edit2:
This is the dataset where I am making the tests:
The main problem is that I have a lot of data, and the approach I tried before (running a separate SQL query for every rid) is quite slow. I hope there is a faster way.
Edit3:
This is an updated sample db: Download
(note, it's way larger than the other)
select head, sum(valueTot) as valueTot, count(*) as numTx, sum(miner) as minerCount from (SELECT *, traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 0 and height <= 110000 ) while ( @class = 'Address' or (@class = 'Transaction' and height >= 0 and height <= 110000 )) ) where @class = 'Transaction' ) group by head
On my system this query takes around one minute, even if I limit the result set, so I think the problem may be in the inner query that selects the transactions, which isn't using the indexes. Do you have any idea?
You can use this query
select @rid, $a[0].sum as sumValueTot, $a[0].count as countTransaction from Transaction
let $a = ( select sum(valueTot), count(*) from (TRAVERSE both('ValueTx') from $parent.$current) where @class="Transaction")
where height >= 402041 and height <= 402044
Hope it helps.
Is this what you are looking for?
select head, sum(valueTot), count(*) from (SELECT *, traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) where @class = "Transaction") group by head

OrientDB SQL Check if multiple pairs of vertices are connected

I haven't been able to find an answer for the SQL for this.
Given pairs of vertices (record ids) and the edge types between them, I want to check whether all pairs exist.
V1 --E1--> V2
V3 --E2--> V4
... and so on. The answer I want is true / false or something equivalent. ALL connections must be present in order to evaluate to true, so at least one edge (of correct type) must exist for each pair.
In pseudocode, the question would be:
Does V1 have edge <E1EdgeType> to V2?
AND
Does V3 have edge <E2EdgeType> to V4?
AND
... and so on
Does anyone know what the orientDB SQL would be to achieve this?
UPDATE
I did already have one way of checking if one single edge exists between known vertices. It's perhaps not very pretty either, but it works:
SELECT FROM (
SELECT EXPAND(out('TestEdge')) FROM #12:0
) WHERE @rid=#12:1
This will return the destination record (#12:1) if an edge of type 'TestEdge' exists from #12:0 to #12:1. However, if I have two of those, how can I get one single result for both queries? Something like:
SELECT <something with $c> LET
$a = (SELECT FROM (SELECT EXPAND(out('TestEdge')) FROM #12:0) WHERE @rid=#12:1),
$b = (SELECT FROM (SELECT EXPAND(out('AnotherTestEdge')) FROM #12:2) WHERE @rid=#12:3),
$c = <something that checks that both a and b yield results>
That's what I aim towards doing. Please tell me if I'm solving this the wrong way. I'm not even sure what the gain is to merge queries like this compared to just repeat queries.
Given a pair of vertices, say #11:0 and #12:0, the following query will effectively check whether there is an edge of type E from #11:0 to #12:0
select from (select @this, out(E) from #11:0 unwind out) where out = #12:0
----+------+-----+-----
#   |@CLASS|this |out
----+------+-----+-----
0   |null  |#11:0|#12:0
----+------+-----+-----
This is highly inelegant and I would encourage you to think about formulating an enhancement request accordingly at https://github.com/orientechnologies/orientdb/issues
One way to incorporate the boolean tests you have in mind is illustrated by the following:
select from
(select $a.size() as a, $b.size() as b
let $a=(select count(*) as e from (select out(E) from #11:0 unwind out)
where out = #12:0),
$b=(select count(*) as e from (select out(E) from #11:1 unwind out)
where out = #12:2))
where a > 0 and b > 0
Yes, inelegance again :-(
The following query might be useful to you:
SELECT eval('sum($a.size(),$b.size())==2') as existing_edges
let $a = ( SELECT from TestEdge where out = #12:0 and in = #12:1 limit 1),
$b = ( SELECT from AnotherTestEdge where out = #12:2 and in = #12:3 limit 1)
Hope it helps.
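The logic that both answers express in SQL is an AND over per-pair existence checks. A minimal Python sketch of that logic, on a hypothetical in-memory edge set (the record ids are the ones from the question; `all_connected` is a name invented here, not an OrientDB function):

```python
# Hypothetical edge set: (edge class, out vertex rid, in vertex rid).
edges = {
    ("TestEdge", "#12:0", "#12:1"),
    ("AnotherTestEdge", "#12:2", "#12:3"),
}


def all_connected(required, existing):
    """True only if every required (class, out, in) triple is present."""
    return all(triple in existing for triple in required)


wanted = [
    ("TestEdge", "#12:0", "#12:1"),
    ("AnotherTestEdge", "#12:2", "#12:3"),
]
print(all_connected(wanted, edges))

# Direction matters: the reversed pair is not in the edge set.
print(all_connected(wanted + [("TestEdge", "#12:1", "#12:0")], edges))
```

Checking the edge classes directly, as the last query does with `out` and `in`, corresponds to the set-membership test here and avoids expanding every neighbour of each source vertex.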

OrientDB: Is it possible to create a vertex together with an edge in one command?

I've got three classes:
Users extends V
Links extends V
Edges extends E
I have 3 Users, that won't usually change.
I have potentially 10000's of Links, and each one is connected to at least one of the Users (usually only one) via an Edge.
Is it possible to join these two commands, which are always called in succession, into one?
link = "insert into Links set title='Link 1'"
"create edge Edges
from ( select from Users where user_id='"+user_id+"')
to ( select from " + link._rid + ")"
That is some kind of pseudocode; I'm trying this out with pyorient.
Take a look at SQL Batch.
Your command(s) might look like the following...
pyorient_client.batch("""begin
let link = create vertex Links set name = 'Link 1'
let user = select from Users where user_id = '{}' lock record
let edge = create edge Edges from $user to $link
commit
return $edge""".format(user_id)
)
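Because the batch script is assembled with string formatting, a value containing a quote would break it or change its meaning. One possible precaution, sketched below, is to validate the values before formatting. The whitelist pattern and the names `BATCH_TEMPLATE` and `build_batch` are assumptions made for this example, not part of pyorient; the script uses the `title` property from the question, and the resulting string would be passed to the client's batch call as shown above.

```python
import re

# Same batch script as above, with named placeholders.
BATCH_TEMPLATE = """begin
let link = create vertex Links set title = '{title}'
let user = select from Users where user_id = '{user_id}' lock record
let edge = create edge Edges from $user to $link
commit
return $edge"""

# Hypothetical whitelist: letters, digits, space, underscore and hyphen only.
_SAFE_VALUE = re.compile(r"^[A-Za-z0-9 _-]+$")


def build_batch(user_id, title):
    """Return the batch script, refusing values that could escape the quotes."""
    for value in (user_id, title):
        if not _SAFE_VALUE.match(value):
            raise ValueError(f"unsafe value for inline SQL: {value!r}")
    return BATCH_TEMPLATE.format(title=title, user_id=user_id)


script = build_batch("u42", "Link 1")
print(script)
```

A whitelist is deliberately strict; if your ids or titles need other characters, the pattern has to be widened with care.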

2 vertices connected two times with the same edge on lightweight mode

The OrientDB documentation says the following about lightweight edges:
two vertices are connected by maximum 1 edge, so if you already have one edge between two vertices and you're creating a new edge between the same vertices, the second edge will be regular
Looking at the following script:
drop database plocal:../databases/test-lightweight admin admin;
create database plocal:../databases/test-lightweight admin admin;
connect plocal:../databases/test-lightweight admin admin;
alter database custom useLightweightEdges=true;
// Vertices
CREATE class Driver extends V;
CREATE PROPERTY Driver.name STRING;
// Edges
CREATE class Knows extends E;
CREATE PROPERTY Knows.in LINK Driver MANDATORY=true;
CREATE PROPERTY Knows.out LINK Driver MANDATORY=true;
// DATA
CREATE VERTEX Driver SET name = 'Jochen';
CREATE VERTEX Driver SET name = 'Ronnie';
// Jochen and Ronnie are very good friends
CREATE EDGE Knows FROM (SELECT FROM Driver WHERE name = 'Jochen') to (SELECT FROM Driver WHERE name = 'Ronnie');
CREATE EDGE Knows FROM (SELECT FROM Driver WHERE name = 'Jochen') to (SELECT FROM Driver WHERE name = 'Ronnie');
SELECT expand(out()) FROM (SELECT FROM Driver WHERE name = 'Jochen'); // 2 times Ronnie
SELECT count(*) FROM Knows; // 0
I would expect the last count to return 1, but it returns 0.
When I execute the same script with lightweight mode disabled, the result is 2 (as expected).