OrientDB query for friends of friends in a social network

I have a question about how to write an OrientDB query.
The query should count the friend-of-friend vertices at each depth.
I am running OrientDB 2.1.6.
Schema:
Person is a Vertex, with property id (int)
Friend is an Edge
The relations I am looking for look like this:
Person (#12:0) --Friend--> Person (#12:1) --Friend--> Person (#12:2)
I have 1 million vertices and 100 million edges.
Each vertex has 100 edges to random vertices.
I want to start at vertex #12:0 and count how many vertices there are at depths 2, 3, 4 and 5.
I want to compare the query performance in milliseconds with Neo4j.
Can someone help me?
What is the fastest way to query this in OrientDB?
Sorry for my bad English.
Thank you, guys

If you want the number of vertices at depth 2, the number at depth 3, and so on, you can use this query:
select $depth, count(*) from (traverse out() from #12:0 while $depth <= 5) where $depth >= 2 group by $depth
If you want the total number of vertices at depths 2, 3, 4 and 5 combined, you can use this query:
select count(*) from (traverse out() from #12:0 while $depth <= 5) where $depth >= 2
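To make the counting semantics concrete, here is a minimal Python sketch of what the two queries are meant to compute, run over a small hypothetical adjacency list rather than OrientDB. It assumes a breadth-first walk, so each vertex is counted once at its shortest distance from the start. OrientDB's TRAVERSE also visits each record only once, but it defaults to a depth-first strategy. Under that strategy a vertex can be reported at a deeper $depth than its shortest distance. Adding STRATEGY BREADTH_FIRST to the traverse should make $depth line up with this sketch. The function name count_by_depth and the sample graph are illustrative only.

```python
from collections import deque


def count_by_depth(adj, start, max_depth):
    """Breadth-first walk: every vertex is counted once, at its shortest distance."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        d = depth_of[v]
        if d == max_depth:
            continue  # do not expand past max_depth (like "while $depth <= 5")
        for w in adj.get(v, ()):
            if w not in depth_of:  # visit each vertex only once
                depth_of[w] = d + 1
                queue.append(w)
    counts = {}
    for d in depth_of.values():
        counts[d] = counts.get(d, 0) + 1
    return counts


# Hypothetical "Friend" graph: vertex -> out-neighbours
friends = {0: [1, 2], 1: [2, 3], 2: [4], 3: [5], 4: [5, 0], 5: [6]}
counts = count_by_depth(friends, 0, 5)
per_depth = {d: counts.get(d, 0) for d in range(2, 6)}  # first query: one row per depth
total = sum(per_depth.values())                          # second query: a single total
print(per_depth, total)  # {2: 2, 3: 1, 4: 1, 5: 0} 4
```

On the real data, memory for depth_of grows with the number of reached vertices. With 100 random edges per vertex, depth 3 and beyond will likely touch most of the 1 million vertices.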

Try this: select @rid, out("Friend").size() from (traverse out() from #12:0 while $depth <= 5)
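That query returns one row per traversed vertex with its out-degree. One caveat: adding those sizes up counts Friend edges, not distinct friends. A vertex that two people both point to is counted twice. The short Python sketch below, on a hypothetical three-vertex graph, shows the difference. out_degree_sum and distinct_targets are illustrative names, not OrientDB functions.

```python
def out_degree_sum(adj, vertices):
    # Like summing out("Friend").size() over rows: counts edges, duplicates included.
    return sum(len(adj.get(v, ())) for v in vertices)


def distinct_targets(adj, vertices):
    # Distinct vertices exactly one hop away from the given set.
    return {w for v in vertices for w in adj.get(v, ())}


friends = {0: [1, 2], 1: [3, 4], 2: [3, 4]}
print(out_degree_sum(friends, [1, 2]))         # 4 edges
print(len(distinct_targets(friends, [1, 2])))  # 2 distinct vertices
```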
Regards

Related

Does a "check whether data is ordered" aggregate function exist in Postgres?

I have a simple GROUP BY query in Postgres:
SELECT teams, sum(score) AS total
FROM games
GROUP BY teams
HAVING (array_agg(score ORDER BY kw.tag_num DESC))[1] = max(score)
ORDER BY total DESC
This returns the teams with the top total scores. Additionally, it only includes teams whose final game of the season was their best. For instance, array_agg = {1,2,3,4,5} is allowed, but array_agg = {1,2,3,4,2} is omitted.
But what I really want is to filter by teams who continually improved as the season progressed. So, my above query is actually a bit of a hack.
How can I make sure that array_agg = {1,2,3,4,5} is allowed but array_agg = {1,2,3,2,5} is omitted?
Speed is of utmost importance to me. I'd rather stick with a fast "hack" than get what I really want only for it to end up being too slow.
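The condition being asked for is a pairwise one: every score must beat the score before it. The Python sketch below expresses that predicate outside the database. It assumes the scores are already in season order, which is what the ORDER BY inside array_agg is meant to provide. It stops at the first decrease, so it is a single linear pass. In Postgres the same previous-versus-current comparison is commonly written with the lag() window function. The strict flag is an assumption: "continually improved" may or may not allow ties.

```python
def continually_improved(scores, strict=True):
    """True if each score is higher than the previous one (>= when strict=False)."""
    pairs = zip(scores, scores[1:])  # (previous, current) pairs in season order
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


print(continually_improved([1, 2, 3, 4, 5]))  # True
print(continually_improved([1, 2, 3, 2, 5]))  # False: the dip at 2 fails the check
print(continually_improved([1, 2, 3, 4, 2]))  # False
```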
Thanks in advance!

OrientDB exclude certain vertices from traverse

I want to be able to exclude certain vertices from an OrientDB traverse query.
For example,
traverse * from (select from Location where ID = '1234')
will traverse all vertex classes at a starting point. I need a way to add exclusions for specific classes.
I know this could be possible if I didn't use the * operator and instead specify all of the classes I do want. However, it would not be suitable because there will be classes my program isn't even aware of. The data is ever changing but the classes to exclude will always be present.
I'm not sure I understand correctly.
I have this structure.
I want to traverse starting from node A1, excluding nodes of class B and their branches.
I used this query:
traverse * from #12:0 while @class <> "B"
Hope it helps.
UPDATE
I used this query:
select * from (traverse * from #12:0 while @class <> "B") where @class <> "E" or (@class = "E" and in.@class <> "B")
UPDATE 2
select * from (traverse * from #12:0 while @class <> "B") where @this instanceof 'V' or (@this instanceof 'E' and in.@class <> "B")
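To show what the WHILE condition does, here is a minimal Python sketch over a hypothetical vertices-only graph. A record of an excluded class is neither returned nor expanded, so everything reachable only through it is skipped. The real query uses traverse *, which also walks edge records. That is why the UPDATE filters out edges whose in vertex is of class B; this sketch leaves edges out. traverse_excluding and the sample graph are illustrative names only.

```python
def traverse_excluding(adj, cls, start, excluded):
    """Depth-first walk that neither returns nor expands records of an excluded class."""
    seen, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in seen or cls[v] in excluded:
            continue  # like WHILE @class <> "B": stop here, the branch below is not followed
        seen.add(v)
        order.append(v)
        stack.extend(reversed(adj.get(v, [])))  # keep neighbour order on the stack
    return order


adj = {"A1": ["B1", "C2"], "B1": ["C1"]}
cls = {"A1": "A", "B1": "B", "C1": "C", "C2": "C"}
print(traverse_excluding(adj, cls, "A1", {"B"}))  # ['A1', 'C2']: C1 sits under B1, so it is skipped
```

A vertex under B that is also reachable through some other non-B path is still returned, both here and in OrientDB.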
You can do it by using the difference() function:
select expand($c)
let $a = (traverse * from (select from Location where ID = '1234')),
$b = (select from <class to exclude>),
$c = difference($a, $b)
I'm not sure about the exact syntax, but something like this should work.
Bye, Ivan
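The difference() approach behaves differently from the WHILE approach. It removes only the records of the excluded class from the result, while anything reached through them is still traversed and kept. The Python sketch below models $a, $b and $c as plain sets over the same hypothetical graph used for the WHILE example. The names are illustrative, and this is not OrientDB's implementation of difference().

```python
def reachable(adj, start):
    """Every record reachable from start (like traverse * with no WHILE)."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj.get(v, []))
    return seen


adj = {"A1": ["B1", "C2"], "B1": ["C1"]}
cls = {"A1": "A", "B1": "B", "C1": "C", "C2": "C"}

a = reachable(adj, "A1")                      # $a: the full traversal
b = {v for v, c in cls.items() if c == "B"}   # $b: all records of the excluded class
c = a - b                                     # $c: difference($a, $b)
print(sorted(c))  # ['A1', 'C1', 'C2']: C1 under B1 is still included
```

If the goal is to hide only the class-B records themselves, difference() fits. If their whole branch should disappear, the WHILE condition is the closer match.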

Orientdb performance of count with traverse use

I have a database with around 300,000 users and 800,000 relationships between those users. The data can be described as:
User - Contact -> User
I want to know the number of possible new contacts that a specific user can have, so I wrote this query to get that number:
SELECT COUNT(*) FROM (TRAVERSE OUT() FROM (SELECT FROM Usuario WHERE user_id=12345) WHILE $depth <=2) WHERE $depth = 2
The query takes about 5 seconds. I have the same data in a Neo4j database, and the count for the same level takes 450 ms. Is there a way to get this number (possible new contacts) with better performance?
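It helps to pin down what "possible new contacts" should mean: contacts of contacts, minus the user and the user's direct contacts. WHERE $depth = 2 approximates this. However, with TRAVERSE's default depth-first strategy, a direct contact can first be reached through another contact at depth 2 and then be counted there. The Python sketch below computes the intended set directly on a hypothetical contact map. It is meant as a reference for checking query results, not as a replacement for the query.

```python
def possible_new_contacts(adj, user):
    """Friends of friends who are neither the user nor already a direct contact."""
    direct = set(adj.get(user, ()))
    friends_of_friends = {w for v in direct for w in adj.get(v, ())}
    return friends_of_friends - direct - {user}


contacts = {1: [2, 3], 2: [3, 4], 3: [5, 1]}
print(sorted(possible_new_contacts(contacts, 1)))  # [4, 5]
print(len(possible_new_contacts(contacts, 1)))     # 2
```

Here contact 3 is reachable both directly and through contact 2. The set difference keeps it out of the result regardless of the order in which paths are explored.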
You can get a good improvement by putting a NOTUNIQUE_HASH_INDEX on the field user_id.
EDIT 1
Another tip: try using MAXDEPTH instead of WHILE $depth <= 2.
SELECT COUNT(*) FROM (TRAVERSE out() FROM (SELECT FROM Usuario WHERE user_id = 12345) MAXDEPTH 2) WHERE $depth = 2
This makes a slight difference in calculation time. The WHILE $depth condition is also evaluated at level 3, so those records are loaded and only then skipped because they don't match the condition, which costs execution time. With MAXDEPTH, execution simply stops at level 2.
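The explanation above can be illustrated with a toy level-by-level model that counts how many records get loaded. In WHILE mode the children at level 3 are fetched and then rejected; in MAXDEPTH mode the walk never looks past level 2. This is a hypothetical model of the behaviour the answer describes, not OrientDB's actual executor. The function and graph names are illustrative.

```python
def traverse_loaded(adj, start, max_depth, use_while):
    """Returns how many records are loaded for a depth-limited traversal."""
    seen = {start}
    frontier = [start]
    loaded = 1  # the start record
    depth = 0
    while frontier:
        if depth == max_depth and not use_while:
            break  # MAXDEPTH: the next level is never touched
        next_level = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w in seen:
                    continue
                seen.add(w)
                loaded += 1  # the record is fetched before its depth is checked
                if depth + 1 <= max_depth:
                    next_level.append(w)
                # otherwise the WHILE condition fails and the loaded record is discarded
        frontier = next_level
        depth += 1
    return loaded


graph = {0: [1, 2], 1: [3], 2: [4], 3: [5], 4: [6]}
print(traverse_loaded(graph, 0, 2, use_while=False))  # 5 records loaded
print(traverse_loaded(graph, 0, 2, use_while=True))   # 7 records loaded (5 and 6 are wasted)
```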

Orient SQL - Filter result set using WHERE?

I've got a bit of a semantic question about Orient SQL queries.
Take for example this very simple graph:
v(#12:1 User) --> e(#13:1 FriendOf) --> v(#12:2 User)
In other words, a given User with an rid of #12:1 is friends with another user with an rid of #12:2.
To get the friends of user #12:1, one might express this in Orient SQL like so:
SELECT EXPAND(both("FriendOf")) FROM #12:1
This query would return a result list comprised of the User with rid #12:2.
Now let's say I want to filter that result list by an additional criterion, like a numeric value ("age"):
SELECT EXPAND(both("FriendOf")) FROM #12:1 WHERE age >= 10
The above query would filter the CURRENT vertex (#12:1), NOT the result set. Which makes sense, but is there a way to apply the filter to the EXPAND(both("FriendOf")) result rather than the current vertex? I know I can do this with gremlin like so:
SELECT EXPAND(gremlin('current.both("FriendOf").has("age",T.gte,10)')) FROM #12:1
But the above does not seem to make use of indexes (at least not when I ask it to explain). For very large data sets, this is problematic.
So is there a proper way to apply a WHERE statement to the resulting data set?
Thanks !
... is there a way to apply the filter to the EXPAND(both("FriendOf")) result rather than the current vertex?
The simple answer is to embed your basic "SELECT EXPAND ..." within another SELECT, i.e.
SELECT FROM (SELECT EXPAND(both("FriendOf")) FROM #12:1) WHERE age >= 10
By the way, on my Mac, the above took .005s compared to over 2s for the Gremlin version. There must be a moral there somewhere :-)
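The two query shapes differ in which records the WHERE clause sees. The Python sketch below models both on hypothetical user records. In the first shape the filter tests the source vertex, so all of its friends pass or fail together. In the nested shape the filter tests each expanded friend. friend_of stands in for the result of both("FriendOf"); the ages and the extra vertex #12:3 are made-up data.

```python
users = {"#12:1": {"age": 30}, "#12:2": {"age": 8}, "#12:3": {"age": 15}}
friend_of = {"#12:1": ["#12:2", "#12:3"]}  # hypothetical result of both("FriendOf")


def expand_where_on_source(rid, min_age):
    # SELECT EXPAND(both(...)) FROM rid WHERE age >= x: WHERE tests the source record
    if users[rid]["age"] >= min_age:
        return friend_of.get(rid, [])
    return []


def filter_expanded(rid, min_age):
    # SELECT FROM (SELECT EXPAND(...) FROM rid) WHERE age >= x: WHERE tests each friend
    return [f for f in friend_of.get(rid, []) if users[f]["age"] >= min_age]


print(expand_where_on_source("#12:1", 10))  # ['#12:2', '#12:3']: #12:1 is 30, so everything passes
print(filter_expanded("#12:1", 10))         # ['#12:3']
```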

orientDB traverse really slow

I have some serious performance problems using orientdb.
I have a plocal graph database with a schema like the following; the data is imported from JSON:
PersonA --hasInterest-> InterestA
PersonA --hasInterest-> InterestB
PersonB --hasInterest-> InterestA
PersonB --hasInterest-> InterestB
My goal is to find Interests that occur in combination with a given Interest. So my query looks like:
SELECT * FROM ( TRAVERSE out_hasInterest FROM ( SELECT FROM ( TRAVERSE in_hasInterest FROM #12:33 ) WHERE $depth > 0 )) WHERE $depth > 0
Where #12:33 is an Interest.
My real data is a bit bigger than this small snippet: for a given Interest there are ~500,000 associated Persons, each with an average of ~3 Interests. So I would traverse 500,000 + 500,000 * 3 = 2,000,000 vertices. That doesn't seem like that much.
The query takes ~100 seconds. This is far too much for my application.
I think I am doing something terribly wrong, I can't believe the performance is that bad.
Any help is greatly appreciated!
Best regards
Ludwig
Version: 1.7-rc1
Why are you using traverse? If I understand your goal correctly, you could do:
SELECT expand( in('hasInterest').out('hasInterest') ) FROM #12:33
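The chained in('hasInterest').out('hasInterest') is a two-hop walk. It goes from the interest to its people, then from each person to all of that person's interests. Every person also points back to the starting interest, so that interest shows up in the output, and the same co-occurring interest may appear once per person. The Python sketch below does the same two hops on hypothetical data, drops the starting interest, and tallies how often each other interest co-occurs. The tally is an assumption about what "occur in combination" should return.

```python
from collections import Counter


def co_occurring(person_interests, interest_people, interest):
    """Counts how often each other interest is shared by people who have `interest`."""
    counts = Counter()
    for person in interest_people.get(interest, ()):    # in('hasInterest')
        for other in person_interests.get(person, ()):  # .out('hasInterest')
            if other != interest:  # the starting interest would otherwise come back
                counts[other] += 1
    return counts


person_interests = {
    "PersonA": ["InterestA", "InterestB"],
    "PersonB": ["InterestA", "InterestB", "InterestC"],
    "PersonC": ["InterestC"],
}
# Reverse index, playing the role of the incoming hasInterest edges
interest_people = {}
for p, ints in person_interests.items():
    for i in ints:
        interest_people.setdefault(i, []).append(p)

result = co_occurring(person_interests, interest_people, "InterestA")
print(result.most_common())  # [('InterestB', 2), ('InterestC', 1)]
```

The work is proportional to the edges actually walked, roughly 500,000 + 1,500,000 for the numbers in the question. This sketch does no intermediate TRAVERSE bookkeeping.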