Retrieve all paths from a node - orientdb

I'm using OrientDB Community Edition 2.1.16.
This is the graph of my data:
I'm trying to retrieve all paths for given node using:
select $path from (traverse out('E1') from #13:5)
But what I get is quite strange:
I would have expected that every path passing through second level nodes (#13:1,#13:2,#13:3) would have reached the root node (#13:0).
Something like:
(#13:5).out[0](#13:4).out[0](#13:1).out[0](#13:0)
(#13:5).out[0](#13:4).out[1](#13:2).out[0](#13:0)
(#13:5).out[0](#13:4).out[2](#13:3).out[0](#13:0)
Is that expectation correct?
If so, is it possible to get this result?
I mean a complete path from #13:5 to #13:0 passing through each of the second-level nodes.
Thanks

The result you get depends on the strategy the traverse uses. You can choose between two: DEPTH_FIRST, the default, and BREADTH_FIRST. Both strategies may be of interest to you. For more info you can look at this link.
DEPTH_FIRST strategy
This is the default strategy used by OrientDB for traversal. It explores as far as possible along each branch before backtracking. It's implemented using recursion. To learn more, look at the Depth-First algorithm. Below are the ordered steps executed while traversing the graph using the DEPTH_FIRST strategy:
[Image: Depth-first-tree]
BREADTH_FIRST strategy
It inspects all the neighboring nodes, then for each of those neighbor nodes in turn, it inspects their neighbor nodes which were unvisited, and so on. Compare BREADTH_FIRST with the equivalent, but more memory-efficient, iterative deepening DEPTH_FIRST search, and contrast it with plain DEPTH_FIRST search. To learn more, look at the Breadth-First algorithm. Below are the ordered steps executed while traversing the graph using the BREADTH_FIRST strategy:
[Image: Breadth-first-tree]
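To make the difference in visiting order concrete, here is a minimal Python sketch of both strategies. The toy tree and the function names are made up for illustration; this is not OrientDB's implementation, only the two textbook algorithms the strategies are named after.

```python
from collections import deque

# Hypothetical toy tree (not the asker's data): each key lists its children.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

def depth_first(start):
    """Follow each branch to its end before backtracking (recursive)."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        order.append(node)
        for child in GRAPH[node]:
            visit(child)

    visit(start)
    return order

def breadth_first(start):
    """Visit all neighbours of a node before moving one level deeper."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in GRAPH[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

print(depth_first("A"))    # ['A', 'B', 'D', 'E', 'C', 'F']
print(breadth_first("A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Both functions visit the same six nodes; only the order differs.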

Using your query
select $path from (traverse out('E1') from #13:5)
you get the path for every record returned by the traverse. You can verify that by adding the *:
select *,$path from (traverse out('E1') from #13:5)
This way you get all the vertices traversed, each with the path used to reach it from the starting node.
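As far as I understand, a traverse returns each record only once, which would explain why only one of the three expected paths reaches #13:0. The Python sketch below models that behaviour and contrasts it with enumerating every path. The edge list is an assumption reconstructed from the paths the question expects, and both function names are hypothetical.

```python
# Edges assumed from the expected paths in the question; RIDs are plain strings here.
OUT = {
    "#13:5": ["#13:4"],
    "#13:4": ["#13:1", "#13:2", "#13:3"],
    "#13:1": ["#13:0"],
    "#13:2": ["#13:0"],
    "#13:3": ["#13:0"],
    "#13:0": [],
}

def traverse_once(start):
    """Depth-first walk keeping one path per record; visited records are skipped."""
    seen, paths = set(), {}

    def visit(node, path):
        if node in seen:
            return
        seen.add(node)
        paths[node] = path
        for nxt in OUT[node]:
            visit(nxt, path + [nxt])

    visit(start, [start])
    return paths

def all_paths(start, goal):
    """Enumerate every simple path from start to goal."""
    found = []

    def walk(node, path):
        if node == goal:
            found.append(path)
            return
        for nxt in OUT[node]:
            if nxt not in path:  # avoid cycles
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found

print(len(traverse_once("#13:5")))       # 6 records, one path each
print(len(all_paths("#13:5", "#13:0")))  # 3 complete paths
```

In this model #13:0 is reached first through #13:1, so the branches through #13:2 and #13:3 stop one step short, which matches the output described in the question.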


Filtering a return group by traversal in ArangoDB

I'm in the process of evaluating ArangoDB to be used instead of OrientDB. My dataset is essentially a forest of not-necessarily connected trees (a family tree).
Because the dataset is a directed acyclic graph (a tree), it's always more efficient to walk up the tree looking for something than down the tree.
In earlier versions of OrientDB, before they removed this critical feature for me, I was able to do the following query:
SELECT FROM Person WHERE haircolor = "Red" and in traverse(0, -1, "in") (birth_country = "Ireland")
Since haircolor is an indexed field, it's efficient to get all of those vertices. The magic is in the traverse operator within the WHERE clause, which stops traversal and immediately returns TRUE if it locates any ancestor from Ireland.
Yes, you can turn it around and look for all those from Ireland, and then walk downward looking for those pesky redheads, returning them, but it is substantially less efficient, since you have to evaluate every downward path, which potentially expands exponentially.
Since OrientDB shot themselves in the foot (in my opinion) by taking that feature out, I'm wondering if there's an ArangoDB query that would do a similar task without walking down the tree.
Thanks in advance for your help!
In AQL, it would go something like this:
FOR redhead IN Person // Find start vertices
FILTER redhead.haircolor == "Red"
FOR v, e, p IN 1..99 INBOUND redhead Ancestor // Traversal up to depth 99
PRUNE v.birth_country == "Ireland" // Don't walk further if condition is met
RETURN p // Return the entire path
This assumes that the relations (edges) are stored in an edge collection called Ancestor.
PRUNE prevents further traversal down (or here: up) the path but includes the node that it is at.
https://www.arangodb.com/docs/stable/aql/graphs-traversals.html#pruning
Note that the variable depth traversal returns not only the longest paths but also "intermediate" paths of the same route. You may want to filter them out on the client-side or take a look at this AQL solution at the cost of additional traversals: https://stackoverflow.com/a/64931939/2044940
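The idea of walking up and stopping at the first match can be sketched outside the database as well. The Python below is only an illustration: the data, the PARENTS map and the function names are hypothetical stand-ins for the Person and Ancestor collections, and like the 1..99 traversal it does not test the start vertex itself.

```python
# Hypothetical in-memory stand-in for the Person vertices.
PEOPLE = {
    "anna":   {"haircolor": "Red",   "birth_country": "USA"},
    "brian":  {"haircolor": "Brown", "birth_country": "USA"},
    "ciara":  {"haircolor": "Red",   "birth_country": "Ireland"},
    "dmitri": {"haircolor": "Red",   "birth_country": "Russia"},
    "elena":  {"haircolor": "Black", "birth_country": "Russia"},
}
# Hypothetical stand-in for the Ancestor edges: child -> parents.
PARENTS = {
    "anna": ["brian"], "brian": ["ciara"], "ciara": [],
    "dmitri": ["elena"], "elena": [],
}

def has_ancestor(person, predicate):
    """Walk up the tree and return True as soon as one ancestor matches."""
    stack, seen = list(PARENTS[person]), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if predicate(PEOPLE[current]):
            return True  # early exit, nothing above this ancestor is visited
        stack.extend(PARENTS[current])
    return False

def redheads_with_irish_ancestor():
    return [
        name for name, doc in PEOPLE.items()
        if doc["haircolor"] == "Red"
        and has_ancestor(name, lambda a: a["birth_country"] == "Ireland")
    ]

print(redheads_with_irish_ancestor())  # ['anna']
```

Returning a boolean per start vertex also sidesteps the intermediate paths mentioned above, since no path is emitted at all.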

OSM : More nodes in a way than defined nodes

Using OverPass I am requesting all the ways and nodes in a specific area.
The documentation says : "The nodes defining the geometry of the way are enumerated in the correct order, and indicated only by reference using their unique identifier. These nodes must have been already defined separately with their coordinates."
But in the result I get, the definitions of some nodes are missing: some ways reference node IDs that I can't find among the node definitions.
Here is my OverPass QL query :
[bbox:{{bbox}}];
(
node;
<;
);
out;
Am I missing something?
Thank you.
Strictly speaking, a solution based on the < (recurse up) statement does not meet your requirements. To find out why, we take a look at the Overpass QL documentation:
The recurse up standalone query is written as a single less than symbol, "<".
It takes an input set. It produces a result set. Its result set is
composed of:
all ways that have a node which appears in the input set; plus
all relations that have a node or way which appears in the input set; plus
all relations that have a way which appears in the result set
You will notice that your query also returns many relations, although in your question you mentioned you wanted only nodes and ways in your result.
A correct query would look as follows. Instead of using <, we're explicitly telling in QL that we only want ways for a set of nodes, and again, all nodes for a set of ways - and nothing else!
(
node({{bbox}});
way(bn);
node(w);
);
out meta;
(Btw: please forget about the Overpass language guide mentioned above. It is incomplete and not maintained at the moment).
Your query doesn't request all "ways and nodes". Instead it just requests nodes and performs a "recurse up" to get ways these nodes are part of. However for these ways you will only obtain the nodes from your initial query. You will need an additional "recurse down" to query for all other nodes these ways consist of:
[bbox:{{bbox}}];
(
node;
<;
);
out body;
>;
out;
Example: https://overpass-turbo.eu/s/FGj
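Why the node definitions go missing can be shown with a small model. The Python below is only a sketch on made-up data: the node IDs, way IDs and function names are hypothetical, and recurse_up covers only the way part of "<".

```python
# Hypothetical toy data: node ids inside the bounding box, ways as ordered node-id lists.
NODES_IN_BBOX = {1, 2, 3}
WAYS = {
    "w1": [1, 2],     # entirely inside the box
    "w2": [3, 4, 5],  # crosses the box boundary
    "w3": [6, 7],     # entirely outside the box
}

def recurse_up(node_ids):
    """Ways containing at least one of the given nodes."""
    return {wid for wid, refs in WAYS.items() if node_ids & set(refs)}

def recurse_down(way_ids):
    """All nodes referenced by the given ways."""
    return {n for wid in way_ids for n in WAYS[wid]}

ways = recurse_up(NODES_IN_BBOX)
missing = recurse_down(ways) - NODES_IN_BBOX

print(sorted(ways))     # ['w1', 'w2']
print(sorted(missing))  # [4, 5]
```

Way w2 is returned because node 3 lies in the box, but nodes 4 and 5 are only defined in the output once the recurse down step adds them.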

Optimizing a Prefix Tree in OrientDB

In my project, I have a fairly large prefix tree, potentially containing millions of nodes (about 250K nodes in my development instance), managed in OrientDB (pointing to other vertices in my graph).
The nodes of the prefix tree are represented by a Token vertex type. Each Token has a 'key' property and is connected to its child vertices by a 'child' edge type. So, a sequence like "hello world" would be represented as:
root -child-> "hello" -child-> "world"
Currently, I have a NOTUNIQUE_HASH_INDEX on Token.key and I am querying the data structure like this:
SELECT EXPAND(OUT('child')[key=:k]) FROM :p
where k is the child key I am looking for and p is the RID of the parent node.
Generally, performance is pretty good, but I am looking for ideas on improving the query, the indexing, or both for this use case. In particular, queries starting at the root node, which has many children, take noticeably longer than the other, less-connected nodes.
Any suggestions? Thanks in advance!
Luigi Dell'Aquila from the OrientDB team provided an excellent answer on the OrientDB Google Group. To summarize, the following query (suggested by Luigi) dramatically improved performance.
SELECT FROM Token where key = :k AND in('Child') contains :p
I just ran a realistic test and query time was reduced by 97%! See https://groups.google.com/forum/#!topic/orient-database/mUkz6Z7hSwk for more details.
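A rough way to see why the second query helps for a parent with many children: the first form has to look at every child of the parent, while the second starts from the key index and then checks the parent. The Python below is a conceptual model only, not how OrientDB executes either query; TokenStore and its methods are hypothetical.

```python
from collections import defaultdict

class TokenStore:
    """Hypothetical in-memory model of Token vertices and 'child' edges."""

    def __init__(self):
        self.key = {}                      # rid -> key
        self.children = defaultdict(list)  # parent rid -> child rids
        self.parents = defaultdict(set)    # child rid -> parent rids
        self.by_key = defaultdict(list)    # key -> rids, stands in for the index on Token.key

    def add(self, rid, key, parent=None):
        self.key[rid] = key
        self.by_key[key].append(rid)
        if parent is not None:
            self.children[parent].append(rid)
            self.parents[rid].add(parent)

    def find_via_parent(self, parent, key):
        # Like OUT('child')[key=:k]: scans every child of the parent.
        return [c for c in self.children[parent] if self.key[c] == key]

    def find_via_index(self, parent, key):
        # Like key = :k AND in('Child') contains :p: index hit first, then a parent check.
        return [r for r in self.by_key[key] if parent in self.parents[r]]

store = TokenStore()
store.add("root", "")
for i in range(1000):                  # a root with many children
    store.add(f"t{i}", f"word{i}", "root")
store.add("h", "hello", "root")
store.add("w", "world", "h")

print(store.find_via_parent("root", "hello"))  # ['h'], after scanning 1001 children
print(store.find_via_index("root", "hello"))   # ['h'], after checking one index entry
```

The work done by find_via_index grows with the number of tokens sharing the key rather than with the fan-out of the parent, which is the case that was slow at the root.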

OrientDB: How to use traverse to get edges?

I am trying to use traverse to explore multiple orders of edges away from a specific starting node. For example, using the Grateful Dead graph, I call this command:
traverse bothE('followed_by') from #15:8 while $depth<3
I expect this to get two orders of edges. However, all the edges are ones that include the starting node. If instead I use both('followed_by') it appears to visit all the desired vertices, but it doesn't report the edges. What should I do?
On record #15:8 the incoming edges are called followed_by, while the outgoing edges are sung_by, written_by and followed_by. So you can't filter on the followed_by name and also get the other outgoing edges, even if you use both in your query.
This one should do it:
traverse bothE() from #15:8 while $depth<3

What is the best way to retrieve information in a graph through the has step

I'm using titan graph db with tinkerpop plugin. What is the best way to retrieve a vertex using has step?
Assuming employeeId is a unique attribute which has a unique vertex centric index defined.
Is it through label
i.e g.V().has(label,'employee').has('employeeId','emp123')
g.V().has('employee','employeeId','emp123')
(or)
is it better to retrieve a vertex based on Unique properties directly?
i.e g.V().has('employeeId','emp123')
Which one of the two is the quickest and better way?
First you have 2 options to create the index:
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).buildCompositeIndex()
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).indexOnly(employee).buildCompositeIndex()
For option 1 it doesn't really matter which query you're going to use. For option 2 it's mandatory to use g.V().has('employee','employeeId','emp123').
Note that g.V().hasLabel('employee').has('employeeId','emp123') will NOT select all employees first. Titan is smart enough to apply those filter conditions, that can leverage an index, first.
One more thing I want to point out is this: The whole point of indexOnly() is to allow to share properties between different types of vertices. So instead of calling the property employeeId, you could call it uuid and also use it for employers, companies, etc:
mgmt.buildIndex('employeeById', Vertex.class).addKey(uuid).indexOnly(employee).buildCompositeIndex()
mgmt.buildIndex('employerById', Vertex.class).addKey(uuid).indexOnly(employer).buildCompositeIndex()
mgmt.buildIndex('companyById', Vertex.class).addKey(uuid).indexOnly(company).buildCompositeIndex()
Your queries will then always have this pattern: g.V().has('<label>','<prop-key>','<prop-value>'). This is in fact the only way to go in DSE Graph, since we completely got rid of global indexes that span all types of vertices. At first I really didn't like this decision, but in the meantime I have come to agree that this is so much cleaner.
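As a mental model of label-constrained indexes sharing one property key, think of one lookup table per label. The Python below is a deliberately simplified sketch, not Titan's API: the names are hypothetical, and it assumes at most one vertex per label and uuid.

```python
# Hypothetical model: one dictionary per label plays the role of an indexOnly() index.
indexes = {"employee": {}, "employer": {}, "company": {}}

def add_vertex(label, uuid, **props):
    """Store a vertex and register it in the index of its own label only."""
    vertex = {"label": label, "uuid": uuid, **props}
    indexes[label][uuid] = vertex
    return vertex

def lookup(label, uuid):
    """Analogue of g.V().has(label, 'uuid', value): a direct index hit, no scan."""
    return indexes[label].get(uuid)

add_vertex("employee", "emp123", name="Ada")
add_vertex("company", "emp123", name="Acme")  # same uuid value, different label

print(lookup("employee", "emp123")["name"])  # Ada
print(lookup("company", "emp123")["name"])   # Acme
print(lookup("employer", "emp123"))          # None
```

Because every lookup names its label, the same property key can be reused across vertex types without the indexes interfering with each other.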
The second option g.V().has('employeeId','emp123') is better as long as the property employeeId has been indexed for better performance.
This is because each step in a Gremlin traversal acts as a filter. So when you say:
g.V().has(label,'employee').has('employeeId','emp123')
You first go to all the vertices with the label employee and then from the employee vertices you find emp123.
With g.V().has('employeeId','emp123') a composite index allows you to go directly to the correct vertex.
Edit:
As Daniel has pointed out in his answer, Titan is actually smart enough to not visit all employees and leverages the index immediately. So in this case it appears there is little difference between the traversals. I personally favour using direct global indices without labels (i.e. g.V().has('employeeId','emp123')), but that is just a preference when using Titan: I like to keep steps and filters to a minimum.