Gremlin: fetch result as an array - OrientDB

I'm trying to select entities as described in the Gremlin docs for select:
gremlin> g.v(1).as('x').out('knows').as('y').select
==>[x:v[1], y:v[2]]
==>[x:v[1], y:v[4]]
But I want to get the result like this:
gremlin> g.v(1).as('x').out('knows').as('y').select
==>[[x:v[1]], [y:v[2],y:v[4]]]
In my current scenario a single 'x' entity has more than 500 associated 'y' entities, so I end up getting the same 'x' data repeated for every 'y' entity:
gremlin> g.v(1).as('x').out('knows').as('y').select
==>[x:v[1], y:v[2]]
==>[x:v[1], y:v[4]]
==>.....
==>[x:v[1], y:v[500]]
Could someone show me how to do this?

You could use groupBy():
g.v(1).groupBy{it}{it.out('knows')}.cap()
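The reshaping that groupBy performs here (one key per 'x' vertex, with all of its 'y' vertices collected under it) can be sketched in plain Java, independent of Gremlin. This is only an illustration of the result shape, assuming the repeated rows are available as hypothetical (x, y) pairs of vertex ids; it is not how Gremlin implements groupBy.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupRows {
    // Collapse repeated [x, y] rows into one entry per x that holds all of its y values.
    static Map<String, List<String>> groupByX(List<Map.Entry<String, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,            // key: the 'x' vertex
                LinkedHashMap::new,           // keep the first-seen order of x
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Hypothetical rows, shaped like the select() output in the question.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<String, String>("v[1]", "v[2]"));
        rows.add(new SimpleEntry<String, String>("v[1]", "v[4]"));
        System.out.println(groupByX(rows)); // {v[1]=[v[2], v[4]]}
    }
}
```

With 500 'y' vertices the 'x' vertex then appears once as a key instead of once per row.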

Finding vertices with exact edge matches traversing through children

My setup:
I'm using OrientDB with a large graph of Person vertices. I'm using the Gremlin Java driver to access this database, since I would like to eventually switch to a different graph database down the line.
My database:
Each person is connected to certain preference vertices via labeled edges that describe the person's relation to each preference. Each preference is in turn connected to a core concept vertex.
Problem I'm trying to solve:
I'm trying to find a way (kudos if it's as simple as a Gremlin query) to start at a Person vertex and traverse to all people with identical preferences via the core concepts.
Here is a made-up example of a matching case. Person B will be returned in the list of perfect matches when starting at Person A. I forgot to draw the edge directions in this picture :/ Take a look at the non-matching case to see the directions.
Here is an example of a non-matching case. Person B will not be returned in the list of perfect matches. Why? Because the outgoing edges of Person A and Person B do not resolve to identically matching edges: in this case, Person A refuses to eat apples, but Person B doesn't list a corresponding preference for anything they refuse to eat.
Another non-matching case based on the example above: if Person A refuses to eat apples and Person B refuses to eat bananas, they will not match.
If Person B likes Fries the most and Cheeseburgers the least, that would be a non-matching case as well.
My initial idea for a query (which I'm not sure how to implement):
I would start at Person A.
Find all outgoing edges to preference vertices and store some sort of "marker" or map on each preference vertex with the edge label.
Traverse out of those vertices along all SimilarTo labeled edges. Copy the markers from each preference vertex into the concept vertex.
Reverse the direction: concept vertex -> preference vertex (copy the markers from the concept vertex to the preference vertex).
... then somehow match ALL edges against those markers...
Exclude Person A from the results.
Any ideas?
Let's start with the creation of your sample graph:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("person").property("name", "Person A").as("pa").
......1> addV("person").property("name", "Person B").as("pb").
......2> addV("food").property("name", "Hamburgers").as("hb").
......3> addV("food").property("name", "Chips").as("c").
......4> addV("food").property("name", "Cheeseburgers").as("cb").
......5> addV("food").property("name", "Fries").as("f").
......6> addV("category").property("name", "Burgers").as("b").
......7> addV("category").property("name", "Appetizers").as("a").
......8> addE("most").from("pa").to("hb").
......9> addE("most").from("pb").to("cb").
.....10> addE("least").from("pa").to("c").
.....11> addE("least").from("pb").to("f").
.....12> addE("similar").from("hb").to("b").
.....13> addE("similar").from("cb").to("b").
.....14> addE("similar").from("c").to("a").
.....15> addE("similar").from("f").to("a").iterate()
The query you're looking for is the following (I'll explain each step afterwards):
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> outE("most","least","refuses").as("e").inV().out("similar").
......2> store("x").by(constant(1)).
......3> in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4> groupCount().as("m").
......5> select("x").by(count(local)).as("c").
......6> select("m").unfold().
......7> where(select(values).as("c")).select(keys).values("name")
==>Person B
Now, when we add the "refuses to eat Apples" relation:
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> addV("food").property("name", "Apples").as("a").
......2> addV("category").property("name", "Fruits").as("f").
......3> addE("refuses").from("p").to("a").
......4> addE("similar").from("a").to("f").iterate()
...Person B is no longer a match:
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> outE("most","least","refuses").as("e").inV().out("similar").
......2> store("x").by(constant(1)).
......3> in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4> groupCount().as("m").
......5> select("x").by(count(local)).as("c").
......6> select("m").unfold().
......7> where(select(values).as("c")).select(keys).values("name")
gremlin>
Let's go through the query line by line:
g.V().has("person", "name", "Person A").as("p").
This should be pretty clear: start at Person A.
outE("most","least","refuses").as("e").inV().out("similar").
Traverse the out edges and set a marker, so that we can reference the edges later. Then traverse to what I called category vertices.
store("x").by(constant(1)).
For every category vertex add a 1 to an internal collection. You could also store the vertex itself, but this would be a waste of memory, since we won't need any information from the vertices.
in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
Traverse back along the similar edges to the food vertices, and then along those incoming edges that have the same label as the edge marked at the beginning. Finally, ignore the person where the traversal started (Person A).
groupCount().as("m").
Count the number of traversers that made it to each person vertex.
select("x").by(count(local)).as("c").
Count the number of category vertices (the 1s).
select("m").unfold().
Unfold the person counters, so the keys will be the person vertices and the values will be the number of traversers that made it to this vertex.
where(select(values).as("c")).select(keys).values("name")
Finally, the number of category vertices visited must equal the number of traversers that reached a person vertex. If that's the case, we have a match.
Note that the Apples vertex must have an incident similar edge: without one, the refuses edge never reaches a category vertex, so it isn't counted and Person B would still match.
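To make the counting rule concrete, here is a plain-Java sketch (no TinkerPop dependency) that walks the same sample graph with nested loops. Each loop is commented with the Gremlin step it imitates. The `Edge` class is a hypothetical minimal graph model, and the code mirrors the query's logic only, not how TinkerGraph actually executes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PreferenceMatch {
    // A directed, labeled edge: from --label--> to (hypothetical minimal graph model).
    static final class Edge {
        final String from, label, to;
        Edge(String from, String label, String to) {
            this.from = from; this.label = label; this.to = to;
        }
    }

    static final Set<String> PREFERENCE_LABELS =
            new HashSet<>(Arrays.asList("most", "least", "refuses"));

    static List<String> perfectMatches(List<Edge> edges, String start) {
        int categoriesVisited = 0;                          // like store("x").by(constant(1))
        Map<String, Integer> traversers = new HashMap<>();  // like groupCount()
        for (Edge pref : edges) {                           // outE(...).as("e")
            if (!pref.from.equals(start) || !PREFERENCE_LABELS.contains(pref.label)) continue;
            for (Edge sim : edges) {                        // inV().out("similar")
                if (!sim.label.equals("similar") || !sim.from.equals(pref.to)) continue;
                categoriesVisited++;
                for (Edge back : edges) {                   // in("similar")
                    if (!back.label.equals("similar") || !back.to.equals(sim.to)) continue;
                    for (Edge other : edges) {              // inE().where(eq("e")).by(label).outV()
                        if (other.to.equals(back.from) && other.label.equals(pref.label)
                                && !other.from.equals(start)) {   // where(neq("p"))
                            traversers.merge(other.from, 1, Integer::sum);
                        }
                    }
                }
            }
        }
        List<String> matches = new ArrayList<>();           // where(select(values).as("c"))
        for (Map.Entry<String, Integer> entry : traversers.entrySet()) {
            if (entry.getValue() == categoriesVisited) matches.add(entry.getKey());
        }
        Collections.sort(matches);
        return matches;
    }

    // The sample graph from the answer, as a mutable list of edges.
    static List<Edge> sampleGraph() {
        return new ArrayList<>(Arrays.asList(
                new Edge("Person A", "most", "Hamburgers"),
                new Edge("Person B", "most", "Cheeseburgers"),
                new Edge("Person A", "least", "Chips"),
                new Edge("Person B", "least", "Fries"),
                new Edge("Hamburgers", "similar", "Burgers"),
                new Edge("Cheeseburgers", "similar", "Burgers"),
                new Edge("Chips", "similar", "Appetizers"),
                new Edge("Fries", "similar", "Appetizers")));
    }

    public static void main(String[] args) {
        List<Edge> g = sampleGraph();
        System.out.println(perfectMatches(g, "Person A"));   // [Person B]
        g.add(new Edge("Person A", "refuses", "Apples"));
        g.add(new Edge("Apples", "similar", "Fruits"));
        System.out.println(perfectMatches(g, "Person A"));   // []
    }
}
```

Adding only the refuses edge, without the Apples-to-Fruits similar edge, leaves Person B as a match. This is the same effect the note about the Apples vertex describes.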

Why do I get nulls in my query result?

I have this model
Note -> Keyword
where one note has multiple keywords that describe it. I have this vertex:
As you can see in "Out Edges", it has 3 Noticia_keys.
If you open it in the graph view, you get this:
All OK. But if I run this query:
select #rid as rid, out(Noticia_keys).name as claves from #12:2
I get this output:
Where does that null come from?
New data:
Since I cleared the DB, I have new records. This is a trace of one of them, and the problem remains.
Both queries suggested by Alessandro return nothing.
Michela: the vertices are added through the ODBOGM library, which translates objects to vertices. It uses the binary API with addVertex and addEdge.
Well, I finally found the error!
What was wrong:
In the query
select #rid as rid, out(Noticia_keys).name as claves from #12:2
the argument to out() has no quotes. The query works fine if you type:
select #rid as rid, out("Noticia_keys").name as claves from #12:2
I found the error just by running this query,
which showed me, in the "claves" column, the rids of other vertices such as Medios (#16:) and Fuentes (#17:). So out() was also following edges other than Noticia_keys. The real problem is that the query doesn't fail when a property isn't found on a vertex: since I requested "name", the Keyword vertices responded correctly, but the other out() vertices returned null.
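The null mechanism is easy to reproduce outside OrientDB. The following plain-Java sketch models the note's outgoing edges as hypothetical (label, target properties) pairs and projects "name" over them. A null label stands in for the unquoted case, where out() also reached non-Keyword vertices. The label "otherEdge" and the keyword names are invented for illustration. This is an analogy, not OrientDB's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NullNames {
    // Hypothetical outgoing edge of the note vertex: edge label plus the target's properties.
    static final class OutEdge {
        final String label;
        final Map<String, String> target;
        OutEdge(String label, Map<String, String> target) { this.label = label; this.target = target; }
    }

    // Collect target.name for outgoing edges; a null label means "follow every outgoing edge".
    static List<String> outNames(List<OutEdge> outs, String label) {
        List<String> names = new ArrayList<>();
        for (OutEdge e : outs) {
            if (label == null || label.equals(e.label)) {
                names.add(e.target.get("name")); // Map.get yields null when "name" is absent
            }
        }
        return names;
    }

    static Map<String, String> props(String key, String value) {
        Map<String, String> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    static List<OutEdge> sample() {
        return Arrays.asList(
                new OutEdge("Noticia_keys", props("name", "k1")),
                new OutEdge("Noticia_keys", props("name", "k2")),
                new OutEdge("Noticia_keys", props("name", "k3")),
                new OutEdge("otherEdge", props("url", "hypothetical"))); // no "name" property
    }

    public static void main(String[] args) {
        System.out.println(outNames(sample(), "Noticia_keys")); // [k1, k2, k3]
        System.out.println(outNames(sample(), null));           // [k1, k2, k3, null]
    }
}
```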
Thanks for your time!

Can I qualify on an OrientDB vertex/edge class in a Gremlin query?

OrientDB has a seemingly non-standard feature that lets you create specific classes of vertices and edges:
g.createVertex('class:person')
but it's unclear to me whether and how I can qualify on that class via 'standard' Gremlin.
I have seen a reference to syntax like this:
g.V('#class','person')...
here, but there was also mention of this syntax bypassing indices.
Can anyone shed light on this topic?
Gremlin doesn't seem to adopt the schema feature, and not all graph databases support schemas, so I don't think you can manipulate the OrientDB schema directly with Gremlin.
Anyway, you can use the createVertexType() method to create classes inside OrientDB through Gremlin.
Connect to the OrientDB database:
g = new OrientGraphNoTx('remote:localhost/GremlinDB')
==>orientgraphnotx[remote:localhost/GremlinDB]
Create the Vertex class Person that extends V:
g.createVertexType('Person','V')
==>Person
Now, if you look at the Schema in OrientDB Studio, you'll see the new class created:
EDITED
After adding two vertices, we can find the person with name = 'John':
Using has():
g.V.has('#class','Person').has('name','John')
==>v(Person)[#12:0]
Using has() + T operator:
g.V.has('#class','Person').has('name',T.eq,'John')
==>v(Person)[#12:0]
Using contains():
g.V.has('#class','Person').filter{it.name.contains('John')}
==>v(Person)[#12:0]
Using ==:
g.V.has('#class','Person').filter{it.name == 'John'}
==>v(Person)[#12:0]
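The four filters above differ in one respect that is easy to miss. `contains('John')` is a substring test, so it would also accept a name such as 'Johnny'. In contrast, `has('name','John')` and `==` require an exact match. The following plain-Java sketch shows the difference. It uses hypothetical in-memory vertices modeled as property maps that carry a '#class' key, and it is not the Blueprints/OrientDB API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ClassFilter {
    // Hypothetical vertices: each is a property map that includes the "#class" key.
    static List<Map<String, String>> vertices() {
        List<Map<String, String>> vs = new ArrayList<>();
        vs.add(vertex("Person", "John"));
        vs.add(vertex("Person", "Johnny"));
        vs.add(vertex("Animal", "John"));
        return vs;
    }

    static Map<String, String> vertex(String cls, String name) {
        Map<String, String> v = new HashMap<>();
        v.put("#class", cls);
        v.put("name", name);
        return v;
    }

    // Keep vertices of the given class (like has('#class', cls)) whose name passes nameTest.
    static List<String> names(List<Map<String, String>> vs, String cls, Predicate<String> nameTest) {
        return vs.stream()
                .filter(v -> cls.equals(v.get("#class")))
                .map(v -> v.get("name"))
                .filter(nameTest)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> vs = vertices();
        System.out.println(names(vs, "Person", n -> n.equals("John")));   // [John]
        System.out.println(names(vs, "Person", n -> n.contains("John"))); // [John, Johnny]
    }
}
```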
Hope it helps

How to speed up "global" queries in Titan DB?

We are using Titan with Persistit as the backend, for a graph with about 100,000 vertices. Our use case is quite complex, but the current problem can be illustrated with a simple example. Let's assume that we are storing Books and Authors in the graph. Each Book vertex has an ISBN number, which is unique across the whole graph.
I need to answer the following query:
Give me the set of ISBN numbers of all Books in the Graph.
Currently, we are doing it like this:
// retrieve graph instance
TitanGraph graph = getGraph();
// Start a Gremlin query (I omit the generics for brevity here)
GremlinPipeline gremlin = new GremlinPipeline().start(graph);
// get all vertices in the graph which represent books (we have author vertices, too!)
gremlin.V("type", "BOOK");
// the ISBN numbers are unique, so we use a Set here
Set<String> isbnNumbers = new HashSet<String>();
// iterate over the gremlin result and retrieve the vertex property
while(gremlin.hasNext()){
Vertex v = (Vertex) gremlin.next(); // cast needed because the pipeline is used without generics
isbnNumbers.add(v.getProperty("ISBN"));
}
return isbnNumbers;
My question is: is there a smarter, faster way to do this? I am new to Gremlin, so I may very well be doing something horribly stupid here. The query currently takes 2.5 seconds, which is not too bad, but I would like to speed it up if possible. Please consider the backend fixed.
I doubt that there is a much faster way (you will always need to iterate over all book vertices); however, a less verbose solution is possible with Groovy/Gremlin.
On the sample graph you can run e.g. the following query:
gremlin> namesOfJavaProjs = []; g.V('lang','java').name.store(namesOfJavaProjs)
gremlin> namesOfJavaProjs
==>lop
==>ripple
Or for your book graph:
isbnNumbers = []; g.V('type','BOOK').ISBN.store(isbnNumbers)

Does OrientDB support the group by operation?

I think the answer is no, but I still want to confirm.
My use case is as follows:
1) Get the subgraph between source and destination.
2) Then, for each node in the subgraph obtained in step 1, do some aggregation.
Something like:
select nodeId, sum(mycount) where prop ="value" and prop2 ="value2" and nodeId in [x,y,z ] group by nodeId
Can you do this in OrientDB? If yes, how efficiently?
Group by is supported and it's fast. Have you tried it?
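As a reference point for what the GROUP BY in the question computes, here is a plain-Java sketch of the same aggregation evaluated row by row. It filters on prop, prop2 and the nodeId set, then sums mycount per nodeId. The `Row` class and sample values are hypothetical. This shows the semantics only, not how OrientDB executes the query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupBySum {
    // Hypothetical row shape mirroring the columns used in the question's query.
    static final class Row {
        final String nodeId, prop, prop2;
        final int mycount;
        Row(String nodeId, String prop, String prop2, int mycount) {
            this.nodeId = nodeId; this.prop = prop; this.prop2 = prop2; this.mycount = mycount;
        }
    }

    // Same semantics as: select nodeId, sum(mycount) ... where prop = p and prop2 = p2
    //                    and nodeId in ids group by nodeId
    static Map<String, Integer> sumByNode(List<Row> rows, String p, String p2, Set<String> ids) {
        return rows.stream()
                .filter(r -> r.prop.equals(p) && r.prop2.equals(p2) && ids.contains(r.nodeId))
                .collect(Collectors.groupingBy(r -> r.nodeId, TreeMap::new,
                        Collectors.summingInt(r -> r.mycount)));
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("x", "value", "value2", 2),
                new Row("x", "value", "value2", 3),
                new Row("y", "value", "value2", 4),
                new Row("y", "other", "value2", 10),  // filtered out by prop
                new Row("z", "value", "value2", 1));  // filtered out by the nodeId set
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList("x", "y"));
        System.out.println(sumByNode(sampleRows(), "value", "value2", ids)); // {x=5, y=4}
    }
}
```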