Gremlin - how do you merge vertices to combine their properties without listing the properties explicitly? - titan

Background: I'm trying to implement a time-series versioned DB using this approach, with Gremlin (TinkerPop v3).
I want to get the latest state node (in red) for a given identity node (in blue), linked by a 'state' edge that contains a timestamp range. I want to return a single aggregated object that contains the id (cid) from the identity node and all the properties of the state node, without having to list those properties explicitly.
(8640000000000000 is my way of indicating that there is no 'to' date, i.e. the edge is current; this differs slightly from the image shown.)
I've got this far:
:> g.V().hasLabel('product').
as('cid').
outE('state').
has('to', 8640000000000000).
inV().
as('name').
as('price').
select('cid', 'name','price').
by('cid').
by('name').
by('price')
=>{cid=1, name="Cheese", price=2.50}
=>{cid=2, name="Ham", price=5.00}
But as you can see, I have to list the properties of the 'state' node explicitly (in the example above, the name and price properties of a product). This will apply to any domain object, so I don't want to list the properties every time. I could run a query beforehand to get the properties, but I don't think I should need two queries and the overhead of two round trips. I've looked at 'aggregate', 'union', 'fold' etc., but nothing seems to do this.
Any ideas?
===================
Edit:
Based on Daniel's answer (which doesn't quite do what I want at the moment), I'm going to use his example graph. In the 'modern' graph, people create software (person-created->software). If I run:
> g.V().hasLabel('person').valueMap()
==>[name:[marko], age:[29]]
==>[name:[vadas], age:[27]]
==>[name:[josh], age:[32]]
==>[name:[peter], age:[35]]
then the results are a list of entities with their properties. What I want, on the assumption that a person can only ever create one piece of software (although hopefully we will see later how this could be opened up to lists of created software), is to include the created software's 'lang' property in the returned entity, to get:
> <run some query here>
==>[name:[marko], age:[29], lang:[java]]
==>[name:[vadas], age:[27], lang:[java]]
==>[name:[josh], age:[32], lang:[java]]
==>[name:[peter], age:[35], lang:[java]]
The best suggestion so far produces the following:
> g.V().hasLabel('person').union(identity(), out("created")).valueMap().unfold().group().by {it.getKey()}.by {it.getValue()}
==>[name:[marko, lop, lop, lop, vadas, josh, ripple, peter], lang:[java, java, java, java], age:[29, 27, 32, 35]]
I hope that's clearer. If not please let me know.

Since you didn't provide a sample graph, I'll use TinkerPop's toy graph to show how it's done.
Assume you want to merge marko and lop:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).valueMap()
==>[name:[marko],age:[29]]
gremlin> g.V(1).out("created").valueMap()
==>[name:[lop],lang:[java]]
Note that both vertices have a name property, so in theory you can't predict which name makes it into your merged result; however, that doesn't seem to be an issue in your graph.
Get the properties for both vertices:
gremlin> g.V(1).union(identity(), out("created")).valueMap()
==>[name:[marko],age:[29]]
==>[name:[lop],lang:[java]]
Merge them:
gremlin> g.V(1).union(identity(), out("created")).valueMap().
unfold().group().by(select(keys)).by(select(values))
==>[name:[lop],lang:[java],age:[29]]
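For readers who want the merge semantics without a graph, here is a plain-Java sketch, not TinkerPop code: each Map stands in for one valueMap() result, and PropertyMerge is a hypothetical helper name. On a shared key such as name, the later map wins, which is one way the name clash noted above can resolve; Gremlin's group() is not guaranteed to pick the same one or to order the keys the same way.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no TinkerPop): each Map stands in for one valueMap() result.
final class PropertyMerge {

    // Merge all property maps into one without naming any keys.
    // If two maps share a key (e.g. "name"), the later map's value wins.
    static Map<String, List<Object>> merge(List<Map<String, List<Object>>> valueMaps) {
        Map<String, List<Object>> merged = new LinkedHashMap<>();
        for (Map<String, List<Object>> properties : valueMaps) {
            merged.putAll(properties);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> marko = new LinkedHashMap<>();
        marko.put("name", List.of("marko"));
        marko.put("age", List.of(29));
        Map<String, List<Object>> lop = new LinkedHashMap<>();
        lop.put("name", List.of("lop"));
        lop.put("lang", List.of("java"));
        // prints {name=[lop], age=[29], lang=[java]}
        System.out.println(merge(List.of(marko, lop)));
    }
}
```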
UPDATE
Thank you for the added sample output. That makes it a lot easier to come up with a solution (although I think your output contains errors; vadas didn't create anything).
gremlin> g.V().hasLabel("person").
filter(outE("created")).map(
union(valueMap(),
outE("created").limit(1).inV().valueMap("lang")).
unfold().group().by {it.getKey()}.by {it.getValue()})
==>[name:[marko], lang:[java], age:[29]]
==>[name:[josh], lang:[java], age:[32]]
==>[name:[peter], lang:[java], age:[35]]

Merging edge and vertex properties using the Gremlin Java API:
g.V().has("User", "id", userDbId).outE(Edges.TWEETS)
.union(__.identity().valueMap(), __.inV().valueMap())
.unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
.map(v -> converter.toTweet((Map) v.get())).toList();

Thanks to the answers by Daniel Kuppitz and youhans, which gave me the basic idea of the solution. However, I later found that the solution doesn't work for multiple rows: a local() step is required to handle them. The modified Gremlin query looks like this:
g.V()
.local(
__.union(__.valueMap(), __.outE().inV().valueMap())
.unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
)
This limits the scope of union() and group() to a single row.
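A plain-Java sketch of why the scope matters, assuming each "row" is the list of property maps produced for one start vertex (ScopedMerge and its methods are hypothetical names). mergePerRow plays the role of local(), while mergeAll shows the unscoped version collapsing every vertex into a single map, the same kind of mixing seen in the question's output. The exact shape Gremlin produces without local() differs (the question shows values accumulated into lists); only the mixing of rows is modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no TinkerPop) of scoping a merge per row.
final class ScopedMerge {

    // Merge one row's property maps; later maps win on shared keys.
    static Map<String, Object> mergeRow(List<Map<String, Object>> row) {
        Map<String, Object> merged = new LinkedHashMap<>();
        row.forEach(merged::putAll);
        return merged;
    }

    // Analogous to wrapping the merge in local(): one merged map per row.
    static List<Map<String, Object>> mergePerRow(List<List<Map<String, Object>>> rows) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (List<Map<String, Object>> row : rows) {
            result.add(mergeRow(row));
        }
        return result;
    }

    // Analogous to merging without local(): all rows collapse into one map.
    static Map<String, Object> mergeAll(List<List<Map<String, Object>>> rows) {
        List<Map<String, Object>> flat = new ArrayList<>();
        rows.forEach(flat::addAll);
        return mergeRow(flat);
    }
}
```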
If you can work with a custom DSL, create one in Java like this:
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;

// Declared inside a DSL interface annotated with @GremlinDsl that extends GraphTraversal.Admin<S, E>.
// Flattens each map produced by project(): entries of nested maps (e.g. valueMap() results)
// are lifted to the top level; plain values keep their project() key.
public default GraphTraversal<S, Map<Object, Object>> unpackMaps() {
    return map(traverser -> {
        Map<?, ?> source = (Map<?, ?>) traverser.get();
        Map<Object, Object> dest = new LinkedHashMap<>();
        source.forEach((key, value) -> {
            if (value instanceof Map) {
                // nested valueMap(): copy its entries into the result
                dest.putAll((Map<?, ?>) value);
            } else {
                // plain value such as a count or a name
                dest.put(key, value);
            }
        });
        return dest;
    });
}
and use it like this:
g.V().as("s")
    .valueMap().as("value_map_0")
    .select("s").outE("INFO1").inV().valueMap().as("value_map_1")
    .select("s").outE("INFO2").inV().valueMap().as("value_map_2")
    .select("s").outE("INFO3").inV().valueMap().as("value_map_3")
    .select("s").local(__.outE("INFO1").count()).as("value_1")
    .select("s").outE("INFO1").inV().values("name").as("value_2")
    .project("val_map1", "val_map2", "val_map3", "val1", "val2")
        .by(__.select("value_map_1"))
        .by(__.select("value_map_2"))
        .by(__.select("value_map_3"))
        .by(__.select("value_1"))
        .by(__.select("value_2"))
    .unpackMaps()
This results in rows like:
map1_val1, map1_val2, ..., map2_val1, map2_val2, ..., value1, value2
This can handle a mix of values and valueMaps in a natural Gremlin way.
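The flattening logic of unpackMaps() can be checked without a graph. The sketch below is the same loop as a static method over plain maps; UnpackMaps.unpack is a hypothetical name, not part of TinkerPop. If two nested maps share a key, the later one silently overwrites the earlier, so distinct property names across the projected maps are assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java version of the unpackMaps() flattening, testable without TinkerPop.
final class UnpackMaps {

    // Lift entries of any nested Map to the top level; keep other values
    // under their original key. Later entries overwrite earlier ones on clashes.
    static Map<Object, Object> unpack(Map<?, ?> source) {
        Map<Object, Object> dest = new LinkedHashMap<>();
        source.forEach((key, value) -> {
            if (value instanceof Map) {
                dest.putAll((Map<?, ?>) value);
            } else {
                dest.put(key, value);
            }
        });
        return dest;
    }
}
```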

Related

JanusGraph indexing in Scala

I am using Spark to make a JanusGraph from a data stream, but am having issues indexing and creating properties. I want to create an index by a vertex property called "register_id". I am not sure I'm doing it the right way.
So, here's my code:
var gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
gr1.close()
// This is done to clear the graph made in every run.
JanusGraphFactory.drop(gr1)
gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
var reg_id_prop = gr1.makePropertyKey("register_id").dataType(classOf[String]).make()
var mgmt = gr1.openManagement()
gr1.tx().rollback()
mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()
When I run the above, I get an error saying:
"Vertex with id 5164 was removed".
Also, how do I check in Scala whether the graph contains vertices with a certain property value? I know that in Gremlin, g.V().has('name', 'property_value') works, but I can't figure out how to do this in Scala. I tried Gremlin-Scala but can't seem to find it.
Any help will be appreciated.
You should be using the mgmt object to build the schema, not the graph object. You also need to call mgmt.commit() to apply the schema updates.
gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
var mgmt = gr1.openManagement()
var reg_id_prop = mgmt.makePropertyKey("register_id").dataType(classOf[String]).make()
mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()
mgmt.commit()
Refer to the indexing docs from JanusGraph.
For your second question on checking for the existence of a vertex using the composite index, you need to finish your traversal with a terminal step. For example, in Java, this would return a boolean value:
g.V().has("name", "property_value").hasNext()
Refer to iterating the traversal docs from JanusGraph.
From the gremlin-scala README, it looks like there are a few terminal steps you could use, such as head, headOption, toList, or toSet.
g.V().has("name", "property_value").headOption
You should also check out the gremlin-scala-examples and the gremlin-scala traversal specification.

How do I update a specific edge property using Gremlin/Titan/TinkerPop3?

The goal
I have a simple enough task to accomplish: set the weight property of a specific edge. Take this scenario as an example:
What I would like to do is update the value of weight.
Additional Requirements
If the edge does not exist, it should be created.
There may only exist at most one edge of the same type between the two nodes (i.e., there can't be multiple "votes_for" edges of type "eat" between Joey and Pizza).
The task should be solved using the Java API of Titan (which includes Gremlin as part of TinkerPop 3).
What I know
I have the following information:
The Vertex labeled "user"
The edge label votes_for
The value of the edge property type (in this case, "eat")
The value of the property name of the vertex labeled "meal" (in this case "pizza"), and hence also its Vertex.
What I thought of
I figured I would need to do something like the following:
Start at the Joey vertex
Find all outgoing edges (which should be at most 1) labeled votes_for having type "eat" and an outgoing vertex labeled "meal" having name "pizza".
Update the weight value of the edge.
This is what I've messed around with in code:
//vertex is Joey in this case
g.V(vertex.id())
.outE("votes_for")
.has("type", "eat")
//... how do I filter by .outV so that I can check for "pizza"?
.property(Cardinality.single, "weight", 0.99);
//... what do I do when the edge doesn't exist?
As commented in code there are still issues. Would explicitly specifying a Titan schema help? Are there any helper/utility methods I don't know of? Would it make more sense to have several vote_for labels instead of one label + type property, like vote_for_eat?
Thanks for any help!
You are on the right track. Check out the vertex steps documentation.
Label the edge, then traverse from the edge to the vertex to check, then jump back to the edge to update the property.
g.V(vertex.id()).
outE("votes_for").has("type", "eat").as("e").
inV().has("name", "pizza").
select("e").property("weight", 0.99d).
iterate()
Full Gremlin console session:
gremlin> Titan.version()
==>1.0.0
gremlin> Gremlin.version()
==>3.0.1-incubating
gremlin> graph = TitanFactory.open('inmemory'); g = graph.traversal()
==>graphtraversalsource[standardtitangraph[inmemory:[127.0.0.1]], standard]
gremlin> vertex = graph.addVertex(T.label, 'user', 'given_name', 'Joey', 'family_name', 'Tribbiani')
==>v[4200]
gremlin> pizza = graph.addVertex(T.label, 'meal', 'name', 'pizza')
==>v[4104]
gremlin> votes = vertex.addEdge('votes_for', pizza, 'type', 'eat', 'weight', 0.8d)
==>e[1zh-38o-4r9-360][4200-votes_for->4104]
gremlin> g.E(votes).valueMap(true)
==>[label:votes_for, weight:0.8, id:2rx-38o-4r9-360, type:eat]
gremlin> g.V(vertex.id()).outE('votes_for').has('type','eat').as('e').inV().has('name','pizza').select('e').property('weight', 0.99d).iterate(); g.E(votes).valueMap(true)
==>[label:votes_for, weight:0.99, id:2rx-38o-4r9-360, type:eat]
Would explicitly specifying a Titan schema help?
If you wanted to start from the Joey node without having a reference to the vertex or its id, this would be a good use case for a Titan composite index. The traversal would start with:
g.V().has("given_name", "Joey")
Are there any helper/utility methods I don't know of?
In addition to the TinkerPop reference documentation, there are several tutorials that you can read through:
Getting Started
The Gremlin Console
Recipes
Would it make more sense to have several vote_for labels instead of one label + type property, like vote_for_eat?
It depends on your graph model and query patterns, but more granular labels like vote_for_eat can work out fine. You can pass multiple edge labels to the traversal step:
g.V(vertex.id()).outE('vote_for_eat', 'vote_for_play', 'vote_for_sleep')
Update
There may only exist at most one edge of the same type between the two nodes
You can use the Titan schema to help with this: define the edge label with multiplicity SIMPLE, which allows at most one edge with that label between any pair of vertices. An exception will be thrown if you create more than one vote_for_eat edge between Joey and pizza.
Jason already answered nearly all of your questions. The only aspect missing is:
If the edge does not exist, it should be created.
So I'll try to answer this point with a slightly different query. This query adds a new edge if it doesn't exist already and then updates / adds the weight property:
g.V(vertex.id()).outE('votes_for').has('type', 'eat')
.where(__.inV().hasLabel('meal').has('name','pizza')) // filter for the edge to update
.tryNext() // select the edge if it exists
.orElseGet({g.V(vertex.id()).next()
.addEdge('votes_for', g.V(pizzaId).next(), 'type', 'eat')}) // otherwise, add the edge
.property('weight', 0.99) // finally, update / add the 'weight' property
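The tryNext().orElseGet(...) idiom works because tryNext() returns a java.util.Optional. The plain-Java sketch below models the same find-or-create-then-update flow over a hypothetical in-memory edge list; EdgeUpsert and its Edge class are illustrative stand-ins, not Titan types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Plain-Java sketch (no Titan/TinkerPop) of find-or-create, then update.
final class EdgeUpsert {

    // Hypothetical minimal edge model: label, endpoints, a "type" and a "weight".
    static final class Edge {
        final String label, from, to, type;
        double weight;

        Edge(String label, String from, String to, String type) {
            this.label = label;
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    final List<Edge> edges = new ArrayList<>();

    // Like the outE(...).has(...).where(inV()...) filter: the matching edge, if any.
    Optional<Edge> find(String label, String from, String to, String type) {
        return edges.stream()
                .filter(e -> e.label.equals(label) && e.from.equals(from)
                        && e.to.equals(to) && e.type.equals(type))
                .findFirst();
    }

    // Like tryNext().orElseGet(addEdge).property("weight", ...).
    Edge upsertWeight(String label, String from, String to, String type, double weight) {
        Edge edge = find(label, from, to, type).orElseGet(() -> {
            Edge created = new Edge(label, from, to, type);
            edges.add(created);
            return created;
        });
        edge.weight = weight;
        return edge;
    }
}
```

As a design note, this check-then-create is not atomic: two concurrent writers could both miss the edge and both create one, so an edge-label multiplicity constraint like the one in the first answer's Update is what actually guarantees a single edge.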

The provided start does not map to a value

I have a traversal as follows:
g.V().hasLabel("demoUser")
.as("demoUser","socialProfile","followCount","requestCount")
.select("demoUser","socialProfile","followCount","postCount")
.by(__.valueMap())
.by(__.out("socialProfileOf").valueMap())
.by(__.in("followRequest").hasId(currentUserId).count())
.by(__.outE("postAuthorOf").count())
I'm trying to select a user vertex, their linked social profile vertex, and some other counts. The issue is that not all users have a socialProfile edge. When this is the case, the traversal fails with the following error:
The provided start does not map to a value: v[8280]->[TitanVertexStep(OUT,[socialProfileOf],vertex), PropertyMapStep(value)]
I did find this thread from the gremlin team. I tried wrapping the logic inside of .by() with a coalesce(), and also appending a .fold() to the end of the statement with no luck.
How do I make that selection optional? I want to select a socialProfile if one exists, but always select the demoUser.
coalesce is the right choice. Let's assume that persons in the modern graph have either one or no project associated with them:
gremlin> g.V().hasLabel("person").as("user","project").
select("user","project").by("name").by(coalesce(out("created").values("name"),
constant("N/A")))
==>{user=marko, project=lop}
==>{user=vadas, project=N/A}
==>{user=josh, project=ripple}
==>{user=peter, project=lop}
Another way would be to completely exclude it from the result:
g.V().hasLabel("person").as("user","project").choose(out("created"),
select("user","project").by("name").by(out("created").values("name")),
select("user").by("name"))
But obviously this only looks good if each branch returns a map (i.e. selects more than one thing); otherwise you'll get mixed result types.
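In plain-Java terms, the coalesce(..., constant("N/A")) pattern is "first element if present, else a default". The sketch below hard-codes the modern graph's created relationships as sample data (OptionalProject is a hypothetical name). Like by(), it keeps only the first project; here josh gets ripple because it is listed first, whereas in Gremlin the choice depends on edge iteration order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no TinkerPop): first value if present, otherwise a constant.
final class OptionalProject {

    static Map<String, String> projectOrDefault(Map<String, List<String>> createdBy) {
        Map<String, String> result = new LinkedHashMap<>();
        createdBy.forEach((user, projects) ->
                result.put(user, projects.stream().findFirst().orElse("N/A")));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> created = new LinkedHashMap<>();
        created.put("marko", List.of("lop"));
        created.put("vadas", List.of());
        created.put("josh", List.of("ripple", "lop"));
        created.put("peter", List.of("lop"));
        // prints {marko=lop, vadas=N/A, josh=ripple, peter=lop}
        System.out.println(projectOrDefault(created));
    }
}
```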

traversing orientdb graph, sql-traverse vs gremlin

I want to model linked nodes data set:
Node(A)----next---->Node(B)----next---->Node(C)
applying SQL-Traverse:
traverse out('next') from Node(A)
will include Node(A) in the result (A, B, C), and this is the desired output,
whereas using Gremlin:
g.v('Node(A)').as('start').out('next').loop('start')
only returns B, C.
My question is how to emit Node A in Gremlin, followed by the other nodes in the same order they were linked in. I'd prefer the end result to be a pipeline. I tried aggregate(), but the problem is that it forces me to use the aggregated collection as the start point of a new pipeline with a new traversal, and I don't want that. Any ideas? Thanks.
I think path will do what you want:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).as('s').out().loop('s'){true}{true}.path()
==>[v[1], v[3]]
==>[v[1], v[2]]
==>[v[1], v[4]]
==>[v[1], v[4], v[3]]
==>[v[1], v[4], v[5]]
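For comparison with the SQL traverse output (A, B, C), the desired emit-the-start-first order can be sketched in plain Java over a hypothetical map of next links. NextChain.traverse is an illustrative name, and the seen set is an extra guard against cycles that the question does not mention.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch (no OrientDB/Gremlin): follow "next" links from a start node,
// emitting the start node first and then each successor in link order.
final class NextChain {

    static List<String> traverse(String start, Map<String, String> next) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // stops if the links form a cycle
        String current = start;
        while (current != null && seen.add(current)) {
            order.add(current);
            current = next.get(current); // null when there is no further "next"
        }
        return order;
    }
}
```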

Address an ordered list the RESTful way

I'm unsure what the best way is to address an ordered list in a RESTful API. Imagine the following example: let's create a chart list of LPs, where you want to add new LPs, delete those which aren't in the TOP10 yet, and change their positions. How would you implement those methods in a RESTful JSON API?
I thought of the following way:
GET / to return the ordered chart list like [{ "name": "1st-place LP", "link": "/uid123" }, { "name": "2nd-place LP", "link": "/uid987" }, ...]
GET /{uid} to return an LP by its unique ID, returning something like {"name": "1st-place LP", "ranking": 1 }
GET /ranking/{position} to access e.g. the current first-ranked LP, returning a 303 See Other with a Location-header like Location: /uid123
POST / with request body { "name": "my first LP title" } to create a new LP without specifying its current chart position
Now the question is how we could change the current chart positions. One could simply PUT /{uid} to update the ranking attribute, but I think a PUT /ranking/{position} would be more natural. On the other hand, it doesn't make sense to PUT against a URI which returns a 303 See Other on GET.
What do you think would be the best way to address such a chart list? I don't like the solution of simply changing the ranking attribute in the LP datasets, as this could end in senseless states like two LPs with the same ranking.
I see two questions: 1. What is the most RESTful (beautiful) way to design the API? 2. How do I make sure that two LPs do not get the same ranking?
1:
Your LPs could have several properties that are relative to each other, e.g. different rankings on different charts. I would say that you want the ranking moved OUT of your LP resource. Keep the ranking on a given list as a separate resource. Example:
GET /LPuid only returns properties about the LP, not relative properties like rankings.
GET /billboard/3 returns the URI of the LP that has ranking 3 on the billboard list.
PUT /billboard takes a document of 100 LP URIs.
PUT /billboard/3 INSERTS an LP URI at that ranking and moves the other ones down.
2: That has nothing to do with REST, and you would have that issue no matter how you design your API. Transactions are one solution.
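To illustrate point 1 and the duplicate-ranking concern, here is an in-memory sketch in Java where the ranking is simply the LP's position in a list, so two LPs cannot share a rank by construction. Chart and its methods are hypothetical and only mirror the semantics of PUT /billboard, PUT /billboard/3 and GET /billboard/3; they are not tied to any web framework. Concurrent updates would still need transactions or locking, as point 2 says.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical in-memory chart: rank = list position (1-based), so ranks are unique.
final class Chart {
    private final List<String> lpUris = new ArrayList<>();

    // Like PUT /billboard: replace the whole chart with a new ordered document.
    void replaceAll(List<String> uris) {
        lpUris.clear();
        lpUris.addAll(new LinkedHashSet<>(uris)); // drop duplicate URIs, keep first occurrence
    }

    // Like PUT /billboard/{rank}: insert at a 1-based rank, moving later LPs down.
    void insertAt(int rank, String uri) {
        // An LP that is already on the chart is moved, so the chart does not grow.
        int maxRank = lpUris.contains(uri) ? lpUris.size() : lpUris.size() + 1;
        if (rank < 1 || rank > maxRank) {
            throw new IllegalArgumentException("rank out of range: " + rank);
        }
        lpUris.remove(uri);
        lpUris.add(rank - 1, uri);
    }

    // Like GET /billboard/{rank}: the URI of the LP at a 1-based rank.
    String at(int rank) {
        return lpUris.get(rank - 1);
    }

    List<String> ranking() {
        return List.copyOf(lpUris);
    }
}
```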
You have two collection resources within your music service. As such, I would design a URI structure like this:
/ => returns links to collections (ergo itself a collection resource)
/releases => returns a list of LPs
/chart => returns the top 10 LPs, or redirects to the current chart URI
You would POST to /releases to add a new LP, and PUT or PATCH to /chart to define a new chart or alter the current one. You will need to define which representation formats are transferred in each case.
This gives you the flexibility to define things like /chart/2012-12-25 to show the chart as it stood on Christmas Day 2012.
I do not suggest using PUT /chart/{position} to insert an LP at a specific position and shuffle everything else down. Intermediaries would not know that a PUT to that URI causes other resources to change their URIs, which is bad for caching.
Also, as a user, I would hope you avoid the word "billboard" as the other answer suggests. A billboard conjures in the mind pictures of an advertising hoarding, and not anything to do with ranking charts!