The goal
I have a simple enough task to accomplish: Set the weight of a specific edge property. Take this scenario as an example:
What I would like to do is update the value of weight.
Additional Requirements
If the edge does not exist, it should be created.
There may only exist at most one edge of the same type between the two nodes (i.e., there can't be multiple "votes_for" edges of type "eat" between Joey and Pizza.
The task should be solved using the Java API of Titan (which includes Gremlin as part of TinkerPop 3).
What I know
I have the following information:
The Vertex labeled "user"
The edge label votes_for
The value of the edge property type (in this case, "eat")
The value of the property name of the vertex labeled "meal" (in this case "pizza"), and hence also its Vertex.
What I thought of
I figured I would need to do something like the following:
Start at the Joey vertex
Find all outgoing edges (which should be at most 1) labeled votes_for having type "eat" and an outgoing vertex labeled "meal" having name "pizza".
Update the weight value of the edge.
This is what I've messed around with in code:
//vertex is Joey in this case
g.V(vertex.id())
.outE("votes_for")
.has("type", "eat")
//... how do I filter by .outV so that I can check for "pizza"?
.property(Cardinality.single, "weight", 0.99);
//... what do I do when the edge doesn't exist?
As commented in code there are still issues. Would explicitly specifying a Titan schema help? Are there any helper/utility methods I don't know of? Would it make more sense to have several vote_for labels instead of one label + type property, like vote_for_eat?
Thanks for any help!
You are on the right track. Check out the vertex steps documentation.
Label the edge, then traverse from the edge to the vertex to check, then jump back to the edge to update the property.
g.V(vertex.id()).
outE("votes_for").has("type", "eat").as("e").
inV().has("name", "pizza").
select("e").property("weight", 0.99d).
iterate()
Full Gremlin console session:
gremlin> Titan.version()
==>1.0.0
gremlin> Gremlin.version()
==>3.0.1-incubating
gremlin> graph = TitanFactory.open('inmemory'); g = graph.traversal()
==>graphtraversalsource[standardtitangraph[inmemory:[127.0.0.1]], standard]
gremlin> vertex = graph.addVertex(T.label, 'user', 'given_name', 'Joey', 'family_name', 'Tribbiani')
==>v[4200]
gremlin> pizza = graph.addVertex(T.label, 'meal', 'name', 'pizza')
==>v[4104]
gremlin> votes = vertex.addEdge('votes_for', pizza, 'type', 'eat', 'weight', 0.8d)
==>e[1zh-38o-4r9-360][4200-votes_for->4104]
gremlin> g.E(votes).valueMap(true)
==>[label:votes_for, weight:0.8, id:2rx-38o-4r9-360, type:eat]
gremlin> g.V(vertex.id()).outE('votes_for').has('type','eat').as('e').inV().has('name','pizza').select('e').property('weight', 0.99d).iterate(); g.E(votes).valueMap(true)
==>[label:votes_for, weight:0.99, id:2rx-38o-4r9-360, type:eat]
Would explicitly specifying a Titan schema help?
If you wanted to start from the Joey node without having a reference to the vertex or its id, this would be a good use case for a Titan composite index. The traversal would start with:
g.V().has("given_name", "Joey")
Are there any helper/utility methods I don't know of?
In addition to the TinkerPop reference documentation, there are several tutorials that you can read through:
Getting Started
The Gremlin Console
Recipes
Would it make more sense to have several vote_for labels instead of one label + type property, like vote_for_eat?
Depends on what your graph model or query patterns are, but more granular labels like vote_for_eat can work out fine. You can pass multiple edge labels on the traversal step:
g.V(vertex.id()).outE('vote_for_eat', 'vote_for_play', 'vote_for_sleep')
Update
There may only exist at most one edge of the same type between the two nodes
You can use the Titan schema to help with this, specifically define an edge label with multiplicity ONE2ONE. An exception will be thrown if you create more than one votes_for_eat between Joey and pizza.
Jason already answered nearly all of your questions. The only aspect missing is:
If the edge does not exist, it should be created.
So I'll try to answer this point with a slightly different query. This query adds a new edge if it doesn't exist already and then updates / adds the weight property:
g.V(vertex.id()).outE('votes_for').has('type', 'eat')
.where(__.inV().hasLabel('meal').has('name','pizza')) // filter for the edge to update
.tryNext() // select the edge if it exists
.orElseGet({g.V(vertex.id()).next()
.addEdge('votes_for', g.V(pizzaId).next(), 'type', 'eat')}) // otherwise, add the edge
.property('weight', 0.99) // finally, update / add the 'weight' property
Related
I am using Spark to make a JanusGraph from a data stream, but am having issues indexing and creating properties. I want to create an index by a vertex property called "register_id". I am not sure I'm doing it the right way.
So, here's my code:
var gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
gr1.close()
// This is done to clear the graph made in every run.
JanusGraphFactory.drop(gr1)
gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
var reg_id_prop = gr1.makePropertyKey("register_id").dataType(classOf[String]).make()
var mgmt = gr1.openManagement()
gr1.tx().rollback()
mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()
When I run the above, I get an error saying:
"Vertex with id 5164 was removed".
Also, how do I check if I have vertices with a certain property in the graph or not in Scala. I know in gremlin, g.V().has('name', 'property_value') works, but can't figure out how to do this in Scala. I tried Gremlin-Scala but can't seem to find it.
Any help will be appreciated.
You should be using mgmt object to build the schema, not the graph object. You also need to make sure to mgmt.commit() the schema updates.
gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
var mgmt = gr1.openManagement()
var reg_id_prop = mgmt.makePropertyKey("register_id").dataType(classOf[String]).make()
mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()
mgmt.commit()
Refer to the indexing docs from JanusGraph.
For your second question on checking for the existence of a vertex using the composite index, you need to finish your traversal with a terminal step. For example, in Java, this would return a boolean value:
g.V().has('name', 'property_value').hasNext()
Refer to iterating the traversal docs from JanusGraph.
Reading over the gremlin-scala README, it looks like it has a few options for terminal steps that you could use like head, headOption, toList, or toSet.
g.V().has('name', 'property_value').headOption
You should also check out the gremlin-scala-examples and the gremlin-scala traversal specification.
Background: I'm trying to implement a time-series versioned DB using this approach, using gremlin (tinkerpop v3).
I want to get the latest state node (in red) for a given identity node (in blue) (linked by a 'state' edge which contains a timestamp range), but I want to return a single aggregated object which contains the id (cid) from the identity node and all the properties from the state node, but I don't want to have to list them explicitly.
(8640000000000000 is my way of indicating no 'to' date - i.e. the edge is current - slightly different from the image shown).
I've got this far:
:> g.V().hasLabel('product').
as('cid').
outE('state').
has('to', 8640000000000000).
inV().
as('name').
as('price').
select('cid', 'name','price').
by('cid').
by('name').
by('price')
=>{cid=1, name="Cheese", price=2.50}
=>{cid=2, name="Ham", price=5.00}
but as you can see I have to list out the properties of the 'state' node - in the example above the name and price properties of a product. But this will apply to any domain object so I don't want to have to list the properties all the time. I could run a query before this to get the properties but I don't think I should need to run 2 queries, and have the overhead of 2 round trips. I've looked at 'aggregate', 'union', 'fold' etc but nothing seems to do this.
Any ideas?
===================
Edit:
Based on Daniel's answer (which doesn't quite do what I want ATM) I'm going to use his example graph. In the 'modernGraph' people-create->software. If I run:
> g.V().hasLabel('person').valueMap()
==>[name:[marko], age:[29]]
==>[name:[vadas], age:[27]]
==>[name:[josh], age:[32]]
==>[name:[peter], age:[35]]
then the results are a list of entities's with the properties. What I want is, on the assumption that a person can only create one piece of software ever (although hopefully we will see how this could be opened up later for lists of software created), to include the created software 'language' property into the returned entity to get:
> <run some query here>
==>[name:[marko], age:[29], lang:[java]]
==>[name:[vadas], age:[27], lang:[java]]
==>[name:[josh], age:[32], lang:[java]]
==>[name:[peter], age:[35], lang:[java]]
At the moment the best suggestion so far comes up with the following:
> g.V().hasLabel('person').union(identity(), out("created")).valueMap().unfold().group().by {it.getKey()}.by {it.getValue()}
==>[name:[marko, lop, lop, lop, vadas, josh, ripple, peter], lang:[java, java, java, java], age:[29, 27, 32, 35]]
I hope that's clearer. If not please let me know.
Since you didn't provide I sample graph, I'll use TinkerPop's toy graph to show how it's done.
Assume you want to merge marko and lop:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).valueMap()
==>[name:[marko],age:[29]]
gremlin> g.V(1).out("created").valueMap()
==>[name:[lop],lang:[java]]
Note, that there are two name properties and in theory you won't be able to predict which name makes it into your merged result; however that doesn't seem to be an issue in your graph.
Get the properties for both vertices:
gremlin> g.V(1).union(identity(), out("created")).valueMap()
==>[name:[marko],age:[29]]
==>[name:[lop],lang:[java]]
Merge them:
gremlin> g.V(1).union(identity(), out("created")).valueMap().
unfold().group().by(select(keys)).by(select(values))
==>[name:[lop],lang:[java],age:[29]]
UPDATE
Thank you for the added sample output. That makes it a lot easier to come up with a solution (although I think your output contains errors; vadas didn't create anything).
gremlin> g.V().hasLabel("person").
filter(outE("created")).map(
union(valueMap(),
outE("created").limit(1).inV().valueMap("lang")).
unfold().group().by {it.getKey()}.by {it.getValue()})
==>[name:[marko], lang:[java], age:[29]]
==>[name:[josh], lang:[java], age:[32]]
==>[name:[peter], lang:[java], age:[35]]
Merging edge and vertex properties using gremlin java DSL:
g.V().has('User', 'id', userDbId).outE(Edges.TWEETS)
.union(__.identity().valueMap(), __.inV().valueMap())
.unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
.map(v -> converter.toTweet((Map) v.get())).toList();
Thanks for the answer by Daniel Kuppitz and youhans it has given me a basic idea on the solution of the issue. But later I found out that the solution is not working for multiple rows. It is required to have local step for handling multiple rows. The modified gremlin query will look like:
g.V()
.local(
__.union(__.valueMap(), __.outE().inV().valueMap())
.unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
)
This will limit the scope of union and group by to a single row.
If you can work with custom DSL ,create custom DSL with java like this one.
public default GraphTraversal<S, LinkedHashMap> unpackMaps(){
GraphTraversal<S, LinkedHashMap> it = map(x -> {
LinkedHashMap mapSource = (LinkedHashMap) x.get();
LinkedHashMap mapDest = new LinkedHashMap();
mapSource.keySet().stream().forEach(key->{
Object obj = mapSource.get(key);
if (obj instanceof LinkedHashMap) {
LinkedHashMap childMap = (LinkedHashMap) obj;
childMap.keySet().iterator().forEachRemaining( key_child ->
mapDest.put(key_child,childMap.get(key_child)
));
} else
mapDest.put(key,obj);
});
return mapDest;
});
return it;
}
and use it freely like
g.V().as("s")
.valueMap().as("value_map_0")
.select("s").outE("INFO1").inV().valueMap().as("value_map_1")
.select("s").outE("INFO2").inV().valueMap().as("value_map_2")
.select("s").outE("INFO3").inV().valueMap().as("value_map_3")
.select("s").local(__.outE("INFO1").count()).as("value_1")
.select("s").outE("INFO1").inV().value("name").as("value_2")
.project("val_map1","val_map2","val_map3","val1","val2")
.by(__.select("value_map_1"))
.by(__.select("value_map_2"))
.by(__.select("value_1"))
.by(__.select("value_2"))
.unpackMaps()
results to rows with
map1_val1, map1_val2,.... ,map2_va1, map2_val2....,value1, value2
This can handle mix of values and valueMaps in a natural gremlin way.
I have a traversal as follows:
g.V().hasLabel("demoUser")
.as("demoUser","socialProfile","followCount","requestCount")
.select("demoUser","socialProfile","followCount","postCount")
.by(__.valueMap())
.by(__.out("socialProfileOf").valueMap())
.by(__.in("followRequest").hasId(currentUserId).count())
.by(__.outE("postAuthorOf").count())
I'm trying to select a user vertex, their linked social profile vertex, and some other counts. The issue is that all users may not have a socialProfile edge. When this is the case the traversal fails with the following error:
The provided start does not map to a value: v[8280]->[TitanVertexStep(OUT,[socialProfileOf],vertex), PropertyMapStep(value)]
I did find this thread from the gremlin team. I tried wrapping the logic inside of .by() with a coalesce(), and also appending a .fold() to the end of the statement with no luck.
How do I make that selection optional? I want to select a socialProfile if one exists, but always select the demoUser.
coalesce is the right choice. Let's assume that persons in the modern graph have either one or no project associated with them:
gremlin> g.V().hasLabel("person").as("user","project").
select("user","project").by("name").by(coalesce(out("created").values("name"),
constant("N/A")))
==>{user=marko, project=lop}
==>{user=vadas, project=N/A}
==>{user=josh, project=ripple}
==>{user=peter, project=lop}
Another way would be to completely exclude it from the result:
g.V().hasLabel("person").as("user","project").choose(out("created"),
select("user","project").by("name").by(out("created").values("name")),
select("user").by("name"))
But obviously this will only look good if each branch returns a map / selects more than 1 thing, otherwise you're going to have mixed result types.
I'm trying to model follower relationships between certain users in my app:
user----follows----user
(think Twitter)
Given a set of userIds I need to return all those user vertices and a boolean if a particular user (currentUser) has a follows edge to those users. So I need to know whether or not currentUser is following each of these users:
user1: true
user2: true
user3: false
user4: true
I'm stuck on how to fetch that follow status. If I return each user vertex like so:
currentUser = g.V(1);
g.V().hasLabel("appUser").or(__.has("userId","123869681319429"),
__.has("userId","103659593341656")).valueMap();
what would be an efficient command to determine if each of those had an incoming follows edge from currentUser?
TitanDB 1.0.0 running on DynamoDB.
Edit- Adding My full working traversal:
g.V().hasLabel('appUser').or(__.has('cId', '1232'),__.has('cId', '1116')).group().by().by(__.in('follows').hasId(hasLabel('appUser').has('pId', 'd13dfa6').id()).count())
Edit 2 -
I wound up rewriting this traversal to better capture the data I needed by using as() and select(). Leaving here for reference:
g.V().hasLabel('appUser').or(__.has('cId', '1232'),__.has('cId', '1116')).as('user','followCount').select('user','followCount').by(__.valueMap()).by(__.in('follows').hasId(hasLabel('appUser').has('pId', 'd13dfa6').id()).count())
Here's one way to do it. Assume this sample graph:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> vUser1 = graph.addVertex(id,1)
==>v[1]
gremlin> vUser2 = graph.addVertex(id,2)
==>v[2]
gremlin> vUser3 = graph.addVertex(id,3)
==>v[3]
gremlin> vUser1.addEdge('follows',vUser2)
==>e[0][1-follows->2]
gremlin> vUser3.addEdge('follows',vUser3)
==>e[1][3-follows->3]
Your code snippet above demonstrates that you will have the "current user" vertex and the vertices of the users you want to compare to that current user to see if there are any follows relationships. Given that assumption, you could approach it this way:
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:3 edges:2], standard]
gremlin> g.V(vUser2,vUser3).group().by().by(__.in("follows").hasId(vUser1.id()).count())
==>[v[2]:1, v[3]:0]
In this case, you iterate the list of user vertices you want to compare against, then group on them. The traversal will output a Map where a value greater than 0 represents a follows relationship and a value of zero represents the opposite of no follow relationship. So, in the example above, user 1 follows 2 but doesn't follow 3.
I tried searching on this, but could not find any simple answer. Based on a image in this link it seems like it does, but I am not sure.
What I am talking about are examples like this:
Example 1: One Property
A --> B --> C
Property 1: Knows
B "Knows" A and C.
Example 2: Multiple Properties
A --> B
(I am not sure how to show multiple properties here)
Property 1: Knows
Property 2: Friends
A is "Friends" with B and A "Knows" B
Also is there some way to introduce Hierarchy.
If A is "Friends" with "B" than A implicitly also "Knows" B.
A general yes or no would be enough. If there is some example or link that you can provide that has more explanation that would be great.
Thanks
Course you can. OrientDB has 3 Graph API. One of these is the TinkerPop Blueprints API that are highly documented: http://github.com/tinkerpop/blueprints/wiki
To create 2 edges:
Vertex luca = graph.addVertex(null);
luca.setProperty( "name", "Luca" );
Vertex marko = graph.addVertex(null);
marko.setProperty( "name", "Marko" );
Edge lucaKnowsMarko = graph.addEdge(null, luca, marko, "knows");
Vertex jay = graph.addVertex(null);
marko.setProperty( "name", "Jay" );
Edge lucaRespectsJay = graph.addEdge(null, luca, jay, "respects");
Lvc#