Gremlin, including vertices with zero edges


Let's say you have the following relationship:
Author-----wrote---->Article
and you want to prepare a report about each author: how many articles he has written and the date of his last article. The problem appears when there are authors who wrote no articles: they are dropped when you pass through the 'wrote' pipe. I want to include them with '0' in the 'count' column and 'N/A' in the 'date' column. So my question is: how do I solve this problem?

I assume that you are still working with TinkerPop 2.x given your usage of OrientDB, so I will answer in that fashion. You need to do something like:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> bill = g.addVertex([author:'bill',type:'author'])
==>v[0]
gremlin> amy = g.addVertex([author:'amy',type:'author'])
==>v[1]
gremlin> book1 = g.addVertex([book:1,type:'book'])
==>v[2]
gremlin> book2 = g.addVertex([book:2,type:'book'])
==>v[3]
gremlin> bill.addEdge('wrote',book1)
==>e[4][0-wrote->2]
gremlin> bill.addEdge('wrote',book2)
==>e[5][0-wrote->3]
gremlin> g.V.has('type','author').transform{[it, it.outE('wrote').count()]}
==>[v[0], 2]
==>[v[1], 0]
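The transform{} above gives you the count but not the date of the last article. To make the "keep every author" idea concrete outside of Gremlin, here is a minimal plain-Java sketch (it does not use the Blueprints or OrientDB API): it starts from the list of all authors and looks each one up with a default, so an author with no 'wrote' entries gets 0 and "N/A" instead of being dropped. The Article class, the wrote map and the dates are hypothetical illustration data.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the report: every author gets a row,
// even when they have no outgoing 'wrote' edges.
public class AuthorReport {

    // Hypothetical article record: a title plus its publication date.
    static final class Article {
        final String title;
        final LocalDate published;

        Article(String title, LocalDate published) {
            this.title = title;
            this.published = published;
        }
    }

    // Builds one row per author: [name, article count, last date or "N/A"].
    static List<List<Object>> report(List<String> authors, Map<String, List<Article>> wrote) {
        List<List<Object>> rows = new ArrayList<>();
        for (String author : authors) {
            // getOrDefault keeps authors with no edges instead of dropping them
            List<Article> articles = wrote.getOrDefault(author, List.of());
            LocalDate last = null;
            for (Article a : articles) {
                if (last == null || a.published.isAfter(last)) {
                    last = a.published;
                }
            }
            rows.add(List.of(author, (long) articles.size(), last == null ? "N/A" : last.toString()));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<Article>> wrote = new LinkedHashMap<>();
        wrote.put("bill", List.of(
                new Article("book1", LocalDate.of(2014, 3, 1)),
                new Article("book2", LocalDate.of(2015, 6, 15))));
        // "amy" has no entry, mirroring a vertex with zero 'wrote' edges
        System.out.println(report(List.of("bill", "amy"), wrote)); // [[bill, 2, 2015-06-15], [amy, 0, N/A]]
    }
}
```

Starting from the authors rather than from the edges is what keeps the zero-edge rows; in the Gremlin answer the same effect comes from computing the count inside transform{} instead of traversing through outE('wrote') first.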

Related

gremlin hasId should be equivalent to id().is(xxx) in apache-tinkerpop-gremlin-console-3.3.4-bin.zip

From my understanding of the hasId() step, it should behave the same as the id().is() step, which means the following scripts should print the same result:
g.V().hasId(4)
g.V().id().is(4)
Unfortunately, the hasId() step does not seem to work as I expected. Is something wrong on my side? The whole script is below, FYI:
gremlin> g.addV('Orange').property('price', '1.79').property('location', 'location-0').property('_classname', 'com.microsoft.spring.data.gremlin.common.domain.Orange')
==>v[4]
gremlin> g.V().id().is(0)
==>0
gremlin> g.V().id()
==>0
==>4
gremlin> g.addV('Orange').property('price', '1.79').property('location', 'location-0').property('_classname', 'com.microsoft.spring.data.gremlin.common.domain.Orange')
==>v[8]
gremlin> g.V().id()
==>0
==>4
==>8
gremlin> g.V().hasId(8)
gremlin> g.V().id().is(8)
==>8

You're running into an inconsistency in your graph's handling of ids, and I assume that the graph you are using is TinkerGraph. Its default configuration for id lookups is to compare via equals(), so, using your example, you can see what's going on:
gremlin> g.V().hasId(0)
gremlin> g.V().hasId(0L)
==>v[0]
So why does this work then:
gremlin> g.V().id().is(0)
==>0
To answer that, compare the profile() of each traversal:
gremlin> g.V().hasId(0L).profile()
==>Traversal Metrics
Step                                                          Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[0])                                       1           1           0.755   100.00
>TOTAL                                                            -           -           0.755        -
gremlin> g.V().id().is(0).profile()
==>Traversal Metrics
Step                                                          Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                        2           2           0.063    29.62
IdStep                                                            2           2           0.089    41.73
IsStep(eq(0))                                                     1           1           0.061    28.65
>TOTAL                                                            -           -           0.214        -
They compile to two different traversals. The first shows that hasId() is optimized into a single TinkerGraphStep with the id applied to it, which means it uses an index lookup (and thus equals()). On the other hand, when you use is() the way you are, the TinkerGraph query optimizer doesn't take note of it and simply uses a linear scan of ids and an in-memory filter with IsStep. IsStep is smarter about number comparisons than TinkerGraphStep is: it knows that a 0 is a 0 and ignores the type.
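The type issue itself is easy to reproduce in plain Java, without TinkerGraph. This hedged sketch uses a HashMap as a stand-in for an equals()-based id index and a simplified numeric comparison as a stand-in for what IsStep does; it is not TinkerPop's actual implementation, and numericMatch only handles integral ids.

```java
import java.util.HashMap;
import java.util.Map;

// Contrasts an equals()-based id lookup with a value-based numeric comparison.
public class IdEquality {

    // Strict match: like a hash index, it relies on equals(), so Integer 0 != Long 0.
    static boolean strictMatch(Object storedId, Object requestedId) {
        return storedId.equals(requestedId);
    }

    // Numeric match: compares the values of two Numbers and ignores their boxed type.
    // Simplified: longValue() truncates decimals, so this is only meant for integral ids.
    static boolean numericMatch(Object storedId, Object requestedId) {
        if (storedId instanceof Number && requestedId instanceof Number) {
            return ((Number) storedId).longValue() == ((Number) requestedId).longValue();
        }
        return storedId.equals(requestedId);
    }

    public static void main(String[] args) {
        Map<Object, String> index = new HashMap<>();
        index.put(0L, "v[0]");                    // ids generated by default TinkerGraph are Longs
        System.out.println(index.get(0));         // null: the Integer 0 does not equal the Long 0
        System.out.println(index.get(0L));        // v[0]
        System.out.println(strictMatch(0L, 0));   // false
        System.out.println(numericMatch(0L, 0));  // true
    }
}
```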
You can get consistent behavior from TinkerGraph, though, if you reconfigure its IdManager as discussed in Practical Gremlin and the Reference Documentation:
gremlin> conf = new BaseConfiguration()
==>org.apache.commons.configuration.BaseConfiguration@2c413ffc
gremlin> conf.setProperty("gremlin.tinkergraph.vertexIdManager","LONG")
gremlin> conf.setProperty("gremlin.tinkergraph.edgeIdManager","LONG")
gremlin> conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager","LONG");[]
gremlin> graph = TinkerGraph.open(conf)
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('Orange').property('price', '1.79').property('location', 'location-0').property('_classname', 'com.microsoft.spring.data.gremlin.common.domain.Orange')
==>v[0]
gremlin> g.V(0)
==>v[0]
gremlin> g.V().hasId(0)
==>v[0]
gremlin> g.V().id().is(0)
==>0
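Conceptually, the LONG id manager avoids the mismatch by converting ids to a single type before they are stored or looked up. The class below is a hypothetical, simplified sketch of that idea in plain Java; LongIdIndex and its methods are made up for illustration and are not the TinkerGraph IdManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical id index that coerces every numeric id to Long before
// storing or looking it up, so 0 and 0L resolve to the same vertex.
public class LongIdIndex {
    private final Map<Long, String> vertices = new HashMap<>();

    // Normalizes any Number to Long; other id types are rejected in this sketch.
    static Long toLong(Object id) {
        if (id instanceof Number) {
            return ((Number) id).longValue();
        }
        throw new IllegalArgumentException("Unsupported id type: " + id.getClass());
    }

    void add(Object id, String vertex) {
        vertices.put(toLong(id), vertex);
    }

    String get(Object id) {
        return vertices.get(toLong(id));
    }

    public static void main(String[] args) {
        LongIdIndex index = new LongIdIndex();
        index.add(0, "v[0]");                // stored under the Long key 0L
        System.out.println(index.get(0));    // v[0]
        System.out.println(index.get(0L));   // v[0]
    }
}
```

Normalizing at the boundary means the fast index lookup and the in-memory filter agree, which is the consistency the configuration above buys you.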

Tinkerpop3 - degree centrality

I'm looking to find the most liked nodes, so basically the degree centrality recipe. This query kind of works, but I'd like to return the full vertex (including properties) rather than just the ids.
(I am using TinkerPop 3.0.1-incubating.)
g.V()
.where( inE("likes") )
.group()
.by()
.by( inE("likes").count() )
Result
{
  "8240": [2],
  "8280": [1],
  "12376": [1],
  "24704": [1],
  "40976": [1]
}

You're probably looking for the order step, using an anonymous traversal passed to the by() modulator:
g.V().order().by(inE('likes').count(), decr)
Note: in Titan v1.0.0 this requires iterating over all vertices and the query cannot be optimized, so it will only work on smaller graphs in OLTP.
To get the 10 most liked:
g.V().order().by(inE('likes').count(), decr).limit(10)
If you want to get the full properties, simply chain .valueMap() or .valueMap(true) (to include id and label) onto the query.
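To make the ordering logic explicit, here is a plain-Java sketch of the same degree-centrality idea: count incoming edges with a given label per vertex, sort descending and keep the top N. Edges are modeled as hypothetical {from, label, to} string arrays rather than Titan elements. Like the question's where(inE("likes")) filter, it only ranks vertices that have at least one such edge, whereas order().by(inE('likes').count(), decr) over g.V() ranks every vertex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DegreeCentrality {

    // Returns up to 'limit' vertex ids ordered by how many incoming edges with
    // the given label they have, most-liked first. Each edge is {from, label, to}.
    static List<String> mostLiked(List<String[]> edges, String label, int limit) {
        Map<String, Long> inDegree = new HashMap<>();
        for (String[] e : edges) {
            if (e[1].equals(label)) {
                inDegree.merge(e[2], 1L, Long::sum);  // count the edge against its target vertex
            }
        }
        return inDegree.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))  // deterministic tie-break
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[] {"u1", "likes", "a"},
                new String[] {"u2", "likes", "a"},
                new String[] {"u1", "likes", "b"},
                new String[] {"u1", "knows", "c"});
        System.out.println(mostLiked(edges, "likes", 10)); // [a, b]
    }
}
```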
See also:
http://tinkerpop.apache.org/docs/3.0.1-incubating/#order-step
https://groups.google.com/d/topic/gremlin-users/rt3qRKyAqts/discussion

GraphSON, as it is JSON based, does not support the conversion of complex objects to keys. A key in JSON must be a string and, as in this case, cannot be a map. To deal with this JSON limitation, GraphSON converts complex objects that are to be keys via Java's toString() or, for certain objects like graph elements, by other methods (for elements it returns a string representation of the element identifier, which explains why you received the output that you did).
If you want to return properties of elements while using GraphSON, you will have to figure out a workaround. In this specific case, you might do:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().group().
......1> by(id).
......2> by(union(__(), outE('knows').count()).fold())
==>[1:[v[1],2],2:[0,v[2]],3:[v[3],0],4:[0,v[4]],5:[v[5],0],6:[0,v[6]]]
In this way you get the vertex identifier as the map key, and in the value you get the full vertex plus the count. TinkerPop is working on improving this issue, but I don't expect a fast fix.
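The key-stringification problem and the workaround can be sketched in plain Java as well. Vertex here is a hypothetical stand-in whose toString() returns only its id, similar to how GraphSON ends up keying the map by identifier; keyedById() follows the same shape as the group().by(id) traversal above by moving the vertex data into the value. The sample data mirrors marko and lop from the modern graph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StringKeys {

    // Hypothetical stand-in for a graph vertex whose toString() yields only its id.
    static final class Vertex {
        final long id;
        final Map<String, Object> properties;

        Vertex(long id, Map<String, Object> properties) {
            this.id = id;
            this.properties = properties;
        }

        @Override
        public String toString() {
            return String.valueOf(id);
        }
    }

    // What a JSON writer must do with non-string keys: stringify them,
    // which silently drops the vertex properties.
    static Map<String, Long> naive(Map<Vertex, Long> counts) {
        Map<String, Long> out = new LinkedHashMap<>();
        counts.forEach((v, c) -> out.put(v.toString(), c));
        return out;
    }

    // The workaround: key by id and carry the vertex data in the value.
    static Map<String, List<Object>> keyedById(Map<Vertex, Long> counts) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        counts.forEach((v, c) -> {
            List<Object> value = new ArrayList<>();
            value.add(v.properties);
            value.add(c);
            out.put(String.valueOf(v.id), value);
        });
        return out;
    }

    public static void main(String[] args) {
        Map<Vertex, Long> counts = new LinkedHashMap<>();
        counts.put(new Vertex(1, Map.of("name", "marko")), 2L);
        counts.put(new Vertex(3, Map.of("name", "lop")), 0L);
        System.out.println(naive(counts));      // {1=2, 3=0}
        System.out.println(keyedById(counts));  // {1=[{name=marko}, 2], 3=[{name=lop}, 0]}
    }
}
```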

Can't delete/remove multiple property keys on Vertex Titan 1.0 Tinkerpop 3

Very basic question,
I just upgraded my Titan from 0.5.4 to Titan 1.0 (Hadoop 1 / TinkerPop 3 version 3.0.1).
I encounter a problem with deleting the values of a property key with Cardinality.LIST/SET.
Maybe it is due to the upgrade process or just my misunderstanding of TP3.
// ----- CODE ------:
TitanGraph tg = TitanFactory.open(c);
TitanManagement mg = tg.openManagement();
//create KEY (Cardinality.LIST) and commit changes
mg.makePropertyKey("myList").dataType(String.class).cardinality(Cardinality.LIST).make();
mg.commit();
//add vertex with multi properties
Vertex v = tg.addVertex();
v.property("myList", "role1");
v.property("myList", "role2");
v.property("myList", "role3");
v.property("myList", "role4");
v.property("myList", "role4");
Now I want to delete all the values (role1, role2, ...):
// iterate over all values and try to remove the values
List<String> values = IteratorUtils.toList(v.values("myList"));
for (String val : values) {
v.property("myList", val).remove();
}
tg.tx().commit();
//---------------- THE EXPECTED RESULT ----------:
Empty vertex properties
But unfortunately the result isn't empty:
System.out.println("Values After Delete" + IteratorUtils.toList(v.values("myList")));
//------------------- OUTPUT --------------:
After the delete, the values are still present!
15:19:59,780 INFO ThriftKeyspaceImpl:745 - Detected partitioner org.apache.cassandra.dht.Murmur3Partitioner for keyspace titan
15:19:59,784 INFO Values After Delete [role1, role2, role3, role4, role4]
Any ideas?

You're not using the higher-level Gremlin traversal API; instead, you're mutating the graph with the lower-level Graph API. Doing for loops in Gremlin is often an antipattern.
According to the TinkerPop 3.0.1 Drop Step documentation, you should be able to do the following from the Gremlin console:
v = g.addV().next()
g.V(v).property("myList", "role1")
g.V(v).property("myList", "role2")
// ...
g.V(v).properties('myList').drop()

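property(key, value) sets the value of the property on the vertex (javadoc); with a LIST key it adds a new value, so calling remove() on the result only removes the value you just added. What you should do instead is get the existing VertexProperty objects (javadoc) and remove those:
// v.properties(...) returns an Iterator, so collect it into a List before removing
List<VertexProperty<String>> props = IteratorUtils.toList(v.<String>properties("myList"));
for (VertexProperty<String> vp : props) {
    vp.remove();
}
tg.tx().commit();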
@jbmusso offered a solid solution using the GraphTraversal instead.
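To see why the loop in the question leaves every value in place, here is a toy Java model of a vertex with one LIST-cardinality key (ListPropertyVertex and Prop are hypothetical, not Titan or TinkerPop classes). It assumes, as described above, that property(key, value) on a LIST key adds a new value and returns it, so calling remove() on the result only deletes the value that was just added.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a vertex with a single LIST-cardinality key.
public class ListPropertyVertex {

    // One stored value; remove() deletes exactly this entry (identity, not value).
    final class Prop {
        final String value;

        Prop(String value) {
            this.value = value;
        }

        void remove() {
            props.remove(this);
        }
    }

    private final List<Prop> props = new ArrayList<>();

    // Models property(key, value) on a LIST key: it always appends a new entry.
    Prop property(String value) {
        Prop p = new Prop(value);
        props.add(p);
        return p;
    }

    // Models properties(key): a snapshot of the existing entries.
    List<Prop> properties() {
        return new ArrayList<>(props);
    }

    List<String> values() {
        List<String> out = new ArrayList<>();
        for (Prop p : props) {
            out.add(p.value);
        }
        return out;
    }

    public static void main(String[] args) {
        ListPropertyVertex v = new ListPropertyVertex();
        for (String role : List.of("role1", "role2", "role3", "role4", "role4")) {
            v.property(role);
        }
        // The question's loop: each call appends a new value, then removes only that one
        for (String val : v.values()) {
            v.property(val).remove();
        }
        System.out.println(v.values()); // [role1, role2, role3, role4, role4]
        // The answer's approach: remove the existing property objects
        for (Prop p : v.properties()) {
            p.remove();
        }
        System.out.println(v.values()); // []
    }
}
```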

Titan: comparison between two query approach

What is the performance difference between
g.query().has("city","mumbai").vertices().iterator().next();
Here each vertex has a property city with the city name mumbai,
and
v.query().direction(Direction.IN).labels("belongTo").vertices();
Here v is the vertex for the city mumbai, and all other vertices are connected to it through edges labeled belongTo.
I want to query something like "all vertices having city mumbai". Which approach is better?
The problem is that a user can enter anything as the city name, e.g. mumbai, mummbai or mubai, so it's not possible to verify the city name. So for mumbai I would have to create mumbai, mummbai and mubai vertices, which is very inefficient.
How will you handle this kind of situation?

Titan's Elasticsearch integration is great for this kind of fuzzy search. Here's an example:
g = TitanFactory.open("conf/titan-cassandra-es.properties")
g.makeKey("city").dataType(String.class).indexed("search", Vertex.class).make()
g.makeKey("info").dataType(String.class).make()
g.makeLabel("belongsTo").make()
g.commit()
cities = ["washington", "mumbai", "phoenix", "uruguay", "pompeji"]
cities.each({ city ->
info = "belongs to ${city}"
g.addVertex(["info":info]).addEdge("belongsTo", g.addVertex(["city":city]))
}); g.commit()
info = { it.getElement().in("belongsTo").info.toList() }
userQueries = ["mumbai", "mummbai", "mubai", "phönix"]
userQueries.collectEntries({ userQuery ->
q = "v.city:${userQuery}~"
v = g.indexQuery("search", q).limit(1).vertices().collect(info).flatten()
[userQuery, v]
})
The last query will give you the following result:
==>mumbai=[belongs to mumbai]
==>mummbai=[belongs to mumbai]
==>mubai=[belongs to mumbai]
==>phönix=[belongs to phoenix]
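The trailing ~ in the query string makes it a fuzzy query, which matches terms within a small edit distance of the input. As a rough, hedged illustration of why mummbai, mubai and phönix still find the right city, the sketch below uses plain Levenshtein distance with at most 2 edits; Lucene does not brute-force every term like this and its exact distance rules may differ, so treat it only as an illustration of the idea. FuzzyCity and its methods are hypothetical.

```java
import java.util.List;

public class FuzzyCity {

    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions that turn a into b (two-row dynamic programming).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;  // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest known city within maxEdits, or null if none is close enough.
    static String closest(String query, List<String> cities, int maxEdits) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String city : cities) {
            int d = distance(query, city);
            if (d <= maxEdits && d < bestDistance) {
                best = city;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("washington", "mumbai", "phoenix", "uruguay", "pompeji");
        for (String q : List.of("mumbai", "mummbai", "mubai", "ph\u00f6nix")) {
            System.out.println(q + "=" + closest(q, cities, 2));
        }
    }
}
```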
Cheers,
Daniel

Multiple key value not printing

Here I am setting multiple values for the key city:
Vertex v = g.addVertex(null);
TitanVertex v2=(TitanVertex)v;
v2.addProperty("city", "NY");
v2.addProperty("city", "WS");
v2.addProperty("city", "PER");
g.commit();
Here I am indexing:
g.makeKey("city").dataType(String.class).indexed("search", Vertex.class).make();
When I do the following:
TitanVertex tv = (TitanVertex)vertex;
Iterator<TitanProperty> iterator = tv.getProperties("city").iterator();
while(iterator.hasNext())
{
TitanProperty next = iterator.next();
System.out.println(next.getValue());
}
It only prints PER, but not NY or WS. Why?

Looks like you need to use .list() to create a multi-value key (otherwise the default is a single-value key; see docs).
Unfortunately, I'm not sure you can use multi-value keys in your external index:
gremlin> g.makeKey("city").list().dataType(String.class).indexed("search", Vertex.class).make();
Only standard index is allowed on list property keys
With a standard index, though:
gremlin> g.makeKey("city").list().dataType(String.class).indexed("standard", Vertex.class).make();
==>city
gremlin> v = g.addVertex(null)
==>v[4080012]
gremlin> v.addProperty("city","NY")
==>e[2esPj-h7oE-h4][4080012-city->NY]
gremlin> v.addProperty("city","WS")
==>e[2esPl-h7oE-h4][4080012-city->WS]
gremlin> v.addProperty("city","PER")
==>e[2esPn-h7oE-h4][4080012-city->PER]
gremlin> g.commit()
==>null
gremlin> v.map
==>{city=[NY, WS, PER]}
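The difference between the two key types can be modeled with a small Java sketch: a single-valued key keeps only the latest value (which is why only PER was printed), while a list key appends. CardinalityDemo and its methods are hypothetical and only illustrate the semantics, not Titan's storage or API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CardinalityDemo {

    enum Cardinality { SINGLE, LIST }

    // Toy property store: SINGLE keys overwrite, LIST keys append.
    static final class Props {
        private final Map<String, Cardinality> schema = new LinkedHashMap<>();
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        void makeKey(String key, Cardinality cardinality) {
            schema.put(key, cardinality);
        }

        void addProperty(String key, String value) {
            // Keys without an explicit definition default to SINGLE in this sketch
            Cardinality c = schema.getOrDefault(key, Cardinality.SINGLE);
            List<String> list = values.computeIfAbsent(key, k -> new ArrayList<>());
            if (c == Cardinality.SINGLE) {
                list.clear();  // a single-valued key keeps only the latest value
            }
            list.add(value);
        }

        List<String> get(String key) {
            return values.getOrDefault(key, List.of());
        }
    }

    public static void main(String[] args) {
        Props single = new Props();
        single.makeKey("city", Cardinality.SINGLE);
        Props list = new Props();
        list.makeKey("city", Cardinality.LIST);
        for (String city : List.of("NY", "WS", "PER")) {
            single.addProperty("city", city);
            list.addProperty("city", city);
        }
        System.out.println(single.get("city")); // [PER]
        System.out.println(list.get("city"));   // [NY, WS, PER]
    }
}
```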