We are using Titan to store graph information, with Cassandra as the storage backend and Elasticsearch as the index backend. We are trying to load graph data so that it can be rendered in the web UI.
This is the approach I am following:
public JSONObject getGraph(long vertexId, final int depth) throws Exception {
    JSONObject json = new JSONObject();
    JSONArray vertices = new JSONArray();
    JSONArray edges = new JSONArray();
    final int currentDepth = 0;
    TitanGraph graph = GraphFactory.getInstance().getGraph();
    TitanTransaction tx = graph.newTransaction();
    try {
        GraphTraversalSource g = tx.traversal();
        Vertex parent = graphDao.getVertex(vertexId);
        loadGraph(g, parent, currentDepth + 1, depth, vertices, edges);
        json.put("vertices", vertices);
        json.put("edges", edges);
        return json;
    } catch (Throwable e) {
        log.error(e.getMessage(), e);
        if (tx != null) {
            tx.rollback();
        }
        throw new Exception(e.getMessage(), e);
    } finally {
        if (tx != null) {
            // ... (cleanup call elided in the post)
        }
    }
}

private void loadGraph(final GraphTraversalSource g, final Vertex vertex, final int currentDepth,
        final int maxDepth, final JSONArray vertices, final JSONArray edges) throws Exception {
    // One backend query per visited vertex
    List<Edge> edgeList = g.V(vertex.id()).outE().toList();
    if (edgeList == null || edgeList.size() <= 0) {
        return;
    }
    for (Edge edge : edgeList) {
        Vertex child = edge.inVertex();
        edges.add(Schema.toJSON(vertex, edge, child));
        if (currentDepth < maxDepth) {
            loadGraph(g, child, currentDepth + 1, maxDepth, vertices, edges);
        }
    }
}
However, this is slow at depth 3: when the tree contains many nodes, loading the data takes about 1 minute.
Is there a more efficient way to load the graph?

You might see better performance by running the whole query as a single traversal, e.g. g.V(vertex.id()).out().out().outE() for the edges at depth 3, but any vertices with very high edge cardinality will make this query slow no matter how you execute it.

To add to the answer by @Benjamin: performing one traversal, as opposed to many small ones that keep expanding, will indeed be faster. Titan uses lazy loading, so you should take advantage of that.
The next thing I would recommend is to multithread your traversals and writes. Titan supports simultaneous writes very nicely; you can achieve this using transactions.
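As an illustration of "one query per level instead of one query per vertex", here is a minimal sketch in plain Java that uses no Titan API. The class LevelBatchedLoader, the EdgeRef type and the fetchOutEdges function are hypothetical names; fetchOutEdges stands in for whatever single backend call returns the out-edges of a whole set of vertices. It also tracks visited vertices so that shared children are not expanded twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Toy model: expands a graph one depth level at a time, with one batched lookup per level. */
final class LevelBatchedLoader {

    /** A directed edge between two vertex ids (hypothetical type for this sketch). */
    static final class EdgeRef {
        final long from;
        final long to;

        EdgeRef(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Collects the edges reachable within maxDepth hops of rootId.
     * fetchOutEdges is assumed to answer for a whole set of ids in one call.
     */
    static List<EdgeRef> load(long rootId, int maxDepth,
                              Function<Set<Long>, List<EdgeRef>> fetchOutEdges) {
        List<EdgeRef> edges = new ArrayList<>();
        Set<Long> visited = new HashSet<>();
        Set<Long> frontier = new LinkedHashSet<>();
        frontier.add(rootId);
        visited.add(rootId);

        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Set<Long> next = new LinkedHashSet<>();
            for (EdgeRef edge : fetchOutEdges.apply(frontier)) {
                edges.add(edge);
                // Only expand vertices that have not been seen before.
                if (visited.add(edge.to)) {
                    next.add(edge.to);
                }
            }
            frontier = next;
        }
        return edges;
    }

    /** In-memory stand-in for the backend: out-edges of every id in the set. */
    static List<EdgeRef> outEdges(Map<Long, List<Long>> adjacency, Set<Long> ids) {
        List<EdgeRef> result = new ArrayList<>();
        for (long id : ids) {
            for (long target : adjacency.getOrDefault(id, List.of())) {
                result.add(new EdgeRef(id, target));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> adjacency = Map.of(
                1L, List.of(2L, 3L),
                2L, List.of(4L),
                3L, List.of(4L));
        List<EdgeRef> edges = load(1L, 2, ids -> outEdges(adjacency, ids));
        System.out.println(edges.size()); // prints 4
    }
}
```

With this shape, a depth-3 load issues at most three lookups regardless of how many vertices each level contains, although a level with very high fan-out will still return a lot of data.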


Spark mapPartitionsToPair execution time

In my current project, we use Spark as the computation engine for one of our workflows.
The workflow is as follows.
We have a product catalog served from several pincodes. A user logged in from a particular pincode should see the least available cost across all serving pincodes.
The least cost is calculated as follows:
product price + dist(pincode1, pincode2)
where pincode2 is the user's pincode and pincode1 is the source pincode. The formula is applied to every source pincode and the lowest result is selected.
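To make the formula concrete, here is a small plain-Java sketch with no Spark or MongoDB involved. The class name LeastCost, the method leastCost and the sample pincodes, prices and distances are all made up for illustration.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Worked example of: least cost = min over source pincodes of (price + distance). */
final class LeastCost {

    /**
     * priceBySource maps a source pincode to the product price there;
     * distanceToUser maps a source pincode to its distance from the user's pincode.
     * Sources with no known distance are skipped.
     */
    static OptionalLong leastCost(Map<Integer, Long> priceBySource,
                                  Map<Integer, Long> distanceToUser) {
        return priceBySource.entrySet().stream()
                .filter(e -> distanceToUser.containsKey(e.getKey()))
                .mapToLong(e -> e.getValue() + distanceToUser.get(e.getKey()))
                .min();
    }

    public static void main(String[] args) {
        Map<Integer, Long> prices = Map.of(110001, 100L, 110002, 90L, 110003, 120L);
        Map<Integer, Long> distances = Map.of(110001, 0L, 110002, 25L, 110003, 5L);
        // Candidate costs: 100 + 0, 90 + 25, 120 + 5
        System.out.println(leastCost(prices, distances).getAsLong()); // prints 100
    }
}
```

Here the cheapest source is the user's own pincode: although 110002 has the lowest price, its distance raises the total to 115.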
My core Spark logic looks like this:
pincodes.javaRDD().cartesian(pincodePrices.javaRDD()).mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row, Row>>, Row, Row>() {
    public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t)
            throws Exception {
        MongoClient mongoclient = MongoClients.create("mongodb://localhost");
        MongoDatabase database = mongoclient.getDatabase("catalogue");
        MongoCollection<Document> pincodeCollection = database.getCollection("pincodedistances");
        List<Tuple2<Row, Row>> list = new LinkedList<>();
        while (t.hasNext()) {
            Tuple2<Row, Row> tuple2 = t.next();
            Row pinRow = tuple2._1;
            Integer srcPincode = pinRow.getAs("pincode");
            Row pricesRow = tuple2._2;
            Row pricesRow1 = (Row) pricesRow.getAs("leastPrice");
            Integer buyingPrice = pricesRow1.getAs("buyingPrice");
            Integer quantity = pricesRow1.getAs("quantity");
            Integer destPincode = pricesRow1.getAs("pincodeNum");
            if (buyingPrice != null && quantity > 0) {
                BasicDBObject dbObject = new BasicDBObject();
                dbObject.append("sourcePincode", srcPincode);
                dbObject.append("destPincode", destPincode);
                Number distance;
                if (srcPincode.intValue() == destPincode.intValue()) {
                    distance = 0;
                } else {
                    // One MongoDB lookup per (source, destination) pair
                    Document document = pincodeCollection.find(dbObject).first();
                    distance = document.get("distance", Number.class);
                }
                double margin = 0.02;
                Long finalPrice = Math.round(buyingPrice + (margin * buyingPrice) + distance.doubleValue());
                //Row finalPriceRow = RowFactory.create(finalPrice,quantity);
                StructType structType = new StructType();
                structType = structType.add("finalPrice", DataTypes.LongType, false);
                structType = structType.add("quantity", DataTypes.LongType, false);
                Object values[] = {finalPrice, quantity};
                Row finalPriceRow = new GenericRowWithSchema(values, structType);
                list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
            }
        }
        return list.iterator();
    }
// reduceByKey wrapper assumed; the post shows only the comparison body below
}).reduceByKey(new Function2<Row, Row, Row>() {
    public Row call(Row priceRow1, Row priceRow2) throws Exception {
        Long finalPrice1 = priceRow1.getAs("finalPrice");
        Long finalPrice2 = priceRow2.getAs("finalPrice");
        if (finalPrice1.longValue() < finalPrice2.longValue()) return priceRow1;
        return priceRow2;
    }
})
// Business logic to push computed price to mongodb
I get the correct answer, but mapPartitionsToPair is slow (~22 seconds for just 12k records).
After some searching I found claims that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key, value) pairs from mapPartitions and then sort them.
Any alternative to the transformations above, or any better approach, would be highly appreciated.
Spark cluster: standalone (1 executor, 6 cores)

Use MongoDB BsonSerializer to serialize and deserialize data

I have complex classes like this:
abstract class Animal { ... }
class Dog : Animal { ... }
class Cat : Animal { ... }
class Farm
{
    public List<Animal> Animals { get; set; }
}
My goal is to send objects from computer A to computer B.
I was able to achieve this with BinaryFormatter serialization. It let me serialize complex classes like Animal in order to transfer objects from computer A to computer B. Serialization was very fast and I only had to place a Serializable attribute on my classes. But BinaryFormatter is now obsolete, and from what I have read future versions of .NET may remove it.
As a result I have these options:
1. Use System.Text.Json. This approach does not work well with polymorphism; in other words, I cannot deserialize an array of cats and dogs, so I will try to avoid it.
2. Use protobuf. I do not want to create protobuf map files for every class; I have over 40 classes, so this is a lot of work. Maybe there is a converter that I am not aware of? Even so, how would the converter know that my array of animals can contain both cats and dogs?
3. Use Newtonsoft (Json.NET). I could use this solution and build something like this: https://stackoverflow.com/a/19308474/637142 . Or, even better, serialize the objects with a type like this: https://stackoverflow.com/a/71398251/637142. So this will probably be my go-to option.
4. Use MongoDB.Bson.Serialization.BsonSerializer. Because I am dealing with a lot of complex objects, we are using MongoDB, which can store a Farm object easily. My goal is to retrieve objects from the database in binary format, send that binary data to another computer, and use BsonSerializer to deserialize it back into objects.
5. Have computer B connect to the database remotely. I cannot use this option because one of our requirements is to do everything through an API. For security reasons we are not allowed to connect remotely to the database.
I am hoping I can use option 4. It would be the most efficient because we are already using MongoDB. Option 3 would work, but it adds extra steps: we do not need the data in JSON format. Why not just send it in binary and deserialize it once it is received by computer B? MongoDB.Driver already does this; I wish I knew how.
This is what I have so far:
MongoClient m = new MongoClient("mongodb://localhost:27017");
var db = m.GetDatabase("TestDatabase");
var collection = db.GetCollection<BsonDocument>("Farms");
// I have 1s and 0s in here.
var binaryData = collection.Find("{}").ToBson();
// this is not readable
var t = System.Text.Encoding.UTF8.GetString(binaryData);
// how can I convert those 0s and 1s to a Farm object?
var collection = db.GetCollection<RawBsonDocument>(nameof(this.Calls));
var sw = new Stopwatch();
var sb = new StringBuilder();
// get items
IEnumerable<RawBsonDocument>? objects = collection.Find("{}").ToList();
sb.Append("TimeToObtainFromDb: ");
var ms = new MemoryStream();
var largestSixe = 0;
// write data to memory stream for demo purposes. on real example I will write this to a tcpSocket
foreach (var item in objects)
var bsonType = item.BsonType;
// write object
var bytes = item.ToBson();
ushort sizeOfBytes = (ushort)bytes.Length;
if (bytes.Length > largestSixe)
largestSixe = bytes.Length;
var size = BitConverter.GetBytes(sizeOfBytes);
sb.Append("time to serialze into bson to memory: ");
// now on the client side on computer B lets pretend we are deserializing the stream
ms.Position = 0;
var clones = new List<Call>();
byte[] sizeOfArray = new byte[2];
byte[] buffer = new byte[102400]; // make this large: a document larger than 102400 bytes will fail!
while (true)
var i = ms.Read(sizeOfArray, 0, 2);
if (i < 1)
var sizeOfBuffer = BitConverter.ToUInt16(sizeOfArray);
int position = 0;
while (position < sizeOfBuffer)
position += ms.Read(buffer, position, sizeOfBuffer - position); // Read returns the bytes read in this call, so accumulate
//using var test = new RawBsonDocument(buffer);
using var test = new RawBsonDocumentWrapper(buffer , sizeOfBuffer);
var identityBson = test.ToBsonDocument();
var cc = BsonSerializer.Deserialize<Call>(identityBson);
sb.Append("time to deserialize from memory into clones: ");
var serializedjs = new List<string>();
foreach(var item in clones)
var foo = item.SerializeToJsStandards();
if (foo.Contains("jaja"))
throw new Exception();
sb.Append("time to serialze into js: ");
foreach(var item in serializedjs)
var obj = item.DeserializeUsingJsStandards<Call>();
if (obj is null)
throw new Exception();
if (obj.IdAccount.Contains("jsfjklsdfl"))
throw new Exception();
catch(Exception ex)
sb.Append("time to deserialize js: ");
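The framing used in the loop above (a length prefix followed by the document bytes) can be sketched independently of BSON. The following is a minimal Java illustration (Java rather than C#, to keep one language for the illustrative snippets here); the class and method names are hypothetical. It assumes a 4-byte big-endian prefix instead of the 2-byte one above, so that frames larger than 65,535 bytes are not truncated, and it uses readFully so that a short read cannot split a frame. Both ends must agree on the prefix width and byte order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Length-prefixed framing: each payload is written as [4-byte length][bytes]. */
final class LengthPrefixedFrames {

    /** Sender side: concatenates all payloads into one framed byte stream. */
    static byte[] writeFrames(List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            for (byte[] payload : payloads) {
                out.writeInt(payload.length); // big-endian 4-byte prefix
                out.write(payload);
            }
        }
        return sink.toByteArray();
    }

    /** Receiver side: splits the stream back into the original payloads. */
    static List<byte[]> readFrames(byte[] stream) throws IOException {
        List<byte[]> payloads = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    break; // no further frame prefix available
                }
                byte[] payload = new byte[length]; // sized per frame, no fixed buffer
                in.readFully(payload);             // blocks until the whole frame is read
                payloads.add(payload);
            }
        }
        return payloads;
    }
}
```

Each payload returned by readFrames would then be handed to the deserializer, in the same way the C# loop passes the buffer to BsonSerializer.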

Indexing fails on enabling force index in Titan/Janus

I've written a JUnit test that checks whether marko exists in the graph created by generate-modern.groovy.
My Gremlin query is
As you can see in the generate-modern.groovy file, an index is already defined on the name property of person.
I later set the following
property to true in the dynamodb.properties file, which blocks full graph scans and thereby makes index use mandatory.
However, this throws the following exception:
org.janusgraph.core.JanusGraphException: Could not find a suitable index to answer graph query and graph scans are disabled: [(name = marko)]:VERTEX
The exception is raised from the following method of the StandardJanusGraphTx class:
public Iterator<JanusGraphElement> execute(final GraphCentricQuery query, final JointIndexQuery indexQuery, final Object exeInfo, final QueryProfiler profiler) {
    Iterator<JanusGraphElement> iter;
    if (!indexQuery.isEmpty()) {
        List<QueryUtil.IndexCall<Object>> retrievals = new ArrayList<QueryUtil.IndexCall<Object>>();
        for (int i = 0; i < indexQuery.size(); i++) {
            final JointIndexQuery.Subquery subquery = indexQuery.getQuery(i);
            retrievals.add(new QueryUtil.IndexCall<Object>() {
                public Collection<Object> call(int limit) {
                    final JointIndexQuery.Subquery adjustedQuery = subquery.updateLimit(limit);
                    try {
                        return indexCache.get(adjustedQuery, new Callable<List<Object>>() {
                            public List<Object> call() throws Exception {
                                return QueryProfiler.profile(subquery.getProfiler(), adjustedQuery, q -> indexSerializer.query(q, txHandle));
                            }
                        });
                    } catch (Exception e) {
                        throw new JanusGraphException("Could not call index", e.getCause());
                    }
                }
            });
        }
        List<Object> resultSet = QueryUtil.processIntersectingRetrievals(retrievals, indexQuery.getLimit());
        iter = com.google.common.collect.Iterators.transform(resultSet.iterator(), getConversionFunction(query.getResultType()));
    } else {
        // No usable index: either fail (force-index) or fall back to a full scan
        if (config.hasForceIndexUsage()) throw new JanusGraphException("Could not find a suitable index to answer graph query and graph scans are disabled: " + query);
        log.warn("Query requires iterating over all vertices [{}]. For better performance, use indexes", query.getCondition());
        QueryProfiler sub = profiler.addNested("scan");
        switch (query.getResultType()) {
            case VERTEX:
                return (Iterator) getVertices().iterator();
            case EDGE:
                return (Iterator) getEdges().iterator();
            case PROPERTY:
                return new VertexCentricEdgeIterable(getInternalVertices(), RelationCategory.PROPERTY).iterator();
            default:
                throw new IllegalArgumentException("Unexpected type: " + query.getResultType());
        }
    }
    return iter;
}
As you can see from the method, the exception is raised when the JointIndexQuery object is empty (its list is empty) and force index is true.
The question is: why is the list empty when generate-modern.groovy defines an index on the name property and the query is run from a JUnit test? The same query works fine (the list is not empty) when the same data is preloaded into Gremlin Server with the same file.
The personByName index definition uses a label constraint.
def personByName = mgmt.buildIndex("personByName", Vertex.class).addKey(name).indexOnly(person).buildCompositeIndex()
In order to take advantage of that index, you must use the label and the property. For example:
g.V().has('person', 'name', 'marko')
You can read more about this in the JanusGraph documentation http://docs.janusgraph.org/latest/indexes.html#_label_constraint
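The effect of a label constraint on index selection can be pictured with a toy model. This is not JanusGraph's implementation; the class LabelConstrainedIndex and its canAnswer method are hypothetical and only mirror the rule described above: an index built with indexOnly(label) is a candidate only when the query also specifies that label.

```java
/** Toy model of index eligibility with an optional label constraint. */
final class LabelConstrainedIndex {
    private final String indexedKey;
    private final String onlyLabel; // null means the index has no label constraint

    LabelConstrainedIndex(String indexedKey, String onlyLabel) {
        this.indexedKey = indexedKey;
        this.onlyLabel = onlyLabel;
    }

    /**
     * Returns true when this index could serve an equality query on queryKey.
     * queryLabel is null when the query does not specify a label.
     */
    boolean canAnswer(String queryLabel, String queryKey) {
        if (!indexedKey.equals(queryKey)) {
            return false;
        }
        // A constrained index is only usable when the query names the same label.
        return onlyLabel == null || onlyLabel.equals(queryLabel);
    }

    public static void main(String[] args) {
        LabelConstrainedIndex personByName = new LabelConstrainedIndex("name", "person");
        // Models g.V().has('name', 'marko'): no label, so the index is skipped.
        System.out.println(personByName.canAnswer(null, "name"));     // prints false
        // Models g.V().has('person', 'name', 'marko'): label matches.
        System.out.println(personByName.canAnswer("person", "name")); // prints true
    }
}
```

In this model, a query without a label finds no eligible index, which corresponds to the empty index query list that triggers the exception when graph scans are disabled.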

Graph processing increasingly gets slower on titan + dynamoDB (local) as more vertices/edges are added

I am working with titan 1.0 using AWS dynamoDB local implementation as storage backend on a 16GB machine. My use case involves generating graphs periodically containing vertices & edges in the order of 120K. Every time I generate a new graph in-memory, I check the graph stored in DB and either (i) add vertices/edges that do not exist, or (ii) update properties if they already exist (existence is determined by 'Label' and a 'Value' attribute). Note that the 'Value' property is indexed. Transactions are committed in batches of 500 vertices.
Problem: I find that this process gets slower each time I process a new graph (1st graph finished in 45 mins with empty db initially, 2nd took 2.5 hours, 3rd in 3.5 hours, 4th in 6 hours, 5th in 10 hours and so on). In fact, when processing a given graph, it is fairly quick at start time but progressively gets slower (initial batches take 2-4 secs and later on it increases to 100s of seconds for same batch size of 500 nodes; I also see sometimes it takes 1000-2000 secs for a batch). This is the processing time alone (see approach below); commit takes between 8-10 secs always. I configured the jvm heap size to 10G, and I notice that when the app is running it is eventually using up all of it.
Question: Is this behavior to be expected? It seems to me something is wrong here (either in my config / approach?). Any help or suggestions would be greatly appreciated.
Starting from the root node of the in-memory graph, I retrieve all child nodes and maintain a queue
For each child node, I check to see if it exists in DB, else create new node, and update some properties
Vertex dbVertex = dbgraph.traversal().V()
        .has(currentVertexInMem.label(), "Value",
                (String) currentVertexInMem.value("Value"))
        .tryNext() // Optional<Vertex>: empty when no matching vertex exists
        .orElseGet(() -> createVertex(dbgraph, currentVertexInMem));
if (dbVertex != null) {
// Update Properties
updateVertexProperties(dbgraph, currentVertexInMem, dbVertex);
// Add edge if necessary
if (parentDBVertex != null) {
GraphTraversal<Vertex, Edge> edgeIt = graph.traversal().V(parentDBVertex).outE()
.has("EdgeProperty1", eProperty1) // eProperty1 is String input parameter
.has("EdgeProperty2", eProperty2); // eProperty2 is Long input parameter
Boolean doCreateEdge = true;
Edge e = null;
while (edgeIt.hasNext()) {
e = edgeIt.next();
if (e.inVertex().equals(dbVertex)) {
doCreateEdge = false;
if (doCreateEdge) {
e = parentDBVertex.addEdge("EdgeLabel", dbVertex, "EdgeProperty1", eProperty1, "EdgeProperty2", eProperty2);
e = null;
it = null;
if ((processedVertexCount.get() % 500 == 0)
|| processedVertexCount.get() == verticesToProcess.get()) {
Create function:
public static Vertex createVertex(Graph graph, Vertex clientVertex) {
Vertex newVertex = null;
switch (clientVertex.label()) {
case "Label 1":
newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
"Property1-1", clientVertex.value("Property1-1"),
"Property1-2", clientVertex.value("Property1-2"));
case "Label 2":
newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
clientVertex.value("Value"), "Property2-1",
"Property2-2", clientVertex.value("Property2-2"));
newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
return newVertex;
Schema Def: (Showing some of the indexes)
"EdgeLabel" = Constants.EdgeLabels.Uses
"EdgeProperty1" = Constants.EdgePropertyKeys.EndpointId
"EdgeProperty2" = Constants.EdgePropertyKeys.Timestamp
public void createSchema() {
// Create Schema
TitanManagement mgmt = dbgraph.openManagement();
// Vertex Properties
PropertyKey value = mgmt.getPropertyKey(Constants.VertexPropertyKeys.Value);
if (value == null) {
value = mgmt.makePropertyKey(Constants.VertexPropertyKeys.Value).dataType(String.class).make();
mgmt.buildIndex(Constants.GraphIndexes.ByValue, Vertex.class).addKey(value).buildCompositeIndex(); // INDEX
PropertyKey shapeSet = mgmt.getPropertyKey(Constants.VertexPropertyKeys.ShapeSet);
if (shapeSet == null) {
shapeSet = mgmt.makePropertyKey(Constants.VertexPropertyKeys.ShapeSet).dataType(String.class).cardinality(Cardinality.SET).make();
mgmt.buildIndex(Constants.GraphIndexes.ByShape, Vertex.class).addKey(shapeSet).buildCompositeIndex();
// Edge Labels and Properties
EdgeLabel uses = mgmt.getEdgeLabel(Constants.EdgeLabels.Uses);
if (uses == null) {
uses = mgmt.makeEdgeLabel(Constants.EdgeLabels.Uses).multiplicity(Multiplicity.MULTI).make();
PropertyKey timestampE = mgmt.getPropertyKey(Constants.EdgePropertyKeys.Timestamp);
if (timestampE == null) {
timestampE = mgmt.makePropertyKey(Constants.EdgePropertyKeys.Timestamp).dataType(Long.class).make();
PropertyKey endpointIDE = mgmt.getPropertyKey(Constants.EdgePropertyKeys.EndpointId);
if (endpointIDE == null) {
endpointIDE = mgmt.makePropertyKey(Constants.EdgePropertyKeys.EndpointId).dataType(String.class).make();
// Indexes
mgmt.buildEdgeIndex(uses, Constants.EdgeIndexes.ByEndpointIDAndTimestamp, Direction.BOTH, endpointIDE,
The behavior you experience is expected. Today, DynamoDB Local is a testing tool built on SQLite. If you need to support high TPS for large and periodic data loads, I recommend you use the DynamoDB service.

Bulk removal of Edges on Titan 1.0

I have a long list of edge IDs (about 12 billion) that I want to remove from my Titan graph (which is hosted on an HBase backend).
How can I do it quickly and efficiently?
I tried removing the edges via Gremlin, but that is too slow for that many edges.
Is it possible to perform Delete commands directly on HBase? How can I do it? (How do I assemble the key to delete?)
After two days of research, I came up with a solution.
The goal: given a very large collection of string edge IDs, implement logic that removes them from the graph.
The implementation has to support the removal of billions of edges, so it must be efficient in memory and time.
Using Titan directly is ruled out, since Titan performs many redundant instantiations; we don't want to load the edges, we just want to remove them from HBase.
/**
 * Deletes the given edge IDs, splitting them into chunks of 100,000
 * @param edgeIds Collection of edge IDs to delete
 * @throws IOException
 */
public static void deleteEdges(Iterator<String> edgeIds) throws IOException {
    IDManager idManager = new IDManager(NumberUtil.getPowerOf2(GraphDatabaseConfiguration.CLUSTER_MAX_PARTITIONS.getDefaultValue()));
    byte[] columnFamilyName = "e".getBytes(); // 'e' is your edgestore column-family name
    long deletionTimestamp = System.currentTimeMillis();
    int chunkSize = 100000; // Edge IDs sent to HBase per call; each edge yields two Deletes (one as IN, one as OUT)

    org.apache.hadoop.conf.Configuration config = new org.apache.hadoop.conf.Configuration();
    config.set("hbase.zookeeper.quorum", "YOUR-ZOOKEEPER-HOSTNAME");
    config.set("hbase.table", "YOUR-HBASE-TABLE");

    List<Delete> deletions = Lists.newArrayListWithCapacity(chunkSize);
    Connection connection = ConnectionFactory.createConnection(config);
    Table table = connection.getTable(TableName.valueOf(config.get("hbase.table")));
    Iterators.partition(edgeIds, chunkSize)
            .forEachRemaining(edgeIdsChunk -> deleteEdgesChunk(edgeIdsChunk, deletions, table, idManager,
                    columnFamilyName, deletionTimestamp));
}

/**
 * Given a collection of edge IDs, and a list of Delete objects (that is cleared on entrance),
 * creates two Delete objects for each edge (one for IN and one for OUT),
 * and deletes them via the given Table instance
 */
public static void deleteEdgesChunk(List<String> edgeIds, List<Delete> deletions, Table table, IDManager idManager,
        byte[] columnFamilyName, long deletionTimestamp) {
    deletions.clear();

    for (String edgeId : edgeIds) {
        RelationIdentifier identifier = RelationIdentifier.parse(edgeId);
        // Row of the IN vertex
        deletions.add(createEdgeDelete(idManager, columnFamilyName, deletionTimestamp, identifier.getRelationId(),
                identifier.getTypeId(), identifier.getInVertexId(), identifier.getOutVertexId(),
                IDHandler.DirectionID.EDGE_IN_DIR));
        // Row of the OUT vertex
        deletions.add(createEdgeDelete(idManager, columnFamilyName, deletionTimestamp, identifier.getRelationId(),
                identifier.getTypeId(), identifier.getOutVertexId(), identifier.getInVertexId(),
                IDHandler.DirectionID.EDGE_OUT_DIR));
    }

    try {
        table.delete(deletions);
    } catch (IOException e) {
        logger.error("Failed to delete a chunk due to inner exception: " + e);
    }
}

/**
 * Creates an HBase Delete object for a specific edge
 * @return HBase Delete object to be used against HBase
 */
private static Delete createEdgeDelete(IDManager idManager, byte[] columnFamilyName, long deletionTimestamp,
        long relationId, long typeId, long vertexId, long otherVertexId,
        IDHandler.DirectionID directionID) {
    byte[] vertexKey = idManager.getKey(vertexId).getBytes(0, 8); // Size of a long
    byte[] edgeQualifier = makeQualifier(relationId, otherVertexId, directionID, typeId);
    return new Delete(vertexKey)
            .addColumn(columnFamilyName, edgeQualifier, deletionTimestamp);
}

/**
 * Cell Qualifier for a specific edge
 */
private static byte[] makeQualifier(long relationId, long otherVertexId, IDHandler.DirectionID directionID, long typeId) {
    WriteBuffer out = new WriteByteBuffer(32); // Default length of array is 32, feel free to increase
    IDHandler.writeRelationType(out, typeId, directionID, false);
    VariableLong.writePositiveBackward(out, otherVertexId);
    VariableLong.writePositiveBackward(out, relationId);
    return out.getStaticBuffer().getBytes(0, out.getPosition());
}
Keep in mind that I do not handle system types; I assume that the given edge IDs are user edges.
Using this implementation I was able to remove 20 million edges in about 2 minutes.
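The chunking step can also be written without Guava. Below is a minimal plain-Java sketch of the same idea as Iterators.partition followed by forEachRemaining: it drains an iterator in fixed-size chunks so that only one chunk is held in memory at a time. The class ChunkedIterator and the method forEachChunk are hypothetical names, and the handler stands in for the per-chunk HBase call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Drains an iterator in fixed-size chunks, handing each chunk to a handler. */
final class ChunkedIterator {

    /** Returns the number of chunks passed to the handler; the last one may be shorter. */
    static <T> int forEachChunk(Iterator<T> items, int chunkSize, Consumer<List<T>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> buffer = new ArrayList<>(chunkSize);
        while (items.hasNext()) {
            buffer.add(items.next());
            if (buffer.size() == chunkSize) {
                handler.accept(new ArrayList<>(buffer)); // copy so the handler may keep it
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            handler.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<String> edgeIds = List.of("a", "b", "c", "d", "e", "f", "g").iterator();
        int chunks = forEachChunk(edgeIds, 3, chunk -> System.out.println(chunk.size()));
        System.out.println(chunks); // prints 3, after the chunk sizes 3, 3 and 1
    }
}
```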