Indexing fails on enabling force index in Titan/Janus

I've written a JUnit test that checks against the generate-modern.groovy graph whether marko exists.
My Gremlin query is:
"g.V().has('name','marko')";
As you can see in the generate-modern.groovy file, indexing is already applied on the name property of person.
I later set the following property in the dynamodb.properties file, which blocks whole-graph scans, thereby making index usage mandatory:
query.force-index=true
However, it throws the following exception:
org.janusgraph.core.JanusGraphException: Could not find a suitable index to answer graph query and graph scans are disabled: [(name = marko)]:VERTEX
The above exception is raised from the following method of the StandardJanusGraphTx class:
@Override
public Iterator<JanusGraphElement> execute(final GraphCentricQuery query, final JointIndexQuery indexQuery, final Object exeInfo, final QueryProfiler profiler) {
    Iterator<JanusGraphElement> iter;
    if (!indexQuery.isEmpty()) {
        List<QueryUtil.IndexCall<Object>> retrievals = new ArrayList<QueryUtil.IndexCall<Object>>();
        for (int i = 0; i < indexQuery.size(); i++) {
            final JointIndexQuery.Subquery subquery = indexQuery.getQuery(i);
            retrievals.add(new QueryUtil.IndexCall<Object>() {
                @Override
                public Collection<Object> call(int limit) {
                    final JointIndexQuery.Subquery adjustedQuery = subquery.updateLimit(limit);
                    try {
                        return indexCache.get(adjustedQuery, new Callable<List<Object>>() {
                            @Override
                            public List<Object> call() throws Exception {
                                return QueryProfiler.profile(subquery.getProfiler(), adjustedQuery, q -> indexSerializer.query(q, txHandle));
                            }
                        });
                    } catch (Exception e) {
                        throw new JanusGraphException("Could not call index", e.getCause());
                    }
                }
            });
        }
        List<Object> resultSet = QueryUtil.processIntersectingRetrievals(retrievals, indexQuery.getLimit());
        iter = com.google.common.collect.Iterators.transform(resultSet.iterator(), getConversionFunction(query.getResultType()));
    } else {
        if (config.hasForceIndexUsage()) throw new JanusGraphException("Could not find a suitable index to answer graph query and graph scans are disabled: " + query);
        log.warn("Query requires iterating over all vertices [{}]. For better performance, use indexes", query.getCondition());
        QueryProfiler sub = profiler.addNested("scan");
        sub.setAnnotation(QueryProfiler.QUERY_ANNOTATION, indexQuery);
        sub.setAnnotation(QueryProfiler.FULLSCAN_ANNOTATION, true);
        sub.setAnnotation(QueryProfiler.CONDITION_ANNOTATION, query.getResultType());
        switch (query.getResultType()) {
            case VERTEX:
                return (Iterator) getVertices().iterator();
            case EDGE:
                return (Iterator) getEdges().iterator();
            case PROPERTY:
                return new VertexCentricEdgeIterable(getInternalVertices(), RelationCategory.PROPERTY).iterator();
            default:
                throw new IllegalArgumentException("Unexpected type: " + query.getResultType());
        }
    }
    return iter;
}
As you can observe from the method, the exception is raised when the JointIndexQuery object is empty (the ArrayList being empty) and force-index is true.
The question is why the list is empty, when we have defined the index on the name property in generate-modern.groovy and are querying from a JUnit test. The same query works fine (meaning the list is not empty) when the same data is preloaded into the Gremlin Server with the same file.

The personByName index definition uses a label constraint.
def personByName = mgmt.buildIndex("personByName", Vertex.class).addKey(name).indexOnly(person).buildCompositeIndex()
In order to take advantage of that index, you must use the label and the property. For example:
g.V().has('person', 'name', 'marko')
You can read more about this in the JanusGraph documentation http://docs.janusgraph.org/latest/indexes.html#_label_constraint
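For reference, here is a minimal sketch of how the corrected traversal might look inside a JUnit test. The `graph` variable and the setup are assumptions, not from the original test:

import static org.junit.Assert.assertFalse;

import java.util.List;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Minimal sketch, assuming `graph` is a JanusGraph instance already loaded
// via generate-modern.groovy with the personByName index defined above.
GraphTraversalSource g = graph.traversal();

// Including the label lets the query planner match personByName, which was
// built with indexOnly(person) and therefore carries a label constraint.
List<Vertex> result = g.V().has("person", "name", "marko").toList();
assertFalse(result.isEmpty());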

Related

Webflux - return Flux or error after a condition

I'm learning reactive programming with webflux, and for that I'm migrating some code.
For example I'm trying to migrate this method:
public Set<Vaccine> getAll(Set<Long> vaccinesIds) throws EntityNotFoundException {
    if (null == vaccinesIds) {
        return null;
    }
    Set<Long> vaccinesToFind = new HashSet<>(vaccinesIds);
    vaccinesToFind.remove(null);
    Set<Vaccine> vaccines = new HashSet<>();
    vaccineRepository.findByIdIn(vaccinesToFind).forEach(vaccines::add);
    if (vaccines.size() != vaccinesToFind.size()) {
        // removeAll returns a boolean, so compute the missing ids first, then log them
        vaccinesToFind.removeAll(vaccines.stream().map(Vaccine::getId).collect(Collectors.toSet()));
        LOG.warn("Could not find vaccines with ids: " + vaccinesToFind);
        throw new EntityNotFoundException(VACCINE_ERROR_NOT_FOUND);
    }
    return vaccines;
}
To summarize: if the repository returns all the vaccines that were requested, the method should return the result; if not, it should return an error.
For that, I came up with something like this, but it is not working:
public Flux<Vaccine> getAll(Set<Long> vaccinesIds) {
    if (null == vaccinesIds) {
        return Flux.empty();
    }
    Set<Long> vaccinesToFind = new HashSet<>(vaccinesIds);
    Flux<Vaccine> byIdIn = vaccineRepository.findByIdIn(vaccinesToFind);
    Mono<Long> filter = vaccineRepository.findByIdIn(vaccinesToFind).count()
            .filter(x -> x.equals(Long.valueOf(vaccinesToFind.size())));
    return filter.flatMapMany(asd -> vaccineRepository.findByIdIn(vaccinesToFind))
            .switchIfEmpty(Flux.error(new EntityNotFoundException(VACCINE_ERROR_NOT_FOUND)));
}
What am I doing wrong?
My first doubt is why filter is a Mono<Long> when it ends with an equals call. My main problem is evaluating the filter in order to return either the list or the error.
First of all, you are running the same query, vaccineRepository.findByIdIn(vaccinesToFind), multiple times. The same data is queried, transferred, and deserialized repeatedly; this is a sign that something is wrong here.
Let's assume the result set fits into memory. Then the idea is to collect the flux into an ordinary collection and decide whether to emit an error:
return vaccineRepository.findByIdIn(vaccinesIds)
        .collectList()
        .flatMapMany(result -> {
            if (result.size() == vaccinesIds.size()) return Flux.fromIterable(result);
            else return Flux.error(new EntityNotFoundException(VACCINE_ERROR_NOT_FOUND));
        });
In case the result is too large for main memory, you could count in the database with a first query and, in the positive case, fetch the results with a second one. The solution is similar to your code:
return vaccineRepository.countByIdIn(vaccinesIds)
        .filter(count -> count == vaccinesIds.size())
        .flatMapMany($ -> vaccineRepository.findByIdIn(vaccinesIds))
        .switchIfEmpty(Mono.error(new EntityNotFoundException(VACCINE_ERROR_NOT_FOUND)));
The result of filter is Mono<Long> because filter just takes the element from upstream and tests it against the given predicate. If the predicate returns false, the item is filtered out and the Mono completes empty. To keep the result of the test itself you could use map instead, and the type would then be Mono<Boolean>.
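A tiny illustration of that difference, as a sketch with made-up values:

import reactor.core.publisher.Mono;

public class FilterVsMap {
    public static void main(String[] args) {
        Mono<Long> count = Mono.just(3L);

        // filter: keeps the value only when the predicate passes;
        // otherwise the resulting Mono completes empty.
        count.filter(x -> x == 4L)
             .subscribe(v -> System.out.println("filter emitted " + v)); // prints nothing

        // map: always emits the result of the test itself,
        // so the element type becomes Boolean.
        count.map(x -> x == 4L)
             .subscribe(v -> System.out.println("map emitted " + v));    // prints "map emitted false"
    }
}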

Using a Beakerx Custom Magic

I've created a custom magic command with the intention of generating a Spark query programmatically. Here's the relevant part of my class that implements MagicCommandFunctionality:
MagicCommandOutcomeItem execute(MagicCommandExecutionParam magicCommandExecutionParam) {
    // get the string that was entered:
    String input = magicCommandExecutionParam.command.substring(MAGIC.length())
    // use the input to generate a query
    String generatedQuery = Interpreter.interpret(input)
    MIMEContainer result = Text(generatedQuery);
    return new MagicCommandOutput(MagicCommandOutcomeItem.Status.OK, result.getData().toString());
}
This works splendidly: it returns the command that I generated, as text.
My question is: how do I coerce the notebook into evaluating that value in the cell? My guess is that a SimpleEvaluationObject and a TryResult are involved, but I can't find any examples of their use.
Rather than creating the MagicCommandOutput myself, I probably want the kernel to create one for me. I see that KernelMagicCommand has an execute method that would do that. Does anyone have any ideas?
Okay, I found one way to do it. Here's my solution:
You can ask the current KernelManager for the kernel you're interested in, then call PythonEntryPoint.evaluate. It seems to do the job!
@Override
MagicCommandOutcomeItem execute(MagicCommandExecutionParam magicCommandExecutionParam) {
    String input = magicCommandExecutionParam.command.substring(MAGIC.length() + 1)
    // this is the Scala code I want to evaluate:
    String codeToExecute = <your code here>
    KernelFunctionality kernel = KernelManager.get()
    PythonEntryPoint pep = kernel.getPythonEntryPoint(SCALA_KERNEL)
    pep.evaluate(codeToExecute)
    pep.getShellMsg()
    List<Message> messages = new ArrayList<>()
    // while messages are available on the iopub channel, collect them into the response
    while (true) {
        String iopubMsg = pep.getIopubMsg()
        if (iopubMsg == "null") break
        try {
            Message msg = parseMessage(iopubMsg) // (parsing not shown here)
            messages.add(msg)
            String commId = (String) msg.getContent().get("comm_id")
            if (commId != null) {
                kernel.addCommIdManagerMapping(commId, SCALA_KERNEL)
            }
        } catch (IOException e) {
            log.error("There was an error: ${e.getMessage()}")
            return new MagicKernelResponse(MagicCommandOutcomeItem.Status.ERROR, messages)
        }
    }
    return new MagicKernelResponse(MagicCommandOutcomeItem.Status.OK, messages)
}

Extracting the values of elements (XML) which is in an array and put it in the message property in Camel

As you can see, AvailabilityFlag and BeamID are repeated. How do I traverse the payload and set properties such as AvailabilityFlag1 and so on, so that I can later fetch them with a Velocity template?
Payload:
<ns2:TransportFeasibilityResponseMsg>
    <ns2:TransportFeasibilityResponse>
        <ns2:Parameters>
            <ns2:AvailabilityFlag>true</ns2:AvailabilityFlag>
            <ns2:SatellitedID>H1B</ns2:SatellitedID>
            <ns2:BeamID>675</ns2:BeamID>
            <ns2:TransportName>Earth</ns2:TransportName>
        </ns2:Parameters>
        <ns2:Parameters>
            <ns2:AvailabilityFlag>true</ns2:AvailabilityFlag>
            <ns2:SatellitedID>J34</ns2:SatellitedID>
            <ns2:BeamID>111</ns2:BeamID>
            <ns2:TransportName>Jupiter</ns2:TransportName>
        </ns2:Parameters>
    </ns2:TransportFeasibilityResponse>
</ns2:TransportFeasibilityResponseMsg>
Code (it's not complete):
public static HashMap<String, String> extractNameValueToProperties(String msgBody, String[] selectedKeyList, String[] namelist) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    factory.setExpandEntityReferences(false);
    factory.setNamespaceAware(true);
    Document doc = null;
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        doc = builder.parse(new InputSource(new StringReader(msgBody)));
    } catch (Exception ex) {
        throw new Exception("Exception while extracting tag values", ex);
    }
    HashMap<String, String> tagNameValueMap = new HashMap<String, String>();
    NodeList nodeList = doc.getElementsByTagName("*");
    int k = 0;
    // Trying to walk into the TransportFeasibilityResponse element
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node indNode = nodeList.item(i);
        // Checking for AvailabilityFlag and the other names in namelist --
        // this is the part I cannot get right for the repeating elements
        String dataKey = indNode.getTextContent();
        tagNameValueMap.put(selectedKeyList[k], dataKey);
        k++;
    }
    return tagNameValueMap;
}
Here is how I am setting these values in my route:
<setProperty propertyName="namelist">
<constant>AvailabilityFlag,SatellitedID,BeamID</constant>
</setProperty>
<setProperty propertyName="selectedKeyList">
<constant>AvailabilityFlag1,SatellitedID1,BeamID1,AvailabilityFlag2,SatellitedID2,BeamID2 </constant>
</setProperty>
<bean beanType="com.gdg.dgdgdg.javacodename" method="extractNameValueToProperties"/>
Question: how can I parse through the repeating elements and assign them to properties?
Thanks
I'm not sure I understand your question correctly, but I think you could use the Splitter pattern to split your XML per Parameters tag, process each part separately, and aggregate the results later.
Take for example this input:
<TransportFeasibilityResponse>
    <Parameters>
        <AvailabilityFlag>true</AvailabilityFlag>
        <SatellitedID>H1B</SatellitedID>
        <BeamID>675</BeamID>
        <TransportName>Earth</TransportName>
    </Parameters>
    <Parameters>
        <AvailabilityFlag>true</AvailabilityFlag>
        <SatellitedID>J34</SatellitedID>
        <BeamID>111</BeamID>
        <TransportName>Jupiter</TransportName>
    </Parameters>
</TransportFeasibilityResponse>
A route to process this input could be something like this:
from("direct:start")
.split(xpath("/TransportFeasibilityResponse/Parameters"), new AggregationStrategy() {
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
List<String> beamIDs = null;
if (oldExchange == null) { // first
beamIDs = new ArrayList<String>();
} else {
beamIDs = oldExchange.getIn().getBody(List.class);
}
beamIDs.add(newExchange.getIn().getBody(String.class));
newExchange.getIn().setBody(beamIDs);
return newExchange;
}
})
.setBody(xpath("/Parameters/BeamID/text()"))
.end()
.log("The final body: ${body}");
First, we split the input per Parameters tag and extract the BeamID from each part. After that, the AggregationStrategy collects the BeamID of each split message into a single list.
The final message should have a body like this:
[675, 111]
The data I put in the body is just an example; you could store whatever you want in the Exchange you are manipulating inside the AggregationStrategy implementation.
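For instance, if you specifically need the numbered properties from the question (AvailabilityFlag1, BeamID1, and so on), here is a sketch of an AggregationStrategy that numbers each split part and copies its values into exchange properties. It is only an illustration under assumptions: it presumes the inner route has stored each extracted value in a header of the same name, and the paramCount property name is made up.

import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy; // org.apache.camel.AggregationStrategy in Camel 3.x

public class NumberedPropertiesStrategy implements AggregationStrategy {
    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // Keep aggregating onto the first exchange we saw.
        Exchange result = (oldExchange == null) ? newExchange : oldExchange;

        // Running index of how many Parameters blocks have been seen so far.
        int index = result.getProperty("paramCount", 0, Integer.class) + 1;
        result.setProperty("paramCount", index);

        // Copy the values extracted by the inner route (assumed to be in headers)
        // into numbered exchange properties for the Velocity template.
        result.setProperty("AvailabilityFlag" + index, newExchange.getIn().getHeader("AvailabilityFlag"));
        result.setProperty("SatellitedID" + index, newExchange.getIn().getHeader("SatellitedID"));
        result.setProperty("BeamID" + index, newExchange.getIn().getHeader("BeamID"));
        return result;
    }
}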

Load the graph from the Titan DB for a specified depth in a single or efficient query

We are using Titan DB to store graph information, with Cassandra + Elasticsearch as the backend storage and index. We are trying to load the graph data in order to represent the graph in the web UI.
This is the approach I am following:
public JSONObject getGraph(long vertexId, final int depth) throws Exception {
    JSONObject json = new JSONObject();
    JSONArray vertices = new JSONArray();
    JSONArray edges = new JSONArray();
    final int currentDepth = 0;
    TitanGraph graph = GraphFactory.getInstance().getGraph();
    TitanTransaction tx = graph.newTransaction();
    try {
        GraphTraversalSource g = tx.traversal();
        Vertex parent = graphDao.getVertex(vertexId);
        loadGraph(g, parent, currentDepth + 1, depth, vertices, edges);
        json.put("vertices", vertices);
        json.put("edges", edges);
        return json;
    } catch (Throwable e) {
        log.error(e.getMessage(), e);
        if (tx != null) {
            tx.rollback();
        }
        throw new Exception(e.getMessage(), e);
    } finally {
        if (tx != null) {
            tx.close();
        }
    }
}
private void loadGraph(final GraphTraversalSource g, final Vertex vertex, final int currentDepth,
        final int maxDepth, final JSONArray vertices, final JSONArray edges) throws Exception {
    vertices.add(toJSON(vertex));
    List<Edge> edgeList = g.V(vertex.id()).outE().toList();
    if (edgeList == null || edgeList.size() <= 0) {
        return;
    }
    for (Edge edge : edgeList) {
        Vertex child = edge.inVertex();
        edges.add(Schema.toJSON(vertex, edge, child));
        if (currentDepth < maxDepth) {
            loadGraph(g, child, currentDepth + 1, maxDepth, vertices, edges);
        }
    }
}
But this takes quite a lot of time at depth 3: when more nodes exist in the tree, it takes about a minute to load the data.
Are there better mechanisms to load the graph efficiently?
You might see better performance by performing your full query in a single traversal execution, e.g., g.V(vertex.id()).out().out().outE() for depth 3, but any vertices with very high edge cardinality are going to make this query slow no matter how you execute it.
To add to the answer by @Benjamin: performing one traversal, as opposed to many little ones which are constantly expanding, will indeed be faster. Titan uses lazy loading, so you should take advantage of that.
The next thing I would recommend is to multithread your traversals and writes. Titan supports simultaneous writes very nicely; you can achieve this using transactions.
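As a concrete illustration of the single-traversal idea, here is a sketch using TinkerPop's subgraph() side-effect step to pull the whole depth-limited neighborhood in one execution (variable names are assumptions; error handling omitted):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Graph;

// Collect every edge reachable within `depth` hops into an in-memory subgraph,
// in a single traversal instead of one query per vertex per level.
Graph sub = (Graph) g.V(vertexId)
        .repeat(__.outE().subgraph("sub").inV())
        .times(depth)
        .cap("sub")
        .next();

// Iterate the small in-memory graph once to build the JSON for the web UI.
sub.vertices().forEachRemaining(v -> { /* add to the vertices JSONArray */ });
sub.edges().forEachRemaining(e -> { /* add to the edges JSONArray */ });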

Extending TokenStream

I am trying to index a field with one term that has a payload into a document.
Since the only constructor of Field that works for me takes a TokenStream, I decided to inherit from that class and provide the most basic implementation for what I need:
public class MyTokenStream : TokenStream
{
    TermAttribute termAtt;
    PayloadAttribute payloadAtt;
    bool moreTokens = true;

    public MyTokenStream()
    {
        termAtt = (TermAttribute)GetAttribute(typeof(TermAttribute));
        payloadAtt = (PayloadAttribute)GetAttribute(typeof(PayloadAttribute));
    }

    public override bool IncrementToken()
    {
        if (moreTokens)
        {
            termAtt.SetTermBuffer("my_val");
            payloadAtt.SetPayload(new Payload(/*byte[] data*/));
            moreTokens = false;
        }
        return false;
    }
}
The code which was used while indexing:
IndexWriter writer = //init index writer...
Document d = new Document();
d.Add(new Field("field_name", new MyTokenStream()));
writer.AddDocument(d);
writer.Commit();
And the code that was used during the search:
IndexSearcher searcher = //init index searcher
Query query = new TermQuery(new Term("field_name", "my_val"));
TopDocs result = searcher.Search(query, null, 10);
I used the debugger to verify that the call to IncrementToken() actually sets the TermBuffer.
My problem is that the returned TopDocs instance contains no documents, and I can't understand why... Actually I started with TermPositions (which gives me access to the Payload), but it also gave me no results.
Can someone explain to me what I am doing wrong?
I am currently using Lucene.NET 2.9.2.
After you set the TermBuffer you need to return true from IncrementToken; you return false only when you have nothing left to feed the TermBuffer with. In the code above, that means returning true inside the if (moreTokens) block, and false on the next call once moreTokens is false.
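For reference, a sketch of the corrected contract, written here in Java Lucene terms (the 2.9/3.0-era API, which mirrors Lucene.NET 2.9.2; the class name and payload bytes are illustrative):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Payload;

public final class SingleTokenStream extends TokenStream {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private boolean moreTokens = true;

    @Override
    public boolean incrementToken() {
        if (moreTokens) {
            termAtt.setTermBuffer("my_val");
            payloadAtt.setPayload(new Payload(new byte[] { 1 })); // placeholder payload
            moreTokens = false;
            return true;  // a token WAS produced on this call
        }
        return false;     // stream exhausted: nothing more to feed the buffer
    }
}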