Titan: How to efficiently get the maximum value of a Long property?

So if I want to retrieve the vertex that has the maximum value of a Long property, I should run:
graph.traversal().V().has("type","myType").values("myProperty").max().next()
This is really slow, as it has to load all vertices to find the maximum value. Is there a faster way?
Would any indexing help? I believe composite indexes won't help, but is there a way to do it using a mixed index with an Elasticsearch backend?

Using Titan to create a mixed index on a numeric value will result in Elasticsearch indexing the property correctly. Similarly to you, we want all our vertices ordered by a property DEGREE from max to min, so we currently do the following for that property:
TitanGraph graph = TitanFactory.open("titan-cassandra-es.properties");
TitanManagement management = graph.openManagement();
PropertyKey degreeKey = management.makePropertyKey("DEGREE").dataType(Long.class).make();
management.buildIndex("byDegree", Vertex.class)
          .addKey(degreeKey)
          .buildMixedIndex("search");
management.commit();
We are currently having issues getting Titan to traverse this quickly (for some reason it can create the index but struggles to use it for certain queries), but we can query Elasticsearch directly:
curl -XGET 'localhost:9200/titan/byDegree/_search?size=80' -d '
{
  "sort" : [
    { "DEGREE" : { "order" : "desc" } }
  ],
  "query" : {
    "match_all" : {}
  }
}'
The answer is returned extremely quickly, so for now we create the index with Titan but query Elasticsearch directly.
Short answer: Elasticsearch can handle numeric ranges very easily; the problem, on our side at least, seems to be getting Titan to use these indices fully. However, the traversal you are trying to execute is simpler than ours (you just want the max), so you may not run into these issues and may be able to stick with Titan traversals entirely.
Edit:
I have recently confirmed that Elasticsearch and Titan can fulfill your needs (as they do mine). Just be wary of how you create your indices: Titan will be able to execute your query quickly as long as you create your mixed index with the type key mapped as a string match, not a text match.
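For illustration, a minimal sketch of what that schema and query could look like on Titan 1.0 with the Elasticsearch mixed-index backend registered as "search"; the index name byTypeAndValue is made up, and the property names are taken from the question:
import org.apache.tinkerpop.gremlin.process.traversal.Order;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import com.thinkaurelius.titan.core.*;
import com.thinkaurelius.titan.core.schema.*;

TitanGraph graph = TitanFactory.open("titan-cassandra-es.properties");

// Schema: map "type" as an exact string match and "myProperty" as a Long.
TitanManagement mgmt = graph.openManagement();
PropertyKey typeKey  = mgmt.makePropertyKey("type").dataType(String.class).make();
PropertyKey valueKey = mgmt.makePropertyKey("myProperty").dataType(Long.class).make();
mgmt.buildIndex("byTypeAndValue", Vertex.class)
    .addKey(typeKey, Mapping.STRING.asParameter())   // string (exact) match, not full-text
    .addKey(valueKey)
    .buildMixedIndex("search");
mgmt.commit();

// Query: order by the property descending and take the first hit.
Long max = graph.traversal().V()
        .has("type", "myType")
        .order().by("myProperty", Order.decr)
        .limit(1)
        .<Long>values("myProperty")
        .next();
Whether the order().by(...).limit(1) step is actually pushed down to the mixed index depends on the Titan version; if it falls back to a full scan, querying Elasticsearch directly (as above) remains a workable fallback.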

Related

MongoDB Compass: can't filter a specific ID

I have a MongoDB collection built like this:
When I try to filter by ID, I get inconsistent results. For example, when I try to get the first entry by typing this in the filter query:
{_id:209383449381830657}
I get no result.
But if I type, for example, the third one, it works correctly.
{_id:191312485326913536}
I checked whether it's due to the int being too large, but no: all _id values are Int64 and the same code generates all entries.
I really don't know why I get this result, and that's why I am asking here.
EDIT:
All entries have the same type.
No limit is set in the query.
If I type {_id:{"$gte":209383449381830657}} it finds the entry, but not if I type {_id:{"$eq":209383449381830657}}.
MongoDB Compass uses MongoDB's Node.js driver.
Both 209383449381830657 and 191312485326913536 exceed JavaScript's maximum safe integer, (2^53 - 1).
JavaScript does not handle numbers larger than that in a consistent manner.
Note that in your documents, those numbers are reported as $numberLong, indicating that they are not using JavaScript's default floating-point numeric representation.
To query these numbers consistently, use the NumberLong constructor when querying, like:
{_id:NumberLong("209383449381830657")}
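To see why only some of these _ids match, here is a small illustration (written in Java, but the IEEE-754 double arithmetic is the same one JavaScript uses): 209383449381830657 is not exactly representable as a 64-bit float, while 191312485326913536 happens to be.
public class SafeIntegerDemo {
    public static void main(String[] args) {
        // Doubles carry 53 bits of precision, so integers of this size are rounded
        // to the nearest representable value (multiples of 32 in this range).
        double a = 209383449381830657L;
        System.out.println((long) a);   // prints 209383449381830656 -- the filter value silently changes
        double b = 191312485326913536L;
        System.out.println((long) b);   // prints 191312485326913536 -- exactly representable, still matches
    }
}
This is also consistent with the $gte observation: the rounded filter value is one below the stored _id, so the range comparison still matches while the equality comparison does not.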
It depends on the version of MongoDB Compass you have (I assume it's more or less the latest).
As I see, you are using NumberLong for the _id field.
You might use NumberLong("...") to handle such fields in MongoDB.
So it's better to try a search like this:
{_id: NumberLong("209383449381830657")}
I can't say for sure how MongoDB Compass handles the number you are trying to pass here.
It should automatically cast values to int32 or int64, but that can be a bit tricky, so it's better to define the type yourself to avoid unexpected results.
Some time ago I read an article saying that it tries to cast to int32, but if the value is more than an int32 can hold, it uses this function
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/isSafeInteger
and tries to cast it to int64. My speculation is that the problem lives here, but I can't say for sure.
In my case the _id contains hex digits.
The filter {_id: ObjectId("62470af3036b9f058552032e")} works.

Titan Warning: Query requires iterating over all vertices

Below I am adding a cdate index and then some data:
baseGraph.makeKey("cdate").dataType(Long.class).indexed(Vertex.class).make();

for (int i = 0; i < 20; i++) {
    Vertex page = g.addVertex("P0" + i);
    page.setProperty("cdate", new Date().getTime());
    page.setProperty("pName", "pName-P0" + i);
    Edge e = g.addEdge(null, user, page, "created");
    e.setProperty("time", i);
    Thread.sleep(2000);
}
for (int i = 20; i < 25; i++) {
    Vertex page = g.addVertex("P0" + i);
    page.setProperty("cdate", new Date().getTime());
    page.setProperty("pName", "pName-P0" + i);
    Edge e = g.addEdge(null, user, page, "notcreated");
    e.setProperty("time", i);
    Thread.sleep(2000);
}
g.commit();
Now when I run the following query:
Iterable<Vertex> vertices = g.query().interval("cdate", 0, time)
        .orderBy("cdate", Order.DESC).limit(5).vertices();
It gives the output in the correct order, but it shows:
WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx -
Query requires iterating over all vertices [(cdate >= 0 AND cdate < 1392198350796)].
For better performance, use indexes
But I have already defined an index on cdate (see the first line above).
In your type definition for cdate, you are using Titan's standard index (by not specifying any other index). Titan's standard index only supports equality comparisons (i.e. no range queries).
To get support for range queries, you need to use an indexing backend; it must be registered with Titan and then explicitly referenced in the type definition.
Check out the documentation in Chapter 8, "Indexing for Better Performance":
Titan supports two different kinds of indexing to speed up query processing: graph indexes and vertex-centric indexes. Most graph queries start the traversal from a list of vertices or edges that are identified by their properties. Graph indexes make these global retrieval operations efficient on large graphs. Vertex-centric indexes speed up the actual traversal through the graph, in particular when traversing through vertices with many incident edges.
Bottom line: Titan supports multiple types of indexes and it automatically picks the most suitable index to answer a particular query. In your case, there is none that supports range queries, hence the warning and slow performance. The documentation above outlines how to register additional indexes that provide the support you need.
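As a hedged sketch of what that looks like with the 0.4.x-style API used in the question (assuming an Elasticsearch index has been registered in the graph configuration under the name "search"):
// Graph configuration (assumption: Elasticsearch registered as the "search" index):
//   storage.index.search.backend=elasticsearch
//   storage.index.search.hostname=127.0.0.1
// The key definition then references that backend by name, so interval/range
// queries on cdate can be answered by the external index instead of a full scan.
baseGraph.makeKey("cdate")
         .dataType(Long.class)
         .indexed("search", Vertex.class)
         .make();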

query source and destination with geospatial indexing mongodb mongomapper

I have source and destination fields in my MongoDB collection named Flight. Both fields are geospatially indexed and stored in lat-long format. I am using MongoMapper to query it from a Rails controller.
I want to write a query like the following:
Result = Flight.where(:source => {'$near' => location_src}, :destination => {'$near' => location_dest})
where location_src and location_dest are GUI inputs in lat-long format.
However, when I try to access Result by writing Result.first or Result.all, it says:
Mongo::OperationFailure: can't have 2 special fields
Can anyone suggest a workaround?
Kind Regards,
Amrish.
I found a simple workaround which serves my purpose and might be helpful to someone facing a similar scenario.
Though MongoDB does not support a geospatial query on multiple fields simultaneously, it still supports geospatial indexing on multiple fields simultaneously. So we run the query on both fields one after the other, as follows:
start_place : name of starting place
start_place_loc : co-ordinates of the starting location (indexed geospatial field)
end_place : name of destination place
end_place_loc : co-ordinates of the destination location (indexed geospatial field)
The following is the example code:
tmp = Flight.distinct(:start_place, :start_place_loc => {'$near' => location_src})
Result = Flight.where(:start_place_loc => {'$near' => location_src}, :start_place => {'$in' => tmp})
Also, one will need to set '$maxDistance' => value according to one's requirements.
Currently MongoDB only supports one geo index per collection. That's documented; from this page:
You may only have 1 geospatial index per collection, for now. While MongoDB may allow to create multiple indexes, this behavior is unsupported. Because MongoDB can only use one index to support a single query, in most cases, having multiple geo indexes will produce undesirable behavior.
@Amrish
Currently MongoDB can have only one 2d index per collection; however, you can have multi-location documents:
Address: [
  {
    loc: [Lon, Lat],   // or { lon: "somevalue", lat: "somevalue" }
    name: "start"
  },
  {
    loc: [Lon, Lat],   // or { lon: "somevalue", lat: "somevalue" }
    name: "end"
  }
]
And you can have a 2d index on Address.loc.
This can be helpful with $near and geoNear when searching, and can be used as a workaround to find all "start" points near a given (lng, lat) pair.
Also note: it should be a (lng, lat) pair, not the reverse.
In short: store each (lng, lat) pair with name: "start" or name: "end".
Put those pairs in an array.
Create a 2d index on that array.
This lets you query for all "start" points near a given (lng, lat) pair,
and for all "end" points near a given (lng, lat) pair.
And you should be able to combine those two queries with AND/OR operators.

MongoDB - forcing stored value to uppercase and searching

In the SQL world I could do something to the effect of:
SELECT name FROM table WHERE UPPER(name) = UPPER('Smith');
and this would match a search for "Smith", "SMITH", "SmiTH", etc., because it forces the query and the value to the same case.
However, MongoDB doesn't seem to have this capability without using a RegEx, which won't use indexes and would be slow for a large amount of data.
Is there a way to convert a stored value to a particular case before doing a search against it in MongoDB?
I've come across the $toUpper aggregate, but I can't figure out how that would be used in this particular case.
If there's no way to convert stored values before searching, is it possible to have MongoDB convert a value when it's created? So that when I add a document to the collection, it forces the "name" attribute to a particular case, something like a callback in the Rails world.
It also looks like there's the ability to create stored JavaScript for MongoDB, similar to a stored procedure. Would that be a feasible solution?
Mostly looking for a push in the right direction; I can figure out the particular code once I know what I'm looking for, but so far I'm not even sure if my desired functionality is doable.
You have to normalize your data before storing it. There is no support for performing normalization as part of a query at runtime.
The simplest thing to do is probably to save both a case-normalized (i.e. all-uppercase) and display version of the field you want to search by. Suppose you are storing users and want to do a case-insensitive search on last name. You might store:
{
  _id: ObjectId(...),
  first_name: "Dan",
  last_name: "Crosta",
  last_name_upper: "CROSTA"
}
You can then create an index on last_name_upper, and query like:
> db.users.find({last_name_upper: "CROSTA"})
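If you are doing the writes from application code rather than the shell, a minimal sketch with the MongoDB Java driver (assuming a 3.7+ driver; the database name "test" and the field names simply mirror the example above):
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

MongoCollection<Document> users =
        MongoClients.create().getDatabase("test").getCollection("users");

// Store both the display value and a case-normalized copy on every write.
String lastName = "Crosta";
users.insertOne(new Document("first_name", "Dan")
        .append("last_name", lastName)
        .append("last_name_upper", lastName.toUpperCase()));

// Index the normalized field once, then always query with an upper-cased value.
users.createIndex(Indexes.ascending("last_name_upper"));
Document match = users.find(new Document("last_name_upper", "Smith".toUpperCase())).first();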

id autoincrement/sequence emulation with CassandraDB/MongoDB etc

I'm trying to build a small web system (URL shortening) using the NoSQL Cassandra DB, and the problem I'm stuck on is ID auto-generation.
Has someone already run into this problem?
Thanks.
P.S. UUIDs don't work for me; I need to use ALL numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like a SQL sequence.
UPDATED:
The reason why I'm not OK with GUID IDs lies in the scope of my application.
My app has a URL-shortening part, and I need to make the URLs as short as possible. So I follow this approach: I take numbers starting from 0 and convert them to base64 strings. As a result I have URLs like mysite.com/QA (where QA is the base64 string).
This was very easy to implement using a SQL DB: I just took the auto-incremented ID, converted it to a URL, and was 100 percent sure that the URL was unique.
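For reference, here is a minimal sketch of that numeric-ID-to-short-string step in Java; the 64-character alphabet and the helper name are assumptions (any URL-safe alphabet works the same way):
// Hypothetical helper: encode a numeric counter as a short, URL-safe string.
private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

static String encode(long id) {
    if (id == 0) return String.valueOf(ALPHABET.charAt(0));
    StringBuilder sb = new StringBuilder();
    while (id > 0) {
        sb.append(ALPHABET.charAt((int) (id & 63)));  // take 6 bits at a time
        id >>>= 6;
    }
    return sb.reverse().toString();  // e.g. encode(0) -> "A", encode(64) -> "BA"
}
The uniqueness of the short URL then reduces entirely to the uniqueness of the numeric ID, which is exactly why a sequence-like counter is needed.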
Don't know about Cassandra, but with Mongo you can have an atomic sequence (it won't scale, but it will work the way it should, even in a sharded environment if the query includes the shard key).
It can be done by using the findAndModify command.
Suppose we have a special collection named sequences and we want a sequence for post numbers (named postid); you could use code similar to this:
> db.runCommand( { "findandmodify" : "sequences",
"query" : { "name" : "postid"},
"update" : { $inc : { "id" : 1 }},
"new" : true } );
This command will atomically return the updated (new) document together with the status. The value field contains the returned document if the command completed successfully.
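The same pattern from application code, as a hedged sketch using the MongoDB Java driver (assuming a 3.x+ driver; the database, collection, and counter field names simply mirror the shell example above):
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import org.bson.Document;

MongoCollection<Document> sequences =
        MongoClients.create().getDatabase("test").getCollection("sequences");

// Atomically increment the counter and get the updated document back.
Document seq = sequences.findOneAndUpdate(
        eq("name", "postid"),
        inc("id", 1),
        new FindOneAndUpdateOptions()
                .returnDocument(ReturnDocument.AFTER)
                .upsert(true));   // create the counter document on first use

long nextId = ((Number) seq.get("id")).longValue();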
Auto-increment IDs inherently don't scale well, as they need a single source to generate the numbers. This is why shardable/replicable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?
You might be able to do it using atomic increments, retaining the old value, but I'm not sure. This would be limited to single-server setups only.
I'm not sure I follow you. What language are you using? Are we talking about UUIDs?
The following is how you generate UUIDs in some languages:
java.util.UUID.randomUUID(); // (Java) variant 2, version 4
import uuid  # (Python)
uuid.uuid1()  # version 1