Titan order by ES indexed vertex property?

Per the latest Titan 0.4.1 docs, this code should work to order vertices in a result set:
graph.query().has("name",CONTAINS,"john").orderBy("age",Order.DESC).limit(10).vertices()
I would like to perform this type of query across a potentially large set of vertices, and thus would like to index it. The docs suggest:
Most external indexing backends support ordering natively and efficiently. However, the property key used in the orderBy method must be configured to be indexed in this indexing backend for native result ordering support. This is important in cases where the orderBy key is different from the query keys. If the property key is not indexed, then sorting requires loading all results into memory.
Specifically for the Elasticsearch backend, how can we create an index that will support this orderBy method?
Is there a simple 3rd parameter to pass to the KeyMaker's indexed method, possibly an extension to the below example from the docs?
graph.makeKey("age").dataType(Integer.class).indexed("search", Vertex.class).make();

You just need to index the property in ES (like any other property).
Indexed in Titan (wrong; orderBy will not be optimized):
graph.makeKey("age").dataType(Integer.class).indexed(Vertex.class).make();
Indexed in ES (correct; native result ordering in ES):
graph.makeKey("age").dataType(Integer.class).indexed("search", Vertex.class).make();
Cheers,
Daniel

Related

DynamoDB equivalent to find().sort()

In MongoDB one can get all the documents sorted asc/desc on a particular field as follows:
db.collection_name.find().sort({field: sort_order})
query() requires one to specify the partition key, and if one wants to query on a non-key attribute, a GSI can be created and queried instead, as explained here: Query on non-key attribute
scan() would do the job but doesn't provide an option to sort on any field.
One of the solutions, described here: Dynamodb scan in sorted order, is to give all documents a common key and create a GSI on the sort attribute.
But as noted in the comments on one of those solutions, and I quote: "The performance characteristics of a DynamoDB table apply the same to GSIs. A GSI with a single hash key of "OK" will only ever use one partition. This loses all scaling characteristics of DynamoDB".
Is there a way to achieve the above that scales well?
The only sorting DynamoDB applies is by range key within a partition. There is no feature for sorting results by an arbitrary field; you are expected to sort the results in your application code, i.e., do a scan and sort the results on the client side.
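For illustration, a minimal scan-and-sort sketch with the AWS SDK for Java (v1); the table name my_table and the numeric age attribute are hypothetical:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

// Scan returns at most 1 MB per page, so paginate through the whole table.
List<Map<String, AttributeValue>> items = new ArrayList<>();
Map<String, AttributeValue> lastKey = null;
do {
    ScanResult page = client.scan(new ScanRequest()
            .withTableName("my_table")
            .withExclusiveStartKey(lastKey));
    items.addAll(page.getItems());
    lastKey = page.getLastEvaluatedKey();
} while (lastKey != null);

// Sort client-side on the numeric attribute, descending.
items.sort(Comparator.comparingLong(
        (Map<String, AttributeValue> item) -> Long.parseLong(item.get("age").getN()))
        .reversed());
Bear in mind this reads the entire table on every call, which is exactly the cost the question is trying to avoid; it only makes sense for small tables or offline jobs.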

What data structure does Google Firebase Firestore use for its default index

I'm curious if anyone knows, or can guess, the data structure Google's Firestore is using to index arbitrary NoSQL documents by every field. I'm looking to build something similar, making it as efficient as possible.
Some info about how their default index works:
all fields are indexed by default, but this only works for equality searches, not ranges (<, >)
any range searches require extra indexes
Source: https://firebase.google.com/docs/firestore/query-data/indexing
It's unlikely to be a standard B-tree index per field, because then range searches would work without requiring an extra index. Plus, if you added a new field (easy with document storage), it would take time to build an index over collections with billions of items.
One theory: one big index per document. Index "field_name:value" for every field in every document. The index maps to a sorted list of document IDs which contain that field/value pair. It would be able to serve an equality search (by merging the sorted doc-ID lists for every equality requirement), but not a range search. Basically an inverted index.
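To make that theory concrete, here is a toy sketch of such an inverted index (all names hypothetical); it supports equality lookups by intersecting sorted posting lists and, as described, has no way to answer a range query:
import java.util.*;

// Toy inverted index: "field:value" -> sorted set of document IDs.
class InvertedIndex {
    private final Map<String, NavigableSet<Long>> postings = new HashMap<>();

    void index(long docId, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            postings.computeIfAbsent(e.getKey() + ":" + e.getValue(),
                    k -> new TreeSet<>()).add(docId);
        }
    }

    // Equality-only query: intersect the sorted posting lists,
    // one list per field/value requirement.
    Set<Long> query(Map<String, String> equals) {
        NavigableSet<Long> result = null;
        for (Map.Entry<String, String> e : equals.entrySet()) {
            NavigableSet<Long> ids = postings.getOrDefault(
                    e.getKey() + ":" + e.getValue(), new TreeSet<>());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? Collections.emptySet() : result;
    }
}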
Any suggestions for better ways of implementing a pattern like this?
Clarification: single-field indexes do support range/inequality queries; composite indexes are about combining multiple field filters in a single query. See this page for more on index types:
https://firebase.google.com/docs/firestore/query-data/index-overview
Each field index is stored in its own key range, with contiguous regions assigned to a server and compute and storage scaling independently under the covers. Cloud Firestore handles indexes fairly similarly to Cloud Datastore (but not 100% the same).
You can see a basic overview in my Cloud Next conference session from last year.

Does ElasticSearch have the same indexes functionality that mongodb have?

I want to know: just as MongoDB has an index creation feature to speed up queries (https://docs.mongodb.org/v3.0/indexes/), what does Elasticsearch have for this purpose? I googled it but was unable to find suitable information. I used indexing in MongoDB on the most frequently used fields to speed up queries, and now I want to do the same in Elasticsearch. Is there anything Elasticsearch provides for this? Thanks
Elasticsearch also has indices: https://www.elastic.co/blog/what-is-an-elasticsearch-index
They are also used as part of the database's key features to provide swift search capabilities.
It is annoying that "index" is used in a different sense in ES than in many other databases. I'm not as familiar with MongoDB, so I'll refer to their documentation at v3.0/core/index-types.
Basically Elasticsearch was designed to serve efficient "filtering" (yes/no queries) and "scoring" (relevance ranking via tf-idf etc.), and it uses Lucene as the underlying inverted index.
MongoDB concepts and their ES counterparts:
Single Field Index: trivially supported, e.g. as not_analyzed fields for exact matching
Compound Index: Lucene applies AND filter conditions via efficient bitmaps and can merge any "single field" indexes ad hoc (see the sketch after this list)
Multikey Index: transparent support; there is no difference between a single value and an array of values
Geospatial Index: directly supported via geo-shapes
Text Index: in a sense ES was optimized for this use case, via analyzed field types
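To make the filtering side concrete, a small sketch against the pre-5.x TransportClient-era Java API (the index and field names are made up); it expresses the equivalent of a compound-index equality query, which Lucene answers by intersecting per-term postings:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

// AND of two exact-match conditions on not_analyzed fields;
// assumes an already-connected Client instance (e.g. a TransportClient).
static SearchResponse findActiveJohns(Client client) {
    return client.prepareSearch("users")
            .setQuery(QueryBuilders.boolQuery()
                    .must(QueryBuilders.termQuery("status", "active"))
                    .must(QueryBuilders.termQuery("first_name", "john")))
            .setSize(10)
            .execute()
            .actionGet();
}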
In my view, in search applications relevance matters more than plain filtering of results, as some words occur in almost every document and are thus less relevant when searching.
Elasticsearch has other very useful concepts as well such as aggregations, nested documents and child/parent relationships.

MongoDB Shard key - enforce in every query

If we use a compound shard key, say {a,b}, is there a way to throw an error at the Java driver level on any query that does not include these fields? I.e., are there any callbacks/lifecycle events before the query gets executed? AbstractMongoEventListener offers onAfterLoad and onAfterConvert, but our requirement is before executing the query, something at the Java driver level.
I understand why you want this capability (if the query does not include even a single shard key field in its criteria, it results in a "scatter and gather" kind of query, with significant performance degradation). But best practice suggests that an API should be designed for a single purpose and be generic; adding this capability to the Java driver would impose an additional constraint that might not be required. Hence there is no out-of-the-box API that does this for you.
What you can do to make it work: write a wrapper on top of this API with the additional capability; a sketch follows.
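A minimal sketch of such a wrapper (class and field names are made up; the driver calls are the standard MongoCollection API):
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.List;

// Rejects queries that omit any shard key field before they reach the driver.
class ShardSafeCollection {
    private final MongoCollection<Document> collection;
    private final List<String> shardKeyFields;

    ShardSafeCollection(MongoCollection<Document> collection, List<String> shardKeyFields) {
        this.collection = collection;
        this.shardKeyFields = shardKeyFields;
    }

    FindIterable<Document> find(Document filter) {
        for (String field : shardKeyFields) {
            if (!filter.containsKey(field)) {
                throw new IllegalArgumentException(
                        "Query must include shard key field: " + field);
            }
        }
        return collection.find(filter);
    }
}
Routing all reads through such a wrapper turns the "scatter and gather" mistake into a hard failure at development time rather than a silent performance problem in production.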

Composite _ID and using MongoDB as a composite bucket store, via C#

I am building an eCommerce system that uses composite bucket hashing to efficiently group similar items. Without going into why I chose this approach, suffice it to say it solves several key problems facing distributed eCommerce systems.
There are 11 buckets, all of them ints, which represent various values. Let's call these fields A to K. The 12th field, L, is an array of product IDs. Think of this all as a hierarchy with the leaf level (L) being data.
I ran some initial tests in MongoDB where I stored this data as individual documents. However, this is not efficient because a given set of A to K could have many L values, so these can be stored as an array.
This gives me two options:
Insert a meaningless _id document id, and put an index on A - K to ensure uniqueness. I already ran some tests on indexes, and indexing more than the first 2 columns impacts speed substantially.
Make A - K a composite _id, and have one document data field: L.
I know #2 is a highly unconventional use of MongoDB. Are there any technical reasons why I shouldn't do this? If not, using the official C# driver, how would I perform this insert?
If you went for option #2, you could perhaps create your own optimised composite id (using A-K) using a Custom Id Generator.
Did you run your tests on compound keys?
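For reference, here is what the option #2 document shape looks like. The question asks about the C# driver, but since Java is used elsewhere on this page, this sketch uses the MongoDB Java driver; the C# driver's BsonDocument API is analogous, and the database/collection names are made up:
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Arrays;

MongoCollection<Document> buckets = MongoClients.create()
        .getDatabase("shop").getCollection("buckets");

// The eleven bucket fields A-K form the _id as an embedded document;
// L is the only data field (an array of product IDs).
Document compositeId = new Document("A", 1).append("B", 2).append("C", 3)
        .append("D", 4).append("E", 5).append("F", 6).append("G", 7)
        .append("H", 8).append("I", 9).append("J", 10).append("K", 11);

buckets.insertOne(new Document("_id", compositeId)
        .append("L", Arrays.asList(1001, 1002, 1003)));
Note that with an embedded-document _id, equality lookups must match the full _id document with fields in the same order, which is one of the technical caveats of option #2.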