Titan: Are both queries the same?

Let's say I have the following:
V1.setProperty("category","C1");
V1.setProperty("city","City1");
Query for vertices having city "City1":
v.query().has("category","C1").has("city","City1").vertices();
The same thing in a different way:
V1.setProperty("category","C1");
V1.setProperty("C1_city","City1");
Query for vertices having city "City1":
v.query().has("C1_city","City1").vertices();
Assume category, city, and C1_city are all indexed. Are both queries the same performance-wise?

I wouldn't say that they are the same from a performance perspective. In the first case, Titan will only use the category index and will not use the city index (it will just iterate over all "C1" vertices and then filter on city). Therefore, I would expect the second query to be faster, since it finds exactly what you are looking for entirely through the index.

Related

Querying MongoDB: retrieve shops by name and by location with one single query

Hi folks!
I'm building a "search shops" application using the MEAN stack.
I store shop documents in a MongoDB "location" collection like this:
{
  _id: ...,
  name: ...,       // shop name
  location: ...    // GeoJSON
}
The UI gives users a single input for searching shops. Basically, I would like to perform one single query that retrieves, in the same results array:
All shops near the user (possibly limited to x)
All shops whose name is "like" the input value
Logically, I think this is an "$or-like" query.
Based on this answer
Using full text search with geospatial index on Mongodb
assigning two special indexes (2dsphere and full text) to the collection is probably not the right way to achieve this. Anyway, I think my case is different, because I really don't want to apply sequential filters to the results; I "simply" want to retrieve data by two distinct criteria.
If I do set both indexes on my collection, the obvious approach is to perform two distinct queries with two distinct methods ($near for location and $text for name), then merge the results with some server-side logic that removes duplicate documents and sorts them in some way useful for the user experience (sketched below). But I'm still wondering whether there is a way to achieve this with one single query.
So, the question is: is it possible, or is this kind of approach outside MongoDB's purpose?
Hope this is clear, and I hope someone can teach me something today!
Thanks
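A minimal mongo-shell sketch of the two-query-and-merge fallback described above; it assumes the collection is named location as in the document shown, that location carries a 2dsphere index and name a text index, and that userLng, userLat, userInput, the 5 km radius, and the limit of 20 are placeholders:
// Query 1: shops near the user (requires a 2dsphere index on "location").
var nearby = db.location.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [userLng, userLat] },
      $maxDistance: 5000   // metres; placeholder radius
    }
  }
}).limit(20).toArray();

// Query 2: shops whose name matches the input (requires a text index on "name").
var byName = db.location.find({ $text: { $search: userInput } }).toArray();

// Merge on the application side, dropping duplicate _ids.
var seen = {};
var merged = nearby.concat(byName).filter(function (doc) {
  var key = String(doc._id);
  if (seen[key]) return false;
  seen[key] = true;
  return true;
});
How the merged array is finally sorted (by distance, by text score, or by some blend of both) is then left to the server-side logic.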

What kind of index(es) would be best to create to be able to search by one field and sort by another

I have a big collection of many millions of records consisting of:
{
  "id1": string,
  "id2": string,
  "correlation": number
}
where each document represents the relationship between a pair of records.
I would like to be able to efficiently run such queries as
db.collection.find({id1: 1}).sort({correlation: -1})
That is, getting records by the id1 field and sorting them by the correlation field in descending order.
What kind of index(es) would be the most appropriate for such scenario?
I think the solution is to create a compound index like the following:
db.collection.createIndex({"id1": 1, "correlation": -1})
In my case, all queries of the form
db.collection.find({id1: id}).sort({correlation: -1})
run almost instantaneously.
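If you want to confirm that the index is actually being used, an explain of the same query should show an index scan and no in-memory sort; the output shape below assumes MongoDB 3.0 or newer, and "someId" is a placeholder:
// Run the query with execution statistics.
db.collection.find({ id1: "someId" }).sort({ correlation: -1 }).explain("executionStats")
// In the output, queryPlanner.winningPlan should contain an IXSCAN stage over
// { id1: 1, correlation: -1 } and no separate SORT stage, because the index
// already returns documents in the requested order.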

How do I create an index in MongoDB for a WHERE and ORDER BY query?

In Mongo, when creating an index, I am trying to figure out whether the following query should have an index on (a) category_ids and status, or (b) category_ids, status, and name:
Source.where(category_ids: [1,2,3], status: Status::ACTIVE).order_by(:name) # Ruby/Mongoid code
Essentially, I am trying to figure out whether indexes should include the ORDER BY columns, or only the WHERE clause fields. Where could I read more about this?
Yes, an index for this particular query would be beneficial to its speed. However, there is one caveat here: the order of the index fields.
I have noticed you are using an $in on category_ids. This link is particularly useful for understanding a little complexity that arises from using $in together with an index on the sort (or with a sort in general): http://blog.mongolab.com/2012/06/cardinal-ins/
Towards the end it gives you an idea of the optimal index order for your type of query (applied in the sketch after the list):
The order of fields in an index should be:
First, fields on which you will query for exact values.
Second, fields on which you will sort.
Finally, fields on which you will query for a range of values.
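Applying that ordering to the query above, a rough sketch of the index in the shell would look like this; the collection name "sources" is an assumption, and the same index could equally be declared from the Mongoid model:
// Exact-match field (status) first, then the sort field (name),
// then the $in field (category_ids), which behaves like a range here.
db.sources.createIndex({ status: 1, name: 1, category_ids: 1 })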
For reference, a couple of other helpful links:
http://docs.mongodb.org/manual/applications/indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
why does direction of index matter in MongoDB?
And, http://www.slideshare.net/kbanker/mongo-indexoptimizationprimer
These will help you get started on optimising your indexes and making them work for your queries.

MongoDB group by/distinct query

Model checkin:
checkin
  _id
  interest_id
  author_id
I've got a collection of checkins (obtained by a simple "find" query).
I'd like to count the number of checkins for each interest.
What makes the task a bit more difficult: two checkins from the same person for the same interest should count as one checkin.
AFAIK, group operations in Mongo are performed via map/reduce. Should we use it here? The only idea I have with that approach is to aggregate the array of users for each interest and then return that array's length.
EDIT: I ended up not using map/reduce at all, although Emily's answer worked fine and fast.
I have to select only checkins from the last 60 minutes, and there shouldn't be too many results, so I just fetch all of them through the Ruby driver and do all the calculations on the Ruby side. It's a bit slower, but much more scalable and easier to understand.
best,
Roman
Map/reduce would probably be the way to go for this, and you could get the desired results with two map/reduce passes.
In the first, you would remove duplicate author_id and interest_id pairs:
the key would be the (author_id, interest_id) pair
the value would be the checkin _id
The second map/reduce would then just count the deduplicated checkins for a given interest_id:
the key would be interest_id
the value would be the checkin count
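For reference, the same two-step grouping can also be expressed without map/reduce using the aggregation framework (available since MongoDB 2.2). This is only a sketch of that alternative, with the collection name "checkins" assumed:
// Step 1: collapse duplicate (interest_id, author_id) pairs into one document each.
// Step 2: count the remaining distinct authors per interest.
db.checkins.aggregate([
  { $group: { _id: { interest: "$interest_id", author: "$author_id" } } },
  { $group: { _id: "$_id.interest", checkins: { $sum: 1 } } }
])
Each result document then holds an interest_id in _id and the deduplicated checkin count in checkins.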

Custom sorting in Sphinx / sort results by match & distance over a particular field

I am using Sphinx 2.0.
I want to achieve the following results:
The user will input tags along with other search terms; documents associated with the user's tags should come on top, sorted by distance.
After that, documents that do not contain those tags, also sorted by distance.
What I am doing:
I am searching on several fields at the same time, like @name, @tag, @streetname, etc., so I am using:
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
and sorting the results by distance using $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist asc');
The tag field can contain multiple values; I am using the OR operator to get the desired results.
If I search only on @tag, then I am able to achieve the requirement I mentioned. But if the user input is @tag food|dinner @city london @name taxi,
then a result with name "London Taxi" and street "London" comes on top (or at some other position), breaking the sort order by lat/long, because "London" appears in two fields. I just want to sort by the tag match; I do not want the weight of the other search terms to be included in the sort order.
The ranking mode is: $cl->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
Any suggestion to overcome this issue, or any other way to implement it?
Many thanks.
I think the way to solve this would be to arrange for matches on the tag field to rank much higher. I would have to test it, but something like this...
$cl->SetFieldWeights(array('tags' => 100000));
$cl->SetSelect("*, IF(@weight>100000,1,0) AS matchtags");
$cl->SetSortMode(SPH_SORT_EXTENDED, 'matchtags DESC, @geodist ASC');