union on same collection in mongodb - mongodb

I need the most viable way to search docs with the following structure.
{ _id:"",
var1: number,
var2: number,
var3: number,
}
In the sql way the resultset i need would be UNION of these three (n records sorted by var1,n records sorted by var2,n records sorted by var3)
with UNION I expect the duplicates to be removed.
Being new to mongodb i am not able to find the right way to write a query for such an operation, i believe it must be possible in mongodb.
If in case its not possible, could you please suggest an alternate nosql solution.

The closest MongoDB operator to what you are looking for is an $or, but that isn't quite the same as an SQL UNION which combines two separate queries into a single result. MongoDB queries are always against a single collection, but $or allows you to have multiple query clauses.
For example:
db.collection.find(
// Find documents matching any of these values
{$or:[
{var1: 123},
{var2: 456},
{var3: 789}
]}
).sort(
// Sort in ascending order
{var1:1, var2:1, var3:1}
)
Since you are limited to querying a single collection, results will already be de-duplicated at the document level and all results will share the same sort order if one is specified.
If you want to simulate a UNION (or other operation working with multiple collections/queries) in MongoDB, you will have to write multiple queries and merge the result sets in your application code.

Related

Pymongo - find multiple different documents

my question is very similar to how-to-get-multiple-document-using-array-of-mongodb-id, however, I would like to find multiple documents without using the _id.
That is, consider that I have documents such as
the
document = { _id: _id, key_1: val_1, key_2: val_2, key_3: val_3}
I need to be able to .find() by multiple parameters, as for example,
query_1 = {key_1: foo, key_2: bar}
query_2 = {key_1: foofoo, key_2: barbar}
Right now, I am running a query for query_1, followed by a query for query_2.
As it turns out, this method is extremely inefficient.
I tried to add concurrency as to make it faster, but the speedup was not even 2x.
Is it possible to query multiple documents at once?,
I am looking for a method that returns the union of the matches for query_1 AND query_2.
If this is not possible, do you have any suggestions that might speed a query of this type?
Thank you for your help.

What kind of index(es) would be best to create to be able to search by one field and sort by another

I have a big collection of many million records consisting of:
{
"id1":string,
"id2":string,
"correlation":number
}
Which represents the relationships between pairs of records.
I would like to be able to efficiently run such queries as
db.collection.find({id1: 1}).sort({correlation: -1})
So, getting records by field id1 and sorting them by correlation field (in the descending order).
What kind of index(es) would be the most appropriate for such scenario?
I think the solution is to create the compound index like the following:
db.collection.createIndex({"id1": 1, "correlation": -1})
In my case all the queries of the form
db.collection.find({id1: id}).sort({correlation: -1})
run almost instantaneously.

Concatenate two queries in mongodb

I want to split my long queries in mongodb for example instead using db.users.find(q) in the following code, I want to use a concatenate version like this db.users.find(q0+q1) or any other way that I can split q into two queries
q= {},{name:1,"education.school.name":1,"education.type":1}
db.users.find(q)
q0={}
q1={name:1,"education.school.name":1,"education.type":1}
q=q0+","+q1
db.users.find(q)
How do I do something like that?
Your clauses are somewhat ambiguous there (your q and q0 are both empty objects), but you might want the $or operator
db.users.find(
$or: [
{name:1,"education.school.name":1,"education.type":1},
{age:1,"education.school.name":2,"education.type":2}
]
)
This'll return the set union of those two queries. That said, depending on what you're actually wanting to do, you might be able to use $in with an array of values to search for per field, if you just want to search on multiple values.

Sorting on Multiple fields mongo DB

I have a query in mongo such that I want to give preference to the first field and then the second field.
Say I have to query such that
db.col.find({category: A}).sort({updated: -1, rating: -1}).limit(10).explain()
So I created the following index
db.col.ensureIndex({category: 1, rating: -1, updated: -1})
It worked just fined scanning as many objects as needed, i.e. 10.
But now I need to query
db.col.find({category: { $ne: A}}).sort({updated: -1, rating: -1}).limit(10)
So I created the following index:
db.col.ensureIndex({rating: -1, updated: -1})
but this leads to scanning of the whole document and when I create
db.col.ensureIndex({ updated: -1 ,rating: -1})
It scans less number of documents:
I just want to ask to be clear about sorting on multiple fields and what is the order to be preserved when doing so. By reading the MongoDB documents, it's clear that the field on which we need to perform sorting should be the last field. So that is the case I assumed in my $ne query above. Am I doing anything wrong?
The MongoDB query optimizer works by trying different plans to determine which approach works best for a given query. The winning plan for that query pattern is then cached for the next ~1,000 queries or until you do an explain().
To understand which query plans were considered, you should use explain(1), eg:
db.col.find({category:'A'}).sort({updated: -1}).explain(1)
The allPlans detail will show all plans that were compared.
If you run a query which is not very selective (for example, if many records match your criteria of {category: { $ne:'A'}}), it may be faster for MongoDB to find results using a BasicCursor (table scan) rather than matching against an index.
The order of fields in the query generally does not make a difference for the index selection (there are a few exceptions with range queries). The order of fields in a sort does affect the index selection. If your sort() criteria does not match the index order, the result data has to be re-sorted after the index is used (you should see scanAndOrder:true in the explain output if this happens).
It's also worth noting that MongoDB will only use one index per query (with the exception of $ors).
So if you are trying to optimize the query:
db.col.find({category:'A'}).sort({updated: -1, rating: -1})
You will want to include all three fields in the index:
db.col.ensureIndex({category: 1, updated: -1, rating: -1})
FYI, if you want to force a particular query to use an index (generally not needed or recommended), there is a hint() option you can try.
That is true but there are two layers of ordering you have here since you are sorting on a compound index.
As you noticed when the first field of the index matches the first field of sort it worked and the index was seen. However when working the other way around it does not.
As such by your own obersvations the order needed to be preserved is query order of fields from first to last. The mongo analyser can sometimes move around fields to match an index but normally it will just try and match the first field, if it cannot it will skip it.
try this code it will sort data first based on name then keeping the 'name' in key holder it will sort 'filter'
var cursor = db.collection('vc').find({ "name" : { $in: [ /cpu/, /memo/ ] } }, { _id: 0, }).sort( { "name":1 , "filter": 1 } );
Sort and Index Use
MongoDB can obtain the results of a sort operation from an index which
includes the sort fields. MongoDB may use multiple indexes to support
a sort operation if the sort uses the same indexes as the query
predicate. ... Sort operations that use an index often have better
performance than blocking sorts.
db.restaurants.find().sort( { "borough": 1, "_id": 1 } )
more information :
https://docs.mongodb.com/manual/reference/method/cursor.sort/

How does MongoDB evaluates multiple $or statements?

How will MongoDB evaluate this query:
db.testCol.find(
{
"$or" : [ {a:1, b:12}, {b:9, c:15}, {c:10, d:"foo"} ]
});
When scanning values in a document if first OR statement is TRUE will the other statements be also be evaluated?
Logically if the MongoDB is optimized other values in OR statement should not be evaluated, but I don't know how MongoDB is implemented.
UPDATE:
I updated my query because it was wrong and it didn't explain correctly what I was trying to accomplish. I need to find a set of documents that have different properties and if an exact combination of these properties is found the document must be returned.
The SQL equivalent of my query would be:
SELECT * FROM testCol
WHERE (a = 1 AND b = 12) OR (b = 9 AND c = 15) OR (c = 10 AND d = 'foo');
MongoDB will execute each clause of the $or operation as a seperate query and remove duplicates as a post processing pass. As such each clause can use a seperate index which is often very useful.
In other words, it will NOT look at 1 document, see which of the OR clauses apply and do an early-out if the first clause is a match. Rather it does a full dataset query per clause and de-dupe after the fact. This may seem less than efficient but in practice it's almost always faster since the first approach would only be able to hit at most one index for all clauses which is rarely efficient.
EDIT: Mongo only skips documents during the de-duplication process, not during the table scans.
Mongo won't check documents that are already part of the result set. So if your first {a:1, b:12} returns 100% of the documents, Mongo is done.
You want to put whatever will grab the most documents as your first evaluated statement because of this. If your first item only grabs 1% of documents, the subsequent item will need to scan the other 99%.
That being said, you are using $or to look for values in a single key. I think you want to use $in for this.
See here for more:
http://books.google.com/books?id=BQS33CxGid4C&lpg=PA48&ots=PqvQJPRUoe&dq=mongo%20tips%20and%20tricks%20%22OR-query%22&pg=PA48#v=onepage&q&f=false