How feasible is this query - MongoDB

Suppose you have a collection of documents with the following structure:
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
Suppose the collection holds roughly 100 million to 1 billion documents. I have to run a query
which returns all documents in which A_id, B_id, or C_id is in some list of ObjectIds, say L = [ ObjectId]
Something like this:
{ '$or' : [ { 'A_id' : { '$in' : L } },
            { 'B_id' : { '$in' : L } },
            { 'C_id' : { '$in' : L } } ] }
Q: Is it doable to run such a query? Is it normal to run such queries on MongoDB?
Q: How long can it take on a single server, and how long might it take on a horizontally scaled database?

It is a doable query.
The real question is "is it a good query?"
The answer to that question is extremely dependent upon many variables.
First off, I am assuming you have an index on each of the fields you are querying. I am also assuming that the query stands as is, without a sort. It should be noted that there are problems which stop a sort index from being used here by the optimiser: https://jira.mongodb.org/browse/SERVER-1205
Assuming you have indexes on A_id, B_id and C_id, MongoDB will essentially run three queries and merge out duplicates before returning your result.
This means that for small $or queries it might be faster within the database (or mongos) itself, since you don't have to merge duplicates in your application; that spares not only network traffic but also the costly iteration over the results of each clause of the $or.
So for a small $or like this the query is okay. It isn't the best query in the world, but it will do if you have no choice but to use an $or.
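For reference, a minimal shell sketch of the setup this assumes; the collection name (things) and the ObjectId value are placeholders:
// One single-field index per $or clause, so each clause can be resolved by an index scan
db.things.createIndex({ A_id: 1 })
db.things.createIndex({ B_id: 1 })
db.things.createIndex({ C_id: 1 })
// The $or query itself; L is a list of ObjectIds
var L = [ ObjectId("5f1e3b2a9d4c8a0001a1b2c3") ]  // example value only
db.things.find({ $or: [ { A_id: { $in: L } },
                        { B_id: { $in: L } },
                        { C_id: { $in: L } } ] })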
Q: how long can it take of a single server and how long may it take at horizontally scaled database?
Not sure if anyone here can answer that. It depends upon the schema, the size of the $in lists, and much more.

It's certainly doable to run that query. However, you might want to consider an alternative structure that could be more easily searched.
Instead of
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
You might want to restructure it to be:
_id
idList = [
    { k: 'A', v: AObjectId },
    { k: 'B', v: BObjectId },
    { k: 'C', v: CObjectId }
]
+ other stuff
By using an array of sub-objects, each with a key and a value field, you can index the value field (idList.v) and do just a single efficient query:
{ 'idList.v' : { $in : listToCheck } }
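A sketch of what that might look like in the shell; the collection name and the list contents are illustrative:
// Multikey index over the value field of every array element
db.things.createIndex({ 'idList.v': 1 })
// One indexed query replaces the three-clause $or
var listToCheck = [ ObjectId("5f1e3b2a9d4c8a0001a1b2c3") ]  // example value only
db.things.find({ 'idList.v': { $in: listToCheck } })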

Related

In MongoDB, how to decide which fields of a collection to index for a costly query

I have a collection with 1000+ records and I need to run the query below. I have come across the issue that this query takes more than a minute even if the departmentIds array has a length of something like 15-20. I think that if I use an index the query time will be reduced.
From what I observe, 99% of the time spent on the query is due to the $in part.
How do I decide which fields to index? Should I index only department.department_id, since that's what's taking most of the time, or should I create a compound index using userId, something, and department.department_id (basically all the fields I'm using in the query here)?
Here is what my query looks like
let departmentIds = [.......................... can be large]
let query = {
  userId: someid,
  something: something,
  'department.department_id': {
    $in: departmentIds
  }
};
// db query
let result = db
  .collection(TABLE_NAME)
  .find(query)
  .project({
    anotherfield: 1,
    department: 1
  })
  .toArray();
You need to check all your search cases and create indexes for those that are used often and are most critical for your application. For the particular case above, these seem to be the index options:
1. userId:1
2. userId:1, something:1
3. userId:1, something:1, department.department_id:1
I bet on option 1, since userId sounds like a unique key with high selectivity, very suitable for an index. Of course, it is best to do some testing and identify the fastest option; the explain option can help a lot with that testing:
db.collection.find().explain("executionStats")
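For example, a quick way to compare candidates (the collection name here is a placeholder) is to build one, run the query, and read the execution stats:
// Option 3: equality fields first, then the $in field
db.users.createIndex({ userId: 1, something: 1, 'department.department_id': 1 })
// Compare executionTimeMillis and totalDocsExamined across candidate indexes
db.users.find(query).explain("executionStats")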

Optimising queries in MongoDB

I am working on optimising my queries in MongoDB.
In a normal SQL query there is an order in which the WHERE clauses are applied. E.g. in select * from employees where department="dept1" and floor=2 and sex="male", first department="dept1" is applied, then floor=2, and lastly sex="male".
I was wondering whether it happens in a similar way in MongoDB.
E.g.
DBObject search = new BasicDBObject("department", "dept1").append("floor", 2).append("sex", "male");
Here, which match clause will be applied first? In fact, does Mongo work in this manner at all?
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes, MongoDB has to scan the full collection (a collection scan) in order to find the required documents. In your case, if you want the fields applied in the order [department, floor, sex], you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex": 1 } )
From the documentation: https://docs.mongodb.org/manual/core/index-compound/
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the previous example, the index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field.
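To check whether the compound index is actually picked up for the employees query, an explain() run in the shell can confirm it; something along these lines:
// Should report an IXSCAN over { department: 1, floor: 1, sex: 1 } rather than a COLLSCAN
db.employees.find({ department: "dept1", floor: 2, sex: "male" }).explain("executionStats")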

MongoDB select subdocument with aggregation function

I have a MongoDB collection that looks something like this:
{
    _id: ObjectId('aabbccddeeff'),
    objectName: 'MyFirstObject',
    objectLength: 0xDEADBEEF,
    objectSource: 'Source1',
    accessCounter: {
        'firstLocationCode' : 283,
        'secondLocationCode' : 543,
        'ThirdLocationCode' : 564,
        'FourthLocationCode' : 12
    }
}
...
Now, assuming that this is not the only record in the collection, and that most/all of the documents contain the accessCounter subdocument/field, how do I go about selecting the first x documents that have the most accesses from a specific location?
A sample "query" would be something like:
"Select the first 10 documents from myCollection where accessCounter.firstLocationCode is the highest"
So a sample result would be the X documents where accessCounter.firstLocationCode is the greatest in the database.
Thank you for taking the time to read my question.
No need for an aggregation; that is a basic query:
db.collection.find().sort({ "accessCounter.firstLocationCode": -1 }).limit(10)
In order to speed this up, you should create a subdocument index on accessCounter first:
db.collection.ensureIndex({ 'accessCounter': -1 })
assuming you want to do the same query for all locations. In case you only want to query firstLocationCode, create the index on accessCounter.firstLocationCode instead.
You can speed this up further, in case you only need the accessCounter value, by making this a so-called covered query: a query whose returned values come from the index itself. For example, when you have the subdocument indexed and you query for the top secondLocationCode values, you should be able to do a covered query with:
db.collection.find({}, { _id: 0, "accessCounter.secondLocationCode": 1 })
    .sort({ "accessCounter.secondLocationCode": -1 }).limit(10)
which translates to "Get all documents ({}), don't return the _id field as you would by default (_id: 0), and return only the accessCounter.secondLocationCode field. Sort the returned values in descending order and give me the first ten."
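Since the question title mentions aggregation: the equivalent pipeline, shown here only for comparison, returns the same result and gains nothing over the plain query above:
// $sort followed by $limit; the sort can still use an index on the sorted field
db.collection.aggregate([
    { $sort: { "accessCounter.firstLocationCode": -1 } },
    { $limit: 10 }
])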

Correct mongo index for a large query

I'm using mongoose and have a schema similar to this:
var schema = new mongoose.Schema({
  created: Date,
  fieldA: ObjectId,
  fieldB: ObjectId,
  fieldC: ObjectId,
  sortField: Number
});
This is a big collection so I want to make sure the indexes are optimal. I build up a query with something like this:
var query = Schema.find({ created: some-date-clause });
if ( some-condition )
  query = query.or({ fieldA: { $in: listOfSomeFieldAIDs } });
if ( some-other-condition )
  query = query.or({ fieldB: { $in: listOfSomeFieldBIDs } });
if ( yet-another-condition )
  query = query.or({ fieldC: { $in: listOfSomeFieldCIDs } });
query = query.sort({ sortField: -1 });
I want to make sure that whatever the query is, it's fully covered. My instinct is to create an index that spans created, fieldA, fieldB, fieldC and sortField. But should I actually create several indices for the situations where for example only the second condition is true, or the first and the third are true? Should I be approaching this differently?
Each clause of an $or query is considered independently, so it's likely best to create three separate indexes, one each on fieldA, fieldB, and fieldC. See the $or documentation for details.
The sort occurs after the results of the three $or clauses are merged, so adding sortField to these indexes isn't likely to be useful.
But as always, validate any index approach using explain() to make sure your queries are able to use the indexes you've created.
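A sketch of that approach against the schema above; the collection name and values are illustrative:
// One index per $or clause
db.items.createIndex({ fieldA: 1 })
db.items.createIndex({ fieldB: 1 })
db.items.createIndex({ fieldC: 1 })
// Verify that each $or clause reports an IXSCAN rather than a COLLSCAN
var ids = [ ObjectId("5f1e3b2a9d4c8a0001a1b2c3") ]  // example value only
db.items.find({ $or: [ { fieldA: { $in: ids } }, { fieldB: { $in: ids } } ] }).explain()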

MongoDB sort order - mix of ascending and descending fields

I want to sort a MongoDB collection based upon several fields, some ascending, other descending.
I'm using the official C# driver. My code currently looks like this:
string[] sortFields = new[] { "surname", "firstname", "companyname", "email" };
MongoDB.Driver.Builders.SortByBuilder sort = MongoDB.Driver.Builders.SortBy.Ascending(sortFields);
foreach (MongoDB.Bson.BsonDocument doc in contactsCollection.FindAs<MongoDB.Bson.BsonDocument>(query).SetSortOrder(sort))
{
...
How do I alter this code so it sorts on descending for email?
Thanks very much.
You can chain SortBy calls:
var sort = SortBy.Ascending("surname").Descending("email");
foreach (var doc in contactsCollection.FindAs<MongoDB.Bson.BsonDocument>(query).SetSortOrder(sort))
{
...
I'd be careful about building queries dynamically, especially with that many keys. Keep in mind that MongoDB uses only ONE index for a query, so your index will have to be a good fit.
Example: find({ A: "foo", B: "bar" }).sort({ C: -1 });
This won't use the index efficiently if the compound index is { C, A, B }; it must be { A, B, C } instead. Too many indexes will take up space and make inserts/updates slower.
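As a shell sketch of that example (the collection name is illustrative):
// Equality fields first, sort field last, matching find({A, B}).sort({C: -1})
db.items.createIndex({ A: 1, B: 1, C: -1 })
db.items.find({ A: "foo", B: "bar" }).sort({ C: -1 })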