Mongo query parse sorts keys in alphabetical order: is there any way to keep the order of the user's input?
Example :
db.user.explain().find({name: 'test user', active: true})
For the query above, mongo parses the filter into:
"$and" : [
{
"active" : {
"$eq" : true
}
},
{
"name" : {
"$eq" : "test user"
}
}
]
When parsing, mongo puts the "active" key first and "name" second.
I want the query to look at the "name" key first and "active" second, like:
"$and" : [
{
"name" : {
"$eq" : "test user"
}
},
{
"active" : {
"$eq" : true
}
}
]
Is there any setting or config for this?
As you have noticed, parsedQuery in explain() shows the fields in alphabetical order, but this order does not matter when there is no suitable index, because every document is loaded from storage into memory and evaluated anyway. Even if you renamed the fields to "aname" and "nactive", the execution time would be the same without an index. This is why it is important to create an index, and to make the order of your searched fields match the field order in that index. For better performance in your case, you may create an index on:
{name:1}
or
{name:1, active:1}
But since the field "active" looks like a boolean value with very low selectivity, adding it to the index may not make much difference to the search, unless the query is a "covered query" (one answered entirely from the index), e.g. db.test.find({name:"Test",active:true},{name:1,_id:0}), so that the search happens only in the in-memory index.
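As a rough illustration of the covered-query condition, the plain JavaScript below encodes the basic rule: every filtered field and every returned field must be in the index, and _id must be excluded unless the index contains it. The function isCoveredQuery is hypothetical, not a MongoDB API, and it deliberately ignores further restrictions such as multikey indexes. The authoritative check is to run the query with explain("executionStats") and look for totalDocsExamined: 0.

```javascript
// Simplified covered-query check (illustration only, not a MongoDB API).
// It ignores extra server-side restrictions such as multikey indexes.
function isCoveredQuery(indexFields, filter, projection) {
  const inIndex = (field) => indexFields.includes(field);

  // Every field used in the filter must be part of the index.
  const filterCovered = Object.keys(filter).every(inIndex);

  // Fields actually returned: truthy projection entries other than _id.
  const returned = Object.keys(projection).filter(
    (f) => f !== "_id" && projection[f]
  );
  const projectionCovered = returned.length > 0 && returned.every(inIndex);

  // _id is returned by default, so it must be excluded explicitly
  // unless the index itself contains _id.
  const idOk =
    projection._id === 0 || projection._id === false || inIndex("_id");

  return filterCovered && projectionCovered && idOk;
}
```

With an index on {name:1, active:1}, isCoveredQuery(["name", "active"], {name: "Test", active: true}, {name: 1, _id: 0}) is true, while dropping _id: 0 from the projection makes it false.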
Remember:
Reading from disk into memory is much more expensive than searching in memory, so even if your keys are intentionally renamed to satisfy the alphabetical order, there will be no benefit if you don't create an index and the mongod process performs a COLLSCAN over the full collection.
Related
In my application I need to load a lot of data, compare it to the existing documents in a specific collection, and version them.
To do this, for every new document I have to insert, I run a query that searches for the last version using a specific key (not _id): it groups the data together and finds the last version.
Example of data:
{
"_id" : ObjectId("5c73a643f9bc1c2fg4ca6ef5"),
"data" : {
the data
},
"key" : {
"value1" : "545454344",
"value2" : "123212321",
"value3" : "123123211"
},
"version" : NumberLong("1")
}
As you can see, key is composed of three values related to the data, and my query to find the last version is the following:
db.collection.aggregate([
// newest version first
{
"$sort" : {
"version" : NumberInt("-1")
}
},
// one group per key; the pushed arrays keep the sorted (newest-first) order
{
"$group" : {
"_id" : "$key",
"content" : {
"$push" : "$data"
},
"version" : {
"$push" : "$version"
},
"_oid" : {
"$push" : "$_id"
}
}
},
// keep only element 0 of each array, i.e. the newest version
{
"$project" : {
"data" : {
"$arrayElemAt" : [
"$content",
NumberInt("0")
]
},
"version" : {
"$arrayElemAt" : [
"$version",
NumberInt("0")
]
},
"_id" : {
"$arrayElemAt" : [
"$_oid",
NumberInt("0")
]
}
}
}
])
To improve performance (from exponential to linear), I built an index that holds key and version:
db.getCollection("collection").createIndex({ "key": 1, "version" : 1})
So my question is: are there other capabilities or strategies to optimize this search?
Notes
This collection has some other fields that I already use to filter the data in a $match stage; they are omitted for brevity.
My prerequisite is to load a lot of data and process it one document at a time before inserting; if there is a better approach to calculating the version, I can also consider changing this.
I'm not sure whether a unique index could do the same job as my query. I mean, if I create a unique index on key and version, I would get uniqueness on that pair and could iterate on it, for example:
no data in the collection: just insert the first version
inserting a new document: try to insert version 1, get an error, and iterate; this should hit the unique index, right?
I had similar situation and this is how I solved it.
Create a separate collection that will hold each Key and its latest version, say KeyVersionCollection
Make this collection "InMemory" for faster responses
Store Key in "_id" field
When inserting document in your versioned collection, say EntityVersionedCollection
Query latest version from KeyVersionCollection
Increment the version number by 1, or insert a new document with version 0, in KeyVersionCollection
You can even combine the above 2 operations into 1 (https://docs.mongodb.com/manual/reference/method/db.collection.findAndModify/#db.collection.findAndModify)
Use new version number to insert document in EntityVersionedCollection
This saves the time spent on aggregation and sorting. On a side note, I would keep the latest versions in a separate collection, EntityCollection. In this case, for each entity, insert a new version in EntityVersionedCollection and upsert it in EntityCollection.
In corner cases, where the process is interrupted between getting a new version number and using it to insert the entity, you might see a skipped version in EntityVersionedCollection, but that should be OK. Use timestamps to track inserts and updates so that they can be used to correlate or audit in the future.
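The counter-collection steps above can be sketched in plain JavaScript as below. KeyVersionStore and nextVersion are hypothetical names, and a Map stands in for KeyVersionCollection, so this shows only the bookkeeping, not the atomicity across concurrent writers that findAndModify provides. In the shell, the combined step would typically be a single findOneAndUpdate (or findAndModify) on { _id: key } with $inc on the version and upsert enabled; note that an upserted $inc starts the counter at 1 rather than 0.

```javascript
// In-memory model of the KeyVersionCollection pattern (illustration only).
// The Map key plays the role of "_id" (the entity key) and the value is
// the latest version number handed out for that key.
class KeyVersionStore {
  constructor() {
    this.versions = new Map();
  }

  // Returns the version number to use for the next insert of `key`.
  // Following the steps above, a key seen for the first time starts at 0;
  // afterwards the stored number is incremented by 1.
  nextVersion(key) {
    const id = JSON.stringify(key);
    const next = this.versions.has(id) ? this.versions.get(id) + 1 : 0;
    this.versions.set(id, next);
    return next;
  }
}
```

Each new entity document would then be inserted into EntityVersionedCollection with the number returned by nextVersion, with no aggregation or sort needed.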
Hope that helps.
You can simply pass an array to the MongoDB insert function, and it should insert the entire JSON payload without any memory problems.
You're welcome
I have a collection with millions of records. I am trying to implement an autocomplete on a field called term that I broke down into an array of words called words. My query is very slow because I am missing something with regards to the index. Can someone please help?
I have the following query:
db.vx.find({
semantic: "product",
concept: true,
active: true,
$and: [ { words: { $regex: "^doxycycl.*" } } ]
}).sort({ length: 1 }).limit(100).explain()
The explain output says that no index was used even though I have the following index:
{
"v" : 1,
"key" : {
"words" : 1,
"active" : 1,
"concept" : 1,
"semantic" : 1
},
"name" : "words_1_active_1_concept_1_semantic_1",
"ns" : "mydatabase.vx"
}
You can check whether the compound index is actually used by running this in the mongo shell:
db.vx.find({YOURQUERY}).explain('executionStats')
and checking the stages under queryPlanner.winningPlan (the top stage is often FETCH, SORT or LIMIT, with the index stage nested in its inputStage):
COLLSCAN means no index was used and the whole collection was scanned.
IXSCAN means an index was used for this query.
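Because the index stage is usually nested, a small helper that walks the whole winning plan is handier than reading only winningPlan.stage. planStages and usesIndex below are hypothetical helpers that walk the inputStage / inputStages fields of explain output; the exact shape varies by server version (servers using the slot-based engine nest the tree under winningPlan.queryPlan, which the sketch assumes), and only IXSCAN is treated as an index stage.

```javascript
// Collects every stage name in a plan tree, following the nested
// inputStage (single child) and inputStages (several children) fields.
function planStages(plan) {
  const stages = [];
  const visit = (node) => {
    if (!node) return;
    stages.push(node.stage);
    visit(node.inputStage);
    (node.inputStages || []).forEach(visit);
  };
  visit(plan);
  return stages;
}

// True if an IXSCAN stage appears anywhere in the winning plan.
function usesIndex(explainOutput) {
  const winningPlan = explainOutput.queryPlanner.winningPlan;
  // Assumption: slot-based-engine explain output nests the tree under queryPlan.
  const root = winningPlan.queryPlan || winningPlan;
  return planStages(root).includes("IXSCAN");
}
```

Pasted into the shell, it could be called as usesIndex(db.vx.find({YOURQUERY}).explain('executionStats')).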
You can also check whether text search fits your needs, since it is much faster than the $regex operator.
https://comsysto.com/blog-post/mongodb-full-text-search-vs-regular-expressions
If I have a document as follows:
{
"_id" : ObjectId("54986d5531a011bb5fb8e0ee"),
"owner" : "54948a5d85f7a9527a002917",
"type" : "group",
"deleted" : false,
"participants" : {
"54948a5d85f7a9527a002917" : {
"last_message_id" : null
},
"5491234568f7a9527a002917" : {
"last_message_id" : null
},
"1234567aaaa7a9527a002917" : {
"last_message_id" : null
}
}
}
How do I do a simple filter for all documents that have participant "54948a5d85f7a9527a002917"?
Thanks
Trying to query structures like this does not work well. There is a whole host of problems with modelling like this, but the clearest problem is using "data" as the names for "keys".
Try to think a little like an RDBMS, at least in terms of the limitations on what a database cannot or should not do. You wouldn't design a "table" in a schema that had something like "54948a5d85f7a9527a002917" as a "column" name, now would you? But this is essentially what you are doing here.
MongoDB can query this, but not in an efficient way:
db.collection.find({
"participants.54948a5d85f7a9527a002917": { "$exists": true }
})
Naturally this looks for the "presence" of a key in the data. While this query form is available, it cannot make efficient use of indexes where they exist, because indexes apply to "data" and not to "key" names.
A better structure and approach is this:
{
"_id" : ObjectId("54986d5531a011bb5fb8e0ee"),
"owner" : "54948a5d85f7a9527a002917",
"type" : "group",
"deleted" : false,
"participants" : [
{ "_id": "54948a5d85f7a9527a002917" },
{ "_id": "5491234568f7a9527a002918" },
{ "_id": "1234567aaaa7a9527a002917" }
]
}
Now the "data" you are looking for is actual "data" associated with a "key" (possibly), inside an array that binds it to the parent object. This is much more efficient to query:
db.collection.find({
"participants._id": "54948a5d85f7a9527a002917"
})
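A one-off migration from the object-keyed layout to the array layout could look like the plain JavaScript below. toParticipantArray and hasParticipant are hypothetical helpers, and carrying extra fields such as last_message_id into each array element is an assumption about what you want to keep. In the shell, the same transformation could be applied to each document of a cursor before writing it back, and an index on "participants._id" would be a multikey index that the query above can use.

```javascript
// Converts { participants: { "<id>": { ... } } } into
// { participants: [ { _id: "<id>", ... } ] }. Returns a new document
// and leaves the input unchanged; per-participant fields are carried over.
function toParticipantArray(doc) {
  const participants = Object.entries(doc.participants || {}).map(
    ([id, info]) => ({ _id: id, ...info })
  );
  return { ...doc, participants };
}

// With the array layout, "has participant X" is a plain value test,
// mirroring the query { "participants._id": X }.
function hasParticipant(doc, participantId) {
  return doc.participants.some((p) => p._id === participantId);
}
```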
It's much better to model it that way than the way you are presently doing it, and it makes more sense for consuming the objects.
BTW. It's probably just cut and paste in your question but you cannot possibly duplicate keys such as "54948a5d85f7a9527a002917" as you have. That is a basic hash rule that is being broken there.
Is it possible to use ensureIndex within records and not for the whole collection?
E.g. my database structure is:
{ "_id" : "com.android.hello",
"rating" : [
[ { "user" : "BBFE7F461E10BEE10A92784EFDB", "value" : "4" } ],
[ { "user" : "BBFE7F461E10BEE10A92784EFDB", "value" : "4" } ]
]
}
It is a rating system and I don't want a user to rate the same application (com.android.hello) multiple times. If I use ensureIndex on the user field, then a user is able to vote on only one application. When I try to vote on a different application altogether (com.android.hi), it says duplicate key.
No, you cannot do this. Uniqueness is only enforced across documents, not within a single document's array. You will need to redesign your schema for the above to work. For example:
{
"_id" : "com.android.hello",
"rating": {
"user" : "BBFE7F461E10BEE10A92784EFDB",
"value" : "4"
}
}
And then just store multiple...
(I realize you didn't provide the full document though)
ensureIndex creates indexes that apply to the whole collection. If you want uniqueness for only a few records, you may have to keep two collections and apply ensureIndex to one of them.
As @Derick said, no; however, it is possible to make sure a user can only vote once, atomically:
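var res = db.votes.update(
// match the app only if this user has not rated it yet
{_id: 'com.android.hello', 'rating.user': {$nin: ['BBFE7F461E10BEE10A92784EFDB']}},
// append the new rating
{$push: {rating: {user: 'BBFE7F461E10BEE10A92784EFDB', value: 4}}},
// create the app document if it does not exist yet
{upsert: true}
);
// WriteResult: a vote was recorded if a document was inserted or modified
if (res.nUpserted > 0 || res.nModified > 0) {
print('voted');
} else {
print('nope');
}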
I was a bit concerned that $push would not work with upsert, but I tested it and it works.
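The three outcomes of that conditional upsert can be modelled in plain JavaScript as below. vote is a hypothetical helper and a Map stands in for the votes collection, keyed by application _id. When the user has already rated, the filter matches nothing, so the upsert tries to insert a second document with the same _id and the unique _id index rejects it; either way no vote is recorded.

```javascript
// In-memory model of the conditional upsert (illustration only).
// Returns true if a vote was recorded, false otherwise.
function vote(votes, appId, user, value) {
  const doc = votes.get(appId);

  if (doc === undefined) {
    // No document matched: the upsert inserts one containing the rating.
    votes.set(appId, { _id: appId, rating: [{ user, value }] });
    return true;
  }

  if (doc.rating.some((r) => r.user === user)) {
    // The $nin condition fails, so nothing matches; the upsert's insert
    // would collide with the existing _id and fail with a duplicate key.
    return false;
  }

  // The filter matched: $push appends the new rating.
  doc.rating.push({ user, value });
  return true;
}
```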
I remember reading somewhere that the mongo engine is more comfortable when the entire structure of a document is already in place in case of an update, so here is the question.
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
{
_id: ObjectId("5192b6072fda974610000005"),
description: ""
}
or
{
_id: ObjectId("5192b6072fda974610000005"),
description: null
}
or
{
_id: ObjectId("5192b6072fda974610000005")
}
Keep in mind that the description field may or may not be filled in for every document (based on user input).
Introduction
If a document doesn't have a value for a field, the DB treats that field as null in queries. Suppose a database with the following documents:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
If you create a query to find documents whose desc field is not null, you will get just one document:
db.test.find({desc: {$ne: null}})
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The database doesn't distinguish between documents without a desc field and documents whose desc field is null. One more test:
db.test.find({desc: null})
// Output:
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
But the differences are only ignored in queries: as the last example above shows, the fields are still saved on disk, and you receive documents with the same structure as the documents that were sent to MongoDB.
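The query behaviour above can be summarised as three predicates in plain JavaScript. matchesNull, matchesNotNull and fieldExists are hypothetical helpers that model {desc: null}, {desc: {$ne: null}} and {desc: {$exists: true}} for the three example documents; the sketch ignores the deprecated BSON undefined type.

```javascript
// Models { field: null }: matches an explicit null and a missing field.
function matchesNull(doc, field) {
  return !(field in doc) || doc[field] === null;
}

// Models { field: { $ne: null } }: the negation of the above.
function matchesNotNull(doc, field) {
  return !matchesNull(doc, field);
}

// Models { field: { $exists: true } }: presence only, so it separates
// an explicit null from a missing field.
function fieldExists(doc, field) {
  return field in doc;
}
```

Applied to the three documents, matchesNull selects the null and the missing-field documents, matchesNotNull selects only the empty string, and fieldExists selects the empty string and the explicit null.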
Question
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
There isn't much difference between {desc: null} and {}, because most operators give the same result. You should only pay special attention to these two operators:
$exists
$type
I'd save documents without the desc field, because the operators keep working as expected and it saves some space.
Padding factor
If you know the documents in your database grow frequently, MongoDB might need to move documents during updates, because there isn't enough space left where the document was stored. To avoid moving documents around, MongoDB allocates extra space for each document. (This applies to the MMAPv1 storage engine; WiredTiger, the default since MongoDB 3.2, does not use a padding factor.)
The amount of extra space allocated by MongoDB per document is controlled by the padding factor. You cannot (and don't need to) choose the padding factor, because MongoDB learns it adaptively, but you can help MongoDB preallocate space for each document by filling the possible future fields with null values. The difference is very small (depending on your application) and might be even smaller after MongoDB learns the best padding factor.
Sparse indexes
This section isn't too important to your specific problem right now, but may help you when you face similar problems.
If you create a unique index on field desc, you can't save more than one document with the same value, and the database above already has more than one document with the same value in desc (both the explicit null and the missing field are indexed as null). Let's try to create a unique index on that database and see what error we get:
db.test.ensureIndex({desc: 1}, {unique: true})
// Output:
{
"err" : "E11000 duplicate key error index: test.test.$desc_1 dup key: { : null }",
"code" : 11000,
"n" : 0,
"connectionId" : 3,
"ok" : 1
}
If we want to create a unique index on a field while letting some documents leave that field out, we should create a sparse index. Let's try to create the unique index again:
// No errors this time:
db.test.ensureIndex({desc: 1}, {unique: true, sparse: true})
So far, so good, but why am I explaining all this? Because there is an obscure behaviour of sparse indexes. In the following query, we expect to get ALL documents sorted by desc.
db.test.find().sort({desc: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The result seems weird. What happened to the missing document? Let's try the query without sorting:
db.test.find()
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
All documents were returned this time. What's happening? It's simple, but not obvious. When we sort the result by desc, the sparse index created earlier is used, and it has no entries for the documents that don't have the desc field. The following query shows the index being used to sort the result:
db.test.find().sort({desc: 1}).explain().cursor
// Output:
"BtreeCursor desc_1"
We can skip the index using a hint:
db.test.find().sort({desc: 1}).hint({$natural: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
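The missing document can be explained with a simplified model in plain JavaScript: a sparse index only holds documents where the field exists (an explicit null counts as existing), so a sort served from it can only return those. sortViaSparseIndex and its helpers are hypothetical, and the comparator only handles the two value types in this example (MongoDB orders null before strings). According to the current MongoDB documentation on sparse indexes, newer servers avoid a sparse index for a query and sort that would give incomplete results unless it is explicitly hinted, so this output reflects the older server the example was run on.

```javascript
// Entries a sparse index would hold: documents where the field exists.
function sparseIndexEntries(docs, field) {
  return docs.filter((doc) => field in doc);
}

// Minimal ordering for this example only: null sorts before strings.
function compareValues(a, b) {
  const rank = (v) => (v === null ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

// A sort answered from the sparse index returns only the indexed documents.
function sortViaSparseIndex(docs, field) {
  return sparseIndexEntries(docs, field).sort((x, y) =>
    compareValues(x[field], y[field])
  );
}
```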
Summary
A sparse unique index still rejects duplicates among documents that store {desc: null}
A sparse unique index still rejects duplicates among documents that store {desc: ""}
Sparse indexes might change the result of a query
There is little difference between a field with a null value and a document without the field. The main difference is that the former consumes a little disk space, while the latter consumes none. They can be distinguished using the $exists operator.
A field with an empty string is quite different from both. Though it depends on the purpose, I don't recommend using it as a replacement for null. To be precise, they should be used to mean different things. For instance, think about voting. A person who cast a blank ballot is different from a person who wasn't permitted to vote. The former vote is an empty String, while the latter vote is null.
There is already a similar question here.