MongoDB not using Index on simple find

I have a collection called "EN" and I created an index as follows:
db.EN.createIndex( { "Prod_id": 1 } );
When I run db.EN.getIndexes() I get this:
[{ "v": 2, "key": {
"_id": 1 }, "name": "_id_" }, { "v": 2, "key": {
"Prod_id": 1 }, "name": "Prod_id_1" }]
However, when I run the following query:
db.EN.find({'Icecat-interface.Product.#Prod_id':'ABCD'}).explain()
I get this:
{ "explainVersion": "1", "queryPlanner": {
"namespace": "Icecat.EN",
"indexFilterSet": false,
"parsedQuery": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"queryHash": "D12BE22E",
"planCacheKey": "9F077ED2",
"maxIndexedOrSolutionsReached": false,
"maxIndexedAndSolutionsReached": false,
"maxScansToExplodeReached": false,
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"direction": "forward"
},
"rejectedPlans": [] }, "command": {
"find": "EN",
"filter": {
"ICECAT-interface.Product.Prod_id": "ABCD"
},
"batchSize": 1000,
"projection": {},
"$readPreference": {
"mode": "primary"
},
"$db": "Icecat" }, "serverInfo": {
It's using COLLSCAN instead of the index. Why is this happening?
MongoDB version is 5.0.9-8
Thanks
EDIT (and solution)
It turns out that the field name has a "#" in front, and the index was created without this character, so it was not picking it up at all.
Once I created a new index using the field name as it actually appears, it worked OK.
It was interesting, though, to see how indexing works and to learn some best practices.
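Roughly, the fix looked like this (a sketch; the field path is the one from the original find):

// Index the field name exactly as it appears in the documents,
// including the leading "#":
db.EN.createIndex({ "Icecat-interface.Product.#Prod_id": 1 });

// The same find can now use an IXSCAN instead of a COLLSCAN:
db.EN.find({ "Icecat-interface.Product.#Prod_id": "ABCD" }).explain();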

Your find operation is defined as
.find({'Icecat-interface.Product.#Prod_id':'ABCD'})
What is Icecat-interface.Product.#?
The parsedQuery in the explain output confirms that MongoDB is attempting to look for a document that has a value of "ABCD" for a different field name than the one you have indexed. From the explain you've provided, that field name is "ICECAT-interface.Product.Prod_id". As the field name being queried and the one that is indexed are different, MongoDB cannot use the index to perform the operation.
Marginally related: the # character that is used in the find is absent in the explain output. This appears to be because the actual operation that was used to generate the explain was slightly different. This is also noticeable from the fact that the explain includes a batchSize of 1000 which is absent in the operation that was shown as the one being explained.
Depending on what the Icecat-interface.Product.# prefix is supposed to be, the solution is probably to simply remove that from the query predicate in the find itself.
Edit to respond to the comment and the edit to the question. Regarding the comment first:
When I run this: .find({'Prod_id':'ABCD'}) it uses COLLSCAN which to me is wrong, as I have an index on that field, unless I'm missing something here
MongoDB will look to use an index if its first key is used by the query. So an index on { y: 1 } would not be eligible for use by a query of .find({ x: 1 }). Just as in the generic x and y example, Icecat-interface.Product.Prod_id and Prod_id are different field names. So if you query on one but only an index on the other exists, then a collection scan is the only way for the database to execute the query.
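To make that concrete, here is a quick sketch (field names borrowed from the question; the plan stages are what you would expect to see):

// With only this index in place:
db.EN.createIndex({ "Prod_id": 1 })

// ...a query on the indexed field name can use it:
db.EN.find({ "Prod_id": "ABCD" }).explain()                           // winningPlan: IXSCAN
// ...but a query on a different (dotted) field name cannot:
db.EN.find({ "ICECAT-interface.Product.Prod_id": "ABCD" }).explain()  // winningPlan: COLLSCAN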
This overlaps somewhat with the edit to the question. In the edited question, the new explain plan shows the database successfully using an index. However, that index is { "ICECAT-interface.Product.Prod_id": 1 }, which is not the index that you originally showed being created or present on the collection ({ "Prod_id": 1 }).
Moreover, you also mention that you "don't get any result back, even with products I know are in the DB". Which field in the database contains the value that you are searching on ('ABCD')? This is going to directly inform what results you get back and what index is used to find the results. Remember that you can search on any arbitrary field in MongoDB, even if it doesn't exist in the database.
I would recommend some extra attention be paid to the namespaces and field names that are being used. Unless this { "ICECAT-interface.Product.Prod_id": 1 } index was created after the db.EN.getIndexes() output was gathered, you may be inadvertently connecting to different systems or namespaces since that index is definitely present somewhere.
Based on your live comments while I'm writing this, seems like you've solved the field name mystery.

Related

How does 'fuzzy' work in MongoDB's $searchBeta stage of aggregation?

I'm not quite understanding how fuzzy works in the $searchBeta stage of aggregation. I'm not getting the desired result that I want when I'm trying to implement full-text search on my backend. Full text search for MongoDB was released last year (2019), so there really aren't many tutorials and/or references to go by besides the documentation. I've read the documentation, but I'm still confused, so I would like some clarification.
Let's say I have these 5 documents in my db:
{
"name": "Lightning Bolt",
"set_name": "Masters 25"
},
{
"name": "Snapcaster Mage",
"set_name": "Modern Masters 2017"
},
{
"name": "Verdant Catacombs",
"set_name": "Modern Masters 2017"
},
{
"name": "Chain Lightning",
"set_name": "Battlebond"
},
{
"name": "Battle of Wits",
"set_name": "Magic 2013"
}
And this is my aggregation in MongoDB Compass:
db.cards.aggregate([
  {
    $searchBeta: {
      search: {   // search has been deprecated, but it works in MongoDB Compass; replace with 'text'
        query: 'lightn',
        path: ["name", "set_name"],
        fuzzy: {
          maxEdits: 1,
          prefixLength: 2,
          maxExpansion: 100
        }
      }
    }
  }
]);
What I'm expecting my result to be:
[
{
"name": "Lightning Bolt", //lightn is in 'Lightning'
"set_name": "Masters 25"
},
{
"name": "Chain Lightning", //lightn is in 'Lightning'
"set_name": "Battlebond"
}
]
What I actually get:
[] //empty array
I don't really understand why my result is empty, so it would be much appreciated if someone explained what I'm doing wrong.
What I think is happening:
db.cards.aggregate... is looking in the "name" and "set_name" fields for words that are at most one character edit away from the "lightn" query. The words in the cards collection are more than one edit away from "lightn" ("lightning" alone needs three more characters), and therefore your expected result is an empty array. "Fuzzy is used to find strings which are similar to the search term or terms"; it is used with maxEdits and prefixLength.
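To illustrate the edit-distance limit, a term within one edit of an indexed word should match (a sketch, assuming the same collection and a working search index):

db.cards.aggregate([
  { $searchBeta: {
      search: {                // deprecated in favour of 'text', as noted above
        query: 'lightnin',     // one insertion away from 'lightning', so within maxEdits: 1
        path: ["name", "set_name"],
        fuzzy: { maxEdits: 1, prefixLength: 2 }
      }
  }}
])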
Have you tried the term operator with the wildcard option? I think the below aggregation would get you the results you were actually expecting.
e.g.
db.cards.aggregate([
  {
    $searchBeta: {
      "term": {
        "path": ["name", "set_name"],
        "query": "l*h*",
        "wildcard": true
      }
    }
  }
]).pretty()
You need to provide an index to use with your search query.
The index is basically the analyzer that your query will use to process your results, and it determines whether you get a full match of the text, a partial match, etc.
You can read more about analyzers here.
In your case, an index based on the STANDARD analyzer will help.
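For reference, a minimal Atlas Search index definition using the standard analyzer might look like the following (a sketch; the index is created through the Atlas UI or API, and whatever name you give it is what you pass as index in the query):

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true
  }
}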
After you create your index your code, modified below, will work:
db.cards.aggregate([
  {
    $search: {
      // 'index' names the Atlas Search index whose analyzer you configured;
      // note it sits at the $search level, not inside the 'text' operator
      index: 'index_name_for_analyzer (STANDARD in your case)',
      text: {   // search has been deprecated, but it works in MongoDB Compass; replace with 'text'
        query: 'lightn',
        path: ["name"],   // since you only want to search in one field
        fuzzy: {
          maxEdits: 1,
          prefixLength: 2,
          maxExpansions: 100   // the option name is maxExpansions
        }
      }
    }
  }
]);

How to maintain uniqueness based on a particular field in an array without using a unique index

I have documents like this:
[{
  "_id": ObjectId("aaa"),
  "host": "host1",
  "artData": [
    { "aid": "56004721", "accessMin": NumberLong(1481862180) },
    { "aid": "56010082", "accessMin": NumberLong(1481861880) },
    { "aid": "55998802", "accessMin": NumberLong(1481861880) }
  ]
},
{
  "_id": ObjectId("bbb"),
  "host": "host2",
  "artData": [
    { "aid": "55922560", "accessMin": NumberLong(1481862000) },
    { "aid": "55922558", "accessMin": NumberLong(1481861880) },
    { "aid": "55940094", "accessMin": NumberLong(1481861760) }
  ]
}]
While updating any document, a duplicate "aid" should not be added to the array again.
One option I found is a unique index on the artData.aid field, but building an index is not preferred since I won't need it for anything else.
Is there any way to solve this?
Option 1: While designing the schema for that document, use unique: true.
For example:
var newSchema = new Schema({
  artData: [{
    aid: { type: String, unique: true },
    accessMin: Number
  }]
});
module.exports = mongoose.model('newSchema', newSchema);
Option 2: refer to this link on avoiding duplicates
As per this doc, you may use a multikey index as follows:
{ "artData.aid": 1 }
That being said, since you don't want to use a multikey index, another option for insertion is to:
1. Query the document to find the artData entries that match the aids you are about to insert
2. Diff that result set against the set you are about to insert
3. Remove from your insert set the items that matched in step 1
4. Insert the remaining items
Ideally your query from step 1 won't return a set that is too large, making this a surprisingly fast operation. That said, it really depends on the number of duplicates you expect to insert. If the number is really high, the query from step 1 could return a large set of items, in which case this solution may not be appropriate, but it's all I've got for you =(.
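A rough sketch of those steps in the shell (collection and field names are taken from the question; the incoming data is made up for illustration):

// Items we are about to insert (hypothetical):
var incoming = [
  { aid: "56004721", accessMin: NumberLong(1481862300) },  // duplicate aid
  { aid: "99999999", accessMin: NumberLong(1481862400) }   // new aid
];

// Step 1: find which incoming aids already exist for this host
var doc = db.collection.findOne(
  { host: "host1", "artData.aid": { $in: incoming.map(function (a) { return a.aid; }) } },
  { "artData.aid": 1 }
);
var existing = doc ? doc.artData.map(function (a) { return a.aid; }) : [];

// Steps 2-3: drop the items whose aid is already present
var toInsert = incoming.filter(function (a) { return existing.indexOf(a.aid) === -1; });

// Step 4: push only the remaining items
if (toInsert.length) {
  db.collection.update(
    { host: "host1" },
    { $push: { artData: { $each: toInsert } } }
  );
}

Note there is a race window between the read and the write if multiple writers touch the same document concurrently.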
My suggestion is to really re-evaluate the reason for not using multikey indexing.

MongoDB: How to do a text search and sort by a date

Context: I have a MongoDB populated with a large number of emails. I'd like to search for all emails that include a given email address within any of the following fields: To, From, CC and BCC. The result needs to be sorted by the field Date. We're currently trying the following query:
db.collection.find({ $text: { $search: "\"email@domain.com\"" } }).sort({ Date: 1 })
I've tried doing a compound index including the date but it does not work.
With this index...
db.collection.createIndex({Date: 1, From:"text", To:"text", CC:"text", BCC:"text"})
it gives error 17007, because Date is the index prefix and would therefore need an equality match. This is not an option, as we'd like all emails regardless of the date.
Also with this other index...
db.collection.createIndex({From:"text", To:"text", CC:"text", BCC:"text", Date:1})
Then it gives error 17144 as it goes over the internal limit for the sort.
We've read the following:
Stackoverflow ref
Stackoverflow ref
mongoDB doc on compound index
In these references and others I'm getting the idea that this is not possible, but I don't think what we're trying to do is atypical or that unusual.
Are we doing something wrong? Is there a way to do this query with compound index or any other MongoDB feature?
thanks!
Regardless of other compound index keys, you need to include the $meta for the "textScore" in order to get the correct sorting:
db.collection.find(
    { "$text": { "$search": "\"email@domain.com\"" } },
    { "score": { "$meta": "textScore" } }
).sort({
    "score": { "$meta": "textScore" }, "Date": 1
})
So naturally you want that "score" to sort first, and then by "Date" in order for things to be correctly ranked by relevance of the search.
The order of the index keys does not matter, but of course you can only have "one" text index. So make sure you drop all others before creating it:
db.collection.createIndex({
"From": "text",
"To": "text",
"CC":"text",
"BCC": "text",
"Date":1
})
Look for indexes that are current with:
db.collection.getIndexes()
Or just drop everything and start fresh:
db.collection.dropIndexes()
For the data you appear to be searching on though, I would have thought a regular compound index on each field would suit you better. Looking for "email" addresses should be an "exact match", and if you expect multiple items for each field then they should be arrays of strings, like so:
{
    "TO": ["bill@example.com"],
    "FROM": ["ted@example.com"],
    "CC": ["marty@example.com", "sarah@example.com"],
    "BCC": [],
    "Date": ISODate("2015-07-27T13:42:05.535Z")
}
Then you need separate indexes on each field, possibly in compound with "Date", like so:
db.email.createIndex({ "TO": 1, "Date": 1 })
db.email.createIndex({ "FROM": 1, "Date": 1 })
db.email.createIndex({ "CC": 1, "Date": 1 })
db.email.createIndex({ "BCC": 1, "Date": 1 })
And query with an $or condition:
db.email.find({
    "$or": [
        { "TO": "sarah@example.com" },
        { "FROM": "sarah@example.com" },
        { "CC": "sarah@example.com" },
        { "BCC": "sarah@example.com" }
    ],
    "Date": { "$lt": new Date() }
})
If you look at the .explain(true) (verbose) output from that, you should see that the winning plan is an "index intersection" of all the specified indexes. This works out to be very efficient as every field ( and index selected ) has an exact match value, and a range match on the indexed date.
That's going to be a lot better for you than the "fuzzy matching" of text searches. Even regular expressions should generally work better here for e-mail addresses, especially when they are "anchored" with ^ to the start of the string.
Text indexes are meant to match "word like tokens", but this should not be your data. The $or does not look as nice, but it should do a much better job.
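For example, an anchored, case-sensitive regular expression on one of those indexed fields can still use the index bounds efficiently (a sketch using the hypothetical schema above):

// The ^ anchor lets MongoDB turn the regex into an index range scan
// on the { "TO": 1, "Date": 1 } index:
db.email.find({
    "TO": /^sarah@example\.com/,
    "Date": { "$lt": new Date() }
})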

MongoDB Compound Index to Optimize Update with Key and Range Condition

I have read this doc, which states that an index can optimize update operations. So I am adding an index to my collection to optimize the update operation I am using.
Records in the collection have object as _id, and a timestamp:
{_id: {userId: "sample"}, firstTimestamp: 123, otherField: "abc"}
What I want to do is run an update using the query below:
db.userFirstTimestamp.update(
    { _id: { userId: "sample" }, firstTimestamp: { $gt: 100 } },
    { _id: { userId: "sample" }, firstTimestamp: 100, otherField2: "efg" }
)
I want to store the 'first document' based on 'firstTimestamp'. The fields of the old and new documents can differ, hence it cannot be a $set query; it should rewrite the document instead. In the sample below, "otherField" should no longer exist; it should be "otherField2" instead.
Based on my understanding of the MongoDB doc and this article, I created the index as per below:
db.sample.createIndex({_id:1, timestamp:1})
Then I try to benchmark the query on an isolated experimental node using MongoDB 3.0.4 with the spec below:
MongoDB 3.0.4
Machine is empty, no other operations, only mongo
RAM ~30GB
Disk is RAID 0 striped
Collection has 60 million records
Average object size 1001 bytes
Index size 5.34 GB
When I check the log, many update queries take more than 100 ms, and when I run mongotop, the top entry is the write query at ~1000 ms. That is quite slow for a single query.
When I run mongostat, throughput is only 400-500 queries per second.
Then I try to explain the query using an equivalent find (since update does not support explain):
When I am not using a projection, it uses the default {_id:1} index.
When I project only _id and timestamp, it uses the {_id:1, timestamp:1} index.
My question is:
Does the index I have created help this update query?
If it is not helping, what should the index look like?
Is there any other way to optimize this update query?
Somewhat, but not optimally.
It should really be this, i.e. an index on the "element" of the object in the _id key, together with the field the queries actually use:
db.sample.createIndex({ "_id.userId": 1, "firstTimestamp": 1 })
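To verify the index is eligible, you can explain an equivalent find for the update's query portion (update itself cannot be explained, as you note; a sketch):

// The winning plan should now show an IXSCAN over the new index:
db.sample.find({
    "_id.userId": "sample",
    "firstTimestamp": { "$gt": 100 }
}).explain()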
Use the $set operator and stop overwriting your documents:
db.sample.update(
{
"_id.userId": "sample",
"firstTimestamp": { "$gt": 100 }
},
{
"$set": { "otherfield": "cfg" }
}
)
But really your data "should" look like this:
{
"_id": "sample",
"firstTimestamp": 200,
"otherfield2": "sam"
}
And update like:
db.sample.update(
{
"_id.userId": "sample",
"firstTimestamp": { "$gt": 100 }
},
{
"$set": {
"fistTimetamp": 100,
"otherfield2": "efg"
}
}
)
Or if you insist that fields other than "_id" and "firstTimestamp" are going to change a lot, then rather do this:
{
"_id": "sample",
"firstTimestamp": 200,
"data": {
"otherfield2": "sam"
}
}
Then if you just want to replace the data, do:
db.sample.update(
{
"_id.userId": "sample",
"firstTimestamp": { "$gt": 100 }
},
{
"$set": {
"fistTimetamp": 100,
"data": {
"overwritingField": "efg"
}
}
}
)
Since "data" can be replaced as an entire object if you wish, or just update a single key:
db.sample.update(
{
"_id.userId": "sample",
"firstTimestamp": { "$gt": 100 }
},
{
"$set": {
"fistTimetamp": 100,
"data.newfield": "efg"
}
}
)
In all cases, try to use the operators rather than replacing the whole object, as that typically means more traffic and more load on the server.
But overall, what makes sense here is that the "userId" part "should" be the portion of the index that narrows down the results the most. So it definitely goes before the timestamp, which will have many more possible values.
Compound primary keys are fine, but make sure you actually use them. A singular value would not make any sense and could just be assigned to _id. If you can query on just one field of the key, as you are here, then you probably don't need a compound object as the primary key.
Your _id in the update suggests that you are getting exact matches for the _id, therefore it is not a compound field with other keys. With this being the case, it should just be a value in the _id itself.
Also a "range" is okay, but again consider that you are trying to match a single document ( well you don't mention "multi" anywhere ), so again questin why is it needed and either then go for an exact match or at "least" an upper limit.
The $set will "only" update the fields that you specifiy. I think you made a mistake in typing your question though, as the syntax for the "update" portion would not be valid. But use update operators anyway, as they send less traffic by sending a single field, or just the fields you intend to update.

Moving MongoDB to Multikeys but indexOnly is returning false

What I'm trying to do sounds logical to me however I'm not sure.
I am trying to improve part of a MongoDB collection by using Multikeys.
For example: I have multiple documents with the following format:
Document:
{
"_id": ObjectId("528a4177dbcfd00000000013"),
"name": "Shopping",
"tags": [
"retail",
"shop",
"shopping",
"store",
"grocery"
]
}
Query:
Up until now, I have been using the following query to match the tags field.
var tags = Array("store", "shopping", "etc");
db.collection.findOne({ 'tags': { $in: tags } }, { name: true });
This has been working well; however, I think multikeys should be used in this instance to improve speed & performance. Please correct me if I am wrong!
Indexing:
I issued the following command in an attempt to index the tags.
db.collection.ensureIndex( { tags: 1 }, { safe: true }, function(err, doc) {} );
ensureIndex was successful.
Result:
However when using RockMongo's explain feature on the above query, the result is:
{
"indexOnly": false,
"indexBounds": {
"tags": [
[
"etc",
"etc"
],
[
"shopping",
"shopping"
],
[
"store",
"store"
]
]
}
}
Questions:
Why is indexing not working, is there something else I have to do?
Is Multikey indexing in this case beneficial? (I'm assuming yes.)
Is there another form of indexing that would be more beneficial?
Edit:
I've just noticed that in the RockMongo explain data there is a field:
"isMultiKey": true,
could it be that Multikeys are being used and I've completely misunderstood that it IS being indexed?
As you say in your edit, isMultiKey: true (which comes from the part of the explain output you did not post), along with other information on the cursor, shows that the index is being used. The indexBounds are another indicator.
What indexOnly describes is the fact that your query projects another field, name, which is not part of the index. When the query optimizer sees that all elements of the query can be satisfied using the fields within the index alone, this is referred to as a covered query, and the indexOnly property is set to true.
So in an ideal situation your query and results use the information from the index only, and MongoDB does not also have to fetch the document from the collection in order to return more data.
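For completeness, a covered query needs every queried and projected field to be in the index, with _id excluded, and note that multikey indexes over arrays cannot cover queries. So a covered query would only be possible on a scalar field here (a sketch using the question's collection):

// Index and query/projection restricted to the scalar "name" field:
db.collection.ensureIndex({ name: 1 })
db.collection.find(
    { name: "Shopping" },
    { name: 1, _id: 0 }
).explain()   // indexOnly: true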