MongoDB query getting slow even after indexing

I have a collection in my db, let's say Fruits, which has the following fields:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "location": "4th Street Store",
  "stock": 4,
  "type": "cases"
}
There is an index by default on _id, and I added another index, which is:
{
  "item": 1,
  "category": 1,
  "stock": 1,
  "type": 1
}
This collection has thousands of documents, and my query response is slow.
After adding the index I mentioned above, do I need to include all of these
checks in my query, or can I use any of the keys added in the index?
Currently my queries look like this:
fruits.find({item: 'new'});
fruits.find({item: 'new', category: 'history'});
fruits.find({stock: '5', category: 'drama'});
fruits.find({type: 'new'});
Is my index, which has all these keys, enough for this, or do I need to
create different indexes for all the combinations of keys I mentioned above?
Sometimes I run a plain query, and sometimes I run an aggregation on other collections with a $lookup into this fruits collection and then do the search there, etc.

{
  "item": 1,
  "category": 1,
  "stock": 1,
  "type": 1
}
This index will partially work for the following.
fruits.find({item: 'new'}); **Will work (Partially)**
fruits.find({item: 'new', category: 'history'}); **Will work (Partially)**
fruits.find({stock: '5', category: 'drama'}); **Won't work**
fruits.find({type: 'new'}); **Won't work**
Partially => The index is basically an entry in a B-Tree data structure in MongoDB which maps to a document in the system. The index prefix on item allows the index to work for the first and second queries you mentioned, but it would be a collection scan for the third and last ones.
Read about prefixes here.
You need to properly understand indexes in the long run; for specific queries you can seek help, but the knowledge gap will become a problem. This brief read will be really useful.
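If you want to verify this on your own data, a quick sketch (assuming the collection is named fruits and the index above exists) is to run explain() on each query and check whether the winning plan is an IXSCAN or a COLLSCAN:
// Quick check: IXSCAN means the index is used, COLLSCAN means a full collection scan.
db.fruits.find({ item: "new" }).explain("executionStats")                      // IXSCAN via the { item, category, stock, type } index
db.fruits.find({ item: "new", category: "history" }).explain("executionStats") // IXSCAN (uses the { item, category } prefix)
db.fruits.find({ stock: "5", category: "drama" }).explain("executionStats")    // COLLSCAN (query does not start with the "item" prefix)
db.fruits.find({ type: "new" }).explain("executionStats")                      // COLLSCAN (query does not start with the "item" prefix)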
Edit
Aggregation => It depends on the part of the pipeline: mostly only the $match stage can use an index, and everything after that happens in memory (check this for more details). For $lookup you fetch the data using an index on the other collection if you have one on it (again, for the match part), but whatever you do with that data afterwards is done in memory. Logically, fetching the data is mostly where indexes will be used anyway; for the sorting part, read the document linked above.
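For example, a sketch of the kind of pipeline this refers to (the orders collection, its fields, and the pipeline stages are illustrative, not from the question):
// Illustrative sketch only: "orders" and its fields are assumed.
// The leading $match can use an index on the orders collection (if one exists);
// the $lookup's equality match can use the { item, ... } index prefix on fruits,
// but everything after the lookup ($unwind, further matching, sorting) happens in memory.
db.orders.aggregate([
  { $match: { status: "open" } },                 // index-eligible stage
  { $lookup: {
      from: "fruits",
      localField: "item",
      foreignField: "item",                        // equality match can use the fruits index prefix
      as: "fruit"
  } },
  { $unwind: "$fruit" },
  { $match: { "fruit.stock": { $gt: 0 } } }        // done in memory after the lookup
])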

Related

Mongoose Model.deleteMany() only deletes first element of matches

I'm trying to use the Model.deleteMany() function from mongoose. I'm trying to do something like this:
MyModel.deleteMany({"_id": {$in: ['objectid 1', 'objectid 2'
But this only deletes the first element of the matches from the DB (in this case, if 'objectid 1' exists in the DB, it deletes that, but if it doesn't, nothing happens; both n and deletedCount are 0 in the data returned from the function). I tried using something other than the _id as a query, and this worked: if I had three elements with the same 'name' field, I could delete them all.
I tried _id both with and without quotation marks. I also tried converting the object id strings to actual object ids before passing them to deleteMany, but that made no difference either. I have also, of course, tried to google this, but everything I've found is usage examples, where it looks like I'm doing the exact same thing as the various blog posts.
I haven't added much code here because I don't really see what else I could add. I'm already printing out the input to the $in operator, and it is correct. The strangest thing, I think, is that the first element of the list is deleted. Is it treated as a deleteOne request for some reason? Are there any config options I need?
As per request, I've added the query and the documents I'd hope to delete:
//Request
MemberModel.deleteMany({"_id": {$in: [
  "5ee4f6308631dc413c7f04b4",
  "5ee4f6308631dc413c7f04b5",
  "5ee4f6308631dc413c7f04b6"
]}});
//Expected to be deleted
[
  {
    "_id": "5ee4f62f8631dc413c7f04b5",
    "firstName": "Name",
    "lastName": "Nameson",
    "__v": 0
  },
  {
    "_id": "5ee4f62f8631dc413c7f04b6",
    "firstName": "Other",
    "lastName": "Person",
    "__v": 0
  }
]
If you have any ideas for what I could try, that would be much appreciated.
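For reference, here is a sketch of the explicit conversion attempt described above (assuming Mongoose is already connected; the ids are the same ones as in the request), which also logs deletedCount so the result of the call can be compared against the stored documents:
const mongoose = require('mongoose');

// Sketch: cast the id strings to ObjectIds before passing them to deleteMany,
// as described in the question, and log how many documents were actually removed.
const ids = [
  '5ee4f6308631dc413c7f04b4',
  '5ee4f6308631dc413c7f04b5',
  '5ee4f6308631dc413c7f04b6'
].map(id => new mongoose.Types.ObjectId(id));

MemberModel.deleteMany({ _id: { $in: ids } })
  .then(result => console.log(result.deletedCount));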

mongo operation speed: array $addToSet/$pull vs object $set/$unset

I have an index collection containing lots of terms, and a field items containing identifiers from another collection. Currently that field stores an array of documents, and docs are added with $addToSet, but I have some performance issues. It seems an $unset operation executes faster, so I plan to change the array of documents to a document of embedded documents.
Am I right to think that $set/$unset of fields is faster than $push/$pull of embedded documents into arrays?
EDIT:
After some small tests, we see set/unset is about 4 times faster. On the other hand, if I use an object instead of an array, it's a little harder to count the number of properties (vs. the length of the array), and we were counting that a lot. But we can consider using $set every time and adding a field with the number of items.
This is a document of the current index:
{
  "_id": ObjectId("5594dea2b693fffd8e8b48d3"),
  "term": "clock",
  "nbItems": NumberLong("1"),
  "items": [
    {
      "_id": ObjectId("55857b10b693ff18948ca216"),
      "id": NumberLong("123")
    },
    {
      "_id": ObjectId("55857b10b693ff18948ca217"),
      "id": NumberLong("456")
    }
  ]
}
Frequent update operations are:
* remove item: {$pull:{"items":{"id":123}}}
* add item: {$addToSet:{"items":{"_id":ObjectId("55857b10b693ff18948ca216"),"id":123}}}
* I can change $addToSet to $push and check for duplicates beforehand if that performs better
And this is what I plan to do:
{
  "_id": ObjectId("5594dea2b693fffd8e8b48d3"),
  "term": "clock",
  "nbItems": NumberLong("1"),
  "items": {
    "123": {
      "_id": ObjectId("55857b10b693ff18948ca216")
    },
    "456": {
      "_id": ObjectId("55857b10b693ff18948ca217")
    }
  }
}
* remove item: {$unset:{"items.123":true}}
* add item: {$set:{"items.123":{"_id":ObjectId("55857b10b693ff18948ca216"),"id":123}}}
For information, these operations are made with pymongo (or could be done with PHP if there is a good reason to), but I don't think that is relevant.
As with any performance question, there are a number of factors which can come into play with an issue like this, such as indexes, the need to hit disk, etc.
That being said, I suspect you are likely correct that adding a new field to or removing an old field from a MongoDB document will be slightly faster than appending to or removing from an array, as the array types are less easy to traverse when searching for duplicates.
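For a rough comparison on your own data, a shell sketch along these lines can show the difference (collection and field names are made up here, and real numbers depend on document size, indexes, and the storage engine):
// Not a rigorous benchmark, just a rough shell-side comparison of the two shapes.
db.idx_test.insert({ term: "clock", items: [] });
var t0 = new Date();
for (var i = 0; i < 1000; i++) {
  db.idx_test.update({ term: "clock" }, { $addToSet: { items: { id: NumberLong(i) } } });
}
print("$addToSet into array: " + (new Date() - t0) + " ms");

db.idx_test.insert({ term: "clock2", items: {} });
var t1 = new Date();
for (var i = 0; i < 1000; i++) {
  var u = {};
  u["items." + i] = { id: NumberLong(i) };
  db.idx_test.update({ term: "clock2" }, { $set: u });
}
print("$set on object keys: " + (new Date() - t1) + " ms");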

Delete a MongoDB subdocument by value

I have a collection containing documents that look like this:
{
  "user": "foo",
  "topics": {
    "Topic AB": {
      "score": 20,
      "frequency": 3,
      "last_seen": 40
    },
    "Topic BD": {
      "score": 10,
      "frequency": 2,
      "last_seen": 38
    },
    "Topic TF": {
      "score": 19,
      "frequency": 6,
      "last_seen": 20
    }
  }
}
I want to remove subdocuments whose last_seen value is less than 30.
I don't want to use arrays here since I'm using $inc to update the subdocuments in conjunction with upsert (which doesn't support the $ notation).
The real question here is how can I delete a key depending on its value. Using $unset simply drops a subdocument regardless of what it contains.
I'm afraid I don't think this is possible with your current design. Knowing the name of the key whose last_seen value you wish to test, for example Topic TF, you can do
> db.topics.update({"topics.Topic TF.last_seen" : { "$lt" : 30 }},
{ "$unset" : { "topics.Topic TF" : 1} })
However, with an embedded document structure, if you don't know the name of the key that you want to query against then you can't run the query. If the Topic XX keys are only known by what's in the document, you'd have to pull the whole document to find out what keys to test, and at that point you ought to just manipulate the document client-side and then update by _id.
The best option is to use arrays. The $ positional operator works with upserts; it just has a serious gotcha that, in the case of an insert, the $ will be interpreted as part of the field name instead of as an operator, so I understand why you concluded it doesn't seem feasible. I'm not quite sure how you are using upsert such that arrays seem like they won't work, though. Could you give more detail there, and I'll try to help come up with a reasonable workaround to use arrays and $ with your use case?
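To make the array suggestion concrete, here is a sketch of what the document and the delete-by-value update could look like with topics stored as an array (the "name" field is an assumption, added so the topic key stays queryable):
// Sketch of the array-based shape suggested above:
// { "user": "foo",
//   "topics": [
//     { "name": "Topic AB", "score": 20, "frequency": 3, "last_seen": 40 },
//     { "name": "Topic TF", "score": 19, "frequency": 6, "last_seen": 20 }
//   ] }
// Removing every topic whose last_seen is below 30 is then a single $pull:
db.topics.update(
  { },
  { "$pull": { "topics": { "last_seen": { "$lt": 30 } } } },
  { "multi": true }
)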

geospatial index with another multikey index... Any solutions?

I have a collection like the one below. I want to index "location" and "product_list.id". MongoDB seems to permit only a single multikey field within a compound index. Is there any workaround possible?
{
  "location": [62.99932, 71.23424],
  "product_list": [
    {"id": "wf2r34f34ff33", "price": "87.99"},
    {"id": "f334r3rff43ff", "price": "21.00"},
    {"id": "wf2r34f34ffef", "price": "87.99"}
  ]
}
True, you can only index a single array type of field within a single compound index of a collection, but you seem to be talking about "geo-spatial" queries, which are something a little different. There is nothing wrong with this at all:
db.collection.ensureIndex({ "location": "2d", "product_list": 1 })
That is a perfectly valid form for a compound index.
So it looks like an array, but in this case MongoDB treats it differently.
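As a sketch of how that plays out for the fields in the question (the coordinates and id value are taken from the sample document), you could index the embedded id path alongside the 2d key and then combine a geo query with an id filter:
// Illustrative only; "product_list.id" is the path the question wants indexed.
db.collection.ensureIndex({ "location": "2d", "product_list.id": 1 })
db.collection.find({
  "location": { "$near": [62.99932, 71.23424] },
  "product_list.id": "wf2r34f34ff33"
})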

How do I manage a sublist in Mongodb?

I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type).
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) Do I need to do an ensureIndex on the ID and ignore errors on the insert?
2) Do I then need to loop through all the IDs to insert a new related ID into the books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
  "publisher": "O'Reilly Media",
  "founded": 1980,
  "location": "CA",
  "book": 12346789
},
{
  "publisher": "O'Reilly Media",
  "founded": 1980,
  "location": "CA",
  "book": 234567890
}
So in order to get those sorts of row results into the structure you wanted, one way to do this would be to use the "upsert" functionality of the .update() method.
Assuming you have some way of looping over the input values, and they are identified with some structure, then an analog to this would be something like:
books.forEach(function(book) {
  db.publishers.update(
    { "name": book.publisher },
    {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    { "upsert": true }
  );
});
This essentially simplifies the code so that MongoDB is doing all of the data collection work for you. So where the "name" of the publisher is considered to be unique, what the statement does is first search for a document in the collection that matches the query condition given, i.e. the "name".
In the case where that document is not found, a new document is inserted. Either the database or the driver will take care of creating the new _id value for this document, and your "condition" is also automatically inserted into the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server, in order to add the new entry to the "books" array (set) where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However, it is not very "batch"-like, as you are still performing one operation against the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
  batch.push({
    "q": { "name": book.publisher },
    "u": {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    "upsert": true
  });
  if ( ( batch.length % 500 ) == 0 ) {
    db.runCommand({ "update": "publishers", "updates": batch });
    batch = [];
  }
});
if ( batch.length > 0 ) {
  db.runCommand({ "update": "publishers", "updates": batch });
}
So what this is doing is setting up all of the constructed update statements and sending them to the server in a single call, with a sensible number of operations per batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate for your data.
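In newer shells and drivers the same batching is usually expressed with bulkWrite, which splits the operations into server-side batches for you; a sketch of the equivalent call (using the same assumed books input as above):
// Sketch: the same upsert logic expressed as a single bulkWrite call.
var ops = books.map(function(book) {
  return {
    updateOne: {
      filter: { "name": book.publisher },
      update: {
        "$setOnInsert": { "founded": book.founded, "location": book.location },
        "$addToSet": { "books": book.book }
      },
      upsert: true
    }
  };
});
db.publishers.bulkWrite(ops);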
If your MongoDB version is lower than 2.6 then you either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert then you need to do all the pre-aggregation work within your code.
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.