How could we make MongoDB report errors for queries that don't use indices?
We end up creating indices for every query anyway, so it would be great if MongoDB reported missing indices for us. It would also be convenient to be able to configure the restriction on a per-connection basis; that way indices wouldn't get in our way when working from the MongoDB shell.
The notablescan ( http://docs.mongodb.org/manual/reference/parameters/#param.notablescan ) parameter for the MongoDB server binary (mongod.exe or mongod, depending on your OS) makes any query that does not use an index at all fail, with an error emitted to the log.
This option will not stop inefficient queries that do use an index, though, so those still need to be discovered manually.
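As a minimal sketch of turning the parameter on (assuming you have admin privileges on the server), it can be set at startup or toggled at runtime from the mongo shell:

mongod --setParameter notablescan=1

// or on a running server, from the mongo shell:
db.adminCommand({ setParameter: 1, notablescan: 1 })
// queries that would need a collection scan now fail with an error; set it back to 0 to disable:
db.adminCommand({ setParameter: 1, notablescan: 0 })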
Related
I run MongoDB 4.0 on WiredTiger under Ubuntu Server 16.04 to store complex documents. There is an issue with one of the collections: its documents contain many images stored as base64 strings. I understand this is bad practice, but I need some time to fix it.
Because of this, some find operations fail, but only those with a non-empty filter or a skip. For example, db.collection('collection').find({}) runs fine, while db.collection('collection').find({category: 1}) just closes the connection after a timeout. It doesn't matter how many documents should be returned: if there's a filter, the error occurs every time (even if the query should return 0 docs), while an empty query always executes well until skip gets too big.
UPD: some skip values make queries fail. db.collection('collection').find({}).skip(5000).limit(1) runs fine, db.collection('collection').find({}).skip(9000).limit(1) takes much longer but still completes, while db.collection('collection').find({}).skip(10000).limit(1) fails every time. It looks like there is some kind of buffer where the DB stores query-related data, and at around 10,000 docs it runs out of resources. The collection itself has ~10,500 docs. Also, searching by _id runs fine. Unfortunately, I have no way to create new indexes, because that operation fails just like the reads.
What temporary solution may I use before removing the base64 images from the collection?
This happens because such a problematic data scheme causes huge RAM usage. The more documents there are in the collection, the more RAM is needed not just to perform well but even to run find at all.
Increasing MongoDB's default cache size with the storage.wiredTiger.engineConfig.cacheSizeGB config option allowed all the operations to run fine.
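For reference, a minimal mongod.conf excerpt (assuming a YAML config file; the 4 GB value is only an example and should be sized to the host's RAM), applied with a server restart:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4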
I am having trouble trying to set {allowDiskUse:true} in the MongoDB Compass GUI tool. I have created a view based on an aggregation of another collection. The view returns the error
Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.
Hence I saw that {allowDiskUse:true} needs to be appended, but I am unable to find a suitable place to set it. There is the possibility of using the $out stage to write to another collection, but I would like to try the view first :)
ADD ON:
I have tried to run the query db.noDups.aggregate([],{allowDiskUse : true }); on the command line and it works. But I would like to run it in MongoDB Compass for the visualization and export functions.
I also tried {},{},{allowDiskUse: true} in the filter condition but still no luck :(
Btw I am on MongoDB 4.2.6 Community and MongoDB compass 1.25.0
I tried appending it to the filter and it didn't work. I've looked on many different forums and can't find a solution for allowDiskUse from Compass. It seems kind of crazy that you need to add such a kludgy option to do even modest groupings on small amounts of data.
I have also been looking at how to increase the amount of memory that Mongo can use to get around having to do this. I have Mongo installed on a server with 512GB of memory; it seems rather silly to have developers jumping through hoops like this.
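If raising the server-side sort memory limit is acceptable, one commonly cited option (an assumption to verify against the setParameter documentation for your server version, not something confirmed in this thread) is the internal blocking-sort parameter, set from the mongo shell:

// on 4.2 and earlier the parameter is named internalQueryExecMaxBlockingSortBytes
// (its default is the 100 MB / 104857600 bytes from the error message); example: raise to ~320 MB
db.adminCommand({ setParameter: 1, internalQueryExecMaxBlockingSortBytes: 335544320 })
// on 4.4+ the parameter was renamed:
db.adminCommand({ setParameter: 1, internalQueryMaxBlockingSortMemoryUsageBytes: 335544320 })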
I have a single-machine MongoDB setup which satisfies the needs of my application at runtime, but it imposes a significant bottleneck at data ingestion time, as background indexing of an array field (an inverted index) takes days to complete. It seems to be the same issue as posted here: MongoDB large index build very slow. I wonder if it makes sense to delegate/distribute index creation and then deploy the resulting index on the main machine. If anyone has considered this, I would appreciate you sharing your experience. Here are some ideas I wanted to test:
Use a distributed job like Hadoop or DataFlow to create the index tuples, then load them back either into MongoDB directly or into another DB that can store an inverted index more efficiently.
Use another service like ElasticSearch that can potentially handle indexing more efficiently; however, I have no experience with it and want to continue hosting everything on the same machine.
In the end I decided to generate all the tuples to index with Apache Beam/DataFlow, import them with mongoimport, and then create an index on the fields that I need. This way I get a queryable index in hours rather than days.
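As a rough sketch of the last two steps (the database, collection, file, and field names are hypothetical placeholders):

mongoimport --db mydb --collection tuples --file tuples.json

// then, in the mongo shell, build the index on the imported field:
db.tuples.createIndex({ term: 1 })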
I posted this question on the Software Engineering portal without conducting any tests. It was also brought to my notice that it should be posted on SO, not there. Thanks in advance for the help!
I need Mongo to return the documents sorted by a field value. The easiest way to achieve this would be running the command db.collectionName.find().sort({field:priority}); however, when I tried this method on a dummy collection of 1000 documents, it ran in 22ms. I also tried running db.collectionName.find() on the same data, and it ran in 3ms, which means that Mongo is taking time to sort and return the documents (which is understandable). Both tests were done in the same environment, by adding .explain("executionStats") to the query.
I will be working with a large amount of data and concurrent requests to access DB, so I need the querying to be faster. My question is, is there a way to always keep the data sorted by a field in the DB so that I don't have to sort it over and over for all requests? For instance, some sort of update command that could sort the entire DB once a week or so?
A non-unique index on that field in this collection will give the results you're after and avoid the inefficient in-memory sorting.
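A minimal sketch in the mongo shell, assuming the sort field is called priority (a placeholder name):

// one-time index creation; an ascending single-field index also serves descending sorts
db.collectionName.createIndex({ priority: 1 })
// this sort can now walk the index instead of sorting documents in memory
db.collectionName.find().sort({ priority: 1 })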
When performing a query in MongoDB, I need to obtain the total count of all matches, along with the documents themselves as a limited/paged subset.
I can achieve the goal with two queries, but I do not see how to do this with one query. I am hoping there is a mongo feature that is, in some sense, equivalent to SQL_CALC_FOUND_ROWS, as it seems like overkill to have to run the query twice. Any help would be great. Thanks!
EDIT: Here is Java code to do the above.
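// limit(10) pages the results; count() issues a separate count command over all matches (it ignores the limit)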
DBCursor cursor = collection.find(searchQuery).limit(10);
System.out.println("total objects = " + cursor.count());
I'm not sure which language you're using, but you can typically call a count method on the cursor that's the result of a find query and then use that same cursor to obtain the documents themselves.
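A minimal sketch in the mongo shell, with placeholder collection and filter names:

var query = { status: "A" };                  // placeholder filter
var cursor = db.items.find(query).limit(10);  // page of at most 10 documents
var total = cursor.count();                   // issues a count of all matches on the server (ignores the limit)
cursor.forEach(printjson);                    // the same cursor still returns the paged documents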
It's not only overkill to run the query twice, but there is also the risk of inconsistency. The collection might change between the two queries, or each query might be routed to a different peer in a replica set, which may have different versions of the collection.
The count() function on cursors (in the MongoDB JavaScript shell) actually runs another query; you can see that by typing "cursor.count" (without parentheses). So it is no better than running two queries.
In the C++ driver, cursors don't even have a "count" function. There is "itcount", but it only loops over the cursor and fetches all results, which is not what you want (for performance reasons). The Java driver also has "itcount", and there the documentation says that it should be used for testing only.
It seems there is no way to do a "find some and get total count" consistently and efficiently.