Something went wrong and I don't understand why. I have a script that used to work, but all of a sudden it just stopped. It's basically a query against a big Mongo collection (600 GB+).
Here's the query:
db.action_traces.findOne( {"block_time": {"$lt": "2018-07-15T00:00:00.000Z"} } ).pretty()
Originally I wasn't using findOne, but I restricted the results to one just in case it could help; the result is the same: nothing happens.
If I just run a find query instead, it works fine.
There is nothing showing up in the mongodb log and nothing seems relevant in the syslog either.
There is clearly something wrong with Mongo, though: htop shows the mongo process fluctuating but pinning one CPU core at 100% most of the time.
Can anyone help?
Many thanks in advance!
Sounds like it's probably trawling through the whole collection. Have you added an index on block_time?
.find() returns a cursor over the matching documents, returned in natural order if you don't specify a sort.
.findOne() returns a single document, with the proviso that if your query matches multiple documents it returns the first one in natural order. Without an index, though, the server may still have to walk the whole collection before it finds that first match (or establishes that there is none).
I'm guessing your collection needs an index on that field, so the lookup can be satisfied without a full scan.
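For example, in the mongo shell that could look like this (a sketch only; note that if block_time is stored as a BSON date, the comparison value should be an ISODate rather than a string):
// Build a single-field index on block_time; expect this to take a while on 600 GB of data
db.action_traces.createIndex({ block_time: 1 })
// With the index in place, the range query can be answered from an index scan
db.action_traces.find({ block_time: { $lt: ISODate("2018-07-15T00:00:00.000Z") } }).limit(1)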
I originally posted this question on the Software Engineering site without conducting any tests; it was pointed out to me that it belongs on SO instead. Thanks for the help in advance!
I need Mongo to return the documents sorted by a field value. The easiest way to achieve this would be to run db.collectionName.find().sort({field:priority}). I tried this on a dummy collection of 1000 documents: it runs in 22 ms, while a plain db.collectionName.find() on the same data runs in 3 ms, which means Mongo is spending time sorting the documents before returning them (which is understandable). Both tests were done in the same environment, by appending .explain("executionStats") to the query.
I will be working with a large amount of data and concurrent requests to the DB, so I need the querying to be faster. My question is: is there a way to keep the data permanently sorted by a field so that I don't have to sort it over and over for every request? For instance, some sort of update command that could sort the entire DB once a week or so?
A non-unique index on that field in this collection will give you the results you're after and avoid the inefficient in-memory sort.
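A minimal sketch in the mongo shell, with collectionName and field standing in for the real names:
// One-off: create an ascending index on the field you sort by
db.collectionName.createIndex({ field: 1 })
// The sort can now walk the index instead of sorting in memory;
// explain("executionStats") should show an IXSCAN stage and no SORT stage
db.collectionName.find().sort({ field: 1 }).explain("executionStats")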
I've got a small collection of mp3s in a mongo 2.6 DB. A document from this "songs" collection might look like this:
{_id: ..., name: "shagCarpet.mp3", tags: ["acapella", "rap"]}
I expect this collection to grow rapidly in the near future, so I want to index this collection for easy searching. I created a multi-key index on the "tags" field like so:
db.songs.createIndex({"tags": 1})
Since the collection is currently small, I don't see a performance gain by adding the index. How can I verify that the index is working properly? Can I look at the data in the index? I'm aware of db.songs.getIndexes(), but that only regurgitates what I told Mongo when I created the index. I'd like to actually see what the index data looks like.
You will get more info by using explain():
db.songs.find({"tags": "acapella"}).explain();
But I presume you want to compare the speed of the query that uses the index with the speed of a plain collection scan (no index). You can force the query to do a collection scan by using the hint() method, e.g. on _id:
db.songs.find({"tags": "acapella"}).hint({_id: 1}).explain();
Then compare the timing that explain() reports for the two (the "millis" field, or executionStats.executionTimeMillis on newer servers); you should see that the query without hint({_id: 1}) is faster.
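For example, on MongoDB 3.0+ where the timing lives under executionStats rather than a top-level "millis" field:
// Uses the multikey index on tags
db.songs.find({ "tags": "acapella" }).explain("executionStats").executionStats.executionTimeMillis
// Forces a scan of the whole collection via the _id index, for comparison
db.songs.find({ "tags": "acapella" }).hint({ _id: 1 }).explain("executionStats").executionStats.executionTimeMillis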
The first step in determining whether your indexes are working correctly is figuring out how you will be searching your database. Once you find the queries that will be run most frequently and/or will be most expensive, run them in the shell and append the .explain() call. For example:
db.songs.find(...).explain();
This will dump a lot of info out, part of which will tell you if an index was used, and if so which ones and in what order. For details of what it all means, see here.
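For instance, the winning plan alone usually answers the question (on MongoDB 3.0+; the query here is just a placeholder):
// "IXSCAN" in the winning plan means an index was used; "COLLSCAN" means a full collection scan
printjson(db.songs.find({ tags: "acapella" }).explain().queryPlanner.winningPlan)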
Use
db.songs.find({"tags": "acapella"}).explain("executionStats");
If nReturned equals totalDocsExamined, no unnecessary documents were examined: the index narrowed the scan down to exactly the documents that were returned.
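A quick way to pull just those two numbers out of the explain output:
var stats = db.songs.find({ "tags": "acapella" }).explain("executionStats").executionStats;
// Ideally these are equal: every document examined was actually returned
print("nReturned: " + stats.nReturned + ", totalDocsExamined: " + stats.totalDocsExamined);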
Also, here is a good article:
https://www.percona.com/blog/2018/09/06/mongodb-investigate-queries-with-explain-index-usage-part-2/
I know there are already some patterns for pagination with Mongo (skip() for small collections, range queries for large ones), but in my situation I need live sorting.
Update:
For clarity I'll change the point of the question. Can I make a query like this:
db.collection.find().sort({key: 1}).limit(n).sort({key: -1}).limit(1)
The main point is to sort the query in the "usual" order, limit the returned set of data, and then reverse the sort to get the last item of the paginated data. I tried this approach, but it seems that Mongo somehow optimises the query and ignores the first sort() operator.
I am having a huge problem attempting to grasp your question.
From what I can tell, when a user refreshes the page, say 6 hours later, it should show not only the results that were there before but also the results that are there now.
As @JohnnyHK says, MongoDB does "live" sorting naturally, so this would be the case and MongoDB would give you back the right results for your queries.
Now, I think one problem you might be trying to get at here (the question needs clarification, massively) is that, because the data has changed, the last _id you saw might no longer truly represent the page numbers etc., or even the diversity of the information, i.e. the last _id you saw is now in fact half way through page 13.
You would probably spend more time and performance trying to solve these sorts of things than just letting the user understand that they have been AFK for a long time.
Edit
Aha, I think I see what you're trying to do now: you're trying to be sneaky by getting both the page and the last item in the list at the same time. Unfortunately, just like in SQL, this is not possible. Even if chaining sort() worked like that, the sort would not function as it should, since you can only sort one way on a single field.
However, for future reference: sort() is just a modifier on a cursor, and until you actually open the cursor by starting to iterate it, calling sort() multiple times will just overwrite the cursor's sort property.
I am afraid this has to be done with two queries: you get your page first, and then either client side (I think you're looking for the max of that page) scroll through the records to find the last _id, or just do a second query to get the last _id. It should be super duper fast.
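A rough shell sketch of that two-step approach (collection, the sort key key, and the page size n are placeholders):
// Query 1: fetch the current page in the normal sort order
var page = db.collection.find().sort({ key: 1 }).limit(n).toArray();
// Client side: the last element of the page is the boundary ("last _id") for the next page
var last = page.length > 0 ? page[page.length - 1] : null;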
When performing a query in MongoDB, I need to obtain a total count of all matches, along with the documents themselves as a limited/paged subset.
I can achieve the goal with two queries, but I do not see how to do this with one query. I am hoping there is a mongo feature that is, in some sense, equivalent to SQL_CALC_FOUND_ROWS, as it seems like overkill to have to run the query twice. Any help would be great. Thanks!
EDIT: Here is Java code to do the above.
// Find all matches, but only fetch the first 10 documents for this page
DBCursor cursor = collection.find(searchQuery).limit(10);
// count() issues a separate count for the full result set (it ignores limit by default)
System.out.println("total objects = " + cursor.count());
I'm not sure which language you're using, but you can typically call a count method on the cursor that's the result of a find query and then use that same cursor to obtain the documents themselves.
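In the legacy mongo shell, for instance, that looks roughly like this (collection and query names are placeholders, and driver behaviour varies):
var cursor = db.items.find(searchQuery).limit(10);
// count() asks the server for the total number of matches and ignores limit/skip by default
var total = cursor.count();
// Iterating the same cursor afterwards yields only the limited page of documents
var page = cursor.toArray();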
It's not only overkill to run the query twice, but there is also the risk of inconsistency. The collection might change between the two queries, or each query might be routed to a different peer in a replica set, which may have different versions of the collection.
The count() function on cursors (in the MongoDB JavaScript shell) really runs another query; you can see that by typing cursor.count (without parentheses). So it is no better than running two queries.
In the C++ driver, cursors don't even have a "count" function. There is "itcount", but it only loops over the cursor and fetches all results, which is not what you want (for performance reasons). The Java driver also has "itcount", and there the documentation says that it should be used for testing only.
It seems there is no way to do a "find some and get total count" consistently and efficiently.
I’m playing around with MongoDB for the moment to see what nice features it has. I’ve created a small test suite representing a simple blog system with posts, authors and comments, very basic.
I've experimented with a search function which uses the MongoRegex class (PHP driver), where I'm just searching all post content and post titles for the phrase 'lorem ipsum', case-insensitively via the /i flag.
My code looks like this:
// Legacy PHP driver: case-insensitive regular expression
$regex = new MongoRegex('/lorem ipsum/i');
// Both keys in one array mean a document must match on post AND post_title
$query = array('post' => $regex, 'post_title' => $regex);
But I'm confused and stunned by what happens. I time every query (microtime before and after the query, with 15 decimals).
For my first test I added 110,000 blog documents and 5,000 authors, everything randomly generated. When I do my search, it finds 6,824 posts containing the phrase "lorem ipsum" and takes 0.000057935714722 seconds to do so. And this is after I've restarted the MongoDB service (on Windows), and without any index other than the default on _id.
MongoDB uses B-tree indexes, which most definitely aren't very efficient for full-text search. If I create an index on my post content attribute, the same query as above runs in 0.000150918960571 seconds, which, funnily enough, is slower than without any index (slower by 0.000092983245849 seconds). Now this can happen for several reasons, because it uses a B-tree cursor.
But I've tried to find an explanation for how it can query this fast. My guess is that it keeps everything in RAM (I've got 4 GB and the database is about 500 MB), which is why I restart the mongodb service before testing, to get a clean result.
Can anyone with MongoDB experience help me understand what is going on with this kind of full-text search, with or without an index, and definitely without an inverted index?
Sincerely
- Mestika
I think you simply didn't iterate over the results? With just a find(), the driver will not send a query to the server. You need to fetch at least one result for that. I don't believe MongoDB is this fast, and I believe your error to be in your benchmark.
As a second thing, for a regular expression search that is not anchored at the beginning of the field's value with ^, no index can be used at all. You should play with explain() to see what is actually happening.
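For example, in the shell (the collection name posts is an assumption; note that a case-insensitive regex cannot use the index efficiently even when anchored):
// Unanchored, case-insensitive regex: an index on { post: 1 } cannot be range-scanned, so expect a full scan
db.posts.find({ post: /lorem ipsum/i }).explain("executionStats")
// Anchored, case-sensitive prefix regex: can be converted into an index range scan
db.posts.find({ post: /^Lorem ipsum/ }).explain("executionStats")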