From the aggregate command documentation:
To indicate a cursor with the default batch size, specify cursor: {}.
However, I haven't found the value of this default or a way to look it up (maybe with a mongo admin command).
How can I find this value?
From the docs:
The MongoDB server returns the query results in batches. The amount of data in the batch will not exceed the maximum BSON document size.
New in version 3.4: Operations of type find(), aggregate(), listIndexes, and listCollections return a maximum of 16 megabytes per batch. batchSize() can enforce a smaller limit, but not a larger one.
find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.
So the default for the first batch is 101 documents; subsequent getMore() calls have no fixed batch size, but a batch cannot exceed 16 megabytes.
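To make the batching rules quoted above concrete, here is a small Python model of how a result set would be split into batches. It is only a sketch under stated assumptions: `split_into_batches` and its parameters are hypothetical names, the caller supplies the document sizes, and the real server logic has more cases than this.

```python
MAX_BATCH_BYTES = 16 * 1024 * 1024  # 16 MB cap per batch (from the docs quoted above)
DEFAULT_FIRST_BATCH_DOCS = 101      # default initial batch size for find()/aggregate()


def split_into_batches(doc_sizes, first_batch_docs=DEFAULT_FIRST_BATCH_DOCS,
                       max_batch_bytes=MAX_BATCH_BYTES):
    """Group document sizes (in bytes) into batches.

    Simplified model: the first batch stops at `first_batch_docs` documents
    or at the byte cap; later batches stop only at the byte cap.
    """
    batches = []
    current, current_bytes = [], 0
    for size in doc_sizes:
        is_first = not batches
        doc_limit_hit = is_first and len(current) >= first_batch_docs
        byte_limit_hit = bool(current) and current_bytes + size > max_batch_bytes
        if doc_limit_hit or byte_limit_hit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


small = split_into_batches([1024] * 1000)          # 1000 documents of 1 KB
large = split_into_batches([1024 * 1024] * 50)     # 50 documents of 1 MB
```

In this model the 1000 small documents arrive as a first batch of 101 followed by one batch with the remaining 899, while the 1 MB documents are capped by size rather than by count.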
If I'm not entirely wrong, I think it's 101 for aggregation pipeline.
See here
In mongo local db, you can check oplog related data by using db.oplog.rs.stats()
But what does the "count" field mean? And I see it's decreasing every second in my db server.
The replica set oplog (oplog.rs) is a capped collection, which means it has a maximum total size for data. The underlying implementation varies by storage engine (e.g. WiredTiger vs MMAPv1) but the conceptual outcome is the same: capped collections make room for new documents by overwriting or expiring the oldest documents in the collection in FIFO order (First In, First Out).
But what does the "count" field mean?
As with any collection, the count information in db.collection.stats() indicates the number of documents currently in the collection.
For an explanation of collection stats output, see collStats in the MongoDB documentation.
Note: The output will vary depending on your version of MongoDB server and storage engine used.
I see it's decreasing every second in my db server.
The count of documents in the oplog will vary over time based on the size of the write operations being applied, so this is expected to fluctuate for an active deployment. For example, single field updates will generally write smaller oplog entries than full document updates. Once your oplog reaches its maximum data size, the count may also decrease as the oldest oplog documents are removed to make room for new oplog entries.
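The effect described above can be illustrated with a toy size-capped FIFO log in Python. This is a model for intuition only, not MongoDB's implementation; the class name `CappedLog` and the byte sizes are made up for the example.

```python
from collections import deque


class CappedLog:
    """Toy size-capped FIFO log (a model, not MongoDB's implementation)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = deque()   # sizes of stored entries, oldest first
        self.size = 0            # total bytes currently stored

    def append(self, entry_bytes):
        # Evict the oldest entries until the new one fits.
        while self.entries and self.size + entry_bytes > self.max_bytes:
            self.size -= self.entries.popleft()
        self.entries.append(entry_bytes)
        self.size += entry_bytes

    @property
    def count(self):
        return len(self.entries)


log = CappedLog(max_bytes=1000)
for _ in range(10):
    log.append(100)        # ten small entries fill the log
count_before = log.count
log.append(500)            # one large entry pushes out several small ones
count_after = log.count
```

Here the count drops from 10 to 6 even though an entry was added, because the single large entry displaces five small ones. A run of larger writes on a full oplog would make "count" fall in the same way.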
Should I use the allowDiskUse option when the returned documents exceed the 16MB limit in aggregation?
Or should I alter the db structure or code logic to avoid the limit?
What are the advantages and disadvantages of 'allowDiskUse'?
Thanks for your help.
Here is the official doc I have seen:
Result Size Restrictions
Changed in version 2.6.
Starting in MongoDB 2.6, the aggregate command can return a cursor or store the results in a collection. When returning a cursor or storing the results in a collection, each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes; if any single document that exceeds the BSON Document Size limit, the command will produce an error. The limit only applies to the returned documents; during the pipeline processing, the documents may exceed this size.
Memory Restrictions
Changed in version 2.6.
Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/
allowDiskUse is unrelated to the 16MB result size limit. That setting controls whether pipeline stages such as $sort or $group can use some temporary disk space if they need more than 100MB of memory. In theory, for an arbitrary pipeline this could be a very large amount of disk space. Personally it's never been a problem for me, but that will depend on your data.
If your result is going to be more than 16MB then you need to use the $out pipeline stage to output the data to a collection or use a pipeline API that returns a cursor to results instead of returning all the data inline (for some drivers this is a separate method, for others it is a flag passed to the same method).
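As a rough illustration of the options discussed here, the following Python sketch builds an aggregate command document as a plain dict. The helper `build_aggregate_command` and the collection names are hypothetical; only the command fields (`cursor`, `batchSize`, `allowDiskUse`, and the `$out` stage) come from MongoDB itself.

```python
def build_aggregate_command(collection, pipeline, batch_size=None,
                            allow_disk_use=False, out_collection=None):
    """Build an `aggregate` command document as a plain dict (hypothetical helper)."""
    stages = list(pipeline)
    if out_collection is not None:
        # $out must be the last stage; results are written to a collection
        # instead of being returned to the client.
        stages.append({"$out": out_collection})
    # An empty cursor document requests a cursor with the default batch size.
    cursor = {} if batch_size is None else {"batchSize": batch_size}
    command = {"aggregate": collection, "pipeline": stages, "cursor": cursor}
    if allow_disk_use:
        # Lets stages such as $sort or $group write to temporary files
        # when they need more than 100 MB of memory.
        command["allowDiskUse"] = True
    return command


# "orders" and "totals" are made-up collection names.
cmd = build_aggregate_command(
    "orders",
    [{"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}}],
    allow_disk_use=True,
    out_collection="totals",
)
```

In practice a driver usually builds this for you. With PyMongo, for instance, `collection.aggregate(pipeline, allowDiskUse=True)` returns a cursor, so the dict above is mainly useful for seeing which option addresses which limit.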
I am using mongolastic, to index a collection in elasticsearch.
It took around 6 hours to index a collection having 30,000 documents. Is there a way we can increase the efficiency?
Also, I noticed that the indexing was done in batches of 200. Can we increase this limit too?
Any suggestions?
As for the limit of the batch size - this is taken from the link you provided yourself:
Override the default batch size which is normally 200. (Optional)
batch: <number> (6)
When I was reading about the limit method, I found this passage:
A negative limit is similar to a positive limit but closes the cursor
after returning a single batch of results. As such, with a negative
limit, if the limited result set does not fit into a single batch, the
number of documents received will be less than the specified limit.
I can't understand this explanation. Can anyone explain it with a suitable example?
Suppose your query matches 100 documents before the limit is applied and results come back in batches of 10 (the mongo shell displays 20 documents per iteration by default, so this example assumes a smaller setting). You then have 10 batches of data, and you can page through them with the it command until the end (10 iterations).
If you add limit(30) to your query, you indicate that you want only 30 documents. mongod keeps the cursor open until the shell has gone through all of them.
However, if you set limit(-30), the result set on the server still has 30 documents, but mongod returns only the first batch (10 documents in this example) to the shell, and you cannot fetch the rest with the it command because the cursor is closed.
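The difference can be expressed as a tiny Python model. It is a simplification under stated assumptions: `documents_received` is a hypothetical function, and a batch is measured only as a document count, whereas a real batch is also bounded by its size in bytes.

```python
def documents_received(matching, limit, batch_size):
    """Return how many documents the client ends up with.

    Simplified model: a positive limit lets the client keep fetching batches
    until the limit is reached; a negative limit returns a single batch and
    then the cursor is closed. A limit of 0 means "no limit".
    """
    if limit == 0:
        return matching
    wanted = min(matching, abs(limit))
    if limit > 0:
        return wanted
    return min(wanted, batch_size)  # single batch only


positive = documents_received(matching=100, limit=30, batch_size=10)
negative = documents_received(matching=100, limit=-30, batch_size=10)
fits = documents_received(matching=100, limit=-5, batch_size=10)
```

With these inputs the positive limit yields all 30 documents and the negative limit yields only the 10 that fit in the first batch. When the limited result fits in one batch, as with limit(-5), both forms return the same number of documents.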
This is a request for a recommendation about MongoDB. I have a collection whose document count keeps growing. It holds about 5 billion documents now. When I query this collection, I sometimes get an error about the 16 MB size limit.
The first thing I want to ask is which structure is best for collections that grow this much. What is the best approach? What should I do about the structure and the performance?
Just to clarify, the 16MB limitation is on documents, not collections. Specifically, it's the maximum BSON document size, as specified in this page in the documentation.
The maximum BSON document size is 16 megabytes.
If you're running into the 16MB limit in aggregation, it is probably because you are using MongoDB version 2.4 or older. In those versions, the aggregate() method returned a single document, which is subject to the same limitation as all other documents. Starting in 2.6, the aggregate() method returns a cursor, which is not subject to the 16MB limit. For more information, you should consult this page in the documentation. Note that each stage in the aggregation pipeline is still limited to 100MB of RAM.