MongoDB atlas archive after total document reaches a certain number - mongodb

I want to write a custom archive rule in mongodb that archives the data with some condition.
Let's say I have a collection A.
If the collection has more than 1000 docs, archive the oldest doc (I have createdAt) until total document count is 1000. So basically, It should not exceed 1000.

You can implement this in a number of different ways, there is no OOTB solution for this.
I would personally use a capped collection with a size set to a 1000, this will let Mongo handle the most difficult part of your requirements, Regarding the "archiving" I would create an additional collection for archiving purposes and insert the document into both collections.
This will allow you to have a lean capped collection for your queries, and an additional "archive" collection for historical queries.
There are additional points to consider that you didn't specify, are the capped collection limitations an issue? do we need to support updates? what is the frequency of such operations, and so on.

Related

Is there a way to count reads/writes per collection in mongodb?

is there a way to view the reads/writes in mongodb on a per collection-basis? I would like to see how many documents have been read and written on a specific collection.
We are currently researching the costs of some specific queries and try to find out more about if they are heavy read- and/or write-tasks.
Thank you :-)
Yes you can. You can perform db.collection.stats()
this will return the size and count of documents, index information and a lot of other useful information. But you want to count the number of reads and writes performed on a specific collection. For that you can use mongostat. It captures and returns the counts of database operations by type (e.g. insert, query, update, delete, etc.). These counts report on the load distribution on the server. Read more about mongostat on their documentation. Here's the link https://docs.mongodb.com/manual/reference/program/mongostat/#bin.mongostat

Is better to have multiple collections with thousands of documents or one collection with 100 million documents?

I'm migrating a MySql table which has 100 million rows to a MongoDB database, this table stores companys documents and what difference them are the column company_id. I was wondering if have multiple collections on mongodb would be faster than just one collection, for example, each company would have it own collection (collections: company_1, company_2, company_3...) and store only documents from that company, so I will not need to filter then as I would need to do if I just had 1 big collection and in every document there would be a column named company_id that would be used to filter documents.
Which method would perform best in this case?
EDIT:
Here's a JSON document example: https://pastebin.com/T5m2tbaY
{"_id":"5d8b8241ae0f000015006142","id_consulta":45254008,"company_id":7,"tipo_doc":"nfe","data_requisicao":"2019-09-25T15:05:35.155Z","xml":Object...
You could have one collection and one document per company, with company specific details in the document, assuming the details do not exceed 16MB in size. Place an index on company id for performance reasons. If performance conditions are not meeting expectations scale vertically - i.e., add memory, CPU, disk IO, and network enhancements to increase performance. If that does not suffice, consider sharding the collection across multiple hosts.

What does nscannedObjects = 0 actually mean?

As far as I understood, nscannedObjects entry in the explain() method means the number of documents that MongoDB needed to go to find in the disk.
My question is: when this value is 0, what this actually mean besides the explanation above? Does MongoDB keep a cache with some documents stored there?
nscannedObjects=0 means that there was no fetching or filtering to satisfy your query, the query was resolved solely based on indexes. So for example if you were to query for {_id:10} and there were no matching documents you would get nscannedObjects=0.
It has nothing to do with the data being in memory, there is no such distinction with the query plan.
Note that in MongoDB 3.0 and later nscanned and nscannedObjects are now called totalKeysExamined and totalDocsExamined, which is a little more self-explanatory.
Mongo is a document database, which means that it can interpret the structure of the stored documents (unlike for example key-value stores).
One particular advantage of that approach is that you can build indices on the documents in the database.
Index is a data structure (usually a variant of b-tree), which allows for fast searching of documents basing on some of their attributes (for example id (!= _id) or some other distinctive feature). These are usually stored in memory, allowing very fast access to them.
When you search for documents basing on indexed attributes (let's say id > 50), then mongo doesn't need to fetch the document from memory/disk/whatever - it can see which documents match the criteria basing solely on the index (note that fetching something from disk is several orders of magnitude slower than memory lookup, even with no cache). The only time it actually goes to the disk is when you need to fetch the document for further processing (and which is not covered by the statistic you cited).
Indices are crucial to achieve high performance, but also have drawbacks (for example rarely used index can slow down inserts and not be worth it - after each insertion the index has to be updated).

Mapping datasets to NoSql (MongoDB) collection

what I have ?
I have data of 'n' department
each department has more than 1000 datasets
each datasets has more than 10,000 csv files(size greater than 10MB) each with different schema.
This data even grow more in future
What I want to DO?
I want to map this data into mongodb
What approaches I used?
I can't map each datasets to a document in mongo since it has limit of 4-16MB
I cannot create collection for each datasets as max number of collection is also limited (<24000)
So finally I thought to create collection for each department , in that collection one document for each record in csv file belonging to that department.
I want to know from you :
will there be a performance issue if we map each record to document?
is there any max limit for number of documents?
is there any other design i can do?
will there be a performance issue if we map each record to document?
mapping each record to document in mongodb is not a bad design. You can have a look at FAQ at mongodb site
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables .
It says,
...Instead of tables, a MongoDB database stores its data in collections,
which are the rough equivalent of RDBMS tables. A collection holds one
or more documents, which corresponds to a record or a row in a
relational database table....
Along with limitation of BSON document size(16MB), It also has max limit of 100 for level of document nesting
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents Changed in version 2.2.
MongoDB supports no more than 100 levels of nesting for BSON document...
So its better to go with one document for each record
is there any max limit for number of documents?
No, Its mention in reference manual of mongoDB
...Maximum Number of Documents in a Capped Collection Changed in
version
2.4.
If you specify a maximum number of documents for a capped collection
using the max parameter to create, the limit must be less than 232
documents. If you do not specify a maximum number of documents when
creating a capped collection, there is no limit on the number of
documents ...
is there any other design i can do?
If your document is too large then you can think of document partitioning at application level. But it will have high computation requirement at application layer.
will there be a performance issue if we map each record to document?
That depends entirely on how you search them. When you use a lot of queries which affect only one document, it is likely even faster that way. When a higher document-granularity results in a lot of document-spanning queries, it will get slower because MongoDB can't do that itself.
is there any max limit for number of documents?
No.
is there any other design i can do?
Maybe, but that depends on how you want to query your data. When you are content with treating files as a BLOB which is retrieved as a whole but not searched or analyzed on the database level, you could consider storing them on GridFS. It's a way to store files larger than 16MB on MongoDB.
In General, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.

Mongodb : multiple specific collections or one "store-it-all" collection for performance / indexing

I'm logging different actions users make on our website. Each action can be of different type : a comment, a search query, a page view, a vote etc... Each of these types has its own schema and common infos. For instance :
comment : {"_id":(mongoId), "type":"comment", "date":4/7/2012,
"user":"Franck", "text":"This is a sample comment"}
search : {"_id":(mongoId), "type":"search", "date":4/6/2012,
"user":"Franck", "query":"mongodb"} etc...
Basically, in OOP or RDBMS, I would design an Action class / table and a set of inherited classes / tables (Comment, Search, Vote).
As MongoDb is schema less, I'm inclined to set up a unique collection ("Actions") where I would store these objects instead of multiple collections (collection Actions + collection Comments with a link key to its parent Action etc...).
My question is : what about performance / response time if I try to search by specific columns ?
As I understand indexing best practices, if I want "every users searching for mongodb", I would index columns "type" + "query". But it will not concern the whole set of data, only those of type "search".
Will MongoDb engine scan the whole table or merely focus on data having this specific schema ?
If you create sparse indexes mongo will ignore any rows that don't have the key. Though there is the specific limitation of sparse indexes that they can only index one field.
However, if you are only going to query using common fields there's absolutely no reason not to use a single collection.
I.e. if an index on user+type (or date+user+type) will satisfy all your querying needs - there's no reason to create multiple collections
Tip: use date objects for dates, use object ids not names where appropriate.
Here is some useful information from MongoDB's Best Practices
Store all data for a record in a single document.
MongoDB provides atomic operations at the document level. When data
for a record is stored in a single document the entire record can be
retrieved in a single seek operation, which is very efficient. In some
cases it may not be practical to store all data in a single document,
or it may negatively impact other operations. Make the trade-offs that
are best for your application.
Avoid Large Documents.
The maximum size for documents in MongoDB is 16MB. In practice most
documents are a few kilobytes or less. Consider documents more like
rows in a table than the tables themselves. Rather than maintaining
lists of records in a single document, instead make each record a
document. For large media documents, such as video, consider using
GridFS, a convention implemented by all the drivers that stores the
binary data across many smaller documents.