We have an active collection receiving 3000-5000 updates per second, with an average payload size of 500KB. We run a change stream on it, but currently only one connector listens to all events in the collection. This works, but we expect even more updates and are exploring ways to scale the approach horizontally. The current plan is to hash the _id key and distribute the events evenly across three or more connectors.
The challenge is that, since we use fullDocument=updateLookup, we want to check whether MongoDB will internally query the collection to read each document three times instead of the current one time.
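For illustration, here is a minimal sketch of that plan, assuming connectors built on the Node.js driver and a database/collection called mydb.events (both names are placeholders). Instead of a real hash it splits on the last hex character of the ObjectId string, which is enough to show the server-side $match on documentKey._id:

    // Sketch only: each connector watches one slice of the _id space.
    const { MongoClient } = require('mongodb');

    async function watchSlice(uri, hexChars) {  // e.g. ['0','1','2','3','4'] for connector 1
      const client = await MongoClient.connect(uri);
      const coll = client.db('mydb').collection('events');

      const pipeline = [{
        $match: {
          $expr: {
            $in: [
              // last hex character of the ObjectId, as a cheap stand-in for a hash
              { $substrCP: [{ $toString: '$documentKey._id' }, 23, 1] },
              hexChars
            ]
          }
        }
      }];

      const stream = coll.watch(pipeline, { fullDocument: 'updateLookup' });
      for await (const change of stream) {
        // this connector only sees its partition of the update events
        console.log(change.operationType, change.documentKey._id);
      }
    }

Note that each connector still opens its own change stream cursor with fullDocument=updateLookup, which is exactly the scenario the lookup question above is about.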
We tried testing this approach with mongocat, setting the log level to 2, but we are unable to see any query logs on the collection while updating documents, even with an active change stream with full document lookup enabled. Any ideas on how we can test this would really help.
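One way to probe this (a sketch, and only a starting point; I am not certain every internal lookup is surfaced the same way on all server versions) is to raise the log verbosity for the query/command components, or turn on the database profiler, and then run updates while the change stream(s) are open:

    // mongo shell, against the database holding the collection
    // ("mydb" and "mydb.events" are placeholders).

    // Option 1: more verbose logging, then tail the mongod log.
    db.setLogLevel(2, "query");
    db.setLogLevel(2, "command");

    // Option 2: profile every operation; reads triggered while the
    // change stream is open may then appear in system.profile.
    db.setProfilingLevel(2);

    // ...run your updates with the change stream(s) open, then inspect:
    db.system.profile.find({ ns: "mydb.events" }).sort({ ts: -1 }).limit(20);

    // Reset when done.
    db.setProfilingLevel(0);
    db.setLogLevel(0, "query");
    db.setLogLevel(0, "command");

Comparing the number of logged/profiled reads with one connector versus three should show whether the updateLookup read is repeated per change stream.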
I have a collection in Mongo that stores every user action of my application, and it is very large (about 3 million documents per day). On the UI I have a requirement to show user actions for a period of at most 6 months.
The queries on this collection are becoming very slow with all the historic data, even though indexes are in place. So I want to move documents older than 6 months to a separate collection.
Is it the right way to handle my issue?
Following are some of the techniques you can use to manage data growth in MongoDB:
Using capped collections
Using TTL indexes (see the sketch after this list)
Using multiple collections, one per month
Using different databases on the same host
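For the TTL option, a minimal sketch (collection and field names are placeholders). Note that a TTL index deletes old documents rather than moving them, so if you need to keep the history you would copy it out to an archive collection before it expires:

    // mongo shell: documents whose createdAt is older than ~6 months
    // (approximated here as 183 days) are removed by the TTL background thread.
    db.userActions.createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 183 * 24 * 60 * 60 }
    );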
I'm currently experimenting with a test collection on a LAN-accessible MongoDB server and data in a Meteor (v1.6) application. My view layer of choice is React, and right now I'm using createContainer to bind the subscriptions to props.
The data that goes into MongoDB is updated on a daily basis and consists of a big set of data from several SQL databases, amounting to about 60,000 lines of JSON per day. The data has been ever-so-slightly reshaped into a usable format whilst remaining as RAW as I'd like it to be.
The working solution right now is fetching all this data and doing further manipulations client-side to prepare the data for visualization. The issue should seem obvious: each client is fetching a set of documents that grows every day and repeats a lot of work on earlier entries before being ready to display. I want to do this manipulation on the server, through MongoDB's Aggregation Framework.
My initial idea is to do the aggregations on the server and to create new Collections containing smaller, more specific datasets without compromising the RAWness of the original Collection. That would mean the "reduced" Collections can still be reactive, as I've been able to confirm through testing in a Remote Desktop, subscribing to an aggregated Collection which I can update through Robo3T.
I don't know if this would be ideal. As far as storage goes, there's plenty of room for the extra Collections. But I have no idea how to set up an automated aggregation script on said server. And regarding Meteor, I've tried meteorhacks:aggregate and jcbernack:reactive-aggregate but couldn't figure out how to deal with either of them. If anyone is dealing, or has dealt, with something similar, I'd love to hear ideas and suggestions.
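As one illustration of that idea (collection and field names here are invented), a server-side aggregation can roll the raw documents up into a small per-day summary collection that clients subscribe to instead of the raw data. On MongoDB 4.2+, $merge writes into the summary collection without touching the source; on older servers $out can rebuild it wholesale. Something like this could run from a cron job or a Meteor server-side interval:

    db.rawEntries.aggregate([
      { $match: { importedAt: { $gte: ISODate("2018-01-01") } } },
      { $group: {
          _id: { $dateToString: { format: "%Y-%m-%d", date: "$importedAt" } },
          count: { $sum: 1 },
          total: { $sum: "$value" }
      } },
      // MongoDB 4.2+; use { $out: "dailySummaries" } on older servers
      { $merge: { into: "dailySummaries", whenMatched: "replace" } }
    ]);

The raw Collection stays untouched, and the Meteor publication can then subscribe to dailySummaries, which is tiny compared to the raw data.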
I have a use case where a set of records in a collection needs to be deleted after a specified interval of time.
For example: records older than 10 hours should be deleted every 10th hour.
We have tried deletion based on _id but found it to be slow.
Is there a way to partition the records in a collection and drop a partition as and when required in MongoDB?
MongoDB does not currently support partitions; there is a JIRA ticket to add this as a feature (SERVER-2097).
One solution is to leverage multiple time-based collections, cycling them in much the same way you would partitions. You would typically do this when you usually only query one or a few of these time-based collections; if you often need to read across multiple collections, you can add some wrapper code to simplify that.
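A rough sketch of that cycling idea (the collection naming scheme, the 10-hour window, and the Node.js driver usage are all assumptions); dropping a whole collection is far cheaper than deleting its documents one by one:

    // Write into an hourly collection named e.g. "records_2023051714",
    // then periodically drop collections older than 10 hours.
    function hourlyName(date) {
      return 'records_' + date.toISOString().slice(0, 13).replace(/[-T]/g, '');
    }

    async function dropExpired(db, hours = 10) {
      const cutoff = hourlyName(new Date(Date.now() - hours * 60 * 60 * 1000));
      const collections = await db.listCollections({}, { nameOnly: true }).toArray();
      for (const { name } of collections) {
        if (name.startsWith('records_') && name < cutoff) {
          await db.collection(name).drop();  // whole-collection drop, no per-document work
        }
      }
    }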
There are also TTL indexes, which use a background thread in the mongod server to handle the deletes for you.
Your deletes by _id may have been slow for a number of reasons, and probably warrants more investigation beyond your original question.
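If you do stay with plain deletes, one common pattern (field and collection names assumed) is to delete on an indexed timestamp range rather than by _id:

    // mongo shell: with an index on the timestamp, the ranged delete does not
    // have to scan the whole collection the way an unindexed filter would.
    db.records.createIndex({ createdAt: 1 });
    db.records.deleteMany({ createdAt: { $lt: new Date(Date.now() - 10 * 60 * 60 * 1000) } });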
I need to design a schema for a multi-room chat that uses MongoDB for storage. I'm currently using Mongoose v2 and have thought of the following methods:
Method 1
Each chat log (each room) has its own Mongo collection
Each chat log collection is populated by message documents (schema: from, to, message, time)
There is a user collection
The user collection is populated by user documents (schema with info about the user)
Doubts:
1 How exactly can I retrieve the documents from a specific collection (the chat room)?
Method 2
There is a collection (chat_logs)
The chat_logs collection is populated by message documents (schema: from, to (chat room), user, etc.)
There is the user collection, as above.
Doubts:
1 Is there a max size for collections?
Any advice is welcome... thank you for your help.
There is no good reason to have a separate collection per chat room. The size of collections is unlimited, and MongoDB offers no way to query data from more than one collection at a time. So if you distribute your data over multiple collections, you won't be able to analyze data across more than one chat room.
There isn't a limit on normal collections. However, do you really want to save every word ever written in any chat forever? Most likely not. You would probably keep, say, the last 1,000 or 10,000 messages written. Or even 100,000. Let's make it 1,000,000. Given the average size of a chat message, that shouldn't be more than 20MB. So let's make it really safe and multiply that by 10.
What I would do is use a capped collection per chat room and tailable cursors. You don't need to be afraid of too many connections; the average MongoDB server can take a couple of hundred of them. Queries can be made tailable quite easily, as shown in the Mongoose docs.
This approach has some advantages:
Capped collections are fast - they are basically FIFO buffers for BSON data.
The data is returned exactly in insertion order for free - no sorting, no complex queries, no extra indices.
There is no need to maintain the individual rooms. Simply set a cap on creation and MongoDB will take care of the rest.
As for how to do it: simply create a connection per chat room and save them at the application level in an associative array keyed by the chat room name. Use those connections to create a new tailable cursor per request. Use XHR to request the chat data, respond as a stream, and process accordingly.
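A compact sketch of that setup with Mongoose (the room names, cap sizes, and helper functions are assumptions, and the API shown is the modern Mongoose one rather than the v2 API mentioned in the question):

    const mongoose = require('mongoose');

    // One capped collection per room, created lazily and kept in an
    // application-level map keyed by room name.
    const messageSchema = new mongoose.Schema(
      { from: String, message: String, time: { type: Date, default: Date.now } },
      { capped: { size: 200 * 1024 * 1024, max: 1000000 } }  // ~200MB cap, 1M docs
    );

    const roomModels = {};
    function roomModel(room) {
      if (!roomModels[room]) {
        // third argument pins the collection name, e.g. "chat_lobby"
        roomModels[room] = mongoose.model('Chat_' + room, messageSchema, 'chat_' + room);
      }
      return roomModels[room];
    }

    // Tailable cursor: stays open on the capped collection and streams
    // new messages as they are inserted.
    function streamRoom(room, onMessage) {
      const cursor = roomModel(room).find().tailable().cursor();
      cursor.on('data', onMessage);
      return cursor;
    }

From there, streamRoom('lobby', msg => ...) can feed whatever transport you use towards the browser (XHR streaming, as suggested above, or websockets).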
I have 1000+ streaming audio devices deployed around the world, all of which currently check in with the factory monitoring service about every 15 minutes. I'm rolling my own monitoring service and will be storing the status updates in Mongo.
Question:
What would be a better design schema:
1) One massive collection named device_updates, where every status update document includes a device serial_number key?
2) 1000+ collections, each named with the device's serial number (e.g. 65FE9), with each device's status updates siloed in its own collection? If going this route, I would cap the collections at about 2000 status update documents.
Both would need to be indexed by the created_at date key.
Any ideas on which would be better performance wise? Or any thoughts on what would be the preferred method?
Thanks!
I would definitely go for one massive collection, since all the documents are of the same type.
As a general rule, think of a collection in MongoDB as a set of homogeneous documents. Having just one collection also makes it much easier to scale out horizontally (i.e., with shards), for example by using serial_number as the shard key.
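A brief sketch of what that could look like (database, collection, and field names are assumptions): one device_updates collection, a compound index so each device's recent updates are cheap to fetch, and hashed sharding on serial_number if a single replica set eventually isn't enough:

    // mongo shell
    db.device_updates.createIndex({ serial_number: 1, created_at: -1 });

    // only if/when you move to a sharded cluster:
    sh.enableSharding("monitoring");
    sh.shardCollection("monitoring.device_updates", { serial_number: "hashed" });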