I come from an RDBMS and MongoDB background and can't get my head around the flat Couchbase database.
I am working on an educational app where around 10,000 students will access "their individual study material" and solve it. Since the application also works offline, I need to keep Couchbase as the remote database and PouchDB as the local database on mobile, and keep the two in sync.
Use Cases:
If a question is added on the remote server, it should get synced locally to PouchDB.
If a student marks any question "important" or "doubt", it should get synced to the remote Couchbase database in the student's private space.
The schema I came up with after researching on SO is to maintain an individual database for each student (one bucket per student) so that syncing can be done very quickly.
The other model is one bucket containing all the students as separate documents. In this case, every question will be replicated as a sub-document in every student's document.
What would be the optimal database design on Couchbase Server? Should I go for the first or second approach, or some other approach as per your suggestions?
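To make the trade-off in the second model concrete, here is a minimal sketch of one possible "other approach": questions stored once as shared documents, plus a small per-student document for each "important" or "doubt" mark. The key scheme, the field names, and the figure of 500 questions are assumptions made for illustration only; how the sync layer restricts each student to their own documents is not covered here.

```python
# Hypothetical key scheme: shared question documents plus small
# per-student flag documents, all in a single bucket.

def question_key(question_id: str) -> str:
    return f"question::{question_id}"


def flag_key(student_id: str, question_id: str) -> str:
    # The student id in the key identifies the owner of the flag.
    return f"flag::{student_id}::{question_id}"


def make_flag(student_id: str, question_id: str, mark: str) -> dict:
    """Build the document stored when a student marks a question."""
    if mark not in ("important", "doubt"):
        raise ValueError(f"unknown mark: {mark}")
    return {
        "type": "flag",
        "student": student_id,
        "question": question_key(question_id),
        "mark": mark,
    }


def copies_if_embedded(students: int, questions: int) -> int:
    """Question copies stored if every student document embeds every question."""
    return students * questions


# A plain dict stands in for the bucket in this sketch.
bucket = {}
bucket[question_key("q42")] = {"type": "question", "text": "What is 2 + 2?"}
bucket[flag_key("s1001", "q42")] = make_flag("s1001", "q42", "doubt")

# 10,000 students (from the question) and an assumed 500 questions.
embedded_copies = copies_if_embedded(10_000, 500)
```

In this layout a new question is written once, and a student's mark touches only one small document, whereas embedding every question in every student document multiplies each question by the number of students.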
Related
Trying to figure out what would be better:
Multiple instances, one per DB
or
Single large instance which will hold multiple DBs inside
The scenario is similar to Jira Cloud where each customer has his own Jira Cloud server, with its own DB.
Now the question is, will it be better to manage all of the users' DBs in 1 large instance, or to have a DB instance for each customer?
What would be the cons and pros for the chosen alternative?
The first thing that came to our minds is backup management: will we be able to recover a specific customer's DB if it resides on the same large instance as all the other DBs?
Similar question, but in a different scenario and other requirements - 1-big-google-cloud-sql-instance-2-small-google-cloud-sql-instances-or-1-medium
This answer is based on a personal opinion. It is up to you to decide how you want to build your database. However, it is better to go with multiple smaller Cloud SQL instances as it is also stated in Cloud SQL > Best practices documentation.
PROS of multiple databases
It is easier to manage smaller instances rather than big instances. (In documentation provided above)
You can choose the region and zone for each database, so if your customers are in different geographical locations, you can pick the zone closest to each of them for their Cloud SQL instance and thereby reduce latency.
Also, if you are planning to have a lot of databases, with a lot of tables in each database and a lot of records in each table, the instance will be huge. Backups, creating read replicas or failover replicas, and maintaining them will therefore take longer once the databases begin to grow.
However, if you have multiple databases per user, I would suggest keeping them inside one Cloud SQL instance, so that you manage one Cloud SQL instance per user. For example, if you have 100 users and User1 has 4 databases, User2 has 6 databases, and so on, create 100 Cloud SQL instances rather than one Cloud SQL instance per database; otherwise you will end up with a lot of instances, and managing multiple instances per user will be hard.
MongoDB contains data ready for client-side apps. The raw data is stored in Google BigQuery (GBQ). Each day a lot of new data is added to GBQ, and once a day pretty much everything in MongoDB needs to be updated to match the most recent data in GBQ. All outdated (not updated) records must be deleted.
What is the right way to handle a MongoDB update with close to zero downtime?
Among the crazy solutions: maybe I should have two instances of MongoDB, one in production and another being updated. Once the second DB is updated, I'll run a Google Kubernetes Engine deploy with changed configs, so all clients are moved smoothly from the previous data to the updated data, without being exposed to partially updated data and without downtime. However, I have never heard of such a solution, so I'm not sure if this is the right one.
Another solution is to have two versions of each collection under a single instance of MongoDB. Once a collection is updated, the server switches to that collection.
The 2nd solution seems a good option. If you know the trigger for the update, you can keep downtime to a minimum by creating a new collection (named by date or a unique serial number, maybe) and updating your code accordingly.
I had some good experience doing this for a fashion website some time back, where we scraped data (using scrapinghub), imported it into MongoDB (collections stored by date), and used it accordingly. Our scraping ran early in the morning (5-6AM), and when our editors/curators came into the office, they would start using the current dated collection (via the web interface, of course :) ).
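The dated-collection approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: `db` is taken to be a pymongo `Database` (or anything with the same `db[name]`, `insert_many`, `update_one`, `find_one`, and `drop` calls), and the pointer collection name `active_collections` is hypothetical. Rather than redeploying code, the sketch records the name of the current collection in a small pointer document that readers look up.

```python
from datetime import date

POINTER_COLLECTION = "active_collections"  # hypothetical name


def dated_name(prefix: str, day: date) -> str:
    """Collection name for one daily load, e.g. products_20240105."""
    return f"{prefix}_{day:%Y%m%d}"


def active_name(db, prefix: str):
    """Name of the collection readers should use, or None before the first load."""
    pointer = db[POINTER_COLLECTION].find_one({"_id": prefix})
    return pointer["collection"] if pointer else None


def publish(db, prefix: str, docs: list, day: date):
    """Load docs into a fresh dated collection, then switch readers to it.

    Returns the name of the previously active collection (or None) so the
    caller can drop it once no reader is still using it.
    """
    new_name = dated_name(prefix, day)
    previous = active_name(db, prefix)
    if previous == new_name:
        # Assumes one load per day, as in the question.
        raise ValueError(f"{new_name} is already live")
    db[new_name].drop()             # clear leftovers from a failed earlier run
    db[new_name].insert_many(docs)  # docs is assumed to be non-empty
    # Readers only see the new data after this single pointer update.
    db[POINTER_COLLECTION].update_one(
        {"_id": prefix}, {"$set": {"collection": new_name}}, upsert=True
    )
    return previous
```

Readers would call `active_name(db, "products")` and query `db[that_name]`. Dropping the returned previous collection afterwards removes all outdated records in one step.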
I have a cloud native app based on azure. The app uses azure table storage.
Due to a fantastic opportunity, I have decided to also provide the app on-premises, so I have to replace the NoSQL data provider. My question is: which solution is most similar to Azure Table Storage? Mongo? Raven? You name it!
What I intend is to migrate the code effortlessly, like migrating from SQL Azure to SQL Server 2012, with no code change needed. I know that there's no equivalent for Table Storage, so I intend to find the one that will reduce my TTM as much as possible.
MongoDB and Table Storage are not exactly swappable replacements for each other. One is key/value, the other is document. I compared the two in this answer.
There's no getting around the fact that Table Storage is Storage-as-a-Service and you only pay for quantity of data (plus a very small per-transaction cost), whereas to work with MongoDB, you'd either have to host it in your own VMs (which gives you plenty of storage room, but at the expense of VMs) or work with a hoster (such as MongoLab, which offers 500MB for free currently). Regardless, you'd have to make some code changes to work with MongoDB instead of Table Storage.
I'm not sure if there exists a key/value store equivalent to Table Storage that's locally-installable. No matter what you pick, you'll have modifications on your Azure-side solution if you swap out Table Storage.
Is it possible, for your on-premises solution, to provide a MongoDB backend that stays relatively simple? That is: Stick with a single index to substitute for rowkey, and then store your table entities as documents (avoiding sub-documents)? This would keep your data layout very similar. At that point, you could use things like Aggregation Framework for a bit of data processing, and not damage the overall layout style/schema of your data.
MongoDB would give you a consistent storage framework that you could use in-cloud and on-premises, and has good support for Windows Azure.
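As a rough illustration of the flat layout suggested above, a Table Storage entity could be mapped to a MongoDB document like this. It is a sketch under assumptions: the entity is represented as a plain dict, and `PartitionKey` and `RowKey` are joined into a string `_id` (which MongoDB always indexes uniquely) using a separator that is assumed never to occur in partition keys.

```python
SEPARATOR = "|"  # assumption: never appears in a partition key


def entity_to_document(entity: dict) -> dict:
    """Turn a Table Storage entity (as a dict) into a flat MongoDB document."""
    pk, rk = entity["PartitionKey"], entity["RowKey"]
    if SEPARATOR in pk:
        raise ValueError("partition key contains the separator")
    # _id plays the role of the (PartitionKey, RowKey) primary key.
    doc = {"_id": f"{pk}{SEPARATOR}{rk}"}
    for name, value in entity.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"{name}: nested values would break the flat layout")
        doc[name] = value
    return doc


example = entity_to_document(
    {"PartitionKey": "orders", "RowKey": "0001", "Total": 12.5}
)
```

Keeping `PartitionKey` and `RowKey` as ordinary fields alongside `_id` means the stored data still looks like the original entity, which should keep the two data-access layers close to each other.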
I have recently moved part of the back end of a web app to MongoDB. The web app itself is a validation tool, and the workflow looks like this:
the user uploads a file (typically hundreds of thousands of lines)
the validator checks it outputting a lot of messages (possibly more than one per line)
...and finally provides a few statistics
I modelled my application so that each user has its own DB containing:
The file (saved through GridFS)
A collection containing the messages (possibly over a million lines, in some cases)
A collection with the statistics
We have a few hundred users, so MongoDB will end up having a few hundred DBs.
Of course, I could have held all the data in the same DB, using namespaces to separate data from different users. However, I felt it was handy to send the DB in the connection URI, and I found it more intuitive to issue a "drop database" statement to purge a user, rather than searching for and removing that user's data in the large DB.
I am pretty new to MongoDB, so my question is: is there any drawback in having several DBs in the same MongoDB instance? Or is there any special consideration that I should give to the problem?
I'm not familiar with MongoDB specifically. In general, opening a connection to a database is a relatively slow operation and ties up system resources. Whether this is enough to matter in your case I can't say.
Having a different db for each user would make it difficult to perform queries that access data for multiple users. Maybe you have no need to do this.
Still, I would think it would be a whole lot simpler in general to just put a user id in each record rather than create a separate database. What's the gain of separate databases? Okay, deleting a user means saying "drop database". But deleting a user from a single database should mean saying "delete from tableX where user=?; delete from tableY where user=?" etc for however many relevant tables you have. I can't imagine it's hundreds, right? Maybe half a dozen lines of code or so?
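Translated to MongoDB, the "half a dozen lines" for purging a user from a shared database might look like the sketch below. It assumes that `db` is a pymongo `Database` (or an object with the same `db[name].delete_many(...)` call) and that every document carries a `user_id` field, which is a hypothetical name. Files stored through GridFS would need to be removed separately and are not handled here.

```python
def purge_user(db, user_id, collection_names):
    """Delete every document belonging to user_id.

    Returns a dict mapping each collection name to the number of
    documents removed from it.
    """
    removed = {}
    for name in collection_names:
        # Assumes each document has a user_id field (hypothetical name).
        result = db[name].delete_many({"user_id": user_id})
        removed[name] = result.deleted_count
    return removed
```

A call such as `purge_user(db, 42, ["messages", "statistics"])` then plays the role that "drop database" plays in the one-database-per-user design.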
We have decided to use MongoDB for a SaaS offering we are creating. Each company that signs up gets their own url (mycompany.domain.com) and their own private set of users, projects, etc... Since we are using a NoSQL solution, and wouldn't have to manage pushing out schema updates to every database like we would with MySQL, I am wondering if it would be better to have one huge database containing all the data, or to have one database per client.
Since MongoDB can shard the database across multiple servers, I'm thinking there wouldn't be a huge performance hit if we had a giant database, but I also think backups and exporting data would be much easier if there was one database per client. Any thoughts?
Go with one but make sure to take advantage of some sort of replication for backup purposes!
Look into sharding or look into replica-sets.
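With a single shared database, each document has to record which client it belongs to, and every query has to filter on that. A minimal sketch of helpers for this, assuming a hypothetical `tenant_id` field and plain dict queries as used by MongoDB drivers:

```python
from typing import Optional

TENANT_FIELD = "tenant_id"  # hypothetical field name


def tenant_document(tenant_id: str, doc: dict) -> dict:
    """Return a copy of doc stamped with its owning tenant."""
    return {**doc, TENANT_FIELD: tenant_id}


def tenant_filter(tenant_id: str, query: Optional[dict] = None) -> dict:
    """Return a copy of query restricted to a single tenant."""
    scoped = dict(query or {})
    # Refuse queries that explicitly target some other tenant.
    if scoped.get(TENANT_FIELD, tenant_id) != tenant_id:
        raise ValueError("query targets a different tenant")
    scoped[TENANT_FIELD] = tenant_id
    return scoped
```

Routing every read and write through helpers like these is one way to keep one client's data from leaking into another's results. Exporting a single client's data then becomes a query with `tenant_filter("mycompany")` rather than a database dump.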