MongoDb casscade update/delete - mongodb

I'm trying to use mongoDB with Morphia but still I have a problem with deleting documents. Is there any additional plugin or wrapper which works with Mongo and provides something like transactions in DBMS?

No, there are no (multi document) transactions. There are two possible solutions:
You can restructure your data into a single document instead of spreading it over multiple tables. Thus MongoDB's single document transactions (if you call them that) are enough for you. You can solve many problems with embedded entities or arrays. You might want to start a question related to "schema" design, if you're unsure how to approach this.
Your problem absolutely needs transactions across multiple documents / tables. Then MongoDB is simply not the right tool and you should use a relational database.
Don't fight the tool, pick the right one...

Related

Single big collection for all products vs Separate collections for each Product category

I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.
The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.
There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.
Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.
Should I create one single collection "Product" and use a field to indicate the category, or create a seperate collection for each category?
I've read many questions related to this idea (1 collection vs many) but I haven't been able to reach a conclusion, other than "it dependes".
So my question is: In this specific use case which option would be most optimal, multiple collections vs single collection + sharding, in terms of performance and speed ?
Any help would be appreciated.
As you mentioned, you need to play with your data and use-case. You will have better picture.
Some decisions required as below.
Decide the number of documents you will have in near future. If you will have 1m documents in an year, then try with at least 3m data
Decide the number of indices required.
Decide the number of writes, reads per second.
Decide the size of documents per category.
Decide the query pattern.
Some inputs based on the requirements
If you have more writes with more indices, then single monolithic collection will be slower as multiple indices needs to be updated.
As you have different set of fields per category, you could try with multiple collections.
There is $unionWith to combine data from multiple collections. But do check the performance it purely depends on the above decisions. Note this open issue also.
If you decide to go with monolithic collection, defer the sharding. Implement this once you found that queries are slower.
If you have more writes on the same document, writes will be executed sequentially. It will slow down your read also.
Think of reclaiming the disk space when more data is cleared from the collections. Multiple collections do good here.
The point which forces me to suggest monolithic collections is that I'd need to query all the categories at the same time. You may need to add more categories, but combining all of them in single response would not be better in terms of performance.
As you don't really have a join use case like in RDBMS, you can go with single monolithic collection from model point of view. I doubt you could have a join key.
If any of my points are incorrect, please let me know.
To SQL or to NoSQL?
I think that before you implement this in NoSQL, you should ask yourself why you are doing that. I quite like NoSQL but some data is definitely a better fit to that model than others.
The data you are describing is a classic case for a relational SQL DB. That's fine if it's a hobby project and you want to try NoSQL, but if this is for a production environment or client, you are likely making the situation more difficult for them.
Relational or non-relational?
You mention common fields across all products. If you wish to update these fields and have those updates reflected in all products, then you have relational data.
Background
It may be worth reading Sarah Mei 2013 article about this. Skip to the section "How MongoDB Stores Data" and read from there. Warning: the article is called "Why You Should Never Use MongoDB" and is (perhaps intentionally) somewhat biased against Mongo, so it's important to read this through the correct lens. The message you should get from this article is that MongoDB is not a good fit for every data type.
Two strategies for handling relational data in Mongo:
every time you update one of these common fields, update every product's document with the new common field data. This is generally only ok if you have few updates or few documents, but not both.
use references and do joins.
In Mongo, joins typically happen code-side (multiple db calls)
In Arango (and in other graph dbs, as well as some key-value stores), the joins happen db-side (single db call)
Decisions
These are important factors to consider when deciding which DB to use and how to model your data
I've used MongoDB, ArangoDB and Neo4j.
Mongo definitely has the best tooling and it's easy to find help, but I don't believe it's good fit in this case
Arango is quite pleasant to work with, but doesn't yet have the adoption that it deserves
I wouldn't recommend Neo4j to anyone looking for a NoSQL solution, as its nodes and relations only support flat properties (no nesting, so not real documents)
It may also be worth considering MariaDB or Postgres

I wonder if there are a lot of collections

Do many mongodb collections have a big impact on mongodb performance, memory and capacity? I am designing an api with mvc pattern, and a collection is being created for each model. I question the way I am doing now.
MongoDB with the WirdeTiger engine supports an unlimited number of collections. So you are not going to run into any hard technical limitations.
When you wonder if something should be in one collection or in multiple collections, these are some of the considerations you need to keep in mind:
More collections = more maintenance work. Sharding is configured on the collection level. So having a large number of collections will make shard configuration a lot more work. You also need to set up indexes for each collection separately, but this is quite easy to automatize, because createIndex on an index which already exists does nothing.
The MongoDB API is designed in a way that every database query operates on one collection at a time. That means when you need to search for a document in n different collections, you need to perform n queries. When you need to aggregate data stored in multiple collections, you run into even more problems. So any data which is queried together should be stored together in the same collection.
Creating one collection for each class in your model is usually a god rule of thumb, but it is not a golden hammer solution. There are situations where you want to embed object in their parent-object documents instead of putting them into a separate collection. There are also cases where you want to put all objects with the same base-class in the same collection to benefit from MongoDB's ability to handle heterogeneous collections. But that goes beyond the scope of this question.
Why don't you use this and test your application ?
https://docs.mongodb.com/manual/tutorial/evaluate-operation-performance/
By the way your question is not completely clear... is more like a "discussion" rather than question. And you're asking others to evaluate your work instead of searching the web the rigth approach.

How to merge two collections in mongodb

I have two collections called Company_Details and Company_Ranks...Comp_ID is common in two collections. How do I merge these two collections to get complete details of a company.
Please help me
Thanks
Satyam
To make long story short, you either do that on client-side or consider the benefits of embedding those documents.
MongoDB does not support joins, as opposed to relational databases. This is both a pro and a con. It has helped MongoDB's developers to focus on scalability which is much harder to implement when you have joins and transactions.
You can follow the DBRef specification. Lots of drivers support DBRef and do the composition seamlessly for you. You can even do that manually. But most importantly, you can take advantage of embedding documents.
Embedding documents in MongoDB is a unique ability over relational databases. Meaning, you can create one collection consisting of compound documents. You'll enjoy atomicity, as there is no "partial success", and data locality: spinning disks are better in accessing data in sequence.
If querying is your motive and you don't want to change your schema. Then, try Apache Drill which allows you to query with SQLs. Then perform the full join, inner join, etc whatever you want. You can check for drill with MongoDB.
With MongoDB Version 3.2 and higher we got now the $lookup Command, which is the "same" as a Join in a RDBMS.
With that you can easy Query between your 2 Collections and get the Information you want.
For further Details Checkt out the Documentation
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

Denormalization vs Parent Referencing vs MapReduce

I have a highly normalized data model with me. Currently I'm using manual referencing by storing the _id and running sequential queries to fetch details from the deepest collection.
The referencing is one-way and the flow has around 5-6 collections. For one particular use case, I'm having to query down to the deepest collection by querying subsequent "_id" from the higher level collections. So technically I'm hitting the database every time I run a
db.collection_name.find(_id: ****).
My prime goal is to optimize the read without hugely affecting the atomicity of the other collections. I have read about de-normalization and it does not make sense to me because I want to keep an option for changing the cardinality down the line and hence want to maintain a separate collection altogether.
I was initially thinking of using MapReduce to do an aggregation from the back and have a collection primarily for the particular use-case. But well even that does not sound that good.
In a relational db, I would be breaking the query in sub-queries and performing a join to get the data sets that intersect from the initial results. Since mongodb does not support joins, I'm having a tough time figuring anything out.
Please help if you have faced anything like this earlier or have any idea how to resolve it.
Denormalize your data.
MongoDB does not do JOIN's - period.
There is no operation on the database which gets data from more than one collection. Not find(), not aggregate() and not MapReduce. When you need to puzzle your data together from more than one collection, there is no other way than doing it on the application layer. For that reason you should organize your data in a way that any common and performance-relevant query can be resolved by querying just a single collection.
In order to do that you might have to create redundancies and transitive dependencies. This is normal in MongoDB.
When this feels "dirty" to you, then you should either accept the fact that your performance will be sub-optimal or use a different kind of database, like a classic relational database or a graph database.

Is MongoDB a good fit for this?

In a system I'm building, it's essentially an issue tracking system, but with various issue templates. Some issue types will have different formats that others.
I was originally planning on using MySQL with a main issues table and an issues_meta table that contains key => value pairs. However, I'm thinking NoSQL (MongoDB) might be the better option.
Can MongoDB provide me with the ability to generate "standard"
reports, like # of issues by type, # of issues by type by month, # of
issues assigned per person, etc? I ask this because I've read a few
sources that said Mongo was bad at reporting.
I'm also planning on storing my audit logs in Mongo, since I want a single "table" for all actions (Modifications to any table). In Mongo I can store each field that was changed easily, since it is schemaless. Is this a bad idea?
Anything else I should know, and will Mongo work for what I want?
I think MongoDB will be a perfect match for that use case.
MongoDB collections are heterogeneous, meaning you can store documents with different fields in the same bag. So different reporting templates won't be a show stopper. You will be able to model a full issue with a single document.
MongoDB would be a good fit for logging too. You may be interested in capped collections.
Should you need to have relational association between documents, you can do have it too.
If you are using Ruby, I can recommend you Mongoid. It will make it easier. Also, it has support for versioning of documents.
MongoDB will definitely work (and you can use capped collections to automatically drop old records, if you want), but you should ask yourself, does it fit to this task well? For use case you've described it is better option to use Redis (simple and fast enough) or Riak (if you care a lot about your log data).