Which is the best way to separate aggregates in DDD?

I'm learning about DDD, and I'm not clear on how to separate objects into aggregates.
An example:
I have three objects: company, shop, and job.
And I have some relationships: one company has many shops, and one shop has many jobs.
My first thought:
A shop can't exist without a company, and in the real world a company has to have shops. So I group company and shop into one aggregate.
Job is another aggregate.
Another thought:
When fetching a job, I always care about which shop the job belongs to.
So I group shop and job into one aggregate.
Company is another aggregate.
Which way is right?
Thanks

The only possible answer is, of course, "It depends." That's not especially helpful, though.
Review the definition of an aggregate from Evans's book:
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes ... Invariants, which are consistency rules that must be maintained whenever data changes, will involve relationships between members of the AGGREGATE. Any rule that spans AGGREGATES will not be expected to be up-to-date at all times ... But the invariants applied within an AGGREGATE will be enforced with the completion of each transaction.
So the questions of "what objects make up my aggregate?" and "what is my aggregate root?" depend on which business invariants need to be enforced across which business transactions.
You do not design aggregates like you do tables in a relational database. You're not concerned with the multiplicity of the relationships between the entities in "real life". You're looking for what facts (properties, values) must be true at the end of an action that affects (mutates the data of) those entities.
Look at your requirements. Look at what kinds of behavior your system needs to support. What can you do with jobs? Create them? Start them? Complete them? Can you transfer a job from one shop to another? Can a job move between companies?
What facts need to stay consistent? e.g., are you enforcing a maximum number of jobs per shop? At the end of "adding a job", does the current # of jobs in a shop need to be consistent with the job's shop assignment?
Since you can only interact with an aggregate through its root, you need to think about the context of how you add new data. e.g., can you create a job with no initial shop assignment? Or can it only be created through a shop?
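To make this concrete, here is a minimal sketch in Python (hypothetical names, using the max-jobs rule above as the invariant) where Shop is the aggregate root and a job can only come into existence through its shop:

```python
class Job:
    """A plain entity; it is only ever created through its aggregate root."""
    def __init__(self, title):
        self.title = title


class Shop:
    """Aggregate root: all changes to jobs go through the shop."""

    MAX_JOBS = 50  # hypothetical business invariant

    def __init__(self, shop_id):
        self.shop_id = shop_id
        self._jobs = []

    def add_job(self, title):
        # The invariant is enforced here, inside the aggregate boundary,
        # so it is guaranteed to hold at the end of every transaction.
        if len(self._jobs) >= self.MAX_JOBS:
            raise ValueError("shop has reached its maximum number of jobs")
        job = Job(title)
        self._jobs.append(job)
        return job
```

Whether this is the right boundary still depends on your transactions: if jobs can move between shops, or be created unassigned, the shape changes.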
There's also a compromise between the size/scope of an aggregate and the possibility of data contention when updating an aggregate in a transaction.
With all of these things to worry about, you may wonder why even bother with aggregates? Well, they are great at a couple of things:
validation and execution of commands is FAST, because all the data you need is self-contained inside the aggregate
they lend themselves well to document-based persistence stores (like MongoDB with nested documents for the aggregate objects), which makes retrieval of the aggregate state simple and performant, and enforcement of the aggregate transaction boundary easy with document-level atomic updates
they are extremely easy to test, because they can be implemented as simple classes (POCOs/POJOs in C# or Java). Because they contain the majority of your business logic, it means your overall app's behavior is easy to unit test as well! (A small example follows this list.)
they are intent-revealing; each aggregate has a purpose, and it's very clear from the data and functions it implements just what it does in your system. Combined with the ubiquitous language of the context you're coding in, expressed in the code itself, they are the most direct expression of your business behavior in the codebase (much more so than a set of data tables alone)
because they are so use-case specific, aggregates often avoid leaky abstractions that crop up in more generic solutions
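To illustrate the testability point above, here is the kind of unit test this enables, reusing the hypothetical Shop class from the earlier sketch (no database, no mocks):

```python
import unittest

# Assumes the Shop class from the earlier sketch is in scope or importable.

class ShopAggregateTest(unittest.TestCase):
    def test_rejects_jobs_beyond_the_maximum(self):
        shop = Shop("shop-1")
        for i in range(Shop.MAX_JOBS):
            shop.add_job(f"job-{i}")
        with self.assertRaises(ValueError):
            shop.add_job("one too many")

if __name__ == "__main__":
    unittest.main()
```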
If you're interested in reading more, Vaughn Vernon has a nice summary in his Effective Aggregate Design posts, which served as the basis for his meaty book "Implementing Domain-Driven Design".

Related

MongoDB Extended Reference Pattern - Managing Data Duplication

I'm new to MongoDB and trying to wrap my head around managing duplicate data. The Extended Reference Pattern (link) is a good example. When you have two related collections (e.g., Customers and Orders), it can make sense for performance reasons to duplicate some information that would otherwise just live in the referenced collection. So for instance, the Order collection might duplicate the customer's name to avoid unnecessary joins with some queries.
I totally get that. And I totally get that you should be careful about what data you duplicate ("it works best if [duplicated fields] don't frequently change"), as updating those records can be expensive. What I don't understand is how you're supposed to keep track of where all that data is housed. Suppose you do need to update a customer's name. If that information is duplicated in multiple orders within the Order collection, plus maybe one or two other collections, tracking down everywhere the customer's name lives (and the mechanics of changing it) sounds like a logistical nightmare!
Is there some sort of Mongo voodoo magic that can help with these sorts of updates, or is that just a necessarily messy process?
You have to manage all of those changes in your app, so take care when selecting one pattern or another; they are not silver bullets.
And remember, not all of the data needs to be updated; it depends on the situation, the data, and the context of your app.
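There is no Mongo voodoo for it; the fan-out is code you write and maintain. A minimal sketch of what that looks like with pymongo, assuming hypothetical customers and orders collections where orders duplicate the customer's name:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["shop"]  # hypothetical database name

def rename_customer(customer_id, new_name):
    # 1. Update the source-of-truth document.
    db.customers.update_one({"_id": customer_id}, {"$set": {"name": new_name}})
    # 2. Fan the change out to every collection that duplicated the field.
    #    Keeping this list of places up to date is on you; Mongo won't track it.
    db.orders.update_many(
        {"customer_id": customer_id},
        {"$set": {"customer_name": new_name}},
    )
```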

Single big collection for all products vs Separate collections for each Product category

I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.
The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.
There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.
Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.
Should I create one single collection "Product" and use a field to indicate the category, or create a separate collection for each category?
I've read many questions related to this idea (1 collection vs many) but I haven't been able to reach a conclusion, other than "it depends".
So my question is: in this specific use case, which option would be optimal in terms of performance and speed, multiple collections vs a single collection + sharding?
Any help would be appreciated.
As you mentioned, you need to play with your data and use case; then you will have a better picture.
Some decisions are required, as below:
Decide the number of documents you will have in the near future. If you will have 1M documents in a year, then test with at least 3M documents.
Decide the number of indices required.
Decide the number of writes, reads per second.
Decide the size of documents per category.
Decide the query pattern.
Some inputs based on the requirements:
If you have more writes with more indexes, then a single monolithic collection will be slower, as multiple indexes need to be updated.
As you have a different set of fields per category, you could try multiple collections.
There is $unionWith to combine data from multiple collections (see the sketch at the end of this answer), but do check the performance; it depends entirely on the decisions above. Note this open issue as well.
If you decide to go with a monolithic collection, defer the sharding. Implement it once you find that queries are slow.
If you have many writes on the same document, the writes will be executed sequentially, which will slow down your reads as well.
Think about reclaiming disk space when a lot of data is cleared from the collections. Multiple collections do well here.
The point that forces me to suggest a monolithic collection is your requirement to query all the categories at the same time. With separate collections you may keep adding categories, and combining all of them into a single response would not perform well.
As you don't really have a join use case like in an RDBMS, you can go with a single monolithic collection from a modeling point of view. I doubt you have a join key.
If any of my points are incorrect, please let me know.
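For reference, a minimal sketch of the $unionWith approach mentioned above (pymongo, MongoDB 4.4+, with hypothetical per-category collection names):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["catalog"]  # hypothetical database, one collection per category

# Search two per-category collections in a single round trip.
results = db.electronics.aggregate([
    {"$match": {"name": {"$regex": "phone", "$options": "i"}}},
    {"$unionWith": {
        "coll": "appliances",
        "pipeline": [{"$match": {"name": {"$regex": "phone", "$options": "i"}}}],
    }},
])
for product in results:
    print(product)
```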
To SQL or to NoSQL?
I think that before you implement this in NoSQL, you should ask yourself why you are doing that. I quite like NoSQL but some data is definitely a better fit to that model than others.
The data you are describing is a classic case for a relational SQL DB. That's fine if it's a hobby project and you want to try NoSQL, but if this is for a production environment or client, you are likely making the situation more difficult for them.
Relational or non-relational?
You mention common fields across all products. If you wish to update these fields and have those updates reflected in all products, then you have relational data.
Background
It may be worth reading Sarah Mei's 2013 article about this. Skip to the section "How MongoDB Stores Data" and read from there. Warning: the article is called "Why You Should Never Use MongoDB" and is (perhaps intentionally) somewhat biased against Mongo, so it's important to read this through the correct lens. The message you should get from this article is that MongoDB is not a good fit for every data type.
Two strategies for handling relational data in Mongo:
every time you update one of these common fields, update every product's document with the new common field data. This is generally only OK if updates are rare or documents are few; it breaks down when you have many of both.
use references and do joins.
In Mongo, joins typically happen code-side (multiple db calls)
In Arango (and in other graph dbs, as well as some key-value stores), the joins happen db-side (single db call)
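A minimal pymongo sketch of the second strategy (a reference plus a code-side join, with hypothetical collection and field names):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["catalog"]  # hypothetical database name

# Strategy 2: the product stores a reference; the "join" is two round trips
# in application code rather than one query.
product = db.products.find_one({"name": "Laptop"})  # hypothetical lookup
category = db.categories.find_one({"_id": product["category_id"]})
print(product["name"], "->", category["name"])
```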
Decisions
These are important factors to consider when deciding which DB to use and how to model your data.
I've used MongoDB, ArangoDB and Neo4j.
Mongo definitely has the best tooling and it's easy to find help, but I don't believe it's a good fit in this case.
Arango is quite pleasant to work with, but doesn't yet have the adoption that it deserves
I wouldn't recommend Neo4j to anyone looking for a NoSQL solution, as its nodes and relations only support flat properties (no nesting, so not real documents)
It may also be worth considering MariaDB or Postgres

How to design MongoDB data model to store Event Sourcing events

If I create a single table (or collection, in document databases) per aggregate type, I can merge or shard databases whenever I refactor the write side's microservices; as a result the application becomes more scalable, and it also speeds up loading events.
Are there any side effects I should be aware of while I'm designing the event store like that?
Edit:
I'm currently using MongoDB.
What if I create a collection per aggregate id?
Or a database per aggregate type, and a collection per aggregate id...?
Is that problematic in performance, ease of data administration, maintainability, or further scalability?
If I create a single table (or collection, in document databases), I can merge or shard databases whenever I refactor the write microservices, and as a result the application becomes more scalable.
Are there any side effects I should be aware of while I'm designing the event store like that?
I haven't seen any authoritative discussion of that design.
There was a discussion in the event sourcing community about having a separate table for each type of aggregate. You can find that discussion here. Executive summary: the more experienced practitioners seemed to be startled that anybody would do that on purpose.
One thing that you should keep in mind is that while events are real (they describe something of interest to the business), aggregates are artificial. You are probably going to be unhappy if redesigning your aggregate boundaries requires that you move your events all over the place.
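For illustration, the more common alternative is a single events collection for all aggregates, keyed by a stream id, so redrawing aggregate boundaries doesn't move any data. A rough pymongo sketch (hypothetical field names), with a unique index doubling as an optimistic-concurrency guard:

```python
from pymongo import MongoClient, ASCENDING
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["eventstore"]  # hypothetical database name

# One collection for all events; the stream id identifies the aggregate.
db.events.create_index(
    [("stream_id", ASCENDING), ("version", ASCENDING)], unique=True
)

def append_event(stream_id, expected_version, event_type, payload):
    try:
        db.events.insert_one({
            "stream_id": stream_id,           # e.g. "order-42"
            "version": expected_version + 1,  # next position in the stream
            "type": event_type,
            "payload": payload,
        })
    except DuplicateKeyError:
        # Another writer appended this version first: reload the stream,
        # re-run the command, and try again.
        raise
```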
The following may be helpful
https://github.com/NEventStore/NEventStore.Persistence.MongoDB
http://www.slideshare.net/dbellettini/cqrs-and-event-sourcing-with-mongodb-and-php
http://blingcode.blogspot.com/2010/12/cqrs-building-transactional-event-store.html

MongoDB - How to organize collections

Just a general question here - I'm doing some self-paced learning on MongoDB, and to get off on the right foot I'd like an opinion on how to organize collections for a sample budget application.
As with any home budget I have 'Categories' such as Home, Auto and I also have subcategories under those categories such as Mortgage and Car Payments.
Each bill will have a due date, minimum amount due, a forecast payment, forecast payment date, actual payment and actual payment date.
Each bill is due to 'someone', for example Home, Mortgage may be due to Bank of America, and Bank of America may have contact info (phone, mailing address).
Making the switch from a Table structure to Mongo is a bit confusing, so I appreciate any opinions on how to approach this.
The question is very general. In general :), the following principles apply to schema design in MongoDB:
The layout of your collections should be guided by sound modeling principles. With MongoDB, you can have a schema that more closely resembles the object structure of your data, as opposed to a relational "projection" of it.
The layout of your collections should be guided by your data access patterns (which may sometimes conflict with the previous statement). Design your schemas so you can get the info you need in as few queries as possible, without loading too much data you don't need into your application.
You often can, and should, "denormalize" to achieve the above two. It's not a bad thing with MongoDB at all. The downside of denormalizing is that updates become more expensive and you need to make sure to maintain consistency. But those downsides are often outweighed by more natural modeling and better read efficiency.
From your description above, it sounds as if you have a rather "relational" model already in mind. Try to get rid of that and approach the problem with a fresh mind. Think objects, not tables.
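For instance, one possible (purely hypothetical) document shape for a bill in this app, embedding the category and payee rather than joining to separate tables:

```python
# One document in a "bills" collection; all names and values are illustrative.
bill = {
    "category": {"name": "Home", "subcategory": "Mortgage"},
    "payee": {
        "name": "Bank of America",
        "phone": "555-0100",
        "address": "123 Main St",
    },
    "due_date": "2024-03-01",
    "minimum_due": 1200.00,
    "forecast": {"amount": 1200.00, "date": "2024-02-25"},
    "actual": {"amount": 1200.00, "date": "2024-02-27"},
}
```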

Many to many update in MongoDB without transactions

I have two collections with a many-to-many relationship. I want to store an array of linked ObjectIds in both documents so that I can take Document A and retrieve all linked Document B's quickly, and vice versa.
Creating this link is a two-step process:
Add Document A's ObjectId to Document B
Add Document B's ObjectId to Document A
After watching a MongoDB video I found this to be the recommended way of storing a many-to-many relationship between two collections
I need to be sure that both updates are made. What is the recommended way of robustly dealing with this crucial two step process without a transaction?
I could condense this relationship into a single link collection, the advantage being a single update with no chance of Document B missing the link to Document A. The disadvantage being that I'm not really using MongoDB as intended. But, because there is only a single update, it seems more robust to have a link collection that defines the many-to-many relationship.
Should I use safe mode and manually check the data went in afterwards and try again on failure? Or should I represent the many-to-many relationship in just one of the collections and rely on an index to make sure I can still quickly get the linked documents?
Any recommendations? Thanks
@Gareth, you have multiple legitimate ways to do this, so the key concern is how you plan to query the data (i.e., what queries need to be fast).
Here are a couple of methods.
Method #1: the "links" collection
You could build a collection that simply contains mappings between the collections.
Pros:
Supports atomic updates so that data is not lost
Cons:
Extra query when trying to move between collections
Method #2: store copies of smaller mappings in larger collection
For example: you have millions of Products, but only a hundred Categories. Then you would store the Categories as an array inside each Product.
Pros:
Smallest footprint
Only need one update
Cons:
Extra query if you go the "wrong way"
Method #3: store copies of all mappings in both collections
(what you're suggesting)
Pros:
Single query access to move between either collection
Cons:
Potentially large indexes
Needs transactions (?)
Let's talk about "needs transactions". There are several ways to do transactions and it really depends on what type of safety you require.
Should I use safe mode and manually check the data went in afterwards and try again on failure?
You can definitely do this. You'll have to ask yourself, what's the worst that happens if only one of the saves fails?
Method #4: queue the change
I don't know if you've ever worked with queues, but if you have some leeway you can build a simple queue and have different jobs that update their respective collections.
This is a much more advanced solution. I would tend to go with #2 or #3.
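For what it's worth, here is a minimal sketch of Method #3 with acknowledged writes and a manual check (pymongo, hypothetical collection and field names):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["app"]  # hypothetical database name

def link(a_id, b_id):
    # $addToSet makes both updates idempotent, so retrying a
    # half-finished link simply completes it.
    res_a = db.collection_a.update_one(
        {"_id": a_id}, {"$addToSet": {"b_ids": b_id}}
    )
    res_b = db.collection_b.update_one(
        {"_id": b_id}, {"$addToSet": {"a_ids": a_id}}
    )
    # "Safe mode"-style check: both writes must have matched a document;
    # if not, retry here or hand off to a repair/queue job (Method #4).
    if res_a.matched_count == 0 or res_b.matched_count == 0:
        raise RuntimeError("link incomplete; retry or repair")
```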
Why don't you create a dedicated collection holding the relations between A and B as individual documents, as one would do in an RDBMS? You can modify the relation collection with one operation, which is of course atomic.
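A minimal sketch of that link collection in pymongo (hypothetical names); the unique index prevents duplicate links, and the single insert is the only write needed:

```python
from pymongo import MongoClient, ASCENDING
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["app"]  # hypothetical database name

# Unique compound index: one document per (A, B) pair. A second index
# on b_id keeps the reverse lookup fast as well.
db.links.create_index([("a_id", ASCENDING), ("b_id", ASCENDING)], unique=True)
db.links.create_index([("b_id", ASCENDING)])

def link(a_id, b_id):
    try:
        db.links.insert_one({"a_id": a_id, "b_id": b_id})  # one atomic write
    except DuplicateKeyError:
        pass  # the link already exists; nothing to do

def b_ids_for(a_id):
    return [doc["b_id"] for doc in db.links.find({"a_id": a_id})]
```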
Should I use safe mode and manually check the data went in afterwards and try again on failure?
Yes, this is an approach, but there is another: you can implement an optimistic transaction. It has some overhead and limitations, but it guarantees data consistency. I wrote an example and some explanation on a GitHub page.
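For context, one common flavor of optimistic concurrency (not necessarily the approach in that GitHub example) guards each update with a version field:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["app"]  # hypothetical database name

def update_with_version(doc_id, changes):
    # Read the current version, then update only if it hasn't moved since.
    doc = db.collection_a.find_one({"_id": doc_id})
    result = db.collection_a.update_one(
        {"_id": doc_id, "version": doc["version"]},  # optimistic guard
        {"$set": changes, "$inc": {"version": 1}},
    )
    if result.modified_count == 0:
        raise RuntimeError("concurrent modification; reload and retry")
```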