DDD - How to store aggregates in NoSQL databases - MongoDB

A current project requires us to persist domain objects in a NoSQL database such as MongoDB.
In many examples (incl. Eric Evans, Vaughn Vernon) the domain objects are serialized and persisted to MongoDB directly.
We would like to avoid mixing the domain layer with persistence-related information by not having any annotations in our domain objects.
Also we are concerned about corrupting the persisted data by changing the domain object in the future.
We came to the conclusion that we need to have some kind of DTOs translating between the domain objects and the persisted data.
Has anyone come across a good solution for such a case?

Yes. Your domain models should be ignorant of persistence. So you need a DTO or what I call data models (apart from the domain models and view models). Your domain models are mapped to the data models before persisting to the database. This mapping is pretty common in insert and update operations. For read-only operations (reporting, etc.) you can bypass the mapping from data models to domain models. That will prevent loading the whole object graph of your domain models. This is widely applied in CQRS architecture patterns, where read and write commands are separated.
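To make the mapping concrete, a minimal sketch of such a data model and its mapper might look like this (all names are invented; the mapping could just as well be generated by a library such as MapStruct):

// Domain model: free of persistence concerns, enforces invariants.
class Customer {
    private final String id;
    private final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String id() { return id; }
    String name() { return name; }
}

// Data model: the dumb shape that is actually persisted (e.g. as a MongoDB document).
class CustomerDocument {
    public String id;
    public String name;
}

// Mapping happens in the repository implementation on insert/update;
// read-only reporting queries can bypass the domain model entirely.
class CustomerMapper {
    CustomerDocument toDocument(Customer customer) {
        CustomerDocument doc = new CustomerDocument();
        doc.id = customer.id();
        doc.name = customer.name();
        return doc;
    }

    Customer toDomain(CustomerDocument doc) {
        return new Customer(doc.id, doc.name);
    }
}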

Like you, I want business objects to have no dependency on any kind of specific repository. I solved it like this: have your business object define its state object and repository functions as interfaces. Your repository implementation can create an actual state object and inject it into your business object through the constructor.
There are a lot of advantages to this approach (such as having business objects for specific purposes), but you easily achieve complete (two-way) independence of your repository this way. Martin Fowler also hinted at this approach elsewhere.
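A rough sketch of that idea (hypothetical names, not the poster's actual code):

// The business object owns these abstractions; the repository implements and creates them.
interface ShipmentState {
    String getId();
    String getStatus();
    void setStatus(String status);
}

class Shipment {
    private final ShipmentState state;

    // The repository implementation creates a concrete state object
    // (for example one backed by a MongoDB document) and injects it here.
    Shipment(ShipmentState state) {
        this.state = state;
    }

    void markDelivered() {
        // Business rules only ever touch the state through the interface.
        state.setStatus("DELIVERED");
    }

    String id() { return state.getId(); }
}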
I actually use the same pattern in my Angular / TypeScript projects. My read-api calls return DTO objects that get state objects injected as well, and their properties map directly onto those state objects.
These DTOs, which end up as untyped JavaScript objects when they come from the API to the client (Angular) project, are in turn injected as state objects into TypeScript objects (again through the constructor) and mapped by getters and setters. It works very cleanly and is well maintainable. I have an example on my GitHub (niwra) account (Software-Management repositories), but can expand here if anyone is interested.
MongoDB allows for very clean and unit-testable repository implementations that return strongly typed aggregates. The only thing I haven't solved cleanly yet is telling MongoDB about state objects for child collections. Currently that is still pretty 'static', but I'm sure I'll find some nice solution.

You can store your domain objects as-is in document databases. Vaughn Vernon has posted an article about this, The Ideal Domain-Driven Design Aggregate Store?, featuring PostgreSQL's (then new) JSONB document-like storage.
Of course, you run the risk of having your aggregates polluted by BsonX attributes, which you probably do not want. You can avoid this by using convention-based configuration, but you will still need to think about serialisation, and this can have an effect on the level of encapsulation.
Another pattern here is to use a separate state object, which is then held as a property inside the aggregate root (or regular entity). I would not call it a "DTO", since this is clearly your aggregate state. You are not transferring anything. Methods inside your aggregate can mutate the state or, even better, the state would be an immutable value object and new state is produced when you need to change the state.
In that case persistence would only care about the state object. You still might be unhappy to have MongoDB attributes on the state object properties, and that is reasonable. Then you would need to have an identical structure inside the persistence mechanism, so you can map properties one-to-one.
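A minimal sketch of an aggregate holding an immutable state object (invented names):

// Immutable state value object; this is the only thing persistence cares about.
final class AccountState {
    final String id;
    final long balanceCents;

    AccountState(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }
}

class Account {
    private AccountState state;

    Account(AccountState state) {
        this.state = state;
    }

    void deposit(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        // Behaviour produces a new state value instead of mutating fields in place.
        this.state = new AccountState(state.id, state.balanceCents + amountCents);
    }

    // Only the repository needs this; the rest of the application talks to Account.
    AccountState state() { return state; }
}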

A current project needs us to persist domain objects in a NoSQL database such as MongoDB. In many examples (incl. Eric Evans, Vaughn Vernon) the domain objects are serialized and persisted to MongoDB directly.
I can confirm that MongoDB is a good choice for persisting DDD models. I use MongoDB as an event store in my current project. You can use MongoDB even if you are not using event sourcing, for example by using an ODM (Object Document Mapper): you have a document for each aggregate instance (this applies to any document-based database, not only MongoDB), and you store nested entities and value objects as nested documents.
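For illustration, storing one document per aggregate instance with nested value objects might look like this with the MongoDB Java driver (database, collection and field names are made up):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.List;

public class OrderStore {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders = client.getDatabase("shop").getCollection("orders");

            // One document per aggregate instance; entities and value objects
            // inside the aggregate become nested documents.
            Document order = new Document("_id", "order-42")
                    .append("status", "NEW")
                    .append("lines", List.of(
                            new Document("sku", "ABC").append("quantity", 2),
                            new Document("sku", "XYZ").append("quantity", 1)));

            orders.insertOne(order);
        }
    }
}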
We would like to avoid mixing the domain layer with persistence-related information by not having any annotations in our domain objects.
You can use XML mapping instead of annotations.
Also we are concerned about corrupting the persisted data by changing the domain object in the future.
For this you can use custom migration scripts. If you use Event sourcing then there are event versioning strategies.
We came to the conclusion that we need to have some kind of DTOs translating between the domain objects and the persisted data.
This is a bad conclusion.
If you use CQRS you won't need DTOs, because the read models are enough.

Related

Exposed domain model in Java microservice architecture

I'm aware that copying entity classes and properties into DTOs is considered an anti-pattern, so by the Exposed domain model pattern the same @Entity can be used both as the database entity class and as the DTO for the service and MVC layer. (See https://codereview.stackexchange.com/questions/93511/data-transfer-objects-vs-entities-in-java-rest-server-application)
But suppose we have a microservice architecture where the same set of properties is used as an entity in one project with persistence, and as a DTO in another project which uses the first one as a service. What's the proposed pattern in such a situation?
The second project doesn't need the @Entity-related functionality, and if we put that class in a shared library, it will be tied unnecessarily to JPA-specific APIs and libraries. And the alternative is to fall back to the separate-DTO-classes anti-pattern.
When your requirements for a DTO model exactly match your entity model you are either in a very early stage of the project or very lucky that you just have a simple model. If your model is very simple, then DTOs won't give you many immediate benefits.
At some point, the requirements for the DTO model and the entity model will diverge though. Imagine you add some audit aspects, statistics or denormalization to your entity/persistence model. That kind of data is usually never exposed via DTOs directly, so you will need to split the models. It is also often the case that the main driver for DTOs is the fact that you don't need all the data all the time. If you display objects in e.g. a dropdown you only need a label and the object id, so why would you load the whole entity state for such a use case?
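For the dropdown case, a JPQL constructor-expression projection is one way to load only the id and the label; Project, ProjectLabel and the package name below are made-up names for illustration:

// ProjectLabel.java (in package com.example): the DTO holds only what the dropdown needs.
public class ProjectLabel {
    private final Long id;
    private final String name;

    public ProjectLabel(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() { return id; }
    public String getName() { return name; }
}

// ProjectQueries.java
import jakarta.persistence.EntityManager; // javax.persistence on older stacks
import java.util.List;

public class ProjectQueries {
    private final EntityManager em;

    public ProjectQueries(EntityManager em) {
        this.em = em;
    }

    public List<ProjectLabel> dropdownLabels() {
        // Only id and name are read from the database; the rest of the entity state is never loaded.
        return em.createQuery(
                "select new com.example.ProjectLabel(p.id, p.name) from Project p order by p.name",
                ProjectLabel.class)
            .getResultList();
    }
}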
The fact that you have annotations on your DTO models shouldn't bother you that much; what is the alternative? An XML-like mapping? Manual object wiring?
If your model is used by third parties directly, you could use subclassing, i.e. keep the main model free of annotations and have annotated subclasses in your project that extend the main model.
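A rough sketch of the subclassing idea (hypothetical names; Spring Data MongoDB is used here only because its mapper also picks up inherited fields, whereas with plain annotation-based JPA the base class would additionally have to be registered as a mapped superclass, e.g. via an orm.xml entry):

// CustomerModel.java: shared, annotation-free model that third parties can depend on.
public class CustomerModel {
    protected String id;
    protected String name;

    public String getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// CustomerDocument.java: lives only in the persistence-aware project and carries the mapping metadata.
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "customers")
public class CustomerDocument extends CustomerModel {
    // Spring Data MongoDB maps the inherited fields (id, name) by convention,
    // so CustomerModel itself stays free of persistence annotations.
}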
Since implementing a DTO approach correctly takes some effort, I created Blaze-Persistence Entity Views, which will not only simplify the way you define DTOs, but will also improve the performance of your queries.
If you are interested, I even have an example for an external model that uses entity view subclasses to keep the main model clean.
Thank you for the answers, but the emphasis in the question is on microservice (MS) architecture and reusing entity POJOs defined in one MS as POJOs in another. From what I've read on microservices, this is closely related to another question: should MSs share any common functionality and classes at all, or be completely independent? It seems there is no definite agreement on it, and also no definite answer or widely accepted pattern for this.
From my recent experience here is what I adopted, and it works well so far.
Have common functionality across MSs - yes, in the form of a commons project added as a dependency to all MSs, with its dependencies set as optional. Share entity classes (expose them in commons) - no.
The main reason is that entity classes are closely tied to the data store of a particular MS. And since the established rule is that MSs shouldn't share data stores, it makes sense not to share the entity classes for those data stores either. It helps MSs stay more independent and gives them the freedom to manage their data stores in their own way. It means some more typing to add additional DTO classes and conversions between them, but it's a trade-off worth taking to retain MS independence. The reasons Christian Beikov and Maksim Gumerov mentioned apply as well.
What we do share (put in commons) is some common functionality and helper classes (for cloud, discovery, error handling, REST and JSON configuration...), and pure DTOs, where the T stands for transfer between MSs (REST entities or message payloads).
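For illustration, the split can look roughly like this (hypothetical classes; the JPA package may be javax.persistence or jakarta.persistence depending on the stack):

// In the commons module: a pure transfer object shared between MSs.
public class ProductDto {
    public String productId;
    public String displayName;
}

// Inside the owning microservice only: the entity never leaves this service.
import jakarta.persistence.Entity;
import jakarta.persistence.Id;

@Entity
public class ProductEntity {
    @Id
    private Long id;
    private String name;

    // Conversion to/from ProductDto happens at the service boundary.
}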

Using ViewModels instead of DTOs as the result of a CQRS query

Reading an SO question, I realized that my Read Services could provide smarter objects such as ViewModels instead of plain DTOs. This makes me reconsider what information should be provided by the objects returned by the Read Services.
Before, using just DTOs, my Read Service just made a flat mapping of a database query into a hash-like structure with minimal normalization and no behavior.
However, I tend to think of a ViewModel as something "smarter" that can carry generated information not provided by the database, like a status icon, calculated values, reformatted values, default values, etc.
I am starting to see that the construction of some ViewModel objects might get more complicated and have potential downsides if I make my generic ReadServiceInterface return ViewModels only:
(1) Should I plan some design restriction for the ViewModels returned by my CQRS? Like making sure that their construction is almost as fast as a plain DTO?
(2) DTOs by nature are easily serialized and ready to be sent to an external system in a SOA architecture or embedded into a message. Does this mean that using ViewModels will have a negative impact on my architecture?
(3) Which type of ViewModels should I keep outside my Read Services?
(4) Should I expect all ViewModels to be retrieved from Read Services?
In the past I implemented some ViewModels that needed more than one query. In CQRS, I suppose, that is a design smell, since everything they provide should come from a single query.
I am starting a new project, where I thought that any query would return either aggregate objects or DTOs. But now ViewModels come into play, and I am wondering:
(5) Should I plan that queries within my architecture will yield two type of objects (ViewModels+Aggregates) or three (+DTO)?
View Models (VMs) serve a single master: the View. We usually consider the VM a pretty dumb object, so in this regard there's no technical difference between a VM and a DTO; only their purpose and semantics are different.
How you build a VM is an implementation detail. Some VMs are pre-generated and stored in a VM repository. Others are built in real time by a service (or a query handler), either by querying the db directly or by querying other repos/services and then assembling the results. There's no right or wrong and no rules about how to do it. It comes down to preference.
In CQRS the important part is the separation of commands from queries, i.e. having more than one model. There's no rule about how many queries you should do or whether you should return a view model or a DTO. As long as you have at least one read model dedicated to queries, it's CQRS.
Don't let technicalities complicate your design. Proper design is more about high level structure and not low level implementation. Use CQRS because having a read model simplifies your app, not for other reasons. Aim for simplification and clean code, not for rigid rules that dictate a 'how to' recipe.
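As a concrete (made-up) example of the "built in real time by a query handler" case mentioned above, a read-side handler might look like this:

// Hypothetical read-side DAO; any query mechanism (SQL, MongoDB, a projection) works here.
interface InvoiceReadDao {
    Row findSummary(String invoiceId);

    class Row {
        String id;
        String customerName;
        double total;
        String currency;
        boolean paid;
    }
}

// The VM carries presentation-ready data, including values not stored in the database.
class InvoiceSummaryViewModel {
    String invoiceId;
    String customerName;
    String formattedTotal; // reformatted value
    String statusIcon;     // purely presentational information
}

class InvoiceSummaryQueryHandler {
    private final InvoiceReadDao dao;

    InvoiceSummaryQueryHandler(InvoiceReadDao dao) {
        this.dao = dao;
    }

    InvoiceSummaryViewModel handle(String invoiceId) {
        InvoiceReadDao.Row row = dao.findSummary(invoiceId); // a single read-model query
        InvoiceSummaryViewModel vm = new InvoiceSummaryViewModel();
        vm.invoiceId = row.id;
        vm.customerName = row.customerName;
        vm.formattedTotal = String.format("%,.2f %s", row.total, row.currency);
        vm.statusIcon = row.paid ? "icon-paid" : "icon-open";
        return vm;
    }
}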

Persistence ignorance and DDD reality

I'm trying to implement fully valid persistence ignorance with little effort. I have many questions though:
The simplest option
It's really straightforward - is it okay to have Entities annotated with Spring Data annotations just like in SOA (but make them really do the logic)? What are the consequences, other than having to use persistence annotations in the Entities, which doesn't really follow the PI principle? I mean, is it really the case with Spring Data - it provides nice repositories which do what repositories in DDD should do. The problem is with the Entities themselves then...
The harder option
In order to make an Entity unaware of where the data it operates on came from, it is natural to inject that data as an interface through the constructor. Another advantage is that we can always perform lazy loading - which we have by default in the Neo4j graph database, for instance. The drawback is that Aggregates (which are composed of Entities) will be totally aware of all the data even if they don't use it - possibly this could lead to debugging difficulties, as the data is totally exposed (DAOs would be hierarchical just like Aggregates). This would also force us to use adapters for the repositories, as they don't store real Entities anymore... And any translation is ugly... Another thing is that we cannot instantiate an Entity without such a DAO - though there could be in-memory implementations in the domain... again, more layers. Some say that injecting DAOs breaks PI too.
The hardest option
The Entity could be wrapped in a lazy-loader which decides where the data should come from. It could be both in-memory and in-database, and it could handle any operations which need transactions and so on. A complex layer, though, but it might be generic to some extent perhaps...? Have a read about it here.
Do you know any other solution? Or maybe I'm missing something in mentioned ones. Please share your thoughts!
I achieve persistence ignorance (almost) for free, as a side effect of proper domain modeling.
In particular:
if you correctly define each context's boundary, you will obtain small entities without any need for lazy loading (which actually becomes an anti-pattern/code smell in a DDD project)
if you can't simply use SQL in your repository, map a set of DTOs to your DB schema and use them in factories to initialize the entity classes (see the sketch after this list)
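A minimal sketch of that DTO-plus-factory idea (all names are invented for illustration):

enum PaymentStatus { NEW, CAPTURED, REFUNDED }

// Domain entity: expresses invariants only, knows nothing about the schema.
class Payment {
    private final String id;
    private PaymentStatus status;
    private final long amountCents;

    Payment(String id, PaymentStatus status, long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        this.id = id;
        this.status = status;
        this.amountCents = amountCents;
    }
}

// DTO mirroring the database schema (hypothetical columns).
class PaymentRecord {
    String id;
    String status;
    long amountCents;
}

// Factory used by the repository to rebuild the entity from the stored shape.
class PaymentFactory {
    Payment fromRecord(PaymentRecord record) {
        return new Payment(record.id, PaymentStatus.valueOf(record.status), record.amountCents);
    }
}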
In DDD projects, persistence ignorance is relevant for the domain model itself, not for repositories, factories and other applicative code. Indeed you are very unlikely to change the ORM and/or the DB in the future.
The only (but very strong) rationale behind persistence ignorance of the domain model is separation of concerns: in the domain model you should express business invariants only! Persistence is an infrastructural concern!
For example, without persistence ignorance (and with lazy loading) the domain model has to handle possible exceptions from the db, its complexity grows, and business rules are buried under technological details.
Personally I find it near impossible to achieve a clean domain model when trying to use the same entities as the ORM.
My solution is to model my domain entities as I see fit and ensure that any ORM entities don't leak outside of the repositories. This means that my repositories accept and return domain entities.
This means you lose "most of your ORM goodness" and end up "using your ORM for simple CRUD operations".
Both of these trade-offs are fine for me, I would rather have a clean domain model that I can use, rather than one polluted with artefacts from my DB or ORM. It also cuts down the amount of time I spend "wrestling with my ORM" to zero.
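A sketch of what "ORM entities don't leak outside of the repositories" can look like with JPA (the names and the jakarta/javax package choice are assumptions):

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

// Domain entity: no ORM annotations.
class Customer {
    private final Long id;
    private final String name;

    Customer(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    Long id() { return id; }
    String name() { return name; }
}

// ORM-side entity: never leaves the repository implementation.
@Entity
@Table(name = "customers")
class CustomerJpaEntity {
    @Id Long id;
    String name;
}

// The repository interface is expressed purely in domain terms.
interface CustomerRepository {
    Customer byId(long id);
    void save(Customer customer);
}

class JpaCustomerRepository implements CustomerRepository {
    private final EntityManager em;

    JpaCustomerRepository(EntityManager em) {
        this.em = em;
    }

    @Override
    public Customer byId(long id) {
        CustomerJpaEntity row = em.find(CustomerJpaEntity.class, id);
        return new Customer(row.id, row.name); // converted at the boundary
    }

    @Override
    public void save(Customer customer) {
        CustomerJpaEntity row = new CustomerJpaEntity();
        row.id = customer.id();
        row.name = customer.name();
        em.merge(row); // the domain entity itself is never tracked by the ORM
    }
}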
As a side-note, I find document databases a much better fit for DDD.
Once you put persistence mapping into your domain model:
your code depends on the framework. If you decide to change that framework, you have to change both the persistence layer and the model layer source code - more work, more changes, more merging of code, etc.
your domain model jar file depends on Spring/NHibernate jars, etc.
your classes become larger and larger as the business code and the persistence-related code grow
I have to admit that I don't understand the harder and hardest options.
We used separate interfaces and implementations for domain entities, and provided separate mapping files using Hibernate along with the repositories.
Entities are created using a factory (or later a repository); the identifier is generated within the persistence layer, as the entity does not need it until it is persisted.
Lazy loading is provided by a special implementation of List once:
mapping of an entity contains it
entity/aggregate is fetched from persistence layer
The only issue is related to transactions: when you use a lazy-loaded collection outside of the transaction scope, it fails.
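The lazy-loading List mentioned above could be sketched generically like this (an illustration, not the actual implementation from that project):

import java.util.AbstractList;
import java.util.List;
import java.util.function.Supplier;

// Wraps a loader that fetches the real collection from the persistence layer on first access.
// If that first access happens outside the transaction scope, the loader fails,
// which is exactly the issue described above.
class LazyList<T> extends AbstractList<T> {
    private final Supplier<List<T>> loader;
    private List<T> loaded;

    LazyList(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    private List<T> delegate() {
        if (loaded == null) {
            loaded = loader.get(); // hits the database on first use
        }
        return loaded;
    }

    @Override
    public T get(int index) {
        return delegate().get(index);
    }

    @Override
    public int size() {
        return delegate().size();
    }
}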
I would follow the simplest option unless I ran into a stone wall. There are also pitfalls such as this one when you adopt the PI principle.
Sometimes some compromises are acceptable.
public class Order {
    private String status; // my ORM does not support enum

    public Status status() {
        return Status.of(this.status);
    }

    public boolean is(Status status) {
        return status() == status; // use status() instead of getStatus() in the domain model
    }
}

Need some advice concerning MVVM + Lightweight objects + EF

We are developing a back-office application with a quite large DB.
It's not reasonable to load everything from the DB into memory, so when a model's properties are requested we read from the DB (via EF).
But many of our UIs are just simple lists of entities with some (!) properties presented to the user.
For example, we just want to show Id, Title and Name.
And later, when the user selects an item and wants to perform some actions, the whole object is needed. Now we have a list of items stored in memory.
Some properties contain large texts, images or other data.
EF works with entities and reading a bunch of large objects degrades performance notably.
As far as I understand, the problem can be solved by creating lightweight entities and using them in appropriate context.
First.
I'm afraid that each view will make us create a new LightweightEntity and we will eventually end up with a bloated object context.
Second. As the Model wraps EF we need to provide methods for various entities.
Third. ViewModels communicate and pass entities to each other.
So I'm stuck with all these considerations and need good architectural design advice.
Any ideas?
For images and large texts you may consider table splitting, which is commonly used to split a table into a lightweight entity and a "heavy" entity.
But I think what you call lightweight "entities" are data transfer objects (DTO's). These are not supplied by the context (so it won't get bloated) but by projection from entities, which is done in a repository or service.
For projection you can use AutoMapper, especially its newer feature that I describe here. This allows you to reduce the number of methods you need to provide "for various entities" (DTO's), because the type to project to can be given in a generic type parameter.

Entity Framework as Repository and UnitOfWork?

I'm starting a new project and have decided to try to incorporate DDD patterns and also include Linq to Entities. When I look at the EF's ObjectContext it seems to be performing the functions of both Repository and Unit of Work patterns:
Repository in the sense that the underlying data level interface is abstracted from the entity representation and I can request and save data through the ObjectContext.
Unit Of Work in the sense that I can write all my inserts/updates to the objectContext and execute them all in one shot when I do a SaveChanges().
It seems redundant to put another layer of these patterns on top of the EF ObjectContext? It also seems that the Model classes can be incorporated directly on top of the EF generated entities using 'partial class'.
I'm new at DDD so please let me know if I'm missing something here.
I don't think that the Entity Framework is a good implementation of Repository, because:
The object context is insufficiently abstract to do good unit testing of things which reference it, since it is bound to the DB access. Having an IRepository reference instead works much better for creating unit tests.
When a client has access to the ObjectContext, the client can do pretty much anything it cares to. The only real control you have over this at all is to make certain types or properties private. It is hard to implement good data security this way.
On a non-trivial model, the ObjectContext is insufficiently abstract. You may, for example, have both tables and stored procedures mapped to the same entity type. You don't really want the client to have to distinguish between the two mappings.
On a related note, it is difficult to write comprehensive and well-enforced business rules and entity code. Indeed, whether or not this is even a good idea is debatable.
On the other hand, once you have an ObjectContext, implementing the Repository pattern is trivial. Indeed, for cases that are not particularly complex, the Repository is something of a wrapper around the ObjectContext and the Entity types.
I would say that you should look at the ObjectContext as your UnitOfWork, and not as a repository.
An ObjectContext cannot be a repository, imho, since it is "too generic".
You should create your own Repositories, which have specialized methods (like GetCustomersWithGoldStatus for instance) next to the regular CRUD methods.
So, what I would do, is create repositories (one for each aggregate-root), and let those repositories use the ObjectContext.
I like to have a repository layer for the following reasons:
EF gotchas
When you look at some of the current tutorials on EF (Code First version), it is apparent that there are a number of gotchas to be handled, particularly around object graphs (entities containing entities) and disconnected scenarios. I think a repository layer is great for wrapping these up in one place.
A clear picture of data access mechanisms
A repository gives a specific picture of how the BL is accessing and updating the data store. It exposes methods that have a clear single purpose and can be tested independently of the BL. A standard example from the textbooks: Find() to find a single entity. A more application-specific example: Clear() to clear down a db table.
A place for optimizations
Inevitably you come up against performance hits when using vanilla EF. I use the repository to hide the optimization mechanisms from the BL.
Examples,
GetKeys() to project cached keys from the tables (for Insert/Update decisions). Reading only the keys is faster and uses less memory than reading the full entity.
Bulk load via SqlBulkCopy. EF will insert by individual SQL statements. If you want a single statement to insert multiple rows, SqlBulkCopy is a good mechanism. The repository encapsulates this and provides metadata for SqlBulkCopy. As well as the Insert method, you need a StartBatch() and EndBatch() method, which is also an argument for a UnitOfWork layer.