Alternative datastore per request type on sails.js

In a sails.js project, I would like to use a different database endpoint for READ operations (using AWS RDS read replica) than the default datastore that I will keep using for WRITE operations.
As explained here, it is possible in sails.js to set datastore on a per-model basis, but what about setting an alternative datastore on a per-request basis or directly for all read operations?

One way to accomplish this in a Sails-y fashion would be to have "read only" models. This does a couple of things: first, it makes it very clear which datastore you are working with in your controllers if you have, say, a User model and a UserRead model. It does mean doubling your models (a small price to pay in memory costs), but it also means Sails can easily manage your read-only and write databases; you just have to be conscientious about using the proper model in the proper context.
To keep things really light in your duplicate "read only" model (and so you don't have to change 2 models every time something changes), you could just extend your original model, and simply change the datastore, something like this should work:
UserRead.js
// Reuse the User model definition, overriding only the datastore.
// 'defaultRead' must be defined in config/datastores.js.
module.exports = _.merge({}, require('./User'), { datastore: 'defaultRead' });
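A controller action could then read from the replica-backed model while still writing through the default datastore. A rough sketch, assuming Sails 1.x / Waterline and a 'defaultRead' entry in config/datastores.js that points at the read replica:
api/controllers/UserController.js
module.exports = {
  // Queries go to the read replica via the read-only model...
  find: async function (req, res) {
    const users = await UserRead.find({ limit: 30 });
    return res.json(users);
  },
  // ...while writes keep using the default (primary) datastore.
  create: async function (req, res) {
    const user = await User.create(req.body).fetch();
    return res.json(user);
  }
};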

Related

DDD - How to store aggregates in NoSql databases

A current project needs us to persist domain objects in a NoSQL database such as mongoDB.
In many examples (incl. Eric Evans, Vaughn Vernon) the domain objects are serialized and persisted to the mongoDB directly.
We would like to avoid mixing the domain layer with persistence-related information by not having any annotations in our domain objects.
Also we are concerned about corrupting the persisted data by changing the domain object in the future.
We came to the conclusion that we need to have some kind of DTOs translating between the domain objects and the persisted data.
Did anyone of you come across a good solution for such a case?
Yes. Your domain models should be ignorant of persistence. So you need a DTO, or what I call data models (apart from the domain models and view models). Your data models will be mapped from the domain models before persisting to the database. This mapping is pretty common in insert and update operations. For read-only operations (reporting, etc.) you can bypass the mapping from data models to domain models. That will prevent loading the whole object graph of your domain models. This is widely applied in CQRS architecture patterns where read and write commands are separated.
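A minimal sketch of that boundary in TypeScript (the names here are purely illustrative, not from the question):
// Domain model: persistence-ignorant
class Order {
  constructor(public readonly id: string, private lines: OrderLine[]) {}
  addLine(line: OrderLine): void { this.lines.push(line); }
  get orderLines(): readonly OrderLine[] { return this.lines; }
}
interface OrderLine { sku: string; quantity: number; }

// Data model: shaped for the database and free to carry persistence concerns
interface OrderDocument { _id: string; lines: { sku: string; qty: number }[]; }

// The mapping lives at the repository boundary, not in the domain model
function toDocument(order: Order): OrderDocument {
  return { _id: order.id, lines: order.orderLines.map(l => ({ sku: l.sku, qty: l.quantity })) };
}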
Like you, I want business objects to have no dependency on any kind of specific repository. I solved it like this: have your business object define its state objects and repository functions as interfaces. Your repository implementation can create an actual state object and inject it into your business object through the constructor.
There are a lot of advantages to this approach (such as having business objects for specific purposes), but you easily achieve complete (two-way) independence of your repository this way. Martin Fowler also hinted at this approach elsewhere.
I actually use the same pattern in my Angular / TypeScript projects. My read-api calls return DTO objects that also get state objects injected, and their properties map directly onto the state objects.
These DTOs, which end up as untyped JavaScript objects when they come from the API to the client (Angular) project, are in turn injected as state objects into TypeScript objects, again through the constructor, and mapped by getters and setters. It works very cleanly and is very maintainable. I have an example on my GitHub (niwra) account (Software-Management repositories), but can expand here if anyone is interested.
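A rough sketch of that constructor-injection idea in TypeScript (the names are made up for illustration):
// The persisted state is a plain, serializable object...
interface CustomerState { id: string; name: string; creditLimit: number; }

// ...and the business object wraps it, exposing behaviour through getters and methods.
class Customer {
  constructor(private state: CustomerState) {}
  get name(): string { return this.state.name; }
  raiseCreditLimit(amount: number): void {
    if (amount <= 0) throw new Error('Amount must be positive');
    this.state.creditLimit += amount;
  }
}

// The repository (or an Angular service receiving untyped JSON) creates the state
// object and injects it; the business object never touches the datastore itself.
const customer = new Customer({ id: '42', name: 'Acme', creditLimit: 1000 });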
MongoDB allows for very clean and unit-testable repository implementations that return strongly typed aggregates. The only thing I haven't solved cleanly yet is telling MongoDB about state objects for child collections. Currently that is still pretty 'static', but I'm sure I'll find some nice solution.
You can store your domain objects as-is in document databases. Vaughn Vernon has posted an article, The Ideal Domain-Driven Design Aggregate Store?, about this, featuring PostgreSQL's then-new JSONB document-like storage.
Of course, you run the risk of having your aggregates polluted by BsonX attributes, which you probably do not want. You can avoid this by using convention-based configuration, but you will still need to think about serialisation, and this can have an effect on the level of encapsulation.
Another pattern here is to use a separate state object, which is then held as a property inside the aggregate root (or regular entity). I would not call it a "DTO", since this is clearly your aggregate state. You are not transferring anything. Methods inside your aggregate can mutate the state or, even better, the state can be an immutable value object and a new state is produced whenever it needs to change.
In such a case persistence only cares about the state object. You might still be unhappy to have MongoDB attributes on the state object properties, and this is reasonable. Then you would need an identical structure inside the persistence mechanism, so you can map properties one-to-one.
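A small TypeScript sketch of that state-object variant, with the state as an immutable value (illustrative names only):
// Immutable state value held inside the aggregate root
interface ShipmentState { readonly address: string; readonly dispatched: boolean; }

class Shipment {
  constructor(private state: ShipmentState) {}
  dispatch(): void {
    // Changing state produces a new value instead of editing fields in place
    this.state = { ...this.state, dispatched: true };
  }
  // Persistence only ever deals with the state object, never the aggregate itself
  snapshot(): ShipmentState { return this.state; }
}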
A current project needs us to persist domain objects in a NoSQL database such as mongoDB. In many examples (incl. Eric Evans, Vaughn Vernon) the domain objects are serialized and persisted to the mongoDB directly.
I can confirm that MongoDB is a good choice for persisting DDD models. I use MongoDB as an Event store in my current project. You can use MongoDB even if you are not using Event sourcing, for example with an ODM (Object Document Mapper): you have a document for each Aggregate instance (this applies to any document-based database, not only MongoDB) and you store nested entities and value objects as nested documents.
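For example, an Order aggregate could be stored as a single document with its value objects and child entities nested inside it (a sketch with invented field names):
{
  "_id": "order-1001",
  "status": "placed",
  "shippingAddress": { "street": "1 Main St", "city": "Springfield" },
  "lines": [
    { "sku": "ABC-1", "quantity": 2 },
    { "sku": "XYZ-9", "quantity": 1 }
  ]
}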
We would like to avoid mixing the domain layer with persistence related information by not having any annotations in our domain objects.
You can use XML mapping.
Also we are concerned about corrupting the persisted data by changing the domain object in the future.
For this you can use custom migration scripts. If you use Event sourcing then there are event versioning strategies.
We came to the conclusion that we need to have some kind of DTOs translating between the domain objects and the persisted data.
This is a bad conclusion.
If you use CQRS you won't need DTOs because the read models are enough.

Using ViewModels instead of DTOs as the result of a CQRS query

Reading an SO question, I realized that my Read services could provide smarter objects like ViewModels instead of plain DTOs. This makes me reconsider what information should be provided by the objects returned by the Read Services.
Before, using just DTOs, my Read Service simply made a flat mapping of a database query into a hash-like structure with minimal normalization and no behavior.
However, I tend to think of a ViewModel as something "smarter" that can carry generated information not provided by the database, like status icons, calculated values, reformatted values, default values, etc.
I am starting to see that the construction of some ViewModel objects might get more complicated, and that there are potential downsides if I make my generic ReadServiceInterface return ViewModels only:
(1) Should I plan some design restriction for the ViewModels returned by my CQRS? Like making sure that their construction is almost as fast as a plain DTO?
(2) DTOs by nature are easily serialized and ready to be sent to an external system in a SOA architecture or embedded into a message. Does this mean that using ViewModels will have a negative impact on my architecture?
(3) Which type of ViewModels should I keep outside my Read Services?
(4) Should I expect all ViewModels to be retrieved from Read Services?
In the past I implemented some ViewModels that needed more than one query. In CQRS, I suppose, that is a design smell, since everything they provide should come from a single query.
I am starting a new project, where I had assumed that any query would return either aggregate objects or DTOs. But now that ViewModels come into play, I am wondering:
(5) Should I plan for queries within my architecture to yield two types of objects (ViewModels + Aggregates) or three (+ DTOs)?
View Models (VMs) serve a single master: the View. We usually consider the VM a pretty dumb object, so in this regard there's no technical difference between a VM and a DTO; only their purpose and semantics are different.
How you build a VM is an implementation detail. Some VMs are pre-generated and stored in a VM repository. Others are built in real time by a service (or a query handler), either by querying the db directly or by querying other repos/services and then assembling the results. There's no right or wrong and no rules about how to do it. It comes down to preference.
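As a rough illustration, a query handler building a VM in real time might look like this TypeScript sketch (the shapes and the db dependency are assumptions, not a prescribed API):
interface OrderSummaryVM {
  orderId: string;
  total: string;       // already formatted for display
  statusIcon: string;  // derived information, not stored in the database
}

// The handler queries the read side and assembles the VM on the fly
async function handleGetOrderSummary(
  orderId: string,
  db: { findOrder(id: string): Promise<{ id: string; totalCents: number; status: string }> }
): Promise<OrderSummaryVM> {
  const row = await db.findOrder(orderId);
  return {
    orderId: row.id,
    total: (row.totalCents / 100).toFixed(2),
    statusIcon: row.status === 'shipped' ? 'truck' : 'clock',
  };
}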
In CQRS the important part is the separation of commands from queries, i.e. more than one model. There's no rule about how many queries you should run or whether you should return a view model or a DTO. As long as you have at least one read model dedicated to queries, it's CQRS.
Don't let technicalities complicate your design. Proper design is more about high level structure and not low level implementation. Use CQRS because having a read model simplifies your app, not for other reasons. Aim for simplification and clean code, not for rigid rules that dictate a 'how to' recipe.

What are the benefits of ORM lazy loading?

I'm researching data layer underpinnings for a new web-based reporting system and have spent a lot of time evaluating ORMs over the last few days. That said, I've never dealt with "lazy loading" before and am confused as to why it's the default setting for LINQ queries in the Entity Framework. It seems like it creates a lot of network traffic and unnecessarily tasks the database with additional queries that could otherwise be resolved with joins.
Can someone describe a scenario in which lazy loading would be beneficial?
Some meta:
The new system will be working against a database with hundreds of tables and many terabytes of data in a production environment with over 3,000 concurrent users on the system 24 hours a day. They will be retrieving large datasets continuously. Is it possible that an ORM just isn't the right solution for our needs, especially since the app will be web-based?
When we talk about lazy loading we are talking about Navigation Properties (how we follow foreign keys). What lazy loading will do for us is populate the related entity from a remote table only when we attempt to access it. For example, if we have a model like this:
public class TestEntity
{
    public int Id { get; set; }

    // Navigation property; EF lazy loading requires it to be virtual so a proxy can override it
    public virtual AnotherEntity RemoteEntity { get; set; }
}
And call the following
var something = WhateverContext.TestEntities.First().RemoteEntity;
We will get 2 database calls, one for WhateverContext.TestEntities.First() and one for loading the remote entity.
I'm a web guy (and more specifically an MVC guy), and for web stuff I don't think there is ever a good reason for wanting to do this. One database call is always going to be quicker than two if we require the same set of data.
The situation where I think lazy loading is actually worth considering is when you don't know, at the time of your first query, whether you will need the second entity at all. In my opinion this is much more relevant for Windows applications where we have a user performing actions in real time (rather than stateless MVC where users are requesting whole pages at once). For example, I think lazy loading shines when we have a list of data with a details link, where we don't load the details until the user decides they want to see them.
I don't feel this extends to paging, sorting and filtering, IMO there should be one specifically crafted database query per page of data you are displaying, which returns exactly the data set required to display that page.
In terms of your performance question, I feel that EF (or another ORM) can probably meet your needs here but you want to be careful with how you are retrieving large datasets due to the way EF tracks entities. Check out my EF performance tuning cheat sheet, and read up on DetectChanges and AsNoTracking if you do decide to use EF with large queries.
Most ORMs will give you the option, when you're building up your object selections, to say "don't be lazy, go ahead and join", so if you're worried about it from an efficiency perspective, don't be. You can make it work (usually).
There are 2 particular cases I know of where lazy loading helps:
Chaining commands
What if you want to create a basic select, but then run it through a sort and a filter function based on user input? You can simply pass the ORM object in and attach the sorting and filtering functionality to it. Instead of being evaluated at each step, the query is only evaluated when it's actually used.
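Conceptually it looks something like this TypeScript sketch with a hypothetical query-builder interface (not any real ORM's API):
// Hypothetical deferred query builder, purely to illustrate the idea
interface Query<T> {
  where(pred: (item: T) => boolean): Query<T>;
  orderBy(key: (item: T) => unknown): Query<T>;
  take(n: number): Query<T>;
  toArray(): Promise<T[]>; // only this call actually hits the database
}

interface Order { customerId: string; createdAt: Date; }

async function getOrdersPage(orders: Query<Order>, customerId: string, sortByDate: boolean): Promise<Order[]> {
  // Composing the query executes nothing yet...
  let query = orders.where(o => o.customerId === customerId);
  if (sortByDate) {
    query = query.orderBy(o => o.createdAt);
  }
  // ...SQL is generated and run only when the results are materialized
  return query.take(25).toArray();
}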
Avoiding huge, deep, highly-relational queries
What if you just need the IDs of some related fields? If it loads lazily, you don't have to worry about it joining a whole bunch of data and tables that you don't need, potentially slowing down the query and overusing bandwidth. Of course, if you DID want everything else, then you'll need to be explicit, or you may run into a problem where it lazily runs a query for each detail record. Like I mentioned at the outset, that's easily overcome in any ORM worth using.
A simple case is a result set of N records which you do not want to bring to the client all at once. The benefit is that you are able to lazily load only what is needed for the client's demands, such as sorting, filtering, etc. An example would be a paging view where one could page through records and sort them accordingly, so the client only needs N records at a given time.
When you perform the LINQ query it translates that to SQL commands on the server side to provide only what is needed in the given context. It boils down to offloading work to the database and minimizing what you need to send back to the client.
Some will argue that ORM-based lazy loading is wrong; however, that moves into semantics fairly quickly and should be more about the approach to design than about what is right and wrong.

How to expose read model from shared module

I am working on developing a set of assemblies that encapsulate parts of our domain that will be shared by many applications. Using the example of an order management system, one such assembly will contain all of the core operations an application can perform to/with an order. We are applying a simple version of CQS/CQRS so that all operations that change the state of the "system" are represented as public commands, such as CancelOrderCommand, ShipOrderCommand and CreateOrderCommand. The command handlers are internal to the assembly.
The question I am struggling to answer is how to best expose the read model to consuming code?
The read model will be used by consuming code to perform queries. I don't know all of the ways the read model will be used, so the interface needs to be flexible enough to allow any query.
What complicates it for me is that I not only need to expose my aggregate root but there are also several "lookup" lists of related data that client applications may use. For example, each order has an associated OrderType which is data-driven (i.e., not an enum) and contains several properties that will drive some of our business rules that control what operations can/cannot be performed, etc. It is easy inside my module to manage this relationship; however, a client application that allows order creation will most likely need to display the list of possible OrderTypes to the user. As a result, I need to not only expose the list of Order aggregates but the supporting list of OrderTypes (and other lookup lists) from my read model.
How is this typically done?
I'm not sure what else to explain that will help trigger a solution, so please ask away...
I have never seen a CQRS-based implementation expose a full dataset for ad-hoc querying, so this is an interesting situation! In a typical CQRS scenario you would expose very specific queries because you may want to raise events when they are called (for caching, for example - see this post for more details on that).
However, since this is your design, let's not worry about "typical" or "correct" CQRS; I guess you just need a solution! One of the best new mechanisms for exposing data for flexible querying that I have seen is the Open Data Protocol (OData). It will allow consumers to implement their own filtering, sorting and paging over a data source you expose.
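For instance, with an OData endpoint a consumer could express its own filtering, sorting and paging directly in the request; the resource and field names below are made up, but the query options ($filter, $orderby, $top, $skip, $select) are standard OData:
GET /odata/Orders?$filter=OrderTypeId eq 3&$orderby=CreatedDate desc&$top=20&$skip=40
GET /odata/OrderTypes?$select=Id,Name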
Most implementations of this seem to deal with relational data. If you are dealing with a relational data source then OData might be a nice way to go. I suspect by your comment of "expose my aggregate root" that you might be using a document database? If so, there is one example I have seen of OData services on top of MongoDB: http://bloggingabout.net/blogs/vagif/archive/2012/10/11/mongodb-odata-provider-now-supports-arrays-and-nested-collections.aspx.
I hope that helps, OData is definitely worth looking into. It seems to be growing really quickly and is getting good support on both server and client technology platforms.

Is Core Data implementing the data mapper pattern?

I know that Core Data should not be considered an ORM, but it still offers functionality similar to one. Just curious, is it implementing the data mapper pattern? I know "The Data Mapper is a layer of software that separates the in-memory objects from the database. Its responsibility is to transfer data between the two and also to isolate them from each other." (Martin Fowler). IMHO the context handles all the SQL work in a single transaction, so it is a performance-wise design, and IMHO Core Data might be considered an implementation of the data mapper pattern.
One year later, I will contribute my two cents.
I am not an ORM expert and only recently started something using a Data Mapper, but as a long-time Core Data user I can say that it does not. The main objective of this pattern is a clean separation of the domain object from all database-related operations.
Once I start writing unit tests, the first thing I notice is that I must load a database, even if it is just some in-memory store, but I must load one nonetheless. Also, there are no mappers for each class, so I have no control over how each relation is stored.
Core Data loads lots of meta information about your object graph and forces some structure to them. Although you can change the persistent store and bake something of your own, you will have lots of restrictions about how to do it, with a clear "relational" feeling to it.
The idea is good; we might say Core Data is some variation of it. Something that I do love is that the save operation is done by the context, not the object itself, so there is some type of separation.
However, look at functions like "awakeFromFetch" or "didSave": both operations are related to the data store, not to a plain domain object. A proper Data Mapper pattern would allow you to define those operations for each persistent store, rather than unifying them in a single object.
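By contrast, a data mapper keeps the domain object free of any store-specific hooks; here is a rough TypeScript sketch of the shape of it (illustrative only):
// Plain domain object: no knowledge of any persistent store
class Invoice {
  constructor(public readonly id: string, public amount: number) {}
}

// One mapper per store; supporting another store means writing another mapper,
// not changing Invoice itself
interface InvoiceMapper {
  find(id: string): Promise<Invoice>;
  save(invoice: Invoice): Promise<void>;
}

class SqlInvoiceMapper implements InvoiceMapper {
  async find(id: string): Promise<Invoice> {
    // ...run SQL here, then build the domain object from the row
    return new Invoice(id, 0);
  }
  async save(invoice: Invoice): Promise<void> {
    // ...translate the domain object back into an INSERT/UPDATE here
  }
}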
UPDATE:
Funnily enough, one day after my answer I had to deal with an old Core Data based project and had to come back to improve this answer. To make things clear, I do consider that "seems like a pattern" is not enough. For example, the implementations of the facade and adapter patterns are quite similar, but you name them differently depending on how you use them.
Is Core Data implementing data mapper?
I must say that my "not quite" should have been "definitely not!"
I have just been very angry because I needed to rename some fields and later add new ones. Although I know quite well how automatic migrations work with Core Data, I had forgotten how annoying they are.
How many times do you need to add some new field, rename something, or experiment until you get it right... and every single tiny change requires a full-blown database migration? With Data Mappers this never happens because domain objects are fully decoupled. You only touch the database to catch up with the domain objects after you finish some new feature. Core Data forces you to bind every single detail of your domain objects at every single moment.
Boy, how sweet life was until I forgot that "tiny" annoyance of Core Data being the exact opposite of what you can achieve with data mappers.