Neo4j (or any other graph database) modeling - nosql

I'm starting to work with graph databases, and in my team we've started modeling a graph for our software. The problem comes when we try to "document" the model, to see the structure of our database. With SQL databases you only have to look at the SQL schema.
We've spent some time reading neo4j blogs and documentation, but we've seen that the usual way to show how a graph works is with a minimal graph showing some sample data (Random samples: sample1, sample2, etc). That's great for educational purposes, but we'd love to be able to do it in a little more formal way. We'd like to set what kind of node can relate with another one, and with what kind of relationship, that kind of stuff.
Using Spring you can wrap the graph with classes, but it's very specific to Java and OO model, and we're working with Erlang. We're looking for some kind of formal language (SQL Schema equivalent), or a E-R model equivalent, or something like that.

One way to do this is to put the "meta-model" of your graph (a type network) in the graph as well and then connect the instances (nodes) to their meta-model-type. So you can visualize the meta-model using the graph visualization and at the same time use the meta-model to enforce additional constraints (by storing constraint information in the meta-model and using that when the actual model is updated) and also use the type-nodes of the meta-model to quickly access all "instance"-nodes of this type.
What is the domain you want to model?

A quick idea - could you use a subset of UML? Graph modeling seems to be closer to the domain, so maybe that's reasonable.
What we do is a generalization of the "example data" approach, where we include cardinality on each side of a relationship, as well as type and direction. I also often include a node "type" in the diagram (or some other specification of it's role/relation to domain models) instead of example data, and of course note the expected properties, their types, and whether they are optional. It's less than formal, but has served well so far.

Related

Domain Driven Design - Shared entities across bounded contexts

I am new to domain driven design and trying to learn and implement in my project. My project structure up till now similar to this.
Maintainance Folder Maintainance.Data(Class
Library) Maintainance.Domain(Class Library)
Maintainance.Domin.Tests(test project)
MovieBooking Folder MovieBooking.Data(Class
Library) MovieBooking.Domain(Class Library)
MovieBooking.Domain.Tests(test project)
SharedKernel Common things
Web Application MovieBooking MVC Web
Application(which have reference to MovieBooking Domain)
In Maintainance boundned context I am keeping all CRUD, GetAll type things for say Movie, Country, Category, Subcategory entities in Maintainance DBContext.
Now in MovieBooking data layer I will also need to use these entities (mostly to display name or dropdown fills in view, kind of subset needed - not all properties needed, only few like Id, name)
There are few ways I can access this entities in Movie booking Bounded Context
Via web services - Need to create web api for common entities like Movie,Country,Category,Subcategory and call web api in web project (to fill Dropdowns or get name from entities)
Via Reference Context (Seperate Dbcontext) - Need to configure Dbset and then map a database view (with only require fields) to Dbset
Example :
modelBuilder.Entity().ToTable(ViewName);
For (1) it can be long term implmentation solution for me
(2) I have to create view (with only few properties) for each require table and it will increase my number of views in my DB drastically as I have enterprise level application.
Is there any other way I can achieve this? Anything I am missing in DDD to look for ?
Option 2, while it will save you time, is actually a very bad idea from the DDD perspective as it allows for violations of the transactional boundary guarantees that each aggregate is meant to enforce\represent.
Option 1 seems a better option, although there are still quite a bit of wiggle room for interpretation based on your brief description of your proposed solution. If I understood correctly, it is generally recommended to follow the below:
Do not expose your aggregate state directly since this exposes internals and increases coupling. Simple create meaningful DTO's and use something like Automapper to map your Aggregates to DTO's easilly and with little effort before sending it over.
Have a duplicate of the DTO definition in your client. This will reduce coupling and allow for easier deployments.
I strongly recommend reading the DDD orange book although I have to say that I cannot recall specifically on which chapter this is discussed. You will also benefit a lot by reading about hexagonal architecture (and I would search for that term in the orange book to find more info about your question).
There is actually one alternative that I can think of: if you're publishing events from your BC's you can create a workflow to translate the domain events to "public" events and then in the other BC listen for the public events that you need to and store the data that you need somewhere inside there. The difficulty of this ranges from very easy to quite problematic depending on your infrastructure. Be aware that it is not a very good idea to re-use your domain events for transmitting data to other BC's since this closely couples the two BC's.
I hope this helps. Please do not hesitate to elaborate if I did not understood the question well enough.

Is it possible to do DDD and REST interface and language mapping?

REST has a uniform interface constraint which is the following in a very zipped opinion based format.
You have to use standards like HTTP, URI, MIME, etc...
You have to use hyperlinks.
You have to use RDF vocabs to annotate data and hyperlinks with semantics.
You do all of these to decouple the client from the implementation details of the service.
DDD with CQRS (or without it) is very similar as far as I understand.
By CQRS you define an interface to interact with the domain model. This interface consists of commands an queries classes.
By DDD you define domain events to decouple the domain model from the persistence details.
By DDD you have one ubiquitous language per bounded context which expresses the semantics.
You do all of these to completely decouple the domain model from the outside world.
Is it possible to map the REST uniform interface to the domain interface defined by commands and queries and domain events? (So the REST service code would be generated automatically.)
Is it possible to map the linked data semantics to the ubiquitous languages? (So you wouldn't need to define very similar terms, just find and reuse existing vocabs.)
Please add a very simple mapping example to your answer, why yes or why not!
I don't think this is possible. There is a term which I believe describes this problem, it is called ontology alignment.
In this case have have at least 3 ontologies:
the ubiquitous language (UL) of the domain model
the application specific vocab (ASO) of the REST service
the linked open data vocabs (LODO) which the application specific vocab uses
So we have at least 2 alignments:
the UL : ASO alignment
the ASO : LODO alignment
Our problem is related to the UL : ASO alignment, so let's talk about these ontologies.
The UL is object oriented, because we are talking about DDD and domain model. So most of the domain objects entities, value objects are real objects and not data structures. The non-object-oriented part of it are the DTOs like command+domainEvent, query+result and error on the interface of the domain model.
In contrast the ASO is strictly procedural, we manipulate the resources (data structures) using a set of standard methods (procedures) on them.
So from my aspect we are talking about 2 very different things and we got the following options:
make the ASO more object oriented -> RPC
make the UL less object oriented -> anaemic domain model
So from my point of view we can do the following things:
we can automatically map entities to resources and commands to operations by CRUD, for example the HydraBundle does this with active records (we can do just the same with DDD and without CQRS)
we can manually map commands to operations by a complex domain model
the operation POST transaction {...} can result a SendMoneyCommand{...}
the operation GET orders/123/total can result a OrderTotalQuery{...}
we cannot map entities to resources by a complex domain model, because we have to define new resources to describe a new service or a new entity method, for example
the operation POST transaction {...} can result account.sendMoney(anotherAccount, ...)
the operation GET orders/123/total can result in an SQL query on a read database without ever touching a single entity
I think it is not possible to do this kind of ontology alignment between DDD+CQRS and REST, but I am not an expert of this topic. What I think we can do is creating an application specific vocab with resource classes, properties and operations and map the operations to the commands/queries and the properties to the command/query properties.
You have posed some interesting questions here.
To start with I do not quite agree with
By DDD you define domain events to decouple the domain model from the
persistence details.
I think you might be confusing Event Sourcing ES with DDD, ES can be used with DDD but its very much optional in fact you should give it a lot of thought before choosing it as your persistence mechanism.
Now to the bulk of your question, of whether REST and DDD get along if yes how ?
My take on it, yes they do get along, however generally you do not want to expose your domain model via a REST interface, you want to build a abstraction over it and then expose that.
You can refer to this answer here, for a little more detail.
However i cannot recommend enough the Implementing Domain-Driven Design book, Chapter 14 Application deals with your concern to a fair degree.
I could not have explained it more thoroughly than the book and hence referring you there :)

How to expose read model from shared module

I am working on developing a set of assemblies that encapsulate parts of our domain that will be shared by many applications. Using the example of an order management system, one such assembly will contain all of the core operations an application can perform to/with an order. We are applying a simple version of CQS/CQRS so that all operations that change the state of the "system" are represented as public commands, such as CancelOrderCommand, ShipOrderCommand and CreateORderCommand. The command handlers are internal to the assembly.
The question I am struggling to answer is how to best expose the read model to consuming code?
The read model will be used by consuming code to perform queries. I don't know how all of the ways the read model will be used so the interface needs to be flexible to allow any query.
What complicates it for me is that I not only need to expose my aggregate root but there are also several "lookup" lists of related data that client applications may use. For example, each order has an associated OrderType which is data-driven (i.e., not an enum) and contains several properties that will drive some of our business rules that control what operations can/cannot be performed, etc. It is easy inside my module to manage this relationship; however, a client application that allows order creation will most likely need to display the list of possible OrderTypes to the user. As a result, I need to not only expose the list of Order aggregates but the supporting list of OrderTypes (and other lookup lists) from my read model.
How is this typically done?
I'm not sure what else to explain that will help trigger a solution, so please ask away...
I have never seen a CQRS based implementation expose a full dataset for ad-hoc querying so this is an interesting situation! In a typical CQRS scenario you would expose very specific queries because you may want to raise events when they are called (for caching for example - see this post for more details on that).
However since this is your design, let's not worry about "typical" or "correct" CQRS, I guess you just need a solution! One of the best new mechanisms for exposing data for flexible querying I have seen is the Open Data Protocol (OData). It will allow consumers to implement their own filtering, sorting and paging over a data source you expose.
Most implementations of this seem to deal with relational data. If you are dealing with a relational data source then OData might be a nice way to go. I suspect by your comment of "expose my aggregate root" that you might be using a document database? If so, there is one example I have seen of OData services on top of MongoDB: http://bloggingabout.net/blogs/vagif/archive/2012/10/11/mongodb-odata-provider-now-supports-arrays-and-nested-collections.aspx.
I hope that helps, OData is definitely worth looking into. It seems to be growing really quickly and is getting good support on both server and client technology platforms.

What is the philosophy of creating a database mapping tools?

I'm new in the work with the Object Relational Mapping Tool's and I want to know what has caused the programming tools are created. What a privilege to work with the database and mapping tools will be for us?
ORMs are used to close the gap between object-oriented models (applications) and the relational model (database). There are several difficulties for that mapping, like inheritance and object identification to name only a few (for example, check out http://www.cit.dk/cot/reports/reports/Case4/05-v1.1/cot-4-05-1.1.pdf to go on here).
There are many ORM implementations/frameworks that use different concepts to achieve the mapping and they offer different 'features'.
As you might know there are object-oriented database-systems too, but the relational model is still favored because of practical issues. For your application you could map the datamodel
on your own or (if available and it fits your needs) use an ORM.

How do I use entity framework with hierarchical data?

I'm working with a large hierarchical data set in sql server - modelled using the standard "EntityID, ParentID" kind of approach. There are about 25,000 nodes in the whole tree.
I often need to access subtrees of the tree, and then access related data that hangs off the nodes of the subtree. I built a data access layer a few years ago based on table-valued functions, using recursive queries to fetch an arbitrary subtree, given the root node of the subtree.
I'm thinking of using Entity Framework, but I can't see how to query hierarchical data like
this. AFAIK there is no recursive querying in Linq, and I can't expose a TVF in my entity data model.
Is the only solution to keep using stored procs? Has anyone else solved this?
Clarification: By 25,000 nodes in the tree I'm referring to the size of the hierarchical dataset, not to anything to do with objects or the Entity Framework.
It may the best to use a pattern called "Nested Set", which allows you to get an arbitrary subtree within one query. This is especially useful if the nodes aren't manipulated very often: Managing hierarchical data in MySQL.
In a perfect world the entity framework would provide possibilities to save and query data using this data pattern.
Everything IS possible with Entity Framework but you have to hack and slash your way in to it. The database I am currently working against has too many "holder tables" since Points for instance is shared with both teams and users. Both users and teams can also have a blog.
When you say 25 000 nodes do you mean navigational properties? If so I think it could be tricky to get the data access in place. It's not hard to navigate, search etc with entity framework but I tend to model on paper then create the database based on how I want to navigate while using entity framework. Sounds like you don't have that option.
Thanks for these suggestions.
I'm beginning to realise that the answer is to remodel the data in the database - either along the lines of nested sets as Georg suggests, or maybe a transitive closure table, which I've just come across.
That way, I'm hoping to get two key benefits:
a) faster querying aginst arbitrary subtrees
b) a data model which no longer requires recursive querying - so perhaps bringing it within easy reach of the Entity Framework!
It's always amazing how so often the right answer to a difficult problem is not to answer it, but to do something else instead!