Normalized and immutable data model - scala

How does Haskell solve the "normalized immutable data structure" problem?
For example, let's consider a data structure representing ex girlfriends/boyfriends:
data Man = Man {name ::String, exes::[Woman]}
data Woman = Woman {name :: String, exes::[Man]}
What happens if a Woman changes her name and she has been with 13 men? Do all 13 men then have to be "updated" (in the Haskell sense) too? Some kind of normalization is needed to avoid these "updates".
This is a very simple example, but imagine a model with 20 entities, and arbitrary relationships between them, what to do then?
What is the recommended way to represent complex, normalized data in an immutable language?
For example, a Scala solution can be found here (see code below), and it uses references. What to do in Haskell?
class RefTo[V](val target: ModelRO[V], val updated: V => AnyRef) {
def apply() = target()
I wonder whether more general solutions like the Scala one above don't work in Haskell, or whether they are simply not necessary. If they don't work, why not? I searched for Haskell libraries that do this, but they don't seem to exist.
In other words, if I want to model a normalized SQL database in Haskell (for example to be used with acid-state) is there a general way to describe foreign keys? By general I mean, not hand coding the IDs as suggested by chepner in the comments below.
Put yet another way, is there a library (for Haskell or Scala) that implements an SQL/relational database in memory (possibly also using Event Sourcing for persistence) such that the database is immutable and most of the SQL operations (query/join/insert/delete/etc.) are implemented and type-safe? If there is no such library, why not? It seems to be a pretty good idea. How should I go about creating such a library?

The problem is you are storing data and relationships in the same type. To normalise, you need to separate. Relational databases 101.
newtype Id a = Id Int -- Type-safe ID.
data Person = Person { id :: Id Person, name :: String }
data Ex = Ex { personId :: Id Person, exId :: Id Person }
Now if a person changes their name, only a single Person value is affected. The Ex entries don't care about people's names.
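To make the separation concrete, here is a minimal sketch of the same idea. It is written in Python rather than Haskell purely for illustration, and the `Db`, `rename` and `exes_of` names are assumptions of mine, not part of any library. Entities and relationships live in separate immutable collections, so a rename rebuilds exactly one Person and leaves every Ex row untouched.

```python
from dataclasses import dataclass, replace
from typing import Generic, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Id(Generic[T]):
    """Type-tagged integer key, mirroring `newtype Id a = Id Int`."""
    value: int


@dataclass(frozen=True)
class Person:
    id: "Id[Person]"
    name: str


@dataclass(frozen=True)
class Ex:
    person_id: "Id[Person]"
    ex_id: "Id[Person]"


@dataclass(frozen=True)
class Db:
    """Hypothetical immutable 'database': one tuple per table."""
    people: Tuple[Person, ...]
    exes: Tuple[Ex, ...]


def rename(db: Db, pid: Id, new_name: str) -> Db:
    """Return a new Db in which only the matching Person is rebuilt."""
    people = tuple(
        replace(p, name=new_name) if p.id == pid else p for p in db.people
    )
    return replace(db, people=people)


def exes_of(db: Db, pid: Id) -> List[str]:
    """Join Ex rows against people by ID to get the current names."""
    by_id = {p.id: p for p in db.people}
    return [by_id[e.ex_id].name for e in db.exes if e.person_id == pid]
```

Because the relationship rows hold only IDs, the new database can share the old `exes` tuple unchanged; the name is looked up at query time.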

Project:M36 comes pretty close to what I was looking for. It is written in Haskell.
A more lightweight Haskell solution is outlined in Gabriel Gonzalez's post "A very general API for relational joins".


Should a birthdate/deathdate class be a composition or an aggregation of an individual class?

The entity is a person.
So the entity has a birthdate and may already have a deathdate.
But these dates may or may not be provided (it depends on the entity and on the availability of the information), so the entity might have neither.
But I feel I am making a mess of the cardinality and the relationship type.
How should I represent that?
I have created an abstract class Individual. It has two final subclasses: Person (an identified person) and Pseudonym (an anonymous person).
It is linked to a class Birthdate and a class Deathdate (both are generalized as a class Date).
The [Birthdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the birthdate can be omitted, and an individual can have only one date of birth.
1..*: because a birthdate must concern at least one individual, but can concern several.
The [Deathdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the individual may not be dead yet, and can die only once.
1..*: because a deathdate must concern at least one individual, but can concern several.
But since, theoretically, everyone has a birthdate (and will have a deathdate), I was tempted by a composition. However, some people might prefer to keep these dates secret, and I wondered whether composition could allow that.
Furthermore, one date can correspond to several individuals, so I guess composition isn't possible there either. Or else I am confusing the Individual class with its instances (the individuals), in which case composition would be possible, but not with the aforementioned cardinality.
At the moment I have chosen this:
Aggregation:
___________ _______________
|Birthdate|0..1-----1..*< >| |
___________ | <<Individual>>|
|Deathdate|0..1-----1..*< >|_______________|
But I am hesitating between that and this one:
Composition:
___________ _______________
|Birthdate|0..1-----1<#>| |
___________ | <<Individual>>|
What is the right answer? Thanks for your attention.
There are a number of issues with this approach.
First, using a class for dates is simply overkill. Both birthdate and deathdate are attributes of a specific person and can easily be modelled as inline properties of the Individual class. Unless there is a significant reason to use something more than the good old Date data type, stick with the standard approach.
As for the visibility issue, object-oriented principles say you should not expose the properties directly anyway. Instead, you should have an operation responsible for retrieving the birthdate and deathdate that controls whether the date can be read. You may add boolean attributes to support that, but they aren't necessary if the ability to see the dates depends on some state of the Individual or on other things (e.g. "who" asks). In the former case you may still wish to show those boolean attributes explicitly as derived ones.
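The two points above can be sketched roughly as follows, in Python for illustration. The `dates_public` flag is a hypothetical stand-in for whatever rule decides visibility; the dates are plain optional attributes (the 0..1 multiplicity) and are read only through operations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Individual:
    name: str
    _birthdate: Optional[date] = None  # 0..1: may be unknown
    _deathdate: Optional[date] = None  # 0..1: unknown, or still alive
    dates_public: bool = True          # hypothetical visibility flag

    def birthdate(self) -> Optional[date]:
        """Return the birthdate only if it is known and may be shown."""
        return self._birthdate if self.dates_public else None

    def deathdate(self) -> Optional[date]:
        """Return the deathdate only if it is known and may be shown."""
        return self._deathdate if self.dates_public else None
```

No Birthdate or Deathdate class is needed here; an unknown date and a hidden date both surface as `None` to the caller.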
If you insist on using a class for dates (e.g. because you want Wikipedia-style "Born on date"/"Deceased on date" collections), you should create just one class Date and build associations to it, much as you did in your second approach. In that situation, multiplicity does not work "database style"; it is a property of the association itself. In any particular association instance you have one birthdate/deathdate and one Individual. By default you will have two 1 to 0..1 associations, one for each date, although depending on your approach the model may become much more complex.
I'll later add diagrams for more clarity.
One last remark.
Do not use << >> for the class name. Those are reserved to indicate stereotypes.
If you want to indicate that Individual is abstract, either show it in italics or (if your tool doesn't allow that) use the <<abstract>> stereotype.

EF, Repositories and crossing aggregate boundaries

I have two aggregate roots in my domain, and therefore two repositories. We'll call them BookRepository and AuthorRepository for the sake of example.
I'm designing an MVC application, and one page has to display a table containing a list of authors, with each row showing the author's personal details. At the end of each row is a small button that can be clicked to expand the row and show a child table detailing the author's published books.
When the page loads, some ajax is executed to retrieve the Author details from an API controller and display the data in the table. Each property in an Author object maps almost directly to a column, with one exception, and this is where I'm having my problem. I want the button at the end of each row to be disabled if and only if the author has no published books. This means that a boolean has to be returned with each Author record, indicating whether they have any published books.
My book repository has a couple of methods like this:
public IEnumerable<Book> GetBooksForAuthor(int authorId);
public bool AnyBooksForAuthor(int authorId);
and my Book class has a property called AuthorId, so I can retrieve a book's author by calling
My problem is that in order to create a row for my aforementioned table, I need to create it like this:
IEnumerable<Author> authors = authorRepository.GetAll();
foreach (Author author in authors)
{
    yield return new AuthorTableRow
    {
        Name = author.Name,
        Age = author.Age,
        Location = author.PlaceOfResidence.Name,
        // One database round trip per author.
        HasBooks = this.bookRepository.AnyBooksForAuthor(author.Id)
    };
}
The above code seems correct, but there's a fairly hefty performance penalty in calling this.bookRepository.AnyBooksForAuthor(author.Id) for every single author, because it performs a database call each time.
Ideally, I suppose I would want an AuthorTableRowRepository which could perform something like the following:
public IEnumerable<AuthorTableRow> GetAll()
{
    return from a in this.dbContext.Authors
           select new AuthorTableRow
           {
               Name = a.Name,
               Age = a.Age,
               Location = a.PlaceOfResidence.Name,
               HasBooks = a.Books.Any()
           };
}
I'm hesitant to put this in place for these reasons:
AuthorTableRowRepository is a repository of AuthorTableRows, but AuthorTableRow is not a domain object, nor an aggregate root, and therefore should not have its own repository.
As Author and Book are both aggregate roots, I removed the "Books" property from the Author entity, because I wanted the only way to retrieve books to be via the BookRepository. This makes HasBooks = a.Books.Any() impossible. I am unsure whether I am imposing my own misguided best practice here though. It seems wrong to obtain Books by obtaining an Author via the AuthorRepository and then going through its Books property, and vice versa in obtaining an Author via a property on a Book object. Crossing aggregate root boundaries would be the way I'd term it, I suppose?
How would other people solve this? Are my concerns unfounded? I am mostly concerned about the (presumed) performance hit in the first method, but I want to adhere to best practice with the Repository pattern and DDD.
I would stick to the first approach, but try to optimize things in the BookRepository method. For instance, you can load this information all at once and use an in-memory lookup to speed things up. That way you would need 2 queries in total, rather than 1 for each author.
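A rough sketch of that two-query idea, in Python for illustration. The `book_author_ids` argument is an assumption: it stands for the result of one extra query that returns the AuthorId of every book (ideally distinct), which is not a method the question's repository actually has.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Author:
    id: int
    name: str


@dataclass(frozen=True)
class AuthorTableRow:
    name: str
    has_books: bool


def build_rows(
    authors: Iterable[Author], book_author_ids: Iterable[int]
) -> List[AuthorTableRow]:
    """Combine the results of two queries without any per-author lookup."""
    # Build the lookup once; membership tests are then constant time.
    with_books = set(book_author_ids)
    return [AuthorTableRow(a.name, a.id in with_books) for a in authors]
```

The number of database calls stays at two regardless of how many authors are listed; the cost moves to holding one set of IDs in memory.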
The way I solved this in the end was to create an Entity from a view in the database. I named the entity 'AuthorSummary', and made an AuthorSummaryRepository that didn't contain any Add() methods, just retrieval methods.

Class Design, which one is best design approach for this?

I am new to architecture design and need some help with this.
I have two classes, 'Part' and 'Supplier'. A part will have a supplier.
In my class design, should I have 'int SupplierID' (type 'int') or 'Supplier supplier' (type 'Supplier') as the property in the Part class?
Which one is better? What are the pros and cons of each?
Kindly provide your input on this.
Supplier supplier
Having Supplier as a type, with SupplierID as a property of Supplier, would make more sense to me. The initial benefit is that you can do some basic validation on the supplier ID. Sure, you are representing it as an int now, but this could (and probably will) change in the future. For example, you may decide to store the ID as a string and an int internally but report it as a single string, XYZ1234, where XYZ is the supplier company name (string) and 1234 is the unique ID (int). That is a contrived example, but the representation is still likely to change in some way.
The real advantage of having Supplier as a type is that you can use Dependency Injection to assign the Supplier to the Part when you create an instance of Part. So your constructor for Part should look like:
Part(Supplier supplier)
{
    _supplier = supplier;
}
Now your Part class does not depend on changes in your Supplier class.
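As a minimal sketch of that constructor injection (in Python for illustration; `display_id` and its XYZ1234 format are hypothetical, taken from the contrived example above):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplier:
    supplier_id: int
    name: str

    def display_id(self) -> str:
        # Hypothetical reporting format: first three letters of the
        # name followed by the zero-padded numeric ID, e.g. XYZ1234.
        return f"{self.name[:3].upper()}{self.supplier_id:04d}"


class Part:
    def __init__(self, name: str, supplier: Supplier) -> None:
        # The Supplier is handed in from outside rather than looked up
        # or constructed inside Part.
        self.name = name
        self._supplier = supplier

    def supplier_label(self) -> str:
        # Part never needs to know how the ID is represented.
        return self._supplier.display_id()
```

If the ID representation changes later, only Supplier has to change; Part keeps calling `display_id()`.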
Note: If you're not familiar with Dependency Injection, Martin Fowler's article on the topic should explain it.

How to sort related entities with eager loading in ADO.NET Entity Framework

Considering the Northwind sample tables Customers, Orders, and OrderDetails, I would like to eager load the related entities corresponding to those tables, and yet I need to order the child entities on the database before fetching them.
Basic case:
var someQueryable = from customer in northwindContext.Customers.Include("Orders.OrderDetails")
select customer;
but I also need to sort Orders and OrderDetails on the database side (before fetching those entities into memory) by some arbitrary column on those tables. Is that possible without a projection, as it is in T-SQL? It doesn't matter whether the solution uses e-SQL or LINQ to Entities. I searched the web but wasn't satisfied with the answers I found, since they mainly involve projecting the data to an anonymous type and then re-querying that anonymous type to get the child entities in the desired order. Using CreateSourceQuery() doesn't seem to be an option for me either, since I need the data as it is on the database side, eager loaded, with only the child entities ordered. That is, I want to specify the "ORDER BY" before the query executes and then fetch the entities in the order I'd like. Thanks in advance for any guidance. As a personal note, please excuse the direct language: I am kinda pissed at Microsoft for releasing EF in such an immature shape, even compared to LINQ to SQL (which they seem to be slowly moving away from). I hope this EF thingie will get much better, without significant bugs, in the release version of .NET FX 4.0.
Actually, I have a tip that addresses exactly this issue.
Sorting of related entities is not 'supported', but by using the projection approach Craig shows AND relying on something called 'Relationship Fixup' you can get something very similar working:
If you do this:
var projection = from c in ctx.Customers
                 select new {
                     Customer = c,
                     Orders = c.Orders.OrderByDescending(
                         o => o.OrderDate
                     )
                 };

foreach (var anon in projection)
{
    // anon.Orders is sorted (because of the projection)
    // anon.Customer.Orders is sorted too, because of relationship fixup
}
Which means if you do this:
var customers = projection.AsEnumerable().Select(x => x.Customer);
you will have customers that have sorted orders!
See the tip for more info.
Hope this helps
You are confusing two different problems. The first is how to materialize entities from the database; the second is how to retrieve an ordered list. The EntityCollection type is not an ordered list. In your example, customer.Orders is an EntityCollection.
On the other hand, if you want to get a list in a particular order, you can certainly do that; it just can't be in a property of type EntityCollection. For example:
from c in northwindContext.Customers
orderby c.SomeField
select new {
    Name = c.Name,
    Orders = from o in c.Orders
             orderby o.SomeField   // sort the orders by their own field
             select new {
                 SomeField = o.SomeField
             }
}
Note that there is no call to Include. Because I am projecting, it is unnecessary.
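The distinction can be illustrated with a small sketch, in Python and entirely in memory, so it shows only the shape of the result and not the database-side ordering that the LINQ query gives you. All names here are hypothetical. The entity's own collection stays unordered; the order exists only in the projected result.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Order:
    order_id: int
    order_date: str  # ISO date string, so it sorts lexicographically


@dataclass(frozen=True)
class Customer:
    name: str
    orders: FrozenSet[Order]  # unordered, like an EntityCollection


def project(customers: Iterable[Customer]) -> List[Tuple[str, List[Order]]]:
    """Build an ordered view; the entities themselves are not changed."""
    return [
        (c.name, sorted(c.orders, key=lambda o: o.order_date, reverse=True))
        for c in sorted(customers, key=lambda c: c.name)
    ]
```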
The Entity Framework may not work in the way you expect, coming from a LINQ to SQL background, but it does work. Be careful about condemning it before you understand it; deciding that it doesn't work will prevent you from learning how it does work.
Thank you both. I understand that I can use projection to achieve what I wanted, but I thought there might be an easier way, since in the T-SQL world it's perfectly possible with a few nested queries (or joins) and ORDER BYs. On the other hand, separation of concerns sounds reasonable, and we are in the entity domain now, so I will use the approach you both recommended, though I have to admit this is easier and cleaner to achieve in LINQ to SQL by using AssociateWith.
Kind regards.

Entity Framework and associations between string keys

I am new to Entity Framework, and to ORMs for that matter.
In the project that I'm involved in we have a legacy database,
with all its keys as case-insensitive strings.
We are converting to MSSQL and want to use EF as the ORM,
but have run into a problem.
Here is an example that illustrates our problem:
TableA has a primary string key,
TableB has a reference to this primary key.
In LINQ we write something like:
var result = from t in context.TableB select t.TableA;
foreach( var r in result )
Console.WriteLine( r.someFieldInTableA );
Suppose TableA contains a primary key that reads "A", and TableB contains two rows that reference TableA but with different cases in the referencing field, "a" and "A".
In our project we want both of the rows to end up in the result, but only the one
with the matching case ends up there.
Using the SQL Profiler, I have noticed that both of the rows are selected.
Is there a way to tell Entity Framework that the keys are case insensitive?
Edit: We have now tested this with NHibernate and come to the conclusion that NHibernate works with case-insensitive keys, so NHibernate might be a better choice for us. I am, however, still interested in finding out if there is any way to change the behaviour of Entity Framework.
Thanks for your answer!
Problem is that if we add that constraint to the database now,
the legacy application might stop working because of how it is built.
Best for us would be, if possible, to change the behavior of EF.
I'm guessing it is not possible, but I'm giving it a shot.
Edit: The reason I added an answer to my own question is that I asked this question before I was a registered user, and once I had registered my account I couldn't add comments or edit my post. The accounts are now merged.
I think you need to make the change to the schema in SQL Server, not in EF. This post's answer, on how to make a column case-sensitive, looks like it will do the trick: T-SQL: How do I create a unique key that is case sensitive?
I know this isn't a perfect solution, but in LINQ why not do the join yourself? EF doesn't work because the .Designer.cs file returns objA.Equals(objB) when doing the join, and .Equals is case sensitive.
var result = from t1 in context.TableB
             join t2 in context.TableA
                 on t1.someFieldInTableB.ToUpper() equals t2.someFieldInTableA.ToUpper()
             select t2; // select clause added so the query is complete
Hackish, I know, but LINQ to Entities is still in its infancy, and the generated object classes are designed for specific purposes that do not cover exceptional cases such as this design.
Another alternative is to create your own code generator using T4 templates. Since everything is a public partial class, you can create a navigation property that actually does the case-insensitive comparison that you are looking for.
To answer your question truthfully though, there is no "out of the box" way to get EF to do a navigation using case insensitive searching.
I came up with a workaround that "stitches up" the string-based association in memory after the context has retrieved the rows from the database (hint: make use of the context.[EntityTypeCollection].Local property). You can see my answer at
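The general shape of such in-memory stitching might look like the sketch below, in Python for illustration. It is not the code from the linked answer; the row layout and the `key`/`a_key` field names are assumptions, and `casefold()` only approximates whatever case-insensitive collation the database uses.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def stitch(
    table_a: List[Row], table_b: List[Row]
) -> List[Tuple[Row, Optional[Row]]]:
    """Pair each TableB row with its TableA parent, ignoring key case."""
    # Index the parents once under a case-folded key.
    by_key = {a["key"].casefold(): a for a in table_a}
    # Look up each child's reference under the same folding; rows with
    # no matching parent are paired with None.
    return [(b, by_key.get(b["a_key"].casefold())) for b in table_b]
```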
> I know this isn't a perfect solution, but in LINQ why not do the join yourself. EF doesn't work because the .Designer.cs file returns objA.Equals(objB) when doing the join. .Equals is case sensitive.
Well, not if you override the Equals method.
The generated domain classes in EF are partial, no? So it's fairly easy to replace the default Equals implementation of these classes with your own (which would, of course, make it case insensitive).
BTW: this is a technique that dates back to .NET 1.0.
With all this .NET 3.5/4.0, LINQ and lambda violence, people tend to forget about the basics.
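For readers who want to see the idea of overriding equality spelled out, here is a language-neutral sketch in Python. The `CaseInsensitiveKey` class is hypothetical and says nothing about how EF itself compares keys; it only shows that equality and hashing have to be overridden together.

```python
class CaseInsensitiveKey:
    """String key whose equality and hash ignore letter case."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CaseInsensitiveKey):
            return NotImplemented
        return self.text.casefold() == other.text.casefold()

    def __hash__(self) -> int:
        # Must agree with __eq__, otherwise dict and set lookups break.
        return hash(self.text.casefold())
```

In .NET the same pairing applies: an overridden Equals needs a matching GetHashCode.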
As an alternative to the Entity Framework, you can use LINQ to SQL, which works well with relations involving case sensitive collations. Although this ORM does not offer all the flexibility of EF or NHibernate, it can be sufficient in many cases.
I've recently posted a thread on the official Microsoft Entity Framework forum: