I'm working with a large hierarchical data set in SQL Server, modelled using the standard "EntityID, ParentID" approach. There are about 25,000 nodes in the whole tree.
I often need to access subtrees of the tree, and then access related data that hangs off the nodes of the subtree. I built a data access layer a few years ago based on table-valued functions, using recursive queries to fetch an arbitrary subtree, given the root node of the subtree.
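For reference, the recursive approach described above can be sketched as follows. This is a minimal illustration, not the original data access layer: it uses Python with an in-memory SQLite database, and the `Entity` table, its `Name` column and the `fetch_subtree` helper are assumptions. SQL Server uses the same common-table-expression shape but without the `RECURSIVE` keyword.

```python
import sqlite3

# Hypothetical schema: the table name, Name column and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Entity (EntityID INTEGER PRIMARY KEY, ParentID INTEGER, Name TEXT)"
)
conn.executemany(
    "INSERT INTO Entity VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a1"), (5, 4, "a1x")],
)


def fetch_subtree(conn, root_id):
    """Return (EntityID, ParentID, Name) rows for root_id and all its descendants."""
    sql = """
        WITH RECURSIVE Subtree(EntityID, ParentID, Name) AS (
            -- anchor member: the root of the requested subtree
            SELECT EntityID, ParentID, Name FROM Entity WHERE EntityID = ?
            UNION ALL
            -- recursive member: children of rows already in the subtree
            SELECT e.EntityID, e.ParentID, e.Name
            FROM Entity e JOIN Subtree s ON e.ParentID = s.EntityID
        )
        SELECT EntityID, ParentID, Name FROM Subtree ORDER BY EntityID
    """
    return conn.execute(sql, (root_id,)).fetchall()


subtree_ids = [row[0] for row in fetch_subtree(conn, 2)]
```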
I'm thinking of using Entity Framework, but I can't see how to query hierarchical data like this. AFAIK there is no recursive querying in Linq, and I can't expose a TVF in my entity data model.
Is the only solution to keep using stored procs? Has anyone else solved this?
Clarification: By 25,000 nodes in the tree I'm referring to the size of the hierarchical dataset, not to anything to do with objects or the Entity Framework.
It may be best to use a pattern called "Nested Set", which allows you to get an arbitrary subtree within one query. This is especially useful if the nodes aren't manipulated very often: Managing hierarchical data in MySQL.
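A rough sketch of the idea, assuming a hypothetical `Node` table with `lft` and `rgt` boundary columns (these names and the sample numbering are illustrative, and Python with SQLite stands in for the real database). Each node's interval encloses the intervals of all its descendants, so a subtree is a plain range join with no recursion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Node (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)"
)
# Tree: root(1) -> a(2) -> a1(4) -> a1x(5), and root(1) -> b(3).
# lft/rgt come from a depth-first walk: number on the way down and on the way up.
conn.executemany(
    "INSERT INTO Node VALUES (?, ?, ?, ?)",
    [
        (1, "root", 1, 10),
        (2, "a", 2, 7),
        (4, "a1", 3, 6),
        (5, "a1x", 4, 5),
        (3, "b", 8, 9),
    ],
)


def nested_set_subtree(conn, root_id):
    """Return the ids of root_id and its descendants in depth-first order."""
    sql = """
        SELECT child.id
        FROM Node AS child
        JOIN Node AS parent ON child.lft BETWEEN parent.lft AND parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]


ids_under_a = nested_set_subtree(conn, 2)
```

The trade-off is on the write side: inserting or moving a node means renumbering `lft`/`rgt` for part of the tree, which is why the pattern suits trees that change rarely.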
In a perfect world the entity framework would provide possibilities to save and query data using this data pattern.
Everything IS possible with Entity Framework, but you have to hack and slash your way into it. The database I am currently working against has too many "holder tables", since Points, for instance, is shared by both teams and users. Both users and teams can also have a blog.
When you say 25 000 nodes do you mean navigational properties? If so I think it could be tricky to get the data access in place. It's not hard to navigate, search etc with entity framework but I tend to model on paper then create the database based on how I want to navigate while using entity framework. Sounds like you don't have that option.
Thanks for these suggestions.
I'm beginning to realise that the answer is to remodel the data in the database - either along the lines of nested sets as Georg suggests, or maybe a transitive closure table, which I've just come across.
That way, I'm hoping to get two key benefits:
a) faster querying against arbitrary subtrees
b) a data model which no longer requires recursive querying - so perhaps bringing it within easy reach of the Entity Framework!
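To make the transitive closure option concrete, here is a minimal sketch in Python with SQLite. The `EntityClosure` table, its `Depth` column and the helper names are assumptions for illustration. The closure table stores one row per (ancestor, descendant) pair, including each node paired with itself, so a subtree query becomes an ordinary join:

```python
import sqlite3

# Hypothetical parent map standing in for the EntityID/ParentID table.
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (EntityID INTEGER PRIMARY KEY, ParentID INTEGER)")
conn.execute(
    "CREATE TABLE EntityClosure (AncestorID INTEGER, DescendantID INTEGER, Depth INTEGER)"
)
conn.executemany("INSERT INTO Entity VALUES (?, ?)", list(parents.items()))

# Build the closure by walking from every node up to the root.
closure_rows = []
for node in parents:
    ancestor, depth = node, 0
    while ancestor is not None:
        closure_rows.append((ancestor, node, depth))
        ancestor, depth = parents[ancestor], depth + 1
conn.executemany("INSERT INTO EntityClosure VALUES (?, ?, ?)", closure_rows)


def closure_subtree(conn, root_id):
    """Return the ids of root_id and its descendants, nearest levels first."""
    sql = """
        SELECT e.EntityID
        FROM Entity e
        JOIN EntityClosure c ON c.DescendantID = e.EntityID
        WHERE c.AncestorID = ?
        ORDER BY c.Depth, e.EntityID
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]


descendants_of_2 = closure_subtree(conn, 2)
```

Because the query is a non-recursive join on a regular table, it is the kind of query an ORM can usually express directly. The cost is that the closure rows must be maintained whenever the tree changes.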
It's always amazing how so often the right answer to a difficult problem is not to answer it, but to do something else instead!
In my app I have two entities, User and Meetings. What I want is a list of Users who have meetings today.
Also, I haven't added a relationship between the two entities. Is there any way I can query both entities in a single fetch request, or is there some other way?
Please help me solve this in the best possible way.
Thanks in advance
Core Data tries to map objects from the OOP world onto tables and rows from the RDBMS world and back. This is called an object-relational mapper (ORM). Although this looks easy because the concepts seem similar, it is a difficult task. It has been called the "Vietnam of computer science".
At some point, however, the two worlds do not fit together. This is called the object-relational impedance mismatch (ORIM). At that point one has to decide whether to go the OOP way or the RDBMS way. Resolving relationships is one of these points.
Core Data decided to do this the OOP-way: Relationships are treated as relationships between "usual" objects. This has two consequences:
You do not join anything. In OOP, objects are not joined, so in Core Data objects are not joined either. (There are some join-like features in fetch requests that return dictionaries, but this is not the usual way to access data in Core Data.)
To do the job, Core Data needs to know the relationships between objects. You have to set the relationships.
I have a system that has a primary data model to perform most of the work.
The model has quite a few tables and with performance in mind when I came to add an administrative feature to the application I decided to use a second separate data model.
All works well until my second data model needs to access a table that is also in the primary data model. Now, from digging around I can see this can cause problems.
The two possible workarounds I've come up with are to either:
Put the data models in separate projects.
Use views / stored procedures for accessing the table in question when required.
Method 1 seems the simplest, but I'm concerned about whether there would be any performance loss. Method 2 seems a bit messy and defeats the point of using EF.
Before I plump for using method 1, is there an easier work around that I could use?
In the end I decided to put the two data models into separate projects, and there hasn't been any slowdown that I've been able to notice (I've not done any benchmarking, but it's passed the perception test).
In one of her online tutorials EF guru Julie Lerman says that you should put your data model in a separate project anyway, so I don't think this has been a bad workaround.
I am working with 2 models in the same project, because I connect to 2 different databases. I have given them different namespaces using "Custom Tool Namespace" on the *.tt files, but that is not necessary. It generally works, but it cannot handle the situation where an entity (table) with the same name is in both models. When you save one model, the entity with the same name is deleted from the second model.
We are developing a back-office application with quite a large database.
It's not reasonable to load everything from the DB into memory, so when a model's properties are requested we read them from the DB (via EF).
But many of our UIs are just simple lists of entities with some (!) properties presented to the user.
For example, we just want to show Id, Title and Name.
Later, when the user selects an item and wants to perform some action on it, the whole object is needed. For now we have the list of items stored in memory.
Some properties contain large texts, images or other data.
EF works with entities and reading a bunch of large objects degrades performance notably.
As far as I understand, the problem can be solved by creating lightweight entities and using them in appropriate context.
First. I'm afraid that each view will make us create a new LightweightEntity, and we will eventually end up with a bloated object context.
Second. As the Model wraps EF we need to provide methods for various entities.
Third. ViewModels communicate and pass entities to each other.
So I'm stuck with all these considerations and need good architectural design advice.
Any ideas?
For images and large texts you may consider table splitting, which is commonly used to split a table into a lightweight entity and a "heavy" entity.
But I think what you call lightweight "entities" are data transfer objects (DTO's). These are not supplied by the context (so it won't get bloated) but by projection from entities, which is done in a repository or service.
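As a language-neutral sketch of projection to a DTO (written in Python with SQLite rather than EF, so every name here is hypothetical): the list view selects only the three light columns into a small DTO type, and the heavy columns are read only when a single item is opened.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleListItem:
    """DTO for list views: only the columns the list actually shows."""
    id: int
    title: str
    name: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Article (Id INTEGER PRIMARY KEY, Title TEXT, Name TEXT, Body TEXT, Image BLOB)"
)
conn.execute(
    "INSERT INTO Article VALUES (1, 'Intro', 'intro', ?, ?)",
    ("x" * 100_000, b"\x00" * 100_000),
)


def list_articles(conn):
    """Project to DTOs; the large Body and Image columns are never read here."""
    rows = conn.execute("SELECT Id, Title, Name FROM Article ORDER BY Id")
    return [ArticleListItem(*row) for row in rows]


def load_article(conn, article_id):
    """Load the full row only when the user selects an item."""
    sql = "SELECT Id, Title, Name, Body, Image FROM Article WHERE Id = ?"
    return conn.execute(sql, (article_id,)).fetchone()


items = list_articles(conn)
full_row = load_article(conn, items[0].id)
```

The DTO type lives outside the data context, so adding one per view does not add anything to the entity model itself.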
For projection you can use AutoMapper, especially its newer feature that I describe here. This allows you to reduce the number of methods you need to provide "for various entities" (DTO's), because the type to project to can be given in a generic type parameter.
I have a model "Events" (Zend_Db_Table_Abstract) that has various relationships to other models. Usually I would do something like this to find an event and its relationships:
$events = new Events();
$event = $events->find($id)->current();
$eventsRelationship1 = $event->findDependentRowset('Relationship1');
As the relationship is already set up I am wondering if there's any sort of automatic join available or something. Every time I fetch my event I need to have all the relationships, too. Currently I see only two ways to achieve that:
Build the query myself, hard coded. Don't like this, because it's working around the already set up relationship and "model method convenience".
Fetch every related object with a single query. This one's ugly, too, as I have to trigger too many queries.
This gets worse when fetching a set of multiple rows. For a single event I can live with querying the database multiple times, but when fetching 100 rows, joins become essential.
So, does anyone know a way to create joins by using those relationships or is there no other way than hardcoding the query?
Thanks in advance
Arne
The way to solve this challenge is to 'upgrade' your database access to use the dataMapper pattern.
You are essentially adding an extra layer between the models in your application and their representation in the db. This mapper layer allows you to read/write data from different tables, rather than having a direct link between one model and one table.
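A minimal sketch of such a mapper layer, written in Python with SQLite to stay independent of Zend (this is not Zend Framework API; the `events` and `tickets` tables, the `Event` class and `EventMapper` are all hypothetical). The mapper owns the SQL, so it can load events together with their dependent rows in one joined query and hand back plain objects:

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Event:
    """Plain domain object; it knows nothing about tables or queries."""
    id: int
    title: str
    tickets: list = field(default_factory=list)


class EventMapper:
    """Translates between Event objects and the events/tickets tables."""

    def __init__(self, conn):
        self.conn = conn

    def find_all(self):
        # One joined query instead of one extra query per event.
        sql = """
            SELECT e.id, e.title, t.label
            FROM events e
            LEFT JOIN tickets t ON t.event_id = e.id
            ORDER BY e.id, t.id
        """
        events = {}
        for event_id, title, label in self.conn.execute(sql):
            event = events.setdefault(event_id, Event(event_id, title))
            if label is not None:  # events without tickets yield a NULL label
                event.tickets.append(label)
        return list(events.values())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, event_id INTEGER, label TEXT);
    INSERT INTO events VALUES (1, 'Launch'), (2, 'Retro');
    INSERT INTO tickets VALUES (1, 1, 'VIP'), (2, 1, 'Standard');
""")
all_events = EventMapper(conn).find_all()
```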
Here is a good tutorial to follow. (There are some bits you can skip - I left out all the getters and setters as it's just me using the code).
It takes a little while to get your head round the way it works, when you've just been using Zend_Db_Table_Abstract, but it is worth it.
Stephan Walters' video on MVC and Models is a very good, light discussion of the various topics listed in this question's title. The one question in the notes that was left unanswered was:
If you create an Interface / Repository pattern for Linq2SQL, do Linq2SQL's classes still cause a dependency on Linq, even though you pass the classes as toList?
It is probably an easy answer YES, however, what standard mechanic would you use to represent the data?
Let's say you have a Product entity that is made up of three tables (Prices, Text, and Photos): you could have sets of prices for different regions, different text for localization, and different photos. (Sounds like a builder pattern.) Would you create a slice of these tables, grabbing the right prices, text, and photos into a single List? Since Lists may be proprietary, would you use a Dictionary object?
I thank you for your answers. I am very interested in the "standard and proper" way to do it rather than 101 possibilities.
Another quick question: is Entity Framework ready for a complicated database yet? There are a lot of constructs that Linq2SQL likes that EF does not. EF seems to require identity fields as primary keys (HAHA), but it seems like every demo does this. I want to use EF, but I constantly fail to make it work, falling back to Linq2SQL.
If you keep the L2S on the other side of the Repository facade (remember, that's all a Repository is - a facade) then you decouple the rest of your application from L2S. This means that the job of the code behind your repository is to turn the L2S into "domain" objects, custom classes, and then the Repository returns those.
In this sense, the Repository is returning fully formed "Product" objects with all their related Price, Text, and Photo data. This is called an Aggregate Root.
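A small sketch of that shape, in Python with SQLite rather than L2S (the table layout, the `Product` class and the `ProductRepository.get` signature are assumptions for illustration). Callers only ever see the plain `Product` object; the queries stay behind the facade:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Domain object returned to callers; no data-access types leak out."""
    id: int
    price_cents: int
    description: str
    photos: tuple


class ProductRepository:
    """Facade that assembles a Product from the Prices, Texts and Photos tables."""

    def __init__(self, conn):
        self._conn = conn

    def get(self, product_id, region, locale):
        price = self._conn.execute(
            "SELECT amount_cents FROM Prices WHERE product_id = ? AND region = ?",
            (product_id, region),
        ).fetchone()
        text = self._conn.execute(
            "SELECT description FROM Texts WHERE product_id = ? AND locale = ?",
            (product_id, locale),
        ).fetchone()
        if price is None or text is None:
            return None  # no matching price or text for this region/locale
        photos = self._conn.execute(
            "SELECT url FROM Photos WHERE product_id = ? ORDER BY url",
            (product_id,),
        ).fetchall()
        return Product(product_id, price[0], text[0], tuple(url for (url,) in photos))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Prices (product_id INTEGER, region TEXT, amount_cents INTEGER);
    CREATE TABLE Texts (product_id INTEGER, locale TEXT, description TEXT);
    CREATE TABLE Photos (product_id INTEGER, url TEXT);
    INSERT INTO Prices VALUES (1, 'EU', 999), (1, 'US', 1099);
    INSERT INTO Texts VALUES (1, 'en', 'Widget'), (1, 'de', 'Dingsbums');
    INSERT INTO Photos VALUES (1, 'front.jpg'), (1, 'back.jpg');
""")
product = ProductRepository(conn).get(1, region="EU", locale="de")
```

Swapping the data-access technology later would then only touch the inside of the repository, not its callers.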
There shouldn't be a problem with Lists, since they are CLR objects.
As far as EF for advanced scenarios, my advice would be not yet, for the reasons you note.
The standard mechanism I'd use to represent the data is a Data Transfer Object. I would never return a LINQ to SQL or Entity Framework object across a service boundary, and I would hesitate to return it across a layer boundary of any kind. This is because these objects will serialize implementation-dependent data.