I'm using Eclipselink to map my tables to entities.
I have one big database table (actually it's view) with columns like groupId, groupName, categoryId, categoryName etc. I know it's redundand, but we're trying to minimize queries and it's dynamically created view.
The question is: How to map such table to several entities like Group, Category etc?
You would probably be better off mapping to the real tables and use query optimization to reduce your queries (such as join fetching and batch fetching)
See,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
If you really want to have several class map to the same table, you will need to have one Entity and make the rest Embeddables.
See,
http://en.wikibooks.org/wiki/Java_Persistence/Embeddables
Related
I am developing an API where I am confused as to what is the efficient way to handle join query.
I want to join 2 tables data and return the response. Either I can query the database with join query and fetch the result and then return the response OR I can fire two separate queries and then I would handle the join in the API on the fly and return the response. Which is the efficient and correct way ?
Databases are pretty much faster than querying and joining as class instances. Always do joins in the database and map them from the code. Also look for any lazy loading if possible. Cause in a situation like below:
#Entity
#Table(name = "USER")
public class UserLazy implements Serializable {
#Id
#GeneratedValue
#Column(name = "USER_ID")
private Long userId;
#OneToMany(fetch = FetchType.LAZY, mappedBy = "user")
private Set<OrderDetail> orderDetail = new HashSet();
// standard setters and getters
// also override equals and hashcode
}
you might not want order details when you want the initial results.
Usually it's more efficient to do the join in the database, but there are some corner cases, mostly due to the fact that application CPU time is cheaper than database CPU time. Here are a few examples that come to mind, with a query like "table A join table B":
B is a small table that rarely changes.
In this case it can be profitable to cache the contents of this table in the application, and not query it at all.
Rows in A are quite large, and many rows of B are selected for each row of A.
This will cause useless network traffic and load as rows from A are duplicated many times in each result row.
Rows in B are quite large, and there are few distinct b_id's in A
Same as above, except this time the same few rows from B are duplicated in the result set.
In the previous two examples, it could be useful to perform the query on table A, then gather a set of unique b_id's from the result, and SELECT FROM b WHERE b_id IN (list).
Data structure and ORMs
If each table contains a different object type, and they have a "belongs to" relationship (like category and product) and you use an ORM which will instantiate objects for each selected row, then perhaps you only want one instance of each category, and not one per selected product. In this case, you could select the products, gather a list of unique category_ids, and select the categories from there. The ORM may even do that for you behind the scene.
Complicated aggregates
Sometimes, you want some stuff, and some aggregates of other stuff related to the first stuff, but it just won't fit in a neat GROUP BY, or you may need several ones.
So basically, usually the join works better in the database, so that should be the default. If you do it in the application, then you should know why you're doing it, and decide it's a good reason. If it is, then fine. I gave a few reasons, based on performance, data model, and SQL constraints, these are only examples of course.
Using Entity Framework 6.
Suppose I have an entity Parent with two nested collections ICollection<Child> and ICollection<Child2>. I want to fetch both eagerly:
dbContext.Parent.Include(p => p.Child).Include(p => Child2).ToList()
This generates a big query, which looks like this at a high level:
SELECT ... FROM (
SELECT (parent columns), (child columns), NULL as (child2 columns)
FROM Parent left join Child on ...
WHERE (filter on Parent)
UNION ALL
SELECT (parent columns), NULL as (child columns), (child2 columns)
FROM Parent left join Child2 on ...
WHERE (filter on Parent)
))
Is there a way to get Entity Framework to behave like batch fetch in NHibernate (or JPA, EclipseLink, Hibernate etc.) where you can specify that you want to query the parent table first, then each child table separately?
SELECT ... from Parent -- as usual
SELECT ... from Child where parent_id in (list of IDs)
SELECT ... from Child2 where parent_id in (list of IDs)
-- alternatively, you can specify EXISTS instead of IN LIST:
SELECT ... from Child where exists (select 1 from Parent where child.parent_id = parent.id and (where clause for parent))
I find this easier to understand and reason about, since it more closely resembles the SQL you would write if you were writing it by hand. Also, it prevents the redundant parent table rows in the result set. On the other hand, it's more round trips.
I do not believe this is possible with the Entity Framework, at least using LINQ. At the end of the day the ORM attempts to generate the most efficient query possible, at least to it. That being said ORMs like Entity don't always generate the nicest looking SQL or the most efficient. My guess, and this is just a guess, is Entity is trying to reduce the number of trips and I/O becaus I/O is experience, in relativity.
If you are looking for fine grain control over your SQL I recommend you avoid ORMs, or do like I do, use Entity for your basic CRUD and simple queries, used stored procedures for your complex queries, such as complex reports. There is always the ADO.NET too, but seems like you are more intent on using an ORM.
You may fine this useful as well. Basically not much tuning is available. https://stackoverflow.com/a/22390400/2272004
Entity Framework misses out on many sophisticated features NHibernate offers. EF's unique selling point is its versatile LINQ support, but if you need declarative control over how an ORM fetches your data spanning multiple tables, EF is not the tool of choice. With EF, you can only try to find procedural tricks.
Let me illustrate that by showing what you'd need to do to achieve "batch fetching" with EF:
context.Configuration.LazyLoadingEnabled = false;
context.Children1.Where(c1 => parentIds.Contains(c1.ParentId)).Load();
context.Children2.Where(c2 => parentIds.Contains(c2.ParentId)).Load();
var parents = dbContext.Parent.Where(p => parentIds.Contains(p.Id)).ToList();
This loads the required data into to context and EF connects the parent and children by relationship fixup. The result is a parents list with their two child collections populated. But of course, it has several downsides:
You need disable lazy loading, because even though the child collections are populated, they are not marked as loaded. Accessing them would still trigger lazy loading, when enabled.
Repetitive code: you need to repeat the predicates three times. It's not really easy to avoid this.
Too specific. For each different scenario, even if they are almost identical, you have to write a new set of statements. Or make it configurable, which is still a procedural solution.
EF's current main production version (6) doesn't have a query batch facility. You need third-party tools like EntityFramework.Extended to run these queries in one database roundtrip.
I am using entity splitting to split properties across multiple tables - http://msdn.microsoft.com/en-us/data/jj591617
This adds an inner join to the resulting SQL query.
I expected this join would only be included when the query projection includes the properties in the secondary table. This is not the case when I use an anonymous type to isolate (project) a subset of needed fields. The resulting SQL query only selects columns from the base table but still includes the join.
Is there anyway to continue to use entity splitting and only include the join when necessary?
As far as I can tell, no. With Entity Framework, you can't lazy-load simple properties that are mapped to a column (which, in your case, would theoretically help you avoid the join); only navigation properties can be lazy-loaded. Perhaps you would need to employ table splitting instead to achieve the goal of eliminating the join.
For reference:
http://www.eidias.com/blog/2013/11/18/entity-framework-lazy-loading-properties
We have a customer very large table with over 500 columns (i know someone does that!)
Many of these columns are in fact foreign keys to other tables.
We also have the requirement to eager load some of the related tables.
Is there any way in Linq to SQL or Dynamic Linq to specify what columns to be retrieved from the database?
I am looking for a linq statement that actually HAS this effect on the generated SQL Statement:
SELECT Id, Name FROM Book
When we run the reguar query generated by EF, SQL Server throws an error that you have reached the maximum number of columns that can be selected within a query!!!
Any help is much appreciated!
Yes exactly this is the case, the table has 500 columns and is self referencing our tool automatically eager loads the first level relations and this hits the SQL limit on number of columns that can be queried.
I was hoping that I can set to only load limited columns of the related Entities such as Id and Name (which is used in the UI to view the record to user)
I guess the other option is to control what FK columns should be eager loaded. However this still remains problem for tables that has a binary or ntext column which you may not want to load all the times.
Is there a way to hook multiple models (Entities) to the same table in Code First? We tried doing this I think the effort failed miserably.
Yes you can return only subset of columns by using projection:
var result = from x in context.LargeTable
select new { x.Id, x.Name };
The problem: projection and eager loading doesn't work together. Once you start using projections or custom joins you are changing shape of the query and you cannot use Include (EF will ignore it). The only way in such scenario is to manually include relations in the projected result set:
var result = from x in context.LargeTable
select new {
Id = x.Id,
Name = x.Name,
// You can filter or project relations as well
RelatedEnitites = x.SomeRelation.Where(...)
};
You can also project to specific type BUT that specific type must not be mapped (so you cannot for example project to LargeTable entity from my sample). Projection to the mapped entity can be done only on materialized data in Linq-to-objects.
Edit:
There is probably some misunderstanding how EF works. EF works on top of entities - entity is what you have mapped. If you map 500 columns to the entity, EF simply use that entity as you defined it. It means that querying loads entity and persisting saves entity.
Why it works this way? Entity is considered as atomic data structure and its data can be loaded and tracked only once - that is a key feature for ability to correctly persist changes back to the database. It doesn't mean that you should not load only subset of columns if you need it but you should understand that loading subset of columns doesn't define your original entity - it is considered as arbitrary view on data in your entity. This view is not tracked and cannot be persisted back to database without some additional effort (simply because EF doesn't hold any information about the origin of the projection).
EF also place some additional constraints on the ability to map the entity
Each table can be normally mapped only once. Why? Again because mapping table multiple times to different entities can break ability to correctly persist those entities - for example if any non-key column is mapped twice and you load instance of both entities mapped to the same record, which of mapped values will you use during saving changes?
There are two exceptions which allow you mapping table multiple times
Table per hierarchy inheritance - this is a mapping where table can contains records from multiple entity types defined in inheritance hierarchy. Columns mapped to the base entity in the hierarchy must be shared by all entities. Every derived entity type can have its own columns mapped to its specific properties (other entity types have these columns always empty). It is not possible to share column for derived properties among multiple entities. There must also be one additional column called discriminator telling EF which entity type is stored in the record - this columns cannot be mapped as property because it is already mapped as type discriminator.
Table splitting - this is direct solution for the single table mapping limitation. It allows you to split table into multiple entities with some constraints:
There must be one-to-one relation between entities. You have one central entity used to load the core data and all other entities are accessible through navigation properties from this entity. Eager loading, lazy loading and explicit loading works normally.
The relation is real 1-1 so both parts or relation must always exists.
Entities must not share any property except the key - this constraint will solve the initial problem because each modifiable property is mapped only once
Every entity from the split table must have a mapped key property
Insertion requires whole object graph to be populated because other entities can contain mapped required columns
Linq-to-Sql also contains ability to mark a column as lazy loaded but this feature is currently not available in EF - you can vote for that feature.
It leads to your options for optimization
Use projections to get read-only "view" for entity
You can do that in Linq query as I showed in the previous part of this answer
You can create database view and map it as a new "entity"
In EDMX you can also use Defining query or Query view to encapsulate either SQL or ESQL projection in your mapping
Use table splitting
EDMX allows you splitting table to many entities without any problem
Code first allows you splitting table as well but there are some problems when you split table to more than two entities (I think it requires each entity type to have navigation property to all other entity types from split table - that makes it really hard to use).
Create stored procedures that query the number of columns needed and then call the stored procs from code.
I have my app using core data with the data model below. However, I'm switching to a standard database with columns and rows. Can anyone help me with setting up this new database schema?
First of all you need to create tables for each of the Entities and their attributes (note I added "id" to each of the tables for relationships):
Routine (name, timestamp, id)
Exercise - this looks like a duplicate to me, so leaving one only here (muscleGroup, musclePicture, name, timeStamp, id)
Session (timeStamp, id)
Set (reps, timeStamp, unit, weight, id)
Now that you have tables that describe each of the entities, you need to create tables that will describe the relationships between these entities - as before table names are in capitals and their fields are in parenthesis:
RoutineExercises (routine_id, exercise_id)
SessionExercises (session_id, exercise_id)
ExerciseSets (exercise_id, set_id)
That's it! Now if you need to add an exercise to a routine, you simply:
Add an entry into Exercise table
Establish the relationship by adding a tuple into RoutineExercises table where routine_id is your routine ID and exercise_id is the ID of the newly created entry in the Exercise table
This will hold true for all the rest of the relationships.
NOTE: Your core data model has one-to-many and many-to-many relationships. If you want to specifically enforce that a relationship is one-to-many (e.g. Exercise can only have 1 routine), then you will need to make "exercise_id" as the index for the RoutineExercises table. If you want a many-to-many relationships to be allowed (i.e. each exercise is allowed to have multiple routines), then set the tuple of (routine_id, exercise_id) as the index.