JPA/EclipseLink: disable cascade for certain merge operations? - jpa

I am using EclipseLink 2.3.3 with a data model of about 100 entities. I have a Java class mapped to each database table using annotations.
I have two use cases to implement. One is that a new record enters the system that hits about 60-75 of the tables. For this case, I want merge and persist to cascade, so that I can just merge the top level object and have that cascade to all related entities.
Another use case is that I need to insert a collection of individual objects, often one from each of a bunch of different tables. In this case I don't want the cascading merge, because I need to have control over the insertions. If I have cascade enabled, merging the first object might or might not merge the other objects, depending on if or how they are related, so I'd rather explicitly merge each of them.
So essentially, I want cascading merge and persist in one situation, but not another. So if I include the cascade annotations in the mapped classes, I need to selectively disable the cascading for certain operations; or, if I turn off cascading in the mapped classes, I would like to enable cascading for certain operations.
So far I am not finding any way to selectively turn on or off cascading for a particular operation. There is a CascadePolicy class but that seems to only be used with queries. There are dynamic entities, and I was thinking perhaps I could use that to do something like create a dynamic entity from an existing entity and turn off the cascading behavior on that entity's relationships and somehow use that for the merge, but I have not been able to find the right API for that.
So I am wondering if there is a better answer somewhere that I'm overlooking? Thanks for any information.

I'm not certain about what level of control you are after, especially in the case that you mention you want to insert individual objects. From the sounds of it, cascade merge is exactly what you want for your Entity object tree in the first case for use with the EntityManager.merge. Merge called on an entity will check if it is new or not, and update or insert as appropriate. Marking relationships as cascade merge will allow finding new objects and having them inserted.
The second case though, where you want to handle individual insertions: why not exclude the cascade persist option on mappings and just call EntityManager.persist on the objects you want to insert? Persist then will not cascade, so only the entity you call em.persist on will get inserted. Relationships will be used just to set the foreign key values, though you might want to leave them nulled out and set them later as part of larger merge calls. Both sides of bidirectional relationships need to be maintained, and if the other side exists and doesn't get merged, its relationship changes are not stored.
If that isn't what you want, EclipseLink has native API on the UnitOfWork (the EntityManager essentially wraps a UnitOfWork for transactional work) that allows you to specify the merge policy. See deepMergeClone, mergeClone and shallowMergeClone on UnitOfWork, which essentially use CASCADE_ALL_PARTS, CASCADE_PRIVATE_PARTS and NO_CASCADE respectively as the merge policies, while the JPA merges use CASCADE_BY_MAPPING.
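To make the difference between the merge policies concrete, here is a toy model in plain Java. It is not EclipseLink code: the Policy enum only borrows the constant names mentioned above, and Node, Ref and merge are hypothetical names invented for this sketch. It shows which objects a merge would reach from a root object under each policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class MergePolicyDemo {
    // Names borrowed from the answer above; this enum is a stand-in, not the EclipseLink type.
    enum Policy { NO_CASCADE, CASCADE_PRIVATE_PARTS, CASCADE_ALL_PARTS, CASCADE_BY_MAPPING }

    // One relationship from a node to another object (hypothetical model).
    static final class Ref {
        final Node target;
        final boolean privateOwned;  // plays the role of a private-owned mapping
        final boolean cascadeMerge;  // plays the role of cascade = MERGE on the mapping
        Ref(Node target, boolean privateOwned, boolean cascadeMerge) {
            this.target = target;
            this.privateOwned = privateOwned;
            this.cascadeMerge = cascadeMerge;
        }
    }

    static final class Node {
        final String name;
        final List<Ref> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node link(Node target, boolean privateOwned, boolean cascadeMerge) {
            refs.add(new Ref(target, privateOwned, cascadeMerge));
            return this;
        }
    }

    // Returns the names of the objects a merge starting at root would touch.
    static List<String> merge(Node root, Policy policy) {
        List<String> merged = new ArrayList<>();
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        visit(root, policy, seen, merged);
        return merged;
    }

    private static void visit(Node node, Policy policy, Set<Node> seen, List<String> merged) {
        if (!seen.add(node)) {
            return; // already merged, avoids cycles
        }
        merged.add(node.name);
        for (Ref ref : node.refs) {
            boolean follow;
            switch (policy) {
                case CASCADE_ALL_PARTS:     follow = true; break;
                case CASCADE_PRIVATE_PARTS: follow = ref.privateOwned; break;
                case CASCADE_BY_MAPPING:    follow = ref.cascadeMerge; break;
                default:                    follow = false; // NO_CASCADE
            }
            if (follow) {
                visit(ref.target, policy, seen, merged);
            }
        }
    }

    public static void main(String[] args) {
        Node address = new Node("address");
        Node customer = new Node("customer");
        Node order = new Node("order")
                .link(address, true, true)     // private-owned, cascade merge
                .link(customer, false, false); // plain reference
        System.out.println(merge(order, Policy.NO_CASCADE));            // [order]
        System.out.println(merge(order, Policy.CASCADE_PRIVATE_PARTS)); // [order, address]
        System.out.println(merge(order, Policy.CASCADE_ALL_PARTS));     // [order, address, customer]
        System.out.println(merge(order, Policy.CASCADE_BY_MAPPING));    // [order, address]
    }
}
```

In this model the "individual insertions" use case corresponds to NO_CASCADE: only the object handed to merge is touched, whatever the mappings say.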

Related

How to rehash and clean existing entities in the database?

We have an entity and a corresponding table in the database with one additional column which contains digested hash of the entity fields, calculated each time programmatically in application. Entity has associations with two additional tables/entities which fields also take part in hashing.
Now a decision was made to get rid of one of the fields from the main entity (boolean flag) and exclude it from hashing, since it makes two otherwise identical entities get different hashes when one entity has its flag set to true, while other is false. Since hashes are different both entities get stored in the database, which is not what we want.
Removing the field is simple, but we also need to re-calculate hashes for entities which have been already stored in the database. Since there might be duplicates, we also need to get rid of one of two duplicated entries. This whole operation must be done once after migration.
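The recalculation and de-duplication step described here can be sketched independently of Flyway, Panache or JDBC. Everything in the sketch is an assumption made for illustration: the Item class and its fields are hypothetical, SHA-256 stands in for whatever digest the application really uses, and "keep the first row seen" is just one possible rule for choosing the survivor.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RehashDemo {
    // Hypothetical row: fields that still take part in hashing, plus the dropped flag.
    static final class Item {
        final long id;
        final String name;
        final String detail;
        final boolean flag; // no longer part of the hash
        Item(long id, String name, String detail, boolean flag) {
            this.id = id;
            this.name = name;
            this.detail = detail;
            this.flag = flag;
        }
    }

    // New hash over the remaining fields only (SHA-256 is an assumption).
    static String hash(Item item) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((item.name + "\u0000" + item.detail).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns new hash -> surviving item; ids of later duplicates are appended to duplicateIds.
    static Map<String, Item> rehash(List<Item> items, List<Long> duplicateIds) {
        Map<String, Item> survivors = new LinkedHashMap<>();
        for (Item item : items) {
            Item alreadyKept = survivors.putIfAbsent(hash(item), item);
            if (alreadyKept != null) {
                duplicateIds.add(item.id); // same hash as an earlier row: candidate for deletion
            }
        }
        return survivors;
    }
}
```

The survivors map gives the hash to write back for each kept row, and duplicateIds lists the rows to delete. How those updates reach the database is left open.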
The stack we use is Quarkus, Flyway, Hibernate with Panache, and PostgreSQL. I have tried to use Flyway callbacks with Event.AFTER_MIGRATE to get all existing entities from the db, but I can't use Panache since it's not initialised yet by the time the callback fires. Using plain java.sql.* Connection and Statement is pretty cumbersome, because I need to fetch data from 3 tables, create an entity from all of the fields, re-calculate the hash and put it back, while taking care of possible conflicts. Another option would be to create a new REST API endpoint specifically for the job, which the client would have to call after the app has booted, but somehow I don't feel that that is the best solution.
How do you tackle this kind of a situation?

When to use an owned entity types vs just creating a foreign key or adding the columns directly to the table?

I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use that. Especially when using .ToTable(); although I am not sure if ToTable creates a relationship with keys.
I read the entire article, so I understand that it essentially forces you to access the data via nav properties and prevents the owned table from being treated as an entity. They also say Include() is not needed and the data comes down with every query for the parent table, so it's not like you are reducing the amount of data that comes back.
So what's the point exactly? Also, what's the point of "table splitting"?
It takes the place of complex types, with the option to set it up like a 1-1 relationship with ToTable while being automatically eager-loaded. This would use the same PK in both tables, same as 1-1.
The point of table splitting would be that you want an object model that is normalized where the table structure is not. This would fit scenarios where you have an existing table structure and want to split off related pieces of that data into sub-entities associated with the main entity. With the ToTable option, it would be similar to a 1-1 relationship, but automatically eager-loaded. However, when considering the reasons to use a 1-1 relationship, I would consider this option a bad choice.
The common reasons for using it in normal 1-1 relationships would include:
Splitting off expensive to load, rarely used data. (images, binary, memo)
Encapsulating data particular to a single application off of a common entity. For example, if I have a "Customer" which is used by a billing system and a CRM, I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than an inherited BillingCustomer / CRMCustomer, as there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, CRM doesn't care about Billing. If all data is in "Customer" then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
In neither of these cases would I want to use table splitting or anything that automatically eager-loads, so owned types with ToTable would not replace 1-1 relationships by any stretch. It's essentially a stricter version of complex types; I'd say it's used purely for entity organization. Not something I'd admit to wanting to use very often.

JPA CascadeType priority?

Using JPA, I have a question relating to CascadeTypes.
For example:
@ManyToMany(fetch=FetchType.LAZY, cascade={CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REFRESH})
behaves differently from this:
@ManyToMany(fetch=FetchType.LAZY, cascade={CascadeType.MERGE, CascadeType.PERSIST, CascadeType.REFRESH})
Why?
I need CascadeType.PERSIST to automatically insert referenced objects in my entity class, and I need MERGE because I don't want duplicate entries in my tables. But when I define PERSIST first, merging doesn't work; when I define MERGE first, persist doesn't work.
Why?
The JPA specification is actually a very readable document and can be downloaded here:
https://jcp.org/aboutJava/communityprocess/final/jsr317/index.html
Inside it on page 384 it covers the cascade attribute of the ManyToMany annotation:
The cascade element specifies the set of cascadable operations that
are propagated to the associated entity. The operations that are
cascadable are defined by the CascadeType enum: public enum
CascadeType { ALL, PERSIST, MERGE, REMOVE, REFRESH, DETACH}; The value
cascade=ALL is equivalent to cascade={PERSIST, MERGE, REMOVE, REFRESH,
DETACH}.
As you can see, it says nothing about the order. What is probably happening is that your application is sometimes using a new object that needs to be persisted and sometimes loading one from the database that then needs to be merged. In other words, it's an application issue.
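A quick way to see why declaration order should not matter: the spec text describes the cascade element as a set of operations. The sketch below uses a local stand-in enum with the same constants as the quoted CascadeType (it is not the real javax.persistence type, so it runs without a JPA provider) and treats the declared array as a set; asSet is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

final class CascadeOrderDemo {
    // Local stand-in mirroring the constants quoted from the spec above.
    enum CascadeType { ALL, PERSIST, MERGE, REMOVE, REFRESH, DETACH }

    // Treats the declared array as a set, which is how the spec wording describes it.
    static Set<CascadeType> asSet(CascadeType... declared) {
        EnumSet<CascadeType> set = EnumSet.noneOf(CascadeType.class);
        set.addAll(Arrays.asList(declared));
        return set;
    }

    public static void main(String[] args) {
        Set<CascadeType> first = asSet(CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REFRESH);
        Set<CascadeType> second = asSet(CascadeType.MERGE, CascadeType.PERSIST, CascadeType.REFRESH);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Both declarations reduce to the same set, so any difference in behavior has to come from how the entities are obtained and saved, not from the annotation.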
Personally I use a DIY approach to merging entities in my persistence context. A good article to read on the subject is here:
http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/

Aggregate Root support in Entity Framework

How can we tell Entity Framework about Aggregates?
when saving an aggregate, save entities within the aggregate
when deleting an aggregate, delete entities within the aggregate
raise a concurrency error when two different users attempt to modify two different entities within the same aggregate
when loading an aggregate, provide a consistent point-in-time view of the aggregate even if there is some time delay before we access all entities within the aggregate
(Entity Framework 4.3.1 Code First)
EF provides features which allow you to define your aggregates and use them:
This is the most painful part. EF works with entity graphs. If you have an entity like Invoice and this entity has a collection of related InvoiceLine entities, you can approach it as an aggregate. In an attached scenario everything works as expected, but in a detached scenario (either the aggregate is not loaded by EF or it is loaded by a different context instance) you must attach the aggregate to the context instance and tell it exactly what you changed, i.e. set the state for every entity and independent association in the object graph.
This is handled by cascade delete - if you have related entities loaded, EF will delete them but if you don't you must have cascade delete configured on the relation in the database.
This is handled by concurrency tokens in the database - most commonly either timestamp or rowversion columns.
You must either use eager loading and load all data together at the beginning (= consistent point of view), or use lazy loading, in which case you will not have a consistent point of view: lazy loading will load the current state of relations but will not update other parts of the aggregate you have already loaded (and I consider it a performance killer if you try to implement such refreshing with EF).
I wrote GraphDiff specifically for this purpose. It allows you to define an 'aggregate boundary' on update by providing a fluent mapping. I have used it in cases where I needed to pass detached entity graphs back and forth.
For example:
// Update method of repository
public void Update(Order order)
{
    context.UpdateGraph(order, map => map
        .OwnedCollection(p => p.OrderItems));
}
The above would tell the Entity Framework to update the order entity and also merge the collection of OrderItems. Mapping in this fashion allows us to ensure that the Entity Framework only manages the graph within the bounds that we define on the aggregate and ignores all other properties. It supports optimistic concurrency checking of all entities. It handles much more complicated scenarios and can also handle updating references in many to many scenarios (via AssociatedCollections).
Hope this can be of use.

I don't need/want a key!

I have some views that I want to use EF 4.1 to query. These are specific optimized views that will not have keys to speak of; there will be no deletions, no updates, just good ol' select.
But EF wants a key set on the model. Is there a way to tell EF to move on, there's nothing to worry about?
More Details
The main purpose of this is to query against a set of views that have been optimized by size, query parameters and joins. The underlying tables have their PKs, FKs and so on. It's indexed, statiscized (that a word?) and optimized.
I'd like to have a class like (this is a much smaller and simpler version of what I have...):
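public class MyObject // this is a view
{
    // property types are assumed; the original snippet omitted them
    public string Name { get; set; }
    public int Age { get; set; }
    public int TotalPimples { get; set; }
}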
and a repository, built off of EF 4.1 CF where I can just
public List<MyObject> GetPimply(int numberOfPimples)
{
return db.MyObjects.Where(d=> d.TotalPimples > numberOfPimples).ToList();
}
I could expose a key, but what's the real purpose of displaying a 2 or 3 column natural key that will never be used?
Current Solution
Seeing as there will be no EF CF solution, I have added a complex key to the model and I am exposing it in the model. While this goes "with the grain" of what one expects a "well designed" db model to look like, in this case, IMHO, it added nothing but more logic to the model builder, more bytes over the wire, and extra properties on a class. These will never be used.
There is no way. EF demands unique identification of the record - entity key. That doesn't mean that you must expose any additional column. You can mark all your current properties (or any subset) as a key - that is exactly how EDMX does it when you add database view to the model - it goes through columns and marks all non-nullable and non-computed columns as primary key.
You must be aware of one problem - EF internally uses identity map and entity key is unique identification in this map (each entity key can be associated only with single entity instance). It means that if you are not able to choose unique identification of the record and you load multiple records with the same identification (your defined key) they will all be represented by a single entity instance. Not sure if this can cause you any issues if you don't plan to modify these records.
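The identity-map effect described above can be illustrated with a toy model in plain Java. This is not EF code: Row, materialize and the choice of "name" as the declared key are hypothetical, and the map only mimics the idea that one key resolves to one instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IdentityMapDemo {
    // Hypothetical view row; "name" plays the role of the declared entity key.
    static final class Row {
        final String name;
        final int age;
        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Materializes rows the way an identity map would: the first instance seen for a key is reused.
    static List<Row> materialize(List<Row> rawRows) {
        Map<String, Row> identityMap = new HashMap<>();
        List<Row> result = new ArrayList<>();
        for (Row raw : rawRows) {
            result.add(identityMap.computeIfAbsent(raw.name, key -> raw));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> raw = List.of(new Row("Ann", 30), new Row("Ann", 45));
        List<Row> loaded = materialize(raw);
        // Both positions hold the same instance, so the second row's age (45) is lost.
        System.out.println(loaded.get(0) == loaded.get(1)); // prints "true"
        System.out.println(loaded.get(1).age);              // prints "30"
    }
}
```

So even for read-only use, a key that is not actually unique can make distinct view rows show up as repeated copies of one record.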
EF is looking for a unique way to identify records. I am not sure if you can force it to go counter to its nature of desiring something unique about objects.
But, this is an answer to the "show me how to solve my problem the way I want to solve it" question and not actually tackling your core business requirement.
If this is a "I don't want to show the user the key", then don't bind it when you bind the data to your form (web or windows). If this is a "I need to share these items, but don't want to give them the keys" issue, then map or surrogate the objects into an external domain model. Adds a bit of weight to the solution, but allows you to still do the heavy lifting with a drag and drop surface (EF).
The question is what is the business requirement that is pushing you to create a bunch of objects without a unique identifier (key).
One way to do this would be not to use views at all.
Just add the tables to your EF model and let EF create the SQL that you are currently writing by hand.