JPA first-level cache and when it's filled - spring-data-jpa

I'm working with Spring Data JPA, and in the question "Hibernate first level cache is missed" the answer says: "Hibernate does not cache queries and query results by default. The only thing the first level cache is used is when you call EntityManger.find() you will not see a SQL query executing. And the cache is used to avoid object creation if the entity is already loading."
So, if I get an entity not by its id but by some other criteria and then update one of its properties inside a transactional method, I should not see an update SQL statement, because the entity has not been stored in the first-level cache, right?
According to the above answer, if I get a list of entities, they will not be stored in the first-level cache no matter what criteria I use to find them, right?
When a @Transactional(propagation = Propagation.NEVER) method loads the same entity by its id twice, isn't it supposed to hit the database twice, because each load will run in its own "transaction" and have its own persistence context? What is the expected behaviour in this case?
Thanks
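For reference, here is a minimal plain-Java sketch of how a persistence context is commonly described to behave. It uses no Hibernate or Spring classes, and SketchContext, Person, findById, queryByName and sqlLog are hypothetical names invented for illustration. The assumption it encodes is that a query by other criteria always goes to the database, but the entities it returns are still registered in the context by id, so a later lookup by id needs no SQL and a changed property is detected at flush time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity, for illustration only.
class Person {
    final long id;
    String name;
    Person(long id, String name) { this.id = id; this.name = name; }
}

// Hypothetical stand-in for a persistence context (first-level cache).
class SketchContext {
    private final Map<Long, String> table;                        // fake database rows: id -> name
    private final Map<Long, Person> managed = new HashMap<>();    // identity map of managed entities
    private final Map<Long, String> loadedName = new HashMap<>(); // snapshot used for dirty checking
    final List<String> sqlLog = new ArrayList<>();                // every simulated SQL statement

    SketchContext(Map<Long, String> table) { this.table = table; }

    // Lookup by id: answered from the identity map when the entity is already managed.
    Person findById(long id) {
        Person cached = managed.get(id);
        if (cached != null) return cached; // no SQL needed
        sqlLog.add("select ... where id = " + id);
        String name = table.get(id);
        return name == null ? null : register(id, name);
    }

    // Lookup by another criterion: always runs SQL, but rows are resolved against the identity map.
    List<Person> queryByName(String name) {
        sqlLog.add("select ... where name = '" + name + "'");
        List<Person> result = new ArrayList<>();
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (row.getValue().equals(name)) {
                Person existing = managed.get(row.getKey());
                result.add(existing != null ? existing : register(row.getKey(), row.getValue()));
            }
        }
        return result;
    }

    // Flush: compare every managed entity with its snapshot and write the changes.
    void flush() {
        for (Person p : managed.values()) {
            if (!p.name.equals(loadedName.get(p.id))) {
                sqlLog.add("update ... where id = " + p.id);
                table.put(p.id, p.name);
                loadedName.put(p.id, p.name);
            }
        }
    }

    private Person register(long id, String name) {
        Person p = new Person(id, name);
        managed.put(id, p);
        loadedName.put(id, name);
        return p;
    }
}

class FirstLevelCacheDemo {
    public static void main(String[] args) {
        Map<Long, String> rows = new HashMap<>();
        rows.put(1L, "Ann");
        SketchContext ctx = new SketchContext(rows);
        Person byQuery = ctx.queryByName("Ann").get(0); // runs SQL, entity becomes managed
        Person byId = ctx.findById(1L);                  // served from the context
        byQuery.name = "Bob";
        ctx.flush();                                     // dirty check issues the update
        System.out.println(byQuery == byId);
        System.out.println(ctx.sqlLog);
    }
}
```

In this model the log ends up with two statements, the query and the update; the lookup by id adds nothing. Whether two loads share one context in a Propagation.NEVER method depends on how the EntityManager is scoped, which the sketch does not model.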

Related

How to rehash and clean existing entities in the database?

We have an entity and a corresponding table in the database with one additional column that contains a digest (hash) of the entity's fields, calculated programmatically in the application each time. The entity has associations with two additional tables/entities whose fields also take part in hashing.
Now a decision was made to remove one of the fields (a boolean flag) from the main entity and exclude it from hashing, since it makes two otherwise identical entities get different hashes when one entity has its flag set to true and the other has it set to false. Since the hashes are different, both entities get stored in the database, which is not what we want.
Removing the field is simple, but we also need to recalculate the hashes for entities that are already stored in the database. Since there might be duplicates, we also need to get rid of one of each pair of duplicated entries. This whole operation must be done once, after migration.
The stack we use is Quarkus, Flyway, Hibernate with Panache, and PostgreSQL. I have tried to use Flyway callbacks with Event.AFTER_MIGRATE to get all existing entities from the db, but I can't use Panache since it's not initialised yet by the time the callback fires. Using plain java.sql.* Connection and Statement is pretty cumbersome, because I need to fetch data from 3 tables, create an entity from all of the fields, recalculate the hash and write it back, while taking care of possible conflicts. Another option would be to create a new REST API endpoint specifically for the job, which the client would have to call after the app has booted, but I don't feel that is the best solution.
How do you tackle this kind of a situation?
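Independent of where the job is triggered, the re-hash and de-duplication step itself can be sketched in plain Java. Everything named here is hypothetical: Row, hashOf, dedupe, the field names, and the choice of SHA-256 (the question does not say which digest is used). The sketch assumes the hash is taken over the remaining fields in a fixed order and that the first row seen for a given hash is the one kept.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattened view of the main entity plus the values from its associated tables.
class Row {
    final long id;
    final String name;
    final List<String> associated;
    Row(long id, String name, List<String> associated) {
        this.id = id;
        this.name = name;
        this.associated = associated;
    }
}

class RehashSketch {
    // Hash over the fields that remain after the flag is dropped (SHA-256 is an assumption).
    static String hashOf(Row row) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(row.name.getBytes(StandardCharsets.UTF_8));
            for (String value : row.associated) {
                digest.update((byte) 0); // separator, so ("ab","c") differs from ("a","bc")
                digest.update(value.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keeps the first row per new hash; ids of the dropped duplicates are appended to removedIds.
    static Map<String, Row> dedupe(List<Row> rows, List<Long> removedIds) {
        Map<String, Row> survivors = new LinkedHashMap<>();
        for (Row row : rows) {
            Row previous = survivors.putIfAbsent(hashOf(row), row);
            if (previous != null) removedIds.add(row.id);
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a", Arrays.asList("x", "y")),
                new Row(2, "a", Arrays.asList("x", "y")), // identical once the flag is ignored
                new Row(3, "b", Arrays.asList("x")));
        List<Long> removed = new ArrayList<>();
        Map<String, Row> survivors = dedupe(rows, removed);
        System.out.println(survivors.size() + " rows kept, removed ids: " + removed);
    }
}
```

The survivors map gives the new hash to write back for each kept row, and removedIds lists the rows to delete; how those writes reach PostgreSQL (JDBC in a callback, a startup job, or something else) is a separate choice.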

How does EntityFramework Core manage data internally?

I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Find(123).Where(x => x.id == 123).Include(x => x.Children).First(), will it use the in-memory entity and only query the DB for "Children", or does it re-query the entire dataset?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?
What you really ask is how caching works in EF Core, not how DbContext manages data.
EF has always offered 1st-level caching: it keeps the entities it loads in memory for as long as the context remains alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities. You'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
In Example #2 the queries are different, though. Include forces eager loading, so the results and entities returned are different. There's no need to run a second query, though, if the first entity and its context are still around. You could just iterate over the Children property and, if lazy loading is enabled, EF would load the related entities on first access.
Lazy loading costs one extra query for each navigation it has to load. If you do this for many entities, it is slow, slow enough to have its own name: the N+1 selects problem. To avoid this you can load a related collection explicitly using explicit loading, e.g.:
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().
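To make the cost difference concrete, here is a small plain-Java sketch that only counts simulated round trips; it is not EF Core code. FakeDb, loadTodos, loadChildrenOf and loadTodosWithChildren are hypothetical names. It assumes lazy access triggers one query per parent whose children are touched, while an eager load fetches everything in a single query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts round trips.
class FakeDb {
    private final Map<Integer, List<String>> childrenByTodoId = new LinkedHashMap<>();
    int queryCount = 0;

    void add(int todoId, List<String> children) { childrenByTodoId.put(todoId, children); }

    // One query returning only the parent ids.
    List<Integer> loadTodos() {
        queryCount++;
        return new ArrayList<>(childrenByTodoId.keySet());
    }

    // One query per call: what touching a lazy navigation would trigger.
    List<String> loadChildrenOf(int todoId) {
        queryCount++;
        return childrenByTodoId.get(todoId);
    }

    // One query returning the parents together with their children (eager load).
    Map<Integer, List<String>> loadTodosWithChildren() {
        queryCount++;
        return new LinkedHashMap<>(childrenByTodoId);
    }
}

class NPlusOneSketch {
    // Lazy style: 1 query for the parents plus 1 query per parent.
    static int countChildrenLazily(FakeDb db) {
        int total = 0;
        for (int id : db.loadTodos()) total += db.loadChildrenOf(id).size();
        return total;
    }

    // Eager style: a single query.
    static int countChildrenEagerly(FakeDb db) {
        int total = 0;
        for (List<String> children : db.loadTodosWithChildren().values()) total += children.size();
        return total;
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        db.add(1, Arrays.asList("a", "b"));
        db.add(2, Arrays.asList("c"));
        db.add(3, Arrays.asList("d", "e", "f"));
        countChildrenLazily(db);
        System.out.println("lazy round trips: " + db.queryCount);
        db.queryCount = 0;
        countChildrenEagerly(db);
        System.out.println("eager round trips: " + db.queryCount);
    }
}
```

With three parents the lazy variant makes four round trips and the eager one makes one, which is the N+1 shape in miniature.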

Eclipselink disable cache for stored procedure

I have two stored procedure calls that return a User entity. One looks to see if the user is registered by two parameters not included in the user entity. If the procedure does not return any users, a second stored procedure is called to register that user.
The behavior I'm seeing is that when called in this order, the second stored procedure returns a User entity from the cache that has nearly all the fields as null. When I disable caching it returns the user object appropriately. It would seem that the first call is caching the user object.
In normal operation where a user is logging in, I want it to cache, so I do not want to disable caching for the first call. I want the second stored procedure call to not use the cache. After doing some research and testing, I've found a few options.
This doesn't work on a stored procedure:
proc.setHint("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
java.lang.IllegalArgumentException: Query linkUser, query hint javax.persistence.cache.retrieveMode is not valid for this type of query.
This looks like it evicts the cache for all Users.
em.getEntityManagerFactory().getCache().evict(User.class);
And these options either disable cache for all instances of the entity or across the application.
How can I not use cache for a single stored procedure call with Eclipselink?
Bonus: Why would a stored procedure call that returns a null user be cached?
The JPA specification requires that all entities returned from JPA queries (which includes native and stored procedure queries) be managed, which implies they are also cached to maintain object identity. If your first query returns an incomplete entity, that entity is cached too. Applications that use queries returning entities need to make sure those queries return a complete set of data, or they can corrupt the cache. Also note that entities may be pulled from the cache instead of being rebuilt from the query's data, so you may want to return plain Java objects (constructor queries) rather than JPA entities.
For the answer to the first part, see https://stackoverflow.com/a/4471109/496099
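The effect described in this answer can be sketched without EclipseLink. In the plain-Java model below, SharedCacheSketch, User, resolve and evict are hypothetical names. The assumption is that a cache keyed by entity id hands back the instance it already holds instead of rebuilding it from the new row, which is how a partially populated first result can hide a complete second one until that entry is evicted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity; a null field models a column the query did not return.
class User {
    final long id;
    final String name;
    final String email;
    User(long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

// Hypothetical stand-in for an entity cache keyed by primary key.
class SharedCacheSketch {
    private final Map<Long, User> byId = new HashMap<>();

    // Called for each row a query returns: an already cached instance wins over the new row.
    User resolve(User builtFromRow) {
        User cached = byId.putIfAbsent(builtFromRow.id, builtFromRow);
        return cached != null ? cached : builtFromRow;
    }

    // Drops a single entry so that the next row with this id is used as-is.
    void evict(long id) { byId.remove(id); }
}

class CacheCorruptionDemo {
    public static void main(String[] args) {
        SharedCacheSketch cache = new SharedCacheSketch();
        cache.resolve(new User(7L, null, null));                  // incomplete first result
        User second = cache.resolve(new User(7L, "ann", "ann@example.org"));
        System.out.println("before evict: " + second.name);
        cache.evict(7L);
        User third = cache.resolve(new User(7L, "ann", "ann@example.org"));
        System.out.println("after evict: " + third.name);
    }
}
```

Before the eviction the second call still sees the incomplete instance; after it, the complete row is used.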
Found a Java EE Tutorial that shows how to evict an individual entity from the cache. I'm still not sure why it's even being cached, because the first call never has the userId.
Cache cache = em.getEntityManagerFactory().getCache();
cache.evict(User.class, userId);

What is the correct way to get EclipseLink JPA query.getResultList() results in the cache?

When I use EclipseLink JPA query.getResultList(), it doesn't store the results in the cache, so when I call merge, the first call does a select all, then an update for each object.
What's the correct call to get query results into the cache?
I'm thinking of doing the query and then calling EntityManager find for each result, but that seems wrong. Obviously I can't call find first, as I don't know the object id.
Basically I want to cache all the data in-memory (in the cache) and have updates as quick as possible.
Thanks
EclipseLink caches every object returned by getResultList() in the shared (L2) cache by default.
If you are not getting caching, then you have mis-configured something.
Please include your code, configuration, and SQL log.
Are you using Spring? (see http://www.eclipse.org/forums/index.php/t/200321/)
Ensure you have not disabled the shared cache, or configured refreshing.
How many objects are you reading, and how long is it from the query to the merge? If you are reading a lot of objects, you may need to increase the cache size (default is 100), or change the cache type.
See, http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching

Create new or update existing entity at one go with JPA

I have a JPA entity that has a timestamp field and is distinguished by a complex identifier field. What I need is to update the timestamp of an entity that has already been stored, or otherwise create and store a new entity with the current timestamp.
As it turns out, the task is not as simple as it seems at first sight. The problem is that in a concurrent environment I get a nasty "Unique index or primary key violation" exception. Here's my code:
// Load existing entity, if any.
Entity e = entityManager.find(Entity.class, id);
if (e == null) {
    // Could not find entity with the specified id in the database, so create new one.
    e = entityManager.merge(new Entity(id));
}
// Set current time...
e.setTimestamp(new Date());
// ...and finally save entity.
entityManager.flush();
Please note that in this example entity identifier is not generated on insert, it is known in advance.
When two or more threads run this block of code in parallel, they may simultaneously get null from the entityManager.find(Entity.class, id) call, so they will attempt to save two or more entities with the same identifier at the same time, resulting in an error.
I think there are a few solutions to the problem.
Sure, I could synchronize this code block with a global lock to prevent concurrent access to the database, but would it be the most efficient way?
Some databases support a very handy MERGE statement that updates an existing row or creates a new one if none exists. But I doubt that OpenJPA (the JPA implementation of my choice) supports it.
Even if JPA does not support SQL MERGE, I can always fall back to plain old JDBC and do whatever I want with the database. But I don't want to leave the comfortable API and mess with a hairy JDBC+SQL combination.
There is a magic trick to fix it using standard JPA API only, but I don't know it yet.
Please help.
You are referring to the transaction isolation of JPA transactions, i.e. how transactions behave when they access resources that other transactions are using.
According to this article:
READ_COMMITTED is the expected default Transaction Isolation level for using [..] EJB3 JPA
This means that - yes, you will have problems with the above code.
But JPA doesn't support custom isolation levels.
This thread discusses the topic more extensively. Depending on whether you use Spring or EJB, I think you can make use of the proper transaction strategy.
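One pattern sometimes used for the race in the question is to let the unique constraint decide the winner: attempt the insert, and if it fails with a key violation, fall back to updating the row that the other thread created. This is not something the answer above proposes; it is a sketch of a different technique, modelled in plain Java with no JPA involved. FakeTable, DuplicateKeyException and touch are hypothetical names, and ConcurrentHashMap.putIfAbsent stands in for the primary-key constraint. In real JPA code a failed insert typically marks the transaction for rollback, so the retry would have to run in a fresh transaction; that part is not modelled here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the "unique index or primary key violation" error.
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String id) { super("duplicate key: " + id); }
}

// Hypothetical table with a unique key: id -> timestamp.
class FakeTable {
    private final ConcurrentMap<String, Long> timestampById = new ConcurrentHashMap<>();

    Long find(String id) { return timestampById.get(id); }

    // Fails if the key already exists, like an INSERT against a primary key.
    void insert(String id, long timestamp) {
        if (timestampById.putIfAbsent(id, timestamp) != null) throw new DuplicateKeyException(id);
    }

    void update(String id, long timestamp) { timestampById.put(id, timestamp); }

    int size() { return timestampById.size(); }
}

class UpsertSketch {
    // Create the row if it is missing, otherwise refresh its timestamp.
    static void touch(FakeTable table, String id, long now) {
        if (table.find(id) == null) {
            try {
                table.insert(id, now);
                return;
            } catch (DuplicateKeyException lostRace) {
                // Another thread inserted the row between find and insert; update it instead.
            }
        }
        table.update(id, now);
    }
}

class UpsertDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeTable table = new FakeTable();
        Thread a = new Thread(() -> UpsertSketch.touch(table, "k", 1L));
        Thread b = new Thread(() -> UpsertSketch.touch(table, "k", 2L));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("rows: " + table.size());
    }
}
```

Whichever thread loses the race ends up on the update path, so there is one row and no exception reaches the caller. Which timestamp wins depends on thread scheduling.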