Spring JPA/Hibernate Repository findAll is doing N+1 requests instead of a JOIN by default in Kotlin - postgresql

I am working in a Spring JPA/Hibernate application with Kotlin and I want to find all elements in an entity.
That entity has a foreign key with a @ManyToOne relationship. I want to get all elements with their associated values with a JOIN query avoiding the N+1 problem.
One thing is that the foreign keys are not related to the primary keys, but to another unique field in the entities (UUID).
I was able to make that query with a JOIN creating a custom Query with a JOIN FETCH, but my point is to avoid creating those queries and make those JOINS in all findAlls by default.
Is that possible or do I have to make a query in JPQL manually to force the JOIN FETCH?
Here is the example code:
@Entity
data class A(
    @Id
    val id: Long,
    @Column
    val uuid: UUID,
    @Column
    val name: String
)
@Entity
data class B(
    @Id
    val id: Long,
    ...
    @Fetch(FetchMode.JOIN)
    @ManyToOne
    @JoinColumn(name = "a_uuid", referencedColumnName = "uuid", insertable = false, updatable = false)
    val a: A
)
@Repository
interface Repo : CrudRepository<B, Long>
...
repo.findAll() // <-- This triggers N+1 queries instead of making a JOIN
...

Another option is to use an EntityGraph. It lets you define a template that groups the related persistent fields you want to retrieve, and lets you choose the graph type at runtime.
Here is an example based on your code.
@Entity
data class A(
    @Id
    val id: Long,
    @Column
    val uuid: UUID,
    @Column
    val name: String
) : Serializable
@NamedEntityGraph(
    name = "b_with_all_associations",
    includeAllAttributes = true
)
@Entity
data class B(
    @Id
    val id: Long,
    @ManyToOne
    @JoinColumn(name = "a_uuid", referencedColumnName = "uuid")
    val a: A
)
@Repository
interface ARepo : CrudRepository<A, Long>
@Repository
interface BRepo : CrudRepository<B, Long> {
    @EntityGraph(value = "b_with_all_associations", type = EntityGraph.EntityGraphType.FETCH)
    override fun findAll(): List<B>
}
@Service
class Main(
    private val aRepo: ARepo,
    private val bRepo: BRepo
) : CommandLineRunner {
    override fun run(vararg args: String?) {
        (1..3L).forEach {
            val a = aRepo.save(A(id = it, uuid = UUID.randomUUID(), name = "Name-$it"))
            bRepo.save(B(id = it + 100, a = a))
        }
        println("===============================================")
        println("===============================================")
        println("===============================================")
        println("===============================================")
        bRepo.findAll()
    }
}
On the B entity, an entity graph named "b_with_all_associations" is defined, and it is applied to the findAll method of B's repository with the FETCH type.
Together, these avoid the N+1 problem by fetching A with a join.
Here is the SQL log for bRepo.findAll().
select
b0_.id as id1_1_0_,
a1_.id as id1_0_1_,
b0_.a_uuid as a_uuid2_1_0_,
a1_.name as name2_0_1_,
a1_.uuid as uuid3_0_1_
from
b b0_
left outer join
a a1_
on b0_.a_uuid=a1_.uuid
ps1. Because of this issue, I don't recommend using a many-to-one relationship that references a non-PK column. It forces the 'One' entity to implement java.io.Serializable.
ps2. EntityGraph is a good answer to your question when you want to solve the N+1 problem with a join. But I would recommend a better solution: try to solve it with lazy loading.
ps3. Using non-PK associations with Hibernate is not a good idea. I fully agree with this comment. I think it's a bug that has not been fixed yet. It breaks Hibernate's lazy loading mechanism.

As far as I know, the fetch mode only applies to EntityManager.find related queries or when doing lazy loading but never when executing HQL queries, which is what is happening behind the scenes. If you want this to be join fetched, you will have to use an entity graph, which is IMO also better as you can define it per use-site, rather than globally.

I don't know how to configure exactly what you are asking, but the following suggestion might be worth considering...
Change
@Fetch(FetchMode.JOIN)
@ManyToOne
@JoinColumn(name = "a_uuid", referencedColumnName = "uuid", insertable = false, updatable = false)
val a: A
to
@ManyToOne(fetch = javax.persistence.FetchType.LAZY)
@JoinColumn(name = "a_uuid", referencedColumnName = "uuid", insertable = false, updatable = false)
val a: A
And then on your entity A, add the annotation to the class
@BatchSize(size = 1000)
Or whatever batch-size you feel to be appropriate.
This will generally give you the results in 2 queries if you have fewer than 1000 results. It will load a proxy for A rather than joining to A, but the first time an A is accessed, it will populate the proxies for up to BATCH_SIZE entities in one query.
It reduces the number of queries from
N + 1
to
1 + round_up(N / BATCH_SIZE)
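To make the arithmetic concrete, here is a minimal sketch of the estimate above in plain Java. The class and method names are made up for illustration and are not part of Hibernate. It assumes every batch is filled completely, so the real number of queries can differ somewhat depending on the Hibernate version and configuration.

```java
// Sketch only: estimates the number of SELECTs, it does not talk to Hibernate.
// BatchQueryEstimate and its methods are hypothetical names.
class BatchQueryEstimate {

    // Without batching: 1 query for the B rows plus 1 query per referenced A.
    static long withoutBatching(long referencedParents) {
        return 1 + referencedParents;
    }

    // With @BatchSize: 1 query for the B rows plus 1 query per (full or partial) batch of A.
    static long withBatchSize(long referencedParents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long batches = (referencedParents + batchSize - 1) / batchSize; // ceiling division
        return 1 + batches;
    }

    public static void main(String[] args) {
        System.out.println(withoutBatching(2500));     // prints 2501
        System.out.println(withBatchSize(2500, 1000)); // prints 4
        System.out.println(withBatchSize(999, 1000));  // prints 2
    }
}
```

Note that N here is really the number of distinct A rows still to be loaded, which is at most the number of B rows returned.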

The findAll implementation will always load B first and then resolve its dependencies by checking the annotations. If you want to avoid the N+1 problem, you can override findAll in the repository and add the @Query annotation with a JPQL query:
...
@Query("select b from B b left join fetch b.a")
override fun findAll(): List<B>
...

Related

How to avoid id collisions in Spring Data JPA/Hibernate-generated database?

We use a dockerized postgres database and have hibernate auto-generate the tables (using spring.jpa.hibernate.ddl-auto: create) for our integration tests. Using something like H2 is not an option because we do some database-specific operations in a few places, e.g. native SQL queries.
Is there any way to avoid id collisions when all entities use auto-incremented ids? Either by offsetting the start id or, better yet, having all tables use a shared sequence?
Schema is created when the docker container is launched, tables are created by Spring Data JPA/Hibernate
Example
The examples use Kotlin syntax and assume the "allopen" plugin is applied to entities.
Sometimes we've had bugs where the wrong foreign key was used, e.g. something like this:
@Entity
class EntityOne(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false, columnDefinition = "SERIAL")
    var id: Long,
)
@Entity
class EntityTwo(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false, columnDefinition = "SERIAL")
    var id: Long,
)
@Entity
class JoinEntity(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false, columnDefinition = "SERIAL")
    var id: Long,
    @ManyToOne
    @JoinColumn(name = "entity_one_id")
    var entityOne: EntityOne,
    @ManyToOne
    @JoinColumn(name = "entity_two_id")
    var entityTwo: EntityTwo,
)
@Repository
interface JoinEntityRepository : JpaRepository<JoinEntity, Long> {
    //
    // Bug here! Should be "WHERE entityOne.id = :entityOneId"
    //
    @Query("SELECT entityTwo FROM JoinEntity WHERE entityTwo.id = :entityOneId")
    fun findEntityTwoByEntityOneId(entityOneId: Long): Collection<EntityTwo>
}
These bugs can be very hard to find in some circumstances. When the tables are created, there may very well be an EntityTwo with the same id as some EntityOne, so the query succeeds. The test then fails somewhere down the line because, while the query returns one or more EntityTwo rows, they are not the expected ones.
Even worse, depending on the scope of the test it may pass even if the wrong entity is fetched, or fail only when tests are run in a specific order (due to ids getting "out of sync"). So ideally it should fail to even find an entity when the wrong id is passed. But because the database structure is created from scratch and the ids are auto-incremented they always start at 1.
I found a solution to this.
In my resources/application.yml (in the test folder, you most likely do not want to do this in your main folder) I add spring.datasource.initialization-mode: always and a file data.sql.
The contents of data.sql are as follows:
DROP SEQUENCE IF EXISTS test_shared_sequence;
CREATE SEQUENCE test_shared_sequence;
ALTER TABLE entity_one ALTER COLUMN id SET DEFAULT nextval('test_shared_sequence');
ALTER TABLE entity_two ALTER COLUMN id SET DEFAULT nextval('test_shared_sequence');
After Spring has auto-generated the tables (using spring.jpa.hibernate.ddl-auto: create), it runs whatever is in this script. The script changes the listed tables to generate their ids from the same sequence, so no two entities will ever have the same id regardless of which table they're stored in. As a result, any query that looks in the wrong table for an id will fail consistently.
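The effect of the shared sequence can be illustrated without a database. The following is a small in-memory sketch: the "tables" are plain maps, and the class and method names are invented for this illustration. With one counter per table the buggy lookup accidentally finds a row, while with a shared counter it finds nothing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// In-memory sketch only; SharedSequenceDemo is a hypothetical name, not test infrastructure.
class SharedSequenceDemo {
    final Map<Long, String> entityOne = new HashMap<>();
    final Map<Long, String> entityTwo = new HashMap<>();
    private final AtomicLong sequenceOne = new AtomicLong();
    private final AtomicLong sequenceTwo;

    SharedSequenceDemo(boolean sharedSequence) {
        // Either both tables draw ids from the same counter, or each has its own.
        this.sequenceTwo = sharedSequence ? sequenceOne : new AtomicLong();
    }

    long insertOne(String value) {
        long id = sequenceOne.incrementAndGet();
        entityOne.put(id, value);
        return id;
    }

    long insertTwo(String value) {
        long id = sequenceTwo.incrementAndGet();
        entityTwo.put(id, value);
        return id;
    }

    // Mimics the buggy query: looks in entity_two using an id that belongs to entity_one.
    boolean buggyLookupFindsRow(long entityOneId) {
        return entityTwo.containsKey(entityOneId);
    }

    public static void main(String[] args) {
        SharedSequenceDemo perTable = new SharedSequenceDemo(false);
        long idA = perTable.insertOne("one");
        perTable.insertTwo("two");
        System.out.println(perTable.buggyLookupFindsRow(idA)); // prints true: the bug is masked

        SharedSequenceDemo shared = new SharedSequenceDemo(true);
        long idB = shared.insertOne("one");
        shared.insertTwo("two");
        System.out.println(shared.buggyLookupFindsRow(idB));   // prints false: the bug surfaces
    }
}
```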

EclipseLink ManyToOne - CriteriaBuilder Generated Query is Wrong

I have an Entity with a ManyToOne Relationship to the Primary Key of another entity. When I create a query that references this Foreign Key eclipseLink always creates a join instead of simply accessing the Foreign Key.
I have created a highly simplified example to show my issue:
@Entity
public class House {
    @Id
    @Column(name = "H_ID")
    private long id;
    @Column(name = "NAME")
    private String name;
    @ManyToOne
    @JoinColumn(name = "G_ID")
    private Garage garage;
}
@Entity
public class Garage {
    @Id
    @Column(name = "G_ID")
    private long id;
    @Column(name = "SPACE")
    private Integer space;
}
I created a query that should return all houses that either have no garage or have a garage with G_ID = 0 using the CriteriaBuilder.
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<House> query = cb.createQuery(House.class);
Root<House> houseRoot = query.from(House.class);
Path<Long> garageId = houseRoot.get(House_.garage).get(Garage_.id);
query.where(cb.or(cb.equal(garageId , 0), cb.isNull(garageId)));
TypedQuery<House> typedQuery = entityManager.createQuery(query);
List<House> houses = typedQuery.getResultList();
The generated query is:
SELECT h.NAME, h.G_ID FROM HOUSE h, GARAGE g WHERE (((h.G_ID= 0) OR (g.G_ID IS NULL)) AND (g.G_ID = h.G_ID));
I don't understand why
The or condition first references table HOUSE and then GARAGE (instead of HOUSE)
The join is created in the first place.
The correct query should look like this in my understanding:
SELECT h.NAME, h.G_ID FROM HOUSE h WHERE ((h.G_ID = 0) OR (h.G_ID IS NULL));
Or if a join is made it should take into account that the ManyToOne relationship is nullable and therefore do a LEFT OUTER JOIN.
SELECT h.NAME, h.G_ID FROM HOUSE h LEFT OUTER JOIN GARAGE g ON (h.G_ID = g.G_ID ) WHERE (h.G_ID = 0) OR (g.G_ID IS NULL);
(Note both these queries would work correctly in my more complicated setup. I also get the same error when only wanting to retrieve all houses that have no garage.)
How can I achieve this (while still using the CriteriaBuilder and ideally not having to change the DB Model)?
(Please let me know any additional information that might be required, I'm very new to this topic and came across this issue while migrating an existing application.)
-- edit --
I have found a solution to my problem that will result in slightly different behaviour (but in my application that part of the code I had to migrate didn't make much sense in the first place). Instead of using
Path<Long> garageId = houseRoot.get(House_.garage).get(Garage_.id);
I use
Path<Garage> garage = houseRoot.get(House_.garage);
And then as expected table Garage isn't joined anymore. (I assume the code previously must have been some kind of hack to get the desired behaviour from openJPA)
"I don't understand why the or condition first references table HOUSE and then GARAGE (instead of HOUSE)."
I believe this is implementation-specific; in any case, it shouldn't have any bearing on the results.
"[I don't understand why] the join is created in the first place."
By saying Path<Long> garageId = houseRoot.get(House_.garage).get(Garage_.id) you're basically telling EclipseLink: 'join Garage to House, we're gonna need it'. That you then access Garage_.id (and not, for example, Garage_.space) is inconsequential.
If you don't want the join, simply map the G_ID column one more time as a simple property: @Column(name = "G_ID", insertable = false, updatable = false) private Long garageId. Then refer to House_.garageId in your query.
"Or if a join is made it should take into account that the ManyToOne relationship is nullable and therefore do a LEFT OUTER JOIN."
Path.get(...) always defaults to an INNER JOIN. If you want a different join type, use Root.join(..., JoinType.LEFT), i.e. houseRoot.join(House_.garage, JoinType.LEFT).get(Garage_.id).
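The difference between the two join types can be reproduced in memory. This is only a sketch of the SQL semantics in plain Java, with made-up class names and sample data; it does not use EclipseLink or the Criteria API. It shows why the implicit inner join never returns a house without a garage, while a left outer join does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of join semantics only; JoinSemanticsDemo and its sample data are hypothetical.
class JoinSemanticsDemo {
    static final class House {
        final String name;
        final Long garageId; // null means "no garage"
        House(String name, Long garageId) { this.name = name; this.garageId = garageId; }
    }

    // Mimics: FROM HOUSE h, GARAGE g WHERE (h.G_ID = 0 OR g.G_ID IS NULL) AND g.G_ID = h.G_ID
    static List<String> innerJoin(List<House> houses, List<Long> garageIds) {
        List<String> result = new ArrayList<>();
        for (House h : houses) {
            for (Long g : garageIds) {
                boolean joined = h.garageId != null && h.garageId.equals(g);
                // For a joined row g is never NULL, so only "h.G_ID = 0" can match.
                if (joined && h.garageId == 0L) {
                    result.add(h.name);
                }
            }
        }
        return result;
    }

    // Mimics: FROM HOUSE h LEFT OUTER JOIN GARAGE g ON h.G_ID = g.G_ID WHERE h.G_ID = 0 OR g.G_ID IS NULL
    static List<String> leftJoin(List<House> houses, List<Long> garageIds) {
        List<String> result = new ArrayList<>();
        for (House h : houses) {
            boolean matched = h.garageId != null && garageIds.contains(h.garageId);
            boolean garageIsZero = h.garageId != null && h.garageId == 0L;
            if (garageIsZero || !matched) {
                result.add(h.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<House> houses = Arrays.asList(
                new House("garage-0", 0L), new House("garage-7", 7L), new House("no-garage", null));
        List<Long> garages = Arrays.asList(0L, 7L);
        System.out.println(innerJoin(houses, garages)); // prints [garage-0]
        System.out.println(leftJoin(houses, garages));  // prints [garage-0, no-garage]
    }
}
```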
One solution that results in the same behaviour is:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<House> query = cb.createQuery(House.class);
Root<House> houseRoot = query.from(House.class);
Path<Garage> garage = houseRoot.get(House_.garage);
Path<Long> garageId = garage.get(Garage_.id);
query.where(cb.or(cb.equal(garageId , 0), cb.isNull(garage)));
TypedQuery<House> typedQuery = entityManager.createQuery(query);
List<House> houses = typedQuery.getResultList();
This results in the following SQL:
SELECT H_ID, NAME, G_ID FROM HOUSE WHERE ((G_ID = 0) OR (G_ID IS NULL));

JPQL query delete not accept a declared JOIN?

I'm trying to understand why Hibernate does not accept the following JPQL:
@Modifying
@Query("delete from Order order JOIN order.credit credit WHERE credit.id IN ?1")
void deleteWithListaIds(List<Long> ids);
The error that I receive is:
Caused by: java.lang.IllegalArgumentException: node to traverse cannot be null!
at org.hibernate.hql.internal.ast.util.NodeTraverser.traverseDepthFirst(NodeTraverser.java:46)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.parse(QueryTranslatorImpl.java:284)
But it accepts this:
@Modifying
@Query("delete from Order order WHERE order.credit.id IN ?1")
void deleteWithListaIds(List<Long> ids);
The entity Order (the entity Credit does not map the Orders):
@Entity
public class Order {
    @Id
    @Setter
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = SEQUENCE)
    @SequenceGenerator(name = SEQUENCE, sequenceName = SEQUENCE, allocationSize = 1)
    @Column(name = "id", nullable = false)
    private Long id;
    @JoinColumn(name = "credit_id", foreignKey = @ForeignKey(name = "fk_order_credit"))
    @ManyToOne(fetch = FetchType.LAZY, optional = false)
    private Credit credit;
}
In select statements, both approaches are accepted, but I don't understand why Hibernate has this limitation or whether I'm doing something wrong in my DELETE JPQL. I would like to declare the JOIN in the query.
The only way that I know to resolve this problem in more complex queries is to create a subselect:
delete from Order order WHERE order.id IN (
SELECT order.id FROM Order order
JOIN order.credit credit
WHERE credit.id in ?1)
Is this the right approach for more complex delete queries?
I'm using the Spring Jpa Repository in the code above and Spring Boot 1.5.10.RELEASE.
"I don't understand why Hibernate has this limitation."
It is specified as such in the JPA Spec in section 4.10:
delete_statement ::= delete_clause [where_clause]
delete_clause ::= DELETE FROM entity_name [[AS] identification_variable]
So joins aren't allowed in delete statements.
Why it was decided this way is pure speculation on my part.
The select_clause or delete_clause specifies what the query operates on. While it is totally fine for a select statement to operate on a combination of multiple entities, a join doesn't really make much sense for a delete.
The grammar simply forces you to specify which entity to delete.
"The only way that I know to resolve this problem in more complex queries is to create a subselect.
Is this the right approach for more complex delete queries?"
If you can't express it using simpler means then yes, this is the way to go.

spring data sort by map-value within a given key

I would like to sort entities by a value stored in a map. For example, I have a Person class whose details are stored as key-value pairs in a Map<String, String>.
I am using Spring Boot with Hibernate 5. This is the mapping.
public class Person implements Serializable {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @DocumentId
    @Column(name = "personid")
    private Integer id;
    @Column(name = "name")
    private String name;
    // this is a collection of person details
    @ElementCollection()
    @MapKeyColumn(name = "detailkey")
    @Column(name = "detailvalue")
    @CollectionTable(name = "details", joinColumns = @JoinColumn(name = "personid"))
    Map<String, String> details = new HashMap<>();
    // getters and setters omitted
}
So far I am able to retrieve a person with some specific detailskey and specific detail value. So for example a person table in the DB has eyecolor as detail attribute and as value can have "green", "blue", "brown". Note this is not a real example, just for clarity purposes.
So for example I can get the list of persons and sort them by their name, in the controller I can do
Sort sort = new Sort(Sort.Direction.ASC, "name");
and the opposite direction
Sort sort = new Sort(Sort.Direction.DESC, "name");
Pageable pageable = new PageRequest(1, 10, sort);
pageResult = personRepository.findAll(
"eyecolor", "green", pageable
);
and this one will return the list of persons that have "eyecolor" set to "green". So far so good, and this is working as expected. Now I would like to define a sort on the detailvalue.
For example, I would like to get a list of persons sorted by their eyecolor. So I should first get the persons that have "blue", then "brown", then "green".
How can the Sort be specified in this case?
In standard SQL it would be something like this:
SELECT p.* from persons p LEFT JOIN details d ON
p.personid = d.personid AND d.detailkey='eyecolor' ORDER BY
d.detailvalue ASC;
The following query worked for me:
SELECT p FROM Person p JOIN p.details d WHERE KEY(d) = 'eyecolor' ORDER BY d
(note that ORDER BY VALUE(d) would fail since VALUE(d) still seems to behave as described here: JPA's Map<KEY, VALUE> query by JPQL failed)
Now, I'm not particularly well versed with Spring Data, but I suppose you should be able to use the above query (without the ORDER BY part) with the @Query annotation on your PersonRepository.findAll method (I'm assuming that's a custom method) and provide the sorting using JpaSort.unsafe("d").
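For checking the expected order in a test, the result of that query can be mirrored in memory. The sketch below is plain Java with invented names and does not use Spring Data or JPA. It keeps only persons that have the given key (as the inner join with KEY(d) does) and orders them by the value stored under that key.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory sketch only; SortByDetailDemo and its Person class are hypothetical stand-ins.
class SortByDetailDemo {
    static final class Person {
        final String name;
        final Map<String, String> details;
        Person(String name, Map<String, String> details) { this.name = name; this.details = details; }
    }

    // Returns the names of persons having the given detail key, ordered by that detail's value.
    static List<String> namesSortedByDetail(List<Person> persons, String key) {
        return persons.stream()
                .filter(p -> p.details.containsKey(key))                    // JOIN p.details d WHERE KEY(d) = key
                .sorted(Comparator.comparing((Person p) -> p.details.get(key))) // ORDER BY d
                .map(p -> p.name)
                .collect(Collectors.toList());
    }
}
```

For persons with eyecolor "green", "blue" and "brown", this yields the "blue" person first, then "brown", then "green"; persons without an eyecolor entry are left out.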

JPQL Query working in testing, not in production

I have two entities related by a ManyToMany and I want to select them via a named query. This works in my test (with an H2 DB set up) and throws exceptions at runtime (with PostgreSQL set up). Other than H2 versus PostgreSQL, I am hard pressed to find differences between test and production.
The Entities and the Query look like so (abbreviated):
@Entity(name = "Enrichment")
@Table(name = "mh_Enrichment")
@NamedQueries({
    @NamedQuery(name = "findByLink",
        query = "SELECT e FROM Enrichment e INNER JOIN e.links l WHERE l.link in (:links)") })
public class EnrichmentImpl {
    @Id
    @Column(name = "enrichmentId")
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;
    @ManyToMany
    @JoinTable(name = "mh_EnrichmentLinks", joinColumns = { @JoinColumn(name = "EnrichmentId",
        referencedColumnName = "enrichmentId") }, inverseJoinColumns = { @JoinColumn(name = "Link",
        referencedColumnName = "link") })
    private List<Link> links;
}
@Entity(name = "Link")
@Table(name = "mh_enrichment_link")
public class LinksImpl {
    @Id
    @Column(name = "link", length = 1024)
    private String link;
}
Upon running the query with a String value in production I get:
Internal Exception: org.postgresql.util.PSQLException: ERROR: operator does not exist: character varying = bigint
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 215
Error Code: 0
Call: SELECT t1.enrichmentId FROM mh_enrichment_link t0, mh_EnrichmentLinks t2, mh_Enrichment t1 WHERE ((t0.link IN (?)) AND ((t2.EnrichmentId = t1.enrichmentId) AND (t0.link = t2.Link)))
Any ideas what's wrong? It is the query, isn't it?
The query is supposed to retrieve a list of Enrichments that are related to the given link.
Update #1
As requested: the tables in the DB look as follows:
For entity Link
CREATE TABLE mh_enrichment_link
(
link character varying(1024) NOT NULL,
CONSTRAINT mh_enrichment_link_pkey PRIMARY KEY (link)
)
For entity Enrichment
CREATE TABLE mh_enrichment
(
enrichmentid bigint NOT NULL,
CONSTRAINT mh_enrichment_pkey PRIMARY KEY (enrichmentid)
)
For the relation (See answer, this was where it went wrong)
CREATE TABLE mh_enrichmentlinks
(
link character varying(1024) NOT NULL,
CONSTRAINT mh_enrichment_link_pkey PRIMARY KEY (link)
)
The issue was fixed by dropping all related tables and having JPA regenerate them. The table definitions didn't match the entity definitions.
That's also, quite obviously, the reason why the test worked and production didn't. In testing the tables are generated at runtime; in production they already existed (with an outdated definition).
Side note: The query is correct and does what it should.