What is the technique to configure the JPA query cache so that it avoids multiple database hits when a query returns no results?

I am using EclipseLink's implementation of JPA to query a table using a non-primary key field.
To cache results returned by this query, I have configured Query Results cache.
I have also annotated my entity with QUERY_RESULTS_CACHE_IGNORE_NULL set to true, so that inserts of new rows are still picked up, as per this javadoc.
However, when no results are returned, firing the query misses the cache.
Is there a configuration for the query results cache, either in JPA or EclipseLink, that causes a cache hit even when no results are returned?
This behavior seems to conflict with the QUERY_RESULTS_CACHE_IGNORE_NULL flag. I would expect the cache to be flushed once there is a subsequent write, after which the query would be fired again.
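For reference, a minimal sketch of how such a query is typically configured with EclipseLink's query hints (the entity, query name, and field below are placeholders, not taken from the question):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedQuery;
import javax.persistence.QueryHint;
import org.eclipse.persistence.config.HintValues;
import org.eclipse.persistence.config.QueryHints;

// Hypothetical entity that is queried by the non-primary-key field "code".
@NamedQuery(
        name = "Customer.findByCode",
        query = "SELECT c FROM Customer c WHERE c.code = :code",
        hints = {
                // Enable EclipseLink's query results cache for this named query.
                @QueryHint(name = QueryHints.QUERY_RESULTS_CACHE, value = HintValues.TRUE),
                // With ignore-null set to true, empty results are not cached, so a query
                // that finds nothing goes back to the database every time it is fired.
                @QueryHint(name = QueryHints.QUERY_RESULTS_CACHE_IGNORE_NULL, value = HintValues.TRUE)
        })
@Entity
public class Customer {
    @Id
    private Long id;
    private String code;
}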

Related

JPA Specification count with upper limit

I'm using a read-only JPA entity MyView built on top of a complicated PostgreSQL view my_view that uses unions of different tables:
@Entity
@Immutable
@Data
public class MyView {
    @Id
    private Long id;
    private String name;
}
I'm also using (Spring Data) JPA Specifications on this Entity to build dynamic queries. Finally I'm using the preexisting query method to execute:
Page<MyView> findAll(Specification<MyView> spec, Pageable pageable);
All of this works perfectly fine - but the View is so complicated it requires a full scan when sorting OR counting. By introducing indices on all relevant tables and columns I got the execution planner to use at least Index Scans when filtering, but Index-Only Scans seem to be prevented by the underlying Unions of different tables.
This means it still needs to load all rows that match the filter criteria into memory and rescan them for sorting or even just counting. If no filter matches (i.e. the rows are essentially unfiltered), every page is loaded once, making the whole query run for tens of seconds. If filtering brings the count down to 100000 or less, it's a question of milliseconds instead.
Further optimization of the query and index structures has diminishing returns at this point, and changing the underlying tables is impractical because of other dependents. Materializing the View or switching anything in the tech stack is also not within scope.
So instead I'd like to change the query coming into the database. Ordering is not really required for the application, so the big question is about the count necessary for pagination.
What would be acceptable is to have a limit for the count - for a few results the exact count may be useful, but if the total count ends up higher than 10000, just saying 'more than 10000 results' in the UI is perfectly fine.
PostgreSQL would allow something like
SELECT COUNT(*) FROM my_view WHERE name LIKE '%abc%'; -- incredibly slow because we need to load gigabytes of data
SELECT COUNT(*) FROM ( SELECT * FROM my_view WHERE name LIKE '%abc%' LIMIT 10000) x; -- incredibly fast because after loading a few pages into memory we reach the limit
Implementing this in custom Spring Data repository methods has turned out to be a harder problem than expected. If the query were static I could send out a native query and be done with it, but the Specifications are important.
A few leads in my current research:
introducing a Hibernate StatementInspector may help us inject custom SQL
overwriting/extending the PostgreSQL dialect may do the same in a more specific manner
using query.grouping() and query.having() we may be able to build a different JPQL TypedQuery with the same intention inside the repository implementation: returning the count of something, but with an upper limit. What this query would look like is sadly beyond me.
a not-solution would be to e.g. use a projection to only return the id column, request the first 10000 elements with a normal query, and count the list in Java (a rough sketch of this is included below). While internally this would be fine for the DB, the new bottleneck would be the network. It also seems too crude.
a way more interesting solution would be to use the SQL EXPLAIN feature to get a quick row count estimate, but implementing this with JPA seems even more complicated than just limiting the count to a reasonable size.
Unfortunately this is also where I currently stand.
What are the best ways to gain this functionality of a JPA count upper limit, and are there existing solutions I'm overlooking?
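To make the projection-and-count lead above concrete, here is a rough sketch of a custom repository fragment that caps the count by selecting only ids with setMaxResults and counting the list in Java. The class and method names are hypothetical, and it still transfers up to limit ids over the network, which is exactly the drawback called out above:

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;
import org.springframework.data.jpa.domain.Specification;

// Hypothetical custom fragment, to be wired into the Spring Data repository.
public class MyViewCountRepositoryImpl {

    @PersistenceContext
    private EntityManager em;

    // Returns the number of rows matching the Specification, but never more than limit.
    public long countUpTo(Specification<MyView> spec, int limit) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Long> cq = cb.createQuery(Long.class);
        Root<MyView> root = cq.from(MyView.class);
        cq.select(root.<Long>get("id"));
        Predicate predicate = spec == null ? null : spec.toPredicate(root, cq, cb);
        if (predicate != null) {
            cq.where(predicate);
        }
        // setMaxResults becomes a LIMIT on the PostgreSQL side, so the scan can stop early;
        // the count is then taken from the (at most limit-sized) id list in memory.
        return em.createQuery(cq).setMaxResults(limit).getResultList().size();
    }
}

A capped count like this could then be supplied to PageableExecutionUtils.getPage(...) as the total supplier instead of Spring Data's default count query, if that fits the setup.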
Similar questions:
JPA count query with maximum results (unclear if the intention is identical, but it sounds similar)
fetch the count of records from table by setting the Maximum results in JPA

Eclipselink batch fetch VS join fetch

When should I use "eclipselink.join-fetch", when should I use "eclipselink.batch" (batch type = IN)?
Is there any limitations for join fetch, such as the number of tables being fetched?
The answer is always specific to your query, your specific use case, and the database, so there is no hard rule on when to use one over the other, or whether to use either at all. You cannot determine what to use unless you are serious about performance and willing to test both under production load conditions - just like any other query performance tuning.
Join-fetch is just what it says: it brings all the data back in the one query. If your goal is to reduce the number of SQL statements, it is perfect. But it comes at a cost, as inner/outer joins, Cartesian joins, etc. can increase the amount of data sent across the wire and the work the database has to do.
Batch fetching is one extra query (1+1), and can be done a number of ways. IN collects all the foreign key values and puts them into one statement (more than one if you have over 1000 values on Oracle). JOIN is similar to a fetch join, as it uses the criteria from the original query to select over the relationship, but it won't return as much data, since it only fetches the required rows. EXISTS is very similar, using a subquery to filter.
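For illustration, a minimal sketch of how the two hints are typically applied, assuming an Employee entity with a mapped phoneNumbers relationship (both are made up, not from the question):

import java.util.List;
import javax.persistence.EntityManager;
import org.eclipse.persistence.annotations.BatchFetchType;
import org.eclipse.persistence.config.QueryHints;

public class FetchHintExamples {

    // "eclipselink.join-fetch": the phone numbers come back in the same, joined SQL statement.
    List<Employee> joinFetch(EntityManager em) {
        return em.createQuery("SELECT e FROM Employee e", Employee.class)
                .setHint(QueryHints.FETCH, "e.phoneNumbers")
                .getResultList();
    }

    // "eclipselink.batch" with batch type IN: the original query plus one extra query that
    // loads the phone numbers for all collected Employee keys using an IN clause.
    List<Employee> batchFetchIn(EntityManager em) {
        return em.createQuery("SELECT e FROM Employee e", Employee.class)
                .setHint(QueryHints.BATCH, "e.phoneNumbers")
                .setHint(QueryHints.BATCH_TYPE, BatchFetchType.IN)
                .getResultList();
    }
}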

why does Doctrine ODM (MongoDB) findOneById hit database many times for the same id?

In our application, within one request, we do many queries of the sort:
$dm->getRepository('Bundle:some_document')->findOneById($id)
My expectation was that on the second and subsequent calls for some fixed id (say, 1) there would be no actual query to the database, and we would get some "in-memory" representation of the document fetched the first time.
However, it seems to hit the db over and over again.
Is this expected behaviour, or are we missing something?
$repository->findOneById() is merely a wrapper around $repository->findOneBy($criteria), which may or may not query by the document's identifier (and there is no optimization to check whether the criteria consist only of an identifier).
If you want to make use of the in-memory representation of objects, you need to use $repository->find(), which first tries to look the document up in the UnitOfWork and hits the database only if there is no hit there.

Morphia is there a difference between fetch and asList in performance wise

We are using Morphia 0.99 and Java driver 2.7.3. I would like to know whether there is any difference between fetching records one by one using fetch and retrieving the results via asList (assume there is enough memory to retrieve all records through asList).
We iterate over a large collection; while using fetch I sometimes encounter a "cursor not found" exception on the server during the fetch operation, so I need to execute another query to continue. What could be the reason for this?
1) fetch the record
2) do some calculation on it
3) save it back to the database again
4) fetch another record and repeat the steps until there are no more records.
So which one would be faster: fetching records one by one, or retrieving results in bulk using asList? Or is there no difference between them in the Morphia implementation?
Thanks for the answers
As far as I understand the implementation, fetch() streams results from the DB while asList() will load all query results into memory. So they will both get every object that matches the query, but asList() will load them all into memory while fetch() leaves it up to you.
For your use case, neither would be faster in terms of CPU, but fetch() should use less memory and not blow up in case you have a lot of DB records.
Judging from the source-code, asList() uses fetch() and aggregates the results for you, so I can't see much difference between the two.
One very useful difference would be if the following two conditions applied to your scenario:
You were using offset and limit in the query.
You were changing values on the object such that it would no longer be returned in the query.
So say you were doing a query on awesome=true, and you were using offset and limit to do multiple queries, returning 100 records at a time to make sure you didn't use up too much memory. If, in your iteration loop, you set awesome=false on an object and saved it, it would cause you to miss updating some records.
In a case like this, fetch() would be a better approach.
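As a rough illustration of the difference, here is a sketch against the old pre-1.x Morphia Query API the question refers to (the Task entity and its awesome field are invented for the example):

import java.util.List;
import com.google.code.morphia.Datastore;
import com.google.code.morphia.query.Query;

public class FetchVsAsList {

    void processAll(Datastore ds) {
        Query<Task> query = ds.find(Task.class).field("awesome").equal(true);

        // fetch(): results are streamed from the server cursor batch by batch,
        // so only the objects currently being iterated are held in memory.
        for (Task task : query.fetch()) {
            // do some calculation on the task, then save it back
            ds.save(task);
        }
    }

    List<Task> loadAll(Datastore ds) {
        // asList(): the entire result set is materialized into one in-memory list.
        return ds.find(Task.class).field("awesome").equal(true).asList();
    }
}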

How to avoid dropping existing index when reindexing

When reindexing with Sunspot, the existing index is cleared/dropped first. This means users will see blank search results for a short time, which is bad for a production environment. Is there a way to reindex without clearing the existing index first?
The clearing occurs when I call the rake task, and also when I call solr_reindex in the console.
Looking into the code, I think calling Model.solr_index is enough. When indexing is complete, one can start searching the newly indexed fields.
The searchable schema is not something shared across all records from one model. Therefore indexing a single record will update the searchable schema of that record.