JPA/EclipseLink mixing batch fetch strategies on different collections - jpa

I have a JPA/JPQL query in EclipseLink 2.3.2 and I'm providing batch-fetch query hints on multiple collections:
hints={
@QueryHint(name=QueryHints.BATCH, value="obj.collection1"),
@QueryHint(name=QueryHints.BATCH, value="obj.group.members"),
@QueryHint(name=QueryHints.BATCH_TYPE, value="IN"),
}
Is there a way to specify different batch fetch types on different collections, such that I can get obj.collection1 with JOIN and obj.group.members with IN or EXISTS?
Or do they all have to be the same?
The practical application is that with a fetch on nested collections, there may be different cardinalities at different levels. For example, for the initial query, there might be thousands of rows returned so I couldn't use "IN" for obj.collection1 without possibly breaking Oracle syntax limits on the IN clause. On the other hand, for obj.group.members there might only be a few distinct values of group so an IN clause would make more sense.

Related

Eclipselink batch fetch VS join fetch

When should I use "eclipselink.join-fetch", and when should I use "eclipselink.batch" (batch type = IN)?
Are there any limitations for join fetch, such as the number of tables being fetched?
The answer is always specific to your query, the specific use case, and the database, so there is no hard rule on when to use one over the other, or whether to use either at all. You cannot determine what to use unless you are serious about performance and willing to test both under production load conditions, just like any query performance tweaking.
Join-fetch does just what it says: it brings all the data back in the one query. If your goal is to reduce the number of SQL statements, it is perfect. But it comes at a cost: inner/outer joins, cartesian joins, etc. can increase the amount of data sent across and the work the database has to do.
Batch fetching uses one extra query (1+1) and can be done in a number of ways. IN collects all the foreign key values and puts them into one statement (more than one if you have >1000 values on Oracle). JOIN is similar to join-fetch, as it uses the criteria from the original query to select over the relationship, but it won't return as much data, as it only fetches the required rows. EXISTS is very similar, using a subquery to filter.
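To make the "more if you have >1000" remark concrete, here is a minimal sketch of splitting collected foreign-key values into IN lists of at most 1000 entries, which is the limit Oracle enforces (ORA-01795). This is plain Java for illustration only, not EclipseLink's actual implementation; the class name, the MEMBER table and the GROUP_ID column are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class InBatchSketch {
    // Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
    static final int MAX_IN_SIZE = 1000;

    /** Splits the collected foreign-key values into chunks of at most maxSize. */
    static <T> List<List<T>> partition(List<T> keys, int maxSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += maxSize) {
            int end = Math.min(start + maxSize, keys.size());
            chunks.add(new ArrayList<>(keys.subList(start, end)));
        }
        return chunks;
    }

    /** Builds one parameterised statement per chunk (hypothetical table and column). */
    static List<String> buildStatements(List<Long> keys) {
        List<String> statements = new ArrayList<>();
        for (List<Long> chunk : partition(keys, MAX_IN_SIZE)) {
            String placeholders = chunk.stream()
                    .map(k -> "?")
                    .collect(Collectors.joining(", "));
            statements.add("SELECT * FROM MEMBER WHERE GROUP_ID IN (" + placeholders + ")");
        }
        return statements;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long i = 1; i <= 2500; i++) {
            keys.add(i);
        }
        // 2500 keys are split into chunks of 1000, 1000 and 500.
        System.out.println(buildStatements(keys).size() + " statements");
    }
}
```

With thousands of parent rows, an IN batch therefore turns into several statements rather than one, which is part of the trade-off against JOIN or EXISTS.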

How to do multi-table aggregates using Spring Data repositories?

What's the best approach for doing multi-table aggregates, or non-aggregate multi-table results, in my Spring Data repositories? I don't care about mapping back to entities; I just need a list of objects returned that I can massage into a JSON response.
If you don't care about entities, repositories are not the tool for the job. Repositories are defined to simulate collections of aggregates (which are special kinds of entities usually).
So to answer the question from your headline (which surprisingly seems to be the opposite of what you're asking in the description): just do it. Define your entity types including the relations that form the aggregate, create a repository for them and query it, define query methods etc.
If you don't care about types (which is perfectly fine, too), have a look at jOOQ which is centered around SQL to efficiently query relational databases, but wrapped into a nice API.

Sorting Cassandra using individual components of Composite Keys

I want to store a list of users in a Cassandra column family (wide rows).
The columns in the CF will have composite keys of the pattern id:updated_time:name:score.
After inserting all the users, I need to query users in a different sorted order each time.
For example, if I specify updated_time, I should be able to fetch the 10 most recent users.
And if I specify score, I should be able to fetch the top 10 users based on score.
Does Cassandra support this?
Kindly help me in this regard...
I need to query users in a different sorted order each time...
Does Cassandra support this?
It does not. Unlike with an RDBMS, you cannot make arbitrary queries and expect reasonable performance. Instead, you must design your data model so that the queries you anticipate will be efficient:
The best way to approach data modeling for Cassandra is to start with your queries and work backwards from there. Think about the actions your application needs to perform, how you want to access the data, and then design column families to support those access patterns.
So rather than having one column family (table) for your data, you might want several with cross references between them. That is, you might have to denormalise your data.
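As a rough in-memory analogy for "one column family per query", the sketch below writes each user into two sorted structures at insert time, one ordered by updated_time and one by score, so each read is a simple scan of the first N entries. It is plain Java, not Cassandra client code; the class and method names are hypothetical, and it assumes users are only inserted, never updated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DenormalisedUsers {
    /** Mirrors the id:updated_time:name:score composite from the question. */
    static final class User {
        final String id;
        final long updatedTime;
        final String name;
        final int score;

        User(String id, long updatedTime, String name, int score) {
            this.id = id;
            this.updatedTime = updatedTime;
            this.name = name;
            this.score = score;
        }
    }

    // Newest first; id breaks ties so distinct users never compare as equal.
    static final Comparator<User> NEWEST_FIRST =
            Comparator.comparingLong((User u) -> u.updatedTime).reversed()
                    .thenComparing((User u) -> u.id);

    // Highest score first, again with id as the tie-breaker.
    static final Comparator<User> HIGHEST_SCORE_FIRST =
            Comparator.comparingInt((User u) -> u.score).reversed()
                    .thenComparing((User u) -> u.id);

    // One sorted structure per anticipated query (stand-in for one table per query).
    private final TreeSet<User> byUpdatedTime = new TreeSet<>(NEWEST_FIRST);
    private final TreeSet<User> byScore = new TreeSet<>(HIGHEST_SCORE_FIRST);

    /** Every write goes to both structures: this is the denormalisation cost. */
    void insert(User user) {
        byUpdatedTime.add(user);
        byScore.add(user);
    }

    List<User> recent(int n) {
        return firstN(byUpdatedTime, n);
    }

    List<User> top(int n) {
        return firstN(byScore, n);
    }

    private static List<User> firstN(TreeSet<User> sorted, int n) {
        List<User> out = new ArrayList<>();
        for (User u : sorted) {
            if (out.size() == n) {
                break;
            }
            out.add(u);
        }
        return out;
    }

    public static void main(String[] args) {
        DenormalisedUsers store = new DenormalisedUsers();
        store.insert(new User("u1", 100L, "Ann", 5));
        store.insert(new User("u2", 300L, "Bob", 1));
        store.insert(new User("u3", 200L, "Cy", 9));
        System.out.println("most recent: " + store.recent(1).get(0).name);
        System.out.println("top score: " + store.top(1).get(0).name);
    }
}
```

The duplicated write is the price of denormalising: the data is stored once per access pattern so that each query reads rows already in the order it needs.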

The most performant way to query against three collections, sort them and limit?

I need to query 3 different collections.
Then I need to sort the collection results (each based on different field and order but every time a DateTime value), and then finally limit the number of results I want (10 in my case).
Currently I'm just doing three separate queries and limiting each by 10, then manually sorting based on the date times they have. Then I finally limit to 10 myself.
Is there a better way to do this?
No: MongoDB is not a relational database, so you cannot join multiple collections within one query. I'm not even sure you could do this kind of sorting (treating each field as equal in the order precedence) in a relational DBMS.
What you're doing already sounds really good. You could possibly improve the merging of these 3 results by stopping the iteration over one or more collections early once no further element can be within the overall top 10. You could also modify your queries so that the other two collections only return documents whose date is lower than that of the last (10th) document from the first queried collection. But maybe you did this already...
As for performance, consider adding indexes on the datetime fields used in your queries so that the fields are kept presorted in memory.
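The client-side merge described here can be sketched as follows, assuming each of the three queries has already been limited to 10 and that "top" means newest first. The Hit type, its fields and the collection names are hypothetical stand-ins; no MongoDB driver calls are shown.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopTenMerge {
    /** Hypothetical common shape: the source collection name plus the date to sort on. */
    static final class Hit {
        final String source;
        final Instant date;

        Hit(String source, Instant date) {
            this.source = source;
            this.date = date;
        }

        Instant getDate() {
            return date;
        }
    }

    /** Merges the already-limited per-collection results and keeps the newest `limit` overall. */
    static List<Hit> mergeNewest(int limit, List<List<Hit>> perCollection) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perCollection) {
            all.addAll(hits);
        }
        return all.stream()
                .sorted(Comparator.comparing(Hit::getDate).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> posts = Arrays.asList(
                new Hit("posts", Instant.parse("2020-01-05T00:00:00Z")),
                new Hit("posts", Instant.parse("2020-01-01T00:00:00Z")));
        List<Hit> comments = Arrays.asList(
                new Hit("comments", Instant.parse("2020-01-04T00:00:00Z")));
        List<Hit> likes = Arrays.asList(
                new Hit("likes", Instant.parse("2020-01-03T00:00:00Z")));

        for (Hit hit : mergeNewest(3, Arrays.asList(posts, comments, likes))) {
            System.out.println(hit.source + " " + hit.date);
        }
    }
}
```

Since each input holds at most 10 entries, sorting the combined list of at most 30 is cheap; the early-abort idea only starts to matter if the per-collection limits are much larger.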

Entity framework optimize include, provide hints to Include

Is it possible to provide hints to Include in LINQ queries so that the joining of the child tables can be optimized?
I have a very generic data model and so, there are multiple referential constraints between tables. I am working with a legacy system, so changing that around would be very difficult.
I have a query like the following, which generates very complicated SQL queries.
var links = A.B.CreateSourceQuery()
.Include("B1")
.Include("B1.C1")
.Include("B1.D1")
.ToArray();
Is there a way to provide hints to the above query on how to join the respective child entities, so that the generated SQL is more efficient and the data can still be eager loaded?
See my post at http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-Database-Performance-hints, particularly the point about breaking up complex queries. In the case where you're fetching two sets of grandchildren, your performance may well suffer. You may want to consider a custom projection instead of multiple includes.