set batch size for a lazy loaded collection in spring-data-jpa

set batch size for a lazy loaded collection in spring-data-jpa - jpa

I need to generate a CSV file containing a database export. Since the data can be quite big, I want to use lazy loading with a specific batch size, so that when I iterate through the collection returned by the DAO/Repository, I will only have a batch loaded at one point. I want this to be done automatically by the collection (e.g. otherwise I could just load page after page, using Pageable as a parameter).
Here is some code to hopefully make things clearer. My controller looks something like this:
public ModelAndView generateCsv(Status status) {
//can return a large number of items.
Collection<Item> items = itemRepository.findByStatus(status);
return new ModelAndView("csv", "items", items);
}
As you can see, I'm passing that collection to the view (through the ModelAndView object), and the view will just iterate through it and generate the CSV.
I want the collection to know how to load the next batch internally, which is what a lazy loaded collection should generally do.
Is there a way to do this with Spring-data or just plain JPA?
I know of ScrollableResults from Hibernate, but I don't like it for two reasons: it's not JPA (I'd have to make my code depend on Hibernate), and it's not using collections API, thus I'd have to make my view know about ScrollableResults. At least if it would implement Iterable, that would have made it nicer, but it's not.
So what I'm looking for is a way to specify that a collection is to be lazy loaded, using a specific batch size. Maybe something like:
#Query("SELECT o FROM Item o WHERE o.status = ?1")
#Fetch(type = FetchType.LAZY, size = 100)
Page<Item> findByStatus(Item.Status status);
If something like this is not possible using Spring Data, do you know if it can be done with QueryDsl? The fact that QueryDsl repositories return Iterator objects makes me think it might lazy load those, though I can't find documentation on that.
Thanks,
Stef.

Related

JHipster Role based masking of certain columns

In a JHipster based project, we need to selectively filter out certain columns based on role/user logged in. All users will be able to view/modify most of the columns, but only some privileged users will be able to view/modify certain secure fields/columns.
It looks like the only option to get this done is using EntityListeners. I can use an EntityListener and mask a certain column during PostLoad event. Say for example, I mask the column my_secure_column with XXX and display to the user.
User then changes some other fields/columns (that he has access to) and submits the form. Do I have to again trap the partially filled in entity in PreUpdate event, get the original value for my_secure_column from database and set it before persisting?
All this seems inefficient. Scoured several hours but couldn't find a specific implementation that best suits this use case.
Edit 1: This looks like a first step to achieving this in a slightly better way. Updating Entities with Update Query in Spring Data JPA
I could use specific partial updates like updateAsUserRole, updateAsManagerRole, etc., instead of persisting the whole entity all the time.
#Repository
public interface CompanyRepository extends JpaRepository<Company, Integer> {
#Modifying(clearAutomatically = true)
#Query("UPDATE Company c SET c.address = :address WHERE c.id = :companyId")
int updateAddress(#Param("companyId") int companyId, #Param("address") String address);
}

Column based security is not an easy problem to solve, and especially in combination with JPA.
Ideally you like to avoid even loading the columns, but since you are selecting entities this is not possible by default, so you have to remove the restricted content by overriding the value after load.
As an alternative you can create a view bean (POJO) and then use JPQL Constructor Expression. Personally I would use CriteriaBuilder. construct() instead of concatenating a JPQL query, but same principle.
With regards to updating the data, the UI should of cause not allow the editing of restricted fields. However you still have to validate on the backend, and I would recommend that you check if the column was modify before calling JPA. Typically you have the modifications in a DTO and would need to load the Entity anyway, if a restricted column was modified, you would send an error back. This way you only call JPA after the security has been checked.

How do I avoid large generated SQL queries in EF when using Include()

I'm using EF (dll version is 4.4) to query against a database. The database contains several tables with course information. When having a look what actually is sent to the db I see a massive, almost 1300 line SQL query (which I'm not going to paste here because of it's size). The query I'm running on the context looks like:
entities.Plans
.Include("program")
.Include("program.offers")
.Include("program.fees")
.Include("program.intakes")
.Include("program.requirements")
.Include("program.codes")
.Include("focuses")
.Include("codes")
.Include("exceptions")
.Include("requirements")
where plans.Code == planCode
select plans).SingleOrDefault();
I want to avoid having to go back to the server when collecting information from each of the related tables but with such a large query I'm wondering if there is there a better way of doing this?
Thanks.

A bit late but, as your data is only changing once a day, look at putting everything you need into an indexed view and place this view in your EF model.

You can usually add the .Include() after the where clause. This means that you're only pulling out information that matches what you want, see if that reduces your query any.

As you are performing an eager loading, so if you are choosing the required entities then its fine. Otherwise you can go with Lazy Loading, but as you specified that you don't want database round trip, so you can avoid it.
I would suggest, If this query is used multiple times then you can use compiled query. So that it will increase the performance.
Go through this link, if you want..
http://msdn.microsoft.com/en-us/library/bb896297.aspx

If you're using DbContext, you can use the .Local property on the context to look if your entity is already retrieved and therefore attached to the context.
If the query had run before and your root Plan entities are already attached based on Plan.Code == planId, presumably its sub-entities are also already attached since you did eager loading, so referring to them via the navigation properties won't hit the DB for them again during the lifetime of that context.
This article may be helpful in using .Local.

You may be able to get a slightly more concise SQL query by using projection rather than Include to pull back your referenced entities:
var planAggregate =
(from plan in entities.Plans
let program = plan.Program
let offers = program.Offers
let fees = program.Fees
//...
where plan.Code == planCode
select new {
plan
program,
offers,
fees,
//...
})
.SingleOrDefault();
If you disable lazy loading on your context this kind of query will result in the navigation properties of your entities being populated with the entities which were included in your query.
(I've only tested this on EF.dll v5.0, but it should behave the same on EF.dll v4.4, which is just EF5 on .NET 4.0. When I tested using this pattern rather than Include on a similarly shaped query it cut about 70 lines off of 500 lines of SQL. Your mileage may vary.)

ORMLite LazyForeignCollection: How to query a collection only once?

Let's assume I have the following object:
public class MyOwnList {
#DatabaseField(id= true)
protected int id;
#ForeignCollectionField(eager = false)
protected Collection<Item> items;
}
As items is marked as lazy it won't be loaded if I load the list object from the database.
That's exactly what I want!!
The problem is that everytime I access items, ORMLite makes a sql query to get the collection. Only discovered it after activating the logging of ORMLite...
Why does it do that? Any good reason for that?
Is there any way that I can lazy load the collection, but only once, not everytime I access the collection? So something between eager and lazy?

The problem is that everytime I access items, ORMLite makes a sql query to get the collection.
So initially I didn't understand this. What you are asking for is for ORMLite to cache the collection of items after it lazy loads it the first time. The problem with this as the default is that ORMLite has no idea how big your collection of items is. One of the reasons why lazy collections are used is to handle large collections. If ORMLite kept all lazy collections around in memory, it could fill the memory.
I will add to the TODO list something like lazyCached = true which does a hybrid between lazy and eager. Good suggestion.

How to achieve pagination without writing Query in JPA?

I have a Question object which has List of Comment objects with #OneToMany mapping. The Question object has a fetchComments(int offset, int pageSize) method to fetch comments for a given question.
I want to paginate the comments by fetching a limited amount of them at a time.
If I write a Query object then I can set record offset and maximum records to fetch with Query.setFirstResult(int offset) and Query.setMaxResults(int numberOfResults). But my question is how(if possible) can I achieve the same result without having to write a Query i.e. with simple annotation or property. More clearly, I need to know if there is something like
#OneToMany(cascade = CascadeType.ALL)
#Paginate(offset = x,maxresult = y)//is this kind of annotation available?
private List<Comment> comments;
I have read that #Basic(fetch = FetchType.LAZY) only loads the records needed at runtime, but I won't have control to the number of records fetched there.
I'm new to JPA. So please consider if I've missed something really simple.

No, there is no such a functionality in JPA. Also concept itself is bit confusing. With your example offset (and maxresult as well) is compile time constant and that does not serve pagination purpose too well. Also in general JPA annotations in entities define structure, not the context dependent result (for that need there is queries).
If fetching entities when they are accessed in list is enough and if you are using Hibernate, then closest you can get is extra #LazyCollection:
#org.hibernate.annotations.LazyCollection(LazyCollectionOption.EXTRA)

Are navigation properties read each time they are accessed? (EF4.1)

I am using EF 4.1, with POCOs which are lazy loaded.
Some sample queries that I run:
var discountsCount = product.Discounts.Count();
var currentDiscountsCount = product.Discounts.Count(d=>d.IsCurrent);
var expiredDiscountsCount = product.Discounts.Count(d=>d.IsExpired);
What I'd like to know, is whether my queries make sense, or are poorly performant:
Am I hitting the database each time, or will the results come from cached data in the DbContext?
Is it okay to access the navigation properties "from scratch" each time, as above, or should I be caching them and then performing more queries on them, for example:
var discounts = product.Discounts;
var current = discounts.Count(d=>d.IsCurrent);
var expired = discounts.Count(d=>d.Expired);
What about a complicated case like below, does it pull the whole collection and then perform local operations on it, or does it construct a specialised SQL query which means that I cannot reuse the results to avoid hitting the database again:
var chained = discounts.OrderBy(d=>d.CreationDate).Where(d=>d.CreationDate < DateTime.Now).Count();
Thanks for the advice!
EDIT based on comments below
So once I call a navigation property (which is a collection), it will load the entire object graph. But what if I filtered that collection using .Count(d=>d...) or Select(d=>d...) or Min(d=>d...), etc. Does it load the entire graph as well, or only the final data?

product.Discounts (or any other navigation collection) isn't an IQueryable but only an IEnumerable. LINQ operations you perform on product.Discounts will never issue a query to the database - with the only exception that in case of lazy loading product.Discounts will be loaded once from the database into memory. It will be loaded completely - no matter which LINQ operation or filter you perform.
If you want to perform filters or any queries on navigation collections without loading the collection completely into memory you must not access the navigation collection but create a query through the context, for instance in your example:
var chained = context.Entry(product).Collection(p => p.Discounts).Query()
.Where(d => d.CreationDate < DateTime.Now).Count();
This would not load the Discounts collection of the product into memory but perform a query in the database and then return a single number as result. The next query of this kind would go to the database again.

In your examples above the Discounts collection should be populated by Ef the first time you access it. The subsequent linq queries on the Discount collection should then be performed in memory. This will even include the last complex expression.
You can also use the Include method to make sure you are getting back associated collection first time. example .Include("Discounts");
If your worried about performance I would recommend using SQL Profiler to have a look at what SQL is being executed.