How to read complex data from multiple related database tables in Spring Batch

I intend to implement the batch to read data from various DB tables to populate the complex domain below, then perform calculation in processor and load the data into DB via writer.
public class A{
private String id;
private String name;
private ArrayList list1;
private ArrayList list2;
......
}
Now, I am stuck at the design of the reader. The idea is to query a DB table to get a list of ids, then query the other fields, including list1 and list2, based on each id. It seems the existing readers cannot fulfill this requirement. Do I need to create a custom reader to achieve the goal? I think I would take the chunk approach, but I have no clue how to implement it.
Code example is much appreciated.

You can use the driving query pattern. The reader reads only the IDs and then the processor can query the details of each object based on the ID.
This is a common pattern and you can find more details about it in the common batch patterns section of the documentation here: https://docs.spring.io/spring-batch/4.0.x/reference/html/common-patterns.html#drivingQueryBasedItemReaders
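As a rough illustration of the driving query pattern described above, here is a plain-Java sketch with no Spring dependency. The in-memory maps stand in for the database tables, and every name in it (DrivingQuerySketch, IdReader, DetailProcessor, run, the typed lists on A) is an assumption made for the example. In a real job the three roles would map onto Spring Batch's ItemReader, ItemProcessor and ItemWriter, and the chunk loop would be handled by the framework rather than written by hand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the driving query pattern; the maps stand in for DB tables.
class DrivingQuerySketch {

    // Mirrors the domain class A from the question (typed lists are an assumption).
    static class A {
        final String id;
        final String name;
        final List<String> list1;
        final List<String> list2;

        A(String id, String name, List<String> list1, List<String> list2) {
            this.id = id;
            this.name = name;
            this.list1 = list1;
            this.list2 = list2;
        }
    }

    // Reader role: returns one ID per call and null when exhausted.
    static class IdReader {
        private final Iterator<String> ids;

        IdReader(List<String> ids) {
            this.ids = ids.iterator();
        }

        String read() {
            return ids.hasNext() ? ids.next() : null;
        }
    }

    // Processor role: looks up the remaining fields for a single ID.
    static class DetailProcessor {
        private final Map<String, String> names;
        private final Map<String, List<String>> list1Rows;
        private final Map<String, List<String>> list2Rows;

        DetailProcessor(Map<String, String> names,
                        Map<String, List<String>> list1Rows,
                        Map<String, List<String>> list2Rows) {
            this.names = names;
            this.list1Rows = list1Rows;
            this.list2Rows = list2Rows;
        }

        A process(String id) {
            return new A(id, names.get(id),
                    list1Rows.getOrDefault(id, List.of()),
                    list2Rows.getOrDefault(id, List.of()));
        }
    }

    // Simplified chunk loop: read IDs, build each A, collect full chunks.
    // Each list added to the result stands in for one writer call.
    static List<List<A>> run(IdReader reader, DetailProcessor processor, int chunkSize) {
        List<List<A>> written = new ArrayList<>();
        List<A> chunk = new ArrayList<>();
        String id;
        while ((id = reader.read()) != null) {
            chunk.add(processor.process(id));
            if (chunk.size() == chunkSize) {
                written.add(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk); // last, partially filled chunk
        }
        return written;
    }
}
```

The reader stays cheap because it only touches the ID column; the cost of this design is one set of detail queries per item, which is the usual trade-off of the pattern.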

Related

How do I load an aggregate object from db in AxonFramework or any other Event-Sourcing frameworks?

I've had this question for a long time. Most samples on the Internet always create an aggregate object first and then operate on it. My question is: how can I load one from the db instead of creating one every time? I'll take e-shopping as an example. I treat one product as an aggregate object. I can't load all of them into my program's memory. So what can I do?
What I do is write another constructor with the parameter UpdateProductCommand, in addition to the constructor with the parameter CreateProductCommand. In this constructor, I load it from the db. Is this OK?
class Product {
    public Product() {}
    @CommandHandler
    public Product(CreateProductCommand command) {
        apply(new CreateProductEvent(command.id));
    }
    @CommandHandler
    public Product(UpdateProductCommand command) {
        load(command.id);
        ...
        apply(new UpdateProductEvent(command.id));
    }
}
I am assuming that you want to use State-Stored Aggregates and you can check the link for more info.
To shed some light, I would have to see which field you have marked with @Id and @AggregateIdentifier, but assuming you have one String id (which is your command.id and the @TargetAggregateIdentifier as well), Axon is responsible for loading the Aggregate from the database based on that field. Having said that, you don't have to take care of it yourself; just focus on your business logic (which means validations) and apply new values when needed.

Should DTOs contain embedded entities or should I return "flat" DTOs that only have the IDs?

In the past, my DTOs have been a direct map of the entity. However, I am now in a scenario where all we really need is the ID of a nested object so we can then do the DB lookup if needed. It looks something like this
public class UserDto {
Integer id;
String name;
List<Integer> groupIds;
}
But the entity looks like this
public class UserEntity {
Integer id;
String name;
List<UserGroupEntity> userGroups;
}
Is this a common practice? Should I just have the DTO mapped directly from the entity and have embedded UserGroup DTOs?
I usually go with option number one, only including the ids of other entities. The DTO can otherwise become quite bloated after a while, leading to a lot of unnecessary database lookups.
I only go for option number two if the DTO never makes any sense without the containing entities. But this is almost never the case, and it's difficult and dangerous to assume that you know what information all future users of your DTO will want to have.
When it comes to the API endpoint design (assuming you go for option one) you can let the client fetch Users and UserGroups by calling the following endpoints:
/users/:id
/users/:id/usergroups
This is so that the client doesn't have to wait for the User fetch to finish before fetching the UserGroups. This follows RESTful principles, and the two calls can, for example, be made in parallel from the client.
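For option one, the mapping from entity to flat DTO could look roughly like the sketch below. UserEntity and UserDto follow the question; the fields of UserGroupEntity and the UserMapper class are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the nested entity; only its id matters for the flat DTO.
class UserGroupEntity {
    Integer id;
    String name;

    UserGroupEntity(Integer id, String name) {
        this.id = id;
        this.name = name;
    }
}

class UserEntity {
    Integer id;
    String name;
    List<UserGroupEntity> userGroups = new ArrayList<>();
}

class UserDto {
    Integer id;
    String name;
    List<Integer> groupIds = new ArrayList<>();
}

// Hypothetical mapper: copies scalar fields and reduces each group to its id.
class UserMapper {
    static UserDto toDto(UserEntity entity) {
        UserDto dto = new UserDto();
        dto.id = entity.id;
        dto.name = entity.name;
        for (UserGroupEntity group : entity.userGroups) {
            dto.groupIds.add(group.id);
        }
        return dto;
    }
}
```

A client that needs the full groups can then follow up with /users/:id/usergroups.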

Deciding which collection

I have an abstract class which is extended by large number of other POJOs, I need all these main POJOs to be stored in a dedicated collection.
My repository looks like this:
interface TimesliceRepository extends MongoRepository<AbstractTimeslice, String>
How can I make it so that objects are directed to the appropriate collection? Eg: AATimeslice, BBTimeslice, etc...
Or do I have to have a repository for every POJO?
Also, would read queries work? How would I be able to query for BBTimeslice only?
After some research, I came to the conclusion that for my use cases it's better to use MongoTemplate and not MongoRepository.
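One way this could work: MongoTemplate has overloads such as save(object, collectionName) and find(query, entityClass, collectionName) that take an explicit collection name, so the remaining piece is a rule that picks the collection for each subclass. The sketch below shows only that rule in plain Java. The naming convention and the TimesliceCollections helper are assumptions for illustration, and the subclasses are empty stand-ins for the real POJOs.

```java
import java.util.Locale;

// Stand-ins for the class hierarchy from the question.
abstract class AbstractTimeslice {
    String id;
}

class AATimeslice extends AbstractTimeslice {
}

class BBTimeslice extends AbstractTimeslice {
}

// Hypothetical helper: one collection per concrete subclass, named after the class.
class TimesliceCollections {
    static String collectionFor(Class<? extends AbstractTimeslice> type) {
        return type.getSimpleName().toLowerCase(Locale.ROOT);
    }
}
```

Passing collectionFor(BBTimeslice.class) to the read overload would then restrict a query to BBTimeslice documents only.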

Refactoring application: Direct database access -> access through REST

We have a huge database application which must be refactored (there are many reasons for this, the biggest one being security).
What we already have:
MySQL Database
JPA2 (Eclipselink) classes for over 100 tables
Client application that accesses the database directly
What needs to be there:
REST interface
Login/Logout with roles via database
What I've done so far:
Set up Spring MVC 3.2.1 with Spring Security 3.1.1
Using a custom UserDetailsService (contains just static data for testing atm)
Created a few Controllers for testing (simply receiving/providing data)
Design Problems:
We have maaaaany @OneToMany and @ManyToMany relations in our database
1.: (important)
If I'd send the whole object tree with all child objects as a response, I could probably send the whole database at once.
So I need a way to request for example 'all Articles'. But it should omit all the child objects. I've tried this yesterday and the objects I received were tons of megabytes:
@PersistenceContext
private EntityManager em;
@RequestMapping(method = RequestMethod.GET)
public @ResponseBody List<Article> index() {
    List<Article> a = em.createQuery("SELECT a FROM Article a", Article.class).getResultList();
    return a;
}
2.: (important)
If the client receives an Article, at the moment we can simply call article.getAuthor() and JPA will do a SELECT a FROM Author a JOIN Article ar WHERE ar.author_id = ?.
With REST we could make a request to /authors/{id}. But: This way we can't use our old JPA models on the client side, because the model contains Author author and not Long author_id.
Do we have to rewrite every model or is there a simpler approach?
3.: (less important)
Authentication: Make it stateless or not? I've never worked with stateless auth so far, but Spring seems to have some kind of support for it. When I look at some sample implementations on the web I have security concerns: With every request they send username and password. This can't be the right way.
If someone knows a nice solution for that, please tell me. Else I'd just go with standard HTTP Sessions.
4.:
What's the best way to design the client side model?
public class Book {
int id;
List<Author> authors; //option1
List<Integer> authorIds; //option2
Map<Integer, Author> idAuthorMap; //option3
}
(This is a Book which has multiple authors). All three options have different pros and cons:
Option 1: I could directly access the corresponding Author model, but if I request a Book model via REST, I maybe don't want the model now, but later. So option 2 would be better:
Option 2: I could request a Book model directly via REST and use the authorIds afterwards to fetch the corresponding author(s). But now I can't simply use myBook.getAuthors().
Option 3: This is a mixture of 1. and 2.: If I just request the Books with only the Author ids included, I could do something like: idAuthorMap.put(authorId, null).
But maybe there's a Java library that handles all the stuff for me?!
That's it for now. Thank you guys :)
The maybe solution(s):
Problem: Select only the data I need. This means more or less ignoring every @ManyToMany, @OneToMany and @ManyToOne relation.
Solution: Use @JsonIgnore and/or @JsonIgnoreProperties.
Problem: Every ignored relation should get fetched easily without modifying the data model.
Solution: Example models:
class Book {
    int bId;
    Author author; // has @ManyToOne
}
class Author {
    int aId;
    List<Book> books; // has @OneToMany
}
Now I can fetch a book via REST: GET /books/4 and the result will look like this (because I ignore all relations via @JsonIgnore): {"bId":4}
Then I have to create another route to receive the related author: GET /books/4/author. Will return: {"aId":6}.
Backwards: GET /authors/6/books -> [{"bId":4},{"bId":42}].
There will be a route for every @ManyToMany, @OneToMany, @ManyToOne, but nothing more. So this will not exist: GET /authors/6/books/42. The client should use GET /books/42.
First, you will want to control how the JPA layer handles your relationships. What I mean is using lazy loading vs. eager loading. This can easily be controlled via the "fetch" option on the annotation, like this:
@OneToMany(fetch = FetchType.LAZY)
What this tells JPA is that, for this related object, only load it when some code requests it. Behind the scenes, a dynamic "proxy" object is created. When you try to access this proxy, it's smart enough to go out and run another SQL query to gather that needed bit. In the case of a Collection, it's even smart enough to grab the underlying objects in batches as you iterate over the items in the Collection. But be warned: access to these proxies has to happen all within the same general Session. Outside of it, the underlying ORM framework (I don't know how EclipseLink works, I am a Hibernate user) will not know how to associate the sub-requests with the proper domain object. This has a bigger effect when you use transportation frameworks like Flex BlazeDS, which tries to marshal objects using bytecode instead of the interface, and usually gets tripped up when it sees these proxy objects.
You may also want to set your cascade policy, which can be done via the "cascade" option like
@OneToMany(cascade = CascadeType.ALL)
Or you can give it a list like:
@OneToMany(cascade = {CascadeType.MERGE, CascadeType.REMOVE})
Once you control what is getting pulled from your database, you need to look at how you are marshalling your domain objects. Are you sending this via JSON, XML, or a mixture depending on the request? What frameworks are you using (Jackson, FlexJSON, XStream, something else)? The problem is that, even if you set the fetch type to LAZY, these frameworks will still go after the related objects, thus negating all the work you did telling JPA to load them lazily. This is where things get more specific to the marshalling/serializing scheme: you will need to figure out how to tell your framework what to marshal and what not to marshal. Again, this will be highly dependent on whatever framework is in use.
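To make the last point concrete, here is a small plain-Java simulation with no JPA or JSON library involved. The Lazy wrapper plays the role of the ORM proxy, the queries counter stands in for SQL round trips, and the two serialize methods stand in for a marshaller that walks every property versus one that has been told to skip the relation. All names, including the comments relation on Article, are assumptions made for the example.

```java
import java.util.List;
import java.util.function.Supplier;

class LazySerializationSketch {
    // Counts simulated SELECT round trips for the relation.
    static int queries = 0;

    // Stand-in for an ORM proxy: runs the loader on first access only.
    static class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    static class Article {
        final int id;
        final Lazy<List<String>> comments;

        Article(int id) {
            this.id = id;
            this.comments = new Lazy<>(() -> {
                queries++; // the lazy load happens here
                return List.of("c1", "c2");
            });
        }
    }

    // Naive marshaller: touches every property, so the lazy relation gets loaded.
    static String serializeAll(Article a) {
        return "{\"id\":" + a.id + ",\"comments\":" + a.comments.get().size() + "}";
    }

    // Selective marshaller: skips the relation, so no extra query is issued.
    static String serializeFlat(Article a) {
        return "{\"id\":" + a.id + "}";
    }
}
```

Calling serializeFlat leaves the counter untouched, while serializeAll triggers the load even though the relation was declared lazy, which is the effect described above.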

How to achieve pagination without writing Query in JPA?

I have a Question object which has a List of Comment objects with a @OneToMany mapping. The Question object has a fetchComments(int offset, int pageSize) method to fetch comments for a given question.
I want to paginate the comments by fetching a limited amount of them at a time.
If I write a Query object, then I can set the record offset and the maximum number of records to fetch with Query.setFirstResult(int offset) and Query.setMaxResults(int numberOfResults). But my question is how (if possible) I can achieve the same result without having to write a Query, i.e. with a simple annotation or property. More clearly, I need to know if there is something like
@OneToMany(cascade = CascadeType.ALL)
@Paginate(offset = x, maxresult = y) // is this kind of annotation available?
private List<Comment> comments;
I have read that @Basic(fetch = FetchType.LAZY) only loads the records needed at runtime, but I won't have control over the number of records fetched there.
I'm new to JPA. So please consider if I've missed something really simple.
No, there is no such functionality in JPA. The concept itself is also a bit confusing: in your example, offset (and maxresult as well) is a compile-time constant, which does not serve pagination well. In general, JPA annotations on entities define structure, not context-dependent results (that is what queries are for).
If fetching entities only when they are accessed in the list is enough, and you are using Hibernate, then the closest you can get is an extra-lazy collection via @LazyCollection:
@org.hibernate.annotations.LazyCollection(LazyCollectionOption.EXTRA)
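To show why the offset and page size need to be runtime values rather than annotation constants, here is a minimal plain-Java sketch of the offset/limit semantics. It pages over an in-memory list purely for illustration; the PageSketch class and its page method are assumptions, and with JPA the same two arguments would instead be passed to Query.setFirstResult and Query.setMaxResults, as the question notes, so that the database does the slicing.

```java
import java.util.List;

class PageSketch {
    // Returns at most pageSize items starting at offset; empty when offset is past the end.
    static <T> List<T> page(List<T> all, int offset, int pageSize) {
        if (offset < 0 || pageSize < 0) {
            throw new IllegalArgumentException("offset and pageSize must not be negative");
        }
        int from = Math.min(offset, all.size());
        // Compute the end index in long arithmetic to avoid int overflow.
        int to = (int) Math.min((long) from + pageSize, all.size());
        return all.subList(from, to);
    }
}
```

A method like fetchComments(int offset, int pageSize) from the question can take a different offset on every call, which a fixed annotation attribute could not express.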