I have a Spring Batch program which reads data from a DB, processes it, and inserts it (using an ItemWriter) into another table in the database. I am using a bunch of SQL queries for the ItemReader, ItemProcessor and ItemWriter.
My requirement is to store all these queries in a table in parameter/value format, retrieve them with a single DB call, and pass them to the ItemReader, ItemProcessor or ItemWriter. That way, if any of the queries change in the future, we only end up doing DB updates and the code stays untouched.
I tried to do this in the beforeJob section but I am facing an error saying "java.lang.IllegalArgumentException: The SQL query must be provided". I can do this successfully by making a DB call inside the ItemReader method, but I am trying to avoid that approach because I would need to make a DB call for each of the ItemReader, ItemProcessor and ItemWriter. Please let me know how to achieve this.
You can create a step with a tasklet that reads the query from the database and adds it to the execution context under some key, then configure the reader of your chunk-oriented step with the query from the execution context. Here is a quick example:
1. Retrieve the query and put it in the execution context:
@Bean
public Tasklet queryRetrievalTasklet(JdbcTemplate jdbcTemplate) {
    return (contribution, chunkContext) -> {
        // Retrieve the query from the database; the table and column names
        // below are illustrative placeholders.
        String query = jdbcTemplate.queryForObject(
                "SELECT query_text FROM batch_queries WHERE query_name = ?",
                String.class, "readerQuery");
        // getJobExecutionContext() returns a read-only copy, so write through
        // the step execution's job execution instead.
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext().put("query", query);
        return RepeatStatus.FINISHED;
    };
}
2. Configure the reader with the query from the execution context:

@Bean
@StepScope
public ItemReader<Integer> itemReader(@Value("#{jobExecutionContext['query']}") String query) {
    // return your reader configured with the query
    return null;
}
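For example, the step-scoped reader could be a JdbcCursorItemReader built from that query. A minimal sketch, assuming the stored query selects a single integer column (adapt the row mapper to your actual row type):

@Bean
@StepScope
public JdbcCursorItemReader<Integer> itemReader(DataSource dataSource,
        @Value("#{jobExecutionContext['query']}") String query) {
    return new JdbcCursorItemReaderBuilder<Integer>()
            .name("itemReader")
            .dataSource(dataSource)
            .sql(query) // the query stored by the tasklet in step 1
            .rowMapper((rs, rowNum) -> rs.getInt(1))
            .build();
}

The two steps are then chained in the job definition, with the tasklet step first (a sketch; step, bean names and the writer are illustrative):

@Bean
public Job job(JobBuilderFactory jobs, StepBuilderFactory steps,
        Tasklet queryRetrievalTasklet,
        ItemReader<Integer> itemReader, ItemWriter<Integer> itemWriter) {
    Step queryRetrievalStep = steps.get("queryRetrievalStep")
            .tasklet(queryRetrievalTasklet)
            .build();
    Step chunkStep = steps.get("chunkStep")
            .<Integer, Integer>chunk(10)
            .reader(itemReader)
            .writer(itemWriter)
            .build();
    return jobs.get("job")
            .start(queryRetrievalStep)
            .next(chunkStep)
            .build();
}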
Hope this helps.
In my opinion, such configuration is usually done by storing the queries in properties, not in the database. For example:
batch.query.unload=SELECT ...
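A query stored this way can then be injected straight into the bean definition, with no extra DB call. A minimal sketch, reusing the batch.query.unload key from above (the reader type and row mapper are illustrative):

@Bean
public JdbcCursorItemReader<Integer> unloadReader(DataSource dataSource,
        @Value("${batch.query.unload}") String query) {
    return new JdbcCursorItemReaderBuilder<Integer>()
            .name("unloadReader")
            .dataSource(dataSource)
            .sql(query)
            .rowMapper((rs, rowNum) -> rs.getInt(1))
            .build();
}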
I've a use case where I need to connect to two different databases (Postgres and Oracle). Postgres is already configured with JPA. I need to add one more database (Oracle). In the Oracle database I need to choose tables at runtime for insertion and deletion (since the tables are not fixed). Currently I'm passing the tables in my properties file as a list:
oracle:
  deletion:
    table:
      -
        tableName: user
        primaryKey: userId
        emailField: emailId
        deleteTableName: user_delete
      -
        tableName: customer
        primaryKey: customerId
        emailField: emailAddress
        deleteTableName: customer_delete
I've created a bean that reads all these properties and puts them in a list:

@Bean("oracleTables")
@ConfigurationProperties("oracle.deletion.table")
public List<Table> getAllTables() {
    return new ArrayList<>();
}
I have a list of email addresses with me. For each of these tables I need to fetch the primary key based on the email address from the parent table (value in tableName) and insert data into the corresponding delete table (value in deleteTableName). Once that is done I need to delete the data from the actual table (value in tableName) based on the email address.
I'm planning to loop through the list of tables I have in my bean and perform the fetch, insert and delete. Sample snippet:
@Autowired
@Qualifier("oracleTables")
List<Table> tables;

public boolean processDelete(List<String> emails) {
    for (Table table : tables) {
        // fetch all the primary keys for the given emails from the main table (value in tableName)
        // insert into the corresponding delete table
        // delete from the main table
    }
    return true;
}
But the question I have is: should I go with JdbcTemplate or JpaRepository/Hibernate? And some help with the implementation as well, with a small sample/link. The reasons for this question are:
1) The tables in my case are not fixed.
2) I need transaction management to roll back in case of a failure in fetching, inserting or deleting.
3) I need to configure two databases.
should I go with JdbcTemplate or JpaRepository/Hibernate
Most definitely JdbcTemplate. JPA does not easily allow dynamic tables.
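To illustrate, here is a minimal sketch of the loop with JdbcTemplate, assuming Table exposes getters for the four properties shown above and oracleJdbcTemplate is a JdbcTemplate bound to the Oracle DataSource. Table and column names cannot be bound as ? parameters, so they are concatenated into the SQL and must come only from trusted configuration:

public boolean processDelete(List<String> emails) {
    for (Table table : tables) {
        for (String email : emails) {
            // fetch the primary keys for this email from the main table
            List<Long> ids = oracleJdbcTemplate.queryForList(
                    String.format("SELECT %s FROM %s WHERE %s = ?",
                            table.getPrimaryKey(), table.getTableName(), table.getEmailField()),
                    Long.class, email);
            for (Long id : ids) {
                // copy the row into the delete table
                oracleJdbcTemplate.update(
                        String.format("INSERT INTO %s SELECT * FROM %s WHERE %s = ?",
                                table.getDeleteTableName(), table.getTableName(), table.getPrimaryKey()),
                        id);
                // then remove it from the main table
                oracleJdbcTemplate.update(
                        String.format("DELETE FROM %s WHERE %s = ?",
                                table.getTableName(), table.getPrimaryKey()),
                        id);
            }
        }
    }
    return true;
}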
I need transaction management to roll back in case of failure in fetching, inserting or deleting
If you need transactions, you'll also need to define two separate transaction managers:
@Bean
public TransactionManager oracleTransactionManager() {
    var result = new DataSourceTransactionManager();
    ...
    result.setDataSource(oracleDataSource());
    return result;
}

@Bean
public TransactionManager postgresTransactionManager() {
    ...
}
Then, if you want declarative transactions, you need to specify the manager with which to run a given method:
@Transactional(transactionManager = "oracleTransactionManager")
public void doWorkInOracleDb() {
    ...
}
I need to configure two databases
Just configure two separate DataSource beans. Of course, you will actually need two separate JdbcTemplate beans as well.
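A minimal sketch of that wiring, assuming Spring Boot and the property prefixes spring.datasource.oracle and spring.datasource.postgres (the prefixes and bean names are illustrative):

@Bean
@ConfigurationProperties("spring.datasource.oracle") // e.g. ...oracle.jdbc-url, .username, .password
public DataSource oracleDataSource() {
    return DataSourceBuilder.create().build();
}

@Bean
@ConfigurationProperties("spring.datasource.postgres")
public DataSource postgresDataSource() {
    return DataSourceBuilder.create().build();
}

@Bean
public JdbcTemplate oracleJdbcTemplate(@Qualifier("oracleDataSource") DataSource dataSource) {
    return new JdbcTemplate(dataSource);
}

@Bean
public JdbcTemplate postgresJdbcTemplate(@Qualifier("postgresDataSource") DataSource dataSource) {
    return new JdbcTemplate(dataSource);
}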
I have a SQL script which performs delete operations on multiple tables based on, say, employee IDs:
DELETE FROM EMP_ADDRESS where EMP_ID in (EMP_IDS);
DELETE FROM EMP_DETAILS where EMP_ID in (EMP_IDS);
DELETE FROM EMPLOYEE where EMP_ID in (EMP_IDS);
Is there a way to call the SQL script from Spring Batch by passing the employee IDs? I tried an alternate approach where, in the writer, I get the IDs and delete from the tables as below:
public class DeleteEmployeeData implements ItemWriter<EmployeeData> {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public void write(List<? extends EmployeeData> items) throws Exception {
        for (EmployeeData item : items) {
            jdbcTemplate.update(SQLConstants.DELETE_EMP_ADDRESS, item.getEmployeeId());
            jdbcTemplate.update(SQLConstants.DELETE_EMP_DETAILS, item.getEmployeeId());
            jdbcTemplate.update(SQLConstants.DELETE_EMPLOYEES, item.getEmployeeId());
        }
    }
}
This works, but I wanted to know if there is a better approach than this.
Your current approach with a chunk-oriented step looks good to me:
- The reader reads the IDs
- A processor filters the IDs
- A composite writer with two writers: one to write the XML and another one to delete the items (see the sketch below)
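A minimal sketch of such a composite writer, assuming an existing StaxEventItemWriter bean for the XML part and the DeleteEmployeeData writer from the question (the bean wiring is illustrative):

@Bean
public CompositeItemWriter<EmployeeData> compositeWriter(
        StaxEventItemWriter<EmployeeData> xmlWriter,
        DeleteEmployeeData deleteWriter) {
    // Delegates are invoked in order for every chunk of items.
    CompositeItemWriter<EmployeeData> writer = new CompositeItemWriter<>();
    writer.setDelegates(Arrays.asList(xmlWriter, deleteWriter));
    return writer;
}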
We have a use case where a user can pass in arbitrary search criteria for a collection and wants the output paged. Using Spring Data repositories, this is quite simple if we know ahead of time what attributes they may be searching on, by simply extending MongoRepository and declaring a:
Page<Thing> findByFooAndBarAndBaz(Type foo, Type bar, Type baz, Pageable page)
However, if we generate the query ourselves, either using the fluent interface or constructing a Mongo query string and wrapping it in a BasicQuery class, I cannot find a way to get that into a repository instance. There is no:
Page<Thing> findByQuery(Query q, Pageable page)
functionality that I have been able to see.
Nor can I see how to hook into the MongoTemplate querying functionality with the Page abstraction.
I'm hoping I don't have to roll my own paging (calculating skip and limit parameters, which I guess is not hard) and call into the template directly, but I guess I can if that's the best choice.
I don't think this can be done in the way I'd hoped, but I've come up with a workaround. As background, we put all our data-access methods in a DAO; some delegate to the repository, some to the template.
1. Wrote a DAO method which takes our arbitrary filter string (which a utility of mine converts to standard Mongo JSON query syntax).
2. Wrap that in a BasicQuery to get a "countQuery".
3. Use that countQuery to get a total count of records using MongoTemplate#count(Query, Class).
4. Append my paging criteria to create a "pageQuery" using Query#with(Pageable).
5. Run the pageQuery with MongoTemplate#find(Query, Class).
6. Take the List<T> result from that, the Pageable that was used for the query and the count returned from the countQuery run, and construct a new PageImpl to return to the caller.
Basically, this (DocDbDomain is a test domain class for trying out document DB stuff):

Query countQuery = new BasicQuery(toMongoQueryString(filterString));
long total = template.count(countQuery, DocDbDomain.class);
// Query#with mutates and returns the same instance, so apply the paging
// only after the count has been taken.
Query pageQuery = countQuery.with(pageRequest);
List<DocDbDomain> content = template.find(pageQuery, DocDbDomain.class);
return new PageImpl<DocDbDomain>(content, pageRequest, total);
You can use the @Query annotation to execute an arbitrary query through a repository method:
interface PersonRepository extends Repository<Person, Long> {

    @Query("{ 'firstname' : ?0 }")
    Page<Person> findByCustomQuery(String firstname, Pageable pageable);
}
Generally speaking, @Query can contain any JSON query you can execute via the shell, but with the ?0 kind of syntax to replace parameter values. You can find more information on the basic mechanism in the reference documentation.
In case you can't express your query within the @Query annotation, you can use Spring Data's PageableExecutionUtils for your custom queries.
For example like this:
@Override
public Page<XXX> findSophisticatedXXX(/* params, ... */ @NotNull Pageable pageable) {
    // query() and where() are statically imported from Query and Criteria
    Query query = query(
            where("...")
            // ... sophisticated query ...
    ).with(pageable);
    List<XXX> list = mongoOperations.find(query, XXX.class);
    return PageableExecutionUtils.getPage(list, pageable,
            () -> mongoOperations.count(Query.of(query).limit(-1).skip(-1), XXX.class));
}
Like in the original Spring Data repositories, PageableExecutionUtils will do a separate count request and wrap it into a nice Page for you.
Here you can see that Spring is doing the same.
I use the JPA specification and Hibernate as my vendor. I need somehow to take the generated SQL query which is sent to the DB (printed to System.out) and save it as a simple String.
Is there a way to do this?
EDIT
Let me make it a bit clearer: I don't need the Hibernate log. I need to be able to execute the same query on a different DB. Therefore, I need to get the SQL query as is and hold it in a normal String variable.
Edit 2
Is there a utility to which I can provide a bean so that it will automatically generate an INSERT query? Can I somehow use Hibernate beans here? I know it's a bit complex.
Thanks,
Idob
Create a bean like this:

@Bean
public JpaVendorAdapter jpaVendorAdapter() {
    HibernateJpaVendorAdapter jpaVendorAdapter = new HibernateJpaVendorAdapter();
    jpaVendorAdapter.setGenerateDdl(true);
    jpaVendorAdapter.setShowSql(true);
    return jpaVendorAdapter;
}
If you're using Spring Boot, add it somewhere in your @Configuration.
The logs created from this are executable in MySQL Workbench.
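If you're on Spring Boot, the same output can also be switched on via application properties instead of a bean; a minimal sketch:

spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true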
You stated that you are using JPA and Hibernate. There's no other way, unless the databases you want to support are supported by JPA; in that case there is an AbstractJpaVendorAdapter that you can implement.
The simple answer to your question is no. What you want to do is something that many developers would also like to do; however, it was not made part of the JPA specification, so the ability to get the generated SQL depends on what the vendor decided to do. With Hibernate, the only way to obtain the SQL is via the log.
You have to enable log4j logging and add an appender for Hibernate to show the queries.
This has already been described here: How to print a query string with parameter values when using Hibernate
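For reference, a minimal log4j sketch that logs the SQL statements together with their bind parameters (logger names as of Hibernate 5):

log4j.logger.org.hibernate.SQL=DEBUG
log4j.logger.org.hibernate.type.descriptor.sql.BasicBinder=TRACE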
If I understand you correctly, you want to get the insert query which Hibernate executed on one database and, via code, run it on a different database via entityManager#executeUpdate or similar.
Hibernate does not expose the generated query, as it is specific to the dialect of the target database. So even if you were to get the insert query, it could be pointless.
However, in your case you can create two database connections (via two DataSources or EntityManagerFactories, whatever applies in your case) and call dao.persist(entity) twice, once for each database, and let Hibernate handle the query construction part.
Edit: by query I mean the native query here; the HQL query would be the same for both databases.
Hope it helps.
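A minimal sketch of that idea, assuming two persistence units named "oracleUnit" and "postgresUnit" (the unit names and the entity are illustrative; transaction demarcation, with one transaction manager per unit or JTA, is omitted):

public class DualPersistDao {

    @PersistenceContext(unitName = "oracleUnit")
    private EntityManager oracleEntityManager;

    @PersistenceContext(unitName = "postgresUnit")
    private EntityManager postgresEntityManager;

    public void persist(MyEntity entity) {
        // Hibernate generates the dialect-specific SQL for each database.
        oracleEntityManager.persist(entity);
        // Depending on the ID generation strategy, a fresh copy of the
        // entity may be needed here instead of the same instance.
        postgresEntityManager.persist(entity);
    }
}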
I don't know what you mean by a generated query, but if you use Hibernate and you have a javax.persistence.Query, you can get the HQL string very easily (for EclipseLink it is similar). And if you have the HQL, you can translate it to SQL with a QueryTranslator.
// Get the HQL (QueryTranslator is a Hibernate 5 internal API; it was removed in Hibernate 6)
String hqlQueryString = query.unwrap(org.hibernate.query.Query.class).getQueryString();

// Translate the HQL to SQL
ASTQueryTranslatorFactory queryTranslatorFactory = new ASTQueryTranslatorFactory();
SessionImplementor hibernateSession = em.unwrap(SessionImplementor.class);
QueryTranslator queryTranslator = queryTranslatorFactory.createQueryTranslator(
        "", hqlQueryString, Collections.emptyMap(), hibernateSession.getFactory(), null);
queryTranslator.compile(Collections.emptyMap(), false);
String sqlQueryString = queryTranslator.getSQLString();
Try adding properties to the LocalContainerEntityManagerFactoryBean instance; this works for me:
@Configuration
@EnableJpaRepositories(basePackages = "org.common.persistence.dao")
public class PersistenceJPAConfig {

    @Bean
    public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
        final LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
        em.setDataSource(dataSource()); // dataSource() bean definition omitted here
        em.setPackagesToScan(new String[] { "org.common.persistence.model" });
        final HibernateJpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
        em.setJpaVendorAdapter(vendorAdapter);
        em.setJpaProperties(additionalProperties());
        return em;
    }

    final Properties additionalProperties() {
        final Properties hibernateProperties = new Properties();
        hibernateProperties.setProperty("hibernate.show_sql", "true");
        hibernateProperties.setProperty("hibernate.format_sql", "true");
        hibernateProperties.setProperty("hibernate.query.substitutions", "false");
        return hibernateProperties;
    }
}
I have the following two methods in my Logs repository:
public IEnumerable<Log> GetAll()
{
var db = new CasLogEntities();
return db.Logs;
}
public DbSet<Log> GetAllSet()
{
var db = new CasLogEntities();
return db.Logs;
}
The only difference is that one returns an IEnumerable of Log and the other a DbSet of Log.
In my Asset controller I have the following code:
var allLogs = _logRepo.GetAllSet();
var Logs = (from log in allLogs
group log by log.DeviceId
into l
select new {DeviceId = l.Key, TimeStamp = l.Max(s => s.TimeStamp)}).ToList();
Now the issue is that I am getting a massive performance difference in the group by statement depending on which one of the repo methods I call:
GetAllSet, which returns the DbSet, is lightning fast;
GetAll, which returns the IEnumerable, is really slow.
Can anybody explain this? I was thinking that the conversion of the DbSet to the IEnumerable in GetAll was causing the query to execute, and hence I was doing the group by on a massive in-memory set, whereas GetAllSet was deferring the query execution until the ToList() and hence was doing the group by work on the server.
Is this correct? Is there another way to explain this?
I would prefer for GetAll to return the IEnumerable, as I am more familiar with it and it's a bit easier for testing.
No, converting to IEnumerable<T> does not cause the query to execute.
It does, on the other hand, take the query into object space, so the SQL generated will be different when you project onto an anonymous type. Watch SQL Server Profiler to see the difference.
This would explain the performance difference.
If you return IQueryable<T> instead of IEnumerable<T>, the SQL/performance should be identical.