Spring Batch Reader for distributed DB2 database - spring-batch

I am trying to write a job using the Spring Batch framework. The job needs to fetch data from a clustered DB2 database, apply some logic to each fetched record and then store the transformed data in the same DB (in a different table than the one it was read from). I am trying to write step1 as below:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
Currently, I face two challenges because the database is DB2 and clustered:
1.
The SQLs provided for the batch metadata at /org/springframework/batch/core/schema-db2.sql don't work for distributed DB2. Creation fails on the statement constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY). The queries in this file can be tweaked for distributed DB2, or I could create the tables manually, but I am not sure whether creating the tables manually is safe or whether it will have further complications.
I need all these tables because I want to use Spring Batch for its PAUSE and RESTART functionality.
2.
We need to fire all SELECT queries on DB2 with READ ONLY WITH UR (see this SO question). If we don't run queries with this clause, the DB can get locked.
The problem with point # 2 is that I can't use the built-in reader classes of Spring Batch (JdbcPagingItemReader etc.) as those don't support this DB2-specific clause.
By reading the overly simple examples on the Internet that explain the advantages of this framework, I thought I would be up and running in a very short time, but it looks like I have to write my own query provider classes (a rough sketch is included below), research metadata SQLs, and what not, if the DB happens to be DB2 and distributed.
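A sketch of such a query provider: it subclasses Spring Batch's Db2PagingQueryProvider and appends the clause to the generated paging SQL. Whether simply appending WITH UR at the end produces valid SQL for every generated query is an assumption that would need to be verified:

import org.springframework.batch.item.database.support.Db2PagingQueryProvider;

public class Db2WithUrPagingQueryProvider extends Db2PagingQueryProvider {

    @Override
    public String generateFirstPageQuery(int pageSize) {
        // append DB2's uncommitted-read clause to the generated query
        return super.generateFirstPageQuery(pageSize) + " WITH UR";
    }

    @Override
    public String generateRemainingPagesQuery(int pageSize) {
        return super.generateRemainingPagesQuery(pageSize) + " WITH UR";
    }
}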
Has anybody implemented a similar job for a distributed DB2 database who can guide me on the above points?

I guess, to solve point # 1, I will create the tables manually, since I have confirmed in another question that the tables will not get dropped automatically, so recreation will not be needed. A one-time manual activity should be enough.
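Alternatively, I could keep the tweaked DDL on the classpath and let Spring run it at startup instead of creating the tables by hand. A rough sketch (the script name schema-db2-distributed.sql is hypothetical and assumes the offending constraint has been rewritten for distributed DB2):

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.init.DataSourceInitializer;
import org.springframework.jdbc.datasource.init.ResourceDatabasePopulator;

@Bean
public DataSourceInitializer batchSchemaInitializer(DataSource dataSource) {
    // run the tweaked copy of schema-db2.sql once at startup
    ResourceDatabasePopulator populator = new ResourceDatabasePopulator(
            new ClassPathResource("schema-db2-distributed.sql"));
    populator.setContinueOnError(true); // tables already exist on later startups
    DataSourceInitializer initializer = new DataSourceInitializer();
    initializer.setDataSource(dataSource);
    initializer.setDatabasePopulator(populator);
    return initializer;
}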
Point # 2 I will solve by specifying the isolation level at the transaction level, so that WITH UR in the SELECT queries will not be needed:
@Autowired
private DataSource dataSource;

@Bean
public TransactionTemplate transactionTemplateUR(PlatformTransactionManager txnManager) {
    TransactionTemplate txnTemplate = new TransactionTemplate();
    txnTemplate.setIsolationLevelName("ISOLATION_READ_UNCOMMITTED");
    txnTemplate.setTransactionManager(txnManager);
    return txnTemplate;
}

@Bean
public PlatformTransactionManager txnManager(DataSource dataSource) {
    DataSourceTransactionManager txnManager = new DataSourceTransactionManager();
    txnManager.setDataSource(dataSource);
    return txnManager;
}
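Another option, instead of a separate TransactionTemplate, is to set the isolation level directly on the chunk-oriented step via its transaction attribute. A sketch reusing the step1 definition from the question, assuming READ UNCOMMITTED is acceptable for the whole chunk transaction (writes included), not just the reads:

import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.interceptor.DefaultTransactionAttribute;

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    // READ UNCOMMITTED is the JDBC equivalent of DB2's "WITH UR"
    DefaultTransactionAttribute txAttribute = new DefaultTransactionAttribute();
    txAttribute.setIsolationLevel(TransactionDefinition.ISOLATION_READ_UNCOMMITTED);

    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .transactionAttribute(txAttribute)
            .build();
}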

Related

Nested Query in spring batch processing

I want to create an ETL process using Spring Batch. The steps will read from one or more DBs and insert into one DB, so basically I'm collecting similar information from different DBs and inserting it into one DB. I have a large, complex query that I need to run on those DBs, and the result will be inserted into the so-called one DB for later processing. My main concern is that I want to reference this query in the JpaPagingItemReader, for example. Is there a way I can, for example, add this query to my project as a .sql file and then reference it in the reader?
Or any other solution I can follow?
Thank you
is there a way I can for example add this query in my project as a .sql file and then reference it in the reader? Or any other solution I can follow?
You can put your query in a properties file and inject it into your reader, something like:
@Configuration
@EnableBatchProcessing
@PropertySource("classpath:application.properties")
public class MyJob {

    @Bean
    public JpaPagingItemReader itemReader(@Value("${query}") String query) {
        return new JpaPagingItemReaderBuilder<>()
                .queryString(query)
                // set other reader properties
                .build();
    }

    // ...
}
In this example, you should have a property query=your sql query in application.properties. This is actually the regular Spring property injection mechanism; nothing Spring Batch specific here.
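If you would rather keep the query in a .sql file, as originally asked, one option (a sketch, assuming a file named query.sql on the classpath) is to inject the file as a Resource and read it into a String before building the reader:

import java.nio.charset.StandardCharsets;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;
import org.springframework.util.StreamUtils;

@Bean
public JpaPagingItemReader itemReader(@Value("classpath:query.sql") Resource queryFile) throws Exception {
    // read the whole file into a String and hand it to the reader
    String query = StreamUtils.copyToString(queryFile.getInputStream(), StandardCharsets.UTF_8);
    return new JpaPagingItemReaderBuilder<>()
            .queryString(query)
            // set other reader properties
            .build();
}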

Is it OK to globally set the mybatis executor mode to BATCH?

I am currently developing a Spring Boot app, which uses mybatis for its persistence layer. I want to optimize the batch insertion of entities in the following scenario:
// flightSerieMapper and legMapper are used to create a series of flights.
// legMapper needs to use batch insertion.
@Transactional
public FlightSerie add(FlightSerie flightSerie) {
    Integer flightSerieId = flightSeriesSequenceGenerator.getNext();
    flightSerie.setFlightSerieId(flightSerieId);
    flightSerieMapper.create(flightSerie);

    // create legs in batch mode
    for (Leg leg : flightSerie.getFlightLegs()) {
        Integer flightLegId = flightLegsSequenceGenerator.getNext();
        leg.setLegId(flightLegId);
        legMapper.create(leg);
    }
    return flightSerie;
}
mybatis is configured as follows in application.properties:
# this can be externalized if necessary
mybatis.config-location=classpath:mybatis-config.xml
mybatis.executor-type=BATCH
This means that mybatis will execute all statements in batch mode by default, including single insert/update/delete statements. Is this OK? Are there any issues I should be aware of?
Another approach would be to use a dedicated SQLSession specifically for the LegMapper. Which approach is the best (dedicated SQLSession vs global setting in application.properties)?
Note: I have seen other examples where "batch inserts" are created using a <foreach/> loop directly in the mybatis xml mapper file. I don't want to use this approach because it does not actually provide a batch insert.
As @Ian Lim said, make sure you annotate mapper methods that do inserts and updates with the @Flush annotation if you globally set the executor type to BATCH.
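A minimal sketch of what that could look like on the mapper side (LegMapper and Leg are taken from the question; the flush method uses mybatis' org.apache.ibatis.annotations.Flush and can be called once the loop of legMapper.create(leg) calls is done):

import java.util.List;
import org.apache.ibatis.annotations.Flush;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.executor.BatchResult;

@Mapper
public interface LegMapper {

    // insert statement mapped in the XML mapper file
    void create(Leg leg);

    // with the BATCH executor, calling this sends the pending batched
    // statements to the database and returns their results
    @Flush
    List<BatchResult> flush();
}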
Another approach would be to use a dedicated SQLSession specifically for the LegMapper. Which approach is the best (dedicated SQLSession vs global setting in application.properties)?
Keep in mind that if you are using different SQL sessions for different mappers, there will be a different transaction for each SQL session. If a service or service method annotated with @Transactional uses several mappers that use different SQL sessions, it will allocate different SQL transactions. So it is impossible to do an atomic data operation that involves mappers with different SQL sessions.

JPA cache behaviour when invoke count() method on Spring Data JPA Repository

I'm writing a transactional JUnit-based IT test for a Spring Data JPA repository.
To check the number of rows in the table I use a separate JdbcTemplate.
I noticed that in a transactional context, invoking org.springframework.data.repository.CrudRepository#save(S) doesn't take effect: the SQL insert is not performed and the number of rows in the table is not increased.
But if I invoke org.springframework.data.repository.CrudRepository#count after the save(S), then the SQL insert is performed and the number of rows is increased.
I guess this is the behavior of the JPA cache, but how does it work in detail?
Code with Spring Boot:
@RunWith(SpringRunner.class)
@SpringBootTest
public class ErrorMessageEntityRepositoryTest {

    @Autowired
    private ErrorMessageEntityRepository errorMessageEntityRepository;

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    @Transactional
    public void save() {
        ErrorMessageEntity errorMessageEntity = aDefaultErrorMessageEntity().withUuid(null).build();
        assertTrue(TestTransaction.isActive());
        int sizeBefore = JdbcTestUtils.countRowsInTable(jdbcTemplate, "error_message");
        ErrorMessageEntity saved = errorMessageEntityRepository.save(errorMessageEntity);
        errorMessageEntityRepository.count(); // [!!!!] if this line is commented out, the test fails
        int sizeAfter = JdbcTestUtils.countRowsInTable(jdbcTemplate, "error_message");
        Assert.assertEquals(sizeBefore + 1, sizeAfter);
    }
}
Entity:
@Entity(name = "error_message")
public class ErrorMessageEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private UUID uuid;

    @NotNull
    private String details;
}
Repository:
public interface ErrorMessageEntityRepository extends CrudRepository<ErrorMessageEntity, UUID>
You are correct, this is a result of how JPA works.
JPA tries to delay SQL statement execution as long as possible.
When saving new instances, this means it will only perform an insert if it is required in order to get an id for the entity.
Only when a flush event occurs will all changes stored in the persistence context be flushed to the database. There are three triggers for that event:
1. The closing of the persistence context flushes all the changes. In a typical setup this is tied to a transaction commit.
2. Explicitly calling flush on the EntityManager, which you might do directly or, when using Spring Data JPA, via saveAndFlush.
3. Before executing a query, since you typically want to see your changes in the query result.
Number 3 is the effect you are seeing.
Note that the details are a little more complicated since you can configure a lot of this stuff. As usual, Vlad Mihalcea has written an excellent post about it.
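If you want the test to pass without relying on the count() query, you can trigger the flush yourself. A sketch, assuming the EntityManager is simply injected into the test class next to the JdbcTemplate:

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@PersistenceContext
private EntityManager entityManager;

@Test
@Transactional
public void save() {
    ErrorMessageEntity errorMessageEntity = aDefaultErrorMessageEntity().withUuid(null).build();
    int sizeBefore = JdbcTestUtils.countRowsInTable(jdbcTemplate, "error_message");
    errorMessageEntityRepository.save(errorMessageEntity);
    // force the pending INSERT to the database so the JdbcTemplate-based count can see it
    entityManager.flush();
    int sizeAfter = JdbcTestUtils.countRowsInTable(jdbcTemplate, "error_message");
    Assert.assertEquals(sizeBefore + 1, sizeAfter);
}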
To keep test data from polluting the database, Spring's test framework rolls the transaction back by default when a test is transactional; that is, @Rollback is true by default. If you want to keep the test data without rolling back, you can set @Rollback(value = false). If you are using a MySQL database and find that the transaction is still not rolled back even with rollback enabled, check whether the table's storage engine is InnoDB, because other engines such as MyISAM and Memory do not support transactions.
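For example, a sketch of keeping the data inserted by the test from the question:

import org.springframework.test.annotation.Rollback;

@Test
@Transactional
@Rollback(value = false) // keep the inserted rows after the test finishes
public void save() {
    // ... same test body as above
}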

Is it good practice to use AccessBean or SQL to fetch data from OOTB table in IBM WCS

I want to get data from multiple OOTB WCS tables for which there is no OOTB REST service available. I am using multiple access beans in a databean to get data from those tables. Is this a good practice, or should we use ServerJDBCHelperAccessBean to make a single query with joins to hit the database? I understand that access beans are cached, but there are techniques to cache SQL as well.
Is there any other reason we should use an AccessBean instead of ServerJDBCHelperAccessBean when fetching data from multiple tables, or should we use ServerJDBCHelperAccessBean and get the data in a single SQL query with joins?
And which of the above approaches will be more expensive?
Thanks
Ankit
There is no hard and fast rule to choose between the above two methods for database interactions; the developer has to make a logical choice.
AccessBeans
Caching is one of the advantages of access beans. That is a good performance improvement and is achieved by caching the home objects, as the lookup for home objects is costly. Another point in favour of access beans is the handling of optimistic updates. Your case is to get the data (not to update/insert), and hence you are safe here.
Session Bean
Like access beans, session beans are another way of reading data from the DB when you want to get data from multiple tables. A session bean must extend the BaseJDBCHelper class, as in the example below.
public class TestSessionBean extends com.ibm.commerce.base.helpers.BaseJDBCHelper
        implements SessionBean {

    public Object fetchResults() throws javax.naming.NamingException, SQLException {
        try {
            // get a connection from the WebSphere Commerce data source
            makeConnection();
            PreparedStatement prepStatement = getPreparedStatement("sql to execute");
            ResultSet rs = executeQuery(prepStatement, false);
            // map the ResultSet into a result object here, while the connection is still open
            return rs;
        }
        finally {
            closeConnection();
        }
    }
}
Using ServerJDBCHelperAccessBean
This is used when you have to make a DB transaction outside of EJBs. Keep in mind that it is highly recommended to use EJBs for updates/deletes to keep the overall integrity.
In your case, as far as I understand, it is a select involving multiple tables and you are not keen on the data being exactly in sync (i.e. you are OK with missing a record that was updated nanoseconds ago). Hence you can go ahead with the second or third approach.
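For completeness, a ServerJDBCHelperAccessBean call for such a joined select typically looks something like the sketch below; the exact method name and return type are written from memory and should be checked against the WCS Javadoc for your version:

import java.util.Vector;
import com.ibm.commerce.base.helpers.ServerJDBCHelperAccessBean;

public class JoinedDataFetcher {
    // sketch only: executeQuery is assumed to take the SQL string and return
    // a Vector of rows, each row being a Vector of column values
    public Vector fetchJoinedData(String sql) throws Exception {
        ServerJDBCHelperAccessBean jdbcHelper = new ServerJDBCHelperAccessBean();
        return jdbcHelper.executeQuery(sql);
    }
}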
A good reference: http://deepakpadmakumar.blogspot.com.au/2012/05/session-beans-and-entity-beans-in-wcs.html

Trouble with Multi-Tenant Schema Generator Example

We are attempting to use CFE to generate one schema for each tenant as outlined in the CodeFluent blog post (http://blog.codefluententities.com/2014/12/04/multi-tenant-using-multiple-schema/). In this scenario, we expect each generated schema to be identical, and we are using the ICodeFluentPersistence hook system to identify the company for a user and then set the schema to be used. All of that works fine, but when we run the code to generate the multiple schemas (https://github.com/SoftFluent/CodeFluent-Entities/tree/master/Extensions/SoftFluent.MultiTenantGenerator), it removes the constraints. I then tried to see if there was an issue with my configuration, but running the sample program from GitHub produces the same results: after running the sample program, the primary key was not present in the contoso schema, even though it was properly defined in the dbo schema (and in the model).
Has anyone used the CFE Multi-Schema generator or have any insight into what the issue may be?
Thanks for your response, but I am not sure that I agree. The whole reason (at least for me) to use the Multi-Tenant generator is to create as many database schemas as needed (one per client) from a single CFE model. The idea that you would lose the constraints in all but one of them didn't feel right, so I did a bit more investigation: "Microsoft SQL Server 2012 Internals" by Kalen Delaney and Craig Freeman (through Google Books) indicates that constraint names only need to be unique within their schema, and in fact I was able to do a quick test to prove this out by creating two identical tables with identical PK names in two different schemas.
So it would appear to me that CFE should be able to create the two identical databases from the same model, and this seems to point to a deficiency in the SQL Server diff engine.
The multi-schema generator loads the model and changes it dynamically to modify the schema of the entities. Then it calls the standard code production process with only the database producers (SQL Server, Oracle, etc.).
So if you want to generate 2 different schemas (dbo and contoso) against an empty database, the process is the following:
Generate the database for the dbo schema from a blank database
Generate the database for the contoso schema from the previously generated database
Before creating a constraint, the SQL Server diff engine drops any existing constraint with the same name. In fact SQL Server does not allow 2 constraints to have the same name (I can't find a page on MSDN with more details about that). So in your case the existing PK is dropped when you generate the contoso schema, because the name of the PK is the same as the one that exists in the dbo schema. Maybe this can be improved, but the diff engine tries to generate code that works for SQL Server 2000 to SQL Server 2016.
Workarounds
You can generate each schema in a different database, so the diff engine will generate the code you expect. Then you can run the generated scripts on the production database. Not the easiest way, but it should work.
You can use the patch producer to replace the name of the schema in the file. For SQL files you should use the SqlServerPatchProducer as explained in the KnowledgeBase:
namespace Sample
{
    public class SqlServerPatchProducer : SqlServerProducer
    {
        public SqlServerPatchProducer()
        {
        }

        protected override void RunProceduresScript()
        {
            string path = GetPath(Project.DefaultNamespace + "_procedures.sql");
            ProduceFrom(path, "before");
            SearchAndReplaceProducer.ProducePatches(Project, null, this, null, ProductionFlags, Element);
            Utilities.RunFileScript(path, Database, OutputEncoding);
            ProduceFrom(path, "after");
        }
    }
}