First the problem statement:
I am using Spring Batch in my DEV environment without issue. When I move the code to a production environment I run into a problem: in DEV, Spring Batch is able to create its transaction metadata tables in our DB2 database server without a problem. That is not an option in PROD, where this is a read-only job.
Attempted solution:
Searching Stack Overflow I found this posting:
Spring-Batch without persisting metadata to database?
Which sounded perfect, so I added
@Bean
public ResourcelessTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) throws Exception {
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager);
    return mapJobRepositoryFactoryBean.getObject();
}
I also added it to my Job by calling .repository(jobRepository).
But I get
Caused by: java.lang.NullPointerException: null
at org.springframework.batch.core.repository.dao.MapJobExecutionDao.synchronizeStatus(MapJobExecutionDao.java:158) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
So I am not sure what to do here. I am new to Spring, so I am teaching myself as I go. I am open to other solutions, such as an in-memory database, but I have not been able to get them to work either. I do NOT need to save any state or session information between runs, but the database query I am running will return around a million rows, so I will need to read them in chunks.
Any suggestions or help would be greatly appreciated.
Add these beans to your application configuration class:
@Bean
public PlatformTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobExplorer jobExplorer() throws Exception {
    MapJobExplorerFactoryBean jobExplorerFactory = new MapJobExplorerFactoryBean(mapJobRepositoryFactoryBean());
    jobExplorerFactory.afterPropertiesSet();
    return jobExplorerFactory.getObject();
}

@Bean
public MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean() {
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean();
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager());
    return mapJobRepositoryFactoryBean;
}

@Bean
public JobRepository jobRepository() throws Exception {
    return mapJobRepositoryFactoryBean().getObject();
}

@Bean
public JobLauncher jobLauncher() throws Exception {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository());
    return simpleJobLauncher;
}
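For completeness, here is a minimal sketch (not part of the original answer) of how a job definition might point at this in-memory repository; jobBuilderFactory (from @EnableBatchProcessing) and myStep() are assumed to exist elsewhere in your configuration:

@Bean
public Job myJob() throws Exception {
    // Sketch only: jobBuilderFactory is assumed to be autowired and myStep() defined elsewhere.
    return jobBuilderFactory.get("myJob")
            .repository(jobRepository())   // use the in-memory repository instead of DB2
            .start(myStep())
            .build();
}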
This doesn't directly answer your question, but that is not a good solution; the map-based repository is meant to be used only for testing, and it will grow in memory indefinitely.
I suggest you use an embedded database like SQLite. The main problem with using a separate database for the job metadata is that you then have to coordinate transactions between the two databases (so that the state of the metadata matches that of the data), but since it seems you're not even writing to the main database, that probably won't be a problem for you.
You could use an in-memory database (for example H2 or HSQL) quite easily. You can find examples here: http://www.mkyong.com/spring/spring-embedded-database-examples/.
As for the Map-backed job repository, it does provide a method to clear its contents:
public void clear()
Convenience method to clear all the map DAOs globally, removing all entities.
Be aware that a Map-based job repository is not fit for use with partitioned steps or other multi-threaded scenarios.
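As an illustration only (not from the original answer), one way to use that clear() method so the in-memory maps don't grow across runs, assuming the mapJobRepositoryFactoryBean() and jobLauncher() beans shown above are available for injection:

@Component
public class JobRunner {

    // Sketch only: bean names follow the configuration shown earlier in this thread.
    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean;

    public void runAndClean(Job job, JobParameters params) throws Exception {
        jobLauncher.run(job, params);
        // Drop all job/step execution entities held in memory after each run
        mapJobRepositoryFactoryBean.clear();
    }
}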
The following seems to have done the job for me:
@Bean
public DataSource dataSource() {
    EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
    EmbeddedDatabase db = builder
            .setType(EmbeddedDatabaseType.HSQL)
            .build();
    return db;
}
Now Spring is not creating tables in our production database, and since the state is lost when the JVM exits, nothing seems to be hanging around.
UPDATE: The above code caused concurrency errors for us. We have addressed this by abandoning the EmbeddedDatabaseBuilder and declaring the HSQLDB data source this way instead:
@Bean
public BasicDataSource dataSource() {
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName("org.hsqldb.jdbcDriver");
    dataSource.setUrl("jdbc:hsqldb:mem:testdb;sql.enforce_strict_size=true;hsqldb.tx=mvcc");
    dataSource.setUsername("sa");
    dataSource.setPassword("");
    return dataSource;
}
The primary difference is that we are able to specify mvcc (Multiversion Concurrency Control) in the connection string, which resolves the issue.
Related
My goal is basically to create a very simple backend application with PostgreSQL and Spring Boot. Every time I run my program I need to insert data into my database table, because for some reason it does not save permanently. Is this normal behaviour? To be frank, I'm pretty new to PostgreSQL and Spring Boot, so I'm sorry if the answer to this question is obvious.
My configuration file:
@Configuration
public class DatabaseConfig {

    @Bean
    CommandLineRunner commandLineRunner(BlogpostRepository blogrep, CategoryRepository catrep) {
        return args -> {
            blogPost blog1 = new blogPost(1, "asd", "asd", "asd", "asd");
            blogPost blog2 = new blogPost(2, "asd2", "asd2", "asd2", "asd2");
            Category cat1 = new Category(1, "titles1");
            Category cat2 = new Category(2, "titles2");
            Category cat3 = new Category(3, "titles3");
            blogrep.saveAll(
                    List.of(blog1, blog2)
            );
            catrep.saveAll(
                    List.of(cat1, cat2, cat3)
            );
        };
    }
}
The solution to this problem was in the application.properties file. I changed the ddl-auto setting from create-drop to:

spring.jpa.hibernate.ddl-auto=update
I'm currently designing a Spring Batch application that reads from a table, transforms the data and then writes it to another table.
However, before I begin reading the source table, I need to collect some metadata for the application run (e.g. read the holiday calendar table to determine whether it's a bank holiday or not). This metadata will not change during runtime, so it needs to be read only once, at the very beginning of the application run.
How can this be achieved? Use a JobListener? Configure a separate Job for this and then pass the information to the "actual" job through an ExecutionContext? Configure a separate step that gets only executed once?
Configure a JobExecutionListener to get the information you need and store it on the Job's ExecutionContext.
You can create a listener class that either extends JobExecutionListenerSupport and overrides only the beforeJob method, or a standalone listener class with a beforeJob method annotated with @BeforeJob.
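As a rough sketch (class and key names are illustrative, not from the original answer), such a listener could read the holiday calendar once and stash the result in the job's ExecutionContext:

public class MyListener extends JobExecutionListenerSupport {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Hypothetical one-time metadata lookup before any step runs
        Date myDate = lookUpHolidayCalendar();
        jobExecution.getExecutionContext().put("myDate", myDate);
    }

    private Date lookUpHolidayCalendar() {
        // Placeholder for the real query against the holiday calendar table
        return new Date();
    }
}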
When configuring the job, just add an instance of your custom listener class to your JobBuilder configuration before adding any steps:
@Bean
public Job myJob() {
    return this.jobBuilderFactory.get("myJob")
            .listener(new MyListener())
            .start(step1())
            .next(step2())
            .next(step3())
            .build();
}
Anything you add to your Job's ExecutionContext can then be injected into any other Processor/Reader/Writer/Step beans that are configured, as long as they are annotated with either @JobScope or @StepScope:
@Bean
@JobScope
public ItemReader<MyItem> myItemReader(
        @Value("#{jobExecutionContext['myDate']}") Date myDate) {
    //...
}
Component classes work the same way as well:
@Component
@JobScope
static class MyProcessor implements ItemProcessor<ItemA, ItemB> {

    private Date myDate;

    public MyProcessor(
            @Value("#{jobExecutionContext['myDate']}") Date myDate) {
        this.myDate = myDate;
    }

    // ...
}
I'm aware that all Spring Batch steps need to have a reader, a writer, and optionally a processor. So even though my step only needs a writer, I am also fibbing a reader that does nothing but make Spring happy.
This is based on the solution found here. Is it outdated, or am I missing something?
I have a Spring Batch job that has two chunked steps. My first step, deleteCounts, just deletes all rows from the table so that the second step has a clean slate. This means my first step doesn't need a reader, so I followed the Stack Overflow solution linked above, created a NoOpItemReader, and added it to my step builder object (code at the bottom).
My writer is mapped to a simple SQL statement that deletes all the rows from the table (code at the bottom).
My table is not being cleared by the deleteCounts step. I expect deleteCounts to delete all rows from the table, yet it does not, and I suspect it's because of my "fibbed" reader, but I am not sure what I'm doing wrong.
My delete statement:
<delete id="delete">
DELETE FROM ${schemaname}.DERP
</delete>
My deleteCounts Step:
#Bean
#JobScope
public Step deleteCounts() {
StepBuilder sb = stepBuilderFactory.get("deleteCounts");
SimpleStepBuilder<ProcessedCountData, ProcessedCountData> ssb = sb.<ProcessedCountData, ProcessedCountData>chunk(10);
ssb.reader(noOpItemReader());
ssb.writer(writerFactory.myBatisBatchWriter(COUNT_DATA_DELETE));
ssb.startLimit(1);
ssb.allowStartIfComplete(true);
return ssb.build();
}
My NoOpItemReader, based on the previously linked solution on stackoverflow:
public NoOpItemReader<? extends ProcessedCountData> noOpItemReader() {
    return new NoOpItemReader<>();
}

// for steps that do not need to read anything
public class NoOpItemReader<T> implements ItemReader<T> {
    @Override
    public T read() throws Exception {
        return null;
    }
}
I left out some MyBatis plumbing, since I know that is working (step 2 is much more involved with the MyBatis stuff, and step 2 is inserting rows just fine). Deleting is so simple, it must be something with my step config...
Your NoOpItemReader returns null. An ItemReader returning null indicates that the input has been exhausted. Since, in your case, that is all it ever returns, the framework assumes there was no input in the first place, so no chunk is ever passed to the writer.
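One common workaround, sketched below (not the only option), is a reader that returns a single dummy item on the first call and null afterwards, so one chunk is processed and the writer fires exactly once; alternatively, a TaskletStep avoids the reader/writer pairing altogether. This sketch assumes ProcessedCountData has a no-arg constructor:

// Sketch: emits exactly one item, then signals end of input.
public class SingleItemReader implements ItemReader<ProcessedCountData> {

    private boolean alreadyRead = false;

    @Override
    public ProcessedCountData read() {
        if (alreadyRead) {
            return null;                       // end of input after the first item
        }
        alreadyRead = true;
        return new ProcessedCountData();       // dummy item just to trigger the writer once
    }
}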
I'm thinking that it makes sense to test my VS2015 EF Code First project with the data that gets created by seeding. It's not clear to me what should be in the test project in terms of setup, teardown, and the actual tests.
Is there an example someone can point me at that shows this? Also, am I off base in thinking this is a good way to test (with seeded data)? I have not been able to find examples of that. The examples I see seem a lot more complex, with mocked data instead.
You haven't specified whether you are using MSTest or not, but I just had this problem today and this is what I did using MSTest. This base test class handles the seeding on the first test that runs. The Initialize(false) call makes it so it won't try to initialize on subsequent test runs, so only the first test pays the setup price. Since each test runs in a transaction, the changes made in each test will be rolled back.
[TestClass]
public abstract class EntityFrameworkTest
{
    private static bool _hasSeeded;
    protected TransactionScope Scope;

    [TestInitialize]
    public void Initialize()
    {
        Database.SetInitializer(new MigrateDatabaseToLatestVersion<YourContext, YourModelNameSpace.Migrations.Configuration>());
        using (var context = new YourContext())
        {
            context.Database.Initialize(false);
            if (!_hasSeeded)
            {
                context.AnEntity.AddOrUpdate(c => c.EntityName, new AnEntity { EntityName = "Testing Entity 1" });
                context.SaveChanges();
                _hasSeeded = true;
            }
        }
        Scope = new TransactionScope();
    }

    [TestCleanup]
    public void CleanUp()
    {
        Scope.Dispose();
    }

    [AssemblyCleanup]
    public static void KillDb()
    {
        using (var context = new YourContext())
            context.Database.Delete();
    }
}
It is also worth noting that I set up my test project's app.config with a connection string like the one below, which my context is set to look for (ConnStringName). The idea is that each dev's machine will just create a testing DB in their LocalDB instance, and they won't have to fiddle with changing the connection string if their actual SQL instance setup is different. Also, depending on whether you are on VS 2015 or not, your LocalDB data source may vary.
<add name="ConnStringName" connectionString="Data Source=(localdb)\MSSQLLocalDB; Initial Catalog=DbNameTestingInstance; Integrated Security=True; MultipleActiveResultSets=True;Application Name=Testing Framework;" providerName="System.Data.SqlClient" />
I use Spring Batch 3.0.3.RELEASE in Grails 2.4.4.
I get the following exception when I execute the code below:
"When @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1, updates to non-transactional tables can only be done in either autocommitted statements or single-statement transactions, and never in the same statement as updates to transactional tables."
The code is:
List<Flow> flowList = Lists.newArrayList()

Shop.findAllByCityIdAndTypeAndStatus(cityId, 1 as byte, 1 as byte).each { Shop stationShop ->
    TaskletStep taskletStep = stepBuilderFactory.get("copy_city_item_to_station").tasklet(new Tasklet() {
        @Override
        RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            copyCityItemToStationItem(item, stationShop)
            return RepeatStatus.FINISHED
        }
    }).build()

    Flow flow = new FlowBuilder<Flow>("subflow").from(taskletStep).end();
    flowList.add(flow)
}

Flow splitFlow = new FlowBuilder<Flow>("split_city_item_to_station").split(eventTaskExecutor).add(flowList.toArray(new Flow[0])).build();
FlowJobBuilder builder = jobBuilderFactory.get("push_item_to_all_station").start(splitFlow).end();
Job job = builder.preventRestart().build()
jobLauncher.run(job, new JobParametersBuilder().addLong("city.item.id", item.id).toJobParameters())
Googling suggests the problem may be related to https://dev.mysql.com/doc/refman/5.6/en/replication-gtids-restrictions.html, so I replaced every ENGINE declaration in the file 'schema-mysql.sql' from MyISAM to InnoDB, and it works.
Now I want to know whether what I did is right or not. Is there a potential bug in my approach?
What you did is correct. That's a bug in Spring Batch's generated SQL file for MySQL. I've created an issue in Jira that you can follow here: https://jira.spring.io/browse/BATCH-2373.