Any thoughts on improving a Spring Batch job's restart speed? - spring-batch

I have a Spring Batch job which copies data from tables in one Oracle schema to another.
To do this, I have written a partitioned job.
For example:
Case A:
I have a table A with 100,000 rows, which I split into 100 steps of 1,000 rows each. All these inserts are done in parallel using a ThreadPoolTaskExecutor. When 4 steps failed due to some issue and I restarted the job, it successfully re-ran the 4 failed steps within seconds, as I expected.
Case B:
Say a table A containing 32 million rows is to be copied from source to destination.
So I split this job into steps of 1,000 rows each, which creates 32,000 steps.
Out of these 32,000 steps, 4 fail due to some DB issue. When I try to restart the job, Spring Batch just hangs; I don't know what processing is going on behind the restart, but it just does not do anything. I waited for 5 hours and then killed the job.
Can someone help me understand what happens behind a restart, how the total number of steps affects the restart, and what I should do to improve its speed?
Please let me know if any more info is needed.
Updates:
I was able to speed up Case B by implementing PartitionNameProvider and returning only the names of the failed steps via getPartitionNames during restart. Great. Roughly, the partitioner now looks like the sketch below (simplified: the COPY_ metadata table prefix is from my job configuration, and the real query is scoped to the current job instance):
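import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.partition.support.PartitionNameProvider;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcTemplate;

public class RangePartitioner implements Partitioner, PartitionNameProvider {

    private final JdbcTemplate jdbcTemplate;

    public RangePartitioner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // First execution: build one ExecutionContext per 1,000-row range.
    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        // ... populate "partition0".."partitionN" with min/max row keys ...
        return partitions;
    }

    // Restart: called instead of partition(); returning only the failed
    // partition names keeps Spring Batch from touching the completed ones.
    @Override
    public Collection<String> getPartitionNames(int gridSize) {
        List<String> failedSteps = jdbcTemplate.queryForList(
                "SELECT STEP_NAME FROM COPY_STEP_EXECUTION WHERE STATUS = 'FAILED'",
                String.class);
        List<String> names = new ArrayList<>();
        for (String stepName : failedSteps) {
            // the metadata stores partition steps as "masterStep:partitionKey";
            // the splitter wants the partitionKey part
            names.add(stepName.substring(stepName.indexOf(':') + 1));
        }
        return names;
    }
}

I then tested this restart with a larger number of failures, and I have one more case now.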
Case C:
If I have 20,000 steps and all 20,000 of them failed, then on restart getPartitionNames returns all 20,000 step names. In this case I face the problem I mentioned above: the process hangs.
I tried to understand what was going on behind the job by enabling Spring debug logs (which took me a long time to discover, but it was worth it). I saw a specific set of queries running over and over:
2019-02-22 06:40:27 DEBUG ResourcelessTransactionManager:755 - Initiating transaction commit
2019-02-22 06:40:27 DEBUG ResourcelessTransactionManager:39 - Committing resourceless transaction on [org.springframework.batch.support.transaction.ResourcelessTransactionManager$ResourcelessTransaction@5743a94e]
2019-02-22 06:40:27 DEBUG ResourcelessTransactionManager:367 - Creating new transaction with name [org.springframework.batch.core.repository.support.SimpleJobRepository.getLastStepExecution]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT
2019-02-22 06:40:27 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:27 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, CREATE_TIME, LAST_UPDATED, VERSION, JOB_CONFIGURATION_LOCATION from COPY_JOB_EXECUTION where JOB_INSTANCE_ID = ? order by JOB_EXECUTION_ID desc]
2019-02-22 06:40:27 DEBUG DataSourceUtils:110 - Fetching JDBC Connection from DataSource
2019-02-22 06:40:27 DEBUG DataSourceUtils:114 - Registering transaction synchronization for JDBC Connection
2019-02-22 06:40:27 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:27 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from COPY_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2019-02-22 06:40:27 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:27 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT STEP_EXECUTION_ID, STEP_NAME, START_TIME, END_TIME, STATUS, COMMIT_COUNT, READ_COUNT, FILTER_COUNT, WRITE_COUNT, EXIT_CODE, EXIT_MESSAGE, READ_SKIP_COUNT, WRITE_SKIP_COUNT, PROCESS_SKIP_COUNT, ROLLBACK_COUNT, LAST_UPDATED, VERSION from COPY_STEP_EXECUTION where JOB_EXECUTION_ID = ? order by STEP_EXECUTION_ID]
2019-02-22 06:40:27 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:27 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT STEP_EXECUTION_ID, STEP_NAME, START_TIME, END_TIME, STATUS, COMMIT_COUNT, READ_COUNT, FILTER_COUNT, WRITE_COUNT, EXIT_CODE, EXIT_MESSAGE, READ_SKIP_COUNT, WRITE_SKIP_COUNT, PROCESS_SKIP_COUNT, ROLLBACK_COUNT, LAST_UPDATED, VERSION from COPY_STEP_EXECUTION where JOB_EXECUTION_ID = ? order by STEP_EXECUTION_ID]
2019-02-22 06:40:30 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:30 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT SHORT_CONTEXT, SERIALIZED_CONTEXT FROM COPY_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID = ?]
2019-02-22 06:40:30 DEBUG JdbcTemplate:682 - Executing prepared SQL query
2019-02-22 06:40:30 DEBUG JdbcTemplate:616 - Executing prepared SQL statement [SELECT SHORT_CONTEXT, SERIALIZED_CONTEXT FROM COPY_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID = ?]
2019-02-22 06:40:30 DEBUG DataSourceUtils:327 - Returning JDBC Connection to DataSource
and so on...
I understand that Spring is executing getLastStepExecution for each failed step, one by one. But why is getLastStepExecution done this way? Is there any way to configure it, such as reading all the step executions in bulk before processing, so as to reduce the restart time?
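To illustrate, this is the kind of bulk read I have in mind; a hypothetical sketch against my COPY_STEP_EXECUTION metadata table, not an existing Spring Batch API:

import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;

public class StepExecutionBulkReader {

    private final JdbcTemplate jdbcTemplate;

    public StepExecutionBulkReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // One round trip for all partitions of the previous execution, instead
    // of one getLastStepExecution query per partition name.
    public List<Map<String, Object>> stepExecutionsOf(long lastJobExecutionId) {
        return jdbcTemplate.queryForList(
                "SELECT STEP_NAME, STATUS, EXIT_CODE, VERSION "
                        + "FROM COPY_STEP_EXECUTION WHERE JOB_EXECUTION_ID = ?",
                lastJobExecutionId);
    }
}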
Thanks in advance.

Related

Postgres crashes when selecting from view

I have a view in Postgres with the following definition:
CREATE VIEW participant_data_view AS
SELECT participant_profile.*,
"user".public_id, "user".created, "user".status, "user".region,"user".name, "user".email, "user".locale,
(SELECT COUNT(id) FROM message_log WHERE message_log.target_id = "user".id AND message_log.type = 'diary') AS diary_reminder_count,
(SELECT SUM(pills) FROM "order" WHERE "order".user_id = "user".id AND "order".status = 'complete') AS pills
FROM participant_profile
JOIN "user" ON "user".id = participant_profile.id
;
The view creation works just fine. However, when I query the view with SELECT * FROM participant_data_view, Postgres crashes with
10:24:46.345 WARN HikariPool-1 - Connection org.postgresql.jdbc.PgConnection@172d19fe marked as broken because of SQLSTATE(08006), ErrorCode(0) c.z.h.p.ProxyConnection
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
This question suggests to me that it might be an internal assertion that causes the crash.
If I remove the diary_reminder_count field from the view definition, the select works just fine.
What am I doing wrong? How can I fix my view, or change it so I can query the same data in a different way?
Note that creating the view works just fine; it only crashes when querying it.
I tried running explain (analyze) select * from participant_data_view; from the IntelliJ query console, which only returns
[2020-12-08 11:13:56] [08006] An I/O error occurred while sending to the backend.
[2020-12-08 11:13:56] java.io.EOFException
I ran the same using psql, there it returns
my-database=# explain (analyze) select * from participant_data_view;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Looking at the log files, I found:
2020-12-08 10:24:01.383 CET [111] LOG: server process (PID 89670) was terminated by signal 9: Killed: 9
2020-12-08 10:24:01.383 CET [111] DETAIL: Failed process was running: select "public"."participant_data_view"."id", "public"."participant_data_view"."study_number", <snip many other fields>,
"public"."participant_data_view"."diary_reminder_count", "public"."participant
2020-12-08 10:24:01.383 CET [111] LOG: terminating any other active server processes
In all likelihood, the Linux kernel out-of-memory killer killed your query because the system ran out of memory.
Either restrict the number of database sessions (for example with a connection pool) or reduce work_mem.
It is usually a good idea to set vm.overcommit_memory = 2 in the kernel and tune vm.overcommit_ratio appropriately.
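For the connection-pool option, a minimal sketch with HikariCP (which your log shows you already use); the JDBC URL, credentials, and the cap of 10 are placeholders to tune against your memory budget:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledDataSource {

    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder
        config.setUsername("app");     // placeholder
        config.setPassword("secret");  // placeholder
        // Cap concurrent sessions so sessions * work_mem (plus per-backend
        // overhead) stays within the memory available to PostgreSQL.
        config.setMaximumPoolSize(10); // example value
        return new HikariDataSource(config);
    }
}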

Why does a MyBatis insert not return the last insert id?

I am using MyBatis to insert a record like this:
@Override
public void lockRecordHostory(OperateInfo operateInfo) {
WalletLockedRecordHistory lockedRecordHistory = new WalletLockedRecordHistory();
JSONObject jsonObject = JSON.parseObject(operateInfo.getParam(), JSONObject.class);
lockedRecordHistory.setParam(operateInfo.getParam());
int result = lockedRecordHistoryMapper.insertSelective(lockedRecordHistory);
log.info("result: {}", result);
}
Why is the result value always 1 and not the last insert id? I turned on MyBatis debug logging, and it logged:
DEBUG [http-nio-11002-exec-7] - JDBC Connection [com.alibaba.druid.proxy.jdbc.ConnectionProxyImpl@33d1051f] will be managed by Spring
DEBUG [http-nio-11002-exec-7] - ==> Preparing: insert into wallet_locked_record_history ( locked_amount, created_time, updated_time, user_id, locked_type, operate_type, param ) values ( ?, ?, ?, ?, ?, ?, ? )
DEBUG [http-nio-11002-exec-7] - ==> Parameters: 1(Integer), 1566978734712(Long), 1566978734712(Long), 3114(Long), RED_ENVELOPE_BUMPED_LOCK(String), LOCKED(String), {"amount":1,"lockedType":"RED_ENVELOPE_BUMPED_LOCK","userId":3114}(String)
DEBUG [http-nio-11002-exec-7] - <== Updates: 1
DEBUG [http-nio-11002-exec-7] - ==> Preparing: SELECT LAST_INSERT_ID()
DEBUG [http-nio-11002-exec-7] - ==> Parameters:
DEBUG [http-nio-11002-exec-7] - <== Total: 1
DEBUG [http-nio-11002-exec-7] - Releasing transactional SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@420ad884]
Does the transaction affect the results?
It seems that the query that retrieves the value of the generated id uses a separate connection to MySQL.
This is from the MySQL documentation for the LAST_INSERT_ID function:
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client.
You are using a connection pool, and depending on its configuration different queries may be executed using different native JDBC Connection objects, that is, different connections to MySQL. So the second query returns the value that was generated (at some earlier time) on the wrong connection from the pool.
To overcome this, you need to configure the connection pool so that it does not release the connection after each statement. Configure it so that the pool uses the same connection until the proxy connection is released by your code (that is, when MyBatis closes the connection at the end of the transaction).
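With Spring-managed MyBatis, one way to keep both statements on one connection is to run them inside a single transaction. A sketch, assuming the mapper is configured with a selectKey or useGeneratedKeys/keyProperty="id" (the SELECT LAST_INSERT_ID() in your log suggests a selectKey) so the generated id is written back into the entity:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class WalletLockService {

    private final WalletLockedRecordHistoryMapper lockedRecordHistoryMapper;

    public WalletLockService(WalletLockedRecordHistoryMapper mapper) {
        this.lockedRecordHistoryMapper = mapper;
    }

    // Within the transaction, Spring binds one connection to the thread, so
    // the INSERT and the SELECT LAST_INSERT_ID() cannot land on different
    // pooled connections.
    @Transactional
    public long lockRecordHistory(OperateInfo operateInfo) {
        WalletLockedRecordHistory history = new WalletLockedRecordHistory();
        history.setParam(operateInfo.getParam());
        int rows = lockedRecordHistoryMapper.insertSelective(history); // row count, always 1 here
        return history.getId(); // generated id, populated via keyProperty
    }
}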

Status NOTRUN in automatic task scheduler

I created a DB2 task to run my stored procedure automatically at a specific time, using the ADMIN_TASK_ADD procedure:
CALL SYSPROC.ADMIN_TASK_ADD ( 'WR_AM_ADT_AUTO_CNRRM_SCHDLR',
NULL,
NULL,
NULL,
'05 16 * * *',
'ASPECT',
'WR_AM_ADT_AUTO_CNRRM',
'81930',NULL,NULL);
COMMIT;
I want to run my scheduled task every day at 04:05 PM, but it didn't work; the task status shows
NOTRUN, SQLCODE -104
So can anyone please tell me what I am doing wrong?
I also checked my scheduled task in the task list using the following query:
SELECT * from SYSTOOLS.ADMIN_TASK_LIST
I am using DB2 9.7 version on Windows.
The task status NOTRUN means an error prevented the scheduler from calling the task's procedure. The SQLCODE indicates the type of error; -104 (SQL0104N) is a syntax error in the SQL the task tried to execute.
I suggest the following:
Confirm the scheduler is enabled:
db2 > db2set
DB2_ATS_ENABLE=YES
ATS depends on the SYSTOOLSPACE tablespace to store historical data and configuration information. You can check if the tablespace exists in your system with the following query.
db2 select TBSPACE from SYSCAT.TABLESPACES where TBSPACE = 'SYSTOOLSPACE'
You can test the stored procedure in isolation:
CALL WR_AM_ADT_AUTO_CNRRM()
Then run your task in the scheduler!
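If it still shows NOTRUN, the companion view SYSTOOLS.ADMIN_TASK_STATUS should show the SQLCODE and error tokens for each attempted run; for example (column names as I recall them from the 9.7 documentation, so adjust if your level differs):

SELECT NAME, STATUS, SQLCODE, SQLSTATE, SQLERRMC
FROM SYSTOOLS.ADMIN_TASK_STATUS
WHERE NAME = 'WR_AM_ADT_AUTO_CNRRM_SCHDLR'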

Powercenter SQL1224N error connecting DB2

I'm running a workflow in PowerCenter that constantly gets an SQL1224N error.
The process executes a query against one table (POLIZA) with 800k rows. It retrieves the first 10k rows and then starts executing against another table with 75M rows; at that moment an idle-thread error appears in DB2, but the PWC process keeps running and retrieves the 75M rows. When that completes (after 20 minutes), the following errors come up, related to the first table:
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
Database driver error...
Function Name : Fetch
SQL Stmt : SELECT POLIZA.BSPOL_BSCODCIA, POLIZA.BSPOL_BSRAMOCO
FROM POLIZA
WHERE
EXA01.POLIZA.BSPOL_IDEMPR='0015' for read only with ur
Native error code = -1224
DB2 Fatal Error].
I have a similar process running against the same 2 tables, and it is working fine; the only difference I can see is that the DB2 user is different.
Any idea how I can fix this?
Regards
The common causes for -1224 are:
Your instance or database has crashed, or
Something/somebody is forcing off your application (FORCE APPLICATION or equivalent)
As for the crash, I think you would know by now. This typically requires a database or instance restart. At any rate, can you please have a look into your DIAGPATH to check for any FODC* directories whose timestamps match the timestamps of the -1224 errors?
As for the FORCE case, you should find some evidence of the -1224 in db2diag.log. Try searching for the decimal -1224, but also for its hex representation (0xFFFFFB38).
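For example, something like this (the db2diag.log path depends on your instance's DIAGPATH setting; the one below is a typical default):

grep -n -e '-1224' -e '0xFFFFFB38' /home/db2inst1/sqllib/db2dump/db2diag.log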

PostgreSQL hangs while executing DDL without log message

I have PostgreSQL 9.2.10 on CentOS.
I experience the following error:
DETAIL: Multiple failures --- write error might be permanent.
ERROR: could not open file "pg_tblspc/143862353/PG_9.2_201204301/16439/199534370_fsm": No such file or directory
This has happened since I stopped the PostgreSQL service, ran pg_resetxlog, and started the service. The logs in pg_log look good, and the service is listed without any problem.
DML works well, but DDL statements like CREATE TABLE do not; no error message is thrown, and nothing is visible in the logs in pg_log.
If I try to create a table, there is no reaction, and it looks like the statement is blocked by a lock.
So I tried the following query to look for locks:
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
You probably corrupted the PostgreSQL cluster with pg_resetxlog. How exactly did you run the command?
I would restore from the last good backup.