How to get Spring Batch completion status when a job is launched asynchronously - spring-batch

When Spring Batch jobs are launched asynchronously, how can we get the completion status of the job, i.e. whether it completed successfully or failed?
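One common approach (a minimal sketch, not tied to any code in the question) is to register a JobExecutionListener on the job: its afterJob callback runs when the asynchronously launched execution finishes and exposes the final BatchStatus. Alternatively, the JobExecution returned by jobLauncher.run(...) can be re-queried later via JobExplorer.getJobExecution(executionId).

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

// Hypothetical listener name; attach it to the job (e.g. via the job builder's listener(...) method).
public class CompletionStatusListener implements JobExecutionListener {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // nothing to do before the job starts
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Runs once the asynchronously launched execution finishes, whatever the outcome.
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("Job completed successfully");
        } else if (jobExecution.getStatus() == BatchStatus.FAILED) {
            System.out.println("Job failed: " + jobExecution.getAllFailureExceptions());
        }
    }
}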

Related

Airflow : Job <job-id> was killed before it finished (likely due to running out of memory)

I have a linear DAG with two tasks: the first task's truthy/falsy value decides whether the second task is executed. I am using ShortCircuitOperator for the first task so that the second task can be bypassed if needed. Following is my DAG code:
DAG_VERSION = "1.0.0"

with DAG(
    "sample_dag",
    catchup=False,
    tags=[DAG_VERSION],
    max_active_runs=1,
    schedule_interval=None,
    default_args=DEFAULT_ARGS,
) as dag:
    dag.doc_md = "Sample DAG"

    TASK_1 = ShortCircuitOperator(
        task_id="task_1",
        python_callable=test_script_1,
        executor_config=EXECUTOR_CONFIG,
    )

    TASK_2 = PythonOperator(
        task_id="task_2",
        python_callable=test_script_2,
        executor_config=EXECUTOR_CONFIG,
    )

    TASK_1 >> TASK_2
However, when I try to run the DAG, I get the following in the log for the first task when it returns a truthy value:
task_1 logs
Marking task as SUCCESS. dag_id=sample_dag, task_id=task_1, execution_date=20220606T060000, start_date=20220606T070012, end_date=20220606T070014
[2022-06-06, 07:00:17 UTC] State of this instance has been externally set to success. Terminating instance.
[2022-06-06, 07:00:17 UTC] Sending Signals.SIGTERM to GPID 18
[2022-06-06, 07:01:17 UTC] process psutil.Process(pid=18, name='airflow task runner: sample_dag task_1 scheduled__2022-06-06T06:00:00+00:00 7037', status='sleeping', started='07:00:12') did not respond to SIGTERM. Trying SIGKILL
[2022-06-06, 07:01:17 UTC] Process psutil.Process(pid=18, name='airflow task runner: sample_dag task_1 scheduled__2022-06-06T06:00:00+00:00 7037', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='07:00:12') (18) terminated with exit code Negsignal.SIGKILL
[2022-06-06, 07:01:17 UTC] Job 7037 was killed before it finished (likely due to running out of memory)
I am using the return value of the first task in the second task. When I try to log the XCom value of the first task inside the second task, I get None, which causes the second task to fail. This is my code for accessing the XCom value of the first task inside the second task:
def test_script_2(**context: models.xcom) -> List[str]:
    task_instance = context["task_instance"]
    return_value = task_instance.xcom_pull(task_ids="task_1")
    print("logging return value of first task ", return_value)
I am running Airflow 2.2.2 with the Kubernetes executor.
Is the None XCom value due to the out-of-memory issue in the first task? I tried adding a fixed value in the first task, but again None was returned in the second task, with the following log:
task_2 logs
Marking task as SUCCESS. dag_id=sample_dag, task_id=task_2, execution_date=20220606T095611, start_date=20220606T095637, end_date=20220606T095638
[2022-06-06, 09:56:42 UTC] State of this instance has been externally set to success. Terminating instance.
[2022-06-06, 09:56:42 UTC] Sending Signals.SIGTERM to GPID 18
[2022-06-06, 09:57:42 UTC] process psutil.Process(pid=18, name='airflow task runner: sample_dag task_2 manual__2022-06-06T09:56:11.804005+00:00 7051', status='sleeping', started='09:56:37') did not respond to SIGTERM. Trying SIGKILL
[2022-06-06, 09:57:42 UTC] Process psutil.Process(pid=18, name='airflow task runner: sample_dag task_2 manual__2022-06-06T09:56:11.804005+00:00 7051', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='09:56:37') (18) terminated with exit code Negsignal.SIGKILL
[2022-06-06, 09:57:42 UTC] Job 7051 was killed before it finished (likely due to running out of memory)
I am unable to find the issue in the code. I would appreciate any hint on where I am going wrong and how to get the XCom value of the first task.
Thanks

Spring Batch Job Stop Using jobOperator

I have started my job using jobLauncher.run(processJob, jobParameters); and when I try to stop the job from another request using jobOperator.stop(jobExecution.getId()); I get this exception:
org.springframework.batch.core.launch.JobExecutionNotRunningException:
JobExecution must be running so that it can be stopped
Set<JobExecution> jobExecutionsSet = jobExplorer.findRunningJobExecutions("processJob");
for (JobExecution jobExecution : jobExecutionsSet) {
    System.err.println("job status : " + jobExecution.getStatus());
    if (jobExecution.getStatus() == BatchStatus.STARTED
            || jobExecution.getStatus() == BatchStatus.STARTING
            || jobExecution.getStatus() == BatchStatus.STOPPING) {
        jobOperator.stop(jobExecution.getId());
        System.out.println("###########Stopped#########");
    }
}
When I print the job status, I always get job status : STOPPING, but the batch job is still running.
It's a web app: the user first uploads a CSV file and starts some operation using Spring Batch, and during this execution, if the user needs to stop it, a stop request comes in from another controller method and tries to stop the running job.
Please help me stop the running job.
If you stop a job while it is running (typically in a STARTED state), you should not get this exception. If you do get it, it means you tried to stop the job while it was already stopping (that is what the STOPPING status means).
jobExplorer.findRunningJobExecutions returns only running executions, so if on the very next line you see a job in STOPPING status, the status must have changed right after the call to jobExplorer.findRunningJobExecutions. You need to be aware that this is possible, and your controller should handle this case.
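For illustration, a guarded version of the stop loop from the question might look like the sketch below (the method name is made up; jobExplorer and jobOperator are assumed to be injected beans, and "processJob" is the job name from the question):

// Hypothetical helper: only stop executions that are still running, and tolerate the race
// where the status changes between the lookup and the stop call.
public void stopRunningExecutions() {
    Set<JobExecution> running = jobExplorer.findRunningJobExecutions("processJob");
    for (JobExecution jobExecution : running) {
        BatchStatus status = jobExecution.getStatus();
        if (status == BatchStatus.STARTED || status == BatchStatus.STARTING) {
            try {
                jobOperator.stop(jobExecution.getId());
            } catch (NoSuchJobExecutionException | JobExecutionNotRunningException e) {
                // The execution finished or moved to STOPPING after findRunningJobExecutions();
                // it is already on its way to a terminal state, so there is nothing to do.
            }
        }
    }
}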
When you tell Spring Batch to stop a job, it goes into STOPPING mode. What this means is that it will attempt to complete the chunk (unit of work) it is currently processing, but then stop working. Likely what's happening is that you are working on a long-running task that is not finishing a unit of work (is it hung?), so it can't move from STOPPING to STOPPED.
Calling stop a second time rightly leads to an exception, because your job is already STOPPING as a result of the first call.

In a Scala Spark job running in YARN, how can I fail the job so that YARN shows a Failed status

I have a simple if statement in my Scala Spark job code; if the check fails, I want to stop the job and mark it as failed. I want the YARN UI to show the Spark job with a status of Failed, but everything I've tried so far stops the job yet shows it as successfully finished in the YARN UI.
if (someBoolen) {
    //context.clearAllJobs()
    //System.exit(-1)
    //etc; nothing so far stops the job and shows it as failed in the yarn UI
}
Any help would be great.
Throwing an exception (and not catching it) will cause the process to fail.
if (someBoolen) {
    throw new Exception("Job failed");
}

How to stop and resume a spring batch job

Goal: I am using Spring Batch for data processing and I want to have an option to stop/resume (where it left off).
Issue: I am able to send a stop signal to a running job and it gets stopped successfully. But when I try to send a start signal to the same job, it creates a new instance of the job and starts as a fresh job.
My question is: how can we achieve resume functionality for a stopped job in Spring Batch?
You just have to run it with the same parameters. Just make sure you haven't marked the job as non-restartable and that you're not using a RunIdIncrementer or similar to automatically generate unique job parameters.
See, for instance, this example. After the first run, we have:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [STOPPED]
Status is: STOPPED, job execution id 0
#1 step1 COMPLETED
#2 step2 STOPPED
And after the second:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 1
#3 step2 COMPLETED
#4 step3 COMPLETED
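To launch that second run programmatically, a minimal sketch (the helper name is made up; it assumes injected JobExplorer and JobOperator beans and a job named "myJob"):

// Find the most recent STOPPED execution of "myJob" and restart it. JobOperator.restart()
// creates a new execution for the same job instance (same parameters), so it resumes at
// the step where the previous execution stopped.
public Long resumeStoppedJob(JobExplorer jobExplorer, JobOperator jobOperator) throws Exception {
    for (JobInstance instance : jobExplorer.getJobInstances("myJob", 0, 1)) {
        for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
            if (execution.getStatus() == BatchStatus.STOPPED) {
                return jobOperator.restart(execution.getId());
            }
        }
    }
    return null; // no stopped execution found
}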
Note that stopped steps will be re-executed. If you're using chunk-oriented steps, make sure that at least the ItemReader implements ItemStream (and does it with the correct semantics).
Steps marked with allowStartIfComplete will always be re-run.
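Regarding the ItemStream semantics mentioned above, a minimal illustrative reader (not from the original answer) saves its position in the ExecutionContext on update() and restores it on open():

import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;

// Illustrative reader: update() records the current position at every chunk commit,
// and open() restores it, which is what lets a restarted step resume where it stopped.
public class RestartableListReader implements ItemReader<String>, ItemStream {

    private static final String KEY = "current.index";

    private final List<String> items;
    private int current = 0;

    public RestartableListReader(List<String> items) {
        this.items = items;
    }

    @Override
    public String read() {
        return current < items.size() ? items.get(current++) : null;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        if (executionContext.containsKey(KEY)) {
            current = executionContext.getInt(KEY); // resume from the last committed position
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putInt(KEY, current); // called at each chunk boundary
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to release
    }
}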

Quartz doesn't start jobs sometimes

I use Quartz in Scala to schedule some jobs. The problem is that about 20% of the time when I start the application, jobs are not properly started, and the output looks like this fragment:
10:21:24.001 [DefaultQuartzScheduler_Worker-8] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:10
10:21:24.003 [DefaultQuartzScheduler_Worker-4] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:13
10:21:24.002 [DefaultQuartzScheduler_Worker-7] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:12
10:21:24.001 [DefaultQuartzScheduler_Worker-3] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:11
10:21:24.004 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.q.simpl.PropertySettingJobFactory - Producing instance of Job 'group.Client:17;Lb:15', class=pl.soi.sep.RunScripsAndUpdateLeaderboards
10:21:24.004 [DefaultQuartzScheduler_Worker-2] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:15
10:21:24.005 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.quartz.core.QuartzSchedulerThread - batch acquisition of 1 triggers
10:21:24.005 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.q.simpl.PropertySettingJobFactory - Producing instance of Job 'group.Client:17;Lb:2', class=pl.soi.sep.RunScripsAndUpdateLeaderboards
10:21:24.005 [DefaultQuartzScheduler_Worker-1] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:2
10:21:24.005 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.quartz.core.QuartzSchedulerThread - batch acquisition of 1 triggers
10:21:24.006 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.q.simpl.PropertySettingJobFactory - Producing instance of Job 'group.Client:17;Lb:20', class=pl.soi.sep.RunScripsAndUpdateLeaderboards
10:21:24.006 [DefaultQuartzScheduler_Worker-5] DEBUG org.quartz.core.JobRunShell - Calling execute on job group.Client:17;Lb:20
10:21:24.013 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.quartz.core.QuartzSchedulerThread - batch acquisition of 1 triggers
As you can see, the jobs are said to be started but they aren't. I just have to restart the app to make it work again. Do you know what the problem could be?