Spring Batch CommandLineJobRunner with parameters

I'm working on a simple batch: one job, one step, one reader, one processor, one writer.
Currently, I'm running it with the CommandLineJobRunner, specifying the configuration class and the name of the job:
CommandLineJobRunner MyConfigurationClass myJobName
Is it possible to add parameters at the end of the line and then use these parameters in my writer? If so, how can I do it? Thanks :)

OK, it was: CommandLineJobRunner myConfigurationClass myJob parameterName=parameterValue
Then I can read this parameter in the Java code, on a step-scoped bean, with:
@Value("#{jobParameters['parameterName']}")
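For illustration, a minimal sketch of a step-scoped writer receiving that parameter (the configuration class, bean name, item type, and what the writer does with the value are assumptions, not part of the original question; the binding only resolves if the bean is declared with @StepScope):
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    @Bean
    @StepScope
    public ItemWriter<String> myWriter(
            @Value("#{jobParameters['parameterName']}") String parameterValue) {
        // parameterValue holds whatever was passed on the command line,
        // e.g. the value from parameterName=parameterValue
        return items -> items.forEach(item ->
                System.out.println(parameterValue + " -> " + item));
    }
}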

Related

Karate-Gatling: Not able to run scenarios based on tags

I am trying to run a performance test on the scenario tagged as perf from the feature file below:
@tag1 @tag2 @tag3
**background:**
user login
@tag4 @perf
**scenario1:**
@tag4
**scenario2:**
Below is my .scala file setup:
class PerfTest extends Simulation {
  val protocol = karateProtocol()
  val getTags = scenario("Name goes here").exec(karateFeature("classpath:filepath"))
  setUp(
    getTags.inject(
      atOnceUsers(1)
    ).protocols(protocol)
  )
}
I have tried passing the tags from the command line, as well as passing the tag as an argument to the exec method in the Scala setup.
Terminal command:
mvn clean test-compile gatling:test "-Dkarate.env={env}" "-Dkarate.options= --tags @perf"
.scala update: I have also tried passing the tag as an argument in the karateFeature call.
val getTags = scenario("Name goes here").exec(karateFeature("classpath:filepath", "@perf"))
Both scenarios are executed with either approach. Any pointers on how I can force only the test tagged perf to run?
I wanted to share the finding here. I realized it works fine when I pass the tag info in the .scala file.
My scenario with the perf tag was a combination of a GET and a POST call, as I needed some data from the GET call to pass into the POST call. That's why I was seeing both calls when running the performance test.
I did not find any reference in the karate-gatling documentation to passing tags in the terminal execution command, so I am assuming that might not be a valid case.

Submit command line arguments to a pyspark job on airflow

I have a PySpark job on GCP Dataproc that is triggered from Airflow as shown below:
config = help.loadJSON("batch/config_file")
MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        "args": <TO_BE_ADDED>
    },
}
I need to supply command line arguments to this PySpark job as shown below [this is how I run my PySpark job from the command line]:
spark-submit gs://file/loc/my_spark_file.py --arg1 val1 --arg2 val2
I am providing the arguments to my PySpark job using "configparser". Therefore, arg1 is the key and val1 is the value from my spark-submit command above.
How do I define the "args" param in the "MY_PYSPARK_JOB" dict above [equivalent to my command line arguments]?
I finally managed to solve this conundrum.
If we are making use of ConfigParser, the key has to be specified as below [irrespective of whether the argument is passed on the command line or on Airflow]:
--arg1
In Airflow, the configs are passed as a Sequence[str] (as mentioned by @Betjens below), and each argument is defined as follows:
arg1=val1
Therefore, as per my requirement, the command line arguments are defined as depicted below:
"args": ["--arg1=val1",
         "--arg2=val2"]
PS: Thank you @Betjens for all your suggestions.
You have to pass a Sequence[str]. If you check DataprocSubmitJobOperator you will see that the job param implements the class google.cloud.dataproc_v1.types.Job.
class DataprocSubmitJobOperator(BaseOperator):
    ...
    :param job: Required. The job resource. If a dict is provided, it must be of the same form as the protobuf message
        :class:`~google.cloud.dataproc_v1.types.Job`
So, in the section about the pySpark job type, which is google.cloud.dataproc_v1.types.PySparkJob:
args (Sequence[str]): Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.

Spring batch, handle failed files again, skip successfully handled

What is the correct way to organize this kind of file handling?
I have a folder for new files (NEW), a folder for old files (OLD), and a folder for failed files (FAIL). A new file is put in NEW; if handling succeeds, the file goes to OLD; if handling fails, the file goes to FAIL. Then we take the failed file again, correct it, and put it back in NEW; if all is OK it goes to OLD, if it fails again it goes back to FAIL, and so on.
I have a job with the constant name "fileHandlingJob"; the job has several steps: "extract", "handling", "utilize", and I have the job parameters "filePath" and "fileName".
Thanks!
If you state that the uniqueness criterion of a file is its file name, then you are on the right track.
If the job ended up in state FAILED (file in the FAIL folder), you can re-trigger it with the same set of parameters. If the job is COMPLETED, you can't run it again; Spring Batch will complain.
You can ensure this behaviour by making the unique file name an identifying job parameter, so no other job instance can be triggered with the same file name. Spring Batch will simply prevent this.
The second parameter, filePath, can be an additional non-identifying parameter.
JobParametersBuilder jobParametersBuilder = new JobParametersBuilder()
        .addString("fileName", "myfile.xml", true)
        .addString("filePath", "C:\\new\\myfile.xml", false);
The true/false flag here indicates whether the parameter is identifying (contributes to job instance uniqueness) or not.
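To make the identifying/non-identifying distinction concrete, here is a rough sketch of launching the job with those parameters (the class and the injected bean names are assumptions for illustration, not from the original answer): restarting a FAILED execution with the same fileName is allowed, while an instance that already COMPLETED for that fileName is rejected.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;

public class FileHandlingLauncher {

    private final JobLauncher jobLauncher;   // provided by your batch configuration
    private final Job fileHandlingJob;       // the job named "fileHandlingJob"

    public FileHandlingLauncher(JobLauncher jobLauncher, Job fileHandlingJob) {
        this.jobLauncher = jobLauncher;
        this.fileHandlingJob = fileHandlingJob;
    }

    public void launch(String fileName, String filePath) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("fileName", fileName, true)    // identifying: defines the job instance
                .addString("filePath", filePath, false)   // non-identifying: extra data only
                .toJobParameters();
        try {
            jobLauncher.run(fileHandlingJob, params);
        } catch (JobInstanceAlreadyCompleteException e) {
            // A COMPLETED instance already exists for this fileName,
            // i.e. the file was already handled successfully and moved to OLD.
        }
    }
}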

Unattended run for Gatling

I generated a Maven project for Gatling using the archetype approach. When I run the Engine.scala file from Eclipse, it asks me which simulation to run.
I added the property for the simulation class using the Gatling PropertiesBuilder, but it still asks for a simulation id. I want to provide all the information up front so that when I run Engine.scala it does not prompt me for input.
Here is my code:
val props = new GatlingPropertiesBuilder
props.dataDirectory(IDEPathHelper.dataDirectory.toString)
props.resultsDirectory(IDEPathHelper.resultsDirectory.toString)
props.bodiesDirectory(IDEPathHelper.bodiesDirectory.toString)
props.binariesDirectory(IDEPathHelper.mavenBinariesDirectory.toString)
props.simulationClass("za.co.insights.gatling.RecordedSimulation")
props.runDescription("Testing")
props.mute()
//props.reportsOnly("true")
//props.
//Gatling.fromArgs
Gatling.fromMap(props.build)
To do this, go to your gatling.conf file and set mute to true.
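As a rough sketch (assuming a Gatling 2.x gatling.conf; the exact section layout in your template may differ, so verify the key against your own file), this appears to be the file-based counterpart of the props.mute() call already in the code above:
gatling {
  core {
    mute = true   # assumption: suppresses the interactive simulation/run-description prompts
  }
}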

Stopping SpringBatch jobs started from the command line

Spring Batch jobs can be started from the command line by telling the JVM to run CommandLineJobRunner. According to the JavaDoc, running the same command with the added parameter -stop will stop the Job:
The arguments to this class can be provided on the command line (separated by spaces), or through stdin (separated by new line). They are as follows:
jobPath jobIdentifier (jobParameters)*
The command line options are as follows:
jobPath: the xml application context containing a Job
-restart: (optional) to restart the last failed execution
-stop: (optional) to stop a running execution
-abandon: (optional) to abandon a stopped execution
-next: (optional) to start the next in a sequence according to the JobParametersIncrementer in the Job
jobIdentifier: the name of the job or the id of a job execution (for -stop, -abandon or -restart)
jobParameters: 0 to many parameters that will be used to launch a job specified in the form of key=value pairs.
However, in the JavaDoc for the main() method the -stop parameter is not mentioned. Looking through the code on docjar.com I can't see any use of the -stop parameter where I would expect it to be.
I suspect that it is possible to stop a batch that has been started from the command line, but only if the batches being run are backed by a non-transient jobRepository? If running a batch on the command line that only stores its data in HSQL (i.e. in memory), is there no way to stop the job other than Ctrl-C, etc.?
The stop command is implemented; see the source for CommandLineJobRunner, around line 300:
if (opts.contains("-stop")) {
    List<JobExecution> jobExecutions = getRunningJobExecutions(jobIdentifier);
    if (jobExecutions == null) {
        throw new JobExecutionNotRunningException("No running execution found for job=" + jobIdentifier);
    }
    for (JobExecution jobExecution : jobExecutions) {
        jobExecution.setStatus(BatchStatus.STOPPING);
        jobRepository.update(jobExecution);
    }
    return exitCodeMapper.intValue(ExitStatus.COMPLETED.getExitCode());
}
The stop switch will work, but it will only stop the job after the currently executing step completes. It won't kill the job immediately.
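For reference, a stop invocation following the quoted JavaDoc mirrors the original launch command, for example (classpath omitted; the context file and job name are placeholders):
java org.springframework.batch.core.launch.support.CommandLineJobRunner myJobContext.xml -stop myJobName
Note that, as the snippet above shows, -stop works by looking up the running execution in the job repository and flagging it as STOPPING. A separate JVM can therefore only stop the job if both processes share a persistent job repository; with a purely in-memory repository, as suspected in the question, the stopping process has no way to see the running execution.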