DataStage job to read an empty file

What changes should I make in a DataStage job so that it runs successfully even with an empty input file? I have a job that reads a file as input, applies some transformations, and produces an output file. I want the job to run even if the source file contains zero records. Do I need to make changes in the sequence job or the parallel job? Thanks in advance.

Related

Give input to a GitHub Actions job after the first job is successful

I have a requirement in GitHub Actions where I need to feed input after the first job has finished.
I.e. the first job (Build) starts automatically after each commit without taking any input, and it runs successfully.
Then, for the second job (Deploy) to start, it should execute based on the input (Environment) I select.
I have figured out manual job execution from https://stackoverflow.com/a/73708545/2728619
But I need help with taking input after the first job has finished.
For the manual trigger, I tried the solution at https://stackoverflow.com/a/73708545/2728619
I am expecting to read the input after the first job has run (not at the beginning of the workflow).

Multiple Kafka inputs not starting at the same time in a Talend job

I have a simple Talend standard job containing two Kafka inputs, as you can see in the picture. The problem is that when I run the job, only one of the Kafka inputs starts. The behaviour I expect is both Kafka inputs running at the same time. Is there any configuration that I am missing?
You can easily add the tParallelize component at the beginning of the Talend job and the subjobs attached to it will be executed at the same time; this works if you have multiple subjobs too.
I think a Talend job runs its subjobs serially by default; we just can't see which component runs first because the process is so fast.
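This is not Talend code, just a plain Java sketch of what tParallelize achieves: the two input subjobs (represented here by placeholder Runnables) are started at the same time instead of one after the other.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelSubjobsDemo {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Placeholders for the two Kafka input subjobs (hypothetical work).
            Runnable kafkaInputA = () -> System.out.println("subjob A consuming topic A...");
            Runnable kafkaInputB = () -> System.out.println("subjob B consuming topic B...");
            // Submitted together, both start right away; run sequentially, the second
            // would only start once the first finished, which is the behaviour described above.
            pool.submit(kafkaInputA);
            pool.submit(kafkaInputB);
            pool.shutdown();
        }
    }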

Put a deadline in Spring Batch

In a Java program, I need to read from a database, take this data, make some REST calls, and write the data to a txt file (which has a header, data, and a footer).
The job starts Saturday night and needs to finish before Saturday morning. If it has not finished, we need to close the file (writing the footer first) and start a new one.
I started to look at tools for this job; Spring Batch seems interesting.
I can split the job into reader, processor, and writer.
Is there something to check whether a job has reached its deadline?
The job will be launched with Jenkins.
I guess you must use a scheduler for that.
You must read the end date from the DB every minute or so, and if
    endDate.compareTo(new Date()) <= 0
then the scheduler job must stop the batch job.
You can use Quartz
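A minimal sketch of that idea, using Spring's @Scheduled in place of Quartz for brevity; the job name "exportJob" and the deadline lookup are placeholders, not part of the question:

    import java.util.Date;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.explore.JobExplorer;
    import org.springframework.batch.core.launch.JobOperator;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class DeadlineWatchdog {

        private final JobExplorer jobExplorer;
        private final JobOperator jobOperator;

        public DeadlineWatchdog(JobExplorer jobExplorer, JobOperator jobOperator) {
            this.jobExplorer = jobExplorer;
            this.jobOperator = jobOperator;
        }

        // Checks the deadline every minute; the end date would be re-read from the DB here.
        @Scheduled(fixedRate = 60_000)
        public void enforceDeadline() throws Exception {
            Date endDate = loadEndDateFromDb(); // hypothetical DAO call
            if (endDate.compareTo(new Date()) <= 0) {
                // Ask Spring Batch to stop every running execution of the job.
                for (JobExecution execution : jobExplorer.findRunningJobExecutions("exportJob")) {
                    jobOperator.stop(execution.getId());
                }
            }
        }

        private Date loadEndDateFromDb() {
            // Placeholder: replace with the actual query for the deadline.
            return new Date();
        }
    }

Note that stop() requests a graceful stop: the job finishes its current chunk and then ends, which gives you the chance to write the footer and close the file before starting a new one.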

Autosys trigger same DataStage job multiple times with different invocation IDs

Here is what I am trying to do, not sure if it is possible:
Autosys gets File1 at 1:10pm and starts DataStage Job1 as Job1.1:10pm.
Job1.1:10pm is still running.
Autosys gets File1 at 1:20pm; it needs to start the same Job1 but run it as Job1.1:20pm, even though Job1.1:10pm is still running, not wait for it to finish, but go ahead and run.
Can Autosys call the same DataStage job every time it gets a new file and run it with the new timestamp as the invocation ID, without waiting for the previous job to finish?
Thanks y'all.
Yes, absolutely, this is possible. To enable different invocation IDs you have to check the "multiple instance" property in the job's properties. This allows multiple simultaneous runs of the job.
The invocation ID can also be a parameter when calling the job from a sequence.
When your (multiple instance) job writes to a file, make sure that each filename is unique to avoid side effects from the multiple runs happening at the same time. This can be done by specifying DSJobInvocationId as part of the filename. Note that it is a parameter provided by DataStage which needs to be written exactly as shown, with the same upper- and lower-case letters. DataStage will then replace it with the content of your job's invocation ID at runtime.

Retry failed writing operations without delaying other steps in a Spring Batch application

I am maintaining a legacy application written using Spring Batch and need to tweak it to never lose data.
I have to read from various web services (one for each step) and then write to a remote database. Things go bad when the connection with the DB drops, because all items read from the web service are discarded (I can't read the same item twice) and the data is lost because it cannot be written.
I need to set up Spring Batch so that a step keeps the data it has already read and retries the writing operation the next time the step runs. The same step must not read more data until the write operation has completed successfully.
When it is not able to write, the step should keep the read data and pass execution to the next step; after a while, when it is time for the failed step to run again, it should not read another item, retrying the failed writing operation instead.
The batch application should run in an infinite loop and each step should gather data from a different source. Failed writing operations should be momentarily skipped (keeping the read data) so they do not delay other steps, but they should resume from the write operation the next time they are called.
I have researched various web sources aside from the official docs, but Spring Batch doesn't have the most intuitive docs I have come across.
Can this be achieved? If yes, how?
You can write the data that you need to persist, in case the job fails, to the batch step's ExecutionContext, and then restart the job with this data:
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction related data such as commit and rollback count and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart.
More from: http://static.springsource.org/spring-batch/reference/html/domain.html#domainStepExecution
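A rough sketch of that approach (Spring Batch 4.x style; the wrapper writer, the "unwritten.items" key, and the delegate are assumptions, not code from the question): items that fail to write are parked in the step's ExecutionContext so the next execution retries them before anything new is read. The items must be Serializable for the context to be persisted.

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.ItemWriter;

    public class RetryingDbItemWriter<T extends Serializable> implements ItemWriter<T> {

        private static final String KEY = "unwritten.items";

        private final ItemWriter<T> delegate;   // the real JDBC/remote-DB writer
        private StepExecution stepExecution;

        public RetryingDbItemWriter(ItemWriter<T> delegate) {
            this.delegate = delegate;
        }

        @BeforeStep
        public void saveStepExecution(StepExecution stepExecution) {
            this.stepExecution = stepExecution;
        }

        @Override
        public void write(List<? extends T> items) throws Exception {
            List<T> toWrite = new ArrayList<>();
            // Retry anything parked by a previous failed run first.
            @SuppressWarnings("unchecked")
            List<T> leftovers = (List<T>) stepExecution.getExecutionContext().get(KEY);
            if (leftovers != null) {
                toWrite.addAll(leftovers);
            }
            toWrite.addAll(items);
            try {
                delegate.write(toWrite);
                stepExecution.getExecutionContext().remove(KEY);
            } catch (Exception e) {
                // Keep what was read so the next run retries the write instead of re-reading.
                stepExecution.getExecutionContext().put(KEY, new ArrayList<>(toWrite));
                throw e;
            }
        }
    }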
I do not know if this will be ok with you, but here are my thoughts on your configuration.
Since you have two remote sources that are open to failure, let us partition the overall system into two jobs (not two steps).
JOB A
Step 1: Tasklet
Check a shared folder for files. If files exist, do not proceed to the next step. This will be easier to understand after reading about JOB B; a sketch of this gate step follows the JOB A description.
Step 2: Webservice to files
Read from your web service and write the results to flat files in the shared folder. Since you would be using flat files for the output, you solve the problem of "all items read from the web service are discarded and the data is lost because it cannot be written".
Use Quartz or equivalent for the scheduling of this job.
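A possible shape for that gate step, assuming a Tasklet with a placeholder folder and file pattern (neither comes from the original setup):

    import java.io.File;
    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;

    public class SharedFolderGateTasklet implements Tasklet {

        // Placeholder location; in reality this would be injected/configured.
        private final File sharedFolder = new File("/data/shared/outbox");

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            File[] pending = sharedFolder.listFiles((dir, name) -> name.endsWith(".dat"));
            if (pending != null && pending.length > 0) {
                // Files are still waiting for JOB B, so flag a custom exit status; the job
                // flow can map it to an early end, e.g.
                //   .start(gateStep).on("FILES_PENDING").end()
                //   .from(gateStep).on("*").to(webServiceStep)
                contribution.setExitStatus(new ExitStatus("FILES_PENDING"));
            }
            return RepeatStatus.FINISHED;
        }
    }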
JOB B
Poll the shared folder for generated files and launch the job with each file (the file's location as a job parameter); a sketch of this launch follows below. The Spring Integration project may help with this polling.
Step 1:
Read from the file, write the records to the remote DB, and move/delete the file if writing to the DB is successful.
No scheduling is needed since job launching originates from the polled files.
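One way the launch side of JOB B could look (the bean names and the "input.file" parameter key are assumptions): each polled file becomes its own job execution, with the file location passed as an identifying job parameter.

    import java.io.File;
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;

    public class PolledFileJobStarter {

        private final JobLauncher jobLauncher;
        private final Job fileToDbJob;

        public PolledFileJobStarter(JobLauncher jobLauncher, Job fileToDbJob) {
            this.jobLauncher = jobLauncher;
            this.fileToDbJob = fileToDbJob;
        }

        // Called by whatever does the polling (a Spring Integration file inbound
        // adapter, a scheduled directory scan, ...).
        public void onFileDetected(File file) throws Exception {
            JobParameters params = new JobParametersBuilder()
                    .addString("input.file", file.getAbsolutePath())
                    .toJobParameters();
            // A distinct file path means distinct job parameters, so each file gets
            // its own JobInstance and a failed one can be restarted independently.
            jobLauncher.run(fileToDbJob, params);
        }
    }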
Sample Execution
Time 0: No file in the shared folder
Time 1: Read from web service and write to shared folder
Time 2: Job B's file polling occurs and it tries to write to the DB.
If successful, the system continues to execute.
If not, when Job A tries to execute at its scheduled time, it will skip reading from the web service since files still exist in the shared folder. It will keep skipping until Job B consumes the files.
I did not want to go into implementation specifics but Spring Batch can handle all of these situations. Hope that this helps.