Talend Subjobs and Sundry - talend

I am trying to troubleshoot an existing Talend job with many iterations and subjobs, created by a developer who is no longer with the company. I ran into an issue with subjobs and am hoping someone here can answer.
I know from reading the documentation that OnSubjobOk10 indicates that the job will execute after #10 is complete. But in a workflow with no names, how do I know which one is subjob #10? Can I assume it is the one from which the job-job connection is made?
Thanks in advance,
Bee

OnSubjobOk triggers the next subjob only if the previous subjob finished without error. From help.talend:
OnSubjobOK (previously Then Run): This link is used to trigger the
next subjob on the condition that the main subjob completed without
error. This connection is to be used only from the start component of
the Job.
These connections are used to orchestrate the subjobs forming the Job
or to easily troubleshoot and handle unexpected errors.
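To make the "completed without error" condition concrete, here is a minimal sketch in plain Python. It is not Talend code and not what Talend generates; `run_chain` and the three stand-in subjobs are hypothetical names used only to illustrate the idea that a chain of OnSubjobOk links stops at the first subjob that fails.

```python
def run_chain(subjobs):
    """Run subjobs in order; start the next one only if the previous finished without error."""
    completed = []
    for name, subjob in subjobs:
        try:
            subjob()
        except Exception as exc:
            # An error breaks the chain, so later subjobs never start.
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None


def load():
    pass  # stands in for a subjob that succeeds


def transform():
    raise ValueError("bad row")  # stands in for a subjob that fails


def publish():
    pass  # never reached in this example


completed, error = run_chain(
    [("load", load), ("transform", transform), ("publish", publish)]
)
```

In this sketch only `load` completes; `publish` is never started because `transform` raised an error.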

Related

Anylogic-job shop scheduling

I am trying to model job shop scheduling in AnyLogic. I have 20 jobs and 5 machines (resources), and each job visits the machines in its own specific order. Each job has a different processing time on each machine.
This is what I have right now. I have a jobs agent with an associated DB table that holds the machine sequence.
This is my jobs agent. I created the collections col_machinesequence (an ArrayList of strings with op1, op2, ..., where the op values are the columns of my DB table) and enterblock (an ArrayList of class Enter holding my 5 Enter blocks).
In each Exit block I call the function nextmachine; you can read about it here: How to send agents through exit and enter blocks?
Right now, when I run my project I don't get any errors; however, this is what happens. I guess something in my nextmachine function or in the collections is wrong, so this is where I need your help, if anyone knows what the problem is.
I also want to order the jobs at each machine by shortest processing time. I have this DB table, which right now is not associated with any agent. Does anyone know how to do this?
Thank you in advance

I have to perform more stuff after the parallelization work using Talend Studio. How do I place a connecting OnSubJobOk?

I am trying to implement parallelization within Talend. I have it working, but now I don't know how to connect the parallelization work to the next part. Usually, you would click on the previous block and select OnSubjobOk. That option doesn't appear. Is there another component that I need to add that I don't know about?
Under the basic settings of tParallelize you will find the option Wait for. It has two settings:
end of first subjob: sequence the relevant subjob to be executed at
the end of the first subjob
end of all subjobs: sequence the relevant subjob to be executed at
the end of all subjobs.
So all you have to do is connect your next subjob to the tParallelize component using the Synchronize (Wait for all) trigger. This ensures that the subjob connected with Synchronize (Wait for all) runs only once all the parallel subjobs/components have finished.
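The difference between the two Wait for settings can be sketched outside Talend with Python's standard `concurrent.futures` module. This is only an analogy, not how tParallelize is implemented; the subjob names and sleep durations are made up for illustration.

```python
import time
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait,
)


def subjob(name, seconds):
    """Stand-in for a parallel subjob that takes `seconds` to finish."""
    time.sleep(seconds)
    return name


def run_parallel(wait_for):
    """Start three subjobs and report which ones had finished when the wait returned."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(subjob, "fast", 0.05),
            pool.submit(subjob, "medium", 0.5),
            pool.submit(subjob, "slow", 0.8),
        ]
        # FIRST_COMPLETED ~ "end of first subjob"; ALL_COMPLETED ~ "end of all subjobs".
        done, _pending = wait(futures, return_when=wait_for)
        finished = sorted(f.result() for f in done)
    return finished


after_first = run_parallel(FIRST_COMPLETED)
after_all = run_parallel(ALL_COMPLETED)
```

With `FIRST_COMPLETED` the follow-up step would start as soon as the quickest subjob is done, while the others are still running; with `ALL_COMPLETED` it starts only after every subjob has finished, which is what the Synchronize (Wait for all) trigger is for.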

How do I re-run a pipeline with only failed activities/datasets in Azure Data Factory V2?

I am running a pipeline where I am looping through all the tables in INFORMATION_SCHEMA.TABLES and copying them to Azure Data Lake Store. My question is: how do I run this pipeline for only the failed tables if any table fails to copy?
Best approach I’ve found is to code your process to:
0. Yes, root cause the failure and identify if it is something wrong with the pipeline or if it is a “feature” of your dependency you have to code around.
1. Be idempotent. If your process ensures a clean state as the very first step, similar to Command Design pattern’s undo (but more naive), then your process can re-execute.
* with #1, you can safely use “retry” in your pipeline activities, along with sufficient time between retries.
* this is an ADFv1 or v2 compatible approach
2. If ADFv2, then you have more options and can have more complex logic to handle errors:
* for the activity that is failing, wrap this in an until-success loop, and be sure to include a bound on execution.
* you can add more activities in the loop to handle failure and log, notify, or resolve known failure conditions due to externalities out of your control.
3. You can also communicate asynchronously with future executions of the process by saving each success to a central store. Later executions then check “if” the work already succeeded and, if so, skip the activity.
* this is powerful for more generalized pipelines, since you can choose where to begin
4. The last resort I know of (and I would love to learn other ways to handle this) is manual re-execution of failed activities.
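Points 1 to 3 can be sketched together in plain Python. This is not ADF code or any ADF API; `copy_tables`, `flaky_copy`, the table names, and the in-memory `succeeded` set are all hypothetical stand-ins. The sketch assumes each copy is idempotent, bounds the retry loop, and skips tables that an earlier run already recorded as successful. A real pipeline would also wait between attempts.

```python
def copy_tables(tables, copy_one, succeeded, max_attempts=3):
    """Copy every table not already in `succeeded`, retrying failures a bounded number of times.

    Returns a dict of table -> last error message for tables that still failed.
    """
    failed = {}
    for table in tables:
        if table in succeeded:
            continue  # already copied by an earlier run, so skip it
        for _attempt in range(max_attempts):
            try:
                copy_one(table)  # assumed idempotent: safe to call again
            except Exception as exc:
                failed[table] = str(exc)
            else:
                succeeded.add(table)  # record success in the "central store"
                failed.pop(table, None)
                break
    return failed


calls = []


def flaky_copy(table):
    """Hypothetical copy step: 'orders' fails once, 'audit' always fails."""
    calls.append(table)
    if table == "orders" and calls.count("orders") < 2:
        raise RuntimeError("timeout")
    if table == "audit":
        raise RuntimeError("permission denied")


succeeded = {"customers"}  # recorded by a previous run
failed = copy_tables(["customers", "orders", "audit"], flaky_copy, succeeded)
```

Here `customers` is never copied again, `orders` succeeds on its second attempt, and `audit` is reported as failed after three attempts, so the next run only has `audit` left to do.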
Hope this helps,
J

Talend Force run order of joblets

My company has a couple of joblets that we put in new jobs to do things like initialization of variables, get system information from the database and sending out error / warning emails. The issue we are running into is that if we go ahead and start creating the components of a job and realize that we forgot to include these 3 joblets, we have to basically re-create the job to ensure that the joblets are added first so they run first.
Is there any way to force these joblets to run first and possibly also in a certain order before moving on to the contents of the job being created? Please let me know if there is any information you may need that I'm missing as I have only been using Talend for a few days. The rest of the team has not been using it too much longer than I have, so they do not have the answer I'm looking for either. Thanks in advance!
In joblets you can use the components Trigger_Input and Trigger_Output as connection points for OnSubjobOk triggers. That lets you connect joblets and other components in a job with triggers, and thus enforce the execution order.
But you cannot get an OnSubjobOk trigger from a tPreJob. One option I am considering is to trigger from the tPreJob to a tWarn (OnComponentOk) and then from the tWarn to the joblet (OnSubjobOk).

Talend job batch processing

I am exploring Talend at work, and I was asked if Talend supports batch processing, as in running a job in multiple threads. After going through the user guide I understood that threading is possible with subjobs. I would like to know if it is possible to run a job with a single action in parallel.
Talend has excellent multithreading support. There are two basic methods for this. One method gives you more control and is implemented using components. The other method is implemented as a job setting.
For the first method see my screenshot. I use tParallelize to load three files into three tables at the same time. Then when all three files are successfully loaded I use the same tParallelize to set the values of a control table. tParallelize can also be connected to tRunJob as easily as a subjob.
The other method is described very well in Talend Help: Run Jobs in Parallel
Generally I recommend the first method because of the control it gives you, but if your job follows the simple pattern described in the help link, that method works as well.