I have a Talend job that takes its input from a CSV file. The CSV file contains child job names and a start date. This is how the job is currently built:
I have a tFileInputDelimited that reads the file and connects to a tFlowToIterate, which holds the key/value pairs.
Example :
c1, c2 and c3 which are C1 -->Job1
C2 -->J1
C3 -->1/16/2017
J1 is the name of the child job and C3 has the date.
In the tRunJob I have ticked "Use dynamic job", and the context job is globalMap.get("c2"), which executes all the child jobs.
Now I need to execute only those child jobs whose c3 value is today's date.
If your question is a continuation of the thread "Running Talend child jobs through a parent job", you can follow the steps below.
Below is my input data:
ChildJob1, 1/16/2017
ChildJob2, 1/17/2017
ChildJob3, 1/17/2017
I modified the job from the previous answer by adding a tJava component.
Below is the code inside the tJava component:
// If SimpleDateFormat is not resolved, add "import java.text.SimpleDateFormat;" under the tJava Advanced settings.
System.out.println("|-----------------Date from Input file is "+row5.Date.toString()+"------------|");
System.out.println("|-----------------Job name from Input file is "+context.JobName+"---------------------------|");
// In SimpleDateFormat, "dd" is day of month; "DD" means day of year and breaks the parse outside January.
String input = TalendDate.getDate("dd/MM/yyyy");
SimpleDateFormat inputFormatter = new SimpleDateFormat("dd/MM/yyyy");
Date date = inputFormatter.parse(input); // Today's date with the time of day dropped
context.IsTodayJob = TalendDate.compareDate(date, row5.Date) == 0;
In the tJava component, I set the context variable IsTodayJob by comparing today's date with the date value from the input file.
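The comparison can also be tried outside Talend. The following is a minimal sketch in plain Java, assuming the dates in the file are written as month/day/year (as in 1/16/2017); the class TodayJobFilter and the method isDueToday are hypothetical names used only for illustration, and java.time stands in for the TalendDate routines.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Talend: decides whether a child job is due today.
final class TodayJobFilter {
    // Assumed input format: month/day/year, e.g. 1/16/2017.
    private static final DateTimeFormatter INPUT_FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Returns true when the start date from the file falls on the given day.
    static boolean isDueToday(String startDate, LocalDate today) {
        LocalDate parsed = LocalDate.parse(startDate.trim(), INPUT_FORMAT);
        return parsed.isEqual(today);
    }

    public static void main(String[] args) {
        // Fixed "today" so that the run is repeatable; use LocalDate.now() in practice.
        LocalDate today = LocalDate.of(2017, 1, 16);
        String[][] rows = {
            {"ChildJob1", "1/16/2017"},
            {"ChildJob2", "1/17/2017"},
            {"ChildJob3", "1/17/2017"},
        };
        for (String[] row : rows) {
            System.out.println(row[0] + " runs today: " + isDueToday(row[1], today));
        }
    }
}
```

With the fixed date above, only ChildJob1 is reported as due.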
I then connected the tJava component to the tRunJob component through a Run if trigger whose condition tests context.IsTodayJob.
This gave me the result below.
Hope this helps.
I have two Get Metadata activities in ADF that fetch file names from two different folders. I need to use their outputs for a file name comparison in a Databricks notebook and return true if all the files are present.
How do I pass the output of the Get Metadata activities to Databricks, perform the string comparison, and return true if all files are present, or false if even one file is missing?
Please find the answer below. I explain it with one Get Metadata activity; the same approach can be replicated for more than one.
Create an ADF pipeline with a Get Metadata activity followed by a Databricks Notebook activity.
In the Get Metadata activity, add childItems to the Field list so that the file names are included in the activity output.
In the Databricks Notebook activity, add the parameter below as a base parameter (named metadata_output in the notebook code); it captures the output of Get Metadata and passes it to the notebook as an input parameter. This value is normally an array of objects, but I convert it to a string so that the notebook can read the file names:
@string(activity('Get Metadata1').output.childItems)
The notebook can now read the Get Metadata output as a string.
import ast

# File names that must be present; compared against the Get Metadata output.
required_filenames = ['File1.csv', 'File2.csv', 'File3.csv']
# Read the Get Metadata output passed in through the metadata_output base parameter.
metadata_value = dbutils.widgets.get('metadata_output')
# Convert the string back into a list of dicts.
metadata_list = ast.literal_eval(metadata_value)
# Collect the file names returned by the Get Metadata activity.
blob_output_list = []
for i in metadata_list:
    blob_output_list.append(i['name'])
# True only if every required file name is in the list (generator expression inside all()).
validateif = all(item in blob_output_list for item in required_filenames)
I tried this approach and it met the requirement. Hope this helps.
Please upvote the answer if it helps with your requirement.
I have a Talend job that creates a folder per account ID inside a specific folder (C/LogDetails).
The job runs every 5 minutes, so the directory runs out of space, which prevents the job from creating more account ID folders.
In short, the job fails because of the lack of space in the folder (C/LogDetails).
I want to build a solution in Talend that deletes all folders whose modified date is earlier than today's date.
In tFileList, set the parent folder path to c/LogDetails and select 'Directories' in the FileList Type dropdown.
In the tFileProperties component, use the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")). It iterates over all the folders inside the parent folder because the FileList Type in tFileList is set to directories.
In tJavaRow, use the code below:
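if (TalendDate.compareDate(
        TalendDate.parseDate("yyyy-MM-dd", TalendDate.getDate("yyyy-MM-dd")),
        TalendDate.parseDate("E MMM dd HH:mm:ss Z yyyy", input_row.mtime_string)) == 1) {
    // Today's date (at midnight) is later than the folder's modified time: mark it for deletion.
    context.abs_path = input_row.abs_path;
    System.out.println("if : " + context.abs_path);
} else {
    // Clear the value so that a path from an earlier iteration is not reused.
    context.abs_path = null;
}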
Connect the tJavaRow to a tFileDelete component with a Run if trigger. The condition should check that context.abs_path is not null or empty. Use context.abs_path as the path in tFileDelete and select the delete folder option.
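The date test that drives the deletion can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class StaleFolderCheck and its method are hypothetical names, java.time replaces the TalendDate routines, and the time zone is passed in explicitly because "earlier than today" depends on it.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper, not part of Talend.
final class StaleFolderCheck {
    // True when the last-modified instant falls on a calendar day before "today" in the given zone.
    static boolean modifiedBeforeToday(Instant lastModified, LocalDate today, ZoneId zone) {
        LocalDate modifiedDay = lastModified.atZone(zone).toLocalDate();
        return modifiedDay.isBefore(today);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        LocalDate today = LocalDate.of(2017, 1, 17); // fixed date so that the run is repeatable
        Instant yesterdayEvening = Instant.parse("2017-01-16T23:59:00Z");
        Instant thisMorning = Instant.parse("2017-01-17T00:05:00Z");
        System.out.println(modifiedBeforeToday(yesterdayEvening, today, utc)); // true: delete candidate
        System.out.println(modifiedBeforeToday(thisMorning, today, utc));      // false: keep
    }
}
```

A folder modified at any time today is kept, which matches the compareDate(...) == 1 test against today's date at midnight.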
Hope this helps. :)
I have a Talend Job that currently does the following:
Input csv (tFileInputDelimited) --> tMap --> Output csv (tFileOutputDelimited)
The goal of my job is to keep a value from the tMap and use it to rename the output file.
I've tried to use a context variable and specify the row and the column I want to use, but it didn't work.
I'm a beginner; I'm using Talend during an internship that I started 6 years ago, so there is a lot I don't know ^^
Thank you in advance for your help!
You can use a tJavaRow to capture the value from the flow and assign it to a context variable. The code looks like this:
// get the value of wanted_field for the row with id 40
if (input_row.id == 40) context.myvar = input_row.wanted_field;
Your job will look like this:
Input csv (tFileInputDelimited) --> tJavaRow --> tMap --> Output csv (tFileOutputDelimited)
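To make the capture step concrete, here is a minimal plain-Java sketch of the same idea. The Row class, the sample values, and the "output_<value>.csv" naming scheme are all assumptions made for illustration. The captured value only exists once the matching row has been read, so the sketch builds the file name after the loop; in a job, that would mean renaming the file in a later step.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Talend-generated code.
final class OutputNameFromRow {
    // Stand-in for one row of the input flow.
    static final class Row {
        final int id;
        final String wantedField;
        Row(int id, String wantedField) { this.id = id; this.wantedField = wantedField; }
    }

    // Mirrors the tJavaRow logic: remember wantedField from the row whose id matches.
    static String capture(List<Row> rows, int wantedId) {
        String captured = null; // plays the role of context.myvar
        for (Row row : rows) {
            if (row.id == wantedId) captured = row.wantedField;
        }
        return captured;
    }

    // Assumed naming scheme; falls back to a fixed name when nothing was captured.
    static String outputFileName(String captured) {
        return (captured == null ? "output" : "output_" + captured) + ".csv";
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(39, "alpha"), new Row(40, "beta"), new Row(41, "gamma"));
        System.out.println(outputFileName(capture(rows, 40))); // output_beta.csv
    }
}
```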
I've got the following data:
ID;NAME;SKILL
1;JOE;XML
1;JOE;JAVA
1;JOE;ORACLE
2;JOHN;JAVA
2;JOHN;API
I need a counter that will give me this structure:
ID;NAME;COUNTER;SKILL
1;JOE;1;XML
1;JOE;2;JAVA
1;JOE;3;ORACLE
2;JOHN;1;JAVA
2;JOHN;2;API
How can I achieve that in Talend? I tried to use Numeric.sequence, but I don't know how to tie it to the ID column. Every time a new ID occurs, I need to reset the sequence number.
Any advice?
You can do it the following way:
yourInput---tJavaRow---tMap---yourOutput
Create a context variable named oldID of type String, since the code below compares it with equalsIgnoreCase.
In the tJavaRow, add the following code:
if (!input_row.ID.equalsIgnoreCase(context.oldID)) {
    Numeric.resetSequence("i", 0);
    context.oldID = input_row.ID;
}
Add a tMap after the tJavaRow and add an additional output column named COUNTER.
In the COUNTER column, use the following expression:
Numeric.sequence("i", 1, 1)
Now execute your job and you will get the expected output.
My output:
[statistics] connected
1|JOE|1|XML
1|JOE|2|JAVA
1|JOE|3|ORACLE
2|JOHN|1|JAVA
2|JOHN|2|API
[statistics] disconnected
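The reset-on-change logic can be checked outside Talend with a minimal plain-Java sketch. GroupCounter is a hypothetical name, and a local variable takes the place of context.oldID and the "i" sequence. Like the job above, it assumes that rows with the same ID are adjacent in the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the reset-on-change counter.
final class GroupCounter {
    // Returns one counter per input id; the counter restarts at 1 whenever the id differs from the previous row.
    static List<Integer> countWithinGroups(List<String> ids) {
        List<Integer> counters = new ArrayList<>();
        String previousId = null; // plays the role of context.oldID
        int counter = 0;
        for (String id : ids) {
            if (!id.equalsIgnoreCase(previousId)) {
                counter = 0;      // same effect as resetting the sequence to 0
                previousId = id;
            }
            counter++;            // same effect as a sequence with step 1
            counters.add(counter);
        }
        return counters;
    }

    public static void main(String[] args) {
        System.out.println(countWithinGroups(Arrays.asList("1", "1", "1", "2", "2"))); // [1, 2, 3, 1, 2]
    }
}
```

If the input is not grouped by ID, the counter restarts every time the ID changes, so sorting the input first is the safer choice.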
There is a simpler way to do it: use a job like input->tMap->output.
Then, in the tMap, add a new Integer column "counter" and fill it with Numeric.sequence(row1.NAME,1,1).
This keeps a separate sequence for each name, so the counter starts at 1 the first time a name appears and increments on each later row with that name.
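The behaviour of a sequence keyed by name can be sketched in plain Java with a map. KeyedSequence is a hypothetical stand-in written for illustration, modelled on how the sequence routine appears to behave; it is not Talend's own code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a named sequence, for illustration only.
final class KeyedSequence {
    private final Map<String, Integer> current = new HashMap<>();

    // First call for a key returns start; each later call for the same key adds step.
    int next(String key, int start, int step) {
        Integer previous = current.get(key);
        int value = (previous == null) ? start : previous + step;
        current.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        KeyedSequence seq = new KeyedSequence();
        String[] names = {"JOE", "JOE", "JOE", "JOHN", "JOHN", "JOE"};
        for (String name : names) {
            System.out.println(name + " " + seq.next(name, 1, 1));
        }
        // Prints counters 1, 2, 3 for JOE, then 1, 2 for JOHN, then 4 for the last JOE.
    }
}
```

The last line of output shows the difference from the reset approach: a name that comes back later continues its own count instead of starting again at 1.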
I have a scenario where I would like to skip a component based on a condition and still run the components that follow it in Talend.
Is it at all possible?
You have two options available to you for conditionally executing parts of your job.
If the component that follows your conditional check can be a starting component (it has a green background when you drop it onto the canvas), you can use the Run if connector to link it to the previous part of your job:
In this example we simply call another tJava component conditionally but this could be any component that is startable.
The first tJava component (Set condition boolean) is configured with the following code:
Boolean condition = false;
globalMap.put("condition",condition);
And the two Run if connectors are set as ((Boolean)globalMap.get("condition")) == true and ((Boolean)globalMap.get("condition")) == false respectively.
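How the two Run if conditions split the flow can be sketched in plain Java, with an ordinary HashMap standing in for globalMap. RunIfDemo and its method names are hypothetical. The sketch uses Boolean.TRUE.equals(...) rather than == true so that a missing key leaves both branches off instead of throwing a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a plain map stands in for Talend's globalMap.
final class RunIfDemo {
    // Condition for the first Run if connector.
    static boolean runIfTrue(Map<String, Object> globalMap) {
        return Boolean.TRUE.equals(globalMap.get("condition"));
    }

    // Condition for the second Run if connector.
    static boolean runIfFalse(Map<String, Object> globalMap) {
        return Boolean.FALSE.equals(globalMap.get("condition"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("condition", Boolean.FALSE); // same value as in the tJava code above
        System.out.println("true branch runs:  " + runIfTrue(globalMap));  // false
        System.out.println("false branch runs: " + runIfFalse(globalMap)); // true
    }
}
```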
A better option may be to use the filtering in a tMap or tFilterRow component and this also allows you to link to components that aren't starting components. To do this you would set your job up as below:
In this job I have hard coded some tabular data in a tFixedFlowInput component:
We then use a tMap to filter the flows of data to any following components:
In which we test the value of the boolean condition column of our data. As an illustration I have also applied some simple, conditional transformation to the data where "true" rows have 1000 added to their value and "false" rows have 100 subtracted from their value.
From here you can then carry on the flow of your job as normal, in this case we link to a tSystem component to execute system commands as per your comment.
I've mocked up a job for you:
I have a context variable called: startFrom
It can be accessed with context.startFrom
I've placed a tJava with a few tWarns:
I use 4 context settings, and the job behaves as follows for each:
Default: does nothing
Normal: starts from "Start"
Opt: starts from "Optional_Start"
two: starts from "RecoverFromHere"
If settings are the following:
context.startFrom.equals("opt")
Recovery and Recovery1 print out their names using System.out.
When I start my job I can select where I want it to start from. If I don't select anything, the context value is null and the job doesn't do anything.
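The selection logic can be sketched in plain Java. Only the value "opt" appears in the conditions above; "normal" and "two" are assumed values for the other two settings, and StartPointSelector is a hypothetical name. The explicit null check matters because calling equals on a null context value would throw a NullPointerException.

```java
// Hypothetical illustration of choosing a start point from context.startFrom.
final class StartPointSelector {
    // Returns the name of the start point, or null when the job should do nothing.
    static String startingPoint(String startFrom) {
        if (startFrom == null) {
            return null; // nothing selected: no branch runs
        }
        switch (startFrom) {
            case "normal": return "Start";            // assumed value
            case "opt":    return "Optional_Start";   // value taken from the condition above
            case "two":    return "RecoverFromHere";  // assumed value
            default:       return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(startingPoint("opt")); // Optional_Start
        System.out.println(startingPoint(null));  // null
    }
}
```

Inside a Run if condition, writing "opt".equals(context.startFrom) gives the same null safety in a single expression.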
You cannot use the prejob directly because it does not have a Run if trigger, but you can do it like this:
prejob --> OnComponentOk --> tJava (where you evaluate your condition, as in the code given below) --> Run if trigger with the condition ((String)globalMap.get("var_myCondition")).equals("true") --> component to run when the condition is true
--> Run if trigger (on the tJava) with ((String)globalMap.get("var_myCondition")).equals("false") --> component to run when the condition is false
In short, your job would look like this:
prejob-->tJava---(RUNIF TRIGGER)------>component/flow to run in true condition
---(RUNIF TRIGGER)------>component/flow to run in false condition
tJava code
String myCondition="false";
globalMap.put("var_myCondition",myCondition);