I've got the following data:
ID;NAME;SKILL
1;JOE;XML
1;JOE;JAVA
1;JOE;ORACLE
2;JOHN;JAVA
2;JOHN;API
I need a counter that will give me this structure:
ID;NAME;COUNTER;SKILL
1;JOE;1;XML
1;JOE;2;JAVA
1;JOE;3;ORACLE
2;JOHN;1;JAVA
2;JOHN;2;API
How can I achieve that in Talend? I tried to use Numeric.sequence, but I don't know how to make it depend on the ID column: every time a new ID occurs, I need to reset the sequence number.
Any advice?
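You can do it in the following way.
yourInput---tJavaRow---tMap---yourOutput
Create a context variable named oldID. The code below compares it with equalsIgnoreCase, so it has to be a String (like the ID column), not an int.
In the tJavaRow, add the following code.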
if(!input_row.ID.equalsIgnoreCase(context.oldID)){
Numeric.resetSequence("i", 0);
context.oldID=input_row.ID;
}
Add a tMap after the tJavaRow and add an additional output column named COUNTER.
In the COUNTER column, add the following expression.
Numeric.sequence("i",1, 1);
Now execute your job and you will get the expected output.
My output:
[statistics] connected
1|JOE|1|XML
1|JOE|2|JAVA
1|JOE|3|ORACLE
2|JOHN|1|JAVA
2|JOHN|2|API
[statistics] disconnected
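For anyone who wants to see the logic outside Talend, here is a minimal plain-Java sketch of the same reset-on-change idea. The class and method names are made up for illustration, the two local variables only stand in for context.oldID and the "i" sequence, and it assumes the rows arrive grouped by ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the reset-on-change counter; this is not Talend's generated code.
class GroupCounterSketch {

    // Returns one counter value per input ID, restarting at 1 whenever the ID changes.
    // Assumes the IDs arrive already grouped (for example sorted) by ID.
    static List<Integer> countPerGroup(List<String> ids) {
        List<Integer> counters = new ArrayList<>();
        String oldId = null; // stands in for context.oldID
        int seq = 0;         // stands in for the "i" sequence
        for (String id : ids) {
            if (!id.equalsIgnoreCase(oldId)) {
                seq = 0;     // the reset step done in the tJavaRow
                oldId = id;
            }
            seq += 1;        // the increment step done in the tMap
            counters.add(seq);
        }
        return counters;
    }

    public static void main(String[] args) {
        // IDs from the question: 1, 1, 1, 2, 2
        System.out.println(countPerGroup(Arrays.asList("1", "1", "1", "2", "2")));
        // prints [1, 2, 3, 1, 2]
    }
}
```

If the input is not grouped by ID, the counter restarts every time the ID flips, so sort first.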
There is a simpler way to do it: you can have a job like input->tMap->output.
Then in the tMap, just add a new Integer column "counter" and fill it with Numeric.sequence(row1.NAME,1,1)
Because the sequence name is the value of NAME, each distinct name gets its own sequence, so the counter starts at 1 for every new name.
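To illustrate why a sequence named after the row value gives a per-group counter, here is a small plain-Java sketch. The class below is a hypothetical stand-in written for this example; it only assumes that a named sequence returns the start value on first use and adds the step on each later call for the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a set of named sequences; not Talend's own routine.
class KeyedSequenceSketch {
    private final Map<String, Integer> sequences = new HashMap<>();

    // First call for a given name returns start; every later call for that name adds step.
    int next(String name, int start, int step) {
        Integer current = sequences.get(name);
        int value = (current == null) ? start : current + step;
        sequences.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        KeyedSequenceSketch seq = new KeyedSequenceSketch();
        String[] names = {"JOE", "JOE", "JOE", "JOHN", "JOHN"};
        for (String name : names) {
            System.out.println(name + ";" + seq.next(name, 1, 1));
        }
    }
}
```

Two things follow from this design: the input does not need to be sorted, and two different IDs that share the same NAME would share one counter, so keying on the ID column may be the safer choice.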
So here's my problem. The below works:
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","WorkOrder=331091")
Unfortunately I need 331091 to be Combo4. Once I change the formula to:
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","WorkOrder"= [Combo4]") or
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","WorkOrder"= Combo4) or
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","[WorkOrder]"= [Combo4])
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","[WorkOrder]= [Combo4]")
I've been testing all the variations in the Immediate Window and all of them result in "Compile error: Expected: expression". I get the same error in my other database, which is why I created this one: one table, one record, and one unbound form. The table has WorkOrder and CylindersCompleted, which are both Number, and the form has one combo box and one text box, which are both Number. I'm putting the DLookUp formula in the Control Source of the text box. I'm hoping someone can help me solve this issue so I can apply it to my other database, which is much more complicated. Thanks in advance.
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","WorkOrder"= [Combo4]")
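Should be (the criteria must be one string, with the combo box value concatenated on after the closing quote):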
=DLookUp("[CylindersCompleted]","WO_User_Input_Save","WorkOrder=" & [Combo4])
I have a task to validate decimal and date fields. I am able to validate the decimal and date fields in the same column, but I am not able to keep the old column values.
Input:
id,amt1
1,123
2,321
3,345
4,543
5,789
Current Output:
id,amt1
1,12.3
2,32.1
3,34.5
4,54.3
5,78.9
Expected Output:
id,amt1,original_amt1_values
1,12.3,123
2,32.1,321
3,34.5,345
4,54.3,543
5,78.9,789
Below is the code. I am able to validate the decimal field but not able to keep the original values. Kindly help me with this. I want to keep the original column in the dataframe itself.
SourceFileDF = SourceFileDF.withColumn("amt1", DecimalConversion(col("amt1")))
DecimalConversion is my UDF and SourceFileDF is my dataframe.
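You can use a temporary column name for the converted values and then rename the columns. withColumn and withColumnRenamed return a new dataframe, so assign the result back each time:
SourceFileDF = SourceFileDF.withColumn("amt1_converted", DecimalConversion(col("amt1")))
SourceFileDF = SourceFileDF.withColumnRenamed("amt1", "original_amt1_values")
SourceFileDF = SourceFileDF.withColumnRenamed("amt1_converted", "amt1")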
You can use select and provide the aliases in a single statement. select keeps only the columns you list, so id is included here:
SourceFileDF.select(
$"id",
DecimalConversion($"amt1").as("amt1"),
$"amt1".as("original_amt1_values")
)
I have a Talend job that currently does the following:
Input csv (tFileInputDelimited) --> tMap --> Output csv (tFileOutputDelimited)
The goal of my job is to keep a value from the tMap and use it to rename the output file.
I've tried to use a context variable and specify the row and the column I want to use, but it didn't work.
I'm a beginner, I use Talend during an internship, I started 6 years ago, so I don't know many things ^^
Thank you for your future help!
You can use a tJavaRow to capture the value from the flow and assign it to a context variable. The code will look like this:
// get the value of wanted_field for the row with id 40
if (input_row.id == 40) context.myvar = input_row.wanted_field;
Your job will look like this:
Input csv (tFileInputDelimited) --> tJavaRow --> tMap --> Output csv (tFileOutputDelimited)
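As a rough illustration of the capture-then-rename idea in plain Java (outside Talend), the sketch below scans the rows, remembers the wanted value, and renames the written file once the whole flow has been processed. The Row class, the method names, and the ".csv" suffix are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Plain-Java sketch; names are hypothetical and only mirror the answer's id / wanted_field.
class RenameOutputSketch {

    // Minimal row type assumed for the example.
    static final class Row {
        final int id;
        final String wantedField;

        Row(int id, String wantedField) {
            this.id = id;
            this.wantedField = wantedField;
        }
    }

    // Returns wantedField of the row whose id equals targetId, or null if no row matches.
    static String capture(Iterable<Row> rows, int targetId) {
        String captured = null; // plays the role of context.myvar
        for (Row row : rows) {
            if (row.id == targetId) {
                captured = row.wantedField;
            }
        }
        return captured;
    }

    // Renames the already written file to "<captured>.csv" in the same directory.
    static Path renameTo(Path written, String captured) throws IOException {
        Path target = written.resolveSibling(captured + ".csv");
        return Files.move(written, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The point of splitting it into two steps is that the value is only known after the rows have been read, so the rename has to happen after the file has been written.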
I am trying to create some joblets in Talend that will speed up some processes.
I have an input from a MSSQLInput; the results are then sorted and filtered a little. Then I have a tMemorizeRows and a tJavaFlex, the purpose of which is to memorize the rows in a column to perform a count. The count is based on a customer ID: once the ID changes, the count starts back at 1 and the process begins again and continues to the end. I have refactored this as a joblet but it does not work; the error is:
ID_tMemorizeRows_1 cannot be resolved to a variable
I have a tJavaFlex which starts with
int counte = 1;
The Main code is
if(ID_tMemorizeRows_1[0].equals(ID_tMemorizeRows_1[1]))
{
counte = counte + 1;
}
else
{
counte = 1;
}
context.Enqnum = counte;
The Enqnum variable is created correctly and added into a tMap component.
Does anyone know why this is happening? One person told me it is because when you move something to a joblet it gets a new/different name, so it has to be specifically called in the Java. If this is the case, how do I find the name out?
Thank you
Rich
I do have a resolution. I have tried to add images; however, my reputation is not high enough.
When using joblets, Talend essentially recycles the code used in the joblet by inserting it into the code for the main job.
This is the joblet I have created. I know it works because I refactored it into a joblet instead of building it from scratch. It simply memorises row 0 and row 1 of an ordered data set, the Java performs a count, and the tMap appends the result to the flow (as mentioned above).
(I will try to insert an image in my question; I do not have enough reputation points to insert it.)
When the job is run, it runs fine. But problems occur when I want to reuse the same joblet in another part of the job. Talend assigns a name within the generated source code to each component, depending on the name of the joblet.
For example, if the joblet was called ThisJob, then tMemorizeRows_1 would be called ThisJob_1_tMemorizeRows_1.
The row within the component (in this example ReferenceID) would be renamed as:
ReferenceID_ThisJob_1_tMemorizeRows_1.
But when you add a second joblet to your job, it is given a new name, e.g. ThisJob_2. The number in this name depends on how much you have altered your job before adding the second joblet.
If you add the joblet into your job immediately, then the joblet would be called ThisJob_2; if you have added 5 other components before you add it, then the joblet is likely to be called ThisJob_6, etc. (I'm not 100% sure how Talend renames components.)
When you add a joblet, you can see the name of the joblet on the joblet component; this then reverts back to the original joblet name when you create any links/joins to other components.
It's also important to know that, within the generated code, the name of each component is assigned to a variable called currentComponent.
Resolution
What I did was use Java code to split the name, as below. This way I can get the current name of the joblet and use this name in my Java.
String string = currentComponent;
String[] parts = string.split("_");
String part1 = parts[0];
String part2 = parts[1];
String joblet = part1+'_'+part2;
String newrow = "ReferenceID_"+joblet+"_tMemorizeRows_1";
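The same splitting logic as a small, self-contained Java sketch that can be tried outside Talend. The class and method names are made up for this example, and the sample component name "ThisJob_2_tJavaFlex_1" is only an assumed value of currentComponent. It also assumes the joblet name itself contains no underscore, since the first two parts are taken blindly.

```java
// Stand-alone sketch of the name-splitting trick; not Talend's generated code.
class JobletNameSketch {

    // "ThisJob_2_tJavaFlex_1" -> "ThisJob_2" (first two underscore-separated parts).
    // Throws ArrayIndexOutOfBoundsException if the name contains no underscore.
    static String jobletPrefix(String currentComponent) {
        String[] parts = currentComponent.split("_");
        return parts[0] + "_" + parts[1];
    }

    // Builds the renamed tMemorizeRows_1 column name for the joblet instance we are running in.
    static String memorizedColumnName(String column, String currentComponent) {
        return column + "_" + jobletPrefix(currentComponent) + "_tMemorizeRows_1";
    }

    public static void main(String[] args) {
        System.out.println(memorizedColumnName("ReferenceID", "ThisJob_2_tJavaFlex_1"));
        // prints ReferenceID_ThisJob_2_tMemorizeRows_1
    }
}
```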
I hope this makes sense.
Thanks
I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the tFileInputDelimited component.
I currently have:
tFileList --> iterate --> tIterateToFlow --> tSampleRow
--> tFlowToIterate --> tFileInputDelimited --> tLogRow (just to make sure it's pulling the right file)
But it doesn't work. I have configured it so that the tIterateToFlow has a column called
"FileName" with "((String)globalMap.get("CURRENT_FILE"))" as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as value.
The tSampleRow is limited to "1".
The tFlowToIterate is set so that
"FileNameOnly" is the value of "FileName",
"FileDirectoryOnly" is "FileDirectory" and
"FilePathComplete" is "FileAndDirectory".
In the File location field of the tFileInputDelimited, I have "((String)globalMap.get("FilePathComplete"))"
When it runs, I get an error saying it cannot find the file or path. If I cut out the file input component and send the flow straight to the tLogRow, it shows a single blank line.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here's a few screenshots showing my job design:
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong keys in your tIterateToFlow. The tFileList will put the values for the file paths etc. into the globalMap, but it will prefix each key with the unique component name (for example tFileList_1_CURRENT_FILEPATH). If you hit Ctrl+Space in the value window it should prompt you with a list of available values (these are also listed in the "Outline" tab of the studio). It typically makes an implicit conversion to String, but for this you will need to convert it explicitly, so use .toString() instead of (String).
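A tiny plain-Java sketch of why the unprefixed key comes back empty. The map below only imitates the globalMap with an ordinary HashMap, the component name tFileList_1 is assumed, and the file path is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Imitation of the globalMap with a plain HashMap; the values are invented for the example.
class GlobalMapKeySketch {

    static Map<String, Object> sampleGlobalMap() {
        Map<String, Object> globalMap = new HashMap<>();
        // Keys are stored with the component's unique name as a prefix.
        globalMap.put("tFileList_1_CURRENT_FILE", "report.csv");
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/data/in/report.csv");
        return globalMap;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = sampleGlobalMap();
        // No such key, so the lookup yields null.
        System.out.println(globalMap.get("CURRENT_FILEPATH"));
        // The prefixed key is the one that was actually stored.
        System.out.println(globalMap.get("tFileList_1_CURRENT_FILEPATH"));
    }
}
```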
Another way to get the last modified file is as below:
tFileList (sorted DESC by file modified date) ------> tFixedFlowInput (schema: filename, filenumber) -----> tHashOutput
Here, in the tFixedFlowInput:
filename = (String)globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
This gets a list of all files in the directory with their number/rank, where the last modified file has file number 1, the next one has 2, and so on.
Now, on SubJobOK of the above tFileList, you can have a tHashInput which reads from the tHashOutput and keeps only the row where filenumber==1, which is the last modified file.
tHashInput (linked to the tHashOutput) ----> tFilterRow (filenumber==1) ------> tLogRow
One reason why you are getting null is probably that you have used globalMap.get("CURRENT_FILEPATH") instead of globalMap.get("tFileList_1_CURRENT_FILEPATH")
A simple solution for the above problem could be as below:
tFileList (sorted ASC by file modified date) --> tIterateToFlow --> tJava (just to end the subjob).
Then on
SubJobOK --> tFileInputDelimited (use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as the file name/file path)
Explanation:
Since tFileList iterates over all the files in ASC order, the globalMap will hold the latest file's name after the last iteration. The list is only iterated up to the tIterateToFlow; hence, after this component, (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in our case.
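For comparison, picking the most recently modified file can also be sketched in plain Java, for example inside a tJava or a custom routine. The class and method names below are made up for illustration; only standard java.io and java.util calls are used.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Plain-Java sketch: find the most recently modified regular file in a directory.
class LatestFileSketch {

    // Returns the file with the greatest lastModified timestamp,
    // or an empty Optional if the directory is empty or cannot be listed.
    static Optional<File> latestFile(File directory) {
        File[] files = directory.listFiles(File::isFile); // skip sub-directories
        if (files == null) {
            return Optional.empty();
        }
        return Arrays.stream(files).max(Comparator.comparingLong(File::lastModified));
    }

    public static void main(String[] args) {
        // Uses the current working directory as an example input.
        latestFile(new File(".")).ifPresent(f -> System.out.println(f.getAbsolutePath()));
    }
}
```

The resulting path could then be stored in the globalMap or a context variable and used as the file name of the input component.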