Passing a value from tPostgresqlInput to a context variable - Talend

I need to pass a value from tPostgresqlInput to a context variable, so that the value can be used in other components.
The query used in the tPostgresqlInput is:
select max(started_on) started_on from etl_log
I have created a context variable started_on_date (Date datatype).
In the tJavaRow:
context.started_on_date = row1.started_on
But it throws an error:
started_on cannot be resolved or is not a field

Have you defined the schema in the tPostgresqlInput component? If not, that needs to be done first. Afterward, synchronize the schema of the tJavaRow. You can use the tJavaRow's code generation feature, if appropriate.
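For illustration, a minimal sketch of what the tJavaRow code could look like once the schemas are in sync (input_row/output_row are the names the code generation feature produces; the column name started_on is assumed to match the query alias):

// pass the column through to the output row
output_row.started_on = input_row.started_on;
// copy the value into the context variable (declared as Date)
context.started_on_date = input_row.started_on;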
A question, though: if you want to do row-based processing in the same job, there is likely no need to put the started date in the context.
If you want to do non-row-based processing, you can use the tJavaRow component to put the value in the globalMap. This assumes there is only one row of data or that you only care about the last row. Then, you can use that value in other components which are not processing a flow (rows); tJava is an example of that. A sketch of both pieces follows.
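In the tJavaRow (the key name max_started_on is just an example):

// store the value for components that are not processing a flow
globalMap.put("max_started_on", input_row.started_on);

Later, in a tJava:

// read it back; the cast is needed because globalMap stores Objects
java.util.Date startedOn = (java.util.Date) globalMap.get("max_started_on");
System.out.println("max started_on: " + startedOn);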

Related

ADF Copy function comparing watermark against isnull(date1,date2)

Forum Newbie...
I want to utilise the ADF Copy activity to carry out incremental table extracts from one Azure DB to another. Every table in the database that I need has the same two relevant fields, i.e. date1 and date2. For watermark comparison purposes, I need to use isnull(date1,date2), but I am unsure how to do this. That is, I am not sure how I can add this consistent derived value to the source as an additional field, perhaps via the Query or Stored Procedure option on the source, so as to utilise the @item().source.schema and @item().source.table values that have already been generated as parameters?
You can use the Query option in the Copy data activity source and add a new column in the query itself to get the result of isnull(date1,date2). Include the parameter values to get the table name instead of hardcoding it, as shown below.
In the source, select the Query option under Use query and add dynamic content: a concat() of the select statement with the parameter values.
@concat('select *, isnull(date1,date2) as final_dt from ',pipeline().parameters.schema,'.',pipeline().parameters.table)
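For example, with hypothetical parameter values schema = 'dbo' and table = 'person', the dynamic content resolves at runtime to:

select *, isnull(date1,date2) as final_dt from dbo.person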
Sink table data output:

Azure Data Flow (Data Flow) - First row field value as custom field to remaining rows

I am creating a Data Flow in ADF. My requirement is to read one field value from the first row and use it as a session ID for the rest of the rows. I looked into the expressions but didn't find functions that help with this.
Ex: Source file in blob:
time,phone
2020-01-31 10:00:00,1234567890
2020-01-31 10:10:00,9876543219
Target should be:
SessionID,time,Phone
20200131100000,2020-01-31 10:00:00,1234567890
20200131100000,2020-01-31 10:10:00,9876543219
SessionID is a derived column. I need to read the time from the first row, remember it, and apply it to all rows as the SessionID.
How do I read the first row's time value and keep it in a global variable?
Any inputs are appreciated.
You can use a Lookup activity in the pipeline (check the First row only option) and pass the time value to a Data Flow parameter. Then use a Derived Column transform in the Data Flow to add the SessionID column.
Details:
1. Check the First row only option in the Lookup activity.
2. Use this expression to get your expected value (a worked example follows this list):
@replace(replace(replace(activity('Lookup1').output.firstRow.time,'-',''),' ',''),':','')
3. Pass the value of this variable to the parameter in the Data Flow.
4. Add the SessionID column in the Data Flow.
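With the sample data from the question, that expression turns the first row's time '2020-01-31 10:00:00' into '20200131100000', which is exactly the SessionID expected in the target.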

How to assign a CSV field value to a SQL query written inside a Table input step in Pentaho Spoon

I am pretty new to Pentaho, so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TLDR;
I recommend using a "Database join" step, as in my third suggestion below. See the last image for reference.
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read data from a CSV, you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input, use a placeholder (?) to determine where to place the data, and set it to run for each row it receives from the source step.
Just keep in mind that the order of columns received by the Table input (the columns coming out of the Select values step) is the same order in which they will be used for the placeholders (?). This is not a problem in your case, which uses only one placeholder, but keep it in mind as you ramp up using Pentaho.
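With the query from the question, the Table input SQL using a placeholder would look like this (the cast to integer, shown in the second answer below, may also be needed depending on the column types):

select * from person where EXTRACT(YEAR FROM birthdate) > ?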
Second idea, using a Database Lookup
This is another approach where you can't customize the query made to the database, but you may see better performance because you can set the "Enable cache" flag. If you don't need to use a function in your where clause, this is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function in your where clause. It looks a lot like the Table Input approach, but you can skip the Select values step and select which columns to use, repeat the same column a bunch of times, and enable an "outer join" flag that also returns the rows for which the query produced no result.
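A sketch of what the query in the Database join step could look like, reusing the question's filter (the ? placeholder is fed by the incoming stream field you select in the step):

select * from person where EXTRACT(YEAR FROM birthdate) > ?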
ProTip: If the transformation feels too slow, try running multiple copies of the step (documentation here) and, obviously, make sure the table has the appropriate indexes in place.
Yes, there's a way of assigning it directly without the use of a parameter. Do as follows.
Use a "Block this step until steps finish" step to halt the Table input step until the CSV file input step completes.
Following is how you configure each step.
Note:
The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check Execute for each row and Replace variables in script in the Table input step.
Select only the birthdate column in the CSV file input step.

Talend Data Integration: Avoid nulls coming out of tExtractXMLField?

I have this simple flow in Talend DI 6 (simplified for posting on SO):
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default data values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which is used whenever the column value is null:
Talend schema view with Default column
This also works for other data types. Talend DI 6 should still be able to do this, although the field might have been renamed.
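If the Default column turns out not to be available, a workaround sketch (not from the original answer) is to normalize the value in a tJavaRow, assuming a String column named attr:

// replace null with an empty string so downstream steps never see null
output_row.attr = input_row.attr == null ? "" : input_row.attr;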

How to conditionally execute something based on the previously processed number of rows?

I want to execute some subjob if the previously processed number of rows is greater than N. To do this, I'm using the following configuration:
tFixedFlowInput has some rows.
tAggregateRow uses the count function and outputs one row with the number.
tSetGlobalVar then stores this value into a global variable that I can check in the Run If connector (in this case, (Integer)globalMap.get("tSetGlobalVar_1") > 3).
tMsgBox then shows if the condition is true.
What I would like is to do the same, but in a more elegant way, using the minimum number of components. I would like to connect tAggregateRow (or even tFixedFlowInput) to tMsgBox directly through the Run If connector, but I haven't found a way to refer to the number of rows previously processed without using the output row2.count variable.
How could I do something like this?
What should I put in the If condition to refer to the tAggregateRow operation result, without connecting it to another, otherwise unnecessary, component as described at the beginning?
For any Talend component, look under the Outline tab at the bottom of the left-side workspace pane. This lists the properties available via global variables for that component. Some properties, like the count of records inserted by output components, are only available once the component has executed completely (After).
For your case, you can try directly using ((Integer)globalMap.get("tFixedFlowInput_1_NB_LINE")), which gives the number of lines (after) produced by tFixedFlowInput.
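Putting it together, the Run If condition from the question can then reference that variable directly, for example:

((Integer)globalMap.get("tFixedFlowInput_1_NB_LINE")) > 3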