Loading data based on condition of target table in Talend

I have a target table with 5 rows; one of its columns, slno, has the values (12, 13, 14, 34, 56). I need to load data from my source table into the target based on the maximum value already in the target.
Example:
If the slno column in the source table has the values (12, 13, 14, 34, 56, 88, 89, 90, 99), then only the rows with slno values (88, 89, 90, 99) should go to the target (along with all their other column values). Basically, I need to find the max slno in the target and load only the source rows that come after that value.
I tried using tJavaRow, tSetGlobalVar, and tAggregateRow, but I am not able to figure out how to map this.

There are many ways you can do this.
If your source and target tables are on the same database, you can filter your source query like this:
select *
from source
where slno > (select max(slno) from target)
And then load the rows in your target table.
But if they are not, you can do it in Talend:
The lookup on target gets the max value of slno:
SELECT max(slno)
FROM target
Its schema contains only one column (max_slno):
And inside the tMap, only send the rows where the source's slno is greater than the maximum value of the target's slno:
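For reference, this can be done with an expression filter on the tMap output table. A minimal sketch, assuming the main flow is named row1 and the lookup flow row2 (adjust to the actual row names in your job):
row1.slno > row2.max_slno
Only rows satisfying this condition are passed on to the output flow connected to the target.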

Related

reduce function not working in derived column in ADF mapping data flow

I am trying to create a derived column based on a matching condition and to sum the values of multiple matching columns dynamically, so I am using the reduce function in an ADF mapping data flow derived column. But the column is not getting created, even though the transformation looks correct.
Columns from source
Derived column logic
Derived column data preview without the new columns as per logic
I can see only the fields from the source, but not the derived column fields. If I use only array($$), I can see the fields getting created.
Derived column data preview with logic only array($$)
How can I get the derived column with the summation of all the fields matching the condition?
We are getting 48 weeks of forecast data, and the data needs to be prepared on a monthly basis.
e.g. Input data:
Output data:
JAN
----
506   -- this is for the first record, i.e. (94 + 105 + 109 + 103 + 95)
The problem is that array($$) in the reduce function has only one element, so the reduce function cannot accumulate the content of the matching columns correctly.
You can solve this by using two derived columns and a data flow parameter as follows:
Create derived columns with pattern matching for each month-week as you did before, but put the reference $$ into the value field instead of the reduce(...) function.
This will create derived columns like 0jan, 1jan, etc., containing a copy of the original values. For example, Week 0 (1 Jan - 7 Jan) => 0jan with the value 95.
This step gives you a predefined set of column names for each week, which you can use to summarize the values with specific column names.
Define Data Flow parameters for each month containing the month-week column names in a string array, like this:
ColNamesJan = ['0jan', '1jan', etc.]
ColNamesFeb = ['0feb', '1feb', etc.]
and so on.
You will use these column names in a reduce function to summarize the month-week columns into a monthly column in the next step.
Create a derived column for each month, which will contain the monthly totals, and use the following reduce function to sum the weekly values:
reduce(array(byNames($ColNamesJan)), 0, #acc + toInteger(toString(#item)), #result)
Replace the parameter name accordingly.
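To make the accumulation concrete, here is a hypothetical walk-through for the first record, assuming ColNamesJan = ['0jan', '1jan', '2jan', '3jan', '4jan'] and the week values 94, 105, 109, 103 and 95:
byNames($ColNamesJan)  ->  [94, 105, 109, 103, 95]
#acc                   ->  0, 94, 199, 308, 411, 506 (one step per #item)
#result                ->  506, the JAN monthly total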
I was able to summarize the columns dynamically with the above solution.
Please let me know if you need more information (e.g. screenshots) to reproduce the solution.
Update -- Here are the screenshots from my test environment.
Data source (data preview):
Derived columns with pattern matching (settings)
Derived columns with pattern matching (data preview)
Data flow parameter:
Derived column for monthly sum (settings):
Derived column for monthly sum (data preview):

ADF map source columns startswith to sink columns in SQL table

I have an ADF data flow with many CSV files as a source and a SQL database as a sink. The data in the CSV files is similar, with 170-plus columns, but not all of the files have the same columns. Additionally, some column names are different in each file, although each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the parameters tab
Create a new parameter and choose string array data type
For the default value, as per your requirement, enter ['203-student name', '203-student grade', '203-student-marks']
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter
To do this, add 3 rule-based mapping entries in the bottom pane
For the first column, the matching rule will be position==1 and the name will be $parameter1[1]
Follow the same pattern for column 2 and 3
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column name.
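Putting the three rules together, the rule-based mappings would look roughly like this (assuming the parameter is named parameter1, as in the linked tutorial):
Matching condition      Name as
position == 1           $parameter1[1]
position == 2           $parameter1[2]
position == 3           $parameter1[3]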
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping

how to multiply variable to each element of a column in database

I am trying to add a column to a collection by multiplying 0.9 with the existing database column recycling, but I get a runtime error.
I tried to multiply by 0.9 directly in the function but it shows an error, so I created a class and multiplied it there, yet to no avail. What could be the problem?
Your error message is telling you what the problem is: your database query is using GROUP BY in an invalid way.
It doesn't make sense to group by one column and then select other columns (you've selected all columns in your case); what values would they contain, since you haven't grouped by them as well (and get one row returned per group)? You either have to group by all the columns you're selecting for, and/or use aggregates such as SUM for the non-grouped columns.
Perhaps you meant to ORDER BY that column (orderBy(dt.recycling.asc()) if ascending order in QueryDSL format), or to select all rows with a particular value of that column (where(dt.recycling.eq(55)) for example)?
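If the goal is simply to return recycling multiplied by 0.9 for every row, with no grouping at all, a minimal QueryDSL sketch could look like the following (queryFactory and dt are placeholder names for a JPAQueryFactory and the generated query type, and recycling is assumed to be a numeric path):
// hypothetical names: queryFactory (JPAQueryFactory), dt (generated Q-type), recycling (NumberPath)
List<Double> adjusted = queryFactory
        .select(dt.recycling.multiply(0.9))   // multiply each row's value; no GROUP BY involved
        .from(dt)
        .orderBy(dt.recycling.asc())          // optional ordering, per the suggestion above
        .fetch();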

Combine fields in transformation from "get rows from result" + info from a query

I have a PDI transformation that gets 3 fields from a result row:
SEARCH_VALUE
Asset
IP_V4_Address
The next hop is a table input that searches based on search value and returns one column value, something like abcd-1234.
SELECT DISTINCT p.txt_reqID FROM ...
Now, after my table input runs, the resulting stream only has 1 column (the txt_reqID). I'd like my output stream to have 4 columns - the original 3 + the new one from the table input. How do I do that?
Here is the transformation and the input row structure:
This is the table input setup:
I'm only able to access the txt_reqID field after the table input; I can't figure out how to tell it to pass the other 3 through.
You can achieve this by adding a Select values step after the Get rows from result step. Select values is needed to duplicate your SEARCH_VALUE field, since you need it both in the SELECT list and in the WHERE clause, and it can also be used to reorder the fields before the Table input.
In the Table input you can use a query like:
SELECT DISTINCT p.txt_reqID, ? as SearchValue, ? as Asset, ? as IPV4_Address
FROM ... WHERE d.value like ?
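Note that the order of the fields arriving from the Select values step has to match the order of the ? placeholders; as an illustration (the exact field names are assumptions), the incoming stream would be laid out like:
field 1: SEARCH_VALUE        -> first ?  (returned as SearchValue)
field 2: Asset               -> second ? (returned as Asset)
field 3: IP_V4_Address       -> third ?  (returned as IPV4_Address)
field 4: SEARCH_VALUE (copy) -> fourth ? (used in the WHERE ... LIKE clause)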
Here is a sample of the same setup.

Displaying columns that contain no data

This is a sample of the data that I'm getting right now:
... where row 3 of the header is 'Priority'. Group 3 has no records with Priority 1, 3, or 4, so the report only shows one column, with Priority 2, for that group. What I need is to display all 4 columns for each group even if there are no records with that priority; the column should just show zeros as the count. Here is an example of what this should look like:
Any help is greatly appreciated!
Depending on how your database is set up, try this:
Go to File -> Report Options and check "Convert Database NULL Values to Default" and "Convert Other NULL Values to Default." The reason it's doing this is that the value is not 0; it is actually just null.
You would have to create an 'expected' table with the PK values from your source table, including even the ones without data, then left outer join the expected table with your actual source.
Or you can use a 'union all' on the rows from your source table and the stub rows with values of zero instead of null.
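For illustration, a rough SQL sketch of the 'expected rows + left outer join' idea, using hypothetical table and column names throughout, could look like this:
-- hypothetical names; the derived table p supplies the four expected priorities
SELECT g.group_id,
       p.priority,
       COUNT(s.priority) AS priority_count   -- 0 when a group has no rows for that priority
FROM   groups g
CROSS JOIN (SELECT 1 AS priority UNION ALL
            SELECT 2 UNION ALL
            SELECT 3 UNION ALL
            SELECT 4) p
LEFT JOIN source_table s
       ON  s.group_id = g.group_id
       AND s.priority = p.priority
GROUP BY g.group_id, p.priority
ORDER BY g.group_id, p.priority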