Talend - Combining two rows into one - talend

Sample Input
Here is an example of my input. As you can see, the address column has 2 values which I would like to separate and then combine into one value.
Expected Output
This is what the output should be: the values combined into one cell.
Talend Output
If I read the data into Talend it looks like this:

You should be able to accomplish this by using the tMemorizeRows component in Talend.
A really rough example job might look like:
I'm using a tFixedFlowInput to hardcode some data here rather than reading in an Excel Sheet but it should match what you've provided as an example in the question:
The tMemorizeRows component keeps a specified number of rows in memory at all times, rather than processing the flow strictly row by row as normal (although some components, such as a sort, need the entire data set in memory). The memorized rows can then be accessed as an array. Set it to memorize all of the columns; you only need 2 rows in memory at any time:
In this case, whenever a row has an empty name you need to pull the data from the previous row into it. The data held by the tMemorizeRows component can be accessed from a tJavaRow, using the following example code (quickly hacked together):
String name = "";
String address = input_row.address;
String mailingAddress = input_row.mailing_address;
if ("".equals(input_row.name)) {
    // Continuation row: index [1] of each tMemorizeRows array holds the previous row.
    name = name_tMemorizeRows_1[1];
    address = address_tMemorizeRows_1[1] + " " + input_row.address;
    mailingAddress = mailing_address_tMemorizeRows_1[1] + " " + input_row.mailing_address;
} else {
    // Row that carries a name: mark it so a later tFilterRow can drop it.
    name = "DELETE THIS ROW";
    address = input_row.address;
    mailingAddress = input_row.mailing_address;
}
output_row.name = name;
output_row.address = address;
output_row.mailing_address = mailingAddress;
Notice how I've set the name for the non empty name rows to "DELETE THIS ROW". I can then use a tFilterRow to remove this row from the flow so we are left with only the output we want:
Leaving us with the following output:
.-----------+---------------------------+---------------------.
|                           Output                            |
|=----------+---------------------------+--------------------=|
|name       |address                    |mailing_address      |
|=----------+---------------------------+--------------------=|
|John Carter|Washington Street USA 12345|PO Box 999 USA 12345 |
|Linda Green|London Road UK E20 2ST     |PO Box 998 UK E20 2ST|
'-----------+---------------------------+---------------------'
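For readers without Talend to hand, the same "look at the previous row" logic can be sketched in plain Python. This is only an illustration of the idea, not the Talend job itself: the function name merge_continuation_rows is hypothetical, and the input rows are an assumption reconstructed from the output table above. Unlike the job, it skips the named rows directly instead of marking and filtering them.

```python
def merge_continuation_rows(rows):
    """Merge each row that has an empty name into the row before it."""
    merged = []
    previous = None  # plays the role of index [1] in the tMemorizeRows arrays
    for row in rows:
        if row["name"] == "" and previous is not None:
            merged.append({
                "name": previous["name"],
                "address": previous["address"] + " " + row["address"],
                "mailing_address": previous["mailing_address"] + " " + row["mailing_address"],
            })
        previous = row
    return merged


# Assumed input, reconstructed from the output shown above.
rows = [
    {"name": "John Carter", "address": "Washington Street", "mailing_address": "PO Box 999"},
    {"name": "", "address": "USA 12345", "mailing_address": "USA 12345"},
    {"name": "Linda Green", "address": "London Road", "mailing_address": "PO Box 998"},
    {"name": "", "address": "UK E20 2ST", "mailing_address": "UK E20 2ST"},
]
result = merge_continuation_rows(rows)
```

Like the job above, this assumes each record spans exactly two rows; a record spread over three or more rows would need the merged values carried forward.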

Related

Creating a For loop that iterates through all the numbers in a column of a table in Matlab

I am a new user of MATLAB R2021b and I have a table where the last column (named loadings) spans multiple sub-columns (all sub-columns were added under the same variable/column and are treated as one column). I want to create a for loop that iterates through each separate loading column, prior to creating a tbl that I will input into a model. The sub-columns contain numbers, with rows corresponding to the number of participants.
Previously, I had a similar analogy where the loop was iterating through the names of different regions of interest, whereas now the loop has to iterate through columns that have numbers in them. First, the numbers in the first sub-column, then in the second, and so on.
I am not sure whether I should split the last column with T1 = splitvars(T1, 'loadings') first or whether I am not indexing into the table correctly or performing the right transformations. I would appreciate any help.
roi.ic = T.loadings;
roinames = roi.ic(:,1);
roinames = [num2str(roinames)];
for iroi = 1:numel(roinames)
f_roiname = roinames{iroi};
tbl = T1;
tbl.(roinames) = T1.loadings(:,roiname);
tbl.(roinames) = T1.loadings_rsfa(:,roiname)
Unable to use a value of type cell as an index.
Error in tabular/dotParenReference (line 120)
b = b(rowIndices,colIndices)

Java code to check if a pcollection is empty

I am trying to write a pipeline to insert/update/delete rows in a MySQL table based on Pub/Sub messages. While inserting into a particular table, I have to check whether the data exists in another table and do the insert only when it does.
I have to stop the insertion process when there is no data in the other table (PCollection).
PCollection recordCount= windowedMatchedCollection.apply(Combine.globally(new CountElements()).withoutDefaults());
This line does not seem to help. Any input on this, please?
It's a little unclear exactly what you're trying to do, but this should be achievable with counting elements. For example, suppose you have
import apache_beam as beam

# A PCollection of (table, new_row) KVs.
new_data = ...
# A PCollection of (table, old_row) KVs.
old_data = ...
You could then do
rows_per_old_table = old_data | beam.CombinePerKey(
    beam.combiners.CountCombineFn())
and use this to filter out your data with a side input.
def maybe_filter_row(table_and_row, old_table_count):
    # table_and_row comes from the PCollection new_data
    # old_table_count is the side input as a Map
    table = table_and_row[0]
    # Default to 0 so that tables missing from old_data are dropped
    # instead of raising on a None comparison.
    if old_table_count.get(table, 0) > 0:
        yield table_and_row

new_data_to_update = new_data | beam.FlatMap(
    maybe_filter_row,
    old_table_count=beam.pvalue.AsMap(rows_per_old_table))
Now your new_data_to_update will contain only that data for tables that had a non-zero number of rows in old_data.
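To make the count-then-filter idea concrete without a running pipeline, here is a minimal plain-Python sketch of the same logic on in-memory lists. It does not use Beam at all: the function name filter_rows_with_existing_tables and the sample table names and rows are hypothetical, and a Counter stands in for the CombinePerKey result that Beam would pass as a map side input.

```python
from collections import Counter


def filter_rows_with_existing_tables(new_data, old_data):
    """Keep only the (table, row) pairs whose table has rows in old_data."""
    # Stand-in for CombinePerKey(CountCombineFn()): number of old rows per table.
    old_table_count = Counter(table for table, _row in old_data)
    # Stand-in for the FlatMap with the map side input.
    return [
        (table, row)
        for table, row in new_data
        if old_table_count.get(table, 0) > 0
    ]


# Hypothetical sample data.
old_data = [("orders", {"id": 1}), ("orders", {"id": 2})]
new_data = [("orders", {"id": 3}), ("customers", {"id": 9})]
kept = filter_rows_with_existing_tables(new_data, old_data)
```

Here the "customers" row is dropped because that table has no rows in old_data, which is the behavior the side-input filter is meant to give.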
If you're trying to do this in a streaming fashion, everything would have to be windowed, including old_data, and it would filter out only those things that have data in that same window. You could instead do something like
# Create a PCollection containing the set of tables in new_data, per window.
tables_to_consider = (
    new_data
    | beam.GroupByKey()
    | beam.MapTuple(lambda table, rows: table))
rows_per_old_table = tables_to_consider | beam.ParDo(
    SomeDoFnLookingUpCurrentSizeOfEachTable())
# Continue as before.

Calculating time-weighted return in Power BI

I'm trying to calculate the time-weighted return (TWR) for a portfolio of stocks. The formula is:
I have the following data:
I calculate the TWR in Power BI as:
TWR = productx(tabel1;TWR denom/yield+1)
The grey and blue marked/selected fields are individual stocks. Here you see the TWR for the grey stock is = 0,030561631 and for the blue TWR = 0,012208719 which is correct for the period from 09.03.19 to 13.03.19.
My problem is that when I try to calculate the TWR for a portfolio of the two stocks, it takes the product of every row. In the orange field I have calculated the correct result in Excel. But in Power BI it takes the product of the grey and blue stocks' TWR: (0,0305661631 * 0,012208719) = 0,03143468 which is incorrect.
I want sum(yield for both stocks)/sum(TWRDenominator for both stocks) for every single date, so that I do not end up with two rows (one for each stock) but instead with one common number per date for the portfolio.
I have calculated the column TWR denom/yield -1 in a measure like this:
twr denom/yield-1 = CALCULATE(1+sumx(tabel1;tabel1(yield)/sumx(tabel1;tabel1[TwrDenominator])))
How can I solve this problem?
Thank you in advance!
This is one solution to your question but it assumes the data is in the following format:
[Date] | [Stock] | [TWR] | [Yield]
-----------------------------------
[d1]   | X       | 12355 | 236
[d1]   | y       | 23541 | 36
[d2] ... etc.
I.e. date is not a unique value in the table, though date-stock name will be.
Then you can create a new calculated table using the following code:
Portfolio_101 =
CalculateTable(
    Summarize(
        DataTable;
        DataTable[Date];
        "Yield_over_TWR"; Sum(DataTable[Yield])/Sum(DataTable[TWR_den])+1
    );
    Datatable[Stock] in {"Stock_Name_1"; "Stock_Name_2"}
)
Then in the new Portfolio_101 create a measure:
Return_101 =
Productx(
    Portfolio_101;
    Portfolio_101[Yield_over_TWR]
)-1
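The arithmetic behind these two steps can be checked outside Power BI. The sketch below is a plain-Python illustration, not DAX: the function name portfolio_twr and the sample figures are made up for the example. It sums yield and denominator per date across stocks, turns each date into a factor 1 + sum(yield)/sum(denominator), multiplies the factors over all dates and subtracts 1.

```python
from collections import defaultdict
from math import prod


def portfolio_twr(rows):
    """rows: iterable of (date, stock, twr_denominator, yield_amount) tuples."""
    yield_by_date = defaultdict(float)
    denom_by_date = defaultdict(float)
    for date, _stock, twr_denominator, yield_amount in rows:
        # Same grouping as Summarize(...; DataTable[Date]; ...): one total per date.
        yield_by_date[date] += yield_amount
        denom_by_date[date] += twr_denominator
    # One growth factor per date, then the product over dates (the ProductX step).
    factors = [1 + yield_by_date[d] / denom_by_date[d] for d in sorted(yield_by_date)]
    return prod(factors) - 1


# Hypothetical figures for two stocks over two dates.
rows = [
    ("2019-03-11", "A", 1000.0, 10.0),
    ("2019-03-11", "B", 1000.0, 30.0),
    ("2019-03-12", "A", 1000.0, 20.0),
    ("2019-03-12", "B", 1000.0, 0.0),
]
result = portfolio_twr(rows)
```

With these figures the daily factors are 1.02 and 1.01, so the portfolio return is 1.02 * 1.01 - 1 = 0.0302, which is not the same as multiplying the two single-stock returns together.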
Using your data, I end up with the following table. I have created three calculated tables, one for each stock and a third (Portfolio_103) with the two combined. In addition, I have a calendar table which has a 1:1 relationship with each of the Portfolio tables.
Hope this helps; otherwise let me know where I've misunderstood you.
Cheers,
Oscar

ag-grid - getColumnGroup() method returning null after hiding all columns in a group

In ag-grid I have a table with a structure like this:
| Temperature | ....
-----------|------|------|------|---------
Date | min | avg | max | ....
-----------|------|------|------|---------
2017-03-01 | 19.5 | 20.2 | 22.0 | ....
2017-03-02 | 18.8 | 20.4 | 21.6 | ....
I want to be able to hide the entire Temperature column group and I do it like this:
1. get the column group by its name with columnApi.getColumnGroup(groupId)
2. get the column children with getChildren()
3. loop through all elements and hide/show each depending on the Column visibility state
The hiding part works ok, but when I want to show the columns again, the getColumnGroup method returns a null object, and I cannot set the columns to be visible again. Any ideas?
The entire code (part of an Angular2 component) looks like this:
toggleColumn(groupId: string) {
    let groupColumn = this.dataGridOptions.columnApi.getColumnGroup(groupId);
    let children = groupColumn.getChildren();
    for (let idx = 0; idx < children.length; idx++) {
        let colId: string = children[idx].getUniqueId();
        let colState = this.dataGridOptions.columnApi.getColumn(colId);
        let colVisibility = colState.isVisible();
        this.dataGridOptions.columnApi.setColumnVisible(colId, !colVisibility);
    }
}
You can't do this directly (explanation below).
What you can do is loop over ALL columns, get each column's parent, and check that it matches the groupId.
Take a look at the following link:
https://github.com/ag-grid/ag-grid/issues/696
the columns always exist, exactly one column for every column def. the
column then has a 'visible' attribute.
the groups are transient and only exists if they are needed (there is
also the concept of 'OriginalColumnGroup' to keep track of what
columns are in each group, but you don't have access to that, it's
internal)
[...]
https://www.ag-grid.com/angular-grid-master-slave/index.php
so, in summary, column groups only exists if the group is showing, and
there can be multiple groups for the same group if the columns are
split. so, that's why the column groups don't return if they are not
visible!

Get substring into a new column

I have a table that contains a column with data in the following format. Let's call the column "title" and the table "s":
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table and then doing an update to create a new column holding the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column holding the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so no doubt I am doing something simple in a very silly way.
The reason you get the type error is that ss only works on strings, not symbols. Also, ss is not a vector function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
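For readers who do not know q yet, the split-then-distinct idea can be sketched in Python. This is only an analogy for what the vs-based queries above do, not kdb code; the function name distinct_prefixes is hypothetical.

```python
def distinct_prefixes(titles, delimiter="."):
    """Return the part before the first delimiter, deduplicated in first-seen order."""
    prefixes = (title.split(delimiter, 1)[0] for title in titles)
    # dict.fromkeys keeps insertion order, which mirrors how distinct behaves in q.
    return list(dict.fromkeys(prefixes))


titles = ["ab.123", "ab.321", "cde.456", "cde.654", "fghi.789", "fghi.987"]
prefixes = distinct_prefixes(titles)
```

Splitting each title on the delimiter and taking item 0 corresponds to the [;0] indexing in the q queries.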
An alternative is to make use of the 0: operator, to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns' like in a csv file. In this case where there is a fixed number of columns and we only want the first, a list of distinct characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column (a space skips that column) and "." is the field delimiter. For records with a differing number of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:):
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi