Give rank using field values on table in Talend - talend

Right Now, I have 8 tables that needs to transform into 1, and I need to add Rank to the Output Table.
By using, Amount Collected field from 1 of the 8 table.
Sample:
Table A: amount_assignment
Table B: amount_collected
OutputTable: Rank= 1 (based on the highest collected)
How can I place 1, 2, 3.... on the Output Table field Rank based on the computed 'amount_collected'?

you can try to use your inputdataflow-->tSortRow-->tMap. In tSortrow you can sort data based on amount column you need and then further in tMap you can put a sequence number to every row using Numeric.sequence("sequencename",1,1) in expression for rank_column

Related

reduce function not working in derived column in adf mapping data flow

I am trying to create the derived column based on the condition that met the value and trying to do the summation of multiple matching column values dynamically. So I am using reduce function in ADF derived column mapping data flow. But the column is not getting created even the transformation is correct.
Columns from source
Derived column logic
Derived column data preview without the new columns as per logic
I could see only the fields from source but not the derived column fields. If I use only the array($$) I could see the fields getting created.
Derived column data preview with logic only array($$)
How to get the derived column with the summation of all the fields matching the condition?
We are getting data of 48 weeks forecast and the data to be prepared on monthly basis.
eg: Input data
Output data:
JAN
----
506 -- This is for first record i.e. (94 + 105 + 109 + 103 + 95)
The problem is that the array($$) in the reduce function has only one element, so that the reduce function can not accumulate the content of the matching columns correctly.
You can solve this by using two derived columns and a data flow parameter as follows:
Create derived columns with pattern matching for each month-week you did it before, but put the reference $$ into the value field, instead of the reduce(...) function.
This will create derived columns like jan0, jan1, etc. containing the copy of the original values. For example Week 0 (1 Jan - 7 Jan) => 0jan with value 95.
This step gives you a predefined set of column names for each week, which you can use to summarize the values with specific column names.
Define Data Flow parameters for each month containing the month-week column names in a string array, like this:
ColNamesJan=['0jan' ,'1jan', etc.] ColNamesFeb=['0feb' ,'1feb', etc.] and so on.
You will use these column names in a reduce function to summarize the month-week columns to monthly column in the next step.
Create a derived column for each month, which will contain the monthly totals, and use the following reduce function to sum the weekly values:
reduce(array(byNames($ColNamesJan)), 0, #acc + toInteger(toString(#item)),#result)
Replace the parameter name accordingly.
I was able to summarize the columns dynamically with the above solution.
Please let me know if you need more information (e.g. screenshots) to reproduce the solution.
Update -- Here are the screenshots from my test environment.
Data source (data preview):
Derived columns with pattern matching (settings)
Derived columns with pattern matching (data preview)
Data flow parameter:
Derived column for monthly sum (settings):
Derived column for monthly sum (data preview):

Filter by/Aggregate multiple columns in Power BI

I would like to group by item and count for each store how many rows are there for my sales data.
table:
Id Item Store Qty
1 A store1 5
2 B store1 2
3 A store2 3
4 B store2 10
....
To group by item I tried:
groupby_item = SUMMARIZE(table, table[Item], "Count", COUNT(table[Item]))
which gives the table:
Item Count
A 2
B 2
but I want to introduce a Store slicer in a visual and I couldn't because Store column is absent in the aggregated table. Can I group by Store then by item and count?
Like in Python you could maybe do:
table.groupby('Item').agg({'Store': 'first', 'Id': 'count'})
to keep the Store information by keeping the first value of Store in each item group.
Would you be able to do that in Power BI? Or is there a better way to do this?
Why create an aggregate table in the first place? You can use the base table in the visual and it would reflect any filters on Store.
The default behavior of PBI visuals is to group categorical columns and aggregate numerical ones.

Combine fields in transformation from "get rows from result" + info from a query

I have a PDI transformation that gets 3 fields from a result row:
SEARCH_VALUE
Asset
IP_V4_Address
The next hop is a table input that searches based on search value and returns one column value, something like abcd-1234.
SELECT DISTINCT p.txt_reqID FROM ...
Now, after my table input runs, the resulting stream only has 1 column (the txt_reqID). I'd like my output stream to have 4 columns - the original 3 + the new one from the table input. How do I do that?
Here is the transformation and the input row structure:
This is the table input setup:
I'm only able to access the txt_reqID field after the table input, I can't figure out how to tell it to pass the other 3 through.
You can achieve this by having Select values step after the Get rows from result step. Select Values is required to duplicate your SEARCH_VALUE as you need this field in both SELECT and in the WHERE clause and also it can be used to reorder the fields before table input.
In Table input you can use the query like
SELECT DISTINCT p.txt_reqId, ? as SearchValue, ? as Asset, ? as IPV4 address
FROM ... WHERE d.value like ?
Here is the sample for the same
click here for the image

Pivot Charts in google Sheets by counting non-numeric data?

I have a dataset that I'd like to summarize in chart form. There are about 30 categories whose counts I'd like to display in a bar chart from about 300+ responses. I think a pivot table is probably the best way to do this, but when I create a pivot table and select multiple columns, each new column added gets entered as a sub-set of a previous column. My data looks something like the following
ID Country Age thingA thingB thingC thingD thingE thingF
1 US 5-9 thB thD thF
2 FI 5-9 thA thF
3 GA 5-9 thA thF
4 US 10-14 thC
5 US 10-14 thB thF
6 US 15-18
7 BR 5-9 thA
8 US 15-18 thD thF
9 FI 10-14 thA
So, I'd like to be able to create an interactive chart that showed the counts of "thing" items; I'd then like to be able to filter based upon demographic data (e.g., Country, Age). Notice that the data is non-numeric, so I have to use a CountA to see how many there are in each category.
Is there a simple way to display chart data that summarizes the counts and will allow me to filter based on different criteria?
The query can summarize the data in the form you want. The fact that you have "thA", "thB", etc, instead of "1" complicates the matter, but one can transform the strings to numeric data on the fly.
Assuming the data you've shown is in the cells A1:I10, the following formula will summarize it:
=query({B2:C10, arrayformula(if(len(D2:I10), 1, 0))}, "select Col1, Col2, count(Col3), sum(Col3), sum(Col4), sum(Col5), sum(Col6), sum(Col7) group by Col1, Col2", 0)
Explanation:
{B2:C10, arrayformula(if(len(D2:I10), 1, 0))} creates a table where the first two columns are your B,C (Country, Age) and the other six are filled with 1 or 0 depending on whether the cells in D-I are filled or not.
select Col1, Col2, count(Col3), sum(Col3), ... group by Col1, Col2 selects Country, Age, the total count of rows with this Country-Age combination, the number of rows with thingA for this Country-Age combination, etc.
the last argument, 0, indicates there are no header rows in the table passed to the query.
It's possible to give labels to the columns returned by the query, using label: see query language documentation. It would be something like
label Col1 'Country', Col2 'Age', count(Col3) 'Total count', sum(Col3) 'thingA count', ...
Add a Count column to your data with a "1" for whatever occurrence, this might solve your problem in the Pivot Table. I was just looking for a solution and thought about this. Working now for me.

Insert multiple records into fact table based on fields in single record

I'm working in Pentaho 4.4.1-GA (Kettle / PDI). The database is Postgres.
I need to be able to insert multiple records into a fact table based on the fields that come from a single record. The single record contains fields:
productcode1, price1
productcode2, price2
productcode3, price3
...
productcode10,price10
So if there was a value for each of the 10 productcode / prices then I'd need to insert a total of 10 records into the fact table. If there were values for 4 of the combinations, then I'd need to insert 4 records into the fact table, etcetera. All field values for the fact records would be identical except for the PK (generated by sequence), product codes, and prices.
I figure that I need some type of looping construct which would let me check whether or not a value was present for each productx field, and if so, do an insert/update step on the fact table with the desired field values. I'm just not sure how to do this in Pentaho.
Any ideas? All suggestions are welcome :)
Thank You,
Rakesh
Could you give a sample input and output for your scenario??
From your example data I can infer that if there are 10 different product codes and only 4 product prices you want to have 4 records inserted into your table. Is that so?
Well for a start you can add a constant value of 1 to those records by filtering for NOT NULL and then use an Group BY Step to count the number of 1's. This would give you the count. BTW it would be helpful if you could provide more details on what columns you would be loading as there are ways to make a PDI transformation execute multiple times