How can I count multiple foreign keys and group them in a row? - talend

I have one table with the following structure:
ID|FK1|FK2|FK3|FK4
ID|FK1|FK2|FK3|FK4
ID|FK1|FK2|FK3|FK4
And another table that holds:
FK|DATA
FK|DATA
FK|DATA
FK|DATA
The FKn columns in the first table reference the FK field in the second one. There can be more than one record in the second table linked to each FKn value.
What I want to achieve is to create another table with, for every ID, the total number of records linked through each FKn. For example:
ID|FK1|FK2|FK3|FK4
A|0 |23 |9 |3
B|4 |0 |2 |0
I know how to transform the row flow and iterate over every FKn field. I also know how to count. What I'm not able to do is to group every FKn count from the same ID into one row, because after I use a tLoop component, every count operation is turned into a new row like:
FK|count
FK|count
FK|count
FK|count
...
Any idea about how to join rows by packing N of them into one single row each time? Or is there another way to do it?
NOTE: I'm using text data as input

If I understood your issue correctly, I would suggest a different approach (provided you have a fixed number of FK columns: FK1, FK2, FK3, FK4):
tFileInput --> tMap (left join to the lookup table (FK, DATA) on FK1) --> output-1 with columns ID, FK1, FK2, FK3, FK4, where FK1 = 0 if no matching row is found and 1 if a matching row is found, and FK2 = FK3 = FK4 = 0. The same ID can appear in several output rows here, since, as you mentioned, there can be more than one matching record.
Similarly:
tFileInput --> tMap (left join to the lookup table (FK, DATA) on FK2) --> output-2 with columns ID, FK1 = 0, FK2 = (0 or 1), FK3 = 0, FK4 = 0.
tFileInput --> tMap (left join to the lookup table (FK, DATA) on FK3) --> output-3 with columns ID, FK1 = 0, FK2 = 0, FK3 = (0 or 1), FK4 = 0.
... and the same again for FK4.
Next, union all these outputs (output-1 through output-4) into one flow, feed it into a tAggregateRow, group by the ID column, and take SUM(FK1), SUM(FK2), SUM(FK3), SUM(FK4).
To summarize, your job will look something like this:
tFileInput --> tMap (with lookup) --> tHashOutput1/tFileOutput1
tFileInput --> tMap (with lookup) --> tHashOutput2/tFileOutput2
tFileInput --> tMap (with lookup) --> tHashOutput3/tFileOutput3
tFileInput --> tMap (with lookup) --> tHashOutput4/tFileOutput4
tHashInput1/tFileInput1 ---
tHashInput2/tFileInput2 ---
tHashInput3/tFileInput3 ---
tHashInput4/tFileInput4 --- tUnite --> tAggregateRow --> final output.
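Although your input is text files rather than a database, the logic this job implements corresponds to the SQL sketch below (main_table, lookup_table, and the column names are assumptions), which may help to validate the results:
SELECT id,
       SUM(fk1) AS fk1_count,
       SUM(fk2) AS fk2_count,
       SUM(fk3) AS fk3_count,
       SUM(fk4) AS fk4_count
FROM (
    -- one branch per FK column; each emits a 0/1 flag per matched lookup row
    SELECT m.id, CASE WHEN l.fk IS NULL THEN 0 ELSE 1 END AS fk1, 0 AS fk2, 0 AS fk3, 0 AS fk4
    FROM main_table m LEFT JOIN lookup_table l ON l.fk = m.fk1
    UNION ALL
    SELECT m.id, 0, CASE WHEN l.fk IS NULL THEN 0 ELSE 1 END, 0, 0
    FROM main_table m LEFT JOIN lookup_table l ON l.fk = m.fk2
    UNION ALL
    SELECT m.id, 0, 0, CASE WHEN l.fk IS NULL THEN 0 ELSE 1 END, 0
    FROM main_table m LEFT JOIN lookup_table l ON l.fk = m.fk3
    UNION ALL
    SELECT m.id, 0, 0, 0, CASE WHEN l.fk IS NULL THEN 0 ELSE 1 END
    FROM main_table m LEFT JOIN lookup_table l ON l.fk = m.fk4
) AS u
GROUP BY id;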

Related

Postgres get null if row doesn't exist in where clause

I've a postgres table with data like
Name    Attendance
------  ----------
Jackie  2
Jade    5
Xi      10
Chan    15
In my query I want the attendance for each name, and if a name doesn't exist, return NULL instead of no row for that particular name.
E.g. a query where Name in ('Jackie', 'Jade', 'Cha', 'Xi') should return
Name    Attendance
------  ----------
Jackie  2
Jade    5
NULL    NULL
Xi      10
To produce the desired rows, you need to join with a table or set of rows which has all those names.
You can do this by inserting the names into a temp table and joining on that, but in Postgres you can turn an array of names into a set of rows using unnest. Then left join with the table to return a row for every value in the array.
select attendances.*
from unnest(ARRAY['Jackie','Jade','Cha','Xi']) as names(name)
left join attendances on names.name = attendances.name;
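If you also want to see which name each NULL row belongs to, select the name from the unnested array instead of only attendances.* (assuming the count column is named attendance):
-- keep names.name so missing names remain identifiable in the output
select names.name, attendances.attendance
from unnest(ARRAY['Jackie','Jade','Cha','Xi']) as names(name)
left join attendances on names.name = attendances.name;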

Best way to maintain an ordered list in PostgreSQL?

Say I have a table called list, where there are items like these (the ids are random uuids):
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 Bar
x 4 Baz
I want to maintain the property that the rank column always goes from 0 to n-1 (n being the number of rows). If a client asks to insert an item with rank = 3, then the pg server should push the current ranks 3 and 4 up to 4 and 5, respectively:
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 New Item!
x 4 Bar
x 5 Baz
My current strategy is to have a dedicated insertion function add_item(item) that scans through the table, selects the items with rank equal to or greater than that of the item being inserted, and increments those ranks by one. However, I think this approach will run into all sorts of problems, like race conditions.
Is there a more standard practice or more robust approach?
Note: The rank column is completely independent of rest of the columns, and insertion is not the only operation I need to support. Think of it as the back-end of a sortable to-do list, and the user can add/delete/reorder the items on the fly.
Doing verbatim what you suggest might be difficult or not possible at all, but I can suggest a workaround. Maintain a new column ts which stores the time a record is inserted. Then insert the current time along with the rest of the record, i.e.
id rank text ts
--- ----- ----- --------------------
x 0 Hello 2017-12-01 12:34:23
x 1 World 2017-12-03 04:20:01
x 2 Foo ...
x 3 New Item! 2017-12-12 11:26:32
x 3 Bar 2017-12-10 14:05:43
x 4 Baz ...
Now we can easily generate the ordering you want via a query:
SELECT id, rank, text,
       ROW_NUMBER() OVER (ORDER BY rank, ts DESC) - 1 AS new_rank
FROM yourTable;
This would generate ranks 0 to 5 in the above sample table (ROW_NUMBER itself is 1-based, hence the - 1). The basic idea is to just use the already existing rank column, but to let the timestamp break the tie in ordering should the same rank appear more than once.
You can wrap it up in a function if you think it's worth it:
t=# with u as (
update r set rank = rank + 1 where rank >= 3
)
insert into r values('x',3,'New val!')
;
INSERT 0 1
the result:
t=# select * from r;
id | rank | text
----+------+----------
x | 0 | Hello
x | 1 | World
x | 2 | Foo
x | 3 | New val!
x | 4 | Bar
x | 5 | Baz
(6 rows)
Also worth mentioning: you might run into concurrency ("race condition") problems on highly loaded systems. The code above is just a sample.
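A minimal sketch of such a function, assuming the table r from the example above:
CREATE FUNCTION add_item(p_id text, p_rank int, p_text text)
RETURNS void
LANGUAGE sql AS $$
  -- shift every row at or after the target rank up by one,
  -- then insert the new row into the freed slot
  WITH u AS (
    UPDATE r SET rank = rank + 1 WHERE rank >= p_rank
  )
  INSERT INTO r VALUES (p_id, p_rank, p_text);
$$;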
You can have a “computed rank” which is a double precision and a “displayed rank” which is an integer that is computed using the row_number window function on output.
When a row is inserted that should rank between two rows, compute the new rank as the arithmetic mean of the two ranks.
The advantage is that you don't have to update existing rows.
The downside is that you have to calculate the displayed ranks before you can insert a new row, so that you know where to insert it.
This solution (like all others) is subject to race conditions.
To deal with these, you can either use table locks or serializable transactions.
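A sketch of that idea, assuming a variant of the question's list table where pos is the double precision computed rank (gen_random_uuid() needs PostgreSQL 13+ or the pgcrypto extension):
-- insert between the rows whose computed ranks are 2.0 and 3.0
-- by giving the new row the arithmetic mean of the two
INSERT INTO list (id, pos, text)
VALUES (gen_random_uuid(), (2.0 + 3.0) / 2, 'New Item!');

-- displayed ranks are derived on output, never stored
SELECT id, text, ROW_NUMBER() OVER (ORDER BY pos) - 1 AS displayed_rank
FROM list;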
The only way to prevent a race condition would be to lock the table
https://www.postgresql.org/docs/current/sql-lock.html
Of course this would slow you down if there are lots of updates and inserts.
If you can somehow limit the scope of your updates, then you can do a SELECT ... FOR UPDATE on that scope. For example, if the records have a parent_id, you can do a select for update on the parent record first, and any other inserter that does the same select for update will have to wait until your transaction is done.
https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
Read the section on advisory locks to see if you can use those in your application. They are not enforced by the system so you'll need to be careful of how you write your application.
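For example, using the list table from the question (the lock key 42 is an arbitrary application-chosen number):
BEGIN;
-- serialize all rank rewrites behind one advisory lock;
-- pg_advisory_xact_lock releases automatically at commit or rollback
SELECT pg_advisory_xact_lock(42);
UPDATE list SET rank = rank + 1 WHERE rank >= 3;
INSERT INTO list (id, rank, text) VALUES ('x', 3, 'New Item!');
COMMIT;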

Min value with GROUP BY in Power BI Desktop

id datetime new_column datetime_rankx
1 12.01.2015 18:10:10 12.01.2015 18:10:10 1
2 03.12.2014 14:44:57 03.12.2014 14:44:57 1
2 21.11.2015 11:11:11 03.12.2014 14:44:57 2
3 01.01.2011 12:12:12 01.01.2011 12:12:12 1
3 02.02.2012 13:13:13 01.01.2011 12:12:12 2
3 03.03.2013 14:14:14 01.01.2011 12:12:12 3
I want to make a new column that holds the minimum datetime value within each group of rows sharing the same id.
How could I do it in Power BI desktop using DAX query?
Use this expression:
NewColumn =
CALCULATE(
    MIN(Table[datetime]),
    FILTER(Table, Table[id] = EARLIER(Table[id]))
)
In Power BI, using a table with your data, it will produce the new_column values shown in the sample above.
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to the values of a different row context.
When you use the CALCULATE function, it creates a row context over the whole table; theoretically it iterates over every table row. The same happens with the FILTER function: it iterates over the whole table and evaluates every row against the filter condition.
So far we have two row contexts: the one created by CALCULATE and the one created by FILTER. Note that FILTER uses EARLIER to get access to CALCULATE's row context. That said, in our case, for every row in the outer (CALCULATE's) row context, FILTER returns the set of rows that have the same id as the current row in the outer context.
If you have a programming background, it may help to think of it as a nested loop. Hopefully this Python sketch conveys the main idea:
outer_context = ['row1', 'row2', 'row3', 'row4']
inner_context = ['row1', 'row2', 'row3', 'row4']

for outer_row in outer_context:
    for inner_row in inner_context:
        if inner_row == outer_row:  # this line is what FILTER and EARLIER do
            # calculate the min datetime using the filtered rows
            ...
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression (ranking ascending, so the earliest datetime per id gets rank 1):
RankColumn =
RANKX(
    CALCULATETABLE(Table, ALLEXCEPT(Table, Table[id])),
    Table[datetime],
    ,
    ASC
)
The datetime_rankx column in the sample table at the top shows the resulting rank.
Let me know if this helps.

How to get grouped query data from the resultset?

I want to get grouped data from a table in sqlite. For example, the table is like below:
Name Group Price
a 1 10
b 1 9
c 1 10
d 2 11
e 2 10
f 3 12
g 3 10
h 1 11
Now I want to get all the data grouped by the Group column, with each group in one array, namely:
array1 = {{a,1,10},{b,1,9},{c,1,10},{h,1,11}};
array2 = {{d,2,11},{e,2,10}};
array3 = {{f,3,12},{g,3,10}}.
I need these two-dimensional arrays to populate the grouped table view. The SQL statement might be NSString *sql = @"SELECT * FROM table GROUP BY Group"; but I wonder how to get the data from the result set. I am using FMDB.
Any help is appreciated.
Get the data from sql with a normal SELECT statement, ordered by group and name:
SELECT * FROM "table" ORDER BY "Group", Name;
Then in code, build your arrays, switching to fill the next array when the group id changes.
To be clear about GROUP BY: you can group data, but then you need an aggregate function on the other columns.
E.g. a table of students with a Gender column can be grouped by Gender, which returns two sets, Male and Female. You then need to perform some aggregate operation on the result columns, e.g. the maximum or average marks of each group.
In your case you want to group, but what kind of operation do you require on the Price column?
E.g. the query below will return each group with its maximum price.
SELECT "Group", MAX(Price) AS MaxPriceByEachGroup FROM "table" GROUP BY "Group";

Can we edit a row by updating the sequence number of a record 3 times, inserting those 3 rows into an array, and inserting the array into the table?

Suppose I fetch the valid rows from the table where marks_colm = '300' and get 100 rows.
For each fetched row, I'd like to create 3 new rows:
increase sequence_column by 1 and set marks = '350'
increase sequence_column by 1 again and set marks = '351'
increase sequence_column by 1 again and set marks = '352'
Then copy these three rows to an array and insert the whole array into the table.
Example
input row:
Name1 ... RollNo31.... sequence5 ... marks300
output should be
3 output rows for each one of the input rows above
Name1 ... RollNo31.... sequence6 ... marks350
Name1 ... RollNo31.... sequence7 ... marks351
Name1 ... RollNo31.... sequence8 ... marks352
How can I achieve this?
I believe you can achieve your goal using a multi-row insert. Note that, because multiple errors may be encountered when inserting multiple rows, you must use the GET DIAGNOSTICS statement to retrieve the details of any errors that occur; DSNTIAR alone will be insufficient.
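Alternatively, if all the source rows are already in the table, the whole operation can be expressed as one set-based INSERT ... SELECT, without building host-variable arrays at all. A sketch with assumed table and column names:
-- generate the three new rows per qualifying record by joining
-- against a three-row constant table of (offset, new marks) pairs
INSERT INTO the_table (name, roll_no, sequence_column, marks_colm)
SELECT t.name, t.roll_no,
       t.sequence_column + n.i,
       n.new_marks
FROM the_table t,
     (VALUES (1, '350'), (2, '351'), (3, '352')) AS n (i, new_marks)
WHERE t.marks_colm = '300';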