Use self joins to collapse linked rows in PostgreSQL

Suppose I have a table in the format:
+======+=========+=======+============+
| code | type | Price | parentCode |
+======+=========+=======+============+
| TR1 | initial | 100 | -1 |
+------+---------+-------+------------+
| TR2 | losing | 70 | TR1 |
+------+---------+-------+------------+
| TR3 | winning | 150 | TR1 |
+------+---------+-------+------------+
This represents, for example, a trade placed by a user at a price of 100 (TR1); the following trades (TR2, TR3) are automatic trades that will be placed if the price hits the specified value.
Using the fact that TR2 and TR3 are linked to the initial trade using the parentCode, how would I use this relationship to generate the following table by collapsing the entries into a single row:
+=============+============+=============+==============+=============+==============+
| initialCode | losingCode | winningCode | initialPrice | losingPrice | winningPrice |
+=============+============+=============+==============+=============+==============+
| TR1 | TR2 | TR3 | 100 | 70 | 150 |
+-------------+------------+-------------+--------------+-------------+--------------+

Assuming these always come in groups of one each of initial, losing, and winning:
select i.code as initialCode,
l.code as losingCode,
w.code as winningCode,
i.price as initialPrice,
l.price as losingPrice,
w.price as winningPrice
from trades i
join trades l on l.parentCode = i.code and l.type = 'losing'
join trades w on w.parentCode = i.code and w.type = 'winning'
where i.type = 'initial';
This can also be done without joining. Ask in the comments if you want to see that.
If you have more than just the three types, then you can follow the above pattern, but the self-joins will start adding up.
To do this without all the joins, follow this pattern:
with by_parent as (
select case
when parentCode = '-1' then code
else parentCode
end as parentCode,
type, price, code
from trades
)
select parentCode as initialCode,
max(code) filter (where type='losing') as losingCode,
max(code) filter (where type='winning') as winningCode,
max(code) filter (where type='trailLosing') as trailLosingCode,
max(code) filter (where type='guaranteedLosing') as guaranteedLosingCode,
max(price) filter (where type='initial') as initialPrice,
max(price) filter (where type='losing') as losingPrice,
max(price) filter (where type='winning') as winningPrice,
max(price) filter (where type='trailLosing') as trailLosingPrice,
max(price) filter (where type='guaranteedLosing') as guaranteedLosingPrice
from by_parent
group by parentCode
;
initialcode | losingcode | winningcode | traillosingcode | guaranteedlosingcode | initialprice | losingprice | winningprice | traillosingprice | guaranteedlosingprice
-------------+------------+-------------+-----------------+----------------------+--------------+-------------+--------------+------------------+-----------------------
TR1 | TR2 | TR3 | | | 100 | 70 | 150 | |
(1 row)
If you have control over your source data, you can get rid of the by_parent CTE by setting parentCode to the row's own code for the initial rows.
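A minimal sketch of that simplification, assuming the initial row stores its own code in parentCode instead of -1. The inline trades CTE only stands in for the real table so the query can be run on its own:

```sql
-- Sample rows from the question, with the ASSUMED change that the
-- initial row points at itself (parentCode = code) instead of -1.
WITH trades (code, type, price, parentCode) AS (
    VALUES ('TR1', 'initial', 100, 'TR1'),
           ('TR2', 'losing',   70, 'TR1'),
           ('TR3', 'winning', 150, 'TR1')
)
SELECT parentCode                                   AS initialCode,
       max(code)  FILTER (WHERE type = 'losing')    AS losingCode,
       max(code)  FILTER (WHERE type = 'winning')   AS winningCode,
       max(price) FILTER (WHERE type = 'initial')   AS initialPrice,
       max(price) FILTER (WHERE type = 'losing')    AS losingPrice,
       max(price) FILTER (WHERE type = 'winning')   AS winningPrice
FROM trades
GROUP BY parentCode;   -- every row of a group now shares the same parentCode
```

With the three sample rows this should give the single row TR1 | TR2 | TR3 | 100 | 70 | 150.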

Related

Postgres Hierarchy output

I'm struggling to get the correct output using a hierarchical query.
I have one table that is loaded daily with every product and its price. Over time a product can be cancelled and activated again.
I believe that in Oracle we could use CONNECT BY.
WITH RECURSIVE cte AS (
select min(event_date) event_date, item_code,sum(price::numeric)/1024/1024 price, 1 AS level
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('Renault')
group by item_code
UNION ALL
SELECT e.event_date, e.item_code, e.price, cte.level + 1
from (select event_date, item_code,sum(price::numeric)/1024/1024 price
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('9859')
group by event_date,item_code) e join cte ON e.event_date = cte.event_date and e.item_code = cte.item_code
)
SELECT *
FROM cte where item_code in ('Renault') ;
How do I produce an output that shows the range of each product's price over time?
If we have the data:
EVENT_DATE | ITEM_COD| PRICE
20210910 | Renaut | 2500
20210915 | Renaut | 2500
20210920 | Renaut | 2600
20211020 | Renaut | 2900
20220101 | Renaut | 2500
the expected output should be:
-------------------------------------------------
FROM_EVENT_DATE | TO_EVENT_DATE | ITEM_COD| PRICE
20210910 | 20210915 | Renaut | 2500
20210915 | 20210920 | Renaut | 2600
20210920 | 20211020 | Renaut | 2900
20211020 | 20220101 | Renaut | 2500
Thanks in Advance and Regards!
I already found the solution, using the lag and last_value window functions. There is no need for a hierarchical query.
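The final query was not posted. Here is a minimal sketch of one way to get the expected output with lag(), assuming each output row pairs a date with the previous date for the same item and carries the price of the later row (that reading matches the sample above). The inline prices CTE is only a stand-in for the real table, and the item code is spelled as in the sample data:

```sql
-- Stand-in for the real table, using the sample rows from the question.
WITH prices (event_date, item_code, price) AS (
    VALUES ('20210910', 'Renaut', 2500),
           ('20210915', 'Renaut', 2500),
           ('20210920', 'Renaut', 2600),
           ('20211020', 'Renaut', 2900),
           ('20220101', 'Renaut', 2500)
),
ranged AS (
    SELECT lag(event_date) OVER (PARTITION BY item_code
                                 ORDER BY event_date) AS from_event_date,
           event_date                                 AS to_event_date,
           item_code,
           price
    FROM prices
)
SELECT from_event_date, to_event_date, item_code, price
FROM ranged
WHERE from_event_date IS NOT NULL   -- the first row of each item has no predecessor
ORDER BY item_code, to_event_date;
```

The yyyymmdd strings sort chronologically, so ordering by event_date as text is safe here. With a real date column the same query applies unchanged.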

How can I `SUM()` in PostgreSQL based on certain condition? For summing debits and credits in accounting journal table

I have a database full of accounting journals. There is a table for the accounting journal itself (the journal's metadata) and a table for the accounting journal lines (one per account, with its debit or credit).
The data looks like this:
+----+---------------+--------+---------+
| ID | JOURNAL_NAME | DEBIT | CREDIT |
+----+---------------+--------+---------+
| | | | |
| 1 | INV/0001 | 100 | 0 |
| | | | |
| 2 | INV/0001 | 0 | 100 |
| | | | |
| 3 | INV/0002 | 200 | 0 |
| | | | |
| 4 | INV/0002 | 0 | 200 |
+----+---------------+--------+---------+
I want all journals with the same name to be collapsed into one row, with their debits and credits summed. So from the above table, I want a query that produces something like this:
+--------------+--------+---------+
| JOURNAL_NAME | DEBIT | CREDIT |
+--------------+--------+---------+
| | | |
| INV/0001 | 100 | 100 |
| | | |
| INV/0002 | 200 | 200 |
+--------------+--------+---------+
I have tried with:
SELECT DISTINCT ON (accounting_journal.id)
accounting_journal.name,
accounting_journal_line.debit,
accounting_journal_line.credit
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
ORDER BY accounting_journal.id ASC
LIMIT 3;
With the above query I get all the journals and the journal lines. I just need it to sum the debits and credits for each accounting_journal.name.
I have tried SUM(), but it always fails on the GROUP BY clause.
SELECT DISTINCT ON (accounting_journal.id)
accounting_journal.name,
accounting_journal.ref,
accounting_journal_line.name,
SUM(accounting_journal_line.debit),
SUM(accounting_journal_line.credit)
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
ORDER BY accounting_journal.id ASC
LIMIT 3;
The error:
Error in query (7): ERROR: column "accounting_journal.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: accounting_journal.name,
I hope I can get some assistance or a pointer on where to look. Thanks!
When you use an aggregate function together with ordinary columns, you have to list all the non-aggregated columns in the GROUP BY clause.
So try this:
-- DISTINCT ON (accounting_journal.id) and ORDER BY accounting_journal.id are dropped:
-- id is not in the GROUP BY list, so it cannot be referenced after grouping.
SELECT accounting_journal.name,
accounting_journal.ref,
accounting_journal_line.name,
SUM(accounting_journal_line.debit),
SUM(accounting_journal_line.credit)
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
group by 1,2,3
ORDER BY 1
LIMIT 3;
Your query has three non-aggregated columns, so you can refer to them by column position in the GROUP BY clause.
You can also use sum() as a window function, which does not require GROUP BY:
select aj.id journal_id,
aj.name journal_name,
aj.ref journal_ref,
ajl.name line_name,
sum(ajl.debit) over(partition by aj.id) total_debit,
sum(ajl.credit) over(partition by aj.id) total_credit
from accounting_journal_line ajl
join accounting_journal aj
on aj.id = ajl.move_id
order by aj.id;
See fiddle for a working example.
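For the exact output asked for in the question (one row per journal name), a plain GROUP BY on the name alone is enough. A minimal sketch, using an inline journal_lines CTE as a stand-in for the joined accounting_journal and accounting_journal_line tables (the CTE name and its flattened shape are assumptions made for illustration):

```sql
-- Stand-in for the joined tables, using the sample rows from the question.
WITH journal_lines (id, journal_name, debit, credit) AS (
    VALUES (1, 'INV/0001', 100,   0),
           (2, 'INV/0001',   0, 100),
           (3, 'INV/0002', 200,   0),
           (4, 'INV/0002',   0, 200)
)
SELECT journal_name,
       sum(debit)  AS debit,
       sum(credit) AS credit
FROM journal_lines
GROUP BY journal_name      -- only the name is grouped, so lines collapse per journal
ORDER BY journal_name;
```

Selecting per-line columns such as the line name would split each journal back into several groups, which is why they are left out here.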

How to query just the last record of every second within a period of time in postgres

I have a 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in standard format like '2017-05-01 00:00:00.585'.
I can quite easily select a period using
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't work out is how to select the price of the last record in each second. (I also need the very first one of each second, but I guess that will be a similar separate query.) There are some similar examples (here), but they produced errors when I tried to adapt them to my needs.
Could someone please help me crack this nut?
Let's say there is a table that was generated with this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second As my_timestamp
from generate_series(0,100) x
This table contains an increasing series of timestamps, each timestamp differs by 100 milliseconds (0.1 second) from neighbors, so that there are 10 records within each second.
| my_timestamp |
|------------------------|
| 2017-09-16T20:00:00Z |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The below query determines and prints the first and the last timestamp within each second:
SELECT my_timestamp,
CASE
WHEN rn1 = 1 THEN 'First'
WHEN rn2 = 1 THEN 'Last'
ELSE 'Somewhere in the middle'
END as Which_row_within_a_second
FROM (
select *,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp
) rn1,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp DESC
) rn2
from test
) xx
WHERE 1 IN (rn1, rn2 )
ORDER BY my_timestamp
;
| my_timestamp | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z | First |
| 2017-09-16T20:00:00.9Z | Last |
| 2017-09-16T20:00:01Z | First |
| 2017-09-16T20:00:01.9Z | Last |
| 2017-09-16T20:00:02Z | First |
| 2017-09-16T20:00:02.9Z | Last |
| 2017-09-16T20:00:03Z | First |
| 2017-09-16T20:00:03.9Z | Last |
| 2017-09-16T20:00:04Z | First |
| 2017-09-16T20:00:04.9Z | Last |
| 2017-09-16T20:00:05Z | First |
| 2017-09-16T20:00:05.9Z | Last |
You can find a working demo here
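If only the last row of each second is needed, PostgreSQL's DISTINCT ON is a shorter alternative to the row_number() approach. A minimal sketch against the prices table from the question; the inline CTE and its four rows are made-up sample data so the query runs on its own:

```sql
-- Made-up sample rows standing in for the real prices table.
WITH prices (uid, price, unit, dt) AS (
    VALUES (1, 10.0, 'USD', timestamp '2017-05-01 00:00:00.100'),
           (2, 10.5, 'USD', timestamp '2017-05-01 00:00:00.585'),
           (3, 11.0, 'USD', timestamp '2017-05-01 00:00:01.200'),
           (4, 11.5, 'USD', timestamp '2017-05-01 00:00:01.900')
)
SELECT DISTINCT ON (date_trunc('second', dt))
       uid, price, unit, dt
FROM prices
WHERE dt >= timestamp '2017-05-01 00:00:00'
  AND dt <  timestamp '2017-05-01 03:00:00'
ORDER BY date_trunc('second', dt),  -- one output row per second
         dt DESC;                   -- keep the latest row within that second
```

Changing dt DESC to dt ASC keeps the first row of each second instead. On a table with hundreds of millions of rows, how well this performs will depend on the indexes available on dt.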

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max-values are grouped by month and day (with group by clause, which works btw), but somehow I got stuck on joining the timestamps for max-values.
EDIT
Cross joins are a good idea, but I want to have them grouped by month e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join with two sub queries, the first one selects all records, the second one gets one row that represents the time_stamp of the max value, <3;"1000-01-01"> for example.
SELECT col_value,col_timestamp,max_col_value, col_timestamp_of_max_value FROM table1
cross join
(
select max(col_value) max_col_value ,col_timestamp col_timestamp_of_max_value from table1
group by col_timestamp
order by max_col_value desc
limit 1
) A --One row that represents the time_stamp of the max value, ie: <3;"1000-01-01">
Use a window function, since you are on PostgreSQL:
Select *, max( value ) over (), max( timestamp ) over() from table
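That gives you the maximum of each column on every row. Note that max( timestamp ) is the latest timestamp overall, which is not necessarily the timestamp of the row holding the maximum value.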
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
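To get the timestamp that belongs to the maximum value, grouped by month as in the EDIT, first_value() over a window ordered by value is one option. A minimal sketch; the readings CTE and the column name ts are assumptions, and the rows are the d.m.yyyy sample values from the EDIT rewritten as dates:

```sql
-- Sample rows from the EDIT in the question, as (value, date) pairs.
WITH readings (value, ts) AS (
    VALUES (1, date '1101-01-01'),
           (2, date '1021-06-02'),
           (3, date '1200-01-01'),
           (4, date '1040-01-01'),
           (5, date '1000-06-05'),
           (6, date '1300-01-01')
)
SELECT value,
       ts,
       -- largest value within the same calendar month
       max(value) OVER (PARTITION BY extract(month FROM ts)) AS max_value,
       -- timestamp of the row that holds that largest value
       first_value(ts) OVER (PARTITION BY extract(month FROM ts)
                             ORDER BY value DESC)           AS ts_of_max_value
FROM readings
ORDER BY value;
```

For the January rows this should report 6 and 1300-01-01, and for the June rows 5 and 1000-06-05, matching the table in the EDIT. If two rows tie for the maximum, which timestamp is returned is not determined unless a tie-breaker is added to the ORDER BY.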

How to list the train operators that use the second oldest trains (PostgreSQL)

train_operators:
| train_operator_id | name |
------------------------------
| 1 | Virgin |
| 2 | First |
journeys:
| journey_id | train_operator | train_type |
--------------------------------------------
| 1 | 2 | 2 |
| 2 | 2 | 1 |
| 3 | 1 | 3 |
| 4 | 1 | 2 |
train_types:
| train_type_id | date_made |
------------------------------
| 1 | 1999-02-15 |
| 2 | 2001-03-11 |
| 3 | 2000-12-05 |
How would you write a query to find all the train operators that use the second oldest type of train?
With the given schema the query should result with just Virgin since it is the only train operator that uses the second oldest train type
Try this:
select distinct train_operator from journeys
inner join (Select * from train_types order by date_made LIMIT 1 OFFSET 1) sectrain
on sectrain.train_type_id = journeys.train_type
You're into the UK Rail Network are you? I used to work for Funkwerk IT, who in turn used to provide the timetable planning software for Network Rail...
It can be pretty easy using the power of window functions in pg
SELECT DISTINCT train_operator_id,
name
FROM (SELECT t.train_operator_id,
t.name,
Dense_rank() OVER (ORDER BY tt.date_made) AS rank -- dense_rank: with Rank(), two journeys on the oldest type would leave no row ranked 2
FROM train_operators AS t
JOIN journeys AS j
ON j.train_operator = t.train_operator_id
JOIN train_types AS tt
ON tt.train_type_id = j.train_type) AS q
WHERE rank = 2;
http://sqlfiddle.com/#!12/98816/8
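-- alias renamed from "to" (a reserved word in PostgreSQL); distinct avoids one row per journey
select distinct o.name
from
train_operators o
inner join
journeys j on o.train_operator_id = j.train_operator
where
j.train_type = (
select train_type_id
from train_types
order by date_made
limit 1 offset 1
)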
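A self-contained sketch that ranks the train types themselves (rather than the joined journey rows) and runs against the sample data from the question. The three leading CTEs only stand in for the real tables; ranked_types and age_rank are names made up for this example:

```sql
WITH train_operators (train_operator_id, name) AS (
    VALUES (1, 'Virgin'), (2, 'First')
),
journeys (journey_id, train_operator, train_type) AS (
    VALUES (1, 2, 2), (2, 2, 1), (3, 1, 3), (4, 1, 2)
),
train_types (train_type_id, date_made) AS (
    VALUES (1, date '1999-02-15'),
           (2, date '2001-03-11'),
           (3, date '2000-12-05')
),
ranked_types AS (
    -- rank each type once by age; dense_rank keeps rank 2 even if dates tie
    SELECT train_type_id,
           dense_rank() OVER (ORDER BY date_made) AS age_rank
    FROM train_types
)
SELECT DISTINCT o.name
FROM train_operators AS o
JOIN journeys        AS j ON j.train_operator = o.train_operator_id
JOIN ranked_types    AS r ON r.train_type_id  = j.train_type
WHERE r.age_rank = 2;
```

With the sample data the second oldest type is 3 (made 2000-12-05), which only Virgin uses, so the query should return just Virgin.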