how to order output of arrays when using union - postgresql

I have a query like this:
SELECT array_agg(candles) as candles FROM ( SELECT * FROM ... ) AS candles
UNION ALL
SELECT array_agg(trades) as trades FROM ( SELECT * FROM ... ) AS trades
UNION ALL
SELECT ...
This returns rows that contain arrays, but the order of the rows doesn't necessarily match the order of the subqueries.
For example, it is possible that the output will have the trades row before the candles row.
How can I get the rows in a predictable order?
Edit:
updated the query based on the answer but getting an error:
SELECT a FROM
(
SELECT 1 as o, array_agg(candles) as a
FROM (
SELECT ts, open, high, low, close, midpoint, volume
FROM exchange.binance.candles
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS candles
UNION ALL
SELECT 2 as o, array_agg(trades)
FROM (
SELECT ts, price, quantity, direction
FROM exchange.binance.trades
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS trades
UNION ALL
SELECT 3 as o, array_agg(kvwap)
FROM (
SELECT ts, price, "interval"
FROM exchange.binance.kvwap
WHERE instrument = 'BTCUSDT' AND "interval" IN ('M5', 'H1', 'H4') AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS kvwap
)
ORDER BY o;
the error is:
[42601] ERROR: subquery in FROM must have an alias Hint: For example, FROM (SELECT ...) [AS] foo. Position: 15

Add a column for ordering to each subquery, but don't include it in the output:
SELECT a FROM (
SELECT 1 as o, array_agg(candles) as a FROM ( SELECT * FROM ... ) c group by 1
UNION ALL
SELECT 2, array_agg(trades) FROM ( SELECT * FROM ... ) t group by 1
UNION ALL
SELECT ...
) x
ORDER BY o
Note that with UNION only the first subquery's column names are relevant - the entire union uses column names from the first subquery - so don't bother providing aliases for the others.
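The sort-key idea above can be tried out without a PostgreSQL server. The following is a minimal sketch using Python's built-in sqlite3 module; the tables, their columns, and the count(*) aggregate (standing in for array_agg, which SQLite lacks) are assumptions made only for illustration. The branches are deliberately listed in the "wrong" order to show that the hidden column o, not the position in the query, decides the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the candles and trades tables.
conn.executescript("""
    CREATE TABLE candles (ts TEXT, close REAL);
    CREATE TABLE trades  (ts TEXT, price REAL);
    INSERT INTO candles VALUES ('00:00', 10.0), ('00:01', 11.0);
    INSERT INTO trades  VALUES ('00:00', 10.5), ('00:01', 10.7), ('00:02', 10.9);
""")

rows = conn.execute("""
    SELECT a FROM (
        -- o is the sort key; it is not part of the outer SELECT list
        SELECT 2 AS o, 'trades:' || count(*) AS a FROM trades
        UNION ALL
        SELECT 1, 'candles:' || count(*) FROM candles
    ) x            -- the derived table needs an alias
    ORDER BY o
""").fetchall()

result = [r[0] for r in rows]
print(result)
```

The candles row comes first even though its branch is written second, because the outer ORDER BY sorts on o.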

Related

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
The timelines for this product are OK. Over time, the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 for z/OS. How can I check whether there are gaps in the timeline of each product?
I tried this, but it doesn't work:
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?
The following query returns all gaps for all products.
The idea is to enumerate (RN column) all periods inside each product by START_DATE and join each record with its next period record.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROW_NUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 DAY AS START_DT, B.START_DATE - 1 DAY AS END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 DAY <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
A possibly more efficient way, which avoids the self-join by using LAG:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 DAY AS START_DATE, START_DATE - 1 DAY AS END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 DAY <> START_DATE
AND END_DATE_PREV < START_DATE;
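For readers without access to Db2, here is a minimal sketch of the same LAG approach run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It uses the sample rows from the commented-out MYTAB block above; the lower-case table and column names and the use of SQLite's date() function in place of Db2 date arithmetic are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytab (product TEXT, description TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?, ?)",
    [
        ("A", "ProdA1", "2020-01-01", "2020-01-04"),
        ("A", "ProdA2", "2020-01-05", "2020-02-12"),
        ("A", "ProdA3", "2020-02-13", "2020-12-31"),
        ("B", "ProdB1", "2020-01-05", "2020-01-09"),
        ("B", "ProdB2", "2020-01-12", "2020-03-14"),
        ("B", "ProdB3", "2020-03-15", "2020-04-18"),
        ("B", "ProdB4", "2020-04-16", "2020-05-03"),
    ],
)

gaps = conn.execute("""
    WITH mytab2 AS (
        SELECT t.*,
               -- end date of the previous period of the same product
               LAG(end_date) OVER (PARTITION BY product ORDER BY start_date) AS end_date_prev
        FROM mytab t
    )
    SELECT product,
           date(end_date_prev, '+1 day') AS gap_start,
           date(start_date, '-1 day')    AS gap_end
    FROM mytab2
    WHERE date(end_date_prev, '+1 day') <> start_date  -- not contiguous
      AND end_date_prev < start_date                   -- a gap, not an overlap
    ORDER BY product, gap_start
""").fetchall()

print(gaps)
```

The first period of each product has no predecessor, so end_date_prev is NULL there and the row drops out of the WHERE clause. The overlapping ProdB3/ProdB4 pair is excluded by the second condition.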
Thanks Mark, I will try this one of these days.
I had never heard of LAG in DB2 V12 for z/OS.
I will read up on it.
Thanks

Selecting the 1st and 10th Records Only

I have a table with 3 columns: ID, Signature, and Datetime. The query below groups it by Signature and keeps the signatures having Count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
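To check the two-CTE approach on known data, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). The table name sigs, the column name dt, and the generated sample rows are assumptions for illustration; signature 'a' has 12 rows and 'b' has only 5, so only 'a' should survive the count filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sigs (id INTEGER PRIMARY KEY, signature TEXT, dt TEXT)")

# Hypothetical data: 12 rows for 'a' (kept), 5 rows for 'b' (filtered out).
sample = [("a", f"2024-01-{day:02d}") for day in range(1, 13)]
sample += [("b", f"2024-02-{day:02d}") for day in range(1, 6)]
conn.executemany("INSERT INTO sigs (signature, dt) VALUES (?, ?)", sample)

picked = conn.execute("""
    WITH cte1 AS (
        SELECT id, signature, dt,
               COUNT(*) OVER (PARTITION BY signature) AS number_of_rows
        FROM sigs
    ), cte2 AS (
        SELECT id, signature, dt,
               ROW_NUMBER() OVER (PARTITION BY signature ORDER BY dt DESC) AS rn
        FROM cte1
        WHERE number_of_rows > 9
    )
    SELECT signature, dt, rn
    FROM cte2
    WHERE rn IN (1, 10)
    ORDER BY signature DESC, rn
""").fetchall()

print(picked)
```

Row 1 is the newest entry for the signature and row 10 is the tenth newest, since the row numbers are assigned in descending date order.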
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) r -- a derived table needs an alias, and may not contain ORDER BY without TOP
WHERE [Row] IN (1,10)
ORDER BY Signature desc, [Row]

Select not null column in full join postgresql

I have 3 tables:
with current_exclusive as(
select id_station, area_type,
count(*) as total_entries
from c1169.data_cashier
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date <= '2017-12-30'
group by id_station, area_type
), current_table as(
select id_station, area_type,
sum(total_time) filter (where previous_status = 1) as total_time
from c1169.data_table
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date < '2017-12-30'
group by id_station, area_type
), current_cashier as(
select id_station, area_type,
sum(1) as total_transactions
from c1169.data_cashier
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date < '2017-12-30'
group by id_station, area_type
)
select *
from current_exclusive
full join current_table on current_exclusive.id_station = current_table.id_station and current_exclusive.area_type = current_table.area_type
full join current_cashier on current_exclusive.id_station = current_cashier.id_station and current_exclusive.area_type = current_cashier.area_type
and the result is:
but my expected result is:
Is there any way to use select * and still get the expected result? With a full join, id_station and area_type can be null in some of the tables, so it is very hard to choose which column is not null.
I could write something like select case when id_station is not null then id_station else id_station1 end, but I have up to 10 tables, so doing this with case expressions is not practical.
Use USING, per the documentation:
USING ( join_column [, ...] )
A clause of the form USING ( a, b, ... ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b .... Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both.
select *
from current_exclusive
full join current_table using (id_station, area_type)
full join current_cashier using (id_station, area_type)
With plain ON joins, select * cannot give you a single non-null key column, since each table contributes its own copy of the join columns.
The option you have is to include a COALESCE block which gives you the first non-null value from the list of columns.
So, you could use.
select COALESCE( current_exclusive.id_station, current_table.id_station, current_cashier.id_station ) as id_station ,
COALESCE( current_exclusive.area_type , current_table.area_type, current_cashier.area_type ) as area_type ,.....
...
from current_exclusive
full join current_table..
...

Need to retrieve n-rows that are not at the beginning or in the end of the selected list

I have written this SQL statement:
select * from (
select count(*) as NumberofSignals,signals.transmitter_account,signals.class,signals.type,signals.signal_mode,
signals.area_id,signals.sector_id,signals.region_info_id,signals.zone_info_id,signals.user_id,signals.device_id,
signals.panel_name,signals.panel_id,signals.sector_name,signals.region_code,signals.area_name,signals.zone_code,
signals.description,signals.transmitter_name,signals.transmitter_id,signals.color,'event' as Event,get_name(signals.id,'event') as event_value,
'packetnumber' as packetnumber,get_name(signals.id,'packetnumber') as packetnumber_value,wm_concat(distinct get_name(signals.id,'repeater')) as repeater,
round(avg(get_name(signals.id,'signallevel'))) as avg_signallevel,min(to_char(signals.signal_forming_time, 'yyyy/mm/dd hh24:mi:ss')) as formingtime,
get_name(signals.id,'address') as address,get_name(signals.id,'username') as username,get_name(signals.id,'chaneltype') as channeltype,
get_name(signals.id,'code') as code,get_name(signals.id,'account') as account
from signals,signal_custom_fields where signals.id = signal_custom_fields.signal_id and
signals.id in (select id from (select id,rownum num from((select signals.id
from signals,signal_custom_fields where signal_custom_fields.field_name = 'event'
and signal_custom_fields.field_value is not null and signals.id = signal_custom_fields.signal_id
and signals.signal_forming_time >= to_date('2011/5/10 14:34:44', 'yyyy/mm/dd hh24:mi:ss')
AND signals.signal_forming_time <= to_date('2011/5/10 15:34:44', 'yyyy/mm/dd hh24:mi:ss'))
intersect (select distinct signals.id from signals,signal_custom_fields
where signal_custom_fields.field_name = 'packetnumber' and signal_custom_fields.field_value is not null
and signals.id = signal_custom_fields.signal_id
and signals.signal_forming_time >= to_date('2011/5/10 14:34:44', 'yyyy/mm/dd hh24:mi:ss')
AND signals.signal_forming_time <= to_date('2011/5/10 15:34:44', 'yyyy/mm/dd hh24:mi:ss')))
order by id desc)) group by 'event',signals.transmitter_account,signals.class,
signals.type,signals.signal_mode,signals.area_id,signals.sector_id,signals.region_info_id,signals.zone_info_id,
signals.user_id,signals.device_id,signals.panel_name,signals.panel_id,signals.sector_name,signals.region_code,
signals.area_name,signals.zone_code,signals.description,signals.transmitter_name,signals.transmitter_id,
signals.color, get_name(signals.id,'event'), 'packetnumber',get_name(signals.id,'username'),
get_name(signals.id,'chaneltype'),
get_name(signals.id,'code'),
get_name(signals.id,'account'), get_name(signals.id,'packetnumber'),get_name(signals.id,'address'),
TO_CHAR(signals.signal_forming_time ,'dd/mm/yyyy hh24'),
TRUNC(to_number(to_char(signals.signal_forming_time ,'mi'))/(30))
order by event)where rownum < 300
Here I get the first 300 rows, but how do I need to rewrite this statement to retrieve the second 300 rows?
Your query doesn't have the rownum listed in the first nested table. Add a rownum column in the first nested table then you can do a between function in the where clause at the top level:
--create a demo table
DROP TABLE paging_test;
CREATE TABLE paging_test AS
(SELECT rownum x FROM user_tables
);
--count how many records exist (in my case there are 821)
SELECT COUNT(*)
FROM paging_test;
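--get the first 300 rows
--(rownum is assigned before ORDER BY at the same query level,
-- so sort in an inner query and number the rows outside it)
SELECT *
FROM
(SELECT rownum rn, x FROM (SELECT x FROM paging_test ORDER BY x)
) pt
WHERE pt.rn BETWEEN 1 AND 300 ;
--get the next 300 rows (301 to 600, so row 300 is not returned twice)
SELECT *
FROM
(SELECT rownum rn, x FROM (SELECT x FROM paging_test ORDER BY x)
) pt
WHERE pt.rn BETWEEN 301 AND 600 ;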
You might also be interested in this reference:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:948366252775
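The page boundaries are easy to get wrong by one, so here is a minimal sketch of the arithmetic, run through Python's built-in sqlite3 module. SQLite has no rownum, so this sketch swaps in ROW_NUMBER() OVER (ORDER BY x), an analytic function that Oracle also provides. The helper fetch_page, its parameters, and the 821 generated rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paging_test (x INTEGER)")
# 821 rows, mirroring the row count mentioned in the demo above.
conn.executemany("INSERT INTO paging_test VALUES (?)", [(n,) for n in range(1, 822)])


def fetch_page(page, page_size=300):
    """Return the values of x on the given 1-based page (hypothetical helper)."""
    first = (page - 1) * page_size + 1   # page 2 starts at row 301, not 300
    last = page * page_size
    cursor = conn.execute(
        """
        SELECT x FROM (
            SELECT x, ROW_NUMBER() OVER (ORDER BY x) AS rn
            FROM paging_test
        ) pt
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
        """,
        (first, last),
    )
    return [row[0] for row in cursor]


second_page = fetch_page(2)
third_page = fetch_page(3)
print(second_page[0], second_page[-1], len(second_page), len(third_page))
```

Because the row number is computed from an explicit ORDER BY, consecutive pages neither overlap nor skip rows, and the last page is simply shorter.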

Dealing with periods and dates without using cursors

I would like to solve this problem without using cursors (FETCH).
Here comes the problem...
1st Table/quantity
------------------
periodid periodstart periodend quantity
1 2010/10/01 2010/10/15 5
2nd Table/sold items
-----------------------
periodid periodstart periodend solditems
14343 2010/10/05 2010/10/06 2
Now I would like to get the following view or just query result
Result Table/stock
-----------------------
periodstart periodend itemsinstock
2010/10/01 2010/10/04 5
2010/10/05 2010/10/06 3
2010/10/07 2010/10/15 5
It seems impossible to solve this problem without using cursors, or without using single dates instead of periods.
I would appreciate any help.
Thanks
DECLARE @t1 TABLE (periodid INT,periodstart DATE,periodend DATE,quantity INT)
DECLARE @t2 TABLE (periodid INT,periodstart DATE,periodend DATE,solditems INT)
INSERT INTO @t1 VALUES(1,'2010-10-01T00:00:00.000','2010-10-15T00:00:00.000',5)
INSERT INTO @t2 VALUES(14343,'2010-10-05T00:00:00.000','2010-10-06T00:00:00.000',2)
DECLARE @D1 DATE
SELECT @D1 = MIN(P) FROM (SELECT MIN(periodstart) P FROM @t1
UNION ALL
SELECT MIN(periodstart) FROM @t2) D
DECLARE @D2 DATE
SELECT @D2 = MAX(P) FROM (SELECT MAX(periodend) P FROM @t1
UNION ALL
SELECT MAX(periodend) FROM @t2) D
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i FROM L4),
Dates AS(SELECT DATEADD(DAY,i-1,@D1) AS D FROM Nums where i <= 1+DATEDIFF(DAY,@D1,@D2)) ,
Stock As (
SELECT D ,t1.quantity - ISNULL(t2.solditems,0) AS itemsinstock
FROM Dates
LEFT OUTER JOIN @t1 t1 ON t1.periodend >= D and t1.periodstart <= D
LEFT OUTER JOIN @t2 t2 ON t2.periodend >= D and t2.periodstart <= D ),
NStock As (
select D,itemsinstock, ROW_NUMBER() over (order by D) - ROW_NUMBER() over (partition by itemsinstock order by D) AS G
from Stock)
SELECT MIN(D) AS periodstart, MAX(D) AS periodend, itemsinstock
FROM NStock
GROUP BY G, itemsinstock
ORDER BY periodstart
Hopefully a little easier to read than Martin's. I used different tables and sample data, hopefully extrapolating the right info:
CREATE TABLE [dbo].[Quantity](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[Quantity] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO Quantity (PeriodStart,PeriodEnd,Quantity)
SELECT '20100101','20100115',5
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100107',2 union all
SELECT '20100106','20100108',1
The actual query is now:
;WITH Dates as (
select PeriodStart as DateVal from SoldItems union select PeriodEnd from SoldItems union select PeriodStart from Quantity union select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1 inner join Dates d2 on d1.DateVal < d2.DateVal left join Dates d3 on d1.DateVal < d3.DateVal and d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,COALESCE(SUM(si.SoldItems),0) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate,q.Quantity - qs.Quantity
from QuantitiesSold qs inner join Quantity q on qs.StartDate < q.PeriodEnd and q.PeriodStart < qs.EndDate
And the result is:
StartDate EndDate (No column name)
2010-01-01 2010-01-05 5
2010-01-05 2010-01-06 3
2010-01-06 2010-01-07 2
2010-01-07 2010-01-08 4
2010-01-08 2010-01-15 5
Explanation: I'm using three Common Table Expressions. The first (Dates) is gathering all of the dates that we're talking about, from the two tables involved. The second (Periods) selects consecutive values from the Dates CTE. And the third (QuantitiesSold) then finds items in the SoldItems table that overlap these periods, and adds their totals together. All that remains in the outer select is to subtract these quantities from the total quantity stored in the Quantity Table
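The three CTEs described above can be exercised outside SQL Server. This is a minimal sketch that ports the query to Python's built-in sqlite3 module with the same sample data; the snake_case table and column names, the ISO date strings (which compare correctly as text), the in_stock alias, and the final ORDER BY are assumptions of this sketch rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quantity   (period_start TEXT, period_end TEXT, quantity INTEGER);
    CREATE TABLE sold_items (period_start TEXT, period_end TEXT, sold_items INTEGER);
    INSERT INTO quantity   VALUES ('2010-01-01', '2010-01-15', 5);
    INSERT INTO sold_items VALUES ('2010-01-05', '2010-01-07', 2),
                                  ('2010-01-06', '2010-01-08', 1);
""")

stock = conn.execute("""
    WITH dates AS (
        -- every boundary date that appears in either table
        SELECT period_start AS d FROM sold_items
        UNION SELECT period_end FROM sold_items
        UNION SELECT period_start FROM quantity
        UNION SELECT period_end FROM quantity
    ), periods AS (
        -- pairs of consecutive boundaries: no third date lies strictly between them
        SELECT d1.d AS start_date, d2.d AS end_date
        FROM dates d1
        JOIN dates d2 ON d1.d < d2.d
        LEFT JOIN dates d3 ON d1.d < d3.d AND d3.d < d2.d
        WHERE d3.d IS NULL
    ), quantities_sold AS (
        -- total sold in each period, 0 when nothing overlaps it
        SELECT p.start_date, p.end_date, COALESCE(SUM(si.sold_items), 0) AS sold
        FROM periods p
        LEFT JOIN sold_items si
               ON p.start_date < si.period_end AND si.period_start < p.end_date
        GROUP BY p.start_date, p.end_date
    )
    SELECT qs.start_date, qs.end_date, q.quantity - qs.sold AS in_stock
    FROM quantities_sold qs
    JOIN quantity q ON qs.start_date < q.period_end AND q.period_start < qs.end_date
    ORDER BY qs.start_date
""").fetchall()

for row in stock:
    print(row)
```

The rows match the result table shown above: the stock drops to 3, then 2, while the two sales overlap, and returns to 5 once both have ended.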
John, what you could do is a WHILE loop. Declare and initialise 2 variables before your loop, one being the start date and the other being end date. Your loop would then look like this:
WHILE (@StartDate <= @EndDate)
BEGIN
--processing goes here
SET @StartDate = DATEADD(DAY, 1, @StartDate)
END
You would need to store your period definitions in another table, so you could retrieve those and output rows when required to a temporary table.
Let me know if you need any more detailed examples, or if I've got the wrong end of the stick!
Damien,
I am trying to fully understand your solution and test it on a large amount of data, but I receive the following errors for your code.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near 'Dates'.
Msg 102, Level 15, State 1, Line 22
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 25
Incorrect syntax near ','.
Damien,
Based on your solution I also wanted to get a neat display for StockItems without overlapping dates. How about this solution?
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [datetime] NOT NULL,
[PeriodEnd] [datetime] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100106',2 union all
SELECT '20100105','20100108',3 union all
SELECT '20100115','20100116',1 union all
SELECT '20100101','20100120',10
;WITH Dates as (
select PeriodStart as DateVal from SoldItems
union
select PeriodEnd from SoldItems
union
select PeriodStart from Quantity
union
select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1
inner join Dates d2 on d1.DateVal < d2.DateVal
left join Dates d3 on d1.DateVal < d3.DateVal and
d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,SUM(si.SoldItems) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate, qs.Quantity
from QuantitiesSold qs
where qs.quantity is not null