Select not null column in full join postgresql - postgresql

I have 3 tables:
with current_exclusive as(
select id_station, area_type,
count(*) as total_entries
from c1169.data_cashier
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date <= '2017-12-30'
group by id_station, area_type
), current_table as(
select id_station, area_type,
sum(total_time) filter (where previous_status = 1) as total_time
from c1169.data_table
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date < '2017-12-30'
group by id_station, area_type
), current_cashier as(
select id_station, area_type,
sum(1) as total_transactions
from c1169.data_cashier
where id_station IN(2439,2441,2443,2445,2447,2449) and date >= '2017-10-30' and date < '2017-12-30'
group by id_station, area_type
)
select *
from current_exclusive
full join current_table on current_exclusive.id_station = current_table.id_station and current_exclusive.area_type = current_table.area_type
full join current_cashier on current_exclusive.id_station = current_cashier.id_station and current_exclusive.area_type = current_cashier.area_type
and the result is:
but my expected result is:
Are there any way to select * and show the expected result? Because when I do full join then id_station and area_type can be null in some tables, so it very hard to choose which column is not null.
Like: select case id_station is not null then id_station else id_station1 end, but I have up to 10 tables so can not do in select case

Use USING, per the documentation:
USING ( join_column [, ...] )
A clause of the form USING ( a, b, ... ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b .... Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both.
select *
from current_exclusive
full join current_table using (id_station, area_type)
full join current_cashier using (id_station, area_type)

You cannot accomplish anything if you insist on using select *, since you are getting the values from different tables.
The option you have is to include a COALESCE block which gives you the first non-null value from the list of columns.
So, you could use.
select COALESCE( current_exclusive.id_station, current_table.id_station, current_cashier.id_station ) as id_station ,
COALESCE( current_exclusive.area_type , current_table.area_type, current_cashier.area_type ) as area_type ,.....
...
from current_exclusive
full join current_table..
...

Related

how to order output of arrays when using union

I have a query like this:
SELECT array_agg(candles) as candles FROM ( SELECT * FROM ... ) AS candles
UNION ALL
SELECT array_agg(trades) as trades FROM ( SELECT * FROM ... ) AS trades
UNION ALL
SELECT ...
But then I'll get rows that contain arrays, but the order of the rows doesn't necessarily match the query order.
For example, it is possible that the output will have the trades row before the candles row.
How can I get the rows in a predictable order?
Edit:
updated the query based on the answer but getting an error:
SELECT a FROM
(
SELECT 1 as o, array_agg(candles) as a
FROM (
SELECT ts, open, high, low, close, midpoint, volume
FROM exchange.binance.candles
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS candles
UNION ALL
SELECT 2 as o, array_agg(trades)
FROM (
SELECT ts, price, quantity, direction
FROM exchange.binance.trades
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS trades
UNION ALL
SELECT 3 as o, array_agg(kvwap)
FROM (
SELECT ts, price, "interval"
FROM exchange.binance.kvwap
WHERE instrument = 'BTCUSDT' AND "interval" IN ('M5', 'H1', 'H4') AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS kvwap
)
ORDER BY o;
the error is:
[42601] ERROR: subquery in FROM must have an alias Hint: For example, FROM (SELECT ...) [AS] foo. Position: 15
Add a column for ordering to each subquery, but don't include it in the output:
SELECT a FROM (
SELECT 1 as o, array_agg(candles) as a FROM ( SELECT * FROM ... ) c group by 1
UNION ALL
SELECT 2, array_agg(trades) FROM ( SELECT * FROM ... ) t group by 1
UNION ALL
SELECT ...
) x
ORDER BY o
Note that with UNION only the first subquery's column names are relevant - the entire union uses column names from the first subquery - so don't bother providing aliases for the others.

SQL Server - Select with Group By together Raw_Number

I'm using SQL Server 2000 (80). So, it's not possible to use the LAG function.
I have a code a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. So every sale_id is a 100% sequential, so the should not be any gaps in order.
This code works for a specific date and store if specified. But i need to work on entire data set looping looping through every facility_id and every seller_id for ever purchase_date
declare #MAXCOUNT int
set #MAXCOUNT =
(
select MAX(Sale_Id)
from #table
where
Facility_no in (124) and
Purchase_date = '2/7/2020'
and Seller_id = 1
)
;WITH TRX_COUNT AS
(
SELECT 1 AS Number
union all
select Number + 1 from TRX_COUNT
where Number < #MAXCOUNT
)
select * from TRX_COUNT
where
Number NOT IN
(
select Sale_Id
from #table
where
Facility_no in (124)
and Purchase_Date = '2/7/2020'
and seller_id = 1
)
order by Number
OPTION (maxrecursion 0)
My Dataset
This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id)
then 'OK' else 'Previous Missing' end
will tell you which Seller_Ids have some sale missing. If you want to go a step further and have exactly your desired output, then filter out and distinct the 'Previous Missing' ones, and join with a tally table on not exists.
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that that has the max(sale_id) group by facility/seller_id
Then you can get your missing results by this pseudocode query:
Select ...
from temptable t
inner join tally N on t.maxsale <=N.num
where not exists( select ... from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
> because the only way to "construct" nonexisting combinations is to construct them all and just remove the existing ones.
This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Purchase_Date) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
, [Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC

How to workaround unsupported percentile_cont in Postgres/Citus?

I have a query similar to this:
select
coalesce(s.import_date, r.import_date) as import_date,
coalesce(s.bedrooms, r.bedrooms) as bedrooms,
coalesce(s.ptype, r.ptype) as property_type,
s.s_price,
s.s_transactions,
....
r.r_rent,
....
from
(
select
sc.import_date,
sc.bedrooms,
sc.ptype,
percentile_cont(array[0.25,0.5,0.75,0.9]) within group (order by sc.asking_price) filter(where sc.price > 0) as s_price,
sum(1) filter(where sc.sold_price > 0) as s_transactions,
......
from prices sc
where sc.ptype = 'F' and sc.bedrooms = 2 and st_Intersects('010300002.....'::geometry,sc.geom)
and sc.import_date between '2012-01-01' and '2019-01-01'
group by sc.import_date, sc.bedrooms, sc.property_type
) s
full join
(
select
rc.import_date,
rc.bedrooms,
rc.ptype,
percentile_cont(array[0.25,0.5,0.75,0.9]) within group (order by rc.rent) filter(where rc.rent > 0) as r_rent,
.....
from rents rc
where rc.ptype = 'F' and rc.bedrooms = 2 and st_Intersects('010300002....'::geometry,rc.geom)
and rc.import_date between '2012-01-01' and '2019-01-01'
group by rc.import_date, rc.bedrooms, rc.property_type
) r
on r.import_date = s.import_date;
When I run it against my distributed tables on Citus/Postgres-11 I get:
ERROR: unsupported aggregate function percentile_cont
Is there any way to workaround this limitation?
AFAIK there is no easy workaround for this.
You can always pull all the data to the coordinator and calculate percentiles there. It is not advisable to do this in the same query though.
SELECT percentile_cont(array[0.25,0.5,0.75,0.9]) within group (order by r.order_col)
FROM
(
SELECT order_col, ...
FROM rents
WHERE ...
) r
GROUP BY ...
This query will pull all the data returned by the inner subquery to the coordinator and calculate the percentiles in the coordinator.

How to translate SQL to DAX, Need to add FILTER

I want to create calculated table that will summarize In_Force Premium from existing table fact_Premium.
How can I filter the result by saying:
TODAY() has to be between `fact_Premium[EffectiveDate]` and (SELECT TOP 1 fact_Premium[ExpirationDate] ORDE BY QuoteID DESC)
In SQL I'd do that like this:
`WHERE CONVERT(date, getdate()) between CONVERT(date, tblQuotes.EffectiveDate)
and (
select top 1 q2.ExpirationDate
from Table2 Q2
where q2.ControlNo = Table1.controlno
order by quoteid` desc
)
Here is my DAX statement so far:
In_Force Premium =
FILTER(
ADDCOLUMNS(
SUMMARIZE(
//Grouping necessary columns
fact_Premium,
fact_Premium[QuoteID],
fact_Premium[Division],
fact_Premium[Office],
dim_Company[CompanyGUID],
fact_Premium[LineGUID],
fact_Premium[ProducerGUID],
fact_Premium[StateID],
fact_Premium[ExpirationDate]
),
"Premium", CALCULATE(
SUM(fact_Premium[Premium])
),
"ControlNo", CALCULATE(
DISTINCTCOUNT(fact_Premium[ControlNo])
)
), // Here I need to make sure TODAY() falls between fact_Premium[EffectiveDate] and (SELECT TOP 1 fact_Premium[ExpirationDate] ORDE BY QuoteID DESC)
)
Also, what would be more efficient way, to create calculated table from fact_Premium or create same table using sql statement (--> Get Data--> SQL Server) ?
There are 2 potential ways in T-SQL to get the next effective date. One is to use LEAD() and another is to use an APPLY operator. As there are few facts to work with here are samples:
select *
from (
select *
, lead(EffectiveDate) over(partition by CompanyGUID order by quoteid desc) as NextEffectiveDate
from Table1
join Table2 on ...
) d
or
select table1.*, oa.NextEffectiveDate
from Table1
outer apply (
select top(1) q2.ExpirationDate AS NextEffectiveDate
from Table2 Q2
where q2.ControlNo = Table1.controlno
order by quoteid desc
) oa
nb. an outer apply is a little similar to a left join in that it will allow rows with a NULL to be returned by the query, if that is not needed than use cross apply instead.
In both these approaches you may refer to NextEffectiveDate in a final where clause, but I would prefer to avoid using the convert function if that is feasible (this depends on the data).

TSQL Compare 2 select's result and return result with most recent date

Wonder if someone could give me a quick hand. I have 2 select queries (as shown below) and I want to compare the results of both and only return the result that has the most recent date.
So say I have the following 2 results from the queries:-
--------- ---------- ----------------------- --------------- ------ --
COMPANY A EMPLOYEE A 2007-10-16 17:10:21.000 E-mail 6D29D6D5 SYSTEM 1
COMPANY A EMPLOYEE A 2007-10-15 17:10:21.000 E-mail 6D29D6D5 SYSTEM 1
I only want to return the result with the latest date (so the first one). I thought about putting the results into a temporary table and then querying that but just wondering if there's a simpler, more efficient way?
SELECT * FROM (
SELECT fc.accountidname, fc.owneridname, fap.actualend, fap.activitytypecodename, fap.createdby, fap.createdbyname,
ROW_NUMBER() OVER (PARTITION BY fc.accountidname ORDER BY fap.actualend DESC) AS RN
FROM FilteredContact fc
INNER JOIN FilteredActivityPointer fap ON fc.parentcustomerid = fap.regardingobjectid
WHERE fc.statecodename = 'Active'
AND fap.ownerid = '0F995BDC'
AND fap.createdon < getdate()
) tmp WHERE RN = 1
SELECT * FROM (
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanageridname, fa.new_customerid, fa.new_riskstatusname,
fa.new_numberofopencases, fa.new_numberofurgentopencases, fap.actualend, fap.activitytypecodename, fap.createdby, fap.createdbyname,
ROW_NUMBER() OVER (PARTITION BY fa.name ORDER BY fap.actualend DESC) AS RN
FROM FilteredAccount fa
INNER JOIN FilteredActivityPointer fap ON fa.accountid = fap.regardingobjectid
WHERE fa.statecodename = 'Active'
AND fap.ownerid = '0F995BDC'
AND fap.createdon < getdate()
) tmp2 WHERE RN = 1
if the tables have the same structure (column count and column types to match), then you could just union the results of the two queries, then order by the date desc and then select the top 1.
select top 1 * from
(
-- your first query
union all
-- your second query.
) T
order by YourDateColumn1 desc
You should GROUP BY and use MAX(createdon)