results of except query change after repeated consecutive runs in redshift - postgresql

I'm running the PostgreSQL query below in AWS Redshift. Each time I run this query, I get a different count of records that differ on the daily_table.product_repeat_sub_query side, using the except operator. Neither the daily_table.product_repeat_sub_query table nor the daily_table.daily_sku_t table is being updated during this time. The daily_table.product_repeat_sub_query table and the product_repeat_sub_query query both have the same record count. The schema for daily_table.daily_sku_t is below; the matching fields in daily_table.product_repeat_sub_query have the same data types. I've also included some sample records from the tables below. Does anyone have an idea how the results of the except query can come out differently each time it is run, when the underlying tables aren't changing?
daily_table.daily_sku_t schema:
customer_uuid string
boardname_12 string
producttype string
productsubtype string
storeid int
product_id string
dateclosed date
Size string
query:
with product_repeat_sub_query as
(
select
dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
case
when ticketid = first_value(ticketid) over (partition by t.product_id, customer_uuid
ORDER BY
dateclosed ASC rows between unbounded preceding and unbounded following) then 0
else grossreceipts
end as product_repeat_gross, datediff(day,
lag(dateclosed, 1) over (partition by t.boardname_12, customer_uuid, t.product_id
ORDER BY
dateclosed ASC ),
dateclosed) as product_cycle_days
from
daily_table.daily_sku_t t )
select count(*) from
(
select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
except
select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
-- 36843
-- 36887
-- 36188
data:
daily_table.product_repeat_sub_query
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
product_repeat_subquery
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
update:
with product_repeat_sub_query as
(
select customer_uuid,
dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
case
when ticketid = first_value(ticketid) over (partition by t.product_id, customer_uuid
ORDER BY
dateclosed ASC rows between unbounded preceding and unbounded following) then 0
else grossreceipts
end as product_repeat_gross, datediff(day,
lag(dateclosed, 1) over (partition by t.boardname_12, customer_uuid, t.product_id
ORDER BY
dateclosed ASC,t.boardname_12, customer_uuid, t.product_id ),
dateclosed) as product_cycle_days
from
daily_table.daily_sku_t t
where (t.customer_uuid is not null)
and (trim(t.customer_uuid) != '')
and (t.product_id is not null)
and (trim(t.product_id) != '')
)
select count(*) from
(
select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
except
select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
Even after adding all the fields from the partition to the order by and filtering out nulls or blanks in the id fields, I'm still getting a different count each time.

Your window functions don't have fully qualified order by clauses. You have repeated "dateclosed" values within partitions. This means that Redshift can have different row orders for the lag and first-value functions. I expect that these "random" ordering differences are causing your changing results.
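To illustrate the point about ties, here is a minimal Python sketch (not Redshift code) that emulates datediff over lag for a single partition. The rows, the storeid values, and the use of ticketid as a unique tiebreaker are assumptions made up for the example; the real table may need a different unique column. Python's sort is stable, so feeding the rows in every possible arrival order stands in for the engine being free to return tied rows in any order. Note that the partition columns are constant within a partition, so adding them to the order by cannot break a tie.

```python
from datetime import date
from itertools import permutations

# Hypothetical rows from one (boardname_12, customer_uuid, product_id) partition.
# Two rows share the same dateclosed but differ in storeid.
ROWS = [
    {"ticketid": 1, "storeid": 427, "dateclosed": date(2021, 4, 24)},
    {"ticketid": 2, "storeid": 427, "dateclosed": date(2021, 4, 26)},
    {"ticketid": 3, "storeid": 61, "dateclosed": date(2021, 4, 26)},
]


def cycle_days(rows, order_key):
    """Emulate datediff(day, lag(dateclosed) over (order by ...), dateclosed)."""
    ordered = sorted(rows, key=order_key)  # stable sort: ties keep their input order
    result = set()
    previous = None
    for row in ordered:
        days = None if previous is None else (row["dateclosed"] - previous["dateclosed"]).days
        result.add((row["dateclosed"], row["storeid"], days))
        previous = row
    return frozenset(result)


def distinct_outcomes(order_key):
    """Collect the result sets produced by every possible arrival order of the rows."""
    return {cycle_days(list(p), order_key) for p in permutations(ROWS)}


# ORDER BY dateclosed only: the tied rows can swap places.
by_date_only = distinct_outcomes(lambda r: r["dateclosed"])
# ORDER BY dateclosed, ticketid: ticketid is assumed unique within the partition.
with_tiebreaker = distinct_outcomes(lambda r: (r["dateclosed"], r["ticketid"]))

print(len(by_date_only))     # 2 different result sets are possible
print(len(with_tiebreaker))  # 1 result set, whatever the arrival order
```

If ticketid is not unique within a partition in the real data, the same idea applies with whatever column, or combination of columns, is.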

Related

Group By - Using Absolute Values

I'm trying to display an accounting report where I show total transactions, voids, the transaction fee, and a total amount for each transaction type.
TransactionType Amount TransactionCount TotalAmount
AgentCredit -$1.00 49 -$49.00
MailFee -$1.25 11 -$13.75
MailFee $1.25 531 $663.75
HardCardFee -$5.00 7 -$35.00
HardCardFee $5.00 239 $1,195.00
QuotaHuntFee -$2.00 1 -$2.00
QuotaHuntFee $2.00 202 $404.00
But what I want to display would look like the following:
TransactionType Amount TransactionCount TotalAmount TotalTrans Voids
AgentCredit -$1.00 49 -$49.00 49 0
MailFee $1.25 520 $650.00 531 11
HardCardFee $5.00 232 $1,160.00 239 7
QuotaHuntFee $2.00 201 $402.00 202 1
Would it be possible to group the transaction types using the absolute value of the Amount and calculate the grand total along with the transaction count & void counts?
This is on SQL Server 2014.
Thanks,
I think this does it
declare @T table (nm varchar(20), prc smallmoney, amt int);
insert into @T values
('AgentCredit', -1.00, 49)
, ('MailFee', -1.25, 11)
, ('MailFee', 1.25, 531)
, ('HardCardFee', -5.00, 7)
, ('HardCardFee', 5.00, 239)
, ('QuotaHuntFee', -2.00, 1)
, ('QuotaHuntFee', 2.00, 202);
with cte as
(
select t.*, (t.prc * t.amt) as net
, count(*) over (partition by t.nm, abs(t.prc)) as cnt
, row_number() over (partition by t.nm, abs(t.prc) order by t.prc) as rn
, lag(t.prc) over (partition by t.nm, abs(t.prc) order by t.prc) as prPrc
, lag(t.amt) over (partition by t.nm, abs(t.prc) order by t.prc) as prAmt
, case when lag(t.prc) over (partition by t.nm, abs(t.prc) order by t.prc) < 0 then t.amt - lag(t.amt) over (partition by t.nm, abs(t.prc) order by t.prc)
else t.amt
end as bal
from @T t
)
select *, ISNULL(t.prAmt, 0) as void
, bal*prc as nnet
from cte t
where t.cnt = 1
or t.rn = 2
order by t.nm, t.prc;
There's a bit of confusion around your results with the data you've provided. HardCardFee has 7 and 23 in the sample you provided, but you want to return 232 for the total? MailFee also has some inconsistent math. Also, your 'Voids' returns 0 for the first row; however, it seems as if there are 49?
Perhaps this query could get you started down the right path:
DECLARE @Table TABLE (TransactionType varchar(20), Amount decimal(10,2), TransactionCount int, TotalAmount decimal(10,2))
INSERT @Table
VALUES ('AgentCredit' ,-$1.00 ,49 ,-$49.00 ),
('MailFee' ,-$1.25 ,11 ,-$13.75 ),
('MailFee' ,$1.25 ,531 ,$663.75 ),
('HardCardFee' ,-$5.00 ,7 ,-$35.00 ),
('HardCardFee' ,$5.00 ,23 ,$1195.00 ),
('QuotaHuntFee' ,-$2.00 ,1 ,-$2.00 ),
('QuotaHuntFee' ,$2.00 ,202 ,$404.00 )
;WITH c AS (
SELECT TransactionType, Amount, TransactionCount, TotalAmount,
CASE WHEN t.Amount + ABS(t.Amount) = 0 THEN '-' ELSE '' END +
CAST(t.TransactionCount AS VARCHAR(10)) AS TCount
FROM @Table t
)
)
SELECT t.TransactionType
,MAX(t.Amount) AS Amount
,SUM(CAST(t.TCount AS INT)) AS TransactionCount
,SUM(t.TotalAmount) AS TotalAmount
,SUM(ABS(t.TransactionCount)) AS TotalTrans
,ABS(MIN(t.TCount)) AS Voids
FROM c t
GROUP BY TransactionType
Again, not sure about some of the values provided.
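For comparison, the grouping the question asks for can also be sketched with plain conditional aggregation over ABS(Amount). The sketch below runs the SQL through Python's sqlite3 module so it can be checked without a SQL Server instance; the table name report and the intermediate column names are made up for the example. It assumes a negative row counts as voids only when the same type also has a positive row, which is inferred from the desired output (AgentCredit shows 0 voids) and may not match the real business rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "report" is a hypothetical table holding the question's first result set.
conn.execute(
    "CREATE TABLE report (TransactionType TEXT, Amount REAL, "
    "TransactionCount INTEGER, TotalAmount REAL)"
)
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?, ?)",
    [
        ("AgentCredit", -1.00, 49, -49.00),
        ("MailFee", -1.25, 11, -13.75),
        ("MailFee", 1.25, 531, 663.75),
        ("HardCardFee", -5.00, 7, -35.00),
        ("HardCardFee", 5.00, 239, 1195.00),
        ("QuotaHuntFee", -2.00, 1, -2.00),
        ("QuotaHuntFee", 2.00, 202, 404.00),
    ],
)

QUERY = """
SELECT TransactionType,
       shown_amount AS Amount,
       CASE WHEN positives > 0 THEN positives - negatives ELSE negatives END AS TransactionCount,
       total_amount AS TotalAmount,
       CASE WHEN positives > 0 THEN positives ELSE negatives END AS TotalTrans,
       CASE WHEN positives > 0 THEN negatives ELSE 0 END AS Voids
FROM (
    -- One group per type and absolute amount; positive and negative counts kept apart.
    SELECT TransactionType,
           MAX(Amount) AS shown_amount,
           SUM(CASE WHEN Amount > 0 THEN TransactionCount ELSE 0 END) AS positives,
           SUM(CASE WHEN Amount < 0 THEN TransactionCount ELSE 0 END) AS negatives,
           SUM(TotalAmount) AS total_amount
    FROM report
    GROUP BY TransactionType, ABS(Amount)
) AS g
ORDER BY TransactionType
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the question's sample rows this reproduces the desired figures, for example MailFee with 520 net transactions, 650.00 total, 531 total transactions and 11 voids.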

How to find gap date and minimum date in the same query?

I have a table customer_history which logs customer_id and modification_date.
When a customer_id is not modified, there is no entry in the table.
I can find when a customer_id hasn't been modified (= last_date_with_no_modification) by looking for missing dates (a gaps-and-islands problem).
But in the same query, if no date is missing, the value of last_date_with_no_modification should
be DATEADD(DAY,-1,min(modification_date)) for that customer_id.
How can I add this last condition to my SQL query?
I use following tables:
"Customer_history" table:
customer_id modification_date
1 2017-12-20
1 2017-12-19
1 2017-12-17
2 2017-12-20
2 2017-12-18
2 2017-12-17
2 2017-12-15
3 2017-12-20
3 2017-12-19
"#tmp_calendar" table:
date
2017-12-15
2017-12-16
2017-12-17
2017-12-18
2017-12-19
2017-12-20
Query used to get gap date:
WITH CTE_GAP AS
(SELECT ch.customer_id,
LAG(ch.modification_date) OVER(PARTITION BY ch.customer_id ORDER BY ch.modification_date) as GapStart,
ch.modification_date as GapEnd,
(DATEDIFF(DAY,LAG(ch.modification_date) OVER(PARTITION BY ch.customer_id ORDER BY ch.modification_date), ch.modification_date)-1) GapDays
FROM customer_history ch )
SELECT cg.customer_id,
DATEADD(DAY,1,MAX(cg.GapStart)) as last_date_with_no_modification
FROM CTE_GAP cg
CROSS JOIN #tmp_calendar c
WHERE cg.GapDays >0
AND c.date BETWEEN DATEADD(DAY,1,cg.GapStart) AND DATEADD(DAY,-1,cg.GapEnd)
GROUP BY cg.customer_id
Result:
customer_id last_date_with_no_modification
1 2017-12-18
2 2017-12-19
3 2017-12-19 (Row missing)
How to get customer_id 3?
Something like this should work:
WITH CTE_GAP
AS
(
SELECT
ch.customer_id,
LAG(ch.modification_date) OVER(PARTITION BY ch.customer_id ORDER BY ch.modification_date) as GapStart,
ch.modification_date as GapEnd,
(DATEDIFF(DAY,LAG(ch.modification_date) OVER(PARTITION BY ch.customer_id ORDER BY ch.modification_date), ch.modification_date)-1) GapDays
FROM customer_history ch
)
SELECT DISTINCT
C.customer_id
, ISNULL(LD.last_date_with_no_modification, LD_NO_GAP.last_date_with_no_modification) last_date_with_no_modification
FROM
customer_history C
LEFT JOIN
(
SELECT
cg.customer_id,
DATEADD(DAY, 1, MAX(cg.GapStart)) last_date_with_no_modification
FROM
CTE_GAP cg
CROSS JOIN #tmp_calendar c
WHERE
cg.GapDays >0
AND c.date BETWEEN DATEADD(DAY, 1, cg.GapStart) AND DATEADD(DAY, -1, cg.GapEnd)
GROUP BY cg.customer_id
) LD
ON C.customer_id = LD.customer_id
LEFT JOIN
(
SELECT
customer_id
, DATEADD(DAY, -1, MIN(modification_date)) last_date_with_no_modification
FROM customer_history
GROUP BY customer_id
) LD_NO_GAP
ON C.customer_id = LD_NO_GAP.customer_id
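The fallback logic in this query (use the latest gap if there is one, otherwise the day before the first modification) can be checked with a small Python sketch. This is an emulation of the rule, not a translation of the T-SQL; the HISTORY dictionary simply restates the question's sample rows, and the function name is made up for the example. It assumes the calendar table covers every gap, as it does in the sample.

```python
from datetime import date, timedelta

# Sample rows from the question, keyed by customer_id.
HISTORY = {
    1: [date(2017, 12, 20), date(2017, 12, 19), date(2017, 12, 17)],
    2: [date(2017, 12, 20), date(2017, 12, 18), date(2017, 12, 17), date(2017, 12, 15)],
    3: [date(2017, 12, 20), date(2017, 12, 19)],
}


def last_date_with_no_modification(dates):
    """Latest gap start plus one day, or the day before the first date if there is no gap."""
    ordered = sorted(dates)
    # A gap exists where two consecutive modification dates are more than one day apart.
    gap_starts = [prev for prev, cur in zip(ordered, ordered[1:]) if (cur - prev).days > 1]
    if gap_starts:
        return max(gap_starts) + timedelta(days=1)  # DATEADD(DAY, 1, MAX(GapStart))
    return ordered[0] - timedelta(days=1)           # DATEADD(DAY, -1, MIN(modification_date))


results = {cid: last_date_with_no_modification(dates) for cid, dates in HISTORY.items()}
for cid, value in results.items():
    print(cid, value)
```

For customer 3, which has no gap, the stated rule DATEADD(DAY,-1,min(modification_date)) gives 2017-12-18.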

Ranking in PostgreSQL

I have a query that looks like this:
select
restaurant_id,
rank() OVER (PARTITION BY restaurant_id order by churn desc) as rank_churn,
churn,
orders,
rank() OVER (PARTITION BY restaurant_id order by orders desc) as rank_orders
from data
I would expect that this ranking function will order my data and provide a column that has 1,2,3,4 according to the values of the column.
However the outcome is always 1 in the ranking.
restaurant_id rank_churn churn orders rank_orders
2217 1 75 182 1
2249 1 398 896 1
2526 1 11 56 1
2596 1 89 139 1
What am I doing wrong?
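One likely explanation, assuming each restaurant_id appears only once in data (as the sample output suggests): PARTITION BY restaurant_id restarts the ranking for every restaurant, so every partition holds a single row and that row is always rank 1. The sketch below runs both variants through Python's sqlite3 module (window functions need SQLite 3.25 or later); the table contents are the four sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (restaurant_id INTEGER, churn INTEGER, orders INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(2217, 75, 182), (2249, 398, 896), (2526, 11, 56), (2596, 89, 139)],
)

# Ranking restarts inside each restaurant_id partition: one row each, so always 1.
partitioned = conn.execute("""
    SELECT restaurant_id,
           rank() OVER (PARTITION BY restaurant_id ORDER BY churn DESC) AS rank_churn
    FROM data
    ORDER BY restaurant_id
""").fetchall()

# Without PARTITION BY the ranking runs across all restaurants.
overall = conn.execute("""
    SELECT restaurant_id,
           rank() OVER (ORDER BY churn DESC) AS rank_churn,
           rank() OVER (ORDER BY orders DESC) AS rank_orders
    FROM data
    ORDER BY restaurant_id
""").fetchall()

print(partitioned)
print(overall)
```

If the real table has several rows per restaurant and the ranking is meant to run within some other grouping, that grouping column would go in PARTITION BY instead.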

Selecting only the values that are in the 75th percentile and are above a constraint

I'm trying to get this query to work properly...
select salary from agent
where salary > 75000
ORDER BY salary ASC
LIMIT (select ROUND(count(salary) * .75) as TwentyFifthTile from agent)
Some additional information about the rows:
166 rows – 25%
331 rows – 50%
497 rows – 75%
662 rows – 100%
These rows have salary 75,000 plus:
235 / 662 = ~.35
.35 * 662 = ~235 rows.
I'm trying to get the above query to return back all the rows that have salary greater than 75,000 but are still in the first 497 rows. When I run the above query it returns all the rows starting at 75,000 and limited by a 497 row return constraint.
I'm not sure how I can just return salaries of greater than 75,000 that are in the first 497 rows of the limit constraint.
You can divide the current row number by the total number of rows to get this:
select salary
from (
select salary,
count(*) over () as total_count,
row_number() over (order by salary) as rn
from agent
where salary > 75000
) t
where (rn / total_count::numeric) <= 0.75
order by salary asc
Use row_number:
select salary, row_num
from (
select salary, row_number() over (order by salary) as row_num
from agent
) t
where row_num < (select ROUND(count(salary) * .75) from agent)
and salary > 75000
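The order of operations matters here: the question wants rows that sit in the first 75% of the whole table and also have a salary above 75,000, so the row number has to be assigned before the salary filter is applied. A window function's alias also cannot be referenced in the WHERE clause of the same SELECT, which is why a derived table is needed. Here is a small sketch of that, run through Python's sqlite3 module (SQLite 3.25 or later); the eight salaries are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (salary INTEGER)")
# Hypothetical data: 8 rows, so the first 75% is the 6 lowest salaries.
conn.executemany(
    "INSERT INTO agent VALUES (?)",
    [(50000,), (60000,), (70000,), (76000,), (80000,), (90000,), (100000,), (120000,)],
)

rows = conn.execute("""
    SELECT salary
    FROM (
        -- Number every row of the table first, before any salary filter.
        SELECT salary,
               row_number() OVER (ORDER BY salary) AS rn,
               count(*) OVER () AS total_count
        FROM agent
    ) AS t
    WHERE rn <= ROUND(total_count * 0.75)  -- inside the first 75% of all rows
      AND salary > 75000                   -- and above the salary constraint
    ORDER BY salary
""").fetchall()

salaries = [r[0] for r in rows]
print(salaries)  # [76000, 80000, 90000]
```

Filtering on salary inside the derived table instead would number only the rows above 75,000 and return 75% of those, which is a different result.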

Select values by date

I have the same data structure as in "Select only last value of date?", but I need to get the values for each date; I marked them. The solution to Feroc's question gives the last date only.
localName Date_Time RH
BAG012 2014-10-09 17:17:58.000 16 <--
BAG012 2014-10-09 17:13:28.000 16
BAG012 2014-10-09 17:12:23.000 16
BAG012 2014-10-09 16:52:54.000 16
BAG012 2014-10-08 05:14:56.000 16 <--
BAG012 2014-10-08 04:45:31.000 16
BAG012 2014-10-08 04:44:08.000 16
SAG165 2014-10-28 11:22:14.000 698 <--
SAG165 2014-10-28 11:09:14.000 698
SAG165 2014-10-28 10:53:18.000 698
SAG165 2014-10-27 19:30:14.000 693 <--
SAG165 2014-10-27 19:14:51.000 693
SAG165 2014-10-27 19:13:56.000 693
Here is a code I am using:
WITH CTE AS
(
SELECT LTRIM(localName) as localName, CAST(year(Date_Time) as varchar)+'-'+CAST(month(Date_Time) as varchar)+'-'+CAST(day(Date_Time) as varchar) as date_time, RH, tank, mode,
RN = ROW_NUMBER() OVER (PARTITION BY localName ORDER BY Date_Time DESC)
FROM dbo.SMCData
)
SELECT DISTINCT Date_Time, localName, RH, tank, mode
FROM CTE
WHERE RN = 1
order by LocalName asc
As the datetime also includes the time, I tried to remove the time and leave only the date.
I also tried to use DISTINCT Date_Time.
Neither of the above makes any difference to the result.
ANSWER:
thanks for all, the following worked for me:
WITH CTE AS
(
SELECT localName, Date_Time, RH,
RN = ROW_NUMBER() OVER (PARTITION BY localName, CAST(Date_Time AS date) ORDER BY Date_Time DESC)
FROM dbo.TableName
)
SELECT localName, Date_Time, RH
FROM CTE
WHERE RN = 1
ORDER BY localName, Date_Time;
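The key change is adding the calendar date to PARTITION BY, so that row numbering restarts for every localName on every day and RN = 1 picks the latest reading of each day. Below is a small check of that idea using Python's sqlite3 module (SQLite 3.25 or later) on a subset of the question's sample rows. SQLite syntax differs slightly from SQL Server: date(Date_Time) stands in for CAST(Date_Time AS date), and the alias is written with AS rather than RN = ...

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SMCData (localName TEXT, Date_Time TEXT, RH INTEGER)")
conn.executemany(
    "INSERT INTO SMCData VALUES (?, ?, ?)",
    [
        ("BAG012", "2014-10-09 17:17:58.000", 16),
        ("BAG012", "2014-10-09 17:13:28.000", 16),
        ("BAG012", "2014-10-08 05:14:56.000", 16),
        ("BAG012", "2014-10-08 04:45:31.000", 16),
        ("SAG165", "2014-10-28 11:22:14.000", 698),
        ("SAG165", "2014-10-28 11:09:14.000", 698),
        ("SAG165", "2014-10-27 19:30:14.000", 693),
        ("SAG165", "2014-10-27 19:14:51.000", 693),
    ],
)

rows = conn.execute("""
    SELECT localName, Date_Time, RH
    FROM (
        SELECT localName, Date_Time, RH,
               -- Numbering restarts for each localName on each calendar day.
               ROW_NUMBER() OVER (
                   PARTITION BY localName, date(Date_Time)
                   ORDER BY Date_Time DESC
               ) AS RN
        FROM SMCData
    ) AS t
    WHERE RN = 1
    ORDER BY localName, Date_Time
""").fetchall()

for row in rows:
    print(row)
```

Because the whole row is selected, RH always comes from the same row as the latest Date_Time, which is not guaranteed when MAX is applied to each column separately.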
Use select distinct:
SELECT DISTINCT date
FROM table_name
It will give you one of each date.
Group by the date also:
SELECT localName, MAX(date_time) AS max_date_time, MAX(rh) AS max_rh
FROM your_table
GROUP BY localName, CAST(date_time AS date)
This will give you the last date_time for each localName on each date.
Sample SQL Fiddle
Output:
localName max_date_time max_rh
---------- ----------------------- -----------
BAG012 2014-10-08 05:14:56.000 16
BAG012 2014-10-09 17:17:58.000 16
SAG165 2014-10-27 19:30:14.000 693
SAG165 2014-10-28 11:22:14.000 698
SELECT DISTINCT localName, Date , RH
FROM table_name
This should give unique date and RH values and corresponding localName.