I am computing a percentage in PostgreSQL and get unexpected behavior when dividing a number by the same number

I am new to PostgreSQL and am having trouble wrapping my mind around why I am getting the results that I see.
I perform the following query:
SELECT
name AS region_name,
COUNT(tripsq1.id) AS trips,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(case when consumed_at = start_at then tripsq1.id end) AS first_day,
(SUM(case when consumed_at = start_at then tripsq1.id end)::NUMERIC(6,4))/COUNT(tripsq1.id)::NUMERIC(6,4) AS percent_on_first_day
FROM promotionsq1
INNER JOIN couponsq1
ON promotion_id = promotionsq1.id
INNER JOIN tripsq1
ON couponsq1.id = coupon_id
INNER JOIN regionsq1
ON regionsq1.id = region_id
WHERE promotion_name = 'TestPromo'
GROUP BY region_name;
and get the following result
region_name | trips | unique_users | first_day | percent_on_first_day
-------------------+-------+--------------+-----------+-----------------------
A | 3 | 2 | 1 | 33.3333333333333333
B | 1 | 1 | 0 |
C | 1 | 1 | 1 | 2000.0000000000000000
The first row's percentage gets calculated correctly, while the third row's percentage is 20 times what it should be. The percent_on_first_day should be 100.00, since it is 100.0 * 1/1.
Any help would be greatly appreciated.

I suspect that the issue is caused by this code:
SUM(case when consumed_at = start_at then tripsq1.id end)
This tells me you are summing the ids, which is meaningless. You probably want:
SUM(case when consumed_at = start_at then 1 end)
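For illustration, a sketch of the whole query with that change applied; the 100.0 multiplier is my own addition (not part of either snippet above) to turn the ratio into a 0-100 percentage:
SELECT
    name AS region_name,
    COUNT(tripsq1.id) AS trips,
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(case when consumed_at = start_at then tripsq1.id end) AS first_day,
    -- count matching rows (1 per row) instead of summing id values;
    -- 100.0 forces numeric division, so the NUMERIC casts are no longer needed
    100.0 * SUM(case when consumed_at = start_at then 1 end) / COUNT(tripsq1.id) AS percent_on_first_day
FROM promotionsq1
INNER JOIN couponsq1 ON promotion_id = promotionsq1.id
INNER JOIN tripsq1 ON couponsq1.id = coupon_id
INNER JOIN regionsq1 ON regionsq1.id = region_id
WHERE promotion_name = 'TestPromo'
GROUP BY region_name;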

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows where I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise a recursive CTE.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
Step-by-step demo: db<>fiddle
SELECT
*,
COALESCE( -- 3
first_value(override_as_number) OVER w -- 2
, 1
)
+ row_number() OVER w - 1 -- 4, 5
FROM (
SELECT
*,
SUM( -- 1
CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
as grouped
FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
Create a new subpartition within your partitions: this cumulative sum creates a unique group id for every group of records that starts with a non-NULL override_as_number followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) rows belong to the same subpartition/group.
first_value() gives the first value of such subpartition.
The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
row_number() - 1 creates a row count within a subpartition, starting with 0.
Adding the row count to the first_value() of a subpartition creates your result: beginning with the one non-NULL record of a subpartition (which adds a row count of 0), the first following NULL record gets the value +1, and so forth.
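If you want to try this without the fiddle, here is a minimal sketch of a sample table, reconstructed from the result set shown further down; the column types are an assumption:
CREATE TABLE sample (
    grouped_by         text,
    secondary_order_by text,
    override_as_number int
);

INSERT INTO sample (grouped_by, secondary_order_by, override_as_number) VALUES
('AAA', 'a', 1),
('AAA', 'b', NULL),
('AAA', 'c', 3),
('AAA', 'd', 3),
('AAA', 'e', NULL),
('AAA', 'f', NULL),
('AAA', 'g', 999),
('XYZ', 'a', NULL),
('ZZZ', 'a', NULL),
('ZZZ', 'b', NULL);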
The query below gives the exact result, but you need to verify it with all combinations:
select c.*, COALESCE(c.override_as_number, c.act) as final
FROM
(
    select b.*, dense_rank() over (partition by grouped_by order by grouped_by, actual) as act
    from
    (
        select a.*, COALESCE(override_as_number, row_num) as actual
        FROM
        (
            select grouped_by, secondary_order_by,
                   dense_rank() over (partition by grouped_by order by grouped_by, secondary_order_by) as row_num,
                   override_as_number, desired_dynamic_number
            from fiddle
        ) a
    ) b
) c;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real-world problem I was trying to solve did not have a nicely ordered secondary_order_by column; instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, whose solution I'm posting below. The solution is Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution, using a temporary table and a cursor to inject the override_as_number values and shuffle the other numbers.
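For anyone porting the tally-table idea back to Postgres: generate_series can stand in for Snowflake's generator, and CASE for IFF. A minimal sketch of just the first CTE under that assumption:
-- Postgres stand-in for the Snowflake tally_table CTE (sketch only);
-- the upper bound of 1000 still needs to be raised for large override_as_number values.
-- IFF(cond, a, b) would likewise become CASE WHEN cond THEN a ELSE b END.
with tally_table as (
    select g as gen_list
    from generate_series(1, 1000) as g
)
select * from tally_table;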

How to drop rows if a variable is less than x, in SQL

I have the following query:
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be simple. Any help will be appreciated!
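For what it's worth, a minimal sketch of how such a filter is often expressed: a HAVING clause on the aggregate, applied to the final SELECT of the query above (the 0.02 threshold comes from the question; everything else is unchanged, and the double_entry_book CTE is assumed to be defined as above):
-- final SELECT only; keeps the double_entry_book CTE from the question
SELECT address,
       sum(value) / 1000000000000000000 as balance
FROM double_entry_book
GROUP BY address
HAVING sum(value) / 1000000000000000000 >= 0.02
ORDER BY balance DESC
LIMIT 15000000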
We can delete in a CTE and use RETURNING to get the ids of the rows being deleted, but they still exist until the transaction is committed.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here

Postgresql: Create a date sequence, use it in date range query

I'm not great with SQL but I have been making good progress on a project up to this point. Now I am completely stuck.
I'm trying to get a count for the number of apartments with each status. I want this information for each day so that I can trend it over time. I have data that looks like this:
table: y_unit_status
unit | date_occurred | start_date | end_date | status
1 | 2017-01-01 | 2017-01-01 | 2017-01-05 | Occupied No Notice
1 | 2017-01-06 | 2017-01-06 | 2017-01-31 | Occupied Notice
1 | 2017-02-01 | 2017-02-01 | | Vacant
2 | 2017-01-01 | 2017-01-01 | | Occupied No Notice
And I want to get output that looks like this:
date | occupied_no_notice | occupied_notice | vacant
2017-01-01 | 2 | 0 | 0
...
2017-01-10 | 1 | 1 | 0
...
2017-02-01 | 1 | 0 | 1
Or, this approach would work:
date | status | count
2017-01-01 | occupied no notice | 2
2017-01-01 | occupied notice | 0
date_occurred: Date when the status of the unit changed
start_date: Same as date_occurred
end_date: Date when status stopped being x and changed to y.
I am pulling in the number of bedrooms and a property id so the second approach of selecting counts for one status at a time would produce a relatively large number of rows vs. option 1 (if that matters).
I've found a lot of references that have gotten me close to what I'm looking for but I always end up with a sort of rolling, cumulative count.
Here's my query, which produces a column of dates and counts, which accumulate over time rather than reflecting a snapshot of counts for a particular day. You can see my references to another table where I'm pulling in a property id. The table schema is Property -> Unit -> Unit Status.
WITH t AS(
SELECT i::date from generate_series('2016-06-29', '2017-08-03', '1 day'::interval) i
)
SELECT t.i as date,
u.hproperty,
count(us.hmy) as count --us.hmy is the id
FROM t
LEFT OUTER JOIN y_unit_status us ON t.i BETWEEN us.dtstart AND
us.dtend
INNER JOIN y_unit u ON u.hmy = us.hunit -- to get property id
WHERE us.sstatus = 'Occupied No Notice'
AND t.i >= us.dtstart
AND t.i <= us.dtend
AND u.hproperty = '1'
GROUP BY t.i, u.hproperty
ORDER BY t.i
limit 1500
I also tried a FOR loop, iterating over the dates to determine cases where the date was between start and end but my logic wasn't working. Thanks for any insight!
You are on the right track, but you'll need to handle NULL values in end_date. If a NULL there means that the status is assumed to change somewhere in the future (but it is not yet known when), the containment operators (#> and <#) for the daterange type are perfect for you (because ranges can be "unbounded"):
with params as (
select date '2017-01-01' date_from,
date '2017-02-02' date_to
)
select date_from + d, status, count(unit)
from params
cross join generate_series(0, date_to - date_from) d
left join y_unit_status on daterange(start_date, end_date, '[]') #> date_from + d
group by 1, 2
To achieve the first variant, you can use conditional aggregation:
with params as (
select date '2017-01-01' date_from,
date '2017-02-02' date_to
)
select date_from + d,
count(unit) filter (where status = 'Occupied No Notice') occupied_no_notice,
count(unit) filter (where status = 'Occupied Notice') occupied_notice,
count(unit) filter (where status = 'Vacant') vacant
from params
cross join generate_series(0, date_to - date_from) d
left join y_unit_status on daterange(start_date, end_date, '[]') #> date_from + d
group by 1
Notes:
The syntax filter (where <predicate>) is new in 9.4+. Before that, you can use CASE (and the fact that most aggregate functions do not include NULL values) to emulate it; see the sketch below.
You can even index the expression daterange(start_date, end_date, '[]') (using gist) for better performance.
http://rextester.com/HWKDE34743
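For servers older than 9.4, here is a sketch of the second query with the FILTER clauses replaced by the CASE emulation mentioned in the first note (aggregates skip NULLs, so non-matching rows are simply not counted):
with params as (
    select date '2017-01-01' date_from,
           date '2017-02-02' date_to
)
select date_from + d,
       count(case when status = 'Occupied No Notice' then unit end) as occupied_no_notice,
       count(case when status = 'Occupied Notice' then unit end) as occupied_notice,
       count(case when status = 'Vacant' then unit end) as vacant
from params
cross join generate_series(0, date_to - date_from) d
left join y_unit_status on daterange(start_date, end_date, '[]') #> date_from + d
group by 1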

One table with two different tasks

I got a test from my lecturer: I have to make one table with 3 columns: prodName, Qty, and totSalesToDate. Column Qty shows how many products have been sold on the input date, and totSalesToDate indicates how many products have been sold from the beginning of the month up to the input date. Here is the example result table:
prodName | Qty | totSalesToDate
Car | 2 | 10
Bicycle | 8 | 22
Truck | 1 | 7
Motor-cycle | 3 | 12
I have to make this table using a stored procedure (T-SQL) with no subqueries. So far, the query I have made is:
create procedure SalesReport @date varchar(10)
as
select p.prodName, sum(s.Qty) as Qty
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
where @date = pt.Date
group by p.prodName
union
select p.prodName, sum(s.Qty) as totSalesToDate
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
where pt.Date between '2010060' and @date and p.prodName is not null
group by p.prodName
go
But the result I get is like this:
prodName | Qty
Car | 2
Car | 10
Bicycle | 8
Bicycle | 22
Truck | 1
Truck | 7
Motor-cycle | 3
Motor-cycle | 12
Can anybody help? I've been googling around but still cannot find the answer. Thanks.
How about
create procedure SalesReport @date varchar(10)
as
select p.prodName,
SUM(CASE WHEN @date = pt.Date THEN s.Qty ELSE 0 END) as Qty,
SUM(CASE WHEN pt.Date between '2010060' and @date THEN s.Qty ELSE 0.0 END) AS totSalesToDate
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
group by p.prodName
go
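A hedged usage note: the procedure would then be called along these lines; the date value here is a made-up example, since the format in which pt.Date is stored is not shown in the question:
-- example invocation (date string format is an assumption)
EXEC SalesReport @date = '20100630';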

T-SQL: How to use GROUP BY and get the value which exceeds 60%?

Sorry for the bad title; I don't know how to describe my problem.
I have the following table:
| ItemID | Date |
-------------------------
| 1 | 01.01.10 |
| 1 | 03.01.10 |
| 1 | 05.01.10 |
| 1 | 06.01.10 |
| 1 | 10.01.10 |
| 2 | 05.01.10 |
| 2 | 10.01.10 |
| 2 | 20.01.10 |
Now I want to GROUP BY ItemID, and for the date I want to get the value which exceeds 60%. What I mean is that for item 1 I have five rows, so each has a percentage of 20%, and for item 2 I have three rows, so each has a percentage of 33.33%. So for item 1 I need the 3rd and for item 2 the 2nd value, so that the result looks like this.
| ItemID | Date |
-------------------------
| 1 | 06.01.10 |
| 2 | 10.01.10 |
Is there an easy way to get this data? Maybe using OVER?
Thank you
Torben
with NumItems as
( select itemID, count(*) as numOfItems from table group by itemID)
),
rowNums as
(
select itemID,Date, row_number() over (partition by ItemID order by date asc) as rowNum
from table
)
select itemID, min(Date) from
rowNums a inner join NumItems b on a.itemID = b.ItemID
where cast(b.rowNum as float) / cast(numOfItems as float) >= 0.6
group by itemID
That should do it, although I am certain it can be written with only one table scan. It should work nicely though.
The provided script contained a few errors. Below is a working one:
with NumItems as
(
select itemID, count(*) as numOfItems from table group by itemID
),
rowNums as
(
select itemID, Date, row_number() over (partition by ItemID order by date asc) as rowNum
from table
)
select a.itemID, min(a.Date) from
rowNums a inner join NumItems b on a.itemID = b.ItemID
where cast(a.rowNum as float) / cast(numOfItems as float) >= 0.6
group by a.itemID
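As the first answer mentions, the same result can likely be computed with a single table scan by turning the per-item count into a window aggregate instead of a separate CTE; a sketch, with the placeholder table name bracketed so it parses:
with rowNums as
(
    select itemID,
           Date,
           row_number() over (partition by itemID order by Date asc) as rowNum,
           count(*) over (partition by itemID) as numOfItems  -- per-item count without a second CTE
    from [table]  -- placeholder table name from the answers above
)
select itemID, min(Date) as Date
from rowNums
where cast(rowNum as float) / cast(numOfItems as float) >= 0.6
group by itemID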