Calculate relative errors in Postgres

Is there an easy way to calculate errors in Postgres given a table that looks something like this:
id | bool | score
1 | False | 9
1 | True | 9.6
2 | False | 5
2 | True | 4.7
The output that I want is id | (false_row - true_row) / true_row:
id | err
1 | -0.0625
2 | 0.063829

SELECT
id,
(false_row - true_row) / true_row AS err
FROM (
SELECT
id,
SUM(CASE WHEN bool THEN score ELSE 0 END) AS true_row,
SUM(CASE WHEN NOT bool THEN score ELSE 0 END) AS false_row
FROM
table_name
GROUP BY
id
) AS sub;
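The subquery (sub) collapses each id to a single row holding true_row and false_row. Assuming each id has exactly one True row and one False row, any aggregate that returns that single value works; SUM is used here, and MAX or MIN would do equally well.
Once true_row and false_row are available, the outer query computes the relative error from them.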
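To check the query against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for Postgres here (an assumption made so the example is self-contained), the bool column is stored as 0/1, and the helper name relative_errors is hypothetical.

```python
import sqlite3


def relative_errors(rows):
    """Run the pivot-then-divide query on (id, bool, score) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER, bool INTEGER, score REAL)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, (false_row - true_row) / true_row AS err
        FROM (
            SELECT id,
                   SUM(CASE WHEN bool THEN score ELSE 0 END) AS true_row,
                   SUM(CASE WHEN NOT bool THEN score ELSE 0 END) AS false_row
            FROM table_name
            GROUP BY id
        ) AS sub
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question: bool is stored as 0 (False) or 1 (True).
rows = [(1, 0, 9.0), (1, 1, 9.6), (2, 0, 5.0), (2, 1, 4.7)]
for id_, err in relative_errors(rows):
    print(id_, err)
```

One thing to watch in Postgres: if an id has no True row, true_row is 0 and the division raises a division-by-zero error. Dividing by NULLIF(true_row, 0) returns NULL for such ids instead.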

Related

Need to have subquery within subquery

I have a stock table which holds for example
Partnumber | Depot | flag_redundant
------------+-------+----------------
1 | 1 | 5
1 | 2 | 0
1 | 3 | 0
1 | 4 | 5
2 | 1 | 0
2 | 2 | 0
2 | 3 | 0
2 | 4 | 0
I need to see the depots in which a part has not been flagged as redundant, but only for parts where flag_redundant has been set in at least one depot. Parts that have never been flagged should be ignored.
Any help appreciated!
I'm thinking of something along the lines of ....
SELECT stock.part, stock.depot,
OrderCount = (SELECT CASE WHEN Stock.flag_redundant = 5 THEN 1 end as Countcolumn FROM stock C)
FROM stock
Desired result:
Partnumber | MissingDepots
------------+---------------
1 | Yes

You can group by partnumber and set the conditions in the HAVING clause:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having
sum(flag_redundant) > 0 and
sum(case when flag_redundant = 0 then 1 end) > 0
Or:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having sum(case when flag_redundant = 0 then 1 end) between 1 and count(*) - 1
Results:
partnumber | missingdepots
-----------+--------------
1 | Yes
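As a quick check of the first HAVING query, the following sketch loads the stock rows from the question into an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for Postgres here, which is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (partnumber INTEGER, depot INTEGER, flag_redundant INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 0), (1, 3, 0), (1, 4, 5),
     (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0)],
)

# Keep a part only if it has at least one flagged row (sum > 0)
# and at least one unflagged row (count of zeros > 0).
parts = conn.execute("""
    SELECT partnumber, 'Yes' AS MissingDepots
    FROM stock
    GROUP BY partnumber
    HAVING SUM(flag_redundant) > 0
       AND SUM(CASE WHEN flag_redundant = 0 THEN 1 END) > 0
""").fetchall()
conn.close()
print(parts)
```

Part 2 drops out because its flags sum to 0; part 1 stays because it has both flagged and unflagged depots.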

Assuming you want the part numbers that have rows with flag_redundant = 5 as well as rows with flag_redundant = 0:
SELECT
partnumber,
'Yes' AS missing
FROM (
SELECT
partnumber,
COUNT(flag_redundant) FILTER (WHERE flag_redundant = 5) AS cnt_redundant, -- 2
COUNT(*) AS cnt -- 3
FROM
stock
GROUP BY partnumber -- 1
) s
WHERE cnt_redundant > 0 -- 4
AND cnt_redundant < cnt -- 5
1. Group by partnumber
2. Count all records with flag_redundant = 5
3. Count all records
4. Find all partnumbers that contain any element with 5 ...
5. ... and which have more records than 5-element records

Finding the length of a series in postgres

A tricky query for postgres. Imagine, I have a set of rows with a boolean column called (for example) success. Like this:
id | success
9 | false
8 | false
7 | true
6 | true
5 | true
4 | false
3 | false
2 | true
1 | false
And I need to calculate the length of the latest successful or unsuccessful series. E.g., in this case it would be 3 for successful and 2 for unsuccessful. Or, using window functions, something like:
id | success | length
9 | false | 2
8 | false | 2
7 | true | 3
6 | true | 3
5 | true | 3
4 | false | 2
3 | false | 2
2 | true | 1
1 | false | 1
(note that I generally need a length of only the latest series, not all of those)
The closest answer I've found so far was this article:
https://jaxenter.com/10-sql-tricks-that-you-didnt-think-were-possible-125934.html
(See #5)
However, Postgres doesn't support the IGNORE NULLS option, so the query doesn't work. Without IGNORE NULLS it simply returns nulls in the length column.
Here is the closest I was able to get:
WITH
trx1(id, success, rn) AS (
SELECT id, success, row_number() OVER (ORDER BY id desc)
FROM results
),
trx2(id, success, rn, lo, hi) AS (
SELECT trx1.*,
CASE WHEN coalesce(lag(success) OVER (ORDER BY id DESC), FALSE) != success THEN rn END,
CASE WHEN coalesce(lead(success) OVER (ORDER BY id DESC), FALSE) != success THEN rn END
FROM trx1
)
SELECT trx2.*, 1
- last_value (lo) IGNORE nulls OVER (ORDER BY id DESC ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW)
+ first_value(hi) OVER (ORDER BY id DESC ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING)
AS length FROM trx2;
Do you have any ideas of such a query?

You can use the window function row_number() to designate series:
select max(id) as max_id, success, count(*) as length
from (
select *, row_number() over wa - row_number() over wp as grp
from my_table
window
wp as (partition by success order by id desc),
wa as (order by id desc)
) s
group by success, grp
order by 1 desc
max_id | success | length
--------+---------+--------
9 | f | 2
7 | t | 3
4 | f | 2
2 | t | 1
1 | f | 1
(5 rows)
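Why the difference of the two row_number() values identifies a series may not be obvious, so here is a minimal Python sketch of the same idea on the question's data. The function name series_lengths is hypothetical; the logic mirrors the query above rather than calling Postgres.

```python
from collections import defaultdict

# (id, success) rows from the question.
rows = [(9, False), (8, False), (7, True), (6, True), (5, True),
        (4, False), (3, False), (2, True), (1, False)]


def series_lengths(rows):
    """Return (max_id, success, length) per series, newest first."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    seen = defaultdict(int)     # running row_number() per success value (window wp)
    groups = defaultdict(list)  # ids collected per (success, grp)
    for rn_all, (id_, success) in enumerate(ordered, start=1):  # window wa
        seen[success] += 1
        # Within one unbroken run both counters grow by 1 per row, so the
        # difference stays constant; it changes once the other value interrupts.
        grp = rn_all - seen[success]
        groups[(success, grp)].append(id_)
    result = [(max(ids), success, len(ids)) for (success, _), ids in groups.items()]
    return sorted(result, reverse=True)


for row in series_lengths(rows):
    print(row)
```

The grp value alone is not unique across success values, which is why the query groups by both success and grp.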

Even though the answer by Klin is totally correct, I'd like to post another solution that a friend suggested:
with last_success as (
select max(id) id from my_table where success
)
select count(mt.id) last_fails_count
from my_table mt, last_success lt
where mt.id > lt.id;
--------------------
| last_fails_count |
--------------------
| 2 |
--------------------
It is about twice as fast if I only need the latest failing series (or the latest successful one, with the condition inverted to where not success).

I am computing a percentage in postgresql and I get the following unexpected behavior when dividing a number by the same number

I am new to PostgreSQL and am having trouble understanding why I am getting the results that I see.
I perform the following query
SELECT
name AS region_name,
COUNT(tripsq1.id) AS trips,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(case when consumed_at = start_at then tripsq1.id end) AS first_day,
(SUM(case when consumed_at = start_at then tripsq1.id end)::NUMERIC(6,4))/COUNT(tripsq1.id)::NUMERIC(6,4) AS percent_on_first_day
FROM promotionsq1
INNER JOIN couponsq1
ON promotion_id = promotionsq1.id
INNER JOIN tripsq1
ON couponsq1.id = coupon_id
INNER JOIN regionsq1
ON regionsq1.id = region_id
WHERE promotion_name = 'TestPromo'
GROUP BY region_name;
and get the following result
region_name | trips | unique_users | first_day | percent_on_first_day
-------------------+-------+--------------+-----------+-----------------------
A | 3 | 2 | 1 | 33.3333333333333333
B | 1 | 1 | 0 |
C | 1 | 1 | 1 | 2000.0000000000000000
The first row's percentage is calculated correctly, while the third row's percentage is 20 times what it should be. The percent_on_first_day should be 100.00, since it is 100.0 * 1/1.
Any help would be greatly appreciated

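I suspect the issue is this expression:
SUM(case when consumed_at = start_at then tripsq1.id end)
This sums the ids of the matching trips, which is meaningless. It would explain the 2000 if, for example, the single matching trip in region C has id 20, and the first row would only look right because its matching trip happens to have id 1. You probably want:
SUM(case when consumed_at = start_at then 1 end)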
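The difference between summing ids and summing 1s can be reproduced with a small sketch using Python's sqlite3 module as a stand-in for Postgres. Everything here is an assumption for illustration: the simplified trips table, the first_day flag that replaces consumed_at = start_at, and the ids, which were chosen so that the output resembles the question's result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, region TEXT, first_day INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    # Hypothetical ids: region A's first-day trip has id 1, region C's has id 20.
    [(1, "A", 1), (2, "A", 0), (3, "A", 0), (7, "B", 0), (20, "C", 1)],
)

results = conn.execute("""
    SELECT region,
           100.0 * SUM(CASE WHEN first_day THEN id END) / COUNT(id) AS summing_ids,
           100.0 * SUM(CASE WHEN first_day THEN 1 END) / COUNT(id) AS summing_ones
    FROM trips
    GROUP BY region
    ORDER BY region
""").fetchall()
conn.close()

for row in results:
    print(row)
```

Region B yields NULL in both columns because SUM over no matching rows is NULL. Adding ELSE 0 to the CASE, or using COUNT as in the question's first_day column, gives 0 instead.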

Group rows into two types depending on a value in column

I have a table:
------------------------------------------
Uid | mount | category
-----------------------------------------
1 | 10 | a
1 | 3 | b
3 | 7 | a
4 | 1 | b
4 | 12 | a
4 | 5 | b
1 | 2 | c
2 | 5 | d
I want to have one result like this:
------------------------------------------
Uid | suma | sumnota
-----------------------------------------
1 | 10 | 5
2 | 0 | 5
3 | 7 | 0
4 | 12 | 6
Group by uid;
suma is sum(mount) where category = 'a';
sumnota is sum(mount) where category <> 'a';
Any ideas how to do it?

Use conditional aggregation with CASE expressions inside the SUM() function:
SELECT
uid
, SUM(CASE WHEN category = 'a' THEN mount ELSE 0 END) AS suma
, SUM(CASE WHEN category IS DISTINCT FROM 'a' THEN mount ELSE 0 END) AS sumnota
FROM
yourtable
GROUP BY uid
ORDER BY uid
I'm using the IS DISTINCT FROM clause to handle NULL values in the category column properly. If that's not a concern in your case, you could simply use the <> operator.
From the documentation:
Ordinary comparison operators yield null (signifying "unknown"), not true or false, when either input is null.
For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true.
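To see what the NULL handling changes in practice, here is a sketch using Python's sqlite3 module as a stand-in for Postgres. Two assumptions are worth flagging: SQLite's null-safe IS NOT operator is used in place of IS DISTINCT FROM, and one extra row with a NULL category (uid 5) is added to the question's data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (uid INTEGER, mount INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 3, "b"), (3, 7, "a"), (4, 1, "b"), (4, 12, "a"),
     (4, 5, "b"), (1, 2, "c"), (2, 5, "d"),
     (5, 4, None)],  # hypothetical extra row with a NULL category
)

null_demo = conn.execute("""
    SELECT uid,
           SUM(CASE WHEN category = 'a' THEN mount ELSE 0 END) AS suma,
           -- NULL <> 'a' is unknown, so the NULL row falls through to ELSE 0
           SUM(CASE WHEN category <> 'a' THEN mount ELSE 0 END) AS sumnota_plain,
           -- IS NOT treats NULL as an ordinary, different value
           SUM(CASE WHEN category IS NOT 'a' THEN mount ELSE 0 END) AS sumnota_null_safe
    FROM yourtable
    GROUP BY uid
    ORDER BY uid
""").fetchall()
conn.close()

for row in null_demo:
    print(row)
```

For uids 1 to 4 both variants agree with the expected output in the question; only the row with the NULL category is counted differently.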

Here's a more verbose solution than the accepted answer.
WITH
t_suma AS ( SELECT uid, SUM(mount) AS suma
FROM your_table
WHERE category = 'a'
GROUP BY uid ),
t_sumnota AS ( SELECT uid, SUM(mount) AS sumnota
FROM your_table
WHERE category <> 'a' or category is NULL
GROUP BY uid )
SELECT distinct y.uid, COALESCE( suma, 0) AS suma, COALESCE( sumnota, 0 ) AS sumnota
FROM your_table y LEFT OUTER JOIN t_suma ON ( y.uid = t_suma.uid )
LEFT OUTER JOIN t_sumnota ON ( y.uid = t_sumnota.uid )
ORDER BY uid;

Select multiple row values into single row with multi-table clauses

I've searched the forums and while I see similar posts, they only address pieces of the full query I need to formulate (array_agg, where exists, joins, etc.). If the question I'm posting has been answered, I will gladly accept references to those threads.
I did find this thread ...which is very similar to what I need, except it is for MySQL, and I kept running into errors trying to get it into psql syntax. Hoping someone can help me get everything together. Here's the scenario:
Attribute
attrib_id | attrib_name
UserAttribute
user_id | attrib_id | value
Here's a small example of what the data looks like:
Attribute
attrib_id | attrib_name
-----------------------
1 | attrib1
2 | attrib2
3 | attrib3
4 | attrib4
5 | attrib5
UserAttribute -- there can be up to 15 attrib_id's/value's per user_id
user_id | attrib_id | value
----------------------------
101 | 1 | valueA
101 | 2 | valueB
102 | 1 | valueC
102 | 2 | valueD
103 | 1 | valueA
103 | 2 | valueB
104 | 1 | valueC
104 | 2 | valueD
105 | 1 | valueA
105 | 2 | valueB
Here's what I'm looking for
Result
user_id | attrib1_value | attrib2_value
--------------------------------------------------------
101 | valueA | valueB
102 | valueC | valueD
103 | valueA | valueB
104 | valueC | valueD
105 | valueA | valueB
As shown, I'm looking for single rows that contain:
- user_id from the UserAttribute table
- attribute values from the UserAttribute table
Note: I only need attribute values from the UserAttribute table for two specific attribute names in the Attribute table
Again, any help or reference to an existing solution would be greatly appreciated.
UPDATE:
@ronin provided a query that gets the results desired:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
To build on that, I tried to add some 'LIKE' pattern matching within the 'WHEN' condition (against the ua.value), but everything ends up as the 'FALSE' value. Will start a new question to see if that can be incorporated if I cannot figure it out. Thanks all for the help!!

If each attribute only has a single value for a user, you can start by making a sparse matrix:
SELECT user_id
,CASE WHEN attrib_id = 1 THEN value ELSE NULL END AS attrib_1_val
,CASE WHEN attrib_id = 2 THEN value ELSE NULL END AS attrib_2_val
FROM UserAttribute;
Then compress the matrix using an aggregate function:
SELECT user_id
,MAX(CASE WHEN attrib_id = 1 THEN value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN attrib_id = 2 THEN value ELSE NULL END) AS attrib_2_val
FROM UserAttribute
GROUP BY user_id;
In response to the comment, searching by attribute name rather than id:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
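The two steps, sparse matrix first and aggregate second, can be tried out with the sketch below. It uses Python's sqlite3 module as a stand-in for Postgres (an assumption for self-containment) and only the first two users from the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserAttribute (user_id INTEGER, attrib_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO UserAttribute VALUES (?, ?, ?)",
    [(101, 1, "valueA"), (101, 2, "valueB"),
     (102, 1, "valueC"), (102, 2, "valueD")],
)

# Step 1: sparse matrix, one row per stored attribute, NULL in the other column.
sparse = conn.execute("""
    SELECT user_id,
           CASE WHEN attrib_id = 1 THEN value END AS attrib_1_val,
           CASE WHEN attrib_id = 2 THEN value END AS attrib_2_val
    FROM UserAttribute
    ORDER BY user_id, attrib_id
""").fetchall()

# Step 2: MAX ignores NULLs, so grouping collapses each user to a single row.
pivot = conn.execute("""
    SELECT user_id,
           MAX(CASE WHEN attrib_id = 1 THEN value END) AS attrib_1_val,
           MAX(CASE WHEN attrib_id = 2 THEN value END) AS attrib_2_val
    FROM UserAttribute
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
conn.close()

print(sparse)
print(pivot)
```

MAX works here only because each user has at most one value per attribute; with several values per attribute it would silently keep the largest one.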

Starting with Postgres 9.4 you can use the simpler aggregate FILTER clause:
SELECT user_id
,MAX(value) FILTER (WHERE attrib_id = 1) AS attrib_1_val
,MAX(value) FILTER (WHERE attrib_id = 2) AS attrib_2_val
FROM UserAttribute
WHERE attrib_id IN (1,2)
GROUP BY 1;
For more than a few attributes or for top performance, look to crosstab() from the additional module tablefunc (Postgres 8.3+). Details here:
PostgreSQL Crosstab Query

What about something like this (a self-join that pairs each user's attribute 1 row with their attribute 2 row):
select ua.user_id, ua.value as attrib1_value, ua2.value as attrib2_value
from UserAttribute ua
left join UserAttribute ua2 on ua2.user_id = ua.user_id and ua2.attrib_id = 2
where ua.attrib_id = 1;
