What does filter mean? - postgresql

select driverid, count(*)
from f1db.results
where position is null
group by driverid order by driverid
My thought: first find out all the records that position is null then do the aggregate function.
select driverid, count(*) filter (where position is null) as outs
from f1db.results
group by driverid order by driverid
First time, I meet with filter clause, not sure what does it mean?
Two code block results are different.
Already googled, seems don't have many tutorials about the FILTER. Kindly share some links.

A more helpful term to search for is aggregate filters, since FILTER (WHERE) is only used with aggregates.
Filter clauses are an additional filter that is applied only to that aggregate and nowhere else. That means it doesn't affect the rest of the columns!
You can use it to get a subset of the data, like for example, to get the percentage of cats in a pet shop:
SELECT shop_id,
CAST(COUNT(*) FILTER (WHERE species = 'cat')
AS DOUBLE PRECISION) / COUNT(*) as "percentage"
FROM animals
GROUP BY shop_id
For more information, see the docs

FILTER ehm... filters the records which should be aggregated - in your case, your aggregation counts all positions that are NULL.
E.g. you have 10 records:
SELECT COUNT(*)
would return 10.
If 2 of the records have position = NULL
than
SELECT COUNT(*) FILTER (WHERE position IS NULL)
would return 2.
What you want to achieve is to count all non-NULL records:
SELECT COUNT(*) FILTER (WHERE position IS NOT NULL)
In the example, it returns 8.

Related

Selecting max value grouped by specific column

Focused DB tables:
Task:
For given location ID and culture ID, get max(crop_yield.value) * culture_price.price (let's call this multiplication monetaryGain) grouped by year, so something like:
[
{
"year":2014,
"monetaryGain":...
},
{
"year":2015,
"monetaryGain":...
},
{
"year":2016,
"monetaryGain":...
},
...
]
Attempt:
SELECT cp.price * max(cy.value) AS monetaryGain, EXTRACT(YEAR FROM cy.date) AS year
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId AND cy.id_location = :locationId AND cp.year = year
GROUP BY year
ORDER BY year
The problem:
"columns "cp.price", "cy.value" and "cy.date" must appear in the GROUP BY clause or be used in an aggregate function"
If I put these three columns in GROUP BY, I won't get expected result - It won't be grouped just by year obviously.
Does anyone have an idea on how to fix/write this query better in order to get task result?
Thanks in advance!
The fix
Rewrite monetaryGain to be:
max(cp.price * cy.value) AS monetaryGain
That way you will not be required to group by cp.price because it is not outputted as an group member, but used in aggregate.
Why?
When you write GROUP BY query you can output only columns that are in GROUP BY list and aggregate function values. Well this is expected - you expect single row per group, but you may have several distinct values for the field that is not in grouping column list.
For the same reason you can not use a non grouping column(-s) in arithmetic or any other (not aggregate) function because this would lead in several results for in single row - there would not be a way to display.
This is VERY loose explanation but I hope will help to grasp the concept.
Aliases in GROUP BY
Also you should not use aliases in GROUP BY. Use:
GROUP BY EXTRACT(YEAR FROM cy.date)
Using alias in GROUP BY is not allowed. This link might explain why: https://www.postgresql.org/message-id/7608.1259177709%40sss.pgh.pa.us

Replacing null values by average of values grouped by concatenated categories in Teradata

Suppose that I have a lot of NULL values (missing values) in a column named 'score'. I want to replace them by a specific average not from all the values of the column 'score' but by groups that I built with a crosscategory from two concatenated categories:
This kind of query works for getting averages by groups:
SELECT
category1 || ' > ' || category2 AS crosscategory,
ROUND(CAST(AVG(score) AS FLOAT), 2) AS score_avg
FROM DatabaseName.TableName
GROUP BY crosscategory
ORDER BY score_avg;
This one works to replace NULL values by a constant:
SELECT
NVL(score, 0) AS score_without_missing_values
FROM DatabaseName.TableName
The problem that I cannot solve now is how to articulate the replacement of NULL values with a constant here the averages computed with the functions AVG and GROUP BY.
Thank you very much for your help!
Seems you want a Group Average:
SELECT
t.*,
coalesce(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_avg
FROM DatabaseName.TableName AS t
I removed the ROUND/CAST, because AVG returns FLOAT by default and ROUND in probably not needed (if you need it, you might better cast to a DECIMAL).

How to get ids of grouped by rows in postgresql and use result?

I have a table containing transactions with an amount. I want to create a batch of transactions so that the sum of amount of each 'group by' is negative.
My problematic is to get all ids of the rows concerned by a 'group by' where each group is validate by a sum condition.
I find many solutions which don't work for me.
The best solution I found is to request the db a first time with the 'group by' and the sum, then return ids to finally request the db another time with all of them.
Here an example of what I would like (it doesn't work!) :
SELECT * FROM transaction_table transaction
AND transaction.id IN (
select string_agg(grouped::character varying, ',' ) from (
SELECT array_agg(transaction2.id) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
)
);
the result of the array_agg is like:
{39758,39759}
{39757,39756,39755,39743,39727,39713}
and the string_agg is :
{39758,39759},{39757,39756,39755,39743,39727,39713}
Now I just need to use them but I don't know how to...
unfortunatly, it doesn't work because of type casting :
ERROR: operator does not exist: integer = integer[]
IndiceĀ : No operator matches the given name and argument type(s). You might need to add explicit type casts.
Maybe you are looking for
SELECT id, motto, accountbnf, payment, accountclnt, amount
FROM (SELECT id, motto, accountbnf, payment, accountclnt, amount,
sum(amount)
OVER (PARTITION BY motto, accountbnf, payment, accountclnt)
AS group_total
FROM transaction_table) AS q
WHERE group_total < 0;
The inner SELECT adds an additional column using a window function that calculates the sum for each group, and the outer query removes all results where that sum is not negative.
Finally I found this option using the 'unnest' method. It works perfectly.
Array_agg bring together all ids in different array
unnest flattened all of them
This comes from here
SELECT * FROM transaction_table transaction
WHERE transaction.id = ANY(
SELECT unnest(array_agg(transaction2.id)) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
);
The problem with this solution is that hibernate doesn't take into account the array_agg method.

comprare aggregate sum function to number in postgres

I have the next query which does not work:
UPDATE item
SET popularity= (CASE
WHEN (select SUM(io.quantity) from item i NATURAL JOIN itemorder io GROUP BY io.item_id) > 3 THEN TRUE
ELSE FALSE
END);
Here I want to compare each line of inner SELECT SUM value with 3 and update popularity. But SQL gives error:
ERROR: more than one row returned by a subquery used as an expression
I understand that inner SELECT returns many values, but can smb help me in how to compare each line. In other words make loop.
When using a subquery you need to get a single row back, so you're effectively doing a query for each record in the item table.
UPDATE item i
SET popularity = (SELECT SUM(io.quantity) FROM itemorder io
WHERE io.item_id = i.item_id) > 3;
An alternative (which is a postgresql extension) is to use a derived table in a FROM clause.
UPDATE item i2
SET popularity = x.orders > 3
FROM (select i.item_id, SUM(io.quantity) as orders
from item i NATURAL JOIN itemorder io GROUP BY io.item_id)
as x(item_id,orders)
WHERE i2.item_id = x.item_id
Here you're doing a single group clause as you had, and we're joining the table to be updated with the results of the group.

PostgreSQL 9.1: Find exact expression in title

I'm looking for a way to search a specific expression - then a part of its - in all documents (and its values associated). The final order should be:
Complete expression (in title or content): using ILIKE and '%expression%'
One of the word (in title or content): using tsquery on tsvector indexes columns
I have two tables:
documents (id [integer], title [character varying], title_search [tsvector])
values (id [integer], content [character varying], content_search [tsvector], id_document [integer])
Here the request I am doing right now:
(SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE (title ILIKE(unaccent('%lorem ipsum%')) OR content ILIKE(unaccent('%lorem ipsum%'))))
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query ## title_search)
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query ## content_search)
ORDER BY rank DESC, title ASC
By doing this, I can get all documents with this expression and/or a part of its but I can't get to have those correctly ordrered. This is because I am relying on ranking with ts_rank on tsvector field which cannot be used to define exact expression.
So my questions are how I can get this to work as I expect? Am I wrong for using full text search?
Thank you.
It's a bit awkward but a solution I've used before is to include an extra "rank" column in your individual subqueries. For instance, the fist query would look like
select 1 as which_rank, id, title, ...
....
where title ILIKE(unaccent('%lorem ipsum%'))
OR content ILIKE(unaccent('%lorem ipsum%')))
then the second would be
select 2 as which_rank, id, title, ...
...
where query ## title_search
and the third would be
select 3 as which_rank, id, title, ...
...
where query ## content_search
If you include that ranking value in your sort order:
ORDER BY which_rank asc, rank DESC, title ASC
you can make sure the first case gets listed first, the second second, and the third third. You can also re-arrange which is 1, 2, 3 depending on your needs.