If I can't use aggregate in a where clause, how to get results - postgresql

OK, I have a query where I need to omit a result if the first value of an array_agg is 'natural', so I thought I could do this:
select
visitor_id,
array_agg(code
order by session_start) codes_array
from mark_conversion_sessions
where conv_visit_num2 < 2
and max_conv = 1
and (array_agg(code
order by session_start))[1] != 'natural'
group by visitor_id
But when I run this I get the error:
ERROR: aggregate functions are not allowed in WHERE
LINE 31: and (array_agg(code
So is there a way I can reference that array_agg in the where clause?
Thank you

The HAVING clause acts like a WHERE clause on grouped data. Move the criteria that use aggregates into the HAVING clause, e.g.:
select
visitor_id,
array_agg(code order by session_start) codes_array
from mark_conversion_sessions
where
conv_visit_num2 < 2
and max_conv = 1
group by visitor_id
having
(array_agg(code order by session_start))[1] != 'natural'
docs:
https://www.postgresql.org/docs/9.6/static/tutorial-agg.html
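To see the WHERE/HAVING split in something runnable, here is a minimal sketch using Python's built-in sqlite3 module. It is not the asker's setup: SQLite stands in for PostgreSQL, the `sessions` table and its rows are made up, and `count(*)` stands in for `array_agg`, which SQLite does not have. The rule it shows is the same: an aggregate is rejected in WHERE and accepted in HAVING.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, count(*) stands in for array_agg,
# and the sessions table is invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (visitor_id INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "natural"), (1, "paid"), (2, "paid"),
     (3, "natural"), (3, "email"), (3, "paid")],
)

# An aggregate in WHERE is rejected: WHERE runs before rows are grouped.
try:
    conn.execute(
        "SELECT visitor_id FROM sessions WHERE count(*) > 1 GROUP BY visitor_id"
    ).fetchall()
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# The same condition in HAVING runs after grouping, once per visitor_id.
rows = conn.execute(
    "SELECT visitor_id, count(*) FROM sessions "
    "WHERE code != 'email' "   # row-level filter, applied before grouping
    "GROUP BY visitor_id "
    "HAVING count(*) > 1 "     # group-level filter, applied after grouping
    "ORDER BY visitor_id"
).fetchall()

print(where_error)
print(rows)
```

Row-level conditions such as `conv_visit_num2 < 2` stay in WHERE, where they reduce the rows before any grouping work is done.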

Related

How to reference a column in the select clause in the order clause in SQLAlchemy like you do in Postgres instead of repeating the expression twice

In Postgres, if one of your columns is a big complicated expression, you can just say ORDER BY 3 DESC, where 3 is the position of that column in the select list. Is there any way to do this in SQLAlchemy?
As Gord Thompson observes in this comment, you can pass the column index as a text object to group_by or order_by:
q = sa.select(sa.func.count(), tbl.c.user_id).group_by(sa.text('2')).order_by(sa.text('2'))
serialises to
SELECT count(*) AS count_1, posts.user_id
FROM posts GROUP BY 2 ORDER BY 2
There are other techniques that don't require re-typing the expression.
You could use the selected_columns property:
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3)
q = q.order_by(q.selected_columns[2]) # order by col3
You could also order by a label (but this will affect the names of result columns):
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3.label('c')).order_by('c')
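For reference, this is what the positional form does at the SQL level. The sketch below deliberately avoids SQLAlchemy and runs raw SQL through Python's built-in sqlite3 module; the `posts` table contents are made up, and SQLite is only a stand-in for Postgres. It checks that ordering and grouping by position gives the same result as spelling the expression out.

```python
import sqlite3

# Assumption: invented posts rows; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (user_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (1, "e"), (2, "f")],
)

# GROUP BY 1 / ORDER BY 2 refer to select-list positions,
# so the expression count(*) * 10 is written only once.
by_position = conn.execute(
    "SELECT user_id, count(*) * 10 FROM posts GROUP BY 1 ORDER BY 2 DESC"
).fetchall()

# The same query with every expression repeated in full.
spelled_out = conn.execute(
    "SELECT user_id, count(*) * 10 FROM posts "
    "GROUP BY user_id ORDER BY count(*) * 10 DESC"
).fetchall()

print(by_position)
```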

Druid SQL: filter on result of expression

I have HTTP access log data in a Druid data source, and I want to see access patterns based on certain identifiers in the URL path. I wrote this query, and it works fine:
select regexp_extract(path, '/id/+([0-9]+)', 1) as "id",
sum("count") as "request_count"
from "access-logs"
where __time >= timestamp '2022-01-01'
group by 1
The only problem is that not all requests match that pattern, so I get one row in the result with an empty "id". I tried adding an extra condition in the where clause:
select regexp_extract(path, '/id/+([0-9]+)', 1) as "id",
sum("count") as "request_count"
from "access-logs"
where __time >= timestamp '2022-01-01' and "id" != ''
group by 1
But when I do that, I get this error message:
Error: Plan validation failed: org.apache.calcite.runtime.CalciteContextException:
From line 4, column 46 to line 4, column 49: Column 'id' not found in any table
So it doesn't let me reference the result of the expression in the where clause. I could of course just copy the entire regexp_extract expression, but is there a cleaner way of doing this?
Since "id" is a select-list alias for the grouping expression rather than a column of the table, WHERE cannot see it; you would need a HAVING clause to filter on it.
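Another common way to avoid copying the expression is to compute it once in a derived table and filter on the alias in the outer query. The sketch below shows the shape of that query with Python's built-in sqlite3 module; it has not been run against Druid. The `access_logs` table, its rows, and the `EXTRACT_ID` expression (a crude stand-in for `regexp_extract`, which SQLite lacks) are all assumptions made for illustration.

```python
import sqlite3

# Assumption: invented log rows; SQLite stands in for Druid SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_logs (path TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO access_logs VALUES (?, ?)",
    [("/id/42", 3), ("/id/42", 2), ("/id/7", 1), ("/health", 9)],
)

# Hypothetical stand-in for regexp_extract(path, '/id/+([0-9]+)', 1):
# the text after '/id/', or '' when the path does not start with '/id/'.
EXTRACT_ID = "CASE WHEN path LIKE '/id/%' THEN substr(path, 5) ELSE '' END"

# The inner query names the expression once; the outer WHERE can then use
# the alias, because for the outer query "id" is an ordinary column.
rows = conn.execute(
    f"SELECT id, sum(cnt) AS request_count "
    f"FROM (SELECT {EXTRACT_ID} AS id, cnt FROM access_logs) AS extracted "
    f"WHERE id != '' "
    f"GROUP BY id ORDER BY id"
).fetchall()

print(rows)
```

The ids are text here, so they sort as strings ('42' before '7').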

What does filter mean?

select driverid, count(*)
from f1db.results
where position is null
group by driverid order by driverid
My understanding: this first finds all the records where position is null, then applies the aggregate function.
select driverid, count(*) filter (where position is null) as outs
from f1db.results
group by driverid order by driverid
This is the first time I have come across the FILTER clause, and I'm not sure what it means.
The two queries return different results.
I have already googled it, but there don't seem to be many tutorials about FILTER. Kindly share some links.
A more helpful term to search for is aggregate filters, since FILTER (WHERE) is only used with aggregates.
Filter clauses are an additional filter that is applied only to that aggregate and nowhere else. That means it doesn't affect the rest of the columns!
You can use it to get a subset of the data, like for example, to get the percentage of cats in a pet shop:
SELECT shop_id,
CAST(COUNT(*) FILTER (WHERE species = 'cat')
AS DOUBLE PRECISION) / COUNT(*) as "percentage"
FROM animals
GROUP BY shop_id
For more information, see the docs
FILTER ehm... filters the records which should be aggregated - in your case, your aggregation counts all positions that are NULL.
E.g. you have 10 records:
SELECT COUNT(*)
would return 10.
If 2 of the records have a NULL position, then
SELECT COUNT(*) FILTER (WHERE position IS NULL)
would return 2.
What you want to achieve is to count all non-NULL records:
SELECT COUNT(*) FILTER (WHERE position IS NOT NULL)
In the example, it returns 8.
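Why the two queries in the question return different results can be seen on a tiny data set. This is a sketch using Python's built-in sqlite3 module with made-up rows, not the f1db data; SQLite needs version 3.30 or later for FILTER, which is assumed here. WHERE removes rows before grouping, so drivers with no NULL position vanish entirely; FILTER only restricts what the one aggregate counts, so those drivers stay with a count of 0.

```python
import sqlite3

# Assumption: invented results rows; SQLite (3.30+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (driverid INTEGER, position INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(1, 1), (1, None), (1, None), (2, 3), (2, 5), (3, None)],
)

# WHERE: rows are discarded first, so driver 2 has no rows left to group.
with_where = conn.execute(
    "SELECT driverid, count(*) FROM results "
    "WHERE position IS NULL GROUP BY driverid ORDER BY driverid"
).fetchall()

# FILTER: every driver is grouped; only the counted rows are restricted.
with_filter = conn.execute(
    "SELECT driverid, count(*) FILTER (WHERE position IS NULL) FROM results "
    "GROUP BY driverid ORDER BY driverid"
).fetchall()

# Equivalent spelling for engines without FILTER.
with_case = conn.execute(
    "SELECT driverid, sum(CASE WHEN position IS NULL THEN 1 ELSE 0 END) "
    "FROM results GROUP BY driverid ORDER BY driverid"
).fetchall()

print(with_where)
print(with_filter)
```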

array_agg DISTINCT and ORDER

I'm trying to write a PostgreSQL query that includes results from 2 (or more) tables using LEFT JOIN LATERAL. I need one row for each record in table entidad_a_ (the main table), with all the matching records from table entidad_b_ collected into a single field generated by array_agg. In this array I have to remove duplicate elements while preserving the order of the array stored in the main table.
I need to execute this SQL query:
SELECT entidad_a_._id_ AS "_id", CASE WHEN count(entidadB) > 0 THEN array_agg(DISTINCT entidadB._id,ordinality order by ordinality)
ELSE NULL END AS "entidadB"
FROM entidad_a_ as entidad_a_, unnest(entidad_a_.entidad_b_) WITH ORDINALITY AS u(entidadb_id, ordinality)
LEFT JOIN LATERAL (
SELECT entidad_b_3._id_ AS "_id", entidad_b_3.label_ AS "label"
FROM entidad_b_ as entidad_b_3
WHERE entidad_b_3._id_ = entidadb_id
GROUP BY entidad_b_3._id_
LIMIT 1000 OFFSET 0
) entidadB ON TRUE
GROUP BY entidad_a_._id_
LIMIT 1000 OFFSET 0
But I get errors.
How can I have these results?
Edited:
My error is:
ERROR: function array_agg (integer, bigint) does not exist
SQL state: 42883
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Character: 69
If the query is:
......array_agg (DISTINCT entidadB._id order by ordinality).....
The error is:
ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
SQL state: 42P10
Character: 110
My problem is the combination of array_agg, DISTINCT, and ORDER by
Solved!! I've created a postgres extension with a custom aggregation.
CREATE AGGREGATE array_agg_dist (anyelement)
(
sfunc = array_agg_transfn_dist,
stype = internal,
finalfunc = array_agg_finalfn_dist,
finalfunc_extra
);
I also wrote the functions and the C code behind this custom aggregate.
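A plain-SQL alternative that avoids a C extension is to split the work in two: first deduplicate with GROUP BY while keeping each element's first position (`min(ordinality)`), then order by that position. The sketch below demonstrates the two steps with Python's built-in sqlite3 module on made-up data; the `unnested` table is a hypothetical stand-in for the output of `unnest(...) WITH ORDINALITY`, and the final array is assembled in Python because SQLite has no `array_agg`. In PostgreSQL the outer step would presumably become `array_agg(b_id ORDER BY first_pos)` with `GROUP BY a_id`, but that has not been tested against the schema in the question.

```python
import sqlite3

# Assumption: "unnested" mimics unnest(...) WITH ORDINALITY; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unnested (a_id INTEGER, b_id INTEGER, ordinality INTEGER)"
)
conn.executemany(
    "INSERT INTO unnested VALUES (?, ?, ?)",
    [(1, 30, 1), (1, 10, 2), (1, 30, 3), (1, 20, 4), (2, 10, 1), (2, 10, 2)],
)

# Step 1 (inner query): one row per (a_id, b_id), remembering where the
# element first appeared. Step 2 (outer query): order by that first position.
rows = conn.execute(
    "SELECT a_id, b_id FROM ("
    "  SELECT a_id, b_id, min(ordinality) AS first_pos"
    "  FROM unnested GROUP BY a_id, b_id"
    ") AS deduped ORDER BY a_id, first_pos"
).fetchall()

# Collect the ordered, de-duplicated ids per a_id (array_agg's role).
arrays = {}
for a_id, b_id in rows:
    arrays.setdefault(a_id, []).append(b_id)

print(arrays)
```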

Compare aggregate sum function to a number in Postgres

I have the following query, which does not work:
UPDATE item
SET popularity= (CASE
WHEN (select SUM(io.quantity) from item i NATURAL JOIN itemorder io GROUP BY io.item_id) > 3 THEN TRUE
ELSE FALSE
END);
Here I want to compare the SUM value in each row of the inner SELECT with 3 and update popularity. But SQL gives an error:
ERROR: more than one row returned by a subquery used as an expression
I understand that the inner SELECT returns many values, but can somebody help me compare each row? In other words, make a loop.
A subquery used as an expression must return a single row, so correlate it with the row being updated; you are then effectively running one query for each record in the item table.
UPDATE item i
SET popularity = (SELECT SUM(io.quantity) FROM itemorder io
WHERE io.item_id = i.item_id) > 3;
An alternative (which is a postgresql extension) is to use a derived table in a FROM clause.
UPDATE item i2
SET popularity = x.orders > 3
FROM (select i.item_id, SUM(io.quantity) as orders
from item i NATURAL JOIN itemorder io GROUP BY i.item_id)
as x(item_id,orders)
WHERE i2.item_id = x.item_id
Here you're running a single grouped query as you had, and we're joining the table to be updated with the results of the grouping.
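The correlated form can be tried out with Python's built-in sqlite3 module. This is a sketch on made-up tables, with SQLite standing in for PostgreSQL, so booleans appear as 1 and 0. One detail it adds that is not in the answer above: for an item with no orders, SUM returns NULL, and `NULL > 3` is NULL rather than false, so the sketch wraps the sum in `coalesce(..., 0)`. Whether NULL or FALSE is wanted for such items is an assumption.

```python
import sqlite3

# Assumption: invented item/itemorder rows; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, popularity INTEGER)")
conn.execute("CREATE TABLE itemorder (item_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO item (item_id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO itemorder VALUES (?, ?)", [(1, 2), (1, 5), (2, 1)])

# The subquery is correlated on item.item_id, so it yields exactly one value
# per updated row. coalesce turns "no orders" (NULL sum) into 0.
conn.execute(
    "UPDATE item SET popularity = ("
    "  SELECT coalesce(sum(io.quantity), 0) FROM itemorder AS io"
    "  WHERE io.item_id = item.item_id"
    ") > 3"
)

rows = conn.execute(
    "SELECT item_id, popularity FROM item ORDER BY item_id"
).fetchall()
print(rows)
```

Item 3 has no orders and ends up with 0 here; without the coalesce it would be left as NULL.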