Why can you nest aggregate functions when using a window function in PostgreSQL? - postgresql

I'm trying to understand window functions a bit better, and I'm stumped as to why I can't run a nested aggregate function normally, but I can when using a window function.
This is the dbfiddle I'm working off of: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=76d62fcf4066053db18783e70269438c
Before running the window function, basically everything else in my query is evaluated (JOIN and GROUP BY).
So I believe the data the window function is working off of is something like this (after grouping):
Or is it something like this?
So why can I do this: SUM(COUNT(votes.option_id)) OVER(), but I can't do it without OVER()?
As far as I understand, OVER() makes the SUM(COUNT(votes.option_id)) run on this related data set, but it's still a nested aggregate function.
What am I missing?
Thank you very much!

If you have something like SUM(COUNT(votes.option_id)) OVER() you can think of COUNT(votes.option_id) as a column generated in the GROUP BY clause.
According to the documentation:
The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any.
This means that window functions operate at a level above the GROUP BY clause and any aggregates, and therefore aggregates are available to be used inside window functions. In your example the "virtual table" corresponds to the second picture.
The reason you cannot nest aggregate functions is that you cannot have multiple levels of GROUP BY on the same query. Similarly you cannot nest window functions. The documentation is clear on what type of expression are allowed inside aggregate and window functions. For aggregates functions we can use:
any value expression that does not itself contain an aggregate expression or a window function call
while for window functions we can use:
any value expression that does not itself contain window function calls

Related

Can you use GROUP BY and Window function together?

This is the code I'm trying to use
From orders as o, o.items as it
Group by o.orderno
Select o.orderno, Max(it.price) OVER (PARTITION BY it.itemno ORDER BY o.orderno) as MaxPrice;
But SQL++ returns an error which states:
Cannot resolve alias reference for undefined identifier it (in line 3, at column 23)
I'm trying to figure out if both GROUP BY and a window function can be used in the same query.
WINDOW functions and GROUP/Aggregates can be used together.
But WINDOW functions must depended on (including OVER clause expressions) constant, group expressions or aggregate functions.
No other expression is allowed. This is restriction of GROUP (The reason once you do group by, There is no way you can reduce other expression to single value with out aggregate). Example for GROUP BY restriction

PYSPARK : Finding a Mean of a variables excluding the top 1 percentile of data

I have a dataset which is getting grouped by multiple variables where we finding aggregates like mean , std dev etc. Now i want to find Mean of a variables excluding the top 1 percentile of data
I am trying something like
df_final=df.groupby(groupbyElement).agg(mean('value').alias('Mean'),stddev('value').alias('Stddev'),expr('percentile(value, array(0.99))')[0].alias('99_percentile'),mean(when(col('value')<=col('99_percentile'),col('value')))
But it seems spark cannot use the agg name which is defined in the same group statement.
I even tried this ,
~df_final=df.groupby(groupbyElement).agg(mean('value').alias('Mean'),stddev('value').alias('Stddev'),mean(when(col('value')<=expr('percentile(value, array(0.99))')[0],col('value')))~
But it throws below error:
pyspark.sql.utils.AnalysisException: 'It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.
I hope some one would be able to answer this
Update :
I try doing the otherway
Here's a straightforward modification of your code. It will aggregate df twice. As far as I can tell, that's what is required.
df_final=(
df.join(df
.groupby(groupbyElement)
.agg(expr('percentile(value, array(0.99))')[0].alias('99_percentile'),
on=["groupbyElement"], how="left"
)
.groupby(groupbyElement)
.agg(mean('value').alias('Mean'),
stddev('value').alias('Stddev'),
mean(when(col('value')<=col('99_percentile'), col('value')))
)

SSRS multi value parameter - can't get it to work

First off this is my first attempt at a multi select. I've done a lot of searching but I can't find the answer that works for me.
I have a postgresql query which has bg.revision_key in (_revision_key) which holds the parameter. A side note, we've named all our parameters in the queries with the underscore and they all work, they are single select in SSRS.
In my SSRS report I have a parameter called Revision Key Segment which is the multi select parameter. I've ticked Allow multi value and in Available Values I have value field pointing to revision_key in the dataset.
In my dataset parameter options I have Parameter Value [#revision_key]
In my shared dataset I also have my parameter set to Allow multi value.
For some reason I can't seem to get the multi select to work so I must be missing something somewhere but I've ran out of ideas.
Unlike with SQL Server, when you connect to a database using an ODBC connection, the parameter support is different. You cannot use named parameters and instead have to use the ? syntax.
In order to accommodate multiple values you can concatenate them into a single string and use a like statement to search them. However, this is inefficient. Another approach is to use a function to split the values into an in-line table.
In PostgreSQL you can use an expression like this:
inner join (select CAST(regexp_split_to_table(?, ',') AS int) as filter) as my on my.filter = key_column
Then in the dataset properties, under the parameters tab, use an expression like this to concatenate the values:
=Join(Parameters!Keys.Value, ",")
In other words, the report is concatenating the values into a comma-separated list. The database is splitting them into a table of integers then inner joining on the values.

Tableau: Create a table calculation that sums distinct string values (names) when condition is met

I am getting my data from denormalized table, where I keep names and actions (apart from other things). I want to create a calculated field that will return sum of workgroup names but only when there are more than five actions present in DB for given workgroup.
Here's how I have done it when I wanted to check if certain action has been registered for workgroup:
WINDOW_SUM(COUNTD(IF [action] = "ADD" THEN [workgroup_name] END))
When I try to do similar thing with count, I am getting "Cannot mix aggregate and non-aggregate arguments":
WINDOW_SUM(COUNTD(IF COUNT([Number of Records]) > 5 THEN [workgroup_name] END))
I know that there's problem with the IF clause, but don't know how to fix it.
How to change the IF to be valid? Maybe there's an easier way to do it, that I am missing?
EDIT:
(after Inox's response)
I know that my problem is mixing aggregate with non-aggregate fields. I can't use filter to do it, because I want to use it later as a part of more complicated view - filtering would destroy the whole idea.
No, the problem is to mix aggregated arguments (e.g., sum, count) with non aggregate ones (e.g., any field directly). And that's what you're doing mixing COUNT([Number of Records]) with [workgroup_name]
If your goal is to know how many workgroup_name (unique) has more than 5 records (seems like that by the idea of your code), I think it's easier to filter then count.
So first you drag workgroup_name to Filter, go to tab conditions, select By field, Number of Records, Count, >, 5
This way you'll filter only the workgroup_name that has more than 5 records.
Now you can go with a simple COUNTD(workgroup_name)
EDIT: After clarification
Okay, than you need to add a marker that is fixed in your database. So table calculations won't help you.
By definition table calculation depends on the fields that are on the worksheet (and how you decide to use those fields to partition or address), and it's only calculated AFTER being called in a sheet. That way, each time you call the function it will recalculate, and for some analysis you may want to do, the fields you need to make the table calculation correct won't be there.
Same thing applies to aggregations (counts, sums,...), the aggregation depends, well, on the level of aggregation you have.
In this case it's better that you manipulate your data prior to connecting it to Tableau. I don't see a direct way (a single calculated field that would solve your problem). What can be done is to generate a db from Tableau (with the aggregation of number of records for each workgroup_name) then export it to csv or mdb and then reconnect it to Tableau. But if you can manipulate your database outside Tableau, it's usually a better solution

ormlite select count(*) as typeCount group by type

I want to do something like this in OrmLite
SELECT *, COUNT(title) as titleCount from table1 group by title;
Is there any way to do this via QueryBuilder without the need for queryRaw?
The documentation states that the use of COUNT() and the like necessitates the use of selectRaw(). I hoped for a way around this - not having to write my SQL as strings is the main reason I chose to use ORMLite.
http://ormlite.com/docs/query-builder
selectRaw(String... columns):
Add raw columns or aggregate functions
(COUNT, MAX, ...) to the query. This will turn the query into
something only suitable for using as a raw query. This can be called
multiple times to add more columns to select. See section Issuing Raw
Queries.
Further information on the use of selectRaw() as I was attempting much the same thing:
Documentation states that if you use selectRaw() it will "turn the query into" one that is supposed to be called by queryRaw().
What it does not explain is that normally while multiple calls to selectColumns() or selectRaw() are valid (if you exclusively use one or the other),
use of selectRaw() after selectColumns() has a 'hidden' side-effect of wiping out any selectColumns() you called previously.
I believe that the ORMLite documentation for selectRaw() would be improved by a note that its use is not intended to be mixed with selectColumns().
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectColumns("emailAddress"); // This column is not selected due to later use of selectRaw()!
qb.selectRaw("COUNT (emailAddress)");
ORMLite examples are not as plentiful as I'd like, so here is a complete example of something that works:
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectRaw("emailAddress"); // This can also be done with a single call to selectRaw()
qb.selectRaw("COUNT (emailAddress)");
qb.groupBy("emailAddress");
GenericRawResults<String[]> rawResults = qb.queryRaw(); // Returns results with two columns
Is there any way to do this via QueryBuilder without the need for queryRaw(...)?
The short answer is no because ORMLite wouldn't know what to do with the extra count value. If you had a Table1 entity with a DAO definition, what field would the COUNT(title) go into? Raw queries give you the power to select various fields but then you need to process the results.
With the code right now (v5.1), you can define a custom RawRowMapper and then use the dao.getRawRowMapper() method to process the results for Table1 and tack on the titleCount field by hand.
I've got an idea how to accomplish this in a better way in ORMLite. I'll look into it.