I have this table:
| user | Mark | Points |
|--------------|------------|----------|
| John | 0 | 2 |
| Paul | 5 | 3 |
| John | 4 | 4 |
| Paul | 7 | 5 |
I would like to build a query with a single SELECT statement that returns the rows shown below.
Avg(Mark) should be the average of Mark over rows where Mark > 0 only.
Sum(Points) should be the sum over all records.
| user | Avg(Mark) | Sum(Points) |
|--------------|------------|-------------|
| John | 4 | 6 |
| Paul | 6 | 8 |
Can anyone point me to the proper syntax?
I believe it should look something like:
select user, avg(Mark>0), sum(Points) from Table group by user;
Starting with version 9.4, PostgreSQL directly supports filtering aggregates.
https://www.postgresql.org/docs/9.4/static/sql-expressions.html
If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the aggregate function; other rows are discarded.
By using it, your example can be rewritten as:
SELECT
    "user",
    AVG(mark) FILTER (WHERE mark > 0),
    SUM(points)
FROM
    "table"
GROUP BY
    "user";
How about:
select "user",
       avg(case when mark > 0 then mark end),
       sum(points)  -- the question asks for the sum of Points; avg() simply skips the NULLs produced by the CASE
from ...
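Filled in against the sample table, a runnable sketch (assuming the table is named marks, which is my stand-in name; "user" stays quoted since it is a reserved word):

select "user",
       avg(case when mark > 0 then mark end) as avg_mark,
       sum(points) as sum_points
from marks
group by "user";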
select
    "user", -- very bad choice for a column name, but I assume it's just an SO example, not a real column
    sum(mark) / count(nullif(mark, 0))
from
    "table"
group by
    "user"
should do the trick.
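This works because rows with mark = 0 add nothing to the sum, while nullif(mark, 0) turns them into NULLs that count() skips, so the division is effectively an average over the non-zero marks. One caveat: if mark is an integer column the division truncates, so a cast may be needed (a sketch, again using the hypothetical table name marks):

select "user",
       sum(mark)::numeric / count(nullif(mark, 0)) as avg_mark,
       sum(points) as sum_points
from marks
group by "user";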
I have a database full of accounting journals. There is a table for the accounting journal itself (the journal's metadata) and a table for the accounting journal lines (one line per account, with its debit or credit). The data looks like this:
+----+--------------+-------+--------+
| ID | JOURNAL_NAME | DEBIT | CREDIT |
+----+--------------+-------+--------+
| 1  | INV/0001     | 100   | 0      |
| 2  | INV/0001     | 0     | 100    |
| 3  | INV/0002     | 200   | 0      |
| 4  | INV/0002     | 0     | 200    |
+----+--------------+-------+--------+
I want all journals with the same name to be summed into one row, debits and credits alike. So from the above table, I want a query that produces something like this:
+--------------+-------+--------+
| JOURNAL_NAME | DEBIT | CREDIT |
+--------------+-------+--------+
| INV/0001     | 100   | 100    |
| INV/0002     | 200   | 200    |
+--------------+-------+--------+
I have tried with:
SELECT DISTINCT ON (accounting_journal.id)
    accounting_journal.name,
    accounting_journal_line.debit,
    accounting_journal_line.credit
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
ORDER BY accounting_journal.id ASC
LIMIT 3;
With the above query, I get all the journals and journal lines. I just need the query to sum the debits and credits for every distinct accounting_journal.name. I have tried with SUM(), but I always get stuck on the GROUP BY clause:
SELECT DISTINCT ON (accounting_journal.id)
    accounting_journal.name,
    accounting_journal.ref,
    accounting_journal_line.name,
    SUM(accounting_journal_line.debit),
    SUM(accounting_journal_line.credit)
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
ORDER BY accounting_journal.id ASC
LIMIT 3;
The error:
Error in query (7): ERROR: column "accounting_journal.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: accounting_journal.name,
I hope I can get assistance or a pointer on where to look. Thanks!
When you use an aggregate function alongside normal columns, you have to list all the non-aggregated columns in the GROUP BY clause. So try this:
-- DISTINCT ON (accounting_journal.id) is dropped here: id is neither grouped
-- nor aggregated, so keeping it would raise the same error
SELECT
    accounting_journal.name,
    accounting_journal.ref,
    accounting_journal_line.name,
    SUM(accounting_journal_line.debit),
    SUM(accounting_journal_line.credit)
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
GROUP BY 1, 2, 3
ORDER BY 1 ASC
LIMIT 3;
Your query has three non-aggregated columns, so you can refer to them by column number in the GROUP BY clause, as above.
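Note that grouping by the line name as well still yields one row per journal line. To get exactly the two-row output shown in the question, a sketch that aggregates by journal name alone:

SELECT accounting_journal.name AS journal_name,
       SUM(accounting_journal_line.debit)  AS debit,
       SUM(accounting_journal_line.credit) AS credit
FROM accounting_journal_line
JOIN accounting_journal ON accounting_journal.id = accounting_journal_line.move_id
GROUP BY accounting_journal.name
ORDER BY accounting_journal.name;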
You can use the Sum Window Function, it does not require "group by". So:
select aj.id journal_id,
       aj.name journal_name,
       aj.ref journal_ref,
       ajl.name line_name,
       sum(ajl.debit) over (partition by aj.id) total_debit,
       sum(ajl.credit) over (partition by aj.id) total_credit
from accounting_journal_line ajl
join accounting_journal aj
  on aj.id = ajl.move_id
order by aj.id;
See fiddle for a working example.
I have a table of users with a column called order that represents the order in which they will be elected.
So, for example, the table might look like:
| id | name | order |
|-----|--------|-------|
| 1 | John | 2 |
| 2 | Mike | 0 |
| 3 | Lisa | 1 |
Say that Lisa now gets destroyed. I would like, in the same transaction that destroys Lisa, to update the table so that the order stays consistent. The expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 2 | Mike | 0 |
Or, if Mike were the one to be deleted, the expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 3 | Lisa | 0 |
How can I do this in PostgreSQL?
If you are just deleting one row, one option uses a CTE and the RETURNING clause to drive a follow-up update (the column is called ord here, since order is a reserved word; see the quoted variant below):
with del as (
    delete from mytable where name = 'Lisa'
    returning ord
)
update mytable
set ord = ord - 1
from del d
where mytable.ord > d.ord;
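If the column really is named order, the same statement just needs double quotes around it (a sketch; order cannot appear unquoted because it is a reserved word):

with del as (
    delete from mytable where name = 'Lisa'
    returning "order"
)
update mytable
set "order" = "order" - 1
from del d
where mytable."order" > d."order";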
As a more general approach, I would not recommend renumbering the whole table after every delete: it is inefficient, and it gets tedious for multi-row deletes.
Instead, you could build a view on top of the table:
create view myview as
select id, name, row_number() over (order by ord) - 1 as ord  -- -1 preserves the question's 0-based numbering
from mytable;
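With the view in place, a delete needs no follow-up update at all, since ord is recomputed on every read. A quick usage sketch against the sample data:

delete from mytable where name = 'Lisa';
select * from myview;  -- Mike keeps 0, John drops from 2 to 1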
I am trying to write an SQL query that returns the latest data value for each distinct tag in my table.
Currently, I select the distinct values of the tag column and then iterate through them programmatically, ordering by timestamp and limiting to one row each time. The tags can be any number and may not always be posted together (one time only tag 1 may be posted, whereas other times tags 1, 2, and 3 may).
Although it gives the expected outcome, this seems inefficient in a lot of ways, and because I don't have much SQL experience, it was so far the only way I found of performing the task...
----------------------------------
| name | tag | timestamp | data  |
----------------------------------
| aa   | 1   | 566       | 4659  |
| ab   | 2   | 567       | 4879  |
| ac   | 3   | 568       | 1346  |
| ad   | 1   | 789       | 3164  |
| ae   | 2   | 789       | 1024  |
| af   | 3   | 790       | 3346  |
----------------------------------
Therefore the expected outcome is {3164, 1024, 3346}
Currently what I'm doing is:
"select distinct tag from table"
Then I store the distinct tag values and iterate through them programmatically, using
"select data from table where '"+ tags[i] +"' in (tag) order by timestamp desc limit 1"
Thanks,
This comes close, but beware: if two rows with the same tag share the maximum timestamp, you will get duplicates in the result set.
select data
from table
join (
    select tag, max(timestamp) maxtimestamp
    from table t1
    group by tag
) as latesttags
  on table.tag = latesttags.tag
 and table.timestamp = latesttags.maxtimestamp
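In PostgreSQL specifically, DISTINCT ON avoids both the self-join and the duplicate problem, since it keeps exactly one row per tag (a sketch, assuming the table is named readings):

select distinct on (tag) tag, data
from readings
order by tag, timestamp desc;  -- per tag, the row with the latest timestamp wins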
I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max values are grouped by month and day (with a GROUP BY clause, which works, by the way), but somehow I got stuck on joining the timestamps for the max values.
EDIT
Cross joins are a good idea, but I want the maxima grouped by month, e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join with two subqueries: the first selects all records; the second gets one row representing the time_stamp of the max value, <3;"1000-01-01"> for example.
SELECT col_value, col_timestamp, max_col_value, col_timestamp_of_max_value
FROM table1
CROSS JOIN (
    SELECT max(col_value) max_col_value, col_timestamp col_timestamp_of_max_value
    FROM table1
    GROUP BY col_timestamp
    ORDER BY max_col_value DESC
    LIMIT 1
) A -- one row that represents the time_stamp of the max value, i.e. <3;"1000-01-01">
Use a window function, since you are using PostgreSQL:
Select *, max(value) over (), max(timestamp) over () from table
That gives you the overall max value and the overall max timestamp, repeated on every row.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
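Note that max(timestamp) over () is the latest timestamp overall, not the timestamp of the max value. To pair the maximum with its own timestamp, as the question asks, a first_value() sketch (reusing the column names from the cross-join answer):

SELECT col_value,
       col_timestamp,
       MAX(col_value) OVER () AS max_col_value,
       FIRST_VALUE(col_timestamp) OVER (ORDER BY col_value DESC) AS col_timestamp_of_max_value
FROM table1;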
I am using PostgreSQL 9.1.9.
In the project I am working on, some of the most recent records have null columns because that information was not available when those rows were created. I have a view that lists sums over the rows belonging to the members of a group. As of right now, the view sums the most recent column values, using nulls if those are the most recent. For example,
table1
group_name | member
-------------------
group1 | Andy
group1 | Bob
table2
name | stat_date | col1 | col2 | col3
--------------------------------------
Andy | 6/19/13 | null | 1 | 2
Andy | 6/18/13 | 100 | 3 | 5
Bob | 6/19/13 | 50 | 9 | 12
Bob | 6/18/13 | 111 | 31 | 51
-- creating the view would be something like this...
create view v_grouped as
select table1.group_name, table2.stat_date,
       sum(col1) as col1_sum, sum(col2) as col2_sum, sum(col3) as col3_sum
from table1
join table2 on table1.member = table2.name
group by table1.group_name, table2.stat_date;
The current view output looks like this:
group_name | stat_date | col1_sum | col2_sum | col3_sum
-------------------------------------------------------
group1     | 6/19/13   | 50       | 10       | 14
group1     | 6/18/13   | 211      | 34       | 56
Instead of 50, 150 would be a closer representation of the actual group total (Andy's most recent non-null col1 is 100, plus Bob's 50), despite the lack of data for 6/19. So, I want an output of
group_name | stat_date | col1_sum | col2_sum | col3_sum
-------------------------------------------------------
group1     | 6/19/13   | 150      | 10       | 14
group1     | 6/18/13   | 211      | 34       | 56
I've been looking at first_value() from the window functions as a possible fit. I found that Oracle's first_value() supports an IGNORE NULLS option, which I believe will do what I want (http://psoug.org/definition/FIRST_VALUE.htm). According to that page, about PL/SQL's first_value() function:
If the first value in the result set is NULL then the function returns NULL unless you specify IGNORE NULLS.
If you use the IGNORE NULLS parameter then FIRST_VALUE will return the first non-null value found in the result set. (If all values are null then it will return NULL.)
Example Syntax: FIRST_VALUE(expression [IGNORE NULLS]) OVER (analytic_clause)
But PostgreSQL's first_value() does not support such an option. Is there a way to do this in PostgreSQL? Thank you in advance!
You can use this custom aggregate as a Postgres variant of FIRST_VALUE(expression IGNORE NULLS), or build your own aggregate with the desired behavior.
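In case that link goes stale, a minimal first-non-null aggregate takes only a few lines (a sketch; the names first_nonnull_sfunc and first_nonnull are mine):

-- The state starts as NULL and sticks once it becomes non-null,
-- so the aggregate returns the first non-null input in the given order.
CREATE FUNCTION first_nonnull_sfunc(state anyelement, value anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE
AS $$ SELECT COALESCE(state, value) $$;

CREATE AGGREGATE first_nonnull(anyelement) (
    SFUNC = first_nonnull_sfunc,
    STYPE = anyelement
);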
Is this what you are trying to describe?
SELECT sum(col1), sum(col2), sum(col3) FROM table2 WHERE col1 IS NOT NULL
(although I omitted the join on table1; that is an exercise for the reader)
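To produce the output the question actually asks for (each member's most recent non-null value, summed per group), the custom aggregate sketched above can be combined with that join; a sketch, dropping the stat_date column for brevity:

SELECT t1.group_name,
       SUM(latest.col1_last) AS col1_sum,
       SUM(latest.col2_last) AS col2_sum,
       SUM(latest.col3_last) AS col3_sum
FROM table1 t1
JOIN (
    -- per member, take the first non-null value scanning from newest to oldest
    SELECT name,
           first_nonnull(col1 ORDER BY stat_date DESC) AS col1_last,
           first_nonnull(col2 ORDER BY stat_date DESC) AS col2_last,
           first_nonnull(col3 ORDER BY stat_date DESC) AS col3_last
    FROM table2
    GROUP BY name
) latest ON latest.name = t1.member
GROUP BY t1.group_name;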