I need to iterate through a table column and, for each value, execute a simple SELECT statement.
I get the result table with the following statement:
SELECT event_id, count(event_id) as occurence
FROM event
GROUP BY event_id
ORDER BY occurence DESC
LIMIT 50
Output:
event_id | occurence
---------------------
1234567 | 56678
8901234 | 86753
For each event_id from the output table I need to execute a SELECT statement like:
SELECT * FROM event WHERE event_id = 'event_id from result row'
Expected output:
event_id | event_type | event_time
----------------------------
1234567 | ....... | .......
1234567 | ....... | .......
8901234 | ....... | .......
8901234 | ....... | .......
In other words: I need to get the 50 most frequently occurring event_ids from the event table and then retrieve all available data for those specific events.
How can I achieve that?
There are probably a few ways to handle this, but here is one:
SELECT a.*, b.event_type, b.event_time
FROM
(
SELECT event_id, count(event_id) as occurence
FROM event
GROUP BY event_id
ORDER BY occurence DESC
LIMIT 50
) a
JOIN event b ON (b.event_id = a.event_id)
;
Instead of specific columns from what I called 'b' you could select b.* for all columns.
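To see the join-back behaviour on a small data set, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL, and the sample rows and the TOP_N value are made up for illustration; the query itself has the same shape as the one above.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (event_id INTEGER, event_type TEXT, event_time TEXT)")

# Made-up sample data: event 1 occurs 3 times, event 2 twice, event 3 once.
rows = [
    (1, "click", "t1"), (1, "click", "t2"), (1, "view", "t3"),
    (2, "view", "t4"), (2, "view", "t5"),
    (3, "click", "t6"),
]
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)

TOP_N = 2  # the question uses 50

query = """
SELECT b.event_id, b.event_type, b.event_time, a.occurrence
FROM (
    SELECT event_id, COUNT(*) AS occurrence
    FROM event
    GROUP BY event_id
    ORDER BY occurrence DESC
    LIMIT ?
) a
JOIN event b ON b.event_id = a.event_id
ORDER BY a.occurrence DESC, b.event_time
"""
result = conn.execute(query, (TOP_N,)).fetchall()

for row in result:
    print(row)
```

The subquery picks the ids first, so the LIMIT applies to event_ids and not to the detail rows returned by the join.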
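No need to even join: you can use window functions instead. Note that a plain LIMIT 50 on the outer query would return 50 rows, not all rows of the 50 most frequent event_ids, so the ids are ranked by their count and the filter is applied to that rank. See below:
SELECT *
FROM (
SELECT
*,
DENSE_RANK() OVER(ORDER BY event_count DESC, event_id) AS event_rank
FROM (
SELECT
*,
COUNT(*) OVER(PARTITION BY event_id) AS event_count
FROM event
) A
) B
WHERE event_rank <= 50
ORDER BY event_count DESC, event_id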
I have a PostgreSQL type and a table:
CREATE TYPE mem_status AS ENUM('waiting', 'active', 'expired');
CREATE TABLE mems (
id BIGSERIAL PRIMARY KEY,
status mem_status NOT NULL
);
Dataset:
INSERT INTO mems(id, status) VALUES
(1, 'active'), (2, 'active'), (3, 'expired');
I want to query the counts grouped by status, so I tried the query below.
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT m.status, count(1)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY m.status;
But if there are no waiting mems, the result looks like this:
status | count
================
NULL | 1 <- problem
'active' | 2
'expired' | 1
I want to get a result like this:
status | count
================
'waiting' | 0
'active' | 2
'expired' | 1
How can I do that?
Use count(id), and select and group by ms.status instead of m.status:
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT ms.status, count(id)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY ms.status;
or:
select status, count(id)
from unnest(enum_range(null::mem_status)) as status
left join mems using(status)
group by status
status | count
---------+-------
waiting | 0
active | 2
expired | 1
(3 rows)
Per the documentation, count(expression) gives the "number of input rows for which the value of expression is not null".
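The difference between count(*) and count(expression) after an outer join can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module, and because SQLite has no enum types, a hypothetical lookup table named statuses (with a made-up sort_order column) stands in for unnest(enum_range(...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'statuses' is a stand-in for the PostgreSQL enum (assumption).
CREATE TABLE statuses (status TEXT PRIMARY KEY, sort_order INTEGER);
CREATE TABLE mems (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
INSERT INTO statuses VALUES ('waiting', 1), ('active', 2), ('expired', 3);
INSERT INTO mems VALUES (1, 'active'), (2, 'active'), (3, 'expired');
""")

query = """
SELECT s.status,
       COUNT(*)    AS count_star,  -- counts the NULL-extended row too
       COUNT(m.id) AS count_id     -- skips rows where m.id is NULL
FROM statuses s
LEFT JOIN mems m ON m.status = s.status
GROUP BY s.status, s.sort_order
ORDER BY s.sort_order
"""
counts = conn.execute(query).fetchall()

for row in counts:
    print(row)
```

For 'waiting' the outer join produces one row whose mems columns are all NULL, which is why count(*) reports 1 while count(m.id) reports 0.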
You need to modify the join and the aggregate a bit:
select ms.status, count(m.status)
from (select unnest(enum_range(null::mem_status))) as ms(status)
left join mems as m
on ms.status = m.status
group by ms.status;
I have the following data:
id | name | amount | datefrom
---------------------------
3 | a | 8 | 2018-01-01
4 | a | 3 | 2018-01-15 10:00
5 | b | 1 | 2018-02-20
I can group the result with the following query:
select name, max(amount) from table group by name
But I need the id of the selected row too, so I tried:
select max(id), name, max(amount) from table group by name
As expected, it returns:
id | name | amount
-----------
4 | a | 8
5 | b | 1
But I need the id to be 3 for the amount of 8:
id | name | amount
-----------
3 | a | 8
5 | b | 1
Is this possible?
PS. This is required for a billing task. On 2018-01-15 the configuration of a was changed: the user consumed a resource for 10 h with an amount of 8 and for the remaining 14 h of the day with an amount of 3. I need to bill such a day by the maximum value, so the row with id = 4 is simply ignored for 2018-01-15. (For the next day, 2018-01-16, I will bill the amount of 3.)
So I take for billing the row:
3 | a | 8 | 2018-01-01
And if something is wrong with it, I must report that the row with id = 3 is wrong.
But when I use an aggregate function, the information about the id is lost.
It would be awesome if something like this were possible:
select current(id), name, max(amount) from table group by name
select aggregated_row(id), name, max(amount) from table group by name
Here current / aggregated_row would refer to the row that was selected by the aggregate function max.
UPD
I solved the task as follows:
SELECT
(
SELECT id FROM t2
WHERE id = ANY ( ARRAY_AGG( tf.id ) ) AND amount = MAX( tf.amount )
) id,
name,
MAX(amount) ma,
SUM( ratio )
FROM t2 tf
GROUP BY name
UPD
It would be much better to use window functions.
There are at least 3 ways, see below:
CREATE TEMP TABLE test (
id integer, name text, amount numeric, datefrom timestamptz
);
COPY test FROM STDIN (FORMAT csv);
3,a,8,2018-01-01
4,a,3,2018-01-15 10:00
5,b,1,2018-02-20
6,b,1,2019-01-01
\.
Method 1. using DISTINCT ON (PostgreSQL-specific)
SELECT DISTINCT ON (name)
id, name, amount
FROM test
ORDER BY name, amount DESC, datefrom ASC;
Method 2. using window functions
SELECT id, name, amount FROM (
SELECT *, row_number() OVER (
PARTITION BY name
ORDER BY amount DESC, datefrom ASC) AS __rn
FROM test) AS x
WHERE x.__rn = 1;
Method 3. using correlated subquery
SELECT id, name, amount FROM test
WHERE id = (
SELECT id FROM test AS t2
WHERE t2.name = test.name
ORDER BY amount DESC, datefrom ASC
LIMIT 1
);
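For trying this outside PostgreSQL, here is a minimal sketch of Method 2 on the same four sample rows. It assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 module, with datefrom stored as ISO text so that string comparison matches date order; DISTINCT ON is not available there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, amount INTEGER, datefrom TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (3, "a", 8, "2018-01-01"),
        (4, "a", 3, "2018-01-15 10:00"),
        (5, "b", 1, "2018-02-20"),
        (6, "b", 1, "2019-01-01"),
    ],
)

query = """
SELECT id, name, amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY amount DESC, datefrom ASC
           ) AS rn
    FROM test
) AS x
WHERE x.rn = 1
ORDER BY name
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

The second ORDER BY key inside the window decides ties: for name b both rows have amount 1, so the earlier datefrom (id 5) is kept.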
You need DISTINCT ON, which keeps only the first row of each group according to the ORDER BY:
SELECT DISTINCT ON (name)
*
FROM table
ORDER BY name, amount DESC
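You need to join against a grouped subquery, matching on both name and amount. If several rows of a name share the maximum amount, all of them are returned. Try this:
SELECT T.id, T2.name, T2.amount
FROM TABLE T
INNER JOIN (SELECT name, MAX(amount) amount
FROM TABLE
GROUP BY name) T2
ON T.name = T2.name
AND T.amount = T2.amount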
In my PostgreSQL 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
In this case I would use a different inner query to get the latest inventory date per article. You are effectively scanning stock_rotation twice, but it should work; if the table is too big for that, you may need to try something else.
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
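As a quick check of the LEFT JOIN plus COALESCE approach, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module with dates stored as ISO text; the rows for article 1 come from the question, while article 2 (which has no INVENTORY row) is made up to exercise the COALESCE fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE stock_rotation (
    id INTEGER PRIMARY KEY,
    quantity_change INTEGER,
    stock_rotation_type TEXT,
    article_id INTEGER,
    date TEXT
)
""")
conn.executemany(
    "INSERT INTO stock_rotation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "PURCHASE", 1, "2010-01-01 15:35:01"),
        (2, -4, "SALE", 1, "2010-05-06 08:46:02"),
        (3, 5, "INVENTORY", 1, "2010-12-20 08:20:35"),
        (4, 2, "PURCHASE", 1, "2011-02-05 16:45:50"),
        (5, -1, "SALE", 1, "2011-03-01 16:42:53"),
        # Hypothetical second article that never had an inventory.
        (6, 7, "PURCHASE", 2, "2012-01-10 09:00:00"),
        (7, -2, "SALE", 2, "2012-02-11 10:30:00"),
    ],
)

query = """
SELECT sr.article_id, SUM(sr.quantity_change) AS in_stock
FROM stock_rotation sr
LEFT JOIN (
    SELECT article_id, MAX(date) AS date
    FROM stock_rotation
    WHERE stock_rotation_type = 'INVENTORY'
    GROUP BY article_id
) AS latest_inventory ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
ORDER BY sr.article_id
"""
stock = conn.execute(query).fetchall()
print(stock)
```

Article 1 sums the inventory value and everything after it (5 + 2 - 1), while article 2 falls back to the 1970 date and sums all of its rows (7 - 2).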
You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;
Here is my take on it: first, get the latest inventory row per article using a window function. Then join it back to the full table, keeping only the operations later than that inventory date, and add their sum to the inventory value. The LEFT JOIN with COALESCE keeps articles that have had no changes since their last inventory; articles without any INVENTORY row are not returned.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc) as rnk
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rnk = 1
)
select ii.article_id, ii.quantity_change + coalesce(sum(sr.quantity_change), 0)
from initial_inventory ii
left join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change
I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
You can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a correlated subquery:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
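To compare the stored and the on-the-fly variant, here is a minimal sketch. It assumes SQLite 3.25 or newer through Python's sqlite3 module instead of PostgreSQL; the four rows are the ones from the question, and the UPDATE uses the correlated-subquery form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO the_table (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "Jim"), (3, "John"), (4, "Tim")],
)

# Computed on the fly with a window function; nothing is stored.
on_the_fly = conn.execute("""
    SELECT id, name, COUNT(*) OVER (PARTITION BY name) AS name_count
    FROM the_table
    ORDER BY id
""").fetchall()

# Stored variant: fill the "count" column with a correlated subquery.
conn.execute("""
    UPDATE the_table
    SET "count" = (SELECT COUNT(*)
                   FROM the_table t2
                   WHERE t2.name = the_table.name)
""")
stored = conn.execute('SELECT id, name, "count" FROM the_table ORDER BY id').fetchall()

print(on_the_fly)
print(stored)
```

Both lists agree right after the UPDATE, but the stored column goes stale as soon as a row is inserted or deleted, which is the argument for computing it in the query.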
Another method: using a derived table
UPDATE tb
SET count = t.count
FROM (
SELECT count(NAME)
,NAME
FROM tb
GROUP BY 2
) t
WHERE t.NAME = tb.NAME
Assume the table below:
column name | type
id | int
date | varchar
When I use
SELECT ROWNUMBER() OVER( ORDER BY TYPE_DATE ) as ROWID,
TO_DATE( date, 'mm\dd\yyyy' ) as TYPE_DATE,
*
FROM TABLE
I always get the error below:
SQL0104N An unexpected token "*" was found following .... <select_sublist>
Here are my three questions:
Why can't * be used here?
Why can't this new column be used in OVER()?
How can I get the second set of 10 records, ordered by a formatted column?
To answer your first question: it is because you have specified additional columns, and DB2 is unable to expand a bare * into a column list next to them. You can fix this by adding a table identifier, FROM TABLE T, and using the exposed identifier to expand the column list: SELECT ..., T.*
As you can see on this chart from the Information Center, you can only have EITHER * OR expressions and exposed-name.*
>--+-*-----------------------------------------------+---------><
   | .-,-------------------------------------------. |
   | V                                             | |
   '---+-expression--+-------------------------+-+-+-'
       |             | .-AS-.                  | |
       |             '-+----+--new-column-name-' |
       '-exposed-name.*--------------------------'
For two and three: an expression in a SELECT clause cannot be referenced by its alias within the same SELECT clause. You can push it down into a sub-select and then use it in OVER(). The row number cannot be filtered in the WHERE clause of the same query level either, so wrap the query once more and add a BETWEEN on the outer level to get the rows you want (11 to 20 for the second set of 10):
SELECT T2.*
FROM (
SELECT ROWNUMBER() OVER( ORDER BY TYPE_DATE ) as ROWID, T1.*
FROM (
SELECT TO_DATE( date, 'mm\dd\yyyy' ) as TYPE_DATE, T.*
FROM TABLE T
) T1
) T2
WHERE ROWID BETWEEN 11 AND 20
ORDER BY ROWID
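The nesting idea (compute the derived column in an inner query, number the rows one level up, filter on the row number at the outermost level) can be tried out locally with a minimal sketch. It assumes SQLite 3.25 or newer through Python's sqlite3 module rather than DB2; the table name records, the 25 generated rows, and the substr-based date conversion (SQLite has no TO_DATE) are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, date TEXT)")

# 25 hypothetical rows with 'mm/dd/yyyy' strings; a higher id gets an earlier day.
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(i, f"01/{26 - i:02d}/2020") for i in range(1, 26)],
)

query = """
SELECT rn, id, type_date
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY type_date) AS rn, t1.*
    FROM (
        -- Rebuild an ISO date (yyyy-mm-dd) so that text order equals date order.
        SELECT substr(date, 7, 4) || '-' || substr(date, 1, 2) || '-' || substr(date, 4, 2)
                   AS type_date,
               t.*
        FROM records t
    ) t1
) t2
WHERE rn BETWEEN 11 AND 20
ORDER BY rn
"""
page = conn.execute(query).fetchall()

for row in page:
    print(row)
```

The alias type_date is only visible one level above where it is defined, and rn only one level above the window function, which is why two layers of nesting are needed.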