How to number distinct values while respecting their original ordering? - postgresql

Here's my input data:
CREATE TEMP TABLE test AS SELECT * FROM (VALUES
(1, 12),
(2, 7),
(3, 8),
(4, 8),
(5, 7)
) AS rows (position, value);
I want to, in a single query (no subqueries or CTEs), assign a unique number for each distinct value. However, I also want those numbers to ascend according to the associated position -- i.e., a distinct value's number should be assigned according to its lowest position.
Assumptions:
each row will always have a unique position
value is not guaranteed unique per row
the number of a distinct value is only for ordinal purposes, e.g. it doesn't matter whether distinct_values goes 1-2-3 or 3-8-14
The desired output is:
position | value | distinct_value
----------+-------+----------------
1 | 12 | 1
2 | 7 | 2
3 | 8 | 3
4 | 8 | 3
5 | 7 | 2
I can get close using DENSE_RANK to number distinct values:
SELECT
position,
value,
DENSE_RANK() OVER (ORDER BY value) AS distinct_value
FROM test ORDER BY position;
The result obviously ignores position:
position | value | distinct_value
----------+-------+----------------
1 | 12 | 3
2 | 7 | 1
3 | 8 | 2
4 | 8 | 2
5 | 7 | 1
Is there a better window function for this?

with
t(x,y) as (values
(1, 12),
(2, 7),
(3, 8),
(4, 8),
(5, 7)),
pos(i,y) as (select min(x), y from t group by y),
ind(i,y) as (select row_number() over(order by i), y from pos)
select * from ind join t using(y) order by x;
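For comparison, here is a minimal sketch of the same idea using window functions only. It still needs one derived table, because a window function cannot be nested inside another window function's ORDER BY, so it does not meet the "no subqueries" wish in the question. The sketch runs the SQL through Python's sqlite3 module rather than PostgreSQL (an assumption made only so that it is self-contained; it needs SQLite 3.25 or later for window functions), and the alias first_position is a name made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (position INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 12), (2, 7), (3, 8), (4, 8), (5, 7)],
)

query = """
SELECT position,
       value,
       -- rank the groups by the lowest position at which each value occurs
       DENSE_RANK() OVER (ORDER BY first_position) AS distinct_value
FROM (
    SELECT position,
           value,
           -- lowest position per distinct value (hypothetical alias)
           MIN(position) OVER (PARTITION BY value) AS first_position
    FROM test
) AS s
ORDER BY position
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this should reproduce the desired output from the question: 12, 7 and 8 are numbered 1, 2 and 3 in order of first appearance.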

Related

POSTGRESQL: Enumerate with the same number if having the same criteria

What I have
id | value
1 | foo
2 | foo
3 | bah
4 | bah
5 | bah
6 | jezz
7 | jezz
8 | jezz
9 | pas
10 | log
What I need:
Enumerate rows as in the following example
id | value | enumeration
1 | foo | 1
2 | foo | 1
3 | bah | 2
4 | bah | 2
5 | bah | 2
6 | jezz | 3
7 | jezz | 3
8 | jezz | 3
9 | pas | 4
10 | log | 5
I've tried row_number with over partition. But this leads to another kind of enumeration.
Thanks for any help
You can use rank() or dense_rank() for that case:
SELECT
*,
dense_rank() OVER (ORDER BY value)
FROM
mytable
rank() generates an ordered number to every element of a group, but it creates gaps (if there were 3 elements in the first group, the second group starting at row 4 would get the number 4). dense_rank() avoids these gaps.
Note, this orders the table by the value column alphabetically. So, the result will be: bah == 1, foo == 2, jezz == 3, log == 4, pas == 5.
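The difference between rank() and dense_rank() can be checked with a small sketch like the one below. It assumes SQLite (3.25 or later) through Python's sqlite3 module instead of PostgreSQL, purely to keep the example self-contained; the aliases r and dr are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "bah"), (4, "bah"), (5, "bah"),
     (6, "jezz"), (7, "jezz"), (8, "jezz"), (9, "pas"), (10, "log")],
)

ranked = conn.execute("""
    SELECT id,
           value,
           rank()       OVER (ORDER BY value) AS r,   -- leaves gaps after ties
           dense_rank() OVER (ORDER BY value) AS dr   -- no gaps
    FROM mytable
    ORDER BY id
""").fetchall()

for row in ranked:
    print(row)
```

With this data, rank() gives bah == 1, foo == 4, jezz == 6, log == 9, pas == 10, while dense_rank() gives 1 to 5 without gaps. Both follow the alphabetical order of value, not the order of id.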
If you want to keep your order, you need an additional order criterion. In your case you could use the id column to create such a column, if no other is available:
First, use first_value() to find the lowest id per value group:
SELECT
*,
first_value(id) OVER (PARTITION BY value ORDER BY id)
FROM
mytable
This first value (foo == 1, bah == 3, ...) can be used to keep the original order when calculating the dense_rank():
SELECT
id,
value,
dense_rank() OVER (ORDER BY first_value)
FROM (
SELECT
*,
first_value(id) OVER (PARTITION BY value ORDER BY id)
FROM
mytable
) s

Aggregate all combinations of rows taken k at a time

I am trying to calculate an aggregate function for a field for a subset of rows in a table. The problem is that I'd like to find the mean of every combination of rows taken k at a time --- so for all the rows, I'd like to find (say) the mean of every combination of 10 rows. So:
id | count
----|------
1 | 5
2 | 3
3 | 6
...
30 | 16
should give me
mean of ids 1..10; ids 1, 3..11; ids 1, 4..12, and so on. I know this will yield a lot of rows.
There are SO answers for finding combinations from arrays. I could do this programmatically by taking 30 ids 10 at a time and then SELECTing them. Is there a way to do this with PARTITION BY, TABLESAMPLE, or another function (something like python's itertools.combinations())? (TABLESAMPLE by itself won't guarantee which subset of rows I am selecting as far as I can tell.)
The method described in the cited answer is static. A more convenient solution may be to use recursion.
Example data:
drop table if exists my_table;
create table my_table(id int primary key, number int);
insert into my_table values
(1, 5),
(2, 3),
(3, 6),
(4, 9),
(5, 2);
Query which finds 2 element subsets in 5 element set (k-combination with k = 2):
with recursive recur as (
select
id,
array[id] as combination,
array[number] as numbers,
number as sum
from my_table
union all
select
t.id,
combination || t.id,
numbers || t.number,
sum + number
from my_table t
join recur r on r.id < t.id
and cardinality(combination) < 2 -- param k
)
select combination, numbers, sum/2.0 as average -- param k
from recur
where cardinality(combination) = 2 -- param k
combination | numbers | average
-------------+---------+--------------------
{1,2} | {5,3} | 4.0000000000000000
{1,3} | {5,6} | 5.5000000000000000
{1,4} | {5,9} | 7.0000000000000000
{1,5} | {5,2} | 3.5000000000000000
{2,3} | {3,6} | 4.5000000000000000
{2,4} | {3,9} | 6.0000000000000000
{2,5} | {3,2} | 2.5000000000000000
{3,4} | {6,9} | 7.5000000000000000
{3,5} | {6,2} | 4.0000000000000000
{4,5} | {9,2} | 5.5000000000000000
(10 rows)
The same query for k = 3 gives:
combination | numbers | average
-------------+---------+--------------------
{1,2,3} | {5,3,6} | 4.6666666666666667
{1,2,4} | {5,3,9} | 5.6666666666666667
{1,2,5} | {5,3,2} | 3.3333333333333333
{1,3,4} | {5,6,9} | 6.6666666666666667
{1,3,5} | {5,6,2} | 4.3333333333333333
{1,4,5} | {5,9,2} | 5.3333333333333333
{2,3,4} | {3,6,9} | 6.0000000000000000
{2,3,5} | {3,6,2} | 3.6666666666666667
{2,4,5} | {3,9,2} | 4.6666666666666667
{3,4,5} | {6,9,2} | 5.6666666666666667
(10 rows)
Of course, you can remove numbers from the query if you do not need them.
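Since the question mentions Python's itertools.combinations(), here is a small sketch that can be used to cross-check the recursive query outside the database. The function name k_combination_means is made up for illustration, and the rows are the sample data from above.

```python
from itertools import combinations
from math import comb

# (id, number) pairs, same as the sample my_table above
rows = [(1, 5), (2, 3), (3, 6), (4, 9), (5, 2)]


def k_combination_means(rows, k):
    """Return (ids, numbers, average) for every k-element subset of rows."""
    result = []
    # sorting by id first yields combinations in ascending id order,
    # matching the r.id < t.id join condition in the recursive query
    for combo in combinations(sorted(rows), k):
        ids = [row_id for row_id, _ in combo]
        numbers = [number for _, number in combo]
        result.append((ids, numbers, sum(numbers) / k))
    return result


pairs = k_combination_means(rows, 2)
triples = k_combination_means(rows, 3)

for ids, numbers, average in pairs:
    print(ids, numbers, average)

# number of 10-row subsets of 30 rows, as in the question
print(comb(30, 10))
```

For the case in the question (30 rows taken 10 at a time) there are 30,045,015 combinations, so either approach produces a very large result.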

DB2: How to join indirectly referenced data

I have the following given table structure (I've removed some columns and created a stub) to support versioning and reduce duplication of data. Imagine an article review process where each step is stored in the database (article_meta). Whenever the article itself changes, the data is stored in the database, too.
The versioning is done by a reference to the predecessor (pre_meta_id).
WITH
t_article_meta (id, pre_meta_id, user_id, state) as (
values (1, NULL, 101, 'submitted')
union all values (2, 1, 7, 'inreview')
union all values (3, 2, 7, 'rejected')
union all values (4, 3, 101, 'submitted')
union all values (5, NULL, 202, 'submitted')
union all values (6, 5, 7, 'inreview')
union all values (7, 6, 7, 'accepted')
union all values (8, 4, 7, 'inreview')
union all values (9, 8, 7, 'accepted')
),
t_article (id, meta_id, content) as (
values (1, 1, 'Hello wordl')
union all values (2, 4, 'Hello world')
union all values (3, 5, 'Lorem ipsum doloret')
)
SELECT ...;
Now I want to create a view that somehow combines meta data and article data even if there is no direct reference (only indirect via predecessor).
id | pre_meta_id | user_id | state | content (left join) | content (I want to have)
---|-------------|---------|-----------|---------------------|-------------------------
1 | NULL | 101 | submitted | Hello wordl | Hello wordl
2 | 1 | 7 | inreview | NULL | Hello wordl
3 | 2 | 7 | rejected | NULL | Hello wordl
4 | 3 | 101 | submitted | Hello world | Hello world
5 | NULL | 202 | submitted | Lorem ipsum doloret | Lorem ipsum doloret
6 | 5 | 7 | inreview | NULL | Lorem ipsum doloret
7 | 6 | 7 | accepted | NULL | Lorem ipsum doloret
8 | 4 | 7 | inreview | NULL | Hello world
9 | 8 | 7 | accepted | NULL | Hello world
How can I realize something like that in DB2 in a performant way? My first idea, a join on a function (to get the nearest predecessor that has a related article), sounds really expensive to me.
This SQL would do the job:
SELECT m.id, successor_id, user_id, state, content,
last_value(content,'IGNORE NULLS') over (order by m.id) as last_value
FROM article_meta m
LEFT JOIN article a
ON m.id = a.article_meta_id
ORDER BY m.id
It is the regular join to combine the tables, with an additional column (named differently from the one in your expected result to show the difference).
You might want to rename that column and remove content to get an exact match to your expected result.
For the adjusted requirements the SQL gets more complex, as we have to define a recursive query to get the title/content for all the children. It will look like this:
with temp (id, pre_meta_id, user_id, state, level, parent, root) as (
select m.id, m.pre_meta_id, m.user_id, m.state, 1 as level, m.pre_meta_id as parent, m.id as root
from article_meta m, article a
where m.id = a.meta_id
union all
select m.id, m.pre_meta_id, m.user_id, m.state, level + 1 as level, t.id as parent, t.root
from temp t, article_meta m
where m.pre_meta_id = t.id
and m.id not in (select meta_id from article)
and level < 10
)
select *
from temp t
left join article a
on t.root = a.meta_id
order by 1
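To make the expected result easier to verify, the lookup can be sketched outside the database: for each meta row, follow pre_meta_id backwards until a row with an attached article is found. This is only an illustration of the logic in plain Python, not DB2 code; the names article_meta, article_content and resolve_content are made up, and the data is the stub from the question.

```python
# id -> pre_meta_id, taken from the t_article_meta stub in the question
article_meta = {1: None, 2: 1, 3: 2, 4: 3, 5: None, 6: 5, 7: 6, 8: 4, 9: 8}

# meta_id -> content, taken from the t_article stub in the question
article_content = {
    1: "Hello wordl",
    4: "Hello world",
    5: "Lorem ipsum doloret",
}


def resolve_content(meta_id):
    """Walk the predecessor chain until a meta row with an article is found."""
    current = meta_id
    while current is not None:
        if current in article_content:
            return article_content[current]
        current = article_meta[current]
    return None  # no article anywhere on the chain


resolved = {meta_id: resolve_content(meta_id) for meta_id in article_meta}
for meta_id, content in sorted(resolved.items()):
    print(meta_id, content)
```

The resolved values should match the "content (I want to have)" column in the question, for example id 3 resolves to "Hello wordl" and id 9 to "Hello world".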

PostgreSQL, two windowing functions at once

I have typical table with data, say mytemptable.
DROP TABLE IF EXISTS mytemptable;
CREATE TEMP TABLE mytemptable
(mydate date, somedoc text, inqty int, outqty int);
INSERT INTO mytemptable (mydate, somedoc, inqty, outqty)
VALUES ('01.01.2016.', '123-13-24', 3, 0),
('04.01.2016.', '15-19-44', 2, 0),
('06.02.2016.', '15-25-21', 0, 1),
('04.01.2016.', '21-133-12', 0, 1),
('04.01.2016.', '215-11-51', 0, 2),
('05.01.2016.', '11-181-01', 0, 1),
('05.02.2016.', '151-80-8', 4, 0),
('04.01.2016.', '215-11-51', 0, 2),
('07.02.2016.', '34-02-02', 0, 2);
SELECT row_number() OVER(ORDER BY mydate) AS rn,
mydate, somedoc, inqty, outqty,
SUM(inqty-outqty) OVER(ORDER BY mydate) AS csum
FROM mytemptable
ORDER BY mydate;
In my SELECT query I try to order the result by date and add row numbers 'rn' and a cumulative (running) sum 'csum'. Of course, unsuccessfully.
I believe this is because I use two window functions in the query, which conflict in some way.
How do I write this query properly so that it is fast, well ordered, and gives the proper result in the 'csum' column (3, 5, 4, 2, 0, -1, 3, 2, 0)?
Since there is an ordering tie at 2016-04-01, the result for each of those rows will be the sum accumulated through the whole group of tied rows. If you want it to be different, add tie-breaking columns to the ORDER BY.
From the manual:
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition
Without a tie-breaking column, you can use the generated row number in an outer query:
set datestyle = 'dmy';
with mytemptable (mydate, somedoc, inqty, outqty) as (
values
('01-01-2016'::date, '123-13-24', 3, 0),
('04-01-2016', '15-19-44', 2, 0),
('06-02-2016', '15-25-21', 0, 1),
('04-01-2016', '21-133-12', 0, 1),
('04-01-2016', '215-11-51', 0, 2),
('05-01-2016', '11-181-01', 0, 1),
('05-02-2016', '151-80-8', 4, 0),
('04-01-2016', '215-11-51', 0, 2),
('07-02-2016', '34-02-02', 0, 2)
)
select *, sum(inqty-outqty) over(order by mydate, rn) as csum
from (
select
row_number() over(order by mydate) as rn,
mydate, somedoc, inqty, outqty
from mytemptable
) s
order by mydate;
rn | mydate | somedoc | inqty | outqty | csum
----+------------+-----------+-------+--------+------
1 | 2016-01-01 | 123-13-24 | 3 | 0 | 3
2 | 2016-04-01 | 15-19-44 | 2 | 0 | 5
3 | 2016-04-01 | 21-133-12 | 0 | 1 | 4
4 | 2016-04-01 | 215-11-51 | 0 | 2 | 2
5 | 2016-04-01 | 215-11-51 | 0 | 2 | 0
6 | 2016-05-01 | 11-181-01 | 0 | 1 | -1
7 | 2016-05-02 | 151-80-8 | 4 | 0 | 3
8 | 2016-06-02 | 15-25-21 | 0 | 1 | 2
9 | 2016-07-02 | 34-02-02 | 0 | 2 | 0
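The effect of the default window frame on tied rows can be reproduced with the sketch below. It makes several assumptions: it uses SQLite (3.25 or later) through Python's sqlite3 module instead of PostgreSQL, the dates are written as ISO strings read as day-month-year from the original data, and the id column is a hypothetical tie-breaker that does not exist in the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytemptable (
        id INTEGER PRIMARY KEY,  -- hypothetical tie-breaker, not in the original
        mydate TEXT, somedoc TEXT, inqty INTEGER, outqty INTEGER
    )
""")
conn.executemany(
    "INSERT INTO mytemptable (mydate, somedoc, inqty, outqty) VALUES (?, ?, ?, ?)",
    [("2016-01-01", "123-13-24", 3, 0), ("2016-01-04", "15-19-44", 2, 0),
     ("2016-02-06", "15-25-21", 0, 1), ("2016-01-04", "21-133-12", 0, 1),
     ("2016-01-04", "215-11-51", 0, 2), ("2016-01-05", "11-181-01", 0, 1),
     ("2016-02-05", "151-80-8", 4, 0), ("2016-01-04", "215-11-51", 0, 2),
     ("2016-02-07", "34-02-02", 0, 2)],
)

result = conn.execute("""
    SELECT mydate,
           -- default frame: tied dates are peers and share one total
           SUM(inqty - outqty) OVER (ORDER BY mydate) AS tied_sum,
           -- unique sort key: every row gets its own running total
           SUM(inqty - outqty) OVER (ORDER BY mydate, id) AS csum
    FROM mytemptable
    ORDER BY mydate, id
""").fetchall()

tied_sums = [row[1] for row in result]
csums = [row[2] for row in result]
print(tied_sums)
print(csums)
```

With ORDER BY mydate alone, the four rows sharing a date all receive the same total (0), whereas the unique sort key yields the running sum asked for in the question: 3, 5, 4, 2, 0, -1, 3, 2, 0.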

Select statement with join, or subquery limit

I've been trying to solve this problem for a few days now.
I have two tables, group_user and group_name.
What I want to do is select a user's groups, then the description of each group (from group_name), and 10 other users from each group.
The first two are not a problem. The problem is that I can't find a way to limit the number of users.
I can select the user's groups and the other users in each group, but I don't know how to limit them.
Using:
SELECT a.g_id,b.group,b.userid
FROM group_user AS a
RIGHT JOIN
(SELECT g_id as group, u_id as userid FROM group_user) AS b ON a.g_id=b.group
WHERE u_id=112
It shows me my user's groups and the users in each group. But when I try to add a limit in the subquery, it limits the whole result, not each particular group.
I tried selecting users with IN over the groups of my user, without luck.
I thought GROUP BY and HAVING might help, but I can't see how I could use them.
So my question is: how can I limit a subquery's result in MySQL when the subquery is built on the result of the outer query?
I think I'm overloaded and maybe I'm missing something.
UPDATE: to show what I really want to accomplish, here's another piece of code.
SELECT g_id FROM group_user WHERE u_id = 112
That gives me all the groups the user is in. Let's say each result of that select is a variable extra_group; the second query would then be
SELECT u_id FROM group_user WHERE g_id = extra_group LIMIT 10
I need to do the same as above in one query.
Another UPDATE after MIKE's post.
I should add that a user can be in more than one group. So I think the real problem is that I have no idea how to select those groups and, in the same query, select 10 users for each selected group, so that the result could be
g_id u_id
1 | 2
1 | 3
1 | 4
3 | 3
3 | 8
where g_id is one of the user's groups from this query:
SELECT g_id FROM group_user WHERE u_id = 112
Create sample tables and add data:
CREATE TABLE `group_user` (
`u_id` int(11) DEFAULT NULL,
`g_id` int(11) DEFAULT NULL,
`apply_date` date DEFAULT NULL
);
CREATE TABLE `group_name` (
`g_id` int(11) DEFAULT NULL,
`g_name` varchar(255) DEFAULT NULL
);
INSERT INTO `group_name` VALUES
(1, 'Group 1'), (2, 'Group 2'), (3, 'Group 3'), (4, 'Group 4'), (5, 'Group 5');
INSERT INTO `group_user` VALUES
(1, 1, '2010-12-01'), (1, 2, '2010-12-01'), (1, 3, '2010-12-01'), (1, 4, '2010-12-01'), (1, 5, '2010-12-01'),
(2, 1, '2010-12-02'), (2, 2, '2010-12-02'),
(3, 1, '2010-12-03'), (3, 2, '2010-12-03'), (3, 3, '2010-12-03'), (3, 4, '2010-12-03'),
(4, 1, '2010-12-04'), (4, 2, '2010-12-04'),
(5, 1, '2010-12-05'), (5, 2, '2010-12-05'),
(6, 1, '2010-12-06'), (6, 2, '2010-12-06'),
(7, 1, '2010-12-07'), (7, 2, '2010-12-07'), (7, 3, '2010-12-07'), (7, 4, '2010-12-07'), (7, 5, '2010-12-07'),
(8, 1, '2010-12-08'), (8, 2, '2010-12-08'),
(9, 1, '2010-12-09'), (9, 2, '2010-12-09'), (9, 3, '2010-12-09'), (9, 4, '2010-12-09'), (9, 5, '2010-12-09');
Select the groups of which user u_id == 1 is a member. Then for each group select a maximum of 4 members (excluding user u_id == 1), ordered by descending apply_date:
SELECT u3.g_id, g.g_name, u3.u_id, u3.apply_date
FROM (
SELECT
u1.g_id,
u1.u_id,
u1.apply_date,
IF( @prev_gid <> u1.g_id, @user_index := 1, @user_index := @user_index + 1 ) AS user_index,
@prev_gid := u1.g_id AS prev_gid
FROM group_user AS u1
JOIN (SELECT @prev_gid := 0, @user_index := NULL) AS vars
JOIN group_user AS u2
ON u2.g_id = u1.g_id
AND u2.u_id = 1
AND u1.u_id <> 1
ORDER BY u1.g_id, u1.apply_date DESC, u1.u_id
) AS u3
JOIN group_name AS g ON g.g_id = u3.g_id
WHERE u3.user_index <= 4
ORDER BY u3.g_id, u3.apply_date DESC, u3.u_id;
+------+---------+------+------------+
| g_id | g_name | u_id | apply_date |
+------+---------+------+------------+
| 1 | Group 1 | 5 | 2010-12-05 |
| 1 | Group 1 | 4 | 2010-12-04 |
| 1 | Group 1 | 3 | 2010-12-03 |
| 1 | Group 1 | 2 | 2010-12-02 |
| 2 | Group 2 | 5 | 2010-12-05 |
| 2 | Group 2 | 4 | 2010-12-04 |
| 2 | Group 2 | 3 | 2010-12-03 |
| 2 | Group 2 | 2 | 2010-12-02 |
| 3 | Group 3 | 9 | 2010-12-09 |
| 3 | Group 3 | 7 | 2010-12-07 |
| 3 | Group 3 | 3 | 2010-12-03 |
| 4 | Group 4 | 9 | 2010-12-09 |
| 4 | Group 4 | 7 | 2010-12-07 |
| 4 | Group 4 | 3 | 2010-12-03 |
| 5 | Group 5 | 9 | 2010-12-09 |
| 5 | Group 5 | 7 | 2010-12-07 |
+------+---------+------+------------+
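On servers with window functions (MySQL 8.0 and later), the per-group limit can also be written with ROW_NUMBER() instead of user variables. The sketch below is an illustration under assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module rather than MySQL, and it rebuilds the sample data from a compact dictionary. Note that the listing above shows users 5, 4, 3, 2 for groups 1 and 2 even though users 9, 8, 7, 6 have later apply_date values, which suggests the variables were evaluated before the sort; MySQL does not guarantee the evaluation order of user variables relative to ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_user (u_id INTEGER, g_id INTEGER, apply_date TEXT)")
conn.execute("CREATE TABLE group_name (g_id INTEGER, g_name TEXT)")
conn.executemany(
    "INSERT INTO group_name VALUES (?, ?)",
    [(g, f"Group {g}") for g in range(1, 6)],
)

# u_id -> groups, equivalent to the INSERT statements above
memberships = {1: [1, 2, 3, 4, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2],
               5: [1, 2], 6: [1, 2], 7: [1, 2, 3, 4, 5], 8: [1, 2],
               9: [1, 2, 3, 4, 5]}
conn.executemany(
    "INSERT INTO group_user VALUES (?, ?, ?)",
    [(u, g, f"2010-12-{u:02d}") for u, groups in memberships.items() for g in groups],
)

limited = conn.execute("""
    SELECT u3.g_id, g.g_name, u3.u_id, u3.apply_date
    FROM (
        SELECT u1.g_id, u1.u_id, u1.apply_date,
               -- number the other members of each group, newest first
               ROW_NUMBER() OVER (PARTITION BY u1.g_id
                                  ORDER BY u1.apply_date DESC, u1.u_id) AS user_index
        FROM group_user AS u1
        JOIN group_user AS u2 ON u2.g_id = u1.g_id AND u2.u_id = 1
        WHERE u1.u_id <> 1
    ) AS u3
    JOIN group_name AS g ON g.g_id = u3.g_id
    WHERE u3.user_index <= 4
    ORDER BY u3.g_id, u3.apply_date DESC, u3.u_id
""").fetchall()

for row in limited:
    print(row)
```

With this approach groups 1 and 2 return users 9, 8, 7, 6 (the four most recent apply_date values), while groups 3, 4 and 5 match the listing above.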