I don't understand how to add the grouped values on SQL - postgresql

Data table:
| WINNER | FOOT CLUB |
| -------- | -------- |
| 1 | Beşiktaş |
| 2 | Beşiktaş |
| 3 | Galatasaray |
| 4 | Galatasaray |
| 5 | Beşiktaş |
| 6 | Istanbul |
| 7 | Istanbul |
| 8 | Istanbul |
| 9 | Galatasaray |
| 10 | Galatasaray |
| 11 | Fenerbahçe |
| 12 | Fenerbahçe |
| 13 | Fenerbahçe |
| 14 | Istanbul |
Help, please. I need to count each run of consecutive identical values, keeping the runs in their original order. SQL syntax of any version is fine. I need this result:
Beşiktaş 2
Galatasaray 2
Beşiktaş 1
Istanbul 3
Galatasaray 2
Fenerbahçe 3
Istanbul 1
CREATE TABLE football (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
INSERT INTO football VALUES (1, 'Beşiktaş');
INSERT INTO football VALUES (2, 'Beşiktaş');
INSERT INTO football VALUES (3, 'Galatasaray');
INSERT INTO football VALUES (4, 'Galatasaray');
INSERT INTO football VALUES (5, 'Beşiktaş');
INSERT INTO football VALUES (6, 'Istanbul');
INSERT INTO football VALUES (7, 'Istanbul');
INSERT INTO football VALUES (8, 'Istanbul');
INSERT INTO football VALUES (9, 'Galatasaray');
INSERT INTO football VALUES (10, 'Galatasaray');
INSERT INTO football VALUES (11, 'Fenerbahçe');
INSERT INTO football VALUES (12, 'Fenerbahçe');
INSERT INTO football VALUES (13, 'Fenerbahçe');
INSERT INTO football VALUES (14, 'Istanbul');
SELECT name,
RANK() OVER()
FROM football
It turned out like this:
Beşiktaş|1
Beşiktaş|1
Galatasaray|1
Galatasaray|1
Beşiktaş|1
Istanbul|1
Istanbul|1
Istanbul|1
Galatasaray|1
Galatasaray|1
Fenerbahçe|1
Fenerbahçe|1
Fenerbahçe|1
Istanbul|1

The query below was adapted from this solution.
A dbfiddle for your case is available if desired.
select name, count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by name order by id)
) as grp
from football t
) t
group by name, grp
order by min(id) asc
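
To see why subtracting the two row numbers isolates each run, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. Using SQLite instead of PostgreSQL is an assumption made only so the example is self-contained, and it needs an SQLite build with window functions (3.25 or newer). The names steps and runs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE football (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
names = ["Beşiktaş", "Beşiktaş", "Galatasaray", "Galatasaray", "Beşiktaş",
         "Istanbul", "Istanbul", "Istanbul", "Galatasaray", "Galatasaray",
         "Fenerbahçe", "Fenerbahçe", "Fenerbahçe", "Istanbul"]
conn.executemany("INSERT INTO football VALUES (?, ?)",
                 list(enumerate(names, start=1)))

# Inner query only: within one run both counters advance together,
# so their difference (grp) stays constant until the name changes.
steps = conn.execute("""
    SELECT id, name,
           row_number() OVER (ORDER BY id) AS rn_all,
           row_number() OVER (PARTITION BY name ORDER BY id) AS rn_name,
           row_number() OVER (ORDER BY id)
             - row_number() OVER (PARTITION BY name ORDER BY id) AS grp
    FROM football
    ORDER BY id
""").fetchall()

# Full query: one output row per (name, grp) pair, ordered by first id.
runs = conn.execute("""
    SELECT name, count(*) AS cnt
    FROM (SELECT t.*,
                 row_number() OVER (ORDER BY id)
                   - row_number() OVER (PARTITION BY name ORDER BY id) AS grp
          FROM football t) t
    GROUP BY name, grp
    ORDER BY min(id)
""").fetchall()

for name, cnt in runs:
    print(name, cnt)
```

Note that grp alone is not unique (row 5 for Beşiktaş and rows 3 and 4 for Galatasaray all get grp = 2), which is why the outer query groups by name and grp together.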


Aggregating and joining two tables in PostgreSQL

I am trying to produce an aggregated output table using aggregates from two different tables. I am unclear on how to join the two outcomes.
The two tables, one listing all products in each store, the other the price variation for each product are as presented below.
| product_id | daily_price | date |
|------------|-------------|------------|
| 1 | 1.25$ | 01-01-2000 |
| 1 | ... | ... |
| 1 | 1$ | 31-12-2000 |
| 2 | 4.5$ | 01-01-2000 |
| 2 | ... | ... |
| 2 | 4.25$ | 31-12-2000 |
| store_id | product_id |
|----------|------------|
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 3 |
| 3 | 2 |
The first aggregation gets the average daily price (it varies) of all products.
SELECT product_id, ROUND((AVG(daily_price)),2) as average_price FROM product_dailyprices
GROUP BY product_id;
| product_id | average_price |
|------------|---------------|
| 1 | 50 |
| 2 | 100 |
| 3 | 250 |
The second query gets me the number of different products available in each store
SELECT store_id, COUNT(product_id) as product_count FROM products
GROUP BY store_id;
| store_id | product_count |
|----------|---------------|
| 1 | 200 |
| 2 | 250 |
| 3 | 225 |
I am a bit lost on how to perform a query to produce the following:
| store_id | product_count | average_price_at_store |
|----------|---------------|------------------------|
| 1 | 34 | 6.51$ |
| 2 | 45 | 3.23$ |
| 3 | 36 | 5.37$ |
Thanks for the help!
As you did not provide the SQL for the tables, let's use the following bare-bones structure:
CREATE TABLE products
(
id SERIAL NOT NULL,
name text NOT NULL,
CONSTRAINT products_pk PRIMARY KEY (id)
);
CREATE TABLE stores
(
id SERIAL NOT NULL,
name text NOT NULL,
CONSTRAINT stores_pk PRIMARY KEY (id)
);
CREATE TABLE daily_prices
(
product_id INTEGER NOT NULL,
daily_price DOUBLE PRECISION NOT NULL,
date timestamptz,
CONSTRAINT daily_prices_product FOREIGN KEY (product_id) REFERENCES products (id)
);
CREATE TABLE locations
(
store_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
CONSTRAINT products_product_fk FOREIGN KEY (product_id) REFERENCES products (id),
CONSTRAINT products_store_fk FOREIGN KEY (store_id) REFERENCES stores (id)
);
And let's enter some sample data to help us verify that the query works:
INSERT INTO products(name)
VALUES ('product 1');
INSERT INTO products(name)
VALUES ('product 2');
INSERT INTO products(name)
VALUES ('product 3');
INSERT INTO stores(name)
VALUES ('store 1');
INSERT INTO stores(name)
VALUES ('store 2');
insert into locations (store_id, product_id)
values (1, 1),
(1, 2),
(2, 2),
(2, 3);
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (1, 2.0, '01-01-2020');
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (1, 4.0, '02-01-2020');
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (2, 3.0, '01-01-2020');
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (2, 5.0, '02-01-2020');
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (3, 10.0, '01-01-2020');
INSERT INTO daily_prices(product_id, daily_price, date)
VALUES (3, 20.0, '02-01-2020');
Then the query to produce your desired table would look like:
select l.store_id as store_id,
count(distinct l.product_id) as number_of_products,
avg(dp.daily_price) as average_price
from locations l
join daily_prices dp on dp.product_id = l.product_id
group by l.store_id;
And we can manually verify that it calculated the expected result:
+--------+------------------+-------------+
|store_id|number_of_products|average_price|
+--------+------------------+-------------+
|1 |2 |3.5 |
|2 |2 |9.5 |
+--------+------------------+-------------+
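
The join produces one row per (store, product, price date), which is why the query counts distinct product ids rather than rows. As a rough check, the sketch below loads the same sample data into an in-memory SQLite database (an assumption made for portability; the answer targets PostgreSQL) and adds a hypothetical joined_rows column to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (store_id INTEGER NOT NULL, product_id INTEGER NOT NULL);
    CREATE TABLE daily_prices (product_id INTEGER NOT NULL,
                               daily_price REAL NOT NULL,
                               date TEXT);
    INSERT INTO locations VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    INSERT INTO daily_prices VALUES
        (1, 2.0, '01-01-2020'), (1, 4.0, '02-01-2020'),
        (2, 3.0, '01-01-2020'), (2, 5.0, '02-01-2020'),
        (3, 10.0, '01-01-2020'), (3, 20.0, '02-01-2020');
""")

# joined_rows is not part of the original query; it only illustrates
# how the join multiplies rows before aggregation.
summary = conn.execute("""
    SELECT l.store_id,
           count(*)                     AS joined_rows,
           count(DISTINCT l.product_id) AS number_of_products,
           avg(dp.daily_price)          AS average_price
    FROM locations l
    JOIN daily_prices dp ON dp.product_id = l.product_id
    GROUP BY l.store_id
    ORDER BY l.store_id
""").fetchall()

for row in summary:
    print(row)
```

With this data each store has four joined rows but only two distinct products. The average is taken over all price rows of the store's products, so a product with more price entries weighs more than one with fewer.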

Parent/child table with include/exclude items

I have a table with parent-child relations. The relations can go n-level deep.
There is also a table with elements that belong to a group.
CREATE TABLE group_children(
id serial PRIMARY KEY,
parent_id integer,
children_id integer,
contains boolean
);
CREATE TABLE group_item(
id serial PRIMARY KEY,
group_id integer,
name text
);
INSERT INTO group_children(parent_id, children_id, contains) VALUES
(1, 2, true),
(1, 3, false),
(2, 4, true),
(2, 5, false),
(3, 6, true),
(3, 7, false);
INSERT INTO group_item(group_id, name) VALUES
(4, 'aaa'),
(4, 'bbb'),
(5, 'bbb'),
(5, 'ccc'),
(6, 'aaa'),
(6, 'bbb'),
(7, 'aaa'),
(7, 'ccc');
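So, we can represent this data as a tree: group 1 has children 2 (included) and 3 (excluded), group 2 has children 4 (included) and 5 (excluded), and group 3 has children 6 (included) and 7 (excluded).
It does not have to be a binary tree; this is just a simple case. A group can contain m children.
Read it from the leaves toward the root. Group 4 contains ['aaa', 'bbb'], group 5 contains ['bbb', 'ccc']. Group 2 includes all items from group 4 and excludes all items from group 5, so group 2 contains ['aaa']. And so on. After all computations, group 1 will contain ['aaa'].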
Question is: how to build a sql query to get all items that belong to the group 1?
All I could do:
WITH RECURSIVE r AS (
SELECT group_children.parent_id, group_children.children_id, group_children.contains, group_item.name
FROM group_children
LEFT JOIN group_item ON group_children.children_id = group_item.group_id
WHERE parent_id = 1
UNION ALL
SELECT group_children.parent_id, group_children.children_id, group_children.contains, group_item.name
FROM group_children
LEFT JOIN group_item ON group_children.children_id = group_item.group_id
JOIN r ON group_children.parent_id = r.children_id
)
SELECT * FROM r;
SQL Fiddle
demo:db<>fiddle
WITH RECURSIVE items AS (
SELECT -- 1
group_id,
array_agg(name)
FROM
group_Item
GROUP BY group_id
UNION
SELECT DISTINCT
parent_id,
array_agg(unnest) FILTER (WHERE bool_and) OVER (PARTITION BY parent_id) -- 5
FROM (
SELECT
parent_id,
unnest,
bool_and(contains) OVER (PARTITION BY parent_id, unnest) -- 4
FROM items i
JOIN group_children gc -- 2
ON i.group_id = gc.children_id,
unnest(array_agg) -- 3
) s
)
SELECT * FROM items
1. The non-recursive part aggregates all names per group_id.
2. Recursive part: joining the children against their parents.
3. Expanding the name arrays into one element per row.
This results in:
| group_id | array_agg | id | parent_id | children_id | contains | unnest |
|----------|-----------|----|-----------|-------------|----------|--------|
| 4 | {aaa,bbb} | 3 | 2 | 4 | true | aaa |
| 4 | {aaa,bbb} | 3 | 2 | 4 | true | bbb |
| 5 | {bbb,ccc} | 4 | 2 | 5 | false | bbb |
| 5 | {bbb,ccc} | 4 | 2 | 5 | false | ccc |
| 6 | {aaa,bbb} | 5 | 3 | 6 | true | aaa |
| 6 | {aaa,bbb} | 5 | 3 | 6 | true | bbb |
| 7 | {aaa,ccc} | 6 | 3 | 7 | false | aaa |
| 7 | {aaa,ccc} | 6 | 3 | 7 | false | ccc |
4. Now you have the unnested names and need to find the ones that have to be excluded. Take the bbb element for parent_id = 2: there is one row with contains = true and one with contains = false, so bbb should be excluded. Therefore the contains values have to be grouped per parent_id and name. They can be aggregated with boolean operators: the aggregate function bool_and returns true only if all elements are true, so bbb gets false. (The aggregation has to be done as a window function because PostgreSQL does not allow aggregate functions in the recursive term of a recursive query.)
Result:
| parent_id | unnest | bool_and |
|-----------|--------|----------|
| 2 | aaa | true |
| 2 | bbb | false |
| 2 | bbb | false |
| 2 | ccc | false |
| 3 | aaa | false |
| 3 | aaa | false |
| 3 | bbb | true |
| 3 | ccc | false |
5. After that, the unnested names can be grouped per parent_id. The FILTER clause aggregates only the elements where bool_and is true. Again, this has to be done as a window function, which creates duplicate records that the DISTINCT clause removes.
Final result (which of course could be filtered by the element 1):
| group_id | array_agg |
|----------|-----------|
| 5 | {bbb,ccc} |
| 4 | {aaa,bbb} |
| 6 | {aaa,bbb} |
| 7 | {aaa,ccc} |
| 2 | {aaa} |
| 3 | {bbb} |
| 1 | {aaa} |
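
To cross-check the final result independently of the SQL, here is a small sketch of the same include/exclude rule in plain Python. This is not the recursive CTE above but an ordinary recursive function, and it assumes the hierarchy has no cycles and that a group with children takes its items only from them. The helper name resolve is made up for illustration.

```python
from collections import defaultdict

# Same rows as the INSERT statements in the question.
group_children = [(1, 2, True), (1, 3, False), (2, 4, True),
                  (2, 5, False), (3, 6, True), (3, 7, False)]
group_item = [(4, "aaa"), (4, "bbb"), (5, "bbb"), (5, "ccc"),
              (6, "aaa"), (6, "bbb"), (7, "aaa"), (7, "ccc")]

children = defaultdict(list)
for parent_id, child_id, contains in group_children:
    children[parent_id].append((child_id, contains))

own_items = defaultdict(set)
for group_id, name in group_item:
    own_items[group_id].add(name)


def resolve(group_id):
    """Return the set of item names that belong to group_id."""
    if group_id not in children:
        # Leaf group: its items come straight from group_item.
        return set(own_items[group_id])
    included, excluded = set(), set()
    for child_id, contains in children[group_id]:
        (included if contains else excluded).update(resolve(child_id))
    # An item survives only if no excluded child carries it,
    # which mirrors bool_and(contains) being true for that item.
    return included - excluded


result = {g: sorted(resolve(g)) for g in range(1, 8)}
print(result[1])
```

For the sample data this gives the same contents per group as the final result table, including ['aaa'] for group 1.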

Substitute Dense_Rank() value with Newid() value in TSQL?

Is it possible to substitute the value generated by Dense_Rank() with a Newid() value in TSQL? I use Dense_Rank() for grouping but I need a uniqueidentifier generated instead of an integer. Thanks in advance.
There's no direct way to do this, but as I mentioned in my comment, you can get your dense_rank() for each record, then generate a NEWID() for each distinct Dense_Rank(), then join it back to itself.
CREATE TABLE test(f1 int, f2 char(1));
INSERT INTO test
VALUES (1, 'a'),
(1, 'b'),
(1, 'c'),
(2, 'a'),
(2, 'b'),
(3, 'a'),
(3, 'd'),
(3, 'g');
With dr AS (SELECT f1, f2, dense_rank() OVER (PARTITION BY f1 ORDER BY f2) as dr FROM test)
,dr_newid AS (SELECT dr, newid() as nid FROM (SELECT dr FROM dr GROUP BY dr) as drsub)
SELECT dr.f1, dr.f2, dr.dr, dr_newid.nid
FROM dr LEFT OUTER JOIN dr_newid ON dr.dr = dr_newid.dr
ORDER BY f1, f2;
+----+----+----+--------------------------------------+
| f1 | f2 | dr | nid |
+----+----+----+--------------------------------------+
| 1 | a | 1 | 966389AF-4C70-4AA8-A5C9-6F9537B8A1B8 |
| 1 | b | 2 | 73BE2978-B7D7-46B8-8B04-3103C8410575 |
| 1 | c | 3 | CB935CCA-AFE5-4D13-9583-0440DF1BEFE2 |
| 2 | a | 1 | 966389AF-4C70-4AA8-A5C9-6F9537B8A1B8 |
| 2 | b | 2 | 73BE2978-B7D7-46B8-8B04-3103C8410575 |
| 3 | a | 1 | 966389AF-4C70-4AA8-A5C9-6F9537B8A1B8 |
| 3 | d | 2 | 73BE2978-B7D7-46B8-8B04-3103C8410575 |
| 3 | g | 3 | CB935CCA-AFE5-4D13-9583-0440DF1BEFE2 |
+----+----+----+--------------------------------------+
One caveat here, though: depending on how your server performs the join from dr to dr_newid, it may evaluate newid() once per row rather than once per distinct dense_rank value, so rows with the same rank would get different identifiers. Using a LEFT JOIN should trick the optimizer into generating the dr_newid intermediate result set once to be joined back; an INNER JOIN may not.
If it's giving incorrect results, you can dump dr_newid into a temp table and then join back, which forces the server to derive newid() once for each distinct dense_rank() value without relying on tricks to steer the optimizer.
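
The idea behind the temp table is to materialize exactly one identifier per distinct rank first and only then look it up per row. The sketch below shows that principle in plain Python rather than T-SQL, with uuid.uuid4() standing in for NEWID() (an assumption for illustration only). The helper dense_rank_within_f1 is made up and mimics dense_rank() OVER (PARTITION BY f1 ORDER BY f2).

```python
import uuid

# Same rows as the test table in the answer.
rows = [(1, "a"), (1, "b"), (1, "c"), (2, "a"),
        (2, "b"), (3, "a"), (3, "d"), (3, "g")]


def dense_rank_within_f1(rows):
    """Rank each f2 among the distinct f2 values that share its f1."""
    by_f1 = {}
    for f1, f2 in rows:
        by_f1.setdefault(f1, set()).add(f2)
    position = {f1: {f2: i for i, f2 in enumerate(sorted(values), start=1)}
                for f1, values in by_f1.items()}
    return [(f1, f2, position[f1][f2]) for f1, f2 in rows]


ranked = dense_rank_within_f1(rows)

# Materialize one identifier per distinct rank (the "temp table"),
# so every later lookup returns the same value for the same rank.
ids_by_rank = {dr: uuid.uuid4() for dr in sorted({dr for _, _, dr in ranked})}

result = [(f1, f2, dr, ids_by_rank[dr]) for f1, f2, dr in ranked]
for row in result:
    print(row)
```

Because the identifiers are generated before the lookup, all rows with the same rank share one value regardless of how the lookup is executed.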
sqlfiddle here

PostgreSQL get parent categories from table

I have a table like the one below.
CREATE TABLE my.categories (id bigint, parent_id bigint, name varchar(128));
INSERT INTO my.categories (id, parent_id, name) VALUES (1, null, 'LEVEL 1');
INSERT INTO my.categories (id, parent_id, name) VALUES (2, 1, 'LEVEL 2.1');
INSERT INTO my.categories (id, parent_id, name) VALUES (3, 1, 'LEVEL 2.2');
INSERT INTO my.categories (id, parent_id, name) VALUES (4, 2, 'LEVEL 3.1.1');
INSERT INTO my.categories (id, parent_id, name) VALUES (5, 2, 'LEVEL 3.1.2');
INSERT INTO my.categories (id, parent_id, name) VALUES (6, 3, 'LEVEL 3.2.1');
+----+-----------+---------------+
| id | parent_id | name |
+----+-----------+---------------+
| 1 | null | 'LEVEL 1' |
| 2 | 1 | 'LEVEL 2.1' |
| 3 | 1 | 'LEVEL 2.2' |
| 4 | 2 | 'LEVEL 3.1.1' |
| 5 | 2 | 'LEVEL 3.1.2' |
| 6 | 3 | 'LEVEL 3.2.1' |
+----+-----------+---------------+
I need to get all id's for parent categories.
WITH RECURSIVE tree(theId) AS (
SELECT id
FROM my.categories
WHERE id = theId -- wrong here, because its not a procedure
UNION ALL
SELECT table1.id
FROM my.categories AS table1
JOIN tree AS parent ON theId = table1.parent_id
)
SELECT DISTINCT theId FROM tree WHERE theId = 6;
Example result with data but actually I need only id's.
+----+-----------+---------------+
| id | parent_id | name |
+----+-----------+---------------+
| 1 | null | 'LEVEL 1' |
| 3 | 1 | 'LEVEL 2.2' |
| 6 | 3 | 'LEVEL 3.2.1' |
+----+-----------+---------------+
Or like this:
+----+-----------+---------------+
| id | parent_id | name |
+----+-----------+---------------+
| 3 | 1 | 'LEVEL 2.2' |
| 6 | 3 | 'LEVEL 3.2.1' |
+----+-----------+---------------+
The trouble is I'm not allowed to use procedures. This query should be usable as a sub-query in many other queries. And please ignore the name column; it is irrelevant.
If I understand you correctly, this is what you need.
First, with the following query you can get all the parent ids:
WITH RECURSIVE t(id, parentlist) AS (
SELECT id, ARRAY[]::bigint[] FROM my.categories WHERE parent_id IS NULL
UNION
SELECT my.categories.id, my.categories.parent_id || t.parentlist
FROM my.categories
JOIN t ON categories.parent_id = t.id
) SELECT * FROM t
-- outputs:
-- id | parentlist
-- ----+------------
-- 1 | {}
-- 2 | {1}
-- 3 | {1}
-- 4 | {2,1}
-- 5 | {2,1}
-- 6 | {3,1}
If you want to get a record of the parents of one id you just need to change the query like:
WITH RECURSIVE t(id, parentlist) AS (
SELECT id, ARRAY[]::bigint[] FROM my.categories WHERE parent_id IS NULL
UNION
SELECT my.categories.id, my.categories.parent_id || t.parentlist
FROM my.categories
JOIN t ON categories.parent_id = t.id
) SELECT unnest(parentlist) as parents_ids FROM t WHERE id=6;
-- outputs:
-- parents_ids
-- -----------
-- 3
-- 1
Note that the last query does not output the "current" id (6).
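
As an alternative sketch, the recursion can also start at the requested id and walk upward, which avoids building the parent list for every category. The example below is an assumption-laden illustration: it uses Python's sqlite3 module instead of PostgreSQL, drops the my schema prefix, and introduces a hypothetical depth column and ancestor_ids helper. The CTE itself is standard SQL and should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER, parent_id INTEGER, name TEXT);
    INSERT INTO categories VALUES
        (1, NULL, 'LEVEL 1'),
        (2, 1, 'LEVEL 2.1'),
        (3, 1, 'LEVEL 2.2'),
        (4, 2, 'LEVEL 3.1.1'),
        (5, 2, 'LEVEL 3.1.2'),
        (6, 3, 'LEVEL 3.2.1');
""")


def ancestor_ids(conn, start_id):
    """Return start_id followed by its parents, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, parent_id, depth) AS (
            -- anchor: the category we start from
            SELECT id, parent_id, 0 FROM categories WHERE id = ?
            UNION ALL
            -- step: move from the current row to its parent
            SELECT c.id, c.parent_id, tree.depth + 1
            FROM categories AS c
            JOIN tree ON c.id = tree.parent_id
        )
        SELECT id FROM tree ORDER BY depth
    """, (start_id,)).fetchall()
    return [r[0] for r in rows]


chain = ancestor_ids(conn, 6)
print(chain)
```

Unlike the unnest query above, this variant includes the starting id itself; add a condition such as depth > 0 to the final SELECT to drop it.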

Update using subquery sets same value for all records

I'm trying to calculate the weight of each record based on the value of a column (updated_at). When I run the following query:
UPDATE buyers
SET weight = RankedRecords.rank / (RankedRecords.count + 1.0)
FROM (
SELECT
id,
RANK() OVER (
PARTITION BY board_list_id ORDER BY 'updated_at' ASC
) AS rank,
COUNT(id) OVER (PARTITION BY board_list_id) AS count
FROM buyers
) RankedRecords
WHERE buyers.id = RankedRecords.id
All records with the same board_list_id get their weight updated to the same value, while I expect all weight values to be different and to depend on rank.
Running just the subquery produces correct results (each record has different rank). But updating doesn't work as expected.
What should I change?
You have a very subtle mistake in your query. Try this instead:
UPDATE
buyers
SET
weight = RankedRecords.rank / (RankedRecords.count + 1.0)
FROM
(
SELECT
id,
rank() OVER (PARTITION BY board_list_id ORDER BY updated_at ASC) AS rank,
count(id) OVER (PARTITION BY board_list_id) AS count
FROM buyers
) RankedRecords
WHERE
buyers.id = RankedRecords.id ;
Your little mistake: ORDER BY 'updated_at' is just ORDER BY 'constant-text', so all rows in a partition tie and get the same rank. If you want to refer to the column, you either use "updated_at" (with double quotes) or updated_at (without them, because the column name consists only of lowercase ASCII characters).
Tried with:
CREATE TABLE buyers
(
id integer not null primary key,
board_list_id integer not null,
updated_at timestamp not null default now(),
weight double precision
) ;
INSERT INTO buyers (id, board_list_id, updated_at)
VALUES
(1, 1, '2017-01-09'),
(2, 1, '2017-01-10'),
(3, 1, '2017-01-11'),
(4, 1, '2017-01-12'),
(5, 2, '2017-01-09'),
(6, 2, '2017-01-10'),
(7, 2, '2017-01-11'),
(8, 1, '2017-01-12') ;
The result of the previous UPDATE (with a RETURNING * clause) would be:
|----+---------------+---------------------+--------+----+------+-------|
| id | board_list_id | updated_at | weight | id | rank | count |
|----+---------------+---------------------+--------+----+------+-------|
| 1 | 1 | 2017-01-09 00:00:00 | 0.1667 | 1 | 1 | 5 |
|----+---------------+---------------------+--------+----+------+-------|
| 2 | 1 | 2017-01-10 00:00:00 | 0.3333 | 2 | 2 | 5 |
|----+---------------+---------------------+--------+----+------+-------|
| 3 | 1 | 2017-01-11 00:00:00 | 0.5 | 3 | 3 | 5 |
|----+---------------+---------------------+--------+----+------+-------|
| 8 | 1 | 2017-01-12 00:00:00 | 0.6667 | 8 | 4 | 5 |
|----+---------------+---------------------+--------+----+------+-------|
| 4 | 1 | 2017-01-12 00:00:00 | 0.6667 | 4 | 4 | 5 |
|----+---------------+---------------------+--------+----+------+-------|
| 5 | 2 | 2017-01-09 00:00:00 | 0.25 | 5 | 1 | 3 |
|----+---------------+---------------------+--------+----+------+-------|
| 6 | 2 | 2017-01-10 00:00:00 | 0.5 | 6 | 2 | 3 |
|----+---------------+---------------------+--------+----+------+-------|
| 7 | 2 | 2017-01-11 00:00:00 | 0.75 | 7 | 3 | 3 |
|----+---------------+---------------------+--------+----+------+-------|
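
The effect of the quoting mistake can be reproduced in isolation. The following sketch compares ordering by the string literal with ordering by the column. It uses Python's sqlite3 module and a reduced buyers table as stand-ins for the PostgreSQL setup above, so treat it as an illustration under those assumptions; it needs an SQLite build with window functions (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buyers (id INTEGER PRIMARY KEY,
                         board_list_id INTEGER NOT NULL,
                         updated_at TEXT NOT NULL);
    INSERT INTO buyers VALUES
        (1, 1, '2017-01-09'), (2, 1, '2017-01-10'), (3, 1, '2017-01-11'),
        (5, 2, '2017-01-09'), (6, 2, '2017-01-10');
""")

# Single quotes make a string constant: every row compares equal,
# so all rows of a partition are peers and share rank 1.
by_literal = conn.execute("""
    SELECT id, rank() OVER (PARTITION BY board_list_id
                            ORDER BY 'updated_at') AS rnk
    FROM buyers ORDER BY id
""").fetchall()

# Without quotes the column is used and the ranks differ.
by_column = conn.execute("""
    SELECT id, rank() OVER (PARTITION BY board_list_id
                            ORDER BY updated_at) AS rnk
    FROM buyers ORDER BY id
""").fetchall()

print(by_literal)
print(by_column)
```

In the literal case every row gets rank 1, which matches the symptom in the question: all weights within one board_list_id come out identical.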