How to number groups using window function in Postgresql?


I have a table visits that shows the navigation of a user in a website. The ultimate goal is to have an estimation of the time a user spent in each area of the website.
| user_id | timestamp | area_visited | visit_id |
| ------- | ------------------------- | ------------- | -------------- |
| 1 | 2021-03-02 19:34:09.708+00| area1 |1 |
| 1 | 2021-03-02 19:34:16.53+00 | area2 |2 |
| 1 | 2021-03-02 19:34:18.697+00| area2 |2 |
| 1 | 2021-03-02 19:34:56.367+00| area1 |3 |
| 2 | 2021-03-02 19:35:16.53+00 | area1 |1 |
| 2 | 2021-03-02 19:36:52.53+00 | area3 |2 |
| 2 | 2021-03-02 19:38:16.53+00 | area3 |3 |
I tried to use dense_rank, but the result is not exactly what I need. I want to increment the visit_id field only when the user visits a new area of the website. If the user visits an area, then another one, and then comes back to the first, I still want that to count as a different visit_id.
I tried the following query, but it does not take the chronological order of the visits into account:
select *, dense_rank() over (order by user_id,area) as visit_id from visits
Then I tried this, but it does not work because each timestamp is unique:
select *, dense_rank() over (order by user_id,timestamp, area) as visit_id from visits
Any idea how to do this?
Thanks!

I think you are looking for this:
select *
, dense_rank() over (partition by user_id order by timestamp)
- dense_rank() over (partition by user_id,area_visited order by timestamp) as visit_id
from visits
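The difference of the two dense_rank values works for the sample rows, but it starts at 0, and two different visits can end up with the same number (for example, the sequence area1, area2, area1, area2 gives 0, 1, 1, 2). A possibly more robust sketch flags each row whose area differs from the previous row's area with lag() and then takes a running sum of those flags. The example below runs that query through Python's sqlite3 module only so that it is self-contained; the SQL uses standard window functions that PostgreSQL also supports. The names is_new_visit, flagged, and number_visits are made up for this illustration, and it assumes an SQLite build with window-function support.

```python
import sqlite3

# Sample rows copied from the question's table (timestamps kept as text,
# which sorts correctly here; in PostgreSQL the column would be a timestamptz).
VISITS = [
    (1, "2021-03-02 19:34:09.708+00", "area1"),
    (1, "2021-03-02 19:34:16.53+00", "area2"),
    (1, "2021-03-02 19:34:18.697+00", "area2"),
    (1, "2021-03-02 19:34:56.367+00", "area1"),
    (2, "2021-03-02 19:35:16.53+00", "area1"),
    (2, "2021-03-02 19:36:52.53+00", "area3"),
    (2, "2021-03-02 19:38:16.53+00", "area3"),
]

# Inner query: 1 when the area differs from the user's previous row (or there
# is no previous row), else 0. Outer query: running sum of that flag per user.
QUERY = """
SELECT user_id, timestamp, area_visited,
       SUM(is_new_visit) OVER (PARTITION BY user_id ORDER BY timestamp) AS visit_id
FROM (
    SELECT *,
           CASE WHEN area_visited = LAG(area_visited)
                         OVER (PARTITION BY user_id ORDER BY timestamp)
                THEN 0 ELSE 1 END AS is_new_visit
    FROM visits
) AS flagged
ORDER BY user_id, timestamp
"""


def number_visits(rows):
    """Load the rows into an in-memory table and return them with a visit_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (user_id INTEGER, timestamp TEXT, area_visited TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


visit_rows = number_visits(VISITS)
for row in visit_rows:
    print(row)
```

With this approach a return to an earlier area always gets a new visit_id, because only the immediately preceding row of the same user is compared.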

Related

Flatten Postgres left join query result with dynamic values into one row

I have two tables, products and product_attributs. One product can have one or many attributes, which are filled in through a dynamic web form (name and value inputs) added by the user as needed. For example, for a drill the user could decide to add two attributes: color=blue and power=100 watts. Another product could have three or more different attributes, and another could have none.
products
| id | name | identifier | identifier_type | active
| ----------|--------------|-------------|------------------|---
| 1 | Drill | AD44 | barcode | true
| 2 | Polisher | AP211C | barcode | true
| 3 | Jackhammer | AJ2133 | barcode | false
| 4 | Screwdriver | AS4778 | RFID | true
product_attributs
|id | name | value | product_id
|----------|--------------|-------------|----------
|1 | color | blue | 1
|2 | power | 100 watts | 1
|3 | size | 40 cm | 2
|4 | energy | electrical | 3
|4 | price | 35€ | 3
So the attributes can be anything and are set dynamically by the user. I need to generate a CSV report that contains all products with their attributes. Without much SQL experience, I came up with the following basic query:
SELECT pr.name, pr.identifier_type, pr.identifier, pr.active, att.name, att.value
FROM products as pr
LEFT JOIN product_attributs att ON pr.id = att.product_id
As you know, the result contains as many rows per product as it has attributes, which is not ideal for reporting. The ideal would be this:
|name | identifier_type | identifier | active | name | value | name | value
|-----------|-----------------|------------|--------|--------|-------|------ |------
|Drill | barcode | AD44 | true | color | blue | power | 100 w
|Polisher | barcode | AP211C | true | size | 40 cm | null | null
|Jackhammer | barcode | AJ2133 | true | energy | elect | price | 35 €
|Screwdriver| barcode | AS4778 | true | null | null | null | null
Here I only showed a maximum of two attributes per product, but there could be more if needed. I did some research and came across pivoting with the crosstab function in Postgres, but the problem is that it requires static values, which does not match my need.
Thanks a lot for your help, and sorry for any duplicates.

Thanks Laurenz Albe for your help. array_agg solved my problem. Here is the query, in case someone is interested:
SELECT
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active,
ARRAY_TO_STRING(ARRAY_AGG (oa.name || ' = ' || oa.value),', ') attributs
FROM
products pr
LEFT JOIN product_attributs oa ON pr.id = oa.product_id
GROUP BY
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active
ORDER BY
pr.name;
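To see the shape of the flattened result, here is a small runnable sketch using the sample rows from the question. It goes through Python's sqlite3 module, so it uses SQLite's group_concat as a stand-in for ARRAY_TO_STRING(ARRAY_AGG(...)); in PostgreSQL, string_agg(expr, ', ') plays the same role. Only the id and name columns of products are created, and the dictionary name flattened is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data copied from the question (including the repeated attribute id 4).
conn.executescript("""
CREATE TABLE products (id INTEGER, name TEXT);
CREATE TABLE product_attributs (id INTEGER, name TEXT, value TEXT, product_id INTEGER);
INSERT INTO products VALUES
    (1, 'Drill'), (2, 'Polisher'), (3, 'Jackhammer'), (4, 'Screwdriver');
INSERT INTO product_attributs VALUES
    (1, 'color', 'blue', 1), (2, 'power', '100 watts', 1),
    (3, 'size', '40 cm', 2), (4, 'energy', 'electrical', 3),
    (4, 'price', '35€', 3);
""")

# One row per product; all "name = value" pairs are joined into one string.
# Products without attributes yield NULL, because the aggregate skips NULLs.
query = """
SELECT pr.name,
       group_concat(oa.name || ' = ' || oa.value, ', ') AS attributs
FROM products pr
LEFT JOIN product_attributs oa ON pr.id = oa.product_id
GROUP BY pr.id, pr.name
ORDER BY pr.name
"""
flattened = dict(conn.execute(query).fetchall())
conn.close()

for name, attributs in flattened.items():
    print(name, "->", attributs)
```

The order of the pairs inside each string is not guaranteed here; in PostgreSQL an ORDER BY inside the aggregate call can make it deterministic.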

'Smart' grouping in postgres

I need to group my rows by a specific condition.
|id | address | last_name | Count of purchases | customer_number |
|1 | de Berlin | name_1 | 1 | 11111 |
|2 | de Berlin | name_2 | 1 | 12345 |
|3 | de Berlin | name_1 | 1 | 12345 |
So the problem is that I need to group by address AND last_name, BUT in that case the row with ID = 2 will not be in the set because it has a different last_name, even though it shares the same customer_number with the row with ID = 3. Can I do this with one query?
So basically I want to get something like
select SUM(Count_of_purchases), array_agg(last_name), array_agg(customer_number)
from table
group by f(address, last_name, customer_number)
| 3 | {name_1, name_2} | {11111, 12345} |
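One way to read this requirement is as a connected-components problem: two rows belong together if they share the same (address, last_name) pair or the same customer_number, and the link is transitive. That reading is an assumption, not something the question states outright. Under it, the grouping can be sketched outside SQL with a small union-find structure; the function group_rows and its field names are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question:
# (id, address, last_name, count_of_purchases, customer_number)
ROWS = [
    (1, "de Berlin", "name_1", 1, 11111),
    (2, "de Berlin", "name_2", 1, 12345),
    (3, "de Berlin", "name_1", 1, 12345),
]


def group_rows(rows):
    """Merge rows linked by (address, last_name) or by customer_number."""
    parent = {row[0]: row[0] for row in rows}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Remember the first row seen for each linking key; later rows with the
    # same key are merged into that row's component.
    first_seen = {}
    for row_id, address, last_name, _, customer_number in rows:
        keys = (("name", address, last_name), ("customer", customer_number))
        for key in keys:
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    # Aggregate each component, mirroring SUM(...) and array_agg(...).
    groups = defaultdict(
        lambda: {"purchases": 0, "last_names": set(), "customer_numbers": set()}
    )
    for row_id, _, last_name, purchases, customer_number in rows:
        group = groups[find(row_id)]
        group["purchases"] += purchases
        group["last_names"].add(last_name)
        group["customer_numbers"].add(customer_number)
    return list(groups.values())


grouped = group_rows(ROWS)
print(grouped)
```

For the three sample rows this produces a single group, because row 3 links to row 1 through the name and to row 2 through the customer number.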

Sort partitions and rows inside the partitions

I am following this tutorial:
http://www.postgresqltutorial.com/postgresql-window-function/
I'm looking for a case that is not described in the tutorial, and I haven't found a solution.
At one point in the tutorial, this SELECT query is used to display the products grouped by group name, with their prices sorted in ascending order within each group:
SELECT
product_name,
group_name,
price,
ROW_NUMBER () OVER (
PARTITION BY group_name
ORDER BY
price
)
FROM
products
INNER JOIN product_groups USING (group_id);
I would like to keep the rows sorted by price as in the example AND sort the partitions in descending alphabetical order.
How can I modify the query to obtain this result?

ORDER BY can be followed by a comma-separated list of sort_expressions. Use ASC or DESC to set the sort direction for each expression. ASC (ascending order) is the default sort direction.
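Thus, you could use ORDER BY group_name DESC, price. Note that the ORDER BY inside OVER only determines how row_number is assigned within each partition; to guarantee the order of the result rows, put the ORDER BY at the end of the query:
SELECT
product_name,
group_name,
price,
ROW_NUMBER () OVER (
PARTITION BY group_name
ORDER BY
price
)
FROM
products
INNER JOIN product_groups USING (group_id)
ORDER BY
group_name DESC, price;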
yields
| product_name | group_name | price | row_number |
|--------------------+------------+---------+------------|
| Kindle Fire | Tablet | 150.00 | 1 |
| Samsung Galaxy Tab | Tablet | 200.00 | 2 |
| iPad | Tablet | 700.00 | 3 |
| Microsoft Lumia | Smartphone | 200.00 | 1 |
| HTC One | Smartphone | 400.00 | 2 |
| Nexus | Smartphone | 500.00 | 3 |
| iPhone | Smartphone | 900.00 | 4 |
| Lenovo Thinkpad | Laptop | 700.00 | 1 |
| Sony VAIO | Laptop | 700.00 | 2 |
| Dell Vostro | Laptop | 800.00 | 3 |
| HP Elite | Laptop | 1200.00 | 4 |

SUM values from two tables with GROUP BY and WHERE

I have two tables below named sent_table and received_table. I am attempting to mash them together in a query to achieve output_table. All my attempts so far result in a huge amount of duplicates and totally bogus sum values.
I am assuming I would need to use GROUP BY and WHERE to achieve this goal. I want to be able to filter based on the user's name.
sent_table
+----+------+-------+----------+
| id | name | value | order_id |
+----+------+-------+----------+
| 1 | dave | 100 | 1 |
| 2 | dave | 200 | 1 |
| 3 | dave | 300 | 2 |
+----+------+-------+----------+
received_table
+----+------+-------+----------+
| id | name | value | order_id |
+----+------+-------+----------+
| 1 | dave | 400 | 1 |
| 2 | dave | 500 | 2 |
| 3 | dave | 600 | 2 |
+----+------+-------+----------+
output table
+------+----------+----------+
| sent | received | order_id |
+------+----------+----------+
| 300 | 400 | 1 |
| 300 | 1100 | 2 |
+------+----------+----------+
I tried the following with no joy. This does not impose any restrictions on how I would desire to solve this problem. It is just how I attempted to do it.
SELECT *
FROM
( select SUM(value) as sent, order_id FROM sent_table WHERE name='dave' GROUP BY order_id) A
CROSS JOIN
( select SUM(value) as received, order_id FROM received_table WHERE name='dave' GROUP BY order_id) B
Any help would be greatly appreciated.

Do the sums on each table, grouping by order_id, then join the results. To get the rows even if one side is missing, do a FULL OUTER JOIN:
SELECT COALESCE(s.order_id, r.order_id) AS order_id, s.sent, r.received
FROM (
SELECT order_id, SUM(value) AS sent
FROM sent
GROUP BY order_id
) s
FULL OUTER JOIN (
SELECT order_id, SUM(value) AS received
FROM received
GROUP BY order_id
) r
USING (order_id)
ORDER BY 1
Result:
| order_id | sent | received |
| -------- | ---- | -------- |
| 1 | 300 | 400 |
| 2 | | 1100 |
Note the COALESCE on the order_id: if it's missing from sent it will be taken from received, so that value will never be NULL.
If you want to have 0 in place of NULL (when e.g. there is no record for that order_id in either sent or received), you would do COALESCE(s.sent, 0) AS sent, COALESCE(r.received, 0) AS received.
https://www.db-fiddle.com/f/nq3xYrcys16eUrBRHT6xLL/2
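For a quick check against the question's own sample data, here is a runnable sketch using the table names from the question (sent_table, received_table) and the filter on the user's name. It runs through Python's sqlite3 module, and because not every SQLite build supports FULL OUTER JOIN, it swaps in a different technique: stack both tables with UNION ALL and aggregate once. That variant also keeps order_ids that exist on only one side. The alias combined and the variable totals are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data copied from the question.
conn.executescript("""
CREATE TABLE sent_table (id INTEGER, name TEXT, value INTEGER, order_id INTEGER);
CREATE TABLE received_table (id INTEGER, name TEXT, value INTEGER, order_id INTEGER);
INSERT INTO sent_table VALUES
    (1, 'dave', 100, 1), (2, 'dave', 200, 1), (3, 'dave', 300, 2);
INSERT INTO received_table VALUES
    (1, 'dave', 400, 1), (2, 'dave', 500, 2), (3, 'dave', 600, 2);
""")

# Each branch fills only its own column; SUM ignores the NULLs from the
# other branch, so every table is summed exactly once per order_id.
QUERY = """
SELECT order_id, SUM(sent) AS sent, SUM(received) AS received
FROM (
    SELECT order_id, value AS sent, NULL AS received
    FROM sent_table WHERE name = :name
    UNION ALL
    SELECT order_id, NULL AS sent, value AS received
    FROM received_table WHERE name = :name
) AS combined
GROUP BY order_id
ORDER BY order_id
"""
totals = conn.execute(QUERY, {"name": "dave"}).fetchall()
conn.close()

for order_id, sent, received in totals:
    print(order_id, sent, received)
```

Because the rows are stacked rather than joined, no sent row is ever paired with several received rows, which is what inflates the sums when the raw tables are joined directly.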

Using HiveQL, how do I pull the row with the highest integer?

I have a table with a few million rows of data that looks like this:
+---------------+--------------+-------------------+
| page | search_term | interactions |
+---------------+--------------+-------------------+
| /mom | pizza | 15 |
| /dad | pizza | 8 |
| /uncle | pizza | 2 |
| /brother | pizza | 7 |
| /mom | pasta | 12 |
| /dad | pasta | 23 |
+---------------+--------------+-------------------+
My goal is to run a HiveQL Query that will return the largest 'interactions' number for each unique page/term combo. For example:
+---------------+--------------+-------------------+
| page | search_term | interactions |
+---------------+--------------+-------------------+
| /dad | pasta | 23 |
| /mom | pizza | 15 |
+---------------+--------------+-------------------+
How would I write this considering that each unique page has hundreds of thousands of search_terms, but I only want to pull the one search_term with the most interactions?
I have tried using max(interactions) and max(struct(interactions, search_term)).col1 but have had no luck. My output is consistently giving me all of the search_terms for each page no matter how many interactions.
Thanks!

Use row_number() analytic function:
select page, search_term, interactions
from
(select page, search_term, interactions,
row_number() over (partition by page order by interactions desc) rn
from your_table -- placeholder: the question does not name the table
) s
where rn = 1;
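The same row_number() pattern can be tried on the sample rows from the question. The sketch below runs it through Python's sqlite3 module rather than Hive, purely so it is self-contained; the window-function syntax is the same. The table name page_searches is hypothetical, since the question does not name the table, and an SQLite build with window-function support is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data copied from the question; page_searches is a made-up table name.
conn.executescript("""
CREATE TABLE page_searches (page TEXT, search_term TEXT, interactions INTEGER);
INSERT INTO page_searches VALUES
    ('/mom', 'pizza', 15), ('/dad', 'pizza', 8), ('/uncle', 'pizza', 2),
    ('/brother', 'pizza', 7), ('/mom', 'pasta', 12), ('/dad', 'pasta', 23);
""")

# Rank the search terms of each page by interactions, then keep rank 1 only.
query = """
SELECT page, search_term, interactions
FROM (
    SELECT page, search_term, interactions,
           row_number() OVER (PARTITION BY page ORDER BY interactions DESC) AS rn
    FROM page_searches
) s
WHERE rn = 1
ORDER BY page
"""
top_terms = conn.execute(query).fetchall()
conn.close()

for row in top_terms:
    print(row)
```

If two search terms tie for the most interactions on a page, row_number() picks one of them arbitrarily; rank() would keep both.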
