PostgreSQL materialized view with global and partitioned ranks - postgresql

I have a table with 10 million user ratings.
I need to create a materialized view that holds a global rank and a rank by country, and is refreshed once a day.
I came up with the following select query:
SELECT row_number() OVER(ORDER BY value DESC, id) AS rank_global,
row_number() OVER(PARTITION BY country ORDER BY value DESC, id) AS rank_country,
*
FROM rate
ORDER BY value DESC, id
LIMIT 100000
Is there a way to speed up this query, or another way to get the same result? I created btree indexes on (value DESC, id) and (country, value DESC, id), but the query still takes a long time to complete.
Example:
Creating a table and populating it with users with random value column and random country:
CREATE TABLE rate
(
id serial NOT NULL,
name text,
value integer NOT NULL DEFAULT 0,
country character varying,
CONSTRAINT rate_pkey PRIMARY KEY (id)
);
INSERT INTO rate
(SELECT n, ('user_'||n), (random()*30)::int, ('country_'||(random()*3)::int)
FROM generate_series(0,10) AS n);
CREATE INDEX rate_country_value_id_index
ON rate
USING btree(country, value DESC, id);
CREATE INDEX rate_value_id_index
ON rate
USING btree(value DESC, id);
Table contents:
id name value country
0 user_0 28 country_2
1 user_1 24 country_2
2 user_2 29 country_1
3 user_3 11 country_1
4 user_4 16 country_1
5 user_5 28 country_0
6 user_6 3 country_1
7 user_7 7 country_1
8 user_8 28 country_1
9 user_9 4 country_0
10 user_10 29 country_1
Then I create materialized view:
CREATE MATERIALIZED VIEW rate_view AS
SELECT row_number() OVER (ORDER BY value DESC, id) AS rgl,
row_number() OVER (PARTITION BY country ORDER BY value DESC, id) AS rc,
*
FROM rate
ORDER BY value DESC, id;
View contents (rgl - global rank, rc - rank by country):
rgl rc id name value country
1 1 2 user_2 29 country_1
2 2 10 user_10 29 country_1
3 1 0 user_0 28 country_2
4 1 5 user_5 28 country_0
5 3 8 user_8 28 country_1
6 2 1 user_1 24 country_2
7 4 4 user_4 16 country_1
8 5 3 user_3 11 country_1
9 6 7 user_7 7 country_1
10 2 9 user_9 4 country_0
11 7 6 user_6 3 country_1
Now I can write more complex queries that select the user with the closest rank and its neighbours by rank, both globally and by country.
For example (after creating indexes on (value, id) and on (rgl) on the view), here are the global top 50 plus the 5 users closest by rank to value 9999942:
(
WITH closest_rank AS
(
SELECT rgl FROM rate_view
WHERE value <= 9999942
ORDER BY value DESC, id ASC
LIMIT 1
)
SELECT rgl, name, value
FROM rate_view
WHERE rgl > (SELECT rgl-3 FROM closest_rank )
ORDER BY rgl ASC
LIMIT 5
)
UNION
SELECT rgl, name, value
FROM rate_view
WHERE rgl <=50
ORDER BY rgl;
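
To check the two window rankings on a small data set, here is a minimal sketch that runs the same SELECT against the eleven sample rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. That is an assumption made only to keep the example self-contained: window functions need SQLite 3.25 or newer, and the sketch says nothing about performance on 10 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rate (id INTEGER PRIMARY KEY, name TEXT, "
    "value INTEGER NOT NULL DEFAULT 0, country TEXT)"
)

# The eleven sample rows from the "Table contents" listing.
rows = [
    (0, "user_0", 28, "country_2"), (1, "user_1", 24, "country_2"),
    (2, "user_2", 29, "country_1"), (3, "user_3", 11, "country_1"),
    (4, "user_4", 16, "country_1"), (5, "user_5", 28, "country_0"),
    (6, "user_6", 3, "country_1"), (7, "user_7", 7, "country_1"),
    (8, "user_8", 28, "country_1"), (9, "user_9", 4, "country_0"),
    (10, "user_10", 29, "country_1"),
]
conn.executemany("INSERT INTO rate VALUES (?, ?, ?, ?)", rows)

# Same two window functions as the materialized view: one global
# numbering and one numbering restarted per country.
ranked = conn.execute("""
    SELECT row_number() OVER (ORDER BY value DESC, id) AS rgl,
           row_number() OVER (PARTITION BY country ORDER BY value DESC, id) AS rc,
           id, name, value, country
    FROM rate
    ORDER BY value DESC, id
""").fetchall()

for row in ranked:
    print(row)
```

Because id is the tie-breaker in both ORDER BY clauses, the numbering is deterministic even when several users share the same value.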

Related

Typeorm order by after distinct on with postgresql

I have a table below:
id product_id price
1 1 100
2 1 150
3 2 120
4 2 190
5 3 100
6 3 80
I want to select the cheapest price for each product and sort the results by price.
Expected output:
id product_id price
6 3 80
1 1 100
3 2 120
What I have tried so far:
`
repository.createQueryBuilder('products')
.orderBy('products.id')
.distinctOn(['products.id'])
.addOrderBy('price')
`
This query returns the cheapest products but does not sort them, so addOrderBy has no effect on the result. Is there a way to sort products after distinctOn?
SELECT id,
product_id,
price
FROM (SELECT id,
product_id,
price,
Dense_rank()
OVER (
partition BY product_id
ORDER BY price ASC) dr
FROM product) inline_view
WHERE dr = 1
ORDER BY price ASC;
Setup:
postgres=# create table product(id int, product_id int, price int);
CREATE TABLE
postgres=# insert into product values (1,1,100),(2,1,150),(3,2,120),(4,2,190),(5,3,100),(6,3,80);
INSERT 0 6
Output
id | product_id | price
----+------------+-------
6 | 3 | 80
1 | 1 | 100
3 | 2 | 120
(3 rows)

Select rows with second highest value for each ID repeated multiple times

Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried as below
Select distinct
id,
(select max(values)
from table
where values not in (select max(values) from table)
)
You need the row_number window function. This adds a column with a row count for each group (in your case the ids). In a subquery you are able to ask for the second row of each group.
SELECT
id, values
FROM (
SELECT
*,
row_number() OVER (PARTITION BY id ORDER BY values DESC)
FROM
table
) s
WHERE row_number = 2
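
As a quick check of the approach, here is a minimal sketch that runs the same row_number() query on the sample data using Python's built-in sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The names t, val and rn are hypothetical stand-ins, chosen because table and values are reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" and "val" are illustrative names, not the asker's real schema.
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (1, 40),
     (2, 3), (2, 9), (2, 0),
     (3, 14), (3, 5), (3, 7)],
)

# Number the rows of each id from highest to lowest value,
# then keep only the row numbered 2.
second_highest = conn.execute("""
    SELECT id, val
    FROM (
        SELECT id, val,
               row_number() OVER (PARTITION BY id ORDER BY val DESC) AS rn
        FROM t
    ) AS s
    WHERE rn = 2
    ORDER BY id
""").fetchall()

print(second_highest)  # [(1, 30), (2, 3), (3, 7)]
```

If two rows of the same id can share a value, row_number() picks between them arbitrarily; dense_rank() would instead give tied values the same rank.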

Several top numbers in a column T-SQL

I have a table called _Invoice in SQL Server 2016 - like this:
Company InvoiceNo
-----------------
10 1
10 2
10 3
20 1
20 2
20 3
20 4
I want to get the highest value from all companies.
Like this:
Company InvoiceNo
-----------------
10 3
20 3
I then want to use this data to update another table called InvoiceSeries, wherever the InvoiceNo is higher than the NextNo in the InvoiceSeries table.
I am stuck on getting the highest InvoiceNo:
UPDATE InvoiceSeries
SET NextNo = -- Highest number from each company--
FROM InvoiceSeries ise
JOIN _Invoice i ON ise.InvoiceSeries = i.InvoiceSeries
WHERE i.InvoiceNo > ise.NextNo
Some example data:
Columns in InvoiceSeries:
Company NextNo
10 9007
20 1001
Columns in _Invoices:
Company InvoiceNo
10 9008
10 9009
10 9010
10 9011
10 9012
20 1002
20 1003
20 1004
If I understand correctly, you are looking for the HIGHEST common invoice number
Example
Select A.*
From YourTable A
Join (
Select Top 1 with ties
InvoiceNo
From YourTable
Group By InvoiceNo
Having count(Distinct Company) = (Select count(Distinct Company) From YourTable)
Order By InvoiceNo Desc
) B on A.InvoiceNo=B.InvoiceNo
Returns
Company InvoiceNo
10 3
20 3
EDIT - Updated for comment
Select company
,Invoice=max(invoiceno)
From YourTable
Group By company
This answer assumes there will be a record in the Invoice Series table.
--Insert Sample Data
CREATE TABLE #_Invoice (Company INT, InvoiceNo INT)
INSERT INTO #_Invoice(Company, InvoiceNo)
VALUES
(10 , 1),
(10 , 2),
(10 , 3),
(20 , 1),
(20 , 2),
(20 , 3),
(20 , 4)
CREATE TABLE #InvoiceSeries(Company INT, NextNo INT)
INSERT INTO #InvoiceSeries(Company, NextNo)
VALUES
(10, 1),
(20 ,1)
UPDATE s
SET NextNo = MaxInvoiceNo
FROM #InvoiceSeries s
INNER JOIN (
--Get the Max invoice number per company
SELECT Company, MAX(InvoiceNo) as MaxInvoiceNo
FROM #_Invoice
GROUP BY Company
) i on i.Company = s.Company
AND s.NextNo < i.MaxInvoiceNo --Only join to records where the 'nextno' is less than the max
--Confirm results
SELECT * FROM #InvoiceSeries
DROP TABLE #InvoiceSeries
DROP TABLE #_Invoice
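
For readers without a SQL Server instance at hand, the same idea can be sketched with Python's built-in sqlite3 module, using the example data from the question. This is an illustration under assumptions, not the T-SQL above: the table and column names (invoice, invoice_series, invoice_no, next_no) are hypothetical, and a correlated subquery replaces the UPDATE ... FROM join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (company INTEGER, invoice_no INTEGER);
    CREATE TABLE invoice_series (company INTEGER, next_no INTEGER);
    INSERT INTO invoice VALUES
        (10, 9008), (10, 9009), (10, 9010), (10, 9011), (10, 9012),
        (20, 1002), (20, 1003), (20, 1004);
    INSERT INTO invoice_series VALUES (10, 9007), (20, 1001);
""")

# Raise next_no to the company's highest invoice number, but only
# where that maximum is actually higher than the current next_no.
# A company with no invoices yields NULL and is left untouched.
conn.execute("""
    UPDATE invoice_series
    SET next_no = (SELECT MAX(i.invoice_no) FROM invoice AS i
                   WHERE i.company = invoice_series.company)
    WHERE next_no < (SELECT MAX(i.invoice_no) FROM invoice AS i
                     WHERE i.company = invoice_series.company)
""")

series = conn.execute(
    "SELECT company, next_no FROM invoice_series ORDER BY company"
).fetchall()
print(series)  # [(10, 9012), (20, 1004)]
```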

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the gaps-and-islands problem; specifically, it's an islands problem over a date range. You should:
Fix this question up.
Flag it to be sent to dba.stackexchange.com.
To solve this:
Create a pseudo column with a window function that is 1 if the row preceding it does not correspond to the preceding month.
Create groups out of that with COUNT().
Check that count(*) for each group is greater than or equal to three.
Query:
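-- Assumes one row per id per month (for example, the output of the monthly aggregate in the question).
SELECT l.id, l.creationdaterange, count(*)
FROM (
SELECT t.id,
t.creationdate,
-- the running count of resets numbers each run of consecutive months
count(t.range_reset) OVER (PARTITION BY t.id ORDER BY t.creationdate) AS creationdaterange
FROM (
SELECT id,
creationdate,
-- 1 marks the start of a new run: the previous row is missing or is not the preceding month
CASE
WHEN date_trunc('month', creationdate) - interval '1 month'
IS DISTINCT FROM date_trunc('month', lag(creationdate) OVER (PARTITION BY id ORDER BY creationdate))
THEN 1
END AS range_reset
FROM post
) AS t
) AS l
GROUP BY l.id, l.creationdaterange
HAVING count(*) >= 3;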
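
The steps above can be sketched end to end on a tiny data set. This is a minimal illustration using Python's built-in sqlite3 module rather than PostgreSQL (SQLite 3.25 or newer assumed). The sample rows are hypothetical, and the posts/owneruserid/creationdate names follow the question rather than the answer's query. Months are turned into a single counter (year * 12 + month) so that December followed by January counts as consecutive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER, creationdate TEXT)"
)

# Hypothetical data: user 8 posts three times in each of three
# consecutive months; user 12 has a gap (no posts in June).
activity = {8: ["2015-11", "2015-12", "2016-01"], 12: ["2015-04", "2015-05", "2015-07"]}
sample = [(user, f"{month}-{day}")
          for user, months in activity.items()
          for month in months
          for day in ("03", "10", "17")]
conn.executemany("INSERT INTO posts (owneruserid, creationdate) VALUES (?, ?)", sample)

streaks = conn.execute("""
    WITH per_post AS (
        SELECT owneruserid,
               CAST(strftime('%Y', creationdate) AS INTEGER) * 12
                 + CAST(strftime('%m', creationdate) AS INTEGER) AS month_no
        FROM posts
    ),
    monthly AS (          -- months in which the user posted at least 3 times
        SELECT owneruserid, month_no
        FROM per_post
        GROUP BY owneruserid, month_no
        HAVING count(*) >= 3
    ),
    flagged AS (          -- 1 when the previous qualifying month is not month_no - 1
        SELECT owneruserid, month_no,
               CASE WHEN month_no - 1 = lag(month_no) OVER (
                             PARTITION BY owneruserid ORDER BY month_no)
                    THEN 0 ELSE 1 END AS starts_run
        FROM monthly
    ),
    grouped AS (          -- the running sum of the flags numbers each run
        SELECT owneruserid,
               sum(starts_run) OVER (PARTITION BY owneruserid ORDER BY month_no) AS run_no
        FROM flagged
    )
    SELECT owneruserid, count(*) AS months_in_a_row
    FROM grouped
    GROUP BY owneruserid, run_no
    HAVING count(*) >= 3
    ORDER BY owneruserid
""").fetchall()

print(streaks)  # [(8, 3)]
```

Note that extracting only the month, as in the question's query, would merge the same month of different years; including the year in the counter avoids that.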

Build a query that pulls records based on a value in a column

My table has a parent/child relationship, along the lines of parent.id,id. There is also a column that contains a quantity, and another ID representing a grand-parent, like so:
id parent.id qty Org
1 1 1 100
2 1 0 100
3 1 4 100
4 4 1 101
5 4 2 101
6 6 1 102
7 6 0 102
8 6 1 102
What this is supposed to show is that ID 1 is the parent, IDs 2 and 3 are children that belong to ID 1, and IDs 1, 2, and 3 all belong to the grandparent 100.
If any child or parent has QTY = 0, I would like to know what all the other IDs associated with that parent are, and what all the other parents associated with that grandparent are.
For example, I would want to see a report that shows me this:
Org id parent.id qty
100 1 1 1
100 2 1 0
100 3 1 4
102 6 6 1
102 7 6 0
102 8 6 1
I would much appreciate any help you can offer in building an MS SQL 2000 (yeah, I know) query to handle this.
Try this
select * from tablename a
where exists (select 1 from tablename x
where x.parent_id = a.parent_id and qty = 0)
Example:
;with cte as
( select 1 id,1 parent_id, 1 qty, 100 org
union all select 2,1,0,100
union all select 3,1,4,100
union all select 4,4,1,101
union all select 5,4,2,101
union all select 6,6,1,102
union all select 7,6,0,102
union all select 8,6,1,102
)
select * from cte a
where exists (select 1 from cte x
where x.parent_id = a.parent_id and qty = 0)
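
The EXISTS filter can be tried out without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module; the table name items is hypothetical. It runs the query once matching on parent_id, as above, and once matching on org, which is an assumed variant for the "other parents under the same grandparent" part of the question. On the sample data both return the same six rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, parent_id INTEGER, qty INTEGER, org INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 100), (2, 1, 0, 100), (3, 1, 4, 100),
     (4, 4, 1, 101), (5, 4, 2, 101),
     (6, 6, 1, 102), (7, 6, 0, 102), (8, 6, 1, 102)],
)

# Keep a row when some row sharing the same key column has qty = 0.
query = """
    SELECT a.org, a.id, a.parent_id, a.qty
    FROM items AS a
    WHERE EXISTS (SELECT 1 FROM items AS x
                  WHERE x.{col} = a.{col} AND x.qty = 0)
    ORDER BY a.id
"""
by_parent = conn.execute(query.format(col="parent_id")).fetchall()
by_org = conn.execute(query.format(col="org")).fetchall()

for row in by_org:
    print(row)
```

The two variants differ only when an org contains several parents and just one of them has a zero quantity; matching on org then also returns the rows of the sibling parents.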