Conditional Row_Number() for min and maximum date - tsql

I've got a table with data which looks like this:
Table T1
+----+------------+------------+
| ID | Udate | last_code |
+----+------------+------------+
| 1 | 05/11/2018 | ATTEMPT |
| 1 | 03/11/2018 | ATTEMPT |
| 1 | 01/11/2017 | INFO |
| 1 | 25/10/2016 | ARRIVED |
| 1 | 22/9/2016 | ARRIVED |
| 1 | 14/9/2016 | SENT |
| 1 | 1/9/2016 | SENT |
+----+------------+------------+
| 2 | 26/10/2016 | RECEIVED |
| 2 | 19/10/2016 | ARRIVED |
| 2 | 18/10/2016 | ARRIVED |
| 2 | 14/10/2016 | ANNOUNCED |
| 2 | 23/9/2016 | INFO |
| 2 | 14/9/2016 | DAMAGE |
| 2 | 2/9/2016 | SCHEDULED |
+----+------------+------------+
Each id has multiple codes at different dates and there is no pattern for them.
Overall I'm trying to get the last date and code, but if there is an "ATTEMPT" code, I need to get the first date and that code for each individual ID. Based on the table above, I would get:
+----+------------+------------+
| ID | Udate | last_code |
| 1 | 03/11/2018 | ATTEMPT |
| 2 | 26/10/2016 | RECEIVED |
+----+------------+------------+
I've been trying:
ROW_NUMBER() OVER (PARTITION BY ID
ORDER BY
(CASE WHEN code = 'ATTEMPT' THEN u_date END) ASC,
(CASE WHEN code_key <> 'ATTEMPT' THEN u_date END) DESC
) as RN
And at the moment I've been stuck after using ROW_NUMBER() twice, but can't think of a way to bring them together in the same table:
,ROW_NUMBER() OVER (PARTITION BY id, code order by udate asc) as RN1
,ROW_NUMBER() OVER (PARTITION BY id order by udate desc) AS RN2
I'm not very familiar with CTEs, and I think it's one of those queries that perhaps requires one.
Thanks.

I think you have a couple of options before attempting a CTE.
Give these a try, examples below:
DECLARE @TestData TABLE
(
[ID] INT
, [Udate] DATE
, [last_code] NVARCHAR(100)
);
INSERT INTO @TestData (
[ID]
, [Udate]
, [last_code]
)
VALUES ( 1, '11/05/2018', 'ATTEMPT ' )
, ( 1, '11/03/2018', 'ATTEMPT' )
, ( 1, '11/01/2017', 'INFO' )
, ( 1, '10/25/2016', 'ARRIVED' )
, ( 1, '9/22/2016 ', 'ARRIVED' )
, ( 1, '9/14/2016 ', 'SENT' )
, ( 1, '9/1/2016 ', 'SENT' )
, ( 2, '10/26/2016', 'RECEIVED' )
, ( 2, '10/19/2016', 'ARRIVED' )
, ( 2, '10/18/2016', 'ARRIVED' )
, ( 2, '10/14/2016', 'ANNOUNCED' )
, ( 2, '9/23/2016 ', 'INFO' )
, ( 2, '9/14/2016 ', 'DAMAGE' )
, ( 2, '9/2/2016 ', 'SCHEDULED' );
--option 1
--a couple of OUTER APPLYs:
--1 - to get the min date for ATTEMPT
--2 - to get the max date regardless of the code
--The WHERE clause uses COALESCE to pick the date: use the ATTEMPT date if there is one, otherwise use the max date.
SELECT [a].*
FROM @TestData [a]
OUTER APPLY (
SELECT [b].[ID]
, MIN([b].[Udate]) AS [AttemptUdate]
FROM @TestData [b]
WHERE [b].[ID] = [a].[ID]
AND [b].[last_code] = 'ATTEMPT'
GROUP BY [b].[ID]
) AS [aa]
OUTER APPLY (
SELECT [c].[ID]
, MAX([c].[Udate]) AS [MaxUdate]
FROM @TestData [c]
WHERE [c].[ID] = [a].[ID]
GROUP BY [c].[ID]
) AS [cc]
WHERE [a].[ID] = COALESCE([aa].[ID], [cc].[ID])
AND [a].[Udate] = COALESCE([aa].[AttemptUdate], [cc].[MaxUdate]);
--option 2
--use window functions
--Similar in that we are finding the max Udate and also the min Udate when last_code = 'ATTEMPT',
--then using COALESCE in the WHERE clause to evaluate which one to use.
--Maybe a little cleaner
SELECT [td].[ID]
, [td].[Udate]
, [td].[last_code]
FROM (
SELECT [ID]
, [last_code]
, [Udate]
, MAX([Udate]) OVER ( PARTITION BY [ID] ) AS [MaxUdate]
, MIN( CASE WHEN [last_code] = 'ATTEMPT' THEN [Udate]
ELSE NULL
END
) OVER ( PARTITION BY [ID] ) AS [AttemptUdate]
FROM @TestData
) AS [td]
WHERE [td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);
To explain a little how I got there: it was primarily based on your requirement:
Overall I'm trying to get the last date and code, but if there is an
"ATTEMPT" code, I need to get the first date and that code for each
individual ID.
So for each ID I needed a way to get:
Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null
Maximum Udate for all records per ID
If I can determine both values for each record based on ID, then my final result set is basically the records where Udate equals the minimum ATTEMPT Udate or, if that minimum is NULL, the maximum Udate.
The first option, using two OUTER APPLYs, does each of the points above.
Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null:
OUTER APPLY (
SELECT [b].[ID]
, MIN([b].[Udate]) AS [AttemptUdate]
FROM @TestData [b]
WHERE [b].[ID] = [a].[ID]
AND [b].[last_code] = 'ATTEMPT'
GROUP BY [b].[ID]
) AS [aa]
It's an OUTER APPLY because I might not have an ATTEMPT record for a given ID; in those situations it returns NULL.
Maximum Udate for all records per ID:
OUTER APPLY (
SELECT [c].[ID]
, MAX([c].[Udate]) AS [MaxUdate]
FROM @TestData [c]
WHERE [c].[ID] = [a].[ID]
GROUP BY [c].[ID]
) AS [cc]
Then the where clause compares what was returned by those to return only the records I want:
[a].[Udate] = COALESCE([aa].[AttemptUdate], [cc].[MaxUdate]);
I'm using COALESCE to handle and evaluate NULLs. COALESCE evaluates its arguments from left to right and returns the first non-NULL value.
So using this with Udate, we can evaluate which Udate value I should use in my filter to satisfy the requirement.
If I had an ATTEMPT record, the field AttemptUdate would have a value and be used in the filter first. If I didn't have an ATTEMPT record, AttemptUdate would be NULL, so MaxUdate would be used instead.
Option 2 is similar, just going after it a little differently.
Minimum Udate for last_code = 'ATTEMPT' per ID - if there was no ATTEMPT we'll get a null:
MIN( CASE WHEN [last_code] = 'ATTEMPT' THEN [Udate]
ELSE NULL
END
) OVER ( PARTITION BY [ID] ) AS [AttemptUdate]
MIN on Udate, but I use a CASE expression to evaluate whether that record is an ATTEMPT or not. OVER (PARTITION BY ...) does that based on how I tell it to partition the data: by ID.
Maximum Udate for all records per ID:
MAX([Udate]) OVER ( PARTITION BY [ID] ) AS [MaxUdate]
Go get me the maximum Udate based on ID, since that's how I told it to partition it.
I do all that in a sub-query to make the where clause easier to work with. Then it's the same as before when filtering:
[td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);
Using COALESCE to determine which date I should be using and only return the records I want.
To go a little deeper with the second option: if you run just the subquery, you'll see that for each individual record you get the 2 main driving points of the requirement:
What's the max Udate per ID
What's the min Udate of last_code = 'ATTEMPT' per ID
From there I can just filter on those records satisfying what I was originally looking for, using a COALESCE to simplify my filter.
[td].[Udate] = COALESCE([td].[AttemptUdate], [td].[MaxUdate]);
Use AttemptUdate unless it's NULL then use MaxUdate.
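As a quick way to check the COALESCE logic without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module. It is a third variation rather than either option above: a GROUP BY derived table computes the chosen date per ID, and a join picks the matching rows. The function name pick_rows, the column name PickedUdate, the ISO date strings and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings
# so that plain text comparison orders them correctly in SQLite.
ROWS = [
    (1, "2018-11-05", "ATTEMPT"), (1, "2018-11-03", "ATTEMPT"),
    (1, "2017-11-01", "INFO"), (1, "2016-10-25", "ARRIVED"),
    (1, "2016-09-22", "ARRIVED"), (1, "2016-09-14", "SENT"),
    (1, "2016-09-01", "SENT"),
    (2, "2016-10-26", "RECEIVED"), (2, "2016-10-19", "ARRIVED"),
    (2, "2016-10-18", "ARRIVED"), (2, "2016-10-14", "ANNOUNCED"),
    (2, "2016-09-23", "INFO"), (2, "2016-09-14", "DAMAGE"),
    (2, "2016-09-02", "SCHEDULED"),
]

# Per ID: the earliest ATTEMPT date if any exists, otherwise the latest date.
QUERY = """
SELECT t.ID, t.Udate, t.last_code
FROM T1 AS t
JOIN (
    SELECT ID,
           COALESCE(MIN(CASE WHEN last_code = 'ATTEMPT' THEN Udate END),
                    MAX(Udate)) AS PickedUdate
    FROM T1
    GROUP BY ID
) AS p ON p.ID = t.ID AND p.PickedUdate = t.Udate
ORDER BY t.ID
"""


def pick_rows(rows):
    """Load rows into an in-memory table and return one picked row per ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T1 (ID INTEGER, Udate TEXT, last_code TEXT)")
    con.executemany("INSERT INTO T1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(pick_rows(ROWS))
```

Like both options above, this returns more than one row for an ID if several rows share the picked date.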

Related

Count with group by on Postgresql

I have a postgresql type and a table
CREATE TYPE mem_status AS ENUM('waiting', 'active', 'expired');
CREATE TABLE mems (
id BIGSERIAL PRIMARY KEY,
status mem_status NOT NULL
);
dataset
INSERT INTO mems(id, status) VALUES
(1, 'active'), (2, 'active'), (3, 'expired');
I want to query counts grouped by status, so I tried the query below.
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT m.status, count(1)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY m.status;
But if there are no waiting mems, the result looks like this:
status | count
================
NULL | 1 <- problem
'active' | 2
'expired' | 1
I want to get a result like this:
status | count
================
'waiting' | 0
'active' | 2
'expired' | 1
How can I do that?
Use count(id):
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT ms.status, count(id)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY ms.status;
or:
select status, count(id)
from unnest(enum_range(null::mem_status)) as status
left join mems using(status)
group by status
status | count
---------+-------
waiting | 0
active | 2
expired | 1
(3 rows)
Per the documentation count(expression) gives
number of input rows for which the value of expression is not null
You need to modify the join and aggregate a bit -
select ms.status, count(m.status)
from (select unnest(enum_range(null::mem_status))) as ms(status)
left join mems as m
on ms.status = m.status
group by ms.status;
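To see the difference between counting rows and counting a nullable column side by side, here is a small sketch in Python with the standard-library sqlite3 module. SQLite has no enum types, so a plain lookup table named statuses stands in for enum_range(); that table, the function name status_counts and the column aliases are assumptions for illustration only.

```python
import sqlite3


def status_counts():
    """Return (status, count(*), count(m.id)) for every known status."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE statuses (status TEXT PRIMARY KEY);  -- stand-in for the enum
        CREATE TABLE mems (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        INSERT INTO statuses VALUES ('waiting'), ('active'), ('expired');
        INSERT INTO mems VALUES (1, 'active'), (2, 'active'), (3, 'expired');
    """)
    rows = con.execute("""
        SELECT s.status,
               count(*)    AS star_count,  -- counts the NULL-extended row too
               count(m.id) AS id_count     -- skips rows where m.id IS NULL
        FROM statuses AS s
        LEFT JOIN mems AS m ON m.status = s.status
        GROUP BY s.status
        ORDER BY s.status
    """).fetchall()
    con.close()
    return rows


for row in status_counts():
    print(row)
```

For 'waiting' the outer join produces a single row whose mems columns are all NULL, so count(*) reports 1 while count(m.id) reports 0.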

Optimising T-SQL reporting performance

I have the table below. I need to delete opposite rows between two dates, in pairs, based on the PerCode value.
In fact, we delete rows inside the date range that have the same PerCode and equal and opposite values.
The problem is that the begin date and end date are provided by users as report parameters, and the query takes too much time if I try to delete these rows at runtime.
Example:
Begin date = 01/01/2018
End date = 31/12/2018
I should delete rows 3 and 4.
Do you have any idea how to do that while optimising performance? (The table has 200 million rows.)
+----+------------+---------+---------+-----------+
| Id | Date | PerCode | Value | IsDeleted |
+----+------------+---------+---------+-----------+
| 1 | 01/10/2017 | C1 | 10 | |
| 2 | 01/01/2018 | C1 | -10 | |
| 3 | 15/02/2018 | C2 | 20 | 1 |
| 4 | 10/03/2018 | C2 | -20 | 1 |
| 5 | 01/12/2018 | C3 | 15 | |
| 6 | 01/02/2019 | C3 | -15 | |
+----+------------+---------+---------+-----------+
I had a quick go at this, using a table variable to allow me to knock together a query using your test data. However, this might not perform well when used over 2 million rows?
DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
INSERT INTO @table
SELECT 1, '20171001', 'C1', 10, NULL
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
WITH ordered AS (
SELECT
id,
percode,
[value],
ROW_NUMBER() OVER (PARTITION BY percode, [value] ORDER BY [value]) AS order_id
FROM
@table
WHERE
[date] BETWEEN @date_from AND @date_to
AND ISNULL(isdeleted, 0) != 1),
matches AS (
SELECT
m1.id AS match_1_id,
m2.id AS match_2_id
FROM
ordered m1
INNER JOIN ordered m2 ON m1.percode = m2.percode AND m1.[value] = m2.[value] * -1 AND m1.order_id = m2.order_id)
UPDATE
t
SET
isdeleted = 1
FROM
@table t
INNER JOIN matches m ON m.match_1_id = t.id OR m.match_2_id = t.id;
SELECT * FROM @table;
Results:
id date percode value isdeleted
1 2017-10-01 C1 10 NULL
2 2018-01-01 C1 -10 NULL
3 2018-02-15 C2 20 1
4 2018-03-10 C2 -20 1
5 2018-12-01 C3 15 NULL
6 2019-02-01 C3 -15 NULL
How does it work? Well I broke the task down into steps:
make a list of all rows in the date period specified, where they aren't already deleted;
for each row of data assign it a running count number, grouped by the percode and the value. So the first C1 10 would be number #1, then the second C1 10 would be number #2, etc.;
to find matches it's simply a case of finding any value that has the same percode, the equal and opposite value to another value group, and the same running count number;
where there's a match set the isdeleted flag to 1.
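The steps above can be sketched outside the database as well. The following Python is only an illustration of the pairing idea, not a replacement for the query: the function name find_cancelling_ids and the tuple layout are assumptions, rows are numbered by ascending id (the query leaves that order unspecified), and zero values are skipped here although the query would match a zero row with itself.

```python
from collections import defaultdict
from datetime import date


def find_cancelling_ids(rows, date_from, date_to):
    """rows: iterable of (id, date, percode, value, isdeleted) tuples."""
    # Steps 1 and 2: keep live rows in the period, grouped by (percode, value).
    # The position of an id inside its list is its running count number.
    groups = defaultdict(list)
    for row_id, day, percode, value, deleted in sorted(rows, key=lambda r: r[0]):
        if date_from <= day <= date_to and not deleted:
            groups[(percode, value)].append(row_id)

    # Step 3: pair the k-th positive row with the k-th opposite row.
    matched = set()
    for (percode, value), ids in list(groups.items()):
        if value <= 0:
            continue  # each pair is handled once, from its positive side
        opposite = groups.get((percode, -value), [])
        for plus_id, minus_id in zip(ids, opposite):
            matched.update((plus_id, minus_id))

    # Step 4: these are the ids whose isdeleted flag would be set to 1.
    return sorted(matched)


SAMPLE = [
    (1, date(2017, 10, 1), "C1", 10, None),
    (2, date(2018, 1, 1), "C1", -10, None),
    (3, date(2018, 2, 15), "C2", 20, None),
    (4, date(2018, 3, 10), "C2", -20, None),
    (5, date(2018, 12, 1), "C3", 15, None),
    (6, date(2019, 2, 1), "C3", -15, None),
]

print(find_cancelling_ids(SAMPLE, date(2018, 1, 1), date(2018, 12, 31)))
```

Rows 1 and 6 fall outside the period, so their partners (rows 2 and 5) have nothing to pair with and stay untouched.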
Here is my code, but it does not perform well over 200 million rows in real time.
In real life, PerCode is a concatenation of 5 columns (date, varchar(13), varchar(2), varchar(1) and varchar(50)) and Value is 4 numeric columns.
I am searching for other ideas.
--DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
Select * INTO #MasterTable FROM
(
SELECT 1 id, '20171001' [date], 'C1' percode, 10 [value], NULL isdeleted
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL
) T ;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
select F.id
Into #TmpTable
from
(
select Id, PerCode, Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn2
from
#MasterTable ) F
inner join (
select
PerCode
, Rn1
from (
select
PerCode
,Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn1
FROM #MasterTable
where
[date] BETWEEN @date_from AND @date_to
) A
group by PerCode , Rn1
having sum(Value) = 0 and count(*)>1
) B on F.PerCode = B.PerCode
and F.Rn2 = B.Rn1
update R
set IsDeleted = 1
from #MasterTable R
inner join #TmpTable P
on R.id = P.id
select * from #MasterTable
drop table #MasterTable ;
drop table #TmpTable;

Maintaining order in DB2 "IN" query

This question is based on this one. I'm looking for a solution to that question that works in DB2. Here is the original question:
I have the following table
DROP TABLE IF EXISTS `test`.`foo`;
CREATE TABLE `test`.`foo` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then I try to get records based on the primary key
SELECT * FROM foo f where f.id IN (2, 3, 1);
I then get the following result
+----+--------+
| id | name |
+----+--------+
| 1 | first |
| 2 | second |
| 3 | third |
+----+--------+
3 rows in set (0.00 sec)
As one can see, the result is ordered by id. What I'm trying to achieve is to get the results ordered in the sequence I'm providing in the query. Given this example it should return
+----+--------+
| id | name |
+----+--------+
| 2 | second |
| 3 | third |
| 1 | first |
+----+--------+
3 rows in set (0.00 sec)
You could use a derived table with the IDs you want, and the order you want, and then join the table in, something like...
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN (
SELECT 1, '9486154876' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 2, '9403149581' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 3, '9465828230' FROM SYSIBM.SYSDUMMY1
) B (ORD, ID)
ON serv.serv_deliv_id = B.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY B.ORD
You can use a derived column to do custom ordering.
select
case
when serv.SERV_DELIV_ID = '9486154876' then 1
when serv.SERV_DELIV_ID = '9403149581' then 2
else 3
end as custom_order,
...
...
ORDER BY custom_order
To make the logic a little bit more evident you might modify the solution provided by bhamby like so:
WITH ordered_in_list (ord, id) as (
VALUES (1, '9486154876'), (2, '9403149581'), (3, '9465828230')
)
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN ordered_in_list il
ON serv.serv_deliv_id = il.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY il.ORD
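The same idea can be tried on the small foo table from the question. This is a sketch in Python with the standard-library sqlite3 module, not DB2: SQLite happens to accept the same WITH ... (VALUES ...) form, and the function name rows_in_list_order is an assumption for illustration.

```python
import sqlite3


def rows_in_list_order():
    """Return foo rows in the order given by the (ord, id) list: 2, 3, 1."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO foo VALUES (1, 'first'), (2, 'second'), (3, 'third');
    """)
    rows = con.execute("""
        WITH ordered_in_list (ord, id) AS (
            VALUES (1, 2), (2, 3), (3, 1)   -- position, wanted id
        )
        SELECT f.id, f.name
        FROM foo AS f
        JOIN ordered_in_list AS il ON il.id = f.id
        ORDER BY il.ord
    """).fetchall()
    con.close()
    return rows


print(rows_in_list_order())
```

The join replaces the IN list, so the list of ids only has to be written once, together with its position.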

Sum with different condition for every line

In my Postgresql 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
In this case I would use a different internal query to get the max inventory per article. You are effectively using stock_rotation twice but it should work. If it's too big of a table you can try something else:
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;
Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc)
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rank = 1
)
select ii.article_id, ii.quantity_change + sum(sr.quantity_change)
from initial_inventory ii
join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change
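To check the per-article logic against the sample data, here is a minimal sketch in Python with the standard-library sqlite3 module, using the LEFT JOIN form so that articles without any INVENTORY row are still summed from the beginning. SQLite stands in for PostgreSQL, timestamps are stored as ISO text, and the function name current_stock and the two rows for article 2 are hypothetical additions for illustration.

```python
import sqlite3

ROTATIONS = [
    (1, 10, "PURCHASE", 1, "2010-01-01 15:35:01"),
    (2, -4, "SALE", 1, "2010-05-06 08:46:02"),
    (3, 5, "INVENTORY", 1, "2010-12-20 08:20:35"),
    (4, 2, "PURCHASE", 1, "2011-02-05 16:45:50"),
    (5, -1, "SALE", 1, "2011-03-01 16:42:53"),
    # Hypothetical rows: an article that was never inventoried.
    (6, 7, "PURCHASE", 2, "2011-04-01 09:00:00"),
    (7, -3, "SALE", 2, "2011-04-02 09:00:00"),
]


def current_stock(rotations):
    """Return (article_id, stock) using each article's own latest INVENTORY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE stock_rotation (id INTEGER PRIMARY KEY, "
        "quantity_change INTEGER, stock_rotation_type TEXT, "
        "article_id INTEGER, date TEXT)"
    )
    con.executemany("INSERT INTO stock_rotation VALUES (?, ?, ?, ?, ?)", rotations)
    rows = con.execute("""
        SELECT sr.article_id, sum(sr.quantity_change)
        FROM stock_rotation AS sr
        LEFT JOIN (
            SELECT article_id, max(date) AS date
            FROM stock_rotation
            WHERE stock_rotation_type = 'INVENTORY'
            GROUP BY article_id
        ) AS li ON li.article_id = sr.article_id
        WHERE sr.date >= coalesce(li.date, '1970-01-01')
        GROUP BY sr.article_id
        ORDER BY sr.article_id
    """).fetchall()
    con.close()
    return rows


print(current_stock(ROTATIONS))
```

Article 1 is counted from its INVENTORY row onward (5 + 2 - 1), while article 2 has no inventory and is summed over all of its rows (7 - 3).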

PostgreSQL: select nearest rows according to sort order

I have a table like this:
a | user_id
----------+-------------
0.1133 | 2312882332
4.3293 | 7876123213
3.1133 | 2312332332
1.3293 | 7876543213
0.0033 | 2312222332
5.3293 | 5344343213
3.2133 | 4122331112
2.3293 | 9999942333
And I want to locate a particular row - 1.3293 | 7876543213 for example - and select the nearest 4 rows. 2 above, 2 below if possible.
Sort order is ORDER BY a ASC.
In this case I will get:
0.0033 | 2312222332
0.1133 | 2312882332
2.3293 | 9999942333
3.1133 | 2312332332
How can I achieve this using PostgreSQL? (BTW, I'm using PHP.)
P.S.: For the last or first row the nearest rows would be 4 above or 4 below.
Test case:
CREATE TEMP TABLE tbl(a float, user_id bigint);
INSERT INTO tbl VALUES
(0.1133, 2312882332)
,(4.3293, 7876123213)
,(3.1133, 2312332332)
,(1.3293, 7876543213)
,(0.0033, 2312222332)
,(5.3293, 5344343213)
,(3.2133, 4122331112)
,(2.3293, 9999942333);
Query:
WITH x AS (
SELECT a
,user_id
,row_number() OVER (ORDER BY a, user_id) AS rn
FROM tbl
), y AS (
SELECT rn, LEAST(rn - 3, (SELECT max(rn) - 5 FROM x)) AS min_rn
FROM x
WHERE (a, user_id) = (1.3293, 7876543213)
)
SELECT *
FROM x, y
WHERE x.rn > y.min_rn
AND x.rn <> y.rn
ORDER BY x.a, x.user_id
LIMIT 4;
Returns result as depicted in the question. Assuming that (a, user_id) is unique.
It is not clear whether a is supposed to be unique. That's why I additionally sort by user_id to break ties. That's also why I use the window function row_number(), and not rank(), for this. row_number() is the correct tool in any case. We want 4 rows. rank() would give an undefined number of rows if there were peers in the sort order.
This always returns 4 rows as long as there are at least 5 rows in the table. Close to first / last row, the first / last 4 rows are returned. The two rows before / after in all other cases. The criteria row itself is excluded.
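The window arithmetic is easier to follow outside SQL. Below is a plain Python sketch of the same rule (2 above and 2 below where possible, otherwise shift the window so 4 neighbours are still returned). The function name nearest_rows and its parameter k are assumptions for illustration; it sorts by (a, user_id) just like the query and assumes the target row is present.

```python
def nearest_rows(rows, target, k=4):
    """Return the k rows closest to target in (a, user_id) sort order."""
    ordered = sorted(rows)
    i = ordered.index(target)  # position of the central row
    # Start k // 2 rows before the target, but never before the first row
    # and never so late that fewer than k + 1 rows remain in the window.
    start = max(0, min(i - k // 2, len(ordered) - (k + 1)))
    window = ordered[start:start + k + 1]
    return [row for row in window if row != target]  # drop the central row


TBL = [
    (0.1133, 2312882332), (4.3293, 7876123213), (3.1133, 2312332332),
    (1.3293, 7876543213), (0.0033, 2312222332), (5.3293, 5344343213),
    (3.2133, 4122331112), (2.3293, 9999942333),
]

print(nearest_rows(TBL, (1.3293, 7876543213)))
```

With fewer than 5 rows in the table, this simply returns every row except the central one.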
Improved performance
This is an improved version of what @Tim Landscheidt posted. Vote for his answer if you like the idea with the index. Don't bother with small tables, but this will boost performance for big tables, provided you have a fitting index in place. The best choice would be a multicolumn index on (a, user_id).
WITH params(_a, _user_id) AS (SELECT 5.3293, 5344343213) -- enter params once
,x AS (
(
SELECT a
,user_id
,row_number() OVER (ORDER BY a DESC, user_id DESC) AS rn
FROM tbl, params p
WHERE a < p._a
OR a = p._a AND user_id < p._user_id -- a is not defined unique
ORDER BY a DESC, user_id DESC
LIMIT 5 -- 4 + 1: including central row
)
UNION ALL -- UNION right away, trim one query level
(
SELECT a
,user_id
,row_number() OVER (ORDER BY a ASC, user_id ASC) AS rn
FROM tbl, params p
WHERE a > p._a
OR a = p._a AND user_id > p._user_id
ORDER BY a ASC, user_id ASC
LIMIT 5
)
)
, y AS (
SELECT a, user_id
FROM x, params p
WHERE (a, user_id) <> (p._a, p._user_id) -- exclude central row
ORDER BY rn -- no need to ORDER BY a
LIMIT 4
)
SELECT *
FROM y
ORDER BY a, user_id -- ORDER result as requested
Major differences to @Tim's version:
According to the question (a, user_id) form the search criteria, not just a. That changes window frame, ORDER BY and WHERE clause in subtly different ways.
UNION right away, no need for an extra query level. You need parentheses around the two UNION-queries to allow for individual ORDER BY.
Sort result as requested. Requires another query level (at hardly any cost).
As parameters are used in multiple places I centralized the input in a leading CTE.
For repeated use you can wrap this query almost 'as is' into an SQL or plpgsql function.
And another one:
WITH prec_rows AS
(SELECT a,
user_id,
ROW_NUMBER() OVER (ORDER BY a DESC) AS rn
FROM tbl
WHERE a < 1.3293
ORDER BY a DESC LIMIT 4),
succ_rows AS
(SELECT a,
user_id,
ROW_NUMBER() OVER (ORDER BY a ASC) AS rn
FROM tbl
WHERE a > 1.3293
ORDER BY a ASC LIMIT 4)
SELECT a, user_id
FROM
(SELECT a,
user_id,
rn
FROM prec_rows
UNION ALL SELECT a,
user_id,
rn
FROM succ_rows) AS s
ORDER BY rn, a LIMIT 4;
AFAIR WITH will instantiate a memory table, so the focus of this solution is to limit its size as much as possible (in this case eight rows).
set search_path='tmp';
DROP TABLE lutser;
CREATE TABLE lutser
( val float
, num bigint
);
INSERT INTO lutser(val, num)
VALUES ( 0.1133 , 2312882332 )
,( 4.3293 , 7876123213 )
,( 3.1133 , 2312332332 )
,( 1.3293 , 7876543213 )
,( 0.0033 , 2312222332 )
,( 5.3293 , 5344343213 )
,( 3.2133 , 4122331112 )
,( 2.3293 , 9999942333 )
;
WITH ranked_lutsers AS (
SELECT val, num
,rank() OVER (ORDER BY val) AS rnk
FROM lutser
)
SELECT that.val, that.num
, (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-2 AND this.rnk+2)
WHERE this.val = 1.3293
;
Results:
DROP TABLE
CREATE TABLE
INSERT 0 8
val | num | relrnk
--------+------------+--------
0.0033 | 2312222332 | -2
0.1133 | 2312882332 | -1
1.3293 | 7876543213 | 0
2.3293 | 9999942333 | 1
3.1133 | 2312332332 | 2
(5 rows)
As Erwin pointed out, the center row is not wanted in the output. Also, the row_number() should be used instead of rank().
WITH ranked_lutsers AS (
SELECT val, num
-- ,rank() OVER (ORDER BY val) AS rnk
, row_number() OVER (ORDER BY val, num) AS rnk
FROM lutser
) SELECT that.val, that.num
, (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-2 AND this.rnk+2 )
WHERE this.val = 1.3293
AND that.rnk <> this.rnk
;
Result2:
val | num | relrnk
--------+------------+--------
0.0033 | 2312222332 | -2
0.1133 | 2312882332 | -1
2.3293 | 9999942333 | 1
3.1133 | 2312332332 | 2
(4 rows)
UPDATE2: to always select four, even if we are at the top or bottom of the list. This makes the query a bit uglier. (but not as ugly as Erwin's ;-)
WITH ranked_lutsers AS (
SELECT val, num
-- ,rank() OVER (ORDER BY val) AS rnk
, row_number() OVER (ORDER BY val, num) AS rnk
FROM lutser
) SELECT that.val, that.num
, ABS(that.rnk-this.rnk) AS srtrnk
, (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-4 AND this.rnk+4 )
-- WHERE this.val = 1.3293
WHERE this.val = 0.1133
AND that.rnk <> this.rnk
ORDER BY srtrnk ASC
LIMIT 4
;
Output:
val | num | srtrnk | relrnk
--------+------------+--------+--------
0.0033 | 2312222332 | 1 | -1
1.3293 | 7876543213 | 1 | 1
2.3293 | 9999942333 | 2 | 2
3.1133 | 2312332332 | 3 | 3
(4 rows)
UPDATE: A version with a nested CTE (featuring outer join!!!). For convenience, I added a primary key to the table, which sounds like a good idea anyway IMHO.
WITH distance AS (
WITH ranked_lutsers AS (
SELECT id
, row_number() OVER (ORDER BY val, num) AS rnk
FROM lutser
) SELECT l0.id AS one
,l1.id AS two
, ABS(l1.rnk-l0.rnk) AS dist
-- Warning: Cartesian product below
FROM ranked_lutsers l0
, ranked_lutsers l1 WHERE l0.id <> l1.id
)
SELECT lu.*
FROM lutser lu
JOIN distance di
ON lu.id = di.two
WHERE di.one= 1
ORDER by di.dist
LIMIT 4
;