Find rows with the same values in multiple columns, with their IDs - PostgreSQL

I have a relatively big table with a lot of columns and rows.
Among them are ID, longitude and latitude.
I would like a list of the IDs that share the same coordinates (latitude and longitude),
something like this:
ID | latitude | longitude | number
1  | 12.12    | 34.54     | 1
12 | 12.12    | 34.54     | 1
52 | 12.12    | 34.54     | 1
3  | 56.08    | -45.87    | 1
67 | 56.08    | -45.87    | 1
Thanks

You can either use an EXISTS query:
select *
from the_table t1
where exists (select 1
from the_table t2
where t1.id <> t2.id
and (t1.latitude, t1.longitude) = (t2.latitude, t2.longitude))
order by latitude, longitude;
or a window function:
select *
from (
select t.*,
count(*) over (partition by latitude, longitude) as cnt
from the_table t
) t
where cnt > 1
order by latitude, longitude;
Online example: http://rextester.com/ITKJ70005

Simple solution:
SELECT
t.id, t.latitude, t.longitude, grp.tot
FROM
your_table t INNER JOIN (
SELECT latitude, longitude, count(*) AS tot
FROM your_table
GROUP BY latitude, longitude
HAVING count(*) > 1
) grp ON (t.latitude = grp.latitude AND t.longitude = grp.longitude);
Or to get duplicates for lat/lng:
SELECT
latitude, longitude,
array_agg(id ORDER BY id) AS ids
FROM
place
GROUP BY
latitude, longitude
HAVING
count(*) > 1;
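To try the grouped query, the sample rows from the question can be loaded into a scratch table. This is only a sketch: the table name place follows the query above, the numeric(9,6) column type is an assumption (the question does not state the types), and the row with ID 99 is invented so that the filter has something to exclude.

```sql
-- Scratch table; numeric(9,6) is an assumed type, not taken from the question.
CREATE TEMP TABLE place (
    id        integer PRIMARY KEY,
    latitude  numeric(9,6),
    longitude numeric(9,6)
);

INSERT INTO place (id, latitude, longitude) VALUES
    (1,  12.12,  34.54),
    (12, 12.12,  34.54),
    (52, 12.12,  34.54),
    (3,  56.08, -45.87),
    (67, 56.08, -45.87),
    (99, 10.00,  20.00);  -- invented row with unique coordinates

-- One row per duplicated coordinate pair, with the IDs that share it.
SELECT latitude, longitude,
       array_agg(id ORDER BY id) AS ids,
       count(*)                  AS cnt
FROM place
GROUP BY latitude, longitude
HAVING count(*) > 1
ORDER BY latitude, longitude;
```

With these six rows the query returns two groups, {1,12,52} and {3,67}, and ID 99 is filtered out. If the real columns are double precision, keep in mind that grouping by equality only matches values that are exactly identical.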

Related

extracting records from rank = 1

I would like to get only the titles that have the number 1 in the rank column.
SELECT title, RANK() OVER(ORDER BY COUNT(*) DESC) rank
FROM rentals as w join copies as e on w.signature = e.signature join books as c on e.idbook = c.idbook
WHERE dateofloan <= CURRENT_DATE - 31
GROUP BY title;
My query returns two columns: title and rank.
Thank you in advance for your help.
Wrap the query in a subquery (here a CTE) and restrict the result to the first rank:
WITH cte AS (
SELECT title, RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM rentals w
INNER JOIN copies e ON w.signature = e.signature
INNER JOIN books c ON e.idbook = c.idbook
WHERE dateofloan <= CURRENT_DATE - 31
GROUP BY title
)
SELECT title
FROM cte
WHERE rnk = 1;
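One thing to be aware of: RANK() gives tied rows the same rank, so rnk = 1 can return more than one title. A minimal sketch with invented data (the loans table and its rows are hypothetical and simply stand in for the rentals/copies/books join above):

```sql
-- Hypothetical stand-in for the joined rows: one row per loan.
CREATE TEMP TABLE loans (title text);

INSERT INTO loans (title) VALUES
    ('Dune'), ('Dune'),
    ('Emma'), ('Emma'),
    ('Ulysses');

-- Same pattern as the CTE above: rank titles by loan count, keep rank 1.
WITH cte AS (
    SELECT title, RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM loans
    GROUP BY title
)
SELECT title
FROM cte
WHERE rnk = 1
ORDER BY title;
```

Here both 'Dune' and 'Emma' come back because each was loaned twice. If exactly one row is required, ROW_NUMBER() with an explicit tie-breaker in its ORDER BY can be used instead of RANK().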

Postgres SELECT runs 3x faster than a function containing the same SELECT

I have a SELECT in Postgres:
SELECT DISTINCT ON (price) price, quantity, is_ask, final_update_id
FROM (SELECT *
FROM ((SELECT price, quantity, is_ask, book_depth.final_update_id
FROM order_depth
LEFT JOIN book_depth ON book_depth_id = book_depth.id
WHERE book_depth_id IN (SELECT id
FROM book_depth
WHERE final_update_id > (SELECT last_update_id
FROM order_book
WHERE symbol_name = 'XRPRUB'
ORDER BY last_update_id DESC
LIMIT 1)
AND symbol_name = 'XRPRUB'))
UNION
(SELECT price, quantity, is_ask, order_book_id
FROM "order"
WHERE order_book_id = (SELECT id
FROM order_book
WHERE symbol_name = 'XRPRUB'
ORDER BY last_update_id DESC
LIMIT 1))
ORDER BY final_update_id DESC) AS t) AS t1
ORDER BY price, final_update_id DESC;
It runs for about 20 seconds.
But when I wrap this SELECT in a function, the function takes about 1 min 40 seconds. Can someone explain whether this is normal, or am I making a mistake somewhere?

Find the nearest-neighbor LineString for every Point using PostGIS in large tables

I am trying to connect each record from table 1 (10k records) with its nearest neighbor in table 2 (20k records). I am hoping there is a solution that does not require me to iterate over table 1 and perform a KNN search for each record.
I have tried a KNN search limited to a reasonable distance, selecting distinct on the table 1 id from that as a sub-query, but this requires me to order by the id number rather than by the distance.
Any advice is greatly appreciated.
select distinct on (t1id) t1id, t2id, dist, name, street
from (
select t1id, t2id, dist, name, street
from (
select t1id, t2id, st_distance(t1geom, t2geom) as dist, name, street
from (
select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
from t1
join t2
on st_dwithin(t1.geom, t2.geom, 300)
where t1.seg is null
) as near
order by t1geom <-> t2geom
) as distOrdered
order by dist
) as idOrdered
order by t1id
I found a method that seems to be working, but it is a bit ugly. I query the results of the 1st query to get the id number and the shortest distance and then join that back to the original query to get the shortest distance for each record.
This is far from ideal, but I believe is returning the correct results.
select allD.t1id, allD.t2id, allD.dist, allD.name, allD.street
from (
select t1id, t2id, dist, name, street
from (
select t1id, t2id, dist, name, street
from (
select t1id, t2id, st_distance(t1geom, t2geom) as dist, name, street
from (
select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
from t1
join t2
on st_dwithin(t1.geom, t2.geom, 300)
where t1.seg is null
) as near
order by t1geom <-> t2geom
) as distOrdered
order by dist
) as idOrdered
) as allD
join (
select t1id, min(dist) as md
from (
select t1id, t2id, dist, name, street
from (
select t1id, t2id, st_distance(t1geom, t2geom) as dist, name, street
from (
select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
from t1
join t2
on st_dwithin(t1.geom, t2.geom, 300)
where t1.seg is null
) as near
order by t1geom <-> t2geom
) as distOrdered
order by dist
) as idOrdered
group by t1id order by t1id
) as short
on allD.t1id = short.t1id and allD.dist = short.md
There is actually a much simpler way of doing this by using cross join lateral which is effectively doing a nearest neighbor search for each row.
select
t1.id as t1id, b.id as t2id, b.dist
from t1
cross join lateral
(
select st_distance(t1.geom, t2.geom) dist, t2.id
from t2
where t1.seg is null
order by t1.geom <-> t2.geom
limit 1
) as b
where b.dist < 300
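The lateral approach can be tried on a tiny data set. The following is a sketch under stated assumptions: PostGIS is available, SRID 3857 (units roughly in metres) is a guess since the question does not name one, and all rows and the index name t2_geom_gist are invented. The t1.seg filter is placed in the outer query here, which returns the same rows as keeping it inside the subquery.

```sql
-- Requires the PostGIS extension (creating it may need elevated privileges).
CREATE EXTENSION IF NOT EXISTS postgis;

-- Invented sample tables shaped like t1 (points) and t2 (linestrings).
CREATE TEMP TABLE t1 (
    id   integer PRIMARY KEY,
    name text,
    seg  integer,
    geom geometry(Point, 3857)       -- SRID 3857 is an assumption
);
CREATE TEMP TABLE t2 (
    id     integer PRIMARY KEY,
    street text,
    geom   geometry(LineString, 3857)
);

INSERT INTO t1 (id, name, seg, geom) VALUES
    (1, 'near point', NULL, ST_SetSRID(ST_MakePoint(0, 0), 3857)),
    (2, 'far point',  NULL, ST_SetSRID(ST_MakePoint(1000, 1000), 3857));

INSERT INTO t2 (id, street, geom) VALUES
    (10, 'Main St', ST_SetSRID(ST_MakeLine(ST_MakePoint(0, 10),   ST_MakePoint(100, 10)),   3857)),
    (20, 'Far Rd',  ST_SetSRID(ST_MakeLine(ST_MakePoint(0, 5000), ST_MakePoint(100, 5000)), 3857));

-- A GiST index lets the planner serve ORDER BY ... <-> ... as an index scan.
CREATE INDEX t2_geom_gist ON t2 USING gist (geom);

-- For every t1 row, fetch the single closest t2 row, then apply the cut-off.
SELECT t1.id AS t1id, b.t2id, b.dist
FROM t1
CROSS JOIN LATERAL (
    SELECT t2.id AS t2id,
           ST_Distance(t1.geom, t2.geom) AS dist
    FROM t2
    ORDER BY t1.geom <-> t2.geom
    LIMIT 1
) AS b
WHERE t1.seg IS NULL
  AND b.dist < 300;
```

With this data only point 1 is returned, matched to line 10 at a distance of 10; point 2 is more than 300 units from its nearest line and is dropped by the final filter.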

selecting only two employees from every department

Can you let me know how to select only two employees from every department? The table has deptname, ssn, name . I am doing a sampling and I need only two ssns for every department name. Can someone help?
You can accomplish this with an "OLAP expression" row_number()
with e as
( select deptname, ssn, empname,
row_number() over (partition by deptname order by empname) as pick
from employees
)
select deptname, ssn, empname
from e
where pick < 3
order by deptname, ssn
This example gives you the two employees per department whose names sort first, because that is the ordering specified in the order by clause of the row_number() expression.
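Since the question mentions sampling, the same pattern can pick an arbitrary pair per department by changing the ordering inside the window. A minimal sketch in PostgreSQL syntax (the question does not name a database, so random() is an assumption, and the rows are invented):

```sql
-- Scratch data; names and SSNs are invented for illustration.
CREATE TEMP TABLE employees (
    deptname text,
    ssn      text,
    empname  text
);

INSERT INTO employees (deptname, ssn, empname) VALUES
    ('Sales', '111-11-1111', 'Ann'),
    ('Sales', '222-22-2222', 'Bob'),
    ('Sales', '333-33-3333', 'Cid'),
    ('IT',    '444-44-4444', 'Dee');

-- random() shuffles the rows inside each department, so picks 1 and 2
-- are an arbitrary pair rather than the alphabetically first two.
WITH e AS (
    SELECT deptname, ssn, empname,
           row_number() OVER (PARTITION BY deptname ORDER BY random()) AS pick
    FROM employees
)
SELECT deptname, ssn, empname
FROM e
WHERE pick <= 2
ORDER BY deptname, ssn;
```

Each run returns at most two rows per department (here two from Sales and the single IT row), but which Sales rows appear can differ between runs.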
Try this:
select *
from t t1
where (
select count(*)
from t t2
where
t2.deptname = t1.deptname
and
t2.ssn <= t1.ssn) <= 2
order by deptname, ssn,name;
The above returns the two rows with the smallest ssn in each department.
If you want the two largest instead, change the condition to t2.ssn >= t1.ssn
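Alternatively, with rank() (note that ties in empname can produce more than two rows per department):
select *
from (
select rank() over (partition by deptname order by empname) as rnk, e.*
from employees e
) ranked
where rnk <= 2
order by deptname, ssn, empname;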

Is T-SQL (2005) RANK OVER(PARTITION BY) the answer?

I have a stored procedure that does paging for the front end and is working fine. I now need to modify that procedure to group by four of the 20 columns returned and then return only the row within each group that has the lowest priority. So when resort_id, bedrooms, kitchen and checkin (date) all match, only the row with the minimum priority should be returned. I still have to maintain the paging functionality. @startIndex and @upperBound are parameters passed into the procedure from the front end for paging. I’m thinking that RANK OVER (PARTITION BY) is the answer, but I just can’t quite figure out how to put it all together.
SELECT I.id,
I.resort_id,
I.[bedrooms],
I.[kitchen],
I.[checkin],
I.[priority],
I.col_1,
I.col_2 /* ..... (more cols) */
FROM (
SELECT ROW_NUMBER() OVER(ORDER by checkin) AS rowNumber,
*
FROM Inventory
) AS I
WHERE rowNumber >= @startIndex
AND rowNumber < @upperBound
ORDER BY rowNumber
Example 2 after fix:
SELECT I.resort_id,
I.[bedrooms],
I.[kitchen],
I.[checkin],
I.[priority],
I.col_1,
I.col_2 /* ..... (more cols) */
FROM Inventory i
JOIN
(
SELECT ROW_NUMBER() OVER(ORDER BY h.checkin) as rowNumber, MIN(h.id) as id
FROM Inventory h
JOIN (
SELECT resort_id, bedrooms, kitchen, checkin, id, MIN(priority) as priority
FROM Inventory
GROUP BY resort_id, bedrooms, kitchen, checkin, id
) h2 on h.resort_id = h2.resort_id and
h.bedrooms = h2.bedrooms and
h.kitchen = h2.kitchen and
h.checkin = h2.checkin and
h.priority = h2.priority
GROUP BY h.resort_id, h.bedrooms, h.kitchen, h.checkin, h.priority
) AS I2
on i.id = i2.id
WHERE rowNumber >= @startIndex
AND rowNumber < @upperBound
ORDER BY rowNumber
I would accomplish it this way.
SELECT I.resort_id,
I.[bedrooms],
I.[kitchen],
I.[checkin],
I.[priority],
I.col_1,
I.col_2 /* ..... (more cols) */
FROM Inventory i
JOIN
(
SELECT ROW_NUMBER() OVER (ORDER BY h.checkin) as rowNumber, MIN(h.id) as id
FROM Inventory h
JOIN (
SELECT resort_id, bedrooms, kitchen, checkin, MIN(priority) as priority
FROM Inventory
GROUP BY resort_id, bedrooms, kitchen, checkin
) h2 on h.resort_id = h2.resort_id and
h.bedrooms = h2.bedrooms and
h.kitchen = h2.kitchen and
h.checkin = h2.checkin and
h.priority = h2.priority
GROUP BY h.resort_id, h.bedrooms, h.kitchen, h.checkin, h.priority
) AS I2
on i.id = i2.id
WHERE rowNumber >= @startIndex
AND rowNumber < @upperBound
ORDER BY rowNumber
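To come back to the question in the title: a PARTITION BY window can do the "lowest priority per group" step directly, with a second ROW_NUMBER() for paging. The following is a sketch, not a drop-in replacement for the procedure: it uses T-SQL 2005-compatible syntax, a table variable @Inventory with invented rows standing in for the real Inventory table, and local variables in place of the procedure's paging parameters. Breaking ties on id is also an assumption.

```sql
-- Local variables standing in for the stored procedure's paging parameters.
DECLARE @startIndex int, @upperBound int;
SET @startIndex = 1;
SET @upperBound = 11;

-- Invented sample rows; the real table has more columns.
DECLARE @Inventory TABLE (
    id        int PRIMARY KEY,
    resort_id int,
    bedrooms  int,
    kitchen   int,
    checkin   datetime,
    priority  int
);

INSERT INTO @Inventory (id, resort_id, bedrooms, kitchen, checkin, priority)
SELECT 1, 10, 2, 1, '20240105', 3 UNION ALL
SELECT 2, 10, 2, 1, '20240105', 1 UNION ALL
SELECT 3, 11, 1, 0, '20240101', 2;

-- Step 1: number the rows inside each (resort_id, bedrooms, kitchen, checkin)
--         group by priority, so rn = 1 is the lowest-priority row.
-- Step 2: number only those winners for paging.
WITH best AS (
    SELECT id, resort_id, bedrooms, kitchen, checkin, priority,
           ROW_NUMBER() OVER (PARTITION BY resort_id, bedrooms, kitchen, checkin
                              ORDER BY priority, id) AS rn
    FROM @Inventory
),
paged AS (
    SELECT id, resort_id, bedrooms, kitchen, checkin, priority,
           ROW_NUMBER() OVER (ORDER BY checkin, id) AS rowNumber
    FROM best
    WHERE rn = 1
)
SELECT id, resort_id, bedrooms, kitchen, checkin, priority
FROM paged
WHERE rowNumber >= @startIndex
  AND rowNumber < @upperBound
ORDER BY rowNumber;
```

With the three sample rows this returns id 3 and then id 2; id 1 is dropped because id 2 has a lower priority in the same group. ROW_NUMBER() is used rather than RANK() so that two rows sharing the minimum priority do not both survive.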