The problem: for each building in my table that has, say, at least 2 pharmacies and 2 education centers within a radius of 1 km, I need to select all POIs (pharmacies, commercial centres, medical centers, education centers, police stations, fire stations) that are within 1 km of the respective building. Table structure:
building (id serial, name varchar )
poi_category(id serial, cname varchar) --cname being the category name of course
poi(id serial, name varchar, c_id integer)-- c_id is the FK referencing poi_category(id)
all coordinate columns are of type geometry, not geography (let's call them geom)
Here's the way I thought it should be done, but I'm not sure it's even correct, let alone the optimal solution to this problem:
SELECT r.id_b, r.id_p
FROM (
SELECT b.id AS id_b, p.id AS id_p, pc.id AS id_pc,pc.cname
FROM building AS b, poi AS p, poi_category AS pc
WHERE ST_DWithin(b.geom,p.geom, 1000) AND p.c_id=pc.id
) AS r,
(
SELECT * FROM r GROUP BY id_b
) AS r1
HAVING count (
SELECT *
FROM r, r1
WHERE r1.id_b=r.id_b AND r.id_pc='pharmacy'
)>1
AND
count (
SELECT *
FROM r, r1
WHERE r1.id_b=r.id_b AND r.id_pc='ed. centre'
)>1
Is this the way to go for what I need? Which solution would be better from a performance point of view? And what about the most elegant solution?
I've also posted here: http://gis.stackexchange.com/questions/11445/postgis-advanced-selection-query
This is a solution I worked out. It's the fastest one I could find, but it's still slow. Given the nature of the task, I doubt it can be made faster...
WITH
building AS (
SELECT way, osm_id
FROM osm_polygon
WHERE tags @> hstore('building','yes')
--ORDER BY 1
LIMIT 1000
),
pharmacy AS (
SELECT way
FROM osm_poi
WHERE tags @> hstore('amenity','pharmacy')
),
school AS (
SELECT way
FROM osm_poi
WHERE tags @> hstore('amenity','school')
)
SELECT ST_AsText(building.way) AS geom, building.osm_id AS label
FROM building
WHERE
(SELECT count(*) > 1
FROM pharmacy
WHERE ST_DWithin(building.way,pharmacy.way,1000))
AND
(SELECT count(*) > 1
FROM school
WHERE ST_DWithin(building.way,school.way,1000))
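To also pull back the surrounding POIs for each qualifying building (the second half of the original question), the same CTE approach can be extended along these lines. This is only a sketch: it assumes osm_poi also carries an osm_id column, that the POI categories of interest are tagged under the amenity key, and that the layer uses a metric SRID (with geometry, the 1000 passed to ST_DWithin is in the units of the SRID):
WITH building AS (
    SELECT way, osm_id
    FROM osm_polygon
    WHERE tags @> hstore('building','yes')
    LIMIT 1000
),
qualifying AS (               -- buildings with more than one pharmacy and school within 1 km
    SELECT b.way, b.osm_id
    FROM building b
    WHERE (SELECT count(*) > 1
           FROM osm_poi p
           WHERE p.tags @> hstore('amenity','pharmacy')
             AND ST_DWithin(b.way, p.way, 1000))
      AND (SELECT count(*) > 1
           FROM osm_poi p
           WHERE p.tags @> hstore('amenity','school')
             AND ST_DWithin(b.way, p.way, 1000))
)
SELECT q.osm_id            AS building_id,
       p.osm_id            AS poi_id,
       p.tags -> 'amenity' AS category
FROM qualifying q
JOIN osm_poi p
  ON p.tags ? 'amenity'    -- restrict or extend this to the categories you need
 AND ST_DWithin(q.way, p.way, 1000);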
Yours. S.
Related
I have two tables Q and T, both containing a column of float numbers.
What I want to do is, for each number in Q, I want to find a number in T that has the smallest distance to it.
For example, for T={1,7,9} and Q={2,6,10}, I want to return Q,T pairs as {(2,1),(6,7),(10,9)}.
How should I express this query with SQL?
In addition, is it possible to accelerate this join with an index, e.g. by adding an operator class that binds "FOR ORDER BY <->" to an absolute-difference calculation?
create table t (val_t integer);
create table q (val_q integer);
insert into t values (1),(7),(9);
insert into q values (2),(6),(10);
Start with a query that cross joins the two tables and adds a rank based on the difference:
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true ;
Use this query in a CTE or subquery and filter by rank:
WITH src AS(
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true )
SELECT val_q, val_t FROM src
WHERE rank = 1;
val_q | val_t
-------+-------
2 | 1
6 | 7
10 | 9
See https://www.postgresql.org/docs/12/tutorial-window.html
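Regarding the index question: for plain numeric columns the btree_gist extension adds a <-> distance operator, so the nearest value can be fetched per outer row with a KNN-style index scan. A sketch, assuming the same t and q tables as above:
-- Sketch: nearest-value lookup via btree_gist's <-> operator (assumes the t/q tables above)
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX ON t USING gist (val_t);

SELECT q.val_q, nearest.val_t
FROM q
CROSS JOIN LATERAL (
    SELECT val_t
    FROM t
    ORDER BY val_t <-> q.val_q   -- distance ordering; can be served by the GiST index
    LIMIT 1
) AS nearest;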
Given this schema:
create table t (tn float);
insert into t values (1), (7), (9);
create table q (qn float);
insert into q values (2), (6), (10);
DISTINCT ON is the most straightforward way:
select distinct on (qn) qn, tn
from q
cross join t
order by qn, abs(qn - tn);
Exploiting a numeric range may perform better, depending on your data sizes. If performance is an issue, you can create an actual temp table for the range_tn CTE and put a GiST index on it:
with all_tn as (
select tn
from t
union select null
), range_tn as (
select numrange(tn::numeric, (lead(tn) over w)::numeric, '[]') as tr
from all_tn
window w as (order by tn nulls first)
)
select qn,
case
when lower_inf(tr) then upper(tr)
when upper_inf(tr) then lower(tr)
when 2 * qn - lower(tr) - upper(tr) > 0 then upper(tr)
else lower(tr)
end as tn
from q
join range_tn
on qn::numeric <@ tr;
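As mentioned above, the range_tn CTE can be materialized into a temp table with a GiST index when performance matters. A sketch of that setup, using the same t table; the final SELECT of the query above then just reads from this table instead of the CTE:
-- Sketch: materialize the ranges and index them
CREATE TEMP TABLE range_tn AS
SELECT numrange(tn::numeric, (lead(tn) OVER w)::numeric, '[]') AS tr
FROM (SELECT tn FROM t UNION SELECT NULL) AS all_tn
WINDOW w AS (ORDER BY tn NULLS FIRST);

CREATE INDEX ON range_tn USING gist (tr);
ANALYZE range_tn;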
I have query such as
select * from batteries as b ORDER BY inserted_at desc
which gives me one row per battery, and I have a query such as
select voltage, datetime, battery_id from battery_readings ORDER BY inserted_at desc limit 1
which returns the most recent reading.
I want to combine the two queries above so that, in one go, I get each battery's details as well as its last recorded voltage and datetime from battery_readings.
Postgres has a very useful syntax for this, called DISTINCT ON. This is different from plain DISTINCT in that it keeps only the first row of each set, defined by the sort order. In your case, it would be something like this:
SELECT DISTINCT ON (b.id)
b.id,
b.name,
b.source_url,
b.active,
b.user_id,
b.inserted_at,
b.updated_at,
v.voltage,
v.datetime
FROM battery b
JOIN battery_voltage v ON (b.id = v.battery_id)
ORDER BY b.id, v.datetime desc;
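If battery_voltage grows large, an index that matches the DISTINCT ON ordering usually keeps this cheap. A sketch, using the same table and column names as above:
-- Sketch: index matching the DISTINCT ON sort (battery, newest reading first)
CREATE INDEX ON battery_voltage (battery_id, datetime DESC);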
I think that windowing will do what you expect.
Assuming two tables
create table battery (id int, name text);
create table bat_volt(measure_time int, battery_id int, val int);
One of the possible queries is like this:
with latest as (select battery_id, max(measure_time) over (partition by battery_id) from bat_volt)
select * from battery b join bat_volt bv on bv.battery_id=b.id where (b.id,bv.measure_time) in (select * from latest);
If you have a Postgres version that supports LATERAL, it might also make sense to try it out (in case there are way more readings than batteries, it could have better performance).
select * from battery b
join bat_volt bv on bv.battery_id=b.id
join lateral
(select battery_id, max(measure_time) over (partition by battery_id) from bat_volt bbv
where bbv.battery_id = b.id limit 1) lbb on (lbb.max = bv.measure_time AND lbb.battery_id = b.id);
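A leaner variant of the same LATERAL idea, sketched against the same bat_volt table: order the readings per battery and take the newest one directly, which avoids the window function inside the subquery:
-- Sketch: newest reading per battery via ORDER BY ... LIMIT 1 inside LATERAL
SELECT b.*, bv.*
FROM battery b
LEFT JOIN LATERAL (
    SELECT measure_time, val
    FROM bat_volt
    WHERE battery_id = b.id
    ORDER BY measure_time DESC
    LIMIT 1
) bv ON true;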
I have hundreds of polygons (circles), some of which intersect each other. These polygons come from a single feature layer. What I am trying to do is delete the intersecting circles.
It is similar to this question: link, but that one used two different layers. In my case the intersections are within a single feature layer.
If I understood your question right, you just need either a CTE or a simple subquery.
This might give you a good idea of how to solve your issue:
CREATE TABLE t (id INTEGER, geom GEOMETRY);
INSERT INTO t VALUES
(1,'POLYGON((-4.54 54.30,-4.46 54.30,-4.46 54.29,-4.54 54.29,-4.54 54.30))'),
(2,'POLYGON((-4.66 54.16,-4.56 54.16,-4.56 54.14,-4.66 54.14,-4.66 54.16))'),
(3,'POLYGON((-4.60 54.19,-4.57 54.19,-4.57 54.15,-4.60 54.15,-4.60 54.19))'),
(4,'POLYGON((-4.40 54.40,-4.36 54.40,-4.36 54.38,-4.40 54.38,-4.40 54.40))');
This data set contains 4 polygons in total, and two of them (ids 2 and 3) overlap:
Applying a CTE with a subquery might give you what you want, which is the non-overlapping polygons from the same table:
SELECT id, ST_AsText(geom) FROM t
WHERE id NOT IN (
WITH j AS (SELECT * FROM t)
SELECT j.id
FROM j
JOIN t ON t.id <> j.id
WHERE ST_Intersects(j.geom,t.geom)
);
id | st_astext
----+---------------------------------------------------------------------
1 | POLYGON((-4.54 54.3,-4.46 54.3,-4.46 54.29,-4.54 54.29,-4.54 54.3))
4 | POLYGON((-4.4 54.4,-4.36 54.4,-4.36 54.38,-4.4 54.38,-4.4 54.4))
(2 rows)
You can write a quite clear DELETE statement using an EXISTS clause. You literally want to delete the rows for which there exist other rows whose geometry intersects:
DELETE
FROM myTable t1
WHERE EXISTS (SELECT 1 FROM myTable t2 WHERE t2.id <> t1.id AND ST_Intersects(t1.geom, t2.geom))
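With many circles this stays fast as long as the geometry column has a spatial index, since ST_Intersects can then prune candidates through the index. A sketch, using the same myTable/geom names:
-- Sketch: spatial index so the self-intersection check can use an index scan
CREATE INDEX ON myTable USING gist (geom);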
Using MS SQL 2008 R2; I'm sure this is very straightforward. I am trying to identify where a unique value (ISIN) has been linked to more than one identifier. An example output would be:
isin entity_id
XS0276697439 000BYT-E
XS0276697439 000BYV-E
This is actually an error and I want to look for other instances where there may be more than one entity_id linked to a unique ISIN.
This is my current attempt, but it's obviously not correct:
select isin, entity_id from edm_security_entity_map
where isin is not null
--and isin = ('XS0276697439')
group by isin, entity_id
having COUNT(entity_id) > 1
order by isin asc
Thanks for your help.
Elliot,
I don't have a copy of SQL Server in front of me right now, so apologies if my syntax isn't spot on.
I'd start by finding the duplicates:
select
x.isin
,count(*)
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
Then join that back to the full table to find where those duplicates come from:
;with DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map
inner join DuplicateList as dup
on dup.isin = map.isin;
HTH,
Michael
So you're saying that if isin-1 has a row for both entity-1 and entity-2, that's an error, but isin-3, say, linked to entity-3 in two separate rows is OK? The ugly-but-readable solution to that is to prepend another CTE to the previous solution:
;with UniqueValues as
(select distinct
y.isin
,y.entity_id
from edm_security_entity_map as y
)
,DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from UniqueValues as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map -- or from UniqueValues, depending on your objective.
inner join DuplicateList as dup
on dup.isin = map.isin;
There are better solutions with additional GROUP BY clauses in the final query. If this is going into production, or if your table has a bajillion rows, that's what I'd recommend. If you just need to do some analysis, the above should suffice, I hope.
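For completeness, one of those "better solutions" can be sketched with COUNT(DISTINCT ...), so that repeated identical rows don't trigger false positives (this assumes the same edm_security_entity_map table):
-- Sketch: ISINs linked to more than one distinct entity_id, with their rows
SELECT isin, entity_id
FROM edm_security_entity_map
WHERE isin IS NOT NULL
  AND isin IN (
        SELECT isin
        FROM edm_security_entity_map
        GROUP BY isin
        HAVING COUNT(DISTINCT entity_id) > 1
      )
ORDER BY isin;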
This query for creating a list of Candidate duplicates is easy enough:
SELECT Count(*), Can_FName, Can_HPhone, Can_EMail
FROM Can
GROUP BY Can_FName, Can_HPhone, Can_EMail
HAVING Count(*) > 1
But if the actual rule I want to check against is FName and (HPhone OR Email) - how can I adjust the GROUP BY to work with this?
I'm fairly certain I'm going to end up with a UNION SELECT here (i.e. do FName, HPhone on one and FName, EMail on the other and combine the results) - but I'd love to know if anyone knows an easier way to do it.
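Roughly, the UNION version I have in mind would look something like this (untested sketch):
-- Untested sketch: duplicates on (FName, HPhone) combined with duplicates on (FName, EMail)
SELECT Can_FName, Can_HPhone AS MatchedValue, COUNT(*) AS Cnt
FROM Can
GROUP BY Can_FName, Can_HPhone
HAVING COUNT(*) > 1
UNION ALL
SELECT Can_FName, Can_EMail, COUNT(*)
FROM Can
GROUP BY Can_FName, Can_EMail
HAVING COUNT(*) > 1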
Thank you in advance for any help.
Scott in Maine
Before I can advise anything, I need to know the answer to this question:
name phone email
John 555-00-00 john#example.com
John 555-00-01 john#example.com
John 555-00-01 john-other#example.com
What COUNT(*) do you want for this data?
Update:
If you just want to know that a record has any duplicates, use this:
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
)
SELECT *
FROM q qo
WHERE EXISTS
(
SELECT NULL
FROM q qi
WHERE qi.id <> qo.id
AND qi.name = qo.name
AND (qi.phone = qo.phone OR qi.email = qo.email)
)
It's more efficient, but doesn't tell you where the duplicate chain started.
This query selects all entries along with a special field, chainid, that indicates where the duplicate chain started.
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
),
dup AS (
SELECT id AS chainid, id, name, phone, email, 1 as d
FROM q
UNION ALL
SELECT chainid, qo.id, qo.name, qo.phone, qo.email, d + 1
FROM dup
JOIN q qo
ON qo.name = dup.name
AND (qo.phone = dup.phone OR qo.email = dup.email)
AND qo.id > dup.id
),
chains AS
(
SELECT *
FROM dup do
WHERE chainid NOT IN
(
SELECT id
FROM dup di
WHERE di.chainid < do.chainid
)
)
SELECT *
FROM chains
ORDER BY
chainid
None of these answers is correct. Quassnoi's is a decent approach, but you will notice one fatal flaw in the expressions "qo.id > dup.id" and "di.chainid < do.chainid": comparisons made by ID! This is ALWAYS bad practice because it depends on some inherent ordering in the IDs. IDs should NEVER be given any implicit meaning and should ONLY participate in equality or null testing. You can easily break Quassnoi's solution in this example by simply reordering the IDs in the data.
The essential problem is a disjunctive condition with a grouping, which leads to the possibility of two records being related through an intermediate, though they are not directly relatable.
e.g., you stated these records should all be grouped:
(1) John 555-00-00 john#example.com
(2) John 555-00-01 john#example.com
(3) John 555-00-01 john-other#example.com
You can see that #1 and #2 are relatable, as are #2 and #3, but clearly #1 and #3 are not directly relatable as a group.
This establishes that a recursive or iterative solution is the ONLY possible solution.
So, recursion is not viable since you can easily end up in a looping situation. This is what Quassnoi was trying to avoid with his ID comparisons, but in doing so he broke the algorithm. You could try limiting the levels of recursion, but you may not then complete all relations, and you will still potentially be following loops back upon yourself, leading to excessive data size and prohibitive inefficiency.
The best solution is ITERATIVE: Start a result set by tagging each ID as a unique group ID, and then spin through the result set and update it, combining IDs into the same unique group ID as they match on the disjunctive condition. Repeat the process on the updated set each time until no further updates can be made.
I will create example code for this soon.
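In the meantime, here is a rough sketch of what that iteration could look like (treat it as illustrative T-SQL, not production code; it assumes Can has a unique integer ID column):
-- Rough sketch of the iterative grouping described above
SELECT ID, Can_FName, Can_HPhone, Can_EMail, ID AS GroupID
INTO #grp
FROM Can;

WHILE @@ROWCOUNT > 0
BEGIN
    -- Pull each row's GroupID down to the smallest GroupID among the rows it matches
    UPDATE g
    SET GroupID = m.MinGroupID
    FROM #grp AS g
    JOIN (
        SELECT a.ID, MIN(b.GroupID) AS MinGroupID
        FROM #grp AS a
        JOIN #grp AS b
          ON a.Can_FName = b.Can_FName
         AND (a.Can_HPhone = b.Can_HPhone OR a.Can_EMail = b.Can_EMail)
        GROUP BY a.ID
    ) AS m ON m.ID = g.ID
    WHERE m.MinGroupID < g.GroupID;
END

-- GroupIDs shared by more than one row are the duplicate sets
SELECT GroupID, COUNT(*) AS Members
FROM #grp
GROUP BY GroupID
HAVING COUNT(*) > 1;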
GROUP BY doesn't support OR - the grouping columns are implicitly ANDed, and the list must include every non-aggregated column in the select list.
I assume you also have a unique ID integer as the primary key on this table. If you don't, it's a good idea to have one, for this purpose and many others.
Find those duplicates by a self-join:
select
c1.ID
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
, c2.ID
, c2.Can_FName
, c2.Can_HPhone
, c2.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
order by
c1.ID
The query gives you N-1 rows for each set of N duplicates - if you want just a count along with each unique combination, count the rows grouped by the "left" side:
select count(1) + 1
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
group by
c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
Granted, this is more involved than a union - but I think it illustrates a good way of thinking about duplicates.
Project the desired transformation first from a derived table, then do the aggregation:
SELECT COUNT(*)
, CAN_FName
, Can_HPhoneOrEMail
FROM (
SELECT Can_FName
, ISNULL(Can_HPhone,'') + ISNULL(Can_EMail,'') AS Can_HPhoneOrEMail
FROM Can) AS Can_Transformed
GROUP BY Can_FName, Can_HPhoneOrEMail
HAVING Count(*) > 1
Adjust your 'OR' operation as needed in the derived table project list.
I know this answer will be criticised for the use of the temp table, but it will work anyway:
-- create temp table to give the table a unique key
create table #tmp(
ID int identity,
can_Fname varchar(200) null, -- real type and len here
can_HPhone varchar(200) null, -- real type and len here
can_Email varchar(200) null -- real type and len here
)
-- just copy the rows where a duplicate fname exits
-- (better performance specially for a big table)
insert into #tmp (can_Fname, can_HPhone, can_Email)
select can_fname,can_hphone,can_email
from Can
where can_fname in (select can_fname from Can
group by can_fname having count(*)>1)
-- select the rows that have the same fname and
-- at least the same phone or email
select can_Fname, can_Hphone, can_Email
from #tmp a where exists
(select * from #tmp b where
a.ID<>b.ID and A.can_fname = b.can_fname
and (isnull(a.can_HPhone,'') = isnull(b.can_HPhone,'')
or isnull(a.can_email,'') = isnull(b.can_email,'')))
Try this:
SELECT Can_FName, COUNT(*)
FROM (
SELECT
rank() over(partition by Can_FName order by Can_FName,Can_HPhone) rnk_p,
rank() over(partition by Can_FName order by Can_FName,Can_EMail) rnk_m,
Can_FName
FROM Can
) X
WHERE rnk_p=1 or rnk_m =1
GROUP BY Can_FName
HAVING COUNT(*)>1