A follow on from this question: ST_3DClosestPoint returning multiple points
1) I have a xyz target point stored as a geom, I want to update the row with the 3Ddistance to the nearest obs point in another table. So find the nearest obs point and then update the target point with the distance.
expected result for the target point:
id|geom|3Ddistance_To_Nearest_Point_In_Obs_Table
obs table:
id|geom
e.g. 100 records
2) To complicate matters, I also want to select n-neighbours from the obs table (lets say 10 for example) and calculate the average distance and update the target table.
expected target result:
id|geom|average_3Ddistance
I've been trying to alter the former example, but no joy, any ideas?
Thanks
If collections are static then you can CTAS (create table as select) your results instead of updating it.
create table new_table as
select t2.id, t2.geom, min(3ddistance) min_3DDistance, avg(3ddistance) avg_3ddistance
from target t2,
lateral (select t.id, st_3ddistance(o.geom, t.geom) 3ddistance
from obs o, target t
where t2.id=t.id
order by st_3ddistance(o.geom, t.geom) limit 10) a
group by t2.id, t2.geom;
or if you want to update
update target
set (average_3ddistance, min_3ddistance)=(
from (select id, min(3ddistance) min_3DDistance, avg(3ddistance) avg_3ddistance
from (select t.id, st_3ddistance(o.geom, t.geom) 3ddistance
from obs o, target t
where t2.id=t.id
order by st_3ddistance(o.geom, t.geom) limit 10) a
group by id) b
where b.id=t.id;
Related
I have query such as
select * from batteries as b ORDER BY inserted_at desc
which gives me data such as
and I have an query such as
select voltage, datetime, battery_id from battery_readings ORDER BY inserted_at desc limit 1
which returns data as
I want to combine both 2 above queries, so in one go, I can have each battery details as well as its last added voltage and datetime from battery_readings.
Postgres has a very useful syntax for this, called DISTINCT ON. This is different from plain DISTINCT in that it keeps only the first row of each set, defined by the sort order. In your case, it would be something like this:
SELECT DISTINCT ON (b.id)
b.id,
b.name,
b.source_url,
b.active,
b.user_id,
b.inserted_at,
b.updated_at,
v.voltage,
v.datetime
FROM battery b
JOIN battery_voltage v ON (b.id = v.battery_id)
ORDER BY b.id, v.datetime desc;
I think that widowing will make what you expected.
Assuming two tables
create table battery (id int, name text);
create table bat_volt(measure_time int, battery_id int, val int);
One of the possible queries is like this:
with latest as (select battery_id, max(measure_time) over (partition by battery_id) from bat_volt)
select * from battery b join bat_volt bv on bv.battery_id=b.id where (b.id,bv.measure_time) in (select * from latest);
If you have Postgres version which supports lateral, it might also make sense to try it out (in case there are way more values than batteries, it could have better performance).
select * from battery b
join bat_volt bv on bv.battery_id=b.id
join lateral
(select battery_id, max(measure_time) over (partition by battery_id) from bat_volt bbv
where bbv.battery_id = b.id limit 1) lbb on (lbb.max = bv.measure_time AND lbb.battery_id = b.id);
How can I count how many buffers intersects and select only those which have between 2 AND 6 intersections with other buffers? I can create a buffer with following query, but I don't have an idea on how to count intersections
SELECT ST_Buffer(geom::geography, 400)
FROM mytable;
I appreciate any help. Thanks.
It is wrong to use a buffer in such case, as the buffer is only an approximation. Instead, use the index compatible st_dwithin() function.
The idea is to select all points (polygons or else) that are within twice the distance, to group the result and keep the ones having at least 6 nearby features.
The example below use 2 tables, but you can use the same table twice.
SELECT myTable.ID, count(*), array_agg(myOtherTable.ID) as nearby_ids
FROM mytable
JOIN myOtherTable ON st_Dwithin(mytable.geom::geography, myOtherTable.geom::geography, 800)
GROUP BY myTable.ID
HAVING count(*) >= 6;
To use the same table twice, you can alias them:
SELECT a.ID, count(*), array_agg(b.ID) as nearby_ids
FROM mytable a
JOIN mytable b ON st_Dwithin(a.geom::geography, b.geom::geography, 800)
GROUP BY a.ID
HAVING count(*) >= 6;
I have hundreds of polygon (circles) where some of the polygon intersected with each others. This polygon is come from single feature layer. What I am trying to do is to delete the intersected circles.
It is similar to this question: link, but those were using two different layer. In my case the intersection is from single feature layers.
If I understood your question right, you just need to either create a CTE or simple subquery.
This might give you a good idea of how to solve your issue:
CREATE TABLE t (id INTEGER, geom GEOMETRY);
INSERT INTO t VALUES
(1,'POLYGON((-4.54 54.30,-4.46 54.30,-4.46 54.29,-4.54 54.29,-4.54 54.30))'),
(2,'POLYGON((-4.66 54.16,-4.56 54.16,-4.56 54.14,-4.66 54.14,-4.66 54.16))'),
(3,'POLYGON((-4.60 54.19,-4.57 54.19,-4.57 54.15,-4.60 54.15,-4.60 54.19))'),
(4,'POLYGON((-4.40 54.40,-4.36 54.40,-4.36 54.38,-4.40 54.38,-4.40 54.40))');
This data set contains 4 polygons in total and two of them overlap, as seen in the following picture:
Applying a CTE with a subquery might give you what you want, which is the non-overlapping polygons from the same table:
SELECT id, ST_AsText(geom) FROM t
WHERE id NOT IN (
WITH j AS (SELECT * FROM t)
SELECT j.id
FROM j
JOIN t ON t.id <> j.id
WHERE ST_Intersects(j.geom,t.geom)
);
id | st_astext
----+---------------------------------------------------------------------
1 | POLYGON((-4.54 54.3,-4.46 54.3,-4.46 54.29,-4.54 54.29,-4.54 54.3))
4 | POLYGON((-4.4 54.4,-4.36 54.4,-4.36 54.38,-4.4 54.38,-4.4 54.4))
(2 rows)
You can write quite clear delete statement using EXISTS clause. You literally want to delete the rows, for which there exists other rows which geometry intersects:
DELETE
FROM myTable t1
WHERE EXISTS (SELECT 1 FROM myTable t2 WHERE t2.id <> t1.id AND ST_Intersects(t1.geom, t2.geom))
So I have three tables (A, B, C). In tables A and B I have points, and I want to insert into C each row from A, and some columns from the closest point from B to each point in A, as well as the distance between them. I know that the query to get the nearest neighbour is this:
SELECT DISTINCT ON (A.id5) A.state, B.way, st_distance (A.geom,B.geom) INTO C
FROM A, B
WHERE ST_DWithin(A.geom, B.geom, 150)
ORDER BY A.objectid, ST_Distance(A.geom,A.geom)
But I need to get that into a bigger INSERT query, and I tried to do it this way:
INSERT INTO complete(id_door, distance, id_way,Y, X, geom, check)
(SELECT A.state, (select distinct on (A.id5) ST_DISTANCE(A.geom,B.geom) from A order by A.id5, st_distance(A.geom,B.geom)), b.way, ST_Y(B.geom), ST_X(B.geom) ,B.geom, V.check
FROM A, B, C, V
WHERE
ST_INTERSECTS(A.geom, V.geom)\
AND ST_DWithin(A.geom, B.geom,150))
But this is not the right way, because I get the error:
psycopg2.ProgrammingError: more than one row returned by a subquery used as an expression
I cannot copy all the distances from A and B to C and then delete all but the closest because it is a huge table and I would run out of memory, so I need a way to only insert the rows with the info from the closest point from B to A.
What am I doing wrong here? Thank you in advance
UPDATE:
After some help, I have learned that I should use a Lateral in the Select query, but I'm not sure how to use it.
I need the Select to get each row in table A and find its nearest neighbour from table B, which I guess it is done using the query previously stated, and insert into table C some columns from A, some columns from its nearest neighbour (table B), and some columns from table V, which is selected by an Intersect condition. The main problem is how to organize all that into the Select so I don't get an error.
This is where I am at this point:
INSERT INTO C (id_door, distance, id_way,Y, X, geom, check)
(SELECT A.state, l.*, V.check
FROM A, B, C, V
lateral (select st_distance(a.geom,b.geom), b.way, ST_Y(B.geom), ST_X(B.geom) ,B.geom
From B
Where ST_DWithin(a.geom, b.geom,150))
Order by a.geom<->b.geom limit 1) l
WHERE
ST_INTERSECTS(A.geom, V.geom)
You can use lateral join - very smart type of subquery that can reference tables outside the subquery. More about lateral you can find here
-- Edited according to new information in answer --
Insert into C (id_door, distance, id_way,Y, X, geom, check)
select l.*
from a,
lateral (select a.state, st_distance(a.geom,b.geom),
b.way, ST_Y(B.geom), ST_X(B.geom), B.geom,
v.check
from b, v
where ST_DWithin(a.geom, b.geom,150)
and st_dwithin(a.geom,v.geom,0)
and st_intersects(a.geom,v.geom)
order by a.geom<->b.geom, v.geom limit 1) l
If you want more records per each point from A then increase the limit from 1 to your desired value.
I have a table T as follows with 1 Billion records. Currently, this table has no Primary key or Indexes.
create table T(
day_c date,
str_c varchar2(20),
comm_c varchar2(20),
src_c varchar2(20)
);
some sample data:
insert into T
select to_date('20171011','yyyymmdd') day_c,'st1' str_c,'c1' comm_c,'s1' src_c from dual
union
select to_date('20171012','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171013','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171014','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171015','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171016','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171017','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171018','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171019','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171020','yyyymmdd'),'st1','c1','s1' from dual;
The expected result is to generate the date ranges for the changes in column src_c.
I have the following code snippet which provides the desired result. However, it is slow as the cost of running lag and lead is quite high on the table.
WITH EndsMarked AS (
SELECT
day_c,str_c,comm_c,src_c,
CASE WHEN src_c= LAG(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_START,
CASE WHEN src_c= LEAD(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_END
FROM T
), GroupsNumbered AS (
SELECT
day_c,str_c,comm_c,
src_c,
IS_START,
IS_END,
COUNT(CASE WHEN IS_START = 1 THEN 1 END)
OVER (ORDER BY day_c) AS GroupNum
FROM EndsMarked
WHERE IS_START=1 OR IS_END=1
)
SELECT
str_c,comm_c,src_c,
MIN(day_c) AS GROUP_START,
MAX(day_c) AS GROUP_END
FROM GroupsNumbered
GROUP BY str_c,comm_c, src_c,GroupNum
ORDER BY groupnum;
Output :
STR_C COMM_C SRC_C GROUP_START GROUP_END
st1 c1 s1 11-OCT-17 13-OCT-17
st1 c1 s2 14-OCT-17 16-OCT-17
st1 c1 s1 17-OCT-17 20-OCT-17
Any suggestion to speed up?
Oracle database :12c.
SGA Memory:20GB
Total CPU:22
Explain plan:
Order by day_c only, or do you need to partition by str_c and comm_c first? It seems so - in which case I am not sure your query is correct, and Sentinel's solution will need to be adjusted accordingly.
Then:
For some reason (which escapes me), it appears that the match_recognize clause (available only since Oracle 12.1) is faster than analytic functions, even when the work done seems to be the same.
In your problem, (1) you must read 1 billion rows from disk, which can't be done faster than the hardware allows (do you REALLY need to do this on all 1 billion rows, or should you archive a large portion of your table, perhaps after performing this identification of GROUP_START and GROUP_END)? (2) you must order the data by day_c no matter what method you use, and that is time consuming.
With that said, the tabibitosan method (see Sentinel's answer) will be faster than the start-of-group method (which is close to, but simpler than what you currently have).
The match_recognize solution, which will probably be faster than any solution based on analytic functions, looks like this:
select str_c, comm_c, src_c, group_start, group_end
from t
match_recognize(
partition by str_c, comm_c
order by day_c
measures x.src_c as src_c,
first(day_c) as group_start,
last(day_c) as group_end
pattern ( x y* )
define y as src_c = x.src_c
)
-- Add ORDER BY clause here, if needed
;
Here is a quick explanation of how this works; for developers who are not familiar with match_recognize, I provided links to a few good tutorials in a Comment below this Answer.
The match_recognize clause partitions the input rows by str_c and comm_c and orders them by day_c. So far this is exactly the same work that analytic functions do.
Then in the PATTERN and DEFINE clauses I declare and define two "classes" of rows, which will be flagged as X and Y, respectively. X is any row (there are no restrictions on it in the DEFINE clause). However, Y is restricted: it must have the same src_c as the last X row preceding it.
So, in each partition, and reading from the earliest row to the latest (within the partition), I am looking for any number of matches, where a match consists of an arbitrary row (marked X), followed by as many Y rows as possible; where Y means "same src_c as the first row in this match. So, this will identify sequences of rows where the src_c did not change.
For each match that is found, the clause will output the src_c value from the X row (which is the same, really, for all the rows in that match), and the first and the last value in the day_c column for that match. That is what we need to put in the SELECT clause of the overall query.
You can eliminate one CTE by using the Tabibito-san (Traveler) method:
with Groups as (
select t.*
, row_number() over (order by day_c)
- row_number() over (partition by str_c
, comm_c
, src_c
order by day_c) GroupNum
from t
)
select str_c
, comm_c
, src_c
, min(day_c) GROUP_START
, max(day_c) GROUP_END
from Groups
group by str_c
, comm_c
, src_c
, GroupNum