I have a table with about 800k points that include bathymetry information, stored in PostGIS and served using ST_AsMVTGeom. It is connected to a web app developed with Leaflet, but at low zoom levels (0-5) a single vector tile contains all the data, which makes the query very slow (it takes minutes). I wonder if there is a way (within PostGIS) to simplify the rows returned depending on the zoom level.
I'm using the following query to create the tiles:
SELECT ST_AsMVT(q.*, 'bathymetry_layer', 4096, 'geom')
FROM (
    SELECT
        c.gid AS id,
        c.x,
        c.y,
        c.z,
        c.name,
        c.inst,
        ST_AsMVTGeom(
            geom,
            ST_MakeEnvelope(-86.8798828125, 20.632784250388028, -86.8359375, 20.673905264672843, 4326),
            4096,
            256,
            false
        ) AS geom
    FROM bathymetry_table c
    WHERE geom && ST_MakeEnvelope(-86.8798828125, 20.632784250388028, -86.8359375, 20.673905264672843, 4326)
      AND ST_Intersects(geom, ST_MakeEnvelope(-86.8798828125, 20.632784250388028, -86.8359375, 20.673905264672843, 4326))
) q
Result of EXPLAIN ANALYZE:
Index Scan using geodat_batimetria_catoche_xcalak_geom_idx on geodat_batimetria_catoche_xcalak c (cost=0.29..8.56 rows=1 width=101) (actual time=0.117..11539.038 rows=434271 loops=1)
Index Cond: ((geom && '0103000020E610000001000000050000002B8716D9CECB55C0516B9A779CA234402B8716D9CECB55C0D34D621058493540492EFF21FD9E55C0D34D621058493540492EFF21FD9E55C0516B9A779CA234402B8716D9CECB55C0516B9A779CA23440'::geometry) AND (geom && '0103000020E610000001000000050000002B8716D9CECB55C0516B9A779CA234402B8716D9CECB55C0D34D621058493540492EFF21FD9E55C0D34D621058493540492EFF21FD9E55C0516B9A779CA234402B8716D9CECB55C0516B9A779CA23440'::geometry))
Filter: _st_intersects(geom, '0103000020E610000001000000050000002B8716D9CECB55C0516B9A779CA234402B8716D9CECB55C0D34D621058493540492EFF21FD9E55C0D34D621058493540492EFF21FD9E55C0516B9A779CA234402B8716D9CECB55C0516B9A779CA23440'::geometry)
Rows Removed by Filter: 5
Planning time: 7.222 ms
Execution time: 11579.044 ms
This function would generate vector tiles for all zooms, clustering points together for low zooms (0-10), and keeping them as is for zooms 11+.
CREATE OR REPLACE FUNCTION get_tile(z integer, x integer, y integer) RETURNS bytea AS
$$
-- MVTs can be combined by concatenating them together.
SELECT STRING_AGG(mvt, '') AS mvt
FROM (
-- Each `mvt` must be non-NULL, otherwise UNION ALL will skip them all
-- First generate the tile for the zoomed-out layer, clustering points together
SELECT COALESCE(ST_AsMVT(tile, 'bathymetry_layer', 4096, 'geom'), '') AS mvt
FROM (
-- Create a cluster of points for each tile
SELECT ST_AsMVTGeom(ST_Transform(center, 3857),
ST_TileEnvelope(z, x, y),
extent => 4096)
AS geom
-- need to add all the other columns here,
-- but they either have to be aggregated for each cluster,
-- or you need to pick one point to represent the whole cluster
FROM (
-- Cluster points into groups using DBSCAN algorithm
SELECT ST_Centroid(ST_Collect(geom)) AS center
FROM (SELECT *,
ST_ClusterDBSCAN(ST_TRANSFORM(geom, 3857),
-- Decide how many clusters you want per tile.
-- at zoom 0, earth circumference equals 1 tile. At each subsequent zoom level,
-- there are twice as many tiles for the same circumference.
-- Additionally, break each tile into 256 clusters
(40075016.6855785 / (256 * 2 ^ z)),
1) OVER () AS cluster_id
FROM bathymetry_table
-- This assumes your data should be clustered before zoom 11, and shouldn't after.
-- Keep this value in sync with the one below
WHERE z < 11
-- This assumes your data is in 4326. The envelope is always generated in 3857.
AND geom && ST_Transform(ST_TileEnvelope(z, x, y), 4326))
AS cluster
GROUP BY cluster_id) AS clusters) AS tile
UNION ALL
SELECT COALESCE(ST_AsMVT(tile, 'bathymetry_layer', 4096, 'geom', 'id'), '') AS mvt
FROM (
-- Once zoomed in, there is no longer a need to cluster points
-- Note that the list of columns is different from the one above - you may want to adjust that.
SELECT ST_AsMVTGeom(ST_Transform(c.geom, 3857),
ST_TileEnvelope(z, x, y),
extent => 4096)
AS geom
, c.gid AS id
, c.x
, c.y
, c.z
, c.name
, c.inst
FROM bathymetry_table c
WHERE z >= 11
AND c.geom && ST_Transform(ST_TileEnvelope(z, x, y), 4326)) as tile) AS mvt
;
$$ LANGUAGE SQL IMMUTABLE
STRICT
PARALLEL SAFE;
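With PostGIS 3.0+ (required for ST_TileEnvelope), the function can then be wired to the tile URLs that Leaflet requests. A quick sanity check from psql, using hypothetical tile coordinates:
-- Hypothetical z/x/y values; pick a tile that actually covers your data.
-- Returns the binary MVT (bytea) your tile endpoint should serve.
SELECT get_tile(12, 1059, 1807);  -- zoom >= 11: raw points branch
SELECT get_tile(5, 8, 14);        -- zoom < 11: clustered branch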
How does PostgreSQL estimate the number of rows in a JOIN query like:
EXPLAIN
SELECT *
FROM R, S
WHERE (R.StartTime < S.EndTime) AND (S.StartTime < R.EndTime);
There is a chapter in the manual addressing your question exactly:
Row Estimation Examples
It explains what Laurenz provided, among other things.
But that's not the full story yet. We also need the number of rows (cardinalities) of the underlying tables. Postgres uses estimate_rel_size() defined in src/backend/utils/adt/plancat.c:
/*
* estimate_rel_size - estimate # pages and # tuples in a table or index
*
* We also estimate the fraction of the pages that are marked all-visible in
* the visibility map, for use in estimation of index-only scans.
*
* If attr_widths isn't NULL, it points to the zero-index entry of the
* relation's attr_widths[] cache; we fill this in if we have need to compute
* the attribute widths for estimation purposes.
*/
void
estimate_rel_size(Relation rel, int32 *attr_widths,
BlockNumber *pages, double *tuples, double *allvisfrac)
...
Here is a minimal SQL query to reproduce the calculation (ignoring some corner cases):
SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint
FROM pg_class
WHERE oid = 'mytable'::regclass; -- your table here
More details:
Fast way to discover the row count of a table in PostgreSQL
Example
CREATE TEMP TABLE r(id serial, start_time timestamptz, end_time timestamptz);
CREATE TEMP TABLE s(id serial, start_time timestamptz, end_time timestamptz);
INSERT INTO r(start_time, end_time)
SELECT now(), now() -- actual values don't matter for this particular case
FROM generate_series (1, 5000);
INSERT INTO s(start_time, end_time)
SELECT now(), now()
FROM generate_series (1, 10000);
VACUUM r, s; -- set reltuples & relpages in pg_class
-- add 2000 rows to S
INSERT INTO s(start_time, end_time)
SELECT now(), now()
FROM generate_series (1, 2000);
pg_class still has 5000 and 10000 reltuples, but we know there are 5000 & 12000 rows in R and S. (Since these are temporary tables, they are not covered by autovacuum, so numbers are never updated automatically.) Check:
SELECT relname, reltuples, relpages -- 5000 | 10000
FROM pg_class c
WHERE c.oid IN ('pg_temp.r'::regclass, 'pg_temp.s'::regclass);
SELECT count(*) FROM r; -- 5000
SELECT count(*) FROM s; -- 12000
Query plan:
EXPLAIN
SELECT *
FROM r, s
WHERE (r.start_time < s.end_time) AND (s.start_time < r.end_time);
'Nested Loop (cost=0.00..1053004.31 rows=6683889 width=40)'
' Join Filter: ((r.start_time < s.end_time) AND (s.start_time < r.end_time))'
' -> Seq Scan on s (cost=0.00..197.31 rows=12031 width=20)'
' -> Materialize (cost=0.00..107.00 rows=5000 width=20)'
' -> Seq Scan on r (cost=0.00..82.00 rows=5000 width=20)'
'JIT:'
' Functions: 6'
' Options: Inlining true, Optimization true, Expressions true, Deforming true'
Postgres estimates rows=12031 for table s. A pretty good estimate, the algorithm worked.
The estimate is more easily thrown off by deleting rows, as the physical size of the table doesn't shrink automatically. It's a good idea to VACUUM ANALYZE after a major DELETE. Or even VACUUM FULL ANALYZE. See:
VACUUM returning disk space to operating system
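For the example tables above, that would simply be:
VACUUM ANALYZE s;        -- refreshes reltuples / relpages in pg_class
-- or, to also return disk space to the OS (takes an exclusive lock):
VACUUM FULL ANALYZE s;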
Postgres expects rows=6683889, which matches our expectation (as per Laurenz' explanation):
SELECT 5000 * 12031 * 0.3333333333333333^2 -- 6683888.89
Better query
Your example query is just that: an example. But it happens to be a poor one, as the same can be achieved with range types and operators more efficiently. Specifically with tstzrange and &&:
Selectivity for &&?
SELECT oprjoin -- areajoinsel
FROM pg_operator
WHERE oprname = '&&'
AND oprleft = 'anyrange'::regtype
AND oprright = 'anyrange'::regtype;
The source code in src/backend/utils/adt/geoselfuncs.c:
Datum
areajoinsel(PG_FUNCTION_ARGS)
{
PG_RETURN_FLOAT8(0.005);
}
Much more selective: 0.005 << 0.333! And typically more realistic.
EXPLAIN
SELECT *
FROM r, s
WHERE tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time);
Happens to be exactly equivalent, since tstzrange defaults to including the lower bound and excluding the upper bound. I get this query plan:
'Nested Loop (cost=0.00..1203391.81 rows=300775 width=40)'
' Join Filter: (tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time))'
' -> Seq Scan on s (cost=0.00..197.31 rows=12031 width=20)'
' -> Materialize (cost=0.00..107.00 rows=5000 width=20)'
' -> Seq Scan on r (cost=0.00..82.00 rows=5000 width=20)'
'JIT:'
' Functions: 6'
' Options: Inlining true, Optimization true, Expressions true, Deforming true'
Our expectation:
SELECT 5000 * 12031 * 0.005 -- 300775.000
It's a Bingo!
And this query can be supported with an index efficiently, changing the game ...
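A minimal sketch of such an index, assuming the r and s tables from the example above (the index name is arbitrary): an expression index on the same tstzrange() call used in the join condition, which GiST can use for the && operator.
CREATE INDEX s_period_gist ON s USING gist (tstzrange(start_time, end_time));
With that in place, the planner can run a nested loop that probes the index on s for each row from r, instead of filtering the full cross product.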
Assuming that the data type involved is timestamp with time zone (but it does not really matter, as we will see), the join selectivity estimation function can be found with:
SELECT oprjoin
FROM pg_operator
WHERE oprname = '<'
AND oprleft = 'timestamptz'::regtype
AND oprright = 'timestamptz'::regtype;
oprjoin
═════════════════
scalarltjoinsel
(1 row)
That function is defined in src/backend/utils/adt/selfuncs.c:
/*
* scalarltjoinsel - Join selectivity of "<" for scalars
*/
Datum
scalarltjoinsel(PG_FUNCTION_ARGS)
{
PG_RETURN_FLOAT8(DEFAULT_INEQ_SEL);
}
This is defined in src/include/utils/selfuncs.h as
/* default selectivity estimate for inequalities such as "A < b" */
#define DEFAULT_INEQ_SEL 0.3333333333333333
So, simple as it sounds, PostgreSQL will estimate that one inequality join condition will filter out two thirds of the rows. Since there are two such conditions, the selectivity is multiplied, and PostgreSQL will estimate that the row count of the result is
(#rows in R) * (#rows in S) / 9
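For example, with two hypothetical tables of 5,000 and 10,000 rows:
SELECT 5000 * 10000 * 0.3333333333333333 ^ 2 AS estimated_rows;  -- ≈ 5555556, i.e. (5000 * 10000) / 9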
As of yet, PostgreSQL does not have any cross-table statistics that make this less crude.
I have hundreds of polygons (circles), some of which intersect each other. These polygons come from a single feature layer. What I am trying to do is delete the intersecting circles.
It is similar to this question: link, but that one used two different layers. In my case the intersection is within a single feature layer.
If I understood your question right, you just need to create either a CTE or a simple subquery.
This might give you a good idea of how to solve your issue:
CREATE TABLE t (id INTEGER, geom GEOMETRY);
INSERT INTO t VALUES
(1,'POLYGON((-4.54 54.30,-4.46 54.30,-4.46 54.29,-4.54 54.29,-4.54 54.30))'),
(2,'POLYGON((-4.66 54.16,-4.56 54.16,-4.56 54.14,-4.66 54.14,-4.66 54.16))'),
(3,'POLYGON((-4.60 54.19,-4.57 54.19,-4.57 54.15,-4.60 54.15,-4.60 54.19))'),
(4,'POLYGON((-4.40 54.40,-4.36 54.40,-4.36 54.38,-4.40 54.38,-4.40 54.40))');
This data set contains 4 polygons in total and two of them overlap, as seen in the following picture:
Applying a CTE with a subquery might give you what you want, which is the non-overlapping polygons from the same table:
SELECT id, ST_AsText(geom) FROM t
WHERE id NOT IN (
WITH j AS (SELECT * FROM t)
SELECT j.id
FROM j
JOIN t ON t.id <> j.id
WHERE ST_Intersects(j.geom,t.geom)
);
id | st_astext
----+---------------------------------------------------------------------
1 | POLYGON((-4.54 54.3,-4.46 54.3,-4.46 54.29,-4.54 54.29,-4.54 54.3))
4 | POLYGON((-4.4 54.4,-4.36 54.4,-4.36 54.38,-4.4 54.38,-4.4 54.4))
(2 rows)
You can write quite a clear DELETE statement using an EXISTS clause. You literally want to delete the rows for which there exist other rows whose geometry intersects:
DELETE
FROM myTable t1
WHERE EXISTS (SELECT 1 FROM myTable t2 WHERE t2.id <> t1.id AND ST_Intersects(t1.geom, t2.geom));
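To preview which rows would be removed before running the DELETE, the same EXISTS predicate works in a plain SELECT (same hypothetical myTable):
SELECT t1.id
FROM myTable t1
WHERE EXISTS (SELECT 1 FROM myTable t2 WHERE t2.id <> t1.id AND ST_Intersects(t1.geom, t2.geom));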
I have a point on a line with two polygons on either side. The scenario is shown in the following:
Now, I would like to compute the perpendicular distance between the two polygons (for example, the yellow line) using a PostGIS query. I was wondering if someone could suggest how to do that?
EDIT1:
The scenario given above was an example. There can be complications in scenarios involving streets, points and polygons. For example, the side parallel to the street may not always be there.
Scenario_2:
Scenario_3:
EDIT2
Scenario_4
I want to calculate the perpendicular distance only where there is a point on the line. I understand there can be exceptions in this, as pointed out by #JGH.
Assuming your data is projected and that the distance between the point and the two nearest polygons is the one you are looking for, you can compute the distance from each point to the two polygons and sum them.
1) Compute the distance. Restrict the search to a reasonable distance, maybe twice the expected largest distance. Make sure the geometries are indexed!!
SELECT pt.gid point_gid, plg.gid polygon_gid, ST_Distance(pt.geom, plg.geom) distance
FROM pointlayer pt, polygonlayer plg
WHERE ST_Distance(pt.geom, plg.geom) < 50
ORDER BY pt.gid, ST_Distance(pt.geom, plg.geom);
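Note that a bare ST_Distance(...) < 50 filter cannot use the spatial index; an index-friendly sketch of step 1 uses ST_DWithin instead (same hypothetical pointlayer/polygonlayer tables and 50-unit search radius):
SELECT pt.gid AS point_gid,
       plg.gid AS polygon_gid,
       ST_Distance(pt.geom, plg.geom) AS distance
FROM pointlayer pt
JOIN polygonlayer plg
  ON ST_DWithin(pt.geom, plg.geom, 50)  -- index-assisted distance filter
ORDER BY pt.gid, distance;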
2) For each point, get the two closest polygons using a ROW_NUMBER() window function partitioned by point
SELECT
point_gid, polygon_gid, distance
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY point_gid ORDER BY distance asc) AS rank,
t.*
FROM
[distanceTable] t) top_n
WHERE
top_n.rank <= 2;
3) Aggregate the result and keep track of which polygons were used
SELECT point_gid,
       sum(distance) streetwidth,
       string_agg(polygon_gid || ' - ' || distance, ';') polyid_dist_info
FROM [top_2_dist] dst
GROUP BY dst.point_gid;
All together:
SELECT
point_gid,
sum(distance) streetwidth,
string_agg(polygon_gid || ' - ' || distance,';') polyid_dist_info
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY point_gid ORDER BY distance asc) AS rank,
t.*
FROM
( SELECT pt.gid point_gid,
plg.gid polygon_gid,
ST_Distance(pt.geom, plg.geom) distance
FROM pointlayer pt, polygonlayer plg
WHERE ST_Distance(pt.geom, plg.geom) < 50
) t
) top_n
WHERE
top_n.rank <= 2
GROUP BY point_gid;
I'm stuck with a (Postgres 9.4.6) query that a) uses too much memory (most likely due to array_agg()) and b) does not return exactly what I need, making post-processing necessary. Any input (especially regarding memory consumption) is highly appreciated.
Explanation:
the table token_groups holds all words used in tweets I've parsed, with their respective occurrence frequency in an hstore, one row per 10 minutes (for the last 7 days, so 7*24*6 rows in total). These rows are inserted in order of tweeted_at, so I can simply order by id. I'm using row_number() to identify when a word occurred.
# \d token_groups
Table "public.token_groups"
Column | Type | Modifiers
------------+-----------------------------+-----------------------------------------------------------
id | integer | not null default nextval('token_groups_id_seq'::regclass)
tweeted_at | timestamp without time zone | not null
tokens | hstore | not null default ''::hstore
Indexes:
"token_groups_pkey" PRIMARY KEY, btree (id)
"index_token_groups_on_tweeted_at" btree (tweeted_at)
What I'd ideally want is a list of words with each the relative distances of their row numbers. So if e.g. the word 'hello' appears in row 5 once, in row 8 twice and in row 20 once, I'd want a column with the word, and an array column returning {5,3,0,12}. (meaning: first occurrence in fifth row, next occurrence 3 rows later, next occurrence 0 rows later, next 12 rows later). If anyone wonders why: 'relevant' words occur in clusters, so (simplified) the higher the standard deviation of timely distances, the more likely a word is a keyword. See more here: http://bioinfo2.ugr.es/Publicaciones/PRE09.pdf
For now, I return an array with positions and an array with frequencies, and use this info to calculate the distances in ruby.
Currently the primary problem is a high memory spike, which seems to be caused by array_agg(). As I'm being told by the (very helpful) heroku staff that some of my connections use 500-700MB with very little shared memory, causing out of memory errors (I'm running Standard-0, which gives me 1GB total for all connections), I need to find an optimization.
The total number of hstore entries is ~100k, which then is aggregated (after skipping words with very low frequency):
SELECT COUNT(*)
FROM (SELECT row_number() over(ORDER BY id ASC) AS position,
(each(tokens)).key, (each(tokens)).value::integer
FROM token_groups) subquery;
count
--------
106632
Here is the query causing the memory load:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
SELECT row_number() over(ORDER BY id ASC) AS pos,
(each(tokens)).key,
(each(tokens)).value::integer
FROM token_groups
) subquery
GROUP BY key
HAVING SUM(value) > 10;
The output is:
key | positions | frequencies
-------------+---------------------------------------------------------+-------------------------------
hello | {172,185,188,210,349,427,434,467,479} | {1,2,1,1,2,1,2,1,4}
world | {166,218,265,343,415,431,436,493} | {1,1,2,1,2,1,2,1}
some | {35,65,101,180,193,198,223,227,420,424,427,428,439,444} | {1,1,1,1,1,1,1,2,1,1,1,1,1,1}
other | {77,111,233,416,421,494} | {1,1,4,1,2,2}
word | {170,179,182,184,185,186,187,188,189,190,196} | {3,1,1,2,1,1,1,2,5,3,1}
(...)
Here's what explain says:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12789.00..12792.50 rows=200 width=44) (actual time=309.692..343.064 rows=2341 loops=1)
Output: ((each(token_groups.tokens)).key), array_agg((row_number() OVER (?))), array_agg((((each(token_groups.tokens)).value)::integer))
Group Key: (each(token_groups.tokens)).key
Filter: (sum((((each(token_groups.tokens)).value)::integer)) > 10)
Rows Removed by Filter: 33986
Buffers: shared hit=2176
-> WindowAgg (cost=177.66..2709.00 rows=504000 width=384) (actual time=0.947..108.157 rows=106632 loops=1)
Output: row_number() OVER (?), (each(token_groups.tokens)).key, ((each(token_groups.tokens)).value)::integer, token_groups.id
Buffers: shared hit=2176
-> Sort (cost=177.66..178.92 rows=504 width=384) (actual time=0.910..1.119 rows=504 loops=1)
Output: token_groups.id, token_groups.tokens
Sort Key: token_groups.id
Sort Method: quicksort Memory: 305kB
Buffers: shared hit=150
-> Seq Scan on public.token_groups (cost=0.00..155.04 rows=504 width=384) (actual time=0.013..0.505 rows=504 loops=1)
Output: token_groups.id, token_groups.tokens
Buffers: shared hit=150
Planning time: 0.229 ms
Execution time: 570.534 ms
PS: if anyone wonders: every 10 minutes I append new data to the token_groups table and remove outdated data, which is easy when storing one row per 10 minutes. I still have to come up with a better data structure that e.g. uses one row per word, but that does not seem to be the main issue; I think it's the array aggregation.
Your presented query can be simpler, evaluating each() only once per row:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
SELECT t.key, pos, t.value::int
FROM (SELECT row_number() OVER (ORDER BY id) AS pos, * FROM token_groups) tg
, each(tg.tokens) t -- implicit LATERAL join
ORDER BY t.key, pos
) sub
GROUP BY key
HAVING sum(value) > 10;
Also preserving correct order of elements.
What I'd ideally want is a list of words with each the relative distances of their row numbers.
This would do it:
SELECT key, array_agg(step) AS occurrences
FROM (
SELECT key, CASE WHEN g = 1 THEN pos - last_pos ELSE 0 END AS step
FROM (
SELECT key, value::int, pos
, lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos) AS last_pos
FROM (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) tg
, each(tg.tokens) t
) t1
, generate_series(1, t1.value) g
ORDER BY key, pos, g
) sub
GROUP BY key
HAVING count(*) > 10;
SQL Fiddle.
Interpreting each hstore key as a word and the respective value as number of occurrences in the row (= for the last 10 minutes), I use two cascading LATERAL joins: 1st step to decompose the hstore value, 2nd step to multiply rows according to value. (If your value (frequency) is mostly just 1, you can simplify.) About LATERAL:
What is the difference between LATERAL and a subquery in PostgreSQL?
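As a tiny standalone illustration of that implicit LATERAL step (a hypothetical two-row input; requires the hstore extension):
CREATE EXTENSION IF NOT EXISTS hstore;

SELECT g.pos, t.key, t.value::int AS value
FROM (VALUES (1, 'hello=>2, world=>1'::hstore)
           , (2, 'hello=>1'::hstore)) AS g(pos, tokens)
   , each(g.tokens) AS t  -- one row per key/value pair, laterally referencing g
ORDER BY g.pos, t.key;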
Then I ORDER BY key, pos, g in the subquery before aggregating in the outer SELECT. This clause seems to be redundant, and in fact, I see the same result without it in my tests. That's a collateral benefit from the window definition of lag() in the inner query, which is carried over to the next step unless any other step triggers re-ordering. However, now we depend on an implementation detail that's not guaranteed to work.
Ordering the whole query once should be substantially faster (and easier on the required sort memory) than per-aggregate sorting. This is not strictly according to the SQL standard either, but the simple case is documented for Postgres:
Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
But this syntax is not allowed in the SQL standard, and is not portable to other database systems.
Strictly speaking, we only need:
ORDER BY pos, g
You could experiment with that. Related:
PostgreSQL unnest() with element number
Possible alternative:
SELECT key
, ('{' || string_agg(step || repeat(',0', value - 1), ',') || '}')::int[] AS occurrences
FROM (
SELECT key, pos, value::int
,(pos - lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos))::text AS step
FROM (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) g
, each(g.tokens) t
ORDER BY key, pos
) t1
GROUP BY key;
-- HAVING sum(value) > 10;
Might be cheaper to use text concatenation instead of generate_series().
I have two maps loaded into the database: one has the geometry of the states, and the other has the geometry of the urban areas.
I want to intersect them to build a relation between the urban areas and the states, so I know which urban area belongs to each state.
The problem is that some urban areas span two states; such an urban area should belong to the state that contains the larger share of its area.
I could use ST_Intersects, but it would assign the urban area to both states it intersects.
What SQL command do I have to use? I have read the documentation for ST_CoveredBy and ST_Within, but I'm not really sure they do what I need.
First create the intersection geometry between each state and urban region and calculate its area; using ST_Intersects in the JOIN lets the spatial index be used to avoid overhead.
Then assign a row_number to each urban_id, ordered by area size.
rn = 1 means only the largest overlap area is returned for each urban_id.
WITH cte as (
SELECT S.state_id,
U.urban_id,
ST_Area(ST_Intersection( S.geom, U.geom )) a_geom
-- This create the intersect geom and calculate area
FROM states S
JOIN urban U
ON ST_Intersects( S.geom, U.geom ) -- This is a boolean function
),
area as (
SELECT state_id,
urban_id,
row_number() over (partition by urban_id order by a_geom desc) as rn
FROM cte
)
SELECT state_id,
urban_id
FROM area
WHERE rn = 1
You can use ST_Area on ST_Intersection to sort, combined with a LATERAL join:
WITH states(id, geom) AS(
VALUES (1, ST_MakePolygon(ST_GeomFromText('LINESTRING(0 0, 1 0, 1 1, 0 1, 0 0)')))
,(2, ST_MakePolygon(ST_GeomFromText('LINESTRING(1 0, 2 0, 2 1, 1 1, 1 0)')))
),cities(id, geom) AS(
VALUES (1,ST_Buffer(ST_GeomFromText('POINT(0.5 0.5)'), 0.3))
,(2,ST_Buffer(ST_GeomFromText('POINT(1.5 0.5)'), 0.3))
,(3,ST_Buffer(ST_GeomFromText('POINT(1.1 0.5)'), 0.3))
)
SELECT c.id AS city, s.id AS state
FROM cities AS c
CROSS JOIN LATERAL (SELECT s.id, s.geom
FROM states AS s
WHERE ST_Intersects(s.geom, c.geom)
ORDER BY ST_AREA(ST_Intersection(s.geom,c.geom)) DESC
LIMIT 1) AS s