Efficiently selecting from a large table using floor() in Postgres - postgresql

I have two tables: one with unit squares identified by integer columns x and y (over the natural numbers), and another with points that fall on the grid defined by the first table. Example schema:
Grid Table
id | x | y
------------
123 | 1 | 1
234 | 1 | 2
345 | 2 | 1
456 | 2 | 2
Then, the points table:
id | x | y
----------------
12 | 1.23 | 1.23
23 | 2.89 | 1.55
Currently, using this query:
SELECT g.* FROM grid as g, points as p
WHERE p.id=23 AND floor(p.x)=g.x AND floor(p.y)=g.y;
I get the expected result, which is the grid square in which the point with id 23 resides (the grid row with id 345). However, when the grid table has 10,000,000 rows (my current situation), this query is incredibly slow, taking on the order of a few seconds.
I've found a workaround for this, but it's ugly:
SELECT g.* FROM grid as g, points as p
WHERE p.id=23 AND (p.x-.5)::integer=g.x AND (p.y-.5)::integer=g.y;
I get the expected result again, and in 11ms, but this feels hacky. Are there cleaner ways to do this? Any help is appreciated!

You can use a CTE, since it is evaluated only once:
WITH p2 AS (
    select floor(p.x) AS x,
           floor(p.y) AS y
    from points p
    where p.id=23
)
SELECT g.*
FROM grid g
INNER JOIN p2
    ON p2.x=g.x AND p2.y=g.y;
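If grid.x and grid.y are integer columns, another option worth trying (a sketch, assuming a composite index on (x, y) exists or can be added) is to cast the floor() result, which returns double precision, back to integer so the comparison stays integer-typed and the index can be used:

```sql
-- hypothetical index; adjust to your actual schema
CREATE INDEX IF NOT EXISTS grid_xy_idx ON grid (x, y);

SELECT g.*
FROM grid g
JOIN points p
  ON g.x = floor(p.x)::int   -- cast keeps the comparison integer vs. integer
 AND g.y = floor(p.y)::int
WHERE p.id = 23;
```

This may also explain why the (p.x-.5)::integer workaround is fast: it produces an integer, while bare floor() forces a float comparison against the integer column.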

Related

PostGIS minimum distance of two sets including other variables from both tables

I have two tables (table1 and table2) with three columns: id, value and geometry. The geometries are point features.
I want to do a join between both tables where the resulting table contains for each point of table1, the minimum distance to a point of table2, the value of table1 and the value of the corresponding point of table2.
I tried the following code but, logically, this gives for each point of table1 the distance to each point of table2. However, I cannot leave v2 out of the GROUP BY clause. How can I get the table I want?
SELECT t1.value AS v1,
t2.value AS v2,
MIN(st_distance(t1.geometry, t2.geometry)) AS dis
FROM table1 t1, table2 t2
GROUP BY v1, v2
For simplicity I took integer values and their differences instead of the distance between points (the approach is exactly the same: just replace the subtraction with the st_distance function):
demo:db<>fiddle
SELECT DISTINCT ON (v1.point)
v1.point,
v2.point,
abs(v1.point - v2.point)
FROM
table1 v1
CROSS JOIN table2 v2
ORDER BY v1.point, abs(v1.point - v2.point)
My tables:
table1.point: 1, 2, 4, 8, 16
table2.point: 2, 3, 5, 7, 11, 13
The result:
| point | point | abs |
|-------|-------|-----|
| 1 | 2 | 1 |
| 2 | 2 | 0 |
| 4 | 3 | 1 |
| 8 | 7 | 1 |
| 16 | 13 | 3 |
Explanation:
You have to calculate all differences to know which one is the smallest; that's the reason for the CROSS JOIN. Then you can ORDER BY the points of table1 and the differences (or distances). Notice the abs() function: it makes all negative values positive. Otherwise a difference of -42 would be chosen over +1.
DISTINCT ON (v1.point) takes the first ordered row for each v1.point.
Notice:
Because of the CROSS JOIN and the heavy mathematics in st_distance it could be really slow for huge data sets!
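If the data really are PostGIS geometries with a GiST index, a LATERAL nearest-neighbor query can avoid materializing the full cross join (a sketch; column names as in the question, and note that for geographic coordinates the <-> operator orders by planar distance, which can differ slightly from ST_Distance):

```sql
SELECT t1.value AS v1,
       nn.value AS v2,
       nn.dist  AS dis
FROM table1 t1
CROSS JOIN LATERAL (
    SELECT t2.value,
           ST_Distance(t1.geometry, t2.geometry) AS dist
    FROM table2 t2
    ORDER BY t1.geometry <-> t2.geometry  -- KNN order, can use a GiST index
    LIMIT 1
) nn;
```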

How to calculate the nearest neighbor distance for 10000 points in a table

I am using PostgreSQL and I am using PostGIS extension.
I am able to compare one point with this query:
SELECT st_distance(geom, 'SRID=4326;POINT(12.601828337172 50.5173393068512)'::geometry) as d
FROM pointst1
ORDER BY d
but I want to compare not to one fixed point but to a column of points. And I want to do this with some sort of indexing so that it is computationally cheap and not 10000x10000 like a cross join within that table.
Create table:
create table pointst1
(
id integer not null
constraint pointst1_id_pk
primary key,
geom geometry(Point, 4326)
);
create unique index pointst1_id_uindex
on pointst1 (id);
create index geomidx
on pointst1 (geom);
Edit:
Refined query (comparing all 10000 points with their nearest neighbor, but it returns each point paired with itself at distance 0 instead of with the next nearest point):
select points.*,
p1.id as p1_id,
ST_Distance(geography(p1.geom), geography(points.geom)) as distance
from
(select distinct on(p2.geom)*
from pointst1 p2
where p2.id is not null) as points
cross join lateral
(select id, geom
from pointst1
order by points.geom <-> geom
limit 1) as p1;
Your query is already calculating the distance from the given geometry to all records in the table pointst1.
Considering these values ..
INSERT INTO pointst1 VALUES (1,'SRID=4326;POINT(16.19 48.21)'),
(2,'SRID=4326;POINT(18.96 47.50)'),
(3,'SRID=4326;POINT(13.47 52.52)'),
(4,'SRID=4326;POINT(-3.70 40.39)');
... if you run your query, it will already calculate the distance from all points in the table:
SELECT ST_Distance(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry) as d
FROM pointst1
ORDER BY d
d
------------------
2.1827914536208
4.26600662563949
7.03781262396208
19.1914274750473
(4 rows)
Change your index to GIST, which is the most suitable for geometry data:
create index geomidx on pointst1 using GIST (geom);
Just note that an index won't speed up this query of yours, since you're doing a full scan. But as soon as you start playing more in the where clause, you might see some improvement.
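For a true nearest neighbor per point (rather than all pairwise distances), one option is a LATERAL query that excludes the point itself, which also addresses the distance-0 problem from the question's refined query (a sketch using the table from the question):

```sql
SELECT p.id,
       nn.id AS nn_id,
       ST_Distance(p.geom, nn.geom) AS d
FROM pointst1 p
CROSS JOIN LATERAL (
    SELECT id, geom
    FROM pointst1
    WHERE id <> p.id              -- skip the point itself
    ORDER BY p.geom <-> geom      -- KNN order, served by the GiST index
    LIMIT 1
) nn;
```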
EDIT:
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
1 | 3 | 5.0965184194703
1 | 4 | 21.3720495039666
2 | 1 | 2.85954541841881
2 | 3 | 7.43911957156222
2 | 4 | 23.7492673571207
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
3 | 4 | 21.0225069865609
4 | 1 | 21.3720495039666
4 | 2 | 23.7492673571207
4 | 3 | 21.0225069865609
(12 rows)
Removing duplicate distances:
SELECT DISTINCT ON(d) * FROM (
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2) AS j
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
4 | 3 | 21.0225069865609
4 | 1 | 21.3720495039666
2 | 4 | 23.7492673571207
(6 rows)
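Note that DISTINCT ON (d) will also drop distinct pairs whose distances happen to be exactly equal. A more robust way to keep each unordered pair exactly once (a sketch) is to require an ordering on the ids instead:

```sql
SELECT a.id,
       b.id AS id2,
       ST_Distance(a.geom, b.geom) AS d
FROM pointst1 a
JOIN pointst1 b ON a.id < b.id   -- each unordered pair appears once
ORDER BY d;
```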

crosstab in PostgreSQL, count

Crosstab function returns error:
No function matches the given name and argument types
I have in table clients, dates and type of client.
Example:
CLIENT_ID | DATE | CLI_TYPE
1234 | 201601 | F
1236 | 201602 | P
1234 | 201602 | F
1237 | 201601 | F
I would like to get the number of distinct clients grouped by date, and then also count clients by client type (putting the types P and F into their own columns and counting the clients of each).
Something like this:
DATE | COUNT_CLIENT | P | F
201601 | 2 | 0 | 2
201602 | 2 | 1 | 1
SELECT date
, count(DISTINCT client_id) AS count_client
, count(*) FILTER (WHERE cli_type = 'P') AS p
, count(*) FILTER (WHERE cli_type = 'F') AS f
FROM clients
GROUP BY date;
This counts distinct clients per date, and total rows for cli_type 'P' and 'F'. It's undefined how you want to count multiple types for the same client (or whether that's even possible).
About aggregate FILTER:
Postgres COUNT number of column values with INNER JOIN
crosstab() might make it faster, but it's pretty unclear what you want exactly.
About crosstab():
PostgreSQL Crosstab Query
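If you do want crosstab(), it lives in the tablefunc extension and needs an explicit output column list. A sketch for the example data (assuming date is stored as an integer like 201601; the count_client column would still need a separate aggregate or a join):

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  $$SELECT date, cli_type, count(*)::int
    FROM clients
    GROUP BY date, cli_type
    ORDER BY date, cli_type$$,
  $$VALUES ('P'), ('F')$$
) AS ct(date int, p int, f int);
```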

Calculate length of a series of line segments

I have a table like the following:
X | Y | Z | node
----------------
1 | 2 | 3 | 100
2 | 2 | 3 |
2 | 2 | 4 |
2 | 2 | 5 | 200
3 | 2 | 5 |
4 | 2 | 5 |
5 | 2 | 5 | 300
X, Y, Z are 3D space coordinates of some points, a curve passes through all the corresponding points from the first row to the last row. I need to calculate the curve length between two adjacent points whose "node" column aren't null.
It would be great if I could directly insert the result into another table that has three columns: "first_node", "second_node", "curve_length".
I don't need to interpolate extra points into the curve; I just need to accumulate the lengths of all the straight lines. For example, to calculate the curve length between node 100 and node 200, I need to sum the lengths of 3 straight lines: (1,2,3)<->(2,2,3), (2,2,3)<->(2,2,4), (2,2,4)<->(2,2,5).
EDIT
The table has an ID column, which is in increasing order from the first row to the last row.
To get a previous value in SQL, use the lag window function, e.g.
SELECT
x,
lag(x) OVER (ORDER BY id) as prev_x, ...
FROM ...
ORDER BY id;
That lets you get the previous and next points in 3-D space for a given segment. From there you can trivially calculate the line segment length using regular geometric maths.
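Concretely, the per-row segment length could be computed like this (a sketch assuming the columns id, x, y, z; the first row gets NULL because it has no predecessor):

```sql
SELECT id,
       sqrt( (x - lag(x) OVER w) ^ 2
           + (y - lag(y) OVER w) ^ 2
           + (z - lag(z) OVER w) ^ 2 ) AS seg_length
FROM Table1
WINDOW w AS (ORDER BY id);
```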
You'll now have the lengths of each segment (sqlfiddle query). You can use this as input into other queries, using SELECT ... FROM (SELECT ...) subqueries or a CTE (WITH ....) term.
It turns out to be pretty awkward to go from the node segment lengths to node-to-node lengths. You need to create a table that spans the null entries, using a recursive CTE or with a window function.
I landed up with this monstrosity:
SELECT
array_agg(from_id) AS seg_ids,
-- 'max' is used here like 'coalesce' for an aggregate,
-- since non-null is greater than null
max(from_node) AS from_node,
max(to_node) AS to_node,
sum(seg_length) AS seg_length
FROM (
-- lengths of all sub-segments with the null last segment
-- removed and a partition counter added
SELECT
*,
-- A running counter that increments when the
-- node ID changes. Allows us to group by series
-- of nodes in the outer query.
sum(CASE WHEN from_node IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY from_id) AS partition_id
FROM
(
-- lengths of all sub-segments
SELECT
id AS from_id,
lead(id, 1) OVER (ORDER BY id) AS to_id,
-- length of sub-segment
sqrt(
(x - lead(x, 1) OVER (ORDER BY id)) ^ 2 +
(y - lead(y, 1) OVER (ORDER BY id)) ^ 2 +
(z - lead(z, 1) OVER (ORDER BY id)) ^ 2
) AS seg_length,
node AS from_node,
lead(node, 1) OVER (ORDER BY id) AS to_node
FROM
Table1
) sub
-- filter out the last row
WHERE to_id IS NOT NULL
) seglengths
-- Group into series of sub-segments between two nodes
GROUP BY partition_id;
Credit to How do I efficiently select the previous non-null value? for the partition trick.
Result:
seg_ids | from_node | to_node | seg_length
---------+-----------+---------+------------
{1,2,3} | 100 | 200 | 3
{4,5,6} | 200 | 300 | 3
(2 rows)
To insert directly into another table, use INSERT INTO ... SELECT ....

Variable rows and columns in SSRS Matrix

(SSRS 2008)
I have a dataset with results looking like this:
FUNCTION | EMP-NMB
------------------
A | 100
A | 101
A | 103
B | 102
I want to display this data in my report in this way:
A | B
------------
100 | 102
101 |
103 |
I am managed to display it this way:
A | B
------------
100 |
101 |
103 |
| 102
But that table becomes very large with more data.
The number of employees and functions can vary. For now I am using a Matrix, but I don't know how to configure it to work the way I want.
I think the problem is that you are probably using EMP-NMB as your Row Group grouping.
Since you want the report to display different employees on the same line, you need to group on something else. Unfortunately, there isn't anything in the data you list, but you can add a ROW_NUMBER() to the query.
SELECT [FUNCTION], [EMP-NMB], ROW_NUMBER() OVER(PARTITION BY [FUNCTION] ORDER BY [EMP-NMB]) AS ROW_NUM
FROM ...
Then change the tablix Row Group's "Group On" property to use the new ROW_NUM field.