pgr_dijkstra returns an empty set - postgresql

I'm trying to write a function that is able to find the shortest way between two points using pgr_dijkstra function. I'm following this guide. With data provided in the guide everything works fine. But when I try to apply the same steps (build a topology using pgr_createTopology and then test it with pgr_dijkstra) to another data set, pgr_dijkstra returns an empty result. I've also noticed that the guide's data set has a LineString geometry column, while I have a MultiLineString geometry column. What could be the reason?
My table's structure:
Table "public.roads"
Column | Type | Collation | Nullable | Default
--------+--------------------------------+-----------+----------+------------------------------------
id | integer | | not null | nextval('roads_gid_seq'::regclass)
geom | geometry(MultiLineString,4326) | | |
source | integer | | |
target | integer | | |
Indexes:
"roads_pkey" PRIMARY KEY, btree (id)
"roads_geom_idx" gist (geom)
"roads_source_idx" btree (source)
"roads_target_idx" btree (target)
Topology creation query:
SELECT pgr_createTopology('roads', 0.00001, 'geom', 'id');
Shortest way test:
SELECT seq, node, edge, cost as cost, agg_cost, geom
FROM pgr_dijkstra(
    'SELECT id, source, target, st_length(geom, true) AS cost FROM roads',
    -- Some random points
    1, 200
) AS pt
JOIN roads rd ON pt.edge = rd.id;

The problem was actually related to geometry data types. The function doesn't work properly with MultiLineString, though it doesn't produce any errors. So, I've converted MultiLineString to LineString and now everything seems to be OK.
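For reference, a sketch of one way the conversion can be done (an assumption, not from the original post; it presumes every MultiLineString merges into a single LineString, otherwise ST_Dump and extra rows would be needed):
-- assumes ST_LineMerge yields a single LineString per row
ALTER TABLE roads
    ALTER COLUMN geom TYPE geometry(LineString, 4326)
    USING ST_LineMerge(geom);
-- rebuild the topology afterwards
SELECT pgr_createTopology('roads', 0.00001, 'geom', 'id');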

Related

Postgresql + psycopg: Bulk Insert large data with POSTGRESQL function call

I am working with large, very large amount of very simple data (point clouds). I want to insert this data into a simple table in a Postgresql database using Python.
An example of the insert statement I need to execute is as follows:
INSERT INTO points_postgis (id_scan, scandist, pt) VALUES (1, 32.656, ST_MakePoint(1.1, 2.2, 3.3));
Note the call to the Postgresql function ST_MakePoint in the INSERT statement.
I must call this billions (yes, billions) of times, so obviously I need to insert the data into PostgreSQL in a more optimized way. There are many strategies for bulk inserting data, as this article presents in a very good and informative way (insertmany, copy, etc.):
https://hakibenita.com/fast-load-data-python-postgresql
But no example shows how to do these inserts when you need to call a function on the server-side. My question is: how can I bulk INSERT data when I need to call a function on the server-side of a Postgresql database using psycopg?
Any help is greatly appreciated! Thank you!
Please note that using a CSV doesn't make much sense because my data is huge.
Alternatively, I already tried filling a temp table with plain columns for the 3 inputs of the ST_MakePoint function and then, once all data is in this temp table, calling an INSERT/SELECT. The problem is that this takes a lot of time, and the amount of disk space I need for it is nonsensical.
The most important thing, in order to do this within a reasonable time and with minimum effort, is to break the task down into component parts, so that you can take advantage of different Postgres features separately.
Firstly, you will want to create the table minus the geometry transformation, such as:
create table temp_table (
    id_scan bigint,
    scandist numeric,
    pt_1 numeric,
    pt_2 numeric,
    pt_3 numeric
);
Since we do not add any indexes or constraints, this will most likely be the fastest way to get the "raw" data into the RDBMS.
The best way to do this is with the COPY command, which you can use either from Postgres directly (if you have sufficient access) or via the Python interface using copy_expert: https://www.psycopg.org/docs/cursor.html#cursor.copy_expert
Here is example code to achieve this:
import psycopg2

# connection parameters (target_host etc.) are assumed to be defined elsewhere
iconn_string = "host={0} user={1} dbname={2} password={3} sslmode={4}".format(target_host, target_usr, target_db, target_pw, "require")

iconn = psycopg2.connect(iconn_string)
import_cursor = iconn.cursor()

csv_filename = '/path/to/my_file.csv'
copy_sql = "COPY temp_table (id_scan, scandist, pt_1, pt_2, pt_3) FROM STDIN WITH CSV HEADER DELIMITER ',' QUOTE '\"' ESCAPE '\\' NULL AS 'null'"

with open(csv_filename, mode='r', encoding='utf-8', errors='ignore') as csv_file:
    import_cursor.copy_expert(copy_sql, csv_file)

iconn.commit()
The next step is to efficiently build the table you want from the existing raw data. You will then be able to populate your actual target table with a single SQL statement and let the RDBMS do its magic.
Once the data is in the RDBMS, it makes sense to optimize it a little and add an index or two if applicable (preferably a primary key or unique index, to speed up the transformation).
This will be dependent on your data / use case, but something like this should help:
alter table temp_table add primary key (id_scan); --if unique
-- or
create index idx_temp_table_1 on temp_table(id_scan); --if not unique
To move the data from the raw table into your target table:
with temp_t as (
    select id_scan, scandist, ST_MakePoint(pt_1, pt_2, pt_3) as pt
    from temp_table
)
INSERT INTO points_postgis (id_scan, scandist, pt)
SELECT temp_t.id_scan, temp_t.scandist, temp_t.pt
FROM temp_t;
This selects all the data from the staging table and transforms it in one go.
A second, similar option: load all the raw data into points_postgis directly, keeping the three point components in temporary columns, then add the geometry column, populate it with an UPDATE, and drop the temporary columns:
alter table points_postgis add column pt geometry;
update points_postgis set pt = ST_MakePoint(pt_1, pt_2, pt_3);
alter table points_postgis drop column pt_1, drop column pt_2, drop column pt_3;
The main takeaway is that the most performant option is not to concentrate on the final table state, but to break the work down into easily achievable chunks. Postgres will easily handle both the import of billions of rows and their transformation afterwards.
Some simple examples using a function that generates a UPC-A barcode with check digit:
Using execute_batch. execute_batch has a page_size argument that allows you to batch the inserts into a multi-row statement. By default this is set to 100, which inserts 100 rows at a time; you can bump it up to make fewer round trips to the server.
Using just execute and selecting data from another table.
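The examples below call a server-side function upc_check_digit() that is not shown in the thread. A sketch that is consistent with the output below (a fixed company prefix of 744835 plus the standard UPC-A check digit) could look like this; the prefix and the function body are assumptions:
-- Sketch only: upc_check_digit() is assumed, not taken from the thread.
-- It prepends a fixed company prefix and appends the UPC-A check digit.
CREATE OR REPLACE FUNCTION upc_check_digit(suffix_val varchar)
RETURNS varchar
LANGUAGE plpgsql AS $$
DECLARE
    digits varchar := '744835' || suffix_val;  -- 11 digits expected
    total  integer := 0;
BEGIN
    FOR i IN 1..11 LOOP
        -- odd positions are weighted by 3 in the UPC-A scheme
        total := total + substr(digits, i, 1)::integer * CASE WHEN i % 2 = 1 THEN 3 ELSE 1 END;
    END LOOP;
    RETURN digits || ((10 - total % 10) % 10)::text;
END;
$$;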
import psycopg2
from psycopg2.extras import execute_batch

con = psycopg2.connect(dbname='test', host='localhost', user='postgres', port=5432)
cur = con.cursor()
cur.execute('create table import_test(id integer, suffix_val varchar, upca_val varchar)')
con.commit()

# Input data as a list of tuples. Means some data is duplicated.
input_list = [(1, '12345', '12345'), (2, '45278', '45278'), (3, '61289', '61289')]
execute_batch(cur, 'insert into import_test values(%s, %s, upc_check_digit(%s))', input_list)
con.commit()
select * from import_test ;
 id | suffix_val |   upca_val
----+------------+--------------
  1 | 12345      | 744835123458
  2 | 45278      | 744835452787
  3 | 61289      | 744835612891
# Input data as list of dicts and using named parameters to avoid duplicating data.
input_list_dict = [{'id': 50, 'suffix_val': '12345'}, {'id': 51, 'suffix_val': '45278'}, {'id': 52, 'suffix_val': '61289'}]
execute_batch(cur, 'insert into import_test values(%(id)s, %(suffix_val)s, upc_check_digit(%(suffix_val)s))', input_list_dict)
con.commit()
select * from import_test ;
 id | suffix_val |   upca_val
----+------------+--------------
  1 | 12345      | 744835123458
  2 | 45278      | 744835452787
  3 | 61289      | 744835612891
 50 | 12345      | 744835123458
 51 | 45278      | 744835452787
 52 | 61289      | 744835612891
# Create a table with values to be used for inserting into final table
cur.execute('create table input_vals (id integer, suffix_val varchar)')
con.commit()
execute_batch(cur, 'insert into input_vals values(%s, %s)', [(100, '76234'), (101, '92348'), (102, '16235')])
con.commit()
cur.execute('insert into import_test select id, suffix_val, upc_check_digit(suffix_val) from input_vals')
con.commit()
select * from import_test ;
  id   | suffix_val |   upca_val
-------+------------+--------------
     1 | 12345      | 744835123458
     2 | 45278      | 744835452787
     3 | 61289      | 744835612891
 12345 | 12345      | 744835123458
 45278 | 45278      | 744835452787
 61289 | 61289      | 744835612891
   100 | 76234      | 744835762343
   101 | 92348      | 744835923485
   102 | 16235      | 744835162358

Recursive postgres query to view

I have the following table which models a very simple hierarchical data structure with each element pointing to its parent:
Table "public.device_groups"
Column | Type | Modifiers
--------------+------------------------+---------------------------------------------------------------
dg_id | integer | not null default nextval('device_groups_dg_id_seq'::regclass)
dg_name | character varying(100) |
dg_parent_id | integer |
I want to query the recursive list of subgroups of a specific group.
I constructed the following recursive query which works fine:
WITH RECURSIVE r(dg_parent_id, dg_id, dg_name) AS (
    SELECT dg_parent_id, dg_id, dg_name FROM device_groups WHERE dg_id = 1
    UNION ALL
    SELECT dg.dg_parent_id, dg.dg_id, dg.dg_name
    FROM r pr, device_groups dg
    WHERE dg.dg_parent_id = pr.dg_id
)
SELECT dg_id, dg_name
FROM r;
I now want to turn this into a view where I can choose which group I want to drill down for using a WHERE clause. This means I want to be able to do:
SELECT * FROM device_groups_recursive WHERE dg_id = 1;
And get all the (recursive) subgroups of the group with id 1
I was able to write a function (by wrapping the query from above), but I would like to have a view instead of the function.
Side note: I know about the shortcomings of an adjacency list representation; I cannot change it currently.
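A sketch of one way such a view could be written (an assumption, not part of the thread): let the view enumerate every (root, descendant) pair, so the WHERE clause picks the root afterwards.
-- Sketch only: device_groups_recursive lists every (root_id, dg_id) pair,
-- so WHERE root_id = 1 returns group 1 and all of its recursive subgroups.
-- The column name root_id is an illustrative choice; assumes no cycles.
CREATE VIEW device_groups_recursive AS
WITH RECURSIVE r(root_id, dg_id, dg_name) AS (
    SELECT dg_id, dg_id, dg_name FROM device_groups
    UNION ALL
    SELECT pr.root_id, dg.dg_id, dg.dg_name
    FROM r pr
    JOIN device_groups dg ON dg.dg_parent_id = pr.dg_id
)
SELECT root_id, dg_id, dg_name FROM r;

SELECT dg_id, dg_name FROM device_groups_recursive WHERE root_id = 1;
The trade-off is that Postgres computes the full closure before applying the filter, so a wrapped function is usually cheaper when only one group is needed.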

Is it possible in PL/pgSQL to evaluate a string as an expression, not a statement?

I have two database tables:
# \d table_1
       Table "public.table_1"
   Column   |  Type   | Modifiers
------------+---------+-----------
 id         | integer |
 value      | integer |
 date_one   | date    |
 date_two   | date    |
 date_three | date    |

# \d table_2
       Table "public.table_2"
   Column   |  Type   | Modifiers
------------+---------+-----------
 id         | integer |
 table_1_id | integer |
 selector   | text    |
The values in table_2.selector can be one of one, two, or three, and are used to select one of the date columns in table_1.
My first implementation used a CASE:
SELECT value
FROM table_1
INNER JOIN table_2 ON table_2.table_1_id = table_1.id
WHERE CASE table_2.selector
          WHEN 'one' THEN
              table_1.date_one
          WHEN 'two' THEN
              table_1.date_two
          WHEN 'three' THEN
              table_1.date_three
          ELSE
              table_1.date_one
      END BETWEEN ? AND ?
The values for selector are such that I could identify the column of interest as eval(date_#{table_2.selector}), if PL/pgSQL allows evaluation of strings as expressions.
The closest I've been able to find is EXECUTE string, which evaluates entire statements. Is there a way to evaluate expressions?
In a PL/pgSQL function you can dynamically build any expression. That does not help in the case you describe, however: the query text has to be fixed before it is executed, while the choice of column is only made while the query is running.
Your query is the best approach. You could try using a function, but it will not bring any benefit, as the essence of the issue remains unchanged.
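For illustration, a sketch of what per-row evaluation with dynamic SQL would require (the function name and shape are assumptions, not from the thread): EXECUTE only runs whole statements, so the expression has to be wrapped in a SELECT, and it is evaluated one row at a time rather than in the single set-based query above.
-- Sketch: evaluate "date_<selector>" for one row via dynamic SQL.
-- The function name eval_date is illustrative; assumes id identifies the row.
CREATE OR REPLACE FUNCTION eval_date(selector text, row_id integer)
RETURNS date
LANGUAGE plpgsql AS $$
DECLARE
    result date;
BEGIN
    EXECUTE format('SELECT %I FROM table_1 WHERE id = $1', 'date_' || selector)
    INTO result
    USING row_id;
    RETURN result;
END;
$$;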

Postgresql enforce unique two-way combination of columns

I'm trying to create a table that would enforce a unique combination of two columns of the same type - in both directions. E.g. this would be illegal:
col1 col2
1 2
2 1
I have come up with this, but it doesn't work:
database=> \d+ friend;
                                       Table "public.friend"
    Column    |           Type           | Modifiers | Storage  | Stats target | Description
--------------+--------------------------+-----------+----------+--------------+-------------
 user_id_from | text                     | not null  | extended |              |
 user_id_to   | text                     | not null  | extended |              |
 status       | text                     | not null  | extended |              |
 sent         | timestamp with time zone | not null  | plain    |              |
 updated      | timestamp with time zone |           | plain    |              |
Indexes:
    "friend_pkey" PRIMARY KEY, btree (user_id_from, user_id_to)
    "friend_user_id_to_user_id_from_key" UNIQUE CONSTRAINT, btree (user_id_to, user_id_from)
Foreign-key constraints:
    "friend_status_fkey" FOREIGN KEY (status) REFERENCES friend_status(name)
    "friend_user_id_from_fkey" FOREIGN KEY (user_id_from) REFERENCES user_account(login)
    "friend_user_id_to_fkey" FOREIGN KEY (user_id_to) REFERENCES user_account(login)
Has OIDs: no
Is it possible to write this without triggers or any advanced magic, using constraints only?
A variation on Neil's solution which doesn't need an extension is:
create table friendz (
    from_id int,
    to_id int
);
create unique index ifriendz on friendz(greatest(from_id,to_id), least(from_id,to_id));
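A quick check of the behaviour (same pair of inserts as in the intarray answer below):
insert into friendz values (1,2); -- accepted
insert into friendz values (2,1); -- rejected: duplicate (greatest, least) pair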
Neil's solution lets you use an arbitrary number of columns though.
We're both relying on expressions to build the index, which is documented at:
https://www.postgresql.org/docs/current/indexes-expressional.html
Do you consider the intarray extension to be magic?
You'd need to use int keys for the users instead of text though...
Here's a possible solution:
create extension intarray;
create table friendz (
    from_id int,
    to_id int
);
create unique index on friendz ( sort( array[from_id, to_id ] ) );
insert into friendz values (1,2); -- good
insert into friendz values (2,1); -- bad
http://sqlfiddle.com/#!15/c84b7/1

Partitioning in PostgreSQL when partitioned table is referenced

My PostgreSQL database has a table with entities that can be active or inactive, determined by the value of the isActive column. Inactive entities are accessed very rarely and, as the database grows, the inactive-to-active ratio becomes very high. So I expect partitioning on a simple isActive check to bring a substantial performance gain.
The problem is that the table is referenced by foreign-key constraints from many other tables. As specified in the last bullet of the Caveats section of the PostgreSQL inheritance documentation, there is no good workaround for this case.
So, is it true that partitioning in PostgreSQL is currently only suitable for simple cases where the partitioned table is not referenced from anywhere?
Are there other ways to optimize the performance of queries against the table described above? I'm pretty sure my use case is common, and there should be a good solution for it.
Example of queries to create the tables:
CREATE TABLE resources
(
    id uuid NOT NULL,
    isActive integer NOT NULL, -- 0 means false, anything else is true, I intentionally do not use boolean type
    PRIMARY KEY (id)
);

CREATE TABLE resource_attributes
(
    id uuid NOT NULL,
    resourceId uuid NOT NULL,
    name character varying(128) NOT NULL,
    value character varying(1024) DEFAULT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_resource_attributes_resourceid_resources_id FOREIGN KEY (resourceId) REFERENCES resources (id)
);
In this case, I'd like to partition resources table.
If the inactive-to-active ratio is very high, a partial index is a good choice:
create index index_name on resources (isActive) where isActive = 1
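Queries restricted to active rows can then use the much smaller partial index, for example (a sketch, not from the original answer):
select id from resources where isActive = 1;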
The only workaround I can think of for creating a foreign key to a table that has multiple child tables is to create another table that holds just the primary keys (all of them, maintained by triggers) and point all foreign-key references at it, like:
+-----------+       +----------------+       +---------------------+
| resources |       | resource_uuids |       | resource_part_n     |
+===========+ 0   1 +================+ 1   0 +=====================+
| id        | ----> | id             | <---- | (id from resources) |
+-----------+       +----------------+       +---------------------+
| ...       |            ↑ 1                 | CHECK(...)          |
+-----------+            |                   +---------------------+
                         |                   | INHERITS(resources) |
+---------------------+  |                   +---------------------+
| resource_attributes |  |
+---------------------+  |
| resourceId          | -+ *
+---------------------+
| ...                 |
+---------------------+
But you still can't partition that table (resource_uuids), so I don't think partitioning will help you in this case.
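For concreteness, a rough sketch of that workaround (names follow the question; the synchronisation triggers are omitted): resource_uuids holds a copy of every id from the resources partitions, and the foreign keys point at it instead of at the partitioned table.
-- Sketch only: resource_uuids mirrors the ids of all resources partitions
-- and is the target of every foreign key; triggers (not shown) would have
-- to keep it in sync with inserts and deletes on the partitions.
CREATE TABLE resource_uuids
(
    id uuid NOT NULL,
    PRIMARY KEY (id)
);

-- resource_attributes now references resource_uuids instead of resources
CREATE TABLE resource_attributes
(
    id uuid NOT NULL,
    resourceId uuid NOT NULL,
    name character varying(128) NOT NULL,
    value character varying(1024) DEFAULT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_resource_attributes_resourceid_resource_uuids_id
        FOREIGN KEY (resourceId) REFERENCES resource_uuids (id)
);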