If a Postgres DB has unique IDs across its tables, how do you find a row using its ID without knowing its table? - postgresql

Following the blog of Rob Conery I have set of unique IDs across the tables of my Postgres DB.
Now, using these unique IDs, is there a way to query a row on the DB without knowing what table it is in? Or can those tables be indexed such that if the row is not available on the current table, I just increase the index and I can query to the next table?

In short - if you did not prepared for that - then no. You can prepare for that by generating your own uuid. Please look here. For instance PG has uuid that preserve order. Also uuid v5 has something like namespaces. So you can build hierarchy. However that is done by hashing namespace, and I don't know tool to do opposite inside PG.

If you know all possible tables in advance you could prepare a query that simply UNIONs a search with a tagged type over all tables. In case of two tables named comments and news you could do something like:
PREPARE type_of_id(uuid) AS
SELECT id, 'comments' AS type
FROM comments
WHERE id = $1
UNION
SELECT id, 'news' AS type
FROM news
WHERE id = $1;
EXECUTE type_of_id('8ecf6bb1-02d1-4c04-8875-f1da62b7f720');
Automatically generating this could probably be done by querying pg_catalog.pg_tables and generating the relevant query on the fly.

Related

Querying from parent table's data (postgresql)

I have a parent table (parent_table) and few children tables inherit from it (child_one_table and child_two_table).
I want to query data using columns belong to the parent table only (the data itself is inserted to the children tables), when I run an explain on my query I see that there are sequence scans running on all the tables (parent_table, child_one_table and child_two_table).
Is there a more efficient way to do this? When I try to query using ONLY on parent_table I get back 0 result since the data was inserted to children table.
Thanks in advance!
Thanks for clarifying your question!
When I query the data for SELECT * FROM parent_table WHERE name = 'joe'; I see that it actually go over all 3 tables and query for that.
This seems to be standard behaviour from PostgreSQL. According to the 5.10.1. Caveats section of https://www.postgresql.org/docs/current/ddl-inherit.html
Note that not all SQL commands are able to work on inheritance hierarchies. Commands that are used for data querying, data modification, or schema modification (e.g., SELECT, UPDATE, DELETE, most variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically default to including child tables and support the ONLY notation to exclude them.
So really what you want to do is explore the usage of ONLY to specify the scope of your query.
I hope this can help!

Best performance method for getting records by large collection of IDs

I am writing a query with code to select all records from a table where a column value is contained in a CSV. I found a suggestion that the best way to do this was using ARRAY functionality in PostgresQL.
I have a table price_mapping and it has a primary key of id and a column customer_id of type bigint.
I want to return all records that have a customer ID in the array I will generate from csv.
I tried this:
select * from price_mapping
where ARRAY[customer_id] <# ARRAY[5,7,10]::bigint[]
(the 5,7,10 part would actually be a csv inserted by my app)
But I am not sure that is right. In application the array could contain 10's of thousands of IDs so want to make sure I am doing right with best performance method.
Is this the right way in PostgreSQL to retrieve large collection of records by pre-defined column value?
Thanks
Generally this is done with the SQL standard in operator.
select *
from price_mapping
where customer_id in (5,7,10)
I don't see any reason using ARRAY would be faster. It might be slower given it has to build arrays, though it might have been optimized.
In the past this was more optimal:
select *
from price_mapping
where customer_id = ANY(VALUES (5), (7), (10)
But new-ish versions of Postgres should optimize this for you.
Passing in tens of thousands of IDs might run up against a query size limit either in Postgres or your database driver, so you may wish to batch this a few thousand at a time.
As for the best performance, the answer is to not search for tens of thousands of IDs. Find something which relates them together, index that column, and search by that.
If your data is big enough, try this:
Read your CSV using a FDW (foreign data wrapper)
If you need this connection often, you might build a materialized view from it, holding only needed columns. Refresh this when new CSV is created.
Join your table against this foreign table or materialized viev.

Inserting into postgres database

I am working trying to write an insert query into a backup database. I writing place and entities tables into this database. The issue is entities is linked to place via place.id column. I added a column place.original_id in the place table to store it's original 'id'. so now that i entered place into the new database it's id column changed but i have the original id stored so I can still link entities table to it. I am trying to figure out how to write entities to get the new id
so far i am at this point:
insert into entities_backup (id, place_id)
select
nextval('public.entities_backup_id_seq'),
(select id from places where original_id = (select place_id from entities) as place_id
from
entities
I know I am missing something because this does not work. I need to grab the id column from places when entity.place_id = places.original_id. Any help would be great.
I think this is what you want
insert into entities_backup (id, place_id)
select nextval('public.entities_backup_id_seq'), places.id
from places, entities
where places.original_id = entities.place_id;
I am working trying to write an insert query into a backup database. I writing place and entities tables into this database. The issue is entities is linked to place via place.id column. I added a column place.original_id in the place table to store it's original 'id'. so now that i entered place into the new database it's id column changed but i have the original id stored so I can still link entities table to it.
It would be simpler to not have this problem in the first place.
Rather than trying to fix this up after the fact, the better solution is to dump and load places and entities complete with their primary and foreign keys intact. Oracle's EXPORT or a utility such as ora2pg should be able to do it.
Sorry I can't say more. I know Postgres, not Oracle.

Insert data from staging table into multiple, related tables?

I'm working on an application that imports data from Access to SQL Server 2008. Currently, I'm using a stored procedure to import the data individually by record. I can't go with a bulk insert or anything like that because the data is inserted into two related tables...I have a bunch of fields that go into the Account table (first name, last name, etc.) and three fields that will each have a record in an Insurance table, linked back to the Account table by the auto-incrementing AccountID that's selected with SCOPE_IDENTITY in the stored procedure.
Performance isn't very good due to the number of round trips to the database from the application. For this and some other reasons I'm planning to instead use a staging table and import the data from there. Reading up on my options for approaching this, a cursor that executes the same insert stored procedure on the data in the staging table would make sense. However it appears that cursors are evil incarnate and should be avoided.
Is there any way to insert data into one table, retrieve the auto-generated IDs, then insert data for the same records into another table using the corresponding ID, in a set-based operation? Or is a cursor my only option here?
Look at the OUTPUT clause. You should be able to add it to your INSERT statement to do what you want.
BTW, if you need to output columns into the second table that weren't inserted into the first one, then use MERGE instead of INSERT (as suggested in the comment to the original question) as its OUTPUT clause supports referencing other columns from the source table(s). Otherwise, keeping it with an INSERT is more straightforward, and it does give you access to the inserted identity column.
I'm having experiment to worked out in inserting multiple record into related table using databinding. So, try this!
Hopefully this is very helpful. Follow this link How to insert record into related tables. for more information.

Optimize getting counts of rows grouped by first letter in SQLite?

My current query looks something like this:
SELECT SUBSTR(name,1,1), COUNT(*) FROM files GROUP BY SUBSTR(name,1,1)
But it's taking a pretty long time just to do counts on a table that's already indexed by the name column. I saw from this question that some engines might not use indexes correctly for the SUBSTR function, and in fact, sqlite will not use indexes for SUBSTR(string,1,1).
Is there any other approach that would utilize the index and net me some faster queries?
One strategy that is consistent with your access pattern is to add a new indexed column "first_letter" to your table. Use a trigger on to set the value on insert and update. Then your query is a simple group by first_letter.
Another strategy is to create a shadow table which contains an aggregation of the mother table. This isn't easy because it is your job as developer to keep the shadow table consistent with the mother table. Every delete, update or insert in table files needs to be accompanied by a change in the shadow table.
Databases like Oracle have support for materialized views to achieve this automatically but sqlite doesn't.