How to enable index-sequential files in Postgres

I am writing an application backed by a Postgres database.
The application is a kind of logging system; the main table looks like this:
create table if not exists logs
(
    user_id bigint not null,
    log bytea not null,
    -- timestamptz already stores an absolute instant (UTC internally), so no AT TIME ZONE is needed
    timestamp timestamptz not null default clock_timestamp()
);
One of the main queries fetches all logs for a given user_id, ordered by timestamp desc. It would be nice if, under the hood, Postgres stored all rows for the same user_id in one page or in sequential pages, instead of scattering them across the disk.
As I recall from textbooks, this is the so-called "index-sequential file" organization. Is that right, and how can I guide Postgres to do that?

The simple thing to do is to create a B-tree index to speed up the search:
CREATE INDEX logs_user_time_idx ON logs (user_id, timestamp);
That would speed up the query, but take extra space on the disk and slow down all INSERT operations on the table (the index has to be maintained). There is no free lunch!
I assume that you were talking about that when you mentioned "index-sequential files". But perhaps you meant what is called a clustered index or index-organized table, which essentially keeps the table itself in a certain order. That can speed up searches like that even more. However, PostgreSQL does not have that feature.
The best you can do to make disk access more efficient in PostgreSQL is to run the CLUSTER command, which rewrites the table in index order:
CLUSTER logs USING logs_user_time_idx;
But be warned:
That statement rewrites the whole table, so it can take a long time. While it runs it holds an ACCESS EXCLUSIVE lock, so the table can be neither read nor written.
Subsequent INSERTs won't maintain the order in the table, so it "rots" over time, and after a while you will have to CLUSTER the table again.
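To see why the physical row order matters, here is a toy model in Python. It is not PostgreSQL code: a table is just a list of rows cut into fixed-size pages, and the sketch counts how many pages hold rows for one user before and after sorting by (user_id, timestamp), which is roughly what CLUSTER does. The page size, the row counts, and the function name pages_touched are illustrative assumptions.

```python
# Toy model of heap pages; not PostgreSQL code. All sizes are made-up assumptions.
ROWS_PER_PAGE = 100


def pages_touched(rows, user_id, rows_per_page=ROWS_PER_PAGE):
    """Return how many distinct pages contain at least one row for user_id."""
    return len({pos // rows_per_page
                for pos, (uid, _ts) in enumerate(rows)
                if uid == user_id})


# Insertion order: 100 users take turns logging, so each user's rows are scattered.
heap = [(i % 100, i) for i in range(10_000)]

# Roughly what CLUSTER does: rewrite the rows in index order (user_id, timestamp).
clustered = sorted(heap)

print(pages_touched(heap, 7))       # 100: every page holds one row of user 7
print(pages_touched(clustered, 7))  # 1: all 100 rows of user 7 share one page
```

In this model, rows appended after the sort land at the end of the list again, which is the "rot" described above.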

Related

PostgreSQL query using index scan backward instead of index only scan

I'm running into something I cannot explain. I have been googling for a few days and have not yet found why the PostgreSQL query planner picks a plan that makes a (relatively simple) query take a massive amount of time.
Let's start from the top (I've tried to remove as much useless information as possible so the tables may look pointless but trust me, they're not):
I have the following schema:
CREATE TABLE ct_log (
    ID integer,
    CONSTRAINT ctl_pk
        PRIMARY KEY (ID)
);
CREATE TABLE ct_log_entry (
    CERTIFICATE_ID bigint NOT NULL,
    ENTRY_ID bigint NOT NULL,
    ENTRY_TIMESTAMP timestamp NOT NULL,
    CT_LOG_ID integer NOT NULL,
    CONSTRAINT ctle_ctl_fk
        FOREIGN KEY (CT_LOG_ID)
        REFERENCES ct_log(ID)
) PARTITION BY RANGE (ENTRY_TIMESTAMP);
-- I will not repeat this one 7 times, but there are partitions for each year from 2013-2020:
CREATE TABLE ct_log_entry_2020 PARTITION OF ct_log_entry
    FOR VALUES FROM ('2020-01-01T00:00:00'::timestamp) TO ('2021-01-01T00:00:00'::timestamp);
CREATE INDEX ctle_c ON ct_log_entry (CERTIFICATE_ID);
CREATE INDEX ctle_e ON ct_log_entry (ENTRY_ID);
CREATE INDEX ctle_t ON ct_log_entry (ENTRY_TIMESTAMP);
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);
(in case you are curious about the full schema: https://github.com/crtsh/certwatch_db/blob/master/sql/create_schema.sql)
And this is the query I am trying to run:
SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
    SELECT coalesce(max(entry_id), -1) entry_id
    FROM ct_log_entry ctle
    WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;
For people who know https://crt.sh this might look familiar, because this is indeed the schema from crt.sh. That makes it a bit more interesting, since crt.sh provides public PostgreSQL access, which allows me to compare query plans between my own server and theirs.
My server query plan (~700s): https://explain.depesz.com/s/ZKkt
Public crt.sh query plan (~3ms): https://explain.depesz.com/s/01Ht
This difference is quite noticeable, but I'm not sure why it exists: as far as I know I have the correct indexes for this to be very fast, and they are the same indexes as on the crt.sh server.
It looks like my instance is using a backward index scan instead of an index-only scan for the two largest partitions. This was not always the case: previously it executed using the same query plan as the crt.sh instance, but for some reason it stopped doing that.
(This is the amount of data in those tables in case it's not clear from the query plans: https://d.bouma.dev/wUjdXJXk1OzF. I cannot see how much is in the crt.sh database because they don't provide access to the individual partitions)
Now on to the list of things I've tried:
ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM FULL the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
Dropping the ctle_le index and recreating it again (this worked once for me giving me a few hours of great performance until I imported more data and it went with the backwards scan again)
REINDEX INDEX the ctle_le index on each ct_log_entry_* table
SET random_page_cost = x;, tried 1, 1.1, 4 and 5 (according to many SO answers and blog posts)
The only difference I notice is that crt.sh is running PostgreSQL 12.1 and I'm running 12.3, but as far as I can tell that shouldn't have any impact.
Also before you say, "yes well, but you cannot run this amount of data on your laptop", the server I'm running is a dedicated box with 32 available threads and 128GB RAM and running a RAID 5 with 8 2TB Samsung EVO 860 drives on hardware RAID (yes I know this is bad if a drive fails, that's another issue I'll deal with later but the read performance should be excellent). I don't know what crt.sh is running for hardware but since I only have a fraction of the data imported I don't see my hardware being the issue here (yet).
I've also "tuned" my config using the guide here: https://pgtune.leopard.in.ua/#/.
Happy to provide more info where needed, but I'm hoping someone can point me to a flaw and/or provide a solution that shows PostgreSQL how to use the optimal path!

Redshift query a daily-generated table

I am looking for a way to create a Redshift query that will retrieve data from a table that is generated daily. Tables in our cluster are of the form:
event_table_2016_06_14
event_table_2016_06_13
.. and so on.
I have tried writing a query that appends the current date to the table name, but this does not seem to work correctly (invalid operation):
SELECT * FROM concat('event_table_', to_char(getdate(),'YYYY_MM_DD'))
Any suggestions on how this can be performed are greatly appreciated!
"I have tried writing a query that appends the current date to the table name, but this does not seem to work correctly (invalid operation):"
Redshift does not support that. But you most likely won't need it.
Try the following (expanding on the answer from @ketan):
Create your main table with appropriate (for joins) DIST key, and COMPOUND or simple SORT KEY on timestamp column, and proper compression on columns.
Daily, create a temp table (use CREATE TABLE ... LIKE - this will preserve DIST/SORT keys), load it with daily data, VACUUM SORT.
Copy sorted temp table into main table using ALTER TABLE APPEND - this will copy the data sorted, and will reduce VACUUM on the main table. You may still need VACUUM SORT after that.
After that, query your main table normally, probably giving it a range on the timestamp. Redshift is optimised for these scenarios, and 99% of the time you don't need to optimise table scans yourself: even on tables with billions of rows, scans take milliseconds to a few seconds. You may need to optimise elsewhere, but that's the second step.
To get insight in the performance of scans, use STL_QUERY system table to find your query ID, and then use STL_SCAN (or SVL_QUERY_SUMMARY) table to see how fast the scan was.
Your example is actually the main use case for ALTER TABLE APPEND.
I am assuming that you are creating a new table every day.
What you can do is:
1. Create a view on top of the event_table_* tables. Query your data using this view.
2. Whenever you create or drop a table, update the view.
3. If you want, you can avoid #2: instead of creating a new table every day, create empty tables for the next 1-2 years, so there is no need to update the view every day. However, do remember that there is an upper limit of 9,900 tables in Redshift.
Edit: If you always need to query today's table (instead of all tables, as I assumed originally), I don't think you can do that without updating your view.
However, you can modify your design to have just one table, with date as sort-key. So, whenever your table is queried with some date, all disk blocks that don't have that date will be skipped. That'll be as efficient as having time-series tables.
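If the query is issued from application code, the table name can be assembled on the client before the statement is sent, and the same helper can regenerate the view definition whenever a daily table is added. The following is a minimal sketch under that assumption; the function names and the view name event_table_all are hypothetical, and only the event_table_YYYY_MM_DD pattern comes from the question.

```python
from datetime import date, timedelta


def daily_table_name(day, prefix="event_table_"):
    """Return the per-day table name, e.g. event_table_2016_06_14."""
    return f"{prefix}{day:%Y_%m_%d}"


def union_view_sql(first_day, n_days, view_name="event_table_all"):
    """Build DDL for a view that UNION ALLs n_days consecutive daily tables."""
    tables = [daily_table_name(first_day + timedelta(days=i)) for i in range(n_days)]
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{selects};"


# Query text for one specific day's table.
print(f"SELECT * FROM {daily_table_name(date(2016, 6, 14))};")
# View definition covering two consecutive days.
print(union_view_sql(date(2016, 6, 13), 2))
```

The names are built only from a date object, never from user input, so nothing untrusted ends up in the SQL text. Pass date.today() to target the current day's table.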

Postgresql table with one ID column, sorted index, with duplicate primary key

I want to use a PostgreSQL table as a kind of work queue for documents. Each document has an ID and is stored in another, normal table with lots of additional columns. But this question is about creating the table for the work queue.
I want to create a table for this queue without OIDs with just one column: The ID of the document as integer. If an ID of a document exists in this work queue table, it means that the document with that ID is dirty and some processing has to be done.
The extra table shall avoid the VACUUM and dead tuple problems and deadlocks with transactions that would emerge if there was just a dirty bit on each document entry in the main document table.
Many parts of my system would mark documents as dirty and therefore insert IDs to process into that table. These inserts would be for many IDs in one transaction. I don't want to use any kind of nested transactions and there doesn't seem to be any kind of INSERT IF NOT EXISTS command. I'd rather have duplicate IDs in the table. Therefore duplicates must be possible for the only column in that table.
The process which processes the work queue will delete all processed IDs and therefore take care of duplicates. (BTW: There is another queue for the next step, so regarding race conditions the idea should be clean and have no problem)
But also I want the documents to be processed in order: Always shall documents with smaller IDs be processed first.
Therefore I want to have an index which aids LIMIT and ORDER BY on the ID column, the only column in the workqueue table.
Ideally given that I have only one column, this should be the primary key. But the primary key must not have duplicates, so it seems I can't do that.
Without the index, ORDER BY and LIMIT would be slow.
I could add a normal, secondary index on that column. But I fear PostgreSQL would add a second file on disc (PostgreSQL does that for every additional index) and use the double amount of disc operations for that table.
What is the best thing to do?
Add a dummy column with something random (like the OID) in order to make the primary key not complain about duplicates? Must I waste that space in my queue table?
Or is adding the second index harmless, would it become kind of the primary index which is directly in the primary tuple btree?
Shall I delete everything above this and just leave the following? The original question is distracting and contains too much unrelated information.
I want to have a table in PostgreSQL with these properties:
One column with an integer
Allow duplicates
Efficient ORDER BY+LIMIT on the column
INSERTs should not do any query in that table or any kind of unique index. INSERTs shall just locate the best page for the main file/main btree for this table and just insert the row in between two other rows, ordered by ID.
INSERTs will happen in bulk and must not fail, except for disc full, etc.
There shall not be additional btree files for this table, so no secondary indexes
The rows should occupy not much space, e.g. have no OIDs
I cannot think of a solution that solves all of this.
My only solution would compromise on the last bullet point: Add a PRIMARY KEY covering the integer and also a dummy column, like OIDs, a timestamp or a SERIAL.
Another solution would either use a hypothetical INSERT IF NOT EXISTS, or nested transaction or a special INSERT with a WHERE. All these solutions would add a query of the btree when inserting.
Also they might cause deadlocks.
(Also posted here: https://dba.stackexchange.com/q/45126/7788)
You said
"Many parts of my system would mark documents as dirty and therefore insert IDs to process into that table. Therefore duplicates must be possible."
and
"5 rows with the same ID mean the same thing as 1 or 10 rows with that same ID: They mean that the document with that ID is dirty."
You don't need duplicates for that. If the only purpose of this table is to identify dirty documents, a single row containing the document's id number is sufficient. There's no compelling reason to allow duplicates.
A single row for each ID number is not sufficient if you need to track which process inserted that row, or order rows by the time they were inserted, but a single column isn't sufficient for that in the first place. So I'm sure a primary key constraint or unique constraint would work fine for you.
Other processes have to ignore duplicate key errors, but that's simple. Those processes have to trap errors anyway--there are a lot of things besides a duplicate key that can prevent an insert statement from succeeding.
An implementation that allows duplicates . . .
create table dirty_documents (
document_id integer not null
);
create index on dirty_documents (document_id);
Insert 100k ID numbers into that table for testing. This will necessarily require updating the index. (Duh.) Include a bunch of duplicates.
insert into dirty_documents
select generate_series(1,100000);
insert into dirty_documents
select generate_series(1, 100);
insert into dirty_documents
select generate_series(1, 50);
insert into dirty_documents
select generate_series(88000, 93245);
insert into dirty_documents
select generate_series(83000, 87245);
Took less than a second on my desktop, which isn't anything special, and which is running three different database servers, two web servers, and playing a Rammstein CD.
Pick the first dirty document ID number for cleaning up.
select min(document_id)
from dirty_documents;
document_id
--
1
Took only 0.136 ms. Now let's delete every row that has document ID 1.
delete from dirty_documents
where document_id = 1;
Took 0.272 ms.
Let's start over.
drop table dirty_documents;
create table dirty_documents (
document_id integer primary key
);
insert into dirty_documents
select generate_series(1,100000);
Took 500 ms. Let's find the first one again.
select min(document_id)
from dirty_documents;
Took .054 ms. That's about half the time it took using a table that allowed duplicates.
delete from dirty_documents
where document_id = 1;
Also took .054 ms. That's roughly 5 times faster than the other table.
Let's start over again, and try an unindexed table.
drop table dirty_documents;
create table dirty_documents (
document_id integer not null
);
insert into dirty_documents
select generate_series(1,100000);
insert into dirty_documents
select generate_series(1, 100);
insert into dirty_documents
select generate_series(1, 50);
insert into dirty_documents
select generate_series(88000, 93245);
insert into dirty_documents
select generate_series(83000, 87245);
Get the first document.
select min(document_id)
from dirty_documents;
Took 32.5 ms. Delete those documents . . .
delete from dirty_documents
where document_id = 1;
Took 12 ms.
All of this took me 12 minutes. (I used a stopwatch.) If you want to know what performance will be, build tables and write tests.
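The primary-key variant, with inserters that simply skip IDs already queued, can be sketched as follows. SQLite (in memory, via Python's standard library) is used here only as a stand-in so the example runs anywhere; its INSERT OR IGNORE plays the role of "ignore duplicate key errors". In PostgreSQL 9.5 and later the comparable statement is INSERT ... ON CONFLICT DO NOTHING. The helper names mark_dirty and take_next are hypothetical, and the sketch is single-process, so it says nothing about concurrent workers.

```python
import sqlite3

# SQLite (in memory) is only a stand-in so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dirty_documents (document_id INTEGER PRIMARY KEY)")


def mark_dirty(conn, ids):
    """Insert ids, silently skipping any that are already queued."""
    conn.executemany(
        "INSERT OR IGNORE INTO dirty_documents (document_id) VALUES (?)",
        [(i,) for i in ids],
    )


def take_next(conn):
    """Remove and return the smallest queued id, or None if the queue is empty."""
    row = conn.execute("SELECT min(document_id) FROM dirty_documents").fetchone()
    if row[0] is None:
        return None
    conn.execute("DELETE FROM dirty_documents WHERE document_id = ?", (row[0],))
    return row[0]


mark_dirty(conn, range(1, 101))
mark_dirty(conn, range(1, 51))  # all duplicates, all ignored
print(conn.execute("SELECT count(*) FROM dirty_documents").fetchone()[0])  # 100
print(take_next(conn))  # 1
```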
Reading between the lines, I think you're trying to implement a work-queueing system.
Stop. Now.
Work queueing is hard. Work queueing in a relational DBMS is very hard. Most of the "clever" solutions people come up with end up serializing work on a lock without their authors realising it, or they have nasty bugs in concurrent operation.
Use an existing message/task queueing system. ZeroMQ, RabbitMQ, PGQ, etc etc etc etc. There are lots to choose from and they have the significant advantages of (a) working and (b) being efficient. You'll most likely need to run an external helper process or server, but the limitations of the relational database model tend to make that necessary.
The scheme you seem to be envisioning, as best as I can guess, sounds like it'll suffer from hopeless concurrency problems when it comes to failure handling, insert/delete races, etc. Really, do not try to design this yourself, especially when you don't have a really good grasp of the underlying concurrency and performance issues.

Query rows by time of creation?

I have a table that contains no date or time related fields. Still I want to query that table based on when records/rows were created. Is there a way to do this in PostgreSQL?
I prefer an answer about doing it in PostgreSQL directly. But if that's not possible, can Hibernate do it for PostgreSQL?
Basically: no. There is no automatic timestamp for rows in PostgreSQL.
I usually add a column like this to my tables (ignoring time zones):
ALTER TABLE tbl ADD COLUMN log_in timestamp DEFAULT localtimestamp NOT NULL;
As long as you don't manipulate the values in that column, you have your creation timestamp. You can add a trigger and / or restrict write privileges to avoid tampering with the values.
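The effect of such a defaulted column can be tried out without a PostgreSQL server. The sketch below uses SQLite through Python's standard library purely as a stand-in: its DEFAULT CURRENT_TIMESTAMP plays the role of DEFAULT localtimestamp (SQLite records UTC, not local time), and the payload column is a hypothetical addition for illustration.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# log_in is filled in by the database whenever an INSERT omits it.
conn.execute(
    "CREATE TABLE tbl ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT NOT NULL,"
    " log_in TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO tbl (payload) VALUES ('first')")
conn.execute("INSERT INTO tbl (payload) VALUES ('second')")

# Rows can now be queried and ordered by creation time.
rows = conn.execute(
    "SELECT payload, log_in FROM tbl ORDER BY log_in, id"
).fetchall()
for payload, log_in in rows:
    # SQLite renders CURRENT_TIMESTAMP as 'YYYY-MM-DD HH:MM:SS' in UTC.
    print(payload, datetime.strptime(log_in, "%Y-%m-%d %H:%M:%S"))
```

Rows that existed before the column was added only get the default value assigned at that moment, so their real creation time stays unknown.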
Second class options
If you have a serial column, you could at least tell with some probability in what order rows were entered. That's not 100% reliable, because the values can be changed by hand, and applications can get values from the sequence and INSERT out of order.
If you created your table WITH (OIDS=TRUE), then the OID column could be some indication, unless your database is heavily written and / or very old; then you may have gone through OID wrap-around, and later rows can have a smaller OID. That's one of the reasons why this feature is hardly used any more (PostgreSQL 12 removed WITH OIDS altogether).
The default depends on the setting of default_with_oids. I quote the manual:
"The parameter is off by default; in PostgreSQL 8.0 and earlier, it was on by default."
If you have not updated your rows, gone through a dump / restore cycle, or run VACUUM FULL, CLUSTER or the like, a plain SELECT * FROM tbl returns all rows in the order they were entered. But this is very unreliable and implementation-dependent. PostgreSQL (like any RDBMS) does not guarantee any order without an ORDER BY clause.

Delete Takes a Long Time

I've got a table which has about 5.5 million records. I need to delete some records from it based on date. My query looks like this:
DELETE FROM Table WHERE [Date] between '2011-10-31 04:30:23' and '2011-11-01 04:30:42'
That is about 9000 rows, but the operation takes a very long time. How can I speed it up? Date is of type datetime2, and the table has a clustered int primary key. Update and delete triggers are disabled.
It's very possible that [Date] is being cast to a string on every row resulting in a sequential scan of the entire table.
You should try casting your parameters to the column's type instead:
DELETE FROM Table WHERE [Date] between convert(datetime2, '2011-10-31 04:30:23') and convert(datetime2, '2011-11-01 04:30:42')
Also, make sure there's an index on [Date].
Firstly make sure you have an index on date.
If there is an index check the execution plan and make sure it is using it. Notice that it doesn't always follow that using an index is the most efficient method of processing a delete because if you are deleting a large proportion of records (rule of thumb is in excess of 10%) the additional overhead of the index look-up can be greater than a full scan.
With a large table it's also well worth making sure that the statistics are up to date (run sp_updatestats) because if the database has an incorrect understanding of the number of rows in the table it will make inappropriate choices in its execution plan. For example if the statistics are incorrect the database may decide to ignore your index even if it exists because it thinks there are far fewer records in the table than there are. Odd distributions of dates might have similar effects.
I'd probably try dropping the index on date and then recreating it. Indexes are B-trees; they stay balanced, but if your data has accumulated over time their pages may well have become fragmented, and queries might take a long time to find the appropriate data. Both this and the statistics issue should be handled automatically by your database maintenance job, but that is often overlooked.
Finally, you don't say whether there are many other indexes on the table. If there are, you might be running into issues with the database having to maintain every one of those indexes as it processes the delete. It's a bit drastic, but one option is to drop all other indexes on the table before running the delete, then create them again afterwards.