I have an application that records activity in a table (Oracle 10g). The logging records should be kept for at least 30 days. I expect about 20 million rows to be added to this table every month.
The DBA suggested that the table be split in partitions containing one week of data. The weekly maintenance script would then delete the oldest partition (leaving only 4 weeks of data in the table).
What would be the best way of partitioning this logging table?
Partitioning a table isn't hard - it appears that you will be removing the data on a weekly basis, so the partition clauses will look like
PARTITION "P2009_45" VALUES LESS THAN
(TO_DATE(' 2009-11-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
PARTITION "P2009_46" VALUES LESS THAN
(TO_DATE(' 2009-11-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
... etc
where your partitioning column is your date column of interest in the table.
Additional comments:
If you can upgrade to 11g you can
take advantage of interval
partitioning, which is similar to
this range partitioning, but Oracle
will manage creating new partitions
for you.
If you're going to routinely drop off
partitions, I would advise making all
indexes on the table
locally-partitioned to avoid the
rebuilds that would be necessary with
global partitions after partition
operations.
If you have a good idea of the number
of log entries per month, and it
stays relatively constant, you might
consider using a sequence (as a primary key) that is
capped at this number and then
recycles back to 0. Then your
logging statements must become "MERGE
INTO... " statements that either
create a new row or overwrite the row
if it exists. This only guarantees
that you'll retain the number of rows
allowed by the sequence max value and
NOT a certain time interval, but this
might be an alternative to
partitioning (which as DvE points
out, is an extra-expense option)
The most likely partitioning scheme would be to range-partition your data on the creation date. Each week you would create a new partition and drop the oldest one. The impact will depend on how this table is used / indexed.
Since it is a logging table perhaps it is not indexed, in that case dropping a partition will have little impact: referencing objects won't be invalidated, the drop will be just require a partition lock (and the oldest partition shouldn't be inserted at that time).
If the table is indexed, you will have to decide if your indexes will be global or partitionned. Global indexes will have to be rebuilt when you drop a partition (which takes time, although 20M rows is still manageable). You can use the UPDATE GLOBAL INDEXES clause to keep the indexes valid after the partition drop.
Local indexes will be partitionned like the table and may be less efficient than global indexes (index range scans will have to scan each local index instead of a common index if you do not query by date). These indexes won't have to be updated after a partition drop.
20 million rows every month and you only have to keep 30 days of data? (That's about a months worth).
Even with 12 months worth of data it wouldn't be hard to query this table (as one big table) with the correct index.
Inserting is no problemen either with 1 row in the logging table or 20 million.
Partitioning in Oracle is also a feature that needs to be paid for if I'm correct, so it's costly too (if you don't have a license already).
Related
What is the difference between a BRIN index and a table partition in PostgreSQL? When I should use one instead of another? It seems that they provide very similar benefits and also have similar use cases
Example
Suppose we have the following table structure
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
store_id INT,
client_id INT,
created_at timestamp,
information jsonb
)
that has the following characteristics:
orders can only be inserted, deletions are not allowed and updates are very rare and they don't involve the created_at column
the created_at column contains the timestamp of the insertion of the row in the database thus the values in the column are strictly increasing
almost every query use the created_at column in a condition and some of them may use the store_id and client_id columns
the most accessed rows are the most recent ones in terms of the created_at column
some queries may return a few records (example: analyzing a single record or the records created in a small time interval) while others may scan a vast amount of records (example: aggregate functions for a dashboard functionality)
I have chosen this example because it's very common and also both approach could be used (in my opinion). In this case which choice should I use between a BRIN index on the whole table or a partitioned table with maybe a btree index (or just a simple btree index without partitioning)? Does the table dimension influence the choice?
I have used both features (although I'll caveat that my experience with partitioning is from back when you had to use inheritance + constraints, before the introduction of CREATE TABLE ... PARTITION BY). You are correct that they seem similar-ish on a surface level, but they function by completely different mechanisms.
Table partitioning basically works as follows: replace all references to table with (select * from table_partition1 union all select * from table_partition2 /* repeat for all partitions */). The partitions will have a constraint on the partition columns, so that if those columns appear in a WHERE, the constraints can be applied up-front to prune which partitions are actually scanned. IOW, if table_partition1 has CHECK(client_id=1), and your WHERE Has client_id=2, table_partition1 will be skipped since the table constraint automatically excludes all rows from this partition from passing that WHERE.
BRIN indexes, in contrast, choose a block size for the table, and then for each block, records a min/max bound of the indexed column. This allows WHERE conditions to skip entire blocks when we can see, say, that the maximum created_at in a particular block of rows is below a created_at>={some_value} clause in your WHERE.
I can't tell you a definitive answer for your case as to which would work better. Well, that's not true, actually: the definitive answer is, "benchmark it for your own data" ;)
This is kind of fuzzy, but my general feeling is that BRIN is lightweight, and table partitioning is not. BRIN is something that can be added on to an existing table without much trouble, the indexes themselves are very small, and the impact on writes is not major (at least, not without inordinately many indices). Table partitioning, on the other hand, is a different way of representing the data on-disk; you are actually determining into which data files particular rows will be written. This requires a much more involved migration process when introducing it to an existing dataset.
However, the set of query optimizations available for table partitioning is much greater. Not only is there the constraint exclusion I described above, but you can also have indices (even BRIN ones!) on each individual partition. Of course, you can also have BRIN + other indices on a single-big-table, but I'm not sure that is particularly helpful IRL.
A few other thoughts: BRIN is good for monotonic data (timestamps, incremnting IDs, etc); the more correlated the on-disk ordering is to the indexed value, the more effective a BRIN index can be at pruning blocks to be scanned. Things like customer IDs, however, are unlikely to work well with BRIN; any given block of rows is likely to have at least one relatively low and relatively high ID. However, fields that like work quite well for partitioning: a partition-per-client, or partitioning on the modulus of a customer ID (which would more commonly be called sharding), is a good way of scaling horizontally, almost without bound.
Any update, even if it does not change the indexed column, will make a BRIN index pretty useless (unless it is a HOT update). Even without that, there are differences, for example:
partitioning allows you to get rid of lots of data efficiently, a BRIN index won't
a partitioned table allows one autovacuum worker per partition, which improves autovacuum performance
But if your only concern is to efficiently select all rows for a certain value of the index or partitioning key, both may offer about the same benefit.
We have a table partitioned on a date column.
Some of my colleagues believe that this means that column is automatically indexed. Having looked for evidence of this I don't believe that is so. Who is right?
The manual https://www.postgresql.org/docs/current/ddl-partitioning.html (section 5.11.2.1. Example) says:
Create an index on the key column(s), as well as any other indexes you
might want, on the partitioned table. (The key index is not strictly
necessary, but in most scenarios it is helpful.) This automatically
creates a matching index on each partition, and any partitions you
create or attach later will also have such an index. An index or
unique constraint declared on a partitioned table is “virtual” in the
same way that the partitioned table is: the actual data is in child
indexes on the individual partition tables.
This suggests to me we should create the index.
Each partition has ~350K rows. Since we often query by date range on that column would each partition get its own index? Or one massive one across all partitions?
Would adding an index on this column improve or degrade performance?
There is not automatically an index on the partition column.
If you did list partitioning and every list only contains one date (i.e. every date has its own partition) then I don't think also having an index on that column would be helpful. There is not extra information in the column beyond what the partitioning already knows about.
If you did range partitioning on quarter or year, but often query by a specific date, then the index would likely be useful as it provides a lot of extra specificity.
We have a table with nearly 2 billion events recorded. As per our data model, each event is uniquely identified with 4 columns combined primary key. Excluding the primary key, there are 5 B-tree indexes each on single different columns. So totally 6 B-tree indexes.
The events recorded span for years and now we need to remove the data older than 1 year.
We have a time column with long values recorded for each event. And we use the following query,
delete from events where ctid = any ( array (select ctid from events where time < 1517423400000 limit 10000) )
Does the indices gets updated?
During testing, it didn't.
After insertion,
total_table_size - 27893760
table_size - 7659520
index_size - 20209664
After deletion,
total_table_size - 20226048
table_size - 0
index_size - 20209664
Reindex can be done
Command: REINDEX
Description: rebuild indexes
Syntax:
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name [ FORCE ]
Considering #a_horse_with_no_name method is the good solution.
What we had:
Postgres version 9.4.
1 table with 2 billion rows with 21 columns (all bigint) and 5 columns combined primary key and 5 individual column indices with date spanning 2 years.
It looks similar to time-series data with a time column containing UNIX timestamp except that its analytics project, so time is not at an ordered increase. The table was insert and select only (most select queries use aggregate functions).
What we need: Our data span is 6 months and need to remove the old data.
What we did (with less knowledge on Postgres internals):
Delete rows at 10000 batch rate.
At inital, the delete was so fast taking ms, as the bloat increased each batch delete increased to nearly 10s. Then autovacuum got triggered and it ran for almost 3 months. The insert rate was high and each batch delete has increased the WAL size too. Poor stats in the table made the current queries so slow that they ran for minutes and hours.
So we decided to go for Partitioning. Using Table Inheritance in 9.4, we implemented it.
Note: Postgres has Declarative Partitioning from version 10, which handles most manual work needed in partitioning using Table Inheritance.
Please go through the official docs as they have clear explanation.
Simplified and how we implemented it:
Create parent table
Create child table inheriting it with check constraints. (We had monthly partitions and created using schedular)
Indexes are need to be created separately for each child table
To drop old data, just drop the table, so vacuum is not needed and will be instant.
Make sure to have the postgres property constraint_exclusion to partition.
VACUUM ANALYZE the old partition after started inserting in the new partition. (In our case, it helped the query planner to use Index-Only scan instead of Seq. scan)
Using Triggers as mentioned in the docs may make the inserts slower, so we deviated from it, as we partitioned based on time column, we calculated the table name at application level based on time value before every insert and it didn't affect the insert rate for us.
Also read other caveats mentioned there.
I have a table that is initially partitioned by day. At the end of every day no more records will be added to that partition, so I cluster the index and then I then do a lot of number crunching and aggregation on that table (using the index I clustered):
CLUSTER table_a_20181104 USING table_a_20181104_index1;
After a few days (typically a week) I merge the partition for one day into a larger partition that contains all the days data for that month. I use this SQL to achieve this:
WITH moved_rows AS
(
DELETE FROM table_a_20181104
RETURNING *
)
INSERT INTO table_a_201811
SELECT * FROM moved_rows;
After maybe a month or too I change the tablespace to move the data from an SSD disk to a conventional magnetic hard disk.
ALTER TABLE ... SET TABLESPACE ...
My initial clustering of the index at the end of the day definitely improves the performance of the queries run against it.
I know that clustering is a one-off command and needs to be repeated if new records are added/removed.
My questions are:
Do I need to repeat the clustering after merging the 'day' partition into the 'month' partition?
Do I need to repeat the clustering after altering the tablespace?
Do I need to repeat the clustering if I VACUUM the partition?
Moving the data from one partition to the other will destroy the clustering, so you'll need to re-cluster after it.
ALTER TABLE ... SET TABLESPACE will just copy the table files as they are, so clustering will be preserved.
VACUUM does not move the rows, so clustering will also be preserved.
Redshift documentation identifies time-series tables as a best practice:
http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html
However, it doesn't address any of the following issues:
how many tables within a union-all view is reasonable - hundreds? (unanswered)
any method of writing to the union-all view and having redshift direct those inserts to the correct underlying tables? (Answer: no)
most effective method of loading underlying tables? Perhaps using firehose to insert into a staging table then periodically inserting those rows into appropriate table within union-all view? (unanswered)
any way to enable redshift to eliminate some underlying partitions (tables) when querying the union-all view if their date range is outside of a query's criteria? (Answer: No)
can redshift support dropping old tables, adding new tables and rebuilding union-all view within a transaction? (unanswered)
My situation:
100 million rows added daily, which will grow to 500 million in 3 years
12 month retention desired
Estimated 99% of all queries will hit the most recent 1-7 days
Data is written to existing table via kinesis firehose to s3 which then triggers a copy to redshift table.
My proposed solution:
Create a year of daily tables with a union all view, along with a dist_key of sensor_id (100,000+ uniq values) and a sort_key of (timestamp, sensor_id).
Have firehose load into staging table
Create separate process that once an hour queries staging table to discover dates of data within table, then performs an insert into 'appropriate table' select * from where timestamp = table's timestamp.
This hourly writer can probably wrap a table rename, multiple insert-selects, and table recreate in a transaction to be invisible to firehose.
Once a month drop old tables, create next month of tables, and rebuild view.
This union-all view maintenance can probably be wrapped in a transaction to avoid impacts to users.
Once a night run the vacuum analyzer.
EDITS: added notes identifying which issues have been answered, and added some detail to the proposed solution.
Your proposed process sounds quite good! While I can't answer all your questions, here is some information:
Any method of writing to the union-all view and having redshift direct those inserts to the correct underlying tables?
Views are read-only. It is not possible to write to a view, nor is it possible to insert data while expecting Redshift to send it to an appropriate table (eg a specific table for the given day).
Any way to enable redshift to eliminate some underlying partitions (tables) when querying the union-all view if their date range is outside of a query's criteria?
Redshift will not exclude specific tables from the query, but it will avoid reading particular disk blocks through the use of Zone Maps. Each block of data written to disk is associated with a specific table and column. The block has a Zone Map, which indicates the minimum and maximum values of that field stored within the block.
If a query includes a WHERE clause, Redshift can skip blocks that do not contain relevant data. This is particularly powerful when used on the SORTKEY column, since similar ranges of data are grouped together.
Given that you are using a date as the SORTKEY, Redshift will read very few disk blocks if the query includes a WHERE clause based on that column. This is very similar to the idea of skipping tables, but it actually skips reading disk blocks.