Criteria and strategy for partitioning a large table in Postgres - postgresql

We are looking into migrating our app to a multi-tenant database. Currently, the app runs with one database per tenant. There are currently around 400 tenants. When combined, the largest table would have around 1 billion rows and would grow as tenants are added. Size by tenant varies wildly, with one tenant alone having 180 million records in that table, some having less than a million. There are a few other tables in the hundred millions, most tables would have much less. My main concerns revolve around planning for scalability for the large tables, and I'll focus on the largest one. The parameters for it are that it's a linking/many-to-many table with basic audit fields for created by and created date (though am questioning if those are even necessary for this one). Date/time is not relevant to this, this is an assignment table and applies at all times. Records can get deleted or inserted, not updated, some times in bulk, probably not frequently but can happen at any time. Data cardinality would be relatively high on both foreign keys I think, though I'm not sure what constitutes high cardinality as a ratio to total number of records. For some perspective, the tenant with 180 million records has around 100,000 distinct records for one foreign key and 165,000 for the other. Meanwhile, another client has around 180,000 records, with 500 distinct values in one field and 5000 in the other. So as I said, a lot of variability.
Would the kind of table I described above (billions of rows, high data cardinality, not time based, tenant segmented, bulk insert/deletes at any time) in the kind of scenario I described (400+ tenants with varying amounts of data) be a good candidate for partitioning? The reason I'm concerned about this now is that I've read in a number of places that partitioning is something that can be much less painful to deal with if you plan for it ahead of time rather than try to partition later after the table is huge and harder to work with without requiring down-time or jumping through hoops. At this point, my main concern is not so much querying the data, I tested with a table with 1 billion records and with a proper index select queries run very fast. I'm more worried about concurrency with the read/write/delete, running into blocking because of locks, etc. If partitioning is warranted, what would a good strategy be? Partition by tenant? Just partition large ones and keep smaller ones bundled together?

Given that you said that query performance is not an issue, the only reason I can think of to consider partitioning is to make mass purging easier to accomplish.
Do you have contractual or legal retention policies in place?
The most common scenario would be using time periods as your partition key so that rolling-off old data is simply a matter of dropping partitions, but since you clearly state that date/time is not relevant, I do not see how that would help.
Is it common for you to roll-on/roll-off individual customers? Is there a purging or retention requirement? If so, then partitioning by customer, no matter how imbalanced the partitions would be, would make sense since you could purge a large customer's data without affecting other customers' access to their data.
As for any concurrency issues, partitioning by customer should help contain these problems within a specific customer that is showing heavy activity.
I recommend testing this thoroughly for a few reasons:
I have not seen multiple active partitions in action because I have worked only with time series partitions
I have not looked deeply into PostgreSQL 12's foreign key enhancements and wonder whether a foreign key with a partitioned table on both sides would complicate dropping parititons
I have never explored the practical limits of the number of partitions a database could contain
I may be reading things from my experience into your question about partitioning, but have you considered a schema per customer?

Related

What are settings to lookout for with Citus PostgresQL

We are looking into using CitusDB. After reading all the documentation we are not clear on some fundamentals. Hoping somebody can give some directions.
In Citus you specify a shard_count and a shard_max_size, these settings are set on the coordinator according to the docs (but weirdly can also be set on a node).
What happens when you specify 1000 shards and distribute 10 tables with 100 clients?
Does it create a shard for every table (users_1, users_2, shops_1, etc.) (so effectively using all 1000 shards.
If you would grow with another 100 clients, we already hit the 1000 limit, how are these tables partitioned?
The shard_max_size defaults to 1Gb. If a shard is > 1Gb a new shard is created, but what happens when the shard_count is already hit?
Lastly, is it advisible to go for 3000 shards? We read in the docs 128 is adviced for a saas. But this seams low if you have 100 clients * 10 tables. (I know it depends.. but..)
Former Citus/current Microsoft employee here, chiming in with some advice.
Citus shards are based on integer hash ranges of the distribution key. When a row is inserted, the value of the distribution key is hashed, the planner looks up what shard was assigned the range of hash values that that key falls into, then looks up what worker the shard lives on, and then runs the insert on that worker. This means that the customers are divided up across shards in a roughly even fashion, and when you add a new customer it'll just go into an existing shard.
It is critically important that all distributed tables that you wish to join to each other have the same number of shards and that their distribution columns have the same type. This lets us perform joins entirely on workers, which is awesome for performance.
If you've got a super big customer (100x as much data as your average customer is a decent heuristic), I'd use the tenant isolation features in advance to give them their own shard. This will make moving them to dedicated hardware much easier if you decide to do so down the road.
The shard_max_size setting has no effect on hash distributed tables. Shards will grow without limit as you keep inserting data, and hash-distributed tables will never increase their shard count under normal operations. This setting only applies to append distribution, which is pretty rarely used these days (I can think of one or two companies using it, but that's about it).
I'd strongly advise against changing the citus.shard_count to 3000 for the use case you've described. 64 or 128 is probably correct, and I'd consider 256 if you're looking at >100TB of data. It's perfectly fine if you end up having thousands of distributed tables and each one has 128 shards, but it's better to keep the number of shards per table reasonable.

Do you need to add an index on a partitioned table (postgres 11)?

My team is looking at moving our non partitioned table with ~1TB of data over to a partitioned table.
We would be using range partitioning based on a timestamp column.
One thing I don't understand is whether we need to add an index on the timestamp column if it's being used as the partition key. If we make our partitions quite small (e.g. partition for every day), would this act in a similar way to an index?
We would only be doing queries on a maximum resolution of one day.
I am reluctant to add an index as we've tried this in the past and it never completed (probably because we didn't turn off writes. Not really an option to turn off writes for an extended period).
Your feeling is right: omitting the index on the partitioning column is one of the few places where partitioning actually makes queries faster.
You can then get away with a sequential scan of a single partition, and you don't have to maintain the index with every data modifying statement.
The other advantage is that partitioning makes mass deletion of data (along the partition boundaries) so much more efficient. And finally, autovacuum's job will become easier.
Two points about partitioning:
Upgrade to v12; there have been substantial performance improvements that concern partitioning.
Don't use too many partitions. With v12, you can probably go up to a few thousand, in earlier versions you will get performance problems earlier on.

Postgres partitioning?

My software runs a cronjob every 30 minutes, which pulls data from Google Analytics / Social networks and inserts the results into a Postgres DB.
The data looks like this:
url text NOT NULL,
rangeStart timestamp NOT NULL,
rangeEnd timestamp NOT NULL,
createdAt timestamp DEFAULT now() NOT NULL,
...
(various integer columns)
Since one query returns 10 000+ items, it's obviously not a good idea to store this data in a single table. At this rate, the cronjob will generate about 480 000 records a day and about 14.5 million a month.
I think the solution would be using several tables, for example I could use a specific table to store data generated in a given month: stats_2015_09, stats_2015_10, stats_2015_11 etc.
I know Postgres supports table partitioning. However, I'm new to this concept, so I'm not sure what's the best way to do this. Do I need partitioning in this case, or should I just create these tables manually? Or maybe there is a better solution?
The data will be queried later in various ways, and those queries are expected to run fast.
EDIT:
If I end up with 12-14 tables, each storing 10-20 millions rows, Postgres should be still able to run select statements quickly, right? Inserts don't have to be super fast.
Partitioning is a good idea under various circumstances. Two that come to mind are:
Your queries have a WHERE clause that can be readily mapped onto one or a handful of partitions.
You want a speedy way to delete historical data (dropping a partition is faster than deleting records).
Without knowledge of the types of queries that you want to run, it is difficult to say if partitioning is a good idea.
I think I can say that splitting the data into different tables is a bad idea because it is a maintenance nightmare:
You can't have foreign key references into the table.
Queries spanning multiple tables are cumbersome, so simple questions are hard to answer.
Maintaining tables becomes a nightmare (adding/removing a column).
Permissions have to be carefully maintained, if you have users with different roles.
In any case, the place to start is with Postgres's documentation on partitioning, which is here. I should note that Postgres's implementation is a bit more awkward than in other databases, so you might want to review the documentation for MySQL or SQL Server to get an idea of what it is doing.
Firstly, I would like to challenge the premise of your question:
Since one query returns 10 000+ items, it's obviously not a good idea to store this data in a single table.
As far as I know, there is no fundamental reason why the database would not cope fine with a single table of many millions of rows. At the extreme, if you created a table with no indexes, and simply appended rows to it, Postgres could simply carry on writing these rows to disk until you ran out of storage space. (There may be other limits internally, I'm not sure; but if so, they're big.)
The problems only come when you try to do something with that data, and the exact problems - and therefore exact solutions - depend on what you do.
If you want to regularly delete all rows which were inserted more than a fixed timescale ago, you could partition the data on the createdAt column. The DELETE would then become a very efficient DROP TABLE, and all INSERTs would be routed through a trigger to the "current" partition (or could even by-pass it if your import script was aware of the partition naming scheme). SELECTs, however, would probably not be able to specify a range of createAt values in their WHERE clause, and would thus need to query all partitions and combine the results. The more partitions you keep around at a time, the less efficient this would be.
Alternatively, you might examine the workload on the table and see that all queries either already do, or easily can, explicitly state a rangeStart value. In that case, you could partition on rangeStart, and the query planner would be able to eliminate all but one or a few partitions when planning each SELECT query. INSERTs would need to be routed through a trigger to the appropriate table, and maintenance operations (such as deleting old data that is no longer needed) would be much less efficient.
Or perhaps you know that once rangeEnd becomes "too old" you will no longer need the data, and can get both benefits: partition by rangeEnd, ensure all your SELECT queries explicitly mention rangeEnd, and drop partitions containing data you are no longer interested in.
To borrow Linus Torvald's terminology from git, the "plumbing" for partitioning is built into Postgres in the form of table inheritance, as documented here, but there is little in the way of "porcelain" other than examples in the manual. However, there is a very good extension called pg_partman which provides functions for managing partition sets based on either IDs or date ranges; it's well worth reading through the documentation to understand the different modes of operation. In my case, none quite matched, but forking that extension was significantly easier than writing everything from scratch.
Remember that partitioning does not come free, and if there is no obvious candidate for a column to partition by based on the kind of considerations above, you may actually be better off leaving the data in one table, and considering other optimisation strategies. For instance, partial indexes (CREATE INDEX ... WHERE) might be able to handle the most commonly queried subset of rows; perhaps combined with "covering indexes", where Postgres can return the query results directly from the index without reference to the main table structure ("index-only scans").

Billions rows in PostgreSql: partition or not to partition?

What i have:
Simple server with one xeon with 8 logic cores, 16 gb ram, mdadm raid1 of 2x 7200rpm drives.
PostgreSql
A lot of data to work with. Up to 30 millions of rows are being imported per day.
Time - complex queries can be executed up to an hour
Simplified schema of table, that will be very big:
id| integer | not null default nextval('table_id_seq'::regclass)
url_id | integer | not null
domain_id | integer | not null
position | integer | not null
The problem with the schema above is that I don't have the exact answer on how to partition it.
Data for all periods is going to be used (NO queries will have date filters).
I thought about partitioning on "domain_id" field, but the problem is that it is hard to predict how many rows each partition will have.
My main question is:
Does is make sense to partition data if i don't use partition pruning and i am not going to delete old data?
What will be pros/cons of that ?
How will degrade my import speed, if i won't do partitioning?
Another question related to normalization:
Should url be exported to another table?
Pros of normalization
Table is going to have rows with average size of 20-30 bytes.
Joins on "url_id" are supposed to be much faster than on "url" field
Pros of denormalization
Data can be imported much, much faster, as i don't have to make lookup into "url" table before each insert.
Can anybody give me any advice? Thanks!
Partitioning is most useful if you are going to either have selection criteria in most queries which allow the planner to skip access to most of the partitions most of the time, or if you want to periodically purge all rows that are assigned to a partition, or both. (Dropping a table is a very fast way to delete a large number of rows!) I have heard of people hitting a threshold where partitioning helped keep indexes shallower, and therefore boost performance; but really that gets back to the first point, because you effectively move the first level of the index tree to another place -- it still has to happen.
On the face of it, it doesn't sound like partitioning will help.
Normalization, on the other hand, may improve performance more than you expect; by keeping all those rows narrower, you can get more of them into each page, reducing overall disk access. I would do proper 3rd normal form normalization, and only deviate from that based on evidence that it would help. If you see a performance problem while you still have disk space for a second copy of the data, try creating a denormalized table and seeing how performance is compared to the normalized version.
I think it makes sense, depending on your use cases. I don't know how far back in time your 30B row history goes, but it makes sense to partition if your transactional database doesn't need more than a few of the partitions you decide on.
For example, partitioning by month makes perfect sense if you only query for two months' worth of data at a time. The other ten months of the year can be moved into a reporting warehouse, keeping the transactional store smaller.
There are restrictions on the fields you can use in the partition. You'll have to be careful with those.
Get a performance baseline, do your partition, and remeasure to check for performance impacts.
With the given amount of data in mind, you'll be waiting on IO mostly. If possible, perform some tests with different HW configurations trying to get best IO figures for your scenarios. IMHO, 2 disks will not be enough after a while, unless there's something else behind the scenes.
Your table will be growing daily with a known ratio. And most likely it will be queried daily. As you haven't mentioned data being purged out (if it will be, then do partition it), this means that queries will run slower each day. At some point in time you'll start looking at how to optimize your queries. One of the possibilities is to parallelize query on the application level. But here some conditions should be met:
your table should be partitioned in order to parallelize queries;
HW should be capable of delivering the requested amount of IO in N parallel streams.
All answers should be given by the performance tests of different setups.
And as others mentioned, there're more benefits for DBA in partitioned tables, so I, personally, would go for partitioning any table that is expected to receive more then 5M rows per interval, be it day, week or month.

Don't more than a few dozen partitions make sense?

I store time-series simulation results in PostgreSQL.
The db schema is like this.
table SimulationInfo (
simulation_id integer primary key,
simulation_property1,
simulation_property2,
....
)
table SimulationResult ( // The size of one row would be around 100 bytes
simulation_id integer,
res_date Date,
res_value1,
res_value2,
...
res_value9,
primary key (simulation_id, res_date)
)
I usually query data based on simulation_id and res_date.
I partitioned the SimulationResult table into 200 sub-tables based on the range value of simulation_id. A fully filled sub table has 10 ~ 15 millions rows. Currently about 70 sub-tables are fully filled, and the database size is more than 100 gb. The total 200 sub tables would be filled soon, and when it happens, I need to add more sub tables.
But I read this answers, which says more than a few dozen partitions does not make sense. So my questions are like below.
more than a few dozen partitions not make sense? why?
I checked the execution plan on my 200 sub-tables, and it scan only the relevant sub-table. So i guessed more partitions with smaller each sub-table must be better.
if number of partitions should be limited, like 50, then is it no problem to have billions rows in one table? How big one table can be without big problem given the schema like mine?
It's probably unwise to have that many partitions, yes. The main reason to have partitions at all is not to make indexed queries faster (which they are not, for the most part), but to improve performance for queries that have to sequentially scan the table based on constraints that can be proved to not hold for some of the partitions; and to improve maintenance operations (like vacuum, or deleting large batches of old data which can be achieved by truncating a partition in certain setups, and such).
Maybe instead of using ranges of simulation_id (which means you need more and more partitions all the time), you could partition using a hash of it. That way all partitions grow at a similar rate, and there's a fixed number of partitions.
The problem with too many partitions is that the system is not prepared to deal with locking too many objects, for example. Maybe 200 work fine, but it won't scale well when you reach a thousand and beyond (which doesn't sound that unlikely given your description).
There's no problem with having billions of rows per partition.
All that said, there are obviously particular concerns that apply to each scenario. It all depends on the queries you're going to run, and what you plan to do with the data long-term (i.e. are you going to keep it all, archive it, delete the oldest, ...?)