I have created a table in PostgreSQL that is partitioned (see here) by the received column. Let's use a toy example:
CREATE TABLE measurement (
received timestamp without time zone PRIMARY KEY,
city_id int not null,
peaktemp int,
unitsales int
);
I have created one partition for each month for several years (measurement_y2012m01 ... measurement_y2016m03).
I have noticed that PostgreSQL is not aware of the order of the partitions, so for a query like the one below:
select * from measurement where ... order by received desc limit 1000;
PostgreSQL performs an index scan over all partitions, even though it is very likely that the first 1000 results are located in the latest partition (or the latest two or three).
Do you have an idea how to take advantage of partitions for such a query? I want to emphasize that the WHERE clause may vary, and I don't want to hardcode it.
The first idea is to iterate partitions in a proper order until 1000 records are fetched or all partitions are visited. But how to implement it in a flexible way? I want to avoid implementing the aforementioned iteration in the application, but I don't mind if the app needs to call a stored procedure.
Thanks in advance for your help!
Grzegorz
If you really don't know how many partitions you need to scan to get your desired 1000 rows, you could build up the result set in a stored procedure, fetching results partition by partition until the limit is satisfied.
Starting with the most recent partition would be a wise thing to do.
select * from measurement_y2016m03 where ... order by received desc limit 1000;
You could store the intermediate result set in a record, count its rows, and change the limit dynamically for the next partition. For example, if you fetch 870 rows from the first partition, you build the second query with limit 130, count again afterwards, and move on to the next partition if the 1000-row condition is still not satisfied.
Why doesn't Postgres know when to stop during planning?
The planner is unaware of how many partitions are needed to satisfy your LIMIT clause. Thus, it has to order the entire set by appending results from each partition and then apply the limit (unless the condition is already satisfied at run time). The only way to do this in a single SQL statement would be to restrict the lookup to a few partitions, but that may not be an option for you. Also, increasing the work_mem setting may speed things up if you're hitting disk during lookups.
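The loop described above can be sketched outside the database as follows. This is a minimal illustration in Python rather than a stored procedure; `fetch_latest`, `fake_fetch`, and the toy partition contents are hypothetical names and data, and `fetch` stands in for running the per-partition query with the remaining limit.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def fetch_latest(
    partitions_newest_first: List[str],
    fetch: Callable[[str, int], List[Row]],
    limit: int = 1000,
) -> List[Row]:
    """Collect up to `limit` rows, visiting partitions from newest to oldest."""
    rows: List[Row] = []
    for partition in partitions_newest_first:
        remaining = limit - len(rows)
        if remaining <= 0:
            break  # limit reached, older partitions are never touched
        # fetch() stands in for:
        #   SELECT * FROM <partition> WHERE ... ORDER BY received DESC LIMIT <remaining>
        rows.extend(fetch(partition, remaining))
    return rows


# Toy data standing in for real partitions (hypothetical contents, 28 rows each).
fake_partitions = {
    "measurement_y2016m03": [{"received": f"2016-03-{d:02d}"} for d in range(28, 0, -1)],
    "measurement_y2016m02": [{"received": f"2016-02-{d:02d}"} for d in range(28, 0, -1)],
    "measurement_y2016m01": [{"received": f"2016-01-{d:02d}"} for d in range(28, 0, -1)],
}
visited: List[str] = []


def fake_fetch(partition: str, limit: int) -> List[Row]:
    visited.append(partition)  # record which partitions were actually queried
    return fake_partitions[partition][:limit]


result = fetch_latest(list(fake_partitions), fake_fetch, limit=40)
print(len(result), visited)
```

With a limit of 40, the sketch takes 28 rows from the newest partition, asks the next one for only 12, and never queries the oldest partition.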
Key note
Another thing to remember: when you set up your partitioning, order the partition conditions so that the most frequently accessed partitions are checked first. This speeds up your inserts, because Postgres checks the conditions one by one and stops at the first one that is satisfied.
Instead of iterating over the partitions, you could guess at the range of received that will satisfy your query and expand it until you get the desired number of rows. Adding the range to WHERE will exclude the unnecessary partitions (assuming constraint exclusion is enabled and the partitions have CHECK constraints on received).
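A minimal sketch of the guess-and-expand idea, in Python for illustration. The function name, the 30-day first guess, and the doubling step are assumptions, not part of the answer; `fetch_since` stands in for running the real query with an extra lower bound on received.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def fetch_with_growing_range(
    fetch_since: Callable[[datetime, int], List[datetime]],
    now: datetime,
    oldest: datetime,
    limit: int = 1000,
    first_guess: timedelta = timedelta(days=30),
) -> List[datetime]:
    """Widen the time window until `limit` rows are found or all data is covered."""
    window = first_guess
    while True:
        lower = max(now - window, oldest)
        # fetch_since() stands in for:
        #   SELECT * FROM measurement
        #   WHERE ... AND received >= <lower>
        #   ORDER BY received DESC LIMIT <limit>
        rows = fetch_since(lower, limit)
        if len(rows) >= limit or lower <= oldest:
            return rows
        window *= 2  # guess was too narrow, double it and retry


# Toy data: one row per day for 400 days, newest first (hypothetical).
now = datetime(2016, 3, 31)
data = [now - timedelta(days=i) for i in range(400)]
lower_bounds_tried: List[datetime] = []


def fake_fetch_since(lower: datetime, limit: int) -> List[datetime]:
    lower_bounds_tried.append(lower)
    return [t for t in data if t >= lower][:limit]


result = fetch_with_growing_range(fake_fetch_since, now, data[-1], limit=100)
print(len(result), len(lower_bounds_tried))
```

Each retry repeats the work of the previous attempt, so a first guess close to the real range keeps the number of round trips low.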
Edit
Correct, that's what I meant (could've phrased it better).
Simplicity seems like a pretty reasonable advantage. I don't see the performance being different, either way. This might actually be a little more efficient if you guess reasonably close to the desired range most of the time, but probably won't make a significant difference.
It's also a little more flexible, since you're not relying on the particular partitioning scheme in your query code.
Related
Consider the following situation:
I have a large PostgreSQL table with a primary key of type UUID. The UUIDs are generated randomly and spread uniformly across the UUID space.
I partition the table on this UUID column on 256 ranges (e.g. based on the first 8 bits of the UUID).
All partitions are stored on the same physical disk.
Basically this means all 256 partitions will be equally used (unlike with time-based partitioning, where the most recent partition would normally be hotter than the others).
Will I see any performance improvement at all by doing this type of partitioning:
For queries based on the UUID, returning a single row (WHERE uuid_key = :id)?
For other queries that must search all partitions?
Most queries will become slower. For example, if you search by uuid_key, the optimizer has to determine which partition to search, something that grows in expense with the number of partitions. The index scan itself will not be notably faster on a small table than on a big table.
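To make the scheme from the question concrete, here is a small Python sketch of routing a key by the first 8 bits of its UUID. The function name is hypothetical, and the sketch only shows the mapping from key to partition number, not how PostgreSQL selects a partition internally.

```python
import uuid
from collections import Counter


def partition_for(key: uuid.UUID) -> int:
    """Return the partition number (0..255) given by the first 8 bits of the UUID."""
    return key.bytes[0]


# Random (version 4) UUIDs keep their version and variant bits in later bytes,
# so the first byte is random and keys spread across all 256 partitions.
usage = Counter(partition_for(uuid.uuid4()) for _ in range(100_000))
print(len(usage), min(usage.values()), max(usage.values()))
```

Because every partition receives a similar share of the keys, none of them stays small or cold, which is why the usual benefits of time-based partitioning do not carry over.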
You could benefit if you have several tables partitioned alike and you join them on the partitioning key, so that you get a partitionwise join (but remember to set enable_partitionwise_join = on). There are similar speed gains for partitionwise aggregates.
Even though you cannot expect a performance gain for your query, partitioning may still have its use, for example if you need several autovacuum workers to process a single table.
Will I see any performance improvement at all by doing this type of partitioning:
For queries based on the UUID, returning a single row (WHERE uuid_key = :id)?
Yes: PostgreSQL will search only in the right partition. You can also gain performance on inserts and updates by reducing page contention.
For other queries that must search all partitions?
Not really, but index design can minimize the problem.
Consider this scenario.
You're a link shortening service, and you have two tables:
Links
Clicks - predominantly append only, but will need a full scan to produce aggregates, which should be (but probably won't be) quick.
Links is millions of rows, Clicks is billions of rows.
Should you split these onto separate hardware? What's the right approach to getting the most out of postgres for this sort of problem?
With partitioning, it should be scalable enough. Partition links on a hash of the shortened link (the key used for retrieval). Depending on your aggregation and reporting needs, you might partition clicks by date (maybe one partition per day?). When you create a new partition, the old one can be summed and moved to history (or removed, if the summed data is enough for your needs).
In addition to partitioning, I suggest pre-aggregating the data. If you never need the individual data, but only aggregates per day, then perform the aggregation and materialize it in another table after each day is over. That will reduce the amount considerably and make the data manageable.
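A minimal sketch of the daily pre-aggregation, in Python for illustration. The row shape `(short_code, clicked_at)` and the function name are assumptions; in practice the same step would run inside the database and write to a summary table.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

# Hypothetical row shape for the clicks table: (short_code, clicked_at).
Click = Tuple[str, datetime]


def aggregate_day(clicks: Iterable[Click], day: date) -> Dict[str, int]:
    """Count clicks per short link for one finished day."""
    counts = Counter(code for code, ts in clicks if ts.date() == day)
    return dict(counts)


clicks = [
    ("abc", datetime(2016, 3, 1, 9, 0)),
    ("abc", datetime(2016, 3, 1, 17, 30)),
    ("xyz", datetime(2016, 3, 1, 23, 59)),
    ("abc", datetime(2016, 3, 2, 0, 1)),  # belongs to the next day
]
daily = aggregate_day(clicks, date(2016, 3, 1))
print(daily)
```

Reports then read one small row per link and day instead of scanning the raw click rows.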
I have tried a single-node cluster and a 3-node cluster on my local machine to fetch 2.5 million entries from Cassandra using Spark, but in both scenarios it takes 30 seconds just for SELECT COUNT(*) from table. I need this and other similar counts for real-time analytics.
SparkSession.builder().getOrCreate().sql("SELECT COUNT(*) FROM data").show()
Cassandra isn't designed to iterate over the entire data set in a single expensive query like this. If there's 10 petabytes of data, for example, this query would require reading 10 petabytes off disk, bringing it into memory, and streaming it to the coordinator, which resolves the tombstones and deduplicates (you can't just have each replica send a count, or you will massively under- or over-count) and increments a counter. This is not going to work within a 5 second timeout. You can use aggregation functions over smaller chunks of the data, but not in a single query.
If you really want to make it work this way, query the system.size_estimates table on each node and split each range according to its size so that you get an approximate maximum of, say, 5k per read. Then issue a count(*) with a TOKEN restriction for each of the split ranges and combine the values of all those queries. This is how the Spark connector does its full table scans for SELECT * RDDs, so you just have to replicate that.
The easiest, and probably safer and more accurate (but less efficient), approach is to use Spark to read the entire data set and then count, without using an aggregation function.
How long does it take to run this query directly without Spark? I think that it is not possible to parallelize COUNT queries, so you won't benefit from using Spark for such queries.
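The split-and-sum idea can be sketched as below, in Python for illustration. This is a simplification under stated assumptions: it cuts the ring into equal-width pieces instead of sizing them from system.size_estimates, it assumes the Murmur3Partitioner token range of -2^63 to 2^63 - 1, and `count_range` is a hypothetical stand-in for one token-restricted count query.

```python
from typing import Callable, List, Tuple

# Assumed token bounds (Murmur3Partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ranges(n: int) -> List[Tuple[int, int]]:
    """Split the full ring into n contiguous (start, end] ranges of equal width."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def count_all(count_range: Callable[[int, int], int], n: int) -> int:
    """Sum the partial counts of all sub-ranges."""
    # count_range(lo, hi) stands in for:
    #   SELECT COUNT(*) FROM data WHERE token(pk) > <lo> AND token(pk) <= <hi>
    return sum(count_range(lo, hi) for lo, hi in split_token_ranges(n))


# Toy "table": 1001 made-up tokens spread over the ring.
tokens = [MIN_TOKEN + 1 + k * 12345678901234567 for k in range(1000)] + [MAX_TOKEN]


def fake_count_range(lo: int, hi: int) -> int:
    return sum(1 for t in tokens if lo < t <= hi)


total = count_all(fake_count_range, 16)
print(total)
```

Because the ranges are contiguous and half-open, each row is counted exactly once, and the individual queries are small enough to be issued in parallel.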
I have a collection of documents that looks like the following:
There is one document per VIN/SiteID and our access pattern is showing all documents
at a specific site. I see two potential partition keys we could choose from:
SiteID - We only have 75 sites, so the cardinality is not very high. Also, the documents are not very big, so the 10GB limit is probably OK.
SiteID/VIN - The data is now more evenly distributed, but that means each logical partition will only store one item. Is this an anti-pattern? Also, to support our access pattern we will need to use a cross-partition query. Again, the data set is small, so is this a problem?
Based on what I am describing, which partition key makes more sense?
Any other suggestions would be greatly appreciated!
Your first option makes a lot of sense and could be a good partition key but the words "probably OK" don't really breed confidence. Remember, the only way to change the partition key is to migrate to a new collection. If you can take that risk then SiteId (which I'm guessing you will always have) is a good partition key.
If you have both VIN and SiteId when you are reading or querying, then this is the safer combination. There is no problem, per se, with each logical partition storing one item. It's only a problem when you are doing cross-partition queries. If you know both VIN and SiteId in your queries, then it's a great plan.
You also have to remember that your RUs are evenly split between your partitions inside a collection.
I am researching the possibility of using the secondary index feature in Cassandra using Aquiles. I know that for the primary index (key), I must use OrderPreservingPartitioner in order to query. At first, I thought that with secondary indexes there is no such limitation, but I noticed that the start key is part of GetIndexedSlicesCommand. Does that imply that under RandomPartitioner this command is unusable?
You don't need OrderPreservingPartitioner to query by row key; it's only needed if you want to get a meaningful range of rows by their key, like 'all rows with a key between 5 and 9'. (Note that you can and should almost always use RandomPartitioner instead.)
The start key for get_indexed_slices behaves the same way that it does for get_range_slices. That is, it's not very meaningful for examining a range of rows between two keys when using RandomPartitioner, but it is useful for paging through a lot of rows. There's even a FAQ entry on the topic. Basically, if you're going to get a ton of results from a call to get_indexed_slices, you don't want to fetch them all at once, you want to get a chunk (of 10, 100, or 1000, depending on size) at a time, and then set the start_key to the last key you saw in the previous chunk to get the next chunk.
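The paging pattern can be sketched like this, in Python for illustration. `get_slice` is a hypothetical stand-in for a get_indexed_slices call that returns matching row keys in partitioner order, and the sketch assumes the start key is inclusive, so every page after the first begins with the last key already seen.

```python
import hashlib
from typing import Callable, Iterator, List


def page_through(
    get_slice: Callable[[str, int], List[str]],
    chunk_size: int = 100,
) -> Iterator[str]:
    """Yield every matching row key, fetching chunk_size keys per call."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 to make progress")
    start_key = ""  # empty start key: begin at the start of the ring
    first_page = True
    while True:
        keys = get_slice(start_key, chunk_size)
        # Assumed inclusive start key: skip the repeated first row on later pages.
        for key in keys if first_page else keys[1:]:
            yield key
        if len(keys) < chunk_size:
            return  # short page, nothing left
        start_key = keys[-1]
        first_page = False


# Toy ring: keys ordered by MD5 hash, similar in spirit to RandomPartitioner.
ring = sorted((f"row{i}" for i in range(25)),
              key=lambda k: hashlib.md5(k.encode()).hexdigest())
calls: List[str] = []


def fake_get_slice(start_key: str, count: int) -> List[str]:
    calls.append(start_key)
    start = ring.index(start_key) if start_key else 0
    return ring[start:start + count]


seen = list(page_through(fake_get_slice, chunk_size=10))
print(len(seen), len(calls))
```

The order of the keys is meaningless under RandomPartitioner, but it is stable, which is all that paging needs.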