Generate non-fragmenting UUIDs in Postgres? - postgresql

If I understand correctly, fully-random UUID values create fragmented indexes. Or, more precisely, the lack of a common prefix prevents dense trie storage in the indexes.
I've seen a suggestion to use uuid_generate_v1() or uuid_generate_v1mc() instead of uuid_generate_v4() to avoid this problem.
However, it seems that version 1 of the UUID spec puts the low bits of the timestamp first, preventing a shared prefix. Also, the timestamp is 60 bits, which seems like it may be overkill.
By contrast, some databases provide non-standard UUID generators with a timestamp in the leading 32 bits followed by 12 bytes of randomness. See Datomic's squuids, for example (1, 2).
Does it in fact make sense to use "squuids" like this in Postgres? If so, how can I generate such IDs efficiently with PL/pgSQL?

Note that inserting sequential index entries will result in a denser index only if you don't delete values and all your updates produce heap-only tuples (HOT).
If you want sequential unique index values, why not build them yourself?
You could use clock_timestamp() in microseconds as bigint and append values from a cycling sequence:
CREATE SEQUENCE seq MINVALUE 0 MAXVALUE 999 CYCLE;

SELECT CAST(floor(EXTRACT(epoch FROM t)) AS bigint) % 1000000 * 1000000000
       + CAST(to_char(t, 'US') AS bigint) * 1000
       + nextval('seq')
FROM   (SELECT clock_timestamp()) clock(t);
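If an actual uuid value with a leading timestamp is wanted instead (as described in the question), here is a minimal sketch of a squuid-style generator; the function name gen_squuid is made up for this example, and md5() over random() merely stands in for proper randomness:

-- Sketch: epoch seconds (32 bits) as the leading 8 hex digits, followed by
-- 96 bits derived from md5() as a stand-in for real randomness.
-- Only valid until 2106, when epoch seconds no longer fit in 32 bits.
CREATE OR REPLACE FUNCTION gen_squuid()
  RETURNS uuid
  LANGUAGE sql VOLATILE AS
$$
SELECT (lpad(to_hex(floor(extract(epoch FROM clock_timestamp()))::bigint), 8, '0')
        || substr(md5(random()::text || clock_timestamp()::text), 1, 24)
       )::uuid
$$;

SELECT gen_squuid();  -- the first 8 hex digits encode the current epoch seconds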

Unnest vs just having every row needed in table

I have a choice in how a data table is created and am wondering which approach is more performant:
1. Making a table with a row for every data point
2. Making a table with an array column that will allow repeated content to be unnested
That is, if I have the data:
day | val1 | val2
----+------+-----
Mon |    7 |   11
Tue |    7 |   11
Wed |    8 |    9
Thu |    1 |    4
Is it better to enter the data as shown above, or instead like this:
day       | val1 | val2
----------+------+-----
(Mon,Tue) |    7 |   11
(Wed)     |    8 |    9
(Thu)     |    1 |    4
And then use unnest() to explode those into unique rows when I need them?
Assume that we're talking about large data in reality - 100k rows of data generated every day x 20 columns. Using the array would greatly reduce the number of rows in the table but I'm concerned that unnest would be less performant than just having all of the rows.
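For reference, the unnest() variant being weighed would look something like this (a sketch; the table and column names wide_data and days are assumptions for illustration):

-- Explode the array variant back into one row per day.
-- Assumes a table wide_data(days text[], val1 int, val2 int).
SELECT unnest(days) AS day, val1, val2
FROM   wide_data;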
I believe making a table with a row for every data point is the option I would go for, as unnest() for large amounts of data would be just as slow, if not slower. Plus, unless your data is highly repetitive, 20 columns is a lot to align.
"100k rows of data generated every day x 20 columns"
And:
"the array would greatly reduce the number of rows" - so lots of duplicates.
Based on this I would suggest a third option:
Create a table with your 20 columns of data and add a surrogate bigint PK to it. To enforce uniqueness across all 20 columns, add a generated hash and make it UNIQUE. I suggest a custom function for the purpose:
-- hash function
CREATE OR REPLACE FUNCTION public.f_uniq_hash20(col1 text, col2 text, ... , col20 text)
  RETURNS uuid
  LANGUAGE sql IMMUTABLE COST 30 PARALLEL SAFE AS
'SELECT md5(textin(record_out(($1,$2, ... ,$20))))::uuid';

-- data table
CREATE TABLE data (
  data_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, col1      text
, col2      text
, ...
, col20     text
, uniq_hash uuid GENERATED ALWAYS AS (public.f_uniq_hash20(col1, col2, ... , col20)) STORED
, CONSTRAINT data_uniq_hash_uni UNIQUE (uniq_hash)
);

-- reference data_id in next table
CREATE TABLE day_data (
  day     text
, data_id bigint REFERENCES data ON UPDATE CASCADE  -- FK to enforce referential integrity
, PRIMARY KEY (day, data_id)  -- must be unique?
);
db<>fiddle here
With only text columns, the function is actually IMMUTABLE (which we need!). For other data types (like timestamptz) it would not be.
In-depth explanation in this closely related answer:
Why doesn't my UNIQUE constraint trigger?
You could use uniq_hash as PK directly, but for many references, a bigint is more efficient (8 vs. 16 bytes).
About generated columns:
Computed / calculated / virtual / derived columns in PostgreSQL
Basic technique to avoid duplicates while inserting new data:
INSERT INTO data (col1, col2) VALUES
('foo', 'bam')
ON CONFLICT DO NOTHING
RETURNING *;
If there can be concurrent writes, see:
How to use RETURNING with ON CONFLICT in PostgreSQL?
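One common workaround is to combine the insert with a fallback SELECT in a single statement. Here is a sketch for a simplified two-column table d with a unique constraint; the real 20-column table would look up the row via uniq_hash instead:

-- Sketch: return the row's data_id whether it was just inserted or already existed.
-- Assumes d(data_id bigint GENERATED ALWAYS AS IDENTITY, col1 text, col2 text)
-- with a unique constraint that triggers the conflict.
-- Corner cases under heavy concurrency are discussed in the linked answer.
WITH ins AS (
   INSERT INTO d (col1, col2)
   VALUES ('foo', 'bam')
   ON CONFLICT DO NOTHING
   RETURNING data_id
)
SELECT data_id FROM ins
UNION ALL
SELECT data_id FROM d WHERE col1 = 'foo' AND col2 = 'bam'
LIMIT  1;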

Predict partition number for Postgres hash partitioning

I'm writing an app which uses partitions in a Postgres DB. It will be shipped to customers and run on their servers, which implies that I have to be prepared for many different scenarios.
Let's start with a simple table schema:
CREATE TABLE dir (
    id        SERIAL,
    volume_id BIGINT,
    path      TEXT
);
I want to partition that table by the volume_id column.
What I would like to achieve:
- a limited number of partitions (right now it's 500, but I will be tweaking this parameter later)
- do not create all partitions at once; add them only when they are needed
- support volume_ids up to 100K
- [NICE TO HAVE] being able, as a human, to calculate the partition number from a volume_id
The solution that I have right now:
- partition by LIST
- each partition handles volume_id % 500, like this:
CREATE TABLE dir_part_1 PARTITION OF dir FOR VALUES IN (1, 501, 1001, 1501, ..., 9501);
This works great because I can create a partition when it's needed, and I know exactly which partition a given volume_id belongs to. But I have to declare the numbers manually, and I cannot support high volume_ids because the speed of insert statements decreases drastically (by more than 2x).
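(As an aside, the value list does not have to be typed by hand. A sketch along these lines could generate the DDL for one remainder; the upper bound of 100000 mirrors the stated limit and is an assumption of this example:)

-- Generate the CREATE TABLE ... FOR VALUES IN (...) statement for
-- remainder 1 of 500, covering volume_ids up to 100000.
SELECT format(
         'CREATE TABLE dir_part_1 PARTITION OF dir FOR VALUES IN (%s);',
         string_agg(v::text, ', ' ORDER BY v)
       )
FROM   generate_series(1, 100000, 500) AS g(v);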
It looks like I could try HASH partitioning, but my biggest concern is that I would have to create all partitions at the very beginning, while I would like to create them dynamically when they are needed, because planning time increases significantly (up to 5 seconds for 500 partitions). For example, I know that I will be adding rows with volume_id = 5. How can I tell which partition I should create?
I was able to force Postgres to use a dummy hash function by adding a custom hash operator class for the partitioned table.
CREATE OR REPLACE FUNCTION partition_custom_bigint_hash(value BIGINT, seed BIGINT)
RETURNS BIGINT AS $$
    -- this number is UINT64CONST(0x49a0f4dd15e5a8e3) from
    -- https://github.com/postgres/postgres/blob/REL_13_STABLE/src/include/common/hashfn.h#L83
    SELECT value - 5305509591434766563;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

CREATE OPERATOR CLASS partition_custom_bigint_hash_op
    FOR TYPE int8
    USING hash AS
    OPERATOR 1 =,
    FUNCTION 2 partition_custom_bigint_hash(BIGINT, BIGINT);
Now you can declare a partitioned table like this (note that the partition key must reference the custom operator class, otherwise the default int8 hash opclass would be used):
CREATE TABLE some_table (
    id           SERIAL,
    partition_id BIGINT,
    value        TEXT
) PARTITION BY HASH (partition_id partition_custom_bigint_hash_op);

CREATE TABLE some_table_part_2 PARTITION OF some_table
    FOR VALUES WITH (modulus 3, remainder 2);
Now you can safely assume that all rows with partition_id % 3 = 2 will land in the some_table_part_2 partition. So if you are sure which values you will receive in the partition_id column, you can create only the required partitions.
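To double-check where a row actually ends up, you can inspect tableoid; a small sketch, assuming the objects above exist and the remainder-2 partition is the right target for the inserted value:

-- Insert a test row and report which partition received it.
INSERT INTO some_table (partition_id, value) VALUES (5, 'test');

SELECT tableoid::regclass AS partition, partition_id, value
FROM   some_table
WHERE  partition_id = 5;
-- expected with the dummy hash above: some_table_part_2, since 5 % 3 = 2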
DISCLAIMER 1: Unfortunately this will not work correctly right now (Postgres 13.1) because of bug #16840
DISCLAIMER 2: There is no point in using this technique unless you are planning to create a large number of partitions (I would say 50 or more) and prolonged planning time is an issue.

Estimating size of Postgres indexes

I'm trying to get a better understanding of the tradeoffs involved in creating Postgres indexes. As part of that, I'd love to understand how much space indexes usually use. I've read through the docs, but can't find any information on this. I've been doing my own little experiments creating tables and indexes, but it would be amazing if someone could offer an explanation of why the size is what it is. Assume a common table like this with 1M rows, where each row has a unique id and a unique outstanding.
CREATE TABLE account (
    id          integer,
    active      boolean NOT NULL,
    outstanding double precision NOT NULL
);
and the indexes created by
CREATE INDEX id_idx ON account(id);
CREATE INDEX outstanding_idx ON account(outstanding);
CREATE INDEX id_outstanding_idx ON account(id, outstanding);
CREATE INDEX active_idx ON account(active);
CREATE INDEX partial_id_idx ON account(id) WHERE active;
What would you estimate the index sizes to be in bytes and more importantly, why?
Since you did not specify the index type, I'll assume default B-tree indexes. Other types can be a lot different.
Here is a simplistic function to compute the estimated minimum size in bytes for an index on the given table with the given columns:
CREATE OR REPLACE FUNCTION f_index_minimum_size(_tbl regclass, _cols VARIADIC text[], OUT estimated_minimum_size bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _missing_column text;
BEGIN
   -- assert
   SELECT i.attname
   FROM   unnest(_cols) AS i(attname)
   LEFT   JOIN pg_catalog.pg_attribute a ON a.attname = i.attname
                                        AND a.attrelid = _tbl
   WHERE  a.attname IS NULL
   INTO   _missing_column;

   IF FOUND THEN
      RAISE EXCEPTION 'Table % has no column named %', _tbl, quote_ident(_missing_column);
   END IF;

   SELECT INTO estimated_minimum_size
          COALESCE(1 + ceil(reltuples/trunc((blocksize-page_overhead)/(4+tuple_size)))::int, 0) * blocksize  -- AS estimated_minimum_size
   FROM  (
      SELECT maxalign, blocksize, reltuples, fillfactor, page_overhead
           , (maxalign  -- up to 16 columns, else nullbitmap may force another maxalign step
              + CASE WHEN datawidth <= maxalign THEN maxalign
                     WHEN datawidth%maxalign = 0 THEN datawidth
                     ELSE (datawidth + maxalign) - datawidth%maxalign END  -- add padding to the data to align on MAXALIGN
             ) AS tuple_size
      FROM  (
         SELECT c.reltuples, count(*)
              , 90 AS fillfactor
              , current_setting('block_size')::bigint AS blocksize
              , CASE WHEN version() ~ '64-bit|x86_64|ppc64|ia64|amd64|mingw32'  -- MAXALIGN: 4 on 32bits, 8 on 64bits
                     THEN 8 ELSE 4 END AS maxalign
              , 40 AS page_overhead  -- 24 bytes page header + 16 bytes "special space"
                -- avg data width without null values
              , sum(ceil((1-COALESCE(s.null_frac, 0)) * COALESCE(s.avg_width, 1024))::int) AS datawidth  -- ceil() because avg width has a low bias
         FROM   pg_catalog.pg_class c
         JOIN   pg_catalog.pg_attribute a ON a.attrelid = c.oid
         JOIN   pg_catalog.pg_stats s ON s.schemaname = c.relnamespace::regnamespace::text
                                     AND s.tablename = c.relname
                                     AND s.attname = a.attname
         WHERE  c.oid = _tbl
         AND    a.attname = ANY(_cols)  -- all exist, verified above
         GROUP  BY 1
      ) sub1
   ) sub2;
END
$func$;
Call examples:
SELECT f_index_minimum_size('my_table', 'col1', 'col2', 'col3');
SELECT f_index_minimum_size('public.my_table', VARIADIC '{col1, col2, col3}');
db<>fiddle here
About VARIADIC parameters:
Return rows matching elements of input array in plpgsql function
Basically, all indexes use data pages of typically 8 kb block size (rarely 4 kb). There is one data page overhead for B-tree indexes to start with. Each additional data page has a fixed overhead of 40 bytes (currently). Each page stores tuples like depicted in the manual here. Each tuple has a tuple header (typically 8 bytes incl. alignment padding), possibly a null bitmap, data (possibly incl. alignment padding between columns for multicolumn indices), and possibly alignment padding to the next multiple of MAXALIGN (typically 8 bytes). Plus, there is an ItemId of 4 bytes per tuple. Some space may be reserved initially for later additions with a fillfactor - 90 % by default for B-tree indexes.
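As a rough worked example of those rules (a back-of-the-envelope sketch for id_idx on the 1M-row table from the question, assuming an 8 kB block size and 64-bit MAXALIGN): each leaf entry is about 8 bytes of tuple header plus 4 bytes of integer data padded to 8 bytes, i.e. 16 bytes, plus a 4-byte ItemId, so roughly 20 bytes per entry. With 8192 - 40 = 8152 usable bytes per page that is about 407 entries per page, hence roughly 2,460 leaf pages or about 20 MB as a bare minimum, and around 22 MB once the default 90 % fillfactor is factored in.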
Important notes & disclaimers
The reported size is the estimated minimum size. An actual index will typically be bigger by around 25 % due to natural bloat from page splits. Plus, the calculation does not take possible alignment padding between multiple columns into account. Can add another couple percent (or more in extreme cases). See:
Calculating and saving space in PostgreSQL
Estimations are based on column statistics in the view pg_stats which is based on the system table pg_statistics. (Using the latter directly would be faster, but only allowed for superusers.) In particular, the calculation is based on null_frac, the "fraction of column entries that are null" and avg_width, the "average width in bytes of column's entries" to compute an average data width - ignoring possible additional alignment padding for multicolumn indexes.
The default 90 % fillfactor is taken into account. (One might specify a different one.)
Up to 50 % bloat is typically natural for B-tree indexes and nothing to worry about.
Does not work for expression indexes.
No provision for partial indexes.
Function raises an exception if anything but existing plain column names is passed. Case-sensitive!
If the table is new (or in any case if statistics may be out of date), be sure to run ANALYZE on the table before calling the function to update (or even initiate!) statistics.
Due to major optimizations, B-tree indexes in Postgres 12 waste less space and are typically closer to the reported minimum size.
Does not account for deduplication that's introduced with Postgres 13, which can compact indexes with duplicate values.
Parts of the code are taken from ioguix' bloat estimation queries here:
https://github.com/ioguix/pgsql-bloat-estimation
More gory details in the Postgres source code here:
https://doxygen.postgresql.org/bufpage_8h_source.html
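To sanity-check the estimate against reality, the function's output can be compared with the actual on-disk size of an existing index; a minimal sketch, assuming the account table and id_idx from the question exist and have current statistics:

ANALYZE account;  -- make sure pg_stats is populated for the estimate

SELECT f_index_minimum_size('account', 'id')      AS estimated_minimum_bytes
     , pg_relation_size('id_idx')                 AS actual_bytes
     , pg_size_pretty(pg_relation_size('id_idx')) AS actual_size;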
You can calculate it yourself. Each index entry has an overhead of 8 bytes. Add the average size of your indexed data (in the internal binary format).
There is some more overhead, like page header and footer and internal index pages, but that doesn't account for much, unless your index rows are very wide.

Is Hadoop Suitable For This?

We have some Postgres queries that take 6-12 hours to complete and are wondering if Hadoop is suited to doing the job faster. We have two 64-core servers with 256 GB of RAM that Hadoop could use.
We're running PostgreSQL 9.2.4. Postgres only uses one core on one server for the query, so I'm wondering if Hadoop could do it roughly 128 times faster, minus overhead.
We have two sets of data, each with millions of rows.
Set One:
id character varying(20),
a_lat double precision,
a_long double precision,
b_lat double precision,
b_long double precision,
line_id character varying(20),
type character varying(4),
freq numeric(10,5)
Set Two:
a_lat double precision,
a_long double precision,
b_lat double precision,
b_long double precision,
type character varying(4),
freq numeric(10,5)
We have indexes on all lat, long, type, and freq fields, using btree. Both tables have "VACUUM ANALYZE" run right before the query.
The Postgres query is:
SELECT id
FROM   setone one
WHERE  NOT EXISTS (
          SELECT 'x'
          FROM   settwo two
          WHERE  two.a_lat  >= one.a_lat  - 0.000278 AND
                 two.a_lat  <= one.a_lat  + 0.000278 AND
                 two.a_long >= one.a_long - 0.000278 AND
                 two.a_long <= one.a_long + 0.000278 AND
                 two.b_lat  >= one.b_lat  - 0.000278 AND
                 two.b_lat  <= one.b_lat  + 0.000278 AND
                 two.b_long >= one.b_long - 0.000278 AND
                 two.b_long <= one.b_long + 0.000278 AND
                 (two.type = one.type OR two.type = 'S') AND
                 two.freq >= one.freq - 1.0 AND
                 two.freq <= one.freq + 1.0
       )
ORDER BY line_id;
Is that the type of thing Hadoop can do? If so can you point me in the right direction?
I think Hadoop is very appropriate for that, but consider using HBase too.
You can run a Hadoop MapReduce routine to get the data, process it, and save it in an optimal way to an HBase table. That way, reading data from it would be much faster.
Try Stado at http://stado.us. Use this branch: https://code.launchpad.net/~sgdg/stado/stado, which will be used for the next release.
Even with 64 cores, you will only be using one core to process that query. With Stado you can create multiple PostgreSQL-based "nodes" even on a single box and leverage parallelism and get those cores working.
In addition, I have had success converting correlated not exists queries into WHERE (SELECT COUNT(*) ...) = 0.
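Applied to the query above, that rewrite would look roughly like this (a sketch; whether it actually helps depends on the planner and the data):

SELECT id
FROM   setone one
WHERE  (SELECT count(*)
        FROM   settwo two
        WHERE  two.a_lat  BETWEEN one.a_lat  - 0.000278 AND one.a_lat  + 0.000278
          AND  two.a_long BETWEEN one.a_long - 0.000278 AND one.a_long + 0.000278
          AND  two.b_lat  BETWEEN one.b_lat  - 0.000278 AND one.b_lat  + 0.000278
          AND  two.b_long BETWEEN one.b_long - 0.000278 AND one.b_long + 0.000278
          AND  (two.type = one.type OR two.type = 'S')
          AND  two.freq BETWEEN one.freq - 1.0 AND one.freq + 1.0
       ) = 0
ORDER BY line_id;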
Pure Hadoop isn't suitable because it doesn't have indexes. An HBase implementation is very tricky in this case because only one key is possible per table. In any case, both of them require at least 5 servers to see a significant improvement. The best you can do with PostgreSQL is to partition the data by the type column, use a second server as a replica of the first, and execute several queries in parallel, one for each particular type.
To be honest, PostgreSQL isn't the best solution for this. Sybase IQ (the best) or Oracle Exadata (in the worse case) can do it much better because of their columnar data structures and Bloom filtering.

Strange result when using a WHERE filter in CQL (Cassandra)

I have a counter column family, created with the CREATE TABLE command below (I use bigint for the KEY so I can filter on it when querying).
CREATE TABLE BannerCount (
    KEY bigint PRIMARY KEY
) WITH
    comment='' AND
    comparator=text AND
    read_repair_chance=0.100000 AND
    gc_grace_seconds=864000 AND
    default_validation=counter AND
    min_compaction_threshold=4 AND
    max_compaction_threshold=32 AND
    replicate_on_write='true' AND
    compaction_strategy_class='SizeTieredCompactionStrategy' AND
    compression_parameters:sstable_compression='SnappyCompressor';
But when I insert data into this column family and select with a WHERE clause to filter the data, the results I retrieve are very strange, like this:

Using the query:
select count(1) From BannerCount where KEY > -1

 count
-------
    71

Using the query:
select count(1) From BannerCount where KEY > 0;

 count
-------
     3

Using the query:
select count(1) From BannerCount;

 count
-------
   122

What is happening with my query? Can anyone tell me why I get these results?
To understand the reason for this, you should understand Cassandra's data model. You're probably using RandomPartitioner here, so each of these KEY values in your table are being hashed to token values, so they get stored in a distributed way around your ring.
So finding all rows whose key has a higher value than X isn't the sort of query Cassandra is optimized for. You should probably be keying your rows on some other value, and then either using wide rows for your bigint values (since columns are sorted), or putting them in a second column and creating an index on it.
To explain in a little more detail why your results seem strange: CQL 2 implicitly turns "KEY >= X" into "token(KEY) >= token(X)", so that a querier can iterate through all the rows in a somewhat-efficient way. So really, you're finding all the rows whose hash is greater than the hash of X. See CASSANDRA-3771 for how that confusion is being resolved in CQL 3. That said, the proper fix for you is to structure your data according to the queries you expect to be running on it.