How to determine how much space 1 row will take in Postgres db?

I'm very new to Postgres so my math could be off here...
This is my table:
CREATE TABLE audit (
id BIGSERIAL PRIMARY KEY,
content_id VARCHAR (50) NULL,
type VARCHAR (100) NOT NULL,
size bigint NOT NULL,
timestamp1 timestamp NOT NULL DEFAULT NOW(),
timestamp2 timestamp NOT NULL DEFAULT NOW());
I want to make some estimations on how much space 1 row would occupy. So I did something like this:
1 row = id + content_id + type + size + timestamp1 + timestamp2
= 8 + 50 + 100 + 8 + 8 + 8 bytes
= 182 bytes
I also created this same table in my local Postgres, but the numbers are not matching:
INSERT INTO public.audit(
content_id, type, size)
VALUES ('aaa', 'bbb', 100);
SELECT pg_size_pretty( pg_total_relation_size('audit') ); -- returns 24 kb
INSERT INTO public.audit(
content_id, type, size)
VALUES ('aaaaaaaaaaaaa', 'bbbbbbbbbbbbbb', 100000000000);
SELECT pg_size_pretty( pg_total_relation_size('audit') ); -- still returns 24 kb
Which brings me to think that Postgres reserves 24 kB of space to start with, and as I put in more data it will get incremented by 132 bytes once I go beyond 24 kB? But something inside me says that can't be right.
I want to see how much space 1 row would occupy in Postgres db so I can analyze how much data I can potentially store in it.
Edit
After reading more I've come up with this, is it correct?
1 row =
23 (heaptupleheader)
+ 1 (padding)
+ 8 (id)
+ 50 (content_id)
+ 6 (padding)
+ 100 (type)
+ 4 (padding)
+ 8 (size)
+ 8 (timestamp)
+ 8 (timestamp)
= 216 bytes

That "something inside me says that can't be right" is wrong. Actually trying id determine the size of each row is impractical. You can calculate the average row, and given a large number of rows the better that average get. Part of that reason is variable length columns. Your definition varchar(50) does not required bytes of storage unless unless it contains 50 bytes, if it has 20 then it only takes up 20 bytes (plus overhead), even then it's not exact as the padding may change. The definition only specifies the Maximum not the actual, storage is on actual.
As far a your 24kb that doesn't seem out-of-line at all. Keep in mind that physical I/O is the slowest possible individual operation and trying to adjust to individual rows for I/O would bring your system to a screeching halt. Postgres therefore only reads in full blocks (and allocates space the same), and/or multiple blocks. Typically with a block size of 8K (8192 bytes). This is the trade off I/O performance vs. space allocation. It appears your system has a multi-block read of 3 blocks (??). If anything is surprising it would that is is that small.
In short trying to get the size of a row not piratical, instead get several hundred representative rows and calculate the average.
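As a rough sketch of that, assuming the audit table from the question already holds at least a few hundred representative rows, you can let Postgres do the averaging for you:
-- block size the server was built with (usually 8192 bytes)
SELECT current_setting('block_size');
-- average bytes per row, including per-row and per-page overhead
SELECT pg_relation_size('audit') / count(*) AS avg_bytes_per_row
FROM audit;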
BTW, you can change the row length just by rearranging your columns:
1 row =
23 (heaptupleheader)
+ 1 (padding)
+ 8 (id)
+ 8 (size)
+ 8 (timestamp)
+ 8 (timestamp)
+ 50 (content_id)
+ 2 (padding) (if content_id contains all 50 characters)
+ 100 (type) (if type contains all 100 characters)
= 208 bytes
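If you want to see what concrete rows actually occupy rather than estimating, a quick sketch against the audit table above is to point pg_column_size at the whole row or at individual columns:
SELECT pg_column_size(a.*)          AS whole_row_bytes,
       pg_column_size(a.content_id) AS content_id_bytes,
       pg_column_size(a.type)       AS type_bytes
FROM audit AS a;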

Related

PostgreSQL DB size larger than expected

I have the following PostgreSQL table:
CREATE TABLE "readings33" (
"uniqueid" BIGSERIAL PRIMARY KEY,
"uniqueid_sensor" INTEGER NOT NULL,
"timestamp" TIMESTAMP NOT NULL DEFAULT NULL,
"value" VARCHAR(15) NOT NULL,
CONSTRAINT "FK_readings_sensors" FOREIGN KEY ("uniqueid_sensor") REFERENCES "public"."sensors" ("uniqueid") ON UPDATE NO ACTION ON DELETE CASCADE
);
AFAIK the total size should be around:
"uniqueid" -> 8 bytes
"uniqueid_sensor" -> 4 bytes
"timestamp" -> 10 bytes
"value" VARCHAR(15) 8 bytes (because my value length for the test is a string with 8 bytes)
The sum of all is 8+4+10+8 = 30 bytes but when I write 100.000 rows to the DB this occupees 12.5 Mibs that is 125 bytes per row. I've done this text with 10.000 rows and the relation is about the same... Can anybody tell me why this increment is size??
Thanks in advance
Finally, after running this script I found that the size is correct; I don't know why HeidiSQL gives me a wrong size:
WITH cteTableInfo AS
(
    SELECT COUNT(1)                      AS ct
         , SUM(length(t::text))          AS TextLength
         , 'public.readings33'::regclass AS TableName
    FROM public.readings33 AS t
)
, cteRowSize AS
(
    SELECT ARRAY [ pg_relation_size(TableName)
                 , pg_relation_size(TableName, 'vm')
                 , pg_relation_size(TableName, 'fsm')
                 , pg_table_size(TableName)
                 , pg_indexes_size(TableName)
                 , pg_total_relation_size(TableName)
                 , TextLength
                 ] AS val
         , ARRAY [ 'Core Relation Size'
                 , 'Visibility Map'
                 , 'Free Space Map'
                 , 'Table Size (incl. TOAST)'
                 , 'Indexes Size'
                 , 'Total Size (incl. TOAST and Indexes)'
                 , 'Live Rows Text Representation'
                 ] AS Name
    FROM cteTableInfo
)
SELECT unnest(Name)                AS Description
     , unnest(val)                 AS Bytes
     , pg_size_pretty(unnest(val)) AS BytesPretty
     , unnest(val) / ct            AS bytes_per_row
FROM cteTableInfo, cteRowSize
UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'TotalRows', ct, NULL, NULL FROM cteTableInfo
UNION ALL SELECT 'LiveTuples', pg_stat_get_live_tuples(TableName), NULL, NULL FROM cteTableInfo
UNION ALL SELECT 'DeadTuples', pg_stat_get_dead_tuples(TableName), NULL, NULL FROM cteTableInfo;
A timestamp occupies 8 bytes, but there will be 4 bytes of padding after the integer. An 8-byte string will occupy 9 bytes.
So the row data occupy 33 bytes. Together with the 23-byte row header and additional padding, that will be around 60 bytes. Finally, there is some overhead per 8 kB data block.
If you get much more than that, then either your measurements are wrong (did you include the primary key index?), or your table is bloated by data modifications or a non-standard fillfactor setting.
To measure the actual size of the table, use
SELECT pg_table_size('readings33');
To measure the size including the indexes, use
SELECT pg_total_relation_size('readings33');
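To turn that into an average bytes-per-row figure, one rough sketch (assuming the planner statistics are reasonably current, e.g. after ANALYZE) is:
SELECT pg_table_size('readings33') / greatest(reltuples::bigint, 1) AS avg_bytes_per_row
FROM pg_class
WHERE relname = 'readings33';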

How to calculate size of tables which are saved on disk? (PosgreSQL)

How to calculate size of tables which are saved on disk?
Based on my internet searching, the size of a table can be calculated with this formula:
8KB × ceil(number of records / floor(floor(8KB × fillfactor - 24) / (28 + data length of 1 record)))
Example:
 Column   | Type
----------+---------------
 aid      | integer
 bid      | integer
 abalance | integer
 filler   | character(84)
data length of 1 record = aid (4 bytes) + bid (4 bytes) + abalance (4 bytes) + filler (84 bytes + 1 byte) = 97 bytes
The data length of a record must be rounded up to a multiple of 8 bytes.
=> Data length of 1 record is 104 bytes.
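For illustration, plugging that 104-byte record into the formula above (with the default fillfactor of 100 and, purely as an assumption, 1,000,000 records) can be evaluated directly:
-- floor(8168 / 132) = 61 records per 8 kB page
-- ceil(1000000 / 61) = 16394 pages; 16394 * 8192 bytes ≈ 128 MiB
SELECT 8192 * ceil(1000000 / floor(floor(8192 * 1.0 - 24) / (28 + 104))) AS estimated_bytes;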
Therefore, I think that 1 character is stored in 1 byte of memory.
However, the column "filler" accepts 84 single-byte characters "a" as well as 84 double-byte characters "あ".
I don't understand how a double-byte character can fit into a single byte.
Can you explain this to me?
It's much simpler. Use pg_relation_size to calculate the size of one relation alone (without the associated TOAST tables and indexes) or pg_total_relation_size to include all associated objects.
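A minimal sketch of the difference between the two, using a hypothetical table name mytable:
SELECT pg_size_pretty(pg_relation_size('mytable'))       AS relation_only,
       pg_size_pretty(pg_total_relation_size('mytable')) AS incl_toast_and_indexes;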

Firebird list domains and data types

I want to list all domains, their datatypes, and size.
Background
I've managed to do the query, based on this SO answer.
The basic code takes all fields:
SELECT
*
FROM
rdb$fields
I found that I could get fields from rdb$fields:
filter fields from this request by RDB$FIELD_NAME
get field type code from RDB$FIELD_TYPE
get field length from RDB$FIELD_LENGTH
Reference:
https://firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref-appx04-fields.html
Question
How to combine all this to list all domains, their datatypes, and size?
I want to get only domains created by users, not automatic ones.
The code:
select
t.RDB$FIELD_NAME Name,
case t.RDB$FIELD_TYPE
when 7 then 'SMALLINT'
when 8 then 'INTEGER'
when 10 then 'FLOAT'
when 12 then 'DATE'
when 13 then 'TIME'
when 14 then 'CHAR'
when 16 then 'BIGINT'
when 27 then 'DOUBLE PRECISION'
when 35 then 'TIMESTAMP'
when 37 then 'VARCHAR'
when 261 then 'BLOB'
end Type_Name,
t.RDB$CHARACTER_LENGTH Chr_Length
from RDB$FIELDS t
where coalesce( rdb$system_flag, 0) = 0
and not ( rdb$field_name starting with 'RDB$')
Also interesting: I could not find a system table with the data type names, so I had to hardcode them from the reference.
Thanks for the help in comments:
@MarkRotteveel
RDB$TYPES contains the type names, but names them differently:
You can find all data types in RDB$TYPES for RDB$FIELD_NAME = 'RDB$FIELD_TYPE' (although you will need to map some types, as it lists SMALLINT as SHORT, INTEGER as LONG, BIGINT as INT64 and VARCHAR as VARYING).
Need to use field RDB$CHARACTER_LENGTH instead of RDB$FIELD_LENGTH.
Note that RDB$FIELD_LENGTH is the wrong column for char/varchar
columns as it is the length in bytes (which depends on the character
set), you need to use RDB$CHARACTER_LENGTH for the length in
characters, and for numerical fields, you'll more likely need
RDB$FIELD_PRECISION (+ RDB$FIELD_SCALE), you are also ignoring sub
type information.
I needed the length of varchars only, but it appears that RDB$FIELD_LENGTH = RDB$CHARACTER_LENGTH when 1 byte = 1 char, i.e. for a single-byte character set.
If you use a 1-byte character set, then 1 byte = 1 char; but, for example, UTF-8 is (max) 4 bytes per character, so then field_length = 4 × character_length.
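A small sketch of that lookup (the type names live in the system table RDB$TYPES):
select RDB$TYPE, RDB$TYPE_NAME
from RDB$TYPES
where RDB$FIELD_NAME = 'RDB$FIELD_TYPE'
order by RDB$TYPE;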
@Arioch
The most reliable way to get user domains:
To an extent one may use select * from rdb$fields where coalesce(rdb$system_flag, 0) = 0 and not (rdb$field_name starting with 'RDB$'); however, nothing prohibits a user from manually/explicitly creating a column named "RDB$1234567".

Divide records into groups - quick solution

I need to use an UPDATE command to divide rows (selected by a subselect) of a PostgreSQL table into groups, identified by an integer value in one of the columns. These groups should all have the same size. The source table contains billions of records.
For example I need to divide 213 selected rows into groups, every group should contains 50 records. The result will be:
1 - 50. row => 1
51 - 100. row => 2
101 - 150. row => 3
151 - 200. row => 4
201 - 213. row => 5
There is no problem doing it with a loop (or with PostgreSQL window functions), but I need it to be very efficient and fast. I can't simply derive the group from the id, because there are gaps in those ids.
I had the idea of using a random integer generator and setting it as the default value for the column, but that is not usable when I need to adjust the group size.
The query below should display 213 rows with a group number from 0-4. Just add 1 if you want 1-5:
SELECT i, (row_number() OVER () - 1) / 50 AS grp
FROM generate_series(1001,1213) i
ORDER BY i;
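If the group number has to be written back with an UPDATE, as the question asks, the same idea can be sketched like this (assuming a table t with a primary key id and a target column grp):
UPDATE t
SET    grp = s.grp
FROM  (SELECT id, (row_number() OVER (ORDER BY id) - 1) / 50 + 1 AS grp
       FROM t) AS s
WHERE  t.id = s.id;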
create temporary sequence s minvalue 0 start with 0;
select *, nextval('s') / 50 grp
from t;
drop sequence s;
I think it has the potential to be faster than the row_number version by @Richard, but the difference may not be relevant depending on the specifics.

Where are NUMERIC precision and scale for a field found in the pg_catalog tables?

In PostgreSQL, column data for the structure of a table is stored in pg_attribute, with a few fields in pg_class and a couple in pg_attrdef.
But I do not see the precision or scale for a NUMERIC field type stored in there anywhere.
It can be found in the INFORMATION_SCHEMA tables, but I am trying to avoid them, as they do not use oids for easy joining to the pg_catalog tables.
So the question is:
Where is column precision and scale stored in the postgreSQL system tables?
It is stored in pg_attribute, in the column atttypmod. All of this information is also available in the view information_schema.columns; that view uses some queries to calculate the values, and these are the bare basics:
SELECT
CASE atttypid
WHEN 21 /*int2*/ THEN 16
WHEN 23 /*int4*/ THEN 32
WHEN 20 /*int8*/ THEN 64
WHEN 1700 /*numeric*/ THEN
CASE WHEN atttypmod = -1
THEN null
ELSE ((atttypmod - 4) >> 16) & 65535 -- calculate the precision
END
WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
ELSE null
END AS numeric_precision,
CASE
WHEN atttypid IN (21, 23, 20) THEN 0
WHEN atttypid IN (1700) THEN
CASE
WHEN atttypmod = -1 THEN null
ELSE (atttypmod - 4) & 65535 -- calculate the scale
END
ELSE null
END AS numeric_scale,
*
FROM
pg_attribute ;
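As a usage sketch, restricting this to the numeric columns of one hypothetical table my_table (declared with an explicit precision) and decoding atttypmod directly:
SELECT a.attname,
       ((a.atttypmod - 4) >> 16) & 65535 AS numeric_precision,
       (a.atttypmod - 4) & 65535         AS numeric_scale
FROM   pg_attribute AS a
JOIN   pg_class     AS c ON c.oid = a.attrelid
WHERE  c.relname = 'my_table'   -- hypothetical table name
AND    a.atttypid = 1700        -- numeric
AND    a.attnum > 0
AND    NOT a.attisdropped;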