oid and bytea are creating system tables - postgresql

oid -> creates the system table pg_largeobject and stores the data in there
bytea -> if the compressed data would still exceed 2000 bytes, PostgreSQL splits variable-length data types into chunks and stores them out of line in a special “TOAST table”, according to https://www.cybertec-postgresql.com/en/binary-data-performance-in-postgresql/
I don't want any extra table for my large data; I want to store it in a column of my own table. Is that possible?

It is best to avoid Large Objects.
With bytea you can prevent PostgreSQL from storing data out of line in a TOAST table by changing the column definition like this:
ALTER TABLE tab ALTER col SET STORAGE MAIN;
Then PostgreSQL will compress that column but keep it in the main table.
Since the block size in PostgreSQL is 8kB, and one row is always stored in a single block, that will limit the size of your table rows to somewhat under 8kB (there is a block header and other overhead).
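If you want to verify which storage strategy a column actually uses, you can read it from the system catalog. A minimal sketch, following the tab example above ('m' means MAIN, 'x' EXTENDED, 'e' EXTERNAL, 'p' PLAIN):
-- show the storage strategy of every user column of "tab"
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'tab'::regclass
  AND attnum > 0;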
I think that you are trying to solve a non-problem, and your request to not store large data out of line is unreasonable.

Related

How to store a column value in Redshift varchar column with length more than 65535

I tried to load the Redshift table but it failed on one column: The length of the data column 'column_description' is longer than the length defined in the table. Table: 65535, Data: 86555.
I tried to increase the length of the column in the Redshift table, but it looks like 65535 is the maximum length Redshift supports.
Do we have any alternatives to store the value in Redshift?
The answer is that Redshift doesn't support anything larger and that one shouldn't store large artifacts in an analytic database. If you are using Redshift for its analytic powers to find specific artifacts (images, files, etc.), then these should be stored in S3 and the object key (pointer) should be stored in Redshift.
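A minimal sketch of that pattern, with a hypothetical artifacts table and S3 key layout (the names are only illustrative):
-- keep only a pointer to the object in Redshift; the artifact itself lives in S3
CREATE TABLE artifacts (
    artifact_id   BIGINT IDENTITY(1,1),
    description   VARCHAR(65535),   -- Redshift's maximum VARCHAR length
    s3_object_key VARCHAR(1024)     -- e.g. 's3://my-bucket/artifacts/123.bin'
);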

Shall I SET STORAGE PLAIN on fixed-length bytea PK column?

My Postgres table's primary key is a SHA1 checksum (always 20 bytes) stored in a bytea column (because Postgres doesn't have fixed-length binary types).
Shall I ALTER TABLE t ALTER COLUMN c SET STORAGE PLAIN not to let Postgres compress and/or outsource (TOAST) my PK/FK for the sake of lookup and join performance? And why (not)?
I would say that that is a micro-optimization that will probably not have a measurable effect.
First, PostgreSQL only considers compressing and slicing values if the row exceeds 2000 bytes, so there will only be an effect at all if your rows routinely exceed that size.
Then, even if the primary key column gets toasted, you will probably only be able to measure a difference if you select a large number of rows in a single table scan. Fetching only a few rows by index won't make a big difference.
I'd benchmark both approaches, but I'd assume that it will be hard to measure a difference. I/O and other costs will probably hide the small extra CPU time required for decompression (remember that the row has to be large for TOAST to kick in in the first place).
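If you do want to benchmark it, a rough sketch using the t and c names from the question (note that SET STORAGE only affects rows written after the change, so the data has to be re-inserted before timing):
ALTER TABLE t ALTER COLUMN c SET STORAGE PLAIN;
-- re-insert the data here so the new storage strategy actually applies
-- force every key value to be read and compare the timings of both variants
EXPLAIN (ANALYZE, BUFFERS) SELECT count(md5(c)) FROM t;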

Firebird table column order to save disk space

I have a table with almost six hundred columns that contains raw data from an outside source, each row being a single transaction.
How can I order the columns to make its on-disk and in-memory layout space-efficient?

varchar(max): how to control the length of "in row" data

I keep reading stuff like this:
The text in row option will be removed in a future version of SQL
Server. Avoid using this option in new development work, and plan to
modify applications that currently use text in row. We recommend that
you store large data by using the varchar(max), nvarchar(max), or
varbinary(max) data types. To control in-row and out-of-row behavior
of these data types, use the large value types out of row option.
So what should we do if we have a varchar(max) field that we want to limit to 16 chars in row?
Thanks!
EDIT. When I say "in row", I mean the VARCHAR/TEXT strings are stored directly in the data row, not as a pointer (with the string data stored elsewhere). Moving the data out of the row will increase table scan performance if the data moved out of the row is not part of the "where" clause.
EDIT. The text I quoted, says this:
To control in-row and out-of-row behavior
of these data types, use the large value types out of row option.
Sure enough:
https://msdn.microsoft.com/en-us/library/ms173530.aspx
But on that page it says this:
The text in row feature will be removed in a future version of SQL
Server. To store large value data, we recommend that you use the
varchar(max), nvarchar(max) and varbinary(max) data types.
So the question remains.
EDIT. It appears we will still have the ability to use this table option:
large value types out of row. A value of 1 means varbinary(max), xml
and large user-defined type (UDT) columns in the table are stored out
of row, with a 16-byte pointer to the root. A value of 0 means
varchar(max), nvarchar(max), varbinary(max), xml and large UDT values
are stored directly in the data row, up to a limit of 8000 bytes and
as long as the value can fit in the record. If the value does not fit
in the record, a pointer is stored in-row and the rest is stored out
of row in the LOB storage space. 0 is the default.
However, we seem to be losing the option to keep the data in the row when it is small. It will be either all in or all out. Is there any other way to do this?
You are correct, there is no configuration option that does the same as the old 'text in row' option.
You either let SQL Server store up to 8 KB in the page or store information of any size out of row.
So what should we do if we have a varchar(max) field that we want to
limit to 16 chars in row?
CREATE TABLE dbo.T1 (SomeCol VARCHAR(MAX) CHECK (LEN(SomeCol)<=16));
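And if you do want the all-or-nothing table-level behaviour quoted above, it is switched with sp_tableoption (shown here for the dbo.T1 table from the example):
-- push all MAX-typed values out of row, leaving only a 16-byte pointer in the row
EXEC sp_tableoption 'dbo.T1', 'large value types out of row', 1;
-- 0 (the default) keeps values up to 8000 bytes in the row whenever they fit
EXEC sp_tableoption 'dbo.T1', 'large value types out of row', 0;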

Does the order of columns in a Postgres table impact performance?

In Postgres, does the order of columns in a CREATE TABLE statement impact performance? Consider the following two cases:
CREATE TABLE foo (
a TEXT,
B VARCHAR(512),
pkey INTEGER PRIMARY KEY,
bar_fk INTEGER REFERENCES bar(pkey),
C bytea
);
vs.
CREATE TABLE foo2 (
pkey INTEGER PRIMARY KEY,
bar_fk INTEGER REFERENCES bar(pkey),
B VARCHAR(512),
a TEXT,
C bytea
);
Will the performance of foo2 be better than that of foo because of better byte alignment for the columns? When Postgres executes CREATE TABLE, does it follow the column order specified, or does it re-organize the columns in an optimal order for byte alignment or performance?
Question 1
Will the performance of foo2 be better than foo because of better byte
alignment for the columns?
Yes, the order of columns can have a small impact on performance. Type alignment is the more important factor, because it affects the footprint on disk. You can minimize storage size (play "column tetris") and squeeze more rows on a data page - which is the most important factor for speed.
Normally, it's not worth bothering. With an extreme example like in this related answer you get a substantial difference:
Calculating and saving space in PostgreSQL
Type alignment details:
Making sense of Postgres row sizes
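To see those alignment requirements for yourself, you can query the catalogs; a minimal sketch for the foo table from the question:
-- typlen = -1 means variable length; typalign: c = 1 byte, s = 2, i = 4, d = 8
SELECT a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'foo'::regclass
  AND a.attnum > 0
ORDER BY a.attnum;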
The other factor is that retrieving column values is slightly faster if you have fixed-size columns first. I quote the manual here:
To read the data you need to examine each attribute in turn. First
check whether the field is NULL according to the null bitmap. If it
is, go to the next. Then make sure you have the right alignment. If
the field is a fixed width field, then all the bytes are simply
placed. If it's a variable length field (attlen = -1) then it's a bit
more complicated. All variable-length data types share the common
header structure struct varlena, which includes the total length of
the stored value and some flag bits.
There is an open TODO item to allow reordering of column positions in the Postgres Wiki, partly for these reasons.
Question 2
When Postgres executes a CREATE TABLE does it follow the column order
specified or does it re-organize the columns in optimal order for byte
alignment or performance?
Columns are stored in the defined order, the system does not try to optimize.
I fail to see any relevance of column order to TOAST tables, as another answer seems to imply.
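If you want to compare the two layouts yourself, a quick sketch (assuming foo and foo2 from the question are loaded with the same sample data) is to look at the actual tuple sizes:
-- pg_column_size() on a whole-row reference reports the stored size of the tuple
SELECT avg(pg_column_size(foo.*))  AS avg_row_size_foo  FROM foo;
SELECT avg(pg_column_size(foo2.*)) AS avg_row_size_foo2 FROM foo2;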
As far as I understand, PostgreSQL adheres to the order in which you define the columns when saving records. Whether this affects performance is debatable. PostgreSQL stores all table data in pages, each 8 kB in size. 8 kB is the default, but it can be changed at compile time.
Each row in the table takes up space within a page. Since your table definition contains variable-length columns, a page can hold a variable number of records. What you want to do is make sure you can fit as many records into one page as possible. That is why you will notice performance degradation when a table has a huge number of columns or very large column values.
That being said, declaring a varchar(8192) does not mean one record will fill up a whole page; a variable-length value only takes the space it actually needs. (Even a CHAR(8192) in PostgreSQL is stored as a blank-padded variable-length value, so it does not reserve a whole page regardless of the amount of data in the column.)
There is one more thing to consider when declaring TOASTable types such as TEXT columns. These are columns that could exceed the maximum page size. A table that has TOASTable columns will have an associated TOAST table to store the data, and only a pointer to the data is stored with the main table. This can impact performance, but it can be improved with proper indexes on the TOASTable columns.
To conclude, I would have to say that the order of the columns does not play much of a role in the performance of a table. Most queries use indexes, which are stored separately, to retrieve records, so column order hardly matters. It comes down to how many pages need to be read to retrieve the data.
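To relate that to a concrete table, a small sketch for the foo table above: relpages in pg_class shows how many 8 kB pages the table currently occupies (the numbers are maintained by VACUUM and ANALYZE):
-- relpages = number of 8 kB pages, reltuples = estimated row count
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'foo';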