How to fix invalid memory alloc request size(postgresql) - postgresql

I can insert data into a bytea type column in a table, but I can't select it.
It seems like a memory issue, how do I fix it?
code sample
INSERT INTO bytea_test(data) values (repeat('x', 1024 * 1024 * 1023)::bytea);
-- OK
select repeat('x', 1024 * 1024 * 1023)::bytea;
-- invalid memory alloc request size 2145386499
container on k8s
PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
I tried adjusting the server settings, but it didn't improve things.
My current settings are:
max_connections=100
shared_buffers=6GB
effective_cache_size=4GB
maintenance_work_mem=64MB
checkpoint_completion_target=0.5
wal_buffers=16MB
default_statistics_target=100
random_page_cost=4
effective_io_concurrency=1
work_mem=3GB
min_wal_size=80MB
max_wal_size=6GB
max_worker_processes=8
max_parallel_workers_per_gather=2

You are running out of memory. Changing PostgreSQL parameters won't help with that. You will have to raise the container's memory limits if you want to run a statement like that.

When a bytea value is returned from a SELECT, it is converted to an output format that takes up twice as much space as it does in storage (the default hex format uses two characters per byte), and that request exceeds PostgreSQL's limit of just under 1 GB for a single allocation.
In this case you could work around it with set bytea_output = escape;, as that variable-width way of representing bytea is more compact for this particular data.
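A minimal sketch of that workaround (session-level setting; the assumption here is that the failing allocation is the hex-doubled output hitting the ~1 GB cap):
SET bytea_output = escape;  -- the default in PostgreSQL 12 is 'hex', two characters per byte
select repeat('x', 1024 * 1024 * 1023)::bytea;
-- 'x' is printable ASCII, so the escape format emits one character per byte
-- instead of the two hex digits per byte that pushed the result over the limit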

Related

Postgres Varchar(n) or text which is better [duplicate]

What's the difference between the text data type and the character varying (varchar) data types?
According to the documentation
If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
and
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
So what's the difference?
There is no difference, under the hood it's all varlena (variable length array).
Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/
A couple of highlights:
To sum it all up:
char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit
varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
varchar – just like text
text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name
The article does detailed testing to show that the performance of inserts and selects for all four data types is similar. It also takes a detailed look at alternative ways of constraining the length when needed. Function-based constraints and domains provide the advantage of an instant increase of the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.
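For illustration, a minimal sketch of the domain approach (the domain, table and limits below are made up, not from the article):
CREATE DOMAIN short_text AS text
  CONSTRAINT short_text_len CHECK (char_length(VALUE) <= 100);
CREATE TABLE notes (body short_text);
-- Raising the limit later only swaps the constraint; no table rewrite is needed
-- (re-adding the constraint revalidates existing rows, but that is only a scan):
ALTER DOMAIN short_text DROP CONSTRAINT short_text_len;
ALTER DOMAIN short_text ADD CONSTRAINT short_text_len CHECK (char_length(VALUE) <= 200);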
As "Character Types" in the documentation points out, varchar(n), char(n), and text are all stored the same way. The only difference is extra cycles are needed to check the length, if one is given, and the extra space and time required if padding is needed for char(n).
However, when you only need to store a single character, there is a slight performance advantage to using the special type "char" (keep the double-quotes — they're part of the type name). You get faster access to the field, and there is no overhead to store the length.
I just made a table of 1,000,000 random "char" chosen from the lower-case alphabet. A query to get a frequency distribution (select count(*), field ... group by field) takes about 650 milliseconds, vs about 760 on the same data using a text field.
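A rough sketch of that benchmark (table and column names here are illustrative, not from the original post):
CREATE TABLE char_bench AS
SELECT chr(97 + (random() * 25)::int)::"char" AS field  -- one random lower-case letter per row
FROM generate_series(1, 1000000);
SELECT count(*), field FROM char_bench GROUP BY field;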
(this answer is a Wiki, you can edit - please correct and improve!)
UPDATING BENCHMARKS FOR 2016 (pg9.5+)
And using "pure SQL" benchmarks (without any external script)
1. Use any string_generator with UTF8.
2. Main benchmarks:
2.1. INSERT
2.2. SELECT comparing and counting
CREATE FUNCTION string_generator(int DEFAULT 20, int DEFAULT 10) RETURNS text AS $f$
  SELECT array_to_string( array_agg(
    substring(md5(random()::text), 1, $1) || chr(9824 + (random()*10)::int)
  ), ' ' ) AS s
  FROM generate_series(1, $2) i(x);
$f$ LANGUAGE SQL VOLATILE;  -- uses random(), so it must not be declared IMMUTABLE
Prepare specific test (examples)
DROP TABLE IF EXISTS test;
-- CREATE TABLE test ( f varchar(500));
-- CREATE TABLE test ( f text);
CREATE TABLE test ( f text CHECK(char_length(f)<=500) );
Perform a basic test:
INSERT INTO test
SELECT string_generator(20+(random()*(i%11))::int)
FROM generate_series(1, 99000) t(i);
And other tests,
CREATE INDEX q on test (f);
SELECT count(*) FROM (
SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;
... And use EXPLAIN ANALYZE.
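For example, a minimal wrapping of the query above (illustrative only):
EXPLAIN ANALYZE
SELECT count(*) FROM (
  SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;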
UPDATED AGAIN 2018 (pg10)
A little edit to add 2018's results and reinforce the recommendations.
Results in 2016 and 2018
My results, averaged over many machines and many tests: all the same (differences statistically smaller than the standard deviation).
Recommendation
Use the text datatype and avoid the old varchar(x), because its behaviour is sometimes not standard, e.g. in CREATE FUNCTION clauses varchar(x) ≠ varchar(y).
Express limits (with the same performance as varchar!) with a CHECK clause in the CREATE TABLE, e.g. CHECK(char_length(x)<=10). With a negligible loss of performance in INSERT/UPDATE you can also control ranges and string structure, e.g. CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%'). See the sketch below.
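A minimal sketch of that recommendation (table and column names are illustrative only):
CREATE TABLE checked_strings (
  x text CHECK (char_length(x) <= 10),
  greeting text CHECK (char_length(greeting) > 5
                       AND char_length(greeting) <= 20
                       AND greeting LIKE 'Hello%')
);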
On PostgreSQL manual
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
I usually use text
References: http://www.postgresql.org/docs/current/static/datatype-character.html
In my opinion, varchar(n) has its own advantages. Yes, they all use the same underlying type and all that. But it should be pointed out that B-tree indexes in PostgreSQL have a size limit of 2712 bytes per index row.
TL;DR:
If you use the text type without a constraint and have indexes on these columns, it is very possible that you will hit this limit for some of your columns and get an error when you try to insert data; with varchar(n) you can prevent it.
Some more details: the problem is that PostgreSQL doesn't raise any exception when creating an index on a text column, or on a varchar(n) column where n is greater than 2712. However, it will raise an error when a record whose compressed size is greater than 2712 bytes is inserted. That means you can easily insert a 100,000-character string made of repetitive characters, because it compresses to far below 2712 bytes, but you may not be able to insert a 4,000-character string whose compressed size exceeds 2712 bytes. Using varchar(n) where n is not much greater than 2712, you are safe from these errors.
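A hypothetical illustration of that limit (the table name and sizes are made up; the exact limit and error message vary by PostgreSQL version):
CREATE TABLE idx_limit_demo (v text);
CREATE INDEX ON idx_limit_demo (v);
-- highly repetitive data compresses well, so its index entry stays small:
INSERT INTO idx_limit_demo VALUES (repeat('x', 100000));
-- random data of a few thousand characters barely compresses and may be rejected
-- with an "index row size exceeds maximum" error:
INSERT INTO idx_limit_demo
SELECT string_agg(md5(random()::text), '') FROM generate_series(1, 200);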
text and varchar have different implicit type conversions. The biggest impact that I've noticed is handling of trailing spaces. For example ...
select ' '::char = ' '::varchar, ' '::char = ' '::text, ' '::varchar = ' '::text
returns true, false, true rather than the true, true, true you might expect.
Somewhat OT: If you're using Rails, the standard formatting of webpages may be different. For data entry forms text boxes are scrollable, but character varying (Rails string) boxes are one-line. Show views are as long as needed.
A good explanation from http://www.sqlines.com/postgresql/datatypes/text:
The only difference between TEXT and VARCHAR(n) is that you can limit the maximum length of a VARCHAR column, for example, VARCHAR(255) does not allow inserting a string more than 255 characters long.
Both TEXT and VARCHAR have the upper limit at 1 GB, and there is no performance difference among them (according to the PostgreSQL documentation).
The difference is between tradition and modern.
Traditionally you were required to specify the width of each table column. If you specified too much width, expensive storage space was wasted, but if you specified too little, some data would not fit. Then you would resize the column, change a lot of connected software, and fix the bugs that introduced, which is all very cumbersome.
Modern systems allow for unlimited string storage with dynamic storage allocation, so the incidental large string would be stored just fine without much waste of storage of small data items.
While a lot of programming languages have adopted a 'string' data type with unlimited size, like C#, JavaScript, Java, etc., a database like Oracle did not.
Now that PostgreSQL supports 'text', a lot of programmers are still used to VARCHAR(N), and reason like: yes, text is the same as VARCHAR, except that with VARCHAR you MAY add a limit N, so VARCHAR is more flexible.
You might as well reason, why should we bother using VARCHAR without N, now that we can simplify our life with TEXT?
In my recent years with Oracle, I used CHAR(N) or VARCHAR(N) on very few occasions. Because Oracle does (did?) not have an unlimited string type, I used VARCHAR(2000) for most string columns, where 2000 was at some time the maximum for VARCHAR and, for all practical purposes, not much different from 'infinite'.
Now that I am working with PostgreSQL, I see TEXT as real progress. No more emphasis on the VAR feature of the CHAR type. No more emphasis on let's use VARCHAR without N. Besides, typing TEXT saves 3 keystrokes compared to VARCHAR.
Younger colleagues would now grow up without even knowing that in the old days there were no unlimited strings. Just like that in most projects they don't have to know about assembly programming.
I wasted way too much time because of using varchar instead of text for PostgreSQL arrays.
PostgreSQL array operators do not work with string columns. Refer to these links for more details: (https://github.com/rails/rails/issues/13127) and (http://adamsanderson.github.io/railsconf_2013/?full#10).
If you only use TEXT type you can run into issues when using AWS Database Migration Service:
Large objects (LOBs) are used but target LOB columns are not nullable
Due to their unknown and sometimes large size, large objects (LOBs) require more processing and resources than standard objects. To help with tuning migrations of systems that contain LOBs, AWS DMS offers the following options
If you are only sticking to PostgreSQL for everything, you are probably fine. But if you are going to interact with your DB via ODBC or external tools like DMS, you should consider not using TEXT for everything.
character varying(n), varchar(n) - (Both the same.) An over-length value is rejected with an error, unless the excess characters are all spaces (or the value is explicitly cast), in which case it is truncated to n characters.
character(n), char(n) - (Both the same.) Fixed length; values are padded with blanks up to the declared length.
text - Unlimited length.
Example:
Table test:
a character(7)
b varchar(7)
insert "ok " to a
insert "ok " to b
We get the results:
     a      | (a)char_length |  b   | (b)char_length
------------+----------------+------+----------------
 "ok     "  |              7 | "ok" |              2

PostGIS ST_X() precision behaviour

We are investigating using PostGIS to perform some spatial filtering of data that has been gathered from a roving GPS engine. We have defined some start and end points that we use in our processing, with the following table structure:
CREATE TABLE IF NOT EXISTS tracksegments
(
idtracksegments bigserial NOT NULL,
name text,
approxstartpoint geometry,
approxendpoint geometry,
maxpoints integer
);
If the data in this table is queried:
SELECT ST_AsText(approxstartpoint) FROM tracksegments
we get ...
POINT(-3.4525845 58.5133318)
Note that the Long/Lat points are given to 7 decimal places.
To get just the longitude element, we tried:
SELECT ST_X(approxstartpoint) AS long FROM tracksegments
we get ...
-3.45
We need much more precision than the 2 decimal places that are returned. We've searched the documentation and there does not appear to be a way to set the level of precision. Any help would be appreciated.
Vance
Your problem is definitely client related. Your client is most likely truncating double precision values for some reason. As ST_AsText returns a text value, it does not get affected by this behaviour.
ST_X does not truncate the coordinate's precision like that, e.g.
SELECT ST_X('POINT(-3.4525845 58.5133318)');
st_x
------------
-3.4525845
(1 row)
Tested with psql in PostgreSQL 9.5 + PostGIS 2.2 and PostgreSQL 12.3 + PostGIS 3.0 and with pgAdmin III
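If the client cannot be configured differently, one possible workaround (just a sketch, following the observation above that text values are not affected) is to do the conversion server-side so the client never reformats a double precision value:
SELECT ST_X(approxstartpoint)::text AS long FROM tracksegments;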
Note: PostgreSQL 9.5 is a pretty old release! Besides the fact that it will reach EOL next January, you're missing really kickass features in the newer releases. I sincerely recommend planning a system upgrade as soon as possible.

What is the limit of the returned rows from a select statement for the SOCI Firebird backend?

Using SOCI Firebird v4.0 trunk to Read/Write data from/to a Firebird database.
The problem is that when I select all the rows from a table with over 2 million rows via a select statement, SOCI throws a std::bad_alloc exception and I get exactly 1024 * 1024 = 1,048,576 rows.
I don't know if SOCI has a limit or if there is something else I am missing here!
BTW I am storing the rows in a std::vector.
Neither SOCI nor Firebird was responsible for this.
The cause was the process itself running out of memory: a 32-bit build cannot address more than 2^32 bytes.
I had to recompile the code as 64-bit and run it on a 64-bit system.

Firebird errors when using Symmetricds

I am using the free version of SymmetricDS to replicate my Firebird database. When I demoed it by creating a new (blank) DB, it worked fine. But when I configured it on my existing DB (which has data), an error occurred.
I use Firebird-2.5.5.26952_0 32bit & symmetric-server-3.9.5, OS is Windows Server 2008 Enterprise.
I have searched for a whole day but found nothing to solve this. Any help is appreciated, and thanks for your time.
UPDATE:
During the initial load, SymmetricDS executes this statement to declare a UDF in the Firebird DB:
declare external function sym_hex blob
returns cstring(32660) free_it
entry_point 'sym_hex' module_name 'sym_udf';
It caused an error because my existing DB's charset is UNICODE_FSS, for which the maximum length of CSTRING is 10922. When I worked around it by updating the charset to NONE, it worked fine, but that is not a safe solution. I am still working to find a better one.
One more thing: does anyone know of other open source tools that can replicate a Firebird database? I have tried many from here and only SymmetricDS worked.
The problem seems to be a bug in Firebird where the length of CSTRING is supposed to be in bytes, but in reality it uses characters. Your database seems to have UTF8 (or UNICODE_FSS) as its default character set, which means each character can take up to 4 bytes (3 for UNICODE_FSS). The maximum length of CSTRING is 32767 bytes, but if it calculates in characters for CSTRING, that suddenly reduces the maximum to 8191 characters (or 32764 bytes) (or 10922 characters, 32766 bytes for UNICODE_FSS).
The workaround to this problem would be to create a database with a different default character set. Alternatively, you could (temporarily) alter the default character set:
For Firebird 3:
1. Set the default character set to a single byte character set (e.g. NONE). Use of NONE is preferable to avoid unintended transliteration issues:
alter database set default character set NONE;
2. Disconnect (important: you may need to disconnect all current connections because of metadata caching!)
3. Set up SymmetricDS so it creates the UDF
4. Set the default character set back to UTF8 (or UNICODE_FSS):
alter database set default character set UTF8;
5. Disconnect again
When using Firebird 2.5 or earlier, you will need to perform a direct system table update (which is no longer possible in Firebird 3): instead of the first alter database statement above, use
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'NONE'
and instead of the second one, use
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'UTF8'
The alternative would be for SymmetricDS to change its initialization to
DECLARE EXTERNAL FUNCTION SYM_HEX
BLOB
RETURNS CSTRING(32660) CHARACTER SET NONE FREE_IT
ENTRY_POINT 'sym_hex' MODULE_NAME 'sym_udf';
Or maybe character set BINARY instead of NONE, as that seems closer to the intent of the UDF.

Why is PostgreSQL 9.2 slower than 9.1 in my tests?

I have upgraded from PostgreSQL 9.1.5 to 9.2.1:
"PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit"
"PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit"
It is on the same machine with the default PostgreSQL configuration files (only the port was changed).
For testing purposes I have a simple table:
CREATE TEMP TABLE test_table_md_speed(id serial primary key, n integer);
Which I test using function:
CREATE OR REPLACE FUNCTION TEST_DB_SPEED(cnt integer) RETURNS text AS $$
DECLARE
    time_start timestamp;
    time_stop timestamp;
    time_total interval;
BEGIN
    time_start := cast(timeofday() AS TIMESTAMP);
    FOR i IN 1..cnt LOOP
        INSERT INTO test_table_md_speed(n) VALUES (i);
    END LOOP;
    time_stop := cast(timeofday() AS TIMESTAMP);
    time_total := time_stop - time_start;
    RETURN extract(milliseconds FROM time_total);
END;
$$ LANGUAGE plpgsql;
And I call:
SELECT test_db_speed(1000000);
I see strange results. For PostgreSQL 9.1.5 I get "8254.769", and for 9.2.1 I get "9022.219". This means the new version is slower. I cannot find out why.
Any ideas why those results differ?
You say both are on the same machine. Presumably the data files for the newer version were added later. Later files tend to be added closer to the center of the platter, where access speeds are slower.
There is a good section on this in Greg Smith's book on PostgreSQL performance, including ways to measure and graph the effect. With clever use of the dd utility you might be able to do some ad hoc tests of the relative speed at each location, at least for reads.
The 9.2 release generally scales up to a large number of cores better than earlier versions, although in some of the benchmarks there was a very slight reduction in the performance of a single query running alone. I didn't see any benchmarks showing an effect anywhere near this big, though; I would bet on it being the result of position on the drive -- which just goes to show how hard it can be to do good benchmarking.
UPDATE: A change made in 9.2.0 to improve performance for some queries made some other queries perform worse. Eventually it was determined that this change should be reverted, which happened in version 9.2.3; so it is worth checking performance after upgrading to that maintenance release. A proper fix, which has been confirmed to fix the problem the reverted patch fixed without causing a regression, will be included in 9.3.0.