PostgreSQL 12.3: ERROR: out of memory for query result

I have an AWS RDS PostgreSQL 12.3 instance (t3.small, 2 CPUs, 2 GB RAM). I have this table:
CREATE TABLE public.phones_infos
(
phone_id integer NOT NULL DEFAULT nextval('phones_infos_phone_id_seq'::regclass),
phone character varying(50) COLLATE pg_catalog."default" NOT NULL,
company_id integer,
phone_tested boolean DEFAULT false,
imported_at timestamp with time zone NOT NULL,
CONSTRAINT phones_infos_pkey PRIMARY KEY (phone_id),
CONSTRAINT fk_phones_infos FOREIGN KEY (company_id)
REFERENCES public.companies_infos (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE CASCADE
)
There are exactly 137468 records in this table, using:
SELECT count(1) FROM phones_infos;
The ERROR: out of memory for query result occurs with this simple query when I use pgAdmin 4.6:
SELECT * FROM phones_infos;
I have tables with 5M+ records and never had this problem before.
EXPLAIN SELECT * FROM phones_infos;
Seq Scan on phones_infos (cost=0.00..2546.68 rows=137468 width=33)
I read this article to see if I could find answers, but unfortunately, as the metrics show, there are no old pending connections that could be eating memory.
As suggested, the shared_buffers seems to be correctly sized:
SHOW shared_buffers;
449920kB
What should I try?

The problem must be on the client side. A sequential scan does not require much memory in PostgreSQL.
pgAdmin will cache the complete result set in RAM, which probably explains the out-of-memory condition.
I see two options:
Limit the number of result rows in pgAdmin:
SELECT * FROM phones_infos LIMIT 1000;
Use a different client, for example psql. There you can avoid the problem by setting
\set FETCH_COUNT 1000
so that the result set is fetched in batches.
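Another option that works in most clients is an explicit server-side cursor, so only one batch at a time is transferred. A minimal sketch (the cursor name is illustrative):
BEGIN;
DECLARE phones_cur CURSOR FOR SELECT * FROM phones_infos;
FETCH FORWARD 1000 FROM phones_cur; -- repeat until no rows are returned
CLOSE phones_cur;
COMMIT;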

Related

Query speed very slow on TimescaleDB

I created a table with the command:
CREATE TABLE public.g_dl (
id bigint NOT NULL,
id_tram integer,
site_id character varying(40),
user_id character varying(40),
app_code character varying(40),
time TIMESTAMP WITHOUT TIME ZONE NOT NULL
);
SELECT create_hypertable('g_dl','time');
Then I inserted about 34 million records.
Query speed was very good, and the Docker container used about 500 MB-1.2 GB of RAM. But query speed became a problem after I inserted a further batch of about 1.8 million records whose time values were older than the data I had already inserted (i.e. out of order).
I use DBeaver and got the message "You might need to increase max_locks_per_transaction", so I increased that value:
max_locks_per_transaction = 1000
Now query speed is very slow and the amount of RAM used by Docker is 10-12 GB. What am I doing wrong? Please let me know.
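For reference, a sketch of how that setting is usually raised (it is a server-level parameter, so PostgreSQL, here the Docker container, has to be restarted before it takes effect):
ALTER SYSTEM SET max_locks_per_transaction = 1000;
-- or edit postgresql.conf: max_locks_per_transaction = 1000, then restart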
Output of EXPLAIN:
EXPLAIN analyze select * from g_dlquantracts gd where id_tram = 300
Raw JSON explain:
https://drive.google.com/file/d/1bA5EwcWMUn5oTUirHFD25cSU1cYGwGLo/view?usp=sharing
Formatted execution plan: https://explain.depesz.com/s/tM57

An index scan is going to the heap for fetching results when only the indexed column is queried

I have the following test table created in Postgres with 5 million rows
create table temp_test(
id serial,
latitude float, longitude float,name varchar(32),
questionaire jsonb, city varchar(32), country varchar(32)
);
I have generated random data in this table using the below query
CREATE OR REPLACE FUNCTION
random_in_range(INTEGER, INTEGER) RETURNS INTEGER
AS $$
SELECT floor(($1 + ($2 - $1 + 1) * random()))::INTEGER;
$$ LANGUAGE SQL;
insert into temp_test(latitude,longitude,name,questionaire,city,country)
Select random_in_range(-180,180),random_in_range(-180,180),md5(random()::text),'["autism","efg"]',md5(random()::text)
,md5(random()::text)
from generate_series(1,5000000);
Column 'id' has a btree index
CREATE INDEX id_index ON temp_test (id);
When I query only on 'id' (which has an index) and run "explain analyse" on the query
explain analyse select id from temp_test where id = 10000;
I get the following result:
The execution time of this query was around ~0.049 seconds, and if I rerun the same query (so that the database has it cached) I get the results in approximately the same time.
From the results I see that, even though I am querying only an indexed column and not fetching any other column (I know it is not practical), the heap is still being accessed although the information exists within the index.
I would expect the query not to touch the heap if I am extracting information only from the index.
Is there something I am missing here? Any help would be appreciated. Thanks in advance!
The query is probably hitting the heap to check tuple visibility, because the visibility map is not up to date. Run VACUUM FULL VERBOSE and see what Postgres says; this will make the rows visible to all transactions. Sometimes Postgres can't vacuum fully for one reason or another, so the VERBOSE output helps.
I found the solution: I ran a plain VACUUM on my table,
vacuum temp_test;
instead of running VACUUM FULL VERBOSE.
Although it worked, I wonder what the difference between the two commands is.
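For what it's worth, a quick way to check that the plain VACUUM did the trick is to re-run the plan and look at the heap fetches (a sketch; the index name is the one created above):
VACUUM temp_test;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM temp_test WHERE id = 10000;
-- expected: Index Only Scan using id_index ... Heap Fetches: 0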

Slow joining on primary key in postgresql

I'm using PostgreSQL 10.4 on Windows 10 and have observed this strange slowness.
create table stock_prices (
quote_date timestamp,
security_id int NOT NULL,
...
PRIMARY KEY (quote_date, security_id)
);
create index stock_prices_idx2 on stock_prices (security_id, quote_date);
Doing the following was instantaneous.
select * from stock_prices where quote_date = '2017-12-29';
But the following took 31s.
create temp table ts (
quote_date timestamp,
primary key (quote_date)
);
insert into ts values ('2017-12-29');
select p.*
from stock_prices p, ts t
where p.quote_date = t.quote_date;
drop table ts;
AFAIK, the above should hit the index. Using DBeaver's explain execution plan function, it's reporting that it did a "Seq Scan" on stock_prices, which I assume means table scan.
Before I moved to Postgres 10.4, I was using SQL Server 2017 Developer Edition with the exact same schema and didn't have any issues. The database is quite big so I couldn't really provide much test data, but the underlying data was straight from Wharton Business School's WRDS academic research database (the table I'm using is CRSP.dsf). Any idea why Postgres isn't using the index?
[Edit] Ok, looks like it hugely depends on what Postgres thinks is in the temp table ts. Adding analyze ts; before the select made it instantaneous. This is bizarre but anyway...
There is also a primary key/index on the quote_date column in the ts table. Since you are doing an inner join here, the optimizer should be free to choose which table appears on the left and right sides of the join. Assuming it puts the stock_prices table on the left side, then it could take advantage of that index on the ts.quote_date column. But, in this case, it means it would be scanning the stock_prices table. So, perhaps the query plan changed when you switched versions of Postgres, but I don't see anything unexpected here. Postgres isn't using the index you asked about, because it seems to have found a better index/table to use.
As mentioned above, adding analyze ts solves the problem. It looks like statistics have a great deal of influence over how Postgres plans its queries.
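A minimal sketch of the fix, re-checking the plan once the statistics are refreshed:
ANALYZE ts;
EXPLAIN
SELECT p.*
FROM stock_prices p
JOIN ts t ON p.quote_date = t.quote_date;
-- with fresh statistics on ts, the planner should pick an index scan on stock_prices instead of a Seq Scan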

PostgreSQL - How to make a condition with records between the current record date and the same date plus 5 min?

I have something like this. With this part of the code I detect whether a vehicle stopped for at least 5 minutes.
It works, but with a large amount of data it starts to be slow.
I did a lot of tests and I'm sure that my problem is in the NOT EXISTS block.
My table:
CREATE TABLE public.messages
(
id bigint PRIMARY KEY DEFAULT nextval('messages_id_seq'::regclass),
messagedate timestamp with time zone NOT NULL,
vehicleid integer NOT NULL,
driverid integer NOT NULL,
speedeffective double precision NOT NULL,
-- ... few nonsense properties
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.messages OWNER TO postgres;
CREATE INDEX idx_messages_1 ON public.messages
USING btree (vehicleid, messagedate);
And my query:
SELECT
*
FROM
messages m
WHERE
m.speedeffective > 0
and m.next_speedeffective = 0
and not exists( -- my problem
select id
from messages
where
vehicleid = m.vehicleid
and speedeffective > 5 -- I forgot this condition
and messagedate > m.messagedate
and messagedate <= m.messagedate + interval '5 minutes'
)
I can't figure out how to build the condition in a more performant way.
Edit, day 2:
I added a preliminary CTE like this to use in the subquery:
WITH messagesx as (
SELECT
vehicleid,
messagedate
FROM
messages
WHERE
speedeffective > 5
)
and now it works better. I think that I'm just missing a little detail.
Typically, a 'NOT EXISTS' will slow down your query as it requires a full scan of the table for each of the outer rows. Try to incorporate the same functionality within a join (I'm trying to rewrite the query here, without knowing the table, so I might make a mistake here):
SELECT
*
FROM
messages m1
LEFT JOIN
messages m2
ON m1.vehicleid = m2.vehicleid AND m2.messagedate > m1.messagedate AND m2.messagedate <= m1.messagedate + interval '5 minutes'
WHERE
m1.speedeffective > 0
and m1.next_speedeffective = 0
and m2.vehicleid IS NULL
Take note that the NOT EXISTS is rewritten as a non-match of the join condition (the m2.vehicleid IS NULL check).
Based on this answer: https://stackoverflow.com/a/36445233/5000827
and reading about NOT IN, NOT EXISTS and LEFT JOIN (where the joined side is NULL):
For PostgreSQL, NOT EXISTS and LEFT JOIN ... IS NULL are both anti-joins and work the same way. (This is the reason why @CountZukula's answer performs almost the same as mine.)
The problem was the kind of join operation chosen: nested loop vs. hash.
So, based on this: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
I ran VACUUM ANALYZE on the messages table and the same query now runs much faster.
So, after the VACUUM, PostgreSQL can make better decisions.
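A minimal sketch of that maintenance step, re-checking the plan of the original query afterwards:
VACUUM ANALYZE messages;
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM messages m
WHERE m.speedeffective > 0
  and m.next_speedeffective = 0
  and not exists (
    select id
    from messages
    where vehicleid = m.vehicleid
      and speedeffective > 5
      and messagedate > m.messagedate
      and messagedate <= m.messagedate + interval '5 minutes'
  );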

Postgres using an index for one table but not another

I have three tables in my app, call them tableA, tableB, and tableC. tableA has fields for tableB_id and tableC_id, with indexes on both. tableB has a field foo with an index, and tableC has a field bar with an index.
When I do the following query:
select *
from tableA
left outer join tableB on tableB.id = tableA.tableB_id
where lower(tableB.foo) = lower(my_input)
it is really slow (~1 second).
When I do the following query:
select *
from tableA
left outer join tableC on tableC.id = tableA.tableC_id
where lower(tableC.bar) = lower(my_input)
it is really fast (~20 ms).
From what I can tell, the tables are about the same size.
Any ideas as to the huge performance difference between the two queries?
UPDATES
Table sizes:
tableA: 2061392 rows
tableB: 175339 rows
tableC: 1888912 rows
postgresql-performance tag info
Postgres version - 9.3.5
Full text of the queries are above.
Explain plans - tableB tableC
Relevant info from tables:
tableA
tableB_id, integer, no modifiers, storage plain
"index_tableA_on_tableB_id" btree (tableB_id)
tableC_id, integer, no modifiers, storage plain,
"index_tableA_on_tableB_id" btree (tableC_id)
tableB
id, integer, not null default nextval('tableB_id_seq'::regclass), storage plain
"tableB_pkey" PRIMARY_KEY, btree (id)
foo, character varying(255), no modifiers, storage extended
"index_tableB_on_lower_foo_tableD" UNIQUE, btree (lower(foo::text), tableD_id)
tableD is a separate table that is otherwise irrelevant
tableC
id, integer, not null default nextval('tableC_id_seq'::regclass), storage plain
"tableC_pkey" PRIMARY_KEY, btree (id)
bar, character varying(255), no modifiers, storage extended
"index_tableC_on_tableB_id_and_bar" UNIQUE, btree (tableB_id, bar)
"index_tableC_on_lower_bar" btree (lower(bar::text))
Hardware:
OS X 10.10.2
CPU: 1.4 GHz Intel Core i5
Memory: 8 GB 1600 MHz DDR3
Graphics: Intel HD Graphics 5000 1536 MB
Solution
Looks like running vacuum and then analyze on all three tables fixed the issue. After running the commands, the slow query started using "index_patients_on_foo_tableD".
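For reference, that amounts to the following (using the placeholder table names from the question):
VACUUM tableA; ANALYZE tableA;
VACUUM tableB; ANALYZE tableB;
VACUUM tableC; ANALYZE tableC;
-- or, in one step per table: VACUUM ANALYZE tableA; and so on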
The other thing is that you are querying your indexed columns wrapped in lower(), which a plain index on the bare column cannot be used for; that calls for an expression index.
If you will always query the column as lower() then your column should be indexed as lower(column_name) as in:
create index idx_1 on tableb(lower(foo));
Also, have you looked at the execution plan? This will answer all your questions if you can see how it is querying the tables.
Honestly, there are many factors to this. The best solution is to study up on INDEXES, specifically in Postgres, so you can see how they work. It is a bit of a holistic subject; you can't really solve all your problems with only a minimal understanding of how they work.
For instance, Postgres has an initial "let's look at these tables and see how we should query them" phase before the query runs. It looks over all the tables, how big each of them is, what indexes exist, etc., and then figures out how the query should run. THEN it executes it. Oftentimes, this is what is wrong: the engine incorrectly determines how to execute it.
A lot of these calculations are based on the summarized table statistics. You can refresh the summarized statistics for any table by doing:
vacuum [table_name];
(this helps to prevent bloating from dead rows)
and then:
analyze [table_name];
I haven't always seen this work, but oftentimes it helps.
Anyway, the best bet is to:
a) Study up on Postgres indexes (a SIMPLE write-up, not something ridiculously complex)
b) Study the execution plan of the query (a sketch follows below)
c) Using your understanding of Postgres indexes and of how the query plan is executed, you should be able to solve the exact problem.
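A sketch of step b), using the first slow query from the question ('my_input' stands in for the real parameter value):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM tableA
LEFT OUTER JOIN tableB ON tableB.id = tableA.tableB_id
WHERE lower(tableB.foo) = lower('my_input');
-- compare the Seq Scan / Index Scan nodes and see where the time is actually spent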
For starters, your LEFT JOIN is counteracted by the WHERE predicate on tableB and is forced to act like a plain [INNER] JOIN. Replace it with:
SELECT *
FROM tableA a
JOIN tableB b ON b.id = a.tableB_id
WHERE lower(b.foo) = lower(my_input);
Or, if you actually want the LEFT JOIN to include all rows from tableA:
SELECT *
FROM tableA a
LEFT JOIN tableB b ON b.id = a.tableB_id
AND lower(b.foo) = lower(my_input);
I think you want the first one.
An index on (lower(foo::text)) like you posted is syntactically invalid. You'd better post the verbatim output of \d tbl from psql, as I commented repeatedly. The shorthand syntax for a cast (foo::text) in an index definition needs an extra set of parentheses; or use the standard syntax cast(foo AS text):
Create index on first 3 characters (area code) of phone field?
But that's also unnecessary. You can just use the data type (character varying(255)) of foo. Of course, the data type character varying(255) rarely makes sense in Postgres to begin with. The odd limitation to 255 characters is derived from limitations in other RDBMS which do not apply in Postgres. Details:
Refactor foreign key to fields
Be that as it may. The perfect index for this kind of query would be a multicolumn index on B - if (and only if) you get index-only scans out of this:
CREATE INDEX "tableB_lower_foo_id" ON tableB (lower(foo), id);
You can then drop the mostly superseded index "index_tableB_on_lower_foo". Same for tableC.
The rest is covered by the (more important!) indices in table A on tableB_id and tableC_id.
If there are multiple rows in tableA per tableB_id / tableC_id, then either one of these competing commands can swing the performance to favor the respective query by physically clustering related rows together:
CLUSTER tableA USING "index_tableA_on_tableB_id";
CLUSTER tableA USING "index_tableA_on_tableC_id";
You can't have both. It's either B or C. CLUSTER also does everything a VACUUM FULL would do. But be sure to read the details first:
Optimize Postgres timestamp query range
And don't use mixed case identifiers, sometimes quoted, sometimes not. This is very confusing and is bound to lead to errors. Use legal, lower-case identifiers exclusively - then it doesn't matter if you double-quote them or not.