Postgres crashes when selecting from view

I have a view in Postgres with the following definition:
CREATE VIEW participant_data_view AS
SELECT participant_profile.*,
       "user".public_id, "user".created, "user".status, "user".region, "user".name, "user".email, "user".locale,
       (SELECT COUNT(id) FROM message_log WHERE message_log.target_id = "user".id AND message_log.type = 'diary') AS diary_reminder_count,
       (SELECT SUM(pills) FROM "order" WHERE "order".user_id = "user".id AND "order".status = 'complete') AS pills
FROM participant_profile
JOIN "user" ON "user".id = participant_profile.id;
The view creation works just fine. However, when I query the view with SELECT * FROM participant_data_view, Postgres crashes with
10:24:46.345 WARN HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#172d19fe marked as broken because of SQLSTATE(08006), ErrorCode(0) c.z.h.p.ProxyConnection
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
This question suggests to me that it might be an internal assertion that causes the crash.
If I remove the diary_reminder_count field from the view definition, the select works just fine.
What am I doing wrong? How can I fix my view, or change it so I can query the same data in a different way?
Note that creating the view works just fine; it only crashes when querying it.
I tried running explain (analyze) select * from participant_data_view; from the IntelliJ query console, which only returns
[2020-12-08 11:13:56] [08006] An I/O error occurred while sending to the backend.
[2020-12-08 11:13:56] java.io.EOFException
I ran the same using psql; there it returns
my-database=# explain (analyze) select * from participant_data_view;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The log files contain:
2020-12-08 10:24:01.383 CET [111] LOG: server process (PID 89670) was terminated by signal 9: Killed: 9
2020-12-08 10:24:01.383 CET [111] DETAIL: Failed process was running: select "public"."participant_data_view"."id", "public"."participant_data_view"."study_number", <snip many other fields>,
"public"."participant_data_view"."diary_reminder_count", "public"."participant
2020-12-08 10:24:01.383 CET [111] LOG: terminating any other active server processes

In all likelihood, the Linux kernel out-of-memory killer killed your query because the system ran out of memory.
Either restrict the number of database sessions (for example with a connection pool) or reduce work_mem.
It is usually a good idea to set vm.overcommit_memory = 2 in the kernel and tune vm.overcommit_ratio appropriately.
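A minimal sketch of the work_mem reduction, assuming superuser access; '4MB' is the PostgreSQL default and only a starting point:
-- Lower the per-sort/per-hash memory budget, then reload the configuration.
-- Tune the value to your RAM and the number of concurrent sessions.
ALTER SYSTEM SET work_mem = '4MB';
SELECT pg_reload_conf();
Independently of the kernel settings, you could also try making the view itself cheaper. One possible rewrite (a sketch, not tested against your schema) replaces the two correlated subqueries with pre-aggregated LEFT JOINs, so each table is aggregated once rather than the subqueries being evaluated per row; whether it actually lowers peak memory would need to be verified with EXPLAIN:
CREATE OR REPLACE VIEW participant_data_view AS
SELECT participant_profile.*,
       u.public_id, u.created, u.status, u.region, u.name, u.email, u.locale,
       COALESCE(ml.diary_reminder_count, 0) AS diary_reminder_count,
       o.pills
FROM participant_profile
JOIN "user" u ON u.id = participant_profile.id
LEFT JOIN (
    -- one row per user: number of 'diary' messages
    SELECT target_id, COUNT(id) AS diary_reminder_count
    FROM message_log
    WHERE type = 'diary'
    GROUP BY target_id
) ml ON ml.target_id = u.id
LEFT JOIN (
    -- one row per user: total pills across completed orders
    SELECT user_id, SUM(pills) AS pills
    FROM "order"
    WHERE status = 'complete'
    GROUP BY user_id
) o ON o.user_id = u.id;
COALESCE keeps diary_reminder_count at 0 for users with no diary messages, matching the original COUNT subquery, while pills stays NULL wherever the original SUM subquery would also have returned NULL.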

Related

Disable logging of logical replication statements in Postgres 13.1

I have a simple process that is reading logical replication messages from postgres. This process runs every second and generates a lot of messages in the postgres logs like:
2021-02-15 20:35:11.032 UTC [35] STATEMENT: SELECT * FROM pg_logical_slot_get_changes('lazy_cloud', NULL, NULL);
2021-02-15 20:35:11.032 UTC [35] LOG: logical decoding found consistent point at 0/167C618
2021-02-15 20:35:11.032 UTC [35] DETAIL: There are no running transactions.
I've configured logging with the following settings:
log_min_messages=ERROR
log_statement=none
log_replication_commands=0
But the logical replication logs are still produced.
Is there a setting to disable these messages? I could use sed or something like that, but would prefer a built-in solution.
There is no way to disable that message short of setting
log_min_messages = fatal
in postgresql.conf, but that is not a smart idea, because then you'd miss out on all error messages in the log file and essentially disable logging.
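For completeness, a sketch of how that (not recommended) change would be applied without editing postgresql.conf by hand; log_min_messages can be set with ALTER SYSTEM and takes effect on a configuration reload:
-- Not recommended, per the caveat above: this also silences ERROR and WARNING.
ALTER SYSTEM SET log_min_messages = 'fatal';
SELECT pg_reload_conf();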

PostgreSQL pg_database_size throwing an exception at random times

We are using Azure Database for PostgreSQL (the managed service) and create a DB for each user when the user registers with the application (fewer than 25 user databases right now).
For reporting purposes we need the size of each user's DB.
To retrieve the database sizes we have a Postgres function which fires the following query:
SELECT pg_database.datname, pg_database_size(pg_database.datname)
FROM pg_database;
We execute this function every hour through an Azure Function, but at random times Postgres throws exceptions:
Exception: Npgsql.PostgresException (0x80004005): 58P01: could not read directory "base/16452": No such file or directory at...
The exception is mostly the same, just with a different directory or file location each time.
Sometimes it also throws this exception:
Exception: Npgsql.NpgsqlException (0x80004005): Exception while reading from stream ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException
A solution is being worked on in the MSDN forums here.
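In the meantime, a hedged workaround sketch, on the assumption that the 58P01 error appears when a database is dropped between reading pg_database and sizing its directory: size each database inside a PL/pgSQL loop and skip any that vanish mid-run. The function name user_db_sizes is made up for illustration:
-- Collect sizes one database at a time, skipping databases whose
-- directory has disappeared (SQLSTATE 58P01 = undefined_file).
CREATE OR REPLACE FUNCTION user_db_sizes()
RETURNS TABLE (datname name, size_bytes bigint)
LANGUAGE plpgsql AS
$$
DECLARE
    d record;
BEGIN
    FOR d IN SELECT pg_database.datname FROM pg_database LOOP
        BEGIN
            datname := d.datname;
            size_bytes := pg_database_size(d.datname);
            RETURN NEXT;
        EXCEPTION WHEN undefined_file THEN
            -- the database was dropped while we were iterating; skip it
            NULL;
        END;
    END LOOP;
END;
$$;
Call it with SELECT * FROM user_db_sizes(); it returns the same two columns as the original query.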

PowerCenter SQL1224N error connecting to DB2

I'm running a workflow in PowerCenter that is constantly getting an SQL1224N error.
The process executes a query against one table (POLIZA) with 800k rows. It retrieves the first 10k rows and then starts querying another table with 75M rows; at that moment an idle-thread error appears in DB2, but the PowerCenter process keeps running and retrieves all 75M rows. When that completes (after 20 minutes), the following errors come up, related to the first table:
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
Database driver error...
Function Name : Fetch
SQL Stmt : SELECT POLIZA.BSPOL_BSCODCIA, POLIZA.BSPOL_BSRAMOCO
FROM POLIZA
WHERE
EXA01.POLIZA.BSPOL_IDEMPR='0015' for read only with ur
Native error code = -1224
DB2 Fatal Error].
I have a similar process running against the same 2 tables and it is working fine; the only difference I can see is that the DB2 user is different.
Any idea how I can fix this?
Regards
The common causes for -1224 are:
Your instance or database has crashed, or
Something/somebody is forcing off your application (FORCE APPLICATION or equivalent)
As for the crash, I think you would know by now. This typically requires a database or instance restart. At any rate, could you please have a look into your DIAGPATH to check for any FODC* directories whose timestamps match the timestamps of the -1224 errors?
As for the FORCE case, you should find some evidence of the -1224 in db2diag.log. Try searching for the decimal -1224, but also for its hex representation (0xFFFFFB38).

Postgres crashes for long query

My Postgres server crashes on a long query. It's on Debian 7 64-bit with postgresql-9.3.2, using all default configuration. Could anyone suggest what the problem could be? Thanks.
--part1:
SELECT r1.f2 as b, r1.e as l
FROM r r8,r r7,r r6,r r5,r r4,r r3,r r2,r r1
WHERE
r1.f2=r2.f1 AND
r1.f2=r3.f1 AND
r1.f2=r4.f1 AND
r1.f1=r5.f2 AND
r1.f1=r8.f1 AND
r2.f1=r3.f1 AND
r2.f1=r4.f1 AND
r2.f2=r6.f2 AND
r2.f2=r7.f1 AND
r3.f1=r4.f1 AND
r3.f2=r7.f2 AND
r3.f2=r8.f2 AND
r4.f2=r5.f1 AND
r4.f2=r6.f1 AND
r5.f1=r6.f1 AND
r5.f2=r8.f1 AND
r6.f2=r7.f1 AND
r7.f2=r8.f2 AND
r1.d=1 AND
r2.d=1 AND
r3.d=2 AND
r4.d=2 AND
r5.d=2 AND
r6.d=2 AND
r7.d=2 AND
r8.d=2
-- part2
group by r1.f2,r1.e
having
calc_empty_a() AND
calc_empty_b();
In the query, calc_empty_a() and calc_empty_b() are just empty boolean functions (they return true), so they should not cause any problem.
If I run the query from a client, the server crashes. There is no useful information in the log (see the error info at the end of the post).
If I run only the part 1 query, it works well.
If I first run the part 1 query and then run the whole query, it works well.
If I change the query to use fewer r tables, for example joining only r1 to r6 in FROM and deleting the predicates involving r7 and r8, but keeping part 2's GROUP BY and HAVING clause, the query still works well.
If the query has only one empty function in the HAVING clause, it also works well, but it crashes if there are two functions.
The following query works well
SELECT r1.f2 as b, r1.f1 as a , r1.e as e
FROM r r8,r r7,r r6,r r5,r r4,r r3,r r2,r r1
WHERE
r1.f2=r2.f1 AND
r1.f2=r3.f1 AND
r1.f2=r4.f1 AND
r1.f1=r5.f2 AND
r1.f1=r8.f1 AND
r2.f1=r3.f1 AND
r2.f1=r4.f1 AND
r2.f2=r6.f2 AND
r2.f2=r7.f1 AND
r3.f1=r4.f1 AND
r3.f2=r7.f2 AND
r3.f2=r8.f2 AND
r4.f2=r5.f1 AND
r4.f2=r6.f1 AND
r5.f1=r6.f1 AND
r5.f2=r8.f1 AND
r6.f2=r7.f1 AND
r7.f2=r8.f2
group by r1.f2,r1.f1, r1.e
having
calc_empty_a() AND
calc_empty_a();
I have set the OS to use strict overcommit mode:
sysctl -w vm.overcommit_memory=2
Error info:
at client
The connection to the server was lost. Attempting reset: Succeeded.
at server
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2014-11-07 16:47:03 GMT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/2126C98
LOG: record with zero length at 0/21A9D98
LOG: redo done at 0/21A9D68
LOG: last completed transaction was at log time 2014-11-07 16:47:26.844406+00
LOG: autovacuum launcher started
LOG: database system is ready to accept connections

SocketException in Mongo

I just set up a replica set in Mongo (prod environment). I'm now getting a lot of exceptions like below (clipped).
I went into mongo and ran a serverStatus command on my primary Mongo node; it only has about 300 connections going, so it's hardly doing any work.
Below are my connection option settings in my server code:
auto_connect_retry = false
connections_per_host = 10
threads_multiplier = 10
max_wait_time = 120000
connect_timeout = 10000
socket_timeout = 0
Do I have something mis-configured?
Sep 9, 2013 8:31:26 PM com.mongodb.DBPortPool gotError
WARNING: emptying DBPortPool to /10.0.8.10:27017 b/c of error
java.net.SocketException: Connection timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:142)
at com.mongodb.DBPort.call(DBPort.java:92)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:244)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:216)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:288)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
at com.mongodb.DBCollection.findOne(DBCollection.java:347)
at com.mongodb.DBCollection.findOne(DBCollection.java:332)
at com.mongodb.casbah.MongoCollectionBase$class.findOneByID(MongoCollection.scala:232)
at com.mongodb.casbah.MongoCollection.findOneByID(MongoCollection.scala:866)
at com.novus.salat.dao.SalatDAO.findOneById(SalatDAO.scala:353)
at com.novus.salat.dao.ModelCompanion$class.findOneById(ModelCompanion.scala:173)
Generally, a connection timeout occurs in a replica set for one of the following reasons:
1) The members are not able to communicate with each other
2) A program connecting to the replica set for an update cannot reach the primary, due to overload or to reason 1
3) The replicas are not in sync and one is lagging too far behind
4) A leader election is in progress but has not completed for some reason
Please check that your replica set is consistent and all nodes are working by issuing rs.status() on the primary node; also, as suggested earlier, check the primary's logs for more information.