PostgreSQL WAL get commit content

I'm using PostgreSQL 14.2 and am trying to replicate the inserts/updates/deletes for a database. Using
pg_waldump --rmgr=Transaction 000000010000000000000001
returns all transactions and I can see my commits, for example:
rmgr: Transaction len (rec/tot): 34/ 34, tx: 742, lsn: 0/0171DF08, prev 0/0171DE70, desc: ABORT 2022-10-05 08:50:55.236768 UTC
rmgr: Transaction len (rec/tot): 34/ 34, tx: 743, lsn: 0/0171E048, prev 0/0171DFA8, desc: COMMIT 2022-10-05 08:51:07.259488 UTC
I would like to access the contents of this commit, for example INSERT INTO table1 VALUES (5, 6);. Is there a way to access this?
The WAL level is set to replica. Or is the only way to do this by setting the level to logical and then using one of the logical decoding plugins, such as wal2json? If the level needs to be set to logical, how much would that increase the amount of WAL generated?

WAL does not contain logical information (SQL statements); it records which bytes in which block of which table are modified. You need logical decoding. How much more WAL you get when you set wal_level = logical depends on your workload and your table definitions. You'll have to try it.
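As a minimal sketch of what logical decoding gives you, assuming the test_decoding example plugin that ships in contrib and an arbitrary slot name my_slot (wal2json works the same way, just with JSON output); note that you get back decoded row images, not the original SQL text:
ALTER SYSTEM SET wal_level = 'logical';   -- requires a server restart to take effect
-- after the restart:
SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
INSERT INTO table1 VALUES (5, 6);
SELECT lsn, xid, data FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
-- the data column then contains lines roughly like (column names depend on your table):
--   BEGIN 744
--   table public.table1: INSERT: a[integer]:5 b[integer]:6
--   COMMIT 744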

Related

How to Resolve the Maximum Rejected Threshold was reached while reading data from SRCTable to TGTTable using ADF

I am getting the below-mentioned error while loading data from the Synapse SRC table to the Synapse TGT table.
SQLServerException: Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
Column ordinal: 26, Expected data type: VARCHAR(255) collate SQL_Latin1_General_CP1_CI_AS NOT
Please suggest how to overcome the above-mentioned issue.
The error may be due to data truncation in column 26 of your source file.
As a first check, I would suggest increasing the destination table column from VARCHAR(255) to VARCHAR(MAX) and then running the copy again.
ALTER TABLE TGT ALTER COLUMN [column 29] VARCHAR(MAX);
If it succeeds, you can run MAX(LEN(...)) on that destination table column to determine how big it really needs to be.
SELECT MAX(LEN([column 29])) FROM TGT
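As a hypothetical follow-up (not part of the original answer), once the real maximum length is known you can size the column back down instead of leaving it at VARCHAR(MAX); the 400 below is just an example value:
-- right-size the column once the actual maximum length is known
ALTER TABLE TGT ALTER COLUMN [column 29] VARCHAR(400);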
Some related reading about PolyBase copy:
https://medium.com/microsoftazure/azure-synapse-data-load-using-polybase-or-copy-command-from-vnet-protected-azure-storage-da8aa6a9ac68

Why does PostgreSQL think there is a conflict between the two serializable transactions?

I'm trying to figure out how the serializable isolation level in PostgreSQL works. In theory and according to PostgreSQL's own documentation PostgreSQL should be smart enough to somehow detect serialization conflicts and automatically roll back offending transactions. Yet when I tried to play with serializable isolation level myself I stumbled upon a lot of false positives and started to doubt my own understanding of the concept of serializability or PostgreSQL's implementation of it. Below you can find one of the simplest examples of such false positives:
create table mytab(
    class integer,
    value integer not null
);
create index mytab_class_idx on mytab (class);
insert into mytab (class, value) values (1, 10);
insert into mytab (class, value) values (1, 20);
insert into mytab (class, value) values (2, 100);
insert into mytab (class, value) values (2, 200);
The table data is the following:
 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200
Then I run two concurrent transactions. The step n comments in the code show the order in which I execute the statements. Following the advice from https://stackoverflow.com/a/42303225/3249257, I explicitly disabled sequential scans to force PostgreSQL to use the index:
SET enable_seqscan=off;
Transaction A:
begin; -- step 1
select sum(value) from mytab where class = 1; -- step 2
insert into mytab(class, value) values (3, 30); -- step 5
commit; -- step 7
Transaction B:
begin; -- step 3
select sum(value) from mytab where class = 2; -- step 4
insert into mytab(class, value) values (4, 300); -- step 6
commit; -- step 8
As I understand it, there shouldn't be any conflict between those two transactions. They don't touch the same rows. However, when I commit the second transaction it fails with this error:
[40001] ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during commit attempt.
Hint: The transaction might succeed if retried.
What's going on here? Is my understanding of serializable isolation level flawed? Is it a failure of PostgreSQL's heuristics mentioned in this answer https://stackoverflow.com/a/50809788/3249257?
I'm using PostgreSQL 11.5 on x86_64-apple-darwin18.6.0, compiled by Apple LLVM version 10.0.1 (clang-1001.0.46.4), 64-bit.
The problem here is with the predicate locks (SIReadLock) that PostgreSQL uses to figure out whether there is a conflict between concurrent transactions. If you run the query below while the transactions are running, you will see these locks:
select relation::regclass, locktype, page, tuple, pid from pg_locks
where mode = 'SIReadLock';
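Illustrative output while both transactions are still open (pids and page numbers will differ, and you will also see locks on the heap tuples that were read); the lines that matter here are the page locks on the index:
    relation     | locktype | page | tuple |  pid
-----------------+----------+------+-------+-------
 mytab_class_idx | page     |    1 |       | 12345
 mytab_class_idx | page     |    1 |       | 12346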
In this case, the issue was with page locks on the mytab_class_idx index. If the concurrent transactions happen to acquire a lock on the same page of the mytab_class_idx relation, a serialization conflict occurs. If they acquire locks on different pages, they both commit successfully.
With as little data as in the question above, the index entries for all rows fall on the same page, so a serialization conflict is inevitable. For big enough tables, serialization conflicts will happen rarely, though not as rarely as they could.
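A rough way to see this for yourself (a sketch, not part of the original setup): pad the table so that the class = 1 and class = 2 index entries no longer share an index page, then repeat the two transactions.
-- spread the index entries for class 1 and class 2 across many pages of mytab_class_idx
insert into mytab (class, value) select 1, g from generate_series(1, 10000) g;
insert into mytab (class, value) select 2, g from generate_series(1, 10000) g;
analyze mytab;
-- re-running transactions A and B now typically lets both commit, because their
-- SIReadLocks land on different index pages (verify with the pg_locks query above)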

What does "tuple (0,79)" in postgres log file mean when a deadlock happened?

In postgres log:
2016-12-23 15:28:14 +07 [17281-351 trns: 4280939, vtrns: 3/20] postgres#deadlocks HINT: See server log for query details.
2016-12-23 15:28:14 +07 [17281-352 trns: 4280939, vtrns: 3/20] postgres#deadlocks CONTEXT: while locking tuple (0,79) in relation "account"
2016-12-23 15:28:14 +07 [17281-353 trns: 4280939, vtrns: 3/20] postgres#deadlocks STATEMENT: SELECT id FROM account where id=$1 for update;
When I provoke a deadlock, I can see the text tuple (0,79).
As far as I know, a tuple is just a row in a table, but I don't understand what (0,79) means. I have only 2 rows in the account table; it's just a toy, self-learning application.
So what does (0,79) mean?
(0,79) is a value of the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.
Read https://www.postgresql.org/docs/current/static/datatype-oid.html
It means block number 0, row index 79 within that block.
Also read http://rachbelaid.com/introduction-to-postgres-physical-storage/
Also run SELECT id, ctid FROM account WHERE id = $1 with the right $1 to check it out.
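For example (illustrative output; the actual ctid values depend on where the rows currently sit in your table):
SELECT id, ctid FROM account;
 id | ctid
----+--------
  1 | (0,1)
  2 | (0,79)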

How do I track transactions in PostgreSQL log?

I recently inherited an application that uses PostgreSQL and I have been troubleshooting issues relating to saving records in the database.
I know that PostgreSQL allows me to record the transaction ID in the log by including the special value of %x in the log_line_prefix. What I noticed, however, is that the first statement that occurs within a transaction always gets logged with a zero.
If I perform the following operations in psql,
begin;
insert into numbers values (1);
insert into numbers values (2);
commit;
the query log will contain the following entries:
2016-09-20 03:07:40 UTC 0 LOG: statement: begin;
2016-09-20 03:07:53 UTC 0 LOG: statement: insert into numbers values (1);
2016-09-20 03:07:58 UTC 689 LOG: statement: insert into numbers values (2);
2016-09-20 03:08:03 UTC 689 LOG: statement: commit;
My log format is %t %x and as you can see, the transaction ID for the first insert statement is 0, but it changes to 689 when I execute the second insert.
Can anyone explain why after starting a transaction PostgreSQL doesn't log the right transaction ID on the first statement? Or if I'm just doing this wrong, is there a more reliable way of identifying which queries were part of a single transaction by looking at the log file?
The transaction ID is only assigned after the statement has already started, so log_statement doesn't capture it. BEGIN doesn't assign a transaction ID; that is delayed until the first write operation.
Use the virtual transaction ID instead; its placeholder is %v. Virtual txids are assigned immediately, but they are not persistent and are backend-local.
I find it useful to log both: the txid because it matches up with the xmin and xmax system column contents, etc.; the vtxid to help group up the operations done in each transaction.
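A minimal sketch of such a prefix, assuming you set it via ALTER SYSTEM (the exact format string is just one possibility):
ALTER SYSTEM SET log_line_prefix = '%t [txid=%x vtxid=%v] ';
SELECT pg_reload_conf();  -- log_line_prefix only needs a reload, not a restart
-- subsequent log lines then look roughly like:
-- 2016-09-20 03:07:53 UTC [txid=0 vtxid=3/20] LOG:  statement: insert into numbers values (1);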

PostgreSQL 9.4 suddenly invalid memory alloc request size

I'm building a website which will be used to handle Excel files from stores and manipulate them (merging, viewing, etc.). I'm using PostgreSQL 9.4 for the database, running on a CentOS 6.6 VM with 4GB RAM. It has 3 databases as follows:
the postgres database
db_raw, which is used as a placeholder for the data. The Excel files uploaded from the website are parsed, and the data is stored here. The database consists of a few tables used to keep the data required to process the Excel files, and a huge table for storing the Excel data with currently >140 columns and almost 1 million rows.
db_processed, which is the main database for the website. It has a few small tables for the operation of the website (user table, access list, logging, etc.), and 8 tables to store the processed Excel data from db_raw. Each of the 8 tables has around 40 columns and about a million rows.
The databases were running fine until this morning. I tried connecting to db_processed through pgAdmin and PuTTY, and PostgreSQL gave me this message:
FATAL: invalid memory alloc request size 144115188075856068
db_raw works fine, and nothing has been changed in the last 3 days as far as I know. What should I do so I can connect to the database again?
Update: I did what Craig Ringer said and restarted the service. I managed to connect to the database, but all the tables are gone. Now this keeps appearing in the log:
< 2015-09-21 12:27:22.155 WIB >DEBUG: performing replication slot checkpoint
< 2015-09-21 12:27:22.158 WIB >LOG: request to flush past end of generated WAL; request 46/9E0981D8, currpos 46/771C69B0
< 2015-09-21 12:27:22.158 WIB >CONTEXT: writing block 2 of relation base/18774/12766
< 2015-09-21 12:27:22.158 WIB >ERROR: xlog flush request 46/9E0981D8 is not satisfied --- flushed only to 46/771C69B0
< 2015-09-21 12:27:22.158 WIB >CONTEXT: writing block 2 of relation base/18774/12766
< 2015-09-21 12:27:22.158 WIB >WARNING: could not write block 2 of base/18774/12766
< 2015-09-21 12:27:22.158 WIB >DETAIL: Multiple failures --- write error might be permanent.
It is caused by corrupted rows.
Create a function to "detect" the rows that are corrupted (it relies on the hstore extension, so run CREATE EXTENSION IF NOT EXISTS hstore; first):
CREATE OR REPLACE FUNCTION is_bad_row(tableName TEXT, tabName TEXT, tidInitial tid)
RETURNS integer
AS $find_bad_row$
BEGIN
    -- force every column of the row to be read and decoded by expanding it through hstore;
    -- a corrupted row raises an error here
    EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM ' || tableName || ' WHERE ctid = $1' USING tidInitial;
    RETURN 0;
EXCEPTION
    WHEN OTHERS THEN
        -- report the ctid together with the error, and flag the row as bad
        RAISE NOTICE '% = %: %', tidInitial, SQLSTATE, SQLERRM;
        RETURN 1;
END;
$find_bad_row$
LANGUAGE plpgsql;
... and then create a "temp table" to store the ctid of the bad rows:
create table bad_rows as
SELECT ctid as row_tid
FROM your_schema.your_table
WHERE is_bad_row('your_schema.your_table', 'your_table', ctid) = 1;
... and after that you just need to delete those rows:
delete from your_schema.your_table where ctid in (select row_tid from bad_rows);
... and remove the "temp table":
drop table bad_rows;