Postgres pg_dump fails with "invalid memory alloc request size" - postgresql

My automated pg_dump process has been failing when attempting to back up a Postgres database. The error message I'm receiving:

pg_dump: Error message from server: ERROR: invalid memory alloc request size 1249770967
pg_dump: The command was: COPY public.data_store (id, length, last_modified, data) TO stdout;
Custom backup of jackrabbit
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: invalid memory alloc request size 1249770967
pg_dump: The command was: COPY public.data_store (id, length, last_modified, data) TO stdout;
[!!ERROR!!] Failed to produce custom backup database jackrabbit
Plain backup of pdi_logging
Custom backup of pdi_logging
Plain backup of postgres
Custom backup of postgres
Plain backup of quartz
Custom backup of quartz
Everything I found pointed to corrupt data in the table, so I created a function that walks the table and reports the ctid of the last readable row, so that I can find the culprit and delete the corrupt row.
Function:
CREATE OR REPLACE FUNCTION find_bad_row(tableName text)
  RETURNS tid AS
$BODY$
DECLARE
    result  tid;
    curs    REFCURSOR;
    row1    RECORD;
    row2    RECORD;
    tabName TEXT;
    count   BIGINT := 0;
BEGIN
    -- strip any schema qualifier; the bare table name doubles as the row alias below
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;
    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        result := row1.ctid;
        count := count + 1;
        FETCH curs INTO row1;

        -- force every column of the row to be read (and detoasted);
        -- corrupt data makes this statement raise an error
        EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                || tableName || ' WHERE ctid = $1' INTO row2
                USING row1.ctid;

        IF count % 100000 = 0 THEN
            RAISE NOTICE 'rows processed: %', count;
        END IF;
    END LOOP;

    CLOSE curs;
    RETURN row1.ctid;

EXCEPTION
    WHEN OTHERS THEN
        RAISE NOTICE 'LAST CTID: %', result;
        RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
        RETURN result;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION find_bad_row(text)
  OWNER TO pentaho;
After calling the function with select find_bad_row('public.data_store'), the result given was (0,2). I located that ctid with select ctid, * from public.data_store and deleted the row preceding it. I then executed my pg_dump script and received the same error. Re-running the function on the table again returns the first row. Being new to Postgres: is my approach altogether wrong, and is there another way to resolve this? Could the entire table be corrupt?
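For context, a minimal sketch of the inspect-and-delete-by-ctid step I was attempting (the ctid values are from my run and are assumptions about where the corrupt row sits; since the function returns the last readable ctid, the suspect row is the one after it, not before):

-- inspect the neighbourhood of the reported ctid, skipping the bytea
-- column so the corrupt TOAST data does not have to be read
SELECT ctid, id, length, last_modified
FROM public.data_store
WHERE ctid IN ('(0,2)'::tid, '(0,3)'::tid);

-- delete a physical row by its ctid; this does not detoast the data,
-- so it usually succeeds even for the corrupt row
DELETE FROM public.data_store WHERE ctid = '(0,3)'::tid;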

Related

What is the equivalent of PL/SQL %ISOPEN in PL/pgSQL?

I'm migrating an Oracle PL/SQL stored procedure to be compatible with Postgres PL/pgSQL (version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit).
The exception block of the PL/SQL procedure has the below code:
exception
    when others then
        if CURR1%isopen then
            close SPV_RECON_INFO;
        end if;
        open CURR1 for execute select sysdate from dual;
END;
How can %isopen be implemented in Postgres?
That is simple. You have to assign a name to the cursor variable, then you can search for that cursor in pg_cursors. If there is a row with that name, the cursor is open.
Here is a self-contained example:
DO
$$DECLARE
    c refcursor;
BEGIN
    c := 'mycursor';
    /* cursor is not open, EXISTS returns FALSE */
    RAISE NOTICE '%', EXISTS (SELECT 1 FROM pg_cursors WHERE name = 'mycursor');
    OPEN c FOR SELECT * FROM pg_class;
    /* cursor is open, EXISTS returns TRUE */
    RAISE NOTICE '%', EXISTS (SELECT 1 FROM pg_cursors WHERE name = 'mycursor');
END;$$;
NOTICE: f
NOTICE: t
If you do not assign a name, PostgreSQL will generate one (you won't know the name in advance, although after OPEN it can be read back from the refcursor variable).
PostgreSQL cursors do not support %ISOPEN or %NOTFOUND. As a workaround, %ISOPEN can be replaced by a boolean variable declared in the procedure and updated manually wherever the cursor is opened or closed.
http://wiki.openbravo.com/wiki/PL-SQL_code_rules_to_write_Oracle_and_Postgresql_code
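A minimal, self-contained sketch of that boolean-flag substitute (the cursor and flag names are placeholders); the flag has to be kept in sync by hand at every OPEN and CLOSE:

DO
$$DECLARE
    curr1 refcursor;
    curr1_isopen boolean := false;  -- manually tracked stand-in for %ISOPEN
    r record;
BEGIN
    OPEN curr1 FOR SELECT relname FROM pg_class LIMIT 1;
    curr1_isopen := true;           -- update the flag together with OPEN
    FETCH curr1 INTO r;
    RAISE NOTICE 'fetched: %', r.relname;
    IF curr1_isopen THEN            -- replaces "IF curr1%ISOPEN THEN"
        CLOSE curr1;
        curr1_isopen := false;      -- and together with CLOSE
    END IF;
END;$$;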
I have often found it convenient in cases like this to create a function that emulates Oracle. In this case, something like:
create or replace function cursor_isopen(cur text)
returns boolean
language sql
as $$
    select exists (select null
                   from pg_cursors
                   where name = cur);
$$;
Then your code becomes something like:
exception
    when others then
        if cursor_isopen(cur_name::text) then
            close SPV_RECON_INFO;
        end if;
Of course you need to have preset the cursor name as Laurenz Albe has pointed out. Sample test case.
do $$
declare
    cur1 cursor for select table_name from information_schema.tables;
    cur2 cursor for select table_name from information_schema.tables;
    tname text;  -- a variable named table_name would clash with the
                 -- column of the same name in the cursor queries
begin
    cur1 := 'closed-cursor';
    cur2 := 'open-cursor';
    open cur2;

    if cursor_isopen(cur1::text) then
        fetch cur1 into tname;
        raise notice 'First table name: %', tname;
        close cur1;
    else
        raise notice 'cursor_isopen(''%'') returned %', cur1::text, cursor_isopen(cur1::text);
    end if;

    if cursor_isopen(cur2::text) then
        fetch cur2 into tname;
        raise notice 'First table name: %', tname;
        close cur2;
    else
        raise notice 'cursor_isopen(''%'') returned %', cur2::text, cursor_isopen(cur2::text);
    end if;
end;
$$;
Results:
cursor_isopen('closed-cursor') returned f
cursor_isopen('open-cursor') returned t, and the first table name fetched was task_states

Fixing invalid memory alloc request at PostgreSQL 9.2.9

I've encountered a problem querying some of my tables recently. When I try to select data, I get an error telling me: ERROR: invalid memory alloc request size 4294967293. This generally indicates data corruption. There is a nice and precise technique for deleting corrupted rows described here: https://confluence.atlassian.com/jirakb/invalid-memory-alloc-request-size-440107132.html
But since I have lots of corrupted tables, that method is too slow. So I found a nice function that returns the last successful ctid here: http://blog.dob.sk/2012/05/19/fixing-pg_dump-invalid-memory-alloc-request-size/
Looking for a corrupted row is a bit faster when using it, but not fast enough. I slightly modified it to store every "last successful ctid" in a separate table, and now it looks like this:
CREATE OR REPLACE FUNCTION
find_bad_row(tableName TEXT)
RETURNS void
as $find_bad_row$
DECLARE
    result  tid;
    curs    REFCURSOR;
    row1    RECORD;
    row2    RECORD;
    tabName TEXT;
    count   BIGINT := 0;
BEGIN
    DROP TABLE IF EXISTS bad_rows_tbl;
    CREATE TABLE bad_rows_tbl (id varchar(255), offs BIGINT);

    -- strip any schema qualifier; the bare name doubles as the row alias below
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;
    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        BEGIN
            result = row1.ctid;
            count := count + 1;
            FETCH curs INTO row1;

            -- force every column to be read (and detoasted); corrupt
            -- data makes this statement raise an error
            EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                    || tableName || ' WHERE ctid = $1' INTO row2
                    USING row1.ctid;

            IF count % 100000 = 0 THEN
                RAISE NOTICE 'rows processed: %', count;
            END IF;
        EXCEPTION
            WHEN SQLSTATE 'XX000' THEN
                RAISE NOTICE 'LAST CTID: %', result;
                -- quote the tid literal; unquoted it is not valid SQL
                EXECUTE 'INSERT INTO bad_rows_tbl VALUES('
                        || quote_literal(result) || ',' || count || ')';
        END;
    END LOOP;

    CLOSE curs;
END
$find_bad_row$
LANGUAGE plpgsql;
I'm quite new to PL/pgSQL, so I'm stuck on the following question: how can I capture not the last successful ctid but the failing one itself (or calculate the next ctid from the last successful one), so that I can insert it into bad_rows_tbl and later use it as the argument of a DELETE statement?
Hope for some help...
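For the "calculate the next one" option, here is a hypothetical helper (not from the original post) that builds the ctid of the tuple immediately following a given one; the caveat is that if the last good tuple was the final one on its heap page, the failing tuple actually starts the next page:

-- (page, tuple) -> (page, tuple + 1)
CREATE OR REPLACE FUNCTION next_ctid(last_good tid)
RETURNS tid AS $$
    SELECT format('(%s,%s)', parts[1], parts[2]::int + 1)::tid
    FROM (SELECT string_to_array(btrim(last_good::text, '()'), ',') AS parts) s;
$$ LANGUAGE sql;

-- example: SELECT next_ctid('(0,2)');  yields (0,3)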
UPD: here is the function I ended up with:
CREATE OR REPLACE FUNCTION
find_bad_row(tableName TEXT)
RETURNS tid[]
as $find_bad_row$
DECLARE
    result    tid;
    curs      REFCURSOR;
    row1      RECORD;
    row2      RECORD;
    tabName   TEXT;
    youNeedMe BOOLEAN = false;
    count     BIGINT := 0;
    arrIter   BIGINT := 0;
    arr       tid[];
BEGIN
    CREATE TABLE bad_rows_tbl (id varchar(255), offs BIGINT);

    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;
    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        BEGIN
            result = row1.ctid;
            count := count + 1;

            -- the previous iteration failed, so the ctid fetched right
            -- after the failure is the corrupted row itself: remember it
            IF youNeedMe THEN
                arr[arrIter] = result;
                arrIter := arrIter + 1;
                RAISE NOTICE 'ADDING CTID: %', result;
                youNeedMe = FALSE;
            END IF;

            FETCH curs INTO row1;

            EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                    || tableName || ' WHERE ctid = $1' INTO row2
                    USING row1.ctid;

            IF count % 100000 = 0 THEN
                RAISE NOTICE 'rows processed: %', count;
            END IF;
        EXCEPTION
            WHEN SQLSTATE 'XX000' THEN
                RAISE NOTICE 'LAST GOOD CTID: %', result;
                youNeedMe = TRUE;
        END;
    END LOOP;

    CLOSE curs;
    RETURN arr;
END
$find_bad_row$
LANGUAGE plpgsql;
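A hedged usage sketch (the table name is a placeholder): deleting by ctid does not have to detoast the corrupted data, so it usually succeeds where a plain SELECT fails:

-- delete every corrupted row the function found
DELETE FROM public.your_table
WHERE ctid = ANY (find_bad_row('public.your_table'));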
This is supplemental to the function given in the question, and covers the next steps to take once the database is dumpable.
Your next steps should be:
Dump (pg_dumpall) and restore on a physically different system. The reason: at this point we don't know what caused this, and the chances are not too bad that it is hardware.
You need to take the old system down and run hardware diagnostics on it, looking for problems. You really want to find out what happened so you don't run into it again. Of particular interest:
Double-check ECC RAM and MCE logs
Look at all RAID arrays and their battery backups
CPUs and PSUs
If it were me, I would also look at environmental factors such as incoming AC power and datacenter temperature.
Go over your backup strategy. In particular, look at PITR (and the related utility pgbarman). Make sure you can recover from a similar situation in the future if you run into it.
Data corruption doesn't just happen. In rare cases it can be caused by bugs in PostgreSQL, but in most cases it is due to your hardware or to custom code you have running in the back-end. Narrowing down the cause and ensuring recoverability are critical going forward.
Assuming you aren't running custom C code in your database, your data corruption is most likely due to something on the hardware side.

Function or loop using table and topology names as arguments in PostgreSQL

I'm working with topologies in PostGIS and to create a TopoGeometry column, I'm using this loop:
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM table_uf_11 LOOP
        BEGIN
            UPDATE table_uf_11
            SET tg_geom = toTopoGeom(ST_Force2D(geom), 'topology_uf_11', 1, 1)
            WHERE gid = r.gid;
        EXCEPTION
            WHEN OTHERS THEN
                RAISE WARNING 'Loading of record % failed: %', r.gid, SQLERRM;
        END;
    END LOOP;
END$$;
The reason for using this loop is that the toTopoGeom function raises an error on some rows, but these are just a few cases, for example 38 out of 24,000.
With this structure I can identify the problematic cases in the log and fix them later.
My problem is that I have another 26 tables, each with its own topology, all identified by a state code, for example:
table_uf_12 / topology_uf_12
table_uf_13 / topology_uf_13
table_uf_14 / topology_uf_14
...
table_uf_53 / topology_uf_53
The state codes are not necessarily sequential, but the names follow the same pattern, and the column names (geom and tg_geom) are the same in all tables.
How can I write a function or another loop structure that replicates this process across all 27 tables and at the same time saves a log for each table?
I tried to write a function, but in that case the arguments would be the table name and the topology name, and I'm having difficulty putting this structure together.
Any suggestions?
I think this should do it:
DO $BODY$
DECLARE
    t   regclass;
    gid bigint;
BEGIN
    -- visit every table whose name matches the pattern
    FOR t IN SELECT oid::regclass FROM pg_class WHERE relname ~ '^table_uf_\d+$' LOOP
        FOR gid IN EXECUTE 'SELECT gid FROM ' || t::text LOOP
            BEGIN
                -- derive the topology name from the table name
                EXECUTE
                   ' UPDATE ' || t::text ||
                   ' SET tg_geom = toTopoGeom(ST_Force2D(geom), $2, 1, 1)'
                   ' WHERE gid = $1'
                USING gid, replace(t::text, 'table', 'topology');
            EXCEPTION
                WHEN OTHERS THEN
                    RAISE WARNING 'Loading of record % failed: %', gid, SQLERRM;
            END;
        END LOOP;
    END LOOP;
END
$BODY$;

Load contents from a CSV file into a PostgreSQL table

Below is a description of the procedure I went through while trying to load data from a file into a PostgreSQL 8.0 database running on a Linux Red Hat 7.2 host.
Now, my issue is that the FOR EACH ROW trigger is getting called and the procedure is executing once per inserted row.
What I'd like instead is for it to check the appropriate row of my table once I have supplied the filename, and decide based on the contents of that record whether to do a DUMP BULK DATA or a DUMP WHOLE CSV FILE, only once per load rather than on every row.
Please help me solve this issue...
My logfile.tmp is as follows:
27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0##$http://localhost/images/banner-left.jpg##$10.1ff.ff.ff#-#Y#-
27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0##$http://localhost/images/banner-left.jpg##$10.ff.ff.2ff05#-#Y#-
The COPY command I am using:
/usr/local/pgsql/bin/psql localhost -d d1 -U u1 -tc "COPY tblaccesslog ( accesstime, clientip, username, request,bytes, urlpath, url, contenttype, issite, webcatname) FROM 'logfile.tmp' WITH DELIMITER AS '#';" >> /tmp/parselog.log 2>&1
The trigger (insert_accesslog_trigger) in question:
insert_accesslog_trigger BEFORE INSERT ON tblaccesslog FOR EACH ROW EXECUTE PROCEDURE accesslog_insert_trigger()
and finally the trigger function (accesslog_insert_trigger()) being used:
accesslog_insert_trigger()

DECLARE
    tablemaxtuples NUMERIC(10);
    tableno        NUMERIC(10);
    newtable       TEXT;
    query          TEXT;
    tablecount     NUMERIC(10);
    min_limit      NUMERIC(10);
    max_limit      NUMERIC(10);
BEGIN
    tablemaxtuples := 100000;
    tableno := (NEW.id - (NEW.id % tablemaxtuples)) / tablemaxtuples + 1;
    newtable := 'tblaccesslog' || to_char(CURRENT_DATE, 'YYYYMMDD') || '_child_' || tableno;

    SELECT count(tablename) INTO tablecount FROM pg_tables WHERE tablename = newtable;

    IF tablecount = 0 THEN
        min_limit := (tableno - 1) * tablemaxtuples;
        max_limit := min_limit + tablemaxtuples;
        query := 'CREATE TABLE ' || newtable
              || '( PRIMARY KEY (id), CHECK ( id >= ' || min_limit
              || ' AND id < ' || max_limit || ' ) ) INHERITS (tblaccesslog)';
        EXECUTE query;
    END IF;

    query := 'INSERT INTO ' || newtable
          || ' ( id, username, clientip, url, accesstime, requestbytes, contenttype, issite, urlpath, webcatname ) VALUES ('
          || NEW.id || ',''' || NEW.username || ''',''' || NEW.clientip || ''','''
          || NEW.url || ''',''' || NEW.accesstime || ''',''' || NEW.requestbytes || ''','''
          || NEW.contenttype || ''',''' || NEW.issite || ''','''
          || replace(NEW.urlpath, '\'', '') || ''',''' || NEW.webcatname || ''')';
    EXECUTE query;

    RETURN NULL;
END;
The PostgreSQL documentation overview of triggers makes clear that there is no type of trigger that suits your requirements: a FOR EACH ROW trigger will, as its name says, be executed once for each row, and as the manual page states "Statement-level triggers do not currently have any way to examine the individual row(s) modified by the statement."
However, what you can do instead is put your actual COPY command inside a function. The function could COPY the file into a temporary table and then perform the appropriate steps to determine where the data should go from there.
Then your copy command (which I'm guessing is in a cron job or similar) would just run SELECT bulk_insert_access_log(); rather than the long line currently listed.
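A minimal sketch of what such a function might look like (the staging-table name and file path are assumptions, the column list is taken from the COPY command above, and tblaccesslog.id is assumed to be auto-generated; note that a server-side COPY FROM a file reads the file on the database server and requires superuser rights on 8.0):

CREATE OR REPLACE FUNCTION bulk_insert_access_log() RETURNS void AS
$$
BEGIN
    -- stage the raw file into a temporary table first
    CREATE TEMP TABLE staging_accesslog (LIKE tblaccesslog) ON COMMIT DROP;

    COPY staging_accesslog (accesstime, clientip, username, request, bytes,
                            urlpath, url, contenttype, issite, webcatname)
        FROM '/tmp/logfile.tmp' WITH DELIMITER AS '#';

    -- the staged rows can now be inspected once and routed as needed;
    -- the simplest case moves everything into the real table (any row-level
    -- trigger on tblaccesslog still fires and can route each row)
    INSERT INTO tblaccesslog (accesstime, clientip, username, request, bytes,
                              urlpath, url, contenttype, issite, webcatname)
    SELECT accesstime, clientip, username, request, bytes,
           urlpath, url, contenttype, issite, webcatname
    FROM staging_accesslog;
END;
$$ LANGUAGE plpgsql;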

postgresql: USING CURSOR for extracting data from one database and inserting them to another

Here is another algorithm using a cursor, but I'm having a hard time fixing its errors...
CREATE OR REPLACE FUNCTION extractstudent()
RETURNS VOID AS
$BODY$
DECLARE
    studcur SCROLL CURSOR FOR SELECT fname, lname, mname, address FROM student;
BEGIN
    OPEN studcur;
    LOOP
        -- fetching 1 row at a time
        FETCH FIRST FROM studcur;
        -- every row fetched is inserted into another database on the local site;
        -- myconT is the name of the connection to the other database
        EXECUTE 'SELECT * from dblink_exec(''myconT'', ''insert into temp_student values(studcur)'')';
        -- move to the next row and execute again
        MOVE NEXT FROM studcur;
        -- exit when the row content is already empty
        EXIT WHEN studcur IS NULL;
    END LOOP;
    CLOSE studcur;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION extractstudent() OWNER TO postgres;
You rarely need to explicitly use cursors in PostgreSQL or PL/pgSQL. What you've written looks suspiciously like a SQL Server cursor-loop construct, and you don't need to do that. Also, you can use PERFORM instead of EXECUTE to run a query and discard the results: this avoids re-parsing the query each time (although it can't stop dblink from parsing the query each time).
You can do something more like this:
DECLARE
    rec student%rowtype;
BEGIN
    FOR rec IN SELECT * FROM student LOOP
        PERFORM dblink_exec('myconT',
            'insert into temp_student values ('
            || quote_nullable(rec.fname) || ','
            || quote_nullable(rec.lname) || ','
            || quote_nullable(rec.mname) || ','
            || quote_nullable(rec.address) || ')');
    END LOOP;
END;
Why not try it yourself? Working from the errors, you can solve them step by step!