Postgres to postgres pgloader MATERIALIZE VIEWS - postgresql

I'm trying to understand the docs at https://pgloader.readthedocs.io/en/latest/ref/pgsql.html#materialize-views.
How would I write a pgloader command file that uses this clause? What I have is:
load database
from pgsql://user:test@127.0.0.1:8081/db
into pgsql://user2:test@127.0.0.1:8082/db2
WITH testview AS
$$
SELECT
*
FROM data
$$;
I don't understand how to use this clause. The example above cannot be parsed by pgloader.
Edit: in this example I have a table named data in db, and I'm just testing pgloader's materialize views feature.
Edit2: OK, the WITH should be MATERIALIZED VIEWS:
load database
from pgsql://user:test@127.0.0.1:8081/db
into pgsql://user2:test@127.0.0.1:8082/db2
MATERIALIZED VIEWS testview AS
$$
SELECT
*
FROM data
$$;
I'm still getting errors such as FATAL error: Database error 42703: column "adsrc" does not exist.
Edit3: it seems the Debian package (e.g. for ubuntu-20.04) has not been updated to counter this bug: https://github.com/dimitri/pgloader/issues/1034. The pgloader Docker image also seems to be broken: I can't get it to read the command file, similar to issue https://github.com/dimitri/pgloader/issues/889.

Related

Postgres SQL ERROR: XX001: invalid page in block

This error has just started popping up when I run queries against TABLE_A:
ERROR: XX001: invalid page in block 38 of relation pg_tblspc/16402/PG_14_202107181/16404/125828
If I try a very simple query against the same table for example SELECT * FROM TABLE_A I get a similar error....
ERROR: invalid memory alloc request size 18446744073709551613
SQL state: XX000
Or another similar query select count(*) from TABLE_A gives me....
ERROR: could not access status of transaction 917520
DETAIL: Could not open file "pg_xact/0000": No such file or directory.
SQL state: 58P01
Based on this thread I tried this fix....
SET zero_damaged_pages = on;
VACUUM full TABLE_A;
REINDEX TABLE TABLE_A;
The 2nd command, VACUUM full TABLE_A produced another related error....
ERROR: found xmax 16384 from before relfrozenxid 379279265
SQL state: XX001
I think all these problems boil down to a simple case of file corruption at the OS level. I do have the ability to drop and re-create this table, but before I start I'd like to know if there's a quicker/simpler solution, and if there's any way of stopping this from happening again.

How can I reproduce a database context to debug a tricky PostgreSQL error: "variable not found in subplan target list"

I am facing a tricky error with a PostgreSQL Database that suddenly popped up and I cannot reproduce elsewhere.
The error happened suddenly without any known maintenance or upgrade and seems to be related to a specific database context.
Documentation
The bug seems to go back and forth, here is a list of links found when searching over the web for the error message:
February 2015: How to fix "InternalError: variable not found in subplan target list"
October 2017: query error: variable not found in subplan target list (when using PG 9.6.2)
February 2022: PGroonga index-only scan problem with yesterday’s PostgreSQL updates
June 2022 (my report): Sudden database error with COUNT(*) making Query Planner crashes: variable not found in subplan target list
The product version on which I detected the error is:
SELECT version();
-- PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
And the only extensions I have installed are:
SELECT extname, extversion FROM pg_extension;
-- "plpgsql" "1.0"
-- "postgis" "3.1.1"
Symptom
The main symptom is the error variable not found in subplan target list:
SELECT COUNT(*) FROM items;
-- ERROR: variable not found in subplan target list
-- SQL state: XX000
It does not affect all tables, only some specific ones.
Interestingly, the table is only partially broken:
SELECT COUNT(id) FROM items; -- 213
SELECT COUNT(*) FROM items WHERE id > 0; -- 213
It only affects the COUNT(*) aggregate, most probably because of the * placeholder.
Furthermore, the error is related to the query plan, not to the execution of the query itself, since:
EXPLAIN SELECT COUNT(*) FROM items;
-- ERROR: variable not found in subplan target list
-- SQL state: XX000
fails as well, without actually executing the query.
Digging into the PostgreSQL code on GitHub, the error message appears here and is raised when the function search_indexed_tlist_for_var returns nothing.
This pointer should explain why it happens when using the * placeholder instead of an explicit column name.
Reproducibility
It is a tricky bug: merely showing that it exists is difficult, and so far I cannot work out which conditions make it happen.
The bug seems to arise in specific contexts (e.g. a bug with an equivalent message and symptom was reported with the PGroonga extension), but I cannot draw a parallel with my case so far.
I am likely facing an equivalent problem in a different context, but I have not managed to build a simple MCVE that exposes it:
CREATE TABLE t AS SELECT CAST(c AS text) FROM generate_series(1, 10000) AS c;
-- SELECT 10000
CREATE INDEX t_c ON t(c);
-- CREATE INDEX
VACUUM t;
-- VACUUM
SELECT COUNT(*) FROM t;
-- 10000
Works as expected. The table with the issue relies on a PostGIS index, but again I cannot reproduce it:
CREATE EXTENSION postgis;
-- CREATE EXTENSION
CREATE TABLE test(
id serial,
geom geometry(Point, 4326)
);
-- CREATE TABLE
INSERT INTO test
SELECT x, ST_MakePoint(x/10000., x/10000.) FROM generate_series(1, 10000) AS x;
-- INSERT 0 10000
CREATE INDEX test_index ON test USING GIST(geom);
-- CREATE INDEX
VACUUM test;
-- VACUUM
SELECT COUNT(*) FROM test;
-- 10000
Works as expected.
And when I dump and restore the faulty database the problem vanishes.
Looking for a MCVE
When trying to reproduce the bug in order to build an MCVE and unit tests that highlight it, so I can report it to the developers, I face a limitation: when I dump the database and restore it into a new instance, the bug simply vanishes.
So the only way I can reproduce this bug is by using the original database; I have not managed to prepare a dump that reproduces the bug elsewhere.
This is what it is all about: I am looking for hints to reproduce the bug in my context.
At this point my analysis is:
The bug is related to the database state or to some metadata that is not identical once the dump is restored;
The bug is related to the COUNT function when using the * wildcard with no filtering clause;
The bug is not general, as it affects only specific tables with a specific index;
The bug resides on the query planner side.
It seems that some metadata or state corruption prevents the query planner from finding a column to apply the COUNT aggregate to.
Question
My question is: How can I investigate this bug more deeply to make it:
either reproducible (a dump technique preserving it);
or understandable to a developer (meta queries to identify where the problem resides in the database)?
Another way to phrase it would be:
How can I reproduce the context which makes the query planner crash?
Is there a way to make the planner more verbose in order to get more details on the error?
What queries can I run against the catalog to capture the faulty context?

Query returns Error despite being executed successfully (Robot Framework / JayDeBeApi)

Using the Query keyword from the Robot Framework DatabaseLibrary with JayDeBeApi against DB2 like this: ${results}= Query CREATE TABLE SCHEMANAME.TEST_TEMP (id BIGINT, name VARCHAR(25)), the statement is executed (the table exists afterwards).
Nevertheless, Robot Framework reports a FAIL and ${results} contains the message DatabaseError: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-601, SQLSTATE=42710, SQLERRMC=SCHEMANAME.TEST_TEMP;TABLE, DRIVER=4.14.122, and often just the bare message Error after running the same statement.
Running the query above (copy/paste) directly in a database SQL window doesn't return any errors.
How is it possible that the query is executed successfully in Robot Framework but an error is thrown nevertheless?
The error SQLCODE=-601 means that you are trying to create an object that already exists. So when you say that the table exists afterwards, it means that it already existed before you ran the statement. I don't know the framework you are using, but the explanation by @pavelsaman in the comments seems to be a very likely cause.
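One way a CREATE TABLE can both take effect and report "already exists" is if the same statement is executed twice: the first run creates the table and the second one fails. Whether that is what happens here is an assumption. The minimal Python sketch below only illustrates the pattern, using the standard-library sqlite3 module as a stand-in for DB2 and JayDeBeApi; the helper name run_twice is hypothetical.

```python
import sqlite3


def run_twice(statement: str) -> list:
    """Execute the same DDL statement twice and record each outcome."""
    # In-memory SQLite database used purely as a stand-in for DB2.
    conn = sqlite3.connect(":memory:")
    outcomes = []
    for _ in range(2):
        try:
            conn.execute(statement)
            outcomes.append("ok")
        except sqlite3.OperationalError as exc:
            # The second CREATE TABLE fails because the first one succeeded.
            outcomes.append(f"error: {exc}")
    conn.close()
    return outcomes


if __name__ == "__main__":
    print(run_twice("CREATE TABLE test_temp (id BIGINT, name VARCHAR(25))"))
```

The first outcome is a success and the second an error, yet the table exists afterwards, which matches the symptom described in the question.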

PostgreSQL: Creating a Trigger that tries to do work on a non existing table

As we start to migrate our application from Oracle to PostgreSQL, we ran into the following problem:
A lot of our Oracle scripts create triggers that work on Oracle-specific tables which don't exist in PostgreSQL. When running these scripts on the PG database, they do not throw an error.
Only when the trigger fires is an error thrown.
Example code:
-- Invalid query under PostgreSQL
select * from v$mystat;
-- Create a view with the invalid query does not work (as expected)
create or replace view Invalid_View as
select * from v$mystat;
-- Create a test table
create table aaa_test_table (test timestamp);
-- Create a trigger with the invalid query does(!) work (not as expected)
create or replace trigger Invalid_Trigger
before insert
on aaa_test_table
begin
select * from v$mystat;
end;
-- Insert fails if the trigger exists
insert into aaa_test_table (test) values(sysdate);
-- Select from the test table
select * from aaa_test_table
order by test desc;
Is there a way to change this behavior to throw an error on trigger creation instead?
Kind Regards,
Hammerfels
Edit:
I was made aware that we actually don't use plain PostgreSQL but EDB instead.
That would probably explain why the syntax for create trigger seems wrong.
I'm sorry for the confusion.
It will trigger an error, unless you have configured Postgres to postpone validation when creating functions.
Try issuing this before creating the trigger:
set check_function_bodies = on;
Creating the trigger should show
ERROR: syntax error at or near "trigger"
LINE 1: create or replace trigger Invalid_Trigger

Triggers in Postgresql/postgis

I have a shapefile loaded into a postgis database. This shapefile is frequently updated by the source and thus my current process is:
Use shp2pgsql with the -a option to generate insert statements.
Run the SQL generated in step 1 to append to database.
Of course, I end up with all the rows from both versions of the shapefile, and what I need is to get rid of all the previous rows and load only the rows from the updated shapefile.
I tried creating a trigger and trigger function in the database:
CREATE TRIGGER drop_all_rows_from_owner_table_trigger
BEFORE INSERT
ON owner_polygons_common_ownership_layer
FOR EACH STATEMENT
EXECUTE PROCEDURE drop_all_rows_from_owner_table();
Here's the trigger function:
CREATE OR REPLACE FUNCTION drop_all_rows_from_owner_table()
RETURNS trigger AS $$
BEGIN DELETE FROM owner_polygons_common_ownership_layer;
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
I believe all I have accomplished is to delete all rows from the table, insert the new rows, then delete them again, because when I look at the table after the process ends I have zero rows. I used the FOR EACH STATEMENT clause because shp2pgsql created one INSERT statement.
My question is: Are triggers the way to go to accomplish this?
Your trigger function seems right.
However, I don't think this is the way to go: you cannot be sure that shp2pgsql produces a single statement.
If your shapefile grows, it could split your inserts into multiple statements.
So, if you can't use the -d option (which drops and recreates the table), I'd add a step to the process, between 1 and 2, to truncate the table.
You could also prepend the truncate statement to the generated SQL file, or execute a separate psql command to truncate the table.
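The "prepend the truncate statement" option can be sketched in Python as below. This is a minimal sketch, not part of shp2pgsql: the helper name prepend_truncate is hypothetical, and it assumes the generated SQL is available as a string and that the table name is unqualified (no schema prefix).

```python
def prepend_truncate(sql_text: str, table: str) -> str:
    """Return sql_text with a TRUNCATE statement for `table` placed first."""
    # Quote the identifier so unusual table names survive.
    # Assumes an unqualified name: "schema.table" would be quoted as one identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return f"TRUNCATE TABLE {quoted};\n{sql_text}"


if __name__ == "__main__":
    # Stand-in for the output of step 1; real shp2pgsql output will differ.
    generated = (
        "BEGIN;\n"
        'INSERT INTO "owner_polygons_common_ownership_layer" (gid) VALUES (1);\n'
        "COMMIT;\n"
    )
    print(prepend_truncate(generated, "owner_polygons_common_ownership_layer"))
```

If the generated file wraps its inserts in its own transaction, the prepended TRUNCATE runs before that transaction starts, so a failed load would leave the table empty. Placing the TRUNCATE inside the transaction instead is a design choice worth considering.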