PyFlink CDC Connectors Postgres failure

Trying to follow the Flink CDC Connectors Postgres tutorial using PyFlink:
https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/mysql-postgres-tutorial.html
Failing Code
ddl = """
CREATE TABLE shipments (
shipment_id INT,
order_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN
) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'postgres',
'password' = 'postgres',
'database-name' = 'postgres',
'schema-name' = 'public',
'slot.name' = 'slot2',
'table-name' = 'shipments'
);
"""
table_env.execute_sql(ddl)
table2: Table = table_env.sql_query("SELECT * FROM shipments")
table2.execute().print()
Main Stacktrace
Caused by: io.debezium.DebeziumException: Creation of replication slot failed
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:141)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:130)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:188)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "CREATE_REPLICATION_SLOT"
Position: 1
Troubleshooting
CDC needs access to the Postgres WAL (write-ahead log), and to read the WAL, Postgres needs a replication slot. So you need to set this up in Postgres. I got that done with these commands:
ALTER SYSTEM SET wal_level = logical;
Then restart Postgres, because the new wal_level only takes effect after a restart and a logical replication slot cannot be created before that:
SELECT * FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CDC code that gets the replication slot:
jdbcConnection.getReplicationSlotState(connectorConfig.slotName(), connectorConfig.plugin().getPostgresPluginName());
called from:
https://github.com/debezium/debezium/blob/main/debezium-connector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresConnectorTask.java
When that lookup does not find a slot, the connector tries to create the replication slot, and that is where the exception is thrown.
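The lookup-then-create behaviour can be illustrated with a small Python sketch. It assumes, as the two arguments of the getReplicationSlotState call suggest, that an existing slot only counts as a match when both its name and its plugin agree. The SlotRow type and find_slot helper are hypothetical and are not Debezium's implementation; the requested plugin decoderbufs is also an assumption (it is Debezium's default plugin).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SlotRow:
    """One row of pg_replication_slots (only the columns used here)."""
    slot_name: str
    plugin: str
    database: str
    active: bool


def find_slot(rows: Iterable[SlotRow], slot_name: str,
              plugin: str, database: str) -> Optional[SlotRow]:
    """Return the first slot matching name, plugin and database, or None.

    Hypothetical helper: it only mirrors the idea behind
    getReplicationSlotState(slotName, pluginName).
    """
    for row in rows:
        if (row.slot_name == slot_name and row.plugin == plugin
                and row.database == database):
            return row
    return None


# The slot from the troubleshooting step was created with test_decoding.
existing = [SlotRow("slot2", "test_decoding", "postgres", False)]

# If the connector asks for another plugin (decoderbufs assumed here),
# nothing matches and the connector falls back to creating the slot.
print(find_slot(existing, "slot2", "decoderbufs", "postgres"))  # None
print(find_slot(existing, "slot2", "test_decoding", "postgres"))
```

Under that assumption, a slot created with a different plugin than the one the connector requests is effectively invisible to it.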
Possible version issue
Maybe the problem is that I am running Postgres 13 and Ververica CDC might only support up to Postgres 12.
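Independently of the Postgres version, the connector's decoding plugin can be set explicitly in the DDL. The Flink CDC postgres-cdc connector documents a 'decoding.plugin.name' option (default decoderbufs); as far as I know, test_decoding, which slot2 was created with, is not among its supported values, whereas pgoutput ships with PostgreSQL 10 and later. The Python sketch below only assembles the same DDL from dicts so the option is easy to add. The helper cdc_table_ddl and the slot name flink_pgoutput_slot are hypothetical (a fresh slot name avoids clashing with the existing test_decoding slot), and the setup has not been verified against this exact environment.

```python
from typing import Dict


def cdc_table_ddl(table: str, columns: Dict[str, str],
                  options: Dict[str, str]) -> str:
    """Build a Flink SQL CREATE TABLE statement from column and WITH-option dicts."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    opts = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {table} (\n{cols}\n) WITH (\n{opts}\n)"


ddl = cdc_table_ddl(
    "shipments",
    {"shipment_id": "INT", "order_id": "INT", "origin": "STRING",
     "destination": "STRING", "is_arrived": "BOOLEAN"},
    {
        "connector": "postgres-cdc",
        "hostname": "localhost",
        "port": "5432",
        "username": "postgres",
        "password": "postgres",
        "database-name": "postgres",
        "schema-name": "public",
        "table-name": "shipments",
        # Hypothetical fresh slot; an existing slot must use the same plugin.
        "slot.name": "flink_pgoutput_slot",
        # Built-in output plugin (PostgreSQL 10+); assumed choice.
        "decoding.plugin.name": "pgoutput",
    },
)
print(ddl)
# The result can then be passed to table_env.execute_sql(ddl) as before.
```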

Related

Logstash refuse to see a postgres table

So I created my Logstash conf file and spun up Logstash, Kibana, Postgres, and Elasticsearch in one Docker Compose file. It connected seamlessly to my database; however, it says the table "products" doesn't exist.
[2023-01-18T14:06:00,182][WARN ][logstash.inputs.jdbc ][main][6a13cd40fa144828caae9db4ed20b978765149c99cc59d5830fa4ccad80b4017] Exception when executing JDBC query {:exception=>"Java::OrgPostgresqlUtil::PSQLException: ERROR: relation \"products\" does not exist\n Position: 15"}
This is my conf
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://elastic-postgres-1:5432/shopdb"
jdbc_user => "postgres"
jdbc_password => "****"
jdbc_driver_library => "./postgresql-42.2.27.jre7.jar"
jdbc_driver_class => "org.postgresql.Driver"
statement => "SELECT * FROM products;"
schedule => "* * * * *"
}
}
output {
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "PostgreSQL"
}
}
Granted, I linked Postgres to Logstash with the conf BEFORE creating the table, but I have since restarted the containers and the error persists. I also tried putting in a wrong table name to check whether it even picks up conf changes, and it did. So why isn't it seeing the table "products", which has now been created and populated?
Try explicitly qualifying the table with its schema name in your query to avoid this "table not found" error, like:
SELECT * FROM schema_name.object_name
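In the Logstash statement this becomes, for example, SELECT * FROM public.products. If you also quote the names, keep in mind that quoting makes them case-sensitive. A small Python sketch that builds a quoted, schema-qualified name; the schema public is an assumption (use whichever schema the table was actually created in), and quote_ident/qualified are hypothetical helpers.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified(schema: str, table: str) -> str:
    """Return a quoted schema-qualified name, e.g. for public.products."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


# 'public' is an assumed schema name.
statement = f"SELECT * FROM {qualified('public', 'products')};"
print(statement)  # SELECT * FROM "public"."products";
```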

Knex cannot find table in Cloud SQL Postgres from Cloud Functions

I am trying to connect to a Postgres 12 DB running in Cloud SQL from a Cloud Function written in TypeScript.
I create the database with the following:
import * as Knex from "knex"
const { username, password, instance } = ... // username, password, connection name (<app-name>:<region>:<database>)
const config = {
client: 'pg',
connection: {
user: username,
password: password,
database: 'ingredients',
host: `/cloudsql/${instance}`
},
// pool is a top-level Knex option, not a connection setting
pool: { min: 1, max: 1 }
}
I am then querying the database using:
const query = ... // passed in as param
const result = await knex('tableName').where('name', 'ilike', query).select('*')
When I run this code, I get the following error in the Cloud Functions logs:
Unhandled error { error: select * from "tableName" where "name" ilike $1 - relation "tableName" does not exist
at Parser.parseErrorMessage (/workspace/node_modules/pg-protocol/dist/parser.js:278:15)
at Parser.handlePacket (/workspace/node_modules/pg-protocol/dist/parser.js:126:29)
at Parser.parse (/workspace/node_modules/pg-protocol/dist/parser.js:39:38)
at Socket.stream.on (/workspace/node_modules/pg-protocol/dist/index.js:10:42)
at Socket.emit (events.js:198:13)
at Socket.EventEmitter.emit (domain.js:448:20)
at addChunk (_stream_readable.js:288:12)
at readableAddChunk (_stream_readable.js:269:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at Pipe.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
I created the table using the following commands in the GCP Cloud Shell (then populated it with data from a CSV):
\connect ingredients;
CREATE TABLE tableName (name VARCHAR(255), otherField VARCHAR(255), ... );
In that console, if I run the query SELECT * FROM tableName;, I see the correct data listed.
Why does Knex not see the table: tableName, but the GCP Cloud Shell does?
BTW, I am definitely connecting to the correct db, as I see the same error logs in the Cloud SQL logging interface.
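It looks like you created the table tableName without quoting it, so PostgreSQL folded the name to lower case (tablename). Knex quotes identifiers (note "tableName" in the error), so it looks for the exact mixed-case name, which does not exist. When creating the schema, quote the names: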
CREATE TABLE "tableName" ("name" VARCHAR(255), "otherField" VARCHAR(255), ... );
or use only lower-case table / column names.
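The folding rule can be shown with a simplified Python model: unquoted identifiers are folded to lower case, quoted ones keep their exact spelling. The helper resolve_identifier is hypothetical and ignores details such as non-ASCII characters.

```python
def resolve_identifier(token: str) -> str:
    """Return the name PostgreSQL stores or looks up for an identifier token.

    Simplified model: a quoted identifier keeps its exact case (with ""
    unescaped to "); an unquoted identifier is folded to lower case.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1].replace('""', '"')
    return token.lower()


created_as = resolve_identifier('tableName')        # CREATE TABLE tableName (...)
knex_looks_for = resolve_identifier('"tableName"')  # select * from "tableName"
print(created_as, knex_looks_for, created_as == knex_looks_for)
# tablename tableName False
```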

AWS RDS PostgreSQL - copying from/to csv files on EC2 instance

I've run into a problem that I haven't been able to fix for a few days.
I have the following architecture:
two EC2 instances, which are nodes running the Trifacta application (an application for data scientists), and
an AWS RDS PostgreSQL instance.
Since its newest version, the Trifacta application uses a new database schema and performs some database migrations when it starts. During startup, some tables are copied into *.csv files and then copied back into tables from those *.csv files.
This works fine against a local database, because the PostgreSQL superuser role allows such actions.
On the AWS RDS PostgreSQL instance, however, it fails with the following errors:
Error running query COPY (select "id" from workspaces) TO '/tmp/workspaces.csv' DELIMITER ',' CSV HEADER; error: must be superuser to COPY to or from a file
at Connection.parseE (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:614:13)
at Connection.parseMessage (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:413:19)
at Socket.<anonymous> (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:129:22)
at Socket.emit (events.js:315:20)
at addChunk (_stream_readable.js:295:12)
at readableAddChunk (_stream_readable.js:271:9)
at Socket.Readable.push (_stream_readable.js:212:10)
at TCP.onStreamRead (internal/stream_base_commons.js:186:23) {
length: 178,
severity: 'ERROR',
code: '42501',
detail: undefined,
hint: "Anyone can COPY to stdout or from stdin. psql's \\copy command also works for anyone.",
position: undefined,
internalPosition: undefined,
internalQuery: undefined,
where: undefined,
schema: undefined,
table: undefined,
column: undefined,
dataType: undefined,
constraint: undefined,
file: 'copy.c',
line: '905',
routine: 'DoCopy'
}
That is just the first one; there are many more. I researched this and figured out why it happens: AWS uses the rds_superuser role instead of superuser, and the privileges of this role aren't sufficient for copying from/to the server's filesystem.
From the psql console this can be done by using \copy instead of COPY, but that doesn't help in my case, because Trifacta executes the SQL queries from its *.js files, and as far as I know \copy can only be run from the psql CLI.
At @IMSoP's suggestion, I am adding the code from the Trifacta *.js file where these actions are performed:
ConnectUtils.copyQuery = function(query, connection, options = {}) {
ensure.notNull(connection.base.DriverName, 'connection driver name');
ensure.notNull(options.tableName, 'table name');
const table = options.tableName;
const filePath = ConnectUtils.getOutputFilePath(table, options);
if (connection.base.DriverName === DATABASE_JS_TYPE[MYSQL]) {
return `${query} INTO OUTFILE \'${filePath}\' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'`;
} else if (connection.base.DriverName === DATABASE_JS_TYPE[POSTGRES]) {
return `COPY (${query}) TO '${filePath}' DELIMITER ',' CSV HEADER;`;
} else if (connection.base.DriverName === DATABASE_JS_TYPE[SQLITE]) {
return query;
}
return;
};
ConnectUtils.loadQuery = function(connection, options = {}) {
ensure.notNull(connection.base.DriverName, 'connection driver name');
ensure.notNull(connection.base.Database, 'connection database');
ensure.notNull(options.tableName, 'table name');
const table = options.tableName;
const filePath = ConnectUtils.getOutputFilePath(table, options);
if (connection.base.DriverName === DATABASE_JS_TYPE[MYSQL]) {
return `LOAD DATA INFILE \'${filePath}\' INTO TABLE ${
connection.base.Database
}.${table} FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;`;
} else if (connection.base.DriverName === DATABASE_JS_TYPE[POSTGRES]) {
return `COPY ${table} FROM '${filePath}' DELIMITER ',' CSV HEADER;`;
}
return;
};
${filePath} is a path on the EC2 instance, and ${table} refers to tables on the AWS RDS instance. From the answers posted before I edited my question, I assume there is no way to work around this, since the script tries to reach ${filePath} as a path on the AWS RDS instance. Is that right?
Thanks for reading.
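The error's own hint points at the usual workaround: COPY ... TO STDOUT and COPY ... FROM STDIN do not need superuser, and they move the data over the client connection, so the CSV file lives on the client (here, an EC2 node) rather than on the RDS host. A minimal Python sketch, assuming a cursor that provides psycopg2's copy_expert(sql, file) method; export_query_csv and import_csv are hypothetical helpers, and the query and table text are interpolated as-is, so they must come from trusted code, as in the Trifacta functions above.

```python
from typing import Any


def export_query_csv(cur: Any, query: str, path: str) -> None:
    """Stream the result of `query` into a CSV file on the client machine.

    COPY ... TO STDOUT sends the rows over the client connection, so `path`
    is opened locally (e.g. on the EC2 node), not on the database server.
    """
    sql = f"COPY ({query}) TO STDOUT DELIMITER ',' CSV HEADER"
    with open(path, "w", newline="") as f:
        cur.copy_expert(sql, f)


def import_csv(cur: Any, table: str, path: str) -> None:
    """Load a local CSV file into `table` with COPY ... FROM STDIN."""
    sql = f"COPY {table} FROM STDIN DELIMITER ',' CSV HEADER"
    with open(path, "r", newline="") as f:
        cur.copy_expert(sql, f)
```

With psycopg2, the cursor would come from something like psycopg2.connect(dsn).cursor(), where dsn is a placeholder for the RDS connection string. In Trifacta's Node.js code this is not just a change of SQL text: the client itself has to stream the COPY data, which copyQuery/loadQuery as written do not do.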

CDC with WSO2 Streaming Integrator and Postgres DB

I am trying to set up Change Data Capture (CDC) between WSO2 Streaming Integrator and a local Postgres DB.
I have added the Postgres Driver (v42.2.5) to SI_HOME/lib and I am able to read data from the database from a Siddhi application.
I am following the CDCWithListeningMode example to implement CDC, and I am using pgoutput as the logical decoding plugin. But when I run the application, I get the following log:
[2020-04-23_19-02-37_460] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = key
schemas.cache.size = 1000
schemas.enable = true
[2020-04-23_19-02-37_461] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = value
schemas.cache.size = 1000
schemas.enable = false
[2020-04-23_19-02-37_461] INFO {io.debezium.embedded.EmbeddedEngine$EmbeddedConfig} - EmbeddedConfig values:
access.control.allow.methods =
access.control.allow.origin =
bootstrap.servers = [localhost:9092]
header.converter = class org.apache.kafka.connect.storage.SimpleHeaderConverter
internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
internal.value.converter = class org.apache.kafka.connect.json.JsonConverter
key.converter = class org.apache.kafka.connect.json.JsonConverter
listeners = null
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
offset.flush.interval.ms = 60000
offset.flush.timeout.ms = 5000
offset.storage.file.filename =
offset.storage.partitions = null
offset.storage.replication.factor = null
offset.storage.topic =
plugin.path = null
rest.advertised.host.name = null
rest.advertised.listener = null
rest.advertised.port = null
rest.host.name = null
rest.port = 8083
ssl.client.auth = none
task.shutdown.graceful.timeout.ms = 5000
value.converter = class org.apache.kafka.connect.json.JsonConverter
[2020-04-23_19-02-37_516] INFO {io.debezium.connector.common.BaseSourceTask} - offset.storage = io.siddhi.extension.io.cdc.source.listening.InMemoryOffsetBackingStore
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.server.name = localhost_5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.port = 5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - table.whitelist = SweetProductionTable
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - cdc.source.object = 1716717434
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.hostname = localhost
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - database.password = ********
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - name = CDCWithListeningModeinsertSweetProductionStream
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - server.id = 6140
[2020-04-23_19-02-37_519] INFO {io.debezium.connector.common.BaseSourceTask} - database.history = io.debezium.relational.history.FileDatabaseHistory
[2020-04-23_19-02-38_103] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - user 'user_name' connected to database 'db_name' on PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit with roles:
role 'user_name' [superuser: false, replication: true, inherit: true, create role: false, create db: false, can log in: true] (Encoded)
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - No previous offset found
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - Taking a new snapshot of the DB and streaming logical changes once the snapshot is finished...
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-snapshot-producer
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-stream-producer
[2020-04-23_19-02-38_293] INFO {io.debezium.connector.postgresql.connection.PostgresConnection} - Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLSN=null]
[2020-04-23_19-02-38_704] ERROR {io.siddhi.core.stream.input.source.Source} - Error on 'CDCWithListeningMode'. Connection to the database lost. Error while connecting at Source 'cdc' at 'insertSweetProductionStream'. Will retry in '5 sec'. (Encoded)
io.siddhi.core.exception.ConnectionUnavailableException: Connection to the database lost.
at io.siddhi.extension.io.cdc.source.CDCSource.lambda$connect$1(CDCSource.java:424)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:793)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Cannot create replication connection
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:87)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:38)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$ReplicationConnectionBuilder.build(PostgresReplicationConnection.java:362)
at io.debezium.connector.postgresql.PostgresTaskContext.createReplicationConnection(PostgresTaskContext.java:65)
at io.debezium.connector.postgresql.RecordsStreamProducer.<init>(RecordsStreamProducer.java:81)
at io.debezium.connector.postgresql.RecordsSnapshotProducer.<init>(RecordsSnapshotProducer.java:70)
at io.debezium.connector.postgresql.PostgresConnectorTask.createSnapshotProducer(PostgresConnectorTask.java:133)
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:86)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:45)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:677)
... 3 more
Caused by: io.debezium.jdbc.JdbcConnectionException: ERROR: could not access file "decoderbufs": No such file or directory
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:145)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:79)
... 12 more
Caused by: org.postgresql.util.PSQLException: ERROR: could not access file "decoderbufs": No such file or directory
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at org.postgresql.replication.fluent.logical.LogicalCreateSlotBuilder.make(LogicalCreateSlotBuilder.java:48)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:108)
... 13 more
Debezium defaults to the decoderbufs plugin, hence the error "could not access file "decoderbufs": No such file or directory".
According to this answer, the issue is due to the configuration of decoderbufs plugin.
Details
Postgres - 11.4
siddhi-cdc-io - 2.0.3
Debezium - 0.8.3
How do I configure the embedded debezium engine to use the pgoutput plugin? Will changing this configuration fix the error?
Please help me with this issue. I have not found any resources that can help me.
You either need to update Debezium to the latest 1.1 version, which lets you use the pgoutput plugin via the plugin.name config option, or you need to deploy (and possibly build) the decoderbufs.so library on your PostgreSQL server.
I'd recommend the former, as 0.8.3 is a very old version.
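For a Debezium setup that accepts connector properties directly, the plugin choice is a single property. The Python sketch below renders a minimal Debezium PostgreSQL connector configuration in Java-properties form. The property names (connector.class, plugin.name, slot.name, database.*) come from Debezium's PostgreSQL connector documentation; the values are placeholders taken from the log above or hypothetical (the password and slot name in particular). Whether the Siddhi CDC extension passes such a property through to its embedded engine is not established here.

```python
def debezium_postgres_properties(host: str, port: int, user: str, password: str,
                                 dbname: str, server_name: str,
                                 plugin: str = "pgoutput",
                                 slot: str = "debezium") -> str:
    """Render a minimal Debezium PostgreSQL connector config as key=value lines.

    Values are written verbatim; characters that need escaping in Java
    .properties files (e.g. backslashes) are not handled.
    """
    props = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": plugin,  # pgoutput instead of the default decoderbufs
        "slot.name": slot,
        "database.hostname": host,
        "database.port": str(port),
        "database.user": user,
        "database.password": password,
        "database.dbname": dbname,
        "database.server.name": server_name,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Values mirror the log above; "secret" is a placeholder password.
config = debezium_postgres_properties("localhost", 5432, "user_name", "secret",
                                      "db_name", "localhost_5432")
print(config)
```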
I observed this behavior with PostgreSQL 12 when I tried to do CDC with the pgoutput logical decoding output plug-in. It seems that even though I configured the database with pgoutput, the Siddhi extension tries to make the connection using "decoderbufs" as the decoding plug-in.
When I configured decoderbufs as the logical decoding output plug-in at the database level, I was able to use the Siddhi io extension without any issue.
It seems that, for now, Siddhi io CDC only supports the decoderbufs logical decoding output plug-in with PostgreSQL.

How to connect postgres database with logstash from JDBC to import data?

I'm trying to connect a PostgreSQL database to Logstash to import data from Postgres into Elasticsearch.
I'm using the JDBC driver to connect Logstash to Postgres.
But I'm getting the following error:
[2019-06-27T13:04:05,943][ERROR][logstash.javapipeline ] A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main Plugin: "postgres", jdbc_password=>, statement=>"SELECT * FROM public.\"contacts\";", jdbc_driver_library=>"postgresql-42.2.6.jar", jdbc_connection_string=>"jdbc:postgresql://localhost:5432/LogstashTest", id=>"a76a604bb9cb591dd4a19afc95e03873023e008c564101b4ac19aefe30071213", jdbc_driver_class=>"org.postgresql.Driver", enable_metric=>true, codec=>"plain_8f80bf3a-29fe-49e8-86b1-c94e9a298ffb", enable_metric=>true, charset=>"UTF-8">, jdbc_paging_enabled=>false, jdbc_page_size=>100000, jdbc_validate_connection=>false, jdbc_validation_timeout=>3600, jdbc_pool_timeout=>5, sql_log_level=>"info", connection_retry_attempts=>1, connection_retry_attempts_wait_time=>0.5, parameters=>{"sql_last_value"=>1970-01-01 00:00:00 UTC}, last_run_metadata_path=>"C:\Users\roshan/.logstash_jdbc_last_run", use_column_value=>false, tracking_column_type=>"numeric", clean_run=>false, record_last_run=>true, lowercase_column_names=>true>
Error: org.postgresql.Driver not loaded. Are you sure you've included the correct jdbc driver in :jdbc_driver_library? Exception: LogStash::ConfigurationError Stack:
D:/Swares/logstash-7.2.0/logstash-7.2.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/plugin_mixins/jdbc/jdbc.rb:163:in `open_jdbc_connection'
D:/Swares/logstash-7.2.0/logstash-7.2.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/plugin_mixins/jdbc/jdbc.rb:221:in `execute_statement'
D:/Swares/logstash-7.2.0/logstash-7.2.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/inputs/jdbc.rb:277:in `execute_query'
D:/Swares/logstash-7.2.0/logstash-7.2.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/inputs/jdbc.rb:263:in `run'
D:/Swares/logstash-7.2.0/logstash-7.2.0/logstash-core/lib/logstash/java_pipeline.rb:309:in `inputworker'
D:/Swares/logstash-7.2.0/logstash-7.2.0/logstash-core/lib/logstash/java_pipeline.rb:302:in `block in start_input'
[2019-06-27T13:04:06,946][ERROR][logstash.inputs.jdbc ] Failed to load postgresql-42.2.6.jar {:exception=>#}
My versions are:
Java version - "1.8.0_211"
postgres (PostgreSQL) 11.0
logstash-7.2.0
And here is my logstash conf file
input {
jdbc{
#input configuration
jdbc_driver_library => "postgresql-42.2.6.jar"
jdbc_driver_class => "org.postgresql.Driver"
jdbc_connection_string => "jdbc:postgresql://localhost:5432/LogstashTest"
jdbc_user => "postgres"
jdbc_password => "root"
statement => 'SELECT * FROM public."contacts";'
}
}
output{
stdout { codec => json_lines }
}
This problem can occur when the PostgreSQL JDBC driver file is corrupted or when you pass an incorrect config file to Logstash. I had the same issue; you need to check your JDBC driver file.
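A quick way to rule out a truncated or wrong driver download is to inspect the JAR itself. The Python sketch below (a hypothetical helper, not part of Logstash) checks that the file exists, is a readable ZIP archive, contains the org/postgresql/Driver class named in jdbc_driver_class, and passes a CRC check. Because the config uses a relative path (postgresql-42.2.6.jar), it also prints where that path resolves from the script's own working directory, which may differ from the one Logstash uses; an absolute path in jdbc_driver_library may remove that ambiguity.

```python
import zipfile
from pathlib import Path

DRIVER_ENTRY = "org/postgresql/Driver.class"  # class named in jdbc_driver_class


def check_jdbc_jar(path: str) -> str:
    """Return "ok" or a short reason why the JAR cannot provide org.postgresql.Driver."""
    jar_path = Path(path)
    if not jar_path.is_file():
        # Relative paths are resolved against this script's working directory.
        return f"missing: {jar_path.resolve()}"
    if not zipfile.is_zipfile(str(jar_path)):
        return "corrupt: not a valid JAR (ZIP) archive"
    with zipfile.ZipFile(str(jar_path)) as jar:
        if DRIVER_ENTRY not in jar.namelist():
            return f"wrong jar: {DRIVER_ENTRY} not found"
        bad_entry = jar.testzip()  # name of the first entry with a bad CRC, or None
        if bad_entry is not None:
            return f"corrupt entry: {bad_entry}"
    return "ok"


print(check_jdbc_jar("postgresql-42.2.6.jar"))
```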