Errors with BigQuery Sink Connector Configuration - docker-compose

I am trying to ingest data from MySQL to BigQuery. I am using Debezium components running on Docker for this purpose.
Whenever I try to deploy the BigQuery sink connector to Kafka Connect, I get this error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nFailed to construct GCS client: Failed to access JSON key file\nAn unexpected error occurred while validating credentials for BigQuery: Failed to access JSON key file\nYou can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`"}
The error suggests the connector cannot locate the service account key file.
I granted the service account BigQuery Admin and Editor permissions, but the error persists.
This is my BigQuery connector configuration file:
{
"name": "kcbq-connect1",
"config": {
"connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
"tasks.max" : "1",
"topics" : "kcbq-quickstart1",
"sanitizeTopics" : "true",
"autoCreateTables" : "true",
"autoUpdateSchemas" : "true",
"schemaRetriever" : "com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever",
"schemaRegistryLocation":"http://localhost:8081",
"bufferSize": "100000",
"maxWriteSize":"10000",
"tableWriteWait": "1000",
"project" : "dummy-production-overview",
"defaultDataset" : "debeziumtest",
"keyfile" : "/Users/Oladayo/Desktop/Debezium-Learning/key.json"
}
}
Can anyone help?
Thank you.

I needed to mount the service account key from my local directory into the Kafka Connect container. That was how I was able to solve the issue. Thank you :)
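For anyone hitting the same thing, this is roughly what the mount looks like in docker-compose (a sketch; the service name, image tag, and container path are illustrative rather than copied from my actual compose file):

  connect:
    image: debezium/connect:1.9
    volumes:
      - ./key.json:/etc/kafka-connect/key.json

The connector's "keyfile" property then has to point at the path inside the container (here /etc/kafka-connect/key.json), not the path on the host machine.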

Related

Debezium Connector for Oracle - Not getting new items or updates on the table

Context:
I've installed a Kafka cluster with the Confluent Helm chart on AWS Kubernetes,
and I've configured an Oracle server so I can connect to it with Kafka Connect.
My Kafka Connect configuration:
{
"name": "oracle-debez",
"config": {
"connector.class" : "io.debezium.connector.oracle.OracleConnector",
"tasks.max" : "1",
"database.server.name" : "servername",
"database.hostname" : "myserver",
"database.port" : "1521",
"database.user" : "myuser",
"database.password" : "mypass",
"database.dbname" : "KAFKAPOC",
"database.out.server.name" : "dbzxout",
"database.history.kafka.bootstrap.servers" : "mybrokersvc:9092",
"database.history.kafka.topic": "my-conf-topic",
"table.include.list": "MYSCHEMA.MYTABLE",
"database.oracle.version": 11,
"errors.log.enable": "true"
}
}
I've configured it this way and some topics are created:
my-conf-topic: Comes with the table DDL
servername
servername.MYSCHEMA.MYTABLE
The 'kafka-poc-dev.MYSCHEMA.MYTABLE' topic contains all the data from the table.
When I start the connector, all the existing data is saved successfully! But the problem is that new inserts and updates never appear on the topic.
One more thing: my Oracle is not version 11; it is Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production. But if I do not set the property "database.oracle.version": 11, it gives me this error:
"org.apache.kafka.connect.errors.ConnectException: An exception
occurred in the change event producer. This connector will be
stopped.\n\tat
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)\n\tat
io.debezium.connector.oracle.xstream.XstreamStreamingChangeEventSource.execute(XstreamStreamingChangeEventSource.java:82)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:140)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:113)\n\tat
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
d.java:834)\nCaused by: oracle.streams.StreamsExa:343)\n\tat
io.debezium.connector.oracle.xstream.XstreamStreamingChangeEventSource.execute(XstreamStreamingChangeEventSource.java:70)\n\t...
7 more\n"
Can somebody help me understand what I'm doing wrong here?
Now, when I create the connector, the table is being locked and the data is not arriving at the topics...
(screenshot: table being locked)
Thanks!
I'm facing a similar problem, but currently using the LogMiner adapter.
The initial snapshot and streaming work just fine, but I can't get any more update/insert events once I add more connectors to Kafka Connect to monitor different tables and schemas.
Everything just stops working, even though I can see that the LogMiner sessions are still active.
Did you enable GoldenGate replication and archive log mode?
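For reference, those two prerequisites are usually enabled with statements along these lines, run as SYSDBA (a sketch of the database-side setup only, not the full Debezium configuration):

ALTER SYSTEM SET ENABLE_GOLDENGATE_REPLICATION=TRUE SCOPE=BOTH;
-- switching to archive log mode requires a restart into MOUNT state:
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Debezium also needs supplemental logging for change capture:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;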
About the database.oracle.version problem you're facing, you should just use the default value as mentioned here:
https://debezium.io/documentation/reference/connectors/oracle.html#oracle-property-database-oracle-version
"database.oracle.version" : "12+"
Posting as an answer because I can't comment yet.
Hope it helps you somehow.
You are using the container (CDB) and PDB flavor of Oracle, so you need to pass a database.pdb.name value in your properties. You must also have a user with LogMiner or XStream access.
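As an illustration, the connector config from the question would gain something like the following (the PDB name here is only a placeholder; use your actual pluggable database name):

"database.pdb.name" : "ORCLPDB1",
"database.oracle.version" : "12+"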

Kafka Connect FileConfigProvider not working

I'm running Kafka Connect with JDBC Source Connector for DB2 in standalone mode. Everything works fine, but I'm putting the passwords and other sensitive info into my connector file in plain text. I'd like to remove this, so I found that FileConfigProvider can be used:
https://docs.confluent.io/current/connect/security.html#fileconfigprovider
However, when I try to use this it does not seem to pick up my properties file. Here's what I'm doing:
connect.standalone.properties -
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
secrets.properties -
password=thePassword
Source Config -
"connection.password": "${file:/Users/me/app/context/src/main/kafkaconnect/connector/secrets.properties:password}",
"table.whitelist": "MY_TABLE",
"mode": "timestamp",
When I try to load my source connector (via rest api) I get the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nInvalid value com.ibm.db2.jcc.am.SqlInvalidAuthorizationSpecException: [jcc][t4][2013][11249][4.26.14] Connection authorization failure occurred. Reason: User ID or Password invalid. ERRORCODE=-4214, SQLSTATE=28000 for configuration Couldn't open connection to jdbc:db2://11.1.111.111:50000/mydb\nInvalid value com.ibm.db2.jcc.am.SqlInvalidAuthorizationSpecException: [jcc][t4][2013][11249][4.26.14] Connection authorization failure occurred. Reason: User ID or Password invalid. ERRORCODE=-4214, SQLSTATE=28000 for configuration Couldn't open connection to jdbc:db2://11.1.111.111:50000/mydb\nYou can also find the above list of errors at the endpoint /{connectorType}/config/validate"}
The password I'm providing is correct. It works if I just hardcode it into my source json. Any ideas? Thanks!
Edit: As a note, I get similar results on the sink side inserting into a Postgres database.
Edit: Result of GET /connectors/jdbc_source_test-dev:
{
"name": "jdbc_source_test-dev",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"timestamp.column.name": "UPDATED_TS",
"connection.password": "${file:/opt/connect-secrets.properties:dev-password}",
"validate.non.null": "false",
"table.whitelist": "MY_TABLE",
"mode": "timestamp",
"topic.prefix": "db2-test-",
"transforms.extractInt.field": "kafka_id",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"connection.user": "username",
"name": "jdbc_source_test-dev",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"connection.url": "jdbc:db2://11.1.111.111:50000/mydb",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
},
"tasks": [
{
"connector": "jdbc_source_test-dev",
"task": 0
}
],
"type": "source"
}

Issue connecting a Jupyter Sparkmagic kernel to a Kerberized Livy server

Please help if you have any idea:
I am trying to connect Jupyter to the Livy server of a Kerberized Hortonworks Hadoop cluster, and I get a 401 error when connecting.
Is it possible to connect Sparkmagic to a Kerberized Spark Livy server? If it is, then I think I have some misconfiguration in the Sparkmagic config JSON.
The username and password belong to the technical user that runs the server and has impersonation rights in the Hadoop cluster (proxy user), not my actual username when I log in to JupyterHub.
This is part of my config.json:
"kernel_python_credentials" : {
"username": "username",
"password": "password",
"url": "http://mylivy.server:8999",
"auth": "Kerberos"
},
"logging_config": {
"version": 1,
"formatters": {
"magicsFormatter": {
"format": "%(asctime)s\t%(levelname)s\t%(message)s",
"datefmt": ""
}
},
"handlers": {
"magicsHandler": {
"class": "hdijupyterutils.filehandler.MagicsFileHandler",
"formatter": "magicsFormatter",
"home_path": "~/.sparkmagic"
}
},
"loggers": {
"magicsLogger": {
"handlers": ["magicsHandler"],
"level": "DEBUG",
"propagate": 0
}
}
},
"wait_for_idle_timeout_seconds": 15,
"livy_session_startup_timeout_seconds": 600,
.................................................etc............................
This is the error message when I try a "hello world" in a Spark or PySpark notebook or shell in Jupyter:
print("Hello World")
The code failed because of a fatal error: Invalid status code '401'
from http://mylivy.server:8999/sessions with error payload:
Error 401
HTTP ERROR: 401
Problem accessing /sessions. Reason: Authentication required
Powered by Jetty:// 9.3.24.v20180605
Some things to try: a) Make sure Spark has enough available resources
for Jupyter to create a Spark context. b) Contact your Jupyter
administrator to make sure the Spark magics library is configured
correctly. c) Restart the kernel.
(UPDATE)
I just found the reason why the error occurred: there was no Kerberos ticket on the system where the notebook app was launched, and running kinit resolved the issue. P.S. The username and password are also not needed in config.json when using Kerberos auth.
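In case it helps someone, this is roughly all it took (the principal is just an example; use the technical/proxy user's principal):

kinit techuser@EXAMPLE.REALM    # obtain a Kerberos ticket on the host running Jupyter/Sparkmagic
klist                           # verify the ticket cache is populated before starting the notebook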

AWS Data Pipelines with a Heroku Database

I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. The Heroku databases are hosted on EC2 instances (east region) and require SSL.
I've tried to open up a connection using a JdbcDatabase Object, but have run into issues at every turn.
I've tried the following:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
unable to find valid certification path to requested target
ActivityFailed:SunCertPathBuilderException
as well as:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "sslmode=require",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://ec2-54-235-something-something.compute-1.amazonaws.com:5442/redacted FATAL: no pg_hba.conf entry for host "52.13.105.196", user "redacted", database "redacted", SSL off
To boot, I have also tried to use a ShellCommandActivity to dump the Postgres table from the EC2 instance and write stdout to my S3 bucket; however, the EC2 instance doesn't recognize the psql command:
{
"id": "herokuDatabaseDump",
"name": "herokuDatabaseDump",
"type": "ShellCommandActivity",
"runsOn": {
"ref": "Ec2Instance"
},
"stage": "true",
"stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
"command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * #{myHerokuDatabaseTableName}'"
},
and I also cannot yum install postgres beforehand.
It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a select query to run against a Heroku Postgres DB via a Data Pipeline would be a great help. Thanks.
It looks like Heroku needs/wants the Postgres 42.2.1 driver: https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-java. Or at least, if you are compiling a Java app, that's what they tell you to use.
I wasn't able to find out which driver Data Pipeline uses by default, but it does let you use the jdbcDriverJarUri field to specify custom driver JARs: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html
An important note here is that it requires Java 7, so you are going to want to use postgresql-42.2.1.jre7.jar: https://jdbc.postgresql.org/download.html
That, combined with a jdbcProperties field of sslmode=require, should allow it to go through and create the dump file you are looking for.
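Roughly, the JdbcDatabase object from the question would then look something like this (the S3 path to the driver JAR is illustrative; upload the jre7 build of the driver there yourself):

{
  "id" : "heroku_database",
  "name" : "heroku_database",
  "type" : "JdbcDatabase",
  "jdbcDriverClass" : "org.postgresql.Driver",
  "jdbcDriverJarUri" : "s3://my-bucket/drivers/postgresql-42.2.1.jre7.jar",
  "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties" : "sslmode=require",
  "username" : "#{myHerokuDatabaseUserName}",
  "*password" : "#{*myHerokuDatabasePassword}"
}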

Accessing Google Cloud Storage from Grails Application

From a Grails application I would like to create a blob in a bucket.
I already created the bucket in Google Cloud, created a service account, and gave that service account owner access to the bucket. Later I created a service account key, project-id-c4b144.json, which holds all the credentials.
import com.google.auth.oauth2.ServiceAccountCredentials
import com.google.cloud.storage.Blob
import com.google.cloud.storage.BlobId
import com.google.cloud.storage.BlobInfo
import com.google.cloud.storage.Storage
import com.google.cloud.storage.StorageOptions
import java.nio.charset.StandardCharsets

StorageOptions storageOptions = StorageOptions.newBuilder()
        .setCredentials(ServiceAccountCredentials
                .fromStream(new FileInputStream("/home/etibar/Downloads/project-id-c4b144.json"))) // setting credentials
        .setProjectId("project-id") // setting project id, in reality it is different
        .build()
Storage storage = storageOptions.getService()
BlobId blobId = BlobId.of("dispatching-photos", "blob_name")
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build()
Blob blob = storage.create(blobInfo, "Hello, Cloud Storage!".getBytes(StandardCharsets.UTF_8))
When I run this code, I get a JSON error message back.
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Caller does not have storage.objects.create access to bucket dispatching-photos.",
"reason" : "forbidden"
} ],
"message" : "Caller does not have storage.objects.create access to bucket dispatching-photos."
}
| Grails Version: 3.2.10
| Groovy Version: 2.4.10
| JVM Version: 1.8.0_131
google-cloud-datastore:1.2.1
google-auth-library-oauth2-http:0.7.1
google-cloud-storage:1.2.2
Concerning the service account that JSON file corresponds to, I'm betting either:
A) the bucket you're trying to access is owned by a different project than the one where you have that account set as a storage admin,
or B) you're setting permissions for a different service account than the one that JSON file corresponds to.
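One way to check the second point is with gsutil (the bucket name is the one from the question; the service account address is only an example):

gsutil iam get gs://dispatching-photos
# if the account is not listed there, grant it object-create rights explicitly:
gsutil iam ch serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://dispatching-photos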