How to use the useBulkCopyForBatchInsert on the JdbcSinkConnector?


I'm trying to bulk insert into an MSSQL table by adding "useBulkCopyForBatchInsert=true" to the connection.url option of the JdbcSinkConnector, as shown below.
"connection.url": "jdbc:sqlserver://...:1433;database=****;useBulkCopyForBatchInsert=true"
However, the data is not being inserted using bulk insert.
The reference document and the Connect log are attached below.
Reference document: Using bulk copy API for batch insert operation
https://learn.microsoft.com/en-us/sql/connect/jdbc/use-bulk-copy-api-batch-insert-operation?view=sql-server-ver16
Connect log:
[2022-07-18 16:46:32,224] INFO JdbcSinkConfig values:
auto.create = false
auto.evolve = false
batch.size = 3000
connection.attempts = 3
connection.backoff.ms = 10000
connection.password = [hidden]
connection.url = jdbc:sqlserver://...:1433;database=****;useBulkCopyForBatchInsert=true
connection.user = ****
db.timezone = Asia/Seoul
delete.enabled = false
dialect.name =
fields.whitelist = []
insert.mode = insert
max.retries = 10
pk.fields = []
pk.mode = none
quote.sql.identifiers = ALWAYS
retry.backoff.ms = 3000
table.name.format = ****
table.types = [TABLE]
(io.confluent.connect.jdbc.sink.JdbcSinkConfig:361)

I'm not sure if this is the issue, but you're using the wrong JDBC URL for SQL Server. You should use jdbc:sqlserver://...:1433;databaseName=; instead of jdbc:sqlserver://...:1433;database=;.
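To make the suggestion concrete, here is a minimal Python sketch that assembles a connection URL with `databaseName` and `useBulkCopyForBatchInsert=true` and places it in a sink connector config. The host `mssql.example.internal`, the database name `mydb`, and the helper `build_sqlserver_url` are hypothetical placeholders, not values from the question; `insert.mode` and `batch.size` are copied from the log above. Whether this change alone makes the connector use bulk copy is not established in this thread.

```python
import json


def build_sqlserver_url(host, port, database, **properties):
    """Assemble a SQL Server JDBC URL from semicolon-separated properties."""
    parts = [f"jdbc:sqlserver://{host}:{port}", f"databaseName={database}"]
    for key, value in properties.items():
        # Booleans are written in lowercase, as in the question's URL.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{key}={text}")
    return ";".join(parts)


# Hypothetical host and database, used only for illustration.
url = build_sqlserver_url(
    "mssql.example.internal", 1433, "mydb", useBulkCopyForBatchInsert=True
)

sink_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": url,
    "insert.mode": "insert",
    "batch.size": "3000",
}

print(json.dumps(sink_config, indent=2))
```

Building the URL from named properties keeps the driver options in one place, so a misspelled or missing property is easier to spot than in a hand-edited string.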

Related

Cannot use csvSftpConnector with schema registry

I'm trying to configure a standalone producer to read csv files from a sftp server and send data to a topic on the cloud.
So far I succeeded in reading my csv data from the file and parsing it according to my value.schema.
But now instead of using a fixed configuration schema, I'd like to use the schema registry. So I configured an AVRO schema for my test topic on the confluent cloud, generated the API key/secret and updated my config files.
I can see that connection is working fine, no authentication errors, via cli I can access the test schema, but when I try to run the producer I get the following error:
[2021-09-20 16:39:53,442] INFO SftpCsvSourceConnectorConfig values:
batch.size = 1000
behavior.on.error = IGNORE
cleanup.policy = NONE
csv.case.sensitive.field.names = false
csv.escape.char = 92
csv.file.charset = UTF-8
csv.first.row.as.header = false
csv.ignore.leading.whitespace = true
csv.ignore.quotations = false
csv.keep.carriage.return = false
csv.null.field.indicator = NEITHER
csv.quote.char = 34
csv.rfc.4180.parser.enabled = false
csv.separator.char = 44
csv.skip.lines = 0
csv.strict.quotes = false
csv.verify.reader = true
empty.poll.wait.ms = 250
error.path = /home/alberto/opt/confluent-6.2.0/sftp2/error
file.minimum.age.ms = 0
finished.path = /home/alberto/opt/confluent-6.2.0/sftp2/finished
input.file.pattern = .*.csv
input.path = /home/alberto/opt/confluent-6.2.0/sftp2/data
kafka.topic = testSchema
kerberos.keytab.path =
kerberos.user.principal =
key.schema = {"name" : "com.example.users.UserKey","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true}}}
parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
parser.timestamp.timezone = UTC
processing.file.extension = .PROCESSING
proxy.password = [hidden]
proxy.username =
schema.generation.enabled = false
schema.generation.key.fields = []
schema.generation.key.name = defaultkeyschemaname
schema.generation.value.name = defaultvalueschemaname
sftp.host = 192.168.1.6
sftp.password = [hidden]
sftp.port = 22
sftp.proxy.url =
sftp.username = user
timestamp.field =
timestamp.mode = PROCESS_TIME
tls.passphrase = [hidden]
tls.pemfile =
tls.private.key = [hidden]
tls.public.key = [hidden]
value.schema =
...
Caused by: org.apache.kafka.common.config.ConfigException: Both configs key.schema and value.schema must be set if schema.generation.enabled is false, but key.schema was not null and value.schema was null.
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.validateSchema(SftpSourceConnectorConfig.java:181)
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.<init>(SftpSourceConnectorConfig.java:121)
at io.confluent.connect.sftp.source.SftpCsvSourceConnectorConfig.<init>(SftpCsvSourceConnectorConfig.java:157)
at io.confluent.connect.sftp.SftpCsvSourceConnector.start(SftpCsvSourceConnector.java:44)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:184)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:209)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:348)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:331)
... 7 more
If I set schema.generation.enabled to true, it seems that it creates an empty schema:
value.schema = {"type":"STRUCT","isOptional":false,"fieldSchemas":{}}
and then I get:
org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic testSchema : Schema being registered is incompatible with an earlier schema for subject "testSchema-value"; error code: 409; error code: 409
It looks as if the connector is trying to register a schema, which is not what I want; I just need to fetch the schema from the registry and use it.
If anyone needs any additional information about the configuration, I'll be happy to provide it.
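The first error comes from a validation rule that the message itself spells out: when `schema.generation.enabled` is false, both `key.schema` and `value.schema` must be set. Below is a minimal Python sketch of that rule, usable as a pre-flight check on a properties dictionary before starting the connector. The function name `check_schema_settings` is hypothetical, and this is an illustration of the rule as quoted in the error, not the connector's actual code.

```python
def is_blank(value):
    """True when a setting is absent or contains only whitespace."""
    return value is None or not str(value).strip()


def check_schema_settings(config):
    """Return a list of problems, mirroring the rule quoted in the error message."""
    generation = str(config.get("schema.generation.enabled", "false")).lower()
    if generation == "true":
        return []
    return [
        f"{name} must be set when schema.generation.enabled is false"
        for name in ("key.schema", "value.schema")
        if is_blank(config.get(name))
    ]


# Shortened settings in the same state as the log above: key.schema set, value.schema empty.
config = {
    "schema.generation.enabled": "false",
    "key.schema": '{"name": "com.example.users.UserKey", "type": "STRUCT"}',
    "value.schema": "",
}

problems = check_schema_settings(config)
for problem in problems:
    print(problem)
```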

Problems running Cygnus with Postgresql

So I have installed Cygnus and in the simple test configuration case which I took from here (https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-ngsi/README.md) everything works fine.
But I need Postgresql as a backend for my application.
For this I adjusted the agent_1.conf file with all postgresql parameters found from http://fiware-cygnus.readthedocs.io/en/latest/cygnus-ngsi/installation_and_administration_guide/ngsi_agent_conf/
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = postgresql-sink
cygnus-ngsi.channels = postgresql-channel
cygnus-ngsi.sources.http-source.channels = hdfs-channel mysql-channel ckan-channel mongo-channel sth-channel kafka-channel dynamo-channel postgresql-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
cygnus-ngsi.sinks.postgresql-sink.channel = postgresql-channel
cygnus-ngsi.sinks.postgresql-sink.type = com.telefonica.iot.cygnus.sinks.NGSIPostgreSQLSink
cygnus-ngsi.sinks.postgresql-sink.postgresql_host = 127.0.0.1
cygnus-ngsi.sinks.postgresql-sink.postgresql_port = 5432
cygnus-ngsi.sinks.postgresql-sink.postgresql_database = myUser
cygnus-ngsi.sinks.postgresql-sink.postgresql_username = mydb
cygnus-ngsi.sinks.postgresql-sink.postgresql_password = xxxx
cygnus-ngsi.sinks.postgresql-sink.attr_persistence = row
cygnus-ngsi.sinks.postgresql-sink.batch_size = 100
cygnus-ngsi.sinks.postgresql-sink.batch_timeout = 30
cygnus-ngsi.sinks.postgresql-sink.batch_ttl = 10
# postgresql-channel configuration
cygnus-ngsi.channels.postgresql-channel.type = memory
cygnus-ngsi.channels.postgresql-channel.capacity = 1000
cygnus-ngsi.channels.postgresql-channel.transactionCapacity = 100
I didn't really find any information about other files I am supposed to change, and I'm not sure whether all the parameters are correct.
I also tried the sample configuration from here: http://fiware-cygnus.readthedocs.io/en/latest/cygnus-ngsi/flume_extensions_catalogue/ngsi_postgresql_sink/index.html
Cygnus seems to start correctly, but if I try to send a notification I get connection refused.

Before doing anything else, please create the database, the user and the password in PostgreSQL.
The Cygnus configuration looks like this. The /etc/cygnus/conf/cygnus_instance_1.conf file:
CYGNUS_USER=cygnus
CONFIG_FOLDER=/usr/cygnus/conf
CONFIG_FILE=/usr/cygnus/conf/agent_1.conf
AGENT_NAME=cygnus-ngsi
LOGFILE_NAME=cygnus.log
ADMIN_PORT=8081
POLLING_INTERVAL=30
The other file, /usr/cygnus/conf/agent_1.conf, looks like this (please change the PostgreSQL parameters):
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = postgresql-sink
cygnus-ngsi.channels = postgresql-channel
cygnus-ngsi.sources.http-source.channels = postgresql-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.handler.events_ttl = 10
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
#cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# =============================================
# postgresql-channel configuration
# channel type (must not be changed)
cygnus-ngsi.channels.postgresql-channel.type = memory
# capacity of the channel
cygnus-ngsi.channels.postgresql-channel.capacity = 1000
# maximum number of events that can be sent per transaction
cygnus-ngsi.channels.postgresql-channel.transactionCapacity = 100
# ============================================
# NGSIPostgreSQLSink configuration
# channel name from where to read notification events
cygnus-ngsi.sinks.postgresql-sink.channel = postgresql-channel
# sink class, must not be changed
cygnus-ngsi.sinks.postgresql-sink.type = com.telefonica.iot.cygnus.sinks.NGSIPostgreSQLSink
# true applies the new encoding, false applies the old encoding.
# cygnus-ngsi.sinks.postgresql-sink.enable_encoding = false
# true if the grouping feature is enabled for this sink, false otherwise
cygnus-ngsi.sinks.postgresql-sink.enable_grouping = false
# true if name mappings are enabled for this sink, false otherwise
cygnus-ngsi.sinks.postgresql-sink.enable_name_mappings = false
# true if lower case is to be forced in all the element names, false otherwise
# cygnus-ngsi.sinks.postgresql-sink.enable_lowercase = false
# the FQDN/IP address where the PostgreSQL server runs
cygnus-ngsi.sinks.postgresql-sink.postgresql_host = 127.0.0.1
# the port where the PostgreSQL server listens for incoming connections
cygnus-ngsi.sinks.postgresql-sink.postgresql_port = 5432
# the name of the postgresql database
cygnus-ngsi.sinks.postgresql-sink.postgresql_database = cygnusdb
# a valid user in the PostgreSQL server
cygnus-ngsi.sinks.postgresql-sink.postgresql_username = cygnus
# password for the user above
cygnus-ngsi.sinks.postgresql-sink.postgresql_password = cygnusdb
# how the attributes are stored, either per row or per column (row, column)
cygnus-ngsi.sinks.postgresql-sink.attr_persistence = row
# select the data_model: dm-by-service-path or dm-by-entity
cygnus-ngsi.sinks.postgresql-sink.data_model = dm-by-entity
# number of notifications to be included within a processing batch
cygnus-ngsi.sinks.postgresql-sink.batch_size = 1
# timeout for batch accumulation
cygnus-ngsi.sinks.postgresql-sink.batch_timeout = 30
# number of retries upon persistence error
cygnus-ngsi.sinks.postgresql-sink.batch_ttl = 0
# true enables cache, false disables cache
cygnus-ngsi.sinks.postgresql-sink.backend.enable_cache = true
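One visible difference between the two agent_1.conf files above is that the question's source lists channels (hdfs-channel, mysql-channel and so on) that are never declared in `cygnus-ngsi.channels`, while the answer's source lists only postgresql-channel. Below is a minimal Python sketch for spotting such mismatches. The helpers `parse_agent_conf` and `undeclared_source_channels` are hypothetical and not part of Cygnus or Flume, and whether this mismatch explains the connection refused error is not confirmed here.

```python
def parse_agent_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def undeclared_source_channels(settings, agent="cygnus-ngsi"):
    """List channels that a source writes to but the agent never declares."""
    declared = set(settings.get(f"{agent}.channels", "").split())
    undeclared = []
    for source in settings.get(f"{agent}.sources", "").split():
        used = settings.get(f"{agent}.sources.{source}.channels", "").split()
        undeclared.extend(name for name in used if name not in declared)
    return undeclared


# Shortened version of the question's configuration.
conf = """
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = postgresql-sink
cygnus-ngsi.channels = postgresql-channel
cygnus-ngsi.sources.http-source.channels = hdfs-channel mysql-channel postgresql-channel
"""

print(undeclared_source_channels(parse_agent_conf(conf)))
```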

Not able to connect to MongoDB with Auth - FIWARE Cygnus

Today we have been trying to put a Cygnus container into production, and we haven't been able to connect it to MongoDB. In our case, we installed MongoDB with the Auth flag and created different users in order to test that everything works.
However, we couldn't find a way to connect Cygnus. It tries to connect to the sth_default database, but it requires enough privileges to create other databases.
The workaround was to start the MongoDB service without the Auth flag, which let us check that everything works when the user has admin access without logging in. That is not how we would like to work, because it is insecure.
Are we missing anything?
Thanks in advance!
UPDATE
I'm adding the Cygnus agent.conf file here. I'm using the latest version of the Docker image (docker-ngsi: https://hub.docker.com/r/fiware/cygnus-ngsi/).
cygnus-ngsi.sources = http-source
# Using both, Mongo and Postgres sinks
cygnus-ngsi.sinks = mongo-sink postgresql-sink
cygnus-ngsi.channels = mongo-channel postgresql-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.channels = mongo-channel postgresql-channel
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /opt/apache-flume/conf/grouping_rules.conf
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
#cygnus-ngsi.sinks.mongo-sink.enable_encoding = false
#cygnus-ngsi.sinks.mongo-sink.enable_grouping = false
#cygnus-ngsi.sinks.mongo-sink.enable_name_mappings = false
#cygnus-ngsi.sinks.mongo-sink.enable_lowercase = false
#cygnus-ngsi.sinks.mongo-sink.data_model = dm-by-entity
#cygnus-ngsi.sinks.mongo-sink.attr_persistence = row
cygnus-ngsi.sinks.mongo-sink.mongo_hosts = MyIP:MyPort
cygnus-ngsi.sinks.mongo-sink.mongo_username = MyUsername
cygnus-ngsi.sinks.mongo-sink.mongo_password = MyPassword
#cygnus-ngsi.sinks.mongo-sink.db_prefix = sth_
#cygnus-ngsi.sinks.mongo-sink.collection_prefix = sth_
#cygnus-ngsi.sinks.mongo-sink.batch_size = 1
#cygnus-ngsi.sinks.mongo-sink.batch_timeout = 30
#cygnus-ngsi.sinks.mongo-sink.batch_ttl = 10
#cygnus-ngsi.sinks.mongo-sink.data_expiration = 0
#cygnus-ngsi.sinks.mongo-sink.collections_size = 0
#cygnus-ngsi.sinks.mongo-sink.max_documents = 0
#cygnus-ngsi.sinks.mongo-sink.ignore_white_spaces = true
Thanks

The following configuration lines are missing:
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
I.e. you have to specify the Java class implementing the MongoDB sink, and the channel that connects the source with such a sink.
If the configuration you are showing is the default one when Cygnus is installed through Docker, then the development team must be warned.
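The rule in this answer (every declared sink needs a `type` and a `channel`) can be checked mechanically. Below is a minimal Python sketch, assuming the agent configuration has already been loaded into a dictionary of key/value strings; the function `missing_sink_keys` is a hypothetical helper, not part of Cygnus. The sample settings follow the posted agent.conf, which declares both mongo-sink and postgresql-sink but shows `type` and `channel` lines only for the former.

```python
def missing_sink_keys(settings, agent="cygnus-ngsi"):
    """Map each declared sink to the required keys ('type', 'channel') it lacks."""
    problems = {}
    for sink in settings.get(f"{agent}.sinks", "").split():
        missing = [
            key
            for key in ("type", "channel")
            if not settings.get(f"{agent}.sinks.{sink}.{key}")
        ]
        if missing:
            problems[sink] = missing
    return problems


settings = {
    "cygnus-ngsi.sinks": "mongo-sink postgresql-sink",
    "cygnus-ngsi.sinks.mongo-sink.type": "com.telefonica.iot.cygnus.sinks.NGSIMongoSink",
    "cygnus-ngsi.sinks.mongo-sink.channel": "mongo-channel",
}

print(missing_sink_keys(settings))
```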

How to remove the following character "/" from service path

Good morning!
I have set up my FIWARE structure to save my historical records in MongoDB, using mLab as the hosting service.
I attach the configuration file of my agent. The problem is that, because of the mandatory "/" character in the service path, I cannot access the generated historical data, since that character is not allowed in MongoDB collection names.
agent_1.conf
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = mongo-sink
cygnus-ngsi.channels = mongo-channel
cygnus-ngsi.sources.http-source.channels = mongo-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /sevilla
cygnus-ngsi.sources.http-source.handler.events_ttl = 2
cygnus-ngsi.sources.http-source.interceptors = ts
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
cygnus-ngsi.sinks.mongo-sink.enable_encoding = false
cygnus-ngsi.sinks.mongo-sink.enable_grouping = false
cygnus-ngsi.sinks.mongo-sink.enable_name_mappings = false
cygnus-ngsi.sinks.mongo-sink.enable_lowercase = false
cygnus-ngsi.sinks.mongo-sink.data_model = dm-by-service-path
cygnus-ngsi.sinks.mongo-sink.attr_persistence = row
cygnus-ngsi.sinks.mongo-sink.mongo_hosts = ds******.mlab.com:35866
cygnus-ngsi.sinks.mongo-sink.mongo_username = my_user
cygnus-ngsi.sinks.mongo-sink.mongo_password = ********
cygnus-ngsi.sinks.mongo-sink.db_prefix = sth_
cygnus-ngsi.sinks.mongo-sink.collection_prefix = sth_
cygnus-ngsi.sinks.mongo-sink.batch_size = 1
cygnus-ngsi.sinks.mongo-sink.batch_timeout = 30
cygnus-ngsi.sinks.mongo-sink.batch_ttl = 10
cygnus-ngsi.sinks.mongo-sink.data_expiration = 0
cygnus-ngsi.sinks.mongo-sink.collections_size = 0
cygnus-ngsi.sinks.mongo-sink.max_documents = 0
cygnus-ngsi.sinks.mongo-sink.ignore_white_spaces = true
cygnus-ngsi.channels.mongo-channel.type = com.telefonica.iot.cygnus.channels.CygnusMemoryChannel
cygnus-ngsi.channels.mongo-channel.capacity = 1000
cygnus-ngsi.channels.mongo-channel.transactionCapacity = 100
Is there any way for Cygnus to remove the "/" character from the service path?
Error: http://www.subirimagenes.com/imagedata.php?url=http://s2.subirimagenes.com/imagen/9827048captura-de-pantalla.png
SOLUTION: You just have to set the encoding to true in the agent configuration:
cygnus-ngsi.sinks.mongo-sink.enable_encoding = true
Thank you very much!
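To illustrate why turning encoding on removes the "/" from the collection name, here is a minimal Python sketch. It assumes a scheme in which each character other than a letter, digit or underscore is replaced by `x` followed by its four-digit hexadecimal code point; that scheme and the helper `encode_name` are assumptions made for illustration, so check the Cygnus documentation for the exact rules applied by `enable_encoding`.

```python
import re


def encode_name(name):
    """Replace each character outside [A-Za-z0-9_] with 'x' plus 4 hex digits."""
    return re.sub(
        r"[^A-Za-z0-9_]",
        lambda match: f"x{ord(match.group()):04x}",
        name,
    )


# collection_prefix and default_service_path are taken from the configuration above.
collection = "sth_" + encode_name("/sevilla")
print(collection)
```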

Table names are encoded incorrectly

I'm using Grails 2.4.3 on Tomcat 7.0 with PostgreSQL 9.4. I have a domain object called Iteration. If I run Grails without Tomcat, the iteration table is created. But when I run the WAR inside Tomcat, a table named ıteration is created instead of iteration.
I did not set anything in the Tomcat configuration files or the Tomcat service to enable UTF-8 encoding.
What could cause this problem?
EDIT: Here are my production settings in DataSource.groovy:
production {
    dataSource {
        dbCreate = ""
        url = "jdbc:postgresql://localhost:5432/db"
        driverClassName = "org.postgresql.Driver"
        username = "postgres"
        password = "password"
        dialect = "net.kaleidos.hibernate.PostgresqlExtensionsDialect"
        logsql = false
        properties {
            jmxEnabled = true
            initialSize = 5
            maxActive = 50
            minIdle = 5
            maxIdle = 25
            maxWait = 10000
            maxAge = 10 * 60000
            timeBetweenEvictionRunsMillis = 1800000
            minEvictableIdleTimeMillis = 1800000
            validationQuery = "SELECT 1"
            validationQueryTimeout = 3
            validationInterval = 15000
            testOnBorrow = true
            testWhileIdle = true
            testOnReturn = false
            jdbcInterceptors = "ConnectionState"
        }
    }
}
