Cannot use csvSftpConnector with schema registry - apache-kafka

I'm trying to configure a standalone producer to read csv files from a sftp server and send data to a topic on the cloud.
So far I succeeded in reading my csv data from the file and parsing it according to my value.schema.
But now instead of using a fixed configuration schema, I'd like to use the schema registry. So I configured an AVRO schema for my test topic on the confluent cloud, generated the API key/secret and updated my config files.
I can see that connection is working fine, no authentication errors, via cli I can access the test schema, but when I try to run the producer I get the following error:
[2021-09-20 16:39:53,442] INFO SftpCsvSourceConnectorConfig values:
batch.size = 1000
behavior.on.error = IGNORE
cleanup.policy = NONE
csv.case.sensitive.field.names = false
csv.escape.char = 92
csv.file.charset = UTF-8
csv.first.row.as.header = false
csv.ignore.leading.whitespace = true
csv.ignore.quotations = false
csv.keep.carriage.return = false
csv.null.field.indicator = NEITHER
csv.quote.char = 34
csv.rfc.4180.parser.enabled = false
csv.separator.char = 44
csv.skip.lines = 0
csv.strict.quotes = false
csv.verify.reader = true
empty.poll.wait.ms = 250
error.path = /home/alberto/opt/confluent-6.2.0/sftp2/error
file.minimum.age.ms = 0
finished.path = /home/alberto/opt/confluent-6.2.0/sftp2/finished
input.file.pattern = .*.csv
input.path = /home/alberto/opt/confluent-6.2.0/sftp2/data
kafka.topic = testSchema
kerberos.keytab.path =
kerberos.user.principal =
key.schema = {"name" : "com.example.users.UserKey","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true}}}
parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
parser.timestamp.timezone = UTC
processing.file.extension = .PROCESSING
proxy.password = [hidden]
proxy.username =
schema.generation.enabled = false
schema.generation.key.fields = []
schema.generation.key.name = defaultkeyschemaname
schema.generation.value.name = defaultvalueschemaname
sftp.host = 192.168.1.6
sftp.password = [hidden]
sftp.port = 22
sftp.proxy.url =
sftp.username = user
timestamp.field =
timestamp.mode = PROCESS_TIME
tls.passphrase = [hidden]
tls.pemfile =
tls.private.key = [hidden]
tls.public.key = [hidden]
value.schema =
...
Caused by: org.apache.kafka.common.config.ConfigException: Both configs key.schema and value.schema must be set if schema.generation.enabled is false, but key.schema was not null and value.schema was null.
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.validateSchema(SftpSourceConnectorConfig.java:181)
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.<init>(SftpSourceConnectorConfig.java:121)
at io.confluent.connect.sftp.source.SftpCsvSourceConnectorConfig.<init>(SftpCsvSourceConnectorConfig.java:157)
at io.confluent.connect.sftp.SftpCsvSourceConnector.start(SftpCsvSourceConnector.java:44)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:184)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:209)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:348)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:331)
... 7 more
If I set schema.generation.enabled to true, it seems that it creates an empty schema:
value.schema = {"type":"STRUCT","isOptional":false,"fieldSchemas":{}}
and then I get:
org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic testSchema : Schema being registered is incompatible with an earlier schema for subject "testSchema-value"; error code: 409; error code: 409
as if it's trying to register a schema, except that it's not what I want, I just need to fetch the schema from the registry and use it.
If anyone need any addition information regarding the configuration I'll happy to provide.

Related

ExtractField$Key doesn't work with MySqlCdcSource in Confluent Cloud

I'm creating the following connector:
CREATE
SOURCE CONNECTOR IF NOT EXISTS `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'MySqlCdcSource',
"tasks.max" = 1,
--Database config--------------------------
"database.hostname" = '${dbHost}',
"database.port"....
--Kafka config------------------------------
"kafka.api.key" = '${kafkaApiKey}',
"kafka.auth.mode"...
--Connector behavior------------------------
"output.data.format" = 'AVRO',
"output.key.format" = 'STRING',
"key.converter" = 'org.apache.kafka.connect.storage.StringConverter',
"key.converter.schemas.enable" = true,
"tombstones.on.delete" = true,
"null.handling.mode" = 'keep',
"include.schema.changes" = false,
"table.include.list" = 'sandbox\.(xxx|yyy)',
"errors.tolerance" = 'none',
"errors.log.enable" = true,
"errors.log.include.messages" = true,
--Topics configuration----------------------
"topic.creation.default.cleanup.policy" = 'compact',
"topic.creation.default.min.insync.replicas"...
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
);
In local development I'm working with io.debezium.connector.mysql.MySqlConnector and the transformation works correctly, the problem is when I make de creation in Confluent Cloud (using MySqlCdcSource). It gives me the following error:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
The problem is that it doesn't find the field Id, but when I make a test without the transformation, I see this key in the topic: Struct(id=0000), so the field exists. I suppose it should be related with types or some configuration option is missed in my connector. Any ideas?
The problem was related with a wrong usage of a predicate. Take a look at the following part:
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
As you can see negate is being used, so extractKey transforms includes all the tables that doesn't match with it. Despite that I was using "table.include.list", some topics where included additionally and they didn't have id field.
In my case the confusion was motivated because of the message in log:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
(No information about the involved topic) I was supposing that the message was talking about any of the both allowed tables ('zzz' or 'ooo') but in the end it was another third topic included.

How to use the useBulkCopyForBatchInsert on the JdbcSinkConnector?

I’m trying to bulk insert to the mssql db table by adding “useBulkCopyForBatchInsert=true” to the connection.url option of Jdbcsinkconnector as below.
"connection.url": "jdbc:sqlserver://...:1433;database=****;useBulkCopyForBatchInsert=true"
But data is not being inserted using bulk insert.
I will attach the connect log and reference document.
Using bulk copy API for batch insert operation
https://learn.microsoft.com/en-us/sql/connect/jdbc/use-bulk-copy-api-batch-insert-operation?view=sql-server-ver16
Connect Log
[2022-07-18 16:46:32,224] INFO JdbcSinkConfig values:
auto.create = false
auto.evolve = false
batch.size = 3000
connection.attempts = 3
connection.backoff.ms = 10000
connection.password = [hidden]
connection.url = jdbc:sqlserver://...:1433;database=****;useBulkCopyForBatchInsert=true
connection.user = ****
db.timezone = Asia/Seoul
delete.enabled = false
dialect.name =
fields.whitelist = []
insert.mode = insert
max.retries = 10
pk.fields = []
pk.mode = none
quote.sql.identifiers = ALWAYS
retry.backoff.ms = 3000
table.name.format = ****
table.types = [TABLE]
(io.confluent.connect.jdbc.sink.JdbcSinkConfig:361)
I'm not sure if this is the issue, but you're using the wrong JDBC URL for SQL Server. You should use jdbc:sqlserver://...:1433;databaseName=; instead of jdbc:sqlserver://...:1433;database=;.

bulk processor "monstache" was unable to perform work: elastic: Error 400 (Bad Request): Validation Failed: 1: type is missing;

I am trying to integrate mongoDB and elasticsearch using monstache but I am facing this error. Please help me solve it out.
I will response with all the output you want.
config.toml file
mongo-url = "mongodb+srv://prince:mypassword#cluster0.mp297.mongodb.net/?retryWrites=true&w=majority"
elasticsearch-urls = ["http://127.0.0.1:9200"]
elasticsearch-max-conns = 10
replay = false
resume = true
enable-oplog = true
resume-name = "default"
namespace-regex = '^Satellite\.posts$'
direct-read-namespaces = ["Satellite.posts"]
change-stream-namespaces = ["Satellite."]
index-as-update = true
verbose = true
exit-after-direct-reads = false
[[mapping]]
namespace = "Satellite.posts"
index = "satellite"

Not able to connect to MongoDB with Auth - FIWARE Cygnus

We have been trying today to put a Cygnus container in production and we haven't been able to connect it to MongoDB. In our case, we have installed MongoDB with the Auth flag, and we created different users in order to test everything work.
However, we didn't find out the way to connect Cygnus. It tries to connect to the sth_default database, but the it requires enough privileges to create other databases.
The workaround was to start the MongoDB service without the Auth flag, allowing us to check that everything worked when the user can access with admin user without login in, which is not the way we would like to work, due to the fact that it is insecure.
Are we missing anything?
Thanks in advance!
UPDATE
I'm adding here the Cygnus agent.conf file. Moreover, I'm using the Docker Image (docker-ngsi: https://hub.docker.com/r/fiware/cygnus-ngsi/) in its latest version.
cygnus-ngsi.sources = http-source
# Using both, Mongo and Postgres sinks
cygnus-ngsi.sinks = mongo-sink postgresql-sink
cygnus-ngsi.channels = mongo-channel postgresql-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.channels = mongo-channel postgresql-channel
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /opt/apache-flume/conf/grouping_rules.conf
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
#cygnus-ngsi.sinks.mongo-sink.enable_encoding = false
#cygnus-ngsi.sinks.mongo-sink.enable_grouping = false
#cygnus-ngsi.sinks.mongo-sink.enable_name_mappings = false
#cygnus-ngsi.sinks.mongo-sink.enable_lowercase = false
#cygnus-ngsi.sinks.mongo-sink.data_model = dm-by-entity
#cygnus-ngsi.sinks.mongo-sink.attr_persistence = row
cygnus-ngsi.sinks.mongo-sink.mongo_hosts = MyIP:MyPort
cygnus-ngsi.sinks.mongo-sink.mongo_username = MyUsername
cygnus-ngsi.sinks.mongo-sink.mongo_password = MyPassword
#cygnus-ngsi.sinks.mongo-sink.db_prefix = sth_
#cygnus-ngsi.sinks.mongo-sink.collection_prefix = sth_
#cygnus-ngsi.sinks.mongo-sink.batch_size = 1
#cygnus-ngsi.sinks.mongo-sink.batch_timeout = 30
#cygnus-ngsi.sinks.mongo-sink.batch_ttl = 10
#cygnus-ngsi.sinks.mongo-sink.data_expiration = 0
#cygnus-ngsi.sinks.mongo-sink.collections_size = 0
#cygnus-ngsi.sinks.mongo-sink.max_documents = 0
#cygnus-ngsi.sinks.mongo-sink.ignore_white_spaces = true
Thanks
The following configuration lines are missing:
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
I.e. you have to specify the Java class implementing the MongoDB sink, and the channel that connects the source with such a sink.
If the configuration you are showing is the default one when Cygnus is installed through Docker, then the development team must be warned.

How to remove the following character "/" from service path

Good Morning!
Currently I have set up my structure in Fiware saving my historical records in MongoDB, for this I have been using Mlab as a hosting.
I attache the configuration file of my agent, the problem comes in that due to the mandatory character "/" of the service path I can not access the generated historical data, since it is a character not allowed for collections in MongoDB.
agent_1.conf
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = mongo-sink
cygnus-ngsi.channels = mongo-channel
cygnus-ngsi.sources.http-source.channels = mongo-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /sevilla
cygnus-ngsi.sources.http-source.handler.events_ttl = 2
cygnus-ngsi.sources.http-source.interceptors = ts
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMongoSink
cygnus-ngsi.sinks.mongo-sink.channel = mongo-channel
cygnus-ngsi.sinks.mongo-sink.enable_encoding = false
cygnus-ngsi.sinks.mongo-sink.enable_grouping = false
cygnus-ngsi.sinks.mongo-sink.enable_name_mappings = false
cygnus-ngsi.sinks.mongo-sink.enable_lowercase = false
cygnus-ngsi.sinks.mongo-sink.data_model = dm-by-service-path
cygnus-ngsi.sinks.mongo-sink.attr_persistence = row
cygnus-ngsi.sinks.mongo-sink.mongo_hosts = ds******.mlab.com:35866
cygnus-ngsi.sinks.mongo-sink.mongo_username = my_user
cygnus-ngsi.sinks.mongo-sink.mongo_password = ********
cygnus-ngsi.sinks.mongo-sink.db_prefix = sth_
cygnus-ngsi.sinks.mongo-sink.collection_prefix = sth_
cygnus-ngsi.sinks.mongo-sink.batch_size = 1
cygnus-ngsi.sinks.mongo-sink.batch_timeout = 30
cygnus-ngsi.sinks.mongo-sink.batch_ttl = 10
cygnus-ngsi.sinks.mongo-sink.data_expiration = 0
cygnus-ngsi.sinks.mongo-sink.collections_size = 0
cygnus-ngsi.sinks.mongo-sink.max_documents = 0
cygnus-ngsi.sinks.mongo-sink.ignore_white_spaces = true
cygnus-ngsi.channels.mongo-channel.type = com.telefonica.iot.cygnus.channels.CygnusMemoryChannel
cygnus-ngsi.channels.mongo-channel.capacity = 1000
cygnus-ngsi.channels.mongo-channel.transactionCapacity = 100
Is there any way for Cygnus to remove the "/" character from the service path?
Error: http://www.subirimagenes.com/imagedata.php?url=http://s2.subirimagenes.com/imagen/9827048captura-de-pantalla.png
SOLUTION: You just have to change the enconding to true in the agent configuration
cygnus-ngsi.sinks.mongo-sink.enable_encoding = true
Thank you very much!