How to use the Kafka JDBC Sink Connector with FME

I have already set up the Kafka JDBC Sink Connector, which consumes the data sent through the Kafka producer API. Now I want to set up FME to handle the data side and sink it to the database, where it will interact with a GIS (geographic information system) and stream the spatial data. I do not have much knowledge of FME, so is there any documentation, or can anyone explain how to set up FME with the Kafka JDBC Sink Connector?
Thank you

The FME connector appears to be a plain Kafka producer/consumer and has no relation to the Kafka Connect API: https://docs.safe.com/fme/2019.1/html/FME_Desktop_Documentation/FME_Transformers/Transformers/kafkaconnector.htm
You also wouldn't "set it up with the JDBC connector". The sink writes to the database, so FME would need to read from that database, or bypass Kafka Connect altogether and use FME's own Kafka consumer support.
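If you do keep the JDBC sink in the picture, its configuration is just a mapping from a topic to a database table; a minimal sketch, assuming for illustration a PostgreSQL/PostGIS target with placeholder topic, host, and credential names, could look like:

name=spatial-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
# topic written by your producer API (placeholder name)
topics=spatial-data
# target database that FME would read from (placeholder URL and credentials)
connection.url=jdbc:postgresql://db-host:5432/gis
connection.user=kafka
connection.password=kafka-secret
# create the target table from the record schema if it does not exist
auto.create=true

The stock JDBC sink only knows standard SQL column types, so turning the written columns into proper geometries would be something to handle on the FME or database side.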

Related

Kafka Connect or Kafka Streams?

I have a requirement to read messages from a topic, enrich each message based on provided configuration (the data required for enrichment is sourced from external systems), and publish the enriched message to an output topic. Messages on both the source and output topics should be in Avro format.
Is this a good use case for a custom Kafka Connector or should I use Kafka Streams?
Why am I considering Kafka Connect?
Lightweight in terms of code and deployment
Configuration driven
Connection and error handling
Scalability
I like the plugin-based approach in Connect. If there is a new type of message that needs to be handled, I just deploy a new connector without having to deploy a full-scale Java app.
Why am I not sure this is a good candidate for Kafka Connect?
Calls to external system
Can Kafka be both source and sink for a connector?
Can we use Avro schemas in connectors?
Performance under load
Cannot do stateful processing (currently there is no requirement)
I have experience with Kafka Streams but not with Connect
Use both?
Use Kafka Connect to source external database into a topic.
Use Kafka Streams to build that topic into a stream/table that can then be manipulated.
Use Kafka Connect to sink back into a database, or any system other than Kafka, as necessary.
Kafka Streams can also be config-driven, can use plugins (i.e. via reflection), is just as scalable, and connects to Kafka in the same way. Performance should be similar. Error handling is really the only complex part. ksqlDB is entirely "config driven" via SQL statements, and can connect to an external Connect cluster or embed its own.
Avro works for both, yes.
Some connectors are temporarily stateful because they build in-memory batches, such as the S3 or JDBC sink connectors.
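If you do go the Streams route for the enrichment step, a minimal sketch could look like the following; the topic names, broker and Schema Registry addresses, and the enrich() lookup are placeholders rather than anything from the question:

import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import java.util.Properties;

public class EnrichmentApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "enrichment-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");               // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class); // Avro in and out
        props.put("schema.registry.url", "http://localhost:8081");                         // placeholder registry

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, GenericRecord>stream("input-topic")   // Avro messages from the source topic
               .mapValues(EnrichmentApp::enrich)               // per-message enrichment
               .to("enriched-topic");                          // Avro messages to the output topic

        new KafkaStreams(builder.build(), props).start();
    }

    // placeholder: look up the extra fields in the external system and merge them into the record
    private static GenericRecord enrich(GenericRecord value) {
        return value;
    }
}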

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some transformations, into another MySQL instance using a Kafka Connect source connector?
Is there also a way to propagate schema changes if I use Kafka's Python library (confluent_kafka) to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built on the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use the Kafka Connect JDBC sink connector.
Here is an old post on this topic - https://debezium.io/blog/2017/09/25/streaming-to-another-database/
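To sketch the sink side (the topic name, connection details, and key handling below are placeholders, and note that Debezium's change-event envelope usually has to be flattened first, e.g. with its ExtractNewRecordState transform):

name=mysql-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
# Debezium topic to sink (placeholder server.database.table name)
topics=dbserver1.inventory.customers
# target MySQL instance (placeholder URL and credentials)
connection.url=jdbc:mysql://target-mysql:3306/inventory
connection.user=kafka
connection.password=kafka-secret
auto.create=true
insert.mode=upsert
pk.mode=record_key
# flatten Debezium's change-event envelope into a plain row
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState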

Kafka Streams application integrate with Kafka JDBC sink connector

I am trying to use Kafka Streams for some computation and send the result to a topic that is sinked to a database by the JDBC sink connector. The result needs to be serialized as Avro using Confluent Schema Registry. Is there any demo or guide showing how to handle this scenario?
Not clear what you mean by "integrate"; Kafka Streams is independent of Kafka Connect, however both can be used from ksqlDB.
The existing Kafka Connect examples should be adequate; just point the JDBC sink connector at the output topic of your Streams application.
As for Kafka Streams, you'd need to use the Confluent Avro Serdes and add the Schema Registry URL to the StreamsConfig.
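A minimal sketch of that configuration, assuming the Confluent kafka-streams-avro-serde dependency and placeholder broker/registry addresses:

import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import java.util.Properties;

public class StreamsAvroConfig {
    // only serialization-related settings are shown; broker and registry URLs are placeholders
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "computation-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        // the Confluent Avro Serde writes Avro and registers schemas in Schema Registry,
        // which is what the sink connector's AvroConverter expects to find on the output topic
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }
}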

Is Kafka Connect JDBC Source connector idempotent?

Reading the documentation for this connector, I can't find any mention of this characteristic.
So, does this connector guarantee that it won't produce duplicated records under broker crashes or whatever could happen?
Do we have to configure something to get idempotence, the same way we would with any other Kafka producer (enable.idempotence: true)?
The Kafka Connect JDBC source connector is not idempotent at the moment. Here's the relevant KIP-318 and JIRA ticket.

How to fetch Kafka source connector schema based on connector name

I am using the Confluent JDBC Kafka connector to publish messages to a topic. The source connector sends data to the topic along with a schema on each poll. I want to retrieve this schema.
Is it possible? How? Can anyone advise?
My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.
The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL.
You can use Avro by configuring Kafka Connect to use the AvroConverter. In your Kafka Connect worker configuration, set:
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
(Update schema-registry to the hostname where your Schema Registry is running. The value converter is the one that matters for VALUE_FORMAT='AVRO' below.)
From there, in KSQL you just use
CREATE STREAM my_stream WITH (KAFKA_TOPIC='source_topic', VALUE_FORMAT='AVRO');
You don't need to specify the schema itself here, because KSQL fetches it from the Schema Registry.
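If you want to check which schema KSQL picked up, you can describe the stream from the KSQL CLI (my_stream being the placeholder name used above):
DESCRIBE my_stream;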
You can read more about Converters and serialisers here.
Disclaimer: I work for Confluent, and wrote the referenced blog post.