I've added the following line to my server.properties file:
confluent.support.metrics.enable=false
However, when KSQL starts up, it spits out the following:
Please note that the version check feature of KSQL is enabled.
...
By proceeding with `confluent.support.metrics.enable=true`, you agree to
all such collection, transfer and use of Version information by Confluent.
You can turn the version check feature off by setting
`confluent.support.metrics.enable=false` in the KSQL configuration and
restarting the KSQL. See the Confluent Platform documentation for further information.
I know this properties file is being read and parsed by KSQL because it's pulling the other configs (like broker info) and reading that just fine. It's basically just ignoring my request to turn off metric collection. Any idea on how to actually turn this off?
This appears to have been a bug in KSQL. It was fixed here:
https://github.com/confluentinc/ksql/pull/2948
I have configured a PostgreSQL CDC Connector in Confluent, which is connected to an AWS RDS instance.
The messages from each table are being streamed into the topics but the structure for the JSON is
However, I was expecting something like this structure according to the docs (specifically with the before, after, and op fields)
I have tried setting the REPLICA IDENTITY to FULL, according to the docs, but it's still not working.
Any idea how I can get these fields?
I figured out that I had not set the after-state only parameter in Advanced Configuration to false.
I have been testing with Kafka Connect, but for every connector I have to go and read its documentation to understand the configuration it needs. From what I have read in the Kafka Connect API documentation, there are two APIs for getting connector-related data.
GET /connector-plugins - return a list of connector plugins installed in the Kafka Connect cluster. Note that the API only checks for connectors on the worker that handles the request, which means you may see inconsistent results, especially during a rolling upgrade if you add new connector jars.
PUT /connector-plugins/{connector-type}/config/validate - validate the provided configuration values against the configuration definition. This API performs per config validation, returns suggested values and error messages during validation.
The rest of the APIs relate to connectors that have already been created. Is there any way to get the configuration for the required connectors?
Is there any way to get the configuration for the required connectors
The validate endpoint does exactly that, and is what the Landoop Kafka Connect UI uses to provide errors for missing/misconfigured properties.
The implementation details of how properties become required depend on the Importance level of the connector configuration, and for any non-high importance configs, referring to the documentation or source code (if available) would be best.
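To make the validate call concrete, here is a minimal Python sketch. It builds the PUT request and pulls the required properties and error messages out of a response. The worker URL, the example connector config, and the `SAMPLE_RESPONSE` dict are assumptions for illustration: the sample is hand-written and trimmed to the fields used here, not captured from a real worker.

```python
import json
from urllib.request import Request


def build_validate_request(base_url, config):
    """Build (but do not send) the PUT request for the validate endpoint.

    The body is the flat connector config and must contain 'connector.class'.
    """
    plugin = config["connector.class"]
    return Request(
        url=f"{base_url}/connector-plugins/{plugin}/config/validate",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def summarize_validation(response):
    """Return (names of required properties, {property: [error messages]})."""
    required = []
    errors = {}
    for entry in response.get("configs", []):
        definition = entry.get("definition", {})
        value = entry.get("value", {})
        name = definition.get("name") or value.get("name")
        if definition.get("required"):
            required.append(name)
        if value.get("errors"):
            errors[name] = value["errors"]
    return required, errors


# Hand-written, trimmed sample in the shape the endpoint returns (illustrative only).
SAMPLE_RESPONSE = {
    "name": "FileStreamSinkConnector",
    "error_count": 1,
    "configs": [
        {
            "definition": {"name": "name", "required": True, "importance": "HIGH"},
            "value": {"name": "name", "value": None,
                      "errors": ["illustrative: missing required configuration"]},
        },
        {
            "definition": {"name": "file", "required": False, "importance": "HIGH"},
            "value": {"name": "file", "value": "/tmp/out.txt", "errors": []},
        },
    ],
}

if __name__ == "__main__":
    req = build_validate_request(
        "http://localhost:8083",  # assumed worker address
        {"connector.class": "FileStreamSinkConnector", "topics": "test-topic"},
    )
    print(req.get_method(), req.full_url)
    print(summarize_validation(SAMPLE_RESPONSE))
```

Sending the request with `urllib.request.urlopen(req)` and passing the decoded JSON body to `summarize_validation` would give the same kind of summary against a live worker.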
I'm developing my own Kafka sink connector. My deserializer is JsonConverter. However, when someone sends malformed JSON data to my connector's topic, I want to skip that record and send it to a specific topic of my company.
My confusion is: I can't find any API that gives me my Connect worker's bootstrap.servers. (I know it's in Confluent's etc directory, but hard-coding the path to "connect-distributed.properties" just to read bootstrap.servers is not a good idea.)
So my question is: is there another, more convenient way to get the value of bootstrap.servers in my connector program?
Instead of trying to send the "bad" records from a SinkTask to Kafka yourself, you should use the dead letter queue feature that was added in Kafka Connect 2.0.
You can configure the Connect runtime to automatically dump records that failed to be processed to a configured topic acting as a DLQ.
For more details, see the KIP that added this feature.
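As a rough sketch of what that looks like in a sink connector's configuration, the helper below adds the error-handling properties to an existing config. The `errors.*` property names are the ones Kafka Connect defines for this feature. The connector class, topic names, and the helper function itself are hypothetical. As far as I know, in Kafka 2.0 the DLQ covers failures in converters and transformations (such as unparseable JSON), not exceptions thrown inside your own `put()`.

```python
def with_dead_letter_queue(connector_config, dlq_topic, replication_factor=3):
    """Return a copy of a sink connector config with DLQ error handling enabled."""
    config = dict(connector_config)
    config.update({
        # Keep running when a record fails conversion or transformation.
        "errors.tolerance": "all",
        # Topic that receives the failed records.
        "errors.deadletterqueue.topic.name": dlq_topic,
        "errors.deadletterqueue.topic.replication.factor": str(replication_factor),
        # Add headers describing why and where each record failed.
        "errors.deadletterqueue.context.headers.enable": "true",
    })
    return config


if __name__ == "__main__":
    base = {
        "connector.class": "com.example.MySinkConnector",  # hypothetical class
        "topics": "orders",                                 # hypothetical topic
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    }
    for key, value in with_dead_letter_queue(base, "orders-dlq", 1).items():
        print(f"{key}={value}")
```

The resulting dict can be submitted as the connector's config through the Connect REST API, or written out as properties for a standalone worker.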
I am using Kafka confluent-4.1.1. I created a few topics and it worked well, but after I restart Confluent I no longer see the previously created topics.
I tried the suggestions mentioned in the post:
Kafka topic no longer exists after restart
But no luck. Has anybody faced the same issue? Do I need to change any configuration?
Thanks in advance.
What configuration changes do I need to do in order to persist?
confluent start will use the CONFLUENT_CURRENT environment variable for all its data storage. If you export this to a static location, data should, in theory, persist across reboots.
Otherwise, the standard way to run each component individually, as you would in a production environment (e.g. zookeeper-server-start, kafka-server-start, schema-registry-start, etc.), will persist data according to the settings you've given in their respective configuration files.
We need to export production data from a Kafka topic to use it for testing purposes: the data is written in Avro and the schema is placed on the Schema registry.
We tried the following strategies:
Using kafka-console-consumer with StringDeserializer or BinaryDeserializer. We were unable to obtain a file which we could parse in Java: we always got exceptions when parsing it, suggesting the file was in the wrong format.
Using kafka-avro-console-consumer: it generates JSON which also includes some raw bytes, for example when deserializing BigDecimal. We didn't even know which parsing option to choose (it is not Avro, it is not JSON).
Other unsuitable strategies:
Deploying a special Kafka consumer would require us to package and place that code on some production server, since we are talking about our production cluster. It would just take too long. After all, isn't the Kafka console consumer already a consumer with configurable options?
Potentially suitable strategies
Using a Kafka Connect sink. We didn't find a simple way to reset the consumer offset, since apparently the consumer created by the connector is still active even after we delete the sink.
Isn't there a simple, easy way to dump the content of the values (not the schema) of a Kafka topic containing Avro data to a file so that it can be parsed? I expect this to be achievable using kafka-console-consumer with the right options, plus the correct Avro Java API.
for example, using kafka-console-consumer... We were unable to obtain a file which we could parse in Java: we always got exceptions when parsing it, suggesting the file was in the wrong format.
You wouldn't use the regular console consumer. You would use kafka-avro-console-consumer, which deserializes the binary Avro data into JSON for you to read on the console. You can redirect the output to a file with > topic.txt to read it later.
If you did use the regular console consumer, you can't parse the Avro immediately: you first need to extract the schema ID from each message (the 4 bytes after the first "magic byte"), then use the Schema Registry client to retrieve the schema, and only then can you deserialize the message. Any Avro library you use to read the file that the console consumer writes expects one complete schema in the file header, not just an ID on every line pointing into the registry. (The basic Avro library doesn't know anything about the registry either.)
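To illustrate the framing described above, here is a small Python sketch that splits one raw message value into its schema ID and the Avro-encoded payload. It assumes you have the exact bytes of a single message value (not a line of text from the console consumer), and the function name is my own. Decoding the payload still requires fetching the schema for that ID from the registry, which is not shown.

```python
import struct

MAGIC_BYTE = 0  # first byte of every Schema Registry framed message


def split_confluent_avro(message: bytes):
    """Split a raw message value into (schema_id, avro_payload).

    Layout: 1 magic byte, a 4-byte big-endian schema ID, then the Avro binary body.
    """
    if len(message) < 5:
        raise ValueError("message too short to contain magic byte and schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


if __name__ == "__main__":
    # Hand-built example: schema ID 42 followed by three arbitrary payload bytes.
    raw = b"\x00" + (42).to_bytes(4, "big") + b"\x02\x06\x66"
    schema_id, payload = split_confluent_avro(raw)
    print(schema_id, payload)
```

This is also why a plain Avro file reader chokes on the dump: it expects an Avro container file header, not this five-byte prefix on every record.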
The only thing configurable about the console consumer is the formatter and the registry. You can add decoders by additionally exporting them into the CLASSPATH
in such a format that you can re-read it from Java?
Why not just write a Kafka consumer in Java? See Schema Registry documentation
package and place that code in some production server
Not entirely sure why this is a problem. If you could SSH proxy or VPN into the production network, then you don't need to deploy anything there.
How do you export this data
Since you're using the Schema Registry, I would suggest using one of the Kafka Connect libraries.
Included ones are for Hadoop, S3, Elasticsearch, and JDBC. I think there's a FileStreamSink connector as well.
We didn't find a simple way to reset the consumer offset
The connector name controls whether a new consumer group is formed in distributed mode. You only need a single consumer, so I would suggest a standalone connector, where you can set the offset.storage.file.filename property to control how the offsets are stored.
KIP-199 discusses resetting consumer offsets for Connect, but the feature isn't implemented.
However, did you see "Kafka 0.11 how to reset offsets"?
Alternative options include Apache NiFi or StreamSets; both integrate with the Schema Registry and can parse Avro data to transport it to numerous systems.
One option to consider, along with cricket_007's, is to simply replicate data from one cluster to another. You can use Apache Kafka MirrorMaker to do this, or Replicator from Confluent. Both let you select which topics are replicated from one cluster to another, such as to a test environment.