How to configure test environment using kafka connect and XML config file - apache-kafka

I am new to Kafka. I wrote a piece of code that writes to a topic (a producer).
Now I was given the task of checking whether content is actually being written to that topic.
The only information provided by my tech lead was that I should install Kafka Connect and use this XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<connections>
<connection bootstrap_servers="xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096" broker_security_type="SASL_SSL" chroot="/" group="Clusters" groupId="1" host="xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com" jaas_config="org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";" keystore_location="" keystore_password="" keystore_privatekey="" name="Worten" port="9096" sasl_mechanism="SCRAM-SHA-512" schema_registry_endpoint="" truststore_location="" truststore_password="" version="VERSION_2_7_0"/>
<groups>
<group id="1" name="Clusters"/>
</groups>
</connections>
I have absolutely no idea where or how to import this XML config file. I installed Kafka and got it running locally, but all the config files are typically in this format:
$ cat config/connect-standalone.properties
partial output:
bootstrap.servers=xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
I tried adding the fields here, but many are missing. Any tips would be greatly welcome; I did research for a bit, but I can't find much that helped me.
Thank you!!!
I tried searching for anything that would allow me to start a local standalone consumer so I could see the topics I'm writing to.

Kafka Connect doesn't use XML files. It uses a Java .properties file only.
The properties file you have shown is missing the SASL_SSL values that are mentioned in the XML you were given.
The Kafka quickstart covers running Kafka Connect in standalone mode, and you can refer to the documentation for the configuration properties. In particular, the consumer.- and producer.-prefixed properties will need to be configured with SASL/SSL values, such as consumer.sasl.mechanism=SCRAM-SHA-512.
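As a rough sketch (the hostnames, USER, and PASSWD below are just the placeholders from your XML, and the exact set of properties depends on your cluster), the XML attributes translate into connect-standalone.properties entries along these lines:
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";
# the embedded consumer and producer need the same values
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=SCRAM-SHA-512
consumer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=SCRAM-SHA-512
producer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";
If the goal is only to watch what your producer writes, you don't strictly need Connect at all: kafka-console-consumer.sh --bootstrap-server <broker:9096> --topic <your-topic> --from-beginning --consumer.config client.properties, where client.properties contains the same security.protocol/sasl.* lines, is usually enough.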

Related

Create kafka topic using predefined config files

Is there any way to create Kafka topics in the Kafka/ZooKeeper configuration files before I run the services, so that once they start, the topics will be in place?
I have looked inside the bin/kafka-topics.sh script and found that, in the end, it executes a command against a live server. But since the server is here, its config files are here, and ZooKeeper with its configs is also here, is there a way to predefine topics in advance?
Unfortunately, I haven't found any existing config keys for this.
The servers need to be running in order to allocate metadata and log directories for the topics, so no.
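In practice, the usual workaround is to script the topic creation and run it right after the broker comes up, for example (a sketch; adjust the bootstrap server, partition count, and replication factor to your cluster, and on older brokers use --zookeeper instead of --bootstrap-server):
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --if-not-exists \
  --topic my-topic \
  --partitions 3 \
  --replication-factor 1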

Kafka not publishing file changes to topic

Reading: Kafka Connect FileStreamSource ignores appended lines
An answer from 2018 states:
Kafka Connect does not "watch" or "tail" a file. I don't believe it is documented anywhere that it does do that.
It seems Kafka does now support this, as https://docs.confluent.io/5.5.0/connect/managing/configuring.html#standalone-example states that the file is watched:
FileSource Connector
The FileSource Connector reads data from a file and sends it to Apache Kafka®. Beyond the configurations common to all connectors it takes only an input file and output topic as properties.
Here is an example configuration:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test
This connector will read only one file and send the data within that file to Kafka. It will then watch the file for appended updates only. Any modification of file lines already sent to Kafka will not be reprocessed.
My configuration is the same as in the question Kafka Connect FileStreamSource ignores appended lines.
connect-file-source.properties contains:
name=my-file-connector
connector.class=FileStreamSource
tasks.max=1
file=/data/users/zamara/suivi_prod/app/data/logs.txt
topic=connect-test
Starting Connect standalone with
connect-standalone connect-standalone.properties connect-file-source.properties
adds all the contents of the file logs.txt to the topic connect-test, but adding new lines to logs.txt does not add those lines to the topic. Is any configuration required to make Kafka watch the file so that new data appended to logs.txt is added to the topic connect-test?
Unless you're just experimenting with FileStreamSource for educational purposes, you're heading down a blind alley here. The connector only exists as a sample connector.
To ingest files into Kafka, use Kafka Connect Spooldir, Kafka Connect FilePulse, or look at something like Filebeat from Elastic.
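For illustration, a Spooldir source for plain-text log lines would look roughly like this (a sketch only: the connector class and property names are taken from the kafka-connect-spooldir project as I recall them and should be checked against its documentation, and the directory paths are made up):
name=spooldir-log-source
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirLineDelimitedSourceConnector
tasks.max=1
topic=connect-test
input.path=/data/users/zamara/suivi_prod/app/data/incoming
finished.path=/data/users/zamara/suivi_prod/app/data/finished
error.path=/data/users/zamara/suivi_prod/app/data/error
input.file.pattern=.*\.txt
Note that Spooldir ingests complete files and then moves them to finished.path rather than tailing them; for true tailing of a live log file, something like Filebeat is the better fit.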

Is it possible to log all incoming messages in Apache Kafka

I need to know if it is possible to configure logging for an Apache Kafka broker so that it writes all produced/consumed topics and their messages.
I've been looking at log4j.properties, but none of the suggested properties seems to do what I need.
Thanks in advance.
Looking at the log files generated by Kafka, none of them seems to log the messages written to the different topics.
UPDATE:
Not exactly what I was looking for, but for anyone looking for something similar, I found https://github.com/kafka-lens/kafka-lens, which provides a friendly GUI to view messages on different topics.
I feel like there's some confusion with the word "log".
As you're talking about log4j, I assume you mean what I'd call "application logs". Kafka does not write the records it handles to the application/log4j logs. In Kafka, the log4j logs are only used to trace errors and give some context about the work the brokers are doing.
On the other hand, Kafka writes/reads records into/from its "log", the Kafka log. These are stored in the path specified by log.dirs (/tmp/kafka-logs by default) and are not directly readable. You can use the DumpLogSegments tool to read these files if you want, for example:
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /tmp/kafka-logs/topic-0/00000000000000000000.log
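Depending on your Kafka version, you may also need a flag to print the actual payloads rather than just the batch metadata; as I recall (verify against your version's --help output), something like:
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /tmp/kafka-logs/topic-0/00000000000000000000.log \
  --print-data-log
On very old versions you may additionally need --deep-iteration to step into compressed batches. That said, if the goal is simply to see messages as they arrive, a plain console consumer on the topic is usually the easier route.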

How to dump avro data from Kafka topic and read it back in Java/Scala

We need to export production data from a Kafka topic to use it for testing purposes: the data is written in Avro and the schema is placed on the Schema registry.
We tried the following strategies:
Using kafka-console-consumer with a StringDeserializer or BinaryDeserializer. We were unable to obtain a file we could parse in Java: we always got exceptions when parsing it, suggesting the file was in the wrong format.
Using kafka-avro-console-consumer: it generates JSON which also includes some raw bytes, for example when deserializing BigDecimal values. We didn't even know which parsing option to choose (it is not Avro, it is not plain JSON).
Other unsuitable strategies:
Deploying a purpose-built Kafka consumer would require us to package and place that code on some production server, since we are talking about our production cluster. It just takes too long. After all, isn't the Kafka console consumer already a consumer with configurable options?
Potentially suitable strategies:
Using a Kafka Connect sink. We didn't find a simple way to reset the consumer offset, since apparently the connector-created consumer is still active even when we delete the sink.
Isn't there a simple, easy way to dump the content of the values (not the schema) of a Kafka topic containing Avro data to a file so that it can be parsed? I expect this to be achievable using kafka-console-consumer with the right options, plus the correct Avro Java API.
for example, using kafka-console-consumer... We were unable to obtain a file which we could parse in Java: we always got exceptions when parsing it, suggesting the file was in the wrong format.
You wouldn't use the regular console consumer. You would use kafka-avro-console-consumer, which deserializes the binary Avro data into JSON for you to read on the console. You can redirect the output to a file (> topic.txt) to read it later.
If you did use the regular console consumer, you can't parse the Avro immediately, because you still need to extract the schema ID from the data (the 4 bytes after the first "magic byte"), then use the Schema Registry client to retrieve the schema, and only then will you be able to deserialize the messages. Any Avro library you use to read this file as the console consumer writes it expects one entire schema to be placed at the header of the file, not just an ID pointing into the registry on every line. (The basic Avro library doesn't know anything about the registry either.)
The only things configurable about the console consumer are the formatter and the registry. You can add decoders by additionally exporting them onto the CLASSPATH.
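For example, a typical invocation looks something like this (a sketch; the broker and Schema Registry addresses are placeholders for your own):
kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic my-topic \
  --from-beginning \
  --property schema.registry.url=http://localhost:8081 \
  > topic.txt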
in such a format that you can re-read it from Java?
Why not just write a Kafka consumer in Java? See Schema Registry documentation
package and place that code in some production server
Not entirely sure why this is a problem. If you could SSH proxy or VPN into the production network, then you don't need to deploy anything there.
How do you export this data
Since you're using the Schema Registry, I would suggest using one of the Kafka Connect libraries
Included ones are for Hadoop, S3, Elasticsearch, and JDBC. I think there's a FileSink Connector as well
We didn't find a simple way to reset the consumer offset
The connector name controls whether a new consumer group is formed in distributed mode. You only need a single consumer, so I would suggest a standalone connector, where you can set the offset.storage.file.filename property to control how the offsets are stored.
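For instance, a standalone FileStreamSink run needs a worker file plus a connector file along these lines (a sketch; the Avro converter shown is Confluent's io.confluent.connect.avro.AvroConverter, and the topic name, URLs, and paths are placeholders):
# connect-standalone.properties (excerpt)
bootstrap.servers=localhost:9092
offset.storage.file.filename=/tmp/dump-connect.offsets
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
# file-sink.properties
name=topic-dump-sink
connector.class=FileStreamSink
tasks.max=1
topics=my-topic
file=/tmp/my-topic-dump.txt
Run it with connect-standalone connect-standalone.properties file-sink.properties. Bear in mind that FileStreamSink writes each record value's toString(), so the resulting file is human-readable but not round-trippable Avro.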
KIP-199 discusses resetting consumer offsets for Connect, but the feature isn't implemented.
However, did you see Kafka 0.11 how to reset offsets?
Alternative options include Apache NiFi or StreamSets, both of which integrate with the Schema Registry and can parse Avro data to transport it to numerous systems.
One option to consider, along with cricket_007's, is to simply replicate the data from one cluster to another. You can use Apache Kafka MirrorMaker to do this, or Replicator from Confluent. Both give the option of selecting certain topics to be replicated from one cluster to another, such as a test environment.
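With the classic MirrorMaker tool, replicating a single topic looks roughly like this (a sketch; the two .properties files are assumed to point at the source and target clusters respectively):
bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster.properties \
  --producer.config target-cluster.properties \
  --whitelist "my-topic"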

Exporting data from volt to kafka

We are trying to do a POC where we export data from a VoltDB table to Kafka. Below are the steps I followed:
Step 1: prepared the deployment.xml to enable export to Kafka
<?xml version="1.0"?>
<deployment>
<cluster hostcount="1" kfactor="0" schema="ddl" />
<httpd enabled="true">
<jsonapi enabled="true" />
</httpd>
<export enabled="true" target="kafka">
<configuration>
<property name="metadata.broker.list">localhost:9092</property>
<property name="batch.mode">false</property>
</configuration>
</export>
</deployment>
Step 2: then started the VoltDB server
./voltdb create -d deployment-noschema.xml --zookeeper=2289
Step 3: created an export-only table and inserted some data into it
create table test(x int);
export table test;
insert into test values(1);
insert into test values(2);
After this I tried to verify whether any topic had been created in Kafka, but there was none.
./kafka-topics.sh --list --zookeeper=localhost:2289
Also, I can see all of the data being logged in the exportoverflow directory. Could anyone please let me know what the missing part is here?
Prabhat,
In your specific case, a possible explanation of the behavior you observe is that you started Kafka without the auto-create-topics option set to true. The export process requires Kafka to have this enabled in order to create topics on the fly. If not, you will have to manually create the topic and then export from VoltDB.
As a side note, while you can use the ZooKeeper that starts with VoltDB to run your Kafka, it is not the recommended approach, since when you bring down the VoltDB server your Kafka is left with no ZooKeeper. The best approach is to use Kafka's own ZooKeeper to manage your Kafka instance.
Let me know if this helped - Thx.
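Concretely (a sketch, assuming a stock broker setup), the broker setting in question lives in config/server.properties:
auto.create.topics.enable=true
Otherwise, pre-create the topic that VoltDB will export to, using the ZooKeeper-based syntax to match the listing command above (the topic name is a placeholder; check which name your VoltDB export configuration actually produces):
./kafka-topics.sh --create --zookeeper localhost:2289 \
  --topic voltdbexport_TEST --partitions 1 --replication-factor 1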
Some questions and possible answers:
Are you using the Enterprise version?
Can you call @Quiesce from sqlcmd and see if your data is pushed to Kafka?
Which version are you using?
VoltDB embeds a ZooKeeper; are you using a standalone ZooKeeper or VoltDB's? We don't test with the embedded one, as it is not exactly the same as the ZooKeeper that Kafka supports.
Let us know, or email support at voltdb.com.
Looking forward.