Exporting data from volt to kafka - apache-kafka

We are trying to do a POC where we export data from a VoltDB table to Kafka. Below are the steps I followed:
Step 1: Prepared the deployment.xml to enable export to Kafka
<?xml version="1.0"?>
<deployment>
  <cluster hostcount="1" kfactor="0" schema="ddl" />
  <httpd enabled="true">
    <jsonapi enabled="true" />
  </httpd>
  <export enabled="true" target="kafka">
    <configuration>
      <property name="metadata.broker.list">localhost:9092</property>
      <property name="batch.mode">false</property>
    </configuration>
  </export>
</deployment>
Step 2: Then started the VoltDB server
./voltdb create -d deployment-noschema.xml --zookeeper=2289
Step 3: Created an export-only table and inserted some data into it
create table test(x int);
export table test;
insert into test values(1);
insert into test values(2);
After this I tried to verify whether any topic had been created in Kafka, but there was none.
./kafka-topics.sh --list --zookeeper=localhost:2289
Also, I can see all the data being logged in the exportoverflow directory. Could anyone please let me know what's missing here?

Prabhat,
In your specific case, a possible explanation of the behavior you observe is that you started Kafka without the auto-create-topics option set to true. The export process requires Kafka to have this enabled so that it can create topics on the fly. If not, you will have to manually create the topic and then export from VoltDB.
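For illustration, a minimal sketch of the two options; the broker property and CLI flags are standard Kafka, but the topic name is only a placeholder, since the topic VoltDB creates depends on your export configuration:
# In Kafka's config/server.properties, allow topics to be created on the fly
auto.create.topics.enable=true
# Or create the topic manually before exporting (replace MY_EXPORT_TOPIC with the topic your export uses)
./kafka-topics.sh --create --zookeeper localhost:2289 --replication-factor 1 --partitions 1 --topic MY_EXPORT_TOPIC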
As a side note, while you can use the ZooKeeper that starts with VoltDB to run your Kafka, it is not the recommended approach, since when you bring down the VoltDB server your Kafka is left with no ZooKeeper. The best approach is to use Kafka's own ZooKeeper to manage your Kafka instance.
Let me know if this helped - Thx.

Some questions and possible answers:
Are you using the Enterprise version?
Can you call @Quiesce from sqlcmd and see if your data pushes to Kafka? (There is a sketch of the call after this list.)
Which version are you using?
VoltDB embeds a ZooKeeper. Are you using a standalone ZooKeeper or VoltDB's? We don't test with the embedded one, as it is not exactly the same as the one Kafka supports.
Let us know or email support at voltdb.com.
Looking forward.
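For reference, a rough sketch of calling the system procedure from sqlcmd against a local VoltDB instance:
./sqlcmd
1> exec @Quiesce;
@Quiesce pushes any pending export data to the connectors, so if the rows show up in Kafka afterwards the export path itself is working.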

Related

How to configure test environment using kafka connect and XML config file

I am new to Kafka. I wrote a piece of code that writes to a topic (a producer).
Now I have been given the task of verifying whether the content is actually being written to the topic.
The only information provided by my tech lead was that I should install Kafka Connect and use this XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<connections>
<connection bootstrap_servers="xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096" broker_security_type="SASL_SSL" chroot="/" group="Clusters" groupId="1" host="xxxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com" jaas_config="org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";" keystore_location="" keystore_password="" keystore_privatekey="" name="Worten" port="9096" sasl_mechanism="SCRAM-SHA-512" schema_registry_endpoint="" truststore_location="" truststore_password="" version="VERSION_2_7_0"/>
<groups>
<group id="1" name="Clusters"/>
</groups>
</connections>
I have absolutely no idea where or how to import this XML config file. I installed Kafka and got it running locally, but all the config files are in this format:
$ cat config/connect-standalone.properties
partial output:
bootstrap.servers=xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096,xxxxxxxxx.c3.kafka.eu-west-3.amazonaws.com:9096
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
I tried adding the fields here, but many are missing. Any tips would be greatly welcome; I researched for a bit, but I couldn't find much that helped me.
Thank you!
I tried searching for anything that would allow me to start a local standalone consumer so I could see the topics I'm writing to.
Kafka Connect doesn't use XML files. It uses a Java .properties file only.
The properties file you have shown is missing the SASL_SSL values that are mentioned in the XML you were given.
The Kafka quickstart covers running Kafka Connect in standalone mode, and you can refer to the documentation for the configuration properties, such as the consumer.- and producer.-prefixed properties, which will need to be configured with SASL/SSL values, for example consumer.sasl.mechanism=SCRAM-SHA-512.
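As a rough sketch, the values from that XML would map onto the worker properties file something like this (USER and PASSWD are the placeholders from the XML, not real credentials):
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";
# The embedded consumer and producer need the same values with a prefix
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=SCRAM-SHA-512
consumer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=SCRAM-SHA-512
producer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="USER" password="PASSWD";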

Add multiple Kafka users to clickhouse

I'm trying to use Apache Kafka with ClickHouse. I have a Kafka username and password, which I added to the config.xml file in the ClickHouse configuration like this:
<kafka>
<sasl_mechanisms>SCRAM-SHA-256</sasl_mechanisms>
<sasl_username>some_user</sasl_username>
<sasl_password>some_pass</sasl_password>
</kafka>
This way I can use the Kafka topics that are available to that one user. How can I use multiple users, with different topics available to each user, while using the Kafka engine in ClickHouse?
Is there a way to configure Kafka user settings when defining a Kafka table with SQL in ClickHouse?
https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka/#configuration
Each topic can have its own settings:
<kafka_mytopic>
<sasl_mechanisms>SCRAM-SHA-256</sasl_mechanisms>
<sasl_username>yyyy</sasl_username>
<sasl_password>xxxx</sasl_password>
</kafka_mytopic>
<kafka_mytopic2>
<sasl_mechanisms>SCRAM-SHA-256</sasl_mechanisms>
<sasl_username>ddd</sasl_username>
<sasl_password>zzz</sasl_password>
</kafka_mytopic2>
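A minimal sketch of how such a per-topic section pairs with a Kafka engine table; the column list, broker address, and consumer group are placeholders, and the <kafka_mytopic> settings apply because the topic name is mytopic:
CREATE TABLE queue_mytopic
(
    payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'mytopic',
         kafka_group_name = 'mytopic_group',
         kafka_format = 'JSONEachRow';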

Configuration of a specific topic. Kafkacat

I have a topic "topic-one" and I want to know if it has "log.cleanup.policy = compact" configured or not.
Is it possible, with kafkacat, to extract the properties and/or configuration of a specific topic?
kafkacat does not yet support the Topic Admin API (which allows you to alter and view cluster configs).
Suggest you use kafka-configs.sh from the Apache Kafka distribution in the meantime.
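For example, a rough sketch of describing the topic's configuration (use --zookeeper or --bootstrap-server depending on your Kafka version; note that the topic-level setting is cleanup.policy, while log.cleanup.policy is the broker-wide default):
./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name topic-one --describe
# or, on older releases
./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name topic-one --describe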

Configuring Kafka connect Postgress Debezium CDC plugin

I am trying to use Kafka Connect to read changes in a Postgres DB.
I have Kafka running on my local system and I want to use the Kafka Connect API in standalone mode to read the Postgres server's DB changes.
connect-standalone.sh connect-standalone.properties dbezium.properties
I would appreciate it if someone could help me set up the configuration properties for the CDC Postgres Debezium connector:
https://www.confluent.io/connector/debezium-postgresql-cdc-connector/
I am following the page below to construct the properties:
https://debezium.io/docs/connectors/postgresql/#how-the-postgresql-connector-works
The name of the Kafka topics takes by default the form
serverName.schemaName.tableName, where serverName is the logical name
of the connector as specified with the database.server.name
configuration property
and here is what I have come up with for dbezium.properties:
name=cdc_demo
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=wal2json
slot.name=debezium
slot.drop_on_stop=false
database.hostname=localhost
database.port=5432
database.user=postgress
database.password=postgress
database.dbname=test
time.precision.mode=adaptive
database.sslmode=disable
Let's say I create a PG schema named demo and a table named suppliers.
So do I need to create a topic named test.demo.suppliers for this plugin to push the data to?
Also, can someone suggest a Docker image which has the Postgres server plus a suitable replication plugin such as wal2json? I am having a hard time configuring Postgres and the CDC plugin myself.
Check out the tutorial with associated Docker Compose and sample config.
The topic you've come up with sounds correct, but if you have your Kafka broker configured to auto-create topics (which is the default behaviour IIRC) then it will get created for you and you don't need to pre-create it.
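As a rough sketch, once the connector is running you could check the result from the command line (assuming a recent local Kafka broker, and assuming database.server.name is set to test in the connector properties so the topic comes out as test.demo.suppliers):
./kafka-topics.sh --bootstrap-server localhost:9092 --list
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test.demo.suppliers --from-beginning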

Push data from Clickhouse to Kafka

For test purposes I have to push data from a ClickHouse server to Kafka. I already tried to use the Confluent JDBC connector, but it doesn't work very well.
It also seems the ClickHouse Kafka engine only works in the direction Kafka -> ClickHouse, so that the ClickHouse server acts as a consumer.
Is there a convenient way to use a table in ClickHouse as a producer, or do I have to write my own producer?
I'd suggest offloading tasks like this outside ClickHouse. You can dump the test data via clickhouse-client and pipe it to a Kafka client in the shell. Check this out: https://github.com/fgeller/kt
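For example, a minimal sketch of that pipe using kafkacat as the producing client (the table, broker, and topic names are placeholders):
clickhouse-client --query "SELECT * FROM mytable FORMAT JSONEachRow" | kafkacat -P -b localhost:9092 -t mytopic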