Create kafka topic using predefined config files - apache-kafka

Is there any way to define Kafka topics in the Kafka/ZooKeeper configuration files before I run the services, so that once they start the topics will already be in place?
I have looked inside the bin/kafka-topics.sh script and found that, in the end, it executes a command against the live server. But since the server is here, its config files are here, and ZooKeeper with its configs is also here, is there a way to predefine topics in advance?
Unfortunately, I haven't found any existing config keys for this.

The servers need to be running in order to allocate metadata and log directories for the topics, so no.
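A common workaround, since the broker must be up first, is to script topic creation right after startup. A minimal sketch (the topic name, partition count and address are placeholders; on older Kafka versions the tool takes --zookeeper instead of --bootstrap-server):
# wait until the broker answers, then create the topic idempotently
until bin/kafka-topics.sh --bootstrap-server localhost:9092 --list >/dev/null 2>&1; do sleep 1; done
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --if-not-exists --topic my-topic --partitions 3 --replication-factor 1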

Related

Kafka file stream connect and stream API

I am working on the file stream connector. I have more than ten million records in the file (it's not a single file; it's partitioned by account #). I have to load these files into a topic and update my streams. I have gone through standalone streams, and I have the following questions and need help to achieve this.
Looking at the data set, I have two account #s, and each account has 5 rows. I would need to group them into two rows keyed by acctNbr.
How do I write my source connector to read the files and implement the grouping logic?
My brokers are running on Linux machines X, Y, Z. After developing the source connector, should my jar file be deployed on every broker (if I start running in distributed mode)?
I only have a 30-minute window to get the file drop into the topic. What parameters are there to tune the logic to bring my processing window down? FYI, this topic would have more than 50 partitions and a 3-broker setup.
Data set:
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-01","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-02","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-03","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-04","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-05","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-01","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-02","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-03","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-04","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-05","currentPrice":"10","availQnty":"10"}
How do I write my source connector to read the files and implement the grouping logic?
The FileStream connector cannot do this, and it was not intended for any purpose other than serving as an example for writing your own connectors. In other words, do not use it in production.
That being said, you can use alternative solutions like Flume, Filebeat, Fluentd, NiFi, StreamSets, etc., to glob your file paths, then send all records line-by-line into a Kafka topic.
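As a crude illustration of the line-by-line idea, even the stock console producer can do it (this is not one of the tools above, and the file, topic and broker address are placeholders):
# pipe an existing file into a topic, one record per line
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic accounts < /data/acct_1234567.json
Grouping by acctNbr would then happen downstream, for example in a Kafka Streams application over the keyed records, not in the connector itself.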
After developing the source connector, should my jar file be deployed on every broker?
You should not run Connect on any broker. The Connect servers are called workers.
I only have a 30-minute window to get the file drop into the topic
It's not clear where this number came from. All of the methods listed above watch for new files continuously, without any defined window.

Kafka Connect: can multiple standalone connectors write to the same HDFS directory?

For our pipeline, we have about 40 topics (10-25 partitions each) that we want to write into the same HDFS directory using HDFS 3 Sink Connectors in standalone mode (distributed doesn't work for our current setup). We have tried running all the topics on one connector but encounter problems recovering offsets if it needs to be restarted.
If we divide the topics among different standalone connectors, can they all write into the same HDFS directory? Since the connectors then organize all files in HDFS by topic, I don't think this should be an issue, but I'm wondering if anyone has experience with this setup.
Basic example:
Connector-1 config
name=connect-1
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic1
hdfs.url=hdfs://kafkaOutput
Connector-2 config
name=connect-2
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic2
hdfs.url=hdfs://kafkaOutput
distributed doesn't work for our current setup
You should be able to run connect-distributed on the exact same nodes where connect-standalone is run.
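For reference, a distributed worker needs only a handful of extra settings compared to standalone; a minimal sketch follows (the group id and internal topic names are placeholders), after which connector configs are submitted over the worker's REST API instead of as properties files on the command line:
bootstrap.servers=localhost:9092
group.id=hdfs-sink-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# distributed workers keep offsets, configs and status in Kafka topics instead of local files
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status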
We have tried running all the topics on one connector but encounter problems recovering offsets if it needs to be restarted
Yeah, I would suggest not bundling all topics into one connector.
If we divide the topics among different standalone connectors, can they all write into the same HDFS directory?
That is my personal recommendation, and yes, they can, because the HDFS path is named by the topic name and further split by the partitioning scheme.
Note: this also applies to all other storage connectors (S3 & GCS).
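For illustration, with the default partitioner the two connectors above should end up writing to separate per-topic subtrees roughly like this (exact file naming depends on the connector version and configured format, so treat this as an approximation):
hdfs://kafkaOutput/topics/topic1/partition=0/topic1+0+0000000000+0000000999.avro
hdfs://kafkaOutput/topics/topic2/partition=0/topic2+0+0000000000+0000000999.avro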

Where does Zookeeper keep Kafka ACL list?

Where does Zookeeper (or Kafka) keep its ACL list?
When you run scripts like kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --list --topic test, where does Zookeeper (or Kafka) get its list?
I am trying to find a file that stores all the ACLs.
You can access Zookeeper using the zookeeper-shell.sh script.
There is a znode called kafka-acl where information about ACLs for groups, topics, the cluster and so on is stored.
You can, for example, list information about ACLs on topics with ls /kafka-acl/Topic.
Then get information about a specific topic with get /kafka-acl/Topic/test.
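Putting that together in one session (the host, port and test topic are just examples; the value stored at each ACL znode should be a small JSON document describing the ACL entries):
bin/zookeeper-shell.sh localhost:2181
ls /kafka-acl
ls /kafka-acl/Topic
get /kafka-acl/Topic/test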
Since I landed here searching for the same information and eventually stumbled my way to the answer, I thought I would add some additional information. Since Apache Kafka 2.0, for topics with patternType=PREFIXED, the ACLs are stored under a ZooKeeper node /kafka-acl-extended; this is in addition to the /kafka-acl node, which holds details for topics with patternType=LITERAL.
For more details, read KAFKA-KIP-290.
For additional reference:
If you look at your ZooKeeper configuration file (zoo.cfg or zookeeper.properties), you will see the dataDir parameter, which tells you where ZooKeeper stores its data.
For example,
dataDir=/tmp/confluent.iSAdMTvO/zookeeper/data
So the Kafka ACL list will be stored there, but in order to view or control it, use the zookeeper-shell script, because if you open the data files directly you won't be able to recognize anything.

Kafka confluent 4.1.1 restart issue -- previously created topics not getting displayed

I am using Confluent 4.1.1. I created a few topics and it worked well, but I don't see the previously created topics if I restart Confluent.
I tried the suggestions mentioned in this post:
Kafka topic no longer exists after restart
But no luck. Did anybody face the same issue? Do I need to change any configuration?
Thanks in advance.
What configuration changes do I need to make in order for topics to persist?
confluent start will use the CONFLUENT_CURRENT environment variable for all of its data storage. If you export this to a static location, data should, in theory, persist across reboots.
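For example (the directory is just an illustration, not a required path):
# point the CLI at a persistent directory before starting the services
export CONFLUENT_CURRENT=/var/lib/confluent
confluent start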
Otherwise, the standard approach is to run each component individually, as you would in a production environment (e.g. zookeeper-server-start, kafka-server-start, schema-registry-start, etc.), which will persist data in whatever locations you've set in their respective configuration files.
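A rough sketch of that, assuming a standard Confluent Platform layout (adjust paths to your installation, and run each command in its own terminal or as a service):
./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
./bin/kafka-server-start ./etc/kafka/server.properties
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties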

How to save a kafka topic at shutdown

I'm configuring my first Kafka cluster. I can't seem to find any support for saving a configured topic. I know I can create a topic following the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shut down my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder, and this folder gets wiped out during a machine reboot. You could change the broker config and pick another location for the Kafka logs.
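For example, in the broker and ZooKeeper config files you could point the data directories somewhere that survives reboots (the paths below are just illustrations):
# config/server.properties -- keep the Kafka log directories out of /tmp
log.dirs=/var/lib/kafka/data
# config/zookeeper.properties -- ZooKeeper's default dataDir also lives under /tmp
dataDir=/var/lib/zookeeper/data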