KSQL Stream output Topic - apache-kafka

Hi, I have left-joined a KSQL stream (SEARCHREQUESTDTO) with a KSQL table (NGINX_TABLE) using the following KSQL command:
CREATE STREAM NIGINX_SEARCH_QUERY AS \
SELECT *\
FROM SEARCHREQUESTDTO\
LEFT JOIN NGINX_TABLE\
ON SEARCHREQUESTDTO.sessionid = NGINX_TABLE.sessionid;
The resulting stream NIGINX_SEARCH_QUERY is created successfully, and I can also see the NIGINX_SEARCH_QUERY topic using the SHOW TOPICS command in the KSQL terminal.
When I try to connect a Kafka consumer to this topic, the consumer is not able to fetch any data,
but the PRINT NIGINX_SEARCH_QUERY command shows that data is being published to this topic.

If PRINT shows output, then the topic does exist and has data.
If your consumer doesn't show output, then the problem is with your consumer. So I would rephrase your question as: I have a Kafka topic that my consumer does not show data for.
I would use kafkacat to check the topic externally:
kafkacat -b kafka-broker:9092 -C -K: \
-f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\nPartition: %p\tOffset: %o\n--\n' \
-t NIGINX_SEARCH_QUERY
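If kafkacat shows records, the next thing to check is where your own consumer starts reading: by default the console consumer only shows messages produced after it connects. A quick sanity check, assuming a standard Apache Kafka install (adjust the broker address for your cluster):
bin/kafka-console-consumer.sh --bootstrap-server kafka-broker:9092 \
  --topic NIGINX_SEARCH_QUERY \
  --from-beginning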

Related

unable to read avro message via kafka-avro-console-consumer (end goal read it via spark streaming)

Before trying to read the Avro data out of the Confluent Platform with Spark Structured Streaming, as described here: Integrating Spark Structured Streaming with the Confluent Schema Registry,
I wanted to verify that I could read the messages with the command below:
$ kafka-avro-console-consumer \
> --topic my-topic-produced-using-file-pulse-xml \
> --from-beginning \
> --bootstrap-server localhost:9092 \
> --property schema.registry.url=http://localhost:8081
Instead, I receive this error message: Unknown magic byte
Processed a total of 1 messages
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Note: the message can be read like this (using kafka-console-consumer instead of kafka-avro-console-consumer):
kafka-console-consumer \
--bootstrap-server localhost:9092 --group my-group-console \
--from-beginning \
--topic my-topic-produced-using-file-pulse-xml
The message was produced using confluent connect file-pulse (1.5.2) reading xml file (streamthoughts/kafka-connect-file-pulse)
Please help here:
Did I use kafka-avro-console-consumer incorrectly?
I tried the "deserializer" properties options described here: https://stackoverflow.com/a/57703102/4582240, but that did not help.
I don't want to move on to Spark Streaming until I can read the data this way.
For completeness, the file-pulse 1.5.2 properties I used are below (added 11/09/2020):
name=connect-file-pulse-xml
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic= my-topic-produced-using-file-pulse-xml
tasks.max=1
# File types
fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter
file.filter.regex.pattern=.*\\.xml$
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
force.array.on.fields=sometagNameInXml
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/xml/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-xml
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name
offset.strategy=name
If you are getting Unknown Magic Byte with the consumer, then the producer didn't use the Confluent AvroSerializer, and might have pushed Avro data that doesn't use the Schema Registry.
Without seeing the Producer code or consuming and inspecting the data in binary format, it is difficult to know which is the case.
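One quick way to inspect the raw bytes, assuming kafkacat and hexdump are available, is to dump a single record and look at its leading bytes; data written with the Confluent AvroSerializer starts with a 0x00 magic byte followed by a 4-byte schema ID:
kafkacat -b localhost:9092 -t my-topic-produced-using-file-pulse-xml \
  -C -o beginning -c 1 | hexdump -C | head
If the value does not start with 0x00, it was not produced in the Confluent wire format, and kafka-avro-console-consumer will fail with Unknown magic byte.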
The message was produced using confluent connect file-pulse
Did you use value.converter with the AvroConverter class?
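For reference, writing Schema-Registry-backed Avro from a connector normally means configuring the AvroConverter on the connector (or worker), roughly like this sketch (your Schema Registry URL may differ):
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081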

Kafka Connect FileStreamSink connector removes quotation marks and changes colon to equal sign for JSON message

Summary
When I stream this with the console producer
{"id":1337,"status":"example_topic_1 success"}
I get this from my FileStreamSink connector:
/data/example_topic_1.txt
{id=1337, status=example_topic_1 success}
This is a major problem for me, because the original JSON message cannot be recovered without making assumptions about where the quotes used to be. How can I output the messages to a file, while preserving the quotation marks?
Details
First, I start my file sink connector.
# sh bin/connect-standalone.sh \
> config/worker.properties \
> config/connect-file-sink-example_topic_1.properties
Second, I start the console consumer (also built into Kafka) so that I have easy visual confirmation that the messages are coming through correctly.
# sh bin/kafka-console-consumer.sh \
> --bootstrap-server kafka_broker:9092 \
> --topic example_topic_1
Finally, I start a console producer for sending messages, and I enter a message.
# sh bin/kafka-console-producer.sh \
> --broker-list kafka_broker:9092 \
> --topic example_topic_1
From the console consumer, the message pops out correctly, with quotes.
{"id":1337,"status":"example_topic_1 success"}
But I get this from the FileStreamSink connector:
/data/example_topic_1.txt
{id=1337, status=example_topic_1 success}
My Configuration
config/worker.properties
offset.storage.file.filename=/tmp/example.offsets
bootstrap.servers=kafka_broker:9092
offset.flush.interval.ms=10000
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
config/connect-file-sink-example_topic_1.properties
name=file-sink-example_topic_1
connector.class=FileStreamSink
tasks.max=1
file=/data/example_topic_1.txt
topics=example_topic_1
Since you're not actually wanting to parse the JSON data, but just pass it straight through as a lump of text, you need to use the StringConverter:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
This article explains more about the nuances of converters: https://rmoff.net/2019/05/08/when-a-kafka-connect-converter-is-not-a-converter/. It shows an example of what you're trying to do, although it uses kafkacat in place of the console producer/consumer.
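With the StringConverter in place, the sink passes the raw message text through untouched, so the output file should contain the original JSON, e.g.:
{"id":1337,"status":"example_topic_1 success"}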

How to fetch recent messages from Kafka topic

Is there an option to fetch only the most recent 10/20/etc. messages from a Kafka topic? I can see the --from-beginning option to fetch all messages from the topic, but what if I want to fetch only a few: the first, the last, some in the middle, or the latest 10? Do we have such options?
First N messages
You can use --max-messages N in order to fetch the first N messages of a topic.
For example, to get the first 10 messages, run
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 10
Next N messages
To get the next N messages (i.e. only messages produced after the consumer starts), omit --from-beginning:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --max-messages 10
Last N messages
To get the last N messages, you need to define a specific partition and the offset:
bin/kafka-simple-consumer-shell.sh --broker-list localhost:9092 --topic test --partition testPartition --offset yourOffset
M to N messages
Again, for this case you'd have to define both the partition and the offset.
For example, you can run the following in order to get N messages starting from an offset of your choice:
bin/kafka-simple-consumer-shell.sh --broker-list localhost:9092 --topic test --partition testPartition --offset yourOffset --max-messages 10
If you don't want to stick to the built-in scripts, I would suggest using kt, a Kafka command-line tool with more options and functionality.
For more details, refer to the article How to fetch specific messages in Apache Kafka.
Without specifying an offset and partition, you'll only be able to consume the next N or the first N messages. To consume from the "middle" of the unbounded stream, you need to give an offset.
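For example, the console consumer itself can start from an explicit offset on a single partition (the offset value here is only an illustration):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0 --offset 100 --max-messages 10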
Other than console consumer, there's kafkacat
First twenty:
kafkacat -C -b broker:9092 -t topic -o earliest -c 20
And the last twenty from partition zero (offset -20 means 20 before the end):
kafkacat -C -b broker:9092 -t topic -p 0 -o -20 -c 20

Is it possible to see the data partition wise in a kafka topic?

I started working with Kafka a few days ago. I am using Kafka on a Windows environment, and I want to see the data in each partition of a Kafka topic.
I have a topic called ExampleTopic with replication.factor set to 3 and 3 partitions. I am able to see the data in the topic, but I want to see which messages are going to which partitions.
Please let me know whether this is possible, and if so, how?
You can use the --partition parameter of the Kafka console consumer to specify which partition to consume from:
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic ExampleTopic --partition 0
You can also specify an --offset parameter that indicates which offset to start from. If absent, the consumption starts at the end of the partition.
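For example, to read partition 0 of ExampleTopic starting from offset 5 (the offset value is only an illustration):
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic ExampleTopic --partition 0 --offset 5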
There is also a GUI-based tool for viewing the data in each partition of a topic, called Kafka Tool:
http://www.kafkatool.com
It's a tool for managing your Kafka cluster and it provides many other features worth trying.
Use kafkacat, e.g. :
$ kafkacat -b localhost:9092 -t my_topic -C \
-f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\n\
Timestamp: %T\tPartition: %p\tOffset: %o\n--\n'
Key (1 bytes): 1
Value (79 bytes): {"uid":1,"name":"Cliff","locale":"en_US","address_city":"St Louis","elite":"P"}
Timestamp: 1520618381093 Partition: 0 Offset: 0

Terminate Kafka Console Consumer when all the messages have been read

I know there has to be a way to do this, but I am not able to figure this out. I need to stop the kafka consumer once I have read all the messages from the queue.
Can somebody provide any info on this?
You can pass the parameter -consumer-timeout-ms with a value when starting the consumer, and it will throw an exception if no messages have been read during that time. For example, to stop the consumer if no new messages have arrived in the last 2 seconds:
kafka.consumer.ConsoleConsumer -consumer-timeout-ms 2000
You can see this and all the other input options here
Currently (Kafka version 2.11-2.1.1), the kafka-console-consumer.sh script has a new flag: --timeout-ms.
This flag sets the maximum time, in milliseconds, to wait before exiting when there is no new record to consume.
You can use this option to end your console consumer after it has read all the messages.
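For example, something like this should read everything currently in the topic and exit about two seconds after the last record (the timeout value is only an illustration):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --timeout-ms 2000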
You can use SimpleConsumerShell with no-wait-at-logend option. See SystemTools-SimpleConsumerShell
For example:
./kafka-run-class.bat kafka.tools.SimpleConsumerShell --broker-list localhost:9092 --topic kafkademo --partition 0 --no-wait-at-logend
If you are not dead set on using the Scala client, try kafkacat with the -e option telling it to exit when end of partition has been reached.
E.g. to consume all messages from mytopic partition 2 and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o beginning -e
Or consume the last 3000 messages and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o -3000 -e