Command for hLDA topic modeling in MALLET

I am trying to use hLDA for topic modeling in MALLET. I have already checked this. Using the command
bin\mallet train-topics --input tutorial.mallet
according to this tutorial. By default, plain LDA topic modeling is performed. How can I change it to hLDA?
Any suggestion would be helpful.

Use MALLET version 2.0.7, then run:
bin/mallet hlda --input tutorial.mallet

Related

kcat protobuf deserialization

I'm using kcat to check the content of Kafka topics when working locally, but when messages are serialized with protobuf, the result I get is an unreadable stream of encoded characters. I'm aware of other Kafka consumer tools (Kafdrop, AKHQ, Kowl, Kadek...), but I'm looking for the simplest option that fits my needs.
Does kcat support protobuf key/value deserialization from protofile?
Is there any simple terminal-based tool which allows this?
I've had luck with this command:
% kcat -C -t <topic> -b <kafkahost>:9092 -o -1 -e -q -D "" | protoc --decode=<full message class> path/to/my.proto --proto_path <proto_parent_folder>
As for "any simple terminal-based tool which allows this": only ones that integrate with the Confluent Schema Registry (which is what those linked tools use as well), e.g. kafka-protobuf-console-consumer, which is already part of Confluent Platform.
Regarding kcat, refer to https://github.com/edenhill/kcat/issues/72 and the linked issues.

Is it possible to push messages to Kafka from Google Dataflow?

Is there any way to connect Kafka as a sink in Google Dataflow? I know we can use CloudPubSubConnector with Pub/Sub and Kafka, but I don't want to use Pub/Sub between Dataflow and Kafka.
Thanks,
Bala
Yes (assuming you are using the Java SDK). See 'Writing to Kafka' with a usage example in the JavaDoc for KafkaIO: https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L221
If you're writing Dataflow jobs in Python, you can use Confluent's Kafka client (https://github.com/confluentinc/confluent-kafka-python) and write your own Beam sink/source. There is a guide for writing your own sinks and sources in Beam: https://beam.apache.org/documentation/sdks/python-custom-io/

Measure Kafka performance metrics (transactions per second, memory)

We are using a GoldenGate connector (producer) to stream messages to Kafka. All configurations/machines reside on AWS EC2, and we would like to measure performance (TPS, memory, CPU). We have Linux machines that we can use for performance evaluation. Could anyone suggest how to get TPS and memory usage on Linux? Appreciate your help.
Apache Kafka ships with performance testing tools in the ./bin subdirectory, such as bin/kafka-producer-perf-test.sh and bin/kafka-consumer-perf-test.sh.
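A typical invocation of the producer perf test looks like the sketch below; it reports records/sec (i.e. TPS) and latency when it finishes. The topic name and broker address are placeholders you would replace with your own:

```shell
# Sketch (assumes a broker at localhost:9092 and an existing topic perf-test):
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 100000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```

For memory and CPU on the Linux hosts themselves, standard tools such as vmstat or top alongside the test will give you the system-level numbers.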

kafka consumer in shell script

I am a newbie in Kafka. I want to consume remote Kafka messages in a shell script. Basically, I have a Linux machine where I cannot run any web server (for some strange reasons); the only thing I can do is use crontab/shell scripts to listen for Kafka messages hosted remotely. Is it possible to write a simple shell script that will consume a Kafka message, parse it, and take a corresponding action?
Kafka clients are available in multiple languages. You can use any client; you don't need a web server or browser for it.
You may use a shell script for consuming and parsing messages, but that script will have to use one of the Kafka clients provided here, because currently there is no client written in pure shell script.
Kafka also provides a console producer and consumer, which you can use as well:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Follow the documentation for the details.
You could also use the tool kafkacat, which is documented, for example, here. It is a very powerful and fast open-source tool for reading data out of Kafka from the console: https://github.com/edenhill/kafkacat.
Many examples are provided on GitHub, and one is shown below:
kafkacat -C -b mybroker -t mytopic
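To illustrate the consume-parse-act pattern the question asks about, here is a minimal sketch. The "LEVEL|payload" message format is a made-up example, and in production the input would come from the console consumer (or kafkacat) rather than the printf demo at the bottom:

```shell
#!/bin/sh
# Minimal sketch of "consume, parse, act". In production you would pipe the
# console consumer into it, e.g.:
#   bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test \
#     | handle_messages
# The "LEVEL|payload" message format below is hypothetical.
handle_messages() {
  while IFS='|' read -r level payload; do
    case "$level" in
      ERROR) echo "alert: $payload" ;;  # take the corresponding action here
      *)     echo "log: $payload" ;;
    esac
  done
}

# Demo on two sample messages instead of a live Kafka topic:
printf 'ERROR|disk full\nINFO|consumer started\n' | handle_messages
# → alert: disk full
#   log: consumer started
```

Because the loop just reads stdin, the same function works unchanged whether the messages come from kafka-console-consumer.sh, kafkacat, or a test fixture.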

Kafka: Is it possible to create a topic with a specified replication factor via the Java client?

The official documentation of Kafka, in section 4.7 Replication, says:
you can set this replication factor on a topic-by-topic basis
But in the Javadoc of its Java client, I can't find any API relating to creating a topic with a replication factor. Is it only possible via the shell script it provides?
You may use the AdminUtils.createTopic() method from the kafka.admin package: https://github.com/apache/kafka/blob/97e61d4ae2feaf0551e75fa8cdd041f49f42a9a5/core/src/main/scala/kafka/admin/AdminUtils.scala#L409-L418
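For comparison, the shell route the question refers to takes the replication factor directly at creation time. The topic name, partition count, and ZooKeeper address below are placeholders:

```shell
# kafka-topics.sh (shipped with Kafka) sets the replication factor per topic:
bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic my-topic \
  --partitions 3 \
  --replication-factor 2
```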