Kafka won't connect with Logstash - apache-kafka

I am trying to connect Kafka to Logstash using https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html.
I have Kafka and ZooKeeper running (I've verified this by creating a producer and a consumer in Python), but Logstash won't detect Kafka.
I have installed the kafka input plugin; this is what my conf file looks like:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["divolte-data"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "divolte-data"
  }
}
Any help would be appreciated.

I guess the problem is with the version. Since you're running ES 2.3 (and therefore Logstash 2.x), you can't use bootstrap_servers in your kafka input plugin; that option was only introduced in version 5.0 of the plugin.
As per the doc, you should be using zk_connect instead of bootstrap_servers; it points at ZooKeeper (default port 2181) rather than the Kafka broker, like this:
kafka {
  zk_connect => "localhost:2181"
}
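Note that the pre-5.0 plugin also names the topic differently (topic_id instead of topics), so a fuller input block for your topic might look like this (a sketch, assuming ZooKeeper on its default port and an illustrative consumer group):
input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "divolte-data"
    group_id => "logstash"
  }
}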

Related

Exception in Flink Streaming to Kafka Avro Sink java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData

I'm using Flink streaming to read events from a Kafka source topic and, after de-duplication, write them to a separate Kafka topic in Avro format.
Flow:
Kafka topic (JSON format) -> Flink streaming (de-duplication) -> Scala case class objects -> Kafka topic (Avro format)
val sink = sinkProvider.getKafkaSink(brokerURL, targetTopic, kafkaTransactionMaxTimeoutMs, kafkaTransactionTimeoutMs)
messageStream
  .map { record =>
    convertJsonToExample(record)
  }
  .sinkTo(sink)
  .name("Example Kafka Avro Sink")
  .uid("Example-Kafka-Avro-Sink")
Here are the steps I followed:
I created an Avro schema for my output:
{
  "type": "record",
  "name": "Example",
  "namespace": "ca.ix.dcn.test",
  "fields": [
    {
      "name": "x",
      "type": "string"
    },
    {
      "name": "y",
      "type": "long"
    }
  ]
}
From the Avro schema I generated a case class for SpecificRecord using avrohugger (version 1.2.1).
I used Flink's AvroSerializationSchema.forSpecific because the Flink Kafka Avro sink lets you use either the specific-record or the generic-record constructor for serialization to Avro.
def getKafkaSink(brokers: String, targetTopic: String, transactionMaxTimeoutMs: String, transactionTimeoutMs: String) = {
  val schema = ReflectData.get.getSchema(classOf[Example])
  val sink = KafkaSink.builder()
    .setBootstrapServers(brokers)
    .setProperty("transaction.max.timeout.ms", transactionMaxTimeoutMs)
    .setProperty("transaction.timeout.ms", transactionTimeoutMs)
    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
      .setTopic(targetTopic)
      .setValueSerializationSchema(AvroSerializationSchema.forSpecific[Example](classOf[Example]))
      .setPartitioner(new FlinkFixedPartitioner())
      .build()
    )
    .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
    .build()
  sink
}
Now when I run it I get this exception:
Caused by: org.apache.avro.AvroRuntimeException: java.lang.IllegalAccessException: Class org.apache.avro.specific.SpecificData can not access a member of class ca.ix.dcn.test with modifiers "private final"
at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:405)
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:734)
I saw there is a bug open against Flink for this:
https://issues.apache.org/jira/browse/FLINK-18478
but I didn't find any workaround for it there.
Is there any workaround for this? Also, are there detailed examples that explain how to use the Flink streaming sink (for Avro) with AvroSerializationSchema (specific/generic)?
I'd appreciate any help on this.
In the Flink ticket that you're linking to, there's a comment noting that avrohugger is not really compatible with the Apache Avro Java library; see https://issues.apache.org/jira/browse/FLINK-18478?focusedCommentId=17164456&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17164456
The solution would be to generate Avro Java POJOs and use them in your Scala application.
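For example, if Example is regenerated as a Java SpecificRecord class (e.g. with the Avro Maven plugin, keeping the ca.ix.dcn.test namespace from the schema), the sink from the question could look roughly like this; the generated class carries its own schema, so the ReflectData call disappears. This is only a sketch under those assumptions, not a tested drop-in:
import ca.ix.dcn.test.Example // Avro-generated Java SpecificRecord, not an avrohugger case class
import org.apache.flink.connector.base.DeliveryGuarantee
import org.apache.flink.connector.kafka.sink.{KafkaRecordSerializationSchema, KafkaSink}
import org.apache.flink.formats.avro.AvroSerializationSchema
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkFixedPartitioner

def getKafkaSink(brokers: String, targetTopic: String,
                 transactionMaxTimeoutMs: String, transactionTimeoutMs: String): KafkaSink[Example] =
  KafkaSink.builder[Example]()
    .setBootstrapServers(brokers)
    .setProperty("transaction.max.timeout.ms", transactionMaxTimeoutMs)
    .setProperty("transaction.timeout.ms", transactionTimeoutMs)
    .setRecordSerializer(KafkaRecordSerializationSchema.builder[Example]()
      .setTopic(targetTopic)
      // forSpecific reads the schema embedded in the generated class (Example.getClassSchema)
      .setValueSerializationSchema(AvroSerializationSchema.forSpecific[Example](classOf[Example]))
      .setPartitioner(new FlinkFixedPartitioner[Example]())
      .build())
    .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
    .build()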

How to run the mongo-kafka connector as a source for kafka and integrate that with logstash input to use elasticsearch as a sink?

I have created a build of https://github.com/mongodb/mongo-kafka.
But how do I run this so it connects to my running Kafka instance?
However stupid this question sounds, there seems to be no documentation available for making this work with a locally running MongoDB replica set.
All blogs point to using MongoDB Atlas instead.
If you have a good resource, please point me towards it.
UPDATE 1 --
Used the Maven artifact - https://search.maven.org/artifact/org.mongodb.kafka/mongo-kafka-connect
Placed it in the Kafka plugins directory and restarted Kafka.
UPDATE 2 -- How to enable MongoDB as a source for Kafka?
https://github.com/mongodb/mongo-kafka/blob/master/config/MongoSourceConnector.properties
is the file to be used as the connector configuration. I tried:
bin/kafka-server-start.sh config/server.properties --override config/MongoSourceConnector.properties
UPDATE 3 - The above method didn't work, so I went back to the blog, which does not mention what port 8083 is.
Installed Confluent and confluent-hub, but I'm still unsure how to get the mongo-kafka connector working with Kafka.
UPDATE 4 -
ZooKeeper, Kafka server, and Kafka Connect running
Mongo Kafka library files
Kafka Connect Avro Converter library files
Using the commands below, my source got working -
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/connect-standalone.sh config/connect-standalone.properties config/MongoSourceConnector.properties
And using the configuration below for Logstash, I was able to push data into Elasticsearch -
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["users","organisations","skills"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}
So now one issue remains: each MongoSourceConnector.properties holds a single collection to read from, so I need to run Kafka Connect with a different properties file for each collection (see the sketch below).
Also, my Logstash is pushing new documents into Elasticsearch instead of updating the old ones, and it is not creating indexes named after the collection. The idea is that this should stay perfectly in sync with my MongoDB database.
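For reference, standalone Connect accepts several connector property files in one run, so a single worker can host one connector per collection (a sketch; the per-collection file names are illustrative):
bin/connect-standalone.sh config/connect-standalone.properties \
  config/users.properties config/organisations.properties config/skills.properties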
FINAL UPDATE - Everything is now working smoothly.
Created multiple properties files for Kafka Connect.
The latest Logstash config actually creates an index per topic name and updates the indexes accordingly:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    decorate_events => true
    topics => ["users","organisations","skills"]
  }
}
filter {
  json {
    source => "message"
    target => "json_payload"
  }
  json {
    source => "[json_payload][payload]"
    target => "payload"
  }
  mutate {
    add_field => { "[es_index]" => "%{[@metadata][kafka][topic]}" }
    rename => { "[payload][fullDocument][_id][$oid]" => "mongo_id" }
    rename => { "[payload][fullDocument]" => "document" }
    remove_field => ["message","json_payload","payload"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{es_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{mongo_id}"
  }
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
Steps to successfully get MongoDB syncing with Elasticsearch -
First deploy the MongoDB replica set -
// Make sure no mongod daemon instance is running
// To check all the ports which are listening or open
sudo lsof -i -P -n | grep LISTEN
// Kill the process ID of the mongod instance
sudo kill 775
// Deploy the replica set
mongod --replSet "rs0" --bind_ip localhost --dbpath=/data/db
Create connector configuration properties for Kafka Connect
//dummycollection.properties <- Filename
name=dummycollection-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb://localhost:27017
database=dummydatabase
collection=dummycollection
copy.existing=true
topic.prefix=
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
publish.full.document.only=true
pipeline=[]
batch.size=0
collation=
Make sure the JAR files from the below sources are available in your Kafka plugins directory -
Maven Central Repository Search
Kafka Connect Avro Converter
Deploy Kafka
// ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
// Kafka server
bin/kafka-server-start.sh config/server.properties
// Kafka Connect
bin/connect-standalone.sh config/connect-standalone.properties config/dummycollection.properties
Configure Logstash -
// /etc/logstash/conf.d/apache.conf <- File
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    decorate_events => true
    topics => ["dummydatabase.dummycollection"]
  }
}
filter {
  json {
    source => "message"
    target => "json_payload"
  }
  json {
    source => "[json_payload][payload]"
    target => "payload"
  }
  mutate {
    add_field => { "[es_index]" => "%{[@metadata][kafka][topic]}" }
    rename => { "[payload][fullDocument][_id][$oid]" => "mongo_id" }
    rename => { "[payload][fullDocument]" => "document" }
    remove_field => ["message","json_payload","payload"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{es_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{mongo_id}"
  }
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
Start Elasticsearch, Kibana, and Logstash
sudo systemctl start elasticsearch
sudo systemctl start kibana
sudo systemctl start logstash
Test
Open MongoDB Compass, and:
Create a collection, mention that collection in the Logstash topics, and create a properties file for it for Kafka Connect
Add data to it
Update the data
Review the indexes in Elasticsearch (see the curl example below)
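A quick way to do that last check (a sketch using the standard Elasticsearch APIs; the index name follows the topic, e.g. dummydatabase.dummycollection):
// List all indices
curl "localhost:9200/_cat/indices?v"
// Inspect the synced documents for one collection
curl "localhost:9200/dummydatabase.dummycollection/_search?pretty"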
Port 8083 is Kafka Connect, which you start with one of the connect-*.sh scripts.
It runs standalone from the broker, and its connector properties do not get set via kafka-server-start.
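Once a connect-*.sh worker is up, you can verify the connector over that REST API (the connector name below is the one from the dummycollection.properties example above):
// List connectors registered with the Connect worker
curl http://localhost:8083/connectors
// Check the status of the Mongo source connector
curl http://localhost:8083/connectors/dummycollection-source/status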

Logstash: Kafka Output Plugin - Issues with Bootstrap_Server

I am trying to use the logstash-output-kafka plugin in Logstash:
Logstash configuration file:
input {
  stdin {}
}
output {
  kafka {
    topic_id => "mytopic"
    bootstrap_server => "[Kafka Hostname]:9092"
  }
}
However, when executing this configuration, I am getting this error:
[ERROR][logstash.agent ] Failed to execute action
{:action=>LogStash::PipelineAction::Create/pipeline_id:main,
:exception=>"LogStash::ConfigurationError", :message=>"Something is wrong
with your configuration."
I tried to change "[Kafka Hostname]:9092" to "localhost:9092", but that also fails to connect to Kafka. Only when I remove the bootstrap_server setting (which then defaults to localhost:9092) does the Kafka connection seem to be established.
Is there something wrong with the bootstrap_server configuration of the kafka output plugin? I am using Logstash v6.4.1 and logstash-output-kafka v7.1.3.
I think there is a typo in your configuration. Instead of bootstrap_server you need to define bootstrap_servers.
input {
  stdin {}
}
output {
  kafka {
    topic_id => "mytopic"
    bootstrap_servers => "your_Kafka_host:9092"
  }
}
According to the Logstash docs:
bootstrap_servers
Value type is string
Default value is "localhost:9092"
This is for bootstrapping and the producer will only use it for getting metadata (topics, partitions and replicas). The socket connections for sending the actual data will be established based on the broker information returned in the metadata. The format is host1:port1,host2:port2, and the list can be a subset of brokers or a VIP pointing to a subset of brokers.
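So pointing the output at several brokers is still a single comma-separated string, for example (host names are illustrative):
output {
  kafka {
    topic_id => "mytopic"
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
  }
}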

apache kafka producer NetworkClient broker server disconnected

I'm a newbie with Apache Kafka and I have a problem connecting to it. When I run the configuration with a Java-based Kafka producer it works, but when I package it as a jar and run it against the Apache Metron Kafka broker, the error below occurs:
2017-03-17 07:32:32 WARN NetworkClient:568 - Bootstrap broker node1:6667 disconnected
Configuration:
bootstrap.servers=node1:6667
acks=all
retries=0
batch.size=16384
linger.ms=1
buffer.memory=33554432
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
Code:
sender.send(new ProducerRecord<S, String>(getSenderTopic(), (S) getSenderKey(), data),
  new Callback() {
    public void onCompletion(RecordMetadata metadata, Exception e) {
      if (e != null) {
        LOGGER.error("exception ", e);
        sender.close();
      }
    }
  });
The Apache Kafka broker runs on Apache Metron and the configured host is set to node1:6667.
I think it's a network-side problem, but what is your suggestion?
Thanks
After some research I have solved the problem. The issue was the Kafka versions: Apache Metron uses 10.0.1 and we were using 10.1.0. I changed it in the pom.xml and the problem was solved.
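In other words, the Kafka client dependency in the pom.xml has to be pinned to the same version the broker runs; a sketch (the version property is illustrative):
<!-- keep the producer's client library at the broker's Kafka version -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.broker.version}</version>
</dependency>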

LogStash Kafka output/input is not working

I'm trying to use Logstash with a Kafka broker, but I can't get them integrated.
The versions are:
logstash 2.4
kafka 0.8.2.2 with scala 2.10
The Logstash config file (with the kafka output) is:
input {
  stdin {
  }
}
output {
  stdout {
    codec => rubydebug
  }
  kafka {
    bootstrap_servers => "10.120.16.202:6667,10.120.16.203:6667,10.120.16.204:6667"
    topic_id => "cephosd1"
  }
}
I can list the topic cephosd1 from Kafka.
The stdout output also prints the content.
But I cannot read anything from kafka-console-consumer.sh.
I think you have a compatibility issue. If you check the version compatibility matrix between Logstash, Kafka and the kafka output plugin, you'll see that the kafka output plugin in Logstash 2.4 uses the Kafka 0.9 client version.
If you have a Kafka broker 0.8.2.2, it is not compatible with the client version 0.9 (the other way around would be ok). You can either downgrade to Logstash 2.0 or upgrade your Kafka broker to 0.9.