Logstash: Kafka Output Plugin - Issues with Bootstrap_Server

I am trying to use the logstash-output-kafka in logstash:
Logstash Configuration File
input {
  stdin {}
}
output {
  kafka {
    topic_id => "mytopic"
    bootstrap_server => "[Kafka Hostname]:9092"
  }
}
However, when executing this configuration, I am getting this error:
[ERROR][logstash.agent ] Failed to execute action
{:action=>LogStash::PipelineAction::Create/pipeline_id:main,
:exception=>"LogStash::ConfigurationError", :message=>"Something is wrong
with your configuration."
I tried changing "[Kafka Hostname]:9092" to "localhost:9092", but that also fails to connect to Kafka. Only when I remove the bootstrap_server setting (which then defaults to localhost:9092) does the Kafka connection seem to be established.
Is there something wrong with the bootstrap_server configuration of the kafka output plugin? I am using Logstash v6.4.1 and logstash-output-kafka v7.1.3.

I think there is a typo in your configuration. Instead of bootstrap_server you need to define bootstrap_servers.
input {
  stdin {}
}
output {
  kafka {
    topic_id => "mytopic"
    bootstrap_servers => "your_Kafka_host:9092"
  }
}
According to the Logstash docs:
bootstrap_servers
Value type is string
Default value is "localhost:9092"
This is for bootstrapping and the producer will only
use it for getting metadata (topics, partitions and replicas). The
socket connections for sending the actual data will be established
based on the broker information returned in the metadata. The format
is host1:port1,host2:port2, and the list can be a subset of brokers or
a VIP pointing to a subset of brokers.
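For example, a minimal sketch pointing the output at two brokers (the hostnames here are placeholders) to illustrate the comma-separated format:
output {
  kafka {
    topic_id => "mytopic"
    # any subset of the cluster is enough for bootstrapping
    bootstrap_servers => "kafka1.example.com:9092,kafka2.example.com:9092"
  }
}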

Related

Logstash pipeline issues when sending to multiple Kafka topics

I am using Logstash to extract change data from an SQL Server DB and send it to different Kafka topics. Some Logstash config files send to the Ticket topic, others to the Availability topic.
If I run just the configs that send to the Ticket topic on their own using the pipeline, it works fine. If I run the configs for the Availability topic on their own in a pipeline, they send the data OK.
However, when I include the configs that send to both topics together, I get the error. Please see this extract from the logs. This time the Availability topic failed; other times the Ticket topic fails.
[2021-03-22T07:30:00,172][WARN ][org.apache.kafka.clients.NetworkClient][AvaililityDOWN] [Producer clientId=Avail_down1] Error while fetching metadata with correlation id 467 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,172][ERROR][org.apache.kafka.clients.Metadata][AvaililityDOWN] [Producer clientId=Avail_down1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,203][INFO ][logstash.inputs.jdbc ][Ticket1][a296a0df2f603fe98d8c108e860be4d7a17f840f9215bb90c5254647bb9c37cd] (0.004255s) SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) transaction_date, abs(convert(bigint, __$seqval)) seqval, * FROM cdc.dbo_TICKET_CT where ( __$operation = 2 or __$operation = 4) and modified_date > '2021-03-22T07:27:00.169' order by modified_date ASC
[2021-03-22T07:30:00,203][INFO ][logstash.inputs.jdbc ][AvailabilityMAXUP][7805e7bd44f20b373e99845b687dc15d7c2a3de084fb4424dd492be93b39b64a] (0.004711s) With Logstash as(
SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) transaction_date, abs(convert(bigint, __$seqval)) seqval, *
FROM cdc.dbo_A_TERM_MAX_UPTIME_DAY_CT
)
select * from Logstash
where ( __$operation = 2 or __$operation = 4 or __$operation = 1 ) and TMZONE = 'Etc/UTC' and transaction_date > '2021-03-22T07:15:00.157' order by seqval ASC
[2021-03-22T07:30:00,281][WARN ][org.apache.kafka.clients.NetworkClient][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Error while fetching metadata with correlation id 633 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,281][ERROR][org.apache.kafka.clients.Metadata][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,297][WARN ][org.apache.kafka.clients.NetworkClient][AvaililityDOWN] [Producer clientId=Avail_down1] Error while fetching metadata with correlation id 468 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,297][ERROR][org.apache.kafka.clients.Metadata][AvaililityDOWN] [Producer clientId=Avail_down1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,406][WARN ][org.apache.kafka.clients.NetworkClient][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Error while fetching metadata with correlation id 634 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,406][ERROR][org.apache.kafka.clients.Metadata][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,406][WARN ][logstash.outputs.kafka ][AvailabilityMAXUP][3685b3e90091e526485060db8df552a756f11f0f7fd344a5051e08b484a8ff8a] producer send failed, dropping record {:exception=>Java::OrgApacheKafkaCommonErrors::TopicAuthorizationException, :message=>"Not authorized to access topics: [dcsvisionavailability]", :record_value=>"<A_TERM_MAX_UPTIME_DAY>\
This is the output section of the availability config
output {
  kafka {
    bootstrap_servers => "namespaceurl.windows.net:9093"
    topic_id => "dcsvisionavailability"
    security_protocol => "SASL_SSL"
    sasl_mechanism => "PLAIN"
    jaas_path => "C:\Logstash\keys\kafka_sasl_jaasAVAILABILITY.java"
    client_id => "Avail_MaxUp1"
    codec => line {
      format => "<A_TERM_MAX_UPTIME_DAY>
<stuff deleted>"
    }
  }
}
The pipelines.yml file has this in it:
## Ticket Topic
- pipeline.id: Ticket1
  path.config: "TicketCT2KafkaEH8.conf"
  queue.type: persisted
- pipeline.id: PublicComments1
  path.config: "Public_DiaryCT2KafkaEH1.conf"
  queue.type: persisted
## Availability topic
- pipeline.id: AvailabilityDOWN
  path.config: "Availability_Down_TimeCT2KafkaEH3.conf"
  queue.type: persisted
- pipeline.id: AvailabilityMAXUP
  path.config: "Availability_Max_UptimeCT2KafkaEH2.conf"
  queue.type: persisted
I have tried running in different instances, and yes, that works: with the pipeline running, I open another command window and run one other config sending to a different topic (for this I specify a different --path.data).
However, with 40 configs going to 4 different topics, I don't really want to run lots of instances in parallel. Any advice welcomed.
I have been able to resolve the issue. It was to do with the jaas_path file.
I had a different jaas_path file for each topic, each of which specified the topic in the EntityPath at the end of the connection string, as follows:
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="$ConnectionString"
  password="Endpoint=sb://<stuff redacted>.windows.net/;SharedAccessKeyName=keyname;SharedAccessKey=<key redacted>;EntityPath=dcsvisionavailability";
};
When I provided a common key from Event Hub for use with all the topics, one that does not have the ;EntityPath=topicname at the end, it worked.
This makes sense, since I had already specified the topic via the line
topic_id => "dcsvisionavailability" in the Logstash conf file.
I'm glad it worked, Walter.
For others who may look up a solution to this issue: the problem was that the same SharedAccessKeyName was used with different tokens for different Kafka topics for authorization at Event Hub.
That is why it all worked fine when requests targeting only one topic came in for authorization.
When requests for different topics came to Event Hub at the same time with the same SharedAccessKeyName but different tokens, only one would go through and the other would exception out due to a conflict of tokens.
The options for solving this were:
1. use the same token for all requests (for all topics)
2. use a different SharedAccessKeyName with a different token for each topic.
Walter chose #1 and needed the topic name to be removed from the password (as Event Hub might have had topic-wise authorization).
For topic-wise authorization, solution #2 would be better.

How to run the mongo-kafka connector as a source for kafka and integrate that with logstash input to use elasticsearch as a sink?

I have created a build of https://github.com/mongodb/mongo-kafka.
But how do I run this so it connects to my running Kafka instance?
However stupid this question sounds, there seems to be no documentation available for making this work with a locally running replica set of MongoDB.
All blogs point to using MongoDB Atlas instead.
If you have a good resource, please guide me towards it.
UPDATE 1 --
Used the Maven artifact - https://search.maven.org/artifact/org.mongodb.kafka/mongo-kafka-connect
Placed it in the Kafka plugins directory and restarted Kafka.
UPDATE 2 -- How to enable mongodb as source for kafka?
Used https://github.com/mongodb/mongo-kafka/blob/master/config/MongoSourceConnector.properties as the configuration file for Kafka:
bin/kafka-server-start.sh config/server.properties --override config/MongoSourceConnector.properties
UPDATE 3 - The above method hasn't worked, so I'm going back to the blog, which does not mention what port 8083 is.
Installed Confluent and confluent-hub; still unsure how the mongo-kafka connector works with Kafka.
UPDATE 4 -
Zookeeper, Kafka Server, Kafka connect running
Mongo Kafka Library Files
Kafka Connect Avro Connector Library Files
Using the commands below, my source got working -
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/connect-standalone.sh config/connect-standalone.properties config/MongoSourceConnector.properties
And using the configuration below for Logstash, I was able to push data into Elasticsearch -
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["users","organisations","skills"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}
So now each MongoSourceConnector.properties holds a single collection name that it reads from, and I need to run Kafka Connect with a different properties file for each collection (see the sketch below).
Also, my Logstash is pushing new data into Elasticsearch instead of updating old data, and it is not creating indexes named after the collections. The idea is that this should be able to sync with my MongoDB database perfectly.
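Since connect-standalone.sh accepts more than one connector properties file on its command line, a single standalone worker can host one connector per collection. A sketch, with hypothetical file names:
bin/connect-standalone.sh config/connect-standalone.properties \
  config/users.properties config/organisations.properties config/skills.properties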
FINAL UPDATE - Everything is now working smoothly:
Created multiple properties files for Kafka Connect
The latest Logstash config actually creates an index per topic name, and updates the indexes accordingly
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    decorate_events => true
    topics => ["users","organisations","skills"]
  }
}
filter {
  json {
    source => "message"
    target => "json_payload"
  }
  json {
    source => "[json_payload][payload]"
    target => "payload"
  }
  mutate {
    # decorate_events puts the Kafka topic under [@metadata][kafka][topic]
    add_field => { "[es_index]" => "%{[@metadata][kafka][topic]}" }
    rename => { "[payload][fullDocument][_id][$oid]" => "mongo_id" }
    rename => { "[payload][fullDocument]" => "document" }
    remove_field => ["message","json_payload","payload"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{es_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{mongo_id}"
  }
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
Steps to successfully get MongoDB syncing with Elasticsearch -
First, deploy the MongoDB replica set -
//Make sure no mongo daemon instance is running
//To check all the ports which are listening or open
sudo lsof -i -P -n | grep LISTEN
//Kill the process id of the mongo instance
sudo kill 775
//Deploy the replica set
mongod --replSet "rs0" --bind_ip localhost --dbpath=/data/db
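Note that change streams only work once the replica set has been initiated; on a fresh deployment something along these lines is needed once (a sketch, assuming the legacy mongo shell is installed):
//Initiate the replica set after mongod is up (run once)
mongo --eval 'rs.initiate()'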
Create the connector config properties for Kafka Connect
//dummycollection.properties <- Filename
name=dummycollection-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb://localhost:27017
database=dummydatabase
collection=dummycollection
copy.existing=true
topic.prefix=
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
publish.full.document.only=true
pipeline=[]
batch.size=0
collation=
Make sure the JAR files from the URLs below are available in your Kafka plugins directory -
Maven Central Repository Search
Kafka Connect Avro Converter
Deploy Kafka
//Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
//Kafka Server
bin/kafka-server-start.sh config/server.properties
//Kafka Connect
bin/connect-standalone.sh config/connect-standalone.properties config/dummycollection.properties
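To confirm that the source connector is producing, it can help to list the topics; with topic.prefix left empty, the topic name should be database.collection. A sketch (older Kafka releases take --zookeeper instead of --bootstrap-server):
//Verify the change-stream topic exists
bin/kafka-topics.sh --list --bootstrap-server localhost:9092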
Configure Logstash -
// /etc/logstash/conf.d/apache.conf <- File
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    decorate_events => true
    topics => ["dummydatabase.dummycollection"]
  }
}
filter {
  json {
    source => "message"
    target => "json_payload"
  }
  json {
    source => "[json_payload][payload]"
    target => "payload"
  }
  mutate {
    # decorate_events puts the Kafka topic under [@metadata][kafka][topic]
    add_field => { "[es_index]" => "%{[@metadata][kafka][topic]}" }
    rename => { "[payload][fullDocument][_id][$oid]" => "mongo_id" }
    rename => { "[payload][fullDocument]" => "document" }
    remove_field => ["message","json_payload","payload"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{es_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{mongo_id}"
  }
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
Start Elasticsearch, Kibana and Logstash
sudo systemctl start elasticsearch
sudo systemctl start kibana
sudo systemctl start logstash
Test
Open MongoDB Compass, and:
Create a collection, mention that collection in the Logstash topics list, and create a properties file for it for Kafka Connect
Add data to it
Update data
Review the indexes in Elasticsearch
Port 8083 is Kafka Connect, which you start with one of the connect-*.sh scripts.
It is standalone from the broker, and its properties do not get set from kafka-server-start.
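To make the 8083 point concrete: Kafka Connect exposes a REST API on that port, so a quick check that the worker and the connector are up could look like this (a sketch, reusing the connector name from the properties file above):
curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/dummycollection-source/status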

How to change the key of the message with mqtt source connect

I have a Java application sending data into a Mosquitto (MQTT) broker, and using Kafka Connect I'm sending this data from the MQTT broker to a Kafka topic. The problem is that whenever the MQTT source connector sends the data, by default the key is always the name of the Kafka topic, and I need to change it. Using SMT (Single Message Transforms) I could change the key, but it is encoded in base64. Any idea how I can convert it and get only the value of the id?
My connector config with the transformation:
name=MqttSourceConnector
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
mqtt.qos=1
tasks.max=2
mqtt.clean.session.enabled=true
mqtt.server.uri=tcp://mosquitto-server:1883
mqtt.connect.timeout.seconds=30
key.converter.schemas.enable=false
value.converter.schemas.enable=false
mqtt.topics=mqtt_topic
mqtt.keepalive.interval.seconds=60
kafka.topic=kafka_topic
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
transforms=createMap,createKey,extractInt
transforms.createMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.createMap.field=id
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=id
transforms.extractInt.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extractInt.field=id
And what is getting into my kafka topic is:
Key { id: eyJpZCI6IlNCMDQiLCJ0ZW1wZXJhdHVyZSI6MjQuOTk2OTQyOTk0NDIyMTM4LCJodW1pZGl0eSI6MzkuNzUyNjQzNTk3MjcyMjYsInRpbWVzdGFtcCI6MTUzMjY4NTEzMTI2Nn0= }
Value { id: SB04, temperature: 24.996942994422138, humidity: 39.75264359727226, timestamp: 1532685131266 }
So, I just need the SB04 value to be set as the key. Any idea?
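For anyone hitting the same symptom: decoding the base64 string from the key locally shows that it holds the entire JSON record rather than just the id, e.g.:
echo 'eyJpZCI6IlNCMDQiLCJ0ZW1wZXJhdHVyZSI6MjQuOTk2OTQyOTk0NDIyMTM4LCJodW1pZGl0eSI6MzkuNzUyNjQzNTk3MjcyMjYsInRpbWVzdGFtcCI6MTUzMjY4NTEzMTI2Nn0=' | base64 --decode
# {"id":"SB04","temperature":24.996942994422138,"humidity":39.75264359727226,"timestamp":1532685131266}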

kafka won't connect with logstash

I am trying to connect Kafka to Logstash using https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
I have Kafka and Zookeeper running (I've verified this by creating a producer and a consumer in Python), but Logstash won't detect Kafka.
I have installed the Kafka input plugin; this is what my conf file looks like:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["divolte-data"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "divolte-data"
  }
}
Any help would be appreciated.
I guess the problem is with the version. Since you're running on ES 2.3, it's not compatible to use bootstrap_servers within your kafka input plugin, which was only introduced in version 5.0 of the plugin.
As per the doc, you should be using zk_connect (which points at ZooKeeper, default port 2181, rather than at the broker) instead of bootstrap_servers, like this:
kafka {
  zk_connect => "localhost:2181"
}

LogStash Kafka output/input is not working

I am trying to use Logstash with a Kafka broker, but I cannot get them integrated.
The versions are:
logstash 2.4
kafka 0.8.2.2 with scala 2.10
The Kafka output config file is:
input {
  stdin {
  }
}
output {
  stdout {
    codec => rubydebug
  }
  kafka {
    bootstrap_servers => "10.120.16.202:6667,10.120.16.203:6667,10.120.16.204:6667"
    topic_id => "cephosd1"
  }
}
I can list the topic cephosd1 from Kafka.
The stdout output also prints the content.
But I cannot read anything from kafka-console-consumer.sh.
I think you have a compatibility issue. If you check the version compatibility matrix between Logstash, Kafka and the kafka output plugin, you'll see that the kafka output plugin in Logstash 2.4 uses the Kafka 0.9 client version.
If you have a Kafka broker 0.8.2.2, it is not compatible with the client version 0.9 (the other way around would be ok). You can either downgrade to Logstash 2.0 or upgrade your Kafka broker to 0.9.