logstash kafka input multiple topics with different codecs - apache-kafka

This is my Logstash conf:
input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => ["filebeat", "access"]
    group_id => "test-consumer-group"
    consumer_threads => 1
    decorate_events => true
  }
}
I have two topics, but I want to use a different codec for each topic. How can I do this?
I tried to add
if ([topic] == "filebeat") {
  codec => "json"
}
in the kafka input config, but Logstash returns this error:
Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, => at line 6, column 8 (byte 143) after input {\n kafka {\n bootstrap_servers => \"127.0.0.1:9092\"\n topics => [\"filebeat\", \"access\"]\n group_id => \"test-consumer-group\"\n if "

You can create two separate kafka inputs, each with a different codec.
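For example, a minimal sketch based on the config in the question (which codec the access topic actually needs is an assumption; plain is just the default):
input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => ["filebeat"]
    group_id => "test-consumer-group"
    decorate_events => true
    codec => "json"
  }
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => ["access"]
    group_id => "test-consumer-group"
    decorate_events => true
    codec => "plain"
  }
}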
Another option is to add a filter that parses the JSON object depending on the topic:
filter {
  # with decorate_events => true, the topic name is available at [@metadata][kafka][topic]
  if [@metadata][kafka][topic] == "filebeat" {
    json {
      source => "message"
    }
  }
}
For more info, check:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html

Related

Ingesting data in MongoDB with mongodb-output-plugin in Logstash

I am trying to ingest data from a txt file into MongoDB (Machine 1), using Logstash (Machine 2).
I set up a DB and a collection with Compass, and I am using the mongodb-output-plugin in Logstash.
Here's the Logstash conf file:
input
{
  file {
    path => "/home/user/Data"
    type => "cisco-asa"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter
{
  grok {
    match => { "message" => "^%{SYSLOGTIMESTAMP:syslog_timestamp} %{HOSTNAME:device_src} %%{CISCO_REASON:facility}-%{INT:severity_level}-%{CISCO_REASON:facility_mnemonic}: %{GREEDY>
  }
  date {
    match => ["syslog_timestamp", "MMM dd HH:mm:ss" ]
    target => "@timestamp"
  }
}
output
{
  stdout {
    codec => dots
  }
  mongodb {
    id => "mongo-cisco"
    collection => "Cisco ASA"
    database => "Logs"
    uri => "mongodb+srv://user:pass@192.168.10.9:27017/Logs"
  }
}
Here's the Logstash output:
[2021-03-27T13:29:35,178][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
.............................................................................................................................
[2021-03-27T13:30:06,201][WARN ][logstash.outputs.mongodb ][main][mongo-cisco] Failed to send event to MongoDB, retrying in 3 seconds {:event=>#<LogStash::Event:0x6d0984a>, :exception=>#<Mongo::Error::NoServerAvailable: No server is available matching preference: #<Mongo::ServerSelector::Primary:0x6711494c @tag_sets=[], @server_selection_timeout=30, @options={:database=>"Logs", :user=>"username", :password=>"passwd"}>>}
PS: this is my first time using MongoDB

How to check if the source is kafka or beat in logstash?

I have two sources of data for my logs. One is Beats and one is Kafka, and I want to create ES indices based on the source: if Kafka, prefix the index name with kafka; if Beats, prefix it with beat.
input {
  beats {
    port => 9300
  }
}
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["my-topic"]
    codec => json
  }
}
output {
  # if kafka
  elasticsearch {
    hosts => "http://localhost:9200"
    user => "elastic"
    password => "password"
    index => "[kafka-topic]-my-index"
  }
  # else if beat
  elasticsearch {
    hosts => "http://localhost:9200"
    user => "elastic"
    password => "password"
    index => "[filebeat]-my-index"
  }
}
Add tags in your inputs and use them to filter the output.
input {
  beats {
    port => 9300
    tags => ["beats"]
  }
}
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["my-topic"]
    codec => json
    tags => ["kafka"]
  }
}
output {
  if "beats" in [tags] {
    # elasticsearch output for beats
  }
  if "kafka" in [tags] {
    # elasticsearch output for kafka
  }
}
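Filled in with the index prefixes asked for in the question, that output could look something like this (a sketch; the hosts, credentials, and exact index names are just the values assumed from the question above):
output {
  if "beats" in [tags] {
    elasticsearch {
      hosts => "http://localhost:9200"
      user => "elastic"
      password => "password"
      index => "beat-my-index"
    }
  }
  if "kafka" in [tags] {
    elasticsearch {
      hosts => "http://localhost:9200"
      user => "elastic"
      password => "password"
      index => "kafka-my-index"
    }
  }
}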

Kafka Logstash Avro integration failing

I am trying to consume a topic from Kafka using the Avro deserializer in Logstash and I am getting the below error.
Here is the input from my Logstash config file:
input {
  kafka {
    bootstrap_servers => "kafka1:9911,kafka2:9911,kafka3.com:9911"
    topics => "EMS.Elastic_new"
    auto_offset_reset => earliest
    group_id => "logstash106"
    ssl_truststore_location => "/apps/opt/application/elasticsearch/logstash-7.1.1/kafka_files/kafka.client.truststore.jks"
    ssl_truststore_password => "xxxx"
    security_protocol => "SSL"
    key_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
    value_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
    codec => avro_schema_registry {
      endpoint => "https://kafka1:9990"
      subject_name => "EMS.Elastic_new"
      schema_id => 170
      schema_uri => "/apps/opt/application/elasticsearch/logstash-7.1.1/kafka_files/ticketInfo.avsc"
      tag_on_failure => true
      register_schema => true
    }
  }
}
output {
  elasticsearch {
    index => "smd_etms_es2"
    document_id => "%{tktnum}%"
    action => "update"
    doc_as_upsert => "true"
    retry_on_conflict => 5
    hosts => ["npes1:9200"]
  }
  stdout { codec => rubydebug }
}
[ERROR][logstash.inputs.kafka ] Unable to create Kafka consumer from given configuration {:kafka_error_message=>org.apache.kafka.common.KafkaException: Failed to construct kafka consumer, :cause=>io.confluent.common.config.ConfigException: Missing required configuration "schema.registry.url" which has no default value.}
[2019-07-26T16:58:22,736][ERROR][logstash.javapipeline ] A plugin had an unrecoverable error. Will restart this plugin. Pipeline_id:main
I have provided the avro_uri in the codec; however, the setting is not being read by Logstash.
The error Missing required configuration "schema.registry.url" comes from these settings:
key_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
value_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
Based on the example code, it seems you should just use org.apache.kafka.common.serialization.ByteArrayDeserializer for both; the avro_schema_registry codec then does the schema management on its own using the endpoint parameter.
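A sketch of the adjusted input, assuming the avro_schema_registry codec handles the Avro decoding; the deserializer classes are the only change, everything else is copied from the question's config:
input {
  kafka {
    bootstrap_servers => "kafka1:9911,kafka2:9911,kafka3.com:9911"
    topics => "EMS.Elastic_new"
    auto_offset_reset => earliest
    group_id => "logstash106"
    ssl_truststore_location => "/apps/opt/application/elasticsearch/logstash-7.1.1/kafka_files/kafka.client.truststore.jks"
    ssl_truststore_password => "xxxx"
    security_protocol => "SSL"
    # hand raw bytes to the codec instead of decoding with the Confluent deserializer
    key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
    value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
    codec => avro_schema_registry {
      endpoint => "https://kafka1:9990"
      subject_name => "EMS.Elastic_new"
      schema_id => 170
      schema_uri => "/apps/opt/application/elasticsearch/logstash-7.1.1/kafka_files/ticketInfo.avsc"
      tag_on_failure => true
    }
  }
}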

Logstash displaying weird characters in its output

While getting the output from a kafka stream, logstash is also displaying other characters. (\u0018, \u0000, \u0002, etc.)
I tried adding a key_deserializer_class to the logstash conf file, but that didn't help much.
input {
  kafka {
    bootstrap_servers => "broker1-kafka.net:9092"
    topics => ["TOPIC"]
    group_id => "T-group"
    jaas_path => "/opt/kafka_2.11-1.1.0/config/kafka_client_jaas.conf"
    key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
    sasl_mechanism => "SCRAM-SHA-256"
    security_protocol => "SASL_PLAINTEXT"
  }
}
output { stdout { codec => rubydebug } }
Output
{
  "@timestamp" => 2019-04-10T06:09:53.918Z,
  "message" => "(TOPIC\u0002U42019-04-10 06:09:47.01739142019-04-10T06:09:53.738000(00000021290065792800\u0002\u0004C1\u0000\u0000\u0002\u001EINC000014418569\u0002\u0010bppmUser\u0002����\v\u0000\u0002\u0010bppmUser\u0002֢��\v\u0002\u0002\u0002\u0002.\u0002\u0018;1000012627;\u0002<AGGAA5V0FEEW7APPOPCYPOR3RPPOLL\u0000\",
  "@version" => "1"
}
Is there any way to not get these characters in the output?

Logstash: Feed whole file and create a new event by splitting with newline character

I have the following logstash configuration for reading syslog-like messages from kafka:
input {
  kafka {
    bootstrap_servers => "172.24.0.3:9092"
    topics => ["test"]
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP}" }
  }
}
output {
  stdout { codec => rubydebug }
}
So, when a syslog line is sent to the logstash input, the following message is generated at stdout:
FROM KAFKA
r = p1.send('test', b'Jul 16 09:07:47 ubuntu user: test500')
STDOUT
{
  "message" => "Jul 16 09:07:47 ubuntu user: test500",
  "@version" => "1",
  "@timestamp" => 2018-07-16T12:29:57.854Z,
  "host" => "6d87dde4c74e"
}
Now, I would like to send multiple lines, each terminated with a \n character, and have logstash process the input as two separate messages, so that the logstash stdout is similar to the following example:
MULTIPLE LINES FROM KAFKA IN THE SAME MESSAGE
r = p1.send('test', b'Jul 16 09:07:47 ubuntu user: test501\nJul 16 09:07:47 ubuntu user: test502')
DESIRED STDOUT
{
  "message" => "Jul 16 09:07:47 ubuntu user: test501",
  "@version" => "1",
  "@timestamp" => 2018-07-16T12:29:57.854Z,
  "host" => "6d87dde4c74e"
}
{
  "message" => "Jul 16 09:07:47 ubuntu user: test502",
  "@version" => "1",
  "@timestamp" => 2018-07-16T12:29:57.854Z,
  "host" => "6d87dde4c74e"
}
Any ideas how to achieve this behavior on logstash?
I managed to achieve the behavior I described above by using the line codec:
input {
  kafka {
    bootstrap_servers => "172.24.0.3:9092"
    topics => ["test"]
    ## ## ## ## ##
    codec => line
    ## ## ## ## ##
  }
  stdin {}
}
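With codec => line, the Kafka message value is split on newline characters and each line becomes its own event, so the multi-line producer example above yields the two separate stdout documents shown in the desired output.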