how to remove filebeat metadata - apache-kafka

I am using Filebeat to forward incoming HAProxy logs to a Kafka topic, but Filebeat adds a lot of metadata to every Kafka message, which consumes more memory than I would like.
Here is an example of a message written to Kafka by Filebeat, where it has added the beat and host metadata and a lot of other fields:
{
  "@timestamp": "2017-03-27T08:14:09.508Z",
  "beat": {
    "hostname": "stage-kube03",
    "name": "stage-kube03",
    "version": "5.2.1"
  },
  "input_type": "log",
  "message": {
    "message": {
      "activityType": null
    }
  },
  "offset": 3783008,
  "source": "/var/log/audit.log",
  "type": "log"
}
How do I control/reduce the additional metadata Filebeat adds to the Kafka message along with the log line payload? Below is my filebeat.yml file:
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/haproxy.log
    #- c:\programdata\elasticsearch\logs\*
  #exclude_files: [".gz$"]
  #fields:
  #  codec: plain
  #  token: USER_TOKEN
  #  type: haproxy_log
  #fields_under_root: true
  processors:
    - drop_event:
        # fields: ["prospector","event","dataset"]
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ['^DBG']
  exclude_lines: ['^source']
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ['^ERR', '^WARN']
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: ['.gz$']
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after, or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
#host: "localhost:5601"
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
# Array of hosts to connect to.
# hosts: ["localhost:9200"]
# Enable ILM (beta) to use index lifecycle management instead of daily indices.
#ilm.enabled: false
# Optional protocol and basic auth credentials.
#protocol: "https"
#username: "elastic"
#password: "changeme"
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:
output.kafka:
  hosts: ["10.12.0.90:9092"]
  topic: "data-meter-topic"
  codec.json:
    pretty: true

You need to remove the add_host_metadata and add_cloud_metadata processors you're adding explicitly, and drop the remaining metadata fields with the drop_fields processor:
I've tested your configuration and changed the following:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.console:
  pretty: true

processors:
  - drop_fields:
      fields: ["agent", "log", "input", "host", "ecs"]
  #- add_host_metadata: ~
  #- add_cloud_metadata: ~
The result:
{
  "@timestamp": "2020-11-27T15:55:17.098Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.10.0"
  },
  "message": "2020-11-27 00:29:58 status installed libc-bin:amd64 2.28-10"
}
According to the documentation, you can't remove some of the metadata, namely @timestamp and type (the same restriction covers the @metadata field):
The drop_fields processor specifies which fields to drop if a certain condition is fulfilled. The condition is optional. If it's missing, the specified fields are always dropped. The @timestamp and type fields cannot be dropped, even if they show up in the drop_fields list.
EDIT:
Since you appear to be running filebeat 5.2.1, I've tried the following configuration with even better success than filebeat 7.x:
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/*.log

output.console:
  pretty: true

processors:
  - drop_fields:
      fields: ["log_type", "input_type", "offset", "beat", "source"]
Result:
{
  "@timestamp": "2020-11-30T09:51:40.404Z",
  "message": "2020-11-27 00:29:58 status half-configured vim:amd64 2:8.1.0875-5",
  "type": "log"
}
EDIT2:
Additionally, since you've also posted output from a filebeat 6.8.0 version, I've tested with that very same version:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.console:
  pretty: true

processors:
  - drop_fields:
      fields: ["beat", "source", "prospector", "offset", "host", "log", "input", "event", "fileset"]
  #- add_host_metadata: ~
  #- add_cloud_metadata: ~
Output:
{
  "@timestamp": "2020-11-30T10:08:26.176Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.8.0"
  },
  "message": "2020-11-27 00:29:58 status unpacked vim:amd64 2:8.1.0875-5"
}
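For completeness, here is how the same idea slots back into the Kafka output from the question — a minimal sketch using the 5.x syntax and the field list from the first EDIT above (adjust the list to whatever extra fields you actually see in your events; @timestamp and type will remain either way):
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/haproxy.log

processors:
  - drop_fields:
      fields: ["log_type", "input_type", "offset", "beat", "source"]

output.kafka:
  hosts: ["10.12.0.90:9092"]
  topic: "data-meter-topic"
  # pretty-printed JSON adds whitespace to every message; leaving it off keeps payloads smaller
  codec.json:
    pretty: false
Note that codec.json with pretty: true (as in the original config) pretty-prints every event, which works against the goal of keeping the Kafka messages small.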

Related

Multi-line Filebeat templates don’t work with filebeat.inputs - type: filestream

I ran into a multiline processing problem in Filebeat when the filebeat.inputs: parameters specify type: filestream - the file stream's logs are not grouped according to multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'; in the output I see that continuation lines are not appended to the previous message, and each individual line from the log file instead becomes its own single-line message.
If I specify type: log in the filebeat.inputs: parameters, then everything works correctly and multiline messages are created in accordance with multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'.
What is specified incorrectly in my config?
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - C:\logs\GT\TTL\*\*.log
  fields_under_root: true
  fields:
    instance: xml
    system: ttl
    subsystem: GT
    account: abc
  multiline.type: pattern
  multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
To get it working you should have something like this:
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - C:\logs\GT\TTL\*\*.log
  fields_under_root: true
  fields:
    instance: xml
    system: ttl
    subsystem: GT
    account: abc
  parsers:
    - multiline:
        type: pattern
        pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after
There are 2 reasons why this works:
1. The general documentation on multiline handling, at the time of this writing, has not been updated to reflect the changes made for the filestream input type. You can find the information on setting up multiline for filestream under parsers on this page: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html
2. The documentation I just mentioned is also wrong (at least at the time of this writing). Its example configures the multiline parser without indenting its children, which will not work because the parser will not pick up any of the values underneath it. This issue is also being discussed here: https://discuss.elastic.co/t/filebeat-filestream-input-parsers-multiline-fails/290543/13 and I expect it will be fixed sometime in the future.
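For reference, this is a sketch of the layout described above that does not work — the options sit at the same indentation level as multiline, so YAML treats them as siblings of the parser entry instead of its children and the parser never receives them:
parsers:
  - multiline:
    type: pattern                               # sibling of 'multiline', not a child
    pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after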

Not able to send filebeat output to mongodb

I have added output.mongodb in the filebeat.yml file, but it is showing the error "Exiting: error initializing publisher: output type mongodb undefined".
Does anyone here have a different, fail-safe approach for this requirement, where I want to redirect Filebeat output directly to a MongoDB database?
filebeat.yml file:
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/test.log
    #- c:\programdata\elasticsearch\logs\*
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ['^DBG']
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ['^ERR', '^WARN']
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: ['.gz$']
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after, or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  reload.period: 5s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 2
  #index.codec: best_compression
  #_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "10.27.3.235:5601"
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
# output.elasticsearch:
# # Array of hosts to connect to.
# # hosts: ["10.27.3.235:9200"]
# hosts: ["http://10.27.3.235:9200"]
# index: "filebeatSYS-%{[agent.version]}-%{+yyyy.MM.dd}"
# setup.template:
# name: 'api-access'
# pattern: 'api-access-*'
# enabled: false
#
# # Optional protocol and basic auth credentials.
# #protocol: "https"
# #username: "elastic"
# #password: "changeme"
# #index: "filebeat-%{+yyyy.MM.dd}"
#-------------------------- MongoDB output ------------------------------
output.mongodb:
  enabled: true
  # URL format, according to mgo.v2 doc: [mongodb://][user:pass@]host1[:port1][,host2[:port2],...][/database][?options]
  # More info: https://godoc.org/gopkg.in/mgo.v2#Dial
  hosts: ["mongodb://<my-db-url-inserted-here>:27017"]
  # The mongodb database to push to
  db: "<my-db-here>"
  # The database collection to push to
  # Could be configured like key/keys of the Redis output: https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html#_key_2
  collection: "filebeat"
  # https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html#_loadbalance
  loadbalance: true
  # https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html#_timeout_4
  timeout: 5s
  # https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html#_max_retries_4
  max_retries: 5
  # https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html#_bulk_max_size_4
  bulk_max_size: 2048
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== X-Pack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
#================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
You get the error
Exiting: error initializing publisher: output type mongodb undefined
because Filebeat does not support this kind of output. Take a look at the Output Configuration doc of Filebeat. There is no output for MongoDB mentioned. Filebeat supports only the following outputs:
Elasticsearch
Logstash
Kafka
Redis
File
Console
Elastic Cloud
By defining
output.mongodb:
Filebeat crashes because 'mongodb' is an unknown/undefined configuration field in the output element.
Does anyone here have a different, fail-safe approach for this requirement, where I want to redirect Filebeat output directly to a MongoDB database?
Logstash has a dedicated MongoDB output plugin. So you could send the data from Filebeat to Logstash which sends it to your MongoDB (this approach is not direct but a valid workaround).
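A minimal sketch of the Filebeat side of that workaround (the host and port here are placeholders; the Logstash pipeline itself still needs a beats input and the logstash-output-mongodb plugin configured separately):
# filebeat.yml: ship events to Logstash, which then forwards them to MongoDB
output.logstash:
  # Logstash beats input; 5044 is the conventional port
  hosts: ["your-logstash-host:5044"]
On the Logstash side, the pipeline would listen with the beats input on port 5044 and write to your database with the logstash-output-mongodb plugin.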

Hyperledger fabric channel creation error - unknown consortium name

I've been following the BYFN network tutorial from hyperledger-fabric v1.4.4 and adapting it so that it has only a single organisation with a single orderer (although my idea is to use an etcdraft orderer type rather than a solo orderer) and, so far, two peers.
I can generate the certificates and artifacts and then start the network (using docker-compose -f docker-compose-cli.yaml up) without any errors, using the commands provided in the tutorial (I've changed some peer and orderer names from the tutorial to fit my project). However, when trying to create my first channel with the command peer channel create -o orderer.trading.com:7050 -c trading-channel -f ./channel-artifacts/channel.tx --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/trading.com/orderers/orderer.trading.com/msp/tlscacerts/tlsca.trading.com-cert.pem, I get the following error: Error: got unexpected status: BAD_REQUEST -- Unknown consortium name: TradingConsortium. I've checked the channel-artifacts genesis.block and channel.tx, and both reference the consortium.
It is also defined under the Profiles section of the configtx.yaml file, as shown below.
What could it be, and how would I go about debugging this?
# Copyright IBM Corp. All Rights Reserved.
#
# SPDX-License-Identifier: Apache-2.0
#
---
################################################################################
#
# Section: Organizations
#
# - This section defines the different organizational identities which will
# be referenced later in the configuration.
#
################################################################################
Organizations:
  # SampleOrg defines an MSP using the sampleconfig. It should never be used
  # in production but may be used as a template for other definitions
  - &OrdererOrg
    # DefaultOrg defines the organization which is used in the sampleconfig
    # of the fabric.git development environment
    Name: OrdererOrg
    # ID to load the MSP definition as
    ID: OrdererMSP
    # MSPDir is the filesystem path which contains the MSP configuration
    MSPDir: crypto-config/ordererOrganizations/trading.com/msp
    # Policies defines the set of policies at this level of the config tree
    # For organization policies, their canonical path is usually
    #   /Channel/<Application|Orderer>/<OrgName>/<PolicyName>
    Policies:
      Readers:
        Type: Signature
        Rule: "OR('OrdererMSP.member')"
      Writers:
        Type: Signature
        Rule: "OR('OrdererMSP.member')"
      Admins:
        Type: Signature
        Rule: "OR('OrdererMSP.admin')"
  - &OrgTrader
    # DefaultOrg defines the organization which is used in the sampleconfig
    # of the fabric.git development environment
    Name: OrgTraderMSP
    # ID to load the MSP definition as
    ID: OrgTraderMSP
    MSPDir: crypto-config/peerOrganizations/orgtrader.trading.com/msp
    # Policies defines the set of policies at this level of the config tree
    # For organization policies, their canonical path is usually
    #   /Channel/<Application|Orderer>/<OrgName>/<PolicyName>
    Policies:
      Readers:
        Type: Signature
        Rule: "OR('OrgTraderMSP.admin', 'OrgTraderMSP.peer', 'OrgTraderMSP.client')"
      Writers:
        Type: Signature
        Rule: "OR('OrgTraderMSP.admin', 'OrgTraderMSP.client')"
      Admins:
        Type: Signature
        Rule: "OR('OrgTraderMSP.admin')"
      Endorsement:
        Type: Signature
        Rule: "OR('OrgTraderMSP.peer')"
    # leave this flag set to true.
    AnchorPeers:
      # AnchorPeers defines the location of peers which can be used
      # for cross org gossip communication. Note, this value is only
      # encoded in the genesis block in the Application section context
      - Host: peer0.orgtrader.trading.com
        Port: 7051
################################################################################
#
# SECTION: Capabilities
#
# - This section defines the capabilities of fabric network. This is a new
# concept as of v1.1.0 and should not be utilized in mixed networks with
# v1.0.x peers and orderers. Capabilities define features which must be
# present in a fabric binary for that binary to safely participate in the
# fabric network. For instance, if a new MSP type is added, newer binaries
# might recognize and validate the signatures from this type, while older
# binaries without this support would be unable to validate those
# transactions. This could lead to different versions of the fabric binaries
# having different world states. Instead, defining a capability for a channel
# informs those binaries without this capability that they must cease
# processing transactions until they have been upgraded. For v1.0.x if any
# capabilities are defined (including a map with all capabilities turned off)
# then the v1.0.x peer will deliberately crash.
#
################################################################################
Capabilities:
  # Channel capabilities apply to both the orderers and the peers and must be
  # supported by both.
  # Set the value of the capability to true to require it.
  Channel: &ChannelCapabilities
    # V2_0 capability ensures that orderers and peers behave according
    # to v2.0 channel capabilities. Orderers and peers from
    # prior releases would behave in an incompatible way, and are therefore
    # not able to participate in channels at v2.0 capability.
    # Prior to enabling V2.0 channel capabilities, ensure that all
    # orderers and peers on a channel are at v2.0.0 or later.
    V2_0: true
  # Orderer capabilities apply only to the orderers, and may be safely
  # used with prior release peers.
  # Set the value of the capability to true to require it.
  Orderer: &OrdererCapabilities
    # V2_0 orderer capability ensures that orderers behave according
    # to v2.0 orderer capabilities. Orderers from
    # prior releases would behave in an incompatible way, and are therefore
    # not able to participate in channels at v2.0 orderer capability.
    # Prior to enabling V2.0 orderer capabilities, ensure that all
    # orderers on channel are at v2.0.0 or later.
    V2_0: true
  # Application capabilities apply only to the peer network, and may be safely
  # used with prior release orderers.
  # Set the value of the capability to true to require it.
  Application: &ApplicationCapabilities
    # V2_0 application capability ensures that peers behave according
    # to v2.0 application capabilities. Peers from
    # prior releases would behave in an incompatible way, and are therefore
    # not able to participate in channels at v2.0 application capability.
    # Prior to enabling V2.0 application capabilities, ensure that all
    # peers on channel are at v2.0.0 or later.
    V2_0: true
################################################################################
#
# SECTION: Application
#
# - This section defines the values to encode into a config transaction or
# genesis block for application related parameters
#
################################################################################
Application: &ApplicationDefaults
  # Organizations is the list of orgs which are defined as participants on
  # the application side of the network
  Organizations:
  # Policies defines the set of policies at this level of the config tree
  # For Application policies, their canonical path is
  #   /Channel/Application/<PolicyName>
  Policies:
    Readers:
      Type: ImplicitMeta
      Rule: "ANY Readers"
    Writers:
      Type: ImplicitMeta
      Rule: "ANY Writers"
    Admins:
      Type: ImplicitMeta
      Rule: "MAJORITY Admins"
    LifecycleEndorsement:
      Type: ImplicitMeta
      Rule: "MAJORITY Endorsement"
    Endorsement:
      Type: ImplicitMeta
      Rule: "MAJORITY Endorsement"
  Capabilities:
    <<: *ApplicationCapabilities
################################################################################
#
# SECTION: Orderer
#
# - This section defines the values to encode into a config transaction or
# genesis block for orderer related parameters
#
################################################################################
Orderer: &OrdererDefaults
  # Orderer Type: The orderer implementation to start
  OrdererType: etcdraft
  Addresses:
    - orderer.trading.com:7050
  # Batch Timeout: The amount of time to wait before creating a batch
  BatchTimeout: 2s
  # Batch Size: Controls the number of messages batched into a block
  BatchSize:
    # Max Message Count: The maximum number of messages to permit in a batch
    MaxMessageCount: 10
    # Absolute Max Bytes: The absolute maximum number of bytes allowed for
    # the serialized messages in a batch.
    AbsoluteMaxBytes: 99 MB
    # Preferred Max Bytes: The preferred maximum number of bytes allowed for
    # the serialized messages in a batch. A message larger than the preferred
    # max bytes will result in a batch larger than preferred max bytes.
    PreferredMaxBytes: 512 KB
  # Organizations is the list of orgs which are defined as participants on
  # the orderer side of the network
  Organizations:
  # Policies defines the set of policies at this level of the config tree
  # For Orderer policies, their canonical path is
  #   /Channel/Orderer/<PolicyName>
  Policies:
    Readers:
      Type: ImplicitMeta
      Rule: "ANY Readers"
    Writers:
      Type: ImplicitMeta
      Rule: "ANY Writers"
    Admins:
      Type: ImplicitMeta
      Rule: "MAJORITY Admins"
    # BlockValidation specifies what signatures must be included in the block
    # from the orderer for the peer to validate it.
    BlockValidation:
      Type: ImplicitMeta
      Rule: "ANY Writers"
################################################################################
#
# CHANNEL
#
# This section defines the values to encode into a config transaction or
# genesis block for channel related parameters.
#
################################################################################
Channel: &ChannelDefaults
  # Policies defines the set of policies at this level of the config tree
  # For Channel policies, their canonical path is
  #   /Channel/<PolicyName>
  Policies:
    # Who may invoke the 'Deliver' API
    Readers:
      Type: ImplicitMeta
      Rule: "ANY Readers"
    # Who may invoke the 'Broadcast' API
    Writers:
      Type: ImplicitMeta
      Rule: "ANY Writers"
    # By default, who may modify elements at this config level
    Admins:
      Type: ImplicitMeta
      Rule: "MAJORITY Admins"
  # Capabilities describes the channel level capabilities, see the
  # dedicated Capabilities section elsewhere in this file for a full
  # description
  Capabilities:
    <<: *ChannelCapabilities
################################################################################
#
# Profile
#
# - Different configuration profiles may be encoded here to be specified
# as parameters to the configtxgen tool
#
################################################################################
Profiles:
  ChemartOrgsChannel:
    Consortium: TradingConsortium
    <<: *ChannelDefaults
    Application:
      <<: *ApplicationDefaults
      Organizations:
        - *OrgTrader
      Capabilities:
        <<: *ApplicationCapabilities
  MultiNodeEtcdRaft:
    <<: *ChannelDefaults
    Capabilities:
      <<: *ChannelCapabilities
    Orderer:
      <<: *OrdererDefaults
      OrdererType: etcdraft
      EtcdRaft:
        Consenters:
          - Host: orderer.trading.com
            Port: 7050
            ClientTLSCert: crypto-config/ordererOrganizations/trading.com/orderers/orderer.trading.com/tls/server.crt
            ServerTLSCert: crypto-config/ordererOrganizations/trading.com/orderers/orderer.trading.com/tls/server.crt
      Addresses:
        - orderer.trading.com:7050
      Organizations:
        - *OrdererOrg
      Capabilities:
        <<: *OrdererCapabilities
    Application:
      <<: *ApplicationDefaults
      Organizations:
        - <<: *OrdererOrg
    Consortiums:
      TradingConsortium:
        Organizations:
          - *OrgTrader
This is my crypto-config.yaml file:
# Copyright IBM Corp. All Rights Reserved.
#
# SPDX-License-Identifier: Apache-2.0
#
# ---------------------------------------------------------------------------
# "OrdererOrgs" - Definition of organizations managing orderer nodes
# ---------------------------------------------------------------------------
OrdererOrgs:
  # ---------------------------------------------------------------------------
  # Orderer
  # ---------------------------------------------------------------------------
  - Name: Orderer
    Domain: trading.com
    # -------------------------------------------------------------------------
    # "Specs" - See PeerOrgs below for complete description
    # -------------------------------------------------------------------------
    Specs:
      - Hostname: orderer
# ---------------------------------------------------------------------------
# "PeerOrgs" - Definition of organizations managing peer nodes
# ---------------------------------------------------------------------------
PeerOrgs:
  # ---------------------------------------------------------------------------
  # OrgTrader
  # ---------------------------------------------------------------------------
  - Name: OrgTrader
    Domain: orgtrader.trading.com
    EnableNodeOUs: true
# ---------------------------------------------------------------------------
# "Specs"
# ---------------------------------------------------------------------------
# Uncomment this section to enable the explicit definition of hosts in your
# configuration. Most users will want to use Template, below
#
# Specs is an array of Spec entries. Each Spec entry consists of two fields:
# - Hostname: (Required) The desired hostname, sans the domain.
# - CommonName: (Optional) Specifies the template or explicit override for
# the CN. By default, this is the template:
#
# "{{.Hostname}}.{{.Domain}}"
#
# which obtains its values from the Spec.Hostname and
# Org.Domain, respectively.
# ---------------------------------------------------------------------------
# Specs:
# - Hostname: foo # implicitly "foo.orgtrader.trading.com"
# CommonName: foo27.org5.trading.com # overrides Hostname-based FQDN set above
# - Hostname: bar
# - Hostname: baz
# ---------------------------------------------------------------------------
# "Template"
# ---------------------------------------------------------------------------
# Allows for the definition of 1 or more hosts that are created sequentially
# from a template. By default, this looks like "peer%d" from 0 to Count-1.
# You may override the number of nodes (Count), the starting index (Start)
# or the template used to construct the name (Hostname).
#
# Note: Template and Specs are not mutually exclusive. You may define both
# sections and the aggregate nodes will be created for you. Take care with
# name collisions
# ---------------------------------------------------------------------------
    Template:
      Count: 2
      # Start: 5
      # Hostname: {{.Prefix}}{{.Index}} # default
    # -------------------------------------------------------------------------
    # "Users"
    # -------------------------------------------------------------------------
    # Count: The number of user accounts _in addition_ to Admin
    # -------------------------------------------------------------------------
    Users:
      Count: 1
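One hedged way to debug this (an assumption on my part, not from the original post): decode the generated artifacts and confirm the consortium really is where the orderer expects it, for example:
configtxgen -inspectBlock ./channel-artifacts/genesis.block
configtxgen -inspectChannelCreateTx ./channel-artifacts/channel.tx
The decoded genesis block should contain a Consortiums group with an entry named TradingConsortium, and the decoded channel.tx should reference the same name. If both look right, a common remaining cause of "Unknown consortium name" is an orderer that was bootstrapped from an older genesis block generated before the consortium definition was added, so regenerating the artifacts and recreating the orderer container is worth trying.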

"Failed to submit supervisor: Request failed with status code 502" on submitting ingestion spec to router

I am getting the error "Failed to submit supervisor: Request failed with status code 502" when I try to submit an ingestion spec through the Druid UI (served by the router). The same ingestion spec works on a standalone Druid server.
I have set up the cluster using 4 machines: one for the coordinator and overlord (master), one for the historical and middle manager (data), one for the broker (query), and one for the router, with a separate instance for ZooKeeper. There are no errors in the logs.
The ingestion spec is as follows:
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "table1",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "dimensionsSpec": {
          "dimensions": [
            // List of valid dimensions
          ]
        },
        "timestampSpec": {
          "column": "createdOnDate", // In 'YYYY-MM-DD' format
          "format": "iso"
        }
      }
    },
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "MONTH",
      "rollup": false,
      "queryGranularity": "none"
    },
    "metricsSpec": []
  },
  "ioConfig": {
    "type": "kafka",
    "topic": "mongotopic",
    "consumerProperties": {
      "bootstrap.servers": "ip:9092"
    },
    "useEarliestOffset": true
  },
  "tuningConfig": {
    "type": "kafka",
    "forceExtendableShardSpecs": true,
    "maxParseExceptions": 100,
    "maxSavedParseExceptions": 10
  }
}
The content of common.runtime.properties is as follows:
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
# Extensions
#
# This is not the full list of Druid extensions, but common ones that people often use. You may need to change this list
# based on your particular setup.
druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "druid-s3-extensions","postgresql-metadata-storage"]
# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies
#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true
#
# Zookeeper
#
druid.zk.service.host=druid-ip:2181
druid.zk.paths.base=/druid
#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://metadata.store.ip:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=metadata.store.ip
#druid.metadata.storage.connector.port=1527
# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql:///druid
#druid.metadata.storage.connector.user=druid
#druid.metadata.storage.connector.password=druid
# For PostgreSQL (make sure to additionally include the Postgres extension):
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
druid.metadata.storage.connector.user=user
druid.metadata.storage.connector.password=password
#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
# druid.storage.type=local
# druid.storage.storageDirectory=var/druid/segments
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files in the cp):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments
# For S3:
druid.storage.type=s3
druid.storage.bucket=valid-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=valid-key
druid.s3.secretKey=valid-secret
#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files in the cp):
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs
# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid-test-1
druid.indexer.logs.s3Prefix=druid/indexing-logs
#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info
# Storage type of double columns
# omitting this will lead to double columns being indexed as float at the storage layer
druid.indexing.doubleStorage=double
#
# SQL
#
druid.sql.enable=true
On submitting the specification through the Submit Supervisor option of the unified console served by the router (IP:8888/unified-console), I get the Failed to submit supervisor: Request failed with status code 502 error in the UI. The ZooKeeper, S3, and Postgres configurations are valid. The UI shows 1 middle manager, but 0 historicals, 0 data sources, and 0 segments.
The error happened because the druid-kafka-indexing-service extension was missing from the extension list (druid.extensions.loadList) in common.runtime.properties.
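In other words, the fix for the configuration shown above is to add that extension to the load list (at least on the services that run the ingestion, typically the Overlord and MiddleManager) and restart them; with the list quoted earlier this would look like:
druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "druid-s3-extensions", "postgresql-metadata-storage", "druid-kafka-indexing-service"]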

Filebeat collects logs and pushes them to Kafka error

1. My version information
jdk-8u191-linux-x64.tar.gz
kibana-6.5.0-linux-x86_64.tar.gz
elasticsearch-6.5.0.tar.gz
logstash-6.5.0.tar.gz
filebeat-6.5.0-linux-x86_64.tar.gz
kafka_2.11-2.1.0.tgz
zookeeper-3.4.12.tar.gz
2. Problem description
I have a log file in XML format. When I use Filebeat to collect this file and push it to Kafka, the content arrives garbled.
Here's my filebeat configuration
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /data/reporttg/ChannelServer.log
  include_lines: ['\<\bProcID.*\<\/ProcID\b\>']

### Filebeat modules
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

### Elasticsearch template setting
setup.template.settings:
  index.number_of_shards: 3

### Kibana
setup.kibana:

### Kafka
output.kafka:
  enabled: true
  hosts: ["IP:9092", "IP:9092", "IP:9092"]
  topic: houry

### Processors
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
My log content
<OrigDomain>ECIP</OrigDomain>
<HomeDomain>UCRM</HomeDomain>
<BIPCode>BIP2A011</BIPCode>
<BIPVer>0100</BIPVer>
<ActivityCode>T2000111</ActivityCode>
<ActionCode>1</ActionCode>
<ActionRelation>0</ActionRelation>
<Routing>
<RouteType>01</RouteType>
<RouteValue>13033935743</RouteValue>
</Routing>
<ProcID>PROC201901231142020023206514</ProcID>
<TransIDO>SSP201901231142020023206513</TransIDO>
<TransIDH>2019012311420257864666</TransIDH>
<ProcessTime>20190123114202</ProcessTime>
<Response>
<RspType>0</RspType>
<RspCode>0000</RspCode>
<RspDesc>success</RspDesc>
</Response>
Test regular expressions
3. Start Filebeat and view the Kafka content
4. I tested that it works normally when Filebeat collects the content and pushes it to Logstash.
How should this problem be solved?
If you don't want to include the XML tags, I would recommend you use a regex grouping.
Something like so:
'<ProcID>(PROC[0-9]+)<\/ProcID>'