Can Logstash's Google Cloud Storage output plugin resume transmission after a machine restart?

I'm using this config for Logstash's output. It uses /tmp/logstash-gcs as a local folder and sends a file to GCS when it reaches 1024 kbytes.
input {
  beats {
    port => 5044
  }
}
filter {}
output {
  google_cloud_storage {
    bucket => "gcs-bucket-name"
    json_key_file => "/secrets/service_account/credentials.json"
    temp_directory => "/tmp/logstash-gcs"
    log_file_prefix => "logstash_gcs"
    max_file_size_kbytes => 1024
    output_format => "json"
    date_pattern => "%Y-%m-%dT%H:00"
    flush_interval_secs => 2
    gzip => false
    gzip_content_encoding => false
    uploader_interval_secs => 60
    include_uuid => true
    include_hostname => true
  }
}
After the machine restarts, can it continue the scheduled job without any data loss?
Does it have a queue feature to manage this, like Pub/Sub?

As for a solution, there are two ways:
1. Use Filebeat to ship those temp files again (see the sketch below).
2. Set a grace period in seconds to ensure Logstash has enough time to send the last task when the machine goes down.
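For the first option, a minimal Filebeat sketch could look like the following. The paths and the Logstash host are assumptions for illustration (matching the temp_directory above), not taken from a working setup:
# Hypothetical Filebeat config: re-ship whatever temp files Logstash left
# behind in its temp_directory before the restart.
filebeat.inputs:
  - type: log
    paths:
      - /tmp/logstash-gcs/*      # assumed location, matches temp_directory above
output.logstash:
  hosts: ["localhost:5044"]      # assumed Logstash endpoint (same beats input as above)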

Related

Sync MongoDB data to Elasticsearch using Logstash

I want to sync my MongoDB data (local MongoDB) to Elasticsearch (local Elasticsearch) using the Logstash MongoDB plugin.
I installed the Logstash plugin using
bin/logstash-plugin install logstash-input-mongodb
Then I created a mongodata.conf file in the /usr/share/logstash directory.
When I execute the conf file, it shows
--> Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
My config file is:
input {
  mongodb {
    uri => "mongodb://localhost:27017/reporterDB"
    placeholder_db_dir => "/opt/logstash-mongodb/"
    placeholder_db_name => "logstash_sqlite.db"
    collection => "iam_ms_test"
    batch_size => 5000
  }
}
filter {
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    user => "elastic"
    password => "changeme"
    index => "mongo_log"
    document_type => "document_type"
    document_id => "%{id}"
  }
}
I am getting the below lines in the logstash-plain.log file:
[2019-11-01T15:41:00,869][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2019-11-01T15:41:00,871][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>6, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>750, :thread=>"#<Thread:0x351f7fd1#/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:245 run>"}
[2019-11-01T15:41:01,068][INFO ][logstash.inputs.mongodb ] Registering MongoDB input
[2019-11-01T15:41:01,116][ERROR][logstash.pipeline ] Error registering plugin {:pipeline_id=>"main", :plugin=>"<LogStash::Inputs::MongoDB uri=>\"mongodb://localhost:27017/anchorReports\", placeholder_db_dir=>\"/opt/logstash-mongodb/\", placeholder_db_name=>\"logstash_sqlite.db\", collection=>\"hi_p5m\", batch_size=>5000, id=>\"ec7682e8c6c5676deca84d5072c5f7865120a107ffce81ce21caa878c6e4ed09\", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>\"plain_441f95b8-cc8a-4b9e-a45f-657ed2011e2b\", enable_metric=>true, charset=>\"UTF-8\">, since_table=>\"logstash_since\", since_column=>\"_id\", since_type=>\"id\", parse_method=>\"flatten\", isodate=>false, retry_delay=>3, generateId=>false, unpack_mongo_id=>false, message=>\"Default message...\", interval=>1>", :error=>"Java::JavaSql::SQLException: path to '/opt/logstash-mongodb/logstash_sqlite.db': '/opt/logstash-mongodb' does not exist", :thread=>"#<Thread:0x351f7fd1#/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:245 run>"}
[2019-11-01T15:41:01,869][ERROR][logstash.pipeline ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Sequel::DatabaseConnectionError: Java::JavaSql::SQLException: path to '/opt/logstash-mongodb/logstash_sqlite.db': '/opt/logstash-mongodb' does not exist>, :backtrace=>["org.sqlite.core.CoreConnection.open(org/sqlite/core/CoreConnection.java:190)", "org.sqlite.core.CoreConnection.<init>(org/sqlite/core/CoreConnection.java:74)", "org.sqlite.jdbc3.JDBC3Connection.<init>(org/sqlite/jdbc3/JDBC3Connection.java:24)", "org.sqlite.jdbc4.JDBC4Connection.<init>(org/sqlite/jdbc4/JDBC4Connection.java:23)", "org.sqlite.SQLiteConnection.<init>(org/sqlite/SQLiteConnection.java:45)", "org.sqlite.JDBC.createConnection(org/sqlite/JDBC.java:114)", "org.sqlite.JDBC.connect(org/sqlite/JDBC.java:88)"
I want the records in my Elasticsearch under the index "mongo_log".
I also want to know the uses of placeholder_db_dir and placeholder_db_name, and what these values should be when we are using MongoDB as the input database.
Problem solved! Actually, the directory specified in placeholder_db_dir (/opt/logstash-mongodb/) had not been created, so I manually created that folder under /opt. After that I gave write permission to the directory, so that when we execute the Logstash command it can create files inside this folder.
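A minimal sketch of that fix, assuming Logstash runs as a dedicated logstash user (adjust the owner or permissions to your setup):
sudo mkdir -p /opt/logstash-mongodb                    # create the placeholder_db_dir
sudo chown logstash:logstash /opt/logstash-mongodb     # give the Logstash process write access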

Logstash output is from another input

I have an issue where my Metricbeat data is caught by my http pipeline.
Logstash, Elasticsearch, and Metricbeat are all running in Kubernetes.
My Metricbeat is set up to send to Logstash on port 5044 and log to a file in /tmp, and this works fine. But whenever I create a pipeline with an http input, it also seems to catch the Metricbeat input and send it to index2 in Elastic, as defined in the http pipeline.
Why does it behave like this?
/usr/share/logstash/pipeline/http.conf
input {
  http {
    port => "8080"
  }
}
output {
  #stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["http://my-host.com:9200"]
    index => "test2"
  }
}
/usr/share/logstash/pipeline/beats.conf
input {
  beats {
    port => "5044"
  }
}
output {
  file {
    path => '/tmp/beats.log'
    codec => "json"
  }
}
/usr/share/logstash/config/logstash.yml
pipeline.id: main
pipeline.workers: 1
pipeline.batch.size: 125
pipeline.batch.delay: 50
http.host: "0.0.0.0"
http.port: 9600
config.reload.automatic: true
config.reload.interval: 3s
/usr/share/logstash/config/pipeline.yml
- pipeline.id: main
  path.config: "/usr/share/logstash/pipeline"
Even if you have multiple config files, they are read as a single pipeline by Logstash, which concatenates the inputs, filters, and outputs. If you need to run them as separate pipelines, you have two options.
Change your pipelines.yml and create different pipeline.ids, each one pointing to one of the config files:
- pipeline.id: beats
  path.config: "/usr/share/logstash/pipeline/beats.conf"
- pipeline.id: http
  path.config: "/usr/share/logstash/pipeline/http.conf"
Or you can use tags in your input, filter and output, for example:
input {
  http {
    port => "8080"
    tags => ["http"]
  }
  beats {
    port => "5044"
    tags => ["beats"]
  }
}
output {
  if "http" in [tags] {
    elasticsearch {
      hosts => ["http://my-host.com:9200"]
      index => "test2"
    }
  }
  if "beats" in [tags] {
    file {
      path => '/tmp/beats.log'
      codec => "json"
    }
  }
}
Using the pipelines.yml file is the recommended way to run multiple pipelines.

Logstash not forwarding logs to ES via Kafka

I'm using ELK 5.0.1 and Kafka 0.10.1.0. I'm not sure why my logs aren't being forwarded. I installed kafkacat and was successfully able to produce and consume logs from all 3 servers where the Kafka cluster is installed.
shipper.conf
input {
  file {
    start_position => "beginning"
    path => "/var/log/logstash/logstash-plain.log"
  }
}
output {
  kafka {
    topic_id => "stash"
    bootstrap_servers => "<i.p1>:9092,<i.p2>:9092,<i.p3>:9092"
  }
}
receiver.conf
input {
  kafka {
    topics => ["stash"]
    group_id => "stashlogs"
    bootstrap_servers => "<i.p1>:2181,<i.p2>:2181,<i.p3>:2181"
  }
}
output {
  elasticsearch {
    hosts => ["<eip>:9200","<eip>:9200","<eip>:9200"]
    manage_template => false
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
Logs: Getting the below warnings in logstash-plain.log
[2017-04-17T16:34:28,238][WARN ][org.apache.kafka.common.protocol.Errors] Unexpected error code: 38.
[2017-04-17T16:34:28,238][WARN ][org.apache.kafka.clients.NetworkClient] Error while fetching metadata with correlation id 44 : {stash=UNKNOWN}
It looks like your bootstrap_servers in receiver.conf are using the ZooKeeper port (2181). Try using the Kafka broker port instead (default 9092).
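A sketch of the corrected receiver.conf input, keeping the placeholder hostnames from the question:
input {
  kafka {
    topics => ["stash"]
    group_id => "stashlogs"
    # point at the Kafka broker port (9092), not ZooKeeper (2181)
    bootstrap_servers => "<i.p1>:9092,<i.p2>:9092,<i.p3>:9092"
  }
}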

Can’t get Logstash to read existing Kafka topic from start

I'm trying to consume a Kafka topic using Logstash, for indexing by Elasticsearch. The Kafka events are JSON documents.
We recently upgraded our Elastic Stack to 5.1.2.
I believe that I was able to consume the topic OK in 5.0, using the same settings, but that was a while ago so perhaps I'm doing something wrong now, but can't see it. This is my config (slightly sanitized):
input {
  kafka {
    bootstrap_servers => "host1:9092,host2:9092,host3:9092"
    client_id => "logstash-elastic-5-c5"
    group_id => "logstash-elastic-5-g5"
    topics => "trp_v1"
    auto_offset_reset => "earliest"
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    rename => { "@timestamp" => "indexedDatetime" }
    remove_field => [
      "@timestamp",
      "@version",
      "message"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["host10:9200", "host11:9200", "host12:9200", "host13:9200"]
    action => "index"
    index => "trp-i"
    document_type => "event"
  }
}
When I run this, no messages are consumed, there is no sign of activity in the log after "[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Setting newly assigned partitions", and in Kafka Manager the consumer immediately appears with "total lag = 0" for the topic.
This version of the Kafka plugin stores consumer offsets in Kafka itself, so each time I try to run Logstash against the same topic I increment the group_id; in theory, it should then start from the earliest offset for the topic.
Any advice?
EDIT: It appears that despite setting auto_offset_reset to "earliest", it isn't working - it's as if it's being set to "latest". I left Logstash running, then had more entries loaded into the Kafka queue, and they were processed by Logstash.

Reading from multiple topics in Apache Kafka

I'm trying to read from multiple Kafka topics (say 'newtest-1' and 'newtest-2') using the 'white_list' configuration in the Logstash Kafka input plugin. My Logstash conf looks like:
input { kafka { white_list => "newtest-1|newtest-2" } } output { stdout {codec => rubydebug } }
With this configuration I can successfully read from two different topics. But I want to use regex for input topics as I'm expecting the topics to be of the form 'newtest-*'. According to the suggestion in this link, the following configuration should work:
input { kafka { white_list => "newtest-*" } } output { stdout {codec => rubydebug } }
But with this I'm not able to read from kafka. Any help is appreciated.
The white_list value should be a regex: newtest-.*
This applies to older versions of the plugin; newer versions use the topics option instead.
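A quick sketch of both forms, using the topic names from the question (check which option your plugin version actually supports):
# Older plugin versions: white_list takes a regex
input { kafka { white_list => "newtest-.*" } } output { stdout { codec => rubydebug } }
# Newer plugin versions: list the topics explicitly
input { kafka { topics => ["newtest-1", "newtest-2"] } } output { stdout { codec => rubydebug } }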