Cast or convert mongodb _id object to string in Logstash pipeline - mongodb

I am working on creating a pipeline to get data from MongoDB to ElasticSearch using Logstash.
I am using the DbSchema MongoDB JDBC driver. I can connect to the database using the driver, but I am facing an issue with _id: in MongoDB it is of type ObjectId, so the converter fails. Here is the error I am getting:
Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::OrgLogstash::MissingConverterException: Missing Converter handling for full class name=org.bson.types.ObjectId, simple name=ObjectId>}
My pipeline is as follows:
input {
  jdbc {
    jdbc_driver_library => "C:/logstash-6.1.0/logstash-6.1.0/bin/driver/mongo/dbschema/mongojdbc1.2.jar"
    jdbc_driver_class => "Java::com.dbschema.MongoJdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://abc.com:27017/test"
    jdbc_user => ""
    statement => "db.getCollection('Employee').find({})"
    codec => json
  }
}
output {
  elasticsearch {
    hosts => 'http://localhost:9200'
    index => 'mongodbschema'
    codec => json
  }
  stdout { codec => rubydebug }
}
Is there any way I can convert/cast _id, or do something in a filter, to change its datatype from object to string?

Try:
statement => "db.getCollection('Employee').find({}, {'_id': false})"

Try this code in the filter:
filter {
  mutate {
    remove_field => [ "_id" ]
  }
}

Since _id is a reserved field in Elasticsearch (you can pass _id to the index API in a PUT operation to update a document), you must apply a filter that mutates the MongoDB _id, for instance:
filter {
  mutate {
    rename => [ "_id", "mongo_id" ]
  }
}
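Outside Logstash, the effect of that rename (plus casting the ObjectId to a string) can be sketched in a few lines of Python. This is only an illustration of the transformation, not Logstash internals; the field names mirror the filter above:

```python
def prepare_for_es(doc):
    """Rename Mongo's _id to mongo_id and cast it to a string,
    mirroring the mutate/rename filter above."""
    doc = dict(doc)  # shallow copy so the original is untouched
    if "_id" in doc:
        doc["mongo_id"] = str(doc.pop("_id"))  # ObjectId -> str
    return doc

event = {"_id": "5f2b6c1e9d4b2a0017c3a1f2", "name": "Alice"}
print(prepare_for_es(event))
```

With the id moved out of the reserved `_id` slot, Elasticsearch accepts the document without a converter error.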

Related

JDBC Logstash Elastic Kibana

I'm using the JDBC input plugin to ingest data from MongoDB into Elasticsearch.
My config is:
input {
  jdbc {
    jdbc_driver_class => "mongodb.jdbc.MongoDriver"
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mongodb_unityjdbc_free.jar"
    jdbc_user => ""
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://localhost:27017/pritunl"
    schedule => "* * * * *"
    jdbc_page_size => 100000
    jdbc_paging_enabled => true
    statement => "select * from servers_output"
  }
}
filter {
  mutate {
    copy => { "_id" => "[@metadata][id]" }
    remove_field => ["_id"]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "pritunl"
    document_id => "%{[@metadata][_id]}"
  }
  stdout {}
}
In Kibana I see only one hit, but in stdout I see many records from the mongodb collection. What should I do to see them all?
The problem is that all your documents are saved with the same "_id", so even though you're sending different records to ES, the same document is overwritten internally, and thus you get one hit in Kibana.
There is a typo in your configuration causing this issue.
You're copying "_id" into "[@metadata][id]",
but you're trying to read "[@metadata][_id]", with an underscore.
Removing the underscore when reading the value for document_id should fix your issue:
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "pritunl"
    document_id => "%{[@metadata][id]}"
  }
  stdout {}
}
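Why does the typo collapse everything into one document? When a `%{[a][b]}` sprintf reference cannot be resolved, Logstash leaves it as a literal string, so every event ends up with the identical `document_id`. A rough Python model of that lookup (not Logstash's actual implementation, just the behaviour):

```python
import re

def sprintf(template, event):
    """Very rough model of Logstash's %{[a][b]} resolution:
    unresolvable references are left in place as literal text."""
    def resolve(match):
        # Split "[a][b]" into the key path ["a", "b"]
        path = re.findall(r"\[([^\]]+)\]", match.group(1))
        value = event
        for key in path:
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # unresolved -> literal string
            value = value[key]
        return str(value)
    return re.sub(r"%\{(\[[^}]+\])\}", resolve, template)

event = {"@metadata": {"id": "abc123"}}
print(sprintf("%{[@metadata][id]}", event))   # -> abc123
print(sprintf("%{[@metadata][_id]}", event))  # stays literal: same id for every event
```

Every event indexed with the literal `%{[@metadata][_id]}` as its id overwrites the previous one, which is exactly the single-hit symptom.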

Ingesting data in MongoDB with mongodb-output-plugin in Logstash

I am trying to ingest data from a txt file into MongoDB (Machine 1), using Logstash (Machine 2).
I set up a DB and a collection with Compass, and I am using the mongodb-output-plugin in Logstash.
Here's the Logstash conf file:
input {
  file {
    path => "/home/user/Data"
    type => "cisco-asa"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "^%{SYSLOGTIMESTAMP:syslog_timestamp} %{HOSTNAME:device_src} %%{CISCO_REASON:facility}-%{INT:severity_level}-%{CISCO_REASON:facility_mnemonic}: %{GREEDY>
  }
  date {
    match => ["syslog_timestamp", "MMM dd HH:mm:ss" ]
    target => "@timestamp"
  }
}
output {
  stdout {
    codec => dots
  }
  mongodb {
    id => "mongo-cisco"
    collection => "Cisco ASA"
    database => "Logs"
    uri => "mongodb+srv://user:pass@192.168.10.9:27017/Logs"
  }
}
Here's the Logstash output:
[2021-03-27T13:29:35,178][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
.............................................................................................................................
[2021-03-27T13:30:06,201][WARN ][logstash.outputs.mongodb ][main][mongo-cisco] Failed to send event to MongoDB, retrying in 3 seconds {:event=>#<LogStash::Event:0x6d0984a>, :exception=>#<Mongo::Error::NoServerAvailable: No server is available matching preference: #<Mongo::ServerSelector::Primary:0x6711494c @tag_sets=[], @server_selection_timeout=30, @options={:database=>"Logs", :user=>"username", :password=>"passwd"}>>}
PS: this is my first time using MongoDB

How to sync changes from MongoDB to Elasticsearch (i.e. reflect changes in Elastic when a CRUD operation is performed in Mongo) using Logstash

I just want to sync data from MongoDB to Elastic using Logstash. It works fine for inserts: when any new record arrives in MongoDB, Logstash pushes it into Elastic. But when I update a record in MongoDB, the change does not appear in Elasticsearch. I want to change the config file so that when a record is updated in Mongo, it is reflected in Elastic as well.
I have already tried both action => "index" and action => "update" in the .conf file.
Here is my mongodata.conf file:
input {
  mongodb {
    uri => 'mongodb://localhost:27017/DBname'
    placeholder_db_dir => '/opt/logstash'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => "collectionName"
    batch_size => 500
  }
}
filter {
  mutate {
    remove_field => [ "_id" ]
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "mongo_hi_log_data"
  }
  stdout { codec => rubydebug }
}
I want to sync the data from mongodb to elastic using logstash when any record is updated or inserted in mongodb.
input {
  mongodb {
    uri => 'mongodb://mongo-db-url/your-collection'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'products'
    batch_size => 5000
  }
}
filter {
  date {
    match => [ "logdate", "ISO8601" ]
  }
  mutate {
    rename => { "[_id]" => "product_id" }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["http://es-url"]
    user => "elastic"
    password => "changeme"
    index => "your-index"
    action => "update"
    doc_as_upsert => true
    document_id => "%{product_id}"
    timeout => 30
    workers => 1
  }
}
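The combination of `action => "update"`, `doc_as_upsert => true`, and a stable `document_id` behaves like a keyed merge: a re-ingested record overwrites its existing document instead of creating a duplicate. A toy Python sketch of that semantics (the in-memory `index` dict stands in for the ES index; this is a model, not the ES client API):

```python
def upsert(index, doc_id, doc):
    """doc_as_upsert: insert if the id is new, merge fields if it exists."""
    index.setdefault(doc_id, {}).update(doc)

index = {}
upsert(index, "p1", {"name": "widget", "price": 10})
upsert(index, "p1", {"price": 12})       # Mongo update -> same ES doc updated
upsert(index, "p2", {"name": "gadget"})  # new record -> new ES doc
print(index)
```

This is why the rename of `_id` to `product_id` matters: without a stable per-record key for `document_id`, updates in Mongo would be indexed as brand-new documents.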

Logstash-input-mongodb configuration file

I am trying to pull data from MongoDB to Elasticsearch using Logstash.
I am using the logstash-input-mongodb plugin. This is my config file:
input {
  mongodb {
    uri => 'mongodb://localhost:27017/test'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'student'
    batch_size => 202
  }
}
filter {
}
output {
  elasticsearch {
    host => "localhost:9200"
    user => elastic
    password => changeme
    index => "testStudent"
  }
  stdout { codec => rubydebug }
}
I am getting an error saying:
Pipelines YAML file is empty.
Is this because I left the filter part empty?

How to have an input of type MongoDB for Logstash

I know we can input files, and output to a mongo database. But I have a collection in my mongodb that I would like to have as an input so that I can use it with ES. Is this possible?
Thank you.
I have had a similar problem: the logstash-input-mongodb plugin is fine, but it is very limited, and it also seems that it is no longer maintained, so I have opted for the logstash-integration-jdbc plugin.
I have followed these steps to sync a MongoDB collection with ES:
First, I have downloaded the JDBC driver for MongoDB developed by DBSchema that you can find here.
I have prepared a custom Dockerfile to integrate the driver and plugins as you can see below:
FROM docker.elastic.co/logstash/logstash:7.9.2
RUN mkdir /usr/share/logstash/drivers
COPY ./drivers/* /usr/share/logstash/drivers/
RUN logstash-plugin install logstash-integration-jdbc
RUN logstash-plugin install logstash-output-elasticsearch
I have configured a query that is executed every 30 seconds and looks for documents with an insert timestamp later than the timestamp of the last query (provided via the parameter :sql_last_value):
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mongojdbc2.3.jar"
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://devroot:devroot@mongo:27017/files?authSource=admin"
    jdbc_user => "devroot"
    jdbc_password => "devroot"
    schedule => "*/30 * * * * *"
    statement => "db.processed_files.find({ 'document.processed_at' : {'$gte': :sql_last_value}},{'_id': false});"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "create"
    index => "processed_files"
    hosts => ["elasticsearch:9200"]
    user => "elastic"
    password => "password"
    ssl => true
    ssl_certificate_verification => false
    cacert => "/etc/logstash/keys/certificate.pem"
  }
}
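The incremental behaviour driven by `:sql_last_value` amounts to keeping a high-water-mark timestamp between runs and fetching only newer documents. A pure-Python stand-in for that logic (field names are just illustrative; timestamps are plain numbers here instead of ISODate values):

```python
def fetch_new(docs, last_value):
    """Return documents with processed_at >= last_value and the new
    high-water mark, mimicking the scheduled :sql_last_value query.
    Note $gte re-fetches documents sitting exactly on the mark."""
    new = [d for d in docs if d["processed_at"] >= last_value]
    next_value = max((d["processed_at"] for d in new), default=last_value)
    return new, next_value

docs = [{"id": 1, "processed_at": 10}, {"id": 2, "processed_at": 20}]
batch, mark = fetch_new(docs, 0)      # first run: everything
assert len(batch) == 2 and mark == 20
docs.append({"id": 3, "processed_at": 30})
batch, mark = fetch_new(docs, mark)   # next run: doc 2 (== mark) and doc 3
```

In the real pipeline, Logstash persists the mark between scheduled runs, so only documents inserted since the previous query are picked up.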
Hope it can help someone, regards
You could set up a river to pull data from MongoDB to Elasticsearch.
See the instructions here - http://www.codetweet.com/ubuntu-2/configuring-elasticsearch-mongodb/
I tried out Sergio Sánchez Sánchez's suggested solution and found the following updates and improvements:
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mongojdbc3.0.jar"
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://devroot:devroot@mongo:27017/files?authSource=admin"
    jdbc_user => "devroot"
    jdbc_password => "devroot"
    schedule => "*/30 * * * * *"
    statement => "db.processed_files.find({ 'document.processed_at' : {'$gte': new ISODate(:sql_last_value)}},{'_id': false});"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "update"
    doc_as_upsert => true
    document_id => "%{[document][uuid]}"
    index => "processed_files"
    hosts => ["elasticsearch:9200"]
    user => "elastic"
    password => "password"
    ssl => true
    ssl_certificate_verification => false
    cacert => "/etc/logstash/keys/certificate.pem"
  }
}
Explanation:
The date comparison in MongoDB has to use new ISODate to convert :sql_last_value.
I'd like to use "update" instead of "create" to cover the update case. The query result from the input section is contained in "document". Assuming you have a field with a unique value, "uuid", you have to use it to identify the document, because MongoDB's "_id" is not supported anyway.
If you have an embedded document which also has an "_id" field, you have to exclude it too, e.g.
statement => "db.profiles.find({'updatedAt' : {'$gte': new ISODate(:sql_last_value)}}, {'_id': false, 'embedded_doc._id': false});"
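The effect of that projection, excluding `_id` at the top level and inside the embedded document, behaves roughly like this pure-Python stand-in (`embedded_doc` is just the example name from the query above):

```python
def exclude_ids(doc):
    """Mimic the projection {'_id': false, 'embedded_doc._id': false}:
    drop _id at the top level and inside the embedded document."""
    out = {k: v for k, v in doc.items() if k != "_id"}
    if isinstance(out.get("embedded_doc"), dict):
        out["embedded_doc"] = {k: v for k, v in out["embedded_doc"].items()
                               if k != "_id"}
    return out

doc = {"_id": 1, "name": "a", "embedded_doc": {"_id": 2, "x": 3}}
print(exclude_ids(doc))  # {'name': 'a', 'embedded_doc': {'x': 3}}
```

Stripping every `_id` before the event reaches the output avoids both the ObjectId converter error and the reserved-field conflict in Elasticsearch.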
So apparently, the short answer is No, it is not possible to have an input from a database in Logstash.
EDIT
@elssar thank you for your answer:
Actually, there is a 3rd party mongodb input for logstash -
github.com/phutchins/logstash-input-mongodb – elssar