Logstash Reading Same Data (Duplicates) - PostgreSQL

I'm using the Logstash JDBC input plugin to read from one database and send the data to Elasticsearch.
My logstash.conf file looks like this:
input {
jdbc {
jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
schedule => "* * * * *"
statement => "select * from testtable"
use_column_value => true
tracking_column => "time"
}
}
filter {
mutate {
add_field => { "message" => "%{time}" }
convert => [ "time", "string" ]
}
date {
timezone => "Etc/GMT+3"
match => ["time" , "ISO8601", "yyyy-MM-dd HH:mm:ss.SSS"]
target => "#timestamp"
remove_field => [ "time", "timestamp" ]
}
fingerprint {
source => ["testid", "programid", "unitid"]
target => "[#metadata][fingerprint]"
method => "MD5"
key => "${LOGSTASH_JDBC_PASSWORD}"
}
ruby {
code => "event.set('[#metadata][tsprefix]', event.get('#timestamp').to_i.to_s(16))"
}
}
output {
elasticsearch {
hosts => ["${LOGSTASH_ELASTICSEARCH_HOST}"]
user => "${ELASTIC_USER}"
password => "${ELASTIC_PASSWORD}"
index => "test"
document_id => "%{[@metadata][tsprefix]}%{[@metadata][fingerprint]}"
}
stdout { codec => json_lines }
}
I tried using this .conf without these lines:
use_column_value => true
tracking_column => "time"
Also tried using:
clean_run => true
But Logstash keeps reading the same data over and over again.
Can you help me understand why it keeps re-reading everything?
Logstash (8.3.1)
Database (PostgreSQL 14.5)
JDBC (42.4.1)

The statement query in your jdbc input configuration, "select * from testtable", reads the entire table on every scheduled run. To avoid reading the same data repeatedly, the input configuration should filter on the tracked column, as below:
jdbc {
jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
schedule => "* * * * *"
statement => "select * from testtable where time > :sql_lat_value"
use_column_value => true
tracking_column => "time"
record_last_run => true
last_run_metadata_path => <valid file path>
}
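One additional point worth checking: since "time" looks like a timestamp column, the plugin also needs to be told not to track it as a number, and :sql_last_value is persisted to the last_run_metadata_path file, so that location must be writable. A minimal sketch along those lines, assuming the same environment variables; the metadata path and the order by clause are illustrative additions, not part of the original answer:
jdbc {
  jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
  jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
  jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
  jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
  jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
  schedule => "* * * * *"
  # Fetch only rows newer than the value stored after the previous run;
  # ordering by the tracking column keeps :sql_last_value at the newest row.
  statement => "select * from testtable where time > :sql_last_value order by time asc"
  use_column_value => true
  tracking_column => "time"
  # "time" is a timestamp, so do not track it as a numeric value (the default).
  tracking_column_type => "timestamp"
  record_last_run => true
  # Illustrative path; any file the Logstash user can write to will do.
  last_run_metadata_path => "/usr/share/logstash/.testtable_jdbc_last_run"
}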

Related

JDBC Logstash Elastic Kibana

I'm using the JDBC input plugin to ingest data from MongoDB into Elasticsearch.
My config is:
input {
jdbc {
jdbc_driver_class => "mongodb.jdbc.MongoDriver"
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mongodb_unityjdbc_free.jar"
jdbc_user => ""
jdbc_password => ""
jdbc_connection_string => "jdbc:mongodb://localhost:27017/pritunl"
schedule => "* * * * *"
jdbc_page_size => 100000
jdbc_paging_enabled => true
statement => "select * from servers_output"
}
}
filter {
mutate {
copy => { "_id" => "[#metadata][id]"}
remove_field => ["_id"]
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "pritunl"
document_id => "%{[@metadata][_id]}"
}
stdout {}
}
In Kibana I see only one hit, but in stdout I see many records from the MongoDB collection. What should I do to see them all?
The problem is that all your documents are saved with the same "_id", so even though you're sending different records to ES, only one document is being overwritten internally, and thus you get 1 hit in Kibana.
There is a typo in your configuration causing this issue.
You're copying "_id" into "[@metadata][id]",
but you're trying to read "[@metadata][_id]", with an underscore.
Removing the underscore when reading the value for document_id should fix your issue.
output {
elasticsearch {
hosts => "localhost:9200"
index => "pritunl"
document_id => "%{[@metadata][id]}"
}
stdout {}
}
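If it helps to verify what actually lands in [@metadata] before indexing, the rubydebug codec can be told to print metadata fields, which it hides by default. A small sketch of a temporary debugging output:
output {
  # Prints each event including its [@metadata] fields, so the id value used
  # for document_id can be inspected before it reaches Elasticsearch.
  stdout { codec => rubydebug { metadata => true } }
}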

Not getting SQL statement output using the ELK Logstash JDBC plugin

I'm trying to get output using Logstash with the JDBC plugin, but I'm not getting any data. I tried different ways to get the pg_stat_replication data, but under Index Management I could not find the index; my index name should be pg_stat_replication.
Config file path: /etc/logstash/conf.d/postgresql.conf
input {
# pg_stat_replication;
jdbc {
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar"
jdbc_driver_class => "org.postgresql.Driver"
jdbc_connection_string => "jdbc:postgresql://192.168.43.21:5432/postgres"
jdbc_user => "postgres"
jdbc_password => "p"
statement => "SELECT * FROM pg_stat_replication"
schedule => "* * * * *"
type => "pg_stat_replication"
}
}
output {
elasticsearch {
hosts => "http://192.168.43.100:9200"
index => "%{type}"
user => "elastic"
password => "abc#123"
}
}
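One thing that may be worth checking here (an assumption based on how the view behaves, not a confirmed diagnosis): pg_stat_replication only contains rows while a standby or other WAL sender is connected, so if no replica is attached the SELECT returns nothing, no events are emitted, and Elasticsearch never creates the index. A temporary stdout output makes it easy to see whether the jdbc input produces any events at all:
output {
  # Debugging aid: prints every event the jdbc input emits. If nothing shows
  # up on each scheduled run, the SELECT itself is returning zero rows.
  stdout { codec => rubydebug }
  # ... keep the existing elasticsearch output unchanged ...
}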

Connect to MongoDB using the Logstash jdbc_streaming filter plugin

I'm trying to fetch data from MongoDB using the jdbc_streaming filter plugin in Logstash on Windows.
I'm using mongo-java-driver-3.4.2.jar to connect to the database, but I'm getting an error like this:
JavaSql::SQLException: No suitable driver found for jdbc:mongo://localhost:27017/EmployeeDB
I've had no luck with existing references. I'm using Logstash 7.8.0. This is my Logstash config:
jdbc_streaming {
jdbc_driver_library => "C:/Users/iTelaSoft-User/Downloads/logstash-7.8.0/mongo-java-driver-3.4.2.jar"
jdbc_driver_class => "com.mongodb.MongoClient"
jdbc_connection_string => "jdbc:mongo://localhost:27017/EmployeeDB"
statement => "select * from Employee"
target => "name"
}
You can also try the following:
download https://dbschema.com/jdbc-drivers/MongoDbJdbcDriver.zip
unzip it and copy all the files to the path (~/logstash-7.8.0/logstash-core/lib/jars/)
modify the .conf file
Example:
input {
jdbc{
jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
jdbc_driver_library => "mongojdbc2.1.jar"
jdbc_user => "user"
jdbc_password => "pwd"
jdbc_connection_string => "jdbc:mongodb://localhost:27017/EmployeeDB"
statement => "select * from Employee"
}
}
output {
stdout { }
}
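Since the question uses the jdbc_streaming filter rather than the jdbc input, the same driver swap should carry over. A sketch under that assumption, reusing the asker's paths; the exact jar location is illustrative once the DbSchema files have been copied into logstash-core/lib/jars/:
filter {
  jdbc_streaming {
    # DbSchema MongoDB JDBC driver instead of mongo-java-driver
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "C:/Users/iTelaSoft-User/Downloads/logstash-7.8.0/logstash-core/lib/jars/mongojdbc2.1.jar"
    # Note the scheme is jdbc:mongodb:// rather than jdbc:mongo://
    jdbc_connection_string => "jdbc:mongodb://localhost:27017/EmployeeDB"
    statement => "select * from Employee"
    target => "name"
  }
}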

Pushing data to a Kafka server with Logstash

I have a Logstash conf file that looks like this:
input {
jdbc {
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:@//the-s2.db.oracle.yn:1521/DPP2.mind.com"
jdbc_user => "STG_TEST"
jdbc_password => "cddcdcd"
parameters => {"orderid" => 1212332365}
statement => "select PO_SEARCH_IL_ID,ORDER_DATE,REF_1,SHIPPING_WINDOW_START,SHIPPING_WINDOW_END FROM ods.po_search_il where PO_SEARCH_IL_ID =:orderid "
schedule => "* * * * *"
clean_run => true
}
}
output {
kafka {
bootstrap_servers => "mykafkaservername.kn:9092"
topic_id => ["test3"]
}
}
The script runs and the topic test3 is created on the Kafka server, but no data arrives in it.
Could somebody help with this issue?
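A debugging sketch rather than a confirmed fix: the kafka output's topic_id is a string setting, so a bare "test3" may behave better than the array form, and printing events alongside the kafka output shows whether the jdbc input is producing anything at all:
output {
  kafka {
    bootstrap_servers => "mykafkaservername.kn:9092"
    # String rather than array; also serialize events explicitly as JSON.
    topic_id => "test3"
    codec => json
  }
  # If events print here but never reach the topic, the problem is on the
  # Kafka side; if nothing prints, the jdbc query is returning no rows.
  stdout { codec => rubydebug }
}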

How to sync changes from MongoDB to Elasticsearch (i.e. reflect changes in Elasticsearch when a CRUD operation is performed in MongoDB) using Logstash

I just want to sync data from MongoDB to Elasticsearch using Logstash. It works fine for inserts: when a new record arrives in MongoDB, Logstash pushes it into Elasticsearch. But when I update a record in MongoDB, the change does not show up in Elasticsearch. I want to change the config file so that updates in MongoDB are reflected in Elasticsearch as well.
I have already tried both action => "index" and action => "update" in the .conf file.
Here is my mongodata.conf file:
input{
mongodb{
uri => 'mongodb://localhost:27017/DBname'
placeholder_db_dir => '/opt/logstash'
placeholder_db_name => 'logstash_sqlite.db'
collection => "collectionName"
batch_size => 500
}
}
filter {
mutate {
remove_field => [ "_id" ]
}
}
output{
elasticsearch{
action => "index"
hosts => ["localhost:9200"]
index => "mongo_hi_log_data"
}
stdout{ codec => rubydebug }
}
I want to sync data from MongoDB to Elasticsearch using Logstash whenever a record is updated or inserted in MongoDB.
input {
mongodb {
uri => 'mongodb://mongo-db-url/your-collection'
placeholder_db_dir => '/opt/logstash-mongodb/'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'products'
batch_size => 5000
}
}
filter {
date {
match => [ "logdate", "ISO8601" ]
}
mutate {
rename => { "[_id]" => "product_id" }
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["http://es-url"]
user => "elastic"
password => "changeme"
index => "your-index"
action => "update"
doc_as_upsert => true
document_id => "%{product_id}"
timeout => 30
workers => 1
}
}
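A note on why this answer's output settings matter: action => "update" with doc_as_upsert => true and a stable document_id derived from the Mongo _id means every revision of a record maps to the same Elasticsearch document, so an update overwrites it in place. With no document_id (as in the original config, which also removes _id), Elasticsearch generates a fresh id for every event, and updates arrive as additional documents instead of modifying the existing one.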