Logstash jdbc output plugin using the UPSERT function - PostgreSQL

I'm using this plugin as the output for my Logstash logs.
I need to use the UPSERT function: if a row already exists, update it; if it doesn't exist, simply insert it.
I'm using PostgreSQL as the database, and it supports UPSERT, which is described very well here. As input, the logs come from Elasticsearch.
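For reference, this is essentially the statement I'm trying to get the jdbc output to build (a minimal sketch; the column names come from my userstate table and the values are just placeholders):
INSERT INTO userstate (username, business_name, iban, status, timestamp)
VALUES ('someuser', 'Some Business', 'XX00FAKEIBAN', 'ACTIVE', now())
ON CONFLICT (username) DO UPDATE
SET business_name = 'Some Business',
    iban          = 'XX00FAKEIBAN',
    status        = 'ACTIVE',
    timestamp     = now();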
The problem with my configuration is that new rows are added to my table correctly, but an existing row is never updated.
Here's my configuration:
jdbc {
    driver_jar_path => '/home/vittorio/Downloads/postgresql-42.1.1.jre6.jar'
    connection_test => false
    connection_string => 'jdbc:postgresql://127.0.0.1:5432/postgres'
    statement => ["
        INSERT INTO userstate VALUES(?,?,?,?,?) on conflict (username)
        do update set (business_name, iban, status, timestamp) = ('%{[resource][response_attributes][business_name]}','%{[resource][response_attributes][iban]}','%{[resource][response_attributes][status]}','%{@timestamp}')
        where userstate.username = '%{[request][username]}';",
        "%{[request][username]}","%{[resource][response_attributes][business_name]}","%{[resource][response_attributes][iban]}","%{[resource][response_attributes][status]}","%{@timestamp}"
    ]
    username => "myuser"
    password => "mypass"
}
Am I doing something wrong?
Thanks.

I managed to make it work by myself, and this is what I've done so far:
jdbc {
    driver_jar_path => '/home/vittorio/Downloads/postgresql-42.1.1.jre6.jar'
    connection_test => false
    connection_string => 'jdbc:postgresql://127.0.0.1:5432/postgres'
    statement => ["
        INSERT INTO userstate VALUES(?,?,?,?,?)
        on conflict (username)
        do update set (business_name, iban, status, timestamp) = (?,?,?,?)
        where userstate.username = ?",
        "%{[request][username]}","%{[resource][response_attributes][business_name]}","%{[resource][response_attributes][iban]}","%{[resource][response_attributes][status]}","%{@timestamp}","%{[resource][response_attributes][business_name]}","%{[resource][response_attributes][iban]}","%{[resource][response_attributes][status]}","%{@timestamp}","%{[request][username]}"
    ]
    username => "myusername"
    password => "mypass"
}
Basically, I've changed the WHERE clause to use ? instead of %{[request][username]} and then mapped each ? to the corresponding value from the log. I know it's pretty long after the comma, but this is the only way I found to make it work. If anyone knows a better way to do it, please let me know.
Thank you
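Update: a variant I haven't tested yet, which should avoid binding every value twice, is to use PostgreSQL's EXCLUDED pseudo-table (it refers to the row proposed for insertion), so the trailing WHERE clause isn't needed either:
statement => ["
    INSERT INTO userstate VALUES(?,?,?,?,?)
    on conflict (username)
    do update set business_name = EXCLUDED.business_name,
                  iban          = EXCLUDED.iban,
                  status        = EXCLUDED.status,
                  timestamp     = EXCLUDED.timestamp",
    "%{[request][username]}","%{[resource][response_attributes][business_name]}","%{[resource][response_attributes][iban]}","%{[resource][response_attributes][status]}","%{@timestamp}"
]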

Related

How to create a new field in elasticsearch and import data from postgres into it?

I have a big table in Postgres with 200 columns and more than a million rows. I want to migrate this data into Elasticsearch using Logstash. I am currently migrating around 50 columns.
What I want to know is: can I add the other columns later, mapping them to the same index in Elasticsearch? For example, say I have 10 columns in Postgres and I map 4 into Elasticsearch. Can I later add the other 6 columns, along with their data, to the same index?
My current logstash config file looks like this:
input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/school"
        jdbc_user => "postgres"
        jdbc_password => "postgres"
        jdbc_driver_library => "/Users/karangupta/Downloads/postgresql-42.2.8.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_paging_enabled => true
        statement_filepath => "/usr/local/Cellar/logstash/7.3.2/conf/myQuery.sql"
    }
}
# output {
#     stdout { codec => json_lines }
# }
output {
    elasticsearch {
        index => "schoolupdated"
        hosts => "http://localhost:9200"
    }
}
The above config file works perfectly and adds the index. How can I add fields to this index later from postgres?
I am using postgres 11.4, elasticsearch 6.8.
Yes, you can, as long as Elasticsearch is provided with an ID for each of the rows.
Just add
document_id => "%{unique_identifier_column_name_in_your_result}" # column name is case sensitive
to the elasticsearch output plugin configuration.
If you execute the jdbc import again (now including the new columns), the new fields will be added to the existing documents by overwriting the old documents.
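For example, the output block from the question would become something like this (sketch; "id" here just stands for whatever unique column your SQL statement returns):
output {
    elasticsearch {
        index => "schoolupdated"
        hosts => "http://localhost:9200"
        document_id => "%{id}" # replace "id" with your own unique column name
    }
}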
More details on this topic: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id
Have fun!

How to do atomic operations like $push $pull $set etc. in Ecto MongoDB

I'm using mongodb_ecto and I want to know how I can do operations like $push or $pull on a deeply nested field. At the moment I write back the whole document, which sometimes leaves incorrect data in the DB due to a race condition.
OK, I kind of figured it out: do not use Ecto for this. In some cases you need the MongoDB positional operator, and that is only possible directly via the Mongo adapter. Now for some usage examples:
I have a document with a list of options. Options have an ID, a label and a list of user IDs who voted for that option.
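For context, each poll document looks roughly like this (sketch; the field names match the queries below, the values are made up):
%{
    "_id" => object_id,
    "options" => [
        %{"id" => "option-1", "label" => "Option A", "votes" => ["user-1", "user-2"]},
        %{"id" => "option-2", "label" => "Option B", "votes" => []}
    ]
}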
By the way, to generate an ObjectId (which is needed when talking directly to the MongoDB adapter), use this:
id = "584a5b9419d51d724d146e3f" # string form
value = (for <<hex::16 <- id>>, into: <<>>, do: <<String.to_integer(<<hex::16>>, 16)::8>>)
object_id = %BSON.ObjectId{value: value}
And now for some examples:
# update label of option
Mongo.update_one(Repo.Pool, "polls",
    %{"_id" => document_id, "options.id" => option_id}, # query
    %{"$set" => %{"options.$.label" => new_label}}      # update
)
# add new option to poll
Mongo.update_one(Repo.Pool, "polls",
    %{"_id" => document_id},
    %{"$addToSet" => %{"options" => %{label: label, id: random_new_id, votes: []}}}
)
# add user_id to option
Mongo.update_one(Repo.Pool, "polls",
    %{"_id" => document_id, "options.id" => option_id},
    %{"$addToSet" => %{"options.$.votes" => user_id}}
)
# remove user_id from option
Mongo.update_one(Repo.Pool, "polls",
    %{"_id" => document_id, "options.id" => option_id},
    %{"$pull" => %{"options.$.votes" => user_id}}
)

How to connect RapidApp to PostgreSQL with UTF-8 enabled

I'm creating a simple CRUD interface to a database, and I'm trying RapidApp.
I have an existing database, which I connect to with existing Moose-based code. One complication is that there is UTF-8 text in the database (e.g. 'Encyclopédie médico-chirurgicale. Técnicas quirúrgicas. Aparato digestivo').
My Moose-based code works just fine: data goes in & data comes out... and everyone is happy.
In my existing Moose code, the connector is:
$schema = My::Service::Schema->connect(
    'dbi:Pg:dbname=my_db;host=my.host.name;port=1234',
    'me',
    'secret',
    { pg_enable_utf8 => 1 }
);
When I set about connecting RapidApp, I first tried a simple rdbic.pl command, but that doesn't pick up the UTF-8 strings. In an attempt to enforce UTF-8-ness, I've created the following:
use Plack::Runner;
use Plack::App::RapidApp::rDbic;

my $cnf = {
    connect_info => {
        dsn      => 'dbi:Pg:dbname=my_db;host=my.host.name;port=1234',
        user     => 'me',
        password => 'secret',
        { pg_enable_utf8 => 1 },
    },
    schema_class => 'My::Service::Schema'
};

my $App = Plack::App::RapidApp::rDbic->new( $cnf );
my $psgi = $App->to_app;

my $runner = Plack::Runner->new;
$runner->parse_options('--port', '5678');
$runner->run($psgi);
(which is pretty much rdbic.pl, compressed to one specific thing)
However, I'm getting malformed strings (e.g. 'Encyclopédie médico-chirurgicale. Técnicas quirúrgicas. Aparato digestivo').
Having fought to get the correct text INTO the database, I know the database is correct... so how do I connect RapidApp to get UTF-8 back out?
Your schema will need to be configured to support UTF-8. Here's a helpful set of things to try:
How to properly use UTF-8-encoded data from Schema inside Catalyst app?
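One specific thing worth checking (untested sketch, based on the config in the question): in the hashref form of DBIx::Class connect_info, driver attributes such as pg_enable_utf8 go at the same level as dsn/user/password rather than inside a nested anonymous hashref:
my $cnf = {
    connect_info => {
        dsn            => 'dbi:Pg:dbname=my_db;host=my.host.name;port=1234',
        user           => 'me',
        password       => 'secret',
        pg_enable_utf8 => 1,   # attribute key, not a nested { ... }
    },
    schema_class => 'My::Service::Schema'
};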

CakePHP 3: Model Unit Test fails - "duplicate key value"

I'm using Postgres (which I think is related to the problem), and CakePHP 3.
I have the following unit test that just checks that a valid dataset can be saved by the model. When I run it as a standard "baked" model unit test, I get the error below.
I think this is the problem:
We are using fixtures to add some base data, and this is the only place that I think might be causing a problem. To add credence to this: while the unit tests were running, I ran the following command to get the next auto-incrementing id value, and it returned 1, even though it returns the proper number in the non-test DB.
SELECT nextval(pg_get_serial_sequence('agencies', 'id')) AS new_id;
Unit Test:
public function testValidationDefault()
{
    $agencyData = [
        'full_name' => 'Agency Full Name',
        'mode' => 'transit',
        'request_api_class' => 'Rest\Get\Json',
        'response_api_class' => 'NextBus\Generic',
        'realtime_url_pattern' => 'http://api.example.com',
        'routes' => '{"123": {"full_route": "123 Full Route", "route_color": "#123456"}}'
    ];
    $agency = $this->Agencies->newEntity($agencyData);
    $saved = $this->Agencies->save($agency);
    $this->assertInstanceOf('App\Model\Entity\Agency', $saved);
}
Error:
PDOException: SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "agencies_pkey"
DETAIL: Key (id)=(1) already exists.
Things I've tried
Copied that same code into a controller, and it successfully added the entity in the table.
Adding an id of 200. Same error appears.
Update 1
The fixture for this does have the ID field set for each record. Deleting them from the fixture does work, but it breaks other unit tests that rely on some relational data.
I don't like this solution, but adding the following before saving the entity does work.
$this->Agencies->deleteAll('1=1');
[UPDATE: My other answer is the real solution to this problem! You don't have to do this anymore...]
Here is a less dirty workaround that doesn't require deleting all the records:
use Cake\Datasource\ConnectionManager;
...
$connection = ConnectionManager::get('test');
$results = $connection->execute('ALTER SEQUENCE <tablename>_id_seq RESTART WITH 999999');
//TEST WHICH INSERTS RECORD(s)...
It appears that the auto-incrementing doesn't get properly set/reset during the setUp() or tearDown()... so manually setting it to something really high (greater than the number of existing records) prevents the "duplicate key..." error.
The benefit of this hack (over deleteAll('1=1')) is that you can still subsequently run tests that reference existing DB data.
It might be a problem in your fixture definition. The CakePHP documentation uses a _constraints field specifying that the id field is a primary key:
'_constraints' => [
    'primary' => ['type' => 'primary', 'columns' => ['id']],
]
I believe I've finally figured out the REAL solution to this problem!
I believe this issue stems from a default fixture setting that results from using the bake command to generate fixtures.
When you bake a model it creates the boilerplate for its fixtures. Notice the autoIncrement for the ID property in the code below? Contrary to what you might think, this should not be true. When I set it to null and remove the ids from the items in the $records array, I no longer get uniqueness errors.
public $fields = [
    'id' => ['type' => 'integer', 'length' => 10, 'autoIncrement' => true, 'default' => null, 'null' => false, 'comment' => null, 'precision' => null, 'unsigned' => null],
    'nickname' => ['type' => 'text', 'length' => null, 'default' => null, 'null' => false, 'comment' => null, 'precision' => null],
    ...
public $records = [
    [
        // 'id' => 1,
        'nickname' => 'Foo bar',
        'width' => 800,
        ...
The ninja wizards on the CakePHP project are the heroes: source
CakePHP ticket
If id fields are removed from fixture records then they will utilize auto-incrementing when inserted, leaving the table's ID sequence in the right place for inserts that happen during tests. I believe that is why it works for @emersonthis as described above.
That solution has another problem, though: you can't create dependable relationships between fixture records because you don't know what IDs they will get. What do you put in the foreign ID field of a related table? This has led me back to his original solution of just altering the table sequence after records with hard-coded IDs have been inserted. I do it like this in affected TestCases now:
public $fixtures = [
    'app.articles',
    'app.authors',
];
...
public function setUp()
{
    $connection = \Cake\Datasource\ConnectionManager::get('test');
    foreach ($this->fixtures as $fixture) {
        $tableName = explode('.', $fixture)[1];
        $connection->execute("
            SELECT setval(
                pg_get_serial_sequence('$tableName', 'id'),
                (SELECT MAX(id) FROM $tableName)
            )");
    }
}
This moves the auto-increment sequence to the highest previously-used ID. The next time an ID is generated from the sequence it will be one higher, resolving the problem in all cases.
Including one of these solutions in an upcoming CakePHP release is being discussed here.

How to have an input of type MongoDB for Logstash

I know we can use files as input and output to a Mongo database, but I have a collection in my MongoDB that I would like to use as an input so that I can use it with ES. Is this possible?
Thank you.
I have had a similar problem. The logstash-input-mongodb plugin is fine, but it is very limited and it also seems that it is no longer being maintained, so I have opted for the logstash-integration-jdbc plugin.
I have followed the following steps to sync a MongoDB collection with ES:
First, I downloaded the JDBC driver for MongoDB developed by DBSchema, which you can find here.
I have prepared a custom Dockerfile to integrate the driver and plugins as you can see below:
FROM docker.elastic.co/logstash/logstash:7.9.2
RUN mkdir /usr/share/logstash/drivers
COPY ./drivers/* /usr/share/logstash/drivers/
RUN logstash-plugin install logstash-integration-jdbc
RUN logstash-plugin install logstash-output-elasticsearch
I have configured a query that will be executed every 30 seconds and will look for documents with an insert timestamp later than the timestamp of the last query (provided with the parameter :sql_last_value)
input {
    jdbc {
        jdbc_driver_library => "/usr/share/logstash/drivers/mongojdbc2.3.jar"
        jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
        jdbc_connection_string => "jdbc:mongodb://devroot:devroot@mongo:27017/files?authSource=admin"
        jdbc_user => "devroot"
        jdbc_password => "devroot"
        schedule => "*/30 * * * * *"
        statement => "db.processed_files.find({ 'document.processed_at' : {'$gte': :sql_last_value}},{'_id': false});"
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        action => "create"
        index => "processed_files"
        hosts => ["elasticsearch:9200"]
        user => "elastic"
        password => "password"
        ssl => true
        ssl_certificate_verification => false
        cacert => "/etc/logstash/keys/certificate.pem"
    }
}
Hope it can help someone, regards
You could set up a river to pull data from MongoDB to Elasticsearch.
See the instructions here - http://www.codetweet.com/ubuntu-2/configuring-elasticsearch-mongodb/
I tried out Sergio Sánchez Sánchez's suggested solution and found the following updates and improvements:
input {
    jdbc {
        jdbc_driver_library => "/usr/share/logstash/drivers/mongojdbc3.0.jar"
        jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
        jdbc_connection_string => "jdbc:mongodb://devroot:devroot@mongo:27017/files?authSource=admin"
        jdbc_user => "devroot"
        jdbc_password => "devroot"
        schedule => "*/30 * * * * *"
        statement => "db.processed_files.find({ 'document.processed_at' : {'$gte': new ISODate(:sql_last_value)}},{'_id': false});"
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        action => "update"
        doc_as_upsert => true
        document_id => "%{[document][uuid]}"
        index => "processed_files"
        hosts => ["elasticsearch:9200"]
        user => "elastic"
        password => "password"
        ssl => true
        ssl_certificate_verification => false
        cacert => "/etc/logstash/keys/certificate.pem"
    }
}
Explanation:
The date comparison in MongoDB has to use new ISODate to convert :sql_last_value.
I'd like to use "update" instead of "create" to cover the update case. The query result from the input section is contained in "document". Assuming you have a field with a unique value, "uuid", you have to use it to identify the document, because MongoDB's "_id" is not supported anyway.
If you have any embedded document which also has an "_id" field, you have to exclude it too, e.g.
statement => "db.profiles.find({'updatedAt' : {'$gte': new ISODate(:sql_last_value)}},
                               {'_id': false, 'embedded_doc._id': false});"
So apparently, the short answer is No, it is not possible to have an input from a database in Logstash.
EDIT
@elssar thank you for your answer:
Actually, there is a 3rd party mongodb input for logstash - github.com/phutchins/logstash-input-mongodb – elssar
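A minimal input block for that plugin looks roughly like this (untested sketch; the option names should be double-checked against the plugin's README):
input {
    mongodb {
        uri => 'mongodb://localhost:27017/mydb'
        placeholder_db_dir => '/opt/logstash-mongodb/'
        placeholder_db_name => 'logstash_sqlite.db'
        collection => 'mycollection'
        batch_size => 5000
    }
}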