elasticsearch jdbc river polling: load data from mysql repeatedly

When using https://github.com/jprante/elasticsearch-river-jdbc I notice that the following curl statement successfully indexes data the first time. However, the river fails to repeatedly poll the database for updates.
To restate: when I run the following, the river connects to MySQL, runs the query, and indexes the results, but it never runs the query again.
curl -XPUT '127.0.0.1:9200/_river/projects_river/_meta' -d '{
  "type" : "jdbc",
  "index" : {
    "index" : "test_projects",
    "type" : "project",
    "bulk_size" : 100,
    "max_bulk_requests" : 1,
    "autocommit" : true
  },
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "poll" : "1m",
    "strategy" : "simple",
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "root",
    "sql" : "SELECT name, updated_at from projects p where p.updated_at > date_sub(now(),interval 1 minute)"
  }
}'
Tailing the log, I see:
[2013-09-27 16:32:24,482][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:33:24,488][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:34:24,494][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
But the index stays empty. I'm running on a MacBook Pro with Elasticsearch stable 0.90.2, HEAD, and mysql-connector-java-5.1.25-bin.jar in the river plugins directory.

I think if you switch your strategy value from "simple" to "poll" you may get what you are looking for; it has worked for me with the JDBC river on that version of Elasticsearch against MS SQL.
Also, you will need to select a field as _id (e.g. SELECT primarykey AS _id), as the river uses _id to determine which records are added/deleted/updated.
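For example, the river definition from the question with just those two changes applied (a sketch only; it assumes projects has a primary-key column named id):

# assumes a primary-key column named "id" on projects
curl -XPUT '127.0.0.1:9200/_river/projects_river/_meta' -d '{
  "type" : "jdbc",
  "index" : {
    "index" : "test_projects",
    "type" : "project"
  },
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "poll" : "1m",
    "strategy" : "poll",
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "root",
    "sql" : "SELECT id AS _id, name, updated_at FROM projects p WHERE p.updated_at > date_sub(now(), interval 1 minute)"
  }
}'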

Related

Cannot update `max_wal_senders` parameter, always set to 1

I'm trying to increase the max_wal_senders parameter, and no matter what I set it to, it always shows up as 1.
I've updated postgresql.conf; there is only one instance of max_wal_senders, and it's set to 10.
I've also used alter system set max_wal_senders = 10 and verified it's showing as 10 in the auto.conf.
I've restarted the DB multiple times. Other config changes like max_connections do show as updated when I look at show max_connections, so I know the config I'm updating is the correct one.
Running select * from pg_settings where name = 'max_wal_senders'; shows the current setting as 1, boot_val as 10, and reset_val as 1.
It seems like it's getting reset, or the changes just aren't being applied for some reason, but I'm not having this issue with any other parameter. Anything I'm missing?
It should also be noted that I'm running Postgres through Docker, and my method for restarting Postgres is simply restarting the Docker container. (Again, this works for other config changes, so I'm not sure if it matters.)
{
  "select * from pg_settings where name = 'max_wal_senders'": [
    {
      "name" : "max_wal_senders",
      "setting" : "1",
      "unit" : null,
      "category" : "Replication / Sending Servers",
      "short_desc" : "Sets the maximum number of simultaneously running WAL sender processes.",
      "extra_desc" : null,
      "context" : "postmaster",
      "vartype" : "integer",
      "source" : "command line",
      "min_val" : "0",
      "max_val" : "262143",
      "enumvals" : null,
      "boot_val" : "10",
      "reset_val" : "1",
      "sourcefile" : null,
      "sourceline" : null,
      "pending_restart" : false
    }
  ]
}
In checking my docker-compose.yml, in the postgres command I'm setting -cmax_wal_senders=1:
postgres -cwal_level=archive -carchive_mode=on -carchive_command="/usr/bin/wget wale/wal-push/%f -O -" -carchive_timeout=600 -ccheckpoint_timeout=700 -cmax_wal_senders=1
I've since updated this to 10 and restarted the container, but I'm still seeing 1:
postgres -cwal_level=archive -carchive_mode=on -carchive_command="/usr/bin/wget wale/wal-push/%f -O -" -carchive_timeout=600 -ccheckpoint_timeout=700 -cmax_wal_senders=10
The explanation can be seen in the output from pg_settings: the source of the setting is "command line". That means that the server was started with that explicit parameter value, e.g.
postgres -c max_wal_senders=1 -D datadir
That will override the setting in the configuration files.
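You can confirm where the running value comes from by asking pg_settings directly (a small sketch; these columns are standard):

select name, setting, source, sourcefile
from pg_settings
where name = 'max_wal_senders';
-- source = 'command line' means a -c flag on the postgres command line
-- is overriding postgresql.conf and postgresql.auto.conf (ALTER SYSTEM).

Also note that restarting a Docker container does not pick up edits to the command in docker-compose.yml; the container has to be recreated (e.g. with docker-compose up -d) before the new command line takes effect, which would explain still seeing 1 after changing the flag to 10.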

elasticsearch 6 not allowing multiple types when trying to pipeline with mongo-connector

I am trying to push data from MongoDB 3.6 to Elasticsearch 6.1 using mongo-connector.
My records are:
db.administrators.find({}).pretty()
{
  "_id" : ObjectId("5701d81893dc484c812b4fc1"),
  "name" : "Test Naupada",
  "username" : "adminn",
  "ward" : "56a6129f44fc869f215fe3fe",
  "password" : "nadmin"
}
rs0:PRIMARY> db.sub_ward_master.find({}).pretty()
{
  "_id" : ObjectId("56a6129f44fc869f215fe3fe"),
  "wardCode" : "3",
  "wardName" : "Naupada",
  "wardgeoCodes" : [],
  "cityName" : "thane"
}
When I run mongo-connector I get the following error:
OperationFailed: (u'1 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'administrators', u'_index': u'smartjn', u'error': {u'reason': u'Rejecting mapping update to [smartjn] as the final mapping would have more than 1 type: [sub_ward_master, administrators]', u'type': u'illegal_argument_exception'}, u'_id': u'5701d81893dc484c812b4fc1', u'data': {u'username': u'adminn', u'ward': u'56a6129f44fc869f215fe3fe', u'password': u'nadmin', u'name': u'Test Naupada'}}}
Any help, anyone?
Thanks
ES 6 does not allow more than one type in a single index.
There's an open issue in the mongo-connector repo to support ES 6. Until that's solved, you should go with ES 5 instead.
You can do it in ES 6 by creating a new index for each document type (i.e. each collection in MongoDB) and using the -g flag to direct it to the new index.
For example:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -n {db}.{collection_name} -g {new_index}.{document_type}
Refer to the mongo-connector wiki.
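For the two collections in the question, that might look like this (the target index names here are hypothetical; -n and -g take comma-separated lists of equal length):

# admins_index and wards_index are hypothetical target index names
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager \
  -n smartjn.administrators,smartjn.sub_ward_master \
  -g admins_index.administrators,wards_index.sub_ward_master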

mongoid/mongodb query with time condition is extremely slow

I want to select all the data whose ts (timestamp) is less than a specific time:
last_record = History.where(report_type: /#{params["report_type"]}/).order_by(ts: 1).only(:ts).last
History.where(:ts.lte => last_record.ts)
This query seems to take a very long time, and I don't understand why. Is there a quicker way to do this sort of query?
class History
  include Mongoid::Document
  include Mongoid::Timestamps
  include Mongoid::Attributes::Dynamic

  field :report_type, type: String
  field :symbol, type: String
  field :ts, type: Time
end
The query log in the console:
Started GET "/q/com_disagg/last" for 127.0.0.1 at 2015-01-10 10:36:55 +0800
Processing by QueryController#last as HTML
Parameters: {"report_type"=>"com_disagg"}
MOPED: 127.0.0.1:27017 COMMAND database=admin command={:ismaster=>1} runtime: 0.4290ms
...
MOPED: 127.0.0.1:27017 GET_MORE database=cot_development collection=histories limit=0 cursor_id=44966970901 runtime: 349.9560ms
I have set an index on the timestamp, but the query is still extremely slow:
db.system.indexes.find()
{ "v" : 1, "key" : { "ts" : 1 }, "name" : "ts_index", "ns" : "cot_development.histories" }

elasticsearch: Import data from sql using c#

Can anyone give me some directions/examples on how to import about 100 million rows from SQL Server to Elasticsearch using C#?
Currently I'm using the NEST client in C#, but it is very slow (5k-10k rows/minute), and the slowness looks like it comes more from the app side than from ES.
Appreciate any help.
You can use IndexMany, but if you want to index only one table, I think you can try the JDBC plugin. After installation, you can simply execute a .bat script to index your table.
#echo off
set DIR=%~dp0
set LIB=%DIR%..\lib\*
set BIN=%DIR%..\bin
echo {^
  "type" : "jdbc",^
  "jdbc" : {^
    "url" : "jdbc:sqlserver://localhost:25488;instanceName=SQLEXPRESS;databaseName=AdventureWorks2014",^
    "user" : "hintdesk",^
    "password" : "123456",^
    "sql" : "SELECT BusinessEntityID as _id, BusinessEntityID, Title, FirstName, MiddleName, LastName FROM Person.Person",^
    "treat_binary_as_string" : true,^
    "elasticsearch" : {^
      "cluster" : "elasticsearch",^
      "host" : "localhost",^
      "port" : 9200^
    },^
    "index" : "person",^
    "type" : "person"^
  }^
}^ | "%JAVA_HOME%\bin\java" -cp "%LIB%" -Dlog4j.configurationFile="%BIN%\log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
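If you stay with NEST, the usual fix for a 5k-10k/minute rate is to batch rows into bulk requests instead of indexing one document at a time. A minimal sketch, assuming the NEST 1.x-era API from the question's timeframe; Person and LoadPersonsInBatches are hypothetical stand-ins for your row type and SQL paging code:

using System;
using System.Collections.Generic;
using Nest;

class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

class BulkLoader
{
    static void Main()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .SetDefaultIndex("person");
        var client = new ElasticClient(settings);

        // One bulk request per batch instead of one HTTP round-trip per row.
        foreach (var batch in LoadPersonsInBatches(5000)) // hypothetical SQL pager
        {
            var response = client.IndexMany(batch);
            if (!response.IsValid)
                Console.WriteLine("Bulk indexing reported errors for a batch.");
        }
    }

    // Placeholder: page through SQL Server (e.g. ORDER BY + OFFSET/FETCH with
    // SqlDataReader) and yield lists of Person of size batchSize.
    static IEnumerable<List<Person>> LoadPersonsInBatches(int batchSize)
    {
        yield break;
    }
}

Running several such loops in parallel, and sizing batches so each bulk request stays a few megabytes, is usually what raises throughput by an order of magnitude.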

No updatedExisting from getLastError in MongoLab

I am running updates against a database in MongoLab (Heroku) and cannot get information from getLastError.
As an example, below are statements to update a collection in a MongoDB database running locally on my machine (db version v2.0.3-rc1).
ariels-MacBook:mongodb ariel$ mongo
MongoDB shell version: 2.0.3-rc1
connecting to: test
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcc5783cdc9e90ffcddd8"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
  "updatedExisting" : true,
  "n" : 1,
  "connectionId" : 4,
  "err" : null,
  "ok" : 1
}
>
All is well locally.
Now I switch to a database in MongoLab and run the same statements to update a document. getLastError is not returning an updatedExisting field, so I am unable to tell whether my update was successful.
ariels-MacBook:mongodb ariel$ mongo ds0000000.mongolab.com:00000/heroku_app00000 -u someuser -p somepassword
MongoDB shell version: 2.0.3-rc1
connecting to: ds000000.mongolab.com:00000/heroku_app00000
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
  "n" : 0,
  "lastOp" : NumberLong("5790450143685771265"),
  "connectionId" : 1097505,
  "err" : null,
  "ok" : 1
}
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "data" : "anothervalue", "key" : "1" }
>
Did anyone run into this?
If it matters, my resource at MongoLab is running mongod v2.0.7 (my shell is 2.0.3).
Not exactly sure what I am missing.
I am waiting to hear from their support (I will post here when I hear back) but wanted to check with you fine folks here as well just in case.
Thank you.
This looks to be a limitation of not having admin privileges to the mongod process. You might file a ticket with 10gen as it doesn't seem like a necessary limitation.
When I run Mongo in auth mode on my laptop I need to authenticate as a user in the admin database in order to see an "n" other than 0 or the "updatedExisting" field. When I authenticate as a user in any other database I get similar results to what you're seeing in MongoLab production.
(Full disclosure: I work for MongoLab. As a side note, I don't see the support ticket you mention in our system. We'd be happy to work with you directly if you'd like. You can reach us at support@mongolab.com or http://support.mongolab.com.)