elasticsearch: Import data from SQL using C#

Can anyone give me some directions or examples on how to import about 100 million rows from SQL Server into Elasticsearch using C#?
Currently I'm using the NEST client in C#, but it is very slow (5k to 10k per minute), and the slowness seems to come more from the application side than from ES.
I'd appreciate any help.

You can use IndexMany, but if you only want to index one table, I think you can try the JDBC plugin. After installing it, you can simply run a .bat script to index your table:
@echo off
set DIR=%~dp0
set LIB=%DIR%..\lib\*
set BIN=%DIR%..\bin
REM ???
echo {^
"type" : "jdbc",^
"jdbc" : {^
"url" : "jdbc:sqlserver://localhost:25488;instanceName=SQLEXPRESS;databaseName=AdventureWorks2014",^
"user" : "hintdesk",^
"password" : "123456",^
"sql" : "SELECT BusinessEntityID as _id, BusinessEntityID, Title, FirstName, MiddleName, LastName FROM Person.Person",^
"treat_binary_as_string" : true,^
"elasticsearch" : {^
"cluster" : "elasticsearch",^
"host" : "localhost",^
"port" : 9200^
},^
"index" : "person",^
"type" : "person"^
}^
}^ | "%JAVA_HOME%\bin\java" -cp "%LIB%" -Dlog4j.configurationFile="%BIN%\log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
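If you stay with NEST instead, a common way to speed things up is to send rows to the bulk API in large batches (which is what IndexMany does) rather than one request per row, while streaming rows from SQL Server instead of loading them all. The Python sketch below, for illustration only, shows just the batching and the newline-delimited body format the _bulk endpoint expects. The batch size, the person index and the BusinessEntityID id column are assumptions borrowed from the script above; the HTTP call is left out, and older clusters also need a `_type` in each action line.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` rows without loading the whole table."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch: List[Dict[str, Any]], index: str, id_field: str) -> str:
    """Build an NDJSON _bulk payload: one action line plus one source line per row."""
    lines = []
    for row in batch:
        # Reusing the SQL primary key as _id makes re-runs overwrite instead of duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row[id_field])}}))
        # default=str turns values json cannot encode (e.g. datetime) into strings.
        lines.append(json.dumps(row, default=str))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical rows standing in for a streaming SQL Server reader.
    rows = ({"BusinessEntityID": i, "FirstName": "name%d" % i} for i in range(5))
    for batch in batched(rows, 2):
        body = bulk_body(batch, "person", "BusinessEntityID")
        # In a real import, POST `body` to http://localhost:9200/_bulk
        # with Content-Type: application/x-ndjson.
        print(len(batch), "rows,", len(body.splitlines()), "lines")
```

In practice the batch size (a few thousand rows is a typical starting point) and the number of concurrent bulk requests are the knobs to tune against your cluster.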

Related

Creating mongodb database and populating it with some data via docker-compose

I have the following docker-compose configuration with a custom database specified, but I don't see the database user being created in the GUI (Compass). I only see the 3 default databases (admin, config, local).
I've looked at this linked answer, but I need an answer specific to my question, please.
mongo:
  image: mongo:4.0.10
  container_name: mongo
  restart: always
  environment:
    MONGO_INITDB_ROOT_USERNAME: root
    MONGO_INITDB_ROOT_PASSWORD: mypass
    MONGO_INITDB_DATABASE: mydb
  ports:
    - 27017:27017
    - 27018:27018
    - 27019:27019
Expected result: a user database is created and prefilled with some records.
Edit: I made some progress, but there are 2 problems.
I added volumes:
mongo:
  image: mongo:4.0.10
  container_name: mongo
  restart: always
  volumes:
    - ./assets:/docker-entrypoint-initdb.d/
1. Files are ignored
I have 3 files in the assets folder, and the logs show that my files are being ignored:
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/file1.json
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/file2.json
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/file3.json
All my JSON files look like the following. (Does it matter that there is no root array, i.e. no [] at the root?)
{ "_id" : { "$oid" : "5d3a9d423b881e4ca04ae8f0" }, "name" : "Human Resource" }
{ "_id" : { "$oid" : "5d3a9d483b881e4ca04ae8f1" }, "name" : "Sales" }
2. The default database is not created; the following line has no effect:
MONGO_INITDB_DATABASE: mydb
All files with the *.json extension will be ignored; the entrypoint only runs *.js (and *.sh) files. See the documentation for the mongo image on Docker Hub:
MONGO_INITDB_DATABASE
This variable allows you to specify the name of a database to be used
for creation scripts in /docker-entrypoint-initdb.d/*.js (see
Initializing a fresh instance below). MongoDB is fundamentally
designed for "create on first use", so if you do not insert data with
your JavaScript files, then no database is created.
Initializing a fresh instance
When a container is started for the first time it will execute files
with extensions .sh and .js that are found in
/docker-entrypoint-initdb.d. Files will be executed in alphabetical
order. .js files will be executed by mongo using the database
specified by the MONGO_INITDB_DATABASE variable, if it is present, or
test otherwise. You may also switch databases within the .js script.
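The dispatch rule quoted above (run .sh and .js files, ignore anything else) is what produces the "ignoring ... file1.json" log lines. As a rough Python model only, not the image's actual entrypoint (which is a shell script), it looks like this; Python's sort is code-point order, whereas the real script depends on shell glob ordering, and setup.sh below is a hypothetical file name.

```python
from typing import List, Tuple


def plan_init_files(filenames: List[str]) -> List[Tuple[str, str]]:
    """Rough model of how the mongo image treats /docker-entrypoint-initdb.d.

    Returns (action, filename) pairs in the order they would be handled:
    "shell" for *.sh, "mongo" for *.js (run against MONGO_INITDB_DATABASE,
    or test if unset), and "ignore" for anything else, such as *.json.
    """
    plan = []
    for name in sorted(filenames):  # alphabetical order, as the docs state
        if name.endswith(".sh"):
            plan.append(("shell", name))
        elif name.endswith(".js"):
            plan.append(("mongo", name))
        else:
            plan.append(("ignore", name))
    return plan
```

Note also that, per the quoted docs, this only happens the first time the container starts, so reusing an existing data volume skips the init scripts entirely.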
You can look at this example.
Create a folder named data and place create_article.js in it
(in the example I use user, the database name from your question)
db = db.getSiblingDB("user");
db.article.drop();
db.article.save( {
    title : "this is my title" ,
    author : "bob" ,
    posted : new Date(1079895594000) ,
    pageViews : 5 ,
    tags : [ "fun" , "good" , "fun" ] ,
    comments : [
        { author :"joe" , text : "this is cool" } ,
        { author :"sam" , text : "this is bad" }
    ],
    other : { foo : 5 }
});
db.article.save( {
    title : "this is your title" ,
    author : "dave" ,
    posted : new Date(4121381470000) ,
    pageViews : 7 ,
    tags : [ "fun" , "nasty" ] ,
    comments : [
        { author :"barbara" , text : "this is interesting" } ,
        { author :"jenny" , text : "i like to play pinball", votes: 10 }
    ],
    other : { bar : 14 }
});
db.article.save( {
    title : "this is some other title" ,
    author : "jane" ,
    posted : new Date(978239834000) ,
    pageViews : 6 ,
    tags : [ "nasty" , "filthy" ] ,
    comments : [
        { author :"will" , text : "i don't like the color" } ,
        { author :"jenny" , text : "can i get that in green?" }
    ],
    other : { bar : 14 }
});
Then mount the data directory:
docker run --rm -it --name some-mongo -v /home/data/:/docker-entrypoint-initdb.d/ -e MONGO_INITDB_DATABASE=user -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=mypass mongo:4.0.10
Once the container is created, you will be able to see the databases.
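To keep the existing one-document-per-line JSON files, one option is to convert them into a .js init script ahead of time. The Python sketch below is an illustration under assumptions: it converts only the {"$oid": ...} wrapper seen in the question (other extended-JSON wrappers such as $date are not converted and would need similar handling), and the mydb database and departments collection names are hypothetical.

```python
import json
from typing import Any, Iterable


def to_js(value: Any) -> str:
    """Render a parsed JSON value as mongo shell JavaScript source."""
    if isinstance(value, dict):
        # {"$oid": "..."} is mongoexport's extended JSON for an ObjectId.
        if set(value) == {"$oid"}:
            return "ObjectId(%s)" % json.dumps(value["$oid"])
        fields = ", ".join("%s: %s" % (json.dumps(k), to_js(v)) for k, v in value.items())
        return "{" + fields + "}"
    if isinstance(value, list):
        return "[" + ", ".join(to_js(v) for v in value) + "]"
    # Strings, numbers, booleans and null are spelled the same in JSON and JS.
    return json.dumps(value)


def json_lines_to_init_script(lines: Iterable[str], db_name: str, collection: str) -> str:
    """Turn one-document-per-line JSON into a script for /docker-entrypoint-initdb.d."""
    docs = [json.loads(line) for line in lines if line.strip()]
    if not docs:
        raise ValueError("no documents found")
    body = ",\n    ".join(to_js(doc) for doc in docs)
    return (
        "db = db.getSiblingDB(%s);\n" % json.dumps(db_name)
        + "db.%s.insertMany([\n    %s\n]);\n" % (collection, body)
    )
```

For example, writing the output for file1.json to assets/file1.js (and removing the .json files) should let the entrypoint run it on the first start, which also causes the database to be created.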

elasticsearch 6 not allowing multiple types when trying to pipeline with mongo-connector

I am trying to push data from MongoDB 3.6 to Elasticsearch 6.1 using mongo-connector.
My records are:
db.administrators.find({}).pretty()
{
    "_id" : ObjectId("5701d81893dc484c812b4fc1"),
    "name" : "Test Naupada",
    "username" : "adminn",
    "ward" : "56a6129f44fc869f215fe3fe",
    "password" : "nadmin"
}
rs0:PRIMARY> db.sub_ward_master.find({}).pretty()
{
    "_id" : ObjectId("56a6129f44fc869f215fe3fe"),
    "wardCode" : "3",
    "wardName" : "Naupada",
    "wardgeoCodes" : [],
    "cityName" : "thane"
}
When I run mongo-connector, I get the following error:
OperationFailed: (u'1 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'administrators', u'_index': u'smartjn', u'error': {u'reason': u'Rejecting mapping update to [smartjn] as the final mapping would have more than 1 type: [sub_ward_master, administrators]', u'type': u'illegal_argument_exception'}, u'_id': u'5701d81893dc484c812b4fc1', u'data': {u'username': u'adminn', u'ward': u'56a6129f44fc869f215fe3fe', u'password': u'nadmin', u'name': u'Test Naupada'}}}
Can anyone help?
Thanks
ES 6 does not allow more than one mapping type in a single index.
There's an open issue in the mongo-connector repo to support ES 6. Until that's solved, you should go with ES 5 instead.
You can do it in ES 6 by creating a separate index for each document type (i.e. each MongoDB collection) and using the -g flag to direct each collection to its new index.
For example:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -n {db}.{collection_name} -g {new_index}.{document_type}
Refer to the mongo-connector wiki.
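Following the -g approach, the mapping can be generated so that every collection lands in its own index. This is a Python sketch that only builds the argument list (it runs nothing), assuming mongo-connector matches the comma-separated -n and -g lists up by position; the target index and type names in the test are hypothetical.

```python
from typing import Dict, List


def connector_args(mongo: str, es: str, mapping: Dict[str, str]) -> List[str]:
    """Build a mongo-connector command line from {"db.collection": "index.type"}.

    -n lists the source namespaces and -g the destinations in the same order.
    Each destination must use a different index, since ES 6 allows only one
    type per index.
    """
    sources = list(mapping)  # dicts keep insertion order (Python 3.7+)
    dests = [mapping[ns] for ns in sources]
    indexes = [dest.split(".", 1)[0] for dest in dests]
    if len(set(indexes)) != len(indexes):
        raise ValueError("two collections map to the same index: %s" % indexes)
    return [
        "mongo-connector",
        "-m", mongo,
        "-t", es,
        "-d", "elastic2_doc_manager",
        "-n", ",".join(sources),
        "-g", ",".join(dests),
    ]
```

For the collections in the question, mapping smartjn.administrators and smartjn.sub_ward_master to two different indexes avoids the "more than 1 type" rejection, while mapping both into smartjn is refused up front.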

mongoid/ mongodb query with time condition is extremely slow

I want to select all the documents whose ts (timestamp) is less than or equal to a specific time:
last_record = History.where(report_type: /#{params["report_type"]}/).order_by(ts: 1).only(:ts).last
History.where(:ts.lte => last_record.ts)
This query seems to take a very long time.
I don't understand why. Is there a quicker way to do this sort of query?
class History
  include Mongoid::Document
  include Mongoid::Timestamps
  include Mongoid::Attributes::Dynamic
  field :report_type, type: String
  field :symbol, type: String
  field :ts, type: Time
end
The query log in the console:
Started GET "/q/com_disagg/last" for 127.0.0.1 at 2015-01-10 10:36:55 +0800
Processing by QueryController#last as HTML
Parameters: {"report_type"=>"com_disagg"}
MOPED: 127.0.0.1:27017 COMMAND database=admin command={:ismaster=>1} runtime: 0.4290ms
...
MOPED: 127.0.0.1:27017 GET_MORE database=cot_development collection=histories limit=0 cursor_id=44966970901 runtime: 349.9560ms
I have set an index on the timestamp, but the query is still extremely slow:
db.system.indexes.find()
{ "v" : 1, "key" : { "ts" : 1 },
"name" : "ts_index",
"ns" : "cot_development.histories" }

elasticsearch jdbc river polling--- load data from mysql repeatedly

When using https://github.com/jprante/elasticsearch-river-jdbc I notice that the following curl statement successfully indexes data the first time. However, the river fails to repeatedly poll the database for updates.
To restate, when I run the following, the river successfully connects to MySQL, runs the query successfully, indexes the results, but never runs the query again.
curl -XPUT '127.0.0.1:9200/_river/projects_river/_meta' -d '{
    "type" : "jdbc",
    "index" : {
        "index" : "test_projects",
        "type" : "project",
        "bulk_size" : 100,
        "max_bulk_requests" : 1,
        "autocommit": true
    },
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "poll" : "1m",
        "strategy" : "simple",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "root",
        "sql" : "SELECT name, updated_at from projects p where p.updated_at > date_sub(now(),interval 1 minute)"
    }
}'
Tailing the log, I see:
[2013-09-27 16:32:24,482][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:33:24,488][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:34:24,494][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
But the index stays empty. I'm running on a MacBook Pro with Elasticsearch stable 0.90.2, HEAD, and mysql-connector-java-5.1.25-bin.jar in the river plugins directory.
I think if you switch your strategy value from "simple" to "poll", you may get what you are looking for; it has worked for me with JDBC on that version of Elasticsearch against MS SQL.
Also, you will need to select a field as _id (SELECT primarykey AS _id), because the river uses it to determine which records were added, deleted, or updated.
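Putting both suggestions together, the river definition from the question changes in two places: the strategy and an _id column in the SQL. The Python sketch below only builds that JSON body for curl -d. It assumes the projects table has an id primary key (hypothetical), and the "poll" strategy value is taken from the answer above rather than checked against the plugin's documentation.

```python
import json


def river_definition(sql: str, strategy: str = "poll", poll: str = "1m") -> dict:
    """Body for PUT /_river/projects_river/_meta, based on the question's request."""
    # Heuristic check only: the river needs a column aliased as _id.
    if "as _id" not in sql.lower():
        raise ValueError("select a primary key column AS _id")
    return {
        "type": "jdbc",
        "index": {
            "index": "test_projects",
            "type": "project",
            "bulk_size": 100,
            "max_bulk_requests": 1,
            "autocommit": True,
        },
        "jdbc": {
            "driver": "com.mysql.jdbc.Driver",
            "poll": poll,
            "strategy": strategy,
            "url": "jdbc:mysql://localhost:3306/test",
            "user": "root",
            "sql": sql,
        },
    }


SQL = (
    "SELECT p.id AS _id, p.name, p.updated_at FROM projects p "  # p.id is hypothetical
    "WHERE p.updated_at > date_sub(now(), interval 1 minute)"
)
print(json.dumps(river_definition(SQL), indent=2))
```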

No updatedExisting from getLastError in MongoLab

I am running updates against a database in MongoLab (Heroku) and cannot get information from getLastError.
As an example, below are statements to update a collection in a MongoDB database running locally on my machine (db version v2.0.3-rc1).
ariels-MacBook:mongodb ariel$ mongo
MongoDB shell version: 2.0.3-rc1
connecting to: test
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcc5783cdc9e90ffcddd8"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
    "updatedExisting" : true,
    "n" : 1,
    "connectionId" : 4,
    "err" : null,
    "ok" : 1
}
>
All is well locally.
Now I switch to a database in MongoLab and run the same statements to update a document. getLastError does not return an updatedExisting field, so I am unable to tell whether my update succeeded.
ariels-MacBook:mongodb ariel$ mongo ds0000000.mongolab.com:00000/heroku_app00000 -u someuser -p somepassword
MongoDB shell version: 2.0.3-rc1
connecting to: ds000000.mongolab.com:00000/heroku_app00000
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
    "n" : 0,
    "lastOp" : NumberLong("5790450143685771265"),
    "connectionId" : 1097505,
    "err" : null,
    "ok" : 1
}
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "data" : "anothervalue", "key" : "1" }
>
Has anyone run into this?
If it matters, my resource at MongoLab is running mongod v2.0.7 (my shell is 2.0.3).
Not exactly sure what I am missing.
I am waiting to hear from their support (I will post here when I hear back) but wanted to check with you fine folks here as well just in case.
Thank you.
This looks like a limitation of not having admin privileges on the mongod process. You might file a ticket with 10gen, as it doesn't seem like a necessary limitation.
When I run Mongo in auth mode on my laptop I need to authenticate as a user in the admin database in order to see an "n" other than 0 or the "updatedExisting" field. When I authenticate as a user in any other database I get similar results to what you're seeing in MongoLab production.
(Full disclosure: I work for MongoLab. As a side note, I don't see the support ticket you mention in our system. We'd be happy to work with you directly if you'd like. You can reach us at support@mongolab.com or http://support.mongolab.com)
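Given that explanation, client code that reads getLastError probably should not treat a missing updatedExisting as "no document matched", because without admin authentication n came back as 0 even though the update was applied. A minimal Python sketch of that defensive reading, using the two replies from the question as test data; the function name is hypothetical and the dicts stand in for whatever the driver returns.

```python
from typing import Any, Mapping, Optional


def update_outcome(gle: Mapping[str, Any]) -> Optional[bool]:
    """Interpret a getLastError reply after an update.

    Returns True/False when the reply carries updatedExisting, and None
    (unknown) when it does not, since n may then be reported as 0 even
    though the update was applied.
    """
    if gle.get("err") is not None:
        raise RuntimeError(gle["err"])
    if "updatedExisting" in gle:
        return bool(gle["updatedExisting"])
    return None
```

A None result means the reply cannot answer the question, so the caller would need another check, for example re-reading the document as the question does with find().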