elasticsearch 6 not allowing multiple types when trying to pipeline with mongo-connector - mongodb

I am trying to push data from MongoDB 3.6 to Elasticsearch 6.1 using mongo-connector.
My records are:
db.administrators.find({}).pretty()
{
    "_id" : ObjectId("5701d81893dc484c812b4fc1"),
    "name" : "Test Naupada",
    "username" : "adminn",
    "ward" : "56a6129f44fc869f215fe3fe",
    "password" : "nadmin"
}
rs0:PRIMARY> db.sub_ward_master.find({}).pretty()
{
    "_id" : ObjectId("56a6129f44fc869f215fe3fe"),
    "wardCode" : "3",
    "wardName" : "Naupada",
    "wardgeoCodes" : [],
    "cityName" : "thane"
}
When I run mongo-connector I am getting following error:
OperationFailed: (u'1 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'administrators', u'_index': u'smartjn', u'error': {u'reason': u'Rejecting mapping update to [smartjn] as the final mapping would have more than 1 type: [sub_ward_master, administrators]', u'type': u'illegal_argument_exception'}, u'_id': u'5701d81893dc484c812b4fc1', u'data': {u'username': u'adminn', u'ward': u'56a6129f44fc869f215fe3fe', u'password': u'nadmin', u'name': u'Test Naupada'}}}])
Any help, anyone?
Thanks

ES 6 does not allow more than one type in a single index.
There's an open issue in the mongo-connector repo to support ES 6. Until that's resolved, you should go with ES 5 instead.

You can do it in ES 6 by creating a separate index for each document type (i.e., each collection in MongoDB) and using the -g flag to direct it to the new index.
For example:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -n {db}.{collection_name} -g {new_index}.{document_type}
Refer to the mongo-connector wiki.
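For the two collections above, that could look like the following (a sketch: the target index names admins_idx and wards_idx are made up, and it assumes -n and -g accept matching comma-separated namespace lists as the wiki describes):

mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -n smartjn.administrators,smartjn.sub_ward_master -g admins_idx.administrators,wards_idx.sub_ward_master

Each collection then lands in its own index, so no single index ever holds more than one type.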

Related

mongoimport --mode merge --type csv --collection not working

Trying to import the following csv:
_id,receiver,month,accrualMonth,paymentData.bankCode,operation
573378aef3af68090023da7d,547517955021020200599440,2016-05,2016-04,41,Manual
When I run the following (mongo version 3.4.5):
mongoimport --db mean-dev --mode=merge --collection fulfilledpayments --type csv --headerline --file ~/Downloads/\Import.csv -vvvv
it returns the following log but it doesn't really import:
2018-04-04T20:51:25.331-0300 using upsert fields: [_id]
2018-04-04T20:51:25.332-0300 using 0 decoding workers
2018-04-04T20:51:25.332-0300 using 1 insert workers
2018-04-04T20:51:25.332-0300 will listen for SIGTERM, SIGINT, and SIGKILL
2018-04-04T20:51:25.360-0300 filesize: 139 bytes
2018-04-04T20:51:25.361-0300 using fields: _id,receiver,month,accrualMonth,paymentData.bankCode,operation
2018-04-04T20:51:25.381-0300 connected to: localhost
2018-04-04T20:51:25.381-0300 ns: mean-dev.fulfilledpayments
2018-04-04T20:51:25.381-0300 connected to node type: standalone
2018-04-04T20:51:25.381-0300 standalone server: setting write concern w to 1
2018-04-04T20:51:25.381-0300 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-04-04T20:51:25.381-0300 standalone server: setting write concern w to 1
2018-04-04T20:51:25.381-0300 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-04-04T20:51:25.382-0300 got line: [573378aef3af68090023da7d 547517955021020200599440 2016-05 2016-04 41 Manual]
2018-04-04T20:51:25.384-0300 imported 1 document
But nothing is really imported into the database, which remains untouched like this:
{
    "_id" : ObjectId("573378aef3af68090023da7d"),
    "creator" : "547517955021020200599440",
    "amountTransferred" : 101.79,
    "externalId" : "61fa09",
    "date" : ISODate("2016-05-06T16:00:00.000-03:00"),
    "payments" : [
        ObjectId("559363f127c09e0900b679dd"),
        ObjectId("55bc4c9170b99e090093e2a8"),
        ObjectId("55e5175a3b2a8e090040d4cd"),
        ObjectId("560cab8bad3c6a0900275f5a"),
        ObjectId("563cc8d3f2db060900a8ba81"),
        ObjectId("5661033e57d24c090035b191"),
        ObjectId("568eac27eaa71c090074d5b0"),
        ObjectId("56b2e691ced93a0900408267"),
        ObjectId("56d905cb4c830809007e8355"),
        ObjectId("56fee8063cdd4d0900776fa9"),
        ObjectId("5732732e5d237d09008c57e2")
    ],
    "__v" : 0
}
If I remove the --collection fulfilledpayments parameter it imports into a new collection, but of course merge mode is pointless there, since the new collection doesn't contain the _id to be matched.
Maybe you should enclose your _id in ObjectId, like:
_id,receiver,month,accrualMonth,paymentData.bankCode,operation
ObjectId(573378aef3af68090023da7d),547517955021020200599440,2016-05,2016-04,41,Manual
https://docs.mongodb.com/manual/reference/program/mongoimport/#ex-mongoimport-merge
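If mongoimport won't parse the ObjectId(...) wrapper from CSV, another route is to merge from extended JSON instead, where ObjectId has an unambiguous representation (a sketch; import.json is a hypothetical file built from the CSV row):

import.json:
{"_id": {"$oid": "573378aef3af68090023da7d"}, "receiver": "547517955021020200599440", "month": "2016-05", "accrualMonth": "2016-04", "paymentData": {"bankCode": 41}, "operation": "Manual"}

mongoimport --db mean-dev --mode=merge --collection fulfilledpayments --file import.json

Because $oid imports as a real ObjectId, the merge matches the existing document instead of inserting a second one keyed by a string _id.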

elasticsearch jdbc river polling--- load data from mysql repeatedly

When using https://github.com/jprante/elasticsearch-river-jdbc I notice that the following curl statement successfully indexes data the first time. However, the river fails to repeatedly poll the database for updates.
To restate, when I run the following, the river successfully connects to MySQL, runs the query successfully, indexes the results, but never runs the query again.
curl -XPUT '127.0.0.1:9200/_river/projects_river/_meta' -d '{
    "type" : "jdbc",
    "index" : {
        "index" : "test_projects",
        "type" : "project",
        "bulk_size" : 100,
        "max_bulk_requests" : 1,
        "autocommit" : true
    },
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "poll" : "1m",
        "strategy" : "simple",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "root",
        "sql" : "SELECT name, updated_at from projects p where p.updated_at > date_sub(now(),interval 1 minute)"
    }
}'
Tailing the log, I see:
[2013-09-27 16:32:24,482][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:33:24,488][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
[2013-09-27 16:34:24,494][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1m
But the index stays empty. Running on a MacBook Pro with Elasticsearch 0.90.2 (stable) and mysql-connector-java-5.1.25-bin.jar in the river plugins directory.
I think if you switch your strategy value from "simple" to "poll" you may get what you are looking for - it has worked for me with the JDBC river on that version of Elasticsearch against MS SQL.
Also, you will need to select a field as _id (SELECT primarykey AS _id), as the river uses _id to determine which records are added/deleted/updated.
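Applied to the river definition above, the changed parts would look roughly like this (a sketch; it assumes the projects table has an id primary key):

curl -XPUT '127.0.0.1:9200/_river/projects_river/_meta' -d '{
    "type" : "jdbc",
    "index" : {
        "index" : "test_projects",
        "type" : "project",
        "bulk_size" : 100,
        "max_bulk_requests" : 1,
        "autocommit" : true
    },
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "poll" : "1m",
        "strategy" : "poll",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "root",
        "sql" : "SELECT id AS _id, name, updated_at FROM projects p WHERE p.updated_at > date_sub(now(), interval 1 minute)"
    }
}'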

Can I use mongoexport --query <file> where file is a list of conditions

I have an array of ids stored in a file, and I want to retrieve the corresponding documents from MongoDB, so I looked into mongoexport. It seems the --query option only accepts an inline JSON document and cannot read a large JSON document or array from a file. In my case there are about 4000 ids in the file. Is there a solution to this?
I was able to use
mongoexport --db db --collection collection --fields name --csv -o ~/data.csv
but how do I read the query conditions from a file? For comparison, with Mongoid in a Rails application the query would be Data.where(:_id.in => array).
Or is it possible to do this from the mongo shell by executing a JavaScript file?
Thanks.
I believe you can use a JavaScript file to output the array you need.
You can use the printjson command in your script. For example, create a JavaScript file, script.js, as follows:
printjson( db.albums.find({_id : 18}, {"images" : 1, "_id" : 0}).toArray() )
Then call it like this:
mongo test script.js > out.txt
In my local environment, the albums collection has the following structure:
db.albums.findOne({"_id" : 18})
{
    "_id" : 18,
    "images" : [
        2926, 5377, 8036, 9023, 10119, 11543, 12305, 12556, 12576, 13753,
        14414, 14865, 15193, 15933, 17156, 17314, 17391, 20168, 21705, 22016,
        22348, 23036, 23452, 24112, 27086, 27310, 27864, 28092, 29184, 29190,
        29250, 29354, 29454, 29563, 30366, 30619, 31390, 31825, 31906, 32339,
        32674, 33307, 33844, 37475, 37976, 38717, 38774, 39801, 41369, 41752,
        44977, 45384, 45643, 46918, 47069, 50099, 52755, 54314, 54497, 62338,
        63438, 63572, 63600, 65631, 66953, 67160, 67369, 69802, 71087, 71127,
        71282, 73123, 73201, 73954, 74972, 76279, 77054, 78397, 78645, 78936,
        79364, 79707, 83065, 83142, 83568, 84160, 85391, 85443, 85488, 86143,
        86240, 86949, 89406, 89846, 92591, 92639, 92655, 93844, 93934, 94987,
        95324, 95431, 95817, 95864, 96230, 96975, 97026
    ]
}
The output I got was:
$ cat out.txt
MongoDB shell version: 2.2.1
connecting to: test
[
    {
        "images" : [
            2926, 5377, 8036, 9023, 10119, 11543, 12305, 12556, 12576, 13753,
            14414, 14865, 15193, 15933, 17156, 17314, 17391, 20168, 21705, 22016,
            22348, 23036, 23452, 24112, 27086, 27310, 27864, 28092, 29184, 29190,
            29250, 29354, 29454, 29563, 30366, 30619, 31390, 31825, 31906, 32339,
            32674, 33307, 33844, 37475, 37976, 38717, 38774, 39801, 41369, 41752,
            44977, 45384, 45643, 46918, 47069, 50099, 52755, 54314, 54497, 62338,
            63438, 63572, 63600, 65631, 66953, 67160, 67369, 69802, 71087, 71127,
            71282, 73123, 73201, 73954, 74972, 76279, 77054, 78397, 78645, 78936,
            79364, 79707, 83065, 83142, 83568, 84160, 85391, 85443, 85488, 86143,
            86240, 86949, 89406, 89846, 92591, 92639, 92655, 93844, 93934, 94987,
            95324, 95431, 95817, 95864, 96230, 96975, 97026
        ]
    }
]
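Coming back to the original question, the legacy mongo shell can also read the 4000 ids from a file: cat() returns a file's contents as a string, which JSON.parse can turn into an array (a sketch; /tmp/ids.json is a hypothetical file holding a JSON array such as [18, 19, 20]):

// script.js: load the id list from a file, then query with $in
var ids = JSON.parse(cat("/tmp/ids.json"));
printjson( db.albums.find({ _id : { $in : ids } }).toArray() );

Run it the same way: mongo test script.js > out.txt. If the ids are ObjectId strings rather than numbers, map them first with ids.map(function(s) { return ObjectId(s); }).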
Regards,
Moacy

No updatedExisting from getLastError in MongoLab

I am running updates against a database in MongoLab (Heroku) and cannot get information from getLastError.
As an example, below are statements to update a collection in a MongoDB database running locally in my machine (db version v2.0.3-rc1).
ariels-MacBook:mongodb ariel$ mongo
MongoDB shell version: 2.0.3-rc1
connecting to: test
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcc5783cdc9e90ffcddd8"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
    "updatedExisting" : true,
    "n" : 1,
    "connectionId" : 4,
    "err" : null,
    "ok" : 1
}
>
All is well locally.
Now I switch to a database in MongoLab and run the same statements to update a document. getLastError is not returning an updatedExisting field. Hence, I am unable to test if my update was successful or otherwise.
ariels-MacBook:mongodb ariel$ mongo ds0000000.mongolab.com:00000/heroku_app00000 -u someuser -p somepassword
MongoDB shell version: 2.0.3-rc1
connecting to: ds000000.mongolab.com:00000/heroku_app00000
> db.mycoll.insert({'key': '1','data': 'somevalue'});
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "key" : "1", "data" : "somevalue" }
> db.mycoll.update({'key': '1'},{$set: {'data': 'anothervalue'}});
> db.runCommand('getlasterror');
{
    "n" : 0,
    "lastOp" : NumberLong("5790450143685771265"),
    "connectionId" : 1097505,
    "err" : null,
    "ok" : 1
}
> db.mycoll.find();
{ "_id" : ObjectId("505bcf9b2421140a6b8490dd"), "data" : "anothervalue", "key" : "1" }
>
Did anyone run into this?
If it matters, my resource at MongoLab is running mongod v2.0.7 (my shell is 2.0.3).
Not exactly sure what I am missing.
I am waiting to hear from their support (I will post here when I hear back) but wanted to check with you fine folks here as well just in case.
Thank you.
This looks to be a limitation of not having admin privileges on the mongod process. You might file a ticket with 10gen, as it doesn't seem like a necessary limitation.
When I run Mongo in auth mode on my laptop, I need to authenticate as a user in the admin database in order to see an "n" other than 0 or the "updatedExisting" field. When I authenticate as a user in any other database, I get results similar to what you're seeing in MongoLab production.
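To see the difference locally, the session looks roughly like this (a sketch, assuming mongod was started with --auth and an admin user exists; the credentials are illustrative):

> use admin
> db.auth("adminUser", "adminPass")
1
> use test
> db.mycoll.update({'key': '1'}, {$set: {'data': 'anothervalue'}})
> db.runCommand('getlasterror')

Authenticated against admin, the reply should include "updatedExisting" and a real "n"; authenticated against any other database, it should look like the MongoLab output above.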
(Full disclosure: I work for MongoLab. As a side note, I don't see the support ticket you mention in our system. We'd be happy to work with you directly if you'd like. You can reach us at support@mongolab.com or http://support.mongolab.com.)

mongo dbname --eval 'db.collection.find()' does not work

Why does this work:
# mongo dbname
MongoDB shell version: 1.8.3
connecting to: nextmuni_staging
> db.collection.find()
{ "foo" : "bar" }
> bye
While this does not work:
# mongo localhost/dbname --eval 'db.collection.find()'
MongoDB shell version: 1.8.3
connecting to: localhost/dbname
DBQuery: dbname.collection -> undefined
It should be exactly the same, no?
Thanks!
The return value of db.collection.find() is a cursor. Executing this command from within the shell creates the cursor and shows you the first page of data; you can page through the rest by repeating the 'it' command.
I think the scope of variables used during the execution of an eval'd script only lasts for the lifetime of the script (data can be persisted into collections, of course), so once the script terminates the cursor variable no longer exists, and you would not be able to send another eval script to page through the data. So the behaviour you get during a shell session wouldn't really work from an eval script.
To get close to the behaviour you could run something like this:
mongo dbname --eval "db.collection.find().forEach(printjson)"
That shows you that the command does execute and produces a cursor which you can then iterate over, sending the output to stdout.
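An equivalent one-liner (a sketch against the same collection) materializes the cursor with toArray() and prints it in one go; for large collections the forEach form is preferable, since it streams documents instead of buffering them all:

mongo dbname --eval 'printjson(db.collection.find().toArray())'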
Edit: the point I was trying to make was that the command you are issuing is working; it's just that the output is not what you expect.
The printjson function covers a lot of ground when scripting with mongo --eval '...'. Rather than chaining .forEach, you can simply wrap your call.
$ mongo --eval 'db.stats_data.stats()' db_name
MongoDB shell version: 2.4.14
connecting to: db_name
[object Object]
$ mongo --eval 'db.stats_data.stats().forEach(printjson)' db_name
MongoDB shell version: 2.4.14
connecting to: db_name
Tue Jan 10 15:32:11.961 TypeError: Object [object Object] has no method 'forEach'
$ mongo --eval 'printjson(db.stats_data.stats())' db_name
MongoDB shell version: 2.4.14
connecting to: db_name
{
    "ns" : "db_name.stats_data",
    "count" : 5516290,
    "size" : 789938800,
    "avgObjSize" : 143.20110073980882,
    "storageSize" : 1164914688,
    "numExtents" : 18,
    "nindexes" : 3,
    "lastExtentSize" : 307515392,
    "paddingFactor" : 1.0000000000000457,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 1441559616,
    "indexSizes" : {
        "_id_" : 185292688,
        "owner_id_key_idx" : 427678384,
        "onwer_metric_key_idx" : 828588544
    },
    "ok" : 1
}