How to create an index with ElasticSearch, River and MongoDB?

I used this tutorial to install and configure MongoDB / Elasticsearch.
The whole tutorial worked on Mac OS X Yosemite, and now I am trying to do the same on Ubuntu 14.04.
Here is my ElasticSearch log:
[2014-12-08 15:49:13,733][INFO ][cluster.service ] [Western Kid] new_master [Western Kid][fo8GLpDoRyKYBAkjk7f-jw][my_hostname][inet[localhost/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2014-12-08 15:49:13,758][INFO ][http ] [Western Kid] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2014-12-08 15:49:13,758][INFO ][node ] [Western Kid] started
[2014-12-08 15:49:14,449][INFO ][gateway ] [Western Kid] recovered [1] indices into cluster_state
[2014-12-08 15:49:15,225][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Starting river mongodb
[2014-12-08 15:49:15,230][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin - version[2.0.4] - hash[7472875] - time[2014-11-11T13:26:19Z]
[2014-12-08 15:49:15,231][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [false], drop_collection [false], include_collection [], throttlesize [5000], gridfs [false], filter [null], db [my_db_name], collection [my_collection], script [null], indexing to [my_index]/[my_type]
[2014-12-08 15:49:15,231][INFO ][river.mongodb.util ] setRiverStatus called with mongodb - RUNNING
[2014-12-08 15:57:56,543][INFO ][cluster.metadata ] [Western Kid] [_river] update_mapping [my_db_name] (dynamic)
When I tried to start indexing my collection, I got the following message:
{
  "_index": "_river",
  "_type": "my_type",
  "_id": "_meta",
  "_version": 4,
  "created": false
}
The version increases every time I try; it started at 1.
I guess created: false means the index could not be created for some reason, but I have no idea why.

The version of the river is increasing because you are actually updating the _river index, so Elasticsearch is not creating it; it is already there.
You might want to install the elasticsearch-head plugin to visualize your cluster better, since you don't seem to be very familiar with the API.
Try deleting the _river index and creating it again, and you'll see that this time the response will actually say created: true.
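A minimal sketch of that delete-and-recreate cycle, assuming the river is named mongodb and its definition lives in a local init.json (both names are placeholders, not taken from this question):
# remove the old river definition, then register it again from init.json
curl -XDELETE 'http://localhost:9200/_river/mongodb/'
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d @init.json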

Related

Logstash MongoDB Output plugin 3.1.7 error

I am not able to connect to MongoDB Atlas; I get the error below. I am using plugin version 3.1.7.
[WARN ][logstash.outputs.mongodb ][main] MONGODB | Failed to handshake with : 27017 ArgumentError: wrong number of arguments (given 2, expected 1)
[WARN ][logstash.outputs.mongodb ][main] MONGODB | Error checking :27017: ArgumentError: wrong number of arguments (given 2, expected 1)
The MongoDB URL is "mongodb+srv://username:password@clustername.mongodb.net/dbname?retryWrites=true&w=majority".
I see the version has a fix for the same issue. Am I missing anything?
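For reference, a minimal logstash-output-mongodb block of the kind this error comes from; uri, database, and collection are the plugin's documented settings, and every value here is a placeholder rather than something from the question:
output {
  mongodb {
    # hypothetical values, not taken from the question
    uri => "mongodb+srv://username:password@clustername.mongodb.net/dbname?retryWrites=true&w=majority"
    database => "dbname"
    collection => "logs"
  }
}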

Mongos Vs Mongod Compatibility Issue

I run a sharded Mongo database in a production environment.
Recently, one of my colleagues deprecated one of the mongos (router) instances, and the new instance was spawned (via an ASG) with the latest minor version of Mongo 4.2, which is 4.2.14. The rest of the instances (shards, mongos, and configs) are still on 4.2.5.
Soon after this, I started facing a problem for which I can find absolutely no documentation on the web. The problem happens only on the node that has the latest version of mongos, 4.2.14.
Error - Command failed with error 40415 (Location40415): 'BSON field '$mergeCursors.recordRemoteOpWaitTime' is an unknown field.' on server 10.17.9.84:27017. The full response is {"ok": 0.0, "errmsg": "BSON field '$mergeCursors.recordRemoteOpWaitTime' is an unknown field.", "code": 40415, "codeName": "Location40415", "operationTime": {"$timestamp": {"t": 1620920773, "i": 1}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1620920773, "i": 1}}, "signature": {"hash": {"$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00"}, "keyId": {"$numberLong": "0"}}}}
After a day of digging into this, I understand that from some version after 4.2.5 (unknown to us at the moment) onwards, all requests going out of mongos carry recordRemoteOpWaitTime: true. Since the rest of the nodes are older than 4.2.13, they cannot recognise this attribute and hence error out.
I cannot find any reference to this in the available documentation. I need to figure out a way to stop mongos passing this attribute to the shards. Any leads would be appreciated!
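A quick way to confirm which binary each node is running is to connect to every mongos, shard, and config server in turn and compare versions (a generic sketch, not specific to this cluster):
// run against each node; both calls report the server's binary version
db.version()
db.adminCommand({ buildInfo: 1 }).version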

Elasticsearch and MongoDB: no river _meta document found after 5 attempts

I have a MongoDB database named news that I am trying to index with ES.
Using these plugins:
richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
and elasticsearch/elasticsearch-mapper-attachments/2.5.0
This is what happens when I try to create the index. I have tried deleting the index and recreating it, but that didn't help.
$ curl -XPUT 'http://localhost:9200/_river/news/_meta' -d @init.json
init.json
{
  "type": "mongodb",
  "mongodb": {
    "db": "news",
    "collection": "entries"
  },
  "index": {
    "name": "news",
    "type": "entries"
  }
}
Here is the log:
update_mapping [mongodb] (dynamic)
MongoDB River Plugin - version[2.0.9] - hash[73ddea5] - time[2015-04-06T21:16:46Z]
setRiverStatus called with mongodb - RUNNING
river mongodb startup pending
Starting river mongodb
MongoDB options: secondaryreadpreference [false], drop_collection [false],
include_collection [], throttlesize [5000], gridfs [false], filter [null],
db [news], collection [entries], script [null], indexing to [news]/[entries]
MongoDB version - 3.0.2
update_mapping [mongodb] (dynamic)
[org.elasticsearch.river.mongodb.CollectionSlurper] Cannot import collection entries into existing index
setRiverStatus called with mongodb - INITIAL_IMPORT_FAILED
Started river mongodb
no river _meta document found after 5 attempts
no river _meta document found after 5 attempts
Any suggestions as to what might be wrong?
I'm running ES 1.5.2 and MongoDB 3.0.2 on OS X.
On the mongodb river GitHub page, it looks like the plugin is supported up to ES 1.4.2, but not higher (i.e. you're running 1.5.2).
Also note that rivers have been deprecated in ES 1.5, and there's an open issue on this very topic in the mongodb river project.
UPDATE after chatting with @martins
Finally, the issue was simply that the river was created with the wrong name (news instead of mongodb). The following command properly creates the mongodb river, which still works with ES 1.5.2 even though it's not officially tested:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d @init.json
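To verify the river definition actually landed under the right name, you can read the _meta document back (same placeholder file and river names as above):
curl -XGET 'http://localhost:9200/_river/mongodb/_meta'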

Elasticsearch throws exception when working with MongoDB river

I followed the link http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-and-Mongo-DB-td4033358.html to integrate Elasticsearch and MongoDB using the mongodb river. The versions of each component are:
ubuntu 12.04 64bit
ES 0.90.0
mongodb 2.4.3
river 1.6.5
MongoDB is standalone, running on a single server, but following this link http://loosexaml.wordpress.com/2012/09/03/how-to-get-a-mongodb-oplog-without-a-full-replica-set/, the oplog has been enabled by configuring replSet and oplogSize in /etc/mongodb.conf, and db.oplog.rs.find() does display some operation records.
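For reference, the oplog-without-a-full-replica-set trick from that post comes down to two lines in the old-style /etc/mongodb.conf; the set name and size below are placeholders:
# start mongod as a (single-node) replica set member so it keeps an oplog
replSet = rs0
oplogSize = 1024   # oplog size in MB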
The index was added by:
curl -XPUT localhost:9200/_river/appdata/_meta -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "test_appdata",
    "collection": "app_collection"
  },
  "index": {
    "name": "test_appdata",
    "type": "app"
  }
}'
But when Elasticsearch started, the log showed the following exception:
[2013-05-07 23:20:40,400][INFO ][river.mongodb ] [Ransak the Reject] [mongodb][app] starting mongodb stream. options: secondaryreadpreference [false], throttlesize [500], gridfs [false], filter [], db [test_appdata], script [null], indexing to [test_appdata]/[app]
Exception in thread "elasticsearch[Sundragon][mongodb_river_slurper][T#1]" java.lang.NoSuchMethodError: org.elasticsearch.action.get.GetResponse.exists()Z
at org.elasticsearch.river.mongodb.MongoDBRiver.getLastTimestamp(MongoDBRiver.java:1088)
at org.elasticsearch.river.mongodb.MongoDBRiver.access$2200(MongoDBRiver.java:93)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.getIndexFilter(MongoDBRiver.java:967)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.oplogCursor(MongoDBRiver.java:1021)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:858)
at java.lang.Thread.run(Thread.java:679)
I'm a newbie to Elasticsearch and MongoDB. Did the replica setting of MongoDB cause the error?
Any suggestion is appreciated.
Your river is not compatible with Elasticsearch 0.90: the stack trace shows the river calling GetResponse.exists(), a method that no longer exists in the 0.90 Java API, hence the NoSuchMethodError.
Move to ES 0.20.6 or ask for a patch in the MongoDB river project.

mongodb 2.4.1 failing with 'process out of memory'

I have the following mongo versions installed on a 64-bit Windows 7 machine:
db version v2.4.1, MongoDB shell version: 2.4.1
and
db version v2.2.1-rc1, pdfile version 4.5, MongoDB shell version: 2.2.1-rc1
I have a collection with 10,001,000 (10 million+) records. When I use v2.4.1 to aggregate it, it fails with the following error:
Fatal error in CALL_AND_RETRY_2
Allocation failed - process out of memory
However, when I use v2.2.1-rc1 to aggregate the same collection, it works fine and gives a result in around 1 minute.
Sample document of the collection that is being aggregated:
{
  "_id" : ObjectId("516bdd1c39b10c722792e007"),
  "f1" : 10000010,
  "f2" : 10000000,
  "key" : 0
}
Aggregation command (run against the testp collection from the populate script below):
db.testp.aggregate({$group: {"_id": "$key", total: {$sum: "$f1"}}})
Command used to populate records:
for (var i = 10011000; i < 10041000; ++i) {
  db.testp.insert({"f1": i + 10, "f2": i, "key": i % 1000})
}
How much memory do you have? Could it be that the $group is taking up more than 10% of available memory and causing the error? See the aggregation documentation on memory limits for cumulative operators.
edit 1:
Out of interest: does the aggregation work outside the shell, e.g. when called from a driver?
I have seen similar V8 errors, and since the shell was updated to V8 in 2.4, there's a chance it could be that.
edit 2:
If the resulting array is too big in the shell, that can also trigger the error: see SERVER-8859. To work around it, you might need to run multiple aggregations, either by doing a $match early on to limit the working set, or even a $skip and $limit to paginate through the result set.
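A sketch of that workaround using the field and collection names from the question (the $match predicate and page size are arbitrary choices, not from the answer):
// limit the working set with an early $match, then page through the groups
db.testp.aggregate([
  { $match: { key: { $lt: 500 } } },                    // first half of the keys
  { $group: { _id: "$key", total: { $sum: "$f1" } } },
  { $sort: { _id: 1 } },
  { $skip: 0 },                                         // advance per page
  { $limit: 500 }
])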
I tried your aggregation with 10,070,999 docs on 2.4.1 on a Mac and didn't get the error.