Index mongoDB with ElasticSearch - mongodb

I already have MongoDB and installed Elasticsearch with Mongoriver. So I set up my river:
$ curl -X PUT localhost:9200/_river/database_test/_meta -d '{
"type": "mongodb",
"mongodb": {
"servers": [
{
"host": "127.0.0.1",
"port": 27017
}
],
"options": {
"secondary_read_preference": true
},
"db": "database_test",
"collection": "event"
},
"index": {
"name": "database_test",
"type": "event"
}
}'
I simply want to get events that have country:Canada so I try:
$ curl -XGET 'http://localhost:9200/database_test/_search?q=country:Canada'
And I get:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I am searching the web and I read that I should first index my collection with Elasticsearch (lost the link). Should I index my Mongodb? What should I do to get results from an existing MongoDB collection?

The mongodb river relies on the operations log of MongoDB to index documents, so it is a requirement that you create your mongo database as a replica set. I assume that you're missing it, so when you create the river, the initial import sees nothing to index. I am also assuming that you're on Linux and you have a handle on the shell cli tools, so try this:
Follow these steps:
Make sure that the mapper-attachments Elasticsearch plugins is also installed
Make a backup of your database with mongodump
edit mongodb.conf (usually in /etc/mongodb.conf, but varies on how you installed it) and add the line:
replSet = rs0
"rs0" is the name of the replicaset, it can be whatever you like.
restart your mongo and then log in its console. Type:
rs.initiate()
rs.slaveOk()
The prompt will change to rs0:PRIMARY>
Now create your river just as you did in the question and restore your database with mongorestore. Elasticsearch should index your documents.
I recomend using this plugin: http://mobz.github.io/elasticsearch-head/ to navigate your indexes and rivers and make sure your data got indexed.
If that doesnt work, please post which versions you are using for the mongodb-river-plugin, elasticsearch and mongodb.

Related

River Plugin Not_analyzed option for Elasticsearch

I'm using Elasticsearch 1.1.1, River Plugin and MongoDB 2.4
I have a field called cidr that is being analyzed. I need to set it so that it is not_analyzed anymore to use it with Kibana correctly. Following is the index I used. But now Im going to reindex it again (delete and write a new one.)
Whats the proper way to write a new index in a way that the values in the "cidr" field are not analyzed? Thank you.
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "collective_name",
"collection": "ips"
},
"index": {
"name": "mongoindex"
}
}'
I see. It's working now. Mapping should be created BEFORE creating the index.
curl -XPUT "localhost:9200/mongoindex" -d '
{
"mappings": {
"mongodb" : {
"properties": {
"cidr": {"type":"string", "index" : "not_analyzed"}
}
}
}
}'
This is it. :)

updates in mongodb not getting pushed in elasticsearch

I am using elasticsearch and mongodb. Setup river week ago. Today on searching records just check the elasticsearch index are not getting updated or modified as records in mongodb are added daily.
I am using following lines to create river
curl -XPUT "localhost:9200/_river/myindex/_meta" -d '
{
"type": "mongodb",
"mongodb": {
"host": "localhost",
"port": "27017",
"db": "qna",
"collection": "collection"
},
"index": {
"name": "myindex",
"type": "index_type"
}
}'

Not all documents are indexed with ElasticSearch and MongoDB

I am using ElasticSearch 1.1.0 (I was running 1.2.0 but had issues with a ElasticSearch plugin) and MongoDB 2.6.1. I've installed them using the tutorial provided at enter link description here. When I create an index using
curl -XPUT "localhost:9200/_river/tenders/_meta" -d '{
"type": "mongodb",
"mongodb": {
"servers": [
{ "host": "127.0.0.1", "port": 27017 }
],
"options": { "secondary_read_preference": true },
"db": "tenderdb",
"collection": "tenders"
},
"index": {
"name": "tendersidx",
"type": "page"
}
}'
Indexing starts fine of the collection but it only indexes a part of the collection. E.g. the collection has at the moment 5184 records while only 1060 are indexed.
Avish's comment did the trick, he wrote: "ElasticSearch rivers only monitor changes in the other data store; your river should only track documents added to the collection after the river has been set up."

ElasticSearch with river to MongoDB

I have installed and configured MongoDB and ES with mongodb river. But I'm not sure if I really understand rivers in ES. For example, I want index collection "users" from mongodb.
I will send curl PUT/POST request to url /_river/mongodb_users/_meta
{
"type": "mongodb",
"mongodb": {
"db": "somedb",
"collection": "users"
},
"index": {
"name": "users",
"type": "user"
}
}
But now, I want to index second collection, for example "users2". I really need to create new river using curl POST/PUT on url like /_river/mongodb_users2/_meta with JSON:
{
"type": "mongodb",
"mongodb": {
"db": "somedb",
"collection": "users2"
},
"index": {
"name": "users2",
"type": "user"
}
}
I can not use already created river "mongodb_users"? I will need create one river for one collection?
Thank you for explanation!
Yes. The way MongoDB river works does not allow to fetch content from more than one collection in a single river.
But, you can create as many river as you need.
That said, if you want to index users1 to users type in Elasticsearch and users2 to the same users type, you can (as soon as they don't use the same IDs).
Just modify index.type to "users".
Does it help?

com.mongodb.MongoException: not talking to master and retries used up

My search is not working now. I guess because my index was not configured for replica set:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "mongo",
"host": "local",
"port": "40000",
"collection": "users"
},
"index": {
"name": "api",
"type": "users"
}
}'`
Is there anyway to declare a replica set properly so that elasticsearch can find the master, the way PHP driver does:
$m = new Mongo(
"mongodb://localhost:40000,localhost:41000",
array("replicaSet" => true)
);
so that elasticsearch can automatically fail over to another member.
I solved this simply by updating to the latest version of the client driver.
The previous (minor) version had trouble connecting to the latest mongo server.