How to sync between MongoDB and Elasticsearch?

I have a Scala microservice that serves as a database API, and the database I am using is MongoDB.
I want to add Elasticsearch, which will contain all the data that my MongoDB has, and I need to keep it in sync whenever MongoDB is updated. How can I achieve this?
What would be the best approach? Are there plugins or tools that can help me with this task?

Look at the article "5 Different Ways to Synchronize Data from MongoDB to ElasticSearch". Personally, I did it with Logstash: I simply filtered one collection and dumped it to ES every 24 hours. The use case is key to determining which strategy/tool to use.
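A minimal Python sketch of such a "filter one collection and dump it every 24 hours" job. It assumes plain dicts stand in for the MongoDB collection and the Elasticsearch index, and `full_dump` is a hypothetical name; a real job would use a MongoDB query and the Elasticsearch bulk API (or Logstash, as described above) instead.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]

def full_dump(source: List[Doc], index: Dict[str, Doc],
              keep: Callable[[Doc], bool]) -> Dict[str, int]:
    """Mirror the filtered source collection into the index, keyed by _id."""
    wanted = {str(doc["_id"]): doc for doc in source if keep(doc)}
    for doc_id, doc in wanted.items():
        # Elasticsearch keeps the id as document metadata, so it is left out of the body.
        index[doc_id] = {k: v for k, v in doc.items() if k != "_id"}
    # Remove entries whose source document was deleted or no longer passes the filter.
    stale = [doc_id for doc_id in index if doc_id not in wanted]
    for doc_id in stale:
        del index[doc_id]
    return {"upserted": len(wanted), "deleted": len(stale)}
```

Run it from cron or any scheduler once a day. Because it also deletes entries that no longer match, the index mirrors the filtered collection after each run, at the cost of reading the whole collection every time.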

Related

What are the right practices for using Apache Solr with MongoDB?

I am using Apache Solr 6.3.0 and MongoDB 3.4 for advanced text search features. I have successfully synced MongoDB with Solr cores using mongo-connector 2.5 and the Solr doc manager.
I want to know the right way and practices for using Solr with MongoDB, and I have some issues I need help with:
1. Now that my data is both stored in MongoDB and indexed and stored in Solr cores, should I query Solr all the time? Or should I query Solr for text search only and run the rest of the queries against MongoDB?
2. Is there some way I could perform powerful searches directly on the MongoDB database using the indexing done by Solr?
3. Some of my collections contain deeply nested JSON data, which MongoDB supports well. Solr indexes and stores such data in flattened form, but I want to keep the original nested JSON format in the query response. Is this something I can achieve with Solr?
Any other suggestions about good practices for using Solr with MongoDB would be extremely helpful.
1. If it makes sense to just query Solr, do that. If it makes sense to query Solr only for certain data, do that. It depends on your use case, but if every query can be answered with the data in Solr, it's perfectly fine to use Solr for everything. That will probably allow more efficient use of your caches.
2. No, not that I know of.
3. Not really. Solr isn't well suited for nested JSON (even with parent/child documents, it's something you'll have to handle manually in every situation, and it will require special-casing all over).
In those situations you can use Solr for querying, get the ids back, and then retrieve the actual documents from MongoDB with their JSON structure intact. In that case you can leave most fields as non-stored in Solr.
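The "query Solr, get the ids, fetch the documents from MongoDB" pattern can be sketched in Python as follows. `fetch_by_ids` is a hypothetical stand-in for a MongoDB query such as `find({"_id": {"$in": ids}})`, which does not return documents in the order of the id list, so the sketch re-applies Solr's ranking itself.

```python
from typing import Callable, Iterable, List

Doc = dict

def hydrate(ranked_ids: List[str],
            fetch_by_ids: Callable[[Iterable[str]], List[Doc]]) -> List[Doc]:
    """Return the full documents in the order the search engine ranked them."""
    # One round trip for all ids; like an $in query, result order is not guaranteed.
    found = {doc["_id"]: doc for doc in fetch_by_ids(ranked_ids)}
    # Re-apply the relevance order and skip ids deleted from the database
    # after they were indexed (the index may lag behind MongoDB).
    return [found[i] for i in ranked_ids if i in found]
```

Since the nested documents come back from MongoDB untouched, Solr only needs the fields it searches on.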

MongoDB with ElasticSearch

I'm looking for the right way to use ElasticSearch with MongoDB. I want to store several pieces of information in MongoDB. Additionally, I want to store a larger text in ElasticSearch to support complex full-text search.
My problem at the moment is:
I'm not sure what the best solution for this is. Most solutions I found to synchronize MongoDB with ElasticSearch use a "river", which is deprecated!
What is the best way to combine these two technologies?
Is saving the data in both MongoDB and ElasticSearch even the best way?
I found multiple articles explaining that ElasticSearch alone is not safe enough and that you have to use another DBMS.
Also, under robustness on the MongoDB website, I found this:
Unfortunately, Elasticsearch (and the components it's made of) does not currently handle OutOfMemory-errors very well.
[source]
So storing the data redundantly is probably the best way.
Thanks in advance!
Hi,
We are also working with both Elasticsearch and MongoDB. We started with a river, but after having a lot of issues with it, we got rid of it even before it was deprecated. The way we do it: when saving data to MongoDB, we create a message in a queue that notifies the search storage to perform the insert/delete operation with the given data.
So basically we keep them in sync manually, and there will always be a delay between MongoDB and Elasticsearch. The good part is that if Elasticsearch fails, we have implemented an endpoint that reimports the data from MongoDB into ES. Also, the structure inside ES is different from the one in MongoDB. This was a lot more complicated to do with the river; we even had our own custom implementation of it.
Hope my answer helps at least a bit.
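A minimal in-process Python sketch of this queue-based sync. The class name and the `to_search_doc` mapping are hypothetical, `queue.Queue` stands in for a real message broker, and dicts stand in for MongoDB and Elasticsearch. One deliberate difference from the answer: the message carries only the document id and the consumer re-reads the current document, an assumption that makes duplicate or late messages harmless.

```python
import queue
from typing import Dict

class SyncedRepository:
    """Hypothetical repository: write to the primary store, then queue a search update."""

    def __init__(self) -> None:
        self.primary: Dict[str, dict] = {}  # stand-in for the MongoDB collection
        self.search: Dict[str, dict] = {}   # stand-in for the Elasticsearch index
        self.events: "queue.Queue[tuple]" = queue.Queue()

    def save(self, doc_id: str, doc: dict) -> None:
        self.primary[doc_id] = doc
        self.events.put(("upsert", doc_id))

    def delete(self, doc_id: str) -> None:
        self.primary.pop(doc_id, None)
        self.events.put(("delete", doc_id))

    @staticmethod
    def to_search_doc(doc: dict) -> dict:
        # The index can have a different shape than MongoDB; here only the text fields.
        return {"title": doc.get("title", ""), "body": doc.get("body", "")}

    def drain(self) -> int:
        """Consumer side: apply queued operations to the search index."""
        applied = 0
        while True:
            try:
                op, doc_id = self.events.get_nowait()
            except queue.Empty:
                return applied
            doc = self.primary.get(doc_id)
            if op == "upsert" and doc is not None:
                self.search[doc_id] = self.to_search_doc(doc)
            else:
                # A delete, or an upsert for a document removed in the meantime.
                self.search.pop(doc_id, None)
            applied += 1

    def reimport(self) -> None:
        """Rebuild the whole index from the primary store, e.g. after an ES failure."""
        self.search = {k: self.to_search_doc(v) for k, v in self.primary.items()}
```

Until `drain()` runs, the index lags behind the primary store, which is the delay the answer mentions.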

MongoDB integration with Solr

I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need information on the following:
I have the data in MongoDB; for faster retrieval, we are integrating it with Solr.
1. Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries inserted after the integration?
2. If we need to periodically update Solr, maintaining the data in Solr as well as in MongoDB becomes extra overhead. What are the best approaches to overcome this?
As far as I know, there is no official (supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas and directions.
For me, the best approach, when possible, is to modify the application so that its persistence layer performs all write operations in MongoDB and Solr at the "same" time. That way you control exactly what you send to the database and what you index for full-text search. But as I said, this means you have to change your application code. (You will have to change it anyway to be able to query Solr when needed.) And yes, you have to index all the existing documents the first time.
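A rough Python sketch of such a persistence layer, assuming a dict stands in for MongoDB and `index_writer` is a hypothetical callback wrapping whatever Solr client you use. Because MongoDB and Solr are not updated atomically, the sketch records ids whose index write failed so they can be retried, and `backfill()` covers the first-time indexing of existing documents.

```python
from typing import Callable, Dict, Iterable, Optional, Set

class DualWriteRepository:
    """Hypothetical persistence layer writing each document to the database and the index."""

    def __init__(self, index_writer: Callable[[str, dict], None],
                 indexed_fields: Iterable[str],
                 existing: Optional[Dict[str, dict]] = None) -> None:
        self.db: Dict[str, dict] = dict(existing or {})  # stand-in for MongoDB
        self.index_writer = index_writer                  # wraps a Solr add/update call
        self.indexed_fields = tuple(indexed_fields)
        self.pending: Set[str] = set()                    # ids whose index write failed

    def _index(self, doc_id: str) -> None:
        doc = self.db[doc_id]
        # Send only the fields chosen for full-text search.
        fields = {f: doc[f] for f in self.indexed_fields if f in doc}
        try:
            self.index_writer(doc_id, fields)
            self.pending.discard(doc_id)
        except Exception:
            # No cross-store transaction: keep the id so the write can be retried.
            self.pending.add(doc_id)

    def save(self, doc_id: str, doc: dict) -> None:
        self.db[doc_id] = doc  # written first; the database stays the source of truth
        self._index(doc_id)

    def backfill(self) -> None:
        """First-time indexing of the documents that existed before the integration."""
        for doc_id in list(self.db):
            self._index(doc_id)

    def retry_pending(self) -> None:
        for doc_id in list(self.pending):
            self._index(doc_id)
```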
You can also use a "connector" approach, where MongoDB and Solr are connected to each other; this can be done in various ways.
For example, you can use the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it, so I cannot comment on it, but it is also an approach.)
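Connectors such as mongo-connector work by tailing MongoDB's oplog and replaying each operation on the target. The Python sketch below only simulates that loop: `OplogEntry` and `Connector` are hypothetical names, the log is a plain list, and updates carry the full document (real oplog update entries usually describe the change rather than the whole document). The checkpoint is what lets a restarted connector resume without re-applying operations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class OplogEntry:
    ts: int                     # monotonically increasing position in the log
    op: str                     # "i" insert, "u" update, "d" delete (oplog-style codes)
    doc_id: str
    doc: Optional[dict] = None  # simplified: inserts and updates carry the full document

@dataclass
class Connector:
    index: Dict[str, dict] = field(default_factory=dict)  # stand-in for the Solr core
    checkpoint: int = 0  # last applied ts; a real connector persists this

    def apply(self, oplog: List[OplogEntry]) -> int:
        """Replay entries newer than the checkpoint; return how many were applied."""
        applied = 0
        for entry in oplog:
            if entry.ts <= self.checkpoint:
                continue  # already applied, e.g. before a restart
            if entry.op in ("i", "u"):
                self.index[entry.doc_id] = dict(entry.doc or {})
            elif entry.op == "d":
                self.index.pop(entry.doc_id, None)
            self.checkpoint = entry.ts
            applied += 1
        return applied
```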
Your point #2 is true: you have to manage two clusters and make sure the data are in sync, and sometimes you pay the price of inconsistency between the Solr index and a document just updated in MongoDB. So you need to decide whether the best approach for your application is MongoDB alone or MongoDB with Solr (see the comment below).
Just a small comment in addition to this answer:
You mention "faster retrieval"; I am not sure that should be the reason. If you write correct queries with the right indexes in MongoDB, you should be able to achieve it without Solr. If your requirement is really oriented towards the power of Solr, meaning full-text indexing with all its related features, then it makes sense.
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geospatial API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/. So ideally you should determine whether your needs fit within MongoDB or you need to spill over to Solr.
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job that re-indexes once a day may work for you. Ideally, Solr works well for some form of master data.
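A daily batch does not have to re-index everything. A common variant, assumed here rather than taken from the answer, re-indexes only documents whose (hypothetical) `updated_at` field changed since the last run, as in this Python sketch where a list and a dict stand in for MongoDB and Solr. A timestamp watermark does not see deletions, so deleted documents still need an occasional full re-index or a soft-delete flag.

```python
from datetime import datetime
from typing import Dict, List, Tuple

def incremental_reindex(source: List[dict], index: Dict[str, dict],
                        last_run: datetime) -> Tuple[int, datetime]:
    """Re-index documents changed since last_run; return (count, new watermark)."""
    # In MongoDB this filter would be a query on the (hypothetical) updated_at field.
    changed = [d for d in source if d["updated_at"] > last_run]
    for doc in changed:
        index[str(doc["_id"])] = {"title": doc["title"]}  # hypothetical indexed field
    # Advance to the newest timestamp actually seen rather than to "now", so a
    # document saved while the job runs is not silently skipped next time.
    new_mark = max((d["updated_at"] for d in changed), default=last_run)
    return len(changed), new_mark
```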

What is the typical usage of ElasticSearch in conjunction with other storage?

It is not recommended to use ElasticSearch as the only storage, for some obvious reasons like security, transactions, etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search them efficiently by some of their properties. What I'd do is store the full document in Mongo as usual, then trigger an insertion into ElasticSearch, but insert only the searchable properties plus the MongoDB ObjectID there. Then I can search using ElasticSearch and, having found the ObjectID, go to Mongo and fetch the whole documents.
Is this the correct usage of ElasticSearch? I don't want to duplicate all the data, since I already have it in Mongo.
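The "searchable properties plus ObjectID" idea can be sketched in Python as a function that builds the slim index document from the full MongoDB document. `to_index_doc` and `get_path` are hypothetical helpers, the dotted-path convention for nested fields is an assumption, and `_id` is assumed to be an ObjectId or a string.

```python
from typing import Any, Dict, Iterable, Tuple

_MISSING = object()

def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path like "author.name" through nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current

def to_index_doc(doc: dict, searchable: Iterable[str]) -> Tuple[str, Dict[str, Any]]:
    """Build (document id, slim search body) holding only the searchable properties."""
    # str() turns a bson ObjectId into its hex form; string ids pass through unchanged.
    doc_id = str(doc["_id"])
    body: Dict[str, Any] = {}
    for path in searchable:
        value = get_path(doc, path)
        if value is not _MISSING:
            body[path] = value
    return doc_id, body
```

Indexing under the MongoDB id keeps the round trip simple: the search hit's id is the key for the follow-up lookup in Mongo.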
For now, the best practice is to duplicate documents in ES.
The nice thing here is that when you search, you don't have to go back to your database to fetch content, as ES provides it in a single call.
The ES search response contains everything you need to display results to your user.
My 2 cents.
You may want to use the MongoDB river; take a look at this post.
There are more issues than the size of the data you store or index. You might want to keep MongoDB as a backup with "near real-time" queries for inserted data, and as a queue for the data to be indexed (you may want to run MongoDB as a cluster with the write concern suited to your application).

Solr Data Import Handlers for MongoDB

I am working on a project where we have millions of entries stored in a MongoDB database, and I want to index all this data using Solr.
After extensive searching, I found that there are no proper "Data Import Handlers" for MongoDB.
Can anyone tell me the proper approaches for indexing MongoDB data in Solr?
I want to use all the features of Solr and want it to scale in real time. I saw one or two approaches in different posts, but I'm not sure how they will work in real time.
Many thanks
10gen introduced the MongoDB Connector. You can integrate MongoDB with Solr using this tool.
Blog post: Introducing Mongo Connector
GitHub page: mongo-connector
I have created a plugin that lets you load data from MongoDB using the Solr Data Import Handler.
Check it out at:
https://github.com/james75/SolrMongoImporter
I wrote a response to a similar question, except it was about importing data from MySQL into Solr. The example code is in PHP, but it should give you a general idea. All you need to do is set up an iterator to step through your MongoDB documents, convert the data to Solr datatypes, and then save it to your Solr index.
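Applied to MongoDB, the iterate/convert/save steps might look like the Python sketch below. It is an illustration under stated assumptions, not a published import handler: nested objects are flattened into dotted field names, naive datetimes are assumed to be UTC (as pymongo returns them by default) and become the "...Z" form Solr date fields use, other non-JSON values such as ObjectId become strings, and `_id` is mapped to an `id` field, assuming the schema's uniqueKey is `id`. All function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator

def to_solr_value(value: Any) -> Any:
    """Convert one scalar into something a Solr JSON update accepts."""
    if isinstance(value, datetime):
        # UTC ISO 8601 with a trailing "Z"; sub-second precision is dropped here.
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)  # e.g. a bson ObjectId becomes its hex string

def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys; lists of scalars stay multi-valued."""
    out: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
            # Lists holding objects: the list position becomes part of the key.
            for i, item in enumerate(value):
                out.update(flatten({str(i): item}, name + "."))
        elif isinstance(value, list):
            out[name] = [to_solr_value(v) for v in value]
        else:
            out[name] = to_solr_value(value)
    return out

def iter_solr_docs(mongo_docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Step through source documents, mapping MongoDB's _id to an 'id' field."""
    for doc in mongo_docs:
        flat = flatten({k: v for k, v in doc.items() if k != "_id"})
        flat["id"] = str(doc["_id"])
        yield flat
```

Batches from `iter_solr_docs` could then be posted to Solr's JSON update handler with your Solr client of choice.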
If you want it to be real-time, you could add custom code to the save mechanism (assuming this can be done with MongoDB) that saves directly to the Solr index, and then run a commit script via cron to commit the data every 15 minutes.
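The "save directly, commit on a schedule" idea can be modelled in Python as below. `BufferedSolrWriter` is a hypothetical name, the clock is injected so the behaviour is testable, and two dicts stand in for documents Solr has received but not yet committed versus documents visible to searches. `on_save` is the hook in the application's save path; `maybe_commit` is what the cron job would call.

```python
from typing import Callable, Dict

class BufferedSolrWriter:
    """Hypothetical writer: add documents on save, make them visible on a periodic commit."""

    def __init__(self, clock: Callable[[], float], commit_every: float = 15 * 60) -> None:
        self.clock = clock                # injectable time source, in seconds
        self.commit_every = commit_every  # 15 minutes, matching the cron schedule
        self.uncommitted: Dict[str, dict] = {}  # sent to the index, not yet searchable
        self.visible: Dict[str, dict] = {}      # what searches currently see
        self.last_commit = clock()

    def on_save(self, doc_id: str, doc: dict) -> None:
        """Called from the application's save path, right after the MongoDB write."""
        self.uncommitted[doc_id] = doc

    def maybe_commit(self) -> bool:
        """Called by the scheduled job: commit only if the interval has elapsed."""
        if self.clock() - self.last_commit < self.commit_every:
            return False
        self.visible.update(self.uncommitted)
        self.uncommitted.clear()
        self.last_commit = self.clock()
        return True
```

Solr can also do this without cron: its autoCommit settings in solrconfig.xml, or the commitWithin parameter on update requests, bound how long added documents stay invisible.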