Solr Data Import Handlers for MongoDB

I am working on a project where we have millions of entries stored in a MongoDB database, and I want to index all of this data using Solr.
After extensive searching I have learned that there is no proper "Data Import Handler" for MongoDB.
Can anyone tell me the proper approaches for indexing data in MongoDB using Solr?
I want to use all the features of Solr and want it to be scalable in real time. I have seen one or two approaches in different posts, but I am not sure how they would work in real time.
Many thanks

10gen introduced the MongoDB Connector. You can integrate MongoDB with Solr using this tool.
Blog post: Introducing Mongo Connector
GitHub page: mongo-connector

I have created a plugin that allows you to load data from MongoDB using the Solr data import handler.
Check it out at:
https://github.com/james75/SolrMongoImporter

I wrote a response to a similar question, except it was about importing data from MySQL into Solr. The example code is in PHP, but it should give you the general idea: set up an iterator to step through your MongoDB documents, map the data onto Solr datatypes, and save it to your Solr index.
If you want it to be near real-time, you could add some custom code to the save mechanism (assuming this can be done with MongoDB) to write directly to the Solr index, then run a commit script via cron every 15 minutes.
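Here is a minimal sketch of that iterator idea in Python rather than PHP, assuming the pymongo and pysolr packages; the database, collection, core, and field names are all hypothetical:

```python
# Sketch: step through MongoDB documents and index them in Solr in batches.
# Assumes pymongo and pysolr; all names below are placeholders.
from pymongo import MongoClient
import pysolr

mongo = MongoClient("mongodb://localhost:27017")
solr = pysolr.Solr("http://localhost:8983/solr/mycore", timeout=10)

batch = []
for doc in mongo.mydb.articles.find():
    # Map MongoDB fields onto the Solr schema; _id becomes the Solr uniqueKey.
    batch.append({
        "id": str(doc["_id"]),
        "title": doc.get("title", ""),
        "body": doc.get("body", ""),
    })
    if len(batch) >= 1000:  # index in chunks to keep memory bounded
        solr.add(batch, commit=False)
        batch = []

if batch:
    solr.add(batch, commit=False)
solr.commit()  # or leave commits to the periodic cron job described above
```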

Related

Can we use neo4j and MongoDB at the same time?

I am new to neo4j, and the current project I'm working on uses MongoDB as its database. I have a new requirement to show recommendations and visualize data using neo4j. I have no option to replace MongoDB with neo4j, and if I use both databases I will have to dump data into neo4j every time to get the updated data from MongoDB. Is there any better solution for this?
You could use MongoDB as a cache and put all of your data in Neo4j, but it is not a good idea to use Neo4j just for data visualization.
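If you do end up periodically dumping MongoDB data into Neo4j, a rough sketch of an idempotent load looks like this, assuming the pymongo package and the official neo4j Python driver; the label, credentials, and field names are made up:

```python
# Sketch: copy MongoDB documents into Neo4j as nodes.
# Assumes pymongo and the official neo4j driver; all names are placeholders.
from pymongo import MongoClient
from neo4j import GraphDatabase

mongo = MongoClient("mongodb://localhost:27017")
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    for user in mongo.mydb.users.find():
        # MERGE keeps the load idempotent: re-running the script updates
        # existing nodes instead of duplicating them.
        session.run(
            "MERGE (u:User {mongoId: $id}) SET u.name = $name",
            id=str(user["_id"]),
            name=user.get("name", ""),
        )
driver.close()
```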

ELK stack - how to search through a MongoDB database?

I'm using the ELK stack for the first time, and I can import data with Logstash, but how do I link my MongoDB to Elastic instead?
Also, what is the best way to import bulk data?
I'm using the MEAN stack and the newest version of ELK (5+). I am not using Beats such as Filebeat, but I am willing to if needed.
First, if you're using Logstash successfully, then you don't need Filebeat (although Filebeat is much lighter-weight than Logstash).
I think you're confused about the other terms: you don't hook MongoDB up to Elastic. In the ELK stack, Logstash is intended to send your logs to Elasticsearch, and Kibana is the UI layer for viewing your data.
If you really want to use MongoDB (although I don't recommend it), then you would be using MongoDB instead of Elasticsearch.
If you are after searching MongoDB data in Elasticsearch, you will need to import it (from Mongo to Elasticsearch).
There are a few ways; one of them is described here: https://stackoverflow.com/a/24119066/6079633 - but I don't think it supports Elastic 5.
And there is https://github.com/ozlerhakan/mongolastic - which, according to the Elasticsearch website, is "a tool that clones data from ElasticSearch to MongoDB and vice versa".
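For a one-off import, a sketch along these lines may help, assuming the pymongo and elasticsearch (elasticsearch-py) packages; the index, type, and collection names are hypothetical:

```python
# Sketch: bulk import a MongoDB collection into Elasticsearch.
# Assumes pymongo and elasticsearch-py; names below are placeholders.
from pymongo import MongoClient
from elasticsearch import Elasticsearch, helpers

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch(["http://localhost:9200"])

def generate_actions():
    for doc in mongo.mydb.logs.find():
        doc_id = str(doc.pop("_id"))  # ObjectId is not JSON-serializable
        yield {
            "_index": "logs",
            "_type": "doc",  # required on Elasticsearch 5.x
            "_id": doc_id,
            "_source": doc,
        }

# helpers.bulk streams the generator, so the whole collection
# never has to sit in memory at once.
helpers.bulk(es, generate_actions())
```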
I know this answer may be late, but it may help other people.
If you need a tool to transfer your data from MongoDB to Elasticsearch, have a look at this Mongoose plugin: https://github.com/mongoosastic/mongoosastic/tree/master/docs. It's a great tool for automatically indexing MongoDB models into Elasticsearch,
and you can transfer your existing collection data by indexing an existing collection in MongoDB.

What are the right practices of using Apache Solr with MongoDB?

I am using Apache Solr 6.3.0 and MongoDB 3.4 for advanced text search features. I have successfully synced MongoDB with my Solr cores using mongo-connector 2.5 and the Solr doc manager.
I want to know the right way and practices to use Solr with Mongo, and I have some issues I need help with:
1) Now that my data is available both in the Mongo database and also indexed and stored in the Solr cores, should I query Solr all the time? Or should I query Solr for text search only and perform the rest of my queries on Mongo?
2) Is there some way I could perform powerful searches directly on the Mongo database using the indexing done by Solr?
3) I have some collections that contain deeply nested JSON data, which MongoDB supports well. Solr indexes and stores such data in flattened form, but I want to keep the original nested JSON format in the query response. Is this something I can achieve with Solr?
Other suggestions about good practices for using Solr with MongoDB would be extremely helpful.
If it makes sense to just query Solr, do that; if it makes sense to query Solr only for certain data, do that. It depends on your use case, but if every query can be answered from the data in Solr, it's perfectly fine to use Solr for everything. That will probably allow more efficient use of your caches.
No, not that I know of.
Not really. Solr isn't well suited to nested JSON: even if you use parent/child documents, it's something you'll have to handle manually in every situation, and it will require special-casing all over.
In those situations you can use Solr for querying, get the ids back, and then retrieve the actual documents from Mongo with their JSON structure intact. In that case you can leave most fields as non-stored in Solr.
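A sketch of that "search in Solr, fetch from Mongo" pattern, assuming the pysolr and pymongo packages; the core, collection, query, and field names are illustrative:

```python
# Sketch: full-text search in Solr, then fetch nested documents from MongoDB.
# Assumes pysolr and pymongo; all names below are placeholders.
import pysolr
from bson import ObjectId
from pymongo import MongoClient

solr = pysolr.Solr("http://localhost:8983/solr/products")
mongo = MongoClient("mongodb://localhost:27017")

# Ask Solr only for the unique key; the full documents stay in MongoDB.
results = solr.search("description:waterproof", fl="id", rows=50)
ids = [ObjectId(hit["id"]) for hit in results]

# Retrieve the matching documents with their nested JSON structure intact.
docs = list(mongo.shop.products.find({"_id": {"$in": ids}}))
```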

Which search approach is feasible: Elasticsearch + MongoDB, or MongoDB text indexes?

In my project I have to implement text search, and I have to choose the more feasible of two approaches:
Synchronising the MongoDB database with Elasticsearch.
MongoDB's own text indexes, which offer Elasticsearch-like text searching capabilities.
I have gone through many articles that cover the pros of each case, but I haven't found any document that compares the two approaches and says which is better, or what the limitations of each approach are.
Note: I am using Node.js with Express.js.
MongoDB is a general-purpose database; Elasticsearch is a distributed text search engine backed by Lucene. You can store data in MongoDB and use Elasticsearch exclusively for its full-text search capabilities. Depending on your use case, you can send only a subset of the Mongo data fields to Elastic.
Synchronising MongoDB with Elasticsearch can be done with mongoosastic. You can address data-safety concerns by persisting in Mongo, and speed up your search by using Elasticsearch. You can also use a Python script with the elasticsearch.py package to sync Mongo and Elasticsearch (see the sketch below).
MongoDB search is very slow compared to Elasticsearch, and indexing in MongoDB takes more time and more resources.
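As a sketch of such a sync script in Python, one option is a MongoDB change stream feeding the Elasticsearch client. This assumes the pymongo and elasticsearch packages and MongoDB 3.6+ running as a replica set, and the index/collection names are made up:

```python
# Sketch: keep Elasticsearch in sync with a MongoDB collection via a change stream.
# Assumes pymongo and elasticsearch, plus MongoDB 3.6+ on a replica set.
from pymongo import MongoClient
from elasticsearch import Elasticsearch

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch(["http://localhost:9200"])

# full_document="updateLookup" makes update events carry the whole document.
with mongo.mydb.posts.watch(full_document="updateLookup") as stream:
    for change in stream:
        key = str(change["documentKey"]["_id"])
        op = change["operationType"]
        if op in ("insert", "update", "replace"):
            doc = dict(change["fullDocument"])
            doc.pop("_id", None)  # ObjectId is not JSON-serializable
            es.index(index="posts", id=key, body=doc)
        elif op == "delete":
            es.delete(index="posts", id=key)
```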

MongoDB integration with Solr

I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need info on the points below.
I have the data in MongoDB, and for faster retrieval we are integrating it with Solr.
Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries inserted after the integration?
If we need to periodically update Solr, it becomes an extra overhead to maintain the data in Solr as well as in MongoDB. What are the best approaches for overcoming this?
As far as I know there is no official (supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas/directions.
For me the best approach, when it is possible to modify the application, is to have the persistence layer perform all write operations against MongoDB and Solr at the "same" time (a sketch of this dual-write idea follows below). That way you can control exactly what you send to the database and what you index for full-text search. But as I said, this means you have to change your application code. (You will have to change it anyway to be able to query Solr when needed.) And yes, you have to index all the existing documents the first time.
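A minimal sketch of such a dual write, assuming the pymongo and pysolr packages; the collection, core, and field mapping are hypothetical. Note there is no transaction spanning both stores, so a failure between the two writes can still leave them briefly out of sync:

```python
# Sketch: persist to MongoDB, then index the searchable subset in Solr.
# Assumes pymongo and pysolr; all names below are placeholders.
from pymongo import MongoClient
import pysolr

mongo = MongoClient("mongodb://localhost:27017")
solr = pysolr.Solr("http://localhost:8983/solr/articles")

def save_article(article):
    """Write the document to MongoDB, then index selected fields in Solr."""
    result = mongo.cms.articles.insert_one(article)
    solr.add([{
        "id": str(result.inserted_id),
        "title": article.get("title", ""),
        "body": article.get("body", ""),
    }], commit=True)  # in production you would likely batch or defer commits
    return result.inserted_id
```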
You can use a "connector" approach, where MongoDB and Solr are connected together; this could be done in various ways.
You can use, for example, the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it, so I cannot comment, but it is also an approach).
Your point #2 is true: you have to manage two clusters, make sure the data stay in sync, and sometimes pay the price of inconsistency between the Solr index and a document just updated in MongoDB. So you need to see whether the best approach for your application is MongoDB alone or MongoDB with Solr (see the comment below).
Just a small comment in addition to this answer:
You are talking about "faster retrieval", but I'm not sure that should be the reason: if you write correct queries with correct indexes in MongoDB, you should be able to achieve this without Solr. It makes sense if your requirement is really oriented towards the power of Solr, meaning a full-text index (with all its related features).
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geo API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/ (a quick sketch follows below). So it would be ideal to identify whether your need fits into MongoDB or you need to spill over to Solr.
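A quick sketch of MongoDB's built-in text search, assuming the pymongo package; the collection and field names are illustrative:

```python
# Sketch: create a text index and run a $text query in MongoDB.
# Assumes pymongo; all names below are placeholders.
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
articles = mongo.mydb.articles

# A text index can cover one or more string fields.
articles.create_index([("title", "text"), ("body", "text")])

# $text searches the index; $meta exposes the relevance score for sorting.
cursor = articles.find(
    {"$text": {"$search": "mongodb solr"}},
    {"score": {"$meta": "textScore"}},
).sort([("score", {"$meta": "textScore"})])

for doc in cursor:
    print(doc["title"], doc["score"])
```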
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job that re-indexes once a day may work for you. Ideally, Solr works well for some form of master data.