MongoDB with ElasticSearch

I'm looking for the right way to use ElasticSearch with MongoDB. I want to save several pieces of information in MongoDB. Additionally, I want to store a larger text with ElasticSearch to support complex full-text search.
My problem at the moment is:
I'm not sure what the best solution for this is. Most solutions I found for synchronizing MongoDB with ElasticSearch use "river", which is deprecated!
What is the best way to combine these two technologies?
Is it even a good idea to save the data in both MongoDB and ElasticSearch?
I found multiple articles explaining that ElasticSearch alone is not safe enough and that you have to use another DBMS alongside it.
Also, under robustness in the Elasticsearch documentation I found this:
Unfortunately, Elasticsearch (and the components it's made of) does not currently handle OutOfMemory-errors very well.
[source]
So storing the data redundantly is probably the best way.
Thanks in advance!

Hi,
We are also working with both Elasticsearch and MongoDB. We started with a river and, after having a lot of issues with it, got rid of it before it became deprecated. The way we do it is: when saving data to Mongo, we create a message in a queue which notifies the search storage to perform the insert/delete operation on the given data.
So basically we keep them in sync manually, and there will always be a delay between Mongo and Elasticsearch. The good part is that if Elasticsearch fails, we have an endpoint which reimports the data from Mongo to ES. Also, the structure inside ES is different from the one in Mongo. Doing this with the river was a lot more complicated; we even ended up with our own custom implementation.
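A minimal sketch of that queue-based sync pattern, assuming Redis as the queue and the pymongo/elasticsearch-py clients (the answer does not say which queue or clients are actually used, and the field names are invented):

```python
# Queue-based sync sketch: write to MongoDB, enqueue a message, and let a
# worker apply the change to Elasticsearch (8.x-style client calls).
import json

import redis
from pymongo import MongoClient
from elasticsearch import Elasticsearch

mongo = MongoClient("mongodb://localhost:27017")
queue = redis.Redis()
es = Elasticsearch("http://localhost:9200")

def save_article(doc):
    """Writer side: persist the full document to MongoDB, then enqueue a message."""
    result = mongo.blog.articles.insert_one(doc)
    queue.rpush("search-sync", json.dumps({
        "op": "index",
        "id": str(result.inserted_id),
        "body": {"title": doc["title"], "text": doc["text"]},  # only the searchable fields
    }))

def sync_worker():
    """Consumer side: apply queued operations to Elasticsearch, with some delay."""
    while True:
        _, raw = queue.blpop("search-sync")
        msg = json.loads(raw)
        if msg["op"] == "index":
            es.index(index="articles", id=msg["id"], document=msg["body"])
        elif msg["op"] == "delete":
            es.delete(index="articles", id=msg["id"])
```

The reimport endpoint mentioned above would simply iterate the MongoDB collection and enqueue (or index) every document again.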
Hope my answer helps at least a bit.

Related

MongoDB Document of Size 300kb taking 8-15s

I am using the MongoDB Atlas free tier hosted on GCP. I have documents with arrays containing 300kb of data. A simple get-by-id query takes around 8-15 seconds. There are fewer than 50 records in the collection, so indexing is probably not an issue. Also, I have used my own custom ids, not the built-in ObjectIds, in my collection. Is this much query time normal? If yes, what are some ways to address this issue, as I need fast realtime analytics on the frontend? I already have Redis in mind, but is there any better way to address this?
Ensure your operations are not throttled. https://docs.atlas.mongodb.com/reference/free-shared-limitations/
Test performance with a different driver (another language); verify you are using the most recent driver releases.
Test smaller documents to identify whether time is being expended on the server or over the network.
Test with mongo shell.
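A rough way to separate server-side execution time from network/driver overhead, assuming pymongo (the connection string, collection name and id are placeholders):

```python
# Compare wall-clock time of a round trip with the server's own execution
# time reported by explain(). A large gap points at the network or driver.
import time
from pymongo import MongoClient

coll = MongoClient("mongodb+srv://your-cluster-uri").mydb.records

start = time.perf_counter()
doc = coll.find_one({"_id": "my-custom-id"})
wall_ms = (time.perf_counter() - start) * 1000

stats = coll.find({"_id": "my-custom-id"}).explain()["executionStats"]
print(f"wall clock: {wall_ms:.0f} ms, server: {stats['executionTimeMillis']} ms")
```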
As for an answer, I recommend not using the M0 Atlas tier, or at least choosing it wisely: don't pick a US-based cluster if you are thousands of miles away from the States. Don't get me wrong, it's a good product, but it depends on your costs.
As for myself, I prefer to use MongoDB Community Edition and deploy it on my own VPS/VDS. Of course it doesn't provide the nice web interface you have seen in Atlas, and there is no Realm (Stitch) functionality, but you can design that yourself. Also, every performance issue is then up to you.
As for me, I use MongoDB not for real-time data but for visual snapshots on the front end, and I have no problems with performance. If I do have any, I deal with them myself: indexing, increasing VPS CPU/RAM, optimizing queries, and so on.
Also, one more thing about your problem: «I have documents which have arrays containing 300kb data».
If you have an array field in your schema and it stores lots of data, especially embedded docs, are you sure you are using the right schema pattern?
You might want to take a look at the articles at MongoDB University about schema design patterns.
It will probably be much better for you to keep the embedded docs in a separate collection and request them via a $lookup aggregation when they are needed.
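For illustration, such a $lookup could look like this (assuming pymongo; all collection and field names here are invented):

```python
# Keep the heavy array items in their own collection and join them back
# with $lookup only when they are actually needed.
from pymongo import MongoClient

db = MongoClient().analytics

pipeline = [
    {"$match": {"_id": "my-custom-id"}},
    {"$lookup": {
        "from": "measurements",      # the externalized array items
        "localField": "_id",
        "foreignField": "parentId",
        "as": "measurements",
    }},
]
doc_with_details = list(db.reports.aggregate(pipeline))
```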

MongoDB integration with Solr

I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need info on the points below.
I have the data in MongoDB; for faster retrieval we are integrating it with Solr.
Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries inserted after the integration?
If we need to periodically update Solr, it becomes extra overhead to maintain the data in Solr as well as MongoDB. What are the best approaches to overcoming this?
As far as I know there is no official (supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas/direction.
For me the best approach, when it is possible to modify the application, is to have the persistence layer perform all write operations to MongoDB and Solr at the "same" time. That way you control exactly what you send to the database and what you index for full-text operations. But as I said, this means you have to change your application code (you will have to change it anyway to be able to query Solr when needed). And yes, you have to index all the existing documents the first time.
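A minimal sketch of that dual-write persistence layer, assuming pymongo and the pysolr client (the answer does not prescribe specific libraries, and the collection and field names are made up):

```python
# Dual-write sketch: MongoDB stays the system of record, Solr only receives
# the fields needed for full-text search.
import pysolr
from pymongo import MongoClient

mongo = MongoClient().shop
solr = pysolr.Solr("http://localhost:8983/solr/products", timeout=10)

def save_product(doc):
    result = mongo.products.insert_one(doc)    # full document to MongoDB
    solr.add([{
        "id": str(result.inserted_id),         # shared id for later lookups
        "name": doc["name"],
        "description": doc["description"],
    }], commit=True)                           # searchable fields to Solr
```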
You can use a "connector" approach where MongoDB and Solr are connected together; this could be done in various ways.
You can use for example the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it so cannot comment, but it is also an approach)
Your point #2 is true: you have to manage two clusters, make sure the data stay in sync, and sometimes pay the price of inconsistency between the Solr index and a document just updated in MongoDB... So you need to see whether the best approach for your application is MongoDB alone or MongoDB with Solr (see the comment below).
Just a small comment in addition to this answer:
You are talking about "faster retrieval"; I am not sure that should be the reason. If you write correct queries with correct indexes in MongoDB, you should be able to do it without Solr. If your requirement is really oriented towards the power of Solr, meaning a full-text index (with all its related features), then it makes sense.
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geo API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/. So it would be ideal to identify whether your need fits into MongoDB or you need to spill over to Solr.
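For example, MongoDB's built-in text index can be created and queried like this (assuming pymongo; collection and field names are made up for the example):

```python
# Quick illustration of MongoDB's text index and $text query with a
# relevance score.
from pymongo import MongoClient, TEXT

items = MongoClient().catalog.items
items.create_index([("title", TEXT), ("description", TEXT)])

hits = items.find(
    {"$text": {"$search": "wireless headphones"}},
    {"score": {"$meta": "textScore"}},
).sort([("score", {"$meta": "textScore"})])
```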
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job with once-a-day re-indexing may work for you. Ideally Solr works well for some form of master data.

Why use ElasticSearch with Mongo?

I have read a few articles recently on the combination of mongodb for storage and elasticsearch for indexing/search. I feel like I'm missing something though. Why would you go this route as opposed to just using mongo to index the data? What benefits does elasticsearch bring and is it worth the added complexity?
ElasticSearch implements a lot more features, such as custom splitting of text into words, custom stemming, faceted search and a whole lot more. While MongoDB's (rather simple) text search does some of this, it is not nearly as powerful as ElasticSearch.
If all you ever do is look for a single string in a single field, then MongoDB's normal query system will work excellently for that. If you need to look for words in multiple fields, then MongoDB's text search will work. If you need anything more than that, ElasticSearch is the way to go.
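As a rough illustration of the kind of query where ElasticSearch pulls ahead, here is a multi-field search with fuzziness and highlighting (8.x-style elasticsearch-py calls; the index and field names are invented):

```python
# Multi-field full-text query with fuzzy matching and highlighting; this is
# the territory beyond what MongoDB's text search covers.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="articles",
    query={"multi_match": {
        "query": "running shoes",
        "fields": ["title^2", "body"],   # boost matches in the title
        "fuzziness": "AUTO",
    }},
    highlight={"fields": {"body": {}}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```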
A search engine and a database do some fundamentally different things. A good search engine (like ElasticSearch) supports far more elaborate and complex indexing, facets, highlighting etc. In the case of ElasticSearch, you also get your replies 'real-time'. On the other hand, a search engine doesn't return every single document that matches your query. Instead, it will score documents according to how much they match, and return the top scoring ones. When you query a database such as MongoDB, you should expect it to return everything that matches your query.
You can store the entire document in ElasticSearch, but it is usually not the optimal solution. Normally you will configure it to return the document ids, which you use to fetch the documents from a database. MongoDB is a database optimized for document-based storage. This is why you hear about people using them together.
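That pattern looks roughly like this (assuming elasticsearch-py and pymongo; the index, collection and field names are illustrative): ElasticSearch returns only ids and scores, and MongoDB serves the full documents.

```python
# "Search in Elasticsearch, fetch from MongoDB" sketch.
from bson import ObjectId
from elasticsearch import Elasticsearch
from pymongo import MongoClient

es = Elasticsearch("http://localhost:9200")
articles = MongoClient().blog.articles

resp = es.search(index="articles", query={"match": {"body": "sharding"}})
ids = [ObjectId(hit["_id"]) for hit in resp["hits"]["hits"]]  # ES only gave us ids

full_docs = list(articles.find({"_id": {"$in": ids}}))        # full docs from MongoDB
```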
edit:
When this was posted, it matched the recommendations, but this may no longer be the case.
Derick's answer pretty much nails it. The question behind all this is:
What are the features you want to implement in your application?
If you rely on heavy searching capabilities in large chunks of text, ElasticSearch is probably a good thing to use. If you want a flexible datastore that can cope with complex ad-hoc queries, Mongo might be a good fit. If you have different requirements for a datastore, it is often a good idea to combine two tools instead of implementing all kinds of workarounds to make it work with just one.
Choose the right tool for the job.

Some questions about mongodb that have me puzzled

I just installed mongodb from Synaptic (Using ubuntu 11.10).
I'm following this tutorial: http://howtonode.org/express-mongodb
to build a simple blog using node.js, express.js and mongodb.
I reached the part where you can actually store post in the mongodb.
It worked for me, but I'm not sure how. I didn't even start MongoDB (I just installed it).
The second thing that has me puzzled is: where is that data going?
Does anyone have any clue?
EDIT:
I think I found where the data is going:
var/lib/
There are some files there that are apparently written in binary.
MongoDB stores all your data as BSON within the directory you found. The internal Mongo processes handle the sharding of the data when necessary following your specific setup, accommodating you if you have set up distribution across several servers.
One of the most confusing things about coming over to Mongo from RDBMS is the nature of collections vs tables.
http://www.mongodb.org/display/DOCS/MongoDB%2C+CouchDB%2C+MySQL+Compare+Grid
Of note: a Mongo collection does not need to have any schema designed; it is assumed that the client application will validate and enforce any particular schema. The way the data comes in and is serialized is the way it will be saved in its BSON representation. It is entirely possible to have missing keys and totally different docs in the same collection, which is why robust data checking in your app becomes so important.
Mongo collections also do not need to be created ahead of time. There isn't anything that approximates the CREATE TABLE syntax from MySQL; since there's no schema, there isn't much such a command would have to describe. Your collection will be created with the first document inserted, and every inserted document will be given a unique '_id' key (an ObjectId by default, assuming you don't define this key in your objects using your own methodology).
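A small demonstration of those points (assuming pymongo; the database, collection and documents are made up):

```python
# The collection springs into existence on the first insert, documents need
# not share a schema, and MongoDB adds an ObjectId _id when you do not
# supply one.
from pymongo import MongoClient

db = MongoClient().blog                # neither the db nor the collection exists yet

first = db.posts.insert_one({"title": "Hello", "tags": ["intro"]})
second = db.posts.insert_one({"author": "bob", "views": 0})   # completely different shape

print(first.inserted_id)               # auto-generated ObjectId
print(db.list_collection_names())      # ['posts'], created implicitly
```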
Here's a helpful resource with the basic commands.
http://cheat.errtheblog.com/s/mongo
Mongo, and NoSQL in general, is a developer-led movement trying to bridge the gap between the object-oriented application layer and the persistence layer in an app, which would traditionally have been represented by an RDBMS. Developers would much rather work within one paradigm, and OOP has become predominant, especially in the web environment.
If you're coming over from LAMP and used PhpMyAdmin in the past, I really must suggest RockMongo. This tool makes it so much easier to observe and understand the actual structure of the BSON documents stored in your server.
http://code.google.com/p/rock-php/wiki/rock_mongo
Best of luck!

Using MongoDB and Redis together?

We started with Redis, storing active data, logged in users, etc. We're using some pubsub too for realtime data passing.
Recently we added Mongo to fit our geo spatial needs, and it seems great for non-active data too.
How should these two work together? Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?
Our thought was that we might store everything in Mongo but then pass user data from Mongo to Redis when a user is active and the data is likely to be accessed. I know Mongo does some caching like this on its own; we are new to both of them and just want to know how they should be used together, if at all.
Thanks!!
Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?
So I feel like there's actually a legitimate question here to test and validate. Redis is basically an "in-memory" DB, so how much better can you do by giving that RAM to Mongo instead?
Historically, we've used the Memcache/MySQL combo to basically "add RAM" to MySQL and limit the amount of writing it needed to do. We did this simply because it was complicated to shard MySQL.
However, MongoDB provides a sharding mechanism. So you can "add RAM" to a problem (along with "adding disks") simply by adding more shards.
Thanks to the way memory-mapped files work, MongoDB tends to keep recently used data in memory. So if you're pulling recent data into Redis, that data is probably also in memory on the MongoDB side, so it's not clear that you benefit from having it in two places.
Is it dumb ...
That's hard to say without some testing and analysis. MongoDB doesn't really have a pub/sub mechanism, but it does tend to have fast query times, so it may be appropriate in specific spots.
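If you do decide to keep Redis in front of Mongo for active users, the usual shape is a cache-aside read path, along these lines (assuming redis-py and pymongo; the key names, TTL and field layout are arbitrary choices for the sketch):

```python
# Cache-aside sketch: MongoDB is the system of record, Redis holds recently
# active users for a short TTL.
import json

import redis
from pymongo import MongoClient

users = MongoClient().app.users
cache = redis.Redis()

def get_active_user(user_id, ttl_seconds=300):
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)                       # hot path: straight from Redis
    doc = users.find_one({"_id": user_id}, {"_id": 0})  # miss: fall back to MongoDB
    if doc is not None:
        cache.setex(f"user:{user_id}", ttl_seconds, json.dumps(doc))
    return doc
```

Whether this actually beats just giving the same RAM to MongoDB is exactly the kind of thing worth measuring for your workload, as noted above.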