Running Lucene-based search on a Grails application using MongoDB

Currently I am investigating ways to implement a Lucene-based search on a Grails application using MongoDB.
Requirements include the following:
The data to index is stored in MongoDB
Data only gets inserted (no updates, no deletions)
The application has to run on the CloudBees platform
The search should be implemented without any external services like Searchly or WebSolr
So far this does not seem very complicated, as there are Grails plugins for search. However, the main problem I am facing is that my application uses dynamic MongoDB collections, so I do not have a domain class for each and every collection. Instead, the collections that should be indexed can have arbitrary names and schemas. As a result I cannot use Grails plugins like searchable, as these seem to work only on fixed domain classes (or am I wrong about that?).
Does anybody have experience on how to implement a search in such a context? Any tips, links, hints, or recommendations?

You can use one index with multiple types for your dynamic MongoDB collections. However, you will have to code that logic yourself, since the existing integration modules were written with domain-model indexing in mind.
For ElasticSearch you can use the Jest client, which works well from Groovy: https://github.com/searchbox-io/Jest
Searchly offers MongoDB integration out of the box, but unfortunately only for a single collection. Therefore, for now, you also need to query MongoDB (the collection you have created dynamically), index the data under a new type, and query it from there.
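A minimal sketch of that one-index/multiple-types idea with Jest from Java. The index name ("app"), the URL, and the HttpClientConfig builder are assumptions; the Jest client API varies somewhat between versions:

    import io.searchbox.client.JestClient;
    import io.searchbox.client.JestClientFactory;
    import io.searchbox.client.config.HttpClientConfig;
    import io.searchbox.core.Index;

    import java.util.Map;

    public class DynamicCollectionIndexer {

        private final JestClient client;

        public DynamicCollectionIndexer(String elasticSearchUrl) {
            JestClientFactory factory = new JestClientFactory();
            factory.setHttpClientConfig(new HttpClientConfig.Builder(elasticSearchUrl).build());
            this.client = factory.getObject();
        }

        // Index one MongoDB document under a single index, using the
        // (dynamic) collection name as the ElasticSearch type.
        public void index(String collectionName, String id, Map<String, Object> document) throws Exception {
            Index action = new Index.Builder(document)
                    .index("app")            // one index for the whole application (assumed name)
                    .type(collectionName)    // one type per dynamic collection
                    .id(id)
                    .build();
            client.execute(action);
        }
    }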

Related

MongoDB collection not listed in _schema

I am using MongoDB 2.6 standard. I have successfully created multiple collections, inserted data, queried data, etc. Everything works fine inside the Mongo shell, as well as from NodeJS apps using Mongoskin. So far, so good. Now I need to use a third-party tool (D2RQ) to access the data. It appears that D2RQ uses the _schema collection to obtain collection names, column names, data types, and so on. D2RQ works for three of the collections because those collections are in _schema in MongoDB. A fourth collection is not in _schema and seems to be invisible. However, the fourth collection is present in MongoDB: the collection has data, and I can query it in the Mongo shell and from NodeJS using Mongoskin. Any idea why the collection is not appearing in _schema? Is this a MongoDB bug?
This is not a MongoDB bug. The root cause of the problem is that D2RQ uses the UnityJDBC driver to access MongoDB. There is a parameter on the JDBC connection string indicating whether or not to rebuild _schema. D2RQ does not properly pass the parameter when making the JDBC connection to MongoDB, resulting in the _schema collection being out of date on all calls after the first call. The solution has two parts:
Part one was to write a tiny NodeJS application that does nothing but force the _schema rebuild on connection. That solved my immediate issue.
Part two was to then extend the tiny NodeJS application into a full-featured export process that generates an RDF file from MongoDB. This allowed me to remove both D2RQ and the UnityJDBC driver from the solution stack.
"The most reliable component in the architecture is the one that is not present"

MongoDB integration with Solr

I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need info on the points below.
I have the data in MongoDB; for faster retrieval we are integrating it with Solr.
Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries inserted after the integration?
If we need to periodically update Solr, it becomes an extra overhead to maintain the data in Solr as well as in MongoDB. What are the best approaches to overcoming this?
As far as I know there is no official (supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas/directions.
For me the best approach, when it is possible to modify the application, is to have the persistence layer perform every write operation against both MongoDB and Solr at the "same" time. That way you control exactly what you send to the database and what you index for full-text search. But as I said, this means you have to change your application code. (You will have to change it anyway to be able to query Solr when needed.) And yes, you have to index all the existing documents the first time.
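A minimal sketch of such a dual-write persistence layer, assuming the current MongoDB Java driver and SolrJ APIs (both newer than this answer); the database, collection, core, and field names are invented:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.bson.Document;

    public class ArticleRepository {
        private final MongoCollection<Document> articles =
                MongoClients.create("mongodb://localhost:27017")
                        .getDatabase("blog").getCollection("articles");
        private final SolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/articles").build();

        // One save() that writes the system of record (MongoDB) and the
        // full-text index (Solr) at the "same" time.
        public void save(String id, String title, String body) throws Exception {
            articles.insertOne(new Document("_id", id)
                    .append("title", title)
                    .append("body", body));

            SolrInputDocument solrDoc = new SolrInputDocument();
            solrDoc.addField("id", id);
            solrDoc.addField("title", title);
            solrDoc.addField("body", body);
            solr.add(solrDoc);
            solr.commit(); // simplest form; prefer autoCommit/commitWithin in production
        }
    }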
You can use a "connector" approach where MongoDB and Solr are connected together; this can be done in various ways.
You can use, for example, the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it, so I cannot comment, but it is also an approach)
Your point #2 is true: you have to manage two clusters, be sure the data are in sync, and sometimes pay the price of inconsistency between the Solr index and a document just updated in MongoDB. So you need to see whether the best approach for your application is MongoDB alone or MongoDB with Solr (see the comment below).
Just a small comment in addition to this answer:
You are talking about "faster retrieval", but I am not sure that should be the reason: if you write correct queries with correct indexes, MongoDB should be able to do it without Solr. If your requirement is really oriented towards the power of Solr, meaning a full-text index (with all its related features), then it makes sense.
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geo API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/. So it would be ideal to identify whether your need fits into MongoDB or whether you need to spill over to Solr.
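For illustration, creating and querying such a text index from the Java driver might look like this. The database, collection, and field names are invented, and text indexes require MongoDB 2.4 or later:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class TextSearchExample {
        public static void main(String[] args) {
            MongoCollection<Document> posts = MongoClients.create("mongodb://localhost:27017")
                    .getDatabase("blog").getCollection("posts");

            // Build a text index over the fields that should be searchable.
            posts.createIndex(Indexes.compoundIndex(Indexes.text("title"), Indexes.text("body")));

            // Full-text query via the $text operator.
            for (Document match : posts.find(Filters.text("mongodb solr"))) {
                System.out.println(match.toJson());
            }
        }
    }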
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job that re-indexes once a day may work for you. Ideally Solr works well for some form of master data.

Combining Neo4j and MongoDB: Consistency

I am experimenting a lot these days, and one of the things I wanted to do is combine two popular NoSQL databases, namely Neo4j and MongoDB, simply because I feel they complement each other perfectly. The first-class citizens in Neo4j, the relationships, are IMO exactly what is missing in MongoDB, whereas MongoDB allows me to keep large amounts of data out of my node properties.
So I am trying to combine the two in a Java application, using the Neo4j Java REST binding and the MongoDB Java driver. All my domain entities have a unique identifier which I store in both databases. The other data is stored in MongoDB and the relations between entities are stored in Neo4j. For instance, both databases contain a userid, MongoDB contains the profile information, and Neo4j contains the friendship relations. With the custom data access layer I have written, this works exactly like I want it to. And it's fast.
BUT... When I want to create a user, I need to create both a node in Neo4j and a document in MongoDB. Not necessarily a problem, except that Neo4j is transactional and MongoDB is not. If both were transactional, I would just roll back both transactions when one of them fails. But since MongoDB isn't transactional, I cannot do this.
How do I ensure that whenever I create a user, either both a node and a document are created, or neither? I don't want to end up with a bunch of documents that have no matching node.
On top of that, not only do I want my combined database interaction to be ACID compliant, I also want it to be threadsafe. Both the GraphDatabaseService and the MongoClient / DB are provided from singletons.
I found something about creating "transaction documents" in MongoDB, but I really don't like that approach. I would like something nice and clean like the Neo4j beginTx, tx.success, tx.failure, tx.finish setup; ideally, something I can implement in the same try/catch/finally block.
Should I perhaps make a switch to CouchDB, which does appear to be transactional?
Edit: After some more research, sparked by a comment, I came to realize that CouchDB is also not suitable for my specific needs. To clarify: the Neo4j part is set in stone; the document-store database is not, as long as it has a Java library.
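For reference, the try/catch/finally pattern asked for can at least be approximated with compensation: write MongoDB first and undo the write when the Neo4j transaction fails. This is a best-effort sketch, not true ACID (a crash or a commit-time failure between the two writes can still leave an orphan document), using the old MongoDB driver classes and the Neo4j 1.x-style transaction API named in the question; all other names are hypothetical:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public class UserService {
        private final GraphDatabaseService graphDb; // singleton, as in the question
        private final DBCollection users;           // from the singleton MongoClient

        public UserService(GraphDatabaseService graphDb, DBCollection users) {
            this.graphDb = graphDb;
            this.users = users;
        }

        public void createUser(String userId, BasicDBObject profile) {
            // 1. Write the non-transactional store first.
            profile.put("_id", userId);
            users.insert(profile);

            // 2. Then the transactional store; compensate MongoDB on failure.
            Transaction tx = graphDb.beginTx();
            try {
                Node node = graphDb.createNode();
                node.setProperty("userId", userId);
                tx.success();
            } catch (RuntimeException e) {
                tx.failure();
                users.remove(new BasicDBObject("_id", userId)); // undo the Mongo write
                throw e;
            } finally {
                tx.finish(); // tx.close() from Neo4j 2.x onwards
            }
        }
    }

Writing the non-transactional store first means a failure in the transactional step can be undone with a single delete; the remaining hole is a process crash between the insert and the compensation.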
Pieter-Jan,
if you are able to use Neo4j 2.0 you can implement a Schema-Index-Provider (which is really easy) that creates your documents transactionally in MongoDB.
As Neo4j has made its index providers transactional since the beginning, we did that with Lucene, and there is one for Redis too (which needs to be updated). But it is much easier with Neo4j 2.0; if you want, you can check out my implementation for MapDB: https://github.com/jexp/neo4j-mapdb-index
Although I'm a huge fan of both technologies, I think a better option for you could be OrientDB. It's a graph (like Neo4j) and document (like MongoDB) database in one, and it supports ACID transactions. Sounds like a perfect match for your needs.
As posted in https://stackoverflow.com/questions/23465663/what-is-the-best-practice-to-combine-neo4j-and-mongodb?lq=1, you might have a look at Structr.
Its backend can be regarded as a Document database around Neo4j. It's fully transactional and open-source.

Graphical database builder for MongoDB (like DBForge for MySQL)

I really like DBForge and its ability to create database schemas with a graphical UI.
Is there a tool like that for MongoDB?
Basically I'm looking for a tool that helps create clean MongoDB collections and documents.
I'm starting a very big project in a few weeks and need to design a pretty huge MongoDB database with lots of collections. So, to keep everything organized, I would like something graphical to look at instead of just coding the entities and their properties.
Try Mongo Explorer (http://mongoexplorer.com/), as it has some of the options you are looking for.
Good luck
I'm not aware of any MongoDB-specific 'schema' designer. Since a collection is itself schema-less, it's hard to imagine exactly what such a tool would look like. You might, for example, decide to store People, Cats and Cars all in the same collection because it happens to make it easier to query across them by name.
Typically I just use a standard class-viewer or entity designer for those occasions when I do want to see a graphical view of the objects that will be persisted as documents.
Relationships are also tricky for a graphical tool: sometimes all you have is an ObjectId; sometimes you denormalize and keep, say, a Name alongside an ObjectId so you can display a list without fetching each item; and sometimes you store an entire copy of the other object embedded in the current one.
You can even come up with ways to store data in MongoDB to mimic multiple-inheritance. Since most graphical entity designers support only single inheritance you would again have issues trying to use them with MongoDB.
In summary, I recommend that you model your entities using whatever entity designer you have for the language you will be using, and then worry about how to map them to collections to support your query and update patterns, denormalizing parts of the data where you need performance, or sharing a common field (by interface or inheritance) where you want to search easily across entity types (e.g. string[] Tags, or string Name).
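To make that last point concrete, here is a small sketch (invented names, modern MongoDB Java driver) of several entity types sharing one collection and one searchable field:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.Document;

    public class SharedFieldSearch {
        public static void main(String[] args) {
            // One collection holding several entity types, discriminated by "kind",
            // all sharing a "name" field so a single query spans every type.
            MongoCollection<Document> things = MongoClients.create("mongodb://localhost:27017")
                    .getDatabase("mydb").getCollection("things");

            things.insertOne(new Document("kind", "Person").append("name", "Felix").append("age", 34));
            things.insertOne(new Document("kind", "Cat").append("name", "Felix").append("breed", "Tabby"));
            things.insertOne(new Document("kind", "Car").append("name", "Felix").append("plate", "AB-123"));

            // One query across People, Cats and Cars by name.
            for (Document d : things.find(Filters.eq("name", "Felix"))) {
                System.out.println(d.toJson());
            }
        }
    }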

UI / GUI Admin Tools for MongoDB

This was answered in this post.
I am using Robomongo and it's good.
The latest version supports MongoDB 3.0 as well.
http://mongodb-tools.com/tool/robomongo/
http://robomongo.org/

Some questions about mongodb that have me puzzled

I just installed mongodb from Synaptic (using Ubuntu 11.10).
I'm following this tutorial: http://howtonode.org/express-mongodb
to build a simple blog using node.js, express.js and mongodb.
I reached the part where you can actually store posts in mongodb.
It worked for me, but I'm not sure how: I didn't even start mongodb (I just installed it).
The second thing that has me puzzled is: where is that data going?
Does anyone have any clue?
EDIT:
I think I found where the data is going:
var/lib/
where there are some files that are apparently written in binary.
MongoDB stores all your data as BSON within the directory you found. (As for how it worked without you starting anything: the Ubuntu package installs mongod as a service that is started automatically, so the daemon was already running after the install.) The internal Mongo processes handle sharding of the data when necessary, following your specific setup, accommodating you if you have set up distribution across several servers.
One of the most confusing things about coming over to Mongo from RDBMS is the nature of collections vs tables.
http://www.mongodb.org/display/DOCS/MongoDB%2C+CouchDB%2C+MySQL+Compare+Grid
Of note: a Mongo collection does not need any schema designed; it is assumed that the client application will validate and enforce any particular schema. The way the data comes in and is serialized is the way it will be saved in its BSON representation. It is entirely possible to have missing keys and totally different docs in the same collection, which is why robust data checking in your app becomes so important.
Mongo collections also do not need to be created explicitly. There isn't anything that approximates the CREATE TABLE syntax from MySQL; since there's no schema, there isn't much such a command would have to describe. Your collection will be created with the first document inserted, and every document inserted will be given a unique '_id' key, an ObjectId by default (assuming you don't define this key in your objects using your own methodology).
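A quick illustration of both points (implicit creation and the generated _id). The tutorial uses Node, but the same behaviour from the Java driver looks like this (database and collection names invented):

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class FirstInsert {
        public static void main(String[] args) {
            // Neither the database nor the collection needs a CREATE step:
            // both spring into existence on the first insert.
            MongoCollection<Document> posts = MongoClients.create("mongodb://localhost:27017")
                    .getDatabase("blog").getCollection("posts");

            Document post = new Document("title", "Hello").append("body", "First post!");
            posts.insertOne(post);

            // The driver filled in a unique ObjectId under "_id" for us.
            System.out.println(post.get("_id"));
        }
    }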
Here's a helpful resource with the basic commands.
http://cheat.errtheblog.com/s/mongo
Mongo, and NoSQL in general, is a developer-led movement trying to bridge the gap between the object-oriented application layer and the persistence layer of an app, which would traditionally have been an RDBMS. Developers would much rather work within one paradigm, and OOP has become predominant, especially in the web environment.
If you're coming over from LAMP and used phpMyAdmin in the past, I really must suggest RockMongo. This tool makes it so much easier to observe and understand the actual structure of the BSON documents stored on your server.
http://code.google.com/p/rock-php/wiki/rock_mongo
Best of luck!