Faceted search. Are Solr with MongoDb good for it? Does someone know about modules, libraries for faceted search? - mongodb

Good day, everyone.
I have an e-commerce website (Kohana php framework + Mysql + Sphinx search).
I want to integrate faceted search (also called faceted navigation, guided navigation, or parametric search) on my e-commerce shop.
1. I have found several opinions that Solr is the best solution for faceted browsing. I want to be sure of my choice. Is Solr really the best at this?
2. I also want to migrate products (with attributes) from MySQL to MongoDB. Does MongoDB work well together with Solr?
3. Does anyone know of modules, UIs, or APIs for faceted search? Maybe there is a Zend library or a REST API.
Thanks for your help.

I found several opinions that Solr was the best solution for faceted browsing?
I am pretty sure that those who say that are the same salesmen who say that MongoDB pwns MySQL in every role.
Sphinx (the tech you are currently using) has been proven to handle massive data sets with good performance. For example, http://infegy.com/ uses it for a result set of 22 billion records ( http://sphinxsearch.com/info/powered/ )!
Solr is dead fast too, but saying one is better than the other, when both support facets and both are very fast on very large result sets, is just utter nonsense.
Also I want to migrate products (with attributes) from mysql to MongoDb. Is MongoDb good in cooperation with Solr?
Solr uses a separate XML schema and XML files to describe the documents it stores internally (in Lucene), unlike Sphinx, which can do this transparently without you knowing.
So getting MongoDB to work with Solr is no different from using MySQL with it: you build up the XML files and commit (or soft commit) them to Solr.
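As a rough illustration of "building up the XML", here is a minimal Python sketch that turns a product dictionary into a Solr `<add><doc>` update message. The product and its field names are hypothetical and would have to match your own Solr schema; the source of the dictionary (MongoDB or MySQL) makes no difference to this step.

```python
import xml.etree.ElementTree as ET

def product_to_solr_xml(product):
    """Build a Solr <add><doc>...</doc></add> update message from a dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in product.items():
        # Multi-valued attributes become repeated <field> elements.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")

# Hypothetical product as it might come out of MongoDB or MySQL.
product = {"id": "sku-1", "name": "Trail shoe", "color": ["red", "black"], "price": 79.9}
xml_message = product_to_solr_xml(product)
```

The resulting string is what you would send to Solr's update handler, followed by a commit.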
As for faceting, here is a very simple page on it with links to the places you need to go: http://wiki.apache.org/solr/SolrFacetingOverview
Solr has a built-in REST (JSON) API on top of Jetty ( http://jetty.codehaus.org/jetty/ ), which you can use to fetch all the facets you need easily.
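To give an idea of what fetching facets over that API looks like, here is a small Python sketch that builds a facet query URL and parses the facet counts from a JSON response. The host, port, core name (`products`) and field names are assumptions for illustration, and the response below is a hand-written sample rather than real output. In its default JSON format Solr returns each facet field as a flat list of alternating values and counts.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_fields):
    """Build a select URL that asks only for facet counts (rows=0)."""
    params = [("q", query), ("wt", "json"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    return base_url + "?" + urlencode(params)

def parse_facet_field(flat):
    """Turn [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))

# Hypothetical local Solr core named "products".
url = facet_query_url("http://localhost:8983/solr/products/select", "*:*", ["color", "brand"])

# Hand-written sample of the relevant part of a Solr JSON response.
sample_response = {"facet_counts": {"facet_fields": {"color": ["red", 12, "black", 7]}}}
colors = parse_facet_field(sample_response["facet_counts"]["facet_fields"]["color"])
```

The `colors` dictionary can then be rendered as the list of clickable filters in the shop's sidebar.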

Related

NoSQL Database for Blog / Content Management System? (MongoDB / Cassandra)

My company has used Oracle for a long time, but we would like to find a NoSQL database as a replacement, for faster querying and flexible schema design.
I have tried MongoDB, which seems to be the most popular NoSQL database nowadays. I connected it to Spring Data to run some simple queries, which was quite easy to set up and code. Since we are using Spring MVC for web development, Spring Data seems well suited for integration.
However, I have heard that Cassandra has better read and write performance, especially in large-scale systems. I am not sure whether it is worth moving to Cassandra, or how to compare the performance of MongoDB and Cassandra.
Here are some requirements for my system:
focusing on article fetching
tagging of articles so that users can easily search for their favorites or related articles
non-distributed system, but have load-balancing and fail-over
Java based, Spring MVC for web development
articles would be stored as XML
probably provide user-defined tables (collections) and fields (keys)
Therefore I would like to raise some questions:
Which database is the most suitable for my case? Feel free to suggest databases other than MongoDB and Cassandra.
If I use Cassandra, which framework would be suitable for integrating with Spring MVC?
Thank you so much in advance.
I have experience using Spring and Cassandra together, but I have always written my own data access layer.
Using the ORMs out there for Cassandra will not allow you to leverage its full power, and you will most likely introduce bugs, because your SQL background will make you expect certain behaviours that are just not what Cassandra will give you.
My advice: write the code that accesses Cassandra yourself, and do not be afraid to denormalize A LOT. Think more about how you want to query (or find) your data than about the format in which you want to save it.
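To make the "model around your queries" idea concrete, here is a tiny Python sketch. It is only an in-memory stand-in, not Cassandra driver code: each dictionary plays the role of one table keyed the way it will be read, and the article data is made up.

```python
from collections import defaultdict

# Each dict stands in for one Cassandra table, keyed the way it will be queried.
articles_by_id = {}
articles_by_tag = defaultdict(list)

def save_article(article_id, title, tags):
    """Write the same article once per query pattern (deliberate duplication)."""
    row = {"id": article_id, "title": title}
    articles_by_id[article_id] = {**row, "tags": list(tags)}
    for tag in tags:
        articles_by_tag[tag].append(row)

save_article("a1", "Intro to NoSQL", ["nosql", "databases"])
save_article("a2", "Cassandra modeling", ["nosql", "cassandra"])

# "All articles for a tag" is a single lookup, with no join and no scan.
nosql_titles = [row["title"] for row in articles_by_tag["nosql"]]
```

The price is that every write touches several tables, which is the trade-off denormalization accepts in exchange for cheap reads.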
I also strongly recommend reading this amazing article: Cassandra Data Modeling Best Practices part 1 part 2
Another DB which might suit your application better is CouchDB (I like using BigCouch). It is another document-based NoSQL database and is, in my opinion, superior to MongoDB. It offers a better solution for scaling and emphasizes availability (just like Cassandra).
I'd like to point you to this question about the difference between CouchDB and MongoDB.
As far as frameworks go, the Play framework has a lot of plugins for working with NoSQL systems, so you might give it a try. You could try playorm, which is the last one I experimented with.
EDIT: I forgot to mention Kundera, which is also an ORM for Cassandra.
Choosing between Cassandra and MongoDB depends on the type of storage you need. MongoDB is primarily for document-based storage, where you get an edge from its various SQL-like features.
If you require a columnar database with high availability and multi-DC replication, go for Cassandra.
http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB

mongoDB as a file storage for Grails application

I recently came across a need to store a large number of files in my application, and because the PaaS platform used to host the application provides mongo, I would like to use it.
However, because I'm quite inexperienced with mongo, I have almost no idea what the current state of mongo-related plugins and tools for Grails is. What should I use? As I want to keep domain classes in the SQL database and use mongo only to store related files (in this case mostly a bunch of PDFs and text documents related to a domain instance), the mongoDB ORM [1] plugin seems too "heavy". Unfortunately, mongoDB ORM is probably the only mongo plugin for Grails in active development at the moment.
In short, what would be the best plugin / library tool-set for this purpose? The closest match to my need that I've found is the grails-mongo-files plugin [2], which is probably a little outdated, with no further development. So far it seems that I will have to use mongo's Java driver (or the gmongo wrapper) and write a storage service and taglib myself (which is not necessarily a bad thing).
[1] http://grails.org/plugin/mongodb
[2] https://github.com/quirklabs/grails-mongo-file
There is also the mongodb gridfs plugin. http://grails.org/plugin/mongodb-gridfs
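One thing to consider is that GridFS effectively makes two calls to mongo, one to retrieve the file information and one to retrieve the file itself. So it might not be a good fit if your files are under 16 megabytes, since files below that document size limit can be stored directly in a regular document and read with a single call.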
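The two calls follow from how GridFS lays a file out: one metadata document in a files collection plus one document per chunk in a chunks collection. The Python sketch below only mimics that layout in memory and is not driver code; the 255 KB chunk size is assumed to match the current driver default, and the file name and contents are made up.

```python
CHUNK_SIZE = 255 * 1024  # assumed GridFS default chunk size, in bytes

def to_gridfs_documents(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Split raw bytes into one metadata document and a list of chunk documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs

# Hypothetical 600,000-byte file.
payload = b"x" * 600_000
files_doc, chunk_docs = to_gridfs_documents(1, "manual.pdf", payload)
```

Reading the file back means fetching `files_doc` first and then the chunks ordered by `n`, which is the extra round trip mentioned above.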
Here is a post on how to do this manually if you want to bypass plugins - http://jameswilliams.be/blog/entry/171

Mongodb document versioning using spring data

I am using Spring Data in my Java application to connect to MongoDb and have a requirement around versioning the documents (basically storing the history).
It seems that it's pretty straightforward in Ruby, if one uses Mongoid.
I was wondering if Spring Data has something similar for Java, or whether you are better off implementing your own.
Yes, Spring Data has a very good feature for this, namely auditing. You can refer to the following link:
http://www.javacodegeeks.com/2013/11/auditing-entities-in-spring-data-mongodb-2.html
After a lot of research I found JaVers: https://javers.org/documentation/spring-boot-integration/. It is rock solid and very easy to implement.
This library stores the full history of the changed fields, makes it easy to query that history, and is well supported. A sample POC is shared here: https://nullbeans.com/auditing-using-spring-boot-mongodb-and-javers/
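If you do end up rolling your own, the core idea is small. The following Python sketch is only a language-neutral illustration with an in-memory store. It is not the JaVers or Spring Data API, and the class and method names are made up.

```python
import copy

class VersionedStore:
    """In-memory stand-in for a collection that keeps every past version."""

    def __init__(self):
        self.current = {}   # doc_id -> {"version": int, "data": dict}
        self.history = {}   # doc_id -> list of superseded versions

    def save(self, doc_id, doc):
        previous = self.current.get(doc_id)
        if previous is not None:
            # Move the old version into the history before overwriting it.
            self.history.setdefault(doc_id, []).append(previous)
        version = previous["version"] + 1 if previous else 1
        self.current[doc_id] = {"version": version, "data": copy.deepcopy(doc)}
        return version

    def changed_fields(self, doc_id):
        """Field names that differ between the latest and the previous version."""
        past = self.history.get(doc_id)
        if not past:
            return set()
        old, new = past[-1]["data"], self.current[doc_id]["data"]
        return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}

store = VersionedStore()
store.save("p1", {"name": "Widget", "price": 10})
store.save("p1", {"name": "Widget", "price": 12})
```

With MongoDB, `current` and `history` would typically map to two collections, the second one append-only.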

Full-text search on MongoDB GridFS?

Say, if I want to store PDFs or ePub files using MongoDB's GridFS, is it possible to perform full-text searching on the data files?
You can't currently do real full text search within mongo: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
Feel free to vote for it here:
https://jira.mongodb.org/browse/SERVER-380
Mongo is more of a general purpose scalable data store, and as of yet it doesn't have any full text search support. Depending on your use case, you could use the standard b-tree indexes with an array of all of the words in the text, but it won't do stemming or fuzzy matches, etc.
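A minimal Python sketch of that keyword-array approach follows. It assumes the plain text has already been extracted from the PDF or ePub; the document shape and field names are hypothetical.

```python
import re

def keyword_array(text):
    """Lowercase, split on non-alphanumerics, and de-duplicate the words."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

# Hypothetical metadata document stored next to the file.
doc = {
    "filename": "guide.pdf",
    "keywords": keyword_array("Searching PDFs: searching is hard."),
}
```

An ordinary index on `keywords` then supports exact word lookups. A query for "search" will not match "searching", which is the missing stemming mentioned above.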
However, I would recommend combining MongoDB with a Lucene-based application (Elasticsearch is popular). You can store all of your data in MongoDB (binary data, metadata, etc.), and then index the plain text of your documents in Lucene. Or, if your use case is pure full-text search, you might consider just using Elasticsearch instead of MongoDB.
Update (April 2013):
MongoDB 2.4 now supports a basic full-text index! Some useful resources below.
http://docs.mongodb.org/manual/applications/text-search/
http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text
http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/
Not using MongoDB APIs, not that I know of. GridFS seems to be designed as a simplified file system with APIs that provide straightforward key-value semantics. On their project ideas page they list two things that would help you if they existed in a production-ready state:
GridFS FUSE, which would allow you to mount GridFS as a local file system and then index it like you would index anything else on your disk
Real-time full-text search integration with tools like Lucene and Solr. There are some projects on GitHub and Bitbucket that you might want to check out.
Also look at ElasticSearch. I have seen some integration with Mongo, but I am not sure how much has been done to tap into GridFS (GridFS attachment support is mentioned, but I haven't worked with it to know for sure). Maybe you will be the one to build it and then open-source it? It should be a fun adventure.

Which Framework or CMS for Google-Video like site?

I am working on a Web Project similar to Google-Video.
As for now, I want to start coding the site.
I know some PHP, HTML and MySQL.
I already have:
Database built and ready (in MySQL)
Links and Tags in the Database
The thing is, I don't want to code everything by hand.
As far as I've seen, with a CMS it's not possible to use my own database. Or am I wrong?
And which framework would you suggest?
Looking forward to your advice!
Thanks
You should probably start over, but use your existing DB design as your logical schema to be implemented in the CMS you eventually choose.
Go to http://cmsmatrix.org/ and compare Drupal, Joomla!, eZ Publish and TYPO3 for the best fit for your requirements.
Also, pay attention to the search engine features available with each one. For example, eZ Find for eZ Publish is based on Lucene.
In terms of functionality (but excluding ad management and your specific layout or graphic design), you should be able to create a reasonable clone within a few hours using eZ. Here is one example: http://untoldstories.eu/ezinfo/about