Is there a way to use OrientDB for efficient retrieval of information stored as alphanumeric strings (something like PatriciaTrie from Apache Commons)?
For example, searching for "st*" should return all words starting with "st".
You can use a Lucene index for that:
http://orientdb.com/docs/2.1/Full-Text-Index.html
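To illustrate what a prefix query such as "st*" does over an ordered index (the general idea behind a trie or a sorted term dictionary), here is a minimal in-memory sketch in Python. It is not OrientDB or Lucene code; the function name prefix_search and the sample words are made up for illustration.

```python
import bisect

def prefix_search(sorted_words, prefix):
    """Return all words in an already-sorted list that start with prefix."""
    # Binary search for the first position whose word is >= prefix.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        matches.append(word)
    return matches

# Hypothetical sample data.
words = sorted(["star", "stone", "apple", "stack", "sun", "tree"])
print(prefix_search(words, "st"))  # ['stack', 'star', 'stone']
```

Because the words are kept sorted, all matches for a prefix sit next to each other, so the lookup only touches the matching range instead of scanning every word.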
I have a series of text data (tweets) that needs to be indexed on 3 attributes. I wanted to use Redis for this, as the response time has to be fast. Can anyone suggest how to go about this? Or should I go with MongoDB?
In most cases, with Redis you'll need to maintain an index for each attribute you want to search on. Here's a simple example. Let's say that you store your tweets in hashes, e.g.:
HMSET tweet:<id> text <tweet text> time <timestamp> ...
To create an index on your tweets' timestamps, you'll need to maintain a sorted set with the timestamp as score and the tweet's id as the value:
ZADD _tweet:time <timestamp> <id>
This will allow you to search for certain tweets in a given time period with ZRANGEBYSCORE.
Note that you'll also have to take care of maintaining the index yourself, updating it whenever a tweet is modified or deleted. You'd also need to repeat this approach for any additional indices. If you're looking for more material, here are some slides on the subject: http://www.slideshare.net/itamarhaber/20140922-redis-tlv-redis-indices
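The bookkeeping described in this answer can be sketched without a Redis server. The following Python class is only an in-memory stand-in for the pattern (a dict plays the role of the tweet:<id> hashes, a sorted list plays the role of the _tweet:time sorted set); the class and method names are hypothetical, and tweet ids are assumed to be integers.

```python
import bisect

class TweetStore:
    """In-memory stand-in for the Redis pattern: hashes plus a sorted-set index."""

    def __init__(self):
        self.tweets = {}      # plays the role of the tweet:<id> hashes
        self.time_index = []  # sorted (timestamp, id) pairs, like _tweet:time

    def add(self, tweet_id, text, timestamp):
        self.delete(tweet_id)  # drop any stale index entry before re-adding
        self.tweets[tweet_id] = {"text": text, "time": timestamp}
        bisect.insort(self.time_index, (timestamp, tweet_id))  # like ZADD

    def delete(self, tweet_id):
        old = self.tweets.pop(tweet_id, None)
        if old is not None:
            self.time_index.remove((old["time"], tweet_id))  # like ZREM

    def ids_between(self, min_time, max_time):
        # Like ZRANGEBYSCORE _tweet:time min max (both bounds inclusive).
        lo = bisect.bisect_left(self.time_index, (min_time,))
        hi = bisect.bisect_right(self.time_index, (max_time, float("inf")))
        return [tweet_id for _, tweet_id in self.time_index[lo:hi]]

store = TweetStore()
store.add(1, "hello", 100)
store.add(2, "world", 200)
store.add(3, "again", 300)
print(store.ids_between(100, 200))  # [1, 2]
```

The point of the sketch is that every write touches both the record and the index; with several indexed attributes, each one needs its own index structure kept in step in the same way.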
I'm a little bit confused about the term "index"... If you want to search in (JSON) data, I think ElasticSearch (http://www.elasticsearch.org/) would be the much better choice.
If your use case is mostly about geo information, have a look at:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-filter.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html
I think building this in Redis would give you a hard time. Don't get me wrong, I love Redis and I'm a huge advocate, but I think it's the wrong tool for what you apparently want to build.
There's even a plugin for ElasticSearch which gathers the tweets for you:
https://github.com/elasticsearch/elasticsearch-river-twitter
I have read a few articles recently on the combination of mongodb for storage and elasticsearch for indexing/search. I feel like I'm missing something though. Why would you go this route as opposed to just using mongo to index the data? What benefits does elasticsearch bring and is it worth the added complexity?
ElasticSearch implements a lot more features, such as custom splitting of text into words, custom stemming, faceted search and a whole lot more. While MongoDB's (rather simple) text search does some of this, it is not nearly as powerful as ElasticSearch.
If all you ever do is look for a single string in a single field, then MongoDB's normal query system will work excellently for that. If you need to look for words in multiple fields, then MongoDB's text search will work. If you need anything more than that, ElasticSearch is the way to go.
A search engine and a database do some fundamentally different things. A good search engine (like ElasticSearch) supports far more elaborate and complex indexing, facets, highlighting etc. In the case of ElasticSearch, you also get your replies 'real-time'. On the other hand, a search engine doesn't return every single document that matches your query. Instead, it will score documents according to how much they match, and return the top scoring ones. When you query a database such as MongoDB, you should expect it to return everything that matches your query.
You can store the entire document in ElasticSearch, but it is usually not the optimal solution. Normally you will have it configured to return the document IDs, which you then use to fetch the documents from a database. MongoDB is a database optimized for document-based storage. This is why you hear about people using them together.
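The division of labour described here (the search engine scores documents and returns only the top IDs, and the database returns the full documents for those IDs) can be sketched in a few lines of Python. This is a toy stand-in, not ElasticSearch or MongoDB code: the sample data, search_ids and fetch are all hypothetical, and for brevity the "engine" scans the same dict instead of keeping its own index.

```python
from collections import Counter

# Stand-in for the database: id -> full document (hypothetical data).
database = {
    1: {"title": "Redis indexes", "body": "sorted sets make a simple index"},
    2: {"title": "Search basics", "body": "a search engine scores and ranks documents"},
    3: {"title": "Search tuning", "body": "search relevance and search scoring"},
}

def search_ids(query, limit=2):
    """Toy search engine: score by term frequency, return only the top ids."""
    terms = query.lower().split()
    scores = {}
    for doc_id, doc in database.items():
        words = Counter((doc["title"] + " " + doc["body"]).lower().split())
        score = sum(words[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by id so the order is stable.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:limit]

def fetch(ids):
    """Database side: return the full documents for the given ids."""
    return [database[i] for i in ids]

top_ids = search_ids("search")
print(top_ids)                             # [3, 2]
print([d["title"] for d in fetch(top_ids)])
```

Note how the limit cuts off lower-scoring matches, whereas a plain database query would be expected to return every matching document.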
edit:
When this was posted, it matched the recommendations, but this may no longer be the case.
Derick's answer pretty much nails it. The question behind all this is:
What are the features you want to implement in your application?
If you rely on heavy searching capabilities in large chunks of text, ElasticSearch is probably a good thing to use. If you want to have a flexible datastore that can cope with complex ad-hoc queries, Mongo might be a good fit. If you have different requirements for a datastore, it is often a good thing to combine two tools instead of implementing all kinds of workarounds to make it work with just one datastore.
Choose the right tool for the job.
I am really new to programming, but I am studying it. I have one problem which I don't know how to solve.
I have a collection of docs in MongoDB and I'm using Elasticsearch to query the fields. The problem is that I want to store the output of a search back in MongoDB, but in a different DB. I know that I have to create a temporary DB which has to be updated with every search result. But how do I do this? Or could you point me to documentation to read so I can learn it? I would really appreciate your help!
Mongo does not natively support "temp" collections.
A typical approach here is not to write the entire results output to another DB at all. That would be pointless: Elasticsearch does its own caching, so you don't need another layer on top of it.
Also, due to IO concerns, it is normally a bad idea to write, say, a result set of 10k records to Mongo or another DB.
There is a feature request for what you describe: https://jira.mongodb.org/browse/SERVER-3215 but it has not been planned yet.
Example
You could have a table of results.
Within this table you would have a doc that looks like:
{keywords: ['bok', 'mongodb']}
Each time you search and scroll through the result items, you would write a row to this table, populating the keywords field with keywords from that search result. That is one row per result item, for every result list of every search. It would probably be best to just stream each search result to MongoDB as it comes in. I have never programmed in Python (though I wish to learn), so here is a rough example in JavaScript-like pseudocode:
// One entry per Elasticsearch hit; the "text" field is a placeholder for whatever your hits contain.
var elastic_results = [{text: 'elasticresult'}];
elastic_results.forEach(function (result) {
    // Split the phrases in this result into a keywords array.
    var keywords = result.text.toLowerCase().split(/\W+/).filter(Boolean);
    // Lazy insert, one document per result: no batching, just stream it in.
    db.results_collection.insert({keywords: keywords});
});
So as you go through your results you basically just insert as fast as possible, creating a sort of "stream" of input to MongoDB. It can handle this quite well.
This should then give you a shardable list of words and verbs that you can run things like MapReduce jobs on to aggregate statistics about them.
Without knowing more about your scenario, this is pretty much my best answer.
This does not use the temp table concept but instead makes your data permanent, which sounds fine since you wish to use Mongo as a storage engine for further tasks.
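Since the question comes from a Python user, here is a hedged Python sketch of the step the pseudocode earlier in this answer leaves open: turning each search hit into a keywords document before inserting it. The function names and the tokenizing rule (lowercase runs of letters and digits) are assumptions for illustration; the actual insert is only indicated in a comment because it needs a running MongoDB.

```python
import re

def result_to_doc(result_text):
    """Turn one search hit's text into a {"keywords": [...]} document."""
    words = re.findall(r"[a-z0-9]+", result_text.lower())
    # Drop duplicates while keeping first-occurrence order.
    keywords = list(dict.fromkeys(words))
    return {"keywords": keywords}

def stream_docs(results):
    """Yield one document per hit, so they can be inserted as they arrive."""
    for text in results:
        yield result_to_doc(text)

# Hypothetical hits; in practice these would come from the Elasticsearch response.
hits = ["Book on MongoDB, a MongoDB book.", "Streaming results"]
for doc in stream_docs(hits):
    # With pymongo this is where collection.insert_one(doc) would go.
    print(doc)
```

Using a generator keeps the "stream it in" idea: each document is produced and can be written on its own, without first building the whole result list in memory.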
Actually, there is a MongoDB river plugin that works with Elasticsearch...
db.your_table.find().forEach(function(doc) { db.another_table.insert(doc); } );
We are trying to develop a strategy for using Elasticsearch for full-text searching on our MongoDB instance. It would appear that every key that we want to use as a filter must be included in Elasticsearch's index. Potentially we could want to use every key in Mongo as a filter, e.g. full-text search on description, filtered by date and telephone number. Does anyone have any real-world experience of adding full-text search to Mongo that they can share?
Maybe we can just use elasticsearch as a db?
I do not see any reason to use ElasticSearch in conjunction with MongoDB; just use ElasticSearch as separate document storage for the documents that have to be searched. And yes, you can even use it as your whole database. Of course, it depends on your domain model and other factors.
If you don't need stemming, fuzzy search, or complicated wildcard search, you can do the search with MongoDB. When a new document is inserted, split it into lowercase words and add them to an array field, "words" for example. Later you can search against this array with a regex. Note that you can't use the i (ignore case) option in this regex, and you can only search with a trailing wildcard (as in LIKE 'abc%') or without a wildcard; otherwise the search would not use the MongoDB index.
One more option: you can try to find a river for MongoDB.
Another option is to use Lucene if you are using Java. You will probably be able to extend the Directory class in such a way that Lucene stores its index in MongoDB instead of the file system or RAM. I have not done any research in this area, but I think it is possible.
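A minimal Python sketch of the "words" array approach described above, under stated assumptions: to_words, prefix_query and matches are hypothetical helper names, the tokenizing rule is a guess at "split it to words in lower case", and matches is only a local stand-in for what the server would do. The query dict uses MongoDB's $regex operator with an anchored, case-sensitive pattern, which is the prefix-only shape the answer says can use the index.

```python
import re

def to_words(text):
    """Split text into unique lowercase words for the "words" array field."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

def prefix_query(prefix):
    # Anchored with ^ and no ignore-case option: prefix search only.
    # The prefix is lowercased because the stored words are lowercase.
    return {"words": {"$regex": "^" + re.escape(prefix.lower())}}

def matches(doc, query):
    """Local stand-in for the server-side match, for illustration only."""
    pattern = re.compile(query["words"]["$regex"])
    return any(pattern.match(word) for word in doc["words"])

doc = {"words": to_words("Stone Street, Stack overflow")}
print(doc["words"])        # ['overflow', 'stack', 'stone', 'street']
print(prefix_query("St"))  # {'words': {'$regex': '^st'}}
print(matches(doc, prefix_query("St")))
```

Lowercasing on both the write side and the query side is what replaces the ignore-case option, and a pattern like "ver" (which would only match in the middle of "overflow") deliberately finds nothing.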
I experimented with full text search in MongoDB by splitting the words in the string like @Umar suggested. Honestly though, it's a database and not a search engine, so I would use Mongo for persistent storage and ElasticSearch for the search engine part of it. As a matter of fact, I would stick with something like PostgreSQL for persistent storage and then push the data you want to search out to the search engine. http://gdal.org/ogr/drv_elasticsearch.html is a driver that will allow you to quickly export your data from an RDBMS to ElasticSearch. The data does not have to be geospatial in order to use GDAL, as long as there is a way to connect to the input source.
Adam
The documents in my database have names and descriptions among other fields. I would like to allow the users to search for those documents by providing some keywords. The keywords should be used to lookup in both the name and the description field. I've read the mongoDB documentation on full text search and it looks really nice and easy if I want to search for keywords in the name field of my documents. However, the description field contains free form text and can take up to 2000 characters, so potentially there are a few hundred words per document. I could treat them the same way as names and just split the whole description into separate words and store it as another tag-like array (as per the Mongo example), but it seems like a terrible idea - each document's size could be almost doubled, plus there are characters like dots, commas, etc.
I know there are specialized solutions for exactly this kind of problem and I was just looking at Lucene.Net. I also saw Solr mentioned here and there.
Should I be looking to implement this search feature in mongoDB or should I use a specialized solution? Currently I just have one instance of mongod and one instance of a web server. We might need to scale later, but for now that is all I use. I'd appreciate any suggestions on how to implement this feature.
If storing the text split out into an array per the documented approach is not viable (I can understand your concerns), then I think you should look into a specialised solution.
Quote from the MongoDB documentation:
"MongoDB has interesting functionality that makes certain search functions easy. That said, it is not a dedicated full text search engine."
So, for more advanced full text search functionality I think a dedicated engine would be better suited. I have no experience in this area so I can't offer much in the way of suggestions from here, other than what my thoughts would be if I were in the same boat:
How much work is involved in using a dedicated full-text search engine instead of MongoDB's functionality?
Does that add more complexity, and is it worth it?
Would it be quicker/simpler to use MongoDB and just take the hit on the extra disk space?
Maybe MongoDB will support better full-text functionality in the future (it is rapidly evolving, after all).
Full-text search support is planned for the future. However, right now you have to go with Solr & friends. The built-in "fulltext" functionality is not really suitable for real-world usage.