I'm trying to implement an inverted-index search engine with MongoDB (MongoEngine), where terms in Posts are assigned weights and stored as embedded documents like this:
class Term(db.EmbeddedDocument):
    t = db.StringField()
    weight = db.FloatField()

class Post(db.Document):
    terms = db.ListField(db.EmbeddedDocumentField(Term))
Then given a term, I can find the Posts that contain the term using this query:
post_list = Post.objects(terms__t=term)
This returns a list of Posts, but how can I find the weight of the term for each returned Post without iterating through its list of embedded terms to look for the term? Is there a way to query the Posts so that the weight is returned automatically with each one?
I would also appreciate any suggestions for better ways of implementing a search engine in MongoDB.
Thanks!
MongoDB supports a basic text index; see http://docs.mongodb.org/manual/core/index-text/. This is a better way to store and search against documents, especially if you want a score for the match.
You'd have to call the command manually, as it's not currently implemented in MongoEngine.
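On the original question of getting the weight back with each post: one option is to let the server pick out the matching embedded term with an aggregation pipeline, rather than looping over `terms` in Python. The sketch below only builds the pipeline. The helper name `weight_pipeline` is made up for illustration, and the field names come from the `Term`/`Post` models in the question. The result is meant to be passed to an aggregate call such as pymongo's `collection.aggregate(...)`; it has not been run against this exact schema.

```python
def weight_pipeline(term):
    """Build an aggregation pipeline returning each matching post's weight for `term`."""
    return [
        # Keep only posts containing the term (this stage can use an index on terms.t).
        {"$match": {"terms.t": term}},
        # Emit one document per embedded term.
        {"$unwind": "$terms"},
        # Keep only the embedded term that was searched for.
        {"$match": {"terms.t": term}},
        # Return the post _id (included by default) and the weight.
        {"$project": {"weight": "$terms.weight"}},
        # Highest-weighted posts first.
        {"$sort": {"weight": -1}},
    ]


pipeline = weight_pipeline("mongodb")
# e.g. results = some_pymongo_collection.aggregate(pipeline)  # hypothetical collection
```

Each result document would then carry the post `_id` and a single `weight` field, so no client-side scan of the embedded list is needed.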
Querying MongoDB: retrieve shops by name and by location with one single query
Hi folks!
I'm building a "search shops" application using MEAN Stack.
I store shop documents in a MongoDB "location" collection like this:
{
_id: .....
name: ...//shop name
location : //...GEOJson
}
The UI gives users one single input for searching shops. Basically, I would like to perform one single query that returns, in the same results array:
All shops near the user (eventually limit to x)
All shops named "like" the input value
Logically, I think this is an "$or"-like query.
Based on this answer
Using full text search with geospatial index on Mongodb
assigning two special indexes (2dsphere and full text) to the collection is probably not the right way to achieve this. Still, I think this is a different case, because I don't want to apply a sequential filter to the results; I "simply" want to retrieve data by two distinct criteria.
If I do set indexes on my collection, the obvious approach is to perform two distinct queries with two distinct methods ($near for locations and $text for names), then merge the results with some server-side logic to remove duplicate documents and sort them in a way that is useful for the user experience. I'm still wondering whether there is a way to achieve this with one single query.
So, the question is: is this possible, or is this kind of approach outside MongoDB's purpose?
Hope this is clear, and hope that someone can teach me something today!
Thanks
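The two-query approach described above (one $near query, one $text query, merged on the server) could be sketched like this. The merge step is shown in Python for brevity even though the question uses the MEAN stack. `merge_shop_results` and the sample documents are hypothetical, and the ordering rule (nearby shops first, then name matches) is just one possible choice.

```python
def merge_shop_results(nearby, by_name, limit=None):
    """Merge two result lists, dropping duplicates by _id and keeping first-seen order."""
    seen = set()
    merged = []
    for shop in list(nearby) + list(by_name):
        if shop["_id"] in seen:
            continue  # already included from the other query
        seen.add(shop["_id"])
        merged.append(shop)
    return merged if limit is None else merged[:limit]


# Stand-ins for the results of the $near query and the $text query.
nearby = [{"_id": 1, "name": "Corner Bakery"}, {"_id": 2, "name": "Book Nook"}]
by_name = [{"_id": 2, "name": "Book Nook"}, {"_id": 3, "name": "Bookworm"}]

merged = merge_shop_results(nearby, by_name)
print([shop["_id"] for shop in merged])  # [1, 2, 3]
```

Because the $near results come back sorted by distance, keeping first-seen order preserves that ranking for the nearby shops.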
The Firestore documentation on making queries includes examples where you can filter a collection of documents based on whether some field in the document either matches, is less than, or is greater than some value you pass in. For example:
db.collection("cities").whereField("population", isLessThan: 100000)
This will return every "city" whose "population" is less than 100000. This type of query can be made on fields of type String as well.
db.collection("cities").whereField("name", isGreaterThanOrEqualTo: "San Francisco")
I don't see a method to perform a substring search. For example, this is not available:
db.collection("cities").whereField("name", beginsWith: "San")
I suppose I could add something like this myself using greaterThan and lessThan but I wanted to check first:
Why doesn't this functionality exist?
My fear is that it doesn't exist because the performance would be terrible.
[Googler here] You are correct: there are no string operations like beginsWith or contains in Cloud Firestore. You will have to approximate your query using greater-than and less-than comparisons.
You say "it doesn't exist because the performance would be terrible", and while I wouldn't use those exact words, you are right: the reason is performance.
All Cloud Firestore queries must hit an index. This is how we can guarantee that the performance of any query scales with the size of the result set even as the data set grows. We don't currently index string data in a way that would make it easy to service the queries you want.
Full text search operations are one of the top Cloud Firestore feature requests so we're certainly looking into it.
Right now if you want to do full text search or similar operations we recommend integrating with an external service, and we provide some guidance on how to do so:
https://firebase.google.com/docs/firestore/solutions/search
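It is possible now, using a range whose upper bound is the prefix with its last character incremented ('n' becomes 'o'):
db.collection('cities')
.where('name', '>=', 'San')
.where('name', '<', 'Sao');
For more details, see "Firestore query documents startsWith a string".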
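To see why the range trick works, here is a small sketch that computes the bounds for a prefix and applies them to an in-memory list. The upper bound has to sort after every string that starts with the prefix, which is why the last character is bumped up rather than down. It is written in Python rather than against the Firestore SDK; `prefix_bounds` is a hypothetical helper, and it assumes a non-empty prefix whose last character is not the highest possible code point.

```python
def prefix_bounds(prefix):
    """Return (lower, upper) such that lower <= s < upper selects strings starting with prefix."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Upper bound: the same prefix with its last character bumped to the next code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


lower, upper = prefix_bounds("San")  # ("San", "Sao")
cities = ["Sacramento", "San Diego", "San Francisco", "Santa Fe", "Seattle"]
matches = [name for name in cities if lower <= name < upper]
print(matches)  # ['San Diego', 'San Francisco', 'Santa Fe']
```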
(MongoDB Full Text Search)
Hello,
I have added some fields to a text index, and this is how I search for a search_keyword:
BasicDBObject search = new BasicDBObject("$search", "search_keyword");
BasicDBObject textSearch = new BasicDBObject("$text", search);
DBCursor cursor = users.find(textSearch);
I don't want search_keyword to be searched in all the fields specified in the index. *Is there any way to restrict the search to specific fields from the index?*
If so, please give me some idea how to do it in Java.
Thank you.
If you want to index a single field and search only that field, that is how it works by default. Let's say you index the field companyName. When you perform a $text search on this collection, only the data from the companyName field will be used, because that is the only field included in your index.
Now the second scenario: your text index includes more than one field. In this case you cannot limit the search to values indexed from a specific field. The text index is built at the collection level, and a collection can have at most one text index. To limit the search to a specific field in this case, your option may be to use a regex on that field instead.
MongoDB has the flexibility to fulfil requirements of other scenarios, but you can also evaluate using other technologies if your application is heavily search-driven and you are primarily after a full-text search engine for locating documents by keyword with a rich query syntax. ElasticSearch might be an alternative here. It really depends on the type of the application and your needs.
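As a rough illustration of the regex fallback mentioned above, the filter below restricts a keyword search to one field. It is a Python sketch that only builds the query document; `field_regex_filter` is a made-up helper, and `companyName` is the example field from this answer. Keep in mind that an unanchored, case-insensitive regex generally cannot use an index efficiently.

```python
import re


def field_regex_filter(field, keyword, ignore_case=True):
    """Build a MongoDB filter matching `keyword` anywhere in a single field."""
    # Escape the keyword so regex metacharacters in it are matched literally.
    condition = {"$regex": re.escape(keyword)}
    if ignore_case:
        condition["$options"] = "i"
    return {field: condition}


query = field_regex_filter("companyName", "acme")
# e.g. cursor = some_pymongo_collection.find(query)  # hypothetical collection
print(query)  # {'companyName': {'$regex': 'acme', '$options': 'i'}}
```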
Since it is not possible to find "blueberry" by searching for "blue" with a MongoDB full text search, I want to help my users complete the word "blue" to "blueberry". To do so, is it possible to query all the words in a MongoDB full text index, so that I can use them as suggestions, e.g. for typeahead.js?
Language stemming in text search uses an algorithm to try to relate words derived from a common base (eg. "running" should match "run"). This is different from the prefix match (eg. "blue" matching "blueberry") that you want to implement for an autocomplete feature.
To most effectively use typeahead.js with MongoDB text search I would suggest focusing on the prefetch support in typeahead:
Create a keywords collection which has the common words (perhaps with usage frequency count) used in your collection. You could create this collection by running a Map/Reduce across the collection you have the text search index on, and keep the word list up to date using a periodic Incremental Map/Reduce as new documents are added.
Have your application generate a JSON document from the keywords collection with the unique keywords (perhaps limited to "popular" keywords based on word frequency to keep the list manageable/relevant).
You can then use the generated keywords JSON for client-side autocomplete with typeahead's prefetch feature:
$('.mysearch .typeahead').typeahead({
  name: 'mysearch',
  prefetch: '/data/keywords.json'
});
typeahead.js will cache the prefetch JSON data in localStorage for client-side searches. When the search form is submitted, your application can use the server-side MongoDB text search to return the full results in relevance order.
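The keywords step above can be prototyped without Map/Reduce to check the shape of the prefetch file. This Python sketch counts words across some sample texts and emits a JSON array of the most frequent ones. `build_keywords_json`, the tokenizing regex, and the sample texts are all assumptions; a real setup would read from the collection instead.

```python
import json
import re
from collections import Counter


def build_keywords_json(texts, max_keywords=1000, min_length=3):
    """Return a JSON array of the most frequent words, for use as a prefetch list."""
    counts = Counter()
    for text in texts:
        # Lowercase, then split on anything that is not a letter or digit.
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if len(w) >= min_length)
    keywords = [word for word, _ in counts.most_common(max_keywords)]
    return json.dumps(keywords)


texts = ["Blueberry pie with blueberry jam", "Blue cheese and blueberry salad"]
print(build_keywords_json(texts, max_keywords=3))
```

Limiting the list by frequency keeps the file small, which matters because the whole list is downloaded to the browser.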
A simple workaround I am using right now is to break the text into individual characters and store them as a text-indexed array.
Then, when you run the $search query, you break the query up into characters in the same way.
Please note that this only works for short strings, say shorter than 32 characters. Otherwise building the index takes a very long time, and insert performance drops significantly.
You cannot query for all the words in the index, but you can of course query the original documents' fields. Also, the words in the search index are not always full words, because they are stemmed. So you probably wouldn't find "blueberry" in the index, only "blueberri".
Don't know if this might be useful to some new people facing this problem.
Depending on the size of your collection and how much RAM you have available, you can search with $regex after creating a suitable index. For example:
db.collection.find( {query : {$regex: /querywords/}}).sort({'criteria': -1}).limit(limit)
You would need an index as follows (createIndex replaces the older ensureIndex in current MongoDB versions):
db.collection.createIndex( { "query": 1, "criteria" : -1 } )
This could be really fast if you have enough memory.
Hope this helps.
For those who have not yet committed to a database architecture and are here looking for a solution, consider Elasticsearch. It's a JSON document store, structurally similar to MongoDB. It has an "edge-ngram" analyzer that is very efficient and quick at producing "did you mean" suggestions for misspelled searches. You can also search on partial words.
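To make the edge n-gram idea concrete: every leading fragment of a word is indexed, so a partial query such as "blue" matches "blueberry" by exact lookup. The function below is a plain Python illustration of that tokenization, not Elasticsearch's implementation; the name `edge_ngrams` and the default lengths are assumptions.

```python
def edge_ngrams(word, min_gram=2, max_gram=10):
    """Return the prefixes of `word` with lengths from min_gram up to max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


grams = edge_ngrams("blueberry")
print(grams)
# ['bl', 'blu', 'blue', 'blueb', 'bluebe', 'blueber', 'blueberr', 'blueberry']
print("blue" in grams)  # True
```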
Is there a 'like' statement in MongoDB/mongoose, as there is in SQL?
My only reason for wanting it is to implement full-text search.
MongoDB supports regular expressions.
Here are two ways to use a regular expression with the Java API for MongoDB:
Expression 1:
To find documents where a field starts with a given string:
String strpattern = "your regular expression";
Pattern p = Pattern.compile(strpattern);
BasicDBObject obj = new BasicDBObject();
obj.put("key", p);
Example (suppose I want to find documents where name starts with 'a'):
String strpattern = "^a";
Pattern p = Pattern.compile(strpattern);
BasicDBObject obj = new BasicDBObject();
obj.put("name", p);
Expression 2:
To search for any of several words in one field, you can use the following:
String pattern = "\\b(one|how many)\\b";
BasicDBObject obj = new BasicDBObject();
// Pattern.CASE_INSENSITIVE ignores case; omit the flag for a case-sensitive match
obj.put("message", Pattern.compile(pattern, Pattern.CASE_INSENSITIVE));
Although you can use regular expressions, you won't get good performance on full text search because a contains query such as /text/ cannot use an index.
A begins-with query such as /^text/ can, however, use an index.
If you are considering full text search on any large scale please consider MongoDB's Multi Key Search feature.
Update
As of MongoDB v3.2 you can also use a text index.
Consider making a script at application level that transforms your data to tokens (words). Then treat tokens as tags, build an index on those tokens, and then search the tokens like searching for tags. This is like creating an inverted index.
For way better search capabilities on text consider using Lucene instead of MongoDB.
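The tokenize-and-tag idea can be sketched in a few lines. This Python example builds an in-memory inverted index that maps each token to the ids of the documents containing it. In MongoDB the same effect would come from storing the token list in an array field and indexing that field. The names `tokenize` and `build_inverted_index` and the sample documents are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


docs = {1: "MongoDB text search", 2: "Full text search with Lucene"}
index = build_inverted_index(docs)
print(sorted(index["search"]))  # [1, 2]
print(sorted(index["lucene"]))  # [2]
```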
Use regex: /^textforsearch$/i. Just don't forget to say goodbye to performance :).
Simulating LIKE with a regex search will not use a custom-defined index. Only starts-with patterns are supported for now.