Solr and Unicode

I've installed and configured Solr and Tika, and tried indexing and searching documents. Until now, everything has worked fine. But there is a problem with Unicode (of course :)). I indexed a document with Unicode text about the Red Star (Црвена звезда) football club. When I search for this article and type "Црвена" or "звезда", Solr finds the correct document.
How can I create a synonym list for these words (or other Cyrillic words)? What do I have to do to enable Solr to find documents even if I type "звезде", "звезду", etc.?
Best,
Joksimovic

Solr provides an interface for synonyms as well as a SpellCheckComponent.
However, your problem is not really a Unicode problem: the exact words are already found, and what is missing is matching of inflected word forms.
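
As a rough illustration of what a synonym list does at query time, here is a minimal sketch in plain Python. It is not Solr's API; it only assumes the comma-separated "equivalent words on one line" format that Solr synonym files use. The names build_lookup and expand are hypothetical helpers made up for this example.

```python
# Toy illustration of synonym expansion (not Solr code).
# Each line lists words that should be treated as equivalent,
# in the comma-separated style used by Solr synonym files.
SYNONYM_LINES = [
    "звезда, звезде, звезду",
]


def build_lookup(lines):
    """Map every word to the full group of words it is equivalent to."""
    lookup = {}
    for line in lines:
        group = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in group:
            lookup[word] = set(group)
    return lookup


def expand(term, lookup):
    """Return all forms to search for; unknown terms map to themselves."""
    term = term.lower()
    return sorted(lookup.get(term, {term}))


lookup = build_lookup(SYNONYM_LINES)
print(expand("звезду", lookup))  # all three listed forms
print(expand("Црвена", lookup))  # not in the list, so only the lowercased term
```

Listing every inflected form by hand does not scale well, which is why a stemmer or lemmatizer for the language in question is usually the more practical route when one is available.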

Related

Elasticsearch vs MongoDB for full text search

This is a full text search question.
I have been using Elasticsearch for my logging system. I recently heard that MongoDB also supports full text search, so I tested its performance.
I created a text index and tested it.
I created 10 million documents from a vocabulary of 10,000 words.
Then I searched for two words (e.g. "apple pineapple").
The results were surprising: the MongoDB searches were faster.
Am I misunderstanding full text search in Elasticsearch? Did I do the test wrong?
In terms of full text search performance, is there no reason to use Elasticsearch?
Or am I misunderstanding full text search in general?
Please help me understand.
If your use case is full text search only, I would still lean towards Elasticsearch, as it is designed for exactly that. I admit, however, that I haven't explored MongoDB's capabilities in this regard. Elasticsearch provides various search options, such as fuzzy matching, proximity matching, phrase matching and more, which can be used depending on your use case.
Another difference is in how the two use memory: Elasticsearch stores its index on disk but leans heavily on the operating system's filesystem cache to serve it from memory, while MongoDB balances between its own cache and disk. So Elasticsearch should ideally be faster once the index is warm, but that is worth confirming with a load test.
As for your test, please make sure that the MongoDB and Elasticsearch clusters have equivalent resources. Otherwise it is not an apples-to-apples comparison.

How well do the search engines of databases like mongoDB and Postgres do compared to something like Elasticsearch? (in 2020)

I am working on a website, where users can create entries into a database. These entries are all of the same form, so I was using Sqlite (with FTS5) so far, which is the only database I know ;)
However it is crucial that users can search these entries on the website. The full text search is working decently well (the users know approximately what they are looking for) but I need two improvements:
1. Misspelled queries should still return the correct entry (I have seen the spellfix extension for SQLite for that, but I don't know how well it works).
2. More importantly, if a user enters a few query words on the website, I try to MATCH those with a SQL query. If a user enters too many words, it returns 0 matches.
For example, if a user inputs "sars covid 19" into the search bar:
CREATE VIRTUAL TABLE TEST USING FTS5(SomeText);
INSERT INTO TEST(SomeText) VALUES
('Covid 19');
SELECT SomeText
FROM TEST
WHERE SomeText MATCH 'sars covid 19';
=> 0 matches, but I would want it to return the 'covid 19' entry.
I realise that SQLite might be too simplistic for a database that needs to handle searches well. Now my question is: do Postgres or MongoDB have search engines that include the functionality I need? Or is it worth diving into solutions such as Elasticsearch?
Most articles I found on this are 5-10 years old, so I am wondering what the current state of affairs is regarding search engines in databases. Any hints are greatly appreciated.
The combination of Elasticsearch and MongoDB works well: you index and run full text search in Elasticsearch, and you keep the original documents, with some key fields indexed, in MongoDB...
Elasticsearch will work for sure. You only have to think about how you will index your documents, because you will be able to find them the way you indexed them. In your context, it seems that the default text field will work with a match query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
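
To make the match query concrete, here is a minimal sketch that only builds the JSON request body in Python; sending it to a cluster is left out. The field name some_text and the helper build_match_query are assumptions made up for this example. A match query combines the analyzed terms with OR by default, so a document containing only "covid" and "19" would still match "sars covid 19"; minimum_should_match is the query parameter that tightens this.

```python
import json


def build_match_query(field, user_input, minimum_should_match=None):
    """Build the body of an Elasticsearch match query as a plain dict."""
    clause = {"query": user_input}
    if minimum_should_match is not None:
        # Optional: require at least this many of the terms to match.
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}


# "some_text" is a hypothetical field name for this sketch.
body = build_match_query("some_text", "sars covid 19")
print(json.dumps(body, indent=2))

# A stricter variant: at least two of the three terms must match.
strict_body = build_match_query("some_text", "sars covid 19", minimum_should_match=2)
```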
MongoDB will work too in this simple case: https://docs.mongodb.com/manual/text-search/ , but MongoDB does not let you customize the tokenizer, so if you later need to improve your text search, MongoDB will be limiting.
PostgreSQL could do it using LIKE (it also has built-in full text search via tsvector and tsquery), but I am not familiar enough with it. With 10k entries it will certainly be fine; if you expect 1 million, MongoDB or Elasticsearch would be better.
If you have to choose between MongoDB and Elasticsearch, you will have to be more specific in your question. For full text search, Elasticsearch is really nice and has a lot of features; MongoDB offers some nice database tools too. Sometimes Elasticsearch will be better, sometimes MongoDB; it depends on what you need. If you only want full text search, Elasticsearch is a must.
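
For the "too many words" problem in the question, it may be enough to stay with SQLite: FTS5 treats space-separated terms as an implicit AND, so joining the terms with OR returns entries that contain any of them. The sketch below uses Python's sqlite3 module and assumes it was built with FTS5 enabled; fts_query is a hypothetical helper written for this example.

```python
import sqlite3


def fts_query(user_input, operator):
    """Quote each term and join with AND or OR for an FTS5 MATCH."""
    terms = user_input.split()
    quoted = ['"' + t.replace('"', '""') + '"' for t in terms]
    return (" " + operator + " ").join(quoted)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE test USING fts5(SomeText)")
conn.execute("INSERT INTO test(SomeText) VALUES ('Covid 19')")

sql = "SELECT SomeText FROM test WHERE test MATCH ? ORDER BY rank"

# All terms required: 'sars' is missing from the entry, so nothing matches.
and_rows = conn.execute(sql, (fts_query("sars covid 19", "AND"),)).fetchall()

# Any term is enough: the 'Covid 19' entry is returned.
or_rows = conn.execute(sql, (fts_query("sars covid 19", "OR"),)).fetchall()

print(and_rows)
print(or_rows)
```

Ordering by rank puts the entries that match more of the terms first, which softens the loss of precision that comes with OR. This does not address misspellings.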

autocomplete search with sphinx

I am looking for a very fast autocomplete solution for displaying results in mobile apps. I am using Sphinx as my full text index solution, but I wonder whether Sphinx is the best solution for autocomplete search, because after the index is searched, I still need to ask MySQL for the results. Is there a better and faster solution?
Well, you can use string attributes to store the actual text.
Then you don't need to go back to the database at all; you can just query Sphinx. Sphinx stores attributes in memory, so this doesn't noticeably slow down the actual Sphinx query.
Sphinx works well for autocomplete in my experience.
If you are running sphinx 2.0.2 or greater:
index_exact_words = 1
Sphinx supports wildcard searching. Have a look at the parameter "enable_star". If you set it to 1 and restart sphinx, you should be able to search using wildcards.
Check it out in the Sphinx docs.
To find matches where any word contains "micro", the search term needs to be "*micro*".

advanced searching mongodb using mongomapper, sunspot/solr or sphinx?

I am using MongoDB with MongoMapper to store all my products. Each product belongs to multiple categories that have many levels, i.e. category, subcategory, etc.
Each product has many search fields that are embedded documents in product.
All this is working and I now want to add search to the app.
The search system needs text search: multiple, dynamic, faceted search including min/max range search.
I have been looking into the sunspot gem but am having difficulty setting it up in development, let alone running it in production! I have also looked at Sphinx.
But I am wondering whether using just MongoMapper / MongoDB would be quick enough and the best way, as it's quite a complex search system?
Any help / suggestions / experiences / tutorials and examples on this would be most appreciated.
Thanks a lot,
Rick
I've been involved with a very large Sphinx-powered search and I think it's awful. It is very difficult to configure if you want anything beyond a very simple full text search. Solr/Lucene, on the other hand, is incredibly flexible and was unbelievably easier to set up and get running.
I am now using Solr in conjunction with MongoDB to power full text search with all the extra goodies, like facets, etc. Depending on how you configure Solr, you may not even need to hit MongoDB for data. Or you may tell Solr to index fields but not store them, and instead store just the ObjectIds that correspond to the data inside MongoDB.
If your search truly is a complex search system, I very strongly recommend that you do not use MongoDB for search and go with Solr. One big reason is that MongoDB doesn't have a full text feature; instead, it has regular expression matches. The regex matches work wonderfully but will only use indexes in certain cases (for example, case-sensitive patterns anchored at the start of the string).

Please advise an optimal solution to full text search in mongoDB

The documents in my database have names and descriptions among other fields. I would like to allow the users to search for those documents by providing some keywords. The keywords should be used to lookup in both the name and the description field. I've read the mongoDB documentation on full text search and it looks really nice and easy if I want to search for keywords in the name field of my documents. However, the description field contains free form text and can take up to 2000 characters, so potentially there are a few hundred words per document. I could treat them the same way as names and just split the whole description into separate words and store it as another tag-like array (as per the Mongo example), but it seems like a terrible idea - each document's size could be almost doubled, plus there are characters like dots, commas, etc.
I know there are specialized solutions for exactly this kind of problem, and I was just looking at Lucene.Net. I have also seen Solr mentioned here and there.
Should I be looking to implement this search feature in mongoDB or should I use a specialized solution? Currently I just have one instance of mongod and one instance of a web server. We might need to scale later, but for now that is all I use. I'd appreciate any suggestions on how to implement this feature.
If storing the text split out into an array per the documented approach is not viable (I can understand your concerns), then I think you should look into a specialised solution.
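
For reference, the array-based approach amounts to something like the following minimal sketch, which shows only the splitting step in plain Python. The field name _keywords, the sample document and the helper extract_keywords are assumptions made up for this example. Punctuation such as dots and commas is dropped by the tokenizing regex, and duplicates are removed, so the stored array is typically smaller than the raw word count suggests.

```python
import re


def extract_keywords(text):
    """Lowercase the text, drop punctuation, and return unique words."""
    words = re.findall(r"\w+", text.lower())
    return sorted(set(words))


# Hypothetical document with a name and a free-form description.
doc = {
    "name": "Red Star",
    "description": "Red Star is a football club. The club plays in Belgrade, Serbia.",
}

# Store the keywords next to the original fields; this array is what
# would be indexed and queried.
doc["_keywords"] = extract_keywords(doc["name"] + " " + doc["description"])
print(doc["_keywords"])
```

Whether the extra storage is acceptable depends on how repetitive the descriptions are. This sketch does no stemming or stop-word removal, which is where a dedicated engine starts to pay off.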
Quote from the MongoDB documentation:
"MongoDB has interesting functionality that makes certain search functions easy. That said, it is not a dedicated full text search engine."
So, for more advanced full text search functionality I think a dedicated engine would be more suited. I have no experience in this area so I can't offer much in the way of suggestions from here, other than what my thoughts would be if I was in the same boat:
How much work is involved in using a dedicated full text search engine instead of MongoDB's functionality?
Does that add more complexity, and is it worth it?
Would it be quicker and simpler to use MongoDB and just take the hit on the extra disk space?
Maybe MongoDB will support better full text functionality in the future (it is rapidly evolving, after all).
Full text search support is planned for the future. However, right now you have to go with Solr & friends. The built-in "fulltext" functionality is not really suitable for real-world usage.