How do you use Sphinx to search for URLs

I have millions of URLs in a MySQL database. I'm using Sphinx to index them, but Sphinx is having trouble with the periods in the URLs. How do you index URLs in Sphinx so that it handles the periods when I search?
I guess I just want it to ignore the periods?

Did you try adding the period to charset_table?
Hope this helps.
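To see why the treatment of the period matters, here is a toy sketch in Python. It is not Sphinx's tokenizer; the `tokenize` function and its `period` parameter are hypothetical and only illustrate three possible behaviours. As far as I know, "word_char" roughly corresponds to adding U+002E to charset_table, and "ignore" to listing it in ignore_chars, but check the Sphinx docs for your version.

```python
import re

def tokenize(text, period="separator"):
    """Toy tokenizer (NOT Sphinx's): lowercases and splits into tokens.

    period="separator": '.' splits tokens, like any other punctuation
    period="word_char": '.' is kept as part of a token
    period="ignore":    '.' is dropped before tokenizing
    """
    text = text.lower()
    if period == "separator":
        pattern = r"[a-z0-9_]+"
    elif period == "word_char":
        pattern = r"[a-z0-9_.]+"
    elif period == "ignore":
        text = text.replace(".", "")
        pattern = r"[a-z0-9_]+"
    else:
        raise ValueError("unknown period mode: " + period)
    return re.findall(pattern, text)

url = "http://www.example.com/page"
for mode in ("separator", "word_char", "ignore"):
    print(mode, tokenize(url, period=mode))
```

With "separator" the host is split into www, example and com; with "word_char" it stays one token, www.example.com; with "ignore" it collapses to wwwexamplecom. Whichever mode you pick has to apply both at index time and at query time, otherwise the tokens will not line up.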

Related

How well do the search engines of databases like mongoDB and Postgres do compared to something like Elasticsearch? (in 2020)

I am working on a website, where users can create entries into a database. These entries are all of the same form, so I was using Sqlite (with FTS5) so far, which is the only database I know ;)
However it is crucial that users can search these entries on the website. The full text search is working decently well (the users know approximately what they are looking for) but I need two improvements:
Misspelled queries should still return the correct entry (I have seen the spellfix extension for SQLite for that, but I don't know how well it works).
More importantly, when a user enters a few query words on the website, I try to MATCH them with an SQL query. If the user enters too many words, it returns 0 matches.
For example, if a user types "sars covid 19" into the search bar:
CREATE VIRTUAL TABLE TEST USING FTS5(SomeText);
INSERT INTO TEST(SomeText) VALUES
('Covid 19');
SELECT SomeText
FROM TEST
WHERE SomeText MATCH 'sars covid 19';
=> 0 matches, but I would want it to return the 'covid 19' entry.
I realise that sqlite might be too simplistic for a database that needs to handle searches well. Now my question is: Do Postgres or MongoDB have search engines that include the functionality that I need? Or is it worth diving into solutions with Elastic Search etc.?
Most articles I found on this are 5-10 years old, so I am wondering what the current state of affairs is regarding search engines in databases. Any hints are greatly appreciated
The combination of Elasticsearch and MongoDB works well: you index and run full-text search in Elasticsearch, and you keep the original documents, with some key fields indexed, in MongoDB.
Elasticsearch will work for sure. You only have to think about how you index your documents, because you can only find them the way you indexed them. In your case it seems that the default text field with a match query will work:
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
MongoDB will work too in this simple case: https://docs.mongodb.com/manual/text-search/ . However, MongoDB does not let you customize the tokenizer, so if you later need to improve your text search, MongoDB will be limiting.
PostgreSQL could do it using LIKE, but I am not familiar enough with it. With 10k entries it will be fine for sure; if you expect 1 million, MongoDB or Elasticsearch would be better.
If you have to choose between MongoDB and Elasticsearch, you need to be more specific in your question. For full text, Elasticsearch is really nice and has a lot of features; MongoDB offers some nice database tools too. Sometimes Elasticsearch will be better, sometimes MongoDB; it depends on what you need. If you only want full-text search, Elasticsearch is a must.
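For the FTS5 example in the question, the 0 matches come from FTS5 treating space-separated words as an implicit AND. A minimal sketch, assuming Python's sqlite3 module was built with FTS5, that joins the user's words with OR instead; `build_or_query` and `search` are hypothetical helper names, and the table is the one from the question:

```python
import sqlite3

def build_or_query(user_input):
    """Turn free-form input into an FTS5 query that matches ANY of the words."""
    terms = user_input.split()
    # Double-quote each term so FTS5 reads it as a plain string rather than
    # query syntax; an embedded double quote is escaped by doubling it.
    quoted = ['"' + t.replace('"', '""') + '"' for t in terms]
    return " OR ".join(quoted)

def search(conn, user_input):
    """Return matching SomeText values, best match first."""
    query = build_or_query(user_input)
    if not query:
        return []
    rows = conn.execute(
        "SELECT SomeText FROM TEST WHERE TEST MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE TEST USING FTS5(SomeText)")
conn.execute("INSERT INTO TEST(SomeText) VALUES ('Covid 19')")
print(search(conn, "sars covid 19"))
```

Here the query sent to FTS5 is "sars" OR "covid" OR "19", so the 'Covid 19' row should come back even though "sars" is missing from it. ORDER BY rank puts entries that match more of the words first. This does not address misspellings; that part still needs something like spellfix or a dedicated search engine.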

Using Mongodb to check existence of URL in small crawler

I'm using MongoDB for indexing URLs in a small crawler. The crawler holds at most about 500 million URLs. I want to look up URLs in the URL database to check whether they already exist, but MongoDB is very slow for this query:
db.hosts.find({URL:"http://myhost.com"})
My questions are:
What can I do to improve the search speed in MongoDB?
For my purpose, is Lucene better than MongoDB or not?
It's fairly well established in the documentation that the way to improve query performance is by adding an index to the field on which you are querying.
The amount of information about what you are doing is insufficient for anyone to tell if Lucene will be better than MongoDB.
Also, if you are searching your URL for an existing URL so that you don't add a duplicate, then what you want is to create a unique index.
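A minimal sketch of the index advice in Python, assuming the driver is pymongo and that `collection` is a pymongo Collection for the `hosts` collection from the question. The helper names `ensure_unique_url_index` and `url_exists` are hypothetical; the block itself does not import pymongo and only relies on the `create_index` and `find_one` methods.

```python
def ensure_unique_url_index(collection):
    """Create a unique ascending index on the URL field.

    Calling this again with the same definition is harmless. Building the
    index fails if the collection already contains duplicate URLs.
    """
    return collection.create_index([("URL", 1)], unique=True)

def url_exists(collection, url):
    """Exact-match lookup on URL; with the index in place this avoids a full scan."""
    # Project only _id so the server does not send the whole document back.
    return collection.find_one({"URL": url}, {"_id": 1}) is not None
```

With pymongo installed, usage would look roughly like `hosts = MongoClient()["crawler"]["hosts"]` (the database name `crawler` is made up here), then `ensure_unique_url_index(hosts)` once at startup and `url_exists(hosts, "http://myhost.com")` per lookup. With the unique index, inserting a duplicate URL is rejected by the server, so the crawler can also just insert and handle the duplicate-key error instead of checking first.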

autocomplete search with sphinx

I am looking for a very fast autocomplete solution for displaying results in mobile apps. I am using Sphinx as my full-text index, but I wonder whether Sphinx is the best solution for autocomplete search, because after the index is searched I still need to ask MySQL for the results. Is there a better and faster solution?
Well, you can use string attributes to store the actual text.
Then you don't need to go back to the database at all; you can just query Sphinx. Sphinx stores attributes in memory, so this doesn't noticeably slow down the actual Sphinx search.
Sphinx works well for autocomplete in my experience.
If you are running sphinx 2.0.2 or greater:
index_exact_words = 1
Sphinx supports wildcard searching. Have a look at the parameter "enable_star". If you set it to 1 and restart sphinx, you should be able to search using wildcards.
Check it out in the Sphinx docs.
To find matches where any word contains "micro", the search term needs to be "*micro*".

mongodb fulltext searching strategy

We are trying to develop a strategy for using Elasticsearch for full-text searching on our MongoDB instance. It would appear that every key that we want to use as a filter must be included in the Elasticsearch index. Potentially we could want to use every key in MongoDB as a filter, i.e. full-text search on description, filtered by date and telephone number. Does anyone have any real-world experience of adding full-text search to MongoDB that they can share?
Maybe we can just use elasticsearch as a db?
I do not see any reason to use Elasticsearch in conjunction with MongoDB; just use Elasticsearch as separate document storage for the documents that have to be searched. And yes, you can even use it as your whole database. Of course, it depends on your domain model and other factors.
If you don't need stemming, fuzzy search, or complicated wildcard search, you can do the search with MongoDB. When a new document is inserted, split it into lowercase words and add them to an array, "words" for example. Later you can search this array with a regex. Note that you can't use the i (ignore case) option in this regex, and you can only search with a trailing wildcard (like LIKE 'word%') or with no wildcard; otherwise the search will not use the MongoDB index.
One more option: you can try to find a river for MongoDB.
Another option is to use Lucene if you are using Java. You could probably extend the Directory class so that Lucene stores its index in MongoDB instead of the file system or RAM. I have not researched this, but I think it is possible.
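The words-array approach from this answer can be sketched in Python as follows. This only builds the stored array and the query filter, so it runs without a MongoDB server; the helper names `to_words` and `prefix_query` are hypothetical, and "words" is the array field name suggested above.

```python
import re

def to_words(text):
    """Lowercase the text and return its distinct words for the 'words' array."""
    return sorted(set(re.findall(r"\w+", text.lower())))

def prefix_query(prefix):
    """Build a MongoDB filter for documents having a word that starts with prefix.

    The regex is anchored with ^ and has no ignore-case option, which is the
    form that can make use of an index on 'words'.
    """
    return {"words": {"$regex": "^" + re.escape(prefix.lower())}}

doc = {"description": "Full-Text search in MongoDB"}
doc["words"] = to_words(doc["description"])
print(doc["words"])
print(prefix_query("Mongo"))
```

Because the words are lowercased when stored and the prefix is lowercased when queried, the search behaves case-insensitively without the i option. The filter would be passed to a find call on a collection that has an index on "words".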
I experimented with full text search in MongoDB by splitting the words in the string like @Umar suggested. Honestly though, it's a database and not a search engine, so I would use MongoDB for persistent storage and Elasticsearch for the search engine part of it. As a matter of fact, I would stick with something like PostgreSQL for persistent storage and then push the data you want to search out to the search engine. http://gdal.org/ogr/drv_elasticsearch.html is a driver that will allow you to quickly export your data from an RDBMS to Elasticsearch. The data does not have to be geospatial in order to use GDAL, as long as there is a way to connect to the input source.
Adam

advanced searching mongodb using mongomapper, sunspot/solr or sphinx?

I am using MongoDB with mongomapper to store all my products. Each product belongs to multiple categories that have many levels, i.e. category, sub category, etc.
Each product has many search fields that are embedded documents in product.
All this is working and I now want to add search to the app.
The search system needs text search: multiple, dynamic, faceted search including min/max range search.
I have been looking into the sunspot gem but am having difficulty setting it up in development, let alone running it in production! I have also looked at Sphinx.
But I am wondering whether using just mongomapper / MongoDB will be quick enough and the best way, as it's quite a complex search system?
Any help / suggestions / experiences / tutorials and examples on this would be most appreciated.
Thanks a lot,
Rick
I've been involved with a very large Sphinx-powered search and I think it's awful. It is very difficult to configure if you want anything beyond a very simple full-text search. Solr/Lucene, on the other hand, is incredibly flexible and was unbelievably easier to set up and get running.
I am now using Solr in conjunction with MongoDB to power full-text search with all the extra goodies, like facets. Depending on how you configure Solr, you may not even need to hit MongoDB for data. Or you may tell Solr to index fields but not store them, and instead store just the ObjectIds that correspond to the data inside MongoDB.
If your search truly is a complex search system, I very strongly recommend that you do not use MongoDB for search and go with Solr. One big reason is that MongoDB doesn't have a full-text feature; instead, it has regular expression matches. The regex matches work wonderfully but will only use indexes in certain cases.