I would like to exclude images from search results with the indexed_search plugin. I've excluded the file types from indexing under the plugin settings, but I'm not sure whether that also affects all images that were already indexed.
As far as I know, this setting only affects indexing, not the search results.
So you need to remove all images that are already indexed; it may be easier to delete the whole index and rebuild it from scratch.
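If you do rebuild, one way is to truncate indexed_search's tables so everything gets re-indexed on the next run. A sketch (the exact table list depends on your TYPO3 version, so verify it first):

```php
// Truncate indexed_search's tables so the index is rebuilt from scratch
// on the next indexing run (core indexed_search table names; check your
// TYPO3 version for the complete list).
$tables = array('index_phash', 'index_fulltext', 'index_rel',
                'index_words', 'index_section', 'index_grlist');
foreach ($tables as $table) {
    $GLOBALS['TYPO3_DB']->sql_query('TRUNCATE ' . $table);
}
```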
I have a bunch of HTML pages. The idea is to let users enter keywords to search through these pages; only the HTML pages that match the criteria will be stored for later reference. I know that Elasticsearch can index HTML, PDF, and more, but in my case I already have PostgreSQL as my database, and my system is small enough that I don't want Elasticsearch as an extra dependency for this project.
A few issues I have here:
Because the HTML won't be stored unless it matches the users' keywords, is there a better approach than first indexing the HTML in the search engine just to be able to search it, and then removing it afterward if it doesn't match the criteria?
Also, is it possible to index whole HTML content, as in Elasticsearch? For context, a sketch of what I had in mind with plain PostgreSQL follows below.
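```php
// Minimal sketch: PostgreSQL's built-in full-text search over the stored
// HTML. The pages table and its columns are made up for illustration.
$pdo  = new PDO('pgsql:host=localhost;dbname=myapp', 'user', 'secret');
$stmt = $pdo->prepare(
    "SELECT id, url
       FROM pages
      WHERE to_tsvector('english', html) @@ plainto_tsquery('english', :q)"
);
$stmt->execute(array(':q' => $keywords));
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);
```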
Thanks a lot for your help!
I am looking for a document-oriented database solution (MongoDB preferred) to index a continuously growing and frequently changing set of (pandoc) markdown files.
I read that MongoDB has a built-in text index, but I have not worked with MongoDB before, and the only related thing I found was an indexing process for preprocessed HTML. The scenario I am thinking about is: automatic indexing of the markdown files, where the markdown syntax is used to create keys (for example, ## FOO -> header2: FOO) and where the hierarchical structure of the key/value pairs is preserved as it appears in the document.
Is this possible with MongoDB alone, or do I always need a preprocessing step in which I transform the markdown into something like a BSON document and then ingest it into MongoDB?
Why do you want to use MongoDB for this? I think Elasticsearch is a much better fit for the purpose; it's basically built for indexing text. However, the same as with MongoDB, you won't get anything automatic: you will need to process each document before saving it if you want to improve the precision of finding documents. The whole document needs to be sent to Elasticsearch as a JSON object, but you can also store the whole unprocessed markdown text in a property.
I'm not sure about MongoDB's full-text indexes, but Elasticsearch also combines all indexed properties of a document for full-text search. Additionally, you can define the relative importance of different properties in your index; for instance, the title might be weighted more heavily than the rest of the text.
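A sketch with the official elasticsearch-php client (the index name, field names, and the $fileId/$title/$plainText/$markdown variables are only examples):

```php
$client = Elasticsearch\ClientBuilder::create()->build();

// Index one markdown file: extracted fields plus the raw source.
$client->index([
    'index' => 'docs',
    'id'    => $fileId,          // e.g. a hash of the file path
    'body'  => [
        'title'  => $title,      // e.g. the first "# ..." heading
        'body'   => $plainText,  // markdown reduced to plain text
        'raw_md' => $markdown,   // the unprocessed source
    ],
]);

// Search with the title weighted three times as heavily as the body.
$results = $client->search([
    'index' => 'docs',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => $userQuery,
                'fields' => ['title^3', 'body'],
            ],
        ],
    ],
]);
```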
I have an index with around 5 million documents that I am trying to do a "contains" search on. I know how to accomplish this, and I have explained the performance cost to the customer, but that is what they want. As expected, doing a "contains" search across the entire index is very slow, but sometimes I only want to search a very small subset of the index (say, 100 documents or so). I've done this by adding a Filter to the search, which should limit the results correctly. However, I find that this filtered search and the entire-index search perform almost exactly the same. Is there something I'm missing here? It feels like the filtered search is still searching the entire index.
Adding a filter to the search will not limit the scope of the index: the expensive part of a "contains" (leading-wildcard) query is enumerating the matching terms, which still happens across the full index; the filter is only applied to the candidate results afterward.
You need to be more clear about what you need from your search, but I don't believe what you want is possible.
Is the subset of documents always the same? If so, maybe you can get clever with multiple indices (e.g. search the smaller index and, if there aren't enough hits, search the larger one), as in the sketch below.
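A sketch of that fallback, written with Zend_Search_Lucene-style calls since the exact Lucene port isn't clear from the question ($minimumHits and the index paths are placeholders):

```php
// Query the small per-subset index first; fall back to the full index
// only when the subset doesn't yield enough hits.
$hits = Zend_Search_Lucene::open('/path/to/subset-index')->find($query);
if (count($hits) < $minimumHits) {
    $hits = Zend_Search_Lucene::open('/path/to/full-index')->find($query);
}
```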
You can try SingleCharTokenAnalyzer.
I'm wondering if anybody can suggest the right way to re-index with Zend_Search_Lucene. There isn't an option to update documents; you need to delete and re-add them. I've got a bunch of database tables that I'm going to cycle over, adding a document to the index for each row. I can't see any point in deleting documents as I go; I may as well empty the entire index and then add everything afresh.
There doesn't seem to be a simple deleteAllDocs() method, so I'd have to find all the documents first, loop over them deleting them one by one, and then loop over my database tables and add everything again. There isn't a getAllDocuments() method either (although there is a workaround here: http://forums.zend.com/viewtopic.php?f=69&t=9121).
Obviously I could write something fancy that checks whether a document has changed and only deletes it if it has, but that involves comparing all the fields, doesn't it?
I feel like I must be missing something.
I delete the index and create a new one, more or less as here.
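A sketch of what that looks like (the field names and $rows are placeholders for your own schema):

```php
// Zend_Search_Lucene has no deleteAllDocs(); Zend_Search_Lucene::create()
// simply overwrites the index directory, which amounts to the same thing.
$index = Zend_Search_Lucene::create($indexPath);   // fresh, empty index

foreach ($rows as $row) {                          // $rows: your DB records
    $doc = new Zend_Search_Lucene_Document();
    // Stored but not searchable: the database primary key.
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('db_id', $row['id']));
    // Indexed and stored: searchable and retrievable.
    $doc->addField(Zend_Search_Lucene_Field::text('title', $row['title']));
    // Indexed but not stored: searchable only.
    $doc->addField(Zend_Search_Lucene_Field::unStored('body', $row['body']));
    $index->addDocument($doc);
}
$index->commit();
```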
I am using MongoDB with MongoMapper to store all my products. Each product belongs to multiple categories, which have many levels, i.e. category, sub-category, etc.
Each product has many search fields, which are embedded documents in the product.
All this is working and I now want to add search to the app.
The search system needs full-text search with multiple, dynamic, faceted filters, including min/max range searches.
I have been looking into the Sunspot gem but am having difficulty setting it up in development, let alone running it in production! I have also looked at Sphinx.
But I am wondering whether just MongoMapper/MongoDB would be fast enough, and the best way to go, given that it's quite a complex search system?
Any help / suggestions / experiences / tutorials and examples on this would be most appreciated.
Thanks a lot,
Rick
I've been involved with a very large Sphinx-powered search, and I think it's awful: very difficult to configure if you want anything beyond a very simple full-text search. Solr/Lucene, on the other hand, is incredibly flexible and was unbelievably easier to set up and get running.
I am now using Solr in conjunction with MongoDB to power full-text search, with all the extra goodies like facets, etc. Depending on how you configure Solr, you may not even need to hit MongoDB for the data. Alternatively, you can tell Solr to index fields but not store them, and instead just store the ObjectIds that correspond to the data inside MongoDB.
If your search truly is a complex search system, I very strongly recommend that you do not use MongoDB for search and go with Solr instead. One big reason is that MongoDB doesn't have a full-text search feature; instead, it has regular-expression matching. Regex matching works wonderfully but will only use indexes in certain cases (essentially case-sensitive prefix expressions such as /^foo/).
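To illustrate the index caveat (shown in PHP with the mongodb/mongodb library; the collection and field names are made up):

```php
// An anchored, case-sensitive regex can walk the index on 'name';
// an unanchored or case-insensitive regex cannot, and scans the
// whole collection instead.
$collection = (new MongoDB\Client)->shop->products;

$usesIndex = $collection->find(
    ['name' => new MongoDB\BSON\Regex('^widget')]       // prefix match
);
$scansAll  = $collection->find(
    ['name' => new MongoDB\BSON\Regex('widget', 'i')]   // "contains", case-insensitive
);
```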