Lucene.NET faceted search

I found a great tutorial on performing a faceted search.
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
This article does not explain how to retrieve the narrowed available attributes to filter from (for further drill down).
Let's say I am looking for planners that are red. When I perform the faceted search, I want to return all available attributes to filter on for products that are red. Then, when I add a "weekly format" filter, I want the attribute list to get even smaller, containing only the filters available for the narrowed group.
I would love to use Solr/SolrNet, but I am in a shared hosting situation with limited access to the actual server.
I am fairly new to lucene.net, so examples are much appreciated.

If I understand correctly, you get a BitArray containing the list of filtered results; in the tutorial's example, combinedResults is this list. If you want to narrow this down further, you need to repeat the process: run another searchQuery and intersect the results with the BitArray you already have for combinedResults.
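For illustration, here is a minimal sketch of that intersection step, assuming the Lucene.Net 2.x-era API that the tutorial uses (where Filter.Bits(IndexReader) returns a System.Collections.BitArray); the class and method names are mine, not from the article:

```csharp
// Minimal sketch, assuming the Lucene.Net 2.x-era API from the tutorial,
// where Filter.Bits(IndexReader) returns a System.Collections.BitArray.
using System.Collections;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class FacetDrillDown
{
    // Counts how many documents match both the current result set
    // (combinedResults, as in the tutorial) and one candidate attribute.
    public static int CountForFacet(IndexReader reader,
                                    BitArray combinedResults,
                                    Filter facetFilter)
    {
        // Documents that carry this attribute value (e.g. "weekly format").
        BitArray facetBits = facetFilter.Bits(reader);

        // Intersect with the already-narrowed results; clone first,
        // because BitArray.And mutates the instance it is called on.
        BitArray narrowed = ((BitArray)combinedResults.Clone()).And(facetBits);

        // The number of set bits is the attribute's remaining document count.
        int count = 0;
        for (int i = 0; i < narrowed.Count; i++)
            if (narrowed[i]) count++;
        return count;
    }
}
```

To build the narrowed attribute list, run this over every candidate attribute and keep only those with a count greater than zero; each time the user adds a filter, AND its bits into combinedResults and repeat.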

I would love to use Solr/SolrNet, but I am in a shared hosting situation with limited access to the actual server.
You can always use an off-site, hosted Solr solution. See this question for more information.

Related

Approach for extracting relevant text using Azure Cognitive Search

Context:
I have a set of documents in SharePoint. I have set up Azure Cognitive Search (Standard tier) with a data source (SharePoint), an index, and indexers. I have also added a semantic configuration.
Outcome:
Ask a question, and have the search find and return relevant sections from the documents. I will use these sections to feed into OpenAI to construct a cohesive result.
I would like to replicate this Microsoft demo: https://www.youtube.com/watch?v=3t3qZu1Dy1k&t=572s It seems to me that in this demo each document's content is very small, so the results could easily be combined and passed into OpenAI.
My experience so far:
The search returns the documents and ranks them, which seems OK; however, it returns a short 'caption' and the full text. The caption is not necessarily related to my question and therefore cannot be used for the next step, while the full document is far too big to be used in OpenAI.
I have managed to get semantic answers; however, the question has to be extremely precise to get a result, and the associated text is limited.
What I would like:
I would like the search to return sub-sections of the document where the answer to my question may be found. If that is not supported, I feel I need an entirely new approach.
Any ideas? Thanks in advance for your time.
The demo you refer to works by feeding documents to Azure Cognitive Search. A query is then formulated as a question that uses the Semantic Search functionality to return a set of potential semantic answers extracted from the content in the index.
These potential semantic answers are then fed as a prompt to OpenAI's text completion service: https://beta.openai.com/docs/guides/completion
First, you must ensure you can get good semantic answers. Inspect the content you have indexed and verify that it contains material that could semantically be an answer to the questions you test with. Good content should contain declarations of fact, i.e., statements that could be used verbatim as an answer to a question. Examples:
The capital of France is Paris.
Forecast for 2022 is expected to be 22%.
The semantic functionality in Azure Search will only respond with a text section containing a potential answer to your question. If you can't get this step to work, you have to improve it first, either via the semantic configuration, the choice of content, or by processing your content so that the items in your index contain the relevant content in the correct properties.
Ensure your content is indexed and mapped to properties in a sensible way
Work with the semantic configuration until you get sensible results
Once the previous two steps are OK, submit the answers to OpenAI
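As a concrete illustration of the first step, here is a hedged sketch that requests extractive semantic answers through the Cognitive Search REST API; the service name, index name, and semantic configuration name are placeholders, not values from the question:

```csharp
// Hedged sketch: request extractive semantic answers from Azure Cognitive
// Search (REST API, 2021-04-30-Preview). "my-service", "my-index" and
// "my-semantic-config" are illustrative placeholders.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SemanticAnswerDemo
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Add(
            "api-key", Environment.GetEnvironmentVariable("SEARCH_API_KEY"));

        // "answers": "extractive|count-3" asks the service to extract up to
        // three candidate answer spans; captions are short per-hit snippets.
        var body = @"{
            ""search"": ""What is the forecast for 2022?"",
            ""queryType"": ""semantic"",
            ""queryLanguage"": ""en-us"",
            ""semanticConfiguration"": ""my-semantic-config"",
            ""answers"": ""extractive|count-3"",
            ""captions"": ""extractive""
        }";

        var response = await http.PostAsync(
            "https://my-service.search.windows.net/indexes/my-index/docs/search?api-version=2021-04-30-Preview",
            new StringContent(body, Encoding.UTF8, "application/json"));

        // The @search.answers array in the response holds the extracted answer
        // texts; those are what you would feed into an OpenAI completion prompt.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```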
I have tested semantic search on two different data sets, both a combination of website content, PDF and Word documents, etc. The topic and volume of content were essentially the same. From one data set I could get excellent semantic answers, but the other data set was disappointing.
My conclusion was that the content in the good data set was formulated and structured in a way that fits a semantic scenario. The other data set would often present logic and meaning in tables and layouts: a human reading the content on paper would understand it, but semantically it did not make as much sense.

How to design a RESTful query that retrieves the last data from a resource I want to apply a filter to?

What should the query look like when I want to retrieve the last measurements from installations that aren't removed?
Something like this?
/my-web-service/installations/measurements/last?removed=false
The thing is, I don't want to retrieve the last measurements that weren't removed from installations; I want to retrieve the last measurements from installations that weren't removed.
I see a couple of possibilities here:
If you need to read the data from the endpoint transactionally, the way you designed it is the way to go. What I'd change is the name of the param from removed to installationRemoved, since it's more descriptive, and I'd shorten the endpoint to /my-web-service/measurements/, since with installations in the path it's unclear in which scope the client operates. Also, don't you need a since param to filter the last measurements?
If there's a chance to split it into two endpoints, I'd add:
/my-web-service/installations/?removed=false
/my-web-service/measurements/?since=timestamp&installations=<array>
It doesn't make the design better or worse, but it is easier and more predictable for users.
In general, try to add more general endpoints with filtering options rather than highly dedicated ones that do one particular thing; the dedicated approach leads to a hard-to-use, loose API.
And final notice, your API is good if your clients use it not because they have to but when they like it ;)
According to this best practices article, you could use "aliases for common queries":
To make the API experience more pleasant for the average consumer, consider packaging up sets of conditions into easily accessible RESTful paths. For example, the recently closed tickets query above could be packaged up as GET /tickets/recently_closed
So, in your case, it could be:
/my-web-service/installations/non_removed/measurements/last
where non_removed would be an alias for querying installations that weren't removed.
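For what it's worth, here is a rough sketch of how the general endpoint and the alias could coexist, using ASP.NET Core minimal APIs; GetLastMeasurements is a hypothetical data-access stub, not something from either answer:

```csharp
// Hedged sketch: a general, filterable endpoint plus the alias route.
// GetLastMeasurements is a hypothetical placeholder for real data access.
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// General endpoint: the installation state is an optional query parameter.
app.MapGet("/my-web-service/measurements/last",
    (bool? installationRemoved) =>
        Results.Ok(GetLastMeasurements(installationRemoved ?? false)));

// Alias for the common query: last measurements from non-removed installations.
app.MapGet("/my-web-service/installations/non_removed/measurements/last",
    () => Results.Ok(GetLastMeasurements(installationRemoved: false)));

app.Run();

// Hypothetical stub so the sketch is self-contained.
static object GetLastMeasurements(bool installationRemoved) =>
    new { installationRemoved, measurements = Array.Empty<object>() };
```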
Hope it helps!

Can't use LIKE but need to find related records in SQL Server

I've got a table used for issue tracking (kind of like Stack Overflow :) to log PC-related issues), and for simplicity I'll narrow it down to a few fields, something like the following:
Site       Category   Issue
MI Office  Software   My MS word does not run macros.
CL Office  Hardware   PC memory needs to be upgraded
MX Office  Printer    Printer is out of memory.
MI Office  Software   Office product prompts for allowing macro to run
I want to find related issues when I am looking at, for instance, one issue. I can't really use the LIKE operator, because, for instance:
SELECT...FROM...WHERE Issue LIKE '%My MS word does not run macros.%'
would only return the first record. Do I have to figure out how to pull out keywords like "macros"? How would I find related records so that my query could, for instance, return records 1 and 4 together, or records 2 and 3 together?
Well, here are three ways to go about it.
1. Best case:
We have the users add 'tags' to each issue. This way users can search issues using tags and find related issues too. (Just like http://stackoverflow.com ;) )
This could be implemented by creating two new tables:
tag_metadata (tag#, name, description, ...)
tag_issue_relationship(tag#, issue#)
We could go a step further and add a weight to each issue entry that determines its position in the similar-issue search/look-up ranking.
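The look-up then becomes a simple self-join. Here is a hedged sketch that assumes the two tables above plus integer tag/issue keys; the column names and connection string are illustrative:

```csharp
// Hedged sketch: rank other issues by how many tags they share with a given
// issue. Assumes tag_issue_relationship(tag_id, issue_id) as sketched above;
// column names and the connection string are illustrative.
using System;
using System.Data.SqlClient;

class RelatedByTags
{
    static void Main()
    {
        // More shared tags = higher in the similar-issue ranking (the weight).
        const string sql = @"
            SELECT TOP 10 other.issue_id, COUNT(*) AS shared_tags
            FROM tag_issue_relationship AS cur
            JOIN tag_issue_relationship AS other
              ON other.tag_id = cur.tag_id
             AND other.issue_id <> cur.issue_id
            WHERE cur.issue_id = @issueId
            GROUP BY other.issue_id
            ORDER BY shared_tags DESC;";

        using var conn = new SqlConnection(
            "Server=.;Database=IssueTracker;Integrated Security=true");
        conn.Open();

        using var cmd = new SqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("@issueId", 1);

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine($"Issue {reader.GetInt32(0)}: {reader.GetInt32(1)} shared tag(s)");
    }
}
```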
2. Average case:
We add more levels of sub-categories to help further classify the problem. But think about change control: will your system easily support adding/removing/re-arranging the category hierarchy over time?
3. Worst case:
Let's say the users are very lazy and don't want to spend a few seconds tagging their issues :). Then you would have to implement an indexing algorithm that picks out keywords (nouns) from the issue description and builds indexes to facilitate finding 'similar' issues. Note that the description may often contain keywords that are not significant, which would result in false positives.
[Update]
Basically the solution what you are looking for could be broken up into these modules:
Parser: extracts significant keywords from the issue description. A custom dictionary of keywords would be used as the lookup table.
Indexer: indexes these keywords to make them searchable. This involves maintaining forward and reverse indices!
Search: uses the indexes to locate 'similar' issues.
There may be an existing commercial/open-source product that does this..
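For the no-tags case, a toy sketch of those three modules might look like the following; the stop-word list and every name here are invented for illustration, and a real parser would use a proper dictionary or part-of-speech filter:

```csharp
// Toy sketch of the three modules: a parser (keyword extraction against a
// stop-word list), a reverse index, and a search that ranks issues by how
// many keywords they share. All names are illustrative.
using System;
using System.Collections.Generic;
using System.Linq;

class IssueIndex
{
    // A real implementation would use a proper dictionary; this is a stub.
    static readonly HashSet<string> StopWords = new(StringComparer.OrdinalIgnoreCase)
        { "my", "is", "the", "does", "not", "for", "out", "of", "needs" };

    // Reverse index: keyword -> ids of issues whose description contains it.
    readonly Dictionary<string, HashSet<int>> index =
        new(StringComparer.OrdinalIgnoreCase);

    // Parser: extract candidate keywords from an issue description.
    static IEnumerable<string> Keywords(string text) =>
        text.Split(' ', ',', '.', '?', '!')
            .Where(w => w.Length > 2 && !StopWords.Contains(w));

    // Indexer: add one issue's keywords to the reverse index.
    public void Add(int issueId, string description)
    {
        foreach (var word in Keywords(description))
        {
            if (!index.TryGetValue(word, out var ids))
                index[word] = ids = new HashSet<int>();
            ids.Add(issueId);
        }
    }

    // Search: rank issues by the number of keywords shared with the query text.
    public IEnumerable<(int IssueId, int Shared)> Similar(string description) =>
        Keywords(description)
            .SelectMany(w => index.TryGetValue(w, out var ids)
                                 ? ids : Enumerable.Empty<int>())
            .GroupBy(id => id)
            .Select(g => (g.Key, g.Count()))
            .OrderByDescending(t => t.Item2);
}
```

Add() every existing issue once, then call Similar() with the current issue's description. With the sample data above, "memory" would link records 2 and 3, while matching "macros" to "macro" in records 1 and 4 would additionally require stemming.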

Displaying results of Perform Find in a portal

I have some global variables $$A, $$B, $$C and want to search within a table for these terms in fieldA, fieldB and fieldC (using Perform Find). How can I use the result of this Perform Find to display the results in a portal?
The implementation by my predecessor replaces a field fieldSEARCH with 1 if the record is in the Perform Find results and 0 otherwise, and then uses a portal filtered by this field. This seems a very dodgy way of doing it, not least because it means that multiple users will not be able to search at the same time!
Can you enhance the portal filter to filter against the variables themselves? Or you can perform the find, grab the IDs of the found set, put them into a global field, and then use that field to construct the relationship. Global fields are multi-user safe.
The best way is not to do this at all, but to use list views to perform searches. List views are naturally searchable and much more flexible than portals (you can easily sort them, omit arbitrary records, and so on). It's possible to replicate this functionality in portals, but it's far more complex. I mean, if there's some serious gain from using a portal, then it's doable, but if not, then the native way is obviously better.
List views are easier to search, as FileMaker still hasn't transitioned to the 21st century and insists on this model... Most users, however, want a master-detail view, like a mail app, and understandably so, as it's more intuitive (i.e. a list view on one side, where clicking a row updates the detail fields in the middle).
If this is what you want, you may want to cast an eye at Modular FM, where someone has already done the hard work for you:
http://www.modularfilemaker.org/module/masterdetail-2-0/
HTH
Stam

Lucene.Net/SpellChecker - multi-word/phrase-based auto-suggest

I've implemented Lucene.NET on my site, using it to index my products, which are theatre shows, tours and attractions around London.
I want to implement a "Did you mean?" feature for when users misspell product names that takes the whole product titles into account and not just single words. For example,
If the user typed:
Lodnon Eye
I would like to auto-suggest:
London
London Eye
I assume I need to have the analyzer index the titles as if each were a single entity, so that SpellChecker can nearest-match on the whole phrase as well as on the individual words.
How would I do this?
There is an excellent blog series here:
Lucene.NET
Introduction to Lucene
Indexing basics
Search basics
Did you mean..
Faceted Search
Class Reference
I have also found another project called SimpleLucene, which you can use to maintain your Lucene indexes whenever you need to update or delete a document. Read about it here.
I've just recently implemented a phrase autosuggest system in Lucene.NET.
Basically, the Java version of Lucene has a ShingleFilter in one of the contrib folders, which breaks down a sentence into all possible phrase combinations. Unfortunately, Lucene.NET's contrib filters aren't quite there yet, so we don't have a ShingleFilter.
However, a Lucene index written in Java can be read by Lucene.NET as long as the versions are the same, so what I did was the following:
Created a spell index in Lucene.NET using the SpellChecker.IndexDictionary method, as laid out in the "Did you mean" section of Jake Scott's link. Note that this only creates a spelling index of single words, not phrases.
Then created a Java app that uses the ShingleFilter to create phrases from the text I'm searching and saves them in a temporary index.
Then wrote another method in .NET to open this temporary index and add each of the phrases as a document in my spelling index, which already contains the single words. The trick is to make sure the documents you add have the same form as the rest of the spell documents, so I ripped out the methods used in the SpellChecker code in the Lucene.NET project and edited those.
Once you've done that, you can call the SpellChecker.SuggestSimilar method, pass it a misspelled phrase, and it will return a valid suggestion.
This is probably not the best solution, and I would definitely consider the answer suggested by spaceman first, but here is another possible option: use the KeywordAnalyzer or the KeywordTokenizer on each title. This will not break the title down into separate tokens but will keep it as one token, and using the SuggestSimilar method would then return whole titles as suggestions.
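Here is a hedged sketch of that keyword-token approach, assuming Lucene.Net 3.0.3 with the SpellChecker.Net contrib library; the index paths and the "title" field name are placeholders:

```csharp
// Hedged sketch: because the "title" field was indexed with KeywordAnalyzer,
// every indexed term is a complete title, so the spell index can suggest
// whole phrases. Paths and field names are illustrative placeholders.
using System;
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Store;
using SpellNet = SpellChecker.Net.Search.Spell;

class PhraseSuggester
{
    static void Main()
    {
        var titleIndex = FSDirectory.Open(new DirectoryInfo("titles.idx"));
        var spellIndex = FSDirectory.Open(new DirectoryInfo("spell.idx"));
        var reader = IndexReader.Open(titleIndex, true); // read-only

        var spell = new SpellNet.SpellChecker(spellIndex);

        // LuceneDictionary walks every term of the field; with KeywordAnalyzer
        // each term is a full product title such as "London Eye".
        spell.IndexDictionary(new SpellNet.LuceneDictionary(reader, "title"));

        // Nearest whole-title matches, e.g. "London Eye" for "Lodnon Eye".
        foreach (var suggestion in spell.SuggestSimilar("Lodnon Eye", 5))
            Console.WriteLine(suggestion);
    }
}
```

The trade-off is that whole-title tokens only match as whole strings; to also get single-word suggestions like "London", you would keep a second spell index built from a normally-analyzed field, as the other answers describe.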