Parsing a string to a combination of PhraseQuery and PrefixQuery in Lucene.Net

I have a DotNetNuke module that integrates the search infrastructure of DotNetNuke with Lucene. The Lucene search API takes either a query object or a string containing what I need to search for.
For reasons I won't detail here, I can't create the query directly, so I use the QueryParser's abilities to parse the search string. It works great, except that I haven't found how to combine a PhraseQuery and a PrefixQuery in the search parameters.
I'd like to be able to parse the string "here be drag" and have it return documents containing "here be dragons" or "here be dragsters".
I tried parsing "here be drag"* and "here be drag*", but no luck. Is there a special syntax to parse this kind of combination?

Take a look at the ComplexPhraseQueryParser. Since you are using Lucene.Net, you may have to look for an earlier version of the ComplexPhrase parser; if I recall correctly, it was part of contrib.
The parser's code is not very complex, so porting it from Java to C# should not be too difficult.
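For illustration, here is a minimal sketch of using it, assuming the Lucene.Net 4.8 port (Lucene.Net.QueryParsers.ComplexPhrase); the field name "content" is a placeholder:

using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers.ComplexPhrase;
using Lucene.Net.Search;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var parser = new ComplexPhraseQueryParser(LuceneVersion.LUCENE_48, "content", analyzer);

// Unlike the standard QueryParser, this one accepts wildcards *inside*
// quotes, so the query below matches "here be dragons" and "here be dragsters".
Query query = parser.Parse("\"here be drag*\"");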

REST - What is a standard file format for RESTful API design?

I would like to have my design stored as a file for version control.
Are there any standards or commonly used formats?
For example, I can write one file for structure definition:
User {
uid,
name
}
And another file for API definition:
GET /users/:uid => User
GET /users?name=:name => [User]
However, these are just my own preferences. Are there any commonly used formats for representing these?
I expect it to be something like UML: language-agnostic, just focusing on the API itself.
The notation you mention is quite close to what developers would expect to see in a design or specification, so that might be enough.
However, if your project reaches a certain scale, you can try a notation that tools can then use to automate code generation, testing, or documentation.
In particular, Swagger is quite a common tool for this. If you write your specification following its standard, you'll get documentation and even some code generation if you use the tool.
https://swagger.io/specification/
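As a rough illustration, the User example above could be written as a minimal OpenAPI (Swagger) document; the API title and field types here are assumptions:

openapi: "3.0.0"
info:
  title: Users API        # hypothetical title
  version: "1.0"
paths:
  /users/{uid}:
    get:
      parameters:
        - name: uid
          in: path
          required: true
          schema: { type: string }
      responses:
        "200":
          description: A single User
          content:
            application/json:
              schema: { $ref: "#/components/schemas/User" }
components:
  schemas:
    User:
      type: object
      properties:
        uid: { type: string }
        name: { type: string }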

Swift: filtering objects - use symbols for precision searching

Curious to know if Swift permits the use of search operators like the wildcard "*" or the exclusive "-", or Boolean search operators like AND, OR, and NOT. By search operators I mean symbols an app user would input into a text box to narrow a search. I think NSPredicate's LIKE allows the use of "*" and "?", but I have not come across online examples of search operators used in connection with Swift's often-cited filtering code:
object.filter { $0.objectProperty.contains(searchText) }
If someone could point me toward some literature, I would be grateful. I would be interested to learn how to make it possible for an app user to use the search operators referenced above and/or something like the following to narrow a search: dog w/20 food
The latter search term would find all instances of "dog" within 20 characters of "food".
The filter on Swift's Array is simply a method that returns an array of the same type, using a passed closure that returns a Bool. So the short answer is that there's nothing about the filter function itself that allows you to do anything like what you're describing.
One common way to filter/find things is to use regular expressions, which Swift supports via Foundation's NSRegularExpression.
If you have everything in a database and expect your users to know how to write predicates, I suppose you could use Core Data and search with a string from the user, but that seems pretty unlikely.
Outside of those options you will probably need to search for a third party library or build some sort of parser yourself.
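For a taste of the building blocks, here is a minimal sketch, not a full query language; the sample data, pattern, and 20-character proximity rule are assumptions:

import Foundation

let names = ["hot dog food", "dog house", "cat food"]

// Wildcard search: NSPredicate's LIKE[c] treats "*" as "any characters"
// and matches case-insensitively, so a user-entered pattern can be
// passed through with surrounding wildcards.
let pattern = "dog*food"
let predicate = NSPredicate(format: "SELF LIKE[c] %@", "*" + pattern + "*")
let wildcardMatches = names.filter { predicate.evaluate(with: $0) }
// ["hot dog food"]

// Proximity ("dog w/20 food") approximated with a regular expression:
// "dog", then at most 20 characters of anything, then "food".
let proximity = try! NSRegularExpression(pattern: "dog.{0,20}food",
                                         options: [.caseInsensitive])
let proximityMatches = names.filter { name in
    let range = NSRange(name.startIndex..., in: name)
    return proximity.firstMatch(in: name, range: range) != nil
}
// ["hot dog food"]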

Does Scala offer functionality similar to Pretty Print in Python?

Does Scala offer functionality similar to Python's pprint?
No, it doesn't. Except for XML, that is -- there's a pretty printer for that, which generates interpreter-readable data.
In fact, Scala doesn't even have a way to print interpreter-readable data in general, mainly because of how strings are rendered when a collection is converted to a string. For instance, List("abc").toString is List(abc), with no quotes.
Add to that, there's no facility at all that will break lines based on width, or indent nested collections.
That said, it is doable, within the same limits as pprint.
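For instance, a small recursive helper gets most of the way there; this sketch (the names are mine, not a library facility) indents nested sequences but still prints strings unquoted, per the limitation above:

// Pretty-print nested sequences, one element per line,
// indented two spaces per nesting level.
def pretty(value: Any, indent: Int = 0): String = {
  val pad = "  " * indent
  value match {
    case xs: Seq[_] =>
      xs.map(pretty(_, indent + 1))
        .mkString(s"${pad}Seq(\n", ",\n", s"\n$pad)")
    case other =>
      pad + other.toString
  }
}

// println(pretty(List(List("a", "b"), "c"))) prints:
// Seq(
//   Seq(
//     a,
//     b
//   ),
//   c
// )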

TTXMLParser Sample Code?

Is anybody familiar with how to use TTXMLParser? I can't find any documentation or sample code for it.
Is it SAX or DOM?
Does it support Xpath?
Can I extract CDATA from elements?
I have an application that already uses several Three20 modules, so it would be a shame to have to use another parser.
The main documentation I've found for TTXMLParser is in the header file. The comment there gives an overview of what TTXMLParser does.
TTXMLParser shouldn't really be thought of as an XML parser in the way you are thinking of it -- in this sense, questions such as "is it SAX or DOM" and "does it support XPath" aren't directly applicable. Instead, think of TTXMLParser as a convenience class to take XML and turn it into a tree of Objective-C objects. For example, this XML node:
<myNode attr1="value1" attr2="value2" />
would be turned into an Objective-C NSDictionary node that maps the key "attr1" to the value "value1" and the key "attr2" to the value "value2".
TTXMLParser internally uses NSXMLParser (which is basically SAX) to build up its tree, but you, as the user of TTXMLParser, don't have to do any SAX-like stuff.
So, no, you will not end up with an XML document on which you can perform XPath queries. Instead, you will end up with an Objective-C tree of objects. If that's what you want, great; if you want a traditional XML parser with XPath, I'm currently working on a project that uses both Three20 and TouchXML. TouchXML supports XPath.
I agree it's hard to find sample code for TTXMLParser. Three20's TTTwitter sample used to use TTXMLParser (well, actually TTURLXMLResponse, which in turn uses TTXMLParser), but at some point it was changed to use TTURLJSONResponse instead, which is a shame, because that was their only XML sample.
You can still see the old XML-based sample code here. Specifically, look at the requestDidFinishLoad: method near the bottom of the file for an example of code that takes a TTURLXMLResponse, queries its rootObject member, and then walks down the resulting tree of objects.
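To give the flavor, here is a minimal, untested sketch of that pattern, using only the members mentioned above (error handling omitted; the attribute key comes from the earlier myNode example):

// In your TTURLRequestDelegate, once the request completes:
- (void)requestDidFinishLoad:(TTURLRequest *)request {
  TTURLXMLResponse *response = (TTURLXMLResponse *)request.response;

  // rootObject is a tree of plain Objective-C objects (NSDictionary /
  // NSArray), not an XML DOM, so you walk it with ordinary key lookups.
  NSDictionary *root = response.rootObject;
  NSString *attr1 = [root objectForKey:@"attr1"];  // "value1"
  NSLog(@"attr1 = %@", attr1);
}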

Lucene.Net features

I am new to Lucene.Net.
Which is the best analyzer to use in Lucene.Net?
Also, I want to know how to use the stop word and word-stemming features.
I'm also new to Lucene.Net, but I do know that the SimpleAnalyzer indexes all tokens as-is, while the StopAnalyzer omits stop words.
Here's a link to some Lucene info; by the way, the .NET version is an almost exact, line-by-line port of the Java version, so the Java documentation should work fine in most cases: http://darksleep.com/lucene/. There's a section in there about the three analyzers: Simple, Stop, and Standard.
I'm not sure how Lucene.Net handles word stemming, but this link, http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2, demonstrates how to create your own Analyzer in Java, and uses a PorterStemFilter to do word-stemming.
...[T]he Porter stemming algorithm (or "Porter stemmer") is a process for removing the more common morphological and inflexional endings from words in English
I hope that is helpful.
The best analyzer I have found is the StandardAnalyzer, in which you can also specify the stop words.
For example:
string indexFileLocation = @"C:\Index";
string stopWordsLocation = @"C:\Stopwords.txt";

var directory = FSDirectory.Open(new DirectoryInfo(indexFileLocation));

// This overload reads the stop-word list from the given file.
Analyzer analyzer = new StandardAnalyzer(
    Lucene.Net.Util.Version.LUCENE_29, new FileInfo(stopWordsLocation));
It depends on your requirements. If your requirements are ultra-simple - e.g. case-insensitive, non-stemming searches - then StandardAnalyzer is a good choice. If you look into the Analyzer class and get familiar with Filters, particularly TokenFilter, you can exert an enormous amount of control over your index by rolling your own analyzer (a sketch follows at the end of this answer).
Stemmers are tricky, and it's important to have a deep understanding of what type of stemming you really need. I've used the Snowball stemmers. For example, the words "policy" and "police" have the same root in the English Snowball stemmer, and getting hits on documents containing "policy" when the search term is "police" isn't so hot. I've implemented strategies to support both stemmed and non-stemmed search so that can be avoided, but it's important to understand the impact.
Beware of temptations like stop words. If you need to search for the phrase "to be or not to be" and the standard stop words are enabled, your search will fail to find documents with that phrase.
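To illustrate rolling your own analyzer, here is a minimal sketch (not a definitive implementation) that chains lowercasing, stop-word removal, and Porter stemming; it assumes the Lucene.Net 3.0.3 API, where these filters live in Lucene.Net.Analysis:

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

public class StemmingAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Tokenize, lowercase, drop English stop words, then stem.
        TokenStream stream = new StandardTokenizer(
            Lucene.Net.Util.Version.LUCENE_30, reader);
        stream = new LowerCaseFilter(stream);
        stream = new StopFilter(true, stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new PorterStemFilter(stream);
    }
}

A common way to support both stemmed and non-stemmed search, as suggested above, is to index the same text into two fields, one analyzed with a stemming analyzer and one without, and query whichever fits the use case.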