TermQuery not returning on a known search term, but WildcardQuery does - lucene.net

Am hoping someone with enough insight into the inner workings of Lucene might be able to point me in the right direction =)
I'll skip most of the surrounding irellevant code, and cut right to the chase. I have a Lucene index, to which I am adding the following field to the index (variables replaced by their literal values):
document.Add( new Field("Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7",
Field.Store.YES , Field.Index.UN_TOKENIZED));
Later, when I search my index (using other types of queries), I am able to verify that this field does indeed appear in my index - like when looping through all Fields returned by Document.GetFields()
Field: Typenummer, Value: E5CEB501A244410EB1FFC4761F79E7B7
So far so good :-)
Now the real problem is - why can I not use a TermQuery to search against this value and actually get a result.
This code produces 0 hits:
// Returns 0 hits
bq.Add( new TermQuery( new Term( "Typenummer",
"E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );
But if I switch this to a WildcardQuery (with no wildcards), I get the 1 hit I expect.
// returns the 1 hit I expect
bq.Add( new WildcardQuery( new Term( "Typenummer",
"E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );
I've checked field lengths, I've checked that I am using the same Analyzer and so on and I am still on square 1 as to why this is.
Can anyone point me in a direction I should be looking?

I finally figured out what was going on. I'm expanding the tags for this question as it, much to my surprise, actually turned out to be an issue with the CMS this particular problem exists in. In summary, the problem came down to this:
The field is stored UN_TOKENIZED, meaning Lucene will store it excactly "as-is"
The BooleanQuery I pasted snippets from gets sent to the Sitecore SearchManager inside a PreparedQuery wrapper
The behaviour I expected from this was, that my query (having already been prepared) would go - unaltered - to the Lucene API
Turns out I was wrong. It passes through a RewriteQuery method that copies my entire set of nested queries as-is, with one exception - all the Term arguments are passed through a LowercaseStrategy()
As I indexed an UPPERCASE Term (UN_TOKENIZED), and Sitecore changes my PreparedQuery to lowercase - 0 results are returned
Am not going to start an argument of whether this is "by design" or "by design flaw" implementation of the Lucene Wrapper API - I'll just note that rewriting my query when using the PreparedQuery overload is... to me... unexpected ;-)
Further teachings from this; storing the field as TOKENIZED will eliminate this problem too, as the StandardAnalyzer by default will lowercase all tokens.

Related

How to allow leading wild cards in custom smart search web part (Kentico 10)

I have a custom index for my products and I am using the Subset Analyzer. This Analyzer works great, but if you do field searches, it does not work.
For example, I have a document with the following fields:
"documentname", "My-Document-Name"
"tags", "1234,5678,9101"
"documentdescription", "This is a great Document, My-Document-Name."
When I just search "name AND tags:(1234)", I get this document in my results because it searches +_content:name.
-- However:
When I search "documentname:(name)^3.0 AND tags:(1234)", I do not get this document in my results.
Of course, when I do "documentname:(*name*)^3.0" I get a parse error saying: '*' or '?' not allowed as first character in WildcardQuery.
How can I either enable wildcard query in my custom CMS.Search webpart?
First of all you have to make sure that a field you checking is in the index with proper name. documentname might not be in the index it can be called _title, depends how you index is set up. Get lukeall and check your index (it should be in \CMS\App_Data\CMSModules\SmartSearch\YourIndexName). You can use luke to test your searches as well.
For examples there is no tags but there is documenttags field.
P.S. Wildcards are working and you are right you can't use them as a first character by default (lucene documentation says: You cannot use a * or ? symbol as the first character of a search), but there is a way to set it up in lucene.net, although i dont know if there are setting for that in Kentico. But i dont think you need wildcards, so your query should be (assuming you have documentname and documenttags in the index):
+(documentname:"My-Name" AND documenttags:"tag1")

Using "is null" when retrieving prices doesn't work

I am trying to get the price items for performance block storage that are generic (not specific to a certain datacenter). I can see that these have the locationGroupId set to blank or null, but I can't seem to get the objectFilter to work with that, the query returns nothing. If I omit the locationGroupId filter I get a result that contain both location-specific and non-location specific prices.
GET /rest/v3.1/SoftLayer_Product_Package/759/getItemPrices.json?objectMask=mask[locationGroupId,id,categories,item]&objectFilter={"itemPrices":{"categories":{"categoryCode":{"operation":"performance_storage_space"}},"item":{"keyName":{"operation":"$=GBs"}},"locationGroupId":{"operation":"is null"}}}
I am guessing there is something wrong with the object filter, any ideas?
If I filter on locationGroupId 509 it works:
/rest/v3.1/SoftLayer_Product_Package/759/getItemPrices.json?objectMask=mask[locationGroupId,id,categories,item]&objectFilter={"itemPrices":{"categories":{"categoryCode":{"operation":"performance_storage_space"}},"item":{"keyName":{"operation":"$=GBs"}},"locationGroupId":{"operation":509}}}
The reason it the first query didn't work while the second did was that I used the command "curl -sg" to do the request. While that eliminates the need to escape the {}[] characters - it also turns off escaping other characters correctly in the URL - like the space in "is null". Changing that to "is%20null" solves the issue.
I am posting this as the answer as I find it likely for others to encounter this problem.

whoosh doesn't search for short words like "C#"

i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)

Word 2010 can Field added via QuickParts be given an ID and later referenced in document.Fields collection

I need to add a few fields to a Word 2010 DOTX template which are to be populated automatically with custom content at "run time" when the document is opened in a C# program using Word Interop services. I don't see any way to assign a unique name to "Ask" or "Fill-In" fields when adding them to the template via the QuickParts ribbon-menu option.
When I iterate the document.Fields collection in the C# program, I must know which field I'm referencing, so it can be assigned the correct value.
It seems things have changed between previous versions of Word and Word 2010. So, if you answer please make sure your answer applies to 2010. Don't assume that what used to work in previous versions works in 2010. Much appreciated, since I rarely work with Word and feel like a dolt when trying to figure out the ribbon menuing in 2010.
You are correct in that fields don't necessarily have a built-in way to uniquely distinguish themselves from other field instances (other than its index in the Fields collection). However, you can use the Field.Type property to test for wdFieldAsk or wdFieldFillIn . If this is not narrow enough to ID then you will need to parse your own unique identifier from the Field.Code. For example, you can construct your FILLIN field as:
{ FILLIN "Hello, World!" MYIDENTIFER }
when you iterate through your document.Fields collection just have a test for the identifier being in the string. EDIT: example:
For Each fld In ActiveDocument.Fields
If InStr("CARMODEL", fld.Code) <> 0 Then
''this is the carmodel field
End If
Next
Another alternative - seek your specific field with a Find.Text for "^d MYIDENTIFIER" (where ^d is expression for 'field code')
Let me know if this helps and expand on your question if any gaps.

Asp.Net MVC 2: How exactly does a view model bind back to the model upon post back?

Sorry for the length, but a picture is worth 1000 words:
In ASP.NET MVC 2, the input form field "name" attribute must contain exactly the syntax below that you would use to reference the object in C# in order to bind it back to the object upon post back. That said, if you have an object like the following where it contains multiple Orders having multiple OrderLines, the names would look and work well like this (case sensitive):
This works:
Order[0].id
Order[0].orderDate
Order[0].Customer.name
Order[0].Customer.Address
Order[0].OrderLine[0].itemID // first order line
Order[0].OrderLine[0].description
Order[0].OrderLine[0].qty
Order[0].OrderLine[0].price
Order[0].OrderLine[1].itemID // second order line, same names
Order[0].OrderLine[1].description
Order[0].OrderLine[1].qty
Order[0].OrderLine[1].price
However we want to add order lines and remove order lines at the client browser. Apparently, the indexes must start at zero and contain every consecutive index number to N.
The black belt ninja Phil Haack's blog entry here explains how to remove the [0] index, have duplicate names, and let MVC auto-enumerate duplicate names with the [0] notation. However, I have failed to get this to bind back using a nested object:
This fails:
Order.id // Duplicate names should enumerate at 0 .. N
Order.orderDate
Order.Customer.name
Order.Customer.Address
Order.OrderLine.itemID // And likewise for nested properties?
Order.OrderLine.description
Order.OrderLine.qty
Order.OrderLine.price
Order.OrderLine.itemID
Order.OrderLine.description
Order.OrderLine.qty
Order.OrderLine.price
I haven't found any advice out there yet that describes how this works for binding back nested ViewModels on post. Any links to existing code examples or strict examples on the exact names necessary to do nested binding with ILists?
Steve Sanderson has code that does this sort of thing here, but we cannot seem to get this to bind back to nested objects. Anything not having the [0]..[n] AND being consecutive in numbering simply drops off of the return object.
Any ideas?
We found a work around, by using the following:
Html.EditorFor(m => m, "ViewNameToUse", "FieldPrefix")
Where FieldPrefix is the "object[0]". This is hardly ideal, but it certainly works pretty well. It's simple and elegant.