ElasticSearch - filter terms for autocomplete - autocomplete

I would like for an autocomplete to get the terms starting with some specific characters.However, the terms returned do not begin with the specified text ("wer" in this case), and more than that, no matter what the prefix content is, they are always the same. The query I am using now is:
{"facets":
{"count":
{"terms":
{"field":"content"},
"facet_filter":
{"prefix":{"content":"wer"}}
}
},
"size":0
}
I am wondering what am I missing or doing wrong. Any help much appreciated. Thanks!

Perhaps you miss a "query" entry after the facet_filter.
If this does not help try regex pattern for facets, but be aware that facets will be removed in future updates of elasticsearch.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_regex_patterns

Related

Returning the number of results?

I'm looking to return the number of results found in an ajax fashion on Algolia instant search.
A little field saying something like "There are X number of results" and refines as the characters are typed.
I've read you utilise 'nbHits' but i'm unsure of how to go about it.. Being from a design background.
Thanks for help in advance.
The instantsearch.js stats widget shows the number of results and speed of the search. If you don't want to use the widget, I believe you can still use {{nbHits}} inside of your template wherever you want the number to print.
Very easy when you know how, Thanks for pointing me in the right direction Josh.
This works:
search.addWidget(
instantsearch.widgets.stats({
container: '.no-of-results'
})
);

Django-Tastypie filter all fields

I'm doing a project with django-tastypie and know how to set a field to filter by, thus:
filtering = {
'nationality': ALL,
}
What I want to know is if there was some setting that would allow me to have all the fields available for filtering, without having to be set one by one as in the example above?
Someone can help me?
Unfortunately, I Don't think there is a way to do it at once.
You can use this way,
for field in YourModel.__dict__['_meta'].fields:
filtering.update({field.name : ALL})

whoosh doesn't search for short words like "C#"

i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)

TermQuery not returning on a known search term, but WildcardQuery does

Am hoping someone with enough insight into the inner workings of Lucene might be able to point me in the right direction =)
I'll skip most of the surrounding irellevant code, and cut right to the chase. I have a Lucene index, to which I am adding the following field to the index (variables replaced by their literal values):
document.Add( new Field("Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7",
Field.Store.YES , Field.Index.UN_TOKENIZED));
Later, when I search my index (using other types of queries), I am able to verify that this field does indeed appear in my index - like when looping through all Fields returned by Document.GetFields()
Field: Typenummer, Value: E5CEB501A244410EB1FFC4761F79E7B7
So far so good :-)
Now the real problem is - why can I not use a TermQuery to search against this value and actually get a result.
This code produces 0 hits:
// Returns 0 hits
bq.Add( new TermQuery( new Term( "Typenummer",
"E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );
But if I switch this to a WildcardQuery (with no wildcards), I get the 1 hit I expect.
// returns the 1 hit I expect
bq.Add( new WildcardQuery( new Term( "Typenummer",
"E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );
I've checked field lengths, I've checked that I am using the same Analyzer and so on and I am still on square 1 as to why this is.
Can anyone point me in a direction I should be looking?
I finally figured out what was going on. I'm expanding the tags for this question as it, much to my surprise, actually turned out to be an issue with the CMS this particular problem exists in. In summary, the problem came down to this:
The field is stored UN_TOKENIZED, meaning Lucene will store it excactly "as-is"
The BooleanQuery I pasted snippets from gets sent to the Sitecore SearchManager inside a PreparedQuery wrapper
The behaviour I expected from this was, that my query (having already been prepared) would go - unaltered - to the Lucene API
Turns out I was wrong. It passes through a RewriteQuery method that copies my entire set of nested queries as-is, with one exception - all the Term arguments are passed through a LowercaseStrategy()
As I indexed an UPPERCASE Term (UN_TOKENIZED), and Sitecore changes my PreparedQuery to lowercase - 0 results are returned
Am not going to start an argument of whether this is "by design" or "by design flaw" implementation of the Lucene Wrapper API - I'll just note that rewriting my query when using the PreparedQuery overload is... to me... unexpected ;-)
Further teachings from this; storing the field as TOKENIZED will eliminate this problem too, as the StandardAnalyzer by default will lowercase all tokens.

How to get whole index in ZEND Lucene?

Hi I am looking for a way to get the whole index if my query is nothing.
My lucene is a picture of my database without some unwanted content, like expired annonces...
So I would like to use only lucene to get annonces, and for that I need a way to get the whole index.
Any ideas?
thanks!
This is not the answer but something wich works for me:
Using an indexKey like is_indexed, always true.
I am adding "is_indexed:1" to my query and it works...
If you have something else, let me know!
I tend to use a field which is named after the index such as:
$oDoc->addField(
Zend_Search_Lucene_Field::keyword(
'index',
'availability'
)
);
Then the term query will return all fields. It's not pretty but it works fine.