QUESTION below:
Data structure in SOLR:
<field name="id" type="string" required="true"/>
<field name="session_id" type="string" required="true"/>
<field name="action_type" required="true"/>
<field name="error_msg" required="false"/>
(all fields have: indexed="true" stored="true" multiValued="false")
Only the 'error_msg' field is not required (it can be null).
There is an equivalent table in Oracle:
TABLE SOLR_TEST
(
ID NUMBER NOT NULL ,
SESSION_ID VARCHAR2(20 BYTE) NOT NULL ,
ACTION_TYPE VARCHAR2(20 BYTE) NOT NULL ,
ERROR_MSG VARCHAR2(20 BYTE)
);
Here is the sample data (the same for Solr and Oracle):
ID SESSION_ID           ACTION_TYPE          ERROR_MSG
-- -------------------- -------------------- --------------------
 1 00001                SELECTED_ACTION
 2 00001                SELECTED_ACTION
 3 00001                OTHER
 4 00002                A2                   ERROR_001
 5 00002                OTHER
 6 00003                SELECTED_ACTION      ERROR_002
 7 00004                A1                   ERROR_001
 8 00005                A2
 9 00005                SELECTED_ACTION
10 00005                SELECTED_ACTION      ERROR_003
11 00006                SELECTED_ACTION
12 00006                OTHER                ERROR_004
QUESTION:
How can I create a Solr query that returns
all session_id values that have the specified action_type, but never have that action_type together with a non-empty error_msg?
In other words, the equivalent of this Oracle query:
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and not session_id in
( select session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is not null
);
result for this query is:
SESSION_ID
--------------------
00001
00006
For example, a Solr query like this does not work:
http://solrhost/solr/collection/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION
// EDIT /////////////////////////////////////
real schema looks like this:
<schema name="elogging" version="1.5">
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="action_type" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="session_id" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="error_msg" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<types>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
</types>
<updateRequestProcessorChain name="uniq-fields">
<processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
<lst name="fields">
<str>id</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
</schema>
// EDIT 2 //////////////////////
The Solr query does not work as I expect. It returns the equivalent of:
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is null;
SESSION_ID
--------------------
00001
00005
00006
Value '00005' is wrong because there is a row:
10 00005 SELECTED_ACTION ERROR_003
// EDIT 3 ////////////
This Solr query does not work either (same issue as the previous one):
http://solrhost/solr/collection/select?rows=1&q=action_type:SELECTED_ACTION+AND+-{!join+from=session_id+to=session_id}error_msg:*+AND+action_type:SELECTED_ACTION&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false
// EDIT 4 ///////
*fixes schema - 'error_msg' is indexed*
// EDIT 5 /////
Here is the sample data for Solr:
id,session_id,action_type,error_msg
1,00001,SELECTED_ACTION,
2,00001,SELECTED_ACTION,
3,00001,OTHER,
4,00002,A2,ERROR_001
5,00002,OTHER,
6,00003,SELECTED_ACTION,ERROR_002
7,00004,A1,ERROR_001
8,00005,A2,
9,00005,SELECTED_ACTION,
10,00005,SELECTED_ACTION,ERROR_003
11,00006,SELECTED_ACTION,
12,00006,OTHER,ERROR_004
and the result from Solr for this data and the query http://localhost:8983/solr/collection3/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION is:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">30</int>
<lst name="params">
<str name="facet.zeros">false</str>
<str name="facet">true</str>
<str name="indent">true</str>
<str name="q">
-(error_msg:[* TO *] AND action_type:SELECTED_ACTION)
</str>
<str name="facet.field">session_id</str>
<str name="wt">xml</str>
<str name="fq">action_type:SELECTED_ACTION</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="id">1</str>
<str name="session_id">00001</str>
<str name="action_type">SELECTED_ACTION</str>
<long name="_version_">1449881246216749056</long>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="session_id">
<int name="00001">2</int>
<int name="00005">1</int>
<int name="00006">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
This is kind of tricky because, as far as I know (and I would be very happy if someone could prove me wrong), it's not possible to reuse parts of a query result in another query (e.g. a filter query or a nested query).
So, here is as close as I can currently get:
Query:
http://localhost:8983/solr/stack19588325/select?q=action_type%3A%22SELECTED_ACTION%22&fq=%7B!tag%3Ddt%7Daction_type%3ASELECTED_ACTION+AND+error_msg%3A%5B*+TO+*%5D+AND+_query_%3A%7B!join+from%3Dsession_id+to%3Dsession_id+v%3D%24qq%7D&rows=0&wt=xml&indent=true&facet=true&facet.mincount=1&facet.field={!ex=dt%20key=nonfilter_session_id}session_id&facet.field=session_id&qq=-error_msg:[*%20TO%20*]
Result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="qq">-error_msg:[* TO *]</str>
<str name="q">action_type:"SELECTED_ACTION"</str>
<arr name="facet.field">
<str>{!ex=dt key=nonfilter_session_id}session_id</str>
<str>session_id</str>
</arr>
<str name="indent">true</str>
<str name="fq">{!tag=dt}action_type:SELECTED_ACTION AND error_msg:[* TO *] AND _query_:{!join from=session_id to=session_id v=$qq}</str>
<str name="facet.mincount">1</str>
<str name="rows">0</str>
<str name="wt">xml</str>
<str name="facet">true</str>
<str name="_">1382878844535</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="nonfilter_session_id">
<int name="00001">2</int>
<int name="00005">2</int>
<int name="00003">1</int>
<int name="00006">1</int>
</lst>
<lst name="session_id">
<int name="00005">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
So, as you can see, we have two different facet results:
nonfilter_session_id - shows the "session_id" values which don't have error_msg. The count is the overall count of session_id records.
session_id - shows the "session_id" values which both have AND don't have error_msg (this is the case with 00005). The count is the count of session_id records with error_msg.
So, if there isn't a better option, you can make an intersection of those two sets, and only the expected session_ids will remain.
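As a rough illustration of doing the final step on the client, here is a minimal Python sketch. It runs on the sample rows from the question, not on live Solr output, and it follows the Oracle query directly (a set difference) rather than the two facets shown above. The names `ROWS` and `sessions_without_error` are hypothetical. In practice each of the two sets could plausibly be read from a session_id facet of a separate request, but that mapping is an assumption.

```python
# Sample rows from the question: (id, session_id, action_type, error_msg).
ROWS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def sessions_without_error(rows, action):
    """Return session_ids that have `action` but never `action` with an error_msg."""
    # Sessions in which the action occurred at all.
    with_action = {sid for _, sid, act, _ in rows if act == action}
    # Sessions in which the action occurred together with a non-empty error_msg.
    with_action_and_error = {
        sid for _, sid, act, err in rows if act == action and err
    }
    # Same logic as the "NOT ... IN (subquery)" of the Oracle query.
    return sorted(with_action - with_action_and_error)


print(sessions_without_error(ROWS, "SELECTED_ACTION"))  # ['00001', '00006']
```

On the sample data this yields the same two values as the Oracle query in the question.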
Related
I am a beginner with getJSON and JS programming. I tried the answers I found on this forum but still get NaN instead of the expected result (4950) of 2475 + 2475. What am I doing wrong?
Here is the output generated from the getJSON request to Apache Solr:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">114</int>
<lst name="params">
<str name="q">objectType:mods AND objectProject:DESerie AND state:published</str>
<str name="_">1568794672289</str>
</lst>
</lst>
<result name="response" numFound="2475" start="0" maxScore="2.1764848"> </result>
</response>
This is my javascript
<script type="text/javascript">
$.getJSON('../api/v1/search?q=objectType:mods AND objectProject:DESerie AND state:published&wt=json', function (data) {
var var1 = $('#series').text(data.response.numFound);
var intVar1 = parseInt(var1, 10);
var total = intVar1 + intVar1;
$('#total').text(total);
});
</script>
Here is my display code
<div id="pubtyp">
Series: <span id="total"></span>
</div>
I'm working on auto-completion search with Solr using EdgeNGrams. I use Solr 3.3 and would like to use collations from the Suggester as an autocomplete solution for multi-term searches. Unfortunately, the Suggester returns only one collation for a multi-term search.
If the user is searching for names of employees, auto-completion should be applied; that is, I want results like a Google search. It works fine with the configuration below.
schema.xml
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
</analyzer>
</fieldType>
<field name="title" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="empname" dest="autocomplete_text"/>
<copyField source="title" dest="autocomplete_text"/>
URL: http://local:8080/test/suggest/?q=michael
Result :
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael force</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<str name="collation">michael bolton</str>
</lst>
</lst>
</response>
That works fine. When I search for "michael f", I get the response below (http://local:8080/test/suggest/?q=michael f).
Response :
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael force</str>
<str>michael w. smith featuring andrae crouch</str>
.....
</arr>
</lst>
<lst name="f">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">9</int>
<arr name="suggestion">
<str>f**k the facts</str>
<str>fairest lord jesus</str>
<str>franz ferdinand</str>
<str>françois rauber</str>
.........
</arr>
</lst>
<str name="collation">michael bolton f**k the facts</str>
</lst>
</lst>
</response>
So when I search for "michael f", I should get only "michael foret" and "michael force". Instead, I also get suggestions that start with "f". Please let me know if there is anything wrong in my Solr configuration.
Thanks in Advance,
Anil.
I'm working on auto completion search with Solr using EdgeNGrams. If the user is searching for names of employees, then auto completion should be applied. That is, I want the results to be like a Google search. It's working fine for some searches.
File schema.xml:
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
</analyzer>
</fieldType>
<field name="title" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="empname" dest="autocomplete_text"/>
<copyField source="title" dest="autocomplete_text"/>
http://local:8080/test/suggest/?q=michael
Result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael houser</str>
<str>michael o'brien</str>
<str>michael penn</str>
<str>michael row your boat ashore</str>
<str>michael tilson thomas</str>
<str>michael w. smith</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<str name="collation">michael bolton</str>
</lst>
</lst>
</response>
That works fine. When I search for michael f
http://local:8080/test/suggest/?q=michael f
I get a response like:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael houser</str>
<str>michael o'brien</str>
<str>michael penn</str>
<str>michael row your boat ashore</str>
<str>michael tilson thomas</str>
<str>michael w. smith</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<lst name="f">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">9</int>
<arr name="suggestion">
<str>f**k the facts</str>
<str>fairest lord jesus</str>
<str>fatboy slim</str>
<str>ffh</str>
<str>fiona apple</str>
<str>foo fighters</str>
<str>frank sinatra</str>
<str>frans bauer</str>
<str>franz ferdinand</str>
<str>françois rauber</str>
</arr>
</lst>
<str name="collation">michael bolton f**k the facts</str>
</lst>
</lst>
</response>
When I search for michael f, I should get only michael foret. Instead, I also get suggestions that start with f. Is there anything wrong in my Solr configuration?
I wrote [old link] about different ways to make auto-suggestions with Solr, and about some questions you should ask yourself in order to make the right choice. Briefly, the out-of-the-box ways are:
Facet prefix
NGrams
TermsComponent
Suggester
They all have advantages and limitations, so I suggest reading the article.
If you are looking for a complete and flexible solution, which requires some more work, you can have a look at this article as well.
If you have already decided to use NGrams, then given your examples you can index your employees using EdgeNGramFilterFactory with minGramSize 1 and search on that field to make auto-suggestions. For the client part you need to use some JavaScript.
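To make the NGram option a bit more concrete, here is a small Python sketch of the idea, not of Solr's actual implementation. It assumes edge n-grams are generated at index time over the whole lowercased name (keyword-style, no word splitting) and that the typed text is looked up as a single token. The function names and the sample list are hypothetical.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Return front edge n-grams of the whole lowercased string."""
    text = text.lower()
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


def build_index(names, min_gram=1, max_gram=15):
    """Map every edge n-gram to the names that produced it (index-time step)."""
    index = {}
    for name in names:
        for gram in edge_ngrams(name, min_gram, max_gram):
            index.setdefault(gram, []).append(name)
    return index


def suggest(index, typed):
    """Look up the typed text as one lowercased token (query-time step)."""
    return index.get(typed.lower(), [])


# Hypothetical sample, reusing a few names from the question's output.
NAMES = ["michael bolton", "michael foret", "michael houser", "fatboy slim"]
index = build_index(NAMES)

print(suggest(index, "michael f"))
```

Because the whole name is treated as one token, "michael f" can only match names whose first nine characters are exactly that prefix, so names that merely start with "f" are not returned.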
I receive the following error when trying to implement auto-complete based on the edismax query type.
SEVERE: java.lang.IllegalStateException: field "locality_ng" was indexed without position data; cannot run PhraseQuery (term=львів)
at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:241)
at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:145)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:317)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:324)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1275)
schema types:
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- autocomplete_edge : Will match from the left of the field, e.g. if the document field
is "A brown fox" and the query is "A bro", it will match, but not "brown"
-->
<fieldType name="autocomplete_edge" class="solr.TextField">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_])" replacement=" " replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="50" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_])" replacement=" " replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^(.{50})(.*)?" replacement="$1" replace="all"/>
</analyzer>
</fieldType>
<!-- autocomplete_ngram : Matches any word in the input field, with implicit right truncation.
This means that the field "A brown fox" will be matched by query "bro".
We use this to get partial matches, but these should be boosted lower than exact and left-anchored
-->
<fieldType name="autocomplete_ngram" class="solr.TextField">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
</analyzer>
</fieldType>
schema fields
<!-- AutoComplete fields
Construct documents containing these fields for all suggestions you like to provide
Then use a dismax query to search on some fields, display some fields and boost others
-->
<field name="locality_id" type="int" indexed="true" stored="true" required="true"/>
<!-- The main text to return as the suggestion. This is not searched -->
<field name="locality_suggest" type="text_suggest" indexed="true" stored="true" omitNorms="true" multiValued="true" />
<!-- A variant of textsuggest which only matches from the very left edge -->
<copyField source="locality_suggest" dest="locality_nge"/>
<field name="locality_nge" type="autocomplete_edge" indexed="true" stored="false" multiValued="true" />
<!-- A variant of textsuggest which matches from the left edge of all terms (implicit truncation) -->
<copyField source="locality_suggest" dest="locality_ng"/>
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true" />
Solr config; I use the following request handler with the edismax type:
<requestHandler class="solr.SearchHandler" name="autocomplete" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="qf">locality_suggest locality_ng^5.0 locality_nge^10.0</str>
<str name="debugQuery">false</str>
</lst>
</requestHandler>
This error occurs only if the query contains special symbols like + - $ # after a word, for example:
львів+в
київ+а
Any suggestions would be great.
Change
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true" />
into
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" multiValued="true" />
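The error message states that the field was indexed without position data, which a phrase query needs. As a rough illustration of why, here is a toy positional index in Python. It is not Lucene's implementation, and every name in it is hypothetical. The phrase check only works because each posting keeps the term positions. If the positions were dropped, the two documents below could not be told apart.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} for whitespace-split, lowercased text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index, phrase):
    """Return doc ids in which the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents containing every term can match.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        # The i-th term must sit exactly i positions after the first term.
        if any(
            all(start + i in index[term][doc_id] for i, term in enumerate(terms))
            for start in starts
        ):
            hits.append(doc_id)
    return hits


DOCS = {1: "brown fox jumps", 2: "fox brown"}
index = build_positional_index(DOCS)

print(phrase_match(index, "brown fox"))
```

Both documents contain both terms, but only document 1 has them in phrase order. That distinction is the information the position data carries.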
For Solr, there seems to be no difference between searching for
'search/suggest/?q=print%20'
or
'search/suggest/?q=print'
The results are the same, but this difference is really important for autosuggestion.
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="print">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">6</int>
<arr name="suggestion">
<str>printer facsimile toner</str>
<str>print cartridge</str>
<str>printhead printhead</str>
<str>printer copier paper</str>
<str>printer kit</str>
</arr>
</lst>
<str name="collation">printer facsimile toner</str>
</lst>
</lst>
</response>
Try
q=str:"print "
and
q=str:(print )
One of these should work (it depends on how your str field is analyzed).
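When trying the two forms, the trailing space has to survive URL encoding. Here is a small Python sketch that builds the query string with the standard library, assuming the field really is called `str` as written above. `suggest_query` is a hypothetical helper, and the handler path is left out because it depends on your setup.

```python
from urllib.parse import parse_qs, urlencode


def suggest_query(field, text, quoted=True):
    """Build an encoded q parameter, keeping any trailing space inside the term."""
    if quoted:
        q = f'{field}:"{text}"'  # phrase form: str:"print "
    else:
        q = f"{field}:({text})"  # grouped form: str:(print )
    return urlencode({"q": q})


quoted_form = suggest_query("str", "print ")
grouped_form = suggest_query("str", "print ", quoted=False)

print(quoted_form)
print(grouped_form)

# Decoding the parameter again shows that the trailing space was preserved.
print(parse_qs(quoted_form)["q"][0])
```

Either result can be appended after `?` to the suggest URL from the question.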