QUESTION below:
Data structure in SOLR:
<field name="id" type="string" required="true"/>
<field name="session_id" type="string" required="true"/>
<field name="action_type" required="true"/>
<field name="error_msg" required="false"/>
(all fields have: indexed="true" stored="true" multiValued="false")
Only the 'error_msg' field is not required (it can be null).
There is an equivalent table in Oracle:
TABLE SOLR_TEST
(
ID NUMBER NOT NULL ,
SESSION_ID VARCHAR2(20 BYTE) NOT NULL ,
ACTION_TYPE VARCHAR2(20 BYTE) NOT NULL ,
ERROR_MSG VARCHAR2(20 BYTE)
);
Here is the sample data (the same for SOLR and Oracle):
ID SESSION_ID           ACTION_TYPE          ERROR_MSG
-- -------------------- -------------------- --------------------
 1 00001                SELECTED_ACTION
 2 00001                SELECTED_ACTION
 3 00001                OTHER
 4 00002                A2                   ERROR_001
 5 00002                OTHER
 6 00003                SELECTED_ACTION      ERROR_002
 7 00004                A1                   ERROR_001
 8 00005                A2
 9 00005                SELECTED_ACTION
10 00005                SELECTED_ACTION      ERROR_003
11 00006                SELECTED_ACTION
12 00006                OTHER                ERROR_004
QUESTION:
How can I create a SOLR query which will return
all session_id values which have the specified action_type, but which never have that action_type together with a non-empty error_msg,
i.e. the equivalent of this query in Oracle:
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and not session_id in
( select session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is not null
);
The result of this query is:
SESSION_ID
--------------------
00001
00006
For example, a SOLR query like this does not work:
http://solrhost/solr/collection/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION
// EDIT /////////////////////////////////////
The real schema looks like this:
<schema name="elogging" version="1.5">
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="action_type" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="session_id" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="error_msg" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<types>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
</types>
<updateRequestProcessorChain name="uniq-fields">
<processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
<lst name="fields">
<str>id</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
</schema>
// EDIT 2 //////////////////////
The SOLR query is not working as I expect. It returns the equivalent of:
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is null;
SESSION_ID
--------------------
00001
00005
00006
Value '00005' is wrong because there is a row:
10 00005 SELECTED_ACTION ERROR_003
// EDIT 3 ////////////
This SOLR query is also not working (the same issue as with the previous one):
http://solrhost/solr/collection/select?rows=1&q=action_type:SELECTED_ACTION+AND+-{!join+from=session_id+to=session_id}error_msg:*+AND+action_type:SELECTED_ACTION&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false
// EDIT 4 ///////
*fixes schema - 'error_msg' is indexed*
// EDIT 5 /////
Here is the sample data for SOLR:
id,session_id,action_type,error_msg
1,00001,SELECTED_ACTION,
2,00001,SELECTED_ACTION,
3,00001,OTHER,
4,00002,A2,ERROR_001
5,00002,OTHER,
6,00003,SELECTED_ACTION,ERROR_002
7,00004,A1,ERROR_001
8,00005,A2,
9,00005,SELECTED_ACTION,
10,00005,SELECTED_ACTION,ERROR_003
11,00006,SELECTED_ACTION,
12,00006,OTHER,ERROR_004
and this is the result from SOLR for this data and the following query:
http://localhost:8983/solr/collection3/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">30</int>
<lst name="params">
<str name="facet.zeros">false</str>
<str name="facet">true</str>
<str name="indent">true</str>
<str name="q">
-(error_msg:[* TO *] AND action_type:SELECTED_ACTION)
</str>
<str name="facet.field">session_id</str>
<str name="wt">xml</str>
<str name="fq">action_type:SELECTED_ACTION</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="id">1</str>
<str name="session_id">00001</str>
<str name="action_type">SELECTED_ACTION</str>
<long name="_version_">1449881246216749056</long>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="session_id">
<int name="00001">2</int>
<int name="00005">1</int>
<int name="00006">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
This is kind of tricky because, as far as I know (and I would be very happy if someone could prove this wrong), it's not possible to reuse parts of a query result in another query (e.g. a filter query or a nested query).
So, here is as close as I can currently get:
Query:
http://localhost:8983/solr/stack19588325/select?q=action_type%3A%22SELECTED_ACTION%22&fq=%7B!tag%3Ddt%7Daction_type%3ASELECTED_ACTION+AND+error_msg%3A%5B*+TO+*%5D+AND+_query_%3A%7B!join+from%3Dsession_id+to%3Dsession_id+v%3D%24qq%7D&rows=0&wt=xml&indent=true&facet=true&facet.mincount=1&facet.field={!ex=dt%20key=nonfilter_session_id}session_id&facet.field=session_id&qq=-error_msg:[*%20TO%20*]
Result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="qq">-error_msg:[* TO *]</str>
<str name="q">action_type:"SELECTED_ACTION"</str>
<arr name="facet.field">
<str>{!ex=dt key=nonfilter_session_id}session_id</str>
<str>session_id</str>
</arr>
<str name="indent">true</str>
<str name="fq">{!tag=dt}action_type:SELECTED_ACTION AND error_msg:[* TO *] AND _query_:{!join from=session_id to=session_id v=$qq}</str>
<str name="facet.mincount">1</str>
<str name="rows">0</str>
<str name="wt">xml</str>
<str name="facet">true</str>
<str name="_">1382878844535</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="nonfilter_session_id">
<int name="00001">2</int>
<int name="00005">2</int>
<int name="00003">1</int>
<int name="00006">1</int>
</lst>
<lst name="session_id">
<int name="00005">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
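So, as you can see, we have two different facet results:
nonfilter_session_id - This is computed with the tagged filter excluded, so it lists every "session_id" that has a SELECTED_ACTION record. The count is the overall number of such records per session_id.
session_id - This shows those "session_id" which have records both with AND without error_msg (this is the case with 00005). The count is the number of SELECTED_ACTION records with error_msg.
So, if there is no better option, you can combine those two sets on the client side: removing the second set from the first drops 00005. Note that 00003 still remains in the first set, because all of its records carry an error_msg, so the join never matches it; it has to be excluded separately to get only the expected session_id values.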
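The client-side combination can be sketched in Python. This is a minimal sketch under assumptions: it works on the question's sample rows held in memory rather than on a live Solr response, and the helper name `sessions_without_error` is hypothetical. It takes the set of sessions that have the action and subtracts the set of sessions that have the action together with an error_msg, which mirrors the Oracle NOT IN query from the question. Against Solr, the two sets could presumably come from two facet requests on session_id, one with q=action_type:SELECTED_ACTION and one with q=action_type:SELECTED_ACTION AND error_msg:[* TO *].

```python
# Sample rows from the question: (id, session_id, action_type, error_msg).
# None stands for a missing error_msg.
ROWS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def sessions_without_error(rows, action_type):
    """Hypothetical helper: sessions that have `action_type` but never
    have it together with a non-empty error_msg."""
    # Set 1: every session with at least one record of the given action.
    with_action = {sid for _, sid, act, _ in rows if act == action_type}
    # Set 2: sessions where that action occurs with an error_msg.
    with_error = {
        sid for _, sid, act, err in rows if act == action_type and err
    }
    # Set difference, the client-side equivalent of NOT IN.
    return sorted(with_action - with_error)


print(sessions_without_error(ROWS, "SELECTED_ACTION"))  # ['00001', '00006']
```

The result matches the output of the Oracle query shown in the question.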
I am trying to parse:
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><LoginResponse xmlns="http://services.marketernet.com/application"><LoginResult><results><response value="UY+/9dD+Lz7DT3Oq/WG3CVJ/pFW7o6LEFNA4xOSIWr88Dh2RVAgy9qHP1BwpdiYA"/><exceptions></exceptions></results></LoginResult></LoginResponse></soap:Body></soap:Envelope>
So far I have:
<cfset soapResponse = xmlParse(httpResponse.fileContent) />
<cfset results = xmlSearch(soapResponse,"//*[local-name()='LoginResult' and namespace-uri()='http://services.marketernet.com/application']") />
I need the value of <response value="UY+/9dD+Lz7DT3Oq/WG3CVJ/pFW7o6LEFNA4xOSIWr88Dh2RVAgy9qHP1BwpdiYA"/>
I tried looping, and I even tried a deep XML path, but got nothing.
Please help. If you have questions, please let me know.
I normally just use xmlSearch(soapResponse,"//*[local-name()='whatever']") and it works fine for me. It can return different types depending on how deep you search in the XML. Because of that, when developing the code I always use <cfdump> to view the results of the xmlSearch() function to know what I am dealing with.
I took the SOAP response that you shared and tested the following code successfully on ColdFusion 9.0.1. Notice that I have three different searches here each delving deeper into the XML tree. I left the <cfdump> in there so you can see what each returns.
<cftry>
<cfsavecontent variable="content">
<?xml version="1.0" encoding="UTF-8" ?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<LoginResponse xmlns="http://services.marketernet.com/application">
<LoginResult>
<results>
<response value="UY+/9dD+Lz7DT3Oq/WG3CVJ/pFW7o6LEFNA4xOSIWr88Dh2RVAgy9qHP1BwpdiYA"/>
<exceptions></exceptions>
</results>
</LoginResult>
</LoginResponse>
</soap:Body>
</soap:Envelope>
</cfsavecontent>
<cfset soapResponse = xmlParse(Trim(content)) />
<html>
<head><title>Test xmlParse</title></head>
<body>
<h3>xmlParse option 1</h3>
<div>
<cfset results = xmlSearch(soapResponse,"//*[local-name()='LoginResult']") />
<cfdump var="#results#" />
<cfset value = results[1].results.response.XmlAttributes.value />
<cfdump var="#value#" />
</div>
<h3>xmlParse option 2</h3>
<div>
<cfset results = xmlSearch(soapResponse,"//*[local-name()='results']") />
<cfdump var="#results#" />
<cfset value = results[1].response.XmlAttributes.value />
<cfdump var="#value#" />
</div>
<h3>xmlParse option 3</h3>
<div>
<cfset results = xmlSearch(soapResponse,"//*[local-name()='response']") />
<cfdump var="#results#" />
<cfset value = results[1].XmlAttributes.value />
<cfdump var="#value#" />
</div>
</body>
</html>
<cfcatch type="any">
<cfdump var="#cfcatch#" />
</cfcatch>
</cftry>
All of the options result in setting the value variable to UY+/9dD+Lz7DT3Oq/WG3CVJ/pFW7o6LEFNA4xOSIWr88Dh2RVAgy9qHP1BwpdiYA from the XML.
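For readers outside ColdFusion, the same namespace-aware lookup can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration of the idea, not part of the code tested on ColdFusion 9.0.1; the prefix `app` and the variable names are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

# The SOAP response from the question, as bytes.
SOAP = b"""<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><LoginResponse xmlns="http://services.marketernet.com/application"><LoginResult><results><response value="UY+/9dD+Lz7DT3Oq/WG3CVJ/pFW7o6LEFNA4xOSIWr88Dh2RVAgy9qHP1BwpdiYA"/><exceptions></exceptions></results></LoginResult></LoginResponse></soap:Body></soap:Envelope>"""

# Map an arbitrary prefix to the default namespace used by LoginResponse.
NS = {"app": "http://services.marketernet.com/application"}

root = ET.fromstring(SOAP)
# Search the whole tree for the namespaced <response> element.
response = root.find(".//app:response", NS)
value = response.get("value")
print(value)
```

Here the namespace is bound explicitly instead of being bypassed with local-name(), which is the other common way to deal with a default namespace.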
I'm working on auto completion search with Solr using EdgeNGrams. I use Solr 3.3 and I would like to use collations from the Suggester as an autocomplete solution for multi-term searches. Unfortunately, the Suggester returns only one collation for a multi-term search.
If the user is searching for names of employees, then auto completion should be applied, i.e. I want results like a Google search. It's working fine for me with the configuration below.
schema.xml
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
</analyzer>
</fieldType>
<field name="title" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="empname" dest="autocomplete_text"/>
<copyField source="title" dest="autocomplete_text"/>
URL: http://local:8080/test/suggest/?q=michael
Result :
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael force</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<str name="collation">michael bolton</str>
</lst>
</lst>
</response>
It's working fine for me. When I search with "michael f", I get a response like the one below (http://local:8080/test/suggest/?q=michael f).
Response :
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael force</str>
<str>michael w. smith featuring andrae crouch</str>
.....
</arr>
</lst>
<lst name="f">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">9</int>
<arr name="suggestion">
<str>f**k the facts</str>
<str>fairest lord jesus</str>
<str>franz ferdinand</str>
<str>françois rauber</str>
.........
</arr>
</lst>
<str name="collation">michael bolton f**k the facts</str>
</lst>
</lst>
</response>
So when I search with "michael f", I should get "michael foret" and "michael force" only. Instead, suggestions that start with "f" are also returned. Please tell me if there's anything wrong in my Solr configuration settings.
Thanks in Advance,
Anil.
I'm working on auto completion search with Solr using EdgeNGrams. If the user is searching for names of employees, then auto completion should be applied. That is, I want the results to be like a Google search. It's working fine for some searches.
File schema.xml:
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
</analyzer>
</fieldType>
<field name="title" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="empname" dest="autocomplete_text"/>
<copyField source="title" dest="autocomplete_text"/>
http://local:8080/test/suggest/?q=michael
Result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael houser</str>
<str>michael o'brien</str>
<str>michael penn</str>
<str>michael row your boat ashore</str>
<str>michael tilson thomas</str>
<str>michael w. smith</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<str name="collation">michael bolton</str>
</lst>
</lst>
</response>
It's working fine for me. When I search with michael f
http://local:8080/test/suggest/?q=michael f
I get a response like:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="michael">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>michael bolton</str>
<str>michael foret</str>
<str>michael houser</str>
<str>michael o'brien</str>
<str>michael penn</str>
<str>michael row your boat ashore</str>
<str>michael tilson thomas</str>
<str>michael w. smith</str>
<str>michael w. smith featuring andrae crouch</str>
</arr>
</lst>
<lst name="f">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">9</int>
<arr name="suggestion">
<str>f**k the facts</str>
<str>fairest lord jesus</str>
<str>fatboy slim</str>
<str>ffh</str>
<str>fiona apple</str>
<str>foo fighters</str>
<str>frank sinatra</str>
<str>frans bauer</str>
<str>franz ferdinand</str>
<str>françois rauber</str>
</arr>
</lst>
<str name="collation">michael bolton f**k the facts</str>
</lst>
</lst>
</response>
When I search with michael f, I should get michael foret only. Instead, suggestions that start with f are also returned. Is there anything wrong in my configuration settings in Solr?
I wrote an article [old link] about different ways to make auto-suggestions with Solr and some questions you should ask yourself in order to make the right choice. Briefly, the out-of-the-box ways are:
Facet prefix
NGrams
TermsComponent
Suggester
They all have advantages and limitations at the same time, so I'd suggest reading the article.
If you are looking for a complete and flexible solution, which requires some more work, you can have a look at this article as well.
If you have already decided to use NGrams, given your examples you can index your employees using EdgeNGramFilterFactory with minGramSize 1, then search on that field to make auto-suggestions. For the client part you need to use some JavaScript.
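To make the index-time NGram idea concrete, here is a small Python simulation. It is a sketch under assumptions, not Solr's implementation: the function names are hypothetical, the whole name is treated as one lowercased token (as a keyword tokenizer would), front edge n-grams are produced only when indexing, and the typed text is only lowercased at query time.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Front edge n-grams of the whole lowercased string (no word splitting)."""
    token = text.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def build_index(names):
    """Map every edge n-gram to the names that produced it (index time)."""
    index = {}
    for name in names:
        for gram in edge_ngrams(name):
            index.setdefault(gram, []).append(name)
    return index


def suggest(index, typed):
    """Query time: only lowercase the typed text, then do an exact lookup."""
    return index.get(typed.lower(), [])


NAMES = [
    "michael bolton",
    "michael foret",
    "michael force",
    "fairest lord jesus",
    "franz ferdinand",
]

index = build_index(NAMES)
print(suggest(index, "michael f"))  # ['michael foret', 'michael force']
```

Because the n-grams are taken from the start of the whole name, "michael f" can only match names that begin with exactly that text. One limitation of this scheme is that typed text longer than the maximum gram size (15 here) never matches.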
To Solr, there seems to be no difference between searching for
'search/suggest/?q=print%20'
or
'search/suggest/?q=print'
The results are the same, but the difference is really important for autosuggestion.
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="print">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">6</int>
<arr name="suggestion">
<str>printer facsimile toner</str>
<str>print cartridge</str>
<str>printhead printhead</str>
<str>printer copier paper</str>
<str>printer kit</str>
</arr>
</lst>
<str name="collation">printer facsimile toner</str>
</lst>
</lst>
</response>
Try
q=str:"print "
and
q=str:(print )
One of these should work (it depends on how your str field is analysed).
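If you build these requests from code, the trailing space has to survive URL encoding. A small Python sketch, assuming the field is called str as above and reusing the search/suggest/ path from the question; the helper name `suggest_url` is hypothetical:

```python
from urllib.parse import quote, urlencode


def suggest_url(query, base="search/suggest/"):
    """Build the request URL, percent-encoding spaces as %20."""
    return base + "?" + urlencode({"q": query}, quote_via=quote)


# Phrase query that keeps the trailing space inside the quotes.
print(suggest_url('str:"print "'))
# Grouped query with the trailing space inside the parentheses.
print(suggest_url("str:(print )"))
```

Whether the space still matters after encoding depends on the field's analysis chain, as noted above.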