Lucene.net query for contains and avoiding empty string fields

I have a Lucene index set up that I can query fine, but I cannot get a "field not equal to empty string" condition to work. For example, in the code below I want three conditions:
Where "country tag" field contains "{4ED2F7EE-5C2A-418C-B2F6-236F94166BA1}".
Where "country tag" field is not empty string.
Where "date" range is between "20110101T000000" and "20121001T000000".
WildcardQuery taggingQuery = new WildcardQuery(new Term("country tag", "*" + ShortID.Encode("{4ED2F7EE-5C2A-418C-B2F6-236F94166BA1}").ToLowerInvariant() + "*"));
TermQuery taggingNotQuery = new TermQuery(new Term("country tag", " "));
RangeQuery rangeQuery = new RangeQuery(new Term("date", "20110101T000000"), new Term("date", "20121001T000000"), true);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(taggingQuery, BooleanClause.Occur.MUST);
booleanQuery.Add(taggingNotQuery, BooleanClause.Occur.MUST_NOT);
booleanQuery.Add(rangeQuery, BooleanClause.Occur.MUST);
I have a feeling I am doing this wrong or that my query is wrong somehow. I should not need an extra condition to guard against empty or null fields.
Any help is appreciated!

If you allow '*' as the first character in the search string, Lucene can use queries like "countrytag:*" to find all documents that contain anything in the countrytag field. (Lucene's default is to disable an initial '*' in a query string.)

I should have paid more attention when setting up the index: I forgot to add field analyzers for each field. The multilist fields were being indexed with a different analyzer instead of the standard analyzer. I added this to my config section for field crawls and my query started working:
<fieldTypes hint="raw:AddFieldTypes">
<!-- Text fields need to be tokenized -->
<fieldType name="single-line text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="multi-line text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="word document" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="html" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="rich text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="memo" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<!-- Multilist based fields need to be tokenized to support search of multiple values -->
<fieldType name="multilist" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="treelist" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="treelistex" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<fieldType name="checklist" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
<!-- Legacy tree list field from ver. 5.3 -->
<fieldType name="tree list" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
</fieldTypes>

Related

SharePoint PnP - Adding fields to default view

I’m using the SharePoint PnP templates to deploy a list instance to SharePoint 2016. Is it possible to make the field “TheName” part of the default view or a particular view using an attribute?
Below I’ve set the “TheName” field attributes Viewable="true" ShowInDisplayForm="true" ShowInViewForms="true" but that has not resulted in “TheName” being part of the default view. Below is the xml for the list instance:
<pnp:ListInstance Title="Application" Description="" EnableAttachments="true" DocumentTemplate="" TemplateType="100" Url="Lists/Application" MinorVersionLimit="0" MaxVersionLimit="0" DraftVersionVisibility="0" TemplateFeatureID="00bfea71-de22-43b2-a848-c05709900100" ContentTypesEnabled="true" EnableFolderCreation="true">
<pnp:ContentTypeBindings>
<pnp:ContentTypeBinding ContentTypeID="0x0109413FF39DA2049E08C8B9564402E3562" Default="true" />
</pnp:ContentTypeBindings>
<pnp:Fields>
<pnp:Field Type="Text" DisplayName="TheName" StaticName="TheName" Name="TheName" Default="true" ID="{db2beb10-5325-434d-a559-691e340a4fea}" Viewable="true" ShowInDisplayForm="true" ShowInViewForms="true" />
</pnp:Fields>
</pnp:ListInstance>
I can explicitly create/set the default view to include “TheName” but then I end up having to list all the fields that are part of the list including the ones which come from a site content type. This can become a hassle to maintain. The below list instance xml displays “TheName” as part of the default view:
<pnp:ListInstance Title="Application" Description="" EnableAttachments="true" DocumentTemplate="" TemplateType="100" Url="Lists/Application" MinorVersionLimit="0" MaxVersionLimit="0" DraftVersionVisibility="0" TemplateFeatureID="00bfea71-de22-43b2-a848-c05709900100" ContentTypesEnabled="true" EnableFolderCreation="true">
<pnp:ContentTypeBindings>
<pnp:ContentTypeBinding ContentTypeID="0x0109413FF39DA2049E08C8B9564402E3562" Default="true" />
</pnp:ContentTypeBindings>
<pnp:Fields>
<pnp:Field Type="Text" DisplayName="TheName" StaticName="TheName" Name="TheName" Default="true" ID="{db2beb10-5325-434d-a559-691e340a4fea}" Viewable="true" ShowInDisplayForm="true" ShowInViewForms="true" />
</pnp:Fields>
<pnp:Views>
<View DisplayName="All Items">
<ViewFields>
<FieldRef Name="TheName" />
<FieldRef Name="Title" />
<FieldRef Name="ApplicationId" />
<FieldRef Name="Case" />
</ViewFields>
</View>
</pnp:Views>
</pnp:ListInstance>

Data type not working in Solr

I want to fetch records that include a date type from Cassandra into Solr. My configuration is as follows.
In dataconfig.xml:
<entity name="artist" query="SELECT artist_id, name, email, total_jobs, created FROM artist_list">
<field column="artist_id" template="ARTIST_${artist.artist_id}" name="id"/>
<field column="created" name="artist_created" />
</entity>
In schema.xml:
<fieldType name="tdate" class="solr.TrieDoubleField" omitNorms="true" />
<field name="artist_created" type="tdate" indexed="false" stored="true"/>
But the result did not contain the created field. Can anyone tell me what the problem is? Thanks very much!

You are defining the tdate field type as solr.TrieDoubleField. That is why the result does not contain the artist_created data.
Change your schema to:
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<field name="artist_created" type="date" indexed="false" stored="true"/>
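If date values are ever sent to this field by hand rather than through the data import handler, Solr date fields expect UTC timestamps in the ISO 8601 form YYYY-MM-DDThh:mm:ssZ. A minimal Python sketch of that formatting follows; the helper name to_solr_date is hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_solr_date(value: datetime) -> str:
    """Format a datetime as a UTC ISO 8601 string with a trailing Z."""
    if value.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = datetime(2014, 1, 17, 19, 35, 38)
print(to_solr_date(created))  # 2014-01-17T19:35:38Z
```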

not all fields are copied from mongodb to solr after integration using mongo-connector

I have successfully integrated MongoDB and Solr using mongo-connector. However, whenever I update or add anything in the sample collection I created, only two or three fields of each document are copied; the data in the remaining fields is not copied into Solr. I have not been able to get this working.
This is my collection and its document details. Name of collection: testdb
The document was inserted as follows:
db.testdb.insert( {
... _id: "101",
... name: "test",
... description: "descr",
... mydesc: "mydescr",
... nmdsc: "nmdsc1",
... coords: "coords1"
... })
The log for the data sync between Solr and MongoDB reports success:
2014-01-17 19:35:38,462 - INFO - Finished 'http://<hostname>:<port>/solr/update/?commit=true' (post) with body '<add><doc>' in 0.210 seconds.
But when I run a query to see the document data, only these fields are returned:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "101",
"description": "descr",
"name": "test",
"_version_": 1457486601392226300
}
]
}
}
Clearly, the following fields and their data were not copied into Solr:
... mydesc: "mydescr",
... nmdsc: "nmdsc1",
... coords: "coords1"
Following is my schema.xml:
<schema name="narayana" version="1.5">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
<fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" />
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate" />
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0" />
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_wslc" indexed="true" stored="true" />
<field name="description" type="text_wslc" indexed="true" stored="true" />
<field name="mydesc" type="text_wslc" indexed="true" stored="true" />
<field name="nmdsc" type="text_wslc" indexed="true" stored="true" multiValued="true" />
<field name="coords" type="string" indexed="true" stored="true" multiValued="true" />
<field name="_version_" type="long" indexed="true" stored="true" />
<dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>nmdsc</defaultSearchField>
<!-- we don't want too many results in this usecase -->
<solrQueryParser defaultOperator="AND" />
<copyField source="name" dest="nmdsc" />
<copyField source="description" dest="nmdsc" />
</schema>

Your configuration copies only two specific fields. If you want to index additional fields, add them to your configuration file and run the same import cycle again:
<!-- we don't want too many results in this usecase -->
<solrQueryParser defaultOperator="AND" />
<copyField source="name" dest="nmdsc" />
<copyField source="description" dest="nmdsc" />
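To confirm which fields are dropped after each configuration change, the inserted document can be compared with what Solr returns. The sketch below uses the sample data from the question; the helper missing_fields and the `_id` to `id` rename are assumptions made for illustration, not part of mongo-connector.

```python
def missing_fields(source, indexed, renamed=None):
    """Return the source field names that do not appear in the indexed document."""
    renamed = renamed or {}
    expected = {renamed.get(name, name) for name in source}
    return sorted(expected - set(indexed))


# Document as inserted into MongoDB (taken from the question).
source_doc = {
    "_id": "101",
    "name": "test",
    "description": "descr",
    "mydesc": "mydescr",
    "nmdsc": "nmdsc1",
    "coords": "coords1",
}

# Document as returned by the Solr query (taken from the question).
solr_doc = {
    "id": "101",
    "description": "descr",
    "name": "test",
    "_version_": 1457486601392226300,
}

# Assumption: the MongoDB _id is stored under the Solr uniqueKey "id".
print(missing_fields(source_doc, solr_doc, renamed={"_id": "id"}))
# ['coords', 'mydesc', 'nmdsc']
```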

Solr indexing of MongoDB collection

Suppose I have a test application representing some friends list. The application uses a collection where all documents are in the following format:
_id : ObjectId("someString"),
name : "George",
description : "some text",
age : 35,
friends : [
{
name: "Peter",
age: 30,
town: {
name_town: "Paris",
country: "France"
}
},
{
name: "Thomas",
age: 25,
town: {
name_town: "Berlin",
country: "Germany"
}
}, ... // more friends
]
... // more documents
How can I describe such a collection in schema.xml? I need to run facet queries like "Give me the countries where George's friends live". Another use case may be "Return all documents (persons) who have a friend who is 30 years old."
My initial idea is to mark the "friends" attribute as a text field with this schema.xml definition:
<fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
....
<field name="friends" type="text_wslc" indexed="true" stored="true" />
and then try to search for words such as "age" and "30" in the text, but that is not a very reliable solution.
Please leave aside the not entirely logical architecture of the collection. It is only an example of a similar problem I am facing.
Any help or idea will be highly appreciated.
EDIT:
Sample 'schema.xml'
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="text-schema" version="1.5">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
<fieldType name="trInt" class="solr.TrieIntField" precisionStep="0" omitNorms="true" />
<fieldType name="text_p" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="_id" type="string" indexed="true" stored="true" required="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_ts" type="long" indexed="true" stored="true"/>
<field name="ns" type="string" indexed="true" stored="true"/>
<field name="description" type="text_p" indexed="true" stored="true" />
<field name="name" type="text_p" indexed="true" stored="true" />
<field name="age" type="trInt" indexed="true" stored="true" />
<field name="friends" type="text_p" indexed="true" stored="true" /> <!-- Here is the problem - when the type is text_p, all fields are considered as a text; optimal solution would be something like "collection" tag to mark name_town and town as descendant of the field 'friends' but unfortunately, this is not how the solr works-->
<field name="town" type="text_p" indexed="true" stored="true"/>
<field name="name_town" type="string" indexed="true" stored="true"/>
<field name="town" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>_id</uniqueKey>
</schema>

As Solr is document-centric, you will need to flatten the data as much as you can. Based on the sample you have given, I would create a schema.xml like the one below.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="friends" version="1.0">
<fields>
<field name="id"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="name"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="description"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="age"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="town"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="townRaw"
type="string" indexed="true" stored="true" multiValued="false" />
<field name="country"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="countryRaw"
type="string" indexed="true" stored="true" multiValued="false" />
<field name="friends"
type="int" indexed="true" stored="true" multiValued="true" />
</fields>
<copyField source="country" dest="countryRaw" />
<copyField source="town" dest="townRaw" />
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField"
precisionStep="0" positionIncrementGap="0" />
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
</types>
</schema>
I would model each person as a separate document. The relationship between two persons is modelled via the friends attribute, which translates into an array of IDs. So at index time you would need to fetch the IDs of all friends of a person and put them into that field.
Most of the other fields are straightforward. The two Raw fields are the interesting ones. Since you said that you want to facet on the country, you will need the country unchanged, or optimized for faceting. Usually the field types differ depending on their purpose (searching, faceting, autosuggest, etc.). In this case the Raw copies of country and town are indexed exactly as given.
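The flattening step could be sketched in Python roughly as follows. This is only an illustration under assumptions: the functions to_solr_doc and flatten and the ids_by_name lookup are hypothetical, since the nested sample carries no numeric IDs, and nested friends get an empty friends list because the sample does not say who their friends are.

```python
def to_solr_doc(person, ids_by_name):
    """Build one flat document whose friends field is a list of ids."""
    doc = {
        "id": ids_by_name[person["name"]],
        "name": person["name"],
        "age": person["age"],
        # Nested friend objects are replaced by their ids.
        "friends": [ids_by_name[f["name"]] for f in person.get("friends", [])],
    }
    if "description" in person:
        doc["description"] = person["description"]
    town = person.get("town")
    if town:
        doc["town"] = town["name_town"]
        doc["country"] = town["country"]
    return doc


def flatten(person, ids_by_name):
    """Return one document for the person and one for each nested friend."""
    docs = [to_solr_doc(person, ids_by_name)]
    for friend in person.get("friends", []):
        docs.append(to_solr_doc(friend, ids_by_name))
    return docs


george = {
    "name": "George",
    "description": "some text",
    "age": 35,
    "friends": [
        {"name": "Peter", "age": 30,
         "town": {"name_town": "Paris", "country": "France"}},
        {"name": "Thomas", "age": 25,
         "town": {"name_town": "Berlin", "country": "Germany"}},
    ],
}

# Hypothetical id assignment; a real indexer would look these up.
ids = {"George": 1, "Peter": 2, "Thomas": 3}

docs = flatten(george, ids)
for doc in docs:
    print(doc)
```

A real indexer would fill each friend's own friends list from that person's source document instead of leaving it empty.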
Now to your use cases,
Give me countries, where George's friends live
This can be done by faceting. You could query for the ID of George and facet on countryRaw.
Such a query would look like q=friends:1&rows=0&facet=true&facet.field=countryRaw&facet.mincount=1
Return all documents(persons), whose friend is 30 years old.
This one is harder. First off you will need Solr's join feature. You need to configure this in your solrconfig.xml.
<config>
<!-- loads of other stuff -->
<queryParser name="join" class="org.apache.solr.search.JoinQParserPlugin" />
<!-- loads of other stuff -->
</config>
The corresponding join query would look like this: q={!join from=id to=friends}age:[30 TO *]
It works as follows:
with age:[30 TO *] you search for all persons who are 30 or older
then you take their id and join it on the friends attribute of all others
this returns all persons that have the ids matched by the initial query within their friends attribute
I did not write this from memory alone; you can have a look at my solrsample project on GitHub, where I added a test case that deals with this question:
https://github.com/chriseverty/solrsample/blob/master/src/main/java/de/cheffe/solrsample/FriendJoinTest.java
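When these queries are sent over HTTP, the special characters in them ({, !, [, *, spaces) need URL encoding. A small Python sketch of building both query strings with the standard library follows; the two function names are hypothetical, and the field names are those of the schema above.

```python
from urllib.parse import parse_qs, urlencode


def facet_countries_of_friends(person_id):
    """Query string for the facet on countryRaw, given a person id."""
    return urlencode({
        "q": f"friends:{person_id}",
        "rows": 0,
        "facet": "true",
        "facet.field": "countryRaw",
        "facet.mincount": 1,
    })


def persons_with_friend_aged_at_least(min_age):
    """Query string for the join from id to friends with an age range."""
    query = f"{{!join from=id to=friends}}age:[{min_age} TO *]"
    return urlencode({"q": query})


print(facet_countries_of_friends(1))
# Decoding the encoded join query gives back the readable form.
print(parse_qs(persons_with_friend_aged_at_least(30))["q"][0])
```

The resulting strings would be appended to the select URL of the core; the host, port, and core name are deployment-specific and not shown here.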

Solr autocomplete based on edismax type error

I receive the following error when trying to implement autocomplete based on the edismax type.
SEVERE: java.lang.IllegalStateException: field "locality_ng" was indexed without position data; cannot run PhraseQuery (term=львів)
at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:241)
at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:145)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:317)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:324)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1275)
Schema types:
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- autocomplete_edge : Will match from the left of the field, e.g. if the document field
is "A brown fox" and the query is "A bro", it will match, but not "brown"
-->
<fieldType name="autocomplete_edge" class="solr.TextField">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_])" replacement=" " replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="50" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_])" replacement=" " replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^(.{50})(.*)?" replacement="$1" replace="all"/>
</analyzer>
</fieldType>
<!-- autocomplete_ngram : Matches any word in the input field, with implicit right truncation.
This means that the field "A brown fox" will be matched by query "bro".
We use this to get partial matches, but these should be boosted lower than exact and left-anchored
-->
<fieldType name="autocomplete_ngram" class="solr.TextField">
<analyzer type="index">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
</analyzer>
</fieldType>
Schema fields:
<!-- AutoComplete fields
Construct documents containing these fields for all suggestions you like to provide
Then use a dismax query to search on some fields, display some fields and boost others
-->
<field name="locality_id" type="int" indexed="true" stored="true" required="true"/>
<!-- The main text to return as the suggestion. This is not searched -->
<field name="locality_suggest" type="text_suggest" indexed="true" stored="true" omitNorms="true" multiValued="true" />
<!-- A variant of textsuggest which only matches from the very left edge -->
<copyField source="locality_suggest" dest="locality_nge"/>
<field name="locality_nge" type="autocomplete_edge" indexed="true" stored="false" multiValued="true" />
<!-- A variant of textsuggest which matches from the left edge of all terms (implicit truncation) -->
<copyField source="locality_suggest" dest="locality_ng"/>
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true" />
Solr config; I use the following request handler with the edismax type:
<requestHandler class="solr.SearchHandler" name="autocomplete" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="qf">locality_suggest locality_ng^5.0 locality_nge^10.0</str>
<str name="debugQuery">false</str>
</lst>
</requestHandler>
This error occurs only if the query contains specific symbols like + - $ # after a word:
львів+в
київ+а
Any suggestions would be great.

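The locality_ng field is declared with omitTermFreqAndPositions="true", so it is indexed without position data, which a PhraseQuery requires. Change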
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true" />
into
<field name="locality_ng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" multiValued="true" />
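Why a phrase query needs positions can be illustrated with a toy inverted index in Python. This is a simplified sketch, not how Lucene is implemented; the functions build_index and phrase_match are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import defaultdict


def build_index(docs, with_positions=True):
    """Map each term to {doc_id: [positions]}, or {doc_id: None} without positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            if with_positions:
                index[term].setdefault(doc_id, []).append(pos)
            else:
                index[term][doc_id] = None
    return index


def phrase_match(index, phrase):
    """Return ids of documents that contain the phrase terms consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, {}) for term in terms]
    hits = set()
    for doc_id in set.intersection(*(set(p) for p in postings)):
        if any(p[doc_id] is None for p in postings):
            # Knowing only that the terms occur is not enough to check adjacency.
            raise ValueError("field was indexed without position data")
        for start in postings[0][doc_id]:
            if all(start + i in postings[i][doc_id] for i in range(1, len(terms))):
                hits.add(doc_id)
                break
    return hits


docs = {1: "a brown fox", 2: "fox brown a"}
print(phrase_match(build_index(docs), "brown fox"))  # {1}
```

Calling phrase_match on an index built with with_positions=False raises the ValueError, which mirrors the situation in the error message above.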
