Titan: Elasticsearch ignores stop words

I am indexing a country code as a vertex property:
v.setProperty("country","IN");
but when I search like this:
g.indexQuery("search","v.country:IN").vertices();
it returns zero results. I think that's because Elasticsearch handles IN as a stop word. How can I avoid this?

You can get around this by indexing country as a String:
import com.thinkaurelius.titan.core.Parameter
g.makeKey("country").dataType(String.class).indexed("search", Vertex.class,
Parameter.of(Mapping.MAPPING_PREFIX, Mapping.STRING)).make()
g.commit()
See Full Text and String Search for further details.
Cheers,
Daniel

Related

Possible to get words matched fuzzily by MongoDB full text search?

I'm writing a UI that presents the results of a MongoDB full text search query, visually highlighting the matched search terms in each result; this works well enough for full word or phrase matches, but not for partial/fuzzy matches.
For example, if I search for "delete" I will get a search result that contains "deletion", which does not contain the full word "delete" and therefore won't be highlighted if I merely highlight full search-term matches. I do want the partial matches, though.
Is there any way to project the set of matched words/substrings when I execute the query?
I've so far been unable to find anything in the docs that hints at this being possible, but I thought it worth asking around. Any help would be greatly appreciated.
You can use MongoDB Atlas Search, which lets you search your text using the different analyzers MongoDB provides. You can then run a search like the one below; without the fuzzy object it performs a full-text match.
db.collection.aggregate([
  {
    $search: {
      index: 'analyzer_name_created_from_atlas_search',
      text: {
        query: 'Text to do a full match or fuzzy match with',
        path: 'sentence',
        fuzzy: {
          maxEdits: 2 // a maximum of 2 is allowed
        }
      }
    }
  }
])
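To get the matched substrings for highlighting, which is what the question asks for, Atlas Search can also return highlights. Here is a minimal pymongo sketch; the connection string, database and collection names are hypothetical, and the index name and sentence field are carried over from the answer above:

from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")  # hypothetical connection string
coll = client["mydb"]["mycoll"]            # assumed database/collection names

pipeline = [
    {"$search": {
        "index": "analyzer_name_created_from_atlas_search",
        "text": {
            "query": "delete",
            "path": "sentence",
            "fuzzy": {"maxEdits": 2},
        },
        # ask Atlas Search to compute the matched spans
        "highlight": {"path": "sentence"},
    }},
    {"$project": {
        "sentence": 1,
        # the matched substrings and their positions, one list per result
        "highlights": {"$meta": "searchHighlights"},
    }},
]

for doc in coll.aggregate(pipeline):
    print(doc["sentence"], doc["highlights"])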

How to do a partial search in MongoDB?

How do I find a partial match?
Right now I am trying:
db.content.find({$text: {$search: "Customer london"}})
It finds all records matching customer and all records matching london.
But if I search for part of a word, for example lond or custom:
db.content.find({$text: {$search: "lond"}})
it returns an empty result. How can I modify the query to get the same result as when I search for london?
You can use regex to get around it (https://docs.mongodb.com/manual/reference/operator/query/regex/); a minimal pymongo sketch follows the list below. It will work for the following cases:
if you have the word cooking, the following queries will give you a result:
cooking (exact match)
coo (part of the word)
cooked (a word containing the same English root as the document word, where cook is the root from which cooking and cooked are derived)
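As promised, a sketch of the regex approach using pymongo; the database name and the searched field (city) are assumptions, since the question does not show the document schema:

from pymongo import MongoClient

coll = MongoClient()["test"]["content"]  # assumed database name; collection from the question

# Case-insensitive substring match: a query for 'lond' matches documents
# containing 'london'. Note that an unanchored regex cannot use an index
# efficiently; a prefix-anchored pattern like '^lond' can.
for doc in coll.find({"city": {"$regex": "lond", "$options": "i"}}):
    print(doc)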
If you would like to go one step further and get a result document containing cooking when you type vooking (misspelled with V instead of C), go for Elasticsearch.
Elasticsearch is easy to set up and has an extremely powerful edge-ngram analyzer, which breaks each word into smaller weighted tokens. Hence when you misspell, you will still get a document, based on the score Elasticsearch gives it.
You can read about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
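To make the edge-ngram idea concrete, here is a sketch of an index definition using the official Python Elasticsearch client; the index name, field name, and gram sizes are all assumptions:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a cluster reachable on localhost:9200

# 'london' is indexed as the edge n-grams 'lo', 'lon', 'lond', 'londo'
# and 'london', so a partial query like 'lond' still finds the document.
es.indices.create(
    index="content",
    body={
        "settings": {
            "analysis": {
                "tokenizer": {
                    "edge_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 10,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "edge_analyzer": {
                        "tokenizer": "edge_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "city": {
                    "type": "text",
                    "analyzer": "edge_analyzer",
                    # analyze queries with the standard analyzer so the query
                    # string itself is not split into n-grams
                    "search_analyzer": "standard",
                }
            }
        },
    },
)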
A MongoDB text index will always return an empty array for partial words, e.g. when you search for lond hoping to match london, because it indexes full words and matches them exactly as they are. It does not index prefixes like:
LO LON LOND LONDO LONDON
Here ELASTIC-SEARCH may help you. It is quite good for full-text search when implemented alongside MongoDB.
Reference: ElasticSearch
Thanks
The find-all result goes into an array, and I then use String.prototype.startsWith in a for loop to pick out my records:
clientDB.collection('details').find({}).toArray().then((docs) => {
  for (let i = 0; i < docs.length; i++) {
    // startsWith('U', 0) checks for the prefix 'U' from position 0
    if (docs[i].name.startsWith('U', 0)) {
      console.log(docs[i].name);
    } else {
      console.log('Record not found!!!');
    }
  }
});
This may not be efficient, since it fetches every document and filters on the client, but it works for now.

Whoosh doesn't find short words like "C#"

I am using Whoosh to index over 200,000 books, but I have encountered some problems with it.
The Whoosh query parser returns NullQuery for words like "C#" or "C++" that contain meta-characters, and also for some other short words. These words are used in the title and body of some documents, so I am not using the keyword field type for them. I guess the problem is in the analysis or query-parsing phase of searching or indexing, but I can't touch my data blindly. Can anyone help me correct this issue? Thanks.
I fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements. Here is the pattern:
'\w+[#+.\w]*'
This makes tokenizing of fields succeed, and searching goes well too.
But when I use queries like "some query++*" or "some##*", the parsed query becomes a single Every query, just the '*'. I also found that this is not related to my analyzer: it is Whoosh's default behavior. So here is my new question: is this behavior correct, or is it a bug?
Note: removing the WildcardPlugin from the query parser solves this problem, but I also need the WildcardPlugin.
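(For reference, disabling the plugin looks roughly like this; a sketch assuming a schema with a title field:)

from whoosh import qparser

parser = qparser.QueryParser("title", schema)
# after this, '*' and '?' are treated as literal characters, not wildcards
parser.remove_plugin_class(qparser.WildcardPlugin)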
Now I am using the following code:
from whoosh import analysis
from whoosh.util import rcompile

# for matching words like '.NET', 'C++' and 'C#'
word_pattern = rcompile(r'(\.|[\w]+)(\.?\w+|#|\+\+)*')

# I don't need words shorter than two characters, so I keep the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
This solves my first problem, yes. But the main problem is in searching: I don't want to let users search using the Every query, i.e. *. But when I parse queries like C++* I end up with an Every(*) query. I know there is a problem somewhere, but I can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
    name=whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
    # ...
)
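To see the effect end to end, a minimal sketch; the index directory, field name, and sample document are assumptions:

import os

import whoosh.analysis
import whoosh.fields
import whoosh.index
from whoosh.qparser import QueryParser

schema = whoosh.fields.Schema(
    name=whoosh.fields.TEXT(
        stored=True,
        # minsize=1 keeps the single-character token 'c' produced from 'C#'
        analyzer=whoosh.analysis.StandardAnalyzer(minsize=1),
    ),
)

os.makedirs("indexdir", exist_ok=True)
ix = whoosh.index.create_in("indexdir", schema)
with ix.writer() as writer:
    writer.add_document(name=u"Programming in C#")

with ix.searcher() as searcher:
    # the query text runs through the same analyzer, so 'C#' becomes 'c'
    query = QueryParser("name", ix.schema).parse(u"C#")
    print([hit["name"] for hit in searcher.search(query)])  # ['Programming in C#']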

Facing an issue when searching with Zend_Lucene

I am using Zend_Lucene for search functionality. I have the following code:
$doc->addField(Zend_Search_Lucene_Field::Text('categoryName', $result->name));
Here, name in $result->name is a varchar column in the database, containing values like dinesh, kumar123 and 3333. For testing purposes I have stored a number in the name field. When I search for dinesh, the search returns an exact result, but when I search for a number, i.e. 3333, the search has no result. What did I do wrong in the call to Zend_Search_Lucene_Field::Text?
Is there any way to search for numbers, chars, or alphanumerics (kumar123)?
Thanks in advance.
Finally I found it. Declare
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
and use Zend_Search_Lucene_Field::Keyword instead of Zend_Search_Lucene_Field::Text.

Make Lucene index a value and store another

I want Lucene.NET to store a value while indexing a modified, stripped-down version of that value. For example, consider the value:
this_example-has some/weird (chars) 100%
I want it stored exactly like that (so I can retrieve exactly that for display in the results list), but I want Lucene to index it as:
this example has some weird chars 100
(a "sanitized" version of the original value, as you can see) for simplified searching.
I figure this would be the job of an analyzer, but I don't want to mess with rolling my own. Ideally, the solution should remove everything that is not a letter, a number or quotes, replacing the removed chars with whitespace before indexing.
Any suggestions on how to implement that?
This is because I am indexing products for an e-commerce search, and some have really creepy names. I think this would improve search accuracy.
Thanks in advance.
If you don't want a custom analyzer, try storing the value as a separate non-indexed field and using a simple regex to generate the sanitized version:
var input = "this_example-has some/weird (chars) 100%";
var output = Regex.Replace(input, @"[\W_]+", " ");
You mention that you need another analyzer for some searching functionality. Don't forget the PerFieldAnalyzerWrapper, which allows you to use different analyzers within the same document:
public static void Main() {
    var wrapper = new PerFieldAnalyzerWrapper(defaultAnalyzer: new StandardAnalyzer(Version.LUCENE_29));
    wrapper.AddAnalyzer(fieldName: "id", analyzer: new KeywordAnalyzer());

    IndexWriter writer = null; // TODO: Retrieve these.
    Document document = null;

    writer.AddDocument(document, analyzer: wrapper);
}
You are correct that this is the job of the analyzer. I'd start by using a tool like Luke to see what the StandardAnalyzer does with your term before deciding what to use; it tends to do a good job of stripping noise characters and words.