Whitespace tokenizer not working when using simple query string

I first implemented search using SimpleQueryString, as shown below.
Entity Definition
@Entity
@Indexed
@AnalyzerDef(name = "whitespace", tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class)
    })
public class AdAccount implements SearchableEntity, Serializable {
    @Id
    @DocumentId
    @Column(name = "ID")
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    @Field(store = Store.YES, analyzer = @Analyzer(definition = "whitespace"))
    @Column(name = "NAME")
    private String name;
    //other properties and getters/setters
}
I use the whitespace tokenizer factory here because the default standard analyzer ignores special characters, which is not ideal for my use case. The documentation I referred to is https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-WhiteSpaceTokenizer. It describes this tokenizer as a "Simple tokenizer that splits the text stream on whitespace and returns sequences of non-whitespace characters as tokens."
SimpleQueryString Method
protected Query inputFilterBuilder() {
    SimpleQueryStringMatchingContext simpleQueryStringMatchingContext = queryBuilder.simpleQueryString().onField("name");
    return simpleQueryStringMatchingContext
        .withAndAsDefaultOperator()
        .matching(searchRequest.getQuery() + "*").createQuery();
}
searchRequest.getQuery() returns the search query string; I then append the prefix operator (*) to the end so that prefix queries are supported.
However, this does not work as expected in the following example.
Say I have an entity whose name is "AT&T Account". When searching with "AT&", this entity is not returned.
I then switched to using a whitespace analyzer directly, as shown below. This time, searching with "AT&" works as expected, but the search is now case-sensitive, i.e., searching with "at&" returns nothing.
@Field
@Analyzer(impl = WhitespaceAnalyzer.class)
@Column(name = "NAME")
private String name;
My questions are:
Why doesn't it work when I use the white space factory in my first attempt? I assume using the factory versus using the actual analyzer implementation is different?
How to make my search case-insensitive if I use the @Analyzer annotation as in my second attempt?

Why doesn't it work when I use the white space factory in my first attempt? I assume using the factory versus using the actual analyzer implementation is different?
Wildcard and prefix queries (the one you're using when you add a * suffix in your query string) do not apply analysis, ever. Which means your lowercase filter is not applied to your search query, but it has been applied to your indexed text, which means it will never match: AT&* does not match the indexed at&t.
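The mismatch described above can be reproduced without Lucene. The following is a minimal plain-Java sketch, not Hibernate Search or Lucene code: the class and method names (PrefixMismatchDemo, analyze, prefixMatches) are made up for illustration, and analyze only imitates a whitespace tokenizer followed by a lowercase filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the index-time analyzer and an unanalyzed prefix query.
class PrefixMismatchDemo {

    // Imitates index-time analysis: split on whitespace, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Imitates a prefix query: the prefix is compared to the indexed terms exactly as typed.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("AT&T Account");
        System.out.println(indexed);                        // [at&t, account]
        System.out.println(prefixMatches(indexed, "AT&"));  // false: uppercase prefix vs lowercased term
        System.out.println(prefixMatches(indexed, "at&"));  // true
    }
}
```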
Using the @Analyzer annotation only worked because you removed the lowercasing at index time. With this analyzer, you ended up with AT&T (uppercase) in the index, and AT&* does match the indexed AT&T. It's just by chance, though: if you index At&t, you will end up with At&t in the index and you'll have the same problem.
How to make my search case-insensitive if I use the @Analyzer annotation as in my second attempt?
As I mentioned above, the @Analyzer annotation is not the solution; it actually made your search worse.
There is no built-in solution to make wildcard and prefix queries apply analysis, mainly because analysis could remove pattern characters such as ? or *, and that would not end well.
You could restore your initial analyzer and lowercase the query yourself, but that will only get you so far: ASCII folding and other analysis features still won't be applied to the query.
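To illustrate that limitation, here is a rough plain-Java sketch. It is not Lucene code: indexTerm only approximates lowercasing plus ASCII folding for accented Latin letters by stripping combining marks, and the "Café" value is a made-up example that does not come from the question.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration: lowercasing the query by hand fixes case, but not accents.
class QueryLowercasingDemo {

    // Rough approximation of lowercase + ASCII folding at index time (assumption, not the real filter).
    static String indexTerm(String token) {
        String decomposed = Normalizer.normalize(token.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // The manual workaround: lowercase the user input, then use it as an unanalyzed prefix.
    static boolean prefixMatches(String indexedText, String userInput) {
        return indexTerm(indexedText).startsWith(userInput.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(prefixMatches("AT&T", "AT&"));            // true: case is handled
        System.out.println(prefixMatches("Caf\u00e9", "Caf\u00e9")); // false: index holds "cafe", query is "café"
    }
}
```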
The solution I generally recommend is to use an edge-ngrams filter. The idea is to index every prefix of every word, so "AT&T Account" would get indexed as the terms a, at, at&, at&t, a, ac, acc, acco, accou, accoun, account and a search for "at&" would return the correct results even without a wildcard.
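The term list above can be reproduced with a few lines of plain Java. This is only a sketch of what an edge-ngram filter produces, not the Lucene implementation: the class name, the edgeNGrams and matches helpers, and the minGram/maxGram values are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of edge n-gram indexing with a simpler query-time analysis.
class EdgeNGramDemo {

    // Index time: whitespace split, lowercase, then emit every prefix between minGram and maxGram characters.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, lower.length());
            for (int len = minGram; len <= longest; len++) {
                terms.add(lower.substring(0, len));
            }
        }
        return terms;
    }

    // Query time: lowercase only (no n-grams), then look for an exact term match. No wildcard is needed.
    static boolean matches(List<String> indexedTerms, String query) {
        return indexedTerms.contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNGrams("AT&T Account", 1, 10);
        System.out.println(indexed);
        // [a, at, at&, at&t, a, ac, acc, acco, accou, accoun, account]
        System.out.println(matches(indexed, "AT&")); // true
        System.out.println(matches(indexed, "t&t")); // false: only prefixes of each word are indexed
    }
}
```

Because the query goes through regular analysis here rather than a wildcard, lowercasing (and any folding configured in the query-time analyzer) applies to the search input as well.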
See this answer for a more extensive explanation.
If you use the Elasticsearch integration, you will have to rely on a hack to make the "query-only" analyzer work properly. See here.

Related

How to use ICU collation rules to sort the data in a JavaFX TableView?

I wanted to use ICU collation rules to sort the (String) data in a TableColumn in a TableView using JavaFX and could not find an example online. Here is what worked for me. (I'm assuming the reader already knows how to get data into the TableView since that is not what is in focus.)

First, we import an ICU RuleBasedCollator:
import com.ibm.icu.text.RuleBasedCollator;
Second, suppose we have a Person class with first name and last name String fields. The TableView has two TableColumns, one for the first name and one for the last name:
TableView<Person> personTable;
TableColumn<Person, String> firstNameColumn;
TableColumn<Person, String> lastNameColumn;
Third, in the view controller's initialize() method, add something like the following:
String newRules = "& S < C & Mu < Mue";
RuleBasedCollator collatorViaRules = new RuleBasedCollator(newRules);
Comparator<String> comparatorViaRules = Comparator.comparing(String::toString, collatorViaRules);
firstNameColumn.setComparator((String s1, String s2) -> {
return comparatorViaRules.compare(s1, s2);
});
lastNameColumn.setComparator((String s1, String s2) -> {
return comparatorViaRules.compare(s1, s2);
});
The two ICU rules in newRules will put any C after S and put Mu... before Mue. (These are not intended to make great sense here; they are to see if the ICU rules are applied. A real case could have much more complicated rules.)
We create an ICU RuleBasedCollator using the ICU rules and then create a Comparator using those rules.
Finally, we set the comparator of the column fields to use this comparator.

Javers: Ignore specific fields in Value Object while comparing two JSONs

As part of testing, I am trying to compare two JSON documents, the expected response and the actual API response, using Javers. I want the comparison to exclude the ID parameters that are dynamically generated in the response.
My VO looks like this:
public class expectedResponse{
    @DiffIgnore
    private String id;
    private String name;
}
Both the expected response (which is read from an Excel file) and the actual response from the API are deserialized into this format, and then the two responses are compared.
JsonNode expectedOutput = mapper.readTree(expected.toString());
JsonNode apiResponse = mapper.readTree(actual.toString());
diff=javers.compare(expectedOutput, apiResponse);
But this comparison doesn't exclude/ignore the ID field. Any idea how I can get it to work? I want only the ID field excluded from the comparison results; a diff in name should still be listed.
Question 2: I am also trying to list the changes from the diff:
if (diff.hasChanges())
{
    List<ValueChange> changes=diff.getChangesByType(ValueChange.class);
    for (ValueChange change : changes)
    {
        logger.info(change.getPropertyName()+ "||" +change.getLeft().toString() + "||" +change.getRight().toString());
    }
}
change.getPropertyName() doesn't print the property's name; it simply prints "_value".
Can you please help identify what is going wrong with the code and how I can fix it? I am not finding much useful documentation about Javers on Google. Any help is appreciated.

You should compare your domain objects rather than JsonNode objects. The @DiffIgnore annotation is present only in your domain class, and there is no connection between JsonNode and ExpectedResponse; that's why Javers doesn't know to ignore this field.
To summarise, your code should look like this:
ExpectedResponse expectedOutput = ...
ExpectedResponse apiResponse = ...
diff=javers.compare(expectedOutput, apiResponse);

How to check SQL, Language syntax in String at compile time (Scala)

I am writing a translator which converts a DSL to multiple programming languages (somewhat like Apache Thrift).
For example,
// an example DSL
LOG_TYPE: COMMERCE
COMMON_FIELD : session_id
KEY: buy
FIELD: item_id, transaction_id
KEY: add_to_cart
FIELD: item_id
// will be converted to Java
class Commerce {
    private String session_id;
    private String key;
    private String item_id;
    private String transaction_id;
    // auto-created setter, getter, helper methods
    ...
}
It should also be translated into Objective-C and JavaScript.
To implement it, I have to replace strings:
// 1. create or load code fragments
String variableDeclarationInJava = "private String {$field};";
String variableDeclarationInJavascript = "...";
String variableDeclarationInObjC = "...";
// 2. replace it
variableDeclarationInJava.replace(pattern, fieldName)
...
Replacing code fragments in a String is not type safe, and it is frustrating because it gives no information even when there are errors.
So, my question is: is it possible to parse a String at compile time, like the Scala sqltyped library does?
If it is possible, I would like to know how I can achieve it.
Thanks.

As far as I understand, it is possible. Please take a look at string interpolation. You can implement a custom interpolator (as was done for quasiquotes or in Slick).
A nice example of what you may want to do is here

Can a composite format in String.Format() be used to return a substring?

I apologize for what seems to be an exceedingly simple question, but after 4 hours of searching and beating my head against the wall, I'm doubting my sanity.
I need a string format expression that trims a supplied string argument much like Substring(0,1). Although I've never seen this in code, it just seems like it should be possible.
Here's a very basic example of what I'm trying to do:
string ClassCode = "B608H2014"; // sample value from the user's class schedule
string fRoom = "{0:0,2}";
string fGrade = "{0:2,2}";
string fYear = "{0:5,4}";
string Classroom = String.Format(fRoom, ClassCode); // intended result - "B6"
string Gradelevel = String.Format(fGrade, ClassCode); // intended result - "08"
string Schoolyear = String.Format(fYear, ClassCode); // intended result - "2014"
This is a very basic example, but I'm trying to use the String.Format() so I can store the format pattern(s) in the database for each respective DTO property since the class code layouts are not consistent. I can just pass in the formats from the database along with the required properties and it extracts what I need.
Does this make sense?

GWT messages interface with lookup support

I'm working on a new application and I need to make a messages interface with lookup, using the key to find the value (like ConstantsWithLookup, but able to accept parameters). I have been investigating the Dictionary class, but it lacks message customization through parameters.
With ConstantsWithLookup I can do the following:
myConstantsWithLookupInterface.getString("key");
and get something like:
Field must be filled with numbers
But I need to do this:
myMessagesWithLookupInterface.getString("key", "param1", "param2",...);
and get something like:
Field _param1_ must be filled with numbers greater than _param2_
I have no clue how to do this.

Use GWT regular expressions (com.google.gwt.regexp.shared.RegExp):
//fields in your class
RegExp pattern1 = RegExp.compile("_param1_");
RegExp pattern2 = RegExp.compile("_param2_");
public String getString(String key, String replace1, String replace2){
    // your original getString() method
    String content = getString(key);
    content = pattern1.replace(content,replace1);
    content = pattern2.replace(content,replace2);
    return content;
}
If your data contains Field _param1_ must be filled with numbers greater than _param2_, then this will replace _param1_ with the content of the string replace1 and _param2_ with the content of replace2.
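The same idea can be generalized to any number of parameters. The following is a minimal plain-Java sketch, not GWT API: the Map stands in for the ConstantsWithLookup interface, the key "numberGreaterThan" is hypothetical, and literal String.replace is used instead of a regular expression since the placeholders contain no pattern characters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "messages with lookup" helper using _paramN_ placeholders.
class MessagesWithLookupSketch {

    // Stand-in for the key/value lookup that ConstantsWithLookup would provide.
    private final Map<String, String> templates = new HashMap<>();

    MessagesWithLookupSketch() {
        templates.put("numberGreaterThan",
                "Field _param1_ must be filled with numbers greater than _param2_");
    }

    // Looks up the template by key, then substitutes _param1_, _param2_, ... in order.
    String getString(String key, String... params) {
        String content = templates.get(key);
        if (content == null) {
            throw new IllegalArgumentException("Unknown message key: " + key);
        }
        for (int i = 0; i < params.length; i++) {
            content = content.replace("_param" + (i + 1) + "_", params[i]);
        }
        return content;
    }

    public static void main(String[] args) {
        MessagesWithLookupSketch messages = new MessagesWithLookupSketch();
        System.out.println(messages.getString("numberGreaterThan", "Age", "17"));
        // Field Age must be filled with numbers greater than 17
    }
}
```

One caveat of this sequential substitution: if an earlier parameter value itself contains a later placeholder such as _param2_, that text will be replaced as well.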
