Using an Analyzer within a custom FieldBridge - hibernate-search

I have a List getter method that I want to index (tokenized) into a number of fields.
I have a FieldBridge implementation that iterates over the list and indexes each string into a field with the index appended to the field name to give a different name for each.
I have two different Analyzer implementations (CaseSensitiveNGramAnalyzer and CaseInsensitiveNGramAnalyzer) that I want to use with this FieldBridge (to make a case-sensitive and a case-insensitive index of the field).
This is the FieldBridge I want to apply the Analyzers to:
public class StringListBridge implements FieldBridge
{
    @Override
    public void set(String name, Object value, Document luceneDocument, LuceneOptions luceneOptions)
    {
        List<String> strings = (List<String>) value;
        for (int i = 0; i < strings.size(); i++)
        {
            addStringField(name + i, strings.get(i), luceneDocument, luceneOptions);
        }
    }

    private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
    {
        Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        luceneDocument.add(field);
    }
}
Is it possible to apply an Analyzer to a field that uses a FieldBridge?
If so, can this be done with annotations, or does it have to be done programmatically?
If the latter, can I inject the Analyzer as a parameter?
I am thinking along the lines of the following, but am not at all familiar with field token streams etc.:
private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
{
    Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
    field.setBoost(luceneOptions.getBoost());
    try
    {
        field.setTokenStream(new CaseSensitiveNGramAnalyzer().reusableTokenStream(fieldName, new StringReader(fieldValue)));
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    luceneDocument.add(field);
}
Is this a sane approach?
EDIT: I have tried specifying the Analyzer and FieldBridge within a @Field annotation (without including the above analyzer code) as follows, but it appears to be using the default analyzer rather than the analyzers specified with analyzer =.
@Fields({
    @Field(name = "content-nocase",
           index = Index.TOKENIZED,
           analyzer = @Analyzer(impl = CaseInsensitiveNgramAnalyzer.class),
           bridge = @FieldBridge(impl = StringListBridge.class)),
    @Field(name = "content-case",
           index = Index.TOKENIZED,
           analyzer = @Analyzer(impl = CaseSensitiveNgramAnalyzer.class),
           bridge = @FieldBridge(impl = StringListBridge.class)),
})
public List<String> getContents()

At the moment the solution is either a custom scoped analyzer or using @AnalyzerDiscriminator together with @AnalyzerDef. This is also discussed on the Hibernate Search forum - https://forum.hibernate.org/viewtopic.php?f=9&t=1016667
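For reference, a minimal sketch of the @AnalyzerDef/@AnalyzerDiscriminator route could look like the following. The definition names, the discriminator class, and the tokenizer/filter factories are assumptions here, not taken from the question; substitute whatever builds your case-sensitive and case-insensitive n-gram chains.

// Hedged sketch: definition names, factories and the discriminator class are illustrative.
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "ngram-case",
        tokenizer = @TokenizerDef(factory = NGramTokenizerFactory.class)),
    @AnalyzerDef(name = "ngram-nocase",
        tokenizer = @TokenizerDef(factory = NGramTokenizerFactory.class),
        filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class))
})
@AnalyzerDiscriminator(impl = CaseDiscriminator.class)
public class MyEntity {
    // the @Fields/@FieldBridge declarations on getContents() stay as in the question,
    // minus the analyzer = ... attributes
}

public class CaseDiscriminator implements Discriminator {
    @Override
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        // pick the analyzer definition per field name
        return field != null && field.startsWith("content-case") ? "ngram-case" : "ngram-nocase";
    }
}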

I managed to get this working. Hibernate Search appears not to use the specified Analyzer when both analyzer = and bridge = are specified, at least if the specified bridge creates multiple fields.
Manually passing the TokenStream from the desired analyzer to the generated Fields in the bridge got me the expected result:
private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
{
    Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
    field.setBoost(luceneOptions.getBoost());

    // manually apply the token stream from the analyzer, as Hibernate Search does not
    // apply the specified analyzer properly
    try
    {
        field.setTokenStream(analyzer.reusableTokenStream(fieldName, new StringReader(fieldValue)));
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    luceneDocument.add(field);
}
ParameterizedBridge is implemented to specify which analyzer to use (the analyzer is instantiated and stored in a field before this method is called).
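A rough sketch of that ParameterizedBridge wiring could look like this. The parameter name "caseSensitive" is made up for illustration, and older Hibernate Search versions declare setParameterValues with a raw Map.

public class StringListBridge implements FieldBridge, ParameterizedBridge {

    private Analyzer analyzer;

    @Override
    public void setParameterValues(Map<String, String> parameters) {
        // "caseSensitive" is an illustrative parameter name, passed in via @Parameter below
        boolean caseSensitive = Boolean.parseBoolean(parameters.get("caseSensitive"));
        analyzer = caseSensitive ? new CaseSensitiveNGramAnalyzer()
                                 : new CaseInsensitiveNGramAnalyzer();
    }

    // set(...) and addStringField(...) as shown above, using this.analyzer
}

and on the annotation side:

bridge = @FieldBridge(impl = StringListBridge.class,
                      params = @Parameter(name = "caseSensitive", value = "true"))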

How to trim Strings before inserting to MongoDB using MongoTemplate?

Basically, I am fetching data from another source and creating my DB collections. But some of the data has trailing spaces, which causes issues in the frontend when used later.
Is there a generic way to trim all String fields of all collections before inserting and updating to MongoDB, using Spring MongoTemplate configuration/code?
I don't want to write logic specific to each type of collection and each field in it. Also, is it good practice to put such logic at the DB repository level?
Try using AbstractMongoEventListener. For example, create your own implementation of AbstractMongoEventListener:
@Component
class SaveMongoEventListener extends AbstractMongoEventListener<Object> {

    @Override
    public void onBeforeConvert(BeforeConvertEvent<Object> event) {
        Object source = event.getSource();
        // getDeclaredFields() is used so that private String fields are included
        for (Field field : source.getClass().getDeclaredFields()) {
            if (field.getType().isAssignableFrom(String.class)) {
                try {
                    field.setAccessible(true);
                    String value = (String) field.get(source);
                    // Field.set(target, value): the target object comes first
                    field.set(source, value != null ? value.trim() : value);
                } catch (IllegalAccessException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
This implementation will trim all String fields in your object before it is converted to a MongoDB document. The listener works for all your collections. Do not forget to register the listener in the Spring context.
To trim strings after loading from MongoDB, do the same in the onAfterConvert event.
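A rough sketch of that onAfterConvert counterpart, reusing the same reflection-based trimming on the freshly converted object (the listener name is illustrative):

@Component
class LoadMongoEventListener extends AbstractMongoEventListener<Object> {

    @Override
    public void onAfterConvert(AfterConvertEvent<Object> event) {
        Object source = event.getSource();
        for (Field field : source.getClass().getDeclaredFields()) {
            if (field.getType().isAssignableFrom(String.class)) {
                try {
                    field.setAccessible(true);
                    String value = (String) field.get(source);
                    field.set(source, value != null ? value.trim() : value);
                } catch (IllegalAccessException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}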

How to use BeanWrapperFieldSetMapper to map a subset of fields?

I have a Spring Batch application where BeanWrapperFieldSetMapper is used to map fields using a prototype object. However, the CSV file that is being read (via a FlatFileItemReader) contains one (indicator) field that determines the mapping of another field. If the indicator field has a value of Y, then the value of the other field should be mapped to property foo; otherwise it should be mapped to property bar.
I know that I can use a custom FieldSetMapper to do this, but then I have to code the mapping of all the other fields (of which there are quite a few). Alternatively, I could do this after reading via an ItemProcessor, but then my domain (prototype) object must have a property representing the indicator field (which I prefer not to do, since it is not really part of the business domain).
Is it possible to perhaps use a custom FieldSetMapper to only map these custom fields and delegate the other mappings to BeanWrapperFieldSetMapper? Or is there some other better way to solve for this?
Here is my current attempt to use a custom FieldSetMapper and delegate to BeanWrapperFieldSetMapper:
public class DelegatedFieldSetMapper extends BeanWrapperFieldSetMapper<MyProtoClass> {

    @Override
    public MyProtoClass mapFieldSet(FieldSet fieldSet) throws BindException {
        String indicator = fieldSet.readString("indicator");
        Properties fieldProperties = fieldSet.getProperties();
        if (indicator.equalsIgnoreCase("y")) {
            fieldProperties.put("test.foo", fieldSet.readString("value"));
        } else {
            fieldProperties.put("test.bar", fieldSet.readString("value"));
        }
        fieldProperties.remove("indicator");
        Set<Object> keys = fieldProperties.keySet();
        List<String> names = new ArrayList<String>();
        List<String> values = new ArrayList<String>();
        for (Object key : keys) {
            names.add((String) key);
            values.add(fieldProperties.getProperty((String) key));
        }
        DefaultFieldSet domainObjectFieldSet = new DefaultFieldSet(names.toArray(new String[names.size()]), values.toArray(new String[values.size()]));
        return super.mapFieldSet(domainObjectFieldSet);
    }
}
However, a FlatFileParseException is thrown. The relevant parts of the batch config class are as follows:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Value("${file}")
    private File file;

    @Bean
    @Scope("prototype")
    public MyProtoClass myProtoClass() {
        return new MyProtoClass();
    }

    @Bean
    public ItemReader<MyProtoClass> reader(LineMapper<MyProtoClass> lineMapper) {
        FlatFileItemReader<MyProtoClass> flatFileItemReader = new FlatFileItemReader<MyProtoClass>();
        flatFileItemReader.setResource(new FileSystemResource(file));
        final int NUMBER_OF_HEADER_LINES = 1;
        flatFileItemReader.setLinesToSkip(NUMBER_OF_HEADER_LINES);
        flatFileItemReader.setLineMapper(lineMapper);
        return flatFileItemReader;
    }

    @Bean
    public LineMapper<MyProtoClass> lineMapper(LineTokenizer lineTokenizer, FieldSetMapper<MyProtoClass> fieldSetMapper) {
        DefaultLineMapper<MyProtoClass> lineMapper = new DefaultLineMapper<MyProtoClass>();
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

    @Bean
    public LineTokenizer lineTokenizer() {
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames(new String[] {"value", "test.bar", "test.foo", "indicator"});
        return lineTokenizer;
    }

    @Bean
    public FieldSetMapper<MyProtoClass> fieldSetMapper(PropertyEditor emptyStringToNullPropertyEditor) {
        BeanWrapperFieldSetMapper<MyProtoClass> fieldSetMapper = new DelegatedFieldSetMapper();
        fieldSetMapper.setPrototypeBeanName("myProtoClass");
        Map<Class<String>, PropertyEditor> customEditors = new HashMap<Class<String>, PropertyEditor>();
        customEditors.put(String.class, emptyStringToNullPropertyEditor);
        fieldSetMapper.setCustomEditors(customEditors);
        return fieldSetMapper;
    }
Finally, the CSV flat file look like this:
value,bar,foo,indicator
abc,,,y
xyz,,,n
Let's say that BatchWorkObject is the class to be mapped.
Here's sample code in Spring Boot style that needs only your custom logic to be added.
new BeanWrapperFieldSetMapper<BatchWorkObject>() {
    {
        this.setTargetType(BatchWorkObject.class);
    }

    @Override
    public BatchWorkObject mapFieldSet(FieldSet fs)
            throws BindException {
        BatchWorkObject tmp = super.mapFieldSet(fs);
        // your custom code here
        return tmp;
    }
};
The code actually accomplishes what is desired except for one issue that results in the FlatFileParseException. The DelegatedFieldSetMapper contains the issue as follows:
DefaultFieldSet domainObjectFieldSet = new DefaultFieldSet(names.toArray(new String[names.size()]), values.toArray(new String[values.size()]));
To resolve, swap the arguments, since the DefaultFieldSet constructor takes the token values first and the field names second:
DefaultFieldSet domainObjectFieldSet = new DefaultFieldSet(values.toArray(new String[values.size()]), names.toArray(new String[names.size()]));
Write your own FieldSetMapper with a set of prepared delegates inside.
Those delegates are pre-built for each different kind of field mapping.
In your mapper, route to the correct delegate based on the indicator field (with a Classifier, for example), as sketched below.
I can't see any other way, but this solution is quite easy and straightforward to maintain.
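As a rough illustration of that routing idea (class and property names are hypothetical, and a plain if/else stands in for a Classifier):

public class RoutingFieldSetMapper implements FieldSetMapper<MyProtoClass> {

    // delegates pre-built elsewhere for the two layouts, e.g. suitably configured
    // BeanWrapperFieldSetMapper instances or small custom mappers
    private FieldSetMapper<MyProtoClass> fooDelegate;
    private FieldSetMapper<MyProtoClass> barDelegate;

    @Override
    public MyProtoClass mapFieldSet(FieldSet fieldSet) throws BindException {
        boolean isFoo = "y".equalsIgnoreCase(fieldSet.readString("indicator"));
        return (isFoo ? fooDelegate : barDelegate).mapFieldSet(fieldSet);
    }

    public void setFooDelegate(FieldSetMapper<MyProtoClass> fooDelegate) { this.fooDelegate = fooDelegate; }
    public void setBarDelegate(FieldSetMapper<MyProtoClass> barDelegate) { this.barDelegate = barDelegate; }
}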
Processing based on the input format/data can also be done with a custom implementation of ItemProcessor, which either changes values in the same entity (the one populated by the ItemReader) or creates a new output entity, as in the sketch below.
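A rough sketch of the ItemProcessor variant. RawRecord is a hypothetical intermediate type that still carries the indicator, and the nested "test" accessor mirrors the "test.foo"/"test.bar" property paths from the question; it is assumed to be initialized by MyProtoClass.

public class IndicatorItemProcessor implements ItemProcessor<RawRecord, MyProtoClass> {

    @Override
    public MyProtoClass process(RawRecord item) throws Exception {
        MyProtoClass target = new MyProtoClass();
        // route the single "value" column based on the indicator; the indicator itself
        // never reaches the domain object
        if ("y".equalsIgnoreCase(item.getIndicator())) {
            target.getTest().setFoo(item.getValue());
        } else {
            target.getTest().setBar(item.getValue());
        }
        return target;
    }
}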

Filehelpers and Entity Framework

I'm using FileHelpers to parse a very wide, fixed-format file and want to be able to take the resulting object and load it into a DB using EF. I'm getting a missing-key error when I try to load the object into the DB, and when I try to add an Id I get a FileHelpers error. So it seems like either fix breaks the other. I know I can map a FileHelpers object to a POCO object and load that, but I'm dealing with dozens (sometimes hundreds) of columns, so I would rather not go through that hassle.
I'm also open to other suggestions for parsing a fixed width file and loading the results into a DB. One option of course is to use an ETL tool but I'd rather do this in code.
Thanks!
This is the FileHelpers class:
public class AccountBalanceDetail
{
    [FieldHidden]
    public int Id; // Added to try and get EF to work

    [FieldFixedLength(1)]
    public string RecordNuber;

    [FieldFixedLength(3)]
    public string Branch;

    // Additional fields below
}
And this is the method that's processing the file:
public static bool ProcessFile()
{
    var dir = Properties.Settings.Default.DataDirectory;
    var engine = new MultiRecordEngine(typeof(AccountBalanceHeader), typeof(AccountBalanceDetail), typeof(AccountBalanceTrailer));
    engine.RecordSelector = new RecordTypeSelector(CustomSelector);
    var fileName = dir + "\\MOCK_ACCTBAL_L1500.txt";
    var res = engine.ReadFile(fileName);
    foreach (var rec in res)
    {
        var type = rec.GetType();
        if (type.Name == "AccountBalanceHeader") continue;
        if (type.Name == "AccountBalanceTrailer") continue;
        var data = rec as AccountBalanceDetail; // Throws an error if AccountBalanceDetail.Id has a getter and setter
        using (var ctx = new ApplicationDbContext())
        {
            // Throws an error if there is no valid Id on AccountBalanceDetail
            // EntityType 'AccountBalanceDetail' has no key defined. Define the key for this EntityType.
            ctx.AccountBalanceDetails.Add(data);
            ctx.SaveChanges();
        }
        //Console.WriteLine(rec.ToString());
    }
    return true;
}
Entity Framework needs the key to be a property, not a field, so you could try declaring it instead as:
public int Id { get; set; }
I suspect FileHelpers might well be confused by the autogenerated backing field, so you might need to do it long form in order to be able to mark the backing field with the [FieldHidden] attribute, i.e.,
[FieldHidden]
private int _Id;

public int Id
{
    get { return _Id; }
    set { _Id = value; }
}
However, you are trying to use the same class for two unrelated purposes and this is generally bad design. On the one hand AccountBalanceDetail is the spec for the import format. On the other you are also trying to use it to describe the Entity. Instead you should create separate classes and map between the two with a LINQ function or a library like AutoMapper.

Ormlite and PostgreSQL - Error inserting text array with custom persister

I have been working to set up ORMLite as the primary data access layer between a PostgreSQL database and a Java application. Everything has been fairly straightforward until I started messing with PostgreSQL's array types. In my case, I have two tables that make use of the text[] array type. Following the documentation, I created a custom data persister as below:
public class StringArrayPersister extends StringType {

    private static final StringArrayPersister singleTon = new StringArrayPersister();

    private StringArrayPersister() {
        super(SqlType.STRING, new Class<?>[] { String[].class });
    }

    public static StringArrayPersister getSingleton() {
        return singleTon;
    }

    @Override
    public Object javaToSqlArg(FieldType fieldType, Object javaObject) {
        String[] array = (String[]) javaObject;
        if (array == null) {
            return null;
        } else {
            String join = "";
            for (String str : array) {
                join += str + ",";
            }
            return "'{" + join.substring(0, join.length() - 1) + "}'";
        }
    }

    @Override
    public Object sqlArgToJava(FieldType fieldType, Object sqlArg, int columnPos) {
        String string = (String) sqlArg;
        if (string == null) {
            return null;
        } else {
            return string.replaceAll("[{}]", "").split(",");
        }
    }
}
And then in my business object implementation, I set up the persister class on the column like so:
@DatabaseField(columnName = TAGS_FIELD, persisterClass = StringArrayPersister.class)
private String[] tags;
Whenever I try inserting a new record with the Dao.create statement, I get an error message saying tags is of type text[], but got character varying... However, when querying existing records from the database, the business object (and text array) load just fine.
Any ideas?
UPDATE:
PostgreSQL 9.2. The exact error message:
Caused by: org.postgresql.util.PSQLException: ERROR: column "tags" is of type text[] but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
I've not used ORMLite before (I generally use MyBatis); however, I believe the proximate issue is this code:
private StringArrayPersister() {
super(SqlType.STRING, new Class<?>[]{String[].class});
}
SqlType.STRING is mapped to varchar in the ORMLite code, and so I believe it is the proximate cause of the error you're getting. See the ORMLite SQL Data Types documentation for more detail on that.
Try changing it to this:
private StringArrayPersister() {
super(SqlType.OTHER, new Class<?>[]{String[].class});
}
There may be other tweaks necessary as well to get it fully up and running, but that should get you passed this particular error with the varchar type mismatch.

QueryBuilder get parameters for Dao.queryRaw

I'm using QueryBuilder to create a raw query, but I need to fill in the parameters of the raw query manually.
The properties 'from' and 'to' are filled in twice: once in the 'where' section of the QueryBuilder, and once as parameters to the queryRaw method.
The method StatementBuilder.prepareStatementString() returns the query string with "?" placeholders for substitution.
Is there any way to get these parameters directly from the QueryBuilder instance?
For example, imagine a new method in ORMLite - StatementBuilder.getPreparedStatementParameters();
QueryBuilder<AccountableItemEntity, Long> accountableItemQb = accountableItemDao.queryBuilder();
QueryBuilder<AccountingEntryEntity, Long> accountingEntryQb = accountingEntryDao.queryBuilder();
accountingEntryQb.where().eq(
        AccountingEntryEntity.ACCOUNTING_ENTRY_STATE_FIELD_NAME,
        AccountingEntryStateEnum.CREATED);
accountingEntryQb.join(accountableItemQb);

QueryBuilder<AccountingTransactionEntity, Long> accountingTransactionQb =
        accountingTransactionDao.queryBuilder();
accountingTransactionQb.selectRaw("ACCOUNTINGENTRYENTITY.TITLE, " +
        "ACCOUNTINGENTRYENTITY.ACCOUNTABLE_ITEM_ID, " +
        "SUM(ACCOUNTINGENTRYENTITY.COUNT), " +
        "SUM(ACCOUNTINGENTRYENTITY.COUNT * CONVERT(ACCOUNTINGENTRYENTITY.PRICEAMOUNT,DECIMAL(20, 2)))");
accountingTransactionQb.join(accountingEntryQb);
accountingTransactionQb.where().eq(
        AccountingTransactionEntity.ACCOUNTING_TRANSACTION_STATE_FIELD_NAME,
        AccountingTransactionStateEnum.PRINTED)
        .and().between(AccountingTransactionEntity.CREATE_TIME_FIELD_NAME, from, to);
accountingTransactionQb.groupByRaw(
        "ACCOUNTINGENTRYENTITY.ACCOUNTABLE_ITEM_ID, ACCOUNTINGENTRYENTITY.TITLE");

String query = accountingTransactionQb.prepareStatementString();
accountingTransactionQb.prepare().getStatement();

Timestamp fromTimestamp = new Timestamp(from.getTime());
Timestamp toTimestamp = new Timestamp(to.getTime());

//TODO: get parameters from accountingTransactionQb
GenericRawResults<Object[]> genericRawResults =
        accountingEntryDao.queryRaw(query, new DataType[] { DataType.STRING,
                DataType.LONG, DataType.LONG, DataType.BIG_DECIMAL },
                fromTimestamp.toString(), toTimestamp.toString());
Is there any way to get these parameters directly from the QueryBuilder instance?
Yes, there is a way. You need to subclass QueryBuilder and then you can use the appendStatementString(...) method. You provide the argList which then can be used to get the list of arguments.
protected void appendStatementString(StringBuilder sb,
        List<ArgumentHolder> argList) throws SQLException {
    appendStatementStart(sb, argList);
    appendWhereStatement(sb, argList, true);
    appendStatementEnd(sb, argList);
}
For example, imagine a new method in ORMLite - StatementBuilder.getPreparedStatementParameters();
Good idea. I've made the following changes to the GitHub repo.
public StatementInfo prepareStatementInfo() throws SQLException {
    List<ArgumentHolder> argList = new ArrayList<ArgumentHolder>();
    String statement = buildStatementString(argList);
    return new StatementInfo(statement, argList);
}
...
public static class StatementInfo {
    private final String statement;
    private final List<ArgumentHolder> argList;
    ...
The feature will be in version 4.46. You can build a release from current trunk if you don't want to wait for that release.
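A rough usage sketch against that new API. The accessor names (getStatement(), getArgList(), getSqlArgValue()) are assumed from the fields shown above; check the released 4.46 API for the exact signatures.

StatementBuilder.StatementInfo info = accountingTransactionQb.prepareStatementInfo();
String query = info.getStatement();

// pull the bound values out of the builder instead of re-supplying from/to by hand
List<String> arguments = new ArrayList<String>();
for (ArgumentHolder holder : info.getArgList()) {
    arguments.add(String.valueOf(holder.getSqlArgValue()));
}

GenericRawResults<Object[]> genericRawResults = accountingEntryDao.queryRaw(
        query,
        new DataType[] { DataType.STRING, DataType.LONG, DataType.LONG, DataType.BIG_DECIMAL },
        arguments.toArray(new String[arguments.size()]));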