Spring Batch ItemReader locale: import a double with a comma - spring-batch

I want to import the following file with Spring Batch
key;value
A;9,5
I model it with the bean
class CsvModel {
    String key
    Double value
}
The code shown here is Groovy, but the language is irrelevant to the problem.
@Bean
@StepScope
FlatFileItemReader<CsvModel> reader2() {
    // set the locale for the tokenizer, but this doesn't solve the problem
    def locale = Locale.getDefault()
    def fieldSetFactory = new DefaultFieldSetFactory()
    fieldSetFactory.setNumberFormat(NumberFormat.getInstance(locale))

    def tokenizer = new DelimitedLineTokenizer(';')
    tokenizer.setNames(['key', 'value'] as String[])
    // and assign the fieldSetFactory to the tokenizer
    tokenizer.setFieldSetFactory(fieldSetFactory)

    def fieldMapper = new BeanWrapperFieldSetMapper<CsvModel>()
    fieldMapper.setTargetType(CsvModel.class)

    def lineMapper = new DefaultLineMapper<CsvModel>()
    lineMapper.setLineTokenizer(tokenizer)
    lineMapper.setFieldSetMapper(fieldMapper)

    def reader = new FlatFileItemReader<CsvModel>()
    reader.setResource(new FileSystemResource('output/export.csv'))
    reader.setLinesToSkip(1)
    reader.setLineMapper(lineMapper)
    return reader
}
Setting up a reader is well known; what was new for me was the first code block: setting up a NumberFormat / Locale / FieldSetFactory and assigning it to the tokenizer. However, this doesn't work; I still receive the exception:
Field error in object 'target' on field 'value': rejected value [5,0]; codes [typeMismatch.target.value,typeMismatch.value,typeMismatch.float,typeMismatch]; arguments [org.springframework.context.support.DefaultMessageSourceResolvable: codes [target.value,value]; arguments []; default message [value]]; default message [Failed to convert property value of type 'java.lang.String' to required type 'float' for property 'value'; nested exception is java.lang.NumberFormatException: For input string: "9,5"]
at org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper.mapFieldSet(BeanWrapperFieldSetMapper.java:200) ~[spring-batch-infrastructure-4.1.2.RELEASE.jar:4.1.2.RELEASE]
at org.springframework.batch.item.file.mapping.DefaultLineMapper.mapLine(DefaultLineMapper.java:43) ~[spring-batch-infrastructure-4.1.2.RELEASE.jar:4.1.2.RELEASE]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:180) ~[spring-batch-infrastructure-4.1.2.RELEASE.jar:4.1.2.RELEASE]
So the question is: how do I import floats in the locale de_AT (we write our decimals with a comma, like this: 3,141592)? I could avoid this problem with a FieldSetMapper, but I want to understand what's going on here and avoid the unnecessary mapper class.
And even the FieldSetMapper solution doesn't obey locales out of the box; I have to read a string and convert it to a double myself:
class PnwExportFieldSetMapper implements FieldSetMapper<CsvModel> {
    private nf = NumberFormat.getInstance(Locale.getDefault())

    @Override
    CsvModel mapFieldSet(FieldSet fieldSet) throws BindException {
        def model = new CsvModel()
        model.key = fieldSet.readString(0)
        model.value = nf.parse(fieldSet.readString(1)).doubleValue()
        return model
    }
}
The DefaultFieldSet class has a setNumberFormat method, but when and where do I call it?

This unfortunately seems to be a bug. I have the same problem and debugged into the code.
The BeanWrapperFieldSetMapper does not use the methods of DefaultFieldSetFactory that would do the right conversion; instead it just uses FieldSet.getProperties and does the conversion by itself.
So I see the following options: provide the BeanWrapperFieldSetMapper with either PropertyEditors or a ConversionService, or use a different mapper.
Here is a sketch of a ConversionService:
private static class CS implements ConversionService {

    @Override
    public boolean canConvert(Class<?> sourceType, Class<?> targetType) {
        // accept both the primitive and the wrapper, since the bean may declare either
        return sourceType == String.class
                && (targetType == double.class || targetType == Double.class);
    }

    @Override
    public boolean canConvert(TypeDescriptor sourceType, TypeDescriptor targetType) {
        return canConvert(sourceType.getType(), targetType.getType());
    }

    @SuppressWarnings("unchecked")
    @Override
    public <T> T convert(Object source, Class<T> targetType) {
        // crude locale handling: turn the decimal comma into a decimal point
        return (T) Double.valueOf(source.toString().replace(',', '.'));
    }

    @Override
    public Object convert(Object source, TypeDescriptor sourceType, TypeDescriptor targetType) {
        return Double.valueOf(source.toString().replace(',', '.'));
    }
}
and use it:
final BeanWrapperFieldSetMapper<YourClass> mapper = new BeanWrapperFieldSetMapper<>();
mapper.setTargetType(YourClass.class);
mapper.setConversionService(new CS());
...
new FlatFileItemReaderBuilder<YourClass>()
        .name("YourReader")
        .delimited()
        .delimiter(";")
        .includedFields(fields)
        .names(names)
        .fieldSetMapper(mapper)
        .saveState(false)
        .resource(resource)
        .build();
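The PropertyEditor route mentioned above should also work. Here is a minimal, untested sketch using Spring's CustomNumberEditor; the CsvModel type and the de_AT locale are taken from the question:
import java.beans.PropertyEditor;
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.beans.propertyeditors.CustomNumberEditor;

// ...
BeanWrapperFieldSetMapper<CsvModel> mapper = new BeanWrapperFieldSetMapper<>();
mapper.setTargetType(CsvModel.class);

Map<Object, PropertyEditor> editors = new HashMap<>();
// CustomNumberEditor parses with the given NumberFormat,
// so "9,5" becomes 9.5 under the de_AT locale
NumberFormat format = NumberFormat.getInstance(new Locale("de", "AT"));
editors.put(Double.class, new CustomNumberEditor(Double.class, format, true));
mapper.setCustomEditors(editors);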


Writable classes in MapReduce

How can I use the values from the HashSet (the docid and the offset) in the reduce Writable so as to connect the map Writable with the reduce Writable?
The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer) I get an error that it can't take a string as an argument when I write this:
context.write(key, new IndexRecordWritable("some string"));
although I have a public String toString() in the reduce Writable too.
I believe the HashSet in the reducer's Writable (IndexRecordWritable.java) may not be taking the values correctly?
I have the code below.
IndexMapRecordWritable.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class IndexMapRecordWritable implements Writable {
    private LongWritable offset;
    private Text docid;

    public LongWritable getOffsetWritable() {
        return offset;
    }

    public Text getDocidWritable() {
        return docid;
    }

    public long getOffset() {
        return offset.get();
    }

    public String getDocid() {
        return docid.toString();
    }

    public IndexMapRecordWritable() {
        this.offset = new LongWritable();
        this.docid = new Text();
    }

    public IndexMapRecordWritable(long offset, String docid) {
        this.offset = new LongWritable(offset);
        this.docid = new Text(docid);
    }

    public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
        this.offset = indexMapRecordWritable.getOffsetWritable();
        this.docid = indexMapRecordWritable.getDocidWritable();
    }

    @Override
    public String toString() {
        StringBuilder output = new StringBuilder();
        output.append(docid);
        output.append(offset);
        return output.toString();
    }

    @Override
    public void write(DataOutput out) throws IOException {
    }

    @Override
    public void readFields(DataInput in) throws IOException {
    }
}
IndexRecordWritable.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import org.apache.hadoop.io.Writable;

public class IndexRecordWritable implements Writable {
    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            Iterable<IndexMapRecordWritable> indexMapRecordWritables) {
    }

    @Override
    public String toString() {
        StringBuilder output = new StringBuilder();
        return output.toString();
    }

    @Override
    public void write(DataOutput out) throws IOException {
    }

    @Override
    public void readFields(DataInput in) throws IOException {
    }
}
Alright, here is my answer, based on a few assumptions: according to the pre-condition and post-condition comments in your reducer class, the final output is a text file containing the key and the file names separated by commas.
In this case, you really don't need the IndexRecordWritable class. You can simply write to your context using
context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1)));
with the class declaration line as
public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>
Don't forget to set the correct output class in the driver.
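Assuming the usual org.apache.hadoop.mapreduce.Job setup (the job variable here is hypothetical), matching the Reducer<Text, IndexMapRecordWritable, Text, Text> signature above would mean something like:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IndexMapRecordWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);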
That should serve the purpose according to the post-condition in your reducer class. But if you really want to write a Text-IndexRecordWritable pair to your context, there are two ways to approach it:
1. with a String as an argument (based on your attempt to pass a string even though your IndexRecordWritable constructor is not designed to accept strings), and
2. with a HashSet as an argument (based on the HashSet initialised in the IndexRecordWritable class).
Since the constructor of your IndexRecordWritable class is not designed to accept a String, you cannot pass one; hence the error that you can't use a string as an argument. PS: if you want your constructor to accept Strings, you must add another constructor to your IndexRecordWritable class, as below:
// Save each index record from maps
private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
// to save the string
private String value;

public IndexRecordWritable() {
}

public IndexRecordWritable(
        HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
    /***/
}

// to accept a string
public IndexRecordWritable(String value) {
    this.value = value;
}
but that won't help if you want to use the HashSet. So approach #1 can't be used: you can't pass a string.
That leaves us with approach #2: passing a HashSet as an argument, since you want to make use of the HashSet. In this case, you must build the HashSet in your reducer before passing it to IndexRecordWritable in context.write.
To do this, your reducer must look like this:
@Override
protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context) throws IOException, InterruptedException {
    //StringBuilder valueBuilder = new StringBuilder();
    HashSet<IndexMapRecordWritable> set = new HashSet<>();
    for (IndexMapRecordWritable val : values) {
        // copy the fields: Hadoop reuses the same value instance across iterations
        set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
        //valueBuilder.append(val);
        //valueBuilder.append(",");
    }
    //write the key and the adjusted value (removing the last comma)
    //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
    context.write(key, new IndexRecordWritable(set));
    //valueBuilder.setLength(0);
}
and your IndexRecordWritable.java must have this.
// Save each index record from maps
private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
// to save the string
//private String value;

public IndexRecordWritable() {
}

public IndexRecordWritable(
        HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
    /***/
    tokens.addAll(indexMapRecordWritables);
}
Remember, this is not the requirement according to the description of your reducer, which says:
POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",". <"marcello", "a.txt#3345,b.txt#344,c.txt#785">
If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet in IndexRecordWritable to get it in the desired format.
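For example, a toString() along these lines (a hypothetical sketch, using the getters from your IndexMapRecordWritable) would produce the "a.txt#3345,b.txt#344" style from the post-condition:
@Override
public String toString() {
    StringBuilder output = new StringBuilder();
    for (IndexMapRecordWritable token : tokens) {
        if (output.length() > 0) {
            output.append(",");
        }
        // docid#offset, e.g. a.txt#3345
        output.append(token.getDocid()).append("#").append(token.getOffset());
    }
    return output.toString();
}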

How can I ignore a "$" in a DocumentContent to save in MongoDB?

My problem is that if I save a document with a $ inside the content, MongoDB gives me an exception:
java.lang.IllegalArgumentException: Invalid BSON field name $ xxx
I would like MongoDB to ignore the $ character in the content.
My application is written in Java. I read the content of the file and put it into an object as a string; the object is then saved with a MongoRepository class.
Does anyone have any ideas?
Example content
Edit: I heard MongoDB has the same problem with dots. Our Spring Boot app found a workaround for dots, but not for dollars:
How to configure mongo converter in spring to encode all dots in the keys of map being saved in mongo db
If you are using Spring Boot, you can extend the MappingMongoConverter class and override the methods that do the escaping/unescaping.
@Component
public class MappingMongoConverterCustom extends MappingMongoConverter {

    protected @Nullable String mapKeyDollarReplacement = "characters_to_replace_dollar";
    protected @Nullable String mapKeyDotReplacement = "characters_to_replace_dot";

    public MappingMongoConverterCustom(DbRefResolver dbRefResolver, MappingContext<? extends MongoPersistentEntity<?>, MongoPersistentProperty> mappingContext) {
        super(dbRefResolver, mappingContext);
    }

    @Override
    protected String potentiallyEscapeMapKey(String source) {
        if (!source.contains(".") && !source.contains("$")) {
            return source;
        }
        if (mapKeyDotReplacement == null && mapKeyDollarReplacement == null) {
            throw new MappingException(String.format(
                    "Map key %s contains dots or dollars but no replacement was configured! Make "
                            + "sure map keys don't contain dots or dollars in the first place or configure an appropriate replacement!",
                    source));
        }
        String result = source;
        if (result.contains(".")) {
            result = result.replaceAll("\\.", mapKeyDotReplacement);
        }
        if (result.contains("$")) {
            result = result.replaceAll("\\$", mapKeyDollarReplacement);
        }
        // add any other replacements you need
        return result;
    }

    @Override
    protected String potentiallyUnescapeMapKey(String source) {
        String result = source;
        if (mapKeyDotReplacement != null) {
            result = result.replaceAll(mapKeyDotReplacement, "\\.");
        }
        if (mapKeyDollarReplacement != null) {
            result = result.replaceAll(mapKeyDollarReplacement, "\\$");
        }
        // add any other replacements you need
        return result;
    }
}
If you go with this approach, make sure you override the default converter from AbstractMongoConfiguration, like below:
@Configuration
public class MongoConfig extends AbstractMongoConfiguration {

    @Bean
    public DbRefResolver getDbRefResolver() {
        return new DefaultDbRefResolver(mongoDbFactory());
    }

    @Bean
    @Override
    public MappingMongoConverter mappingMongoConverter() throws Exception {
        MappingMongoConverterCustom converter = new MappingMongoConverterCustom(getDbRefResolver(), mongoMappingContext());
        converter.setCustomConversions(customConversions());
        return converter;
    }

    // ... whatever you might need extra ...
}
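To sanity-check your replacement tokens before wiring them in, you can exercise the same regex logic standalone; here is a small sketch (the "_dollar_" token is just an example value, not a recommendation):
public class EscapeDemo {
    public static void main(String[] args) {
        String dollarReplacement = "_dollar_"; // example replacement token
        String key = "total$amount";
        String escaped = key.replaceAll("\\$", dollarReplacement);
        String restored = escaped.replaceAll(dollarReplacement, "\\$");
        System.out.println(escaped);  // total_dollar_amount
        System.out.println(restored); // total$amount
    }
}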

Mongodb scala driver custom conversion to JSON

If I am using the "native" JSON support from the official MongoDB Scala driver:
val jsonText = Document(...).toJson()
it produces JSON text with type prefixes for extended types:
{ "$oid" : "AABBb...." } for ObjectId,
{ "$longNumber" : 123123 } for Long, etc.
I want to avoid such type wrappers and write just the plain values for each type. Is it possible to override the encoding behavior for some types?
You can subclass JsonWriter and override its doWriteXXX methods. For example, to customize date serialization you can use:
class CustomJsonWriter extends JsonWriter {

    public CustomJsonWriter(Writer writer) {
        super(writer);
    }

    public CustomJsonWriter(Writer writer, JsonWriterSettings settings) {
        super(writer, settings);
    }

    @Override
    protected void doWriteDateTime(long value) {
        doWriteString(DateTimeFormatter.ISO_DATE_TIME
                .withZone(ZoneId.of("Z"))
                .format(Instant.ofEpochMilli(value)));
    }
}
And then you can use the overridden version that way:
public static String toJson(Document doc) {
    CustomJsonWriter writer = new CustomJsonWriter(new StringWriter(), new JsonWriterSettings());
    DocumentCodec encoder = new DocumentCodec();
    encoder.encode(writer, doc, EncoderContext.builder().isEncodingCollectibleDocument(true).build());
    return writer.getWriter().toString();
}

Mapping Yes/no to Boolean in ReST API query parameter

I am trying to map yes/no, true/false, and Y/N to a boolean in a JAX-RS URL query parameter, but only true/false maps successfully; all other values are mapped to false.
I understand that when mapping URL query parameters, JAX-RS looks for a constructor of the declared data type that takes a String argument and converts the query parameter into an object of that type based on what the constructor does. The Boolean class takes true/TRUE as true and treats all other values as false.
Is there a way to map yes/no, y/n to true/false?
You could wrap a boolean in something that respects the QueryParam javadoc. In the following example I'm implementing number 3 (a static valueOf(String) method):
#Path("/booleanTest")
public class TestClass {
#GET
public String test(#QueryParam("value") FancyBoolean fancyBoolean) {
String result = "Result is " + fancyBoolean.getValue();
return result;
}
public static class FancyBoolean {
private static final FancyBoolean FALSE = new FancyBoolean(false);
private static final FancyBoolean TRUE = new FancyBoolean(true);
private boolean value;
private FancyBoolean(boolean value) {
this.value = value;
}
public boolean getValue() {
return this.value;
}
public static FancyBoolean valueOf(String value) {
switch (value.toLowerCase()) {
case "true":
case "yes":
case "y": {
return FancyBoolean.TRUE;
}
default: {
return FancyBoolean.FALSE;
}
}
}
}
}
Accessing /booleanTest?value=yes, /booleanTest?value=y or /booleanTest?value=true will return "Result is true"; any other value will return "Result is false".
Using a boolean in the query string arguably violates the single responsibility principle, because it forces one function to do more than one thing. For a RESTful style I would suggest this instead (pseudo-code, not valid JAX-RS syntax):
@GET("/someValue=true")
@GET("/someValue=false")
This means you define two endpoints instead of one :) and in this case each function just focuses on its own business and there is no need to check true/false.
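In actual JAX-RS the two-endpoint idea might look like this sketch (the paths and names are made up for illustration):
@Path("/items")
public class ItemResource {

    // one endpoint per case instead of a boolean query parameter
    @GET
    @Path("/active")
    public String activeItems() {
        return "active items";
    }

    @GET
    @Path("/inactive")
    public String inactiveItems() {
        return "inactive items";
    }
}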

How to conditionally serialize a field (attribute) using XStream

I am using XStream for serializing and de-serializing an object. For example, a class named Rating is defined as follows:
public class Rating {
    String id;
    int score;
    int confidence;
    // constructors here...
}
However, in this class, the variable confidence is optional.
So, when the confidence value is known (not 0), an XML representation of a Rating object should look like:
<rating>
<id>0123</id>
<score>5</score>
<confidence>10</confidence>
</rating>
However, when the confidence is unknown (the default value will be 0), the confidence element should be omitted from the XML representation:
<rating>
<id>0123</id>
<score>5</score>
</rating>
Could anyone tell me how to conditionally serialize a field using XStream?
One option is to write a converter.
Here's one that I quickly wrote for you:
import com.thoughtworks.xstream.converters.Converter;
import com.thoughtworks.xstream.converters.MarshallingContext;
import com.thoughtworks.xstream.converters.UnmarshallingContext;
import com.thoughtworks.xstream.io.HierarchicalStreamReader;
import com.thoughtworks.xstream.io.HierarchicalStreamWriter;

public class RatingConverter implements Converter {

    @Override
    public boolean canConvert(Class clazz) {
        return clazz.equals(Rating.class);
    }

    @Override
    public void marshal(Object value, HierarchicalStreamWriter writer,
            MarshallingContext context) {
        Rating rating = (Rating) value;

        // Write id
        writer.startNode("id");
        writer.setValue(rating.getId());
        writer.endNode();

        // Write score
        writer.startNode("score");
        writer.setValue(Integer.toString(rating.getScore()));
        writer.endNode();

        // Write confidence only when it is known (non-zero)
        if (rating.getConfidence() != 0) {
            writer.startNode("confidence");
            writer.setValue(Integer.toString(rating.getConfidence()));
            writer.endNode();
        }
    }

    @Override
    public Object unmarshal(HierarchicalStreamReader reader,
            UnmarshallingContext context) {
        // deserialization is not implemented in this example
        return null;
    }
}
All that's left for you to do is to register the converter, and provide accessor methods (i.e. getId, getScore, getConfidence) in your Rating class.
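Registration could look like this minimal sketch (the "rating" alias is taken from the XML in the question; the rating instance is assumed):
XStream xstream = new XStream();
xstream.alias("rating", Rating.class);
xstream.registerConverter(new RatingConverter());
String xml = xstream.toXML(rating); // a rating with confidence == 0 gets no <confidence> element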
Note: your other option would be to omit the field appropriately, e.g. with XStream's omitField(Rating.class, "confidence"), though that omits the field unconditionally rather than based on its value.