Custom item writer to write to a database using a list of lists containing HashMaps - spring-batch

I am new to Spring Batch and my requirement is to read a dynamic Excel sheet and insert its contents into a database. I am able to read the Excel file and pass the data to the writer, but only the last record in the Excel sheet gets inserted into the database. Here is my code for the item writer:
@Bean
public ItemWriter<List<LinkedHashMap<String, String>>> tempwrite() {
    JdbcBatchItemWriter<List<LinkedHashMap<String, String>>> databaseItemWriter = new JdbcBatchItemWriter<>();
    databaseItemWriter.setDataSource(dataSource);
    databaseItemWriter.setSql("insert into table values(next value for seq_table,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
    ItemPreparedStatementSetter<List<LinkedHashMap<String, String>>> valueSetter =
        new databasePsSetter();
    databaseItemWriter.setItemPreparedStatementSetter(valueSetter);
    return databaseItemWriter;
}
And below is my prepared statement setter class:
public class databasePsSetter implements ItemPreparedStatementSetter<List<LinkedHashMap<String, String>>> {

    @Override
    public void setValues(List<LinkedHashMap<String, String>> item, PreparedStatement ps) throws SQLException {
        int columnNumber = 1;
        for (LinkedHashMap<String, String> row : item) {
            columnNumber = 1;
            for (Map.Entry<String, String> entry : row.entrySet()) {
                ps.setString(columnNumber, entry.getValue());
                columnNumber++;
            }
        }
    }
}
I have seen many examples, but all of them use a DTO class, and I am not sure whether this is the correct way of implementing it for a list of lists containing HashMaps.
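For what it's worth, the setter above resets columnNumber to 1 for every row, so each row overwrites the previous one's parameters on the same PreparedStatement and only the last row is left when the statement executes, which matches the "only the last record gets inserted" symptom. A minimal sketch of a per-row alternative, assuming the reader/processor can be changed to emit one LinkedHashMap<String, String> per item (the bean name tempwritePerRow is illustrative only, not from the original code):

@Bean
public ItemWriter<LinkedHashMap<String, String>> tempwritePerRow() {
    JdbcBatchItemWriter<LinkedHashMap<String, String>> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    writer.setSql("insert into table values(next value for seq_table,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
    // Each item is now a single row, so every row becomes its own entry in the JDBC batch.
    writer.setItemPreparedStatementSetter((row, ps) -> {
        int columnNumber = 1;
        for (Map.Entry<String, String> entry : row.entrySet()) {
            ps.setString(columnNumber++, entry.getValue());
        }
    });
    return writer;
}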

Related

Is it possible to split a list item into smaller chunks in Spring Batch?

I have a spring batch job which consists of
ItemReader<Product>: reads a product from DB
ItemProcessor<Product, List<RelatedProduct>>: reads related products from the original product
ItemWriter<List<RelatedProduct>>: writes some aspect of related products into DB
Recently, I found a case where a product had so many related products so it caused a long transaction in DB which took about an hour. This happened in our OLTP DB, so we want to split the list of related products into smaller chunks to avoid long transactions.
At first I tried to check the feasibility of code below. But it seemed this kind of code is not possible in Spring Batch.
@Bean
@JobScope
public Step step1() {
    return stepBuilderFactory.get("step1")
        .<Product, List<RelatedProduct>>chunk(1)
        .reader(productItemReader()) // reads a product from DB
        .processor(relatedProductProcessor()) // reads related products, the number of related products can be huge.
        // Beginning of my hope
        .reader(eachRelatedProductInListReader()) // reads each related product in the list.
        .chunk(100) // re-aggregate them into smaller chunks
        // End of my hope
        .writer(relatedProductItemWriter()) // writes info about related products
        .build();
}
So now I'm thinking of storing the long list in the job context and adding one more step to process RelatedProduct in smaller chunks. But I'm wondering if there are any better ways. Any suggestions?
I implemented a new ItemReader like below to read RelatedProduct one by one using the original reader and processor:
public class FlatteningItemReader<I, O> implements ItemReader<O> {

    private final ItemReader<I> originalReader;
    private final ItemProcessor<I, ? extends Iterable<O>> iterableProducer;
    private final Deque<O> buffer = new ArrayDeque<>();

    public FlatteningItemReader(ItemReader<I> originalReader, ItemProcessor<I, ? extends Iterable<O>> iterableProducer) {
        this.originalReader = Objects.requireNonNull(originalReader);
        this.iterableProducer = Objects.requireNonNull(iterableProducer);
    }

    @Override
    public O read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        if (!buffer.isEmpty()) {
            return buffer.pollFirst();
        }
        I item;
        while ((item = originalReader.read()) != null) {
            Iterable<O> iter = iterableProducer.process(item);
            if (iter == null) continue;
            for (O o : iter) {
                buffer.offerLast(o);
            }
            if (!buffer.isEmpty()) {
                return buffer.pollFirst();
            }
        }
        return null;
    }
}
The step configuration using FlatteningItemReader looks like:
@Bean
@JobScope
public Step step1() {
    return stepBuilderFactory.get("step1")
        .<Product, List<RelatedProduct>>chunk(1)
        // reads each RelatedItem using this new reader
        .reader(new FlatteningItemReader<>(productItemReader(), relatedProductProcessor()))
        .processor(relatedProductProcessor())
        .writer(relatedProductItemWriter())
        .build();
}
I'm not sure if this is a good approach in Spring Batch.
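For comparison, a minimal sketch of how the step might look once FlatteningItemReader already emits individual RelatedProduct items, so the chunk size directly bounds the transaction and no second processor pass is needed. The generic parameters, the chunk size of 100, and the hypothetical eachRelatedProductItemWriter() (an ItemWriter<RelatedProduct> rather than the original list writer) are assumptions, not part of the original code:

@Bean
@JobScope
public Step step1() {
    return stepBuilderFactory.get("step1")
        // Each chunk (and hence each transaction) covers at most 100 RelatedProduct items.
        .<RelatedProduct, RelatedProduct>chunk(100)
        .reader(new FlatteningItemReader<>(productItemReader(), relatedProductProcessor()))
        .writer(eachRelatedProductItemWriter())
        .build();
}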

JPA : Update operation without JPA query or entitymanager

I am learning JPA. I found out that there are some functions already present in JpaRepository, like save, saveAll, find, findAll, etc., but there is nothing like update.
I came across a scenario where I need to update the table if the value is already present; otherwise I need to insert the record into the table.
I created:
@Repository
public interface ProductInfoRepository
    extends JpaRepository<ProductInfoTable, String>
{
    Optional<ProductInfoTable> findByProductName(String productname);
}

public class ProductServiceImpl
    implements ProductService
{
    @Autowired
    private ProductInfoRepository productRepository;

    @Override
    public ResponseMessage saveProductDetail(ProductInfo productInfo)
    {
        Optional<ProductInfoTable> productInfoinTable =
            productRepository.findByProductName(productInfo.getProductName());
        ProductInfoTable productInfoDetail;
        Integer quantity = productInfo.getQuantity();
        if (productInfoinTable.isPresent())
        {
            quantity += productInfoinTable.get().getQuantity();
        }
        productInfoDetail =
            new ProductInfoTable(productInfo.getProductName(), quantity + productInfo.getQuantity(),
                productInfo.getImage());
        productRepository.save(productInfoDetail);
        return new ResponseMessage("product saved successfully");
    }
}
As you can see, I can save the record if it is new, but when I try to save a record which is already present in the table, it gives me an error related to a primary key violation, which is obvious. I checked a bit: we can do the update by creating an EntityManager object or a JPA query, but what if I don't want to use either of them? Is there any other way to do so?
Update: I also added an instance of EntityManager and tried to merge the entity:
@Override
public ResponseMessage saveProductDetail(ProductInfo productInfo)
{
    Optional<ProductInfoTable> productInfoinTable =
        productRepository.findByProductName(productInfo.getProductName());
    ProductInfoTable productInfoDetail;
    Integer price = productInfo.getPrice();
    if (productInfoinTable.isPresent())
    {
        price = productInfoinTable.get().getPrice();
    }
    productInfoDetail =
        new ProductInfoTable(productInfo.getProductName(), price, productInfo.getImage());
    em.merge(productInfoDetail);
    return new ResponseMessage("product saved successfully");
}
But there is no error and no update statements are executed in the log. Any possible reasons for that?
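A common cause for a silent em.merge is that it runs outside an active transaction, so the persistence context is never flushed; that is an assumption on my part, not something the question confirms. A minimal sketch of wrapping the merge in a Spring-managed transaction (the ProductMergeService class is hypothetical):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ProductMergeService {

    @PersistenceContext
    private EntityManager em;

    // Without an active transaction, merge() only updates the persistence context and
    // nothing is flushed to the database, which would match the "no error, no update
    // statement" symptom described above.
    @Transactional
    public ProductInfoTable upsert(ProductInfoTable productInfoDetail) {
        return em.merge(productInfoDetail);
    }
}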
I suspect you need code like this to solve the problem:
public ResponseMessage saveProductDetail(ProductInfo productInfo)
{
    Optional<ProductInfoTable> productInfoinTable =
        productRepository.findByProductName(productInfo.getProductName());
    final ProductInfoTable productInfoDetail;
    if (productInfoinTable.isPresent()) {
        // to edit
        productInfoDetail = productInfoinTable.get();
        Integer quantity = productInfoDetail.getQuantity() + productInfo.getQuantity();
        productInfoDetail.setQuantity(quantity);
    } else {
        // to create new
        productInfoDetail = new ProductInfoTable(productInfo.getProductName(),
            productInfo.getQuantity(), productInfo.getImage());
    }
    productRepository.save(productInfoDetail);
    return new ResponseMessage("product saved successfully");
}

Invoking Kafka Interactive Queries from inside a Stream

I have a particular requirement for invoking an Interactive Query from inside a Stream. This is because I need to create a new Stream which should have the data contained inside the State Store. Truncated code below:
tempModifiedDataStream.to(topic.getTransformedTopic(), Produced.with(Serdes.String(), Serdes.String()));

GlobalKTable<String, String> myMetricsTable = builder.globalTable(
    topic.getTransformedTopic(),
    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
        topic.getTransformedStoreName() /* table/store name */)
        .withKeySerde(Serdes.String()) /* key serde */
        .withValueSerde(Serdes.String()) /* value serde */
);

KafkaStreams streams = new KafkaStreams(builder.build(), kStreamsConfigs());

KStream<String, String> tempAggrDataStream = tempModifiedDataStream
    .flatMap((key, value) -> {
        try {
            List<KeyValue<String, String>> result = new ArrayList<>();
            ReadOnlyKeyValueStore<String, String> keyValueStore =
                streams.store(
                    topic.getTransformedStoreName(),
                    QueryableStoreTypes.keyValueStore());
In the last line, to access the State Store I need the KafkaStreams object, but the Topology is finalized when I create the KafkaStreams object. The problem with this approach is that 'tempAggrDataStream' is therefore not part of the Topology, and that part of the code never gets executed. And I can't move the KafkaStreams definition lower, because otherwise I can't call the Interactive Query.
I am a bit new to Kafka Streams, so is this something silly on my side?
If you want to send the whole content of the topic after each data modification, I think you should rather use the Processor API.
You could create an org.apache.kafka.streams.kstream.Transformer with a state store.
For each processed message it will update the state store and forward the whole content downstream.
It is not very efficient, because for each processed message it forwards the whole content of the topic/state store (which can be thousands or millions of records).
If you only need the latest value, it is enough to set your topic's cleanup.policy to compact, and on the other side use a KTable, which gives the abstraction of a table (a snapshot of the stream).
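As an aside, a minimal sketch of creating such a compacted topic with the Kafka AdminClient; the topic name, broker address, partition count and replication factor here are placeholder assumptions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest record per key, which is the
            // table-like view a KTable is built on.
            NewTopic topic = new NewTopic("transformed-topic", 1, (short) 1)
                .configs(Collections.singletonMap(
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}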
Sample Transformer code for forwarding the whole content of the state store is shown below. The whole work is done in the transform(String key, String value) method.
public class SampleTransformer
        implements Transformer<String, String, KeyValue<String, String>> {

    private String stateStoreName;
    private KeyValueStore<String, String> stateStore;
    private ProcessorContext context;

    public SampleTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        stateStore = (KeyValueStore) context.getStateStore(stateStoreName);
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        stateStore.put(key, value);
        stateStore.all().forEachRemaining(keyValue -> context.forward(keyValue.key, keyValue.value));
        return null;
    }

    @Override
    public void close() {
    }
}
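A minimal sketch of wiring this transformer into a topology via KStream#transform, which is the usual bridge between the DSL and the Processor API; the store and topic names here are placeholder assumptions:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class SampleTopology {
    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Register the state store the transformer asks for by name in init().
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("sample-store"),
                Serdes.String(), Serdes.String());
        builder.addStateStore(storeBuilder);

        KStream<String, String> input = builder.stream("input-topic");
        // Attach the transformer and connect it to the store; it forwards the whole
        // store content for every processed record, as described above.
        input.transform(() -> new SampleTransformer("sample-store"), "sample-store")
             .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}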
More information about the Processor API can be found here:
https://docs.confluent.io/current/streams/developer-guide/processor-api.html
https://kafka.apache.org/documentation/streams/developer-guide/processor-api.html
How to combine the Processor API with the Streams DSL is described here:
https://kafka.apache.org/documentation/streams/developer-guide/dsl-api.html#applying-processors-and-transformers-processor-api-integration

How to Iterate through list with RxJava and perform initial process on first item

I am new to RxJava and finding it very useful for network and database processing within my Android applications.
I have two use cases that I cannot implement completely in RxJava.
Use Case 1
Clear down my target database table Table A
Fetch a list of database records from Table B that contain a key field
For each row retrieved from Table B, call a Remote API and persist all the returned data into Table A
The closest I have managed is this code:
final AtomicInteger id = new AtomicInteger(0);

DatabaseController.deleteAll(TableA_DO.class);

DatabaseController.fetchTable_Bs()
    .subscribeOn(Schedulers.io())
    .toObservable()
    .flatMapIterable(b -> b)
    .flatMap(b_record -> NetworkController.getTable_A_data(b_record.getKey()))
    .flatMap(network -> transformNetwork(id, network, NETWORK_B_MAPPER))
    .doOnNext(DatabaseController::persistRealmObjects)
    .doOnComplete(onComplete)
    .doOnError(onError)
    .doAfterTerminate(doAfterTerminate())
    .doOnSubscribe(compositeDisposable::add)
    .subscribe();
Use Case 2
Clear down my target database table Table X
Clear down my target database table Table Y
Fetch a list of database records from Table Z that contain a key field
For each row retrieved from Table Z, call a Remote API and persist some of the returned data into Table X; the remainder of the data should be persisted into Table Y
I have not managed to create any code for use case 2.
I have a number of questions regarding the use of RxJava for these use cases.
Is it possible to achieve both my use cases in RxJava?
Is it "best practice" to combine all these steps into a single Rx "stream"?
UPDATE
I ended up with this POC test code which seems to work...
I am not sure if it's the optimum solution; however, my API calls return Single and my database operations return Completable, so I feel like this is the best solution for me.
public class UseCaseOneA {

    public static void main(final String[] args) {
        login()
            .andThen(UseCaseOneA.deleteDatabaseTableA())
            .andThen(UseCaseOneA.deleteDatabaseTableB())
            .andThen(manufactureRecords())
            .flatMapIterable(x -> x)
            .flatMapSingle(record -> NetworkController.callApi(record.getPrimaryKey()))
            .flatMapSingle(z -> transform(z))
            .flatMapCompletable(p -> UseCaseOneA.insertDatabaseTableA(p))
            .doOnComplete(() -> System.out.println("ON COMPLETE"))
            .doFinally(() -> System.out.println("ON FINALLY"))
            .subscribe();
    }

    private static Single<List<PayloadDO>> transform(final List<RemotePayload> payloads) {
        return Single.create(new SingleOnSubscribe<List<PayloadDO>>() {
            @Override
            public void subscribe(final SingleEmitter<List<PayloadDO>> emitter) throws Exception {
                System.out.println("transform - " + payloads.size());
                final List<PayloadDO> payloadDOs = new ArrayList<>();
                for (final RemotePayload remotePayload : payloads) {
                    payloadDOs.add(new PayloadDO(remotePayload.getPayload()));
                }
                emitter.onSuccess(payloadDOs);
            }
        });
    }

    private static Observable<List<Record>> manufactureRecords() {
        final List<Record> records = new ArrayList<>();
        records.add(new Record("111-111-111"));
        records.add(new Record("222-222-222"));
        records.add(new Record("3333-3333-3333"));
        records.add(new Record("44-444-44444-44-4"));
        records.add(new Record("5555-55-55-5-55-5555-5555"));
        return Observable.just(records);
    }

    private static Completable deleteDatabaseTableA() {
        return Completable.create(new CompletableOnSubscribe() {
            @Override
            public void subscribe(final CompletableEmitter emitter) throws Exception {
                System.out.println("deleteDatabaseTableA");
                emitter.onComplete();
            }
        });
    }

    private static Completable deleteDatabaseTableB() {
        return Completable.create(new CompletableOnSubscribe() {
            @Override
            public void subscribe(final CompletableEmitter emitter) throws Exception {
                System.out.println("deleteDatabaseTableB");
                emitter.onComplete();
            }
        });
    }

    private static Completable insertDatabaseTableA(final List<PayloadDO> payloadDOs) {
        return Completable.create(new CompletableOnSubscribe() {
            @Override
            public void subscribe(final CompletableEmitter emitter) throws Exception {
                System.out.println("insertDatabaseTableA - " + payloadDOs);
                emitter.onComplete();
            }
        });
    }

    private static Completable login() {
        return Completable.complete();
    }
}
This code doesn't address all my use case requirements, namely being able to transform the remote payload records into multiple database record types and insert each type into its own specific target database table.
I could just call the Remote API twice to get the same remote data items and transform them first into one database type and then into the second, however that seems wasteful.
Is there an operator in RxJava that lets me reuse the output from my API calls and transform it into another database type?
You have to index the items yourself in some manner, for example, via external counting:
Observable.defer(() -> {
    AtomicInteger counter = new AtomicInteger();
    return DatabaseController.fetchTable_Bs()
        .subscribeOn(Schedulers.io())
        .toObservable()
        .flatMapIterable(b -> b)
        .doOnNext(item -> {
            if (counter.getAndIncrement() == 0) {
                // this is the very first item
            } else {
                // these are the subsequent items
            }
        });
});
The defer is necessary to isolate the counter to the inner sequence so that repetition still works if necessary.
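As for reusing one API response for two target tables (the second question above), a minimal sketch in the same shape as the POC, assuming RxJava 2; the types and the transformToX/transformToY and insertTableX/insertTableY helpers are stand-ins made up for illustration. The point is that the payload list from a single network call feeds both inserts, combined with Completable.mergeArray:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import io.reactivex.Completable;
import io.reactivex.Observable;
import io.reactivex.Single;

public class UseCaseTwoSketch {

    public static void main(final String[] args) {
        Observable.just(Arrays.asList("111-111-111", "222-222-222"))
            .flatMapIterable(keys -> keys)
            .flatMapSingle(UseCaseTwoSketch::callApi)
            // Reuse the same payload list for both target tables instead of calling the API twice.
            .flatMapCompletable(payloads -> Completable.mergeArray(
                insertTableX(transformToX(payloads)),
                insertTableY(transformToY(payloads))))
            .subscribe(() -> System.out.println("ON COMPLETE"));
    }

    // Stand-in for the real network call; returns the remote payloads for one key.
    private static Single<List<String>> callApi(final String key) {
        return Single.just(Arrays.asList(key + "-payload-1", key + "-payload-2"));
    }

    private static List<String> transformToX(final List<String> payloads) {
        return payloads.stream().map(p -> "X:" + p).collect(Collectors.toList());
    }

    private static List<String> transformToY(final List<String> payloads) {
        return payloads.stream().map(p -> "Y:" + p).collect(Collectors.toList());
    }

    private static Completable insertTableX(final List<String> rows) {
        return Completable.fromAction(() -> System.out.println("insertTableX - " + rows));
    }

    private static Completable insertTableY(final List<String> rows) {
        return Completable.fromAction(() -> System.out.println("insertTableY - " + rows));
    }
}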

How to get document id (_id) inside an Elasticsearch native script

I've got a native script that runs as a transform when adding a new document. I need the document id but it's not passed in as part of the script params. Only the _source is passed in as the params, but not the _id value. Is there any way to get ElasticSearch to pass in the _id? Or some way of reading it ... somehow?
Below is a contrived ElasticSearch native script example that demonstrates what I'm talking about. setNextVar() is called by ElasticSearch and the "ctx" object is passed in. The value is a Map that only has one object in it, the _source object.
But the _id key/value pair is not passed in by ElasticSearch by default. I'm hoping there is a way to configure the native script in the mapping JSON that tells it to pass in the document id.
public class ExampleNativeScript extends AbstractExecutableScript {

    /**
     * Factory class that serves native script to ElasticSearch
     */
    public static class Factory extends AbstractComponent implements NativeScriptFactory {

        @Inject
        public Factory(Settings settings, String prefixSettings) {
            super(settings, prefixSettings);
        }

        @Override
        public ExecutableScript newScript(@Nullable Map<String, Object> params) {
            return new ExampleNativeScript(params);
        }
    }

    private final Map<String, Object> variables;
    private Map ctx = null;
    private Map source = null;

    public ExampleNativeScript(Map<String, Object> params) {
        this.variables = params;
    }

    @Override
    public void setNextVar(String name, Object value) {
        variables.put(name, value);
        if (name.equals("ctx")) {
            ctx = (Map<String, LinkedHashMap>) value;
            source = (Map<String, LinkedHashMap>) ctx.get("_source");
        } else {
            //never gets here
            System.out.println("Unexpected variable");
        }
    }

    @Override
    public Object run() {
        //PROBLEM: ctx only has _source. _id does not get passed in so this blows chunks
        String documentId = ctx.get("_id").toString();
        System.out.println(documentId);
        return ctx;
    }
}
So I just did some digging in the ElasticSearch source and found that it's hard-coded to pass in just the _source field, and none of the other ctx-level fields. So there is no way to get/configure ElasticSearch to pass in the _id.
Below is the DocumentMapper.transformSourceAsMap() function that calls the native script's setNextVar() and run() functions. It shows that the only thing put in the ctx map is the _source field.
public Map<String, Object> transformSourceAsMap(Map<String, Object> sourceAsMap) {
    try {
        // We use the ctx variable and the _source name to be consistent with the update api.
        ExecutableScript executable = scriptService.executable(language, script, scriptType, ScriptContext.Standard.MAPPING, parameters);
        Map<String, Object> ctx = new HashMap<>(1);
        ctx.put("_source", sourceAsMap);
        executable.setNextVar("ctx", ctx);
        executable.run();
        ctx = (Map<String, Object>) executable.unwrap(ctx);
        return (Map<String, Object>) ctx.get("_source");
    } catch (Exception e) {
        throw new ElasticsearchIllegalArgumentException("failed to execute script", e);
    }
}
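Given that only _source reaches the script, one workaround (an assumption on my part, not a feature of the transform API) is to have the indexing application copy the document id into the body as a regular field, so the transform can read it from _source. A minimal sketch of the run() method under that assumption:

@Override
public Object run() {
    // Works only if the client that indexes the document also stores the id in the
    // body, e.g. {"id": "...", ...}; the transform can then read it from _source.
    Object documentId = source.get("id");
    System.out.println(documentId);
    return ctx;
}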