Optimising performance with StoredProcedureItemReader in Spring Batch - spring-batch

For my project I have to use StoredProcedureItemReader, which I believe is very similar to JdbcCursorItemReader. I want to use pagination instead of reading the data row by row with StoredProcedureItemReader, to get better performance. Please suggest the best configuration to achieve maximum performance with StoredProcedureItemReader.
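For reference, StoredProcedureItemReader is cursor-based and, as far as I know, has no paging variant, so the usual tuning lever is the JDBC fetch size (together with the step's commit interval). Below is a minimal configuration sketch; the procedure name read_customers, the Customer type and CustomerRowMapper are placeholders, not part of the original question:

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.StoredProcedureItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReaderConfig {

    // Hypothetical Customer type and CustomerRowMapper; replace with your own domain class.
    @Bean
    public StoredProcedureItemReader<Customer> reader(DataSource dataSource) {
        StoredProcedureItemReader<Customer> reader = new StoredProcedureItemReader<>();
        reader.setDataSource(dataSource);
        reader.setProcedureName("read_customers");    // hypothetical procedure returning a result set
        reader.setRowMapper(new CustomerRowMapper()); // hypothetical RowMapper<Customer>
        reader.setFetchSize(1000);                    // JDBC fetch size: stream rows in larger network batches
        reader.setVerifyCursorPosition(false);        // skip per-row cursor checks if you trust the driver
        return reader;
    }
}
```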

Related

Performance of writing many records (for example 2,000,000) in Cassandra?

How can I achieve the best performance for writing a huge number of records (for example 2,000,000) in Cassandra?
I am using Scala, the DataStax driver and phantom in my project. How can I insert this many records into the database in a performant way?
2 million isn't much. I would just use the cqlsh COPY FROM command:
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html
The best performance can be achieved by:
using asynchronous operations;
using prepared queries;
using consistency level ONE (the default; don't change it);
using the DCAware/TokenAware load balancing policy (the default; don't change it);
increasing the number of requests per connection from the default of 1024 to a higher number, such as 32k.
But with asynchronous queries, the big problem is that you may push more requests than Cassandra can handle, and this can lead to a BusyPoolException. To prevent this you need some kind of counting semaphore that won't allow too many requests to be issued at once. Here is an example of such an implementation.
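A minimal sketch of that semaphore-guarded approach, assuming the DataStax Java driver 3.x; the contact point, keyspace/table (ks.users) and the cap of 512 in-flight requests are illustrative placeholders:

```java
import com.datastax.driver.core.*;
import com.google.common.util.concurrent.*;
import java.util.concurrent.Semaphore;

public class BulkInsert {
    public static void main(String[] args) throws Exception {
        final int maxInFlight = 512;  // tune to what your cluster can absorb
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPoolingOptions(new PoolingOptions()
                        .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768))
                .build();
        Session session = cluster.connect("ks");
        PreparedStatement ps = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)");

        // Counting semaphore caps the number of in-flight async requests so the
        // driver's connection pools are never exhausted (avoids BusyPoolException).
        final Semaphore inFlight = new Semaphore(maxInFlight);
        for (int i = 0; i < 2_000_000; i++) {
            inFlight.acquire();  // blocks once maxInFlight requests are pending
            ResultSetFuture f = session.executeAsync(ps.bind(i, "user-" + i));
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t)  { inFlight.release(); t.printStackTrace(); }
            }, MoreExecutors.directExecutor());
        }
        inFlight.acquire(maxInFlight);  // wait for the remaining requests to finish
        session.close();
        cluster.close();
    }
}
```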

How to transport and index Cassandra data in Elasticsearch?

I'm starting a Node.js application in which I want to index Cassandra data in Elasticsearch, but what would be the best way to do that? I took a look at Storm to accomplish just that, but it doesn't seem to be the solution. Initially, I was thinking of using one client for Cassandra and one client for Elasticsearch and applying inserts/updates/deletes twice in my application, once per client, but that doesn't appear to be the way to go, and I'm worried about the consistency of it. Is there a better way to transport Cassandra data to be indexed in Elasticsearch? Would Storm help me accomplish that? Could someone suggest any techniques for transporting data from one database to another? I'm really in doubt here, with nowhere to look.
Do you want to move the data from Cassandra to Elasticsearch once and only once? Or do you want to keep them in sync?
In both cases, I think Storm is a good fit. I used it in the past to move data from our RDBMS into Apache Solr. One thing to keep in mind is the limit on writes that Solr/Elasticsearch can handle: if you increase the parallelism too much, you will bring them to their knees.
Another option could be Apache Hadoop, but it is only suitable for one-time copying, or for copying the data (yesterday's data plus whatever is new) every day.

How to execute a cascading JPA/TopLink batch update

Background
I have a problem with a JPA cascading batch update that I need to implement.
The update will take some 10,000 objects and merge them into the database at once.
The objects have an average depth of 5 objects and an average size of about 3 kB.
The persistence provider is Oracle TopLink.
This eats a large amount of memory and takes several minutes to complete.
I have looked around and I see 3 possibilities:
Looping through a standard JPA merge statement and flushing at certain intervals (see the sketch after these questions)
Using JPQL
Using TopLink's own API (which I have no experience with whatsoever)
So I have a couple of questions:
Will I reduce the overhead of the standard merge by using JPQL instead? If I understand correctly, merge causes the entire object tree to be cloned when invoked. Is it actually faster? Is there some trick to speeding up the process?
How do I do a batch merge using the TopLink API?
And I know that this is subjective, but: does anyone have a best practice for doing large cascading batch updates in JPA/TopLink? Maybe something I didn't consider?
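For context, a minimal sketch of option 1 (looping merge with periodic flush/clear) under my own assumptions; the Invoice entity, the detachedInvoices list and the chunk size of 500 are placeholders:

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;

public class ChunkedMerge {

    // "Invoice" is a hypothetical detached entity; 500 is an arbitrary starting chunk size.
    public void mergeInChunks(EntityManagerFactory emf, List<Invoice> detachedInvoices) {
        EntityManager em = emf.createEntityManager();
        EntityTransaction tx = em.getTransaction();
        tx.begin();
        int count = 0;
        for (Invoice detached : detachedInvoices) {
            em.merge(detached);          // copies the detached graph into the persistence context
            if (++count % 500 == 0) {
                em.flush();              // push the pending SQL to the database
                em.clear();              // evict managed copies so memory stays bounded
            }
        }
        tx.commit();
        em.close();
    }
}
```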
Related questions
Batch updates in JPA (Toplink)
Batch insert using JPA/Toplink
I'm not sure what you mean by using JPQL. If you can express your update logic in terms of a JPQL update statement, it will be significantly more efficient to do so.
Definitely split your work into batches. Also ensure you are using batch writing and sequence pre-allocation.
See:
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
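To illustrate the JPQL point, here is a minimal sketch of a set-based bulk update; the Invoice entity and its status/batchId fields are hypothetical, not taken from the question:

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class BulkStatusUpdate {

    // "Invoice" with "status" and "batchId" fields is a hypothetical entity.
    public int markProcessed(EntityManagerFactory emf, long batchId) {
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        // A single set-based UPDATE: no entities are loaded or cloned into memory.
        int updated = em.createQuery(
                "UPDATE Invoice i SET i.status = :status WHERE i.batchId = :batchId")
                .setParameter("status", "PROCESSED")
                .setParameter("batchId", batchId)
                .executeUpdate();
        em.getTransaction().commit();
        em.close();
        return updated;
    }
}
```

Batch writing and sequence pre-allocation are typically switched on through persistence-unit properties (for example eclipselink.jdbc.batch-writing=JDBC, or toplink.jdbc.batch-writing in older TopLink Essentials) and a larger allocationSize on the @SequenceGenerator; the exact property names depend on your TopLink/EclipseLink version.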

Using xmlpipe2 with Sphinx

I'm attempting to load large amounts of data directly into Sphinx from Mongo, and currently the best method I've found has been using xmlpipe2.
I'm wondering, however, if there are ways to just apply updates to the dataset, as a full reindex of hundreds of thousands of records can take a while and be a bit intensive on the system.
Is there a better way to do this?
Thank you!
Use the main plus delta scheme, where all the updates go to a separate, smaller index, as described here:
http://sphinxsearch.com/docs/current.html#delta-updates

Which NoSQL solution fits my application: HBase, Hypertable or Cassandra?

I have an application with 100 million records of data, and growing. I want to scale out before it hits the wall.
I have been reading about NoSQL technologies that can handle Big Data efficiently.
My needs:
There are more reads than writes, but the number of writes is also significantly large (read:write = 4:3).
Can you please explain the differences among HBase, Hypertable and Cassandra? Which one fits my requirements?
Both HBase and Hypertable require Hadoop. If you're not using Hadoop anyway (e.g. to solve map/reduce-related problems), I'd go with Cassandra, as it is stand-alone.
If you already have data, Hive is the best solution for your application. Or, if you are developing the app from scratch, look at the links below, which give an overview of the NoSQL world:
http://nosql-database.org/
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape/