Large number of values in an IN query with ItemReader - spring-batch

The following ItemReader fetches a list of thousands of accounts (acc).
The database that the ItemReader connects to in order to retrieve the data is Hive. I don't have permission to create any tables; I only have read access.
@Bean
@StepScope
public ItemReader<OmsDto> omsItemReader(@Value("#{stepExecutionContext[acc]}") List<String> accountList) {
    String inParams = String.join(",", accountList.stream().map(id -> "'" + id + "'").collect(Collectors.toList()));
    String query = String.format("SELECT ..... account IN (%s)", inParams);

    BeanPropertyRowMapper<OmsDto> rowMapper = new BeanPropertyRowMapper<>(OmsDto.class);
    rowMapper.setPrimitivesDefaultedForNullValue(true);

    JdbcCursorItemReader<OmsDto> reader = new JdbcCursorItemReader<OmsDto>();
    reader.setVerifyCursorPosition(false);
    reader.setDataSource(hiveDataSource());
    reader.setRowMapper(rowMapper);
    reader.setSql(query);
    reader.open(new ExecutionContext());
    return reader;
}
This is the error message that I get when using ItemReader:
Caused by: org.springframework.batch.item.ItemStreamException: Failed to initialize the reader
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.open(AbstractItemCountingItemStreamItemReader.java:153) ~[spring-batch-infrastructure-4.2.4.RELEASE.jar:4.2.4.RELEASE]
Caused by: java.sql.SQLException: Error executing query
at com.facebook.presto.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:279) ~[presto-jdbc-0.243.2.jar:0.243.2-128118e]
at com.facebook.presto.jdbc.PrestoStatement.execute(PrestoStatement.java:228) ~[presto-jdbc-0.243.2.jar:0.243.2-128118e]
at com.facebook.presto.jdbc.PrestoPreparedStatement.<init>(PrestoPreparedStatement.java:84) ~[presto-jdbc-0.243.2.jar:0.243.2-128118e]
at com.facebook.presto.jdbc.PrestoConnection.prepareStatement(PrestoConnection.java:130) ~[presto-jdbc-0.243.2.jar:0.243.2-128118e]
at com.facebook.presto.jdbc.PrestoConnection.prepareStatement(PrestoConnection.java:300) ~[presto-jdbc-0.243.2.jar:0.243.2-128118e]
at org.springframework.batch.item.database.JdbcCursorItemReader.openCursor(JdbcCursorItemReader.java:121) ~[spring-batch-infrastructure-4.2.4.RELEASE.jar:4.2.4.RELEASE]
... 63 common frames omitted
Caused by: java.lang.RuntimeException: Error fetching next at https://prestoanalytics-ch2-p.sys.comcast.net:6443/v1/statement/executing/20201118_131314_11079_v3w47/yf55745951e0beccc234c98f36005723457073854/0 returned an invalid response: JsonResponse{statusCode=502, statusMessage=Bad Gateway, headers={cache-control=[no-cache], content-length=[107], content-type=[text/html]}, hasValue=false} [Error: <html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
]
I was sure that the root cause was the driver, but I tested the driver with the same SQL, this time using DriverManager, and it ran perfectly.
@Component
public class OmsItemReader implements ItemReader<OmsDto>, StepExecutionListener {

    private ItemReader<OmsDto> delegate;

    public OmsItemReader() {
        Properties properties = new Properties();
        properties.setProperty("user", "....");
        properties.setProperty("password", "...");
        properties.setProperty("SSL", "true");
        Connection connection = null;
        try {
            connection = DriverManager.getConnection("jdbc:presto://.....", properties);
            Statement statement = connection.createStatement();
            ResultSet resultSet = statement.executeQuery(
I am not sure what the difference is. Is it the driver or Spring Batch?
I am looking for a workaround. How can I retrieve thousands of accounts via an IN clause with Spring Batch?
Thank you
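One possible workaround, sketched here and not taken from the original thread, is to keep each IN clause small by partitioning the step: each partition's stepExecutionContext carries only a slice of the accounts, so the step-scoped reader above builds a short IN (...) list per partition. A minimal Partitioner sketch, reusing the "acc" key from the question; the class name and the slice size of 500 are illustrative assumptions:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical partitioner: slices the full account list so that each partition's
// stepExecutionContext["acc"] holds at most SLICE_SIZE ids.
public class AccountPartitioner implements Partitioner {

    private static final int SLICE_SIZE = 500; // illustrative value

    private final List<String> allAccounts;

    public AccountPartitioner(List<String> allAccounts) {
        this.allAccounts = allAccounts;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int index = 0;
        for (int from = 0; from < allAccounts.size(); from += SLICE_SIZE) {
            int to = Math.min(from + SLICE_SIZE, allAccounts.size());
            ExecutionContext context = new ExecutionContext();
            // picked up by @Value("#{stepExecutionContext[acc]}") in the reader above
            context.put("acc", new ArrayList<>(allAccounts.subList(from, to)));
            partitions.put("partition" + index++, context);
        }
        return partitions;
    }
}

Each partition then runs its own copy of the step-scoped omsItemReader with a few hundred ids at most; alternatively a delegating reader could loop over sub-lists itself, but partitioning keeps the change in the job configuration only.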

Related

SCDF server dual database connection error

In my Spring Batch task I have two datasources configured: one for Oracle and another for H2.
H2 is used for the batch and task execution tables, and Oracle holds the real data for batch processing. I'm able to run the task successfully from the IDE, but when I run it from the SCDF server I get the following error:
Caused by: java.sql.SQLException: Unable to start the Universal Connection Pool: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: org.h2.jdbc.JdbcSQLInvalidAuthorizationSpecException: Wrong user name or password [28000-200]
The question is: why is it connecting to H2 for the Oracle DB connection as well?
Following is my DB configuration:
@Bean(name = "OracleUniversalConnectionPool")
public DataSource secondaryDataSource() {
    PoolDataSource pds = null;
    try {
        pds = PoolDataSourceFactory.getPoolDataSource();
        pds.setConnectionFactoryClassName(driverClassName);
        pds.setURL(url);
        pds.setUser(username);
        pds.setPassword(password);
        pds.setMinPoolSize(Integer.valueOf(minPoolSize));
        pds.setInitialPoolSize(10);
        pds.setMaxPoolSize(Integer.valueOf(maxPoolSize));
    } catch (SQLException ea) {
        log.error("Error connecting to the database: ", ea.getMessage());
    }
    return pds;
}

@Bean(name = "batchDataSource")
@Primary
public DataSource dataSource() throws SQLException {
    final SimpleDriverDataSource dataSource = new SimpleDriverDataSource();
    dataSource.setDriver(new org.h2.Driver());
    dataSource.setUrl("jdbc:h2:tcp://localhost:19092/mem:dataflow");
    dataSource.setUsername("sa");
    dataSource.setPassword("");
    return dataSource;
}
I got it resolved. The problem was that I was using these properties to configure the Oracle DB:
spring.datasource.url: ******
spring.datasource.username: ******
That worked fine from the IDE, but when I ran it on the SCDF server those values were overwritten by the default properties the SCDF server uses for its own database connection. So I updated the connection properties to:
spring.datasource.oracle.url: ******
spring.datasource.oracle.username: ******
And now it's working as expected.
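For completeness, a hedged sketch of how those renamed keys could be injected into the Oracle pool bean shown above; the property keys come from the answer, while the @Value wiring and field names are assumptions about how the bean reads them:

// Sketch only: "spring.datasource.oracle.*" is a custom prefix, so Spring Boot will not
// bind it automatically; the Oracle pool bean reads the keys explicitly instead.
@Value("${spring.datasource.oracle.url}")
private String url;

@Value("${spring.datasource.oracle.username}")
private String username;

@Value("${spring.datasource.oracle.password}")
private String password;

Because these keys no longer live under spring.datasource.*, the SCDF server's own datasource defaults no longer override them.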

Error while connecting to Druid using SQL interface

I am trying to connect to a Druid database using the Avatica jar.
Following is the code:
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica";
Properties connectionProperties = new Properties();
try (Connection connection = DriverManager.getConnection(url, connectionProperties))
{
    try (final Statement statement = connection.createStatement();
         final ResultSet resultSet = statement.executeQuery("SELECT COUNT(*) as rowcount FROM wikiticker"))
    {
        while (resultSet.next())
        {
            int count = resultSet.getInt("rowcount");
            System.out.println("Total records:" + count);
        }
        resultSet.close();
    }
}
catch (SQLException e)
{
    e.printStackTrace();
}
I get the following exception. Can someone please let me know what's going wrong? I have set the runtime property to enable SQL.
Exception in thread "main" java.lang.RuntimeException: Failed to execute HTTP Request, got HTTP/404
at org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientImpl.send(AvaticaCommonsHttpClientImpl.java:138)
at org.apache.calcite.avatica.remote.RemoteService.apply(RemoteService.java:34)
at org.apache.calcite.avatica.remote.JsonService.apply(JsonService.java:172)
at org.apache.calcite.avatica.remote.Driver.connect(Driver.java:175)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.test.druid.sql.Main.main(Main.java:17)
Looks like your broker instance is missing the -Ddruid.sql.enabled=true flag on startup. You can refer to http://druid.io/docs/latest/querying/sql.html for further details.

SqlBulkCopy with ObjectReader - Failed to convert parameter value from a String to a Int32

I am using SqlBulkCopy (.NET) with ObjectReader (FastMember) to perform an import from an XML-based file. I have added the proper column mappings.
In certain instances I get an error: Failed to convert parameter value from a String to a Int32.
I'd like to understand how to:
1. Trace the actual table column which has failed
2. Get the "current" record on the ObjectReader
Sample code:
using (ObjectReader reader = genericReader.GetReader())
{
    try
    {
        sbc.WriteToServer(reader); // sbc is SqlBulkCopy instance
        transaction.Commit();
    }
    catch (Exception ex)
    {
        transaction.Rollback();
    }
}
Does the "ex" carry more information than just the error:
System.InvalidOperationException : The given value of type String from the data source cannot be converted to type int of the specified target column.
Simple Answer
The simple answer is no. One of the reasons .NET's SqlBulkCopy is so fast is that it does not log anything it does. You can't directly get any additional information from the .NET's SqlBulkCopy exception. However, that said David Catriel has wrote an article about this and has delivered a possible solution you can read fully about here.
Even though this method may provide the answer you are looking for I suggest only using the helper method when debugging as this quite possibly could have some performance impact if ran consistently within your code.
Why Use A Work Around
The lack of logging definitely speeds things up, but when you are
pumping hundreds of thousands of rows and suddenly have a failure on
one of them because of a constraint, you're stuck. All the
SqlException will tell you is that something went wrong with a given
constraint (you'll get the constraint's name at least), but that's
about it. You're then stuck having to go back to your source, run
separate SELECT statements on it (or do manual searches), and find the
culprit rows on your own.
On top of that, it can be a very long and iterative process if you've
got data with several potential failures in it because SqlBulkCopy
will stop as soon as the first failure is hit. Once you correct that
one, you need to rerun the load to find the second error, etc.
advantages:
Reports all possible errors that the SqlBulkCopy would encounter
Reports all culprit data rows, along with the exception that row would be causing
The entire thing is run in a transaction that is rolled back at the end, so no changes are committed.
disadvantages:
For extremely large amounts of data it might take a couple of minutes.
This solution is reactive; i.e. the errors are not returned as part of the exception raised by your SqlBulkCopy.WriteToServer() process. Instead, this helper method is executed after the exception is raised to try and capture all possible errors along with their related data. This means that in case of an exception, your process will take longer to run than just running the bulk copy.
You cannot reuse the same DataReader object from the failed SqlBulkCopy, as readers are forward only fire hoses that cannot be reset. You'll need to create a new reader of the same type (e.g. re-issue the original SqlCommand, recreate the reader based on the same DataTable, etc).
Using the GetBulkCopyFailedData Method
private void TestMethod()
{
    // new code
    SqlConnection connection = null;
    SqlBulkCopy bulkCopy = null;
    DataTable dataTable = new DataTable();

    // load some sample data into the DataTable
    IDataReader reader = dataTable.CreateDataReader();

    try
    {
        connection = new SqlConnection("connection string goes here ...");
        connection.Open();
        bulkCopy = new SqlBulkCopy(connection);
        bulkCopy.DestinationTableName = "Destination table name";
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception exception)
    {
        // loop through all inner exceptions to see if any relate to a constraint failure
        bool dataExceptionFound = false;
        Exception tmpException = exception;
        while (tmpException != null)
        {
            if (tmpException is SqlException
                && tmpException.Message.Contains("constraint"))
            {
                dataExceptionFound = true;
                break;
            }
            tmpException = tmpException.InnerException;
        }
        if (dataExceptionFound)
        {
            // call the helper method to document the errors and invalid data
            string errorMessage = GetBulkCopyFailedData(
                connection.ConnectionString,
                bulkCopy.DestinationTableName,
                dataTable.CreateDataReader());
            throw new Exception(errorMessage, exception);
        }
    }
    finally
    {
        if (connection != null && connection.State == ConnectionState.Open)
        {
            connection.Close();
        }
    }
}
GetBulkCopyFailedData() then opens a new connection to the database,
creates a transaction, and begins bulk copying the data one row at a
time. It does so by reading through the supplied DataReader and
copying each row into an empty DataTable. The DataTable is then bulk
copied into the destination database, and any exceptions resulting
from this are caught, documented (along with the DataRow that caused
it), and the cycle then repeats itself with the next row. At the end
of the DataReader we rollback the transaction and return the complete
error message. Fixing the problems in the data source should now be a
breeze.
The GetBulkCopyFailedData Method
/// <summary>
/// Build an error message with the failed records and their related exceptions.
/// </summary>
/// <param name="connectionString">Connection string to the destination database</param>
/// <param name="tableName">Table name into which the data will be bulk copied.</param>
/// <param name="dataReader">DataReader to bulk copy</param>
/// <returns>Error message with failed constraints and invalid data rows.</returns>
public static string GetBulkCopyFailedData(
    string connectionString,
    string tableName,
    IDataReader dataReader)
{
    StringBuilder errorMessage = new StringBuilder("Bulk copy failures:" + Environment.NewLine);
    SqlConnection connection = null;
    SqlTransaction transaction = null;
    SqlBulkCopy bulkCopy = null;
    DataTable tmpDataTable = new DataTable();

    try
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
        transaction = connection.BeginTransaction();
        bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.CheckConstraints, transaction);
        bulkCopy.DestinationTableName = tableName;

        // create a datatable with the layout of the data.
        DataTable dataSchema = dataReader.GetSchemaTable();
        foreach (DataRow row in dataSchema.Rows)
        {
            tmpDataTable.Columns.Add(new DataColumn(
                row["ColumnName"].ToString(),
                (Type)row["DataType"]));
        }

        // create an object array to hold the data being transferred into tmpDataTable
        // in the loop below.
        object[] values = new object[dataReader.FieldCount];

        // loop through the source data
        while (dataReader.Read())
        {
            // clear the temp DataTable from which the single-record bulk copy will be done
            tmpDataTable.Rows.Clear();

            // get the data for the current source row
            dataReader.GetValues(values);

            // load the values into the temp DataTable
            tmpDataTable.LoadDataRow(values, true);

            // perform the bulk copy of the one row
            try
            {
                bulkCopy.WriteToServer(tmpDataTable);
            }
            catch (Exception ex)
            {
                // an exception was raised with the bulk copy of the current row.
                // The row that caused the current exception is the only one in the temp
                // DataTable, so document it and add it to the error message.
                DataRow faultyDataRow = tmpDataTable.Rows[0];
                errorMessage.AppendFormat("Error: {0}{1}", ex.Message, Environment.NewLine);
                errorMessage.AppendFormat("Row data: {0}", Environment.NewLine);
                foreach (DataColumn column in tmpDataTable.Columns)
                {
                    errorMessage.AppendFormat(
                        "\tColumn {0} - [{1}]{2}",
                        column.ColumnName,
                        faultyDataRow[column.ColumnName].ToString(),
                        Environment.NewLine);
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(
            "Unable to document SqlBulkCopy errors. See inner exceptions for details.",
            ex);
    }
    finally
    {
        if (transaction != null)
        {
            transaction.Rollback();
        }
        if (connection.State != ConnectionState.Closed)
        {
            connection.Close();
        }
    }
    return errorMessage.ToString();
}

Why do I get a socket timeout the first time I hit the database after a recompile?

I am using Play Framework 2.0, and after every recompile I get a socket timeout the first time my app tries to go to the database. I am using the Mongo driver directly. Here is a typical stack trace:
play.core.ActionInvoker$$anonfun$receive$1$$anon$1: Execution exception [[Network: can't call something : ds031907.mongolab.com/107.21.99.26:31907/heroku_app4620908]]
at play.core.ActionInvoker$$anonfun$receive$1.apply(Invoker.scala:82) [play_2.9.1.jar:2.0]
at play.core.ActionInvoker$$anonfun$receive$1.apply(Invoker.scala:63) [play_2.9.1.jar:2.0]
at akka.actor.Actor$class.apply(Actor.scala:290) [akka-actor.jar:2.0]
at play.core.ActionInvoker.apply(Invoker.scala:61) [play_2.9.1.jar:2.0]
at akka.actor.ActorCell.invoke(ActorCell.scala:617) [akka-actor.jar:2.0]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:179) [akka-actor.jar:2.0]
Caused by: com.mongodb.MongoException$Network: can't call something : ds031907.mongolab.com/107.21.99.26:31907/heroku_app4620908
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:227) ~[mongo-java-driver-2.7.3.jar:na]
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:305) ~[mongo-java-driver-2.7.3.jar:na]
at com.mongodb.DBCollection.findOne(DBCollection.java:647) ~[mongo-java-driver-2.7.3.jar:na]
at com.mongodb.DBCollection.findOne(DBCollection.java:626) ~[mongo-java-driver-2.7.3.jar:na]
at models.daos.ModuleDAO.findPublishedModuleById(ModuleDAO.java:445) ~[classes/:na]
at controllers.LearnController.viewModule(LearnController.java:31) ~[classes/:2.0]
Caused by: java.net.SocketException: Operation timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.6.0_31]
at java.net.SocketInputStream.read(SocketInputStream.java:129) ~[na:1.6.0_31]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) ~[na:1.6.0_31]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) ~[na:1.6.0_31]
at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_31]
at org.bson.io.Bits.readFully(Bits.java:35) ~[mongo-java-driver-2.7.3.jar:na]
And here is my initialization code:
public static DB getDB(){
    ensureMongo();
    DB db = mongo.getDB(MOJULO_DB);
    if(!db.isAuthenticated()){
        db.authenticate(MONGO_USERNAME, MONGO_PASSWORD);
        if(db.isAuthenticated())
            System.out.println("authentication success on db:" + db.getName());
        else
            System.out.println("db authentication failure");
    }
    return db;
}

private static synchronized void ensureMongo(){
    if(mongo == null){
        try{
            MongoURI mongoURI = new MongoURI(MONGO_URI);
            mongo = new Mongo(mongoURI);
            DB db = mongo.getDB(MOJULO_DB);
            db.command("ping");
        }catch(UnknownHostException ex){
            mongo = null;
            System.out.println("failed to connect to mongo");
            ex.printStackTrace();
        }
    }
}

public static void disconnect(){
    System.out.println("disconnecting from mongo");
    if(mongo != null){
        mongo.close();
        mongo = null;
    }
}
I use the getDB method from outside the class to get the DB. The method is meant to create the mongo singleton if it does not exist. I always get the authentication success println, but then on the first hit to the database I get the socket timeout exception.
In my Global class, I close the connection to the database when the application is closed.
@Override
public void onStop(Application app) {
    System.out.println("stop");
    Logger.info("Application shutdown...");
    DBManager.disconnect();
}
Any Ideas?
I am not an expert on MongoDB, but I can see the similarity with other DB connectivity.
How is your connection configured?
It looks (to me) like it may be attempting to load all mappings/DB and table definitions/everything else when you use the DB connection (the find method) for the first time.
It may be better to run a simple DB query in your ensureMongo() method to allow the system to re-initialise everything it needs (you may have to set a longer timeout for this method).
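If it helps, here is a hedged sketch of that suggestion applied to the ensureMongo() method from the question. The extra findOne() warm-up, the "modules" collection name, and the timeout values are assumptions; the options used are the public fields on the 2.x Java driver's MongoOptions.

// Sketch only: warm up the connection and raise the driver timeouts at startup,
// so the lazy socket setup does not hit the first user request after a recompile.
private static synchronized void ensureMongo(){
    if(mongo == null){
        try{
            mongo = new Mongo(new MongoURI(MONGO_URI));
            // public fields on the 2.x driver's MongoOptions, in milliseconds (illustrative values)
            mongo.getMongoOptions().connectTimeout = 30000;
            mongo.getMongoOptions().socketTimeout = 60000;
            DB db = mongo.getDB(MOJULO_DB);
            db.command("ping");                      // cheap server round-trip
            db.getCollection("modules").findOne();   // force one real query path at startup
        }catch(UnknownHostException ex){
            mongo = null;
            ex.printStackTrace();
        }
    }
}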

Result Set is closed error when referencing table via JPA (Websphere and DB2)

I'm using the IBM implementation of OpenJPA on WebSphere 7 and I'm having an issue when I try to reference an object mapped with @ManyToOne; I keep getting the following error from DB2:
com.ibm.db2.jcc.b.SqlException: [jcc][t4][10120][10898][3.50.152] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
I'm pulling my hair out as to why this doesn't work and hope that somebody can help.
Here is a simplified view of the database schema:
Table Report
record_id - integer - (primary key - generated by DB2)
agency - integer not null (foreign key to Dropdown table)
Table Dropdown
record_id - integer - (primary key - generated by DB2)
Here is the JPA entity for the Report which references the agency
@ManyToOne(fetch=FetchType.EAGER)
@JoinColumn(name="AGENCY")
private Dropdown agency;
Here is the code where I'm running a named query to get the data and then iterating over the result set to print out the report id and the agency. Whenever report.getAgency() is called, I get the "result set is closed" error from DB2:
@SuppressWarnings("unchecked")
public List<Report> getOpenIncidentsForUser(String aceId) throws Exception
{
    List<Report> results = null;
    EntityManager em = getEntityManager();
    try
    {
        Query query = em.createNamedQuery("getOpenIncidentsForUser");
        query.setParameter(1, aceId);
        results = (List<Report>) query.getResultList();
        Iterator<Report> it = results.iterator();
        while(it.hasNext())
        {
            Report report = it.next();
            System.out.println("Report [" + report.getRecordId() + "] Agency: [" + report.getAgency() + "]");
        }
    }
    catch (Exception e)
    {
        log.fatal("Fatal error getting incidents for user", e);
        throw e;
    }
    finally
    {
        em.close();
    }
    return (List<Report>) results;
}
If I never refer to the getAgency method, I can print out anything else about the report with no problems. It only seems to happen with the reference to the second table. Any ideas?
I had answered this in responses to my original comment, but realized that I never marked the question as answered, so I wanted to do that officially.
The fix is documented here: https://www.ibm.com/support/knowledgecenter/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/tejb_jpatroubleshoot.html
The fix ended up being that the resultSetHoldability setting needed to be 1 instead of 2.
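For reference, a small sketch (not from the original answers) of what those holdability values mean in plain JDBC terms, since that is what the DB2 custom property controls:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustration only: the data source's resultSetHoldability custom property maps to
// the standard JDBC holdability constants.
public class HoldabilityExample {
    static void showHoldability(Connection connection) throws SQLException {
        // 1 == ResultSet.HOLD_CURSORS_OVER_COMMIT: the cursor survives a commit
        connection.setHoldability(ResultSet.HOLD_CURSORS_OVER_COMMIT);
        // 2 == ResultSet.CLOSE_CURSORS_AT_COMMIT: the cursor closes at commit,
        // which is exactly the "result set is closed" symptom above
        // connection.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
    }
}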
For XA data sources you have to set downgradeHoldCursorsUnderXa to true, otherwise you could get a persistence exception with this message:
An SQL OPEN for a held cursor was issued on a XA connection
Setting DB2 resultSetHoldability=1 will only work if you are using a non-XA datasource. If you need to keep 2PC, then this is not a solution.
I had this exact problem and finally solved it by hard-coding a transaction around the offending code. This is what I have:
public class RequeueRuleList_back {

    /*
     * Injected resources ...
     */
    @Resource UserTransaction txn;
    @PersistenceUnit EntityManagerFactory emf;
    :

    public List<RequeueRuleBean> getRequeueRules() {
        /*
         * We need a hard transaction around this code even though it is just a query,
         * otherwise we cannot use a DB2 XA datasource to do this:
         *
         * com.ibm.db2.jcc.am.SqlException: [jcc][t4][10120][10898][3.63.75] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
         */
        try {
            txn.begin();
        } catch (Exception e) {
            FacesContext.getCurrentInstance().addMessage(null,
                new FacesMessage("Error starting transaction: " + e.getMessage()));
            return null;
        }

        EntityManager em = emf.createEntityManager();
        :
        Query q = em.createQuery("SELECT rr FROM RequeueRule rr");
        // Do useful things ...
        em.close();

        try {
            txn.commit();
        } catch (Exception e) {
            FacesContext.getCurrentInstance().addMessage(null,
                new FacesMessage("Error committing transaction: " + e.getMessage()));
        }
        :
    }
}
If you connect to DB2 with plain JDBC (no Hibernate etc.), you can also get this error. With the newer JDBC driver that ships with DB2 9.7, several usages are no longer supported, even though they ran without error with the old driver version.
These include:
1: PreparedStatement
Old version:
pt.executeUpdate(sql);
New version:
pt.executeUpdate();
2: Connection iteration (the only difference below is whether conn or null is passed to getRightsbyRole inside the loop)
Old version:
try{
    conn = ConnectionFactory.getConnection(ApplicationConstants.LOCAL_DATASOURCE_JNDI_NAME);
    sql = "select role_id,role_sname,role_sdesc from db2admin.mng_roles " + sql_condition + " order by role_id asc";
    pt = conn.prepareStatement(sql.toString());
    System.out.println("sql =" + sql);
    rs = pt.executeQuery();
    while(rs.next()){
        i++;
        role_id = rs.getInt(1);
        role_sname = PubFunction.DoNull(rs.getString(2)).trim();
        role_sdesc = PubFunction.DoNull(rs.getString(3)).trim();
        role_right = PubFunction.DoNull(newright.getRightsbyRole(conn, role_id)).trim();
    }
New version:
try{
    conn = ConnectionFactory.getConnection(ApplicationConstants.LOCAL_DATASOURCE_JNDI_NAME);
    sql = "select role_id,role_sname,role_sdesc from db2admin.mng_roles " + sql_condition + " order by role_id asc";
    pt = conn.prepareStatement(sql.toString());
    System.out.println("sql =" + sql);
    rs = pt.executeQuery();
    while(rs.next()){
        i++;
        role_id = rs.getInt(1);
        role_sname = PubFunction.DoNull(rs.getString(2)).trim();
        role_sdesc = PubFunction.DoNull(rs.getString(3)).trim();
        role_right = PubFunction.DoNull(newright.getRightsbyRole(null, role_id)).trim();
    }