Can Spring Batch work with Amazon Redshift? - spring-batch

I'm trying to use Spring Batch (4.0.1.RELEASE) working with Amazon Redshift. I got through the first major problem with Redshift's lack of support for sequences here.
However, now I've run into this when I try to run a job:
10:57:07.122 ERROR [http-nio-8080-exec-4 ] [JobLaunchingService] [] Could not start job [demoJob]
org.springframework.dao.InvalidDataAccessApiUsageException: PreparedStatementCallback; SQL [INSERT INTO BATCH_JOB_EXECUTION_CONTEXT (SHORT_CONTEXT, SERIALIZED_CONTEXT, JOB_EXECUTION_ID) VALUES(?, ?, ?)]; [Amazon][JDBC](10220) Driver does not support this optional feature.; nested exception is java.sql.SQLFeatureNotSupportedException: [Amazon][JDBC](10220) Driver does not support this optional feature.
This is with the Redshift 1.2.16.1027 JDBC Driver.
Is it even possible to use Redshift as the batch database? Any suggestions on how to get around this?

I'm not sure about your use case, or whether you are constrained to using Spring Batch only. But if the JDBC driver says it doesn't support the feature, then I don't believe there is any way around it.
As a recommended approach and best practice, in Redshift the COPY command should be used instead of INSERT statements. Calling the COPY command through plain JDBC works well.
You could take a look at an answer I provided earlier.
I'm copy/pasting it here to keep it handy.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class RedShiftJDBC {
    public static void main(String[] args) {
        Connection conn = null;
        Statement statement = null;
        try {
            // Make sure to choose the appropriate Redshift JDBC driver and have its jar on the classpath
            Class.forName("com.amazon.redshift.jdbc42.Driver");
            Properties props = new Properties();
            props.setProperty("user", "username***");
            props.setProperty("password", "password****");
            System.out.println("\n\nconnecting to database...\n\n");
            // Adjust the driver class and URL prefix in case you are using the PostgreSQL JDBC driver.
            conn = DriverManager.getConnection("jdbc:redshift://********url-to-redshift.redshift.amazonaws.com:5439/example-database", props);
            System.out.println("\n\nConnection made!\n\n");
            statement = conn.createStatement();
            String command = "COPY my_table from 's3://path/to/csv/example.csv' CREDENTIALS 'aws_access_key_id=******;aws_secret_access_key=********' CSV DELIMITER ',' ignoreheader 1";
            System.out.println("\n\nExecuting...\n\n");
            statement.executeUpdate(command);
            // You must commit if you really want the data to be copied.
            conn.commit();
            System.out.println("\n\nThat's all there is to COPY using simple JDBC.\n\n");
            statement.close();
            conn.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
I hope this gives you some ideas. If you have a specific question, add a comment and I should be able to refocus the answer.

In order to make this work, I had to define a separate MySQL database for the Spring Batch "control" tables. That was the default (@Primary) database in the Batch application. The ItemWriters are fed a different DataSource, the one pointed at Redshift.
So now I've got a DataSource for the Batch tables, one for my source db, and one for the target db. That seems to work, but I'm only using the standard DataSourceTransactionManager, so it's not at all clear to me what the transactional boundaries are, or whether the databases are rolled back the same way if a step fails. But I am NOT going to use XA!!
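For anyone wiring this up, a minimal sketch of that two-DataSource configuration might look like the following. This assumes Spring Boot 2.x style configuration; the property prefixes and bean names are illustrative, not taken from the original application.

import javax.sql.DataSource;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class DataSourceConfig {

    // Spring Batch metadata (the BATCH_* tables) lives in MySQL; marking this bean
    // @Primary makes the auto-configured JobRepository pick it up by default.
    @Bean
    @Primary
    @ConfigurationProperties("app.datasource.batch")
    public DataSource batchDataSource() {
        return DataSourceBuilder.create().build();
    }

    // Business data goes to Redshift; inject this bean explicitly (e.g. with
    // @Qualifier("redshiftDataSource")) into the ItemWriter so it never touches
    // the Batch control tables.
    @Bean
    @ConfigurationProperties("app.datasource.redshift")
    public DataSource redshiftDataSource() {
        return DataSourceBuilder.create().build();
    }
}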

Amazon Redshift is not a supported database for Spring Batch. The supported databases are listed here: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-core/src/main/resources/org/springframework/batch/core.

Related

Using JDBCBatchItemWriter for calling an Oracle Procedure gets EmptyResultDataAccessException

I am using Java-based configuration for my Spring Batch job. I am calling a stored procedure with writer.setSql("call proc (:_name)");
The data is getting inserted through the procedure. However, I am getting an EmptyResultDataAccessException.
Thanks
Note: I am skipping "Exception.class" in my step.
The issue is due to the assertion of updates in the JdbcBatchItemWriter. The procedure does not return the number of rows affected the way a SQL statement does, and the Java code throws the exception when the update count is 0. The solution to the problem stated above is to set assertUpdates to false: writer.setAssertUpdates(false).
However, the question still remains as to which writer is best for executing DB objects like procedures or functions, and how transactions should be managed.
Refer to the source code from the url below:
http://grepcode.com/file/repo1.maven.org/maven2/org.springframework.batch/spring-batch-infrastructure/3.0.0.RELEASE/org/springframework/batch/item/database/JdbcBatchItemWriter.java
I use Java configuration. Setting the writer to skip the 'assert updates' check does the job.
writer.setAssertUpdates(false);
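For reference, a minimal sketch of that Java configuration could look like the following; the bean name, placeholder item type, and the exact named parameter are illustrative here, only setSql(...) and setAssertUpdates(false) come from the question and answer above.

import javax.sql.DataSource;

import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ProcedureWriterConfig {

    // Placeholder item type for illustration; in the original question this would be
    // the job's own domain class.
    public static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    @Bean
    public JdbcBatchItemWriter<Person> procedureWriter(DataSource dataSource) {
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        // The named parameter must match a property the parameter source provider can resolve.
        writer.setSql("call proc (:name)");
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        // The procedure returns no update count, so disable the updated-rows assertion.
        writer.setAssertUpdates(false);
        return writer;
    }
}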

How do we do a select query using the phantom driver without a table definition

I have streaming data coming from Spark Streaming which I need to process and finally store in Cassandra. Earlier I was trying to use the Spark Cassandra connector, but it doesn't give access to the SparkStreaming context object on the workers, so I have to use a separate Cassandra Scala driver. Hence I ended up with phantom. Now, my question is: I have already defined the column family in Cassandra, so how do I do select and update queries from Scala?
I have followed the documentation (link1), but I don't understand why we need to give the table definition on the client (Scala code) side. Why can't we just give the Keyspace, ClusterPoints and ColumnFamily and be done with it?
object CustomConnector {
  val hosts = Seq("IP1", "IP2")
  val Connector = ContactPoints(hosts).keySpace("KEYSPACE_NAME")
}

realTimeAgg.foreachRDD { x =>
  if (x.toLocalIterator.nonEmpty) {
    x.foreachPartition { partition =>
      // How do I achieve select/insert on the Cassandra table here using phantom?
    }
  }
}
This is not yet possible using phantom. We are actively working on phantom-spark to allow you to do this, but at this point in time it is still a few months away.
In the interim, you will have to rely on the Spark Cassandra connector and use its non type-safe API to achieve this. It's an unfortunate setup, but in the very near future this will be resolved.
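If all you need in the interim is to run plain CQL selects and inserts from inside foreachPartition, a rough sketch with the plain DataStax Java driver (not phantom, and not the connector's type-safe API) could look like this; the driver version (3.x), hosts, keyspace and table names are assumptions:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PlainCqlSketch {
    public static void main(String[] args) {
        // Build the cluster/session once per JVM or per partition, not per record.
        try (Cluster cluster = Cluster.builder()
                .addContactPoints("IP1", "IP2")
                .build();
             Session session = cluster.connect("KEYSPACE_NAME")) {

            // Insert without any client-side table definition.
            session.execute("INSERT INTO my_table (id, value) VALUES (1, 'example')");

            // Select and read the rows back.
            ResultSet rs = session.execute("SELECT id, value FROM my_table WHERE id = 1");
            for (Row row : rs) {
                System.out.println(row.getInt("id") + " -> " + row.getString("value"));
            }
        }
    }
}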

How to rollback transaction in Grails integration tests on MongoDB

How can (or should) I configure Grails integration tests to roll back transactions automatically when using MongoDB as the datasource?
(I'm using Grails 2.2.1 + mongodb plugin 1.2.0)
For Spock integration tests I defined a MongoIntegrationSpec that gives some control over cleaning up test data.
dropDbOnCleanup = true // will drop the entire DB after every feature method is executed.
dropDbOnCleanupSpec = true // will drop the entire DB after after the spec is complete.
dropCollectionsOnCleanup = ["collectionA", "collectionB", ...] // drops collections after every feature method is executed.
dropCollectionsOnCleanupSpec = ["collectionA", "collectionB", ...] // drops collections after the spec is complete.
dropNewCollectionsOnCleanup = true // after every feature method is executed, all new collections are dropped
dropNewCollectionsOnCleanupSpec = true // after the spec is complete, all new collections are dropped
Here's the source
https://github.com/onetribeyoyo/mtm/tree/dev/src/test/integration/com/onetribeyoyo/util/MongoIntegrationSpec.groovy
And the project has a couple usage examples too.
I don't think that it's even possible, because MongoDB doesn't support transactions. You could use the suggested static transactional = 'mongo', but it helps only if you didn't flush your data (a rare situation, I think).
Instead you could clean up the database manually in setUp(). You can drop the collection for a domain that you're going to test, like:
MyDomain.collection.drop()
and (optionally) fill it with all the data required for your test.
You can use static transactional = 'mongo' in your integration test and/or service class.
Refer to the MongoDB Plugin for more details.
MongoDB does not support transactions, so you cannot rely on them for rollback. The options you have are:
1. Go around and drop the collections for the domain classes you used.
MyDomain.collection.drop() //If you use mongoDB plugin alone without hibernate
MyDomain.mongo.collection.drop() //If you use mongoDB plugin with hibernate
The drawback is that you have to do it for each domain you used.
2. Drop the whole database (You don't need to create it explicitly, but you can)
String host = grailsApplication.config.grails.mongo.host
Integer port = grailsApplication.config.grails.mongo.port
String databaseName = grailsApplication.config.grails.mongo.databaseName
def mongo = new GMongo(host, port)
mongo.getDB(databaseName).dropDatabase() //this takes 0.3-0.5 seconds on my machine
The second option is easier and faster. To make this work for all your tests, extend IntegrationSpec and add the code to drop the database in the cleanup block (I am assuming you are using the Spock test framework), or do a similar thing for JUnit-style tests!
Hope this helps!

How can I get my client application name to show up on zos from java?

This page says I can put "clientProgramName" as one of the connection parameters and it will show up on db2 as the correlation ID.
And I quote:
In a java.util.Properties value in the info parameter of a
DriverManager.getConnection call.
We're using z/OS. The z/OS version of DB2 seems a lot more limited in terms of this kind of stuff.
Setting the client program name in the params hash of the connect call seems to have no effect, and when I put it on the end of the connection URL like this (which it also says I can do):
jdbc:db2://localhost:5036/DBNAME:clientProgramName=myprog
I get this error:
[jcc][10165][10051][4.11.77] Invalid database URL syntax:
jdbc:db2://localhost:5036/DBNAME:clientProgramName=myprog.
ERRORCODE=-4461, SQLSTATE=42815
Is there any way to send a custom user string to a z/OS db2 server so that connection can be identified on the server?
Depending on the method you use to connect to DB2, you use:
Class.forName
Class.forName("com.ibm.db2.jcc.DB2Driver");
Properties props = new Properties();
props.put("user", "scott");
props.put("password", "tiger");
props.put("clientProgramName", "My Program 1");
Connection conn = DriverManager.getConnection(
"jdbc:db2://localhost:50000/sample", props);
DataSource
Connection conn = null;
DB2SimpleDataSource ds = new com.ibm.db2.jcc.DB2SimpleDataSource();
ds.setDriverType(4);
ds.setServerName("localhost");
ds.setPortNumber(50000);
ds.setDatabaseName("sample");
ds.setUser("scott");
ds.setPassword("tiger");
ds.setClientProgramName("My Application 2");
conn = ds.getConnection();
I wrote a blog about that: http://angocadb2.blogspot.fr/2012/12/nombre-de-la-conexion-java-en-db2-java.html (Use your favorite translator because it is in Spanish)
According to this page on Info Center, there should be a function on the DB2Connection interface that allows you to change your application identifier, setDB2ClientApplicationInformation (I can't link directly, because there is no anchor, just search for that name).
You can pull the current application ID using the CURRENT CLIENT_APPLNAME special register:
SELECT CURRENT CLIENT_APPLNAME FROM SYSIBM.SYSDUMMY1
There are some other ways to set that register listed on the Info Center link listed above, including the WLM_SET_CLIENT_INFO function.
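If it helps, a rough JDBC sketch of setting that value could look like the following; I'm assuming the IBM JCC driver is on the classpath, that setDB2ClientApplicationInformation accepts the name as a String, and the connection details below are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import com.ibm.db2.jcc.DB2Connection;

public class ClientInfoSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:5036/DBNAME", "user", "password");

        // Unwrap to the JCC-specific interface and set the application information.
        DB2Connection db2conn = conn.unwrap(DB2Connection.class);
        db2conn.setDB2ClientApplicationInformation("myprog");

        // Read the special register back to see what the server now reports.
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT CURRENT CLIENT_APPLNAME FROM SYSIBM.SYSDUMMY1")) {
            while (rs.next()) {
                System.out.println("CLIENT_APPLNAME = " + rs.getString(1));
            }
        }
        conn.close();
    }
}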
I am no DB2 expert, but I am looking at a trace record, generated by DB2 for z/OS, that contains a "correlation ID" (field QWHCCV in the product section correlation header of the trace record) that matches the value I set using setClientProgramName (method of the DB2 data source in my Java application).
My Java application is similar to the "DataSource" example given by AngocA, which is similar to the code quoted in the IBM technote 'The name of a DB2 JDBC application appears as "db2jcc_application". How to change it?'. This Java application, running on my Windows PC, connects to DB2 for z/OS. It also - and this is important, depending on which DB2 traces you have started (discussed below) - actually does something after connecting. For example:
pstmt=conn.prepareStatement("SELECT ... ");
rset=pstmt.executeQuery();
When you say, regarding the first example given by AngocA, "it doesn't do anything": what did you hope to see? Exactly where are you looking, what are you looking for, and what method (or tool) are you using to look for it?
For example, if you are looking for SMF type 100, 101, or 102 records (generated by DB2 traces) containing QWHCCV field values that match your correlation ID, then (with apologies if this is the bleeding obvious, teaching you how to suck eggs), on DB2 for z/OS, you need to start the DB2 traces (using the DB2 command START TRACE) that generate those records. Otherwise, there will be nothing to see ("it doesn't do anything"). Note that not all DB2 trace records generated by an application (such as the Java application described above) will contain your correlation ID; prior to a certain point in processing, the correlation ID of such records will have a different value (but that is getting off-topic, and anyway is about as far as I am comfortable describing).
Warning: Experiment with starting DB2 traces on a "sandbox" (development or test) DB2 system, not a production DB2 system. DB2 traces can result in large volumes of data.
You will also see the correlation ID in the message text of some DB2 V10 messages (such as DSNL027I) after "THREAD-INFO=".
For me, I had to add a semicolon after each connection parameter.
For example, in your case:
jdbc:db2://localhost:5036/DBNAME:clientProgramName=myprog;
For example, with multiple params:
jdbc:db2://localhost:5036/DBNAME:clientProgramName=myprog;enableSysplexWLB=true;blah=true;

HSQLDB and in-memory files

Is it possible to set up HSQLDB in such a way that the files with the db information are written into memory instead of to actual files? I want to use HSQLDB to export some data structures together with Hibernate mappings. It is, however, not possible to write temporary files, so I need to generate the files in memory and return a stream with their contents as a response.
Setting HSQLDB to use nio does not seem to be a solution, because there is no way to get hold of those files before they get written to the filesystem.
What I'm thinking of is a protocol handler for HSQLDB, but I didn't find a suitable solution yet.
To describe it in other words: a hack solution would be to pass HSQLDB a stream or several streams. It would then write data into those streams during its operation. After all the data is written, the user of the db could then use those streams to send it back over the network.
Yes, of course, we use it all the time for integration testing.
Use a URL like: jdbc:hsqldb:mem:aname
See here for more details.
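As a quick illustration, a minimal sketch of working against a purely in-memory HSQLDB database, assuming the HSQLDB jar is on the classpath (the database name and table here are just examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InMemoryHsqldbExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        // "mem:" keeps everything in memory; nothing is written to the filesystem.
        Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:aname", "sa", "");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE example (id INTEGER, name VARCHAR(50))");
            stmt.execute("INSERT INTO example VALUES (1, 'demo')");
            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM example")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " -> " + rs.getString("name"));
                }
            }
        }
        conn.close();
    }
}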
DbUnit offers a handy database dump method as part of their package:
// database connection
Class driverClass = Class.forName("org.hsqldb.jdbcDriver");
Connection jdbcConnection = DriverManager.getConnection(
"jdbc:hsqldb:sample", "sa", "");
IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
// full database export
IDataSet fullDataSet = connection.createDataSet();
FlatXmlDataSet.write(fullDataSet, new FileOutputStream("full.xml"));
See the DbUnit FAQ for more details. Of course there are routines to restore the data, as that is actually the purpose of the package: preparing a test database for integration testing. Usually we do this with an annotation, but you'll have to use the API for that.