Spring Batch reader for temp table (create and insert) and stored procedure execution combination - spring-batch

I am creating a Spring Batch application to migrate data from a legacy Sybase database to CSV files which can be loaded into target systems.
I am facing a problem in designing the reader configuration.
Possible combinations of inputs for the reader:
1. Direct SQL query (JdbcCursorItemReader is suitable) - no issues
2. Stored procedure (StoredProcedureItemReader is suitable) - no issues
3. A sequence of the steps below has to be executed to get the input - my problem:
   a. Create a temp table
   b. Insert values into the temp table
   c. Execute a stored procedure (it reads its input from the temp table, processes it and writes its output back into the same temp table)
   d. Read the data from the populated temp table
I am blocked on requirement #3; kindly help me with a solution.
Note: this is a Spring Boot application with dynamic configuration for Spring Batch.
ItemReader<TreeMap<Integer, TableColumn>> itemReader =
        ReaderBuilder.getReader(sybaseDataSource, sybaseJdbcTemplate, workflowBean);
ItemProcessor<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> itemProcessor =
        ProcessorBuilder.getProcessor(workflowBean);
ItemWriter<TreeMap<Integer, TableColumn>> itemWriter = WriterBuilder.getWriter(workflowBean);
JobCompletionNotificationListener listener = new JobCompletionNotificationListener();

SimpleStepBuilder<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> stepBuilder =
        stepBuilderFactory.get(CommonJobEnum.SBTCH_JOB_STEP_COMMON_NAME.getValue())
                .allowStartIfComplete(true)
                .<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>>chunk(10000)
                .reader(itemReader);
if (itemProcessor != null) {
    stepBuilder.processor(itemProcessor);
}
Step step = stepBuilder.writer(itemWriter).build();

String jobName = workflowBean.getiMTWorkflowTemplate().getNameWflTemplate() + workflowBean.getIdWorkflow();
job = jobBuilderFactory.get(jobName).incrementer(new RunIdIncrementer()).listener(listener)
        .flow(step).end().build();
jobLauncher.run(job, jobParameters);
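For requirement #3 the crux is that a Sybase #temp table is only visible to the connection that created it, so the create/insert/execute/read sequence has to run on one and the same JDBC connection. One possible shape for the reader, sketched below and untested, is an ItemStreamReader that does the setup work in open() and then iterates a cursor over the temp table; every SQL object name in it (#mig_input, source_table, process_mig_input) is a placeholder.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.TreeMap;
import javax.sql.DataSource;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.support.AbstractItemStreamItemReader;

public class TempTableProcedureReader extends AbstractItemStreamItemReader<TreeMap<Integer, TableColumn>> {

    private final DataSource dataSource;
    private Connection connection;
    private ResultSet cursor;

    public TempTableProcedureReader(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        try {
            // One physical connection for the whole step: a #temp table is only
            // visible to the session (connection) that created it.
            connection = dataSource.getConnection();
            try (Statement stmt = connection.createStatement()) {
                stmt.execute("CREATE TABLE #mig_input (id INT, payload VARCHAR(255))");      // placeholder DDL
                stmt.execute("INSERT INTO #mig_input SELECT id, payload FROM source_table"); // placeholder insert
            }
            try (CallableStatement proc = connection.prepareCall("{call process_mig_input}")) {
                proc.execute(); // the proc reads from and writes back into #mig_input
            }
            cursor = connection.createStatement().executeQuery("SELECT id, payload FROM #mig_input");
        } catch (SQLException e) {
            throw new ItemStreamException("Failed to prepare temp-table input", e);
        }
    }

    @Override
    public TreeMap<Integer, TableColumn> read() throws Exception {
        if (cursor == null || !cursor.next()) {
            return null; // null tells Spring Batch the input is exhausted
        }
        TreeMap<Integer, TableColumn> row = new TreeMap<>();
        // Map the current ResultSet row into TableColumn values here, the same way
        // the existing ReaderBuilder readers do (TableColumn is the existing domain type).
        return row;
    }

    @Override
    public void close() {
        try {
            if (cursor != null) {
                cursor.close();
            }
            if (connection != null) {
                connection.close(); // closing the connection also drops the #temp table
            }
        } catch (SQLException e) {
            throw new ItemStreamException("Failed to release temp-table resources", e);
        }
    }
}

ReaderBuilder.getReader(...) could return an instance of this reader whenever the workflowBean describes the three-step input, so the rest of the step and job wiring above stays unchanged.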

'Sybase' was the name of a company (which was bought out by SAP several years ago). There were (at least) 4 different database products produced under the Sybase name: Adaptive Server Enterprise (ASE), SQL Anywhere, IQ and Advantage DB.
It would help if you state which Sybase database product you're trying to extract data from.
Assuming you're talking about ASE ...
If all you need to do is pull data out of Sybase tables, then why jump through all the hoops of writing SQL, procs, Spring code, etc.? Or is this some sort of homework assignment (but even so, why go this route)?
Just use bcp (an OS-level utility that comes with the Sybase dataserver) to pull the data from your Sybase tables. With a couple of command-line flags you can tell bcp to write the data to a delimited file.
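For example (the database, table, login and server names here are placeholders), a character-mode export with a comma as the field terminator looks something like:

bcp mydb..customer out customer.csv -c -t, -U myuser -P mypassword -S MYSYBASE

-c selects character (plain-text) format and -t sets the field terminator; check the bcp documentation for your ASE version for row terminators and the other options.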

I'm pretty sure you will have issues, inside a stored procedure, accessing a temporary table that was created outside the stored procedure, because the stored proc runs as the writer of the proc, not the executor. The temp table doesn't belong to, and isn't visible to, the writer of the proc, so the proc can't access it.
You could either create the temp table within the proc, or use a permanent table and either lock the table while you are using it for this, or put a key column on the table which you pass into the proc so that it only processes the data you have just inserted.
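If you go the permanent-table-plus-key route, the calling code can generate a key per run, stamp it on the rows it loads, and pass the same key to the proc so concurrent runs don't touch each other's rows. A rough sketch with a Spring JdbcTemplate, where jdbcTemplate, mig_staging, payload and process_staging are all placeholders:

// All object names below (mig_staging, payload, process_staging) are hypothetical.
String batchKey = UUID.randomUUID().toString();

// 1. Stage this run's rows, stamped with the run's key.
jdbcTemplate.update(
        "INSERT INTO mig_staging (batch_key, payload) SELECT ?, payload FROM source_table",
        batchKey);

// 2. Call the proc with the same key so it only processes this run's rows.
new SimpleJdbcCall(jdbcTemplate)
        .withProcedureName("process_staging")
        .withoutProcedureColumnMetaDataAccess()
        .declareParameters(new SqlParameter("batch_key", Types.VARCHAR))
        .execute(Collections.singletonMap("batch_key", batchKey));

// 3. Read the results back, again filtered by the key:
//    SELECT ... FROM mig_staging WHERE batch_key = ?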

Related

Can I create a temporary table in a perl script and access it in stored procedure?

I have a Perl script through which I'm executing a stored procedure (Sybase DB).
In the same Perl script I'm reading records from a .csv file. Somehow I want to get these records into my stored procedure.
Is there any way I can do this without writing these records to a new table in the DB?
I thought of using a temporary table. Can I create a temporary table in a Perl script and access it in the stored procedure?
I'm assuming your stored proc works on multiple rows of data (otherwise you could call the proc once for each row of data) so fwiw ...
Some basics ....
A temp table ('#' prefix) is associated with the login session that creates it.
As long as the login session remains active then the #temp table remains accessible (assuming you don't issue a 'drop table').
When the login session ends the temp table goes away.
The temp table cannot be accessed by another login session.
Your perl script should be able to create, populate and query the temp table as long as the perl script uses the same login session (aka connection) to perform all actions (create table, insert, select) against the temp table.
Your stored proc would also need to be invoked on the same login session if it's to access the temp table.
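The same rule is easy to see from JDBC (Java here rather than Perl, but the single-session principle is identical); the connection details, table and proc names below are made up:

String url = "jdbc:sybase:Tds:dbhost:5000/mydb"; // placeholder connection details
String user = "myuser";
String password = "mypassword";

// One connection = one login session; #staging exists only on this connection.
try (Connection con = DriverManager.getConnection(url, user, password);
     Statement stmt = con.createStatement()) {
    stmt.execute("CREATE TABLE #staging (id INT, name VARCHAR(50))");
    stmt.execute("INSERT INTO #staging VALUES (1, 'row parsed from the .csv')");
    try (CallableStatement proc = con.prepareCall("{call my_proc}")) {
        proc.execute(); // the proc can see #staging because it runs on the same session
    }
    try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM #staging")) {
        while (rs.next()) {
            System.out.println(rs.getInt("id") + " " + rs.getString("name"));
        }
    }
} // the connection closes here and #staging disappears with it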

Clearing records in HBase table

We are creating a disaster recovery system for HBase tables. Because of restrictions we are not able to use the fancier methods of maintaining a replica of the table. We are using Export/Import statements to get the data into HDFS and using that to create the tables on the DR servers.
While importing the data into the HBase table, we use the truncate command to clear the table and then load the data fresh. But the truncate statement is taking a long time to delete the rows. Are there any other, more effective ways to clear the entire table?
(truncate takes 33 minutes for ~2,500,000 records)
disable -> drop -> create the table again, maybe? I don't know whether drop also takes too long.
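If you want to try that route, the disable/drop/recreate cycle can be scripted against the Admin API so the table comes back with its original schema. The sketch below assumes the HBase 2.x client API and a placeholder table name:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;

public class RecreateTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("dr_table");      // placeholder table name
            TableDescriptor schema = admin.getDescriptor(table);  // capture the schema before dropping
            admin.disableTable(table);
            admin.deleteTable(table);
            admin.createTable(schema);                            // empty table, same column families
        }
    }
}

Note that Admin also exposes truncateTable(table, preserveSplits), which as far as I know does the same disable/drop/recreate under the hood, so timing both against your ~2.5M-row table is the only way to know which wins.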

SqlBulkCopy Best Practice

I have a couple of questions regarding SqlBulkCopy.
I need to insert a few million records into a table from my business component. I am considering the following approaches:
1. Use SqlBulkCopy to insert the data directly into the destination table. Because the table has existing data and an index (which I cannot change), I won't get bulk-logged behavior and cannot apply TABLOCK.
2. Use SqlBulkCopy to insert the data into a heap in tempdb in one go (BatchSize = 0). Once completed, use a stored procedure to move the data from the temp table to the destination table.
3. Use SqlBulkCopy to insert the data into a heap in tempdb but specify a BatchSize. Once completed, use a stored procedure to move the data from the temp table to the destination table.
4. Split the data and use multiple SqlBulkCopy instances to insert into multiple heaps in tempdb concurrently. After each chunk of data is uploaded, use a stored procedure to move the data from the temp table to the destination table.
Which approach has the shortest end-to-end time?
If I use SqlBulkCopy to upload data into a table with an index, will I be able to query the table at the same time?

Postgres and temporary tables

I have a stored procedure that uses a temp table and explicitly drops it when done.
What happens when the same procedure is run at the same time by two different sessions? The sessions run as a single user (a web app).
Will one session interfere with the temp table, and the data inside it, of the other session?
I'm using Postgres 9.0.
In PostgreSQL, temporary tables are unique to each session, so there is no problem.
If I am not totally wrong, the result would be one temp table per invocation of the procedure that generates the temp table, so the two temp tables would not disturb each other.
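A quick way to convince yourself, as a JDBC snippet with made-up connection details: two sessions can create and fill a temp table of the same name without ever seeing each other's data.

String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder connection details
String user = "webapp";
String pass = "secret";

try (Connection a = DriverManager.getConnection(url, user, pass);
     Connection b = DriverManager.getConnection(url, user, pass)) {
    a.createStatement().execute("CREATE TEMP TABLE scratch (id int)");
    b.createStatement().execute("CREATE TEMP TABLE scratch (id int)"); // no name clash: separate session
    a.createStatement().execute("INSERT INTO scratch VALUES (1)");
    try (ResultSet rs = b.createStatement().executeQuery("SELECT count(*) FROM scratch")) {
        rs.next();
        System.out.println(rs.getInt(1)); // prints 0: session B cannot see session A's row
    }
}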

Creating Stored Procedures that can work with different tables

I need to use the same stored procedures against many tables, all with the same structure, in my DB. This is data loaded from customers, with one table per customer, and the data needs calculations/checks run before it's loaded into our data warehouse.
So far these are the options and issues I've found, and I'm looking for a better pattern/approach.
1. Create a view that points to the table I want to process; the SPs then talk to that view. This works well (especially once I'd worked out how to create the views 'automagically' based on their columns), but the view can only be used with one table at a time, forcing the system to deal with one customer at a time.
2. Use dynamic SQL within each SP - this makes the SPs much harder to read/debug and for those reasons has been ruled out.
3. Create a partitioned view across all the tables and then use a parameterised table function to return just the data we're interested in - ah, but then I can't update the data, as the function returns a table that can only be used for SELECT.
4. Use dynamic SQL inside a function (can't be done) to create a view (which also can't be done) ... give up.
5. Within the SP, create a temp table over the target table using dynamic SQL - but then the temp table only exists in the session that runs the dynamic SQL, not the 'parent' session that's running the SP ... give up.
6. Create a global temp table using dynamic SQL to avoid the scope issue of 5, then run the SP against the global temp table. Still runs into the single-customer issue.
7. Create the view as in 1 within a transaction, then run all the SPs and then commit - this works fine for one user, but any others are now blocked trying to create a new view of the same name.
8. Use a temporary view ... can't be done in T-SQL.
9. Move all the code into .NET - but we have environment issues where T-SQL is much easier to host/run.
I know I'm not the only person who has this problem. Have any of you good people solved it? Please help.
Maybe your approach is wrong. I will go into detail in a while, but it seems that your problem can be solved using SSIS.
-- Updated answer:
First, the big picture:
The most affordable way to process the tables dynamically is to use a script instead of a stored procedure. If the table to access is chosen at runtime, you certainly will not benefit from any of the performance advantages of stored procedures, i.e. cached execution plans. A SQL script can easily be pointed at any one table at runtime by using placeholders and replacing them before executing.
The script can be loaded from the filesystem, a variable, a text column in a table, etc. The loading process consists of reading the script content into a string variable. This step occurs once.
The next step is the preparation stage, which is executed once for each table to be processed. The main job of this step is to replace the table placeholder with the name of the table currently being processed. It is also possible to set parameter values here, such as any parameter you may need to pass into the SP that you already wrote.
The last step is the execution of the script. Since it is already loaded into a variable and the placeholders have been replaced with the current table name, you can safely call an Execute SQL Task with the SQL variable as the input. This, of course, happens for each table you want to process.
Ok. Now let's see this in action.
This is a sample database model:
CREATE TABLE [dbo].[t_n](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [varchar](50) NOT NULL,
[start] [datetime] NULL,
CONSTRAINT [PK_t_n] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
where t_n represents any table (t_1, t_2, t_3, etc).
This is your current stored procedure:
CREATE PROCEDURE SpProcessT_n
AS
BEGIN
    SET NOCOUNT ON;
    SELECT * FROM [t_1];
END
GO
Now, transform this stored procedure into a SQL script, placing a placeholder where the table name was:
SET NOCOUNT ON;
SELECT * FROM [$table_name];
I chose to save this as a .sql file on the filesystem to keep the POC as simple as possible.
Next, create an SSIS package like this:
These are the settings I chose to set up the loop:
And this is how you can assign the table name to a variable appropriately called table_name:
This is the setup of the script task; here you can see that the variable table_name has read-only access, while a new variable called SqlExec has read/write access:
And this is its Main function:
public void Main()
{
    // Requires: using System.IO; using System.Text.RegularExpressions;
    String Table_Name = Dts.Variables["table_name"].Value.ToString();
    String SqlScript;
    Regex reg = new Regex(@"\$table_name", RegexOptions.Compiled);

    // Load the script template from disk.
    using (var f = File.OpenText(@"c:\sqlscript.sql")) {
        SqlScript = f.ReadToEnd();
        f.Close();
    }

    // Replace the $table_name placeholder with the table currently being processed.
    SqlScript = reg.Replace(SqlScript, Table_Name);
    Dts.Variables["SqlExec"].Value = SqlScript;
    Dts.TaskResult = (int)ScriptResults.Success;
}
You can notice that the Dts Variable SqlExec contains the sql script that will be executed. Now you can set the following options in your ExecuteSqlTask:
Successfully tested on MSSQL 2008; if you put an INSERT inside the script file you will notice new rows appearing in each table.
Hope this helps!
If your application can afford to be one cut-off day late, then you can have a nightly scheduled job run an SSIS package that consolidates all 150+ tables into one single huge table. Since the freshness of the results of queries against that huge table will then be one 'date' late, this solution will not include any rows that have recently been loaded.
You can time the running of this package. If it is still amazingly fast, say within 30 minutes, then you can opt to run it every few hours, for example at the start of the work day, during the lunch break, and at the end of the day. That way you will have nearly fresh data to query.
Write a partitioned view including table names?
SELECT 'TableName', t.* FROM TableName t
UNION ALL
SELECT 'TableName2', t.* FROM TableName2 t
Then write a single INSTEAD OF trigger which uses dynamic SQL for the writes (less testing is involved with that use of dynamic SQL because, I'd think, you'd just write the simple CRUD operations once for all tables).
I would not do this with SQL. What you are describing sounds like a traditional ETL situation.
Since all of the customer tables are the same, I would create a table in the data warehouse with all the columns from the client table, a surrogate key column, and a type identifier. You have the option of creating a "staging" table here that only holds data during the ETL process, or of working on a single "live" table. I would create the staging table.
Then, within the SSIS package (don't worry, you can still schedule it from SQL Server Agent; it hasn't totally left the DB server), start the ETL process...
E(xtract): copy the data from your source into the staging table in the data warehouse. You most likely want to use a sub-package within a foreach loop and change the name of the table that you want to process from an external store (most people would say put this in the warehouse, but it's up to you).
T(ransform): run the calculations/checks you were talking about, but do it on the whole set...
L(oad): copy it to your real table within the data warehouse.
There are a couple of things I would NOT do:
1. Modify the data in the source table.
2. Try to do this in T-SQL. It's just not what T-SQL is good at.
If you need more detail on this approach, I would probably ask the question with some Business Intelligence tags. I'll be traveling for the next week or so, but I will try to look at the comments to clear anything up if you need me to.
I am fairly certain that the standard way to solve this is using dynamic SQL in each sp (your option 2), which has already been ruled out.
Your goal is to make generic, multi-table SQL. I don't see how you intend to accomplish that without sacrificing some efficiency and readability.