SqlBulkCopy Best Practice

I have a couple of questions regarding SqlBulkCopy.
I need to insert a few million records into a table from my business component. I am considering the following approaches:
1. Use SqlBulkCopy to insert the data directly into the destination table. Because the table has existing data and an index (which I cannot change), I won't get minimal logging and cannot apply a table lock (TABLOCK).
2. Use SqlBulkCopy to insert the data into a heap in tempdb in one go (BatchSize = 0). Once completed, use a stored procedure to move the data from the temp table to the destination table.
3. Use SqlBulkCopy to insert the data into a heap in tempdb, but specify a BatchSize. Once completed, use a stored procedure to move the data from the temp table to the destination table.
4. Split the data and use multiple SqlBulkCopy instances to insert into multiple heaps in tempdb concurrently. After each chunk of data is uploaded, use a stored procedure to move it from the temp table to the destination table.
Which approach has the shortest end-to-end time?
If I use SqlBulkCopy to upload data into a table with an index, will I be able to query the table at the same time?
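For reference, a minimal T-SQL sketch of the "move" step used by approaches 2-4 might look like the following. All table, column, and procedure names are placeholders, and it assumes the staging heap has the same columns as the destination and that the destination has no triggers or foreign keys (a restriction of OUTPUT ... INTO).

CREATE PROCEDURE dbo.usp_MoveFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @rows int = 1;

    -- Move the rows in batches so the destination's index is maintained in
    -- smaller transactions and concurrent readers are blocked for shorter periods.
    WHILE @rows > 0
    BEGIN
        WITH batch AS
        (
            SELECT TOP (50000) Col1, Col2
            FROM dbo.StagingHeap
        )
        DELETE FROM batch
        OUTPUT deleted.Col1, deleted.Col2
            INTO dbo.Destination (Col1, Col2);

        SET @rows = @@ROWCOUNT;
    END
END

Keeping each batch in its own short statement is what gives concurrent queries a chance to proceed between batches.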

Related

Insert into Memory-Optimized Table from non-optimized

I have two databases.
The primary database has DDL triggers, so I can't create memory-optimized tables there. So I created a secondary database and created a memory-optimized table in it. Now, in a procedure on the primary database, I need to copy data from another table into this memory-optimized table.
For example:
INSERT INTO InMemory.dbo.DestTable_InMem SELECT * FROM #T;
And I get:
A user transaction that accesses memory optimized tables or natively compiled modules cannot access more than one user database or databases model and msdb, and it cannot write to master.
Are there any workarounds for this?
I cannot move my procedure to the second database.
There is no other way than using a native procedure to INSERT, UPDATE or DELETE in an in-memory table.
See: A Guide to Query Processing for Memory-Optimized Tables
To move from one DB to the other, the source table must exist locally.
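Building on that last point, here is a sketch of one way to stage the data locally first. It assumes a regular, disk-based staging table dbo.DestTable_Staging exists in the InMemory database and that the second step runs in that database's context (e.g. via a procedure created there), so that no single transaction touches a memory-optimized table in more than one user database.

-- Step 1: run from the primary database; this transaction touches no memory-optimized table
INSERT INTO InMemory.dbo.DestTable_Staging SELECT * FROM #T;

-- Step 2: run in the InMemory database (e.g. from a procedure created there);
-- this transaction accesses only that one user database
INSERT INTO dbo.DestTable_InMem SELECT * FROM dbo.DestTable_Staging;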

Postgresql: out of shared memory due to max locks per transaction using temporary table

I am using Postgresql 9.6.
I have a stored proc which is called from Scala. This stored proc is a wrapper, i.e. it calls another stored proc for each element of the input list passed to the wrapper. For example, if the wrapper's input list has 100 elements, the internal stored proc is called 100 times, once per element.
The internal proc is data-heavy: it creates 4-5 temp tables, processes the data, and returns.
The wrapper then collects all the data and finally completes.
get_data_synced(date, text, integer[])
Here the text argument is a comma-separated list of items (10-1000, depending on the use case).
Basically, the problem is that if I pass a larger number of items (100-200), i.e. the loop calls the internal proc that many times, it throws the error:
SQL execution failed (Reason: ERROR: out of shared memory
Hint: You might need to increase max_locks_per_transaction.
I understand that creating temp tables inside the internal function takes locks. But each time the proc is called, the first thing it does is DROP and then CREATE the temp tables.
DROP TABLE IF EXISTS _temp_data_1;
CREATE TEMP TABLE _temp_data_1 AS (...);
DROP TABLE IF EXISTS _temp_data_2;
CREATE TEMP TABLE _temp_data_2 AS (...);
..
..
..
So even if the proc is called 1000 times, the first thing it does is drop the tables (which should release the locks?) and then create them again.
The max_locks_per_transaction is set to 256.
Now, the transaction is not over until my wrapper (outer) function is over, right?
So does that mean that even though I am dropping the temp tables, the locks are not released?
Is there a way to release the locks on the temp tables immediately once my function is complete?
Your diagnosis is correct: the locks survive until the end of the transaction, even if the table was dropped in the same transaction that created it, and even if it is a temp table. Perhaps this could be optimized, but that is currently how it works.
As a workaround, why not just truncate the table rather than drop and re-create it, if it already exists?
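A sketch of that workaround inside the internal function (the column list is a placeholder for the real one): create each temp table only once per session and empty it on every call, so repeated calls within the wrapper's transaction do not keep taking locks on freshly created tables.

-- Instead of DROP TABLE IF EXISTS ... / CREATE TEMP TABLE ... AS (...) on every call:
CREATE TEMP TABLE IF NOT EXISTS _temp_data_1 (id integer, payload text);  -- placeholder columns
TRUNCATE _temp_data_1;
INSERT INTO _temp_data_1
SELECT ...;  -- the same query that previously fed CREATE TEMP TABLE ... AS

TRUNCATE still locks the table, but it is the same table object every time, so the transaction should end up holding one lock per temp table instead of one per re-creation.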

How to dump data into a temporary table (without explicitly creating it) from an external table in a Hive script at run time

In SQL Server stored procedures, we have the option of creating a temporary table "#temp" whose structure matches that of the table it refers to; we don't explicitly create or spell out the structure of the "#temp" table.
Is there a similar option in an HQL Hive script to create a temp table at run time without actually defining its structure, so that I can dump data into the temp table and use it? The code below shows an example of a #temp table in SQL Server.
SELECT name, age, gender
INTO #MaleStudents
FROM student
WHERE gender = 'Male'
Hive has the concept of temporary tables, which are local to a user's session. These tables behave just like any other table, and can be created using CTAS commands too. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created.
Read more about them here: Hive Documentation, DWGEEK.
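For example, the SQL Server SELECT ... INTO from the question could be expressed as a CTAS into a temporary table (a sketch; the table only lives for the current Hive session):

CREATE TEMPORARY TABLE male_students AS
SELECT name, age, gender
FROM student
WHERE gender = 'Male';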
You can create a simple temporary table and perform any operation on it.
Once you are done with your work and log out of your session, it will be deleted automatically.
The syntax for a temporary table is:
CREATE TEMPORARY TABLE TABLE_NAME_HERE (key string, value string)

Postgres Upsert vs Truncate and Insert

I have a stream of data that I can replay at any time to reload data into a Postgres table. Let's say I have millions of rows in my table and I add a new column. Now I can replay that stream of data to map a key in the data to the column name that I have just added.
The two options I have are:
1) Truncate and then Insert
2) Upsert
Which would be a better option in terms of performance?
The way PostgreSQL does multiversioning, every update creates a new row version. The old row version will have to be reclaimed later.
This means extra work and tables with a lot of empty space in them.
On the other hand, TRUNCATE just throws away the old table, which is very fast.
You can gain extra performance by using COPY instead of INSERT to load bigger amounts of data.
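A minimal sketch of that reload path (the table name and file path are placeholders; the stream could equally be fed through COPY ... FROM STDIN by the client):

BEGIN;
TRUNCATE my_table;
COPY my_table FROM '/path/to/replayed_stream.csv' WITH (FORMAT csv);
COMMIT;

Doing the TRUNCATE and the COPY in one transaction means the table is never left empty if the load fails, since the whole thing rolls back.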

Spring Batch reader for temp table (create and insert) and stored procedure execution combination

I am creating a Spring Batch application to migrate data from a legacy Sybase database to CSV files which can be loaded into target systems.
I am facing a problem in designing the reader configuration.
Possible combinations of inputs for the reader:
Direct SQL query (JdbcCursorReader is suitable) - No issues
Stored Procedure (stored procedure reader is suitable) - No issues
A sequence of steps executed to get the input - my problem:
Create temp table
Insert values into temp table
Execute stored procedure (reads input from temp table, process them and write output into same temp table)
Read data from the inserted temp table
I am blocked on requirement #3; kindly help me with a solution.
Note: I am doing Spring boot application with dynamic configuration for Spring Batch.
ItemReader<TreeMap<Integer, TableColumn>> itemReader = ReaderBuilder.getReader(sybaseDataSource, sybaseJdbcTemplate, workflowBean);
ItemProcessor<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> itemProcessor = ProcessorBuilder.getProcessor(workflowBean);
ItemWriter<TreeMap<Integer, TableColumn>> itemWriter = WriterBuilder.getWriter(workflowBean);
JobCompletionNotificationListener listener = new JobCompletionNotificationListener();
SimpleStepBuilder<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> stepBuilder = stepBuilderFactory
        .get(CommonJobEnum.SBTCH_JOB_STEP_COMMON_NAME.getValue())
        .allowStartIfComplete(true)
        .<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>>chunk(10000)
        .reader(itemReader);
if (itemProcessor != null) {
    stepBuilder.processor(itemProcessor);
}
Step step = stepBuilder.writer(itemWriter).build();
String jobName = workflowBean.getiMTWorkflowTemplate().getNameWflTemplate() + workflowBean.getIdWorkflow();
job = jobBuilderFactory.get(jobName).incrementer(new RunIdIncrementer()).listener(listener).flow(step).end().build();
jobLauncher.run(job, jobParameters);
'Sybase' was the name of a company (which was bought out by SAP several years ago). There were (at least) 4 different database products produced under the Sybase name ... Adaptive Server Enterprise (ASE), SQL Anywhere, IQ, Advantage DB.
It would help if you state which Sybase database product you're trying to extract data from.
Assuming you're talking about ASE ...
If all you need to do is pull data out of Sybase tables then why jump through all the hoops of writing SQL, procs, Spring code, etc? Or is this some sort of homework assignment (but even so, why go this route)?
Just use bcp (an OS-level utility that comes with the Sybase dataserver) to pull the data from your Sybase tables. With a couple of command-line flags you can tell bcp to write the data to a delimited file.
I'm pretty sure you will have issues accessing, from within a stored procedure, a temporary table that was created outside the stored procedure, because the stored proc runs as the writer of the proc, not the executor. So the temp table doesn't belong to, and is not visible to, the writer of the proc, and the proc can't access it.
You could either create the temp table within the proc, or use a permanent table and either lock the table while you are using it for this, or create a key on the table which you pass into the proc so it only processes the data you have just passed in.
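For the first option, a rough Sybase ASE-style sketch (procedure, table, and column names are all invented for illustration) in which the procedure owns its #temp table end to end, so the reader only has to execute the procedure and consume its result set:

CREATE PROCEDURE dbo.extract_for_batch
    @run_id int
AS
BEGIN
    -- the proc creates, fills, and reads its own temp table, so visibility is not an issue
    CREATE TABLE #work (id int, payload varchar(255))

    INSERT INTO #work (id, payload)
    SELECT id, payload
    FROM dbo.source_table
    WHERE run_id = @run_id

    -- ... any further processing against #work ...

    SELECT id, payload
    FROM #work           -- final result set, consumed by the Spring Batch reader
END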