How to execute an update after each item write in Spring Batch?

I am doing a database read and a database write as a Spring Batch task. It's running fine, and the after-job method also executes fine. But my requirement is that after each insert of an entry I need to update a flag in the source database. How can I achieve this?

Consider using a CompositeItemWriter that has two delegate writers:
Delegate writer 1 performs the insert into the target database.
Delegate writer 2 updates the status in the source database.
If you really need to commit after each insert, you will need to set the step's commit-interval to 1. Remember that a commit interval of 1 means very low performance, so do not set it to 1 unless there is a compelling reason.
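A minimal sketch of that setup with Java config, assuming Spring Batch 4+'s builder API; the table names, columns, and the MyItem type are hypothetical:

import java.util.List;
import javax.sql.DataSource;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.support.CompositeItemWriter;

public class WriterConfig {

    // Delegate 1: inserts each item into the target database (hypothetical table/columns).
    JdbcBatchItemWriter<MyItem> insertWriter(DataSource targetDataSource) {
        return new JdbcBatchItemWriterBuilder<MyItem>()
                .dataSource(targetDataSource)
                .sql("INSERT INTO target_table (id, payload) VALUES (:id, :payload)")
                .beanMapped()
                .build();
    }

    // Delegate 2: flips the processed flag on the matching source row.
    JdbcBatchItemWriter<MyItem> flagWriter(DataSource sourceDataSource) {
        return new JdbcBatchItemWriterBuilder<MyItem>()
                .dataSource(sourceDataSource)
                .sql("UPDATE source_table SET processed = 1 WHERE id = :id")
                .beanMapped()
                .build();
    }

    // Runs both delegates, in registration order, for every chunk of items.
    CompositeItemWriter<MyItem> compositeWriter(DataSource source, DataSource target) {
        CompositeItemWriter<MyItem> writer = new CompositeItemWriter<>();
        List<ItemWriter<? super MyItem>> delegates =
                List.of(insertWriter(target), flagWriter(source));
        writer.setDelegates(delegates);
        return writer;
    }

    public static class MyItem {
        private long id;
        private String payload;
        public long getId() { return id; }
        public String getPayload() { return payload; }
    }
}

Since the two delegates write to different DataSources, a plain single-resource transaction manager cannot commit both atomically; either make the flag update idempotent so it can safely be retried, or use a JTA transaction manager.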

If the inserted data contains something that identifies the insert (an insert date, a status flag, etc.), you could run a simple TaskletStep that executes an update statement like
update ....
set flag = flag.value
where insert.date = ....
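A minimal sketch of such a Tasklet, assuming a hypothetical source_table with a processed_flag column, and using the job execution's start time as the insert marker:

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.jdbc.core.JdbcTemplate;

public class FlagUpdateTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;

    public FlagUpdateTasklet(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Flag every source row inserted since this job execution started
        // (hypothetical table/column; use whatever identifies your inserts).
        jdbcTemplate.update(
                "UPDATE source_table SET processed_flag = 1 WHERE insert_date >= ?",
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getStartTime());
        return RepeatStatus.FINISHED;
    }
}

Registered as a step after the chunk-oriented read/write step, this sets all the flags once per job run instead of once per insert.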

Related

Azure Data Factory Stop 2 triggers from executing at same time

I have two ADFv2 triggers.
One is set to execute every 3 mins and another every 20 mins.
They execute different pipelines, but there is an overlap, as both touch the same database table, which I want to prevent.
Is there a way to set them up so that if one is already running and the other is scheduled to start, the second is queued until the running one finishes?
Not natively, AFAIK. You can use the pipeline's concurrency property to get this behaviour, but only for a single pipeline.
Instead you could (we have):
Use a Validation activity to block if a sentinel blob exists, and have your other pipeline write and delete the blob when it starts/ends.
Likewise, have one pipeline set a flag in a control table on the database that the other can examine.
If you can tolerate changing your frequencies to have a common factor, create a master pipeline that calls your current two pipelines via Execute Pipeline activities, and make the longer one run only on every n-th execution using MOD. Then you can use the concurrency setting on the outer pipeline to make sure the next trigger gets queued until the current run ends.
Use the REST API (https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#rest-api) in one pipeline to check whether the other is running.
Jason's post gave me an idea for a simpler solution.
I have two triggers. Each fires on a different schedule and executes a different pipeline.
On occasion their schedules can overlap. In that circumstance the trigger that fires while the other is running should not run; only one should be running at any one time.
I did this using the following.
Create a control table with an IsJobRunning BIT (flag) column.
When a trigger fires, the pipeline associated with it executes a stored procedure that checks the control table.
If the value of IsJobRunning is 0, UPDATE the IsJobRunning column to 1 and continue executing;
if it is 1, RAISERROR (a dummy error) and stop executing.
DECLARE @ERRMSG NVARCHAR(255), @ErrorSeverity INT ;
IF (SELECT IsJobRunning FROM [[Control table]]) = 1
BEGIN
    SET @ERRMSG = N'**INFORMATIONAL ONLY** Other ETL trigger job is running - so stop this attempt ' ;
    SET @ErrorSeverity = 16 ;
    -- Note: this is only an INFORMATIONAL message and not an actual error.
    RAISERROR (@ERRMSG, @ErrorSeverity, 1) WITH NOWAIT;
    RETURN 1;
END ;
ELSE
BEGIN
    -- set IsJobRunning to RUNNING
    EXEC [[UPDATE IsJobRunning on Control table]] ;
END ;
This logic is in both pipelines.

Pentaho Data Integration - Ensure that one step runs before another

I have a transformation in Pentaho Data Integration that stores data in several tables of a database.
But this database has constraints, meaning I can't put things into one table before the related data is put into another table.
Sometimes it works, sometimes it doesn't; it depends on concurrency luck.
So I need to ensure that Table Output 1 runs entirely before Table Output 2 starts.
How can I do this?
You can use a step named "Block this step until steps finish".
You place it before the step that needs to wait. And inside the block you define which steps are to be waited for.
Below, suppose Table Output 2 contains a foreign key to a field in table 1, but the rows you're going to reference in table 2 don't exist yet in table 1. This means Table Output 2 needs to wait until Table Output 1 finishes.
Place the "block" step connected before Table Output 2.
Then open the properties of the "block" step and add Table Output 1 to the list (along with any other steps you want to wait for).
For that, you can also use a job instead of a transformation, because in a transformation all steps run in parallel. So use a job: add a first transformation in which Table Output 1 executes, and then a second transformation in which Table Output 2 is performed.

How do I perform batch upsert in Spring JDBC?

I have a list of records, and I want to perform the following tasks using Spring's JdbcTemplate:
(1) Update existing records.
(2) Insert new records.
I don't know how to do this using Spring's jdbcTemplate.
Any insight?
You just use one of the various forms of batchUpdate for the update. Then you check the return values, which will contain 1 where the row was present and 0 otherwise. For the latter, you perform another batchUpdate with the insert statements.
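A minimal sketch of that two-pass approach with JdbcTemplate, assuming a hypothetical my_table with columns id and name:

import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class BatchUpsert {

    private final JdbcTemplate jdbcTemplate;

    public BatchUpsert(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void upsert(List<Rec> records) {
        // Pass 1: try to update every record; the returned array holds
        // the affected-row count for each statement in the batch.
        List<Object[]> updateArgs = new ArrayList<>();
        for (Rec r : records) {
            updateArgs.add(new Object[] { r.name(), r.id() });
        }
        int[] counts = jdbcTemplate.batchUpdate(
                "UPDATE my_table SET name = ? WHERE id = ?", updateArgs);

        // Pass 2: insert the records whose update touched no row.
        List<Object[]> insertArgs = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == 0) {
                Rec r = records.get(i);
                insertArgs.add(new Object[] { r.id(), r.name() });
            }
        }
        if (!insertArgs.isEmpty()) {
            jdbcTemplate.batchUpdate(
                    "INSERT INTO my_table (id, name) VALUES (?, ?)", insertArgs);
        }
    }

    public record Rec(long id, String name) {}
}

One caveat: some JDBC drivers return Statement.SUCCESS_NO_INFO (-2) instead of real row counts for batches, in which case you would need per-row updates or a database-native upsert instead.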

Multiple cron jobs on the same Postgres table

I have a cron job that runs every 2 minutes. It takes 10 records from a Postgres table, works on them, and then sets a flag when it is finished. I want to make sure that if the first run takes more than 2 minutes, the next one works on different rows in the database, not the same ones.
Is there any way to handle this case?
This can be solved using a Database Transaction.
BEGIN;

SELECT id, status, server
FROM table_log
WHERE direction = '2' AND status_log = '1'
LIMIT 100
FOR UPDATE SKIP LOCKED;
What are we doing?
We are selecting all rows that are available (not locked) by other cron jobs that might be running, and selecting them FOR UPDATE. This means the query grabs whatever is unlocked, and all the returned rows are locked for this cron job only.
How do we update the locked rows?
Simply use a for loop in your processing language (Python, Ruby, PHP) and concatenate one UPDATE per row; remember we are building a single batch of updates.
UPDATE table_log SET status_log = '6' ,server = '1' WHERE id = '1';
Finally we use
COMMIT;
and all the locked rows will be updated and released. This prevents other queries from touching the same data at the same time. Hope it helps.
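For reference, the same claim-then-update pattern sketched from Java with Spring's JdbcTemplate; table, columns and status values are taken from the example above, and the batch size stays at 100:

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class LogWorker {

    private final JdbcTemplate jdbcTemplate;

    public LogWorker(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Transactional  // the row locks are held until this method commits
    public void processBatch() {
        // Claim up to 100 unclaimed rows; concurrent workers skip rows
        // already locked by another transaction instead of blocking.
        List<Long> ids = jdbcTemplate.queryForList(
                "SELECT id FROM table_log "
                + "WHERE direction = '2' AND status_log = '1' "
                + "LIMIT 100 FOR UPDATE SKIP LOCKED",
                Long.class);

        for (Long id : ids) {
            // ... per-row work goes here ...

            // Mark the row as done, as in the example above.
            jdbcTemplate.update(
                    "UPDATE table_log SET status_log = '6', server = '1' WHERE id = ?",
                    id);
        }
        // COMMIT on method exit releases the locks.
    }
}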
Turn your "finished" flag from binary to ternary ("needs work", "in process", "finished"). You might also want to store the PID of the "in process" worker, in case it dies and you need to clean it up, and a timestamp for when it started.
Or use a queueing system that someone already wrote and debugged for you.

What is a "batch", and why is GO used?

I have read and read over MSDN, etc. OK, so it signals the end of a batch.
But what defines a batch? I don't see why I need GO when I'm pasting in a bunch of scripts to be run all at the same time.
I've never understood GO. Can anyone explain this better, and when I need to use it (after how many or what type of transactions)?
For example why would I need GO after each update here:
UPDATE [Country]
SET [CountryCode] = 'IL'
WHERE code = 'IL'
GO
UPDATE [Country]
SET [CountryCode] = 'PT'
WHERE code = 'PT'
GO is not properly a T-SQL command.
Instead, it's a command to the specific client program that connects to an SQL server (Sybase or Microsoft's; not sure about what Oracle does), signalling to the client program that the set of commands input up to the "go" needs to be sent to the server to be executed.
Why/when do you need it?
GO in MS SQL server has a "count" parameter - so you can use it as a "repeat N times" shortcut.
Extremely large updates might fill up the SQL server's log. To avoid that, they may need to be separated into smaller batches via go.
In your example, if updating the whole set of country codes has such a volume that it would run out of log space, the solution is to put each country code into a separate transaction, which can be done by separating them on the client with go.
Some SQL statements MUST be separated by GO from the following ones in order to work.
For example, you can't drop a table and re-create a table with the same name in a single batch, at least in Sybase (ditto for creating procedures/triggers):
> drop table tempdb.guest.x1
> create table tempdb.guest.x1 (a int)
> go
Msg 2714, Level 16, State 1
Server 'SYBDEV', Line 2
There is already an object named 'x1' in the database.
> drop table tempdb.guest.x1
> go
> create table tempdb.guest.x1 (a int)
> go
>
GO is not a statement, it's a batch separator.
The blocks separated by GO are sent by the client to the server for processing and the client waits for their results.
For instance, if you write
DELETE FROM a
DELETE FROM b
DELETE FROM c
this will be sent to the server as a single three-line query.
If you write
DELETE FROM a
GO
DELETE FROM b
GO
DELETE FROM c
this will be sent to the server as three one-line queries.
GO itself does not go to the server (no pun intended). It's a purely client-side reserved word and is only recognized by SSMS and osql.
If you use a custom query tool to send it over the connection, the server won't even recognize it and will issue an error.
Many commands need to be in their own batch, like CREATE PROCEDURE.
Or, if you add a column to a table, it should be in its own batch:
if you try to SELECT the new column in the same batch, it fails, because at parse/compile time the column does not exist.
GO is used by the SQL tools to work this out from one script; it is not a SQL keyword and is not recognised by the engine.
These are two concrete examples of day-to-day batch usage.
Edit: In your example, you don't need GO...
Edit 2, example: you can't drop, create and grant permissions on a procedure in one batch... not least because, without GO, where would the end of the stored procedure be?
IF OBJECT_ID ('dbo.uspDoStuff') IS NOT NULL
DROP PROCEDURE dbo.uspDoStuff
GO
CREATE PROCEDURE dbo.uspDoStuff
AS
SELECT Something From ATable
GO
GRANT EXECUTE ON dbo.uspDoStuff TO RoleSomeOne
GO
Sometimes there is a need to execute the same command or set of commands over and over again. This may be to insert or update test data, or it may be to put a load on your server for performance testing. Whatever the need, the easiest way to do this is to set up a while loop and execute your code, but in SQL 2005 there is an even easier way.
Let's say you want to create a test table and load it with 1000 records. You could issue the following, and it will run the INSERT 1000 times:
CREATE TABLE dbo.TEST (ID INT IDENTITY (1,1), ROWID uniqueidentifier)
GO
INSERT INTO dbo.TEST (ROWID) VALUES (NEWID())
GO 1000
Source: http://www.mssqltips.com/tip.asp?tip=1216
Other than that, it marks the "end" of an SQL block (e.g. in a stored procedure script), meaning you're in a "clean" state again; e.g., variables declared before the GO are reset (no longer defined).
As everyone already said, "GO" is not part of T-SQL. "GO" is a batch separator in SSMS, a client application used to submit queries to the database. This means that declared variables and table variables will not persist from code before the "GO" to code following it.
In fact, GO is simply the default word used by SSMS. This can be changed in the options if you want. For a bit of fun, change the option on someone else's system to use "SELECT" as a batch separator instead of "GO". Forgive my cruel chuckle.
It is used to split logical blocks. Your code is interpreted by the SQL command-line tools, and GO indicates the start of the next block of code.
It can also be used as a repeat statement with a specific count.
Try:
exec sp_who2
go 2
Some statements have to be delimited by GO in order to work; for example, CREATE VIEW must be the first statement in a batch, so the following fails unless a GO is placed after the USE:
use DB
create view thisViewCreationWillFail