kafka-connect-jdbc does not fetch consecutive timestamp from source - postgresql

I use kafka-connect-jdbc-4.0.0.jar and postgresql-9.4-1206-jdbc41.jar
Configuration of the Kafka Connect connector:
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "timestamp",
  "timestamp.column.name": "updated_at",
  "topic.prefix": "streaming.data.v2",
  "connection.password": "password",
  "connection.user": "user",
  "schema.pattern": "test",
  "query": "select * from view_source",
  "connection.url": "jdbc:postgresql://host:5432/test?currentSchema=test"
}
I have configured two connectors, one source and one sink, using the JDBC driver against a PostgreSQL database ("PostgreSQL 9.6.9"), and everything works correctly.
My doubt is about how the connector collects the source data: looking at the log, I see that between the executions of the queries there is a time difference of 21 seconds.
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:19[2019-01-11 08:20:19,070] DEBUG Resetting querier TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:49[2019-01-11 08:20:49,499] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
The first query collects data between 08:17:07.000 and 08:20:18.985, but the second collects data between 08:20:39.000 and 08:20:49.500; between the two there is a gap of 21 seconds in which there may be records...
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500
I assume that one of the values is the timestamp of the last record obtained and the other is the current timestamp.
I cannot find an explanation for this.
Is this the normal operation of the connector?
Should I assume that it will not always collect all the information?

The JDBC connector is not guaranteed to retrieve every message. For that, you need log-based Change Data Capture, which for Postgres is provided by Debezium and Kafka Connect.
You can read more about this here.
Disclaimer: I work for Confluent and wrote the above blog.
Edit: There is now a recording available of the above too, from ApacheCon 2020: 🎥 https://rmoff.dev/no-more-silos
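For reference, this is the shape of the windowed query the connector issues, reconstructed from the prepared statement and bind values in the DEBUG log above: the lower bound is the stored timestamp offset and the upper bound is the database's CURRENT_TIMESTAMP at poll time, so rows whose updated_at falls in a window the connector has already moved past are never selected. The connector's timestamp.delay.interval.ms option can delay the window to tolerate late-committing transactions, but it does not make timestamp mode lossless.
-- Reconstructed from the DEBUG log above; the bind values shown are the
-- ones logged for the first poll.
SELECT * FROM view_source
WHERE "updated_at" > '2019-01-11 08:17:07.000'  -- stored offset
  AND "updated_at" < '2019-01-11 08:20:18.985'  -- CURRENT_TIMESTAMP at poll time
ORDER BY "updated_at" ASC;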

Related

Get postgres query log statement and duration as one record

I have log_min_duration_statement=0 in the config.
When I check the log file, the SQL statement and its duration are saved in different rows.
(Not sure what I have wrong, but statement and duration are not saved together as this answer suggests.)
As I understand it, session_line_num for the duration record always equals session_line_num + 1 of the relevant statement, within the same session of course.
Is this correct? Is the query below reliable for correctly getting the statement and its duration in one row?
(The CSV log is imported into a postgres_log table.)
WITH sql_cte AS (
    SELECT session_id, session_line_num, message AS sql_statement
    FROM postgres_log
    WHERE message LIKE 'statement%'
),
durat_cte AS (
    SELECT session_id, session_line_num, message AS duration
    FROM postgres_log
    WHERE message LIKE 'duration%'
)
SELECT
    t1.session_id,
    t1.session_line_num,
    t1.sql_statement,
    t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
    ON t1.session_id = t2.session_id
    AND t1.session_line_num + 1 = t2.session_line_num;
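If the adjacency assumption holds, a small refinement makes the result easier to work with: PostgreSQL logs durations in the form 'duration: 123.456 ms', so the numeric value can be extracted for sorting and aggregation. A minimal sketch, reusing the two CTEs from the query above:
-- Same join as above, plus extraction of the numeric milliseconds from
-- the 'duration: <n> ms' message format that PostgreSQL emits.
WITH sql_cte AS (
    SELECT session_id, session_line_num, message AS sql_statement
    FROM postgres_log
    WHERE message LIKE 'statement%'
),
durat_cte AS (
    SELECT session_id, session_line_num, message AS duration
    FROM postgres_log
    WHERE message LIKE 'duration%'
)
SELECT
    t1.session_id,
    t1.sql_statement,
    substring(t2.duration FROM 'duration: ([0-9.]+) ms')::numeric AS duration_ms
FROM sql_cte t1
LEFT JOIN durat_cte t2
    ON t1.session_id = t2.session_id
    AND t1.session_line_num + 1 = t2.session_line_num;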

SQL Error: ERROR: not all tokens processed

I am getting the error below in Postgres while executing insert and delete queries. I have around 50 insert and 50 delete statements. When they are executed, I get the error:
SQL Error: ERROR: not all tokens processed
The error is not consistent. For example:
my 20th delete statement fails;
the next time the same queries are executed, the 25th delete statement fails;
and when those statements are executed alone, there is no failure.
I am not sure if it is a database load issue or an infrastructure issue.
Any suggestion would be helpful.
Below is the query:
WITH del_table_1 AS (
    DELETE FROM table_1
    WHERE to_date('01-'||col1, 'DD-mm-YYYY') < current_date - 1
    RETURNING *
)
UPDATE control_table
SET deleted_count = cnt,
    status = 'Completed',
    update_user_id = 'User',
    update_datetime = current_date
FROM (SELECT 'Table1' AS table_name, count(*) AS cnt FROM del_table_1) aa
WHERE control_table.table_name = aa.table_name
  AND control_table.table_name = 'Table1'
  AND control_table.status = 'Pending';
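Since the failure moves between runs, one way to narrow it down is to run each delete inside its own exception handler, so a failure is reported instead of aborting the batch. A minimal isolation sketch (table and column names taken from the query above):
-- The DO block catches whatever error the delete raises and reports it,
-- so the remaining statements in the batch still run.
DO $$
BEGIN
    DELETE FROM table_1
    WHERE to_date('01-'||col1, 'DD-mm-YYYY') < current_date - 1;
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'delete on table_1 failed: %', SQLERRM;
END $$;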

Is it possible to have hibernate generate update from values statements for postgresql?

Given a postgresql table
Table "public.test"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer | not null
info | text |
And the following values:
# select * from test;
id | info
----+--------------
3 | value3
4 | value4
5 | value5
As you may know, with PostgreSQL you can use this kind of statement to update multiple rows with different values:
update test set info=tmp.info from (values (3,'newvalue3'),(4,'newvalue4'),(5,'newvalue5')) as tmp (id,info) where test.id=tmp.id;
And it results in the table being updated, in a single query, to:
# select * from test;
id | info
----+--------------
3 | newvalue3
4 | newvalue4
5 | newvalue5
I have been looking around everywhere for how to make Hibernate generate this kind of statement for update queries. I know how to make it work for insert queries (with the reWriteBatchedInserts JDBC option and Hibernate batch config options).
But is it possible for update queries, or do I have to write the native query myself?
No matter what I do, Hibernate always sends separate update queries to the database (I'm looking at the PostgreSQL server statement logs for this affirmation).
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_6: BEGIN
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.895 UTC [1642] DETAIL: parameters: $1 = 'newvalue3', $2 = '3'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue4', $2 = '4'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue5', $2 = '5'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_1: COMMIT
I always find it many times faster to issue a single massive update query than many separate updates targeting single rows. Even though separate update queries are sent in a batch by the JDBC driver, they still need to be processed sequentially by the server, so it is not as efficient as a single update query targeting multiple rows. So if anyone has a solution that wouldn't involve writing native queries for my entities, I would be very glad!
Update
To refine my question, I want to add a clarification: I'm looking for a solution that wouldn't abandon Hibernate's dirty checking feature for entity updates. I'm trying to avoid writing batch update queries by hand for the general case of updating a few basic fields with different values on a list of entities. I'm currently looking into Hibernate's SPI to see if it's doable; org.hibernate.engine.jdbc.batch.spi.Batch seems to be the proper place, but I'm not quite sure yet because I've never done anything with the Hibernate SPI. Any insights would be welcome!
You can use Blaze-Persistence for this, which is a query builder on top of JPA that supports many advanced DBMS features on top of the JPA model.
It does not yet support the FROM clause in DML, but that is about to land in the next release: https://github.com/Blazebit/blaze-persistence/issues/693
Meanwhile, you can use CTEs for this. First you need to define a CTE entity (a concept of Blaze-Persistence):
@CTE
@Entity
public class InfoCte {
    @Id Integer id;
    String info;
}
I'm assuming your entity model looks roughly like this:
@Entity
public class Test {
    @Id Integer id;
    String info;
}
Then you can use Blaze-Persistence like this:
criteriaBuilderFactory.update(entityManager, Test.class, "test")
    .with(InfoCte.class, false)
        .fromValues(Test.class, "newInfos", newInfosCollection)
        .bind("id").select("newInfos.id")
        .bind("info").select("newInfos.info")
    .end()
    .set("info")
        .from(InfoCte.class, "cte")
        .select("cte.info")
        .where("cte.id").eqExpression("test.id")
    .end()
    .whereExists()
        .from(InfoCte.class, "cte")
        .where("cte.id").eqExpression("test.id")
    .end()
    .executeUpdate();
This will create an SQL query similar to the following
WITH InfoCte(id, info) AS (
    SELECT t.id, t.info
    FROM (VALUES (1, 'newValue', ...)) t(id, info)
)
UPDATE test
SET info = (SELECT cte.info FROM InfoCte cte WHERE cte.id = test.id)
WHERE EXISTS (SELECT 1 FROM InfoCte cte WHERE cte.id = test.id)
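For reference, the same shape can be tried directly in psql against the test table from the question; a minimal standalone sketch using the question's own values:
-- Standalone version of the generated query, runnable in psql.
WITH info_cte(id, info) AS (
    VALUES (3, 'newvalue3'), (4, 'newvalue4'), (5, 'newvalue5')
)
UPDATE test
SET info = (SELECT cte.info FROM info_cte cte WHERE cte.id = test.id)
WHERE EXISTS (SELECT 1 FROM info_cte cte WHERE cte.id = test.id);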

PlayFramework 2 + Ebean - raw Sql Update query - makes no effect on db

I have a Play Framework 2.0.4 application that needs to modify rows in the db.
I need to update a 'few' messages in the db to status "opened" (read messages).
I did it like below:
String sql = " UPDATE message SET opened = true, opened_date = now() "
+" WHERE id_profile_to = :id1 AND id_profile_from = :id2 AND opened IS NOT true";
SqlUpdate update = Ebean.createSqlUpdate(sql);
update.setParameter("id1", myProfileId);
update.setParameter("id2", conversationProfileId);
int modifiedCount = update.execute();
I have configured PostgreSQL to log all queries.
modifiedCount is the actual number of modified rows, but the query runs in a transaction.
After the query is done in the db there is a ROLLBACK, so the UPDATE is not made.
I have tried changing the db to H2, with the same result.
This is the query from the postgres audit log:
2012-12-18 00:21:17 CET : S_1: BEGIN
2012-12-18 00:21:17 CET : <unnamed>: UPDATE message SET opened = true, opened_date = now() WHERE id_profile_to = $1 AND id_profile_from = $2 AND opened IS NOT true
2012-12-18 00:21:17 CET : parameters: $1 = '1', $2 = '2'
2012-12-18 00:21:17 CET : S_2: ROLLBACK
..........
The Play Framework documentation and the Ebean docs state that there is no transaction (if not declared, or a transient one if needed per query).
So... I tried this trick:
Ebean.beginTransaction();
int modifiedCount = update.execute();
Ebean.commitTransaction();
Ebean.endTransaction();
Logger.info("update mod = " + modifiedCount);
But this makes no difference; the same behavior. Then:
Ebean.execute(update);
Again, the same...
Next, I annotated the method with
@Transactional(type=TxType.NEVER)
and
@Transactional(type=TxType.MANDATORY)
Neither of them made a difference.
I am so frustrated with Ebean :(
Can anybody help, please?
BTW, I set
Ebean.getServer(null).getAdminLogging().setDebugGeneratedSql(true);
Ebean.getServer(null).getAdminLogging().setDebugLazyLoad(true);
Ebean.getServer(null).getAdminLogging().setLogLevel(LogLevel.SQL);
to see the query in the Play console; other queries are logged, but this update is not.
Just remove the initial space... Yes, I couldn't believe it either...
Change from " UPDATE... to "UPDATE...
And that's all...
I think you have to use raw SQL instead of a createSqlUpdate statement.

Problems using REXX to access both Teradata output and DB2 output

I have a REXX job that needs to read from both Teradata (using BTEQ) and DB2. At present, I can get it to read from either Teradata or DB2, but not both. When I try to read from both, the Teradata read (which runs first) works fine, but the DB2 read gives an error of RC(1) upon attempting to open the cursor.
Code to read from Teradata (by and large copied from http://www.teradataforum.com/teradata/20040928_131203.htm):
ADDRESS TSO "DELETE BLAH.TEMP"
"ALLOC FI(SYSPRINT) DA(BLAH.TEMP) NEW CATALOG SP(10 10) TR RELEASE",
"UNIT(SYSDA) RECFM(F B A) LRECL(133) BLKSIZE(0) REUSE"
"ATTRIB FBATTR LRECL(220)"
"ALLOC F(SYSIN) UNIT(VIO) TRACKS SPACE(10,10) USING(FBATTR)"
/* Set up BTEQ script */
QUEUE ".RUN FILE=LOGON"
QUEUE "SELECT COLUMN1 FROM TABLE1;"
/* Run BTEQ script */
"EXECIO * DISKW SYSIN (FINIS"
"CALL 'SYS3.TDP.APPLOAD(BTQMAIN)'"; bteq_rc = rc
"FREE FI(SYSPRINT SYSIN)"
/* Read and parse BTEQ output */
"EXECIO * DISKR SYSPRINT (STEM BTEQOUT. FINIS"
DO I = 1 to BTEQOUT.0
...
END
Code to read from DB2:
ADDRESS TSO "SUBCOM DSNREXX"
IF RC THEN rcDB2 = RXSUBCOM('ADD','DSNREXX','DSNREXX')
ADDRESS DSNREXX "CONNECT " subsys
sqlQuery = "SELECT COLUMN2 FROM TABLE2;"
ADDRESS DSNREXX "EXECSQL DECLARE C001 CURSOR FOR S001"
IF SQLCODE <> 0 THEN DO
SAY 'DECLARE C001 SQLCODE = ' SQLCODE
EXIT 12
END
ADDRESS DSNREXX "EXECSQL PREPARE S001 FROM :sqlQuery"
IF SQLCODE <> 0 THEN DO
SAY 'PREPARE S001 SQLCODE = ' SQLCODE SQLERROR
EXIT 12
END
ADDRESS DSNREXX "EXECSQL OPEN C001"
IF SQLCODE <> 0 THEN DO
SAY 'OPEN C001 SQLCODE = ' SQLCODE
EXIT 12
END
ADDRESS DSNREXX "EXECSQL FETCH C001 INTO :col2"
IF SQLCODE <> 0 THEN DO
SAY 'FETCH C001 SQLCODE = ' SQLCODE
EXIT 12
END
I suspect that this has something to do with my use of SYSPRINT and SYSIN. Does anyone know how I can get this to work?
Thanks.
Edit
The question as stated was actually wrong. Apologies for not correcting this earlier.
What I had really done was this:
ADDRESS TSO "SUBCOM DSNREXX"
IF RC THEN rcDB2 = RXSUBCOM('ADD','DSNREXX','DSNREXX')
ADDRESS DSNREXX "CONNECT " subsys
...followed by a small read from DB2, then the code to read from Teradata, followed by more code to read from DB2. When this was changed so that the Teradata read came first, before doing anything with DB2 at all, it worked.
I don't think this has anything to do with SYSPRINT or SYSIN.
I think you are getting TSO RC = 1, not SQLCODE = 1 (there is no SQLCODE of 1: +1 would be a warning, -1 an error). You can look this up in the DB2 Application Programming and SQL Guide.
Turn on TRACE R and run it.
There are variables (shown below) that display info about the error/warning.
22 *-* ADDRESS DSNREXX "EXECSQL OPEN C1"
>>> "EXECSQL OPEN C1"
+++ RC(1) +++
23 *-* IF SQLCODE <> 0
28 *-* SAY 'SQLSTATE='sqlstate', SQLERRMC='sqlerrmc', SQLERRP='sqlerrp
SQLSTATE=00000, SQLERRMC=, SQLERRP=DSN
29 - SAY 'SQLERRD='sqlerrd.1', 'sqlerrd.2', 'sqlerrd.3', 'sqlerrd.4',',
sqlerrd.5', 'sqlerrd.6
SQLERRD=0, 0, 0, -1, 0, 0
32 - SAY 'SQLWARN='sqlwarn.0', 'sqlwarn.1', 'sqlwarn.2', 'sqlwarn.3',',
sqlwarn.4', 'sqlwarn.5', 'sqlwarn.6', 'sqlwarn.7',',
sqlwarn.8', 'sqlwarn.9', 'sqlwarn.10
SQLWARN= , N, , , , 2, , , , ,
For example, it could be that when both are used together, there is not enough memory.