I'm using Flink SQL to read Debezium Avro data from Kafka and store it as Parquet files in S3. Here is my code:
import os
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import TableConfig, StreamTableEnvironment
exec_env = StreamExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
# start a checkpoint every 12 s
exec_env.enable_checkpointing(12000)
t_config = TableConfig()
t_env = StreamTableEnvironment.create(exec_env, t_config)
INPUT_TABLE = 'source'
KAFKA_TOPIC = os.environ['KAFKA_TOPIC']
KAFKA_BOOTSTRAP_SERVER = os.environ['KAFKA_BOOTSTRAP_SERVER']
OUTPUT_TABLE = 'sink'
S3_BUCKET = os.environ['S3_BUCKET']
OUTPUT_S3_LOCATION = os.environ['OUTPUT_S3_LOCATION']
ddl_source = f"""
CREATE TABLE {INPUT_TABLE} (
`event_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL,
`id` BIGINT,
`price` DOUBLE,
`type` INT,
`is_reinvite` INT
) WITH (
'connector' = 'kafka',
'topic' = '{KAFKA_TOPIC}',
'properties.bootstrap.servers' = '{KAFKA_BOOTSTRAP_SERVER}',
'scan.startup.mode' = 'earliest-offset',
'format' = 'debezium-avro-confluent',
'debezium-avro-confluent.schema-registry.url' = 'http://kafka-production-schema-registry:8081'
)
"""
ddl_sink = f"""
CREATE TABLE {OUTPUT_TABLE} (
`event_time` TIMESTAMP,
`id` BIGINT,
`price` DOUBLE,
`type` INT,
`is_reinvite` INT
) WITH (
'connector' = 'filesystem',
'path' = 's3://{S3_BUCKET}/{OUTPUT_S3_LOCATION}',
'format' = 'parquet'
)
"""
# sql_update() is deprecated since Flink 1.11; execute_sql() runs the DDL immediately
t_env.execute_sql(ddl_source)
t_env.execute_sql(ddl_sink)
t_env.execute_sql(f"""
INSERT INTO {OUTPUT_TABLE}
SELECT *
FROM {INPUT_TABLE}
""")
When I submit the job, I get the following error message:
pyflink.util.exceptions.TableException: Table sink 'default_catalog.default_database.sink' doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database, source]], fields=[id, price, type, is_reinvite, timestamp])
I'm using Flink 1.12.1. The source works properly; I have tested it using the 'print' connector in the sink. Here is a sample of the data, extracted from the task manager logs when using the 'print' connector in the table sink:
-D(2021-02-20T17:07:27.298,14091764,26.0,9,0)
-D(2021-02-20T17:07:27.298,14099765,26.0,9,0)
-D(2021-02-20T17:07:27.299,14189806,16.0,9,0)
-D(2021-02-20T17:07:27.299,14189838,37.0,9,0)
-D(2021-02-20T17:07:27.299,14089840,26.0,9,0)
-D(2021-02-20T17:07:27.299,14089847,26.0,9,0)
-D(2021-02-20T17:07:27.300,14189859,26.0,9,0)
-D(2021-02-20T17:07:27.301,14091808,37.0,9,0)
-D(2021-02-20T17:07:27.301,14089911,37.0,9,0)
-D(2021-02-20T17:07:27.301,14099937,26.0,9,0)
-D(2021-02-20T17:07:27.302,14091851,37.0,9,0)
How can I make my table sink work with the filesystem connector?
What happens is this:
When receiving the Debezium records, Flink updates a logical table by adding, updating and removing Flink rows based on their primary key.
The only sinks that can handle that kind of information are those that have a concept of update by key. JDBC would be a typical example: there it's straightforward for Flink to translate "the Flink row with key foo has been updated to bar" into "the JDBC row with key foo should be updated with value bar". The filesystem sink does not support that kind of operation, since files are append-only.
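To make the difference concrete, here is a small plain-Python sketch (no Flink involved) that replays a changelog two ways. The first is into a store keyed by id, which is what a keyed sink such as JDBC can do. The second is into an append-only list of lines, which is all a file can do. The op codes are the short strings Flink prints for row kinds (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete, as in the `-D(...)` rows above). The function names and sample values are illustrative only.

```python
from typing import Dict, List, Tuple

# One change: (op, id, price). op follows Flink's RowKind short strings:
# "+I" insert, "-U" update-before, "+U" update-after, "-D" delete.
Change = Tuple[str, int, float]


def apply_to_keyed_store(changes: List[Change]) -> Dict[int, float]:
    """Replay changes into a dict keyed by id, like a sink with a primary key."""
    table: Dict[int, float] = {}
    for op, key, price in changes:
        if op in ("+I", "+U"):
            table[key] = price        # insert, or overwrite the row for this key
        elif op in ("-U", "-D"):
            table.pop(key, None)      # retract the previous row for this key
        else:
            raise ValueError(f"unknown op {op!r}")
    return table


def append_only_lines(changes: List[Change]) -> List[str]:
    """A file can only grow, so every change (including deletes) becomes a new line."""
    return [f"{op},{key},{price}" for op, key, price in changes]


if __name__ == "__main__":
    changes = [
        ("+I", 14091764, 26.0),
        ("-U", 14091764, 26.0),
        ("+U", 14091764, 37.0),
        ("+I", 14099765, 26.0),
        ("-D", 14099765, 26.0),
    ]
    print(apply_to_keyed_store(changes))    # {14091764: 37.0}
    print(len(append_only_lines(changes)))  # 5
```

The keyed store ends with one row per live id. The append-only output keeps all five events, including the delete, and something downstream has to interpret them.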
See also the Flink documentation on append and update queries.
In practice, in order to do the conversion, we first have to decide what exactly we want to have in this append-only file.
If what we want is to have in the file the latest version of each item any time an id is updated, then to my knowledge the way to go is to convert the table to a stream first and then write that stream with a FileSink. Note that in that case each result carries a boolean flag saying whether the row is being added (an insert, or the new version of an update) or retracted (a deletion, or the old version of an update). We have to decide how we want this information to be visible in the resulting file.
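As a rough illustration of what that boolean encodes: in a retract stream, additions (inserts and update-after rows) carry `True`, and retractions (update-before rows and deletes) carry `False`. The Python sketch below mimics that mapping on plain tuples. It is not Flink's implementation, and the names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Flink's RowKind short strings: "+" kinds add a row, "-" kinds retract one.
ADD_KINDS = {"+I", "+U"}
RETRACT_KINDS = {"-U", "-D"}


def to_retract_tuples(changelog: Iterable[Tuple[str, tuple]]) -> List[Tuple[bool, tuple]]:
    """Map (row_kind, row) pairs to (flag, row) pairs: True = add, False = retract."""
    out: List[Tuple[bool, tuple]] = []
    for kind, row in changelog:
        if kind in ADD_KINDS:
            out.append((True, row))
        elif kind in RETRACT_KINDS:
            out.append((False, row))
        else:
            raise ValueError(f"unknown row kind {kind!r}")
    return out


if __name__ == "__main__":
    log = [("+I", (1, 26.0)), ("-U", (1, 26.0)), ("+U", (1, 37.0)), ("-D", (1, 37.0))]
    print(to_retract_tuples(log))
    # [(True, (1, 26.0)), (False, (1, 26.0)), (True, (1, 37.0)), (False, (1, 37.0))]
```

An update becomes a retraction of the old row followed by an addition of the new one. The flag alone therefore does not tell you whether a `False` came from an update or from a real delete.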
Note: I used this other CDC example from the Flink SQL cookbook to reproduce a similar setup:
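import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.serialization.Encoder;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
// FileSink only finalizes part files on checkpoints, so checkpointing must be enabled
streamEnv.enableCheckpointing(10_000);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(streamEnv);

// assuming a Flink retract table of claims built from a CDC stream:
tableEnv.executeSql("" +
" CREATE TABLE accident_claims (\n" +
" claim_id INT,\n" +
" claim_total FLOAT,\n" +
" claim_total_receipt VARCHAR(50),\n" +
" claim_currency VARCHAR(3),\n" +
" member_id INT,\n" +
" accident_date VARCHAR(20),\n" +
" accident_type VARCHAR(20),\n" +
" accident_detail VARCHAR(20),\n" +
" claim_date VARCHAR(20),\n" +
" claim_status VARCHAR(10),\n" +
" ts_created VARCHAR(20),\n" +
" ts_updated VARCHAR(20)\n" +
") WITH (\n" +
" 'connector' = 'postgres-cdc',\n" +
" 'hostname' = 'localhost',\n" +
" 'port' = '5432',\n" +
" 'username' = 'postgres',\n" +
" 'password' = 'postgres',\n" +
" 'database-name' = 'postgres',\n" +
" 'schema-name' = 'claims',\n" +
" 'table-name' = 'accident_claims'\n" +
" )"
);
// convert it to a stream: each element is (true = add, false = retract, row)
Table accidentClaims = tableEnv.from("accident_claims");
DataStream<Tuple2<Boolean, Row>> accidentClaimsStream = tableEnv
.toRetractStream(accidentClaims, Row.class);
// and write to file
final FileSink<Tuple2<Boolean, Row>> sink = FileSink
// TODO: adapt the output format here:
.forRowFormat(new Path("/tmp/flink-demo"),
(Encoder<Tuple2<Boolean, Row>>) (element, stream) -> stream.write((element.toString() + "\n").getBytes(StandardCharsets.UTF_8)))
.build();
accidentClaimsStream.sinkTo(sink);
streamEnv.execute();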
Note that during the conversion you obtain a boolean telling you whether that row is a new value for that accident claim or a retraction of a previous value (for example because the claim was deleted). My basic FileSink configuration simply includes that boolean in the output; how to handle deletions has to be decided case by case.
The result in the file then looks like this:
head /tmp/flink-demo/2021-03-09--09/.part-c7cdb74e-893c-4b0e-8f69-1e8f02505199-0.inprogress.f0f7263e-ec24-4474-b953-4d8ef4641998
(true,1,4153.92,null,AUD,412,2020-06-18 18:49:19,Permanent Injury,Saltwater Crocodile,2020-06-06 03:42:25,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,2,8940.53,IpsumPrimis.tiff,AUD,323,2019-03-18 15:48:16,Collision,Blue Ringed Octopus,2020-05-26 14:59:19,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,3,9406.86,null,USD,39,2019-04-28 21:15:09,Death,Great White Shark,2020-03-06 11:20:54,INITIAL,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,4,3997.9,null,AUD,315,2019-10-26 21:24:04,Permanent Injury,Saltwater Crocodile,2020-06-25 20:43:32,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,5,2647.35,null,AUD,74,2019-12-07 04:21:37,Light Injury,Cassowary,2020-07-30 10:28:53,REIMBURSED,2021-03-09 06:39:28,2021-03-09 06:39:28)
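Here are two possible policies for turning such (flag, row) pairs into file lines, sketched in Python on plain tuples rather than Flink types. One keeps every event with an explicit op column. The other keeps only additions and drops retractions. The policy names and the empty-string rendering of nulls are arbitrary choices for this sketch, not anything FileSink does by itself.

```python
from typing import Iterable, List, Optional, Tuple

Element = Tuple[bool, tuple]  # (is_add, row), as produced by a retract stream


def encode_line(element: Element, policy: str = "flag") -> Optional[str]:
    """Render one element as a CSV line, or return None if the policy drops it."""
    is_add, row = element
    fields = ",".join("" if v is None else str(v) for v in row)  # None -> empty field
    if policy == "flag":
        # keep everything; the first column says whether the row was added or retracted
        return ("ADD," if is_add else "RETRACT,") + fields
    if policy == "adds_only":
        # keep only additions; deletes and update-before rows are dropped
        return fields if is_add else None
    raise ValueError(f"unknown policy {policy!r}")


def encode_all(elements: Iterable[Element], policy: str = "flag") -> List[str]:
    """Encode a batch of elements, skipping the ones the policy drops."""
    lines = (encode_line(e, policy) for e in elements)
    return [line for line in lines if line is not None]


if __name__ == "__main__":
    elements = [(True, (1, 4153.92, None, "AUD")), (False, (1, 4153.92, None, "AUD"))]
    print(encode_all(elements))               # ['ADD,1,4153.92,,AUD', 'RETRACT,1,4153.92,,AUD']
    print(encode_all(elements, "adds_only"))  # ['1,4153.92,,AUD']
```

With `adds_only`, a deleted claim still appears in the file with its last added values. That policy only fits if deletions can be ignored downstream.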
I am using a TypeHandler to map a List<Dep> to an Oracle array of ... Here is the setParameter method in the handler:
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Struct;
import java.util.ArrayList;
import java.util.List;
import oracle.jdbc.OracleConnection;
import org.apache.ibatis.type.JdbcType;

public void setParameter(PreparedStatement ps, int i, List<Dep> parameter, JdbcType jdbcType)
        throws SQLException {
    Connection connection = ps.getConnection();
    // StructDescriptor structDescriptor = StructDescriptor.createDescriptor("MEMS_ARR", connection);
    Struct[] structs = null;
    if (parameter != null && parameter.size() > 0) {
        structs = new Struct[parameter.size()];
        for (int index = 0; index < parameter.size(); index++) {
            Dep dep = parameter.get(index);
            // one value per attribute of the MEMS object type (7 attributes)
            Object[] params = new Object[7];
            params[0] = dep.getOrder();
            params[1] = dep.getIdTp();
            params[2] = dep.getId();
            params[3] = " ";
            params[4] = " ";
            params[5] = " ";
            params[6] = " ";
            // STRUCT struct = new STRUCT(structDescriptor, ps.getConnection(), params);
            structs[index] = connection.createStruct("MEMS", params);
        }
        // ArrayDescriptor desc = ArrayDescriptor.createDescriptor("MEMS_ARR", ps.getConnection());
        // ARRAY oracleArray = new ARRAY(desc, ps.getConnection(), structs);
    } else {
        parameter = new ArrayList<>();
        structs = new Struct[0];
    }
    this.parameter = parameter;
    Array oracleArray = ((OracleConnection) connection).createOracleArray("MEMS_ARR", structs);
    ps.setArray(i, oracleArray);
}
And here is the MEMS type:
create or replace TYPE MEMS AS OBJECT
( MEM1 NUMBER(2,0),
MEM2 VARCHAR2(1),
MEM3 VARCHAR2(15),
MEM4 VARCHAR2(60),
MEM5 VARCHAR2(1),
MEM6 VARCHAR2(40),
MEM7 VARCHAR2(10)
);
And here is the portion of the XML mapping file that uses the TypeHandler:
#{nat,javaType=String,jdbcType=VARCHAR,mode=IN}, --nat
#{deps,javaType=List,jdbcType=ARRAY,mode=IN,jdbcTypeName=MEMS_ARR,typeHandler=com.my.package.MyHandler}, --mems
#{res,javaType=String,jdbcType=VARCHAR,mode=OUT} --res
The error log is as follows:
Error querying database. Cause: java.sql.SQLException: ORA-06550: line 31, column 5: PLS-00103: Encountered the symbol "" when expecting one of the following: . ( ) , * # % & = - + < / > at in is mod remainder not rem => <an exponent (**)> <> or != or ~= >= <= <> and or like like2 like4 likec between || indicator multiset member submultiset The symbol "(" was substituted for "" to continue. ORA-06550: line 44, column 4: PLS-00103: Encountered the symbol ";" when expecting one of the following: . ( ) , * % & = - + < / > at in is mod remainder not rem => <an exponent (**)> <> or != or ~= >= <= <> and or like like2 like4 likec between || multiset ### The error may exist in file [E:\path\to\mapper\ADao.xml] ### The error may involve my.package.ADao.mthodToCall -Inline ### The error occurred while setting parameters ### SQL: {call MY_PROC( ... , --nat?, --mems? --res)}
As you can see in the log, the mems parameter is either replaced by an empty string or merged with the next argument, res; the comma between them is missing.
Also note that I already debugged inside the MyBatis code and confirmed that the mapping's setParameter method is called and the input List is mapped correctly to the Oracle array. The issue happens at the time of the actual call.
The actual issue was that I had simply missed a comma between two earlier parameters in the mapping, but the error message pointed to the wrong parameter.
I have been testing some answers I found, but they had no effect. What am I missing? Please see below.
String createTableCommand = "CREATE TABLE " + TABLE_NAME + "("
+ COL_USER_ID +" INTEGER PRIMARY KEY,"
+ COL_USER_FULL_NAME + " TEXT NOT NULL,"
+ COL_USER_PASSWORD + " TEXT NOT NULL)";
@Override
public void onCreate(SQLiteDatabase db) {
    db.execSQL(createTableCommand);
}
public void insertUsers(User user) {
    // inserts modify the database, so ask for a writable handle
    SQLiteDatabase db = this.getWritableDatabase();
    ContentValues values = new ContentValues();
    values.put(COL_USER_ID, user.getid());
    values.put(COL_USER_FULL_NAME, user.getUserFullName());
    values.put(COL_USER_PASSWORD, user.getPassword());
    db.insert(TABLE_NAME, null, values);
    Log.d("Inside insertUsers(): ", "INSERTED " + user.getid() + " " + user.getUserFullName() + " " + user.getPassword());
}
Error: E/SQLiteLog: (1) table User has no column named ID
10-30 21:04:09.791 15860-15860/com.bustracker.usc.uscbt E/SQLiteDatabase: Error inserting ID=15200002 PASSWORD=shansay FULLNAME=Shansay
android.database.sqlite.SQLiteException: table User has no column named ID (code 1): , while compiling: INSERT INTO User(ID,PASSWORD,FULLNAME) VALUES (?,?,?)
at android.database.sqlite.SQLiteConnection.nativePrepareStatement(Native Method)
at android.database.sqlite.SQLiteConnection.acquirePreparedStatement(SQLiteConnection.java:889)
at android.database.sqlite.SQLiteConnection.prepare(SQLiteConnection.java:500)
at android.database.sqlite.SQLiteSession.prepare(SQLiteSession.java:588)
at android.database.sqlite.SQLiteProgram.<init>(SQLiteProgram.java:58)
at android.database.sqlite.SQLiteStatement.<init>(SQLiteStatement.java:31)
at android.database.sqlite.SQLiteDatabase.insertWithOnConflict(SQLiteDatabase.java:1472)
at android.database.sqlite.SQLiteDatabase.insert(SQLiteDatabase.java:1343)
at com.bustracker.usc.uscbt.com.usc.tc.dbhandler.UserDbHandler.insertUsers(UserDbHandler.java:62)
at com.bustracker.usc.uscbt.com.bustracker.com.uscbt.model.UserCRUD$1.onClick(UserCRUD.java:39)
at android.view.View.performClick(View.java:5637)
at android.view.View$PerformClick.run(View.java:22429)
at android.os.Handler.handleCallback(Handler.java:751)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:154)
at android.app.ActivityThread.main(ActivityThread.java:6119)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:886)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:776)
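The message means that the `User` table that actually exists in the database file has no `ID` column, even though `createTableCommand` declares one. One common way this happens is that the table was created by an earlier version of the code. This is an assumption, since the question doesn't show the helper's version number. `SQLiteOpenHelper.onCreate()` is only called when the database is created for the first time, so a later edit to `createTableCommand` is not applied to an existing database.

The Python `sqlite3` sketch below reproduces the same error with a hypothetical older schema and uses `PRAGMA table_info` to list the real columns. It then drops and recreates the table, roughly what an `onUpgrade()` that drops the table would do.

```python
import sqlite3

# Schema the question's createTableCommand produces (column names from the logged INSERT)
CREATE_CURRENT = ("CREATE TABLE User(ID INTEGER PRIMARY KEY, "
                  "FULLNAME TEXT NOT NULL, PASSWORD TEXT NOT NULL)")
# Hypothetical older schema left behind by a previous version of the app
CREATE_OLD = "CREATE TABLE User(FULLNAME TEXT NOT NULL, PASSWORD TEXT NOT NULL)"
INSERT = "INSERT INTO User(ID,PASSWORD,FULLNAME) VALUES (?,?,?)"
ROW = (15200002, "shansay", "Shansay")


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the columns SQLite actually has for `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    return [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]


def try_insert(conn: sqlite3.Connection):
    """Run the INSERT from the log; return None on success, else the error text."""
    try:
        conn.execute(INSERT, ROW)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


def recreate(conn: sqlite3.Connection) -> None:
    """Drop and recreate the table, roughly what a drop-and-recreate onUpgrade() does."""
    conn.execute("DROP TABLE IF EXISTS User")
    conn.execute(CREATE_CURRENT)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_OLD)
    print(column_names(conn, "User"))  # ['FULLNAME', 'PASSWORD']
    print(try_insert(conn))            # table User has no column named ID
    recreate(conn)
    print(column_names(conn, "User"))  # ['ID', 'FULLNAME', 'PASSWORD']
    print(try_insert(conn))            # None
```

On the device, the corresponding options are the following. You can increase the version number passed to the `SQLiteOpenHelper` constructor so that `onUpgrade()` runs and rebuilds the table. Alternatively, you can clear the app's data or reinstall the app, so that the database file is deleted and `onCreate()` runs again.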