I'm using Flink SQL to read Debezium Avro data from Kafka and store it as Parquet files in S3. Here is my code:
import os
from pyflink.datastream import StreamExecutionEnvironment, FsStateBackend
from pyflink.table import TableConfig, DataTypes, BatchTableEnvironment, StreamTableEnvironment, \
    ScalarFunction
exec_env = StreamExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
# start a checkpoint every 12 s
exec_env.enable_checkpointing(12000)
t_config = TableConfig()
t_env = StreamTableEnvironment.create(exec_env, t_config)
INPUT_TABLE = 'source'
KAFKA_TOPIC = os.environ['KAFKA_TOPIC']
KAFKA_BOOTSTRAP_SERVER = os.environ['KAFKA_BOOTSTRAP_SERVER']
OUTPUT_TABLE = 'sink'
S3_BUCKET = os.environ['S3_BUCKET']
OUTPUT_S3_LOCATION = os.environ['OUTPUT_S3_LOCATION']
ddl_source = f"""
CREATE TABLE {INPUT_TABLE} (
`event_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL,
`id` BIGINT,
`price` DOUBLE,
`type` INT,
`is_reinvite` INT
) WITH (
'connector' = 'kafka',
'topic' = '{KAFKA_TOPIC}',
'properties.bootstrap.servers' = '{KAFKA_BOOTSTRAP_SERVER}',
'scan.startup.mode' = 'earliest-offset',
'format' = 'debezium-avro-confluent',
'debezium-avro-confluent.schema-registry.url' = 'http://kafka-production-schema-registry:8081'
)
"""
ddl_sink = f"""
CREATE TABLE {OUTPUT_TABLE} (
`event_time` TIMESTAMP,
`id` BIGINT,
`price` DOUBLE,
`type` INT,
`is_reinvite` INT
) WITH (
'connector' = 'filesystem',
'path' = 's3://{S3_BUCKET}/{OUTPUT_S3_LOCATION}',
'format' = 'parquet'
)
"""
t_env.sql_update(ddl_source)
t_env.sql_update(ddl_sink)
t_env.execute_sql(f"""
INSERT INTO {OUTPUT_TABLE}
SELECT *
FROM {INPUT_TABLE}
""")
When I submit the job, I get the following error message:
pyflink.util.exceptions.TableException: Table sink 'default_catalog.default_database.sink' doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database, source]], fields=[id, price, type, is_reinvite, timestamp])
I'm using Flink 1.12.1. The source is working properly, and I have tested it using a 'print' connector as the sink. Here is a sample of the data, extracted from the task manager logs while the 'print' connector was used as the table sink:
-D(2021-02-20T17:07:27.298,14091764,26.0,9,0)
-D(2021-02-20T17:07:27.298,14099765,26.0,9,0)
-D(2021-02-20T17:07:27.299,14189806,16.0,9,0)
-D(2021-02-20T17:07:27.299,14189838,37.0,9,0)
-D(2021-02-20T17:07:27.299,14089840,26.0,9,0)
-D(2021-02-20T17:07:27.299,14089847,26.0,9,0)
-D(2021-02-20T17:07:27.300,14189859,26.0,9,0)
-D(2021-02-20T17:07:27.301,14091808,37.0,9,0)
-D(2021-02-20T17:07:27.301,14089911,37.0,9,0)
-D(2021-02-20T17:07:27.301,14099937,26.0,9,0)
-D(2021-02-20T17:07:27.302,14091851,37.0,9,0)
How can I make my table sink work with the filesystem connector?
What happens is that:
when receiving the Debezium records, Flink updates a logical table by adding, updating and removing rows based on their primary key;
the only sinks that can handle that kind of changelog are those that have a concept of update by key. JDBC would be a typical example: there it is straightforward for Flink to translate "the Flink row with key foo has been updated to bar" into "the JDBC row with key foo should be updated to value bar". The filesystem sink does not support that kind of operation, since files are append-only.
See also the Flink documentation on append and update queries.
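For illustration, here is a minimal sketch (my own addition, not part of the original setup) of what such an upsert-capable sink could look like, using the JDBC connector and a declared primary key so that Flink can apply update and delete changes. It uses the Java Table API, as in the example further down; the URL, table name and credentials are placeholders:
// Sketch of an upsert-capable sink: the JDBC connector can consume update/delete
// changes because the table declares a primary key.
tableEnv.executeSql(
        "CREATE TABLE upsert_sink (\n" +
        "  id BIGINT,\n" +
        "  price DOUBLE,\n" +
        "  `type` INT,\n" +
        "  is_reinvite INT,\n" +
        "  PRIMARY KEY (id) NOT ENFORCED\n" +
        ") WITH (\n" +
        "  'connector' = 'jdbc',\n" +
        "  'url' = 'jdbc:postgresql://localhost:5432/mydb',\n" +
        "  'table-name' = 'upsert_sink',\n" +
        "  'username' = 'postgres',\n" +
        "  'password' = 'postgres'\n" +
        ")");
For the filesystem connector there is no such key to update by, so we still have to turn the changelog into something append-only, which is what the rest of this answer is about.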
In practice, in order to turn this changelog into an append-only file, we first have to decide what exactly we want to have in that file.
If what we want is the latest version of each item in the file any time an id is updated, then to my knowledge the way to go is to convert the table to a retract stream first and then write that stream out with a FileSink. Note that in that case each record contains a boolean saying whether the row was upserted or deleted, and we have to decide how we want this information to appear in the resulting file.
Note: I used this other CDC example from the Flink SQL cookbook to reproduce a similar setup:
// assuming a Flink retract table of claims built from a CDC stream:
tableEnv.executeSql("" +
" CREATE TABLE accident_claims (\n" +
" claim_id INT,\n" +
" claim_total FLOAT,\n" +
" claim_total_receipt VARCHAR(50),\n" +
" claim_currency VARCHAR(3),\n" +
" member_id INT,\n" +
" accident_date VARCHAR(20),\n" +
" accident_type VARCHAR(20),\n" +
" accident_detail VARCHAR(20),\n" +
" claim_date VARCHAR(20),\n" +
" claim_status VARCHAR(10),\n" +
" ts_created VARCHAR(20),\n" +
" ts_updated VARCHAR(20)" +
") WITH (\n" +
" 'connector' = 'postgres-cdc',\n" +
" 'hostname' = 'localhost',\n" +
" 'port' = '5432',\n" +
" 'username' = 'postgres',\n" +
" 'password' = 'postgres',\n" +
" 'database-name' = 'postgres',\n" +
" 'schema-name' = 'claims',\n" +
" 'table-name' = 'accident_claims'\n" +
" )"
);
// convert it to a retract stream
Table accidentClaims = tableEnv.from("accident_claims");
DataStream<Tuple2<Boolean, Row>> accidentClaimsStream = tableEnv
        .toRetractStream(accidentClaims, Row.class);

// and write to file
final FileSink<Tuple2<Boolean, Row>> sink = FileSink
        // TODO: adapt the output format here:
        .forRowFormat(new Path("/tmp/flink-demo"),
                (Encoder<Tuple2<Boolean, Row>>) (element, stream) ->
                        stream.write((element.toString() + "\n").getBytes(StandardCharsets.UTF_8)))
        .build();

accidentClaimsStream.sinkTo(sink);
streamEnv.execute();
Note that during the conversion, you obtain a boolean telling you whether that row is a new value for that accident claim or a deletion of such a claim. My basic FileSink configuration above simply includes that boolean in the output; how to handle deletions has to be decided case by case.
The result in the file then looks like this:
head /tmp/flink-demo/2021-03-09--09/.part-c7cdb74e-893c-4b0e-8f69-1e8f02505199-0.inprogress.f0f7263e-ec24-4474-b953-4d8ef4641998
(true,1,4153.92,null,AUD,412,2020-06-18 18:49:19,Permanent Injury,Saltwater Crocodile,2020-06-06 03:42:25,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,2,8940.53,IpsumPrimis.tiff,AUD,323,2019-03-18 15:48:16,Collision,Blue Ringed Octopus,2020-05-26 14:59:19,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,3,9406.86,null,USD,39,2019-04-28 21:15:09,Death,Great White Shark,2020-03-06 11:20:54,INITIAL,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,4,3997.9,null,AUD,315,2019-10-26 21:24:04,Permanent Injury,Saltwater Crocodile,2020-06-25 20:43:32,IN REVIEW,2021-03-09 06:39:28,2021-03-09 06:39:28)
(true,5,2647.35,null,AUD,74,2019-12-07 04:21:37,Light Injury,Cassowary,2020-07-30 10:28:53,REIMBURSED,2021-03-09 06:39:28,2021-03-09 06:39:28)
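If, for example, deleted claims should not appear in the file at all, one option (my own choice, the cookbook does not prescribe this) is to filter the retract stream on that boolean before handing it to the sink, instead of the plain sinkTo call above:
// keep only inserts/updates (f0 == true) and drop deletions before writing
accidentClaimsStream
        .filter(element -> element.f0)
        .sinkTo(sink);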
I am using OrientDB 2.2.12.
Below is the single place from which I am getting a Connection instance:
public static synchronized Connection getOrientDbConnection(String appName)
{
    try {
        // address of the DB server
        OServerAdmin serverAdmin = new OServerAdmin("remote:" + ip + ":" + port + "/" + appName)
                .connect("root", "1234");
        if (serverAdmin.existsDatabase())
        {
            serverAdmin.close();
            Properties info = new Properties();
            info.put("user", "root");
            info.put("password", "1234");
            Connection conn = (OrientJdbcConnection) DriverManager.getConnection(
                    "jdbc:orient:remote:" + ip + ":" + port + "/" + appName, info);
            //System.out.println(" Database Connection instance returned for : " + appName + " for thread ID : " + Thread.currentThread().getId());
            return conn;
        }
        else
        {
            // "Type_Of_Db", "Type_Of_Storage"
            serverAdmin.createDatabase("GRAPH", "plocal");
            serverAdmin.close();
            OrientGraphFactory graphFactory = new OrientGraphFactory(
                    "remote:" + ip + ":" + port + "/" + appName);
            /**
             * Do this only once per database:
             * 1. Create an index on the unique field "name".
             *    The unique index is created only one time for a particular database.
             */
            OrientGraphNoTx graph = graphFactory.getNoTx();
            graph.createKeyIndex("name", Vertex.class, new Parameter<String, String>("type", "UNIQUE"));
            // shut down the graph which created the index immediately
            graph.commit();
            graph.shutdown();
            // close all db pools
            graphFactory.close();
            try
            {
                Properties info = new Properties();
                info.put("user", "root");
                info.put("password", "pcp");
                Connection conn = (OrientJdbcConnection) DriverManager.getConnection(
                        "jdbc:orient:remote:" + ip + ":" + port + "/" + appName, info);
                //System.out.println(" Database created and Connection instance return for : " + appName + " for thread ID : " + Thread.currentThread().getId());
                return conn;
            } catch (Exception e) {
                //System.out.println("Problem creating database or getting connection from database " + " for thread ID : " + Thread.currentThread().getId() + e.getMessage());
                e.getMessage();
                return null;
            }
        }
    }
    catch (Exception e)
    {
        e.getMessage();
    }
    return null;
}
My use case is:
creating vertices
creating edges
updating vertices, edges, etc.
App.java
Connection conn = DB.getOrientDbConnection(appName);
// create vertices
// create edges
conn.commit();
conn.close();
What problem am I facing?
Even after the program finishes and conn.close() has been called, JConsole shows the OrientDB connection as still active:
Name: OrientDB <- Asynch Client (/192.168.1.11:4424)
State: RUNNABLE
Total blocked: 0 Total waited: 136
Stack trace:
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(Unknown Source)
java.net.SocketInputStream.read(Unknown Source)
java.net.SocketInputStream.read(Unknown Source)
java.io.BufferedInputStream.fill(Unknown Source)
java.io.BufferedInputStream.read(Unknown Source)
- locked java.io.BufferedInputStream#1780dcd
java.io.DataInputStream.readByte(Unknown Source)
com.orientechnologies.orient.enterprise.channel.binary.OChannelBinary.readByte(OChannelBinary.java:68)
com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:189)
com.orientechnologies.orient.client.binary.OAsynchChannelServiceThread.execute(OAsynchChannelServiceThread.java:51)
com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
Is this a bug, or am I doing something wrong?
If your server is running, you will see that in JConsole.
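The "OrientDB <- Asynch Client" thread belongs to the remote binary connection that the client-side OrientDB engine keeps open. As a sketch (an assumption on my side, not verified against 2.2.12), once the application is completely done with the database, shutting down the client engine should release that socket and its service thread:
import com.orientechnologies.orient.core.Orient;

// after the last conn.close(), when no more OrientDB work will happen in this JVM,
// shut down the client-side engine to release remote sockets and service threads
Orient.instance().shutdown();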
I have an object DictionarySqlLiteDao which should create a table, if it does not exist, during its initialization. But I get an exception, despite the fact that the database is created in the file system. Help me figure out the reason for it. Here is my code:
object DictionarySqlLiteDao extends Dao[Word] {
  private val CREATE_WORD_TABLE_SQL = "CREATE TABLE IF NOT EXISTS thelper.WORD " +
    "(ID INTEGER PRIMARY KEY AUTOINCREMENT, " +
    "WORD TEXT NOT NULL, " +
    "UKR TEXT NOT NULL, " +
    "ENG TEXT NOT NULL, " +
    "EXAMPLE TEXT)"
  private val SAVE_WORD_SQL = "INSERT INTO WORD(WORD, UKR, ENG, EXAMPLE" +
    "VALUES(?, ?, ?, ?)"
  private val connection: Connection = {
    Class.forName("org.sqlite.JDBC")
    DriverManager.getConnection("jdbc:sqlite:thelper")
  }
  {
    connection.createStatement().executeUpdate(CREATE_WORD_TABLE_SQL)
  }
  override def save(word: Word): Word = {
    val statement = connection.prepareStatement(SAVE_WORD_SQL, Statement.RETURN_GENERATED_KEYS)
    statement.setString(1, word.word)
    statement.setString(2, word.translation("ukr"))
    statement.setString(3, word.translation("eng"))
    statement.setString(4, word.example.orNull)
    statement.executeUpdate()
    val id = statement.getGeneratedKeys.getInt(1)
    word.copy(id = id)
  }
  def main(args: Array[String]) {
    println(connection == null) // here I get false
    println(connection.createStatement().execute("select date('now')")) // here I get true
    val w = save(Word(-1, "germanWord", Map("ukr" -> "ukrTranslation", "eng" -> "engTranslation"), Option.empty[String])) // and here I get an Exception
    println(w)
  }
}
And here is the exception I get:
Exception in thread "main" java.lang.ExceptionInInitializerError
at thelper.Dictionary.dao.DictionarySqlLiteDao.main(DictionarySqlLiteDao.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.sql.SQLException: unknown database thelper
at org.sqlite.core.NativeDB.throwex(NativeDB.java:397)
at org.sqlite.core.NativeDB._exec(Native Method)
at org.sqlite.jdbc3.JDBC3Statement.executeUpdate(JDBC3Statement.java:116)
at thelper.Dictionary.dao.DictionarySqlLiteDao$.<init>(DictionarySqlLiteDao.scala:23)
at thelper.Dictionary.dao.DictionarySqlLiteDao$.<clinit>(DictionarySqlLiteDao.scala)
... 6 more
As far as I remember, you should specify the full path to the database; in your case it should be thelper.db, like:
DriverManager.getConnection("jdbc:sqlite:thelper.db")
P.S. After that you can omit the database name: "CREATE TABLE IF NOT EXISTS WORD " + ...
P.P.S. You can use triple quotes to write multiline strings, like:
"""CREATE TABLE IF NOT EXISTS WORD(
| ID INTEGER PRIMARY KEY AUTOINCREMENT,
| WORD TEXT NOT NULL,
| UKR TEXT NOT NULL,
| ENG TEXT NOT NULL,
| EXAMPLE TEXT)""".stripMargin
In my sample app, the following snippet works fine.
SELECT_CQL = "SELECT * FROM " + STREAM_NAME_IN_CASSANDRA + " WHERE '" + CONTEXT_ID_COLUMN + "'=?";
Connection connection = getConnection();
statement = connection.prepareStatement(SELECT_CQL);
statement.setString(1, "123");
resultSet = statement.executeQuery();
But when I try to add another parameter to the WHERE clause, the query returns nothing!
SELECT_CQL = "SELECT * FROM " + STREAM_NAME_IN_CASSANDRA + " WHERE '" + CONTEXT_ID_COLUMN + "'=? AND '"+TIMESTAMP_COLUMN+"'=?";
Connection connection = getConnection();
statement = connection.prepareStatement(SELECT_CQL);
statement.setString(1, "123");
statement.setString(2, "1390996577514");
resultSet = statement.executeQuery();
When I try the exact query in the cqlsh terminal, it works fine.
statement.setString(2, "1390996577514");
Double-check the datatype of your TIMESTAMP_COLUMN and make sure that it's a string. Otherwise you'll need to use the appropriate "set" method. Ex:
statement.setLong(2, 1390996577514L);
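For instance, assuming CONTEXT_ID_COLUMN is a text column and TIMESTAMP_COLUMN is a bigint, the whole statement could look like this (a sketch reusing the names from your snippet):
SELECT_CQL = "SELECT * FROM " + STREAM_NAME_IN_CASSANDRA
        + " WHERE '" + CONTEXT_ID_COLUMN + "'=? AND '" + TIMESTAMP_COLUMN + "'=?";
Connection connection = getConnection();
statement = connection.prepareStatement(SELECT_CQL);
statement.setString(1, "123");        // text column: bind as a String
statement.setLong(2, 1390996577514L); // bigint column: bind as a long, not a String
resultSet = statement.executeQuery();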
I have the following problem. I encountered an error regarding default parameters, so I added a simple overload. Now the new code (line 3: CreatePageSection(sItemID, "", null);) gives the error mentioned in the title.
I looked for the answer in other topics, but I can't find the problem. Can someone help me?
The code is here:
public void CreatePageSection(ref string sItemID)
{
    CreatePageSection(sItemID, "", null);
}

public void CreatePageSection(ref string sItemID, ref string sFrameUrl, ref object vOnReadyState)
{
    if (Strings.InStr(msPresentPageSections, "|" + sItemID + "|", 0) > 0) {
        return;
    }
    msPresentPageSections = msPresentPageSections + sItemID + "|";
    string writeHtml = "<div class=" + MConstants.QUOTE + "PageSection" + MConstants.QUOTE + " id=" + MConstants.QUOTE + "Section" + sItemID + "Div" + MConstants.QUOTE + " style=" + MConstants.QUOTE + "display: none;" + MConstants.QUOTE + ">";
    this.WriteLine_Renamed(ref writeHtml);
    //UPGRADE_WARNING: Couldn't resolve default property of object vOnReadyState. Click for more: 'ms-help://MS.VSCC.v90/dv_commoner/local/redirect.htm?keyword="6A50421D-15FE-4896-8A1B-2EC21E9037B2"'
    //UPGRADE_NOTE: IsMissing() was changed to IsNothing_Renamed(). Click for more: 'ms-help://MS.VSCC.v90/dv_commoner/local/redirect.htm?keyword="8AE1CB93-37AB-439A-A4FF-BE3B6760BB23"'
    writeHtml = " <iframe id=" + MConstants.QUOTE + sItemID + "Frame" + MConstants.QUOTE + " name=" + MConstants.QUOTE + sItemID + "Frame" + MConstants.QUOTE + " frameborder=" + MConstants.QUOTE + "0" + MConstants.QUOTE + (!string.IsNullOrEmpty(sFrameUrl) ? " src=" + MConstants.QUOTE + sFrameUrl + MConstants.QUOTE : "") + ((vOnReadyState == null) ? "" : " onreadystatechange=" + MConstants.QUOTE + Convert.ToString(vOnReadyState) + MConstants.QUOTE) + ">";
    this.WriteLine_Renamed(ref writeHtml);
    writeHtml = " </iframe>";
    this.WriteLine_Renamed(ref writeHtml);
    writeHtml = "</div>";
    this.WriteLine_Renamed(ref writeHtml);
}
You must pass the parameters by reference:
public void CreatePageSection(ref string sItemID)
{
    var missingString = String.Empty;
    object missingObject = null;
    CreatePageSection(ref sItemID, ref missingString, ref missingObject);
}
Since you are not manipulating sFrameUrl and vOnReadyState, remove the ref keyword from those parameters.
See: http://msdn.microsoft.com/en-us/library/14akc2c7(v=vs.71).aspx
hth
Mario