KSQLDB Create Stream Select AS with Headers - apache-kafka

Is there a way in ksqlDB to add headers when creating a stream with CREATE STREAM ... AS SELECT?
For example, I have a stream DomainOrgs(DomainId INT, OrgId INT, someId INT). I need to create a new stream that contains all the values in DomainOrgs and also writes DomainId to a header.
I tried:
CREATE STREAM DomainOrgs_with_header AS
SELECT DomainId,
OrgId,
someId,
DomainId HEADER('DomainId')
FROM DomainOrgs
EMIT CHANGES;
I also tried:
CREATE STREAM DomainOrgs_with_header
(
DomainId INT,
OrgId INT,
someId INT,
DomainId_Header Header('DomainId')
)
INSERT INTO DomainOrgs_with_header
SELECT DomainId,OrgId,someId,DomainId FROM DomainOrgs
Here the stream is created, but the INSERT INTO fails.
Is there any way to select data into the stream with headers?

Related

Using PARTITION BY in KSQL, but the topic generated by the stream has garbled keys

`CREATE STREAM TEST_STREAM_JSON
(id INT
,age INT
,name VARCHAR)
WITH (KAFKA_TOPIC = 'test_partition_key_stream', VALUE_FORMAT = 'JSON');
CREATE STREAM TEST_STREAM_AVRO WITH (PARTITIONS=3, FORMAT = 'AVRO')
AS SELECT ID AS IDPK, AS_VALUE(ID) AS ID, AGE, NAME FROM TEST_STREAM_JSON
PARTITION BY ID;`
But the keys of the generated topic TEST_STREAM_AVRO show up as garbled characters in the UI.
I have tried setting KEY_FORMAT to JSON, but that does not work.

Flink SQL CLI: reading header records

I'm new to the Flink SQL CLI and I want to create a sink from my Kafka cluster.
I've read the documentation and, as I understand it, the headers are of type MAP<STRING, BYTES> and carry all the important information.
When using the SQL CLI, I try to create a sink table with this command:
CREATE TABLE KafkaSink (
`headers` MAP<STRING, BYTES> METADATA
) WITH (
'connector' = 'kafka',
'topic' = 'MyTopic',
'properties.bootstrap.servers' ='LocalHost',
'properties.group.id' = 'MyGroypID',
'scan.startup.mode' = 'earliest-offset',
'value.format' = 'json'
);
But when I try to read the data with select * from KafkaSink limit 10; it returns null records.
I've tried running queries like
select headers.col1 from a limit 10;
I've also tried creating the sink table with different types for the headers column:
...
`headers` STRING
...
...
`headers` MAP<STRING, STRING>
...
...
`headers` ROW(COL1 VARCHAR, COL2 VARCHAR...)
...
But it returns nothing. However, when I select the offset column from the Kafka cluster, I get the offsets but not the headers.
Can someone explain my error?
I want to create a Kafka sink with the Flink SQL CLI.
OK, as far as I could tell, when I changed to
'format' = 'debezium-json'
I could see the JSON more clearly.
I followed the JSON schema, which in my case was
{
"data": {...},
"metadata":{...}
}
So instead of reading the headers, I read the data with all the columns I need: data as a string, and the columns as, for example,
data.col1, data.col2
To see the records, a simple query is enough:
select
json_value(data, '$.Col1') as Col1
from Table;
it works!

ksqldb stream created with EXTRACTJSONFIELD fields contains null values for these fields

Hi everyone. I am trying to create a stream from another stream (using CREATE STREAM ... AS SELECT) that contains JSON data stored as VARCHAR. My first intention was to use EXTRACTJSONFIELD to extract specific JSON fields into separate columns. But when I query the new stream, I get null values for the fields created using EXTRACTJSONFIELD.
Example:
My payload looks like this:
{
"data": {
"some_value": 3702,
},
"metadata": {
"timestamp": "2022-05-23T23:54:38.477934Z",
"table-name": "some_table"
}
}
I create stream using
CREATE STREAM SOME_STREAM (data VARCHAR, metadata VARCHAR) WITH (KAFKA_TOPIC='SOME_TOPIC', VALUE_FORMAT='JSON');
Then I try to create a new stream using:
CREATE STREAM SOME_STREAM_QUERYABLE
AS SELECT
DATA,
METADATA,
EXTRACTJSONFIELD(METADATA, '$.timestamp') as timestamp
FROM SOME_STREAM
EMIT CHANGES;
Querying this new stream gives me null for the timestamp field. The data and metadata fields contain valid JSON.

How to parse array of struct when creating stream in KSQL

I'm trying to access a key called session_id. Here is what the JSON of my message looks like:
{
"events":[
{
"data":{
"session_id":"-8479590123628083695"}}]}
This is my KSQL code to create the stream:
CREATE STREAM stream_test(
events ARRAY< data STRUCT< session_id varchar> >
) WITH (KAFKA_TOPIC='my_topic',
VALUE_FORMAT='JSON')
But I get this error:
KSQLError: ("line 4:22: mismatched input 'STRUCT' expecting {'(', 'ARRAY', '>'}", 40001, None)
Does anyone know how to unpack this kind of structure? I'm struggling to find examples.
My solution:
events ARRAY< STRUCT<data STRUCT< session_id varchar >> >
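With that column definition in place, the nested value can be read by indexing into the array and dereferencing the structs. The following is only a minimal sketch, assuming the stream and topic names from the question and that only the first array element is of interest. Array indexes are one-based in current ksqlDB releases; older KSQL versions may behave differently, so check the index against your version.

```sql
-- Stream definition using the corrected nested type from the solution above.
CREATE STREAM stream_test (
  events ARRAY<STRUCT<data STRUCT<session_id VARCHAR>>>
) WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='JSON');

-- Read session_id from the first array element.
-- The -> operator dereferences a field of a STRUCT.
SELECT events[1]->data->session_id AS session_id
FROM stream_test
EMIT CHANGES;
```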

R2DBC: Why do RowsFetchSpec<T> operators (.all(), .first(), .one()) signal onComplete although the record is stored in the database?

I am using DatabaseClient to build a custom repository. After I insert or update an Item, I need the row data so that I can return the saved/updated Item. I just can't wrap my head around why .all(), .first(), and .one() do not return the result map, although I can see that the data is inserted/updated in the database. They just signal onComplete, yet .rowsUpdated() returns 1 row updated.
I observed this behaviour with H2 and MS SQL Server.
I'm new to R2DBC. What am I missing? Any ideas?
@Transactional
public Mono<Item> insertItem(Item entity){
return dbClient
.sql("insert into items (creationdate, name, price, traceid, referenceid) VALUES (:creationDate, :name, :price, :traceId, :referenceId)")
.bind("creationDate", entity.getCreationDate())
.bind("name", entity.getName())
.bind("price", entity.getPrice())
.bind("traceId", entity.getTraceId())
.bind("referenceId", entity.getReferenceId())
.fetch()
.first() //.all() //.one()
.map(Item::new)
.doOnNext(item -> LOGGER.info(String.format("Item: %s", item)));
}
The table looks like this:
CREATE TABLE [dbo].[items](
[creationdate] [bigint] NOT NULL,
[name] [nvarchar](32) NOT NULL,
[price] [int] NOT NULL,
[traceid] [nvarchar](64) NOT NULL,
[referenceid] [int] NOT NULL,
PRIMARY KEY (name, referenceid)
)
Thanks!
This is the normal behavior of an insert/update statement in a database: it does not return the inserted/updated rows.
It returns the number of inserted/updated rows.
It may also return values generated by the database (such as an auto-increment ID or a generated UUID) if you add the following line:
.filter(statement -> statement.returnGeneratedValues())
where you may pass the names of specific generated columns as parameters. However, this has limitations depending on the database (for example, MySQL can only return the last generated ID of an auto-increment column, even if you insert multiple rows).
If you want to get the inserted/updated values from the database, you need to do a select.
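As a rough sketch of such a follow-up select, assuming the row is identified by the primary key (name, referenceid) from the table definition in the question, and using the same named-parameter style as the insert statement:

```sql
-- Read back the row that was just inserted, looked up by the table's primary key.
-- :name and :referenceId are bound the same way as in the insert statement.
SELECT creationdate, name, price, traceid, referenceid
FROM items
WHERE name = :name
  AND referenceid = :referenceId;
```

With DatabaseClient, this statement would typically be run only after the insert has completed, so that the returned Mono emits the selected row rather than completing empty.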