How do I define the type of a field starting with a #? - apache-kafka

I am trying to create a stream from some Kafka messages in JSON format like:
{
"beat": {
"name": "xxxxxxx",
"hostname": "xxxxxxxxxx",
"version": "zzzzz"
},
"log_instance": "forwarder-2",
"type": "prod",
"message": "{ ... json string.... }",
"#timestamp": "2020-06-14T23:31:33.925Z",
"input_type": "log",
"#version": "1"
}
I tried using
CREATE STREAM S (
beat STRUCT<
name VARCHAR,
hostname VARCHAR,
version VARCHAR
>,
log_instance VARCHAR,
type VARCHAR,
message VARCHAR, -- for brevity - I also have this with a struct
#timestamp VARCHAR,
input_type VARCHAR,
#version VARCHAR )
WITH (KAFKA_TOPIC='some_topic', VALUE_FORMAT='JSON');
However, I get an error:
Caused by: line 10:5: extraneous input '#' expecting ....
I tried quoting and a preceding underscore, but no luck. I also tried creating an entry in the registry, but I could not create legitimate Avro this way.
PS. How do I "bind" a topic to a registry schema?
Thanks.

If you're on a recent enough version of ksqlDB then simply quoting the column names with invalid characters should work:
CREATE STREAM S (
beat STRUCT<
name VARCHAR,
hostname VARCHAR,
version VARCHAR
>,
log_instance VARCHAR,
type VARCHAR,
message VARCHAR, -- for brevity - I also have this with a struct
`#timestamp` VARCHAR,
input_type VARCHAR,
`#version` VARCHAR )
WITH (KAFKA_TOPIC='some_topic', VALUE_FORMAT='JSON');
If the above doesn't work, then it's likely you're on an old version of ksqlDB. Upgrading should fix this issue.
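Once the stream is created, the same backtick quoting is used wherever you reference those columns. A quick sketch:
-- Quoted identifiers are case-sensitive and keep the exact name, including the leading #.
SELECT `#timestamp`, `#version`, log_instance FROM S EMIT CHANGES;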
PS. How do I "bind" a topic to a registry schema?
ksqlDB will auto-publish the JSON schema to the Schema Registry if you use the JSON_SR format, rather than just JSON. The latter only supports reading the schema from the schema registry.
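For example, a sketch assuming the stream S declared above and an illustrative target topic name some_topic_sr: a derived stream written as JSON_SR gets its value schema registered automatically.
-- Re-serialize the plain-JSON stream into a new topic using JSON_SR;
-- ksqlDB registers the value schema under the subject some_topic_sr-value.
CREATE STREAM S_SR
  WITH (KAFKA_TOPIC='some_topic_sr', VALUE_FORMAT='JSON_SR') AS
  SELECT * FROM S
  EMIT CHANGES;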
If you're asking more about how to register a schema in the SR for an existing topic, then you're best off looking at the SR docs. Note that ksqlDB only supports the TopicNameStrategy naming strategy. The value schema has the subject {topic-name}-value; e.g. the following registers a JSON schema for the test topic's values:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schemaType": "JSON", "schema": "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"string\"},\"amount\":{\"type\":\"number\"}}}"}' \
  http://localhost:8081/subjects/test-value/versions
See the SR tutorial for more info: https://docs.confluent.io/current/schema-registry/schema_registry_tutorial.html
I also tried creating an entry in the registry, but I could not create legitimate Avro this way.
Avro does not allow # in its field names. However, it looks like your data is in JSON format, which does allow #. See the curl example above on how to register a JSON schema.

Related

Creating a table from topic with value of String type in KSQLDB

How can one create a table from a topic which contains values of type String?
We have some topics that contain RDF data embedded inside strings; in a sense, it is just a string value. Based on the ksqlDB documentation, we need to use value_format='KAFKA' with WRAP_SINGLE_VALUE=false, given that it is an anonymous value.
CREATE SOURCE Table source_table_proxy (
key VARCHAR primary KEY,
value VARCHAR
) WITH (
KEY_FORMAT='KAFKA',
VALUE_FORMAT='KAFKA',
WRAP_SINGLE_VALUE=false,
kafka_topic = 'topic'
);
This is the topic info:
Key Type: STRING
Value Type: STRING
Topic Info
Partitions: 12
Replication: 1
Weirdly we get the following error:
The 'KAFKA' format only supports a single field. Got: [`VALUE` STRING, `ROWPARTITION` INTEGER, `ROWOFFSET` BIGINT]
Is there any workaround for this issue?
Unclear why you need the KAFKA format.
The JSON format will work for plain (primitive) strings as well; it supports reading and writing top-level primitives, arrays, and maps.
For example, given a SQL statement with only a single field in the value schema and the WRAP_SINGLE_VALUE property set to false:
CREATE STREAM x (ID BIGINT) WITH (VALUE_FORMAT='JSON', WRAP_SINGLE_VALUE=false, ...);
And a JSON value of:
10
ksqlDB can deserialize the value into the ID field of the stream.
https://docs.ksqldb.io/en/latest/reference/serialization/#top-level-primitives-arrays-and-maps
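Applied to the table in the question, a sketch using the JSON format instead (assuming the values in the topic are valid JSON, i.e. quoted strings):
CREATE SOURCE TABLE source_table_proxy (
  key VARCHAR PRIMARY KEY,
  value VARCHAR
) WITH (
  KEY_FORMAT='KAFKA',
  -- JSON also handles unwrapped top-level primitives, so the anonymous string value still works:
  VALUE_FORMAT='JSON',
  WRAP_SINGLE_VALUE=false,
  KAFKA_TOPIC='topic'
);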

Debezium: MySQL LONGTEXT to Debezium data type conversion is not correct

MySQL schema:
`Info` longtext,
Debezium schema for the same field:
{
"name": "Info",
"type": [
"null",
"string"
],
"default": null
},
When this data is loaded into Redshift it fails, as it expects the data type to be large, i.e. VARCHAR(MAX), but it is getting VARCHAR(255), since Debezium is not transforming LONGTEXT to a long type.
Please suggest why this is happening.
Please take a look at https://debezium.io/documentation/reference/1.2/connectors/mysql.html#mysql-property-column-propagate-source-type
This will add the type constraint parameters into the schema.
Also, IIUC you are using the Confluent Avro Converter. If so, set enhanced.avro.schema.support and connect.meta.data to true.
In this case you will need to convert the Debezium constraint params into ones supported by the sink converter, if such functionality is provided.
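Your connector config isn't shown, so purely as an illustration (written here as a ksqlDB CREATE SOURCE CONNECTOR statement; the same properties apply if you POST a JSON config to the Kafka Connect REST API, and all connection values below are placeholders):
CREATE SOURCE CONNECTOR mysql_cdc WITH (
  "connector.class"='io.debezium.connector.mysql.MySqlConnector',
  "database.hostname"='mysql',          -- placeholder
  "database.port"='3306',               -- placeholder
  "database.user"='debezium',           -- placeholder
  "database.password"='secret',         -- placeholder
  "database.server.id"='1',
  "database.server.name"='dbserver1',   -- placeholder
  "database.history.kafka.bootstrap.servers"='kafka:9092',
  "database.history.kafka.topic"='schema-changes.mydb',
  -- Adds __debezium.source.column.type/length parameters to the field schema of matching
  -- columns (the regex below is a loose, illustrative pattern):
  "column.propagate.source.type"='.*Info',
  -- Keep Connect metadata (including those parameters) in the Confluent Avro converter output:
  "value.converter"='io.confluent.connect.avro.AvroConverter',
  "value.converter.schema.registry.url"='http://schema-registry:8081',
  "value.converter.enhanced.avro.schema.support"='true',
  "value.converter.connect.meta.data"='true'
);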

KsqlStatementException: statement does not define the schema and the supplied format does not support schema inference

I am running ksql-server on Kubernetes using the Confluent Helm charts:
https://github.com/confluentinc/cp-helm-charts/tree/master/charts/cp-ksql-server
I modified the queries.sql file for my own personal use case.
https://github.com/confluentinc/cp-helm-charts/blob/master/charts/cp-ksql-server/queries.sql
This is my query:
CREATE STREAM ksql_test WITH (kafka_topic='orders-topic', value_format='DELIMITED', partitions='1', replicas='3');
Once I deploy this pod, I get this error:
ERROR Failed to start KSQL Server with query file: /etc/ksql/queries/queries.sql (io.confluent.ksql.rest.server.StandaloneExecutor:124)
io.confluent.ksql.util.KsqlStatementException: statement does not define the schema and the supplied format does not support schema inference
Statement: CREATE STREAM ksql_test WITH (kafka_topic='orders-topic', value_format='DELIMITED', partitions='1', replicas='3');
Even if I change the format to JSON, the error remains the same, and I don't have a schema for this topic.
This is the problem:
statement does not define the schema
A KSQL stream is a Kafka topic plus a schema. No schema, no stream.
If your data is delimited, maybe it looks like this:
1,FOO,0.1,400
That has a schema, perhaps something like:
CREATE STREAM example (COL1 INT, LABEL VARCHAR, WIBBLE DOUBLE, TARGET BIGINT)
WITH (KAFKA_TOPIC='example_topic', VALUE_FORMAT='DELIMITED');
tl;dr you can't create a stream without a schema. If you're using Avro, you have a schema already (in the Schema Registry) and hence don't have to declare it. If you're using JSON or Delimited, you must declare the schema explicitly.
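Applied to the statement in the question, that means adding a column list. The columns below are hypothetical, since the shape of the data in orders-topic isn't shown:
-- Hypothetical columns: replace them with whatever fields your delimited records actually contain.
CREATE STREAM ksql_test (ORDER_ID BIGINT, PRODUCT VARCHAR, PRICE DOUBLE)
  WITH (KAFKA_TOPIC='orders-topic', VALUE_FORMAT='DELIMITED');
(The PARTITIONS and REPLICAS properties are only needed if the topic doesn't already exist.)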

NIFI Insert CSV File into Postgres Database with date fields

I would like to insert a CSV file into my Postgres database. I use these processors:
GetFile ->
Split (because the files are big) ->
UpdateAttribute (to add avro.schema) ->
ConvertCSVToAvro ->
PutDatabaseRecord
If I use only string/text fields (in my Avro schema and in the Postgres columns), the result is OK.
But when I try to format date fields, I get an error.
My raw data (CSV) is:
date_export|num_etiquette|key
07/11/2019 01:36:00|BAROMETRExxxxx|BAROMETRE-xxxxx
My Avro schema is:
{
"type":"record",
"name":"public.data_scope_gp_temp",
"fields":[
{"name":"date_export","type":{ "type": "int", "logicalType": "date"}},
{"name":"num_etiquette","type":"string"},
{"name":"cle_scope","type":"string"}
]}
My Postgres schema is:
date_export date,
num_etiquette text COLLATE pg_catalog."default",
key text COLLATE pg_catalog."default"
Any idea? Regards
You don't need UpdateAttribute or ConvertCSVToAvro to use PutDatabaseRecord. You can specify a CSVReader in PutDatabaseRecord, and your CSVReader can supply the Avro schema in the Schema Text property (don't forget to set the Schema Access Strategy to Use Schema Text).

KSQL table not showing data but Stream with same structure returning data

I have created a table in KSQL, but when querying it, no data is returned. Then I created a stream on the same topic with the same structure, and I am able to query the data.
What am I missing here? I need this as a table for joining with a stream.
CREATE TABLE users_table \
(registertime bigint, userid varchar, regionid varchar, gender varchar) \
WITH (value_format='json', kafka_topic='users_topic',key='userid');
and
CREATE STREAM users_stream \
(registertime bigint, userid varchar, regionid varchar, gender varchar) \
WITH (value_format='json', kafka_topic='users_topic');
Thanks in advance.
If you read a topic as a TABLE, the messages in the topic must have the key set. If the key is null, records will be dropped silently. A key in a KSQL TABLE is a primary key, and null is not a valid value for a primary key.
Furthermore, the value of the key attribute within the message must be the same as the key itself (note that the schema is defined on the value of the message). For example, if you have a schema <A,B,C> and you set A as the key, the messages in the topic must be <key,value> == <a,<a,b,c>>. Otherwise, you will get incorrect results.
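A common workaround, sketched against the question's definitions (the topic name users_topic_keyed is just illustrative): use the stream to re-partition the data by userid, then build the table on the re-keyed topic.
-- Write a copy of the data keyed by userid, using the users_stream declared in the question.
CREATE STREAM users_rekeyed
  WITH (KAFKA_TOPIC='users_topic_keyed', VALUE_FORMAT='JSON') AS
  SELECT * FROM users_stream
  PARTITION BY userid;

-- The re-keyed topic now has non-null keys matching the userid column in the value.
CREATE TABLE users_table_keyed
  (registertime BIGINT, userid VARCHAR, regionid VARCHAR, gender VARCHAR)
  WITH (VALUE_FORMAT='JSON', KAFKA_TOPIC='users_topic_keyed', KEY='userid');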