Ksqldb create stream error while using oneof protobuf on schema registry - apache-kafka

I'm having an issue while trying to create a stream over a topic whose protobuf schema uses a oneof clause. I create the stream referring to the VALUE_SCHEMA_ID of the schema registered in Schema Registry. The idea is that this stream transforms some data into one of the objects accepted by the oneof definition, but an error is being thrown. Here are the files:
Protobuf schema:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
message SettlementContextEvent{
string type = 1;
oneof event_data {
SettlementContextCreatedEvent settlement_context_created = 2;
MovementsCalculatedEvent movements_calculated = 3;
}
message SettlementContextCreatedEvent{
string status = 1;
string type = 2;
int32 account_id = 3;
string transaction_id = 4;
int64 order_id = 5;
google.protobuf.Timestamp creation_date = 6;
string additional_information = 7;
}
message MovementsCalculatedEvent{
repeated Movement movements = 1;
}
message Movement{
string id = 1;
...
}
}
This schema is registered under the settlement_context_events-value subject with ID 73.
This is the ksqlDB query:
CREATE STREAM IF NOT EXISTS SETTLEMENT_CONTEXT_CREATION_TRANSFORMER
WITH (KAFKA_TOPIC = 'settlement_context_events',
      PARTITIONS = 10,
      REPLICAS = 1,
      VALUE_FORMAT = 'PROTOBUF',
      VALUE_SCHEMA_ID = 73)
AS
SELECT var0 AS `aggregate_id`,
       Struct(`settlement_context_created` := Struct(
           `status` := var1,
           `type` := var2,
           `account_id` := var3,
           `tansaction_id` := var4,
           `order_id` := CAST(var5 AS BIGINT),
           `creation_date` := var6,
           `additional_information` := var7
       )) AS `event_data`,
       'test' AS `type`
FROM SETTLEMENT_CONTEXT_CDC PARTITION BY var0;
As you can see, it is using the settlement_context_created object defined in the oneof clause, but I'm getting the following error:
The following value columns are changed, missing or reordered: [`event_data` STRUCT<`settlement_context_created` STRUCT<`status` STRING, `type` STRING, `account_id` INTEGER, `tansaction_id` STRING, `order_id` BIGINT, `creation_date` TIMESTAMP, `additional_information` STRING>>]. Schema from schema registry is [`event_data_0` STRUCT<`settlement_context_created` STRUCT<`status` STRING, `type` STRING, `account_id` INTEGER, `transaction_id` STRING, `order_id` BIGINT, `creation_date` TIMESTAMP, `additional_information` STRING>, `movements_calculated` STRUCT<`movements` ARRAY<STRUCT<`id` STRING, `account_id` INTEGER, `operation` STRING, `value` DOUBLE, `creation_date` TIMESTAMP, `description` STRING, `previous_balance` DOUBLE, `billable` BOOLEAN, `document` STRING, `parent_account_operation_id` STRING, `released` BOOLEAN, `previous_reserve_balance` DOUBLE, `previous_frozen_balance` DOUBLE, `movement_type` STRING, `document_support_type` STRING, `release_date_of_reserve_fund` TIMESTAMP, `related_transaction_id` STRING, `accreditation_order` INTEGER>>>>, `type` STRING]
NOTE: I also tried changing the event_data struct name to event_data_0 but had no luck. Does anyone know why ksqlDB adds this 0 at the end?

Related

How to represent nulls in DataSets consisting of list of case classes

I have a case class
final case class FieldStateData(
    job_id: String = null,
    job_base_step_id: String = null,
    field_id: String = null,
    data_id: String = null,
    data_value: String = null,
    executed_unit: String = null,
    is_doc: Boolean = null,
    mime_type: String = null,
    filename: String = null,
    filesize: BigInt = null,
    caption: String = null,
    executor_id: String = null,
    executor_name: String = null,
    executor_email: String = null,
    created_at: BigInt = null
)
That I want to use as part of a dataset of type Dataset[FieldStateData], to eventually insert into a database. All columns need to be nullable. How would I represent null for the numeric and boolean types (which descend from AnyVal) rather than for strings? I thought about using Option[Boolean] or something like that, but will that automatically unbox during insertion or when it's used in a SQL query?
Also note that the above code is not correct; Boolean fields are not nullable. It's just an example.
You are correct to use the Option monad in the case class. The fields will be unboxed by Spark on read.
import org.apache.spark.sql.{Dataset, Encoder, Encoders}

final case class FieldStateData(job_id: Option[String],
                                job_base_step_id: Option[String],
                                field_id: Option[String],
                                data_id: Option[String],
                                data_value: Option[String],
                                executed_unit: Option[String],
                                is_doc: Option[Boolean],
                                mime_type: Option[String],
                                filename: Option[String],
                                filesize: Option[BigInt],
                                caption: Option[String],
                                executor_id: Option[String],
                                executor_name: Option[String],
                                executor_email: Option[String],
                                created_at: Option[BigInt])

implicit val fieldCodec: Encoder[FieldStateData] = Encoders.product[FieldStateData]

// spark.read.source_name stands in for however you actually load the source data
val ds: Dataset[FieldStateData] = spark.read.source_name.as[FieldStateData]
When you write the Dataset back into the database, None becomes a null value and Some(x) becomes the value x.
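For instance, a minimal write-back sketch (the JDBC URL, table name and credentials below are assumptions, not from the question); fields that are None end up as SQL NULL in the target table:
// Illustrative only: connection details and table name are assumptions.
ds.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("dbtable", "field_state_data")
  .option("user", "writer")
  .option("password", "secret")
  .mode("append")
  .save()
// None fields are written as SQL NULL; Some(x) fields are written as x.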

How to create a scala case class instance with a Map instance

I want to create a Scala case class whose fields come from a map. Here is the case class:
case class UserFeature(uid: String = null,
                       age: String = null,
                       marriageStatus: String = null,
                       consumptionAbility: String = null,
                       LBS: String = null,
                       interest1: String = null,
                       interest2: String = null,
                       interest3: String = null,
                       interest4: String = null,
                       interest5: String = null,
                       kw1: String = null,
                       kw2: String = null,
                       kw3: String = null,
                       topic1: String = null,
                       topic2: String = null,
                       topic3: String = null,
                       appIdInstall: String = null,
                       appIdAction: String = null,
                       ct: String = null,
                       os: String = null,
                       carrier: String = null,
                       house: String = null
)
Suppose the map instance is:
Map("uid" -> "4564131",
    "age" -> "5",
    "ct" -> "bk7755")
How can I apply the keys and values of the map to the corresponding fields of the case class?
It is not a good idea to use null to represent missing string values. Use Option[String] instead.
case class UserFeature(uid: Option[String] = None,
                       age: Option[String] = None,
                       marriageStatus: Option[String] = None,
                       ...
Once you have done that, you can use get on the map to retrieve the value.
UserFeature(map.get("uid"), map.get("age"), map.get("marriageStatus") ...)
Values that are present in the map will be Some(value) and missing values will be None. The Option class has lots of useful methods for processing optional values in a safe way.
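A few of those methods in a quick, illustrative sketch (the values are made up):
val age: Option[String] = map.get("age")            // Some("5") here, None if the key is absent
val ageOrDefault: String = age.getOrElse("unknown") // fall back to a default when missing
val ageAsInt: Option[Int] = age.map(_.toInt)        // transform only if a value is present
age.foreach(println)                                // run a side effect only if a value is present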
You can do UserFeature(uid = map_var("uid"), age = map_var("age"), ct = map_var("ct")), assuming the variable holding the Map is map_var and the keys are present.
Synthesizing the other two answers, I would convert all the Strings in UserFeature that you're defaulting to null (which you should basically never use in Scala unless interacting with poorly-written Java code requires it, and even then use it as little as possible) to Option[String]. I leave that search-and-replace out of the answer.
Then you can do:
object UserFeature {
  def apply(map: Map[String, String]): UserFeature =
    UserFeature(map.get("uid"), map.get("age") ...)
}
Which lets you use:
val someMap: Map[String, String] = ...
val userFeature = UserFeature(someMap)
With the change to Option[String], there will be some other changes that need to be made in your codebase. https://danielwestheide.com/blog/2012/12/19/the-neophytes-guide-to-scala-part-5-the-option-type.html is a good tutorial for how to deal with Option.
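For example, a null check on a field typically becomes a pattern match or a combinator call; a small illustrative sketch using the userFeature value from above:
// Before: if (userFeature.uid != null) ... ; after, with Option:
userFeature.uid match {
  case Some(uid) => println(s"uid = $uid")
  case None      => println("uid missing")
}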

Primitive types to AnyRef in Scala

I am adding new features to an open source project (pillar) to migrate Cassandra tables. I have a problem with an operation that inserts values into a new table.
There is a table in Cassandra:
create table customer (
  name text,
  age int,
  point int,
  primary key (name, age)
)
I want to migrate from this table to the test_person table.
create table test_person (
  name text,
  surname text,
  point int,
  city text,
  primary key (name)
)
Here is an operation:
var s: PreparedStatement = session.prepare("insert into test_person (name, age, point) values (?, ?, ?)")
var r: Row = session.execute("select * from customer").one()
var arr: Array[AnyRef] = new Array[AnyRef](3)
arr(0) = r.getObject("name")
arr(1) = r.getObject("age")
arr(2) = r.getObject("point")
session.execute(s.bind(arr))
This is the error message:
Type mismatch Can't assign primitive value to object.
I get each value as an object and assign it to an array typed Array[AnyRef]. What is wrong? How can I handle this?
This is happening because there is an implicit conversion from java.lang.Integer to Int, and Int is an AnyVal, not an AnyRef. Try using Array[Any] instead of Array[AnyRef], or disable the implicit conversion with import scala.Predef.{Integer2int => _}.
// This method in Predef.scala is causing the conversion
implicit def Integer2int(x: java.lang.Integer): Int
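To make the distinction concrete, a small illustrative sketch (not from the project):
val i: Int = 42                    // Int extends AnyVal (a primitive)
val boxed: java.lang.Integer = 42  // boxed reference type (via Predef.int2Integer)
val ref: AnyRef = boxed            // fine: a boxed Integer is an AnyRef
val any: Any = i                   // fine: Any is the supertype of both AnyVal and AnyRef
// val bad: AnyRef = i             // does not compile: an unboxed Int is not an AnyRef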
That is because AnyRef is for objects and AnyVal is for primitives. You can use an Array[Any] in your case:
var s: PreparedStatement = session.prepare("insert into test_person (name, age, point) values (?, ?, ?)");
var r: Row = session.execute("select * from customer").one()
val arr = Array(r.getString("name"), r.getInt("age"), r.getInt("point"))
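One more note: bind takes Java varargs, so (assuming the DataStax driver's PreparedStatement.bind(Object...) signature) the Scala array usually has to be expanded explicitly, boxing the values back to references:
// Expand the array into the varargs of bind(); asInstanceOf[AnyRef] boxes the Ints back to java.lang.Integer.
session.execute(s.bind(arr.map(_.asInstanceOf[AnyRef]): _*))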

Join on two foreign keys from same table in scalikejdbc

So I have one table with two FKs that point at the same table.
For example:
Message table with columns sender and receiver that both references id in user table.
When I write a query that fetches a message and joins on both, the result is the same user for both columns: the first one.
Here is how I'm trying to do it.
import scalikejdbc._
Class.forName("org.h2.Driver")
ConnectionPool.singleton("jdbc:h2:mem:hello", "user", "pass")
implicit val session = AutoSession
sql"""
create table members (
id serial not null primary key,
name varchar(64),
created_at timestamp not null
)
""".execute.apply()
sql"""
create table message (
id serial not null primary key,
msg varchar(64) not null,
sender int not null,
receiver int not null
)
""".execute.apply()
Seq("Alice", "Bob", "Chris") foreach { name =>
sql"insert into members (name, created_at) values (${name}, current_timestamp)".update.apply()
}
Seq(
("msg1", 1, 2),
("msg2", 1, 3),
("msg3", 2, 1)
) foreach { case (m, s, r) =>
sql"insert into message (msg, sender, receiver) values (${m}, ${s}, ${r})".update.apply()
}
import org.joda.time._
case class Member(id: Long, name: Option[String], createdAt: DateTime)
object Member extends SQLSyntaxSupport[Member] {
  override val tableName = "members"
  def apply(mem: ResultName[Member])(rs: WrappedResultSet): Member = new Member(
    rs.long("id"), rs.stringOpt("name"), rs.jodaDateTime("created_at"))
}

case class Message(id: Long, msg: String, sender: Member, receiver: Member)

object Message extends SQLSyntaxSupport[Message] {
  override val tableName = "message"
  def apply(ms: ResultName[Message], s: ResultName[Member], r: ResultName[Member])(rs: WrappedResultSet): Message = new Message(
    rs.long("id"), rs.string("msg"), Member(s)(rs), Member(r)(rs))
}
val mem = Member.syntax("m")
val s = Member.syntax("s")
val r = Member.syntax("r")
val ms = Message.syntax("ms")
val msgs: List[Message] = sql"""
select *
from ${Message.as(ms)}
join ${Member.as(s)} on ${ms.sender} = ${s.id}
join ${Member.as(r)} on ${ms.receiver} = ${r.id}
""".map(rs => Message(ms.resultName, s.resultName, r.resultName)(rs)).list.apply()
Am I doing something wrong, or is it a bug?
Sorry for the late reply. We have a Google Groups mailing list, and I actively read notifications from the group.
When you're in a hurry, please post stackoverflow URLs there. https://groups.google.com/forum/#!forum/scalikejdbc-users-group
In this case, you need to write select ${ms.result.*}, ${s.result.*} instead of select *. Please read this page for details. http://scalikejdbc.org/documentation/sql-interpolation.html
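Put together, the corrected query would look roughly like this (a sketch based on the suggestion above; the surrounding setup is unchanged from the question):
val msgs: List[Message] = sql"""
  select ${ms.result.*}, ${s.result.*}, ${r.result.*}
  from ${Message.as(ms)}
    join ${Member.as(s)} on ${ms.sender} = ${s.id}
    join ${Member.as(r)} on ${ms.receiver} = ${r.id}
""".map(rs => Message(ms.resultName, s.resultName, r.resultName)(rs)).list.apply()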

Anorm string set from postgres ltree column

I have a table where one of the columns has the ltree type, and the following code fetches data from it:
SQL("""select * from "queue"""")()
.map(
row =>
{
val queue =
Queue(
row[String]("path"),
row[String]("email_recipients"),
new DateTime(row[java.util.Date]("created_at")),
row[Boolean]("template_required")
)
queue
}
).toList
which results in the following error:
RuntimeException: TypeDoesNotMatch(Cannot convert notification.en.incident_happened:class org.postgresql.util.PGobject to String for column ColumnName(queue.path,Some(path)))
The queue table schema is the following:
CREATE TABLE queue
(
  id serial NOT NULL,
  template_id integer,
  template_version integer,
  path ltree NOT NULL,
  json_params text,
  email_recipients character varying(1024) NOT NULL,
  email_from character varying(128),
  email_subject character varying(512),
  created_at timestamp with time zone NOT NULL,
  sent_at timestamp with time zone,
  failed_recipients character varying(1024),
  template_required boolean NOT NULL DEFAULT true,
  attachments hstore,
  CONSTRAINT pk_queue PRIMARY KEY (id),
  CONSTRAINT fk_queue__email_template FOREIGN KEY (template_id)
    REFERENCES email_template (id) MATCH SIMPLE
    ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);
ALTER TABLE queue OWNER TO postgres;
GRANT ALL ON TABLE queue TO postgres;
GRANT SELECT, UPDATE, INSERT, DELETE ON TABLE queue TO writer;
GRANT SELECT ON TABLE queue TO reader;
Why is that? Isn't notification.en.incident_happened just an ordinary string? Or am I missing something?
UPD:
The question still applies, but here is a workaround:
SQL("""select id, path::varchar, email_recipients, created_at, template_required from "queue"""")()
This looked like a fun project so I implemented the ltree column mapper.
I piggybacked off anorm-postgresql, since that project already implements some postgres types in anorm. It looks good, and it would be useful if it implemented the full range of postgres types. My code has been merged in, so you can use that library. Alternatively, just use the following code:
import org.postgresql.util.PGobject
import anorm._

object LTree {

  implicit def rowToStringSeq: Column[Seq[String]] = Column.nonNull { (value, meta) =>
    val MetaDataItem(qualified, nullable, clazz) = meta
    value match {
      case pgo: PGobject => {
        val seq = pgo.getValue().split('.')
        Right(seq.toSeq)
      }
      case x => Left(TypeDoesNotMatch(x.getClass.toString))
    }
  }

  implicit def stringSeqToStatement = new ToStatement[Seq[String]] {
    def set(s: java.sql.PreparedStatement, index: Int, aValue: Seq[String]) {
      val stringRepresentation = aValue.mkString(".")
      val pgo: org.postgresql.util.PGobject = new org.postgresql.util.PGobject()
      pgo.setType("ltree")
      pgo.setValue(stringRepresentation)
      s.setObject(index, pgo)
    }
  }
}
Then you can map an ltree to a Seq[String]. Notice that it is a sequence of path elements where order matters, so it is a Seq[String] rather than a String or Set[String]. If you want a single string, just say path.mkString("."). Usage below:
import LTree._

SQL("""select * from "queue"""")()
  .map(row => {
    val queue = Queue(
      row[Seq[String]]("path"),
      row[String]("email_recipients"),
      new DateTime(row[java.util.Date]("created_at")),
      row[Boolean]("template_required")
    )
    queue
  })
  .toList
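And if you ever need the original dotted label back, something like this works (assuming the result of the query above is bound to a queues value and the path field of Queue is the Seq[String] column):
val labels: List[String] = queues.map(_.path.mkString("."))
// e.g. List("notification.en.incident_happened", ...)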