KSQL: all messages failing in stream even though there is data in the topic - apache-kafka

I am relatively new to Kafka Streams and I'm trying to print out the messages in the stream I created.
Can someone tell me why all the messages are failing?
When I use the print command I get this:
print 'main' from beginning limit 10;
Key format: ¯\_(ツ)_/¯ - no data processed
Value format: KAFKA_STRING
rowtime: 2021/05/29 14:17:57.375 Z, key: <null>, value: A-1,2,5/21/2019 8:29,5/21/2019 9:29,34.808868,-82.269157,34.808868,-82.269157,0,Accident on Tanner Rd at Pennbrooke Ln.,439,Tanner Rd,R,Greenville,Greenville,SC,29607-6027,US,US/Eastern,KGMU,5/21/2019 8:53,76,76,52,28.91,10,N,7,0,Fair,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,Day,Day,Day,Day
When I run DESCRIBE EXTENDED on the stream I created, I get the below:
Name : ACCIDENTS_ORIGINAL
Type : STREAM
Timestamp field : START_TIME
Key format : KAFKA
Value format : DELIMITED
Kafka topic : main (partitions: 1, replication: 1)
Statement : CREATE STREAM ACCIDENTS_ORIGINAL (ID STRING, SEVERITY INTEGER, START_TIME STRING,
END_TIME STRING, START_LAT DOUBLE, START_LNG DOUBLE, END_LAT DOUBLE, END_LNG DOUBLE, DISTANCE DOUBLE,
DESCRIPTION STRING, NUMBER DOUBLE, STREET STRING, SIDE STRING, CITY STRING, COUNTY STRING,
STATE STRING, ZIPCODE STRING, COUNTRY STRING, TIMEZONE STRING, AIRPORT_CODE STRING,
WEATHER_TIME STRING, TEMPERATURE DOUBLE, WIND_CHILL DOUBLE, HUMIDITY DOUBLE,
PRESSURE DOUBLE, VISIBILITY DOUBLE, WIND_DIRECTION STRING, WIND_SPEED STRING,
PRECIPITATION DOUBLE, WEATHER_CONDITION STRING, AMENITY BOOLEAN, BUMP BOOLEAN,
CROSSING BOOLEAN, GIVE_WAY BOOLEAN, JUNCTION BOOLEAN, NO_EXIT BOOLEAN, RAILWAY BOOLEAN,
ROUNDABOUT BOOLEAN, STATION BOOLEAN, STOP BOOLEAN, TRAFFIC_CALMING BOOLEAN,
TRAFFIC_SIGNAL BOOLEAN, TURNING_LOOP BOOLEAN, SUNRISE_SUNSET STRING,
CIVIL_TWILIGHT STRING, NAUTICAL_TWILIGHT STRING, ASTRONOMICAL_TWILIGHT STRING)
WITH (KAFKA_TOPIC='main', KEY_FORMAT='KAFKA', TIMESTAMP='Start_Time',
TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss', VALUE_FORMAT='DELIMITED');
Can anyone help me have a look and tell me what I am doing wrong here?
Small update: I also tried setting the key format to NONE and all the messages still fail.

Related

How to group the dataframe for transformation

I have the following dataframe with schema:
<bound method DataFrame.printSchema of DataFrame[outer_value: string, value01: string, value02: string, value03: string, value04: string, value05: string, value06: string, value07: string, value08: string, value09: string, value10: string, value11: string, value12: string, value13: string, value14: string, value15: string, value16: string, value17: string, value18: string, value19: string, value20: string, value21: string, value22: string, value23: string, value24: string, value25: string, value26: string, value27: string, value28: string, value29: string, value30: string, value31: string, value32: string, value33: string, value34: string, value35: string, value36: string, value37: string, value38: string, value39: string, value40: string, value41: string, value42: string, value43: string, value44: string, value45: string, value46: string, value47: string, value48: string, value49: string, value50: string, value51: string, value52: string, value53: string, value54: string, value55: string]>
I would like to group the columns by 5 (divide 55 by 5 to create 11 column groups) and populate another dataframe with the following schema.
[outer_value: string, value01: string, value02: string, value03: string, value04: string, value05: string]
The image below represents the first group of 5 columns used to populate the target schema. Likewise, the next group will be formed as: outer_value, value06, value07, value08, value09, value10.
It goes on like that until all 11 groups are processed and each group has populated the target dataframe.
Eventually, one row in the source data frame will end up as 11 rows in the target data frame.
How to achieve this?
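Here is one way to sketch the regrouping. The question shows a PySpark schema, but as a rough illustration in Spark's Scala API (the DataFrame df, the helper name regroup, and the zero-padded column names are assumptions based on the schema above, not tested code): select each group of five columns, rename them to value01..value05, and union the eleven projections.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: project the columns of each group of 5, rename them to
// value01..value05, and stack the 11 projections on top of each other.
def regroup(df: DataFrame): DataFrame = {
  val groups = (0 until 11).map { g =>
    val renamed = (1 to 5).map { i =>
      col(f"value${g * 5 + i}%02d").as(f"value$i%02d")   // e.g. value06 -> value01 in group 1
    }
    df.select((col("outer_value") +: renamed): _*)
  }
  groups.reduce(_ unionByName _)   // one source row becomes 11 target rows
}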

DolphinDB error: A metric shouldn't be a constant in reactive state engine

My script is
share streamTable(1:0, `date`time`sym`market`price`qty, [DATE, TIME, SYMBOL, CHAR, DOUBLE, INT]) as trade
outputTable = table(100:0, `date`sym`factor1`flag, [DATE, STRING, DOUBLE, INT])
engine = createReactiveStateEngine(name="test", metrics=[<mavg(price, 3)>, <1>], dummyTable=trade, outputTable=outputTable, keyColumn=["date","sym"], filter=<date between 2012.01.01 : 2012.01.03>, keepOrder=true)
It throws "A metric shouldn't be a constant". How do I pass a constant to the metrics?
Try using <1*1> instead.

Scala Play JSON trouble converting List[CaseClass] to Json String

Not sure where I am going wrong with this; it is returning the error:
No Json serializer as JsObject found for type List[QM_Category].
Try to implement an implicit OWrites or OFormat for this type.
[error] Json.stringify(Json.toJsObject(a.categories))
Is there a way to define a format for just List[QM_Category]? I thought the format for QM_Category would handle the case class, and Play is supposed to handle Lists...
All I really want to do is take my List and convert it to a JSON string. Pretty straightforward, but I am not sure why Play JSON doesn't like my format.
Here is my code:
import play.api.libs.json._               // Json.format / Json.toJsObject
import com.datastax.spark.connector._     // enables sc.cassandraTable on the SparkContext

case class QM_Answer (
  answerid: String,
  answerstring: String,
  answerscore: Int
);
case class QM_Question (
  questionid: String,
  questionscore: Int,
  questiongoal: Int,
  questionstring: String,
  questiontype: String,
  questioncomments: String,
  questionisna: Boolean,
  questionishidden: Boolean,
  failcategory: Boolean,
  failform: Boolean,
  answers: List[QM_Answer]
);
case class QM_Category (
  categoryid: String,
  categoryname: String,
  categoryscore: Int,
  categorygoal: Int,
  categorycomments: String,
  categoryishidden: Boolean,
  failcategory: Boolean,
  questions: List[QM_Question]
);
case class SurveySourceRaw (
  ownerid: String,
  formid: String,
  formname: String,
  sessionid: String,
  evaluator: String,
  userid: String,
  timelinekey: Long,
  surveyid: String,
  submitteddate: Long,
  month: String,
  channel: String,
  categories: List[QM_Category]
);
case class SurveySource (
  ownerid: String,
  formid: String,
  formname: String,
  sessionid: String,
  evaluator: String,
  userid: String,
  timelinekey: Long,
  surveyid: String,
  submitteddate: Long,
  month: String,
  channel: String,
  categories: String
);
implicit val qmAnswerFormat = Json.format[QM_Answer];
implicit val qmQuestionFormat = Json.format[QM_Question];
implicit val qmCategoryFormat = Json.format[QM_Category];
implicit val surveySourceRawFormat = Json.format[SurveySourceRaw];
var surveySourceRaw = sc
  .cassandraTable[SurveySourceRaw]("mykeyspace", "mytablename")
  .select("ownerid",
    "formid",
    "formname",
    "sessionid",
    "evaluator",
    "userid",
    "timelinekey",
    "surveyid",
    "submitteddate",
    "month",
    "channel",
    "categories")
var surveyRelational = surveySourceRaw
  .map(a => SurveySource(
    a.ownerid,
    a.formid,
    a.formname,
    a.sessionid,
    a.evaluator,
    a.userid,
    a.timelinekey,
    a.surveyid,
    a.submitteddate,
    a.month,
    a.channel,
    Json.stringify(Json.toJsObject(a.categories))
  ))
The Play JSON format for a List[A], given a format for A, encodes/decodes a JSON array, e.g. for a List[String] [ "foo", "bar", "baz" ]. A JSON array is not a JSON object.
So if you want the List[QM_Category] to be a stringified JSON (but not necessarily a JSON object, e.g. it could be a string, array, etc.), you can use toJson:
Json.stringify(Json.toJson(a.categories))
Alternatively, if you want it to be a JSON object, you would need to define an OFormat (or an OReads/OWrites combination) for List[QM_Category]: an OFormat is a Format which requires that the JSON be an object with string attributes and JSON values (and so forth for OReads/OWrites).
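For illustration only, a minimal sketch of such an OWrites (the wrapping field name "categories" is an arbitrary choice here, not something Play requires):

import play.api.libs.json._

// Hypothetical: serialize List[QM_Category] as a JSON object by wrapping the
// array under a field, reusing the existing qmCategoryFormat for the elements.
val categoriesOWrites: OWrites[List[QM_Category]] = OWrites { cats =>
  Json.obj("categories" -> Json.toJson(cats))
}

// Usage sketch: Json.stringify(categoriesOWrites.writes(a.categories))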
I'm almost embarrassed to answer this, but sometimes I make things overly complicated.
The answer was to just read the column from Cassandra as a String instead of a List[QM_Category]. The column in Cassandra was defined as:
categories list<FROZEN<qm.category>>,
I wrongly assumed I would need to read it in from Cassandra as a list of custom objects, then use Play JSON to format that class into JSON and stringify it.
case class SurveySourceRaw (
  ownerid: String,
  formid: String,
  formname: String,
  sessionid: String,
  evaluator: String,
  userid: String,
  timelinekey: Long,
  surveyid: String,
  submitteddate: Long,
  month: String,
  channel: String,
  categories: List[QM_Category]
);
When in reality, all I needed to do was read it from Cassandra as a String, and it came in as stringified JSON. Well played, Spark Cassandra Connector, well played.
case class SurveySourceRaw (
  ownerid: String,
  formid: String,
  formname: String,
  sessionid: String,
  evaluator: String,
  userid: String,
  timelinekey: Long,
  surveyid: String,
  submitteddate: Long,
  month: String,
  channel: String,
  categories: String
);
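With that change, the mapping step from the question no longer needs Play JSON at all. A rough sketch based on the code above (not a tested snippet), with categories read as a plain String:

var surveyRelational = surveySourceRaw
  .map { a =>
    SurveySource(a.ownerid, a.formid, a.formname, a.sessionid, a.evaluator, a.userid,
      a.timelinekey, a.surveyid, a.submitteddate, a.month, a.channel,
      a.categories)   // already a stringified JSON string from the connector
  }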

How to represent nulls in DataSets consisting of list of case classes

I have a case class
final case class FieldStateData(
  job_id: String = null,
  job_base_step_id: String = null,
  field_id: String = null,
  data_id: String = null,
  data_value: String = null,
  executed_unit: String = null,
  is_doc: Boolean = null,
  mime_type: String = null,
  filename: String = null,
  filesize: BigInt = null,
  caption: String = null,
  executor_id: String = null,
  executor_name: String = null,
  executor_email: String = null,
  created_at: BigInt = null
)
That I want to use as part of a dataset of type Dataset[FieldStateData] to eventually insert into a database. All columns need to be nullable. How would I represent null for the numeric and boolean fields (value types descended from AnyVal), as opposed to the String fields? I thought about using Option[Boolean] or something like that, but will that automatically unbox during insertion or when it's used in a SQL query?
Also note that the above code is not correct: Boolean types are not nullable. It's just an example.
You are correct to use the Option monad in the case class. The fields will be unboxed by Spark on read.
import org.apache.spark.sql.{Encoder, Encoders, Dataset}
final case class FieldStateData(
  job_id: Option[String],
  job_base_step_id: Option[String],
  field_id: Option[String],
  data_id: Option[String],
  data_value: Option[String],
  executed_unit: Option[String],
  is_doc: Option[Boolean],
  mime_type: Option[String],
  filename: Option[String],
  filesize: Option[BigInt],
  caption: Option[String],
  executor_id: Option[String],
  executor_name: Option[String],
  executor_email: Option[String],
  created_at: Option[BigInt])
implicit val fieldCodec: Encoder[FieldStateData] = Encoders.product[FieldStateData]
val ds: Dataset[FieldStateData] = spark.read.format("<your source>").load("<path>").as[FieldStateData]   // placeholder reader: point this at your actual source
When you write the Dataset back into the database, None becomes a null value and Some(x) becomes the value x.
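As a rough illustration of that last point (a sketch only; the JDBC URL, table name, and credentials below are placeholders, not details from the question), writing the Dataset out could look like:

// Option[...] fields map to nullable columns; None is written as SQL NULL.
ds.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://<host>:<port>/<database>")   // placeholder connection string
  .option("dbtable", "field_state_data")                         // placeholder table name
  .option("user", "<user>")
  .option("password", "<password>")
  .mode("append")
  .save()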

TypedPipe can't coerce strings to DateTime even when given implicit function

I've got a Scalding data flow that starts with a bunch of pipe-separated value files. The first column is a DateTime in a slightly non-standard format. I want to use the strongly typed TypedPipe API, so I've specified a tuple type and a case class to contain the data:
type Input = (DateTime, String, Double, Double, String)
case class LatLonRecord(date : DateTime, msidn : String, lat : Double, lon : Double, cellname : String)
However, Scalding doesn't know how to coerce a String into a DateTime, so I tried adding an implicit function to do the dirty work:
implicit def stringToDateTime(dateStr: String): DateTime =
  DateTime.parse(dateStr, DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S"))
However, I still get a ClassCastException:
val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
lines.map(x => x._1).dump
//cascading.flow.FlowException: local step failed at java.lang.Thread.run(Thread.java:745)
//Caused by: cascading.pipe.OperatorException: [FixedPathTypedDelimite...][com.twitter.scalding.RichPipe.eachTo(RichPipe.scala:509)] operator Each failed executing operation
//Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.joda.time.DateTime
What do I need to do to get Scalding to call my conversion function?
So I ended up doing this:
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

case class LatLonRecord(date: DateTime, msisdn: String, lat: Double, lon: Double, cellname: String)

object dateparser {
  val format = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S")
  def parse(s: String): DateTime = DateTime.parse(s, format)
}
//changed first column to a String, yuck
type Input = (String, String, Double, Double, String)
val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
val recs = lines.map(v => LatLonRecord(dateparser.parse(v._1), v._2, v._3, v._4, v._5))
But I feel like it's a sub-optimal solution. I welcome better answers from people who have been using Scala for longer than, say, the one week that I have.
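One possible variation on this (just a sketch, still reading the first column as a String and not tested against a real Scalding job): parse defensively with Try so that a malformed timestamp drops the row instead of failing the whole flow.

import scala.util.Try
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

val fmt = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S")

val recs: TypedPipe[LatLonRecord] = lines.flatMap { v =>
  Try(DateTime.parse(v._1, fmt)).toOption            // None when the date doesn't parse
    .map(dt => LatLonRecord(dt, v._2, v._3, v._4, v._5))
    .toList                                          // flatMap silently drops the bad rows
}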