kafka ksql extract json field literal dollar symbol - apache-kafka

I've got a data stream coming from the mongo CDC connector, but the trouble is that the stream key is in the form of a JSON string.
e.g.
{"id":"{ \"$oid\" : \"5bbb0c70cd0b9c06cf06c9c1\"}"}
I know that I can use the EXTRACTJSONFIELD method to extract the data using JSONPath; however, I can't figure out how to extract a field whose name contains a literal dollar symbol. I've tried:
$.id.$oid
$.id[\$oid]
$.id.*
Each time I get a null response. Any ideas?

I guess that your problem is related to issue #1403.
You can use [\\" field_name \\"] to reference the column. For example,
SELECT EXTRACTJSONFIELD(test,'$[\\"$oid\\"]') FROM testing;

I have faced the same issue with the Debezium MongoDB Connector.
Using [\\" field_name \\"] (double backslash) as @Giorgos pointed out didn't work for me with ksqlDB 0.21.0.
Instead, [\" field_name \"] (single backslash) works.

Related

nvarchar(), MD5, and matching data in BigQuery

I am consolidating a couple of datasets in BQ, and in order to do that I need to run MD5 on some data.
The problem I'm having is that a chunk of the data is coming already MD5'ed from Azure, and the original field is nvarchar().
I'm not as familiar with Azure, but what I find is that:
HASHBYTES('MD5',CAST('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5' as nvarchar(max)))
returns
0xCD571D6ADB918DC1AD009FFC3786C3BC (which is the expected value)
whereas
HASHBYTES('MD5','6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5')
returns
0x94B04255CD3F8FEC2B3058385F96822A, which is equivalent to what I get if I run
MD5(TO_HEX(MD5('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5'))) in BigQuery. Unfortunately, that is not what I need to match; I need to match the nvarchar version in BQ, but I cannot figure out how to do that.
Figured out the problem, posting here for posterity.
The field in Azure is being stored as an nvarchar(50), which is encoded as UTF-16LE.
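A quick way to see the encoding difference is to hash the same string under both encodings. A minimal Python sketch (hashlib only; the expected hex value is the one reported in the question, not verified independently):

import hashlib

s = "6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5"

# HASHBYTES('MD5', CAST(s AS nvarchar(max))) hashes the UTF-16LE bytes,
# so per the explanation above this should reproduce 0xCD571D6A...C3BC.
md5_utf16le = hashlib.md5(s.encode("utf-16-le")).hexdigest()

# A plain varchar hashes the single-byte representation instead.
md5_utf8 = hashlib.md5(s.encode("utf-8")).hexdigest()

print(md5_utf16le, md5_utf8)

So to match the Azure values from BigQuery, the string has to be re-encoded as UTF-16LE before hashing (for example in a UDF), because BigQuery's MD5 operates on the UTF-8 representation of a string.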

Convert XML PATH sample code from SQL Server to DB2

I'm converting SQL Server code to DB2.
I need a solution for STUFF and FOR XML PATH.
For example:
SELECT STUFF((SELECT ' ' + something
              FROM table_name
              WHERE condition
              FOR XML PATH('')), 1, 1, '')
Please convert this into DB2.
Your code is an old-school XML "trick" to convert multiple values into a single string (often comma-separated, but in this case space-separated). Since those days, DB2 (and the SQL standard) has added a new function called LISTAGG, which is designed to solve this exact problem:
SELECT LISTAGG(something, ' ')
FROM table_name
WHERE condition
DB2 docs:
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_listagg.html
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/db2/rbafzcollistagg.htm

How to format a byte string to set a value with cbt set (cloud bigtable command line tool)?

I have a field in Bigtable storing a timestamp. Using cbt lookup, the field displays like this "\x00\x00\x01d\x865W\x00"
This bytestring converts to an integer, for example via Python.
int.from_bytes(b"\x00\x00\x01d\x865W\x00",'big')
1531260000000
1531260000000 is a Unix timestamp in milliseconds. Converting it to a human-readable format gives 2018-07-10T22:00:00+00:00.
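The conversion also works in reverse, which is what you need when constructing a value to write back. A small Python sketch using only the numbers from the question:

# Decode the stored value the way cbt lookup displays it ...
raw = b"\x00\x00\x01d\x865W\x00"
millis = int.from_bytes(raw, "big")        # 1531260000000

# ... and encode a timestamp back into the same 8-byte big-endian form.
encoded = (1531260000000).to_bytes(8, "big")
assert encoded == raw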
How can we update this field to a different timestamp using cbt?
From the docs we get the command
cbt set <table> <row> family:column=val
But how should the value be formatted to store it correctly?
I've tried cbt set mytable row1 family:timestamp=1531260000000, but then cbt lookup displays it as 1531260000000, not as a bytestring, and BigQuery does not display it at all, failing because the format is wrong.
I've also tried cbt set mytable row1 family:timestamp="\x00\x00\x01d\x865W\x00", but then cbt lookup displays the bytestring with escaped backslashes, which also does not work: "\\x00\\x00\\x01d\\x865W\\x00"
I looked in the source code for cbt but I'm not familiar enough with Go to figure it out from there.
From the current documentation, it looks like this could be done using $'\' quoting.
I added an example from the documentation for lookup:
cbt -project my-project -instance my-instance lookup my-table $'\224\257\312W\365:\205d\333\2471\315\'
Please refer to the cbt reference for more information:
https://cloud.google.com/bigtable/docs/cbt-reference
According to this issue, you are not able to pass arbitrary bytes with cbt. The timestamp you provided is handled as a string by cbt, which is why it is escaped.
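Since cbt apparently can't take raw bytes, one workaround is to write the cell from a client library instead. A rough sketch with the Python client (google-cloud-bigtable); the project, instance, table, column family and qualifier names are just the placeholders from the question:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("mytable")

# Encode the timestamp as 8 big-endian bytes, matching the existing cells.
value = (1531260000000).to_bytes(8, "big")

row = table.direct_row(b"row1")
row.set_cell("family", b"timestamp", value)
row.commit()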

Oracle GoldenGate adapter for Kafka - JSON message contents

In my GoldenGate for Big Data Kafka handler, when I update a record I am getting only the updated column and the primary key column in the "after" part of the JSON output.
{"table":"MYSCHEMATOPIC.PASSPORTS","op_type":"U","op_ts":"2018-03-17 13:57:50.000000","current_ts":"2018-03-17T13:57:53.901000","pos":"00000000030000010627","before":{"PASSPORT_ID":71541893,"PPS_ID":71541892,"PASSPORT_NO":"1234567","PASSPORT_NO_NUMERIC":241742,"PASSPORT_TYPE_ID":7,"ISSUE_DATE":null,"EXPIRY_DATE":"0060-12-21 00:00:00","ISSUE_PLACE_EN":"UN-DEFINED","ISSUE_PLACE_AR":"?????? ????????","ISSUE_COUNTRY_ID":203,"ISSUE_GOV_COUNTRY_ID":203,"IS_ACTIVE":1,"PREV_PASSPORT_ID":null,"CREATED_DATE":"2003-06-08 00:00:00","CREATED_BY":-9,"MODIFIED_DATE":null,"MODIFIED_BY":null,"IS_SETTLED":0,"MAIN_PASSPORT_PERSON_INFO_ID":34834317,"NATIONALITY_ID":590},
"after":{"PASSPORT_ID":71541893,"NATIONALITY_ID":589}}
I want the "after" part of my JSON output to show all columns.
How do I get all columns in the "after" part?
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
#The following resolves the topic name using the short table name
gg.handler.kafkahandler.topicMappingTemplate=passports
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.BlockingSend=false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=op
#gg.handler.kafkahandler.format.insertOpKey=I
#gg.handler.kafkahandler.format.updateOpKey=U
#gg.handler.kafkahandler.format.deleteOpKey=D
#gg.handler.kafkahandler.format.truncateOpKey=T
#gg.handler.kafkahandler.format.includeColumnNames=TRUE
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=info
gg.report.time=30sec
Try using the Kafka Connect handler instead - this includes the full payload. This article goes through the setup process.
This issue is fixed by adding the change below on the GoldenGate side:
ADD TRANDATA table_name ALLCOLS

deserialize cassandra row key

I'm trying to use the sstablekeys utility for Cassandra to retrieve all the keys currently in the sstables for a Cassandra cluster. They come back in what appears to be a serialized format when I run sstablekeys on a single sstable. Does anyone know how to deserialize the keys or get them back into their original format? They were inserted into Cassandra using Astyanax, where the serialized type is a tuple in Scala. The key_validation_class for the column family is a CompositeType.
Thanks to the comment by Thomas Stets, I was able to figure out that the keys are actually just converted to hex and printed out. See here for a way of getting them back to their original format.
For the specific problem of figuring out the format of a CompositeType row key and unhexifying it, see the Cassandra source which describes the format of a CompositeType key that gets output by sstablekeys. With CompositeType(UTF8Type, UTF8Type, Int32Type), the UTF8Type treats bytes as ASCII characters (so the function in the link above works in this case), but with Int32Type, you must interpret the bytes as one number.
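As a rough illustration of that layout, a hex key printed by sstablekeys can be unpacked component by component: in the standard CompositeType encoding each component is a 2-byte big-endian length, the component bytes, and a trailing end-of-component byte. A Python sketch under that assumption (the example key below is made up, with components "foo", "bar" and 123, not one from the question):

def parse_composite_key(hex_key):
    """Split a hex-encoded CompositeType row key into its raw components."""
    raw = bytes.fromhex(hex_key)
    parts, i = [], 0
    while i < len(raw):
        length = int.from_bytes(raw[i:i + 2], "big")   # 2-byte component length
        parts.append(raw[i + 2:i + 2 + length])        # component bytes
        i += 2 + length + 1                            # skip end-of-component byte
    return parts

# For CompositeType(UTF8Type, UTF8Type, Int32Type), decode the first two
# components as text and the last as a signed big-endian 32-bit integer.
a, b, c = parse_composite_key("0003666f6f0000036261720000040000007b00")
print(a.decode(), b.decode(), int.from_bytes(c, "big", signed=True))   # foo bar 123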