I have the following Avro message in a Kafka topic:
{
"table": {
"string": "Schema.xDEAL"
},
"op_type": {
"string": "Insert"
},
"op_ts": {
"string": "2018-03-16 09:03:25.000462"
},
"current_ts": {
"string": "2018-03-16 10:03:37.778000"
},
"pos": {
"string": "00000000000000010722"
},
"before": null,
"after": {
"row": {
"DEA_PID_DEAL": {
"string": "AAAAAAAA"
},
"DEA_NME_DEAL": {
"string": "MY OGG DEAL"
},
"DEA_NME_ALIAS_NAME": {
"string": "MY OGG DEAL"
},
"DEA_NUM_DEAL_CNTL": {
"string": "4swb6zs4"
}
}
}
}
When I run the following statement, it creates the stream, but all the values are null:
CREATE STREAM tls_deal (DEA_PID_DEAL VARCHAR, DEA_NME_DEAL varchar, DEA_NME_ALIAS_NAME VARCHAR, DEA_NUM_DEAL_CNTL VARCHAR) WITH (kafka_topic='deal-ogg-topic',value_format='AVRO', key = 'DEA_PID_DEAL');
But when I change the Avro message to the following, it works:
{
"table": {
"string": "Schema.xDEAL"
},
"op_type": {
"string": "Insert"
},
"op_ts": {
"string": "2018-03-16 09:03:25.000462"
},
"current_ts": {
"string": "2018-03-16 10:03:37.778000"
},
"pos": {
"string": "00000000000000010722"
},
"DEA_PID_DEAL": {
"string": "AAAAAAAA"
},
"DEA_NME_DEAL": {
"string": "MY OGG DEAL"
},
"DEA_NME_ALIAS_NAME": {
"string": "MY OGG DEAL"
},
"DEA_NUM_DEAL_CNTL": {
"string": "4swb6zs4"
}
}
Now if I run the above statement, the data is populated.
My question is: how can I populate the stream from a nested field?
I was not able to find a solution in the KSQL documentation.
Thanks in advance. I appreciate the help. :)
As Robin states, this is not currently supported (as of 22 Mar 2018 / v0.5). However, it is a tracked feature request. You may want to up-vote or follow this GitHub issue in the KSQL repo:
https://github.com/confluentinc/ksql/issues/638
KSQL does not currently (as of 22 Mar 2018 / v0.5) support nested Avro. You can use a Single Message Transform (SMT) to flatten the data as it comes through Kafka Connect; for example, Debezium ships with UnwrapFromEnvelope.
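As a hedged illustration only (the connector name, the Connect worker URL, and the use of a generic ExtractField SMT rather than a GoldenGate-specific transform are all assumptions, not something from the question), adding an SMT to the source connector so that only the nested after record lands in the topic might look like this from Python:

import requests

# Minimal sketch: merge an ExtractField SMT into an existing source connector's
# configuration via the Kafka Connect REST API, so the topic carries a flat
# value schema that KSQL can read. "ogg-deal-source" and the URL are hypothetical.
base = "http://localhost:8083/connectors/ogg-deal-source"
config = requests.get(base + "/config").json()
config.update({
    "transforms": "extractAfter",
    "transforms.extractAfter.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.extractAfter.field": "after",
})
requests.put(base + "/config", json=config)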
I am trying to fetch sessions from GA4 that are attributed to specific UTM parameters.
In GA3 we were able to use segments (sessions::condition::ga:source==X;ga:medium==Y), but I cannot find a way to do this in GA4.
POST https://analyticsdata.googleapis.com/v1beta/#{property}:runReport
with a payload like this:
body = {
"metrics": [
{
"name": "sessions::condition::ga:source==X;ga:medium==Y"
}
],
"dimensions": [
{
"name": "date"
}
],
"dateRanges": [
{
"startDate": '2022-01-01',
"endDate": '2022-01-30',
"name": "current_year"
}
]
}
This returns: Field sessions::condition::ga:source==X;ga:medium==Y is not a valid metric. Is there a way to do this via the new API?
Should I use a dimension filter to achieve this? I need to filter on both source and medium, but it is not clear how to do that.
"dimensionFilter": {
"filter": {
"fieldName": "firstUserMedium",
"stringFilter": {
"value": "Y"
}
}
}
A dimension filter on sessionSource & sessionMedium returns sessions that have those specific utm_source & utm_medium values. See the dimensions & metrics page for a description of these and other dimensions & metrics.
The needed dimension filter is similar to the following. See Dimension Filters in Creating a Report for more info.
"dimensionFilter": {
"andGroup": {
"expressions": [
{
"filter": {
"fieldName": "sessionSource",
"stringFilter": {
"value": "X"
}
}
},
{
"filter": {
"fieldName": "sessionMedium",
"stringFilter": {
"value": "Y"
}
}
}
]
}
},
Segments are not yet available today in the GA4 Data API.
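For completeness, here is a minimal sketch of the same report using the google-analytics-data Python client rather than raw REST; the property ID and the X/Y source/medium values are placeholders:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression,
    FilterExpressionList, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property="properties/GA4_PROPERTY_ID",  # placeholder property ID
    dimensions=[Dimension(name="date")],
    metrics=[Metric(name="sessions")],
    date_ranges=[DateRange(start_date="2022-01-01", end_date="2022-01-30")],
    # AND the two filters so that both utm_source and utm_medium must match
    dimension_filter=FilterExpression(
        and_group=FilterExpressionList(
            expressions=[
                FilterExpression(filter=Filter(
                    field_name="sessionSource",
                    string_filter=Filter.StringFilter(value="X"),
                )),
                FilterExpression(filter=Filter(
                    field_name="sessionMedium",
                    string_filter=Filter.StringFilter(value="Y"),
                )),
            ]
        )
    ),
)
response = client.run_report(request)
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)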
I think you should check the dimensions and metrics list for GA4; they don't start with ga:.
POST https://analyticsdata.googleapis.com/v1beta/properties/GA4_PROPERTY_ID:runReport
{
"dateRanges": [{ "startDate": "2020-09-01", "endDate": "2020-09-15" }],
"dimensions": [{ "name": "country" }],
"metrics": [{ "name": "activeUsers" }]
}
Also, at this time I don't think it supports segments.
I'm using LoopBack 3 to build a backend with MongoDB.
I have two models: Object and Attachment. Object has an embedsMany relation to Attachment.
Objects look like this in MongoDB:
[
{
"fieldA": "valueA1",
"attachments": [
{
"id": 1,
"url": "abc.com/image1"
},
{
"id": 2,
"url": "abc.com/image2"
}
]
},
{
"fieldA": "valueA2",
"attachments": [
{
"id": 4,
"url": "abc.com/image4"
},
{
"id": 5,
"url": "abc.com/image5"
}
]
}
]
The question is: how can I get the Objects with attachments.id = 4 over the REST API?
I have tried with the where and include filters, but it didn't work. It looks like this is not implemented in LoopBack 3, right?
I have found the solution. It only works with MongoDB, Cloudant and the in-memory database.
{
"filter": {
"where": {
"attachments.id": 4
}
}
}
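Over the REST API the same filter is passed in the filter query parameter. A minimal sketch (the host, port and the plural model path /api/Objects are assumptions about your app):

import requests

# Hypothetical LoopBack 3 endpoint; adjust the host/port and plural model name.
resp = requests.get(
    "http://localhost:3000/api/Objects",
    params={"filter": '{"where": {"attachments.id": 4}}'},
)
print(resp.json())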
I have event data from Kafka with the following structure that I want to ingest into Druid:
{
"event": "some_event",
"id": "1",
"parameters": {
"campaigns": "campaign1, campaign2",
"other_stuff": "important_info"
}
}
Specifically, I want to transform the dimension "campaigns" from a comma-separated string into an array / multi-valued dimension so that it can be nicely filtered and grouped by.
My ingestion spec so far looks as follows:
{
"type": "kafka",
"dataSchema": {
"dataSource": "event-data",
"parser": {
"type": "string",
"parseSpec": {
"format": "json",
"timestampSpec": {
"column": "timestamp",
"format": "posix"
},
"flattenSpec": {
"fields": [
{
"type": "root",
"name": "parameters"
},
{
"type": "jq",
"name": "campaigns",
"expr": ".parameters.campaigns"
}
]
},
"dimensionsSpec": {
"dimensions": [
"event",
"id",
"campaigns"
]
}
}
},
"metricsSpec": [
{
"type": "count",
"name": "count"
}
],
"granularitySpec": {
"type": "uniform",
...
}
},
"tuningConfig": {
"type": "kafka",
...
},
"ioConfig": {
"topic": "production-tracking",
...
}
}
This, however, leads to campaigns being ingested as a single string.
I could neither find a way to generate an array out of it with a jq expression in the flattenSpec, nor did I find something like a string-split expression that could be used in a transformSpec.
Any suggestions?
Try setting useFieldDiscovery: false in the flattenSpec of your ingestion spec. When this flag is set to true (the default), all fields with singular values (not a map or list) and flat lists (lists of singular values) at the root level are interpreted as columns.
Here is a good example and reference for using the flattenSpec:
https://druid.apache.org/docs/latest/ingestion/flatten-json.html
It looks like since Druid 0.17.0, Druid expressions support typed constructors for creating arrays, so using the string_to_array expression in a transform should do the trick.
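As a hedged sketch (not tested against your data; the delimiter passed to string_to_array is guessed from the sample value, and the transformSpec goes inside dataSchema), the transform could look like:

"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "campaigns",
      "expression": "string_to_array(campaigns, ', ')"
    }
  ]
}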
I want to load data into an HBase table using PySpark.
Can someone help with how to load this JSON data into HBase, with ticid as the row key and all the other fields going into one column family?
Please find the JSON below.
{
"ticid": "1496",
"ticlocation": "vizag",
"custnum": "222",
"Comments": {
"comment": [{
"commentno": "1",
"desc": "journey",
"passengerseat": {
"intele": "09"
},
"passengerloc": {
"intele": "s15"
}
}, {
"commentno": "5",
"desc": " food",
"passengerseat": {
"intele": "09"
},
"passengerloc": {
"intele": "s15"
}
}, {
"commentno": "12",
"desc": " service",
"passengerseat": {
"intele": "09"
},
"passengerloc": {
"intele": "s15"
}
}]
},
"Rails": {
"Rail": [{
"Traino": "AP1545",
"startcity": "vizag",
"passengerseat": "5"
}, {
"Traino": "AP1555",
"startcity": "HYD",
"passengerseat": "15A"
}]
}
}
I assume that you don't have a single row to load but thousands or millions of rows. I would recommend converting your JSON data to TSV (tab-separated), which is quite easy in Python, and then using the ImportTsv feature of HBase (a conversion sketch follows below).
See also
Import TSV file into hbase table
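A minimal conversion sketch in Python, assuming one ticket JSON object per line in tickets.json (the file names and the dotted flattening scheme are assumptions, not from the question):

import json

def flatten(obj, prefix=""):
    # Flatten nested dicts/lists into dotted column names, e.g. Rails.Rail.0.Traino
    flat = {}
    for key, value in obj.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, name + "." + str(i)))
                else:
                    flat[name + "." + str(i)] = str(item)
        else:
            flat[name] = str(value)
    return flat

with open("tickets.json") as src, open("tickets.tsv", "w") as out:
    for line in src:
        record = flatten(json.loads(line))
        rowkey = record.pop("ticid")
        values = [record[k] for k in sorted(record)]  # stable column order
        out.write("\t".join([rowkey] + values) + "\n")

The resulting file can then be loaded with something like hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2,... <table> <hdfs-path>, with the column list matching the sorted column order the script produces.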
Spark is not a good pattern for HBase bulk loads.
I want to insert a document using a REST call to the Firestore createDocument method. One of the fields is a timestamp field that should be set on the server. With the Android SDK it's as simple as annotating a Date field with @ServerTimestamp and keeping it null; how do I do the same over REST?
{
"fields": {
"timezoneId": {
"stringValue": "Europe\/London"
},
"city": {
"stringValue": "London"
},
"timestamp": {
"timestampValue": "???"
}
}
}
I tried using null, 0, an empty string, and "timestamp"; everything fails with an error requiring the standard RFC 3339 format (e.g. 2018-01-31T13:50:30.325631Z). Is there any placeholder value I can use, or any other way to obtain that timestamp?
The Android SDK doesn't execute a createDocument request when creating the document. Instead, it uses a write request to issue an update and a transform at the same time. If you want to use only createDocument, then the answer is no.
Your payload would look something like this:
{
"writes": [
{
"update": {
"name": "projects/{projectId}/databases/{databaseId}/documents/{document_path}",
"fields": {
"timezoneId": {
"stringValue": "Europe\/London"
},
"city": {
"stringValue": "London"
}
}
},
// ensure the document doesn't exist
"currentDocument": {
"exists": false
}
},
{
"transform": {
"document": "projects/{projectId}/databases/{databaseId}/documents/{document_path}",
"fieldTransforms": [
{
"fieldPath": "timestamp",
"setToServerValue": "REQUEST_TIME"
}
]
}
}
]
}
The only downside to adding documents this way is that you need to generate the document ID yourself (the SDKs generate them for you). I hope this helps.
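For reference, a minimal sketch of sending that payload to the documents:commit endpoint from Python; the project ID, document path and access token are placeholders, and the token would normally come from google-auth or gcloud auth print-access-token:

import requests

PROJECT_ID = "my-project"          # placeholder
DOC_PATH = "cities/london"         # placeholder document path
ACCESS_TOKEN = "ya29...."          # placeholder OAuth2 access token

name = "projects/%s/databases/(default)/documents/%s" % (PROJECT_ID, DOC_PATH)
payload = {
    "writes": [
        {
            # create the document with its regular fields
            "update": {
                "name": name,
                "fields": {
                    "timezoneId": {"stringValue": "Europe/London"},
                    "city": {"stringValue": "London"},
                },
            },
            # ensure the document doesn't exist yet
            "currentDocument": {"exists": False},
        },
        {
            # let the server fill in the timestamp field
            "transform": {
                "document": name,
                "fieldTransforms": [
                    {"fieldPath": "timestamp", "setToServerValue": "REQUEST_TIME"}
                ],
            }
        },
    ]
}
resp = requests.post(
    "https://firestore.googleapis.com/v1/projects/%s/databases/(default)/documents:commit" % PROJECT_ID,
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
    json=payload,
)
print(resp.status_code, resp.json())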