I am working on a JSON stream related to malaria medicine availability in Zambia and have come across an issue I can't seem to find an answer for online. I am being sent JSON that looks like the example below:
{
  "Country": "Zambia",
  "City": "Lusaka",
  "Area": [
    "Northmead"
  ],
  "MalariaMedicine": [
    {
      "pharmacyName": "Northmead Health",
      "brand": "Chloroquin",
      "quantity": 65,
      "batchNumber": "CHLORO 628 C",
      "bestBeforeDate": "2025-05-23",
      "expired": false,
      "batchInformation": {
        "number": "CHLORO 628 C",
        "expiration": "2025-01-23"
      }
    },
    {
      "pharmacyName": "Prime Pharmacy",
      "brand": "Quinin",
      "quantity": 205,
      "batchNumber": "QUIN 560 Q",
      "bestBeforeDate": "2028-01-01",
      "expired": false,
      "batchInformation": {
        "number": "QUIN 560 Q",
        "expiration": "2028-01-01"
      }
    }
  ]
}
I have pushed the JSON into a topic called Malaria, and I used the statement below to create a JSON stream:
CREATE STREAM MALARIASTREAM
(
COUNTRY STRING,
CITY STRING,
AREA ARRAY<STRING>,
MALARIAMEDICINE ARRAY<STRUCT<PHARMACYNAME STRING, BRAND STRING, QUANTITY INTEGER, BATCHNUMBER STRING, BESTBEFOREDATE STRING, EXPIRED BOOLEAN, BATCHINFORMATION STRUCT<NUMBER STRING, EXPIRATION STRING>>>
)
WITH (KAFKA_TOPIC='Malaria', KEY_FORMAT='KAFKA', VALUE_FORMAT='JSON');
The issue I have comes when I try to extract the data using the SELECT statement below:
SELECT
COUNTRY,
CITY,
EXPLODE(AREA) AS AREA,
EXPLODE(MALARIAMEDICINE)->pharmacyName,
EXPLODE(MALARIAMEDICINE)->brand,
EXPLODE(MALARIAMEDICINE)->quantity,
EXPLODE(MALARIAMEDICINE)->batchNumber,
EXPLODE(MALARIAMEDICINE)->bestBeforeDate,
EXPLODE(MALARIAMEDICINE)->expired
FROM
MalariaStream EMIT CHANGES;
In the result set returned, the value of the AREA column is NULL for the second row. Both pharmacies are in the Northmead area so I want the second row to say Northmead as well.
How do I get the second row to also say Northmead?
If you know that you will always have single-element arrays, you could use ELT(1, Area) to select the first element of that singleton array.
https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/scalar-functions/#elt
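As a concrete sketch (an alternative to ELT): ksqlDB arrays also support 1-based subscripting, so the singleton AREA array can be indexed directly instead of being exploded alongside MALARIAMEDICINE. With only one EXPLODE call in the projection, no NULL padding occurs:

```sql
-- Sketch: index the singleton AREA array (ksqlDB arrays are 1-based)
-- so that only MALARIAMEDICINE is exploded.
SELECT
    COUNTRY,
    CITY,
    AREA[1] AS AREA,
    EXPLODE(MALARIAMEDICINE)->PHARMACYNAME AS PHARMACYNAME,
    EXPLODE(MALARIAMEDICINE)->BRAND AS BRAND,
    EXPLODE(MALARIAMEDICINE)->QUANTITY AS QUANTITY
FROM MALARIASTREAM EMIT CHANGES;
```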
I am using a SQL Server query which returns the last 3 months since a customer last purchased a product. For instance, there's a customer 100 that last made a purchase in August 2022. The SQL query will return June, July, August, in the format 062022, 072022, 082022. Now I need to be able to pass these values to the Copy data activity's REST API dataset Relative URL (/salemonyr/062022) in the ForEach activity.
So during the first iteration the Relative URL should be set to /salemonyr/062022 the second would be /salemonyr/072022 and third /salemonyr/082022.
Error: The expression 'length(activity('MonYear').output.value)' cannot be evaluated because property 'value' doesn't exist, available properties are 'resultSetCount, recordsAffected, resultSets, outputParameters, outputLogs, outputLogsLocation, outputTruncated, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference
Script activity json:
{
  "resultSetCount": 1,
  "recordsAffected": 0,
  "resultSets": [
    {
      "rowCount": 3,
      "rows": [
        {
          "MonYear": "062022"
        },
        {
          "MonYear": "072022"
        },
        {
          "MonYear": "082022"
        }
      ]
    }
  ],
  "outputParameters": {},
  "outputLogs": "",
  "outputLogsLocation": "",
  "outputTruncated": false,
  "effectiveIntegrationRuntime": "",
  "executionDuration": 0,
  "durationInQueue": {
    "integrationRuntimeQueue": 3
  },
  "billingReference": {
    "activityType": "PipelineActivity",
    "billableDuration": [
      {
        "meterType": "",
        "duration": 0.016666666666666666,
        "unit": "Hours"
      }
    ]
  }
}
How would I accomplish this, reading the values dynamically from the SQL query?
You can use @{split(item().colname,',')[0]}, @{split(item().colname,',')[1]}, and @{split(item().colname,',')[2]} in the relative URL path.
You can use a REST dataset parameter and reference it in the Relative URL.
Give the Lookup output to the ForEach activity; use your query in the Lookup.
Inside the ForEach, in the copy sink (the REST dataset), use the expression below for the dataset parameter.
/salemonyr/@{item().sample_date}
In the copy source, use your actual source dataset.
This way, you can copy the data to the respective Relative URL.
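As for the error quoted above: the Script activity's output (shown in the question) has no value property; the rows live under resultSets[0].rows. A hedged sketch, assuming the Script activity is named MonYear as in the error message, would be to set the ForEach items expression to

```
@activity('MonYear').output.resultSets[0].rows
```

and then, inside the ForEach, each item is a row object, so the Relative URL dataset parameter can reference its MonYear column:

```
/salemonyr/@{item().MonYear}
```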
I have installed version 3.6.8 of Strapi
In the docs for v3.x
https://strapi.gitee.io/documentation/v3.x/content-api/parameters.html#filter
Filters are used as a suffix of a field name:
No suffix or eq: Equals
ne: Not equals
lt: Less than
gt: Greater than
lte: Less than or equal to
gte: Greater than or equal to
in: Included in an array of values
nin: Isn't included in an array of values
contains: Contains
ncontains: Doesn't contain
containss: Contains case sensitive
ncontainss: Doesn't contain case sensitive
null: Is null/Is not null
And I can see those examples
GET /restaurants?_where[price_gte]=3
GET /restaurants?id_in=3&id_in=6&id_in=8
etc..
So I tried
/posts?_where[title_contains]=foo
/posts?title_contains=foo
And I also tried the "new way" in V4
/posts?filters[title][contains]=foo
But all of these attempts return all the posts, exactly the same as just doing
/posts?
Any idea how to filter by post title and/or post body?
Almost there, my friend! The issue you are facing is called deep filtering (please follow the link for the documentation).
In short: the title field is located inside the attributes object of each item.
Your items may look something similar to this:
{
  "data": [
    {
      "id": 1,
      "attributes": {
        "title": "Restaurant A",
        "description": "Restaurant A's description"
      },
      "meta": {
        "availableLocales": []
      }
    },
    {
      "id": 2,
      "attributes": {
        "title": "Restaurant B",
        "description": "Restaurant B's description"
      },
      "meta": {
        "availableLocales": []
      }
    }
  ]
}
And therefore the filter should be
/api/posts?filters[title][$contains]=Restaurant
Also note:
the $ sign should be included before your operator (in our case contains)
the /api prefix should be used before the plural API id (e.g. posts, users, etc.)
you may prefer the $containsi operator, which ignores upper- and lower-case letters (better for search operations)
Let me know if it worked for you!
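For example, building that filter URL from client-side JavaScript (a sketch: URLSearchParams percent-encodes the brackets and the $ sign, which Strapi's query parser decodes; $containsi is used here for a case-insensitive match):

```javascript
// Sketch: build a Strapi v4 filter URL for a case-insensitive
// title search. The brackets come out percent-encoded (%5B / %5D),
// which Strapi accepts.
const params = new URLSearchParams();
params.set("filters[title][$containsi]", "foo");
const url = `/api/posts?${params.toString()}`;
console.log(url); // → /api/posts?filters%5Btitle%5D%5B%24containsi%5D=foo
```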
I am wondering how to improve this query to return something better to work with. Let me show the tables, the current query and my idea first:
Tables
users
nfts
here owner_id is a fk to users.id
users_nfts (here I save all the creators of the nft; one nft can have more than one creator, and this is used to calculate royalties later)
Current query and explanation
To be able to code a "buy" nft process (in nodejs) I want to retrieve some data about the nft to buy:
Its price
Its current owner
The creators (to calculate the royalties and update their balances)
SELECT nfts.id, price, owner_id, owner.balance as owner_balance, creators.user_id, users.balance
FROM nfts
INNER JOIN users_nfts as creators
ON nfts.id = creators.nft_id
INNER JOIN users
ON creators.user_id = users.id
INNER JOIN users as owner
ON nfts.owner_id = owner.id
WHERE nfts.id = ${nft_id}
The query works, but it's horrible because it returns repeated data (this is what I want to solve): the nft columns are duplicated on every creator row.
What I would like to achieve
I would like to make a query so all the data about the NFT comes in one row. To do that, I need to retrieve the user_id and balance inside an array of tuples or in a json.
The result in my backend could be something like (any ideas here are welcome):
{
  "id": "ea850c65-818e-40bd-bb06-af69eaeda4a6", // nft id
  "price": 42,
  "owner_id": "1134e9e0-02ae-4567-9adf-220ead36a6ef",
  "owner_balance": 100,
  "creators": [
    {
      "user_id": "1134e9e0-02ae-4567-9adf-220ead36a6ef",
      "balance": 100
    },
    {
      "user_id": "2134e9e0-02ae-4567-9adf-220ead36a6ea",
      "balance": 35
    }
  ]
}
Thanks in advance for any tips :)
I achieved what I wanted with this query, using json_agg and json_build_object:
SELECT nfts.id, price, owner_id, owner.balance AS owner_balance,
       json_agg(json_build_object('user_id', creators.user_id,
                                  'balance', users.balance)) AS creators
FROM nfts
INNER JOIN users_nfts as creators
ON nfts.id = creators.nft_id
INNER JOIN users
ON creators.user_id = users.id
INNER JOIN users as owner
ON nfts.owner_id = owner.id
WHERE nfts.id = ${nft_id}
GROUP BY nfts.id, owner.id;
The query produces the following output:
{
  "rows": [
    {
      "id": "d87716ec-4005-4ccb-9970-6769adec3aa1",
      "price": 42,
      "owner_id": "7dd619dd-b997-4351-9541-4d8989c58667",
      "owner_balance": 58,
      "creators": [
        {
          "user_id": "1134e9e0-02ae-4567-9adf-220ead36a6ef",
          "balance": 137.8
        },
        {
          "user_id": "492851bb-dead-4c9d-b9f6-271dcf07a8bb",
          "balance": 104.2
        }
      ]
    }
  ]
}
Hope someone else can find this useful.
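One caveat about the interpolated ${nft_id}: if it comes from user input, it is an SQL-injection risk. A minimal sketch of the same query with the id bound as a parameter instead, assuming node-postgres as the driver (the pool.query call is left commented out since it needs a live database):

```javascript
// Sketch: pass nft_id as a bound parameter ($1) instead of
// interpolating it into the query text.
const nftId = "d87716ec-4005-4ccb-9970-6769adec3aa1"; // example id
const query = {
  text: `SELECT nfts.id, price, owner_id, owner.balance AS owner_balance,
                json_agg(json_build_object('user_id', creators.user_id,
                                           'balance', users.balance)) AS creators
         FROM nfts
         INNER JOIN users_nfts AS creators ON nfts.id = creators.nft_id
         INNER JOIN users ON creators.user_id = users.id
         INNER JOIN users AS owner ON nfts.owner_id = owner.id
         WHERE nfts.id = $1
         GROUP BY nfts.id, owner.id`,
  values: [nftId],
};
// const { rows } = await pool.query(query); // pool = new (require("pg").Pool)()
```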
I am using Azure Data Factory and a data flow transformation. I have a CSV that contains a column with a JSON object string; below is an example, including the header:
"Id","Name","Timestamp","Value","Metadata"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-18 05:53:00.0000000","0","{""unit"":""%""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-19 05:53:00.0000000","4","{""jobName"":""RecipeB""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-16 02:12:30.0000000","state","{""jobEndState"":""negative""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-19 06:33:00.0000000","23","{""unit"":""kg""}"
Want to store the data in a json like this:
{
  "id": "99c9347ab7c34733a4fe0623e1496ffd",
  "name": "data1",
  "values": [
    {
      "timestamp": "2021-03-18 05:53:00.0000000",
      "value": "0",
      "metadata": {
        "unit": "%"
      }
    },
    {
      "timestamp": "2021-03-19 05:53:00.0000000",
      "value": "4",
      "metadata": {
        "jobName": "RecipeB"
      }
    }
    ....
  ]
}
The challenge is that Metadata has dynamic content, meaning it will always be a JSON object but its content can vary. Therefore I cannot define a schema. Currently the "metadata" column on the sink schema is defined as an object, but whenever I run the transformation I run into an exception:
Conversion from ArrayType(StructType(StructField(timestamp,StringType,false),
StructField(value,StringType,false), StructField(metadata,StringType,false)),true) to ArrayType(StructType(StructField(timestamp,StringType,true),
StructField(value,StringType,true), StructField(metadata,StructType(StructField(,StringType,true)),true)),false) not defined
We can get the output you expected; we need an expression to extract the object from the Metadata column.
Please follow my steps. Here's my source:
Derived column expression, creating a JSON schema to convert the data:
@(id=Id,
name=Name,
values=@(timestamp=Timestamp,
value=Value,
metadata=@(unit=substring(split(Metadata,':')[2], 3, length(split(Metadata,':')[2])-6))))
Sink mapping and output data preview:
The key point is that your metadata value is an object and may have a different schema and content per row; the key may be 'unit', 'jobName', or something else. We can only build the schema manually; it doesn't support expressions. That's the limitation.
We can't achieve that within Data Factory.
HTH.
I will go with an example.
Say I have three tables defined like this:
(pseudocode)
Realm
id: number, pk
name: text, not null
Family
id: number, pk
realm_id: number, fk to Realm, pk
name: text, not null
Species
id: number, pk
realm_id: number, fk to Family (and therefore to Realm), pk,
family_id: number, fk to Family, pk,
name: text, not null
A temptative case classes definition would be
case class Realm (
id: Int,
name: String
)
case class Family (
id: Int,
realm: Realm,
name: String
)
case class Species (
id: Int,
family: Family,
name: String
)
If I make a json out of this after querying the database it would look like this:
SELECT *
FROM realm
JOIN family
ON family.realm_id = realm.id
JOIN species
ON species.family_id = family.id
AND species.realm_id = family.realm_id
Example data:
[{
  "id": 1,
  "family": {
    "id": 1,
    "name": "Mammal",
    "realm": {
      "id": 1,
      "name": "Animal"
    }
  },
  "name": "Human"
},
{
  "id": 2,
  "family": {
    "id": 1,
    "name": "Mammal",
    "realm": {
      "id": 1,
      "name": "Animal"
    }
  },
  "name": "Cat"
}]
OK, so far... this is usable: if I need to show every species grouped by realm, I would transform the JsValue, or apply filters in JavaScript, etc. However, when posting data back to the server, these classes seem a little awkward. If I want to add a new species, I would have to post something like this:
{
  "id": ???,
  "family": {
    "id": 1,
    "name": "Mammal", // Awkward
    "realm": {
      "id": 1,
      "name": "Animal" // Awkward
    }
  },
  "name": "Cat"
}
Should my classes be then:
case class Realm (
id: Int,
name: Option[String]
)
case class Family (
id: Int,
realm: Realm,
name: Option[String]
)
case class Species (
id: Option[Int],
family: Family,
name: String
)
Like this, I can omit posting what seems to be unnecessary data, but then the class definitions don't reflect what is in the database, where those fields are not nullable.
Queries are projections of data, more or less like Table.map(function) => Table2. When data is extracted from the database and I don't get the name field, it doesn't mean it is null. How do you deal with these things?
One way to deal with it is to represent the interconnection using other data structures instead of letting each level know about the next.
For example, in the places where you need to represent the entire tree, you could represent it with:
Map[Realm, Map[Family, Seq[Species]]]
And then just Realm in some places, for example as a REST/JSON resource, and maybe (Species, Family, Realm) in places where you only want to work with one species but need to know about the other two levels of the hierarchy.
I would also advise you to think two or three times about letting your model structure define your JSON structure: what happens to the code that consumes your JSON when you change anything in your model classes? (And if you really want that, do you actually need to go via a model structure? Why not build your JSON directly from the database results and skip one level of data transformation?)
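As a minimal sketch of the nested-map representation, assuming the flat Species values produced by the join (each Species carrying its Family and each Family its Realm, as in the case classes above; the toTree name is hypothetical):

```scala
// Sketch: build the Realm -> Family -> Species tree
// from the flat rows of the join query.
def toTree(allSpecies: Seq[Species]): Map[Realm, Map[Family, Seq[Species]]] =
  allSpecies
    .groupBy(_.family.realm)
    .map { case (realm, inRealm) =>
      realm -> inRealm.groupBy(_.family)
    }
```

This keeps the case classes flat while still giving you the grouped view where you need it.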