KSQL Event Merging - Combining events from a single stream based on timestamp - apache-kafka

I'm trying to combine multiple events from a single input stream into a single output event, grouped by timestamp, using KSQL. I would also like the output event to contain an average of the input events, although this isn't strictly necessary and is more of a nice-to-have.
Input Stream: Temperature
event1: {location: "hallway", value: 23, property_Id: "123", timestamp: "1551645625878"}
event2: {location: "bedroom", value: 21, property_Id: "123", timestamp: "1551645625878"}
event3: {location: "kitchen", value: 20, property_Id: "123", timestamp: "1551645625878"}
event4: {location: "hallway", value: 19, property_Id: "123", timestamp: "9991645925878"}
event5: {location: "bedroom", value: 18, property_Id: "123", timestamp: "9991645925878"}
event6: {location: "kitchen", value: 18, property_Id: "123", timestamp: "9991645925878"}
(desired) Output Stream:
event1:
{
  "property_id": "123",
  "timestamp": "1551645625878",
  "average_temperature": 21,
  "temperature": [
    { "location": "hallway", "value": 23 },
    { "location": "bedroom", "value": 21 },
    { "location": "kitchen", "value": 20 }
  ]
}
event2:
{
  "property_id": "123",
  "timestamp": "9991645925878",
  "average_temperature": 18,
  "temperature": [
    { "location": "hallway", "value": 19 },
    { "location": "bedroom", "value": 18 },
    { "location": "kitchen", "value": 18 }
  ]
}
As far as I can tell, this just isn't possible using KSQL. Can anyone confirm?

Correct, you cannot do this in KSQL currently. As of v5.1 (March 2019), KSQL can read, but not build, nested objects: https://github.com/confluentinc/ksql/issues/2147 (please upvote/comment on the issue if you need this).
You could do the average calculation though with something like:
SELECT timestamp, SUM(value)/COUNT(*) AS avg_temp \
FROM input_stream \
GROUP BY timestamp;
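Since KSQL cannot emit the nested output shape, the merge can happen on the consumer side after reading the raw events as JSON. Below is a minimal sketch in JavaScript; the function name and the assumption that events arrive with exactly the fields shown above are mine, not part of KSQL:

```javascript
// Group raw temperature events by (property_Id, timestamp) and attach the
// average on the consumer side; KSQL only delivers the flat events.
function mergeByTimestamp(events) {
  const groups = {};
  for (const e of events) {
    const key = `${e.property_Id}:${e.timestamp}`;
    if (!groups[key]) {
      groups[key] = { property_id: e.property_Id, timestamp: e.timestamp, temperature: [] };
    }
    groups[key].temperature.push({ location: e.location, value: e.value });
  }
  // Compute the average once each group is complete.
  return Object.values(groups).map(g => ({
    ...g,
    average_temperature:
      g.temperature.reduce((sum, t) => sum + t.value, 0) / g.temperature.length,
  }));
}

const events = [
  { location: "hallway", value: 23, property_Id: "123", timestamp: "1551645625878" },
  { location: "bedroom", value: 21, property_Id: "123", timestamp: "1551645625878" },
  { location: "kitchen", value: 20, property_Id: "123", timestamp: "1551645625878" },
];

console.log(mergeByTimestamp(events));
```

Note this only works if the consumer sees all events for a timestamp before emitting; in a streaming setting you would need some windowing or completeness signal.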

Related

How can I compute the income of the sellerId of the products?

I'm trying to learn the more advanced MongoDB + Mongoose functions. This is the result of my orders, and what I'm trying to do is compute the total amounts related to the sellerId.
Here I have two documents: document 1 has an amount of 99 and the other one is 11,
so I need to get the sum of the two. I've been searching and found aggregate, but I can't figure out how to combine the two documents.
[
  {
    "_id": "6360d1d0bd860240e2589564",
    "userId": "6360cf687e186ebe29ab2a29",
    "products": [
      {
        "productId": "6360cdd166480badb8c1e05b",
        "quantity": 1,
        "sellerId": "6360c6ed05e1e99034b5f7eb",
        "_id": "6360d1d0bd860240e2589565"
      }
    ],
    "amount": 99,
    "location": "asdsad",
    "time": "asdsad",
    "status": "pending",
    "tax": 0.99
  },
  {
    "_id": "6360d7978044f3048e59bf34",
    "userId": "6360d50dbd860240e258c585",
    "products": [
      {
        "productId": "6360d7528044f3048e59bb6c",
        "quantity": 1,
        "sellerId": "6360d4d5bd860240e258c582",
        "_id": "6360d7978044f3048e59bf35"
      },
      {
        "productId": "6360d7868044f3048e59bd8c",
        "quantity": 1,
        "sellerId": "6360d4d5bd860240e258c582",
        "_id": "6360d7978044f3048e59bf36"
      }
    ],
    "amount": 11,
    "location": "Gym",
    "time": "8:00 AM",
    "status": "pending",
    "tax": 0.11
  }
]
This might help:
db.collection.aggregate([
  {
    $group: {
      _id: null,
      count: { $sum: "$amount" }
    }
  }
])
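Note that grouping on `_id: null` sums `amount` across all documents. If the goal is one total per seller, the same idea can be sketched in plain JavaScript, under the assumption (true for the sample documents) that all products in an order share a single sellerId:

```javascript
// Sum order amounts per seller, taking the first product's sellerId as the
// order's seller. Assumption: every product in an order has the same seller,
// as in the two sample documents above.
function totalsBySeller(orders) {
  const totals = {};
  for (const order of orders) {
    const sellerId = order.products[0].sellerId;
    totals[sellerId] = (totals[sellerId] || 0) + order.amount;
  }
  return totals;
}

const orders = [
  { amount: 99, products: [{ sellerId: "6360c6ed05e1e99034b5f7eb" }] },
  { amount: 11, products: [{ sellerId: "6360d4d5bd860240e258c582" },
                           { sellerId: "6360d4d5bd860240e258c582" }] },
];
console.log(totalsBySeller(orders));
```

In an aggregation pipeline the equivalent would be a `$group` keyed on the seller instead of `null`; if orders could mix sellers, you would have to decide how to split the order-level amount first.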

Increase Precision of Text Parsed via Facebook Duckling?

I'm using Facebook Duckling to parse some text, but the results include some odd dimensions:
String text = "Tomorrow, February 28";
String result = Duckling.parseText(text);
Result:
[
  {
    "body": "Tomorrow, February 28",
    "start": 0,
    "value": {
      "values": [
        {
          "value": "2022-02-28T00:00:00.000-08:00",
          "grain": "day",
          "type": "value"
        }
      ],
      "value": "2022-02-28T00:00:00.000-08:00",
      "grain": "day",
      "type": "value"
    },
    "end": 21,
    "dim": "time",
    "latent": false
  },
  {
    "body": "28'",
    "start": 19,
    "value": {
      "value": 28,
      "type": "value",
      "minute": 28,
      "unit": "minute",
      "normalized": {
        "value": 1680,
        "unit": "second"
      }
    },
    "end": 22,
    "dim": "duration",
    "latent": false
  },
  {
    "body": "28'",
    "start": 19,
    "value": {
      "value": 28,
      "type": "value",
      "unit": "foot"
    },
    "end": 22,
    "dim": "distance",
    "latent": false
  }
]
This result is odd: from the context of the query, the text "28" clearly refers to the day of the month, but Duckling also returns data as if it referred to the duration and distance dimensions.
Is there a way to make Duckling context-aware and have it only return results matching the full query? Passing "dimensions" as an argument is not ideal since I don't know the dimensions in advance.
Thanks
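One client-side workaround (a sketch of my own, not a Duckling option) is to post-filter the parse results and keep only entries whose span covers the whole input, since the spurious duration/distance readings of "28" start mid-string:

```javascript
// Keep only parses whose [start, end) span covers the entire input text.
// This is a heuristic: it discards any partial match, which is what we want
// here, but it also drops everything when no single parse spans the input.
function fullSpanOnly(text, results) {
  return results.filter(r => r.start === 0 && r.end >= text.length);
}

const text = "Tomorrow, February 28";
// Trimmed-down version of the Duckling output shown above.
const results = [
  { body: "Tomorrow, February 28", start: 0, end: 21, dim: "time" },
  { body: "28'", start: 19, end: 22, dim: "duration" },
  { body: "28'", start: 19, end: 22, dim: "distance" },
];
console.log(fullSpanOnly(text, results)); // only the "time" parse survives
```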

Calculate aggregates in a bucket in Upsert MongoDB update statement

My application gets measurements from a device that should be stored in a MongoDB database. Each measurement contains values for several probes of the device.
The measurements should be displayed as an aggregation over a certain amount of time. I'm using the Bucket pattern in order to prepare the aggregates and simplify indexing and querying.
The following sample shows a document:
{
"DeviceId": "Device1",
"StartTime": 100, "EndTime": 199,
"Measurements": [
{ "timestamp": 100, "probeValues": [ { "id": "1", "t": 30 }, { "id": "2", "t": 67 } ] },
{ "timestamp": 101, "probeValues": [ { "id": "1", "t": 32 }, { "id": "2", "t": 67 } ] },
{ "timestamp": 102, "probeValues": [ { "id": "1", "t": 34 }, { "id": "2", "t": 55 } ] },
{ "timestamp": 103, "probeValues": [ { "id": "1", "t": 27 }, { "id": "2", "t": 30 } ] }
],
"probeAggregates": [
{ "id": "1", "cnt": 4, "total": 123 },
{ "id": "2", "cnt": 4, "total": 219 }
]
}
Updating the values and calculating the aggregates in a single request works well if the document already exists (1st block: query, 2nd: update, 3rd: options):
{
  "DeviceId": "Device1",
  "StartTime": 100,
  "EndTime": 199
},
{
  $push: {
    "Measurements": {
      "timestamp": 103,
      "probeValues": [ { "id": "1", "t": 27 }, { "id": "2", "t": 30 } ]
    }
  },
  $inc: {
    "probeAggregates.$[probeAggr1].cnt": 1,
    "probeAggregates.$[probeAggr1].total": 27,
    "probeAggregates.$[probeAggr2].cnt": 1,
    "probeAggregates.$[probeAggr2].total": 30
  }
},
{
  arrayFilters: [
    { "probeAggr1.id": "1" },
    { "probeAggr2.id": "2" }
  ]
}
Now I want to extend the statement to do an upsert if the document does not exist yet. However, if I do not change the update statement at all, I get the following error:
The path 'probeAggregates' must exist in the document in order to apply array updates.
If I try to prepare the probeAggregates array in case of an insert (e.g. by using $setOnInsert or $addToSet), this leads to another error:
Updating the path 'probeAggregates.$[probeAggr1].cnt' would create a conflict at 'probeAggregates'
Both errors can be explained and seem legitimate. One way to solve this would be to change the document structure and create one document per device, timeframe, and probe, which would simplify the required update statement. In order to keep the number of documents low, I'd rather solve this by changing the update statement. Is there a way to create a valid document in an upsert?
(As I'm just learning to use a document DB, feel free to share your experience in the comments on whether keeping the number of documents low is a good goal in real-world scenarios.)
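One common workaround (an assumption on my part, not a single atomic statement) is to split the upsert in two: first insert an initialized bucket if none matches, then run the arrayFilters update, whose paths now always exist. The in-memory sketch below illustrates the logic an application would run against the driver; it assumes the set of probe ids is known from the incoming measurement:

```javascript
// In-memory sketch of the two-step upsert. Step 1 mirrors an insert of a
// bucket with zeroed aggregates; step 2 mirrors the $push + $inc update
// with arrayFilters, which can now always resolve its array paths.
const buckets = [];

function findBucket(deviceId, start, end) {
  return buckets.find(b =>
    b.DeviceId === deviceId && b.StartTime === start && b.EndTime === end);
}

function upsertMeasurement(deviceId, start, end, measurement) {
  let bucket = findBucket(deviceId, start, end);
  if (!bucket) {
    // Step 1: create the bucket with a zeroed aggregate entry per probe.
    bucket = {
      DeviceId: deviceId, StartTime: start, EndTime: end,
      Measurements: [],
      probeAggregates: measurement.probeValues.map(p => ({ id: p.id, cnt: 0, total: 0 })),
    };
    buckets.push(bucket);
  }
  // Step 2: equivalent of $push plus $inc on the matched aggregate entries.
  bucket.Measurements.push(measurement);
  for (const p of measurement.probeValues) {
    const aggr = bucket.probeAggregates.find(a => a.id === p.id);
    aggr.cnt += 1;
    aggr.total += p.t;
  }
  return bucket;
}
```

Against a real database the two steps are two round trips, so a probe id that first appears after the bucket was created still needs handling (e.g. a retry that adds the missing aggregate entry).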

How to filter MongoDB documents between two string-formatted dates

[
{ "item": "journal", "qty": 25,"date":"1/1/2016", "status": "A" },
{ "item": "notebook", "qty": 50,"date":"10/1/2016", "status": "A" },
{ "item": "paper", "qty": 100,"date":"20/1/2016", "status": "D" },
{ "item": "planner", "qty": 75,"date":"1/2/2016", "status": "D" },
{ "item": "postcard", "qty": 45,"date":"10/2/2016", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"20/5/2016", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"30/7/2016", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"2/3/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"5/5/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"6/5/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"8/10/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"11/10/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"12/11/2017", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"4/3/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"5/6/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"6/7/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"7/7/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"17/11/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"19/12/2018", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"5/1/2019", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"7/1/2019", "status": "A" },
{ "item": "postcard", "qty": 45,"date":"14/3/2019", "status": "A" }
]
Above is my database structure.
db.lichi.find({date: {$gte : '1/1/2016', $lt : '1/1/2019'}})
This is the query I am using to fetch data from the database. The dates are stored as strings, and the query above didn't work; it only gave this result:
{ "item": "journal", "qty": 25,"date":"1/1/2016", "status": "A" },
{ "item": "notebook", "qty": 50,"date":"10/1/2016", "status": "A" }
Only two documents were returned. Please have a look.
Storing dates as strings is not the best idea, since in cases like this you have to compare strings instead of dates. If for some reason you have to keep the date as a string, you can convert it in your query using $dateFromString and then apply your filtering condition:
db.lichi.aggregate([
  {
    $addFields: {
      date: {
        $dateFromString: {
          dateString: "$date",
          format: "%d/%m/%Y"
        }
      }
    }
  },
  {
    $match: {
      date: { $gte: ISODate("2016-01-01T00:00:00Z"), $lt: ISODate("2019-01-01T00:00:00Z") }
    }
  }
])
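To see why the string comparison fails and what the pipeline fixes, here is the same logic mirrored in plain JavaScript (an illustration, not a MongoDB query): parse the "d/m/Y" strings into real dates, then compare dates instead of strings:

```javascript
// Parse a "d/m/Y" string such as "17/11/2018" into a UTC Date.
function parseDMY(s) {
  const [d, m, y] = s.split("/").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

// Keep documents whose parsed date falls in [from, to) -- the same
// semantics as the $gte / $lt match above.
function filterByDateRange(docs, from, to) {
  return docs.filter(doc => {
    const date = parseDMY(doc.date);
    return date >= from && date < to;
  });
}

const docs = [
  { item: "journal", date: "1/1/2016" },
  { item: "postcard", date: "17/11/2018" },
  { item: "postcard", date: "14/3/2019" },
];
const hits = filterByDateRange(docs,
  new Date(Date.UTC(2016, 0, 1)), new Date(Date.UTC(2019, 0, 1)));
console.log(hits.map(d => d.date)); // ["1/1/2016", "17/11/2018"]
```

Comparing the raw strings lexicographically is what made the original find() return only dates starting with "1", which is why only two documents came back.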

MongoDB Aggregation Returning Wrong Result

I have my JSON object like this:
{
  "_id": "5c2e811154855c0012308f00",
  "__pclass": "QXRzXFByb2plY3RcTW9kZWxcUHJvamVjdA==",
  "id": 44328,
  "name": "Test project via postman2//2",
  "address": "some random address",
  "area": null,
  "bidDate": null,
  "building": {
    "name": "Health Care Facilities",
    "type": "Dental Clinic"
  },
  "collaborators": [],
  "createdBy": {
    "user": {
      "id": 7662036,
      "name": "Someone Here"
    },
    "firm": {
      "id": 2520967,
      "type": "ATS"
    }
  },
  "createdDate": "2019-01-03T21:39:29Z",
  "customers": [],
  "doneBy": null,
  "file": null,
  "firm": {
    "id": 1,
    "name": "MyFirm"
  },
  "leadSource": {
    "name": "dontknow",
    "number": "93794497"
  },
  "location": {
    "id": null,
    "city": { "id": 567, "name": "Bahamas" },
    "country": { "id": 38, "name": "Canada" },
    "province": { "id": 7, "name": "British Columbia" }
  },
  "modifiedBy": null,
  "modifiedDate": null,
  "projectPhase": {
    "id": 1,
    "name": "pre-design"
  },
  "quotes": [{
    "id": 19,
    "opportunityValues": {
      "Key1": 100,
      "Key2 Key2": 100,
      "Key3 Key3 Key3": 200
    }
  }],
  "specForecast": [],
  "specIds": [],
  "tags": [],
  "valuation": "something"
}
I am trying to aggregate with the query below in MongoDB. My aggregation key is four levels deep and also contains spaces. All the examples I found online show aggregation on a first-level key, so I tried to adapt them to my nested key.
db.mydata.aggregate([
  { $match: { "id": 44328 } },
  { $group: {
      _id: "$quotes.id",
      totalKey2: { $sum: "$quotes.opportunityValues.Key2 Key2" },
      totalKey3: { $sum: "$quotes.opportunityValues.Key3 Key3 Key3" }
  } }
]);
This should return
_id totalKey2 totalKey3
0 19 100 300
But it is returning
_id totalKey2 totalKey3
0 19 0 0
What am I doing wrong?
Although it's not recommended to use spaces in field names in MongoDB, it works as expected.
The problem with your query is that "quotes" is an array and you should first unwind it before grouping it.
This works as expected:
db.mydata.aggregate([
  { $match: { "id": 44328 } },
  { $unwind: "$quotes" },
  { $group: {
      _id: "$quotes.id",
      totalKey2: { $sum: "$quotes.opportunityValues.Key2 Key2" },
      totalKey3: { $sum: "$quotes.opportunityValues.Key3 Key3 Key3" }
  } }
]);
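To see why the $unwind matters, here is the same computation sketched in plain JavaScript: without flattening, the path "$quotes.opportunityValues.Key2 Key2" resolves against an array, which $sum treats as non-numeric and counts as 0; after flattening, each quote contributes its value per id:

```javascript
// Plain-JavaScript equivalent of $unwind + $group: flatten the quotes array
// into one record per quote, then sum the spaced key per quote id.
function sumByQuoteId(docs, key) {
  const totals = {};
  for (const doc of docs) {
    for (const quote of doc.quotes) {          // $unwind: "$quotes"
      const id = quote.id;                     // $group _id: "$quotes.id"
      totals[id] = (totals[id] || 0) + (quote.opportunityValues[key] || 0);
    }
  }
  return totals;
}

const docs = [{
  id: 44328,
  quotes: [{ id: 19, opportunityValues:
    { "Key1": 100, "Key2 Key2": 100, "Key3 Key3 Key3": 200 } }],
}];
console.log(sumByQuoteId(docs, "Key2 Key2")); // totals["19"] === 100
```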