Scalacheck json and case classes - scala

I'm writing a service that takes a case class and serializes it to JSON, which I will then send to an instance running Elasticsearch.
I'd like ScalaCheck to generate several case classes with random missing data, like this:
val searchDescAndBrand = SearchEntry("", "Ac Adapters", "Sony", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 0L)
val searchBrand = SearchEntry("", "", "Sony", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 0L)
val searchPartNumberAndBrand = SearchEntry("02DUYT", "", "Sony", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 0L)
You get the idea: either fill in the values or leave them empty (the last field is a Long).
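A generator along those lines could look like the following sketch (a trimmed-down SearchEntry with just the three searchable fields plus the trailing Long stands in for the full 22-field class; the sample values are assumptions):
import org.scalacheck.Gen

// Stand-in for the real 22-field case class (an assumption for brevity).
case class SearchEntry(partNumber: String, description: String, brand: String, price: Long)

// For each field, pick either the empty string or a sample value,
// so some generated entries have "missing" data.
def maybe(value: String): Gen[String] = Gen.oneOf("", value)

val searchEntryGen: Gen[SearchEntry] =
  for {
    partNumber  <- maybe("02DUYT")
    description <- maybe("Ac Adapters")
    brand       <- maybe("Sony")
  } yield SearchEntry(partNumber, description, brand, 0L)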
This is the easy part. The problem is that when a value is empty, the generated JSON shouldn't just omit that field, it has to omit a whole section. For example:
"""
|{
| "from" : 0,
| "size" : 10,
| "query" : {
| "bool" : {
| "must" : [
| {"match" : {
| "description" : {
| "query" : "Ac Adapters",
| "type" : "phrase"
| }
| }},
| {"match" : {
| "brand" : {
| "query" : "Sony",
| "type" : "phrase"
| }
| }}
| ]
| }
| }
|}
|
""".stripMargin)
If I had a case class with the first three fields filled in, the JSON would be:
"""
|{
| "from" : 0,
| "size" : 10,
| "query" : {
| "bool" : {
| "must" : [
| {"match" : {
| "part_number" : {
| "query" : "02D875",
| "type" : "phrase"
| }
| }},
| {"match" : {
| "description" : {
| "query" : "Ac Adapters",
| "type" : "phrase"
| }
| }},
| {"match" : {
| "brand" : {
| "query" : "Sony",
| "type" : "phrase"
| }
| }}
| ]
| }
| }
|}
|
""".stripMargin)
So, in short, having a value means adding
{"match" : {
| "<specific name here, based on which value we have>" : {
| "query" : "<value from scalacheck>",
| "type" : "phrase"
| }
| }}
to the result.
How would you handle such a use case?
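One way to handle it, sketched against the trimmed SearchEntry above (the field names and the plain string rendering are assumptions; any JSON library could replace the string building): keep only the non-empty fields and render one match clause per survivor.

def matchClause(field: String, value: String): String =
  s"""{"match" : {"$field" : {"query" : "$value", "type" : "phrase"}}}"""

def toQuery(entry: SearchEntry): String = {
  // Pair each searchable value with its Elasticsearch field name,
  // drop the empty ones, and render a clause per survivor.
  val clauses = Seq(
    "part_number" -> entry.partNumber,
    "description" -> entry.description,
    "brand"       -> entry.brand
  ).collect { case (field, value) if value.nonEmpty => matchClause(field, value) }

  s"""{"from" : 0, "size" : 10, "query" : {"bool" : {"must" : [${clauses.mkString(", ")}]}}}"""
}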


Scala Elasticsearch query with multiple parameters

I need to delete certain entries from an Elasticsearch index. I cannot find any hints in the documentation. I'm also an Elasticsearch noob. The rows to be deleted will be identified by their type and an owner_id. Is it possible to call deleteByQuery with multiple parameters? Or are there alternatives to achieve the same?
I'm using this library: https://github.com/sksamuel/elastic4s
This is what the table looks like:
| id | type  | owner_id | cost |
|----|-------|----------|------|
| 1  | house | 1        | 10   |
| 2  | hut   | 1        | 3    |
| 3  | house | 2        | 16   |
| 4  | house | 1        | 11   |
In the code it looks like this currently:
deleteByQuery(someIndex, matchQuery("type", "house"))
and I would need something like this:
deleteByQuery(someIndex, matchQuery("type", "house"), matchQuery("owner_id", 1))
But this won't work since deleteByQuery only accepts a single Query.
In this example it should delete the entries with id 1 and 4.
Explaining it in JSON and REST API format, to make it clearer.
Index sample documents:
PUT myindex/_doc/1
{
  "type" : "house",
  "owner_id" : 1
}
PUT myindex/_doc/2
{
  "type" : "hut",
  "owner_id" : 1
}
PUT myindex/_doc/3
{
  "type" : "house",
  "owner_id" : 2
}
PUT myindex/_doc/4
{
  "type" : "house",
  "owner_id" : 1
}
Search using the boolean query
GET myindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "type": "house"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "owner_id": 1
          }
        }
      ]
    }
  }
}
And the query result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.35667494,
"_source" : {
"type" : "house",
"owner_id" : 1
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.35667494,
"_source" : {
"type" : "house",
"owner_id" : 1
}
}
]
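Translated to elastic4s, the same bool query can be handed to deleteByQuery as a single query. A sketch (exact method names vary between elastic4s versions, so treat the DSL calls as an assumption against your version):

import com.sksamuel.elastic4s.ElasticDsl._

// Combine both conditions in one bool query: match on type,
// filter (no scoring needed) on owner_id.
val query = boolQuery()
  .must(matchQuery("type", "house"))
  .filter(termQuery("owner_id", 1))

deleteByQuery(someIndex, query)

Since deleteByQuery accepts only one query, the trick is composing the two conditions into a single boolQuery rather than passing them separately.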

Kafka topic with Variable nested JSON object as KSQL DB stream

I'm trying to join two existing Kafka topics in KSQL. Some data samples from Kafka (actual values redacted due to corporate environment):
device topic:
{
  "persistTime" : "2020-10-06T13:30:25.373Z",
  "previous" : {
    "device" : "REDACTED",
    "type" : "REDACTED",
    "group" : "REDACTED",
    "inventoryState" : "unknown",
    "managementState" : "registered",
    "communicationId" : "REDACTED",
    "manufacturer" : "",
    "description" : "",
    "model" : "",
    "location" : {
      "geo" : {
        "latitude" : "REDACTED",
        "longitude" : "REDACTED"
      },
      "address" : {
        "city" : "",
        "postalCode" : "",
        "street" : "",
        "houseNumber" : "",
        "floor" : "",
        "company" : "",
        "country" : "",
        "reference" : "",
        "timeZone" : "",
        "region" : "",
        "district" : ""
      },
      "logicalInstallationPoint" : ""
    },
    "tags" : [ ]
  },
  "current" : {
    "device" : "REDACTED",
    "type" : "REDACTED",
    "group" : "REDACTED",
    "inventoryState" : "unknown",
    "managementState" : "registered",
    "communicationId" : "REDACTED",
    "manufacturer" : "",
    "description" : "",
    "model" : "",
    "location" : {
      "geo" : {
        "latitude" : "REDACTED",
        "longitude" : "REDACTED"
      },
      "address" : {
        "city" : "",
        "postalCode" : "",
        "street" : "",
        "houseNumber" : "",
        "floor" : "",
        "company" : "",
        "country" : "",
        "reference" : "",
        "timeZone" : "",
        "region" : "",
        "district" : ""
      },
      "logicalInstallationPoint" : ""
    },
    "tags" : [ ]
  }
}
device-event topic (1st sample):
{
  "device" : "REDACTED",
  "event" : "403151",
  "firstOccurrenceTime" : "2020-09-30T11:03:50.000Z",
  "lastOccurrenceTime" : "2020-09-30T11:03:50.000Z",
  "occurrenceCount" : 1,
  "receiveTime" : "2020-09-30T11:03:50.000Z",
  "persistTime" : "2020-09-30T14:32:59.580Z",
  "state" : "open",
  "context" : {
    "2" : "25",
    "3" : "0",
    "4" : "60",
    "1" : "REDACTED"
  }
}
device-event topic (2nd sample):
{
  "device" : "REDACTED",
  "event" : "402004",
  "firstOccurrenceTime" : "2020-10-07T07:02:48Z",
  "lastOccurrenceTime" : "2020-10-07T07:02:48Z",
  "occurrenceCount" : 1,
  "receiveTime" : "2020-10-07T07:02:48Z",
  "persistTime" : "2020-10-07T07:15:28.533Z",
  "state" : "open",
  "context" : {
    "2" : "2020-10-07T07:02:48.0000000Z",
    "1" : "REDACTED"
  }
}
The issue I'm facing is the varying number of fields inside context in the device-event topic.
I've tried the following statements for creating the events stream on ksqlDB:
CREATE STREAM "events"\
("device" VARCHAR, \
"event" VARCHAR, \
"firstOccurenceTime" VARCHAR, \
"lastOccurenceTime" VARCHAR, \
"occurenceCount" INTEGER, \
"receiveTime" VARCHAR, \
"persistTime" VARCHAR, \
"state" VARCHAR, \
"context" ARRAY<STRING>) \
WITH (KAFKA_TOPIC='device-event', VALUE_FORMAT='JSON');
CREATE STREAM "events"\
("device" VARCHAR, \
"event" VARCHAR, \
"firstOccurenceTime" VARCHAR, \
"lastOccurenceTime" VARCHAR, \
"occurenceCount" INTEGER, \
"receiveTime" VARCHAR, \
"persistTime" VARCHAR, \
"state" VARCHAR, \
"context" STRUCT\
<"1" VARCHAR, \
"2" VARCHAR, \
"3" VARCHAR, \
"4" VARCHAR>) \
WITH (KAFKA_TOPIC='ext_device-event_10195', VALUE_FORMAT='JSON');
The second statement only brings in data that has all four context variables present ("1", "2", "3" and "4").
How would one go about creating the KSQL equivalent stream for the device-event Kafka topic?
You need to use a MAP rather than a STRUCT.
BTW you also don't need the \ line separator any more :)
Here's a working example using ksqlDB 0.12.
Load the sample data into a topic
kafkacat -b localhost:9092 -P -t events <<EOF
{ "device" : "REDACTED", "event" : "403151", "firstOccurrenceTime" : "2020-09-30T11:03:50.000Z", "lastOccurrenceTime" : "2020-09-30T11:03:50.000Z", "occurrenceCount" : 1, "receiveTime" : "2020-09-30T11:03:50.000Z", "persistTime" : "2020-09-30T14:32:59.580Z", "state" : "open", "context" : { "2" : "25", "3" : "0", "4" : "60", "1" : "REDACTED" } }
{ "device" : "REDACTED", "event" : "402004", "firstOccurrenceTime" : "2020-10-07T07:02:48Z", "lastOccurrenceTime" : "2020-10-07T07:02:48Z", "occurrenceCount" : 1, "receiveTime" : "2020-10-07T07:02:48Z", "persistTime" : "2020-10-07T07:15:28.533Z", "state" : "open", "context" : { "2" : "2020-10-07T07:02:48.0000000Z", "1" : "REDACTED" } }
EOF
In ksqlDB, declare the stream:
CREATE STREAM "events" (
"device" VARCHAR,
"event" VARCHAR,
"firstOccurenceTime" VARCHAR,
"lastOccurenceTime" VARCHAR,
"occurenceCount" INTEGER,
"receiveTime" VARCHAR,
"persistTime" VARCHAR,
"state" VARCHAR,
"context" MAP < VARCHAR, VARCHAR >
) WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT = 'JSON');
Query the stream to check things work:
ksql> SET 'auto.offset.reset' = 'earliest';
Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.
ksql> SELECT "device", "event", "receiveTime", "state", "context" FROM "events" EMIT CHANGES;
+----------+--------+--------------------------+--------+------------------------------------+
|device |event |receiveTime |state |context |
+----------+--------+--------------------------+--------+------------------------------------+
|REDACTED |403151 |2020-09-30T11:03:50.000Z |open |{1=REDACTED, 2=25, 3=0, 4=60} |
|REDACTED |402004 |2020-10-07T07:02:48Z |open |{1=REDACTED, 2=2020-10-07T07:02:48.0|
| | | | |000000Z} |
Use the [''] syntax to access specific keys within the map:
ksql> SELECT "device", "event", "context", "context"['1'] AS CONTEXT_1, "context"['3'] AS CONTEXT_3 FROM "events" EMIT CHANGES;
+-----------+--------+------------------------------------+-----------+-----------+
|device |event |context |CONTEXT_1 |CONTEXT_3 |
+-----------+--------+------------------------------------+-----------+-----------+
|REDACTED |403151 |{1=REDACTED, 2=25, 3=0, 4=60} |REDACTED |0 |
|REDACTED |402004 |{1=REDACTED, 2=2020-10-07T07:02:48.0|REDACTED |null |
| | |000000Z} | | |

mongodb $lookup return empty array

I'm new to MongoDB. In this question I have two collections: one is selected_date, the other is global_mobility_report. What I'm trying to do is find entries in global_mobility_report whose date is in selected_date, so I use $lookup to join the two collections.
date_selected:
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfdc"),
  "date" : ISODate("2020-05-22T00:00:00.000+08:00")
},
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfdd"),
  "date" : ISODate("2020-05-23T00:00:00.000+08:00")
},
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfde"),
  "date" : ISODate("2020-05-24T00:00:00.000+08:00")
},
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfdf"),
  "date" : ISODate("2020-05-25T00:00:00.000+08:00")
},
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfe0"),
  "date" : ISODate("2020-05-26T00:00:00.000+08:00")
},
{
  "_id" : ObjectId("5f60d81ba43174cf172ebfe1"),
  "date" : ISODate("2020-05-27T00:00:00.000+08:00")
}
global_mobility_report:
{
  "_id" : ObjectId("5f49fb013acddb5eec37f99e"),
  "country_region_code" : "AE",
  "country_region" : "United Arab Emirates",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-15",
  "retail_and_recreation_percent_change_from_baseline" : "0",
  "grocery_and_pharmacy_percent_change_from_baseline" : "4",
  "parks_percent_change_from_baseline" : "5",
  "transit_stations_percent_change_from_baseline" : "0",
  "workplaces_percent_change_from_baseline" : "2",
  "residential_percent_change_from_baseline" : "1"
},
{
  "_id" : ObjectId("5f49fb013acddb5eec37f99f"),
  "country_region_code" : "AE",
  "country_region" : "United Arab Emirates",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-16",
  "retail_and_recreation_percent_change_from_baseline" : "1",
  "grocery_and_pharmacy_percent_change_from_baseline" : "4",
  "parks_percent_change_from_baseline" : "4",
  "transit_stations_percent_change_from_baseline" : "1",
  "workplaces_percent_change_from_baseline" : "2",
  "residential_percent_change_from_baseline" : "1"
},
{
  "_id" : ObjectId("5f49fb013acddb5eec37f9a0"),
  "country_region_code" : "AE",
  "country_region" : "United Arab Emirates",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-17",
  "retail_and_recreation_percent_change_from_baseline" : "-1",
  "grocery_and_pharmacy_percent_change_from_baseline" : "1",
  "parks_percent_change_from_baseline" : "5",
  "transit_stations_percent_change_from_baseline" : "1",
  "workplaces_percent_change_from_baseline" : "2",
  "residential_percent_change_from_baseline" : "1"
},
{
  "_id" : ObjectId("5f49fb013acddb5eec37f9a1"),
  "country_region_code" : "AE",
  "country_region" : "United Arab Emirates",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-18",
  "retail_and_recreation_percent_change_from_baseline" : "-2",
  "grocery_and_pharmacy_percent_change_from_baseline" : "1",
  "parks_percent_change_from_baseline" : "5",
  "transit_stations_percent_change_from_baseline" : "0",
  "workplaces_percent_change_from_baseline" : "2",
  "residential_percent_change_from_baseline" : "1"
}
When I try to find all entries in global_mobility_report whose date matches one in selected_date (I have converted the string to date format in global_mobility_report), it returns an empty array.
db.global_mobility_report.aggregate([
  { $match: { country_region: "Indonesia" } },
  { $addFields: { "dateconverted": { $convert: { input: "$date", to: "date", onError: "onErrorExpr", onNull: "onNullExpr" } } } },
  {
    $lookup:
    {
      from: "selected_date",
      localField: "dateconverted",
      foreignField: "date",
      as: "selected_dates" // empty
    }
  }
])
The output is:
{
  "_id" : ObjectId("5f49fd6a3acddb5eec4427bb"),
  "country_region_code" : "ID",
  "country_region" : "Indonesia",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-15",
  "retail_and_recreation_percent_change_from_baseline" : "-2",
  "grocery_and_pharmacy_percent_change_from_baseline" : "-2",
  "parks_percent_change_from_baseline" : "-8",
  "transit_stations_percent_change_from_baseline" : "1",
  "workplaces_percent_change_from_baseline" : "5",
  "residential_percent_change_from_baseline" : "1",
  "dateconverted" : ISODate("2020-02-15T08:00:00.000+08:00"),
  "selected_dates" : [ ]
},
{
  "_id" : ObjectId("5f49fd6a3acddb5eec4427bc"),
  "country_region_code" : "ID",
  "country_region" : "Indonesia",
  "sub_region_1" : "",
  "sub_region_2" : "",
  "metro_area" : "",
  "iso_3166_2_code" : "",
  "census_fips_code" : "",
  "date" : "2020-02-16",
  "retail_and_recreation_percent_change_from_baseline" : "-3",
  "grocery_and_pharmacy_percent_change_from_baseline" : "-3",
  "parks_percent_change_from_baseline" : "-7",
  "transit_stations_percent_change_from_baseline" : "-4",
  "workplaces_percent_change_from_baseline" : "2",
  "residential_percent_change_from_baseline" : "2",
  "dateconverted" : ISODate("2020-02-16T08:00:00.000+08:00"),
  "selected_dates" : [ ]
}
The reason you are getting an empty array is that dateconverted does not match the date field.
The $lookup operator does an equality match between the localField and the foreignField, so basically, with an example:
db.users.insertMany([
  { email: "test@example.com", userId: 0 },
  { email: "test2@example.com", userId: 1 },
  { email: "test3@example.com", userId: 2 },
  { email: "test3@example.com", userId: 3 }
]);
db.posts.insertMany([
  { by: 0, post: "hello world" },
  { by: 0, post: "hello earthlings" },
  { by: 3, post: "test test test" }
]);
db.posts.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "by",
      foreignField: "userId",
      as: "list_of_post"
    }
  }
]).toArray();
The output will be what it is supposed to be, because the localField matched the foreignField:
[
  {
    "_id" : ObjectId("5f60f6859a6df3133b325eb0"),
    "by" : 0,
    "post" : "hello world",
    "list_of_post" : [
      {
        "_id" : ObjectId("5f60f6849a6df3133b325eac"),
        "email" : "test@example.com",
        "userId" : 0
      }
    ]
  },
  {
    "_id" : ObjectId("5f60f6859a6df3133b325eb1"),
    "by" : 0,
    "post" : "hello earthlings",
    "list_of_post" : [
      {
        "_id" : ObjectId("5f60f6849a6df3133b325eac"),
        "email" : "test@example.com",
        "userId" : 0
      }
    ]
  },
  {
    "_id" : ObjectId("5f60f6859a6df3133b325eb2"),
    "by" : 3,
    "post" : "test test test",
    "list_of_post" : [
      {
        "_id" : ObjectId("5f60f6849a6df3133b325eaf"),
        "email" : "test3@example.com",
        "userId" : 3
      }
    ]
  }
]
Let's mimic a situation where it does not match
db.posts.drop();
db.posts.insertMany([
  { by: 20, post: "hello world" },
  { by: 23, post: "hello earthlings" },
  { by: 50, post: "test test test" }
]);
We get an empty array
[
  {
    "_id" : ObjectId("5f60f83344304796ae700b4d"),
    "by" : 20,
    "post" : "hello world",
    "list_of_post" : [ ]
  },
  {
    "_id" : ObjectId("5f60f83344304796ae700b4e"),
    "by" : 23,
    "post" : "hello earthlings",
    "list_of_post" : [ ]
  },
  {
    "_id" : ObjectId("5f60f83344304796ae700b4f"),
    "by" : 50,
    "post" : "test test test",
    "list_of_post" : [ ]
  }
]
So, back to your question: the reason for the empty array is that the dateconverted field does not match the date field. So, let's take a look at an example.
In the output documents the dateconverted is, for instance,
ISODate("2020-02-16T08:00:00.000+08:00"), and checking the date_selected documents, there is no date field that corresponds to this value. But let's manually insert this, so you will properly understand what I am talking about.
db.date_selected.insert({
  "_id" : ObjectId(),
  "date" : ISODate("2020-02-16T08:00:00.000+08:00")
});
Running the aggregation pipeline will still leave selected_dates as an empty array. The other thing you have to note is that the month/day/year part of the ISODate object does not match any document in your question either. Secondly, you have to devise another means of running the comparison, because the $addFields stage will be affected by the timezone and other issues as well.
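One such means (a sketch, assuming the raw date field is a "YYYY-MM-DD" string and MongoDB 3.6+ for the pipeline form of $lookup): compare day strings instead of Date objects, so the time-of-day and timezone components drop out of the comparison.
db.global_mobility_report.aggregate([
  { $match: { country_region: "Indonesia" } },
  { $lookup: {
      from: "selected_date",
      let: { day: "$date" },  // the raw "YYYY-MM-DD" string
      pipeline: [
        { $match: {
            $expr: {
              $eq: [
                // format the stored ISODate back to a day string in the
                // same +08:00 timezone before comparing
                { $dateToString: { format: "%Y-%m-%d", date: "$date", timezone: "+08:00" } },
                "$$day"
              ]
            }
        } }
      ],
      as: "selected_dates"
  } }
])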

Wiremock Placeholder isn't recognized

I tried the following JSON, but WireMock doesn't recognize the placeholder. I read the WireMock documentation, which says that JSON equality matching is based on JsonUnit and therefore supports placeholders. I also tried both JDK 8 and JDK 13, but neither works.
Below is the detail:
"method" : "POST",
"bodyPatterns" : [{
"equalToJson" : {
"recipient": {
"address": {
"city": "Bellevue",
"postalCode": "52031",
"countryCode": "US"
}
},
"sender": {
"address": {
"city": "",
"postalCode": "",
"countryCode": "HK"
}
},
"shipDate": "${json-unit.any-string}",
"accountNumber": {
"key": ""
}
},
Result when running the Selenium test against the mock (I started the mock via java -jar tmp/wiremock.jar --global-response-templating --root-dir ./mock --port 1337):
|
{ | { <<<<< Body does not match
"recipient" : { | "recipient" : {
"address" : { | "address" : {
"city" : "Bellevue", | "city" : "Bellevue",
"postalCode" : "52031", | "postalCode" : "52031",
"countryCode" : "US" | "countryCode" : "US"
} | }
}, | },
"sender" : { | "sender" : {
"address" : { | "address" : {
"city" : "", | "city" : "",
"postalCode" : "", | "postalCode" : "",
"countryCode" : "HK" | "countryCode" : "HK"
} | }
}, | },
"shipDate" : "${json-unit.any-string}", | "shipDate" : "May-26-2020",
"accountNumber" : { | "accountNumber" : {
"key" : "" | "key" : ""
} | }
} | }
|
Can anybody make some suggestions here? Thank you for reading my question.
The usage of "${json-unit.any-string}" is right, but placeholders only work when the right dependency is used.
Using the dependency com.github.tomakehurst:wiremock-jre8 worked for me.
Refer to https://wiremock.org/docs/request-matching/ for more info. It mentions the following note:
Placeholders are only available in the jre8 WireMock JARs, as the JsonUnit library requires at least Java 8.
You have to enable the placeholder as below, and you should make sure you are using the jre8 standalone jar; you seem to be using the normal standalone jar.
"enablePlaceholders" : true

Creating subtables in MongoDB using Kettle

I have two PostgreSQL tables with the following data:
houses:
-# select * from houses;
id | address
----+----------------
1 | 123 Main Ave.
2 | 456 Elm St.
3 | 789 County Rd.
(3 rows)
and people:
-# select * from people;
id | name | house_id
----+-------+----------
1 | Fred | 1
2 | Jane | 1
3 | Bob | 1
4 | Mary | 2
5 | John | 2
6 | Susan | 2
7 | Bill | 3
8 | Nancy | 3
9 | Adam | 3
(9 rows)
In Spoon I have two table inputs, the first named House Input with the SQL:
SELECT
id
, address
FROM houses
ORDER BY id;
The second table input is named People Input with the SQL:
SELECT
"name"
, house_id
FROM people
ORDER BY house_id;
I have both table input's going into a Merge Join that uses House Input as the first step with a key of id and People Input as the second step with a key of house_id.
I then have this going into a MongoDB Output step with the database demo, collection houses, and Mongo document fields address and name (as I am expecting MongoDB to assign the _id).
When I run the transformation and type db.houses.find(); from a Mongo shell, I get:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "name" : "Fred" }
{ "_id" : ObjectId("52083706b251cc4be9813154"), "address" : "123 Main Ave.", "name" : "Jane" }
{ "_id" : ObjectId("52083706b251cc4be9813155"), "address" : "123 Main Ave.", "name" : "Bob" }
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "name" : "Mary" }
{ "_id" : ObjectId("52083706b251cc4be9813157"), "address" : "456 Elm St.", "name" : "John" }
{ "_id" : ObjectId("52083706b251cc4be9813158"), "address" : "456 Elm St.", "name" : "Susan" }
{ "_id" : ObjectId("52083706b251cc4be9813159"), "address" : "789 County Rd.", "name" : "Bill" }
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "name" : "Nancy" }
{ "_id" : ObjectId("52083706b251cc4be981315b"), "address" : "789 County Rd.", "name" : "Adam" }
What I want to get is something like:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "people" : [
    { "_id" : ObjectId("52083706b251cc4be9813154"), "name" : "Fred" },
    { "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Jane" },
    { "_id" : ObjectId("52083706b251cc4be9813156"), "name" : "Bob" }
] },
{ "_id" : ObjectId("52083706b251cc4be9813157"), "address" : "456 Elm St.", "people" : [
    { "_id" : ObjectId("52083706b251cc4be9813158"), "name" : "Mary" },
    { "_id" : ObjectId("52083706b251cc4be9813159"), "name" : "John" },
    { "_id" : ObjectId("52083706b251cc4be981315a"), "name" : "Susan" }
] },
{ "_id" : ObjectId("52083706b251cc4be981315b"), "address" : "789 County Rd.", "people" : [
    { "_id" : ObjectId("52083706b251cc4be981315c"), "name" : "Bill" },
    { "_id" : ObjectId("52083706b251cc4be981315d"), "name" : "Nancy" },
    { "_id" : ObjectId("52083706b251cc4be981315e"), "name" : "Adam" }
] }
I know why I am getting what I am getting, but can't seem to find anything online or in the examples to get me where I want to be.
I was hoping someone could nudge me in the right direction, point to an example that is closer to what I am trying to accomplish, or tell me that this is out of scope for what Kettle is supposed to do (Hopefully not the latter).
Turns out creating subtables is all in the MongoDB Output step.
First make sure that you have the Upsert and Modifier update checked on the Configure connection tab.
Then on the Mongo document fields tab enter the following (the first line is the column names):
Name    | Mongo document path | Use field name | Match field for upsert | Modifier operation | Modifier policy
--------+---------------------+----------------+------------------------+--------------------+----------------
address |                     | Y              | Y                      | N/A                | Insert
name    | people[0]           | Y              | N                      | $set               | Insert
name    | people[1]           | Y              | N                      | $push              | Update
Now when I run db.houses.find(); I get:
{ "_id" : ObjectId("520ccb8978d96b204daa029d"), "address" : "123 Main Ave.", "people" : [ { "name" : "Fred" }, { "name" : "Jane" }, { "name" : "Bob" } ] }
{ "_id" : ObjectId("520ccb8978d96b204daa029e"), "address" : "456 Elm St.", "people" : [ { "name" : "Mary" }, { "name" : "John" }, { "name" : "Susan" } ] }
{ "_id" : ObjectId("520ccb8a78d96b204daa029f"), "address" : "789 County Rd.", "people" : [ { "name" : "Bill" }, { "name" : "Nancy" }, { "name" : "Adam" } ] }
Two things I would like to note:
This assumes that my addresses are unique and that my names are unique within a house. If this is not the case, I would need to carry the ids from my OLTP tables into id (not _id) fields in MongoDB and match for upsert on my house id.
As @G Gordon Worley III pointed out above, if these two tables are in the same database, I could do the join in the Table Input step, and this would be a two-step transformation (and faster).
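For reference, that single-input join might look like this (a sketch against the sample schema; the aliases are mine):

-- One Table Input doing the join in PostgreSQL,
-- replacing the two inputs plus the Merge Join step.
SELECT
    h.id
  , h.address
  , p.name
FROM houses h
JOIN people p ON p.house_id = h.id
ORDER BY h.id;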