I've configured kafka-connect-spooldir according to the instructions at https://github.com/jcustenborder/kafka-connect-spooldir, and it consumes files containing one or more JSON objects. How can I configure it to consume a file containing a JSON array instead?
Here are my current key and value schemas:
key.schema={"name": "com.example.users.UserKey", "type": "STRUCT", "isOptional": false, "fieldSchemas": {"id": {"type": "INT64", "isOptional": false }}}
value.schema={"name": "com.example.users.User", "type": "STRUCT", "isOptional": false, "fieldSchemas": {"id": {"type": "INT64", "isOptional": false}, "test": {"type": "STRING", "isOptional": true}}}
Here is a sample of my data:
{
"id": 10,
"test": "Carla Howe"
}
{
"id": 1,
"test": "Gayle Becker"
}
Here is what I would like the data to look like:
[
{
"id": 10,
"test": "Carla Howe"
},
{
"id": 1,
"test": "Gayle Becker"
}
]
I've tried simply changing the first type from STRUCT to ARRAY, but this throws an NPE: "valueSchema cannot be null".
Can someone please point me in the right direction, or provide an example?
According to the documentation, there is a SchemaGenerator tool that can be run to generate the schema from sample data.
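If I read the README correctly, it is invoked something like this (the exact class name and flags are from memory, so verify them against the README before relying on this):
# Hypothetical invocation — check the class name and options against the README.
kafka-run-class com.github.jcustenborder.kafka.connect.spooldir.SchemaGenerator \
  -t json -f data/users.json -c config/users-spooldir.properties -i id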
Related
I am trying to sink a few topics to a Postgres database. However, the topic schema defines an array at the top level, with multiple structs inside it. Auto-mapping does not work and I cannot find any reference on how to handle this. I need all of the structs because they are dependent types: the second struct references the first struct as a field.
Currently it breaks when it hits the second struct, stating that statusChangeEvent (a struct) has no mapping to a SQL column type. This is because auto.create is used to create a table (probably called ProcessStatus); then, when the second entry is hit, there is of course no matching column.
[
{
"type": "record",
"name": "processStatus",
"namespace": "company.some.process",
"fields": [
{
"name": "code",
"doc": "The code of the processStatus",
"type": "string"
},
{
"name": "name",
"doc": "The name of the processStatus",
"type": "string"
},
{
"name": "description",
"type": "string"
},
{
"name": "isCompleted",
"type": "boolean"
},
{
"name": "isSuccessfullyCompleted",
"type": "boolean"
}
]
},
{
"type": "record",
"name": "StatusChangeEvent",
"namespace": "company.some.process",
"fields": [
{
"name": "contNumber",
"type": "string"
},
{
"name": "processId",
"type": "string"
},
{
"name": "processVersion",
"type": "int"
},
{
"name": "extProcessId",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "fromStatus",
"type": "process.status"
},
{
"name": "toStatus",
"doc": "The new status of the process",
"type": "company.some.process.processStatus"
},
{
"name": "changeDateTime",
"type": "long",
"logicalType": "timestamp-millis"
},
{
"name": "isPublic",
"type": "boolean"
}
]
}
]
I am not using ksql at the moment. Which connector settings are suited for this task? If there is a ksql alternative it would be nice to know, but the current requirement is to use the JDBC connector.
I tried using Flatten, but it does not support struct fields that have a schema, which seems kind of weird. Aren't schemas the whole selling point of Connect with Kafka? Or is it more of a constraint you have to work around?
Aren't schemas the whole selling point of Connect with Kafka?
Yes, but Postgres (or the JDBC sink in general) doesn't really support nested objects within columns. For that, you're better off with a document database, for example Mongo with a Mongo sink connector.
Which connector settings are suited for this task?
None, really, other than transforms. You could write your own if flatten doesn't work.
You could try pre-defining your table to use JSONB for the two status columns; however, that's more of a workaround.
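For example, a pre-created table could look roughly like this (column names are guessed from your schema, and you would set auto.create=false so the connector doesn't try to create the table itself):
CREATE TABLE "StatusChangeEvent" (
  "contNumber"     TEXT    NOT NULL,
  "processId"      TEXT    NOT NULL,
  "processVersion" INTEGER NOT NULL,
  "extProcessId"   TEXT,
  "fromStatus"     JSONB,             -- nested processStatus struct stored as JSON
  "toStatus"       JSONB,             -- nested processStatus struct stored as JSON
  "changeDateTime" BIGINT  NOT NULL,  -- timestamp-millis from the Avro long
  "isPublic"       BOOLEAN NOT NULL
);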
I have a JSON schema for a Kafka stream that I am integrating with BigQuery, but I can't get the data type correct at the BigQuery end. This is the schema:
"my_meta_data": {
"type": "object",
"properties": {
"property_1": {
"type": "array",
"items": {
"type": "number"
}
},
"property_2": {
"type": "array",
"items": {
"type": "number"
}
},
"property_3": {
"type": "array",
"items": {
"type": "number"
}
}
}
}
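For reference, a message matching this schema is a single object whose properties are arrays of numbers, e.g. (values made up):
{
  "my_meta_data": {
    "property_1": [1.5, 2.0],
    "property_2": [3.25],
    "property_3": [4.0, 5.5, 6.75]
  }
}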
I tried this in the JSON file defining the BigQuery table:
{
"name": "my_meta_data",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "property_1",
"type": "INT64",
"mode": "REPEATED"
},
{
"name": "property_2",
"type": "INT64",
"mode": "REPEATED"
},
{
"name": "property_3",
"type": "INT64",
"mode": "REPEATED"
}
]
}
I am using a hosted connector from Confluent, the Kafka provider, and the error message is
The connector is failing because it cannot write a non-array element to an array column. Please check the schemas of the data in Kafka and the BigQuery tables the connector is writing to, and ensure that all data from Kafka that will be written to an array column in BigQuery is contained in an array.
I haven't defined an array column, though; I've defined a RECORD column that contains arrays. Any ideas on how I can set up the BigQuery table to capture this data? Thanks in advance.
I'm developing an API with LoopBack. Everything worked fine until I decided to change the IDs of my documents in the database; I no longer want them to be auto-generated.
Now that I'm setting the id myself, I get an "Unknown id" 404 whenever I hit this endpoint: GET properties/{id}
How can I use custom IDs with LoopBack and MongoDB?
Whenever I hit this endpoint: http://localhost:5000/api/properties/20020705171616489678000000
I get this error:
{
"error": {
"name": "Error",
"status": 404,
"message": "Unknown \"Property\" id \"20020705171616489678000000\".",
"statusCode": 404,
"code": "MODEL_NOT_FOUND"
}
}
This is my model.json, just in case...
{
"name": "Property",
"plural": "properties",
"base": "PersistedModel",
"idInjection": false,
"options": {
"validateUpsert": true
},
"properties": {
"id": {"id": true, "type": "string", "generated": false},
"photos": {
"type": [
"string"
]
},
"propertyType": {
"type": "string",
"required": true
},
"internalId": {
"type": "string",
"required": true
},
"flexCode": {
"type": "string",
"required": true
}
},
"validations": [],
"relations": {},
"acls": [],
"methods": []
}
Your model setup (with idInjection: true or false) did work when I tried it with a PostgreSQL database set up with a text id field, for smaller numbers.
Running a LoopBack application with DEBUG=loopback:connector:* node . outputs the database queries being run to the terminal. I tried it with the id value you are using, and the parameter value was [2.002070517161649e+25], so the size of the number is the issue.
You could try raising it as a bug in LoopBack, but JavaScript is horrible at dealing with large numbers, so you may be better off not using such large numbers as identifiers anyway.
It does work if the ID is an alphanumeric string over 16 characters, so there might be a workaround for you (use ObjectId?), depending on what you are trying to achieve.
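For example, a quick sanity check in a boot script (file name and values are made up), assuming the Property model above is attached to your MongoDB datasource:
// server/boot/create-sample-property.js — hypothetical sanity-check script.
module.exports = function (app) {
  var Property = app.models.Property;

  // Use a plain alphanumeric string as the custom id instead of a huge number.
  Property.create({
    id: 'prop20020705171616489678',
    photos: [],
    propertyType: 'apartment',
    internalId: 'INT-0001',
    flexCode: 'FLEX-0001'
  }, function (err, created) {
    if (err) return console.error(err);

    // GET /api/properties/prop20020705171616489678 should now return this document.
    Property.findById(created.id, function (err, found) {
      if (err) return console.error(err);
      console.log('found:', found && found.id);
    });
  });
};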
I am using Swagger to document my Jersey-based REST API. (The Swagger UI I am using was downloaded in June 2014; I don't know whether this issue has been fixed in later versions, but I have made a lot of customizations to its code, so I don't have the option of downloading the latest version without investing a lot of time customizing it again.)
Until now, all my transfer objects have had properties one level deep (no embedded POJOs). But now that I have added some REST paths that return more complex objects (two levels deep), I've found that Swagger UI does not expand the JSON model schema when there are embedded objects.
Here is the important part of the swagger doc:
...
{
"path": "/user/combo",
"operations": [{
"method": "POST",
"summary": "Inserts a combo (user, address)",
"notes": "Will insert a new user and a address definition in a single step",
"type": "UserAndAddressWithIdSwaggerDto",
"nickname": "insertCombo",
"consumes": ["application/json"],
"parameters": [{
"name": "body",
"description": "New user and address combo",
"required": true,
"type": "UserAndAddressWithIdSwaggerDto",
"paramType": "body",
"allowMultiple": false
}],
"responseMessages": [{
"code": 200,
"message": "OK",
"responseModel": "UserAndAddressWithIdSwaggerDto"
}]
}]
}
...
"models": {
"UserAndAddressWithIdSwaggerDto": {
"id": "UserAndAddressWithIdSwaggerDto",
"description": "",
"required": ["user",
"address"],
"properties": {
"user": {
"$ref": "UserDto",
"description": "User"
},
"address": {
"$ref": "AddressDto",
"description": "Address"
}
}
},
"UserDto": {
"id": "UserDto",
"properties": {
"userId": {
"type": "integer",
"format": "int64"
},
"name": {
"type": "string"
},...
},
"AddressDto": {
"id": "AddressDto",
"properties": {
"addressId": {
"type": "integer",
"format": "int64"
},
"street": {
"type": "string"
},...
}
}
...
The embedded objects are User and Address; their models are being created correctly, as shown in the JSON response.
But when I open Swagger UI, I can only see:
{
"user": "UserDto",
"address": "AddressDto"
}
But I should see something like:
{
"user": {
"userId": "integer",
"name": "string",...
},
"address": {
"addressId": "integer",
"street": "string",...
}
}
Something may be wrong in the code that expands the internal properties; the JavaScript console doesn't show any errors, so I assume this is a bug.
I found the solution: there is a line of code that needs to be modified to make it work properly.
In the swagger.js file there is a getSampleValue function with a conditional checking for undefined:
SwaggerModelProperty.prototype.getSampleValue = function(modelsToIgnore) {
var result;
if ((this.refModel != null) && (modelsToIgnore[this.refModel.name] === 'undefined'))
...
I updated the equality check to (removing quotes):
modelsToIgnore[this.refModel.name] === undefined
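so the whole condition reads:
if ((this.refModel != null) && (modelsToIgnore[this.refModel.name] === undefined))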
After that, SwaggerUI is able to show the embedded models.
I have a working REST request that returns a large results collection. (trimmed here)
The original URL is:
http://intranet.domain.com//_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)?@v='domain\kens'&$select=AccountName,DisplayName,Email,Title,UserProfileProperties
The response is:
{
"d": {
"__metadata": {
"id": "stuff",
"uri": "morestuff",
"type": "SP.UserProfiles.PersonProperties"
},
"AccountName": "domain\\KenS",
"DisplayName": "Ken Sanchez",
"Email": "KenS#domain.com",
"Title": "Research Assistant",
"UserProfileProperties": {
"results": [
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "UserProfile_GUID",
"Value": "1c419284-604e-41a8-906f-ac34fd4068ab",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "SID",
"Value": "S-1-5-21-2740942301-4273591597-3258045437-1132",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "ADGuid",
"Value": "",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "AccountName",
"Value": "domain\\KenS",
"ValueType": "Edm.String"
}...
Is it possible to change the REST request with a $filter that only returns the key/value pairs from the results collection where Key equals 'SID' or certain other values?
I only need about 3 values from the results collection by name.
In OData, you can't filter an inner feed.
Instead you could try to query the entity set that UserProfileProperties comes from and expand the associated SP.UserProfiles.PersonProperties entity.
The syntax will need to be adjusted for your scenario, but I'm thinking something along these lines:
service.svc/UserProfileProperties?$filter=Key eq 'SID' and RelatedPersonProperties/AccountName eq 'domain\kens'&$expand=RelatedPersonProperties
That assumes you have a top-level entity set of UserProfileProperties and each is tied back to a single SP.UserProfiles.PersonProperties entity via a navigation property called (in my example) RelatedPersonProperties.
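Extending that pattern to the handful of keys you need (still assuming the same hypothetical entity set and navigation property) would look something like:
service.svc/UserProfileProperties?$filter=(Key eq 'SID' or Key eq 'AccountName' or Key eq 'UserProfile_GUID') and RelatedPersonProperties/AccountName eq 'domain\kens'&$expand=RelatedPersonProperties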