Tags are not written to InfluxDB through kafka-connect-influxdb - apache-kafka

I am trying to connect a Kafka sink to InfluxDB. It works, but it does not save the tags. For example, if I send this to the Kafka topic:
{"id": 1, "product": "pencil", "quantity": 100, "price": 50, "tags": {"DEVICE": "living", "location": "home"}}
the data is saved to InfluxDB, but only the fields part.
I have been trying to debug this but failed. The versions I am using:
kafka 2.11-2.4.0
influxdb: 1.7.7

I encountered this too when I followed the Avro tags example on this page:
https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html
The "tags" schema in the example was incorrect. The example defines tags as:
{
  "name": "tags",
  "type": {
    "name": "tags",
    "type": "record",
    "fields": [{
      "name": "DEVICE",
      "type": "string"
    }, {
      "name": "location",
      "type": "string"
    }]
  }
}
It should actually be:
{
  "name": "tags",
  "type": {
    "type": "map",
    "values": "string"
  }
}
This web page provided the solution: https://rmoff.net/2020/01/23/notes-on-getting-data-into-influxdb-from-kafka-with-kafka-connect/
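For reference, here is a minimal sketch (not part of the original answer) of producing a record against the corrected schema with the confluent_kafka Python client: the tags field is passed as a plain dict, which the Avro serializer writes as a map so the connector can treat it as InfluxDB tags. The broker and Schema Registry URLs, the topic name, and the field types are assumptions for illustration.

# Minimal sketch, assuming a local broker and Schema Registry; field types are guesses.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "product", "type": "string"},
    {"name": "quantity", "type": "int"},
    {"name": "price", "type": "double"},
    {"name": "tags", "type": {"type": "map", "values": "string"}}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(schema_registry, schema_str)
producer = Producer({"bootstrap.servers": "localhost:9092"})

value = {
    "id": 1,
    "product": "pencil",
    "quantity": 100,
    "price": 50.0,
    "tags": {"DEVICE": "living", "location": "home"},  # Avro map -> InfluxDB tags
}

topic = "orders"  # hypothetical topic name
producer.produce(
    topic,
    value=serializer(value, SerializationContext(topic, MessageField.VALUE)),
)
producer.flush()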

Related

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally and I got an error when I modified the schema of the message.
This is my schema:
{
  "type": "record",
  "namespace": "io.confluent.tutorial.pojo.avro",
  "name": "OrderDetail",
  "fields": [
    {
      "name": "number",
      "type": "long",
      "doc": "The order number."
    },
    {
      "name": "date",
      "type": "long",
      "logicalType": "date",
      "doc": "The date the order was submitted."
    },
    {
      "name": "client",
      "type": {
        "type": "record",
        "name": "Client",
        "fields": [
          { "name": "code", "type": "string" }
        ]
      }
    }
  ]
}
And I tried to send this message on the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409
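No answer was recorded for this one, but the 409 means the modified schema fails the compatibility check configured for the subject (BACKWARD by default); adding a new required field without a default value is a typical cause. Below is a minimal sketch, assuming a Schema Registry at http://localhost:8081, of inspecting the registered schema and testing the new one against it through the Schema Registry REST API.

# Sketch only: the registry URL is an assumption; the subject name comes from the error.
import json
import requests

REGISTRY = "http://localhost:8081"
SUBJECT = "example-topic-avro-value"
HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# 1. Look at what is already registered for the subject.
latest = requests.get(f"{REGISTRY}/subjects/{SUBJECT}/versions/latest").json()
print("registered schema:", latest.get("schema"))

# 2. Test the modified schema against the latest registered version.
new_schema = {
    "type": "record",
    "namespace": "io.confluent.tutorial.pojo.avro",
    "name": "OrderDetail",
    "fields": [
        {"name": "number", "type": "long", "doc": "The order number."},
        {"name": "date", "type": "long", "logicalType": "date",
         "doc": "The date the order was submitted."},
        {"name": "client", "type": {
            "type": "record", "name": "Client",
            "fields": [{"name": "code", "type": "string"}]}},
    ],
}
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers=HEADERS,
    data=json.dumps({"schema": json.dumps(new_schema)}),
)
print(resp.json())  # e.g. {"is_compatible": false}

If the check reports is_compatible: false, either evolve the schema in a compatible way (for example, give new fields defaults) or change the subject's compatibility setting.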

Confluent Kafka producer message format for nested records

I have an Avro schema registered for a Kafka topic and am trying to send data to it. The schema has nested records and I'm not sure how I correctly send data to it using confluent_kafka for Python.
Example schema (ignore any typos in the schema; the real one is very large, this is just an example):
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data to this schema using the confluent_kafka Python client. When I have done this before, the records were not nested and I would use a typical dictionary of key/value pairs and serialize it. How can I send nested data that works with the schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null,
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error related to the schema.
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
For example, both
{
  "meta": {
    "instanceID": "something"
  }
}
and
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": [
                "null",
                "string"
              ],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
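To make the mapping to Python concrete, here is a small sketch (an addition, not from the original answer) using fastavro, the library confluent_kafka's AvroSerializer builds on: for a ["null", record] union field you pass the nested record as a plain dict, without an extra wrapper named after the record type.

# Sketch assuming fastavro is installed; uses the Survey schema shown above.
from io import BytesIO
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "doc": "Survey",
    "name": "Survey",
    "type": "record",
    "fields": [
        {
            "name": "meta",
            "type": [
                "null",
                {
                    "name": "meta",
                    "type": "record",
                    "fields": [
                        {"name": "instanceID", "type": ["null", "string"],
                         "namespace": "Survey.meta"},
                    ],
                    "namespace": "Survey",
                },
            ],
            "namespace": "Survey",
        },
    ],
})

# Both payloads serialize cleanly against the schema above.
for record in ({"meta": {"instanceID": "something"}}, {}):
    buf = BytesIO()
    schemaless_writer(buf, schema, record)
    print(len(buf.getvalue()), "bytes")

Applied to the question's schema, the first attempt ({'datatype1': {'site': ..., 'units': ...}}) is the shape the serializer expects; the extra 'datatype1_1' wrapper in the second attempt is what triggers the schema error.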

Invalid Schema on Confluent Controlcenter

I am just trying to set up a value schema for a topic in the web interface of Confluent Control Center.
I chose the Avro format and tried the following schema:
{
  "fields": [
    {
      "name": "date",
      "type": "dates",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": "timestamp-millis",
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
But the interface keeps on saying the input schema is invalid.
I have no idea why.
Any help is appreciated!
There are issues with the datatypes used in the schema:
"type":"dates" => "type": "string"
"type":"timestamp-millis" => "type": {"type": "long", "logicalType": "timestamp-millis"}
The updated schema will look like this:
{
  "fields": [
    {
      "name": "date",
      "type": "string",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      },
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
Sample Payload:
{
  "date": "2020-07-10",
  "time": 12345678900,
  "count": 1473217
}
More references on Avro datatypes can be found here:
https://docs.oracle.com/database/nosql-12.1.3.0/GettingStartedGuide/avroschemas.html
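As a quick sanity check (an addition, assuming fastavro is available), the corrected schema parses and the sample payload serializes; with the timestamp-millis logical type the value is a long count of milliseconds (fastavro would also accept a datetime and convert it).

# Sanity-check sketch for the corrected schema, using fastavro.
from io import BytesIO
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "name": "articleCount",
    "type": "record",
    "fields": [
        {"name": "date", "type": "string", "doc": "Date of the count"},
        {"name": "time",
         "type": {"type": "long", "logicalType": "timestamp-millis"},
         "doc": "date in ms"},
        {"name": "count", "type": "int", "doc": "Number of Articles"},
    ],
})

payload = {"date": "2020-07-10", "time": 12345678900, "count": 1473217}

buf = BytesIO()
schemaless_writer(buf, schema, payload)  # raises if the schema or payload is invalid
print(f"serialized {len(buf.getvalue())} bytes")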

Camel Json Validation throws NoJsonBodyValidationException

I'm trying to perform header validation for an incoming GET request. I referred to the Camel JSON schema validator component and followed the steps below to implement it in my project:
Adding the camel-json-validator-starter dependency to build.gradle
Adding Employee.json (YAML converted to JSON) to the resources folder of my Spring Boot project. Initially I had an OpenAPI 3.0 YAML specification file and converted it to JSON
Invoking validation with the code below:
rest("/employee").id("get-employee")
    .produces(JSON_MEDIA_TYPE)
    .get()
    .description("The employee API")
    .outType(EmployeeResponse.class)
    .responseMessage()
        .code(HttpStatus.OK.toString())
        .message("Get Employee")
    .endResponseMessage()
    .route()
    .to("json-validator:openapi.json")
    .to("bean:employeeService?method=getEmployee()");
Running the project throws an org.apache.camel.component.jsonvalidator.NoJsonBodyValidationException. I'm using a GET request, so why is it expecting a request body? I just want to validate the headers and request parameters of the incoming request. I'm not sure if my approach is right or what I'm missing.
I ran into this problem last year when adopting OpenAPI and came to the conclusion that it was too much work. I could not get FULL validation from the JSON validator using OpenAPI because there were some differences between the way OpenAPI declares schema definitions and full JSON Schema definitions.
Looking at the documentation of the JSON validation component, you find this:
The JSON Schema Validator component performs bean validation of the message body against JSON Schemas v4 draft using the NetworkNT JSON Schema library (https://github.com/networknt/json-schema-validator). This is a full standalone JSON Schema validator, and if you read its GitHub pages you find this:
OpenAPI Support
The OpenAPI 3.0 specification is using JSON schema to validate the request/response, but there are some differences. With a configuration file, you can enable the library to work with OpenAPI 3.0 validation.
OpenAPI schema appears to be a subset of the real JSON Schema.
Before I show you a more detailed example, look at the example given in the Camel documentation here: https://camel.apache.org/components/latest/json-validator-component.html. Compare that JSON schema file with the OpenAPI schema definitions and you will see they are not the same.
A useful tool here is https://jsonschema.net: you can paste your JSON example there and infer a schema. I used this tool and the OpenAPI Pet Store example in the example below.
OpenAPI Petstore Pet Object Example:
{
  "id": 0,
  "category": {
    "id": 0,
    "name": "string"
  },
  "name": "doggie",
  "photoUrls": [
    "string"
  ],
  "tags": [
    {
      "id": 0,
      "name": "string"
    }
  ],
  "status": "available"
}
The OpenAPI specification saved as JSON produces this definition:
"Pet": {
"type": "object",
"required": [
"name",
"photoUrls"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"category": {
"$ref": "#/definitions/Category"
},
"name": {
"type": "string",
"example": "doggie"
},
"photoUrls": {
"type": "array",
"xml": {
"name": "photoUrl",
"wrapped": true
},
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"xml": {
"name": "tag",
"wrapped": true
},
"items": {
"$ref": "#/definitions/Tag"
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
},
"xml": {
"name": "Pet"
}
}
When I convert this to proper JSON Schema syntax, the schema looks like this:
{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/root.json",
  "type": "object",
  "title": "The Root Schema",
  "required": [
    "id",
    "category",
    "name",
    "photoUrls",
    "tags",
    "status"
  ],
  "properties": {
    "id": {
      "$id": "#/properties/id",
      "type": "integer",
      "title": "The Id Schema",
      "default": 0,
      "examples": [
        0
      ]
    },
    "category": {
      "$id": "#/properties/category",
      "type": "object",
      "title": "The Category Schema",
      "required": [
        "id",
        "name"
      ],
      "properties": {
        "id": {
          "$id": "#/properties/category/properties/id",
          "type": "integer",
          "title": "The Id Schema",
          "default": 0,
          "examples": [
            0
          ]
        },
        "name": {
          "$id": "#/properties/category/properties/name",
          "type": "string",
          "title": "The Name Schema",
          "default": "",
          "examples": [
            "string"
          ],
          "pattern": "^(.*)$"
        }
      }
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "title": "The Name Schema",
      "default": "",
      "examples": [
        "doggie"
      ],
      "pattern": "^(.*)$"
    },
    "photoUrls": {
      "$id": "#/properties/photoUrls",
      "type": "array",
      "title": "The Photourls Schema",
      "items": {
        "$id": "#/properties/photoUrls/items",
        "type": "string",
        "title": "The Items Schema",
        "default": "",
        "examples": [
          "string"
        ],
        "pattern": "^(.*)$"
      }
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "title": "The Tags Schema",
      "items": {
        "$id": "#/properties/tags/items",
        "type": "object",
        "title": "The Items Schema",
        "required": [
          "id",
          "name"
        ],
        "properties": {
          "id": {
            "$id": "#/properties/tags/items/properties/id",
            "type": "integer",
            "title": "The Id Schema",
            "default": 0,
            "examples": [
              0
            ]
          },
          "name": {
            "$id": "#/properties/tags/items/properties/name",
            "type": "string",
            "title": "The Name Schema",
            "default": "",
            "examples": [
              "string"
            ],
            "pattern": "^(.*)$"
          }
        }
      }
    },
    "status": {
      "$id": "#/properties/status",
      "type": "string",
      "title": "The Status Schema",
      "default": "",
      "examples": [
        "available"
      ],
      "pattern": "^(.*)$"
    }
  }
}
There are some differences between OpenAPI's schema definition and the JSON Schema definition.
failOnNullBody (producer) - Whether to fail if no body exists.
Default is true
Try setting the option in your call:
.to("json-validator:openapi.json?failOnNullBody=false")

How do I COPY a nested Avro field to Redshift as a single field?

I have the following Avro schema for a record, and I'd like to issue a COPY to Redshift:
"fields": [{
"name": "id",
"type": "long"
}, {
"name": "date",
"type": {
"type": "record",
"name": "MyDateTime",
"namespace": "com.mynamespace",
"fields": [{
"name": "year",
"type": "int"
}, {
"name": "monthOfYear",
"type": "int"
}, {
"name": "dayOfMonth",
"type": "int"
}, {
"name": "hourOfDay",
"type": "int"
}, {
"name": "minuteOfHour",
"type": "int"
}, {
"name": "secondOfMinute",
"type": "int"
}, {
"name": "millisOfSecond",
"type": ["int", "null"],
"default": 0
}, {
"name": "zone",
"type": {
"type": "string",
"avro.java.string": "String"
},
"default": "America/New_York"
}],
"noregistry": []
}
}]
I want to condense the MyDateTime object into a single column in Redshift during the COPY. I saw that you can map nested JSON data to a top-level column (https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths), but I haven't figured out a way to concatenate the fields directly in the COPY command.
In other words, is there a way to convert the following record (originally in Avro format)
{
  "id": 6,
  "date": {
    "year": 2010,
    "monthOfYear": 10,
    "dayOfMonth": 12,
    "hourOfDay": 14,
    "minuteOfHour": 26,
    "secondOfMinute": 42,
    "millisOfSecond": {
      "int": 0
    },
    "zone": "America/New_York"
  }
}
Into a row in Redshift that looks like:
id | date
---------------------------------------------
6 | 2010-10-12 14:26:42:000 America/New_York
I'd like to do this directly with COPY.
You would need to declare the Avro file(s) as a Redshift Spectrum external table and then use a query over that to insert the data into the local Redshift table.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html