Tags are not written to InfluxDB through kafka-connect-influxdb - apache-kafka

I am trying to connect a Kafka sink to InfluxDB. It works, but it does not save the tags. For example, if I send this to the Kafka topic:
{"id": 1, "product": "pencil", "quantity": 100, "price": 50, "tags": {"DEVICE": "living", "location": "home"}}
the data is saved to InfluxDB, but only the fields part.
I have been trying to debug this but failed. The versions I am using:
kafka 2.11-2.4.0
influxdb: 1.7.7

I encountered this too when I followed the Avro tags example on this page:
https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html
The "tags" schema in the example was incorrect. The example defines tags as:
{
  "name": "tags",
  "type": {
    "name": "tags",
    "type": "record",
    "fields": [{
      "name": "DEVICE",
      "type": "string"
    }, {
      "name": "location",
      "type": "string"
    }]
  }
}
It should actually be:
{
  "name": "tags",
  "type": {
    "type": "map",
    "values": "string"
  }
}
This web page provided the solution: https://rmoff.net/2020/01/23/notes-on-getting-data-into-influxdb-from-kafka-with-kafka-connect/
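For reference, here is a minimal sketch (not part of the original answer) of producing a record against the corrected schema with the confluent_kafka Python client: the tags field is passed as a plain dict, which the Avro serializer writes as a map so the connector can treat it as InfluxDB tags. The broker and Schema Registry URLs, the topic name, and the field types are assumptions for illustration.

# Minimal sketch, assuming a local broker and Schema Registry; field types are guesses.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "product", "type": "string"},
    {"name": "quantity", "type": "int"},
    {"name": "price", "type": "double"},
    {"name": "tags", "type": {"type": "map", "values": "string"}}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(schema_registry, schema_str)
producer = Producer({"bootstrap.servers": "localhost:9092"})

value = {
    "id": 1,
    "product": "pencil",
    "quantity": 100,
    "price": 50.0,
    "tags": {"DEVICE": "living", "location": "home"},  # Avro map -> InfluxDB tags
}

topic = "orders"  # hypothetical topic name
producer.produce(
    topic,
    value=serializer(value, SerializationContext(topic, MessageField.VALUE)),
)
producer.flush()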

Related

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally and I got an error when I modified the schema of the message.
This is my schema:
{
  "type": "record",
  "namespace": "io.confluent.tutorial.pojo.avro",
  "name": "OrderDetail",
  "fields": [
    {
      "name": "number",
      "type": "long",
      "doc": "The order number."
    },
    {
      "name": "date",
      "type": "long",
      "logicalType": "date",
      "doc": "The date the order was submitted."
    },
    {
      "name": "client",
      "type": {
        "type": "record",
        "name": "Client",
        "fields": [
          { "name": "code", "type": "string" }
        ]
      }
    }
  ]
}
And I tried to send this message on the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409
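No answer was recorded for this one, but the 409 means the modified schema fails the compatibility check configured for the subject (BACKWARD by default); adding a new required field without a default value is a typical cause. Below is a minimal sketch, assuming a Schema Registry at http://localhost:8081, of inspecting the registered schema and testing the new one against it through the Schema Registry REST API.

# Sketch only: the registry URL is an assumption; the subject name comes from the error.
import json
import requests

REGISTRY = "http://localhost:8081"
SUBJECT = "example-topic-avro-value"
HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# 1. Look at what is already registered for the subject.
latest = requests.get(f"{REGISTRY}/subjects/{SUBJECT}/versions/latest").json()
print("registered schema:", latest.get("schema"))

# 2. Test the modified schema against the latest registered version.
new_schema = {
    "type": "record",
    "namespace": "io.confluent.tutorial.pojo.avro",
    "name": "OrderDetail",
    "fields": [
        {"name": "number", "type": "long", "doc": "The order number."},
        {"name": "date", "type": "long", "logicalType": "date",
         "doc": "The date the order was submitted."},
        {"name": "client", "type": {
            "type": "record", "name": "Client",
            "fields": [{"name": "code", "type": "string"}]}},
    ],
}
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers=HEADERS,
    data=json.dumps({"schema": json.dumps(new_schema)}),
)
print(resp.json())  # e.g. {"is_compatible": false}

If the check reports is_compatible: false, either evolve the schema in a compatible way (for example, give new fields defaults) or change the subject's compatibility setting.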

Confluent Kafka producer message format for nested records

I have an Avro schema registered for a Kafka topic and am trying to send data to it. The schema has nested records and I'm not sure how I correctly send data to it using confluent_kafka for Python.
Example schema (ignore any typos in the schema; the real one is very large, this is just an example):
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data to this schema using the confluent_kafka Python client. When I have done this before, the records were not nested and I would use a typical dictionary of key/value pairs and serialize it. How can I send nested data that works with the schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null,
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error related to the schema.
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
For example, both
{
  "meta": {
    "instanceID": "something"
  }
}
and
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": [
                "null",
                "string"
              ],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
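To make the mapping to Python concrete, here is a small sketch (an addition, not from the original answer) using fastavro, the library confluent_kafka's AvroSerializer builds on: for a ["null", record] union field you pass the nested record as a plain dict, without an extra wrapper named after the record type.

# Sketch assuming fastavro is installed; uses the Survey schema shown above.
from io import BytesIO
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "doc": "Survey",
    "name": "Survey",
    "type": "record",
    "fields": [
        {
            "name": "meta",
            "type": [
                "null",
                {
                    "name": "meta",
                    "type": "record",
                    "fields": [
                        {"name": "instanceID", "type": ["null", "string"],
                         "namespace": "Survey.meta"},
                    ],
                    "namespace": "Survey",
                },
            ],
            "namespace": "Survey",
        },
    ],
})

# Both payloads serialize cleanly against the schema above.
for record in ({"meta": {"instanceID": "something"}}, {}):
    buf = BytesIO()
    schemaless_writer(buf, schema, record)
    print(len(buf.getvalue()), "bytes")

Applied to the question's schema, the first attempt ({'datatype1': {'site': ..., 'units': ...}}) is the shape the serializer expects; the extra 'datatype1_1' wrapper in the second attempt is what triggers the schema error.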

Invalid Schema on Confluent Controlcenter

I am just trying to set up a value schema for a topic in the web interface of Confluent Control Center.
I chose the Avro format and tried the following schema:
{
  "fields": [
    {
      "name": "date",
      "type": "dates",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": "timestamp-millis",
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
But the interface keeps on saying the input schema is invalid.
I have no idea why.
Any help is appreciated!
There are issues with the datatypes used in the schema:
"type":"dates" => "type": "string"
"type":"timestamp-millis" => "type": {"type": "long", "logicalType": "timestamp-millis"}
The updated schema will look like this:
{
  "fields": [
    {
      "name": "date",
      "type": "string",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      },
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
Sample Payload:
{
  "date": "2020-07-10",
  "time": 12345678900,
  "count": 1473217
}
More references on Avro datatypes can be found here:
https://docs.oracle.com/database/nosql-12.1.3.0/GettingStartedGuide/avroschemas.html
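As a quick sanity check (an addition, assuming fastavro is available), the corrected schema parses and the sample payload serializes; with the timestamp-millis logical type the value is a long count of milliseconds (fastavro would also accept a datetime and convert it).

# Sanity-check sketch for the corrected schema, using fastavro.
from io import BytesIO
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "name": "articleCount",
    "type": "record",
    "fields": [
        {"name": "date", "type": "string", "doc": "Date of the count"},
        {"name": "time",
         "type": {"type": "long", "logicalType": "timestamp-millis"},
         "doc": "date in ms"},
        {"name": "count", "type": "int", "doc": "Number of Articles"},
    ],
})

payload = {"date": "2020-07-10", "time": 12345678900, "count": 1473217}

buf = BytesIO()
schemaless_writer(buf, schema, payload)  # raises if the schema or payload is invalid
print(f"serialized {len(buf.getvalue())} bytes")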

Camel Json Validation throws NoJsonBodyValidationException

I'm trying to perform header validation for an incoming GET request. I referred to the Camel JSON schema validator component and followed the steps below to implement it in my project:
Adding the camel-json-validator-starter dependency to build.gradle
Adding Employee.json (YAML converted to JSON) to the resources folder of my Spring Boot project. Initially I had an OpenAPI 3.0 YAML specification file and converted it to JSON
Invoking validation with the code below:
rest("/employee").id("get-employee")
    .produces(JSON_MEDIA_TYPE)
    .get()
    .description("The employee API")
    .outType(EmployeeResponse.class)
    .responseMessage()
        .code(HttpStatus.OK.toString())
        .message("Get Employee")
    .endResponseMessage()
    .route()
    .to("json-validator:openapi.json")
    .to("bean:employeeService?method=getEmployee()");
Running the project throws an org.apache.camel.component.jsonvalidator.NoJsonBodyValidationException. I'm using a GET request, so why is it expecting a request body? I just want to validate the headers and request parameters of the incoming request. I'm not sure if my approach is right or what I'm missing.
I ran into this problem last year when adopting OpenAPI and came to the conclusion that it was too much work. I could not get FULL validation from the JSON validator using OpenAPI because there were some differences between the way OpenAPI declares schema definitions and full JSON Schema definitions.
Looking at the documentation of the JSON validation component, you find this:
The JSON Schema Validator component performs bean validation of the message body against JSON Schemas v4 draft using the NetworkNT JSON Schema library (https://github.com/networknt/json-schema-validator). This is a full standalone JSON Schema validator, and if you read its GitHub pages you find this:
OpenAPI Support
The OpenAPI 3.0 specification is using JSON schema to validate the request/response, but there are some differences. With a configuration file, you can enable the library to work with OpenAPI 3.0 validation.
OpenAPI schema appears to be a subset of the real JSON Schema.
Before I show you a more detailed example, look at the example given in the Camel documentation here: https://camel.apache.org/components/latest/json-validator-component.html. Compare that JSON schema file with the OpenAPI schema definitions and you will see they are not the same.
A useful tool here is https://jsonschema.net: you can paste your JSON example there and infer a schema. I used this tool and the OpenAPI Pet Store example in the example below.
OpenAPI Petstore Pet Object Example:
{
  "id": 0,
  "category": {
    "id": 0,
    "name": "string"
  },
  "name": "doggie",
  "photoUrls": [
    "string"
  ],
  "tags": [
    {
      "id": 0,
      "name": "string"
    }
  ],
  "status": "available"
}
The OpenAPI specification saved as JSON produces this definition:
"Pet": {
"type": "object",
"required": [
"name",
"photoUrls"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"category": {
"$ref": "#/definitions/Category"
},
"name": {
"type": "string",
"example": "doggie"
},
"photoUrls": {
"type": "array",
"xml": {
"name": "photoUrl",
"wrapped": true
},
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"xml": {
"name": "tag",
"wrapped": true
},
"items": {
"$ref": "#/definitions/Tag"
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
},
"xml": {
"name": "Pet"
}
}
When I convert this to proper JSON Schema syntax, the schema looks like this:
{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/root.json",
  "type": "object",
  "title": "The Root Schema",
  "required": [
    "id",
    "category",
    "name",
    "photoUrls",
    "tags",
    "status"
  ],
  "properties": {
    "id": {
      "$id": "#/properties/id",
      "type": "integer",
      "title": "The Id Schema",
      "default": 0,
      "examples": [
        0
      ]
    },
    "category": {
      "$id": "#/properties/category",
      "type": "object",
      "title": "The Category Schema",
      "required": [
        "id",
        "name"
      ],
      "properties": {
        "id": {
          "$id": "#/properties/category/properties/id",
          "type": "integer",
          "title": "The Id Schema",
          "default": 0,
          "examples": [
            0
          ]
        },
        "name": {
          "$id": "#/properties/category/properties/name",
          "type": "string",
          "title": "The Name Schema",
          "default": "",
          "examples": [
            "string"
          ],
          "pattern": "^(.*)$"
        }
      }
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "title": "The Name Schema",
      "default": "",
      "examples": [
        "doggie"
      ],
      "pattern": "^(.*)$"
    },
    "photoUrls": {
      "$id": "#/properties/photoUrls",
      "type": "array",
      "title": "The Photourls Schema",
      "items": {
        "$id": "#/properties/photoUrls/items",
        "type": "string",
        "title": "The Items Schema",
        "default": "",
        "examples": [
          "string"
        ],
        "pattern": "^(.*)$"
      }
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "title": "The Tags Schema",
      "items": {
        "$id": "#/properties/tags/items",
        "type": "object",
        "title": "The Items Schema",
        "required": [
          "id",
          "name"
        ],
        "properties": {
          "id": {
            "$id": "#/properties/tags/items/properties/id",
            "type": "integer",
            "title": "The Id Schema",
            "default": 0,
            "examples": [
              0
            ]
          },
          "name": {
            "$id": "#/properties/tags/items/properties/name",
            "type": "string",
            "title": "The Name Schema",
            "default": "",
            "examples": [
              "string"
            ],
            "pattern": "^(.*)$"
          }
        }
      }
    },
    "status": {
      "$id": "#/properties/status",
      "type": "string",
      "title": "The Status Schema",
      "default": "",
      "examples": [
        "available"
      ],
      "pattern": "^(.*)$"
    }
  }
}
There are some differences between OpenAPI's schema definition and the JSON Schema definition.
failOnNullBody (producer) - Whether to fail if no body exists.
Default is true
Try setting the option in your call:
.to("json-validator:openapi.json?failOnNullBody=false")

How do I COPY a nested Avro field to Redshift as a single field?

I have the following Avro schema for a record, and I'd like to issue a COPY to Redshift:
"fields": [{
"name": "id",
"type": "long"
}, {
"name": "date",
"type": {
"type": "record",
"name": "MyDateTime",
"namespace": "com.mynamespace",
"fields": [{
"name": "year",
"type": "int"
}, {
"name": "monthOfYear",
"type": "int"
}, {
"name": "dayOfMonth",
"type": "int"
}, {
"name": "hourOfDay",
"type": "int"
}, {
"name": "minuteOfHour",
"type": "int"
}, {
"name": "secondOfMinute",
"type": "int"
}, {
"name": "millisOfSecond",
"type": ["int", "null"],
"default": 0
}, {
"name": "zone",
"type": {
"type": "string",
"avro.java.string": "String"
},
"default": "America/New_York"
}],
"noregistry": []
}
}]
I want to condense the MyDateTime object into a single column in Redshift during the COPY. I saw that you can map nested JSON data to a top-level column (https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths), but I haven't figured out a way to concatenate the fields directly in the COPY command.
In other words, is there a way to convert the following record (originally in Avro format)
{
  "id": 6,
  "date": {
    "year": 2010,
    "monthOfYear": 10,
    "dayOfMonth": 12,
    "hourOfDay": 14,
    "minuteOfHour": 26,
    "secondOfMinute": 42,
    "millisOfSecond": {
      "int": 0
    },
    "zone": "America/New_York"
  }
}
Into a row in Redshift that looks like:
id | date
---------------------------------------------
6 | 2010-10-12 14:26:42:000 America/New_York
I'd like to do this directly with COPY.
You would need to declare the Avro file(s) as a Redshift Spectrum external table and then use a query over that to insert the data into the local Redshift table.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html