I'm using the npm jsonschema module for Node.js, and my very simple JSON schema looks like this:
{
  "title": "ticket",
  "type": "object",
  "properties": {
    "_id": { "type": "string" },
    "created": { "type": "string", "format": "date-time" }
  },
  "additionalProperties": false
}
The data I validate through this schema is stored in MongoDB. The problem is that the created property has an index with expireAfterSeconds to auto-delete these records after a certain amount of time.
Now I have the following problem: if I send created as a string (no matter whether it is valid or invalid according to the JSON specification), the document in the database also ends up with a created property of type string, since MongoDB is schemaless and I can't predefine that property as a date type. For example, if I send the data with created set to the ISO string 2017-08-15T14:34:18.839Z, it looks very similar to a Mongo date but still remains a string. Of course, this breaks the expiry logic.
If I send my data with a real Date for the created property, JSON validation fails with:
instance.created is not of a type(s) string
Of course, I can transform all date fields from the validated string to a Date type on insert and update, but this is not sufficient, because on read I will get data with real Date types that will fail validation on the next update. So I could include a back-transformation from Date to string on every read, but this solution is still not good enough for me.
Any other ideas?
The correct types for _id (which is an ObjectId) and for date fields are the following; note that these bsonType keywords are understood by MongoDB's server-side $jsonSchema validator rather than by the npm jsonschema module:
"bsonType": "objectId" // for ObjectID
"bsonType": "date" // For ISODate
Example:
{
  "title": "ticket",
  "type": "object",
  "properties": {
    "_id": { "bsonType": "objectId" },
    "created": { "bsonType": "date" }
  },
  "additionalProperties": false
}
I'm not familiar with jsonschema, but if you were to use ajv you could make use of custom keywords and validation, such as:
{
  "type": "object",
  "properties": {
    "someDate": {
      "instanceof": "Date"
    }
  }
}
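For completeness, a minimal sketch of how that keyword could be wired up, assuming the ajv and ajv-keywords packages are installed (ajv-keywords is what provides the instanceof keyword):
// Register the "instanceof" keyword, then validate against a real Date object
const Ajv = require("ajv");
const ajvKeywords = require("ajv-keywords");

const ajv = new Ajv();
ajvKeywords(ajv, "instanceof");

const validate = ajv.compile({
  type: "object",
  properties: {
    someDate: { instanceof: "Date" }
  }
});

console.log(validate({ someDate: new Date() }));                 // true
console.log(validate({ someDate: "2017-08-15T14:34:18.839Z" })); // false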
I am using Kafka Connect with JSON Schema and am in a situation where I need to convert the JSON schema manually (to "Schema") within a Kafka Connect plugin. I can successfully retrieve the JSON Schema from the Schema Registry and can convert simple JSON Schemas, but I am having difficulties with complex ones that have valid "$ref" tags referencing components within a single JSON Schema definition.
I have several questions:
The JsonConverter.java does not appear to handle "$ref". Am I correct, or does it handle it in another way elsewhere?
Does the Schema Registry handle the referencing of sub-definitions? If yes, is there code that shows how the dereferencing is handled?
Should the JSON Schema be resolved to a string without references (i.e., inline the references) before submitting to the Schema Registry, thereby removing the "$ref" issue?
I am looking at the Kafka source code module JsonConverter.java below:
https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L428
An example of the complex schema (taken from the JSON Schema site) is shown below (notice the "$ref": "#/$defs/veggie" tag that references a later sub-definition):
{
  "$id": "https://example.com/arrays.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "A representation of a person, company, organization, or place",
  "title": "complex-schema",
  "type": "object",
  "properties": {
    "fruits": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "vegetables": {
      "type": "array",
      "items": { "$ref": "#/$defs/veggie" }
    }
  },
  "$defs": {
    "veggie": {
      "type": "object",
      "required": [ "veggieName", "veggieLike" ],
      "properties": {
        "veggieName": {
          "type": "string",
          "description": "The name of the vegetable."
        },
        "veggieLike": {
          "type": "boolean",
          "description": "Do I like this vegetable?"
        }
      }
    }
  }
}
Below is the actual schema returned from the Schema Registry after the schema was successfully registered:
[
{
"subject": "complex-schema",
"version": 1,
"id": 1,
"schemaType": "JSON",
"schema": "{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}"
}
]
The actual schema is embedded in the above returned string (the contents of the "schema" field) and contains the $ref references:
{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}
Again, the JsonConverter in the Apache Kafka source code has no notion of JSON Schema, so no, $ref does not work, and it does not integrate with the Schema Registry either.
You seem to be looking for the io.confluent.connect.json.JsonSchemaConverter class and its logic.
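As a rough sketch, that converter is typically enabled in the worker or connector configuration along these lines (the Schema Registry URL is a placeholder, and the class ships with Confluent's JSON Schema converter artifact, not with Apache Kafka itself):
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://localhost:8081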
I am trying to use AWS DMS to transfer data from MongoDB to Amazon Elasticsearch.
I am encountering the following log entry in CloudWatch:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
  },
  "status": 400
}
This is my configuration for the MongoDB source.
It has the "_id as a separate column" check box enabled.
I tried disabling it, but then it says that there is no primary key.
Is there anything you know of that can fix this?
Quick note: I have added a mapping of the _id field to old_id, and now it doesn't import the other fields, even when I add them in the mapping.
Because Elasticsearch does not support the LOB data type, the other fields are not migrated.
Add an additional transformation rule to change the data type to string:
{
  "rule-type": "transformation",
  "rule-id": "3",
  "rule-name": "3",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "test",
    "table-name": "%",
    "column-name": "%"
  },
  "data-type": {
    "type": "string",
    "length": "30"
  }
}
I have created an index in Elasticsearch with multiple date fields and formatted the columns as yyyy-mm-dd HH:mm:ss. Eventually I found that the dates were malformed and wrong data was being populated into the fields. The index has more than 600,000 records and I don't want to leave any data behind. Now I need to create another field, or a new index with the same date fields, formatted as YYYY-MM-ddTHH:mm:ss.Z, and populate all the records into the new index or new fields.
I have used the date processor pipeline shown below, but it fails. Please correct me if anything is wrong here.
PUT _ingest/pipeline/date-malform
{
  "description": "convert malformed date to timestamp",
  "processors": [
    {
      "date": {
        "field": "event_tm",
        "target_field": "event_tm",
        "formats": ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone": "UTC"
      }
    },
    {
      "date": {
        "field": "vendor_start_dt",
        "target_field": "vendor_start_dt",
        "formats": ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone": "UTC"
      }
    },
    {
      "date": {
        "field": "vendor_end_dt",
        "target_field": "vendor_end_dt",
        "formats": ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone": "UTC"
      }
    }
  ]
}
I have created the pipeline and used reindex as below:
POST _reindex
{
  "source": {
    "index": "tog_gen_test"
  },
  "dest": {
    "index": "data_mv",
    "pipeline": "date-malform",
    "version_type": "external"
  }
}
I am getting the below error while running the reindex:
"failures": [
  {
    "index": "data_mv",
    "type": "_doc",
    "id": "rwN64WgB936y_JOyjc57",
    "cause": {
      "type": "exception",
      "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "unable to parse date [2019-02-12 10:29:35]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Illegal pattern component: T"
          }
        }
      }
    }
  }
]
You can use Logstash as Shailesh Pratapwar suggested, but you also have the option to use Elasticsearch reindex + ingest to do the same:
Create an ingest pipeline with the proper date processor in order to fix the date format/manipulation (see the corrected sketch below): https://www.elastic.co/guide/en/elasticsearch/reference/master/date-processor.html
Reindex the data from the old index to a new index, applying the date manipulation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
Reindex can also use the Ingest Node feature by specifying a pipeline.
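For example, given that the documents apparently contain values like 2019-02-12 10:29:35 (per the error above), a sketch of the event_tm processor would use a matching pattern instead of a literal T; the vendor_start_dt and vendor_end_dt fields would get analogous processors:
PUT _ingest/pipeline/date-malform
{
  "description": "parse space-separated timestamps into ISO dates",
  "processors": [
    {
      "date": {
        "field": "event_tm",
        "target_field": "event_tm",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "timezone": "UTC"
      }
    }
  ]
}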
Use Logstash (a rough configuration sketch follows below):
Read from Elasticsearch using Logstash.
Manipulate the date format.
Write back to Elasticsearch using Logstash.
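A minimal sketch of such a Logstash pipeline, assuming a local cluster, the index names from the question, and event_tm as an example field (hosts, fields, and the date pattern would need adjusting):
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tog_gen_test"
  }
}

filter {
  # Parse the space-separated timestamp and store it back on the same field
  date {
    match    => ["event_tm", "yyyy-MM-dd HH:mm:ss"]
    target   => "event_tm"
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "data_mv"
  }
}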
I am sending a POST request to http://orion.lab.fiware.org:1026/v2/entities/85/attrs?type=UrbansenseLocation in order to update the atime and bundle attributes:
{
  "atime": {
    "type": "Number",
    "value": 1476370651
  },
  "bundle": {
    "type": "Number",
    "value": 1
  }
}
and a GET request to the same entity returns the following response:
{
  "id": "85",
  "type": "UrbansenseLocation",
  "atime": {
    "type": "Number",
    "value": 1476370000,
    "metadata": {}
  },
  "bundle": {
    "type": "Number",
    "value": 1,
    "metadata": {}
  },
  // some other attributes
}
Please note the mismatch in the value field of the atime attribute! Why is this happening?
Thanks.
I understand that atime is meant to be a datetime. In that case, I'd suggest using the DateTime attribute type. This would provide better semantics for the attribute and should avoid any number rendering problems (such as the ones being discussed right now on GitHub).
More information about the DateTime type can be found in the NGSIv2 specification (section "Special Attribute Types") and in this document (look for the "Datetime support" slide).
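For illustration, the update payload from the question would then look roughly like this, sending atime as an ISO 8601 string (the exact value shown here is just an example):
{
  "atime": {
    "type": "DateTime",
    "value": "2016-10-13T14:57:31.000Z"
  },
  "bundle": {
    "type": "Number",
    "value": 1
  }
}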
I was trying to create a custom object and its corresponding fields in Eloqua. While creating a field with the dataType largeText, it throws a validation error. I can create fields with dataTypes like date, text, numeric, etc. How can I create largeText fields?
This is my request body:
{
  "type": "CustomObject",
  "description": "TestObject",
  "name": "TestObject",
  "fields": [
    {
      "type": "CustomObjectField",
      "name": "Description",
      "dataType": "largeText",
      "displayType": "text"
    }
  ]
}
The response is [Status=Validation error, StatusCode=400].
You should use "displayType": "textArea" for creating largeText fields.
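That is, the request body from the question would stay the same except for the displayType value:
{
  "type": "CustomObject",
  "description": "TestObject",
  "name": "TestObject",
  "fields": [
    {
      "type": "CustomObjectField",
      "name": "Description",
      "dataType": "largeText",
      "displayType": "textArea"
    }
  ]
}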