Getting an error while loading data with DMS from MongoDB to Elasticsearch, any ideas?

I am trying to use AWS DMS to transfer data from MongoDB to Amazon Elasticsearch.
I am encountering the following log in CloudWatch:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
}
],
"type": "mapper_parsing_exception",
"reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
},
"status": 400
}
This is my configuration for the MongoDB source. It has the "_id as a separate column" check box enabled.
I tried disabling it, but then it says that there is no primary key.
Is there anything you know of that can fix this?
Quick note:
I have added a mapping of the _id field to old_id, and now it doesn't import any of the other fields, even when I add them to the mapping.

Since Elasticsearch does not support the LOB data type, the other fields are not migrated.
Add an additional transformation rule to change the data type to string (a complete table-mappings sketch follows the rule):
{
"rule-type": "transformation",
"rule-id": "3",
"rule-name": "3",
"rule-action": "change-data-type",
"rule-target": "column",
"object-locator": {
"schema-name": "test",
"table-name": "%",
"column-name": "%"
},
"data-type": {
"type": "string",
"length": "30"
}
}
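For reference, here is a minimal sketch of how that transformation rule could sit inside a complete table-mappings document next to a selection rule; the selection rule itself is an assumption and not part of the original answer, and the schema name simply reuses the "test" value from the rule above.
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "test",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "3",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "test",
        "table-name": "%",
        "column-name": "%"
      },
      "data-type": {
        "type": "string",
        "length": "30"
      }
    }
  ]
}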

Related

Kafka Connect JSON Schema does not appear to support "$ref" tags

I am using Kafka Connect with JSON Schema and am in a situation where I need to convert the JSON schema manually (to "Schema") within a Kafka Connect plugin. I can successfully retrieve the JSON Schema from the Schema Registry and can successfully convert simple JSON Schemas, but I am having difficulties with ones that are complex and have valid "$ref" tags referencing components within a single JSON Schema definition.
I have several questions:
The JsonConverter.java does not appear to handle "$ref". Am I correct, or does it handle it in another way elsewhere?
Does the Schema Registry handle the referencing of sub-definitions? If yes, is there code that shows how the dereferencing is handled?
Should the JSON Schema be resolved to a string without references (ie. inline the references) before submitting to the Schema Registry and thereby remove the "$ref" issue?
I am looking at the Kafka source code for JsonConverter.java below:
https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L428
An example of the complex schema (taken from the JSON Schema site) is shown below (notice the "$ref": "#/$defs/veggie" tag that references a later sub-definition):
{
"$id": "https://example.com/arrays.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "A representation of a person, company, organization, or place",
"title": "complex-schema",
"type": "object",
"properties": {
"fruits": {
"type": "array",
"items": {
"type": "string"
}
},
"vegetables": {
"type": "array",
"items": { "$ref": "#/$defs/veggie" }
}
},
"$defs": {
"veggie": {
"type": "object",
"required": [ "veggieName", "veggieLike" ],
"properties": {
"veggieName": {
"type": "string",
"description": "The name of the vegetable."
},
"veggieLike": {
"type": "boolean",
"description": "Do I like this vegetable?"
}
}
}
}
}
Below is the actual schema returned from the Schema Registry after the schema was successfully registered:
[
{
"subject": "complex-schema",
"version": 1,
"id": 1,
"schemaType": "JSON",
"schema": "{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}"
}
]
The actual schema is embedded in the above returned string (the contents of the "schema" field) and contains the $ref references:
{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}
Again, the JsonConverter in the Apache Kafka source code has no notion of JSON Schema, so no, $ref does not work there, and it also does not integrate with the Schema Registry.
You seem to be looking for the io.confluent.connect.json.JsonSchemaConverter class and its logic.
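As a rough sketch of how that converter is usually wired in (the connector name, connector class, topic, and registry URL below are placeholders, not taken from the question), the Confluent converter is configured on the worker or connector rather than swapped into JsonConverter itself:
# Hypothetical sink connector properties; the two value.converter lines are the relevant part.
name=my-sink-connector
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=my-topic
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://schema-registry:8081
The converter resolves the schema through the Schema Registry when converting, which JsonConverter on its own does not do.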

How to create mongodb 2dsphere index on strapi?

My Strapi project uses a Mongo database.
Strapi version: 3.0.0-beta.19.5
I have tried:
creating the 2dsphere index manually with a command in the mongo console, but when the application starts to run, the index gets deleted. I think the database somehow synchronizes with the Strapi model configuration.
I checked the Strapi documentation and saw that there is an option to create an index by adding a configuration to model.settings.json, but there is only a single-field index option.
Is there any way to create a 2dsphere index?
I just found a solution. I had to look at the index section of the Mongoose documentation.
The Strapi documentation only says that the value of 'index' is a boolean, which differs from the Mongoose docs. The model.settings.json structure actually follows the Mongoose documentation.
So, to create the 2dsphere index, we just need to specify "2dsphere" as the value of the "index" key on that field.
E.g.:
{
"kind": "collectionType",
"connection": "default",
"collectionName": "phone_stores",
"info": {
"name": "phoneStore"
},
"options": {
"increments": true,
"timestamps": true
},
"attributes": {
"car": {
"type": "integer",
"required": true
},
"userStoreId": {
"type": "objectId"
},
"location": {
"type": "json",
"index": "2dsphere" // <------ <1>
}
}
}
<1> If true were specified, a single-field index would be created on this field. But you can also specify other types of index; in my case I use '2dsphere'.
UPDATE
What I said about
"you can also specify other types of index"
is not correct. The index types are limited by the framework. So far I have tested 2dsphere, which works. I also tested a text index, but it did not work.
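As a quick sanity check (assuming the collection is named phone_stores, as in the example above), you can confirm from the mongo console that the index survives a Strapi restart:
// List all indexes on the collection; a 2dsphere index on "location"
// should appear alongside the default _id index.
db.phone_stores.getIndexes()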

Malformed date field to populate into new field in elasticsearch

I have created an index in Elasticsearch with multiple date fields and formatted the columns as yyyy-mm-dd HH:mm:ss. Eventually I found the dates were malformed and wrong data was being populated into the fields. The index has more than 600,000 records and I don't want to lose any data. Now I need to create another field or a new index with the same date fields formatted as YYYY-MM-ddTHH:mm:ss.Z, and populate all the records into the new index or new fields.
I have used the date processor pipeline below, but it fails. Correct me if anything is wrong here.
PUT _ingest/pipeline/date-malform
{
"description": "convert malformed date to timestamp",
"processors": [
{
"date": {
"field": "event_tm",
"target_field" : "event_tm",
"formats" : ["YYYY-MM-ddThh:mm:ss.Z"]
"timezone" : "UTC"
}
},
{
"date": {
"field": "vendor_start_dt",
"target_field" : "vendor_start_dt",
"formats" : ["YYYY-MM-ddThh:mm:ss.Z"]
"timezone" : "UTC"
}
},
{
"date": {
"field": "vendor_end_dt",
"target_field" : "vendor_end_dt",
"formats" : ["YYYY-MM-ddThh:mm:ss.Z"]
"timezone" : "UTC"
}
}
]
}
I have created the pipeline and used reindex as below
POST _reindex
{
"source": {
"index": "tog_gen_test"
},
"dest": {
"index": "data_mv",
"pipeline": "some_ingest_pipeline",
"version_type": "external"
}
}
I am getting the below error while running the reindex
"failures": [
{
"index": "data_mv",
"type": "_doc",
"id": "rwN64WgB936y_JOyjc57",
"cause": {
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Illegal pattern component: T"
}
}
You can use Logstash as Shailesh Pratapwar suggested, but you also have the option to use Elasticsearch reindex + ingest to do the same:
Create an ingest pipeline with the proper date processor in order to fix the date format/manipulation: https://www.elastic.co/guide/en/elasticsearch/reference/master/date-processor.html
Reindex the data from the old index to a new index, with the date manipulation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
Reindex can also use the Ingest Node feature by specifying a pipeline, as shown below.
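As a rough sketch of what a working pipeline might look like (assuming the stored values really are of the form 2019-02-12 10:29:35, as the error message suggests): the formats entry has to describe the input being parsed, a literal T would need to be quoted in the pattern, and the formats and timezone keys need a comma between them. Only the event_tm processor is shown here; the other two fields would follow the same pattern. The date processor writes the parsed value back as an ISO 8601 timestamp.
PUT _ingest/pipeline/date-malform
{
  "description": "convert malformed date to timestamp",
  "processors": [
    {
      "date": {
        "field": "event_tm",
        "target_field": "event_tm",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "timezone": "UTC"
      }
    }
  ]
}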
Use Logstash (see the sketch below):
Read from Elasticsearch using Logstash.
Manipulate the date format.
Write to Elasticsearch using Logstash.
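A minimal sketch of that Logstash pipeline, assuming a local cluster, the index names from the question, and only the event_tm field (the date pattern is inferred from the error message above rather than given in this answer):
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tog_gen_test"
  }
}
filter {
  # Parse the malformed value and overwrite the field with a proper timestamp
  date {
    match => ["event_tm", "yyyy-MM-dd HH:mm:ss"]
    target => "event_tm"
    timezone => "UTC"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "data_mv"
  }
}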

Does Mongo-connector supports adding fields before inserting to Elasticsearch?

I have many documents in MongoDB. Mongo-connector inserts that data into Elasticsearch. Is there a way to add an extra field to a document before it is inserted into ES? Is there any way in mongo-connector to do the above?
UPDATE
Based on your UPDATE 3, I created a mapping something like this; is it correct?
PUT my_index2
{
"mappings":{
"my_type2": {
"transform": {
"script": {
"inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
"lang": "groovy"
}
},
"properties": {
"geopoint": {
"type": "geo_point"
}
}
}
}
}
ERROR
This is the error I keep getting when I try to insert your mapping:
{
"error": {
"root_cause": [
{
"type": "script_parse_exception",
"reason": "Value must be of type String: [script]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
"caused_by": {
"type": "script_parse_exception",
"reason": "Value must be of type String: [script]"
}
},
"status": 400
}
UPDATE 2
Now the mapping is inserted and acknowledged as true. But when I try to insert the JSON data below, it throws an error.
PUT my_index2/my_type2/1
{
"geopoint": {
"lon": 48.845877,
"lat": 8.821861,
"alt": 0.0
}
}
ERROR FOR UPDATE 2
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "failed to execute script",
"caused_by": {
"type": "script_exception",
"reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
}
}
},
"status": 400
}
ERROR 1 FOR UPDATE 2
After adding script.inline: true, I tried to insert the data but got the following error.
{
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "field must be either [lat], [lon] or [geohash]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse",
"caused_by": {
"type": "parse_exception",
"reason": "field must be either [lat], [lon] or [geohash]"
}
},
"status": 400
}
mongo-connector aims at synchronizing a Mongo database with another target system, such as ES, Solr or another Mongo DB. Synchronizing means 1:1 replication, so there's no way that I know of for mongo-connector to enrich documents during the replication (and it's not its intent either).
However, in ES 5 we'll soon be able to use ingest nodes in which we'll be able to define processing pipelines whose goal is to enrich documents before they get indexed.
UPDATE
There's probably a way by modifying the formatters.py file.
In transform_value I would add a case to handle Geopoint:
if isinstance(value, dict):
    return self.format_document(value)
elif isinstance(value, list):
    return [self.transform_value(v) for v in value]
# handle Geopoint class
elif isinstance(value, Geopoint):
    return self.format_document({'lat': value['lat'], 'lon': value['lon']})
...
UPDATE 2
Let's try another approach by modifying the transform_element function (on line 104):
def transform_element(self, key, value):
    try:
        # add these next two lines
        if key == 'GeoPoint':
            value = {'lat': value['lat'], 'lon': value['lon']}
        # do not modify the initial code below
        new_value = self.transform_value(value)
        yield key, new_value
    except ValueError as e:
        LOG.warn("Invalid value for key: %s as %s"
                 % (key, str(e)))
UPDATE 3
Another thing you might try is to add a transform. The reason I've not mentioned it before is that it was deprecated in ES 2.0, but in ES 5.0 you'll have ingest nodes and you'll be able to take care of it at ingest time using a remove processor.
You can define your mapping like this:
PUT my_index2
{
"mappings": {
"my_type2": {
"transform": {
"script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
},
"properties": {
"geopoint": {
"type": "geo_point"
}
}
}
}
}
Note: make sure to enable dynamic scripting by adding script.inline: true to elasticsearch.yml and restarting your ES node.
What is going to happen is that the alt field will still be visible in the stored _source but it will not be indexed, and hence, no error should occur.
With ES 5, you'd simply create a pipeline with a remove processor, like this:
PUT _ingest/pipeline/geo-pipeline
{
"description" : "remove unsupported altitude field",
"processors" : [
{
"remove" : {
"field": "geopoint.alt"
}
}
]
}
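For completeness, here is a hedged example of how the pipeline would be applied at index time; the index, type, and document ID simply reuse the ones from the examples above.
PUT my_index2/my_type2/1?pipeline=geo-pipeline
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}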

Create large text field in Eloqua using REST API

I was trying to create a custom object and its corresponding fields in Eloqua. While creating a field with the data type largeText, it throws a validation error. I can create fields with data types like date, text, numeric, etc. How can I create largeText fields?
This is my request body:
{
"type": "CustomObject",
"description": "TestObject",
"name": "TestObject",
"fields": [
{
"type": "CustomObjectField",
"name": "Description",
"dataType": "largeText",
"displayType": "text"
}
]
}
Response is [Status=Validation error, StatusCode=400]
You should use "displayType": "textArea" for creating largeText fields.
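Applied to the request body from the question, that would look something like the sketch below (untested; only the displayType value changes):
{
  "type": "CustomObject",
  "description": "TestObject",
  "name": "TestObject",
  "fields": [
    {
      "type": "CustomObjectField",
      "name": "Description",
      "dataType": "largeText",
      "displayType": "textArea"
    }
  ]
}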