Error uploading documents that have null values for longitude and latitude (geo_point) in Elastic 7.15 - elastic-stack

I am using the "Upload File" option to upload csv file that has some values of longitude and latitude as null in Elastic 7.15. The Mappings and Ingest pipeline are as below
Mapping...
"Latitude": {
"type": "double"
},
"Longitude": {
"type": "double"
},
"UniqueID": {
"type": "keyword"
},
"Unit Number": {
"type": "long"
},
"User ID": {
"type": "long"
},
"location": {
"type": "geo_point"
}
...
Ingest pipeline
...
{
  "set": {
    "field": "location",
    "value": "{{Latitude}},{{Longitude}}"
  }
}
....
The location field is auto-added (combined fields).
When I import the CSV with these settings, I get an error that the documents with empty values could not be imported:
Error: 8: failed to parse field [location] of type [geo_point]
{"message":"1143266,1/4/2021,E BRECKENRIDGE 1000.0 FT E OF BERMUDA,,,1,"2,186,198""}
I would like to be able to import documents that have null values for coordinates while keeping the type as geo_point, since I am creating a map visualization. If I remove the Set processor on location, or add the condition "if": "ctx.latitude_field != null && ctx.longitude_field != null" to it, I can upload all the docs, but then the map visualization does not show any documents for the location field.
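For reference, a conditional Set along the lines below is a minimal sketch, assuming the CSV columns are ingested as Latitude and Longitude (matching the mapping above); the condition has to reference those actual field names, and if empty CSV cells arrive as empty strings rather than null, it would also need to check for ''. Documents without coordinates then simply get no location value, which a geo_point field accepts, while documents that do have coordinates should still show up on the map:
{
  "set": {
    "if": "ctx.Latitude != null && ctx.Longitude != null",
    "field": "location",
    "value": "{{Latitude}},{{Longitude}}"
  }
}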

I was able to bypass this issue by adding a new field Location (concat(lat,long)) in the CSV, updating the mapping, and removing the Set processor.

Related

What column type do I need for this nested data in BigQuery?

I have a JSON schema for a Kafka stream that I am integrating with BigQuery but I can't get the data type correct at the BigQuery end. This is the schema:
"my_meta_data": {
"type": "object",
"properties": {
"property_1": {
"type": "array",
"items": {
"type": "number"
}
},
"property_2": {
"type": "array",
"items": {
"type": "number"
}
},
"property_3": {
"type": "array",
"items": {
"type": "number"
}
}
}
}
I tried this in the JSON file defining the BigQuery table:
{
  "name": "my_meta_data",
  "type": "RECORD",
  "mode": "REPEATED",
  "fields": [
    {
      "name": "property_1",
      "type": "INT64",
      "mode": "REPEATED"
    },
    {
      "name": "property_2",
      "type": "INT64",
      "mode": "REPEATED"
    },
    {
      "name": "property_3",
      "type": "INT64",
      "mode": "REPEATED"
    }
  ]
}
I am using a hosted connector from Confluent, the Kafka provider, and the error message is
The connector is failing because it cannot write a non-array element to an array column. Please check the schemas of the data in Kafka and the BigQuery tables the connector is writing to, and ensure that all data from Kafka that will be written to an array column in BigQuery is contained in an array.
I haven't defined an array column though, I've defined a RECORD column that contains arrays. Any ideas how I can set up the BigQuery table to capture this data? Thanks in advance.
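One possible reading of the error, offered only as an unverified sketch: mode: REPEATED on the my_meta_data RECORD makes that column itself an array, while the incoming my_meta_data is a single object, and the JSON Schema type number maps more naturally to FLOAT64 than INT64. A definition along these lines might be closer:
{
  "name": "my_meta_data",
  "type": "RECORD",
  "mode": "NULLABLE",
  "fields": [
    {
      "name": "property_1",
      "type": "FLOAT64",
      "mode": "REPEATED"
    },
    {
      "name": "property_2",
      "type": "FLOAT64",
      "mode": "REPEATED"
    },
    {
      "name": "property_3",
      "type": "FLOAT64",
      "mode": "REPEATED"
    }
  ]
}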

Malformed date field to populate into new field in elasticsearch

I have created an index in Elasticsearch with multiple date fields and formatted the columns as yyyy-mm-dd HH:mm:ss. Eventually I found the dates were malformed and wrong data was being populated into the fields. The index has more than 600,000 records and I don't want to lose any data. Now I need to create another field, or a new index with the same date fields, using the format YYYY-MM-ddTHH:mm:ss.Z, and populate all the records into the new index or new fields.
I have used the date processor pipeline as below, but it fails. Correct me if anything is wrong here.
PUT _ingest/pipeline/date-malform
{
  "description": "convert malformed date to timestamp",
  "processors": [
    {
      "date": {
        "field": "event_tm",
        "target_field" : "event_tm",
        "formats" : ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone" : "UTC"
      }
    },
    {
      "date": {
        "field": "vendor_start_dt",
        "target_field" : "vendor_start_dt",
        "formats" : ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone" : "UTC"
      }
    },
    {
      "date": {
        "field": "vendor_end_dt",
        "target_field" : "vendor_end_dt",
        "formats" : ["YYYY-MM-ddThh:mm:ss.Z"],
        "timezone" : "UTC"
      }
    }
  ]
}
I have created the pipeline and used reindex as below
POST _reindex
{
  "source": {
    "index": "tog_gen_test"
  },
  "dest": {
    "index": "data_mv",
    "pipeline": "some_ingest_pipeline",
    "version_type": "external"
  }
}
I am getting the below error while running the reindex
"failures": [
{
"index": "data_mv",
"type": "_doc",
"id": "rwN64WgB936y_JOyjc57",
"cause": {
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "java.lang.IllegalArgumentException: unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "unable to parse date [2019-02-12 10:29:35]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Illegal pattern component: T"
}
}
You can use Logstash as Shailesh Pratapwar suggested, but you also have the option to use Elasticsearch reindex + ingest to do the same:
Create an ingest pipeline with the proper date processor in order to fix the date format/manipulation (a sketch follows below): https://www.elastic.co/guide/en/elasticsearch/reference/master/date-processor.html
Reindex the data from the old index to a new index, applying the date manipulation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
Reindex can also use the Ingest Node feature by specifying a pipeline.
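As a sketch (only event_tm shown; vendor_start_dt and vendor_end_dt would follow the same pattern), a date processor matching the values in the error above ([2019-02-12 10:29:35]) could look like this. The unquoted T in the original pattern is what triggers "Illegal pattern component: T", and the date processor writes ISO 8601 output into the target field by default. Note also that the reindex call should reference this pipeline name (date-malform) rather than some_ingest_pipeline, unless that was a placeholder:
PUT _ingest/pipeline/date-malform
{
  "description": "convert yyyy-MM-dd HH:mm:ss dates to ISO 8601",
  "processors": [
    {
      "date": {
        "field": "event_tm",
        "target_field": "event_tm",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "timezone": "UTC"
      }
    }
  ]
}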
Use Logstash.
Read from Elasticsearch using Logstash.
Manipulate the date format.
Write to Elasticsearch using Logstash.

Using ADF Copy Activity with dynamic schema mapping

I'm trying to drive the columnMapping property from a database configuration table. My first activity in the pipeline pulls in the rows from the config table. My copy activity source is a JSON file in Azure blob storage and my sink is an Azure SQL database.
In copy activity I'm setting the mapping using the dynamic content window. The code looks like this:
"translator": {
"value": "#json(activity('Lookup1').output.value[0].ColumnMapping)",
"type": "Expression"
}
My question is, what should the value of activity('Lookup1').output.value[0].ColumnMapping look like?
I've tried several different json formats but the copy activity always seems to ignore it.
For example, I've tried:
{
  "type": "TabularTranslator",
  "columnMappings": {
    "view.url": "url"
  }
}
and:
"columnMappings": {
  "view.url": "url"
}
and:
{
  "view.url": "url"
}
In this example, view.url is the name of the column in the JSON source, and url is the name of the column in my destination table in Azure SQL database.
The issue is due to the dot (.) sign in your column name.
To use column mapping, you should also specify the structure in your source and sink datasets.
For your source dataset, you need to specify the format correctly. And since your column name has a dot, you need to specify the JSON path, for example as sketched below.
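As a rough, unverified sketch (assuming the older JsonFormat-style dataset this answer appears to describe), an explicit JSON path for the dotted column could be declared like this; use $['view.url'] instead of $.view.url if the dot is part of a literal field name rather than nesting:
"structure": [
  {
    "name": "view.url",
    "type": "String"
  }
],
"typeProperties": {
  "format": {
    "type": "JsonFormat",
    "jsonPathDefinition": {
      "view.url": "$.view.url"
    }
  }
}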
You could use ADF UI to setup a copy for a single file first to get the related format, structure and column mapping format. Then change it to lookup.
And to my understanding, your first format should be the right one. If it is already JSON, then you may not need to use the "json" function in your expression.
There seems to be a disconnect between the question and the answer, so I'll hopefully provide a more straightforward answer.
When setting this up, you should have a source dataset with dynamic mapping. The sink doesn't require one, as we're going to specify it in the mapping.
Within the copy activity, format the dynamic json like the following:
{
  "structure": [
    {
      "name": "Address Number"
    },
    {
      "name": "Payment ID"
    },
    {
      "name": "Document Number"
    },
    ...
    ...
  ]
}
You would then specify your dynamic mapping like this:
{
  "translator": {
    "type": "TabularTranslator",
    "mappings": [
      {
        "source": {
          "name": "Address Number",
          "type": "Int32"
        },
        "sink": {
          "name": "address_number"
        }
      },
      {
        "source": {
          "name": "Payment ID",
          "type": "Int64"
        },
        "sink": {
          "name": "payment_id"
        }
      },
      {
        "source": {
          "name": "Document Number",
          "type": "Int32"
        },
        "sink": {
          "name": "document_number"
        }
      },
      ...
      ...
    ]
  }
}
Assuming these were set in separate variables, you would want to send the source as a string, and the mapping as json:
source: #string(json(variables('str_dyn_structure')).structure)
mapping: #json(variables('str_dyn_translator')).translator
VladDrak - You could skip the source dynamic definition by building dynamic mapping like this:
{
  "translator": {
    "type": "TabularTranslator",
    "mappings": [
      {
        "source": {
          "type": "String",
          "ordinal": "1"
        },
        "sink": {
          "name": "dateOfActivity",
          "type": "String"
        }
      },
      {
        "source": {
          "type": "String",
          "ordinal": "2"
        },
        "sink": {
          "name": "CampaignID",
          "type": "String"
        }
      }
    ]
  }
}

Metabase: Date Range filter to dashboards created by native query

I use Metabase for data visualization.
Druid (imply-2.2.3) is the data storage.
I put the Metabase questions I created on the dashboard and filter them all by Date Range.
When I try to add the Date Range filter to a question created with a native query, Metabase just can't find a timestamp field for filtering and says 'No valid fields'.
The same problem occurs even if I convert a question created with the UI constructor into a native query using the "View native query" option.
Generated Native query to Druid:
{
  "intervals": ["1900-01-01/2100-01-01"],
  "granularity": {
    "type": "period",
    "period": "P1D",
    "timeZone": "UTC"
  },
  "context": {
    "timeout": 60000
  },
  "queryType": "timeseries",
  "dataSource": "DotmailerCampaign",
  "aggregations": [{
    "type": "javascript",
    "name": "sum",
    "fieldNames": ["numOpens"],
    "fnReset": "function() { return 0 ; }",
    "fnAggregate": "function(current, x) { return current + (parseFloat(x) || 0); }",
    "fnCombine": "function(x, y) { return x + y; }"
  }],
  "descending": false
}
What am I missing?

Does Mongo-connector support adding fields before inserting to Elasticsearch?

I have many documents in MongoDB. Mongo-connector inserts that data into Elasticsearch. Is there a way to add an extra field to a document before it is inserted into ES? Is there any way in mongo-connector to do the above?
UPDATE
Based on your UPDATE 3, I created a mapping something like this. Is it correct?
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": {
          "inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
          "lang": "groovy"
        }
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
ERROR
This is the error I keep getting when I try to insert your mapping:
{
  "error": {
    "root_cause": [
      {
        "type": "script_parse_exception",
        "reason": "Value must be of type String: [script]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
    "caused_by": {
      "type": "script_parse_exception",
      "reason": "Value must be of type String: [script]"
    }
  },
  "status": 400
}
UPDATE 2
Now the mapping gets inserted and I get acknowledged: true. But when I try to insert the JSON data like below, it throws an error.
PUT my_index2/my_type2/1
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}
ERROR FOR UPDATE 2
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
        "type": "script_exception",
        "reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
      }
    }
  },
  "status": 400
}
ERROR 1 FOR UPDATE 2
After adding script.inline: true, I tried to insert the data but am getting the following error.
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "field must be either [lat], [lon] or [geohash]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "parse_exception",
      "reason": "field must be either [lat], [lon] or [geohash]"
    }
  },
  "status": 400
}
mongo-connector aims at synchronizing a Mongo database with another target system, such as ES, Solr or another Mongo DB. Synchronizing means 1:1 replication, so there's no way that I know of for mongo-connector to enrich documents during the replication (and it's not its intent either).
However, in ES 5 we'll soon be able to use ingest nodes in which we'll be able to define processing pipelines whose goal is to enrich documents before they get indexed.
UPDATE
There's probably a way by modifying the formatters.py file.
In transform_value I would add a case to handle Geopoint:
if isinstance(value, dict):
    return self.format_document(value)
elif isinstance(value, list):
    return [self.transform_value(v) for v in value]
# handle Geopoint class
elif isinstance(value, Geopoint):
    return self.format_document({'lat': value['lat'], 'lon': value['lon']})
...
UPDATE 2
Let's try another approach by modifying the transform_element function (on line 104):
def transform_element(self, key, value):
    try:
        # add these next two lines
        if key == 'GeoPoint':
            value = {'lat': value['lat'], 'lon': value['lon']}
        # do not modify the initial code below
        new_value = self.transform_value(value)
        yield key, new_value
    except ValueError as e:
        LOG.warn("Invalid value for key: %s as %s"
                 % (key, str(e)))
UPDATE 3
Another thing you might try is to add a transform. The reason I've not mentioned it before is that it was deprecated in ES 2.0, but in ES 5.0 you'll have ingest nodes and you'll be able to take care of it at ingest time using a remove processor
You can define your mapping like this:
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
Note: make sure to enable dynamic scripting by adding script.inline: true to elasticsearch.yml and restarting your ES node.
What is going to happen is that the alt field will still be visible in the stored _source but it will not be indexed, and hence, no error should occur.
With ES 5, you'd simply create a pipeline with a remove processor, like this:
PUT _ingest/pipeline/geo-pipeline
{
  "description" : "remove unsupported altitude field",
  "processors" : [
    {
      "remove" : {
        "field": "geopoint.alt"
      }
    }
  ]
}
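As a quick check (a sketch reusing the sample document from UPDATE 2 above), the pipeline can be exercised with the simulate API before pointing mongo-connector at the index; the response should show the document without geopoint.alt:
POST _ingest/pipeline/geo-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "geopoint": {
          "lon": 48.845877,
          "lat": 8.821861,
          "alt": 0.0
        }
      }
    }
  ]
}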