Elasticsearch date parsing error while creating an index with a date mapping

I create an Elasticsearch 7.7 index (with mappings) in a CDK script. Here is my mapping for the date field:
{
  "mappings": {
    "numeric_detection": true,
    "properties": {
      "approximateArrivalTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSXXX"
      }, ......}
But I keep getting this error message:
{"type":"illegal_argument_exception","reason":"failed to parse date field [2020-11-23 11:48:20.472] with format [yyyy-MM-dd HH:mm:ss.SSSXXX]","caused_by":{"type":"date_time_parse_exception","reason":"Text '2020-11-23 11:48:20.472' could not be parsed at index 23"}}}
What could be the reason?

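The error indicates that the value you are indexing in approximateArrivalTime does not match the date format specified in the index mapping. The XXX part of the pattern expects a zone offset (for example +01:00 or Z) right after the milliseconds, but 2020-11-23 11:48:20.472 ends at index 23, exactly where that offset should start. Either remove XXX from the format or include an offset in the data.
For example, use this mapping and document: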
Index Mapping:
{
  "mappings": {
    "properties": {
      "approximateArrivalTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS"
      }
    }
  }
}
Index Data:
{
  "approximateArrivalTime": "2020-11-23 11:48:20.472"
}
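To see why the parser stops at index 23, here is a rough JavaScript sketch that approximates both patterns with regular expressions. The regexes are an assumption for illustration only; Elasticsearch parses dates with java.time patterns, not with these expressions. A value without an offset satisfies only the pattern without XXX, while adding an offset such as +00:00 or Z satisfies the original pattern, so either fix (drop XXX from the mapping, or send an offset in the data) should work.

```javascript
// Illustration only: rough regex stand-ins for two java.time date patterns.
// Elasticsearch does not use these regexes; they only mirror the expected shape.
const PATTERNS = {
  "yyyy-MM-dd HH:mm:ss.SSSXXX": /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}(Z|[+-]\d{2}:\d{2})$/,
  "yyyy-MM-dd HH:mm:ss.SSS": /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$/,
};

// Returns the names of the patterns that the whole value matches.
function matchingFormats(value) {
  return Object.keys(PATTERNS).filter((name) => PATTERNS[name].test(value));
}

// The XXX offset would start right after the 23-character date-time prefix,
// which is the "index 23" reported in the error message.
const offsetIndex = "yyyy-MM-dd HH:mm:ss.SSS".length;

console.log(offsetIndex); // 23
console.log(matchingFormats("2020-11-23 11:48:20.472"));       // [ 'yyyy-MM-dd HH:mm:ss.SSS' ]
console.log(matchingFormats("2020-11-23 11:48:20.472+00:00")); // [ 'yyyy-MM-dd HH:mm:ss.SSSXXX' ]
console.log(matchingFormats("2020-11-23 11:48:20.472Z"));      // [ 'yyyy-MM-dd HH:mm:ss.SSSXXX' ]
```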


Some documents do not appear in Atlas Search when querying by a few letters

I have a collection with the following document structure:
{
  model: {
    name: 'string name'
  }
}
I have enabled Atlas Search and created a search index for the model.name field. Search works fine; the only issue is that I get no results when the query is only a few letters.
For example, I have this document:
{
  model: {
    name: "space1duplicate"
  }
}
If I query space, I don't get the result:
{
  index: 'search_index',
  compound: {
    must: [
      {
        text: {
          query: 'space',
          path: 'model.name'
        }
      }
    ]
  }
}
But if I query space1duplica, it returns the result.
During indexing, the full-text search engine tokenizes the input by splitting the text into searchable chunks (tokens). Check out the relevant section in the documentation.
By default, Atlas Search does not split words on digits. If you need that, try defining a custom analyzer with the regexSplit tokenizer and use it for your field:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "analyzer": "digitSplitter",
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "digitSplitter",
      "tokenFilters": [],
      "tokenizer": {
        "pattern": "[0-9]+",
        "type": "regexSplit"
      }
    }
  ]
}
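The effect of the digitSplitter analyzer can be approximated in JavaScript. This is a hedged sketch, not Atlas Search's implementation: wordTokens is a hypothetical stand-in for a word-boundary tokenizer that keeps letters and digits together, and regexSplitTokens splits on the same [0-9]+ pattern as the regexSplit tokenizer above. A text query for space can only match when space is one of the indexed tokens.

```javascript
// Hypothetical stand-in for a word-boundary tokenizer: splits on anything that is
// not a letter or digit and lowercases, so "space1duplicate" stays a single token.
function wordTokens(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Splits on runs of digits, like the "[0-9]+" regexSplit tokenizer above.
// No lowercasing here, because the digitSplitter analyzer has no token filters.
function regexSplitTokens(text, pattern = /[0-9]+/) {
  return text.split(pattern).filter(Boolean);
}

const modelName = "space1duplicate";
console.log(wordTokens(modelName));       // [ 'space1duplicate' ]
console.log(regexSplitTokens(modelName)); // [ 'space', 'duplicate' ]

// Whether the query term "space" is among the indexed tokens:
console.log(wordTokens(modelName).includes("space"));       // false
console.log(regexSplitTokens(modelName).includes("space")); // true
```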
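Also note that you can use multiple analyzers for string fields, if needed. This definition indexes a top-level name field; for the nested model.name field from the question, the name definition would go inside a document-type model field ("model": {"type": "document", "fields": {"name": [...]}}).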
Atlas Search is built on Apache Lucene. The documentation on the MongoDB site focuses mostly on the MongoDB-specific syntax for passing queries to Lucene, which can be a bit confusing if you are not familiar with Lucene's query model.
First of all, there are a number of tokenizers and analyzers available, each serving a specific purpose. You really need to include the index definition when you ask questions about Atlas Search.
The default analyzer (lucene.standard) splits text on word boundaries and lowercases the tokens; language analyzers such as lucene.english additionally reduce words to their stems.
So, to find "space1duplicate" by the beginning of the word, you can use the autocomplete field type with nGram tokenization. The index can be created as follows:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "tokenization": "nGram",
        "type": "autocomplete"
      }
    }
  },
  "storedSource": {
    "include": [
      "name"
    ]
  }
}
Once it is indexed (you may need to wait a bit if you have a larger dataset), you can find the document with the following search:
{
  index: 'search_index',
  compound: {
    must: [
      {
        autocomplete: {
          query: 'spa',
          path: 'name'
        }
      }
    ]
  }
}
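The following JavaScript sketch shows roughly why nGram tokenization lets the partial query spa match. nGrams and edgeGrams are hypothetical helpers, and the minGrams/maxGrams values of 2 and 15 are assumed defaults; set them explicitly in the index definition if they matter. For comparison it also shows edgeGram, another tokenization option of the autocomplete type, which indexes only prefixes of the token.

```javascript
// Every substring of length minGrams..maxGrams (roughly what nGram indexes).
// The default of 2 and 15 is an assumption; check your index definition.
function nGrams(token, minGrams = 2, maxGrams = 15) {
  const grams = new Set();
  for (let start = 0; start < token.length; start++) {
    for (let len = minGrams; len <= maxGrams && start + len <= token.length; len++) {
      grams.add(token.slice(start, start + len));
    }
  }
  return grams;
}

// Only prefixes of length minGrams..maxGrams (roughly what edgeGram indexes).
function edgeGrams(token, minGrams = 2, maxGrams = 15) {
  const grams = new Set();
  for (let len = minGrams; len <= Math.min(maxGrams, token.length); len++) {
    grams.add(token.slice(0, len));
  }
  return grams;
}

const token = "space1duplicate"; // 15 characters
console.log(nGrams(token).has("spa"));    // true
console.log(edgeGrams(token).has("spa")); // true
console.log(nGrams(token).has("dup"));    // true  (substring in the middle)
console.log(edgeGrams(token).has("dup")); // false (edgeGram keeps only prefixes)
```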

PUT mapping error in Elasticsearch with the REST client: Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters

As you can see in the attachment, I want to create a mapping called movie, but I get the following error:
Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters:
{
  "mappings": {
    "movie": {
      "properties": {
        "year": {
          "type": "date"
        }
      }
    }
  }
}
This is on Elasticsearch 7.8.
You are trying to create a mapping using a type, in your case movie, but since version 7.0 the mappings are typeless and you can't create mappings using a type anymore.
You should use the following mapping.
{
  "mappings": {
    "properties": {
      "year": {
        "type": "date"
      }
    }
  }
}
This creates a mapping for the field year with the type date.
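If you have older typed mapping bodies, the change amounts to dropping the type level. Here is a small JavaScript sketch of that transformation; toTypeless is a hypothetical helper, and it assumes the old body has exactly one type, which is all a 6.x index allows.

```javascript
// Hypothetical helper: turn a typed body ({ mappings: { <type>: {...} } })
// into the typeless 7.x form ({ mappings: { properties: {...} } }).
function toTypeless(body) {
  const mappings = body.mappings || {};
  // Treat a body that already has top-level "properties" as typeless.
  if ("properties" in mappings) return body;
  const typeNames = Object.keys(mappings);
  if (typeNames.length !== 1) {
    throw new Error(`expected exactly one mapping type, got ${typeNames.length}`);
  }
  // Keep any other top-level keys (e.g. settings) and unwrap the single type.
  return { ...body, mappings: mappings[typeNames[0]] };
}

const typed = { mappings: { movie: { properties: { year: { type: "date" } } } } };
console.log(JSON.stringify(toTypeless(typed)));
// {"mappings":{"properties":{"year":{"type":"date"}}}}
```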

MongoDB timestamp to date

I am pretty new to MongoDB. I have a collection (or JSON file) like the one below:
{
  "id": {
    "timestamp": 1592538583,
    "machineIdentifier": 1772242,
    "processIdentifier": -7129,
    "counter": 2887223,
    "timeSecond": 1592538583,
    "time": 1592538583000,
    "date": 1592538583000
  },
  "creationTimestamp": 1592538583524,
  "lastUpdateTimestamp": 1592538642832,
  "idAsString": "5eec35d71b0ad2e4272c0e37"
}
I need to extract records based on the timestamp. When I use the format below, it works, but I need a human-readable format, like lastUpdateTimestamp greater than "2020-06-30T00:00:00Z". I tried many ways but keep getting BSON format errors. Any suggestions?
{ "lastUpdateTimestamp": { $gt : new Date(1592282308044) }}
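If lastUpdateTimestamp is stored as a number (milliseconds since the Unix epoch, as in the JSON above), compare it against a number: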
{ "lastUpdateTimestamp": { $gt : 1592282308044 }}
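To write the filter with a human-readable date, convert the ISO-8601 string to epoch milliseconds first. A minimal JavaScript sketch, assuming lastUpdateTimestamp holds milliseconds since the epoch as in the sample document; epochMillis is a hypothetical helper, and in the mongo shell new Date("2020-06-30T00:00:00Z").getTime() gives the same number inline.

```javascript
// Convert an ISO-8601 instant to epoch milliseconds, the unit used by
// lastUpdateTimestamp in the sample document.
function epochMillis(iso) {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`Unparseable date: ${iso}`);
  return ms;
}

// Filter document for find(); the comparison value is a plain number.
const filter = { lastUpdateTimestamp: { $gt: epochMillis("2020-06-30T00:00:00Z") } };
console.log(JSON.stringify(filter)); // {"lastUpdateTimestamp":{"$gt":1593475200000}}

// Local check against the sample document (no database involved).
const sample = { creationTimestamp: 1592538583524, lastUpdateTimestamp: 1592538642832 };
console.log(new Date(sample.lastUpdateTimestamp).toISOString()); // 2020-06-19T03:50:42.832Z
console.log(sample.lastUpdateTimestamp > filter.lastUpdateTimestamp.$gt); // false
```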

Elasticsearch Java high-level client: group by and max

I am using Scala 2.12 and Elasticsearch 6.5, with the Java high-level client to query ES.
For example, the same data is published twice, as two documents with different ids and timestamps: id_123 with timestamp 10 AM and id_234 with timestamp 11 AM (the timestamps are for illustration only).
I only need the latest of these documents, i.e. the 11 AM one.
I have some filter conditions and then need to group on field1 and take the max of field2 (which is timestamp).
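import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.RequestOptions
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.aggregations.AggregationBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// `client` is an already-configured RestHighLevelClient
val searchRequest = new SearchRequest("index_name")
val searchSourceBuilder = new SearchSourceBuilder()

// Both match clauses must hold, plus at least one of the two regexps
val qb = QueryBuilders.boolQuery()
  .must(QueryBuilders.matchQuery("myfield.date", "2019-07-02"))
  .must(QueryBuilders.matchQuery("myfield.data", "1111"))
  .must(QueryBuilders.boolQuery()
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex1"))
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex2"))
  )

// Group by field1.Id and take the max of field1.timeStamp in each bucket
val myAgg = AggregationBuilders.terms("group_by_Id").field("field1.Id")
  .subAggregation(AggregationBuilders.max("timestamp").field("field1.timeStamp"))

searchSourceBuilder.query(qb)
searchSourceBuilder.aggregation(myAgg)
searchSourceBuilder.size(1000)
searchRequest.source(searchSourceBuilder)
val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)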
Everything works if I do not use the aggregation.
When I use the aggregation, I am getting the following error:
ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Expected numeric type on field [field1.timeStamp], but got [keyword]]]
So what am I missing here?
I am basically looking for an SQL-like query that filters (WHERE with AND/OR clauses), then groups by a field (Id) and keeps only the documents whose timeStamp is the maximum.
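The intended result can be sketched in memory to pin down the semantics. This JavaScript sketch is hypothetical (field names and sample documents are made up): it groups by Id, keeps the document with the numerically largest timeStamp, and shows why values stored as keyword strings cannot simply be compared as strings. In Elasticsearch itself, a terms aggregation with a top_hits sub-aggregation sorted by the timestamp is one common way to express this, and chronological sorting needs a numeric or date field.

```javascript
// Group docs by groupField and keep, per group, the doc with the largest timeField.
// timeField values are strings (as in a keyword field), so they are parsed with Number().
function latestPerGroup(docs, groupField, timeField) {
  const latest = new Map();
  for (const doc of docs) {
    const t = Number(doc[timeField]);
    if (Number.isNaN(t)) continue; // skip values that are not numeric strings
    const current = latest.get(doc[groupField]);
    if (!current || t > Number(current[timeField])) latest.set(doc[groupField], doc);
  }
  return [...latest.values()];
}

// Hypothetical sample documents.
const docs = [
  { id: "id_123", Id: "A", timeStamp: "900" },
  { id: "id_234", Id: "A", timeStamp: "1100" },
  { id: "id_345", Id: "B", timeStamp: "1000" },
];

console.log(latestPerGroup(docs, "Id", "timeStamp").map((d) => d.id)); // [ 'id_234', 'id_345' ]

// Why string order is not enough: compared as strings, "900" sorts after "1100".
console.log("900" > "1100"); // true
```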
UPDATE:
I tried the above query with cURL from the command prompt and get the same error when using "max" in the aggregation.
{
  "query": {
    "bool": {
      "must": [
        {
          "match": { "myfield.date" : "2019-07-02" }
        },
        {
          "match": { "myfield.data" : "1111" }
        },
        {
          "bool": {
            "should": [
              {
                "regexp": { "myOtherFieldId": "myregex1" }
              },
              {
                "regexp": { "myOtherFieldId": "myregex2" }
              }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME" : {
      "terms": {
        "field": "field1.Id"
      },
      "aggs": {
        "NAME": {
          "max" : {
            "field": "field1.timeStamp"
          }
        }
      }
    }
  },
  "size": "10000"
}
I am getting the same error.
I checked the mappings of the index.
The field is mapped as keyword. So how can I compute max on such fields?
Adding the relevant mappings:
{
  "index_name": {
    "mappings": {
      "data": {
        "dynamic_templates": [
          { "boolean_as_keyword": { "match": "*", "match_mapping_type": "boolean", "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "double_as_keyword": { "match": "*", "match_mapping_type": "double", "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "long_as_keyword": { "match": "*", "match_mapping_type": "long", "mapping": { "ignore_above": 256, "type": "keyword" } } },
          { "string_as_keyword": { "match": "*", "match_mapping_type": "string", "mapping": { "ignore_above": 256, "type": "keyword" } } }
        ],
        "date_detection": false,
        "properties": {
          "header": {
            "properties": {
              "Id": { "type": "keyword", "ignore_above": 256 },
              "otherId": { "type": "keyword", "ignore_above": 256 },
              "someKey": { "type": "keyword", "ignore_above": 256 },
              "dataType": { "type": "keyword", "ignore_above": 256 },
              "processing": { "type": "keyword", "ignore_above": 256 },
              "otherKey": { "type": "keyword", "ignore_above": 256 },
              "sender": { "type": "keyword", "ignore_above": 256 },
              "receiver": { "type": "keyword", "ignore_above": 256 },
              "system": { "type": "keyword", "ignore_above": 256 },
              "timeStamp": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}
UPDATE2:
I think I need to aggregate timeStamp as a keyword.
Please note that timeStamp is a subfield, i.e. it is under field1. So the keyword syntax below doesn't seem to work, or I am missing something else.
"aggs": {
  "NAME" : {
    "terms": {
      "field": "field1.Id"
    },
    "aggs": {
      "NAME": {
        "max" : {
          "field": "field1.timeStamp.keyword"
        }
      }
    }
  }
}
Now it fails with:
"Invalid aggregator order path [field1.timeStamp]. Unknown aggregation [field1]"
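One way to check which field paths exist is to walk the mapping. This JavaScript sketch uses a hypothetical fieldType helper and an excerpt of the mapping from the first update, where the object is called header rather than field1. It shows that timeStamp is itself a keyword field with no keyword sub-field, so a header.timeStamp.keyword path does not resolve to anything. The dynamic templates in that mapping also map long and double values to keyword, which is presumably how timeStamp ended up as keyword.

```javascript
// Hypothetical helper: resolve a dotted path against a mapping's "properties",
// following both object sub-properties and multi-field "fields".
function fieldType(properties, path) {
  let node = { properties };
  for (const part of path.split(".")) {
    const children = Object.assign({}, node.properties, node.fields);
    if (!Object.prototype.hasOwnProperty.call(children, part)) return undefined;
    node = children[part];
  }
  // Object fields have "properties" but usually no explicit "type".
  return node.type || "object";
}

// Excerpt of the "properties" section of the mapping shown in the first update.
const properties = {
  header: {
    properties: {
      Id: { type: "keyword", ignore_above: 256 },
      timeStamp: { type: "keyword", ignore_above: 256 },
    },
  },
};

console.log(fieldType(properties, "header.timeStamp"));         // keyword
console.log(fieldType(properties, "header.timeStamp.keyword")); // undefined
console.log(fieldType(properties, "header"));                   // object
```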

RemoteTransportException: Fielddata is disabled on text fields when aggregating on a text field

I am migrating from Elasticsearch 2.x to 5.x.
I am adding values to the index like this:
indexInto (indexName / indexType) id someKey source foo
However, I also want to fetch all values of a field:
def getValues(tag: String) = {
  client execute {
    search(indexName / indexType) query ("_field_names", tag) aggregations (termsAggregation("agg") field tag size 1)
  }
}
But I get this exception:
RemoteTransportException[[8vWOLB2][172.17.0.5:9300][indices:data/read/search[phase/query]]];
nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [my_tag] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.];
I thought of using keyword as shown here, but the fields are not known in advance (they are sent by the user), so I cannot use predefined mappings.
By default, string fields that are not specified in the mappings are added to Elasticsearch dynamically as text fields.
If you look at the mapping of such a field, you can see that it also gets a sub-field named keyword, of type keyword, which is indexed but not analyzed.
GET new_index2/_mappings
{
  "new_index2": {
    "mappings": {
      "type": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
So you can use the keyword sub-field of a text field for aggregations, like this:
POST new_index2/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      }
    }
  }
}
Note the field path: name.keyword, not name.
So your Scala query can work if you aggregate on the keyword sub-field instead.
def getValues(tag: String) = {
  client.execute {
    search(indexName / indexType)
      .query("_field_names", tag)
      // aggregate on the keyword sub-field that dynamic mapping adds to text fields
      .aggregations {
        termsAgg("agg", s"$tag.keyword")
      }.size(1)
  }
}
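Because the field names arrive at runtime, the request body can be built from the name. A JavaScript sketch with a hypothetical termsAggBody helper; it assumes each field was created by dynamic mapping as text with a keyword sub-field. Numeric values would be mapped as numbers without that sub-field, and strings longer than ignore_above (256 here) are not indexed in the keyword sub-field, so they will not show up in the buckets. The sketch uses size 0 because only the buckets are needed, not the hits.

```javascript
// Hypothetical helper: terms-aggregation body for a field name known only at runtime.
// Assumes the field is dynamically mapped text with a "keyword" sub-field.
function termsAggBody(field, size = 10) {
  return {
    size: 0, // return only aggregation buckets, no hits
    aggs: {
      values: {
        terms: { field: `${field}.keyword`, size },
      },
    },
  };
}

console.log(JSON.stringify(termsAggBody("my_tag")));
// {"size":0,"aggs":{"values":{"terms":{"field":"my_tag.keyword","size":10}}}}
```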
Hope this helps.
Thanks