I am using Scala 2.12 and Elasticsearch 6.5, with the high-level Java REST client to query ES.
As a simple example, the required documents exist in two versions (published twice) with different ids and timestamps: id_123 with timestamp 10 AM and id_234 with timestamp 11 AM (times are representational only).
I need only the latest of these, i.e. the 11 AM document.
I have some filter conditions and then need to group on field1 and take the max of field2 (which is timestamp).
val searchRequest = new SearchRequest("index_name")
val searchSourceBuilder = new SearchSourceBuilder()

val qb = QueryBuilders.boolQuery()
  .must(QueryBuilders.matchQuery("myfield.date", "2019-07-02"))
  .must(QueryBuilders.matchQuery("myfield.data", "1111"))
  .must(QueryBuilders.boolQuery()
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex1"))
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex2"))
  )

val myAgg = AggregationBuilders.terms("group_by_Id").field("field1.Id")
  .subAggregation(AggregationBuilders.max("timestamp").field("field1.timeStamp"))

searchSourceBuilder.query(qb)
searchSourceBuilder.aggregation(myAgg)
searchSourceBuilder.size(1000)
searchRequest.source(searchSourceBuilder)

val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)
Basically, everything works fine if I do not use the aggregation. When I use it, I get the following error:
ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Expected numeric type on field [field1.timeStamp], but got [keyword]]]
So what am I missing here?
I am basically looking for a SQL-like query that filters (WHERE with AND/OR clauses), then groups by a field (Id) and keeps only the documents where timeStamp is the max.
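In pseudo-SQL terms, what I want is roughly this (illustrative only; the names mirror the document fields):

SELECT field1.Id, MAX(field1.timeStamp)
FROM index_name
WHERE myfield.date = '2019-07-02'
  AND myfield.data = '1111'
  AND (myOtherFieldId REGEXP 'myregex1' OR myOtherFieldId REGEXP 'myregex2')
GROUP BY field1.Id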
UPDATE:
I tried the same query via cURL from the command prompt and get the same error when using "max" in the aggregation:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "myfield.date": "2019-07-02" } },
        { "match": { "myfield.data": "1111" } },
        {
          "bool": {
            "should": [
              { "regexp": { "myOtherFieldId": "myregex1" } },
              { "regexp": { "myOtherFieldId": "myregex2" } }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field1.Id"
      },
      "aggs": {
        "NAME": {
          "max": {
            "field": "field1.timeStamp"
          }
        }
      }
    }
  },
  "size": 10000
}
I checked the mappings of the index, and the field is mapped as keyword. So how do I do max on such a field?
Adding the relevant mappings:
{"index_name":{"mappings":{"data":{"dynamic_templates":[{"boolean_as_keyword":{"match":"*","match_mapping_type":"boolean","mapping":{"ignore_above":256,"type":"keyword"}}},{"double_as_keyword":{"match":"*","match_mapping_type":"double","mapping":{"ignore_above":256,"type":"keyword"}}},{"long_as_keyword":{"match":"*","match_mapping_type":"long","mapping":{"ignore_above":256,"type":"keyword"}}},{"string_as_keyword":{"match":"*","match_mapping_type":"string","mapping":{"ignore_above":256,"type":"keyword"}}}],"date_detection":false,"properties":{"header":{"properties":{"Id":{"type":"keyword","ignore_above":256},"otherId":{"type":"keyword","ignore_above":256},"someKey":{"type":"keyword","ignore_above":256},"dataType":{"type":"keyword","ignore_above":256},"processing":{"type":"keyword","ignore_above":256},"otherKey":{"type":"keyword","ignore_above":256},"sender":{"type":"keyword","ignore_above":256},"receiver":{"type":"keyword","ignore_above":256},"system":{"type":"keyword","ignore_above":256},"timeStamp":{"type":"keyword","ignore_above":256}}}}}}}}
UPDATE 2:
I think I need to aggregate on the keyword field (timeStamp).
Please note that timeStamp is a subfield, i.e. it sits under field1, so the .keyword syntax below doesn't seem to work, or I am missing something else.
"aggs": {
"NAME" : {
"terms": {
"field": "field1.Id"
},
"aggs": {
"NAME": {
"max" : {
"field": "field1.timeStamp.keyword"
}
}
}
}
}
It now fails with:
"Invalid aggregator order path [field1.timeStamp]. Unknown aggregation [field1]"
In MongoDB's mapReduce, can we skip storing a document when using the out option?
E.g., sample documents in an animals collection:
{
  "type": "bird",
  "value": "10"
},
{
  "type": "fish",
  "value": "30"
},
{
  "type": "fish",
  "value": "20"
},
{
  "type": "plant",
  "value": "40"
}
The map/reduce functions:
function map() {
  emit(this.type, this.value);
}

function reduce(key, value) {
  var total = Array.sum(value);
  // --> if total is less than 20 then skip
  if (total < 20) {
    // skip... maybe something like returning null? --> Is something like this possible?
    return;
  }
  return total;
}
db.animals.mapReduce(map, reduce, { out: 'mr_test' })
I searched the documentation for a solution to this but wasn't able to find one. Can anyone help me with this?
No, you cannot, but you can always delete documents from the out collection afterwards:
db.mr_test.deleteMany({ value: { $lt:20 } })
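Note also that reduce is only invoked for keys with more than one value, so a single-value key like "bird" above would never enter reduce at all; filtering inside reduce would be unreliable even if returning nothing were allowed. If you are not tied to mapReduce, the aggregation pipeline can filter before writing out. A rough equivalent (assuming value is stored as a number; in your sample documents it is a string):

db.animals.aggregate([
  { $group: { _id: "$type", value: { $sum: "$value" } } },  // the map + reduce step
  { $match: { value: { $gte: 20 } } },                      // skip totals below 20
  { $out: "mr_test" }                                       // write results to mr_test
])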
Consider the following MongoDB collection / prototype that keeps track of how many cookies a given person has at a given point in time:
{
  "_id": ObjectId("5c5b5c1865e463c5b6a5b748"),
  "person": "Drew",
  "cookies": 1,
  "timestamp": ISODate("2019-02-05T20:34:48.922Z")
}
{
  "_id": ObjectId("5c5b5c2265e463c5b6a5b749"),
  "person": "Max",
  "cookies": 3,
  "timestamp": ISODate("2019-02-06T20:34:48.922Z")
}
{
  "_id": ObjectId("5c5b5c2e65e463c5b6a5b74a"),
  "person": "Max",
  "cookies": 0,
  "timestamp": ISODate("2019-02-07T20:34:48.922Z")
}
Ultimately, I need to get all people who currently have more than 0 cookies. In the above example, only "Drew" would qualify ("Max" had 3, but later had 0).
I've written the following map/reduce functions to sort this out:
var map = function() {
  emit(this.person, { 'timestamp': this.timestamp, 'cookies': this.cookies });
};

var reduce = function(person, cookies) {
  let latestCookie = cookies.sort(function(a, b) {
    if (a.timestamp > b.timestamp) {
      return -1;
    } else if (a.timestamp < b.timestamp) {
      return 1;
    } else {
      return 0;
    }
  })[0];
  return {
    'timestamp': latestCookie.timestamp,
    'cookies': latestCookie.cookies
  };
};
This works fine, and I get the following result set:
db.cookies.mapReduce(map, reduce, {out:{inline:1}})
...
"results": [
  {
    "_id": "Drew",
    "value": {
      "timestamp": ISODate("2019-02-05T20:34:48.922Z"),
      "cookies": 1
    }
  },
  {
    "_id": "Max",
    "value": {
      "timestamp": ISODate("2019-02-07T20:34:48.922Z"),
      "cookies": 0
    }
  }
],
...
Max is included in the results, but I'd like him not to be included (he has 0 cookies, after all).
What are my options here? I'm still relatively new to MongoDB. I have looked at finalize, as well as creating a temporary collection ({out: "temp.cookies"}), but I'm curious if there's an easier option or parameter I am overlooking.
Am I crazy for using mapReduce to solve this problem? The actual workload behind this scenario will involve millions of documents.
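For comparison, I believe the aggregation-pipeline equivalent of what I'm doing would be roughly this (a sketch: sort newest-first, take the first entry per person, then drop the zero-cookie people):

db.cookies.aggregate([
  { $sort: { timestamp: -1 } },           // newest documents first
  { $group: {
      _id: "$person",
      cookies: { $first: "$cookies" },    // latest cookie count per person
      timestamp: { $first: "$timestamp" }
  } },
  { $match: { cookies: { $gt: 0 } } }     // keep only people with > 0 cookies
])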
Thanks in advance
This works fine in the Kibana console without any error:
POST rangedattr_4_1/_search
{
  "size": 0,
  "aggs": {
    "user_field": {
      "terms": {
        "field": "FRONTLINK_OBJECT_GUID",
        "size": 10
      },
      "aggs": {
        "group_members": {
          "max": {
            "field": "MERGED_LINKS_TYPE"
          }
        },
        "groupmember_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "groupmembers": "group_members"
            },
            "script": "params.groupmembers % 2 == 1"
          }
        },
        "aggs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
But when I converted this into Java code like this:
Map<String, String> bucketsPathsMap = new HashMap<>();
bucketsPathsMap.put("bucketselector", "FRONTLINK_OBJECT_GUID");

AggregationBuilder aggregation = AggregationBuilders
    .terms("aggs").field(linkBackGuid).size(100000)
    .subAggregation(AggregationBuilders.max("maxAgg").field("MERGED_LINKS_TYPE"))
    .subAggregation(PipelineAggregatorBuilders.bucketSelector("bucket_filter", bucketsPathsMap, script))
    .subAggregation(AggregationBuilders.topHits("top").size(1));
it gives an "all shards failed" error. But when I run this:
AggregationBuilder aggregation = AggregationBuilders
    .terms("aggs").field(linkBackGuid).size(100000)
    .subAggregation(AggregationBuilders.max("maxAgg").field("MERGED_LINKS_TYPE"))
    .subAggregation(AggregationBuilders.topHits("top").size(1));
it works fine.
As far as I knew, an "all shards failed" error occurs only when documents are corrupted. But here the max aggregation works fine while the bucket_selector aggregation raises the error.
It worked when I changed my bucketsPathsMap from

bucketsPathsMap.put("bucketselector", "FRONTLINK_OBJECT_GUID");

to

bucketsPathsMap.put("bucketselector", "maxAgg");

where FRONTLINK_OBJECT_GUID is a document field and maxAgg is a sibling aggregation: a buckets_path must reference an aggregation (something that produces buckets or values), not a field.
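For reference, a sketch of the corrected builder; the Script body here is assumed to mirror the console query above, and the buckets-path key must match the params name used in the script:

Map<String, String> bucketsPathsMap = new HashMap<>();
bucketsPathsMap.put("groupmembers", "maxAgg"); // references the sibling max aggregation, not a field

Script script = new Script("params.groupmembers % 2 == 1");

AggregationBuilder aggregation = AggregationBuilders
    .terms("aggs").field("FRONTLINK_OBJECT_GUID").size(100000)
    .subAggregation(AggregationBuilders.max("maxAgg").field("MERGED_LINKS_TYPE"))
    .subAggregation(PipelineAggregatorBuilders.bucketSelector("bucket_filter", bucketsPathsMap, script))
    .subAggregation(AggregationBuilders.topHits("top").size(1));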
This seems to be a very popular question, but I haven't found an answer (or at least not one I was able to understand).
I have a flat file that I would like to store in Mongo with some nesting. Though this is relatively easy to achieve with an insert and unique content, I need to update the content on a regular basis, so I also need to be able to use the update commands.
My flat file looks as follows:
Model,Category,Organisation,CountryCode,CountryWarranty,PeriodCode,PeriodQty
Model1,Category1,Org1,Code1,2Y,201707,1
Model1,Category1,Org1,Code1,2Y,201708,2
Model1,Category1,Org1,Code1,1Y,201709,3
Model1,Category1,Org1,Code2,2Y,201707,7
Model1,Category1,Org1,Code2,2Y,201708,8
Model1,Category1,Org1,Code2,5Y,201709,7
Model1,Category1,Org2,Code3,2Y,201707,5
Model1,Category1,Org2,Code3,4Y,201708,6
Model1,Category1,Org2,Code3,2Y,201709,7
...
Model_n,Category_n,Org_n,Code_n,3Y,201802,20
and what I would like to achieve is the following:
{
  "_id": "Model1",
  "Model_category": "Category1",
  "Product_Sales": [
    {
      "Organisation": "Org1",
      "Country": [
        {
          "Code": "Code1",
          "Guarantee_Years": "2Y",
          "Period": [
            { "Code": 201707, "Qty": 1 },
            { "Code": 201708, "Qty": 2 },
            { "Code": 201709, "Qty": 3 }
          ]
        },
        {
          "Code": "Code2",
          "Guarantee_Years": "2Y",
          "Period": [
            { "Code": 201707, "Qty": 7 },
            { "Code": 201708, "Qty": 8 },
            { "Code": 201709, "Qty": 7 }
          ]
        }
      ]
    },
    {
      "Organisation": "Org2",
      "Country": [
        {
          "Code": "Code3",
          "Guarantee_Years": "2Y",
          "Period": [
            { "Code": 201707, "Qty": 5 },
            { "Code": 201708, "Qty": 6 },
            { "Code": 201709, "Qty": 7 }
          ]
        }
      ]
    }
  ]
}
Below is a snippet of what I tried. Note that the syntax is specific to my development environment, so I know it is not workable or proper Mongo, but it shows the basic idea. Any example using the console will do fine for me.
concat("{update: "master_Sales",
updates: [
{
q:{"_id":", %{_id},""},
u:{$addToSet: {
"Product_Sales.Organisation": "", %{org}, "",
"Product_Sales.Organisation.Country": [
-- more here but have no clue --
]
}}
, upsert: true}
]}"
)
Adding my organisations works fine, but as soon as I want to add a second level (nested within an org) it goes wrong.
So in essence I want to add this flat content to Mongo in a nested array structure, and whenever one of the values changes in the future (say a quantity is updated, or a new country is added), have that line added or updated, so that I am not forced to do a full refresh and re-insert every time a line is modified.
What would be the best approach to deal with this?
say the quantity is updated, or a new country is added
You can try the update queries below in MongoDB 3.6, which supports filtered positional updates via arrayFilters.
To update Qty for Organisation/Country Code/Period Code Org1/Code1/201707:
db.collection.update(
  { "_id": "Model1" },
  { "$set": { "Product_Sales.$[org].Country.$[country].Period.$[period].Qty": 2 } },
  { arrayFilters: [ { "org.Organisation": "Org1" }, { "country.Code": "Code1" }, { "period.Code": 201707 } ] }
)
To add a new Country to Organisation Org2:
db.collection.update(
  { "_id": "Model1" },
  { "$push": { "Product_Sales.$[org].Country": { "Code": "Code4", "Guarantee_Years": "2Y" } } },
  { arrayFilters: [ { "org.Organisation": "Org2" } ] }
)
This can be simplified to:
db.collection.update(
  { "Product_Sales.Organisation": "Org2" },
  { "$push": { "Product_Sales.$.Country": { "Code": "Code4", "Guarantee_Years": "2Y" } } }
)
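And for completeness, appending a new month to an existing country's Period array follows the same arrayFilters pattern (the 201710/4 values here are purely illustrative):

db.collection.update(
  { "_id": "Model1" },
  { "$push": { "Product_Sales.$[org].Country.$[country].Period": { "Code": 201710, "Qty": 4 } } },
  { arrayFilters: [ { "org.Organisation": "Org1" }, { "country.Code": "Code1" } ] }
)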