Elasticsearch - pagination of date histogram results

Is it possible to paginate results from a date histogram?
I have the following example date histogram, but the "from" : 0, "size" : 10 parameters do not seem to have any effect.
POST _search
{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "value_field" : "value.count",
        "interval" : "10s",
        "field" : "_timestamp"
      }
    }
  }
}

The "from" and "size" parameters work only with results of your query (hits). Facets run on the entire result list regardless of how many records you choose to retrieve. Therefore, in order to implement "pagination" on histogram, you need to limit your query. For example, in order to retrieve all histogram buckets for the last hour you can simply add range query or filter to you query that will limit results to the current hour. If _timestamp is indexed as date, you can do something like this:
POST _search
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : {}
      },
      "filter" : {
        "range" : {
          "_timestamp" : {
            "gt" : "now-1h",
            "lte" : "now"
          }
        }
      }
    }
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "value_field" : "value.count",
        "interval" : "10s",
        "field" : "_timestamp"
      }
    }
  }
}
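If you only need the buckets and not the matching documents themselves, you can also set the top-level "size" to 0 so no hits are returned alongside the facet; a minimal sketch of the same request:
POST _search
{
  "size" : 0,
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : {
        "range" : { "_timestamp" : { "gt" : "now-1h", "lte" : "now" } }
      }
    }
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "value_field" : "value.count",
        "interval" : "10s",
        "field" : "_timestamp"
      }
    }
  }
}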

Related

Elasticsearch - Update the value of a field to the current date

I'm looking for a way to update the value of a field through update_by_query. The field is reviewed_date in the following document:
{
  "reviewed_date" : "2022-07-05T09:31:04.742077702Z",
  "timestamp" : "2022-07-05T10:18:52.353852943Z"
}
I'm doing it through the following way:
POST index1/_update_by_query
{
  "script": {
    "source": "ctx._source.reviewed_date = OffsetDateTime.parse(ctx._source.reviewed_date)",
    "lang": "painless"
  },
  "query": {
    "term": {
      "_id": "23r2fvwc3drgr4f2323dsd"
    }
  }
}
This way I only manage to update the value of the timestamp field, even though I am indicating reviewed_date.
I would like to be able to assign the value directly to reviewed_date without modifying other fields.
Mapping:
"timestamp" : {
"type" : "date"
}
"reviewed_date" : {
"type" : "date"
},
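A minimal sketch of one way to do this, assuming the goal is simply to stamp reviewed_date with the current time: pass the timestamp in as a script parameter computed by the client, so only that one field is touched (the value below is a placeholder for whatever "now" your client produces):
POST index1/_update_by_query
{
  "script": {
    "source": "ctx._source.reviewed_date = params.now",
    "lang": "painless",
    "params": {
      "now": "2022-07-05T12:00:00.000Z"
    }
  },
  "query": {
    "term": {
      "_id": "23r2fvwc3drgr4f2323dsd"
    }
  }
}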

Spring Data MongoDB query based on last element of nested array field

I have the following data (Cars):
[
  {
    "make" : "Ferrari",
    "model" : "F40",
    "services" : [
      {
        "type" : "FULL",
        "date_time" : ISODate("2019-10-31T09:00:00.000Z")
      },
      {
        "type" : "FULL",
        "scheduled_date_time" : ISODate("2019-11-04T09:00:00.000Z")
      }
    ]
  },
  {
    "make" : "BMW",
    "model" : "M3",
    "services" : [
      {
        "type" : "FULL",
        "scheduled_date_time" : ISODate("2019-10-31T09:00:00.000Z")
      },
      {
        "type" : "FULL",
        "scheduled_date_time" : ISODate("2019-11-04T09:00:00.000Z")
      }
    ]
  }
]
Using Spring Data MongoDB I would like a query to retrieve all the Cars where the scheduled_date_time of the last item in the services array is within a certain date range.
A query I used previously, which works against the first item in the services array, looks like:
mongoTemplate.find(Query.query(
        where("services.0.scheduled_date_time").gte(fromDate)
            .andOperator(where("services.0.scheduled_date_time").lt(toDate))),
    Car.class);
Note the 0 index, since that targets the first element as opposed to the last one (which my current requirement needs).
I thought using an aggregation along with a projection and .arrayElementAt(-1) would do the trick, but I haven't quite got it to work. My current effort is:
Aggregation agg = newAggregation(
    project().and("services").arrayElementAt(-1).as("currentService"),
    match(where("currentService.scheduled_date_time").gte(fromDate)
        .andOperator(where("currentService.scheduled_date_time").lt(toDate)))
);
AggregationResults<Car> results = mongoTemplate.aggregate(agg, Car.class, Car.class);
return results.getMappedResults();
Any help/suggestions appreciated. Thanks.
This MongoDB aggregation retrieves all the Cars where the scheduled_date_time of the last item in the services array is within a specific date range:
[{
  $addFields: {
    last: {
      $arrayElemAt: [ '$services', -1 ]
    }
  }
}, {
  $match: {
    'last.scheduled_date_time': {
      $gte: ISODate('2019-10-26T04:06:27.307Z'),
      $lt: ISODate('2019-12-15T04:06:27.319Z')
    }
  }
}]
I was trying to write it in spring-data-mongodb without luck; $addFields is not supported there yet.
Since version 2.2.0.RELEASE, spring-data-mongodb includes aggregation repository methods.
The above query should be:
interface CarRepository extends MongoRepository<Car, String> {

    @Aggregation(pipeline = {
        "{ $addFields : { last: { $arrayElemAt: ['$services', -1] } } }",
        "{ $match: { 'last.scheduled_date_time' : { $gte : '$?0', $lt: '$?1' } } }"
    })
    List<Car> getCarsWithLastServiceDateBetween(LocalDateTime start, LocalDateTime end);
}
This method logs this query
[{ "$addFields" : { "last" : { "$arrayElemAt" : ["$services", -1]}}}, { "$match" : { "last.scheduled_date_time" : { "$gte" : "$2019-11-03T03:00:00Z", "$lt" : "$2019-11-05T03:00:00Z"}}}]
The date parameters are not parsing correctly. I didn't spend much time making it work.
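A likely fix, assuming the '$'-prefixed quoted placeholders are the problem, is to use plain ?0 / ?1 bindings so the parameters are bound as date values rather than strings (an untested sketch of the same method):
@Aggregation(pipeline = {
    "{ $addFields : { last: { $arrayElemAt: ['$services', -1] } } }",
    "{ $match: { 'last.scheduled_date_time' : { $gte : ?0, $lt: ?1 } } }"
})
List<Car> getCarsWithLastServiceDateBetween(LocalDateTime start, LocalDateTime end);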
If you want just the Car ids, this could work:
public List<String> getCarsIdWithServicesDateBetween(LocalDateTime start, LocalDateTime end) {
    return template.aggregate(newAggregation(
            unwind("services"),
            // "services.date" is the Java property name; judging by the logged
            // query below, it maps to "scheduled_date_time" in MongoDB
            group("id").last("services.date").as("date"),
            match(where("date").gte(start).lt(end))
        ), Car.class, Car.class)
        .getMappedResults().stream()
        .map(Car::getId)
        .collect(Collectors.toList());
}
Query Log
[{ "$unwind" : "$services"}, { "$group" : { "_id" : "$_id", "date" : { "$last" : "$services.scheduled_date_time"}}}, { "$match" : { "date" : { "$gte" : { "$date" : 1572750000000}, "$lt" : { "$date" : 1572922800000}}}}]

Query to count number of occurrences in array grouped by day

I have the following document structure:
(trackerEventsCollection) =
{
  "_id" : ObjectId("5b26c4fb7c696201040c8ed1"),
  "trackerId" : ObjectId("598fc51324h51901043d76de"),
  "trackingEvents" : [
    {
      "type" : "checkin",
      "eventSource" : "app",
      "timestamp" : ISODate("2017-08-25T06:34:58.964Z")
    },
    {
      "type" : "power",
      "eventSource" : "app",
      "timestamp" : ISODate("2017-08-25T06:51:23.795Z")
    },
    {
      "type" : "position",
      "eventSource" : "app",
      "timestamp" : ISODate("2017-08-25T06:51:23.985Z")
    }
  ]
}
I would like to write a query that counts the number of trackingEvents with "type" : "power", grouped by day. This seems quite tricky to me because the parent document does not have a date, and I have to rely on the timestamp field that belongs to the trackingEvents array members.
I'm not a very experienced MongoDB user and haven't been able to work out how this can be achieved.
Would really appreciate any help, thanks
To process your nested array as separate documents you need to use $unwind. In the next stage you can use $match to filter by type. Then you can group by single days, counting occurrences. The point is that you have to build a grouping key containing year, month and day, as in the following code:
db.trackerEvents.aggregate([
  { $unwind: "$trackingEvents" },
  { $match: { "trackingEvents.type": "power" } },
  {
    $group: {
      _id: {
        year: { $year: "$trackingEvents.timestamp" },
        month: { $month: "$trackingEvents.timestamp" },
        day: { $dayOfMonth: "$trackingEvents.timestamp" }
      },
      count: { $sum: 1 }
    }
  }
])
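If you are on MongoDB 3.0 or later, an equivalent grouping key can be built with $dateToString, which yields a single readable per-day key; a sketch (not from the original answer):
db.trackerEvents.aggregate([
  { $unwind: "$trackingEvents" },
  { $match: { "trackingEvents.type": "power" } },
  {
    $group: {
      // e.g. "2017-08-25" - one bucket per calendar day
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$trackingEvents.timestamp" } },
      count: { $sum: 1 }
    }
  }
])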

Elasticsearch doesn't find value in range query

I launch the following query:
GET archive-bp/_search
{
  "query": {
    "bool" : {
      "filter" : [
        {
          "bool" : {
            "should" : [
              {
                "terms" : {
                  "naDataOwnerCode" : [ "ACME-FinServ", "ACME-FinServ CA", "ACME-FinServ NY", "ACME-FinServ TX", "ACME-Shipping APA", "ACME-Shipping Eur", "ACME-Shipping LATAM", "ACME-Shipping ME", "ACME-TelCo-CN", "ACME-TelCo-ESAT", "ACME-TelCo-NL", "ACME-TelCo-PL", "ACME-TelCo-RO", "ACME-TelCo-SA", "ACME-TelCo-Treasury", "Default" ]
                }
              },
              {
                "bool" : {
                  "must_not" : {
                    "exists" : {
                      "field" : "naDataOwnerCode"
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "range" : {
            "bankCommunicationStatusDate" : {
              "from" : "2006-02-27T06:45:47.000Z",
              "to" : null,
              "time_zone" : "+02:00",
              "include_lower" : true,
              "include_upper" : true
            }
          }
        }
      ]
    }
  }
}
And I receive no results, although the field exists in my index.
When I strip off the data owner part, I still have no results. When I strip off the bankCommunicationStatusDate range, I get 10 results, so that is where the problem lies.
The query with only the bankCommunicationStatusDate range:
GET archive-bp/_search
{
  "query" : {
    "range" : {
      "bankCommunicationStatusDate" : {
        "from" : "2016-04-27T09:45:43.000Z",
        "to" : "2026-04-27T09:45:43.000Z",
        "time_zone" : "+02:00",
        "include_lower" : true,
        "include_upper" : true
      }
    }
  }
}
The mapping of my index contains the following bankCommunicationStatusDate field:
"bankCommunicationStatusDate": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
And there are values for the field bankCommunicationStatusDate in elasticsearch:
"bankCommunicationStatusDate": "2016-04-27T09:45:43.000Z"
"bankCommunicationStatusDate": "2016-04-27T09:45:47.000Z"
What is wrong?
What version of Elasticsearch do you use?
I guess the reason is that you should use gte/lte instead of from/to/include_lower/include_upper.
According to the documentation for version 0.90.4:
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/query-dsl-range-query.html
Deprecated in 0.90.4.
The from, to, include_lower and include_upper parameters have been deprecated in favour of gt, gte, lt, lte.
The strange thing is that I have tried your example on Elasticsearch version 1.7 and it returns data! I guess the real removal took place much later - between 1.7 and whatever newer version you have.
BTW, you can isolate the problem even further using the Sense plugin for Chrome and this code:
DELETE /test

PUT /test
{
  "mappings": {
    "myData" : {
      "properties": {
        "bankCommunicationStatusDate": {
          "type": "date"
        }
      }
    }
  }
}

PUT test/myData/1
{
  "bankCommunicationStatusDate": "2016-04-27T09:45:43.000Z"
}

PUT test/myData/2
{
  "bankCommunicationStatusDate": "2016-04-27T09:45:47.000Z"
}

GET test/_search
{
  "query" : {
    "range" : {
      "bankCommunicationStatusDate" : {
        "gte" : "2016-04-27T09:45:43.000Z",
        "lte" : "2026-04-27T09:45:43.000Z"
      }
    }
  }
}
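Applied to the original query, the range clause would then look something like this (same lower bound as in the question; since "to" : null means no upper bound, only gte is needed):
"range" : {
  "bankCommunicationStatusDate" : {
    "gte" : "2006-02-27T06:45:47.000Z",
    "time_zone" : "+02:00"
  }
}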

Use aggregation framework to get peaks from a pre-aggregated dataset

I have a few collections of metrics that are stored pre-aggregated into hour and minute collections like this:
"_id" : "12345CHA-2RU020130104",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"processor_id" : NumberLong(0),
"date" : ISODate("2013-01-04T00:00:00Z"),
"processor_type" : "CHP",
"array_serial" : NumberLong(12345)
},
"hour" : {
"11" : 4.6665907,
"21" : 5.9431519999999995,
"7" : 0.6405864,
"17" : 4.712744,
---etc---
},
"minute" : {
"11" : {
"33" : 4.689972,
"32" : 4.7190895,
---etc---
},
"3" : {
"45" : 5.6883,
"59" : 4.792,
---etc---
}
The minute collection has a sub-document for each hour which has an entry for each minute with the value of the metric at that minute.
My question is about the aggregation framework: how should I process this collection if I want to find all minutes where the metric was above a certain high-water mark? Investigating the aggregation framework turns up the $unwind operator, but that seems to work only on arrays.
Would the map/reduce functionality be better suited for this? With that I could simply emit any entry above the high-water mark and count them.
You could build an array of "keys" using a reduce function that iterates through the object's attributes.
reduce: function(obj, prev)
{
  for (var key in obj.minute) {
    prev.results.push({ hour: key, minutes: obj.minute[key] });
  }
}
This will give you something like:
{
  "results" : [
    {
      "hour" : "11",
      "minutes" : {
        "33" : 4.689972,
        "32" : 4.7190895
      }
    },
    {
      "hour" : "3",
      "minutes" : {
        "45" : 5.6883,
        "59" : 4.792
      }
    }
  ]
}
I've just done a quick test using group() - you'll need something more complex to iterate through the sub-sub-documents (minutes), but hopefully this points you in the right direction.
db.yourcoll.group(
  {
    initial: { results: [] },
    reduce: function(obj, prev)
    {
      for (var key in obj.minute) {
        prev.results.push({ hour: key, minutes: obj.minute[key] });
      }
    }
  }
);
In the finalize function you could reshape the data again. It's not going to be pretty; it might be easier to hold the minute and hour data as arrays rather than as elements of the document, as sketched below.
Hope it helps a bit.
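For illustration, a hypothetical array-based shape for the same data (the field names here are made up), which would let the aggregation framework's $unwind and $match find minutes above a high-water mark directly:
// Hypothetical restructured document: minutes stored as an array of
// { hour, minute, value } entries instead of nested sub-documents.
{
  "_id" : "12345CHA-2RU020130104",
  "samples" : [
    { "hour" : 11, "minute" : 33, "value" : 4.689972 },
    { "hour" : 11, "minute" : 32, "value" : 4.7190895 },
    { "hour" : 3,  "minute" : 45, "value" : 5.6883 }
  ]
}

// Finding all minutes above a high-water mark of, say, 5.0 then becomes:
db.yourcoll.aggregate([
  { $unwind: "$samples" },
  { $match: { "samples.value": { $gt: 5.0 } } }
])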