Elasticsearch query to get items that were modified more then an hour ago - date

I have the following indexed items in elasticsearch.
{
"_index": "test_index",
"type": "_doc",
"_source": {
"someTitle": "Thank you for your help",
"lastUpdated": 1640085989000}
},
{
"_index": "test_index",
"type": "_doc",
"_source": {
"someTitle": "Thank you for your help",
"lastUpdated": 1640092916012
}
},
{
"_index": "test_index",
"type": "_doc",
"_source": {
"someTitle": "Thank you for your help",
"lastUpdated": 1640092916012
}
}
How to get the items that were updated more than an hour ago based on that lastUpdated value? I have been trying some solutions found in internet but most of them are for querying the string but not number field.

It feels like a range query would do the work [doc]
The section you are looking for is range on dates
Your query should look more or less like that:
GET /<your index>/_search
{
"query": {
"range": {
"lastUpdated": {
"gte": "now-1h"
}
}
}
}
Make sure your mapping is right, and that lastUpdated has the right format [doc].

ES gives you keywords like now and h for simple date math queries. Along with a range query you should be able to do it:
{
"query": {
"range": {
"lastUpdated": {
"lt": "now-1h"
}
}
}
}

Related

GA4 (Google Analytics) sessions based on UTM params

I am trying to fetch sessions from GA4 which are relevant to specific UTM params.
In GA3 we were able to use segments (sessions::condition::ga:source==X;ga:medium==Y) but I can not find a way to do this on GA4.
POST https://analyticsdata.googleapis.com/v1beta/#{property}:runReport`
Payload like this:
body = {
"metrics": [
{
"name": "sessions::condition::ga:source==X;ga:medium==Y"
}
],
"dimensions": [
{
"name": "date"
}
],
"dateRanges": [
{
"startDate": '2022-01-01',
"endDate": '2022-01-30',
"name": "current_year"
}
]
}
Returns: Field sessions::condition::ga:source==X;ga:medium==Y is not a valid metric.. Is there a way to do this via new API?
Should I use dimension filter to achieve that? I need to query on both source&medium but it is not clear how do I do this?
"dimensionFilter": {
"filter": {
"fieldName": "firstUserMedium",
"stringFilter": {
"value": "Y"
}
}
}
A dimension filter on sessionSource & sessionMedium returns sessions that have those specific utm_source & utm_medium values. See the dimensions & metrics page for a description of these and other dimensions & metrics.
The needed dimension filter is similar to the following. See Dimension Filters in Creating a Report for more info.
"dimensionFilter": {
"andGroup": {
"expressions": [
{
"filter": {
"fieldName": "sessionSource",
"stringFilter": {
"value": "X"
}
}
},
{
"filter": {
"fieldName": "sessionMedium",
"stringFilter": {
"value": "Y"
}
}
}
]
}
},
Segments are not yet available today in the GA4 Data API.
I think you should check the dimensions and metrcis list for GA4 they dont start with ga
POST https://analyticsdata.googleapis.com/v1beta/properties/GA4_PROPERTY_ID:runReport
{
"dateRanges": [{ "startDate": "2020-09-01", "endDate": "2020-09-15" }],
"dimensions": [{ "name": "country" }],
"metrics": [{ "name": "activeUsers" }]
}
Also at this time i don't think it supports segments.

How to use Aggregation along with group by and sum

How to get the sum of purchased deal's price (current year data) group by week, day, year using purchased_at field
My collection data:
{
"_id": ObjectId("5a66d619042e9f3a070d6864"),
"name": "Deal1",
"price": "2000",
"status": true,
"purchased_at": ISODate("2018-01-23T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6872"),
"name": "Deal2",
"price": "500",
"status": true,
"purchased_at": ISODate("2018-01-13T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6880"),
"name": "Deal3",
"price": "1000",
"status": true,
"purchased_at": ISODate("2018-02-13T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6880"),
"name": "Deal4",
"price": "1000",
"status": false,
"purchased_at": ISODate("2018-01-11T06:28:41.0Z")
}
Can someone please help?
Since you're using non standard date format, you need to use filter():
$lastWeekSum = $collection->filter(function($i) {
Carbon::parse($i['purchased_at'])->gt(now()->subWeek());
})->sum('price');
If $i['purchased_at'] returns an object, you should convert it to a string like 2018-01-11T06:28:41.0Z before parsing it.

elasticsearch ngram and postgresql trigram search results are not match

I've crereated an index on elasticsearch same as bellow:
"settings" : {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"issue": {
"properties": {
"description": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
My test items are bellow:
"alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"
"otomatik onay işlemi gecikmiş"
"************* nolu iade islemi urun kargoya verilmedi zamaninda iade islemlerinde urun erorr hata veriyor"
I've test this index with bellow query:
GET issue/_search
{
"query": {
"match": {
"description":{
"query": "otomatik onay istemi zamaninda gerceklesmemis"
}
}
}
}
And result:
{
....
"hits": {
....
"max_score": 2.3507352,
"hits": [
{
....
"_score": 2.3507352,
"_source": {
"issue_id": "*******",
"description": "alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"
}
}
]
}
}
But same data on postgresql with bellow SQL response another result:
SELECT
public.tbl_issue_descriptions_big.description,
similarity(description, 'otomatik onay islemi zamaninda gerceklesmemis') AS sml
FROM
public.tbl_issue_descriptions_big
WHERE
description %'otomatik onay islemi zamaninda gerceklesmemis'
ORDER BY
sml DESC
LIMIT 10
Result is:
description | sml
======================================================|======
otomatik onay islemi gecikmis |0,351852
Why is this difference caused?
I dont know enough about postgres to give a qualified answer there (as this also depends on the documents that are indexed and if they scoring formulas are exactly the same, which I doubt), but Elasticsearch has an explain API and an explain parameter in the search, that help you to find out why a certain document was scored this way.

How to perform date arithmetic between nested and unnested dates in Elasticsearch?

Consider the following Elasticsearch (v5.4) object (an "award" doc type):
{
"name": "Gold 1000",
"date": "2017-06-01T16:43:00.000+00:00",
"recipient": {
"name": "James Conroy",
"date_of_birth": "1991-05-30"
}
}
The mapping type for both award.date and award.recipient.date_of_birth is "date".
I want to perform a range aggregation to get a list of the age ranges of the recipients of this award ("Under 18", "18-24", "24-30", "30+"), at the time of their award. I tried the following aggregation query:
{
"size": 0,
"query": {"match_all": {}},
"aggs": {
"recipients": {
"nested": {
"path": "recipient"
},
"aggs": {
"age_ranges": {
"range": {
"script": {
"inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
},
"keyed": true,
"ranges": [{
"key": "Under 18",
"from": 0,
"to": 18
}, {
"key": "18-24",
"from": 18,
"to": 24
}, {
"key": "24-30",
"from": 24,
"to": 30
}, {
"key": "30+",
"from": 30,
"to": 100
}]
}
}
}
}
}
}
Problem 1
But I get the following error due to the comparison of dates in the script portion:
Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].
The DateTime object is the award.date field, and the MutableDateTime object is the award.recipient.date_of_birth field. I've tried doing something like doc['recipient.date_of_birth'].date.toDateTime() (which doesn't work despite the Joda docs claiming that MutableDateTime has this method inherited from a parent class). I've also tried doing something further like this:
"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"
Which sadly also doesn't work :(
Problem 2
I notice if I do this:
"aggs": {
"recipients": {
"nested": {
"path": "recipient"
},
"aggs": {
"award_years": {
"terms": {
"script": {
"inline": "doc['date'].date.year"
}
}
}
}
}
}
I get 1970 with a doc_count that happens to equal the total number of docs in ES. This leads me to believe that accessing a property outside of the nested object simply does not work and gives me back some default like the epoch datetime. And if I do the opposite (aggregating dates of birth without nesting), I get the exact same thing for all the dates of birth instead (1970, epoch datetime). So how can I compare those two dates?
I am racking my brain here, and I feel like there's some clever solution that is just beyond my current expertise with Elasticsearch. Help!
If you want to set up a quick environment for this to help me out, here is some curl goodness:
curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'
That should give you a "joelinux" index with two "award" docs to test this out ("James Conroy" and "Martin McNealy"). Thanks in advance!
Unfortunately, you can't access nested and non-nested fields within the same context. As a workaround, you can change your mapping to automatically copy date from nested document to root context using copy_to option:
{
"mappings": {
"award": {
"properties": {
"name": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
},
"date": {
"type": "date"
},
"date_of_birth": {
"type": "date" // will be automatically filled when indexing documents
},
"recipient": {
"properties": {
"name": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
},
"date_of_birth": {
"type": "date",
"copy_to": "date_of_birth" // copy value to root document
}
},
"type": "nested"
}
}
}
}
}
After that you can access date of birth using path date, though the calculations to get number of years between dates are slightly tricky:
Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()
Here I convert original JodaTime date objects to system.time.LocalDate objects:
Get number of milliseconds from 1970-01-01
Convert to number of days from 1970-01-01 by dividing it to 86400000L (number of ms in one day)
Convert to LocalDate object
Create date-based Period object from two dates
Get number of years between two dates.
So, the final aggregation query looks like this:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_ranges": {
"range": {
"script": {
"inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
},
"keyed": true,
"ranges": [
{
"key": "Under 18",
"from": 0,
"to": 18
},
{
"key": "18-24",
"from": 18,
"to": 24
},
{
"key": "24-30",
"from": 24,
"to": 30
},
{
"key": "30+",
"from": 30,
"to": 100
}
]
}
}
}
}

Elasticsearch - Incoming birthdays

I'm new with elasticsearch and I'm stuck with a query.
I want to get the next (now+3d) birthdays among my users. It looks simple but it's not because i have only the birthdate of my users.
How I can compare only months and day directly in the query when I only have a birthdate (Eg: 1984-04-15 or 2015-04-15 sometimes) ?
My field mapping:
"birthdate": {
"format": "dateOptionalTime",
"type": "date"
}
My actual query that doesn't work at all:
{
"query": {
"range": {
"birthdate": {
"format": "dd-MM",
"gte": "now",
"lte": "now+3d"
}
}
}
}
I saw this post Elasticsearch filtering by part of date but I'm not a big fan of the solution, and I would prefer instead of a wilcard a "now+3d"
Maybe I can do do something with a script ?
"Format" field was added in 1.5.0 version of elasticsearch. If your
version is below 1.5.0 format will not work. We had a same problem where we had to send an email on user's birthday and we were using version 1.4.4. So we created a separate dob field where we stored date in "dd-MM" format.
We changed the mapping of date field:
PUT /user
{
"mappings": {
"user": {
"properties": {
"dob": {
"type": "date",
"format": "dd-MM"
}
}
}
}
}
Then you can search:
GET /user/_search
{
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"from": "01-01",
"to": "01-01",
"include_upper" : true
}
}
}
}
}
}