How to sort OpenSearch results that have the same score? - opensearch

I want to set secondary sorting criteria, without deactivating the default behavior, which is sorting by relevance score. In the documentation, all the examples seem to deactivate the default behavior and then sort by the chosen field.
Documentation also gives examples of setting several sorting criteria (the sort attribute of the query is an array of sorting criteria), but I don't see a mention of how to set the first one as sorting by relevance score.
The track_scores option allows me to see the relevance score of each hit, but I would like to actually use it as the first ordering rule, and use the other one only for results that have the same relevance score.

You can sort by more than one criterion. Put _score first in the sort array; the second criterion is used only when two hits have the same value for the first one.
Here is an example:
POST test_stackoverflow_us/_bulk?refresh=true&pretty
{ "index": {}}
{"name":"obama a", "countryCode":"us", "rating":5}
{ "index": {}}
{"name":"obama b", "countryCode":"us", "rating":4}
{ "index": {}}
{"name":"obama ac", "countryCode":"ar", "rating":3}
{ "index": {}}
{"name":"obama ess", "countryCode":"es", "rating":3.5}
GET test_stackoverflow_us/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match_phrase_prefix": {
"name": {
"query": "obama"
}
}
}
],
"boost": 2
}
}
],
"should": [
{
"term": {
"countryCode": {
"value": "us",
"boost": 4
}
}
},
{
"term": {
"countryCode": {
"value": "ar",
"boost": 3
}
}
},
{
"term": {
"countryCode": {
"value": "es",
"boost": 2
}
}
}
]
}
},
"size": 50,
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"rating": {
"order": "desc"
}
}
]
}
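The tie-breaking behaviour of the sort array can be pictured with a plain JavaScript comparator. This is only a sketch of the ordering semantics, not OpenSearch code: the hits array and its score values are made up for illustration, and byScoreThenRating is a hypothetical helper name.

```javascript
// Plain JavaScript model of a two-level sort; the scores below are invented.
const hits = [
  { name: "obama b", score: 6.0, rating: 4 },
  { name: "obama ess", score: 4.0, rating: 3.5 },
  { name: "obama a", score: 6.0, rating: 5 },
  { name: "obama ac", score: 5.0, rating: 3 },
];

// Compare by score first (descending); use rating (descending) only on a tie.
function byScoreThenRating(a, b) {
  if (a.score !== b.score) return b.score - a.score;
  return b.rating - a.rating;
}

// Copy before sorting so the original array is left untouched.
const sortedNames = [...hits].sort(byScoreThenRating).map((h) => h.name);
console.log(sortedNames);
// -> [ 'obama a', 'obama b', 'obama ac', 'obama ess' ]
```

Here "obama a" and "obama b" share the top score, so their relative order is decided by rating alone; the other hits are ordered purely by score.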

Related

MongoDb : Find and filter values from nested map

I have some data in MongoDB:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"orgId": "606abce197dc265ac41ae82c",
"registrations": {
"id1": {
"status": "status",
"topStage": {
"id": "stage1",
"name": "stage1"
}
},
"id2": {
"status": "status",
"topStage": {
"id": "stage1",
"name": "stage1"
}
},
"id3": {
"status": "status",
"topStage": {
"id": "stage2",
"name": "stage2"
}
}
}
}
]
I want to pass a stage id (at path registrations -> topStage -> id) and get back all matching key/value pairs.
I have written the following query:
db.collection.aggregate([
{
$project: {
teams: {
$objectToArray: "$registrations"
},
original: "$$ROOT"
}
},
{
"$project": {
"teams": {
"$filter": {
"input": "$teams",
"as": "team",
"cond": {
"$eq": [
"$$team.v.topStage.id",
"stage1"
]
}
}
}
}
},
{
"$project": {
"registrations": {
"$arrayToObject": "$teams"
}
}
}
])
It does return the right values.
For stage1 as the stage id, it returns
[
{
"_id": ObjectId("5a934e000102030405000000"),
"registrations": {
"id1": {
"status": "status",
"topStage": {
"id": "stage1",
"name": "stage1"
}
},
"id2": {
"status": "status",
"topStage": {
"id": "stage1",
"name": "stage1"
}
}
}
}
]
and for stage2 as stage id, it returns
[
{
"_id": ObjectId("5a934e000102030405000000"),
"registrations": {
"id3": {
"status": "status",
"topStage": {
"id": "stage2",
"name": "stage2"
}
}
}
}
]
Can someone let me know whether this is the best way to write this query, or whether it can be simplified?
It's a correct way to do it, but there will be a performance impact in the following cases:
If you don't have any other match condition that can use an index.
If you have a match condition, but it matches documents in which registrations contains many objects.
The other option is to alter the schema.
You can store registrations.id1 as registrations: { id: 1, status_id: 2 },
or restructure the data so that $objectToArray does not have to run on a large set.
If your data is huge, I would recommend adding an index on the nested status id field.
The MongoDB documentation itself suggests evaluating multiple schemas for any data set to get the best out of it.
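For reference, the three pipeline stages from the question map onto three plain JavaScript steps. This is a sketch that runs in memory rather than in MongoDB; filterRegistrationsByStage is a hypothetical helper name, and sampleDoc is a trimmed copy of the document from the question.

```javascript
// Hypothetical helper: keep only the registrations whose topStage.id matches.
function filterRegistrationsByStage(doc, stageId) {
  const kept = Object.entries(doc.registrations) // plays the role of $objectToArray
    .filter(([, reg]) => reg.topStage && reg.topStage.id === stageId); // $filter
  return { _id: doc._id, registrations: Object.fromEntries(kept) }; // $arrayToObject
}

// Trimmed copy of the sample document from the question.
const sampleDoc = {
  _id: "5a934e000102030405000000",
  registrations: {
    id1: { status: "status", topStage: { id: "stage1", name: "stage1" } },
    id2: { status: "status", topStage: { id: "stage1", name: "stage1" } },
    id3: { status: "status", topStage: { id: "stage2", name: "stage2" } },
  },
};

console.log(Object.keys(filterRegistrationsByStage(sampleDoc, "stage1").registrations));
// -> [ 'id1', 'id2' ]
console.log(Object.keys(filterRegistrationsByStage(sampleDoc, "stage2").registrations));
// -> [ 'id3' ]
```

The sketch also shows why the cost grows with the size of registrations: every entry is visited for every document, which is the overhead the schema change is meant to avoid.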

Get inserted document counts in specific date range using date histogram in elasticsearch

I have a list of documents in Elasticsearch which contain various fields.
The documents look like this:
[
{
"role": "api_user",
"apikey": "key1",
"data":{},
"#timestamp": "2021-10-06T16:47:13.555Z"
},
{
"role": "api_user",
"apikey": "key1",
"data":{},
"#timestamp": "2021-10-06T18:00:00.555Z"
},
{
"role": "api_user",
"apikey": "key1",
"data":{},
"#timestamp": "2021-10-07T13:47:13.555Z"
}
]
I want to find the number of documents in a specific date range with a 1-day interval, let's say
2021-10-05T00:47:13.555Z to 2021-10-08T00:13:13.555Z
I am trying the below aggregation for the result.
{
"size": 0,
"query": {
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2021-10-05T00:47:13.555Z",
"lte": "2021-10-08T00:13:13.555Z",
"format": "strict_date_optional_time"
}
}
}
]
}
}
},
"aggs": {
"data": {
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "day"
}
}
}
}
The expected output is:
For 2021-10-06 I should get 2 documents, for 2021-10-07 I should get 1 document, and for days with no documents I should get a count of 0.
The solution below works:
{
"size":0,
"query":{
"bool":{
"must":[
],
"filter":[
{
"match_all":{
}
},
{
"range":{
"#timestamp":{
"gte":"2021-10-05T00:47:13.555Z",
"lte":"2021-10-08T00:13:13.555Z",
"format":"strict_date_optional_time"
}
}
}
],
"should":[
],
"must_not":[
]
}
},
"aggs":{
"data":{
"date_histogram":{
"field":"#timestamp",
"fixed_interval":"12h",
"time_zone":"Asia/Calcutta",
"min_doc_count":1
}
}
}
}
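Note that the query above uses 12-hour buckets and "min_doc_count": 1, so empty buckets are left out of the response. The result the question describes (one bucket per day, with 0 for empty days) can be modelled in plain JavaScript as below. This is only an in-memory sketch: dailyCounts is a hypothetical helper, and it buckets by UTC day, which is an assumption (the query above uses the Asia/Calcutta time zone). In the aggregation itself, "min_doc_count": 0 together with "extended_bounds" are the date_histogram options that would presumably give the same shape.

```javascript
// Hypothetical helper: count timestamps per UTC day, keeping empty days as 0.
function dailyCounts(timestamps, gte, lte) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const startOfUtcDay = (ms) => Math.floor(ms / DAY_MS) * DAY_MS;
  const toKey = (ms) => new Date(ms).toISOString().slice(0, 10); // "YYYY-MM-DD"
  const from = Date.parse(gte);
  const to = Date.parse(lte);

  // Pre-create one bucket per day in the range so that empty days appear as 0.
  const counts = new Map();
  for (let day = startOfUtcDay(from); day <= startOfUtcDay(to); day += DAY_MS) {
    counts.set(toKey(day), 0);
  }

  // Count only the timestamps that fall inside the requested range.
  for (const ts of timestamps) {
    const ms = Date.parse(ts);
    if (ms < from || ms > to) continue;
    const key = toKey(startOfUtcDay(ms));
    counts.set(key, counts.get(key) + 1);
  }
  return Object.fromEntries(counts);
}

const buckets = dailyCounts(
  [
    "2021-10-06T16:47:13.555Z",
    "2021-10-06T18:00:00.555Z",
    "2021-10-07T13:47:13.555Z",
  ],
  "2021-10-05T00:47:13.555Z",
  "2021-10-08T00:13:13.555Z"
);
console.log(buckets);
// -> { '2021-10-05': 0, '2021-10-06': 2, '2021-10-07': 1, '2021-10-08': 0 }
```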

Using $match to query from different arrays with the same key value

Suppose I have this simple JSON data: two documents, each with two different arrays, carPolicies and paPolicies. Each array holds objects with a policy field, and each policy contains a key 'agent' whose value is '47'.
{
"_id": {
"$oid": "some_id"
},
"name": "qwe",
"password": "pw",
"carPolicies": [
{
"policy": {
"agent": "47"
}
},
{
"policy": {
"agent": "47"
}
}
],
"paPolicies": [
{
"policy": {
"agent": "47"
}
},
{
"policy": {
"agent": "47"
}
}
]
}
{
"_id": {
"$oid": "some_id"
},
"name": "rty",
"password": "wp",
"carPolicies": [
{
"policy": {
"agent": "47"
}
},
{
"policy": {
"agent": "47"
}
}
],
"paPolicies": [
{
"policy": {
"agent": "47"
}
},
{
"policy": {
"agent": "47"
}
}
]
}
Using MongoDB's $match operator, how do I write a query that returns the document's name if the agent value is 47 in either array?
This is what I currently have:
db.collection('users').aggregate([
// Get just the docs that contain an agent element where agent is === req.params.name
{$match: {$or: [{'paPolicies.policy.agent': req.params.name}, {'carPolicies.policy.agent': req.params.name}]} },
{
$project: {
policy: {
$filter: {
// how to do an 'or' operator at 'input' so it can be input: '$paPolicies.policy || $carPolicies.policy'
input: '$paPolicies.policy',
as: 'police',
cond: { $eq: ['$$police.agent', req.params.name]}
}
},
_id: 1, name: 1
}
}
])
I know that the above code is wrong but I feel like it's the closest I can currently get to a solution and hopefully gives an idea of what I'm trying to achieve.
If I understand the requirement correctly, how about just using dot (.) notation in a .find() query, with the projection as the second parameter?
db.collection.find({
$or: [
{
"carPolicies.policy.agent": "47"
},
{
"paPolicies.policy.agent": "47"
}
]
},
{
"_id": 1,
"name": 1
})
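The matching rule ("agent is 47 in either array") can be sketched in plain JavaScript as follows. This runs in memory, not against MongoDB: namesForAgent is a hypothetical helper, and the users array is made-up data with varied agent values so that the filter has something to exclude. The spread of both arrays is also the in-memory counterpart of the "or at input" the question asks about; in an aggregation pipeline, $concatArrays is the operator that would likely play that role.

```javascript
// Made-up sample data; only the shape follows the question.
const users = [
  { name: "qwe", carPolicies: [{ policy: { agent: "47" } }], paPolicies: [{ policy: { agent: "12" } }] },
  { name: "rty", carPolicies: [{ policy: { agent: "12" } }], paPolicies: [{ policy: { agent: "47" } }] },
  { name: "uio", carPolicies: [{ policy: { agent: "12" } }], paPolicies: [] },
];

// Hypothetical helper: names of users with the agent in either policy array.
function namesForAgent(allUsers, agent) {
  return allUsers
    .filter((u) =>
      // Treat both arrays as one list; a missing array counts as empty.
      [...(u.carPolicies || []), ...(u.paPolicies || [])].some(
        (entry) => entry.policy.agent === agent
      )
    )
    .map((u) => u.name);
}

console.log(namesForAgent(users, "47"));
// -> [ 'qwe', 'rty' ]
```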

Mongo Query - Filter on nested array and return documents that do not contain a specific field

I have the following schema:
{
"_id": 0,
"games": {
"gamesList": [
{
"franchiseName": "Tekken",
"genre": "Fighting",
"gamesInFranchise": [
{
"name": "Tekken 7",
"releaseDate": "03/18/2015",
"co-op": true,
"platforms": [
"playstation 3",
"xbox 360"
]
},
{
"name": "Tekken 6",
"releaseDate": "11/26/2007",
"co-op": true
},
{
"name": "Tekken 5",
"releaseDate": "01/01/2004",
"co-op": true
},
]
},
.................
]
}
}
I would like to fetch the document with a specific "_id" and keep only the games that do not have the "platforms" property. So essentially, the result would ideally look like this:
{
"_id": 0,
"games": {
"gamesList": [
{
"franchiseName": "Tekken",
"genre": "Fighting",
"gamesInFranchise": [
{
"name": "Tekken 6",
"releaseDate": "11/26/2007",
"co-op": true
},
{
"name": "Tekken 5",
"releaseDate": "01/01/2004",
"co-op": true
}
]
}
]
}
}
I tried using aggregation specifically with projection/filter query, but I can't seem to reach "platforms" to check if it exists or not.
The problem is that there are multiple levels of embedded arrays to go through before reaching platforms.
You have to use the aggregation framework to deal with them.
Here's the query:
db.collection.aggregate([
{
$match: {
_id: 0
}
},
{
$addFields: {
"games.gamesList": {
$map: {
input: "$games.gamesList",
as: "franchise",
in: {
"franchiseName": "$$franchise.franchiseName",
"genre": "$$franchise.genre",
"gamesInFranchise": {
$filter: {
input: "$$franchise.gamesInFranchise",
as: "gameInFranchise",
cond: {
$eq: [
null,
{
$ifNull: [
"$$gameInFranchise.platforms",
null
]
}
]
}
}
}
}
}
}
}
}
])
I used $addFields so that your other fields are preserved.
Note that you have to describe the whole element in the 'in' field of the $map operator.
The $map operator works on the first-level array, and $filter on the second-level array.
$ifNull is the trick for checking whether a field exists: it returns the field's value, or a fallback (here null) when the field is missing. By testing the result for equality (or inequality) with null, you can check whether the field exists.
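The same map-then-filter shape can be sketched in plain JavaScript. This is an in-memory illustration, not the MongoDB query: withoutPlatformGames is a hypothetical helper, and gameDoc is a trimmed copy of the document from the question. The check g.platforms == null is true for both a missing field and an explicit null, which matches the $ifNull test used in the pipeline.

```javascript
// Hypothetical helper: keep only the games that have no "platforms" field.
function withoutPlatformGames(doc) {
  return {
    ...doc,
    games: {
      ...doc.games,
      // Outer level: one entry per franchise (the role of $map).
      gamesList: doc.games.gamesList.map((franchise) => ({
        ...franchise,
        // Inner level: drop games that declare platforms (the role of $filter).
        gamesInFranchise: franchise.gamesInFranchise.filter((g) => g.platforms == null),
      })),
    },
  };
}

// Trimmed copy of the sample document from the question.
const gameDoc = {
  _id: 0,
  games: {
    gamesList: [
      {
        franchiseName: "Tekken",
        genre: "Fighting",
        gamesInFranchise: [
          { name: "Tekken 7", platforms: ["playstation 3", "xbox 360"] },
          { name: "Tekken 6" },
          { name: "Tekken 5" },
        ],
      },
    ],
  },
};

const filteredDoc = withoutPlatformGames(gameDoc);
console.log(filteredDoc.games.gamesList[0].gamesInFranchise.map((g) => g.name));
// -> [ 'Tekken 6', 'Tekken 5' ]
```

Unlike the pipeline, the spread syntax copies every franchise field automatically, so the element does not have to be described field by field.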

How can I return the minimum values from two subdocuments in a collection using MongoDB's aggregation pipeline?

We have a bunch of products in a database with two types of monetary values attached to each. Each object has a manufacturer, a range and a description, and each object can have a monthly rental amount (for rental agreements), a monthly payment amount (for finance agreements) or both.
An example object would be:
{
"manufacturer": "Manufacturer A",
"range": "Range A",
"description": "Product Description",
"rentals": {
"initialRental": 1111.05,
"monthlyRental": 123.45,
"termMonths": 24
},
"payments": {
"deposit": 592.56,
"monthlyPayment": 98.76,
"finalPayment": 296.28,
"termMonths": 36
}
}
There can often be more than one object for a given manufacturer and range.
I'm looking for an aggregation pipeline that will return a list of the lowest monthly rental and the lowest monthly payment for each distinct manufacturer/range pair, but my limited knowledge of how to use the aggregation framework seems to be catching me out.
My intended result, if there were one distinct manufacturer with two distinct ranges, would be the following:
[
{
"manufacturer": "Manufacturer A",
"range": "Range A",
"minimumRental": 123.45,
"minimumPayment": 98.76
},
{
"manufacturer": "Manufacturer A",
"range": "Range B",
"minimumRental": 234.56,
"minimumPayment": 197.53
}
]
I'm using the following to try and achieve this, but I seem to be tripping up on the grouping and use of $min:
db.products.aggregate(
[
{
"$group": {
"_id": {
"manufacturer": "$manufacturer.name",
"range": "$range.name"
},
"rentals": {
"$addToSet": "$rentals.monthlyrental"
},
"payments": {
"$addToSet": "$payments.monthlypayment"
}
}
},
{
"$group": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"payments": "$payments"
},
"minimumRental": {
"$min": "$rentals"
}
}
},
{
"$project": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$minimumRental",
"payments": "$_id.payments"
}
}
},
{
"$group": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$_id.minimumRental"
},
"minimumPayment": {
"$min": "$_id.payments"
}
}
},
{
"$project": {
"_id": 0,
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$_id.minimumRental",
"minimumPayment": "$minimumPayment"
}
}
]
)
It's worth noting, in the case with my test data, that I have deliberately not specified a rental for Range B, as there will be cases where rentals and/or payments are not both specified for a given range.
So, using the query above on my test data gives me the following:
{
"0" : {
"minimumPayment" : [
98.76
],
"manufacturer" : "Manufacturer A",
"range" : "Range A",
"minimumRental" : [
123.45
]
},
"1" : {
"minimumPayment" : [
197.53
],
"manufacturer" : "Manufacturer A",
"range" : "Range B",
"minimumRental" : []
}
}
This is close, but it appears that I'm getting an array instead of a minimum value. I get the impression that what I'm trying to do is possible, but I don't seem to be able to find any resources specific enough to use to find out what I'm doing wrong.
Thanks for reading.
It's a bit complex, and there is a little to understand here. The first step is to simplify the documents, and then just find the smallest amount for each type:
db.collection.aggregate([
// Tag things with an A/B value
{ "$project": {
"_id": {
"manufacturer": "$manufacturer.name",
"range": "$range.name",
},
"rental": "$rentals.monthlyRental",
"payment": "$payments.monthlyPayment",
"type": { "$literal": [ "R","P" ] }
}},
// Unwind that "type"
{ "$unwind": "$type" },
// Group conditionally on the type
{ "$group": {
"_id": {
"_id": "$_id",
"type": "$type"
},
"value": {
"$min": {
"$cond": [
{ "$eq": [ "$type", "R" ] },
"$rental",
"$payment"
]
}
}
}},
// Sort by type and amount
{ "$sort": { "_id.type": 1, "value": 1 } },
// Group by type only and just take the first after sort
{ "$group": {
"_id": "$_id.type",
"manufacturer": { "$first": "$_id._id.manufacturer" },
"range": { "$first": "$_id._id.range" }
}}
])
And that's basically it, just clean up fields as you need with a $project or deal with it in code.
Personally though I find that a bit sloppy and with a bit of overhead due to $unwind doing "A/B" values. A better approach would be to run each aggregation in parallel queries, then just merge the result to send to the client.
I could bang on all day about parallel queries, but the basic example was in an answer I gave recently, so read How to Group By Different Fields which shows the general technique for doing this already.
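For comparison, the result the question asks for (the lowest monthly rental and lowest monthly payment per manufacturer/range pair) can be sketched in plain JavaScript. This is an in-memory illustration rather than a pipeline: minimumsByRange is a hypothetical helper, and the products array is made-up test data in the flat shape of the example object, with Range B deliberately missing a rental. In a pipeline, using $min directly as a $group accumulator is the analogous operation.

```javascript
// Hypothetical helper: minimum rental and payment per manufacturer/range pair.
function minimumsByRange(items) {
  const groups = new Map();
  for (const p of items) {
    const key = JSON.stringify([p.manufacturer, p.range]);
    if (!groups.has(key)) {
      groups.set(key, {
        manufacturer: p.manufacturer,
        range: p.range,
        minimumRental: null, // stays null when no product in the group has a rental
        minimumPayment: null,
      });
    }
    const g = groups.get(key);
    const rental = p.rentals ? p.rentals.monthlyRental : undefined;
    const payment = p.payments ? p.payments.monthlyPayment : undefined;
    // Ignore missing values instead of treating them as 0.
    if (typeof rental === "number" && (g.minimumRental === null || rental < g.minimumRental)) {
      g.minimumRental = rental;
    }
    if (typeof payment === "number" && (g.minimumPayment === null || payment < g.minimumPayment)) {
      g.minimumPayment = payment;
    }
  }
  return [...groups.values()];
}

// Made-up test data; Range B has no rental on purpose.
const products = [
  { manufacturer: "Manufacturer A", range: "Range A", rentals: { monthlyRental: 123.45 }, payments: { monthlyPayment: 110.0 } },
  { manufacturer: "Manufacturer A", range: "Range A", rentals: { monthlyRental: 150.0 }, payments: { monthlyPayment: 98.76 } },
  { manufacturer: "Manufacturer A", range: "Range B", payments: { monthlyPayment: 197.53 } },
];

const minimums = minimumsByRange(products);
console.log(minimums);
```

For this data the Range A entry gets minimumRental 123.45 and minimumPayment 98.76, while Range B gets minimumPayment 197.53 and a null minimumRental.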