MongoDB find closest match - mongodb

I'm wondering if it is possible to access a document in MongoDB via closest match.
e.g. my search query always contains:
name
country
city
The following rules are in place:
1. name always has to match
2. if either country or city is present, country has the higher priority
3. if country or city does not match, only consider the document if that field has the default value (e.g. for String: "")
Example query:
name = "Test"
country = "USA"
city = "Seattle"
Documents:
db.stuff.insert([
{
name:"Test",
country:"",
city:"Seattle"
},{
name:"Test3",
country:"USA",
city:"Seattle"
},{
name:"Test",
country:"USA",
city:""
},{
name:"Test",
country:"Germany",
city:"Seattle"
},{
name:"Test",
country:"USA",
city:"Washington"
}
])
It should return the 3rd document
thanks!

Given the unclear requirements and contradicting updates, this answer is more of a guideline addressing the "is it possible at all" part.
Adjust the example to match your actual expectations.
db.stuff.aggregate([
{$match: {name: "Test"}}, // <== the fields that should always match
{$facet: {
matchedBoth: [
{$match: {country: "USA", city: "Seattle"}}, // <== bull's-eye
{$addFields: {weight: 10}} // <== 10 stones
],
matchedCity: [
{$match: {country: "", city: "Seattle"}}, // <== the $match may need to be improved, see below
{$addFields: {weight: 0}} // <== weightless, yet still a match
],
matchedCountry: [
{$match: {country: "USA", city: ""}},
{$addFields: {weight: 5}} // <== country outranks city (rule 2)
]
// add more rules here, if needed
}},
// get them together. Should list all rules from above
{$project: {doc: {$concatArrays: ["$matchedBoth", "$matchedCity", "$matchedCountry"]}}},
{$unwind: "$doc"}, // <== split them apart
{$sort: {"doc.weight": -1}}, // <== and order by weight, desc
// reshape to retrieve documents in its original format
{$project: {_id: "$doc._id", name: "$doc.name", country: "$doc.country", city: "$doc.city"}}
]);
The least explained part of the question affects how we build the facets. E.g.
{$match: {country: "", city: "Seattle"}}
matches all documents where country is explicitly present and is an empty string.
It could just as well be
{$match: {country: {$ne: "USA"}, city: "Seattle"}}
to get all documents with matching name and city and any country/no country, or even
{$match: {$and: [{$or: [{country: null}, {country: ""}]}, {city: "Seattle"}]}}
etc.

Here is a query
db.collection.aggregate([
{$match: {name:"Test"}},
{$project: {
name:"$name",
country: "$country",
city:"$city",
countryMatch: {$cond: [{$eq:["$country", "USA"]}, true, false]},
cityMatch: {$cond:[{$eq:["$city", "Seattle"]}, true, false]}
}},
{$match: {$and: [
{$or:[{countryMatch:true},{country:""}]},
{$or:[{cityMatch:true},{city:""}]}
]}},
{$sort: {countryMatch:-1, cityMatch:-1}},
{$project: {name:"$name", country:"$country", city:"$city"}}
])
Explanation:
The first match filters out docs whose name doesn't match (rule #1: name must match).
The next projection selects the doc fields plus flags for country and city matches. We need them to filter and sort the documents further.
The second match drops documents whose country or city neither matches the query nor has the default value (rule #3).
Sorting puts country matches before city matches, as rule #2 states. The last projection selects the required fields.
Output:
{
_id: 3,
name : "Test",
country : "USA",
city : ""
},
{
_id: 1,
name : "Test",
country : "",
city : "Seattle"
}
You can limit query results to get only closest match.
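To check the rules without a database, the same filter-and-sort logic can be sketched in plain JavaScript. This is only an in-memory illustration, not the MongoDB query itself: the helper name closestMatches is hypothetical, the _id values 1 to 5 are assumed to follow insertion order, and the default value is assumed to be an empty string.

```javascript
// Sample documents from the question; the _id values are assumed.
const docs = [
  {_id: 1, name: "Test", country: "", city: "Seattle"},
  {_id: 2, name: "Test3", country: "USA", city: "Seattle"},
  {_id: 3, name: "Test", country: "USA", city: ""},
  {_id: 4, name: "Test", country: "Germany", city: "Seattle"},
  {_id: 5, name: "Test", country: "USA", city: "Washington"}
];

// Hypothetical helper: applies rules 1-3 in memory, best candidate first.
function closestMatches(docs, query) {
  return docs
    // rule 1: name must match
    .filter(d => d.name === query.name)
    // rule 3: a non-matching field is only tolerated if it holds the default ""
    .filter(d => d.country === query.country || d.country === "")
    .filter(d => d.city === query.city || d.city === "")
    .map(d => ({
      doc: d,
      countryMatch: d.country === query.country,
      cityMatch: d.city === query.city
    }))
    // rule 2: country matches sort before city matches
    .sort((a, b) => (b.countryMatch - a.countryMatch) || (b.cityMatch - a.cityMatch))
    .map(x => x.doc);
}

const ranked = closestMatches(docs, {name: "Test", country: "USA", city: "Seattle"});
console.log(ranked.map(d => d._id));
```

In the pipeline itself, a {$limit: 1} stage placed after the $sort keeps only the first of these candidates.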

Related

Mongo query: array of objects where a key's value is repeated

I am new to Mongo. I'm posting this question because I am not sure how to search for this on Google.
I have book documents like the one below
{
bookId: 1,
title: 'some title',
publicationDate: DD-MM-YYYY,
editions: [{
editionId: 1
},{
editionId: 2
}]
}
and another one like this
{
bookId: 2,
title: 'some title 2',
publicationDate: DD-MM-YYYY,
editions: [{
editionId: 1
},{
editionId: 1
}]
}
I want to write a query db.books.find({}) which would return only those books where editions.editionId has been duplicated for a book.
So in this example, for bookId: 2 there are two editions with the editionId:1.
Any suggestions?
You can use the aggregation framework; specifically, you can use the $group operator to group the records together by book and edition id, and count how many times they occur. If the count is greater than 1, then you've found a duplication.
Here is an example:
db.books.aggregate([
{$unwind: "$editions"},
{$group: {"_id": {"_id": "$_id", "editionId": "$editions.editionId"}, "count": {$sum: 1}}},
{$match: {"count" : {"$gt": 1}}}
])
Note that this does not return the entire book records, but it does return their identifiers; you can then use these in a subsequent query to fetch the entire records, or do some de-duplication for example.
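As a rough illustration of what the three stages compute, here is the same counting done in plain JavaScript on the two sample books. The helper name duplicatedEditions is hypothetical and the publicationDate field is left out; this is a sketch of the logic, not a replacement for the pipeline.

```javascript
// The two sample books from the question (publicationDate omitted).
const books = [
  {bookId: 1, title: "some title", editions: [{editionId: 1}, {editionId: 2}]},
  {bookId: 2, title: "some title 2", editions: [{editionId: 1}, {editionId: 1}]}
];

// Hypothetical helper: one {bookId, editionId, count} entry per repeated editionId.
function duplicatedEditions(books) {
  const result = [];
  for (const book of books) {
    const counts = new Map();
    // Walk the editions one by one (the role of $unwind) and count per id ($group with $sum).
    for (const edition of book.editions) {
      counts.set(edition.editionId, (counts.get(edition.editionId) || 0) + 1);
    }
    // Keep only ids that occur more than once (the role of $match on count).
    for (const [editionId, count] of counts) {
      if (count > 1) result.push({bookId: book.bookId, editionId, count});
    }
  }
  return result;
}

const dupes = duplicatedEditions(books);
console.log(dupes);
```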

Most frequent word in MongoDB collection

I have a MongoDB collection where each entry has a product field containing a string array. What I would like to do is find the most frequent word in the whole collection. Any ideas on how to do that?
Here is a sample object:
{
"_id" : ObjectId("55e02d333b88f425f84191af"),
"product" : [
" x bla y "
],
"hash_key" : "ecfe355b2f45dfbaf361cff4d314d4cc",
"price" : [
"z"
],
"image" : "image_url"
}
Looking at the sample object, what I would like to do is count "x", "bla" and "y" singularly.
I recently had to do something similar. I had a collection of objects and each object had a list of keywords. To count the frequency of each keyword, I used the following aggregation pipeline, which uses the MongoDB version 4.4 $accumulator group operation.
db.collectionname.aggregate([
{$match: {available: true}}, // Some criteria to filter the documents
{$project:
{ _id: 0, keywords: 1}}, // Only keep keywords
{$group:
{_id: null, keywords: // Accumulate keywords into one array
{$accumulator: {
init: function(){return new Array()},
accumulate: function(state, value){return state.concat(value)},
accumulateArgs: ["$keywords"],
merge: function(state1, state2){return state1.concat(state2)},
lang: "js"}}}},
{$unwind: "$keywords"}, // Split the array into one document per keyword
{$group: {_id: "$keywords", freq: {$sum: 1}}}, // Group keywords and count frequencies
{$sort: {freq: -1}}, // Sort by frequency, descending
{$limit: 5} // Take first five
])
I have no idea if this is the most efficient solution. However, it solved the problem for me.
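The pipeline above counts whole array elements, while the question asks for the individual words inside each string. A minimal in-memory sketch of that extra splitting step follows, in plain JavaScript. The helper name wordFrequencies and the second sample item are assumptions made for illustration; inside a pipeline the comparable step would be a $split on the string before the $unwind.

```javascript
// First item mirrors the question's sample; the second one is made up for illustration.
const items = [
  {product: [" x bla y "]},
  {product: ["bla z", "bla"]}
];

// Hypothetical helper: returns [word, count] pairs, most frequent first.
function wordFrequencies(items) {
  const freq = new Map();
  for (const item of items) {
    for (const text of item.product || []) {
      // Trim the padding, then split on runs of whitespace.
      for (const word of text.trim().split(/\s+/)) {
        if (word === "") continue; // an empty string yields one empty token
        freq.set(word, (freq.get(word) || 0) + 1);
      }
    }
  }
  return [...freq.entries()].sort((a, b) => b[1] - a[1]);
}

const wordCounts = wordFrequencies(items);
console.log(wordCounts[0]);
```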

MongoDB: calculate average value for the document & then do the same thing across entire collection

I have a collection of documents (Offers) with subdocuments (Salary) like this:
{
_id: ObjectId("zzz"),
sphere: ObjectId("xxx"),
region: ObjectId("yyy"),
salary: {
start: 10000,
end: 50000
}
}
And I want to calculate the average salary for a given region & sphere across the entire collection. I created a query for this and it works, but it only takes the salary start value into account.
db.offer.aggregate(
[
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{region: ObjectId("xxx")},
{sphere: ObjectId("yyy")}
]}
},
{$group: {_id: null, avg: {$avg: "$salary.start"}}}
]
)
But first I want to calculate the average salary (of start & end) for each offer. How can I do this?
Update.
If the value for "salary.end" may be missing in your data, you need to add one additional "$project" stage that replaces a missing "salary.end" with the existing "salary.start". Otherwise the average will be wrong, because documents lacking "salary.end" are ignored.
db.offer.aggregate([
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{"region": ObjectId("xxx")},
{"sphere": ObjectId("yyy")}
]}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"salary.start":1,
"salary.end": {$ifNull: ["$salary.end", "$salary.start"]}
}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"avg_salary":{$divide:[
{$add:["$salary.start","$salary.end"]}
,2
]}}},
{$group:{"_id":{"sphere":"$sphere","region":"$region"},
"avg":{$avg:"$avg_salary"}}}
])
The way you aggregate has to be modified:
1. Match the required region and sphere, and offers where salary.start > 0.
2. Project an extra field for each offer, which holds the average of start and end.
3. Group together the records with the same region and sphere, and apply the $avg aggregation operator to the avg_salary of each offer in that group to get the average salary.
The Code:
db.offer.aggregate([
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{"region": ObjectId("xxx")},
{"sphere": ObjectId("yyy")}
]}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"avg_salary":{$divide:[
{$add:["$salary.start","$salary.end"]}
,2
]}}},
{$group:{"_id":{"sphere":"$sphere","region":"$region"},
"avg":{$avg:"$avg_salary"}}}
])
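The arithmetic behind both pipelines can be checked with a small plain JavaScript sketch. It assumes a missing salary.end should fall back to salary.start, as in the update above; the helper name averageSalary and the second sample offer are hypothetical.

```javascript
// First offer uses the question's numbers; the second (no end value) is made up.
const offers = [
  {salary: {start: 10000, end: 50000}},
  {salary: {start: 20000}}
];

// Hypothetical helper: average of the per-offer (start + end) / 2 values.
function averageSalary(offers) {
  const perOffer = offers
    .filter(o => o.salary.start > 0)                 // same filter as the $match
    .map(o => {
      const end = o.salary.end ?? o.salary.start;    // same fallback as $ifNull
      return (o.salary.start + end) / 2;             // per-offer average
    });
  if (perOffer.length === 0) return null;            // nothing to average
  return perOffer.reduce((sum, v) => sum + v, 0) / perOffer.length;
}

const avg = averageSalary(offers);
console.log(avg);
```

Here the per-offer averages are 30000 and 20000, so the overall average is 25000. Without the fallback the second offer would drop out and the result would be 30000.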

mongodb query on uniqueness of a field

doing a prefix search
query = ({$or:[{country_lc:/^unit.*/, docType:'country'}, {region_lc:/^unit.*/, docType:'region'}, {city_lc:/^unit.*/, docType:'city'}]})
FIELDS = [
country : 1
, region : 1
, city : 1
, docType : 1
]
SORT_RULE = [
country_lc : 1
, region_lc : 1
, city_lc : 1
]
def locCursor = db.location.find(query, FIELDS).sort(SORT_RULE)
But this gives me duplicate results as well, so I want distinct results. The distinction should be on the field
called 'name' in the location collection. That is, for each unique name it should give me one result.
In MongoDB shell you can do:
db.location.aggregate([
{$match: {$or:[
{country_lc:/^unit.*/, docType:'country'},
{region_lc:/^unit.*/, docType:'region'},
{city_lc:/^unit.*/, docType:'city'}
]}},
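{$group: {
_id: '$name',
// non-_id fields in $group need an accumulator; $first keeps one value per name
country: {$first: '$country'},
region: {$first: '$region'},
city: {$first: '$city'},
docType: {$first: '$docType'},
// carry the lowercase fields along so the $sort below can still use them
country_lc: {$first: '$country_lc'},
region_lc: {$first: '$region_lc'},
city_lc: {$first: '$city_lc'}
}},
{$sort: {country_lc: 1, region_lc: 1, city_lc: 1}}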
])
To get your results.
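Keeping one result per unique name amounts to remembering the first document seen for each name. A minimal plain JavaScript sketch of that idea follows; the helper name firstPerName and the sample locations are made up for illustration.

```javascript
// Made-up sample rows; two of them share the same name.
const locations = [
  {name: "United States", country: "United States", docType: "country"},
  {name: "United States", country: "United States", docType: "country"},
  {name: "United Kingdom", country: "United Kingdom", docType: "country"}
];

// Hypothetical helper: keeps the first document encountered for each name.
function firstPerName(docs) {
  const seen = new Map();
  for (const doc of docs) {
    if (!seen.has(doc.name)) seen.set(doc.name, doc);
  }
  return [...seen.values()];
}

const unique = firstPerName(locations);
console.log(unique.map(d => d.name));
```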

mongodb/meteor: how to I get the value of one field corresponding to the $max value of another field?

I have a collection of messages with the following fields: _id, senderId, receiverId, dateSubmittedMs, message, and for a given user I want to return the latest message to him from all other users. So, for example, if there are users Alex, Barb, Chuck, Dora, I would like to return the most recent message between Alex and each of Barb, Chuck and Dora. What is the best way to do this? Can I do it in one step using aggregation?
The aggregation examples in the official online documentation (http://docs.mongodb.org/manual/reference/aggregation/min/) show how to find the lowest age over groups within a collection, but what I need is something analogous to finding the name of the youngest person over groups of people.
Here is my current approach:
Step 1: Find the highest value for dateSubmitted over all messages sent and received by Alex, grouping over the other users:
var M = Messages.aggregate(
{$match:
{$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
},
{$group: {_id: "$receiverId", lastestSubmitted: {$max: "$submitted"} }}).fetch();
Step 2: Create an array of these highest values of dateSubmitted:
var MIds = _.pluck(M,'lastestSubmitted');
Step 3: Find these messages, by senderId, receiverId, and latestSubmitted:
return Messages.find(
{submitted: {$in: MIds}, $or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]},
{sort: {submitted: 1}}
);
There are two problems with this:
Can it be done in one step instead of three? Perhaps through a mapReduce or Aggregate command?
Instead of grouping only over the receiverId: 'Alex', is there a way to group over something like: $or [{receiverId: 'Alex', senderId: 'Barb'}, {senderId: 'Alex', receiverId: 'Barb'}]? (but for EACH of the other users) This would allow me to get the latest message in a conversation between any two participants that Alex conversed with. So for example:
Any suggestions?
The only thing you have to change is the group._id in the grouping stage. You can use a document for this purpose, not just a single field, and you can apply the sort in the same pipeline. The sort works the same way if you group by receiverId only.
var M = Messages.aggregate(
{$match:
{$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
},
{$group:
{_id: {
receiverId: "$receiverId",
senderId: "$senderId"},
lastestSubmitted: {$max: "$submitted"} }
},
{$sort: {lastestSubmitted: -1} // "submitted" no longer exists after $group
},
{$limit: 1}
).fetch();
The example above only adds the possibility of also checking the second or third most recently contacted connection. If you only want the single most recent message, you do not even need to group. Just run this:
var M = Messages.aggregate(
{$match:
{$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
},
{$sort: {submitted: -1}
},
{$limit: 1}
).fetch();
The answer above gets the single most recent message between one user and all others. To get the most recent message for each pair involving that user, read on.
Based on the comments, I misinterpreted the question a bit, but the part above is useful anyway. The correct solution to your problem is below. The difficulty is to get a key for the pair to group on, and to identify the swapped pair (bob -> tom == tom -> bob in this case). You can use a condition and ordering to identify the swaps. (It is certainly a much more difficult question.) The code looks like this:
var M = Messages.aggregate(
{$match:
{$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
},
{$project:
{'part1':{$cond:[{$gt:['$senderId','$receiverId']},'$senderId','$receiverId']},
'part2':{$cond:[{$gt:['$senderId','$receiverId']},'$receiverId','$senderId']},
'message':1,
'submitted':1
}
},
{$sort: {submitted: -1}},
{$group:
{_id: {
part1: "$part1",
part2: "$part2"},
lastestSubmitted: {$first: "$submitted"},
message: {$first: "$message"} }
}
).fetch();
If you are not familiar with some of the operators used above, like $cond or $first, look them up in the MongoDB aggregation documentation.
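The pair-key trick may be easier to follow outside the pipeline. Below is a minimal plain JavaScript sketch of the same idea: order the two ids so that swapped pairs produce the same key, sort newest first, then keep the first message per key. The helper names pairKey and latestPerConversation and the sample messages are hypothetical.

```javascript
// Put the larger id first so (Alex, Barb) and (Barb, Alex) give the same key,
// like the two $cond expressions in the $project stage.
function pairKey(senderId, receiverId) {
  return senderId > receiverId
    ? {part1: senderId, part2: receiverId}
    : {part1: receiverId, part2: senderId};
}

// Hypothetical helper: newest message per conversation pair.
function latestPerConversation(messages) {
  const latest = new Map();
  // Newest first, like {$sort: {submitted: -1}}; copy to avoid mutating the input.
  const sorted = [...messages].sort((a, b) => b.submitted - a.submitted);
  for (const m of sorted) {
    const key = pairKey(m.senderId, m.receiverId);
    const id = JSON.stringify([key.part1, key.part2]);
    // The first message seen per key is the newest one, like $first after the sort.
    if (!latest.has(id)) latest.set(id, m);
  }
  return [...latest.values()];
}

// Made-up sample data.
const messages = [
  {senderId: "Alex", receiverId: "Barb", submitted: 1, message: "hi Barb"},
  {senderId: "Barb", receiverId: "Alex", submitted: 5, message: "hi Alex"},
  {senderId: "Alex", receiverId: "Chuck", submitted: 3, message: "hi Chuck"}
];

const newest = latestPerConversation(messages);
console.log(newest.map(m => m.message));
```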