MongoDB query on uniqueness of a field

I am doing a prefix search:
query = ({$or:[{country_lc:/^unit.*/, docType:'country'}, {region_lc:/^unit.*/, docType:'region'}, {city_lc:/^unit.*/, docType:'city'}]})
FIELDS = [
country : 1
, region : 1
, city : 1
, docType : 1
]
SORT_RULE = [
country_lc : 1
, region_lc : 1
, city_lc : 1
]
def locCursor = db.location.find(query, FIELDS).sort(SORT_RULE)
But this gives me duplicate results as well, so I want distinct results. The distinction should be on the field
called 'name' in the location collection, meaning that for each unique name it should give me one result.

In MongoDB shell you can do:
db.location.aggregate([
{$match: {$or:[
{country_lc:/^unit.*/, docType:'country'},
{region_lc:/^unit.*/, docType:'region'},
{city_lc:/^unit.*/, docType:'city'}
]}},
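{$group: {
_id: '$name',
// every non-_id field in $group needs an accumulator; $first keeps one value per name
country: {$first: '$country'},
region: {$first: '$region'},
city: {$first: '$city'},
docType: {$first: '$docType'},
// carry the lowercase fields through so the $sort below can still see them
country_lc: {$first: '$country_lc'},
region_lc: {$first: '$region_lc'},
city_lc: {$first: '$city_lc'}
}},
{$sort: {country_lc: 1, region_lc: 1, city_lc: 1}}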
])
To get your results.
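To make the "one result per unique name" idea concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: the sample rows and the helper name firstPerName are made up for illustration, and it only mimics what grouping on name and keeping the first document per group would do.

```javascript
// Hypothetical sample rows; the real contents of the location collection are not shown in the question.
const sampleRows = [
  { name: 'United States', country: 'United States', docType: 'country' },
  { name: 'United States', country: 'United States', docType: 'country' },
  { name: 'United Kingdom', country: 'United Kingdom', docType: 'country' },
];

// Keep the first row seen for each distinct name (hypothetical helper).
function firstPerName(docs) {
  const byName = new Map();
  for (const doc of docs) {
    if (!byName.has(doc.name)) {
      byName.set(doc.name, doc);
    }
  }
  return [...byName.values()];
}

const uniqueRows = firstPerName(sampleRows);
console.log(uniqueRows.map((d) => d.name));
```

Which duplicate survives depends only on input order here, so sort the input first if a particular one should win.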

Related

MongoDB find closest match

I'm wondering if it is possible to access a document in MongoDB via closest match.
e.g. my search query always contains:
name
country
city
Following rules are in place:
1. name always has to match
2. if either country or city is present, country has a higher priority
3. if country or city does not match, only consider the document if that field has its default value (e.g. for String: "")
example Query:
name = "Test"
country = "USA"
city = "Seattle"
Documents:
db.stuff.insert([
{
name:"Test",
country:"",
city:"Seattle"
},{
name:"Test3",
country:"USA",
city:"Seattle"
},{
name:"Test",
country:"USA",
city:""
},{
name:"Test",
country:"Germany",
city:"Seattle"
},{
name:"Test",
country:"USA",
city:"Washington"
}
])
It should return the 3rd document
thanks!
Considering uncertain requirements and contradicting updates, the answer is rather a guideline addressing the "Is it possible at all" part.
The example should be adjusted to meet expectation.
db.stuff.aggregate([
{$match: {name: "Test"}}, // <== the fields that should always match
{$facet: {
matchedBoth: [
{$match: {country: "USA", city: "Seattle"}}, // <== bull's-eye
{$addFields: {weight: 10}} // <== 10 stones
],
matchedCity: [
{$match: {country: "", city: "Seattle"}}, // <== the $match may need to be improved, see below
{$addFields: {weight: 5}}
],
matchedCountry: [
{$match: {country: "USA", city: ""}},
{$addFields: {weight: 0}} // <== weightless, yet still a match
]
// add more rules here, if needed
}},
// get them together. Should list all rules from above
{$project: {doc: {$concatArrays: ["$matchedBoth", "$matchedCity", "$matchedCountry"]}}},
{$unwind: "$doc"}, // <== split them apart
{$sort: {"doc.weight": -1}}, // <== and order by weight, desc
// reshape to retrieve the documents in their original format
{$project: {_id: "$doc._id", name: "$doc.name", country: "$doc.country", city: "$doc.city"}}
]);
The least explained part of the question affects how we build up the facets, e.g.
{$match: {country: "", city: "Seattle"}}
matches all documents where country is explicitly present and is an empty string.
It very well might be
{$match: {country: {$ne: "USA"}, city: "Seattle"}}
to get all documents with matching name and city and any country/no country, or even
{$match: {$and: [{$or: [{country: null}, {country: ""}]}, {city: "Seattle"}]}}
etc.
Here is a query
db.collection.aggregate([
{$match: {name:"Test"}},
{$project: {
name:"$name",
country: "$country",
city:"$city",
countryMatch: {$cond: [{$eq:["$country", "USA"]}, true, false]},
cityMatch: {$cond:[{$eq:["$city", "Seattle"]}, true, false]}
}},
{$match: {$and: [
{$or:[{countryMatch:true},{country:""}]},
{$or:[{cityMatch:true},{city:""}]}
]}},
{$sort: {countryMatch:-1, cityMatch:-1}},
{$project: {name:"$name", country:"$country", city:"$city"}}
])
Explanation:
The first $match filters out documents whose name does not match (rule #1: name always has to match).
The next projection keeps the document fields and adds flags recording whether country and city match. We need these flags to filter and sort the documents.
The second $match filters out documents where country or city neither matches nor has the default value (rule #3).
The sort puts country matches before city matches, as rule #2 states. The final projection selects the required fields.
Output:
{
_id: 3,
name : "Test",
country : "USA",
city : ""
},
{
_id: 1,
name : "Test",
country : "",
city : "Seattle"
}
You can limit query results to get only closest match.
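In the pipeline itself, that would mean adding a {$limit: 1} stage after the $sort. As a sanity check of the ranking logic, here is a rough plain JavaScript sketch that applies the same filter-and-sort rules to the question's documents in memory. The function name closestMatch is hypothetical, and this is an illustration of the rules, not a replacement for the aggregation.

```javascript
// Documents from the question.
const stuff = [
  { name: 'Test', country: '', city: 'Seattle' },
  { name: 'Test3', country: 'USA', city: 'Seattle' },
  { name: 'Test', country: 'USA', city: '' },
  { name: 'Test', country: 'Germany', city: 'Seattle' },
  { name: 'Test', country: 'USA', city: 'Washington' },
];

// Hypothetical helper mirroring the pipeline: filter, flag, sort, take the first.
function closestMatch(all, query) {
  const candidates = all
    .filter((d) => d.name === query.name) // rule 1: name must match
    .filter((d) => d.country === query.country || d.country === '') // rule 3
    .filter((d) => d.city === query.city || d.city === '') // rule 3
    .map((d) => ({
      doc: d,
      countryMatch: d.country === query.country,
      cityMatch: d.city === query.city,
    }));
  // rule 2: country matches sort before city matches
  candidates.sort(
    (a, b) => (b.countryMatch - a.countryMatch) || (b.cityMatch - a.cityMatch)
  );
  return candidates.length > 0 ? candidates[0].doc : null;
}

const best = closestMatch(stuff, { name: 'Test', country: 'USA', city: 'Seattle' });
console.log(best);
```

With the sample data this picks the 3rd document from the question (country "USA", empty city).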

Mongodb nested find

My database schema is somewhat like
{
"_id" : ObjectId("1"),
"createdAt" : ISODate("2017-03-10T00:00:00.000+0000"),
"user_list" : [
{
"id" : "a",
"some_flag" : 1,
},
{
"id" : "b",
"some_flag" : 0,
}
]
}
What I want to do is get the document where id is b & some_flag for the user b is 0.
My query is
db.collection.find({
'createdAt': {
$gte: new Date()
},
'user_list.id': 'b',
'user_list.some_flag': 1
}).sort({
createdAt: 1
})
When I run the query in the shell, it returns the doc with _id 1 (which it shouldn't, as the value of some_flag for b is 0).
What is happening here is:
the condition 'user_list.id': 'b' matches the nested object where "id" is "b"
the condition 'user_list.some_flag': 1 matches some_flag of the nested object where "id" is "a" (as the value of some_flag is 1 there)
What modifications should I make to compare id and some_flag against the same nested object?
P.S. the amount of data is quite large & using aggregate will be a performance bottleneck
You should be using $elemMatch. Otherwise MongoDB applies each condition to the array elements independently, so in your case 'user_list.some_flag': 1 is matched by the array item with id a and 'user_list.id': 'b' is matched by the array item with id b. So if you want to query an array field with AND logic on a single element, use $elemMatch as follows:
db.collection.find({
'createdAt': {
$gte: new Date()
},
user_list: {$elemMatch: {id: 'b', some_flag: 1}} // true only if a single array element fulfills both conditions
}).sort({
createdAt: 1
})
You need to try something like:
db.collection.find({
'createdAt': {
$gte: new Date()
},
user_list: {
$elemMatch: {
id: 'b',
some_flag: 1
}
}
}).sort({
createdAt: 1
});
This will match only documents that have a user_list entry where id is 'b' and some_flag is 1.
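The difference between the two query styles can be illustrated without a database. The sketch below is plain JavaScript that imitates the matching semantics on the question's document; the two function names are made up for this example.

```javascript
// The document from the question (_id and createdAt omitted for brevity).
const sampleDoc = {
  user_list: [
    { id: 'a', some_flag: 1 },
    { id: 'b', some_flag: 0 },
  ],
};

// Dot-notation style: each condition may be satisfied by a different element.
function matchesIndependently(doc, id, flag) {
  return (
    doc.user_list.some((u) => u.id === id) &&
    doc.user_list.some((u) => u.some_flag === flag)
  );
}

// $elemMatch style: one and the same element must satisfy both conditions.
function matchesSameElement(doc, id, flag) {
  return doc.user_list.some((u) => u.id === id && u.some_flag === flag);
}

console.log(matchesIndependently(sampleDoc, 'b', 1)); // true
console.log(matchesSameElement(sampleDoc, 'b', 1)); // false
```

The first check passes because element a supplies some_flag 1 and element b supplies the id, which is exactly the unwanted match described in the question.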

Most frequent word in MongoDB collection

I have a MongoDB collection where each entry has a product field containing a string array. What I would like to do is find the most frequent word in the whole collection. Any ideas on how to do that?
Here is a sample object:
{
"_id" : ObjectId("55e02d333b88f425f84191af"),
"product" : [
" x bla y "
],
"hash_key" : "ecfe355b2f45dfbaf361cff4d314d4cc",
"price" : [
"z"
],
"image" : "image_url"
}
Looking at the sample object, what I would like to do is count "x", "bla" and "y" singularly.
I recently had to do something similar. I had a collection of objects and each object had a list of keywords. To count the frequency of each keyword, I used the following aggregation pipeline, which uses the MongoDB version 4.4 $accumulator group operation.
db.collectionname.aggregate(
{$match: {available: true}}, // Some criteria to filter the documents
{$project:
{ _id: 0, keywords: 1}}, // Only keep keywords
{$group:
{_id: null, keywords: // Accumulate keywords into one array
{$accumulator: {
init: function(){return new Array()},
accumulate: function(state, value){return state.concat(value)},
accumulateArgs: ["$keywords"],
merge: function(state1, state2){return state1.concat(state2)},
lang: "js"}}}},
{$unwind: "$keywords"}, // Emit one document per keyword in the array
{$group: {_id: "$keywords", freq: {$sum: 1}}}, // Group keywords and count frequencies
{$sort: {freq: -1}}, // Sort in reverse order
{$limit: 5} // Take first five
)
I have no idea if this is the most efficient solution. However, it solved the problem for me.
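Note that the sample in the question stores several words inside one string (" x bla y "), so the strings would still need to be split into words before counting. Here is a small plain JavaScript sketch of that split-and-count step, run in memory. The sample documents and the name wordFrequencies are assumptions for illustration, not part of the pipeline above.

```javascript
// Hypothetical documents shaped like the sample in the question.
const productDocs = [
  { product: [' x bla y '] },
  { product: ['bla z'] },
];

// Hypothetical helper: count every whitespace-separated word across all product strings.
function wordFrequencies(docs) {
  const freq = new Map();
  for (const doc of docs) {
    for (const text of doc.product) {
      // Split on runs of whitespace and drop the empty strings left by padding.
      for (const word of text.split(/\s+/).filter(Boolean)) {
        freq.set(word, (freq.get(word) || 0) + 1);
      }
    }
  }
  // Most frequent first.
  return [...freq.entries()].sort((a, b) => b[1] - a[1]);
}

const ranked = wordFrequencies(productDocs);
console.log(ranked[0]); // [ 'bla', 2 ]
```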

MongoDB command counting how many times a value in field A appears with distinct value in field B(or MySQL command acceptable)

I have a collection in MongoDB, and suppose the data is:
_id | doc_name | serial_number
1 | a | abc
2 | b | abc
3 | a | abc
4 | a | def
5 | c | def
6 | b | abc
I want to count how many times "serial_number" appears with distinct "doc_name". In this case, it should be:
abc|2
def|2
and I want to save the results into another collection.
How can I do this?
MySQL command is also acceptable, since I can use the MySQL --> MongoDB translator..
Thanks a lot!
It's easy with our lil' buddy Mongo: use an aggregation with $sum inside the $group stage.
It appears that your sample output is incorrect. It should be either
abc|4
def|2
or
a|abc|2
b|abc|2
a|def|1
c|def|1
The first is
db.collection.aggregate(
[{$group: {_id: "$serial_number", count: {$sum: 1}}}]
)
The second is
db.collection.aggregate(
[{$group: {_id: {serial_number: "$serial_number", doc_name: "$doc_name"}, count: {$sum: 1}}}]
)
And then I'll defer to #nickmilon for the $out operator (which I've never used), added as the last stage of the pipeline:
{ $out : 'output_collection' }
What about this?
db.your_collection.aggregate(
[
{
$group: {_id: {doc_name:'$doc_name', serial_number:'$serial_number'},
'count': {'$sum': 1}}
},
{
$group: {_id: '$_id.serial_number',
'count': {'$sum': 1}}
},
{ $out : 'output_collection' }
]
)
Comment out the { $out : 'output_collection' } stage to see the results:
{ "_id" : "abc", "count" : 2 }
{ "_id" : "def", "count" : 2 }
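To see why two grouping passes give the distinct count, here is an in-memory sketch in plain JavaScript over the table from the question. It collects the distinct doc_name values per serial_number and then counts them; the function name countDistinctDocNames is made up for this example.

```javascript
// Rows from the question's table.
const tableRows = [
  { _id: 1, doc_name: 'a', serial_number: 'abc' },
  { _id: 2, doc_name: 'b', serial_number: 'abc' },
  { _id: 3, doc_name: 'a', serial_number: 'abc' },
  { _id: 4, doc_name: 'a', serial_number: 'def' },
  { _id: 5, doc_name: 'c', serial_number: 'def' },
  { _id: 6, doc_name: 'b', serial_number: 'abc' },
];

// Hypothetical helper: number of distinct doc_name values per serial_number.
function countDistinctDocNames(records) {
  const namesBySerial = new Map();
  for (const r of records) {
    if (!namesBySerial.has(r.serial_number)) {
      namesBySerial.set(r.serial_number, new Set());
    }
    // A Set ignores repeated (serial_number, doc_name) pairs, like the first $group.
    namesBySerial.get(r.serial_number).add(r.doc_name);
  }
  // Counting the set sizes corresponds to the second $group.
  return Object.fromEntries(
    [...namesBySerial].map(([serial, names]) => [serial, names.size])
  );
}

const distinctCounts = countDistinctDocNames(tableRows);
console.log(distinctCounts); // { abc: 2, def: 2 }
```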

Query for field in subdocument

An example of the schema I have:
{ "_id" : 1234,
"dealershipName": "Eric's Mongo Cars",
"cars": [
{"year": 2013,
"make": "10gen",
"model": "MongoCar",
"vin": 3928056,
"mechanicNotes": "Runs great!"},
{"year": 1985,
"make": "DeLorean",
"model": "DMC-12",
"vin": 8056309,
"mechanicNotes": "Great Scott!"}
]
}
I wish to query and return only the value "vin" in "_id : 1234". Any suggestion is much appreciated.
You can use the field selection parameter with dot notation to constrain the output to just the desired field:
db.test.find({_id: 1234}, {_id: 0, 'cars.vin': 1})
Output:
{
"cars" : [
{
"vin" : 3928056
},
{
"vin" : 8056309
}
]
}
Or if you just want an array of vin values you can use aggregate:
db.test.aggregate([
// Find the matching doc
{$match: {_id: 1234}},
// Duplicate it, once per cars element
{$unwind: '$cars'},
// Group it back together, adding each cars.vin value as an element of a vin array
{$group: {_id: '$_id', vin: {$push: '$cars.vin'}}},
// Only include the vin field in the output
{$project: {_id: 0, vin: 1}}
])
Output:
{
"result" : [
{
"vin" : [
3928056,
8056309
]
}
],
"ok" : 1
}
If by query you mean getting the vin values in JavaScript, you could read the JSON into a string called theString (or any other name) and do something like:
var f = [], obj = JSON.parse(theString);
obj.cars.forEach(function(item) { f.push(item.vin) });
If your JSON above is part of a larger collection, then you'd need an outer loop.
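A rough sketch of that outer loop, assuming the string holds an array of dealership documents. The second document and the variable names allDealersJson and vinsById are invented for illustration.

```javascript
// Hypothetical JSON text holding an array of dealership documents.
const allDealersJson = JSON.stringify([
  { _id: 1234, cars: [{ vin: 3928056 }, { vin: 8056309 }] },
  { _id: 5678, cars: [{ vin: 1111111 }] },
]);

const dealerships = JSON.parse(allDealersJson);
const vinsById = {};

// Outer loop over the documents, inner loop over each document's cars.
dealerships.forEach(function (dealer) {
  vinsById[dealer._id] = [];
  dealer.cars.forEach(function (item) {
    vinsById[dealer._id].push(item.vin);
  });
});

console.log(vinsById);
```

Keying the result by _id keeps the vins of different dealerships apart; push everything into a single array instead if one flat list is enough.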