MongoDB distinct: return only matched array elements

I am trying to fetch distinct tags from an array for an auto-complete module. The collection format is:
{
tags:["apple","mango","apple-pie"]
},
{
tags: ["man","lemon","lemon-lite"]
}
Now, I am interested in getting distinct tags, with prefix q.
The query that I triggered is:
db.portfolio.distinct("tags",{"tags":/app/});
However, this query returned the entire array:
["apple","mango","apple-pie"].
My requirement is: ["apple", "apple-pie"].
How can I modify my query to get desired result?

You can do this with aggregation.
You $unwind the tags array.
You $match those tags you are looking for according to the regular expression given.
You $group the tags into a set using $addToSet.
The code looks something like this:
> db.portfolio.aggregate([
{ "$unwind": "$tags" },
{ "$match": { "tags": /app/ }},
{ "$group":
{
"_id": null,
"tags": { "$addToSet": "$tags" }
}
}
]);
{ "_id" : null, "tags" : [ "apple-pie", "apple" ] }

This is not possible with distinct alone, because the query returns whole documents that contain a matching tag (/app/), and those documents contain non-matching tags as well. distinct then builds the distinct set from all of those tags.
So you will have to filter the returned array again (using the regex /app/).
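The filtering step described above can be sketched in plain JavaScript. This is only an in-memory illustration, assuming the two sample documents from the question; the variable names (portfolio, matchedDocs, distinctTags, filteredTags) are made up for the example and nothing here talks to a real server.

```javascript
// In-memory stand-in for the two documents from the question.
const portfolio = [
  { tags: ["apple", "mango", "apple-pie"] },
  { tags: ["man", "lemon", "lemon-lite"] },
];
const pattern = /app/;

// Roughly what distinct("tags", { tags: /app/ }) does: keep whole documents
// with at least one matching tag, then collect every distinct tag from them.
const matchedDocs = portfolio.filter((doc) => doc.tags.some((t) => pattern.test(t)));
const distinctTags = [...new Set(matchedDocs.flatMap((doc) => doc.tags))];

// The extra client-side step: filter the distinct values with the same regex.
const filteredTags = distinctTags.filter((t) => pattern.test(t));

console.log(distinctTags); // [ 'apple', 'mango', 'apple-pie' ]
console.log(filteredTags); // [ 'apple', 'apple-pie' ]
```

In the shell, distinct returns a plain array, so the same .filter() call should be chainable directly onto db.portfolio.distinct("tags", {"tags": /app/}).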

Related

Get count of a value of a subdocument inside an array with mongoose

I have Collection of documents with id and contact. Contact is an array which contains subdocuments.
I am trying to get the count of contact where isActive = Y. Also need to query the collection based on the id. The entire query can be something like
Select Count(contact.isActive=Y) where _id = '601ad0227b25254647823713'
I am using mongo and mongoose for the first time. Please edit the question if I was not able to explain it properly.
You can use an aggregation pipeline like this:
First, $match to get only the documents with the desired _id.
Then $unwind to get a separate document for each element of the array.
$match again to keep only the elements whose isActive value is Y.
Finally, $group, adding one for each remaining document (i.e. counting the elements with isActive = Y). The count is stored in the field total.
db.collection.aggregate([
{
"$match": {"id": 1}
},
{
"$unwind": "$contact"
},
{
"$match": {"contact.isActive": "Y"}
},
{
"$group": {
"_id": "$id",
"total": {"$sum": 1}
}
}
])
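To make the four stages concrete, here is a rough in-memory equivalent in plain JavaScript. The sample documents and the helper name countActiveContacts are hypothetical; only the field names id, contact and isActive come from the thread.

```javascript
// Hypothetical sample data; only the field names are taken from the question.
const docs = [
  { id: 1, contact: [{ isActive: "Y" }, { isActive: "N" }, { isActive: "Y" }] },
  { id: 2, contact: [{ isActive: "Y" }] },
];

function countActiveContacts(collection, id) {
  return collection
    .filter((doc) => doc.id === id) // first $match: pick the document
    .flatMap((doc) => doc.contact.map((c) => ({ id: doc.id, contact: c }))) // $unwind
    .filter((row) => row.contact.isActive === "Y") // second $match
    .reduce((total) => total + 1, 0); // $group with { $sum: 1 }
}

const total = countActiveContacts(docs, 1);
console.log(total); // 2
```

One difference worth keeping in mind: when nothing matches, this sketch returns 0, whereas the aggregation pipeline returns no document at all.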

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and stuck at one place. I am trying to fetch data from a nested array, but it is taking a long time because there are around 50K records, and the results are not accurate either. Below is the schema structure:
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter by date and Details.Cars, like this:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It also returns the details of other persons who do not have a BMW. I only want to display the details of persons like Shawn, who have a BMW (or some other specific array value) and match the date. Nicole and the rest should not appear, but that is not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
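For readers who want to see what the $match / $filter / $match combination does to the data shape, here is a small in-memory sketch in plain JavaScript. The second document and the helper name filterByCar are hypothetical additions for illustration; the first document is a trimmed version of the one in the question.

```javascript
const news = [
  {
    City: "Los Angeles",
    date: "2021-01-16",
    Details: [
      { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
      { Name: "Nicole", Cars: ["Opel"] },
    ],
  },
  // Hypothetical document: date matches, but nobody has a BMW.
  { City: "Hypothetical City", date: "2021-01-16", Details: [{ Name: "Alex", Cars: ["Ford"] }] },
];

function filterByCar(docs, date, car) {
  return docs
    .filter((d) => d.date === date) // first $match
    .map((d) => ({
      ...d,
      // $addFields + $filter: overwrite Details with only the matching entries
      Details: d.Details.filter((person) => person.Cars.includes(car)),
    }))
    .filter((d) => d.Details.length > 0); // final $match on $size > 0
}

const result = filterByCar(news, "2021-01-16", "BMW");
console.log(JSON.stringify(result, null, 2));
```

The document keeps its original shape (City, date, Details), which is the point made about not needing to reassemble anything after $unwind.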
It is a common mistake to think that find({"nested.array": value}) will return only the nested object; in fact, the query returns the whole document that contains a nested object with the desired value.
The query returns the whole document where the value BMW exists in the array Details.Cars, so Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation with $unwind to split the array into separate documents, then match on the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-16" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches on the criteria to avoid running $unwind over the whole collection.
Then it uses $unwind to get one document per array element and $match again to keep only the documents you want.
To get only one element (for example, if you match by _id and it is unique) you can use $elemMatch in this way:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
You can use $elemMatch in either the query or the projection.
Using $elemMatch in the query looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
The result is the same. In the second case you are using the positional operator to return, as the docs say:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose whichever way you prefer.
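The practical difference between the $unwind approach and the $elemMatch / positional approaches is "all matching elements" versus "first matching element". A minimal in-memory sketch in plain JavaScript, assuming a hypothetical third person (Dana) who also has a BMW:

```javascript
const doc = {
  date: "2021-01-16",
  Details: [
    { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
    { Name: "Nicole", Cars: ["Opel"] },
    { Name: "Dana", Cars: ["BMW"] }, // hypothetical, not in the question
  ],
};

const hasBmw = (person) => person.Cars.includes("BMW");

// Like $unwind followed by $match: every matching element is kept.
const allMatches = doc.Details.filter(hasBmw);

// Like the $elemMatch projection or the positional "Details.$": first match only.
const firstMatch = doc.Details.find(hasBmw);

console.log(allMatches.map((p) => p.Name)); // [ 'Shawn', 'Dana' ]
console.log(firstMatch.Name); // Shawn
```

So if several people in one document can own a BMW and all of them are needed, the aggregation route is the one that returns them all.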

Pymongo - Query MongoDB for the first array element, given a list of values

Given collection:
{
"_id" : "1.1000038",
"recomendation" : [
"1.6739718"
]
}
/* 2 */
{
"_id" : "1.1000069",
"recomendation" : [
"1.9185509",
"1.9051998",
"1.9034279",
"1.8288046",
"1.8152670",
"1.858775",
"1.6224229",
"1.4591674",
"1.3862464",
"1.3427739",
"1.3080062",
"1.3003608",
"1.1694619",
"1.1634683",
"1.1590664",
"1.1524146",
"1.754599",
"1.700837",
"1.763617"
]
}
I need to query MongoDB with a list of _id values and get the first element of each recomendation array.
Here is the query in mongo syntax:
db.getCollection('similar_articles').find({"_id":{$in:["1.1000069","1.1000038"]}})
I don't want to filter on the Python side because the arrays can be too big.
I didn't find any documentation on this.
Desired output:
Pandas DataFrame
_id recom
1.1000038 1.6739718
1.1000069 1.9185509
I don't know pymongo very well, but you need this query:
First, $match by the _ids in the array (this is like the find you have).
Then use $project to create the field recom (you can use "recomendation" to overwrite the existing field) and set its value to the first element of the array.
db.collection.aggregate([
{
"$match": { "_id": { "$in": [ "1.1000069", "1.1000038" ] } }
},
{
"$project": { "recom": { "$arrayElemAt": [ "$recomendation", 0 ] } }
}
])
Looking at the documentation, it seems you only need to copy and paste this query.
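As a sanity check on what the $match / $project pair produces, here is an in-memory sketch in plain JavaScript. The second array is shortened for readability, and the helper name firstRecommendations is made up for this example.

```javascript
const similarArticles = [
  { _id: "1.1000038", recomendation: ["1.6739718"] },
  // Shortened version of the second document from the question.
  { _id: "1.1000069", recomendation: ["1.9185509", "1.9051998", "1.9034279"] },
];

function firstRecommendations(collection, ids) {
  const wanted = new Set(ids);
  return collection
    .filter((doc) => wanted.has(doc._id)) // $match with $in
    .map((doc) => ({ _id: doc._id, recom: doc.recomendation[0] })); // $arrayElemAt index 0
}

const rows = firstRecommendations(similarArticles, ["1.1000069", "1.1000038"]);
console.log(rows);
// [ { _id: '1.1000038', recom: '1.6739718' },
//   { _id: '1.1000069', recom: '1.9185509' } ]
```

Each row has exactly the two columns wanted in the DataFrame, so only one value per document would have to cross the wire.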

Converting some fields in Mongo from String to Array

I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.
I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.
How can I get every document that looks like the first example into the format for the second example?
{
"_id" : ObjectId("12345"),
"tags" : "red blue green white"
}
{
"_id" : ObjectId("54321"),
"tags" : [
"red",
"orange",
"black"
]
}
We can't use the $type operator to filter our documents here because the type of the elements in our array is "string" and as mentioned in the documentation:
When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.
But fortunately MongoDB also provides the $exists operator which can be used here with a numeric array index.
Now how can we update those documents?
Well, in MongoDB 3.2 and earlier, the only option we have is mapReduce(), but first let's look at the alternative in the upcoming release of MongoDB.
Starting from MongoDB 3.4, we can $project our documents and use the $split operator to split our string into an array of substrings.
Note that to split only those "tags" values that are strings, we need the logical $cond operator. The condition here is $eq, which evaluates to true when the $type of the field is equal to "string". By the way, the $type aggregation operator is new in 3.4.
Finally, we can overwrite the old collection using the $out pipeline stage operator. But we need to explicitly specify the inclusion of any other fields in the $project stage.
db.collection.aggregate(
[
{ "$project": {
"tags": {
"$cond": [
{ "$eq": [
{ "$type": "$tags" },
"string"
]},
{ "$split": [ "$tags", " " ] },
"$tags"
]
}
}},
{ "$out": "collection" }
]
)
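The $cond / $type / $split logic above amounts to "split if it is a string, otherwise leave it alone". A small in-memory sketch in plain JavaScript, using the two example documents with their ObjectIds replaced by plain strings (an assumption made so the snippet runs without a driver); normalizeTags is a hypothetical helper name:

```javascript
const docs = [
  { _id: "12345", tags: "red blue green white" },
  { _id: "54321", tags: ["red", "orange", "black"] },
];

// $cond: if $type is "string" then $split on a space, else keep the value as is.
const normalizeTags = (tags) => (typeof tags === "string" ? tags.split(" ") : tags);

const converted = docs.map((doc) => ({ ...doc, tags: normalizeTags(doc.tags) }));

console.log(converted[0].tags); // [ 'red', 'blue', 'green', 'white' ]
console.log(converted[1].tags); // [ 'red', 'orange', 'black' ]
```

Note that the object spread here copies every other field automatically, while the $project stage drops any field that is not explicitly included.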
With mapReduce, we need to use String.prototype.split() to emit the array of substrings in our map function. We also need to filter our documents using the "query" option. From there we will need to iterate the "results" array and $set the new value for "tags" with bulk operations, using the bulkWrite() method new in 3.2 or the now deprecated Bulk() API if we are on 2.6 or 3.0.
db.collection.mapReduce(
function() { emit(this._id, this.tags.split(" ")); },
function(key, value) {},
{
"out": { "inline": 1 },
"query": {
"tags.0": { "$exists": false },
"tags": { "$type": 2 }
}
}
)['results']
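The follow-up step (iterating "results" and building the bulk $set operations) is not shown above. A possible sketch in plain JavaScript, assuming the inline results have the usual { _id, value } shape; the sample entry is hypothetical, and the array is only built here, not sent to a server:

```javascript
// Hypothetical inline mapReduce output for the first example document.
const results = [{ _id: "12345", value: ["red", "blue", "green", "white"] }];

// One updateOne operation per result, setting "tags" to the emitted array.
const operations = results.map((result) => ({
  updateOne: {
    filter: { _id: result._id },
    update: { $set: { tags: result.value } },
  },
}));

console.log(JSON.stringify(operations, null, 2));
// In the shell, this array could then be passed to db.collection.bulkWrite(operations).
```

For a large collection it would probably be wise to send the operations in batches rather than as a single array.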

MongoDB: Sort by field existing and then alphabetically

In my database I have a field of name. In some records it is an empty string, in others it has a name in it.
In my query, I'm currently doing:
db.users.find({}).sort({'name': 1})
However, this returns results with an empty name field first, then alphabetically returns results. As expected, doing .sort({'name': -1}) returns results with a name and then results with an empty string, but it's in reverse-alphabetical order.
Is there an elegant way to achieve this type of sorting?
How about:
db.users.find({ "name": { "$exists": true } }).sort({'name': 1})
Because, after all, when a field you want to sort on is not actually present, the returned value is null and therefore "lower" in the order than any actual value. So it makes sense to exclude those results if you really are only looking for documents with a matching value.
If you really want all the results in there, regardless of null content, then I suggest you "weight" them via .aggregate():
db.users.aggregate([
{ "$project": {
"name": 1,
"score": {
"$cond": [
{ "$ifNull": [ "$name", false ] },
1,
10
]
}
}},
{ "$sort": { "score": 1, "name": 1 } }
])
And that moves all null results to the "end of the chain" by assigning a value as such.
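The weighting idea can be tried out in memory with plain JavaScript. One assumption to flag loudly: this sketch also gives empty strings the "last" weight, which goes beyond the $ifNull check above (that check only catches missing or null values), because the question is about empty names. The sample users are hypothetical.

```javascript
const users = [{ name: "" }, { name: "Bob" }, {}, { name: "Alice" }];

// Weight 1 for a real name, 10 for missing, null, or empty (assumption: "" sorts last too).
const score = (user) => (user.name ? 1 : 10);

const sorted = [...users].sort(
  (a, b) => score(a) - score(b) || (a.name ?? "").localeCompare(b.name ?? "")
);

console.log(sorted.map((u) => u.name));
```

Named users come first in alphabetical order, followed by the unnamed ones. localeCompare may order mixed-case names differently from MongoDB's default string comparison, so treat the tie-break as illustrative only.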
If you want to filter out documents with an empty "name" field, change your query: db.users.find({"name": {"$ne": ""}}).sort({"name": 1})