Cast String as Numeric in Find and Sort Operations - mongodb

I have a mongo collection called items. I want to find the 10 highest-priced items among the active ones. My problem is that the price is stored as a string. So my question is: how can I cast price to a numeric type and then sort the active items by price in descending order?
My current attempt gives me the highest price in lexicographic (string) order, i.e. 999, but I have items that are far pricier.
db.getCollection('items').find({"status": "active"})
.sort({"packet.price":-1})
.limit(10)
I tried:
sort({{$toInt:"packet.price"}:-1}),
sort({NumberInt("packet.price"):-1})
but no luck.

This is not possible with a plain find() and sort(), but you can use the aggregation framework:
$match to match your query condition
$addFields to convert the packet.price field to an integer using the $toInt operator
$sort by price in descending order
$limit the result to 10 documents
db.getCollection('items').aggregate([
{ "$match": { "status": "active" } },
// Overwrite packet.price with its integer value ($toInt requires MongoDB 4.0+)
{ "$addFields": { "packet.price": { "$toInt": "$packet.price" } } },
{ "$sort": { "packet.price": -1 } },
{ "$limit": 10 }
])
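To see why the string sort puts 999 first and what the numeric conversion changes, here is a minimal plain-JavaScript sketch that mimics the two orderings outside MongoDB. The sample documents are made up for illustration, and parseInt only stands in for $toInt; this is not how the server evaluates the pipeline.

```javascript
// Hypothetical sample documents; prices are stored as strings, as in the question.
const items = [
  { status: "active", packet: { price: "999" } },
  { status: "active", packet: { price: "1500" } },
  { status: "inactive", packet: { price: "20000" } },
  { status: "active", packet: { price: "75" } },
];

// Equivalent of the $match stage: keep only active items.
const active = items.filter((item) => item.status === "active");

// Descending sort on the raw strings compares character by character.
const byString = [...active].sort((a, b) => {
  if (a.packet.price < b.packet.price) return 1;
  if (a.packet.price > b.packet.price) return -1;
  return 0;
});

// Convert first (stand-in for $toInt), then sort numerically and take the top 10.
const byNumber = active
  .map((item) => ({
    ...item,
    packet: { ...item.packet, price: parseInt(item.packet.price, 10) },
  }))
  .sort((a, b) => b.packet.price - a.packet.price)
  .slice(0, 10);

console.log(byString.map((i) => i.packet.price)); // → ["999", "75", "1500"]
console.log(byNumber.map((i) => i.packet.price)); // → [1500, 999, 75]
```

The string ordering ranks "999" above "1500" because "9" sorts after "1", which is the behaviour described in the question.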

Related

Use MongoDB $sort twice

Is there a way to use the $sort operator twice within a single aggregation pipeline?
I know that using a singular $sort with two keys works properly, i.e. sort by the first key, then the second.
My current project requires multiple $sort stages to exist, for example
db.collection.aggregate([
{
$sort: {
"age": 1
}
},
{
$sort: {
"score": -1
}
}
])
Currently, the second stage doesn't respect the result of the first stage. Is there any workaround for that?
Is it possible to, for example, assign each document a new field 'index' after the first stage, storing its index within the current array of results, and use that field in the second $sort stage?
You can specify multiple keys in a single $sort stage:
db.collection.aggregate([
{
"$sort": {
"age": 1,
"score": -1
}
}
])
I have set up a Mongo Playground link that you can refer to:
https://mongoplayground.net/p/ZaRX_XNSXhu
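As a rough illustration of what a two-key sort means, the following plain-JavaScript sketch orders a few made-up documents by age ascending and breaks ties by score descending. The data and the comparator are illustrative assumptions, not MongoDB internals.

```javascript
// Hypothetical documents with the two fields used in the question.
const docs = [
  { name: "a", age: 30, score: 5 },
  { name: "b", age: 25, score: 7 },
  { name: "c", age: 30, score: 9 },
  { name: "d", age: 25, score: 3 },
];

// Compare by age ascending; only when ages are equal, compare by score descending.
const sorted = [...docs].sort((x, y) => x.age - y.age || y.score - x.score);

console.log(sorted.map((d) => d.name)); // → ["b", "d", "c", "a"]
```

The second key only matters inside groups of equal age, which is why a single compound sort gives a different result from a second, independent sort on score.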

Looking for some queries in MongoDB. I have to write one query for every question

Questions are
How many products have a discount on them?
How many "Water" products don't have any discount on them?
How many products have "mega" in their name?
How many products have discount % greater than 30%?
Which brand in "Oil" section is selling the most number of products?
Data format:
{"_id":{"$oid":"62363ce631312ffd2dc724f5"},
"Title":"Fortune mega",
"Brand":"Gewti Laurent",
"Name":"lorem ipsum",
"Original_Price":590,
"Sale_price":590,
"Product_category":"Oil"}
{"_id":{"$oid":"62363ce631342ffd2dc724f5"},
"Title":"katy mega",
"Brand":"Gewti Laurent",
"Name":"targi",
"Original_Price":1890,
"Sale_price":1890,
"Product_category":"Oil"}
{"_id":{"$oid":"62363ce641312ffd2dc724f5"},
"Title":"ydnsu",
"Brand":"Gewti Laurent",
"Name":"otgu",
"Original_Price":1390,
"Sale_price":1290,
"Product_category":"Water"}
{"_id":{"$oid":"62363ce431312ffd2dc724f5"},
"Title":"ykjssu",
"Brand":"Gewti Laurent",
"Name":"itru",
"Original_Price":190,
"Sale_price":170,
"Product_category":"Water"}
Here is how to do it using the aggregation pipeline:
Products having a discount:
db.collection.aggregate([
{
"$match": {
"$expr": {
// strictly less than: equal prices mean no discount
"$lt": [
"$Sale_price",
"$Original_Price"
]
}
}
}
])
"Water" products not having any discount:
db.collection.aggregate([
{
"$match": {
"Product_category": "Water",
"$expr": {
"$gte": [
"$Sale_price",
"$Original_Price"
]
}
}
}
])
Note: If you want the opposite, i.e., to find Water products that do have a discount, simply change $gte to $lt.
Products having "mega" in their name:
db.collection.find({
"Title": {
"$regex": "mega"
}
})
Note: This doesn't need aggregation; a plain find() is enough.
Products having discount % greater than 30:
db.collection.aggregate([
{
"$addFields": {
"discount_in_percent": {
"$multiply": [
{
"$divide": [
{
"$subtract": [
"$Original_Price",
"$Sale_price"
]
},
"$Original_Price"
]
},
100
],
}
}
},
{
"$match": {
"$expr": {
"$gt": [
"$discount_in_percent",
30
]
}
}
}
])
Note: This is a bit of overkill, but it also lets you see the discount percentage of each product.
Your last question, finding the best-selling product in a particular category, is vague, as pointed out by #Rohit Roa. From the given data, it could be the product with the highest discount. For that you can play around with the second query by tweaking the $expr selector and using the $project operator to get a particular product.
Since your question is more about how many than which, you can append a $count stage to a pipeline to get the count (or use $size on a grouped array), or loop over the result.
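As a sanity check on the logic of the first four questions, here is a small plain-JavaScript sketch that applies the same conditions to the four sample documents from the question and counts the matches. It runs outside MongoDB, and the helper name discountPercent is made up for illustration. Inside a pipeline, a final stage such as { "$count": "total" } would play the role of .length.

```javascript
// The four sample documents from the question (only the fields used below).
const products = [
  { Title: "Fortune mega", Original_Price: 590, Sale_price: 590, Product_category: "Oil" },
  { Title: "katy mega", Original_Price: 1890, Sale_price: 1890, Product_category: "Oil" },
  { Title: "ydnsu", Original_Price: 1390, Sale_price: 1290, Product_category: "Water" },
  { Title: "ykjssu", Original_Price: 190, Sale_price: 170, Product_category: "Water" },
];

// Same formula as the $addFields stage: (original - sale) / original * 100.
const discountPercent = (p) =>
  ((p.Original_Price - p.Sale_price) / p.Original_Price) * 100;

const counts = {
  discounted: products.filter((p) => p.Sale_price < p.Original_Price).length,
  waterWithoutDiscount: products.filter(
    (p) => p.Product_category === "Water" && p.Sale_price >= p.Original_Price
  ).length,
  megaInTitle: products.filter((p) => /mega/.test(p.Title)).length,
  over30Percent: products.filter((p) => discountPercent(p) > 30).length,
};

console.log(counts);
// → { discounted: 2, waterWithoutDiscount: 0, megaInTitle: 2, over30Percent: 0 }
```

On this sample the two Water products are discounted by roughly 7% and 11%, so nothing exceeds 30%.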
How many products have a discount on them ?
db.products.find().toArray().filter(product => product.Sale_price < product.Original_Price).length
find() returns all products in the collection, and toArray() converts the cursor into an array on which you can use other functions. filter() then looks at every product in the list and returns a new list of products whose Sale_price is lower than the Original_Price, and .length measures the length of that list.
How many "Water" products don't have any discount on them?
db.products.find({"Product_category": "Water"}).toArray().filter(product => product.Sale_price >= product.Original_Price).length
Similar to the first one, except that find() now takes a filter on the product category and filter() uses a different predicate.
How many products have "mega" in their name ?
db.products.find({"Name": {$regex: /mega/}}).toArray().length
Here we use a regular expression to match names containing "mega".
How many products have discount % greater than 30% ?
db.products.find().toArray().filter(product => (product.Original_Price-product.Sale_price) > 0.3*product.Original_Price).length
Similar to earlier, just change the filter.
Which brand in the "Oil" section is selling the most products?
More input is needed to figure this out.
These commands will work in the Mongo shell. Refer to the documentation if you need to query the database from Node or Python using similar functions.
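If the last question is read as "which brand has the most product documents in the Oil category" (an assumption, since the question does not define "selling"), the idea is to group by Brand, count, and take the largest group. A plain-JavaScript sketch of that reading follows; the second brand is hypothetical and only there to make the comparison visible.

```javascript
// Sample catalog; "Hypothetical Brand" is invented for illustration.
const catalog = [
  { Brand: "Gewti Laurent", Product_category: "Oil" },
  { Brand: "Gewti Laurent", Product_category: "Oil" },
  { Brand: "Hypothetical Brand", Product_category: "Oil" },
  { Brand: "Gewti Laurent", Product_category: "Water" },
];

// Count products per brand, restricted to the Oil category.
const perBrand = new Map();
for (const p of catalog) {
  if (p.Product_category !== "Oil") continue;
  perBrand.set(p.Brand, (perBrand.get(p.Brand) || 0) + 1);
}

// Sort the (brand, count) pairs by count descending and take the first.
const [topBrand, topCount] = [...perBrand.entries()].sort((a, b) => b[1] - a[1])[0];

console.log(topBrand, topCount); // → Gewti Laurent 2
```

In an aggregation pipeline the same reading would correspond to a $match on the category, a $group by Brand with a $sum of 1, a descending $sort on the count, and a $limit of 1.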

Trying to select single documents from mongo collection

We have a rudimentary versioning system in a collection that uses a field (pageId) as a root key. Subsequent versions of this page have the same pageId. This allows us to very easily find all versions of a single page.
How do I go about running a query that returns only the document with the latest lastModified for each distinct pageId?
In pseudo-code you could say:
For each distinct pageId
sort documents based on lastModified descending
and return only the first document
You can use an aggregation pipeline for that.
$sort - Sorts all input documents and returns them to the pipeline in sorted order.
$group - Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping.
$first - Returns the value that results from applying an expression to the first document in a group of documents that share the same group by key.
Example:
db.getCollection('t01').aggregate([
{
$sort: {'lastModified': -1}
},
{
$group: {
_id: "$pageId",
element1: { $first: "$element1" },
element2: { $first: "$element2" },
elementN: { $first: "$elementN" },
}
}
]);
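The sort-then-take-first idea can be sketched in plain JavaScript as follows. The documents, field values and ISO date strings are made up for illustration; in the real pipeline $sort and $group with $first do this work on the server.

```javascript
// Hypothetical page versions; lastModified uses ISO strings so they compare correctly as text.
const versions = [
  { pageId: "home", lastModified: "2021-03-01T10:00:00Z", title: "Home v2" },
  { pageId: "about", lastModified: "2021-01-15T09:00:00Z", title: "About v1" },
  { pageId: "home", lastModified: "2021-01-01T08:00:00Z", title: "Home v1" },
  { pageId: "about", lastModified: "2021-02-20T12:00:00Z", title: "About v2" },
];

// Step 1 ($sort): newest documents first.
const newestFirst = [...versions].sort((a, b) => {
  if (a.lastModified < b.lastModified) return 1;
  if (a.lastModified > b.lastModified) return -1;
  return 0;
});

// Step 2 ($group + $first): keep the first document seen for each pageId.
const latestByPage = new Map();
for (const doc of newestFirst) {
  if (!latestByPage.has(doc.pageId)) latestByPage.set(doc.pageId, doc);
}

console.log([...latestByPage.values()].map((d) => d.title)); // → ["Home v2", "About v2"]
```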

Count occurrences of duplicate values

How do I structure my MongooseJS/MongoDB query to get the total duplicates/occurrences of a particular field value? In other words: the total number of documents for each custID value, across all custIDs.
I can do this manually in command line:
db.tapwiser.find({"custID" : "12345"}, {}, {}).count();
Outputs: 1
db.tapwiser.find({"custID" : "6789"}, {}, {}).count();
Outputs: 4
I found this resource:
How to sum distinct values of a field in a MongoDB collection (utilizing mongoose)
But it requires that I specify the unique fields I want to sum.
In this case, I want to loop through all documents, sum the occurrences of each.
All you need to do is $group your documents by custID and use the $sum accumulator operator to return "count" for each group.
db.tapwiser.aggregate(
[
{ "$group": { "_id": "$custID", "count": { "$sum": 1 } } }
], function(err, results) {
// Do something with the results
}
)
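For intuition, the grouping above does the same thing as this plain-JavaScript tally. The sample documents are invented to match the counts quoted in the question (one document for 12345, four for 6789).

```javascript
// Hypothetical documents mirroring the counts from the question.
const docs = [
  { custID: "12345" },
  { custID: "6789" },
  { custID: "6789" },
  { custID: "6789" },
  { custID: "6789" },
];

// Equivalent of { $group: { _id: "$custID", count: { $sum: 1 } } }.
const tally = new Map();
for (const { custID } of docs) {
  tally.set(custID, (tally.get(custID) || 0) + 1);
}

// Shape the result like the aggregation output: one { _id, count } per custID.
const results = [...tally].map(([_id, count]) => ({ _id, count }));

console.log(results);
// → [ { _id: "12345", count: 1 }, { _id: "6789", count: 4 } ]
```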

Get distinct records with specified fields that match a value, paginated

I'm trying to get all documents in my MongoDB collection
by distinct customer ids (custID)
where status code == 200
paginated (skipped and limit)
return specified fields
var Order = mongoose.model('Order', orderSchema());
My original thought was to use mongoose db query, but you can't use distinct with skip and limit as Distinct is a method that returns an "array", and therefore you cannot modify something that is not a "Cursor":
Order
.distinct('request.headers.custID')
.where('response.status.code').equals(200)
.limit(limit)
.skip(skip)
.exec(function (err, orders) {
callback({
data: orders
});
});
So then I thought to use Aggregate, using $group to get distinct customerID records, $match to return all unique customerID records that have status code of 200, and $project to include the fields that I want:
Order.aggregate(
[
{
"$project" :
{
'request.headers.custID' : 1,
//other fields to include
}
},
{
"$match" :
{
"response.status.code" : 200
}
},
{
"$group": {
"_id": "$request.headers.custID"
}
},
{
"$skip": skip
},
{
"$limit": limit
}
],
function (err, order) {}
);
This returns an empty array though. If I remove project, only $request.headers.custID field is returned when in fact I need more.
Any thoughts?
The thing you need to understand about aggregation pipelines is that the word "pipeline" means each stage only receives the input emitted by the preceding stage, in order of execution. The best analogy here is the Unix pipe |, where the output of one command is "piped" to the next:
ps aux | grep mongo | tee out.txt
So aggregation pipelines work in much the same way. The other main thing to consider is that both $project and $group stages emit only those fields you ask for, and no others. This takes a little getting used to compared to declarative approaches like SQL, but with a little practice it becomes second nature.
Another thing to get used to is that stages like $match are best placed at the beginning of a pipeline, before any field selection. The primary reason is that this allows index selection and usage, which speeds things up immensely. Also, a $project followed by a $group is somewhat redundant, as both essentially select fields, so they are usually best combined where appropriate.
Hence the optimal form is:
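The empty result in the question follows directly from this. As a rough plain-JavaScript sketch, with two made-up orders and the functions project and match as stand-ins for the $project and $match stages, projecting first removes the field that the match needs:

```javascript
// Hypothetical orders with the nested fields used in the question.
const orders = [
  { request: { headers: { custID: "a1" } }, response: { status: { code: 200 } } },
  { request: { headers: { custID: "b2" } }, response: { status: { code: 500 } } },
];

// Stand-in for $project: emits only request.headers.custID, nothing else.
const project = (docs) =>
  docs.map((d) => ({ request: { headers: { custID: d.request.headers.custID } } }));

// Stand-in for $match on response.status.code = 200.
const match = (docs) =>
  docs.filter((d) => d.response !== undefined && d.response.status.code === 200);

// Projecting first drops response, so the match has nothing to test against.
const projectThenMatch = match(project(orders));
// Matching first still sees the full documents.
const matchThenProject = project(match(orders));

console.log(projectThenMatch.length); // → 0
console.log(matchThenProject.length); // → 1
```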
Order.aggregate(
[
{ "$match" : {
"response.status.code" : 200
}},
{ "$group": {
"_id": "$request.headers.custID", // the grouping key
"otherField": { "$first": "$otherField" },
// and so on for each field to select
}},
{ "$skip": skip },
{ "$limit": limit }
],
function (err, order) {}
);
The main thing to remember about $group is that all fields other than _id (which is the grouping key) require an accumulator, since there can always be multiple occurrences of values for the grouping key.
In this case we are using $first as an accumulator, which takes the first occurrence from the grouping boundary. Commonly this is used following a $sort, but it does not need to be, as long as you understand what gets selected.
Other accumulators like $max simply take the largest value of the field within the group, and are therefore independent of the "current record/document", unlike $first or $last. So it all depends on your needs.
Of course you can shortcut the selection in modern MongoDB releases (2.6 and later) with the $$ROOT variable:
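The difference between order-dependent and order-independent accumulators can be seen in a small plain-JavaScript sketch. The helper groupTotals and the sample data are hypothetical; it keeps a $first-like and a $max-like value per group and is run on the same documents in two different orders.

```javascript
// Hypothetical orders: three for customer a1, one for b2.
const sample = [
  { custID: "a1", total: 20 },
  { custID: "a1", total: 50 },
  { custID: "b2", total: 10 },
  { custID: "a1", total: 35 },
];

// Groups by custID, tracking the first total seen ($first) and the largest ($max).
function groupTotals(docs) {
  const groups = new Map();
  for (const { custID, total } of docs) {
    const group = groups.get(custID);
    if (group === undefined) {
      groups.set(custID, { _id: custID, firstTotal: total, maxTotal: total });
    } else {
      group.maxTotal = Math.max(group.maxTotal, total);
    }
  }
  return groups;
}

const forward = groupTotals(sample).get("a1");
const backward = groupTotals([...sample].reverse()).get("a1");

console.log(forward);  // → { _id: "a1", firstTotal: 20, maxTotal: 50 }
console.log(backward); // → { _id: "a1", firstTotal: 35, maxTotal: 50 }
```

The $first-like value changes with the input order while the $max-like value does not, which is why $first is usually paired with an explicit $sort.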
Order.aggregate(
[
{ "$match" : {
"response.status.code" : 200
}},
{ "$group": {
"_id": "$request.headers.custID", // the grouping key
"document": { "$first": "$$ROOT" }
}},
{ "$skip": skip },
{ "$limit": limit }
],
function (err, order) {}
);
Which would take a copy of all fields in the document and place them under the named key ( which is "document" in this case ). It's a shorter way to notate, but of course the resulting document has a different structure, being now all under the one key as sub-fields.
But as long as you understand the basic principles of a "pipeline" and don't exclude data you want to use in later stages by previous stages, then you generally should be okay.
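To make the shape of the $$ROOT variant concrete, here is a plain-JavaScript sketch of match, group, skip and limit over a few invented orders. The field names are flattened (custID, code, path) purely to keep the example short, and skip and limit are arbitrary values. One caveat worth checking against the documentation: $group does not guarantee an output order, so for stable pages a $sort before $skip and $limit is likely needed; the Map below keeps insertion order, which the server does not promise.

```javascript
// Hypothetical, flattened orders; the real documents use nested request/response fields.
const orders = [
  { custID: "a1", code: 200, path: "/x" },
  { custID: "b2", code: 200, path: "/y" },
  { custID: "a1", code: 200, path: "/z" },
  { custID: "c3", code: 404, path: "/w" },
  { custID: "d4", code: 200, path: "/v" },
];

const skip = 1;  // arbitrary page offset for the example
const limit = 2; // arbitrary page size for the example

// $match on the status code, then $group keeping the whole first document per customer.
const firstPerCustomer = new Map();
for (const doc of orders) {
  if (doc.code !== 200) continue;
  if (!firstPerCustomer.has(doc.custID)) {
    firstPerCustomer.set(doc.custID, { _id: doc.custID, document: doc });
  }
}

// $skip and $limit applied to the grouped results.
const page = [...firstPerCustomer.values()].slice(skip, skip + limit);

console.log(page.map((g) => g._id));           // → ["b2", "d4"]
console.log(page.map((g) => g.document.path)); // → ["/y", "/v"]
```

Each result carries the original fields under the document key, so callers read g.document.path rather than g.path.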