I have a collection with documents in MongoDB:
[
{
"_id" : ObjectId("61ba65af74cf385ee93ad2c7"),
"Car_brand":"BMW X7",
"Plate_number":"8OP66",
"Model_year":"2018",
"Company":"BMW",
"Purchase_year":"2019",
"Body_color":"red",
"Mileage":1000,
"Price":35000,
"Body_type":"crossover"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2c8"),
"Car_brand":"Tesla Model X",
"Plate_number":"5XR56",
"Model_year":"2015",
"Company":"Tesla Motors",
"Purchase_year":"2019",
"Body_color":"white",
"Mileage":800,
"Price":25000,
"Body_type":"SUV"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2c9"),
"Car_brand":"Tesla Cybertruck",
"Plate_number":"2ED45",
"Model_year":"2021",
"Company":"Tesla Motors",
"Purchase_year":"2021",
"Body_color":"gray",
"Mileage":0,
"Price":50000,
"Body_type":"pickup"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2ca"),
"Car_brand":"Lamborghini Aventador",
"Plate_number":"2MN50",
"Model_year":"2011",
"Company":"Lamborghini",
"Purchase_year":"2017",
"Body_color":"orange",
"Mileage":700,
"Price":45000,
"Body_type":"supercar"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2cb"),
"Car_brand":"BMW X7",
"Plate_number":"3QW14",
"Model_year":"2018",
"Company":"BMW",
"Purchase_year":"2020",
"Body_color":"black",
"Mileage":4500,
"Price":14000,
"Body_type":"crossover"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2cc"),
"Car_brand":"Mercedes-Benz G-Class",
"Plate_number":"9KI24",
"Model_year":"2017",
"Company":"Mercedes-Benz",
"Purchase_year":"2017",
"Body_color":"black",
"Mileage":6000,
"Price":13000,
"Body_type":"SUV"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2cd"),
"Car_brand":"BMW X6",
"Plate_number":"2GH47",
"Model_year":"2008",
"Company":"BMW",
"Purchase_year":"2016",
"Body_color":"white",
"Mileage":4500,
"Price":14500,
"Body_type":"SUV"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2ce"),
"Car_brand":"Chevrolet Camaro",
"Plate_number":"7BV58",
"Model_year":"2015",
"Company":"Chevrolet",
"Purchase_year":"2020",
"Body_color":"orange",
"Mileage":4000,
"Price":43000,
"Body_type":"cabriolet"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2cf"),
"Car_brand":"Ford Mustang",
"Plate_number":"4AM23",
"Model_year":"2016",
"Company":"Ford Motor Company",
"Purchase_year":"2019",
"Body_color":"purple",
"Mileage":2000,
"Price":30000,
"Body_type":"cabriolet"
},
{
"_id" : ObjectId("61ba65af74cf385ee93ad2d0"),
"Car_brand":"Dodge Challenger",
"Plate_number":"6DL73",
"Model_year":"2020",
"Company":"Dodge (Chrysler Corporation)",
"Purchase_year":"2020",
"Body_color":"red",
"Mileage":0,
"Price":40000,
"Body_type":"muscle car"
}
]
I need to get:
Number of colors for each body type.
The company with two car brands.
I've tried the first one like this:
db.Vehicles.aggregate([{$group:{"_id":"$Body_type","Количество цветов":{$sum: 1}}}]);
But I get all the colors including those which repeat twice. And I need to get a number of distinct colors.
And I can't think of any suggestions for the second one.
Thanks.
Number of colors for each body type
You can do it like this:
db.collection.aggregate([
{
"$group": {
"_id": "$Body_type",
"colors_array": {
"$addToSet": "$Body_color"
}
}
},
{
"$project": {
"colors_number": {
"$size": "$colors_array"
},
"colors_array": 1
}
}
])
Note that I also included the array of all the colors, the colors_array property. If you want only the number of unique colors without the array, remove the "colors_array": 1 line from the $project stage.
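To see what the $group / $addToSet / $size combination computes, here is a minimal plain-JavaScript sketch that mimics it on a few of the sample documents. It does not talk to MongoDB; the `cars` array and the `countDistinctColors` helper are hypothetical names introduced only for this illustration.

```javascript
// In-memory stand-in for some of the sample documents (only the two fields used).
const cars = [
  { Body_type: "crossover", Body_color: "red" },
  { Body_type: "SUV", Body_color: "white" },
  { Body_type: "crossover", Body_color: "black" },
  { Body_type: "SUV", Body_color: "black" },
  { Body_type: "SUV", Body_color: "white" }, // repeated color for SUV
  { Body_type: "cabriolet", Body_color: "orange" },
  { Body_type: "cabriolet", Body_color: "purple" },
];

// Hypothetical helper: group by body type, collect colors in a Set
// (the role $addToSet plays), then take the set size (the role of $size).
function countDistinctColors(docs) {
  const byType = new Map();
  for (const doc of docs) {
    if (!byType.has(doc.Body_type)) byType.set(doc.Body_type, new Set());
    byType.get(doc.Body_type).add(doc.Body_color);
  }
  return [...byType].map(([type, colors]) => ({
    _id: type,
    colors_number: colors.size,
  }));
}

const colorCounts = countDistinctColors(cars);
console.log(colorCounts);
```

The repeated "white" SUV is counted once, which is exactly the difference from the {$sum: 1} attempt in the question.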
The company with two car brands.
You can do it like this:
db.collection.aggregate([
{
"$group": {
"_id": "$Company",
"brands_array": {
"$addToSet": "$Car_brand"
}
}
},
{
"$set": {
"brands_number": {
"$size": "$brands_array"
}
}
},
{
"$match": {
"brands_number": 2
}
}
])
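The same idea can be checked by hand with a small plain-JavaScript sketch. It only mimics the three stages in memory and is not a MongoDB call; `vehicles` and `companiesWithBrandCount` are hypothetical names used for illustration.

```javascript
// Subset of the sample data, reduced to the two fields the pipeline reads.
const vehicles = [
  { Company: "BMW", Car_brand: "BMW X7" },
  { Company: "Tesla Motors", Car_brand: "Tesla Model X" },
  { Company: "Tesla Motors", Car_brand: "Tesla Cybertruck" },
  { Company: "Lamborghini", Car_brand: "Lamborghini Aventador" },
  { Company: "BMW", Car_brand: "BMW X7" }, // same brand a second time
  { Company: "BMW", Car_brand: "BMW X6" },
];

// Hypothetical helper: collect distinct brands per company (like $addToSet),
// then keep the companies whose set has the wanted size (like $size + $match).
function companiesWithBrandCount(docs, wanted) {
  const brands = new Map();
  for (const { Company, Car_brand } of docs) {
    if (!brands.has(Company)) brands.set(Company, new Set());
    brands.get(Company).add(Car_brand);
  }
  return [...brands]
    .filter(([, set]) => set.size === wanted)
    .map(([company]) => company);
}

const twoBrandCompanies = companiesWithBrandCount(vehicles, 2);
console.log(twoBrandCompanies);
```

On the sample data from the question, both BMW and Tesla Motors have exactly two distinct car brands, so expect two documents rather than one.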
Related
I am using mongoDB, but I am a complete beginner. I have two different queries where I want to combine them both into one output (I'm hoping the answer is a single query)
Query 1:
db.fin.aggregate([
{ "$match": { "misc.incident_characteristics": { "$not": /Officer Involved Incident/ } } },
{ $group: {
_id: "NonOfficerInvolved",
nInjured: { $avg: "$casualties.n_injured" },
nKilled: { $avg: "$casualties.n_killed" }
}
}
])
Which returns
{ "_id" : "NonOfficerInvolved", "nInjured" : 0.5048153043227224, "nKilled" : 0.24339953718948618 }
Query 2:
db.fin.aggregate([
{ $match: { "misc.incident_characteristics": "Officer Involved Incident" } },
{ $group: {
_id: "OfficerInvolved",
nInjured: { $avg: "$casualties.n_injured" },
nKilled: { $avg: "$casualties.n_killed" }
}
}
])
Which returns
{ "_id" : "OfficerInvolved", "nInjured" : 0.3599233845980508, "nKilled" : 0.358965692073686 }
I would like to get the result of both into one table as seen below. Is it possible to do this in one query?
{ "_id" : "NonOfficerInvolved", "nInjured" : 0.5048153043227224, "nKilled" : 0.24339953718948618 }
{ "_id" : "OfficerInvolved", "nInjured" : 0.3599233845980508, "nKilled" : 0.358965692073686 }
Yes, you can definitely do that in one query.
Instead of just matching against the magic string, store the match result in a new field as a boolean or string, then group on that new field.
db.fin.aggregate([
{ "$addFields": { type:{
$cond:[
{$eq:["$misc.incident_characteristics","Officer Involved Incident"]},
"OfficerInvolved",
"NonOfficerInvolved"
]
}}},
{ $group: {
_id: "$type",
nInjured: { $avg: "$casualties.n_injured" },
nKilled: { $avg: "$casualties.n_killed" }
}
}
])
To match using a regular expression, replace the line
{$eq:["$misc.incident_characteristics","Officer Involved Incident"]},
with
{$regexMatch:{
input:"$misc.incident_characteristics",
regex:"Officer Involved Incident",
options:"i"
}},
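If it helps to see the idea outside the pipeline, here is a rough plain-JavaScript sketch of "compute a label per document, then group on the label". The documents are made up and flattened (the real ones nest these fields under misc and casualties), and `averageByType` is a hypothetical helper.

```javascript
// Made-up, flattened documents; the values are for illustration only.
const incidents = [
  { characteristics: "Officer Involved Incident", n_injured: 1, n_killed: 0 },
  { characteristics: "Something else", n_injured: 2, n_killed: 0 },
  { characteristics: "Officer Involved Incident", n_injured: 0, n_killed: 1 },
  { characteristics: "Something else", n_injured: 0, n_killed: 2 },
];

// Hypothetical helper: the ternary plays the role of $cond, and the
// running sums divided by the count play the role of $avg per group.
function averageByType(docs) {
  const sums = {};
  for (const doc of docs) {
    const type =
      doc.characteristics === "Officer Involved Incident"
        ? "OfficerInvolved"
        : "NonOfficerInvolved";
    const s = sums[type] || (sums[type] = { injured: 0, killed: 0, n: 0 });
    s.injured += doc.n_injured;
    s.killed += doc.n_killed;
    s.n += 1;
  }
  return Object.entries(sums).map(([type, s]) => ({
    _id: type,
    nInjured: s.injured / s.n,
    nKilled: s.killed / s.n,
  }));
}

const averages = averageByType(incidents);
console.log(averages);
```

Every document lands in exactly one of the two buckets, so a single pass produces both rows.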
I created a MongoDB collection and by mistake inserted duplicate values that differ only in letter case.
I have made the index unique, but MongoDB is case sensitive and hence treats the upper-case and lower-case values as different.
The database has now grown to around 32 GB, and I have only just come across this issue. Kindly help me.
Here is the sample:
db.tt.createIndex({'email':1},{unique:true})
> db.tt.find().pretty()
{
"_id" : ObjectId("591d706c0ef9acde11d7af66"),
"email" : "g#gmail.com",
"src" : [
{
"acc" : "ln"
},
{
"acc" : "drb"
}
]
}
{
"_id" : ObjectId("591d70740ef9acde11d7af68"),
"email" : "G#gmail.com",
"src" : [
{
"acc" : "ln"
},
{
"acc" : "drb"
},
{
"acc" : "dd"
}
]
}
How can I make the email lowercase and merge the src values into the original document? Kindly help me.
You can achieve this using the $toLower aggregation operator like this:
db.tt.aggregate([
{
$project:{
email:{
$toLower:"$email"
},
src:1
}
},
{
$unwind:"$src"
},
{
$group:{
_id:"$email",
src:{
$addToSet:"$src"
}
}
},
{
$project:{
_id:0,
email:"$_id",
src:1
}
},
{
$out:"anotherCollection"
}
])
$addToSet keeps only one occurrence of each distinct src item.
This will write the following document to a new collection named anotherCollection:
{ "email" : "g#gmail.com", "src" : [ { "acc" : "dd" }, { "acc" : "drb" }, { "acc" : "ln" } ] }
Note that with $out you can overwrite your tt collection directly. However, before doing this make sure you understand what you're doing, because all previous data will be lost.
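For reference, the merge the pipeline performs can be sketched in plain JavaScript on the two sample documents. This is only an in-memory illustration, not something to run against the database; `users` and `mergeByLowercaseEmail` are hypothetical names.

```javascript
// The two sample documents from the question, without _id.
const users = [
  { email: "g#gmail.com", src: [{ acc: "ln" }, { acc: "drb" }] },
  { email: "G#gmail.com", src: [{ acc: "ln" }, { acc: "drb" }, { acc: "dd" }] },
];

// Hypothetical helper: key on the lower-cased email ($toLower + $group)
// and keep each distinct src item once (the role of $unwind + $addToSet).
function mergeByLowercaseEmail(docs) {
  const merged = new Map(); // lower-cased email -> Map(serialized item -> item)
  for (const doc of docs) {
    const key = doc.email.toLowerCase();
    if (!merged.has(key)) merged.set(key, new Map());
    for (const item of doc.src) {
      // Objects compare by reference in JavaScript, so deduplicate on the
      // JSON text. This assumes equal items list their fields in the same order.
      merged.get(key).set(JSON.stringify(item), item);
    }
  }
  return [...merged].map(([email, items]) => ({
    email,
    src: [...items.values()],
  }));
}

const mergedUsers = mergeByLowercaseEmail(users);
console.log(JSON.stringify(mergedUsers));
```

The sketch keeps items in first-seen order, whereas $addToSet makes no promise about the order of the resulting array.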
The most efficient way I can think of to merge the data is run an aggregation and loop the result to write back to the collection in bulk operations:
var ops = [];
db.tt.aggregate([
{ "$unwind": "$src" },
{ "$group": {
"_id": { "$toLower": "$email" },
"src": { "$addToSet": "$src" },
"ids": { "$addToSet": "$_id" }
}}
]).forEach(doc => {
var id = doc.ids.shift();
ops = [
...ops,
{
"deleteMany": {
"filter": { "_id": { "$in": doc.ids } }
}
},
{
"updateOne": {
"filter": { "_id": id },
"update": {
"$set": { "email": doc._id },
"$addToSet": { "src": { "$each": doc.src } }
}
}
},
];
if ( ops.length >= 500 ) {
db.tt.bulkWrite(ops);
ops = [];
}
});
if ( ops.length > 0 )
db.tt.bulkWrite(ops);
In steps: $unwind the array items so they can be merged via $addToSet, inside a $group keyed on the $toLower of the email value. You also want to keep the set of unique source document ids.
In the loop you shift the first _id value off doc.ids and update that document with the lowercase email and the revised "src" set. Using $addToSet here makes the operation write-safe against any other updates that might occur to the document.
The other operation in the loop deletes the remaining documents that shared the same case-converted email, so there are no duplicates. That one actually runs first, and the default "ordered" bulk operations guarantee the sequence.
And do it in the shell, since it's a one-off operation and really is as simple as the listing shown.
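To make the ordering concrete, here is a small sketch that builds the operations for one aggregated group without sending anything to the server. `buildOps` is a hypothetical helper, and the ids are placeholder strings rather than real ObjectId values.

```javascript
// Hypothetical helper: turn aggregated groups into bulkWrite-style operations.
function buildOps(groups) {
  const ops = [];
  for (const group of groups) {
    // Keep the first id, treat the rest as duplicates to remove.
    const [keepId, ...duplicateIds] = group.ids;
    // The delete goes first: if the kept document still has the mixed-case
    // email, setting it to lowercase would violate the unique index while
    // the lowercase duplicate still exists.
    ops.push({ deleteMany: { filter: { _id: { $in: duplicateIds } } } });
    ops.push({
      updateOne: {
        filter: { _id: keepId },
        update: {
          $set: { email: group._id },
          $addToSet: { src: { $each: group.src } },
        },
      },
    });
  }
  return ops;
}

// One group shaped like the output of the $group stage above.
const sampleGroups = [
  {
    _id: "g#gmail.com",
    ids: ["id1", "id2"], // placeholder ids
    src: [{ acc: "ln" }, { acc: "drb" }, { acc: "dd" }],
  },
];

const plannedOps = buildOps(sampleGroups);
console.log(JSON.stringify(plannedOps, null, 2));
```

Printing the planned operations like this before calling bulkWrite is a cheap way to sanity-check a one-off cleanup on a 32 GB collection.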
My function looks like below.
function (x)
{
var SO2Min = db.AirPollution.aggregate(
[
{
$match : {"SO2":{$ne:'NA'}, "State":{$eq: x} }
},
{
$group:
{
_id: x,
SO2MinQuantity: { $min: "$SO2" }
}
},
{
$project:
{SO2MinQuantity: '$SO2MinQuantity'
}
}
]
)
db.AirPollution.update
(
{
"State": "West Bengal"},
{
$set: {
"MaxSO2": SO2Max
}
},
{
"multi": true
}
);
}
Here, AirPollution is my collection. If I run this function, the collection gets updated with a new field MaxSO2 as below.
{
"_id" : ObjectId("5860a2237796484df5656e0c"),
"Stn Code" : 11,
"Sampling Date" : "02/01/15",
"State" : "West Bengal",
"City/Town/Village/Area" : "Howrah",
"Location of Monitoring Station" : "Bator, Howrah",
"Agency" : "West Bengal State Pollution Control Board",
"Type of Location" : "Residential, Rural and other Areas",
"SO2" : 10,
"NO2" : 40,
"RSPM/PM10" : 138,
"PM 2.5" : 83,
"MaxSO2" : {
"_batch" : [
{
"_id" : "West Bengal",
"SO2MaxQuantity" : 153
}
],
"_cursor" : {}
}
}
As we can see, MaxSO2 has been added as a subdocument. But I want that value to be added to the same document as a plain field, not as part of a subdocument. Specifically, I don't want the _batch and _cursor fields to show up. Please help.
Since the aggregate function returns a cursor, you can use the toArray() method which returns an array that contains all the documents from a cursor and then access the aggregated field. Because you are returning a single value from the aggregate, there's no need to iterate the results array, just access the first and only single document in the result to get the value.
Once you get this value you can then update your collection using updateMany() method. So you can refactor your code to:
function updateMinAndMax(x) {
var results = db.AirPollution.aggregate([
{
"$match" : {
"SO2": { "$ne": 'NA' },
"State": { "$eq": x }
}
},
{
"$group": {
"_id": x,
"SO2MinQuantity": { "$min": "$SO2" },
"SO2MaxQuantity": { "$max": "$SO2" }
}
},
]).toArray();
var SO2Min = results[0]["SO2MinQuantity"];
var SO2Max = results[0]["SO2MaxQuantity"];
db.AirPollution.updateMany(
{ "State": x },
{ "$set": { "SO2MinQuantity": SO2Min, "SO2MaxQuantity": SO2Max } },
);
}
updateMinAndMax("West Bengal");
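One case the function above does not handle: if no document matches the state, the results array is empty, results[0] is undefined, and reading a property from it throws a TypeError. A minimal guard might look like the sketch below; `extractMinMax` is a hypothetical helper, and the minimum value in the sample input is made up.

```javascript
// Hypothetical helper: pull the two values out of the toArray() result,
// or return null when the aggregation matched nothing.
function extractMinMax(results) {
  if (results.length === 0) return null;
  const { SO2MinQuantity, SO2MaxQuantity } = results[0];
  return { SO2MinQuantity, SO2MaxQuantity };
}

// Shaped like the single document the $group stage returns.
const found = extractMinMax([
  { _id: "West Bengal", SO2MinQuantity: 2, SO2MaxQuantity: 153 },
]);
// Shaped like the result for a state with no matching documents.
const missing = extractMinMax([]);

console.log(found, missing);
```

With a guard like this, the caller can skip the updateMany() call when the helper returns null.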
I want to return Object as a field in my Aggregation result similar to the solution in this question. However in the solution mentioned above, the Aggregation results in an Array of Objects with just one item in that array, not a standalone Object. For example, a query like the following with a $push operation
$group:{
_id: "$publisherId",
'values' : { $push:{
newCount: { $sum: "$newField" },
oldCount: { $sum: "$oldField" } }
}
}
returns a result like this
{
"_id" : 2,
"values" : [
{
"newCount" : 100,
"oldCount" : 200
}
]
}
not one like this
{
"_id" : 2,
"values" : {
"newCount" : 100,
"oldCount" : 200
}
}
The latter is the result that I require. So how do I rewrite the query to get a result like that? Is it possible or is the former result the best I can get?
You don't need the $push operator, just add a final $project pipeline that will create the embedded document. Follow this guideline:
var pipeline = [
{
"$group": {
"_id": "$publisherId",
"newCount": { "$sum": "$newField" },
"oldCount": { "$sum": "$oldField" }
}
},
{
"$project": {
"values": {
"newCount": "$newCount",
"oldCount": "$oldCount"
}
}
}
];
db.collection.aggregate(pipeline);
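The reshaping that the final $project stage does can be pictured with this plain-JavaScript sketch, run on a result shaped like the one in the question. It is only an in-memory illustration; `grouped` and `projected` are hypothetical names.

```javascript
// What the $group stage produces for one publisher (values from the question).
const grouped = [{ _id: 2, newCount: 100, oldCount: 200 }];

// Equivalent of the $project stage: nest the two sums under "values"
// as a single embedded object instead of a one-element array.
const projected = grouped.map(({ _id, newCount, oldCount }) => ({
  _id,
  values: { newCount, oldCount },
}));

console.log(JSON.stringify(projected));
```

Since each group yields exactly one pair of sums, there is nothing to collect into an array, which is why $push is not needed.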
{
"_id":{
"oid":"4f33bf69873dbc73a7d21dc3"
},
"country":"IND",
"states":[{
"name":"orissa",
"direction":"east",
"population":41947358,
"districts":[{
"name":"puri",
"headquarter":"puri",
"population":1498604
},
{
"name":"khordha",
"headquarter":"bhubaneswar",
"population":1874405
}
]
},
{
"name":"andhra pradesh",
"direction":"south",
"population":84665533,
"districts":[{
"name":"rangareddi",
"headquarter":"hyderabad",
"population":3506670
},
{
"name":"vishakhapatnam",
"headquarter":"vishakhapatnam",
"population":3789823
}
]
}
]
}
In the above collection (i.e. countries) I have only one document, and I want to fetch the details of a particular state (let's say "country.states.name" : "orissa"). But I want my result as shown below instead of the entire document. Is there a way in Mogo...
{
"name": "orissa",
"direction": "east",
"population": 41947358,
"districts": [
{
"name": "puri",
"headquarter": "puri",
"population": 1498604
},
{
"name": "khordha",
"headquarter": "bhubaneswar",
"population": 1874405
}
]
}
Thanks
Tried this:
db.countries.aggregate(
{
"$project": {
"state": "$states",
"_id": 0
}
},
{
"$unwind": "$state"
},
{
"$group": {
"_id": "$state.name",
"state": {
"$first": "$state"
}
}
},
{
"$match": {
"_id": "orissa"
}
}
);
And got:
{
"result" : [
{
"_id" : "orissa",
"state" : {
"name" : "orissa",
"direction" : "east",
"population" : 41947358,
"districts" : [
{
"name" : "puri",
"headquarter" : "puri",
"population" : 1498604
},
{
"name" : "khordha",
"headquarter" : "bhubaneswar",
"population" : 1874405
}
]
}
}
],
"ok" : 1
}
You can't do it right now, but you will be able to with $unwind in the aggregation framework. You can try it now with the experimental 2.1 branch, the stable version will come out in 2.2, probably in a few months.
Any query in MongoDB always returns the root document.
The only way to load a single subdocument along with its parent is $slice, provided you know the ordinal number of the state in the nested array:
// skip ordinalNumberOfState -1, limit 1
db.countries.find({_id: 1}, {states:{$slice: [ordinalNumber -1 , 1]}})
$slice works in the array's default order (the order in which the documents were inserted into the nested array).
Also, if you don't need the other fields from a country, you can include only _id and states in the result:
db.countries.find({_id: 1}, {states:{$slice: [ordinalNumber -1 , 1]}, _id: 1})
Then the resulting document will look like this:
{
"_id":{
"oid":"4f33bf69873dbc73a7d21dc3"
},
"states":[{
"name":"orissa",
"direction":"east",
"population":41947358,
"districts":[{
"name":"puri",
"headquarter":"puri",
"population":1498604
},
{
"name":"khordha",
"headquarter":"bhubaneswar",
"population":1874405
}
]
}]
}
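For a non-negative skip, $slice: [skip, limit] picks the same elements as JavaScript's array.slice(skip, skip + limit). The sketch below shows that on a trimmed copy of the document; `sliceStates` is a hypothetical helper and the _id value is a placeholder.

```javascript
// Trimmed copy of the sample document (state names and directions only).
const countryDoc = {
  _id: "placeholder-id",
  country: "IND",
  states: [
    { name: "orissa", direction: "east" },
    { name: "andhra pradesh", direction: "south" },
  ],
};

// Hypothetical helper: keep _id and the one state at the given 1-based position,
// i.e. skip (ordinalNumber - 1) elements and take 1.
function sliceStates(doc, ordinalNumber) {
  const skip = ordinalNumber - 1;
  return { _id: doc._id, states: doc.states.slice(skip, skip + 1) };
}

const firstState = sliceStates(countryDoc, 1);
const secondState = sliceStates(countryDoc, 2);
console.log(JSON.stringify(firstState), JSON.stringify(secondState));
```

Note that the state still comes back wrapped in the states array of its parent, which matches the result shown above.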
db.countries.find({ "states": { "$elemMatch": { "name": "orissa" }}},{"country" : 1, "states.$": 1 })
If you don't want to use aggregate, you can do it pretty easily at the application layer using underscore (included by default):
var country = Groops.findOne({"property": value});
var state = _.where(country.states, {"name": statename});
This will give you an array with the entire state record (or records) matching statename. Very convenient.
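If underscore is not available, the built-in Array.prototype.find does the same job for a single match. A minimal sketch, assuming the country document has already been loaded into a plain object; `findState` and the trimmed `loadedCountry` document are hypothetical.

```javascript
// Trimmed copy of the country document, as it would look after loading it.
const loadedCountry = {
  country: "IND",
  states: [
    { name: "orissa", direction: "east", population: 41947358 },
    { name: "andhra pradesh", direction: "south", population: 84665533 },
  ],
};

// Hypothetical helper: return the first state with the given name,
// or null when there is no such state.
function findState(countryDoc, stateName) {
  return countryDoc.states.find((s) => s.name === stateName) || null;
}

const orissa = findState(loadedCountry, "orissa");
const unknown = findState(loadedCountry, "no such state");
console.log(orissa, unknown);
```

Unlike _.where, this returns the state object itself rather than an array of matches.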