I have the following documents in a MongoDB collection. I want to find the genre that has won the highest number of awards and, across the whole collection, the three most common genres. I'm not sure how to target specific fields within a collection like this. Is it better to treat it as one large array, or is that a ridiculous idea?
The query I tried fails because the genres field is not an accumulator:
db.MovieData.aggregate([
  {$sort:{"awards.wins":-1}},
  {$group:{"genres":"$genres"}}
])
Example data (there is far more, but I have limited it to two documents):
[
{
"title": "Once Upon a Time in the West",
"year": 1968,
"rated": "PG-13",
"runtime": 175,
"countries": [
"Italy",
"USA",
"Spain"
],
"genres": [
"Western"
],
"director": "Sergio Leone",
"writers": [
"Sergio Donati",
"Sergio Leone",
"Dario Argento",
"Bernardo Bertolucci",
"Sergio Leone"
],
"actors": [
"Claudia Cardinale",
"Henry Fonda",
"Jason Robards",
"Charles Bronson"
],
"plot": "Epic story of a mysterious stranger with a harmonica who joins forces with a notorious desperado to protect a beautiful widow from a ruthless assassin working for the railroad.",
"poster": "http://ia.media-imdb.com/images/M/MV5BMTEyODQzNDkzNjVeQTJeQWpwZ15BbWU4MDgyODk1NDEx._V1_SX300.jpg",
"imdb": {
"id": "tt0064116",
"rating": 8.6,
"votes": 201283
},
"tomato": {
"meter": 98,
"image": "certified",
"rating": 9,
"reviews": 54,
"fresh": 53,
"consensus": "A landmark Sergio Leone spaghetti western masterpiece featuring a classic Morricone score.",
"userMeter": 95,
"userRating": 4.3,
"userReviews": 64006
},
"metacritic": 80,
"awards": {
"wins": 4,
"nominations": 5,
"text": "4 wins \u0026 5 nominations."
},
"type": "movie"
},
{
"title": "A Million Ways to Die in the West",
"year": 2014,
"rated": "R",
"runtime": 116,
"countries": [
"USA"
],
"genres": [
"Comedy",
"Western"
],
"director": "Seth MacFarlane",
"writers": [
"Seth MacFarlane",
"Alec Sulkin",
"Wellesley Wild"
],
"actors": [
"Seth MacFarlane",
"Charlize Theron",
"Amanda Seyfried",
"Liam Neeson"
],
"plot": "As a cowardly farmer begins to fall for the mysterious new woman in town, he must put his new-found courage to the test when her husband, a notorious gun-slinger, announces his arrival.",
"poster": "http://ia.media-imdb.com/images/M/MV5BMTQ0NDcyNjg0MV5BMl5BanBnXkFtZTgwMzk4NTA4MTE#._V1_SX300.jpg",
"imdb": {
"id": "tt2557490",
"rating": 6.1,
"votes": 126592
},
"tomato": {
"meter": 33,
"image": "rotten",
"rating": 4.9,
"reviews": 188,
"fresh": 62,
"consensus": "While it offers a few laughs and boasts a talented cast, Seth MacFarlane's overlong, aimless A Million Ways to Die in the West is a disappointingly scattershot affair.",
"userMeter": 40,
"userRating": 3,
"userReviews": 62945
},
"metacritic": 44,
"awards": {
"wins": 0,
"nominations": 6,
"text": "6 nominations."
},
"type": "movie"
}
]
What you are looking for is:
db.MovieData.aggregate([
  { "$unwind": "$genres" },
  { "$group": {
    "_id": "$genres",
    "totalWins": { "$sum": "$awards.wins" }
  }},
  { "$sort": { "totalWins": -1 } },
  { "$limit": 3 }
])
In short:
$unwind - The genres field is an array, so it has to be "flattened" into one document per genre before it can serve as the "grouping key" for the next stage.
$group - Requires an _id, which is the "grouping key", i.e. the value that things are accumulated for. Although not a requirement, this is typically paired with accumulators, which perform the "aggregation operations" such as $sum on a supplied field value. Here you want:
{ "$sum": "$awards.wins" }
to accumulate that field.
$sort - Orders the results by the supplied field(s), in this case the accumulated totalWins in descending ( -1 ) order.
$limit - The maximum number of result documents to return.
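To make the four stages concrete, here is a minimal pure-Python sketch that mimics them on two trimmed documents shaped like the sample data. It only illustrates what each stage does and is not how MongoDB executes the pipeline; top_genres_by_wins is a made-up helper name.

```python
from collections import defaultdict

# Two trimmed documents shaped like the sample data (only the fields used here).
movies = [
    {"title": "Once Upon a Time in the West",
     "genres": ["Western"], "awards": {"wins": 4}},
    {"title": "A Million Ways to Die in the West",
     "genres": ["Comedy", "Western"], "awards": {"wins": 0}},
]


def top_genres_by_wins(docs, limit=3):
    """Hypothetical helper that mirrors $unwind, $group, $sort and $limit."""
    totals = defaultdict(int)
    for doc in docs:
        # $unwind: visit the document once per element of its genres array
        for genre in doc.get("genres", []):
            # $group with $sum: accumulate awards.wins under the genre key
            totals[genre] += doc.get("awards", {}).get("wins", 0)
    # $sort: descending on the accumulated total
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    # $limit: keep only the first `limit` results
    return [{"_id": genre, "totalWins": wins} for genre, wins in ranked[:limit]]


print(top_genres_by_wins(movies))
```

For the other half of the question (the most frequently occurring genres), the same pipeline shape applies with { "$sum": 1 } as the accumulator, which counts documents per genre instead of adding up wins.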
A good place to look for common examples is the SQL to Aggregation Mapping Chart in the core documentation. It is most helpful if you have some working knowledge of SQL, but it also serves as a set of general examples if you do not.
All of the Aggregation Pipeline Stages and Aggregation Pipeline Operators also have usage examples on their own documentation pages. Familiarizing yourself with these is useful for understanding how they apply to different problems.
Related
I got a collection of 10000 ca. docs, where each doc has the following format:
{
"_id": {
"$oid": "631edc6e207c89b932a70a26"
},
"name": "Ethereum",
"auditInfoList": [
{
"coinId": "1027",
"auditor": "Fairyproof",
"auditStatus": 2,
"reportUrl": "https://www.fairyproof.com/report/Covalent"
}
],
"circulatingSupply": 122335921.0615,
"cmcRank": 2,
"dateAdded": "2015-08-07T00:00:00.000Z",
"id": 1027,
"isActive": 1,
"isAudited": true,
"lastUpdated": 1662969360,
"marketPairCount": 6085,
"quotes": [
{
"name": "USD",
"price": 1737.1982544180462,
"volume24h": 14326453277.535921,
"marketCap": 212521748520.66168,
"percentChange1h": 0.62330307,
"percentChange24h": -1.08847937,
"percentChange7d": 10.96517745,
"lastUpdated": 1662966780,
"percentChange30d": -13.49374496,
"percentChange60d": 58.25153862,
"percentChange90d": 42.27475921,
"fullyDilluttedMarketCap": 212521748520.66,
"marketCapByTotalSupply": 212521748520.66168,
"dominance": 20.0725,
"turnover": 0.0674117,
"ytdPriceChangePercentage": -53.9168
}
],
"selfReportedCirculatingSupply": 0,
"slug": "ethereum",
"symbol": "ETH",
"tags": [
"mineable",
"pow",
"smart-contracts",
"ethereum-ecosystem",
"coinbase-ventures-portfolio",
"three-arrows-capital-portfolio",
"polychain-capital-portfolio",
"binance-labs-portfolio",
"blockchain-capital-portfolio",
"boostvc-portfolio",
"cms-holdings-portfolio",
"dcg-portfolio",
"dragonfly-capital-portfolio",
"electric-capital-portfolio",
"fabric-ventures-portfolio",
"framework-ventures-portfolio",
"hashkey-capital-portfolio",
"kenetic-capital-portfolio",
"huobi-capital-portfolio",
"alameda-research-portfolio",
"a16z-portfolio",
"1confirmation-portfolio",
"winklevoss-capital-portfolio",
"usv-portfolio",
"placeholder-ventures-portfolio",
"pantera-capital-portfolio",
"multicoin-capital-portfolio",
"paradigm-portfolio",
"injective-ecosystem"
],
"totalSupply": 122335921.0615
}
I'm pulling an updated version of it and, to avoid duplicates, I'm doing the following using 'update_one':
for doc in new_doc_list:
    CRYPTO_TEMPORARY_LIST.update_one(
        {"name": doc['name']},
        {"$set": {
            "lastUpdated": doc['lastUpdated']
        }},
        upsert=True)
The problem is that it's too slow.
I'm trying to figure out how to improve the speed by using update_many, but I can't figure out how to set it up.
I basically want to update every document by name. Replacing the whole document rather than just the "lastUpdated" field would be even better.
Thanks guys <3
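One possible direction, offered as a sketch rather than a definitive fix: update_many applies the same update to every matched document, so it does not fit per-document values. pymongo's bulk_write can instead send many ReplaceOne operations in a single call. The helper names build_replacements and upsert_all are made up for illustration, and an index on "name" (for example via create_index("name")) is assumed to help the per-document lookups.

```python
def build_replacements(new_docs, key="name"):
    """Return (filter, replacement) pairs, keeping the last document seen per key."""
    latest = {}
    for doc in new_docs:
        # Drop _id so the replacement cannot clash with the stored document's _id.
        latest[doc[key]] = {k: v for k, v in doc.items() if k != "_id"}
    return [({key: value}, doc) for value, doc in latest.items()]


def upsert_all(collection, new_docs):
    """Send all replacements in one bulk_write call instead of one call per document."""
    # Imported lazily: pymongo is only needed when actually talking to MongoDB.
    from pymongo import ReplaceOne

    requests = [
        ReplaceOne(flt, doc, upsert=True)
        for flt, doc in build_replacements(new_docs)
    ]
    if not requests:
        return None  # bulk_write rejects an empty request list
    # ordered=False lets the server keep going if a single operation fails.
    return collection.bulk_write(requests, ordered=False)
```

With the names from the question, the call would be upsert_all(CRYPTO_TEMPORARY_LIST, new_doc_list).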
I have a hypothetical table with the following information about the cost of vehicles, and I am trying to model the data for storage in an Expenses collection in MongoDB:
Category | Item | Cost
Land | Car | 1000
Land | Motorbike | 500
Air | Plane | 2000
Air | Others: Rocket | 5000
One assumption for this use case is that the Categories and Items are fixed fields in the table, while users fill in the Cost for each specific Item. Any other vehicles in a category are entered under "Others".
Currently, I am considering two options for storing the document:
Option 1 - as a nested object:
[
  {
    "category": "land",
    "items": [
      {"name": "Car", "cost": 1000},
      {"name": "Motorbike", "cost": 500}
    ]
  },
  {
    "category": "air",
    "items": [
      {"name": "Plane", "cost": 2000},
      {"name": "Others", "remarks": "Rocket", "cost": 5000}
    ]
  }
]
Option 2 - as a flattened array, where the React application will map the array to render the data in the table:
[
{"category": "land", "item": "car", "cost": 1000},
{"category": "land", "item": "motorbike", "cost": 500},
{"category": "air", "item": "plane", "cost": 2000},
{"category": "air", "item": "others", "remarks": "rocket", "cost": 5000}
]
Was hoping to get any suggestions on which is a better approach, or if there is a better approach that you have in mind.
Thanks in advance! :)
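One observation that may help with the choice: the flat rows of Option 2 can be regrouped into the nested shape of Option 1 at read time, so storing one form does not rule out rendering the other. A minimal Python sketch of that regrouping, where to_nested is a hypothetical helper and the field names are taken from the two options above:

```python
def to_nested(rows):
    """Group flat {category, item, cost, ...} rows into one entry per category."""
    grouped = {}
    for row in rows:
        # Rename "item" to "name" and carry over everything except the category.
        entry = {"name": row["item"]}
        entry.update(
            {k: v for k, v in row.items() if k not in ("category", "item")}
        )
        grouped.setdefault(row["category"], []).append(entry)
    return [{"category": cat, "items": items} for cat, items in grouped.items()]


flat = [
    {"category": "land", "item": "car", "cost": 1000},
    {"category": "land", "item": "motorbike", "cost": 500},
    {"category": "air", "item": "plane", "cost": 2000},
    {"category": "air", "item": "others", "remarks": "rocket", "cost": 5000},
]
nested = to_nested(flat)
```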
In my application, I'm getting an API response as follows:
{
"success": true,
"data": {
"items": [
{
"sequenceno": 1933,
"_id": "5eff1",
"chapter": "Numbers and Numeration",
"title": "Place Value: 3-digits",
"package_description": "This learning module helps to familiarise with the concept of place value of 3-digit numbers.",
"age_level": [
99,
8
],
"pkg_sequence": "2501",
"packagescors": {
"score": [
50
],
"date": [
"1600259121340"
]
}
},
{
"sequenceno": 1933,
"_id": "5d79",
"chapter": "Numbers and Numeration",
"title": "Place Value: 4-digits",
"package_description": "This learning module helps the kids familiarise with the concept of Place value of a number.",
"age_level": [
99,
8
],
"pkg_sequence": "2501",
"packagescors": {
"score": [
60
],
"date": [
"1615283866457"
]
}
}
]
}
}
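The data consists of modules from a subject, with chapters and titles under each chapter. Each item has a key named 'chapter'. Whenever the ListView is built, the data gets duplicated.
How can I remove the duplicates and build the list properly?
Try this. It collects the ids into a set, then keeps an item only the first time its id is removed from that set: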
final ids = myList.map((e) => e.id).toSet();
myList.retainWhere((x) => ids.remove(x.id)); // set your response param instead of id.
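The snippet above keeps the first item for each id. The same keep-first idea is sketched below in Python purely for illustration. dedupe_by is a made-up helper, and using "_id" from the response as the key is an assumption about which field identifies a duplicate.

```python
def dedupe_by(items, key):
    """Keep the first item for each distinct key value, preserving order."""
    seen = set()
    unique = []
    for item in items:
        value = key(item)
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique


# Trimmed items shaped like the API response, with one repeated entry.
items = [
    {"_id": "5eff1", "title": "Place Value: 3-digits"},
    {"_id": "5d79", "title": "Place Value: 4-digits"},
    {"_id": "5eff1", "title": "Place Value: 3-digits"},
]
unique_items = dedupe_by(items, key=lambda item: item["_id"])
```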
There are a lot of questions/answers sounding exactly like what I'm looking for but I couldn't find a single one that actually worked for me.
Sample data:
{
"_id": "5daeb61790183fd4d4361d6c",
"orderMessageId": "7563_21",
"orderId": "OS00154",
"orderEntryDate": "2019-06-17T00:00:00.000Z",
"typeOfOrder": "ORD",
"express": false,
"name1": "xxx",
"name2": "xxx",
"name3": " ",
"contact": "IN KOMMISSION",
"street": "xxx",
"city": "xxx",
"zipcode": "1235",
"country": "xx",
"customerId": "51515",
"lnatMarketCode": "Corporate - Regulatory",
"shipmentCarrier": "ABH",
"typeOfShipment": "83",
"typeOfShipmentDescr": "xxx",
"orderTextfield": " ",
"orderTextfield02": " ",
"text2": " ",
"LinVw": [
{
"orderLineMessageId": "OS05451",
"orderLineId": 5,
"articleId": "19200",
"articleDescription": "xxx",
"productId": "OS1902",
"productDescription": "xxx",
"baseQuantityUnit": "EA",
"quantityOrdered": 2,
"isbn": "978357468",
"issn": " ",
"orderSubmissionDate": "2019-06-06T00:00:00.000Z",
"customerPurchaseOrderId": "728188175",
"lnatCustomerIdAtSupplier": " ",
"supplierDeliveryNoteId": " ",
"fulfillmentContactName": "xxxx",
"customerVatRegistrationCode": "AT4151511900",
"listPriceInclVat": 21.4955,
"text": " ",
"orderResponses": [
{
"orderMessageId": "7718677_1",
"orderLineMessageId": "OS0000015451",
"orderId": "OS000154",
"orderLineId": 5,
"articleId": "1911200",
"quantity": 2,
"quantityNotShipped": 0,
"reasonForNotShippedCode": null,
"reasonForNotShipped": null,
"shipmentDate": "2019-10-04T00:00:00.000Z",
"deliveryNoteId": null,
"trackingIds": [
{
"trackingId": null,
"quantityRefToTracking": "2",
"weightRefToTracking": "0.0"
}
],
"type": "orderresponse",
"filepath": "xxxORDERRESP_20191004131209.xml",
"_id": "OS005451"
},
{
"orderMessageId": "753_21",
"orderLineMessageId": "OS015451",
"orderId": "O00154",
"orderLineId": 5,
"articleId": "100200",
"quantity": 0,
"quantityNotShipped": 2,
"reasonForNotShippedCode": "01",
"reasonForNotShipped": "Out of Stock",
"shipmentDate": null,
"deliveryNoteId": null,
"trackingIds": [
{
"trackingId": null,
"quantityRefToTracking": "0",
"weightRefToTracking": "0.0"
}
],
"type": "orderresponse",
"filepath": "xxxxORDERRESP_20190618161529.xml",
"_id": "OS0000015451"
}
]
}
],
"filepath": "xxxxxORDER_7539563_20190618_071522.xml"
}
I want to match all documents in which every element of the array LinVw matches the following condition:
{'$or': [{'LinVw.orderResponses': {'$exists': False}}, {'LinVw.orderResponses.shipmentDate': {'$type': 10}}]}
To put it in words: I want to match documents, if the array LinVw.orderResponses doesn't exist, or it contains only documents, that don't have a valid shipmentDate.
Currently I have this (using pymongo):
result = order_collection.aggregate([
    {"$unwind": "$LinVw"},
    {"$match": {'$or': [{'LinVw.orderResponses': {'$exists': False}}, {'LinVw.orderResponses.shipmentDate': {'$type': 10}}]}}
])
But of course this doesn't consider that all documents inside LinVw.orderResponses should match the condition.
Most examples out there don't deal with this kind of nesting, and I was unable to rewrite them accordingly.
I would appreciate any help.
You can achieve this by adding a $redact stage.
Inside the $redact stage, write the condition that matches the documents you want to ignore (those with an invalid shipment date). That's it.
I think I did it:
result = order_collection.aggregate([
    {"$unwind": "$LinVw"},
    {"$match": {'$or': [{'LinVw.orderResponses': {'$exists': False}}, {'LinVw.orderResponses.shipmentDate': {'$type': 10}}]}},
    {"$match": {'LinVw.orderResponses.shipmentDate': {"$not": {'$type': 9}}}},
    {"$project": {"_id": 0, "LinVw.orderLineMessageId": 1, "LinVw.orderResponses": 1}}
])
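To check what a pipeline like this returns against the stated intent, the condition can also be written as a plain Python predicate over fetched documents. This is only a reference sketch: it assumes shipmentDate is either null or a date, it treats an empty orderResponses array as matching, and both function names are made up.

```python
def line_is_unshipped(line):
    """True if orderResponses is absent, or every response has a null shipmentDate."""
    responses = line.get("orderResponses")
    if responses is None:
        return True  # the field does not exist on this order line
    # all() is also True for an empty list of responses
    return all(resp.get("shipmentDate") is None for resp in responses)


def unshipped_lines(order):
    """Return the LinVw entries of one order document that satisfy the condition."""
    return [line for line in order.get("LinVw", []) if line_is_unshipped(line)]


# Trimmed order document: line 3 has one shipped response, so it is excluded.
order = {
    "LinVw": [
        {"orderLineId": 1},
        {"orderLineId": 2, "orderResponses": [
            {"shipmentDate": None}, {"shipmentDate": None}]},
        {"orderLineId": 3, "orderResponses": [
            {"shipmentDate": "2019-10-04T00:00:00.000Z"}, {"shipmentDate": None}]},
    ]
}
matching = unshipped_lines(order)
```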
Let's say I have to store records of clothes in MongoDB. The attributes of each item are:
name
description
style
  size
  color
  condition
brand
  brandName
  someAttribute
price
For every item, the price changes for each combination of style and brand. How do I model this in MongoDB?
So far what I have been thinking is:
{
"name": "A name",
"description": "A typical description",
"style":[
{"size": "XL","color": "red", "condition": "good"},//--style 0
{"size": "XXL","color": "white", "condition": "bad"},//--style 1
//...
{"size": "L","color": "black", "condition": "best"}//--style N
],
"brand":[
{"brandName":"brand0","someAttribute":"Attribute 0"},
{"brandName":"brand1","someAttribute":"Attribute 1"},
{"brandName":"brand2","someAttribute":"Attribute 2"}
],
"price":[
//Every price need to be added for every combination of brand and style
{"style":0,"brand":0,"price": 10},
{"style":0,"brand":1,"price": 20},
{"style":0,"brand":2,"price": 30},
{"style":1,"brand":0,"price": 10},
{"style":1,"brand":1,"price": 20},
//...
{"style":"N","brand":2,"price": 10}
]
}
I don't think this is the right way to do it in mongoDB. How to model this?
I would model it like this:
{
  "name": "A name",
  "description": "A typical description",
  "priceGroup": [
    {
      "style": {"size": "XL", "color": "red", "condition": "good"},
      "brand": {"brandName": "brand0", "someAttribute": "Attribute 0"},
      "price": 10
    },
    {
      "style": {"size": "XXL", "color": "white", "condition": "bad"},
      "brand": {"brandName": "brand0", "someAttribute": "Attribute 0"},
      "price": 20
    },
    {
      "style": {"size": "XL", "color": "red", "condition": "good"},
      "brand": {"brandName": "brand1", "someAttribute": "Attribute 1"},
      "price": 30
    },
    {
      "style": {"size": "XXL", "color": "white", "condition": "bad"},
      "brand": {"brandName": "brand1", "someAttribute": "Attribute 1"},
      "price": 40
    }
    // ...
  ]
}
But as @Neil Lunn pointed out, NoSQL schema design has no fixed rules like those of relational database design, such as normalization. It therefore depends more on your application and its requirements. Put the things you will query together in one collection, and the rest in a different collection.
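As a rough illustration of how an application might read a price from the priceGroup layout, here is a small Python sketch. find_price is a hypothetical helper, and the document is a trimmed copy of the example above.

```python
def find_price(item, style, brand_name):
    """Return the price for an exact style and brand name, or None if there is none."""
    for group in item["priceGroup"]:
        if group["style"] == style and group["brand"]["brandName"] == brand_name:
            return group["price"]
    return None


# Trimmed document following the priceGroup layout.
shirt = {
    "name": "A name",
    "priceGroup": [
        {"style": {"size": "XL", "color": "red", "condition": "good"},
         "brand": {"brandName": "brand0", "someAttribute": "Attribute 0"},
         "price": 10},
        {"style": {"size": "XL", "color": "red", "condition": "good"},
         "brand": {"brandName": "brand1", "someAttribute": "Attribute 1"},
         "price": 30},
    ],
}
red_xl = {"size": "XL", "color": "red", "condition": "good"}
price = find_price(shirt, red_xl, "brand1")
```

Because each entry carries its own style, brand and price, no index-based cross-reference between separate arrays is needed, which is the main difference from the layout proposed in the question.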