Is updating Embedded Documents in MongoDB a Manual process? - mongodb

I am not very familiar with MongoDB yet, but I have a question about embedded documents.
I have seen a number of posts that show how to update embedded documents with an update query.
My question is this: if I have a collection with embedded documents (denormalised for performance) and the source of one of those embedded documents changes, do I need to update all the embedded copies manually, or is there some way to declare the link in MongoDB so that they update automatically?
For example:
An order record might look like the structure below. Note that one of the rows contains a product.
Let's say the itemName field changes to "Product1a" in the product document in a different collection, and I want to update that product in every order where it appears. Is that a manual process, or is there a way of setting up MongoDB to update embedded documents automatically?
{
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"type": "order",
"orderNumber": "ORD-100209857x",
"orderDate": "2019-09-26T17:42:31.000+12:00",
"orderItems": [
{
"discount": 0,
"price": 24.4944,
"product": {
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"itemNumber": "prd1",
"itemName": "Product1"
},
"qty": 4,
"rowTotal": 97.96,
"taxAmount": 9.8
},
{
"discount": 0,
"price": 3.21,
"itemName": "Shipping",
"qty": 1,
"rowTotal": 3.21,
"taxAmount": 0
}
]
}

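MongoDB has no built-in link between a denormalised copy and its source, so embedded copies are not updated automatically: you have to run the update yourself whenever the product changes. Because orderItems is an array, the path needs a positional operator; the filtered positional operator $[item] with arrayFilters (MongoDB 3.6+) updates only the rows that contain that product:
db.collection.updateMany({ "orderItems.product.id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2" }, { $set: { "orderItems.$[item].product.itemName": "Product1a" } },
  { arrayFilters: [ { "item.product.id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2" } ] })
Let me know if this is not what you are looking for.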
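If the rename has to be repeated for different products, the updateMany() arguments can be built by a small helper. This is a sketch only: `renameProductInOrders` is a hypothetical name, and the `$[item]` filtered positional operator with `arrayFilters` assumes MongoDB 3.6 or later.

```javascript
// Hypothetical helper: builds [filter, update, options] for updateMany() so that
// one product is renamed inside every order that embeds it. Order rows without
// a product (e.g. the "Shipping" row) do not match the array filter and are untouched.
function renameProductInOrders(productId, newName) {
  const filter = { "orderItems.product.id": productId };
  const update = { $set: { "orderItems.$[item].product.itemName": newName } };
  const options = { arrayFilters: [{ "item.product.id": productId }] };
  return [filter, update, options];
}

// Usage (sketch, mongosh or a Node.js driver collection):
// db.orders.updateMany(...renameProductInOrders("ccc1beb1-e022-11e9-97f0-e7e789106ab2", "Product1a"));
```

MongoDB will not call this for you; it would run from the code that updates the product, or, on a replica set, from a change stream listener on the products collection.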

Related

Cursor-based pagination without `skip()` based on frequently dynamically updated field without skipping documents

The context
I have a MongoDB collection, items, that looks like this:
{
"_id": ObjectId(...),
"score": 42,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 95,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 1841,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 11,
"data": "some text"
},
It potentially has 50,000+ documents, and the score field changes very frequently (it's a vote tally that records users' upvotes and downvotes).
What I need to do
I'm trying to paginate infinitely through this collection, sorted by score from highest to lowest, loading documents sequentially in batches of roughly 25.
The only current way I know how
Use skip() to provide an offset based on the last document loaded on each call to the database, and only load new documents whose score is less than the last document's. The downside is that if several documents share the score of the last one seen, the ones not yet loaded are skipped, because I only load documents with a strictly lower score.
Additionally, I've read that using skip() is extremely inefficient.
Conclusion
Do I have to use this inefficient solution, which would also cause me to skip documents?
Is there a better way?
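One common alternative to skip() is keyset ("seek") pagination: sort on score plus a unique tie-breaker (_id here), and ask for documents strictly after the last one shown on that compound key. The sketch below assumes numeric or otherwise comparable _id values in the in-memory check; the helper names are hypothetical. It fixes the tie-skipping problem and avoids skip(), but it cannot make a constantly changing score consistent: a document whose score rises above the cursor after you pass it is missed, and one whose score drops below it can appear again.

```javascript
// Keyset ("seek") pagination sketch. Sort by score descending and break ties on
// _id, so every document has a unique position and ties are never skipped.
const SORT = { score: -1, _id: -1 };

// Hypothetical helper: query filter for the page after `last`, the final document
// of the previous page (pass null for the first page).
function nextPageFilter(last) {
  if (!last) return {};
  return {
    $or: [
      { score: { $lt: last.score } },
      { score: last.score, _id: { $lt: last._id } },
    ],
  };
}

// In-memory equivalent of nextPageFilter + SORT, used only to check the logic.
function isAfter(doc, last) {
  return doc.score < last.score || (doc.score === last.score && doc._id < last._id);
}

function paginateInMemory(docs, pageSize) {
  const sorted = [...docs].sort(
    (a, b) => b.score - a.score || (a._id < b._id ? 1 : a._id > b._id ? -1 : 0)
  );
  const pages = [];
  let last = null;
  for (;;) {
    const page = sorted.filter((d) => last === null || isAfter(d, last)).slice(0, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    last = page[page.length - 1];
  }
  return pages;
}
```

In mongosh, one page would be `db.items.find(nextPageFilter(last)).sort(SORT).limit(25)`, ideally backed by a matching index such as `db.items.createIndex({ score: -1, _id: -1 })`.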

Remove duplicates from MongoDB 4.0

I am trying to remove duplicates from MongoDB, but every solution I find fails. Given the following JSON structure:
{
"_id": { "$oid": "5cee31bbca8a185b76a692db" },
"date": { "$date": "2018-10-07T19:11:38.000Z" },
"id": "1049014405130858496",
"username": "chrisoldcorn",
"text": "“The #UK can rest now. The Orange Buffoon is back in his xenophobic #WhiteHouse!” #news #politics #trump #populist #uspoli #ukpolitics #ukpoli #london #scotland #TrumpBaby #usa #america #canada #eu #europe #brexit #maga #msm #gop #elections #election2018 https://medium.com/#chrisoldcorn/trump-babys-uk-visit-a-reflection-1c2aa4ad942 …pic.twitter.com/Y6Yihs9g6K",
"retweets": 1,
"favorites": 0,
"mentions": "#chrisoldcorn",
"hashtags": "#UK #WhiteHouse #news #politics #trump #populist #uspoli #ukpolitics #ukpoli #london #scotland #TrumpBaby #usa #america #canada #eu #europe #brexit #maga #msm #gop #elections #election2018",
"geo": "",
"replies": 0,
"to": null,
"lan": "en"
}
I need to remove all duplicates based on the "id" field.
I have tried db.tweets.ensureIndex( { id:1 }, { unique:true, dropDups:true } ) but I am not sure this is the right approach. I get this output:
Can anyone help me?
It looks like you are running MongoDB 3.0 or later, so you cannot remove duplicates by creating a unique index with dropDups.
According to the docs:
Changed in version 3.0: The dropDups option is no longer available.
The fastest way to do this is to:
Create a dump (mongodump)
Drop the collection
Create the new unique index
Restore the dump (mongorestore)
The duplicate documents are dropped during the restore, because their inserts violate the unique index.
The next best solution is to run a script that collects all duplicate ids and removes the extra documents.
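A sketch of such a script, assuming duplicates are identified by the `id` field as in the question. `duplicatePipeline` and `idsToDelete` are hypothetical names; the pipeline sorts by `_id` first so that, with default ObjectIds, the oldest copy in each group is the one kept.

```javascript
// Group documents by "id", collecting their _id values, and keep only groups
// that contain more than one document.
const duplicatePipeline = [
  { $sort: { _id: 1 } }, // first _id pushed is the oldest (assuming default ObjectIds)
  { $group: { _id: "$id", dups: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
];

// Given the aggregation results, return every _id except the first in each group.
function idsToDelete(groups) {
  return groups.flatMap((g) => g.dups.slice(1));
}

// Usage (sketch, mongosh):
// const groups = db.tweets.aggregate(duplicatePipeline, { allowDiskUse: true }).toArray();
// db.tweets.deleteMany({ _id: { $in: idsToDelete(groups) } });
// db.tweets.createIndex({ id: 1 }, { unique: true }); // prevent new duplicates
```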

How to connect two collections by nested fields in MongoDB

I am struggling with a query in MongoDB. Let's say I have a standings collection that looks like
{
"competitions": {id: "1", name:"someLeague"},
"standings": [
{
"type": "TOTAL",
"table": [
{
"position": "1",
"team": {
"id": "123",
"name": "XYZ"
},
won: "1",
draw: "2",
lost: "3",
points: "4",
},
{
"position": "2",
"team": {
"id": "321",
"name": "ABC"
}
...
And the fixtures collection, which looks like
{
matchDay: "YYYY-MM-DD",
homeTeam: {id: "123", name:"ABC"},
awayTeam: {id: "321", name:"XYZ"},
}
Is it possible to join these two collections so that the homeTeam field in fixtures contains all of the team's information, including points, games won, etc., from the standings where the type is TOTAL? The same goes for awayTeam, except that its information should come from the table where the standings type is AWAY.
There is no way in MongoDB to reference a document of collection A from collection B so that find queries on collection B automatically return attributes of the referenced document. However, since MongoDB 3.2 you can use the $lookup stage in an aggregation (see https://stackoverflow.com/a/33511166/3976662) to JOIN multiple collections during the query, similar to a JOIN in standard SQL. In your case, consider using $lookup together with $unwind, similar to the example in the MongoDB docs. Spring Data Mongo supports $lookup since 1.10.
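A sketch of such a pipeline, run against fixtures. It assumes MongoDB 3.6+ (for `$lookup` with `let`/`pipeline`, and `$mergeObjects`), that the table types are spelled `TOTAL` and `AWAY`, and that the standings collection holds a single competition, since the sample fixtures carry no competition id to match on. The helper names are hypothetical.

```javascript
// Hypothetical helper: a $lookup stage that fetches the standings table row for the
// team referenced by `teamField`, taken from the table of the given type.
function standingLookup(teamField, standingsType, as) {
  return {
    $lookup: {
      from: "standings",
      let: { teamId: `$${teamField}.id` },
      pipeline: [
        { $unwind: "$standings" },
        { $match: { "standings.type": standingsType } },
        { $unwind: "$standings.table" },
        { $match: { $expr: { $eq: ["$standings.table.team.id", "$$teamId"] } } },
        { $replaceRoot: { newRoot: "$standings.table" } },
      ],
      as,
    },
  };
}

// Merge the looked-up row (position, won, draw, lost, points, ...) into each team.
function fixturesWithStandings() {
  return [
    standingLookup("homeTeam", "TOTAL", "homeStanding"),
    standingLookup("awayTeam", "AWAY", "awayStanding"),
    {
      $addFields: {
        homeTeam: { $mergeObjects: ["$homeTeam", { $arrayElemAt: ["$homeStanding", 0] }] },
        awayTeam: { $mergeObjects: ["$awayTeam", { $arrayElemAt: ["$awayStanding", 0] }] },
      },
    },
    { $project: { homeStanding: 0, awayStanding: 0 } },
  ];
}
```

Usage (sketch): `db.fixtures.aggregate(fixturesWithStandings())`. An index on the team id inside the standings tables will not help much here, because the join unwinds the arrays first; for large data, storing one document per table row would make the lookup cheaper.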

mongoDB - find the first x documents whose rolling sum of a field reaches a certain value

I have a mongoDB collection of documents like this:
{
"_id": 1,
"size": 10,
"name": "ABCD"
}
I would like to:
Sort them by "name" in ascending order
Return as many of the first documents as needed for their cumulative "size" to be greater than or equal to 100
I have briefly looked into the $redact stage of the aggregation framework, but I can't figure out whether I can store the cumulative sum outside the document. What would be the best approach to solve this problem?
EDIT:
An example collection:
{ "name": "AAAA", "size": 2}
{ "name": "BBBB", "size": 4}
{ "name": "CCCC", "size": 3}
So the query should return the first X documents, in order, until their cumulative size reaches 6.
So output will be (because 2+4 is 6):
{ "name": "AAAA", "size": 2}
{ "name": "BBBB", "size": 4}
The only thing I can think of is to use a cursor at the application level and keep adding documents to the result set, incrementing a "size" counter by each document's value. But is there a way to do that with the aggregation framework, for example?
EDIT2:
I also came across the term 'rolling sum' and the use of map-reduce. Unfortunately, in my case I would want the map-reduce operation to stop when a global-scope variable reaches or exceeds a certain value, and I don't think that's possible (mapReduce processes every document fed to it at the outset).
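On MongoDB 5.0 or later, one option is `$setWindowFields`, which can compute a running total; a document is then kept if the total before it was still below the threshold, which includes the document that reaches it. This is a sketch with hypothetical helper names, and it does not stop early: the running sum is computed over every document, so for a large collection the application-side cursor loop described above may still be cheaper. The in-memory function mirrors the same rule for checking.

```javascript
// Hypothetical helper: sort by name and keep documents until the running total
// of `size` first reaches `threshold`. Requires MongoDB 5.0+ for $setWindowFields.
function takeUntilSizePipeline(threshold) {
  return [
    {
      $setWindowFields: {
        sortBy: { name: 1 },
        output: {
          runningSize: { $sum: "$size", window: { documents: ["unbounded", "current"] } },
        },
      },
    },
    // Keep a document if the total *before* it was still below the threshold.
    { $match: { $expr: { $lt: [{ $subtract: ["$runningSize", "$size"] }, threshold] } } },
    { $sort: { name: 1 } },
    { $project: { runningSize: 0 } },
  ];
}

// In-memory version of the same rule, useful for checking the logic.
function takeUntilSize(docs, threshold) {
  const sorted = [...docs].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const result = [];
  let total = 0;
  for (const doc of sorted) {
    if (total >= threshold) break;
    result.push(doc);
    total += doc.size;
  }
  return result;
}
```

With the example collection and a threshold of 6, both return AAAA and BBBB (2 + 4 = 6); usage in mongosh would be `db.collection.aggregate(takeUntilSizePipeline(100))`.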

How to join two collections in Mongo without lookup

I have two collections, named post and comment.
The model structure is as follows.
I want to use an aggregation to query posts sorted by the total number of likes on their comments. Currently I can compute the like total for a single post's comments with the query below.
My question is how I can query post and join the comment collection in MongoDB 2.6. I know that MongoDB 3.2 and later have a $lookup stage.
I want to query the post collection and sort by the like counts of the related comments. Is there a best way to do this in MongoDB 2.6?
post
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
}
comment
/* 1 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello world",
"like": [
"2"
]
}
/* 2 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello stackoverflow",
"like": [
"1",
"2"
]
}
Query for the total likes on one post's comments:
db.getCollection('comment').aggregate([
{
"$match": {
post_id: "5a39e22c27308912334b4567"
}
},
{
"$project": {
"likeLength": {
"$size": "$like"
},
"post_id": "$post_id"
}
},
{
"$group": {
_id: "$post_id",
"likeLengthSum": {
"$sum": "$likeLength"
}
}
}
])
There is no "best" way to query, as it will really depend on your specific needs, but you cannot perform a single query across multiple collections (aside from the $lookup aggregation stage in later versions, as you are already aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
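A minimal sketch of the two-query approach for MongoDB 2.6: aggregate the like totals per post_id in the comment collection, fetch the posts with a second query, and sort in application code. It assumes every comment has a `like` array (`$size` fails on a missing field) and that `post_id` holds the post's `_id` as a string, as in the sample; the helper runs in application code such as Node.js, where `String()` of a driver ObjectId gives its hex string. The names are hypothetical.

```javascript
// Query 1 (comment collection): total likes per post. $size is available from 2.6.
const likeTotalsPipeline = [
  { $project: { post_id: 1, likeLength: { $size: "$like" } } },
  { $group: { _id: "$post_id", likeLengthSum: { $sum: "$likeLength" } } },
];

// Application side: attach the totals to the posts from query 2 and sort,
// highest total first. Posts without comments get a total of 0.
function sortPostsByCommentLikes(posts, likeTotals) {
  const totals = new Map(likeTotals.map((t) => [String(t._id), t.likeLengthSum]));
  return posts
    .map((p) => ({ ...p, likeLengthSum: totals.get(String(p._id)) || 0 }))
    .sort((a, b) => b.likeLengthSum - a.likeLengthSum);
}
```

With the sample data, the single post would get a total of 3 (one like plus two likes). Sorting in the application means loading every post you want ranked, so this fits small collections or a pre-filtered set of posts.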
There is no other way to join collections in the current MongoDB v6 without $lookup.
I can think of two reasons that might be causing you issues:
The $lookup is slow and expensive - How to improve performance?
$lookup optimization:
Follow the guidelines provided in the documentation.
Use indexes:
Create indexes on the fields of the referenced collection. With your sample data, you can create an index on the post_id field, an index on the uid field, or a compound index on both, depending on your use cases.
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Try to limit the result of the $lookup with a $limit stage inside its pipeline (MongoDB 3.6+).
Balance the amount of joined data against what your query and UI actually need.
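A sketch of a limited `$lookup` for the sample schema. It assumes MongoDB 4.0+ (`$lookup` with `let`/`pipeline` is 3.6+, `$toString` is 4.0+, needed because `post_id` is stored as a string) and default ObjectId `_id` values, so sorting by `_id` descending approximates "newest first". The helper name and both limits are hypothetical; add any `$match`/`$sort` on posts before the first `$limit` as your use case requires.

```javascript
// Hypothetical helper: limit the posts entering the pipeline, then join at most
// `maxComments` of the newest comments to each one.
function postsWithLatestComments(maxPosts, maxComments) {
  return [
    { $limit: maxPosts },
    {
      $lookup: {
        from: "comment",
        let: { postId: { $toString: "$_id" } }, // post_id is stored as a string
        pipeline: [
          { $match: { $expr: { $eq: ["$post_id", "$$postId"] } } },
          { $sort: { _id: -1 } }, // newest first, assuming default ObjectId _ids
          { $limit: maxComments },
        ],
        as: "comments",
      },
    },
  ];
}

// Usage (sketch, mongosh): db.post.aggregate(postsWithLatestComments(10, 5))
```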
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you need the total number of comments on a particular post, store that count in the post document and update it whenever the post gets a new comment
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10
}
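A sketch of keeping that counter in step, assuming comments are still inserted into the comment collection and `total_comments` lives on the post. `commentCounterUpdate` is a hypothetical name. The two writes are separate operations, so they are not atomic unless you wrap them in a multi-document transaction (MongoDB 4.0+ on a replica set).

```javascript
// Hypothetical helper: the updateOne() arguments that bump the stored comment
// count each time a comment for `postId` is inserted. $inc creates the field if
// it does not exist yet.
function commentCounterUpdate(postId) {
  return [{ _id: postId }, { $inc: { total_comments: 1 } }];
}

// Usage (sketch, mongosh):
// db.comment.insertOne({ post_id: "5a39e22c27308912334b4567", uid: "1", comment: "hello world", like: [] });
// db.post.updateOne(...commentCounterUpdate(ObjectId("5a39e22c27308912334b4567")));
```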
Store minimum reference data:
If you want to show the comments of a particular post, you can limit the result, e.g. show 5 comments per post.
You can also store at most the 5 latest comments in the post document to avoid the $lookup: whenever a new comment arrives, add it and remove the oldest of the 5 stored comments
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10,
"comments": [
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"comment": "hello world"
},
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"comment": "hello stackoverflow"
}
]
}
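The "add the newest, drop the oldest" step maps onto `$push` with the `$each` and `$slice` modifiers: a negative `$slice` keeps only the last N elements after the push. This is a sketch; `pushLatestComment`, `postId`, and `newComment` are hypothetical names, and the comment is assumed to also be inserted into the comment collection separately.

```javascript
// Hypothetical helper: update for the post document when a comment is added.
// $push with $each + $slice appends the comment and keeps only the last `keep`
// entries, so the oldest stored comment drops off; the counter is bumped too.
function pushLatestComment(comment, keep = 5) {
  return {
    $push: { comments: { $each: [comment], $slice: -keep } },
    $inc: { total_comments: 1 },
  };
}

// Usage (sketch, mongosh): db.post.updateOne({ _id: postId }, pushLatestComment(newComment));
```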
Recommended reading: Reduce $lookup Operations
Recommended reading: Improve Your Schema