Let's imagine a MongoDB collection of, let's say, magazines. For some reason, we've ended up storing each issue of the magazine as a separate document. Each article is a subdocument inside an Articles array, and the authors of each article are represented as subdocuments inside the Writers array on the Article subdocument. Only the name and email of each author are stored inside the article, but there is a Writers array on the magazine level containing more information about each author.
{
"Title": "The Magazine",
"Articles": [
{
"Title": "Mongo Queries 101",
"Summary": ".....",
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
},
{
"Title": "Why not SQL instead?",
"Summary": ".....",
"Writers": [
{
"Name": "mike",
"Email": "mike#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
}
],
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com",
"Web": "tom.example.com"
},
{
"Name": "mike",
"Email": "mike#example.com",
"Web": "mike.example.com"
},
{
"Name": "anna",
"Email": "anna#example.com",
"Web": "anna.example.com"
}
]
}
How can one author be completely removed from a magazine?
Finding magazines where the unwanted author exists is quite easy. The problem is pulling the author out of all the subdocuments.
MongoDB 3.6 introduces some new positional operators, $[] and $[<identifier>], and I suspect these could be used with either $pull or $pullAll, but so far, I haven't had any success.
Is it possible to do this in one go? Or at least no more than two? One query for removing the author from all the articles, and one for removing the biography from the magazine?
You can try below query.
db.col.update(
{},
{"$pull":{
"Articles.$[].Writers":{"Name": "tom","Email": "tom#example.com"},
"Writers":{"Name": "tom","Email": "tom#example.com"}
}},
{"multi":true}
);
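To make the effect of the two $pull clauses above easier to see, here is a minimal plain-JavaScript sketch that imitates them on an in-memory copy of one magazine. No database is involved, and removeWriter is a hypothetical helper rather than a MongoDB API. Like the $pull condition, it removes any writer whose Name and Email both match, regardless of extra fields such as Web.

```javascript
// Plain-JavaScript imitation of the update above; removeWriter is hypothetical.
function removeWriter(magazine, writer) {
  // An element matches when both listed fields are equal (extra fields are ignored).
  const isMatch = (w) => w.Name === writer.Name && w.Email === writer.Email;
  return {
    ...magazine,
    // Corresponds to "Articles.$[].Writers": visit every article, filter its Writers.
    Articles: magazine.Articles.map((article) => ({
      ...article,
      Writers: article.Writers.filter((w) => !isMatch(w)),
    })),
    // Corresponds to "Writers": filter the magazine-level array.
    Writers: magazine.Writers.filter((w) => !isMatch(w)),
  };
}

const magazine = {
  Title: "The Magazine",
  Articles: [
    {
      Title: "Mongo Queries 101",
      Writers: [
        { Name: "tom", Email: "tom#example.com" },
        { Name: "anna", Email: "anna#example.com" },
      ],
    },
    {
      Title: "Why not SQL instead?",
      Writers: [
        { Name: "mike", Email: "mike#example.com" },
        { Name: "anna", Email: "anna#example.com" },
      ],
    },
  ],
  Writers: [
    { Name: "tom", Email: "tom#example.com", Web: "tom.example.com" },
    { Name: "mike", Email: "mike#example.com", Web: "mike.example.com" },
    { Name: "anna", Email: "anna#example.com", Web: "anna.example.com" },
  ],
};

const cleaned = removeWriter(magazine, { Name: "tom", Email: "tom#example.com" });
console.log(cleaned.Articles[0].Writers.map((w) => w.Name)); // [ 'anna' ]
console.log(cleaned.Writers.map((w) => w.Name)); // [ 'mike', 'anna' ]
```

The real update does this server-side for every document matched by the empty filter, so the sketch is only meant as a way to reason about the expected result.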
Here is my db collection in mongoDB. I am using mongoose to update the data inside the comments array
{
"author": {
"email": "user#example.com",
"userName": "John"
},
"_id": "63bc20741475b40323d6259f",
"title": "this is blog",
"description": "description",
"blogBanner": "https://github.com//routers/userRouter.js",
"views": "0",
"comments": [
{
"userId": "63b919840ae5303938fb1c17",
"comment": "first comment",
"_id": "63bbdbdb7018eabb752c0e58"
},
{
"userId": "63b919840ae5303938fb1c17",
"comment": "second comment",
"_id": "63bbdbdb7018eabb752ce55"
}
],
"reaction": [
{
"userEmail": "gias#gmail.com",
"react": "love"
}
],
"date": "2023-01-09T14:09:32.810Z",
"__v": 0
}
Here I want to update the first comment by using comment._id: 63bbdbdb7018eabb752c0e58
How can I update a single comment by its comment._id?
Use case: a user wants to update his/her own comment, identified by comment._id
Please, read the following docs:
https://www.mongodb.com/docs/manual/reference/operator/update/positional/#update-documents-in-an-array
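// updateOne is used here because Model.update is deprecated in Mongoose.
// The positional $ refers to the first comments element matched by 'comments._id'.
Model.updateOne(
{
'_id': '63bc20741475b40323d6259f',
'comments._id':'63bbdbdb7018eabb752ce55'
},
{ $set: { "comments.$.comment" : 'Edited Text' } }
)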
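As a rough sketch of how the positional $ behaves, the snippet below builds the same filter and update documents from variables and then imitates the update on an in-memory post. Both buildCommentUpdate and applyCommentEdit are hypothetical helpers introduced only for illustration; Model is assumed to be your Mongoose model.

```javascript
// Hypothetical helper: builds the two arguments for Model.updateOne(filter, update).
function buildCommentUpdate(postId, commentId, newText) {
  return {
    // The array condition in the filter is what the positional $ refers back to.
    filter: { _id: postId, "comments._id": commentId },
    update: { $set: { "comments.$.comment": newText } },
  };
}

// Plain-JavaScript imitation of what "comments.$.comment" resolves to:
// the first array element that satisfied the filter's array condition.
function applyCommentEdit(post, commentId, newText) {
  const index = post.comments.findIndex((c) => c._id === commentId);
  if (index === -1) return false; // nothing matched, nothing is modified
  post.comments[index].comment = newText;
  return true;
}

const post = {
  _id: "63bc20741475b40323d6259f",
  comments: [
    { _id: "63bbdbdb7018eabb752c0e58", comment: "first comment" },
    { _id: "63bbdbdb7018eabb752ce55", comment: "second comment" },
  ],
};

const args = buildCommentUpdate(post._id, "63bbdbdb7018eabb752c0e58", "Edited Text");
// With a real model this would be: await Model.updateOne(args.filter, args.update);
applyCommentEdit(post, "63bbdbdb7018eabb752c0e58", "Edited Text");
console.log(post.comments[0].comment); // Edited Text
```

Only the matched comment changes; the other elements of the array are left as they were.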
I've seen 2 main types of schema for subdocuments:
{
"cbill#boogiemail:com": {
"outbound": [
{
"name": "First",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
{
"name": "Second",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
],
"inbound" : [
{
"name": "First",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
{
"name": "Second",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
]
}
}
The alternative structure is:
{
"cbill#boogiemail:com": {
"outbound": {
"First": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
"Second": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
},
"inbound" : {
"First": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
"Second": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
}
}
}
The main difference between the two is the structure of the inbound/outbound subdocuments.
What is the best practice for MongoDB subdocument structures?
And in each case, what query would get me the subdocument pointed to by:
cbill#boogiemail:com.inbound.Second ?
To add a bit more information:
The collection will have many different documents starting with different email addresses, but each document in the collection will only have a few subdocuments under the inbound/outbound keys.
You want to structure your collections and documents in a way that reflects how you intend to use the data. If you're going to do a lot of complex queries, especially with subdocuments, you might find it easier to split your documents up into separate collections. An example of this would be splitting comments from blog posts.
Your comments could be stored as an array of subdocuments:
// Example post document with comment subdocuments
{
title: 'How to Mongo!',
content: 'So I want to talk about MongoDB.',
comments: [
{
author: 'Renold',
content: "This post, it's amazing."
},
...
]
}
This might cause problems, though, if you want to do complex queries on just comments (e.g. picking the most recent comments from all posts or getting all comments by one author.) If you plan on making these complex queries, you'd be better off creating two collections: one for comments and the other for posts.
// Example post document with "ForeignKeys" to comment documents
{
_id: ObjectId("50c21579c5f2c80000000000"),
title: 'How to Mongo!',
content: 'So I want to talk about MongoDB.',
comments: [
ObjectId("50c21579c5f2c80000000001"),
ObjectId("50c21579c5f2c80000000002"),
...
]
}
// Example comment document with a "ForeignKey" to a post document
{
_id: ObjectId("50c21579c5f2c80000000001"),
post_id: ObjectId("50c21579c5f2c80000000000"),
author: 'Renold',
content: "This post, it's amazing."
}
This is similar to how you'd store "ForeignKeys" in a relational database. Normalizing your documents like this makes querying both comments and posts easy. Also, since you're breaking up your documents, each document will take up less memory. The trade-off, though, is that you have to maintain the ObjectId references whenever there's a change to either document (e.g. when you insert/update/delete a comment or post). And since there are no event hooks in Mongo, you have to do all this maintenance in your application.
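To illustrate the maintenance work mentioned above, here is a small in-memory sketch in which two Maps stand in for the posts and comments collections. The helper names (addPost, addComment, deleteComment) and the numeric ids are assumptions made for the example; in a real application each step would be a database write.

```javascript
// In-memory stand-ins for the two collections (assumption: ids are plain numbers).
const posts = new Map();
const comments = new Map();
let nextId = 1;

function addPost(title, content) {
  const post = { _id: nextId++, title, content, comments: [] };
  posts.set(post._id, post);
  return post;
}

// Inserting a comment means two writes: the comment itself and the post's reference list.
function addComment(postId, author, content) {
  const post = posts.get(postId);
  if (!post) throw new Error("unknown post " + postId);
  const comment = { _id: nextId++, post_id: postId, author, content };
  comments.set(comment._id, comment);
  post.comments.push(comment._id);
  return comment;
}

// Deleting a comment must also remove its id from the owning post.
function deleteComment(commentId) {
  const comment = comments.get(commentId);
  if (!comment) return false;
  comments.delete(commentId);
  const post = posts.get(comment.post_id);
  if (post) post.comments = post.comments.filter((id) => id !== commentId);
  return true;
}

const howTo = addPost("How to Mongo!", "So I want to talk about MongoDB.");
const first = addComment(howTo._id, "Renold", "This post, it's amazing.");
const second = addComment(howTo._id, "Renold", "Still amazing.");
deleteComment(first._id);
console.log(howTo.comments.length); // 1
```

Forgetting either half of addComment or deleteComment leaves a dangling reference, which is the consistency risk the paragraph above describes.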
On the other hand, if you don't plan on doing any complex queries on a document's subdocuments, you might benefit from storing monolithic objects. For instance, a user's address isn't something you're likely to query on its own:
// Example user document with address subdocument
{
_id: ObjectId("50c21579c5f2c80000000421"),
name: 'Howard',
password: 'naughtysecret',
address: {
state: 'FL',
city: 'Gainesville',
zip: 32608
}
}
I found the answer here (https://www.tutorialspoint.com/how-to-select-a-specific-subdocument-in-mongodb), after some slight modifications.
The query for the second example (which was the one that I was most interested in) was:
find({ "cbill#boogiemail:com.inbound": {$exists: true}},{"cbill#boogiemail:com.inbound.Second":1}).pretty()
This results in:
{
"_id" : ObjectId("6216a9940b84b1a642cb925e"),
"cbill#boogiemail:com" : {
"inbound" : {
"Second" : {
"state" : {
"saved" : "cbill#boogiemail.com",
"edited" : "connie#boogiemail.com",
"status" : "draft"
},
"data" : {
}
}
}
}
}
Whether this is the most efficient query I'm not sure - feel free to post any better alternatives.
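The filter and projection used above can be assembled from variables with computed keys. The sketch below only builds the two plain objects; buildSubdocQuery is a hypothetical helper, and the collection name in the usage comment is a placeholder.

```javascript
// Hypothetical helper: builds the filter and projection for one keyed subdocument.
function buildSubdocQuery(ownerKey, direction, name) {
  const base = `${ownerKey}.${direction}`; // e.g. "cbill#boogiemail:com.inbound"
  return {
    // Only documents that contain the inbound/outbound object at all.
    filter: { [base]: { $exists: true } },
    // Return just the requested subdocument (plus _id, which is included by default).
    projection: { [`${base}.${name}`]: 1 },
  };
}

const q = buildSubdocQuery("cbill#boogiemail:com", "inbound", "Second");
console.log(JSON.stringify(q.filter));
console.log(JSON.stringify(q.projection));
// Usage in the shell (collection name is a placeholder):
// db.someCollection.find(q.filter, q.projection)
```

If you only want documents that actually contain the named subdocument, testing $exists on the full path (base plus name) instead of on base would probably be a tighter filter.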
Let's say I have a collection like so:
{
"id": "2902-48239-42389-83294",
"data": {
"location": [
{
"country": "Italy",
"city": "Rome"
}
],
"time": [
{
"timestamp": "1626298659",
"data":"2020-12-24 09:42:30"
}
],
"details": [
{
"timestamp": "1626298659",
"data": {
"url": "https://example.com",
"name": "John Doe",
"email": "john#doe.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "https://www.myexample.com",
"name": "John Doe",
"email": "doe#john.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "http://example.com/sub/directory",
"name": "John Doe",
"email": "doe#johnson.com"
}
}
]
}
}
Now the main focus is on the array of subdocuments ("data.details"). I want the output to contain only the relevant matches, e.g.:
db.info.find({"data.details.data.url": "example.com"})
How can I match every "data.details.data.url" that contains "example.com" without also matching "myexample.com"? When I do it with $regex I get too many results: a query for "example.com" also returns "myexample.com".
Even when I do get partial results (with $match), it's very slow. I tried these aggregation stages:
{ $unwind: "$data.details" },
{
$match: {
"data.details.data.url": /.*example.com.*/,
},
},
{
$project: {
id: 1,
"data.details.data.url": 1,
"data.details.data.email": 1,
},
},
I really don't understand the pattern: with $match, sometimes Mongo does recognize prefixes like "https://" or "https://www." and sometimes it does not.
More info:
My collection is dozens of GB in size. I created two indexes:
Compound like so:
"data.details.data.url": 1,
"data.details.data.email": 1
Text Index:
"data.details.data.url": "text",
"data.details.data.email": "text"
It did improve the query performance, but not enough, and I still have this issue with $match vs. $regex. Thanks for any help!
Your mistake is in the regex. It matches all URLs because the substring example.com appears in all of them. For example, https://www.myexample.com matches on its trailing example.com.
To avoid this you have to use a different regex, for example one that only matches when the domain comes directly after the scheme or the www. prefix.
For example:
(http[s]?:\/\/|www\.)YOUR_SEARCH
This checks that what you are searching for is directly preceded by http://, https:// or www.
https://regex101.com/r/M4OLw1/1
I leave you the full query.
[
{
'$unwind': {
'path': '$data.details'
}
}, {
'$match': {
'data.details.data.url': /(http[s]?:\/\/|www\.)example\.com/
}
}
]
Note: you must escape special characters in the regex. An unescaped dot matches any character, and an unescaped slash closes the regex literal, causing an error.
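A minimal sketch of the escaping step, assuming the search term arrives as a plain string: escapeRegex and domainRegex are hypothetical helpers, and the URLs are the three from the question. Building the pattern with the RegExp constructor avoids the closing-slash problem, since the pattern is a string rather than a regex literal.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the domain must come right after http://, https:// or www.
function domainRegex(domain) {
  return new RegExp("(https?:\\/\\/|www\\.)" + escapeRegex(domain));
}

const urls = [
  "https://example.com",
  "https://www.myexample.com",
  "http://example.com/sub/directory",
];

const pattern = domainRegex("example.com");
const matches = urls.filter((url) => pattern.test(url));
console.log(matches);
// [ 'https://example.com', 'http://example.com/sub/directory' ]
```

The resulting RegExp object can be used as the value in the $match stage. Note that this pattern still accepts hosts that merely start with the domain (for instance a hypothetical example.com.other.org), so a stricter ending may be needed depending on the data.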
Using MongoDB for storage, if I wanted to represent a tree structure of nodes, where child nodes under a single parent always have unique node-names, I believe the standard approach would be to use collections and to manage the node name uniqueness on the app level:
Approach 1: Collection Based Approach for Tree Data
{ "node_name": "home", "title": "Home", "children": [
{ "node_name": "products", "title": "Products", "children": [
{ "node_name": "electronics", "title": "Electronics", "children": [ ] },
{ "node_name": "toys", "title": "Toys", "children": [ ] } ] },
{ "node_name": "services", "title": "Services", "children": [
{ "node_name": "repair", "title": "Repair", "children": [ ] },
{ "node_name": "training", "title": "Training", "children": [ ] } ] } ] }
I have however thought of the following alternate approach, where node-names become "Object Map" field names, and we do without collections altogether:
Approach 2: Object-Map Based Approach (without Collections)
// NOTE: We don't have the equivalent of "node_name":"home" at the root, but that's not an issue in this case
{ "title": "Home", "children": {
"products": { "title": "Products", "children": {
"electronics": { "title": "Electronics", "children": { } },
"toys": { "title": "Toys", "children": { } } } },
"services": { "title": "Services", "children": {
"repair": { "title": "Repair", "children": { } },
"training": { "title": "Training", "children": { } } } } } }
The question is:
Strictly from a MongoDB perspective (considering querying, performance, data maintainability and data-size and server scaling), are there any major issues with Approach #2 (over #1)?
EDIT: After getting to know MongoDB a bit better (and thanks to Neil's comments below), I realized that both options of this question are generally the wrong way to go, because they assume that it makes sense to store multiple nodes in a single MongoDB document. Ultimately, each "node" should be a separate document and (as Neil Lunn stated in the comments) there are various ways to implement a hierarchy tree, as seen here: Model Tree Structures in MongoDB
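As a rough illustration of the "one document per node" idea from the EDIT, the sketch below flattens the nested tree of Approach 1 into separate node documents that each reference their parent. Using the slash-joined path as _id is an assumption made for this example (it presumes node names never contain a slash), and flattenTree is a hypothetical helper.

```javascript
// Hypothetical helper: turn a nested tree into one document per node.
function flattenTree(node, parentPath = null) {
  // Assumption: the _id is the slash-joined path of node names from the root.
  const path = parentPath === null ? node.node_name : `${parentPath}/${node.node_name}`;
  const docs = [
    { _id: path, node_name: node.node_name, title: node.title, parent: parentPath },
  ];
  for (const child of node.children) {
    docs.push(...flattenTree(child, path));
  }
  return docs;
}

const tree = {
  node_name: "home", title: "Home", children: [
    { node_name: "products", title: "Products", children: [
      { node_name: "electronics", title: "Electronics", children: [] },
      { node_name: "toys", title: "Toys", children: [] } ] },
    { node_name: "services", title: "Services", children: [
      { node_name: "repair", title: "Repair", children: [] },
      { node_name: "training", title: "Training", children: [] } ] } ],
};

const nodeDocs = flattenTree(tree);
console.log(nodeDocs.length); // 7
console.log(nodeDocs.find((d) => d.node_name === "toys").parent); // home/products
```

With this particular _id choice, two siblings with the same node name would produce the same _id, so the uniqueness requirement from the question would be enforced by the _id index rather than in the application.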
I don't think this use case is a good fit for MongoDB, because:
there is no compression algorithm (as of MongoDB 2.6), so your documents will be too large
MongoDB uses database-level locks (also as of 2.6), so writing one large document blocks all other operations on that database
it will be hard to index
I think a relational DB would be a better solution for this use case.
I have some documents in the "company" collection structured this way :
[
{
"company_name": "Company 1",
"contacts": {
"main": {
"email": "main#company1.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
},
{
"company_name": "Company 2",
"contacts": {
"main": {
"email": "main#company2.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company2.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company2.com",
"name": "Store2 user"
}
}
}
]
I'm trying to retrieve the document that has store1#company2.com as a contact, but I cannot find how to query a specific value of a specific property of an "indexed" list of objects.
My feeling is that the contacts list should not be indexed, resulting in the following structure:
{
"company_name": "Company 1",
"contacts": [
{
"email": "main#company1.com",
"name": "Mainuser",
"label": "main"
},
{
"email": "store1#company1.com",
"name": "Store1 user",
"label": "store1"
},
{
"email": "store2#company1.com",
"name": "Store2 user",
"label": "store2"
}
]
}
This way I can retrieve matching documents through the following request :
db.company.find({"contacts.email":"main#company1.com"})
But is there any way to do a similar request on documents using the previous structure?
Thanks a lot for your answers!
P.S. : same question for documents structured this way :
{
"company_name": "Company 1",
"contacts": {
"0": {
"email": "main#company1.com",
"name": "Mainuser"
},
"4": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"1": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
}
Short answer: yes, they can be queried but it's probably not what you want and it's not going to be really efficient.
The document structure in the first and third block is basically the same: you have an embedded document. The only difference is the names of the keys in the contacts object.
To query documents with that kind of structure you will have to write a query like this:
db.company.find({ $or : [
{"contacts.main.email":"main#company1.com"},
{"contacts.store1.email":"main#company1.com"},
{"contacts.store2.email":"main#company1.com"}
]});
This query will not be efficient, especially if you have a lot of keys in the contacts object. Also, creating a query will be unnecessarily difficult and error prone.
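To show why such a query is awkward to build, here is a small sketch that generates the $or clauses from a list of contact keys. buildContactEmailQuery is a hypothetical helper, and it assumes you already know every key that can appear under contacts, which is exactly the weak point of this structure.

```javascript
// Hypothetical helper: one $or clause per known key under "contacts".
function buildContactEmailQuery(contactKeys, email) {
  return {
    $or: contactKeys.map((key) => ({ [`contacts.${key}.email`]: email })),
  };
}

const emailQuery = buildContactEmailQuery(["main", "store1", "store2"], "main#company1.com");
console.log(JSON.stringify(emailQuery, null, 2));
// Usage in the shell: db.company.find(emailQuery)
```

Any key missing from the list is silently skipped, so documents using an unexpected key would never be found.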
The second document structure, with an array of embedded objects, is optimal. You can create a multikey index on the contacts array which will make your query faster. The bonus is that you can use a short and simple query.
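If you decide to move existing documents to the array structure, the conversion itself is short. The following is a sketch on a plain object, with contactsToArray as a hypothetical helper; writing the converted documents back to the collection is left out.

```javascript
// Hypothetical helper: turn the keyed contacts object into an array,
// keeping the former key as a "label" field on each contact.
function contactsToArray(company) {
  return {
    ...company,
    contacts: Object.entries(company.contacts).map(([label, contact]) => ({
      ...contact,
      label,
    })),
  };
}

const keyed = {
  company_name: "Company 1",
  contacts: {
    main: { email: "main#company1.com", name: "Mainuser" },
    store1: { email: "store1#company1.com", name: "Store1 user" },
  },
};

const converted = contactsToArray(keyed);
console.log(converted.contacts[1]);
// { email: 'store1#company1.com', name: 'Store1 user', label: 'store1' }
```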
I think the easiest is really to shape your document using the structure described in your 2nd example (I have not fixed the JSON):
{
"company_name": "Company 1",
"contacts":[
{"email":"main#company1.com","name":"Mainuser", "label": "main", ...},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1",...},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2",...}
]
}
That way you can easily query on email independently of the "label".
So if you really want to use the other structure (but you need to fix the JSON too), you will have to write more complex code or an aggregation pipeline, since we do not know the names and number of attributes when querying the system. These structures are also probably hard for developers to use, independently of MongoDB queries.
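For the keyed structure, one possible shape of such an aggregation pipeline is sketched below. It relies on the $objectToArray operator (available from MongoDB 3.4.4), which turns an object into an array of { k, v } pairs. buildKeyedContactPipeline and the temporary field name contactList are assumptions made for this example.

```javascript
// Hypothetical helper: pipeline that finds a contact email under unknown keys.
function buildKeyedContactPipeline(email) {
  return [
    // Turn { main: {...}, store1: {...} } into [ { k: "main", v: {...} }, ... ].
    { $addFields: { contactList: { $objectToArray: "$contacts" } } },
    // Match on the email of any converted entry.
    { $match: { "contactList.v.email": email } },
    // Drop the temporary field so the original document shape is returned.
    { $project: { contactList: 0 } },
  ];
}

const pipeline = buildKeyedContactPipeline("store1#company2.com");
console.log(JSON.stringify(pipeline));
// Usage in the shell: db.company.aggregate(pipeline)
```

Because the array is computed per document at query time, this cannot use an index on the email, so it is likely to scan the whole collection.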
Since it was not clear, let me show what I have in mind:
db.company.save(
{
"company_name": "Company 1",
"contacts":[
{"email":"main#company1.com","name":"Mainuser", "label": "main"},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.save(
{
"company_name": "Company 2",
"contacts":[
{"email":"main#company2.com","name":"Mainuser", "label": "main"},
{"email":"store1#company2.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company2.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.createIndex( { "contacts.email" : 1 } ); // createIndex supersedes the deprecated ensureIndex
db.company.find( { "contacts.email" : "store1#company2.com" } );
This allows you to store many emails, and query with an index.