MongoDB Schema optimization for contacts

I am storing my contacts in MongoDB like this, but the main drawback of this schema is that I cannot store 40k-50k contacts in one document because of the 16 MB document size limit.
I want to change my schema now. Can anyone suggest the best way to redesign it?
Here is my sample document:
{
"_id" : ObjectId("5c53653451154c6da4623a77"),
"contacts" : [{
name:"",
email:"",
group:[ObjectId("5c53653451154c6da4623a79")]
}],
"groups" : [{
_id: ObjectId("5c53653451154c6da4623a79"),
group_name:"test"
}]
}

According to your sample document, contacts belong to groups.
In that scenario, there are different ways to arrive at a better schema:
1- Document embedding:
You will have an array of contacts inside each group document.
collection groups:
{
"_id": ObjectId("5c53653451154c6da4623a79"),
"group_name":"test",
"contacts": [
{
"name":"something",
"email":"something"
},
{
"name":"something else",
"email":"something else"
}
]
}
2- Document referencing:
You will have two collections - contacts and groups - and store a group reference inside each contact.
collection contacts:
{
"_id" : ObjectId("5c53653451154c6da4623a77"),
"name":"something",
"email":"something",
"groups":[ObjectId("5c53653451154c6da4623a79")]
},
{
"_id" : ObjectId("5c53653451154c6da4623a78"),
"name":"something else",
"email":"something else",
"groups":[ObjectId("5c53653451154c6da4623a79")]
}
collection groups:
{
"_id": ObjectId("5c53653451154c6da4623a79"),
"group_name":"test"
}
Why do we store the group reference inside the contact and not the other way around? Because we will probably have many more contacts than groups, so this gives us smaller documents with smaller reference arrays.
The path you follow depends largely on how many contacts you have per group. If that number is small, I would take the document embedding approach, for simplicity and ease of access. If you have a large number of contacts per group, I would use document referencing, so that each document stays small and well below the 16 MB limit.
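As a minimal sketch of moving to the referencing layout, the function below splits the original single document (with its contacts and groups arrays) into one document per contact and one per group. It works in memory on plain objects; the name splitContactsDocument and the use of strings instead of ObjectId values are assumptions for illustration. In a real migration you would write each resulting array to its own collection, for example with insertMany.

```javascript
// Hypothetical helper: split the old "everything in one document" layout
// into separate contact and group documents (document referencing).
// Ids are plain strings here; in MongoDB they would be ObjectId values.
function splitContactsDocument(oldDoc) {
  // Each embedded group becomes its own document in the "groups" collection.
  const groups = oldDoc.groups.map(function (g) {
    return { _id: g._id, group_name: g.group_name };
  });

  // Each embedded contact becomes its own document and keeps only the
  // references to its groups (renamed from "group" to "groups").
  const contacts = oldDoc.contacts.map(function (c) {
    return { name: c.name, email: c.email, groups: (c.group || []).slice() };
  });

  return { contacts: contacts, groups: groups };
}

// Usage with data shaped like the sample document in the question.
const result = splitContactsDocument({
  _id: "5c53653451154c6da4623a77",
  contacts: [{ name: "Ann", email: "ann@example.com", group: ["5c53653451154c6da4623a79"] }],
  groups: [{ _id: "5c53653451154c6da4623a79", group_name: "test" }]
});
console.log(result.contacts[0].groups); // [ '5c53653451154c6da4623a79' ]
```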

Related

How to find an ObjectId used as a foreign key in MongoDB

How can I find a given ObjectId value anywhere in the whole database, in any field?
It might be used as a reference in several collections and in various fields, for example:
"fourk_runs": [{
"Objectid": "6299b9f00f09ff045cc15d"
}],
"fourk_pass_fail": [{
"Objectid": "6299b9f00f09ff045cc152"
}],
"dr_runs": [{
"Objectid": "6299b9f00f09ff045cc154"
}],
I tried this command, but it does not work:
db.test.find( { $text: { $search: "4c8a331bda76c559ef04" } } )
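$text only works with a text index and matches words in indexed string fields, so it is not a reliable way to look up an id. If the fields that may hold the reference are known, one option is an $or filter over them, using dot notation to reach into the arrays. The field names below come from the snippet above; the helper name buildIdLookupFilter is hypothetical. The values in the snippet are strings, so the filter uses a string; if they were real ObjectId values you would pass ObjectId(...) instead. MongoDB has no single query that spans a whole database, so the filter has to be run against each collection separately.

```javascript
// Hypothetical helper: build a filter that matches a document if any of the
// listed array-of-subdocument fields contains an element with this Objectid.
function buildIdLookupFilter(id, arrayFields) {
  return {
    // Dot notation ("field.Objectid") matches if any array element has it.
    $or: arrayFields.map(function (field) {
      const clause = {};
      clause[field + ".Objectid"] = id;
      return clause;
    })
  };
}

const filter = buildIdLookupFilter("6299b9f00f09ff045cc15d", [
  "fourk_runs",
  "fourk_pass_fail",
  "dr_runs"
]);
// In mongosh this would be used as: db.test.find(filter)
console.log(JSON.stringify(filter));
```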
In MongoDB, references are mostly used to normalize data: the identifiers of the child documents (usually ObjectIds) are stored in the parent document. When reading the data, the application has to run several queries to retrieve it from multiple collections.
For example, take the following two collections, parts and products. A product contains many parts, which are referenced from the product document.
parts Collection:
db.parts.findOne()
{
_id : ObjectID('1111'),
partno : '7624-faef-2615',
name : 'bearing',
price: 20000
}
products Collection:
db.products.findOne()
{
name : 'wheel',
manufacturer : 'BMW',
catalog_number: 1134,
parts : [ // array of references to Part documents
ObjectID('1111'), // reference to the bearing above
ObjectID('3f3g'), // reference to a different Part
ObjectID('234r'),
// etc
]
}
To retrieve the parts for a particular product, first fetch the product document identified by its catalog number:
var product = db.products.findOne({catalog_number: 1134});
Then, fetch all the parts that are linked to this product:
db.parts.find({_id: { $in : product.parts } } ).toArray();
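The two round trips can also be folded into a single aggregation with $lookup, which, when localField is an array, matches each of its elements against foreignField. A sketch of that pipeline, using the collection and field names from the example; the output field name part_docs and the helper name are assumptions.

```javascript
// Build an aggregation pipeline that returns a product together with its
// referenced parts in a single round trip.
function productWithPartsPipeline(catalogNumber) {
  return [
    // Select the product by catalog number (same filter as the findOne above).
    { $match: { catalog_number: catalogNumber } },
    // Join: every id in the "parts" array is matched against parts._id.
    {
      $lookup: {
        from: "parts",
        localField: "parts",
        foreignField: "_id",
        as: "part_docs"
      }
    }
  ];
}

// In mongosh: db.products.aggregate(productWithPartsPipeline(1134))
console.log(JSON.stringify(productWithPartsPipeline(1134)));
```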

Dating app Schema pattern nosql, is array extraction viable?

I'm trying to build a dating app, and for the backend I'm using a NoSQL database. In the users collection, some relations exist between documents of the same collection. For example, user A can like or dislike another user, or may not have decided yet. A simple schema for this scenario is the following:
database = {
"users": {
"UserA": {
"_id": "jhas-d01j-ka23-909a",
"name": "userA",
"geo": {
"lat": "",
"log": "",
"perimeter": ""
},
"session": {
"lat": "",
"log": ""
},
"users_accepted": [
"j2jl-564s-po8a-oej2",
"soo2-ap23-d003-dkk2"
],
"users_rejected": [
"jdhs-54sd-sdio-iuiu",
"mbb0-12md-fl23-sdm2",
],
},
"UserB": {...},
"UserC": {...},
"UserD": {...},
"UserE": {...},
"UserF": {...},
"UserG": {...},
},
}
Here userA stores references to the users it has already seen and decided on, in either "users_accepted" or "users_rejected". If userC has not been seen (neither liked nor disliked) by userA, it appears in neither array. However, these arrays are unbounded and may exceed the maximum document size (16 MB). One approach is to extract both arrays into separate collections, giving the following schema:
database = {
"users": {
"UserA": {
"_id": "jhas-d01j-ka23-909a",
"name": "userA",
"geo": {
"lat": "",
"log": "",
"perimeter": ""
},
"session": {
"lat": "",
"log": ""
},
},
"UserB": {...},
"UserC": {...},
"UserD": {...},
"UserE": {...},
"UserF": {...},
"UserG": {...},
},
"likes": {
"id_27-82" : {
"user_give_like" : "userB",
"user_receive_like" : "userA"
},
"id_27-83" : {
"user_give_like" : "userA",
"user_receive_like" : "userC"
},
},
"dislikes": {
"id_23-82" : {
"user_give_dislike" : "userA",
"user_receive_dislike" : "userD"
},
"id_23-83" : {
"user_give_dislike" : "userA",
"user_receive_dislike" : "userE"
},
}
}
I need four basic queries:
1. Get the users that have liked UserA (show who is interested in UserA)
2. Get the users that UserA has liked
3. Get the users that UserA has disliked
4. Get the matches that UserA has
Query 1 is fairly simple: query the likes collection for the documents where "user_receive_like" is "userA".
Queries 2 and 3 are used to find the users that userA has not seen yet, i.e. the users returned by neither query.
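The results of queries 2 and 3 can be combined into a single "not yet seen" filter. This sketch assumes users are identified by the same value stored in the likes and dislikes documents (the name field here); the helper name notYetSeenFilter is hypothetical.

```javascript
// Hypothetical helper: given the results of queries 2 and 3, build a filter
// for the users that `self` has neither liked nor disliked.
function notYetSeenFilter(self, liked, disliked) {
  // Exclude the user itself and everyone already decided on (deduplicated).
  const excluded = Array.from(new Set([self].concat(liked, disliked)));
  return { name: { $nin: excluded } };
}

const f = notYetSeenFilter("userA", ["userC"], ["userD", "userE"]);
console.log(f.name.$nin); // [ 'userA', 'userC', 'userD', 'userE' ]
```

Note that the $nin list grows with every decision userA makes, so for very active users this filter itself becomes large.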
Finally, query 4 may need another collection:
"matches": {
"match_id_1": {
"user_1": "reference_user1",
"user_2": "reference_user2"
},
"match_id_2": {
"user_1": "reference_user3",
"user_2": "reference_user4"
}
}
Is this approach viable and efficient?
You are right to notice that these arrays are unbounded and pose a serious scalability problem for your application. If each user only held a small, bounded list (for example, 2-3 user roles), the first approach would be totally fine, but that is not your case. The official MongoDB documentation recommends avoiding unbounded arrays: https://www.mongodb.com/docs/atlas/schema-suggestions/avoid-unbounded-arrays/
Your second approach is the superior implementation choice for you, because:
you can build indexes such as { user_give_like: 1, user_receive_like: 1 }, which keep your queries fast even with 1M+ documents
you can store additional metadata (such as timestamps) on the likes collection without affecting the design of the users collection
the query for "matches" will be much simpler to write with this approach: https://mongoplayground.net/p/sFRvUniHKn8
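To illustrate the logic the matches query has to express (this is not the query from the linked playground), here is an in-memory sketch: userA matches every user who liked userA and whom userA also liked. In MongoDB the same idea would typically be a $lookup from likes back into likes; the function name findMatches is hypothetical.

```javascript
// In-memory illustration of the "matches" logic: a match is a pair of users
// who each liked the other. Input is an array of documents shaped like the
// "likes" collection in the question.
function findMatches(likes, user) {
  // Users that `user` has liked (query 2 from the question).
  const likedByUser = new Set(
    likes
      .filter(function (l) { return l.user_give_like === user; })
      .map(function (l) { return l.user_receive_like; })
  );
  // Users who liked `user` (query 1), kept only if `user` liked them back.
  return likes
    .filter(function (l) {
      return l.user_receive_like === user && likedByUser.has(l.user_give_like);
    })
    .map(function (l) { return l.user_give_like; });
}

const likes = [
  { user_give_like: "userB", user_receive_like: "userA" },
  { user_give_like: "userA", user_receive_like: "userC" },
  { user_give_like: "userA", user_receive_like: "userB" }
];
console.log(findMatches(likes, "userA")); // [ 'userB' ]
```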
More about NoSQL data modelling:
https://www.mongodb.com/docs/manual/data-modeling/ and
https://www.mongodb.com/docs/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
To answer your question, let me state a few more assumptions about the domain and then answer it.
Assumptions:
System should support scale for 100 million users
A single user might like or dislike ~100k users in its lifetime
Also, some theory about NoSQL: if a query has to go to all shards of a collection, the maximum scale of the system is limited by the capacity of a single shard, because every shard has to serve every such query.
With these assumptions, consider the performance of each query you asked about:
1. Get the users that have liked UserA (show who is interested in UserA)
Assuming the likes collection is sharded on user_give_like, a filter on user_receive_like has to be sent to all shards, which is not the right thing for scalability.
2. Get the users that UserA has liked
This works fine, because the likes collection is sharded on user_give_like.
3. Get the users that UserA has disliked
This works fine, because the dislikes collection is sharded on user_give_dislike.
4. Get the matches that UserA has
If we join all users against the users UserA has liked and disliked, the join becomes parallel queries on every shard, which does not scale when UserA has a huge number of likes or dislikes.
To conclude, this doesn't look like a reasonable approach to me.
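The scatter-gather argument can be made concrete with a toy model. Assuming the likes collection is sharded on user_give_like, a filter that includes the shard key is routed to one shard, while a filter on user_receive_like alone has to visit all of them. The shard count and hash function below are made up for illustration and are not how MongoDB actually distributes chunks.

```javascript
// Toy model of a collection sharded on user_give_like. The hash is a
// stand-in for illustration, not MongoDB's real hashing or chunk layout.
const SHARD_COUNT = 4;

function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) % SHARD_COUNT;
  return h;
}

// Return the shards a query must visit, given the fields in its filter.
function shardsTargeted(filter) {
  if (Object.prototype.hasOwnProperty.call(filter, "user_give_like")) {
    // Shard key present: the router can target exactly one shard.
    return [shardFor(filter.user_give_like)];
  }
  // No shard key in the filter: the query is broadcast to every shard.
  return Array.from({ length: SHARD_COUNT }, function (_, i) { return i; });
}

console.log(shardsTargeted({ user_give_like: "userA" }).length); // 1
console.log(shardsTargeted({ user_receive_like: "userA" }).length); // 4
```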

Index and sort on nested variable field MongoDB

I have a document that looks like this:
{
"_id": "chatID",
"presence": {
"userID1": 1647627240464,
"userID2": 1647227540464
}
}
For each userID, I need to query the chats where that user is present and sort them by the timestamp in the presence map.
I am aware that this is probably not the best approach. Previously I had one document per user, which meant duplicating each chat, but it's a pain to keep all the copies updated. Each copy looked like this:
{
"_id": "userID1chatID",
"at": 1647627240464,
"ids": "30EYwO01_Nyq7dMqe_O3vfL3AH",
"members": ["userID1", "userID2", "userID3"],
"owner": "userID1",
"present": true,
"uid": "chatID",
"url": "databaseURL"
}
This would allow me to find the chats where userID1 is present: true and order by at DESCENDING.
The problem is that I need to update the at attribute in all the documents (one per user) for the same chat room.
How can I run the same query while keeping a single document with presence as a map?
Problem: the index would have to be on a variable field name (userID1, userID2, etc.),
such as presence.userID1, which does not seem practical, especially since userID1 can be removed from the presence map when the user leaves the chat.
Please let me know if this is unclear, thanks in advance.
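For the one-document-per-user layout described in the question, the read and the fan-out update can each be expressed as a single operation. This sketch assumes owner is the user a copy belongs to and uid is the chat id; the collection name chats and the helper names are hypothetical. It does not solve the variable-key index problem of the map layout.

```javascript
// Hypothetical specs for the one-document-per-user layout, assuming "owner"
// is the user a copy belongs to and "uid" is the shared chat id.

// Chats where a user is present, newest first. A compound index such as
// { owner: 1, present: 1, at: -1 } lets MongoDB return them already sorted.
function presentChatsQuery(userId) {
  return { filter: { owner: userId, present: true }, sort: { at: -1 } };
}

// Refreshing "at" on every member's copy of one chat is one updateMany call,
// although MongoDB still writes one document per member.
function touchChatUpdate(chatId, timestamp) {
  return { filter: { uid: chatId }, update: { $set: { at: timestamp } } };
}

// In mongosh:
//   const q = presentChatsQuery("userID1");
//   db.chats.find(q.filter).sort(q.sort);
//   const u = touchChatUpdate("chatID", Date.now());
//   db.chats.updateMany(u.filter, u.update);
console.log(presentChatsQuery("userID1"));
```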

Querying MongoDB (Using Edge Collection - The most efficient way?)

Below are example Users, Clubs and Followers collections.
I want to find all user documents in the Users collection that follow "A famous club". How can I find them, and which way is fastest?
More info about 'what do I want to do - Edge collections'
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA"
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
Followers collection
{
"_id": "159",
"user_id": "1",
"club_id": "12"
}
PS: I can get the documents with Mongoose as shown below. However, building the followers array takes about 8 seconds with 150,000 records, and the second find query, which uses the followers array, takes about 40 seconds. Is that normal?
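async function getClubFollowers(clubId) {
  // Read the edges from the Followers collection (club_id and user_id live
  // on Followers; the original snippet called Clubs.find by mistake).
  const docs = await Followers.find(
    { club_id: clubId },
    '-_id user_id' // select only one field to better perf.
  ).lean(); // return plain objects instead of full Mongoose documents
  const followers = docs.map(function (item) {
    return item.user_id;
  });
  // Fetch the users referenced by those edges.
  const users = await Users.find({ _id: { $in: followers } }).lean();
  console.log(users); // RESULTS
  return users;
}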
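For comparison, the same edge-collection lookup can be done on the server in one round trip with $lookup, instead of sending 150,000 ids to the client and back. A sketch, assuming the collections are named followers and users and that user_id and the users' _id hold the same type (strings here); the helper name is hypothetical.

```javascript
// Server-side alternative to the two find() calls: start from the edge
// collection and join the user documents with $lookup in one round trip.
function clubFollowersPipeline(clubId) {
  return [
    // Fast only with an index on { club_id: 1 } on the followers collection.
    { $match: { club_id: clubId } },
    {
      $lookup: {
        from: "users", // assumed collection name behind the Users model
        localField: "user_id",
        foreignField: "_id",
        as: "user"
      }
    },
    // Each edge points to one user; unwrap it and return the user itself.
    { $unwind: "$user" },
    { $replaceRoot: { newRoot: "$user" } }
  ];
}

// With Mongoose: await Followers.aggregate(clubFollowersPipeline("12"));
console.log(JSON.stringify(clubFollowersPipeline("12")));
```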
There is no ideal way to handle a many-to-many join in MongoDB, so I combined the collections using embedded documents, as shown below. The most important task in this case is creating indexes. For instance, if you want to query by followingClubs, create an index such as schema.index({ 'followingClubs._id': 1 }) in Mongoose. If you want to query by country and followingClubs, create another index such as schema.index({ 'country': 1, 'followingClubs._id': 1 }).
Pay attention when working with Embedded Documents: http://askasya.com/post/largeembeddedarrays
With these indexes you can retrieve your documents quickly: counting 150,000 records this way took only about 1 second, which is enough for me.
PS: keep in mind that in my tests the Users collection never experienced any data fragmentation, so my queries may have performed better than they would otherwise, especially those on the followingClubs array of embedded documents.
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA",
"followingClubs": [ {"_id": "12"} ]
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
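A small sketch of the filters those two indexes serve on the embedded layout; the helper name followingClubFilter is hypothetical, and countDocuments is the standard mongosh/driver counting method.

```javascript
// Build the filter for "users following a club", optionally restricted to a
// country. With only a club id, the { "followingClubs._id": 1 } index applies;
// with a country as well, { country: 1, "followingClubs._id": 1 } can be used.
function followingClubFilter(clubId, country) {
  // Dot notation matches if any element of followingClubs has this _id.
  const filter = { "followingClubs._id": clubId };
  if (country !== undefined) filter.country = country;
  return filter;
}

// In mongosh: db.users.countDocuments(followingClubFilter("12", "USA"))
console.log(followingClubFilter("12", "USA"));
```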

Adding indexes in MongoDB

I currently have a MongoDB database which is pretty unstructured. I am attempting to extract all the followers of a given set of profiles on Twitter. My database looks like this:
{'123':1,
'123':2,
'123':3,
'567':8,
'567':9
}
Each key is a user and each value is a single follower. When I try to create an index on these keys, I run out of available indexes because I have a lot of users (8 million). After some searching, I found that a single collection can have at most 64 indexes. How do I index this database properly? Or would you suggest a different way to store my data?
You should structure your data differently.
I recommend a collection of "user" documents, where every user has a "followers" array. This array should contain the unique identifiers of the users who follow that user (such as name, _id or your own ID number).
{ name: "userA",
followers: [
"userB",
"userC"
]
},
{ name: "userB",
followers: [
"userD",
"userF"
]
},
You can then create an index on the followers field to quickly find the users that a given user follows (every document whose followers array contains that user); the followers of a single user are simply that user's followers array. To find the users that are followed by all of "userX", "userY" and "userZ", use this query:
db.users.find({followers: { $all: ["userX", "userY", "userZ" ] } });
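$all requires every listed value to be present in the array, while $in needs only one of them. A small in-memory sketch of the difference; it mimics the operators' array semantics on plain objects and does not call MongoDB. The sample data is made up.

```javascript
// In-memory illustration of $all vs $in on the "followers" array field.
const users = [
  { name: "userA", followers: ["userX", "userY", "userZ"] },
  { name: "userB", followers: ["userX"] },
  { name: "userC", followers: ["userQ"] }
];

// Like $all: the followers array must contain every listed value.
function matchesAll(doc, values) {
  return values.every(function (v) { return doc.followers.includes(v); });
}

// Like $in: the followers array must contain at least one listed value.
function matchesIn(doc, values) {
  return values.some(function (v) { return doc.followers.includes(v); });
}

const wanted = ["userX", "userY", "userZ"];
const names = function (u) { return u.name; };
console.log(users.filter(function (u) { return matchesAll(u, wanted); }).map(names)); // [ 'userA' ]
console.log(users.filter(function (u) { return matchesIn(u, wanted); }).map(names)); // [ 'userA', 'userB' ]
```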
Edit:
To add a follower to a user, use the $push operator ($addToSet instead adds the value only if it is not already in the array):
db.users.updateOne({name:"userA"}, { $push: { followers: "userB" } } );
The $pull operator can be used to remove array entries:
db.users.updateOne({name:"userA"}, { $pull: { followers: "userB" } } );