Index and sort on nested variable field MongoDB - mongodb

I have a document that looks like this :
{
"_id": "chatID"
"presence": {
"userID1": 1647627240464,
"userID2": 1647227540464
},
}
I need to query for each userID the chats where he is present and order by the timestamp in the presence map.
I am aware that this is probably not the best way to do, before i had 1 element per user meaning duplicating the chatIDs, but it's a pain to update them all because it would look like :
{
"_id": "userID1chatID",
"at": 1647627240464,
"ids": "30EYwO01_Nyq7dMqe_O3vfL3AH",
"members": ["userID1", "userID2", "userID3"],
"owner": "userID1",
"present": true,
"uid": "chatID",
"url": "databaseURL"
}
This would allow me to find the chats where userID1 is present: true and order by at DESCENDING.
The problem with this is that i need to update the at attribute for all the documents (one per user) for this same chat room.
How can i do this same query while maintaining a single document with present as a map ?
Problem : the index would be on a variable : userID1, userID2, etc...
like : present.userID1 and seems to not be convenient for use when userID1 can be removed from the present map if the user leaves the chat.
Please let me know if this is unclear, thanks in advance.

Related

MongoDB not using Index on simple find

I have a collection called "EN" and I created an index as follow:
db.EN.createIndex( { "Prod_id": 1 } );
When I run db.EN.getIndexes() I get this:
[{ "v": 2, "key": {
"_id": 1 }, "name": "_id_" }, { "v": 2, "key": {
"Prod_id": 1 }, "name": "Prod_id_1" }]
However, when I run the following query:
db.EN.find({'Icecat-interface.Product.#Prod_id':'ABCD'})
.explain()
I get this:
{ "explainVersion": "1", "queryPlanner": {
"namespace": "Icecat.EN",
"indexFilterSet": false,
"parsedQuery": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"queryHash": "D12BE22E",
"planCacheKey": "9F077ED2",
"maxIndexedOrSolutionsReached": false,
"maxIndexedAndSolutionsReached": false,
"maxScansToExplodeReached": false,
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"direction": "forward"
},
"rejectedPlans": [] }, "command": {
"find": "EN",
"filter": {
"ICECAT-interface.Product.Prod_id": "ABCD"
},
"batchSize": 1000,
"projection": {},
"$readPreference": {
"mode": "primary"
},
"$db": "Icecat" }, "serverInfo": {
It's using COLLSCAN instead of the index, why is this happening?
MongoDB version is 5.0.9-8
Thanks
EDIT (and solution)
It turns that the field name has "#" in front and the index was created without this character so was not picking it up at all.
Once I created a new index using the field name as it was supposed to be it worked OK.
It was interesting though to see how indexing works and best practices
Your find operation is defined as
.find({'Icecat-interface.Product.#Prod_id':'ABCD'})
What is Icecat-interface.Product.#?
The parsedQuery in the explain output confirms that MongoDB is attempting to look for a document that has has a value of "ABCD" for a different field name than the one you have aindexed. From the explain you've provided, that field name is "ICECAT-interface.Product.Prod_id". As the field name being queried and the one that is indexed are different, MongoDB cannot use the index to perform the operation.
Marginally related, the # character that is used in the find is absent in the explain output. This appears to because the actual operation that was used to generate the explain was slightly different. This is also noticeable by the fact that the explain include a batchSize of 1000 which is absent in the operation that was shown as the one being explained.
Depending on what the Icecat-interface.Product.# prefix is supposed to be, the solution is probably to simply remove that from the query predicate in the find itself.
Edit to respond to the comment and the edit to the question. Regarding the comment first:
When I run this: .find({'Prod_id':'ABCD'}) it uses COLLSCAN which to me is wrong, as I have an index on that field, unless I'm missing something here
MongoDB will look to use an index if its first key is used by the query. So an index on { y: 1 } would not be eligible for use by a query of .find({ x: 1}). Similarly to a generic x and y example, Icecat-interface.Product.Prod_id and Prod_id are different field names. So if you query on one but only an index on the other exists, then a collection scan is the only way for the database to execute the query.
This then overlaps some with the edit to the question. In the edited question the new explain plan shows the database successfully using an index. However, that index is { "ICECAT-interface.Product.Prod_id": 1 } which is not the index that you originally show being created or present on the collection ({ "Prod_id": 1 }).
Moreover, you also mention that you "don't get any result back, even with products I know are in the DB". Which field in the database contains the value that you are searching on ('ABCD')? This is going to directly inform what results you get back and what index is used to find the results. Remember that you can search on any arbitrary field in MongoDB, even if it doesn't exist in the database.
I would recommend some extra attention be paid to the namespaces and field names that are being used. Unless this { "ICECAT-interface.Product.Prod_id": 1 } index was created after the db.EN.getIndexes() output was gathered, you may be inadvertently connecting to different systems or namespaces since that index is definitely present somewhere.
Based on your live comments while I'm writing this, seems like you've solved the field name mystery.

Dating app Schema pattern nosql, is array extraction viable?

I'm trying to build a dating app, and for my backend, I'm using a nosql database. When it comes to the user's collection, some relations are happening between documents of the same collection. For example, a user A can like, dislike, or may haven’t had the choice yet. A simple schema for this scenario is the following:
database = {
"users": {
"UserA": {
"_id": "jhas-d01j-ka23-909a",
"name": "userA",
"geo": {
"lat": "",
"log": "",
"perimeter": ""
},
"session": {
"lat": "",
"log": ""
},
"users_accepted": [
"j2jl-564s-po8a-oej2",
"soo2-ap23-d003-dkk2"
],
"users_rejected": [
"jdhs-54sd-sdio-iuiu",
"mbb0-12md-fl23-sdm2",
],
},
"UserB": {...},
"UserC": {...},
"UserD": {...},
"UserE": {...},
"UserF": {...},
"UserG": {...},
},
}
Here userA has a reference from the users it has seen and made a decision, and stores them either in “users_accepted” or “users_rejected”. If User C hasn’t been seen (either liked or disliked) by userA, then it is clear that it won’t appear in both of the arrays. However, these arrays are unbounded and may exceed the max size that a document can handle. One of the approaches may be to extract both of these arrays and create the following schema:
database = {
"users": {
"UserA": {
"_id": "jhas-d01j-ka23-909a",
"name": "userA",
"geo": {
"lat": "",
"log": "",
"perimeter": ""
},
"session": {
"lat": "",
"log": ""
},
},
"UserB": {...},
"UserC": {...},
"UserD": {...},
"UserE": {...},
"UserF": {...},
"UserG": {...},
},
"likes": {
"id_27-82" : {
"user_give_like" : "userB",
"user_receive_like" : "userA"
},
"id_27-83" : {
"user_give_like" : "userA",
"user_receive_like" : "userC"
},
},
"dislikes": {
"id_23-82" : {
"user_give_dislike" : "userA",
"user_receive_dislike" : "userD"
},
"id_23-83" : {
"user_give_dislike" : "userA",
"user_receive_dislike" : "userE"
},
}
}
I need 4 basic queries
Get the users that have liked UserA (Show who is interested in userA)
Get the users that UserA has liked
Get the users that UserA has disliked
Get the matches that UserA has
The query 1. is fairly simple, just query the likes collection and get the users where "user_receive_like" is "userA".
Query 2. and 3. are used to get the users that userA has not seen yet, get the users that are not in query 2. or query 3.
Finally query 4. may be another collection
"matches": {
"match_id_1": {
"user_1": "referece_user1",
"user_2": "referece_user2"
},
"match_id_2": {
"user_1": "referece_user3",
"user_2": "referece_user4"
}
}
Is this approach viable and efficient?
You are right to notice, that these arrays are unbounded and pose a serious scalability problem for your application. If you were to assign 2-3 user roles to a user with the 1st approach it would be totally fine, but it is not the case for you. The official MongoDB documentation suggests that you should not use unbounded arrays: https://www.mongodb.com/docs/atlas/schema-suggestions/avoid-unbounded-arrays/
Your second approach is the superior implementation choice for you, because:
you can build indices of form (user_give_dislike, user_receive_like) which will improve your query performance even in case when you have 1M+ documents
you can store additional metadata, (like timestamps etc) on the likes collection without affecting the design of the user collection
the query for "matches" will be much simpler to write with this approach: https://mongoplayground.net/p/sFRvUniHKn8
More about NoSQL data modelling:
https://www.mongodb.com/docs/manual/data-modeling/ and
https://www.mongodb.com/docs/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
To answer your question , let me write some more assumptions about the domain and then lets try to answer it.
Assumptions:
System should support scale for 100 million users
A single user might like or dislike ~100k users in its lifetime
Also some thoery about nosql , if our queries go to all the shards of the collection then max scale of the system depends on the scale of the single shard
Now with these assumptions see the query performance of the question that you asked :
Get the users that have liked UserA (Show who is interested in userA) -
Assuming we are doing sharding or user_give_like column then if we filter on user_receive_like then it will do query on all shards , which is not the right thing for scalability
Get the users that UserA has liked
This will work fine as we have created shard based on user_give_like
Get the users that UserA has disliked
This will work fine as we have created shard based on user_give_dislike
Get the matches that UserA has
In this case if we do a join between existing users and all users which UserA has liked and disliked this will create a parallel query on all shard and is not scalable when UserA like or dislike has huge count
Now to conclude this dosen't look like a reasonable approach to me.

How to make a cloudant query to find documents which two fields are equal

I need to get all documents whose e.g. "_id" field equal to another document field, e.g. "appId"
{
"_id": "xxxx-xxxx-xxxx-xxxx",
"_rev": "xxxx-xxxx-xxxx-xxxx",
"header": {
"appId": "xxxx-xxxx-xxxx-xxxx"
So what would be the query?
"selector": {
"_id": {
"$eq": header.appId
}
},
You can't do "sub queries" with Mango.
From what I see, you're trying to get all the documents listed by appId.
This could be done by using a view.
Your map function would be the following:
if(doc.header && doc.header.appId){
emit(doc.doc.header.appId,{_id: doc.header.appId});
}
The result would be a list of documents mapped by doc.header.appId.
If you query the view with ?include_docs=true, the documents would be joined to the response since we're doing a ManyToJoin join.

Should I use selector or views in Cloudant?

I'm having confusion about whether to use selector or views, or both, when try to get a result from the following scenario:
I need to do a wildsearch for a book and return the result of the books plus the price and the details of the store branch name.
So I tried using selector to do wildsearch using regex
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after getting the result is to use all the _id's from the result set and query to views to get more details like price and store branch name on other documents, which I feel is kind of odd and I'm not certain is that the correct way to do it.
Below is just the idea once I get the result of _id's and insert it as a "productId" variable.
var input = {
method : 'GET',
returnedContentType : 'json',
path : 'test/_design/app/_view/find_price'+"?keys=[\""+productId+"\"]",
};
return WL.Server.invokeHttp(input);
so I'm asking for input from an expert regarding this.
Another question is how to get the store_branch_name? Can it be done in a single view where we can get the product detail, prices and store branch name? Or do I need to have several views to achieve this?
expected result
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
relationship
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
Store
{
"_id": "store1",
"_rev": "1...2",
"store_name": "Store 1 Name",
"type": "stores",
"branch": [
{
"branch_id": "store1_branch1",
"branch_name": "store 1 branch one",
"address": {
"street": "some address",
"postalcode": "33490",
"type": "addresses"
},
"geolocation": {
"coordinates": [
42.34493,
-71.093232
],
"type": "point"
},
"type": "storebranch"
},
{
"branch_id": "store1_branch2",
"branch_name":
**details ommit...**
}
]
}
In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's a more advanced post on this that covers the tradeoffs between the two types of indexes https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
if (doc.type==="user") {
emit( [doc._id], null );
}
else if (doc.type==="edge:follower") {
emit( [doc.user, doc.follows], {"_id":doc.follows} );
}
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
I think that's as much detail that's appropriate for here. Hope it helps!

Querying MongoDB (Using Edge Collection - The most efficient way?)

I've written Users, Clubs and Followers collections for the sake of an example the below.
I want to find all user documents from the Users collection that are following "A famous club". How can I find those? and Which way is the fastest?
More info about 'what do I want to do - Edge collections'
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA"
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
Followers collection
{
"_id": "159",
"user_id": "1",
"club_id": "12"
}
PS: I can get the documents using Mongoose like the below way. However, creating followers array takes about 8 seconds with 150.000 records. And second find query -which is queried using followers array- takes about 40 seconds. Is it normal?
Clubs.find(
{ club_id: "12" },
'-_id user_id', // select only one field to better perf.
function(err, docs){
var followers = [];
docs.forEach(function(item){
followers.push(item.user_id)
})
Users.find(
{ _id:{ $in: followers } },
function(error, users) {
console.log(users) // RESULTS
})
})
There is no an eligible formula to manipulate join many-to-many relation on MongoDB. So I combined collections as embedded documents like the below. But the most important taks in this case creating indexes. For instance if you want to query by followingClubs you should create an index like schema.index({ 'followingClubs._id':1 }) using Mongoose. And if you want to query country and followingClubs you should create another index like schema.index({ 'country':1, 'followingClubs._id':1 })
Pay attention when working with Embedded Documents: http://askasya.com/post/largeembeddedarrays
Then you can get your documents fastly. I've tried to get count of 150.000 records using this way it took only 1 second. It's enough for me...
ps: we musn't forget that in my tests my Users collection has never experienced any data fragmentation. Therefore my queries may demonstrated good performance. Especially, followingClubs array of embedded documents.
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA",
"followingClubs": [ {"_id": "12"} ]
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}