CouchDB - an outer join on a many to many relationship - nosql
I have a CouchDB database with three sets of documents: Items, Users, and Reviews.
A many-to-many relationship between Items and Users is maintained in the Review documents.
User
{"type":"user","user_id":"U1"},
{"type":"user","user_id":"U2"},
{"type":"user","user_id":"U3"}
Item
{"type":"item","item_id":"I1"},
{"type":"item","item_id":"I2"},
{"type":"item","item_id":"I3"},
{"type":"item","item_id":"I4"}
Review
{"type":"review","item_id":"I1","user_id":"U1","score":4},
{"type":"review","item_id":"I1","user_id":"U2","score":3},
{"type":"review","item_id":"I2","user_id":"U1","score":4},
{"type":"review","item_id":"I3","user_id":"U3","score":1}
I want an outer join of Items and Users through Reviews that produces the result below.
Intended Result
{"total_rows":16,"offset":0,"rows":[
{"id":"...","key":"I1","value":["item"]},
{"id":"...","key":"I1","value":["review","U1",4]},
{"id":"...","key":"I1","value":["review","U2",3]},
{"id":"...","key":"I1","value":["review","U3",0]},
{"id":"...","key":"I2","value":["item"]},
{"id":"...","key":"I2","value":["review","U1",4]},
{"id":"...","key":"I2","value":["review","U2",0]},
{"id":"...","key":"I2","value":["review","U3",0]},
{"id":"...","key":"I3","value":["item"]},
{"id":"...","key":"I3","value":["review","U1",0]},
{"id":"...","key":"I3","value":["review","U2",0]},
{"id":"...","key":"I3","value":["review","U3",1]},
{"id":"...","key":"I4","value":["item"]},
{"id":"...","key":"I4","value":["review","U1",0]},
{"id":"...","key":"I4","value":["review","U2",0]},
{"id":"...","key":"I4","value":["review","U3",0]}
]}
I am following the tips described here:
http://wiki.apache.org/couchdb/EntityRelationship#Many_to_Many:_Relationship_documents
"many_join_review": {
"map": "function(doc) {
  if (doc.type == 'review') {
    emit(doc.item_id, [doc.type, doc.user_id, doc.score]);
  } else if (doc.type == 'item') {
    emit(doc.item_id, [doc.type]);
  }
}"
}
Instead, I am getting the result below:
{"total_rows":8,"offset":0,"rows":[
{"id":"d5e26b9da1683232d1c208241a0056fc","key":"I1","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a006bc8","key":"I1","value":["review","U1",4]},
{"id":"d5e26b9da1683232d1c208241a006be0","key":"I1","value":["review","U2",3]},
{"id":"d5e26b9da1683232d1c208241a005eb0","key":"I2","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a0075cf","key":"I2","value":["review","U1",4]},
{"id":"d5e26b9da1683232d1c208241a006409","key":"I3","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a008313","key":"I3","value":["review","U3",1]},
{"id":"d5e26b9da1683232d1c208241a0065d7","key":"I4","value":["item"]}
]}
What should I change to get the intended result, that is, a score of 0 for every item/user pair that has no review? Do I need to add something to the reduce section?
Thanks
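A map function is called with one document at a time, so by itself it can only emit rows for review documents that actually exist; it cannot emit a row for an item/user pair that has no review. One possible workaround, sketched here in plain JavaScript and not as a CouchDB feature, is to fetch the items, users, and reviews and fill in the zero scores in application code. The helper name `outerJoinReviews` and the in-memory arrays are assumptions made for illustration.

```javascript
// Hypothetical helper: builds the "outer join" rows in application code.
function outerJoinReviews(items, users, reviews) {
  // Index existing scores by "item_id|user_id" for constant-time lookup.
  const scores = new Map();
  for (const r of reviews) {
    scores.set(r.item_id + "|" + r.user_id, r.score);
  }
  const rows = [];
  for (const item of items) {
    rows.push({ key: item.item_id, value: ["item"] });
    for (const user of users) {
      const k = item.item_id + "|" + user.user_id;
      // Pairs without a review document default to a score of 0.
      const score = scores.has(k) ? scores.get(k) : 0;
      rows.push({ key: item.item_id, value: ["review", user.user_id, score] });
    }
  }
  return rows;
}

// Sample data copied from the question.
const users = [{ user_id: "U1" }, { user_id: "U2" }, { user_id: "U3" }];
const items = [{ item_id: "I1" }, { item_id: "I2" }, { item_id: "I3" }, { item_id: "I4" }];
const reviews = [
  { item_id: "I1", user_id: "U1", score: 4 },
  { item_id: "I1", user_id: "U2", score: 3 },
  { item_id: "I2", user_id: "U1", score: 4 },
  { item_id: "I3", user_id: "U3", score: 1 }
];

const rows = outerJoinReviews(items, users, reviews);
console.log(rows.length); // 16
```

The cost grows with items times users, so this only makes sense when both sets are small enough to hold in memory.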
Related
Get Product Data from shopify GraphQL for over 10000 Products
I have an extremely large collection of products (140,000). Fetching the data for 250 is fine, but I need the list of tags for all 140,000 products, so I created a bulkOperationRunQuery to get the data. Here is the mutation I run:
mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            id
            tags
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}
This works but takes far too long to process. How can I make it quicker? Is there a set limit on the request?
That is all you get for a massive ask like that. If you have 140,000 products, you ask for them once. Then you have them, and speed should be of little consequence. There is no need to repeat yourself by asking for them again and again. If you are interested in changes, just listen to product change webhooks. Save yourself a lot of grief that way.
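The idea of fetching everything once and then applying changes as they arrive can be sketched as a small in-memory cache. This is only an illustration: the helper names `seedFromBulkExport` and `applyProductUpdate`, and both data shapes (an array of tags in the export, an `id` plus a comma-separated `tags` string in the change event), are assumptions, so check the real payloads before relying on them.

```javascript
// Hypothetical tag cache: product id -> array of tags.
const tagCache = new Map();

// Seed the cache once from the bulk export (assumed shape: { id, tags: [...] }).
function seedFromBulkExport(products) {
  for (const p of products) {
    tagCache.set(p.id, p.tags.slice());
  }
}

// Apply one product-change event (assumed shape: { id, tags: "a, b" }).
function applyProductUpdate(payload) {
  const tags = payload.tags
    .split(",")
    .map(t => t.trim())
    .filter(t => t.length > 0);
  tagCache.set(payload.id, tags);
}

seedFromBulkExport([{ id: "p1", tags: ["red"] }, { id: "p2", tags: [] }]);
applyProductUpdate({ id: "p2", tags: "sale, blue" });
console.log(tagCache.get("p2")); // [ 'sale', 'blue' ]
```

With this shape, the expensive bulk query runs once and each later change touches only one cache entry.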
Firestore rules and data structure
I have a question regarding data structure and rules. I have content on which users can vote, something like this Firestore object:
{
  name: "Cat",
  description: "A cat named Cat",
  votes: 56
}
I want authenticated users to have update access to the votes but not to any other values of the object, and of course read rights, since the content has to be displayed. I kept the votes in the same object because I wanted to avoid additional queries when displaying the content. Should I instead create another collection, "votes", where the votes are kept, and make an additional request for each document to get them?
In rules, you have access to the state of the data both before and after the write, so you can test specific fields to be sure they have not changed:
function existing() {
  return resource.data;
}
function resulting() {
  return request.resource.data;
}
function matchField(fieldName) {
  return existing()[fieldName] == resulting()[fieldName];
}
....
allow update: if matchField("name") && matchField("description")
....
The functions just make the rule easier to read.
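To see what the rule checks, the same comparison can be mimicked in plain JavaScript. This is not Firestore rules syntax, only a sketch of the logic: `before` and `after` stand in for `resource.data` and `request.resource.data`, and `updateAllowed` is a hypothetical name.

```javascript
// Returns true when the field has the same value before and after the write.
function matchField(before, after, fieldName) {
  return before[fieldName] === after[fieldName];
}

// Mirrors: allow update: if matchField("name") && matchField("description")
function updateAllowed(before, after) {
  return matchField(before, after, "name") &&
         matchField(before, after, "description");
}

const before = { name: "Cat", description: "A cat named Cat", votes: 56 };
const voteUpdate = { ...before, votes: 57 };     // only votes changed
const renameAttempt = { ...before, name: "Dog" }; // protected field changed

console.log(updateAllowed(before, voteUpdate));    // true
console.log(updateAllowed(before, renameAttempt)); // false
```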
FireStore - how to get around array "does-not-contain" queries
After some research, it seems clear that I cannot use FireStore to query for documents whose array does NOT contain a given value. Does anyone have a workaround for this use case?
After a user signs up, the app fetches a bunch of cards that each have a corresponding "card" document in FireStore. After a user interacts with a card, the card document adds the user's uid to an array field (ex: usersWhoHaveSeenThisCard: [userUID]) and the "user" document adds the card's uid to an array field (ex: cardsThisUserHasSeen: [cardUID]). The "user" documents live in a "user" collection and the "card" documents live in a "card" collection.
Currently, I'd like to fetch all cards that a user has NOT interacted with. However, this is problematic, as I only know the cards that a user has interacted with, so a .whereField(usersWhoHaveSeenThisCard, arrayContains: currentUserUID) will not work; I'd need an "arrayDoesNotContain" statement, which does not exist. Finally, a user cannot own a card, so I cannot create a true/false boolean field in the card document (ex: userHasSeenThisCard: false) and search on that criterion.
The only solution I can think of would be to create a new array field on the card document that includes every user who has NOT seen a card (ex: usersWhoHaveNotSeenThisCard: [userUID]), but that means that every user who signs up would have to write their uid to 1000+ card documents, which would eat up my data.
I might just be out of luck, but am hoping someone more knowledgeable with NoSQL / FireStore could provide some insight.
// If any code sample would help, please let me know and I'll update. I think this is largely conceptual as of now.
As you've discovered from query limitations, there is no easy workaround for this using Cloud Firestore alone. You will need to somehow store a list of documents seen, load that into memory in the client app, then manually subtract those documents from the query results of all potential documents. You might want to consider augmenting your app with another database that can do this sort of operation more cleanly (such as a SQL database that can perform joins and subqueries), and maintain them in parallel. Either that, or require all the documents to be seen in a predictable order, such as by timestamp. Then all you have to store is the timestamp of the last document seen, and use that to filter the results.
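Both suggestions can be sketched in plain JavaScript over arrays that are already loaded in the client. This is an in-memory illustration only, with no Firestore calls; the field names `id` and `createdAt` and both function names are assumptions.

```javascript
// Approach 1: load the seen ids, then subtract them from all candidate cards.
function unseenCards(allCards, seenIds) {
  const seen = new Set(seenIds);
  return allCards.filter(card => !seen.has(card.id));
}

// Approach 2: cards are shown in timestamp order, so only the timestamp of
// the last card seen needs to be stored per user.
function cardsAfter(allCards, lastSeenTimestamp) {
  return allCards
    .filter(card => card.createdAt > lastSeenTimestamp)
    .sort((a, b) => a.createdAt - b.createdAt);
}

const cards = [
  { id: "card_a", createdAt: 100 },
  { id: "card_b", createdAt: 200 },
  { id: "card_c", createdAt: 300 }
];

const unseen = unseenCards(cards, ["card_a", "card_c"]).map(c => c.id);
const next = cardsAfter(cards, 100).map(c => c.id);
console.log(unseen); // [ 'card_b' ]
console.log(next);   // [ 'card_b', 'card_c' ]
```

The first approach needs the full list of seen ids in memory; the second trades that for a fixed viewing order, and in a real app the timestamp filter would normally be part of the query instead of a client-side filter.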
There is an accepted and good answer; however, it doesn't provide a direct solution to the question, so here goes (this may or may not be helpful, but it does work). I don't know exactly what your Firestore structure is, so here's my assumption:
cards
  card_id_0
    usersWhoHaveSeenThisCard
      0: uid_0
      1: uid_1
      2: uid_2
  card_id_1
    usersWhoHaveSeenThisCard
      0: uid_2
      1: uid_3
  card_id_2
    usersWhoHaveSeenThisCard
      0: uid_1
      1: uid_3
Suppose we want to know which cards uid_2 has not seen, which in this case is card_id_2:
func findCardsUserHasNotSeen(uidToCheck: String, completion: @escaping ([String]) -> Void) {
    let ref = self.db.collection("cards")
    ref.getDocuments(completion: { snapshot, err in
        if let err = err {
            print(err.localizedDescription)
            return
        }
        guard let docs = snapshot?.documents else {
            print("no docs")
            return
        }
        var documentsIdsThatDoNotContainThisUser = [String]()
        for doc in docs {
            // A missing or malformed field is treated as "nobody has seen this card"
            let uidArray = doc.get("usersWhoHaveSeenThisCard") as? [String] ?? []
            if !uidArray.contains(uidToCheck) {
                documentsIdsThatDoNotContainThisUser.append(doc.documentID)
            }
        }
        completion(documentsIdsThatDoNotContainThisUser)
    })
}
Then the use case looks like this:
func checkUserAction() {
    let uid = "uid_2" // the user id to check
    self.findCardsUserHasNotSeen(uidToCheck: uid, completion: { result in
        if result.count == 0 {
            print("user: \(uid) has seen all cards")
            return
        }
        for docId in result {
            print("user: \(uid) has not seen: \(docId)")
        }
    })
}
and the output:
user: uid_2 has not seen: card_id_2
This code goes through the documents, gets the array of uids stored in each document's usersWhoHaveSeenThisCard field, and determines whether the uid is in the array. If not, it adds that documentID to the documentsIdsThatDoNotContainThisUser array. Once all docs have been checked, the array of documentIDs that do not contain the user id is returned.
Knowing how fast Firestore is, I ran the code against a large dataset and the results were returned very quickly so it should not cause any kind of lag for most use cases.
MongoDB Social Network Adding Followers
I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users, I want to display a list like Facebook's, with the User Name, Picture, and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
  {
    _id: ObjectId("520534b81c9aac710d000002"),
    profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
    name: "Mark Rogers",
  },
  {
    _id: ObjectId("51f26293a5c5ea4331cb786a"),
    name: "The Palace Bar",
    profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
  }
]
The question is: what is the best strategy to keep track of the number of Followers & Following for each User? If I include the number of Followers / Following as part of the embedded document, i.e.
follows: [
  {
    _id: ObjectId("520534b81c9aac710d000002"),
    profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
    name: "Mark Rogers",
    followers: 10,
    following: 400
  }
]
then every time a User follows someone, multiple updates are required across all the embedded documents. Since the consistency of this data isn't really important (i.e. showing someone I have 10 instead of 11 followers isn't the end of the world), I can queue this update. Is this approach OK, or can anyone suggest a better one?
You're on the right track. Think about which operation is performed more often: determining the number of followers/following, or changing the number of followers/following? Even if you're caching the output of the followers/following calculation, it's still going to be performed one or two orders of magnitude more often than changing the number. Also, think about the opposite: if you really need to display the number of followers/following for each of those users, you'll have to do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calculations).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation).
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent, whereas the counts have to be displayed on demand, and I think it's a pretty easy decision to cache it.
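The difference between the two options can be sketched with an in-memory stand-in for the users collection. This is a simplified illustration, not MongoDB code: the `users` map and the function names are assumptions, and it does not model the embedded copies of each user, which is where the O(N) write cost of Option 1 comes from.

```javascript
// In-memory stand-in for a users collection (hypothetical shape).
const users = new Map();

function addUser(id) {
  users.set(id, { id, follows: new Set(), followers: 0, following: 0 });
}

// Option 1: keep cached counters up to date at write time.
function follow(followerId, followeeId) {
  const follower = users.get(followerId);
  const followee = users.get(followeeId);
  if (follower.follows.has(followeeId)) return; // already following
  follower.follows.add(followeeId);
  follower.following += 1;
  followee.followers += 1;
}

// Option 2: count followers at read time by scanning every user.
function countFollowers(userId) {
  let n = 0;
  for (const u of users.values()) {
    if (u.follows.has(userId)) n += 1;
  }
  return n;
}

["joe", "mark", "palace"].forEach(addUser);
follow("joe", "mark");
follow("palace", "mark");
follow("joe", "palace");

console.log(users.get("mark").followers); // 2, read from the cached counter
console.log(countFollowers("mark"));      // 2, computed by a full scan
```

Reading the cached counter is a single lookup, while the scan touches every user on every display, which is the asymmetry the answer relies on.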
I've gone ahead and implemented the followers/following update based on the strategy recommended by Mason (Option 1). Here's my code in Node.js and Mongoose, using the Async.js waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan is to farm most of this off to a queue.
async.waterfall([
  function (callback) {
    /** find & update the person we are following */
    // new: true returns the document after the $inc, so the counts copied below are current
    Model.User
      .findByIdAndUpdate(id, { $inc: { followers: 1 } }, { upsert: true, new: true, select: { fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1 } })
      .lean()
      .exec(callback);
  },
  function (followee, callback) {
    /** find & update the person doing the following */
    var query = {
      $inc: { following: 1 },
      $addToSet: { follows: followee }
    };
    Model.User
      .findByIdAndUpdate(credentials.username, query, { upsert: true, new: true, select: { fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1 } })
      .lean()
      .exec(function (err, follower) {
        callback(err, follower, followee);
      });
  },
  function (follower, followee, callback) {
    /** update the following count in every embedded copy */
    // lean() returns plain objects, so use _id (the id virtual is not available);
    // no upsert here, because the positional $ operator should not be combined with upsert
    Model.User
      .update({ 'follows._id': follower._id }, { 'follows.$.following': follower.following }, { multi: true }, function (err) {
        callback(err, followee);
      });
  },
  function (followee, callback) {
    /** update the followers count in every embedded copy */
    Model.User
      .update({ 'follows._id': followee._id }, { 'follows.$.followers': followee.followers }, { multi: true }, callback);
  }
], function (err) {
  if (err) next(err);
  else {
    res.send(HTTPStatus.OK);
    next();
  }
});
MongoDB - get 1 last message from each conversation?
I have a collection for conversations:
{_id: ..., from: userA, to: userB, message: "Hello!", datetime: ...}
I want to show a preview of the user's conversations: the last message from each conversation between the current user and any other user. When the user clicks on one of these "last messages", he goes to the next page with all messages between him and that user.
How do I do that (get the last message from each conversation) without Map/Reduce?
1) Use the "distinct" command? (How?)
2) Set a "last" flag on the last message? I think that's not very safe...
3) ..?
I was writing up a complicated answer to this question using cursors and a lot of advanced query features and stuff... it was painful and confusing. Then I realized that it's painful because it's not really how MongoDB expects you to do things. What I think you should do is just denormalize the data and solve this problem in one shot. Here's how:
Put a hash/object field on your User called most_recent_conversations.
When you make a new conversation with another user, update it so that it looks like this:
previewUser.most_recent_conversations[userConversedWith._id] = newestConversation._id
Every time you make a new conversation, simply smash the value for the users involved in their hashes with the newer conversation id. Now we have a { uid: conversationId, ... } structure, which is basically the preview data we need.
Now you can look up the most recent conversation (or N conversations, if you make each value of the hash an array) simply:
var previewUser = db.users.findOne({ _id: someId });
var recentIds = [];
for (var uid in previewUser.most_recent_conversations) {
  recentIds.push(previewUser.most_recent_conversations[uid]);
}
var recentConversations = db.conversations.find({ _id: { $in: recentIds } });
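The write side of this denormalization can be sketched with plain objects standing in for the users collection. This is an in-memory illustration, not MongoDB shell code: `usersById` and `recordMessage` are hypothetical names, and the message shape follows the question's `{_id, from, to, message}` documents.

```javascript
// In-memory stand-in for the users collection (hypothetical data).
const usersById = {
  userA: { _id: "userA", most_recent_conversations: {} },
  userB: { _id: "userB", most_recent_conversations: {} },
  userC: { _id: "userC", most_recent_conversations: {} }
};

// On every new message, overwrite the entry for the other participant
// on both users, so each hash always points at the newest message.
function recordMessage(message) {
  usersById[message.from].most_recent_conversations[message.to] = message._id;
  usersById[message.to].most_recent_conversations[message.from] = message._id;
}

recordMessage({ _id: "m1", from: "userA", to: "userB", message: "Hello!" });
recordMessage({ _id: "m2", from: "userB", to: "userA", message: "Hi!" });
recordMessage({ _id: "m3", from: "userA", to: "userC", message: "Hey" });

console.log(usersById.userA.most_recent_conversations);
// { userB: 'm2', userC: 'm3' }
```

Each message costs two small writes, and the preview page then needs one user lookup plus one `$in` query, as in the read-side code above.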