Using Mongo Map Reduce for Notification System


I'm implementing a notification system in MongoDB. Basically, I create notification items and subscribe users to them. Whenever an action happens on a notification item, I log it as a document in an actions collection, so I can query a user's notifications from that collection.
So far, so good.
Now I want to group actions by their type and notification item (say follow, comment, like) and sort the result by latest action time, so I can build summaries like "John, Lisa and 3 others like your bla bla".
To achieve this I implemented a mapReduce in MongoDB. It works, but it doesn't look right,
because mapReduce generates a collection as output.
In this case:
User1 logs in, mapReduce runs and generates the notification collection, then
User2 logs in, mapReduce runs and generates a notification collection with the same name (which was already created for another user).
I'm curious about what happens if multiple users trigger the mapReduce at the same time (which is quite possible, it's a social network).
I thought that maybe I could create a unique notification collection for each user, notification_$userid, but for 1M users, 1M collections doesn't sound right.
By the way, I started with MongoDB's group but changed to mapReduce because I want to sort the result set.
Thank you.
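
For illustration only, here is a rough sketch of the grouping and sorting described above, done in plain JavaScript on in-memory data. The field names (itemId, type, actor, at) are assumptions, not the asker's schema. The pipeline constant shows roughly how the same shape could be expressed with the aggregation framework, which returns its results to the caller instead of writing them to a shared output collection; it is not executed here.

```javascript
// Hypothetical action documents; all field names are assumptions.
const actions = [
  { itemId: 'post1', type: 'like',    actor: 'John', at: 100 },
  { itemId: 'post1', type: 'like',    actor: 'Lisa', at: 140 },
  { itemId: 'post1', type: 'comment', actor: 'Mike', at: 120 },
  { itemId: 'post2', type: 'follow',  actor: 'Anna', at: 160 },
];

// Group by (itemId, type), collect the actors and the latest timestamp,
// then sort the groups so the most recent activity comes first.
function summarize(list) {
  const groups = new Map();
  for (const a of list) {
    const key = `${a.itemId}|${a.type}`;
    if (!groups.has(key)) {
      groups.set(key, { itemId: a.itemId, type: a.type, actors: [], latest: -Infinity });
    }
    const g = groups.get(key);
    g.actors.push(a.actor);
    g.latest = Math.max(g.latest, a.at);
  }
  return [...groups.values()].sort((x, y) => y.latest - x.latest);
}

// Roughly the same shape as an aggregation pipeline (shown as data only).
const pipeline = [
  { $group: {
      _id: { itemId: '$itemId', type: '$type' },
      actors: { $push: '$actor' },
      latest: { $max: '$at' },
  } },
  { $sort: { latest: -1 } },
];

const summary = summarize(actions);
console.log(summary.map(g => `${g.itemId}/${g.type}: ${g.actors.join(', ')}`));
```

A per-user filter (for example a $match stage on the subscribed items) would go in front of the grouping step.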

Related

How to organize Firestore Collections and Documents based on app similar to BlaBlaCar rides

It's my first time working with Firestore. I'm working on a ridesharing app with Flutter that uses Firebase Auth, where users can create trips and offer rides similarly to BlaBlaCar, and other users can send requests to join a ride. I'm having difficulty deciding not only which collections and paths to use, but also how to structure them at all.
For simplicity at this stage, I want any user to be able to see all trips created, but when they go to their “My Rides” page, they will only see the rides that they’ve participated in. I would be grateful for any kind of feedback.
Here are the options I’ve considered:
Two collections, “Users” and “Trips”. The path would look something like this:
users/uid and trips/tripsId with a created_by field
One collection of “Users” and a sub-collection of “Trips”. This path seems to make more sense to me, which would be users/uid/trips/tripId, but then I don't know how other users could access all the rides on their home feed.
I'm inclined to go with the first option of two collections. Also very open to any other suggestions or help. Thanks.

I want any user to be able to see all trips created, but when they go
to their “My Rides” page, they will only see the rides that they’ve
participated in
I make the assumption that participating in a ride is either being the author or being a passenger of the ride.
I would go for 2 collections: one for users and one for trips. In a trip document you add two fields:
createdBy with the uid of the creator
participants: an Array where you store the author's uid and all the other participants uids (passengers)
This way you can easily query for:
All the rides
All the rides created by a user
All the rides for which a user is a participant, using arrayContains.
(Regarding the 1 MiB maximum size of a document, I guess this is not a problem: the number of passengers on a ride shouldn't be so large that the Array field pushes the document over 1 MiB.)
Note that the second approach with subcollections could also be used, since you can query across them with collection group queries, but based on the elements in your question I don't see any technical advantage.
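
To make the two-collection model concrete, here is a small in-memory sketch of the three queries listed above. The trip IDs and uids are made up, and the filters only mimic what Firestore would do server-side; in the web SDK the last one corresponds to a `where('participants', 'array-contains', uid)` clause.

```javascript
// Hypothetical trip documents following the createdBy / participants layout.
const trips = [
  { id: 't1', createdBy: 'alice', participants: ['alice', 'bob'] },
  { id: 't2', createdBy: 'bob',   participants: ['bob'] },
  { id: 't3', createdBy: 'carol', participants: ['carol', 'alice'] },
];

// All the rides (home feed).
const allTrips = () => trips;

// All the rides created by a user.
const tripsCreatedBy = uid => trips.filter(t => t.createdBy === uid);

// All the rides a user participates in ("My Rides"), author included.
const myRides = uid => trips.filter(t => t.participants.includes(uid));

console.log(myRides('alice').map(t => t.id));
```

Because the author's uid is stored in participants as well, "My Rides" needs only the single array query.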

Should I create a separate model (collection) for this?

I am building a small web app with MERN. I have a users collection that holds "name, email, password, avatar url, and date", and I am going to add some info to the users such as "bio, hobbies (array), visited countries (array), and another array".
The question is: should I create a different model for the users' info and add an owner field that refers to the other model, or should I put all of it in the existing one?
Also, I might add a following and followers option in the future.

The user's info should be in the user collection; I see no reason to have a separate collection for it. If you want to reduce the size of the responses when listing users, you could use a projection (for example Mongoose's select) to exclude unnecessary fields.
Regarding following and followers, I think there are 2 approaches:
Add a new field to the existing collection that stores the id and necessary metadata (name, avatar) of the related users.
Create a new collection that links users to the users they are following or are followed by. You could then use a Virtual to get this information from the User collection.
Personally, I prefer the first approach, although it requires more effort to keep the list accurate, e.g. removing an item from the list when a follower stops following you.
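
A minimal sketch of the first approach, kept in memory so it runs without a database. The user ids, names, and the follow/unfollow helpers are all hypothetical; the point is only that both embedded lists have to be updated together.

```javascript
// Hypothetical users with embedded following/followers lists.
const users = new Map([
  ['u1', { name: 'Ann', avatar: 'ann.png', following: [], followers: [] }],
  ['u2', { name: 'Ben', avatar: 'ben.png', following: [], followers: [] }],
]);

// The small copy of user metadata that gets embedded in the other user's list.
const snapshotOf = id => {
  const u = users.get(id);
  return { id, name: u.name, avatar: u.avatar };
};

function follow(followerId, targetId) {
  const follower = users.get(followerId);
  const target = users.get(targetId);
  if (follower.following.some(u => u.id === targetId)) return; // already following
  follower.following.push(snapshotOf(targetId));
  target.followers.push(snapshotOf(followerId));
}

function unfollow(followerId, targetId) {
  const follower = users.get(followerId);
  const target = users.get(targetId);
  // Both sides must be cleaned up, which is the maintenance cost mentioned above.
  follower.following = follower.following.filter(u => u.id !== targetId);
  target.followers = target.followers.filter(u => u.id !== followerId);
}

follow('u1', 'u2');
console.log(users.get('u2').followers);
```

The embedded name and avatar are copies, so a real implementation would also need to refresh them when a user edits their profile.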

Flutter firestore structure for query condition

I am new to NoSQL and I'm trying to figure out a good way to represent my data. I have a series of workers that need to request vacations via mobile app.
When I try to write a Firebase query with Flutter, I can do this:
Firestore.instance
    .collection("ferie_permessi")
    .document("worker1#test.com")
    .snapshots();
It works, but there are two main problems:
If I try to create another collection called "Worker info", I cannot use worker1#test.com as the document ID because it already exists;
I have to sort data client side because Firestore doesn't give me the possibility (with the setup I've made).
I'm quite sure that this structure isn't good at all. Each worker needs to have 2 lists: vacations and other. What is wrong?
My guess is that I should move worker1#test.com together with vacations and other so that I can make a query of this kind:
Firestore.instance
    .collection("ferie_permessi")
    .where("user", isEqualTo: "worker1#test.com")
    .snapshots();
But now the id? Is an automatic one good?

I recently had a chance to explore creating an app using Firebase Firestore. A couple of things will help here:
Yes, an autogenerated ID is fine since it is unique. For example, you can have the collections vacation_requests and users, and then use the user's ID as the document key under vacation_requests (vacation_requests -> user_id -> vacations) instead of using the email as the document key.
Or
You can do it like this, with the collections users, vacation_requests, and requests:
store user details in users.
store requests in requests with from and to dates.
store a reference to the User and the Request in vacation_requests.
Hope this helps!
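
As a rough illustration of the idea (not actual Firestore code), the sketch below keeps every request under a generated ID with the worker stored as a field, so requests can be filtered per worker and sorted by date. The autoId helper, the field names, and the worker IDs are assumptions made for the example.

```javascript
// Stand-in for Firestore's auto-generated document IDs (assumption).
let nextId = 0;
const autoId = () => `req_${++nextId}`;

// One flat collection of requests; the worker is a field, not the document key.
const requests = new Map();

function addRequest(user, kind, from, to) {
  const id = autoId();
  requests.set(id, { user, kind, from, to });
  return id;
}

addRequest('worker1', 'vacation', '2020-08-10', '2020-08-14');
addRequest('worker2', 'vacation', '2020-07-01', '2020-07-03');
addRequest('worker1', 'other',    '2020-06-02', '2020-06-02');

// Filter by worker and sort by start date (ISO date strings sort lexicographically).
function requestsFor(user) {
  return [...requests.values()]
    .filter(r => r.user === user)
    .sort((a, b) => a.from.localeCompare(b.from));
}

console.log(requestsFor('worker1'));
```

With the worker stored as a field, the same worker ID can also be reused as a key in other collections without clashing.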

Firestore - How to perform "NOT IN" like you would in SQL

I have a collection of "quizes" that users will participate in. When a user takes a quiz, I create a document in the "results" collection with that userId and quizId. I want my app to pull all docs from the "quizes" collection excluding the ones that the user has taken. In SQL I would use a "NOT IN" clause to accomplish that, but I have no idea how best to approach this in Firestore.

There's no equivalent query in Firestore. You would need to pull all the data and determine which docs are relevant on the client side.
Alternatively, you can create a list of all quizzes for each user and maintain this list. You could add and remove quizzes for each user as they become relevant/irrelevant to show them.
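
The client-side filtering from the first suggestion could look roughly like this once both sets of documents have been fetched. The data is hard-coded here and the document shapes are assumptions based on the question.

```javascript
// Hypothetical documents, as they might look after being fetched.
const quizzes = [{ id: 'q1' }, { id: 'q2' }, { id: 'q3' }];
const results = [
  { userId: 'u1', quizId: 'q2' },
  { userId: 'u2', quizId: 'q1' },
];

// Emulate "NOT IN": collect the quiz IDs this user has taken,
// then keep only the quizzes whose ID is not in that set.
function untakenQuizzes(userId) {
  const taken = new Set(
    results.filter(r => r.userId === userId).map(r => r.quizId)
  );
  return quizzes.filter(q => !taken.has(q.id));
}

console.log(untakenQuizzes('u1').map(q => q.id));
```

Using a Set keeps the exclusion check cheap even when a user has taken many quizzes.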

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Usecase & problem:
A user wants to search through all the users to find friends, but I cannot afford to have each user request the complete users collection. The user filters the search using checkboxes, and I'd like to subscribe only to the matched users. What is the best way to do it?
I guess it would be better to create the query client-side, then send it to the method to get back the desired set of users. But I wonder: when the filtering criteria change, does the new subscription erase all of the old one? If I do a first search which returns [usr1, usr3, usr5], and after that a search which returns [usr2, usr4], the best would be to keep the first set and simply add the new one to it in the client-side subscribed collection.
In addition, if I then do a third search which should return [usr1, usr3, usr2, usr4], the autorun subscription would not need to send me anything, as I already have the whole result set in my collection.
The goal is to spare processing and data transfer from the server.
I have some ideas, but I haven't coded enough of them yet to share them in an easily comprehensible way.
What would you advise me to do to be as efficient as possible in terms of time and performance?
Thank you all.
David

It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
  check(search, String);
  // make sure the user is logged in and that search is sufficiently long
  if (!(this.userId && search.length > 2))
    return [];
  // search by case insensitive regular expression
  var selector = {username: new RegExp(search, 'i')};
  // only publish the necessary fields
  var options = {fields: {username: 1}};
  return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
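
To picture the bookkeeping in the example above, here is a tiny stand-alone sketch of a set difference. It only illustrates the idea and is not how Meteor implements it internally; the diffSets name is made up.

```javascript
// Compute which document IDs would be removed and which added
// when a client moves from one published set to another.
function diffSets(previousIds, nextIds) {
  const prev = new Set(previousIds);
  const next = new Set(nextIds);
  return {
    removed: previousIds.filter(id => !next.has(id)),
    added: nextIds.filter(id => !prev.has(id)),
  };
}

const diff = diffSets([1, 2, 3], [2, 3, 4]);
console.log(diff);
```

Documents 2 and 3 appear in neither list, which matches the point that unchanged documents are not resent.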
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
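
The second alternative (accumulating subscriptions) might be organised along these lines. To keep the sketch runnable without Meteor, fakeSubscribe stands in for Meteor.subscribe, and the publication name 'usersByFilter' is hypothetical.

```javascript
// Stand-in for Meteor.subscribe so the sketch runs on its own (assumption).
// Like a real subscription handle, the returned object has a stop() method.
function fakeSubscribe(name, params) {
  return {
    name,
    params,
    active: true,
    stop() { this.active = false; },
  };
}

const handles = [];

// Each new search starts an additional subscription instead of replacing the old one.
function search(params, subscribe = fakeSubscribe) {
  const handle = subscribe('usersByFilter', params);
  handles.push(handle);
  return handle;
}

// Call this when the template is destroyed to release every subscription.
function stopAll() {
  while (handles.length > 0) {
    handles.pop().stop();
  }
}

const first = search({ height: 5 });
const second = search({ eyes: 'blue' });
console.log(handles.length);
```

Capping the number of live handles would be one way to address the concern about eventually publishing the whole collection.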

If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (e.g. filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filters (e.g. Users.find(filters)). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it on distance from IP address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, and maybe try out composite indexes. But remember that more indexes mean slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.
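
The arithmetic above can be checked with a small simulation of a growing limit. The ranked list is generated data and loadMore is a hypothetical helper; in a real app the limit would be passed to the subscription instead of slicing an array.

```javascript
// 200 hypothetical candidates, already ranked from most to least likely friend.
const ranked = Array.from({ length: 200 }, (_, i) => ({ id: i + 1, score: 200 - i }));

const PAGE_SIZE = 20;
let limit = PAGE_SIZE;

// The documents the client currently holds.
const visible = () => ranked.slice(0, limit);

// "Load more" raises the limit; only the next page is new to the client.
function loadMore() {
  limit = Math.min(limit + PAGE_SIZE, ranked.length);
  return visible();
}

loadMore();
loadMore(); // the user stops after seeing 60 candidates

const saved = ranked.length - visible().length;
console.log(`sent ${visible().length}, saved ${saved}`);
```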
