How do I do this in the MongoDB aggregation framework?
given these records
record 1. {id:1, action:'clicked', user:'id', time:'1'}
record 2. {id:2, action:'video play', user:'id', time:'2'}
record 3. {id:3, action:'page load', user:'id', time:'3'}
record 4. {id:4, action:'video play', user:'id', time:'4'}
record 5. {id:5, action:'clicked', user:'id', time:'5'}
record 6. {id:6, action:'video play', user:'id', time:'6'}
Now, how do I get all the "video play" actions that come after a "clicked" action? Has anybody come across this kind of aggregation?
You will need to redesign your schema. I can think of two approaches. In your application you can track the click path of a session: when you insert an action into your collection, also record the previous interaction. Once you have this, you just need to do something like db.actions.find({prevAction:"clicked", action:"video play"}).count(). This will be very fast.
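A minimal sketch of that in the mongo shell, assuming a prevAction field that your application fills in on each insert (the field name is an assumption, not anything MongoDB provides):

db.actions.insert({ id: 7, action: 'video play', prevAction: 'clicked', user: 'id', time: '7' });
db.actions.find({ prevAction: 'clicked', action: 'video play' }).count();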
Alternatively, if you decide you'd like to track session click-path information, you may have a document like:
{
  _id: sessionId,
  user: userId,
  actions: [
    {...login},
    {...click link},
    {...play video}
  ]
}
You can build this collection with upserts. Make sure you keep the action subdocuments small so you don't exceed the 16MB limit for standard documents, and set the collection to use power-of-2 record allocation (the usePowerOf2Sizes collection flag).
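For example, a sketch of such an upsert in the mongo shell (the sessions collection name and the sessionId/userId values are assumptions):

var sessionId = 's1', userId = 'u1'; // placeholders for your real session/user ids
db.sessions.update(
    { _id: sessionId },
    { $set: { user: userId }, $push: { actions: { action: 'video play', time: '4' } } },
    { upsert: true }
);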
Once you have this collection, you can pull out these documents to get some interesting information. The specific aggregation that you want would be more complex on this collection than with my first suggestion, though. You would need a map-reduce process, possibly run periodically behind the scenes, to calculate what you want (emit a key/value pair only if the actions array contains the expected sequence of actions).
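A rough sketch of such a map-reduce job in the mongo shell, assuming the sessions collection above and counting "video play" actions that directly follow a "clicked" action:

var mapFn = function () {
    // walk the actions array and emit once per "video play" that directly follows a "clicked"
    for (var i = 1; i < this.actions.length; i++) {
        if (this.actions[i].action === 'video play' && this.actions[i - 1].action === 'clicked') {
            emit(this._id, 1);
        }
    }
};
var reduceFn = function (key, values) { return Array.sum(values); };
db.sessions.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });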
I have a database with millions of objects. Every day I will present three selected objects to each user, and as with Tinder they can swipe left to say they don't like one or swipe right to say they like it.
I select each object based on its location (the closest to the user are selected first) and also based on a few user settings.
I'm on MongoDB.
Now the problem: how do I design the database so that it can quickly provide each day's selection of objects to show to the end user (skipping all the objects they have already swiped)?
Well, considering you have made your choice of MongoDB, you will have to maintain multiple collections: your main collection, plus user-specific collections that hold user data, say the document ids the user has swiped. Then, when you want to fetch data, you can do a $setDifference aggregation. $setDifference does this:
Takes two sets and returns an array containing the elements that only
exist in the first set; i.e. performs a relative complement of the
second set relative to the first.
Now how performant this is would depend on the size of your sets and the overall scale.
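A sketch of what that aggregation could look like in the mongo shell (the objects collection, the per-user user_swipes document, and its swipedIds array are all assumptions):

var userId = 'user42'; // placeholder
var swipedDoc = db.user_swipes.findOne({ _id: userId });
var swiped = swipedDoc ? swipedDoc.swipedIds : [];
db.objects.aggregate([
    { $match: { /* location and user-settings filters go here */ } },
    { $group: { _id: null, candidates: { $push: '$_id' } } },  // collect candidate ids into one array
    { $project: { unseen: { $setDifference: ['$candidates', swiped] } } } // drop the already-swiped ids
]);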
EDIT
I agree with your comment that this is not a scalable solution.
Solution 2:
One solution I can think of is a graph-based store, like Neo4j. You could represent all your 1M objects and all your users as nodes, with relationships between a user and the objects they have swiped. Your query would be to return a list of all objects the user is not connected to.
You cannot shard a graph, which brings scaling challenges, and graph-based solutions generally require the entire graph to be in memory. So the feasibility of this solution depends on your setup.
Solution 3:
Use MySQL. Have two tables, one being the objects table and the other a (uid, viewed_object) mapping table. A join would solve your problem. Joins work well for a long time, until you hit a certain scale, so I don't think this is a bad starting point.
Solution 4:
Use Bloom filters. Your problem eventually boils down to a set-membership problem: given a set of ids, check whether an id is part of another set. A Bloom filter is a probabilistic data structure that answers set membership. They are super small and super efficient. But yes, it's probabilistic: false negatives will never happen, but false positives can, so that's the trade-off. Check out this post for how it's used: http://blog.vawter.com/2016/03/17/Using-Bloomfilters-to-Avoid-Repetition/
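If you want to see the mechanics, here is a minimal Bloom filter sketch in plain JavaScript (everything here is illustrative; it is not the API of any particular library):

function BloomFilter(bits, hashes) {
    this.bits = new Uint8Array(Math.ceil(bits / 8)); // packed bit array
    this.size = bits;
    this.hashes = hashes; // number of hash functions
}
BloomFilter.prototype._hash = function (value, seed) {
    // simple FNV-style string hash, varied by seed; good enough for a sketch
    var h = (2166136261 ^ seed) >>> 0;
    for (var i = 0; i < value.length; i++) {
        h = Math.imul(h ^ value.charCodeAt(i), 16777619) >>> 0;
    }
    return h % this.size;
};
BloomFilter.prototype.add = function (value) {
    for (var s = 0; s < this.hashes; s++) {
        var idx = this._hash(value, s);
        this.bits[idx >> 3] |= 1 << (idx & 7); // set one bit per hash
    }
};
BloomFilter.prototype.mightContain = function (value) {
    for (var s = 0; s < this.hashes; s++) {
        var idx = this._hash(value, s);
        if (!(this.bits[idx >> 3] & (1 << (idx & 7)))) return false; // definitely never added
    }
    return true; // probably added (false positives possible)
};

// usage: remember swiped object ids without storing the full set
var seen = new BloomFilter(8 * 1024 * 1024, 7); // ~1MB of bits
seen.add('objectId123');
seen.mightContain('objectId123'); // true
seen.mightContain('objectId999'); // false (or, rarely, a false positive)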
I'll update the answer if I can think of something else.
I'm working on an app with my team. It's based on Meteor and React. We have two collections, Rooms and Locations; each room has a unique location. We have a page where we list all the rooms and can filter them, and this is the most used feature. Inserting a new room or a new location can be done only by the admin.
We are designing our filters (by date, by floor, by time, by location name). All the properties we need are in the Rooms collection, the exception being the location name. We came up with two solutions:
duplicate the location name in each room of the Rooms collection, so the filter works on Rooms alone;
keep the two collections separate and look up the location name for each room at filter time.
I'm trying to figure out which one is best.
First option:
In this case we only need one collection, Rooms, and filtering will cost O(n). The cost of adding the location name to a new room is essentially the same as today, since we already have to add the location id; the extra cost is the space on MongoDB to store the duplicated name.
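A sketch of option 1 in the mongo shell (field names are assumptions): the location name is duplicated onto each room, so one query on a single collection answers the filter.

db.rooms.insert({
    name: 'Room A',
    floor: 2,
    locationId: ObjectId('507f191e810c19729de860ea'), // reference into Locations
    locationName: 'Building West' // duplicated from Locations for filtering
});
db.rooms.find({ locationName: 'Building West', floor: 2 });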
Second option:
In this solution all the data is well structured in the DB, but to filter by location we need to scan each room and find the matching location in the Locations collection. I think this alone will cost O(n*m).
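For comparison, a sketch of option 2 in the shell (field names again assumed): resolve the matching location ids first, then filter rooms with $in, rather than scanning rooms against locations pairwise.

var locationIds = db.locations.find({ name: 'Building West' }, { _id: 1 })
    .map(function (l) { return l._id; });
db.rooms.find({ locationId: { $in: locationIds }, floor: 2 });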
This is a simple case and we will never scale much, but since I'm new to Mongo I would like to know which of the two approaches gives better performance.
How would one find the Collection from a Document object in MongoDB? I'm developing a small game in Meteor and I've run into a scenario where I have two document types that are similar (Players and Monsters) and get used in similar places (both move, initiate combat, can pick things up, etc.).
This is all server side code.
I have server-side functions that use these documents interchangeably, and I won't know which collection to execute updates against.
I was hoping that there was a sort of "save()" method similar to that of Mongoose but there doesn't appear to be.
Is there a way to introspect a record after executing:
var player = Players.findOne({query});
to know that the result of that query came from Players, so it can be written back to that collection later in another function?
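One hedged workaround, since Meteor documents are plain objects: stamp each document with its collection name when you insert it, and keep a server-side lookup map (the collection field and the save helper below are assumptions, not Meteor API):

var collections = { players: Players, monsters: Monsters };

function save(doc) {
    // doc.collection was stored on the document at insert time, e.g. 'players'
    var id = doc._id;
    delete doc._id; // _id itself cannot be modified by $set
    collections[doc.collection].update(id, { $set: doc });
}

var player = Players.findOne({ name: 'Alice' });
player.hp -= 10;
save(player); // routed back to Players via player.collection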
I have a use case where I need to get a list of objects from Mongo based on a query, but to improve performance I am adding pagination.
So in the first call I get a list of, say, 10 objects; in the next call I need 10 more. But I cannot use offset and pageSize directly, because objects among the first 10 displayed on the page may have been modified [or deleted] in the meantime.
The solution is to take the ObjectId of the last object returned and retrieve the next 10 objects after that ObjectId.
Please help me do this efficiently using Morphia with Mongo.
Using Morphia you can do this with the following query:
datastore.find(YourClass.class).field("_id").smallerThan(lastId).limit(10).order("-ts");
Since you are querying for the items that come after the last retrieved id, you won't have to deal with deleted items.
One thing I thought of is that you will have the same problem here as with skip() unless you intend to change how your interface works.
Using ranged queries like this demands a different kind of interface, since it is much harder to detect exactly which page you are on and how many pages lie ahead, especially if you are doing this to avoid the problems of conventional paging.
The type of interface that typically arises from this kind of paging is an infinitely scrolling page; think of YouTube video comments, the Facebook wall feed, or Google+. There is no physical pagination or "pages"; instead you have a "get more" button.
This is the type of interface you will need if you want ranged paging to work well.
As for the query, @cubbuk gives a good example:
datastore.find(YourClass.class).field("_id").smallerThan(lastId).limit(10).order("-ts");
Except it should be greaterThan(lastId), since you want to find everything above that last _id. I would also sort by _id, unless you create your ObjectIds some time before you insert the record; in that case you can instead sort on a specific timestamp field set at insert time.
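For illustration, the equivalent ranged query in the mongo shell (collection name assumed):

var lastId = ObjectId('507f191e810c19729de860ea'); // _id of the last item from the previous page
db.items.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10);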
I took a shortcut earlier and made the primary key of my Mongo collection by concatenating various fields to create a "unique id".
I would now like to change it to actually use the ObjectId. What's the best approach? I have a little over 3M documents and would like this to be as little disruptive as possible.
A simple approach would be to bring the site down for a bit and copy every document into a new collection that uses ObjectIds, but I'd like to keep the application running if I can. I imagine one way would be to write to both collections for a period while the migration happens, but that would require me to maintain two similar code bases, so I wonder if there's a way to avoid all that.
To provide some additional information:
It's just one collection, and it's not referenced by any others. I have a separate MySQL database that contains some values used to build the queries that read from this MongoDB collection.
I'm using the PyMongo/MongoEngine libraries to interact with MongoDB from Python, and I don't know whether it's even possible to just change the primary key of a collection.
You shouldn't bring your site down for any reason if it does not go down itself. :)
No matter how many millions of records you have, the solution depends on how you use your ids.
If you cross-reference documents in different collections using these ids, then for every updated object you will also have to update all other objects that reference it.
As a first step, your system should be updated to stop creating new objects the old way. If your system lets you do this easily, then you can update your database very easily; if this change is not easy to make, then your system has some architectural problems and you should fix those first. If that is the situation, please update your question so I can update my answer.
Since I don't know anything about your applications and data, this will be general. Let's call the collection you want to update coll_bad_id. Every item in this collection is referenced in other collections like coll_poor_guy and coll_wisdom_searcher. I would run over coll_bad_id one item at a time, like this:
1. read one item
2. update its _id to the new style of _id
3. insert the item back into the collection
-- now we have two copies of the same item: one with the old-style id, one with the new
4. update every item referencing this one to use the new-style id
5. remove the duplicate item with the old-style id from the collection
One thing you should keep in mind is that BSON ObjectIds embed date/time data, which can be very useful. Since you rebuild all these objects on one day, their ObjectIds will not reflect the correct creation times; for newly added items they will. You can note the first newly added item as the milestone marking the items whose ids carry correct creation times.
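(In the shell you can read that embedded time with ObjectId's getTimestamp(), e.g. ObjectId('507f191e810c19729de860ea').getTimestamp().)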
UPDATE: code sample to run in the Mongo shell.
This is not the most efficient way to do it, but it is safe to run, since we never remove anything before adding it back with a new _id. It would be better to do this in small batches by adding query conditions to the find() call.
var cursor = db.testcoll.find().snapshot(); // snapshot() keeps the cursor from returning the copies we insert below
cursor.forEach(function(item) {
    var oldId = item._id;      // save the old _id so we can remove the old copy below
    delete item._id;           // when we insert an item without an _id, Mongo generates a new ObjectId
    db.testcoll.insert(item);  // add the item back under the new _id
    db.testcoll.remove({ _id: oldId }); // delete the item with the bad _id (remove() takes a query document)
});