I have a MongoDB collection with an array field that represents the lists the user is a member of.
user {
screen_name: string
listed_in: ['list1', 'list2', 'list3', ...] // could be more than 10,000 elements (I'm aware of the BSON 16 MB limit)
}
I am using the *listed_in* field to get the members of a list:
db.user.find({'listed_in': 'list2'});
I also need to query for a specific user and know whether they are a member of certain lists:
var user1 = db.user.findOne({'screen_name': 'user1'});
In this case I will get the *listed_in* field with all its members.
My question is:
Is there a way to pre-compute custom fields in MongoDB?
I would need to be able to get fields like these: user1.isInList1, user1.isInList2.
Right now I have to do it on the client side by iterating through the *listed_in* array to know if the user is a member of "list1", but *listed_in* could have thousands of elements.
My question is: Is there a way to pre-compute custom fields in MongoDB?
Not really. MongoDB does not have any notion of "computed columns". So the query you're looking for doesn't exist.
Right now I have to do it on the client side by iterating through the *listed_in* array to know if the user is a member of "list1", but *listed_in* could have thousands of elements.
In your case you're basically trying to push a client-side for loop onto the server. However, some process still has to run that loop, and frankly, looping through 10k items is not really that much work for either client or server.
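If the check does stay on the client, the per-lookup cost can at least be amortized: build a Set from *listed_in* once, and each membership check becomes O(1) instead of a full array scan. A minimal sketch (the document shape is taken from the question):

```javascript
// User document as fetched from MongoDB; listed_in could hold thousands of entries.
const user1 = {
  screen_name: "user1",
  listed_in: ["list1", "list2", "list3"],
};

// Build the Set once per fetched document...
const memberOf = new Set(user1.listed_in);

// ...then every isInListX check is a constant-time lookup.
const isInList1 = memberOf.has("list1"); // true
const isInList4 = memberOf.has("list4"); // false
```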
The only real savings here is preventing extra data on the network.
If you really want to save that network traffic, you will need to restructure your data model. The restructuring will likely mean two queries to read and write, but less data over the wire. That's the trade-off.
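One such restructure (a sketch, not the only option; the `memberships` collection and its field names are made up for illustration) is to store one small document per (user, list) pair. Membership then becomes a cheap yes/no query instead of shipping the whole array:

```javascript
// Hypothetical memberships collection: one document per (user, list) pair.
// In the shell, the two reads would be roughly:
//   db.memberships.find({ list: "list2" })                    // members of list2
//   db.memberships.findOne({ user: "user1", list: "list1" })  // null => not a member
// Simulated here with a plain array, to show the access pattern:
const memberships = [
  { user: "user1", list: "list1" },
  { user: "user1", list: "list2" },
  { user: "user2", list: "list2" },
];

// "Who is in list2?" — returns only user names, not giant arrays.
const membersOfList2 = memberships
  .filter((m) => m.list === "list2")
  .map((m) => m.user);

// "Is user1 in list1?" — a single small document answers it.
const user1InList1 = memberships.some(
  (m) => m.user === "user1" && m.list === "list1"
);
```

The trade-off from above applies: a follow/unfollow now touches the membership collection as well as anything cached on the user, but each read moves only the data it needs.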
Related
Maybe the title is not very clear, but I'll try to explain.
There are two collections in mongo:
groups
users
Groups are created by users.
The UI sends /groups/1/10 to read the first 10 groups. We don't want to return groups whose creators (users) have been deleted.
Example:
UI makes call: /groups/1/10
Let us say only 8 records are available because 2 users have been deleted from the system, so their groups are not available.
What should we do?
Should the UI make another request, like /groups/1/2?
Should we read, say, 20 groups, take the first 10 available ones, and return them? This may not work well for the second or third page.
There is not enough information here to give a specific answer; in particular, we would need to know more about the schema you are using. We'll try to give some general details that might point things in the right direction. We are also assuming that your endpoints are structured as /groups/<pageNumber>/<pageSize>.
Broadly speaking, if the client calls /groups/1/10 and there are (at least) 10 valid matching results, then the system should return 10 results.
It's not clear what you mean when you say:
only 8 records are available because 2 users are deleted from the system, hence their groups are not available ... Should UI make another request like: /groups/1/2 ?
The first part of that statement implies that there are only 8 valid results, but the second part implies that there are at least 2 more valid results that can be retrieved. If there are 10 valid results then they should all be returned.
How you accomplish this depends on how invalid groups and/or deleted users are represented in your system. If, for example, the documents in your groups collection have some sort of valid field that becomes false when the user who created it gets deleted, then you should be applying a filter to remove those results such as:
db.groups.find({ valid: true }).limit(10)
If instead each group document references the user who created it, then you may need something a bit more complex, along the lines of an aggregation that does a $lookup on the users collection followed by a $match to remove from the results the groups whose creators have been deleted.
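Such a pipeline could look roughly like the following; the `creatorId` and `_id` field names are assumptions to be adjusted to your schema. The real shell syntax is in the comment, and the same filter is simulated in memory below it to show what the pipeline keeps:

```javascript
// In the mongo shell, roughly:
//
// db.groups.aggregate([
//   { $lookup: { from: "users", localField: "creatorId",
//                foreignField: "_id", as: "creator" } },
//   { $match: { creator: { $ne: [] } } },  // empty lookup => creator deleted
//   { $limit: 10 },
// ]);
//
// In-memory simulation of the same filter:
const users = [{ _id: "u1" }, { _id: "u3" }]; // u2 has been deleted
const groups = [
  { _id: "g1", creatorId: "u1" },
  { _id: "g2", creatorId: "u2" }, // creator deleted -> filtered out
  { _id: "g3", creatorId: "u3" },
];

const userIds = new Set(users.map((u) => u._id));
// Keep only groups whose creator still exists, then take the page.
const validGroups = groups.filter((g) => userIds.has(g.creatorId)).slice(0, 10);
```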
While there are many approaches to this problem, the only one that I would consider incorrect would be to force the client to perform the group validity check and/or force the client to make multiple requests.
Is it possible to "join" indices in Algolia to get a merged result?
For example:
If I have two indices : one for 'users', and one for 'events'. Users each have id and name attributes. Events each have date and userId attributes.
How would I go about searching for all users named "bob", and for each user also return the next 5 events associated with them?
Is it possible to "join" them like you would in a relational database? Or do I need to search for users, then iterate through the hits, searching for events for each user? What's the best solution for this type of query here?
Algolia is not designed as a relational database. To achieve what you're trying to do, you have to transform all your records into "flat" objects (meaning each object also includes all of its linked dependencies).
In your case, what I would do is add a new key to your user records, named events, and have it be an array of events (just as you save them in the events index). This way, you get all the information you need in one call.
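A sketch of how such a flattened record could be built before pushing it to the index (the `flattenUser` helper is hypothetical; `objectID` is Algolia's required record identifier, the other attribute names come from the question):

```javascript
// Denormalize a user: copy that user's next 5 events into the user record,
// so a single search on "name" also returns the events.
const flattenUser = (user, allEvents) => ({
  objectID: user.id, // Algolia's record identifier
  name: user.name,
  events: allEvents
    .filter((e) => e.userId === user.id)
    .sort((a, b) => a.date - b.date) // soonest first
    .slice(0, 5),
});

const events = [
  { date: 3, userId: "u1" },
  { date: 1, userId: "u1" },
  { date: 2, userId: "u2" },
];
const bob = flattenUser({ id: "u1", name: "bob" }, events);
// bob.events now holds u1's two events, soonest first
```

The cost is the usual denormalization trade-off: whenever a user's events change, the flattened user record has to be re-indexed.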
Hope that helps,
I am looking into best practices for returning search results. I have a search page that subscribes to a publication which returns a find based on a regex query over multiple fields. The results get put into the minimongo collection on the client.
Right now, facets are being set up from the subscription. My question is whether the filtering of the already-loaded results should be done client side, or whether the query should be sent back to the server.
Example :
Given a collection of fruits, I want to find all that have the color red. The server returns this, but I have facets based on the fruits: a checkbox for strawberries, apples, cherries, etc. If I click the checkbox for cherries, should I just filter the current minimongo collection, or should I re-query?
Logically, I already have all the needed items in my collection to filter on, so I am not sure why I would need to hit the back end. The only time I should hit the back end is if I type a new search query (such as blue), and the facets get redone appropriately.
If your original search returns all matching documents, then additional criteria can be applied on the client with a plain minimongo query, provided the fields those criteria filter on were returned with the original search.
OTOH, if the original search returns a paginated list or just the top N results, or if the required keys weren't included, then you'll want to continue the search on the server.
In a traditional request-response system, you might also want to query the server each time if the underlying data was rapidly changing (ex: a reservations system). With Meteor the reactive nature of pub-sub means that the data on the client is being constantly refreshed with adds/changes/deletions via DDP over WebSocket.
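In the fruit example, a facet click can then be answered entirely from the already-subscribed documents, along these lines (a sketch; with minimongo the filter would be a `Fruits.find` selector, the plain array here just stands in for the subscribed documents):

```javascript
// Documents already on the client from the "red" subscription.
const fruits = [
  { name: "cherry", color: "red" },
  { name: "strawberry", color: "red" },
  { name: "apple", color: "red" },
];

// Facet click on "cherry": filter locally, no server round-trip.
// With minimongo: Fruits.find({ color: "red", name: "cherry" }).fetch()
const cherries = fruits.filter((f) => f.name === "cherry");

// A new search term ("blue") is NOT in the local set, so that one
// goes back to the server via a new subscription instead.
```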
I am confused by the term 'link' for connecting documents.
On the OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ it states that OrientDB uses links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB (http://docs.mongodb.org/manual/core/data-modeling-introduction/) documents can be referenced as well, I cannot see the difference between linking documents and referencing them.
The goal of document-oriented databases is to reduce "impedance mismatch": the degree to which data has to be split up to match some database schema, as opposed to the objects actually residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document; they are simply two ways of saying the same thing. How those links are resolved at query time varies from one database implementation to another.
An embedded document, by contrast, is simply the act of storing an object type that somehow relates to a parent type inside the parent itself. For example, say I have a class as follows:
class User
{
    string Name
    List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and each Achievement in an Achievements collection, with the user's list of Achievements being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure in the database engine itself. If you use embedded documents instead, you simply save User in a Users collection with the Achievements inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
    "name": "John Q Taxpayer",
    "achievements":
    [
        {
            "name": "High Score",
            "point": 10000
        },
        {
            "name": "Low Score",
            "point": -10000
        }
    ]
}
Whereas a linked document might look something like this:
{
    "name": "John Q Taxpayer",
    "achievements":
    [
        "somelink1", "somelink2"
    ]
}
Inside an Achievements collection:
{
    "somelink1":
    {
        "name": "High Score",
        "point": 10000
    },
    "somelink2":
    {
        "name": "Low Score",
        "point": -10000
    }
}
Keep in mind these are just approximate representations.
So to summarize: linked documents function much like RDBMS PK/FK relationships. They allow multiple documents in one collection to reference a single document in another collection, which can help deduplicate stored data. However, they add a layer of complexity, requiring the database engine to make multiple disk I/O calls to assemble the final document returned to user code. An embedded document more closely matches the object in memory; this reduces impedance mismatch and (in theory) the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add that choosing the right database for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each vendor and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and its best uses at MongoDB University. OrientDB does offer training, but it is not free; it might be best to contact them directly about some sort of pre-sales training (if you are looking to license the db), as they will usually put you in touch with a pre-sales consultant to help you evaluate their product.
In MongoDB, references work much like an RDBMS foreign key, which means a "JOIN" that is resolved at run time and is therefore expensive. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.
I'm working on a Rails app that implements some social-network features such as relationships, following, etc. Everything was fine until I ran into a problem with many-to-many relations. As you know, Mongo lacks joins, so the recommended workaround is to store the relation as an array of ids on both related documents. OK, it's a bit redundant, but it should work. Let's say:
field :followers, type: Array, default: []
field :following, type: Array, default: []

def follow!(who)
  self.following << who.id
  who.followers << self.id
  self.save
  who.save
end
That works pretty well, but this is one of those cases where we would need a transaction, and Mongo doesn't support transactions. What if the id is added to the followed user's followers list but, for some reason, not to the follower's following list? I mean, if the first document is modified properly but the second can't be updated.
Maybe I'm too pessimistic, but isn't there a better solution?
I would recommend storing relationships in one direction only, keeping the users someone follows in their own user document as "following". Then, if you need to query for all followers of user U1, you can query for db.users.find({following: "U1"}). Since you can have a multikey index on an array, this query will be fast if you index this field.
The other reason to go in one direction only is that a single user has a practical limit to how many different users they may be following, whereas the number of followers of a really popular user could be close to the total number of users in your system. You want to avoid creating an array in a document that could grow that large.
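With a single direction, each follow is one update to one document, and MongoDB single-document updates are atomic, so the transaction worry goes away. A sketch (the real shell syntax is in the comments; ids are hypothetical, and the in-memory code below mimics the `$addToSet` semantics):

```javascript
// One direction only: a follow touches exactly one document.
// In the shell:
//   db.users.updateOne({ _id: follower }, { $addToSet: { following: followed } })
// $addToSet is atomic within the document and never adds duplicates.
const follow = (users, followerId, followedId) => {
  const follower = users.find((u) => u._id === followerId);
  if (!follower.following.includes(followedId)) {
    follower.following.push(followedId); // $addToSet semantics
  }
};

const users = [
  { _id: "u1", following: [] },
  { _id: "u2", following: [] },
];
follow(users, "u1", "u2");
follow(users, "u1", "u2"); // repeat follow is a no-op

// Followers of u2 = users whose "following" array contains "u2".
// In the shell: db.users.find({ following: "u2" }) — fast with a
// multikey index on "following".
const followersOfU2 = users.filter((u) => u.following.includes("u2"));
```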