MongoDB data redundancy or faster query? - mongodb

I'm developing an app with my team. It's based on Meteor and React. We have two collections: Rooms and Locations. Each room has a unique location. We have a page where we list all the rooms and can filter them. This is the most used feature. Only the admin can insert a new room or a new location.
We are designing our filter (by date, by floor, by time, by location name). All the properties we need are in the Rooms collection, except for the location name. We came up with two solutions:
Duplicate the location name used in the filter in each room of the Rooms collection.
Get the list of rooms for each property.
I'm trying to figure out which one is best.
First option:
In that case we only need one collection: Rooms. Filtering will cost O(n). The cost of adding the location name to a new room will be the same, since we already need to add the property id. The extra cost will be the space on MongoDB to store it.
Second option:
In this solution we have all the data well structured in the DB. But to filter by location we need to go through each room and find the matching location in the Locations collection. I think this alone will cost O(n*m).
This is a simple case and we will never scale too much, but since I'm new to Mongo I would like to know which of the two approaches leads to better performance.
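To make the two cost estimates concrete, here is a minimal in-memory sketch of both options in plain Python. The field names (locationId, locationName) and the sample data are assumptions for illustration, not the asker's actual schema. It also shows that the second option does not have to cost O(n*m): if the matching location ids are resolved once up front, the filter is O(n + m).

```python
# Hypothetical sample data; field names are assumptions, not the real schema.
locations = [
    {"_id": 1, "name": "HQ"},
    {"_id": 2, "name": "Annex"},
]
rooms = [
    {"_id": 10, "floor": 1, "locationId": 1, "locationName": "HQ"},
    {"_id": 11, "floor": 2, "locationId": 2, "locationName": "Annex"},
    {"_id": 12, "floor": 3, "locationId": 1, "locationName": "HQ"},
]


def filter_denormalized(rooms, name):
    """Option 1: one pass over rooms using the duplicated name, O(n)."""
    return [r["_id"] for r in rooms if r["locationName"] == name]


def filter_naive_join(rooms, locations, name):
    """Option 2 as described: scan all locations for every room, O(n*m)."""
    result = []
    for room in rooms:
        for loc in locations:
            if loc["_id"] == room["locationId"] and loc["name"] == name:
                result.append(room["_id"])
    return result


def filter_indexed_join(rooms, locations, name):
    """Option 2 with the location ids resolved once: O(n + m)."""
    matching_ids = {loc["_id"] for loc in locations if loc["name"] == name}
    return [r["_id"] for r in rooms if r["locationId"] in matching_ids]
```

All three functions return the same rooms; they differ only in how much work they do and in whether the location name has to be kept in sync in two places.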

Related

How to model the domain - aggregate root

I'm having some trouble correctly designing the domain that I'm working on.
My straightforward use case is the following:
The user (~5,000 users) can access a list of ads (~5 million).
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How should I design the relationship between a user and his selection of favorite ads?
Considering the large number of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design an Ads aggregate root containing a user "collection"?
And finally, how should I handle the read models part?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses 1,000 entities, then you have to load all of them into memory. So it follows that you should prefer small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other entities that will be embedded within the Ad like valid_until, description, is_active, etc.
The concept of favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, IMHO FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
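The split between a small FavoriteAd aggregate on the write side and a subscriber that builds a composite structure on the read side can be sketched as follows. This is only an illustration under assumptions: the class names, the AdFavorited event, and the in-memory dictionaries standing in for the User and Ad data are all hypothetical, not part of any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdFavorited:
    """Hypothetical event raised after the Favorite command is processed."""
    user_id: str
    ad_id: str


@dataclass
class FavoriteAd:
    """Small write-side aggregate: holds only two references and its state."""
    user_id: str
    ad_id: str
    is_favorite: bool = False

    def favorite(self):
        """Apply the Favorite command and return the resulting events."""
        if self.is_favorite:
            return []  # already a favorite, nothing changes
        self.is_favorite = True
        return [AdFavorited(self.user_id, self.ad_id)]


class FavoritesReadModel:
    """Read-side subscriber that combines User and Ad data per user."""

    def __init__(self, user_names, ad_titles):
        self.user_names = user_names  # user_id -> name (stand-in for User data)
        self.ad_titles = ad_titles    # ad_id -> title (stand-in for Ad data)
        self.rows = {}                # user name -> list of favorite ad titles

    def on_ad_favorited(self, event):
        name = self.user_names[event.user_id]
        self.rows.setdefault(name, []).append(self.ad_titles[event.ad_id])
```

Neither the User nor the Ad aggregate is loaded or modified when an ad is favorited; the read model is free to store the combined data in whatever shape the UI needs.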
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain you don't need to hold a list of any Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the 'favourite ad' command sits. It could be on the user aggregate or on a specific aggregate for handling the concept of Favouriting. The command just needs to hold the id of the User and the id of the Ad they are favouriting.
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
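A minimal sketch of this idea, with hypothetical names throughout: the command carries only the two ids, and a handler reacts to an assumed AdDeleted event by removing the affected favourites (the compensating step mentioned above). The in-memory set stands in for whatever persistence the domain actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FavouriteAd:
    """Command: carries only the two ids, never the ad itself."""
    user_id: str
    ad_id: str


@dataclass(frozen=True)
class AdDeleted:
    """Hypothetical event raised elsewhere in the domain."""
    ad_id: str


class FavouritesStore:
    """Stand-in for the aggregate or process manager handling favourites."""

    def __init__(self):
        self.pairs = set()  # set of (user_id, ad_id) tuples

    def handle(self, command):
        self.pairs.add((command.user_id, command.ad_id))

    def on_ad_deleted(self, event):
        # Compensating step: drop every favourite that points at the deleted ad.
        self.pairs = {p for p in self.pairs if p[1] != event.ad_id}
```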
Just a thought.

Implementing Many to Many Relationships on Firestore

I need to model a many-to-many relation on Firestore. A summary of the requirements follows.
A company can hire many contractors for a project. A contractor can work for many companies on different projects at different times.
There should be no limit on the number of contractors or companies, i.e. collections or sub-collections should be used.
A contractor should be able to query by companies; and vice versa, a company should be able to query by contractors. For example, (1) a contractor might ask for a list of companies he/she worked for sorted by project & time, and (2) a company can ask for all contractors who worked for them over a month sorted by project & contractor, and possibly divided by week.
As far as the company is concerned, a contractor can change status, e.g. working, complete. A company changes the status of a contractor during the project lifetime. This status can be used in queries.
Obviously, contractors should not have access to other contractors' information.
A company is represented by only a single user on the mobile app. Similarly, a contractor is represented by only a single user on the mobile app.
The mobile app is built in React Native, which (to the best of my knowledge) is considered by Firestore as a web app.
I am thinking of using a sub-collection of documents for/under each company. Each document represents a project. All contractors' names, their statuses and start and end times are stored on this document.
At the same time, I would have a duplicate sub-collection of project documents under each contractor. Each of these duplicate documents is a partial copy of the project's document (above). It stores the company name and the start and end time of the project.
a. Whenever a relationship is established, e.g. a contract is signed, both documents are created in a batch.
b. Status exists only on the 1st copy of the document.
c. In case of any rare changes to the almost static data, e.g. name, phone, both documents are updated.
Does this design make sense?
Any concerns, suggestions, better ideas?
If you agree with the design, I would love to hear from you, maybe you can write in a comment something like sounds good.
AskFirebase
There are particular cases when you can use a sub-collection and when not to use sub-collections.
When to use sub-collections:
1) When you don't want to store a lot of fields in a document. Cloud Firestore has a 20,000 field limit. (If the Company and Contractor information is very large and could exceed 20,000 fields)
2) When updating the parent document is a common operation. Firestore only lets you update a document at a rate of 1 write/second. (If the Company and Contractor information is modified very often)
3) When you want to limit the access to particular fields of a document. (If you want to restrict the access to a Company's contractors or if the access to Contractor's companies should be restricted. In this case moving the restricted fields to another document in another collection is also a good idea!)
When not to use sub-collections:
1) When you want to query the collections and sub-collections together. Firestore queries are shallow. So sub-collections won't be queried when you query the parent collection so you have to query them separately. (If you have a case to show all the companies and their contractors in one window)
2) When you want to show the sub-collection when viewing the collection. (When showing a company, you might want to show its contractors. Here the number of reads will increase, because instead of reading one document you are reading one document and its sub-collection every time)
3) When you want to query collections and sub-collections together. (You can use the newly announced collection group query whenever you want to query something that's common across the Companies and Contractors, such as field of work or minimum rate)
4) If you're thinking about querying individual pieces of data, you should put them in a collection. (If the Contractor's particular attributes are usually queried by Companies or a Company's details are looked upon by multiple Contractors)
My Suggestion:
Company collection to store company information on which companies can be searched according to their qualities.
Contractors collection with the same approach since I'm assuming contractors will be queried a lot according to their attributes.
Projects sub-collection for info about the projects on which companies and contractors will collaborate. This can be a sub-collection under the Company collection if only one company will be working on a project. Even if multiple contractors are going to be working on a project for a company, you can store the contractors' IDs in an array in the Projects collection. This will help you avoid the partial Projects sub-collection inside each Company/Contractor collection.
But if you need to query on the projects' qualities, it is better to expose them as a separate parent collection. I leave that up to you.
Finally, I would suggest a new collection, Contracts, which can be used to store the relationship between Company, Contractor and Project, along with all the information on which you want to do complex querying. If the same company and contractor are collaborating on two different projects, that can be two documents in the Contracts collection. This comes in handy when you want to show dashboards. Using this single collection you can show separate statistics for a Company or a Contractor, and complex statistics involving both.
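As a rough illustration of how one Contracts collection can serve both query directions from the question, here is an in-memory sketch in plain Python. The field names (companyId, contractorId, projectId, status, start, end) and the sample rows are assumptions; against Firestore, each function would correspond to an equality filter on a top-level Contracts collection plus ordering, which is not shown here.

```python
from datetime import date

# Hypothetical Contracts documents: one per (company, contractor, project).
contracts = [
    {"companyId": "acme", "contractorId": "ann", "projectId": "p1",
     "status": "working", "start": date(2019, 5, 1), "end": date(2019, 5, 20)},
    {"companyId": "acme", "contractorId": "bob", "projectId": "p2",
     "status": "complete", "start": date(2019, 5, 10), "end": date(2019, 6, 2)},
    {"companyId": "globex", "contractorId": "ann", "projectId": "p3",
     "status": "complete", "start": date(2019, 3, 1), "end": date(2019, 3, 15)},
]


def companies_for_contractor(contracts, contractor_id):
    """Companies a contractor worked for, sorted by project and start time."""
    rows = [c for c in contracts if c["contractorId"] == contractor_id]
    rows.sort(key=lambda c: (c["projectId"], c["start"]))
    return [(c["companyId"], c["projectId"]) for c in rows]


def contractors_for_company(contracts, company_id, period_start, period_end):
    """Contractors active for a company in a period, by project and contractor."""
    rows = [
        c for c in contracts
        if c["companyId"] == company_id
        # Keep contracts whose date range overlaps the requested period.
        and c["start"] <= period_end and c["end"] >= period_start
    ]
    rows.sort(key=lambda c: (c["projectId"], c["contractorId"]))
    return [(c["projectId"], c["contractorId"], c["status"]) for c in rows]
```

Because status lives on the contract document, the company can update it in one place, and a contractor's view never needs to read other contractors' documents.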
Hope this helps.

Firestore Geopoints in an Array

I am working with an interesting scenario that I am not sure can work, or will work well. With my current project I am trying to find an efficient way of working with geopoints in Firestore. The straightforward approach, where a document contains a single geopoint field, is pretty self-explanatory and easy to query. However, I have to work with a varying number of geopoints for a single document (Article). The reason is that a specific piece of content may need to be available in more than one geographic area.
For example, an article may need to be available only in NYC, Denver and Seattle. Using a geopoint for each location and searching by radius is, in general, a pretty standard task if I only wanted the article to be available in Seattle, but now it needs to be available in two more places.
The solution as I see it currently is to use an array and fill it with geopoints. The structure would look something like this:
articleText (String),
sortTime (Timestamp),
tags (Array)
- ['tagA','tagB','tagC','tagD'],
availableLocations (Array)
- [(Geopoint), (Geopoint), (Geopoint), (Geopoint)]
I would then perform a query to get all content within 10 miles of a specific Geopoint, starting at a specific postTime.
What I don't know is whether putting the geopoints in an array works well or should be avoided in favor of another data structure.
I have considered replicating an article document for each geopoint, but that does not scale very well if more than a handful of locations needs defining. I've also considered creating a "reference" collection where each point is a document that contains the documentID of an article, but this leads to reading each reference document then reading the actual document. Essentially two document reads for 1 piece of content, which can get expensive based on the Firestore pricing model, and may slow things down unnecessarily.
Am I approaching this in an acceptable way? And are there other methods that can work more efficiently?
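To pin down what "within 10 miles of any geopoint in the array" means, here is a small client-side sketch in plain Python using the haversine formula. It only illustrates the intended semantics on already-fetched documents; it is not a Firestore query and says nothing about whether the same filter can be run server-side. The coordinates, the Earth radius constant, and the function names are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def distance_miles(a, b):
    """Great-circle distance between two (lat, lon) pairs via haversine."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def available_near(article, point, radius_miles=10):
    """True if any of the article's locations lies within the radius."""
    return any(
        distance_miles(loc, point) <= radius_miles
        for loc in article["availableLocations"]
    )


# Hypothetical article available in NYC, Denver and Seattle.
article = {
    "articleText": "example",
    "availableLocations": [
        (40.7128, -74.0060),    # NYC
        (39.7392, -104.9903),   # Denver
        (47.6062, -122.3321),   # Seattle
    ],
}
```

A reader a few blocks from the Seattle point matches, while a reader in Chicago matches none of the three points.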

Database schema for a tinder like app

I have a database of millions of objects (simply put, a lot of objects). Every day I will present 3 selected objects to each of my users, and, as with Tinder, they can swipe left to say they don't like an object or swipe right to say they like it.
I select the objects based on their location (those closest to the user are selected first) and also based on a few user settings.
I'm using MongoDB.
Now the problem: how do I implement the database so that it can quickly provide, every day, a selection of objects to show to the end user (skipping all the objects they have already swiped)?
Well, considering you have made your choice of using MongoDB, you will have to maintain multiple collections. One is your main collection, and you will have to maintain user specific collections which hold user data, say the document ids the user has swiped. Then, when you want to fetch data, you might want to do a setDifference aggregation. SetDifference does this:
Takes two sets and returns an array containing the elements that only exist in the first set; i.e. performs a relative complement of the second set relative to the first.
Now how performant this is would depend on the size of your sets and the overall scale.
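The set-difference idea can be illustrated in plain Python, independent of MongoDB. The ids below and the function name are made up for the example; the candidate list is assumed to be already ordered by distance to the user.

```python
def unseen(candidates, swiped, limit=3):
    """Return the first `limit` candidates the user has not swiped yet.

    `candidates` is assumed to be ordered (e.g. closest first), so a list
    comprehension is used instead of a raw set difference to keep that order.
    """
    return [obj_id for obj_id in candidates if obj_id not in swiped][:limit]


# Hypothetical data: all candidate ids and the ids one user already swiped.
candidate_ids = ["a", "b", "c", "d", "e"]
swiped_ids = {"b", "d"}
```

The cost of this check grows with both the number of candidates and the size of the per-user swiped set, which is the scaling concern raised above.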
EDIT
I agree with your comment that this is not a scalable solution.
Solution 2:
One solution I could think of is to use a graph-based solution, like Neo4j. You could represent all your 1M objects and all your user objects as nodes, and have relationships between each user and the objects they have swiped. Your query would be to return a list of all objects the user is not connected to.
Graphs are hard to shard, which brings up scaling challenges, and graph-based solutions generally perform best when the graph fits in memory. So the feasibility of this solution depends on your scale.
Solution 3:
Use MySQL. Have 2 tables, one being the objects table and the other being a (uid, viewed_object) mapping. A join would solve your problem. Joins work well for a long time, until you hit a certain scale, so I don't think it is a bad starting point.
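Here is a minimal sketch of the two-table approach. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table and column names are assumptions. The query is an anti-join: a LEFT JOIN against the viewed table that keeps only the objects with no matching row for the given user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE viewed (
    uid INTEGER,
    object_id INTEGER,
    PRIMARY KEY (uid, object_id)
);
INSERT INTO objects VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO viewed VALUES (7, 1), (7, 3), (8, 2);
""")


def unseen_objects(conn, uid, limit=3):
    """Ids of objects the user has not viewed yet, lowest id first."""
    rows = conn.execute(
        """
        SELECT o.id
        FROM objects AS o
        LEFT JOIN viewed AS v
               ON v.object_id = o.id AND v.uid = ?
        WHERE v.object_id IS NULL
        ORDER BY o.id
        LIMIT ?
        """,
        (uid, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

In a real selection the ORDER BY would rank by distance and user settings rather than by id; ordering by id just keeps the example deterministic.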
Solution 4:
Use Bloom filters. Your problem eventually boils down to a set membership problem: given a set of ids, check whether each one is part of another set. A Bloom filter is a probabilistic data structure that answers set membership queries. Bloom filters are very small and very efficient. They are probabilistic, though: false negatives will never happen, but false positives can. So that's the trade-off. Check out this post for how it's used: http://blog.vawter.com/2016/03/17/Using-Bloomfilters-to-Avoid-Repetition/
I'll update the answer if I can think of something else.
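For reference, a Bloom filter fits in a few lines of Python. This is a toy sketch, not a tuned implementation: the bit-array size, the number of hash functions, and the use of salted SHA-256 digests as hash functions are all arbitrary choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

One filter per user would hold the ids that user has swiped. An object is shown only when might_contain returns False, so an occasional false positive merely hides an object the user has not actually seen, while an already swiped object is never shown again.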

When to separate documents in MongoDB

I am building a real estate application using Node and MongoDB. I have two major models:
City
Property
I am now confused because I don't know whether I should create one collection for cities and a separate one for properties, or put all the properties under their city.
I think that when the application grows, large cities will become huge documents, so this design decision should be made up front.
Please let me know if there is a best practice for handling this kind of situation.
As every property has only one city, this is a one-to-many relationship. In this case you have many options:
Firstly, remember the 16 MB size limit per document. So, how big is the "many"? How many properties are there per city?
One-to-Few (just a few hundred): embed the "few" (property) in the "one" (city).
One-to-Many (no more than a couple of thousand): child-referencing.
Store the ObjectIds of the "many" (property) documents in an array in the "one" (city) document.
One-to-Squillions: parent-referencing.
Store the ObjectId of the "one" (city) in the "many" (property) document.
Secondly, if there is a high ratio of reads to updates, you can consider denormalization: paying the price of slower and more complex updates in order to get more efficient queries.
A proposed solution: have only one collection (properties) and embed the city document in each property document.
As you are probably going to retrieve the properties by city, don't forget to create an index on the city field.
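The document shapes discussed above can be written out as plain Python dictionaries. All field names and values here are hypothetical. The small helper at the end mimics, in memory, what an index on the city field gives you: a direct lookup from a city to its properties without scanning every property.

```python
# One-to-few: the properties are embedded in the city document.
city_embedded = {
    "_id": "c1",
    "name": "Springfield",
    "properties": [{"title": "Flat A"}, {"title": "House B"}],
}

# One-to-many (child-referencing): the city stores the property ids.
city_child_refs = {"_id": "c1", "name": "Springfield", "property_ids": ["p1", "p2"]}

# One-to-squillions (parent-referencing): each property stores its city id.
properties = [
    {"_id": "p1", "title": "Flat A", "city_id": "c1"},
    {"_id": "p2", "title": "House B", "city_id": "c1"},
    {"_id": "p3", "title": "Loft C", "city_id": "c2"},
]

# Denormalized variant: a copy of the city data is embedded in the property.
property_denormalized = {
    "_id": "p1",
    "title": "Flat A",
    "city": {"_id": "c1", "name": "Springfield"},
}


def build_city_index(properties):
    """In-memory analogue of an index on city_id: city id -> property ids."""
    index = {}
    for prop in properties:
        index.setdefault(prop["city_id"], []).append(prop["_id"])
    return index
```

In MongoDB itself the index is maintained by the database, for example with createIndex on the chosen field, rather than by application code.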
Recommended posts:
http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
http://blog.mongodb.org/post/87892923503/6-rules-of-thumb-for-mongodb-schema-design-part-2
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
I assume the city center coordinates are not going to change very often. So embedding cities in properties is possible if the city documents are not very big and you need information about them each time you read a property. On the other hand, you can put cities into a separate collection and link to them from each property. Finally, you can embed the most useful information about a city in each property, along with a link to the full city document (mixed approach). Embedding a city in a property will introduce some duplication but may increase read performance. To choose the most appropriate option, you need to understand which read/write requests the application will make most often.