Firestore: Is it efficient to create a Firestore Collection per each user in your app? [duplicate] - swift

I am making an app where users can follow each other. To decide how to model it in Firestore, I would like to know how collection size affects query performance.
I first thought of making it like this:
relationships(coll.)
----{userId_1}(document)
--------following(coll)
------------{someId1}(document)
------------{someId2}(document)
.....
--------followers(coll)
------------{someId5}(document)
------------{someId7}(document)
.....
----{userId_2}(document)
--------following(coll)
------------{someId11}(document)
------------{someId24}(document)
.....
--------followers(coll)
------------{someId56}(document)
------------{someId72}(document)
.....
So I would have a main collection, relationships, where each document represents one user and has two subcollections - following and followers - and in those subcollections I would store documents with data like id, name, email, etc.
Then when user1 wants to see his followers, I would get all documents under relationships/userId_1/followers, and if he would like to see who he follows, I would get all documents under relationships/userId_1/following.
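For illustration, a rough sketch of those two reads (shown with the Firebase JavaScript SDK, though the same shape applies in Swift; db is an initialized Firestore instance and userId_1 is a placeholder):
// Model 1: follower/following lists live in subcollections under the user's document.
const followersSnap = await db
    .collection("relationships").doc("userId_1")
    .collection("followers")
    .get();

const followingSnap = await db
    .collection("relationships").doc("userId_1")
    .collection("following")
    .get();

followersSnap.forEach(doc => console.log(doc.id, doc.data()));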
I also thought about doing it like this:
relationships(coll)
----{user5id_user4id}(document)
--------user1:"user5id" (field)
--------user2:"user4id" (field)
.........(other fields)
----{user4_user5}(document)
--------user1:"user4id" (field)
--------user2:"user5id" (field)
.........(other fields)
I would have one main collection, relationships, where each document represents one following relationship. The document name would be firstUserId_secondUserId (meaning firstUserId follows secondUserId), and I would also have two fields, user1 and user2, storing the ids of the two users where user1 follows user2.
So if I am {myUserId} and I would like to get all the people I follow, I would query the relationships collection where user1 = myUserId.
And if I would like to get all the people who follow me, I would query the relationships collection where user2 = myUserId,
since each document represents the relation user1 follows user2.
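A rough sketch of those two queries (again using the Firebase JavaScript SDK for illustration; db and myUserId are placeholders):
// Model 2: one flat collection; each document is a single "user1 follows user2" edge.
const iFollow = await db
    .collection("relationships")
    .where("user1", "==", myUserId)
    .get();

const followsMe = await db
    .collection("relationships")
    .where("user2", "==", myUserId)
    .get();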
So my question is: which way would be more efficient for querying the data?
In the first case each user would have a collection of his followers/following and I would just get those documents; in the second case the relationships collection would have many documents, each representing a user1 -> follows -> user2 relation.
I know that I would be billed by the number of documents the query returns, but how fast would it be if it had to search through a large collection?

Collection size has no bearing on the performance or cost of a query. Both are determined entirely by the size of the result set (number of documents). So, a query for 10 documents out of 100 performs and costs the same as a query for 10 documents out of 100,000. The result size of 10 is the only thing that matters here.
See also: Queries scale with the size of your result set, not the size of your data set

Related

Firestore: Order by sub-collection field

First of all, this is not a regular question. It's a little complicated.
App summary
A recipes app where users can search for recipes by selected ingredients (an ingredients collection exists in the Firestore database). For every ingredient I want to store statistics on how often users searched with that ingredient selected, so I can later show each user, at the top, the ingredients they use most when searching for recipes.
This is what my collection looks like:
http://prntscr.com/nlz062
And now I would like to order the ingredients by the statistics created by the logged-in user.
first = firebaseHelper
.getDb()
.collection(Constants.INGREDIENTS_COLLECTION)
.orderBy("statistics." + firebaseHelper.getCurrentUser().getUid() + ".count")
.limit(25);
If the logged-in user hasn't searched for recipes with ingredients yet, then it should order normally. Anyway, the query above is not working. Is it possible to implement this use case with Firestore?
Note: statistics may or may not exist for the logged-in user; it all depends on his searches.
You can't query or order documents by fields that don't exist directly within the document. In other words, you can't use fields of documents in subcollections when the query names only the parent collection.
As of today (using the latest Firestore client libraries), you could instead perform a collection group query across all of the subcollections called "statistics", ordering by their count field. However, that will still only get you the statistics documents. You would have to iterate those documents, parse the ingredient document ID out of each one's reference, and individually get() each of those ingredient documents in order to display a UI.
The collection group query would look something like this in JavaScript:
firestore
    .collectionGroup("statistics")
    .orderBy("count")
    .limit(25)
You should be able to iterate those results and get the related documents with no problem.
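As a rough sketch of that follow-up step (JavaScript, assuming each statistics document lives under its ingredient, e.g. ingredients/{ingredientId}/statistics/{docId}):
const snap = await firestore
    .collectionGroup("statistics")
    .orderBy("count")
    .limit(25)
    .get();

// Each result's ref.parent is the "statistics" subcollection, so ref.parent.parent
// is the ingredient document that owns it; fetch each of those individually.
const ingredientDocs = await Promise.all(
    snap.docs.map(doc => doc.ref.parent.parent.get())
);
ingredientDocs.forEach(doc => console.log(doc.id, doc.data()));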

Firestore Query on large collection performance


Querying MongoDB collection with heterogeneous schema efficiently

I'm developing a web application with NodeJS, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that users can execute big data processes in a separate cluster, and the results are stored in a MongoDB collection, Results. This collection may store more than 1 million documents per user.
The document schema of this collection can be completely different between users. For instance, say we have user1 and user2. Example documents in the Results collection for user1 and user2:
{
    user: ObjectId(user1), // reference to user1 in the Users collection
    inputFields: { variable1: 3, ... },
    outputFields: { result1: 504.75, ... }
}
{
    user: ObjectId(user2),
    inputFields: { country: "US", ... },
    outputFields: { cost: 14354.45, ... }
}
I'm implementing a search engine in the web application so that each user can filter on the fields according to the schema of their documents (for example, user1 must be able to filter by inputFields.variable1, and user2 by outputFields.cost). Of course I know that I must use indexes, otherwise the queries are very slow.
My first attempt was to create an index for each different field in the Results collection, but it's quite inefficient, since the database server becomes unstable because of the size of the indexes. So my second attempt was to reduce the number of indexes by using partial indexes, creating each index with the user id specified in the partialFilterExpression option.
The problem is that if another user has the same schema in the Results collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:
Index with pattern: { inputFields.country: 1 } already exists with different options
It happens because two partial indexes cannot be created on the same fields, even when their partialFilterExpression differs.
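For illustration, this is roughly how the conflict arises in the mongo shell (the results collection name and the user ids are placeholders; the field name comes from the example above):
// First user's partial index builds fine: only that user's documents are indexed.
db.results.createIndex(
    { "inputFields.country": 1 },
    { partialFilterExpression: { user: ObjectId("5d0000000000000000000001") } }
)

// A second index with the same key pattern but a different partialFilterExpression
// is rejected (the "already exists with different options" error above).
db.results.createIndex(
    { "inputFields.country": 1 },
    { partialFilterExpression: { user: ObjectId("5d0000000000000000000002") } }
)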
So my questions are: how can I allow the users to query their results efficiently in this environment? Is MongoDB really suitable for this use case?
Thanks

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not the actual use case, so let's assume that duplicates are fine in the name field of Course. Think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in Course, with user_id holding the _id of the User. Note that it is stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id field from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved in the previous query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join, but that's not possible in MongoDB. As you already suggested, you can do it in two separate queries.
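As a sketch, the two-query approach looks roughly like this with Mongoose (assuming User and Course are the models from the example above, and courseIds is the set of course ids in question):
const mongoose = require("mongoose");

// 1) Collect the user ids referenced by the given set of courses.
const registrations = await Course.find({ _id: { $in: courseIds } }, { user_id: 1 });

// user_id is stored as a string, so convert it before matching against User._id (an ObjectId).
const userIds = registrations.map(r => new mongoose.Types.ObjectId(r.user_id));

// 2) Fetch those users in a single second query.
const users = await User.find({ _id: { $in: userIds } });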
There is one more way to handle it. It's not exactly a solution, but a valid workaround common in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course : {
    _id : 'xx',
    name : 'yy',
    user : {
        fname : 'r',
        lname : 'v',
        pic : 's'
    }
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, but your queries will be a lot faster.
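With that embedded shape, a single query returns the user fields alongside each course, so no second lookup is needed (a sketch, reusing the hypothetical Course model and courseIds from above):
// One query: the embedded user fields come back with each course document.
const registrations = await Course.find(
    { _id: { $in: courseIds } },
    { name: 1, user: 1 }
);
registrations.forEach(r => console.log(r.name, r.user.fname, r.user.lname));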

Mongo/No SQL solution to second tier data query?

I have an existing PostgreSQL database, which contains roughly 500,000 entries, each of which is essentially a category in a huge tree of categories (each category has a different schema of elements).
I also have a MySQL database, which contains roughly 100,000 documents, each of which can be categorized under one or more categories.
I need to be able to search for documents, which match attribute filters which are set in the categories the document is linked to.
As I understand it, I'd have to store all the data relating to all the categories a document links to, in each document, in mongo, and that just seems insane. How can I make this work?
As an example, imagine a category which represents a red car made in 1964, and a document which was written in 1990 about that red car. I need to be able to search for 1964 and find the document about the car, as well as the car itself.
n:m relations in MongoDB can be expressed with arrays of database references (DBRef) or arrays of object IDs.
So each document would have a field "categories" which has an array with the IDs or database references of the categories it belongs to.
See this article for further information:
http://docs.mongodb.org/manual/applications/database-references/
An alternative that avoids performing multiple database queries just to show the category names would be to put the category names in that array instead of the IDs. Then you should also add an index (with the ensureIndex function) to the name field of your category collection for faster lookup (you might want to create a unique index on this field anyway to avoid duplicate category names).
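A minimal sketch of that layout in the mongo shell (collection and field names here are illustrative, not taken from the actual schema):
// Each document carries an array of the names of the categories it belongs to.
db.documents.insertOne({
    title: "1990 article about the red car",
    categories: ["red car (1964)"]
})

// Multikey index on the array for fast lookups by category name
// (createIndex is the current name for ensureIndex), plus a unique
// index on the category collection's name field to avoid duplicates.
db.documents.createIndex({ categories: 1 })
db.categories.createIndex({ name: 1 }, { unique: true })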
About the data an object has because it belongs to a category, like cars having a manufacturer and a document having a list of other objects mentioned in the document: this data should be put directly into the document of the object. The advantage of a document-oriented database is that not every entity must have the same fields.