Query documents in one collection that aren't referenced in another collection with Firestore - google-cloud-firestore

I have a Firestore DB where I'm storing polls in one collection and responses to polls in another collection. I want to get a document from the polls collection that isn't referenced in the responses collection for a particular user.
The naive approach would be to get all of the poll documents and all of the responses filtered by user ID then filter the polls on the client side. The problem is that there may be quite a few polls and responses so those queries would have to pull down a lot of data.
So my question is, is there a way to structure my data so that I can query for polls that haven't been completed by a user without having to pull down the collections in their entirety? Or more generally, is there some pattern to use when you need to query for documents in one collection that aren't referenced by another?
The documents in each of the collections look something like this:
Polls:
{
  question: string;
  answers: Answer[];
}
Responses:
{
  userId: string;
  pollId: string;
  answerId: string;
}
Any help would be much appreciated!

Queries in Firestore can only return documents from one collection (or from all collections with the same name) and can only contain conditions on the data that they actually return.
Since there's no way to filter based on a condition in some other documents, you'll need to include the information that you want to filter on in the polls documents.
For example, you could include a completionCount field in each poll document that you initially set to 0 and then update on every poll completion. With that in place, the query becomes a simple query on the completionCount field of the polls collection.
For a specific user, I'd actually add all polls to their profile document and remove each one from there once the user completes it. Duplicating data is usually the easiest (and sometimes only) way to implement use-cases such as this.
If you're worried about having to add each new poll to every user profile when it is created, you can also query all polls on their creation timestamp when you next load a user profile and perform that sync at that moment:
load user profile,
check when they were last active,
query for new polls,
add them to user profile.
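The four steps above can be sketched roughly as follows. This is an in-memory illustration only: plain arrays and objects stand in for the Firestore collections, and the field names pendingPollIds, lastSyncedAt and createdAt are assumptions, not part of the schema in the question.

```javascript
// In-memory sketch; plain objects stand in for Firestore documents.
// pendingPollIds, lastSyncedAt and createdAt are hypothetical field names.

// Steps 2-4: find polls created since the last sync and add them to the profile.
function syncPendingPolls(profile, polls, now) {
  const newPollIds = polls
    .filter((poll) => poll.createdAt > profile.lastSyncedAt)
    .map((poll) => poll.id);

  return {
    ...profile,
    // The Set guards against adding the same poll twice.
    pendingPollIds: [...new Set([...profile.pendingPollIds, ...newPollIds])],
    lastSyncedAt: now,
  };
}

// When the user answers a poll, remove it from their pending list.
function completePoll(profile, pollId) {
  return {
    ...profile,
    pendingPollIds: profile.pendingPollIds.filter((id) => id !== pollId),
  };
}

const polls = [
  { id: "p1", createdAt: 100 },
  { id: "p2", createdAt: 200 },
  { id: "p3", createdAt: 300 },
];

// Step 1: the loaded profile, last synced at time 150.
let profile = { userId: "u1", pendingPollIds: ["p1"], lastSyncedAt: 150 };
profile = syncPendingPolls(profile, polls, 400);
profile = completePoll(profile, "p2");
console.log(profile.pendingPollIds); // [ 'p1', 'p3' ]
```

With this shape, "polls this user hasn't completed" is a read of one profile document rather than a scan of two collections.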

Related

Flutter & Firebase: Is it possible to get specific field of for each document? (Not whole field then filter it) [duplicate]

I'm currently working on an application where users can create groups and invite others in it.
I would like people in the same group to be able to see their first and last names.
To do that, I have a collection named Users where each user has a document that contains all their personal data, like first and last names, phone, position, ...
I also have another collection named Groups, where all of my groups are stored, with their name and an array containing the IDs of the members.
When a user opens the app, a first request fetches their groups (they receive the group names and the arrays of members). Afterwards, if they want to see the users in a certain group, another request fetches only the first and last names of all the members.
So, I imagine that there is a query that returns only the fields I want to retrieve, and a security rule that refuses a potential attacker access to the entire user document unless they are the owner of the document.
// For retrieving my user's groups
Stream<List<Group>?> get organizations {
  return firestore
      .collection('Groups')
      .where('members', arrayContains: this.uid)
      .snapshots()
      .map(_groupsFromSnapshot);
}
// For retrieving names of the members of a group
Stream<List<Member>?> getMembers(Group group){
  return firestore
      .collection('Users')
  // and i dont know what to do here ...
}
With the Client SDKs and the Flutter plugin it is not possible to get only a subset of the fields of a Document. When you fetch a Document you get it with all its fields.
If you want to get only a subset of the fields of a document, you can use one of the following two approaches:
Denormalize your data: You create another collection which contains documents that only contain the fields you want to expose. You need to synchronize the two collections (the Users collection, which is the "master", and the new collection): for that it's quite common to use a Cloud Function. Note also that it's a good idea to use the same documentID for the linked documents in the two collections.
Use the Firestore REST API to fetch the data: With the REST API you can use a DocumentMask when you fetch one document with the get method or a Projection when you query a Collection. The DocumentMask or the Projection will "restrict a get operation on a document to a subset of its fields". You can use the http package for calling the API from your Flutter app.
However, the second approach is not valid if you want to protect the other users' data: a malicious user could call the Firestore REST API with the same request but without a DocumentMask or a Projection. In other words, this approach is interesting if you just want to minimize the network traffic, not if you want to keep certain fields of a document secret.
So, for your specific use case, you need to go for the first solution.
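As a rough illustration of the denormalization approach, the sketch below derives a "public" copy of a user document that holds only the exposed fields and stores it under the same document ID. It is an in-memory stand-in: Maps play the role of the two collections, and the field names (firstName, lastName, phone) and the function names are assumptions. In a real project this logic would typically run in a Cloud Function triggered by writes to the Users collection.

```javascript
// Fields that other group members are allowed to see (hypothetical names).
const PUBLIC_FIELDS = ["firstName", "lastName"];

// Copy only the whitelisted fields out of the full user document.
function toPublicProfile(userDoc) {
  const publicDoc = {};
  for (const field of PUBLIC_FIELDS) {
    if (field in userDoc) publicDoc[field] = userDoc[field];
  }
  return publicDoc;
}

// Keep the public copy in sync with the master document.
// publicProfiles is a Map keyed by document ID, standing in for the new collection.
function syncPublicProfile(publicProfiles, userId, userDoc) {
  if (userDoc === null) {
    publicProfiles.delete(userId); // master document deleted: drop the copy too
  } else {
    publicProfiles.set(userId, toPublicProfile(userDoc));
  }
}

const publicProfiles = new Map();
syncPublicProfile(publicProfiles, "uid123", {
  firstName: "Ada",
  lastName: "Lovelace",
  phone: "555-0100",
});
console.log(publicProfiles.get("uid123")); // { firstName: 'Ada', lastName: 'Lovelace' }
```

Security rules can then allow group members to read the public collection while restricting the Users collection to each document's owner.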

Performance difference between storing the asset as subdocument vs single document in Mongoose

I have an API for synchronizing contacts from the user's phone to our database. The controller essentially iterates the data sent in the request body and if it passes validation a new contact is saved:
const contact = new Contact({ phoneNumber, name, surname, owner });
await contact.save();
Having a DB with 100 IOPS and considering the average user has around 300 contacts, when the server is busy this API takes a lot of time.
Since the frontend client is made in a way that a contact ID is necessary for other operations (edit, delete), I was thinking about changing the data structure to subdocuments, and instead of saving each Contact as a separate document, the idea is to save one document with many contacts inside:
const userContacts = new mongoose.Schema({
  // the ID of the contacts' owner (the ObjectId type is an assumption)
  owner: { type: mongoose.Schema.Types.ObjectId },
  contacts: [new mongoose.Schema({
    name: { type: String },
    phone: { type: String }
  })]
});
This way I have to do just one save. But since Mongo has to generate an ID for each subdocument, is this really that much faster than the original approach?
Summary
This really depends on your exact usage scenarios:
are contacts often updated?
what is the max / average number of contacts per user?
are they ever partially loaded, or are they always fetched all together?
But for a fairly common collection such as contacts, I would not recommend storing them in subdocuments.
Instead you should be able to use insertMany for your initial sync scenario.
Explanation
Storing contacts as subdocuments makes a bulk write easier, but it will make querying and updating contacts slower and more awkward than with regular documents.
For example, if I have 100 contacts and I want to view and edit 1 of them, it needs to load all 100 contacts. I can make the change via a partial update using $set, so the update itself will be OK. But when I add a new contact, I will have to add a new contact subdocument to your Contacts document. This makes it a growing document, meaning your database will suffer from fragmentation, which can slow things down a lot (see this answer).
You will have to use aggregate with $project or $unwind to search through contacts in MongoDB. If you want to apply a specific sort order, this too would have to be done via aggregate or in code.
Matching via projection can also lead to problems with duplicate contacts being difficult to find.
And this won't scale. What if you get users with 1000s of contacts later? Then this single document will grow large and querying it will become very slow.
Alternatives
If your contacts for sync are in the 100s, you might get away with splitting them into groups of ~50-100 and calling insertMany for each batch.
If they grow into the thousands, then I would suggest uploading all contacts, saving them as JSON / CSV files to disk, then slowly processing these in the background in batches.
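A minimal sketch of the batching idea, assuming the contacts have already passed validation. The helper names chunk and syncContacts are hypothetical, and the insert function is injected so the sketch runs without a database; with Mongoose it could be something like (docs) => Contact.insertMany(docs).

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Insert contacts batch by batch. insertBatch is injected (hypothetical);
// it receives one array of contact documents per call.
async function syncContacts(contacts, insertBatch, batchSize = 100) {
  let inserted = 0;
  for (const batch of chunk(contacts, batchSize)) {
    await insertBatch(batch); // one bulk write per batch instead of one save per contact
    inserted += batch.length;
  }
  return inserted;
}

// Demo with a fake insert function that only records the batch sizes.
const contacts = Array.from({ length: 300 }, (_, i) => ({
  name: `Contact ${i}`,
  phoneNumber: String(i),
}));
const batchSizes = [];
syncContacts(contacts, async (batch) => {
  batchSizes.push(batch.length);
}).then((count) => console.log(count, batchSizes)); // 300 [ 100, 100, 100 ]
```

For a user with 300 contacts this turns 300 individual saves into 3 bulk writes; the batch size is a tuning knob rather than a fixed recommendation.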

"Join" multiple Algolia indices?

Is it possible to "join" indices in Algolia to get a merged result?
For example:
If I have two indices: one for 'users', and one for 'events'. Users each have id and name attributes. Events each have date and userId attributes.
How would I go about searching for all users named "bob", and for each user also return the next 5 events associated with them?
Is it possible to "join" them like you would in a relational database? Or do I need to search for users, then iterate through the hits, searching for events for each user? What's the best solution for this type of query here?
Algolia is not designed as a relational database. To get to what you're trying to achieve, you have to transform all your records into "flat" objects (meaning, each object also includes all of its linked dependencies).
In your case, what I would do is add a new key named events to your user records and have it be an array of events (just like you save them in the events index). This way, you get all the information needed in one call.
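Building those flat records before indexing might look something like the sketch below. It is a plain data transformation with no Algolia client calls; the function name buildUserRecords, the use of numeric timestamps for date, and the choice to embed only the next few upcoming events are all assumptions. objectID is the attribute Algolia uses to identify a record.

```javascript
// Build one flat record per user, embedding that user's upcoming events.
// `now` and the event dates are numeric timestamps (an assumption).
function buildUserRecords(users, events, now, limit = 5) {
  return users.map((user) => ({
    objectID: user.id,
    name: user.name,
    events: events
      .filter((event) => event.userId === user.id && event.date >= now)
      .sort((a, b) => a.date - b.date) // soonest first
      .slice(0, limit)
      .map((event) => ({ date: event.date })),
  }));
}

const users = [
  { id: "u1", name: "bob" },
  { id: "u2", name: "alice" },
];
const events = [
  { userId: "u1", date: 300 },
  { userId: "u1", date: 50 }, // already in the past at now = 100
  { userId: "u1", date: 200 },
  { userId: "u2", date: 150 },
];

const records = buildUserRecords(users, events, 100);
console.log(JSON.stringify(records[0]));
// {"objectID":"u1","name":"bob","events":[{"date":200},{"date":300}]}
```

The trade-off is that a user record has to be re-indexed whenever one of that user's events changes or passes.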
Hope that helps,

Retrieve records in mongoDB using bidirectional query

I have two collections, Tickets and Users, where a user can have one to many tickets. The ticket collection is defined as follows:
Ticket = {_id, ownerId, profile: {name}}
The ownerId is used to find all tickets that belong to a specific person. I need to write a query that gets me all users with no tickets.
How can I write this query without having to loop through all users, checking if the userID shows up in any Tickets?
Would bidirectional storage cause me any performance problems? For example, if I were to change my users collection and add an array of tickets: [ticketID, ticketID2, ...]?
I'd go with storing the array of tickets in users. As far as I know, a plain find() in Mongo can't query one collection based on the (lack of) elements in another collection; that takes an aggregation with $lookup, which is available from MongoDB 3.2 onwards. With the array, though, you can simply do db.users.find({tickets:[]}).
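The cost of the bidirectional layout is that both sides must be updated together. A rough in-memory sketch of that bookkeeping follows; arrays stand in for the two collections, the helper names are hypothetical, and usersWithoutTickets mirrors what db.users.find({tickets:[]}) would return.

```javascript
// Add a ticket and record its ID on the owning user.
function addTicket(users, tickets, ticket) {
  const owner = users.find((user) => user._id === ticket.ownerId);
  if (!owner) throw new Error(`Unknown owner: ${ticket.ownerId}`);
  tickets.push(ticket);
  owner.tickets.push(ticket._id); // keep the reverse reference in sync
}

// Remove a ticket and drop its ID from the owning user.
function removeTicket(users, tickets, ticketId) {
  const index = tickets.findIndex((ticket) => ticket._id === ticketId);
  if (index === -1) return; // nothing to remove
  const [ticket] = tickets.splice(index, 1);
  const owner = users.find((user) => user._id === ticket.ownerId);
  if (owner) owner.tickets = owner.tickets.filter((id) => id !== ticketId);
}

// In-memory equivalent of db.users.find({tickets: []}).
function usersWithoutTickets(users) {
  return users.filter((user) => user.tickets.length === 0);
}

const users = [
  { _id: "u1", tickets: [] },
  { _id: "u2", tickets: [] },
];
const tickets = [];
addTicket(users, tickets, { _id: "t1", ownerId: "u1", profile: { name: "First" } });
console.log(usersWithoutTickets(users).map((user) => user._id)); // [ 'u2' ]
```

In a real database the two writes are separate operations, so a failure between them can leave the array and the Tickets collection out of step.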

MongoDB storing user-specific data on shared collection objects

I'm designing an application that processes RSS feeds using MongoDB. Currently my collections are as follows:
Entry
fields: content, feed_id, title, publish_date, url
Feed
fields: description, title, url
User
fields: email_address
subscriptions (embedded collection; fields: feed_id, tags)
A user can subscribe to feeds which are linked from the embedded subscription collection. From the subscriptions I can get a list of all the feeds a user should see and also the corresponding entries.
How should I store entry status information (isRead, isStarred, etc.) that is specific to a user? When a user views an entry I need to record isRead = 1. Two common queries I need to be able to perform are:
Find all entries for a specific feed where isRead = 0 or no status exists currently
For a specific user, mark all entries prior to a publish date with isRead = 1 (this could be hundreds or even thousands of records so it must be efficient)
Hmm, this is a tricky one!
It makes sense to me to store a record for entries that are unread, and delete them when they're read. I'm basing this on the assumption that there will be more read posts than unread for each individual user, so you might as well not have documents for all of those already-read entries sitting around in your DB forever. It also makes it easier to not have to worry about the 16MB document size limit if you're not having to drag around years of history with you everywhere.
For starred entries, I would simply add an array of Entry ObjectIds to User. No need to make these subscription-specific; it'll be much easier to pull a list of items a User has starred that way.
For unread entries, it's a little more complex. I'd still add it as an array, but to satisfy your requirement of being able to quickly mark as-read entries before a specific date, I would denormalize and save the publish-date alongside the Entry ObjectId, in a new 'UnreadEntry' document.
User
fields: email_address, starred_entries[]
subscriptions (embedded collection; fields: feed_id, tags, unread_entries[])
UnreadEntry
fields: id is Entry ObjectId, publish_date
You need to be conscious of the document limit, but 16MB is one hell of a lot of unread entries/feeds, so be realistic about whether that's a limit you really need to worry about. (If it is, it should be fairly straightforward to break out User.subscriptions to its own document.)
Both of your queries now become fairly easy to write:
All entries for a specific feed that are unread:
user.subscriptions.find(feedID).unread_entries
Mark all entries prior to a publish date read:
user.subscriptions.find(feedID).unread_entries.where(publish_date.lte => my_date).delete_all
And, of course, if you simply need to mark all entries in a feed as read, that's very easy:
user.subscriptions.find(feedID).unread_entries.delete_all
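To make the three operations above concrete, here is a small in-memory sketch of the same layout. It is not tied to any particular ODM: plain objects stand in for the User document, dates are numeric timestamps, and the helper names (findSubscription, markReadBefore, markAllRead) are hypothetical.

```javascript
// Look up the embedded subscription for a feed.
function findSubscription(user, feedId) {
  const subscription = user.subscriptions.find((sub) => sub.feed_id === feedId);
  if (!subscription) throw new Error(`No subscription for feed ${feedId}`);
  return subscription;
}

// Mark everything published on or before `cutoff` as read by deleting
// the matching unread records. Returns how many entries were removed.
function markReadBefore(user, feedId, cutoff) {
  const subscription = findSubscription(user, feedId);
  const before = subscription.unread_entries.length;
  subscription.unread_entries = subscription.unread_entries.filter(
    (entry) => entry.publish_date > cutoff
  );
  return before - subscription.unread_entries.length;
}

// Mark the whole feed as read.
function markAllRead(user, feedId) {
  findSubscription(user, feedId).unread_entries = [];
}

const user = {
  email_address: "reader@example.com",
  starred_entries: [],
  subscriptions: [
    {
      feed_id: "feed1",
      tags: [],
      unread_entries: [
        { id: "e1", publish_date: 10 },
        { id: "e2", publish_date: 20 },
        { id: "e3", publish_date: 30 },
      ],
    },
  ],
};

console.log(markReadBefore(user, "feed1", 20)); // 2
console.log(findSubscription(user, "feed1").unread_entries.map((e) => e.id)); // [ 'e3' ]
```

Because only unread entries are stored, marking thousands of entries as read shrinks the array in a single update instead of touching one status record per entry.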