Scoped Collection Group Queries - google-cloud-firestore

Consider a multi-tenant Firestore database:
/customers/{customerId}/users/{userId}/rosters/{rosterId}
It seems collectionGroup is scoped to the entire database, so in Node:
let rosters = db.collectionGroup('rosters').where('isActive', '==', 'true');
would return matches for all customers. On the client side this can be scoped through security rules.
How can it be scoped for admin access?

Collection group queries are applied to all collections that have the name you specify. There is currently no way to scope them beyond that. Long term the plan is to allow scoping to paths, but there's no timeline for when that might be available, and it definitely won't be in the near future.
This means that you'll need to reduce the scope either through the collection name (e.g. if you have multiple types of rosters, you might rename the collections under users to user_rosters so that a collection group query is limited to just users' rosters), or through additional field conditions in your query.
While the latter case might feel like you're going back to global collections, it's actually still better than that, since the subcollections do result in better write throughput.
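As a sketch of the second option, the query below narrows the collection group with an extra equality condition. It assumes that every roster document stores its tenant in a customerId field and that isActive is a boolean; neither is stated in the question, and activeRostersForCustomer is a hypothetical helper name. Firestore may prompt you to create a collection-group index for these fields.

```javascript
// Hypothetical helper: scope a collection group query to one tenant
// through a field condition. Assumes every roster document carries a
// `customerId` field (not part of the original data model).
function activeRostersForCustomer(db, customerId) {
  return db
    .collectionGroup('rosters')
    .where('customerId', '==', customerId)
    .where('isActive', '==', true);
}
```

With the Admin SDK this would be used as activeRostersForCustomer(db, 'some-customer-id').get().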

Related

Firestore Rule - limiting "list" access

I have a collection on which I want to provide list access, but only in a limited manner for most users.
All users should be able to do this: (the string valuex can be anything)
collection("XYZ").where("fieldx", "==", "valuex").get()
Only admins can get all the documents:
collection("XYZ").get()
Note that, as valuex can be anything, at the end of the day all users can see all documents. The difference is that non-admins need to know what to query for, whereas admins get everything directly.
The only solution I have found is to force non-admins to write to a document the value they are querying, prior to calling get. The rules then are:
allow list: if isadmin() || resource.data.fieldx == getvaluex();
function isadmin() { return request.auth.token.get("admin", false); }
function getvaluex() { return get(/databases/$(database)/documents/users/$(request.auth.uid)).data.valuex; }
That way all returned documents must have the same value for fieldx. But this solution 1) needs one additional write, 2) adds a read in the rules, and 3) requires storing valuex in Firestore, which I want to avoid because in my case it is sensitive.
So is there any better solution?
Is it possible, for instance, to limit the usage of an index to only some users? (Both queries above actually have more where statements and each require a specific composite index.)
Is it possible to compare the returned documents with each other to ensure they all have the same value for fieldx?
The way I would do it is this:
Don't allow non-admins to make those direct requests to the database at all.
Instead, have them send a request to a Firebase HTTP function.
The HTTP function has admin access to the db, so it can accept any non-null valuex.
It queries the db using that valuex, on behalf of the non-admin users, and returns the results.
This way, you can keep the documents in collection XYZ locked to non-admins in your Firestore Rules.
You can even keep sensitive data in those documents, since you have control on what you share with users. You can control that by choosing which fields your HTTP function will return to clients.
Mind you, Firebase function invocations are way cheaper than making additional writes/reads.
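A minimal sketch of the query-and-filter step such a function could perform. The HTTP wiring itself is omitted; db is assumed to be an Admin SDK Firestore instance, and PUBLIC_FIELDS, pickPublicFields, listByValuex and the title field are hypothetical names introduced for illustration.

```javascript
// Hypothetical whitelist of fields that non-admin callers may see.
const PUBLIC_FIELDS = ['fieldx', 'title'];

// Copy only whitelisted fields so sensitive data never leaves the server.
function pickPublicFields(data, allowed = PUBLIC_FIELDS) {
  const out = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(data, key)) out[key] = data[key];
  }
  return out;
}

// Runs the query on behalf of a non-admin caller.
// `db` is assumed to be an Admin SDK Firestore instance.
async function listByValuex(db, valuex) {
  if (valuex === undefined || valuex === null || valuex === '') {
    throw new Error('valuex is required');
  }
  const snapshot = await db.collection('XYZ').where('fieldx', '==', valuex).get();
  return snapshot.docs.map((doc) => ({ id: doc.id, ...pickPublicFields(doc.data()) }));
}
```

Because the whitelist is applied on the server, the rules for collection XYZ can deny list access to non-admins entirely.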
Firestore works well for easy/normalized access from clients to collections and documents.
What you are trying to do is pretty specific to your implementation of these "lists".
You may create another collection (list_auth) that tracks who may access the list.
In the security rules you can then write a rule for the collection that looks up the user's permissions for the list by reading the list_auth collection.
https://firebase.google.com/docs/firestore/security/rules-conditions#access_other_documents

Firestore security rule to limit reads to use collectionGroup(..)?

Several questions address whether knowing a Firestore uid allows hackers to edit that person's data, like this question and this question. My question is about security rules to filter when users can read another's data.
Specifically, I have a social media app that allows people to post data anonymously. My data model is /users/{user}/posts/{post}. I use db.collectionGroup("posts") to build a timeline of posts (some anonymous, others with users' names).
Posts that are not anonymous have a valid uid, so it wouldn't be tough for a hacker to figure out someone's uid, which I'm not concerned about. My concern is whether a hacker could then query usersRef.document(uid).posts.getDocuments(); to get all the posts of that user, including the anonymous ones?
Because my app builds timelines from users "posts" collection, I can't write a rule that they can't read another user's posts. Can I write a rule that they can only read posts with collectionGroup?
That's not going to be possible with the way things are structured now. Here's the way you write a rule to allow collection group queries, as described in the documentation:
match /{path=**}/posts/{post} {
allow read: if ...condition...;
}
The path wildcard in the rule explicitly allows all reads for all collections named "posts". The rule does not limit the reads to only collection group queries - any normal collection query on any "posts" will be allowed.
Bear in mind also that a collection group query would not hide any data from the caller compared to a normal collection query. The query results will still contain a reference to the full path of each document, and that path includes the uid of the user who owns the post.
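To illustrate the last point: each document returned by a collection group query exposes its full path (for example through doc.ref.path in the JavaScript SDKs), so the owner can be read straight from it. The helper below is a hypothetical sketch that assumes the /users/{user}/posts/{post} layout from the question.

```javascript
// Extract the owning user's id from a post path such as
// "users/alice123/posts/post42". Returns null for any other shape.
function ownerIdFromPostPath(path) {
  const segments = path.split('/').filter(Boolean);
  if (segments.length !== 4 || segments[0] !== 'users' || segments[2] !== 'posts') {
    return null;
  }
  return segments[1];
}
```

Called as ownerIdFromPostPath(doc.ref.path) on each query result, this yields the author's uid even for posts that the app displays as anonymous.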

Row level security using prisma and postgres

I am using prisma and yoga graphql servers with a postgres DB.
I want to implement authorization for my graphql queries. I saw solutions like graphql-shield that solve column level security nicely - meaning I can define a permission and according to it block or allow a specific table or column of data (or, in graphql terms, block a whole entity or a specific field).
The part I am stuck on is row level security - filtering rows by the data they contain - say I want to allow a logged in user to view only the data that is related to them, so depending on the value in a user_id column I would allow or block access to that row (the logged in user is one example, but there are other use cases in this genre).
This type of security requires running a query to check which rows the current user has access to and I can't find a way (that is not horrible) to implement this with prisma.
If I was working without prisma, I would implement this in the level of each resolver but since I am forwarding my queries to prisma I do not control the internal resolvers on a nested query.
But I do want to work with prisma, so one idea we had was handling this in the DB level using postgres policy. This could work as follows:
Every query we run will be surrounded with "begin transaction" and "commit transaction"
Before the query I want to run "set local context.user_id to 5"
Then I want to run the query (and the policy will filter results according to the current_setting('context.user_id'))
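The three steps above can be sketched outside of Prisma as follows. This assumes a node-postgres style client whose query(text, values) method returns a promise, and a hypothetical documents table with an integer user_id column. set_config(name, value, true) is used in place of SET LOCAL because it accepts a bind parameter and is likewise limited to the current transaction.

```javascript
// Policy assumed to exist in the database (hypothetical table name):
//   ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY documents_owner ON documents
//     USING (user_id = current_setting('context.user_id')::int);

// Build the statements for one request, in execution order.
function buildScopedStatements(userId, text, values = []) {
  return [
    { text: 'BEGIN', values: [] },
    // Third argument `true` makes the setting local to this transaction.
    { text: "SELECT set_config('context.user_id', $1, true)", values: [String(userId)] },
    { text, values },
    { text: 'COMMIT', values: [] },
  ];
}

// Run the statements on one connection; roll back if anything fails.
async function runAsUser(client, userId, text, values) {
  const [begin, setUser, query, commit] = buildScopedStatements(userId, text, values);
  try {
    await client.query(begin.text, begin.values);
    await client.query(setUser.text, setUser.values);
    const result = await client.query(query.text, query.values);
    await client.query(commit.text, commit.values);
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

Note that Postgres does not apply row level security policies to superusers or, by default, to the table owner, so the application would need to connect as a different role for the policy to take effect.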
For this to work I would need prisma to allow me to either add pre/post queries to each query that runs or let me set a context for the db.
But these options are not available in prisma.
Any ideas?
You can use prisma-client instead of prisma-binding.
With prisma-binding, you define the top level resolver, then delegate to prisma for all the nesting.
On the other hand, prisma-client only returns scalar values of a type, and you need to define the resolvers for the relations, which means you have complete control over what you return, even for nested queries. (See the documentation for an example)
I would suggest you use prisma-client to apply your security filters on the fields.
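A rough sketch of what resolving a relation yourself makes possible. It is not Prisma's API: context.fetchPostsByAuthor and context.currentUserId are hypothetical stand-ins for a data-access call (such as a prisma-client query) and the authenticated user, and the User.posts relation is invented for illustration.

```javascript
// Resolver map in the usual (parent, args, context) shape.
const resolvers = {
  User: {
    // Relation resolver written by hand, so the row filter lives here.
    posts(parent, args, context) {
      // Only the logged-in user may see their own posts.
      if (parent.id !== context.currentUserId) return [];
      return context.fetchPostsByAuthor(parent.id);
    },
  },
};
```

Because the check sits in the relation resolver, it also applies when posts is reached through a nested query rather than a top level one.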
With the approach you're looking to take, I'd definitely recommend a look at Graphile. It approaches row-level security essentially the same way that you're thinking of. Unfortunately, it seems like Prisma doesn't help you move away from writing traditional REST-style controller methods in this regard.

REST and subcollections

I have a resource collection users, and each user can have filters, so another resource collection is filters.
So, to retrieve a user's filters we have this URL:
/users/:id/filters
What should the URL be to retrieve a single filter by id?
/users/:id/filters/:id
/filters/:id
Are the filters first-class entities or owned by a user? For example, can a filter belong to two users or put in another way, is a filter created then assigned to one or more individuals?
If the filters are properties of the users (one filter to one user, one user to one or more filters), /users/:id/filters/:id makes a lot of sense. If the filters are themselves distinct objects that are related to users (one or more users per filter), then either API (perhaps even both) may make sense depending on how the user is expected to consume the API. If they only care about filters, /filters/:id makes sense; if they only care about users' filters, /users/:id/filters/:id is appropriate; and if their needs vary between the two, having both is a sensible solution.
Assign all resources a canonical URI at the root (e.g. /companies/{id} and /employees/{id}).
If a resource cannot exist without another, it should be represented as its sub-resource
The URLs are shorter, easier to remember and not many parameters are required.
https://stackoverflow.com/a/19285843/7527624
https://stackoverflow.com/a/32820784/7527624
/users/:id/filters/:id makes more sense, as the query is to retrieve a specific filter corresponding to a specific user.
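As a small sketch of serving both URL styles, the matcher below resolves either form to the same filter id. It is a hand-written illustration rather than any framework's router. The parameters are named :userId and :filterId (names not in the question), since repeating :id would make the two values collide in routers that key parameters by name.

```javascript
// Match a path against a pattern such as "/users/:userId/filters/:filterId".
// Returns the extracted parameters, or null when the path does not match.
function matchRoute(pattern, path) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = path.split('/').filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;
  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    if (patternParts[i].startsWith(':')) {
      params[patternParts[i].slice(1)] = decodeURIComponent(pathParts[i]);
    } else if (patternParts[i] !== pathParts[i]) {
      return null;
    }
  }
  return params;
}

// Both URL styles resolve to the same filter id; the nested one also
// carries the owning user.
const ROUTES = ['/users/:userId/filters/:filterId', '/filters/:filterId'];

function resolveFilterRequest(path) {
  for (const pattern of ROUTES) {
    const params = matchRoute(pattern, path);
    if (params) return params;
  }
  return null;
}
```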

GraphQL,Cassandra and denormalization strategy

Would a database like Cassandra and a schema layer like GraphQL work well together?
Cassandra ideology is based on the idea of optimizing your queries and denormalizing data. This doesn't seem to really mesh well with a GraphQL ideology where data seems to be accessible in every level of a query.
Example:
Suppose I architect my Cassandra table like so:
User:
    name
    address
    etc... (many properties)
Group:
    id
    name
    user_name (denormalized user, where we generally just need the name of a user)
But with GraphQL, one wouldn't exactly expect a denormalized User.
query getGroup {
  group(id: 1) {
    name
    users {
      name
    }
  }
}
So a couple of things:
1.) This GraphQL query could end up hitting our Cassandra database multiple times (assuming no caching): once to get the group name, and possibly once more for each of the users. But let's say our resolver creates multiple User objects with one Cassandra call.
2.) We can't really build a Cassandra-idiomatic database with both denormalization and GraphQL in mind, can we? Otherwise we should expect that certain properties of a User aren't returned to us with the query.
To sum up the question, what's the GraphQL strategy for working with denormalized data? Is it acceptable to omit certain properties that the client thinks are accessible? E.g. the client tries to access the address of a user but we don't have that at the moment because our data is denormalized. Or should one not even worry about denormalization and just let GraphQL make calls with a caching mechanism in between the db and GraphQL? E.g. GraphQL first gets the group, then gets the user data for the group id.
This is a side effect of GraphQL, where a query can get quite complex in retrieving the data. But as long as the user is actually requesting the data they need, and you are smart about your resolvers, the end result will actually be faster.
Consider tools like dataloader to cache when resolving a query.
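The batching idea behind dataloader can be sketched without the library. The code below is a hand-rolled stand-in, not the dataloader API: fetchUsersByNames is a hypothetical data-access function that would issue a single Cassandra query for all requested names, and group.userNames is an assumed shape derived from the denormalized user_name column in the question. It is kept synchronous for brevity; a real resolver would be asynchronous.

```javascript
// Resolve the users of a group with one batched lookup instead of one
// lookup per user. `fetchUsersByNames(names)` must return an array of users.
function resolveGroupUsers(group, fetchUsersByNames) {
  const uniqueNames = [...new Set(group.userNames)];
  const users = fetchUsersByNames(uniqueNames); // single round trip
  const byName = new Map(users.map((user) => [user.name, user]));
  // Preserve the group's original order; unknown names resolve to null.
  return group.userNames.map((name) => (byName.has(name) ? byName.get(name) : null));
}
```

Fields such as address then come from the full User rows fetched in the batch, while a query that only asks for name could be served from the denormalized column without any extra lookup.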
As far as omitting certain properties goes, GraphQL validates the response and will report an error, although it will also return the data you gave. It would probably be better to implement some sort of timeout and throw a more descriptive error if there is an issue retrieving the data.