I would like to offer my users full-text search over their data, and make sure that they can only access the data they own. Are there any patterns for doing that on Algolia? None of the solutions I've considered seems a good fit, so I was wondering if I had overlooked some other options.
We could host each user's data in a separate Algolia app, so that each API key would give access to only the relevant data, but that would quickly become unaffordable, as many would hit the 10000 records limit.
We could host each user's data in a separate index and use team index restrictions, but there does not seem to be an API to manage those, and that would anyway require an Algolia account for each customer, which seems like a misuse of the service (we could e.g. generate email addresses at our domain name).
Finally, we could filter queries with some userId to retrieve only the relevant data, but that wouldn't be secure, as someone could use the API key to query Algolia without the filter.
We could proxy Algolia calls to inject the filter and the API key, but the performance penalty would probably be high.
Any other suggestions? Thanks!
I got a great answer from rayrutjes at Algolia, so I'm pasting it here in case it helps others:
The best approach for your use case is to use what we call generated API keys. Here is the documentation for the JavaScript client: https://www.algolia.com/doc/api-client/javascript/api-keys/#generate-key
The usage is fairly simple: you generate an API key on the fly based on your search API key plus some additional query parameters.
The resulting API key can be used like a standard search API key, with the difference that it can be scoped on a given set of parameters.
Note that the generation of such a scoped API key does not require an actual call to the API.
Also be sure to generate those scoped API keys in the backend, since you don't want to expose the search API key used to generate them.
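To make the "no API call needed" point concrete, here is a minimal sketch of the general idea: the backend signs the restriction parameters with the parent search key using an HMAC, then packs the signature and parameters together. My understanding is that Algolia's open-source clients work along these lines, but treat that as an assumption; the function name `generate_scoped_key`, the key value, and the `userId` attribute are all hypothetical. In production, use the key-generation method of the official client linked above.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def generate_scoped_key(parent_search_key: str, restrictions: dict) -> str:
    """Derive a scoped key locally by signing the restrictions with the parent key."""
    # Serialize the restrictions as a URL-encoded query string.
    params = urlencode(restrictions)
    # Sign the parameters; only someone holding the parent key can do this.
    signature = hmac.new(
        parent_search_key.encode("utf-8"),
        params.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Ship the signature together with the parameters it covers.
    return base64.b64encode((signature + params).encode("utf-8")).decode("utf-8")


# Hypothetical values for illustration only.
scoped_key = generate_scoped_key("parent-search-key", {"filters": "userId:42"})
```

Because the filter is part of the signed payload, a client holding only the scoped key cannot drop or change it without invalidating the signature.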
Related
I'm thinking about a REST API design. There are several tables in my database. For example Customer and Order.
Of course - each Order has its Customer (and every customer can have many Orders).
I've decided to provide such an interface
/api/v1/Customers/ -- get list of Customers, add new Customer
/api/v1/Customers/:id: -- get Customer with id=:id:
/api/v1/Orders/ -- get list of Orders, add new Order
/api/v1/Orders/:id: -- get Order with id=:id:
It works flawlessly. But my frontend has to display a list of orders with customer names. With this interface, I will have to make a single call to /api/v1/Orders/ and then another call to /api/v1/Customers/:id: for each record from the previous call. Or perform two calls, to /api/v1/Orders/ and /api/v1/Customers/, and combine the results on the frontend side.
It looks like overkill, this kind of operation should be done at the database level. But how can/should I provide an appropriate interface?
/api/v1/OrdersWithCustomers
/api/v1/OrdersWithCustomers/:id:
Seems weird. Is this the right way to go?
There's no rule that says you cannot "extend" the data being returned from a REST API call. So instead of returning "just" the Order entity (as stored in the backend), you could of course return an OrderResponseDTO which includes all (relevant) fields of the Order entity, plus some from the Customer entity that are relevant to your use case.
The data model for your REST API does not have to be an exact 1:1 match to your underlying database schema - it does give you the freedom to leave out some fields, or add some additional information that the consumers of your API will find helpful.
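A minimal sketch of that idea, assuming in-memory entities rather than a real database. Only `OrderResponseDTO` is named in the answer; the field names and the `to_order_response` and `list_orders` helpers are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class Customer:
    id: int
    name: str
    email: str


@dataclass
class Order:
    id: int
    customer_id: int
    total: float


@dataclass
class OrderResponseDTO:
    """What the API returns: the order plus the customer fields the UI needs."""
    id: int
    total: float
    customer_id: int
    customer_name: str


def to_order_response(order: Order, customer: Customer) -> OrderResponseDTO:
    # The customer's email is deliberately left out of the response.
    return OrderResponseDTO(order.id, order.total, customer.id, customer.name)


def list_orders(orders, customers_by_id):
    # In a real service this join would happen in the database query.
    return [
        asdict(to_order_response(o, customers_by_id[o.customer_id]))
        for o in orders
    ]
```

With this shape, GET /api/v1/Orders/ can return customer names in a single call while the stored Order entity stays unchanged.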
Great question, and any API design will tend to hit pragmatic reality at some point like this.
One option is to include a larger object graph for each resource (i.e. include the customer linked to each order) but use filter query parameters to allow users to specify which properties they do or don't require.
Personally I think that request parameters on a RESTful GET are fine either for search semantics when retrieving a list of resources, or for filtering what is presented for each resource, as in this case.
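As a rough sketch of the filtering idea, assuming a hypothetical `fields` query parameter holding a comma-separated list of top-level property names (for example `/api/v1/Orders/?fields=id,customer`):

```python
from typing import Optional


def select_fields(resource: dict, fields: Optional[str]) -> dict:
    """Return only the requested top-level properties of a resource."""
    if not fields:
        # No filter given: return the full representation.
        return dict(resource)
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    # Unknown names are silently ignored in this sketch.
    return {name: resource[name] for name in wanted if name in resource}


order = {"id": 10, "total": 99.5, "customer": {"id": 1, "name": "Ada"}}
slim = select_fields(order, "id,customer")
```

Whether unknown field names should be ignored or rejected with an error is a design choice this sketch does not settle.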
Another option for your use case might be to look into a GraphQL approach.
How would you do it on the web?
You've got a web site, and that web site serves documents about Customers, and documents about Orders. But your clients aren't happy, because it's too much boring, mistake-prone work to aggregate the information in the two kinds of documents.
Can we please have a document, they ask, with the boring work already done?
And so you generate a bunch of these new reports, and stick them on your web server, and create links to make it easier to navigate between related documents. TA-DA.
A "REST API" is a facade that makes your information look and act like a web site. The fact that you are generating your representations from a database is an implementation detail, deliberately hidden behind the "uniform interface".
Let's take the following resource in my REST API:
GET `http://api/v1/user/users/{id}`
In normal circumstances I would use this like so:
GET `http://api/v1/user/users/aabc`
Where aabc is the user id.
There are times, however, when I have had to design my REST API in a way that some extra information is passed with the ID. For example:
GET `http://api/v1/user/users/customer:1`
Where customer:1 denotes I am using an id from the customer domain to lookup the user and that id is 1.
I now have a scenario where the identifier is more than one key (a composite key). For example:
GET `http://api/v1/user/users/customer:1;type:agent`
My question: in the above URL, what should I use as the separator between customer:1 and type:agent?
According to https://www.ietf.org/rfc/rfc3986.txt I believe that the semi-colon is not allowed.
You should either:
Use parameters:
GET http://api/v1/user/users?customer=1
Or use a new URL:
GET http://api/v1/user/users/customer/1
But use Standards like this
("Paths tend to be cached, parameters tend to not be, as a general rule.")
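A small sketch of the query-parameter option applied to the composite key from the question, using only the standard library. The helper names `build_lookup_url` and `parse_lookup_keys` are hypothetical, and the sketch assumes each key appears at most once.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_lookup_url(base: str, **keys: str) -> str:
    """Encode each part of the composite key as its own query parameter."""
    # Sorting gives a stable URL for the same set of keys.
    return f"{base}?{urlencode(sorted(keys.items()))}"


def parse_lookup_keys(url: str) -> dict:
    """Recover the composite key parts on the server side."""
    query = parse_qs(urlsplit(url).query)
    # Take the first value of each parameter.
    return {name: values[0] for name, values in query.items()}


url = build_lookup_url("http://api/v1/user/users", customer="1", type="agent")
keys = parse_lookup_keys(url)
```

This sidesteps the separator question entirely, since `&` and `=` already have well-defined meaning in a query string.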
Instead of trying to create a general structure for accessing records via multiple keys at once, I would suggest trying to think of this on more of a case-by-case basis.
To take your example, one way to interpret it is that you have multiple customers, and those customers each may have multiple user accounts. A natural hierarchy for this would be:
/customer/x/user/y
Often an elegant decision like this can be made, that not only solves the problem but also documents your data-model in a way that someone can easily see that users belong to customers via a 1-to-many relationship.
Hello Internet Denizens,
I was reading through a nice database design article and the final determination on how to properly generate DB primary keys was ...
So, in reality, the right solution is probably: use UUIDs for keys,
and don’t ever expose them. The external/internal thing is probably
best left to things like friendly-url treatments, and then (as Medium
does) with a hashed value tacked on the end.
That is, use UUIDs for internal purposes like db joins, but use a friendly-url for external purposes (like a REST API).
My question is ... how do you make uniquely identifiable (and friendly) keys for external purposes?
I've used several APIs: Stripe, QuickBooks, Amazon, etc. and it seems like they use straight up sequential IDs for things like customers, report IDs, etc for retrieving information. It makes me wonder if exposing UUIDs as a security risk is a little overblown b/c in theory you should be able to append a where clause to your queries.
SELECT * FROM products where UUID = <supplied uuid> AND owner/role/group/etc = <logged in user>
The follow-up question is: If you expose a primary key, how do people efficiently restrict access to that resource in a database environment? Assign an owner to a db row?
Interested in the design responses.
Potential Relevant Posts for Further Reading:
Should I use UUIDs for resources in my public API?
It is not a good idea to expose your internal ids to the outside. You should either encode them (with some algorithm) or have a look up table.
Also, do not append parameters provided by user (or URL) to your SQL query (UUIDS or not), this is prone to SQL injection. Use parameterized SQL queries for that.
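A minimal sketch of a parameterized query combined with the ownership check from the question, using an in-memory SQLite database. The `products` table layout and the `fetch_product` helper are assumptions made for illustration.

```python
import sqlite3


def fetch_product(conn: sqlite3.Connection, product_uuid: str, owner_id: int):
    """Look up a product by its exposed id, but only for its owner."""
    # The ? placeholders keep user input out of the SQL text itself.
    return conn.execute(
        "SELECT uuid, name, owner_id FROM products WHERE uuid = ? AND owner_id = ?",
        (product_uuid, owner_id),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (uuid TEXT PRIMARY KEY, name TEXT, owner_id INTEGER)"
)
conn.execute("INSERT INTO products VALUES (?, ?, ?)", ("a1b2", "Widget", 7))
```

A request for a row owned by someone else returns nothing, exactly as if the row did not exist, which also avoids revealing which ids are valid.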
with Algolia is it possible to restrict the attributes to retrieve when building a Secured API Key?
By default, the attributesToRetrieve parameter may be used when searching; however, I am not sure whether it can be used when generating a Secured API key.
The reason of this is because we want to restrict certain attributes of a document to specific users.
Unfortunately, it's not possible to restrict the attributes to retrieve using the attributesToRetrieve query parameter while generating the Secured API key -> the user will still be able to override it at query time.
The only thing you can do is configure the unretrievableAttributes setting in your index settings. This setting forces some attributes to be non-retrievable, whatever attributesTo{Retrieve,Highlight,Snippet} query parameters you set.
In designing a RESTful API, the following call gives us basic information on user 123 (first name, last name, etc):
/api/users/123
We have a lot of information on users so we make additional calls to get subresources on a user like their cart:
/api/users/123/cart
For an admin page we would like to see all the cart information for all the users. A big table listing each user and some details about their cart. Obviously we don't want to make a separate API call for each user (tons of requests). How would this be done using RESTful API patterns?
/api/carts/users came to mind but then you'd in theory have 2 ways to get a specific user's cart by going /api/carts/users/123.
This is generally solved by adding a deref capability to your REST server. Assuming the response from your user looks like:
{
...
cartId: "12345",
...
}
you could add a simple dereference by passing in the query string "&deref=cart" (or however you set up your syntax).
This still leaves the problem of making a request per user. I'd posit there are two ways to generally do this. The first would be with a multiget type of resource (see [1] for an example). The problem with this approach is you must know all of the IDs and handle paging yourself. The second (which I believe is better) is to implement an index endpoint to your user resource. Indexing allows you to query a resource (generally via a query string such as firstName=X or whatever else you want to sort on.) Then you should implement basic paging so you're not passing around massive amounts of data. There are tons of examples of paging, but the simplest would be to specify a number (count=20) a start token (since=X) and a sort order (sort=-createdAt).
These implementations allow you to then ask for all users and their carts by iterating on the index endpoint. You might find this helpful as a starting point for paging [2].
[1] - How to construct a REST API that takes an array of id's for the resources
[2] - Pagination in a REST web application
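The index endpoint with paging and dereferencing described above could be sketched like this, with in-memory data standing in for the backend. The `list_users` function, the use of the user id as the `since` token, and the response shape are all assumptions made for illustration.

```python
def list_users(users, carts_by_id, count=20, since=None, deref=()):
    """Return one page of users, optionally expanding each user's cart."""
    # Sort by id so that the "since" token has a stable meaning.
    ordered = sorted(users, key=lambda u: u["id"])
    if since is not None:
        ordered = [u for u in ordered if u["id"] > since]
    page = ordered[:count]

    items = []
    for user in page:
        item = dict(user)
        if "cart" in deref:
            # Replace the need for a per-user cart request.
            item["cart"] = carts_by_id.get(user["cartId"])
        items.append(item)

    # Hand back a token only when more results remain.
    next_since = page[-1]["id"] if len(ordered) > count else None
    return {"items": items, "next": next_since}
```

An admin page would call this repeatedly, passing the returned `next` value as `since`, until `next` comes back as `None`.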
For some reason I was under the assumption that having 2 URIs to the same resource was a bad thing. In my situation /api/users/123/cart and /api/carts/users/123 would return the same data. Through more research I've learned from countless sources that it's acceptable to have multiple URIs to the same resource if it makes sense to the end user.
In my case I probably won't expose /api/carts/users/123, but I'm planning on using /api/carts/users with some query parameters to return a subset of carts in the system. Similarly, I'm going to have /api/carts/orgs to search org carts.
A really good site I found with examples and clear explanations was the REST API Tutorial. I hope this helps others with planning their API URIs.