REST API database structure

I'm building a simple REST API, where I have "book" and "bookCategory".
Their properties are very simple and identical:

book { id, name, created_at, modified_at }
bookCategory { id, name, created_at, modified_at }

If I had only these tables I would leave it like this, but I have the same logic and structure for "movie", "painting", "video games", etc.
Is it good practice to split them into different tables when they have the same structure but are logically different?
I could instead do this, which saves me a lot of tables, controllers and forms (keeping it DRY):

things { id, **parent_id**, name, created_at, modified_at, **type** }

Some example rows (id | parent_id | name | type):

1 | 0 | "Comedy"            | "movie"
2 | 1 | "Dumb and Dumber"   | "movie"
3 | 1 | "Ace Ventura"       | "movie"
4 | 0 | "Fantasy"           | "book"
5 | 4 | "Lord of the Rings" | "book"

It is very compact, but what would an endpoint for "all movie categories" or "all categories" look like?

domain/api/things/???

Or is it better to lay down a flexible ground structure (maybe new properties will come)?

Given the problem space as you've defined it, it's reasonable to use a single vertical table, assuming you expect no new properties that are unique to one entity (such as 'writer', which might be on book and movie, but not bookCategory or movieCategory). If you anticipate new unique properties (or aren't sure), I would suggest separate tables for each. While separate tables violate DRY now, the cost of changing a shared table later is going to be large.
As far as endpoints,
GET /api/movieCategories
<- an entity with all movie categories
Again, if all categories have and will always have the same properties, it would be reasonable to instead do
GET /api/categories?type=movie
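To make the single-table endpoint concrete, here is a minimal sketch (plain Python, with hypothetical row data) of how `GET /api/categories?type=movie` could be served by filtering the vertical `things` table; categories are simply the rows with `parent_id = 0`:

```python
# Sketch: rows from the hypothetical "things" table, as dicts.
THINGS = [
    {"id": 1, "parent_id": 0, "name": "Comedy",            "type": "movie"},
    {"id": 2, "parent_id": 1, "name": "Dumb and Dumber",   "type": "movie"},
    {"id": 3, "parent_id": 1, "name": "Ace Ventura",       "type": "movie"},
    {"id": 4, "parent_id": 0, "name": "Fantasy",           "type": "book"},
    {"id": 5, "parent_id": 4, "name": "Lord of the Rings", "type": "book"},
]

def categories(things, type=None):
    """Backs GET /api/categories[?type=...]: top-level rows (parent_id == 0),
    optionally narrowed to one type via the query parameter."""
    return [t for t in things
            if t["parent_id"] == 0 and (type is None or t["type"] == type)]

# GET /api/categories?type=movie
movie_categories = categories(THINGS, type="movie")
# GET /api/categories  -> every category, regardless of type
all_categories = categories(THINGS)
```

The same filter with `parent_id == <category id>` would serve "all movies in category X".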

Related

Query children of One-To-Many Relationship based on date along with parent

I have two entities in my dynamo table: User and Order.
Each user has 0..* orders and each order has exactly one associated user. Every order also has an orderDate attribute that describes when the order was placed.
My current table is structured as follows to make retrieving all orders for a specific user efficient:
+--------------+----------------+-------------+-----------+------------+
| PK           | SK             | Attributes                           |
+--------------+----------------+-------------+-----------+------------+
|              |                | name        | firstName | birthDate  |
| USER#userid1 | META#userid1   | Foo         | Bar       | 2000-10-10 |
+--------------+----------------+-------------+-----------+------------+
|              |                | orderDate   |           |            |
| USER#userid1 | ORDER#orderid1 | 2020-05-10  |           |            |
+--------------+----------------+-------------+-----------+------------+
I now have a second access pattern where I want to query all orders (regardless of user) that were placed on a specific day (e.g. 2020-05-10), along with the user(s) that placed them.
I'm struggling to handle this access pattern in my table design. Neither GSIs nor different primary keys seem to work here, because I either have to duplicate every user item for each day or I can't query the orders together with the user.
Is there an elegant solution to my problem?
This is a perfect use case for a secondary index. Here's one way to do it:
You could create a global secondary index (GSI1) on the Order items, with a partition key (GSI1PK) of ORDERS#<orderDate> and a sort key (GSI1SK) of USER#<user_id>.
GSI1 would then support a query for all orders placed on a specific day.
Keep in mind that denormalizing your data model (e.g. repeating user info in the Order item) is a common pattern in DynamoDB data modeling. Remember, space is cheap! More importantly, you are pre-joining your data to support your application's access patterns. In this instance, I'd add whatever User metadata you need to the Order item so it gets projected into the index.
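As a sketch of that key design (plain Python standing in for DynamoDB; the GSI1PK/GSI1SK attribute names follow the answer above, everything else is illustrative), the Order item carries the extra index keys plus the denormalized user info, and the GSI query is equivalent to filtering on GSI1PK:

```python
def make_order_item(user_id, order_id, order_date, user_name):
    """Build an Order item carrying the GSI1 keys plus denormalized user info."""
    return {
        "PK": f"USER#{user_id}",
        "SK": f"ORDER#{order_id}",
        "orderDate": order_date,
        # GSI1: partition on the day, sort by the user who placed the order.
        "GSI1PK": f"ORDERS#{order_date}",
        "GSI1SK": f"USER#{user_id}",
        # Denormalized user metadata, so it gets projected into the index.
        "userName": user_name,
    }

items = [
    make_order_item("userid1", "orderid1", "2020-05-10", "Foo"),
    make_order_item("userid2", "orderid7", "2020-05-10", "Baz"),
    make_order_item("userid1", "orderid2", "2020-06-01", "Foo"),
]

def query_gsi1(items, order_date):
    """Equivalent of Query(IndexName='GSI1', GSI1PK = 'ORDERS#<date>')."""
    return [i for i in items if i["GSI1PK"] == f"ORDERS#{order_date}"]

may_10_orders = query_gsi1(items, "2020-05-10")
```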
Make sense?
Unfortunately, I can't seem to figure out a way to elegantly solve your problem.
You need to either duplicate the user info and store it in the order record, or use a second GetItem call to fetch the user-specific info.
If anyone has better solutions, please let me know.

REST API: Is it acceptable to PUT to resource containing query param filters?

I'm designing an API Service that returns only JSON representations.
Some background and context to my question... Behind the scenes, a product in my database has an associated set of prices. Prices consist of (qty, currency_code, unit_price) tuples. Each set of prices belongs to a particular product and price list.
Here's a glance at the relational data. Each row has a unique constraint on (product_id, price_list_id, currency_code, qty). Both product_id and price_list_id are foreign keys.
dev=# SELECT * FROM price
      WHERE product_id = 1 AND price_list_id = 1 AND currency_code = 'GBP';

 id |                 uuid                 | product_id | price_list_id | qty | currency_code | unit_price |          created           |          modified
----+--------------------------------------+------------+---------------+-----+---------------+------------+----------------------------+----------------------------
  1 | 6fcbbb5b-8e51-4a4c-bf63-270f5d3f1ff8 |          1 |             1 |   1 | GBP           |      20417 | 2019-08-15 15:49:19.508808 | 2019-08-15 15:49:19.508808
 16 | c044e9fe-bb5f-4996-b8e6-88b4a1b9f125 |          1 |             1 |   2 | GBP           |    3453345 | 2019-08-15 15:49:37.896681 | 2019-08-15 15:49:37.896681
 17 | c488d372-e58f-4441-a583-281e4c2b1310 |          1 |             1 |   3 | GBP           |  312353345 | 2019-08-15 15:49:41.320622 | 2019-08-15 15:49:41.320622
To retrieve a set of prices for a given product I intend to use a GET request to the /products/:product_id/prices?price_list_id=1&currency_code=GBP resource. I expect to receive:
[
  { "id": "6fcbbb5b-8e51-4a4c-bf63-270f5d3f1ff8", "qty": 1, "unit_price": 20417 },
  ... // etc., 3 items total
]
If I want to update the set of prices for a given (product_id, price_list_id, currency_code), is it acceptable to do the exact reverse and send a PUT request to the same URI I used for the GET, i.e. PUT /products/:product_id/prices?price_list_id=1&currency_code=GBP, or should I use an alternative?
In the context of the GET request, price_list_id=1 and currency_code=GBP act like filters. With a PUT request, I'm not sure whether it's okay to identify the resource to update using query parameters as filters.
Alternatives I've considered are:
PUT /products/:product_id/prices, placing the price_list_id and currency_code in the request body, e.g.

{
  "price_list_id": "<uuid>",
  "currency_code": "GBP",
  "data": [
    { "qty": 1, "unit_price": <newprice> },
    ... /* new set of prices */
  ]
}

thereby deleting all existing prices and replacing the set with those in the request body.
PUT /products/:product_id/prices/price-lists/:price_list_id which starts to look very long winded. products have many prices so the first part of the resource looks okay to nest as a subresource. However, prices don't have price-lists (it's the opposite way around) so it makes no sense to have price-lists as a sub resource of prices.
PUT /prices/:price_id. This means I have to first retrieve a list, delete them one by one and update prices individually. This is not a good solution as I want to operate on prices as a set collectively. I also want the set to be replaced as a whole or none at all.
The URL you are sending a PUT request to should be the 'identity' of the thing you are updating. Including the information needed to locate the resource inside the request body therefore does not make sense.
All the information the server needs to 'locate' the resource you are updating should be in the resource locator (that's the url!)
All of these are completely fine from an HTTP protocol perspective:
PUT /products/:product_id/prices?price_list_id=1&currency_code=GBP
PUT /products/:product_id/prices/price-lists/:price_list_id
PUT /prices/:price_id
It doesn't really matter whether your ids are in the path part or the query part of the URL; for the purposes of locating resources there's no difference between /article/5 and /article?id=5.
So which one you choose should be based on consistency within your own API, user-friendliness, etc.
It's interesting that you don't like option 2 for being too long-winded, but you're OK with option 1, which is even longer ;)
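Whichever URL shape is chosen, the "replace the whole set or nothing" semantics the question asks for come down to one server-side transaction. A minimal sketch (illustrative Python/sqlite3 standing in for the real database; the table and column names follow the question):

```python
import sqlite3

# Hypothetical schema mirroring the question's price table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE price (
    product_id INTEGER, price_list_id INTEGER,
    currency_code TEXT, qty INTEGER, unit_price INTEGER,
    UNIQUE (product_id, price_list_id, currency_code, qty))""")
conn.execute("INSERT INTO price VALUES (1, 1, 'GBP', 1, 20417)")
conn.execute("INSERT INTO price VALUES (1, 1, 'GBP', 2, 3453345)")
conn.commit()

def put_prices(conn, product_id, price_list_id, currency_code, new_prices):
    """PUT semantics: atomically replace the whole price set for the key."""
    with conn:  # one transaction: all-or-nothing
        conn.execute(
            "DELETE FROM price WHERE product_id=? AND price_list_id=? "
            "AND currency_code=?",
            (product_id, price_list_id, currency_code))
        conn.executemany(
            "INSERT INTO price VALUES (?, ?, ?, ?, ?)",
            [(product_id, price_list_id, currency_code,
              p["qty"], p["unit_price"]) for p in new_prices])

# Body of PUT /products/1/prices?price_list_id=1&currency_code=GBP
put_prices(conn, 1, 1, "GBP", [{"qty": 1, "unit_price": 19999},
                               {"qty": 5, "unit_price": 90000}])
rows = conn.execute(
    "SELECT qty, unit_price FROM price WHERE product_id=1 "
    "AND price_list_id=1 AND currency_code='GBP' ORDER BY qty").fetchall()
```

If any insert fails (e.g. a constraint violation), the transaction rolls back and the old set survives, which matches PUT's replace-as-a-whole contract.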
You can use PATCH /products/:product_id/prices
See this answer to a similar question:
https://stackoverflow.com/a/32101994/853837

DynamoDB adjacency list primary key

I am completing an exercise using DynamoDB to model a many to many relationship. I need to allow a many to many relationship between posts and tags. Each post can have many tags and each tag can have many posts.
I have a primary (partition) key on id and a sort key on type, plus another global secondary index on id and data. I added yet another global index on id and type, but I think that one is redundant.
Here is what I have so far.
id (Partition Key) | type (Sort Key) | target | data
-------------------|-----------------|--------|----------
1                  | post            | 1      | cool post
tag                | tag             | tag    | n/a
1                  | tag             | tag    | orange

---- inserting another tag would overwrite the row above ----

1                  | tag             | tag    | green
I am taking advice from this awesome talk https://www.youtube.com/watch?v=jzeKPKpucS0 and these not so awesome docs https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html
The issue I am having is that if I try to add another tag with id "1" and type "tag", it overwrites the existing tag because it has the same composite key. What am I missing here? The suggestion seems to be to make the partition key and sort key be the id and type. Should my type be more like "tag#orange"? In that case I could put a global index on target with a sort key on type; that way I could get all posts with a certain tag by querying target = "tag" and type starts with "tag".
Just looking for some advice on handling this sort of adjacency list data with Dynamo as it seems very interesting. Thanks!
Basic guidelines for an adjacency-list
You need a few modifications to the way you're modeling. In an adjacency-list you have two types of items:
Top-level (those are your Posts and Tags)
Association (expresses which Tags are associated with each Post and vice-versa)
To build this adjacency-list, you must follow two simple guidelines (which I think are missing in your example):
Each top-level item (in your case a Post or a Tag) must be represented using the primary-key. Those items should also have the same value in the sort-key as in the primary-key.
For associations, use the primary-key to represent the source (or parent) and the sort-key to represent the target (or child).
From what I see in your examples, you set the primary-key of your Posts and Tags as just the item ID, while you should also use its type; e.g. Post-1 or Tag-3. In items that represent associations, I also don't see you storing the target ID.
Example
Let's say you have:
Three Posts: "hello world", "foo bar" and "Whatever..."
And three tags: "cool", "awesome", "great"
Post "hello world" has one tag: "cool"
Post "foo bar" has two tags: "cool" and "great"
Post "Whatever..." doesn't have any tags
You'd need to model this way in Dynamo:
PRIMARY-KEY | SORT-KEY | SOURCE DATA | TARGET DATA
------------|----------|-------------|------------
Post-1      | Post-1   | hello world |
Post-2      | Post-2   | foo bar     |
Post-3      | Post-3   | Whatever... |
Tag-1       | Tag-1    | cool        |
Tag-2       | Tag-2    | awesome     |
Tag-3       | Tag-3    | great       |
Post-1      | Tag-1    | hello world | cool
Post-2      | Tag-1    | foo bar     | cool
Post-2      | Tag-3    | foo bar     | great
Tag-1       | Post-1   | cool        | hello world
Tag-1       | Post-2   | cool        | foo bar
Tag-3       | Post-2   | great       | foo bar
How you query this adjacency list
1) You need a particular item, say Post-1:
Query primary-key == "Post-1" & sort-key == "Post-1" - returns: only Post-1
2) You need all tags associated with Post-2:
Query by primary-key == "Post-2" & sort-key BEGINS_WITH "Tag-" - returns: Tag-1 and Tag-3 associations.
Check the documentation for the begins_with key condition expression.
3) You need all Posts associated with, say Tag-1:
Query by primary_key == "Tag-1" & sort-key BEGINS_WITH "Post-" - returns: Post-1 and Post-2 associations.
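The three queries above can be sketched in plain Python (a list of tuples standing in for the table; begins_with on the sort key becomes a startswith filter — illustrative only):

```python
# (pk, sk, source_data, target_data) rows from the example table above.
ITEMS = [
    ("Post-1", "Post-1", "hello world", None),
    ("Post-2", "Post-2", "foo bar",     None),
    ("Tag-1",  "Tag-1",  "cool",        None),
    ("Tag-3",  "Tag-3",  "great",       None),
    ("Post-1", "Tag-1",  "hello world", "cool"),
    ("Post-2", "Tag-1",  "foo bar",     "cool"),
    ("Post-2", "Tag-3",  "foo bar",     "great"),
    ("Tag-1",  "Post-1", "cool",        "hello world"),
    ("Tag-1",  "Post-2", "cool",        "foo bar"),
    ("Tag-3",  "Post-2", "great",       "foo bar"),
]

def query(items, pk, sk=None, sk_begins_with=None):
    """Mimic Query(pk = ..., sort key equals / begins_with ...)."""
    return [i for i in items
            if i[0] == pk
            and (sk is None or i[1] == sk)
            and (sk_begins_with is None or i[1].startswith(sk_begins_with))]

post_1 = query(ITEMS, "Post-1", sk="Post-1")                    # query 1
tags_of_post_2 = query(ITEMS, "Post-2", sk_begins_with="Tag-")  # query 2
posts_of_tag_1 = query(ITEMS, "Tag-1", sk_begins_with="Post-")  # query 3
```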
Note that, if you change the contents of a given post, you need to change the value in all association items as well.
You can also choose not to store the post and tag content in association items, which saves storage space. But in that case you'd need two queries in examples 2 and 3 above: one to retrieve the associations, and another to retrieve each source item's data. Since querying is more expensive than storing data, I prefer to duplicate storage. It really depends on whether your application is read-intensive or write-intensive: if read-intensive, duplicating content in associations reduces read queries; if write-intensive, not duplicating content saves the write queries needed to update associations whenever the source item changes.
Hope this helps! ;)
I don't think you are missing anything. The idea is that the ID is unique for the type of item. Typically you would generate a long UUID for the ID rather than using sequential numbers. Another alternative is to use the datetime at which you created the item, probably with an added random number to avoid collisions when items are created at the same time.
This answer I have previously provided may help a little DynamoDB M-M Adjacency List Design Pattern
Don't remove the sort key - that won't help make your items more unique.

Nicely managed lookup tables

We have a people table; each person has a gender defined by a gender_id referencing a genders table:

| people    |
|-----------|
| id        |
| name      |
| gender_id |

| genders |
|---------|
| id      |
| name    |
Now, we want to allow people to create forms by themselves using a nice form builder. One of the elements we want to add is a select list with user-defined options:

| lists |
|-------|
| id    |
| name  |

| list_options |
|--------------|
| id           |
| list_id      |
| label        |
| value        |
However, they can't use the genders as a dropdown list because it's in a different table. They could create a new list with the same options as genders but this isn't very nice and if a new gender is added they'd need to add it in multiple places.
So we want to move the gender options into a list that the user can edit at will and will be reflected when a new person is created too.
What's the best way to move the genders into lists and list_options while still having a gender_id (or similar) column in the people table? Thoughts I've had so far include:

1. Create a 'magic' list with a known id and always assume that it contains the gender options.
   I'm not a great fan of this because it amounts to using 'magic' numbers, and the code would need some kind of map between system-level select boxes and what they mean.
2. Instead of a 'magic' list, expose an option so the user can choose which list contains the genders.
   This isn't really much different, but the ID wouldn't be hardcoded. It would require more work looking through DB tables, though.
3. Add some kind of column(s) to the lists table marking it as pulling its options from another table.
   This would likely require a lot more (and more complex) code to make it work.
4. Some kind of polymorphic table, which I'm not sure how would work; I've only just had the idea and wanted to write it down before I forget.
The easiest solution would be to change your list_options table to a view. If you have multiple tables that a drop-down list needs to pull from, just UNION the result sets together:

SELECT
    (your list id here), -- make this part of the composite primary key
    id,                  -- and this the other part
    Name
FROM dbo.Genders
UNION
SELECT
    (your list id here),
    id,
    Name
FROM dbo.SomeOtherTable
This way it's automatically updated any time the data changes. You are going to want to test this, though: if the view gets big it might get slow. You can get around that by pulling the information only once in your application (or caching it for, say, 30 minutes and then refreshing, just in case).
Your second option is to create a real list_options table and a procedure (or similar) that goes through all the other lookup tables and compiles the information into it. This will be faster for application performance, but it requires you to keep everything in sync. The easiest way to handle that is a series of triggers which rebuild portions of (or the entire) list_options table when something in the lookup tables changes. In this approach, I would suggest moving away from an automatically generated primary key to a composite key, as mentioned with the views. Since the table is going to be rebuilt, an auto-generated id would change, so it's best that nothing assumes that value is stable. The composite (list_id, lookup_id) will always be the same no matter how many times the row is re-inserted into the table.

Join custom table with product attributes on custom attribute

It's fairly simple - I have a table with a field that corresponds to certain custom product attribute.
+----+-----------+---------+---------------------+
| id | from_user | to_user | first_contact |
+----+-----------+---------+---------------------+
| 2 | 2 | 2 | 2012-10-26 18:24:30 |
+----+-----------+---------+---------------------+
to_user corresponds to the profile_id product attribute, which is unique. Now, I need to join product attributes such as status and url_key of that product to my custom table. I get stuck very early on, since I don't know how to reference the profile_id field in my join syntax. I could always rely on the flat product table, but I'd like to make the script more robust so that it works even when flat products are disabled. I presume I should use joinAttribute(), but I'm having trouble understanding how.
Thanks