DynamoDB Model Chronological Shipment Update Data - nosql

I recently started learning about DynamoDB single-table design. Now I am trying to model shipment update data that has the following properties:
an account has multiple users
an account has multiple shipments
a shipment can change eta multiple times
each time there's a shipment update, a new record will be inserted
Access patterns:
get all shipments of an account, displaying the last updated status, ordered by ETA in ascending order
for a given shipment, get the chronological updates
I am having difficulty resolving the two access patterns mentioned above. If, say, I only had one record per shipment, then I could just set the sort key for the shipment update item to shpm#55abc, and retrieving all shipments for a given account by ETA would be straightforward via the GSI accountEta.
How do I resolve this to get the access patterns I need? Should I consider having a separate table for the shipment update audit, i.e. to store just the shipment updates, so that when I need access pattern #2 I can query that audit table by shipment ID to get all the chronological updates? But I feel like this defeats the purpose of single-table design.

A single-table design is a good fit for these access patterns. Use overloadable, generic key names like PK and SK. Here is one approach*:
Shipments have a "current" record. Add a global secondary index (GSI1) to create an alternate Primary Key for querying by account in ETA order (pattern #1). All changes to the shipment are executed as updates to this "current" record.
# shipment "current" record
PK           SK          GSI1PK       GSI1SK
shpmt#55abc  x_current   account#123  x_eta#2022-07-01
Next, enable DynamoDB Streams on the table to capture shipment changes. Each time a "current" record is updated, the Lambda backing the Stream writes the OLD_IMAGE to the table as a change control record. This enables pattern #2 by shipment and account.
# shipment update record
PK           SK                               GSI1PK       GSI1SK
shpmt#55abc  update#2022-06-28T06:10:33.247Z  account#123  update#2022-06-28T06:10:33.247Z
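A minimal sketch of what such a stream handler might look like (Python/boto3; the table name and the updatedAt attribute are assumptions, and error handling is omitted):

import boto3

# Hypothetical table name; the handler writes back into the same table.
table = boto3.resource("dynamodb").Table("shipments")


def handler(event, context):
    """Triggered by the DynamoDB Stream; snapshots the previous state of a
    "current" shipment record as an immutable update record."""
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue  # only interested in updates to existing items

        old = record["dynamodb"]["OldImage"]
        if old["SK"]["S"] != "x_current":
            continue  # ignore anything that is not a "current" record

        # updatedAt is an assumed attribute carrying the change timestamp.
        updated_at = old.get("updatedAt", {}).get("S", "unknown")
        table.put_item(Item={
            "PK": old["PK"]["S"],          # shpmt#55abc
            "SK": f"update#{updated_at}",
            "GSI1PK": old["GSI1PK"]["S"],  # account#123
            "GSI1SK": f"update#{updated_at}",
            # ...copy whatever other attributes you want to preserve
        })

The stream must be configured to include old images (OLD_IMAGE or NEW_AND_OLD_IMAGES) for this to work.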
One virtue of this approach is that a single query operation can retrieve both the current shipment record and its full (or partial) change history in reverse-chronological order. This is the reason for the x_ prefixes on the current record's keys: x_current sorts after the update# items. A query with a key condition of PK = shpmt#55abc AND SK >= "update", sorted descending with ScanIndexForward=False and a limit of 2, returns the current record (x_current) and the latest update record.
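In boto3, that query might look roughly like this (the table name is an assumption):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("shipments")  # assumed table name

# x_current sorts after update#..., so a descending query with a limit of 2
# returns the current record followed by the latest update record.
resp = table.query(
    KeyConditionExpression=Key("PK").eq("shpmt#55abc") & Key("SK").gte("update"),
    ScanIndexForward=False,
    Limit=2,
)
items = resp["Items"]  # [current record, latest update record]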
* Whether this is a good solution for you also depends on expected read/write volumes.

Related

How to avoid customer's order history being changed in MongoDB?

I have two collections
Customers
Products
I have a field called "orders" in each of my customer documents, and what this "orders" field does is store a reference to the ID of the product that the customer ordered. Now my question is: since I'm referencing the product ID, if I update the "title" of that product, it will also change in the customer's order history. I can't embed each order's full information, since a customer may order thousands of products and the document could hit the 16 MB limit in no time. So what's the fix for this? Thanks.
Create an Orders Collection
Store ID of the user who made the order
Store ID of the product bought
I understand you are looking up the value of the product from the customer entity. You will always get the latest price if you are not storing the historical order/price transactions, because your data model is designed to retrieve the latest price information.
My suggestion:
Orders placed, with product and price, always need to be stored in a history entity (like order lines) that no process is allowed to change, so that when you look up the products a customer bought you always get the historical price, and a price change on the product does not affect previous orders. Two options:
Store the order history in the current customers collection (or only the most recent, say, 50 order lines if you don't need all of the history; write additional logic to handle this).
If option 1 is not feasible due to a large number of orders, consider creating an order-lines transaction collection and referencing the order line for the product bought via DBRef or the $lookup stage.
Note: it would have helped if you had given the current number of documents in each collection and the expected quarter-over-quarter growth rate.
You have orders and products. Orders are referencing products. Your problem is that the products get updated and now your orders reference the new product. The easiest way to combat this issue is to store full data in each order. Store all the key product-related information.
The advantage is that this kind of solution is extremely easy to visualize and implement. The disadvantage is that you have a lot of repetitive data since most of your products probably don't get updated.
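For illustration, a rough PyMongo sketch of that snapshot idea (collection and field names are my assumptions):

from datetime import datetime, timezone

from pymongo import MongoClient

db = MongoClient()["shop"]  # assumed database name


def place_order(customer_id, product_id, quantity):
    """Copy the key product fields into the order at purchase time, so later
    edits to the product document do not rewrite the order history."""
    product = db.products.find_one({"_id": product_id})
    order = {
        "customer_id": customer_id,
        "ordered_at": datetime.now(timezone.utc),
        "quantity": quantity,
        "product": {                      # snapshot, not a reference
            "product_id": product["_id"],
            "title": product["title"],
            "price": product["price"],
        },
    }
    return db.orders.insert_one(order).inserted_id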
If you store a product update history based on timestamps, then you could solve your problem. Products are now identified by three fields: the product ID, an active start date, and an active end date. Or you could configure products this way: product ID = product ID + "Version X", and store this version against each order.
If you use dates, then you will query for the product and find the version that was active during the period in which the order occurred. If you use versions against the product, then you will simply query the database for that particular version of the product. I haven't used MongoDB, so I'm not sure exactly how you would achieve this there. Naively, however, you can modify the product ID to include the version as well, possibly using # as a delimiter.
The advantage of this solution is that you don't store too much extra data. Considering that products won't be updated too often, I feel this is the ideal solution to your problem.
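A sketch of that versioned-product idea (PyMongo again; collection and field names are assumptions): each product edit inserts a new immutable version document, and each order stores the exact version it was placed against.

def save_product_version(db, product_id, version, fields):
    """Insert a new immutable version document; old versions are never updated."""
    db.product_versions.insert_one({
        "_id": f"{product_id}#v{version}",  # '#' used as the delimiter
        "product_id": product_id,
        "version": version,
        **fields,                            # title, price, ...
    })


def product_for_order(db, order):
    """Resolve the product exactly as it was when the order was placed."""
    return db.product_versions.find_one({"_id": order["product_version_id"]})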

How to handle many to many in DynamoDB

I am new to NoSQL and DynamoDB, coming from an RDBMS background.
My tables are being moved from MySQL to DynamoDB. I have these tables:
customer (columns: cid [PK], name, contact)
Hardware (columns: hid[PK], name, type )
Rent (columns: rid [PK], cid, hid, time) => this is the association between a customer and a hardware item.
One customer can have many hardware items, and one hardware item can be shared among many customers.
Requirements: separate lists of customers and hardware items should be retrievable.
Rent details: which customer borrowed which hardware item.
I referred to this: secondary index table. It is about keeping all the columns in one table.
I thought to have 2 DynamoDb tables:
Customer - this has all the attributes (similar to the columns) AND a set of hardware item hash keys. (Then my issue is that when the customer table is queried to retrieve only customers, all the hardware keys are also loaded.)
Any guidance on the table structure, please? How do I save, load, and even update?
Any Java samples, please? (I couldn't find any useful resource similar to my scenario.)
Have a look at DynamoDB's Adjacency List Design Pattern:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html
In your case, based on the Adjacency List Design Pattern, your schema can be designed as follows.
The prefixes of the partition key and the sort key indicate the type of record.
If the record is a customer, both the partition key and the sort key have the prefix 'customer-'.
If the record represents a customer renting a piece of hardware, the partition key's prefix is 'customer-' and the sort key's prefix is 'hardware-'.
base table
+------------+------------+-------------+
|PK |SK |Attributes |
|------------|------------|-------------|
|customer-cid|customer-cid|name, contact|
|hardware-hid|hardware-hid|name, type |
|customer-cid|hardware-hid|time |
+------------+------------+-------------+
Global Secondary Index Table
+------------+------------+----------+
|GSI-1-PK |GSI-1-SK |Attributes|
|------------|------------|----------|
|hardware-hid|customer-cid|time |
+------------+------------+----------+
Customer and hardware records are stored in the same table. A customer can refer to its hardware by using:
SELECT * FROM base_table WHERE PK=customer-123 AND SK.startsWith('hardware-')
If you want to go from hardware back to the customers, you should use the GSI:
SELECT * FROM GSI_table WHERE PK=hardware-333 AND SK.startsWith('customer-')
Note: the SQL I wrote is just pseudocode, to give you the idea.
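Translated into real DynamoDB calls, the two lookups could look like this (boto3 shown here; the table and index names are assumptions):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("rental")  # assumed table name

# All hardware rented by customer 123 (query the base table).
rented = table.query(
    KeyConditionExpression=Key("PK").eq("customer-123")
    & Key("SK").begins_with("hardware-")
)["Items"]

# All customers who rented hardware 333 (query the GSI).
renters = table.query(
    IndexName="GSI-1",  # assumed index name
    KeyConditionExpression=Key("GSI-1-PK").eq("hardware-333")
    & Key("GSI-1-SK").begins_with("customer-")
)["Items"]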
Take a look at this answer, as it covers many of the basics which are relevant to you.
DynamoDB does not support foreign keys as such. Each table is independent and there are no special tools for keeping two tables synchronised.
You would probably have an attribute in your customers table called hardwares. The attribute would be a list of hardware ids the customer has. If you wanted to see all hardware items belonging to a customer you would:
Perform GetItem on the customer id. Or use Query depending on how you are looking the customer up.
For each hardware id in the customer's hardware attribute, perform a GetItem on the Hardware table.
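A rough sketch of those two steps (boto3 shown here rather than Java, purely for brevity; table and attribute names are assumptions). BatchGetItem avoids a round trip per hardware item:

import boto3

dynamodb = boto3.resource("dynamodb")
customers = dynamodb.Table("Customer")  # assumed table name

customer = customers.get_item(Key={"cid": "c-1001"})["Item"]
hardware_ids = customer.get("hardwares", [])

hardware_items = []
if hardware_ids:
    # Fetch all referenced hardware items in one batched call
    # (up to 100 keys per request).
    resp = dynamodb.batch_get_item(
        RequestItems={"Hardware": {"Keys": [{"hid": hid} for hid in hardware_ids]}}
    )
    hardware_items = resp["Responses"]["Hardware"]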
With DynamoDB you generally end up doing more in the client application relative to an RDBMS solution. The benefits are that it's fast and simple. But you will find that you probably move a lot of your work from the database server to your application server.

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first sounds pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores the generic UUIDs (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription to a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
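Since the application already uses SQLAlchemy, the activity table sketched above might be declared roughly like this (the column names and the ObjectType values are my assumptions):

import enum
import uuid

from sqlalchemy import Column, DateTime, Enum, Index, Text, func
from sqlalchemy.dialects.postgresql import ARRAY, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class ObjectType(enum.Enum):
    product = "product"
    customer = "customer"
    store = "store"


class Activity(Base):
    __tablename__ = "activity"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    created = Column(DateTime(timezone=True), server_default=func.now(), index=True)
    user_id = Column(UUID(as_uuid=True), nullable=False)    # who made the change
    message = Column(Text, nullable=False)                  # human-readable description
    object_type = Column(Enum(ObjectType), nullable=False)  # what kind of object changed
    object_id = Column(UUID(as_uuid=True), nullable=False)  # generic ref, no foreign key
    # The object plus its parents (category, store, organisation, ...),
    # so a subscription on a parent matches children in one step.
    object_ref = Column(ARRAY(UUID(as_uuid=True)), nullable=False)


# GIN index to support the containment / unnest lookups shown below.
Index("ix_activity_object_ref", Activity.object_ref, postgresql_using="gin")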
Update: this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
    SELECT DISTINCT ON (activity.created, activity.id)
        *
    FROM activity
    LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
        ON true
    LEFT OUTER JOIN subscription
        ON subscription.object_id = act_ref.act_ref
    WHERE activity.created BETWEEN :lower_date AND :upper_date
      AND subscription.user_id = :user_id
    ORDER BY activity.created DESC,
             activity.id,
             act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by the ordinality, and selecting DISTINCT ON the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you can avoid the unnest and just use the array containment operator @> (wrapping the scalar subscription ID in an array), with no subquery:
SELECT *
FROM activity
JOIN subscription ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

TSQL - Deleting with Inner Joins and multiple conditions

My question is a variation on one already asked and answered (TSQL Delete Using Inner Joins) but I have a different level of complexity and I couldn't see a solution to it.
My requirement is to delete Special Prices which haven't been accessed in 90 days. Special Prices are keyed on Customer ID and Product ID, and the products have to be matched to a Customer Order Detail table which also contains a Customer ID and a Product ID. I want to write one function that will look at the Special Price table for each customer, compare each product for that customer with the Customer Order Detail table and, if the maximum order date is more than 90 days earlier than today, delete it from the Special Price table.
I know I can use a CURSOR (slow but effective) but would prefer to have a single query like the one in the TSQL Delete Using Inner Joins example. Any ideas and/or is more information required?
I can't dig deeper into the situation of your system, but if it is OK for you, check out the MERGE statement; it might help instead of using cursors. Check this link: MERGE STATEMENT
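Alternatively, a single set-based DELETE (no cursor, no MERGE) along these lines might do the job; the table and column names are guesses at your schema, sketched here via pyodbc:

import pyodbc

# Assumed schema: SpecialPrice(CustomerID, ProductID, ...) and
# CustomerOrderDetail(CustomerID, ProductID, OrderDate, ...).
DELETE_STALE_SPECIAL_PRICES = """
DELETE sp
FROM SpecialPrice AS sp
LEFT JOIN (
    SELECT CustomerID, ProductID, MAX(OrderDate) AS LastOrdered
    FROM CustomerOrderDetail
    GROUP BY CustomerID, ProductID
) AS d
    ON d.CustomerID = sp.CustomerID
   AND d.ProductID = sp.ProductID
WHERE d.LastOrdered < DATEADD(DAY, -90, GETDATE())
   OR d.LastOrdered IS NULL;  -- never ordered at all; drop this line if unwanted
"""

with pyodbc.connect("DSN=sales") as conn:  # assumed connection string
    conn.execute(DELETE_STALE_SPECIAL_PRICES)
    conn.commit()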

Ways to implement data versioning in PostgreSQL

Can you share your thoughts on how you would implement data versioning in PostgreSQL? (I've asked a similar question regarding Cassandra and MongoDB. If you have any thoughts on which DB is better for this, please share.)
Suppose that I need to version records in a simple address book. Address book records are stored in one table without relations for simplicity. I expect that the history:
will be used infrequently
will be used all at once to present it in a "time machine" fashion
there won't be more than a few hundred versions for a single record.
history won't expire.
I'm considering the following approaches:
Create a new table to store the history of records, with a copy of the address book table's schema, and add a timestamp and a foreign key to the address book table.
Create a kind of schema-less table to store changes to address book records. Such a table would consist of: AddressBookId, Timestamp, FieldName, Value. This way I would store only the changes to the records, and I wouldn't have to keep the history table and the address book table in sync.
Create a table to store serialized (JSON) address book records or changes to address book records. Such a table would look as follows: AddressBookId, Timestamp, Object (varchar).
Again, this is schema-less, so I wouldn't have to keep the history table in sync with the address book table.
(This is modelled after Simple Document Versioning with CouchDB)
I do something like your second approach: have the table with the actual working set, and a history table with the changes (timestamp, record_id, property_id, property_value). This includes the creation of records. A third table describes the properties (id, property_name, property_type), which helps with data conversion higher up in the application. So you can also very easily track changes to single properties.
Instead of a timestamp you could also have an int-like version counter, which you increment for every change per record_id, so you have an actual version number.
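A minimal sketch of those three tables (SQLAlchemy Core used here just as shorthand; all names are mine):

from sqlalchemy import (
    Column, DateTime, ForeignKey, Integer, MetaData, String, Table, Text, func
)

metadata = MetaData()

# Working set: the current address book entries.
addressbook = Table(
    "addressbook", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(200)),
    Column("street", String(200)),
    Column("city", String(100)),
)

# Property catalogue, used for data conversion higher up in the application.
properties = Table(
    "properties", metadata,
    Column("id", Integer, primary_key=True),
    Column("property_name", String(100), nullable=False),
    Column("property_type", String(50), nullable=False),
)

# One row per changed property per change, including record creation.
history = Table(
    "history", metadata,
    Column("timestamp", DateTime(timezone=True), server_default=func.now()),
    Column("record_id", Integer, ForeignKey("addressbook.id"), nullable=False),
    Column("property_id", Integer, ForeignKey("properties.id"), nullable=False),
    Column("property_value", Text),
)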
You could have start_date and end_date.
When end_date is NULL, it's the current record.
I'm versioning glossary data, and my approach has been pretty successful for my needs. Basically, for records that need versioning, you divide the field set into persistent fields and version-dependent fields, thus creating two tables. Some of the first set should also form the unique key for the first table.
Address
    id [pk]
    fullname [uk]
    birthday [uk]

Version
    id [pk]
    address_id [uk]
    timestamp [uk]
    address
In this fashion, you get address subjects determined by fullname and birthday (which should not change through versioning) and versioned records containing the addresses. address_id should be related to Address:id through a foreign key. With each entry in the Version table you get a new version for the subject Address:id = address_id with a specific timestamp, and in this way you have a history reference.
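For the "time machine" lookup against that layout, a point-in-time read can stay very simple (SQLAlchemy with raw SQL shown here; table and column names follow the listing above, and the DSN is an assumption):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql:///addressbook")  # assumed DSN

# State of one address subject as of a given point in time:
# the newest Version row at or before the requested timestamp.
AS_OF = text("""
    SELECT v.address
    FROM version AS v
    WHERE v.address_id = :address_id
      AND v.timestamp <= :as_of
    ORDER BY v.timestamp DESC
    LIMIT 1
""")

with engine.connect() as conn:
    row = conn.execute(AS_OF, {"address_id": 42, "as_of": "2023-01-01"}).first()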