How can I partition this table? - postgresql

I want to track the states of subscriptions between a contact and a list. The states are...
pending
subscribed
resubscribed (a subset of subscribed)
unsubscribed
I also want to know when each step happened. Rather than using a single field I'm using three timestamps.
created_at
subscribed_at
unsubscribed_at
The states correspond to:
pending: subscribed_at is null
subscribed: subscribed_at > coalesce(unsubscribed_at, '-infinity')
resubscribed: ... and unsubscribed_at is not null
unsubscribed: coalesce(unsubscribed_at, '-infinity') > subscribed_at
The major operations are...
Find pending subscriptions for a contact and list.
Find active subscriptions for a contact, a list, or both.
Find unsubscriptions for a contact, a list, or both.
Delete old pending subscriptions.
Delete old unsubscriptions.
This table can potentially get quite large. We have on the order of 10,000,000 contacts and 100 lists. A contact has about 10 active subscriptions, and a list can have on the order of 100,000 active subscriptions. Old pending subscriptions and old unsubscriptions will be purged periodically.
I'm anticipating most queries will use subscribed_at and unsubscribed_at, which will complicate indexing. I'd like to partition the table into pending, subscribed, and unsubscribed, but I can't figure out how.
This is using PostgreSQL 12.
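The closest I've come is list-partitioning on an expression that mirrors the state definitions above. Here's a sketch (psycopg2 only to make it runnable; the connection string is a placeholder):

# Sketch: list-partition on an expression mirroring the state definitions.
import psycopg2

DDL = """
CREATE TABLE subscription (
    contact_id      bigint      NOT NULL,
    list_id         int         NOT NULL,
    created_at      timestamptz NOT NULL DEFAULT now(),
    subscribed_at   timestamptz,
    unsubscribed_at timestamptz
) PARTITION BY LIST ((
    CASE
        WHEN subscribed_at IS NULL THEN 'pending'
        WHEN subscribed_at > COALESCE(unsubscribed_at, '-infinity') THEN 'subscribed'
        ELSE 'unsubscribed'
    END
));
CREATE TABLE subscription_pending      PARTITION OF subscription FOR VALUES IN ('pending');
CREATE TABLE subscription_subscribed   PARTITION OF subscription FOR VALUES IN ('subscribed');
CREATE TABLE subscription_unsubscribed PARTITION OF subscription FOR VALUES IN ('unsubscribed');
"""

with psycopg2.connect("dbname=mydb") as conn:  # placeholder connection string
    conn.cursor().execute(DDL)

PostgreSQL 11+ moves a row to the right partition when an UPDATE changes the expression's value, but as far as I can tell queries only prune partitions when they repeat the exact CASE expression, and with an expression partition key the table can't carry a primary key on (contact_id, list_id). That's where I'm stuck.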

Related

DynamoDB Model Chronological Shipment Update Data

I recently started learning about DynamoDB single table design. Now, I am trying to model Shipment Update data that has the following properties:
an account has multiple users
an account has multiple shipments
a shipment can change eta multiple times
each time there's a shipment update, a new record will be inserted
Access patterns:
get all shipments of an account displaying the last updated status ordered by eta in an ascending order
for a given shipment, get the chronological updates
I am having difficulty resolving the two access patterns mentioned above. If, say, I only had one record per shipment, then I could just set the sort key for the shipment update items to shpm#55abc, and retrieving all shipments for a given account ordered by eta would be straightforward via the accountEta GSI.
How do I resolve this to get the access patterns I need? Should I consider having a separate table for the shipment update audit, i.e. to store just the shipment updates? Then, for access pattern #2, I would query this audit table by the shipment id to get all the chronological updates. But I feel like this defeats the purpose of the single table design.
A single-table design is a good fit for these access patterns. Use overloadable, generic key names like PK and SK. Here is one approach*:
Shipments have a "current" record. Add a global secondary index (GSI1) to create an alternate Primary Key for querying by account in ETA order (pattern #1). All changes to the shipment are executed as updates to this "current" record.
# shipment "current" record
PK           SK         GSI1PK       GSI1SK
shpmt#55abc  x_current  account#123  x_eta#2022-07-01
Next, enable DynamoDB Streams on the table to capture shipment changes. Each time a "current" record is updated, the Lambda backing the Stream writes the OLD_IMAGE to the table as a change control record. This enables pattern #2 by shipment and account.
# shipment update record
PK           SK                               GSI1PK       GSI1SK
shpmt#55abc  update#2022-06-28T06:10:33.247Z  account#123  update#2022-06-28T06:10:33.247Z
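To make that concrete, here's a minimal sketch of such a stream handler (Python/boto3; the table name "shipments" is an assumption, the key names are from the examples above):

# Sketch of the stream-backed Lambda described above (Python/boto3).
# Assumes a table named "shipments" with the PK/SK/GSI1PK/GSI1SK keys shown.
from datetime import datetime, timezone
import boto3

ddb = boto3.client("dynamodb")

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue  # only snapshot updates to an existing "current" record
        old = record["dynamodb"]["OldImage"]  # already in attribute-value form
        ts = datetime.fromtimestamp(
            record["dynamodb"]["ApproximateCreationDateTime"], tz=timezone.utc
        ).isoformat()
        old["SK"] = {"S": f"update#{ts}"}      # re-key the snapshot as an update record
        old["GSI1SK"] = {"S": f"update#{ts}"}  # same for the GSI sort key
        ddb.put_item(TableName="shipments", Item=old)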
One virtue of this approach is that a single query operation can retrieve both the current shipment record and its full (or partial) change history in reverse chronological order. This is the reason for the x_ prefixes on the current record's keys: x_current sorts after every update# key. A query with a key expression of PK = shpmt#55abc AND SK >= "update", descending sort (ScanIndexForward=False), and a limit of 2 returns the current record (x_current) and the latest update record.
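For illustration, that query might look like this in boto3 (table name again assumed):

# "x_current" sorts after every "update#..." key, so a descending query
# returns the current record first, then the newest update (boto3).
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("shipments")
resp = table.query(
    KeyConditionExpression=Key("PK").eq("shpmt#55abc") & Key("SK").gte("update"),
    ScanIndexForward=False,  # descending
    Limit=2,
)
items = resp["Items"]  # [current record, latest update] once an update exists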
* Whether this is a good solution for you also depends on expected read/write volumes.

How to avoid customer's order history being changed in MongoDB?

I have two collections
Customers
Products
I have a field called "orders" in each of my customer documents, and what this "orders" field does is store a reference to the ID of each product ordered by that customer. My question: since I'm referencing the product ID, if I update the "title" of that product, it will also change in the customer's order history. I can't embed each order's information, since a customer may order thousands of products and the document would hit the 16 MB limit in no time. What's the fix for this? Thanks.
Create an Orders Collection
Store ID of the user who made the order
Store ID of the product bought
I understand you are looking up the product's details from the customer document. If you are not storing the order and price as historical transactions, you will always get the latest values, because the data model is designed to retrieve the current product information.
My suggestion:
Orders placed, together with product and price, should always be stored in a history entity (order lines), and no process should be allowed to change them. That way, when you look up the products a customer bought, you always get the historical price, and a later price change does not affect the previous order. Two options:
Option 1: Store the order history in the current customers collection (or only, say, the latest 50 order lines if you don't need the full history; this requires additional logic to maintain).
Option 2: If option 1 is not feasible due to a large number of orders, create an order-lines transaction collection and reference the order line for the product bought via DBRef or the $lookup command.
Note: it would have helped if you had given the current number of documents in each collection and the expected quarter-over-quarter growth.
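If it helps, option 2 might look roughly like this in pymongo (collection and field names are illustrative, not from the question):

# Order lines snapshot the product fields at purchase time, so later
# edits to the product don't rewrite history (pymongo; names illustrative).
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient()["shop"]  # assumes a local MongoDB and a "shop" database

def place_order(customer_id, product_id):
    product = db.products.find_one({"_id": product_id})
    db.order_lines.insert_one({
        "customer_id": customer_id,
        "product_id": product["_id"],  # reference, for lookups/$lookup
        "title": product["title"],     # frozen copy of what was bought...
        "price": product["price"],     # ...at the price that was paid
        "ordered_at": datetime.now(timezone.utc),
    })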
You have orders and products. Orders are referencing products. Your problem is that the products get updated and now your orders reference the new product. The easiest way to combat this issue is to store full data in each order. Store all the key product-related information.
The advantage is that this kind of solution is extremely easy to visualize and implement. The disadvantage is that you have a lot of repetitive data since most of your products probably don't get updated.
If you store a product update history based on timestamps, you could solve your problem. Products would then be identified by three fields: the product ID, an active start date, and an active end date. Alternatively, you could configure products this way: product ID = product ID + "Version X", and store that version against each order.
If you use dates, you query for the product version that was active during the period when the order occurred. If you use versions, you simply query the database for that particular version of the product. I haven't used MongoDB, so I'm not sure how you would achieve this there exactly; naively, however, you can modify the product ID to include the version, possibly using # as a delimiter.
The advantage of this solution is that you don't store too much extra data. Considering that products won't be updated too often, I feel this is the ideal solution to your problem.
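As a rough pymongo sketch of that version-per-update idea, using # as the delimiter (all names are illustrative):

# Each product edit inserts a new immutable document whose _id embeds the
# version; orders store that exact _id (pymongo; names illustrative).
from pymongo import MongoClient

db = MongoClient()["shop"]  # assumes a local MongoDB and a "shop" database

def update_product(product_id, changes):
    latest = db.products.find_one({"base_id": product_id}, sort=[("version", -1)])
    version = latest["version"] + 1 if latest else 1
    doc = {**(latest or {}), **changes,
           "_id": f"{product_id}#v{version}",
           "base_id": product_id,
           "version": version}
    db.products.insert_one(doc)
    return doc["_id"]  # store this exact id on the order line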

MongoDB - Getting first set of $lt

I'm using MongoDB to store data and when retrieving some, I need a subset which I'm uncertain how to obtain.
The situation is this: items are created in batches, with about a month between batches. When a new batch is added, the previous batch has a deleted_on date set.
Now, depending on when a customer is created, they can always retrieve the current (not deleted) set of items, and all items in the one batch that wasn't deleted when they registered.
Thus, I want to retrieve items whose deleted_on is either null, or the closest date in the future after the customer's added_on date.
In all of my solutions, I run into one of the below problems:
I can get all items that were deleted before the customer was created - but they include all batches - not just the latest one.
I can get the first item that was deleted after the customer was created, but nothing else from the same batch.
I can get all items, but I have to modify the result set afterwards to remove all items that don't apply.
Having to modify the result afterwards is fine, I guess, but undesirable. What's the best way to handle this?
Thanks!
PS. The added_on (on the customer) and deleted_on on the items have indexes.
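Edit: the closest I've come is a two-step query (pymongo; field names as above). It avoids trimming the result set afterwards, but costs two round trips — is there a better way?

# Step 1: find the earliest deleted_on after the customer registered;
# step 2: fetch that whole batch plus the current (not deleted) items.
from pymongo import MongoClient

db = MongoClient()["mydb"]  # placeholder database name

def items_for(customer):
    boundary = db.items.find_one(
        {"deleted_on": {"$gte": customer["added_on"]}},
        sort=[("deleted_on", 1)],
    )
    query = {"deleted_on": None}  # matches null/missing
    if boundary is not None:
        query = {"$or": [{"deleted_on": None},
                         {"deleted_on": boundary["deleted_on"]}]}
    return list(db.items.find(query))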

CRM 2011 Multiple Plugins firing at same time, retrieving invalid data

I am working on Dynamics CRM 2011. I have created an Order Product Create plugin (Post-Operation) and an Order Product Delete plugin (Pre-Validation).
When an Order Product is created, my plugin retrieves a parent record and updates a quantity field according to the Order Product Quantity.
When an Order product is deleted, my delete plugin reverses this and adds back a quantity to a parent record.
My problem is that I have a custom HTML resource that calls an OData POST to create Order Products in batches (simulating a bulk create); my script calls this create in a loop. For instance, I may call the OData create 5 times in a row to quickly create 5 custom Order Products, or 10 times, depending on what the user wants. It looks like my plugins are firing at the same time, because the value retrieved from the parent record is sometimes the old value instead of the updated one. My intention is that each plugin fires and updates the parent record before the next plugin fires and retrieves the quantity value.
If I create 5 order products with a quantity of 1 each, I expect my parent record to decrement by 5. In reality it only decrements by 1 or 2 in a 5 Order Product Create situation. It looks like the retrieve org service calls in each plugin must be firing at the same time to grab the old value.
On the other hand, my delete plugin works perfectly in a Bulk Delete situation. I can delete 5 Order Products in a Bulk Delete and the Parent Record is updated correctly. For instance 5 Order products with quantity of 1 each, results in parent record updating by 5.
Why would a bulk delete work differently than calling an OData POST a few times? Do you think moving this from a plugin to a workflow process would be a better solution?
Thank you
Ian
You should use the plugin execution order to execute plugins in the order you want.
The execution order of a plugin specifies the order, also known as rank, in which plug-ins are executed within a pipeline stage. Plug-ins registered with an order value of 1 are executed first, followed by plug-ins registered with an order of 2, and so on. However, if there is more than one plug-in in a stage with the same order value, the plug-in with the earliest compilation date is called first.
You may be experiencing a race condition as a result of the plugin being triggered simultaneously after two or more Order Products are created at the same time for the same Product.
e.g. Product Lemon has 10 stocked items.
Order Product A orders 2 Lemons.
Order Product B orders 3 Lemons.
If Order Products A and B trigger your plugin at the same time, both executions will grab the current stock count for Product Lemon, which is 10.
Both plugins will then deduct from 10 (i.e. Order Product A computes 10-2, while Order Product B computes 10-3). Whichever one updates the Product Lemon record last determines the new stock count for Product Lemon.
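To see the lost-update pattern in miniature (plain Python threads, nothing CRM-specific; the sleep just widens the race window):

import threading
import time

stock = {"lemon": 10}

def order(qty):
    current = stock["lemon"]        # both threads read 10 here
    time.sleep(0.01)                # the other thread reads before we write
    stock["lemon"] = current - qty  # last writer wins; one decrement is lost

threads = [threading.Thread(target=order, args=(q,)) for q in (2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stock["lemon"])  # prints 7 or 8, not the expected 5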
Solution:
Use a Mutex to prevent a race condition in the calculation.
NB 1: a Mutex is not available in a CRM Online environment.
NB 2: the C# lock statement is supported in CRM Online, but not for cross-process locking.

What is recommended for a To-Do List schema design for MongoDB?

With the main purpose of posting Tasks, displayed as either 'to-do' or 'done', how would one better structure a NoSQL DB of the following objects:
Datetime Created Not Null
Task ID Not Null
Task ID as a Str Not Null
Task Title Not Null
Task Description
Time &/or Date Due
User Not Null
  ID Not Null
  ID as a Str Not Null
  Name Not Null
  Username Not Null
  Location
  Contacts Count
  Date Created Not Null
  UTC Offset Not Null
  Time Zone Not Null
  Geo-Enabled Not Null
  Verified
  Task Count Not Null
  Language Not Null
Geo-Location
Coordinates
Place
Shared with Whom
  ?
Task Status
  Marked Done
  Auto-Moved to Done (because datetime-due is passed)
  Labeled (True/False)
Edited
  Edit Count
  Edit Datetime
Deleted
Users can post an unlimited number of Tasks, and Tasks can be shared between users. How is this relationship best captured?
Tasks can be manually 'marked done', or 'auto-labeled' and 'auto-moved-to-done' because the date-time due is passed.
Edits & Deletes are to be recorded as well.
As a starting place, what are the upsides &/or downsides of the following schema (with scalability a prime focus):
{
  "created_at": "Day Mon ## 00:00:00 +0000 20##",
  "id": #####,
  "id_str": "#####",
  "title": "This is a title",
  "description": "The description goes here..",
  "date_due": "Day Mon ## 00:00:00 +0000 20##",
  "user": {
    "id": ####,
    "id_str": "####",
    "name": "Full Name",
    "user_name": "Username",
    "location": "",
    "contacts_count": 101,
    "created_at": "Day Mon ## 00:00:00 +0000 20##",
    "utc_offset": ####,
    "time_zone": "Country",
    "geo_enabled": true,
    "verified": false,
    "task_count": 101,
    "lang": "en"
  },
  "geo": ?,
  "coordinates": ?,
  "place": ?,
  "shared_with": ?,
  "moved_done": false,
  "marked_done": false,
  "edited": false,
  "deleted": false
}
Edits & Deletes are to be recorded as well.
Do you only need to know that a task was altered, not how or by whom?
Otherwise, that will probably require versioning, i.e. for every Task there can be a number of TaskVersions. Alternatively, you could store only the modifications - it depends on your needs. In particular, having many writers isn't easy because of locking - what if two people try to change the same object at 'the same time'? You might want to consider optimistic vs. pessimistic locking or MVCC. Be warned that the "Tasks can be shared between users" requirement must be designed carefully.
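For illustration, optimistic locking in MongoDB can be as simple as a version field guarding each update (pymongo; the version field and names are assumptions):

# The update applies only if the version the writer read is still current;
# a concurrent edit makes modified_count 0 and the caller can retry.
from pymongo import MongoClient

db = MongoClient()["todo"]  # placeholder database name

def edit_task(task_id, seen_version, changes):
    result = db.tasks.update_one(
        {"_id": task_id, "version": seen_version},
        {"$set": changes, "$inc": {"version": 1}},
    )
    if result.modified_count == 0:
        raise RuntimeError("conflict: task changed since it was read")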
As a starting place, what are the upsides &/or downsides of the following schema (with scalability a prime focus):
I guess that user refers to the user who logs in. I wouldn't denormalize that information. Suppose a user has created a thousand tasks and then adds a new contact: now the contacts_count of 1000 documents must be updated, or it will be wrong. Denormalize only what's really necessary, e.g. the user_name.
Depending on what kind of lists you show, you can also choose to store only the user id and fetch the user object whenever you need to display the user name. While complex joins aren't supported, doing an $in query on, say, 50 or 100 ids (as you would for a list of tasks) is pretty fast.
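That pattern might look like this in pymongo (the user_id field is illustrative):

# Fetch one page of tasks, then resolve all distinct authors in a single
# second query instead of a join (pymongo; names illustrative).
from pymongo import MongoClient

db = MongoClient()["todo"]  # placeholder database name

tasks = list(db.tasks.find().sort("created_at", -1).limit(50))
user_ids = list({t["user_id"] for t in tasks})
users = {u["_id"]: u for u in db.users.find({"_id": {"$in": user_ids}})}
for t in tasks:
    print(users[t["user_id"]]["user_name"], "-", t["title"])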