Correct REST endpoint design when dealing with messaging - rest

I'm struggling to work out the correct REST endpoints for a certain situation. On my website it's possible for users to send messages to each other. One user is able to send messages to multiple recipients.
I think that /v1/users/123/messages would return all messages that have been sent to user 123.
What endpoint should I use for messages that user 123 has sent?
My database structure is as follows...
accounts table
id INT
username VARCHAR(64)
messages table
id INT
account_id INT <-- This is the sender's account ID
subject VARCHAR(128)
message TEXT
messagerecipients table
id INT
message_id INT
account_id INT <-- This is the recipient's account ID
The messages table defines a many-to-one relationship between messages and their sender: each message has exactly one sender, and a sender can have many messages.
The messagerecipients table defines a many-to-many relationship between messages and their recipients.
Also I'm reading through a PDF on API design at the moment which seems to suggest I should hide this kind of complexity behind the query string.
For instance:
/v1/emails?filter=author_id(123)
/v1/emails?filter=recipient_id(123)
Thoughts?

I would expect
/v1/users/123/messages
to return all the messages that belong to this user: received, sent, deleted, tagged, drafted, etc.
To select a subset of a resource, you can go one of two ways, as you and bertvh stated:
Query string:
I find this perfectly valid for filtering, e.g.
/v1/users/123/messages?type=received&folder=important
Or as a subresource:
Use this if you expect to have a lot of filter options at a higher level, e.g.
/v1/users/123/messages/received?folder=important
As you can see, this removes one filter parameter from the query string.
And like bertvh stated, the underlying database schema is irrelevant for serving the responses.
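As a rough illustration of the query-string variant, the following Python sketch filters an in-memory list of messages by a type parameter. The Message class, the list_messages function, and the sample data are assumptions made up for this example; a real handler would query the database instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: int
    sender_id: int
    recipient_ids: tuple
    subject: str


# Hypothetical sample data standing in for the messages/messagerecipients tables.
MESSAGES = [
    Message(1, 123, (124, 125), "Hello"),
    Message(2, 124, (123,), "Re: Hello"),
    Message(3, 125, (124,), "Unrelated"),
]


def list_messages(messages, user_id, kind=None):
    """Handle GET /v1/users/<user_id>/messages[?type=<kind>]."""
    sent = [m for m in messages if m.sender_id == user_id]
    received = [m for m in messages if user_id in m.recipient_ids]
    if kind == "sent":
        return sent
    if kind == "received":
        return received
    if kind is None:
        # No filter: every message that belongs to the user, without duplicates.
        by_id = {m.id: m for m in sent + received}
        return [by_id[i] for i in sorted(by_id)]
    raise ValueError(f"unsupported type filter: {kind!r}")
```

The subresource variant (/messages/received) could call the same function with kind fixed by the route, so the choice between the two URL styles does not have to affect the code behind them.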

I would do something like this.
To get all messages sent by a user:
/v1/users/123/sent
To get all messages received by a user:
/v1/users/123/inbox
Your database structure is irrelevant for the resource scheme, but it can influence the payload structure. If you want to use JSON, a message could look something like this:
{
"sender": 123,
"receivers": [124, 125],
"content": "My message content"
}
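A minimal sketch of how such a payload might be assembled from the three tables in the question. The rows are represented as plain dictionaries and the function name message_payload is hypothetical; the column names follow the schema given above.

```python
import json


def message_payload(message_row, recipient_rows):
    """Build the JSON-ready payload for one row of the messages table."""
    return {
        "sender": message_row["account_id"],
        "receivers": [
            r["account_id"]
            for r in recipient_rows
            if r["message_id"] == message_row["id"]
        ],
        "content": message_row["message"],
    }


# Hypothetical rows from the messages and messagerecipients tables.
message = {"id": 1, "account_id": 123, "subject": "Hi", "message": "My message content"}
recipients = [
    {"id": 1, "message_id": 1, "account_id": 124},
    {"id": 2, "message_id": 1, "account_id": 125},
    {"id": 3, "message_id": 2, "account_id": 126},
]

payload = message_payload(message, recipients)
encoded = json.dumps(payload)
```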

Related

Two different approaches to structure my NoSQL database < What to choose?

I currently get to work with DynamoDB and I have a question regarding the structure I should choose.
I set up Twilio to receive WhatsApp messages from guests in a restaurant. Guests can send their feedback directly to my Twilio WhatsApp number. I receive that feedback via webhook and save it in DynamoDB. The restaurant manager gets a dashboard (a React application) where they can monitor the feedback. While I am starting with one restaurant / one WhatsApp number, I will add more users / restaurants over time.
Now I have one of the following two structures in mind. With the first idea, I would always create a new item when a new message from a guest is sent to the restaurant.
With the second idea, I would (most of the time) update an existing entry. Only if the receiver / the restaurant doesn't exist yet, a new item is created. Every other message to that restaurant will just update the existing item.
Do you have any advice on what's the best way forward?
First idea:
PK (primary key), Created (epoch time), Receiver/Restaurant (phone number), Sender/Guest (phone number), Body (String)
Sample data:
1, 1574290885, 4917123525993, 4916034325342, "Example Message 1" # Restaurant McDonalds (4917123525993)
2, 1574291036, 4917123525993, 4917542358273, "Example Message 2" # different sender (4917542358273)
3, 1574291044, 4917123525993, 4916034325342, "Example Message 3" # same sender as pk 1 (4916034325342)
4, 1574291044, 4913423525123, 4916034325342, "Example Message 4" # Restaurant Burger King (4913423525123)
Second idea:
{
Receiver (primary key),
Messages: {
{
id,
Created,
From,
Body
}
}
}
Sample data (same data as for the first idea, but structured differently):
{
Receiver: 4917123525993,
Messages: {
{
Created: 1574290885,
Sender: 4916034325342,
Body: "Example Message 1"
},
{
Created: 1574291036,
Sender: 4917542358273,
Body: "Example Message 2"
},
{
Created: 1574291044,
Sender: 4916034325342,
Body: "Example Message 3"
}
}
}
{
Receiver: 4913423525123,
Messages: {
{
Created: 1574291044,
Sender: 4916034325342,
Body: "Example Message 4"
}
}
}
If I read this correctly, the second approach proposes saving all messages received by a restaurant as a nested list (the Messages property looks like an object in the samples you've shared, but I assume it is an array, since that would make more sense).
One potential problem I foresee with this is that DynamoDB items are limited in size (400 KB). That seems like a pretty large number, but you are bound to reach the limit fairly quickly if you use this application for something like a food order delivery system, where messages keep accumulating in a single item.
Another potential issue is that nested attributes cannot be used in key conditions or indexed in DynamoDB, so filtering on the proposed structure would mostly involve reading whole items or scanning the table, greatly increasing operational costs.
Unlike with relational DBs, the structure of your data in document DBs depends heavily on the questions you want to answer most frequently. In fact, you should avoid designing your NoSQL schema until you know what questions you want to answer, your access patterns, and your data volumes.
To come up with a data model, I will assume you want to answer the following questions with your table:
Get all messages received by a restaurant, ordered by timestamp (ascending or descending order can be chosen in the query by setting ScanIndexForward to true or false)
Get all messages sent by a user, ordered by timestamp
Get all messages sent by a user to a restaurant, ordered by timestamp
Consider the following record structure:
{
pk : <restaurant id>, // Partition key of the main table
sk : "<user id>:<timestamp>", // Synthetic (generated) range key of the main table
messageBody : <message content>,
timestamp: <timestamp> // Local secondary index (LSI) on this field
}
You insert a new record of this structure for each new message that comes into your system. This structure allows you to:
Efficiently query all messages received by a restaurant ID using only the partition key
Efficiently retrieve all messages received by a restaurant and sent by a user using pk = <restaurant id> and begins_with(sk, <user id>)
The LSI on timestamp allows for efficiently filtering messages based on creation time.
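The access patterns above could be expressed roughly as follows. This is a sketch that only builds the parameter dictionaries for the low-level DynamoDB API, so it runs without AWS access; the table name "Messages" and the index name "timestamp-index" are assumptions, as are the function names.

```python
TABLE = "Messages"            # hypothetical table name
TS_INDEX = "timestamp-index"  # hypothetical name of the LSI on "timestamp"


def message_item(restaurant_id, user_id, timestamp, body):
    """Item to write for one incoming message (low-level attribute format)."""
    return {
        "pk": {"S": restaurant_id},
        "sk": {"S": f"{user_id}:{timestamp}"},
        "messageBody": {"S": body},
        "timestamp": {"N": str(timestamp)},
    }


def by_restaurant(restaurant_id, newest_first=False):
    """All messages of a restaurant, ordered by timestamp via the LSI."""
    return {
        "TableName": TABLE,
        "IndexName": TS_INDEX,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": restaurant_id}},
        "ScanIndexForward": not newest_first,
    }


def by_restaurant_and_user(restaurant_id, user_id):
    """Messages one user sent to one restaurant, using the sort-key prefix."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": restaurant_id},
            # The trailing ':' keeps user "12" from also matching user "123".
            ":prefix": {"S": f"{user_id}:"},
        },
    }
```

With boto3, these dictionaries could be passed on as client.put_item(TableName=TABLE, Item=message_item(...)) and client.query(**by_restaurant(...)).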
However, this by itself does not allow you to query all messages sent by a user (to any restaurant, or to a specific restaurant). To do that, we can create a global secondary index (GSI) that uses the user ID (the first part of the table's sk, stored as its own attribute) as the GSI's partition key, and a synthetic range key that consists of the restaurant ID and timestamp separated by a ':'.
GSI structure
{
gsi_pk: <user Id>,
gsi_sk: "<restaurant Id>:<timestamp>",
messageBody : <message content>
}
messageBody is a non-key field projected onto the GSI.
The synthetic SK of the GSI helps make use of the different key matching modes that DynamoDB provides (less than, greater than, starts with, between).
This GSI allows us to answer the following questions:
Get all messages by a user (using only gsi_pk)
Get all messages by a user, sent to a particular restaurant, ordered by timestamp (gsi_pk = <user Id> and begins_with(gsi_sk, <restaurant Id>))
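To see how the synthetic range key behaves, here is a small in-memory emulation in Python using the sample data from the question. It does not talk to DynamoDB; the functions gsi_view and query_gsi are made up for this illustration and only mimic the key matching and ordering.

```python
# Sample rows from the question: (restaurant, user, timestamp, body).
ROWS = [
    ("4917123525993", "4916034325342", 1574290885, "Example Message 1"),
    ("4917123525993", "4917542358273", 1574291036, "Example Message 2"),
    ("4917123525993", "4916034325342", 1574291044, "Example Message 3"),
    ("4913423525123", "4916034325342", 1574291044, "Example Message 4"),
]


def gsi_view(rows):
    """Project each row onto the GSI key attributes."""
    return [
        {"gsi_pk": user, "gsi_sk": f"{restaurant}:{ts}", "messageBody": body}
        for restaurant, user, ts, body in rows
    ]


def query_gsi(view, user_id, restaurant_id=None):
    """Emulate a query on gsi_pk with an optional begins_with on gsi_sk."""
    prefix = "" if restaurant_id is None else f"{restaurant_id}:"
    hits = [
        item
        for item in view
        if item["gsi_pk"] == user_id and item["gsi_sk"].startswith(prefix)
    ]
    # Results come back ordered by the range key, compared as strings.
    return sorted(hits, key=lambda item: item["gsi_sk"])
```

Because the range key is compared as a string, ordering by timestamp inside one restaurant prefix only works while all timestamps have the same number of digits, which holds for epoch seconds in the sample data.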
The system has some duplication of data, but that is in line with one of the core ideas of DynamoDB and most NoSQL databases. I hope this helps!
Storing multiple messages in a single record has several issues:
The size of each write to the DB will grow over time (which translates into cost and response time; in the worst case you may end up hitting the 400 KB limit).
Race conditions between concurrent writes.
No way to aggregate messages by user or by other patterns.
And the worst part is that I don't see any benefit in storing multiple messages together (other than perhaps being able to query all of them at once, which becomes a drawback as the size grows: you cannot ask for just the last 10 reviews, you always have to fetch everything and then pick the last 10).
Hence, go for the option where each message is stored separately.
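The race condition can be illustrated with a deterministic Python sketch. It uses plain dictionaries rather than DynamoDB, and it assumes a naive read-modify-write of the whole item; atomic update expressions or conditional writes can reduce this risk, but the item-per-message design avoids it altogether.

```python
def nested_design():
    """Two writers read the same item, then each writes back its own copy."""
    store = {"4917123525993": []}
    seen_by_a = list(store["4917123525993"])  # writer A reads
    seen_by_b = list(store["4917123525993"])  # writer B reads the same state
    store["4917123525993"] = seen_by_a + ["Example Message 1"]
    store["4917123525993"] = seen_by_b + ["Example Message 2"]  # overwrites A
    return store["4917123525993"]


def item_per_message_design():
    """Each writer inserts its own item, so neither touches the other's data."""
    store = {}
    store[("4917123525993", 1574290885)] = "Example Message 1"
    store[("4917123525993", 1574291036)] = "Example Message 2"
    return [store[key] for key in sorted(store)]
```

In the nested design only the second message survives, while the item-per-message design keeps both.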

HTTP + REST. When should I divide GET request into 2 separate get requests with modes?

Let's say I have an endpoint for getting customers: /customers with different modes in the request query. For example: /customers?mode={type}, where type could be one of [employee, owner].
Let's say I have different logic for getting either an employee or an owner (for example, for owners I return only firstName and lastName, and for employees I also add their passportId; the exact fields don't matter). In addition, a different set of permissions is needed for accessing employees and owners.
The question is: at what point should a GET request be split into separate endpoints (for example /persons/employees and /persons/owners)? Should I even do this?

SurveyMonkey Email Collector Custom Values

In order to include information (like an order number) in a survey using an Email Collector, it's my understanding that this information needs to be stored in the Contact's custom variables. My concern is what happens if I am sending something like a customer satisfaction survey that needs to reference the order number, and the same customer (email address) places more than one order, and I have to send out more than one survey.
Will the custom values that are returned with the collectors/.../responses API call include the custom values at the time of the survey invite? Or will these be set to current values?
The custom values are stored on the response at the time the survey is taken. So if they change later, they will not change on the response. This will work fine as long as you don't send out another survey with new custom values to the same contact before they respond to the previous one.
Just an FYI, there is also an option to set extra_fields on a recipient when adding recipients to an email collector (rather than on the contact).
POST /v3/collectors/<collector_id>/messages/<message_id>/recipients
{
"email": "test@example.com",
"extra_fields": {
"field1": "value1",
"field2": "value2"
}
}
I don't believe that data is stored with the response, but the recipient_id is, and you can fetch the recipient by ID to get that data back.
Those are two options; you can see which one works best for you. The benefit of contact custom values is that you can view and edit them from the web, whereas extra_fields are API-only fields.
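For the order-number case from the question, the request for the extra_fields option might be put together like this. It is only a sketch that builds the path and JSON body shown above; the field name order_number and the helper names are made up, and values are sent as strings simply because the example above uses strings.

```python
import json


def recipients_path(collector_id, message_id):
    """Path of the recipients endpoint quoted above."""
    return f"/v3/collectors/{collector_id}/messages/{message_id}/recipients"


def recipient_body(email, order_number):
    """Request body with one extra field; 'order_number' is a made-up name."""
    return {"email": email, "extra_fields": {"order_number": str(order_number)}}


body = json.dumps(recipient_body("test@example.com", 1001))
```

Since the extra fields belong to the recipient rather than the contact, a second survey for another order from the same email address would get its own recipient with its own order number.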

What is the best architecture and table design for a chat application with the use of Cassandra and trigger

I want to use a trigger to push notifications to different users when a message is sent.
Agree with @Sabik about not using a trigger to deliver the messages; it would be better to handle that within your application.
If you want to store the chat history, you might try something like below.
Since the chat is the same for both users, we only want to store it once, so each chat gets an ID. We can have a separate table to store all the chats a user is involved in. This also allows extension to group chats with more than 2 users.
This is a quick example, but you might then look at enhancements such as ordering a user's chats by most recent message, etc.
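CREATE TABLE chatsPerUser (
"user1" uuid,
"chatID" uuid,
// "chatID" is a clustering column, so a user can have one row per chat
PRIMARY KEY (("user1"), "chatID")
);
// Add any other metadata you need
CREATE TABLE chat (
"chatID" uuid,
ts timestamp,
userID uuid,
message text,
// Messages of one chat share a partition and are ordered by ts
PRIMARY KEY (("chatID"), ts)
);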
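Handling delivery in the application rather than in a trigger could look roughly like the following Python sketch. It uses an in-memory stand-in for the two tables instead of a Cassandra driver, and the ChatStore class and the push callback are hypothetical names introduced for this example.

```python
from collections import defaultdict


class ChatStore:
    """In-memory stand-in for the chatsPerUser and chat tables."""

    def __init__(self):
        self.chats_per_user = defaultdict(set)  # user -> chat IDs
        self.chat = defaultdict(list)           # chat ID -> (ts, user, message)

    def join(self, user_id, chat_id):
        self.chats_per_user[user_id].add(chat_id)

    def participants(self, chat_id):
        return sorted(
            user for user, chats in self.chats_per_user.items() if chat_id in chats
        )

    def send(self, chat_id, sender_id, ts, message, push):
        """Store the message once, then push it from application code."""
        self.chat[chat_id].append((ts, sender_id, message))
        self.chat[chat_id].sort()  # keep the history ordered by timestamp
        for user_id in self.participants(chat_id):
            if user_id != sender_id:
                push(user_id, message)
```

Looking up participants by scanning chats_per_user is only acceptable in this toy version; in Cassandra that lookup would probably need its own table keyed by chat ID.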

Modeling hierarchical data with authentication using DynamoDB

I'm looking for some best practices when it comes to modeling confidential hierarchical data in general and specifically with DynamoDB.
The scenario is best explained with an example:
Let's say we have a number of users. Each user has a number of products. Each product consists of a number of parts.
Typical use cases:
List all products for a given user
List all parts for a given product
So far I have modeled this in DynamoDB like this:
Users
----------------
HashKey: UserId
Products
-------------------
HashKey: UserId
RangeKey: ProductId
Parts
-------------------
HashKey: ProductId
RangeKey: PartId
The data is confidential and accessed through authenticated REST endpoints where an authentication token can be mapped to a UserId. Each user may be allowed to view other users' data through some group concept.
Listing all products for a given user is simple since UserId is a key in the products table:
GET /users/111/products becomes a simple Query(Table=Products, UserId=111)
But consider the case of listing all parts for a given product:
GET /users/111/products/222/parts
If I simply do a Query(Table=Parts, ProductId=222) then I will get the desired data fast, but I am not protecting against other users querying for data belonging to user 111, provided they somehow know about ProductId 222 (in reality, IDs will of course be UUIDs or similar, so not easily guessable):
GET /users/119/products/222/parts
... would result in malicious user 119 retrieving data that doesn't belong to him, provided nothing is done to address this.
So here I imagine I need to do something like one of these:
1. First make another query to make sure product 222 in fact belongs to the given user
2. Duplicate the UserId in the Parts table and include it in the query condition (which basically means it will match either all rows or no rows when scanning through the set identified by ProductId): Query(Table=Parts, ProductId=222, UserId=111)
3. Use UserId as the hash key also in the Parts table and instead keep ProductId as a secondary index
4. Use a composite HashKey such as UserId_ProductId ("111_222") on the Parts table
If I need to return a 401 as opposed to just empty data, option 1 seems like the only approach. But if we imagine a deeper hierarchy of data, e.g. "users having inboxes having messages having parts having attachments" it seems this approach could eventually be expensive (listing all attachments for part P might result in a query to check that part P belongs to message M, that message M belongs to inbox I and that inbox I belongs to user U, and so on).
Does anyone have any good arguments for which approach is most favorable? Or am I doing something stupid and should be modeling my data in some other way completely?
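To make the difference between the first and the last of these options concrete, here is a minimal Python sketch with dictionaries standing in for the Products and Parts tables. All names and the sample part IDs are made up for illustration.

```python
# Hypothetical in-memory stand-ins for the Products and Parts tables.
PRODUCTS = {(111, 222)}                         # (UserId, ProductId) pairs
PARTS_BY_PRODUCT = {222: ["p1", "p2"]}          # current design: HashKey ProductId
PARTS_BY_COMPOSITE = {"111_222": ["p1", "p2"]}  # composite HashKey UserId_ProductId


def parts_with_ownership_check(user_id, product_id):
    """Extra lookup first, so the caller can tell 'forbidden' from 'empty'."""
    if (user_id, product_id) not in PRODUCTS:
        raise PermissionError(
            f"product {product_id} does not belong to user {user_id}"
        )
    return PARTS_BY_PRODUCT.get(product_id, [])


def parts_with_composite_key(user_id, product_id):
    """Single lookup; a wrong user simply matches no rows."""
    return PARTS_BY_COMPOSITE.get(f"{user_id}_{product_id}", [])
```

With the composite key, a request from user 119 for product 222 returns an empty list without any additional query, whereas the ownership check costs one more lookup but can report an explicit error.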