ScyllaDB mutual friend relation data modeling


I'm working on a mobile app with social features, and I'm currently trying to implement mutual friend relations with Scylla. I chose Scylla because the Friend Service will be a key feature with high TPS, since many other services need access to a user's friends, and friend relations are generally a good fit for NoSQL.
A mutual friend relation means that if user_1 accepts the friend request of user_2, then user_2 has user_1 as a friend and user_1 is automatically friends with user_2 as well.
My goal is to design a Schema that allows all my access patterns with minimal response time, so optimally one request for every access pattern. Less data duplication would be nice but not mandatory.
Access Patterns:
Get Friends of a user by the user's id
Get Incoming Friend Requests by user id
Get Relation between two users by specifying both user IDs
The query should allow us to conclude if the two users are friends, if one requested the other to be friends or if they are not friends.
Data Manipulation
Create Friend Request
Decline Friend Request
Accept Friend Request
This should result in a mutual friend relation between the requester and requested.
Remove Friend Relation
This should remove the friend relation for both users.
My Current Design
Here I will explain my current Schema and which access patterns it allows me to do and what could be improved.
This is my Schema: no data duplication, just one table with the Primary Key (user_id, accepted, friend_id).
CREATE TABLE IF NOT EXISTS friend_relations (
    user_id text,
    friend_id text,
    accepted boolean,
    requested_at timestamp,
    accepted_at timestamp,
    PRIMARY KEY (user_id, accepted, friend_id)
);
GetFriends
SELECT * FROM friend_relations WHERE user_id=? AND accepted=true;
GetIncomingFriendRequests
When a Friend Request is inserted, the receiver is the user_id, so it is only possible to get incoming Friend Requests, not outgoing ones.
SELECT * FROM friend_relations WHERE user_id=? AND accepted=false;
GetFriendRelation
This is my current pain point. It is easy to know whether user_1 is friends with user_2, since there would be a record for each of them saying the other user is their friend (because of the mutual friend relation). But when I want to know whether one user requested the other, I need a query like the one below:
SELECT * FROM friend_relations WHERE user_id IN (user_1,user_2) AND accepted=false AND friend_id IN (user_1,user_2);
This way I would know whether user_1 requested user_2 or the other way around.
I combined both queries into:
SELECT * FROM friend_relations WHERE user_id IN (user_1,user_2) AND accepted IN (true, false) AND friend_id IN (user_1,user_2);
The many IN relations give me some headaches, since I'm not sure whether they could cause unpredictable performance when the two IDs live in different partitions on different nodes.
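To make the intended outcome of GetFriendRelation concrete, here is a minimal Python sketch of how the rows returned by the combined query could be turned into one of the three answers (friends, requested, not friends). The function name classify_relation and the returned strings are hypothetical; the rows are assumed to be dict-like with the user_id, friend_id and accepted columns from the schema above.

```python
from typing import Iterable, Mapping


def classify_relation(rows: Iterable[Mapping], user_1: str, user_2: str) -> str:
    """Summarise the relation between two users from friend_relations rows.

    In a pending row, user_id is the receiver and friend_id is the requester,
    matching how friend requests are inserted in this design.
    """
    pair = {user_1, user_2}
    pending_requester = None
    for row in rows:
        # The double IN clause can also match self-pairs; skip those.
        if {row["user_id"], row["friend_id"]} != pair:
            continue
        if row["accepted"]:
            return "friends"
        pending_requester = row["friend_id"]
    if pending_requester is not None:
        return f"requested_by:{pending_requester}"
    return "none"
```

The same function would also work on the merged results of two single-partition queries (one per user_id), if the IN on the partition key turns out to be a problem.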
Data Manipulation
Create Friend Request
INSERT INTO friend_relations (user_id, friend_id, accepted, requested_at) VALUES (requested_user, requester, false, toTimestamp(now()));
Here I'm storing the requester as the friend_id, and the person who gets requested (requested_user) is the user_id. That way we can find all friend requests sent to a user, which is more important than finding the friend requests sent by a user.
Decline Friend Request
DELETE FROM friend_relations WHERE user_id=? AND accepted=false AND friend_id=?;
This is simple since the person who wants to decline is the user_id and the friend_id is the person who requested.
Accept Friend Request
This is more complicated since I need to batch three CQL statements:
1. Delete the friend request where the accepting user is the user_id, accepted=false and the friend_id is the requester. I can't update the relation in place since accepted is part of the primary key.
2. Insert the mutual friend relation by inserting two friend relations with accepted=true, switching the user_id and friend_id for one of the inserts. This way the other user will be found by the GetFriends method for both users.
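The three statements described for accepting a request could be assembled roughly like this. It is only a sketch: accept_request_batch is a hypothetical helper that returns (CQL, parameters) pairs, and it is up to the driver in use to wrap them in a logged batch. Note that the two inserts target different partitions, so this is a multi-partition batch, and requested_at would be lost unless it is read first and carried over.

```python
def accept_request_batch(accepting_user: str, requester: str):
    """Return (cql, params) pairs for accepting a friend request."""
    # 1. Remove the pending request (accepted=false is part of the key).
    delete_request = (
        "DELETE FROM friend_relations "
        "WHERE user_id = ? AND accepted = false AND friend_id = ?",
        (accepting_user, requester),
    )
    # 2. and 3. Insert the accepted relation once per direction.
    insert = (
        "INSERT INTO friend_relations (user_id, friend_id, accepted, accepted_at) "
        "VALUES (?, ?, true, toTimestamp(now()))"
    )
    return [
        delete_request,
        (insert, (accepting_user, requester)),
        (insert, (requester, accepting_user)),
    ]
```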
Remove Friend Relation
DELETE FROM friend_relations WHERE user_id IN (user_1,user_2) AND accepted=true AND friend_id IN (user_1,user_2);
Relatively easy: just delete the relation on both users' sides.
These are all the thoughts that went into designing my current Schema, but I'm still not completely convinced by it, since the very important GetRelation access pattern uses so many IN relations and accepting a Friend Request is quite complicated. I also find it hard to imagine a better Schema. One could include Materialized Views, perhaps to separate the friend requests from the friend relations, but then the GetRelation method would need to query the other table whenever the first one doesn't return a record (two network requests).

Related

Data modelling for dynamodb where entity has one to many and many to many relationships

I am new to the NoSQL world. I am building a serverless app with DynamoDB. In a relational DB, with three entities like post, post_likes and post_tags, I would have a few tables and use joins to fetch data. But I wonder how to structure this in NoSQL when a post has a one-to-many relationship with likes and a many-to-many relationship with tags.
Post model:
user_id <string>
attachment_url <string>
description <string>
public <boolean>
Like model:
user_id <string>
post_id <string>
type <string>
Tag model:
name <string>
I have a few access patterns:
Get all public posts
Get all posts filtered by a single tag and public status
Get all posts by user id
Get a single post by post id
Each time, a post should be fetched with its tag data and its like data, including the user data attached to each like.
In a relational DB I would create a post_tags table and fetch all posts by tag. But how can I do this with DynamoDB?
I am struggling to figure out what my table should look like and what to set as the partition and sort keys among post_id, user_id, tag_name and the public field in this case.
My initial thought was to build a table with entity that would look like this:
Partition key | Sort key | data attributes
tag_name | post_id | public | user_id | likes[] | other post attributes...
Then this table would look something like this:
I have set the 2 Global secondary indexes.
First Global secondary index:
partition key set to public and sort key to post_id
Second Global secondary index:
partition key set to user_id and sort key to post_id
That way, for each tag a post has, I would have a duplicate of that post in the table. I thought that by having the tag as the first filter, I could query posts efficiently when I need to query them by tag.
But if I query by just the public status or user_id, I would get all the duplicates of posts for each tag they belong to.
Or should I have 3 separate entities in the table (tags, posts and likes)? Then, to fetch posts by a tag, I would first do one query to find all post_ids for the tag, a second query to fetch the posts and their like IDs, and a third query to fetch the likes array.
I don't know what the best practice is for these things, since I have only just started using DynamoDB.
What should this DB structure look like?

You're off to a great start by thinking deeply about your access patterns and defining your entities (Posts, Users, Likes, etc). As you know, having a thorough understanding of your access patterns is critical to storing your data in DynamoDB.
While reviewing my answer, keep in mind that this is only one solution. DynamoDB gives you a ton of flexibility when defining your data model, which can be both a blessing and a curse! This answer is not meant to be the way to model these access patterns. Instead, it's one way that these access patterns can be implemented. Let's get into it!
I like to start by listing the entities we need to model, as well as the Primary key for each. Throughout this post, I'll be using composite primary keys, which are keys made up of a Partition Key (PK) and a Sort Key (SK). Let's start out with a blank table and fill it out as we go.
Entity | Partition Key | Sort Key
User | |
Post | |
Tag | |
Users
Users are central to your application, so I'll start there.
Let's start by defining a User model that lets us identify a User by ID. I'll use the pattern USER#<user_id> for the PK and SK of the User entity.
This supports the following access patterns (examples in pseudocode for simplicity):
Fetch User by ID
ddbClient.query(PK = USER#1, SK = USER#1)
I'll update the table with the new PK/SK pattern for Users
Entity | Partition Key | Sort Key
User | USER#<user_id> | USER#<user_id>
Post | |
Tag | |
Posts
I'll start modeling Posts by focusing on the one-to-many relationship between Users and their Posts.
You have an access pattern to fetch All Posts by UserId, so I'll start by adding the Post model to the User partition. I'll do this by defining a PK of USER#<user_id> and an SK of POST#<post_id>.
This supports the following access patterns:
Fetch User and all Posts
ddbClient.query(PK = USER#<user_id>)
Fetch User Posts
ddbClient.query(PK = USER#<user_id>, SK begins_with "POST#")
You may wonder about the odd-looking Post IDs. When fetching Posts, you'll probably want to get the most recent Posts first. You also want to be able to uniquely identify Posts by ID. When you have this sort of requirement, you can use a KSUID as your unique identifier. Explaining KSUIDs is a bit out of scope for your question, but know that they are unique and sortable by the time they were created. Since DynamoDB sorts results by the Sort Key, your query for a user's posts will automatically be sorted by creation date!
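To illustrate why a time-prefixed ID sorts by creation date, here is a simplified, KSUID-like sketch in Python. It is not a real KSUID implementation: real KSUIDs are base62-encoded, while this hypothetical sortable_id helper uses hex to keep the example short. The layout (a 4-byte big-endian timestamp counted from the KSUID epoch, followed by 16 random bytes) follows the KSUID format.

```python
import os
import struct

KSUID_EPOCH = 1_400_000_000  # KSUID timestamps count seconds from this Unix time


def sortable_id(unix_seconds: int) -> str:
    """Return a hex ID whose string order follows creation time."""
    # 4-byte big-endian timestamp first, so later IDs compare greater.
    timestamp = struct.pack(">I", unix_seconds - KSUID_EPOCH)
    # 16 random bytes keep IDs created in the same second unique.
    payload = os.urandom(16)
    return (timestamp + payload).hex()
```

Used as the suffix of POST#<post_id>, IDs like these make a plain query on the user's partition come back in creation order.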
Updating the PK/SK patterns for your application, we now have
Entity | Partition Key | Sort Key
User | USER#<user_id> | USER#<user_id>
Post | USER#<user_id> | POST#<post_id>
Tag | |
Tags
We have a few options on how to model the one-to-many relationship between Posts and Tags. You could include a list attribute on your Post item, which simply lists the tags on the item. This approach is perfectly fine. However, looking at your other access patterns, I'm going to take a different approach for now (it will be apparent why later).
I will model tags with a PK of POST#<post_id> and an SK of TAG#<tag_name>
Since Primary Keys are unique, modeling tags in this way will ensure that no Post is tagged with the same Tag twice. Additionally, it allows us to have an unbounded number of Tags on a Post.
Updating our PK/SK table for Tag, we have
Entity | Partition Key | Sort Key
User | USER#<user_id> | USER#<user_id>
Post | USER#<user_id> | POST#<post_id>
Tag | POST#<post_id> | TAG#<tag_name>
At this point we've modeled Users, Posts and Tags. However, we've only addressed one of your four access patterns. Let's see how we can use secondary indexes to support your access patterns.
Note: You could also model Likes in the exact same way.
Defining A Secondary Index
Secondary indexes allow you to support additional access patterns within your data. Let's define a very simple secondary index and see how it supports your various access patterns.
I'm going to create a secondary index that swaps the PK/SK patterns in your base table. This pattern is called an inverted index, and would look like this:
All we've done here is swapped the PK/SK pattern of your base table, which has given us access to two additional access patterns:
Fetch Post by ID
ddbClient.query(IndexName = InvertedIndex, PK = POST#<post_id>)
Fetch Posts by Tag
ddbClient.query(IndexName = InvertedIndex, PK = TAG#<tag_name>)
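To see what swapping PK and SK buys you, here is a small in-memory Python sketch (no DynamoDB involved). The sample items and the helper names query_base and query_inverted are made up for illustration; they only mimic a query on the base table and on the inverted index.

```python
# Sample items using the PK/SK patterns from the table above.
ITEMS = [
    {"PK": "USER#1", "SK": "USER#1"},
    {"PK": "USER#1", "SK": "POST#p1"},
    {"PK": "POST#p1", "SK": "TAG#cats"},
    {"PK": "POST#p2", "SK": "TAG#cats"},
]


def query_base(pk, sk_prefix=""):
    """Mimic a base-table query: exact PK, optional begins_with on SK."""
    return [i for i in ITEMS if i["PK"] == pk and i["SK"].startswith(sk_prefix)]


def query_inverted(pk, sk_prefix=""):
    """Mimic the inverted index: the base SK acts as the partition key."""
    return [i for i in ITEMS if i["SK"] == pk and i["PK"].startswith(sk_prefix)]
```

Here query_inverted("TAG#cats") returns the tag items of both posts, and query_inverted("POST#p1") returns the Post item stored in the user's partition.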
Fetch All Posts by Public/Private status
You wanted to fetch posts by public/private status, as well as fetching all Posts. One way to fetch all Posts is to put them in a single partition. We can put the public/private status in the sort key to separate the public and private Posts.
To do this, I'll create two new attributes on the Post item: _type and publicPostId. These fields will serve as the PK/SK patterns for the secondary index I'm calling PostByStatus.
After doing this, your base table would look like this:
and your new secondary index would look like this
This secondary index would enable the following access patterns
Fetch All Posts
ddbClient.query(IndexName = PostByStatus, PK = POST)
Fetch All Private Posts
ddbClient.query(IndexName = PostByStatus, PK = POST, SK begins_with "PRIVATE#")
Fetch All Public Posts
ddbClient.query(IndexName = PostByStatus, PK = POST, SK begins_with "PUBLIC#")
Remember, post IDs are KSUIDs, so they will naturally be sorted in your results by the date the Post was made.
A Word on Hot Partitions
Storing all your Posts in a single partition will likely result in a hot partition as your application scales. One way to address this is by distributing your Post items across multiple partitions. How you do that is entirely up to you and specific to your application.
One strategy to avoid the single POST partition could involve grouping Posts by creation day/week/month/etc. For example, instead of using POST as your PK in the PostByStatus secondary index, you could use POSTS#<month>-<year> instead, which would look like this:
Your application would need to take this pattern into account when fetching Posts (e.g. start at the current month and go backwards until enough results are fetched), but you'd be spreading the load across multiple partitions.
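The "start at the current month and go backwards" idea could look roughly like this in Python. The helper name month_partitions and the exact key format (unpadded month number, then year) are assumptions on my part; the application would query each yielded partition in turn until it has enough Posts.

```python
from datetime import date
from typing import Iterator


def month_partitions(start: date, count: int) -> Iterator[str]:
    """Yield PostByStatus partition keys from the start month backwards."""
    year, month = start.year, start.month
    for _ in range(count):
        yield f"POSTS#{month}-{year}"
        # Step back one month, rolling over into December of the prior year.
        month -= 1
        if month == 0:
            month, year = 12, year - 1
```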
Wrapping Up
I hope this exercise gives you some ideas on how to model your data to support specific access patterns. Data modeling in DynamoDB takes time to get right, and will likely require multiple iterations to make work for your specific application. It can be a steep learning curve, but the payoff is a solution that brings scale and speed to your application.

Is long API Url length a good practice?

I am working on an API for a Medicine Reminder System. I have four tables:
user
medicine
user_medicine (linking table of user and medicine)
medicine_reminder_time
user and medicine have a many-to-many relationship, so user_medicine serves as the linking table.
A user can have many medicines, and each medicine can have one or more reminders.
To support this, I have created an auto-increment id column in user_medicine (the linking table) and given it a one-to-many relationship with the medicine_reminder_time table.
For API users to do CRUD on medicine_reminder_time, I have made a route URL like this:
api/users/1/medicine/3/reminder
Is this a standard way to design API URLs, or should it be done differently? Please suggest.

How should I handle two different types of users in database?

I am building an e-commerce application with two different types of users: users who shop and vendors/brands.
Should I make a separate table for each type of user, like this?
User| id, email, password, username, address, stripeCustomerId
Brands| id, email, password, username/brandName, shippingRate, address, stripeAccessToken etc.
Or should I make it like so:
Users| id, email, password, username, address, stripeCustomerId
Brands| userid, etc...

This is an example of trying to model the object-oriented notion of inheritance in a relational database. If you search for that term, you'll find several answers on Stack Overflow.
In your case, I think you have 3 logical entities:
User: email, password, username, address...
Customer (is a type of user): StripeID
Vendor (is a type of user): shipping rate, stripe token
How you model those logical entities to physical objects in your database is mostly a question of trade-offs - the other answers explain those.
I assume there will be significant differences in both the behaviour and attributes between "customer" and "vendor".
I also assume your data model will evolve over time - for instance, you probably need to store more than one address for each user (shipping, billing), you probably have different lifecycles for "customers" (new, registered, registration confirmed, payment confirmed) and "vendors" (new, approved, rejected).
If those things are true, I'd just bite the bullet and have 2 tables, customer and vendor. This means you can evolve their behaviour more easily: you don't have to worry about needing slightly different address logic for "customer" and "vendor", you just build what you need. Your schema is also a little more self-explanatory, because your foreign keys go to tables that say what they do (products -> vendors, not products -> users).

It shouldn't be two tables, but three :D
1. users (id, name, password)
2. customers (user_id, customer_specific_fields)
3. vendors (user_id, vendor_specific_fields)
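A quick way to try the three-table layout is an in-memory SQLite database from Python. This is only a sketch: stripe_customer_id and shipping_rate stand in for the customer-specific and vendor-specific fields, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared login data lives in users; each subtype table reuses the user id.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, password TEXT NOT NULL);
CREATE TABLE customers (user_id INTEGER PRIMARY KEY REFERENCES users(id), stripe_customer_id TEXT);
CREATE TABLE vendors (user_id INTEGER PRIMARY KEY REFERENCES users(id), shipping_rate REAL);
INSERT INTO users VALUES (1, 'alice', 'hash1'), (2, 'acme', 'hash2');
INSERT INTO customers VALUES (1, 'cus_123');
INSERT INTO vendors VALUES (2, 4.5);
""")

# Vendor-specific data is one join away from the shared user row.
vendors = conn.execute(
    "SELECT u.name, v.shipping_rate FROM users u JOIN vendors v ON v.user_id = u.id"
).fetchall()
```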

Extend User authentication object in Azure Mobile Services

Is it possible to add additional properties to the User object on the server in WAMS? I would like to store the Id primary key of my User table for (secure) use in my table scripts. At the moment the only id is the vendor specific authentication Id, but I'd like to be able to allow users to choose an authentication method. Currently my (simplified) table design is as follows:
User table:
id
googleId
twitterId
facebookId
name, etc...
League table
id
userId
name, etc
I'd like to store the user primary key in the userId field on the league table, and then query it to ensure that users only get to see leagues they created. At the moment, the user object in table scripts sends through a User object with the Google/Twitter/Windows authentication token, and I have to do a query to get the primary key userID every time I want to carry out an operation on a table with a userId column.
Ideal solution would be that when the Insert script on my User table is called on registrations and logins I can do:
// PSEUDO CODE
function insert(item, user, request) {
    var appUserId;
    // Query the user table using the user.userId Google/Twitter/Facebook id
    if (user exists) {
        // Set a persisted appUserId to use in all subsequent table scripts.
        user.appUserId = results.id;
    } else {
        // Set the GooTwitFace columns on the user table, from user.userId
        // Insert the user, then get the inserted record id
        // Set a persisted appUserId to use in all subsequent table scripts
        user.appUserId = insertUserPK;
    }
}
Then, in subsequent table scripts, I'd like to use user.appUserId in queries

If all you are trying to do is authorize users to only have access to their own data, I'm not sure you even need the "user" table. Just use the provider-specific userId on the user object to query your "league" table (making sure the userId column is indexed). The values will be provider-specific, but that shouldn't make any difference.
If you are trying to maintain a notion of a single user identity across the user's Google/Facebook/Twitter logins, that's a more complicated problem where you would need a "user" table and the kind of lookup you are describing. We hope to ship support for this scenario as a feature out of the box. It is possible (but fairly messy) to do this yourself, let me know if that's what you're trying to do.

Facebook "like" data structure

I've been wondering how Facebook manages the database design for all the different things that you can "like". If there is only one thing to like, this is simple: just a foreign key to what you like and a foreign key to who you are.
But there must be hundreds of different tables that you can "like" on Facebook. How do they store the likes?

If you want to represent this sort of structure in a relational database, then you need to use a hierarchy normally referred to as table inheritance. In table inheritance, you have a single table that defines a parent type, then child tables whose primary keys are also foreign keys back to the parent.
Using the Facebook example, you might have something like this:
User
------------
UserId (PK)
Item
-------------
ItemId (PK)
ItemType (discriminator column)
OwnerId (FK to User)
Status
------------
ItemId (PK, FK to Item)
StatusText
RelationshipUpdate
------------------
ItemId (PK, FK to Item)
RelationshipStatus
RelationTo (FK to User)
Like
------------
OwnerId (FK to User)
ItemId (FK to Item)
Compound PK of OwnerId, ItemId
In the interest of completeness, it's worth noting that Facebook doesn't use an RDBMS for this sort of thing. They have opted for a NoSQL solution for this sort of storage. However, this is one way of storing such loosely-coupled information within an RDBMS.

Facebook does not have traditional foreign keys and such, as they don't use relational databases for most of their data storage. Simply, they don't cut it for that.
However, they use several NoSQL-type data stores. The "Like" is most likely attributed through a service, probably set up in an SOA style throughout their infrastructure. This way the "Like" can be attributed to basically anything they want it to be associated with, all with vast scalability and no tightly coupled relational issues to deal with, something that Facebook can't really afford to deal with at the volume they operate.
They could also be using an AOP (Aspect Oriented Programming) style processing mechanism to "attach" a "Like" to anything that may need one at page rendering time, but I get the notion that it is asynchronous processing via JavaScript against an SOA style web service or other delivery mechanism.
Either way, I'd love to hear how they have this setup from an architecture perspective myself. Considering their volume, even the simple "Like" button becomes a significant implementation of technology.

You can have a table with Id, ForeignId and Type. Type can be anything like Photo, Status, Event, etc. ForeignId is the id of the record in the table named by Type. This works for both comments and likes: you only need one table for all likes, one for all comments, and the Items table described here.
Example:
Items
Id | Foreign Id | Type
----+-------------+--------
1 | 322 | Photo
4 | 346 | Status
Likes
Id | User Id | Item Id
----+-------------+--------
1 | 111 | 1
Here, user with Id 111 likes the photo with Id 322.
Note: I assume you are using an RDBMS, but see Adron's answer. Facebook does not use an RDBMS for most of their data.
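The Items/Likes example can be tried out with an in-memory SQLite database from Python. This is just a sketch using the sample rows from the answer; the snake_case column names are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- items maps a generic id to a (type, foreign_id) pair; likes only point at items.
CREATE TABLE items (id INTEGER PRIMARY KEY, foreign_id INTEGER NOT NULL, type TEXT NOT NULL);
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                    item_id INTEGER NOT NULL REFERENCES items(id));
INSERT INTO items VALUES (1, 322, 'Photo'), (4, 346, 'Status');
INSERT INTO likes VALUES (1, 111, 1);
""")

# Resolve what user 111 likes: the item's type and the id in that type's table.
liked = conn.execute(
    "SELECT i.type, i.foreign_id FROM likes l JOIN items i ON i.id = l.item_id "
    "WHERE l.user_id = ?",
    (111,),
).fetchall()
```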

I'm pretty sure Facebook does not store "like" information in an RDBMS the way some other answers suggest. With millions of users and possibly thousands of likes per item, we're looking at thousands of rows to join, which would impact performance.
The best approach here is to append all "likes" in a single row. For example, use a table with a user_like_id column of text datatype, and append the IDs of all users who liked the post. In this case, you only query one row and you get everything. This will be a lot faster than joining tables and getting counts.
EDIT: I haven't been here on this site lately and I just discovered this answer has been downvoted. Well, here's an example post with like count and their avatars. This is my design where I just implemented what I'm talking about.
The two components here are 1.) XREF table and 2.) JSON object.
The likes are still stored on a XREF table. But at the same time, data is appended on JSON object and stored on a text column on the post table.
Why did I store the likes info in a text column as JSON? So that there's no need to do DB lookups/joins for the likes. If someone unlikes the post, the JSON object is just updated.
Now I don't know why this answer was downvoted by some users here. This answer provides quick data retrieval. It is close to the NoSQL approach, which is how FB accesses data. In this case, there's no need for extra joins/lookups to get the likes info.
And here's the table that holds the likes. It's just a simple XREF mapping between user and item table.
