Entity Framework, Junction Tables with Timestamps

I'd like to know if there is a good way to work timestamps into junction tables using Entity Framework (4.0). An example would be ...
Clients
--------
ID   | uniqueidentifier
Name | varchar(64)

Products
--------
ID   | uniqueidentifier
Name | varchar(64)

Purchases
---------
Client  | uniqueidentifier
Product | uniqueidentifier
This works smoothly for joining the two together, but I'd like to add a timestamp. Whenever I do that, I'm forced to go through the middle table in my code. I don't think I can add the timestamp field to the junction table, but is there a different method that might be usable?

Well, your question says it all. You must either have a middle "Purchases" entity or not have the timestamp on Purchases. Actually, you can have the column on the table if you don't map it, but if you want it on your entity model then those are the only two choices.
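For illustration, here is a sketch of the junction table with the timestamp added (the PurchasedAt name and the default are my assumptions, T-SQL flavored). Once Purchases carries any mapped column beyond the two foreign keys, EF no longer treats it as a pure many-to-many association and generates the middle entity:

CREATE TABLE Purchases
(
    Client      uniqueidentifier NOT NULL REFERENCES Clients (ID),
    Product     uniqueidentifier NOT NULL REFERENCES Products (ID),
    PurchasedAt datetime         NOT NULL DEFAULT GETDATE(),  -- the payload column
    PRIMARY KEY (Client, Product)
)

If you leave PurchasedAt unmapped and let the database default populate it, the many-to-many mapping survives and the timestamp is still recorded.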

Related

Implement revision number while keeping primary key in sql

Suppose I have a PostgreSQL table with a primary key and some data:
pkey | price
----------------------+-------
0075QlyLvw8bi7q6XJo7 | 20
(1 row)
However, I would like to save historical updates to it without losing the functionality that comes from referencing its key in other tables as foreign keys.
I am thinking of doing some kind of revision_number + timestamp approach where each "update" would be a new row, example:
pkey | price | rev_no
----------------------+-------+--------
0075QlyLvw8bi7q6XJo7 | 20 | 0
----------------------+-------+--------
0075QlyLvw8bi7q6XJo7 | 15 | 1
(2 rows)
Then create a view that always takes the highest revision number of the table and reference keys from that view.
However, to me this workaround seems a bit too heavy for a task that in my opinion should be fairly common. Is there something I'm missing? Do you have a better solution, or is there a well-known paradigm for these kinds of problems that I don't know about?
Assuming pkey is actually the defined primary key, you cannot do the revision scheme you outlined without creating a history table and moving old data into it: the primary key must be unique across revisions. But if you have a properly normalized table there are several valid methods; the following is one:
1) Review the other attributes and identify a candidate business key (a column of business meaning that could be defined unique, perhaps the item name).
2) If not already present, add two columns: an effective timestamp and a superseded timestamp.
3) Now create a partial unique index on the column identified in step 1, filtered on the superseded timestamp being null, meaning the row is the currently active version.
4) Create a simple view as SELECT * FROM table. Since this is a simple view it is fully updatable. Use this view for SELECT, INSERT, and DELETE, but for UPDATE create an INSTEAD OF trigger. This trigger sets the superseded timestamp on the currently active row and inserts a new row with the update applied and the revision number incremented.
With the above you keep a unique key on the currently active revision, and you maintain the history of all relationships at each version. (See the demo, including a couple of useful functions; a sketch of the steps follows below.)
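A minimal sketch of that method in PostgreSQL, assuming a hypothetical item table with name as the business key (all table, column, and trigger names here are illustrative):

CREATE TABLE item_history
( pkey       text        NOT NULL
, name       text        NOT NULL                -- business key from step 1
, price      numeric     NOT NULL
, rev_no     integer     NOT NULL DEFAULT 0
, effective  timestamptz NOT NULL DEFAULT now()  -- step 2
, superseded timestamptz                         -- null = currently active row
, PRIMARY KEY (pkey, rev_no)
);

-- Step 3: at most one active (not yet superseded) row per business key.
CREATE UNIQUE INDEX item_active_uq
    ON item_history (name)
 WHERE superseded IS NULL;

-- Step 4: a simple, fully updatable view ...
CREATE VIEW item AS SELECT * FROM item_history;

-- ... whose UPDATEs close the active row and insert the next revision.
CREATE FUNCTION item_update() RETURNS trigger AS $$
BEGIN
  UPDATE item_history
     SET superseded = now()
   WHERE pkey = OLD.pkey
     AND superseded IS NULL;

  INSERT INTO item_history (pkey, name, price, rev_no)
  VALUES (OLD.pkey, NEW.name, NEW.price, OLD.rev_no + 1);

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_update_trg
INSTEAD OF UPDATE ON item
FOR EACH ROW EXECUTE FUNCTION item_update();  -- PostgreSQL 11+; EXECUTE PROCEDURE before that

After this, UPDATE item SET price = 15 WHERE pkey = '0075QlyLvw8bi7q6XJo7' supersedes revision 0 and leaves revision 1 as the single active row.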

Relational table to DynamoDB

I worked with relational databases for a long time, and now I am going to work with DynamoDB. Coming from relational databases, I am struggling to design some of our current SQL tables in DynamoDB, especially deciding on partition and sort keys. I will try to explain with an example:
Current Tables:
Student: StudentId(PK), Email, First name, Last name, Password, SchoolId(FK)
School: SchoolId(PK), Name, Description
I was thinking of merging these tables in DynamoDB and using SchoolId as the partition key and StudentId as the sort key. However, I saw some similar examples use StudentId as the partition key.
And then I realized that we use "username" in every login flow, so the application will query by "username" (sometimes with a password, or an auth token) a lot. This makes me consider SchoolId as the partition key and Username as the sort key.
I need some ideas about what the best practice would be in this case, and some suggestions to give me a better understanding of NoSQL and DynamoDB concepts.
In NoSQL you should list all your use cases first and then model the table schema around them.
Below are the use cases that I see in your application:
Get user info for one user with userId (password, age, name, ...)
Get school info for a user with userId (className, schoolName)
Get all the students in one school.
Get all the students in one class of one school.
Based on these access patterns, this is how I would design the schema:
| pk    | sk         | GSI1 PK | GSI1 SK    | attributes                                  |
|:-----:|:----------:|:-------:|:----------:|:--------------------------------------------|
| 12345 | metadata   |         |            | Age: 13, Last name: Singh, Name: Rohan, ... |
| 12345 | schoolMeta | DPS     | DPS#class5 | SchoolName: DPS, className: 5               |
With the above schema you can solve the identified use cases as follows:
Get user info for one user with userId:
Select * where pk=userId and sk=metadata
Get school info for a user with userId:
Select * where pk=userId and sk=schoolMeta
Get all the students in one school:
Select * where pk=SchoolId from table=GSI1
Get all the students in one class:
Select * where pk=SchoolId and sk startswith SchoolId#className from table=GSI1
The given schema does suffer from one drawback: if you want to change a school's name, you will have to update many rows.
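To make those last two queries concrete, this is roughly how they could be written with DynamoDB's PartiQL support (the Students table name and the GSI1PK/GSI1SK attribute names are my assumptions):

-- All students in school DPS, via the GSI:
SELECT * FROM "Students"."GSI1"
WHERE "GSI1PK" = 'DPS';

-- All students in class 5 of school DPS:
SELECT * FROM "Students"."GSI1"
WHERE "GSI1PK" = 'DPS'
  AND begins_with("GSI1SK", 'DPS#class5');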

Many to Many with thousands of reference items

I currently have a SQL Server database with a table containing 400,000 movies. I have another table containing thousands of users.
CREATE TABLE [movie].[Header]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[SourceId] [int] NOT NULL,
[ReleaseDate] [Date] NOT NULL,
[Title] [nvarchar](500) NOT NULL
)
CREATE TABLE [account].[Registration]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Username] [varchar](50) NOT NULL,
[PasswordHash] [varchar](1000) NOT NULL,
[Email] [varchar](100) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[UpdatedAt] [datetime] NOT NULL
)
CREATE TABLE [movie].[Likes]
(
[Id] [uniqueidentifier] NOT NULL,
[HeaderId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[CreatedAt] [datetime] NOT NULL
)
CREATE TABLE [movie].[Dislikes]
(
[Id] [uniqueidentifier] NOT NULL,
[HeaderId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[CreatedAt] [datetime] NOT NULL
)
Each user is shown 100 movies starting from two weeks into the future. They can then perform an action such as like, dislike, recommend, etc.
I'm in the process of moving the entire application into a serverless architecture. I have the APIs running in AWS via Lambda + API Gateway and now I'm looking at using DynamoDB for the database. I don't think I have anything super crazy that would prevent me from storing the data in Dynamo and their pricing/consumption model seems like it would be substantially cheaper than SQL Server (currently hosted in Azure).
The one thing I'm having issues with is understanding how I would model the users performing an action on a movie. If they "like" a movie, it goes into a likes list that they can go back and visit. There, I present them with the entire movie record (which actually consists of more data such as cast/crew/ratings etc.; I just truncated the table to simplify it). If I stored each "like" as an item in Dynamo, along with the entire movie as an attribute, I'd think that the user's document would get very large.
I also need to continue to show users movies, starting two weeks out, that they have not performed any actions on. Movies that they have performed actions on I need to remove from the query. Today I'm just joining the movies table and the user actions table, removing movies from the query that already exist in the user's actions table. How would I model this in NoSQL with the same end result?
I can consolidate the likes/dislikes into a single document with an action type attribute (representing like/dislike etc.) and an array of movies that the action has been performed on. I'm still not sure how I would go about filtering the [Header] query so that the movies in the user's document don't come back.
I figured I would set my movies hash key to the release date for sharding, since there are roughly 10 movies per release date on average. That gives a nice distribution. I figured I'd use the userid as the hash key for the document containing all of the movies that a user has performed an action on; not sure if that's the right path though.
I've never dealt with NoSQL, so I wanted to ask for input. I am not sure how best to design something that is essentially one-to-many, but with the potential for the movies-per-user to be in the tens of thousands.
So, based on your comments I am going to throw in a suggestion. That doesn't mean it's the right answer; I could be wrong or missing a point.
First of all, please read every segment of the Best Practices over and over again. There are patterns you might never have thought of that are still possible with a NoSQL approach. It's very helpful and educational (considering you say you are new to NoSQL). There are similarities to your case, and you might build your own answer based on the best practices.
What I can suggest is:
NoSQL is very bad at querying for "not existing". A big trick of NoSQL is that it knows exactly where to find the data you are looking for, not where not to find it. So it's a bit hard to find the movies a user hasn't performed any action on yet. If you can use a side DB such as Redis you can pull this off very easily: with Redis data structures you can query what a user hasn't liked/disliked yet and get the rest of the movie data from DynamoDB. But let's put the Redis side database aside for now and go with a DynamoDB-only approach.
One approach could be that when each new movie arrives in the DB, you add it to every user with the action type not-actioned-yet. Now for all users you can query these very easily and very fast. (Now it knows where the data is ;) ) But this isn't right, because if there are 10,000 users then every new movie costs 10,000 writes.
Another approach: imagine an item in a table that holds the date of the user's last "get list of not-yet-actioned" query. When the user comes back for the same query, you read that date and fetch all the movies added to your DB after it; with datetimes as sort keys you can query movies starting from that date. Say 10 movies were added after the user's last query: these are definitely movies the user hasn't actioned yet. You then add those 10 movies to the table as not-actioned-yet items ('not-actioned-yet' is an action type just like 'like' or 'dislike'). After this you have all the movies the user hasn't actioned yet, and from now on you can query them easily.
Example table structures:
You can use either sparse indexes or a time-series table approach to separate new movies (those in the next 2 weeks) from the others; this way you query or scan only them, efficiently. Going with sparse indexes here:
Movies table
| Id (Hash Key/Primary Key) | StartingDateUnix (GSI SK) | IsIn2Weeks (GSI) |
|:-------------------------:|--------------------------:|:----------------:|
| MovieId1                  | 1234567                   | 1                |
| MovieId2                  | 1234568                   | 1                |
| MovieId3                  | 001123                    | null             |
To get movies added after Unix time 1234567, you query the GSI with a sort key greater than that Unix time.
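As a sketch, that query could be written with DynamoDB's PartiQL support like this (the Movies table name and the NewReleases index name are assumptions):

SELECT * FROM "Movies"."NewReleases"
WHERE "IsIn2Weeks" = 1
  AND "StartingDateUnix" > 1234567;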
User Actions Table
| UserId (Hash Key) | ActionType_ForMovie(Sort Key) | CreatedAt (LSI) |
|:-----------------:|:-----------------------------:|:---------------:|
| UserId1 | no-action::MovieId1 | 1234567 |
| UserId1 | no-action::MovieId2 | 1234568 |
| UserId1 | like::MovieId3 | 1234569 |
| UserId1 | like::MovieId4 | 1234561 |
| UserId1 | dislike::MovieId5 | 1234562 |
Using sort keys you can query for all the likes, dislikes, not-yet-actioned items, ... and you can sort them by date. You can also paginate.
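For example, with DynamoDB's PartiQL support those sort-key queries could look like this (the UserActions table name is an assumption):

-- Everything UserId1 has liked:
SELECT * FROM "UserActions"
WHERE "UserId" = 'UserId1'
  AND begins_with("ActionType_ForMovie", 'like::');

-- Everything UserId1 has not acted on yet:
SELECT * FROM "UserActions"
WHERE "UserId" = 'UserId1'
  AND begins_with("ActionType_ForMovie", 'no-action::');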
I have spent some time on this problem because it's also a good challenge for me, and I would appreciate feedback. Hope it helps in some way.

In PostgreSQL, is there a way to restrict a column's values to be an enum?

In PostgreSQL, I can create a table documenting which type of vehicle people have.
CREATE TABLE IF NOT EXISTS person_vehicle_type
( id SERIAL NOT NULL PRIMARY KEY
, name TEXT NOT NULL
, vehicle_type TEXT
);
This table might have values such as
id | name | vehicle_type
----+---------+---------
1 | Joe | sedan
2 | Sue | truck
3 | Larry | motorcycle
4 | Mary | sedan
5 | John | truck
6 | Crystal | motorcycle
7 | Matt | sedan
The values in the vehicle_type column are restricted to the set {sedan, truck, motorcycle}.
Is there a way to formalize this restriction in postgresql?
Personally, I would use a foreign key and a lookup table.
That said, you could use enums. I recommend reading the article PostgreSQL Domain Integrity In Depth:
A few RDBMSes (PostgreSQL and MySQL) have a special enum type that
ensures a variable or column must be one of a certain list of values.
This is also enforceable with custom domains.
However the problem is technically best thought of as referential
integrity rather than domain integrity, and usually best enforced with
foreign keys and a reference table. Putting values in a regular
reference table rather than storing them in the schema treats those
values as first-class data. Modifying the set of possible values can
then be performed with DML (data manipulation language) rather than
DDL (data definition language)....
However when the possible enumerated values are very unlikely to
change, then using the enum type provides a few minor advantages.
Enum values have human-readable names but internally they are simple integers. They don't take much storage space. To compete with
this efficiency using a reference table would require using an
artificial integer key, rather than a natural primary key of the value
description. Even then the enum does not require any foreign key
validation or join query overhead.
Enums and domains are enforced everywhere, even in stored procedure arguments, whereas lookup table values are not. Reference
table enumerations are enforced with foreign keys, which apply only to
rows in a table.
The enum type defines an automatic (but customizable) order relation:
CREATE TYPE log_level AS ENUM ('notice', 'warning', 'error', 'severe');
CREATE TABLE log(i SERIAL, level log_level);
INSERT INTO log(level)
VALUES ('notice'::log_level), ('error'::log_level), ('severe'::log_level);
SELECT * FROM log WHERE level >= 'warning';
DBFiddle Demo
Drawback:
Unlike a restriction of values enforced by a foreign key, there is no way to delete a value from an existing enum type. The only workarounds are messing with system tables or renaming the enum, recreating it with the desired values, and then altering tables to use the replacement enum. Not pretty.
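For reference, a minimal sketch of the foreign key + lookup table approach mentioned at the top (the vehicle_type lookup table name is mine):

CREATE TABLE vehicle_type
( name TEXT NOT NULL PRIMARY KEY );

INSERT INTO vehicle_type (name)
VALUES ('sedan'), ('truck'), ('motorcycle');

CREATE TABLE IF NOT EXISTS person_vehicle_type
( id           SERIAL NOT NULL PRIMARY KEY
, name         TEXT   NOT NULL
, vehicle_type TEXT   REFERENCES vehicle_type (name)  -- only listed values allowed
);

-- Extending the set of allowed values is plain DML, not DDL:
INSERT INTO vehicle_type (name) VALUES ('bicycle');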

User profile database design

I have to design user account/profile tables for a university project. The basic idea I have is the following:
a table for the user account (email, username, pwd, and a bunch of other fields)
a user profile table.
It seems to me that there are two ways to model a user profile table:
put all the fields in a table
[UserProfileTable]
UserAccountID (FK)
UserProfileID (PK)
DOB Date
Gender (the id of another table which lists the possible genders)
Hobby varchar(200)
SmallBio varchar(200)
Interests varchar(200)
...
Put the common fields in a table and design a ProfileFieldName table that lists all the fields that we want. For example:
[ProfileFieldNameTable]
ProfileFieldID int (PK)
Name varchar
Name will be 'hobby', 'bio', 'interests', etc. Finally, we will have a table that associates profiles with profile fields:
[ProfileFieldTable]
ProfileFieldID int (PK, FK)
UserProfileID int (PK, FK)
FieldContent varchar
'FieldContent' will store a small text about hobbies, the bio of the user, his interests and so on.
This way is extensible, meaning that adding more fields corresponds to a simple INSERT.
What do you think about this schema?
One drawback is that to gather all the profile information of a single user I now have to do a join.
The second drawback is that the field 'FieldContent' is of type varchar. What if I want it to be of another type (int, float, a date, a FK to another table for list boxes, etc.)?
I suggest the second option is better.
The drawbacks you mention are not actual drawbacks:
1) Using JOINs is a normal way of retrieving data from one or more tables.
2) 'FieldContent' is of type varchar: I understand you are going to use a single 'FieldContent' column for all your fields. In that case, I suggest having a content column per data type, so that each field can use whichever data type you wish.
Coming to the first option:
1) Putting all the fields in one table may lead to a lot of confusion and leaves less room to extend later if requirements change.
2) There will be a lot of redundancy as well.
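To make the recommendation concrete, here is a rough PostgreSQL-flavored sketch of option 2 (the UserProfile stub and the sample join are mine; the question leaves the DBMS unspecified):

CREATE TABLE UserProfile
( UserProfileID SERIAL PRIMARY KEY
  -- DOB, Gender, and the other common fields would live here
);

CREATE TABLE ProfileFieldName
( ProfileFieldID SERIAL PRIMARY KEY
, Name           VARCHAR(50) NOT NULL UNIQUE  -- 'hobby', 'bio', 'interests', ...
);

CREATE TABLE ProfileField
( ProfileFieldID INT NOT NULL REFERENCES ProfileFieldName (ProfileFieldID)
, UserProfileID  INT NOT NULL REFERENCES UserProfile (UserProfileID)
, FieldContent   VARCHAR(200)
, PRIMARY KEY (ProfileFieldID, UserProfileID)
);

-- Adding a new profile field later is an INSERT, not a schema change:
INSERT INTO ProfileFieldName (Name) VALUES ('favourite_books');

-- Gathering all profile fields of one user takes a single join:
SELECT fn.Name, pf.FieldContent
  FROM ProfileField pf
  JOIN ProfileFieldName fn ON fn.ProfileFieldID = pf.ProfileFieldID
 WHERE pf.UserProfileID = 42;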