SQL server with row level security and entity framework - entity-framework

I have a database (MS SQL Server) where row-level security is used for the tables. Will there be any complications if I want to use that with Entity Framework? For example, assume I have an n:n relation between students and courses like this:
Student:
Bob
Alice
Course:
Math
History
Student_Course:
Bob | Math
Bob | History
Alice | Math
Alice | History
"Course" table has row level access and I am not allowed to see the row "History". So what will happen if I have fetched Bob from the "Student" table and then do something like:
var bobsCourses = bobTheStudent.Courses
Is the framework smart enough to give me [Math]?
Do I get [Math, null]?
Or will everything crash and burn because the entry in "Student_Course" does not have a match in "Course"?
The database already exists but the application does not, so this is "database first".
The application will only be used for presenting data, so there is no need to care about writing any updates to the database.

It would probably break unless you applied the RLS predicate to the Student_Course table as well. RLS doesn't "cascade" in SQL Server.

I did some testing, and it seems to work like this:
For n:n relations there is no problem, the example works fine and the result will be [Math].
It's a little bit more complicated for 1:1 relations, e.g. assume we have these tables:
Student: ### Table with students + main subject foreign key
Bob | Math
Alice | History
Subject: ### Subjects
Math ### This row is private
History
Then:
AllStudents
will give me [Bob, Alice]
and
BobTheStudent.MainSubject
will simply be null.
But then comes the scary stuff...
AllStudents.Include("MainSubject")
will give me [Alice].
I guess that's because in that case an SQL inner join will be used to retrieve the data from both tables. (Probably it will be different if null values were allowed for the foreign key, but I did not test that.)
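The inner-join effect can be reproduced outside Entity Framework. The sketch below is only an approximation: it uses Python's built-in sqlite3 module, and it imitates the row-level security filter with a view called visible_subject (a made-up name) that hides the private row. It does not show the SQL that Entity Framework actually generates; it only illustrates why an inner join drops Bob while a left join keeps him with a null subject.

```python
import sqlite3

def build_db():
    """Create the Student/Subject example; 'Math' plays the private row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subject (
            name       TEXT PRIMARY KEY,
            is_private INTEGER NOT NULL
        );
        CREATE TABLE student (
            name         TEXT PRIMARY KEY,
            main_subject TEXT NOT NULL REFERENCES subject(name)
        );
        INSERT INTO subject VALUES ('Math', 1), ('History', 0);
        INSERT INTO student VALUES ('Bob', 'Math'), ('Alice', 'History');
        -- Stand-in for the RLS filter predicate: private rows are invisible.
        CREATE VIEW visible_subject AS
            SELECT name FROM subject WHERE is_private = 0;
    """)
    return conn

def students_inner_join(conn):
    """Students joined to the subjects the caller is allowed to see."""
    return conn.execute("""
        SELECT s.name, v.name
        FROM student AS s
        JOIN visible_subject AS v ON v.name = s.main_subject
        ORDER BY s.name
    """).fetchall()

def students_left_join(conn):
    """Same query, but students with a hidden subject are kept."""
    return conn.execute("""
        SELECT s.name, v.name
        FROM student AS s
        LEFT JOIN visible_subject AS v ON v.name = s.main_subject
        ORDER BY s.name
    """).fetchall()
```

With the inner join only Alice comes back; with the left join Bob is returned too, paired with None.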

Related

updating and modeling nosql record

So in a traditional database I might have 2 tables like users, company
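users
id | username | companyid | email
---+----------+-----------+--------------------
1  | j23      | 1         | something#gmail.com
2  | fj222    | 1         | james#aol.com
company
id | ownerid | company_name
---+---------+------------------------
1  | 1       | A Really boring company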
This is to say that users 1 and 2 are part of company 1 (a really boring company) and user 1 is the owner of this company.
I could easily issue an update statement in MySQL or PostgreSQL to update the company name.
But how could I model the same data from a NoSQL perspective, in something like DynamoDB or MongoDB?
Would each user record (document in NoSQL) contain the same company table data (id, ownerid or an is-owner true/false flag, and company name)? I'm unclear how to update the record for all users containing this data if the company name needed to be updated.
If you want to save the company object as JSON in each user record (for performance reasons), then indeed, you have to update a lot of rows.
But the best way to achieve this is to keep a structure similar to the one you have above in MySQL. A NoSQL schema depends a lot on the queries you will be making.
For example, the schema above is great for:
Find a particular user by username, along with his company name. First you need to query User by username (you can add an index), get the companyId and do another query on Company to fetch the name.
Let's assume the company name changes often
In this case updating the company name is easy. To execute the read, you need 2 queries to get your result (but they should execute fast)
Embedded company JSON would work better for:
Find all users from a specific city and show their company name
Let's assume company name changes very rarely
In this case, the "relational" approach does not work well, because we would run 1 query to fetch Users by city and then another query for each user found to fetch the company name
Using the embedded approach, we need only 1 query
To update a company name, a full (expensive) scan is needed, but that should be ok if done rarely
What if the company name changes often and I want to get users by city?
This becomes tricky. NoSQL is not a replacement for SQL; it has its shortcomings. The solution may be a platform-dependent feature (from Mongo, DynamoDB, Firestore etc.), an additional layer on top (Elasticsearch) or no solution at all (consider not using key-value NoSQL)
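The trade-off between the two layouts can be sketched with plain Python dictionaries standing in for documents. Nothing here talks to a real database; the city field and all function names are made up for illustration, and the returned numbers simply count how many documents an update has to touch.

```python
def make_referenced():
    """Users reference the company by id, as in the relational layout."""
    companies = {1: {"ownerid": 1, "company_name": "A Really boring company"}}
    users = [
        {"id": 1, "username": "j23", "companyid": 1, "city": "Oslo"},
        {"id": 2, "username": "fj222", "companyid": 1, "city": "Oslo"},
    ]
    return users, companies

def make_embedded():
    """Each user carries its own copy of the company data."""
    users, companies = make_referenced()
    return [
        {"id": u["id"], "username": u["username"], "city": u["city"],
         "company": {"id": u["companyid"], **companies[u["companyid"]]}}
        for u in users
    ]

def rename_referenced(companies, company_id, new_name):
    """One write, no matter how many users belong to the company."""
    companies[company_id]["company_name"] = new_name
    return 1

def rename_embedded(users, company_id, new_name):
    """Full scan: every user document holding the company is rewritten."""
    touched = 0
    for user in users:
        if user["company"]["id"] == company_id:
            user["company"]["company_name"] = new_name
            touched += 1
    return touched

def users_by_city_embedded(users, city):
    """Single pass over users; no second lookup for the company name."""
    return [(u["username"], u["company"]["company_name"])
            for u in users if u["city"] == city]
```

The referenced layout makes the rename a single write but needs a second lookup per read; the embedded layout reads in one pass but rewrites every affected user on a rename.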
Depending on the programming language used to handle NoSQL objects/documents, you have a variety of ORM libraries to model your schema. E.g. for MongoDB plus JS/TypeScript I recommend Mongoose and its subdocuments. Here is more about it:
https://mongoosejs.com/docs/subdocs.html

DDD: modelling aggregate entities' unique global/local id in PostgreSQL

I have read Eric Evans' Domain Driven Design book and I have been trying to apply some of the concepts.
In his book, Eric talks about aggregates and how aggregate roots should have a unique global id whereas aggregate members should have a unique local id. I have been trying to apply that concept to my database tables and I'm running into some issues.
I have two tables in my PostgreSQL database: facilities and employees where employees can be assigned to a single facility.
In the past, I would lay out the employees table as follows:
CREATE TABLE "employees" (
"employeeid" serial NOT NULL PRIMARY KEY,
"facilityid" integer NOT NULL,
...
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is a globally unique id. I would then add code in the backend for access control validation, preventing users of one facility from accessing rows pertaining to other facilities. I have a feeling this might not be the safest way to do it.
What I am now considering is this layout:
CREATE TABLE "employees" (
"employeeid" integer NOT NULL,
"facilityid" integer NOT NULL,
...
PRIMARY KEY ("employeeid", "facilityid"),
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is unique (locally) for a given facilityid but needs to be paired with a facilityid to be unique globally.
Concretely, this is what I am looking for:
Employee A (employeeid: 1, facilityid: 1)
Employee B (employeeid: 2, facilityid: 1)
Employee C (employeeid: 1, facilityid: 2)
where A, B and C are 3 distinct employees and...
adding an employee D to facility 1 would give him the keys (employeeid : 3, facilityid: 1)
adding an employee E to facility 2 would give him the keys (employeeid : 2, facilityid: 2)
I see two ways of achieving this:
I could use triggers or stored procedures to automatically generate new employeeids and store the last ids for every facility in another table for quicker access but I am concerned about concurrency issues and ending up with 2 employees from the same facility with the same id.
I could possibly create a new sequence for each facility to manage the employeeids but I fear ending up with thousands of sequences to manage and with procedures to delete those sequences in case a facility is deleted. Is there anything wrong with this? It seems heavy to me.
Which approach should I take? Is there anything I'm missing out on?
I am inferring from your question that you will be running a single database for all facilities, or at least that if you have a local database as the "master" for each facility that the data will need to be combined in a central database without collisions.
I would make the facilityid the high order part of the primary key. You could probably assign new employee numbers using a simple SELECT max(employeeid) + 1 ... WHERE facilityid = n approach, since adding employees to any one facility is presumably not something that happens hundreds of times per second from multiple concurrent sources. There is some chance that this could generate an occasional serialization failure, but it is my opinion that any database access should be through a framework which recognizes those and automatically retries the transaction.
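A minimal sketch of the max + 1 approach with an automatic retry follows. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs anywhere, which means the conflict shows up as a primary-key IntegrityError instead of a serialization failure. The name column, the add_employee helper and the retry limit are assumptions made for the example.

```python
import sqlite3

def open_db():
    """In-memory table with facilityid as the high-order part of the key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE employees (
            facilityid INTEGER NOT NULL,
            employeeid INTEGER NOT NULL,
            name       TEXT    NOT NULL,
            PRIMARY KEY (facilityid, employeeid)
        )""")
    return conn

def add_employee(conn, facilityid, name, max_retries=5):
    """Insert with the next per-facility id; retry if the id was taken."""
    for _ in range(max_retries):
        try:
            with conn:  # commits on success, rolls back on an exception
                row = conn.execute(
                    "SELECT COALESCE(MAX(employeeid), 0) + 1 "
                    "FROM employees WHERE facilityid = ?",
                    (facilityid,),
                ).fetchone()
                conn.execute(
                    "INSERT INTO employees VALUES (?, ?, ?)",
                    (facilityid, row[0], name),
                )
            return row[0]
        except sqlite3.IntegrityError:
            # Another writer used the same id first: compute it again.
            continue
    raise RuntimeError("could not allocate an employee id")
```

Adding A and B to facility 1 and C to facility 2 yields the ids from the question (1, 2 and 1), and later additions continue each facility's own numbering.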
I guess you overstressed the aggregate root concept here. In my understanding of modelling an employee (which depends on your context), an employee is almost always an aggregate root, possibly referenced by another aggregate root, the facility.
Both employee and facility almost always have natural keys. For the employee this is typically some employee id (printed on employee identification badges, or at least maintained in the human resources software system), and facilities have natural keys too, almost always containing some location part and some number, like "MUC-1" for facility 1 located in Munich. But that all depends on your context. If employee and facility have these natural keys, your database model should be quite clear.

Facebook "like" data structure

I've been wondering how Facebook manages the database design for all the different things that you can "like". If there is only one thing to like, this is simple: just a foreign key to what you like and a foreign key to who you are.
But there must be hundreds of different tables that you can "like" on Facebook. How do they store the likes?
If you want to represent this sort of structure in a relational database, then you need to use a hierarchy normally referred to as table inheritance. In table inheritance, you have a single table that defines a parent type, then child tables whose primary keys are also foreign keys back to the parent.
Using the Facebook example, you might have something like this:
User
------------
UserId (PK)
Item
-------------
ItemId (PK)
ItemType (discriminator column)
OwnerId (FK to User)
Status
------------
ItemId (PK, FK to Item)
StatusText
RelationshipUpdate
------------------
ItemId (PK, FK to Item)
RelationshipStatus
RelationTo (FK to User)
Like
------------
OwnerId (FK to User)
ItemId (FK to Item)
Compound PK of OwnerId, ItemId
In the interest of completeness, it's worth noting that Facebook doesn't use an RDBMS for this sort of thing. They have opted for a NoSQL solution for this sort of storage. However, this is one way of storing such loosely-coupled information within an RDBMS.
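The table-inheritance layout above can be tried out quickly with Python's built-in sqlite3 module. This is just a sketch under a few assumptions: the likes table is called item_like because LIKE is an SQL keyword, only the Status child table is included, and the helper functions are invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE item (
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT    NOT NULL,   -- discriminator column
    owner_id  INTEGER NOT NULL REFERENCES user(user_id)
);
CREATE TABLE status (
    item_id     INTEGER PRIMARY KEY REFERENCES item(item_id),
    status_text TEXT NOT NULL
);
CREATE TABLE item_like (
    owner_id INTEGER NOT NULL REFERENCES user(user_id),
    item_id  INTEGER NOT NULL REFERENCES item(item_id),
    PRIMARY KEY (owner_id, item_id)
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

def add_user(conn, user_id):
    with conn:
        conn.execute("INSERT INTO user (user_id) VALUES (?)", (user_id,))

def post_status(conn, owner_id, text):
    """Insert the parent row first, then the child row sharing its key."""
    with conn:
        cur = conn.execute(
            "INSERT INTO item (item_type, owner_id) VALUES ('status', ?)",
            (owner_id,),
        )
        conn.execute(
            "INSERT INTO status (item_id, status_text) VALUES (?, ?)",
            (cur.lastrowid, text),
        )
    return cur.lastrowid

def like(conn, user_id, item_id):
    """Likes point at the parent table, so any item type can be liked."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO item_like (owner_id, item_id) VALUES (?, ?)",
            (user_id, item_id),
        )

def like_count(conn, item_id):
    return conn.execute(
        "SELECT COUNT(*) FROM item_like WHERE item_id = ?", (item_id,)
    ).fetchone()[0]
```

Because the compound primary key covers (owner_id, item_id), a repeated like from the same user is ignored instead of being counted twice.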
Facebook does not have traditional foreign keys and such, as they don't use relational databases for most of their data storage. Simply, they don't cut it for that.
However, they use several NoSQL-type data stores. The "Like" is most likely attributed based on a service, probably set up in an SOA-style manner throughout their infrastructure. This way the "Like" can basically be attributed to anything they want it to be associated with. All this comes with vast scalability and no tightly coupled relational issues to deal with, something that Facebook can't really afford to deal with at the volume they operate.
They could also be using an AOP (Aspect Oriented Programming) style processing mechanism to "attach" a "Like" to anything that may need one at page rendering time, but I get the notion that it is asynchronous processing via JavaScript against an SOA style web service or other delivery mechanism.
Either way, I'd love to hear how they have this set up from an architecture perspective myself. Considering their volume, even the simple "Like" button becomes a significant implementation of technology.
You can have a table with Id, ForeignId and Type. Type can be anything like Photo, Status, Event, etc. ForeignId would be the id of the record in the table named by Type. This makes it possible to handle both comments and likes. You only need one table for all likes, one for all comments and the one I described.
Example:
Items
Id | Foreign Id | Type
----+-------------+--------
1 | 322 | Photo
4 | 346 | Status
Likes
Id | User Id | Item Id
----+-------------+--------
1 | 111 | 1
Here, user with Id 111 likes the photo with Id 322.
Note: I assume you are using an RDBMS, but see Adron's answer. Facebook does not use an RDBMS for most of their data.
I'm pretty sure Facebook does not store "like" information in an RDBMS the way some others suggested. With millions of users and possibly thousands of likes, we're looking at thousands of rows to join here, which would impact performance.
The best approach here is to append all "likes" in a single row. For example, use a table with a user_like_id column of text datatype, and append the ids of everyone who liked the post to it. In this case, you only query one row and you have everything. This will be a lot faster than joining tables and getting counts.
EDIT: I haven't been here on this site lately and I just discovered this answer has been downvoted. Well, here's an example post with like count and their avatars. This is my design where I just implemented what I'm talking about.
The two components here are 1.) an XREF table and 2.) a JSON object.
The likes are still stored in an XREF table. But at the same time, the data is appended to a JSON object stored in a text column on the post table.
Why did I store the likes info in a text column as JSON? So that there's no need to do a db lookup/join for the likes. If someone unlikes the post, the JSON object is just updated.
Now I don't know why this answer is downvoted by some users here. This answer provides quick data retrieval. This is close to the NoSQL approach, which is how FB accesses data. In this case, there's no need for extra joins/lookups to get the likes info.
And here's the table that holds the likes. It's just a simple XREF mapping between user and item table.

Why does DBIx::Class not create many-to-many accessors?

While creating a schema from a database, many-to-many relationships between tables are not created.
Is this a problem in principle?
Is it possible to detect from the table structure that many-to-many relationships exist and create the respective code in schema classes automagically?
It is indeed a somewhat fundamental problem -- many_to_many is a "relationship bridge" and not a "relation." The documentation explains that "the difference between a bridge and a relationship is, that the bridge cannot be used to join tables in a search, instead its component relationships must be used."
On the other hand, this means that if the real relationships are correctly discovered it should be straightforward to add the many-to-many relationships automatically: First, search for tables that have two or more has_many relationships. Then, for each pair of such relationships, create a many-to-many relationship bridge. (Of course, one might hope that DBIx::Class would do this itself.)
The problem with developing this kind of code is that many tables that contain multiple references are not many-to-many tables, and have multiple references for other reasons. For instance, I'll make up a schema for some fictional app where something could be regarded as a many-to-many table, when it is not.
create table category (
id integer primary key,
...
);
create table sub_category (
id integer primary key,
category integer references category(id),
...
);
/* EDIT:
This is the table that could be regarded as many_to_many
by an automated system */
create table product (
id integer primary key,
category integer references category(id),
sub_category integer references sub_category(id),
...
);
Something could be built this way for ease of use, without having to do multiple table joins in the database on a website, especially when considering speed. It would be difficult for a piece of code to say definitively 'this is not a many_to_many' situation, while the developer should be able to easily figure it out, and add in the many_to_many line below the checksum.
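The detection idea and its false positive can be sketched in a few lines of Python. This is not DBIx::Class code and does not use its API; the schema is described by a hand-written dictionary, and the student_course table is a hypothetical genuine link table added for contrast with product.

```python
from itertools import combinations

# Hypothetical schema description: table -> {fk_column: referenced_table}.
FOREIGN_KEYS = {
    "category": {},
    "sub_category": {"category": "category"},
    "product": {"category": "category", "sub_category": "sub_category"},
    "student_course": {"student": "student", "course": "course"},
}

def candidate_bridges(foreign_keys):
    """Return (link_table, table_a, table_b) for every pair of tables
    referenced from the same table. Purely structural: it cannot tell a
    real link table from one that has two references for other reasons."""
    found = []
    for table, fks in sorted(foreign_keys.items()):
        targets = sorted(set(fks.values()))
        for a, b in combinations(targets, 2):
            found.append((table, a, b))
    return found
```

Running candidate_bridges(FOREIGN_KEYS) reports both student_course and product, even though only the former is a many-to-many link table, which is why a developer still has to review the result.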
I consider DBIx::Class schema output a good starting point, and little more, especially when working with auto numbering in non-MySQL databases, among other things. I often need to modify above the "Don't modify above this line" stuff (although many_to_many can obviously go below that checksum).

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database I'd make a middle table to link each car shop with the car types it sells via IDs. What is the NoSQL approach? If every car shop sells 50 car types, it means replicating a huge amount of data if I have to store within the car shop all the features of all the car types it sells!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
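To make the key-sorting idea concrete, here is a rough Python imitation of such a view. Real CouchDB map functions are written in JavaScript and call emit(key, value), and CouchDB's collation rules are richer than Python's list comparison, so treat this purely as a sketch; the documents and field names are invented.

```python
def map_car(doc):
    """Mimic a map function: yield (key, value) pairs for one document."""
    if doc.get("type") == "car":
        # Compound key: sorted first by shop, then by car type.
        yield [doc["shop"], doc["car_type"]], doc["_id"]

def build_view(docs):
    """Run the map over all documents and keep the rows sorted by key."""
    rows = [row for doc in docs for row in map_car(doc)]
    return sorted(rows, key=lambda row: row[0])

def cars_in_shop(view, shop):
    """Range lookup on the first key component."""
    return [value for key, value in view if key[0] == shop]

DOCS = [
    {"_id": "c1", "type": "car", "shop": "shop-b", "car_type": "suv"},
    {"_id": "c2", "type": "car", "shop": "shop-a", "car_type": "suv"},
    {"_id": "c3", "type": "car", "shop": "shop-a", "car_type": "coupe"},
    {"_id": "s1", "type": "shop", "name": "shop-a"},  # ignored by the map
]
```

Because the rows are ordered by the compound key, all cars of one shop sit next to each other, already sorted by car type.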
In MongoDB an often-used approach would be to store a list of _ids of car types in each car shop. So there is no separate join table, but you are still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
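A small sketch of the list-of-_ids approach, with plain Python dictionaries standing in for the two collections. No MongoDB driver is involved, and the collection contents and field names (car_type_ids, features) are assumptions. Against a real server the second step would typically be a single query using the $in operator rather than a dictionary lookup.

```python
# Stand-ins for two collections, keyed by _id.
CAR_TYPES = {
    "ct1": {"_id": "ct1", "name": "Roadster", "features": {"doors": 2}},
    "ct2": {"_id": "ct2", "name": "Wagon", "features": {"doors": 5}},
    "ct3": {"_id": "ct3", "name": "Pickup", "features": {"doors": 4}},
}
CAR_SHOPS = {
    "shop1": {"_id": "shop1", "name": "Downtown Cars",
              "car_type_ids": ["ct1", "ct2"]},
    "shop2": {"_id": "shop2", "name": "Harbour Motors",
              "car_type_ids": ["ct2", "ct3"]},
}

def car_types_for_shop(shop_id):
    """Client-side join: read the shop, then fetch the referenced car types."""
    shop = CAR_SHOPS[shop_id]
    return [CAR_TYPES[type_id] for type_id in shop["car_type_ids"]]

def shops_selling(car_type_id):
    """Reverse direction: find shops whose id list contains the car type."""
    return sorted(
        shop["name"]
        for shop in CAR_SHOPS.values()
        if car_type_id in shop["car_type_ids"]
    )
```

Each car type's 100 features are stored once; a shop only carries the short list of ids.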
Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience.
Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.
For your example of a many-to-many relationship, you can still create two tables, and use your multidimensional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q [Michael Dagaev]: How would you design an HBase table for a many-to-many association between two entities, for example Student and Course? I would define two tables:
Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers here)
Course: course id, course data (name, syllabus, ...), students (use student ids as column qualifiers here)
Does it make sense?
A [Jonathan Gray]: Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like:
Student: id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for example, if they are on the waiting list)
courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
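The "map of maps" picture of the Student and Course tables can be written down directly as nested Python dictionaries. This is only a mental model, not HBase client code; the course names and the helper functions are made up, and the value stored under each qualifier plays the role of the optional note about the association (such as "waiting list").

```python
def make_tables():
    """row key -> column family -> qualifier -> value"""
    students = {
        1001: {"data": {"name": "Student Name", "address": "123 ABC St"},
               "courses": {}},
    }
    courses = {
        2001: {"data": {"name": "Course 2001"}, "students": {}},
        2002: {"data": {"name": "Course 2002"}, "students": {}},
    }
    return students, courses

def enroll(students, courses, student_id, course_id, note=""):
    """Record the association on both sides, one qualifier per partner row."""
    students[student_id]["courses"][course_id] = note
    courses[course_id]["students"][student_id] = note

def courses_of(students, student_id):
    """All classes for a student: read one row, one column family."""
    return sorted(students[student_id]["courses"])

def students_of(courses, course_id):
    """All students for a class: read one row, one column family."""
    return sorted(courses[course_id]["students"])
```

Both questions are answered by reading a single row, at the price of writing the association twice.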
In a relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count"; the "car_id" links the two tables together for data operations. All the columns must be well defined when creating the database.
NoSQL database systems do not require you to pre-define these columns and tables. You just construct your records in a certain format, say JSON, like:
{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}}
{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}}
.....
After your system is online and in use, you may find some flaws in the design of your db structure; say you want to add one field "employee" to "shop" for your future records. Then your new records look like:
{"car": {"id": 3, "type": "auto", "name": "RR"}, "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}}
NoSQL systems allow you to do so without any preparation, whereas in a relational database you would first have to change the schema (for example by adding a column to the table).