I have a set of tables and am currently trying to set up the correct relationships for the tables. The condensed version is below.
Users
ID INT NOT NULL
Activities
ID INT NOT NULL
UserID INT NULL
Logs
ID INT NOT NULL
UserID INT NULL
ActivityID INT NULL
I have foreign key relationships relating UserID in both Activities and Logs back to Users.ID, and ActivityID in Logs back to Activities.ID.
I have set Activities.UserID and Logs.UserID to set null on delete and cascade on update. My problem comes when I attempt to set the same update and delete actions on Logs.ActivityID, but I get an error about "may cause cycles or multiple cascade paths". The thing is, Logs require a User and do not require an Activity, but Logs that do have an Activity need to be updated if and when that Activity changes.
What way do I have around this that does not involve having two separate Logs tables, and does not involve manually updating the Logs table? Is this even possible in SQL Server 2012?
SQL Server does not support multiple cascade paths. Your options to work around this limitation are:
write your own logic for dealing with multiple cascade paths (recommended; see the sketch after this list).
change your schema such that multiple cascade paths are not required.
wait for SQL Server to fix this. Don't hold your breath - this has been a limitation for years. See these Connect items:
a. http://connect.microsoft.com/SQLServer/feedback/details/126159/cascade-updates
b. http://connect.microsoft.com/SQLServer/feedback/details/307723/allow-multiple-cascade-paths-for-foreign-key-and-on-dalete-update-cascade
Notice how they keep saying "we don't have time for this now; we'll consider it for the next release"? This isn't the kind of thing that sells software, because people are satisfied - generally - with coding the logic themselves.
migrate to a database platform that supports multiple cascade paths (sounds like you think you have many options for this, but I don't think you do, and I don't know what you sacrifice by switching, not even counting porting your schema and code).
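For the schema in the question, option 1 could look roughly like the sketch below: create the foreign key on Logs.ActivityID without cascading actions (the default NO ACTION avoids the "multiple cascade paths" error) and emulate the delete behaviour in an INSTEAD OF trigger on Activities. Object names come from the question; treat it as a starting point rather than production code.

ALTER TABLE dbo.Logs
    ADD CONSTRAINT FK_Logs_Activities
    FOREIGN KEY (ActivityID) REFERENCES dbo.Activities (ID);  -- NO ACTION by default
GO

-- Emulate ON DELETE SET NULL: clear Logs.ActivityID before the parent rows disappear,
-- then perform the delete itself (an INSTEAD OF trigger replaces the original statement).
CREATE TRIGGER dbo.trg_Activities_Delete
ON dbo.Activities
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE l
       SET ActivityID = NULL
      FROM dbo.Logs AS l
      JOIN deleted AS d ON d.ID = l.ActivityID;

    DELETE a
      FROM dbo.Activities AS a
      JOIN deleted AS d ON d.ID = a.ID;
END;
GO

The ON UPDATE CASCADE half would need a similar trigger on Activities; if ID is an IDENTITY surrogate key that never changes, you can usually skip that part.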
Related
When implementing a system which creates tasks that need to be resolved by some workers, my idea would be to create a table which would have some task definition along with a status, e.g. for document review we'd have something like reviewId, documentId, reviewerId, reviewTime.
When documents are uploaded to the system we'd just store the documentId along with a generated reviewId and leave the reviewerId and reviewTime empty. When the next reviewer comes along and starts the review, we'd set their id and the current time to mark the job as "in progress" (I deliberately skip the case where the reviewer takes a long time, or dies during the review).
When implementing such a use case in e.g. PostgreSQL we could use UPDATE review SET reviewerId = :reviewerId, reviewTime = :reviewTime WHERE reviewId = (SELECT reviewId FROM review WHERE reviewerId IS NULL AND reviewTime IS NULL LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING reviewId, documentId, reviewerId, reviewTime (so basically update the first non-taken row, using SKIP LOCKED to skip any rows already in progress).
But when moving from this native solution to JDBC and beyond, I'm having trouble implementing it:
Spring Data JPA and Spring Data JDBC don't allow a @Modifying query to return anything other than void/boolean/int, forcing us to perform two queries in a single transaction - one for the first pending row, and a second one with the update
one alternative would be to use a stored procedure, but I really hate the idea of storing such logic so far away from the code
another alternative would be to use a persistent queue and skip the database altogether, but this introduces additional infrastructure components that need to be maintained and learned. Any suggestions are welcome though.
Am I missing something? Is it possible to have it all or do we have to settle for multiple queries or stored procedures?
Why doesn't Spring Data support returning an entity for modifying queries?
Because it seems like a rather special thing to do and Spring Data JDBC tries to focus on the essential stuff.
Is it possible to have it all or do we have to settle for multiple queries or stored procedures?
It is certainly possible to do this.
You can implement a custom method using an injected JdbcTemplate.
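For reference, the statement such a custom method could execute is essentially the one from the question, cleaned up (a sketch; named parameters as you would bind them through a NamedParameterJdbcTemplate):

UPDATE review
   SET reviewerId = :reviewerId,
       reviewTime = :reviewTime
 WHERE reviewId = (SELECT reviewId
                     FROM review
                    WHERE reviewerId IS NULL
                      AND reviewTime IS NULL
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED)  -- ignore rows another worker is busy claiming
RETURNING reviewId, documentId, reviewerId, reviewTime;

Executed via query() rather than update(), the RETURNING clause gives you the claimed row back in the same round trip.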
When saving an entity in EntityFrameworkCore, the Id is of course posted from the client as part of the update.
A blog I was reading mentioned that it is possible for a hacker to simply change the Id of an entity and thereby cause an update to a different entity, possibly one belonging to a different user.
Is there a way to lock this down?
The only way I can think of is to get the existing entity on the server, and then confirm that it does indeed belong to the current user, and only then update.
This is where an authorization framework comes in. The server-side code takes the identity of the caller and checks whether the caller is allowed to perform that action.
If your business rules say that users are only allowed to update their own entities, your authZ check verifies that the id associated with the authenticated user matches the one in the update statement.
If the user has admin privileges, he/she is likely to be allowed to update any entity.
The other answer does not really solve the issue, as it still lets people with access be malicious and alter records outside the scope of what is allowed within that edit/update operation.
The security side is good practice; however, to protect things further, you would be better off with GUID primary keys instead of 1, 2, 3, and also a timestamp field for concurrency checking.
For example, let's say we have a model called Person. When it is updated, we want to ensure not only that it is not out of date (i.e. someone beat us to it), but also that it is the correct record, i.e. concurrency. Assuming SQL Server here.
public class Person
{
    public Guid Id { get; set; }   // GUID primary key rather than a guessable sequential int

    [Timestamp]
    public byte[] Timestamp { get; set; }
}
In our Fluent API OnModelCreating we would then configure the Timestamp property as a rowversion.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Person>()
        .Property(p => p.Timestamp)
        .IsRowVersion();
}
Together with a GUID key, an attacker not only has to work out a correct GUID, but also has to get the rowversion right.
Now, what you don't want to do is try to resolve the concurrency conflict, as advised all over the place, because if someone managed to find another GUID and got a hit, you don't want to start copying data from one secure source to another. Instead you can either redirect to your original GET (which will re-check permissions and reload, with an "unable to update" message), or you can simply say it failed. Either way, you want to avoid saying "well done, that was a good guess at another GUID, but the timestamp is wrong". I.e. on login you wouldn't say the password is wrong, you just say login failed.
It can be a pain in the rear for the end user, because you are not making things easy for them: you are not handling the concurrency for them, matching up and sorting out all the data for them. But you have to remember that easier for a user = easier for a bad user. Just like it's no longer easy to jump onto an airplane: it's not meant to piss you off, it's meant to keep you safe, and those are the rules.
It's worth thinking about the data as its own entity. The security of a system should not be the only way you secure the data: passwords are easy to crack, hashed passwords are easy to crack, people's computers are easy to crack and can then be used to gain onward access to systems. That's just one part; you should be protecting the data itself, even from any god or admin role you may have implemented.
book:
id: primary key, integer
title: varchar
borrowed: boolean
borrowed_by_user_id: foreign key user.id
user:
id: primary key, integer
name: varchar
blocked: boolean
The isolation level is READ COMMITTED, because it is the default level in PostgreSQL (this requirement is not from me).
I am using one database transaction to SELECT FOR UPDATE a book and lend it to a user if the book is not borrowed yet. The book is selected FOR UPDATE so it cannot be borrowed concurrently.
But there is another problem: we cannot allow a book to be lent to a blocked user. How can we ensure that? Even if we check at the beginning that the user is not blocked, the result might not be correct, because a concurrent transaction could block the user after that check.
For example, a user can be blocked by a concurrent transaction from the admin's panel.
How to solve that issue?
I see that I can use SERIALIZABLE. It requires handling errors, yes?
I am not sure how that CHECK works. Could you say more about it?
These are actually two questions.
About the books:
If you lock the book with SELECT ... FOR UPDATE as soon as you consider lending it out, this is an example of “pessimistic locking” and will block the book for all concurrent activity.
That is fine if the transactions are very short – specifically, if there is no user interaction between the locking and the end of the transaction.
Otherwise you should use “optimistic locking”. This can be done in several ways:
Use REPEATABLE READ transaction isolation. Then updating a book that has been modified since you read its data will lead to a serialization error (see the note at the end).
When selecting books, remember the values of the system columns ctid and xmin. Then update as follows:
UPDATE books SET ...
WHERE id = ...
AND ctid = original_ctid AND xmin = original_xmin;
If no row gets updated, somebody must have modified the book since you looked at it.
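The earlier SELECT that remembers those values could be as simple as this sketch:

SELECT id, title, ctid, xmin
  FROM books
 WHERE id = ...;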
About the users:
Three ideas:
You use SERIALIZABLE transaction isolation (see the note at the end).
You maintain a counter on the user that contains the number of books the user has borrowed.
Then you can have a check constraint like
ALTER TABLE users ADD CHECK (NOT blocked OR books_borrowed = 0);
Such a check constraint is evaluated whenever a users row is inserted or updated and has to yield TRUE, else an error is thrown.
So either the transaction that borrows a book or the transaction that blocks the user must fail (both transactions have to modify the user).
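Spelled out, the two competing transactions could look like this (a sketch; :user_id and :book_id are placeholders, and the book columns come from the schema in the question):

-- Borrowing: the UPDATE on users re-evaluates the check constraint
BEGIN;
UPDATE books SET borrowed = TRUE, borrowed_by_user_id = :user_id WHERE id = :book_id;
UPDATE users SET books_borrowed = books_borrowed + 1 WHERE id = :user_id;
COMMIT;

-- Blocking (e.g. from the admin panel): violates the check constraint if the user
-- still has books out, or if a concurrent borrow reaches the users row first and commits
BEGIN;
UPDATE users SET blocked = TRUE WHERE id = :user_id;
COMMIT;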
Right before lending a book to a user, you run
SELECT blocked FROM users WHERE id = ... FOR UPDATE;
If you get TRUE, you abort the transaction, otherwise lend out the book.
A concurrent transaction that wants to block the user has to SELECT ... FOR UPDATE on the user as well and only then check if there are any books lent to that user.
That way, no inconsistency can happen: if you want to block a user, all concurrent transactions that want to lend a book to the user must either be completed, so that you see their effect, or they must wait until you are done blocking the user, whereupon they will fail.
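Written out, the third idea might look like this (again a sketch with placeholder ids):

-- Lending transaction
BEGIN;
SELECT blocked FROM users WHERE id = :user_id FOR UPDATE;  -- lock the user row; abort if TRUE
UPDATE books SET borrowed = TRUE, borrowed_by_user_id = :user_id WHERE id = :book_id;
COMMIT;

-- Blocking transaction
BEGIN;
SELECT blocked FROM users WHERE id = :user_id FOR UPDATE;          -- waits for any in-flight lending transaction
SELECT count(*) FROM books WHERE borrowed_by_user_id = :user_id;   -- now sees every committed loan
UPDATE users SET blocked = TRUE WHERE id = :user_id;
COMMIT;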
Note about higher isolation levels:
If you run transactions at an isolation level of REPEATABLE READ or SERIALIZABLE, you can encounter serialization errors. These are not bugs in your program, they are normal and to be expected. If you encounter a serialization error, you have to rollback and try the same transaction again. That is the price you pay for not having to worry about race conditions.
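For completeness, opting into a higher isolation level is a one-line change; the retry on serialization failures (SQLSTATE 40001) has to live in the application:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... lend the book as above ...
COMMIT;  -- may fail with SQLSTATE 40001 (serialization_failure); if so, retry the whole transaction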
For somewhat vague reasons we are using replicated OrientDB for our application.
It looks likely that we will have cases when a single record could be created twice. Here is how it happens:
we have user_document entities; they hold the ID of a user and the ID of a document - both users and documents are managed in another application and stored in another database;
that other application, on creating a new document, sends a broadcast event (via a RabbitMQ topic);
several instances of our application receive this message and each create a user_document with the same pair of user_id and document_id.
If I understand correctly, we should use a UNIQUE index on the pair of these ids and rely upon distributed transactions.
However, for various reasons (we have another team writing the layer between application and database) we probably cannot use UNIQUE, though that may sound stupid :)
What are our chances then?
Could we, for example, allow all instances to create redundant records and, immediately after creation, select by user_id and document_id and, if more than one is found, delete the ones with the lexicographically higher id?
Sure, you can do it that way.
You can try to use something like
DELETE FROM (SELECT FROM user_document where user_id=? and document_id=? skip 1)
However, note that without creating an index this approach may consume some additional resources on the server, and you might see a significant slowdown if user_document has a large number of records.
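If you are at least allowed to create a non-unique index (the index name below is arbitrary), that lookup stays cheap without violating the no-UNIQUE constraint:

CREATE INDEX user_document_user_doc ON user_document (user_id, document_id) NOTUNIQUE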
Why is uniqueidentifier (which equates to a GUID in .NET) used as the type for the Id fields in the EventSources and Events tables?
Would it not be faster to use an integer type (like bigint in SQL Server) that functioned as an identity, so that the database could assign the Id as the inserts are performed?
I am a complete newb when it comes to Event Sourcing and CQRS, so I apologize if this has been asked and answered and my searching hasn't been correct enough to find the answer.
Note: Answers 2 and 4 assume that you are following a few basic principles of Domain-Driven Design.
1. IDs should be unique across different aggregate types and even across different bounded contexts.
2. Every aggregate must always be in a valid state. Having a unique ID is part of that. This means you couldn't do anything with the aggregate until the initial event had been stored and the database had generated and returned the respective ID.
3. A client that sends the initial command often needs some kind of reference to relate the created aggregate to. How would the client know which ID has been assigned? Remember, commands are void; they don't return anything beyond ack/nack, not even an ID.
4. A domain concern (identification of an entity) would heavily rely on technical implementation details. (This should actually be #1.)