Issue Insert/Update EF Core DbContext in Azure QueueTrigger Function (Multi-threading) - entity-framework-core

I´m getting PK Violation Exception when using EF Core 2.1 DbContext in an Azure QueueTrigger function. Guess is due to the nature of DbContext not being thread-safe, and the Azure Function running different instances in parallel. I have read quite a few, but I can´t find a good approach to solve this.
Here is my scenario (producer-consumer pattern):
I have a Scheduled Azure Function that is calling an API to get Projects from different external systems. To get all the required info for a project, I need to run different Queries to other external services, so I´m decoupling this to another Azure function, so the Scheduled function just queues a message per Project, as “Sync Project ID 101”.
Another QueueTrigger Function fires every time a message is queued, so, it means different instances running in parallel. This function must gather all the data of a specific Project, and that means more calls to other external services / APIs, to (some kind of) aggregate all the info about a Project. IMHO it´s good to do it that way, as I can process multiple Projects in parallel, and I can scale the Function if I need it.
Once I have all this Project info, I want to persist it in a SQL DB using EF Core (and here comes the issue)
Project data includes Users in the Project, and each user have a specific GUID as PK (coming from the external system). That means I can have repeated Users IDs in different Function instances, and here is the problem, as when I try to persist User info in a SQL Table, I can get PK Duplication exception, as multiple Function instances can try to Insert the same User at the same time (when the instance A check if user exists, it gets False, but another instance B is actually adding this User, so when instance A tries the Insert, it fails).
Guess I can lock DbContext somehow, but not sure if is good, as I also have a website doing Queries to the SQL DB (read-only queries for now, but could be updates in future too).
Another idea could be to send the entire Project info to another Queue / Blob file, and have another function in Singleton mode that Insert the data into SQL.
I´ve created this project simplifying my scenario, but enough to reproduce the issue and understand the problem.
https://github.com/luismanez/queuetrigger-efcore-multithreading
Any other ideas or recommended approaches? (open to change the architecture if find something better)
Many thanks!

A "more easy" way could be to do some kind of upsert in the database. There is a sample of how to do that with EF Core: https://www.flexlabs.org/2018/02/adding-upsert-support-for-entity-framework-core

Related

Efficient write only (CREATE/UPDATE) object persistence in EF6 - stored procedures?

It seems a lot has changed in EF (code first) since many of the questions on SO were written... many refer to EF4 and I am using EF6, I know stored procedures are a new feature in EF6 for instance.
My application periodically calls an Xml web service and simply dumps the objects in a database; I have a C# object for each Xml type which maps directly to one DB table using EF.
Objects may be created or updated depending if they are already in the DB. My application has no interest in the content of the objects and should not keep them in memory as it may be running for weeks at a time.
What is an efficient way to Create-or-Update objects in this scenario? I am imagining that my DbContext will be created (and disposed) each time the web service is polled, so it will never already know what objects are in the DB. In a non-EF world I'd probably create a stored-procedure CreateOrUpdate(...) so I wonder if there is a simple parallel.
I do not need to use stored procedures with EF but it sounds like it might be a nicer idea. And since we're not used to EF and not all our modules are using it, haveing the stored-procedures in the DB in case someone else wants to use them seems useful.

Strategy for deploying EF controlled Database

I have written an application using EF 6.0 in combination with an SQL Server Compact 4.0 Database. When a customer uses this application for the first time, it (the application) should create a database-file in a given path with some initital values. Also migrations should be allowed, for it is quite possible that the object model might change with future versions of the app.
Now I´m wondering what would be the best way to to deploy the DB on the users productive system. I could think of three ways:
I could create a DB-file with initial values and just copy it to the right place during installation process and use MigrateDatabaseToLatestVersionInitializer in the app.
In the DbContext-Constructors (I have two contexts) I could check for an existing DB-file and use different Database-Initializers accordingly. Like a CreateDatabaseIfNotExistsInitializer with a seed method that creates initial data if no fiel is found and a MigrateDatabaseToLatestVersionInitializer if the DB-file exists.
I could use the MigrateDatabaseToLatestVersionInitializer always and in its "Seed"-method check for existing table entries and create them if they are not present.
Which of these ways is to be preferred or is there a better way I didn´t think of?
It sounds like this is a desktop application so you might want to catch permissions errors about creating the database file at installation time (i.e. option 1) rather than run time, especially as in option 2 the database initialization is not an imperative command you're giving that you can put a try...catch around.
I don't think option 3 would work as the Seed method gets run after all the migrations, so surely the migrations will either have successfully run, in which case the tables don't need creating, or they will have failed as the DB doesn't exist and therefore your Seed method won't get run.

Entity Framework - Share Transactions across Bounded Contexts

I am working on a very large application with over 100 modules, and almost 500 tables in the database. We are converting this application to WPF/WCF using Entity Framework 4.2 Code First. Our database is SQL Anywhere 11. Because of the size of the database, we are using an approach similar to Bounded DbContexts, as described here http://msdn.microsoft.com/en-us/magazine/jj883952.aspx by Julie Lerman.
Each of our modules creates its own DbContext, modeling only the subset of the database that it needs.
However, we have run into a serious problem with the way DbContexts are created. Our modules are not neatly self-contained, nor can they be. Some contain operations that are called from several other modules. And when they are, they need to participate in transactions started by the calling modules. (And for architectural reasons, DTC is not an option for us.) In our old ADO architecture, there was no problem passing an open connection from module to module in order to support transactions.
I've looked at various DbContext constructor overloads, and tried managing the transaction from EntityConnection vs. the StoreConnection, and as far as I can tell, there is no combination that allows ModuleA to begin a transaction, call a function in ModuleB, and have ModuleB's DbContext participate in the transaction.
It comes down to two simple things:
Case 1. If I construct DbContextB with DbContextA's EntityConnection, DbContextB is not built with its own model metadata; it reuses DbContextA's metadata. Since the Contexts have different collections of DbSets, all ModuleB's queries fail. (The entity type is not a part of the current context.)
Case 2. If I construct DbContextB with ModuleA's StoreConnection, DbContextB does not recognize the StoreConnection's open transaction at the EntityConnection level, so EF tries to start a new transaction when ModuleB calls SaveChanges(). Since the database connection in fact has an open transaction, this generates a database exception. (Connection does not support parallel transactions.)
Is there any way to 1) force DbContextB to build its own model in Case 1, or 2) get DbContextB's ObjectContext to respect its StoreConnection's transaction state in Case 2?
(By the way, I saw some encouraging things in the EF6 alpha, but after testing it out, found the only difference was that I could create DbContextB on an open connection. But even then, the above 2 problems still exist.)
I suggest you try use the TransactionScope object to manage this task for you. As long as all of your DbContexts use the same connection string (not connection object) the transaction should not try to enlist MS-DTC.

WF4 TransactionScope containing several custom activities with EF4 database updates

I have created several custom activities that update tables in my DB (in this case SQL Server Compact), using Entity Framework 4 with POCOs.
If I put more than one of these inside a WF4 TransactionScope activity, I'm running into problems: EF disposes the DB connection after the first activity has finished, and when the next DB activity tries to do a DB update a new connection is built up. At this moment an exception is thrown.
System.Activities.WorkflowApplicationAbortedException : The workflow has been aborted.
----> System.Data.EntityException : The underlying provider failed on Open.
----> System.InvalidOperationException : The connection object can not be enlisted in transaction scope.
Do I have to keep the EF connection open during the whole transaction scope? How can I do that? Create an explicit custom activity for that, or is there a standard way?
My current workaround goes like this: I created a new code activity that creates our ObjectContext and explicitely calls dbContext.Connection.Open(). It returns the ObjectContext, which is then saved in a workflow variable. That one is passed to all the DB related activities as an InArgument<>. Inside my DB activities, I use this ObjectContext if it is passed in, otherwise I create a new one.
This does work, but I'm not satisfied with this solution: It needs the new InArgument for every DB related activity. In the workflow designer, I have to insert that special OpenDatabaseConnection activity inside the transaction scope, and then make sure that the correct variable is passed into all DB activities. This seems to be very inelegant and error prone, especially if other team members have to use these DB activities.
What would be a better way to handle this?
The problem is that when you open a second connection in the same transaction scope, an attempt is made to promote the transaction to a distributed transaction (even though there's nothing distributed about it since you connect to the same database). SQL Server CE doesn't support this scenario.
What I would do is create a custom 'container' activity that opens (and closes) the connection and makes it available to child activities. This is still not optimal but at least you no longer need to pass InArgument's around. You get the following activity tree:
TransactionScope
InitializeConnection
Sequence
CustomDataActivity1
CustomDataActivity2
CustomDataActivity3
InitializeConnection is a NativeActivity that uses NativeActivityContext.Properties to expose the connection (or the ObjectContext) to child activities.
Make sure you implement proper error handling to ensure you close the connection at all times.
NOTE: Distributed transactions are supported by the full SQL Server only through a Windows service called MSDTC (Microsoft Distributed Transaction Coordinator). You can find this one in your 'Local Services'. Since SQL Server CE is a database that should be able to operate completely standalone, it makes sense that it has no dependency on MSDTC. Therefore it has no support for distributed transactions.

New entity ID in domain event

I'm building an application with a domain model using CQRS and domain events concepts (but no event sourcing, just plain old SQL). There was no problem with events of SomethingChanged kind. Then I got stuck in implementing SomethingCreated events.
When I create some entity which is mapped to a table with identity primary key then I don't know the Id until the entity is persisted. Entity is persistence ignorant so when publishing an event from inside the entity, Id is just not known - it's magically set after calling context.SaveChanges() only. So how/where/when can I put the Id in the event data?
I was thinking of:
Including the reference to the entity in the event. That would work inside the domain but not necesarily in a distributed environment with multiple autonomous system communicating by events/messages.
Overriding SaveChanges() to somehow update events enqueued for publishing. But events are meant to be immutable, so this seems very dirty.
Getting rid of identity fields and using GUIDs generated in the entity constructor. This might be the easiest but could hit performance and make other things harder, like debugging or querying (where id = 'B85E62C3-DC56-40C0-852A-49F759AC68FB', no MIN, MAX etc.). That's what I see in many sample applications.
Hybrid approach - leave alone the identity and use it mainly for foreign keys and faster joins but use GUID as the unique identifier by which i pull the entities from the repository in the application.
Personally I like GUIDs for unique identifiers, especially in multi-user, distributed environments where numeric ids cause problems. As such, I never use database generated identity columns/properties and this problem goes away.
Short of that, since you are following CQRS, you undoubtedly have a CreateSomethingCommand and corresponding CreateSomethingCommandHandler that actually carries out the steps required to create the new instance and persist the new object using the repository (via context.SaveChanges). I will raise the SomethingCreated event here rather than in the domain object itself.
For one, this solves your problem because the command handler can wait for the database operation to complete, pull out the identity value, update the object then pass the identity in the event. But, more importantly, it also addresses the tricky question of exactly when is the object 'created'?
Raising a domain event in the constructor is bad practice as constructors should be lean and simply perform initialization. Plus, in your model, the object isn't really created until it has an ID assigned. This means there are additional initialization steps required after the constructor has executed. If you have more than one step, do you enforce the order of execution (another anti-pattern) or put a check in each to recognize when they are all done (ooh, smelly)? Hopefully you can see how this can quickly spiral out of hand.
So, my recommendation is to raise the event from the command handler. (NOTE: Even if you switch to GUID identifiers, I'd follow this approach because you should never raise events from constructors.)