I'm passing user information as part of each call to a stateful service. I use this information for audit purposes in the service.
Do I have to pass this information around within the service, or is there some other mechanism such as a user context to keep the data in that I can access globally? I have used thread storage (data slots) in the past to hold data, but since the code is async I assume that won't work?
There's a concept called CallContext that could help. Read more here: http://blog.stephencleary.com/2013/04/implicit-async-context-asynclocal.html
You can't keep that information in a member of the service, since you have no control over how concurrent calls are executed asynchronously on multiple threads. The way the runtime lets Tasks and async methods 'jump' between threads means that thread-local storage is not a safe place to keep that context.
What you need is something that flows all the way from your ServiceRemotingDispatcher down to the execution of your actual service methods and is not affected by concurrent calls (potentially to the same method). The underlying Service Fabric implementation executes your method through multiple Tasks and async methods, so the thread id is also out of the question as an option: execution is almost guaranteed to jump threads at least once before it reaches your code.
I wrote this answer to a similar question (that was deleted, so I added the question back myself). It basically solves your problem, and it is based on CallContext as mentioned by @LoekD.
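A minimal sketch of the idea, using AsyncLocal<T> (the modern replacement for logical CallContext that the linked post discusses). The UserContext name and the dispatcher wiring are illustrative assumptions, not Service Fabric API:

    using System.Threading;

    // Ambient, async-safe storage for the calling user's identity.
    public static class UserContext
    {
        private static readonly AsyncLocal<string> _userId = new AsyncLocal<string>();

        // Set this once where the call enters the service (e.g. in a custom
        // ServiceRemotingDispatcher). The value flows with the ExecutionContext
        // across awaits and thread hops, isolated per logical call.
        public static string UserId
        {
            get { return _userId.Value; }
            set { _userId.Value = value; }
        }
    }

    // Anywhere further down the async call chain of the same request:
    // auditLog.Write($"operation performed by {UserContext.UserId}");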
I'm looking for some suggestions here. The use case is a networking device (like a router) with networking operations performed over gRPC.
Let's say there are "n" model objects: router, interfaces, routing configuration objects like OSPF, and so on. Every networking operation will ultimately be a CRUD operation on one or many of the model objects.
Now, when defining this as a gRPC service, there seem to be two options (both sketched below):
1. Define generic RPCs, like "SET" and "GET". The parameter would be a list of objects and operations, e.g. SET((router, update), (interface, update), ...).
2. Define very specific RPCs, like "setInterfaceProperty_x" or "createOSPFInstance". There could be many, many such RPCs.
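To make the two shapes concrete, here is a hypothetical C# sketch of the service surface; the real contract would live in a .proto file, and all names below are illustrative:

    using System.Collections.Generic;
    using System.Threading.Tasks;

    // Illustrative payload stubs.
    public class ModelOperation { public string ObjectType; public string Operation; }
    public class Reply { public bool Ok; }

    // Option #1: a couple of generic RPCs; the intelligence lives in the payload.
    public interface IGenericDeviceService
    {
        Task<Reply> Set(IReadOnlyList<ModelOperation> operations); // e.g. (router, update), (interface, update)
        Task<Reply> Get(IReadOnlyList<ModelOperation> queries);
    }

    // Option #2: one narrowly-scoped RPC per operation; the intelligence lives in the contract.
    public interface ISpecificDeviceService
    {
        Task<Reply> SetInterfacePropertyX(string interfaceId, string value);
        Task<Reply> CreateOspfInstance(string routerId, string area);
        // ...a new RPC for every new feature
    }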
With #2, we are building the application intelligence into the RPCs themselves. Every new feature might need new RPCs in this service.
With #1, the RPCs are only the means; the intelligence resides with the application which uses the RPCs in a context. The RPC list stays very small and doesn't change over time.
What is the preferred approach: generic RPCs (kept very few) or tens (or more) of operation-driven RPCs? I see some open-source projects like P4Runtime take approach #1.
Thanks for your time. I can provide more information if required.
You should use option #2. This puts your interface contract in the proto, rather than in your application. You leave yourself many open doors by picking option #2 that would otherwise be cumbersome or unsupportable:
If the API definition of an object doesn't match the internal representation, you need to define a mapping between the two. Suppose you update your internal code so that it no longer needs InterfaceProperty, which instead moves to a new field called BetterInterfaceProperties. Option #1 would force you to keep the old field exposed, while option #2 would allow you to reinterpret the call and do the right thing.
Fine-grained access controls are easier with specific methods. All users may be able to set publicProperty, but only admins can set dangerousProperty. If you group all the fields into a single call (as in #1), your caller has to reinterpret error messages, while with option #2 it's much clearer why authorization failed.
Smaller return values. Having a method like getSpecificProperty will do much less work than getFullObject. As your data model gets more complex, you will have to include more and more data on return messages. Even if the caller only cares about one thing, they have to wait for all of them. Consider a Database application. The database might have to do several unnecessary queries to fill in fields the client will never read.
There are reasons to use #1, but they aren't that valuable until you identify which properties go together and are logically a single RPC (such as a Get).
From what I understand, the main q thread monitors its socket descriptors for requests and responds to them.
I want to use a while loop in my main thread that will run for an indefinite period of time. This would mean that I would not be able to hopen the process port and perform queries.
Is there any way to manually check for requests within the while loop?
Thanks.
Are you sure you need to use a while loop? Is there any chance you could, for instance, instead use the timer functionality of KDB+?
This would allow you to run a piece of code periodically instead of looping over it continually. Depending on your use case, this may be more appropriate, as it lets you repeatedly run a piece of code (e.g. one that polls something periodically) without occupying the main thread constantly.
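For example, a minimal sketch, where pollSomething is a placeholder for your own polling function:

    .z.ts:{pollSomething[]}   / runs on every timer tick (x is the tick's timestamp)
    \t 1000                   / fire the timer every 1000 ms; \t 0 switches it off

Between ticks the main thread stays free, so hopen connections and queries against the process are still served.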
KDB+ is by default single-threaded, which makes it tricky to do what you want to do. There might be something you can do with slave threads.
If you're interested in using timer functionality, but the built-in timer is too limited for your needs, there is a more advanced set of timer functionality available free from AquaQ Analytics (disclaimer: I work for AquaQ). It is distributed as part of the TorQ KDB framework; the specific script you'd be interested in is timer.q, which is documented here. You may be able to use this code without the full TorQ if you like, though you may need some of the other "common" code from TorQ to provide the functions used within timer.q.
I have a .NET Core 1.1 API with EF Core 1.1 and using Microsoft's vanilla setup of using Dependency Injection to provide the DbContext to my services. (Reference: https://learn.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)
Now, I am looking into parallelizing database reads as an optimization, using WhenAll.
So instead of:
var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
I use:
var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
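(A tuple-based .WhenAll() is not part of the base class library; it presumably comes from a custom or third-party extension method, roughly like this hypothetical sketch:)

    using System.Threading.Tasks;

    public static class TaskTupleExtensions
    {
        // Awaits both tasks and returns their results as a tuple.
        public static async Task<(T1, T2)> WhenAll<T1, T2>(this (Task<T1> t1, Task<T2> t2) tasks)
        {
            await Task.WhenAll(tasks.t1, tasks.t2);
            return (tasks.t1.Result, tasks.t2.Result);
        }
    }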
This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:
var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();
Now, when I call this from my controller, I randomly get concurrency errors like:
System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.
The reason, I believe, is that these threads sometimes try to access the same tables at the same time. I know this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/
In which he acknowledges this limitation:
an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
And offers a custom workaround with DbContextScope.
However, he presents a caveat even with DbContextScope in that it won't work in parallel (what I'm trying to do above):
if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Task), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.
His final point here leads me to my question:
In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
Should I not be using WhenAll in this case in my Controllers and stick with using await one-by-one? Or is dependency-injection of the DbContext the more fundamental problem here, therefore a new one should instead be created/supplied every time by some kind of factory?
Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.
A DbContext instance isn't thread-safe: you should never, ever use it in parallel threads. Which means, just to be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts, etc.), make sure each one has its own DbContext instance. Note, however, that parallelization is mainly useful for CPU-bound processes, not I/O-bound processes like database interaction. Maybe you can benefit from parallel independent read operations, but I would certainly never execute parallel write processes. Apart from deadlocks etc., it also makes it much harder to run all operations in one transaction.
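A minimal sketch of that advice. The TableModel names are the question's own; MyDbContext and the factory wiring are assumptions. IDbContextFactory<T> ships with later EF Core versions; on EF Core 1.1 you could instead construct each context from a shared DbContextOptions:

    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public class ParallelLookups
    {
        private readonly IDbContextFactory<MyDbContext> _factory;

        public ParallelLookups(IDbContextFactory<MyDbContext> factory)
        {
            _factory = factory;
        }

        public async Task<(TableModel1, TableModel2)> GetBothAsync(int anId, string aProp)
        {
            // One context per concurrent query: the DbContext itself is not
            // thread-safe, but each instance leases its own pooled connection.
            using (var ctx1 = _factory.CreateDbContext())
            using (var ctx2 = _factory.CreateDbContext())
            {
                var task1 = ctx1.TableModel1.FirstOrDefaultAsync(x => x.SomeId == anId);
                var task2 = ctx2.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == aProp);
                await Task.WhenAll(task1, task2);
                return (task1.Result, task2.Result);
            }
        }
    }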
In ASP.NET Core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't keep you from transferring the context to multiple threads. In the end, only the programmer can prevent that.
If you're worried about the performance cost of creating new contexts all the time: don't be. Creating a context is a lightweight operation, because the underlying model (store model, conceptual model, and the mappings between them) is created once and then stored in the application domain. Also, a new context doesn't create a physical connection to the database: all ASP.NET database operations run through the connection pool, which manages a pool of physical connections.
If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads there has been a poor design decision in the past. Resist the temptation to postpone inevitable refactoring by work-arounds. The only work-around is to de-parallelize your code, so in the end it may even be slower than if you redesign your DI and code to adhere to context per thread.
It came to the point where the only way to resolve the debate was to run a performance/load test and get comparable, empirical, statistical evidence, so I could settle this once and for all.
Here is what I tested:
Cloud load test with VSTS at 200 users max for 4 minutes on a Standard Azure web app.
Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.
Results for Test #1: (results chart not reproduced here)
Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.
Results for Test #2: (results chart not reproduced here)
Conclusion:
For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.
The performance gain from parallel processing is, in my opinion, insignificant, and it does not justify abandoning Dependency Injection, which would create development overhead and maintenance debt, the potential for bugs if handled wrong, and a departure from Microsoft's official recommendations.
One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason for this, but I would much prefer no 500 errors to a 10 ms performance gain.
I have the following architecture (diagram not reproduced here):
Of course there are ports and adapters, and everything else you can imagine...
What do you suggest: how should I send a REST response with immediate consistency? Should I add another event bus and raise an event? (I guess the projection must signal something about the success.)
How should I handle errors in an event-based system like this? (The event bus is not strictly necessary; I can keep the coupling loose with an IoC container, but I don't think sending a callback through so many objects would be a good solution.)
It's not hard: instead of sending a command, you can call the command handler directly from the controller, or have a service method which handles the input and returns something. The important bit is that all of this is done synchronously, i.e. you wait until the handler finishes. The domain event handlers are unaffected; they can stay async. A sketch follows.
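A minimal sketch of that synchronous path, with hypothetical command/handler names (none of these come from your architecture):

    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Mvc;

    // Hypothetical command and handler, resolved from the IoC container.
    public class CreateOrderCommand { public string ProductId { get; set; } }
    public class CreateOrderHandler
    {
        public Task<bool> Handle(CreateOrderCommand cmd)
        {
            // ...write-model work; domain events raised here can still be handled async
            return Task.FromResult(true);
        }
    }

    [Route("orders")]
    public class OrdersController : Controller
    {
        private readonly CreateOrderHandler _handler;

        public OrdersController(CreateOrderHandler handler)
        {
            _handler = handler;
        }

        [HttpPost]
        public async Task<IActionResult> Create([FromBody] CreateOrderCommand command)
        {
            // Call the handler directly and wait for it; by the time we respond,
            // the write has completed, so the client gets an immediate, consistent answer.
            bool ok = await _handler.Handle(command);
            return ok ? (IActionResult)Ok() : BadRequest();
        }
    }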
If you don't want to go 'hybrid' and want to always use the same workflow (as described in your picture), things are more complicated: you need the client to poll frequently to check whether the operation has completed. I think the better way is to stay flexible, so that for some tasks you can use the 'old' ways. The domain events will still be generated and handled; that part does not change. You're only changing the way a 'command' is executed.
Also, it's worth mentioning that you shouldn't expect responses from event handlers; if it makes you feel better, use 'request-response' terminology instead of command-response.
By the way, you don't break CQRS this way: as long as your domain model isn't used to do queries, i.e. you have different models for writes and reads, it is still CQRS.
Immediate consistency, at what cost? Are you using DTC?
What if you later want more than one subscriber for a given event in the read model: how many transactions would be involved in a DTC transaction scope? For immediate consistency, your events need to be handled synchronously, so what is the benefit of this architecture?
You can have immediate consistency, and even immediate user notifications with client callbacks (SignalR), but IMHO you should change a few things in your architecture, starting with dropping the immediate-consistency requirement.
Why do you think you need it, by the way?
Consider a function: IsWalletValid(walletID). It returns true if the walletID exists in the database, and updates a 'last_accessed_time' field.
A task runs periodically to remove any wallets that have not been accessed for a set period of time.
Seems like an easy solution for what we want to do, but IsWalletValid() has a side effect because it writes to the database.
Should we add a separate 'UpdateLastAccessedTime(walletID)' function? Then every time we call IsWalletValid() we would also need to remember to call UpdateLastAccessedTime(walletID).
Do verifying that a wallet is valid and updating its last_accessed_time field need to be transactionally consistent (ACID)? You could use eventual consistency here:
The method IsWalletValid publishes a WalletAccessed event, and an event handler then updates last_accessed_time asynchronously.
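A minimal sketch of that split, assuming a hypothetical in-process event bus; all names are illustrative:

    using System;
    using System.Threading.Tasks;

    public class WalletAccessed
    {
        public Guid WalletId;
        public DateTime AccessedAtUtc;
    }

    public interface IEventBus { Task Publish<T>(T evt); }
    public interface IWalletRepository
    {
        Task<bool> Exists(Guid walletId);
        Task TouchLastAccessed(Guid walletId, DateTime whenUtc);
    }

    public class WalletService
    {
        private readonly IWalletRepository _wallets;
        private readonly IEventBus _bus;

        public WalletService(IWalletRepository wallets, IEventBus bus)
        {
            _wallets = wallets;
            _bus = bus;
        }

        public async Task<bool> IsWalletValid(Guid walletId)
        {
            bool exists = await _wallets.Exists(walletId); // pure read, no side effect
            if (exists)
            {
                // The write happens elsewhere, eventually: a subscriber calls
                // TouchLastAccessed when it handles this event.
                await _bus.Publish(new WalletAccessed { WalletId = walletId, AccessedAtUtc = DateTime.UtcNow });
            }
            return exists;
        }
    }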
If last_accessed_time is not used by domain logic to make decisions during write handling, it could just be a facet of the read-only projection. This seems like the same concern as other, more verbose read-audit concerns. Just because data is being written and maintained doesn't mean it necessarily needs to be part of the write model of the system. If you did want to implement this as part of the domain, and perhaps store it within the same event store, it could be considered a separate auditing context outside the boundary of the original aggregate being audited.