Entity Framework 6 Context Lifetime - entity-framework

I have been reading everywhere that the 'right' way to use a context is the following:
using (var Db = new MyDatabase())
{
    DoStuff(Db);
}
But I have found this to be painfully slow in some cases.
I have a function that is called a lot, from different threads, through web requests and is READ ONLY.
This is my test code:
var S = new Stopwatch();
S.Start();
Parallel.For(
    0,
    15000,
    I =>
    {
        var P = GetPostWebContent(I);
    });
S.Stop();
Console.WriteLine(S.ElapsedMilliseconds);
With the 'recommended' method the execution time is 303.6s for 15000 calls, 20.24ms per call.
Now, by keeping one context per thread, I get very different results, using the following code:
[ThreadStatic]
private static MyDatabase _Db;

if (_Db == null) _Db = new MyDatabase();
DoStuff(_Db);
And the execution time is 52.4s for 15000 calls, 3.5ms per call.
Which means that this solution is 5.8x faster!
I am a little bit shocked: I keep reading online that context creation is not a big deal, etc., but in my case it has a huge impact.
I am aware that with this implementation the context will keep growing in size, but I could implement a counter and re-create the context regularly.
Is there anything that would prevent this from being used? As I said above, all the operations are READ ONLY; I am not writing anything to the database.

It's just a recommendation. The context is not thread safe, so having one per thread is a good idea if performance is important to you.
I tend to pass contexts around methods that run on the same thread because I've found performance issues with creating contexts. It's worked on production code for me.
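As a rough sketch of the counter idea from the question (the helper name, the 1000-use threshold, and the configuration tweaks are illustrative assumptions, not a vetted pattern), a thread-static context could be recycled periodically and tuned for read-only work:
[ThreadStatic]
private static MyDatabase _Db;
[ThreadStatic]
private static int _UseCount;

private static MyDatabase GetDb()
{
    // Recycle the context after an arbitrary number of uses so its internal
    // state does not grow without bound on long-lived threads.
    if (_Db == null || ++_UseCount > 1000)
    {
        if (_Db != null) _Db.Dispose();
        _Db = new MyDatabase();
        _UseCount = 0;

        // Read-only workload: change tracking and proxy creation are not needed.
        _Db.Configuration.AutoDetectChangesEnabled = false;
        _Db.Configuration.ProxyCreationEnabled = false;
    }
    return _Db;
}
Queries can additionally be issued with .AsNoTracking() so the materialized entities are never attached to the context at all.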

Related

implications of using callr_function = NULL in targets package

I was wondering what happens when callr_function = NULL?
Is it just issues with things maybe being in the environment/side effects?
Mainly wondering because I was passing quite large spatio-temporal arrays (0.5 to 5 gigs) and callr serialization via saveRDS is quite slow.
The two things I was thinking about were forking callr and dropping in a different save function, or just using callr_function = NULL.
Ordinarily, targets runs the pipeline in a fresh new reproducible external R session. callr_function = NULL just says to run the pipeline in the current R session. I only recommend this for debugging because in serious use cases you could accidentally invalidate some targets based on changed data in your global environment. callr_function = NULL will probably not help solve issues with large memory. For that, I recommend selecting a more efficient storage format for your data, e.g. tar_target(..., format = "feather"). You could also try tar_option_set(memory = "transient", garbage_collection = TRUE) for better memory efficiency.
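A minimal _targets.R sketch of those suggestions (the target name and the data-building function are placeholders; format = "feather" requires the arrow package and data-frame-like output):
library(targets)

# Unload each target after use and collect garbage between targets.
tar_option_set(memory = "transient", garbage_collection = TRUE)

list(
  tar_target(
    big_array,                     # placeholder target name
    build_spatiotemporal_data(),   # placeholder function returning a data frame
    format = "feather"             # columnar storage via the arrow package
  )
)

# For interactive debugging only (runs the pipeline in the current session):
# tar_make(callr_function = NULL)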

Background task in reactive pipeline (Fire-and-forget)

I have a reactive pipeline to process incoming requests. For each request I need to call a business-relevant function (doSomeRelevantProcessing).
After that is done, I need to notify some external service about what happened. That part of the pipeline should not increase the overall response time.
Also, notifying this external system is not business critical: giving a quick response after the main part of the pipeline is finished is more important than making sure the notification is successful.
As far as I have learned, the only way to run something in the background without slowing down the overall process is to subscribe to it directly in the pipeline, thus achieving a fire-and-forget mentality.
Is there a good alternative to subscribing inside the flatMap?
I am a little worried about what might happen if notifying the external service takes longer than the original processing and a lot of requests are coming in at once. Could this lead to memory exhaustion or cause the overall process to block?
fun runPipeline(incoming: Mono<Request>) = incoming
    .flatMap { doSomeRelevantProcessing(it) } // this should not be delayed
    .flatMap { doBackgroundJob(it) }          // this can take a moment, but is not super critical

fun doSomeRelevantProcessing(request: Request) = Mono.just(request) // do some processing

fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    // this uses non-blocking HTTP (i.e. WebClient), so it can take a second or so
    notifyExternalService(notification).subscribeOn(Schedulers.boundedElastic()).subscribe()
    Mono.just(Unit)
}

fun notifyExternalService(notification: String) = Mono.just(Unit) // might take a while
I'm answering this assuming that you notify the external service using purely reactive mechanisms - i.e. you're not wrapping a blocking service. If you are then the answer would be different as you're bound by the size of your bounded elastic thread pool, which could quickly become overwhelmed if you have hundreds of requests a second incoming.
(Assuming you're using reactive mechanisms, then there's no need for .subscribeOn(Schedulers.boundedElastic()) as you give in your example, as that's not buying you anything - it's designed for wrapping legacy blocking services.)
Could this lead to a memory exhaustion
It's only a possibility in really extreme cases; the memory used by each individual request will be tiny. It's almost certainly not worth worrying about - if you start seeing memory issues here then you'll almost certainly be hit by other issues elsewhere.
That being said, I'd probably recommend adding .timeout(Duration.ofSeconds(5)) or similar before your inner subscribe call to make sure the requests are killed off after a while if they haven't completed for any reason - this will prevent them building up.
...or [can this cause] the overall process to block?
This one is easier - a short no, it can't.
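For illustration, here is roughly what that could look like in the doBackgroundJob from the question (the 5-second duration is arbitrary, and subscribeOn is dropped on the assumption that notifyExternalService is fully non-blocking):
import java.time.Duration

fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    notifyExternalService(notification)
        .timeout(Duration.ofSeconds(5))  // kill off notifications that hang around too long
        .onErrorResume { Mono.empty() }  // a failed notification is not business critical
        .subscribe()                     // fire-and-forget
    Mono.just(Unit)
}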

Axon Framework: Handle only events published by the same JVM instance?

Hi Axon Framework community,
I'd like to have your opinion on how to solve the following problem properly.
My Axon Test Setup
Two instances of the same Spring Boot application (using axon-spring-boot-starter 4.4 without Axon Server)
Every instance publishes the same events on a regular interval
Both instances are connected to the same EventSource (single SQL Server instance using JpaEventStorageEngine)
Every instance is configured to use TrackingEventProcessors
Every instance has the same event handlers registered
What I want to achieve
I'd like that events published by one instance are only handled by the very same instance
If instance1 publishes eventX then only instance1 should handle eventX
What I've tried so far
I can achieve the above scenario using SubscribingEventProcessor. Unfortunately this is not an option in my case, since we'd like to have the option to replay events for rebuilding / adding new query models.
I could assign the event handlers of every instance to different processing groups. Unfortunately this didn't work - maybe because every TrackingEventProcessor instance processes the same event stream? I'm not so sure about this, though.
I could implement a MessageHandlerInterceptor which only proceeds in case the event origin is the same instance. This is what I have implemented so far, and it works properly:
MessageHandlerInterceptor
class StackEventInterceptor(private val stackProperties: StackProperties) : MessageHandlerInterceptor<EventMessage<*>> {

    override fun handle(unitOfWork: UnitOfWork<out EventMessage<*>>?, interceptorChain: InterceptorChain?): Any? {
        val stackId = (unitOfWork?.message?.payload as SomeEvent).stackId
        if (stackId == stackProperties.id) {
            interceptorChain?.proceed()
        }
        return null
    }
}

@Configuration
class AxonConfiguration {

    @Autowired
    fun configure(eventProcessingConfigurer: EventProcessingConfigurer, stackProperties: StackProperties) {
        val processingGroup = "processing-group-stack-${stackProperties.id}"
        eventProcessingConfigurer.byDefaultAssignTo(processingGroup)
        eventProcessingConfigurer.registerHandlerInterceptor(processingGroup) { StackEventInterceptor(stackProperties) }
    }
}
Is there a better solution ?
I have the impression that my current solution is probably not the best one, since ideally I'd like that only the event handlers belonging to a certain instance are triggered by the TrackingEventProcessor instances.
How would you solve that ?
Interesting scenario you're having here, @thowimmer.
My first hunch would be to say "use the SubscribingEventProcessor instead".
However, you pointed out that that's not an option in your setup.
I'd argue it's very valuable for others who're in the same scenario to know why that's not an option. So, maybe you can elaborate on that (to be honest, I am curious about that too).
Now, on to your problem case: ensuring events are only handled within the same JVM.
Adding the origin to the events is definitely a step you can take, as this allows for a logical way to filter: "Does this event originate from my.origin()?" If not, you'd just ignore the event and be done with it, simple as that. There is another way to achieve this though, which I'll come to in a bit.
The place to filter is, however, what you're mostly looking for, I think. But first, I'd like to spell out why you need to filter in the first place. As you've noticed, the TrackingEventProcessor (TEP) streams events from a so-called StreamableMessageSource. The EventStore is an implementation of such a StreamableMessageSource. As you are storing all events in the same store, well, it'll just stream everything to your TEPs. As your events are part of a single event stream, you are required to filter them at some stage. Using a MessageHandlerInterceptor would work; you could even go and write a HandlerEnhancerDefinition, allowing you to add additional behaviour to your event handling functions. However you put it though, with the current setup, filtering needs to be done somewhere. The MessageHandlerInterceptor is arguably the simplest place to do this.
However, there is a different way of dealing with this. Why not segregate your Event Store into two distinct instances, one for each application? Apparently they do not have the need to read from one another, so why share the same Event Store at all? Without knowing further background of your domain, I'd guess you are essentially dealing with applications residing in distinct bounded contexts. Very shortly put, there is zero interest in sharing everything between both applications/contexts; you just share specific portions of your domain language, very consciously, with one another.
Note that support for multiple contexts, using a single communication hub in the middle, is exactly what Axon Server can achieve for you. I am not here to say you can't configure this yourself though; I have done so in the past. But leaving that work to somebody or something else, freeing you from the need to configure infrastructure, would be a massive timesaver.
Hope this helps to set the context of my thoughts on the matter a little, @thowimmer.
Sum-up:
Using the same EventStore for both instances is probably not an ideal setup if we want to use the capabilities of the TrackingEventProcessor.
Options to solve it:
Dedicated (not mirrored) DB instance for each application instance.
Using multiple contexts with Axon Server.
If we decide to solve the problem at the application level, filtering using a MessageHandlerInterceptor is the simplest solution.
Thanks @Steven for exchanging ideas.
EDIT:
Solution at the application level using a CorrelationDataProvider & MessageHandlerInterceptor, filtering out events that did not originate in the same process.
AxonConfiguration.kt
const val METADATA_KEY_PROCESS_ID = "pid"
const val PROCESSING_GROUP_PREFIX = "processing-group-pid"

@Configuration
class AxonConfiguration {

    @Bean
    fun processIdCorrelationDataProvider() = ProcessIdCorrelationDataProvider()

    @Autowired
    fun configureProcessIdEventHandlerInterceptor(eventProcessingConfigurer: EventProcessingConfigurer) {
        val processingGroup = "$PROCESSING_GROUP_PREFIX-${ApplicationPid()}"
        eventProcessingConfigurer.byDefaultAssignTo(processingGroup)
        eventProcessingConfigurer.registerHandlerInterceptor(processingGroup) { ProcessIdEventHandlerInterceptor() }
    }
}

class ProcessIdCorrelationDataProvider : CorrelationDataProvider {
    override fun correlationDataFor(message: Message<*>?): MutableMap<String, *> {
        return mutableMapOf(METADATA_KEY_PROCESS_ID to ApplicationPid().toString())
    }
}

class ProcessIdEventHandlerInterceptor : MessageHandlerInterceptor<EventMessage<*>> {
    override fun handle(unitOfWork: UnitOfWork<out EventMessage<*>>?, interceptorChain: InterceptorChain?) {
        val currentPid = ApplicationPid().toString()
        val originPid = unitOfWork?.message?.metaData?.get(METADATA_KEY_PROCESS_ID)
        if (currentPid == originPid) {
            interceptorChain?.proceed()
        }
    }
}
See full demo project on GitHub

play - how to wrap a blocking code with futures

I am trying to understand the difference between the 2 methods, in terms of functionality.
class MyService(blockService: BlockService) {

  def doSomething1(): Future[Boolean] = {
    // do
    // some non blocking
    // stuff
    val result = blockService.block()
    Future.successful(result)
  }

  def doSomething2(): Future[Boolean] = {
    Future {
      // do
      // some non blocking
      // stuff
      blockService.block()
    }
  }
}
To my understanding the difference between the two is which thread will actually be blocked.
So if a thread thread_1 executes doSomething1, thread_1 will be the one that is blocked, while if thread_1 executes doSomething2, a new thread - thread_2 - will run it, and thread_2 is the one that is blocked.
Is this true?
If so, then there isn't really a preferred way to write this code? If I don't care which thread eventually gets blocked, the end result will be the same.
doSomething1 seems like a weird way to write this code; I would choose doSomething2.
Does that make sense?
Yes, doSomething1 and doSomething2 block different threads, but depending on your scenario, this is an important decision.
As @AndreasNeumann said, you can have different execution contexts in doSomething2. Imagine that the main execution context is the one receiving HTTP requests from your users. Blocking threads in this context is bad because you can easily exhaust the execution context and impact requests that have nothing to do with doSomething.
Play docs have a better explanation about the possible problems with having blocking code:
If you plan to write blocking IO code, or code that could potentially do a lot of CPU intensive work, you need to know exactly which thread pool is bearing that workload, and you need to tune it accordingly. Doing blocking IO without taking this into account is likely to result in very poor performance from Play framework, for example, you may see only a few requests per second being handled, while CPU usage sits at 5%. In comparison, benchmarks on typical development hardware (eg, a MacBook Pro) have shown Play to be able to handle workloads in the hundreds or even thousands of requests per second without a sweat when tuned correctly.
In your case, both methods are being executed using Play's default thread pool. I suggest you take a look at the recommended best practices and see if you need a different execution context or not. I also suggest you read the Akka docs about Dispatchers and Futures to gain a better understanding of what executing Futures involves and how to handle blocking/non-blocking code.
This approach makes sense if you make use of different execution contexts in the second method.
So, for example, having one for answering requests and another for blocking operations.
You would use the normal Play execution context to keep your application running and answering requests, and run the blocking operation on a separate one:
def doSomething2(): Future[Boolean] = Future {
  blocking { blockService.block() }
}(mySpecialExecutionContextForBlockingOperations)
For a little more information: http://docs.scala-lang.org/overviews/core/futures.html#blocking
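As a sketch of how such a dedicated context could be wired up in Play (the dispatcher name my-blocking-dispatcher, the pool size and the injection style are assumptions, not details from the question):
// application.conf (illustrative dispatcher for blocking work):
//   my-blocking-dispatcher {
//     type = Dispatcher
//     executor = "thread-pool-executor"
//     thread-pool-executor { fixed-pool-size = 16 }
//   }

import javax.inject.Inject
import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future, blocking}

class MyService @Inject() (blockService: BlockService, system: ActorSystem) {

  // Dedicated pool for blocking calls; Play's default context stays free for requests.
  private val blockingEc: ExecutionContext =
    system.dispatchers.lookup("my-blocking-dispatcher")

  def doSomething2(): Future[Boolean] = Future {
    blocking { blockService.block() }
  }(blockingEc)
}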
You are correct. I don't see a point in doSomething1. It simply complicates the interface for the caller while not providing the benefits of an asynchronous API.
Does BlockService perform a blocking operation?
Normally, as @Andreas mentioned, using blocking so that the blocking operation is handled by another thread is the meaningful approach.

mvc-mini-profiler slows down Entity Framework

I've set up mvc-mini-profiler against my Entity Framework-powered MVC 3 site. Everything is duly configured: starting profiling in Application_Start, ending it in Application_End, and so on. The profiling part works just fine.
However, when I try to swap my data model object generation over to providing profilable versions, performance slows to a crawl. It's not every SQL query, but some queries take about 5x the entire page load time. (The very first page load after firing up IIS Express takes a bit longer, but this slowdown is sustained.)
Negligible time (~2ms tops) is spent querying, executing and "data reading" the SQL, while this line:
var person = dataContext.People.FirstOrDefault(p => p.PersonID == id);
...when wrapped in using(profiler.Step()) is recorded as taking 300-400 ms. I profiled with dotTrace, which confirmed that the time is actually spent in EF as usual (the profilable components do make very brief appearances), only it is taking much longer.
This leads me to believe that the connection or some of its constituent parts are missing sufficient data, making EF perform far worse.
This is what I'm using to make the context object (my edmx model's class is called DataContext):
var conn = ProfiledDbConnection.Get(
/* returns an SqlConnection */CreateConnection());
return CreateObjectContext<DataContext>(conn);
I originally used the ObjectContextUtils.CreateObjectContext method provided by mvc-mini-profiler. I dove into it and noticed that it sets a wildcard metadata workspace path string. Since I have the database layer isolated in one project, with several MVC sites as other projects using the code, those paths have changed and I'd rather be more specific. Also, I thought this was the cause of the performance issue. I duplicated the CreateObjectContext functionality into my own project to provide this, as such:
public static T CreateObjectContext<T>(DbConnection connection) where T : System.Data.Objects.ObjectContext {
    var workspace = new System.Data.Metadata.Edm.MetadataWorkspace(
        GetMetadataPathsString().Split('|'),
        // ^-- returns
        // "res://*/Redacted.csdl|res://*/Redacted.ssdl|res://*/Redacted.msl"
        new Assembly[] { typeof(T).Assembly });

    // The remainder of the method is copied straight from the original,
    // and I carried over a duplicate CtorCache too to make this work.
    var factory = DbProviderServices.GetProviderFactory(connection);
    var itemCollection = workspace.GetItemCollection(System.Data.Metadata.Edm.DataSpace.SSpace);
    itemCollection.GetType().GetField("_providerFactory", // <==== big fat ugly hack
        BindingFlags.NonPublic | BindingFlags.Instance).SetValue(itemCollection, factory);
    var ec = new System.Data.EntityClient.EntityConnection(workspace, connection);
    return CtorCache<T, System.Data.EntityClient.EntityConnection>.Ctor(ec);
}
...but it doesn't seem to make much of a difference. The problem still exists whether I use the above hacked version, which is more specific with metadata workspace paths, or the version provided by mvc-mini-profiler. I just thought I'd mention that I've tried this too.
Having exhausted all this, I'm at my wits' end. Once again: when I provide my data context as usual, no performance is lost; when I provide a "profilable" data context, performance plummets for certain queries (and I don't know what influences this either). What could mvc-mini-profiler be doing wrong? Am I still feeding it the wrong data?
I think this is the same problem as this person ran into.
I just resolved this issue today.
See: http://code.google.com/p/mvc-mini-profiler/issues/detail?id=43
It happened because some of our fancy hacks were not cached well enough. In particular:
var workspace = new System.Data.Metadata.Edm.MetadataWorkspace(
    new string[] { "res://*/" },
    new Assembly[] { typeof(T).Assembly });
is a very expensive call, so we need to cache the workspace.
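A minimal sketch of that kind of caching, mirroring the CreateObjectContext from the question (the dictionary field is illustrative and not the library's actual fix; the provider-factory reflection wiring is left out for brevity):
private static readonly System.Collections.Concurrent.ConcurrentDictionary<Type, System.Data.Metadata.Edm.MetadataWorkspace> _workspaceCache =
    new System.Collections.Concurrent.ConcurrentDictionary<Type, System.Data.Metadata.Edm.MetadataWorkspace>();

public static T CreateObjectContext<T>(DbConnection connection) where T : System.Data.Objects.ObjectContext {
    // Build the MetadataWorkspace once per context type and reuse it afterwards.
    var workspace = _workspaceCache.GetOrAdd(
        typeof(T),
        _ => new System.Data.Metadata.Edm.MetadataWorkspace(
            new string[] { "res://*/" },
            new Assembly[] { typeof(T).Assembly }));

    // The provider-factory reflection hack from the question's version would go here.
    var ec = new System.Data.EntityClient.EntityConnection(workspace, connection);
    return CtorCache<T, System.Data.EntityClient.EntityConnection>.Ctor(ec);
}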
Profiling, by definition, will affect the performance of the application being profiled. The profiler needs to insert its own method calls throughout the application, intercept low-level system calls, and record all that data someplace (meaning writes to disk). All of those tasks take up precious CPU cycles, memory, and disk access.