long running ADO.NET process throws error recommending `CoWaitForMultipleHandles` pumping wait primitives - ado.net

I am trying to encrypt a text field in an existing database using ADO.NET Command object that feeds parameters to a stored proc server-side. Code below.
NOTE: a command has already fetched the PK and the particular column that needs to be encrypted for all relevant rows in the table, and client-side that dataset is a System.Data.DataTable T. Inside a tight loop, the column's plainText is obtained, encrypted, and then the id and the encrypted values are provided as command parameters to MyUpdateCommand, the command that invokes a stored procedure server side, which updates the row with the encrypted value:
foreach (DataRow row in T.Rows)
{
int id = (int)row["id"];
string plainText = row["note"].ToString();
string encrypted = MyEncrypt(plainText, aeskey);
MyUpdateCommand.Parameters["#id"].Value = id;
MyUpdateCommand.Parameters["#encryptedanote"].Value = encrypted;
MyUpdateCommand.ExecuteNonQuery();
}
I'm doing this in batches of 100,000 records. After about 10,000 records, an exception is thrown (The CLR has been unable to transition from COM context […] for 60 seconds) which says the process is not pumping windows messages, which
... has a negative performance impact and may even lead to the
application becoming unresponsive or memory usage accumulating over
time. To avoid this problem, all single threaded apartment (STA)
threads should use pumping wait primitives (such as
CoWaitForMultipleHandles) and routinely pump messages during long
running operations.
How do I retrofit my loop with pumping wait primitives?

Related

Can QuickFixJ store only messages using JdbcStore

We are currently incorporating a FIX engine (using QuickFixJ) in our application. We will be the initiator and use trade capture reports to get informed on all trades happening on the platform.
When persisting the trade capture reports, we first buffer them for performance reasons and later insert them all at once on the database in a single transaction. We are using the JdbcStore to persist the sent FIX messages on the database as we cannot rely on the hard disk. However, we do not want the JdbcStore to persist the session information (target sequence number, sender sequence number etc) because this would open a new transaction for every single message we receive (which we want to avoid due to performance reasons). Instead, we manually save the last seen and sent sequence numbers.
I have not found a configuration of QuickFixJ which would allow this. If we create a JdbcStore using the JdbcStoreFactory, it expects a table on the database to store session information. Is there any way to configure QuickFixJ to only persist the sent messages, but not the session information?

Entity Framework Arithabort ON, but still query is slow

I have a simple query
var count = await _context.ExchangeRate.AsNoTracking().CountAsync(u => u.Currency == "GBP");
The table has only 3 Columns and 10 rows data.
When I tried to execute the query from Net 5 project it is taking around 2.3 seconds for the first time and 500ms (+- 100) for subsequent requests. When I hit the same request in SSMS it is returning in almost no time (45ms as seen in sql profiler).
I have implemented ARITHABORT ON in EF from here
When I see in SQL Profiler it is setting ARITHABORT ON but still the query takes the same time for the first request and subsequent requests.
How do I achieve speed same as SSMS query speed. I need the query to run really speed as my project has requirement to the return the response in 1 second (Need to make atleast 5 simple DB calls...if 1 call takes 500ms then it is crossing 1 second requirement)
Edit
Tried with even ADO.Net. The execution time took as seen in SQL Profiler is 40ms where as when it reached the code it is almost 400ms. So much difference
using (var conn = new SqlConnection(connectionString))
{
var sql = "select count(ExchangeRate) as cnt from ExchangeRate where Currency = 'GBP'";
SqlCommand cmd = new SqlCommand();
cmd.CommandText = "SET ARITHABORT ON; " + sql;
cmd.CommandType = CommandType.Text;
cmd.Connection = conn;
conn.Open();
var t1 = DateTime.Now;
var rd = cmd.ExecuteReader();
var t2 = DateTime.Now;
TimeSpan diff = t2 - t1;
Console.WriteLine((int)diff.TotalMilliseconds);
while (rd.Read())
{
Console.WriteLine(rd["cnt"].ToString());
}
conn.Close();
}
Your "first run" scenario is generally the one-off static initialization of the DbContext. This is where the DbContext works out its mappings for the first time and will occur when the first query is executed. The typical approach to avoid this occurring for a user is to have a simple "warm up" query that runs when the service starts up.. For instance after your service initializes, simply put something like the following:
// Warm up the DbContext
using (var context = new AppDbContext())
{
var hasUser = context.Users.Any();
}
This also serves as a quick start-up check that the database is reachable and responding. The query itself will do a very quick operation, but the DbContext will resolve its mappings at this time so any newly generated DbContext instances will respond without incurring that cost during a request.
As for raw performance, if it isn't a query that is expected to take a while and tie up a request, don't make it async. Asynchronous requests are not faster, they are actually a bit slower. Using async requests against the DbContext is about ensuring your web server / application thread is responsive while potentially expensive database operations are processing. If you want a response as quickly as possible, use a synchronous call.
Next, ensure that any fields you are filtering against, in this case Currency, are indexed. Having a field called Currency in your entity as a String rather than a CurrencyId FK (int) pointing to a Currency record is already an extra indexing expense as indexes on integers are smaller/faster than those on strings.
You also don't need to bother with AsNoTracking when using a Count query. AsNoTracking applies solely when you are returning entities (ToList/ToArray/Single/Firstetc.) to avoid having the DbContext holding onto a reference to the returned entity. When you use Count/Any or projection to return properties from entities using Select there is no entity returned to track.
Also consider network latency between where your application code is running and the database server. Are they the same machine or is there a network connection in play? How does this compare when you are performing a SSMS query? Using a profiler you can see what SQL EF is actually sending to the database. Everything else in terms of time is a cost of: Getting the request to the DB, Getting the resulting data back to the requester, parsing that response. (If in the case where you are returning entities, allocating, populating, checking against existing references, etc... In the case of counts etc. checking existing references)
Lastly, to ensure you are getting the peak performance, ensure that your DbContexts lifetimes are kept short. If a DbContext is kept open and has had a number of tracking queries run against in (Selecting entities without AsNoTracking) those tracked entity references accumulate and can have a negative performance impact on future queries, even if you use AsNoTracking as EF looks to check through it's tracked references for entities that might be applicable/related to your new queries. Many times I see developers assume DbContexts are "expensive" so they opt to instantiate them as little as possible to avoid those costs, only to end up making operations more expensive over time.
With all that considered, EF will never be as fast as raw SQL. It is an ORM designed to provide convenience to .Net applications when it comes to working with data. That convenience in working with entity classes rather than sanitizing and writing your own raw SQL every time comes with a cost.

Timeout exception when size of the input to child workflow is huge

16:37:21.945 [Workflow Executor taskList="PullFulfillmentsTaskList", domain="test-domain": 3] WARN com.uber.cadence.internal.common.Retryer - Retrying after failure
org.apache.thrift.transport.TTransportException: Request timeout after 1993ms
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:546)
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:519)
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.respondDecisionTaskCompleted(WorkflowServiceTChannel.java:962)
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$RespondDecisionTaskCompleted$11(WorkflowServiceTChannel.java:951)
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:569)
at com.uber.cadence.serviceclient.WorkflowServiceTChannel.RespondDecisionTaskCompleted(WorkflowServiceTChannel.java:949)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:301)
at com.uber.cadence.internal.common.Retryer.lambda$retry$0(Retryer.java:104)
at com.uber.cadence.internal.common.Retryer.retryWithResult(Retryer.java:122)
at com.uber.cadence.internal.common.Retryer.retry(Retryer.java:101)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:301)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:261)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:229)
at com.uber.cadence.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:71)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Our parent workflow code is basically like this (JSONObject is from org.json)
JSONObject[] array = restActivities.getArrayWithHugeJSONItems();
for(JSONObject hugeJSON: array) {
ChildWorkflow child = Workflow.newChildWorkflowStub(ChildWorkflow.class);
child.run(hugeJSON);
}
What we find out is that most of the time, the parent workflow worker fails to start the child workflow and throws the timeout exception above. It retries like crazy but never success and print the timeout exception over and over again. However sometimes we got very lucky and it works. And sometimes it fails even earlier at the activity worker, and it throws the same exception. We believe this is due to the size of the data is too big (about 5MB) and could not be sent within the timeout (judging from the log we guess it's set to 2s). If we call child.run with small fake data it 100% works.
The reason we use child workflow is we want to use Async.function to run them in parallel. So how can we solve this problem? Is there a thrift timeout config we should increase or somehow we can avoid passing huge data around?
Thank you in advance!
---Update after Maxim's answer---
Thank you. I read the example, but still have some questions for my use case. Let's say I got an array of 100 huge JSON objects in my RestActivitiesWorker, if I should not return the huge array to the workflow, I need to make 100 calls to the database to create 100 rows of records and put 100 ids in an array and pass that back to the workflow. Then the workflow create one child workflow per id. Each child workflow then calls another activity with the id to load the data from the DB. But that activity has to pass that huge JSON to the child workflow, is this OK? And for the RestActivitiesWorker making 100 inserts into the DB, what if it failed in the middle?
I guess it boils down to that our workflow is trying to work directly with huge JSON. We are trying to load huge JSON (5-30MB, not that huge) from an external system into our system. We break down the JSON a little bit, manipulate a few values, and use values from a few fields to do some different logic, and finally save it in our DB. How should we do this with Temporal?
Temporal/Cadence doesn't support passing large blobs as inputs and outputs as it uses a DB as underlying storage. So you want to change architecture of your application to avoid this.
The standard workarounds are:
Use external blob store to save large data and pass reference to it as parameters.
Cache data in a worker process or even on a host disk and route activities that operate on this data to that process or host. See fileprocessing sample for this approach.

Is querying MongoDB faster than Redis?

I have some data stored in a database (MongoDB) and in distributed cache redis.
While querying to the repository, I am using lazy loading approach which first finds the data in the cache if it's available, if not find it in the database and update the cache as well so that next time when the requirement comes it should be found in the cache.
Sample Model Used:
Person ( id, name, age, address (Reference))
Address (id, place)
PersonCacheModel extends Person with addressId.
I am not storing parent object with child object together in the cache that is why I've created personCacheModel with addressId and store this object in the cache and while getting the data personCacheModel converts to person and make a call to address repo to addressCache to fill the address details of the person object.
As far as I understand:
personRepository.findPersonByName(NAME + randomNumber);
Access Data from Cache = network time + cache access time + deserialize time
Access Data from database = network time + database query time + object mapping time
When I ran above approach for 1000 rows, accessing data from the database is faster than the accessing data from the cache. I believe cache access time must be smaller than the accessing MongoDB.
Please let me know if there's an issue with the approach or is this is the expected scenario.
to have a valid benchmark we need to consider hardware side and data processing side:
hardware - do we have same configuration, RAM, CPUs count, OS... etc
process - how data is transformed (on single thread, multi thread, per object, per request)
Performing a load test on your data set will give you an good overview of which process is faster in particular use case scenario.
It is hard to judge - what it should be as long as there mentioned above points will be know for us.
The other thing is to have more than one test scenario and have it stressed in let's say 10 sec time, minute , 5 an hour... so you can have digits that will tell you the truth.

Updating entities in ndb while paging with cursors

To make things short, I have to make a script in Second Life communicating with an AppEngine app updating records in an ndb database. Records extracted from the database are sent as a batch (a page) to the LSL script, which updates customers, then asks the web app to mark these customers as updated in the database.
To create the batch I use a query on a (integer) property update_ver==0 and use fetch_page() to produce a cursor to the next batch. This cursor is also sent as urlsafe()-encoded parameter to the LSL script.
To mark the customer as updated, the update_ver is set to some other value like 2, and the entity is updated via put_async(). Then the LSL script fetches the next batch thanks to the cursor sent earlier.
My rather simple question is: in the web app, since the query property update_ver no longer satisfies the filter, is my cursor still valid ? Or do I have to use another strategy ?
Stripping out irrelevant parts (including authentication), my code currently looks like this (Customer is the entity in my database).
class GetCustomers(webapp2.RequestHandler): # handler that sends batches to the update script in SL
def get(self):
cursor=self.request.get("next",default_value=None)
query=Customer.query(Customer.update_ver==0,ancestor=customerset_key(),projection=[Customer.customer_name,Customer.customer_key]).order(Customer._key)
if cursor:
results,cursor,more=query.fetch_page(batchsize,start_cursor=ndb.Cursor(urlsafe=cursor))
else:
results,cursor,more=query.fetch_page(batchsize)
if more:
self.response.write("more=1\n")
self.response.write("next={}\n".format(cursor.urlsafe()))
else:
self.response.write("more=0\n")
self.response.write("n={}\n".format(len(results)))
for c in results:
self.response.write("c={},{},{}\n".format(c.customer_key,c.customer_name,c.key.urlsafe()))
self.response.set_status(200)
The handler that updates Customer entities in the database is the following. The c= parameters are urlsafe()-encoded entity keys of the records to update and the nv= parameter is the new version number for their update_ver property.
class UpdateCustomer(webapp2.RequestHandler):
#ndb.toplevel # don't exit until all async operations are finished
def post(self):
updatever=self.request.get("nv")
customers=self.request.get_all("c")
for ckey in customers:
cust=ndb.Key(urlsafe=ckey).get()
cust.update_ver=nv # filter in the query used to produce the cursor was using this property!
cust.update_date=datetime.datetime.utcnow()
cust.put_async()
else:
self.response.set_status(403)
Will this work as expected ? Thanks for any help !
Your strategy will work and that's the whole point for using these cursors, because they are efficient and you can get the next batch as it was intended regardless of what happened with the previous one.
On a side note you could also optimise your UpdateCustomer and instead of retrieving/saving one by one you can do things in batches using for example the ndb.put_multi_async.