Perform multiple tasks on a database one at a time - Swift

I'm using SQLite.swift and I want to perform three different tasks that each add data to my database. Each task gets its data from an external source.
So what I want to do is:
Get data for the first task
Add it to the first table
When this is done, go on to the next task
Add it to the second table
When this is done, go on to the last task
Add it to the last table
Right now I only have it like this:
dataService.getPlaces()
dataService.getTaxes()
dataService.getPersons()
But the issue is that there are over 2000 places, 100 taxes and 2000 persons, so each task takes some time to complete, and the database gets locked when they try to run at the same time.
Does anyone have an idea how to run these tasks one at a time?

Use NSOperationQueue; there is an excellent video online from last year's WWDC.

SQLite, whatever Swift library you use, does not support concurrent writes: you won't be able to write places, taxes and persons in parallel.
This is the case even when you open multiple connections, which I guess you did, since you got locking errors.
What you can do is first load the data from the external sources into memory; this can be done in parallel. When all the data has been loaded, write it to the database in a single transaction (SQLite performs much better when you group writes in a transaction).
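To make the transaction point concrete, here is a minimal sketch of what the grouped write looks like at the SQL level. The table and column names are invented for illustration; with SQLite.swift you get the same effect by wrapping the inserts in db.transaction { ... }.
-- One explicit transaction around all the inserts, instead of one implicit
-- transaction per row; in SQLite this is dramatically faster.
BEGIN TRANSACTION;
INSERT INTO places  (id, name) VALUES (1, 'Stockholm');
INSERT INTO places  (id, name) VALUES (2, 'Gothenburg');
-- ... the remaining ~2000 places ...
INSERT INTO taxes   (id, rate) VALUES (1, 25.0);
-- ... the remaining taxes ...
INSERT INTO persons (id, name) VALUES (1, 'Anna');
-- ... the remaining ~2000 persons ...
COMMIT;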

Related

Architecture for handling a lot of SQL calls on the same tables during workflow execution

We have a project where we let users execute workflows based on a selection of steps.
Basically, each step is linked to an execution, and an execution can be linked to one or multiple executionData records (the data created or updated during that execution for that step, a blob in Postgres).
Today, we execute this through a queuing mechanism where executions are created in queues and workers perform the executions and create the next job in the queue.
But this architecture and our implementation make our Postgres database slow when multiple jobs are scheduled at the same time:
We are basically always creating and reading from the execution table (we create the execution to be scheduled, we read the execution when starting the job, we update the status when the job is finished)
We are basically always creating and reading from the executionData table (we add and update executionData during executions)
We have the following issues:
Our executionData table is growing very fast, and it's almost impossible to remove rows because there are constantly locks on the table. What could we do to avoid that? Is Postgres a good fit for that kind of data?
Our execution table is also growing very fast, and it impacts the overall execution, since to execute we need to create, read and update executions. Deleting rows is likewise almost impossible. What could we do to improve this? Use a historical table? Suggestions?
We also need to compute statistics on the total executions run and data saved; these queries hit the same tables, which slows the process down further.
We use RDS on AWS for our Postgres database.
Thanks for your insights!
Try going for a database architecture that fits this access pattern better. Your use case seems well suited to DynamoDB for your executions: you get roughly O(1) key-based access, and the blob can fit right into the record as long as you keep each item under DynamoDB's 400 KB item-size limit.

Postgres: Count all INSERT queries executed in the past 1 minute

I can currently count the active INSERT queries on the PostgreSQL server like this:
SELECT count(*) FROM pg_stat_activity where query like 'INSERT%'
But is there a way to count all INSERT queries executed on the server in a given period of time? E.g. in the past minute?
I have a bunch of tables into which I send a lot of inserts, and I would like to somehow aggregate how many rows I am inserting per minute. I could code a solution for this, but it would be much easier if I could extract this directly from the server.
Any stats like this over a given period would be very helpful: the average time a query takes to process, how much data goes through per minute, etc.
Note: I am using PostgreSQL 12
If not already done, install the pg_stat_statements extension and take snapshots of the pg_stat_statements view: the diff gives the number of queries executed between two snapshots.
Note: it doesn't save each individual query; rather, it parameterizes them and saves aggregated results.
See https://www.citusdata.com/blog/2019/02/08/the-most-useful-postgres-extension-pg-stat-statements/
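A rough sketch of the snapshot-diff approach, assuming pg_stat_statements is already listed in shared_preload_libraries; the snapshot table name is made up:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Snapshot the current INSERT counters (insert_snapshot is an illustrative name).
CREATE TABLE insert_snapshot AS
SELECT now() AS taken_at, userid, dbid, queryid, calls, rows AS row_count
FROM pg_stat_statements
WHERE query ILIKE 'INSERT%';

-- A minute later: statement executions and affected rows since the snapshot.
SELECT coalesce(sum(s.calls - coalesce(snap.calls, 0)), 0)     AS insert_statements,
       coalesce(sum(s.rows  - coalesce(snap.row_count, 0)), 0) AS rows_inserted
FROM pg_stat_statements AS s
LEFT JOIN insert_snapshot AS snap USING (userid, dbid, queryid)
WHERE s.query ILIKE 'INSERT%';
Refreshing the snapshot from a cron job (once a minute, say) turns this into a per-minute series; note the counters are cumulative until pg_stat_statements_reset() is called.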
I believe that you can use an audit trigger.
The audit creates a table that registers INSERT, UPDATE and DELETE actions, which you can adapt. Every time your database runs one of those commands, the audit table records the action, the table and the time of the action. It is then easy to do a COUNT() on the audit table with a WHERE clause restricted to the past minute.
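A minimal sketch of that idea in Postgres; the audit table, function and trigger names are invented, and you would create one trigger per table you want to count:
-- Audit table that records one row per audited statement (names are illustrative).
CREATE TABLE audit_log (
    table_name text        NOT NULL,
    action     text        NOT NULL,
    logged_at  timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_action() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, action) VALUES (TG_TABLE_NAME, TG_OP);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- One trigger per audited table (use EXECUTE PROCEDURE on Postgres versions before 11).
CREATE TRIGGER my_table_audit
AFTER INSERT ON my_table
FOR EACH ROW EXECUTE FUNCTION log_action();

-- INSERTs recorded in the past minute.
SELECT count(*)
FROM audit_log
WHERE action = 'INSERT'
  AND logged_at > now() - interval '1 minute';
Keep in mind that a per-row trigger adds overhead to every insert on the audited tables, so this is better suited to occasional diagnostics than to permanently hot tables.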
I couldn't come across anything solid, so I created a table where I log the number of insert transactions using a script that runs as a cron job. It was simple enough to implement, and instead of estimates I get the real values: I actually count all new rows inserted into the tables in a given interval.

Extract Active Directory into SQL database using VBScript

I have written a VBScript to extract data from Active Directory into a record set. I'm now wondering what the most efficient way is to transfer the data into a SQL database.
I'm torn between:
Writing it to an Excel file and then firing an SSIS package to import it, or...
Within the VBScript, iterating through the dataset in memory and submitting 3000+ INSERT commands to the SQL database
Would the latter option result in 3000+ round trips communicating with the database and therefore be the slower of the two options?
Sending inserts row by row is always the slowest option. This is what is known as Row By Agonizing Row, or RBAR. You should avoid that if possible and take advantage of set-based operations.
Your other option, writing to an intermediate file, is a good one; I agree with @Remou in the comments that you should probably pick CSV rather than Excel if you go that route.
I would propose a third option. You already have the design contained in your VBScript, and you should be able to convert it easily to a Script Component in SSIS. Create an SSIS package, add a Data Flow task, add a Script Component (as a data source {example here}) to the flow, write your fields out to the output buffer, and then add a SQL destination, saving yourself the step of writing to an intermediate file. This is also more secure, as your AD data is never on disk in plaintext during the process.
You don't mention how often this will run or if you have to run it within a certain time window, so it isn't clear that performance is even an issue here. "Slow" doesn't mean anything by itself: a process that runs for 30 minutes can be perfectly acceptable if the time window is one hour.
Just write the simplest, most maintainable code you can to get the job done and go from there. If it runs in an acceptable amount of time then you're done. If it doesn't, then at least you have a clean, functioning solution that you can profile and optimize.
If you already have it in a dataset, and it's SQL Server 2008+, create a user-defined table type and send the whole dataset in as an atomic unit.
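A hedged sketch of that idea in T-SQL; the type, procedure, table and column names are all invented for illustration:
-- User-defined table type matching the fields extracted from Active Directory.
CREATE TYPE dbo.AdUserRow AS TABLE (
    SamAccountName nvarchar(256) NOT NULL,
    DisplayName    nvarchar(256) NULL,
    Email          nvarchar(256) NULL
);
GO

-- Destination table for the imported accounts.
CREATE TABLE dbo.AdUsers (
    SamAccountName nvarchar(256) NOT NULL PRIMARY KEY,
    DisplayName    nvarchar(256) NULL,
    Email          nvarchar(256) NULL
);
GO

-- Procedure that receives the whole dataset and loads it in one set-based statement.
CREATE PROCEDURE dbo.ImportAdUsers
    @Users dbo.AdUserRow READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.AdUsers (SamAccountName, DisplayName, Email)
    SELECT SamAccountName, DisplayName, Email
    FROM @Users;
END
GO
A client that supports structured parameters (ADO.NET, for example) can then pass all the rows as a single table-valued parameter, so the server performs one insert instead of 3000+ round trips.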
And if you go the SSIS route, I have a post covering Active Directory as an SSIS Data Source

Lock table to handle concurrency in Entity Framework 3.5

We have a web service that accepts an XML file for any faults that occur on a vehicle. The web service then uses EF 3.5 to load these files into a hyper-normalized database. Typically an XML file is processed in 10-20 seconds. There are two concurrency scenarios that I need to handle:
Different vehicles sending XML files at the same time: This isn't a problem. EF's default optimistic concurrency ensures that I am able to store all these files in the same tables, since their data is mutually exclusive.
Same vehicle sending multiple files at the same time: This creates a problem, as my system tries to write the same or similar data to the database simultaneously. And this isn't rare.
We needed a solution for point 2.
To solve this I introduced a lock table. Basically, I insert a concatenation of the vehicle id and the fault timestamp (which is the same for the multiple files sent by a vehicle for the same fault) into this table when I start writing to the DB, and I delete the record once I am done. However, quite often both files try to insert this row into the database simultaneously. In such cases, one file succeeds while the other throws a duplicate key exception that propagates to the caller of the web service.
What's the best way to handle such scenarios? I wouldn't like to roll back anything in the DB, as there are many tables involved for a single file.
And what solution do you expect? Your current approach with a lock table is exactly what you need. If the exception is thrown because of a duplicate, you can either wait and retry later, or return a typed fault to the client and let it upload the file later. Both solutions are ugly, but that is what your application currently offers.
A better solution would be to replace the current web service with one where the web service call only adds a job to a queue; a background process then works through these jobs and ensures that two files for the same vehicle are never processed concurrently, as sketched below. This would also give you much better throughput control for peak situations. The disadvantage is that you must implement some notification that the file has been processed, because it will no longer happen online.
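A rough sketch of what that queue could look like in SQL Server (assuming that is the database behind EF 3.5 here); all names and values are invented, and the simplest way to guarantee per-vehicle ordering is a single background worker:
-- Queue table the web service inserts into; a background worker drains it.
CREATE TABLE dbo.FaultFileJobs (
    JobId     int IDENTITY(1,1) PRIMARY KEY,
    VehicleId int         NOT NULL,
    FaultTime datetime    NOT NULL,
    Payload   xml         NOT NULL,
    Status    varchar(20) NOT NULL DEFAULT 'Pending'
);
GO

-- Web service call: just enqueue the incoming file and return immediately.
INSERT INTO dbo.FaultFileJobs (VehicleId, FaultTime, Payload)
VALUES (42, '2012-05-01T10:15:00', '<faults>...</faults>');

-- Single background worker: claim the oldest pending job, process it, mark it done.
DECLARE @JobId int;

SELECT TOP (1) @JobId = JobId
FROM dbo.FaultFileJobs
WHERE Status = 'Pending'
ORDER BY JobId;

UPDATE dbo.FaultFileJobs SET Status = 'Processing' WHERE JobId = @JobId;
-- ... run the existing EF import for this file here ...
UPDATE dbo.FaultFileJobs SET Status = 'Done' WHERE JobId = @JobId;
With one worker, files are processed strictly one at a time, so two files for the same vehicle can never collide; if you later need more throughput, you can run several workers and add locking hints (READPAST/UPDLOCK) plus a per-vehicle check when claiming jobs.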

Update table instantly or “Bulk” Update in database later? And is it advisable?

I have a question regarding a semi-constant stream of updates to a database. In short, it concerns a checkout function on a web page; each time the checkout function is invoked, it performs five steps.
I want to try to optimize this function and have my eye on a step where I update a table each time the checkout is performed. I take the information retrieved from the shopping cart and then update the table in question.
I do have some indexes on the table; the gain from those is greater than the cost of maintaining them, so that is a cost I'm willing to take.
Now, my question is: could it, performance-wise, be better not to update the table instantly, but instead to collect every checkout item and save it somewhere (maybe in a file), and then at a specific time (or several times a day) take that file and update the table with the new information?
Then I started thinking about whether there is some sort of bulk update that could take a file, hashmap, array (or something else) and update the table from it.
And I’m using IBM DB2 version 9.7
Mestika
You would lose the ability to use transactions, or to recover from a failure partway through the steps, so I would avoid that approach. You could try prepared statements, or the batch updates offered by JDBC 2.0, where multiple statements are submitted to the DB as a single unit.