We have a web service that accepts an XML file for any fault that occurs on a vehicle. The web service then uses EF 3.5 to load these files into a highly normalized database. Typically an XML file is processed in 10-20 seconds. There are two concurrency scenarios that I need to handle:
Different vehicles sending XML files at the same time: This isn't a problem. EF's default optimistic concurrency ensures that I am able to store all these files in the same tables as their data is mutually exclusive.
Same vehicle sending multiple files at the same time: This creates a problem, as my system tries to write the same or similar data to the database simultaneously. And this isn't rare.
We needed a solution for scenario 2.
To solve this I introduced a lock table. Basically, I insert a concatenation of the vehicle id and the fault timestamp (which is the same for the multiple files a vehicle sends for the same fault) into this table when I start writing to the DB, and I delete the record once I am done. However, quite often both files try to insert this row into the database simultaneously. In such cases, one file succeeds while the other throws a duplicate key exception that propagates to the caller of the web service.
What's the best way to handle such scenarios? I would rather not roll back anything in the DB, as there are many tables involved for a single file.
And what solution do you expect? Your current approach with the lock table is exactly what you need. If the exception is thrown because of a duplicate key, you can either wait and try again later, or return a typed fault to the client and let them upload the file again later. Both solutions are ugly, but that is what your application currently offers.
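Either way, you first need to tell a duplicate-key failure apart from any other error. A minimal sketch, assuming EF 3.5 against SQL Server; the FaultLock entity, the generated AddToFaultLocks method and the context name are placeholders for whatever your model actually generates (SQL Server reports duplicate keys as error numbers 2627 and 2601):

using System;
using System.Data;              // UpdateException
using System.Data.SqlClient;    // SqlException

public static class FaultLockHelper
{
    // Returns true when we acquired the lock and may process the file,
    // false when another file for the same fault already holds it.
    public static bool TryAcquireLock(FaultEntities context, string vehicleId, DateTime faultTimestamp)
    {
        try
        {
            context.AddToFaultLocks(new FaultLock
            {
                LockKey = vehicleId + "_" + faultTimestamp.Ticks   // same key for duplicate files
            });
            context.SaveChanges();
            return true;
        }
        catch (UpdateException ex)
        {
            var sqlEx = ex.InnerException as SqlException;
            if (sqlEx != null && (sqlEx.Number == 2627 || sqlEx.Number == 2601))
                return false;   // duplicate: wait and retry, or return a typed fault to the caller
            throw;
        }
    }
}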
The better solution would be replacing the current web service with one where the web service call only adds a job to a queue, and a background process works through these jobs and ensures that two files for the same car are never processed concurrently. This would also give you much better throughput control for peak situations. The disadvantage is that you must implement some notification that the file has been processed, because processing is no longer synchronous.
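A rough sketch of such a per-vehicle queue, assuming the Task Parallel Library (.NET 4 or later) is available; FaultFileQueue and ProcessFile are invented names, and a production version would also need error handling and cleanup of finished chains:

using System.Collections.Generic;
using System.Threading.Tasks;

public class FaultFileQueue
{
    private readonly object _sync = new object();
    // One chain of tasks per vehicle; files for the same vehicle run one after another.
    private readonly Dictionary<string, Task> _chains = new Dictionary<string, Task>();

    // Called by the web service: queues the work and returns immediately.
    public void Enqueue(string vehicleId, string xml)
    {
        lock (_sync)
        {
            Task previous;
            Task next = _chains.TryGetValue(vehicleId, out previous)
                ? previous.ContinueWith(t => ProcessFile(vehicleId, xml))
                : Task.Factory.StartNew(() => ProcessFile(vehicleId, xml));
            _chains[vehicleId] = next;
        }
    }

    private void ProcessFile(string vehicleId, string xml)
    {
        // Existing EF import logic goes here. Because the call is no longer
        // synchronous, notify the client separately (status table, callback, ...).
    }
}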
I am coding an application that generates reports from a Druid database. My integration tests need to read data from that database.
My current approach involves creating synthetic data for each of the tests. However, I am unable to remove the created data from the database (be it by removing entries or completely dropping the schema). I tried this, but I am still getting data back after disabling the segment and firing the kill task.
I think that either I am completely wrong with my approach or there is a way to delete information from the database that I haven't been able to find.
You can do this using one of the two approaches below.
Approach 1:
Disable the segment (set used = 0 in the metadata store)
Fire a kill task for that segment (a sketch of doing this over HTTP follows this list)
Have the appropriate load and drop rules in place
Refer to http://druid.io/docs/latest/ingestion/tasks.html (look for destroying segments)
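For reference, a hedged sketch of driving approach 1 from test code over HTTP: disable the data source on the coordinator, then submit a kill task to the overlord. Host names, ports, the data source name and the interval are placeholders; check the linked docs for the exact task spec for your Druid version.

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class DruidTestCleanup
{
    static async Task Main()
    {
        using (var http = new HttpClient())
        {
            // 1. Disable the test data source (marks its segments as used = 0).
            await http.DeleteAsync(
                "http://coordinator-host:8081/druid/coordinator/v1/datasources/test_datasource");

            // 2. Fire a kill task so the unused segments are physically deleted.
            var killTask =
                "{ \"type\": \"kill\", \"dataSource\": \"test_datasource\", \"interval\": \"2000-01-01/2100-01-01\" }";
            await http.PostAsync(
                "http://overlord-host:8090/druid/indexer/v1/task",
                new StringContent(killTask, Encoding.UTF8, "application/json"));
        }
    }
}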
Approach 2 (prefer this for integration tests before setting up production):
Stop the coordinator node and delete all entries in the druid_segments table in the metadata store
Stop the historical node and delete everything inside the directory pointed to by druid.segmentCache.locations on the historical node
Start the coordinator and historical nodes again
Remember that this will delete everything from the Druid cluster.
In the end I worked around the issue by inserting data in Druid with ids specific to each unit test and querying for that.
Not very elegant since now one malicious test can (potentially) mess with the results of another test.
I'm using SQLite.swift and I want to perform three different tasks to add data to my database. Each task will get its data from an external source.
So what I want to do is:
Get data for the first task
Add it to the first table
When this is done, go on to the next task
Add it to the second table
When this is done, go on to the last task
Add it to the last table
Right now I only have it like this:
dataService.getPlaces()
dataService.getTaxes()
dataService.getPersons()
But the issue is that there are over 2,000 places, 100 taxes and 2,000 persons, so each task takes some time to complete, and the database gets locked when these tasks try to run at the same time.
Does anyone have any idea how to run these tasks one at a time?
Use NSOperationQueue; there is an excellent video online from last year's WWDC.
SQLite, whatever Swift library you use, does not support concurrent writes: you won't be able to write places, taxes and persons in parallel.
This is the case even when you open multiple connections, as I guess you did because you got locking errors.
What you can do is first load the data from the external sources into memory. This can be done in parallel. When all the data has been loaded, you can write it to the database in a single transaction (SQLite performs much better when you group writes in a transaction).
Can anyone please give me some direction regarding the various ways to synchronize the write and read databases?
What different technologies are out there, and how do you evaluate each in terms of reliability, performance, cost to implement, etc.?
Typically in CQRS, the write DB is used to store transitional data for long running processes (sagas). If you are synchronizing the read and write DB (I'm assuming you mean both ways), you might be doing something wrong.
For a long running process where a service expects multiple messages, it needs a way to temporarily store data before all the messages arrive. An example of this is customer registration where an approval from a manager, which takes a week to process, is required. The service needs a way to temporarily store the customer information before the approval arrives. This is where the write DB is used to store this piece of temporary data. Note that before the customer is approved, nothing is written to the read DB yet.
When the approval finally arrives, the service will take the customer information from the write DB, complete the registration process and write it to the read DB. At this point, the temporary customer information in the write DB has done its job and can be removed. Notice that there isn't any two-way syncing involved.
For a simpler process, such as changing a customer's first name, the change can be written to the read DB right away. Writing to the write DB is not required because there is no temporary data in this case.
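A minimal sketch of the registration flow described above, written as a message-handler style service; every type and store name here is invented:

using System;

public class RegisterCustomer { public Guid CustomerId; public string Details; }
public class ManagerApproved  { public Guid CustomerId; }

public interface IWriteStore   // temporary/transitional saga data
{
    void SavePending(Guid customerId, string details);
    string GetPending(Guid customerId);
    void RemovePending(Guid customerId);
}

public interface IReadStore    // denormalized query model
{
    void InsertCustomer(Guid customerId, string details);
}

public class RegistrationService
{
    private readonly IWriteStore _writeStore;
    private readonly IReadStore _readStore;

    public RegistrationService(IWriteStore writeStore, IReadStore readStore)
    {
        _writeStore = writeStore;
        _readStore = readStore;
    }

    // The customer details arrive first: park them in the write DB only.
    public void Handle(RegisterCustomer message)
    {
        _writeStore.SavePending(message.CustomerId, message.Details);
    }

    // The approval arrives (maybe a week later): complete the registration,
    // project it into the read DB and drop the temporary data.
    public void Handle(ManagerApproved message)
    {
        var details = _writeStore.GetPending(message.CustomerId);
        _readStore.InsertCustomer(message.CustomerId, details);
        _writeStore.RemovePending(message.CustomerId);
    }
}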
The query model need not be immediately consistent; it needs to be eventually consistent. The query model is also the view model, i.e. the tables are already joined as per the requirements of the user interface. So you can even use an in-memory cache, or something like Redis.
The command side consists of command objects which contain all the relevant information needed to update the database. These objects may be placed on a messaging queue. The command objects are processed by a command processor which transactionally updates the query cache and the write database. The write database can be an RDBMS, but, as is apparent, it should be write-optimized, like MongoDB.
You can update read database via a messaging system too.
Some good messaging systems for this purpose are RabbitMQ and 0MQ.
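A sketch of that flow: a command object comes off the queue, a command processor updates the write database and publishes an event, and a subscriber on the messaging system (RabbitMQ, 0MQ, ...) brings the read model up to date. All names here are invented:

using System;

public class ChangeCustomerName  { public Guid CustomerId; public string NewName; }   // command
public class CustomerNameChanged { public Guid CustomerId; public string NewName; }   // event

public interface IWriteDatabase { void UpdateName(Guid id, string name); }
public interface IReadDatabase  { void UpdateNameView(Guid id, string name); }
public interface IMessageBus    { void Publish(object message); }

public class CommandProcessor
{
    private readonly IWriteDatabase _writeDb;
    private readonly IMessageBus _bus;

    public CommandProcessor(IWriteDatabase writeDb, IMessageBus bus)
    {
        _writeDb = writeDb;
        _bus = bus;
    }

    public void Process(ChangeCustomerName cmd)
    {
        _writeDb.UpdateName(cmd.CustomerId, cmd.NewName);   // transactional write
        _bus.Publish(new CustomerNameChanged { CustomerId = cmd.CustomerId, NewName = cmd.NewName });
    }
}

// Subscriber on the messaging system keeps the read model eventually consistent.
public class CustomerNameViewUpdater
{
    private readonly IReadDatabase _readDb;
    public CustomerNameViewUpdater(IReadDatabase readDb) { _readDb = readDb; }
    public void Handle(CustomerNameChanged evt) { _readDb.UpdateNameView(evt.CustomerId, evt.NewName); }
}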
If you, like me, see the read store as the database that the query service uses (and it is denormalized), and the write DB as the database where the domain events are stored, then, if you need to sync them to a particular moment, what you can do is just replay the events that you have stored.
In case you want to be as up to date as possible, you need not restrict the replay by version.
If you are using CQRS, then you will probably have a repository that looks somewhat like this:
public interface IRepository<T> where T : AggregateRoot, new()
{
    void Save(AggregateRoot aggregate, int expectedVersion);
    T GetById(Guid id);
    T GetById(Guid id, int version);
}
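A hedged sketch of how such a repository could rebuild an aggregate by replaying stored events up to a given version, or all of them when you want to be as up to date as possible; IEventStore and the aggregate members used below are assumptions, loosely modelled on common event-sourcing sample code:

using System;
using System.Linq;

public class EventSourcedRepository<T> : IRepository<T> where T : AggregateRoot, new()
{
    private readonly IEventStore _eventStore;   // assumed: returns events ordered by version

    public EventSourcedRepository(IEventStore eventStore)
    {
        _eventStore = eventStore;
    }

    public void Save(AggregateRoot aggregate, int expectedVersion)
    {
        _eventStore.SaveEvents(aggregate.Id, aggregate.GetUncommittedChanges(), expectedVersion);
    }

    // No version restriction: as up to date as possible.
    public T GetById(Guid id)
    {
        return GetById(id, int.MaxValue);
    }

    // Replay only the events up to the requested version to sync to a particular moment.
    public T GetById(Guid id, int version)
    {
        var aggregate = new T();
        var history = _eventStore.GetEventsForAggregate(id).Where(e => e.Version <= version);
        aggregate.LoadFromHistory(history);
        return aggregate;
    }
}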
Hope this helps
Cheers
I want to build an application that utilizes the data from a server, and it needs to synchronize the data in the application with the data entered by other client applications.
So, there are some questions:
How do I design the database schema efficiently? Should the application replicate the same database schema as the server, or should it add some more fields and entities?
What are the strategies for synchronizing the data: on each application start, during some idle state of the application, or something else?
How do I handle conflicts between data entered by the user within the application and data entered by another client application?
Any response is welcomed.
Well, you've identified the main challenges in your original question. The real answer is that this has little to do with the iPhone - database replication is just really hard.
Here are some rules of thumb I can offer:
one-way replication of data is a million times easier than two-way replication, if you can get away with it.
replication is always easier if the database schema is identical on the client and the server.
to do two-way replication, you either need to store timestamps for each row on each end, or to store the complete contents of one end on the other end. (ie. the server needs to know the client's most recent status, or the client needs to know the server's most recent status).
to allow adding rows from disconnected clients, you need to identify your rows using a GUID (or hash, eg. SHA-1), not an autoincrement field. It's possible to keep new client-added rows as "identifierless" until you sync them with the server, but that way lies madness.
there is no actual good way to do conflict resolution. The imperfect options include last-writer-wins (last person who syncs a modified record gets their copy of the record inserted), three-way-merge (when someone sends a modified record, check which columns they have changed, and change only those columns, thus not overwriting any changes to other columns), split-into-two-records (if two people make changes to the same record, just make two records and assume someone will fix it eventually), and "ask the user" (which is technically the most sound, but requires a lot of UI work and users rarely understand what a conflict even is).
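To make the GUID-plus-timestamp idea concrete, here is a minimal last-writer-wins sketch; the type names are invented, and remember that last-writer-wins silently drops the losing change:

using System;
using System.Collections.Generic;

public class SyncRow
{
    public Guid Id;                // stable identity across client and server
    public DateTime ModifiedAt;    // set by whoever last changed the row
    public string Payload;
}

public static class Synchronizer
{
    // Merge incoming rows (from the client or the server) into the local copy.
    public static void Merge(IDictionary<Guid, SyncRow> local, IEnumerable<SyncRow> incoming)
    {
        foreach (var row in incoming)
        {
            SyncRow existing;
            if (!local.TryGetValue(row.Id, out existing) || row.ModifiedAt > existing.ModifiedAt)
                local[row.Id] = row;   // unknown row, or the incoming change is newer: take it
            // otherwise keep the local version: the incoming change loses
        }
    }
}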
I know this has been asked here, but my question is slightly different. When the DataSet was designed with the disconnected principle in mind, what feature was provided to handle unexpected termination of the application, say a power failure, a Windows hang, or a system exception leading to a restart? Say the user has entered some 100 rows and they are modified in the DataSet alone. Usually the DataSet is written back to the database when the application closes or at regular intervals.
In the old days, when programming with VB 6.0, all interaction took place directly with the database, so each successful transaction committed itself automatically. How can that be done using DataSets?
DataSets are never meant for direct access to the database; they are a disconnected model only. There is no intent that they be able to recover from machine failures.
If you want to work live against the database, you need to use DataReaders and issue DbCommands against the database directly for changes. This will of course increase the load on your database server, though.
You have to balance the two for most applications. If you know a user just entered vital data as a new row, execute an insert command to the database, and put a copy in your local cached DataSet. Then your local queries can run against the disconnected data, and inserts are stored immediately.
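A hedged sketch of that hybrid: insert the vital row with a live command so it survives a crash, then mirror it into the cached DataSet so local queries still see it. The connection string, table and column names are placeholders:

using System.Data;
using System.Data.SqlClient;

public static class WriteThrough
{
    public static void SaveVitalRow(DataSet cache, string connectionString, int id, string name)
    {
        // 1. Commit to the database immediately so the row survives a crash.
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "INSERT INTO Customers (Id, Name) VALUES (@id, @name)", connection))
        {
            command.Parameters.AddWithValue("@id", id);
            command.Parameters.AddWithValue("@name", name);
            connection.Open();
            command.ExecuteNonQuery();
        }

        // 2. Mirror it into the disconnected copy so local queries still see it.
        DataRow row = cache.Tables["Customers"].NewRow();
        row["Id"] = id;
        row["Name"] = name;
        cache.Tables["Customers"].Rows.Add(row);
        row.AcceptChanges();   // mark the cached row as already persisted
    }
}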
A DataSet can be serialized very easily, so you could implement your own regular backup to disk by using serialization of the DataSet to the filesystem. This will give you some protection, but you will have to write your own code to check for any data that your application may have saved to disk previously and so on...
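A minimal sketch of such a safety net, using a timer-driven backup; the file path and interval are arbitrary choices, and a real application must also synchronize access to the DataSet across threads:

using System.Data;
using System.IO;
using System.Timers;

public class DataSetBackup
{
    private readonly DataSet _data;
    private readonly string _path;
    private readonly Timer _timer;

    public DataSetBackup(DataSet data, string path)
    {
        _data = data;
        _path = path;
        _timer = new Timer(60000);   // back up once a minute
        _timer.Elapsed += (sender, args) => _data.WriteXml(_path, XmlWriteMode.DiffGram);
        _timer.Start();
    }

    // Call at startup, before talking to the real database, to recover unsaved changes.
    public void RestoreIfPresent()
    {
        if (File.Exists(_path))
            _data.ReadXml(_path, XmlReadMode.DiffGram);
    }
}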
You could also ignore DataSets and use SqlDataReaders and SqlCommands for the same sort of 'direct access to the database' you are describing.