What is the recommended approach for handling errors when using the Hive database in Flutter, and is it necessary to handle exceptions for unlikely scenarios, such as errors when reading data?
Any help would be appreciated.
Yes, you should definitely handle the errors. If you can, sideline the rows of bad data that are responsible for failing your workflow. There are cases where an entire workflow fails because of a single poison-pill record.
Also keep alerting on those failures so that you are aware of them.
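As a language-neutral illustration of sidelining bad rows, here is a minimal Python sketch (not Hive or Flutter code). The names `process_with_sideline`, `handle` and `on_failure` are hypothetical; the idea is only that a failing row is set aside and reported instead of aborting the whole run.

```python
from typing import Any, Callable, Iterable


def process_with_sideline(
    rows: Iterable[Any],
    handle: Callable[[Any], Any],
    on_failure: Callable[[Any, Exception], None],
) -> tuple[list[Any], list[tuple[Any, str]]]:
    """Apply `handle` to each row; sideline rows that raise instead of aborting."""
    results = []
    sidelined = []
    for row in rows:
        try:
            results.append(handle(row))
        except Exception as exc:  # broad on purpose: any bad row is sidelined
            sidelined.append((row, repr(exc)))
            on_failure(row, exc)  # alerting hook (log, metric, pager, ...)
    return results, sidelined


alerts = []
good, bad = process_with_sideline(
    ["1", "2", "oops", "4"],
    handle=int,
    on_failure=lambda row, exc: alerts.append(row),
)
```

Here the row `"oops"` ends up in `bad` and in `alerts`, while the other three rows are still processed.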
We would like to be able to read state inside a command use case.
We could get the state from the event store for a specific aggregate, but what about querying aggregates by field (not id) or performing more complicated queries that are not suited to the event store?
The approach we were thinking was to use our read model for those cases as well and not only for query use cases.
This might be inconsistent, so a solution could be to have the latest version of the aggregate stored in both write/read models, in order to be able to tell if the state is correct or stale.
Does this make sense? And if so, when we need to get state by id, should we use the event store or the read model?
If you want the absolute latest state of an event-sourced aggregate, you're going to have to read the latest snapshot (assuming that you are snapshotting) and then replay events since that snapshot from the event store. You can be aggressive about snapshotting (conceivably even saving a snapshot after every command), but you're giving away some write performance to make the read faster.
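A minimal Python sketch of "read the latest snapshot, then replay the events since it". The account-style aggregate and the event kinds `deposited` / `withdrawn` are hypothetical, chosen only to make the fold concrete; a real event store would return just the events after the snapshot version rather than the whole stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    version: int  # position in the aggregate's event stream
    kind: str     # hypothetical event kinds: "deposited" / "withdrawn"
    amount: int


@dataclass(frozen=True)
class Snapshot:
    version: int  # version of the last event folded into this snapshot
    balance: int


def apply(state: Snapshot, event: Event) -> Snapshot:
    """Fold one event into the state and advance the version."""
    delta = event.amount if event.kind == "deposited" else -event.amount
    return Snapshot(version=event.version, balance=state.balance + delta)


def load_current(snapshot: Snapshot, stream: list[Event]) -> Snapshot:
    """Replay only the events newer than the snapshot."""
    state = snapshot
    for event in stream:
        if event.version > state.version:
            state = apply(state, event)
    return state


stream = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 5),
]
snapshot = Snapshot(version=2, balance=70)  # taken after event 2
current = load_current(snapshot, stream)
```

Starting from an empty `Snapshot(version=0, balance=0)` gives the same result, only with more events to replay, which is the trade-off described above.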
Updating the read model directly is conceivably possible, though that level of coupling is something that should be considered very carefully. Note also that you will very likely need some sort of two-phase commit to ensure that the read model is only updated when the write model is updated and vice versa. I strongly suggest considering why you're using CQRS/ES in this project, because you are quite possibly undermining that reason by doing this sort of thing.
In general, if you need a query for processing a particular command, it's likely that query will generally be the same, i.e. you don't need free-form query support. In that case, you can often have a read model that's tuned for exactly that query and which only cares about events which could affect that query: often a fairly small subset of the events. The finer-grained the read model, the easier it is to keep in sync (if it ignores 99% of events, for instance, it can't really fall that far behind).
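To make the "read model tuned for exactly one query" idea concrete, here is a small Python sketch. The event types and the `EmailInUseProjection` class are hypothetical; the point is that the projection answers a single question and ignores every event that cannot affect the answer.

```python
class EmailInUseProjection:
    """Read model answering one question: is this email already registered?"""

    # Only these (hypothetical) event types can change the answer.
    HANDLED = {"UserRegistered", "UserEmailChanged", "UserDeleted"}

    def __init__(self) -> None:
        self._email_by_user: dict[str, str] = {}
        self.events_seen = 0
        self.events_applied = 0

    def handle(self, event: dict) -> None:
        self.events_seen += 1
        if event["type"] not in self.HANDLED:
            return  # the vast majority of events are skipped here
        self.events_applied += 1
        user_id = event["user_id"]
        if event["type"] == "UserDeleted":
            self._email_by_user.pop(user_id, None)
        else:
            self._email_by_user[user_id] = event["email"].lower()

    def is_taken(self, email: str) -> bool:
        return email.lower() in self._email_by_user.values()


projection = EmailInUseProjection()
for event in [
    {"type": "UserRegistered", "user_id": "u1", "email": "Ann@example.com"},
    {"type": "UserLoggedIn", "user_id": "u1"},
    {"type": "ProfileViewed", "user_id": "u1"},
    {"type": "UserEmailChanged", "user_id": "u1", "email": "ann@new.example"},
]:
    projection.handle(event)
```

A command handler for registration could call `projection.is_taken(...)` before emitting its event, accepting that the answer may be slightly stale.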
Needing to make complex queries as part of command processing could also be a sign that your aggregate boundaries aren't right and could do with a re-examination.
"Does this make sense"
Maybe. Let's start with
"This might be inconsistent"
Yup, they might be. So what?
We typically respond to a query by sending an unlocked copy of the answer. In other words, it's possible that the actual information in the write model will change after this response is dispatched but before the response arrives at its destination. The client will be looking at a copy of the answer taken from the past.
So we might reasonably ask how much better it is to get information no more than one minute old compared to information no more than five minutes old. If the difference in value is pennies, then you should probably deploy the five minute version. If the difference is millions of dollars, then you're in a good position to negotiate a real budget to solve the problem.
For processing a command in our own write model, that kind of inconsistency isn't usually acceptable or wise. But neither of the two common answers requires keeping the read and write models synchronized. The most common answer is to just work with the write model alone. The less common answer is to grab a snapshot out of a cache, and then apply any additional events to it to bring it up to date. The latter approach is "just" a performance optimization (first rule: don't).
The variation that trips everyone up is trying to process a command somewhere else, enforcing a consistency rule on our data here. Once again, you need a really clear picture of how valuable the consistency is to the business. If it's really important, that may be a signal that the information in question shouldn't be split into two different piles - you may be working with the wrong underlying data model.
Possibly useful references
Pat Helland, "Data on the Outside Versus Data on the Inside"
Udi Dahan, "Race Conditions Don't Exist"
Hi there, I'm looking for advice from someone who is good at IBM DB2 performance.
I have a situation in which many batch tasks are massively inserting rows in the same db2 table, at the same time.
This situation looks potentially bad. I don't think DB2 is able to resolve the many requests quickly enough, causing the concurrent tasks to take longer to end and even causing some of them to abend with a -904 or -911 SQLCODE.
What do you guys think? Should situations like these be avoided? Are there techniques that could improve the performance of the batch tasks, keeping them from abending or running too slowly?
Thanks.
Inserting should not be a big problem; ETL workloads (for example, with DataStage) do this all the time.
I suggest running:
ALTER TABLE <tabname> APPEND ON
This avoids the free space search - details can be found here
The information provided is not sufficient to determine the cause of the reported errors.
There are several things to consider; one is which indexes are on the table.
Append mode works well to relieve last-page contention. You could also see contention on the statement itself for the variation lock, and you could have issues with the transaction logs if they are not fast enough.
What are the table, indexes, and statement? With those, maybe we can come up with how to do it. What hardware are you using, and what I/O subsystem is used for the transaction logs and database tablespaces?
I've got a PostgreSQL DB with very normalized data, so a lot of requests spawn a lot of joins and my DB is slow. I want to denormalize data from PostgreSQL and store it in a NoSQL DB for read-only access. For that I must provide sync between PostgreSQL and NoSQL (a little latency is allowed). I want to consider different ways so I can choose the most suitable.
I can emit events from the models when there are changes and put them into a queue. After that, a worker can process the events and add the necessary data to NoSQL, but I've got a lot of poor-quality legacy code and I don't want to change it much. Alternatively, I can denormalize the data and keep it in PostgreSQL, but I don't know whether this is a suitable solution.
What solutions exist for such tasks?
I did some research on this topic and here are the results.
There are several ways to solve this task. I'll describe three general ways.
1) You can use signals (ORM signals, for example) in your app to get notifications about changes.
Put the change events on a queue: RabbitMQ if there are few changes, Kafka if there are many. It's a simple solution for uncomplicated, well-written apps.
If you have a complex architecture and a lot of legacy code, then you should choose this approach:
A general explanation of this approach is here
2) Use PostgreSQL logical decoding to get change events; it's a very powerful feature. I found two solutions that use this feature: (1) the bottledwater tool with Kafka, which works but is no longer developed; (2) Debezium, which works and has an active community.
3) Use PostgreSQL logical decoding to get change events and write your own tool to consume them.
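Whichever tool produces the change events, the consumer side looks roughly the same. The Python sketch below assumes a simplified event shape modelled on the Debezium envelope (`op` of `c`/`r`/`u`/`d` with `before` and `after` row images); the `id` column and the `apply_change` function are hypothetical, and the `store` dict stands in for the NoSQL database.

```python
def apply_change(store: dict, event: dict) -> None:
    """Apply one row-level change event to a key-value store (idempotent)."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        store[row["id"]] = row  # upsert, so replaying the event is harmless
    elif op == "d":  # delete
        store.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


store = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"}, "after": {"id": 1, "name": "Alice"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]
for event in events:
    apply_change(store, event)
```

Writing the handler as an upsert or delete by key means a worker can safely reprocess events after a crash, which matters when the queue delivers at least once.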
Background/Intent:
So I'm going to create an event tracker from scratch and have a couple of ideas on how to do this but I'm unsure of the best way to proceed with the database side of things. One thing I am interested in doing is allowing these events to be completely dynamic, but at the same time to allow for reporting on relational event counters.
For example, all countries broken down by operating system. The desired effect would be:
US - # of events
    iOS - # of events that occurred in US
    Android - # of events that occurred in US
CA - # of events
    iOS - # of events that occurred in CA
    Android - # of events that occurred in CA
etc.
My intent is to be able to accept these event names like so:
/?country=US&os=iOS&device=iPhone&color=blue&carrier=Sprint&city=orlando&state=FL&randomParam=123&randomParam2=456&randomParam3=789
Which means in order to do the relational counters for something like the above I would potentially be incrementing 100+ counters per request.
Assume there will be 10+ million of the above requests per day.
I want to keep things completely dynamic in terms of the event names being tracked and I also want to do it in such a manner that the lookups on the data remains super quick. As such I have been looking into using redis or mongodb for this.
Questions:
Is there a better way to do this than counters while keeping the fields dynamic?
Provided this was all in one document (structured like a tree), would using the $inc operator in mongodb to increment 100+ counters at the same time in one operation be viable and not slow? The upside here being I can retrieve all of the statistics for one 'campaign' quickly in a single query.
Would this be better suited to redis and to do a zincrby for all of the applicable counters for the event?
Thanks
Depending on how your key structure is laid out, I would recommend pipelining the ZINCRBY commands. You have an easy "commit" trigger: the request. If you iterate over your parameters and ZINCRBY each key, then send the execute command at the end of the request, it will be very fast. I've implemented a system like the one you describe as both a CGI and a Django app. I set up a key structure along the lines of this:
YYYY-MM-DD:HH:MM -> sorted set
And I was able to process something like 150,000-200,000 increments per second on the Redis side with a single process, which should be plenty for your described scenario. This key structure allows me to grab data based on windows of time. I also added an expire to the keys to avoid writing a DB cleanup process. I then had a cron job that would do set operations to "roll up" stats into hourly, daily, and weekly buckets using variants of the aforementioned key pattern. I bring these ideas up as they are ways you can take advantage of the built-in capabilities of Redis to make the reporting side simpler. There are other ways of doing it, but this pattern seems to work well.
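A rough Python sketch of the pattern described above: one sorted set per minute, one ZINCRBY per request parameter, all sent in a single pipelined round trip, plus an expiry on the key. The method names follow the redis-py client (`pipeline`, `zincrby`, `expire`, `execute`); the `name=value` member format, the one-week TTL, and the `RecordingClient` stand-in (used so the sketch runs without a server) are assumptions for illustration.

```python
from datetime import datetime, timezone


def minute_key(ts: datetime) -> str:
    """Build the per-minute sorted-set key in the YYYY-MM-DD:HH:MM layout."""
    return ts.strftime("%Y-%m-%d:%H:%M")


def record_event(client, params: dict, ts: datetime, ttl_seconds: int = 7 * 86400) -> None:
    """Queue one ZINCRBY per parameter, then send them all in one round trip."""
    key = minute_key(ts)
    pipe = client.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC
    for name, value in params.items():
        pipe.zincrby(key, 1, f"{name}={value}")  # member format is an assumption
    pipe.expire(key, ttl_seconds)  # let Redis drop old buckets on its own
    pipe.execute()  # the "commit" at the end of the request


class RecordingClient:
    """Stand-in for redis.Redis so the sketch runs without a server."""

    def __init__(self) -> None:
        self.queued = []
        self.sent = []

    def pipeline(self, transaction: bool = True) -> "RecordingClient":
        return self

    def zincrby(self, name, amount, value) -> None:
        self.queued.append(("ZINCRBY", name, amount, value))

    def expire(self, name, time) -> None:
        self.queued.append(("EXPIRE", name, time))

    def execute(self) -> None:
        self.sent, self.queued = self.queued, []


client = RecordingClient()
record_event(
    client,
    {"country": "US", "os": "iOS"},
    datetime(2012, 3, 1, 14, 5, tzinfo=timezone.utc),
)
```

With a real server, passing a `redis.Redis()` instance instead of `RecordingClient()` should work unchanged, though that is untested here.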
As noted by eyossi, the global lock can be a real problem with systems that do concurrent writes and reads. If you are writing this as a real-time system, the concurrency may well be an issue. If it is an "end of day" log-parsing system, then it would not likely trigger the contention unless you run multiple instances of the parser or reports at the time of input. With regard to keeping reads fast in Redis, I would consider setting up a read-only Redis instance slaved off the main one. If you put it on the server running the report and point the reporting process at it, it should be very quick to generate the reports.
Depending on your available memory, data set size, and whether you store any other type of data in the Redis instance, you might consider running a 32-bit Redis server to keep the memory usage down. A 32-bit instance should be able to keep a lot of this type of data in a small chunk of memory, but if running the normal 64-bit Redis isn't taking too much memory, feel free to use it. As always, test your own usage patterns to validate.
In Redis you could use MULTI to increment multiple keys at the same time.
I had some bad experiences with MongoDB; I have found that it can be really tricky when you have a lot of writes to it...
You can look at this link for more info, and don't forget to read the part that says "MongoDB uses 1 BFGL (big f***ing global lock)" (which may already have been improved in version 2.x; I didn't check).
On the other hand, I had a good experience with Redis. I am using it for a lot of reads and writes and it works great.
You can find more information about how I am using Redis (to get a feeling for the amount of concurrent reads and writes) here: http://engineering.picscout.com/2011/11/redis-as-messaging-framework.html
I would rather use pipeline than multi if you don't need the atomic feature.
First, please excuse my relative inexperience with Hibernate. I've only really been using it in fairly standard cases, and certainly never in a scenario where I had to manage the primary keys (@Id) myself, which is where I believe my problem lies.
Outline: I’m bulk-loading facebook profile information through FB’s batch API's and need to mirror this information in a local database, all of which is fine, but runs into trouble when I attempt to do it in parallel.
Imagine a message queue processing batches of friend data in parallel and lots of the same shared Likes and References (between the friends), and that’s where my problem lies.
I run into repeated Hibernate ConstraintViolationExceptions, which are due to duplicate PK entries: one transaction tries to flush its session after determining an entity to be transient, when in fact another transaction has already made the same determination and beaten the first to committing, resulting in the below:
Duplicate entry '121528734903' for key 'PRIMARY'
And the ConstraintViolationException being raised.
I've just about managed to overcome this by removing all cascading from the parent entity and performing atomic writes, one record per transaction, essentially just catching any exceptions and ignoring them if they occur, since I'd know that another transaction had already done the job. But I'm not very happy with this solution and can't imagine it's the most efficient use of Hibernate.
I'd welcome any suggestions as to how I could improve the architecture…
Currently using: Hibernate 3.5.6 / Spring 3.1 / MySQL 5.1.30
Addendum: at the moment I'm using a Hibernate merge(), which initially checks for the existence of a row and will either merge (update) or insert depending on existence. The problem is that even with an isolation level of READ_UNCOMMITTED, sometimes the wrong determination is made, i.e. two transactions decide the same, and I've got an exception again.
Locking doesn't really help me either, optimistic or pessimistic, as the condition is only a problem in the initial insert case and there's no row to lock, making it very difficult to handle concurrency...
I must be missing something, but I've done the reading. My worry is that, not being able to leave Hibernate to manage PKs, I'm kinda scuppered, as it checks for existence too early in the session, and come time to synchronise, the session state is invalid.
Does anyone have any suggestions for me? Thanks.
Take this with a large grain of salt as I know very little about Hibernate, but it sounds like what you need to do is specify that the default MySQL INSERT statement is instead made an INSERT IGNORE statement. You might want to take a look at @SQLInsert in Hibernate; I believe that's where you would need to specify the exact insert statement that should be used. I'm sorry I can't help with the syntax, but I think you can probably find what you need by looking at the Hibernate documentation for @SQLInsert and, if necessary, the MySQL documentation for INSERT IGNORE.
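To show the behaviour that INSERT IGNORE relies on, here is a small Python sketch. It uses SQLite's `INSERT OR IGNORE` as a stand-in for MySQL's `INSERT IGNORE` so that it runs without a server; it is not Hibernate code, and the `likes` table and `insert_if_absent` helper are hypothetical. The database, rather than an earlier existence check, decides whether the row is new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_if_absent(conn: sqlite3.Connection, like_id: int, name: str) -> bool:
    """Insert the row unless the primary key already exists.

    Returns True if this call inserted the row, False if it was already there.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO likes (id, name) VALUES (?, ?)",
        (like_id, name),
    )
    conn.commit()
    return cur.rowcount == 1


first = insert_if_absent(conn, 121528734903, "Some page")
second = insert_if_absent(conn, 121528734903, "Some page")  # duplicate key, no error
row_count = conn.execute("SELECT COUNT(*) FROM likes").fetchone()[0]
```

The second call neither raises nor inserts, which is the effect the two racing transactions need. Note that MySQL's INSERT IGNORE also downgrades some other errors to warnings, so it is worth checking the MySQL documentation before relying on it.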