Mysterious hash collision issue in DynamoDB

I am pretty new to DynamoDB and am currently investigating an issue. A transaction file was validated as a normal transaction in a Lambda, yet DynamoDB flagged it as a duplicate transaction. It turned out it was not a duplicate and should have been processed normally along with the other transactions, but strangely it was not. I suspect there was a collision when the hashes (partition key) were compared, and that somehow the key was not considered unique enough, even though it is definitely unique in combination with the sort key. When an item really is a duplicate, the current setup throws a ConditionalCheckFailedException and the item is not recorded any further once it is filtered out in the DB. I would like to get a better understanding of how DynamoDB can run into this issue, and what good solutions would be to prevent the same incident. It has never had this issue before. If you have any ideas, I would love to hear them!
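For reference, this is roughly what such a conditional write looks like (a minimal sketch with the AWS SDK for Java v2; the table, attribute, and value names are made up). One detail worth knowing: DynamoDB evaluates the condition expression against the existing item that has the same complete primary key (partition key plus sort key), so a ConditionalCheckFailedException means an item with that exact key already existed, not that two different keys happened to hash alike.

    import java.util.HashMap;
    import java.util.Map;

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
    import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

    public class DedupPut {
        public static void main(String[] args) {
            try (DynamoDbClient dynamo = DynamoDbClient.create()) {
                // Hypothetical schema: txHash is the partition key, txSeq the sort key.
                Map<String, AttributeValue> item = new HashMap<>();
                item.put("txHash", AttributeValue.builder().s("9f2ca1b4").build());
                item.put("txSeq", AttributeValue.builder().s("2016-05-01T12:00:00Z").build());
                item.put("status", AttributeValue.builder().s("VALIDATED").build());

                try {
                    // The condition is checked against the item with the SAME
                    // partition + sort key; another item that merely shares the
                    // partition key cannot make it fail.
                    dynamo.putItem(PutItemRequest.builder()
                            .tableName("Transactions")
                            .item(item)
                            .conditionExpression("attribute_not_exists(txHash)")
                            .build());
                } catch (ConditionalCheckFailedException e) {
                    // Reaching here means an item with this exact key already exists.
                    System.err.println("genuine duplicate: " + e.getMessage());
                }
            }
        }
    }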

Related

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are viable tools to use in something like a ticketing system, where users may be attempting to read/write to the same collection/document, and whoever made the request first is handled first, the second request second, and so on.
If not, what would be a good structure for such a need with Firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is not a simple database structure that will impose an order where changes are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
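For instance, a minimal sketch with the Firestore Java client (the ticketRequests collection and createdAt field are made-up names): each request is written with a server timestamp, and a backend job later processes them in that order.

    import java.util.Map;

    import com.google.cloud.firestore.FieldValue;
    import com.google.cloud.firestore.Firestore;
    import com.google.cloud.firestore.FirestoreOptions;
    import com.google.cloud.firestore.Query;
    import com.google.cloud.firestore.QueryDocumentSnapshot;

    public class TicketBatch {
        public static void main(String[] args) throws Exception {
            Firestore db = FirestoreOptions.getDefaultInstance().getService();

            // Each incoming request records a server-side timestamp on write.
            db.collection("ticketRequests")
              .add(Map.of("user", "alice", "createdAt", FieldValue.serverTimestamp()))
              .get(); // block until the write completes

            // A periodic backend job processes requests in timestamp order.
            for (QueryDocumentSnapshot doc : db.collection("ticketRequests")
                    .orderBy("createdAt", Query.Direction.ASCENDING)
                    .get()  // returns an ApiFuture<QuerySnapshot>
                    .get()  // block for the snapshot
                    .getDocuments()) {
                System.out.println("processing request " + doc.getId());
            }
        }
    }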

Transactional guarantee in mongodb

So, I am doing research on MongoDB, in line with upper management's decision to embrace open source and to migrate our existing product database from SQL Server to MongoDB and revamp the entire thing. Do note that our database should focus on data consistency and transactional guarantees.
And I discovered this post: Click here. A summary of the post is as follows:
MongoDB claims to be strongly consistent, but a lot of evidence recently has shown this not to be the case in certain scenarios (when network partitioning occurs, which can happen under heavy load). This means that you can potentially lose records that MongoDB has acknowledged as "successfully written".
In terms of your application, if you have a need to have transactional guarantees (meaning if you can't make a durable write, you need the transaction to fail), you should avoid MongoDB. Example scenarios where strong consistency and durability are essential include "making a deposit into a bank account" or "creating a record of birth". To put it another way, these are scenarios where you would get punched in the face by your customer if you indicated an operation succeeded and it didn't.
So, my questions are as follows:
1) To what extent is "lost data" still an issue in the current version of MongoDB?
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I am pretty sure that if a company like PayPal does use MongoDB, there is certainly a way of overcoming these issues.
The references in that post have been discussed here before (for example, here is one: MongoDB: Does Write Concern guarantee the write on primary plus atleast one secondary ). No need to duplicate your question.
The blog "Aphyr" mostly uses these articles to tout it's own tech (if you read the entire blog you will realise they have their own database which they market). Every database they show loses data except their own.
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I agree that you should be handling database problems in client code; if not, how is your client side ever going to remain consistent in the event of partitions?
Since you are not Harry Potter (are you?), I will say that you need to check for exceptions thrown in your client code and react to them as needed.
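In other words, something along these lines (a minimal sketch with a recent MongoDB Java driver; the bank/deposits names are hypothetical):

    import com.mongodb.MongoException;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class HandleWriteFailure {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> deposits =
                        client.getDatabase("bank").getCollection("deposits");
                try {
                    deposits.insertOne(new Document("account", 42).append("amount", 100));
                } catch (MongoException e) {
                    // The write may or may not have applied server-side; don't
                    // swallow this. Retry idempotently, compensate, or fail loudly.
                    System.err.println("deposit not acknowledged: " + e.getMessage());
                }
            }
        }
    }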
1) To what extent is "lost data" still an issue in the current version of MongoDB?
As for the bug he mentions in 2.4.3: he fails (as I mentioned in the linked post) to give a bug reference, so again, no comment.
Besides, 2 lost writes in 6,000? That's less data loss than I have seen in MySQL on a partition! So not too shabby.
I have not noticed such behaviour in my own app and, from small to extremely large sites, I have not seen anyone reproduce benchmark-type scenarios like those displayed in that article. I doubt very much you will.
I am pretty sure that if a company like PayPal does use MongoDB, there is certainly a way of overcoming these issues.
They would have tight coding to ensure consistency in distributed environments.
Of course, they would start by choosing the right tech for the situation...
Write Concern Reference
Write concern describes the level of acknowledgement requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.
https://docs.mongodb.org/v3.0/reference/write-concern/
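Setting it from the Java driver could look like this (a minimal sketch; the connection string and database/collection names are hypothetical). MAJORITY plus journaling means insertOne does not return until a majority of replica set members have the write journaled; in practice you would also set a wtimeout so a stuck replica set surfaces as an error.

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class DurableWrite {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                // Require majority acknowledgement with the journal flushed
                // before the insert call returns.
                MongoCollection<Document> births = client
                        .getDatabase("registry")
                        .getCollection("births")
                        .withWriteConcern(WriteConcern.MAJORITY.withJournal(true));

                births.insertOne(new Document("name", "example").append("born", "2016-01-01"));
            }
        }
    }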

Hibernate High Concurrency and User-defined @Id's

Firstly, please excuse my relative inexperience with Hibernate; I've only really been using it in fairly standard cases, and certainly never in a scenario where I had to manage the primary keys (@Id) myself, which is where I believe my problem lies.
Outline: I'm bulk-loading Facebook profile information through FB's batch APIs and need to mirror this information in a local database. All of this is fine, but it runs into trouble when I attempt to do it in parallel.
Imagine a message queue processing batches of friend data in parallel, with lots of the same shared Likes and References (between the friends), and that's where my problem lies.
I run into repeated Hibernate ConstraintViolationExceptions due to duplicate PK entries: one transaction tries to flush its session after determining an entity to be transient, when in fact another transaction has already made the same determination and beaten the first to committing, resulting in the below:
Duplicate entry '121528734903' for key 'PRIMARY'
And the ConstraintViolationException is raised.
I've managed to just about overcome this by removing all cascading from the parent entity, performing atomic writes (one record per transaction), and essentially just catching any exceptions and ignoring them when they occur, since I'd know that another transaction had already done the job. But I'm not very happy with this solution and can't imagine it's the most efficient use of Hibernate.
I'd welcome any suggestions as to how I could improve the architecture…
Currently using : Hibernate 3.5.6 / Spring 3.1 / MySQL 5.1.30
Addendum: at the moment I'm using Hibernate's merge(), which initially checks for the existence of a row and will either merge (update) or insert depending on that existence. The problem is that even with an isolation level of READ_UNCOMMITTED, sometimes the wrong determination is made, i.e. two transactions decide the same thing, and I get an exception again.
Locking doesn't really help me either, optimistic or pessimistic, as the condition is only a problem in the initial insert case, and there's no row to lock, making it very difficult to handle the concurrency...
I must be missing something, but I've done the reading. My worry is that, not being able to leave Hibernate to manage the PKs, I'm kind of scuppered: it checks for existence too early in the session, and come time to synchronise, the session state is invalid.
Anyone with any suggestions for me? Thanks.
Take this with a large grain of salt as I know very little about Hibernate, but it sounds like what you need to do is specify that the default MySQL INSERT statement is instead made an INSERT IGNORE statement. You might want to take a look at @SQLInsert in Hibernate; I believe that's where you would need to specify the exact insert statement that should be used. I'm sorry I can't help with the syntax, but I think you can probably find what you need by looking at the Hibernate documentation for @SQLInsert and, if necessary, the MySQL documentation for INSERT IGNORE.
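A minimal sketch of what that could look like (the table and columns are hypothetical, and INSERT IGNORE is MySQL-specific; also note that Hibernate binds the parameters of custom SQL in its own expected column order, so enable SQL debug logging to confirm the placeholder order it wants):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    import org.hibernate.annotations.SQLInsert;

    @Entity
    @Table(name = "likes")
    // INSERT IGNORE silently skips a row whose PK already exists, so the
    // transaction that loses the race no longer fails with a duplicate-key error.
    @SQLInsert(sql = "INSERT IGNORE INTO likes (name, id) VALUES (?, ?)")
    public class Like {

        @Id
        private Long id;     // user-assigned Facebook ID, e.g. 121528734903L

        private String name;

        protected Like() { } // Hibernate needs a no-arg constructor

        public Like(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

The trade-off is that a silently ignored insert also hides genuine conflicts, so it fits best when rows for the same ID really are identical, as with the shared Likes here.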

In a SQLite database is it better to use triggers to handle cascading table changes, or is it better to do it programmatically?

Background
I have a couple of projects that use a SQLite DB for data. The data in these databases is, of course, spread across several tables, linked by key/foreign key values.
The thing is that in these databases, if something changes in one record, I have to update several other tables. The best example off the top of my head is deleting a record: I have to make sure all other records related to the one being deleted are deleted as well. Now, this example can be solved using key/foreign key values, I believe, but what about more complicated updates?
Now I'm no pro DB admin, but I know that there needs to be data integrity in the DB or things get ugly.
The Question
So, my question. I know that I have greater control when updating related tables programmatically, but at the cost of human error and time. I may miss something or not implement the table updates correctly, and it takes a lot longer to code the updates. On the other hand, I can put in triggers and let the DB handle the updates to other tables, but I then lose a lot of control.
So, which one is better? Is each better in different situations?
On the other hand, I can put in triggers and let the DB handle the updates to other tables, but I then lose a lot of control.
What control do you think you're losing? If data integrity requires that "such-and-such an update here requires additional updates there and there", you're not losing control by coding that in a trigger. You're centralizing control, and delegating it to the dbms, which is the only piece of software that can guarantee every application follows those requirements.
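As a concrete illustration (a minimal sketch using the sqlite-jdbc driver; the orders/order_items schema is invented), the rule lives in one place and holds for every application that opens the file:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CentralizedRule {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:demo.db");
                 Statement st = conn.createStatement()) {

                st.executeUpdate("CREATE TABLE IF NOT EXISTS orders ("
                        + "id INTEGER PRIMARY KEY, item_count INTEGER DEFAULT 0)");
                st.executeUpdate("CREATE TABLE IF NOT EXISTS order_items ("
                        + "id INTEGER PRIMARY KEY, order_id INTEGER)");

                // "Adding an item bumps the order's count" is stated once,
                // in the database, instead of in every client codebase.
                st.executeUpdate("CREATE TRIGGER IF NOT EXISTS items_after_insert "
                        + "AFTER INSERT ON order_items FOR EACH ROW "
                        + "BEGIN "
                        + "UPDATE orders SET item_count = item_count + 1 "
                        + "WHERE id = NEW.order_id; "
                        + "END");

                st.executeUpdate("INSERT INTO orders (id) VALUES (1)");
                st.executeUpdate("INSERT INTO order_items (order_id) VALUES (1)");
            }
        }
    }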
I know that I have greater control when updating related tables programmatically, but at the cost of human error and time. I may miss something or not implement the table updates correctly, and it takes a lot longer to code the updates.
You're thinking like a programmer, not a database designer. (That's an observation, not a criticism.) Don't think, "I might miss something". That way of thinking really misses the mark.
Instead, when you're tempted to delegate data integrity to application code, think "Every programmer and every new or changed application that hits this database from now until the end of time has to get it perfectly right."
Now, honestly, does that really sound like a good idea to you?
(The last Fortune 500 company I worked in had programs written in at least two dozen different languages hitting their OLTP database.)

Propagated delete in code or database?

I'm working on an iPhone application with a few data relationships (Author -> Books for example). When a user deletes an Author object from the application, I have a few SQLite triggers that run on the delete to remove any books from the database that have a foreign key matching the Author's primary key.
I'm also using a trigger to insert some data when a new item is created.
I can't shake the feeling that this might be bad design or lead to some problems down the road that I'm not thinking of. That said, should I rely on code in my app to handle propagating deletes like this when the database has the capability built in to handle it?
What say you?
True. Use the built-in capabilities of the database as much as possible. At least try to start off like that, and only compromise when things really demand it.
I would make use of the database's features to ensure relational integrity, especially with respect to updates/deletes. There are cases where I might use a trigger to insert some additional data (auditing comes to mind), though I would tend to avoid this and insert all of the data from my application. If you are doing multiple inserts, though, make sure to wrap them all in a single transaction so that you don't end up with a partial insert, which could lead to loss of relational integrity.
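For example (a minimal sketch via JDBC and the sqlite-jdbc driver, assuming an existing, invented authors/books schema), wrapping the related inserts in one transaction:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AtomicInserts {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db")) {
                conn.setAutoCommit(false); // begin the transaction
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate("INSERT INTO authors (id, name) VALUES (1, 'Jane')");
                    st.executeUpdate("INSERT INTO books (author_id, title) VALUES (1, 'First Book')");
                    conn.commit();         // both rows become visible, or neither
                } catch (Exception e) {
                    conn.rollback();       // undo any partial insert
                    throw e;
                }
            }
        }
    }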
I like the idea of using the database's built-in functionality (I am not familiar with how it works), but I would worry that if I went back to the code a year from now, I wouldn't remember how it worked (given the code isn't right in front of me).
I imagine if you add a lot of comments to remind yourself how it works now, then if anything goes wrong in the future, at least you won't need to relearn the database features when you need to do some debugging.
You're a few steps ahead of me: I recently learned about how to do that stuff with triggers and I am tempted to use them myself.
Based on the other answers here, it seems like a philosophical choice. It would probably be fine to use either triggers or code, but it's best to be consistent: don't use triggers for cascading deletes on one table but then C code for another table.
Since you tagged the question iphone, I think the most important difference would be relative performance of C code versus a trigger. You'd probably have to code both and experiment to determine the difference, if any.
Another thing that comes to mind is that, of all the horror stories that I read on thedailywtf.com, about half of them seem to involve database triggers.
Unfortunately, SQLite does NOT support ON DELETE CASCADE etc. From the SQLite documentation:
http://www.sqlite.org/omitted.html
FOREIGN KEY constraints are parsed but are not enforced. However, the equivalent constraint enforcement can be achieved using triggers. The SQLite source tree contains source code and documentation for a C program that will read an SQLite database, analyze the foreign key constraints, and generate appropriate triggers automatically.
There is some support for triggers but it is not complete. Missing subfeatures include FOR EACH STATEMENT triggers (currently all triggers must be FOR EACH ROW), INSTEAD OF triggers on tables (currently INSTEAD OF triggers are only allowed on views), and recursive triggers - triggers that trigger themselves.
Therefore, the only way to get ON DELETE CASCADE etc. in SQLite is with triggers.
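For example, an Author -> Books cascade implemented as a trigger (a minimal sketch via the sqlite-jdbc driver; the table names are invented):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CascadeDelete {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:library.db");
                 Statement st = conn.createStatement()) {

                st.executeUpdate("CREATE TABLE IF NOT EXISTS authors ("
                        + "id INTEGER PRIMARY KEY, name TEXT)");
                st.executeUpdate("CREATE TABLE IF NOT EXISTS books ("
                        + "id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)");

                // Emulates ON DELETE CASCADE: deleting an author deletes their books.
                st.executeUpdate("CREATE TRIGGER IF NOT EXISTS authors_cascade_delete "
                        + "AFTER DELETE ON authors FOR EACH ROW "
                        + "BEGIN "
                        + "DELETE FROM books WHERE author_id = OLD.id; "
                        + "END");

                st.executeUpdate("DELETE FROM authors WHERE id = 1"); // books follow
            }
        }
    }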
Kind regards,
Code goes in your app.
Triggers are code. The functionality goes in your app. Not in the database.
I think that databases should be used for data, not processing. I think apps should be used for processing, not data.
Database processing features merely muddy the water.