I'm investigating the scalibility of Sequalize in a production app, specifically the increment function, to see how well it can handle when a row could theortically be updated several times simultaneously (say, the totals row of something). My question is can the sequalize increment operator be trusted for these little addition operations that could be concurrent?
We're using Postgres on the backend, but I'm not familar with the internals of Postgres and how it would handle this type of scenerio (heroku postgres will be the production host, if it matters).
The Docs / The Code
The sql ran by Sequealize according to the code comments
SET column = column + X
It's hard to say without complete SQL examples, but I'd say this will likely serialize all transactions that call it on the same object.
If you update an object, the db takes a row update lock that's only released at commit/rollback time. Other updates/deletes block on this lock until the first tx commits or rolls back.
Related
I work with a software that is used by a lot of different clients in several countries, with different needs, rules and constraints on their data.
When I make a change to the database's structure, I have a tool to test it on every client's database, obviously with read-only rights. This means that the best way to test a query like UPDATE table SET x = y WHERE condition
is to call the "read-only part" SELECT x FROM table WHERE condition.
It works but it's not ideal, as sometimes it is writing data that causes problems (mostly deadlocks or timeouts), meaning I can't see the problem until a client suffers from it.
I'm wondering if there is a way to grant write permissions in Postgres, but only when inside a transaction, and force a rollback on every transaction. This way, changes could be tested accurately on real data and still prevent any dev from editing it.
Any ideas?
Edit: the volumes are too large to consider cloning data for every dev who needs to run a query
This sounds similar to creating an audit table to record information about transactions. I would consider using a trigger to write a copy of the data to a "rollback" table/row and then copy the "rollback" table/row back on completion of the update.
I am a new for Postgres. Currently I hit a situation, I need to retrieve a record from DB/table, then update this record with some changes, then retrieve the updated record again to return to customer, these 3 operations are executed sequentially.
After I run the above 3 steps, sometimes it seems the record has not been updated. But in fact the record has been updated, I suspect when I retrieve the record, Postgres return a cached data rather than fresh data. Actually all DB operations are correct, I guess just Postgres returns cached records.
I am wondering if there is any mechanism to flush my updated data immediately?
Another question is, I am wondering which one is a good practice:
Update a record (actually write to DB), then immediately retrieve the record from DB, both operations are using statement to operate.
Update a record (actually write to DB), then don't retrieve record from DB, because we know updated data, we just use these data to return to customer. However, the record might fail to write DB.
Any ideas for the above?
Some programming languages do db calls asynchronously, meaning that your code is moving on to the next db operation without waiting for the first to finish. So, it could be as simple as using your language's "await" keyword to make sure you are waiting for your db to finish the "update record" before trying to read it again.
If you are writing raw sql or you know it is not an issue with making several db call asynchronously, you could try writing your update and read calls as a single transaction.
See https://www.postgresql.org/docs/14/tutorial-transactions.html if you're unfamiliar with writing transactions.
I want to periodically export data from db2 and load it in another database for analysis.
In order to do this, I would need to know which rows have been inserted/updated since the last time I've exported things from a given table.
A simple solution would probably be to add a timestamp to every table and use that as a reference, but I don't have such a TS at the moment, and I would like to avoid adding it if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called
ROW CHANGE TIMESTAMP
This is managed by Db2 and could be defined as HIDDEN so existing SELECT * FROM queries will not retrieve the new row which would cause extra costs.
Check out the Db2 CREATE TABLE documentation
This functionality was originally added for optimistic locking but can be used for such situations as well.
There is a similar concept for Db2 z/OS - you have to check that out as I have not tried this one.
Of cause there are other ways to solve it like Replication etc.
That is not possible if you do not have a timestamp column. With a timestamp, you can know which are new or modified rows.
You can also use the TimeTravel feature, in order to get the new values, but that implies a timestamp column.
Another option, is to put the tables in append mode, and then get the rows after a given one. However, this option is not sure after a reorg, and affects the performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs, with the db2ReadLog API, but that implies a development. Also, just appliying the archived logs into the new database is possible, however the database will remain in roll forward pending.
As title say, it is possible to issue a query on psql with a "begin", query, and "commit".
What I want to know is what happens if I don't use a "begin" command?
Some database engine will allow you to execute modifications (INSERT, UPDATE, DELETE) without an open transaction. It's basically assumed that you have an instant BEGIN / COMMIT around each of your instructions, which is a bad practice in case something goes wrong in a batch of many instructions.
You can still make a SELECT, but no INSERT, UPDATE, DELETE without a BEGIN to enforces the good practice. That way, if something goes wrong, a ROLLBACK is instantly executed, canceling all your modifications as if they never existed.
Using a transaction around a batch of various SELECT will guarantee that the data you get for each SELECT matches the same version of the database at the instant you open the transaction depending on your ISOLATION level.
Please read this for more information :
http://www.postgresql.org/docs/9.5/static/sql-start-transaction.html
and
http://www.postgresql.org/docs/9.5/static/tutorial-transactions.html
If you don't use BEGIN/COMMIT, it's the same as wrapping each individual query in a BEGIN/COMMIT block. You can use BEGIN/COMMIT to group multiple queries into a single transaction. A few reasons you might want to do so include
Updating multiple tables at the same time. For instance, usually when you delete a record you also want to delete other rows that reference it. If you do this in the same transaction, nothing will ever be able to reference a row that's already been deleted.
You want to be able to revert some changes if something goes wrong later. Suppose you're writing some user inputted data to multiple tables. At some point you realize that some of it isn't formatted properly. You probably wouldn't want to insert any of it, so you should wrap the entire operation in a transaction.
If you want to ensure the data you're updating hasn't been updated while you're writing to it. Suppose I'm adding $10 to a bank account from two separate connections. I want to add $20 in total - I don't want one of the UPDATEs to clobber the other.
Postgres gives you the first two of these by default. The last one would require a higher transaction isolation level, and makes your query run the risk of raising a serialization error. Transaction isolation levels are a fairly complicated topic, so if you want more info on them the best place to go is the documentation.
I have a question regarding a semi-constant update in a database. In short it is regarding a checkout function on a web page, which each time the checkout function is evoked it do five steps.
I want to try to optimize this function and have my eye on a step where I update a table each time the checkout is performed. I take the information retrieved from the shopping cart and then update the table in question.
I do have some indexes on the table, the gain from those are greater than leaving them so this is a cost I’m willing to take.
Now, my question is. Could it in some way regarding to performance be better to not update the table instantly but collect every checkout items and save them in some way (maybe in a file) and then at a specific time (or several times) at day take this file and then update the table with the new information.
Then I started thinking about if there was a possibility to use some sort of Bulk Update to take a file, hashmap, array (or?) and then update it.
And I’m using IBM DB2 version 9.7
Mestika
You will lose the ability to do transactions, or to recover from failure after a step midway, so I would avoid using this approach. You could try using prepared statements, or batch updates offered by JDBC 2.0 where multiple statements are submitted to the DB as a single unit.