Capture PostgreSQL traffic to replay it on another database - postgresql

To check whether a new version of the database (in staging) behaves the same way as (or better than) the production database, I would like to capture all requests executed on the production server and replay them on the staging database.
Is there a tool that does this job?
It would be useful to be able to compare execution times during the replay and highlight queries that ran slower.
Otherwise, I thought I would capture queries by setting log_min_duration_statement to 0 (so that every query is written to the PostgreSQL log file), then parse the file to extract the requests and replay them on the other server. Is there a better way to do it?
(The current database version is PostgreSQL 9.6, but I'm interested even if the answer only applies to a higher version, for next time.)
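A minimal sketch of the log-parsing idea from the question, assuming single-line entries of the form "LOG:  duration: 1.234 ms  statement: ..." as written when log_min_duration_statement is 0. The function names and the 1.5x slowdown factor are illustrative assumptions. Multi-line statements and prepared statements (which are logged differently) are not handled, and the text before "LOG:" depends on log_line_prefix, so it is ignored.

```python
import re
from collections import defaultdict

# Matches the tail of a line such as:
#   2024-01-01 12:00:00 UTC [123] LOG:  duration: 1.234 ms  statement: SELECT 1
DURATION_RE = re.compile(
    r"LOG:\s+duration: (?P<ms>[\d.]+) ms\s+statement: (?P<sql>.*)$"
)


def parse_log(lines):
    """Yield (sql, duration_ms) pairs from single-line log entries."""
    for line in lines:
        match = DURATION_RE.search(line.rstrip("\n"))
        if match:
            yield match.group("sql").strip(), float(match.group("ms"))


def slower_queries(prod_lines, staging_lines, factor=1.5):
    """Return (sql, prod_avg_ms, staging_avg_ms) for queries that got slower.

    A query is reported when its average staging duration exceeds
    factor times its average production duration.
    """
    def averages(lines):
        durations = defaultdict(list)
        for sql, ms in parse_log(lines):
            durations[sql].append(ms)
        return {sql: sum(v) / len(v) for sql, v in durations.items()}

    prod, staging = averages(prod_lines), averages(staging_lines)
    return sorted(
        (sql, prod[sql], staging[sql])
        for sql in prod.keys() & staging.keys()
        if staging[sql] > factor * prod[sql]
    )
```

The same parser could feed the replay step: run each captured statement against staging with logging enabled there too, then pass both log files to slower_queries.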

Related

PostgreSQL external transaction feature

I have a big web application and tests that make requests to the app running in a sandbox. After each test I used to reset the database using db migrate rollback && db migrate && db seed. But now that the number of tests has grown, this takes a long time. So I am looking for a feature that can wrap a set of database commands in a transaction and cancel that transaction when the test finishes, without modifying the app source code (or achieve this some other way). Maybe there are some Postgres database parameters or extensions?
I found another way.
I can make a dump once and then drop the database and restore the dump each time, which is much faster.
See this topic:
Truncating all tables in a Postgres database

How to cache a query response with Postgres?

I have a database that syncs completely every 2 hours. All data is dropped and populated from the main data source.
I have some queries coming from the client app that return the same response for the current 2-hour dataset. So if 100 clients run their apps, I have to run this query 100 times, once per client, even though the results don't differ.
How do I avoid running this query against my database every time, and instead keep its response somewhere and return that?
I think I could run this query after each sync, save the result to its own table, and then return it from there.
What are other options, probably provided by Postgres itself?
You should use something like Redis to store the result of your query in memory. It comes with many clients. You can invalidate the result of this query when it's time to.
There are other in-memory caches, such as memcached, that are easy to install and use.
Note that these are not specific to Postgres.
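To make the "store the result and invalidate it when it's time" idea concrete, here is a small in-process sketch. It is a stand-in for Redis or memcached rather than a client for either; the class name SyncCache, its methods, and the run_query callback are hypothetical. The 2-hour default mirrors the sync interval from the question.

```python
import time


class SyncCache:
    """Keep query results in memory until the next sync or until they expire."""

    def __init__(self, ttl_seconds=2 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query key -> (stored_at, result)

    def get(self, key, run_query):
        """Return the cached result for key, calling run_query() on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = run_query()  # only reached on a miss or after expiry
        self._entries[key] = (now, result)
        return result

    def invalidate(self):
        """Drop everything; intended to be called right after each sync."""
        self._entries.clear()
```

With this shape, 100 clients asking for the same key between two syncs trigger one real query; calling invalidate() from the sync job avoids serving data from the previous dataset.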

PostgreSQL blocking on too many inserts

I am working on a research platform that reads relevant Twitter feeds via the Twitter API and stores them in a PostgreSQL database for future analysis. The middleware is Perl, and the server is an HP ML310 with 8GB RAM running Debian Linux.
The problem is that the Twitter feed can be quite large (many entries per second), and I can't afford to wait for the insert before returning to wait for the next tweet. So what I've done is to use fork(), so that each tweet gets a new process to insert it into the database and the listener can return quickly to grab the next tweet. However, because each of these processes effectively opens a new connection to the PostgreSQL backend, the system never catches up with its Twitter feed.
I am open to using a connection pooling suggestion and/or to upgrading hardware if necessary to make this work, but would appreciate any advice. Is this likely RAM bound, or are there configuration or software approaches I can try to make the system sufficiently speedy?
If you open and close a new connection for each insert, that is going to hurt big time. You should use a connection pooler instead. Creating a new database connection is not a lightweight thing to do.
Doing a fork() for each insert is probably not such a good idea either. Can't you create one process that simply takes care of the inserts and listens on a socket, or scans a directory or something like that, and another process that signals the insert process (a classic producer/consumer pattern)? Or use some kind of message queue (I don't know Perl, so I can't say what kind of tools are available there).
When doing bulk inserts do them in a single transaction, sending the commit at the end. Do not commit each insert. Another option is to write the rows into a text file and then use COPY to insert them into the database (it doesn't get faster than that).
You can also tune the PostgreSQL server a bit. If you can afford to lose some transactions in case of a system crash, you might want to turn synchronous_commit off.
If you can rebuild the table from scratch anytime (e.g. by re-inserting the tweets), you might also want to make that table an "unlogged" table. It is faster to write to than a regular table, but if Postgres is not shut down cleanly, you lose all the data in the table.
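A rough sketch of the producer/consumer pattern with batched writes described above. It is in Python rather than the question's Perl and uses a thread with an in-memory queue as an assumed stand-in for a separate process or message queue. flush_batch is a hypothetical callback; in a real system it would insert the whole batch in one transaction (or via COPY) over a single long-lived connection.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer to flush and exit


def writer(q, flush_batch, batch_size=500, timeout=1.0):
    """Drain q and hand rows to flush_batch in groups of up to batch_size."""
    batch = []
    while True:
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            item = None  # quiet period: flush whatever has accumulated
        if item is not None and item is not _STOP:
            batch.append(item)
        done_waiting = item is None or item is _STOP
        if batch and (done_waiting or len(batch) >= batch_size):
            flush_batch(batch)  # one transaction per batch, not per row
            batch = []
        if item is _STOP:
            return


def run_demo(rows, batch_size=3):
    """Feed rows through the writer and return the batches it flushed."""
    q = queue.Queue()
    batches = []
    worker = threading.Thread(target=writer, args=(q, batches.append, batch_size))
    worker.start()
    for row in rows:  # the listener only enqueues and returns immediately
        q.put(row)
    q.put(_STOP)
    worker.join()
    return batches
```

The listener never waits on the database; it only pays the cost of q.put, while the single writer amortizes commit overhead across each batch.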
Use the COPY command.
One script reads Twitter and appends lines to a CSV file on disk.
Another script looks for the CSV file on disk, renames it, and runs a COPY command from the renamed file.
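The two-script handoff above might look roughly like this. The function names and the ".loading" suffix are assumptions for illustration, and the COPY step itself is left out. The sketch relies on the producer reopening the file for every append; a producer that keeps the file open across the rename would keep writing into the claimed file.

```python
import csv
import os


def append_rows(path, rows):
    """Producer side: append rows to the CSV spool file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


def claim_spool(path):
    """Loader side: rename the spool file so the producer starts a fresh one.

    Returns the claimed path, or None if there is nothing to load yet.
    The returned file is what would then be handed to COPY.
    """
    claimed = f"{path}.{os.getpid()}.loading"
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        return None
    return claimed
```

After a successful load the claimed file can be deleted or archived; if the load fails, it is still on disk and can be retried.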

Is it possible to fork a mysqldump of data?

I am restoring a MySQL database with Perl on a remote server with about 30 million records. It's taking more than 2 days, and looking at my network connections I am not fully utilizing my uplink bandwidth. I will need to do this at least once per week. Is there a way to fork a mysqldump (I'm using Perl) so that I can take full advantage of my bandwidth? (I don't mind if I'm choked off for a bit; I just need to get this done faster.)
Can't you upload the whole dump to the remote server and start the restore there?
A restore of a mysqldump is just the execution of a long series of commands that would restore your database from scratch. If the execution path for that is: 1) send command, 2) remote system executes command, 3) remote system replies that the command is complete, 4) send next command, then you are spending most of your time waiting on network latency.
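As a back-of-the-envelope illustration of that latency argument, the helper below multiplies the number of statements by the per-statement cost. The 5 ms round trip and 0.5 ms execution time are made-up figures, and treating each of the 30 million records as its own statement is an assumption (a dump that packs many rows into one INSERT would send far fewer statements).

```python
def restore_hours(statements, round_trip_ms, exec_ms):
    """Estimate wall-clock hours when each statement waits for one round trip."""
    return statements * (round_trip_ms + exec_ms) / 1000 / 3600


# 30 million statements at an assumed 5 ms round trip and 0.5 ms execution
# time each: almost all of the estimated time is spent waiting on the network.
hours = restore_hours(30_000_000, round_trip_ms=5, exec_ms=0.5)
print(f"{hours:.1f} hours")
```

Under these assumed numbers the estimate comes out close to two days, which is why uploading the dump and restoring it locally on the remote server removes most of the cost.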
I do know that most SQL hosts will allow you to upload a dump file specifically to avoid the kinds of restore time that you're talking about. The company that takes my money each month even has a web-based form that you can use to restore a database from a file that has been uploaded via sftp. Poke around your hosting service's documentation. They should have something similar. If nothing else (and you're comfortable on the command line) you can upload it directly to your account and do it from a shell there.
mk-parallel-dump and mk-parallel-restore are designed to do what you want, but in my testing mk-parallel-dump was actually slower than plain old mysqldump. Your mileage may vary.
(I would guess the biggest factor would be the number of spindles your data files reside on, which in my case, 1, was not especially conducive to parallelization.)
First caveat: mk-parallel-* writes a bunch of files, and figuring out when it's safe to start sending them (and when you're done receiving them) may be a little tricky. I believe that's left as an exercise for the reader, sorry.
Second caveat: mk-parallel-dump is specifically advertised as not being for backups. Because "At the time of this release there is a bug that prevents --lock-tables from working correctly," it's really only useful for databases that you know will not change, e.g., a slave that you can STOP SLAVE on with no repercussions, and then START SLAVE once mk-parallel-dump is done.
I think a better solution than parallelizing a dump may be this:
If you're doing your mysqldump on a weekly basis, you can just do it once (dumping with --single-transaction (which you should be doing anyway) and --master-data=n) and then start a slave that connects over an ssh tunnel to the remote master, so the slave is continually updated. The disadvantage is that if you want to clone a local copy (perhaps to make a backup) you will need enough disk to keep an extra copy around. The advantage is that a week's worth of (query-based) replication log is probably quite a bit smaller than resending the data, and also it arrives gradually so you don't clog your pipe.
How big is your database in total? What kind of tables are you using?
A big risk with backups using mysqldump has to do with table locking, and updates to tables during the backup process.
The mysqldump backup process basically works as follows:
For each table {
    Lock table as read-only
    Dump table to disk
    Unlock table
}
The danger is that if you run an INSERT/UPDATE/DELETE query that affects multiple tables while your backup is running, your backup may not capture the results of your query properly. This is a very real risk when your backup takes hours to complete and you're dealing with an active database. Imagine that your code runs a series of queries that update tables A, B, and C while the backup process currently has table B locked:
The update to A will not be captured, as this table was already backed up.
The update to B will not be captured, as the table is currently locked for writing.
The update to C will be captured, because the backup has not reached C yet.
This is an easy way to destroy referential integrity in your database.
Your backup process needs to be atomic, and transactional. If you can't shut down the entire database to writes during the backup process, you're risking disaster.
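The scenario above can be reproduced with a toy model. This is a simplified simulation of a table-by-table copy under the assumptions of this answer, not of mysqldump's actual implementation; the table contents and function names are invented for illustration.

```python
def dump_table_by_table(db, order, concurrent_write, write_after):
    """Copy tables one at a time, with a write landing partway through.

    concurrent_write(db) is applied right after the table named
    write_after has been copied, simulating application traffic
    during a long-running backup.
    """
    backup = {}
    for name in order:
        backup[name] = list(db[name])  # snapshot of this table only
        if name == write_after:
            concurrent_write(db)
    return backup


def add_order(db):
    """An application change touching two tables: parent row in A, child in C."""
    db["A"].append(1)  # e.g. a new order
    db["C"].append(1)  # e.g. a line item referencing that order


db = {"A": [], "B": [], "C": []}
backup = dump_table_by_table(db, ["A", "B", "C"], add_order, write_after="A")
orphans = [row for row in backup["C"] if row not in backup["A"]]
print(orphans)  # child rows in the backup whose parent row is missing
```

The live database is consistent throughout, but the backup holds the child row in C without its parent in A, which is the broken referential integrity described above.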
Also - there must be something wrong here. At a previous company, we were running nightly backups of a 450 GB MySQL DB (largest table had 150M rows), and it took less than 6 hours for the backup to complete.
Two thoughts:
Do you have a slave database? Run the backup from there - Stop replication (preventing RW risk), run the backup, restart replication.
Are your tables using InnoDB? Consider investing in InnoDB Hot Backup, which solves this problem, as the backup process leverages the journaling that is part of the InnoDB storage engine.

Sybase SQL Anywhere: check if synchronization is needed?

I have a Sybase SQL Anywhere 11.0.1 database that I am using to sync with an Oracle Consolidated Database.
I know that the SQL Anywhere database keeps track of all of the changes that are made to it so that it knows what to synchronize with the consolidated database. My question is whether or not there is a SQL command that will tell you if the database has changes to sync.
I have a mobile application and I want to show a little flag to the user anytime they have made changes to the handheld that need to be synced. I could just create another table to track that stuff myself but I would much rather just ping the database and ask it if it has changes that need to be synced.
There's nothing automatic to tell you that there is data to synchronize. In addition to Ben's suggestion, another idea would be to query the SYS.SYSSYNC table at the remote database to get an idea of whether there might be changes. The following statement returns a result set that shows a simple status of your last synchronization :
select ss.site_name, sp.publication_name, ss.log_sent, ss.progress
  from sys.syssync ss, sys.syspublication sp
 where ss.publication_id = sp.publication_id
   and ss.publication_id is not null
   and ss.site_name is not null
If progress < log_sent, then the status of the last synchronization is unknown. The last upload may or may not have been applied at the consolidated, because the upload was sent, but no response was received from the MobiLink server. In this case, suggesting a synch isn't a bad idea.
If progress = log_sent, then the last synch was successful. Knowing this, you could check the value of db_property('CurrentRedoPos'), which will return the current log offset of the remote database. If this value is significantly higher than the progress value, there have been many operations applied to the database since the last synchronization, so there's a good chance that there is data to synchronize. However, there are several reasons why even a large difference between progress and db_property('CurrentRedoPos') could result in no actual data needing synchronization:
The download from the ML Server is applied by dbmlsync after the progress value at the remote is updated by dbmlsync when the upload is confirmed by the ML Server. Operations applied in the download by dbmlsync are not synchronized back to the ML Server, so the entire offset range could just be the last download that was applied. This could be worked around by tracking the current log offset in the sp_hook_dbmlsync_end hook when the exit code value in the #hook_dict table value is zero. This would tell you the log offset of the database after the download was applied, and you could now compare the saved value with the current log offset.
All the operations in the transaction log could be operations on tables that are not synchronized.
All the operations in the transaction log could have been rolled back.
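Put together, the decision rule described above could be sketched as follows. The function name, the return labels, and the threshold parameter are assumptions for illustration; progress and log_sent are the values read from SYS.SYSSYNC, and current_redo_pos is the value of db_property('CurrentRedoPos'). As the caveats above explain, a "maybe-dirty" result is only a hint, not proof that there is data to upload.

```python
def sync_hint(progress, log_sent, current_redo_pos, threshold=0):
    """Classify the remote database's state from three log offsets.

    threshold is an assumed tuning knob: how far the log may grow past
    progress before the user is shown a "changes pending" flag.
    """
    if progress < log_sent:
        # Upload was sent but never confirmed: status unknown, suggest a sync.
        return "unknown"
    if current_redo_pos - progress > threshold:
        # The log has grown since the last confirmed sync.
        return "maybe-dirty"
    return "clean"
```

Saving the log offset in the sp_hook_dbmlsync_end hook, as suggested above, and passing that saved value instead of progress would keep the last download from counting toward the difference.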
My solution is not ideal. Tracking the changes to synchronized tables yourself is the best solution, but I thought I could offer an alternative that might be OK for your needs, with the advantage that you are not triggering an extra action on every operation performed on a synchronized table.
The mobile database doesn't keep track of when the last sync was; the MobiLink server keeps all of that information in the MobiLink tables of the consolidated database.
Since synchronization only transfers necessary information, you could simply initiate a sync. If there's nothing to sync, then very little data will be used by your application.
As a side note, SQL Anywhere has its own SO clone which is monitored by Sybase engineers. If anyone knows for sure, it'll be them.
As of SQL Anywhere 17, SAP PM maps to a local Sybase database that contains a TTRANSACTION_UPLOAD table, so to determine if a synchronization is necessary we simply query this table to see if it has any records that need to be sync'd to the HANA consolidation database.