How to execute Redshift queries in parallel - amazon-redshift

I need to simulate some basic load testing against my Redshift cluster and I need to execute around 20 SELECT queries in parallel.
Since stored procedures are not supported by Redshift, I would love to get some ideas on how I can accomplish this.

To run the SELECTs in parallel, install par_psql
https://github.com/gbb/par_psql
and then you can run parallel SQL commands against Redshift like this:
export PGPASSWORD=your_pw; par_psql -h your_redshift -p 5439 -U your_username -d mydb --file=myscript.sql

Also check the WLM Query Slot Count and Route Queries to Queues topics in the Redshift documentation; the workload management (WLM) configuration determines how many of these queries will actually run concurrently.
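If you would rather not install anything extra, a plain bash loop that backgrounds one psql process per query file gives a similar effect. This is only a minimal sketch, not what par_psql does internally: the queries/ directory, file names, and connection details are placeholders for your own.
# Launch one psql session per query file in the background, then wait for
# all of them to finish. Assumes PGPASSWORD is already exported as above.
for f in queries/q*.sql; do
    psql -h your_redshift -p 5439 -U your_username -d mydb -f "$f" &
done
wait   # blocks until every backgrounded query has completed
Keep in mind that the number of queries actually executing at once is still capped by the WLM settings mentioned above.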

Related

How to run a set of SQL statements sequentially?

I am running a set of SQL statements sequentially in the AWS Redshift Query editor.
sql-1
sql-2
sql-3
....
sql-N
However, the Redshift Query editor cannot run multiple SQL statements at once, so currently I am running the SQL statements one by one manually.
What is an alternative approach for me? It looks like I could use DBeaver.
Is there a more programmatic approach, e.g. just using a simple bash script?
If you have a Linux instance that can access the cluster, you can use the psql command-line tool. For example:
yum install postgresql
psql -h my-cluster.cjmul6ivnpa4.us-east-2.redshift.amazonaws.com \
-p 5439 \
-d my_db \
-f my_sql_script.sql
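If the statements live in separate files, a small bash loop over psql can run them strictly one after another. A minimal sketch, with placeholder file names standing in for your sql-1 ... sql-N; ON_ERROR_STOP makes psql exit non-zero on the first failing statement so the loop can stop there.
# Run each script in order; abort the sequence if any script fails.
for f in sql_1.sql sql_2.sql sql_3.sql; do
    psql -h my-cluster.cjmul6ivnpa4.us-east-2.redshift.amazonaws.com \
         -p 5439 -d my_db \
         -v ON_ERROR_STOP=1 -f "$f" || break
done
Alternatively, you can simply concatenate all the statements into one .sql file; psql -f executes every statement in the file sequentially.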
We recently announced a way to schedule queries: https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-redshift-supports-scheduling-sql-queries-by-integrating-with-amazon-eventbridge/
Even more recently, we published this blog post that walks you through all the steps using the Console or the CLI: https://aws.amazon.com/blogs/big-data/scheduling-sql-queries-on-your-amazon-redshift-data-warehouse/
Hope these links help.

Best way to make PostgreSQL backups

I have a site that uses PostgreSQL. All content that I provide on my site is created in a development environment (this is because it's web-crawler content). The only information created in the production environment is information about the users.
I need to find a good way to update the data stored in production. Should I restore only the tables updated in the development environment and let PostgreSQL update those records in production, or would it be better to back up the user information from production, insert it into development, and restore the whole database to production?
Thank you
You can use pg_dump to export just the non-user tables from the development environment and pg_restore to bring that into prod.
The -t switch lets you pick specific tables, and -Fc produces a custom-format archive that pg_restore can read:
pg_dump -Fc -d <database_name> -t <table_name> -f <dump_file>
https://www.postgresql.org/docs/current/static/app-pgdump.html
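For completeness, a minimal sketch of the full round trip described above, with hypothetical table and host names: dump only the content tables from development in custom format, then restore them into production, replacing the old copies.
# Table names (articles, crawled_pages) and hosts are placeholders.
pg_dump -Fc -d dev_db -t articles -t crawled_pages -f content_tables.dump
# --clean --if-exists drops the existing copies of those tables in prod
# before recreating them from the dump; the user tables are left untouched.
pg_restore --clean --if-exists -h prod_host -d prod_db content_tables.dump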
There are many tips around this subject here and here.
I'd suggest taking a look at those links before anything else.
If your data is discarded at each update, then a plain dump will be enough. You can pipe pg_dump output directly into psql connected to production to avoid the pg_restore step, something like below:
# Of course you must drop the tables to load them again,
# so it is reasonable to make a full backup before this.
pg_dump -Fp -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db
You might be asking yourself, "Why is he saying to drop my tables?"
Bulk loading data into a fresh table is faster than deleting the old data and inserting it again. A quote from the docs:
Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
PS¹: If you can't connect to both environments at the same time, then you need to dump to a file and restore it manually.
PS²: I don't recommend it, but you can append the --clean option to pg_dump to generate DROP statements automatically. Be extremely careful with this option to avoid dropping unexpected objects.
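For the case in PS¹ where no single machine can reach both environments, here is a minimal sketch of the manual two-step version of the same pipeline (host and table names are the same placeholders as above):
# Step 1: on a machine that can reach dev, dump everything except the user table.
pg_dump -Fp -U user -h host_to_dev -T user your_db > your_db.sql
# Step 2: copy your_db.sql to a machine that can reach production, then load it.
psql -U user -h host_to_production -f your_db.sql your_db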

Where does pg_dump do its compression?

Here is the command I am using
pg_dump -h localhost -p 54321 -U example_user --format custom
which dumps a database on a remote server that I have connected to with a port forward on port 54321.
I know that the custom format does some compression by default.
Does this compression happen on the database server, or does everything get sent across to my local machine, where the compression happens?
Compression is done on the client side, so everything gets sent to your computer. On the database side, pg_dump just executes ordinary queries to get the data.
PostgreSQL Documentation: 24.1. SQL Dump:
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one).
PostgreSQL Documentation - II. PostgreSQL Client Applications - pg_dump:
pg_dump internally executes SELECT statements. If you have problems running pg_dump, make sure you are able to select information from the database using, for example, psql.
If you need more information about the inner workings of pg_dump I would suggest asking it from PostgreSQL mailing list or looking at the source code.
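If the goal is to avoid sending uncompressed data over the network, one option is to run pg_dump on the database server itself (or close to it) and stream the already-compressed archive back, for example over ssh. A minimal sketch, assuming you have ssh access to the server and non-interactive database authentication there (e.g. a .pgpass file); host, user, and database names are placeholders.
# pg_dump runs on the server, so the custom-format compression happens there;
# only the compressed archive travels over the ssh connection.
ssh dbadmin@db.example.com "pg_dump -U example_user --format custom example_db" > example_db.dump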

How to migrate data to remote server with PostgreSQL?

How can I dump my database schema and data in such a way that the usernames, database names and the schema names of the dumped data matches these variables on the servers I deploy to?
My current process entails moving the data in two steps. First, I dump the schema of the database (pg_dump --schema-only -C -c) then I dump out the data with pg_dump --data-only -C and restore these on the remote server in tandem using the psql command. But there has to be a better way than this.
We use the following to replicate databases.
pg_basebackup -x -P -D /var/lib/pgsql/9.2/data -h OTHER_DB_IP_ADDR -U postgres
It requires the "master" server at OTHER_DB_IP_ADDR to be running the replication service, and pg_hba.conf must allow replication connections (a configuration sketch follows the references below). You do not have to run the "slave" service as a hot/warm standby in order to replicate. One downside of this method compared with a dump/restore is that the restore operation effectively vacuums, re-indexes, and resets EVERYTHING, while replication doesn't, so replication can use a bit more disk space if your database has been heavily edited. On the other hand, replication is MUCH faster (15 minutes vs 3 hours in our case) since indexes do not have to be rebuilt.
Some useful references:
http://opensourcedbms.com/dbms/setup-replication-with-postgres-9-2-on-centos-6redhat-el6fedora/
http://www.postgresql.org/docs/9.2/static/high-availability.html
http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/
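As mentioned above, here is a minimal sketch of the server-side configuration pg_basebackup needs on the "master". The values and the network range are placeholders, and the wal_level name shown is the 9.2-era one (newer versions use "replica"):
# postgresql.conf on the master
wal_level = hot_standby      # enough WAL detail for a base backup / standby
max_wal_senders = 3          # number of simultaneous replication connections
# pg_hba.conf on the master: allow replication connections from the client
host  replication  postgres  10.0.0.0/24  md5
Reload or restart the master after changing these so the settings take effect.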

Faster method for copying postgresql databases

I have 12 test databases on a test server. Every day I generate the dump from the main server and copy it to the test server. I populate the test databases using:
zcat ${dump_address} | psql $db_name
It takes 45 minutes for each database. Is it faster if I do that for just one database and then use:
CREATE DATABASE newdb WITH TEMPLATE olddb;
for the rest? Are there any other methods I could try?
It depends on how much data there is and how many indexes. Some ideas to speed things up include:
Make sure the dump does not use INSERT commands; they tend to be much slower than COPY.
If you have many indexes or constraints, use the custom or directory dump format and then call pg_restore -j $my_cpu_count. -j controls how many jobs create them concurrently, and index creation is often CPU-bound (see the sketch after this list).
Use faster disks.
Copy $PGDATA in its entirety (rsync or some fancy snapshots).
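As promised above, a minimal sketch of the dump/restore variant that parallelizes both sides; job counts, paths, and host names are placeholders. Note that pg_dump -j requires the directory format (-Fd) and PostgreSQL 9.3 or later, while pg_restore -j works with the custom or directory formats.
# Dump on the main server with 4 parallel workers into a directory archive.
pg_dump -Fd -j 4 -f /tmp/main_db.dump main_db
# Restore into one of the test databases with 4 parallel jobs
# (test_db_1 must already exist, as in the original workflow).
pg_restore -j 4 -h test-server -d test_db_1 /tmp/main_db.dump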