We have migrated some of our data from MS SQL Server to PostgreSQL and are using an r6g.large Aurora PostgreSQL RDS instance. The data was transferred to the PostgreSQL instance using DMS; the table is around 183 GB and holds around 1.5 billion records. Now we are trying to create a primary key on an Id column, but it fails with the error below:
ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp18536.30": No space left on device
CONTEXT: SQL statement "ALTER TABLE public.tbl_actions ADD CONSTRAINT tbl_actions_pkey PRIMARY KEY (action_id)"
PL/pgSQL function inline_code_block line 10 at SQL statement
SQL state: 53100
When we looked at the documentation, we found that index creation uses the temporary (local) storage of the instance, which is 32 GiB for r6g.large. For a table this large that storage is not sufficient, hence index creation fails with the above error.
Is there any workaround that avoids upgrading the instance type, perhaps by changing some values in the parameter group or option group?
To me, this looks like the storage has run out, not the RAM. You can check this on the Monitoring tab under the heading "Free Storage Space" for the RDS instance in the AWS Console.
Try this:
To increase storage for a DB instance
Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
In the navigation pane, choose Databases.
Choose the DB instance that you want to modify.
Choose Modify.
Enter a new value for Allocated storage. It must be greater than the current value.
More details here:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.Storage
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Troubleshooting.html#CHAP_Troubleshooting.Storage
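If it helps to confirm from SQL that it is temporary files from the index build (rather than the table data itself) eating the space, a quick check is the following sketch (nothing Aurora-specific is assumed):
-- Size of the data in the current database
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

-- Cumulative temporary-file usage since statistics were last reset
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();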
Related
How can I migrate my whole database, which is currently in AWS RDS Postgres, to AWS Redshift, and how can I keep the two databases in sync? Even if a single column is updated in RDS, it must get updated in Redshift as well.
I know we can achieve this with AWS Glue, but the above scenario is mandatory in my case. The migration task itself is easy; the CDC part is a bit challenging. I am also aware of the bookmark key, but my situation is a bit different: I do not have any sequential column in the tables. Every table does have an updated_at field, so this column is the only one I can use to check whether a record has already been processed (so that duplicate processing does not occur) and to make sure any newly inserted data is also replicated to Redshift.
So, could anyone help me do this, even with a PySpark script?
Thanks.
We have a very small database storing some relational data in an Amazon RDS instance. The PostgreSQL engine version is 12.7.
A number of Lambda functions in AWS, in the same region, access this instance to insert records; some join queries are used in the process. We use the psycopg2 Python library to interact with the DB. Since the data is very small, we use a t2.small instance with 20 GB storage and 1 CPU. In production, however, a t2.medium instance is used. Auto scaling has not been enabled.
Recently, we have started experiencing an issue with this database. After the Lambda functions run for a while, at some point they time out. This is because the database takes too long to return a response, or sometimes throws a DiskFull error as follows:
DiskFull
could not write to file "base/pgsql_tmp/pgsql_tmp1258.168": No space left on device
I have referred to this documentation to identify the cause: Troubleshoot RDS DiskFull error.
The following queries check the DB file size:
SELECT pg_size_pretty(pg_database_size('db_name'));
The response of this query is 35 MB.
SELECT pg_size_pretty(SUM(pg_relation_size(oid))) FROM pg_class;
The output of the above query is 33 MB.
As we can see, the DB file size is very small. However, on checking the size of the temporary files, we see the following:
SELECT datname, temp_files AS "Temporary files",temp_bytes AS "Size of temporary files" FROM pg_stat_database;
If we look at the size of the temporary files, it is roughly 18.69 GB, which is why the DB is throwing a DiskFull error.
Why is the PostgreSQL instance not deleting the temporary files after the queries have finished? Even after rebooting the instance, the temporary file size is the same (and rebooting is not a feasible solution anyway, as we want the DB to delete the temporary files on its own). Also, how do I avoid the DiskFull error, as I may want to run more Lambda functions that interact with the DB?
For additional information, I am including some RDS monitoring graphs for CPU Utilisation and Free Storage Space, taken while the DB slowed down:
From this, I am guessing that we probably need to enable auto scaling, as the CPU Utilisation hits 83.5%. I would highly appreciate it if someone shared some insights, helped resolve the DiskFull error, and identified why the temporary files are not deleted.
One of the join queries the lambda function runs on the database is:
SELECT DISTINCT
    scl1.*, scl2.date_to AS compiled_date_to
FROM logger_main_config_column_name_loading
JOIN column_name_loading
    ON column_name_loading.id = logger_main_config_column_name_loading.column_name_loading_id
JOIN sensor_config_column_name_loading
    ON sensor_config_column_name_loading.column_name_loading_id = column_name_loading.id
JOIN sensor_config_loading AS scl1
    ON scl1.id = sensor_config_column_name_loading.sensor_config_loading_id
INNER JOIN (
    SELECT id, hash, min(date_from) AS date_from, max(date_to) AS date_to
    FROM sensor_config_loading
    GROUP BY id, hash
) AS scl2
    ON scl1.id = scl2.id AND scl1.hash = scl2.hash AND scl1.date_from = scl2.date_from
WHERE logger_main_config_loading_id = %(logger_main_config_loading_id)s;
How can this query be optimized? Will running smaller queries in a loop be faster?
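For reference, the same query can be run under EXPLAIN (ANALYZE, BUFFERS) to see which step spills to disk (look for "Sort Method: external merge" or hash joins with more than one batch); the literal 42 below is just a placeholder for the bound parameter:
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT
    scl1.*, scl2.date_to AS compiled_date_to
FROM logger_main_config_column_name_loading
JOIN column_name_loading
    ON column_name_loading.id = logger_main_config_column_name_loading.column_name_loading_id
JOIN sensor_config_column_name_loading
    ON sensor_config_column_name_loading.column_name_loading_id = column_name_loading.id
JOIN sensor_config_loading AS scl1
    ON scl1.id = sensor_config_column_name_loading.sensor_config_loading_id
INNER JOIN (
    SELECT id, hash, min(date_from) AS date_from, max(date_to) AS date_to
    FROM sensor_config_loading
    GROUP BY id, hash
) AS scl2
    ON scl1.id = scl2.id AND scl1.hash = scl2.hash AND scl1.date_from = scl2.date_from
WHERE logger_main_config_loading_id = 42;  -- placeholder for the bound parameter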
pg_stat_database does not show the current size and number of temporary files; it shows cumulative historical data. So your database has had 145 temporary files since the statistics were last reset.
Temporary files get deleted as soon as the query is done, no matter if it succeeds or fails.
You get the error because some rogue queries write enough temporary files to fill the disk (perhaps some forgotten join conditions). To avoid the out-of-space condition, set the parameter temp_file_limit to a reasonable value (in postgresql.conf on a self-managed server; on RDS, in the DB parameter group) and reload PostgreSQL.
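For example (the value is illustrative; it can also be set per session or per role):
-- Cap how much temporary-file space a single session may use; a query that
-- exceeds the limit fails with an error instead of filling up the disk.
SET temp_file_limit = '5GB';

-- Check the effective value
SHOW temp_file_limit;

-- Optionally reset the cumulative counters so pg_stat_database starts fresh
SELECT pg_stat_reset();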
When trying to integrate a Heroku PostgreSQL DB with AWS PostgreSQL using Stitch Data (stitchdata.com), I get the following message without much explanation:
Inconsistent state for stream DBNAME-public-addresses with replication method: null, and bookmark: {}
What is causing this error, and how can I fix it?
I also had this problem. The cause is a misleading Stitch settings interface.
Every table you've selected to be synced from a Postgres DB by Stitch needs a "replication method". The options are "Full table" and "Key-based incremental". "Key-based incremental" means syncing only the rows whose replication-key value is greater than the highest value seen in the previous sync. Think created_at.
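Conceptually, a key-based incremental sync boils down to a query along these lines (just a sketch; the table, column, and bookmark value are made up, and Stitch's actual queries may differ):
-- Fetch only rows changed since the bookmark saved by the previous sync run
SELECT *
FROM addresses
WHERE updated_at > '2021-01-01 00:00:00'  -- last bookmark value
ORDER BY updated_at;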
When you select a table to be synced from a Postgres DB by Stitch, it will prompt you for a replication method. However, when you select all tables at once using the checkbox in the top left of the list of tables, you aren't prompted for anything at all. Stitch will let you save a configuration with no replication method on any table, which immediately fails with the cryptic "Inconsistent state" error in the question.
So, to solve:
open your integration settings
navigate to "Tables to replicate"
manually un-check and re-check each table to be synced
select a replication method when prompted
There is no way to set a replication setting for more than one table at once; it has to be done for each table. When you've done this for all of them, this error will stop being thrown.
Details:
Database: Postgres.
Version: 9.6
Host: Amazon RDS
Problem: After restoring from snapshot, the database is unusably slow.
Why: Something AWS calls the "first touch penalty". When a newly restored instance becomes available, the EBS volume attachment is complete but not all the data has been migrated to the attached EBS volume from S3. Only after initially "touching" the data will RDS realize the data isn't on the EBS volume and it needs to pull it from S3. This completely destroys our performance. We also cannot use dd or fio to pre-touch the data because RDS does not allow access to the mounted EBS volumes.
What I've done: Contacted AWS support. They acknowledged that it is a problem, that they are working on it, and that the only solution is to select * from all tables.
Why I still need help: The select * strategy did speed things up (I selected everything from the public schema), but not as much as is needed. So I read up on how postgres stores data to disk. There's a heck of a lot on disk that wouldn't be "touched" by a simple select from user-defined tables.
My question: Being limited to only SQL queries/functions and not having direct access to the underlying disk, what are the best sql statements I can use to "touch" as much as possible on the disk in order to get it loaded on the EBS volume from S3?
My suggestion would be to manually trigger a VACUUM ANALYZE; this does a full table scan of each table in scope, which forces the data to be read, and also updates the planner with fresh statistics. You can scope it fairly easily: limiting it to the database in question and a particular schema (public, for example) can help keep the total time down if you have multiple databases on the one host.
The operation is rather time-consuming and I'm not aware of a good way to parallelize it. There is also the vacuumdb utility, but that just runs a query with a vacuum statement in it.
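A minimal sketch of what that looks like (the table name is made up; the second statement just generates one VACUUM per table in the public schema, and you then run its output):
-- Reads every page of the table, which pulls the restored blocks down from S3
VACUUM (ANALYZE, VERBOSE) public.my_big_table;

-- Generate one statement per table in a schema; run the generated statements afterwards
SELECT format('VACUUM (ANALYZE, VERBOSE) %I.%I;', schemaname, relname)
FROM pg_stat_user_tables
WHERE schemaname = 'public';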
Source: I asked RDS support this very question a few days ago.
[1] https://www.postgresql.org/docs/9.5/static/sql-vacuum.html
I have created a DB2 database with a default page size of 4K; now my application is throwing a space error:
SQLCODE: -1585, SQLSTATE: 54048, SQLERRMC: null
no sufficient space allocated for temp tablespace
How can I overcome this? Can anyone help with this?
DB2 has different types of tablespaces:
Regular (data; the default tablespace)
Large (data with long rows)
Catalog (metadata)
User temporary (for temporary tables)
System temporary (for internal operations such as sorts and joins that sometimes overflow memory)
You just need to create a system temporary tablespace in the database so it can perform the operation you are trying to do. It is highly recommended to create the temporary tablespaces (user and system) as SMS type, or automatic storage for recent versions. Make sure the page size is the correct one.
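A minimal sketch of that on DB2 LUW with automatic storage (the bufferpool and tablespace names are made up; pick the page size your failing operation needs):
-- A bufferpool with a larger page size than the 4K default
CREATE BUFFERPOOL TEMP_BP_32K SIZE AUTOMATIC PAGESIZE 32K;

-- A system temporary tablespace on that bufferpool, used for sorts and joins
-- that overflow memory
CREATE SYSTEM TEMPORARY TABLESPACE TEMPSYS_32K
    PAGESIZE 32K
    MANAGED BY AUTOMATIC STORAGE
    BUFFERPOOL TEMP_BP_32K;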