Strange behaviour of pg_dump related to raster data backup - PostgreSQL

I have a PostGIS raster table in a PostgreSQL database that contains a large amount of raster data, and I want to back up this database. The size of the database is 65GB.
In the pg_dump command I specify that I want to exclude the data of the raster tables. However, the backup file grew to 70GB and, due to network issues, I had to stop it. This seems very strange to me, because I am only backing up some small PostGIS tables and skipping all of those heavy raster tables.
I use this command:
pg_dump -U postgres -h 10****** -d my_db --exclude-table-data wis.s* --exclude-table-data wis.m* --exclude-table-data wis.chan* > C:\enav_bkup.sql
Then I tried to take a backup of only the schema where my data is stored, the "wis" schema. The size of that backup file was only 500MB.
This is strange to me: the full database backup file was more than 70GB, but the schema where the data is stored comes to only 500MB when I back it up with the "-n wis" flag.
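That schema-level dump was run roughly like this (same connection options as above; the output file name here is just an example):
pg_dump -U postgres -h 10****** -d my_db -n wis > C:\wis_bkup.sql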
Then I used this script to see how big each of my other schemas is:
SELECT schema_name,
       pg_size_pretty(sum(table_size)) AS size,
       (sum(table_size) / database_size) * 100 AS percentage_of_DB
FROM (
    SELECT pg_catalog.pg_namespace.nspname AS schema_name,
           pg_relation_size(pg_catalog.pg_class.oid) AS table_size,
           sum(pg_relation_size(pg_catalog.pg_class.oid)) OVER () AS database_size
    FROM pg_catalog.pg_class
    JOIN pg_catalog.pg_namespace ON relnamespace = pg_catalog.pg_namespace.oid
) t
GROUP BY schema_name, database_size
ORDER BY sum(table_size) DESC;
I realized that the heaviest schema, at 50GB, is called "pg_toast", while wis itself is only 6GB.
Is my data in pg_toast instead of wis?
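A query along these lines (a generic sketch on my part) shows which ordinary table each large TOAST table belongs to, since pg_class.reltoastrelid links a table to its TOAST table and TOAST storage is reported under pg_toast rather than under the owning table's schema:
SELECT c.oid::regclass AS owning_table,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_catalog.pg_class c
WHERE c.reltoastrelid <> 0
ORDER BY pg_relation_size(c.reltoastrelid) DESC
LIMIT 20;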
How can I safely back up the database so that I can be sure to restore it later without losing any data?
If I back up only the wis schema, will I lose the data that is in pg_toast?
Can someone please explain what happened here?

Related

Is there a way to filter pg_dump by timestamp for PostgreSQL?

I have a database that needs backing up, but only for specific timestamps and tables, for example from the 1st of October to the 15th of October. After searching multiple sites, I have not found any method that suits my requirements.
Let's say I have database_A, and database_A has 15 tables. I want to use pg_dump to back up 10 tables from database_A, from the 1st of October to the 15th of October, all into one file. Below is what I have managed to do, but I have not gotten the date portion working yet, as I'm not entirely sure how.
pg_dump -U postgres -t"\"table_1\"" -t"\"table_2\"" database_A > backup.csv
The code above works if I want to back up multiple tables into one file, but it backs up each table in its entirety, from start to end.
I would much appreciate it if someone could help me with this, as I am still mostly a beginner at this. Thank you!
If the data you're copying has a column named timestamp, you can use psql and the COPY command to accomplish this:
# Optional: clear existing table since COPY FROM will append data
psql -c "TRUNCATE TABLE my_table" target_db
psql -c "COPY (SELECT * FROM my_table WHERE timestamp >= '...' AND timestamp <= '...') TO STDOUT" source_db | psql -c "COPY my_table FROM STDIN" target_db
You can repeat this pattern for as many tables as necessary. I've used this approach before to copy a subset of live data into a development database and it works quite well, especially if you put the above commands into a shell script.
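As a rough sketch (the table list, database names, and date bounds are placeholders to adapt), such a script could look like this:
#!/bin/sh
# Copy a date-bounded slice of several tables from source_db into target_db.
# Assumes every listed table has a "timestamp" column.
START='...'   # e.g. start of the date range
END='...'     # e.g. end of the date range
for table in table_1 table_2 table_3; do
  # Optional: clear the existing rows first, since COPY FROM appends
  psql -c "TRUNCATE TABLE $table" target_db
  psql -c "COPY (SELECT * FROM $table WHERE timestamp >= '$START' AND timestamp <= '$END') TO STDOUT" source_db \
    | psql -c "COPY $table FROM STDIN" target_db
done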

Low performance of postgres_fdw extension

I need to periodically copy data from the TMP database to the remote PROD database, with some modifications to the data in certain columns.
When I use the postgres_fdw extension from the PROD database (with the foreign schema mapped), copying a million records takes 6 minutes.
insert into prod.foreign_schema.foreign_table
(select * from tmp.public.table limit 1000000);
However, when I use dblink to copy the same table (the SQL runs on the PROD database, not on TMP), the process takes 20 seconds.
insert into prod.public.table
(select * from dblink('host=192.1... port=5432 dbname=... user=… password=…. connect_timeout=2', 'select * from tmp.production.table limit 1000000') as tab (id integer…..)
);
How can I optimize and shorten the process of copying data from the TMP database?
I have to run the SQL commands on the TMP database.
The TMP and PROD databases are on the same version (10).
The first statement will effectively run many small inserts, albeit with a prepared statement, so you don't pay the planning overhead each time. Still, you end up with far more round trips between the two servers, which is probably the reason for the difference.
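You can see this for yourself by asking for a verbose plan of the foreign-table insert (a sketch with generic table names); postgres_fdw reports the statement it will execute on the remote side:
explain (verbose)
insert into foreign_schema.foreign_table
select * from public.some_table limit 1000000;
-- The plan contains a line along the lines of:
--   Remote SQL: INSERT INTO public.some_table(col1, col2, ...) VALUES ($1, $2, ...)
-- i.e. one parameterised INSERT is executed on the remote server for every row
-- produced by the local SELECT.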

Why does pg_dump create a gigantic file?

I am currently trying to back up a Postgres 10.x database. If I check the size of the database using the snippet here: https://wiki.postgresql.org/wiki/Disk_Usage#Finding_the_largest_databases_in_your_cluster , the database is 180MB. However, if I use
pg_dump my_database > my_database_backup
to back up the data, it creates a file over 2GB in size. Any thoughts on why pg_dump would create a backup file that is over 10x the size of the raw data? I assume that the inline SQL commands might cause some increase in file size, but 10x seems a bit extreme to me.
Edit: this is the specific query used to check the database size (there are only 2 databases on this server, so both fall within the LIMIT 20):
SELECT d.datname AS Name,
       pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
            ELSE 'No Access'
       END AS SIZE
FROM pg_catalog.pg_database d
ORDER BY
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_database_size(d.datname)
            ELSE NULL
       END DESC -- nulls first
LIMIT 20

SQL Server 2012 Express Edition Database Size

We have a requirement in our project to store millions of records (~100 million) in the database.
We know that SQL Server 2012 Express Edition can accommodate a maximum of 10GB of data.
I am using this query to get the actual size of the database - is this right?
use [Bio Lambda8R32S50X]
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Bio Lambda8R32S50X'
GO
SET NOCOUNT ON
DBCC UPDATEUSAGE(0)
-- Table row counts and sizes.
CREATE TABLE #t
(
[name] NVARCHAR(128),
[rows] CHAR(11),
reserved VARCHAR(18),
data VARCHAR(18),
index_size VARCHAR(18),
unused VARCHAR(18)
)
INSERT #t EXEC sp_msForEachTable 'EXEC sp_spaceused ''?'''
SELECT *
FROM #t
-- # of rows.
SELECT SUM(CAST([rows] AS int)) AS [rows]
FROM #t
DROP TABLE #t
The second question: does this restriction apply only to the database size of the primary filegroup, or does it include the log files as well?
If we do a lot of deletes and inserts, or delete and then insert back the same number of records, does the database size vary or does it remain the same?
This is very crucial, since it will decide whether we can go ahead with SQL Server 2012 Express Edition or not.
Thanks and regards
Subasish
I can see that the first query gets the overall size of the database for the data and the logs, and the second one gives the size of each table. So I would say yes to both.
Based upon my experience seeing databases over 40GB, and this link on maximum DB size limits, the limit on SQL Server Express applies to the .mdf and .ndf files, not the .ldf.
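If it is useful, one way to see how much of the 10GB cap is actually in use is to sum only the data files (a sketch based on the query in the question; the database name is the one used there):
SELECT DB_NAME(database_id) AS DatabaseName,
       SUM(CASE WHEN type_desc = 'ROWS' THEN size ELSE 0 END) * 8 / 1024 AS DataMB,  -- counts toward the 10GB limit
       SUM(CASE WHEN type_desc = 'LOG'  THEN size ELSE 0 END) * 8 / 1024 AS LogMB    -- log space, not counted
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Bio Lambda8R32S50X'
GROUP BY database_id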
You might be safer, however, just going with SQL Server Standard and using CAL licensing in case your database starts growing.
Good Luck!

ORA-01652 Unable to extend temp segment by in tablespace

I am creating a table like
create table tablename
as
select * from table2
I am getting the error
ORA-01652 Unable to extend temp segment by in tablespace
When I googled it, I usually found the ORA-01652 error showing some value, like
Unable to extend temp segment by 32 in tablespace
but I am not getting any such value. I ran this query:
select
fs.tablespace_name "Tablespace",
(df.totalspace - fs.freespace) "Used MB",
fs.freespace "Free MB",
df.totalspace "Total MB",
round(100 * (fs.freespace / df.totalspace)) "Pct. Free"
from
(select
tablespace_name,
round(sum(bytes) / 1048576) TotalSpace
from
dba_data_files
group by
tablespace_name
) df,
(select
tablespace_name,
round(sum(bytes) / 1048576) FreeSpace
from
dba_free_space
group by
tablespace_name
) fs
where
df.tablespace_name = fs.tablespace_name;
Taken from: Find out free space on tablespace
and I found that the tablespace I am currently using has around 32GB of free space. I even tried creating the table like
create table tablename tablespace tablespacename
as select * from table2
but I get the same error again. Can anyone give me an idea where the problem is and how to solve it? For your information, the select statement would fetch 40,000,000 records.
I found the solution to this. There is a temporary tablespace called TEMP which is used internally by the database for operations like DISTINCT, joins, etc. Since my query (which has 4 joins) fetches almost 50 million records, the TEMP tablespace did not have enough space to hold all that data, so the query failed even though my own tablespace had free space. After increasing the size of the TEMP tablespace the issue was resolved. Hope this helps someone with the same issue. Thanks :)
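For anyone wondering how to do that, enlarging the TEMP tablespace looks roughly like this (a sketch; the tempfile path must be taken from your own system):
-- find the existing tempfile(s) and their sizes
select file_name, bytes / 1024 / 1024 as mb from dba_temp_files;
-- then resize one of them (use the path returned above)
alter database tempfile 'D:\oracle\Oradata\TEMP01.dbf' resize 4000M;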
Create a new datafile by running the following command:
alter tablespace TABLE_SPACE_NAME add datafile 'D:\oracle\Oradata\TEMP04.dbf' size 2000M autoextend on;
You don't need to create a new datafile; you can extend your existing tablespace data files.
Execute the following to determine the filename for the existing tablespace:
SELECT * FROM DBA_DATA_FILES;
Then extend the size of the datafile as follows (replace the filename with the one from the previous query):
ALTER DATABASE DATAFILE 'D:\ORACLEXE\ORADATA\XE\SYSTEM.DBF' RESIZE 2048M;
I encountered the same error message but did not have access to views like "dba_free_space" because I am not a DBA. I used some of the previous answers to check the available space and I still had plenty of space left. However, after reducing the number of full table scans as much as possible, the problem was solved. My guess is that Oracle uses temp space to store the data from the full table scans; if that data exceeds the available space, it shows this error. Hope this helps someone with the same issue.