Why does pg_dump create a gigantic file?

I am currently trying to back up a PostgreSQL 10.x database. If I check the size of the database using the snippet here: https://wiki.postgresql.org/wiki/Disk_Usage#Finding_the_largest_databases_in_your_cluster, the database is 180MB. However, if I use
pg_dump my_database > my_database_backup
to back up the data, it creates a file over 2GB in size. Any thoughts on why pg_dump would create a backup file that is more than 10x the size of the raw data? I assume the inline SQL commands account for some of the increase, but 10x seems a bit extreme to me.
Edit: the specific query I used to check the database size (there are only 2 databases on this server, so both fall within the LIMIT 20):
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS SIZE
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20
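One common contributor, offered as an assumption rather than a diagnosis of this particular database: pg_database_size reports on-disk bytes, where large values are TOAST-compressed (pglz in PostgreSQL 10), while a plain-format dump writes every value uncompressed as text, and bytea columns (with the default bytea_output = hex) take roughly two characters per byte. The Python sketch below uses zlib as a stand-in for pglz on a synthetic, highly repetitive payload, so the ratio it prints is illustrative only. If the file size itself is the problem, pg_dump's custom format (pg_dump -Fc) compresses its output.

```python
import zlib


def dump_size_estimate(raw: bytes) -> dict:
    """Compare a compressed (on-disk-like) size with a hex-text (dump-like) size.

    zlib is only a stand-in for PostgreSQL's pglz TOAST compression, so the
    exact ratio is illustrative, not a prediction.
    """
    compressed = zlib.compress(raw)
    # Hex text uses two characters per byte; the '\x' prefix and COPY
    # escaping add a few more characters per value, ignored here.
    hex_text = raw.hex()
    return {
        "raw": len(raw),
        "compressed": len(compressed),
        "hex_text": len(hex_text),
    }


# Synthetic, highly repetitive payload (e.g. a mostly blank image stored as bytea)
sizes = dump_size_estimate(b"\x00" * 1_000_000)
print(sizes)
print("text/compressed ratio:", round(sizes["hex_text"] / sizes["compressed"]))
```

Real data compresses far less than a block of zeros, so actual ratios are usually much smaller than what this prints.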

Related

Strange behaviour of pg_dump- related to raster data backup

I have a PostGIS raster table in a PostgreSQL database that contains a large amount of raster data. I want to back up this database. The size of the database is 65GB.
In the pg_dump command I specify that I want to exclude the data of the raster tables. However, the backup file grows to 70GB, and because of network issues I have to stop it. This seems very strange, because I am only backing up some small PostGIS tables and excluding all of the heavy raster tables.
I use this command:
pg_dump -U postgres -h 10****** -d my_db --exclude-table-data wis.s* --exclude-table-data wis.m* --exclude-table-data wis.chan* > C:\enav_bkup.sql
Then I tried backing up only the schema where my data is stored, the "wis" schema. The backup file was only 500MB.
This is strange to me! The full database backup file was more than 70GB, but the schema where the data is stored is only 500MB when I back it up with the "-n wis" flag.
Then, I used this script to see how big is each of my other schemas:
select * from (
SELECT schema_name,
pg_size_pretty(sum(table_size)) size,
(sum(table_size) / database_size) * 100 as percentage_of_DB
FROM (
SELECT pg_catalog.pg_namespace.nspname as schema_name,
pg_relation_size(pg_catalog.pg_class.oid) as table_size,
sum(pg_relation_size(pg_catalog.pg_class.oid)) over () as
database_size
FROM pg_catalog.pg_class
JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid
) t
GROUP BY schema_name, database_size) foo
order by percentage_of_DB desc -- size is pg_size_pretty text, which sorts alphabetically, not by bytes
I realized that the heaviest schema, at 50GB, is called "pg_toast", and wis itself is only 6GB.
Is my data in pg_toast instead of wis?
How can I safely back up the database to be sure that I can restore it later without losing any data?
If I do a schema-only back up for wis schema, will I lose the data that is in pg_toast??
Can someone please explain what happened here?
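For background: pg_toast is where PostgreSQL stores out-of-line (TOAST) values for large columns. Each TOAST table belongs to exactly one ordinary table, linked through pg_class.reltoastrelid, and pg_dump reads that data transparently as part of its owning table. Because pg_relation_size measures only the relation it is given, a per-schema query lists TOAST tables under pg_toast instead of under their owners (pg_total_relation_size, by contrast, includes a table's TOAST data and indexes). The Python sketch below uses made-up OIDs, names and sizes, loosely echoing the figures in the question, to show how charging each TOAST table to its owner's schema changes the totals.

```python
from collections import defaultdict

# Hypothetical rows shaped like pg_class joined to pg_namespace:
# (oid, schema, relname, reltoastrelid, size_in_bytes); reltoastrelid 0 = no TOAST table.
GB, MB = 1024**3, 1024**2
RELATIONS = [
    (16400, "wis", "raster_tiles", 16403, 6 * GB),
    (16403, "pg_toast", "pg_toast_16400", 0, 50 * GB),
    (16500, "wis", "small_table", 0, 8 * MB),
]


def size_by_schema(relations, fold_toast=False):
    """Sum relation sizes per schema.

    With fold_toast=True, each TOAST table is charged to the schema of the
    table whose reltoastrelid points at it.
    """
    owner_schema = {toast_oid: schema
                    for _oid, schema, _name, toast_oid, _size in relations
                    if toast_oid}
    totals = defaultdict(int)
    for oid, schema, _name, _toast_oid, size in relations:
        if fold_toast and oid in owner_schema:
            schema = owner_schema[oid]
        totals[schema] += size
    return dict(totals)


print(size_by_schema(RELATIONS))                   # pg_toast listed separately
print(size_by_schema(RELATIONS, fold_toast=True))  # TOAST charged to wis
```

Against a real database, the same mapping comes from joining pg_class to itself on reltoastrelid.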

How to load bulk data to table from table using query as quick as possible? (postgresql)

I have a large table (postgre_a) with 100 million records and 100 columns. I want to duplicate this data into the same table.
I tried to do this with SQL:
INSERT INTO postgre_a select i1 + 100000000, i2, ... FROM postgre_a;
However, this query has now been running for more than 10 hours, so I want to make it faster. I tried to do this with COPY, but I cannot find a way to use a COPY FROM statement with a query.
Is there any other method that can do this faster?

You cannot directly use a query in COPY FROM, but maybe you can use COPY FROM PROGRAM with a query to do what you want:
COPY postgre_a
FROM PROGRAM '/usr/pgsql-10/bin/psql -d test'
' -c ''copy (SELECT i1+ 100000000, i2, ... FROM postgre_a) TO STDOUT''';
(Of course you have to replace the path to psql and the database name with your values.)
I am not sure if that is faster than using INSERT, but it is worth a try.
You should definitely drop all indexes and constraints before the operation and recreate them afterwards.
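The drop-indexes, bulk-load, recreate sequence can be sketched end to end with Python's built-in sqlite3 module as a stand-in. SQLite has no COPY FROM PROGRAM, so this shows only the INSERT ... SELECT variant from the question, on a hypothetical two-column version of postgre_a with a hypothetical index name (postgre_a_i1_idx). Against PostgreSQL the same ordering applies, with the COPY approach above replacing the middle step if it turns out faster.

```python
import sqlite3

OFFSET = 100_000_000  # same key offset as the question's i1 + 100000000


def duplicate_with_offset(conn: sqlite3.Connection) -> None:
    """Copy every row of postgre_a back into itself with a shifted key.

    The index is dropped before the bulk insert and rebuilt afterwards, so
    the insert does not have to maintain it row by row.
    """
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS postgre_a_i1_idx")
    cur.execute(
        "INSERT INTO postgre_a (i1, i2) SELECT i1 + ?, i2 FROM postgre_a",
        (OFFSET,),
    )
    cur.execute("CREATE INDEX postgre_a_i1_idx ON postgre_a (i1)")
    conn.commit()


# Tiny in-memory table standing in for the real 100-million-row one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postgre_a (i1 INTEGER, i2 TEXT)")
conn.executemany("INSERT INTO postgre_a VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 6)])
conn.execute("CREATE INDEX postgre_a_i1_idx ON postgre_a (i1)")
conn.commit()

duplicate_with_offset(conn)
print(conn.execute("SELECT COUNT(*), MAX(i1) FROM postgre_a").fetchone())
```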

How to find database name of \data\base postgres folders?

I have a large folder of 70 GB in my postgres installation under:
D:\Program Files\PostgreSQL\9.5\data\base\130205
Question: how could I find out which database is based on that folder?
I have about 10 databases running on the same server, and most of them have a tablespace on a different drive.
But I'm probably missing a mapping somewhere, maybe a large index or something similar. How can I find out which database is causing this amount of data?

Just run oid2name as the PostgreSQL operating system user.

Thanks to the hint from #a_horse, the following statement shows the OID and name of each database:
SELECT oid,* from pg_database

You can use the following query to find the size of your largest databases (taken from here):
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS SIZE
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20

You can use this query. It returns the name and OID of each database from pg_database:
select pg_database.datname, pg_database.oid from pg_database;
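The subdirectories under data\base are named after database OIDs, which is why the queries above answer the question. As a sketch, assuming you have already fetched (oid, datname) pairs from pg_database, the lookup from a folder path is just a dictionary match; the OIDs and database names below are hypothetical.

```python
from pathlib import PureWindowsPath


def database_for_folder(folder: str, pg_database_rows):
    """Return the database name whose OID matches a data\\base\\<oid> folder.

    pg_database_rows holds (oid, datname) pairs, e.g. the output of
    SELECT oid, datname FROM pg_database;
    Returns None when no database has that OID.
    """
    oid = int(PureWindowsPath(folder).name)
    return dict(pg_database_rows).get(oid)


# Hypothetical OIDs and names; read the real ones from pg_database.
rows = [(1, "template1"), (13000, "postgres"), (130205, "gis_archive")]
print(database_for_folder(r"D:\Program Files\PostgreSQL\9.5\data\base\130205", rows))
```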

SQL Server 2012 Express Edition Database Size

We have a requirement in our project to store about 100 million records in the database.
We know that SQL Server 2012 Express Edition can hold at most 10GB of data.
I am using this query to get the actual size of the database - Is this right?
use [Bio Lambda8R32S50X]
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Bio Lambda8R32S50X'
GO
SET NOCOUNT ON
DBCC UPDATEUSAGE(0)
-- Table row counts and sizes.
CREATE TABLE #t
(
[name] NVARCHAR(128),
[rows] CHAR(11),
reserved VARCHAR(18),
data VARCHAR(18),
index_size VARCHAR(18),
unused VARCHAR(18)
)
INSERT #t EXEC sp_msForEachTable 'EXEC sp_spaceused ''?'''
SELECT *
FROM #t
-- # of rows.
SELECT SUM(CAST([rows] AS int)) AS [rows]
FROM #t
DROP TABLE #t
My second question: does this restriction apply only to the size of the primary filegroup, or does it include the log files as well?
If we do a lot of deletes and inserts, or maybe delete and then insert back the same number of records, does the database size vary or remain the same?
This is crucial, since it will decide whether we can go ahead with SQL Server 2012 Express Edition or not.
Thanks and regards
Subasish

I can see that the first query is to get the overall size of the database for the data and logs. The second one is for each table. So I would say yes to both.
Based on my experience with databases over 40GB, and on this link (maximum DB size limits), the limit on SQL Server Express is based on the .mdf and .ndf files, not the .ldf.
You might be safer, however, going with SQL Server Standard and using CAL licensing in case your database starts growing.
Good Luck!
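The size arithmetic behind that answer can be sketched in Python. sys.master_files reports size in 8 KB pages, which is why the query computes (size*8)/1024 to get MB. Following the answer's reading that only data files (.mdf/.ndf, type_desc ROWS) count toward the Express limit, the sketch sums those and leaves the log out; the file sizes are hypothetical.

```python
# Hypothetical rows shaped like sys.master_files: (type_desc, size), where
# size is counted in 8 KB pages.
FILES = [
    ("ROWS", 1_200_000),  # primary data file (.mdf)
    ("ROWS", 50_000),     # secondary data file (.ndf)
    ("LOG", 600_000),     # transaction log (.ldf)
]

EXPRESS_2012_LIMIT_MB = 10 * 1024  # 10 GB per database


def pages_to_mb(pages: int) -> int:
    """Same arithmetic as (size*8)/1024 in the query, including integer division."""
    return (pages * 8) // 1024


def data_size_mb(files) -> int:
    """Sum only the data files (ROWS); the log is left out of the comparison."""
    return sum(pages_to_mb(size) for kind, size in files if kind == "ROWS")


data_mb = data_size_mb(FILES)
log_mb = sum(pages_to_mb(size) for kind, size in FILES if kind == "LOG")
print(f"data: {data_mb} MB, log: {log_mb} MB, limit: {EXPRESS_2012_LIMIT_MB} MB")
print("within limit (data files only):", data_mb <= EXPRESS_2012_LIMIT_MB)
```

With these made-up numbers the data files fit under the limit even though data plus log would not, which is exactly the distinction the second question asks about.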

ORA-01652 Unable to extend temp segment by in tablespace

I am creating a table like
create table tablename
as
select * from table2
I am getting the error
ORA-01652 Unable to extend temp segment by in tablespace
When I searched for it, I usually found the ORA-01652 error reported with a value, like
Unable to extend temp segment by 32 in tablespace
I am not getting any such value. I ran this query:
select
fs.tablespace_name "Tablespace",
(df.totalspace - fs.freespace) "Used MB",
fs.freespace "Free MB",
df.totalspace "Total MB",
round(100 * (fs.freespace / df.totalspace)) "Pct. Free"
from
(select
tablespace_name,
round(sum(bytes) / 1048576) TotalSpace
from
dba_data_files
group by
tablespace_name
) df,
(select
tablespace_name,
round(sum(bytes) / 1048576) FreeSpace
from
dba_free_space
group by
tablespace_name
) fs
where
df.tablespace_name = fs.tablespace_name;
Taken from: Find out free space on tablespace
and found that the tablespace I am currently using has around 32GB of free space. I even tried creating the table like this:
create table tablename tablespace tablespacename
as select * from table2
but I get the same error again. Can anyone give me an idea of where the problem is and how to solve it? For your information, the select statement would fetch 40,000,000 records.
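The arithmetic in the free-space query above can be mirrored in Python on hypothetical per-file byte counts, which makes one detail visible: the query only reads dba_data_files and dba_free_space, and temporary tablespaces built on tempfiles are listed in dba_temp_files instead, so TEMP never appears in this report. The tablespace name and sizes below are made up.

```python
from collections import defaultdict

MB = 1_048_576  # same divisor as the query


def tablespace_report(data_files, free_extents):
    """Mirror the query: totals from dba_data_files, free space from dba_free_space.

    Both inputs are (tablespace_name, bytes) pairs. Only tablespaces present in
    both inputs are reported, like the query's join. Python's round() rounds
    halves to even, unlike Oracle's ROUND, so .5 cases may differ by 1.
    """
    total = defaultdict(int)
    for ts, nbytes in data_files:
        total[ts] += nbytes
    free = defaultdict(int)
    for ts, nbytes in free_extents:
        free[ts] += nbytes

    report = {}
    for ts in total.keys() & free.keys():
        total_mb = round(total[ts] / MB)
        free_mb = round(free[ts] / MB)
        report[ts] = {
            "used_mb": total_mb - free_mb,
            "free_mb": free_mb,
            "total_mb": total_mb,
            "pct_free": round(100 * free_mb / total_mb),
        }
    return report


# Hypothetical figures: two 20 GB data files in USERS with 32 GB free in total.
# TEMP is absent because its tempfiles are not listed in dba_data_files.
data_files = [("USERS", 20 * 1024 * MB), ("USERS", 20 * 1024 * MB)]
free_extents = [("USERS", 16 * 1024 * MB), ("USERS", 16 * 1024 * MB)]
print(tablespace_report(data_files, free_extents))
```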

I found the solution to this. There is a temporary tablespace called TEMP, which the database uses internally for operations like DISTINCT, joins, etc. Since my query (which has 4 joins) fetches almost 50 million records, the TEMP tablespace did not have enough space to hold all the data. Hence the query failed even though my own tablespace had free space. After increasing the size of the TEMP tablespace, the issue was resolved. Hope this helps someone with the same issue. Thanks :)

Create a new datafile by running the following command:
alter tablespace TABLE_SPACE_NAME add datafile 'D:\oracle\Oradata\TEMP04.dbf'
size 2000M autoextend on;

You don't need to create a new datafile; you can extend your existing tablespace data files.
Execute the following to determine the filename for the existing tablespace:
SELECT * FROM DBA_DATA_FILES;
Then extend the size of the datafile as follows (replace the filename with the one from the previous query):
ALTER DATABASE DATAFILE 'D:\ORACLEXE\ORADATA\XE\SYSTEM.DBF' RESIZE 2048M;

I encountered the same error message but don't have access to views like "dba_free_space" because I am not a DBA. I used some of the previous answers to check the available space, and I still have plenty. However, after reducing full table scans as much as possible, the problem was solved. My guess is that Oracle uses temp space to store the full table scan data, and if the data size exceeds the limit, it shows this error. Hope this helps someone with the same issue.