How to resolve pg_restore hangup - postgresql

The application creates Postgres backup files using the command
pg_dump mydb -ib -Z3 -f mybackup.backup -Fc -h localhost -T firma*.attachme -T firma*.artpilt
on Postgres 9.3 on Windows 8 x64.
Creating an empty database and using pgAdmin to restore from this file on Postgres 9.3 on Windows 7 x64
runs forever.
CPU usage by pg_restore is 0%.
The postgres log file does not contain any information at the normal log level.
The backup file was transferred over the web. Its header starts with PGDMP and there are a lot of CREATE commands at the start. Its size is 24 MB, so the restore cannot take a long time.
The restore is done on the same computer where the server runs.
How can I restore from the backup? How can I check the integrity of the .backup file?
I tried 7-Zip's test option to test it, but 7-Zip reported that it cannot open the file.
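One way to sanity-check a custom-format archive is to have pg_restore print its table of contents; this is a sketch, assuming the file name from the commands above, and if the archive is truncated or corrupt the listing typically fails or stops partway through:
pg_restore --list mybackup.backup
If the full list of schema and data entries comes back, the archive header and TOC are at least intact; the data sections themselves are only read during the actual restore.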
Update
select * from pg_stat_activity
shows a number of pg_restore processes (8 jobs were specified for the restore, since the CPU has 8 cores) starting at 10:51 when the restore starts. All of them have idle status, and the start time does not change.
Running this query multiple times does not change the result.
930409;"betoontest";8916;10;"postgres";"pg_restore";"::1";"";49755;"2014-11-18 10:51:39.262+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"CREATE INDEX makse_dokumnr_idx ON makse USING btree (dokumnr);
"
930409;"betoontest";9640;10;"postgres";"pg_restore";"::1";"";49760;"2014-11-18 10:51:39.272+02";"";"2014-11-18 10:51:39.662+02";"2014-11-18 10:51:42.044+02";f;"idle in transaction (aborted)";"COPY rid (id, reanr, dokumnr, nimetus, hind, kogus, toode, partii, myygikood, hinnak, kaubasumma, yhik, kulukonto, kuluobjekt, rid2obj, reakuupaev, kogpak, kulum, baasostu, ostuale, rid3obj, rid4obj, rid5obj, rid6obj, rid7obj, rid8obj, rid9obj, kaskogus, a (...)"
930409;"betoontest";8176;10;"postgres";"pg_restore";"::1";"";49761;"2014-11-18 10:51:39.272+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"CREATE INDEX attachme_idmailbox_idx ON attachme USING btree (idmailbox);
"
930409;"betoontest";8108;10;"postgres";"pg_restore";"::1";"";49765;"2014-11-18 10:51:39.272+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"CREATE INDEX makse_kuupaev_kellaeg_idx ON makse USING btree (kuupaev, kellaaeg);
"
930409;"betoontest";8956;10;"postgres";"pg_restore";"::1";"";49764;"2014-11-18 10:51:39.282+02";"";"2014-11-18 10:51:42.074+02";"2014-11-18 10:51:42.094+02";f;"idle";"CREATE INDEX makse_varadokumn_idx ON makse USING btree (varadokumn);
"
930409;"betoontest";11780;10;"postgres";"pg_restore";"::1";"";49763;"2014-11-18 10:51:39.292+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"ALTER TABLE ONLY mitteakt
ADD CONSTRAINT mitteakt_pkey PRIMARY KEY (klient, toode);
"
930409;"betoontest";4680;10;"postgres";"pg_restore";"::1";"";49762;"2014-11-18 10:51:39.292+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"ALTER TABLE ONLY mailbox
ADD CONSTRAINT mailbox_pkey PRIMARY KEY (guid);
"
930409;"betoontest";5476;10;"postgres";"pg_restore";"::1";"";49766;"2014-11-18 10:51:39.332+02";"";"2014-11-18 10:51:42.064+02";"2014-11-18 10:51:42.094+02";f;"idle";"CREATE INDEX makse_kuupaev_idx ON makse USING btree (kuupaev);
The data is only partially restored. Maybe the file is truncated and postgres or pg_restore waits for data forever. How can such hangups be prevented?
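For what it's worth, a lock check along these lines (a sketch against the 9.3-era catalogs, not part of the original post) can rule out blocking inside the server; backends that sit idle while pg_restore uses no CPU usually mean the server is waiting on the client, which fits a truncated archive:
-- one row per ungranted lock, naming the session that holds the conflicting lock
SELECT blocked.pid         AS blocked_pid,
       blocked_act.query   AS blocked_query,
       blocking.pid        AS blocking_pid,
       blocking_act.query  AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_act  ON blocked_act.pid = blocked.pid
JOIN pg_locks blocking
  ON  blocking.locktype = blocked.locktype
  AND blocking.database      IS NOT DISTINCT FROM blocked.database
  AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
  AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
  AND blocking.virtualxid    IS NOT DISTINCT FROM blocked.virtualxid
  AND blocking.pid <> blocked.pid
  AND blocking.granted
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted;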

Related

Script does not create tables but pgAdmin does

I am running a script that creates a database, some tables with foreign keys, and inserts some data, but somehow the tables are not being created, although no error is thrown: I go to pgAdmin, look for the created tables, and there are none.
When I copy the text of my script and execute it in the Query Tool, it works fine and creates the tables.
Can you please explain what I am doing wrong?
Script:
DROP DATABASE IF EXISTS test01 WITH (FORCE); --drops even if in use
CREATE DATABASE test01
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'German_Germany.1252'
LC_CTYPE = 'German_Germany.1252'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False
;
CREATE TABLE customers
(
customer_id INT GENERATED ALWAYS AS IDENTITY,
customer_name VARCHAR(255) NOT NULL,
PRIMARY KEY(customer_id)
);
CREATE TABLE contacts
(
contact_id INT GENERATED ALWAYS AS IDENTITY,
customer_id INT,
contact_name VARCHAR(255) NOT NULL,
phone VARCHAR(15),
email VARCHAR(100),
PRIMARY KEY(contact_id),
CONSTRAINT fk_customer
FOREIGN KEY(customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE
);
INSERT INTO customers(customer_name)
VALUES('BlueBird Inc'),
('Dolphin LLC');
INSERT INTO contacts(customer_id, contact_name, phone, email)
VALUES(1,'John Doe','(408)-111-1234','john.doe#bluebird.dev'),
(1,'Jane Doe','(408)-111-1235','jane.doe#bluebird.dev'),
(2,'David Wright','(408)-222-1234','david.wright#dolphin.dev');
I am calling the script from a Windows console like this:
"C:\Program Files\PostgreSQL\15\bin\psql.exe" -U postgres -f "C:\Users\my user name\Desktop\db_create.sql" postgres
My script is edited in Notepad++ and saved with Encoding set to UTF-8 without BOM, as per a suggestion found here
I see you are using the -U postgres command-line parameter and also passing a database name as the last parameter (postgres).
So all your SQL commands were executed while connected to the postgres database. The CREATE DATABASE command did create the test01 database, but CREATE TABLE and INSERT INTO were executed against the postgres database, not test01, so all your tables ended up in postgres rather than in test01.
You need to split your SQL script into two scripts (files): the first for CREATE DATABASE, the second for the rest.
Execute the first script as before:
psql.exe -U postgres -f "db_create_1.sql" postgres
For the second one, connect to the database created in the first step:
psql.exe -U postgres -f "db_create_2.sql" test01

PostgreSQL pg_dump creates sql script, but it is not a sql script: is there a way to get pg_dump to create a standard sql script?

I'm running pg_dump to create a script to automate the creation of a system like this:
pg_dump --dbname=postgresql://postgres:ohdsi@127.0.0.1:5432/OHDSI -t webapi.* > webapi.sql
This creates an SQL file, but it is not really a plain SQL script, as it contains code like what is shown below.
When this file is run as an SQL script, it fails with the error shown below.
Is there a way to get pg_dump to create a script that is standard SQL and can be executed as an ordinary SQL script?
Code sample from sql generated by pg_dump:
COPY webapi.cohort_version (asset_id, comment, description, version, asset_json, archived, created_by_id, created_date) FROM stdin;
\.
--
-- Data for Name: concept_of_interest; Type: TABLE DATA; Schema: webapi; Owner: ohdsi_admin_user
--
COPY webapi.concept_of_interest (id, concept_id, concept_of_interest_id) FROM stdin;
1 4329847 4185932
2 4329847 77670
3 192671 4247120
4 192671 201340
Error seen when running the script generated by pg_dump:
--
-- Name: penelope_laertes_uni_pivot id; Type: DEFAULT; Schema: webapi; Owner: ohdsi_admin_user
--
ALTER TABLE ONLY webapi.penelope_laertes_uni_pivot ALTER COLUMN id SET DEFAULT nextval('webapi.penelope_laertes_uni_pivot_id_seq'::regclass)
--
-- Data for Name: achilles_cache; Type: TABLE DATA; Schema: webapi; Owner: ohdsi_admin_user
--
COPY webapi.achilles_cache (id, source_id, cache_name, cache) FROM stdin
Error executing: COPY webapi.achilles_cache (id, source_id, cache_name, cache) FROM stdin
. Cause: org.postgresql.util.PSQLException: ERROR: COPY from stdin failed: COPY commands are only supported using the CopyManager API.
Where: COPY achilles_cache, line 1
Exception in thread "main" java.lang.RuntimeException: org.apache.ibatis.jdbc.RuntimeSqlException: Error executing: COPY webapi.achilles_cache (id, source_id, cache_name, cache) FROM stdin
. Cause: org.postgresql.util.PSQLException: ERROR: COPY from stdin failed: COPY commands are only supported using the CopyManager API.
Where: COPY achilles_cache, line 1
at org.yaorma.database.Database.executeSqlScript(Database.java:344)
at org.yaorma.database.Database.executeSqlScript(Database.java:332)
at org.nachc.tools.fhirtoomop.tools.build.postgres.build.A04_CreateAtlasWebApiTables.exec(A04_CreateAtlasWebApiTables.java:29)
at org.nachc.tools.fhirtoomop.tools.build.postgres.build.A04_CreateAtlasWebApiTables.main(A04_CreateAtlasWebApiTables.java:19)
Caused by: org.apache.ibatis.jdbc.RuntimeSqlException: Error executing: COPY webapi.achilles_cache (id, source_id, cache_name, cache) FROM stdin
. Cause: org.postgresql.util.PSQLException: ERROR: COPY from stdin failed: COPY commands are only supported using the CopyManager API.
Where: COPY achilles_cache, line 1
at org.apache.ibatis.jdbc.ScriptRunner.executeLineByLine(ScriptRunner.java:109)
at org.apache.ibatis.jdbc.ScriptRunner.runScript(ScriptRunner.java:71)
at org.yaorma.database.Database.executeSqlScript(Database.java:342)
... 3 more
Caused by: org.postgresql.util.PSQLException: ERROR: COPY from stdin failed: COPY commands are only supported using the CopyManager API.
Where: COPY achilles_cache, line 1
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:286)
at org.apache.ibatis.jdbc.ScriptRunner.executeStatement(ScriptRunner.java:190)
at org.apache.ibatis.jdbc.ScriptRunner.handleLine(ScriptRunner.java:165)
at org.apache.ibatis.jdbc.ScriptRunner.executeLineByLine(ScriptRunner.java:102)
... 5 more
--- EDIT ------------------------------------
The --inserts method in the accepted answer gave me exactly what I needed.
I ended up doing this:
pg_dump --inserts --dbname=postgresql://postgres:ohdsi@127.0.0.1:5432/OHDSI -t webapi.* > webapi.sql
The client tool you are using to restore the dump cannot deal with the data from the (nonstandard) COPY command being mixed into the script. You need psql to restore such a dump.
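For example, restoring such a dump with psql could look like this (the connection string is a placeholder for whatever target database you restore into):
psql "postgresql://postgres:mypassword@localhost:5432/targetdb" -f webapi.sql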
You can use the --inserts option of pg_dump to create a dump that contains INSERT statements rather than COPY. That will be slower to restore, but will work with more client tools.
However, your wish to get a standard SQL script is hopeless. PostgreSQL extends the standard in many ways, so a database cannot be dumped with a standard SQL script. Note, for example, that indexes are not defined by the SQL standard. If you are looking to transfer a PostgreSQL dump to a different RDBMS, you will be disappointed. That is more difficult.

PostgreSQL: Segmentation Fault when calling procedure with temporary tables

I apologize for the lack of a better title. I'm at a loss as to what the exact problem could be.
PostgreSQL 13.4
TimescaleDB 2.4.2
Ubuntu 18.04.6 LTS
Running in Docker (Dockerfile further down)
shared_memory 16GB
shm size 2GB
The query at the bottom causes postgres to shut down with the error:
2022-05-09 15:17:42.012 UTC [1] LOG: server process (PID 1316) was terminated by signal 11: Segmentation fault
2022-05-09 15:17:42.012 UTC [1] DETAIL: Failed process was running: CALL process_wifi_traffic_for_range('2022-01-10','2022-01-12')
2022-05-09 15:17:42.012 UTC [1] LOG: terminating any other active server processes
2022-05-09 15:17:42.013 UTC [654] WARNING: terminating connection because of crash of another server process
2022-05-09 15:17:42.013 UTC [654] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-05-09 15:17:42.013 UTC [654] HINT: In a moment you should be able to reconnect to the database and repeat your command.
I want to process traffic data day by day, since my data lake is very big.
First I load the day into the temporary table unprocessed.
Then I extract the difference between the traffic reported in the previous record and the current one, since the source only reports the total traffic accrued.
The tables wifi_traffic_last_traffic and t_last_traffic are just there to keep track of the last known traffic per client from the last time the procedure was run.
ht_wifi_traffic_processed is a TimescaleDB hypertable, but the error also occurs when I use a normal table. Reducing the timeframe to 1 hour instead of one day does not help either, in case it was a memory issue. Sometimes it manages to do 1 or 2 days, and the data that does get finished is correct.
The query is a bit long, but I don't want to omit anything:
DECLARE
f_date date = from_date::date;
t_date date = to_date::date;
BEGIN
SET SESSION temp_buffers = '1GB';
CREATE TEMPORARY TABLE t_last_traffic (
clientid text,
devicecpuid text,
traffic bigint,
PRIMARY KEY(clientid,devicecpuid)
);
INSERT INTO t_last_traffic (
SELECT * FROM wifi_traffic_last_traffic
);
CREATE TEMPORARY TABLE unprocessed (
"timestamp" timestamp,
clientid text,
devicecpuid text,
customer text,
traffic bigint
) ON COMMIT DELETE ROWS;
CREATE TEMPORARY TABLE processed (
"timestamp" timestamp,
clientid text,
devicecpuid text,
customer text,
traffic bigint,
PRIMARY KEY("timestamp", clientid, devicecpuid)
) ON COMMIT DELETE ROWS;
LOOP
RAISE NOTICE 'Processing date: %', f_date;
INSERT INTO unprocessed
SELECT wt."timestamp", wt.clientid, wt.devicecpuid, wt.customer, wt.traffic
FROM wifi_traffic AS wt
WHERE wt."timestamp"
BETWEEN
f_date::timestamp
AND
f_date::timestamp + INTERVAL '1 day'
ORDER BY
devicecpuid ASC, --Important to sort by cpuID first as to not mix traffic results.
clientid ASC,
wt."timestamp" ASC;
RAISE NOTICE 'Unprocessed import done';
INSERT INTO processed
SELECT * FROM (
SELECT
up."timestamp",
up.clientid,
up.devicecpuid,
up.customer,
wifi_traffic_lag(
up.traffic,
lag(
up.traffic,
1,
COALESCE(
(
SELECT lt.traffic
FROM t_last_traffic lt
WHERE
lt.clientid = up.clientid
AND
lt.devicecpuid = up.devicecpuid
FETCH FIRST ROW ONLY
),
CAST(0 AS bigint)
)
)
OVER (
PARTITION BY
up.clientid,
up.devicecpuid
ORDER BY
up.clientid,
up.devicecpuid,
up."timestamp"
)
) AS traffic
FROM unprocessed up
WHERE
up.traffic != 0
) filtered
WHERE
filtered.traffic > 0
ON CONFLICT ON CONSTRAINT processed_pkey DO NOTHING;
RAISE NOTICE 'Processed import done';
INSERT INTO t_last_traffic(devicecpuid, clientid, traffic)
SELECT up.devicecpuid, up.clientid, MAX(up.traffic)
FROM unprocessed up
GROUP BY up.devicecpuid, up.clientid
ON CONFLICT ON CONSTRAINT t_last_traffic_pkey DO UPDATE SET
traffic = EXCLUDED.traffic;
INSERT INTO ht_wifi_traffic_processed
SELECT * FROM processed;
TRUNCATE TABLE unprocessed;
TRUNCATE TABLE processed;
COMMIT;
RAISE NOTICE 'Finished processing for date: %', f_date;
f_date = f_date + 1;
EXIT WHEN f_date > t_date;
END LOOP;
INSERT INTO wifi_traffic_last_traffic
SELECT * FROM t_last_traffic
ON CONFLICT ON CONSTRAINT wifi_traffic_last_traffic_pkey DO UPDATE SET
traffic = EXCLUDED.traffic;
DROP TABLE t_last_traffic;
DROP TABLE unprocessed;
DROP TABLE processed;
COMMIT;
END
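For context, the body above sits inside a procedure whose name comes from the CALL shown in the crash log; the wrapper is roughly this (a sketch, with the parameter types guessed from the ::date casts in the declarations):
CREATE OR REPLACE PROCEDURE process_wifi_traffic_for_range(from_date text, to_date text)
LANGUAGE plpgsql
AS $proc$
-- the DECLARE ... BEGIN ... END block listed above goes here verbatim
$proc$;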
Docker Compose:
services:
  postgres-storage:
    image: <redacted>/postgres_gis_tdb:pg13_tdb2.4.2_gis3.1.4
    restart: unless-stopped
    shm_size: 2gb
    ports:
      - '5433:5432'
    networks:
      - bigdata
    volumes:
      - /mnt/data_storage/psql_data:/var/lib/postgresql/data
      - /usr/docker-volumes/postgres-storage:/var/lib/postgresql/ssd
    environment:
      POSTGRES_USER: postgres
    env_file:
      - .env
Dockerfile:
FROM postgres:13
ENV POSTGIS_MAJOR 3
ENV POSTGIS_VERSION 3.1.4+dfsg-1.pgdg100+1
### INSTALL POSTGIS ###
RUN apt-get update \
&& apt-cache showpkg postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR \
&& apt-get install -y --no-install-recommends \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR=$POSTGIS_VERSION \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR-scripts \
&& rm -rf /var/lib/apt/lists/*
RUN mkdir -p /docker-entrypoint-initdb.d
COPY ./initdb-postgis.sh /docker-entrypoint-initdb.d/10_postgis.sh
COPY ./update-postgis.sh /usr/local/bin
### INSTALL TIMESCALEDB ###
# Important: Run timescaledb-tune #
# once for COMPLETELY NEW DATABASES, #
# so no existing postgresql_data. #
### ###
RUN apt-get update \
&& apt-get install -y postgresql-common wget lsb-release
RUN echo "yes" | sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
RUN sh -c "echo 'deb [signed-by=/usr/share/keyrings/timescale.keyring] https://packagecloud.io/timescale/timescaledb/debian/ $(ls$
RUN wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | gpg --dearmor -o /usr/share/keyrings/timescale.keyr$
RUN apt-get update \
&& apt-get install -y timescaledb-2-postgresql-13 timescaledb-tools
It seems like the error comes from doing
CREATE TEMPORARY TABLE unprocessed (
"timestamp" timestamp,
clientid text,
devicecpuid text,
customer text,
traffic bigint
) ON COMMIT DELETE ROWS;
As well as:
TRUNCATE TABLE unprocessed;
I did this initially because a test indicated that ON COMMIT DELETE ROWS wasn't really clearing the table after the COMMIT in the middle of the procedure.
Leaving it out prevented the error from occurring, and further tests showed that even without it the data was as expected. It seems to be some sort of race condition.
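If "leaving it out" refers to the explicit TRUNCATE calls, the non-crashing variant keeps ON COMMIT DELETE ROWS and lets the COMMIT at the end of each iteration clear the rows, roughly like this (a sketch based on the description above, not the exact code):
CREATE TEMPORARY TABLE unprocessed (
    "timestamp" timestamp,
    clientid text,
    devicecpuid text,
    customer text,
    traffic bigint
) ON COMMIT DELETE ROWS;
-- ...per-day INSERT and processing as in the procedure above...
-- no explicit TRUNCATE TABLE unprocessed / processed here
COMMIT; -- ends the iteration and clears both temporary tables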
I will post this in the postgres github as well.

Automate Native Range Partitioning in PostgreSQL 10 for Zabbix 3.4

I would like to automate the process of partitioning a Zabbix 3.4 Database using PostgreSQL's Native Range Partitioning.
Would it be wiser to write an SQL function to perform the tasks below, or to use a shell/Python script?
Make sure at least one partition is created ahead of what's needed.
Delete any partition older than x weeks/months; for history 7 days and for trends 1 year
The following is the solution I came up with for transitioning from a populated PSQL 9.4 database with no partitioning to PSQL 10 Native Range Partitioning.
A. Create an empty Zabbix PSQL 10 database
Ensure you first create an empty Zabbix PSQL 10 DB.
# su postgres
postgres#<>:~$ createuser -P -s -e zabbix
postgres#<>:~$ psql
postgres# create database zabbix;
postgres# grant all privileges on database zabbix to zabbix;
B. Create tables & Native Range Partitions on clock column
Create the tables in the Zabbix DB and implement native range partitioning on the clock column. Below is an example of a manual SQL script that can be run for the history table. Perform this for all the history tables you want to partition by range.
zabbix=# CREATE TABLE public.history
(
itemid bigint NOT NULL,
clock integer NOT NULL DEFAULT 0,
value numeric(20,0) NOT NULL DEFAULT (0)::numeric,
ns integer NOT NULL DEFAULT 0
) PARTITION BY RANGE (clock);
zabbix=# CREATE TABLE public.history_old PARTITION OF public.history
FOR VALUES FROM (MINVALUE) TO (1522540800);
zabbix=# CREATE TABLE public.history_y2018m04 PARTITION OF public.history
FOR VALUES FROM (1522540800) TO (1525132800);
zabbix=# CREATE TABLE public.history_y2018m05 PARTITION OF public.history
FOR VALUES FROM (1525132800) TO (1527811200);
zabbix=# CREATE INDEX ON public.history_old USING btree (itemid, clock);
zabbix=# CREATE INDEX ON public.history_y2018m04 USING btree (itemid, clock);
zabbix=# CREATE INDEX ON public.history_y2018m05 USING btree (itemid, clock);
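The range boundaries are simply UTC epoch values for the first second of each month, which can be verified from psql (not part of the original answer; the output assumes the session time zone is UTC):
zabbix=# SELECT to_timestamp(1522540800);                                  -- 2018-04-01 00:00:00+00
zabbix=# SELECT extract(epoch FROM timestamptz '2018-05-01 00:00:00+00');  -- 1525132800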
C. Automate it!
I used a shell script because it is one of the simplest ways to deal with creating new partitions in PSQL 10. Make sure you're always at least one partition ahead of what's needed.
Let's call the script auto_history_tables_monthly.sh.
On a Debian 8 flavor OS running PSQL 10, ensure the script is in a suitable directory (I used /usr/local/bin) with correct ownership (chown postgres:postgres /usr/local/bin/auto_history_tables_monthly.sh), and make it executable (chmod u+x /usr/local/bin/auto_history_tables_monthly.sh as the postgres user).
Create a cron job (crontab -e) for postgres user with following:
0 0 1 * * /usr/local/bin/auto_history_tables_monthly.sh | psql -d zabbix
This will run the shell script the first of every month.
Below is the script. It uses the date command to leverage the UTC epoch value. It creates a table a month in advance and drops the partition from two months back. This appears to work well in conjunction with the 31 days of history retention that is customized to my needs. Ensure the PSQL 10 DB is on UTC time for this use case.
#!/bin/bash
month_diff () {
year=$1
month=$2
delta_month=$3
x=$((12*$year+10#$month-1))
x=$((x+$delta_month))
ry=$((x/12))
rm=$(((x % 12)+1))
printf "%02d %02d\n" $ry $rm
}
month_start () {
year=$1
month=$2
date '+%s' -d "$year-$month-01 00:00:00" -u
}
month_end () {
year=$1
month=$2
month_start $(month_diff $year $month 1)
}
# Year using date
current_year=$(date +%Y)
current_month=$(date +%m)
# Math
next_date=$(month_diff $current_year $current_month 1)
next_year=$(echo $next_date|sed 's/ .*//')
next_month=$(echo $next_date|sed 's/.* //')
start=$(month_start $next_date)
end=$(month_end $next_date)
#next_month_table="public.history_y${next_year}m${next_month}"
# Create next month table for history, history_uint, history_str, history_log, history_text
sql="
CREATE TABLE IF NOT EXISTS public.history_y${next_year}m${next_month} PARTITION OF public.history
FOR VALUES FROM ($start) TO ($end);
\nCREATE TABLE IF NOT EXISTS public.history_uint_y${next_year}m${next_month} PARTITION OF public.history_uint
FOR VALUES FROM ($start) TO ($end);
\nCREATE TABLE IF NOT EXISTS public.history_str_y${next_year}m${next_month} PARTITION OF public.history_str
FOR VALUES FROM ($start) TO ($end);
\nCREATE TABLE IF NOT EXISTS public.history_log_y${next_year}m${next_month} PARTITION OF public.history_log
FOR VALUES FROM ($start) TO ($end);
\nCREATE TABLE IF NOT EXISTS public.history_text_y${next_year}m${next_month} PARTITION OF public.history_text
FOR VALUES FROM ($start) TO ($end);
\nCREATE INDEX on public.history_y${next_year}m${next_month} USING btree (itemid, clock);
\nCREATE INDEX on public.history_uint_y${next_year}m${next_month} USING btree (itemid, clock);
\nCREATE INDEX on public.history_str_y${next_year}m${next_month} USING btree (itemid, clock);
\nCREATE INDEX on public.history_log_y${next_year}m${next_month} USING btree (itemid, clock);
\nCREATE INDEX on public.history_text_y${next_year}m${next_month} USING btree (itemid, clock);
"
echo -e $sql
# Math
prev_date=$(month_diff $current_year $current_month -2)
prev_year=$(echo $prev_date|sed 's/ .*//')
prev_month=$(echo $prev_date|sed 's/.* //')
# Drop last month table for history, history_uint, history_str, history_log, history_text
sql="
DROP TABLE public.history_y${prev_year}m${prev_month};
\nDROP TABLE public.history_uint_y${prev_year}m${prev_month};
\nDROP TABLE public.history_str_y${prev_year}m${prev_month};
\nDROP TABLE public.history_log_y${prev_year}m${prev_month};
\nDROP TABLE public.history_text_y${prev_year}m${prev_month};
"
echo -e $sql
D. Then dump the data from the old database and load it into the new one. I used pg_dump/pg_restore.
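For illustration, such a data-only transfer could look roughly like this (a sketch; hosts, users and the exact table list are placeholders, and on PSQL 10 rows copied into the parent tables are routed to the matching partitions):
# dump only the data of the history tables from the old 9.4 database
pg_dump -h old_host -U zabbix --data-only -Fc \
    -t history -t history_uint -t history_str -t history_log -t history_text \
    zabbix > zabbix_history.dump
# restore it into the new, already partitioned PSQL 10 database
pg_restore -h new_host -U zabbix -d zabbix --data-only zabbix_history.dump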
I'm sure there are more complex solutions out there but I found this to be simplest for the needs of autopartitioning the Zabbix Database using the PostgreSQL 10 Native Range Partitioning functionality.
Please let me know if you need more details.
I have written detailed notes on using PostgreSQL version 11 together with pg_partman as the mechanism for native table partitioning with Zabbix (version 3.4 as of this writing).
zabbix-postgres-partitioning

PostgreSQL write amplification

I'm trying to find out how much stress PostgreSQL puts on disks, and the results are kind of discouraging so far. Please take a look at the methodology; apparently I'm missing something or calculating the numbers in a wrong way.
Environment
PostgreSQL 9.6.0-1.pgdg16.04+1 is running inside a separate LXC container with Ubuntu 16.04.1 LTS (kernel version 4.4.0-38-generic, ext4 filesystem on top of an SSD) and has only one client connection, from which I run the tests.
I disabled autovacuum to prevent unnecessary writes.
The calculation of written bytes is done with the following command; I want to find the total number of bytes written by all PostgreSQL processes (including the WAL writer):
pgrep postgres | xargs -I {} cat /proc/{}/io | grep ^write_bytes | cut -d' ' -f2 | python -c "import sys; print sum(int(l) for l in sys.stdin)"
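If a per-process breakdown is needed (like the one shown for test #4 below), the same /proc/<pid>/io data can be split by process, for example (a sketch):
for pid in $(pgrep postgres); do
    printf '%s | %s\n' \
        "$(tr '\0' ' ' < /proc/$pid/cmdline)" \
        "$(grep ^write_bytes /proc/$pid/io | cut -d' ' -f2)"
done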
Tests
With the # sign I marked a database command; with → I marked the result of the write_bytes sum after that command. The test case is simple: a table with just one int4 column, filled with 10000000 values.
Before every test I run a set of commands to free disk space and prevent additional writes:
# DELETE FROM test_inserts;
# VACUUM FULL test_inserts;
# DROP TABLE test_inserts;
Test #1: Unlogged table
As the documentation states, changes to an UNLOGGED table are not written to the WAL, so it's a good place to start:
# CREATE UNLOGGED TABLE test_inserts (f1 INT);
→ 1526276096
# INSERT INTO test_inserts SELECT generate_series(1, 10000000);
→ 1902977024
The difference is 376700928 bytes (~359MB), which sort of makes sense (ten million 4-byte integers plus row, page and other overhead), but it still looks like a lot, almost 10x the raw data size.
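As a cross-check (not part of the original numbers), the on-disk size of the freshly filled table can be read directly and compared against the write_bytes delta:
# SELECT pg_size_pretty(pg_relation_size('test_inserts'));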
Test #2: Unlogged table with primary key
# CREATE UNLOGGED TABLE test_inserts (f1 INT PRIMARY KEY);
→ 2379882496
# INSERT INTO test_inserts SELECT generate_series(1, 10000000);
→ 2967339008
The difference is 587456512 bytes (~560MB).
Test #3: regular table
# CREATE TABLE test_inserts (f1 INT);
→ 6460669952
# INSERT INTO test_inserts SELECT generate_series(1, 10000000);
→ 7603630080
There the difference is already 1142960128 bytes (~1090MB).
Test #4: regular table with primary key
# CREATE TABLE test_inserts (f1 INT PRIMARY KEY);
→ 12740534272
# INSERT INTO test_inserts SELECT generate_series(1, 10000000);
→ 14895218688
Now the difference is 2154684416 bytes (~2054MB), and after about 30 seconds an additional 100MB was written.
For this test case I made a breakdown by processes:
Process | Bytes written
/usr/lib/postgresql/9.6/bin/postgres | 0
\_ postgres: 9.6/main: checkpointer process | 99270656
\_ postgres: 9.6/main: writer process | 39133184
\_ postgres: 9.6/main: wal writer process | 186474496
\_ postgres: 9.6/main: stats collector process | 0
\_ postgres: 9.6/main: postgres testdb [local] idle | 1844658176
Any ideas or suggestions on how to measure the values I'm looking for correctly? Maybe it's a kernel bug? Or does PostgreSQL really do that many writes?
Edit: To double-check what write_bytes means, I wrote a simple Python script which confirmed that this value is indeed the number of bytes actually written.
Edit 2: For PostgreSQL 9.5, test case #1 showed 362577920 bytes and test #4 showed 2141343744 bytes, so it's not about the PG version.
Edit 3: Richard Huxton mentioned the Database Page Layout article, and I'd like to elaborate: I agree with the storage cost, which includes 24 bytes of row header, 4 bytes for the data itself and another 4 bytes for data alignment (usually 8-byte alignment), giving 32 bytes per row. With that number of rows it's about 320MB per table, which is what I got in test #1.
I could assume that the primary key in that case should be about the same size as the data, and that in test #4 both the data and the PK would be written to WAL. That gives something like 360MB x 4 = 1.4GB, which is less than the result I got.
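As a further cross-check on the WAL share specifically (again, not part of the original tests), 9.6 exposes the current WAL position, so the volume generated by a single INSERT can be measured directly; the LSN below is a placeholder to fill in by hand, and in version 10+ the functions are renamed to pg_current_wal_lsn / pg_wal_lsn_diff:
# SELECT pg_current_xlog_location();   -- note the returned LSN
# INSERT INTO test_inserts SELECT generate_series(1, 10000000);
# SELECT pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), '<lsn noted above>'));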