To what degree does PostgreSQL support parallel DDL? - postgresql

Looking here, it is clear that Oracle supports execution of DDL commands in parallel with scenarios clearly listed. I was wondering whether Postgres does indeed offer such functionality? I can find a lot of material on "parallel queries" for PostgreSQL but not so much when DDL is involved.
For example, can I execute multiple 'CREATE TABLE...AS SELECT' in parallel? And if not, how can I achieve such functionality? What happens if I have a temporary table (CREATE TEMP TABLE)? Do I need to configure something for locks?

From here:
Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:
The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within
a CTE, no parallel plans for that query will be generated.
(emphasis mine).
Which seems to suggest that Postgres will not "parallelize" any query that modifies the database structure, under any circumstances.
Running multiple queries simultaneously in Postgres requires one connection per running query.

Those are generic DDL statements, they are index operations and partition operations that can be parallelized.
If you check the Notes section of the CREATE INDEX statement, you'll see that parallel index building is supported :
PostgreSQL can build indexes while leveraging multiple CPUs in order to process the table rows faster. This feature is known as parallel index build. For index methods that support building indexes in parallel (currently, only B-tree), maintenance_work_mem specifies the maximum amount of memory that can be used by each index build operation as a whole, regardless of how many worker processes were started. Generally, a cost model automatically determines how many worker processes should be requested, if any.
Update
I suspect the real question is about CREATE TABLE ... AS though.
This is essentially a CREATE TABLE followed by an INSERT .. SELECT. The CREATE TABLE part can't be parallelized and doesn't have to - it's essentially a metadata operation. The SELECT on the other hand, could be parallelized easily. INSERT is a bit harder, but it's a matter of implementation.
As a_horse_with_no_name explains in a comment to this question, parallelization for CREATE TABLE AS was added in PostgreSQL 11 :
Improvements to parallelism, including:
CREATE INDEX can now use parallel processing while building a B-tree index
Parallelization is now possible in CREATE TABLE ... AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
Parallelized hash joins and parallelized sequential scans now perform better

Related

CREATE INDEX CONCURRENTLY is executed but the CONCURRENTLY option lost after creation?

In my Postgres 14.3 database on AWS RDS, I want to create an index without blocking other database operations. So I want to use the CONCURRENTLY option and I executed the following statement successfully.
CREATE INDEX CONCURRENTLY idx_test
ON public.ap_identifier USING btree (cluster_id);
But when checking the database with:
SELECT * FROM pg_indexes WHERE indexname = 'idx_test';
I only see: Index without CONCURRENTLY option
I am expecting the index is created with CONCURRENTLY option.
Is there any database switch to turn this feature on, or why does it seem to ignore CONCURRENTLY?
As has been commented, CONCURRENTLY is not a property of the index, but an instruction to create the index without blocking concurrent writes. The resulting index does not remember that option in any way. Read the chapter "Building Indexes Concurrently" in the manual.
Creating indexes on big tables can take a while. The system table
pg_stat_progress_create_index can be queried for progress reporting. While that is going on, CONCURRENTLY is still reflected in the command column.
To your consolation: once created, all indexes are "concurrent" anyway, in the sense that they are maintained automatically without blocking concurrent reads or writes (except for UNIQUE indexes that prevent duplicates.)

Does PostgreSQL automatically create an index of a table?

In PostgreSQL,
when I create a table, and doesn't create any index for it, will PostgreSQL automatically create some default index for the table?
When I later update and query the table several times, will PostgreSQL be smart enough to automatically create an index for me based on how and how often I update and query the table?
If not, what commands in PostgreSQL can help me manually choose an index that will improve the performance of the table?
Thanks.
No database engine will create indexes on its own. Indexes have an important impact on performance (when modifying the records), and it's your role to know and calculate the performance gain or drop to take a clever/informed decision. The only index which is automatically created is the PrimaryKey index.
The only thing your database engine will be "smart" about, is when and how to use the indexes which already exists. This is called the query optimizer, and it bases its decision on complex algorithms and internal statistics.
There are tools to analyze how the database works to suggest some indexes. But the best, and simplest way, is to use an EXPLAIN.
https://www.postgresql.org/docs/9.5/static/sql-explain.html

How does postgresql lock tables when inserting and selecting?

I'm migrating data from one table to another in an environment where any long locks or downtime is not acceptable, in total about 80000 rows. Essentially the query boils down to this simple case:
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 on table_1.id = table_3.id
All 3 tables are being read from and could have an insert at any time. I want to just run the query above, but I'm not sure how the locking works and whether the tables will be totally inaccessible during the operation. My understanding tells me that only the affected rows (newly inserted) will be locked. Table 1 is just being selected, so no harm, and concurrent inserts are safe so table 2 should be freely accessible.
Is this understanding correct, and can I run this query in a production environment without fear? If it's not safe, what is the standard way to accomplish this?
You're fine.
If you're interested in the details, you can read up on multiversion concurrency control, or on the details of the Postgres MVCC implementation, or how its various locking modes interact, but the implications for your case are nicely summarised in the docs:
reading never blocks writing and writing never blocks reading
In short, every record stored in the database has some version number attached to it, and every query knows which versions to consider and which to ignore.
This means that an INSERT can safely write to a table without locking it, as any concurrent queries will simply ignore the new rows until the inserting transaction decides to commit.

How to perform a Bulk Insert in Sybase SQL

I need to insert a Big amount of data(Some Millions) and I need to perform it Quickly.
I read about Bulk insert via ODBC on .NET and JAVA But I need to perform it directly on the Database.
I also read about Batch Insert but What I have tried have not seemed to work
Batch Insert, Example
I'm executing a INSERT SELECT but it's taking something like 0,360s per row, this is very slow and I need to perform some improvements here.
I would really appreciate some guidance here with examples and documentation if possible.
DATABASE: SYBASE ASE 15.7
Expanding on some of the comments ...
blocking, slow disk IO, and any other 'wait' events (ie, anything other than actual insert/update activity) can be ascertained from the master..monProcessWaits table (where SPID = spid_of_your_insert_update_process) [see the P&T manual for Monitoring Tables (aka MDA tables)]
master..monProcessObject and master..monProcessStatement will show logical/physical IOs for currently running queries [again, see P&T manual for MDA tables]
master..monSysStatement will show logical/physical IOs for recently completed queries [again, see P&T manual for MDA tables]
for UPDATE statements you'll want to take a look at the query plan to see if you're suffering from a poor join order; also of key importance ... direct (fast/good) updates vs deferred (slow/bad) updates; deferred updates can occur for many reasons ... some fixable, some not ... updating indexed columns, poor join order, updates that cause page splits and/or row forwardings
RI (PK/FK) constraints can be viewed with sp_helpconstraint table_name; query plans will also show the under-the-covers joins required when performing RI (PK/FK) validations during inserts/updates/deletes
triggers are a bit harder to locate (an official sp_helptrigger doesn't show up until ASE 16); check the sysobjects.[ins|upd|del]trig where name = your_table - these represent the object id(s) of any insert/update/delete triggers on the table; also check sysobjects records where type = 'TR' and deltrig = object_id(your_table) - provides support for additional insert/update/delete triggers (don't recall at moment if this is just ASE 16+)
if triggers are being fired, need to review the associated query plans to make sure the inserted and deleted tables (if referenced) are driving any queries where these pseudo tables are joined with permanent tables
There are likely some areas I'm forgetting (off the top of my head) ... key take away is that there could be many reasons for 'slow' DML statements.
One (relatively) quick way to find out if RI (PK/FK) constraints or triggers are at play ...
set showplan on
go
insert/update/delete statements
go
Then review the resulting query plan(s); if you see references to any tables other than the ones explicitly listed in the insert/update/delete statements then you're likely dealing with RI constraints and/or triggers.

oracle indexes to be executed in parallel

In oracle, Can i run index rebuild in parallel , all indexes belonging to same table.
We have a dyanmic sql procedure for rebuilding indexes, so can call that procedure in parallel for 3 indexes belonging to same table.
Regards,
Shi
I think it should cause no problems, you might want to check if the columns of the index overlap or not.
refer this for more info:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2913600659112
Thanks
You must create indexes on existing table with data with PARALLEL ON and you can recreate them afterwards with NO-PARALLEL to avoid further processing degradation while insert data.