Retrieve the progress of an index creation process in PostgreSQL

Consider a long-running index creation query in PostgreSQL, something like this:
CREATE INDEX some_idx
ON some_table USING btree
(some_varchar_column COLLATE pg_catalog."default");
The question is: how to retrieve the progress of this query process? Is it possible or not?
It would be interesting to know the way in any of these cases:
using pgAdmin
using SQL
using some internal PostgreSQL tools.
Maybe this additional info could influence the answer: PostgreSQL 9.3 on Windows 7 64-bit.

There's a wiki page on this very topic, which links to several resources; their accuracy was already in question as of a few years ago. There's also a thread on pgsql-hackers from 2006 or 2007 about adding progress indicators, in which EnterpriseDB's Greg Stark makes the same point.

In Postgres v12+ the view pg_stat_progress_create_index should give you this information.
https://www.postgresql.org/docs/12/progress-reporting.html
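For example, from a second session you can join the progress view against pg_stat_activity to see the phase and an approximate percentage done (column names below are from the v12 documentation; the percentage expression is just an illustrative sketch, since blocks_total can be zero in some phases):

```sql
-- Monitor a running CREATE INDEX from another session (PostgreSQL 12+).
SELECT p.pid,
       p.phase,
       p.blocks_done,
       p.blocks_total,
       round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_done,
       a.query
FROM pg_stat_progress_create_index AS p
JOIN pg_stat_activity AS a USING (pid);
```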

Related

Firebird equivalent to MySQL table partitioning

I'm working with Firebird 2.5. I have used MySQL table partitioning in the past to help optimize very large tables by creating partitions based on year. I would like to do the same thing, if possible, in Firebird but I'm having trouble finding any documentation.
Does anyone know if this is possible and if so, can you please point me toward some documentation?
Firebird does not support table partitioning, which is also why you can't find anything about it in the documentation.
Depending on the exact performance problem you're trying to solve and the queries you use, choosing your indexes well may already solve part of the problem.

DBLINK vs Postgres_FDW, which one may provide better performance?

I have a use case to distribute data across many databases on many servers, all in postgres tables.
From any given server/db, I may need to query another server/db.
The queries are quite basic, standard selects with where clauses on standard fields.
I have currently implemented postgres_fdw (I'm using Postgres 9.5), but I think the queries are not using indexes on the remote db.
For this use case (a random node may query N other nodes), which is likely my best performance choice based on how each underlying engine actually executes?
The Postgres foreign data wrapper (postgres_fdw) is newer to PostgreSQL, so it tends to be the recommended method. While the functionality in the dblink extension is similar to that in the foreign data wrapper, the Postgres foreign data wrapper is more SQL-standard compliant and can provide improved performance over dblink connections.
Read this article for more detailed info: Cross Database querying
My solution was simple: I upgraded to Postgres 10, and it appears to push where clauses down to the remote server.
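For reference, a minimal postgres_fdw setup looks roughly like the sketch below. The server name, host, credentials, and table definition are all placeholders, not taken from the question:

```sql
-- Sketch: attach a remote Postgres table via postgres_fdw.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_node
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', dbname 'appdb', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_node
  OPTIONS (user 'app_user', password 'secret');

CREATE FOREIGN TABLE remote_orders (
  id         bigint,
  customer   text,
  created_at timestamptz
)
SERVER remote_node
OPTIONS (schema_name 'public', table_name 'orders');

-- On Postgres 10+ a simple WHERE clause like this can be pushed down
-- to the remote server, letting it use its own indexes.
SELECT * FROM remote_orders WHERE customer = 'acme';
```

Running EXPLAIN VERBOSE on the local query shows the SQL actually sent to the remote side, which is the easiest way to check whether the WHERE clause is being pushed down.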

Is it possible to do Sharding in PostgreSQL without any extra plugin?

I want to do sharding in PostgreSQL without using the Citus plugin.
Can anybody suggest how to do it?
PostgreSQL 10 actually added support for native partitioning, and it was released less than a week ago. Have a look here for some examples of the SQL syntax:
https://postgrespro.co.il/blog/whats-new-in-postgresql-10-part-2-native-partitioning/
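The declarative partitioning syntax introduced in PostgreSQL 10 looks like this (table and column names are illustrative). Note that this is partitioning within a single server; combining it with postgres_fdw is what gets you closer to sharding across nodes:

```sql
-- PostgreSQL 10 native (declarative) range partitioning by year.
CREATE TABLE measurements (
  id       bigserial,
  taken_at timestamptz NOT NULL,
  value    numeric
) PARTITION BY RANGE (taken_at);

CREATE TABLE measurements_2017 PARTITION OF measurements
  FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

CREATE TABLE measurements_2018 PARTITION OF measurements
  FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- Rows are routed to the matching partition automatically.
INSERT INTO measurements (taken_at, value)
VALUES ('2017-06-15', 42.0);
```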

Postgresql-hll (or another Hyperloglog data type/structure) for Redshift

Need to be able to report on Unique Visitors, but would like to avoid pre-computing every possible permutation of keys and creating multiple tables.
As a simplistic example, let's say I need to report Monthly Uniques in a table that has the following columns
date (Month/Year)
page_id
country_id
device_type_id
monthly_uniques
In Druid and Redis, Hyperloglog data type will take care of this (assuming a small margin of error is acceptable), where I would be able to run a query by any combination of the dimensions and receive a viable estimate of the uniques.
The closest I was able to find in the PostgreSQL world is the postgresql-hll plugin, but it seems to be for PostgreSQL 9.0+.
Is there a way to represent this in Redshift without either having to pre-compute or store visitor IDs (greatly inflating the table size, but allowing to use RedShift's "approximate count" hll implementation)?
Note: RedShift is the preferred platform, but I already know that other self-hosted PostgreSQL forks can support this, such as CitusDB. Looking for ways to do this with RedShift.
Redshift recently announced support for HyperLogLog Sketches:
https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-redshift-announces-support-hyperloglog-sketches/
https://docs.aws.amazon.com/redshift/latest/dg/hyperloglog-overview.html
UPDATE: blog post on HLL usage https://aws.amazon.com/blogs/big-data/use-hyperloglog-for-trend-analysis-with-amazon-redshift/
Redshift announced new HLL capabilities in October 2020. If your Redshift release version is 1.0.19097 or later, you can use all the available HLL functions. See more in the AWS Redshift documentation here.
You can do something like
SELECT hll(column_name) AS unique_count FROM YOURTABLE;
or create HLL sketches directly
Redshift, while technically postgresql-derived, was forked over ten years ago. It still speaks the same line protocol as postgres, but its code has diverged a great deal. Among other incompatibilities, it no longer allows for custom datatypes. That means that the type of plugin you're looking to use is not going to be feasible.
However, as you pointed out, if you're able to get all the raw data in, you can use the built-in approximation capability.
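To avoid pre-computing every permutation of dimensions, you can store one sketch per dimension combination and merge sketches at query time with Redshift's HLL functions (HLL_CREATE_SKETCH, HLL_COMBINE, HLL_CARDINALITY). The table and column names below are illustrative, not from the question:

```sql
-- Sketch: pre-aggregate one HLL sketch per (month, page) combination.
CREATE TABLE monthly_sketches AS
SELECT date_trunc('month', visit_date) AS month,
       page_id,
       hll_create_sketch(visitor_id)   AS visitors_sketch
FROM visits
GROUP BY 1, 2;

-- Merge sketches across pages to estimate uniques per month,
-- without ever storing raw visitor IDs long-term.
SELECT month,
       hll_cardinality(hll_combine(visitors_sketch)) AS monthly_uniques
FROM monthly_sketches
GROUP BY month;
```

Because sketches are mergeable, any rollup across the stored dimensions (by country, by device, etc., if included as grouping keys) yields a viable estimate without a separate pre-computed table per combination.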

PostgreSQL how to DROP TABLE with extreme prejudice [duplicate]

This question already has answers here:
How to drop a PostgreSQL database if there are active connections to it?
(12 answers)
Closed 7 years ago.
We use postgres as a real-time data cache for observations. We need to drop our tables on a daily basis. There are frequently clients that still have the db open for reading, actually they have it open for read/write and don't realize it. We have specifically noted that Python opens it rw and keeps a permanent transaction lock on the DB. This prevents us from dropping the tables.
The data table can have different number of columns on a daily basis, so 'delete from table' does not appear to be an option.
We have tried creating a read-only user, but that did not help, it was still getting "IDLE in transaction".
Is there any kind of 'kill -9' for dropping tables?
We are currently on PostgreSQL 8.4 on RHEL 6, but will be migrating to RHEL 7 soon.
If you have administrative access then you can kill all the current sessions. I think your question is similar to this.
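A minimal sketch of that approach, run as a superuser from a session connected to the same database (note that on 8.4 the pid column in pg_stat_activity is named procpid; it was renamed to pid in 9.2, and the table name here is a placeholder):

```sql
-- Terminate every other session connected to the current database,
-- then drop the table before readers can reattach.
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND procpid <> pg_backend_pid();

DROP TABLE IF EXISTS observations;
```

DROP TABLE still needs an exclusive lock, so issuing it immediately after the terminate call (ideally in the same script) minimizes the window in which a client can reconnect and block it again.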