When I run select * from pg_stat_user_indexes for one of the tables in production, two indexes show zero in all three columns: idx_scan, idx_tup_read and idx_tup_fetch.
One of the indexes is 12 GB in size and the other is around 6.5 GB.
Does this mean these indexes are not used?
Postgres version 11.8
In Redshift I had a cluster with 4 nodes of type dc2.large.
The total size of the cluster was 160*4=640 GB. The system showed storage as 100% full. The size of the database was close to 640 GB.
Query I use to check the size of the db:
select sum(used_mb) from (
    select "schema" as table_schema,
           "table" as table_name,
           size as used_mb  -- svv_table_info reports size in 1 MB blocks
    from svv_table_info
) as t;  -- a subquery in FROM needs an alias
I added 2 dc2.large nodes with a classic resize, which brought the size of the cluster to 160*6=960 GB. But when I checked the size of the database, I saw that it had also grown and again took up almost 100% of the larger cluster.
The database size grew with the size of the cluster!
I had to perform an additional resize, this time an elastic one, from 6 nodes to 12 nodes. The size of the data remained close to 960 GB.
How is it possible that the size of the database grew from 640 GB to 960 GB as a result of a cluster resize operation?
I'd guess that your database has a lot of small tables in it. There are other ways this can happen, but this is by far the most likely cause. Redshift uses a 1 MB "block" as its minimum storage unit, which is great for storing large tables but inefficient for small ones (< 1M rows per slice in the cluster).
If you have a table with, say, 100K rows spread across your 4 dc2.large nodes (8 slices), each slice holds 12.5K rows. Each column of this table needs at least 1 block (1 MB) per slice to store its data. However, a block can hold about 200K rows (per column) on average, so the blocks for this table are mostly empty. If you add rows, the on-disk size (post vacuum) doesn't increase. Now if you add 50% more nodes, you also add 50% more slices, which just adds 50% more nearly empty blocks to the table's storage.
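The arithmetic above can be sketched in a few lines of Python. This is only a rough model under the assumptions stated in this answer (one 1 MB block per column per slice, rows spread evenly over slices); the column count of 10 is a made-up value for illustration, and the model ignores any system overhead.

```python
MB_PER_BLOCK = 1       # minimum storage unit described above
SLICES_PER_NODE = 2    # implied by "4 dc2.large nodes (8 slices)"


def min_table_footprint_mb(columns: int, slices: int) -> int:
    """Lower bound on table size: one block per column per slice."""
    return columns * slices * MB_PER_BLOCK


def rows_per_slice(total_rows: int, slices: int) -> float:
    """Rows held by each slice, assuming an even distribution."""
    return total_rows / slices


COLUMNS = 10           # hypothetical column count, not from the question
ROWS = 100_000

for nodes in (4, 6):
    slices = nodes * SLICES_PER_NODE
    print(
        f"{nodes} nodes, {slices} slices: "
        f"{rows_per_slice(ROWS, slices):.0f} rows per slice, "
        f"at least {min_table_footprint_mb(COLUMNS, slices)} MB on disk"
    )
```

Under this model the same 100K rows occupy at least 80 MB on 4 nodes and 120 MB on 6 nodes, a 50% increase with no new data.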
If this isn't your case I can expand on other ways this can happen but this really is the most likely in my experience. Unfortunately the fix for this is often to revamp your data model or to offload some less used data to Spectrum (S3).
Explanation
I have 2 tables in PostgreSQL using the PostGIS extension. Both tables are representing streets as linestrings from a province.
The streetsA table (orange lines) has a table size of 96 MB (471026 rows); the second table, streetsB (green lines), has a storage size of 78 MB (139708 rows). The streets differ slightly in their positions, which is why I applied the ST_Snap function to match streetsB to streetsA.
create table snapped as select ST_snap(a.geom, b.geom, ST_Distance(a.geom, b.geom)*0.5) from streetsA as a, streetsB as b;
However, due to the large size of the tables, the query takes more than 5 hours to complete. I haven't changed anything in the Postgres settings. Is it a good idea to run the query on such a large dataset? Does a spatial index make sense for this query? I am using a laptop with 16 GB of RAM and a Core i7.
EXPLAIN gives me the following output:
Nested Loop (cost=0.00..5264516749.25 rows=65806100408 width=32)
  -> Seq Scan on streetsa a (cost=0.00..16938.26 rows=471026 width=153)
  -> Materialize (cost=0.00..12127.62 rows=139708 width=206)
        -> Seq Scan on streets b (cost=0.00..11429.08 rows=139708
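As a quick sanity check on the plan, the Nested Loop row estimate can be compared with the product of the two scan estimates. This is a minimal sketch; the function name cross_join_rows is made up for illustration. If the two numbers match, the plan is pairing every row of one table with every row of the other, which is what a join without a join condition does.

```python
def cross_join_rows(rows_a: int, rows_b: int) -> int:
    """Row count of an unconstrained join: every pair of rows."""
    return rows_a * rows_b


rows_streets_a = 471_026   # Seq Scan estimate for streetsA
rows_streets_b = 139_708   # Seq Scan estimate for streetsB
planner_estimate = 65_806_100_408  # rows= on the Nested Loop node

pairs = cross_join_rows(rows_streets_a, rows_streets_b)
print(f"{pairs:,} row pairs")
print("matches planner estimate:", pairs == planner_estimate)
```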
The first table is 10 GB with 8 million rows, but the imported table is 8 GB with 6 million rows. Why is there so much less data?
I am loading about 300 GB of contour line data into a PostGIS table. To speed up the process, I read that it is fastest to load the data first and then create the index. Loading the data took only about 2 days, but I have now been waiting for the index for about 30 days, and it is still not ready.
The query was:
create index idx_contour_geom on contour.contour using gist(geom);
I ran it in pgAdmin 4, and the memory consumption of the program has varied from 500 MB to over 100 GB since.
Is it normal for indexing such a database to take this long?
Any tips on how to speed up the process?
Edit:
The data is loaded from 1x1 degree (lat/lon) cells (about 30,000 cells), so no line has a bounding box larger than 1x1 degree, and most of them should be much smaller. They are in the EPSG:4326 projection, and the only attributes are height and the geometry (geom).
I changed maintenance_work_mem to 1GB and stopped all other writing to disk (a lot of insert operations had ANALYZE appended, which took a lot of resources). The index build then ran in 23 minutes.
I am going to start a new project using Postgres that requires a large number of tables and columns. Is the number of columns in a Postgres table limited? If so, what is the maximum number of columns in CREATE and SELECT statements?
Since Postgres 12, the official list of limitations can be found in the manual:
Item                    Upper Limit      Comment
------------------------------------------------------------------------------
database size           unlimited
number of databases     4,294,950,911
relations per database  1,431,650,303
relation size           32 TB            with the default BLCKSZ of 8192 bytes
rows per table          limited by the number of tuples that can fit onto 4,294,967,295 pages
columns per table       1600             further limited by tuple size fitting on a single page; see note below
field size              1 GB
identifier length       63 bytes         can be increased by recompiling PostgreSQL
indexes per table       unlimited        constrained by maximum relations per database
columns per index       32               can be increased by recompiling PostgreSQL
partition keys          32               can be increased by recompiling PostgreSQL
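The 32 TB relation size appears to follow from the page count and block size in the same table. A minimal sketch of that arithmetic, assuming "TB" here means 2**40 bytes (the function name is made up for illustration):

```python
BLCKSZ = 8192               # default block size in bytes, from the table
MAX_PAGES = 4_294_967_295   # page limit from the table, i.e. 2**32 - 1


def max_relation_bytes(block_size: int = BLCKSZ, pages: int = MAX_PAGES) -> int:
    """Largest relation that fits in the given number of pages."""
    return block_size * pages


size_bytes = max_relation_bytes()
# Assumption: binary terabytes (2**40 bytes)
print(f"{size_bytes:,} bytes = {size_bytes / 2**40:.2f} TB")
```

A larger BLCKSZ, which requires recompiling PostgreSQL, would scale this limit proportionally under the same assumption.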
Before that, there was an official list on the PostgreSQL "About" page. Quote for Postgres 9.5:
Limit                       Value
Maximum Database Size       Unlimited
Maximum Table Size          32 TB
Maximum Row Size            1.6 TB
Maximum Field Size          1 GB
Maximum Rows per Table      Unlimited
Maximum Columns per Table   250 - 1600 depending on column types
Maximum Indexes per Table   Unlimited
If you get anywhere close to those limits, chances are you are doing something wrong.