Show Disk Space Used for all Tables - Azure SQL Data Warehouse - tsql

We are transitioning to Azure SQL Data Warehouse, and one issue that has been highlighted is the need to change some smaller tables from Round-Robin / Hash-distributed to Replicated to improve performance.
Microsoft's design guidance suggests that one criterion for this decision is a table taking up less than 2 GB of disk space, i.e. such tables could be made into Replicated tables. It suggests using DBCC PDW_SHOWSPACEUSED to determine this.
I can run this against the whole DB, or against one specific table, but I'd really like to get a list of all tables and the space each one uses (preferably in MB), and that is beyond me.
A lot of Google searching either gives me the two basic commands I already know (against the whole DB / against one table) or gives me SQL Server queries that don't run against Azure DW, e.g. ones using sys.allocation_units, which is not supported in Azure DW.

I was just directed to this Microsoft article that provides a pretty solid solution to this problem.
In particular, create a view:
CREATE VIEW dbo.vTableSizes
AS
WITH base
AS (
SELECT
GETDATE() AS [execution_time],
DB_NAME() AS [database_name],
s.name AS [schema_name],
t.name AS [table_name],
QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS [two_part_name],
nt.[name] AS [node_table_name],
ROW_NUMBER() OVER (PARTITION BY
nt.[name]
ORDER BY
(
SELECT
NULL
)
) AS [node_table_name_seq],
tp.[distribution_policy_desc] AS [distribution_policy_name],
c.[name] AS [distribution_column],
nt.[distribution_id] AS [distribution_id],
i.[type] AS [index_type],
i.[type_desc] AS [index_type_desc],
nt.[pdw_node_id] AS [pdw_node_id],
pn.[type] AS [pdw_node_type],
pn.[name] AS [pdw_node_name],
di.name AS [dist_name],
di.position AS [dist_position],
nps.[partition_number] AS [partition_nmbr],
nps.[reserved_page_count] AS [reserved_space_page_count],
nps.[reserved_page_count] - nps.[used_page_count] AS [unused_space_page_count],
nps.[in_row_data_page_count] + nps.[row_overflow_used_page_count] + nps.[lob_used_page_count] AS [data_space_page_count],
nps.[reserved_page_count] - (nps.[reserved_page_count] - nps.[used_page_count])
- ([in_row_data_page_count] + [row_overflow_used_page_count] + [lob_used_page_count]) AS [index_space_page_count],
nps.[row_count] AS [row_count]
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON s.[schema_id] = t.[schema_id]
INNER JOIN
sys.indexes i
ON t.[object_id] = i.[object_id]
AND i.[index_id] <= 1
INNER JOIN
sys.pdw_table_distribution_properties tp
ON t.[object_id] = tp.[object_id]
INNER JOIN
sys.pdw_table_mappings tm
ON t.[object_id] = tm.[object_id]
INNER JOIN
sys.pdw_nodes_tables nt
ON tm.[physical_name] = nt.[name]
INNER JOIN
sys.dm_pdw_nodes pn
ON nt.[pdw_node_id] = pn.[pdw_node_id]
INNER JOIN
sys.pdw_distributions di
ON nt.[distribution_id] = di.[distribution_id]
INNER JOIN
sys.dm_pdw_nodes_db_partition_stats nps
ON nt.[object_id] = nps.[object_id]
AND nt.[pdw_node_id] = nps.[pdw_node_id]
AND nt.[distribution_id] = nps.[distribution_id]
AND i.[index_id] = nps.[index_id]
LEFT OUTER JOIN
(
SELECT
*
FROM
sys.pdw_column_distribution_properties
WHERE
distribution_ordinal = 1
) cdp
ON t.[object_id] = cdp.[object_id]
LEFT OUTER JOIN
sys.columns c
ON cdp.[object_id] = c.[object_id]
AND cdp.[column_id] = c.[column_id]
WHERE
pn.[type] = 'COMPUTE'),
size
AS ( SELECT
[execution_time],
[database_name],
[schema_name],
[table_name],
[two_part_name],
[node_table_name],
[node_table_name_seq],
[distribution_policy_name],
[distribution_column],
[distribution_id],
[index_type],
[index_type_desc],
[pdw_node_id],
[pdw_node_type],
[pdw_node_name],
[dist_name],
[dist_position],
[partition_nmbr],
[reserved_space_page_count],
[unused_space_page_count],
[data_space_page_count],
[index_space_page_count],
[row_count],
([reserved_space_page_count] * 8.0) AS [reserved_space_KB],
([reserved_space_page_count] * 8.0) / 1000 AS [reserved_space_MB],
([reserved_space_page_count] * 8.0) / 1000000 AS [reserved_space_GB],
([reserved_space_page_count] * 8.0) / 1000000000 AS [reserved_space_TB],
([unused_space_page_count] * 8.0) AS [unused_space_KB],
([unused_space_page_count] * 8.0) / 1000 AS [unused_space_MB],
([unused_space_page_count] * 8.0) / 1000000 AS [unused_space_GB],
([unused_space_page_count] * 8.0) / 1000000000 AS [unused_space_TB],
([data_space_page_count] * 8.0) AS [data_space_KB],
([data_space_page_count] * 8.0) / 1000 AS [data_space_MB],
([data_space_page_count] * 8.0) / 1000000 AS [data_space_GB],
([data_space_page_count] * 8.0) / 1000000000 AS [data_space_TB],
([index_space_page_count] * 8.0) AS [index_space_KB],
([index_space_page_count] * 8.0) / 1000 AS [index_space_MB],
([index_space_page_count] * 8.0) / 1000000 AS [index_space_GB],
([index_space_page_count] * 8.0) / 1000000000 AS [index_space_TB]
FROM
base)
SELECT
*
FROM
size;
Then, this "Table space summary" query provides a list of tables and how much space each is currently using (among other information):
SELECT
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc,
COUNT(DISTINCT partition_nmbr) AS nbr_partitions,
SUM(row_count) AS table_row_count,
SUM(reserved_space_GB) AS table_reserved_space_GB,
SUM(data_space_GB) AS table_data_space_GB,
SUM(index_space_GB) AS table_index_space_GB,
SUM(unused_space_GB) AS table_unused_space_GB
FROM
dbo.vTableSizes
GROUP BY
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc
ORDER BY
table_reserved_space_GB DESC;
Here's a sample execution against one of my databases showing the table names sorted by space used.
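To relate the summary output back to the 2 GB guidance from the question, the unit conversion in the view can be sketched as below. This is only an illustration: the table names and page counts are made up, and the helper names pages_to_mb and replication_candidates are hypothetical. It assumes 8 KB pages, as the view does when it multiplies page counts by 8.0. Note that the view divides KB by 1000 to get MB; pass 1024 instead if you want binary megabytes.

```python
PAGE_KB = 8  # the view multiplies page counts by 8.0, i.e. 8 KB per page


def pages_to_mb(page_count, kb_per_mb=1000):
    """Convert a page count to MB.

    kb_per_mb=1000 mirrors the view's arithmetic; use 1024 for binary MB.
    """
    return page_count * PAGE_KB / kb_per_mb


def replication_candidates(tables, limit_mb=2000):
    """Return names of tables under limit_mb of reserved space, smallest first."""
    sized = [(name, pages_to_mb(pages)) for name, pages in tables.items()]
    return [name for name, mb in sorted(sized, key=lambda x: x[1]) if mb < limit_mb]


# Hypothetical reserved page counts per table, already summed over all distributions.
sample = {
    "dbo.DimDate": 1_250,
    "dbo.DimCustomer": 90_000,
    "dbo.FactSales": 48_000_000,
}

print(replication_candidates(sample))
```

With these made-up numbers the two dimension tables fall under the threshold and the fact table does not. Size is only one of the criteria in the guidance, so treat the result as a shortlist rather than a decision.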

The intensity of the insert operation in a table of MS SQL Server 2008 R2

I need to know how many INSERT operations occur per second for a table.
I've looked for scripts here: https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/track-data-changes-sql-server?view=sql-server-2017 but nothing there helps.
Any clue? Thanks!
Finally I found this script to get the count of records. If we execute it periodically, we can build a chart of record count per table over time.
SELECT
t.NAME AS TableName,
p.rows AS RowCounts,
CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS TotalSpaceMB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB,
CAST(ROUND(((SUM(a.total_pages) - SUM(a.used_pages)) * 8) / 1024.00, 2) AS NUMERIC(36, 2)) AS UnusedSpaceMB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY RowCounts desc
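Turning periodic row counts into a rate is a matter of dividing the change in count by the elapsed time. A minimal sketch, assuming you have already collected (timestamp, row count) samples for one table; the function name insert_rates and the sample values are hypothetical. It measures net growth only, so deletes between two samples make the insert rate look lower than it really is.

```python
from datetime import datetime


def insert_rates(samples):
    """Return net rows added per second for each interval between samples.

    samples: list of (timestamp, row_count) tuples in chronological order.
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        seconds = (t1 - t0).total_seconds()
        rates.append((n1 - n0) / seconds)
    return rates


# Hypothetical samples of one table's row count.
samples = [
    (datetime(2024, 1, 1, 12, 0, 0), 1_000),
    (datetime(2024, 1, 1, 12, 0, 10), 1_250),
    (datetime(2024, 1, 1, 12, 0, 30), 1_350),
]

print(insert_rates(samples))
```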

Postgresql: Select query on view returning no records

I have a view named vw_check_space in my public schema (using postgresql 9.4). When I run a
select * from public.vw_check_space;
as the postgres user, I get a list of rows, but when I run the same query as another user, 'user1', it returns nothing.
View:
CREATE OR REPLACE VIEW public.vw_check_space AS
WITH constants AS (
SELECT current_setting('block_size'::text)::numeric AS bs,
23 AS hdr,
8 AS ma
), no_stats AS (
SELECT columns.table_schema,
columns.table_name,
psut.n_live_tup::numeric AS est_rows,
pg_table_size(psut.relid::regclass)::numeric AS table_size
FROM columns
JOIN pg_stat_user_tables psut ON columns.table_schema::name = psut.schemaname AND columns.table_name::name = psut.relname
LEFT JOIN pg_stats ON columns.table_schema::name = pg_stats.schemaname AND columns.table_name::name = pg_stats.tablename AND columns.column_name::name = pg_stats.attname
WHERE pg_stats.attname IS NULL AND (columns.table_schema::text <> ALL (ARRAY['pg_catalog'::character varying, 'information_schema'::character varying]::text[]))
GROUP BY columns.table_schema, columns.table_name, psut.relid, psut.n_live_tup
), null_headers AS (
SELECT constants.hdr + 1 + sum(
CASE
WHEN pg_stats.null_frac <> 0::double precision THEN 1
ELSE 0
END) / 8 AS nullhdr,
sum((1::double precision - pg_stats.null_frac) * pg_stats.avg_width::double precision) AS datawidth,
max(pg_stats.null_frac) AS maxfracsum,
pg_stats.schemaname,
pg_stats.tablename,
constants.hdr,
constants.ma,
constants.bs
FROM pg_stats
CROSS JOIN constants
LEFT JOIN no_stats ON pg_stats.schemaname = no_stats.table_schema::name AND pg_stats.tablename = no_stats.table_name::name
WHERE (pg_stats.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND no_stats.table_name IS NULL AND (EXISTS ( SELECT 1
FROM columns
WHERE pg_stats.schemaname = columns.table_schema::name AND pg_stats.tablename = columns.table_name::name))
GROUP BY pg_stats.schemaname, pg_stats.tablename, constants.hdr, constants.ma, constants.bs
), data_headers AS (
SELECT null_headers.ma,
null_headers.bs,
null_headers.hdr,
null_headers.schemaname,
null_headers.tablename,
(null_headers.datawidth + (null_headers.hdr + null_headers.ma -
CASE
WHEN (null_headers.hdr % null_headers.ma) = 0 THEN null_headers.ma
ELSE null_headers.hdr % null_headers.ma
END)::double precision)::numeric AS datahdr,
null_headers.maxfracsum * (null_headers.nullhdr + null_headers.ma -
CASE
WHEN (null_headers.nullhdr % null_headers.ma::bigint) = 0 THEN null_headers.ma::bigint
ELSE null_headers.nullhdr % null_headers.ma::bigint
END)::double precision AS nullhdr2
FROM null_headers
), table_estimates AS (
SELECT data_headers.schemaname,
data_headers.tablename,
data_headers.bs,
pg_class.reltuples::numeric AS est_rows,
pg_class.relpages::numeric * data_headers.bs AS table_bytes,
ceil(pg_class.reltuples * (data_headers.datahdr::double precision + data_headers.nullhdr2 + 4::double precision + data_headers.ma::double precision -
CASE
WHEN (data_headers.datahdr % data_headers.ma::numeric) = 0::numeric THEN data_headers.ma::numeric
ELSE data_headers.datahdr % data_headers.ma::numeric
END::double precision) / (data_headers.bs - 20::numeric)::double precision) * data_headers.bs::double precision AS expected_bytes,
pg_class.reltoastrelid
FROM data_headers
JOIN pg_class ON data_headers.tablename = pg_class.relname
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid AND data_headers.schemaname = pg_namespace.nspname
WHERE pg_class.relkind = 'r'::"char"
), estimates_with_toast AS (
SELECT table_estimates.schemaname,
table_estimates.tablename,
true AS can_estimate,
table_estimates.est_rows,
table_estimates.table_bytes + COALESCE(toast.relpages, 0)::numeric * table_estimates.bs AS table_bytes,
table_estimates.expected_bytes + ceil(COALESCE(toast.reltuples, 0::real) / 4::double precision) * table_estimates.bs::double precision AS expected_bytes
FROM table_estimates
LEFT JOIN pg_class toast ON table_estimates.reltoastrelid = toast.oid AND toast.relkind = 't'::"char"
), table_estimates_plus AS (
SELECT current_database() AS databasename,
estimates_with_toast.schemaname,
estimates_with_toast.tablename,
estimates_with_toast.can_estimate,
estimates_with_toast.est_rows,
CASE
WHEN estimates_with_toast.table_bytes > 0::numeric THEN estimates_with_toast.table_bytes
ELSE NULL::numeric
END AS table_bytes,
CASE
WHEN estimates_with_toast.expected_bytes > 0::double precision THEN estimates_with_toast.expected_bytes::numeric
ELSE NULL::numeric
END AS expected_bytes,
CASE
WHEN estimates_with_toast.expected_bytes > 0::double precision AND estimates_with_toast.table_bytes > 0::numeric AND estimates_with_toast.expected_bytes <= estimates_with_toast.table_bytes::double precision THEN (estimates_with_toast.table_bytes::double precision - estimates_with_toast.expected_bytes)::numeric
ELSE 0::numeric
END AS bloat_bytes
FROM estimates_with_toast
UNION ALL
SELECT current_database() AS databasename,
no_stats.table_schema,
no_stats.table_name,
false AS bool,
no_stats.est_rows,
no_stats.table_size,
NULL::numeric AS "numeric",
NULL::numeric AS "numeric"
FROM no_stats
), bloat_data AS (
SELECT current_database() AS databasename,
table_estimates_plus.schemaname,
table_estimates_plus.tablename,
table_estimates_plus.can_estimate,
table_estimates_plus.table_bytes,
round(table_estimates_plus.table_bytes / (1024::double precision ^ 2::double precision)::numeric, 3) AS table_mb,
table_estimates_plus.expected_bytes,
round(table_estimates_plus.expected_bytes / (1024::double precision ^ 2::double precision)::numeric, 3) AS expected_mb,
round(table_estimates_plus.bloat_bytes * 100::numeric / table_estimates_plus.table_bytes) AS pct_bloat,
round(table_estimates_plus.bloat_bytes / (1024::numeric ^ 2::numeric), 2) AS mb_bloat,
table_estimates_plus.est_rows
FROM table_estimates_plus
)
SELECT bloat_data.databasename,
bloat_data.schemaname,
bloat_data.tablename,
bloat_data.can_estimate,
bloat_data.table_bytes,
bloat_data.table_mb,
bloat_data.expected_bytes,
bloat_data.expected_mb,
bloat_data.pct_bloat,
bloat_data.mb_bloat,
bloat_data.est_rows
FROM bloat_data
ORDER BY bloat_data.pct_bloat DESC;
I have granted the connect privilege on the database, and the usage and select privileges, to user1. I am not sure which other privileges I might be missing here. Any help would be appreciated.
PS: I have also granted usage and select on the schema and tables that the view uses.
https://www.postgresql.org/docs/9.4/static/view-pg-stats.html
The view pg_stats provides access to the information stored in the
pg_statistic catalog. This view allows access only to rows of
pg_statistic that correspond to tables the user has permission to
read, and therefore it is safe to allow public read access to this
view.
https://www.postgresql.org/docs/9.4/static/monitoring-stats.html
pg_stat_user_tables Same as pg_stat_all_tables, except that only user
tables are shown.
So even after you grant the user read access to tables owned by someone else, the view still inner-joins pg_stat_user_tables, which may cut the list down to only the tables you own. Either exclude it from the view, or use a left outer join instead of an inner join.
I'm talking about JOIN pg_stat_user_tables specifically, but you should check every table you join and read the documentation for every view you include in your query.
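The effect of inner-joining a view that hides rows can be reproduced in miniature. This is a toy sketch using Python's sqlite3 module rather than PostgreSQL; the tables all_tables and visible_stats are hypothetical stand-ins, with visible_stats left empty to mimic a catalog view that shows the current user nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE all_tables (name TEXT, size_bytes INTEGER);
    INSERT INTO all_tables VALUES ('orders', 8192), ('customers', 16384);
    -- Stand-in for a catalog view that returns no rows to this user.
    CREATE TABLE visible_stats (name TEXT, est_rows INTEGER);
    """
)

# Inner join: a table survives only if the filtered view also has a row for it.
inner = con.execute(
    "SELECT t.name FROM all_tables t JOIN visible_stats s ON s.name = t.name"
).fetchall()

# Left join: every table survives; the missing statistics come back as NULL.
outer = con.execute(
    "SELECT t.name, s.est_rows FROM all_tables t "
    "LEFT JOIN visible_stats s ON s.name = t.name ORDER BY t.name"
).fetchall()

print(inner)
print(outer)
```

The inner join returns no rows at all, while the left join keeps both tables with NULL statistics, which is the behaviour the suggested change relies on.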

Postgres DISTINCT Query issue

SELECT DISTINCT "Users"."id" , "Users".name,
"Users"."surname", "Users"."gender",
"Users"."dob", "Searches"."start_date"
FROM "Users"
LEFT JOIN "Searches" ON "Users"."id" = "Searches"."user_id"
WHERE (SQRT( POW(69.1 * ("Users"."latitude" - 45.465454), 2) + POW(69.1 * (9.186515999999983 - "Users"."longitude") * COS("Users"."latitude" / 57.3), 2))) < 20
AND "Users"."status" = true
AND "Users"."id" != 18
AND "Searches"."activity" = 'clubbing'
AND "Users"."gender" = 'm'
AND "Users"."age" BETWEEN 18 AND 30
ORDER BY ABS( "Searches"."start_date" - date '2016-07-07' )
For some reason the above query returns the following error:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
I only want to return unique users but I really don't know what's wrong with it.
Thanks for your help
Just doing what the error message says I would include the expression ABS( "Searches"."start_date" - date '2016-07-07' ) in the SELECT list. No need to change your query logic.
absdiffdate can be discarded later when processing the result.
SELECT DISTINCT "Users"."id" , "Users".name,
"Users"."surname", "Users"."gender",
"Users"."dob", "Searches"."start_date",
ABS( "Searches"."start_date" - date '2016-07-07' ) absdiffdate
FROM "Users"
LEFT JOIN "Searches" ON "Users"."id" = "Searches"."user_id"
WHERE (SQRT( POW(69.1 * ("Users"."latitude" - 45.465454), 2) + POW(69.1 * (9.186515999999983 - "Users"."longitude") * COS("Users"."latitude" / 57.3), 2))) < 20
AND "Users"."status" = true
AND "Users"."id" != 18
AND "Searches"."activity" = 'clubbing'
AND "Users"."gender" = 'm'
AND "Users"."age" BETWEEN 18 AND 30
ORDER BY ABS( "Searches"."start_date" - date '2016-07-07' )
Will this new column possibly result in more records when DISTINCT is applied?
I don't think so: start_date is already in the select list and you are only subtracting a constant from it, so rows with the same start_date get the same absdiffdate.
In Postgres, you can use DISTINCT ON to get one row per user id:
SELECT DISTINCT ON (u."id") u."id", u.name, u."surname", u."gender", u."dob", s."start_date"
FROM "Users" u LEFT JOIN
"Searches" s
ON u."id" = s."user_id"
WHERE (SQRT( POW(69.1 * (u."latitude" - 45.465454), 2) + POW(69.1 * (9.186515999999983 - u."longitude") * COS(u."latitude" / 57.3), 2))) < 20 AND
u."status" = true AND
u."id" != 18 AND s."activity" = 'clubbing' AND
u."gender" = 'm' AND
u."age" BETWEEN 18 AND 30
ORDER BY u."id", ABS(s."start_date" - date '2016-07-07');
Notice how table aliases make the query easier to write and to read.
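What DISTINCT ON does with that ORDER BY can be sketched outside the database: for each user id, keep only the row that sorts first, here the one whose start_date is nearest the target date. A minimal illustration with made-up rows; the function name closest_search_per_user is hypothetical. When two rows tie, this sketch keeps whichever it saw first, whereas the database is free to pick either.

```python
from datetime import date


def closest_search_per_user(rows, target):
    """Keep, for each user id, the row whose start_date is nearest to target.

    rows: iterable of (user_id, name, start_date) tuples.
    """
    best = {}
    for user_id, name, start_date in rows:
        gap = abs((start_date - target).days)
        if user_id not in best or gap < best[user_id][0]:
            best[user_id] = (gap, (user_id, name, start_date))
    # Return one row per user, ordered by user id like the ORDER BY.
    return [best[user_id][1] for user_id in sorted(best)]


# Hypothetical joined rows: user 1 has two searches, user 2 has one.
rows = [
    (1, "Anna", date(2016, 7, 1)),
    (1, "Anna", date(2016, 7, 8)),
    (2, "Bob", date(2016, 7, 20)),
]

print(closest_search_per_user(rows, date(2016, 7, 7)))
```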

JOIN tables inside a subquery in DB2

I'm having trouble with paginating with joined tables in DB2. I want to return rows 10-30 of a query that contains an INNER JOIN.
This works:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY U4SLSMN.SLNAME) AS ID,
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC
FROM U4SLSMN) AS P
WHERE P.ID BETWEEN 10 AND 30
This does not work:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY U4SLSMN.SLNAME) AS ID,
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC, U4CONST.C4NAME
FROM U4SLSMN INNER JOIN U4CONST ON U4SLSMN.SLNO = U4CONST.C4NAME
) AS P
WHERE P.ID BETWEEN 10 AND 30
The error I get is:
Selection error involving field *N.
Note that the JOIN query works correctly by itself, just not when it's run as a subquery.
How do I perform a join inside a subquery in DB2?
Works fine for me on v7.1 TR9
Here's what I actually ran:
select *
from ( select rownumber() over (order by vvname) as ID, idescr, vvname
from olsdta.ioritemmst
inner join olsdta.vorvendmst on ivndno = vvndno
) as P
where p.id between 10 and 30;
I much prefer the CTE version however:
with p as
( select rownumber() over (order by vvname) as ID, idescr, vvname
from olsdta.ioritemmst
inner join olsdta.vorvendmst on ivndno = vvndno
)
select *
from p
where p.id between 10 and 30;
Finally, note that at 7.1 TR11 (7.2 TR3), IBM added support for the LIMIT and OFFSET clauses. Your query could be rewritten as follows (BETWEEN 10 AND 30 is inclusive, so it covers 21 rows):
SELECT
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC, U4CONST.C4NAME
FROM U4SLSMN INNER JOIN U4CONST ON U4SLSMN.SLNO = U4CONST.C4NAME
ORDER BY U4SLSMN.SLNAME
LIMIT 21 OFFSET 9;
However, note that the LIMIT & OFFSET clauses are only supported in prepared or embedded SQL. You can't use them in STRSQL or STRQMQRY. I believe the "Run SQL Scripts" GUI interface does support them. Here's an article about LIMIT & OFFSET
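The translation from an inclusive row range to LIMIT and OFFSET values is easy to get off by one, so here is the arithmetic as a small sketch. The helper name limit_offset is hypothetical and row numbers are assumed to be 1-based, as ROW_NUMBER() produces them.

```python
def limit_offset(first_row, last_row):
    """Translate an inclusive, 1-based row range into (limit, offset)."""
    if first_row < 1 or last_row < first_row:
        raise ValueError("expected 1 <= first_row <= last_row")
    # An inclusive range holds last - first + 1 rows; skip the first - 1 rows before it.
    return last_row - first_row + 1, first_row - 1


limit, offset = limit_offset(10, 30)
print(f"LIMIT {limit} OFFSET {offset}")
```

For rows 10 through 30 this gives a limit of 21 and an offset of 9, matching what WHERE P.ID BETWEEN 10 AND 30 returns.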

How large is a "buffer" in PostgreSQL

I am using pg_buffercache module for finding hogs eating up my RAM cache. For example when I run this query:
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b INNER JOIN pg_class c
ON b.relfilenode = c.relfilenode AND
b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;
I discover that sample_table is using 120 buffers.
How much is 120 buffers in bytes?
PostgreSQL has a hard-coded block size of 8192 bytes; see the predefined block_size variable. This used to be a number to keep in mind whenever you edited the config to specify shared_buffers, etc., but the config now supports suffixes like MB, which do the conversion for you.
It is possible, with hard work, to change block_size to other values. For a minority of applications there might be a more optimal size, but the number of places the code makes an assumption about the size is large.
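For the 120 buffers in the question, the conversion is a single multiplication. A minimal sketch, assuming the default 8192-byte block size (check the block_size setting on your own server); the helper name buffers_to_bytes is hypothetical.

```python
BLOCK_SIZE = 8192  # assumed default; verify against the server's block_size setting


def buffers_to_bytes(buffers, block_size=BLOCK_SIZE):
    """Return the number of bytes occupied by the given number of buffers."""
    return buffers * block_size


size = buffers_to_bytes(120)
print(size, "bytes =", size / 1024, "kB")
```

So 120 buffers come to 983040 bytes, or 960 kB.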
Following what Edmund said, we can run this query against our database:
SELECT c.relname,
Pg_size_pretty(Count(*) * 8192)
AS buffered,
Round(100.0 * Count(*) / (SELECT setting
FROM pg_settings
WHERE name = 'shared_buffers') :: INTEGER, 1)
AS
buffers_percent,
Round(100.0 * Count(*) * 8192 / Pg_relation_size(c.oid), 1)
AS
percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON ( b.reldatabase = d.oid
AND d.datname = Current_database() )
WHERE Pg_relation_size(c.oid) > 0
GROUP BY c.oid,
c.relname
ORDER BY 3 DESC
LIMIT 10;