I have a table names with name and surname columns. I want to pick a random name and surname from it. I tried this, but it takes one name and surname and prints it 100 times, so it seems to run the select only once at the start and then reuse its value. How can I fix it?
SELECT (SELECT name FROM names WHERE ID = ROUND(RANDOM() * 10 + 1)),
(SELECT surname FROM names WHERE ID = ROUND(RANDOM() * 10 + 1))
FROM GENERATE_SERIES(1, 100);
In order for Postgres to evaluate the select in the subquery multiple times, it needs to look like a correlated subquery -- one whose results depend on the values being returned by the top-level query. A minor problem here is that you don't actually care about those values. You can hack around that by meaninglessly including them in the subqueries, like this:
SELECT (SELECT name FROM names WHERE ID = ROUND(RANDOM() * 10 + 1 + i - i)),
(SELECT surname FROM names WHERE ID = ROUND(RANDOM() * 10 + 1 + i - i))
FROM GENERATE_SERIES(1, 100) i;
Another approach would be to move the subqueries to your FROM clause, put a different generate_series clause in each one, and then join them on the output of each series, but that ends up being really complicated SQL.
You didn't reference the generate_series value in the subquery. Try this:
SELECT (SELECT name FROM names WHERE ID = ROUND(RANDOM() * 10 + 1) + 0 * g),
(SELECT surname FROM names WHERE ID = ROUND(RANDOM() * 10 + 1) + 0 * g)
FROM GENERATE_SERIES(1, 100) g;
I have this (beginner) query:
let getCandlesSQL =
$"SELECT
date_trunc('minute', ts) ts,
instrument,
MAX(price) high,
MIN(price) low,
(SUM(price * price * quantity) / SUM(price * quantity)) midpoint,
SUM(price * quantity) volume,
(SUM(direction * price * quantity) / SUM(price * quantity)) direction
FROM {tableTradesName}
WHERE instrument = '{instrument.Ticker}' AND ts BETWEEN '{fromTime}' AND '{toTime}'
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
LIMIT 4500"
I rebuild the string with internal variables at every call, so I don't need to use the SQL variable mechanism.
A few calculations are repeated; for example, 'price * quantity' is computed several times.
Is there a way to write the query so that it is computed once and then reused?
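One common way to avoid repeating price * quantity is to compute it once in a subquery and aggregate over the alias. The sketch below is only an illustration: it uses Python's built-in sqlite3 module so that it runs on its own, and the table name trades, the alias notional, the output name net_direction, and the sample rows are all made up for the example. SQLite has no date_trunc, so strftime stands in for it here; in PostgreSQL the inner SELECT would presumably keep date_trunc('minute', ts) instead.

```python
import sqlite3

# Hypothetical sample trades: (ts, instrument, price, quantity, direction).
SAMPLE_ROWS = [
    ("2024-01-01 10:00:05", "ABC", 10.0, 2.0, 1),
    ("2024-01-01 10:00:40", "ABC", 20.0, 1.0, -1),
    ("2024-01-01 10:01:10", "ABC", 30.0, 1.0, 1),
]

CANDLES_SQL = """
    SELECT minute,
           instrument,
           MAX(price)                               AS high,
           MIN(price)                               AS low,
           SUM(price * notional) / SUM(notional)     AS midpoint,
           SUM(notional)                            AS volume,
           SUM(direction * notional) / SUM(notional) AS net_direction
    FROM (
        -- price * quantity is written exactly once, here
        SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute,
               instrument,
               price,
               direction,
               price * quantity AS notional
        FROM trades
    ) AS t
    GROUP BY minute, instrument
    ORDER BY minute
"""


def candles(rows):
    """Load rows into an in-memory table and return one row per minute."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (ts TEXT, instrument TEXT, price REAL, "
        "quantity REAL, direction INTEGER)"
    )
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(CANDLES_SQL).fetchall()
    conn.close()
    return result


for row in candles(SAMPLE_ROWS):
    print(row)
```

Whether this is faster than the original is not something the sketch shows; the main gain is that the expression lives in one place.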
I have been stuck on this schema for days.
I am trying to populate the distance column in one table from two other tables. The table has lat, long, city id, distance, and location id columns.
This is the table that I want to populate
These are the two tables that I can use to calculate the distance
LocationID is the same as ID in the first table
To calculate the distance to the nearest city I use lat and long. This is what my code for the nearest distance looks like
select location_id, distance
from (SELECT t.table1.location_id as location_id,
( 6371 * acos( cos( radians(6.414478) ) *
cos( radians(t.table1.latitude::float8) ) *
cos( radians(t.table1.longitude::float8) - radians(12.466646) )
+ sin( radians(6.414478) ) * sin( radians(t.table1.latitude::float8) ) ) ) AS distance
FROM t.table1
INNER JOIN t.table2
on t.table1.location_id = t.table2.id
) km
group by location_id, distance
Having distance < 2000
order by distance limit 20;
but the query only returns null values.
I'm using PostgreSQL for this code, and the application used for visualising is Metabase.
I would recommend using the ST_Distance function from the PostGIS extension for the distance calculation instead of doing it yourself. It should be easier and is likely to be much faster.
Edited: I probably misunderstood the initial intention.
This should work:
select cities.city_id, cities.distance, latitude, location_id, longitude
from t.table1
left join lateral (
select city_id, distance from (
select location_id city_id, ( 6371 * acos( cos( radians(table3.latitude) ) *
cos( radians(t.table1.latitude::float8) ) *
cos( radians(t.table1.longitude::float8) - radians(table3.longitude) )
+ sin( radians(table3.latitude) ) * sin( radians(t.table1.latitude::float8) ) ) ) AS distance
from t.table3
) d
where distance < 2000
order by distance
limit 1
) cities on true
Try it out.
Best regards,
Bjarni
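To check the arithmetic outside the database, the spherical law of cosines expression used in the queries above can be mirrored in a few lines of Python. This is only a sketch: the function name distance_km is made up, and returning None for a missing coordinate is there to imitate how a NULL latitude or longitude makes the whole SQL expression NULL. Whether that is what causes the NULLs in the question is an assumption, since the data is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same constant as in the SQL


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines."""
    # Imitate SQL NULL propagation: a missing coordinate gives no distance.
    if None in (lat1, lon1, lat2, lon2):
        return None
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2) - math.radians(lon1)
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
                 + math.sin(phi1) * math.sin(phi2))
    # Rounding can push the value slightly outside [-1, 1]; clamp before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


# A quarter of the way around the equator.
print(distance_km(0.0, 0.0, 0.0, 90.0))
# A missing coordinate yields None, like NULL in SQL.
print(distance_km(6.414478, 12.466646, None, 12.0))
```

Running a couple of rows from table1 and table3 through a function like this is a quick way to tell a formula problem from a data problem.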
We are transitioning to Azure SQL Data Warehouse, and one issue that has been highlighted is the need to change some smaller tables from Round-Robin / Hash-distributed to Replicated to improve performance.
MS design guidance suggests one criterion for this decision is tables that take up less than 2 GB of disk space, i.e. these tables could be made into Replicated tables. It suggests using DBCC PDW_SHOWSPACEUSED to determine this.
I can run this against the whole DB, or against one specific table, but I'd really like to get a list of all tables and the space used (preferably in MB), and that is beyond me.
A lot of Google searching gives me either the two basic commands I already know (against the whole DB / against one table) or SQL Server queries that don't run against Azure DW, e.g. ones using sys.allocation_units, which is not supported in Azure DW.
I was just directed to this Microsoft article that provides a pretty solid solution to this problem.
In particular, create a view:
CREATE VIEW dbo.vTableSizes
AS
WITH base
AS (
SELECT
GETDATE() AS [execution_time],
DB_NAME() AS [database_name],
s.name AS [schema_name],
t.name AS [table_name],
QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS [two_part_name],
nt.[name] AS [node_table_name],
ROW_NUMBER() OVER (PARTITION BY
nt.[name]
ORDER BY
(
SELECT
NULL
)
) AS [node_table_name_seq],
tp.[distribution_policy_desc] AS [distribution_policy_name],
c.[name] AS [distribution_column],
nt.[distribution_id] AS [distribution_id],
i.[type] AS [index_type],
i.[type_desc] AS [index_type_desc],
nt.[pdw_node_id] AS [pdw_node_id],
pn.[type] AS [pdw_node_type],
pn.[name] AS [pdw_node_name],
di.name AS [dist_name],
di.position AS [dist_position],
nps.[partition_number] AS [partition_nmbr],
nps.[reserved_page_count] AS [reserved_space_page_count],
nps.[reserved_page_count] - nps.[used_page_count] AS [unused_space_page_count],
nps.[in_row_data_page_count] + nps.[row_overflow_used_page_count] + nps.[lob_used_page_count] AS [data_space_page_count],
nps.[reserved_page_count] - (nps.[reserved_page_count] - nps.[used_page_count])
- ([in_row_data_page_count] + [row_overflow_used_page_count] + [lob_used_page_count]) AS [index_space_page_count],
nps.[row_count] AS [row_count]
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON s.[schema_id] = t.[schema_id]
INNER JOIN
sys.indexes i
ON t.[object_id] = i.[object_id]
AND i.[index_id] <= 1
INNER JOIN
sys.pdw_table_distribution_properties tp
ON t.[object_id] = tp.[object_id]
INNER JOIN
sys.pdw_table_mappings tm
ON t.[object_id] = tm.[object_id]
INNER JOIN
sys.pdw_nodes_tables nt
ON tm.[physical_name] = nt.[name]
INNER JOIN
sys.dm_pdw_nodes pn
ON nt.[pdw_node_id] = pn.[pdw_node_id]
INNER JOIN
sys.pdw_distributions di
ON nt.[distribution_id] = di.[distribution_id]
INNER JOIN
sys.dm_pdw_nodes_db_partition_stats nps
ON nt.[object_id] = nps.[object_id]
AND nt.[pdw_node_id] = nps.[pdw_node_id]
AND nt.[distribution_id] = nps.[distribution_id]
AND i.[index_id] = nps.[index_id]
LEFT OUTER JOIN
(
SELECT
*
FROM
sys.pdw_column_distribution_properties
WHERE
distribution_ordinal = 1
) cdp
ON t.[object_id] = cdp.[object_id]
LEFT OUTER JOIN
sys.columns c
ON cdp.[object_id] = c.[object_id]
AND cdp.[column_id] = c.[column_id]
WHERE
pn.[type] = 'COMPUTE'),
size
AS ( SELECT
[execution_time],
[database_name],
[schema_name],
[table_name],
[two_part_name],
[node_table_name],
[node_table_name_seq],
[distribution_policy_name],
[distribution_column],
[distribution_id],
[index_type],
[index_type_desc],
[pdw_node_id],
[pdw_node_type],
[pdw_node_name],
[dist_name],
[dist_position],
[partition_nmbr],
[reserved_space_page_count],
[unused_space_page_count],
[data_space_page_count],
[index_space_page_count],
[row_count],
([reserved_space_page_count] * 8.0) AS [reserved_space_KB],
([reserved_space_page_count] * 8.0) / 1000 AS [reserved_space_MB],
([reserved_space_page_count] * 8.0) / 1000000 AS [reserved_space_GB],
([reserved_space_page_count] * 8.0) / 1000000000 AS [reserved_space_TB],
([unused_space_page_count] * 8.0) AS [unused_space_KB],
([unused_space_page_count] * 8.0) / 1000 AS [unused_space_MB],
([unused_space_page_count] * 8.0) / 1000000 AS [unused_space_GB],
([unused_space_page_count] * 8.0) / 1000000000 AS [unused_space_TB],
([data_space_page_count] * 8.0) AS [data_space_KB],
([data_space_page_count] * 8.0) / 1000 AS [data_space_MB],
([data_space_page_count] * 8.0) / 1000000 AS [data_space_GB],
([data_space_page_count] * 8.0) / 1000000000 AS [data_space_TB],
([index_space_page_count] * 8.0) AS [index_space_KB],
([index_space_page_count] * 8.0) / 1000 AS [index_space_MB],
([index_space_page_count] * 8.0) / 1000000 AS [index_space_GB],
([index_space_page_count] * 8.0) / 1000000000 AS [index_space_TB]
FROM
base)
SELECT
*
FROM
size;
Then, this "Table space summary" query provides a list of tables and how much space each is currently using (among other information):
SELECT
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc,
COUNT(DISTINCT partition_nmbr) AS nbr_partitions,
SUM(row_count) AS table_row_count,
SUM(reserved_space_GB) AS table_reserved_space_GB,
SUM(data_space_GB) AS table_data_space_GB,
SUM(index_space_GB) AS table_index_space_GB,
SUM(unused_space_GB) AS table_unused_space_GB
FROM
dbo.vTableSizes
GROUP BY
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc
ORDER BY
table_reserved_space_GB DESC;
Here's a sample execution against one of my databases showing the table names sorted by space used.
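Since the question asked for sizes in MB and mentioned the 2 GB guideline, the conversion the view performs can be sketched in a few lines. This only restates the arithmetic from the view (page count times 8.0 for KB, then decimal divisors of 1000); the function names and the sample page counts are made up for illustration.

```python
PAGE_SIZE_KB = 8.0  # the view multiplies page counts by 8.0


def reserved_space(page_count):
    """Convert a reserved page count to KB, MB and GB as the view does."""
    kb = page_count * PAGE_SIZE_KB
    return {"KB": kb, "MB": kb / 1000, "GB": kb / 1000000}


def is_replication_candidate(page_count, limit_gb=2.0):
    """True when the reserved space is under the 2 GB guideline."""
    return reserved_space(page_count)["GB"] < limit_gb


# Hypothetical page counts for two tables.
for pages in (125000, 300000):
    print(pages, reserved_space(pages), is_replication_candidate(pages))
```

In the summary query itself, swapping the _GB columns for the view's _MB columns (for example SUM(reserved_space_MB)) should give the same list in MB.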
I have a PostgreSQL query that looks like this:
SELECT *,
2 * 3961 * asin(sqrt((sin(radians((latitude - 40.2817993164062) / 2))) ^ 2 + cos(radians(40.2817993164062)) * cos(radians(latitude)) * (sin(radians((longitude - -111.720901489258) / 2))) ^ 2)) as distance,
(SELECT json_agg(deals.*) FROM deals WHERE vendors.id = deals.vendorid) as deals FROM vendors
WHERE ( category = 'Food' )
AND (distance < 80)
AND (nationwide IS FALSE OR nationwide is NULL)
ORDER BY featured ASC, created DESC, distance ASC
I'm getting the distance in miles using the second select part.
The problem is the part that says AND (distance < 80). I get the following error: column "distance" does not exist. The weird thing is that if I remove the AND (distance < 80), it works and also sorts correctly by distance, and the output includes distance. So the distance is being computed correctly, but for some reason I can't use it as a filter in the WHERE clause, and I can't figure out why.
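distance is just an output alias defined in the SELECT list. WHERE is evaluated before the SELECT list, so the alias does not exist yet at that point; ORDER BY runs afterwards, which is why sorting by distance works. You could compute it in a CTE and filter in the outer query, something like: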
WITH vendors_distance as (
SELECT *,
2 * 3961 * asin(sqrt((sin(radians((latitude - 40.2817993164062) / 2))) ^ 2 + cos(radians(40.2817993164062)) * cos(radians(latitude)) * (sin(radians((longitude - -111.720901489258) / 2))) ^ 2)) as distance
FROM vendors
WHERE ( category = 'Food' )
AND (nationwide IS FALSE OR nationwide is NULL)
)
SELECT vendors_distance.*,
(SELECT json_agg(deals.*) FROM deals WHERE vendors_distance.id = deals.vendorid) as deals
FROM vendors_distance
WHERE (distance < 80)
ORDER BY featured ASC, created DESC, distance ASC
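For sanity-checking the 80-mile cutoff, the haversine expression from the query can be mirrored in Python. This is a sketch only: the function name haversine_miles is made up, and the default origin is simply the pair of coordinates hard-coded in the SQL.

```python
import math

EARTH_RADIUS_MILES = 3961.0  # same constant as in the SQL


def haversine_miles(lat, lon, lat0=40.2817993164062, lon0=-111.720901489258):
    """Haversine distance in miles from (lat0, lon0) to (lat, lon)."""
    a = (math.sin(math.radians((lat - lat0) / 2)) ** 2
         + math.cos(math.radians(lat0)) * math.cos(math.radians(lat))
         * math.sin(math.radians((lon - lon0) / 2)) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Distance from the origin to a point one degree of latitude further north.
print(haversine_miles(41.2817993164062, -111.720901489258))
```

Comparing a few vendor rows against this function shows which side of the distance < 80 filter they should land on.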