I am trying to to run the following:
DROP TABLE IF exists timestamp_15min;
Create TEMP TABLE timestamp_15min AS
SELECT
dateadd(minute,min_15,dateadd(hour,hours,dateadd(day,days,start_date::timestamp))) AS date_time,
TO_CHAR(date_time,'HH24') || ':' || TO_CHAR(date_time,'MI') || ':' || '00' AS mins15
FROM
(SELECT trunc(sysdate-14) AS start_date)
CROSS JOIN (SELECT generate_series(0,14,1) AS days)
CROSS JOIN (SELECT generate_series(0,24,1) AS hours)
CROSS JOIN (SELECT generate_series(0,60,15) as min_15)
;
And I get this error
Specified types or functions (one per INFO message) not supported on Redshift tables.
If I remove the the DROP AND CREATE TABLE, and just run the below with SELECT it works fine
SELECT
dateadd(minute,min_15,dateadd(hour,hours,dateadd(day,days,start_date::timestamp))) AS date_time,
TO_CHAR(date_time,'HH24') || ':' || TO_CHAR(date_time,'MI') || ':' || '00' AS mins15
FROM
(SELECT trunc(sysdate-14) AS start_date)
CROSS JOIN (SELECT generate_series(0,14,1) AS days)
CROSS JOIN (SELECT generate_series(0,24,1) AS hours)
CROSS JOIN (SELECT generate_series(0,60,15) as min_15)
Can someone explain what function or type is Redshift blocking ?
generate_series is not supported in queries that reference Redshift user tables. It will work when querying catalog tables only.
You can use the ROW_NUMBER() function to generate a series instead.
Related
This question already has answers here:
How to bulk update sequence ID postgreSQL for all tables
(3 answers)
Closed last month.
Postgres allows for auto-increment of the Primary Key value SERIAL is used at DDL. With this, a sequence property for the table is also created along with the table. However, when tuples are inserted manually, the sequence current value does not get updated.
For One table at a time we can use SQL like this
SELECT setval('<table>_id_seq', COALESCE((SELECT MAX(id) FROM your_table), 1), false); wherein we need to manually set the <table>_id_seq' and id' attribute name.
The problem becomes labourious if there are multiple tables. So is there any generalised method to set current value for all sequences to the MAX(<whatever_sequence_attribute_is>)?
Yup! Got it
First run the following SQL
SELECT 'SELECT setval(' || quote_literal(quote_ident(sequence_namespace.nspname) || '.' || quote_ident(class_sequence.relname)) || ', COALESCE((SELECT MAX(' ||quote_ident(pg_attribute.attname)|| ') FROM ' || quote_ident(table_namespace.nspname)|| '.'||quote_ident(class_table.relname)|| '), 1), false)' || ';' FROM pg_depend INNER JOIN pg_class AS class_sequence ON class_sequence.oid = pg_depend.objid AND class_sequence.relkind = 'S' INNER JOIN pg_class AS class_table ON class_table.oid = pg_depend.refobjid INNER JOIN pg_attribute ON pg_attribute.attrelid = class_table.oid AND pg_depend.refobjsubid = pg_attribute.attnum INNER JOIN pg_namespace as table_namespace ON table_namespace.oid = class_table.relnamespace INNER JOIN pg_namespace AS sequence_namespace ON sequence_namespace.oid = class_sequence.relnamespace ORDER BY sequence_namespace.nspname, class_sequence.relname;
The result window will show as many SQL statements as sequences are there in the DB. Copy all the SQL statements and Execute all. The current value of all sequences will get updated.
Note: Please Note that the major SQL will only generate SQLs and WILL NOT UPDATE the sequences. One has to copy all the SQL Statements from the Result and execute.
I am trying to select all rows from multiple tables inside various schemas.
These are the tables inside different schemas
schema_1-->ABC_table_1,XYZ_table_2
schema_2-->ABC_table_1,JLK_table_2
.
.
schema_N-->ABC_table_1,LMN_table_2
I am trying to select all rows from table_2 from all schemas:
This query is giving me all the tables:
SELECT
table_schema || '.' || table_name
FROM
information_schema.tables
WHERE
table_type = 'BASE TABLE'
AND
table_schema NOT IN ('pg_catalog', 'information_schema');
What i need to do is something like
select * from schema_1.XYZ_table_2
Union all
select * from schema_2.JLK_table_2
.
.
schema_2.LMN_table_2
Verify if this option can work for you :
select row_to_json(row) from (select * from table1 ) row
UNION ALL
select row_to_json(row) from (select * from table2 ) row
;
If you dont want to work with JSON/HStore datamanipulation you can also try to make dynamic sql select creation for all possible columns in the schema then UNION ALL can still work .
Json looks pretty easier to do this task and also very supported to do any further actions.
I have multiple tables that all have the same columns, but in different order. I want to merge them all together. I've created an empty table with the standard columns in the order I would like. I've tried inserting with
insert into master_table select * from table1;
but that doesn't work because of the differing column order - some of the values end up in the wrong columns. What is the best way to create one table out of them all in the order specified in my empty master table?
If you are dealing with many columns and many tables, you can use the information_schema to get the columns. You can loop through all the tables you want to insert from and run this in a plpgsql procedure, replacing table1 with a variable:
EXECUTE (
SELECT
'insert into master_table
(' || string_agg(quote_ident(column_name), ',') || ')
SELECT ' || string_agg('p.' || quote_ident(column_name), ',') || '
FROM table1 p '
FROM information_schema.columns raw
WHERE table_name = 'master_table');
just indicate the proper order in the select
instead of
select *
if you want 3 field on second posiition.
select field1, field3, field2
or you can use the INSERT sintaxis
INSERT INTO master_table (field1, field3, field2)
SELECT *
I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficultly that I'm facing putting this together in one query is that any of the tables might not be unique on the VARIABLE that I am looking to check, so I've had to use distinct before merging to avoid expanding the join
Since we are only counting, I think there is no need in joining the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means extra sort.
An index on variable should help for getting counts for separate tables, but it will not help for getting the count of the combined table.
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS AB
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTE, because as far as I know Postgres materializes CTE and in our case each CTE will have only one row.
Using array_agg with order by is even better variant, if it is available on redshift. You'll still need to use DISTINCT, but you don't have to merge all tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and that my stackoverflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query will only return if all tables have the same number of rows, ensuring that the IDS are also the same.
I have read that using cte's you can speed up a select distinct up to 100 times. Link to the website . They have this following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example i am trying to run this query found below, however it takes roughly two and a half minutes. How would I optimize this accordingly?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every column worked faster for me.
2) If you have duplicates in some of the tables then you can also pre select that and join from that as an inner query.
3) Generally you can nest join if you expect that this join will limit data.
SQL join format - nested inner joins