Creating views in redshift - amazon-redshift

Creating views in redshift - amazon-redshift

I'm writing a code at the moment that has to access a transactions file multiple times for a given date range. I was wondering if it's possible to set up a "view" of my table to allow a single delete from at the start of the code (without affecting the table underneath) so that the date range will always be applied throughout the code
So in a simplified example changing the code from...
SELECT SUM(sales)
FROM trans_file
WHERE date_field BETWEEN '2012-01-01' AND '2012-01-31'
To this...
DELETE
FROM trans_file
WHERE date_field NOT BETWEEN '2012-01-01' AND '2012-01-31'
SELECT SUM(sales)
FROM trans_file

What you can do is to perform a Deep Copy first,
access data multiple times and then delete the "view", like this:
CREATE TABLE trans_file_view AS (
SELECT * FROM trans_file
WHERE date_field BETWEEN '2012-01-01' AND '2012-01-31'
);
SELECT SUM(sales) FROM trans_file_view;
...next SELECT statements...
DROP TABLE trans_file_view;
Bibliography:
You can read more about Deep Copy

Related

Cannot create materialized view with ORDER BY clause in TimescaleDb 2.7.0

The timescale docs seem to suggest that since 2.7.0 it should be possible to make materialized views which include an order by clause. (See "timescale.finalized" option here and "function support" here).
However, I have not been able to get this to work for me. When I try to create my materialized view I get:
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
Is there something fundamental I'm misunderstanding about how this should work?
Here is the full script:
> select extname, extversion from pg_extension where extname = 'timescaledb';
extname | extversion
-------------+------------
timescaledb | 2.7.0
(1 row)
> CREATE TABLE stocks_real_time (
time TIMESTAMPTZ NOT NULL,
price DOUBLE PRECISION NULL
);
CREATE TABLE
> SELECT create_hypertable('stocks_real_time','time');
create_hypertable
-------------------------------
(7,public,stocks_real_time,t)
(1 row)
> CREATE MATERIALIZED VIEW mat_view_stocks_real_time
WITH (timescaledb.continuous)
AS (
SELECT
time_bucket('60 minutes', time) as bucketed_time,
AVG(price) as price
FROM stocks_real_time
GROUP BY bucketed_time
ORDER BY bucketed_time
);
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
I still get the same error if I explicitly add "timescaledb.finalized=true" to the with clause.

(NB: I work at Timescale!)
We have an open issue to support this, and I think the confusion is because we now support aggregates with order by clauses in them, this means things like: SELECT percentile_cont(price) WITHIN GROUP (ORDER BY time) or SELECT array_agg(foo ORDER BY time)
So I think that is probably where the confusion is coming from, but like I said, we have an open issue to support that sort of order by. You can also apply the order by in the SELECT from the continuous aggregate though: ie SELECT * FROM mat_view_stocks_real_time ORDER BY bucketed_time and that should work just fine.

EntityDataSet fetch all data and then applying where clause

I am using a asp.net GridView control and setting DatasourceId to EntityDataSource as below .in the
page load setting the GridDataSource.EntityTypeFilter to a View name and also adding a where clause as GridDataSource.Where = sWhereClause
The View has million records but the Where condition filter out the record .The EntityDataSource first getting all million record in Sub-Query then applying the Where which timing out command. its generating the query as below. I want the where clause shouldgo with ViewName select statement itself not with sub-query table Extent1.
SELECT TOP (20)
[Filter1].[COL1],
[Filter1].[COL2]
[Filter1].[Col3]
FROM (
SELECT [Extent1].[COL1] , [Extent1].[COL2], [Extent1].[COL3]
, row_number() OVER (ORDER BY [Extent1].[COL1] ASC
) AS [row_number]
FROM
(
SELECT
ViewName.V1 ,
ViewName.V2
ViewName.V3
ViewName.V4
FROM [dbo].ViewName
)
AS [Extent1]
WHERE ([Extent1].[COL1] LIKE '%FilterValue%')
OR ([Extent1].[COL1] LIKE '%FilterValue%') OR ([Extent1].[COL2] LIKE '%FilterValue%') OR ([Extent1].[COL3] LIKE '%FilterValue%') )
) AS [Filter1]
WHERE [Filter1].[row_number] > 0
ORDER BY [Filter1].COL1] ASC
Thanks and appreciate any help in advance.

The EntityDataSource first getting all million record in Sub-Query then applying the Where which timing out command
Just because the WHERE clause is not in the subquery does not mean that subquery isn't filtered. SQL Server (assuming that's what you're using), can "push down" the WHERE clause predicates into the subquery, and into base tables in view definition.
But that won't make any difference here as your multiple LIKE predicates have to be evaluated for every single row in the output of the view, then all the rows have to be sorted to find the top 20.

Add dates ranges to a table for individual values using a cursor

I have a calendar table called CalendarInformation that gives me a list of dates from 2015 to 2025. This table has a column called BusinessDay that shows what dates are weekends or holidays. I have another table called OpenProblemtimeDiffTable with a column called number for my problem number and a date for when the problem was opened called ProblemNew and another date for the current column called Now. What I want to do is for each problem number grab its date ranges and find the dates between and then sum them up to give me the number of business days. Then I want to insert these values in another table with the problem number associated with the business day.
Thanks in advance and I hope I was clear.
TRUNCATE TABLE ProblemsMoreThan7BusinessDays
DECLARE #date AS date
DECLARE #businessday AS INT
DECLARE #Startdate as DATE, #EndDate as DATE
DECLARE CONTACT_CURSOR CURSOR FOR
SELECT date, businessday
FROM CalendarInformation
OPEN contact_cursor
FETCH NEXT FROM Contact_cursor INTO #date, #businessday
WHILE (##FETCH_STATUS=0)
BEGIN
SELECT #enddate= now FROM OpenProblemtimeDiffTable
SELECT #Startdate= problemnew FROM OpenProblemtimeDiffTable
SET #Date=#Startdate
PRINT #enddate
PRINT #startdate
SELECT #businessday= SUM (businessday) FROM CalendarInformation WHERE date > #startdate AND date <= #Enddate
INSERT INTO ProblemsMoreThan7BusinessDays (businessdays, number)
SELECT #businessday, number
FROM OpenProblemtimeDiffTable
FETCH NEXT FROM CONTACT_CURSOR INTO #date, #businessday
END
CLOSE CONTACT_CURSOR
DEALLOCATE CONTACT_CURSOR
I tried this code using a cursor and I'm close, but I cannot get the date ranges to change for each row.
So if I have a problemnumber with date ranges between 02-07-2018 and 05-20-2019, I would want in my new table the sum of business days from the calendar along with the problem number. So my output would be column number PROB0421 businessdays (with the correct sum). Then the next problem PRB0422 with date ranges of 11-6-18 to 5-20-19. So my output would be PROB0422 with the correct sum of business days.

Rather than doing this in with a cursor, you should approach this in a set based manner. That you already have a calendar table makes this a lot easier. The basic approach is to select from your data table and join into your calendar table to return all the rows in the calendar table that sit within your date range. From here you can then aggregate as you require.
This would look something like the below, though apply it to your situation and adjust as required:
select p.ProblemNow
,p.Now
,sum(c.BusinessDay) as BusinessDays
from dbo.Problems as p
join dbo.calendar as c
on c.CalendarDate between p.ProblemNow and p.Now
and c.BusinessDay = 1
group by p.ProblemNow
,p.Now

I think you can do this without a cursor. Should only require a single insert..select statement.
I assume your "businessday" column is just a bit or flag-type field that is 1 if the date is a business day and 0 if not? If so, this should work (or something close to it if I'm not understanding your environment properly).:
insert ProblemsMoreThan7BusinessDays
(
businessdays
, number
)
select
number
, sum( businessday ) -- or count(*)
from OpenProblemtimeDiffTable op
inner join CalendarInformation ci on op.problem_new >= ci.[date]
and op.[now] <= ci.[date]
and ci.businessday = 1
group by
problem_number
I usually try to avoid the use of cursors and working with data in a procedural manner, especially if I can handle the task as above. Dont think of the data as 1000's of individual rows, but think of the data as only two sets of data. How do they relate?

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first showing all international items, next by names who dont start with international.
My question is not about adding columns to make this work, or use multiple DB's, or a largerTSQL script to clone a DB into a new order.
I just wonder if anything after 'Where or order by' can be tricked to do this.

You can use expressions in the ORDER BY:
Select * From MyTable
order by
CASE
WHEN name like 'international%' THEN 0
ELSE 1
END,
id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)

Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE #mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT #mytable([name])
VALUES('international something' ),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM #mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.

Declaring variables in redshift

Background
I have been using Amazon Redshift to execute my queries.
I know there was a question asked earlier regarding this. But I don't understand how to incorporate UDFs.
I want to assign a temporary variable which takes a particular value.
I want to do this to make my script dynamic. For instance- This is my
usual way of writing code.
SELECT * FROM transaction_table WHERE invoice_date >= '2013-01-01'
AND invoice_date <= '2013-06-30';
What I want to do is ...
Something like what you will see below. I believe SQL server has a declare variable which does this sort of a thing.
SET start_date TO '2013-01-01';
SET end_date TO '2013-06-30';
SELECT * FROM transaction_table WHERE invoice_date >= start_date
AND invoice_date <= end_date;
This way I don't have to search deep in my script. I can just have a set
statement up top and just change that.
Your feedback is greatly welcome.

There are no variables in Redshift, unfortunately. You can, however, get a variable-like behaviour by creating a temporary table and referring to it as follows:
CREATE TEMPORARY TABLE _variables AS (
SELECT
'2013-01-01'::date as start_date
, '2013-06-30'::date as end_date
);
SELECT
transaction_table.*
FROM
transaction_table, _variables
WHERE
invoice_date >= start_date
AND
invoice_date <= end_date
;