select
(
select ('"'||attname||'"') as col
from pg_attribute
where attrelid='public.mv_act_inact'::regclass
order by attnum desc
limit 1 offset 1
),
(
select ('"'||attname||'"') as col
from pg_attribute
where attrelid='public.mv_act_inact'::regclass
order by attnum desc
limit 1 offset 2
)
from my_material_view;
I understand that the above is equivalent to select 'col1','col2' from my_material_view; so it's basically treating them as strings.
But is there a way to evaluate it, so that it pulls like select "col1", "col2" from my_material_view?
I'd like to do a rolling average, and my columns are dynamic, so it's not feasible to constantly write them in.
I'm especially interested in doing this without plpgsql or plpython because it's done on a dashboard application.
I have a table called user_activity in Redshift that has department, user_id, activity_type, activity_id, activity_date.
I'd like to query a daily report of how many days since the last event (of any type). Using CROSS APPLY (SQL Server) or LATERAL JOIN (Postgres 9+), I'd do something like...
SELECT d.date, a.last_activity_date
FROM date_table d
CROSS JOIN (
SELECT DISTINCT user_id FROM activity_table
) u
CROSS APPLY (
SELECT TOP 1 activity_date as last_activity_date
FROM activity_table
WHERE user_id = u.user_id AND activity_date <= d.date
ORDER BY activity_date DESC
) a
For now, I write it similar to the below, but it is a bit slow and I am afraid it'll only get slower.
with user_activity as (
select distinct activity_date, user_id from activity_table
)
select
d.date, u.user_id,
max(u.activity_date) as last_activity_date
from date_table d
inner join user_activity u on u.activity_date <= d.date
where d.date between '2020-01-01' and current_date
group by 1, 2
Can someone suggest a good alternative for my needs, or for CROSS APPLY / LATERAL JOIN?
As you are seeing, cross joins and inequality joins slow down as your data grows, and they are generally not the approach you want in Redshift. This is because they multiply the row count, which is costly on the large tables that are typical in Redshift.
You want to use window functions to perform this type of analysis, but you will need to step back and rethink how you structure the SQL. A MAX(activity_date) window function, partitioned by user_id, ordered by date, and with a frame clause covering all preceding rows, will find the most recent activity as of each row.
Now this will produce only rows for user_ids and dates that exist in the data table, and it looks like you want 1 row for each date for each user_id, right? To do this you need to UNION in a frame of data that has 1 row for each date for each user_id ahead of the window function. You will need NULLs for the other columns so that the column lists match. You will also want the dates in a separate column from activity_date. Now all dates for all user ids will be in the source and the window function will give you the result you want.
You also ask ‘how is this better than the joins?’ Well, in the joins you are replicating all the data records by the number of dates, which can get really big. In this approach you just have the original data records plus one row per user_id per date (which is the size of your output), so as the number of records per user_id grows, the row count grows only linearly instead of being multiplied by the number of dates.
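To make the idea above concrete, here is a minimal sketch of the UNION ALL frame plus a running MAX window. It is not Redshift code: it runs the SQL through Python's built-in sqlite3 module as a stand-in (an assumption for portability; it needs SQLite 3.25 or newer for window functions), and the sample rows and the CTE names combined and per_day are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: 6 calendar dates, 3 users, a few activity rows.
# Dates are ISO strings, which sort correctly as text in SQLite.
dates = [(f"2000-01-0{d}",) for d in range(1, 7)]
users = [(1,), (2,), (3,)]
activity = [(1, "2000-01-01"), (1, "2000-01-04"), (3, "2000-01-03"),
            (1, "2000-01-05"), (1, "2000-01-06")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_table (date TEXT)")
conn.execute("CREATE TABLE users (user_id INTEGER)")
conn.execute("CREATE TABLE user_activity (user_id INTEGER, activity_date TEXT)")
conn.executemany("INSERT INTO date_table VALUES (?)", dates)
conn.executemany("INSERT INTO users VALUES (?)", users)
conn.executemany("INSERT INTO user_activity VALUES (?, ?)", activity)

QUERY = """
WITH combined AS (
    -- framework rows: one per user per calendar date, with a NULL activity
    SELECT u.user_id, d.date AS cal_date, NULL AS activity_date
    FROM date_table d CROSS JOIN users u
    UNION ALL
    -- real activity rows; the activity date doubles as the calendar date
    SELECT user_id, activity_date AS cal_date, activity_date
    FROM user_activity
),
per_day AS (
    -- collapse to one row per user per date before windowing
    SELECT user_id, cal_date, MAX(activity_date) AS activity_date
    FROM combined
    GROUP BY user_id, cal_date
)
SELECT user_id, cal_date,
       MAX(activity_date) OVER (
           PARTITION BY user_id ORDER BY cal_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS last_activity_date
FROM per_day
ORDER BY user_id, cal_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Note that the running MAX includes activity on the current date itself, which matches the activity_date <= d.date condition in the question. A user with no activity at all simply gets NULL on every date.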
--- Request to modify asker’s code per comments made to their approach ---
Your code is definitely on the right track as you have removed the massive inequality join of your original. I made 2 comments about it. The first is that I believe you need GROUP BY user_id, date to prevent multiple rows per user_id per date that would result if there are records for the same user_id on a single date with differing activity_types. This is a simple oversight.
The second is that I intended for you to use UNION ALL, not LEFT JOIN, to combine the actual data and the user_id/date framework. Your approach works fine, but I have found that unioning very large amounts of data is generally faster than joining, though you do need to make sure the columns match up. Either way we end up with a data segment with 3 columns: 2 date columns, one with NULLs for framework rows, and 1 user_id. Your approach is fine and the difference in performance is likely very small unless you have huge tables.
Since you asked for a rewrite, here it is with both changes. (NOTE: my laptop is in the shop so I don’t have ready access to Redshift at the moment and this SQL is untested. If the intent is not clear from this and you need me to debug it, that will be delayed by a few days. I’m keeping your setup methods and SQL structure.)
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
-- order by cal_date: the subquery below exposes no column named date
select cal_date, user_id,
lag(max(activity_date), 1) ignore nulls over (partition by user_id order by cal_date) as last_activity_date
from (
select user_id, date as cal_date, NULL::date as activity_date from user_dates
union all
select user_id, activity_date as cal_date, activity_date from user_activity
) as combined
group by user_id, cal_date
)
select * from user_date_activity
order by user_id, cal_date
This was my query based on Bill's answer.
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select ud.date, ud.user_id,
lag(ua.activity_date, 1) ignore nulls over (partition by ud.user_id order by ud.date) as last_activity_date
from user_dates ud
left join user_activity ua on ud.date = ua.activity_date and ud.user_id = ua.user_id
)
select * from user_date_activity
order by user_id, date
I am using T-SQL in SSMS v17.9.1. The underlying db is Microsoft SQL Server 2014 SP3.
For display purposes, I want to concatenate the results of two queries:
SELECT TOP 1 colA as 'myCol1' FROM tableA
--
SELECT TOP 1 colB as 'myCol2' FROM tableB
and display the results from the queries in one row in SSMS.
(The TOP 1 directive would hopefully guarantee the same number of results from each query, which would assist displaying them together. If this could be generalized to TOP 10 per query that would help also)
This should work for any number of rows. It assumes you want to pair the rows in the order of the values in the displayed column.
With
TableA_CTE AS
(
SELECT TOP 1 colA as myCol1
,Row_Number() OVER (ORDER BY ColA DESC) AS RowOrder
FROM tableA
),
TableB_CTE AS
(
SELECT TOP 1 colB as myCol2
,Row_Number() OVER (ORDER BY ColB DESC) AS RowOrder
FROM tableB
)
SELECT A.myCol1, B.MyCol2
FROM TableA_CTE AS A
INNER JOIN TableB_CTE AS B
ON A.RowOrder = B.RowOrder
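For the "TOP 10 per query" generalisation mentioned in the question, here is a small sketch of the same row-number pairing idea. It is only an illustration: it runs on SQLite through Python's sqlite3 module rather than on SQL Server (an assumption so the example is self-contained; window functions need SQLite 3.25 or newer), the sample values are invented, and it filters on the row number instead of using TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (colA INTEGER)")
conn.execute("CREATE TABLE tableB (colB TEXT)")
# Hypothetical sample values
conn.executemany("INSERT INTO tableA VALUES (?)", [(3,), (1,), (2,)])
conn.executemany("INSERT INTO tableB VALUES (?)", [("x",), ("z",), ("y",)])

N = 2  # how many rows to pair up

QUERY = """
WITH a AS (
    -- number the rows of tableA by descending colA
    SELECT colA AS myCol1,
           ROW_NUMBER() OVER (ORDER BY colA DESC) AS RowOrder
    FROM tableA
),
b AS (
    -- number the rows of tableB by descending colB
    SELECT colB AS myCol2,
           ROW_NUMBER() OVER (ORDER BY colB DESC) AS RowOrder
    FROM tableB
)
SELECT a.myCol1, b.myCol2
FROM a
JOIN b ON a.RowOrder = b.RowOrder
WHERE a.RowOrder <= ?   -- keep only the first N pairs
ORDER BY a.RowOrder
"""

pairs = conn.execute(QUERY, (N,)).fetchall()
print(pairs)
```

Because the join is an inner join on the row number, the result has as many rows as the shorter of the two inputs, capped at N.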
There are currently two issues with the accepted answer:
I) a missing comma before the line: "Table B As"
II) TSQL seems to find it recursive as written, so I re-wrote it in a non-recursive way:
This is a re-working of the accepted answer that actually works in T-SQL:
USE [Database_1];
With
CTE_A AS
(
SELECT TOP 1 [Col1] as myCol1
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableA]
)
,
CTE_B AS
(
SELECT TOP 1 [Col2] as myCol2
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableB]
)
SELECT A.myCol1, B.myCol2
FROM CTE_A AS A
INNER JOIN CTE_B AS B
ON ( A.RowOrder = B.RowOrder)
I have part of a table like this:
timestamp | Source
----------------------------+----------
2017-07-28 14:20:28.757464 | Stream
2017-07-28 14:20:28.775248 | Poll
2017-07-28 14:20:29.777678 | Poll
2017-07-28 14:21:28.582532 | Stream
I want to achieve this:
timestamp | Source
----------------------------+----------
2017-07-28 14:20:28.757464 | Stream
2017-07-28 14:20:29.777678 | Poll
2017-07-28 14:21:28.582532 | Stream
Here the 2nd row of the original table has been removed, because it's within 50ms of a timestamp before or after it. Importantly, rows should only be removed when Source = 'Poll'.
I'm not sure how this can be achieved. With a WHERE clause, maybe?
Thanks in advance for any help.
Whatever we do, we can limit it to the Poll rows (the pools CTE below), then union those rows with the Stream rows.
with
streams as (
select *
from test
where Source = 'Stream'
),
pools as (
...
)
(select * from pools) union (select * from streams) order by timestamp
To get pools, there are different options:
Correlated subquery
For each row we run an extra query to get the previous row with the same source, then select only those rows where there is no previous timestamp (first row) or where the previous timestamp is more than 50ms older.
with
...
pools_with_prev as (
-- use correlated subquery
select
timestamp, Source,
timestamp - interval '00:00:00.05'
as timestamp_prev_limit,
(select max(t2.timestamp)from test as t2
where t2.timestamp < test.timestamp and
t2.Source = test.Source)
as timestamp_prev
from test
),
pools as (
select timestamp, Source
from pools_with_prev
-- then select rows which are >50ms apart
where timestamp_prev is NULL or
timestamp_prev < timestamp_prev_limit
)
...
https://www.db-fiddle.com/f/iVgSkvTVpqjNZ5F5RZVSd2/2
Join two sliding tables
Instead of running a subquery for each row, we can create a copy of our table and slide it so that each Poll row joins with the previous row of the same source type.
with
...
pools_rn as (
-- add extra row number column
-- rows: 1, 2, 3
select *,
row_number() over (order by timestamp) as rn
from test
where Source = 'Poll'
),
pools_rn_prev as (
-- add extra row number column increased by one
-- like sliding a copy of the table one row down
-- rows: 2, 3, 4
select timestamp as timestamp_prev,
row_number() over (order by timestamp)+1 as rn
from test
where Source = 'Poll'
),
pools as (
-- now join prev two tables on this column
-- each row will join with its predecessor
select timestamp, source
from pools_rn
left outer join pools_rn_prev
on pools_rn.rn = pools_rn_prev.rn
where
-- then select rows which are >50ms apart
timestamp_prev is null or
timestamp - interval '00:00:00.05' > timestamp_prev
)
...
https://www.db-fiddle.com/f/gXmSxbqkrxpvksE8Q4ogEU/2
Sliding window
Modern SQL can do something similar by partitioning by source and then using a sliding window to reach the previous row.
with
...
pools_with_prev as (
-- use sliding window to join prev timestamp
select *,
timestamp - interval '00:00:00.05'
as timestamp_prev_limit,
lag(timestamp) over(
partition by Source order by timestamp
) as timestamp_prev
from test
),
pools as (
select timestamp, Source
from pools_with_prev
-- then select rows which are >50ms apart
where timestamp_prev is NULL or
timestamp_prev < timestamp_prev_limit
)
...
https://www.db-fiddle.com/f/8KfTyqRBU62SFSoiZfpu6Q/1
I believe this is the most efficient of the three options.
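As a runnable illustration of the lag() idea on the sample rows from the question, here is a minimal sketch. Two assumptions, neither taken from the answer above: it compares each Poll row with the immediately preceding row of any source (which is what the expected output in the question suggests), and it runs on SQLite through Python's sqlite3 module rather than on PostgreSQL. SQLite has no interval type, so the gap is computed with julianday(), which is coarse but adequate for a 50ms threshold on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (timestamp TEXT, Source TEXT)")
# The four rows from the question
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    ("2017-07-28 14:20:28.757464", "Stream"),
    ("2017-07-28 14:20:28.775248", "Poll"),
    ("2017-07-28 14:20:29.777678", "Poll"),
    ("2017-07-28 14:21:28.582532", "Stream"),
])

QUERY = """
WITH with_prev AS (
    -- attach the previous row's timestamp, regardless of its source
    SELECT timestamp, Source,
           LAG(timestamp) OVER (ORDER BY timestamp) AS timestamp_prev
    FROM test
)
SELECT timestamp, Source
FROM with_prev
WHERE Source <> 'Poll'           -- never drop non-Poll rows
   OR timestamp_prev IS NULL     -- first row has nothing before it
   -- keep Poll rows more than 0.05 s after the previous row
   OR (julianday(timestamp) - julianday(timestamp_prev)) * 86400.0 > 0.05
ORDER BY timestamp
"""

kept = conn.execute(QUERY).fetchall()
for row in kept:
    print(row)
```

If the comparison should only be against earlier rows of the same source, adding PARTITION BY Source inside the OVER clause gives the behaviour described in the answer instead.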
In a table I have records with ids 2, 4, 5, 8. How can I get a list with the values 1, 3, 6, 7? I have tried it this way:
SELECT t1.id + 1
FROM table t1
WHERE NOT EXISTS (
SELECT *
FROM table t2
WHERE t2.id = t1.id + 1
)
but it's not working correctly. It doesn't bring all available positions.
Is it possible without another table?
You can get all the missing IDs from a recursive CTE, like this:
with recursive numbers as (
select 1 number
from rdb$database
union all
select number+1
from rdb$database
join numbers on numbers.number < 1024
)
select n.number
from numbers n
where not exists (select 1
from table t
where t.id = n.number)
The number < 1024 condition in my example limits the query to the maximum recursion depth of 1024; beyond that, the query ends with an error. If you need more than 1024 consecutive IDs, you have to either run the query multiple times, adjusting the interval of numbers generated each time, or think of a different query that produces consecutive numbers without reaching that level of recursion, which is not too difficult to write.
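Here is a small runnable sketch of the same recursive-CTE technique on the ids from the question. It is an adaptation, not Firebird code: it uses SQLite through Python's sqlite3 module (so there is no rdb$database), the table name items is made up, and the recursion stops at the largest existing id rather than at a fixed 1024.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
# The ids from the question
conn.executemany("INSERT INTO items VALUES (?)", [(2,), (4,), (5,), (8,)])

QUERY = """
WITH RECURSIVE numbers(number) AS (
    SELECT 1
    UNION ALL
    -- keep counting up until the largest id in the table is reached
    SELECT number + 1
    FROM numbers
    WHERE number < (SELECT MAX(id) FROM items)
)
SELECT n.number
FROM numbers n
WHERE NOT EXISTS (SELECT 1 FROM items t WHERE t.id = n.number)
ORDER BY n.number
"""

missing = [row[0] for row in conn.execute(QUERY)]
print(missing)  # [1, 3, 6, 7]
```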
tblUserProfile - I have a table which holds all the Profile Info (too many fields)
tblMonthlyProfiles - Another table which has just the ProfileID in it (the idea is that this table holds 2 profileids which sometimes become monthly profiles (on selection))
Now when I need to show monthly profiles, I simply do a select from this tblMonthlyProfiles and Join with tblUserProfile to get all valid info.
If there are no rows in tblMonthlyProfile, then monthly profile section is not displayed.
Now the requirement is to ALWAYS show Monthly Profiles. If there are no rows in monthlyProfiles, it should pick up 2 random profiles from tblUserProfile. If there is only one row in monthlyProfiles, it should pick up only one random row from tblUserProfile.
What is the best way to do all this in one single query ?
I thought something like this
select top 2 * from tblUserProfile P
LEFT OUTER JOIN tblMonthlyProfiles M
on M.profileid = P.profileid
ORder by NEWID()
But this always gives me 2 random rows from tblUserProfile. How can I solve this?
Try something like this:
SELECT TOP 2 Field1, Field2, Field3, FinalOrder FROM
(
SELECT * FROM (
select top 2 P.Field1, P.Field2, P.Field3, 1 AS FinalOrder
from tblUserProfile P JOIN tblMonthlyProfiles M on M.profileid = P.profileid
) AS MonthlyPick
UNION ALL
SELECT * FROM (
select top 2 P.Field1, P.Field2, P.Field3, 2 AS FinalOrder
from tblUserProfile P LEFT OUTER JOIN tblMonthlyProfiles M on M.profileid = P.profileid
where M.profileid IS NULL ORDER BY NEWID()
) AS RandomPick
) AS Candidates
ORDER BY FinalOrder
The idea is to pick up to two monthly profiles (if that many exist) and then 2 random profiles (as you correctly did), and then UNION them. You'll have between 2 and 4 records at that point. Grab the top two. The FinalOrder column is an easy way to make sure that you get the monthly profiles first.
If you have control of the table structure, you might save yourself some trouble by simply adding a boolean field IsMonthlyProfile to the UserProfile table. Then it's a single-table query: order by IsMonthlyProfile desc, NewID()
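The "monthly first, then random" ordering can also be tried out without changing the table structure. The sketch below is a swapped-in variation, not the query above: it left-joins the monthly table and sorts on whether a match was found, and it runs on SQLite through Python's sqlite3 module (random() and LIMIT stand in for NEWID() and TOP). The helper name pick_profiles, the name column, and the five sample profiles are invented for the example.

```python
import sqlite3

def pick_profiles(monthly_ids):
    """Return 2 profile ids: monthly ones first, random ones as filler."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblUserProfile (profileid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE tblMonthlyProfiles (profileid INTEGER)")
    # Hypothetical data: five user profiles with ids 1..5
    conn.executemany("INSERT INTO tblUserProfile VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO tblMonthlyProfiles VALUES (?)",
                     [(i,) for i in monthly_ids])
    rows = conn.execute("""
        SELECT P.profileid
        FROM tblUserProfile P
        LEFT JOIN tblMonthlyProfiles M ON M.profileid = P.profileid
        ORDER BY (M.profileid IS NULL),  -- 0 for monthly rows, 1 for the rest
                 random()                -- shuffle within each group
        LIMIT 2
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(pick_profiles([3, 5]))  # both monthly profiles, in random order
print(pick_profiles([4]))     # profile 4 plus one random other profile
print(pick_profiles([]))      # two random profiles
```

In T-SQL the same ordering would presumably be written with TOP 2 and ORDER BY CASE WHEN M.profileid IS NULL THEN 1 ELSE 0 END, NEWID().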
In SQL 2000+ compliant syntax you could do something like:
Select ...
From (
Select TOP 2 ...
From tblUserProfile As UP
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Order By NewId()
) As RandomProfile
Union All
Select MP....
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From (
Select TOP 1 ...
From tblUserProfile As UP
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
Order By NewId()
) As RandomProfile
Using SQL 2005+ CTE you can do:
With
TwoRandomProfiles As
(
Select TOP 2 ..., ROW_NUMBER() OVER ( ORDER BY NewId() ) As Num
From tblUserProfile As UP
Order By Num
)
Select MP.Col1, ...
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From TwoRandomProfiles
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Union All
Select ...
From TwoRandomProfiles
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
And Num = 1
The CTE has the advantage of only querying for the random profiles once and the use of the ROW_NUMBER() column.
Obviously, in all the UNION statements the number and type of the columns must match.