redshift delete records with timestamp of saturday - amazon-redshift

Hi guys, I'm new to Redshift and need help with this, please.
The requirement is to delete records from a table, using the dw_created_date column, where the weekday is Saturday.
Please help.

I haven't used Amazon Redshift yet, but after reading the documentation you should be able to find your records using:
SELECT *, to_char(dw_created_date, 'D') dayofweek FROM table
The value 7 in "dayofweek" refers to Saturday (the 'D' format counts 1 = Sunday through 7 = Saturday and TO_CHAR returns a string), so this would delete the records:
DELETE FROM table WHERE to_char(dw_created_date, 'D') = '7'
Good luck!

The Amazon Redshift function TO_CHAR allows you to extract date parts and other information from dates and timestamps.
You can find information about the TO_CHAR function here.
To extract the information you need from TO_CHAR, you use the appropriate datetime format string. For example, "D" returns the day-of-week number, "DY" returns the abbreviated weekday name, and "DAY" returns the fully spelled-out weekday name.
You can find information about the datetime format strings for Redshift here.
Below I am providing a quick snippet of code that shows how the TO_CHAR function works.
create table tba (colint integer, colts timestamp) distkey (colint) sortkey (colts);
insert into tba (colint, colts) values (1, '2016-08-08 08:08:08');
insert into tba (colint, colts) values (1, '2016-08-09 09:09:09');
insert into tba (colint, colts) values (1, '2016-08-10 10:10:10');
insert into tba (colint, colts) values (1, '2016-08-11 10:11:11');
insert into tba (colint, colts) values (12, '2016-08-12 12:12:12');
insert into tba (colint, colts) values (13, '2016-08-13 13:13:13');
insert into tba (colint, colts) values (14, '2016-08-14 14:14:14');
insert into tba (colint, colts) values (15, '2016-08-15 15:15:15');
insert into tba (colint, colts) values (16, '2016-08-16 16:16:16');
insert into tba (colint, colts) values (17, '2016-08-17 17:17:17');
insert into tba (colint, colts) values (18, '2016-08-18 18:18:18');
insert into tba (colint, colts) values (20, '2016-08-20 20:20:20');
insert into tba (colint, colts) values (6, '2016-08-06 06:06:06');
select *
, to_char(colts,'D') day_of_week_number
, to_char(colts,'DAY') day_of_week_name
, to_char(colts,'DY') day_of_week_abbrev
from tba;
colint | colts | day_of_week_number | day_of_week_name | day_of_week_abbrev
--------+---------------------+--------------------+------------------+--------------------
15 | 2016-08-15 15:15:15 | 2 | MONDAY | MON
16 | 2016-08-16 16:16:16 | 3 | TUESDAY | TUE
18 | 2016-08-18 18:18:18 | 5 | THURSDAY | THU
1 | 2016-08-08 08:08:08 | 2 | MONDAY | MON
1 | 2016-08-09 09:09:09 | 3 | TUESDAY | TUE
1 | 2016-08-10 10:10:10 | 4 | WEDNESDAY | WED
1 | 2016-08-11 10:11:11 | 5 | THURSDAY | THU
12 | 2016-08-12 12:12:12 | 6 | FRIDAY | FRI
13 | 2016-08-13 13:13:13 | 7 | SATURDAY | SAT
14 | 2016-08-14 14:14:14 | 1 | SUNDAY | SUN
17 | 2016-08-17 17:17:17 | 4 | WEDNESDAY | WED
20 | 2016-08-20 20:20:20 | 7 | SATURDAY | SAT
6 | 2016-08-06 06:06:06 | 7 | SATURDAY | SAT
(13 rows)
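With the mapping above confirmed (day number 7 is Saturday), the delete the original question asks for could look like the following sketch. It targets the sample table tba; for your own table you would swap in its name and the dw_created_date column. Note that TO_CHAR returns a character string, so the comparison is against '7':
delete from tba
where to_char(colts, 'D') = '7';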
Last, but not least, there are two very important things you should pay attention to if you are new to Redshift. Every time you delete or update a significant amount of data, there are 2 things you should always do:
VACUUM - Amazon Redshift performs "logical" deletes when you DELETE or UPDATE data. So if you change a substantial number of records, you should run a VACUUM to physically remove the dead rows from your table. My rule of thumb is that whenever more than 5% of the data has been deleted or updated, it is time to run a VACUUM. You would run a command such as vacuum full tba;. You can find more info about VACUUM here.
ANALYZE - Amazon Redshift depends on accurate statistics about table data and distribution in order to create the most efficient query plan. If you delete, insert, or change a "significant" portion of data, you should run an ANALYZE command on the table to make sure the database has up-to-date statistics. A sample command is ANALYZE VERBOSE tba;. You can find more information about ANALYZE here.
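Applied to the sample table above, the post-delete maintenance is simply the two commands just described (a quick sketch):
vacuum full tba;
analyze verbose tba;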

Related

How to calculate current month / six months ago and result as a percent change in Postgresql?

create table your_table(type text,compdate date,amount numeric);
insert into your_table values
('A','2022-01-01',50),
('A','2022-02-01',76),
('A','2022-03-01',300),
('A','2022-04-01',234),
('A','2022-05-01',14),
('A','2022-06-01',9),
('B','2022-01-01',201),
('B','2022-02-01',33),
('B','2022-03-01',90),
('B','2022-04-01',41),
('B','2022-05-01',11),
('B','2022-06-01',5),
('C','2022-01-01',573),
('C','2022-02-01',77),
('C','2022-03-01',109),
('C','2022-04-01',137),
('C','2022-05-01',405),
('C','2022-06-01',621);
I am trying to calculate and show the percentage change in $ from 6 months prior to today's date, for each type. For example:
Type A decreased -82% over six months.
Type B decreased -97.5%
Type C increased +8.4%.
How do I write this in PostgreSQL, mixed in with other statements?
It looks like you are comparing against 5 months prior, not 6, and 2022-06-01 isn't today's date.
Join the table with itself based on the matching type and the desired time difference. Demo
select
b.type,
b.compdate,
a.compdate "6 months earlier",
b.amount "current amount",
round(-(100-b.amount/a.amount*100),2) "change"
from your_table a
inner join your_table b
on a.type=b.type
and a.compdate = b.compdate - '5 months'::interval;
-- type | compdate | 6 months earlier | current amount | change
--------+------------+------------------+----------------------+--------
-- A | 2022-06-01 | 2022-01-01 | 9 | -82.00
-- B | 2022-06-01 | 2022-01-01 | 5 | -97.51
-- C | 2022-06-01 | 2022-01-01 | 621 | 8.38
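As an alternative sketch, assuming exactly one row per type per month with no gaps (as in the sample data), a window function can replace the self-join; the offset of 5 rows corresponds to the 5-month difference noted above:
select
type,
compdate,
lag(compdate, 5) over w "6 months earlier",
amount,
round(amount / lag(amount, 5) over w * 100 - 100, 2) "change"
from your_table
window w as (partition by type order by compdate)
order by type, compdate;
The first five rows per type have no row 5 months back, so their "change" is NULL; filtering those out reproduces the three-row result above.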

how can I generate 6 month dates from a specific date

I have a table pqdf which has an Effective_Date column. First I will take the distinct values of Effective_Date.
Now, from each of these dates, I want to generate 6 months of dates.
If my start date is 2022-01-01, then my table's last row value will be 2022-06-30, and the total row count will be around 181 rows.
+----------------+
| Effective_Date |
+----------------+
| 2022-01-01 |
| 2022-01-01 |
| 2022-01-01 |
+----------------+
Please help.
I tried the query below, but it's not working.
select explode (sequence( first_value(to_date('Effective_Date'))), to_date(DATEADD(month, 6, Effective_Date)), interval 1 day) as date from pqdf
See if this works. If it doesn't, can you please also provide the error message that you are seeing?
WITH pqdf AS (
SELECT "2022-01-01" AS Effective_Date
)
SELECT
EXPLODE(SEQUENCE(
DATE(Effective_Date),
TO_DATE(DATEADD(MONTH, 6, DATE(Effective_Date))),
INTERVAL 1 DAY)
) AS date
FROM
pqdf
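If that works, the same pattern can be pointed at the real table with the DISTINCT step you described. This is only a sketch and assumes pqdf and its Effective_Date column exist exactly as in your question:
SELECT
EXPLODE(SEQUENCE(
DATE(Effective_Date),
TO_DATE(DATEADD(MONTH, 6, DATE(Effective_Date))),
INTERVAL 1 DAY)
) AS date
FROM
(SELECT DISTINCT Effective_Date FROM pqdf) d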

historical aggregation of a column up until a specified time in each row in another column

I have two tables, login_attempts and checkouts, in Amazon Redshift. A user can have multiple (un)successful login attempts and multiple (un)successful checkouts, as shown in this example:
login_attempts
login_id | user_id | login | success
-------------------------------------------------------
1 | 1 | 2021-07-01 14:00:00 | 0
2 | 1 | 2021-07-01 16:00:00 | 1
3 | 2 | 2021-07-02 05:01:01 | 1
4 | 1 | 2021-07-04 03:25:34 | 0
5 | 2 | 2021-07-05 11:20:50 | 0
6 | 2 | 2021-07-07 12:34:56 | 1
and
checkouts
checkout_id | checkout_time | user_id | success
------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 0
2 | 2021-07-02 06:54:32 | 2 | 1
3 | 2021-07-04 13:00:01 | 1 | 1
4 | 2021-07-08 09:05:00 | 2 | 1
Given this information, how can I get the following table with historical performance included for each checkout AS OF THAT TIME?
checkout_id | checkout | user_id | lastGoodLogin | lastFailedLogin | lastGoodCheckout | lastFailedCheckout |
---------------------------------------------------------------------------------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 2021-07-01 16:00:00 | 2021-07-01 14:00:00 | NULL | NULL
2 | 2021-07-02 06:54:32 | 2 | 2021-07-02 05:01:01 | NULL | NULL | NULL
3 | 2021-07-04 13:00:01 | 1 | 2021-07-01 16:00:00 | 2021-07-04 03:25:34 | NULL | 2021-07-01 18:00:00
4 | 2021-07-08 09:05:00 | 2 | 2021-07-07 12:34:56 | 2021-07-05 11:20:50 | 2021-07-02 06:54:32 | NULL
Update: I was able to get lastFailedCheckout & lastGoodCheckout because that's doing window operations on the same table (checkouts) but I am failing to understand how to best join it with login_attempts table to get last[Good|Failed]Login fields. (sqlfiddle)
P.S.: I am open to PostgreSQL suggestions as well.
Good start! A couple of things in your SQL: 1) You should really try to avoid inequality joins, as these can lead to data explosions and aren't needed in this case. Just put a CASE statement inside your window function to use only the type of checkout (or login) you want. 2) You can use the frame clause to avoid self-selecting the same row when finding previous checkouts.
Once you have this pattern, you can use it to find the other 2 columns of data you are looking for. The first step is to UNION the tables together, not JOIN them. This means adding a few more columns so the data can live together, but that is easy. Now you have the user id and the time the "thing" happened all in the same data. You just need to apply the window function 2 more times to pull the info you want. Lastly, you strip out the non-checkout rows with an outer select and a WHERE clause.
Like this:
create table login_attempts(
loginid smallint,
userid smallint,
login timestamp,
success smallint
);
create table checkouts(
checkoutid smallint,
userid smallint,
checkout_time timestamp,
success smallint
);
insert into login_attempts values
(1, 1, '2021-07-01 14:00:00', 0),
(2, 1, '2021-07-01 16:00:00', 1),
(3, 2, '2021-07-02 05:01:01', 1),
(4, 1, '2021-07-04 03:25:34', 0),
(5, 2, '2021-07-05 11:20:50', 0),
(6, 2, '2021-07-07 12:34:56', 1)
;
insert into checkouts values
(1, 1, '2021-07-01 18:00:00', 0),
(2, 2, '2021-07-02 06:54:32', 1),
(3, 1, '2021-07-04 13:00:01', 1),
(4, 2, '2021-07-08 09:05:00', 1)
;
SQL:
select *
from (
select
c.checkoutid,
c.userid,
c.checkout_time,
max(case success when 0 then checkout_time end) over (
partition by userid
order by event_time
rows between unbounded preceding and 1 preceding
) as lastFailedCheckout,
max(case success when 1 then checkout_time end) over (
partition by userid
order by event_time
rows between unbounded preceding and 1 preceding
) as lastGoodCheckout,
max(case lsuccess when 0 then login end) over (
partition by userid
order by event_time
rows between unbounded preceding and 1 preceding
) as lastFailedLogin,
max(case lsuccess when 1 then login end) over (
partition by userid
order by event_time
rows between unbounded preceding and 1 preceding
) as lastGoodLogin
from (
select checkout_time as event_time, checkoutid, userid,
checkout_time, success,
NULL as login, NULL as lsuccess
from checkouts
UNION ALL
select login as event_time,NULL as checkoutid, userid,
NULL as checkout_time, NULL as success,
login, success as lsuccess
from login_attempts
) c
) o
where o.checkoutid is not null
order by o.checkoutid

postgresql unique index preventing overlapping

My table permission looks like:
id serial,
person_id integer,
permission_id integer,
valid_from date,
valid_to date
I'd like to prevent creating permissions whose (valid_from, valid_to) date ranges overlap, e.g.
1 | 1 | 1 | 2010-10-01 | 2999-12-31
2 | 1 | 2 | 2010-10-01 | 2020-12-31
3 | 2 | 1 | 2015-10-01 | 2999-12-31
these can be added:
4 | 1 | 3 | 2011-10-01 | 2999-12-31 - because there is no such permission
5 | 2 | 1 | 2011-10-10 | 2999-12-31 - because there is no such person
6 | 1 | 2 | 2021-01-01 | 2999-12-31 - because it doesn't overlap id:2
but these can't:
7 | 1 | 1 | 2009-10-01 | 2010-02-01 - because it overlaps id:1
8 | 1 | 2 | 2019-01-01 | 2022-12-31 - because it overlaps id:2
9 | 2 | 1 | 2010-01-01 | 2016-12-31 - because it overlaps id:3
I can do this check outside the database, but I wonder if it's possible to do it in the database itself.
A unique constraint is based on an equality operator and cannot be used in this case, but you can use an exclusion constraint. The constraint uses the btree equality operator = on person_id and permission_id together with the && range operator, hence you have to install the btree_gist extension.
create extension if not exists btree_gist;
create table permission(
id serial,
person_id integer,
permission_id integer,
valid_from date,
valid_to date,
exclude using gist (
person_id with =,
permission_id with =,
daterange(valid_from, valid_to) with &&)
);
These inserts are successful:
insert into permission values
(1, 1, 1, '2010-10-01', '2999-12-31'),
(2, 1, 2, '2010-10-01', '2020-12-31'),
(3, 2, 1, '2015-10-01', '2999-12-31'),
(4, 1, 3, '2011-10-01', '2999-12-31'),
(5, 3, 1, '2011-10-10', '2999-12-31'), -- you meant person_id = 3 I suppose
(6, 1, 2, '2021-01-01', '2999-12-31'),
(7, 1, 1, '2009-10-01', '2010-02-01'); -- ranges do not overlap!
but this one is not:
insert into permission values
(8, 1, 2, '2019-01-01', '2022-12-31');
ERROR: conflicting key value violates exclusion constraint "permission_person_id_permission_id_daterange_excl"
DETAIL: Key (person_id, permission_id, daterange(valid_from, valid_to))=(1, 2, [2019-01-01,2022-12-31)) conflicts with existing key (person_id, permission_id, daterange(valid_from, valid_to))=(1, 2, [2010-10-01,2020-12-31)).
Try it in db<>fiddle.
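One detail worth checking against your data: daterange(valid_from, valid_to) uses the default '[)' bounds, so valid_to itself is treated as exclusive and two permissions that only share that single end date would not conflict. If both endpoints should count, a sketch of the same table with inclusive bounds would be:
create table permission(
id serial,
person_id integer,
permission_id integer,
valid_from date,
valid_to date,
exclude using gist (
person_id with =,
permission_id with =,
daterange(valid_from, valid_to, '[]') with &&)
);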

Postgresql Time Series for each Record

I'm having issues trying to wrap my head around how to extract some time series stats from my Postgres DB.
For example, I have several stores. I record how many sales each store made each day in a table that looks like:
+------------+----------+-------+
| Date | Store ID | Count |
+------------+----------+-------+
| 2017-02-01 | 1 | 10 |
| 2017-02-01 | 2 | 20 |
| 2017-02-03 | 1 | 11 |
| 2017-02-03 | 2 | 21 |
| 2017-02-04 | 3 | 30 |
+------------+----------+-------+
I'm trying to display this data on a bar/line graph with different lines per Store and the blank dates filled in with 0.
I have been successful getting it to show the sum per day (combining all the stores into one sum) using generate_series, but I can't figure out how to separate it out so each store has a value for each day... the result being something like:
["Store ID 1", 10, 0, 11, 0]
["Store ID 2", 20, 0, 21, 0]
["Store ID 3", 0, 0, 0, 30]
It is necessary to build a cross join of dates × stores:
select store_id, array_agg(total order by date) as total
from (
select store_id, date, coalesce(sum(total), 0) as total
from t
right join (
generate_series(
(select min(date) from t),
(select max(date) from t),
'1 day'
) gs (date)
cross join
(select distinct store_id from t) s
) using (date, store_id)
group by 1,2
) s
group by 1
order by 1
;
store_id | total
----------+-------------
1 | {10,0,11,0}
2 | {20,0,21,0}
3 | {0,0,0,30}
Sample data:
create table t (date date, store_id int, total int);
insert into t (date, store_id, total) values
('2017-02-01',1,10),
('2017-02-01',2,20),
('2017-02-03',1,11),
('2017-02-03',2,21),
('2017-02-04',3,30);