Avoiding default date value to be used when only time is provided to a datetime field on Redshift - amazon-redshift

I created a table with a datetime field "dt" and load data into it with the COPY command. The corresponding value in the file is only a time of day, for example 14:50:00, so the value stored is 1900-01-01 14:50:00. I don't need the date part. How can I avoid storing it?
Or is there an alternative data type that stores only the time?

Amazon Redshift supports only the date (year, month, day) and timestamp (year, month, day, hour, minute, second) types, and it doesn't support PostgreSQL's time (hour, minute, second) type.
I can think of two ways to work around this.
As @Damien_The_Unbeliever mentioned, you can ignore the date part of the timestamp:
create table date_test(id int, timestamp timestamp);
insert into date_test values (1, '1900-01-01 14:50:00');
insert into date_test values (2, '1900-01-01 17:20:00');
select * from date_test where date_test.timestamp > '1900-01-01 14:50:00';
id | timestamp
----+---------------------
2 | 1900-01-01 17:20:00
(1 row)
Alternatively, use a char or varchar column to store the time value:
create table date_test2(id int, timestamp char(8));
insert into date_test2 values (1, '14:50:00');
insert into date_test2 values (2, '17:20:00');
select * from date_test2 where timestamp > '14:50:00';
id | timestamp
----+-----------
2 | 17:20:00
(1 row)
The second solution looks easier, but according to the Redshift documentation it performs worse. If you store a large amount of data, you should consider the first one.
Here are the related documentation pages on date/time columns:
http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-timestamp-date-columns.html
http://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_types.html
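To see why the comparison on the char(8) column returns the expected row, here is a small Python sketch (not Redshift SQL; the sample values and the helper name parse_hms are made up for illustration). It assumes the strings are always zero-padded HH:MM:SS, in which case string order and time order agree:

```python
from datetime import time

def parse_hms(text):
    """Parse a zero-padded 'HH:MM:SS' string into a datetime.time."""
    return time.fromisoformat(text)

# Hypothetical time-of-day values as they would be stored in a char(8) column.
samples = ["14:50:00", "17:20:00", "09:05:30", "00:00:00", "23:59:59"]

# Sorting the raw strings and sorting the parsed times give the same order,
# which is why a string comparison such as > '14:50:00' behaves like a time comparison.
by_string = sorted(samples)
by_time = sorted(samples, key=parse_hms)
print(by_string == by_time)

# Without zero padding the string order breaks: '9:05:30' sorts after '14:50:00'.
print("9:05:30" > "14:50:00")
```

So if the loaded file can contain values like 9:05:30, the strings would need padding before a char column can be compared this way.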

Related

TIMESTAMP- creation_date :: date between '2022-05-15' and '2022-06-15'

I just wanted to know the difference between these two codes:
select count (user_id) from tb_users where
creation_date :: date between '2022-05-15' and '2022-06-15'
Result: 41,232
select count (user_id) from tb_users where
creation_date between '2022-05-15' and '2022-06-15'
Result: 40,130
As far as I can see, it is related to the timestamp, but I do not understand the difference.
Thank you!
Your column creation_date is most probably of type timestamp, so its values look like '2022-05-15 00:00:00'. By adding ::date you cast the timestamp to a date: '2022-05-15'.
You can read more about casting data types here:
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-cast/
When you ask Postgres to implicitly coerce a DATE value to a TIMESTAMP value - the hours, minutes and seconds are set to zero.
In the first query, you explicitly cast the creation date to DATE which is successfully compared to the provided DATE values.
In the second query, the creation date is of type TIMESTAMP and so PostgreSQL converts your DATE values to TIMESTAMP values and the comparison becomes
creation_date >= '2022-05-15 00:00:00' AND creation_date <= '2022-06-15 00:00:00'
This produces a different result set than the first query: rows created on 2022-06-15 after midnight match the first query but not the second.
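The same effect can be reproduced outside the database. This is only a Python sketch (the sample rows and function names are invented, and Python's datetime merely approximates PostgreSQL's comparison rules). The third function shows one common way to keep the timestamp comparison while still including the whole last day, by using a half-open upper bound:

```python
from datetime import date, datetime, timedelta

# Hypothetical creation_date values, two of them on the last day of the range.
rows = [
    datetime(2022, 5, 15, 0, 0, 0),
    datetime(2022, 6, 14, 23, 59, 59),
    datetime(2022, 6, 15, 0, 0, 0),
    datetime(2022, 6, 15, 9, 30, 0),
]
start, end = date(2022, 5, 15), date(2022, 6, 15)
midnight = datetime.min.time()

def count_cast_to_date(values):
    """Like creation_date::date BETWEEN start AND end."""
    return sum(start <= v.date() <= end for v in values)

def count_as_timestamp(values):
    """Like creation_date BETWEEN start AND end, with both dates widened to midnight."""
    lo, hi = datetime.combine(start, midnight), datetime.combine(end, midnight)
    return sum(lo <= v <= hi for v in values)

def count_half_open(values):
    """Like creation_date >= start AND creation_date < end + 1 day."""
    lo = datetime.combine(start, midnight)
    hi = datetime.combine(end + timedelta(days=1), midnight)
    return sum(lo <= v < hi for v in values)

print(count_cast_to_date(rows), count_as_timestamp(rows), count_half_open(rows))
```

Only the row at 09:30 on the last day is counted differently, which is the kind of row that accounts for the gap between the two results in the question.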

How to run the postgres query with date as input on the column with timestamp in long format

I want to query a Postgres table that has a column storing timestamps as long milliseconds, but the time I have is in the date format "yyyy-MM-dd HH:mm:ssZ". How can I convert this date format to long milliseconds to run the query?
You can either convert your long value to a proper timestamp:
select *
from the_table
where to_timestamp(the_millisecond_column / 1000) = timestamp '2020-10-05 07:42'
Or extract the epoch seconds from the timestamp value and scale them to milliseconds:
select *
from the_table
where the_millisecond_column = extract(epoch from timestamp '2020-10-05 07:42') * 1000
The better solution, however, is to convert that column to a proper timestamp column to avoid the constant conversion between milliseconds and proper timestamp values.
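If the conversion has to happen in the application before the query is sent, it could look roughly like this Python sketch. The function names are hypothetical, and it assumes the trailing Z of the format is a numeric UTC offset such as +0000:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(text):
    """Parse 'yyyy-MM-dd HH:mm:ssZ' (offset written like +0000) into epoch milliseconds."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)

def from_epoch_millis(millis):
    """Turn epoch milliseconds back into a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)

millis = to_epoch_millis("2020-10-05 07:42:00+0000")
print(millis)
print(from_epoch_millis(millis).isoformat())
```

The resulting integer can then be passed as a query parameter and compared directly with the_millisecond_column.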

Select if number range contains number in PostgreSQL

I need to be able to find a row based on a number range, saved as a text field. For example, the field tuesday looks like 540-1020. I want to retrieve this row if I search for 900. So far I have,
SELECT string_to_array(tuesday, '-')
FROM coverage
WHERE 900 IN string_to_array(tuesday, '-')
Where string_to_array(tuesday, '-') prints out like {540,1020}. How can I convert it into a selectable integer range?
Use a range.
SELECT string_to_array(tuesday, '-')
FROM coverage
WHERE 900 <@ int4range(split_part(tuesday, '-', 1)::int4, split_part(tuesday, '-', 2)::int4, '[]');
That last parameter [] signifies an inclusive range where '100-900' would match. You could also do an exclusive upper range like [) (note the right paren) where '100-900' would not match because the upper number is excluded from the set of matching numbers.
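The difference between the two bound styles can be illustrated with a short Python sketch. This is not how PostgreSQL implements ranges, and the function names are made up; it only mimics the containment check on the 'low-high' text format from the question:

```python
def parse_range(text):
    """Split a 'low-high' string such as '540-1020' into two integers."""
    low, high = text.split("-", 1)
    return int(low), int(high)

def contains(text, value, bounds="[]"):
    """Check whether value lies in the range, honouring '[' / '(' and ']' / ')' bounds."""
    low, high = parse_range(text)
    lower_ok = value >= low if bounds[0] == "[" else value > low
    upper_ok = value <= high if bounds[1] == "]" else value < high
    return lower_ok and upper_ok

print(contains("540-1020", 900))       # value inside the range
print(contains("100-900", 900, "[]"))  # inclusive upper bound
print(contains("100-900", 900, "[)"))  # exclusive upper bound
```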
For better query speed as your table gets larger, you can add a GIST functional index.
CREATE INDEX tuesday_range_idx ON coverage
USING GIST (int4range(split_part(tuesday, '-', 1)::int4, split_part(tuesday, '-', 2)::int4, '[]'));
This is exposing some weaknesses in your data model. By having each day as a column, you'd have to create a separate functional index for each column. You're also having to parse text into an array every time you run this. Typically you'd want the data in the table to match how you access it, not its serialized form.
Instead of
CREATE TABLE coverage (
id serial PRIMARY KEY,
year smallint, -- tracking by week
week_num smallint, -- for example
sunday varchar,
monday varchar,
tuesday varchar,
wednesday varchar,
thursday varchar,
friday varchar,
saturday varchar
);
why not something like
CREATE TABLE coverage (
id serial PRIMARY KEY,
day date NOT NULL UNIQUE,
daily_data int4range NOT NULL
);
INSERT INTO coverage (day, daily_data)
VALUES ('2020-06-02', '[540,1020]');
Then your search looks like
SELECT daily_data
FROM coverage
WHERE extract(DOW FROM day) = 2 -- Tuesday (Sunday is 0, Saturday is 6)
AND 900 <@ daily_data;
You can make indexes for the daily data ranges, by date (already a unique index in my example), functional indexes for the day of the week, month, year, etc. Much more flexible.
Or if you absolutely want an array back from your SELECT
SELECT ARRAY[lower(daily_data), upper(daily_data)]
FROM coverage
WHERE extract(DOW FROM day) = 2 -- Tuesday (Sunday is 0, Saturday is 6)
AND 900 <@ daily_data;

Need T-SQL code to group times into 5-minute intervals (the original time is in epoch format)

I have to add column to table.
The column value is calculated based on the column already present in the table.
I have to get time-stamp (column already present) and then group them into 5 min time-slots.
E.g: if the time is:
13:03/13:02 then it should go as 13:00;
13:53/13:52 then it should go as 13:50;
13:21 then should go as 13:20 and so on
PS: basically I have to get the timestamp in epoch (UNIX timestamp) format [the table has values in epoch as well as in regular timestamp format]
So what I'm seeing is that you have times and you need to round them down to the nearest 5 minute increment. Try this:
DECLARE @table TABLE (times TIME)
INSERT INTO @table
VALUES ('13:03'),
('13:02'),
('13:53'),
('13:52'),
('13:21');
SELECT times,
DATEADD(MINUTE,-DATEDIFF(MINUTE,0,times) % 5,times) five_minute_increments
FROM @table
Results:
times five_minute_increments
---------------- ----------------------
13:03:00.0000000 13:00:00.0000000
13:02:00.0000000 13:00:00.0000000
13:53:00.0000000 13:50:00.0000000
13:52:00.0000000 13:50:00.0000000
13:21:00.0000000 13:20:00.0000000
Epoch Version
DECLARE @epoch BIGINT;
--Epoch is the seconds since Jan 1, 1970
SET @epoch = DATEDIFF(SECOND,'1970-01-01','2015-04-01 12:06:00.000');
SELECT CAST(DATEADD(SECOND,@epoch - (@epoch % 300),'1970-01-01 00:00:00.000') AS TIME) AS epochTimes
Results:
12:05:00.0000000
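The arithmetic behind the epoch version is just "subtract the remainder modulo 300 seconds". Here is a Python sketch of it for checking values by hand; the names are hypothetical and the timestamps are interpreted as UTC:

```python
from datetime import datetime, timezone

SLOT_SECONDS = 5 * 60  # width of one time slot

def floor_to_slot(epoch_seconds):
    """Round a UNIX timestamp down to the start of its 5-minute slot."""
    return epoch_seconds - (epoch_seconds % SLOT_SECONDS)

def as_utc(epoch_seconds):
    """Render a UNIX timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Same input as the T-SQL example: 2015-04-01 12:06:00 falls into the 12:05 slot.
epoch = int(datetime(2015, 4, 1, 12, 6, tzinfo=timezone.utc).timestamp())
print(as_utc(floor_to_slot(epoch)).strftime("%H:%M:%S"))
```

Because the slot start is itself an epoch value, it can be stored in the new column directly or used as a GROUP BY key.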
create table epoch1(epoch int not null,
epoch_date as dateadd(s,epoch,'19700101'))
insert epoch1(epoch) values(1331070999)
insert epoch1(epoch) values(1331070956)
insert epoch1(epoch) values(1331071998)
insert epoch1(epoch) values(1331071999)
-- epoch holds seconds, so convert it with SECOND (not MINUTE) before rounding down to 5 minutes
-- note: only the minute part is rounded; the seconds of the original value are kept in "human"
select DATEADD(MINUTE,-DATEDIFF(MINUTE,0,dateadd(SECOND,epoch,'19700101')) % 5,dateadd(SECOND,epoch,'19700101')) as human ,DATEDIFF(MINUTE, '1970-01-01 00:00:00', DATEADD(MINUTE,-DATEDIFF(MINUTE,0,dateadd(SECOND,epoch,'19700101')) % 5,dateadd(SECOND,epoch,'19700101'))) as timeinterval
from epoch1

TSQL update Datetime with Random Value between 2 Dates

What's the easiest way in T-SQL to update a table's DATETIME column with a random value between two dates?
I have seen various posts about this, but their random values turn out to be sequential when you ORDER BY the date after the update.
Assumptions
First assume that you have a database containing a table with a start datetime column and an end datetime column, which together define a datetime range:
CREATE DATABASE StackOverflow11387226;
GO
USE StackOverflow11387226;
GO
CREATE TABLE DateTimeRanges (
StartDateTime DATETIME NOT NULL,
EndDateTime DATETIME NOT NULL
);
GO
ALTER TABLE DateTimeRanges
ADD CONSTRAINT CK_PositiveRange CHECK (EndDateTime > StartDateTime);
And assume that the table contains some data:
INSERT INTO DateTimeRanges (
StartDateTime,
EndDateTime
)
VALUES
('2012-07-09 00:30', '2012-07-09 01:30'),
('2012-01-01 00:00', '2013-01-01 00:00'),
('1988-07-25 22:30', '2012-07-09 00:30');
GO
Method
The following SELECT statement returns the start datetime, the end datetime, and a pseudorandom datetime with minute precision that is greater than or equal to the start datetime and less than the end datetime:
SELECT
StartDateTime,
EndDateTime,
DATEADD(
MINUTE,
ABS(CHECKSUM(NEWID())) % DATEDIFF(MINUTE, StartDateTime, EndDateTime) + DATEDIFF(MINUTE, 0, StartDateTime),
0
) AS RandomDateTime
FROM DateTimeRanges;
Result
Because the NEWID() function is nondeterministic, this will return a different result set for every execution. Here is the result set I generated just now:
StartDateTime EndDateTime RandomDateTime
----------------------- ----------------------- -----------------------
2012-07-09 00:30:00.000 2012-07-09 01:30:00.000 2012-07-09 00:44:00.000
2012-01-01 00:00:00.000 2013-01-01 00:00:00.000 2012-09-08 20:41:00.000
1988-07-25 22:30:00.000 2012-07-09 00:30:00.000 1996-01-05 23:48:00.000
All the values in the column RandomDateTime lie between the values in columns StartDateTime and EndDateTime.
Explanation
This technique for generating random values is due to Jeff Moden. He wrote a great article on SQL Server Central about data generation. Read it for a more thorough explanation. Registration is required, but it's well worth it.
The idea is to generate a random offset from the start datetime, and add the offset to the start datetime to get a new datetime in between the start datetime and the end datetime.
The expression DATEDIFF(MINUTE, StartDateTime, EndDateTime) represents the total number of minutes between the start datetime and the end datetime. The offset must be less than this value so that the result stays before the end datetime.
The expression ABS(CHECKSUM(NEWID())) generates an independent random non-negative integer for every row. The expression can have any value from 0 to 2,147,483,647. This expression mod the first expression gives a valid offset in minutes.
The expression DATEDIFF(MINUTE, 0, StartDateTime) represents the total number of minutes between the start datetime and a reference datetime of 0, which is shorthand for '1900-01-01 00:00:00.000'. The value of the reference datetime does not matter, but it matters that the same reference datetime is used in the whole expression. Add this to the offset to get the total number of minutes between the reference datetime and the random datetime.
The encapsulating DATEADD function converts this to a datetime value by adding the number of minutes produced by the previous expression to the reference datetime.
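The same idea, a random whole-minute offset added to the start of the range, can be sketched in Python to check the bounds. This is not the T-SQL method itself: the function name is hypothetical, Python's random module stands in for ABS(CHECKSUM(NEWID())), and the ranges are the sample rows from above:

```python
import random
from datetime import datetime, timedelta

def random_datetime(start, end, rng=random):
    """Pick a datetime with minute precision such that start <= result < end."""
    total_minutes = int((end - start).total_seconds() // 60)
    # randrange excludes its argument, so 0 <= offset < total_minutes.
    # It raises ValueError when the range is shorter than one minute.
    offset = rng.randrange(total_minutes)
    return start + timedelta(minutes=offset)

# The sample ranges inserted into DateTimeRanges.
ranges = [
    (datetime(2012, 7, 9, 0, 30), datetime(2012, 7, 9, 1, 30)),
    (datetime(2012, 1, 1, 0, 0), datetime(2013, 1, 1, 0, 0)),
    (datetime(1988, 7, 25, 22, 30), datetime(2012, 7, 9, 0, 30)),
]

for start, end in ranges:
    print(start, end, random_datetime(start, end))
```

Each call draws its own offset, so the generated values are not sequential when sorted against the row order.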
You can use RAND for this:
select cast(cast(RAND()*100000 as int) as datetime)
from here
Sql-Fiddle looks quite good: http://sqlfiddle.com/#!3/b9e44/2/0