Is there a way to pull just the year out of a VARCHAR datetime value?

I am working on a project, in Snowflake, that requires me to combine pest & weather data tables, but the opposing tables do not share a common column. My solution has been to create a view that extracts the year from the Pest Table dates, format ex.
CREATION_DATE: 03/26/2020 09:11:15 PM,
to match the YEAR column in the Weather tables, format ex.
DATEYEAR: 2021.
However, I have found that the dates in the pest report are VARCHAR rather than true date/datetime values. Is there a way to pull just the year out of the VARCHAR date value? Additional information: I cannot change the tables themselves, so I need to create a view that preserves all other columns and adds a new "DATEYEAR" column.

Yes, you can. Here is a working example:
create table test (dt string);
insert into test (dt) values ('01/04/2022');
select dt, date_part(year, dt::date) from test;

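To make it easy, you can split the string into an array and take the third member of the array (index 2, since arrays are zero-based):
select strtok_to_array('03/26/2020', '/')[2]::int as MY_YEAR;
If the value also carries a time, as in the question, add a space to the delimiter set so that the year becomes its own token:
select strtok_to_array('03/26/2020 09:11:15 PM', '/ ')[2]::int as MY_YEAR;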
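To sanity-check the logic outside Snowflake, here is a minimal Python sketch that mirrors both approaches on the sample CREATION_DATE value. The function names are made up for illustration, and the format string assumes every value looks like the sample in the question (MM/DD/YYYY followed by a 12-hour time and AM/PM).

```python
from datetime import datetime

# Assumed layout of CREATION_DATE, taken from the sample in the question.
CREATION_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def year_by_parsing(value: str) -> int:
    """Parse the whole string as a datetime, then take the year."""
    return datetime.strptime(value, CREATION_DATE_FORMAT).year


def year_by_splitting(value: str) -> int:
    """Split on '/', then keep only the digits before any time portion."""
    third_piece = value.split("/")[2]
    return int(third_piece.split()[0])


sample = "03/26/2020 09:11:15 PM"
print(year_by_parsing(sample), year_by_splitting(sample))
```

Parsing rejects malformed values with an error, while splitting accepts anything that has digits in the third position, so the parsing route is the stricter of the two.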

Related

How to convert a specific text string to today's date in Power BI

I have a table with a column that contains text. Most of the values are years. However, some have the value "present" to represent groups that are still currently active. I want to convert those values to the current year and convert the column type to date (specifically the year type). I want to avoid creating a new column if possible.
Please see below for the Power Query (M) step that worked.
= Table.ReplaceValue(#"Extracted Year2", null, DateTime.LocalNow(), Replacer.ReplaceValue, {"Disbanded"})
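The replacement rule itself is small enough to express as a plain function. The Python sketch below is only an illustration of the logic, not Power BI code; `normalize_year` and its `today` parameter are hypothetical names, and it assumes every value other than "present" is a year written as digits.

```python
from datetime import date


def normalize_year(value, today=None):
    """Return the year as an int, mapping 'present' to the current year."""
    today = today or date.today()
    text = str(value).strip()
    if text.lower() == "present":
        return today.year
    return int(text)


# Passing a fixed date keeps the example reproducible.
print(normalize_year("present", date(2024, 5, 1)), normalize_year("1998"))
```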

Which data type for date-type attributes in a dimensional table, including start & end dates?

I'm designing a data warehouse using dimensional modeling. I've read most of The Data Warehouse Toolkit by Kimball & Ross. My question is about the columns in a dimension table that hold dates. For example, here is a table for users of the application:
CREATE TABLE user_dim (
    user_key BIGINT, -- surrogate key
    user_id BIGINT, -- natural key
    user_name VARCHAR(100),
    ...
    user_added_date DATE, -- type 0, date user added to the system
    ...
    -- Type-2 SCD administrative columns
    row_start_date DATE, -- first effective date for this row
    row_end_date DATE, -- last effective date for this row, 9999-12-31 if current
    row_current_flag VARCHAR(10) -- current or expired
);
The last three attributes are for implementing type 2 slowly changing dimensions. See Kimball, pages 150-151.
Question 1: Is there a best practice for the data type of the row_start_date and row_end_date columns? The type could be DATE (as shown), STRING/VARCHAR/CHAR ("YYYY-MM-DD"), or even BIGINT (a foreign key to the Date Dimension). I don't think there would be much filtering on the row start/end dates, so a key to the Date Dimension is not required.
Question 2: Is there a best practice for the data type of dimension attributes such as "user_added_date"? I can see someone wanting reports on users added per fiscal quarter, so a foreign key to the Date Dimension would be helpful. Are there any downsides to this, besides having to join from the User Dimension to the Date Dimension to display the attribute?
If it matters, I'm using Amazon Redshift.
Question 1: For the SCD from and to dates, I suggest you use timestamp. My preference is timestamp WITHOUT time zone, with all of your timestamps stored in UTC.
Question 2: I always set up a date dimension table whose logical key is the actual date. That way you can join any date (e.g. the start date of the user) to the date dimension to find, say, the fiscal month or any other attribute held there. You can also still read the date without joining to the date dimension, because it is stored as a plain date.
With Redshift (or any columnar MPP DBMS) it is good practice to denormalise a little, e.g. use a star schema rather than a snowflake schema. This takes advantage of the efficiencies that columnar storage brings and avoids inefficient joins (there are no indexes).
For Question 1: row_start_date and row_end_date are not part of the incoming data. As you mentioned, they are created artificially for SCD Type 2 purposes, so they should not have a key to the Date dimension. The User dimension has no reason to have a key to the Date dimension. For the data type, YYYY-MM-DD should be fine.
For Question 2: If you have a requirement like this, I would suggest creating a derived fact table (often called an accumulating snapshot fact table) to keep derived measures like user_added_date.
For more info see https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/accumulating-snapshot-fact-table/
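To make the role of row_start_date and row_end_date concrete, here is a minimal Python sketch of an "as of" lookup against a Type 2 dimension. The class, the `row_as_of` helper, and the sample rows are invented for illustration; only the column names and the 9999-12-31 sentinel come from the question.

```python
from dataclasses import dataclass
from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel for the current row, as in the question


@dataclass
class UserDimRow:
    user_key: int          # surrogate key
    user_id: int           # natural key
    user_name: str
    row_start_date: date   # first effective date for this row
    row_end_date: date     # last effective date, OPEN_END if current


def row_as_of(rows, user_id, as_of):
    """Return the row version that was effective on the given date, if any."""
    for row in rows:
        if row.user_id == user_id and row.row_start_date <= as_of <= row.row_end_date:
            return row
    return None


# Made-up history: user 42 was renamed on 2021-07-01.
rows = [
    UserDimRow(1, 42, "old_name", date(2020, 1, 1), date(2021, 6, 30)),
    UserDimRow(2, 42, "new_name", date(2021, 7, 1), OPEN_END),
]
print(row_as_of(rows, 42, date(2021, 6, 30)).user_name)
```

The range comparison works directly because both bounds are real dates. With string columns the same test only behaves if every value is strictly formatted as YYYY-MM-DD.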

Date values from an int and a string in separate columns

I have an issue where I'm trying to get a date value out of two columns. One has int values in the '2017' format and the other has month names such as 'April'. How can I combine them to create a "2017-04-0000" date?
Thanks
Your best option is to use the DATEPARSE function. That way you can combine your separate fields and create a new date value.
Create the calculated field as follows. Note that I have set the day to the 1st of the month; you could change this to suit your purpose:
dateparse('ddMMMMyyyy', '01'+[Manufacture Month]+str([Manufactured year]))
Find out more here
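The same "glue the parts together, then parse" idea can be checked quickly in Python. This is only a sketch of the technique, not the calculated field itself; `first_of_month` is a hypothetical helper, and it assumes full English month names and the default C locale.

```python
from datetime import date, datetime


def first_of_month(year: int, month_name: str) -> date:
    """Build a string like '01April2017' and parse it, fixing the day to the 1st."""
    # %d = two-digit day, %B = full month name, %Y = four-digit year
    return datetime.strptime(f"01{month_name}{year}", "%d%B%Y").date()


print(first_of_month(2017, "April"))  # 2017-04-01
```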

Cast varchar as date select distinct top 100

I am trying to fix a query problem that came to light in SSRS after the new year. We have an input that comes from another application. It grabs a date and stores it as varchar. The SSRS report then fetches the top 100 'dates', but now that 2017 dates have come around, they are not in the top 100.
The existing query is as follows
SELECT DISTINCT TOP (100) Date
FROM DenverTempData
ORDER BY Date DESC
The date is stored as VARCHAR, so this query does not rank a value such as 01012017 in the top 100 (above values like 12312016). I thought maybe I could simply change the data type of this column to datetime, but the information comes from a flat file and is converted, so it's a little more difficult than that. So I'm hoping to select the distinct top 100 while converting the date column to datetime or just date, grabbing the last 100 dates.
Can someone help with the query syntax? I'm thinking a cast to convert varchar to date, but how do I format with distinct top 100? I'm simply looking to retrieve the last 100 dates in chronological order from a column that is stored as varchar but contains a string representing a date.
Hopefully that makes sense
It is always a bad idea to store a date as a string. String formats are highly culture specific!
You can cast your formatted string-date to a real date like this:
DECLARE @DateMMDDYYYY VARCHAR(100)='12312016';
SELECT CONVERT(DATE,STUFF(STUFF(@DateMMDDYYYY,5,0,'-'),3,0,'-'),110)
After the conversion your sorting (and therefore the TOP 100) should work as expected.
My strong advice: try to store your dates in a column of a real date type to avoid such hassle!
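-- Note: a plain CAST fails on undelimited MMDDYYYY strings such as '12312016',
-- because SQL Server reads eight digits as yyyymmdd; insert separators first.
SELECT DISTINCT TOP 100 CAST(VarcharColumn AS DATE) AS DateColumn
FROM TABLE
ORDER BY DateColumn DESC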
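The underlying problem is that MMDDYYYY strings sort by month first, not by year. The short Python sketch below shows the difference between string order and date order on a few made-up values; `latest_dates` is a hypothetical helper that assumes every value is exactly eight digits in MMDDYYYY form.

```python
from datetime import datetime


def latest_dates(raw_values, limit=100):
    """Parse MMDDYYYY strings, drop duplicates, and return the newest dates first."""
    parsed = {datetime.strptime(value, "%m%d%Y").date() for value in raw_values}
    return sorted(parsed, reverse=True)[:limit]


raw = ["12312016", "01012017", "12302016", "01012017"]

# Sorting the raw strings ranks December 2016 above January 2017.
print(sorted(set(raw), reverse=True))

# Sorting the parsed dates puts January 2017 first, as intended.
print([d.isoformat() for d in latest_dates(raw)])
```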

Comparing two time columns in ASP.NET

I'm rather new to ASP.NET and SQL, so I'm having a tough time trying to figure out how to compare two time columns. I have a timestamped column and then a Now() column in an .mdb database. I need to have a gridview display records that are "Greater than or equal to 3 hours" from the timestamp. Any idea how I can accomplish this?
The Transact-SQL timestamp data type is a binary data type with no time-related values.
So to answer your question: Is there a way to get DateTime value from timestamp type column?
The answer is: No
You need another column of datetime2 type and can then use the > operator for comparison. You might want to give it a default value of getutcdate() so that it is set when each row is inserted.
UPDATE:
Since the column is of datetime type and not timestamp type (there is a type in SQL Server called timestamp, hence the confusion) you can just do
WHERE [TimeCalled] <= DATEADD(hour, -3, GETDATE())
Make sure your server is running in the same time zone as your code. It may be safer to store all dates in UTC. In that case, use GETUTCDATE instead of GETDATE.
Timestamps are generally used to track changes to records, and are updated every time the record is changed. If you want to store a specific value you should use a datetime field.
If you're using a DateTime column and you want the result in T-SQL, try
DATEDIFF(Hour, 'Your DateTime Column here', 'pass Now() here' )
Try executing this example in T-SQL:
select DATEDIFF(Hour, '2012-11-10 00:00:59.900', '2012-11-10 05:01:00.100')
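One detail worth knowing when choosing between the two answers: T-SQL DATEDIFF counts the hour boundaries crossed between the two values, not the elapsed time, so it can report 1 hour for values that are two minutes apart. The Python sketch below contrasts the two notions; both function names are hypothetical, and `hour_boundaries_crossed` is only an approximation written to mimic that documented behaviour.

```python
from datetime import datetime, timedelta


def is_at_least_hours_old(stamp, now, hours=3):
    """True when the real elapsed time between stamp and now is >= hours."""
    return now - stamp >= timedelta(hours=hours)


def hour_boundaries_crossed(start, end):
    """Mimic DATEDIFF(hour, start, end): truncate both to the hour, then subtract."""
    def floor_to_hour(value):
        return value.replace(minute=0, second=0, microsecond=0)

    return int((floor_to_hour(end) - floor_to_hour(start)) / timedelta(hours=1))


# The two literals from the T-SQL example above.
start = datetime(2012, 11, 10, 0, 0, 59, 900000)
end = datetime(2012, 11, 10, 5, 1, 0, 100000)
print(hour_boundaries_crossed(start, end), is_at_least_hours_old(start, end))
```

For a "3 hours or older" filter, comparing the column against a cutoff (as in the DATEADD answer) matches elapsed time, whereas a DATEDIFF-based filter rounds at hour boundaries.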