Can someone help me with the databricks timestamp/timezone syntax issue that im facing while executing my query? - databricks-sql

I am moving my data pointing from SSMS (SQL Server Management Studio) to Databricks. There will be syntax change in the query in databricks as compared to SSMS query. I am not familiar with databricks. So, I am trying to do the same query in databricks as well but, it is showing me error. How should i correct the syntax now? The error is where the conversion is happening to PST timezone. Can anyone help me with this ?
The SQL query is:
select
case
when category_name = 'Hardware' then 'Hardware'
when category_name = 'Services' then 'Services'
when category_name = 'Software' then 'Software'
when category_name = 'Subscription' then 'Cloud'
end as category_name,
sum(item_price * line_item_quantity) as order_amount,
CUSTOMER_ID
from
curated_delta.order_details
where
parentkit_id = 0
and category_name in ('Hardware', 'Software', 'services', 'Subscription')
and category_name is not null
and order_date >= cast(cast(dateadd(dd, -365, convert(datetime, sysdatetimeoffset() AT TIME ZONE 'pacific standard time')) as date) as datetime)
group by
category_name, CUSTOMER_ID
order by
1
The error I get is:
Error in SQL statement: ParseException:
no viable alternative at input 'cast(cast(dateadd(dd,-365,convert(datetime,sysdatetimeoffset() AT'(line 6, pos 79)

Related

Merge join not able to join properly on varchar column

I've created below code to implement SCD type 2 using merge, when i run the code i'm getting primary key violations on csname field. I have the below values as part of primary key, not sure whether merge SQL does support for varchar or not.
if I run the normal inner join SQL on the same key then i'm getting the matching records as well.
Any help much appreciated
csname
ER - Building Complaints
TR - Building Applications
CREATE PROCEDURE dbo.load_target
AS
BEGIN
INSERT INTO [TR_DW].[enum].[Rt]([csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord])
SELECT[csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord]
FROM
(
MERGE [TR_DW].[enum].[Rt] RtCSQSuTT
USING [TR].[enum].[Rt] RtCSQSuST
ON (RtCSQSuTT.csname = RtCSQSuST.csname)
WHEN NOT MATCHED THEN
INSERT ([csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord])
VALUES ([csname],[enddatetime],'12/31/9999', getdate(), 'Y')
WHEN MATCHED AND RtCSQSuTT.[CurrentRecord] = 'Y' AND
(ISNULL(RtCSQSuTT.[enddatetime], '') != ISNULL(RtCSQSuST.[enddatetime], ''))THEN
UPDATE SET
RtCSQSuTT.[CurrentRecord] = 'N',
RtCSQSuTT.[EffectiveFromDate] = GETDATE() - 1,
RtCSQSuTT.[EffectiveToDate] = GETDATE()
OUTPUT $Action Action_Taken,RtCSQSuST.[csqname],RtCSQSuST.[enddatetime],'12/31/9999' AS[EffectiveToDate],GETDATE() AS[EffectiveFromDate],'Y' AS[CurrentRecord]
)AS MERGE_OUT21
WHERE MERGE_OUT21.Action_Taken = 'UPDATE';
END
GO

Repeated use of parameter for multiple UDF's in FROM throws an invalid column name error

When using multiple table-valued functions in a query like beneath, SSMS throws an error. Also, the [Date] parameter of [PRECALCPAGES_asof] is underlined in red.
I am trying to understand why this fails. I think this might be related to the way the SQL Server engine works. Have looked into documentation on MSDN but unfortunately I do not know what to look for. Why is this caused and is there a way around it?
Query
SELECT
[Date]
, COUNT(*)
FROM
[Warehouse].[dbo].[DimDate]
CROSS APPLY
[PROJECTS_asof]([Date])
INNER JOIN
[PRECALCPAGES_asof]([Date]) ON [PRECALCPAGES_asof].[PROJECTID] = [PROJECTS_asof].[PROJECTID]
GROUP BY
[Date]
Error
Msg 207, Level 16, State 1, Line 9
Invalid column name 'Date'.
Functions
CREATE FUNCTION [ProfitManager].[PROJECTS_asof]
(
#date DATETIME
)
RETURNS TABLE AS
RETURN
(
SELECT
[PROJECTID]
, [PROJECT]
, ...
FROM
Profitmanager.[PROJECTS_HISTORY]
WHERE
[RowStartDate] <= #date
AND
[RowEndDate] > #date
)
GO
CREATE FUNCTION [ProfitManager].[PRECALCPAGES_asof]
(
#date DATETIME
)
RETURNS TABLE AS
RETURN
(
SELECT
[PAGEID]
, [PAGENAME]
, ...
FROM
Profitmanager.[PRECALCPAGES_HISTORY]
WHERE
[RowStartDate] <= #date
AND
[RowEndDate] > #date
)
GO
I think you can't use fields from tables as parameters to a function in a join. You should use cross apply.
SELECT
[Date]
, COUNT(*)
FROM
[Warehouse].[dbo].[DimDate]
CROSS APPLY
[PROJECTS_asof]([Date])
CROSS APPLY
[PRECALCPAGES_asof]([Date])
WHERE
[PRECALCPAGES_asof].[PROJECTID] = [PROJECTS_asof].[PROJECTID]
GROUP BY
[Date]

Column is of type timestamp without time zone but expression is of type character

I'm trying to insert records on my trying to implement an SCD2 on Redshift
but get an error.
The target table's DDL is
CREATE TABLE ditemp.ts_scd2_test (
id INT
,md5 CHAR(32)
,record_id BIGINT IDENTITY
,from_timestamp TIMESTAMP
,to_timestamp TIMESTAMP
,file_id BIGINT
,party_id BIGINT
)
This is the insert statement:
INSERT
INTO ditemp.TS_SCD2_TEST(id, md5, from_timestamp, to_timestamp)
SELECT TS_SCD2_TEST_STAGING.id
,TS_SCD2_TEST_STAGING.md5
,from_timestamp
,to_timestamp
FROM (
SELECT '20150901 16:34:02' AS from_timestamp
,CASE
WHEN last_record IS NULL
THEN '20150901 16:34:02'
ELSE '39991231 11:11:11.000'
END AS to_timestamp
,CASE
WHEN rownum != 1
AND atom.id IS NOT NULL
THEN 1
WHEN atom.id IS NULL
THEN 1
ELSE 0
END AS transfer
,stage.*
FROM (
SELECT id
FROM ditemp.TS_SCD2_TEST_STAGING
WHERE file_id = 2
GROUP BY id
HAVING count(*) > 1
) AS scd2_count_ge_1
INNER JOIN (
SELECT row_number() OVER (
PARTITION BY id ORDER BY record_id
) AS rownum
,stage.*
FROM ditemp.TS_SCD2_TEST_STAGING AS stage
WHERE file_id IN (2)
) AS stage
ON (scd2_count_ge_1.id = stage.id)
LEFT JOIN (
SELECT max(rownum) AS last_record
,id
FROM (
SELECT row_number() OVER (
PARTITION BY id ORDER BY record_id
) AS rownum
,stage.*
FROM ditemp.TS_SCD2_TEST_STAGING AS stage
)
GROUP BY id
) AS last_record
ON (
stage.id = last_record.id
AND stage.rownum = last_record.last_record
)
LEFT JOIN ditemp.TS_SCD2_TEST AS atom
ON (
stage.id = atom.id
AND stage.md5 = atom.md5
AND atom.to_timestamp > '20150901 16:34:02'
)
) AS TS_SCD2_TEST_STAGING
WHERE transfer = 1
and to short things up, I am trying to insert 20150901 16:34:02 to from_timestamp and 39991231 11:11:11.000 to to_timestamp.
and get
ERROR: 42804: column "from_timestamp" is of type timestamp without time zone but expression is of type character varying
Can anyone please suggest how to solve this issue?
Postgres isn't recognizing 20150901 16:34:02 (your input) as a valid time/date format, so it assumes it's a string.
Use a standard date format instead, preferably ISO-8601. 2015-09-01T16:34:02
SQLFiddle example
Just in case someone ends up here trying to insert into a postgresql a timestamp or a timestampz from a variable in groovy or Java from a prepared statement and getting the same error (as I did), I managed to do it by setting the property stringtype to "unspecified". According to the documentation:
Specify the type to use when binding PreparedStatement parameters set
via setString(). If stringtype is set to VARCHAR (the default), such
parameters will be sent to the server as varchar parameters. If
stringtype is set to unspecified, parameters will be sent to the
server as untyped values, and the server will attempt to infer an
appropriate type. This is useful if you have an existing application
that uses setString() to set parameters that are actually some other
type, such as integers, and you are unable to change the application
to use an appropriate method such as setInt().
Properties props = [user : "user", password: "password",
driver:"org.postgresql.Driver", stringtype:"unspecified"]
def sql = Sql.newInstance("url", props)
With this property set, you can insert a timestamp as a string variable without the error raised in the question title. For instance:
String myTimestamp= Instant.now().toString()
sql.execute("""INSERT INTO MyTable (MyTimestamp) VALUES (?)""",
[myTimestamp.toString()]
This way, the type of the timestamp (from a String) is inferred correctly by postgresql. I hope this helps.
Inside apache-tomcat-9.0.7/conf/server.xml
Add "?stringtype=unspecified" to the end of url address.
For example:
<GlobalNamingResources>
<Resource name="jdbc/??" auth="Container" type="javax.sql.DataSource"
...
url="jdbc:postgresql://127.0.0.1:5432/Local_DB?stringtype=unspecified"/>
</GlobalNamingResources>

Postgresql CASE statement schema error

I'm getting a 3F000 error from the following CASE statement.
select
entity.level1
, sum(trans_clean.amount_calculated) as total_spend
, sum(CASE WHEN extract(year from age(supplier.open_date::trans_clean.date_calculated)) <= 5 AND org_type = 'SMALL BUSINESS' THEN trans_clean.amount_calculated END) small_young
, trans_clean.date_calculated
from
entity, supplier, trans_clean
where
trans_clean.supplier_id = supplier.supplier_id and
trans_clean.entity_id = entity.entity_id and
date_calculated > '2011-12-01'
group by
entity.level1
, total_spend
, small_young
, trans_clean.date_calculated
;
The error is as follows:
ERROR: schema "trans_clean" does not exist
SQL state: 3F000
The statement works without the CASE line, I'm running the query in PGAdmin III and the right schema (public) is definitely selected.

TSQL CTE Error: Incorrect syntax near ')'

I am developing a TSQL stored proc using SSMS 2008 and am receiving the above error while generating a CTE. I want to add logic to this SP to return every day, not just the days with data. How do I do this? Here is my SP so far:
ALTER Proc [dbo].[rpt_rd_CensusWithChart]
#program uniqueidentifier = NULL,
#office uniqueidentifier = NULL
AS
DECLARE #a_date datetime
SET #a_date = case when MONTH(GETDATE()) >= 7 THEN '7/1/' + CAST(YEAR(GETDATE()) AS VARCHAR(30))
ELSE '7/1/' + CAST(YEAR(GETDATE())-1 AS VARCHAR(30)) END
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#ENROLLEES')
) DROP TABLE #ENROLLEES;
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#DISCHARGES')
) DROP TABLE #DISCHARGES;
declare #sum_enrollment int
set #sum_enrollment =
(select sum(1)
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(#office is NULL OR A.group_profile_id = #office)
AND (#program is NULL OR A.program_info_id = #program)
and (C.pe_end_date IS NULL OR C.pe_end_date > #a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date < #a_date)
select
A.program_info_id as [Program code],
A.[program_name],
A.profile_name as Facility,
A.group_profile_id as Facility_code,
A.people_id,
1 as enrollment_id,
C.pe_start_date,
C.pe_end_date,
LEFT(datename(month,(C.pe_start_date)),3) as a_month,
day(C.pe_start_date) as a_day,
#sum_enrollment as sum_enrollment
into #ENROLLEES
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(#office is NULL OR A.group_profile_id = #office)
AND (#program is NULL OR A.program_info_id = #program)
and (C.pe_end_date IS NULL OR C.pe_end_date > #a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date >= #a_date
;WITH #ENROLLEES AS (
SELECT '7/1/11' AS dt
UNION ALL
SELECT DATEADD(d, 1, pe_start_date) as dt
FROM #ENROLLEES s
WHERE DATEADD(d, 1, pe_start_date) <= '12/1/11')
The most obvious issue (and probably the one that causes the error message too) is the absence of the actual statement to which the last CTE is supposed to pertain. I presume it should be a SELECT statement, one that would combine the result set of the CTE with the data from the #ENROLLEES table.
And that's where another issue emerges.
You see, apart from the fact that a name that starts with a single # is hardly advisable for anything that is not a local temporary table (a CTE is not a table indeed), you've also chosen for your CTE a particular name that already belongs to an existing table (more precisely, to the already mentioned #ENROLLEES temporary table), and the one you are going to pull data from too. You should definitely not use an existing table's name for a CTE, or you will not be able to join it with the CTE due to the name conflict.
It also appears that, based on its code, the last CTE represents an unfinished implementation of the logic you say you want to add to the SP. I can suggest some idea, but before I go on I'd like you to realise that there are actually two different requests in your post. One is about finding the cause of the error message, the other is about code for a new logic. Generally you are probably better off separating such requests into distinct questions, and so you might be in this case as well.
Anyway, here's my suggestion:
build a complete list of dates you want to be accounted for in the result set (that's what the CTE will be used for);
left-join that list with the #ENROLLEES table to pick data for the existing dates and some defaults or NULLs for the non-existing ones.
It might be implemented like this:
… /* all your code up until the last WITH */
;
WITH cte AS (
SELECT CAST('7/1/11' AS date) AS dt
UNION ALL
SELECT DATEADD(d, 1, dt) as dt
FROM cte
WHERE dt < '12/1/11'
)
SELECT
cte.dt,
tmp.[Program code],
tmp.[program_name],
… /* other columns as necessary; you might also consider
enveloping some or all of the "tmp" columns in ISNULLs,
like in
ISNULL(tmp.[Program code], '(none)') AS [Program code]
to provide default values for absent data */
FROM cte
LEFT JOIN #ENROLLEES tmp ON cte.dt = tmp.pe_start_date
;