All,
I have a question. The query below runs fine in a SQL Server database. What needs to be done to run the same one in DB2?
SELECT
EMPID
,TOTALSECONDS /3600 AS Hours
,((TOTALSECONDS % 3600) /60) AS Minutes
,(TOTALSECONDS % 60) AS Seconds
,STATUS
,[DATE]
FROM
(SELECT
SUM(DATEDIFF(ss,STARTDATETIME,ENDDATETIME)) AS TOTALSECONDS,
EMPID,STATUS,
CONVERT(VARCHAR,STARTDATETIME,10) AS [DATE]
FROM <TABLE>
WHERE CONVERT(DATE,STARTDATETIME) = 'xxxx-xx-xx'
GROUP BY EMPID,STATUS,CONVERT(VARCHAR,STARTDATETIME,10)) AS SUMMARY
ORDER BY STATUS,DATE
Thanks.
Your issue is that DB2 doesn't have the DATEDIFF or CONVERT functions. There are similar ones called TIMESTAMPDIFF and VARCHAR_FORMAT, respectively.
I think this query will be what you want, but I'm not 100% sure if I've converted the SQL Server formats to the DB2 formats correctly. :)
SELECT
EMPID
,TOTALSECONDS /3600 AS Hours
,(MOD(TOTALSECONDS, 3600) /60) AS Minutes
,MOD(TOTALSECONDS, 60) AS Seconds
,STATUS
,DATE
FROM (
SELECT
SUM(TIMESTAMPDIFF(2, CHAR(ENDDATETIME-STARTDATETIME))) AS TOTALSECONDS
,EMPID
,STATUS
,VARCHAR_FORMAT(STARTDATETIME,'mm-dd-yy') AS DATE
FROM <TABLE>
WHERE VARCHAR_FORMAT(STARTDATETIME,'yyyy-mm-dd') = 'xxxx-xx-xx'
GROUP BY
EMPID
,STATUS
,VARCHAR_FORMAT(STARTDATETIME,'mm-dd-yy')
) A
ORDER BY
STATUS
,DATE
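One caveat, since the formats were the shaky part: DB2's TIMESTAMPDIFF returns an estimated number of intervals (it assumes 30-day months and 365-day years), which is harmless within a single day but can drift across longer spans. If exact seconds matter, here is a sketch of a replacement for the inner SUM, using DB2's DAYS and MIDNIGHT_SECONDS functions:
SUM( (DAYS(ENDDATETIME) - DAYS(STARTDATETIME)) * 86400
   + MIDNIGHT_SECONDS(ENDDATETIME) - MIDNIGHT_SECONDS(STARTDATETIME) ) AS TOTALSECONDS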
I have been struggling to get the right data using CHECKSUM for the last 15+ days, and now I am trying to find another way.
I am trying to detect any row whose punch card punch_start HOUR changed from the previous day's file to today's file, due to an unexpected time-zone change of the hour (not the minute).
Please see the sample data at the bottom.
Dataset1 (Yesterday's file):
checksum person_id applied_date punch_start punch_end punch_hours
-1552866149 650067 2022-09-04 2022-09-04T20:11:00Z 2022-09-04T22:52:00Z 2.68333333333333
-1367087212 650067 2022-09-04 2022-09-04T22:52:00Z 2022-09-04T23:26:00Z 0.566666666666667
Dataset2 (Today's file):
checksum person_id applied_date punch_start punch_end punch_hours
-1564056421 650067 2022-09-04 2022-09-04T20:11:00Z 2022-09-04T22:52:00Z 2.683333333
-1470176798 650067 2022-09-04 2022-09-04T20:52:00Z 2022-09-04T23:26:00Z 0.566666667
So, what I am trying to do is: if there is any change of HOUR (as in this example) on punch_start only, notify about (or select) those rows.
In this case, the second entry's punch_start changed from 22:52:00Z to 20:52:00Z.
Checksum would not work, because a change like 2.683333333 to 2.68333 (with no change to punch_start) still produces a different checksum value.
The challenge is finding a unique ID to match the corresponding entries of the two datasets, and it has been a struggle for me.
I have been using something like the following to create a unique ID for each entry:
,concat(
[person_id],
[applied_date],
[punch_hours],
datepart(minute, convert(datetime, cast([punch_start] as datetime), 112))
)
But it still gives me a lot of duplicates, because if somebody works from
9:00 AM -- 12:00 PM &
1:00 PM -- 5:00 PM on the same day,
both shifts share the same [applied_date], the same [punch_hours], and the same [min], so the concatenated key collides.
How do we tackle this?
Have you looked at using EXCEPT?
-- Prep data
select *
INTO #yesterday
from (values
(-1552866149 ,650067 , '2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime) , 2.68333333333333 ),
(-1367087212 ,650067 , '2022-09-04', cast('2022-09-04T22:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime) , 0.566666666666667)
)t1(checksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
select *
INTO #today
from (values
(-1564056421 , 650067 ,'2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime), 2.683333333),
(-1470176798 , 650067 ,'2022-09-04', cast('2022-09-04T20:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime), 0.566666667)
)t2(checksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
-- output
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours, -- hope this is acceptable
datepart(HH, punch_start) as punch_start_hour, -- only looking for changes to HOUR
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless -- mask the hour with XX so the rest of the datetime can still be compared
from #yesterday
except
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours,
datepart(HH, punch_start) as punch_start_hour,
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless
from #today
Wrap the 'output' query in this if you want to get the original values back (minus the checksum):
SELECT
person_id
,applied_date
,Cast(REPLACE(punch_start_hourless, 'XX', Cast(punch_start_hour as varchar(2))) as Datetime) as punch_start -- REPLACE expects string arguments, so cast the int hour first
,punch_end
,punch_hours
FROM (
-- insert query from above
) sub
You can use a FULL OUTER JOIN to identify rows that exist in one table but not in the other:
select *
from Dataset1 d1
full outer join Dataset2 d2 on d1.person_id = d2.person_id
and d1.applied_date = d2.applied_date
and d1.punch_start = d2.punch_start
where d1.person_id is null or d2.person_id is null -- keep only rows with no punch_start match on the other side
I am moving my data from SSMS (SQL Server Management Studio) to Databricks. The query syntax in Databricks differs from the SSMS query, and I am not familiar with Databricks. I am trying to run the same query in Databricks, but it is showing me an error. How should I correct the syntax? The error is where the conversion to the PST time zone happens. Can anyone help me with this?
The SQL query is:
select
case
when category_name = 'Hardware' then 'Hardware'
when category_name = 'Services' then 'Services'
when category_name = 'Software' then 'Software'
when category_name = 'Subscription' then 'Cloud'
end as category_name,
sum(item_price * line_item_quantity) as order_amount,
CUSTOMER_ID
from
curated_delta.order_details
where
parentkit_id = 0
and category_name in ('Hardware', 'Software', 'services', 'Subscription')
and category_name is not null
and order_date >= cast(cast(dateadd(dd, -365, convert(datetime, sysdatetimeoffset() AT TIME ZONE 'pacific standard time')) as date) as datetime)
group by
category_name, CUSTOMER_ID
order by
1
The error I get is:
Error in SQL statement: ParseException:
no viable alternative at input 'cast(cast(dateadd(dd,-365,convert(datetime,sysdatetimeoffset() AT'(line 6, pos 79)
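Spark SQL has no SYSDATETIMEOFFSET(), no CONVERT(datetime, ...), and no AT TIME ZONE clause, which is why the parser stops at that predicate. A sketch of an equivalent filter (untested against your data), assuming order_date is a timestamp column and that 'Pacific Standard Time' corresponds to the America/Los_Angeles region:
and order_date >= cast(
        date_sub(  -- step back 365 days from "today" in Pacific time
            to_date(from_utc_timestamp(current_timestamp(), 'America/Los_Angeles')),
            365
        ) as timestamp
    )
Note that from_utc_timestamp() treats its input as UTC and shifts it into the named zone; if your cluster's session time zone is not UTC, the shift may need adjusting.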
I have log_min_duration_statement = 0 in the config.
When I check the log file, the SQL statement and its duration are saved in different rows.
(Not sure what I have wrong, but statement and duration are not saved together the way this answer suggests.)
As I understand it, the session_line_num of a duration record always equals the session_line_num of the relevant statement + 1, within the same session of course.
Is this correct? Is the query below reliable for getting each statement together with its duration in one row?
(The CSV log is imported into the postgres_log table:)
WITH
sql_cte AS(
SELECT session_id, session_line_num, message AS sql_statement
FROM postgres_log
WHERE
message LIKE 'statement%'
)
,durat_cte AS (
SELECT session_id, session_line_num, message AS duration
FROM postgres_log
WHERE
message LIKE 'duration%'
)
SELECT
t1.session_id,
t1.session_line_num,
t1.sql_statement,
t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
ON t1.session_id = t2.session_id AND t1.session_line_num + 1 = t2.session_line_num;
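One caveat on the session_line_num + 1 assumption: it holds for the simple query protocol, but with the extended protocol (prepared statements, which many drivers use) Postgres logs separate duration messages for the parse, bind, and execute phases, so the duration row is not always exactly one line after the statement. If the goal is simply one row per statement, the pairing can be avoided entirely, assuming you do not also need log_statement output; a minimal config sketch:
# postgresql.conf -- log duration and statement text in one message
log_statement = 'none'              # if log_statement logs the text, the duration line omits it
log_min_duration_statement = 0      # log every statement together with its duration
With that configuration each entry arrives as a single message, e.g. "duration: 0.525 ms statement: SELECT ...", so no join is needed.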
Basically I need to automate all of the below in a Snowflake TASK:
Create/replace a csv file format and stage in Snowflake
Run task query (which runs every few days to pulls some stats)
Unload the query results each time it runs into the Stage csv
Download the contents of the stage csv to a local file on my machine
What I can't get right is the COPY INTO stage step: how do I unload the results of the task into the stage each time it runs?
I don't know what to put in the FROM clause - TITANLOADSUCCESSVSFAIL is not recognized, but this is the name of the TASK.
COPY INTO @TitanLoadStage/unload/ FROM TITANLOADSUCCESSVSFAIL FILE_FORMAT = TitanLoadSevenDays
This is my first time using a stage and downloading locally with SF, so I'd appreciate any advice on how to get this up and running!
Thanks,
Nick
Full Code:
-- create a csv file format
CREATE OR REPLACE FILE FORMAT TitanLoadSevenDays
type = 'CSV'
field_delimiter = '|';
--create a snowflake staging table using the csv
CREATE OR REPLACE STAGE TitanLoadStage
file_format = TitanLoadSevenDays;
CREATE TASK IF NOT EXISTS TitanLoadSuccessVsFail
WAREHOUSE = ITSM_LWH
SCHEDULE = 'USING CRON * * * * * Australia/Canberra' --every minute for testing purposes
COMMENT = 'Last 7 days of Titan game success vs fail load %'
AS
WITH SUCCESSCTE AS (
SELECT CLIENTNAME
, COUNT(EVENTTYPE) AS SuccessLoad --count success load events for that game
FROM vw_fact_gameload60
WHERE EVENTTYPE = 103 --success load events
AND USERTYPE = 1 --real users
AND APPID = 2 --titan games
AND EVENTARRIVALDATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE)) --only looking at the last week
GROUP BY CLIENTNAME
),
FAILCTE AS ( --same as above but for failed loads
SELECT CLIENTNAME
, COUNT(EVENTTYPE) AS FailedLoads -- count failed load events for that game
FROM vw_fact_gameload60
WHERE EVENTTYPE = 106 -- failed load events
AND USERTYPE = 1 -- real users
AND APPID = 2 -- Titan games
AND EVENTARRIVALDATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE)) -- last 7 days
--AND FACTEVENTARRIVALDATE BETWEEN DATEADD(DAY, -7, GETDATE())AND GETDATE() -- last 7 days
GROUP BY CLIENTNAME
)
SELECT COALESCE(s.CLIENTNAME, f.CLIENTNAME) AS ClientName
, ZEROIFNULL(s.SuccessLoad) + ZEROIFNULL(f.FailedLoads) AS TotalLoads --sum the success and failed loads found for 103, 106 events only, calculated in CTEs
, ZEROIFNULL(s.SuccessLoad) AS Cnt_SuccessLoad --count from success cte
, ZEROIFNULL(f.FailedLoads) AS Cnt_FailedLoads --count from fail cte
, CONCAT(ZEROIFNULL(ROUND(s.SuccessLoad * 100.0 / TotalLoads,2)) , '%') As Pct_Success --percentage of SuccessLoads against total
, CONCAT(ZEROIFNULL(ROUND(f.FailedLoads * 100.0 / TotalLoads,2)), '%') AS Pct_Fail -- percentage of FailedLoads against total
FROM SUCCESSCTE s
FULL OUTER JOIN FAILCTE f -- outer join in the fail CTE by game name, outer required because some titan games sucess or fail events are NULL
ON s.CLIENTNAME = f.Clientname
ORDER BY CLIENTNAME ASC
--copy the results from the query to the snowflake staging table created above
COPY INTO @TitanLoadStage/unload/ FROM TITANLOADSUCCESSVSFAIL FILE_FORMAT = TitanLoadSevenDays
-- export the stage data to csv located in common folder
GET @TitanLoadStage/unload/data_0_0_0.csv.gz file:\\itsm\group\ITS%20Management\Common\All%20Staff\SMD\Games\Snowflake%20and%20GamesDNA\Snowflake\SnowflakeCSV\TitanLoad.csv
-- start the task
ALTER TASK IF EXISTS TitanLoadSuccessVsFail RESUME
If you want to get the results of a query run through a task, you need to materialize the results of said query into a table.
What you have now:
CREATE TASK mytask_minute
WAREHOUSE = mywh
SCHEDULE = '5 MINUTE'
AS
SELECT 1 x;
COPY INTO @TitanLoadStage/unload/
FROM mytask_minute;
(mytask_minute is not a table, so you can't select from it)
What you should do instead:
CREATE TASK mytask_minute
WAREHOUSE = mywh
SCHEDULE = '5 MINUTE'
AS
CREATE OR REPLACE TABLE task_results_table
AS
SELECT 1 x;
COPY INTO @TitanLoadStage/unload/
FROM (SELECT * FROM task_results_table);
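If steps 2 and 3 should stay coupled, one sketch (reusing the hypothetical names above) is to make the COPY a child task: child tasks take AFTER instead of SCHEDULE and run whenever the parent completes, and every task in the tree must be resumed.
CREATE TASK unload_results
  WAREHOUSE = mywh
  AFTER mytask_minute        -- runs whenever the parent task finishes
AS
COPY INTO @TitanLoadStage/unload/
FROM task_results_table
FILE_FORMAT = (FORMAT_NAME = 'TitanLoadSevenDays');
ALTER TASK unload_results RESUME;
ALTER TASK mytask_minute RESUME;
Step 4 is a different story: GET writes to the client's local file system, so it cannot run inside a task at all; it has to be issued from a client such as SnowSQL.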
I need a query to extract only the first instance and the last instance between date parameters.
I have a table recording financial information with a financialyearenddate field, linked to the Company table via companyID. Each company is also linked to the programme table and can have multiple programmes. I have a report that pulls the financials for each company on a certain programme, which I have adjusted to pull only the first and last instance (using MIN & MAX); however, I need the first instance after a certain date parameter and the last instance before a certain date parameter.
Example: Company ABloggs has financials for 1999, 2000, 2001, 2004, 2006, 2007 and 2009, but the programme ran from 2001 to 2007, so I only want the first and last financial records between those years, i.e. the 2001 and 2007 records. Any help appreciated.
At the moment I am using 2 queries, as I needed the data in a hurry, but I need it in 1 query, limited to financial year end dates between the parameters and to companies with a minimum of 2 GVA records.
Query1:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MAX(ccx_financialyearenddate) AS LatestDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS min_1
INNER JOIN Filteredccx_gva AS gva
ON min_1.ccx_companyname = gva.ccx_companyname AND
min_1.LatestDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Query2:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MIN(ccx_financialyearenddate) AS FirstDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS MAX_1
INNER JOIN Filteredccx_gva AS gva
ON MAX_1.ccx_companyname = gva.ccx_companyname AND
MAX_1.FirstDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Can't you just add a WHERE clause using the first and last date parameters? Something like this:
SELECT <companyId>, MIN(<date>), MAX(<date>)
FROM <table>
WHERE <date> BETWEEN @firstDate AND @lastDate
GROUP BY <companyId>
declare @programme table (ccx_companyname varchar(max), start_year int, end_year int);
insert @programme values
('ABloggs', 2001, 2007);
declare @companies table (ccx_companyname varchar(max), ccx_financialyearenddate int);
insert @companies values
('ABloggs', 1999)
,('ABloggs', 2000)
,('ABloggs', 2001)
,('ABloggs', 2004)
,('ABloggs', 2006)
,('ABloggs', 2007)
,('ABloggs', 2009);
select c.ccx_companyname, min(ccx_financialyearenddate), max(ccx_financialyearenddate)
from @companies c
join @programme p on c.ccx_companyname = p.ccx_companyname
where c.ccx_financialyearenddate >= p.start_year and c.ccx_financialyearenddate <= p.end_year
group by c.ccx_companyname
having count(*) > 1;
You can combine your two original queries into a single query by including the MIN and MAX aggregates in the same GROUP BY query of the derived table. Including COUNT(*) with HAVING COUNT(*) > 1 ensures a company has at least 2 dates. The query should look like:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(SELECT
ccx_companyname,
ccx_status,
MIN(ccx_financialyearenddate) AS FirstDate,
MAX(ccx_financialyearenddate) AS LastDate,
COUNT(*) AS NumDates
FROM Filteredccx_gva AS Filteredccx_gva_1
WHERE (ccx_status = ACTUAL)
GROUP BY ccx_companyname, ccx_status
HAVING COUNT(*) > 1
) AS MinMax
INNER JOIN Filteredccx_gva AS gva
ON MinMax.ccx_companyname = gva.ccx_companyname AND
(MinMax.FirstDate = gva.ccx_financialyearenddate OR
MinMax.LastDate = gva.ccx_financialyearenddate)
WHERE (gva.ccx_status = MinMax.ccx_status)
ORDER BY gva.ccx_companyname, gva.ccx_financialyearenddate
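For comparison, the same result can be had with window functions instead of a self-join. A sketch, assuming the @firstDate/@lastDate parameters from the earlier answer and the same ACTUAL constant: rank each company's rows from both ends inside the date window, keep the first and last, and drop companies with fewer than 2 rows.
SELECT ranked.*
FROM (
    SELECT
        gva.*,
        ROW_NUMBER() OVER (PARTITION BY ccx_companyname
                           ORDER BY ccx_financialyearenddate ASC)  AS rn_first,
        ROW_NUMBER() OVER (PARTITION BY ccx_companyname
                           ORDER BY ccx_financialyearenddate DESC) AS rn_last,
        COUNT(*) OVER (PARTITION BY ccx_companyname) AS cnt
    FROM Filteredccx_gva AS gva
    WHERE gva.ccx_status = ACTUAL
      AND gva.ccx_financialyearenddate BETWEEN @firstDate AND @lastDate
) AS ranked
WHERE (ranked.rn_first = 1 OR ranked.rn_last = 1)
  AND ranked.cnt > 1
ORDER BY ranked.ccx_companyname, ranked.ccx_financialyearenddate;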