How to check the execution state of “scheduledJob”? - scheduled-tasks

How can I check whether a job created with scheduleJob has executed completely or not?

1. See how the scheduled jobs are defined
getScheduledJobs()
userId  jobId     jobDesc      startDate   endDate     frequency  scheduleTime   days
------  --------  -----------  ----------  ----------  ---------  -------------  ----
admin   daily000  Daily Job 1  2021.09.27  2021.09.27  'D'        12:03m 13:08m
admin   daily     Daily Job 1  2021.09.27  2021.09.27  'D'        12:53m 13:58m
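For reference, a job like daily000 could have been created with scheduleJob. A hypothetical sketch (jobDemo is an assumed function name, not from the original post):

// hypothetical job body; any function or partial application works here
def jobDemo() {
    print("Daily Job 1 ran")
}
// arguments: jobId, jobDesc, jobFunc, scheduleTime, startDate, endDate, frequency
scheduleJob("daily000", "Daily Job 1", jobDemo, [12:03m, 13:08m], 2021.09.27, 2021.09.27, 'D')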
2. View the completed scheduled jobs
select * from getRecentJobs() where jobDesc like "%Daily%" order by startTime desc
node       userID  jobId     jobDesc      priority  parallelism  receivedTime             startTime                endTime                  errorMsg
---------  ------  --------  -----------  --------  -----------  -----------------------  -----------------------  -----------------------  --------
local8848  admin   daily000  Daily Job 1  8         64           2021.09.27T12:03:06.784  2021.09.27T12:03:06.785  2021.09.27T12:03:06.785
3. Use getJobMessage and getJobReturn to view the running log and the return value of each job
The running log is saved in the jobId.msg file, and the return value of the scheduled job is saved in the jobId.object file. These files are saved in the directory /batchJobs.
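For example, with the jobId from the getRecentJobs() output above (a minimal sketch):

// fetch the console output written while the job ran
getJobMessage("daily000")
// fetch the value returned by the job function, if any
getJobReturn("daily000")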

Related

Kusto - Alert resolved for specific ADF pipeline

Long time watcher, first time poster, so please be kind to this poor noob....
We're marching forth into Azure and I'm working on the monitoring and alerting side (because no one else is so far). I have successfully created a number of alerts using KQL with Log Analytics, but I'm having issues with an ADF query.
I need something that will alert as Resolved ONLY when the original failed pipeline subsequently shows as Successful. Right now, we're getting a Resolved alert when any other pipeline is successful. Help me Obi Wan Kenobi - you're my only hope.....
Current query is:
let activities = ADFActivityRun
| where Status == 'Failed' and ActivityType !in ('IfCondition', 'ExecutePipeline', 'ForEach')
| project
ActivityName,
ActivityType,
Input,
Output,
ErrorMessage,
Error,
PipelineRunId,
ActivityRunId,
_ResourceId;
ADFPipelineRun
| project RunId,PipelineName, Status, Start, End
| summarize max(Start) by PipelineName
| join kind = inner ADFPipelineRun on $left.PipelineName == $right.PipelineName and $left.max_Start == $right.Start
| project RunId
, TimeGenerated
, ResourceName=split(_ResourceId, '/')[-1]
, PipelineName
, Status
, Start
, End
,Parameters
,Predecessors
| where Status == 'Failed'
| join kind = inner activities on $left.RunId == $right.PipelineRunId
| project TimeGenerated
, ResourceName=split(_ResourceId, '/')[-1]
, PipelineName
, ActivityName
, ActivityType
, Status
, Start
, End
,Parameters
,Error
,PipelineRunId
,ActivityRunId
,Predecessors
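One possible direction, sketched here as an untested starting point rather than a verified fix: key the resolution on the same PipelineName that failed, so that only a newer successful run of that pipeline can resolve the alert.

// find each pipeline's most recent failure, then match it against
// later successful runs of the same pipeline only
let failed = ADFPipelineRun
| where Status == 'Failed'
| summarize LastFailed = max(Start) by PipelineName;
ADFPipelineRun
| where Status == 'Succeeded'
| join kind = inner (failed) on PipelineName
| where Start > LastFailed
| project PipelineName, Status, Start, End, RunId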

Get postgres query log statement and duration as one record

I have log_min_duration_statement=0 in config.
When I check the log file, the SQL statement and its duration are saved in different rows.
(Not sure what I have wrong, but statement and duration are not saved together, as this answer suggests they should be.)
As I understand it, the session_line_num of a duration record always equals the session_line_num of the corresponding statement plus one, within the same session of course.
Is this correct? Is the query below reliable enough to get the statement and its duration in one row?
(csv log imported into postgres_log table):
WITH sql_cte AS (
    SELECT session_id, session_line_num, message AS sql_statement
    FROM postgres_log
    WHERE message LIKE 'statement%'
),
durat_cte AS (
    SELECT session_id, session_line_num, message AS duration
    FROM postgres_log
    WHERE message LIKE 'duration%'
)
SELECT
    t1.session_id,
    t1.session_line_num,
    t1.sql_statement,
    t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
    ON t1.session_id = t2.session_id
    AND t1.session_line_num + 1 = t2.session_line_num;
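If the adjacency assumption holds, a window function avoids the self-join entirely. A sketch under the same postgres_log layout:

-- pair each line with the very next line in the same session, then keep the
-- pairs where a 'statement' line is immediately followed by a 'duration' line
SELECT session_id, session_line_num, sql_statement, duration
FROM (
    SELECT session_id,
           session_line_num,
           message AS sql_statement,
           lead(message) OVER (PARTITION BY session_id
                               ORDER BY session_line_num) AS duration
    FROM postgres_log
) t
WHERE sql_statement LIKE 'statement%'
  AND duration LIKE 'duration%';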

Copy snowflake task results into stage and download to csv

Basically I need to automate all of the below in a Snowflake TASK:
Create/replace a csv file format and stage in Snowflake
Run the task query (which runs every few days to pull some stats)
Unload the query results into the stage csv each time the task runs
Download the contents of the stage csv to a local file on my machine
What I can't get right is the COPY INTO stage step: how do I unload the results of the task into the stage each time it runs?
I don't know what to put in the FROM clause - TITANLOADSUCCESSVSFAIL is not recognized, but this is the name of the TASK:
COPY INTO @TitanLoadStage/unload/ FROM TITANLOADSUCCESSVSFAIL FILE_FORMAT = TitanLoadSevenDays
First time using a stage and downloading locally with SF, so I'd appreciate any advice on how to get this up and running!
Thanks,
Nick
Full Code:
-- create a csv file format
CREATE OR REPLACE FILE FORMAT TitanLoadSevenDays
    TYPE = 'CSV'
    FIELD_DELIMITER = '|';

-- create a snowflake stage using the csv file format
CREATE OR REPLACE STAGE TitanLoadStage
    FILE_FORMAT = TitanLoadSevenDays;

CREATE TASK IF NOT EXISTS TitanLoadSuccessVsFail
    WAREHOUSE = ITSM_LWH
    SCHEDULE = 'USING CRON 1 * * * * Australia/Canberra' -- every minute for testing purposes
    COMMENT = 'Last 7 days of Titan game success vs fail load %'
AS
WITH SUCCESSCTE AS (
    SELECT CLIENTNAME
        , COUNT(EVENTTYPE) AS SuccessLoad -- count success load events for that game
    FROM vw_fact_gameload60
    WHERE EVENTTYPE = 103 -- success load events
        AND USERTYPE = 1 -- real users
        AND APPID = 2 -- Titan games
        AND EVENTARRIVALDATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE)) -- only looking at the last week
    GROUP BY CLIENTNAME
),
FAILCTE AS ( -- same as above but for failed loads
    SELECT CLIENTNAME
        , COUNT(EVENTTYPE) AS FailedLoads -- count failed load events for that game
    FROM vw_fact_gameload60
    WHERE EVENTTYPE = 106 -- failed load events
        AND USERTYPE = 1 -- real users
        AND APPID = 2 -- Titan games
        AND EVENTARRIVALDATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE)) -- last 7 days
        --AND FACTEVENTARRIVALDATE BETWEEN DATEADD(DAY, -7, GETDATE()) AND GETDATE() -- last 7 days
    GROUP BY CLIENTNAME
)
SELECT COALESCE(s.CLIENTNAME, f.CLIENTNAME) AS ClientName
    , ZEROIFNULL(s.SuccessLoad) + ZEROIFNULL(f.FailedLoads) AS TotalLoads -- sum the success and failed loads found for 103, 106 events only, calculated in the CTEs
    , ZEROIFNULL(s.SuccessLoad) AS Cnt_SuccessLoad -- count from the success CTE
    , ZEROIFNULL(f.FailedLoads) AS Cnt_FailedLoads -- count from the fail CTE
    , CONCAT(ZEROIFNULL(ROUND(s.SuccessLoad * 100.0 / TotalLoads, 2)), '%') AS Pct_Success -- percentage of successful loads against the total
    , CONCAT(ZEROIFNULL(ROUND(f.FailedLoads * 100.0 / TotalLoads, 2)), '%') AS Pct_Fail -- percentage of failed loads against the total
FROM SUCCESSCTE s
FULL OUTER JOIN FAILCTE f -- outer join on the fail CTE by game name, outer required because some Titan games' success or fail events are NULL
    ON s.CLIENTNAME = f.CLIENTNAME
ORDER BY CLIENTNAME ASC;

-- copy the results from the query to the snowflake stage created above
COPY INTO @TitanLoadStage/unload/ FROM TITANLOADSUCCESSVSFAIL FILE_FORMAT = TitanLoadSevenDays;

-- export the stage data to a csv located in the common folder
GET @TitanLoadStage/unload/data_0_0_0.csv.gz file:\\itsm\group\ITS%20Management\Common\All%20Staff\SMD\Games\Snowflake%20and%20GamesDNA\Snowflake\SnowflakeCSV\TitanLoad.csv;

-- start the task
ALTER TASK IF EXISTS TitanLoadSuccessVsFail RESUME;
If you want to get the results of a query run through a task, you need to materialize the results of said query to a table.
What you have now:
CREATE TASK mytask_minute
    WAREHOUSE = mywh
    SCHEDULE = '5 MINUTE'
AS
SELECT 1 x;

COPY INTO @TitanLoadStage/unload/
FROM mytask_minute;
(mytask_minute is not a table, so you can't select from it)
What you should do instead:
CREATE TASK mytask_minute
    WAREHOUSE = mywh
    SCHEDULE = '5 MINUTE'
AS
CREATE OR REPLACE TABLE task_results_table
AS
SELECT 1 x;

COPY INTO @TitanLoadStage/unload/
FROM (SELECT * FROM task_results_table);
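Putting the pieces together for the original example, a hedged sketch (untested; it assumes the stage and file format defined above, and the local path C:/temp/ is a placeholder). Note that GET has to run from SnowSQL or a client driver, since it is not supported in the Snowflake web UI:

-- unload the materialized results into the stage, overwriting the previous file
COPY INTO @TitanLoadStage/unload/
FROM (SELECT * FROM task_results_table)
FILE_FORMAT = (FORMAT_NAME = TitanLoadSevenDays)
OVERWRITE = TRUE;

-- then, from SnowSQL, pull the unloaded file(s) down to a local directory
GET @TitanLoadStage/unload/ file://C:/temp/ PATTERN = '.*csv.gz';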

postgresql session variables get reset when a not condition is used

I am using Postgres session variables to handle pagination in my query.
I have a table message (id, uid, body, user_id, posted_date).
I select the messages belonging to a given user and order them by posted_date. Now I have to return the messages that come after a given uid in that list. For this I use session variables:
select set_config('paging.count', '0', false);

SELECT *
FROM (
    SELECT
        m2.uid, m2.id,
        case when uid = 'XYZ' THEN
            set_config('paging.count', '1', false)
        WHEN current_setting('paging.count') = '1' THEN
            '1'
        ELSE
            '0'
        END as offset
    FROM (
        SELECT m1.*, mu.*
        FROM schema_1.message m1
        WHERE m1.user_id = 1
          AND m1.id IN (4078, 4076, 4080, 4031, 4055, 4056, 4057, 3596, 4193, 4467, 4389, 4285, 4338)
        ORDER BY posted_date
    ) m2
) m
WHERE m.offset = '1' and m.uid <> 'XYZ'
Here I initialize the session variable to 1 when the given uid is reached in the query; all the messages after that uid then get offset = 1, so I can return everything after the given message by adding a condition on offset. This query works fine only when I don't use the last NOT condition. As soon as I apply the NOT condition, my session variable seems to get reset to the value I initialized at the start of the query.
I just can't figure out what I am doing wrong. As far as I know, this should work fine.
select *
from (
    select
        'result'
        , case when 'q' = 'q' THEN
            set_config('paging.count', '1', false)
        WHEN current_setting('paging.count') = '1' THEN
            '1'
        ELSE
            '0'
        END as offset
) SUBQ
where "offset" = '1' and 'q' <> 'q'
will give you no rows, and that is the expected result: the planner is free to flatten the subquery and evaluate the WHERE predicates before or alongside the CASE expression, so the set_config side effect is not guaranteed to fire in row order, and you cannot rely on it for pagination.
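A side-effect-free alternative (a sketch, assuming the message table from the question): number the ordered rows once with a window function, then keep everything after the anchor uid.

-- number the user's messages in posted_date order, then return only
-- the rows that come after the row whose uid is the anchor 'XYZ'
WITH ordered AS (
    SELECT m1.*,
           row_number() OVER (ORDER BY posted_date) AS rn
    FROM schema_1.message m1
    WHERE m1.user_id = 1
)
SELECT *
FROM ordered
WHERE rn > (SELECT rn FROM ordered WHERE uid = 'XYZ');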

Selecting rows only if meeting criteria

I am new to PostgreSQL and to database queries in general.
I have a list of user_id with university courses taken, date started and finished.
Some users have multiple entries and sometimes the start date or finish date (or both) are missing.
I need to retrieve the longest course taken by a user or, if start date is missing, the latest.
If multiple choices are still available, then pick one at random among them.
For example
on user 2 (below) I want to get only "Economics and Politics" because it has the latest date;
on user 6, only "Electrical and Electronics Engineering" because it is the longest course.
The query I did doesn't work (and I think I am off-track):
SELECT Q.user_id, min(Q.started_at) AS Started_on, max(Q.ended_at) AS Completed_on,
       Q.field_of_study
FROM (
    SELECT DISTINCT user_id, started_at, ended_at, field_of_study
    FROM educations
) AS Q
GROUP BY Q.user_id, Q.field_of_study
ORDER BY Q.user_id
as the result is:
User_id  Started_on    Completed_on  Field_of_studies
2        "2001-01-01"  ""            "International Economics"
2        ""            "2002-01-01"  "Economics and Politics"
3        "1992-01-01"  "1999-01-01"  "Economics, Management of ..."
5        "2012-01-01"  "2016-01-01"  ""
6        "2005-01-01"  "2009-01-01"  "Electrical and Electronics Engineering"
6        "2011-01-01"  "2012-01-01"  "Finance, General"
6        ""            ""            ""
6        "2010-01-01"  "2012-01-01"  "Financial Mathematics"
I think this query should do what you need. It relies on calculating the difference in days between ended_at and started_at, and uses 0001-01-01 when started_at is null (making it a really long interval):
select
    educations.user_id,
    max(educations.started_at) started_at,
    max(educations.ended_at) ended_at,
    max(educations.field_of_study) field_of_study
from educations
join (
    select
        user_id,
        max(ended_at::date - coalesce(started_at, '0001-01-01')::date) max_length
    from educations
    where (started_at is not null or ended_at is not null)
    group by user_id
) x on educations.user_id = x.user_id
   and ended_at::date - coalesce(started_at, '0001-01-01')::date = x.max_length
group by educations.user_id;
Sample SQL Fiddle
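An alternative sketch with DISTINCT ON, under the same assumption that a missing started_at should count as a very long course; ties are broken arbitrarily, which matches the "pick random among the multiple options" requirement:

-- keep one row per user: the one with the greatest course length,
-- treating a NULL start date as 0001-01-01 so it sorts as the longest
SELECT DISTINCT ON (user_id)
       user_id, started_at, ended_at, field_of_study
FROM educations
WHERE started_at IS NOT NULL OR ended_at IS NOT NULL
ORDER BY user_id,
         ended_at::date - coalesce(started_at, '0001-01-01')::date DESC NULLS LAST;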