Azure Stream Analytics: How to use two Azure Machine Learning Functions - tsql

I am using two Azure Machine Learning functions in Stream Analytics, but it seems only one can be used per query step. How can I split it into two steps?
The streaming job failed: Stream Analytics job has validation errors: Multiple AML web service functions used in subquery. A query step can contain no more than one AML function. Please split the step into multiple steps.
WITH subquery as (
SELECT
id as id,
deviceId as deviceId,
username as username,
try_cast(localtime as datetime) as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
rms,fmed,fpeak,sample_entropy,
EventProcessedUtcTime as EventProcessedUtcTime,
Distress(AC,FM,UC,DL,DS,DP,1,LB,ASTV,MSTV,ALTV,MLTV,
Width,Min,Max,Nmax,Nzeros,Mode,Mean,Median,Variance,
Tendency,1,1,1,1,1,1,1,1,1,1,1,1) as resultFHR,
Labour("",1,1,1,"",rms,fmed,fpeak,sample_entropy,"","") as resultUC
FROM
iot
)
SELECT
id as id,
deviceId as deviceId,
username as username,
localtime as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
EventProcessedUtcTime as EventProcessedUtcTime,
resultFHR.[classes] as distress,
resultFHR.[probabilities] as distressProbability,
resultUC.[classes] as labour,
resultUC.[probabilities] as labourProbability
INTO
sql
FROM
subquery
SELECT
*
INTO
c2d
FROM
subquery

You just need to split it up into two separate steps, both defined in a single WITH clause so the final SELECT statements can reference them. E.g.:
WITH subquery1 as (
SELECT
id as id,
deviceId as deviceId,
username as username,
try_cast(localtime as datetime) as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
rms,fmed,fpeak,sample_entropy,
EventProcessedUtcTime as EventProcessedUtcTime,
Distress(AC,FM,UC,DL,DS,DP,1,LB,ASTV,MSTV,ALTV,MLTV,
Width,Min,Max,Nmax,Nzeros,Mode,Mean,Median,Variance,
Tendency,1,1,1,1,1,1,1,1,1,1,1,1) as resultFHR
FROM
iot
),
subquery2 as (
SELECT
id as id,
deviceId as deviceId,
username as username,
try_cast(localtime as datetime) as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
rms,fmed,fpeak,sample_entropy,
EventProcessedUtcTime as EventProcessedUtcTime,
Labour("",1,1,1,"",rms,fmed,fpeak,sample_entropy,"","") as resultUC
FROM
iot
)
SELECT
id as id, deviceId as deviceId, username as username,
localtime as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
EventProcessedUtcTime as EventProcessedUtcTime,
resultFHR.[classes] as distress,
resultFHR.[probabilities] as distressProbability
INTO
sql
FROM
subquery1

SELECT
id as id, deviceId as deviceId, username as username,
localtime as localtime,
AC as AC, FM as FM, UC as UC,
DL as DL, DS as DS, DP as DP,
LB as LB, ASTV as ASTV, MSTV as MSTV,
ALTV as ALTV, MLTV as MLTV, Width as Width,
Min as Min, Max as Max, Nmax as Nmax,
Nzeros as Nzeros, Mode as Mode, Mean as Mean,
Median as Median, Variance as Variance, Tendency as Tendency,
EventProcessedUtcTime as EventProcessedUtcTime,
resultUC.[classes] as labour,
resultUC.[probabilities] as labourProbability
INTO
sql
FROM
subquery2
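If you'd rather have both scores in a single output row, a third step can join the two steps above. This is only a sketch: it assumes id uniquely identifies an event and that both steps carry the same timestamp, and Stream Analytics requires a DATEDIFF bound on stream-to-stream joins:

SELECT
s1.id as id, s1.deviceId as deviceId, s1.username as username,
s1.localtime as localtime,
s1.resultFHR.[classes] as distress,
s1.resultFHR.[probabilities] as distressProbability,
s2.resultUC.[classes] as labour,
s2.resultUC.[probabilities] as labourProbability
INTO
sql
FROM subquery1 s1
JOIN subquery2 s2
ON s1.id = s2.id
AND DATEDIFF(second, s1, s2) BETWEEN 0 AND 0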

Related

Showing percentage for the group using MS SQL

I have the following query:
SELECT machine, start_date, sum(duration),
st.status_code, st.status_text
FROM er_table er
LEFT JOIN status_table st on er.status_code=st.status_code
where machine in ('mach1','mach2','mach3')
group by machine, start_date, st.status_code, st.status_text
order by machine, start_date, status_text
It produces the following result:
However, I need to add a percentage for the group of machines for a particular date. E.g. on 15 Sep, mach1 was idle for 20 secs, thus 20/(20+800) would give me about 2.4% idle time.
This is the result I need to get:
I saw a similar question in another post and modified my code as follows, but it didn't quite give the result I'm looking for:
SELECT machine, start_date, sum(duration),
SUM(duration) * 100.0 / SUM(SUM(duration)) OVER () AS Percentage,
st.status_code, st.status_text
FROM er_table er
LEFT JOIN status_table st on er.status_code=st.status_code
where machine in ('mach1','mach2','mach3')
group by machine, start_date, st.status_code, st.status_text
order by machine, start_date, status_text
Any help is very much appreciated. Thank you.
SELECT machine, start_date, sum(duration),
SUM(duration) * 100.0 / SUM(SUM(duration)) OVER (PARTITION BY machine, start_date) AS Percentage,
st.status_code, st.status_text
FROM er_table er
LEFT JOIN status_table st on er.status_code=st.status_code
where machine in ('mach1','mach2','mach3')
group by machine, start_date, st.status_code, st.status_text
order by machine, start_date, status_text
How about doing it like this? I used PARTITION BY inside SUM() OVER() so the denominator is the total per machine and date. Note that the aggregate has to be nested, SUM(SUM(duration)) OVER (...), because the query itself is grouped.
result:

machine | start_date | duration | percentage      | status_code | status_text
--------|------------|----------|-----------------|-------------|------------
mach1   | 2021-09-15 |       20 |  2.439024390243 | 1           | IDLE
mach1   | 2021-09-15 |      800 | 97.560975609756 | 1           | RUNNING
mach1   | 2021-09-16 |       40 |  4.255319148936 | 1           | IDLE
mach1   | 2021-09-16 |      900 | 95.744680851063 | 1           | RUNNING
mach2   | 2021-09-15 |      100 | 12.500000000000 | 1           | IDLE
mach2   | 2021-09-15 |      700 | 87.500000000000 | 1           | RUNNING
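For reference, a derived table makes it more obvious that the denominator is the per-machine, per-date total. This is just an equivalent sketch using the question's table and column names:

SELECT machine, start_date, total_duration,
total_duration * 100.0 / SUM(total_duration) OVER (PARTITION BY machine, start_date) AS percentage,
status_code, status_text
FROM (
-- aggregate first; the window function then runs over the grouped rows
SELECT machine, start_date, SUM(duration) AS total_duration,
st.status_code, st.status_text
FROM er_table er
LEFT JOIN status_table st ON er.status_code = st.status_code
WHERE machine IN ('mach1','mach2','mach3')
GROUP BY machine, start_date, st.status_code, st.status_text
) g
ORDER BY machine, start_date, status_text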

Rollup data hierarchy in postgresql 8.0?

I am trying to do a rollup in PostgreSQL 8.0. Recent versions of PostgreSQL have GROUP BY ROLLUP, but how can I implement a rollup in PostgreSQL 8.0? Does anyone have experience with this?
I tried the below
SELECT
EXTRACT (YEAR FROM rental_date) y,
EXTRACT (MONTH FROM rental_date) M,
EXTRACT (DAY FROM rental_date) d,
COUNT (rental_id)
FROM
rental
GROUP BY
ROLLUP (
EXTRACT (YEAR FROM rental_date),
EXTRACT (MONTH FROM rental_date),
EXTRACT (DAY FROM rental_date)
);
But getting the following error:
42883: function rollup( integer, integer, integer) does not exist
I followed http://www.postgresqltutorial.com/postgresql-rollup/
As GROUP BY ROLLUP was introduced with version 9.5, the query has no chance to work. But if you think about what it does it should be very easy in your case to come up with a version producing the same result.
Basically, you want to have:
an overall sum
a sum per year
and a sum per month
for the daily counts
I've written the above in a special way, so that it becomes clear what you actually need:
produce daily counts
generate sum per month from daily counts
generate sum per year from monthly sums or daily counts
generate total from yearly sums, monthly sums or daily counts
UNION ALL of the above in the order you want
As the default for GROUP BY ROLLUP is to write out the total first and then the individual grouping sets with NULLS LAST, the following query will do the same:
WITH
daily AS (
SELECT EXTRACT (YEAR FROM rental_date) y, EXTRACT (MONTH FROM rental_date) M, EXTRACT (DAY FROM rental_date) d, COUNT (rental_id) AS count
FROM rental
GROUP BY 1, 2, 3
),
monthly AS (
SELECT y, M, NULL::double precision d, SUM (count) AS count
FROM daily
GROUP BY 1, 2
),
yearly AS (
SELECT y, NULL::double precision M, NULL::double precision d, SUM (count) AS count
FROM monthly
GROUP BY 1
),
totals AS (
SELECT NULL::double precision y, NULL::double precision M, NULL::double precision d, SUM (count) AS count
FROM yearly
)
SELECT * FROM totals
UNION ALL
SELECT * FROM daily
UNION ALL
SELECT * FROM monthly
UNION ALL
SELECT * FROM yearly
;
The above works with PostgreSQL 8.4+. If you don't even have that version, we must fall back to the old-school UNION without re-using aggregation data:
SELECT NULL::double precision y, NULL::double precision M, NULL::double precision d, COUNT (rental_id) AS count
FROM rental
UNION ALL
SELECT EXTRACT (YEAR FROM rental_date) y, EXTRACT (MONTH FROM rental_date) M, EXTRACT (DAY FROM rental_date) d, COUNT (rental_id) AS count
FROM rental
GROUP BY 1, 2, 3
UNION ALL
SELECT EXTRACT (YEAR FROM rental_date) y, EXTRACT (MONTH FROM rental_date) M, NULL::double precision d, COUNT (rental_id) AS count
FROM rental
GROUP BY 1, 2
UNION ALL
SELECT EXTRACT (YEAR FROM rental_date) y, NULL::double precision M, NULL::double precision d, COUNT (rental_id) AS count
FROM rental
GROUP BY 1
;
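Note that without an ORDER BY neither variant guarantees the row order described above. Since PostgreSQL only allows output column names (not expressions) in a UNION's ORDER BY, one way is to wrap the union in a subquery. A sketch replacing the final SELECTs of the first variant, sorting on boolean IS NULL expressions because 8.0 predates the NULLS FIRST/LAST syntax (false sorts before true):

SELECT * FROM (
    SELECT * FROM totals
    UNION ALL
    SELECT * FROM daily
    UNION ALL
    SELECT * FROM monthly
    UNION ALL
    SELECT * FROM yearly
) AS r
ORDER BY (y IS NULL) DESC, -- grand total row first
         y,
         (M IS NULL), M,   -- within a year: monthly sums after the daily rows
         (d IS NULL), d;   -- within a month: daily rows before the monthly sum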

Calculate average weight gain

I have the following table:
ganado_id  created     weight
1          2018-12-24  285
2          2018-12-24  288
2          2018-10-13  241
1          2018-10-13  244
1          2018-08-11  202
I need to calculate the average weight gain for each ganado_id. Desired output:
ganado_id  avg_weight_gain
1          0.618
2          0.652
The average weight gain for ganado_id = 1 is calculated this way:
SELECT ((285 - 244)::NUMERIC / ('2018-12-24'::DATE - '2018-10-13'::DATE)::NUMERIC + (244 - 202)::NUMERIC / ('2018-10-13'::DATE - '2018-08-11'::DATE)::NUMERIC) / 2
The average weight gain for ganado_id = 2 is calculated this way:
SELECT (288 - 241)::NUMERIC / ('2018-12-24'::DATE - '2018-10-13'::DATE)::NUMERIC
In production, there can be 1 to 15 weight records (first table) for each ganado_id
Try using the lag window function to get both the weight and the date from the previous record. You can then divide the summed gains by the summed days to get the average:
with gains as (
select
ganado_id, weight, created,
weight - lag (weight) over (partition by ganado_id order by created) as gain,
created - lag (created) over (partition by ganado_id order by created) as days
from table1
)
select
ganado_id, sum (gain) * 1.0 / sum (days) as avg_gain
from gains
group by
ganado_id
-- EDIT --
Per your feedback, this would be the average of the averages:
with gains as (
select
ganado_id, weight, created,
1.0 * (weight - lag (weight) over (partition by ganado_id order by created)) /
(created - lag (created) over (partition by ganado_id order by created)) as gain_per_day
from table1
)
select
ganado_id, avg (gain_per_day)
from gains
group by
ganado_id
Results:
1 0.61805555555555555556
2 0.65277777777777777778
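One more note for the production case: when a ganado_id has only a single weight record, lag() finds no previous row, so gain_per_day is NULL and avg() returns NULL for that animal. A sketch to drop such animals from the output entirely, if that is preferred:

with gains as (
select
ganado_id,
1.0 * (weight - lag (weight) over (partition by ganado_id order by created)) /
(created - lag (created) over (partition by ganado_id order by created)) as gain_per_day
from table1
)
select
ganado_id, avg (gain_per_day) as avg_weight_gain
from gains
where gain_per_day is not null
group by
ganado_id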

Creating Hourly average based on 2 minutes before and after of time instantaneous in PostgreSQL

I have a temporal database with a 2-minute sampling frequency, and I want to extract instantaneous hourly values at 00:00, 01:00, 02:00, ..., 23:00 for each day.
So I would like each hourly value to be the average of the samples around the hour:
HH-1:58, HH:00 and HH:02 = Average of HH o'clock
OR
HH-1:59, HH:01 and HH:03 = Average of HH o'clock
Sample Data1:
9/28/2007 23:51 -1.68
9/28/2007 23:53 -1.76
9/28/2007 23:55 -1.96
9/28/2007 23:57 -2.02
9/28/2007 23:59 -1.92
9/29/2007 0:01 -1.64
9/29/2007 0:03 -1.76
9/29/2007 0:05 -1.83
9/29/2007 0:07 -1.86
9/29/2007 0:09 -1.94
Expected Result:
For 00 midnight:
(-1.92+-1.64+-1.76)/3
Sample Data2:
9/28/2007 23:54 -1.44
9/28/2007 23:56 -1.58
9/28/2007 23:58 -2.01
9/29/2007 0:00 -1.52
9/29/2007 0:02 -1.48
9/29/2007 0:04 -1.46
Expected Results:
(-2.01+-1.52+-1.48)/3
-- Average each sample with its immediate neighbours (1 row before, 1 row after),
-- then keep only the first sample of each hour, i.e. the one closest to HH:00.
SELECT hr, ts, aval
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY hr ORDER BY ts) rn
FROM (
SELECT *,
DATE_TRUNC('hour', ts) AS hr,
AVG(value) OVER (ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS aval
FROM mytable
) q
) q
WHERE rn = 1
PostgreSQL's window functions make anything involving adjacent rows a lot simpler than it used to be. Untried but should be roughly right:
select
date_trunc('hour', newest_time) as average_time,
(oldest_temp + middle_temp + newest_temp) / 3 as average_temp
from (
select
date_trunc('hour', sample_time) as average_time,
lag(sample_time, 2) over w as oldest_time,
lag(sample_time, 1) over w as middle_time,
sample_time as newest_time,
lag(sample_temp, 2) over w as oldest_temp,
lag(sample_temp, 1) over w as middle_temp,
sample_temp as newest_temp
from
samples
window
w as (order by sample_time)
) as s
where
oldest_time = newest_time - '4 minutes'::interval and
middle_time = newest_time - '2 minutes'::interval and
extract(minute from newest_time) in (2, 3);
I've restricted this in the where clause to exactly the scenario you've described - latest value at :02 or :03, prior 2 values 2 and 4 minutes before. Just in case you have some missing data which would otherwise give odd results like averaging over a much longer interval.
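If the data can have gaps as well, a looser variant is to average whatever samples land within the :58 to :03 band around each hour, however many there are. A sketch against the same assumed schema, samples(sample_time, sample_temp):

select
-- shift by 3 minutes so samples at :58/:59 fall into the following hour before truncating
date_trunc('hour', sample_time + interval '3 minutes') as hour_mark,
avg(sample_temp) as average_temp,
count(*) as samples_used
from samples
where extract(minute from sample_time) >= 58
or extract(minute from sample_time) <= 3
group by 1
order by 1;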

How to split start/end time columns into discrete chunks with PostgreSQL?

We have some tables, which have a structure like:
start, -- datetime
end, -- datetime
cost -- decimal
So, for example, there might be rows like:
01/01/2010 10:08am, 01/01/2010 1:56pm, 135.00
01/01/2010 11:01am, 01/01/2010 3:22pm, 118.00
01/01/2010 06:19pm, 01/02/2010 1:43am, 167.00
Etc...
I'd like to get this (with a function?) into a format that returns data like:
10:00am, 10:15am, X, Y, Z
10:15am, 10:30am, X, Y, Z
10:30am, 10:45am, X, Y, Z
10:45am, 11:00am, X, Y, Z
11:00am, 11:15am, X, Y, Z
....
Where:
X = the number of rows that match
Y = the cost / expense for that chunk of time
Z = the total amount of time during this duration
IE, for the above data, we might have:
10:00am, 10:15am, 1, (135/228 minutes*7), 7
The first row starts at 10:08am, so only 7 minutes are used from 10:00-10:15.
There are 228 minutes in the start->end time.
....
11:00am, 11:15am, 2, ((135+118)/(228+261) minutes*(15+14)), 29
The second row starts right after 11:00am, so we need 15 minutes from the first row, plus 14 minutes from the second row
There are 261 minutes in the second start->end time
....
I believe I've done the math right here, but need to figure out how to make this into a PG function, so that it can be used within a report.
Ideally, I'd like to be able to call the function with some arbitrary duration, i.e. 15 minutes, 30 minutes, or 60 minutes, and have it split up based on that.
Any ideas?
Here is my try. Given this table definition:
CREATE TABLE interval_test
(
"start" timestamp without time zone,
"end" timestamp without time zone,
"cost" integer
)
This query seems to do what you want. Not sure if it is the best solution, though.
Also note that it needs Postgres 8.4 to work, because it uses WINDOW functions and WITH queries.
WITH RECURSIVE intervals(period_start) AS (
SELECT
date_trunc('hour', MIN(start)) AS period_start
FROM interval_test
UNION ALL
SELECT intervals.period_start + INTERVAL '15 MINUTES'
FROM intervals
WHERE (intervals.period_start + INTERVAL '15 MINUTES') < (SELECT MAX("end") FROM interval_test)
)
SELECT DISTINCT period_start, intervals.period_start + INTERVAL '15 MINUTES' AS period_end,
COUNT(*) OVER (PARTITION BY period_start ) AS record_count,
SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start ) AS total_time,
(SUM(cost) OVER (PARTITION BY period_start ) /
(EXTRACT(EPOCH FROM SUM("end" - "start") OVER (PARTITION BY period_start )) / 60)) *
((EXTRACT (EPOCH FROM SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start )))/60)
AS expense
FROM interval_test
INNER JOIN intervals ON (intervals.period_start, intervals.period_start + INTERVAL '15 MINUTES') OVERLAPS (interval_test."start", interval_test."end")
ORDER BY period_start ASC
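To make the chunk size an argument, the same query can be wrapped in a set-returning SQL function. The following is a sketch under the same table definition; the function name and output column names are mine, and like the query itself it needs Postgres 8.4+ (the parameter is referenced as $1 because SQL functions of that era cannot refer to parameters by name):

CREATE OR REPLACE FUNCTION cost_chunks(chunk interval)
RETURNS TABLE (bucket_start timestamp, bucket_end timestamp,
               record_count bigint, total_time interval, expense double precision)
AS $$
WITH RECURSIVE intervals(period_start) AS (
    SELECT date_trunc('hour', MIN("start")) FROM interval_test
    UNION ALL
    SELECT intervals.period_start + $1
    FROM intervals
    WHERE (intervals.period_start + $1) < (SELECT MAX("end") FROM interval_test)
)
SELECT DISTINCT period_start,
    period_start + $1 AS period_end,
    COUNT(*) OVER (PARTITION BY period_start) AS record_count,
    SUM(LEAST(period_start + $1, "end") - GREATEST(period_start, "start"))
        OVER (PARTITION BY period_start) AS total_time,
    (SUM(cost) OVER (PARTITION BY period_start)
        / (EXTRACT(EPOCH FROM SUM("end" - "start") OVER (PARTITION BY period_start)) / 60))
    * (EXTRACT(EPOCH FROM SUM(LEAST(period_start + $1, "end") - GREATEST(period_start, "start"))
        OVER (PARTITION BY period_start)) / 60) AS expense
FROM interval_test
INNER JOIN intervals
    ON (intervals.period_start, intervals.period_start + $1)
       OVERLAPS (interval_test."start", interval_test."end")
ORDER BY period_start
$$ LANGUAGE sql STABLE;

-- usage:
SELECT * FROM cost_chunks(interval '30 minutes');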