ADF reprocess daily slice 3 times per day - azure-data-factory

I have a complex ADF pipeline with slice-based scheduling, where slice = day.
Currently it works like this:
Day1, Day2, Day3, ..., PreviousDay, CurrentDay
At 00:00 of CurrentDay it reprocesses PreviousDay, so today I only have calculated data for the previous day.
I need to change the schedule so that it works like this:
1) The slice size stays the same (one day).
2) Reprocessing of CurrentDay is triggered three times per day, to emulate a results refresh (a kind of running total).
The reason I want to keep the slice size at one day is that it matches the partition size of the underlying tables. I don't want to shrink the partitions to a few hours, because that is pointless for the current volume of data.
I can't figure out how to achieve this without shrinking the slice to a few hours. How do I force a reprocess of the current day? Any ideas would be helpful.
Thank you.

The way to do this is to make two changes:
Set the dataset availability style to StartOfInterval, so that CurrentDay runs instead of PreviousDay (see "Dataset availability and policies" in the documentation).
Set the scheduler of the activity to a frequency of Hour with an interval of 8, thereby running it three times per day (see data-factory-scheduling-and-execution#specify-schedule-for-an-activity for more info). The activity and output should have matching slices; this can be fixed as described below.
Since the slices of the input (Day: 1) and the activity (Hour: 8) differ, you need to set two extra properties in the activity for the input, mapping the 8-hour activity slice onto the 1-day input slice; the execution is driven by the output slice. This is explained further here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#model-datasets-with-different-frequencies The activity and output slices also differ and can be fixed with the same method.
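For illustration, here is a minimal sketch of the two settings, assuming ADF v1 JSON (the surrounding dataset and pipeline definitions are omitted). In the dataset definition:

"availability": {
    "frequency": "Day",
    "interval": 1,
    "style": "StartOfInterval"
}

And in the activity definition inside the pipeline:

"scheduler": {
    "frequency": "Hour",
    "interval": 8
}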

Related

Why are my values multiplying when I apply Month/Year to my values?

When I apply Month/Year to Cases or Deaths from my data, the values explode: Cases go from approximately 48 million to over 1 billion, and Deaths go from about 700 thousand to over 22 million. However, when I try the same thing with Initial Claims or the Stringency Index, the values remain correct. I'm trying to find the month-over-month percentage change, by the way, using the Date column, and I only select 2020 and 2021 in the Year filter.
What I'm asking about is Sheet 21.
Link to workbook: https://public.tableau.com/app/profile/nilajah.rivers/viz/CoronaVirusProject_16323687296770/Sheet21
Your problem is that the data points are daily cumulative deaths. If you change the date aggregation to anything other than days, Tableau will default to summing the numbers for all the days in the month. This will give the wrong result, obviously.
If you want to show the correct total deaths or cases regardless of the time aggregation (months, days, weeks etc.) then you could use the New Case or New Death numbers plus a running sum table calculation. This will always give the correct total for the time period.
Table calculations will also allow automatic calculation of the period to period % change from the same data fields.
This is a common problem when working with datasets that offer pre-calculated aggregations. Tableau doesn't need them, as it can dynamically calculate the aggregation of a field over any given time period, but it is easy to forget which fields hold pre-aggregated data and which hold raw data. Pre-aggregated fields assume a particular time period and can't be used for different time periods without first disentangling that assumption (which is unnecessary if you also have the raw data, in this case daily new deaths/cases).
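As a sketch, assuming the raw daily fields are named [New Deaths] and [New Cases] (the actual field names in the workbook may differ), the running total is simply:

RUNNING_SUM(SUM([New Deaths]))

and the month-over-month change is the built-in "Percent Difference" quick table calculation computed along the date axis, which written out explicitly is:

(ZN(SUM([New Deaths])) - LOOKUP(ZN(SUM([New Deaths])), -1)) / ABS(LOOKUP(ZN(SUM([New Deaths])), -1))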

Graph a counter from zero in prometheus/grafana

In prometheus, I have a monotonically increasing counter (ifHCInOctets from IF-MIB, in this case).
In Grafana, I can create a graph using the simple query ifHCInOctets{job='snmp',instance='$Device',ifDescr=~'eth0'} and see the counter graphed over different time ranges by selecting the desired range in the upper-right.
This is almost exactly what I want. However, I would like the graph to always start at zero and increase from there. The use-case is that I want to visualize my data usage over the course of a month to see how quickly I am approaching my data cap. (I already create a gauge object using increase(ifHCInOctets{...}[$__range]) function which shows me how much I have used in total over the given time range, but I'd like to be able to visualize that usage over time.)
Basically, I want ifHCInOctets{...} - X where X is the value of ifHCInOctets at the start of the range. My first thought was:
ifHCInOctets{...} - ifHCInOctets{...} offset $__range
But that seems to show me each data point minus the data point $__range time prior to it (rather than just subtracting the starting value from all points).
I then tried creating a query variable with the query query_result(ifHCInOctets{...} offset $__range) and setting it to update on time range change. This almost seemed to work, but the resulting graph always seemed to start slightly negative, depending on the time range selected, which made me think it wasn't doing what I thought it was.
I have also tried various forms of sum, sum_over_time, and increase, all to no avail.
You're probably looking for something like this:
ifHCInOctets
-
min_over_time(
    (ifHCInOctets
        and
        (month(timestamp(ifHCInOctets)) == scalar(month(vector($__to / 1000)))))[31d:]
)
But it doesn't take counter resets into account. And it is ugly and inefficient as hell. It's basically the current value minus the min_over_time calculated over samples in the previous 31 days that fell into the same month as Grafana's $__to timestamp.
You probably want to set up a recording rule based on this expression (that adds year, month and day labels to a metric) and then calculate the increase() over any given month (including the current month). That takes into account both counter resets and counters that did not exist at the beginning of the month.
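As a rough sketch, one way to materialize the expression above as a recording rule (group and rule names are illustrative; month() with no argument stands in for Grafana's $__to, since recording rules are evaluated at the current time, and this variant still ignores counter resets):

groups:
  - name: monthly_traffic              # illustrative group name
    rules:
      # month-to-date delta of the counter, recorded at each evaluation interval
      - record: ifHCInOctets:month_to_date
        expr: |
          ifHCInOctets
            - min_over_time(
                (ifHCInOctets
                  and
                  (month(timestamp(ifHCInOctets)) == scalar(month())))[31d:]
              )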

Can I calculate a moving sum on a field in InfluxDB?

I'm trying to understand if it's possible to calculate a 1 month sum of revenue data in one of my measurements. For each day, I would like the sum of the previous 30 days.
Is this possible in InfluxDB or through Grafana's query interface?
A moving average is a moving sum, divided by the number of samples. So if you want a moving sum of the past 30 values:
select 30*moving_average(field_name, 30) from measurement
Edited to add:
As Peter Halicky points out in the comments, this is not the past 30 days; it's the past 30 data points.
If you will always have data for every single day, it's not an issue.
If you're missing a day's data, you'll still get a 30-sample average, but it'll stretch over 31 days instead of 30.
If you don't actually care about the calendar, but want to know the past 30 days of activity, this is not a problem.
If it is a problem, there are a few work-arounds. One that's probably trickier than it sounds: ensure that there is always an entry for each day.
A more robust way is to have the reporting app do this in two steps. Something like this (haven't worked out all the details, but you get the idea):
find the number of data points in the past 30 days, using a query like select count(field_name) from measurement where time > now() - 30d.
Use this number (call it n) to form the query: select n*moving_average(field_name, n) from measurement where time > now() - 30d.
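For instance, if the count query came back with 28 points (an illustrative number), the second query becomes:

select 28*moving_average(field_name, 28) from measurement where time > now() - 30d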
Yes, it's definitely possible.
Just set this part of your query like this:
SELECT sum("value") FROM "YOUR_TAG_NAME"
WHERE $timeFilter GROUP BY time(30d) fill(null)
Just make sure that your dashboard time range covers at least the last 30 days.

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers and if the value is 0 (zero) then the server is stable, else it is unstable. And the graph we have is at a minute level. So, I want to show an aggregated value to know how many hours the server is unstable in the selected time range.
Let's say I select "Last 7 days" as the time duration; then we get X hours of server instability.
And one more thing: I have a line graph (a time series graph) that shows the state of the server. When I select "Last 24 hours" or "Last 48 hours" I get the graph at a minute level, but when I increase the duration to a quarter I get a point every 5 minutes or so. I understand it's aggregating the values, but does anybody know how Grafana does this aggregation?
I have tried "scaleToSeconds" function and "ConsolidateBy" functions and many more to first get the count of non zero value minutes, but no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this, there are 2 places that aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query. If, for example, you have a panel that queries 3 days' worth of data and you store 2 days at 1-minute and 7 days at 5-minute intervals in whisper, then you'd have 72 * 60 / 5 = 864 points from the 5-minute archive; but if your graph is only 500px wide, at runtime that would be consolidated down to 10-minute intervals, returning 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
You say that when the value is zero (0) the server is stable; what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
Grafana does aggregate (or consolidate) data, typically using the average aggregation function. You can override this with the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.
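As a sketch, assuming a series named my.series and an hourly summary period (both illustrative):

consolidateBy(summarize(my.series, '1h', 'sum'), 'sum')

Swapping the '1h' literal for a time template variable such as '$period' lets the bucket size grow and shrink with the dashboard selection.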

Tableau: graphically show compounded leadtimes

I have a chart that shows the number of departures for a given 15-minute interval, as seen here.
I need to compound these counts backwards for one hour. For example, the 3 departures shown at 11:00 need to also be represented in the 10:00, 10:15, 10:30, and 10:45 columns. When completed, the 10:00 column would have a total of 6 departures (10:15 -> 6, 10:30 -> 5, 10:45 -> 4, 11:00 -> 4).
I have done this via VBA in Excel, but am now needing to replicate the chart in Tableau and have been beating my head against it for about two weeks. I'd love to hear any and all suggestions.
You can use a Cartesian join against a large enough date range of your choosing to in effect resample your data and add the additional time intervals you desire.
For example, if you have a month's worth of data (min date -> max date = 30 days), then you have 30 * 24 * 4 = 2880 fifteen-minute intervals.
Create all those intervals in a separate data sheet.
Add a bogus column with a value of 'link' for all rows.
Create the same bogus column in your actual data source.
Join the two sheets together on the link column.
Create a calculated field along the following lines:
[Interval] <= [Flight Time] AND [Interval] >= DATEADD('hour',-1,[Flight Time])
This calculated field will evaluate to TRUE when the interval time is within one hour before the flight time. You can then drag this field onto your filter shelf and select TRUE value only. Effectively your [Interval] field becomes your new date field.
I would recommend adding that filter to context and applying it across the entire data source. Before you add this filter you'll have 2880 times the amount of data, so be sure to use a live view first. Be careful with extracts using Cartesian joins, as you could potentially be extracting more than you bargained for.
See the following link for different techniques for doing this and for re-sampling dates in general in Tableau.
https://community.tableau.com/thread/151387
Depending on the size of your data (and if a live view is not necessary), it is often easier and more efficient to do this type of pre-processing outside of Tableau, in SQL or with something like Python's pandas library.
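For example, a minimal SQL sketch of the same interval join (table and column names are illustrative, and DATEADD syntax varies by dialect):

-- one row per 15-minute interval, joined to every flight departing
-- within the hour that follows that interval
SELECT i.interval_start,
       COUNT(f.flight_id) AS departures_next_hour
FROM intervals i
JOIN flights f
  ON i.interval_start <= f.flight_time
 AND i.interval_start >= DATEADD(hour, -1, f.flight_time)
GROUP BY i.interval_start;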
Here is another solution provided on the Tableau Community Forum. I have not tried tyvich's solution yet, but I know this one got me where I needed to go. Please follow the link to see the solution using moving table calculations.
https://community.tableau.com/thread/251154