How to downsample data older than 14 days only in influxdb and keep it in the same table as raw data? - grafana

I have a setup that collects metrics from Telegraf into InfluxDB. Grafana then uses InfluxDB as a data source to display graphs.
My problem is disk usage, so I want to downsample old data (older than 3 days) and keep new data (younger than 3 days) as is (raw).
I tried the Retention Policy (RP) and Continuous Query (CQ) features of InfluxDB as described in this guide:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
influxdb ("telegraf")
+-----------------------------+
|                             |
|  +-----------------------+  |
|  | table disk_raw        |  |
|  | CURRENT RP (RAW)      +---------+
|  | (deleted after 3d)    |  |      |
|  +-----------------------+  |      | CQ (average 30 min of datapoints into 1)
|  +-----------------------+  |      |
|  | table_disk_ds         |  |      |
|  | LONGTERM RP           +<--------+
|  |(downsampled, kept 90d)|  |
|  +-----------------------+  |
|                          +<---+
+-----------------------------+ |
                                |
                                |
grafana                         | grafana query
+-----------------------------+ |
|                             | |
|  +----------------------+   | |
|  | data graph           +-----+
|  +----------------------+   |
|                             |
+-----------------------------+
The problem is that this solution gives you two tables: one for raw data and one for downsampled data, with the CQ constantly writing into the downsampled one.
That is not ideal for me because:
I use Grafana to query InfluxDB, and it reads from a single table to draw the graph. I want one graph for both old data and new data.
Keeping two tables increases disk usage.
Is there any way to downsample old records in the very same table?
Configuration example:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
Grafana query:
SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)

EDIT2: Here's a workaround implemented with template variables in Grafana
https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
This seems like a really good solution.
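A template-variable approach like that can look roughly like this (a minimal sketch; the variable name rp and the retention policy names are placeholders of mine, not taken from the linked comment):
-- Grafana raw-mode InfluxQL query; "$rp" is a dashboard template variable
-- that resolves to e.g. "current" (raw data) or "longterm" (downsampled data)
-- depending on the selected time range.
SELECT mean("used_percent")
FROM "$rp"."disk"
WHERE ("device" = 'dm-0') AND $timeFilter
GROUP BY time($__interval) fill(none)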
ORIGINAL ANSWER
Looking at the example from the InfluxDB page you linked:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
  SELECT mean("website") AS "mean_website", mean("phone") AS "mean_phone"
  INTO "a_year"."orders"
  FROM "orders"
  GROUP BY time(30m)
END
If you specify the same source and target measurement, namely orders, in both the INTO and FROM clauses, then the data will be written back to the same measurement.
However, that does not solve your issue.
You would still need two queries to get the data from both retention policies. If you do a generic select * from disk_raw ..., InfluxDB will use the default retention policy and return data only from there.
The way you usually go about this is to run two queries and concatenate the results. In a single request, something like:
select * from rp_short.diskraw; select * from rp_long.diskraw
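Adapted to the disk measurement from the question, the whole setup could look roughly like this (the database name telegraf and the RP names current and longterm are placeholders chosen to match the diagram above, not a tested configuration):
-- Two retention policies on the "telegraf" database: 3 days of raw data
-- (the default RP) and 90 days for the downsampled copy.
CREATE RETENTION POLICY "current" ON "telegraf" DURATION 3d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "longterm" ON "telegraf" DURATION 90d REPLICATION 1

-- CQ that writes 30-minute averages into the same measurement name ("disk"),
-- but under the "longterm" retention policy. GROUP BY * keeps the tags.
CREATE CONTINUOUS QUERY "cq_disk_30m" ON "telegraf" BEGIN
  SELECT mean("used_percent") AS "used_percent"
  INTO "longterm"."disk"
  FROM "disk"
  GROUP BY time(30m), *
END

-- Reading still requires one query per retention policy:
SELECT "used_percent" FROM "current"."disk"
SELECT "used_percent" FROM "longterm"."disk"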
EDIT:
Here is a discussion of why it's not possible to do what you (and a lot of other people) want: https://github.com/influxdata/influxdb/issues/2625
It also covers some ways to work around it.
Briefly, one way is to handle the downsampled and high-resolution data manually (i.e. not with a CQ) and keep them in the same retention policy. Another is to use a proxy that rewrites the query depending on its time range so that it reads the correct data.

Related

Kafka / KSQL, stuck in reducing stream/table

What I have are two streams (from two different systems, imported via connectors). Some of the information from the two streams will be used to build combined information.
Currently I'm working with ksqlDB, but I'm having problems with the last step, reducing the information from both streams.
Both streams contain a tree structure (id/parentId), so I've used a second table for each stream to look up certain information from the parents, which is then joined into a table containing all the information needed for the final reduce.
The main matching column is always the same; however, one or more additional columns (not a fixed set) are also needed to do the final match. Those columns might also only partially match each other.
An example output of the table might look like this:
| id | match | matchExtra1 | matchExtra2 | matchExtra3 |
| 1 | 1 | Extra1 | Extra2 | Extra3 |
| 2 | 1 | Extra1 | Extra4 | Extra5 |
| 3 | 1 | Extra6 | Extra7 | Extra8 |
| 4 | 1 | Extra9 | Extr10 | tra8 |
In this case, id 1 and 2 should be matched and id 3 and 4 should be another match.
If this is possible within ksqlDB, that would be great. If we need to work with low-level Kafka, that's fine, as long as we can achieve the end result.
Basic flow as I have it right now:

SQLBase, query multiple values from one table column

I have a big problem with a SQLBase database, or its engine.
I have history with MySQL but not with SQLBase.
I have multiple tables joined together, and a work order that has multiple values in a column; I want those values in the query result on a single row.
For example, this is what I want:
table
ordernr|type|..............|productnr
-------------------------------------
1141356| v1 | .............|fe465
1141356| v2 | .............|hty546
1141356| v3 | .............|rgrg211
1454446| v1 | .............|dw885
1454446| v2 | .............|fee885
1454446| v3 | .............|wwf6664
1231231| v1 | .............|ff664
1591591| v1 | .............|gg123
1591591| v2 | .............|jj5891
query result
ordernr | .............| v1 | v2 | v3
--------------------------------------------
1141356 | ............ |fe465|hty546|rgrg211
1454446 | ............ |dw885|fee885|wwf6664
1231231 | ............ |ff664| - | -
1591591 | ............ |gg123|jj5891| -
But when I try it, I only get the orders that have one, or two, or three values, depending on how I write the query; I want all of them showing.
I tried using a LEFT JOIN but with no result.
Only ordernr comes from another table.
Please ask if you need more information; I will try my best to help.
EDIT:
Hi! It works! Somehow my query started working as it should. But let me say that I have worked with MySQL for over 10 years without any big hassle, and this SQLBase is giving me high blood pressure. :)
Not knowing the schema, have you tried GROUP BY, i.e. GROUP BY ordernr, to get everything on one row?
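Roughly, that GROUP BY idea becomes a conditional-aggregation (pivot) query. A sketch in generic SQL, with the table name workorder and the column names guessed from the example above; the exact SQLBase syntax for the conditional expression may differ:
-- One row per ordernr; each v-column picks the productnr of the matching type.
-- Orders without a v2 or v3 row simply get NULL there.
SELECT ordernr,
       MAX(CASE WHEN type = 'v1' THEN productnr END) AS v1,
       MAX(CASE WHEN type = 'v2' THEN productnr END) AS v2,
       MAX(CASE WHEN type = 'v3' THEN productnr END) AS v3
FROM workorder
GROUP BY ordernr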

Calculate the percentage of a column which has redundant rows in Tableau

I want to calculate the percentage of a column which has redundant rows.
For example, I would like to calculate the percentage of "Success" for A and B in the table below:
+-------+---------+
| Name | Result |
+-------+---------+
| A | Success |
| B | Success |
| A | Fail |
| A | Success |
| B | Fail |
| B | Fail |
| A | Success |
+-------+---------+
I tried using a calculated field with IF [Result] = "Success" THEN 1 ELSE 0 END
and then editing the table calculation to Percentage -> Table (across and down)... but it didn't work :(
You can absolutely use the "Percent of Total" table calculation for this. The tricky bit is going into the "Edit Table Calculation" dialog and telling Tableau how you want it to perform the calculation.
Here's an example of how to do this that you can adjust to fit your specific needs. Place [Name] and [Result] in the Rows shelf. Then place SUM(Number of Records) into Text. You'll end up with something like this:
Name   Result        |     |
---------------------+-----+
A      Fail          |   1 |
       Success       |   3 |
---------------------+-----+
B      Fail          |   2 |
       Success       |   1 |
---------------------+-----+
Then right click on SUM(Number of Records) and click "Add Table Calculation...". At the top of the Table Calculation dialog, go to "Calculation Type:" and choose "Percent of Total". In "Summarize the values from:", it will default to "Table (Down)". Go ahead and hit Apply at this point and see what happens. Bad news - it's wrong.
Name   Result        |     |
---------------------+-----+
A      Fail          | 14% |
       Success       | 43% |
---------------------+-----+
B      Fail          | 29% |
       Success       | 14% |
---------------------+-----+
The default "Table (Down)" is almost never what you actually want. That says to calculate the percent of total for your entire partition, but you'd actually like to see that percent of total for each Name.
Until you get really good at this part (and maybe even after you've become a Tableau Zen Master and a Tableau god among men), I recommend always going to the advanced menu when you're defining your table calculations. It's a good opportunity to really think through exactly how you want Tableau to perform the calculation. In this case, you want to calculate the percent of each result (Success and Fail) for each name.
Go to the Advanced dialog (under "Summarize the values from:"). You'll see Name and Result under Partitioning and nothing under Addressing. Move Result over to Addressing and leave Name under Partitioning. What you're saying here is "I want Tableau to calculate the percent of each result (Success or Failure). I want it to do this for each name."
Apply those changes, and you should see something like this:
Name   Result        |     |
---------------------+-----+
A      Fail          | 25% |
       Success       | 75% |
---------------------+-----+
B      Fail          | 67% |
       Success       | 33% |
---------------------+-----+
Perfect. If you only want to see the Successes, right-click on "Fail" in the table and click Hide. Do NOT filter them out. That would remove those rows from your partition, and thus from the total that the percent-of-total calculation considers. By hiding the Fails instead of filtering them, you keep them in the partition but don't show them in the data view.
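If it helps to see what that partitioning/addressing setup actually computes, here is a rough SQL equivalent (the table name results is just a placeholder): the PARTITION BY Name plays the role of the partitioning field, and grouping on Result plays the role of the addressing field.
-- Percent of total within each Name: count each (Name, Result) pair and
-- divide by the total number of records for that Name.
SELECT Name,
       Result,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY Name) AS pct_of_name
FROM results
GROUP BY Name, Result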

Calculate median and average in a partition in Tableau using table calculation

I have a detail table of posts and subjects dug from a forum. Each row is a single post/subject pair (i.e. postID and subjectID form the primary key of the table), and I have some measures at the subject level and some at the post level. For example:
+---------+-------------+--------------+------------+--------------+--------+
| post.ID | post.Author | post.Replies | subject.ID | subject.Rank | year |
+---------+-------------+--------------+------------+--------------+--------+
| 1 | mike | 10 | movie | 4 | 1990 |
| 1 | mike | 10 | comics | 6 | 1990 |
| 2 | sarah | 0 | tv | 10 | 2001 |
| 3 | tom | 4 | tv | 10 | 2003 |
| 3 | tom | 4 | comics | 6 | 2003 |
| 4 | mike | 1 | movie | 4 | 2008 |
+---------+-------------+--------------+------------+--------------+--------+
I want to study the trend of posts and subjects by year and color it by subject.Rank.
The first two are easily measured by putting COUNTD(post.ID) and COUNTD(subject.ID) in Rows and 'year' in Columns.
But if I drag MEDIAN(subject.Rank) onto Color, I get a wrong result: it's calculated at the row level, not at the distinct subject.ID level.
I think I can accomplish it using table calculation features, but I have no idea on how to proceed.
It sounds like you are trying to treat Subject.Rank as a dimension, instead of as a measure. If so, just convert it to a dimension on the worksheet in question by right clicking on the field and choosing dimension. You can also convert it to a dimension in the data pane by dragging the field from the measures section up to the dimensions section. That will tell Tableau to treat that field as a dimension by default in the future.
A field can be treated as a dimension in some cases, and as a measure in others, depending on what you are trying to achieve. If you are familiar with SQL, dimensions are used to partition data rows for aggregation, as with the GROUP BY clause.
Finally, count distinct (COUNTD) can be expensive on large datasets. Often, you can get the same result another way. So try to think of other approaches and save COUNTD for when you really need it.
Try using {fixed [1st LEVEL],[2nd level]: median()}
or
Table calculation approach
When you put in the MEDIAN, there is an Edit Table Calculation option; under Compute Using choose Advanced and put your fields in there (make sure they are ordered the way you want the calculation to run when you select them), then click OK and set "At the level" and "Restart every".
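For reference, the intent of that FIXED/LOD calculation expressed as a rough SQL sketch (the table name forum_posts is a placeholder; PERCENTILE_CONT is PostgreSQL/Oracle-style syntax): deduplicate to one row per subject per year before taking the median, so repeated post rows don't distort it.
-- Median subject rank per year, computed over distinct subjects rather than
-- over the raw post/subject rows.
SELECT year,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY subject_rank) AS median_subject_rank
FROM (
    SELECT DISTINCT year, subject_id, subject_rank
    FROM forum_posts
) AS distinct_subjects
GROUP BY year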

JasperReports grouping changeable by user

I have no idea if this is possible or not, but I'm trying to figure out whether iReport Designer can be used to create reports where the user viewing the report is able to control the grouping.
For example, I would like the user to be able to re-order the grouping and also change the degree to which the report is grouped (on only one field or on multiple ones).
I don't mean SQL grouping btw, I mean for example grouping by Account and then Agent:
| Account | Agent | Invoice | Total |
+----------+---------+----------+-------+
| Account1 | | | |
| | Joe | | |
| | | Invoice2 | $600 |
| | | Invoice1 | $300 |
| Account2 | | | |
| | Sam | | |
| | | Invoice4 | $120 |
| | | Invoice7 | $230 |
| | Joe | | |
| | | Invoice3 | $200 |
+----------+---------+----------+-------+
What I'm trying to figure out is: can you use iReport to make this grouping dynamic? That is, a user might want to group by Agent first and Account second, and rather than having one report for each grouping it would be nice if there were a way of doing this with iReport.
Yes, it should be possible to create reports like that. But depending on your exact needs it may not be practical (as Alex K indicated).
If you take only your example of grouping on Account then Agent, or grouping on Agent then Account, it would be simple. Have a parameter that lets the user specify this choice. It would probably be a drop-down list. Then in the report you would have fields like this:
Today's version: $F{Account} and $F{Agent}
Dynamic version: $P{AcctFirst} ? $F{Account} : $F{Agent} and $P{AcctFirst} ? $F{Agent} : $F{Account}
Likewise, the group definition would need to include the new AcctFirst param.
But it won't extend nicely. What if the 2 fields are different data types? What if you want to let the user choose from 3 or 4 or N fields? Each of those is solvable... but the report becomes exponentially more complex.
By the way, it's a relatively common request, so you may well see features like this make their way into JasperReports. But for now it's a tough one.