Grafana: creating a Group By time series using a custom SQL plugin

I am creating a panel in Grafana using an SQL-like database and a custom data source plugin. I want to create a time series graph with multiple series. For example, if I have 3 columns (time, id, metric), I want a time series graph with a different line for each metric, where the value is the number of distinct ids at that specific time.
My current query is something like
SELECT date_time AS 'Time', COUNT(DISTINCT(id)) AS 'value', string_field AS 'metric'
FROM my_table
WHERE filter='reason' AND ...
GROUP BY date_time, string_field
This returns the data in a way I would expect:
| Time       | value | metric         |
|------------|-------|----------------|
| 2023-01-01 | 1     | example_metric |
However, Grafana renders this as though the metric column didn't exist: it just plots the 'value' amount. I want the y-axis to be the count and the series label to be the 'metric' column.
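For reference, Grafana's built-in SQL data sources use a time-series result format in which a string column named metric becomes the series label, the other non-time columns become values, and rows must be ordered by time; whether the custom plugin follows the same convention is an assumption. A sketch of that shape:
SELECT
  date_time AS time,
  string_field AS metric,
  COUNT(DISTINCT id) AS value
FROM my_table
WHERE filter = 'reason' AND ...
GROUP BY date_time, string_field
ORDER BY date_time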

Related

PySpark rolling operation with variable ranges

I have a dataframe looking like this:
some_data | date     | date_from | date_to
----------|----------|-----------|--------
1234      | 1-2-2020 | 1-2-2020  | 2-2-2020
5678      | 2-2-2020 | 1-2-2020  | 2-3-2020
and I need to perform some operations on some_data based on time ranges that are different for every row, and stored in date_from and date_to. This is basically a rolling operation on some_data vs date, where the width of the window is not constant.
If the time ranges were the same, like always 7 days preceding/following, I would just do a window with rangeBetween. Any idea how I can still use rangeBetween with these variable ranges? I could really use the partitioning capability Window provides...
My current solution is (sketched in SQL below):
a join of the table with itself, to obtain a secondary/nested date column; at this point every date has the full list of possible dates
some WHERE filters to select, for each primary date, the proper secondary dates according to date_from and date_to
a groupby on the primary date, with agg performing the actual operation on the selected rows
But I am afraid this would not be very performant on large datasets. Can this be done with Window? Do you have a better/more performant suggestion?
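A rough Spark SQL sketch of that self-join approach (the view name events is a placeholder, SUM stands in for the actual aggregation, and the date columns are assumed to already be of date type rather than strings):
-- keep, for each primary row a, every row b whose date falls in a's window
SELECT a.`date`,
       SUM(b.some_data) AS rolled_value
FROM events a
JOIN events b
  ON b.`date` BETWEEN a.date_from AND a.date_to
GROUP BY a.`date`
The same shape can be written with the DataFrame API; either way the engine sees a non-equi join, which is typically the expensive part.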
Thanks a lot,
Andrea.

Graph in Grafana using Postgres Datasource with BIGINT column as time

I'm trying to construct a very simple graph showing how many visits I've got in some period of time (for example, for each 5 minutes).
I have Grafana v5.4.0 paired with Postgres v9.6, full of data.
My table is below:
CREATE TABLE visit (
id serial CONSTRAINT visit_primary_key PRIMARY KEY,
user_credit_id INTEGER NOT NULL REFERENCES user_credit(id),
visit_date bigint NOT NULL,
visit_path varchar(128),
method varchar(8) NOT NULL DEFAULT 'GET'
);
Here's some data in it:
id | user_credit_id | visit_date | visit_path | method
----+----------------+---------------+---------------------------------------------+--------
1 | 1 | 1550094818029 | / | GET
2 | 1 | 1550094949537 | /mortgage/restapi/credit/{userId}/decrement | POST
3 | 1 | 1550094968651 | /mortgage/restapi/credit/{userId}/decrement | POST
4 | 1 | 1550094988557 | /mortgage/restapi/credit/{userId}/decrement | POST
5 | 1 | 1550094990820 | /index/UGiBGp0V | GET
6 | 1 | 1550094990929 | / | GET
7 | 2 | 1550095986310 | / | GET
...
So I tried these 3 variants (actually, dozens of others too), with no success:
Solution A:
SELECT
visit_date as "time",
count(user_credit_id) AS "user_credit_id"
FROM visit
WHERE $__timeFilter(visit_date)
ORDER BY visit_date ASC
No data on graph. Error: pq: invalid input syntax for integer: "2019-02-14T13:16:50Z"
Solution B:
SELECT
$__unixEpochFrom(visit_date),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY user_credit_id
Series A
SELECT
$__time(visit_date/1000,10m,previous),
count(user_credit_id) AS "user_credit_id A"
FROM
visit
WHERE
visit_date >= $__unixEpochFrom()::bigint*1000 and
visit_date <= $__unixEpochTo()::bigint*1000
GROUP BY 1
ORDER BY 1
No data on graph. No error.
Solution C:
SELECT
$__timeGroup(visit_date, '1h'),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY time
No data on graph. Error: pq: function pg_catalog.date_part(unknown, bigint) does not exist
Could someone please help me sort out this simple problem? I think the query should be compact and simple, but the Grafana docs demoing its syntax and features confuse me slightly. Thanks in advance!
Use this query, which will work if visit_date is timestamptz:
SELECT
$__timeGroupAlias(visit_date,5m,0),
count(*) AS "count"
FROM visit
WHERE
$__timeFilter(visit_date)
GROUP BY 1
ORDER BY 1
But your visit_date is bigint, so you need to convert it to a timestamp (probably with TO_TIMESTAMP()), or you will need to find another way to use it as bigint. Use the query inspector for debugging and you will see the SQL generated by Grafana.
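For example, a minimal sketch of that conversion, assuming to_timestamp() and that the Grafana macros accept an expression in place of a plain column name (the 5m interval is just an example):
SELECT
  $__timeGroupAlias(to_timestamp(visit_date/1000), '5m'),
  count(*) AS "count"
FROM visit
WHERE $__timeFilter(to_timestamp(visit_date/1000))
GROUP BY 1
ORDER BY 1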
Jan Garaj, thanks a lot! I should admit that your snippet and, even more valuable, your advice to switch to SQL debugging dramatically helped me make my "breakthrough".
So, the resulting query which solved my problem is below:
SELECT
$__unixEpochGroup(visit_date/1000, '5m') AS "time",
count(user_credit_id) AS "Total Visits"
FROM visit
WHERE
'1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval BETWEEN
$__timeFrom()::timestamp
AND
$__timeTo()::timestamp
GROUP BY 1
ORDER BY 1
Several comments to decipher all this Grafana magic:
Grafana has its own limited DSL for making configurable graphs; this set of functions is converted into actual SQL (this is where seeing the "compiled" SQL helped me a lot, many thanks again).
To make my BIGINT column suitable for the predefined Grafana functions, we simply need to convert it to seconds since the UNIX epoch, i.e. divide by 1000.
Now, the WHERE clause is not so simple and predictable; the Grafana DSL works differently there and the simple division did not do the trick. I solved it by using other Grafana functions to get the FROM and TO points in time (the period for which the graph should be rendered), but those functions generate a timestamp type while our column is BIGINT. Thankfully, Postgres has plenty of conversion facilities: '1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval turns one BIGINT value into a Postgres TIMESTAMP, which Grafana handles just fine.
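As an aside, a possibly simpler variant of the same query (a sketch, assuming the $__unixEpochFilter macro is available in this Grafana version) lets the epoch-oriented macros handle the range filter directly on the BIGINT column:
SELECT
  $__unixEpochGroup(visit_date/1000, '5m') AS "time",
  count(user_credit_id) AS "Total Visits"
FROM visit
WHERE $__unixEpochFilter(visit_date/1000)
GROUP BY 1
ORDER BY 1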
P.S. If you don't mind I've changed my question text to be more precise and detailed.

Versioning in the database

I want to store a full version of the row every time an update is made to an amount-sensitive table.
So far, I have decided to use the following approach:
Do not allow updates.
Every time an update is made, create a new entry in the table.
However, I am undecided on what the best database structure design is for this change.
Current Structure
Primary Key: id
id(int) | amount(decimal) | other_columns
First Approach
Composite Primary Key: id, version
id(int) | version(int) | amount(decimal) | change_reason
1 | 1 | 100 |
1 | 2 | 20 | correction
Second Approach
Primary Key: id
Uniqueness Index on [origin_id, version]
id(int) | origin_id(int) | version(int) | amount(decimal) | change_reason
1 | NULL | 1 | 100 | NULL
2 | 1 | 2 | 20 | correction
I would suggest a new table which stores a unique id for each item. This serves as a lookup table for all available items.
item Table:
id(int)
1000
For the table which stores all changes for an item, let's call it the item_changes table. Its item_id is a FOREIGN KEY to the item table's id. The relationship between the item table and the item_changes table is one-to-many.
item_changes Table:
id(int) | item_id(int) | version(int) | amount(decimal) | change_reason
1 | 1000 | 1 | 100 | NULL
2 | 1000 | 2 | 20 | correction
With this, item_id will never be NULL, as it is always a valid FOREIGN KEY to the item table.
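A minimal DDL sketch of this design (the column types and lengths are assumptions, adapt them to your engine):
CREATE TABLE item (
    id INT PRIMARY KEY
);

CREATE TABLE item_changes (
    id            INT PRIMARY KEY,
    item_id       INT NOT NULL REFERENCES item(id),  -- never NULL, always a valid item
    version       INT NOT NULL,
    amount        DECIMAL(12,2) NOT NULL,
    change_reason VARCHAR(255),
    UNIQUE (item_id, version)                        -- one row per item per version
);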
The best method is to use Version Normal Form (VNF). Here is an answer I gave describing a neat way to track all changes to specific fields of specific tables.
The static table contains the static data, such as the PK and other attributes which do not change over the life of the entity, or whose changes need not be tracked.
The version table contains all the dynamic attributes that need to be tracked. The best design uses a view which joins the static table with the current version from the version table, since the current version is probably what your apps need most often. Triggers on the view maintain the static/versioned design without the app needing to know anything about it.
The link above also contains a link to a document which goes into much more detail, including queries to get the current version or to "look back" at any version you need.
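As a rough illustration of that static/version split (the table, column names, and types are made up for the example; the triggers that keep the view maintainable are omitted):
CREATE TABLE account_static (
    account_id INT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL              -- attributes that never change
);

CREATE TABLE account_version (
    account_id INT NOT NULL REFERENCES account_static(account_id),
    version    INT NOT NULL,
    amount     DECIMAL(12,2) NOT NULL,          -- tracked, changing attributes
    PRIMARY KEY (account_id, version)
);

-- the "current" view joins the static row with its latest version
CREATE VIEW account_current AS
SELECT s.account_id, s.created_at, v.version, v.amount
FROM account_static s
JOIN account_version v
  ON v.account_id = s.account_id
 AND v.version = (SELECT MAX(version)
                  FROM account_version
                  WHERE account_id = s.account_id);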
Why not go for SCD-2 (Slowly Changing Dimension, Type 2)? It is a methodology that describes a good solution for your problem. Here are the advantages of SCD-2 and an example of using it; it is a standard design pattern for the database.
Type 2 - creating a new additional record. In this methodology, all history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and the new rows contain as attributes the natural key (or other durable identifiers). 'Effective date' and 'current indicator' columns are also used in this method. There can be only one record with the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to the value 9999-12-31. Introducing changes to the dimensional model in type 2 can be a very expensive database operation, so it is not recommended in dimensions where a new attribute could be added in the future.
id | amount | start_date  | end_date    | current_flag
1  | 100    | 01-Apr-2018 | 02-Apr-2018 | N
2  | 80     | 04-Apr-2018 | NULL        | Y
Detailed explanation:
Here, all you need is to add the 3 extra columns, START_DATE, END_DATE, and CURRENT_FLAG, to track your records properly. When the record is first inserted at the source, this table will store the value as:
id | amount | start_date  | end_date | current_flag
1  | 100    | 01-Apr-2018 | NULL     | Y
And when the same record is updated, you set the END_DATE of the previous record to the current system date and its CURRENT_FLAG to 'N', and insert the new record as the current one. That way you can track everything about your records, as below:
id | amount | start_date  | end_date    | current_flag
1  | 100    | 01-Apr-2018 | 02-Apr-2018 | N
2  | 80     | 04-Apr-2018 | NULL        | Y
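A hedged SQL sketch of that close-and-insert step, following the example table above (the table name amount_history is made up; in practice you would locate the previous record by its natural/durable key rather than the surrogate id):
-- close out the current version of the record
UPDATE amount_history
SET end_date = CURRENT_DATE,
    current_flag = 'N'
WHERE id = 1
  AND current_flag = 'Y';

-- insert the new version as the current record
INSERT INTO amount_history (id, amount, start_date, end_date, current_flag)
VALUES (2, 80, CURRENT_DATE, NULL, 'Y');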

Pivot query by date in Amazon Redshift

I have a table in Redshift like:
category | date
----------------
1 | 9/29/2016
1 | 9/28/2016
2 | 9/28/2016
2 | 9/28/2016
which I'd like to turn into:
category | 9/29/2016 | 9/28/2016
--------------------------------
1 | 1 | 1
2 | 0 | 2
(count of each category for each date)
Pivot a table with Amazon RedShift / PostgreSQL seems to be helpful, using CASE statements, but that requires knowing all possible cases beforehand. How could I do this if the columns I want are every day starting from a given date?
There is no functionality provided with Amazon Redshift that can automatically pivot the data.
The Pivot a table with Amazon RedShift / PostgreSQL page you referenced shows how the output can be generated, but it is unable to automatically adjust the number of columns based upon the input data.
One option would be to write a program that queries the available date range, then generates the SQL query. However, this can't be done entirely within Amazon Redshift.
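For illustration, the SQL such a generator might emit for the two example dates could look like this sketch (my_table is a placeholder and the date column is assumed to be of DATE type; each additional day just adds another CASE column):
SELECT category,
       SUM(CASE WHEN "date" = '2016-09-29' THEN 1 ELSE 0 END) AS "9/29/2016",
       SUM(CASE WHEN "date" = '2016-09-28' THEN 1 ELSE 0 END) AS "9/28/2016"
FROM my_table
GROUP BY category
ORDER BY category;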
You could do a self join on date, which I'm currently looking up how to do.

To create column with date datatype in hive table

I have created a table in Hive (0.10.0) using values like:
2012-01-11 17:51 Stockton Children's Clothing 168.68 Cash
2012-01-11 17:51 Tampa Health and Beauty 441.08 Amex
............
Here date and time are tab-separated values and I need to work on the date column. Since Hive doesn't allow a "date" datatype, I have used "TIMESTAMP" for the first date column (2012-01-11, ...); however, after creating the table it is showing NULL values for that first column.
How to solve this? Please guide.
I loaded the data into a table with all columns defined as string, then cast the date value and loaded it into another table where the column was defined as DATE. It seems to work without any issues. The only difference is that I am using a Shark version of Hive, and to be honest with you, I am not sure whether there are any profound differences between actual Hive and Shark Hive.
Data:
hduser2@ws-25:~$ more test.txt
2010-01-05 17:51 Visakh
2013-02-16 09:31 Nair
Code:
[localhost:12345] shark> create table test_time(dt string, tm string, nm string) row format delimited fields terminated by '\t' stored as textfile;
Time taken (including network latency): 0.089 seconds
[localhost:12345] shark> describe test_time;
dt string
tm string
nm string
Time taken (including network latency): 0.06 seconds
[localhost:12345] shark> load data local inpath '/home/hduser2/test.txt' overwrite into table test_time;
Time taken (including network latency): 0.124 seconds
[localhost:12345] shark> select * from test_time;
2010-01-05 17:51 Visakh
2013-02-16 09:31 Nair
Time taken (including network latency): 0.397 seconds
[localhost:12345] shark> select cast(dt as date) from test_time;
2010-01-05
2013-02-16
Time taken (including network latency): 0.399 seconds
[localhost:12345] shark> create table test_date as select cast(dt as date) from test_time;
Time taken (including network latency): 0.71 seconds
[localhost:12345] shark> select * from test_date;
2010-01-05
2013-02-16
Time taken (including network latency): 0.366 seconds
[localhost:12345] shark>
If you are using TIMESTAMP, then you could try something along the lines of concatenating the date and time strings and then casting them.
create table test_1 as select cast(concat(dt,' ', tm,':00') as string) as ts from test_time;
select cast(ts as timestamp) from test_1;
It works fine for me using the load command from the beeline side.
Data:
[root@hostname workspace]# more timedata
buy,1977-03-12 06:30:23
sell,1989-05-23 07:23:12
creating table statement:
create table mytime(id string ,t timestamp) row format delimited fields terminated by ',';
And loading data statement:
load data local inpath '/root/workspace/timedata' overwrite into table mytime;
Table structure:
describe mytime;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| id | string | |
| t | timestamp | |
+-----------+------------+----------+--+
result of querying:
select * from mytime;
+------------+------------------------+--+
| mytime.id | mytime.t |
+------------+------------------------+--+
| buy | 1977-03-12 06:30:23.0 |
| sell | 1989-05-23 07:23:12.0 |
+------------+------------------------+--+
Apache Hive data types are very important for the query language and for data modeling (the representation of the data structures in a table for a company's database).
It is necessary to know about the data types and their usage when defining table column types.
There are mainly two kinds of Apache Hive data types:
Primitive data types
Complex data types
Here we will discuss the complex data types.
Complex data types are further classified into four types. They are explained below.
2.1 ARRAY
It is an ordered collection of fields.
The fields must all be of the same type.
Syntax: ARRAY<data_type>
Example: array(1, 4)
2.2 MAP
It is an unordered collection of key-value pairs.
Keys must be primitives; values may be of any type.
Syntax: MAP<primitive_type, data_type>
Example: map('a', 1, 'c', 3)
2.3 STRUCT
It is a collection of elements of different types.
Syntax: STRUCT<col_name : data_type, ...>
Example: struct('a', 1, 1.0)
2.4 UNION
It is a collection of heterogeneous data types.
Syntax: UNIONTYPE<data_type, data_type, ...>
Example: create_union(1, 'a', 63)
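As a hedged illustration (the table and column names are made up), a Hive table using these complex types might be declared like this:
CREATE TABLE complex_types_demo (
    id         INT,
    tags       ARRAY<STRING>,                   -- ordered fields of the same type
    attributes MAP<STRING, INT>,                -- primitive keys, any-type values
    address    STRUCT<city:STRING, zip:INT>,    -- named fields of different types
    misc       UNIONTYPE<INT, STRING>           -- one of several possible types (limited support in text tables)
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    COLLECTION ITEMS TERMINATED BY '|'
    MAP KEYS TERMINATED BY ':';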