I'm trying to construct a very simple graph showing how many visits I've had over some period of time (for example, per 5 minutes).
I have Grafana v5.4.0 paired with Postgres v9.6, which is full of data.
My table below:
CREATE TABLE visit (
    id serial CONSTRAINT visit_primary_key PRIMARY KEY,
    user_credit_id INTEGER NOT NULL REFERENCES user_credit(id),
    visit_date bigint NOT NULL,
    visit_path varchar(128),
    method varchar(8) NOT NULL DEFAULT 'GET'
);
Here's some data in it:
id | user_credit_id | visit_date | visit_path | method
----+----------------+---------------+---------------------------------------------+--------
1 | 1 | 1550094818029 | / | GET
2 | 1 | 1550094949537 | /mortgage/restapi/credit/{userId}/decrement | POST
3 | 1 | 1550094968651 | /mortgage/restapi/credit/{userId}/decrement | POST
4 | 1 | 1550094988557 | /mortgage/restapi/credit/{userId}/decrement | POST
5 | 1 | 1550094990820 | /index/UGiBGp0V | GET
6 | 1 | 1550094990929 | / | GET
7 | 2 | 1550095986310 | / | GET
...
So I tried these three variants (among dozens of others), all without success:
Solution A:
SELECT
visit_date as "time",
count(user_credit_id) AS "user_credit_id"
FROM visit
WHERE $__timeFilter(visit_date)
ORDER BY visit_date ASC
No data on graph. Error: pq: invalid input syntax for integer: "2019-02-14T13:16:50Z"
Solution B:
SELECT
$__unixEpochFrom(visit_date),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY user_credit_id
Series A:
SELECT
$__time(visit_date/1000,10m,previous),
count(user_credit_id) AS "user_credit_id A"
FROM
visit
WHERE
visit_date >= $__unixEpochFrom()::bigint*1000 and
visit_date <= $__unixEpochTo()::bigint*1000
GROUP BY 1
ORDER BY 1
No data on graph. No error.
Solution C:
SELECT
$__timeGroup(visit_date, '1h'),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY time
No data on graph. Error: pq: function pg_catalog.date_part(unknown, bigint) does not exist
Could someone please help me sort out this simple problem? I think the query should be compact, naive and simple, but the Grafana docs demoing its syntax and features confuse me slightly. Thanks in advance!
Use this query, which would work if visit_date were timestamptz:
SELECT
$__timeGroupAlias(visit_date,5m,0),
count(*) AS "count"
FROM visit
WHERE
$__timeFilter(visit_date)
GROUP BY 1
ORDER BY 1
But your visit_date is bigint, so you need to convert it to a timestamp (probably with TO_TIMESTAMP()), or find some other way to use it as bigint. Use the query inspector for debugging and you will see the SQL generated by Grafana.
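For illustration (using a sample value from the question's table, not anything Grafana generates), here is a quick Python sketch of what that conversion does: dividing the millisecond visit_date by 1000 and treating the result as a Unix epoch, which is what to_timestamp(visit_date/1000) computes in Postgres.

```python
from datetime import datetime, timezone

def ms_epoch_to_timestamp(ms: int) -> datetime:
    """Convert a millisecond Unix epoch (as stored in visit_date)
    to a UTC timestamp, mirroring Postgres's to_timestamp(ms / 1000)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# First row of the sample data: visit_date = 1550094818029
print(ms_epoch_to_timestamp(1550094818029))
```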
Jan Garaj, thanks a lot! I should admit that your snippet and, even more valuable, your advice to switch to SQL debugging dramatically helped me make my breakthrough.
So, the resulting query which solved my problem is below:
SELECT
$__unixEpochGroup(visit_date/1000, '5m') AS "time",
count(user_credit_id) AS "Total Visits"
FROM visit
WHERE
'1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval BETWEEN
$__timeFrom()::timestamp
AND
$__timeTo()::timestamp
GROUP BY 1
ORDER BY 1
Several comments to decipher all this Grafana magic:
Grafana has a limited DSL for making configurable graphs; this set of functions is converted into meaningful SQL (this is where seeing the "compiled" SQL helped me a lot, many thanks again).
To make my BIGINT column suitable for the predefined Grafana functions, we simply need to convert it to seconds since the UNIX epoch, i.e. just divide by 1000.
Now, the WHERE clause is not so simple and predictable. Grafana's DSL works differently here, and simple division did not do the trick. I solved it by using other Grafana functions to get the FROM and TO points in time (the period for which the graph should be rendered), but these functions generate the timestamp type while our column is BIGINT. Fortunately, Postgres gives us a bunch of conversion tools: '1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval converts a single BIGINT value to a Postgres TIMESTAMP, which Grafana handles just fine.
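As a sanity check (a Python sketch, not part of the dashboard; the sample value is taken from the table above), the interval trick can be mirrored like this: adding visit_date/1000 seconds to the 1970-01-01 epoch gives the same instant as interpreting that number as a Unix timestamp. Note that visit_date/1000 on a bigint is integer division, which both functions below reproduce with //.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def via_interval(visit_date_ms: int) -> datetime:
    """Mirror '1970-01-01'::timestamp + ((visit_date/1000)::text)::interval:
    a bare number cast to interval counts as seconds added to the epoch."""
    return EPOCH + timedelta(seconds=visit_date_ms // 1000)

def via_to_timestamp(visit_date_ms: int) -> datetime:
    """Mirror to_timestamp(visit_date/1000) for the same integer division."""
    return datetime.fromtimestamp(visit_date_ms // 1000, tz=timezone.utc)
```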
P.S. If you don't mind I've changed my question text to be more precise and detailed.
I am using PostgreSQL 13.4 and have intermediate-level experience with PostgreSQL.
I have a table which stores readings from a device for each customer. For each customer we receive around 3,000 readings a day and store them in our usage_data table, as below.
We have the proper indexing in place.
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
customer_id | bigint | idx_customer_id | btree
date | timestamp | idx_date | btree
usage | bigint | idx_usage | btree
amount | bigint | idx_amount | btree
Also I have common index on 2 columns which is below.
CREATE INDEX idx_cust_date
ON public.usage_data USING btree
(customer_id ASC NULLS LAST, date ASC NULLS LAST)
;
Problem
I have observed a few weird incidents.
When I tried to get the data for 06/02/2022 for a customer, it took almost 20 seconds. It's a simple query, as below:
SELECT * FROM usage_data WHERE customer_id =1 AND date = '2022-02-06'
The execution plan
When I execute the same query for a 15-day range, I receive the result in 32 milliseconds.
SELECT * FROM usage_data WHERE customer_id =1 AND date > '2022-05-15' AND date <= '2022-05-30'
The execution plan
Tried Solution
I thought it might be an indexing issue, as I am facing this problem only for this particular date.
Hence I dropped all the indexes from the table and recreated them.
But the problem wasn't resolved.
Solution Worked (Not Recommended)
To solve this I tried another way: I created a new database and restored the old one into it.
I executed the same query in the new database, and this time it took 10 milliseconds.
The execution plan
I don't think this is a proper solution for a production server.
Any idea why we are facing this issue for this specific data?
Please let me know if any additional information is required.
Please guide. Thanks
I'm using Confluent to write a query that gets the first timestamp in a 5-minute window of a Kafka topic. Here's the query (I know it's not the prettiest way to do it):
CREATE STREAM start_metric_value AS
select metric_value
FROM dataaggregaion
WINDOW TUMBLING (SIZE 5 MINUTE)
where metric_datetime_utc = MIN(TIMESTAMPTOSTRING(metric_datetime_utc, 'yyyy-MM-dd HH:mm:ss')) LIMIT 1;
but I get this error:
Code generation failed for Predicate: Can't find any functions with
the name 'MIN'. expression:(METRIC_DATETIME_UTC =
MIN(TIMESTAMPTOSTRING(METRIC_DATETIME_UTC, 'yyyy-MM-dd HH:mm:ss'))),
schema:ROWKEY STRING KEY, ID STRING, METRIC_NAME STRING,
METRIC_VALUE STRING, METRIC_DATETIME_UTC BIGINT, METRIC_INDEX
STRING, IANA_TIMEZONE STRING, PROCESSED_DATETIME_UTC BIGINT,
DATA_TYPE STRING, ASSET_TYPE STRING, ROWTIME BIGINT, ROWKEY
STRING Caused by: Can't find any functions with the name 'MIN'
Does anyone know how to solve this problem?
It's not 100% clear what you're trying to achieve; see the comment above on your question about adding more details to help people understand your goal.
That said, I can say....
The MIN function is not being recognised for two reasons:
You're passing the output of TIMESTAMPTOSTRING to MIN, but MIN does not take a string.
You can't use an aggregate function in a WHERE clause.
The error message you're seeing looks like a bug. If it still exists on the latest version of ksqlDB you may want to raise an issue in the ksqlDB GitHub project.
Even after correcting these two things, your query will still fail, as windowing in ksqlDB requires an aggregation, so you'll need a GROUP BY.
If, for example, you wanted to capture the min metric_datetime_utc per metric_value for each 5 minute window, you could do so with:
CREATE TABLE start_metric_value AS
SELECT
metric_value,
MIN(metric_datetime_utc) as minTs
FROM dataaggregaion
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY metric_value;
This will create a windowed table, i.e. a table where the key is made up of metric_value and the WINDOWSTART time. minTs will store the minimum datetime seen.
Let's run some data through the query to understand what's happening:
Input:
rowtime | metric_value | metric_datetime_utc
--------|---------------|--------------------
1 | A | 3
2 | A | 4
3 | A | 2
4 | B | 5
300000 | A | 6
Output to the START_METRIC_VALUE topic might be (note: metric_value and windowStart will be stored in the Kafka record's key, while minTs will be in the value):
metric_value | windowStart | minTs
-------------|-------------|------
A | 0 | 3
A | 0 | 3
A | 0 | 2
B | 0 | 5
A | 300000 | 6
What is actually output to the topic will depend on your value of cache.max.bytes.buffering. Setting this to 0, turning off buffering, will produce the output above. However, with buffering enabled, some of the intermediate results may not be output to Kafka, though the final result for each window will remain the same. You can also control what is output to Kafka using the upcoming SUPPRESS functionality.
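To make the windowing concrete, here is a small Python sketch (not ksqlDB code) that reproduces the final per-window state from the sample input above: each row is assigned to a 5-minute tumbling window by its rowtime, and the running minimum is kept per (metric_value, window).

```python
# Sketch: final MIN(metric_datetime_utc) per (metric_value, 5-minute
# tumbling window), mirroring the ksqlDB table's end state.
WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling windows, as in the query

rows = [  # (rowtime, metric_value, metric_datetime_utc) from the example
    (1, "A", 3),
    (2, "A", 4),
    (3, "A", 2),
    (4, "B", 5),
    (300000, "A", 6),
]

table = {}
for rowtime, metric_value, ts in rows:
    window_start = (rowtime // WINDOW_MS) * WINDOW_MS
    key = (metric_value, window_start)
    table[key] = min(table.get(key, ts), ts)

print(table)
```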
The above solution gives you the min timestamp per metric_value. If you want a global minimum datetime seen per window, then you can GROUP BY a constant. Note, this routes all events to a single ksqlDB node, so it doesn't scale well as a solution. If scaling is an issue there are solutions, e.g. like first calculating the minimum metric_value and then post-processing this to find the global minimum.
CREATE TABLE start_metric_value AS
SELECT
1 as Key,
MIN(metric_datetime_utc) as minTs
FROM dataaggregaion
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY 1;
Note: syntax is correct for version 0.10 of ksqlDB. You may need to adjust for other versions.
I am looking for a way to use Postgres (version 9.6+) Full Text Search to find documents similar to an input document, essentially producing results similar to Elasticsearch's more_like_this query. As far as I can tell, Postgres offers no way to compare ts_vectors to each other.
I've tried various techniques, like converting the source document back into a ts_query or reprocessing the original doc, but that requires too much overhead.
Would greatly appreciate any advice - thanks!
It looks like the only option is to use pg_trgm instead of Postgres's built-in full text search. Here is how I ended up implementing this:
I used a simple table (or, in this case, a materialized view) that holds the primary key of the post and the full text body in two columns.
Materialized view "public.text_index"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
text   | text    |           |          |         | extended |              |
View definition:
SELECT posts.id,
posts.body AS text
FROM posts
ORDER BY posts.publication_date DESC;
Then using a lateral join we can match rows and order them by similarity to find posts that are "close to" or "related" to any other post:
select * from text_index tx
left join lateral (
select similarity(tx.text, t.text) from text_index t where t.id = 12345
) s on true
order by similarity
desc limit 10;
This is of course a naive way to match documents and may require further tuning. Additionally, a pg_trgm GIN index on the text column will speed up the searches significantly.
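For intuition, the score similarity() returns is roughly "shared trigrams divided by total distinct trigrams". The Python sketch below approximates that; pg_trgm's exact tokenization and padding rules differ in details, so treat this as illustrative only.

```python
def trigrams(text: str) -> set:
    """Rough approximation of pg_trgm's trigram extraction: lowercase each
    word and pad it with two leading spaces and one trailing space."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Mimic similarity(a, b): shared trigrams / total distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```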
I have a table in Redshift like:
category | date
----------------
1 | 9/29/2016
1 | 9/28/2016
2 | 9/28/2016
2 | 9/28/2016
which I'd like to turn into:
category | 9/29/2016 | 9/28/2016
--------------------------------
1 | 1 | 1
2 | 0 | 2
(count of each category for each date)
Pivot a table with Amazon RedShift / PostgreSQL seems helpful, using CASE statements, but that requires knowing all possible cases beforehand. How could I do this if the columns I want are every day starting from a given date?
There is no functionality provided with Amazon Redshift that can automatically pivot the data.
The Pivot a table with Amazon RedShift / PostgreSQL page you referenced shows how the output can be generated, but it is unable to automatically adjust the number of columns based upon the input data.
One option would be to write a program that queries the available date ranges and then generates the SQL query. However, this isn't possible entirely within Amazon Redshift.
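A minimal sketch of such a generator in Python (the table and column names are assumptions based on the question, not anything Redshift provides):

```python
from datetime import date, timedelta

def build_pivot_sql(start: date, end: date, table: str = "my_table") -> str:
    """Generate a CASE-based pivot query with one column per day in
    [start, end]. Table and column names here are illustrative."""
    cols = []
    d = start
    while d <= end:
        label = d.strftime("%m/%d/%Y")
        cols.append(
            f"  SUM(CASE WHEN date = '{d.isoformat()}' THEN 1 ELSE 0 END)"
            f' AS "{label}"'
        )
        d += timedelta(days=1)
    return (
        "SELECT\n  category,\n"
        + ",\n".join(cols)
        + f"\nFROM {table}\nGROUP BY category\nORDER BY category;"
    )

print(build_pivot_sql(date(2016, 9, 28), date(2016, 9, 29)))
```

Run this (from a script or a scheduled job) whenever the date range changes, and feed the generated SQL to Redshift.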
You could do a self-join on date; I'm currently looking up how to do that.
I've been trying to find a solution to this challenge all day.
I've got a table:
id | amount | type | date | description | club_id
-------+---------+------+----------------------------+---------------------------------------+---------
783 | 10000 | 5 | 2011-08-23 12:52:19.995249 | Sign on fee | 7
The table has a lot more data than this.
What I'm trying to do is get the sum of amount for each week, given a specific club_id.
The last thing I ended up with was this, but it doesn't work:
WITH RECURSIVE t AS (
SELECT EXTRACT(WEEK FROM date) AS week, amount FROM club_expenses WHERE club_id = 20 AND EXTRACT(WEEK FROM date) < 10 ORDER BY week
UNION ALL
SELECT week+1, amount FROM t WHERE week < 3
)
SELECT week, amount FROM t;
I'm not sure why it doesn't work, but it complains about the UNION ALL.
I'll be off to bed in a minute, so I won't be able to see any answers before tomorrow (sorry).
I hope I've described it adequately.
Thanks in advance!
It looks to me like you are trying to use the UNION ALL to retrieve a subset of the first part of the query. That won't work. You have two options: the first is to use user-defined functions to add the behavior you need, and the second is to nest your WITH clauses. I tend to prefer the former, but you may prefer the latter.
To take the functions/table-methods approach, you create a function which accepts a row of a table as input and does not hit the table directly. This provides a bunch of benefits, including the ability to easily index the output. Here the function would look like:
CREATE FUNCTION week(club_expense) RETURNS int LANGUAGE SQL IMMUTABLE AS $$
    -- the ::int cast is needed because EXTRACT returns double precision
    SELECT EXTRACT(WEEK FROM $1.date)::int
$$;
Now you have a usable macro which can be used where you would use a column. You can then:
SELECT c.week, sum(amount) FROM club_expense c
GROUP BY c.week;
Note that the c. before week is not optional. The parser converts that into week(c). If you want to limit this to a year, you can do the same with years.
This is a really neat, useful feature of Postgres.
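For clarity, the grouping the SQL performs can be mirrored in a few lines of Python (the rows and amounts below are made up, following the question's table shape):

```python
from collections import defaultdict
from datetime import datetime

# Illustrative rows in the shape (amount, date, club_id).
expenses = [
    (10000, datetime(2011, 8, 23), 7),
    (2500, datetime(2011, 8, 24), 7),
    (4000, datetime(2011, 9, 1), 7),
    (9999, datetime(2011, 8, 23), 20),  # different club, filtered out
]

def weekly_totals(rows, club_id):
    """Sum amounts per ISO week number, like EXTRACT(WEEK FROM date)
    combined with GROUP BY week for a single club_id."""
    totals = defaultdict(int)
    for amount, d, cid in rows:
        if cid == club_id:
            totals[d.isocalendar()[1]] += amount
    return dict(totals)

print(weekly_totals(expenses, 7))
```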