Multiple series with one query in Grafana using PostgreSQL as datasource

I have data in a Postgres table with roughly this form:
CREATE TABLE jobs
(
    id BIGINT PRIMARY KEY,
    started_at TIMESTAMPTZ,
    duration NUMERIC,
    project_id BIGINT
);
I also came up with a query that is kinda what I want:
SELECT
  $__timeGroupAlias(started_at, $__interval),
  avg(duration) AS "durations"
FROM jobs
WHERE
  project_id = 720
GROUP BY 1
ORDER BY 1
This query filters for one exact project_id. What I actually want is one line in the chart for each project that has an entry in the table, not just for one.
I can't find a way to do that. I have tried every flavor of GROUP BY clause I could think of, and also the examples I found online, but none of them worked.

Try this Grafana PostgreSQL query:
SELECT
  $__timeGroupAlias(started_at, $__interval),
  project_id::text AS "metric",
  AVG(duration) AS "durations"
FROM jobs
WHERE $__timeFilter(started_at)
GROUP BY 1,2
ORDER BY 1
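Grafana uses a text column aliased "metric" as the series name, so every project_id with rows in the dashboard's time range becomes its own line (make sure the query is formatted as Time series). If you want to sanity-check the grouping outside Grafana, here is a rough plain-SQL equivalent of what the macros expand to, bucketing by hour with date_trunc (the one-hour bucket and seven-day window are just examples):
SELECT
  date_trunc('hour', started_at) AS "time",
  project_id::text AS "metric",
  AVG(duration) AS "durations"
FROM jobs
WHERE started_at > now() - interval '7 days'  -- stand-in for $__timeFilter(started_at)
GROUP BY 1, 2
ORDER BY 1;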

Related

Grafana timeseries with Postgresql: GROUP BY doesn't work

I have a table:
CREATE TABLE IF NOT EXISTS "case_closed" (
    "case_id" varchar(256) NOT NULL,
    "closed_at" TIMESTAMP WITH TIME ZONE,
    "disposition" VARCHAR(128),
    PRIMARY KEY ("case_id")
);
And in Grafana, I need to display more than one graph, one for each 'disposition' value (I have 2 different disposition values at the moment).
I'm trying this query:
SELECT
  $__timeGroupAlias(closed_at, $__interval),
  disposition AS "metric",
  COUNT(*) AS "value"
FROM case_closed
WHERE
  $__timeFilter(closed_at)
GROUP BY 1,2
ORDER BY 1
And it gives me an ugly picture with only a single graph.
I searched here and, from all I can see, my query seems to be okay, but it still doesn't work. Maybe I'm missing something and it's not the query but some setting?
Solved! There was a small thing in the Query Builder (not sure why I didn't see it in any documentation).

Benefit to adding an Index for an order by column?

We have a large table (2.8M rows) where we are finding a single row by our device_token column.
CREATE TABLE public.rpush_notifications (
    id bigint NOT NULL,
    device_token character varying,
    data text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    ...
We are constantly doing the following query:
SELECT * FROM rpush_notifications WHERE device_token = '...' ORDER BY updated_at DESC LIMIT 1
I'd like to add an index for our device_token column, and I'm wondering if there is any benefit to creating an additional index for updated_at, or a multicolumn index on both device_token and updated_at, given that we are ordering by it, i.e.:
CREATE INDEX foo ON rpush_notifications(device_token, updated_at)
I have been unable to find an answer that would help me understand if there would be any performance benefit to adding updated_at to the index given the query we are running above. Any help appreciated. We are running PostgreSQL 11.
There is a performance benefit if you combine both columns just like you did ((device_token, updated_at)): the database can quickly find the entries with the specific device_token, and it does not need to do the sorting during the query.
Even better would be an index on (device_token, updated_at DESC), as it gives you the requested row as the first entry for that device_token, so there is no need to start at the oldest entry and scan forward to find the latest one.
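A minimal sketch of that index, using the table and column names from the question (the index name itself is just an example):
CREATE INDEX rpush_notifications_device_token_updated_at_idx
    ON rpush_notifications (device_token, updated_at DESC);
With this in place the planner can jump straight to the newest entry for a given device_token and stop after a single index entry, which is exactly the ORDER BY updated_at DESC LIMIT 1 access pattern.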

Summarize repeated data in a Postgres table

I have a Postgres 9.1 table called ngram_sightings. Each row is a record of seeing an ngram in a document. An ngram can appear multiple times in a given document.
CREATE TABLE ngram_sightings
(
    ngram VARCHAR,
    doc_id INTEGER
);
I want to summarize this table in another table called ngram_counts.
CREATE TABLE ngram_counts
(
    ngram VARCHAR PRIMARY KEY,
    -- the number of unique doc_ids for a given ngram
    doc_count INTEGER,
    -- the count of a given ngram in ngram_sightings
    corpus_count INTEGER
);
What is the best way to do this?
ngram_sightings is ~1 billion rows.
Should I create an index on ngram_sightings.ngram first?
Give this a shot!
INSERT INTO ngram_counts (ngram, doc_count, corpus_count)
SELECT
    ngram
    , count(distinct doc_id) AS doc_count
    , count(*) AS corpus_count
FROM ngram_sightings
GROUP BY ngram;
-- EDIT --
Here is a longer version using some temporary tables. First, count how many documents each ngram is associated with. I'm using 'tf' for "term frequency" and 'df' for "doc frequency", since you are heading in the direction of tf-idf vectorization and you may as well use the standard language; it will help with the next few steps.
CREATE TEMPORARY TABLE ngram_df AS
SELECT
    ngram
    , count(distinct doc_id) AS df
FROM ngram_sightings
GROUP BY ngram;
Now you can create a table for the total count of each ngram.
CREATE TEMPORARY TABLE ngram_tf AS
SELECT
    ngram
    , count(*) AS tf
FROM ngram_sightings
GROUP BY ngram;
Then join the two on ngram.
CREATE TABLE ngram_tfidf AS
SELECT
    tf.ngram
    , tf.tf
    , df.df
FROM ngram_tf tf
INNER JOIN ngram_df df ON tf.ngram = df.ngram;
At this point, I expect you will be looking up ngrams quite a bit, so it makes sense to index the last table on ngram. Keep me posted!
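If it helps, a minimal sketch of that last step (the index name is just an example):
CREATE INDEX ngram_tfidf_ngram_idx ON ngram_tfidf (ngram);
Since the GROUP BY leaves one row per ngram, a UNIQUE index should also work here and doubles as a sanity check on the aggregation.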

Creating a many to many in postgresql

I have two tables that I need to make a many to many relationship with. One table, which we will call inventory, is populated via a form. The other table, sales, is populated by importing CSVs into the database weekly.
Example tables image
I want to step through the sales table and associate each sales row with a row that has the same sku in the inventory table. Here's the kicker: I need to associate only the number of sales rows indicated in the Quantity field of each inventory row.
Example: Example image of linked tables
Now I know I can do this by creating a Perl script that steps through the sales table and creates links using the ItemIDUniqueKey field in a loop based on the Quantity field. What I want to know is: is there a way to do this using SQL commands alone? I've read a lot about many to many and I haven't found anyone doing this.
Assuming tables:
create table a(
    item_id integer,
    quantity integer,
    supplier_id text,
    sku text
);
and
create table b(
    sku text,
    sale_number integer,
    item_id integer
);
the following query seems to do what you want:
update b b_updated set item_id = (
    select item_id
    from (select *, sum(quantity) over (partition by sku order by item_id) as sum from a) a
    where
        a.sku = b_updated.sku and
        (a.sum) >
            (select count(1) from b b_counted
             where
                 b_counted.sale_number < b_updated.sale_number and
                 b_counted.sku = b_updated.sku
            )
    order by a.sum asc limit 1
);
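To see how the allocation works, you can run the windowed subquery on its own; each sales row is assigned the first inventory row whose running quantity total (per sku, in item_id order) exceeds the number of earlier sales for that sku:
select sku, item_id, quantity,
       sum(quantity) over (partition by sku order by item_id) as running_total
from a
order by sku, item_id;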

optimizing a slow postgresql query against multiple tables

One of our PostgreSQL queries started getting slow (~15 seconds) so we looked at migrating to a Graph database. Early tests show significantly faster speeds, so AWESOME.
Here's the problem- we still need to store a backup of the data in Postgres for non-analytics needs. The Graph database is just for analytics, and we'd prefer for it to remain a secondary data store. Because our business logic changed quite a bit during this migration, two existing tables turned into 4 -- and the current 'backup' selects in Postgres take anywhere from 1 to 6 minutes to run.
I've tried a few ways to optimize this, and the best seems to be turning it into two queries. If anyone can spot obvious mistakes here, I'd love to hear a suggestion. I've tried switching up left/right/inner joins with little difference in the query plan. The join order does make a difference; I think I'm just not getting this right.
I'll go into details.
Goal : Retrieve the last 10 attachments sent to a given person
Database Structure :
CREATE TABLE message (
    id SERIAL PRIMARY KEY NOT NULL,
    body_raw TEXT
);
CREATE TABLE attachments (
    id SERIAL PRIMARY KEY NOT NULL,
    body_raw TEXT
);
CREATE TABLE message_2_attachments (
    message_id INT NOT NULL REFERENCES message(id),
    attachment_id INT NOT NULL REFERENCES attachments(id)
);
CREATE TABLE mailings (
    id SERIAL PRIMARY KEY NOT NULL,
    event_timestamp TIMESTAMP NOT NULL,
    recipient_id INT NOT NULL,
    message_id INT NOT NULL REFERENCES message(id)
);
sidenote: the reason why a mailing is abstracted from the message is that a mailing often has more than one recipient /and/ a single message can go out to multiple recipients
This query takes about 5 minutes on a relatively small dataset (the query planner time is in the comment above each query):
-- 159374.75
EXPLAIN ANALYZE SELECT attachments.*
FROM attachments
JOIN message_2_attachments ON attachments.id = message_2_attachments.attachment_id
JOIN message ON message_2_attachments.message_id = message.id
JOIN mailings ON mailings.message_id = message.id
WHERE mailings.recipient_id = 1
ORDER BY mailings.event_timestamp desc limit 10 ;
Splitting it up into two queries takes only 1/8 of the time:
-- 19123.22
EXPLAIN ANALYZE SELECT message_2_attachments.attachment_id
FROM mailings
JOIN message ON mailings.message_id = message.id
JOIN message_2_attachments ON message.id = message_2_attachments.message_id
JOIN attachments ON message_2_attachments.attachment_id = attachments.id
WHERE mailings.recipient_id = 1
ORDER BY mailings.event_timestamp desc limit 10 ;
-- 1.089
EXPLAIN ANALYZE SELECT * FROM attachments WHERE id IN ( results of above query )
I've tried re-writing the queries a handful of times -- different join orders, different types of joins, etc. I can't seem to make this anywhere near as efficient in a single query as it can be in two.
UPDATE: GitHub has better formatting, so the full output of EXPLAIN is here: https://gist.github.com/jvanasco/bc1dd38ca06e52c9a090
I plugged the output of your EXPLAIN in here: http://explain.depesz.com/s/hqPT
As you can see, this step:
Hash Join (cost=96588.85..158413.71 rows=44473 width=3201) (actual time=22590.630..30761.213 rows=44292 loops=1)
Hash Cond: (message_2_attachment.attachment_id = attachment.id)
is taking a good amount of time. I'd try adding indexes on the foreign keys as well, with:
CREATE INDEX idx_message_2_attachments_attachment_id ON "message_2_attachments" USING btree (attachment_id);
CREATE INDEX idx_message_2_attachments_message_id ON "message_2_attachments" USING btree (message_id);
CREATE INDEX idx_mailings_message_id ON "mailings" USING btree (message_id);
The junction table is missing a primary key. Also it is advisable to add a reversed index on this PK:
CREATE TABLE message_2_attachments (
message_id INT NOT NULL REFERENCES message(id) ,
attachment_id INT NOT NULL REFERENCES attachments(id)
, PRIMARY KEY (message_id,attachment_id) -- <<== here
);
CREATE UNIQUE INDEX ON message_2_attachments(attachment_id,message_id); -- <<== here
For the mailings table, the situation is not so clear. It looks like some combination of {event_timestamp, recipient_id, message_id} could function as a candidate key. The id field merely functions as a surrogate.
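Since the slow query filters on recipient_id and sorts by event_timestamp, an index shaped like this might also be worth testing (the name is just an example):
CREATE INDEX idx_mailings_recipient_event ON mailings (recipient_id, event_timestamp DESC);
It lets Postgres walk a single recipient's mailings newest-first, which matches the WHERE recipient_id = 1 ... ORDER BY event_timestamp DESC LIMIT 10 pattern; whether the planner can use it to stop early depends on the join strategy it chooses.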