How to group by an attribute and order by date - postgresql

I have two tables:
Medics
CREATE TABLE "medic" (
"id" BIGINT NOT NULL,
"name" CHARACTER VARYING(255) NOT NULL,
PRIMARY KEY ("id")
);
Comments
CREATE TABLE IF NOT EXISTS "comment" (
"id" BIGINT NOT NULL,
"medic_id" BIGINT NOT NULL,
"comment" CHARACTER VARYING(1024) NOT NULL,
"created_at" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT now(),
CONSTRAINT pk_comment PRIMARY KEY (id),
CONSTRAINT fk_comment_medic FOREIGN KEY (medic_id)
REFERENCES medic(id) ON UPDATE NO ACTION ON DELETE NO ACTION
);
Now I want to get medic_id, name and comments_count, all ordered by created_at.
Here's what I've tried so far:
SELECT m.id, m.name, COUNT(c.id)
FROM COMMENT AS c
JOIN medic AS m ON m.id = c.medic_id
GROUP BY m.id, m.name, c.created_at
ORDER BY c.created_at DESC
But obviously this can't work: it makes no sense to group by date, yet I have to do it when I want to order by date.
Another approach was to work with window functions, specifically rank() over (partition by m.id order by c.created_at desc). But in this case I lose the ordering over all records.
Here's some SQLFiddle.
I am using Postgres 9.3

I'm guessing you want to order by the most recent comment date:
SELECT m.id, m.name, COUNT(c.id)
FROM COMMENT c JOIN
medic m
ON m.id = c.medic_id
GROUP BY m.id, m.name
ORDER BY MAX(c.created_at) DESC;
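A minimal, self-contained sketch with made-up rows shows the effect (the temp tables mirror the question's schema; the names Alice and Bob and all dates are hypothetical): each medic appears once, and the per-group MAX(created_at) drives the ordering without grouping by the date.

```sql
-- Hypothetical sample data mirroring the question's schema.
CREATE TEMP TABLE medic (
  id   bigint PRIMARY KEY,
  name varchar(255) NOT NULL
);
CREATE TEMP TABLE comment (
  id         bigint PRIMARY KEY,
  medic_id   bigint NOT NULL REFERENCES medic (id),
  comment    varchar(1024) NOT NULL,
  created_at timestamp NOT NULL DEFAULT now()
);
INSERT INTO medic VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO comment VALUES
  (1, 1, 'old',    '2020-01-01 10:00'),
  (2, 1, 'older',  '2019-06-01 10:00'),
  (3, 2, 'newest', '2020-03-01 10:00');

-- One row per medic; MAX(created_at) is computed per group,
-- so it can be used in ORDER BY without appearing in GROUP BY.
SELECT m.id, m.name, COUNT(c.id) AS comments_count
FROM comment c
JOIN medic m ON m.id = c.medic_id
GROUP BY m.id, m.name
ORDER BY MAX(c.created_at) DESC;
-- Bob (1 comment, newest 2020-03-01) comes before Alice (2 comments)
```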

Related

Select rows with and without match of join

I currently cannot solve this (allegedly easy) task.
SQL Fiddle
http://sqlfiddle.com/#!17/90dce/1
Schema
Given this schema and data
CREATE TABLE asset (
"id" BIGINT NULL DEFAULT NULL,
"name" TEXT NULL DEFAULT NULL,
PRIMARY KEY ("id")
);
CREATE INDEX IF NOT EXISTS "IDX_id" ON asset (id);
CREATE TABLE category (
"id" BIGINT NULL DEFAULT NULL,
"ctype" TEXT NULL DEFAULT NULL,
"name" TEXT NULL DEFAULT NULL,
PRIMARY KEY ("id")
);
CREATE INDEX IF NOT EXISTS "IDX_id" ON category (id);
CREATE TABLE asset_category (
"asset_id" BIGINT NULL DEFAULT NULL,
"category_id" BIGINT NULL DEFAULT NULL,
CONSTRAINT "FK_asset_id" FOREIGN KEY ("asset_id") REFERENCES "asset" ("id") ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT "FK_category_id" FOREIGN KEY ("category_id") REFERENCES "category" ("id") ON UPDATE CASCADE ON DELETE SET NULL,
UNIQUE (asset_id, category_id)
);
INSERT INTO asset (id, "name") VALUES(1, 'Awesome Asset with a hit');
INSERT INTO asset (id, "name") VALUES(2, 'Great Asset without a hit');
INSERT INTO category (id, "name", "ctype") VALUES(1, 'First Category', NULL);
INSERT INTO category (id, "name", "ctype") VALUES(2, 'Second Category', 'directory');
INSERT INTO asset_category ("asset_id", "category_id") VALUES(1, 1);
INSERT INTO asset_category ("asset_id", "category_id") VALUES(1, 2);
INSERT INTO asset_category ("asset_id", "category_id") VALUES(2, 1);
Task
I want to get all assets with their category id (in case they have a category of type "directory"), otherwise NULL as the category.
See my queries below: I wrote two joins so that I can restrict the results in the ON clause. However, since both assets are also related to the other category, the first JOIN prevents me from getting a clean result.
What I tried
Query A:
SELECT a.id "assetId", c.id "categoryId"
FROM asset a
LEFT JOIN asset_category ac ON ac.asset_id = a.id
left join category c on (
c.id = ac.category_id
AND
c.ctype = 'directory'
)
resulting in:
assetId | categoryId
1 | (null)
1 | 2
2 | (null)
That is almost right, except that assetId 1 appears twice. This is probably due to the first JOIN, which also links asset 1 through asset_category to the other category, which is not of type 'directory'. The same goes for assetId 2.
Query B uses an inner join:
SELECT a.id "assetId", c.id "categoryId"
FROM asset a
LEFT JOIN asset_category ac ON ac.asset_id = a.id
inner join category c on (
c.id = ac.category_id
AND
c.ctype = 'directory'
)
resulting in:
assetId | categoryId
1 | 2
However, here the problem is that it hides the asset with id 2, because the join does not find a directory category for it.
Desired output
assetId | categoryId
1 | 2
2 | null
I would be really happy about any help with this seemingly simple task.
demo:db<>fiddle
Your first query is a good approach. It seems you want only one record per id. This is what DISTINCT ON is for:
SELECT DISTINCT ON (a.id)
a.id AS "assetId", c.id AS "categoryId"
FROM asset a
LEFT JOIN asset_category ac ON a.id = ac.asset_id
LEFT JOIN category c ON c.id = ac.category_id AND c."ctype" = 'directory'
ORDER BY a.id, c.ctype NULLS LAST;
So just order the joined result by id first and push the records with ctype = NULL to the bottom, which makes the directory row bubble up to first place. DISTINCT ON then takes the first record for each id, which is the one you expect.
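To see what DISTINCT ON does here, a sketch can feed it the three rows that Query A returned, typed in as a VALUES list (the alias query_a is just illustrative). Ordering by category_id NULLS LAST stands in for ctype NULLS LAST, since in Query A the category is NULL exactly when no directory matched.

```sql
-- The three rows Query A produced, reproduced as a VALUES list.
SELECT DISTINCT ON (asset_id)
       asset_id, category_id
FROM (VALUES (1, NULL), (1, 2), (2, NULL)) AS query_a (asset_id, category_id)
ORDER BY asset_id, category_id NULLS LAST;  -- non-NULL category sorts first
-- Keeps (1, 2) and (2, NULL): the first row of each asset_id group
```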

Can these two queries be optimised into a single one?

Given the tables:
create table entries
(
id integer generated always as identity
constraint entries_pk
primary key,
name text not null,
description text,
type integer not null
);
create table tasks
(
id integer generated always as identity
constraint tasks_pk
primary key,
channel_id bigint not null,
type integer not null,
is_active boolean default true not null
);
I currently have two separate queries. First:
SELECT id FROM tasks WHERE is_active = true;
Then, once per result of the first query:
SELECT t.channel_id, e.name, e.description
FROM tasks t
JOIN entries e ON t.type = e.type
WHERE t.id = :task_id
ORDER BY random()
LIMIT 1;
In other words, I want a single random entry for each active task.
Can this be accomplished in a single query while retaining the limit per task?
Sure; use DISTINCT ON:
SELECT DISTINCT ON (t.id)
t.id, t.channel_id, e.name, e.description
FROM tasks t
JOIN entries e USING (type)
WHERE t.is_active
ORDER BY t.id, random();
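Another option, not part of the answer above, is to keep the per-task query as it is and run it once per active task through a LATERAL subquery. A self-contained sketch with hypothetical sample rows (the alias pick is illustrative):

```sql
-- Hypothetical sample data; tables follow the question's schema.
CREATE TEMP TABLE entries (
  id          integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name        text NOT NULL,
  description text,
  type        integer NOT NULL
);
CREATE TEMP TABLE tasks (
  id         integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  channel_id bigint NOT NULL,
  type       integer NOT NULL,
  is_active  boolean NOT NULL DEFAULT true
);
INSERT INTO entries (name, description, type) VALUES
  ('a1', 'first of type 1', 1), ('a2', 'second of type 1', 1),
  ('b1', 'only one of type 2', 2);
INSERT INTO tasks (channel_id, type, is_active) VALUES
  (100, 1, true), (200, 2, true), (300, 1, false);

-- The original per-task query, evaluated once per active task.
SELECT t.id, t.channel_id, pick.name, pick.description
FROM tasks t
CROSS JOIN LATERAL (
  SELECT e.name, e.description
  FROM entries e
  WHERE e.type = t.type   -- correlated with t, so re-run for each task
  ORDER BY random()
  LIMIT 1
) pick
WHERE t.is_active;
```

Like the inner join, CROSS JOIN LATERAL drops tasks that have no entry of their type; LEFT JOIN LATERAL (...) pick ON true would keep them with NULLs.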

Using 'on conflict' with a unique constraint on a table partitioned by date

Given the following table:
CREATE TABLE event_partitioned (
customer_id varchar(50) NOT NULL,
user_id varchar(50) NOT NULL,
event_id varchar(50) NOT NULL,
comment varchar(50) NOT NULL,
event_timestamp timestamp with time zone DEFAULT NOW()
)
PARTITION BY RANGE (event_timestamp);
And partitioning by calendar week [one example]:
CREATE TABLE event_partitioned_2020_51 PARTITION OF event_partitioned
FOR VALUES FROM ('2020-12-14') TO ('2020-12-21');
And the unique constraint [event_timestamp is necessary since it is the partition key]:
ALTER TABLE event_partitioned
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
I would like to update if customer_id, user_id, event_id exist, otherwise insert:
INSERT INTO event_partitioned (customer_id, user_id, event_id, comment)
VALUES ('9', '99', '999', 'I got inserted') -- comment is NOT NULL, so it needs a value
ON CONFLICT (customer_id, user_id, event_id, event_timestamp) DO UPDATE
SET comment = 'I got updated';
But I cannot add a unique constraint on customer_id, user_id, event_id alone, hence event_timestamp as well.
So this will insert duplicates of (customer_id, user_id, event_id), even when I add now() as a fourth value, unless now() precisely matches what is already in event_timestamp.
Is there a way that ON CONFLICT could be less 'granular' here and update if now() falls in the week of the partition, rather than precisely on '2020-12-14 09:13:04.543256' for example?
Basically I am trying to avoid duplication of customer_id, user_id, event_id, at least within a week, but still benefit from partitioning by week (so that data retrieval can be narrowed to a date range and not scan the entire partitioned table).
I don't think you can do this with on conflict in a partitioned table. You can, however, express the logic with CTEs:
with
data as ( -- data
select '9' as customer_id, '99' as user_id, '999' as event_id,
'I got inserted' as comment -- placeholder: comment is NOT NULL
),
ins as ( -- insert if not exists
insert into event_partitioned (customer_id, user_id, event_id, comment)
select d.customer_id, d.user_id, d.event_id, d.comment from data d
where not exists (
select 1
from event_partitioned ep
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
)
returning *
)
update event_partitioned ep -- update if insert did not happen
set comment = 'I got updated'
from data d
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
and not exists (select 1 from ins);
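Since the question only needs uniqueness within a week, a variant of the CTE (a sketch, not part of the answer) could restrict both lookups to the current week with date_trunc('week', now()), which may also let the planner skip older partitions at execution time. The stand-in table uses a DEFAULT partition (PostgreSQL 11+) so it runs on any date, and the pre-existing row makes the statement take the update path. Neither version is safe under concurrent writes: two sessions can both see NOT EXISTS and both insert.

```sql
-- Hypothetical stand-in for event_partitioned; the DEFAULT partition
-- catches every row so the sketch does not depend on today's date.
CREATE TEMP TABLE event_partitioned (
  customer_id     varchar(50) NOT NULL,
  user_id         varchar(50) NOT NULL,
  event_id        varchar(50) NOT NULL,
  comment         varchar(50) NOT NULL,
  event_timestamp timestamptz DEFAULT now()
) PARTITION BY RANGE (event_timestamp);
CREATE TEMP TABLE event_partitioned_default PARTITION OF event_partitioned DEFAULT;

-- A row that already exists in the current week.
INSERT INTO event_partitioned (customer_id, user_id, event_id, comment)
VALUES ('9', '99', '999', 'original');

WITH data AS (
  SELECT '9' AS customer_id, '99' AS user_id, '999' AS event_id,
         'I got inserted' AS comment
),
ins AS (  -- insert unless the key already exists this week
  INSERT INTO event_partitioned (customer_id, user_id, event_id, comment)
  SELECT d.customer_id, d.user_id, d.event_id, d.comment
  FROM data d
  WHERE NOT EXISTS (
    SELECT 1
    FROM event_partitioned ep
    WHERE ep.customer_id = d.customer_id
      AND ep.user_id = d.user_id
      AND ep.event_id = d.event_id
      AND ep.event_timestamp >= date_trunc('week', now())
  )
  RETURNING 1
)
UPDATE event_partitioned ep  -- update only when nothing was inserted
SET comment = 'I got updated'
FROM data d
WHERE ep.customer_id = d.customer_id
  AND ep.user_id = d.user_id
  AND ep.event_id = d.event_id
  AND ep.event_timestamp >= date_trunc('week', now())
  AND NOT EXISTS (SELECT 1 FROM ins);
```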
@GMB's answer is great and works well. Since enforcing a unique constraint on a partitioned (parent) table that is partitioned by time range is usually not that useful, why not just place a unique constraint/index on the partition itself? Unlike the parent table, a partition does not have to include the partition key in its unique constraints.
In your case, event_partitioned_2020_51 can have a unique constraint:
ALTER TABLE event_partitioned_2020_51
ADD UNIQUE (customer_id, user_id, event_id);
And subsequent queries can just use
INSERT INTO event_partitioned_2020_51 ... ON CONFLICT (customer_id, user_id, event_id)
as long as this is the intended partition, which is usually the case.
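A self-contained sketch of the partition-level approach (temp stand-ins for the question's tables; all values hypothetical). It relies on the fact that a plain partition, unlike the partitioned parent, may carry a unique constraint that leaves out the partition key. The upper bound of a range partition is exclusive, so '2020-12-21' covers the whole week of 14 to 20 December. Inserting directly into a partition fails if event_timestamp (for example the default now()) falls outside its range.

```sql
-- Hypothetical stand-in for the question's table and one weekly partition.
CREATE TEMP TABLE event_partitioned (
  customer_id     varchar(50) NOT NULL,
  user_id         varchar(50) NOT NULL,
  event_id        varchar(50) NOT NULL,
  comment         varchar(50) NOT NULL,
  event_timestamp timestamptz DEFAULT now()
) PARTITION BY RANGE (event_timestamp);
CREATE TEMP TABLE event_partitioned_2020_51 PARTITION OF event_partitioned
  FOR VALUES FROM ('2020-12-14') TO ('2020-12-21');

-- Allowed on the partition itself: event_timestamp is not required.
ALTER TABLE event_partitioned_2020_51
  ADD UNIQUE (customer_id, user_id, event_id);

-- Explicit timestamps inside the week; the second insert hits the conflict.
INSERT INTO event_partitioned_2020_51 VALUES
  ('9', '99', '999', 'first', '2020-12-14 09:13:04');
INSERT INTO event_partitioned_2020_51 VALUES
  ('9', '99', '999', 'second', '2020-12-16 17:00:00')
ON CONFLICT (customer_id, user_id, event_id) DO UPDATE
SET comment = 'I got updated';
```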

Query too slow for just 4 tables with 50000 rows each

I've been struggling for hours and I can't find why this query takes so long (> 60 minutes). All 4 tables have fewer than 50,000 records.
Also, if I remove any one of the joined subqueries (gel6, gf6 or ger6), the query executes in less than 500 ms. What am I doing wrong?
Explain plan:
https://explain.depesz.com/s/ldm2
SELECT COUNT(*)
FROM agroapp.ganado g
INNER JOIN (SELECT gel5.ganado_id, gel5.estado_leche
FROM agroapp.ganado_estado_leche gel5
INNER JOIN (SELECT MAX(gel3.ganado_estado_leche_id) ganado_estado_leche_id
FROM agroapp.ganado_estado_leche gel3
INNER JOIN (SELECT gel.ganado_id, MAX(gel.created) created
FROM agroapp.ganado_estado_leche gel
GROUP BY gel.ganado_id) gel2 ON (gel2.ganado_id = gel3.ganado_id AND gel2.created = gel3.created)
GROUP BY gel3.ganado_id) gel4 ON gel4.ganado_estado_leche_id = gel5.ganado_estado_leche_id
) gel6 ON gel6.ganado_id = g.ganado_id
INNER JOIN (SELECT gf5.ganado_id, gf5.fundo_id
FROM agroapp.ganado_fundo gf5
INNER JOIN (SELECT MAX(gf3.ganado_fundo_id) ganado_fundo_id
FROM agroapp.ganado_fundo gf3
INNER JOIN (SELECT gf.ganado_id, MAX(gf.created) created
FROM agroapp.ganado_fundo gf
GROUP BY gf.ganado_id) gf2 ON (gf2.ganado_id = gf3.ganado_id AND gf2.created = gf3.created)
GROUP BY gf3.ganado_id) gf4 ON gf4.ganado_fundo_id = gf5.ganado_fundo_id
) gf6 ON gf6.ganado_id = g.ganado_id
INNER JOIN (SELECT ger5.ganado_id, ger5.estado_reproductivo
FROM agroapp.ganado_estado_reproductivo ger5
INNER JOIN (SELECT MAX(ger3.ganado_estado_reproductivo_id) ganado_estado_reproductivo_id
FROM agroapp.ganado_estado_reproductivo ger3
INNER JOIN (SELECT ger.ganado_id, MAX(ger.created) created
FROM agroapp.ganado_estado_reproductivo ger
GROUP BY ger.ganado_id) ger2 ON (ger2.ganado_id = ger3.ganado_id AND ger2.created = ger3.created)
GROUP BY ger3.ganado_id) ger4 ON ger4.ganado_estado_reproductivo_id = ger5.ganado_estado_reproductivo_id
) ger6 ON ger6.ganado_id = g.ganado_id
WHERE g.organizacion_id = 21
Tables
CREATE TABLE agroapp.ganado_estado_leche
(
ganado_estado_leche_id serial NOT NULL,
organizacion_id integer NOT NULL,
isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
created timestamp without time zone NOT NULL DEFAULT now(),
createdby numeric(10,0) NOT NULL,
updated timestamp without time zone NOT NULL DEFAULT now(),
updatedby numeric(10,0) NOT NULL,
estado_leche character varying(80) NOT NULL,
ganado_id integer NOT NULL,
fecha_manejo timestamp without time zone NOT NULL,
CONSTRAINT ganado_estado_leche_pk PRIMARY KEY (ganado_estado_leche_id),
CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE TABLE agroapp.ganado_fundo
(
ganado_fundo_id serial NOT NULL,
organizacion_id integer NOT NULL,
isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
created timestamp without time zone NOT NULL DEFAULT now(),
createdby numeric(10,0) NOT NULL,
updated timestamp without time zone NOT NULL DEFAULT now(),
updatedby numeric(10,0) NOT NULL,
fundo_id integer NOT NULL,
ganado_id integer NOT NULL,
CONSTRAINT ganado_fundo_pk PRIMARY KEY (ganado_fundo_id),
CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE TABLE agroapp.ganado_estado_reproductivo
(
ganado_estado_reproductivo_id serial NOT NULL,
organizacion_id integer NOT NULL,
isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
created timestamp without time zone NOT NULL DEFAULT now(),
createdby numeric(10,0) NOT NULL,
updated timestamp without time zone NOT NULL DEFAULT now(),
updatedby numeric(10,0) NOT NULL,
estado_reproductivo character varying(80) NOT NULL,
ganado_id integer NOT NULL,
fecha_manejo timestamp without time zone NOT NULL,
CONSTRAINT ganado_estado_reproductivo_pk PRIMARY KEY (ganado_estado_reproductivo_id),
CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE TABLE agroapp.ganado
(
ganado_id serial NOT NULL,
organizacion_id integer NOT NULL,
isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
created timestamp without time zone NOT NULL DEFAULT now(),
createdby numeric(10,0) NOT NULL,
updated timestamp without time zone NOT NULL DEFAULT now(),
updatedby numeric(10,0) NOT NULL,
fecha_nacimiento timestamp without time zone NOT NULL,
tipo_ganado character varying(80) NOT NULL,
diio_id integer NOT NULL,
fundo_id integer NOT NULL,
raza_id integer NOT NULL,
estado_reproductivo character varying(80) NOT NULL,
estado_leche character varying(80),
CONSTRAINT ganado_pk PRIMARY KEY (ganado_id),
CONSTRAINT diio_fk FOREIGN KEY (diio_id)
REFERENCES agroapp.diio (diio_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fundo_fk FOREIGN KEY (fundo_id)
REFERENCES agroapp.fundo (fundo_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT raza_fk FOREIGN KEY (raza_id)
REFERENCES agroapp.raza (raza_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
Table design
This looks very much like a boolean column (yes / no):
isactive character(1) NOT NULL DEFAULT 'Y'::bpchar
If so, replace with:
isactive bool NOT NULL DEFAULT TRUE
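If the column is converted in place, a sketch on a hypothetical demo table could look like this; it assumes the only stored values are 'Y' and 'N'. The old default is dropped first because 'Y'::bpchar cannot be cast to boolean automatically.

```sql
-- Hypothetical table with the original column definition.
CREATE TEMP TABLE demo_flags (
  id       serial PRIMARY KEY,
  isactive character(1) NOT NULL DEFAULT 'Y'::bpchar
);
INSERT INTO demo_flags (isactive) VALUES ('Y'), ('N'), (DEFAULT);

-- Drop the old default, convert the values, then set the new default.
ALTER TABLE demo_flags ALTER COLUMN isactive DROP DEFAULT;
ALTER TABLE demo_flags ALTER COLUMN isactive TYPE boolean USING isactive = 'Y';
ALTER TABLE demo_flags ALTER COLUMN isactive SET DEFAULT true;
```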
If you might involve multiple time zones in any way, use timestamptz instead of timestamp here:
created timestamp without time zone NOT NULL DEFAULT now(),
The default now() produces a timestamptz; the assignment cast to timestamp then yields the current local time according to the time zone of the session. I.e., the stored value depends on the timezone setting of the session, which is a sneaky point of failure. See:
- Ignoring time zones altogether in Rails and PostgreSQL
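The effect can be shown with a fixed instant instead of now(), so the results are predictable (a sketch; Asia/Tokyo is just an example zone): the same timestamptz becomes a different timestamp depending on the session's TimeZone setting.

```sql
-- One fixed instant (12:00 UTC), cast to timestamp without time zone
-- under two different session settings.
SET timezone = 'UTC';
SELECT (timestamptz '2020-12-14 12:00:00+00')::timestamp;  -- 2020-12-14 12:00:00

SET timezone = 'Asia/Tokyo';  -- UTC+9, no daylight saving time
SELECT (timestamptz '2020-12-14 12:00:00+00')::timestamp;  -- 2020-12-14 21:00:00

RESET timezone;
```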
And:
createdby numeric(10,0) NOT NULL
et al. look like they should really be just integer. (Or maybe bigint if you really think you might need values beyond 2147483647 ...)
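A sketch of the conversion on a hypothetical demo table. Without a USING clause, ALTER COLUMN ... TYPE applies the assignment cast from numeric to integer, which raises an error if any stored value is out of range (numeric(10,0) can hold up to 9999999999, while the largest integer is 2147483647).

```sql
-- Hypothetical table with the original column definition.
CREATE TEMP TABLE demo_audit (
  id        serial PRIMARY KEY,
  createdby numeric(10,0) NOT NULL
);
INSERT INTO demo_audit (createdby) VALUES (42), (2147483647);

-- Fails with "integer out of range" if any value does not fit.
ALTER TABLE demo_audit ALTER COLUMN createdby TYPE integer;
```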
Query
Looking at the first subquery:
SELECT gel5.ganado_id, gel5.estado_leche
FROM agroapp.ganado_estado_leche gel5
INNER JOIN (
SELECT MAX(gel3.ganado_estado_leche_id) ganado_estado_leche_id
FROM agroapp.ganado_estado_leche gel3
INNER JOIN (
SELECT gel.ganado_id, MAX(gel.created) created
FROM agroapp.ganado_estado_leche gel
GROUP BY gel.ganado_id
) gel2 ON (gel2.ganado_id = gel3.ganado_id AND gel2.created = gel3.created)
GROUP BY gel3.ganado_id
) gel4 ON gel4.ganado_estado_leche_id = gel5.ganado_estado_leche_id
The innermost subquery gets the maximum created per ganado_id, the next one the maximum ganado_estado_leche_id among those rows. Finally, you join back and retrieve all ganado_id that appear in combination with the identified maximum ganado_estado_leche_id per partition. I have a hard time making sense of this, but it can be simplified to:
SELECT gel2.ganado_id
FROM agroapp.ganado_estado_leche gel2
JOIN (
SELECT DISTINCT ON (ganado_id) ganado_estado_leche_id
FROM agroapp.ganado_estado_leche
ORDER BY ganado_id, created DESC NULLS LAST, ganado_estado_leche_id DESC NULLS LAST
) gel1 USING (ganado_estado_leche_id)
See:
Select first row in each GROUP BY group?
Looks like an incorrect query to me. Same with the rest of the query: the joins multiply rows in an odd fashion. Not sure what you are trying to count, but I doubt the query counts just that. You did not provide enough information to make sense of it.
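If the goal is simply to count animals of organization 21 that have at least one row in each of the three history tables (an assumption about the intent, not something the question states), the count does not need the latest row per animal at all and could be expressed with EXISTS checks. A self-contained sketch with trimmed-down, hypothetical temp tables (no agroapp schema, only the columns the count needs). PostgreSQL does not index foreign key columns automatically, so the sketch adds those indexes too.

```sql
-- Hypothetical, trimmed-down temp tables.
CREATE TEMP TABLE ganado (
  ganado_id       integer PRIMARY KEY,
  organizacion_id integer NOT NULL
);
CREATE TEMP TABLE ganado_estado_leche (
  ganado_estado_leche_id serial PRIMARY KEY,
  ganado_id integer NOT NULL REFERENCES ganado
);
CREATE TEMP TABLE ganado_fundo (
  ganado_fundo_id serial PRIMARY KEY,
  ganado_id integer NOT NULL REFERENCES ganado
);
CREATE TEMP TABLE ganado_estado_reproductivo (
  ganado_estado_reproductivo_id serial PRIMARY KEY,
  ganado_id integer NOT NULL REFERENCES ganado
);
INSERT INTO ganado VALUES (1, 21), (2, 21), (3, 22);
INSERT INTO ganado_estado_leche (ganado_id) VALUES (1), (1), (2), (3);
INSERT INTO ganado_fundo (ganado_id) VALUES (1), (2), (3);
INSERT INTO ganado_estado_reproductivo (ganado_id) VALUES (1), (3);  -- none for 2

-- Foreign key columns are not indexed automatically.
CREATE INDEX ON ganado_estado_leche (ganado_id);
CREATE INDEX ON ganado_fundo (ganado_id);
CREATE INDEX ON ganado_estado_reproductivo (ganado_id);

-- Animals of organization 21 present in all three history tables.
SELECT count(*)
FROM ganado g
WHERE g.organizacion_id = 21
  AND EXISTS (SELECT 1 FROM ganado_estado_leche gel WHERE gel.ganado_id = g.ganado_id)
  AND EXISTS (SELECT 1 FROM ganado_fundo gf WHERE gf.ganado_id = g.ganado_id)
  AND EXISTS (SELECT 1 FROM ganado_estado_reproductivo ger WHERE ger.ganado_id = g.ganado_id);
-- Returns 1: animal 2 has no reproductive-state row, animal 3 is in organization 22
```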

Merging columns from 2 different tables to apply aggregate function

I have the 3 tables below:
Create table products(
prod_id character(20) NOT NULL,
name character varying(100) NOT NULL,
CONSTRAINT prod_pkey PRIMARY KEY (prod_id)
);
Create table dress_Sales(
prod_id character(20) NOT NULL,
dress_amount numeric(7,2) NOT NULL,
CONSTRAINT dress_sales_pkey PRIMARY KEY (prod_id),
CONSTRAINT prod_id_fkey FOREIGN KEY (prod_id)
REFERENCES products (prod_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
Create table sports_Sales(
prod_id character(20) NOT NULL,
sports_amount numeric(7,2) NOT NULL,
CONSTRAINT sports_sales_pkey PRIMARY KEY (prod_id),
CONSTRAINT prod_id_fkey FOREIGN KEY (prod_id)
REFERENCES products (prod_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
I want to get the sum and average sales amount from both tables (only for the selected prod_id values). I have tried the code below, but it does not produce any value.
select sum(coalesce(b.dress_amount, c.sports_amount)) as total_Amount
from products a JOIN dress_sales b on a.prod_id = b.prod_id
JOIN sports_sales c on a.prod_id = c.prod_id and a.prod_id = ANY ('{"123456","456789"}');
Here 1000038923 is in the dress_sales table and 8002265822 is in sports_sales.
Looks like your product can exist in only one table (dress_sales or sports_sales).
In this case you should use a left join:
select
sum(coalesce(b.dress_amount, c.sports_amount)) as total_amount,
avg(coalesce(b.dress_amount, c.sports_amount)) as avg_amount
from products a
left join dress_sales b using(prod_id)
left join sports_sales c using(prod_id)
where
a.prod_id in ('1', '2');
If you use an inner join (which is the default JOIN), the product row will not appear in the result set, because it cannot be matched in both dress_sales and sports_sales.
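A self-contained sketch with hypothetical rows (product 1 sold only as a dress, product 2 only as sports gear): with inner joins to both sales tables neither product survives, while the LEFT JOIN version keeps both and COALESCE picks whichever amount exists.

```sql
-- Hypothetical sample data; constraint names left to the defaults.
CREATE TEMP TABLE products (
  prod_id character(20) PRIMARY KEY,
  name    varchar(100) NOT NULL
);
CREATE TEMP TABLE dress_sales (
  prod_id      character(20) PRIMARY KEY REFERENCES products (prod_id),
  dress_amount numeric(7,2) NOT NULL
);
CREATE TEMP TABLE sports_sales (
  prod_id       character(20) PRIMARY KEY REFERENCES products (prod_id),
  sports_amount numeric(7,2) NOT NULL
);
INSERT INTO products VALUES ('1', 'dress'), ('2', 'ball');
INSERT INTO dress_sales  VALUES ('1', 10.00);  -- product 1: dress only
INSERT INTO sports_sales VALUES ('2', 30.00);  -- product 2: sports only

SELECT sum(coalesce(b.dress_amount, c.sports_amount)) AS total_amount,
       avg(coalesce(b.dress_amount, c.sports_amount)) AS avg_amount
FROM products a
LEFT JOIN dress_sales  b USING (prod_id)
LEFT JOIN sports_sales c USING (prod_id)
WHERE a.prod_id IN ('1', '2');
-- total_amount is 40.00 and avg_amount is 20 (numerically)
```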
If a product can appear in both tables, you can use a subquery that combines the dress_amount and sports_amount values:
select sum(combined.amount), avg(combined.amount)
from
(select prod_id, dress_amount as amount from dress_sales
union all
select prod_id, sports_amount as amount from sports_sales) combined
where
combined.prod_id in ('1','2');
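A sketch with hypothetical rows where product 1 sold in both categories: COALESCE(b.dress_amount, c.sports_amount) would keep only its 10.00 and silently drop the 5.00, while UNION ALL turns every sale into its own row. Note that avg then averages per sales row, not per product.

```sql
-- Hypothetical data: product '1' appears in BOTH sales tables.
CREATE TEMP TABLE dress_sales (
  prod_id      character(20) PRIMARY KEY,
  dress_amount numeric(7,2) NOT NULL
);
CREATE TEMP TABLE sports_sales (
  prod_id       character(20) PRIMARY KEY,
  sports_amount numeric(7,2) NOT NULL
);
INSERT INTO dress_sales  VALUES ('1', 10.00);
INSERT INTO sports_sales VALUES ('1', 5.00), ('2', 30.00);

-- Every sale becomes its own row, so nothing is lost to COALESCE.
SELECT sum(combined.amount) AS total_amount,
       avg(combined.amount) AS avg_amount
FROM (
  SELECT prod_id, dress_amount AS amount FROM dress_sales
  UNION ALL
  SELECT prod_id, sports_amount AS amount FROM sports_sales
) combined
WHERE combined.prod_id IN ('1', '2');
-- total_amount is 45.00 over 3 rows, so avg_amount is 15 (numerically)
```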