sample table:
create table #dev (name varchar(20), team varchar(10), qualification varchar(10))
insert into #dev values
('a','t','be')
,('b','d','bsc')
,('c','p','be')
,('d','j','be')
,('e','d','bsc')
,('f','j','bsc')
,('g','d','be')
,('h','j','be')
,('i','d','bsc')
,('j','j','bsc')
,('k','d','be')
select team,qualification,COUNT(qualification ) from #dev
group by team,qualification having qualification = 'be'
select team,qualification,COUNT(qualification ) from #dev
where qualification = 'be'
group by team,qualification
They return same result. Here, I have a doubt. that is:
which query is efficient..?
which query is preferred by expert?
Answer will be very useful to us, in future.
Thanks,
TamilPugal.
Related
I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;
create table partners
(
partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
name varchar(255) default NULL::character varying,
company_id varchar(20) default NULL::character varying,
vat_id varchar(50) default NULL::character varying,
is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to figure out how to return test1, test2 (because the company_id column value duplicates vertically) and test3 (because the vat_id column value duplicates vertically as well).
To put it in other words - I need to find duplicating company_id and vat_id records and group them together, so that test1, test2 and test3 would be together, because they duplicate by company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
SELECT *, LEAD(row, 1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER (w) AS row
FROM partners
WHERE is_deleted = false
AND ((company_id != '' AND company_id IS NOT null) OR (vat_id != '' AND vat_id IS NOT NULL))
WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
) x
) y
WHERE (row > 1 OR nextrow > 1)
AND is_deleted = false
This successfully shows all company_id duplicates, but does not appear to show vat_id ones - test3 row is missing. Is this possible to be done within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.
You can do this with recursion, but depending on the size of your data you may want to iterate, instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
partner_id integer generated always as identity primary key,
name text,
company_id text,
vat_id text,
is_deleted boolean not null default false
);
insert into partners (name, company_id, vat_id) values
('test1','1010109191191', 'BG1010109191192'),
('test2','1010109191191', 'BG1010109191192'),
('test3','3214567890102', 'BG1010109191192'),
('test4','9999999999999', 'GE9999999999999'),
('test5','3214567890102', 'BG8888888888888'),
('test6','2983489023408', 'BG8888888888888')
;
I added a couple of test cases and left in the lone partner.
with recursive keys as (
select partner_id,
array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
array[partner_id] as matchlist,
1 as size
from partners
), matchers as (
select *
from keys
union all
select p.partner_id, c.matcher,
p.matchlist||c.partner_id as matchlist,
p.size + 1
from matchers p
join keys c
on c.matcher && p.matcher
and not p.matchlist #> array[c.partner_id]
), largest as (
select distinct sort(matchlist) as matchlist
from matchers m
where not exists (select 1
from matchers
where matchlist #> m.matchlist
and size > m.size)
-- and size > 1
)
select *
from largest
;
matchlist
{1,2,3,5,6}
{4}
fiddle
EDIT UPDATE
Since recursion did not perform, here is an iterative example in plpgsql that uses a temporary table:
create temporary table match1 (
partner_id int not null,
group_id int not null,
matchkey uuid not null
);
create index on match1 (matchkey);
create index on match1 (group_id);
insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare _cnt bigint;
begin
loop
with consolidate as (
select group_id,
min(group_id) over (partition by matchkey) as new_group_id
from match1
), minimize as (
select group_id, min(new_group_id) as new_group_id
from consolidate
group by group_id
), doupdate as (
update match1
set group_id = m.new_group_id
from minimize m
where m.group_id = match1.group_id
and m.new_group_id != match1.group_id
returning *
)
select count(*) into _cnt from doupdate;
if _cnt = 0 then
exit;
end if;
end loop;
end;
$$;
updated fiddle
This is my trivial test table,
create table test (
id int not null generated always as identity,
first_name. varchar,
primary key (id),
unique(first_name)
);
As an alternative to insert-into-on-conflict sentences, I was trying to use the coalesce laziness to execute a select whenever possible or an insert, only when select fails to find a row.
coalesce laziness is described in documentation. See https://www.postgresql.org/docs/current/functions-conditional.html
Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.
I also want to get back the id value of the row, having being inserted or not.
I started with:
select coalesce (
(select id from test where first_name='carlos'),
(insert into test(first_name) values('carlos') returning id)
);
but an error syntax error at or near "into" was found.
See it on this other DBFiddle
https://www.db-fiddle.com/f/t7TVkoLTtWU17iaTAbEhDe/0
Then I tried:
select coalesce (
(select id from test where first_name='carlos'),
(with r as (
insert into test(first_name) values('carlos') returning id
) select id from r
)
);
Here I am getting a WITH clause containing a data-modifying statement must be at the top level error that I don't understand, as insert is the first and only sentence within the with.
I am testing this with DBFiddle and PostgreSQL 13. The source code can be found at
https://www.db-fiddle.com/f/hp8T1iQ8eS4wozDCBhBXDw/5
Different method: chained CTEs:
CREATE TABLE test
( id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, first_name VARCHAR UNIQUE
);
WITH sel AS (
SELECT id FROM test WHERE first_name = 'carlos'
)
, ins AS (
INSERT INTO test(first_name)
SELECT 'carlos'
WHERE NOT EXISTS (SELECT 1 FROM test WHERE first_name = 'carlos')
RETURNING id
)
, omg AS (
SELECT id FROM sel
UNION ALL
SELECT id FROM ins
)
SELECT id
FROM omg
;
It seems that the returning value from the insert into clause is not equivalent in nature to the scalar query of a select clause. So I try encapsulating the insert into into an SQL function and it worked.
create or replace function insert_first_name(
_first_name varchar
) returns int
language sql as $$
insert into test (first_name)
values (_first_name)
returning id;
$$;
select coalesce (
(select id from test where first_name='carlos'),
(select insert_first_name('carlos'))
);
See it on https://www.db-fiddle.com/f/73rVXgqGfrG4VmjrAk6Z3i/2
This is a refinement on #wildplasser accepted answer. it avoids comparing first_name twice and uses coalesce instead of union all. Kind of an selsert in just one sentence.
with sel as (
select id from test where first_name = 'carlos'
)
, ins as (
insert into test(first_name)
select 'carlos'
where (select id from sel) is null
returning id
)
select coalesce (
(select id from sel),
(select id from ins)
);
See it at https://www.db-fiddle.com/f/goRh4TyAebTkEZFHk6WbtK/6
Let's say we have the following table that stores id of an observation and its address_id. You can create the table with the following code:
drop table if exists schema.pl_address_cnt;
create table schema.pl_address_cnt (
id serial,
address_id int);
insert into schema.pl_address_cnt(address_id) values
(100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
My task is to count for each id how many other ids (thus -1) have the same address_id. I've come up with a solution that turns out to be quite expensive (explain) on the original dataset. I wonder whether my solution can be somehow optimised.
with tmp_table as (select address_id
, count(distinct id) as id_count
from schema.pl_address_cnt
group by address_id
)
select id
, id_count - 1
from schema.pl_address_cnt as pac
left join tmp_table as tt on tt.address_id=pac.address_id;
You can try to omit the CTE and do a self left join on common address but different ID and then aggregate this.
SELECT pac1.id,
count(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
ON pac1.address_id = pac2.address_id
AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id;
For performance you can try indexes on (address_id, id) and (id).
I have a stored procedure with 2 CTEs. The second CTE has a parameter
WITH path_sequences
AS
(
),
WITH categories
AS
(
... WHERE CategoryId = #CategoryId
// I dont know how to get this initial parameter inside the CTE
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId
The initial parameter that I need to get inside the second TCE is p.CategoryId. How do I do that without having to create another stored procedure to contain the second CTE?
Thanks for helping
You can create table valued function
create function ftCategories
(
#CategoryID int
)
returns table
as return
with categories as (
... WHERE CategoryId = #CategoryId
)
select Col1, Col2 ...
from categories
and use it as
SELECT *
FROM path_sequences p
cross apply ftCategories(p.CategoryId) c
I have created simple query using your code. You can use it like -
DECLARE #CategoryId INT
SET #CategoryId = 1
;WITH path_sequences
AS
(
SELECT 1 CategoryId
),
categories
AS
(
SELECT 1 CategoryId WHERE 1 = #CategoryId
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId
This syntax is for External Aliases:
-- CTES With External Aliases:
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
)
The only way to add parameters is to use scope variables like so:
--Declare a variable:
DECLARE #category INT
WITH
MyCTE1 (exName1, exName2)
AS
(
SELECT <SELECT LIST>
FROM <TABLE LIST>
--Use the variable as 'a parameter'
WHERE CategoryId = #CategoryId
)
First remove the second WITH, separate each cte with just a comma. Next you can add parameters like this:
DECLARE #category INT; -- <~~ Parameter outside of CTEs
WITH
MyCTE1 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
SELECT blah blah
FROM blah
WHERE CategoryId = #CategoryId
),
MyCTE2 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
)
SELECT *
FROM MyCTE2
INNER JOIN MyCTE1 ON ...etc....
EDIT (and CLARIFICATION):
I have renamed the columns from param1 and param2 to col1 and col2 (which is what I meant originally).
My example assumes that each SELECT has exactly two columns. The columns are optional if you want to return all of the columns from the underlying query AND those names are unique. If you have more or less columns than what is being SELECTed you will need to specify names.
Here is another example:
Table:
CREATE TABLE Employee
(
Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
ManagerId INT NULL
)
Fill table with some rows:
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Donald', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Micky', 'Mouse', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daisy', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Fred', 'Flintstone', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Darth', 'Vader', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Bugs', 'Bunny', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daffy', 'Duck', null)
CTEs:
DECLARE #ManagerId INT = 5;
WITH
MyCTE1 (col1, col2, col3, col4)
AS
(
SELECT *
FROM Employee e
WHERE 1=1
AND e.Id = #ManagerId
),
MyCTE2 (colx, coly, colz, cola)
AS
(
SELECT e.*
FROM Employee e
INNER JOIN MyCTE1 mgr ON mgr.col1 = e.ManagerId
WHERE 1=1
)
SELECT
empsWithMgrs.colx,
empsWithMgrs.coly,
empsWithMgrs.colz,
empsWithMgrs.cola
FROM MyCTE2 empsWithMgrs
Notice in the CTEs the columns are being aliased. MyCTE1 exposes columns as col1, col2, col3, col4 and MyCTE2 references MyCTE1.col1 when it references it. Notice the final select uses MyCTE2's column names.
Results:
For anyone still struggling with this, the only thing you need to is terminate your declaration of variables with a semicolon before the CTE. Nothing else is required.
DECLARE #test AS INT = 42;
WITH x
AS (SELECT #test AS 'Column')
SELECT *
FROM x
Results:
Column
-----------
42
(1 row affected)
I have a table t1 as below:
create table t1 (
person_id int,
item_name varchar(30),
item_value varchar(100)
);
There are five records in this table:
person_id | item_name | item_value
1 'NAME' 'john'
1 'GENDER' 'M'
1 'DOB' '1970/02/01'
1 'M_PHONE' '1234567890'
1 'ADDRESS' 'Some Addresses unknown'
Now I want to use crosstab function to extract NAME, GENDER data, so I write a SQL as:
select * from crosstab(
'select person_id, item_name, item_value from t1
where person_id=1 and item_name in ('NAME', 'GENDER') ')
as virtual_table (person_id int, NAME varchar, GENDER varchar)
My problem is, as you see the SQL in crosstab() contains condition of item_name, which will cause the quotation marks to be incorrect.
How do I solve the problem?
To avoid any confusion about how to escape single quotes and generally simplify the syntax, use dollar-quoting for the query string:
SELECT *
FROM crosstab(
$$
SELECT person_id, item_name, item_value
FROM t1
WHERE person_id = 1
AND item_name IN ('NAME', 'GENDER')
$$
) AS virtual_table (person_id int, name varchar, gender varchar);
See:
Insert text with single quotes in PostgreSQL
And you should add ORDER BY to your query string. I quote the manual for the tablefunc module:
In practice the SQL query should always specify ORDER BY 1,2 to ensure
that the input rows are properly ordered, that is, values with the
same row_name are brought together and correctly ordered within the
row. Notice that crosstab itself does not pay any attention to the
second column of the query result; it's just there to be ordered by,
to control the order in which the third-column values appear across the page.
See:
PostgreSQL Crosstab Query
Double your single quotes to escape them:
select * from crosstab(
'select person_id, item_name, item_value from t1
where person_id=1 and item_name in (''NAME'', ''GENDER'') ')
as virtual_table (person_id int, NAME varchar, GENDER varchar)