I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;
create table partners
(
partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
name varchar(255) default NULL::character varying,
company_id varchar(20) default NULL::character varying,
vat_id varchar(50) default NULL::character varying,
is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to return test1 and test2 (because they share the same company_id value) and test3 (because it shares its vat_id value with them).
In other words, I need to find records with duplicate company_id or vat_id values and group them together, so that test1, test2 and test3 end up in one group because they are linked through company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
SELECT *, LEAD(row, 1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER (w) AS row
FROM partners
WHERE is_deleted = false
AND ((company_id != '' AND company_id IS NOT null) OR (vat_id != '' AND vat_id IS NOT NULL))
WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
) x
) y
WHERE (row > 1 OR nextrow > 1)
AND is_deleted = false
This successfully shows all company_id duplicates but not the vat_id ones: the test3 row is missing. Can this be done within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.
You can do this with recursion, but depending on the size of your data you may want to iterate instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
partner_id integer generated always as identity primary key,
name text,
company_id text,
vat_id text,
is_deleted boolean not null default false
);
insert into partners (name, company_id, vat_id) values
('test1','1010109191191', 'BG1010109191192'),
('test2','1010109191191', 'BG1010109191192'),
('test3','3214567890102', 'BG1010109191192'),
('test4','9999999999999', 'GE9999999999999'),
('test5','3214567890102', 'BG8888888888888'),
('test6','2983489023408', 'BG8888888888888')
;
I added a couple of test cases and left in the lone partner.
with recursive keys as (
select partner_id,
array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
array[partner_id] as matchlist,
1 as size
from partners
), matchers as (
select *
from keys
union all
select p.partner_id, c.matcher,
p.matchlist||c.partner_id as matchlist,
p.size + 1
from matchers p
join keys c
on c.matcher && p.matcher
and not p.matchlist @> array[c.partner_id]
), largest as (
-- sort() on integer arrays comes from the intarray extension
select distinct sort(matchlist) as matchlist
from matchers m
where not exists (select 1
from matchers
where matchlist @> m.matchlist
and size > m.size)
-- and size > 1
)
select *
from largest
;
matchlist
{1,2,3,5,6}
{4}
EDIT UPDATE
Since recursion did not perform well, here is an iterative example in PL/pgSQL that uses a temporary table:
create temporary table match1 (
partner_id int not null,
group_id int not null,
matchkey uuid not null
);
create index on match1 (matchkey);
create index on match1 (group_id);
insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare _cnt bigint;
begin
loop
with consolidate as (
select group_id,
min(group_id) over (partition by matchkey) as new_group_id
from match1
), minimize as (
select group_id, min(new_group_id) as new_group_id
from consolidate
group by group_id
), doupdate as (
update match1
set group_id = m.new_group_id
from minimize m
where m.group_id = match1.group_id
and m.new_group_id != match1.group_id
returning *
)
select count(*) into _cnt from doupdate;
if _cnt = 0 then
exit;
end if;
end loop;
end;
$$;
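To make the loop above easier to follow, here is a minimal Python sketch of the same idea run in memory: every partner starts in its own group, and on each pass a group adopts the smallest group id reachable through any shared match key, until nothing changes. The function name `group_partners` and the in-memory list standing in for the `match1` table are illustrative assumptions, not part of the original answer, and skipping `None` keys is likewise an assumption of this sketch.

```python
def group_partners(rows):
    """rows: iterable of (partner_id, [match keys]); returns {partner_id: group_id}."""
    # One [partner_id, group_id, matchkey] entry per key, like the match1 table.
    # None keys are skipped (an assumption of this sketch).
    entries = [[pid, pid, key] for pid, keys in rows for key in keys if key is not None]
    while True:
        # "consolidate": smallest group_id seen for each match key
        min_by_key = {}
        for _, gid, key in entries:
            min_by_key[key] = min(gid, min_by_key.get(key, gid))
        # "minimize": smallest candidate for each current group
        new_gid = {}
        for _, gid, key in entries:
            new_gid[gid] = min(min_by_key[key], new_gid.get(gid, gid))
        # "doupdate": move every entry whose group got a smaller id
        changed = 0
        for entry in entries:
            target = new_gid[entry[1]]
            if target != entry[1]:
                entry[1] = target
                changed += 1
        if changed == 0:
            break
    return {pid: gid for pid, gid, _ in entries}


# The six sample partners, with the same n_/c_/v_ key prefixes as the SQL.
rows = [
    (1, ["n_test1", "c_1010109191191", "v_BG1010109191192"]),
    (2, ["n_test2", "c_1010109191191", "v_BG1010109191192"]),
    (3, ["n_test3", "c_3214567890102", "v_BG1010109191192"]),
    (4, ["n_test4", "c_9999999999999", "v_GE9999999999999"]),
    (5, ["n_test5", "c_3214567890102", "v_BG8888888888888"]),
    (6, ["n_test6", "c_2983489023408", "v_BG8888888888888"]),
]
groups = group_partners(rows)
print(groups)
```

With the sample data, partners 1, 2, 3, 5 and 6 end up in group 1 and partner 4 stays in group 4, which matches the matchlist output of the recursive query.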
I was wondering if it is possible to have the input for a column be one string, with the output being a different string through some dictionary in PostgreSQL. I do know how to use CASE to convert numbers to strings using a SELECT statement, however, I was hoping to create a table such that inputs only require numbers but outputs always give strings.
For example, for the currencies USD, CDN and GBP, where 1 = USD, 2 = CDN and 3 = GBP, the table would be:
CREATE TABLE test_table (
currency CHAR (1) CHECK (currency IN ('1','2','3'))
);
Where I could do this:
INSERT INTO test_table (currency)
VALUES ('1');
INSERT INTO test_table (currency)
VALUES ('1');
INSERT INTO test_table (currency)
VALUES ('2');
INSERT INTO test_table (currency)
VALUES ('3');
INSERT INTO test_table (currency)
VALUES ('3');
and the output would look like this:
You can use a CASE expression:
select case currency
when '1' then 'USD'
when '2' then 'CDN'
when '3' then 'GBP'
when '4' then 'EUR'
end as currency
from test_table;
But a better solution would be to create a currency table:
create table currency
(
id integer primary key,
currency_code varchar(3)
);
Then create a foreign key from your base table to the lookup table:
create table test_table
(
...
currency_id integer not null references currency,
...
);
Then use a join to display the code:
select c.currency_code
from test_table t
join currency c on c.id = t.currency_id;
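As a rough end-to-end illustration of the lookup-table approach, the sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for PostgreSQL here, the function name `currency_codes` is hypothetical, and `test_table` is cut down to the single `currency_id` column because the original definition elides the others.

```python
import sqlite3


def currency_codes():
    """Insert numeric currency ids and read them back as currency codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        create table currency (
            id integer primary key,
            currency_code varchar(3)
        );
        create table test_table (
            currency_id integer not null references currency
        );
        insert into currency (id, currency_code)
        values (1, 'USD'), (2, 'CDN'), (3, 'GBP');
        insert into test_table (currency_id)
        values (1), (1), (2), (3), (3);
    """)
    rows = conn.execute("""
        select c.currency_code
        from test_table t
        join currency c on c.id = t.currency_id
        order by t.rowid
    """).fetchall()
    conn.close()
    return [code for (code,) in rows]


print(currency_codes())
```

The inserts only ever supply numbers, while the join returns the string codes, and the foreign key rejects any number that is not present in the currency table.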
I have a PostgreSQL function similar to this:
CREATE OR REPLACE FUNCTION dbo.MyTestFunction(
_ID INT
)
RETURNS dbo.MyTable AS
$$
SELECT *,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable
WHERE PersonID = _ID
$$ LANGUAGE SQL STABLE;
I would really like to NOT have to replace the RETURNS dbo.MyTable AS with something like:
RETURNS TABLE(
col1 INT,
col2 TEXT,
col3 BOOLEAN,
col4 TEXT
) AS
and list out all the columns of MyTable and Name of MySecondTable. Is this something that can be done? Thanks.
--EDIT--
To clarify I have to return ALL columns in MyTable and 1 column from MySecondTable. If MyTable has >15 columns, I don't want to have to list out all the columns in a RETURNS TABLE (col1.. coln).
You just list the columns that you want returned in the SELECT portion of your SQL statement:
SELECT t1.column1, t1.column2,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable t1
WHERE PersonID = _ID
Now you'll just get column1, column2, and Name returned.
Furthermore, you'll probably find better performance using a LEFT OUTER JOIN in your FROM portion of the SQL statement as opposed to the correlated subquery you have now:
SELECT t1.column1, t1.column2, t2.Name
FROM dbo.MyTable t1
LEFT OUTER JOIN dbo.MySecondTable t2 ON
t2.RecordID = t1.PersonID
WHERE t1.PersonID = _ID
Took a bit of a guess on where RecordID and PersonID were coming from, but that's the general idea.
Please, find below my schema:
CREATE TABLE reps (
id SERIAL PRIMARY KEY,
rep TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
id SERIAL PRIMARY KEY,
terms TEXT NOT NULL UNIQUE
);
CREATE TABLE shipVia (
id SERIAL PRIMARY KEY,
ship_via TEXT NOT NULL UNIQUE
);
CREATE TABLE invoices (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL CONSTRAINT customerNotEmpty CHECK(customer <> ''),
term_id INT REFERENCES terms,
rep_id INT NOT NULL REFERENCES reps,
ship_via_id INT REFERENCES shipVia,
...
item_count INT NOT NULL,
modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
version INT NOT NULL DEFAULT 0
);
CREATE TABLE invoiceItems (
id SERIAL PRIMARY KEY,
invoice_id INT NOT NULL REFERENCES invoices ON DELETE CASCADE,
name TEXT NOT NULL CONSTRAINT nameNotEmpty CHECK(name <> ''),
description TEXT,
qty INT NOT NULL CONSTRAINT validQty CHECK (qty > 0),
price DOUBLE PRECISION NOT NULL
);
I am trying to insert an invoice along with its invoice items in one SQL statement using a writable CTE. I am currently stuck with the following SQL statement:
WITH new_invoice AS (
INSERT INTO invoices (id, customer, term_id, ship_via_id, rep_id, ..., item_count)
SELECT $1, $2, t.id, s.id, r.id, ..., $26
FROM reps r
JOIN terms t ON t.terms = $3
JOIN shipVia s ON s.ship_via = $4
WHERE r.rep = $5
RETURNING id
) INSERT INTO invoiceItems (invoice_id, name, qty, price, description) VALUES
(new_invoice.id,$27,$28,$29,$30)
,(new_invoice.id,$31,$32,$33,$34)
,(new_invoice.id,$35,$36,$37,$38);
Of course, this SQL is wrong; here is what PostgreSQL 9.2 has to say about it:
ERROR: missing FROM-clause entry for table "new_invoice"
LINE 13: (new_invoice.id,$27,$28,$29,$30)
^
********** Error **********
ERROR: missing FROM-clause entry for table "new_invoice"
SQL state: 42P01
Character: 704
Is it possible at all?
EDIT 1
I am trying the following version:
PREPARE insert_invoice_3 AS WITH
new_invoice AS (
INSERT INTO invoices (id, customer, term_id, ship_via_id, rep_id, ..., item_count)
SELECT $1, $2, t.id, s.id, r.id, ..., $26
FROM reps r
JOIN terms t ON t.terms = $3
JOIN shipVia s ON s.ship_via = $4
WHERE r.rep = $5
RETURNING id
),
v (name, qty, price, description) AS (
VALUES ($27,$28,$29,$30)
,($31,$32,$33,$34)
,($35,$36,$37,$38)
)
INSERT INTO invoiceItems (invoice_id, name, qty, price, description)
SELECT new_invoice.id, v.name, v.qty, v.price, v.description
FROM v, new_invoice;
And here is what I get in return:
ERROR: column "qty" is of type integer but expression is of type text
LINE 19: SELECT new_invoice.id, v.name, v.qty, v.price, v.descriptio...
^
HINT: You will need to rewrite or cast the expression.
********** Error **********
ERROR: column "qty" is of type integer but expression is of type text
SQL state: 42804
Hint: You will need to rewrite or cast the expression.
Character: 899
I guess v (name, qty, price, description) is not enough, the data types must be specified as well. However, v (name, qty INT, price, description) does not work - syntax error.
EDIT 2
Next, I have just tried the second version:
PREPARE insert_invoice_3 AS WITH
new_invoice AS (
INSERT INTO invoices (id, customer, term_id, ship_via_id, rep_id, ..., item_count)
SELECT $1, $2, t.id, s.id, r.id, ..., $26
FROM reps r
JOIN terms t ON t.terms = $3
JOIN shipVia s ON s.ship_via = $4
WHERE r.rep = $5
RETURNING id
)
INSERT INTO invoiceItems (invoice_id, name, qty, price, description)
(
SELECT i.id, $27, $28, $29, $30 FROM new_invoice i
UNION ALL
SELECT i.id, $31, $32, $33, $34 FROM new_invoice i
UNION ALL
SELECT i.id, $35, $36, $37, $38 FROM new_invoice i
);
Here is what I get:
ERROR: column "qty" is of type integer but expression is of type text
LINE 15: SELECT i.id, $27, $28, $29, $30 FROM new_invoice i
^
HINT: You will need to rewrite or cast the expression.
********** Error **********
ERROR: column "qty" is of type integer but expression is of type text
SQL state: 42804
Hint: You will need to rewrite or cast the expression.
Character: 759
Seems like the same error. It is interesting that if I remove all the UNION ALL and leave just one SELECT statement - it works!
EDIT 3
Why do I have to cast the parameters? Is it possible to specify the type of columns in the CTE?
PostgreSQL interprets the VALUES clause broadly enough that it can be used as a standalone subquery.
So you may express your query in this form:
WITH new_invoice AS (
INSERT INTO ...
RETURNING id
),
v(a,b,c,d) AS (values
($27,$28,$29,$30),
($31,$32,$33,$34),
...
)
INSERT INTO invoiceItems (invoice_id, name, qty, price, description)
SELECT new_invoice.id, a,b,c,d FROM v, new_invoice;
That assumes you want to insert the cartesian product of new_invoice and the values, which mostly makes sense if new_invoice is actually a single-row value.
WITH new_invoice AS (
INSERT INTO invoices ...
RETURNING id
)
INSERT INTO invoiceItems (invoice_id, name, qty, price, description)
VALUES ((select id from new_invoice), $27 , $28, $29, $30),
((select id from new_invoice), $31 , $32, $33, $34),
((select id from new_invoice), $35 , $36, $37, $38);
Instead of insert ... values ...., use insert ... select ...:
) INSERT INTO invoiceItems (invoice_id, name, qty, price, description)
SELECT new_invoice.id,$27,$28,$29,$30 FROM new_invoice
UNION ALL
...
In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row? E.g., as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random()
);
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query returning a single row where parent_id is 1 and random_id is a random number (or other function return value), but I don't want this record to persist in the table or affect the primary key sequence mytable_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to the temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
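And use this temp table to generate dummy rows. The temp table is visible only to the current session and is dropped at the end of the session (or at the end of the transaction if it is created with ON COMMIT DROP). Note that a copied serial default still calls nextval() on the original table's sequence, so supply the id explicitly when inserting.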
You can query the catalog and build a dynamic query.
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run the following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this string as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
then execute this query and you will get the "phantom row".
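The string assembly done by the catalog query can be sketched in Python as follows. This is only an illustration of the concatenation logic: the function name `build_phantom_select` and the hand-written column list are assumptions standing in for the rows that `information_schema.columns` would return, and column names are not quoted, just as in the SQL version.

```python
def build_phantom_select(fake_id, columns):
    """Build a SELECT that returns one row made of column defaults.

    columns: list of (column_name, column_default, data_type) tuples for every
    column except id; column_default is None when the column has no default.
    """
    parts = [f"{int(fake_id)} as id"]
    for name, default, data_type in columns:
        # Same idea as coalesce(column_default, 'null::' || data_type)
        expr = default if default is not None else f"null::{data_type}"
        parts.append(f"{expr} as {name}")
    return "SELECT " + ", ".join(parts)


# Hand-written stand-in for the catalog rows of the test10 table.
test10_columns = [
    ("first_name", None, "character varying"),
    ("last_name", "'Tom'::character varying", "character varying"),
    ("age", "38", "integer"),
    ("salary", "100.22", "double precision"),
]
sql = build_phantom_select(-9999, test10_columns)
print(sql)
```

The resulting text has the same shape as the string shown above, apart from whitespace, and would still need to be executed against the database to produce the row.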
We can build a function that builds and executes the query and returns our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their type, but this is reasonable. If you change the table schema (except default value), you would need to tweak the query.
See the above as a SQLFiddle.