PostgreSQL. Stored function or query improvements - postgresql

I'm using PostgreSQL and have an employee table:
employeeid | FirstName | LastName | Department | Position
1 | Aaaa | | |
2 | Bbbb | | |
3 | | | |
. | | | |
Reports table:
employeeid | enter | exit
1 | 2020-11-08 09:02:21 | 2020-11-08 18:12:01
. | ... |
Now, I'm querying for missed employees on a certain date something like that function:
for i in select employeeid from employee
loop
if select not exists(select 1 from reports where enter::date = '2020-11-08' and employeeid = i) then
return query select employeeid, lastname, firstname, department, position from employee where employeeid = i;
end if;
end loop;
Seems to me, it's not an ideal solution. Is there any better approach to achieve the same result?
Thanks.

Your code is pretty non effective. Probably it cannot be written badly :-). The reply #a_horse_with_no_name is absolutely correct, just for completeness I'll fix your plpgsql code.
DECLARE _e employee;
BEGIN
FOR _e IN SELECT * FROM employee
LOOP
IF NOT EXISTS(SELECT *
FROM reports r
WHERE r.enter::date = date '2020-11-08'
AND r.employeeid = _e.employeeid)
THEN
RETURN NEXT _e;
END IF;
END LOOP;
END;
When you write stored procedure (or any query based application), the important value is number of queries - low number is better (attention - there are exceptions - sometimes too much complex queries can be slower due their complexity). In your example there are employees * 2 + 1 queries (with larger overhead of plpgsql - RETURN QUERY is more expensive than RETURN NEXT). Solution proposed by #a_horse_with_no_name is one query (without a overhead of plpgsql). Any my example has employees + 1 queries (with lower overhead of plpgsql).
Your example is good example of one common SQL antipattern - "using ISAM style".

You are right: more often than not, a loop is a bad way to do things in a relational database.
You can do this in plain SQL:
select e.employeeid, e.lastname, e.firstname, e.department, e.position
from employee e
where not exists (select *
from reports r
where r.enter::date = date '2020-11-08'
and e.employeeid = r.employeeid);
This will return all employees for which no row exists in the reports table on 2020-11-08

Related

How to return a comma separated values of column without having to loop through the result set

Let say I have this 2 table
+----+---------+ +----+-----------+----------------+
| Id | Country | | Id | CountryId | City |
+----+---------+ +----+-----------+----------------+
| 1 | USA | | 1 | 1 | Washington, DC |
+----+---------+ +----+-----------+----------------+
| 2 | Canada | | 2 | 2 | Ottawa |
+----+---------+ +----+-----------+----------------+
| 3 | 1 | New York |
+----+-----------+----------------+
| 4 | 1 | Baltimore |
+----+-----------+----------------+
I need to produce a result like:
Id | Country | Cities
---+---------+--------------------------------------
1 | USA | Washington, DC, New York, Baltimore
---+------------------------------------------------
2 | Canada | Ottawa
So far, I am looping through the left side table result like this:
DECLARE #table
(
Id INT IDENTITY(1, 1),
CountryId INT,
City VARCHAR(50)
)
DECLARE #tableString
(
Id INT IDENTITY(1, 1),
CountryId INT,
Cities VARCHAR(100)
)
INSERT INTO #table
SELECT Id, City
FROM tblCountries
DECLARE #city VARCHAR(50)
DECLARE #id INT
DECLARE #count INT
DECLARE #i INT = 1
SELECT #count = COUNT(*) FROM #table
WHILE (#i <= #count)
BEGIN
SELECT #Id = Id, #city = City FROM #table WHERE Id = #i
IF(EXISTS(SELECT * FROM #tableString WHERE CountryId = #Id))
BEGIN
UPDATE #tableString SET Cities = Cities + ', ' + #city WHERE Id = #Id
END
ELSE
BEGIN
INSERT INTO #tableString (CountryId, city) VALUES (#Id, #city)
END
SET #i = #i + 1
END
SELECT tc.Id, tc.Country, ts.Cities
FROM tblCountries tc
LEFT JOIN #tableString ts
ON tc.Id = ts.CountryId
My concern is that with all those looping in TSQL, it may be a performance killer. Even with fewer, it appears to be slow. Is there a better way to concatenate those string without having to loop through the data set as if I was working in C#
.
Thanks for helping
This was answered many times, but I've got the feeling, that some explanation might help you...
... am I missing something? It seems like this is related to XML
The needed functionality STRING_AGG() was introduced with SQL-Server 2017. The other direction STRING_SPLIT() came with v2016.
But many people still use older versions (and will do this for years), so we need workarounds. There were approaches with loops, bad and slow... And you might use recursive CTEs. And - that's the point here! - we can use some abilities of XML to solve this.
Try this out:
DECLARE #xml XML=
N'<root>
<element>text1</element>
<element>text2</element>
<element>text3</element>
</root>';
--The query will return the first <element> below <root> and return text1.
SELECT #xml.value(N'(/root/element)[1]','nvarchar(max)');
--But now try this:
SELECT #xml.value(N'(/root)[1]','nvarchar(max)')
The result is text1text2text3.
The reason for this: If you call .value() on an element without a detailed specification of what you want to read, you'll get the whole element back. Find details here.
Now imagine an XML like this
DECLARE #xml2 XML=
N'<root>
<element>, text1</element>
<element>, text2</element>
<element>, text3</element>
</root>';
With the same query as above you'd get , text1, text2, text3. The only thing left is to cut off the leading comma and the space. This is done - in most examples - with STUFF().
So the challenge is to create this XML. And this is what you find in the linked examples.
A general example is this: Read all tables and list their columns as a CSV-list:
SELECT TOP 10
TABLE_NAME
,STUFF(
(SELECT ',' + c.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_SCHEMA=t.TABLE_SCHEMA AND c.TABLE_NAME=t.TABLE_NAME
ORDER BY c.COLUMN_NAME
FOR XML PATH('')
),1,1,'') AS AllTableColumns
FROM INFORMATION_SCHEMA.TABLES t

Function to flag duplicates in within query Postgresql

I would like to write a function that flags duplicates in specified columns in postgresql.
For example, if I had the following table:
country | landscape | household
--------------------------------
TZA | L01 | HH02
TZA | L01 | HH03
KEN | L02 | HH01
RWA | L03 | HH01
I would like to be able to run the following query:
SELECT country,
landscape,
household,
flag_duplicates(country, landscape) AS flag
FROM mytable
And get the following result:
country | landscape | household | flag
---------------------------------------
TZA | L01 | HH02 | duplicated
TZA | L01 | HH03 | duplicated
KEN | L02 | HH01 |
RWA | L03 | HH01 |
Inside the body of the function, I think I need something like:
IF (country || landscape IN (SELECT country || landscape FROM mytable
GROUP BY country || landscape)
HAVING count(*) > 1) THEN 'duplicated'
ELSE NULL
But I am confused about how to pass all of those as arguments. I appreciate the help. I am using postgresql version 9.3.
You don't need a function to accomplish that. Using function for every row in result set is not so good idea because of performance. A way better solution is use pure SQL (even with subqueries) and give database engine chance to optimize it. In your very example it should be something like that:
SELECT t.country,t.landscape,t.household,case when duplicates.count>1 then 'duplicate'end
FROM mytable t JOIN (
SELECT count(household) FROM mytable GROUP BY country,landscape
) duplicates ON duplicates.country=t.country AND duplicates.landscape=t.landscape
which produces exactly the same result.
Update - if You want to use function at all cost, here is working example:
CREATE FUNCTION find_duplicates(arg_country varchar, arg_landscape varchar) returns varchar AS $$
BEGIN
RETURN CASE WHEN count(household)>1 THEN 'duplicated' END FROM mytable
WHERE country=arg_country AND landscape=arg_landscape
GROUP BY country,landscape;
END
$$
LANGUAGE plpgsql STABLE;
select
*,
(count(*) over (partition by country, landscape)) > 1 as flag
from
mytable;
For function look at the #MarcinH answer but add stable to the function's definition to make its calls faster.

Postgresql Update inside For Loop

I'm new enough to postgresql, and I'm having issues updating a column of null values in a table using a for loop. The table i'm working on is huge so for brevity i'll give a smaller example which should get the point across. Take the following table
+----+----------+----------+
| id | A | B | C |
+----+----------+----------+
| a | 1 | 0 | NULL |
| b | 1 | 1 | NULL |
| c | 2 | 4 | NULL |
| a | 3 | 2 | NULL |
| c | 2 | 3 | NULL |
| d | 4 | 2 | NULL |
+----+----------+----------+
I want to write a for loop which iterates through all of the rows and does some operation
on the values in columns a and b and then inserts a new value in c.
For example, where id = a , update table set C = A*B, or where id = d set C = A + B etc. This would then give me a table like
+----+----------+----------+
| id | A | B | C |
+----+----------+----------+
| a | 1 | 0 | 0 |
| b | 1 | 1 | NULL |
| c | 2 | 4 | NULL |
| a | 3 | 2 | 6 |
| c | 2 | 3 | NULL |
| d | 4 | 2 | 6 |
+----+----------+----------+
So ultimately I'd like to loop through all the rows of the table and update column C according to the value in the "id" column. The function I've written (which isn't giving any errors but also isn't updating anything either) looks like this...
-- DROP FUNCTION some_function();
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
--r integer; not too sure if this needs to be declared or not
result int;
BEGIN
FOR r IN select * from 'table_name'
LOOP
select(
case
when id = 'a' THEN B*C
when id = 'd' THEN B+C
end)
into result;
update table set C = result
WHERE id = '';
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql
I'm sure there's something silly i'm missing, probably around what I'm, returning... void in this case. But as I only want to update existing rows should I need to return anything? There's probably easier ways of doing this than using a loop but I'd like to get it working using this method.
If anyone could point me in the right direction or point out anything blatantly obvious that I'm doing wrong I'd much appreciate it.
Thanks in advance.
No need for a loop or a function, this can be done with a single update statement:
update table_name
set c = case
when id = 'a' then a*b
when id = 'd' then a+b
else c -- don't change anything
end;
SQLFiddle: http://sqlfiddle.com/#!15/b65cb/2
The reason your function isn't doing anything is this:
update table set C = result
WHERE id = '';
You don't have a row with an empty string in the column id. Your function also seems to use the wrong formula: when id = 'a' THEN B*C I guess that should be: then a*b. As C is NULL initially, b*c will also yield null. So even if your update in the loop would find a row, it would update it to NULL.
You are also retrieving the values incorrectly from the cursor.
If you really, really want to do it inefficiently in a loop, the your function should look something like this (not tested!):
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
result int;
BEGIN
-- r is a structure that contains an element for each column in the select list
FOR r IN select * from table_name
LOOP
if r.id = 'a' then
result := r.a * r.b;
end if;
if r.id = 'b' then
result := r.a + r.b;
end if;
update table
set C = result
WHERE id = r.id; -- note the where condition that uses the value from the record variable
END LOOP;
END
$BODY$
LANGUAGE plpgsql
But again: if your table is "huge" as you say, the loop is an extremely bad solution. Relational databases are made to deal with "sets" of data. Row-by-row processing is an anti-pattern that will almost always have bad performance.
Or to put it the other way round: doing set-based operations (like my single update example) is always the better choice.

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now i want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above.
Thanks!
You can use row_number() for join, but I'm not sure that you have guaranties that order of the rows will stay the same as in the tables. So it's better to add some order into over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and *completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.

Converting Access Pivot Table to SQL Server

I'm having trouble converting a MS Access pivot table over to SQL Server. Was hoping someone might help..
TRANSFORM First(contacts.value) AS FirstOfvalue
SELECT contacts.contactid
FROM contacts RIGHT JOIN contactrecord ON contacts.[detailid] = contactrecord.[detailid]
GROUP BY contacts.contactid
PIVOT contactrecord.wellknownname
;
Edit: Responding to some of the comments
Contacts table has three fields
contactid | detailid | value |
1 1 Scott
contactrecord has something like
detailid | wellknownname
1 | FirstName
2 | Address1
3 | foobar
contractrecord is dyanamic in that the user at anytime can create a field to be added to contacts
the access query pulls out
contactid | FirstName | Address1 | foobar
1 | Scott | null | null
which is the pivot on the wellknownname. The key here is that the number of columns is dynamic since the user can, at anytime, create another field for the contact. Being new to pivot tables altogether, I'm wondering how I can recreate this access query in sql server.
As for transform... that's a built in access function. More information is found about it here. First() will just take the first result on that matching row.
I hope this helps and appreciate all the help.
I quick search for dynamic pivot tables comes up with this article.
After renaming things in his last query on the page I came up with this:
DECLARE #PivotColumnHeaders VARCHAR(max);
SELECT #PivotColumnHeaders = COALESCE(#PivotColumnHeaders + ',['+ CAST(wellknownname as varchar) + ']','['+ CAST(wellknownname as varchar) + ']')
FROM contactrecord;
DECLARE #PivotTableSQL NVARCHAR(max);
SET #PivotTableSQL = N'
SELECT *
FROM (
SELECT
c.contactid,
cr.wellknownname,
c.value
FROM contacts c
RIGHT JOIN contactrecord cr
on c.detailid = cr.detailid
) as pivotData
pivot(
min(value)
for wellknownname in (' + #PivotColumnHeaders +')
) as pivotTable
'
;
execute(#PivotTableSQL);
which despite its ugliness, it does the job