Concatenate string instead of just replacing it - postgresql

I have a table with standard columns where I want to perform regular INSERTs.
But one of the columns is of type varchar with special semantics. It's a string that's supposed to behave as a set of strings, where the elements of the set are separated by commas.
E.g., if one row has the value fish,sheep,dove in that varchar column and I insert the string ,fish,eagle, I want the result to be fish,sheep,dove,eagle (i.e., eagle gets added to the set, but fish doesn't because it's already in the set).
I have here this Postgres code that does the "set concatenation" that I want:
SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array('fish,sheep,dove' || ',fish,eagle', ','))) AS x;
But I can't figure out how to apply this logic to insertions.
What I want is something like:
CREATE TABLE IF NOT EXISTS t00(
userid int8 PRIMARY KEY,
a int8,
b varchar);
INSERT INTO t00 (userid,a,b) VALUES (0,1,'fish,sheep,dove');
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x;
How can I achieve something like that?

Storing comma separated values is a huge mistake to begin with. But if you really want to make your life harder than it needs to be, you might want to create a function that merges two comma separated lists:
create function merge_lists(p_one text, p_two text)
returns text
as
$$
select string_agg(item, ',')
from (
select e.item
from unnest(string_to_array(p_one, ',')) as e(item)
where e.item <> '' --< necessary because of the leading , in your data
union
select t.item
from unnest(string_to_array(p_two, ',')) t(item)
where t.item <> ''
) t;
$$
language sql;
If you are using Postgres 14 or later, unnest(string_to_array(..., ',')) can be replaced with string_to_table(..., ',').
Then your INSERT statement gets a bit simpler:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = merge_lists(excluded.b, t00.b);
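As a rough sketch of the Postgres 14+ variant mentioned above, the same function could be written with string_to_table. The name merge_lists_v14 is made up here only to avoid clashing with merge_lists; everything else follows the function above.

```sql
-- Hypothetical variant of merge_lists for Postgres 14 or later.
create function merge_lists_v14(p_one text, p_two text)
returns text
as
$$
  select string_agg(item, ',')
  from (
    -- string_to_table returns one row per list element
    select e.item
    from string_to_table(p_one, ',') as e(item)
    where e.item <> ''   -- skip the empty element caused by a leading comma
    union                -- UNION removes duplicates across both lists
    select t.item
    from string_to_table(p_two, ',') as t(item)
    where t.item <> ''
  ) u;
$$
language sql;

-- Quick check outside of an INSERT:
select merge_lists_v14('fish,sheep,dove', ',fish,eagle');
```

The order of the elements in the result is not guaranteed, because neither UNION nor string_agg without ORDER BY promises one.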

I think I was only missing parentheses around the SELECT statement:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = (SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x);
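For comparison, here is a minimal sketch of what the first answer's advice against comma-separated storage might look like in practice. The table name t00_item and its columns are assumptions for illustration, not part of the original schema: each set element becomes its own row, and the primary key does the deduplication.

```sql
-- Hypothetical child table: one row per (user, element).
create table if not exists t00_item (
  userid int8 not null,
  item   text not null,
  primary key (userid, item)
);

insert into t00_item (userid, item)
values (0, 'fish'), (0, 'sheep'), (0, 'dove');

-- Adding to the set: elements that already exist are skipped.
insert into t00_item (userid, item)
values (0, 'fish'), (0, 'eagle')
on conflict (userid, item) do nothing;

-- Rebuild the comma-separated form only when it is needed for display.
select userid, string_agg(item, ',' order by item) as b
from t00_item
group by userid;
```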

Related

Dropping multiple columns based on their name?

I'm creating a procedure/function in PostgreSQL. I have an array containing some column names and a temporary table, as follows:
columns_names varchar[] := array['A','B','C','D'];
table PQR(A integer, B integer, C integer, X integer, Y integer);
I want to drop columns X and Y (i.e. the columns which are not present in the given array).
Is there any way to achieve this in a single statement?
Something like
alter table pqr drop column where columnName not in column_names
You can do that with dynamic SQL, provided you are inside a function, as you mentioned, and its language is plpgsql.
For example:
EXECUTE concat('ALTER TABLE ',
attrelid::regclass::text, ' ',
string_agg(concat('DROP COLUMN ', quote_ident(attname)), ', ')
)
FROM pg_attribute
WHERE attnum > 0
AND NOT attisdropped
AND attrelid = 'PQR'::regclass
-- unquoted identifiers are folded to lower case, so compare against lower-case names
AND attname != ALL(array['a','b','c','d'])
GROUP BY attrelid;
It will only work for one table, otherwise it will complain about returning more than one row.
If you need more tables, then you can use LOOP and execute query in it.
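A minimal sketch of the LOOP variant, assuming two existing tables named pqr and stu (stu is made up for illustration) and the same list of columns to keep:

```sql
DO $$
DECLARE
  keep  text[] := array['a','b','c','d'];  -- columns to keep (lower case)
  tbl   regclass;
  drops text;
BEGIN
  -- 'stu' is a hypothetical second table; the cast fails if a table does not exist
  FOREACH tbl IN ARRAY array['pqr','stu']::regclass[] LOOP
    -- build "DROP COLUMN x, DROP COLUMN y" for this table only
    SELECT string_agg(format('DROP COLUMN %I', attname), ', ')
      INTO drops
      FROM pg_attribute
     WHERE attrelid = tbl
       AND attnum > 0
       AND NOT attisdropped
       AND attname <> ALL (keep);

    -- drops is NULL when the table has no extra columns
    IF drops IS NOT NULL THEN
      EXECUTE format('ALTER TABLE %s %s', tbl, drops);
    END IF;
  END LOOP;
END
$$;
```

Because each iteration aggregates over a single table, the "more than one row" problem does not arise.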

Get all instances of primary keys of a table

For any given table, I need to get all the instances of its primary key. Below is a small example, but I need a generic way to do it.
create table foo
(
a numeric
,b text
,c numeric
,constraint pk_foo primary key (a,b)
);
insert into foo(a,b,c) values (1,'a',1),(2,'b',2),(3,'c',3);
select <the magical thing>
result
  | a | b
1 | 1 | a
2 | 2 | b
3 | 3 | c
.. ...
I need to check whether the instances of the primary keys are changed by the user, but I don't want to repeat code in too many tables! I need a generic way to do it: I will put <the magical thing> in a function and call it from a BEFORE UPDATE trigger, and so on.
In PostgreSQL you must always provide a resulting type for a query. However, you can obtain the code of the query you need, and then execute the query from the client:
create or replace function get_key_only_sql(regclass) returns text as $$
-- builds "select <primary key columns> from <table>" for the given table
select 'select '|| (
select string_agg(quote_ident(att.attname), ', ' order by col)
from pg_index i
join lateral unnest(indkey) col on (true)
join pg_attribute att on (att.attrelid = i.indrelid and att.attnum = col)
where i.indrelid = $1 and i.indisprimary
group by i.indexrelid
limit 1) || ' from '||$1::text
$$ language sql;
Here's some client pseudocode using the function above:
sql = pgexecscalar("select get_key_only_sql('mytable'::regclass)");
rs = pgopen(sql);
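For the trigger use case described in the question, a generic trigger function is another possible route. The following is only a sketch under assumptions: the name forbid_pk_change is invented, "control" is interpreted as "reject the update", and to_jsonb requires Postgres 9.5 or later.

```sql
create or replace function forbid_pk_change() returns trigger as $$
declare
  col text;
begin
  -- look up the primary key columns of the table that fired the trigger
  for col in
    select att.attname::text
    from pg_index i
    join pg_attribute att
      on att.attrelid = i.indrelid
     and att.attnum = any (i.indkey)
    where i.indrelid = tg_relid
      and i.indisprimary
  loop
    -- compare old and new values of that column by name
    if (to_jsonb(old) ->> col) is distinct from (to_jsonb(new) ->> col) then
      raise exception 'primary key column "%" of table "%" must not change',
        col, tg_table_name;
    end if;
  end loop;
  return new;
end;
$$ language plpgsql;

-- The same function can be attached to any table:
create trigger foo_forbid_pk_change
before update on foo
for each row execute procedure forbid_pk_change();
```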

Get columns that differ between 2 rows

I have a table company with 60 columns. The goal is to create a tool to find, compare and eliminate duplicates in this table.
Example: I find 2 companies that potentially are the same, but I need to know which values (columns) differ between these 2 rows in order to continue.
I think it is possible to compare column by column, 60 times, but I am looking for a simpler and more generic solution.
Something like:
SELECT * FROM company where co_id=22
SHOW DIFFERENCE
SELECT * FROM company where co_id=33
The result should be the column names that differ.
For this you may use an intermediate key/value representation of the rows, with JSON functions or alternatively with the hstore extension (now only of historical interest). JSON comes built-in with every reasonably recent version of PostgreSQL, whereas hstore must be installed in the database with CREATE EXTENSION.
Demo:
CREATE TABLE table1 (id int primary key, t1 text, t2 text, t3 text);
Let's insert two rows that differ by the primary key and one other column (t3).
INSERT INTO table1 VALUES
(1,'foo','bar','baz'),
(2,'foo','bar','biz');
Solution with json
First we get a key/value representation of the rows along with their original row number, then we pair the rows based on that row number and filter out the pairs with the same "value" column.
WITH rowcols AS (
select rn, key, value
from (select row_number() over () as rn,
row_to_json(table1.*) as r from table1) AS s
cross join lateral json_each_text(s.r)
)
select r1.key from rowcols r1 join rowcols r2
on (r1.rn=r2.rn-1 and r1.key = r2.key)
where r1.value <> r2.value;
Sample result:
key
-----
id
t3
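The query above pairs consecutive rows. For the question's case of two known ids, a sketch along the same lines could compare just those two rows; it assumes the table1 demo table above and uses jsonb (Postgres 9.5 or later) instead of json.

```sql
select a.key
from jsonb_each_text((select to_jsonb(t) from table1 t where t.id = 1)) as a
join jsonb_each_text((select to_jsonb(t) from table1 t where t.id = 2)) as b
  using (key)
-- IS DISTINCT FROM also reports a column that is NULL in only one of the rows
where a.value is distinct from b.value;
```

With the two demo rows this reports the id and t3 columns.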
Solution with hstore
SELECT skeys(h1-h2) from
(select hstore(t.*) as h1 from table1 t where id=1) h1
CROSS JOIN
(select hstore(t.*) as h2 from table1 t where id=2) h2;
h1-h2 computes the difference key by key and skeys() outputs the result as a set.
Result:
skeys
-------
id
t3
The select-list might be refined with skeys((h1-h2)-'id'::text) to always remove id which, as the primary key, will obviously always differ between rows.
Here's a stored procedure that should get you most of the way...
While this should work "as is", it has no error checking, which you should add.
It gets all the columns in the table, and loops over them. A difference is when the count of the distinct items is more than one.
Also, the output is:
The count of the number of differences
Messages for each column where there is a difference
It might be more useful to return a rowset of the columns with the differences. Anyway, good luck!
Usage:
SELECT showdifference('public','company','co_id',22,33)
CREATE OR REPLACE FUNCTION showdifference(p_schema text, p_tablename text,p_idcolumn text,p_firstid integer, p_secondid integer)
RETURNS INTEGER AS
$BODY$
DECLARE
l_diffcount INTEGER;
l_column text;
l_dupcount integer;
column_cursor CURSOR FOR select column_name from information_schema.columns where table_name = p_tablename and table_schema = p_schema and column_name <> p_idcolumn;
BEGIN
-- need error checking here, to ensure the table and schema exist and the columns exist
-- Should also check that the records ids exist.
-- Should also check that the column type of the id field is integer
-- Set the number of differences to zero.
l_diffcount := 0;
-- use a cursor to iterate over the columns found in information_schema.columns
-- open the cursor
OPEN column_cursor;
LOOP
FETCH column_cursor INTO l_column;
EXIT WHEN NOT FOUND;
-- build a query to see if there is a difference between the columns. If there is raise a notice
EXECUTE 'select count(distinct ' || quote_ident(l_column) || ' ) from ' || quote_ident(p_schema) || '.' || quote_ident(p_tablename) || ' where ' || quote_ident(p_idcolumn) || ' in ('|| p_firstid || ',' || p_secondid ||')'
INTO l_dupcount;
IF l_dupcount > 1 THEN
-- increment the counter
l_diffcount := l_diffcount +1;
RAISE NOTICE '% has % differences', l_column, l_dupcount ; -- for "real" you might want to return a rowset and could do something here
END IF;
END LOOP;
-- close the cursor
CLOSE column_cursor;
RETURN l_diffcount;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
COST 100;
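As suggested above, returning a rowset may be more useful than notices. One possible sketch follows; the function name diff_columns is made up, and it compares values by their text form through to_jsonb (Postgres 9.5 or later) rather than with count(distinct ...).

```sql
create or replace function diff_columns(
  p_table regclass, p_idcolumn text, p_firstid integer, p_secondid integer)
returns table (column_name text)
language plpgsql stable
as
$$
begin
  -- %1$s is the (already safely quoted) table name, %2$I / %2$L the id column;
  -- $1 and $2 are bound to the two ids through USING
  return query execute format(
    'select a.key
       from jsonb_each_text((select to_jsonb(t) from %1$s t where %2$I = $1)) a
       join jsonb_each_text((select to_jsonb(t) from %1$s t where %2$I = $2)) b
         using (key)
      where a.key <> %2$L
        and a.value is distinct from b.value',
    p_table, p_idcolumn)
  using p_firstid, p_secondid;
end;
$$;

-- Usage:
select * from diff_columns('public.company', 'co_id', 22, 33);
```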

T-SQL Loop in a stored proc

How do I loop through a comma-separated variable using T-SQL in a stored proc?
So for instance my list would look like this:
"1,2,3,4,5,6,7,8,9,10"
and I would loop through this list and make some necessary table
inserts based on this list.
You could do it a couple of ways, but if this is a list of IDs it could be done like this as well. It would change your list format a bit.
UPDATE table
SET column = value
WHERE ID in ('1','2','3','4','5','6','7','8','9','10')
You could do a loop as well
DECLARE @List VARCHAR(100)
DECLARE @ListItem int
DECLARE @Pos int
SET @List = '1,2,3,4,5,6,7,8,9,10'
WHILE LEN(@List) > 0
BEGIN
--Pull Item From List
SET @Pos = CHARINDEX(',', @List)
IF @Pos = 0
BEGIN
SET @ListItem = @List
END
ELSE
BEGIN
SET @ListItem = SUBSTRING(@List, 1, @Pos - 1)
END
UPDATE table
SET column = value
WHERE ID = @ListItem
--Remove Item From List
IF @Pos = 0
BEGIN
SET @List = ''
END
ELSE
BEGIN
SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List) - @Pos)
END
END
I'd try to avoid looping and insert the rows directly from your comma list.
Use a table-valued parameter (new in SQL Server 2008). Set it up by creating the actual table parameter type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
@Ids IntTableType READONLY
AS
SELECT *
FROM ATable a
WHERE a.Id IN (SELECT ID FROM @Ids)
RETURN 0
GO
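Calling such a procedure might look like the following sketch. The variable name @SelectedIds is made up; IntTableType and up_TEST are the objects created above.

```sql
-- Declare a variable of the table type and fill it like a normal table.
DECLARE @SelectedIds IntTableType;

INSERT INTO @SelectedIds (ID)
VALUES (1), (2), (3);

-- Pass the whole set to the procedure in one call; no string splitting needed.
EXEC up_TEST @Ids = @SelectedIds;
```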
If you can't use table-valued parameters, see "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog. There are many ways to split a string in SQL Server, and this article covers the pros and cons of just about every method. In general, you need to create a split function. This is how a split function can be used to insert rows:
INSERT INTO YourTableA (colA)
SELECT
b.col1
FROM dbo.yourSplitFunction(@Parameter) b
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
@Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',',@Ids))
GO
or insert rows from it:
Create Procedure up_TEST
@Ids VARCHAR(MAX)
,@OtherValue varchar(5)
AS
INSERT INTO YourTableA
(colA, colB, colC)
SELECT
ListValue, @OtherValue, GETDATE()
FROM dbo.FN_ListToTable(',',@Ids)
GO
Using a CTE (Common Table Expression) is the most elegant solution, I think. Check this question on Stack Overflow:
T-SQL: Opposite to string concatenation - how to split string into multiple records

Most succinct way to transform a CSV string to a table in T-SQL?

-- Given a CSV string like this:
declare @roles varchar(800)
select @roles = 'Pub,RegUser,ServiceAdmin'
-- Question: How to get roles into a table view like this:
select 'Pub'
union
select 'RegUser'
union
select 'ServiceAdmin'
After posting this, I started playing with some dynamic SQL. This seems to work, but seems like there might be some security risks by using dynamic SQL - thoughts on this?
declare @rolesSql varchar(800)
select @rolesSql = 'select ''' + replace(@roles, ',', ''' union select ''') + ''''
exec(@rolesSql)
If you're working with SQL Server at compatibility level 130 or higher, the STRING_SPLIT function is now the most succinct method available.
Reference link: https://msdn.microsoft.com/en-gb/library/mt684588.aspx
Usage:
SELECT * FROM string_split('Pub,RegUser,ServiceAdmin',',')
RESULT:
value
-----------
Pub
RegUser
ServiceAdmin
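STRING_SPLIT can also feed an INSERT directly, which covers the "get the roles into a table" part of the question. A small sketch follows; the table variable @RoleTable and its column are made up for illustration.

```sql
DECLARE @roles varchar(800) = 'Pub,RegUser,ServiceAdmin';
DECLARE @RoleTable TABLE (RoleName varchar(100) NOT NULL);

-- One row per element; trim stray spaces and skip empty elements.
INSERT INTO @RoleTable (RoleName)
SELECT LTRIM(RTRIM(value))
FROM STRING_SPLIT(@roles, ',')
WHERE LTRIM(RTRIM(value)) <> '';

SELECT RoleName FROM @RoleTable;
```

Unlike the dynamic SQL attempt in the question, the input here is never executed as code, so it cannot be used for SQL injection.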
See my answer from here
But basically you would:
Create this function in your DB:
CREATE FUNCTION dbo.Split(@origString varchar(max), @Delimiter char(1))
returns @temptable TABLE (items varchar(max))
as
begin
declare @idx int
declare @split varchar(max)
select @idx = 1
if len(@origString )<1 or @origString is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@origString)
if @idx!=0
set @split= left(@origString,@idx - 1)
else
set @split= @origString
if(len(@split)>0)
insert into @temptable(Items) values(@split)
set @origString= right(@origString,len(@origString) - @idx)
if len(@origString) = 0 break
end
return
end
and then call the function and pass in the string you want to split.
Select * From dbo.Split(@roles, ',')
Here's a thorough discussion of your options:
Arrays and Lists in SQL Server
What I do in this case is use some string replacement to convert the CSV to JSON and then open the JSON like a table. It may not be suitable for every use case, but it is very simple to get running and works with strings and files. With files you just need to watch your line break character; mostly I find it to be "Char(13)+Char(10)".
declare @myCSV nvarchar(MAX)= N'"Id";"Duration";"PosX";"PosY"
"•P001";223;-30;35
"•P002";248;-28;35
"•P003";235;-26;35'
--CSV to JSON
--convert to json by replacing some stuff
declare @myJson nvarchar(MAX)= '[['+ replace(@myCSV, Char(13)+Char(10), '],[' ) +']]'
set @myJson = replace(@myJson, ';',',') -- Optional: ensure comma delimiters for json if the current delimiter differs
-- set @myJson = replace(@myJson, ',,',',null,') -- Optional: empty in between
-- set @myJson = replace(@myJson, ',]',',null]') -- Optional: empty before linebreak
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 0))-1 AS LineNumber, *
FROM OPENJSON( @myJson )
with (
col0 varchar(255) '$[0]'
,col1 varchar(255) '$[1]'
,col2 varchar(255) '$[2]'
,col3 varchar(255) '$[3]'
,col4 varchar(255) '$[4]'
,col5 varchar(255) '$[5]'
,col6 varchar(255) '$[6]'
,col7 varchar(255) '$[7]'
,col8 varchar(255) '$[8]'
,col9 varchar(255) '$[9]'
--any column names and any column count are possible
) csv
order by (SELECT 0) OFFSET 1 ROWS --hide header row
Using SQL Server's built in XML parsing is also an option. Of course, this glosses over all the nuances of an RFC-4180 compliant CSV.
-- Given a CSV string like this:
declare @roles varchar(800)
select @roles = 'Pub,RegUser,ServiceAdmin'
-- Here's the XML way
select split.csv.value('.', 'varchar(100)') as value
from (
select cast('<x>' + replace(@roles, ',', '</x><x>') + '</x>' as xml) as data
) as csv
cross apply data.nodes('/x') as split(csv)
If you are using SQL 2016+, using string_split is better, but this is a common way to do this prior to SQL 2016.
Using BULK INSERT you can import a csv file into your sql table -
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
The accepted answer works fine, but I found this function to be much faster, even for thousands of records. Create the function below and use it.
IF EXISTS (
SELECT 1
FROM Information_schema.Routines
WHERE Specific_schema = 'dbo'
AND specific_name = 'FN_CSVToStringListTable'
AND Routine_Type = 'FUNCTION'
)
BEGIN
DROP FUNCTION [dbo].[FN_CSVToStringListTable]
END
GO
CREATE FUNCTION [dbo].[FN_CSVToStringListTable] (@InStr VARCHAR(MAX))
RETURNS @TempTab TABLE (Id NVARCHAR(max) NOT NULL)
AS
BEGIN
;-- Ensure input ends with comma
SET @InStr = REPLACE(@InStr + ',', ',,', ',')
DECLARE @SP INT
DECLARE @VALUE VARCHAR(1000)
WHILE PATINDEX('%,%', @INSTR) <> 0
BEGIN
SELECT @SP = PATINDEX('%,%', @INSTR)
SELECT @VALUE = LEFT(@INSTR, @SP - 1)
SELECT @INSTR = STUFF(@INSTR, 1, @SP, '')
INSERT INTO @TempTab (Id)
VALUES (@VALUE)
END
RETURN
END
GO
---Test like this.
declare @v as NVARCHAR(max) = N'asdf,,as34df,234df,fs,,34v,5fghwer,56gfg,';
SELECT Id FROM dbo.FN_CSVToStringListTable(@v)
I was about to use the solution mentioned in the accepted answer, but more research led me to use Table Value Types instead:
These are far more efficient, and you don't need a TVF (table-valued function) just to create a table from CSV. You can use the type directly in your scripts or pass it to a stored procedure as a Table Value Parameter. The type can be created as:
CREATE TYPE [UniqueIdentifiers] AS TABLE(
[Id] [varchar](20) NOT NULL
)