Concate text fields with PostgreSQL - postgresql

What is wrong with this query?
SELECT gid, (tex1||' '||tex2) AS ident FROM my_table ;
The structure of my_table is as follow:
gid serial NOT NULL
tex1 CHARACTER VARYING(254)
tex2 CHARACTER VARYING(254)
The content of my_table is:
gid | tex1 | tex2
----+--------+--------
1 | A | dog
2 | Two | birds
3 | More | things
The result of the query is:
gid | ident
----+-------
1 |
2 |
3 |
I would have never thought having troubles with such a simple query...
Thanks for help !

Are you sure that the sample data you are providing is correct?
What is the output of: SELECT gid, (tex1||' '||tex2) IS NULL FROM my_table;?
It seems that either tex1 or tex2 (or both) is NULL, thus the concatenation also produces NULL. Use COALESCE to provide a non null default value to use in such cases.
SELECT
gid,
(COALESCE(tex1, '') || ' ' || COALESCE(tex2, '')) AS ident
FROM
my_table;

Related

SQL WHERE condition that one field's string can be found in another field

Here's some sample data
ID | sys1lname | sys2lname
------------------------------------
1 | JOHNSON | JOHNSON
2 | FULTON | ANDERS-FULTON
3 | SMITH | SMITH-DAVIDS
4 | HARRISON | JONES
The goal is to find records where the last names do NOT match, BUT allow when sys1lname can be found somewhere within sys2lname, which may or may not be a hyphenated name. So from the above data, only record 4 should return.
When I put this (SUBSTRING(sys2lname, CHARINDEX(sys2lname, ccm.NAME_LAST), LEN(sys1lname))) in the SELECT statement it will properly return the part of sys2lname that matches sys1lname.
But when I use that in the WHERE clause
WHERE 1=1
AND sys1lname <> sys2lname
OR sys1lname not in ('%' + (SUBSTRING(sys2lname, CHARINDEX(sys1lname, sys2lname), LEN(sys1lname))))
the records with hyphenated names are in the result set.
And I can't figure out why.
Just use a NOT LIKE:
SELECT ID
FROM dbo.YourTable
WHERE sys2lname NOT LIKE '%' + sys1lname + '%';
If you could have a name like 'Smith' in sys1lname and 'BlackSmith' (or even 'Green-Blacksmith') in sys2lname and don't want them to match, I would use STRING_SPLIT and a NOT EXISTS:
SELECT ID
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM STRING_SPLIT(YT.sys2lname,'-') SS
WHERE SS.[value] = YT.sys1lname);

DB2: How to transpose mutlidimensional table from row to column to find data changes across rows

I am trying the following with Db2:
Problem
So I've got a table with 80+ columns and two rows.
I need to accomplish is checking what columns have changed value between the two rows, and return a table of the column names that have changed, their initial value from row1, and their new value from row2.
Approach so far
My initial idea was to perform a pivot of the two rows into two columns, row 1 as column 1, row 2 as column 2, then join a column of column names (likely taken from syscat.columns) to the table as column 3, at which point I can then do a select where column1 != column2, hence returning the rows with all the data needed. But alas, it was not long after coming up with this that I discover DB2 doesn't support pivot / unpivot...
Question
So is there any idea for how to accomplish this in DB2, taking a table with 80+ columns and two rows like so:
| Col A | Col B | Col C | ... | Col Z|
| ----- | ----- | ----- | --- | ---- |
| Val A | Val B | 123 | ... | 01/01/2021 |
| Val C | Val B | 124 | ... | 02/01/2021 |
And returning a table with the columns changed, their initial value, and their new value:
| Initial | New | ColName|
| ----- | ----- | ----- |
| Val A | Val C | Col A |
| 123 | 124 | Col C |
| 01/01/2021 | 02/01/2021 | Col Z |
Also note the column data types also vary, so will need to be converted to varchar
DB2 version is 11.1
EDIT: Also for reference as per comment request, this is code I attempted to use to achieve this goal:
WITH
INIT AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MIN(SOMEDATE) FROM TABLE),
LATE AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MAX(SOMEDATE) FROM TABLE),
COLS AS (SELECT COLNAME FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE' ORDER BY COLNO)
SELECT * FROM (
SELECT
COLNAME AS ATTRIBUTE,
(SELECT COLNAME AS INITIAL FROM INIT),
(SELECT COLNAME AS NEW FROM LATE)
FROM
COLS
WHERE
(INITIAL != NEW) OR (INITIAL IS NULL AND NEW IS NOT NULL) OR (INITIAL IS NOT NULL AND NEW IS NULL));
Only issue with this one is that I couldn't figure how to use the values from the COLS table as the columns to be selected
You may easily generate text of the expressions needed, if you don't want to type them manually.
Consider the following example, if you want to print different column values only in 2 rows of the same quite a wide table SYSCAT.TABLES. We use the following query for such an expression generation.
SELECT
'DECODE(I.I, '
|| LISTAGG(COLNO || ', A.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS INITIAL' AS EXPR_INITIAL
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', B.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS NEW' AS EXPR_NEW
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', ''' || COLNAME || '''', ', ')
|| ') AS COLNAME' AS EXPR_COLNAME
FROM SYSCAT.COLUMNS C
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB';
It doesn't matter how many columns the table contains. We just filter out the columns of *LOB types as an example. If you want them as well, you should change the ::VARCHAR(128) casting to some ::CLOB(XXX).
These 3 generated expressions we put to the corresponding places in the query below:
WITH MYTAB AS
(
-- We enumerate the rows to reference them later
SELECT ROWNUMBER() OVER () RN_, T.*
FROM SYSCAT.TABLES T
WHERE TABSCHEMA = 'SYSCAT'
FETCH FIRST 2 ROWS ONLY
)
SELECT *
FROM
(
SELECT
-- Place here the result got in the EXPR_INITIAL column
-- , Place here the result got in the EXPR_NEW column
-- , Place here the result got in the EXPR_COLNAME column
FROM MYTAB A, MYTAB B
,
(
SELECT COLNO AS I
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB'
) I
WHERE A.RN_ = 1 AND B.RN_ = 2
)
WHERE INITIAL IS DISTINCT FROM NEW;
The result I got in my database:
|INITIAL |NEW |COLNAME |
|--------------------------|--------------------------|---------------|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|ALTER_TIME |
|26 |15 |COLCOUNT |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|CREATE_TIME |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|INVALIDATE_TIME|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|LAST_REGEN_TIME|
|ATTRIBUTES |AUDITPOLICIES |TABNAME |

Fetch records with distinct value of one column while replacing another col's value when multiple records

I have 2 tables that I need to join based on distinct rid while replacing the column value with having different values in multiple rows. Better explained with an example set below.
CREATE TABLE usr (rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(12) NOT NULL,
email VARCHAR(20) NOT NULL);
CREATE TABLE usr_loc
(rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
code CHAR NOT NULL PRIMARY KEY,
loc_id INT NOT NULL PRIMARY KEY);
INSERT INTO usr VALUES
(1,'John','john#product'),
(2,'Linda','linda#product'),
(3,'Greg','greg#product'),
(4,'Kate','kate#product'),
(5,'Johny','johny#product'),
(6,'Mary','mary#test');
INSERT INTO usr_loc VALUES
(1,'A',4532),
(1,'I',4538),
(1,'I',4545),
(2,'I',3123),
(3,'A',4512),
(3,'A',4527),
(4,'I',4567),
(4,'A',4565),
(5,'I',4512),
(6,'I',4567);
(6,'I',4569);
Required Result Set
+-----+-------+------+-----------------+
| rid | name | Code | email |
+-----+-------+------+-----------------+
| 1 | John | B | 'john#product' |
| 2 | Linda | I | 'linda#product' |
| 3 | Greg | A | 'greg#product' |
| 4 | Kate | B | 'kate#product' |
| 5 | Johny | I | 'johny#product' |
| 6 | Mary | I | 'mary#test' |
+-----+-------+------+-----------------+
I have tried some queries to join and some to count but lost with the one which exactly satisfies the whole scenario.
The query I came up with is
SELECT distinct(a.rid)as rid, a.name, a.email, 'B' as code
FROM usr
JOIN usr_loc b ON a.rid=b.rid
WHERE a.rid IN (SELECT rid FROM usr_loc GROUP BY rid HAVING COUNT(*) > 1);`
You need to group by the users and count how many occurrences you have in usr_loc. If more than a single one, then replace the code by B. See below:
select
rid,
name,
case when cnt > 1 then 'B' else min_code end as code,
email
from (
select u.rid, u.name, u.email, min(l.code) as min_code, count(*) as cnt
from usr u
join usr_loc l on l.rid = u.rid
group by u.rid, u.name, u.email
) x;
Seems to me that you are using MySQL, rather than IBM DB2. Is that so?

Function to flag duplicates in within query Postgresql

I would like to write a function that flags duplicates in specified columns in postgresql.
For example, if I had the following table:
country | landscape | household
--------------------------------
TZA | L01 | HH02
TZA | L01 | HH03
KEN | L02 | HH01
RWA | L03 | HH01
I would like to be able to run the following query:
SELECT country,
landscape,
household,
flag_duplicates(country, landscape) AS flag
FROM mytable
And get the following result:
country | landscape | household | flag
---------------------------------------
TZA | L01 | HH02 | duplicated
TZA | L01 | HH03 | duplicated
KEN | L02 | HH01 |
RWA | L03 | HH01 |
Inside the body of the function, I think I need something like:
IF (country || landscape IN (SELECT country || landscape FROM mytable
GROUP BY country || landscape)
HAVING count(*) > 1) THEN 'duplicated'
ELSE NULL
But I am confused about how to pass all of those as arguments. I appreciate the help. I am using postgresql version 9.3.
You don't need a function to accomplish that. Using function for every row in result set is not so good idea because of performance. A way better solution is use pure SQL (even with subqueries) and give database engine chance to optimize it. In your very example it should be something like that:
SELECT t.country,t.landscape,t.household,case when duplicates.count>1 then 'duplicate'end
FROM mytable t JOIN (
SELECT count(household) FROM mytable GROUP BY country,landscape
) duplicates ON duplicates.country=t.country AND duplicates.landscape=t.landscape
which produces exactly the same result.
Update - if You want to use function at all cost, here is working example:
CREATE FUNCTION find_duplicates(arg_country varchar, arg_landscape varchar) returns varchar AS $$
BEGIN
RETURN CASE WHEN count(household)>1 THEN 'duplicated' END FROM mytable
WHERE country=arg_country AND landscape=arg_landscape
GROUP BY country,landscape;
END
$$
LANGUAGE plpgsql STABLE;
select
*,
(count(*) over (partition by country, landscape)) > 1 as flag
from
mytable;
For function look at the #MarcinH answer but add stable to the function's definition to make its calls faster.

Postgres: How to JOIN

I am working on Postgres 9.3. I have two tables, the first for payment items:
Table "public.prescription"
Column | Type | Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('frontend_prescription_id_seq'::regclass)
presentation_code | character varying(15) | not null
presentation_name | character varying(1000) | not null
actual_cost | double precision | not null
pct_id | character varying(3) | not null
And the second for organisations:
Table "public.pct"
Column | Type | Modifiers
-------------------+-------------------------+-----------
code | character varying(3) | not null
name | character varying(200) |
I have a query to get all the payments for a particular code:
SELECT sum(actual_cost) as total_cost, pct_id as row_id
FROM prescription
WHERE presentation_code='1234' GROUP BY pct_id
Here is the query plan for that query.
Now, I'd like to annotate each row with the name property of the associated organisation. This is what I'm trying:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription, pct
WHERE prescription.presentation_code='0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;
Here's the ANALYSE for that query. It's incredibly slow: what am I doing wrong?
I think there must be a way to annotate each row with the pct.name AFTER the first query has run, which would be faster.
With JOIN (LEFT JOIN in this case, because we want the line even if there is no pct):
SELECT
sum(prescription.actual_cost) as total_cost,
prescription.pct_id,
pct.name as row_id
FROM prescription
LEFT JOIN pct ON pct.code = prescription.pct_id
WHERE
prescription.presentation_code='0212000AAAAAAAA'
GROUP BY
prescription.pct_id,
pct.name;
I don't know if it's work well, I didn't try this query.
You are taking data from 2 tables, but you do not join the tables in any way. Effectively, you make a full join, resulting in the Cartesian product of both tables. If you look at your ANALYZE statistics, you see that your nested loop processes 62 million rows. that takes time.
Add in a join condition to make this all fast:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription
JOIN pct On pct.code = prescription.pct_id
WHERE prescription.presentation_code = '0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;