Related
PostgreSQL 11.1 PgAdmin 4.1
This works some of the time:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('JONES')), UPPER(TRIM('TOM')), '1952-12-30'::date,
64::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
63::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
33::integer
)
),
_s AS ( -- RESOLVE ALL SURROGATE KEYS.
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _s n
WHERE (r.trx::date, r.disp, r.patient_recid, r.medication_recid)=(n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
RETURNING r.*
),
_u AS( -- GET NEW SURROGATE KEY.
SELECT COALESCE(_t.recid, r.recid) as target_recid, r.recid as old_recid
FROM _s n
JOIN rx r ON (r.trx, r.disp, r.patient_recid, r.medication_recid) = (n.trx, n.old_disp, n.patient_recid, n.old_medication_recid)
LEFT JOIN _t ON (_t.trx::date, _t.disp, _t.patient_recid, _t.medication_recid) = (n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
)
UPDATE rx r -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = n.new_disp, medication_recid = n.new_medication_recid, refills = n.new_refills, recid = _u.target_recid
FROM _s n, _u
WHERE r.recid = _u.old_recid
RETURNING r.*;
COMMIT;
Where table rx is defined as:
CREATE TABLE phoenix.rx
(
recid integer NOT NULL DEFAULT nextval('rx_recid_seq'::regclass),
trx timestamp without time zone NOT NULL,
disp integer NOT NULL,
refills integer,
tprinted timestamp without time zone,
tstop timestamp without time zone,
modified timestamp without time zone DEFAULT now(),
patient_recid integer NOT NULL,
medication_recid integer NOT NULL,
dposted date NOT NULL,
CONSTRAINT pk_rx_recid PRIMARY KEY (recid),
CONSTRAINT rx_unique UNIQUE (dposted, disp, patient_recid, medication_recid)
DEFERRABLE,
CONSTRAINT rx_medication_fk FOREIGN KEY (medication_recid)
REFERENCES phoenix.medications (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
DEFERRABLE,
CONSTRAINT rx_patients FOREIGN KEY (patient_recid)
REFERENCES phoenix.patients (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
)
After many hours, it is found that the "Delete.." of a conflicting record works as expected, but the "COALESCE" STATEMENT seems to fail when deciding on the new surrogate key (primary key) of rx.recid -- it does not seem to receive the result of the delete. (Or maybe the timing is wrong???)
Any help would be most appreciated.
TIA
This is documented:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13, so they cannot “see” one another's effects on the target tables.
Don't use the same table twice in a statement with a CTE if it occurs in a DML statement. Rather, use DELETE ... RETURNING and use the returned values in the other parts of the statement.
If you cannot rewrite the statement like that, use more than one statement instead of putting everything into a single CTE.
#LaurenzAlbe is totally correct in his answer. Below is a working solution to my problem. There are a few things to note:
The unique constraint is formed on a column in rx defined as a date and created by a trigger on update/insert that casts the timestamp of trx to a date: as in trx::date. For reasons I am not clear on, using r.trx::date in place of r.dposted leads to many records being identified and not the one record I want. Not sure why???. So the first fix was to use r.dposted, not r.trx::date.
Although the cte's are designed to be independent of each other, by using "RETURNING..." and incorporating the cte's in a step-wise fashion, one can be built upon another to obtain a final result set.
The working code is:
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('smith')), UPPER(TRIM('john')), '1957-12-30'::date,
28::integer,
LOWER(TRIM('test')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
28::integer,
LOWER(TRIM('test 1')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
8::integer
)
),
_m AS (
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
LEFT JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _m
WHERE (r.dposted, r.disp, r.patient_recid, r.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
RETURNING r.*
),
_s AS ( -- GET NEW SURROGATE KEY
SELECT _m.*, r1.recid as old_recid, r2.recid as new_recid, COALESCE(r2.recid, r1.recid) as target_recid
FROM _m
JOIN rx r1 ON (r1.dposted, r1.disp, r1.patient_recid, r1.medication_recid) = (_m.trx::date,_m.old_disp, _m.patient_recid, _m.old_medication_recid)
LEFT JOIN rx r2 ON (r2.dposted, r2.disp, r2.patient_recid, r2.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
LEFT JOIN _t ON (_t.recid = r2.recid)
)
UPDATE rx -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = _s.new_disp, medication_recid = _s.new_medication_recid, refills = _s.new_refills, recid = _s.target_recid
FROM _s
WHERE rx.recid = _s.old_recid
RETURNING rx.*;
COMMIT;
Hope this helps somebody.
I have a field in my database table called ADDRESSFORMAT
1,The Lodge
Street
Town
Postcode
Where the contents are separated by a CHAR (13) and CHAR (10)
How would I go about creating fields in a query that would only pull back either the first line, second line...and so on?
The following is an in-line approach.
The Cross Apply B generates a "clean string". It will eliminate any number of repeating CRLFs and create a pipe delimited string to be processed by Cross Appy C.
I should note that this method of eliminating repeating strings was demonstrated by Gordon Linoff several weeks back. Sorry I can't find the original post.
Example
Declare #YourTable table (ID int,ADDRESSFORMAT varchar(max))
Insert Into #YourTable values
(1,'The Lodge
Street
Town
Postcode')
Select A.ID
,C.*
From #YourTable A
Cross Apply (
Select CleanString = replace(replace(replace(replace(replace(ADDRESSFORMAT,char(13),'|'),char(10),'|'),'|','><'),'<>',''),'><','|')
) B
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(B.CleanString,'|','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
) C
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 The Lodge Street Town Postcode NULL NULL NULL NULL NULL
I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what do you need to do with this data. If you really need to process it entirely in the database (looks like the task for your favorite scripting language instead), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have table with columns exactly matching field names (note the quotes, populate_record is case-sensitive to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.
For Iseries/IBMi DB2.
I am joining multiple files/tables together.
I have written the code in both DDS and SQL.
The DDS Logical File is working exactly as expected, but I can not use it for embedded sql in rpgle as it then defaults to the SQE engine resulting in horrendous performance.
The SQL view, on the other hand had NULLs until I used IFNULL( MBRDESCR, ''). But now MBRDECSR is a VARCHAR. Which is unacceptable.
So how do I create a SQL join without NULLs and VARCHARs?
Requested Sample Code:
DDS:
JDFTVAL
R TRANSR JFILE(TRANSPF MBRPF)
J JOIN(1 2)
JFLD(MBRID MBRID)
*
TRANSID JREF(1)
MBRID JREF(1)
MBRNAME JREF(2)
MBRSURNME JREF(2)
*
K TRANSID
K MBRID
SQL:
CREATE VIEW TRANSV01 AS (
SELECT TRANSID ,
MBRID ,
CAST(IFNULL(MBRNAME , '') as Char(20)) ,
CAST(IFNULL(MBRSURNME, '') as Char(25))
FROM TRANSPF
--Member Name
LEFT OUTER JOIN MBRPF on MBRID = MBRID
) RCDFMT TRANSR;
Please note the following:
Example above is simplified
Not every MBRID in the TRANSPF has a corresponding entry in the MBRPF (ie. no referential constraint). Thus when MBRPF is joined to the TRANSPF, there will be NULL values in MBRNAME, MBRSURNME. Unless JDFTVAL or IFNULL() is used.
I prefer not to have a VARCHAR, because of performance and extname() in rpgle.
I prefer not to have NULL values, I do not want the pgm to have to handle them.
Assuming it's the 'Allows the null value' that you find undesirable, use a UNION. The first SELECT chooses all the rows that match, which will set the NOT NULL property for you. The second SELECT chooses all the rows that don't have a match - you provide filler fields for those.
CREATE VIEW TRANSV01 AS (
SELECT TRANSID ,
MBRID ,
MBRNAME ,
MBRSURNME
FROM TRANSPF
--Member Name
JOIN MBRPF on MBRID = MBRID
UNION
SELECT TRANSID ,
MBRID ,
CAST('') as Char(20)) ,
CAST('') as Char(25))
FROM TRANSPF
--Member Name
EXCEPTION JOIN MBRPF on MBRID = MBRID
) RCDFMT TRANSR;
I'm trying to figure out a way to store metadata about a column without repeating myself.
I'm currently working on a generic dimension loading SSIS package that will handle all my dimensions. It currently does :
Create a temporary table identical to the given table name in parameters (this is a generic stored procedure that receive the table name as parameter, and then do : select top 0 * into ##[INSERT ORIGINAL TABLE NAME HERE] from [INSERT ORIGINAL TABLE NAME HERE]).
==> Here we insert custom code for this particular dimension that will first query the data from a datasource and get my delta, then transform the data and finally loads it into my temporary table.
Merge the temporary table into my original table with a T-SQL MERGE, taking care of type1 and type2 fields accordingly.
My problem right now is that I have to maintain a table with all the fields in it to store a metadata to tell my scripts if this particular field is type1 or type2... this is nonsense, I can get the same data (minus type1/type2) from sys.columns/sys.types.
I was ultimately thinking about renaming my fields to include their type in it, such as :
FirstName_T2, LastName_T2, Sex_T1 (well, I know this can be type2, let's not fall into that debate here).
What do you guyz would do with that? My solution (using a table with that metadata) is currently in place and working, but it's obvious that repeating myself from the systables to a custom table is nonsense, just for a simple type1/type2 info.
UPDATE: I also thought about creating user defined types like varchar => t1_varchar, t2_varchar, etc. This sounds like something a bit sluggy too...
Everything you need should already be in INFORMATION_SCHEMA.COLUMNS
I can't follow your thinking of not using provided tables/views...
Edit: As scarpacci mentioned, this somewhat portable if needed.
I know this is bad, but I will post an answer to my own question... Thanks to GBN for the help tho!
I am now storing "flags" in the "description" field of my columns. I, for example, can store a flag this way : "TYPE_2_DATA".
Then, I use this query to get the flag back for each and every column :
select columns.name as [column_name]
,types.name as [type_name]
,extended_properties.value as [column_flags]
from sys.columns
inner join sys.types
on columns.system_type_id = types.system_type_id
left join sys.extended_properties
on extended_properties.major_id = columns.object_id
and extended_properties.minor_id = columns.column_id
and extended_properties.name = 'MS_Description'
where object_id = ( select id from sys.sysobjects where name = 'DimDivision' )
and is_identity = 0
order by column_id
Now I can store metadata about columns without having to create a separate table. I use what's already in place and I don't repeat myself. I'm not sure this is the best possible solution yet, but it works and is far better than duplicating information.
In the future, I will be able to use this field to store more metadata, where as : "TYPE_2_DATA|ANOTHER_FLAG|ETC|OH BOY!".
UPDATE :
I now store the information in separate extended properties. You can manage extended properties using sp_addextendedproperty and sp_updateextendedproperty stored procedures. I have created a simple store procedure that help me to update those values regardless if they currently exist or not :
create procedure [dbo].[UpdateSCDType]
#tablename nvarchar(50),
#fieldname nvarchar(50),
#scdtype char(1),
#dbschema nvarchar(25) = 'dbo'
as
begin
declare #already_exists int;
if ( #scdtype = '1' or #scdtype = '2' )
begin
select #already_exists = count(1)
from sys.columns
inner join sys.extended_properties
on extended_properties.major_id = columns.object_id
and extended_properties.minor_id = columns.column_id
and extended_properties.name = 'ScdType'
where object_id = (select sysobjects.id from sys.sysobjects where sysobjects.name = #tablename)
and columns.name = #fieldname
if ( #already_exists = 0 )
begin
exec sys.sp_addextendedproperty
#name = N'Scd_Type',
#value = #scdtype,
#level0type = N'SCHEMA',
#level0name = #dbschema,
#level1type = N'TABLE',
#level1name = #tablename,
#level2type = N'COLUMN',
#level2name = #fieldname
end
else
begin
exec sys.sp_updateextendedproperty
#name = N'Scd_Type',
#value = #scdtype,
#level0type = N'SCHEMA',
#level0name = #dbschema,
#level1type = N'TABLE',
#level1name = #tablename,
#level2type = N'COLUMN',
#level2name = #fieldname
end
end
end
Thanks again