INSERT INTO SELECT JOIN returning null from joined table (Postgres) - postgresql

I'm using the following Knex statement to copy data from two tables into another table:
const insert = knex.from(knex.raw('?? (??, ??, ??, ??, ??, ??, ??)', ['carrier', 'docket_number', 'dot_number', 'legal_name', 'dba_name', 'nbr_power_unit', 'rating', 'carrier_operation']))
.insert(knex('carrier_temp as ct').leftJoin('carrier_census_temp as cen', 'ct.dot_number', 'cen.DOT_NUMBER')
.select(['ct.docket_number as docket_number',
'ct.dot_number as dot_number',
'ct.legal_name as legal_name',
'ct.dba_name as dba_name',
'cen.tot_pwr as nbr_power_unit',
'cen.RATING as rating',
knex.raw('CASE WHEN cen.crrinter = \'A\' THEN \'INTERSTATE\' ELSE \'INTRASTATE\' END as "carrier_operation"')])).toString()
const conflict = knex.raw('ON CONFLICT (docket_number) DO NOTHING;').toString()
const q = insert + conflict
await knex.raw(q).debug()
The generated sql is:
INSERT INTO "carrier"
(
"docket_number",
"dot_number",
"legal_name",
"dba_name",
"nbr_power_unit",
"rating",
"carrier_operation"
)
SELECT "ct"."docket_number" AS "docket_number",
"ct"."dot_number" AS "dot_number",
"ct"."legal_name" AS "legal_name",
"ct"."dba_name" AS "dba_name",
"cen"."tot_pwr" AS "nbr_power_unit",
"cen"."RATING" AS "rating",
CASE
WHEN cen.crrinter = 'A' THEN 'INTERSTATE'
ELSE 'INTRASTATE'
END AS "carrier_operation"
FROM "carrier_temp" AS "ct"
left join "carrier_census_temp" AS "cen"
ON "ct"."dot_number" = "cen"."DOT_NUMBER" ON conflict (docket_number) DO nothing;
The carrier table receives all the columns that are referenced from carrier_temp or ct correctly, but the columns pulled from carrier_census_temp or cen are all ending up as NULL (nbr_power_unit, rating, and carrier_operation). That is, except for the case statement, which is setting every row's carrier_operation to INTRASTATE. If I instead equate NULL in the case statement, it still sets every row to INTRASTATE. Does anyone have any idea why this is?

See comments on question. I totally missed something wrong in my data that led to nothing ever being joined, which just leads everything to always be NULL.

Related

COALESCE failing following CTE Deletion. (PostgreSQL)

PostgreSQL 11.1 PgAdmin 4.1
This works some of the time:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('JONES')), UPPER(TRIM('TOM')), '1952-12-30'::date,
64::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
63::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
33::integer
)
),
_s AS ( -- RESOLVE ALL SURROGATE KEYS.
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _s n
WHERE (r.trx::date, r.disp, r.patient_recid, r.medication_recid)=(n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
RETURNING r.*
),
_u AS( -- GET NEW SURROGATE KEY.
SELECT COALESCE(_t.recid, r.recid) as target_recid, r.recid as old_recid
FROM _s n
JOIN rx r ON (r.trx, r.disp, r.patient_recid, r.medication_recid) = (n.trx, n.old_disp, n.patient_recid, n.old_medication_recid)
LEFT JOIN _t ON (_t.trx::date, _t.disp, _t.patient_recid, _t.medication_recid) = (n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
)
UPDATE rx r -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = n.new_disp, medication_recid = n.new_medication_recid, refills = n.new_refills, recid = _u.target_recid
FROM _s n, _u
WHERE r.recid = _u.old_recid
RETURNING r.*;
COMMIT;
Where table rx is defined as:
CREATE TABLE phoenix.rx
(
recid integer NOT NULL DEFAULT nextval('rx_recid_seq'::regclass),
trx timestamp without time zone NOT NULL,
disp integer NOT NULL,
refills integer,
tprinted timestamp without time zone,
tstop timestamp without time zone,
modified timestamp without time zone DEFAULT now(),
patient_recid integer NOT NULL,
medication_recid integer NOT NULL,
dposted date NOT NULL,
CONSTRAINT pk_rx_recid PRIMARY KEY (recid),
CONSTRAINT rx_unique UNIQUE (dposted, disp, patient_recid, medication_recid)
DEFERRABLE,
CONSTRAINT rx_medication_fk FOREIGN KEY (medication_recid)
REFERENCES phoenix.medications (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
DEFERRABLE,
CONSTRAINT rx_patients FOREIGN KEY (patient_recid)
REFERENCES phoenix.patients (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
)
After many hours, it is found that the "Delete.." of a conflicting record works as expected, but the "COALESCE" STATEMENT seems to fail when deciding on the new surrogate key (primary key) of rx.recid -- it does not seem to receive the result of the delete. (Or maybe the timing is wrong???)
Any help would be most appreciated.
TIA
This is documented:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13, so they cannot “see” one another's effects on the target tables.
Don't use the same table twice in a statement with a CTE if it occurs in a DML statement. Rather, use DELETE ... RETURNING and use the returned values in the other parts of the statement.
If you cannot rewrite the statement like that, use more than one statement instead of putting everything into a single CTE.
#LaurenzAlbe is totally correct in his answer. Below is a working solution to my problem. There are a few things to note:
The unique constraint is formed on a column in rx defined as a date and created by a trigger on update/insert that casts the timestamp of trx to a date: as in trx::date. For reasons I am not clear on, using r.trx::date in place of r.dposted leads to many records being identified and not the one record I want. Not sure why???. So the first fix was to use r.dposted, not r.trx::date.
Although the cte's are designed to be independent of each other, by using "RETURNING..." and incorporating the cte's in a step-wise fashion, one can be built upon another to obtain a final result set.
The working code is:
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('smith')), UPPER(TRIM('john')), '1957-12-30'::date,
28::integer,
LOWER(TRIM('test')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
28::integer,
LOWER(TRIM('test 1')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
8::integer
)
),
_m AS (
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
LEFT JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _m
WHERE (r.dposted, r.disp, r.patient_recid, r.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
RETURNING r.*
),
_s AS ( -- GET NEW SURROGATE KEY
SELECT _m.*, r1.recid as old_recid, r2.recid as new_recid, COALESCE(r2.recid, r1.recid) as target_recid
FROM _m
JOIN rx r1 ON (r1.dposted, r1.disp, r1.patient_recid, r1.medication_recid) = (_m.trx::date,_m.old_disp, _m.patient_recid, _m.old_medication_recid)
LEFT JOIN rx r2 ON (r2.dposted, r2.disp, r2.patient_recid, r2.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
LEFT JOIN _t ON (_t.recid = r2.recid)
)
UPDATE rx -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = _s.new_disp, medication_recid = _s.new_medication_recid, refills = _s.new_refills, recid = _s.target_recid
FROM _s
WHERE rx.recid = _s.old_recid
RETURNING rx.*;
COMMIT;
Hope this helps somebody.

Merge join not able to join properly on varchar column

I've created below code to implement SCD type 2 using merge, when i run the code i'm getting primary key violations on csname field. I have the below values as part of primary key, not sure whether merge SQL does support for varchar or not.
if I run the normal inner join SQL on the same key then i'm getting the matching records as well.
Any help much appreciated
csname
ER - Building Complaints
TR - Building Applications
CREATE PROCEDURE dbo.load_target
AS
BEGIN
INSERT INTO [TR_DW].[enum].[Rt]([csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord])
SELECT[csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord]
FROM
(
MERGE [TR_DW].[enum].[Rt] RtCSQSuTT
USING [TR].[enum].[Rt] RtCSQSuST
ON (RtCSQSuTT.csname = RtCSQSuST.csname)
WHEN NOT MATCHED THEN
INSERT ([csname],[enddatetime],[EffectiveToDate],[EffectiveFromDate],[CurrentRecord])
VALUES ([csname],[enddatetime],'12/31/9999', getdate(), 'Y')
WHEN MATCHED AND RtCSQSuTT.[CurrentRecord] = 'Y' AND
(ISNULL(RtCSQSuTT.[enddatetime], '') != ISNULL(RtCSQSuST.[enddatetime], ''))THEN
UPDATE SET
RtCSQSuTT.[CurrentRecord] = 'N',
RtCSQSuTT.[EffectiveFromDate] = GETDATE() - 1,
RtCSQSuTT.[EffectiveToDate] = GETDATE()
OUTPUT $Action Action_Taken,RtCSQSuST.[csqname],RtCSQSuST.[enddatetime],'12/31/9999' AS[EffectiveToDate],GETDATE() AS[EffectiveFromDate],'Y' AS[CurrentRecord]
)AS MERGE_OUT21
WHERE MERGE_OUT21.Action_Taken = 'UPDATE';
END
GO

sqlalchemy to create temporary table

I created a temporary table with sqlalchemy (with an underlying postgres database) that is going to be joined with a database table. However, in some cases when a value is empty '' then postgres throws the error:
failed to find conversion function from unknown to text
SqlAlchemy assembles everything to the following context
[SQL: 'WITH temp_table AS \n(SELECT %(param_1)s AS id, %(param_2)s AS email, %(param_3)s AS phone)\n SELECT campaigns_contact.id, campaigns_contact.email, campaigns_contact.phone \nFROM campaigns_contact JOIN temp_table ON temp_table.id = campaigns_contact.id AND temp_table.email = campaigns_contact.email AND temp_table.phone = campaigns_contact.phone'] [parameters: {'param_1': 83, 'param_2': '', 'param_3': '+1234567890'}]
I assemble the temporary table as follows
stmts = []
for row in import_data:
row_values = [literal(row[value]).label(value) for value in values]
stmts.append(select(row_values))
subquery = union_all(*stmts)
subquery = subquery.cte(name="temp_table")
The problem seems to be the part here
...%(param_2)s AS email...
which after replacing the param_2 results in
...'' AS email...
which will cause the error mentioned above.
One way to solve the issue is to perform a cast
...''::text AS email...
However, I don't know how to perform ::text cast with sqlalchemy!?

Joining with set-returning function (SRF) and access columns in SQLAlchemy

Suppose I have an activity table and a subscription table. Each activity has an array of generic references to some other object, and each subscription has a single generic reference to some other object in the same set.
CREATE TABLE activity (
id serial primary key,
ob_refs UUID[] not null
);
CREATE TABLE subscription (
id UUID primary key,
ob_ref UUID,
subscribed boolean not null
);
I want to join with the set-returning function unnest so I can find the "deepest" matching subscription, something like this:
SELECT id
FROM (
SELECT DISTINCT ON (activity.id)
activity.id,
x.ob_ref, x.ob_depth,
subscription.subscribed IS NULL OR subscription.subscribed = TRUE
AS subscribed,
FROM activity
LEFT JOIN subscription
ON activity.ob_refs #> array[subscription.ob_ref]
LEFT JOIN unnest(activity.ob_refs)
WITH ORDINALITY AS x(ob_ref, ob_depth)
ON subscription.ob_ref = x.ob_ref
ORDER BY x.ob_depth DESC
) sub
WHERE subscribed = TRUE;
But I can't figure out how to do that second join and get access to the columns. I've tried creating a FromClause like this:
act_ref_t = (sa.select(
[sa.column('unnest', UUID).label('ob_ref'),
sa.column('ordinality', sa.Integer).label('ob_depth')],
from_obj=sa.func.unnest(Activity.ob_refs))
.suffix_with('WITH ORDINALITY')
.alias('act_ref_t'))
...
query = (query
.outerjoin(
act_ref_t,
Subscription.ob_ref == act_ref_t.c.ob_ref))
.order_by(activity.id, act_ref_t.ob_depth)
But that results in this SQL with another subquery:
LEFT OUTER JOIN (
SELECT unnest AS ob_ref, ordinality AS ref_i
FROM unnest(activity.ob_refs) WITH ORDINALITY
) AS act_ref_t
ON subscription.ob_refs #> ARRAY[act_ref_t.ob_ref]
... which fails because of the missing and unsupported LATERAL keyword:
There is an entry for table "activity", but it cannot be referenced from this part of the query.
So, how can I create a JOIN clause for this SRF without using a subquery? Or is there something else I'm missing?
Edit 1 Using sa.text with TextClause.columns instead of sa.select gets me a lot closer:
act_ref_t = (sa.sql.text(
"unnest(activity.ob_refs) WITH ORDINALITY")
.columns(sa.column('unnest', UUID),
sa.column('ordinality', sa.Integer))
.alias('act_ref'))
But the resulting SQL fails because it wraps the clause in parentheses:
LEFT OUTER JOIN (unnest(activity.ob_refs) WITH ORDINALITY)
AS act_ref ON subscription.ob_ref = act_ref.unnest
The error is syntax error at or near ")". Can I get TextAsFrom to not be wrapped in parentheses?
It turns out this is not directly supported by SA, but the correct behaviour can be achieved with a ColumnClause and a FunctionElement. First import this recipe as described by zzzeek in this SA issue. Then create a special unnest function that includes the WITH ORDINALITY modifier:
class unnest_func(ColumnFunction):
name = 'unnest'
column_names = ['unnest', 'ordinality']
#compiles(unnest_func)
def _compile_unnest_func(element, compiler, **kw):
return compiler.visit_function(element, **kw) + " WITH ORDINALITY"
You can then use it in joins, ordering, etc. like this:
act_ref = unnest_func(Activity.ob_refs)
query = (query
.add_columns(act_ref.c.unnest, act_ref.c.ordinality)
.outerjoin(act_ref, sa.true())
.outerjoin(Subscription, Subscription.ob_ref == act_ref.c.unnest)
.order_by(act_ref.c.ordinality.desc()))

TSQL CTE Error: Incorrect syntax near ')'

I am developing a TSQL stored proc using SSMS 2008 and am receiving the above error while generating a CTE. I want to add logic to this SP to return every day, not just the days with data. How do I do this? Here is my SP so far:
ALTER Proc [dbo].[rpt_rd_CensusWithChart]
#program uniqueidentifier = NULL,
#office uniqueidentifier = NULL
AS
DECLARE #a_date datetime
SET #a_date = case when MONTH(GETDATE()) >= 7 THEN '7/1/' + CAST(YEAR(GETDATE()) AS VARCHAR(30))
ELSE '7/1/' + CAST(YEAR(GETDATE())-1 AS VARCHAR(30)) END
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#ENROLLEES')
) DROP TABLE #ENROLLEES;
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#DISCHARGES')
) DROP TABLE #DISCHARGES;
declare #sum_enrollment int
set #sum_enrollment =
(select sum(1)
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(#office is NULL OR A.group_profile_id = #office)
AND (#program is NULL OR A.program_info_id = #program)
and (C.pe_end_date IS NULL OR C.pe_end_date > #a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date < #a_date)
select
A.program_info_id as [Program code],
A.[program_name],
A.profile_name as Facility,
A.group_profile_id as Facility_code,
A.people_id,
1 as enrollment_id,
C.pe_start_date,
C.pe_end_date,
LEFT(datename(month,(C.pe_start_date)),3) as a_month,
day(C.pe_start_date) as a_day,
#sum_enrollment as sum_enrollment
into #ENROLLEES
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(#office is NULL OR A.group_profile_id = #office)
AND (#program is NULL OR A.program_info_id = #program)
and (C.pe_end_date IS NULL OR C.pe_end_date > #a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date >= #a_date
;WITH #ENROLLEES AS (
SELECT '7/1/11' AS dt
UNION ALL
SELECT DATEADD(d, 1, pe_start_date) as dt
FROM #ENROLLEES s
WHERE DATEADD(d, 1, pe_start_date) <= '12/1/11')
The most obvious issue (and probably the one that causes the error message too) is the absence of the actual statement to which the last CTE is supposed to pertain. I presume it should be a SELECT statement, one that would combine the result set of the CTE with the data from the #ENROLLEES table.
And that's where another issue emerges.
You see, apart from the fact that a name that starts with a single # is hardly advisable for anything that is not a local temporary table (a CTE is not a table indeed), you've also chosen for your CTE a particular name that already belongs to an existing table (more precisely, to the already mentioned #ENROLLEES temporary table), and the one you are going to pull data from too. You should definitely not use an existing table's name for a CTE, or you will not be able to join it with the CTE due to the name conflict.
It also appears that, based on its code, the last CTE represents an unfinished implementation of the logic you say you want to add to the SP. I can suggest some idea, but before I go on I'd like you to realise that there are actually two different requests in your post. One is about finding the cause of the error message, the other is about code for a new logic. Generally you are probably better off separating such requests into distinct questions, and so you might be in this case as well.
Anyway, here's my suggestion:
build a complete list of dates you want to be accounted for in the result set (that's what the CTE will be used for);
left-join that list with the #ENROLLEES table to pick data for the existing dates and some defaults or NULLs for the non-existing ones.
It might be implemented like this:
… /* all your code up until the last WITH */
;
WITH cte AS (
SELECT CAST('7/1/11' AS date) AS dt
UNION ALL
SELECT DATEADD(d, 1, dt) as dt
FROM cte
WHERE dt < '12/1/11'
)
SELECT
cte.dt,
tmp.[Program code],
tmp.[program_name],
… /* other columns as necessary; you might also consider
enveloping some or all of the "tmp" columns in ISNULLs,
like in
ISNULL(tmp.[Program code], '(none)') AS [Program code]
to provide default values for absent data */
FROM cte
LEFT JOIN #ENROLLEES tmp ON cte.dt = tmp.pe_start_date
;