Identifying isNull and isNotNull in a DataFrame - PySpark

I'm trying to identify which columns are null and which are not null, and, depending on that, insert a string.
ab_final = join_df.withColumn("linked_A_B",
    when(col("a3_inbound").isNull() & ("a3_item").isNull(), 'No Value')
    .when(col("it_item").isNull() & ("it_export").isNull(), 'NoShipment')
    .when(col("a3_inbound").isNotNull() & ("it_export").isNotNull(), 'Export')
)
I'm getting the error below:
'str' object has no attribute 'isNull'
Please help.
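For reference, the error comes from the bare string: `("a3_item")` is just a `str`, not a Column, because parentheses alone are not a call to `col()`. A minimal sketch of the cause, with the fix (using the question's column names) shown in comments:

```python
# ("a3_item") is a plain str -- the parentheses do nothing -- so it has
# no .isNull() method, which is exactly the reported AttributeError.
err = None
try:
    ("a3_item").isNull()
except AttributeError as e:
    err = str(e)
assert err is not None and "isNull" in err

# The fix is to wrap every column name in col() so each operand is a
# Column (PySpark sketch, not executed here):
#   from pyspark.sql.functions import col, when
#   ab_final = join_df.withColumn(
#       "linked_A_B",
#       when(col("a3_inbound").isNull() & col("a3_item").isNull(), "No Value")
#       .when(col("it_item").isNull() & col("it_export").isNull(), "NoShipment")
#       .when(col("a3_inbound").isNotNull() & col("it_export").isNotNull(), "Export"),
#   )
```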

Related

ADF CopyData Is it possible to have a dynamic Additional Columns that can be nullable?

I have a configuration table with file names, destination table names (and other configs) used to copy data into a SQL table. Sometimes I want the filename in a new column, but not for every file.
Is it possible to have a default value so that no additional column is generated for some files?
I tried
@json(
if(
equals(item().AdditionalColumns, null),
'{}',
item().AdditionalColumns
)
)
But I get this error: The value of property 'additionalColumns' is in unexpected type 'IList`1'.
And
@json(
if(
equals(item().AdditionalColumns, null),
'{[]}',
item().AdditionalColumns
)
)
But I get this error: The function 'json' parameter is not valid. The provided value '{[]}' cannot be parsed: 'Invalid property identifier character: [. Path '', line 1, position 1
Thank you
I figured it out:
@json(
if(
equals(item()?.AdditionalColumns, null),
'[]',
item()?.AdditionalColumns
)
)
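The two error messages make sense once you check what is valid JSON: `[]` is a complete JSON document (an empty array), while `{[]}` is not, because `{` opens an object that must contain `"key": value` pairs. A quick check with Python's standard `json` module:

```python
import json

# '[]' parses cleanly as an empty JSON array.
assert json.loads('[]') == []

# '{[]}' is rejected: '[' cannot appear where a property name is expected,
# matching the "Invalid property identifier character: [" error above.
bad = None
try:
    json.loads('{[]}')
except json.JSONDecodeError as e:
    bad = str(e)
assert bad is not None
```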

Using dbt_utils.union_relations wrong but don't know how/why?

So I am a new dbt user (super cool stuff), but I am running into an issue with the union_relations macro. I am feeding relations to this function, but the compiled/run query is not finding any columns from the relations.
Here is the code I'm running:
with conformed_obj_assessment as (
{{ dbt_utils.union_relations(relations=[ref('transform_hist_ca_map_stu_obj_assessment'), ref('transform_hist_sc_map_stu_obj_assessment')]) }}
)
select *
from conformed_obj_assessment
where student_assessment_identifier is not null
and assessment_identifier is not null
and identification_code is not null
and student_unique_id is not null
and performance_level is not null
And this is the error I receive:
syntax error at or near "from" LINE 1706: from __dbt__CTE__transform_hist_ca_map_stu_obj_a... ^ compiled SQL at target/run/rally_dw/conformed/conformed_student_objective_assessment.sql
Basically, the first column is a dbt-generated column, and columns from the relations are supposed to follow it, but for some reason those columns are not being pulled in. The relations I want to pull from are currently ephemeral, so not materialized, and I wonder if that's causing the issue. Here is the compiled SQL; the CTEs return data, but for some reason it's not getting pulled into the last CTE.
create table "dashboarding"."dev_em_conformed"."conformed_student_objective_assessment__dbt_tmp"
as (
with __dbt__CTE__historical_ca_map_stu_obj_assessment as (
with hist_ca_map_stu_obj_assess as (
select * from "dashboarding"."raw_ea"."historical_ca_map_student_obj_assessment"
),
cleaned as (
select distinct
source_org,
assessment_id as assessment_identifier,
student_assessment_identifier,
student_unique_id,
performance_levels as performance_level,
scale_score as score,
assessment_id,
to_date(test_date, 'YYYY-MM-DD') as test_date,
identification_code,
null as parent_objective_assessment_name
from hist_ca_map_stu_obj_assess
)
select * from cleaned
), __dbt__CTE__transform_hist_ca_map_stu_obj_assessment as (
with hist_ca_stu_obj_assess as (
select * from __dbt__CTE__historical_ca_map_stu_obj_assessment
),
final as(
select
null as source_org,
student_assessment_identifier,
assessment_id as assessment_identifier,
identification_code as identification_code,
null as school_year,
student_unique_id,
null as student_grade_level,
null as assessment_grade_level,
NULL as administration_date,
null as administration_end_date,
null as objective_assessment_name,
score,
performance_level,
parent_objective_assessment_name,
null as parent_objective_assessment_id
from hist_ca_stu_obj_assess
)
select * from final
), __dbt__CTE__historical_sc_map_stu_obj_assessment as (
with hist_sc_map_soa as (
select * from "dashboarding"."raw_ea"."historical_sc_map_student_obj_assessment"
),
cleaned as (
select distinct
source_org,
assessment_id as assessment_identifier,
student_assessment_identifier,
student_unique_id,
performance_levels as performance_level,
scale_score as score,
assessment_id,
to_date(test_date, 'YYYY-MM-DD') as test_date,
identification_code,
null as parent_objective_assessment_name
from hist_sc_map_soa
)
select * from cleaned
), __dbt__CTE__transform_hist_sc_map_stu_obj_assessment as (
with hist_sc_stu_obj_assess as (
select * from __dbt__CTE__historical_sc_map_stu_obj_assessment
),
final as(
select
null as source_org,
student_assessment_identifier,
assessment_id as assessment_identifier,
identification_code as identification_code,
null as school_year,
student_unique_id,
null as student_grade_level,
null as assessment_grade_level,
NULL as administration_date,
null as administration_end_date,
null as objective_assessment_name,
score,
performance_level,
parent_objective_assessment_name,
null as parent_objective_assessment_id
from hist_sc_stu_obj_assess
)
select * from final
), conformed_obj_assessment as(
(
select
cast('__dbt__CTE__transform_hist_ca_map_stu_obj_assessment' as
varchar
) as _dbt_source_relation,
---NO MORE COLUMNS???
from __dbt__CTE__transform_hist_ca_map_stu_obj_assessment
)
union all
(
select
cast('__dbt__CTE__transform_hist_sc_map_stu_obj_assessment' as
varchar
) as _dbt_source_relation,
---NO MORE COLUMNS??
from __dbt__CTE__transform_hist_sc_map_stu_obj_assessment
)
)
select *
from conformed_obj_assessment
where student_assessment_identifier is not null
and assessment_identifier is not null
and identification_code is not null
and student_unique_id is not null
and performance_level is not null
);
Any thoughts would be super appreciated thank you!
The union_relations macro relies on knowing what columns are in your relations (tables/views), as stored in the information schema. Since this model is ephemeral, there aren't any records in the information schema, which is why there's SQL like this:
select
cast('__dbt__CTE__transform_hist_ca_map_stu_obj_assessment' as
varchar
) as _dbt_source_relation,
from __dbt__CTE__transform_hist_ca_map_stu_obj_assessment
I noticed that you're using a slightly older version of dbt-utils. While we haven't fixed this issue, we have improved the way it is handled (released in v0.5.0).
A newer version of dbt-utils will helpfully tell you something like:
Compilation Error in model test_ephemeral (models/test_ephemeral.sql)
The `union_relations` macro cannot be used with ephemeral models, as it relies on the information schema.
`__dbt__CTE__my_ephemeral` is an ephemeral model. Consider making it a view or table instead.
So, as the (new) error message suggests, the only way around this is to materialize your upstream models as views or tables.
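For example (a sketch only; the model name is taken from the question), a config block at the top of each upstream model overrides the ephemeral materialization:

```sql
-- models/transform_hist_ca_map_stu_obj_assessment.sql
{{ config(materialized='view') }}

-- ...rest of the model unchanged...
```

Alternatively, the same override can be set for a whole folder under `models:` in `dbt_project.yml` with `materialized: view`.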

The column index is out of range: 2, number of columns: 1 error while updating jsonb column

I am trying to update jsonb column in java with mybatis.
Following is my mapper method
@Update("update service_user_assn set external_group = external_group || '{\"service_name\": \"#{service_name}\" }' where user=#{user} " +
" and service_name= (select service_name from services where service_name='Google') ")
public int update(@Param("service_name") String service_name, @Param("user") Integer user);
I am getting the following error while updating the jsonb (external_group) column:
### Error updating database. Cause: org.postgresql.util.PSQLException: The column index is out of range: 2, number of columns: 1.
### The error may involve com.apds.mybatis.mapper.ServiceUserMapper.update-Inline
I am able to update with the same way for non-jsonb columns.
Also if I am putting hardcoded value it's working for jsonb columns.
How to solve this error while updating jsonb column?
You should not enclose #{} in single quotes, because it then becomes part of a literal rather than a placeholder, i.e.
external_group = external_group || '{"service_name": "?"}' where ...
So, there will be only one placeholder in the PreparedStatement and you get the error.
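The same failure mode can be reproduced with any prepared-statement API: a parameter marker inside quotes is literal text, so the driver sees fewer placeholders than the parameters you bind. A minimal sketch using Python's sqlite3 as a stand-in for the JDBC PreparedStatement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A bare ? is a real placeholder: one marker, one parameter, works fine.
assert conn.execute("select ?", ("x",)).fetchone() == ("x",)

# Quoted, '?' is just a one-character string literal, so the statement
# contains zero placeholders and binding one parameter fails -- the
# sqlite3 analogue of "The column index is out of range".
err = None
try:
    conn.execute("select '?'", ("x",))
except sqlite3.ProgrammingError as e:
    err = str(e)
assert err is not None
```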
The correct way is to concatenate the #{} in SQL.
You may also need to cast the literal to jsonb type explicitly.
@Update({
"update service_user_assn set",
"external_group = external_group",
"|| ('{\"service_name\": \"' || #{service_name} || '\" }')::jsonb",
"where user=#{user} and",
"service_name= (select service_name from services where service_name='Google')"})
The SQL being executed would look as follows.
external_group = external_group || ('{"service_name": "' || ? || '"}')::jsonb where ...

Understanding a complex SP in DB2

I need to make changes to an SP which has a bunch of complex XML functions:
Declare ResultCsr2 Cursor For
WITH
MDI_BOM_COMP(PROD_ID,SITE_ID, xml ) AS (
SELECT TC401F.T41PID,TC401F.T41SID,
XMLSERIALIZE(
XMLAGG(
XMLELEMENT( NAME "MDI_BOM_COMP",
XMLFOREST(
trim(TC401F.T41CTY) AS COMPONENT_TYPE,
TC401F.T41LNO AS COMP_NUM,
trim(TC401F.T41CTO) AS CTRY_OF_ORIGIN,
trim(TC401F.T41DSC) AS DESCRIPTION,
TC401F.T41EFR AS EFFECTIVE_FROM,
TC401F.T41EFT AS EFFECTIVE_TO,
trim(TC401F.T41MID) AS MANUFACTURER_ID,
trim(TC401F.T41MOC) AS MANUFACTURER_ORG_CODE,
trim(TC401F.T41CNO) AS PROD_ID,
trim(TC401F.T41POC) AS PROD_ORG_CODE,
TC401F.T41QPR AS QTY_PER,
trim(TC401F.T41SBI) AS SUB_BOM_ID,
trim(TC401F.T41SBO) AS SUB_BOM_ORG_CODE, --ADB01
trim(TC401F.T41VID) AS SUPPLIER_ID,
trim(TC401F.T41SOC) AS SUPPLIER_ORG_CODE,
TC401F.T41UCT AS UNIT_COST
)
)
) AS CLOB(1M)
)
FROM TC401F TC401F
GROUP BY T41PID,T41SID
)
SELECT
RowNum, '<BOM_INBOUND>' ||
XMLSERIALIZE (
XMLELEMENT(NAME "INTEGRATION_MESSAGE_CONTROL",
XMLFOREST(
'FULL_UPDATE' as ACTION,
'POLARIS' as COMPANY_CODE,
TRIM(TC400F.T40OCD) as ORG_CODE,
'5' as PRIORITY,
'INBOUND_ENTITY_INTEGRATION' as MESSAGE_TYPE,
'POLARIS_INTEGRATION' as USERID,
'TA' as RECEIVER,
HEX(Generate_Unique()) as SOURCE_SYSTEM_TOKEN
),
XMLELEMENT(NAME "BUS_KEY",
XMLFOREST(
TRIM(TC400F.T40BID) as BOM_ID,
TRIM(TC400F.T40OCD) as ORG_CODE
)
)
) AS VARCHAR(1000)
)
|| '<MDI_BOM>' ||
XMLSERIALIZE (
XMLFOREST(
TRIM(TC400F.T40ATP) AS ASSEMBLY_TYPE,
TRIM(TC400F.T40BID) AS BOM_ID,
TRIM(TC400F.T40CCD) AS CURRENCY_CODE,
TC400F.T40DPC AS DIRECT_PROCESSING_COST,
TC400F.T40EFD AS EFFECTIVE_FROM,
TC400F.T40EFT AS EFFECTIVE_TO,
TRIM(TC400F.T40MID) AS MANUFACTURER_ID,
TRIM(TC400F.T40MOC) AS MANUFACTURER_ORG_CODE,
TRIM(TC400F.T40OCD) AS ORG_CODE,
TRIM(TC400F.T40PRF) AS PROD_FAMILY,
TRIM(TC400F.T40PID) AS PROD_ID,
TRIM(TC400F.T40POC) AS PROD_ORG_CODE,
TRIM(TC400F.T40ISA) AS IS_ACTIVE,
TRIM(TC400F.T40VID) AS SUPPLIER_ID,
TRIM(TC400F.T40SOC) AS SUPPLIER_ORG_CODE,
TRIM(TC400F.T40PSF) AS PROD_SUB_FAMILY,
CASE TRIM(TC400F.T40PML)
WHEN '' THEN TRIM(TC400F.T40PML)
ELSE TRIM(TC400F.T40PML) || '~' || TRIM(TC403F.T43MDD)
END AS PROD_MODEL
) AS VARCHAR(3000)
)
|| IFNULL(MBC.xml, '') ||
XMLSERIALIZE (
XMLFOREST(
XMLFOREST(
TRIM(TC400F.T40CCD) AS CURRENCY_CODE,
TC400F.T40PRI AS PRICE,
TRIM(TC400F.T40PTY) AS PRICE_TYPE
) AS MDI_BOM_PRICE,
XMLFOREST(
TRIM(TC400F.T40CCD) AS CURRENCY_CODE,
TRIM(TC400F.T40PRI) AS PRICE,
'TRANSACTION_VALUE' AS PRICE_TYPE
) AS MDI_BOM_PRICE,
XMLFOREST(
TRIM(TC400F.T40INA) AS INCLUDE_IN_AVERAGING
) AS MDI_BOM_IMPL_BOM_PROD_FAMILY_AUTOMOBILES
) AS VARCHAR(3000)
)
|| '</MDI_BOM>' ||
'</BOM_INBOUND>' XML
FROM (
SELECT
ROW_NUMBER() OVER (
ORDER BY T40STS
,T40SID
,T40BID
) AS RowNum
,t.*
FROM TC400F t
) TC400F
LEFT OUTER JOIN MDI_BOM_COMP MBC
ON TC400F.T40SID = MBC.SITE_ID
AND TC400F.T40PID = MBC.PROD_ID
LEFT OUTER JOIN TC403F TC403F
ON TC400F.T40PML <> ''
AND TC400F.T40PML = TC403F.T43MDL
WHERE TC400F.T40STS = '10'
AND TC400F.RowNUM BETWEEN
(P_STARTROW + (P_PAGENOS - 1) * P_NBROFRCDS)
AND (P_STARTROW + (P_PAGENOS - 1) * P_NBROFRCDS +
P_NBROFRCDS - 1);
Given above is a cursor declaration in the SP code which I am struggling to understand. The very first WITH itself seems mysterious. I have used WITH along with temporary table names, but this is the first time I'm seeing something of this sort. Is it an SP or a UDF? Can someone please guide me on how to make sense of all this?
Adding further to the question: the actual requirement here is to arrange the data in the XML so that records which have the TC401F.T41SBI field populated appear at the beginning of the XML output.
This field is selected in the code as trim(TC401F.T41SBI) AS SUB_BOM_ID. If this field is non-blank, the record should appear first in the XML; records where it is blank should appear only after. What would be the best approach to do this? Using ORDER BY on the outer query does not really seem to help, because the XML is built by aggregate functions, and the outer ordering does not affect how the items are arranged within the XML. One approach I could think of was using a WHERE clause selecting TC401F.T41SBI <> '' first, then appending the records where TC401F.T41SBI = ''.
Best I can do is help with the CTE.
WITH
MDI_BOM_COMP(PROD_ID,SITE_ID, xml ) AS (
SELECT TC401F.T41PID,TC401F.T41SID,
This just generates a table named MDI_BOM_COMP with three columns named PROD_ID, SITE_ID, and XML. The table will have one record for each PROD_ID, SITE_ID, and the contents of XML will be an XML snippet with all the components for that product and site.
Now the XML part can be a bit confusing, but if we break it down into its scalar and aggregate components, we can make it more understandable.
First, ignore the grouping, so the CTE retrieves each row in TC401F. XMLELEMENT and XMLFOREST are scalar functions. XMLELEMENT creates a single XML element: the tag is the first parameter, and the content of the element is the second in the example above. XMLFOREST is like a bunch of XMLELEMENTs concatenated together.
XMLSERIALIZE(
XMLAGG(
XMLELEMENT( NAME "MDI_BOM_COMP",
XMLFOREST(
trim(TC401F.T41CTY) AS COMPONENT_TYPE,
TC401F.T41LNO AS COMP_NUM,
trim(TC401F.T41CTO) AS CTRY_OF_ORIGIN,
trim(TC401F.T41DSC) AS DESCRIPTION,
TC401F.T41EFR AS EFFECTIVE_FROM,
TC401F.T41EFT AS EFFECTIVE_TO,
trim(TC401F.T41MID) AS MANUFACTURER_ID,
trim(TC401F.T41MOC) AS MANUFACTURER_ORG_CODE,
trim(TC401F.T41CNO) AS PROD_ID,
trim(TC401F.T41POC) AS PROD_ORG_CODE,
TC401F.T41QPR AS QTY_PER,
trim(TC401F.T41SBI) AS SUB_BOM_ID,
trim(TC401F.T41SBO) AS SUB_BOM_ORG_CODE, --ADB01
trim(TC401F.T41VID) AS SUPPLIER_ID,
trim(TC401F.T41SOC) AS SUPPLIER_ORG_CODE,
TC401F.T41UCT AS UNIT_COST
)
)
) AS CLOB(1M)
So in the example, for each row in the table, XMLFOREST creates a list of XML elements, one each for COMPONENT_TYPE, COMP_NUM, CTRY_OF_ORIGIN, etc. These elements form the content of another XML element, MDI_BOM_COMP, which is created by XMLELEMENT.
Now for each row in the table we have selected PROD_ID, SITE_ID, and created some XML. Next we group by PROD_ID, and SITE_ID. The aggregation function XMLAGG will collect all the XML for each PROD_ID and SITE_ID, and concatenate it together.
Finally XMLSERIALIZE will convert the internal XML representation to the string format we all know and love ;)
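As a cross-check on that reading, here is a hypothetical Python analogue of the CTE (ElementTree standing in for the XML functions, groupby for GROUP BY; the rows and columns are invented for illustration): XMLELEMENT builds one element per row, XMLFOREST builds its children, XMLAGG concatenates per group, and XMLSERIALIZE renders the string.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical TC401F rows, reduced to two component columns for brevity.
rows = [
    {"prod_id": "P1", "site_id": "S1", "component_type": "RM", "comp_num": 1},
    {"prod_id": "P1", "site_id": "S1", "component_type": "PK", "comp_num": 2},
    {"prod_id": "P2", "site_id": "S1", "component_type": "RM", "comp_num": 1},
]

key = lambda r: (r["prod_id"], r["site_id"])
result = {}
for (prod, site), grp in groupby(sorted(rows, key=key), key=key):  # GROUP BY
    pieces = []
    for r in grp:
        el = ET.Element("MDI_BOM_COMP")                 # XMLELEMENT
        for name in ("component_type", "comp_num"):     # XMLFOREST
            ET.SubElement(el, name.upper()).text = str(r[name])
        pieces.append(ET.tostring(el, encoding="unicode"))  # XMLSERIALIZE
    result[(prod, site)] = "".join(pieces)              # XMLAGG concatenation
```

Each `result` value is one XML snippet per (PROD_ID, SITE_ID), just like the CTE's third column.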
I think I found the answer for my requirement: XMLAGG accepts an ORDER BY clause, so I had to add ORDER BY <field name> after the XMLELEMENT call, inside the XMLAGG function.

Conditional Where statement syntax or some other solution

@MallUnit is a parameter with the value 'Unit 401,Unit 402,Unit 403'.
I would like to have a conditional WHERE clause. Assume that before the AND there are other conditions that work just fine. Basically, if ScheduledMallUnitTypeID is null, evaluate using the IN condition; otherwise, use the LIKE clause.
AND
CASE ScheduledMallUnitTypeID IS NULL THEN
ScheduledMallUnitTypeID IN
(
SELECT Value
FROM Toolbox.dbo.ReportingPortalMultiSetParameterFix(@MallUnit)
)
ELSE ScheduledMallUnitTypeID LIKE @MallUnit
END
This would work:
WHERE
( ScheduledMallUnitTypeID IS NULL AND
ScheduledMallUnitTypeID IN
(
SELECT Value
FROM Toolbox.dbo.ReportingPortalMultiSetParameterFix(@MallUnit)
)
)
OR
(
ScheduledMallUnitTypeID IS NOT NULL AND
ScheduledMallUnitTypeID LIKE @MallUnit
)
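One caveat when adapting this pattern: in SQL's three-valued logic, `NULL IN (...)` evaluates to NULL, never TRUE, so a branch guarded by `col IS NULL AND col IN (...)` on the same column cannot match any row. A quick check using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL IN (...) yields NULL (returned as None), not TRUE.
assert conn.execute("select null in ('a', 'b')").fetchone() == (None,)

# A non-NULL value compares normally (1 = TRUE in SQLite).
assert conn.execute("select 'a' in ('a', 'b')").fetchone() == (1,)
```

If the intent is really "when the column is NULL, test the parameter list against something else", the first branch needs a different expression on the left of IN.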