How to pass column values in where condition - scala

I am failing to achieve simple SQL query in Spark.
I would like to write the below query in Scala Spark:
select * from emp where emp_id in (select distinct manager_id from emp ;
Below is what I tried:
empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList)).show()
I got the below error:
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(null, 68319, 68319, 68319, 65646, 65646, 69062, 66928, 66928, 66928, 66928, 67858, 66928, 67832)

It's better to do a semi join to avoid collecting as list:
empdf.alias("t1").join(empdf.alias("t2"), expr("t1.emp_id = t2.manager_id"), "left_semi")
If you want to use isin, you can expand the List using : _* (see this post):
empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList: _*)).show()
Or use isInCollection:
empdf.where(col("emp_id").isInCollection(empdf.select("manager_id").collect().map(_(0)).toList)).show()

You can also write as a SQL,
empdf.createorreplaceview("emp")
spark.sql("select * from emp where emp_id in (select distinct manager_id from emp ")

Related

Getting duplicate column ERROR while trying to insert same column with two different datatypes using SELECT INTO clause in PostgreSql

I need to insert createdate column twice with two different datatypes one with the datatype defined in the table itself and another in char datatype.
I can insert it by changing the alias name of createdate column but can't insert with same alias name which i need.
so help me out to get correct way of doing it.
My query:
SELECT DISTINCT TE.id, T.debatchqueuelink, TE.transactionlink,
EC.errorclassification, TE.errorvalue,
EC.errorparameter, TE.classificationlink, TE.description,
TE.createdate AS createdate, TO_CHAR(TE.createdate, 'MM/dd/yyyy') AS createdate,
TE.status, TE.rebutt, TE.rebuttedstatus, BQ.appbatchnumber,
BQ.scanbatchnumber, BQ.clientlink, BQ.locationlink, T.patientid,
(DEUD.firstname|| ' ' ||DEUD.lastname) AS deusername, DEUD.email AS deuseremail,
(QCUD.firstname|| ' ' ||QCUD.lastname) AS qcusername, TE.inactive,
TE.decomment
INTO table373
FROM qctransactionerror TE
INNER JOIN errorclassification EC ON EC.id = TE.classificationlink
INNER JOIN qctransaction T ON T.id = TE.transactionlink
INNER JOIN batchqueue BQ ON T.debatchqueuelink = BQ.id
INNER JOIN batchqueue QCBQ ON T.qcbatchqueuelink = QCBQ.id
INNER JOIN userdetail QCUD ON QCBQ.assignedto = QCUD.id
INNER JOIN userdetail DEUD ON BQ.assignedto = DEUD.id
WHERE TE.inactive='t'
AND TE.status IN ('ERROR','QCCORRECTED')
LIMIT 0
The actual error message I am getting is:
Duplicate column:column "createdate" specified more than once

ERROR: Teradata prepare: Syntax error when usign date format 'ddmmyyy'd in SAS

I'm using this code to extract information from this database. However, it is showing me this error:
ERROR: Teradata prepare: Syntax error, expected something like ')' between a string or a Unicode character literal and the word
'd'. SQL statement was: WITH vmher102ult as ( select cod_cte, max(fec_consulta) as max_fec_consulta from
klarmxpw_her.vmher102 where cod_cte not in ('','0','00000000') and fec_consulta>='01MAR2021'd group by cod_cte) select t1.*
from klarmxpw_her.vmher101 as t1 inner join vmher102ult as t2 on t1.cod_cte=t2.cod_cte and
t1.fec_consulta=t2.max_fec_consulta.
The code I'm using for this pass through is the following:
proc sql;
connect to teradata as tera (user=&tuser. password=&tpass. server='TDMX03');
create table vmher101_m as
select * from connection to tera (
WITH vmher102ult as (
select cod_cte, max(fec_consulta) as max_fec_consulta
from klarmxpw_her.vmher102
where cod_cte not in ('','0','00000000')
and fec_consulta>='01MAR2021'd
group by cod_cte)
select t1.*
from klarmxpw_her.vmher101 as t1
inner join vmher102ult as t2
on t1.cod_cte=t2.cod_cte and t1.fec_consulta=t2.max_fec_consulta);
disconnect from ter;
Does anybody know what can I do?
You need to use TERADATA code inside the () after from connection to tera.
Try
and fec_consulta>= DATE '2021-03-01'
Teradata Documentation

COALESCE command issue in Hive

I am trying to run a hive query using COALESCE function to create a view. But it is throwing error like
cannot recognize input near '(' 'SELECT' 'realvalue' in expression specification
The query is given below. Please help and mention what is wrong in this.
CREATE VIEW IF NOT EXISTS exampledb.`ara_service` AS
SELECT T1.EntityId, T1.entityname AS EntityName,
T1.`xxx`,
T1.`yyy`,
COALESCE (T1.`aaa`, (SELECT `realvalue` FROM exampledb.`aba_service`
WHERE `id` = '333')) AS `CombinedValue`,
T1.`ddd`,
T1.`jjj`,
etc..
Please help. The error is in the usage of the select statement inside COALESCE .
NoViableAltException(231#[435:1: precedenceEqualExpression : ( ( LPAREN precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple | precedenceEqualExpressionSingle );])
Thanks
if all you need is a default value, you could do
CREATE VIEW IF NOT EXISTS exampledb.`ara_service` AS
SELECT T1.EntityId, T1.entityname AS EntityName,
T1.`xxx`,
T1.`yyy`,
COALESCE (T1.`aaa`, def.`realvalue` ) AS `CombinedValue`,
T1.`ddd`,
T1.`jjj`,
FROM your_table T1
CROSS JOIN (
SELECT `realvalue`
FROM exampledb.`aba_service` WHERE `id` = '333') def

Joining with set-returning function (SRF) and access columns in SQLAlchemy

Suppose I have an activity table and a subscription table. Each activity has an array of generic references to some other object, and each subscription has a single generic reference to some other object in the same set.
CREATE TABLE activity (
id serial primary key,
ob_refs UUID[] not null
);
CREATE TABLE subscription (
id UUID primary key,
ob_ref UUID,
subscribed boolean not null
);
I want to join with the set-returning function unnest so I can find the "deepest" matching subscription, something like this:
SELECT id
FROM (
SELECT DISTINCT ON (activity.id)
activity.id,
x.ob_ref, x.ob_depth,
subscription.subscribed IS NULL OR subscription.subscribed = TRUE
AS subscribed,
FROM activity
LEFT JOIN subscription
ON activity.ob_refs #> array[subscription.ob_ref]
LEFT JOIN unnest(activity.ob_refs)
WITH ORDINALITY AS x(ob_ref, ob_depth)
ON subscription.ob_ref = x.ob_ref
ORDER BY x.ob_depth DESC
) sub
WHERE subscribed = TRUE;
But I can't figure out how to do that second join and get access to the columns. I've tried creating a FromClause like this:
act_ref_t = (sa.select(
[sa.column('unnest', UUID).label('ob_ref'),
sa.column('ordinality', sa.Integer).label('ob_depth')],
from_obj=sa.func.unnest(Activity.ob_refs))
.suffix_with('WITH ORDINALITY')
.alias('act_ref_t'))
...
query = (query
.outerjoin(
act_ref_t,
Subscription.ob_ref == act_ref_t.c.ob_ref))
.order_by(activity.id, act_ref_t.ob_depth)
But that results in this SQL with another subquery:
LEFT OUTER JOIN (
SELECT unnest AS ob_ref, ordinality AS ref_i
FROM unnest(activity.ob_refs) WITH ORDINALITY
) AS act_ref_t
ON subscription.ob_refs #> ARRAY[act_ref_t.ob_ref]
... which fails because of the missing and unsupported LATERAL keyword:
There is an entry for table "activity", but it cannot be referenced from this part of the query.
So, how can I create a JOIN clause for this SRF without using a subquery? Or is there something else I'm missing?
Edit 1 Using sa.text with TextClause.columns instead of sa.select gets me a lot closer:
act_ref_t = (sa.sql.text(
"unnest(activity.ob_refs) WITH ORDINALITY")
.columns(sa.column('unnest', UUID),
sa.column('ordinality', sa.Integer))
.alias('act_ref'))
But the resulting SQL fails because it wraps the clause in parentheses:
LEFT OUTER JOIN (unnest(activity.ob_refs) WITH ORDINALITY)
AS act_ref ON subscription.ob_ref = act_ref.unnest
The error is syntax error at or near ")". Can I get TextAsFrom to not be wrapped in parentheses?
It turns out this is not directly supported by SA, but the correct behaviour can be achieved with a ColumnClause and a FunctionElement. First import this recipe as described by zzzeek in this SA issue. Then create a special unnest function that includes the WITH ORDINALITY modifier:
class unnest_func(ColumnFunction):
name = 'unnest'
column_names = ['unnest', 'ordinality']
#compiles(unnest_func)
def _compile_unnest_func(element, compiler, **kw):
return compiler.visit_function(element, **kw) + " WITH ORDINALITY"
You can then use it in joins, ordering, etc. like this:
act_ref = unnest_func(Activity.ob_refs)
query = (query
.add_columns(act_ref.c.unnest, act_ref.c.ordinality)
.outerjoin(act_ref, sa.true())
.outerjoin(Subscription, Subscription.ob_ref == act_ref.c.unnest)
.order_by(act_ref.c.ordinality.desc()))

Postgres CTE : type character varying(255)[] in non-recursive term but type character varying[] overall

I am new to SO and postgres so please excuse my ignorance. Attempting to get the cluster for a graph in postgres using a solution similar to the one in this post Find cluster given node in PostgreSQL
the only difference is my id is a UUID and I am using varchar(255) to store this id
when i try to run the query I get the following error (but not sure how to cast):
ERROR: recursive query "search_graph" column 1 has type character varying(255)[] in non-recursive term but type character varying[] overall
SQL state: 42804
Hint: Cast the output of the non-recursive term to the correct type.
Character: 81
my code (basically same as previous post):
WITH RECURSIVE search_graph(path, last_profile1, last_profile2) AS (
SELECT ARRAY[id], id, id
FROM node WHERE id = '408d6b12-d03e-42c2-a2a7-066b3c060a0b'
UNION ALL
SELECT sg.path || m.toid || m.fromid, m.fromid, m.toid
FROM search_graph sg
JOIN rel m
ON (m.fromid = sg.last_profile2 AND NOT sg.path #> ARRAY[m.toid])
OR (m.toid = sg.last_profile1 AND NOT sg.path #> ARRAY[m.fromid])
)
SELECT DISTINCT unnest(path) FROM search_graph;
Try casting the SELECT lists in the recursive and non-recursive terms to varchar.
WITH RECURSIVE search_graph(path, last_profile1, last_profile2) AS (
SELECT ARRAY[id]::varchar[], id::varchar, id::varchar
FROM node WHERE id = '408d6b12-d03e-42c2-a2a7-066b3c060a0b'
UNION ALL
SELECT (sg.path || m.toid || m.fromid)::varchar[], m.fromid::varchar, m.toid::varchar
FROM search_graph sg
JOIN rel m
ON (m.fromid = sg.last_profile2 AND NOT sg.path #> ARRAY[m.toid])
OR (m.toid = sg.last_profile1 AND NOT sg.path #> ARRAY[m.fromid])
)
SELECT DISTINCT unnest(path) FROM search_graph;