COALESCE command issue in Hive - select

I am trying to run a Hive query that uses the COALESCE function to create a view, but it throws an error like
cannot recognize input near '(' 'SELECT' 'realvalue' in expression specification
The query is given below. Please tell me what is wrong with it.
CREATE VIEW IF NOT EXISTS exampledb.`ara_service` AS
SELECT T1.EntityId, T1.entityname AS EntityName,
T1.`xxx`,
T1.`yyy`,
COALESCE (T1.`aaa`, (SELECT `realvalue` FROM exampledb.`aba_service`
WHERE `id` = '333')) AS `CombinedValue`,
T1.`ddd`,
T1.`jjj`,
etc..
Please help. The error is in the usage of the SELECT statement inside COALESCE.
NoViableAltException(231#[435:1: precedenceEqualExpression : ( ( LPAREN precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple | precedenceEqualExpressionSingle );])
Thanks

If all you need is a default value, you could do:
CREATE VIEW IF NOT EXISTS exampledb.`ara_service` AS
SELECT T1.EntityId, T1.entityname AS EntityName,
T1.`xxx`,
T1.`yyy`,
COALESCE (T1.`aaa`, def.`realvalue` ) AS `CombinedValue`,
T1.`ddd`,
T1.`jjj`
FROM your_table T1
CROSS JOIN (
SELECT `realvalue`
FROM exampledb.`aba_service` WHERE `id` = '333') def
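Hive's parser rejects a scalar subquery inside an expression such as COALESCE, which is why it stops at the opening parenthesis; moving the subquery into the FROM clause, as above, avoids the problem. If your Hive version supports CTEs (0.13+), the same idea reads a little more clearly (a sketch, with your_table again standing in for the real source table):
CREATE VIEW IF NOT EXISTS exampledb.`ara_service` AS
WITH def AS (
  SELECT `realvalue` FROM exampledb.`aba_service` WHERE `id` = '333'
)
SELECT T1.EntityId,
       COALESCE(T1.`aaa`, def.`realvalue`) AS `CombinedValue`
FROM your_table T1
CROSS JOIN def;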

Related

LPAD function errors when used in a WITH clause in Redshift

Can you tell me why this is throwing an error in Redshift?
WITH Testing_PADDING AS (SELECT '12345678' AS column1)
SELECT LPAD(column1, 9,'0') FROM Testing_PADDING;
Here is the error I receive:
"Invalid operation: failed to find conversion function from "unknown" to text;"
Redshift can't determine the data type from the context, so you need to set it explicitly:
WITH Testing_PADDING AS (SELECT '12345678'::text AS column1)
SELECT
LPAD(column1, 9, '0')
FROM Testing_PADDING;
I suspect that one of your strings isn't being seen as text, likely column1. (Sorry, I don't have a cluster up to test with.)
Try:
WITH Testing_PADDING AS (SELECT '12345678'::text AS column1)
SELECT LPAD(column1, 9,'0'::text) FROM Testing_PADDING;

How to pass column values in a WHERE condition

I am failing to achieve a simple SQL query in Spark.
I would like to write the below query in Spark with Scala:
select * from emp where emp_id in (select distinct manager_id from emp);
Below is what I tried:
empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList)).show()
I got the below error:
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(null, 68319, 68319, 68319, 65646, 65646, 69062, 66928, 66928, 66928, 66928, 67858, 66928, 67832)
It's better to do a semi join to avoid collecting as a list:
empdf.alias("t1").join(empdf.alias("t2"), expr("t1.emp_id = t2.manager_id"), "left_semi")
If you want to use isin, you can expand the List into varargs with : _* (see this post):
empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList: _*)).show()
Or use isInCollection:
empdf.where(col("emp_id").isInCollection(empdf.select("manager_id").collect().map(_(0)).toList)).show()
You can also write it as SQL:
empdf.createOrReplaceTempView("emp")
spark.sql("select * from emp where emp_id in (select distinct manager_id from emp)")

PostgreSQL: how to use a boolean in a WHERE clause

I'm really bad at SQL; my query is
select * from car_wash where
(select ST_Within((select car_wash.lon_lat from car_wash),(select ST_Buffer(ST_GeomFromText('POINT(65.3 323.2)'),20)))) = true
AND car_wash.was_deleted=false;
But I know that it isn't correct, because the nested query can return more than one column. How do I modify this query to use a WHERE clause?
select *
from car_wash cw
where
ST_Within (
cw.lon_lat,
ST_Buffer(ST_GeomFromText('POINT(65.3 323.2)'),20)
)
AND
not cw.was_deleted
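ST_Within already returns a boolean, so it can sit directly in the WHERE clause as above. As an aside, PostGIS also has ST_DWithin for exactly this buffer-and-test pattern, and it is usually preferred because it can use a spatial index (a sketch, assuming lon_lat is a geometry column and the distance is in the geometry's units):
select *
from car_wash cw
where ST_DWithin(cw.lon_lat, ST_GeomFromText('POINT(65.3 323.2)'), 20)
  and not cw.was_deleted;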
I don't use PostgreSQL, but maybe something like this works:
select * from car_wash cw
where EXISTS (select 1
              where ST_Within(cw.lon_lat,
                              ST_Buffer(ST_GeomFromText('POINT(65.3 323.2)'),20)))
AND cw.was_deleted = false;
If it doesn't work, I have a variant, so let me know.

Joining with set-returning function (SRF) and access columns in SQLAlchemy

Suppose I have an activity table and a subscription table. Each activity has an array of generic references to some other object, and each subscription has a single generic reference to some other object in the same set.
CREATE TABLE activity (
id serial primary key,
ob_refs UUID[] not null
);
CREATE TABLE subscription (
id UUID primary key,
ob_ref UUID,
subscribed boolean not null
);
I want to join with the set-returning function unnest so I can find the "deepest" matching subscription, something like this:
SELECT id
FROM (
SELECT DISTINCT ON (activity.id)
activity.id,
x.ob_ref, x.ob_depth,
subscription.subscribed IS NULL OR subscription.subscribed = TRUE
AS subscribed
FROM activity
LEFT JOIN subscription
ON activity.ob_refs #> array[subscription.ob_ref]
LEFT JOIN unnest(activity.ob_refs)
WITH ORDINALITY AS x(ob_ref, ob_depth)
ON subscription.ob_ref = x.ob_ref
ORDER BY activity.id, x.ob_depth DESC
) sub
WHERE subscribed = TRUE;
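For context, unnest(...) WITH ORDINALITY expands an array into one row per element paired with its 1-based position, which is what x.ob_depth ranks on. A quick standalone illustration:
SELECT *
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(ref, depth);
-- returns ('a', 1), ('b', 2), ('c', 3)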
But I can't figure out how to do that second join and get access to the columns. I've tried creating a FromClause like this:
act_ref_t = (sa.select(
        [sa.column('unnest', UUID).label('ob_ref'),
         sa.column('ordinality', sa.Integer).label('ob_depth')],
        from_obj=sa.func.unnest(Activity.ob_refs))
    .suffix_with('WITH ORDINALITY')
    .alias('act_ref_t'))
...
query = (query
    .outerjoin(
        act_ref_t,
        Subscription.ob_ref == act_ref_t.c.ob_ref)
    .order_by(Activity.id, act_ref_t.c.ob_depth))
But that results in this SQL with another subquery:
LEFT OUTER JOIN (
    SELECT unnest AS ob_ref, ordinality AS ob_depth
    FROM unnest(activity.ob_refs) WITH ORDINALITY
) AS act_ref_t
ON subscription.ob_ref = act_ref_t.ob_ref
... which fails because of the missing and unsupported LATERAL keyword:
There is an entry for table "activity", but it cannot be referenced from this part of the query.
So, how can I create a JOIN clause for this SRF without using a subquery? Or is there something else I'm missing?
Edit 1: Using sa.text with TextClause.columns instead of sa.select gets me a lot closer:
act_ref_t = (sa.sql.text(
        "unnest(activity.ob_refs) WITH ORDINALITY")
    .columns(sa.column('unnest', UUID),
             sa.column('ordinality', sa.Integer))
    .alias('act_ref'))
But the resulting SQL fails because it wraps the clause in parentheses:
LEFT OUTER JOIN (unnest(activity.ob_refs) WITH ORDINALITY)
AS act_ref ON subscription.ob_ref = act_ref.unnest
The error is syntax error at or near ")". Can I get TextAsFrom to not be wrapped in parentheses?
It turns out this is not directly supported by SA, but the correct behaviour can be achieved with a ColumnClause and a FunctionElement. First import this recipe as described by zzzeek in this SA issue. Then create a special unnest function that includes the WITH ORDINALITY modifier:
from sqlalchemy.ext.compiler import compiles

# ColumnFunction comes from the recipe linked above
class unnest_func(ColumnFunction):
    name = 'unnest'
    column_names = ['unnest', 'ordinality']

@compiles(unnest_func)
def _compile_unnest_func(element, compiler, **kw):
    return compiler.visit_function(element, **kw) + " WITH ORDINALITY"
You can then use it in joins, ordering, etc. like this:
act_ref = unnest_func(Activity.ob_refs)
query = (query
    .add_columns(act_ref.c.unnest, act_ref.c.ordinality)
    .outerjoin(act_ref, sa.true())
    .outerjoin(Subscription, Subscription.ob_ref == act_ref.c.unnest)
    .order_by(act_ref.c.ordinality.desc()))
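For reference, the SQL this should emit looks roughly like the following (a sketch; anon_1 stands in for whatever alias SQLAlchemy generates). Note that PostgreSQL treats a function call in the FROM clause as implicitly LATERAL, which is why the bare function join works where the subquery form needed the keyword:
SELECT activity.id, anon_1.unnest, anon_1.ordinality
FROM activity
LEFT OUTER JOIN unnest(activity.ob_refs) WITH ORDINALITY AS anon_1 ON true
LEFT OUTER JOIN subscription ON subscription.ob_ref = anon_1.unnest
ORDER BY anon_1.ordinality DESC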

Column is of type timestamp without time zone but expression is of type character

I'm trying to insert records as part of implementing an SCD2 (slowly changing dimension, type 2) on Redshift, but I get an error.
The target table's DDL is
CREATE TABLE ditemp.ts_scd2_test (
id INT
,md5 CHAR(32)
,record_id BIGINT IDENTITY
,from_timestamp TIMESTAMP
,to_timestamp TIMESTAMP
,file_id BIGINT
,party_id BIGINT
)
This is the insert statement:
INSERT
INTO ditemp.TS_SCD2_TEST(id, md5, from_timestamp, to_timestamp)
SELECT TS_SCD2_TEST_STAGING.id
,TS_SCD2_TEST_STAGING.md5
,from_timestamp
,to_timestamp
FROM (
SELECT '20150901 16:34:02' AS from_timestamp
,CASE
WHEN last_record IS NULL
THEN '20150901 16:34:02'
ELSE '39991231 11:11:11.000'
END AS to_timestamp
,CASE
WHEN rownum != 1
AND atom.id IS NOT NULL
THEN 1
WHEN atom.id IS NULL
THEN 1
ELSE 0
END AS transfer
,stage.*
FROM (
SELECT id
FROM ditemp.TS_SCD2_TEST_STAGING
WHERE file_id = 2
GROUP BY id
HAVING count(*) > 1
) AS scd2_count_ge_1
INNER JOIN (
SELECT row_number() OVER (
PARTITION BY id ORDER BY record_id
) AS rownum
,stage.*
FROM ditemp.TS_SCD2_TEST_STAGING AS stage
WHERE file_id IN (2)
) AS stage
ON (scd2_count_ge_1.id = stage.id)
LEFT JOIN (
SELECT max(rownum) AS last_record
,id
FROM (
SELECT row_number() OVER (
PARTITION BY id ORDER BY record_id
) AS rownum
,stage.*
FROM ditemp.TS_SCD2_TEST_STAGING AS stage
)
GROUP BY id
) AS last_record
ON (
stage.id = last_record.id
AND stage.rownum = last_record.last_record
)
LEFT JOIN ditemp.TS_SCD2_TEST AS atom
ON (
stage.id = atom.id
AND stage.md5 = atom.md5
AND atom.to_timestamp > '20150901 16:34:02'
)
) AS TS_SCD2_TEST_STAGING
WHERE transfer = 1
To cut things short, I am trying to insert 20150901 16:34:02 into from_timestamp and 39991231 11:11:11.000 into to_timestamp, and I get:
ERROR: 42804: column "from_timestamp" is of type timestamp without time zone but expression is of type character varying
Can anyone please suggest how to solve this issue?
Postgres isn't recognizing 20150901 16:34:02 (your input) as a valid date/time format, so it assumes it's a string.
Use a standard date format instead, preferably ISO 8601: 2015-09-01T16:34:02
SQLFiddle example
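A minimal sketch of the fix applied to the literals from the query, with an explicit cast so nothing is left for Redshift to infer:
SELECT '2015-09-01 16:34:02'::timestamp AS from_timestamp,
       '3999-12-31 11:11:11'::timestamp AS to_timestamp;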
Just in case someone ends up here after trying to insert a timestamp or a timestamptz into PostgreSQL from a variable in Groovy or Java through a prepared statement and getting the same error (as I did): I managed to do it by setting the property stringtype to "unspecified". According to the documentation:
Specify the type to use when binding PreparedStatement parameters set
via setString(). If stringtype is set to VARCHAR (the default), such
parameters will be sent to the server as varchar parameters. If
stringtype is set to unspecified, parameters will be sent to the
server as untyped values, and the server will attempt to infer an
appropriate type. This is useful if you have an existing application
that uses setString() to set parameters that are actually some other
type, such as integers, and you are unable to change the application
to use an appropriate method such as setInt().
Properties props = [user : "user", password: "password",
driver:"org.postgresql.Driver", stringtype:"unspecified"]
def sql = Sql.newInstance("url", props)
With this property set, you can insert a timestamp as a string variable without raising the error from the question title. For instance:
String myTimestamp = Instant.now().toString()
sql.execute("""INSERT INTO MyTable (MyTimestamp) VALUES (?)""",
    [myTimestamp.toString()])
This way, the type of the timestamp (from a String) is inferred correctly by PostgreSQL. I hope this helps.
Inside apache-tomcat-9.0.7/conf/server.xml
Add "?stringtype=unspecified" to the end of the JDBC URL.
For example:
<GlobalNamingResources>
<Resource name="jdbc/??" auth="Container" type="javax.sql.DataSource"
...
url="jdbc:postgresql://127.0.0.1:5432/Local_DB?stringtype=unspecified"/>
</GlobalNamingResources>