I'm writing because I don't know how to execute a Snowflake procedure from Azure Databricks.
This is my Snowflake procedure:
CREATE OR REPLACE PROCEDURE getBalanceFrontAndInTotalFront(tableName VARCHAR, stringBalanceFront VARCHAR, stringInTotalFront VARCHAR)
RETURNS VARCHAR
NOT NULL
LANGUAGE javascript
AS
$$
var tableName = TABLENAME;
var balanceFront = STRINGBALANCEFRONT;
var inTotalFront = STRINGINTOTALFRONT;
// Dynamically compose the SQL statement to execute.
var sqlCommand = "SELECT BALANCE_FRONT, IN_TOTAL_FRONT, SUM(AMOUNT) AS AMOUNT FROM (SELECT " + balanceFront + " AS \"BALANCE_FRONT\", AMOUNT, " + inTotalFront + " AS \"IN_TOTAL_FRONT\" FROM " + tableName + ") GROUP BY BALANCE_FRONT, IN_TOTAL_FRONT";
// Prepare statement.
var stmt = snowflake.createStatement({sqlText: sqlCommand});
// Execute Statement
var rs = stmt.execute();
var arrayValues = [];
while (rs.next()) {
var column1 = rs.getColumnValue(1);
var column2 = rs.getColumnValue(2);
var column3 = rs.getColumnValue(3);
arrayValues.push([column1 + ':' + column3 + ':' + column2]);
}
return arrayValues;
$$;
When I execute the procedure in Snowflake:
set stringBalanceFront = 'CASE WHEN Balance_Type like (\'%A%\')THEN \'ACTIVO\' WHEN Balance_Type like (\'%P%\') THEN \'PASIVO\' WHEN Balance_Type like (\'%N%\') THEN \'NETO\' ELSE \'RESTO\' END';
set stringInTotalFront = 'CASE WHEN Balance_Type like (\'%A%\')THEN \'true\' ELSE \'false\' END';
CALL getBalanceFrontAndInTotalFront('DMAAS_OUTPUT_DATA_TABLE_0049_D18CER', $stringBalanceFront, $stringInTotalFront);
I obtain the following array of strings:
RESTO:-184281744:false,ACTIVO:-17881395:true,NETO:20599:false,PASIVO:12672:false
I am trying to run this procedure from Spark with the following code, and it obviously fails:
val stringBalanceFront = Funciones.generarCondiciones(dfOrdenado, Variables.CAMPO_BALANCE_FRONT.toLowerCase())
val stringInTotalFront = Funciones.generarCondiciones(dfOrdenado, Variables.CAMPO_IN_TOTAL_FRONT.toLowerCase())
val query = s"CALL getBalanceFrontAndInTotalFront(${cfgVal.getRutaMasterNoAgregada}, ${stringBalanceFront}, ${stringInTotalFront});"
val arrayBalanceFront = spark.read
.format(SNOWFLAKE_SOURCE_NAME)
.options(snowOptionsRead)
.option("query", query)
.load()
And I get the following error:
21/07/15 17:14:36 ERROR Uncaught throwable from user code: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
syntax error line 1 at position 15 unexpected 'CALL'.
What is the correct way to execute a Snowflake procedure from Spark? Keep in mind that I want to return the results to a val in Spark.
Thanks in advance!
Best regards.
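For reference, the connector's "query" read option wraps whatever you pass in a subselect, roughly SELECT * FROM (<query>), and CALL is not valid in that position, which is where the "unexpected 'CALL'" error comes from. Below is a minimal sketch of one workaround, assuming a spark-snowflake connector version that exposes Utils.runQuery and reusing the snowOptionsRead map from the question; the argument handling is illustrative only:
import net.snowflake.spark.snowflake.Utils
// Build the CALL statement. Any single quotes inside the CASE
// expressions must be doubled ('') to stay valid SQL literals.
val call = "CALL getBalanceFrontAndInTotalFront(" +
  "'DMAAS_OUTPUT_DATA_TABLE_0049_D18CER', " +
  s"'$stringBalanceFront', '$stringInTotalFront')"
// Utils.runQuery sends the statement over JDBC as-is (no SELECT wrapper)
// and returns a java.sql.ResultSet with the procedure's single-row output.
val rs = Utils.runQuery(snowOptionsRead, call)
val arrayBalanceFront = if (rs.next()) rs.getString(1) else ""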
I am executing a custom function:
def test_insert(col1, **kwargs):
try:
sql = "INSERT INTO target_tbl SELECT * FROM source_tbl WHERE col1 = '{}'".format(col1)
if len(kwargs.items()) != 0:
for i in kwargs.items():
sql = sql + " AND {} = '{}'".format(i[0],i[1])
return sql
except Exception as e:
return e
It generates a SQL statement, for example test_insert('SOP001', col2='PRD002') => output: INSERT INTO sales_history_output SELECT * FROM sales_history WHERE col1 = 'SOP001' AND col2 = 'PRD002'.
Now I want to pass the **kwargs arguments through a Databricks notebook parameter. Is there any way to do this? When I pass '1001', col2 = 'PRD002' as the notebook parameter, it is read as a single string, not as **kwargs.
I would expect to always receive a result set with one row from a SELECT COUNT, but results.next() always returns false. This is on HSQLDB 2.5.1.
The code below prints:
number of columns: 1. First column C1 with type INTEGER
No COUNT results
statement = connection.createStatement();
// check if table empty
statement.executeQuery("SELECT COUNT(*) FROM mytable");
ResultSet results = statement.getResultSet();
System.out.println("number of columns: " + results.getMetaData().getColumnCount() + ". First column " +results.getMetaData().getColumnName(1) + " with type " +results.getMetaData().getColumnTypeName(1) );
int numberOfRows = 0;
boolean hasResults = results.next();
if (hasResults){
numberOfRows = results.getInt(1);
System.out.println("Table size " + numberOfRows );
}else{
System.out.println("No COUNT results" );
}
statement.close();
Executing the same SQL statement in my SQL console works fine:
C1
104
Other JDBC actions on this database work fine as well.
Is there something obvious I'm missing?
The getResultSet method goes with execute, not with executeQuery, which itself returns a ResultSet. That returned ResultSet is the one you need to use; at the moment you are losing it because you never assign it to anything.
See https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeQuery(java.lang.String) and https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()
ResultSet results = statement.executeQuery("SELECT COUNT(*) FROM mytable");
I need to use a variable that I've created before in Spark to select data from a Teradata table:
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
val df = sqlContext.sql(query)
val dfv = df.select("cod_contrato")
The variable is a string.
So I would like to query the database using that vector of strings.
If I use:
%spark
val sql = s"(SELECT * FROM xx2.CONTRATOS where cod_contrato in '$dfv') as query"
I get:
(SELECT * FROM xx2.CONTRATOS where cod_contrato in '[cod_contrato: string]') as query
The desired result would be:
SELECT * FROM xx2.CONTRATOS where cod_contrato in ('11111', '11112' )
How can I transform the column into a list enclosed in parentheses, with each element quoted?
Thanks
Here is my attempt. From some DataFrame,
val test = df.select("id").as[String].collect
> test: Array[String] = Array(6597, 8011, 2597, 5022, 5022, 6852, 6852, 5611, 14838, 14838, 2588, 2588)
so test is now an array. Thus, using mkString,
val sql = s"SELECT * FROM xx2.CONTRATOS where cod_contrato in " + test.mkString("('", "','", "')") + " as query"
> sql: String = SELECT * FROM xx2.CONTRATOS where cod_contrato in ('6597','8011','2597','5022','5022','6852','6852','5611','14838','14838','2588','2588') as query
The final result is now a plain string.
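Since the (...) as query alias indicates the string is meant to be pushed down as a JDBC subquery, here is a hedged sketch of how it might be consumed; the URL and driver values are placeholders, not from the question:
// Hypothetical JDBC read that pushes the assembled subquery down to Teradata.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:teradata://host/DATABASE=xx2") // placeholder URL
  .option("driver", "com.teradata.jdbc.TeraDriver")   // placeholder driver
  .option("dbtable", sql) // the "(SELECT ...) as query" string built above
  .load()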
Make a temp view of the values you want to filter on, and then reference it in the query:
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
sqlContext.sql(query).selectExpr("cast(cod_contrato as string)").createOrReplaceTempView("dfv_table")
val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in (select * from dfv_table)) as query"
This will work when the query is run in Spark SQL, but it will not return a query string. Lamanus's answer should be sufficient if all you want is the query as a string.
I'm changing queries from an Oracle Database to PostgreSQL, and in this query I am getting this error:
ERROR: syntax error at or near "SET"
the query is:
{call UPDATE alarm_instance SET last_update_time=default, wait_expire_time=null, core_number=nextval(SEQ_ALRM_NUMBR)
where wait_time <= current_date RETURNING alarm_instance_id bulk collect INTO ?}
I am using JDBC to connect to the database and here is the call code
try (CallableStatement cs = super.prepareCall_(query)) {
cs.registerOutParameter(1, Types.ARRAY);
cs.execute();
...
I have taken a long look at the Postgres documentation, cannot find what is wrong, and didn't find any answer to this specific situation.
An UPDATE statement can't be executed with a CallableStatement. A CallableStatement is essentially only intended to call stored procedures; in the case of Oracle, that includes anonymous PL/SQL blocks.
And bulk collect is invalid in Postgres to begin with.
It seems you want something like this:
String sql =
"UPDATE alarm_instance " +
" SET last_update_time=default, " +
" wait_expire_time=null, "
" core_number=nextval('SEQ_ALRM_NUMBR') " +
" where wait_time <= current_date RETURNING alarm_instance_id";
Statement stmt = connection.createStatement();
stmt.execute(sql);
int rowsUpdated = stmt.getUpdateCount();
ResultSet rs = stmt.getResultSet();
while (rs.next()) {
// do something with the returned IDs
}
How can I execute lengthy, multiline Hive queries in Spark SQL? Like the query below:
val sqlContext = new HiveContext (sc)
val result = sqlContext.sql ("
select ...
from ...
");
Use """ instead, so for example
val results = sqlContext.sql ("""
select ....
from ....
""");
or, if you want to format code, use:
val results = sqlContext.sql ("""
|select ....
|from ....
""".stripMargin);
You can use triple quotes at the start/end of the SQL code (Scala), or a backslash at the end of each line (Python):
val results = sqlContext.sql ("""
create table enta.scd_fullfilled_entitlement as
select *
from my_table
""");
results = sqlContext.sql (" \
create table enta.scd_fullfilled_entitlement as \
select * \
from my_table \
")
val query = """(SELECT
a.AcctBranchName,
c.CustomerNum,
c.SourceCustomerId,
a.SourceAccountId,
a.AccountNum,
c.FullName,
c.LastName,
c.BirthDate,
a.Balance,
case when [RollOverStatus] = 'Y' then 'Yes' Else 'No' end as RollOverStatus
FROM
v_Account AS a left join v_Customer AS c
ON c.CustomerID = a.CustomerID AND c.Businessdate = a.Businessdate
WHERE
a.Category = 'Deposit' AND
c.Businessdate= '2018-11-28' AND
isnull(a.Classification,'N/A') IN ('Contractual Account','Non-Term Deposit','Term Deposit')
AND IsActive = 'Yes' ) tmp """
It is worth noting that the length is not the issue, only how the string is written. For this you can use """ as Gaweda suggested, or simply build the query in a string variable, e.g. with a StringBuilder. For example:
val selectElements = Seq("a","b","c")
val builder = StringBuilder.newBuilder
builder.append("select ")
builder.append(selectElements.mkString(","))
builder.append(" where d<10")
val results = sqlContext.sql(builder.toString())
In addition to the ways above, you can concatenate ordinary strings as well (note the spaces that keep the tokens separated):
val results = sqlContext.sql("select .... " +
" from .... " +
" where .... " +
" group by ....
");
Write your SQL inside triple quotes, like """ sql code """:
df = spark.sql(f""" select * from table1 """)
This is same for Scala Spark and PySpark.