How to Convert T-SQL IF statement to Databricks PySpark

I have the following code in T-SQL
IF NOT EXISTS ( SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'airports' AND COLUMN_NAME = 'airport_region') SELECT * FROM airports;
I would like to convert the above T-SQL to PySpark.
I have the following DataFrame registered as a temp view:
df1.createOrReplaceTempView('airports')
My attempt at converting the above is as follows:
sql("""IF NOT EXISTS(SELECT * FROM airports where table = airports and COLUMN = 'airport_region') select * from airports""")
The above gives me a ParseException error.
Any thoughts?

I have reproduced the above. If you want to do it in PySpark, you can put the column check in an if statement and execute the SQL inside it.
df.createOrReplaceTempView("sample1")
if 'name1' not in df.columns:
    spark.sql("select * from sample1").show()
If you want to do it in a SQL query, you can try the approach below.
First get the column names as a DataFrame and save them as a temporary view. Then, using that view, select from the required table only when the column name does not exist in it.
column_names = spark.sql("show columns in sample1")
column_names.createOrReplaceTempView("tempcols")
spark.sql("select * from sample1 where not exists (select * from tempcols where col_name='name1')").show()
If the column exists, the query returns no rows.

Try this:
if 'airport_region' not in df1.columns:
    <do stuff>
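Putting it together, here is a minimal end-to-end sketch of the conversion; the SparkSession setup and sample data are illustrative, not part of the original question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative data; in practice df1 comes from your own source.
df1 = spark.createDataFrame(
    [("LAX", "Los Angeles"), ("JFK", "New York")],
    ["airport_code", "airport_city"],
)
df1.createOrReplaceTempView("airports")

# Equivalent of the T-SQL IF NOT EXISTS check on INFORMATION_SCHEMA.COLUMNS:
# only run the query when the column is absent.
if "airport_region" not in df1.columns:
    spark.sql("SELECT * FROM airports").show()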

Related

Execute SQL stored in a DataFrame using PySpark

I have a list of SQL statements stored in a Hive table column. I have to fetch one SQL statement at a time from the Hive table and execute it. I'm getting the SQL as a DataFrame, but can anyone tell me how to execute the SQL stored in a DataFrame?
The column parameter_value contains the SQL:
extract_sql = spark.sql(""" select parameter_value
from schema_name.table_params
where project_name = 'some_projectname'
and sub_project_name = 'some_sub_project'
and parameter_name = 'extract_sql' """)
Now extract_sql contains the SQL; how do I execute it?
You can do it as follows:
sqls = spark.sql(""" select parameter_value
from schema_name.table_params
where project_name = 'some_projectname'
and sub_project_name = 'some_sub_project'
and parameter_name = 'extract_sql' """).collect()
for sql in sqls:
    spark.sql(sql[0]).show()
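collect() returns a list of Row objects, so sql[0] is the parameter_value field of each row. A slightly more defensive variant of the same loop, accessing the field by name (standard Row behaviour) and guarding against an empty result:

rows = spark.sql(""" select parameter_value
                     from schema_name.table_params
                     where project_name = 'some_projectname'
                     and sub_project_name = 'some_sub_project'
                     and parameter_name = 'extract_sql' """).collect()

if not rows:
    print("No extract_sql parameter found")

for row in rows:
    statement = row.parameter_value  # same value as row[0]
    spark.sql(statement).show()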

PSQL - UPDATE tableA from tableB where Column1 matches

I have a broken column with null values; however, I have managed to import the data from a CSV into TempTable.
MediaRecords - localpath column is null
TempTable - localpath column is correct
UPDATE mediarecords
SET localpath = TempTable.localpath
FROM TempTable
WHERE recordid = TempTable.recordid;
I keep getting ERROR: relation "temptable" does not exist
LINE 3: FROM TempTable
However, I can browse the table and see the data.
I tried following How to update selected rows with values from a CSV file in Postgres?, and here we are.
Hi Bucky, can you check what the table is actually called? I see you referring to it as TempTable in camel case, while the error says temptable in all lowercase.
PostgreSQL folds unquoted identifiers to lowercase, so mixed-case quoted names are effectively case sensitive. As an example, if you do the following:
create table "TempTableABC" (id int);
Trying to select from it unquoted as temptableABC will fail:
defaultdb=> select * from temptableABC;
ERROR: relation "temptableabc" does not exist
LINE 1: select * from temptableABC;
^
You'll need to use the same quoted syntax to make it work
defaultdb=> select * from "TempTableABC";
id
----
(0 rows)
UPDATE mediarecords
SET localpath = "TempTable".localpath
FROM public."TempTable"
WHERE "mediarecords".recordid = "TempTable".recordid;
Worked
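For completeness, a minimal sketch of running the corrected statement from Python with psycopg2; the connection parameters are placeholders, not part of the original thread:

import psycopg2

conn = psycopg2.connect("dbname=defaultdb user=postgres")  # placeholder DSN
try:
    with conn.cursor() as cur:
        # Double-quoted identifiers preserve the mixed-case table name.
        cur.execute("""
            UPDATE mediarecords
            SET localpath = "TempTable".localpath
            FROM public."TempTable"
            WHERE mediarecords.recordid = "TempTable".recordid;
        """)
    conn.commit()
finally:
    conn.close()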

I have a requirement where I need to dynamically convert column values into column headers in Azure SQL DW

The table structure is something like below (the total number of records goes up to 150).
After transposing, the result set should be like below, where .... represents n number of columns.
Basically, my idea is to create a temp table on the fly and have its column names defined from the select statement, to get the result set shown in the second picture.
The query should be something like:
SELECT * INTO #Cols FROM (select * of above resultset)A WHERE 1=2
Note: please refrain from using FOR XML PATH, as Azure SQL DW currently doesn't support this feature.
I have no way of validating that this works; however, from my search-fu, STRING_AGG is available on Azure SQL Data Warehouse. I assume it also has access to QUOTENAME, and it does have access to dynamic statements, so you can do something like this:
DECLARE @SQL_Start nvarchar(4000) = N'SELECT ',
        @SQL_Columns nvarchar(4000),
        @SQL_End nvarchar(4000) = N' INTO SomeTable FROM YourTable WHERE 1 = 2;';
SET @SQL_Columns = (SELECT STRING_AGG(QUOTENAME(ColumnName),',') WITHIN GROUP (ORDER BY ColumnName)
                    FROM (SELECT DISTINCT ColumnName
                          FROM YourTable) YT);
EXEC(@SQL_Start + @SQL_Columns + @SQL_End);
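If you are driving this from Python instead, the same technique (aggregate the distinct names, then execute one dynamic statement) might look like the sketch below; pyodbc and the connection string are assumptions, not part of the original answer:

import pyodbc

conn = pyodbc.connect("DSN=AzureSqlDw")  # placeholder connection
cur = conn.cursor()

# Collect the distinct values that should become column headers.
names = sorted(row[0] for row in
               cur.execute("SELECT DISTINCT ColumnName FROM YourTable"))

# Build the bracket-quoted column list, mirroring QUOTENAME(...).
columns = ",".join("[" + n.replace("]", "]]") + "]" for n in names)

# WHERE 1 = 2 creates an empty table whose columns come from the list.
cur.execute(f"SELECT {columns} INTO SomeTable FROM YourTable WHERE 1 = 2;")
conn.commit()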
But, again, the real solution is to fix your design.

MyBatis select query with IN but without forEach

How can I use a select query where I have only two values for one column, without using a foreach loop?
Without using IN, you can use OR:
SELECT * FROM `table` WHERE col1 = 'value1' OR col1 = 'value2'
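In the MyBatis mapper you can then bind the two values as ordinary parameters rather than a collection, e.g. col1 = #{value1} OR col1 = #{value2}, so no foreach is needed.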

How to create a filter that does the SQL equivalent of WHERE ... IN for SQLite.swift

I would like to filter results for the same column with multiple values.
Example in SQL:
SELECT * FROM myTable WHERE status = 1 AND current_condition IN ('New', 'Working')
This should return all rows from myTable where status is 1 and current_condition is 'New' or 'Working'.
How do I do this in SQLite.swift?
You can use raw SQL in SQLite.swift, so you can execute the string you posted directly; see the Raw SQL section of the documentation.
Alternatively, see Using Filters. I use filters, since they give me more insight.
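For comparison, the raw-SQL route with a safely parameterized IN list looks like this in Python's built-in sqlite3 module (the table and column names come from the question; the sample data is made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (status INTEGER, current_condition TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "New"), (1, "Working"), (1, "Closed"), (0, "New")],
)

# One placeholder per value keeps the query parameterized.
conditions = ["New", "Working"]
placeholders = ",".join("?" * len(conditions))
rows = conn.execute(
    "SELECT * FROM myTable WHERE status = ? "
    "AND current_condition IN (" + placeholders + ")",
    [1, *conditions],
).fetchall()
print(rows)  # -> [(1, 'New'), (1, 'Working')]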