Create a ClickHouse temporary table with doobie - Scala

I want to create a ClickHouse temporary table with doobie, but I don't know how to add the session_id parameter to the query.
I tried:
val sql =
  sql"CREATE TEMPORARY TABLE " ++ Fragment.const(tableName) ++ sql"( call_id String ) ENGINE Memory()"
HC.setClientInfo("session_id", sessionId.toString) *> sql.update.run
but this does not work.
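One thing worth trying (this is not from the post; it assumes the ClickHouse JDBC driver accepts session_id as a connection URL parameter rather than via JDBC client info, and the driver class name and URL format below are assumptions too): fix the session when building the doobie Transactor, so the CREATE TEMPORARY TABLE and every later statement run in the same ClickHouse session. A rough sketch:
import cats.effect.IO
import doobie._
import doobie.implicits._

// Sketch only: check whether your ClickHouse JDBC driver version actually
// honours a session_id URL parameter; adjust for your doobie/cats-effect version.
val sessionId = java.util.UUID.randomUUID()

val xa = Transactor.fromDriverManager[IO](
  "com.clickhouse.jdbc.ClickHouseDriver",                               // assumed driver class
  s"jdbc:clickhouse://localhost:8123/default?session_id=$sessionId",   // assumed URL format
  "default",
  ""
)

val createTempTable =
  (sql"CREATE TEMPORARY TABLE " ++ Fragment.const(tableName) ++
    sql"(call_id String) ENGINE = Memory").update.run

// The temporary table only exists inside the ClickHouse session identified by
// session_id, so all follow-up statements must go through the same transactor.
createTempTable.transact(xa)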

Related

Spark 2.4 Unable to Insert Record using variable

I am trying to insert a record into a table using a variable, but it is failing.
command:
val query = "INSERT into TABLE Feed_metadata_s2 values ('LOGS','RUN_DATE',{} )".format(s"$RUN_DATE")
spark.sql(s"query")
spark.sql("INSERT into TABLE Feed_metadata_s2 values ('LOGS','ExtractStartTimestamp',$ExtractStartTimestamp)")
error:
INSERT into TABLE Feed_metadata_s2 values ('SDEDLOGS','ExtractStartTimestamp',$ExtractStartTimestamp)
------------------------------------------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
It seems you're confused about string interpolation... you need to put s before the last query string so that the variable is substituted into it. Also, the first two lines can be simplified:
val query = s"INSERT into TABLE Feed_metadata_s2 values ('LOGS','RUN_DATE',$RUN_DATE)"
spark.sql(query)
spark.sql(s"INSERT into TABLE Feed_metadata_s2 values ('LOGS','ExtractStartTimestamp',$ExtractStartTimestamp)")

Slick: Insert into a Table from Raw SQL Select

Insert into a Table from Raw SQL Select
val rawSql: DBIO[Vector[(String, String)]] = sql"SELECT id, name FROM SomeTable".as[(String, String)]
val myTable: TableQuery[MyClass] // with columns id (String), name (String) and some other columns
Is there a way to use forceInsert functions to insert data from select into the tables?
If not, is there a way to generate a SQL string by using forceInsertStatements?
Something like:
db.run {
  myTable.map { t => (t.id, t.name) }.forceInsert????(rawSql)
}
P.S. I don't want to make two I/O calls because my RAW SQL might be returning thousands of records.
Thanks for the help.
If you can represent your rawSql query as a Slick query instead...
val query = someTable.map(row => (row.id, row.name))
...for example, then forceInsertQuery will do what you need. An example might be:
val action =
  myTable.map(row => (row.someId, row.someName))
    .forceInsertQuery(query)
However, I presume you're using raw SQL for a good reason. In that case, I don't believe you can use forceInsert (without a round-trip to the database) because the raw SQL is already an action (not a query).
But, as you're using raw SQL, why not do the whole thing in raw SQL? Something like:
val rawEverything =
sqlu" insert into mytable (someId, someName) select id, name from sometable "
...or similar.
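For completeness, running that single raw action would look something like this (a sketch, assuming a Slick db: Database is in scope):
import scala.concurrent.Future

// sqlu produces a DBIO[Int] (the number of affected rows), so the
// insert-select runs in one round-trip when handed to db.run.
val rowsInserted: Future[Int] = db.run(rawEverything)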

How to add a new column to a Delta Lake table?

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write data currently looks like this:
DeltaTable.forPath(spark, deltaPath)
  .as("dest_table")
  .merge(myDF.as("source_table"),
    "dest_table.id = source_table.id")
  .whenNotMatched()
  .insertAll()
  .whenMatched(upsertCond)
  .updateExpr(upsertStat)
  .execute()
From these docs, it looks like Delta Lake supports adding new columns on insertAll() and updateAll() calls only. However, I'm updating only when certain conditions are met and want the new column added to all the existing data (with a default value of null).
I've come up with a solution that seems extremely clunky and am wondering if there's a more elegant approach. Here's my current proposed solution:
// Read in existing data
val myData = spark.read.format("delta").load(deltaPath)
// Register table with Hive metastore
myData.write.format("delta").saveAsTable("input_data")
// Add new column
spark.sql("ALTER TABLE input_data ADD COLUMNS (new_col string)")
// Save as DataFrame and overwrite data on disk
val sqlDF = spark.sql("SELECT * FROM input_data")
sqlDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(deltaPath)
Alter your Delta table first and then do your merge operation:
from pyspark.sql.functions import lit
spark.read.format("delta").load('/mnt/delta/cov')\
.withColumn("Recovered", lit(''))\
.write\
.format("delta")\
.mode("overwrite")\
.option("overwriteSchema", "true")\
.save('/mnt/delta/cov')
New columns can also be added with SQL commands as follows:
ALTER TABLE dbName.TableName ADD COLUMNS (newColumnName dataType)
UPDATE dbName.TableName SET newColumnName = val;
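If the table is registered in the metastore, the same two statements can be issued from Spark; this is only a sketch, with dbName.TableName as a placeholder, the default value taken to be null as in the question, and the assumption that your Delta/Spark version supports UPDATE in SQL:
// Assumes dbName.TableName is a Delta table registered in the metastore.
spark.sql("ALTER TABLE dbName.TableName ADD COLUMNS (newColumnName string)")
spark.sql("UPDATE dbName.TableName SET newColumnName = null")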
This is the approach that worked for me using Scala.
Given a Delta table named original_table, whose path is:
val path_to_delta = "/mnt/my/path"
This table currently has 1M records with the following schema: pk, field1, field2, field3, field4.
I want to add a new field, named newfield, to the existing schema without losing the data already stored in original_table.
So I first defined a simple schema containing just pk and newfield:
case class new_schema(
  pk: String,
  newfield: String
)
I created a dummy record using that schema:
import spark.implicits._
val dummy_record = Seq(new_schema("delete_later", null)).toDF
I inserted this new record (the existing 1M records will have newfield populated as null), and then removed the dummy record from the original table:
import io.delta.tables.DeltaTable

dummy_record
  .write
  .format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .save(path_to_delta)

val original_dt: DeltaTable = DeltaTable.forPath(spark, path_to_delta)
original_dt.delete("pk = 'delete_later'")
Now the original table has 6 fields: pk, field1, field2, field3, field4 and newfield.
Finally, I upsert the newfield values into the corresponding 1M records, using pk as the join key:
import org.apache.spark.sql.functions.col

val df_with_new_field = // You bring new data from somewhere...

original_dt
  .as("original")
  .merge(
    df_with_new_field.as("new"),
    "original.pk = new.pk")
  .whenMatched
  .update(Map(
    "newfield" -> col("new.newfield")
  ))
  .execute()
https://www.databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html
Have you tried using the merge statement?
https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html

Check the Metastore for the Table availability in Spark

I want to check whether a table is available in a specific Hive database before running the select query below.
How can I get this information from the metastore?
sparkSession.sql("select * from db_name.table_name")
You can run the commands below before running your operation on the table:
sparkSession.sql("use databaseName");
val df = sparkSession.sql("show tables like 'tableName'")
if(df.head(1).isEmpty == false){
//write the code
}
Try this
# This will give the list of tables in the database as a dataframe.
tables = sparkSession.sql('SHOW TABLES IN ' + db_name)
# You can collect them to bring in a list of row items
tables_list = tables.select('tableName').collect()
# Convert them into a python array
tables_list_arr = []
for row in tables_list:
    tables_list_arr.append(str(row['tableName']))
# Then execute your command (table_name holds the name you are checking for)
if table_name in tables_list_arr:
    sparkSession.sql("select * from db_name.table_name")

How to delete data from Hive external table for Non-Partition column?

I have created an external table in Hive partitioned by client and month.
The requirement is to delete the data for ID=201 from that table, but the table is not partitioned by the ID column.
I have tried to do it with INSERT OVERWRITE, but it's not working.
We are using Spark 2.2.0.
How can I solve this problem?
val sqlDF = spark.sql("select * from db.table")
val newSqlDF1 = sqlDF.filter(!col("ID").isin("201") && col("month").isin("062016"))
val columns = newSqlDF1.schema.fieldNames.mkString(",")
newSqlDF1.createOrReplaceTempView("myTempTable")
spark.sql(s"INSERT OVERWRITE TABLE db.table PARTITION(client, month) select ${columns} from myTempTable")