I have a Spark temporary table spark_tmp_view with a DATE_KEY column. I am trying to create a Hive table from it (without first writing the temp table out to a parquet location). What I tried to run is:
spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS mydb.result AS SELECT * FROM spark_tmp_view PARTITIONED BY(DATE_KEY DATE)")
The error I got is: mismatched input 'BY' expecting <EOF>. I tried to search but still haven't been able to figure out how to do it from a Spark app, or how to insert data afterwards. Could someone please help? Many thanks.
PARTITIONED BY is part of the definition of the table being created, so it must precede ...AS SELECT...; see the Spark SQL syntax reference.
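For example, a minimal sketch of a working statement (assuming a parquet datasource table is acceptable; in a CTAS the partition column is listed by name only, since its type is taken from the query, and an EXTERNAL table would additionally need a LOCATION clause):
-- USING parquet is one possible provider, chosen here for illustration
CREATE TABLE IF NOT EXISTS mydb.result
USING parquet
PARTITIONED BY (DATE_KEY)
AS SELECT * FROM spark_tmp_view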
Related
I want to use this syntax, which was posted in a similar thread, to update the partitions of a Delta table I have. The challenge is that the Delta table does not exist in Databricks; we use Databricks to update the Delta table via Azure Data Factory.
How can I adjust the syntax below to update the partitions and overwrite the table via the table path?
Scala:
val input = spark.read.table("mytable")
input.write.format("delta")
.mode("overwrite")
.option("overwriteSchema", "true")
.partitionBy("colB") // different column
.saveAsTable("mytable")
SQL:
REPLACE TABLE <tablename>
USING DELTA
PARTITIONED BY (view_date)
AS
SELECT * FROM <tablename>
I tried to adapt the above code but could not make it work against the delta table path.
If you have a path, then you need to use the correct functions:
for Python, use .load for reading and .save for writing;
if you're using SQL, then you specify the table as follows:
delta.`path`
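For example, to repartition a table that exists only at a path, a sketch of the SQL variant (the path /mnt/delta/mytable is hypothetical; in Python the equivalent change is .load(path) for reading and .save(path) for writing, in place of .table/.saveAsTable):
-- hypothetical path; delta.`...` addresses the table by its location
REPLACE TABLE delta.`/mnt/delta/mytable`
USING DELTA
PARTITIONED BY (view_date)
AS SELECT * FROM delta.`/mnt/delta/mytable`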
I want to do aggregations on the result of
SHOW TABLES FROM databasename
Or create a new table with the result, like:
CREATE TABLE database.new_table AS (
SHOW TABLES FROM database
);
But I'm getting various errors if I try to do anything else with SHOW TABLES.
Is there another way of working with the result of SHOW TABLES, or another way of creating a table with all the table names in a database? I have previously worked with Teradata, where this is quite easy.
Edit: I only have access to Databricks SQL Analytics, so I can only write pure SQL.
Another way of doing it:
spark.sql("use " + databasename)
df = spark.sql("show tables")
df.write.saveAsTable('databasename.new_table')
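Given the pure-SQL constraint from the edit, a possible alternative is the information_schema views; this is a sketch and assumes your workspace has Unity Catalog, which exposes them:
-- assumes Unity Catalog; 'databasename' is the schema whose tables you want
CREATE TABLE databasename.new_table AS
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'databasename'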
I am trying to insert data into a table using an INSERT OVERWRITE statement, but I am getting the error below.
org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;
The command is as follows:
spark.sql("INSERT OVERWRITE TABLE edm_hive SELECT run_number+1 from edm_hive")
I tried using a temp table to store the results and then update the final table, but that is also not working.
I am also trying to insert a record into the table using a variable, but that is not working either.
e.g.
spark.sql("INSERT into TABLE Feed_metadata_s2 values ('LOGS','StartTimestamp',$StartTimestamp)")
Please suggest a solution.
This solution works well for me. I added the property below when building the SparkSession, following the approach from:
Spark HiveContext : Insert Overwrite the same table it is read from
val spark = SparkSession.builder()
  .config("spark.sql.hive.convertMetastoreParquet", "false")
  .enableHiveSupport()
  .getOrCreate()
In SQL Server stored procedures, we have the option of creating a temporary table "#temp" whose structure mirrors that of the table it selects from. We don't have to explicitly create and define the structure of the "#temp" table.
Is there a similar option in Hive HQL scripts to create a temp table at run time without defining its structure first, so that I can dump data into it and use it? The code below shows an example of a #temp table in SQL Server.
SELECT name, age, gender
INTO #MaleStudents
FROM student
WHERE gender = 'Male'
Hive has the concept of temporary tables, which are local to a user's session. These tables behave just like any other table, and can be created using CTAS commands too. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created.
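For example, the CTAS equivalent of the #MaleStudents query above (a sketch assuming Hive 0.14 or later, where temporary tables were introduced; the table name is illustrative):
-- the structure is inferred from the SELECT, like SQL Server's SELECT ... INTO #temp
CREATE TEMPORARY TABLE male_students AS
SELECT name, age, gender
FROM student
WHERE gender = 'Male';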
Read more about them here.
Hive Documentation
DWGEEK
You can create a simple temporary table and perform any operation on it.
Once you are done with your work and log out of your session, it is deleted automatically.
The syntax for a temporary table is:
CREATE TEMPORARY TABLE TABLE_NAME_HERE (key string, value string)
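Within the same session it then behaves like a regular table, for example (a small usage sketch; Hive 0.14+ is assumed for INSERT ... VALUES):
-- lives only for the current session
INSERT INTO TABLE_NAME_HERE VALUES ('k1', 'v1');
SELECT * FROM TABLE_NAME_HERE;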
I have a scenario and would like to get an expert opinion on it.
I have to load a Hive table in partitions from a relational DB via Spark (Python). I cannot create the Hive table beforehand, as I am not sure how many columns there are in the source and they might change in the future, so I have to fetch the data using select * from tablename.
However, I am sure of the partition column and know that it will not change. This column is of the "date" datatype in the source DB.
I am using saveAsTable with the partitionBy option, and the folders are created properly per the partition column. The Hive table also gets created.
The issue is that the partition column is of the "date" data type, which Hive does not support for partition columns. Because of this, I am unable to read the data via Hive or Impala queries, which report that date is not supported as a partition column type.
Please note that I cannot typecast the column in the select statement, as I have to do a select * from tablename, not select a, b, cast(c) as varchar from tablename.