How to create a new table with the results of SHOW TABLES in Databricks SQL? - databricks-sql

I want to do aggregations on the result of
SHOW TABLES FROM databasename
or create a new table with the result, like:
CREATE TABLE database.new_table AS (
SHOW TABLES FROM database
);
But I get various errors whenever I try to do anything else with the output of SHOW TABLES.
Is there another way to work with the result of SHOW TABLES, or another way of creating a table with all the column names in a database? I have previously worked with Teradata, where this is quite easy.
Edit: I only have access to Databricks SQL Analytics, so I can only write pure SQL.

Another way of doing it, if you can run PySpark (for example in a notebook) rather than only Databricks SQL:
spark.sql("USE " + databasename)
df = spark.sql("SHOW TABLES")
df.write.saveAsTable(databasename + ".new_table")
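If a pure-SQL route is needed, one general pattern is to query the catalog as an ordinary relation and wrap that query in CREATE TABLE ... AS SELECT, which the output of SHOW TABLES does not allow. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine. SQLite exposes its catalog through sqlite_master and the pragma_table_info() table-valued function (SQLite 3.16+), so you can build and aggregate a table listing every column with SQL alone. Whether your Databricks SQL workspace offers an equivalent queryable view (for example an information_schema.columns view) is an assumption to check. The tables column_inventory, orders and customers are hypothetical.

```python
import sqlite3


def build_column_inventory(conn):
    """Create column_inventory(table_name, column_name) using SQL only."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS column_inventory;
        -- CTAS over the catalog: sqlite_master lists the tables and
        -- pragma_table_info(name) returns one row per column of that table.
        CREATE TABLE column_inventory AS
        SELECT m.name AS table_name, p.name AS column_name
        FROM sqlite_master AS m, pragma_table_info(m.name) AS p
        WHERE m.type = 'table'
          AND m.name NOT LIKE 'sqlite_%'
          AND m.name <> 'column_inventory';
        """
    )


def columns_per_table(conn):
    """An example aggregation over the materialized listing."""
    return conn.execute(
        "SELECT table_name, COUNT(*) AS n_columns "
        "FROM column_inventory GROUP BY table_name ORDER BY table_name"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    build_column_inventory(conn)
    print(columns_per_table(conn))  # [('customers', 2), ('orders', 3)]
```

Once the listing is an ordinary table, any further aggregation or join is plain SQL.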

Related

Delete tables in batches (Pyspark)

I have a database that has many tables in it. I want to drop all tables in that database that have "oct" in the name in a batch. Is there a way to do this? I can't find a clear answer online and I don't want to make a mistake and delete tables I shouldn't. Thanks for any help in advance!
For simplicity, I assume you are using Hive and that the metastore is configured. You can then use spark.sql to run the usual SQL commands: list the tables with LIKE pattern matching, iterate over the resulting DataFrame, and drop the tables one by one.
# List all tables in the 'agg' schema whose names contain 'customer' (Hive-style pattern; in your case, use '*oct*').
df = spark.sql("SHOW TABLES IN agg LIKE '*customer*'")
# Iterate over the DataFrame of table names and drop the tables one by one.
for row in df.collect():
    print(f'Dropping table {row.tableName}')
    spark.sql(f'DROP TABLE agg.{row.tableName}')
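A wrong pattern drops the wrong tables, so it can help to compute the list of DROP statements first and review it before executing anything. Below is a minimal sketch in plain Python. It assumes the table names have already been collected, for example from the tableName column of SHOW TABLES. plan_drops is a hypothetical helper, and fnmatch only approximates Hive's LIKE patterns: it handles the '*' wildcard, but not the '|' alternation Hive also accepts. Matching is case-insensitive because Hive table names are case-insensitive.

```python
import fnmatch


def plan_drops(database, table_names, pattern):
    """Return the DROP TABLE statements a batch drop would run, without running them.

    fnmatch's '*' stands in for Hive's '*' wildcard; matching ignores case.
    """
    pattern = pattern.lower()
    matches = sorted(t for t in table_names if fnmatch.fnmatchcase(t.lower(), pattern))
    # Backticks quote identifiers in Spark SQL / HiveQL.
    return [f"DROP TABLE IF EXISTS `{database}`.`{t}`" for t in matches]


if __name__ == "__main__":
    # Hypothetical names; in practice, collect them from SHOW TABLES first.
    names = ["sales_oct", "octopus_ref", "sales_nov", "q4_oct_summary"]
    for stmt in plan_drops("agg", names, "*oct*"):
        print(stmt)  # review this list before executing anything
```

Note that '*oct*' also matches octopus_ref, which is exactly the kind of surprise a dry run catches. Once the printed list looks right, you could pass each statement to spark.sql.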

Hive create partitioned table based on Spark temporary table

I have a Spark temporary table spark_tmp_view with a DATE_KEY column. I am trying to create a Hive table from it (without first writing the temp table to a Parquet location). What I have tried to run is spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS mydb.result AS SELECT * FROM spark_tmp_view PARTITIONED BY(DATE_KEY DATE)")
The error I get is mismatched input 'BY' expecting <EOF>. I have searched but still can't figure out how to do this from a Spark app, or how to insert data afterwards. Could someone please help? Many thanks.
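PARTITIONED BY is part of the definition of the table being created, so it must come before ...AS SELECT...; see the Spark SQL CREATE TABLE syntax. In a CTAS, the partition column is listed by name only, because its type comes from the query. For example, as a managed Parquet data source table:
CREATE TABLE IF NOT EXISTS mydb.result USING PARQUET PARTITIONED BY (DATE_KEY) AS SELECT * FROM spark_tmp_view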

On Google Data Studio, using PostgreSQL data, how do I "SELECT * ..." but for camelCase columns?

On Google Data Studio, I cannot create a chart from Postgres data if the table columns are in camelCase. I have data in PostgreSQL that I want to chart. Adding it as a data source works fine; the problem appears when I create a chart.
After creating a chart and selecting a data source, I try to add a column, which results in this error:
Error with SQL statement: ERROR: column "columnname" does not exist Hint: Perhaps you meant to reference the column "table.columnName". Position: 8
It just so happens that all my columns are in camelCase. Is there no way around this? Surely this is a basic question that has been resolved.
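When connecting to your data source, use 'Custom query' instead of selecting a table from your database. Then write the SQL query yourself, referencing each camelCase column in double quotes and giving it a lower-case alias. This is needed because PostgreSQL folds unquoted identifiers to lower case, so columnName is looked up as columnname unless it is quoted, which is exactly what the error message shows. Worked for me.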
Example (note that table is a reserved word in PostgreSQL, so it cannot be used as the table alias):
SELECT
"camelCaseColA" AS cola,
"camelCaseColB" AS colb,
"camelCaseColC" AS colc
FROM
"tableName" AS t
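When there are many camelCase columns, writing the custom query by hand gets tedious. The sketch below generates it from a list of column names, assuming you already have those names (for example copied from the table definition) and that they are plain camelCase alphanumerics. build_alias_query, quote_ident and to_snake are hypothetical helpers. Each original identifier is double-quoted, with any embedded quote doubled as PostgreSQL requires. Each alias is a snake_case name, which is a readability choice; plain lower-casing as in the example query above works as well.

```python
import re


def quote_ident(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def to_snake(name):
    """camelCase -> snake_case, e.g. 'camelCaseColA' -> 'camel_case_col_a'.

    Assumes alphanumeric names, so the result is a valid unquoted alias.
    """
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def build_alias_query(table, columns):
    """Build a 'Custom query' that exposes lower-case column aliases."""
    select_list = ",\n  ".join(f"{quote_ident(c)} AS {to_snake(c)}" for c in columns)
    # 't' as the table alias, because 'table' is reserved in PostgreSQL.
    return f"SELECT\n  {select_list}\nFROM {quote_ident(table)} AS t"


if __name__ == "__main__":
    print(build_alias_query("tableName", ["camelCaseColA", "camelCaseColB", "userID"]))
```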

how to dump data into a temporary table (without actually creating the temporary table) from an external table in Hive Script during run time

In SQL Server stored procedures, we can create a temporary table "#temp" whose structure is taken from the table it selects from, without explicitly declaring the structure of "#temp".
Is there a similar option in a Hive (HQL) script to create a temp table at run time without first defining its structure, so that I can dump data into it and use it? The code below shows an example of a #temp table in SQL Server.
SELECT name, age, gender
INTO #MaleStudents
FROM student
WHERE gender = 'Male'
Hive has the concept of temporary tables, which are local to a user's session. These tables behave just like any other table, and can be created using CTAS commands too. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created.
Read more about them in the Hive documentation and on DWGEEK.
You can create a simple temporary table and perform any operation on it. Once you are done and end your Hive session, it is deleted automatically.
The syntax for a temporary table is:
CREATE TEMPORARY TABLE TABLE_NAME_HERE (key string, value string)
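The CTAS form is what answers the question, because the temp table takes its structure from the SELECT. In HiveQL that would be CREATE TEMPORARY TABLE male_students AS SELECT name, age, gender FROM student WHERE gender = 'Male'. The runnable sketch below shows the same behavior with Python's sqlite3 module as a stand-in engine (SQLite's CREATE TEMP TABLE ... AS SELECT), not Hive itself. The temp table is usable on the connection (session) that created it, is not visible to a new connection, and SQLite discards it when the creating connection closes. The table, file and function names are illustrative.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, name):
    """True if `name` can be queried on this connection."""
    try:
        conn.execute('SELECT 1 FROM "' + name.replace('"', '""') + '" LIMIT 1')
        return True
    except sqlite3.OperationalError:  # "no such table"
        return False


def temp_table_demo(db_path):
    session1 = sqlite3.connect(db_path)
    session1.execute("CREATE TABLE student (name TEXT, age INTEGER, gender TEXT)")
    session1.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [("Ann", 20, "Female"), ("Bob", 21, "Male"), ("Carl", 22, "Male")],
    )
    session1.commit()
    # CTAS into a temp table: its columns come from the SELECT, no column list needed.
    session1.execute(
        "CREATE TEMP TABLE male_students AS "
        "SELECT name, age, gender FROM student WHERE gender = 'Male'"
    )
    males = [row[0] for row in session1.execute("SELECT name FROM male_students ORDER BY name")]
    seen_in_own_session = table_visible(session1, "male_students")
    session1.close()  # closing the connection discards its temp tables

    session2 = sqlite3.connect(db_path)  # a new, separate session
    seen_in_new_session = table_visible(session2, "male_students")
    session2.close()
    return males, seen_in_own_session, seen_in_new_session


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(temp_table_demo(os.path.join(tmp, "school.db")))
```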

Most straightforward way to add a row to an SQL Server table in ADO.NET without hardcoded SQL?

I am wondering what the best / most efficient / most common way is to add a row to an SQL Server table using C# and ADO.NET. Of course I could just write an SQL statement for it, but first, the destination table schema might vary, so I want to keep this flexible, and second, there are so many columns that I do not want to code and maintain the statement manually. So I currently use a SqlCommandBuilder that automatically creates the proper INSERT statement for me, together with a SqlDataAdapter, like this:
var dataTable = new DataTable();
var dataAdapter = new SqlDataAdapter("select * from sometable", _databaseConnection);
// The builder registers with the adapter and generates its INSERT/UPDATE/DELETE commands when Update is called.
new SqlCommandBuilder(dataAdapter);
dataAdapter.Fill(dataTable);
// ... add row to dataTable, fill fields from some external file that
// ... includes column names as well,
// ... add some more field values not from the file, etc. ...
dataAdapter.Update(dataTable);
This seems pretty inefficient, though, since it first fetches all the records from the table even though I do not need them for anything (especially as there might already be a million records in there). A select statement like select * from sometable where 1=2 would work, but it does not seem like a very clean approach. I imagine there is a different solution that I am just not aware of.
Thanks,
Timo
I think the best way to insert rows is by using Stored Procedures through the ADO.NET command object.
If you are inserting large amounts of data and are using SQL Server 2008 or later, you can pass a DataTable object to a stored procedure as a table-valued parameter, using a user-defined table type.
In SQL:
CREATE TYPE dbo.SAMPLE_TABLE_TYPE AS TABLE
(
    field1 VARCHAR(255),
    field2 VARCHAR(255)
);
GO
CREATE PROCEDURE dbo.insert_data
    @data dbo.SAMPLE_TABLE_TYPE READONLY  -- table-valued parameters must be READONLY
AS
BEGIN
    INSERT INTO table1 (field1, field2)
    SELECT field1, field2 FROM @data;
END
In .NET:
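var myTable = new DataTable();
myTable.Columns.Add(new DataColumn("field1", typeof(string)));
myTable.Columns.Add(new DataColumn("field2", typeof(string)));
// ... add rows to myTable; conn is an open SqlConnection ...
var command = new SqlCommand("dbo.insert_data", conn) { CommandType = CommandType.StoredProcedure };
// A table-valued parameter must be marked Structured and name its table type.
SqlParameter param = command.Parameters.AddWithValue("@data", myTable);
param.SqlDbType = SqlDbType.Structured;
param.TypeName = "dbo.SAMPLE_TABLE_TYPE";
command.ExecuteNonQuery();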
If your data also contains updates, you can use the MERGE statement, introduced in SQL Server 2008, to perform both inserts and updates efficiently in the same procedure.
However, if creating user-defined table types and stored procedures is too much work and you need a completely dynamic solution, I would stick with what you have, but append WHERE 1 = 0 to your SQL text so that no rows are fetched.
You can also use a "SELECT TOP(0) * FROM sometable;" query.
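The same WHERE 1 = 0 idea carries over outside ADO.NET. The sketch below uses Python's DB-API with the built-in sqlite3 module as a stand-in, not SqlCommandBuilder or SqlDataAdapter. A WHERE 1 = 0 query returns no rows but still populates cursor.description. That gives the column names needed to build a parameterized INSERT for a row read from an external source. column_names and insert_row are hypothetical helpers. Identifiers are double-quoted, and values are passed as parameters rather than spliced into the SQL text.

```python
import sqlite3


def _q(name):
    """Double-quote an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_names(conn, table):
    # WHERE 1 = 0 matches no rows, but cursor.description is still populated.
    cur = conn.execute(f"SELECT * FROM {_q(table)} WHERE 1 = 0")
    return [d[0] for d in cur.description]


def insert_row(conn, table, values):
    """Insert one row given as {column: value}; omitted columns get their defaults."""
    columns = column_names(conn, table)
    unknown = set(values) - set(columns)
    if unknown:
        raise ValueError(f"not columns of {table}: {sorted(unknown)}")
    cols = [c for c in columns if c in values]  # keep the table's column order
    if not cols:
        conn.execute(f"INSERT INTO {_q(table)} DEFAULT VALUES")
        return
    sql = (
        f"INSERT INTO {_q(table)} ({', '.join(_q(c) for c in cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})"
    )
    conn.execute(sql, [values[c] for c in cols])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (id INTEGER, name TEXT, city TEXT)")
    insert_row(conn, "sometable", {"name": "Timo", "id": 1})
    print(conn.execute("SELECT * FROM sometable").fetchall())  # [(1, 'Timo', None)]
```

Raising on unknown keys, rather than silently ignoring them, is a design choice. It surfaces a file whose column names no longer match the table.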