How do I insert a pandas DataFrame into an existing PostgreSQL table?

I have a pandas DataFrame df and a PostgreSQL table my_table. I wish to truncate my_table and insert df (which has columns in the same order) into my_table, without affecting the schema of my_table. How do I do this?
In a rather naive attempt, I tried dropping my_table and then using pandas.DataFrame.to_sql, but this creates a new table with a different schema.

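I would manually truncate the table and then simply let pandas do its job, both inside one transaction. Pass index=False so pandas does not try to write the DataFrame index as an extra column. Also note that to_sql matches columns by name, not by position, so df's column names must match my_table's:
from sqlalchemy import text
with engine.begin() as con:  # engine: an SQLAlchemy engine for your database
    con.execute(text('TRUNCATE my_table RESTART IDENTITY;'))
    df.to_sql('my_table', con, if_exists='append', index=False)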
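Below is a minimal sketch of the underlying pattern: empty the existing table and insert into it inside one transaction. The table is never dropped, so its types, constraints and defaults survive, and a failed insert leaves the old rows in place. To keep it runnable without a database server, it swaps in Python's built-in sqlite3 module for PostgreSQL/pandas. Because SQLite has no TRUNCATE, it uses DELETE FROM instead. The helper name replace_table_contents is hypothetical.

```python
import sqlite3
from typing import Iterable, Sequence


def replace_table_contents(conn: sqlite3.Connection, table: str,
                           columns: Sequence[str],
                           rows: Iterable[tuple]) -> None:
    """Empty `table` and insert `rows` without touching its definition.

    Hypothetical helper for illustration. `table` and `columns` are spliced
    into the SQL text (identifiers cannot be bound as parameters), so they
    must come from trusted code.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a failed insert leaves the previous rows in place.
    with conn:
        # SQLite has no TRUNCATE; DELETE FROM empties the table but keeps it.
        conn.execute(f'DELETE FROM "{table}"')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({marks})', rows)
```

PostgreSQL's TRUNCATE is transaction-safe, so against PostgreSQL the DELETE could be replaced by TRUNCATE without losing the rollback behaviour.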

Related

PySpark delete/insert in transaction with JDBC

I want to delete the rows of an MSSQL table whose IDs appear in a PySpark DataFrame, and then insert another DataFrame into that table. I have two questions:
Can I write SQL against a DataFrame? Something like DELETE mssql_table WHERE ID IN (SELECT ID FROM some_dataframe)?
Can the DELETE and the INSERT run in a single transaction?
Thanks
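The thread gives no answer to these questions. One workaround is sketched below. It is an assumption, not something taken from the thread: collect the IDs on the driver, then run the DELETE and the INSERT over an ordinary database connection inside one transaction. Python's built-in sqlite3 module stands in for SQL Server so the sketch runs anywhere. The table target, its columns (id, val) and the helper name are all hypothetical.

```python
import sqlite3
from typing import Iterable, Sequence


def delete_then_insert(conn: sqlite3.Connection, ids: Sequence[int],
                       new_rows: Iterable[tuple]) -> None:
    """Delete rows of the hypothetical table `target` whose id is in `ids`,
    then insert `new_rows`, as a single transaction."""
    with conn:  # commit on success, roll back both statements on any error
        if ids:
            marks = ", ".join("?" for _ in ids)
            conn.execute(f"DELETE FROM target WHERE id IN ({marks})", list(ids))
        conn.executemany("INSERT INTO target (id, val) VALUES (?, ?)", new_rows)
```

In PySpark you might gather the IDs with [row.ID for row in ids_df.select('ID').collect()] (ids_df is hypothetical), which only makes sense when the ID list is small enough to bring to the driver. For SQL Server you would use a DB-API driver such as pyodbc, which also uses ? placeholders, and follow that driver's own transaction API.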

Insert Spark DF as column in existing hive table

I'm looking for a way to append a column from a Spark DataFrame to an existing Hive table. I use the code below to overwrite the table, but it only works when the DataFrame schema and the Hive table schema are equal. Sometimes I need to add one column, and since the schemas then don't match, the write fails.
Is there a way to append the df as a column?
Or do I have to run ALTER TABLE ADD COLUMN through spark.sql()?
temp = spark.table('temp')
temp.write.mode('overwrite').insertInto(datalab + '.' + table,overwrite=True)
Hope my question is understandable, thanks.
Build a DataFrame with the new set of data, including the additional columns, and append it to the existing table as follows:
# new_data_df: a DataFrame holding the new data, including the additional columns
new_data_df.write.mode('append').saveAsTable('same_table_name', mergeSchema=True)
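Suppose the new column you added is 'column_new': the older records in the table will have null values in that column. Note that mergeSchema is the option Delta Lake uses for schema evolution on append; for tables in other formats it may have no effect, in which case the column has to be added with ALTER TABLE ... ADD COLUMNS first.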
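For the question's other route, running ALTER TABLE ... ADD COLUMNS through spark.sql() before writing, here is a sketch of a hypothetical helper that compares two column lists and builds the DDL for the columns the table is missing. It assumes both lists have the shape of PySpark's DataFrame.dtypes, that is (name, type string) pairs, and that those type strings are acceptable in Hive DDL.

```python
from typing import List, Optional, Tuple

Dtypes = List[Tuple[str, str]]  # same shape as PySpark's DataFrame.dtypes


def add_columns_ddl(table: str, table_dtypes: Dtypes,
                    df_dtypes: Dtypes) -> Optional[str]:
    """Return ALTER TABLE ... ADD COLUMNS for columns present in the
    DataFrame but absent from the table, or None if nothing is missing."""
    # Hive column names are case-insensitive, so compare in lower case.
    existing = {name.lower() for name, _ in table_dtypes}
    missing = [(n, t) for n, t in df_dtypes if n.lower() not in existing]
    if not missing:
        return None
    cols = ", ".join(f"`{n}` {t}" for n, t in missing)
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"
```

In PySpark you would pass spark.table(name).dtypes and new_data_df.dtypes and run the result with spark.sql(). Because insertInto resolves columns by position rather than by name, select the DataFrame's columns in the table's order before calling it.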

Hive create partitioned table based on Spark temporary table

I have a Spark temporary view spark_tmp_view with a DATE_KEY column. I am trying to create a Hive table from it without first writing the temp table to a parquet location. What I tried to run is spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS mydb.result AS SELECT * FROM spark_tmp_view PARTITIONED BY(DATE_KEY DATE)")
The error I got is mismatched input 'BY' expecting <EOF>. I have searched, but still haven't figured out how to do this from a Spark app, or how to insert data afterwards. Could someone please help? Many thanks.
PARTITIONED BY is part of the definition of the table being created, so it must come before ...AS SELECT...; see the Spark SQL syntax for CREATE TABLE.
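To show the corrected clause order, the hypothetical helper below builds the statement as a string. It assumes a Spark data source table (USING parquet) rather than the question's EXTERNAL Hive table, and it lists the partition column by name only, since in a CREATE TABLE ... AS SELECT the column's type comes from the query. You would run the result with spark.sql(...).

```python
from typing import Sequence


def partitioned_ctas(target: str, source_view: str,
                     partition_cols: Sequence[str],
                     fmt: str = "parquet") -> str:
    """Build a Spark SQL CREATE TABLE ... AS SELECT with PARTITIONED BY
    placed after USING and before AS SELECT."""
    if not partition_cols:
        raise ValueError("need at least one partition column")
    cols = ", ".join(partition_cols)
    return (f"CREATE TABLE IF NOT EXISTS {target} "
            f"USING {fmt} "
            f"PARTITIONED BY ({cols}) "
            f"AS SELECT * FROM {source_view}")
```

For the question's names, partitioned_ctas("mydb.result", "spark_tmp_view", ["DATE_KEY"]) yields CREATE TABLE IF NOT EXISTS mydb.result USING parquet PARTITIONED BY (DATE_KEY) AS SELECT * FROM spark_tmp_view.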

How to save results of query into a new column, postgres

How do I add the results of a SELECT query as a new column of an existing table in PostgreSQL, using pgAdmin?
The query's results are computed from other columns of the same table.
Just create the column and then fill it with an UPDATE (using the command line bundled with pgAdmin):
ALTER TABLE tablename ADD COLUMN colname coltype;
UPDATE tablename SET colname = yourexpression;
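Here is a runnable sketch of those two steps. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, since the ALTER TABLE ... ADD COLUMN and UPDATE statements have the same shape in both for this simple case. The table prices, its columns and the helper name are hypothetical. The UPDATE fills the column once from each row's other columns; later changes to those columns will not refresh it.

```python
import sqlite3


def add_column_from_expression(conn: sqlite3.Connection, table: str,
                               column: str, coltype: str,
                               expression: str) -> None:
    """Add `column` to `table`, then fill it by evaluating `expression`
    (SQL over the row's other columns) for every existing row."""
    # Identifiers and the expression are pasted into the SQL text, so they
    # must come from trusted code, never from user input.
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {coltype}')
    conn.execute(f'UPDATE "{table}" SET "{column}" = {expression}')
    conn.commit()
```

For example, add_column_from_expression(conn, "prices", "total", "REAL", "qty * unit_price") corresponds to ALTER TABLE prices ADD COLUMN total REAL; followed by UPDATE prices SET total = qty * unit_price;.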

Delete column in hive table

I am working with Hive version 0.9 and I need to delete columns from a Hive table. I have searched several Hive command manuals, but I have only found commands for version 0.14. Is it possible to delete a column of a Hive table in Hive 0.9? What is the command?
Thanks.
Unlike in SQL, you can't simply drop a column from a Hive table with a statement like this:
ALTER TABLE tbl_name DROP COLUMN column_name; -- this will not work
There is a workaround, though. Say we have a Hive table test_tbl with the columns ID, NAME, AGE and Dob, and we want to drop Dob. You can use ALTER TABLE ... REPLACE COLUMNS to drop a column:
ALTER TABLE test_tbl REPLACE COLUMNS (ID STRING, NAME STRING, AGE STRING);
List only the columns you want to keep in the table.
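Typing the keep-list by hand is error-prone on wide tables. Here is a sketch of a hypothetical helper that derives it from the current column list, passed in as (name, type) pairs such as the non-partition columns shown by DESCRIBE test_tbl, and removes the named columns while preserving order. REPLACE COLUMNS changes only the table's metadata, not the data files. For formats read by position, such as delimited text, dropping a column in the middle can therefore shift the remaining values. Dropping the last column, as with Dob here, avoids that.

```python
from typing import Iterable, List, Tuple


def replace_columns_ddl(table: str, columns: List[Tuple[str, str]],
                        drop: Iterable[str]) -> str:
    """Build ALTER TABLE ... REPLACE COLUMNS that keeps every column in
    `columns` (in order) except those named in `drop`."""
    # Hive column names are case-insensitive, so compare in lower case.
    to_drop = {name.lower() for name in drop}
    unknown = to_drop - {name.lower() for name, _ in columns}
    if unknown:
        raise ValueError(f"columns not in table: {sorted(unknown)}")
    kept = [f"{name} {ctype}" for name, ctype in columns
            if name.lower() not in to_drop]
    if not kept:
        raise ValueError("REPLACE COLUMNS needs at least one column to keep")
    return f"ALTER TABLE {table} REPLACE COLUMNS ({', '.join(kept)})"
```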
There isn't a drop column or delete column in Hive.
A SELECT statement can take regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property hive.support.quoted.identifiers is set to none.
That said, you could create a new table or view using the following:
drop table if exists database.table_name;
create table if not exists database.table_name as
select `(column_to_remove_1|...|column_to_remove_N)?+.+`
from database.some_table
where
...
;
This creates a table with all the columns from some_table except column_to_remove_1 through column_to_remove_N. You can also create a view instead.
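The pattern above has a subtle ordering issue. The possessive group (...)?+ never backtracks into another alternative, so a shorter name listed before a longer name that starts with it lets the longer column through. With (a|ab)?+.+, for example, the column ab matches a followed by .+ and is still selected. Here is a sketch of a hypothetical builder that avoids this by putting longer names first. It assumes plain identifier column names, so no regex metacharacters need escaping.

```python
import re
from typing import Iterable

_IDENT = re.compile(r"[A-Za-z0-9_]+")


def exclude_columns_spec(columns: Iterable[str]) -> str:
    """Build a Hive regex column spec selecting every column except `columns`."""
    names = set(columns)
    if not names:
        raise ValueError("need at least one column to exclude")
    for name in names:
        # Plain identifiers only, so nothing needs regex escaping.
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported column name: {name!r}")
    # Longest names first so a prefix cannot shadow a longer name;
    # ties are broken alphabetically to keep the output deterministic.
    ordered = sorted(names, key=lambda n: (-len(n), n))
    return "`(" + "|".join(ordered) + ")?+.+`"
```

The result goes in the SELECT list, e.g. select followed by exclude_columns_spec(["column_to_remove_1", "column_to_remove_2"]) and from database.some_table. As noted above, on Hive 0.13.0 and later this requires hive.support.quoted.identifiers to be set to none.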
ALTER TABLE table_name REPLACE COLUMNS (c1 int, c2 String);
NOTE: list only the columns you want to keep. The listed columns are kept, and any column not listed is removed from the table schema.
We can't delete a column from a Hive table. However, if the table is external, dropping it and re-creating it without the column won't delete your data.
So, if you don't have the table definition at hand, run:
show create table database_name.table_name;
Then copy the output and edit it to remove the column. Drop the table, then run the edited CREATE TABLE statement from the Hive shell.
Suppose the table employee has the columns empid, name, dept, salary and address, and you want to remove address. Just list the remaining columns with REPLACE COLUMNS:
alter table employee replace columns (empid int, name string, dept string, salary int);
As mentioned before, you can't drop a column using an ALTER statement.
ALTER TABLE ... REPLACE COLUMNS is not guaranteed to work in all cases (for example, Hive allows it only for tables that use a native SerDe).
I found the best answer for this here:
https://stackoverflow.com/a/48921280/4385453