I'm trying to execute an S3 copy operation via Spark-Redshift and I'm looking to modify the Redshift table structure before running the copy command in order to add any missing columns (they should be all VARCHAR).
What I'm able to do is send an SQL query before running the copy, so ideally I would have liked to ALTER TABLE ADD COLUMN IF NOT EXISTS column_name VARCHAR(256). Unfortunately, Redshift does not offer support for ADD COLUMN IF NOT EXISTS, so I'm currently looking for a workaround.
I've tried querying the pg_table_def table to check whether the column exists, and that works, but I'm not sure how to chain that with an ALTER TABLE statement. Here's the current state of my query; I'm open to any suggestions for accomplishing the above.
select
case when count(*) < 1 then ALTER TABLE tbl { ADD COLUMN 'test_col' VARCHAR(256) }
else 'ok'
end
from pg_table_def where schemaname = 'schema' and tablename = 'tbl' and pg_table_def.column = 'test_col'
Also, I've seen this question: Redshift: add column if not exists. However, the accepted answer doesn't explain how to actually achieve this.
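Since Redshift has no ADD COLUMN IF NOT EXISTS, one possible workaround is to do the existence check on the client side: read the current column names from pg_table_def, then send an ALTER TABLE only for the columns that are missing. The following is a minimal sketch in Python, not a definitive implementation. The function name and its parameters are hypothetical, and fetching existing_columns (for example with the pg_table_def query above) is left to the caller.

```python
def missing_column_statements(schema, table, existing_columns, wanted_columns,
                              col_type="VARCHAR(256)"):
    """Return one ALTER TABLE ... ADD COLUMN statement per wanted column
    that is not already present in existing_columns."""
    # Assumption: identifiers are compared case-insensitively, which matches
    # Redshift's default behaviour of folding names to lower case.
    existing = {c.lower() for c in existing_columns}
    statements = []
    for col in wanted_columns:
        name = col.lower()
        if name not in existing:
            statements.append(
                f'ALTER TABLE "{schema}"."{table}" ADD COLUMN "{name}" {col_type};'
            )
            existing.add(name)  # avoid emitting the same column twice
    return statements


stmts = missing_column_statements(
    "schema", "tbl",
    existing_columns=["id", "name"],
    wanted_columns=["name", "test_col"],
)
```

Each returned statement can then be sent as its own pre-copy query; an empty list means there is nothing to alter.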
I'm trying to change all the USER-DEFINED columns to TEXT in a specific view using pgsql.
Is it possible to do that in a single ALTER TABLE query, or do I need to check first which columns have that data type and then change the data type one by one?
This is what I'm trying:
ALTER TABLE if exists "schemaName"."Table_A"
ALTER COLUMN (
select column_name
from information_schema.columns inf
where table_name = 'Table_A' and inf.data_type = 'USER-DEFINED')
TYPE TEXT;
I'm getting an error at the start of the subquery, at "(".
You need to do this one by one. Generally speaking such DDL statements cannot work on several objects in one statement.
For ALTER TABLE, see: https://www.postgresql.org/docs/12/sql-altertable.html.
For ALTER VIEW, see: https://www.postgresql.org/docs/current/sql-alterview.html
I am using Redshift. I want a query to delete selected rows from a redshift table if the table exists otherwise just ignore the statement.
Redshift's SQL dialect doesn't contain control-of-flow statements like IF.. THEN so you are not going to be able to do this in a single SQL statement.
Your application or process will need to first query the Redshift table metadata to determine if a table exists e.g.
select 1 from pg_tables where schemaname = 'myschema' and tablename = 'mytable';
If data is returned (i.e. the table exists) then the application or process will execute the delete statement, if no data is returned the application or process does nothing. Basically you need to handle the "if this then do this" logic externally to Redshift.
I recommend @Nathan's answer. I would use python/psycopg2 to set up this logic. The first query would check for the table's existence in pg_tables (e.g. SELECT count(1) FROM pg_tables WHERE tablename='foo') and store the result in a variable. Then you'd check that variable to decide whether to kick off a second query (your delete).
But maybe you don't want to do it in Python. You're just all about Redshift (it's pretty sweet). You could just run the DELETE query in Redshift. If the table is not present, the query fails and nothing happens. If the table is present, your data is deleted. There's no harm in generating an error here.
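The "check first, then delete" logic described in these answers might look roughly like the sketch below. To keep it runnable anywhere, an in-memory SQLite database stands in for Redshift and sqlite_master stands in for pg_tables; with psycopg2 the same function would take a Redshift connection and the pg_tables query instead. The function name, table, and data are hypothetical.

```python
import sqlite3


def delete_if_table_exists(conn, exists_sql, delete_sql):
    """Run delete_sql only when exists_sql returns at least one row.

    Returns the number of deleted rows, or None if the table is absent.
    """
    cur = conn.cursor()
    cur.execute(exists_sql)
    if cur.fetchone() is None:
        return None  # table missing: skip the DELETE entirely
    cur.execute(delete_sql)
    conn.commit()
    return cur.rowcount


# Demo against SQLite (an assumption for illustration). On Redshift the check
# would be something like:
#   select 1 from pg_tables where schemaname = 'myschema' and tablename = 'mytable'
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "old"), (2, "new"), (3, "old")])

deleted = delete_if_table_exists(
    conn,
    "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'mytable'",
    "DELETE FROM mytable WHERE status = 'old'",
)
skipped = delete_if_table_exists(
    conn,
    "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'no_such_table'",
    "DELETE FROM no_such_table",
)
remaining = conn.execute("SELECT count(*) FROM mytable").fetchone()[0]
```

Keeping the branch in the application means a missing table is an ordinary return value rather than an error that has to be caught.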
In a PostgreSQL db I'm working on, half of the tables have one particular column, always named the same, that is of type varchar(5). The size became a bit too restricting and I want to change it to varchar(10).
The number of tables in my particular case is actually very manageable to do it by hand. But I was wondering how one could script this with a query for larger dbs. It generally should be possible in just a few steps.
1. Identify all the tables in the schema, then (?) filter by whether the column is present.
2. Create ALTER TABLE statements for each table found.
I have some idea about how to write a query that identifies all tables in the schema. But I wouldn't know how to filter them. And if I didn't filter them, I assume the generated alter table statements would break.
Would be great if someone could share their knowledge on this.
Thanks to Abelisto for providing some guidance. Eventually, this is how I did it.
First, I created a query that in turn creates the ALTER TABLE statements. MyDB and MyColumn need to reflect actual values.
SELECT
'ALTER TABLE '||quote_ident(columns.table_name)||' ALTER COLUMN '||quote_ident(columns.column_name)||' TYPE varchar(10);'
FROM
information_schema.columns
WHERE
columns.table_catalog = 'MyDB' AND
columns.table_schema = 'public' AND
columns.column_name = 'MyColumn';
Then it was just a matter of executing the output as a new query. All done.
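If the two steps should be scripted end to end, the statements can also be assembled on the client. The sketch below assumes the rows are (table_schema, table_name) tuples already fetched from information_schema.columns with a filter like the one above; quote_identifier and widen_column_statements are hypothetical helper names, not part of any library.

```python
def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def widen_column_statements(rows, column, new_type):
    """Build one ALTER TABLE ... ALTER COLUMN ... TYPE statement per table.

    rows: iterable of (schema, table) tuples for tables that have the column.
    """
    return [
        f"ALTER TABLE {quote_identifier(schema)}.{quote_identifier(table)} "
        f"ALTER COLUMN {quote_identifier(column)} TYPE {new_type};"
        for schema, table in rows
    ]


statements = widen_column_statements(
    [("public", "orders"), ("public", "Order Items")],
    column="MyColumn",
    new_type="varchar(10)",
)
```

Because the rows come from a query that already filters on the column name, tables without the column never get a statement, so nothing breaks when the output is executed.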
I am working with Hive version 0.9 and I need to delete columns from a Hive table. I have searched several manuals of Hive commands, but I have only found commands for version 0.14. Is it possible to delete a column from a Hive table in Hive version 0.9? What is the command?
Thanks.
We can’t simply drop a column from a Hive table with a statement like the one below, as we would in standard SQL.
ALTER TABLE tbl_name DROP COLUMN column_name; -- this will not work
So there is a shortcut to drop columns from a hive table.
Let’s say we have a hive table.
From this table I want to drop the column Dob. You can use the ALTER TABLE REPLACE statement to drop a column.
ALTER TABLE test_tbl REPLACE COLUMNS (ID STRING, NAME STRING, AGE STRING);
You have to list the columns that you want to keep in the table.
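For tables with many columns, the REPLACE COLUMNS statement can be generated rather than typed by hand. This is only a sketch: the function name is hypothetical, the column list has to be supplied by the caller, and the STRING type given for Dob is an assumption, since the original table definition is not shown.

```python
def replace_columns_statement(table, columns, drop):
    """Build an ALTER TABLE ... REPLACE COLUMNS statement that keeps every
    column except those named in drop.

    columns: list of (name, hive_type) tuples in table order.
    drop: column names to remove (compared case-insensitively).
    """
    drop_set = {name.lower() for name in drop}
    kept = [(n, t) for n, t in columns if n.lower() not in drop_set]
    if not kept:
        raise ValueError("at least one column must remain in the table")
    column_list = ", ".join(f"{n} {t}" for n, t in kept)
    return f"ALTER TABLE {table} REPLACE COLUMNS ({column_list});"


stmt = replace_columns_statement(
    "test_tbl",
    [("ID", "STRING"), ("NAME", "STRING"), ("AGE", "STRING"), ("Dob", "STRING")],
    drop=["dob"],
)
```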
There isn't a drop column or delete column in Hive.
A SELECT statement can take regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property hive.support.quoted.identifiers is set to none.
That being said you could create a new table or view using the following:
drop table if exists database.table_name;
create table if not exists database.table_name as
select `(column_to_remove_1|...|column_to_remove_N)?+.+`
from database.some_table
where
...
;
This will create a table that has all the columns from some_table except the columns named column_to_remove_1, ..., column_to_remove_N. You can also choose to create a view instead.
ALTER TABLE table_name REPLACE COLUMNS ( c1 int, c2 String);
NOTE: Leave the column you want to remove out of the column list. REPLACE COLUMNS keeps the listed columns and removes any unlisted columns from the table schema.
We cannot delete a column from a Hive table. But dropping a table (if it is external) in Hive and then recreating it (with the column excluded) won't delete your data.
So what you can do (if you don't have the table structure) is run this command:
show create table database_name.table_name;
Then you can copy the output and edit it (with the column removed). Afterwards you can run the edited statement from the Hive shell.
The table columns are empid, name, dept, salary, and address, and I want to remove the address column. Just write a REPLACE COLUMNS query like the one below:
jdbc:hive2://> alter table employee replace columns(empid int, name string,dept string,salary int);
As mentioned before, you can't drop a column using an ALTER statement.
ALTER ... REPLACE is not guaranteed to work in all cases.
I found the best answer for this here:
https://stackoverflow.com/a/48921280/4385453
I need to duplicate selected rows with all the fields exactly the same except ID, an identity int column that is added automatically by SQL.
What is the best way to duplicate/clone a record or records (up to 50)?
Is there any T-SQL functionality for this in MS SQL 2008, or do I need to do a select/insert in stored procedures?
The only way to accomplish what you want is by using Insert statements which enumerate every column except the identity column.
You can of course select multiple rows to be duplicated by using a Select statement in your Insert statements. However, I would assume that this will violate your business key (your other unique constraint on the table besides the surrogate key, which you have, right?) and require some other column to be altered as well.
Insert MyTable( ...
Select ...
From MyTable
Where ....
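As an illustration of the INSERT ... SELECT pattern with an explicit column list, here is a small runnable sketch. It uses an in-memory SQLite database as a stand-in for SQL Server (an assumption made only so the example is self-contained), and the table, columns, and the duplicate_rows_sql helper are hypothetical.

```python
import sqlite3


def duplicate_rows_sql(table, columns, identity_column, where):
    """Build an INSERT ... SELECT that copies every column except the
    identity column, so the database assigns fresh IDs to the copies."""
    cols = ", ".join(c for c in columns if c != identity_column)
    return f"INSERT INTO {table} ({cols}) SELECT {cols} FROM {table} WHERE {where}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Qty INTEGER)"
)
conn.executemany("INSERT INTO MyTable (Name, Qty) VALUES (?, ?)", [("a", 1), ("b", 2)])

# Duplicate only the rows matching the filter; Id is generated automatically.
sql = duplicate_rows_sql("MyTable", ["Id", "Name", "Qty"], "Id", "Name = 'a'")
conn.execute(sql)
rows = conn.execute("SELECT Id, Name, Qty FROM MyTable ORDER BY Id").fetchall()
```

If the table has another unique constraint, the SELECT list would need to modify that column as well, as noted above.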
If it is a pure copy (minus the ID field) then the following will work (replace 'NameOfExistingTable' with the table you want to duplicate the rows from and optionally use the Where clause to limit the data that you wish to duplicate):
SELECT *
INTO #TempImportRowsTable
FROM (
SELECT *
FROM [NameOfExistingTable]
-- WHERE ID = 1
) AS createTable
-- If needed make other alterations to the temp table here
ALTER TABLE #TempImportRowsTable DROP COLUMN Id
INSERT INTO [NameOfExistingTable]
SELECT * FROM #TempImportRowsTable
DROP TABLE #TempImportRowsTable
If you're able to check the duplication condition as rows are inserted, you could put an INSERT trigger on the table. This would allow you to check the columns as they are inserted instead of having to select over the entire table.