Redshift - Adding a column, do we have to change our previous CSVs to include it?

I currently have a Redshift table in our database that has 10 columns, and I want to add another. It's trivial to do an ALTER TABLE for this.
My question: when I do this, will all my old CSV files fail to load into Redshift (via COPY from S3), given that they won't have this new column?
I was hoping the missing column would just come in as NULL rather than the import failing, but I haven't seen any documentation on this.
Ideally I wish I could specify the actual column names in the header row of the CSV, but I haven't seen anywhere whether that is possible.

The FILLRECORD option of the COPY command does exactly that: it 'allows data files to be loaded when contiguous columns are missing at the end of some of the records'. The missing trailing columns are filled with NULLs, so your existing CSV files will still load after the new column is added.
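For illustration, a minimal sketch of such a COPY (the table name, bucket path and role ARN are placeholders); any trailing columns missing from the older files are filled in instead of causing a load error:
-- hypothetical names; FILLRECORD lets the older, narrower CSVs load into the widened table
copy my_table
from 's3://mybucket/path/old_data.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
csv
fillrecord;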

Related

Importing issue in postgresql using pgadmin 4

The file is not importing after I created the table. The first line of the statement names the table (COPY), the second gives the path of the file (FROM), and I'm not sure whether anything more is needed before the WITH clause, since it isn't being highlighted in pink. The import doesn't go through either with pgAdmin's built-in import tool or with the SQL itself; neither produces the expected output.
So I created another table, this time with a single column, making sure the column name matched in both the table and the file, and it worked. The earlier attempt had several columns whose names were spelled differently in the table and in the file.
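For reference, a minimal sketch of such an import with the column list spelled out (table, column and file names are hypothetical); with FORMAT csv and HEADER the header line is skipped, and the listed columns must line up with the fields in the file:
-- hypothetical names; the column list order must match the order of the fields in the CSV
COPY staff (staff_id, first_name, last_name)
FROM 'C:\Users\me\Desktop\staff.csv'
WITH (FORMAT csv, HEADER);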
You can try this sequence:
1. First create the .csv file; the column order in the .csv file is what matters most.
2. Consider an employee_info.csv file like the sample shown below,
and consider your database table employee_info, which contains (emp_id [numeric], emp_name [character], emp_sal [numeric], emp_loc [character]).
Then execute the query below:
copy employee_info(emp_id,emp_name,emp_sal,emp_loc) from 'C:\Users\Zbook\Desktop\employee_info.csv' DELIMITER ',' CSV;
Note: ensure that no row in the .csv file has a null/empty value, like the sample below.
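For illustration, a made-up employee_info.csv following that layout, with every column populated in every row:
101,John,25000,Pune
102,Priya,30000,Mumbai
103,Amit,28000,Delhi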

How can I ensure Redshift Unload Copy columns are in correct order?

I am trying to use the UnloadCopyUtility to migrate an instance to an encrypted instance, but some of the tables fail because it is trying to insert the values into the wrong columns. Is there a way I can ensure the columns are mapped to the values correctly? I can adjust the Python script locally if need be.
I feel this should be possible in the UnloadCopy utility as well.
But here I'm describing a more generic solution that doesn't depend on the UnloadCopy utility, so it may be helpful to others as an alternative.
In the UNLOAD command you can specify the columns explicitly, e.g. C1, C2, C3, ...
Use the same column sequence in the COPY command when loading the data back into Redshift.
UNLOAD command example:
unload ('select C1,C2,C3,... from venue') to 's3://mybucket/tickit/unload/venue_' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' parallel off;
COPY command example using the same column sequence for the files unloaded above:
copy table (C1,C2,C3,...) from 's3://<your-bucket-name>/load/key_prefix' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' options;

PostgreSQL: Import columns into table, matching key/ID

I have a PostgreSQL database. I had to extend an existing, big table with a few more columns.
Now I need to fill those columns. I thought I could create a .csv file (out of Excel/Calc) that contains the IDs / primary keys of the existing rows, plus the data for the new, empty columns. Is it possible to do so? If so, how?
I remember doing exactly this fairly easily in Microsoft SQL Server Management Studio, but for PostgreSQL I am using pgAdmin (though I am of course willing to switch tools if that would help). I tried pgAdmin's import function, which uses PostgreSQL's COPY command, but it seems COPY isn't suitable, as it can only create whole new rows.
Edit: I guess I could write a script that loads the CSV and iterates over the rows, issuing an UPDATE for each. But I don't want to reinvent the wheel.
Edit2: I've found this question here on SO, which provides an answer using a temp table. I guess I will use it, although it's more of a workaround than an actual solution.
PostgreSQL can import data directly from CSV files with COPY statements; however, as you stated, this only works for new rows.
Instead of creating a CSV file, you could just generate the necessary SQL UPDATE statements.
Suppose this were the CSV file:
PK;ExtraCol1;ExtraCol2
1;"foo";42
4;"bar";21
Then just produce the following:
UPDATE my_table SET ExtraCol1 = 'foo', ExtraCol2 = 42 WHERE PK = 1;
UPDATE my_table SET ExtraCol1 = 'bar', ExtraCol2 = 21 WHERE PK = 4;
You seem to be working under Windows, so I don't really know how to accomplish this there (probably with PowerShell), but under Unix you could easily generate the SQL from the CSV with tools like awk or sed. An editor with regular-expression support would probably suffice too.
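As for the temp-table workaround mentioned in Edit2, a minimal sketch could look like this (table, column and path names are hypothetical): load the CSV into a staging table, then update the big table by joining on the key.
-- hypothetical names; staging table matches the CSV layout
CREATE TEMP TABLE new_values (pk integer, extracol1 text, extracol2 integer);
-- load the CSV into the staging table (or use pgAdmin's import tool for this step)
COPY new_values FROM '/path/to/new_values.csv' WITH (FORMAT csv, HEADER, DELIMITER ';');
-- copy the new values onto the existing rows, matched by primary key
UPDATE my_table AS t
SET ExtraCol1 = n.extracol1,
    ExtraCol2 = n.extracol2
FROM new_values AS n
WHERE t.PK = n.pk;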

can a modify stage bulk rename columns based on wildcards?

I need to modify columns based on business rules using RCP. For example, all source columns that end with '_ID' must be changed to '_KEY' to meet the target.
An example: Test_ID in source becomes Test_KEY in target
I have multiple tables, some with 2 "ID" columns and some with 20. Is there a way to configure the Modify stage to bulk rename columns based on a wildcard?
If not, is there another way?
Thanks.
I doubt there is an option to do this with the Modify stage and wildcards.
One alternative could be a schema file, which can be used with any of the following stages:
Sequential File, File Set, External Source, External Target, Column Import, Column Export
This schema file could also be generated or modified to adjust the column names as needed.
Another way could be to generate the appropriate SQL statement if the data resides in a database or is written to one.
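As a sketch of that last idea, assuming the data sits in a PostgreSQL-style database and using a hypothetical table name, you could let the catalog generate the renamed column list for you:
-- hypothetical table name; builds a SELECT list that renames columns ending in _ID to _KEY
-- the 'i' flag makes the pattern match regardless of the stored case of the column names
SELECT string_agg(
         quote_ident(column_name) || ' AS ' ||
         quote_ident(regexp_replace(column_name, '_id$', '_KEY', 'i')),
         ', ' ORDER BY ordinal_position) AS select_list
FROM information_schema.columns
WHERE table_name = 'my_source_table';
The resulting select_list could then be pasted into the source SQL so the renamed columns flow through with RCP.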

Dump subset of records in an OpenEdge database table in the ".d" file format

I am looking for the easiest way to manually dump a subset of records in an OpenEdge database table in the Progress ".d" file format.
The best way I can imagine is creating an extra test database with the identical schema as the source database, and then copying the subset of records over to the test database using FOR EACH and BUFFER-COPY statements. Then just export the data from the test database using the Dump Data and Definitions, Table Contents (.d file)... menu option.
That seems like a lot of trouble. If you can identify the subset of records in order to do the BUFFER-COPY, then you should also be able to:
OUTPUT TO VALUE( "table.d" ).
FOR EACH table NO-LOCK WHERE someCondition:
    EXPORT table.
END.
OUTPUT CLOSE.
Which is, essentially, what the dictionary "dump data" .d file contains, minus a few lines of administrivia at the bottom that can safely be omitted for most purposes.