I'm trying to copy a portion of the data from a DynamoDB table to Redshift.
I know that I can use something like the following to copy the entire table, and it works:
COPY some_table from 'dynamodb://some_ddb_table'
credentials 'aws_iam_role=some_role' readratio 50;
However, I'm looking for something like the following to copy only a portion of the data (where X=Y in this case):
COPY some_table from 'dynamodb://some_ddb_table where X=Y'
credentials 'aws_iam_role=some_role' readratio 50;
Is there a way to do this?
I am quite new to this.
My destination is a Postgres table, and I want to update two fields (col1, col2) based on a column value from a SQL Server table (when postgres_table.a = sqlserver_table.b).
I know this could easily be done with an OLE DB Command; however, since I connect to the Postgres destination table through ODBC, the OLE DB Command won't work in this case.
Any thoughts on this?
It's a bit hacky, but how about using a Foreach Loop container and Execute SQL tasks?
First, read the values into an object variable (use an Execute SQL task for this). Then use that variable as the source for the Foreach Loop, and use another Execute SQL task inside the loop to send an UPDATE with the correct values to Postgres.
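The same read-then-loop pattern can be sketched outside SSIS to show what the loop actually does. This is a minimal Python sketch under stated assumptions: two in-memory SQLite databases stand in for the SQL Server source and the ODBC-connected Postgres destination, and the sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

# Assumption: SQLite in-memory databases stand in for the SQL Server source
# and the Postgres destination described in the question.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE sqlserver_table (b INTEGER, col1 TEXT, col2 TEXT)")
source.executemany(
    "INSERT INTO sqlserver_table VALUES (?, ?, ?)",
    [(1, "x1", "y1"), (2, "x2", "y2")],  # hypothetical sample data
)
target.execute("CREATE TABLE postgres_table (a INTEGER, col1 TEXT, col2 TEXT)")
target.executemany(
    "INSERT INTO postgres_table VALUES (?, ?, ?)",
    [(1, None, None), (2, None, None), (3, None, None)],
)

# First Execute SQL task: read the source rows into a variable.
rows = source.execute("SELECT b, col1, col2 FROM sqlserver_table").fetchall()

# Foreach loop + inner Execute SQL task: one parameterized UPDATE per row,
# matching postgres_table.a = sqlserver_table.b (ODBC also uses ? markers).
for b, col1, col2 in rows:
    target.execute(
        "UPDATE postgres_table SET col1 = ?, col2 = ? WHERE a = ?",
        (col1, col2, b),
    )
target.commit()

print(target.execute("SELECT a, col1, col2 FROM postgres_table ORDER BY a").fetchall())
```

Row 3 has no match in the source, so it keeps its NULLs. One UPDATE per row is fine for small volumes; for large ones, loading the source rows into a staging table on the Postgres side and issuing a single `UPDATE ... FROM` is usually much faster.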
In SQL Server stored procedures, we have the option of creating a temporary table "#temp" whose structure matches that of the table it selects from. We don't explicitly create or declare the structure of the "#temp" table.
Is there a similar option in a Hive (HQL) script to create a temp table at run time without explicitly defining its structure, so that I can dump data into it and use it? The code below shows an example of a #temp table in SQL.
SELECT name, age, gender
INTO #MaleStudents
FROM student
WHERE gender = 'Male'
Hive has the concept of temporary tables, which are local to a user's session. These tables behave just like any other table, and can be created using CTAS commands too. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created.
Read more about them in the Hive documentation.
You can create a simple temporary table and perform any operation on it.
Once you are done and log out of your session, it is deleted automatically.
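The syntax for a temporary table is:
CREATE TEMPORARY TABLE tmp_table (col1 STRING, col2 INT);
or, mirroring the #MaleStudents example above with CTAS:
CREATE TEMPORARY TABLE male_students AS SELECT name, age, gender FROM student WHERE gender = 'Male';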
As Redshift is based on PostgreSQL, does it have an option to overwrite or append data in a table while copying from S3 to Redshift?
The only thing I found is triggers, but they don't accept any arguments.
All I need is to write a script that takes a yes/no argument (or similar) saying what to do if the data is already in the table.
When loading data from Amazon S3 into Amazon Redshift using the COPY command, data is appended to the target table.
Redshift does not have an "overwrite" option. If you wish to replace existing data with the data being loaded, you could:
Load the data into a temporary table.
Delete rows in the main table that match the incoming data, e.g.:
DELETE FROM main_table WHERE id IN (SELECT id FROM temp_table);
Copy the rows from the temporary table to the main table, e.g.:
INSERT INTO main_table SELECT * FROM temp_table;
See: Updating and Inserting New Data
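The three steps above can be tried locally as a Python sketch. Assumptions: SQLite stands in for Redshift, `executemany()` stands in for the COPY from S3 into the staging table, and `main_table`, `temp_table`, `id`, `val` and the sample rows are hypothetical names and data.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift; executemany() stands in for COPY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO main_table VALUES (?, ?)", [(1, "old"), (2, "old")])
conn.commit()

incoming = [(2, "new"), (3, "new")]  # hypothetical batch being loaded

# Step 1: load the incoming data into a temporary staging table.
conn.execute("CREATE TEMP TABLE temp_table (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO temp_table VALUES (?, ?)", incoming)

# Steps 2 and 3 are committed together: the with-block commits on success
# and rolls back if either statement fails.
with conn:
    conn.execute("DELETE FROM main_table WHERE id IN (SELECT id FROM temp_table)")
    conn.execute("INSERT INTO main_table SELECT * FROM temp_table")

rows = conn.execute("SELECT id, val FROM main_table ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'new'), (3, 'new')]
```

Row 2 is replaced, row 3 is added and row 1 is left alone, which is the "overwrite matching rows, append the rest" behavior the question asks for.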
Redshift doesn't let you create triggers or events the way other SQL databases do. The solution I found is to run the update (a SQL query) from a script, written in R, Python, or another language, and schedule that script with a crontab task.
As of May 2019, Redshift supports stored procedures so you can package up a set of queries/statements like this:
CREATE OR REPLACE PROCEDURE public.copy_and_cleanse_data(overwrite bool)
AS $$
BEGIN
  -- Empty the table first only when asked to overwrite; otherwise COPY appends
  IF overwrite THEN
    DELETE FROM myredshifttable;
  END IF;
  COPY myredshifttable
  FROM 's3://awssampledbuswest2/tickit/category_pipe.txt'
  IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
  REGION 'us-west-2';
  UPDATE myredshifttable SET myfield = REPLACE(myfield, 'foo', 'bar');
END;
$$ LANGUAGE plpgsql;
Then run or schedule the following call, passing TRUE to overwrite or FALSE to append:
CALL public.copy_and_cleanse_data(TRUE);
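The overwrite-or-append switch the question asked for can also live in the client script rather than in a stored procedure. A minimal Python sketch, assuming SQLite as a stand-in for Redshift and a list of rows as a stand-in for the COPY; `load` and `mytable` are hypothetical names, and the logic mirrors the `IF overwrite THEN DELETE` branch of the procedure.

```python
import sqlite3


def load(conn, rows, overwrite):
    """Append rows to mytable, or replace its contents when overwrite is True.

    Hypothetical helper: SQLite stands in for Redshift and `rows` for COPY.
    """
    with conn:  # the optional DELETE and the load commit (or roll back) together
        if overwrite:
            conn.execute("DELETE FROM mytable")
        conn.executemany("INSERT INTO mytable VALUES (?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myfield TEXT)")
load(conn, [("a",), ("b",)], overwrite=False)
load(conn, [("c",)], overwrite=False)  # append: a, b, c
print(conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])  # 3
load(conn, [("d",)], overwrite=True)  # overwrite: only d remains
print(conn.execute("SELECT myfield FROM mytable").fetchall())  # [('d',)]
```

A yes/no command-line argument would simply be mapped to the `overwrite` boolean before calling `load`.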
I want to dump a subset of a table of my postgres database. Is there a way to dump a SELECT statement without creating a view?
I need to copy part of the table to another Postgres database.
Use COPY to dump it directly to disk.
Example (from the fine manual) using a SELECT:
COPY (SELECT * FROM country WHERE country_name LIKE 'A%')
TO '/usr1/proj/bray/sql/a_list_countries.copy';
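Note that `COPY ... TO 'file'` writes the file on the database server and needs superuser privileges (or, in recent PostgreSQL versions, the pg_write_server_files role); psql's `\copy (SELECT ...) TO 'file'` does the same from the client side. If neither fits, a client script can do the dump. The sketch below is an assumption-laden illustration: `dump_query` is a hypothetical helper that uses only generic DB-API calls, and SQLite with made-up rows stands in for Postgres in the demo.

```python
import csv
import os
import sqlite3
import tempfile


def dump_query(conn, query, path):
    """Write the result of `query` to a CSV file at `path`, header first.

    Hypothetical helper using only DB-API calls (execute, description,
    fetchmany), so it is not tied to one driver.
    """
    cur = conn.cursor()
    cur.execute(query)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])
        while True:
            batch = cur.fetchmany(1000)  # stream in chunks, not all at once
            if not batch:
                break
            writer.writerows(batch)


# Demo: SQLite stands in for Postgres (assumption); the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (country_name TEXT)")
conn.executemany("INSERT INTO country VALUES (?)", [("Austria",), ("Belgium",), ("Angola",)])
out = os.path.join(tempfile.mkdtemp(), "a_list_countries.csv")
dump_query(conn, "SELECT * FROM country WHERE country_name LIKE 'A%' ORDER BY country_name", out)
with open(out, newline="") as f:
    print(list(csv.reader(f)))  # [['country_name'], ['Angola'], ['Austria']]
```

The resulting CSV can then be loaded into the other database with `COPY ... FROM` (or psql's `\copy ... FROM`) using the CSV and HEADER options.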
I'm wondering if I can use a trigger on a table to "ignore" columns that are in a COPY ... FROM STDIN statement but not in the target table. Sorry if the wording or syntax of the question is off; here is an explanation of what I'm trying to say. I'm new to triggers, so any advice is helpful.
I'm using the PostGIS Shapefile importer to copy shapefiles to the spatial tables in my PostgreSQL database.
This generates a COPY statement that lists all the fields in the shapefile, something like:
COPY "public"."stations" ("column1","column2","column3","column4", geom) FROM stdin;
column1 and column2 are in the file but not in the target table, so the COPY fails.
Is there a way to create a trigger that would have the same result as:
COPY "public"."stations" ("column3","column4", geom) FROM stdin;
No, you cannot skip columns that are present in the input file. COPY will error out before triggers are even invoked, and you cannot use rules either. Quoting the manual:
COPY FROM will invoke any triggers and check constraints on the
destination table. However, it will not invoke rules.
You can either edit the file or use a temporary staging table:
COPY to a temporary table with matching columns.
Use INSERT to write the desired columns to the final target table(s), or use the whole range of SQL DML commands for more sophisticated cases.
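The staging-table workaround can be sketched end to end in Python. Assumptions: SQLite stands in for PostgreSQL, `executemany()` stands in for the importer's `COPY ... FROM stdin`, `geom` is a plain string rather than a PostGIS geometry, and `stations_stage` plus the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL/PostGIS; geom is just text here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (column3 TEXT, column4 TEXT, geom TEXT)")

# 1. Staging table whose columns match the input file exactly.
conn.execute(
    "CREATE TEMP TABLE stations_stage "
    "(column1 TEXT, column2 TEXT, column3 TEXT, column4 TEXT, geom TEXT)"
)
incoming = [
    ("a1", "b1", "c1", "d1", "POINT(0 0)"),
    ("a2", "b2", "c2", "d2", "POINT(1 1)"),
]
conn.executemany("INSERT INTO stations_stage VALUES (?, ?, ?, ?, ?)", incoming)

# 2. Copy only the columns the target table has; column1/column2 are dropped.
conn.execute(
    "INSERT INTO stations (column3, column4, geom) "
    "SELECT column3, column4, geom FROM stations_stage"
)
conn.commit()
print(conn.execute("SELECT * FROM stations ORDER BY column3").fetchall())
```

In PostgreSQL the importer's generated COPY would be pointed at the staging table instead of "public"."stations", and the temporary table disappears at the end of the session.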