AWS Redshift Bulk Insert + Encoding definition

Is it possible to do a bulk insert into Redshift using the CREATE TABLE AS syntax while defining data types and encodings at the same time? What's the correct syntax?
For example, the following gives a syntax error near 'as':
create table my_table (
a int not null encode runlength,
b int not null encode runlength
) distkey(a) sortkey (a, b) as (
select * from other_table
);
I can only get it to work by specifying the column names alone (a, b), without types or encodings, and that's a huge limitation.

You can specify the DIST and SORT keys in a CREATE TABLE … AS query like this:
CREATE TABLE new_table
DISTSTYLE KEY
DISTKEY ( my_dist )
SORTKEY ( my_sort )
AS
SELECT *
FROM old_table
;
As per the docs (http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_AS.html), I don't believe you can alter the compression encoding from the source table using CREATE TABLE AS.

More details on Redshift CTAS are given here: http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_usage_notes.html . In a nutshell, nowhere is it mentioned that you can define the encoding in a CTAS statement, but you can define sort keys and distribution (hash) keys. According to those notes at the time of writing, the default encoding chosen by this statement is none.
However, if you want to do a bulk insert, why not do it in two steps?
1. Create table new_table with your encoding and sort/hash keys
2. Insert into new_table as select * from old_table
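The two steps above can be sketched from Python as follows. This is only a minimal sketch, assuming a DB-API cursor (for example from psycopg2) connected to Redshift. The helper name load_with_encodings is hypothetical, and the table and column names are taken from the question.

```python
# Step 1: create the target table with explicit types, encodings and keys.
# Column definitions mirror the ones in the question.
CREATE_SQL = """
CREATE TABLE my_table (
    a INT NOT NULL ENCODE RUNLENGTH,
    b INT NOT NULL ENCODE RUNLENGTH
)
DISTKEY (a)
SORTKEY (a, b)
"""

# Step 2: bulk-load the new table from the source table.
INSERT_SQL = "INSERT INTO my_table SELECT * FROM other_table"


def load_with_encodings(cur):
    """Run both steps on an open DB-API cursor (hypothetical helper)."""
    cur.execute(CREATE_SQL)
    cur.execute(INSERT_SQL)
```

The caller is expected to commit afterwards, so that both statements take effect together.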

Related

Postgres Sequence Value Not Incrementing After Inserting Records

I have the following table:
DROP TABLE IF EXISTS TBL_CACL;
CREATE TABLE TBL_CACL (
ID INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
NAME VARCHAR(250) UNIQUE NOT NULL
);
I am able to query Postgres like this:
SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
to determine that the default name for the sequence on the table is tbl_cacl_id_seq.
I can then read its next value like this:
SELECT nextval('tbl_cacl_id_seq');
Directly after creating the table, the value is '1'.
However, if I insert several rows of data like this:
INSERT INTO TBL_CACL
VALUES
(1, 'CACL_123'),
(2, 'CACL_234'),
(3, 'CACL_345');
and then read nextval for the sequence again, it returns '2'.
I would have thought that the next value of tbl_cacl_id_seq would be '4'. Clearly I am misunderstanding how the insert is related to nextval. Why is the sequence out of sync with the inserts, and how do I get them back in sync? Thanks for any advice.

The sequence tbl_cacl_id_seq advances only when nextval('tbl_cacl_id_seq') is called. That call does not happen here, because you supplied explicit values for the ID column in the INSERT, so the column default was never evaluated. The '2' you saw comes from your own two nextval() calls: each call consumes a value, whether or not a row is inserted.
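To bring the sequence back in line with rows that were inserted with explicit IDs, one option is setval(). The following is a minimal sketch, assuming a DB-API cursor such as psycopg2's and PostgreSQL 10 or later (where pg_get_serial_sequence() also resolves identity columns). The helper names resync_sequence and insert_names are hypothetical.

```python
# Move the sequence so that the next nextval() returns MAX(id) + 1
# (or 1 if the table is empty). Passing "false" as the third argument
# tells setval() that the given value has not been handed out yet.
RESYNC_SQL = """
SELECT setval(
    pg_get_serial_sequence('tbl_cacl', 'id'),
    COALESCE((SELECT MAX(id) FROM tbl_cacl), 0) + 1,
    false
)
"""

# Alternative that avoids the problem in the first place: leave ID out of
# the INSERT so the identity default draws a sequence value for each row.
INSERT_SQL = "INSERT INTO tbl_cacl (name) VALUES (%s)"


def resync_sequence(cur):
    """Realign the identity sequence with the data already in the table."""
    cur.execute(RESYNC_SQL)


def insert_names(cur, names):
    """Insert rows without explicit IDs so the sequence stays in sync."""
    for name in names:
        cur.execute(INSERT_SQL, (name,))
```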

Good Pattern for executing SELECT after INSERT with Amazon Aurora Postgres?

Imagine you have an Amazon Aurora Postgres DB. You perform an INSERT into one table. You then need to do a SELECT to get the auto-generated CompanyId of the newly added record. You find that there is often a significant delay between when the INSERT occurs and when the record becomes available to the SELECT.
I've discussed with my colleagues some possible code patterns to best handle this lag time. What, in your opinion, is the best approach?

You don't need a separate SELECT statement. The best and most efficient option is to just use the returning clause:
insert into some_table (c1, c2, c3)
values (...)
returning *;
Instead of returning * you can also specify the column you want, e.g.: returning company_id
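From application code, the RETURNING approach might look like the sketch below. It assumes a DB-API cursor with psycopg2-style %s placeholders. The table company, its name column and the helper insert_company are hypothetical names used only for illustration.

```python
def insert_company(cur, name):
    """Insert one row and read back its generated key from the same statement."""
    cur.execute(
        "INSERT INTO company (name) VALUES (%s) RETURNING company_id",
        (name,),
    )
    # RETURNING makes the INSERT produce a result set, so the new key can be
    # fetched directly without a second query.
    return cur.fetchone()[0]
```

Because the key comes back from the INSERT itself, there is no window in which a separate SELECT could miss the new row.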
Another option is to use currval() or lastval() after the insert to get the value of the sequence directly:
insert into some_table (..)
values (...);
select lastval();
The usage of lastval() requires that no other value is generated by a different sequence between the INSERT and the SELECT. If you can't guarantee that, use currval() and specify the name of the sequence:
insert into some_table (...)
values (...);
select currval('some_table_company_id_seq');
If you want to avoid hardcoding the sequence name, use pg_get_serial_sequence():
select currval(pg_get_serial_sequence('some_table', 'company_id'));

How to Update/Insert (Upsert) for Multiple Values DB2

I am trying to do an UPSERT in DB2 9.7 without creating a temporary table to merge. I am specifying values as parameters, however I'm always getting a syntax error for the comma separating the values when I try to include more than one row of values.
MERGE INTO table_name AS tab
USING (VALUES
(?,?),
(?,?)
) AS merge (COL1, COL2)
ON tab.COL1 = merge.COL1
WHEN MATCHED THEN
UPDATE SET tab.COL1 = merge.COL1,
tab.COL2 = merge.COL2
WHEN NOT MATCHED THEN
INSERT (COL1, COL2)
VALUES (merge.COL1, merge.COL2)
I have also tried teknopaul's answer from Does DB2 have an “insert or update” statement, but have received another syntax error complaining about the use of SELECT.
Does anybody know how to correctly include a table with values in my merge, without actually creating/dropping one on the database?

I believe you need something like USING (SELECT * FROM VALUES ( ...) ) AS ...

Postgres: Can I create an index to use in the SELECT clause?

I have defined a function that determines the timezone from table tz_world for a set of lon, lat values:
create function get_timezone(numeric, numeric)
returns character varying(30) as $$
select tzid from tz_world where ST_Contains(geom, ST_MakePoint($1, $2));
$$ language SQL immutable;
Now I would like to use this function in the SELECT clause of a query on a different table:
select get_timezone(lon, lat) from event where...;
The function is rather slow, so I tried using an index to speed things up:
create index event_timezone_idx on event (get_timezone(event.lon, event.lat));
While this speeds up queries where the function is used in the WHERE clause, it has no effect on the variant above where get_timezone(lon, lat) is used in the SELECT clause.
Is it possible to rephrase the query and/or index to speed up the timezone determination?
Update
Thank you for the answers!! I decided to include an extra column for the timezone in the end and populate it when creating/updating the events.
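The extra-column approach described in the update could look roughly like this. It is a minimal sketch, assuming a DB-API cursor with psycopg2-style %s placeholders. The column name tzid on event and the helper names are assumptions, and a real event table will likely have more columns than shown.

```python
# One-time migration: add the column and backfill it for existing rows.
ADD_COLUMN_SQL = "ALTER TABLE event ADD COLUMN tzid varchar(30)"
BACKFILL_SQL = (
    "UPDATE event SET tzid = get_timezone(lon, lat) WHERE tzid IS NULL"
)

# For new rows, compute the timezone once, at insert time.
INSERT_SQL = (
    "INSERT INTO event (lon, lat, tzid) "
    "VALUES (%s, %s, get_timezone(%s, %s))"
)


def add_timezone_column(cur):
    """Add and populate the stored timezone column (hypothetical helper)."""
    cur.execute(ADD_COLUMN_SQL)
    cur.execute(BACKFILL_SQL)


def insert_event(cur, lon, lat):
    """Insert an event and store its timezone alongside the coordinates."""
    cur.execute(INSERT_SQL, (lon, lat, lon, lat))
```

Queries can then select tzid directly instead of calling get_timezone(lon, lat) for every row.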

I would recommend creating a local temporary table from the part of the SELECT you want to index, and then creating an index on that temporary table:
CREATE LOCAL TEMPORARY TABLE temp_table AS (
select
.
.
.
);
CREATE INDEX temp_table_idx
ON temp_table
USING btree
(col1,col2,....);
Otherwise, write out what you want your WHERE condition to be. An index helps only when the query filters (or sorts or joins) on the indexed expression, and that expression must match exactly what you are filtering on.

Getting the primary key after row insertion using redshift

I'm using PostgreSQL 8.0.2 with Amazon Redshift and I'm trying to set up an INSERT command that also returns the PRIMARY KEY.
I was originally trying to do the following:
with get_connection() as conn:
    with conn.cursor() as cur:
        cur.execute('INSERT INTO orders (my_id, my_amount) \
            VALUES (%s, %s) RETURNING row_id;', (some_id, some_amount))
        conn.commit()
However, the RETURNING clause only works on PostgreSQL 8.2 and above.
I saw that currval might be a possible way to get this to work, but I read that it requires a sequence object.
I'm trying to create the following schema:
CREATE SEQUENCE order_seq;
CREATE TABLE IF NOT EXISTS orders
(
order_id INTEGER IDENTITY(1,1) PRIMARY KEY DISTKEY,
)
Then do:
with get_connection() as conn:
    with conn.cursor() as cur:
        cur.execute('INSERT INTO orders (my_id, my_amount) \
            VALUES (%s, %s);', (some_id, some_amount))
        conn.commit()
        cur.execute('SELECT currval();')
        row_id = cur.fetchone()[0]
UPDATE: Sequence objects are not supported by redshift either. I feel like this is a pretty basic procedure but there is no easy way to get a reference to the current row.

Just define your column as:
order_id INTEGER PRIMARY KEY DISTKEY
Then, with the sequence order_seq created, use this as the INSERT command:
cur.execute("INSERT INTO orders (order_id, my_id, my_amount) \
VALUES (nextval('order_seq'), %s, %s);", (some_id, some_amount))
Since you are using a sequence, you have to include the column in the INSERT command so that nextval is used properly.
To retrieve the current sequence value, do as follows:
cur.execute("SELECT currval('order_seq')")
row_id = cur.fetchone()[0]
I'm not familiar with the language you are using, so you may need to adjust how the quotes around the sequence name are escaped.
The syntax of nextval and currval is: nextval('sequenceName') and currval('sequenceName')
If sequences are not supported, the only way I can see to solve your issue is to follow these steps:
1. Open a transaction (so others won't get the same id)
2. Fetch the max id of your table, e.g. select max(order_id) from orders, into a variable
3. Use that value, incremented by one, in the INSERT as if it came from the sequence.
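The max-id fallback can be sketched as below. This is a rough sketch under stated assumptions, not a tested Redshift recipe: it assumes a psycopg2-style connection (which opens a transaction implicitly), assumes order_id is a plain INTEGER column rather than IDENTITY, and takes a table lock with Redshift's LOCK command so that two sessions cannot read the same maximum. The helper name insert_order is hypothetical.

```python
def insert_order(conn, my_id, my_amount):
    """Insert a row using MAX(order_id) + 1 as the key and return that key."""
    cur = conn.cursor()
    try:
        # Serialize writers: hold a table lock until the transaction ends.
        cur.execute("LOCK orders")
        # COALESCE covers the empty-table case, where MAX() returns NULL.
        cur.execute("SELECT COALESCE(MAX(order_id), 0) + 1 FROM orders")
        new_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO orders (order_id, my_id, my_amount) "
            "VALUES (%s, %s, %s)",
            (new_id, my_id, my_amount),
        )
        conn.commit()  # releases the lock
        return new_id
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

The lock makes concurrent inserts wait for each other, so this trades throughput for unique keys.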
